Are you kidding? We’ll never use open-source!

Back in the days when I first started using OSS/BSS software tools, there was no way any respectable telco was going to use open-source software (the other oss, for which I’ll use lower-case in this article) in their OSS/BSS stacks. The arguments were many and, if we’re being honest, probably had a strong element of truth in many cases back then.

These arguments included:

  • Security – This is the most commonly cited objection to open-source that I’ve heard. Our OSS/BSS control our network, so they absolutely have to be secure, across all aspects of the stack: from network / infrastructure to data (at rest and in motion) to account access to applications / code, etc. The argument against open-source is that the code is open for anyone to view, so vulnerabilities can be identified by hackers. Another argument is that community contributors could intentionally inject vulnerabilities that aren’t spotted by the rest of the community
  • Quality – There is a perception that open-source projects are more hobby projects than professional products. Related to that, hobbyists can’t expend enough effort to make the solution as feature-rich and/or user-friendly as commercial software
  • Flexibility – Large telcos tend to want to steer products towards their own unique needs via a lot of customisations. OSS/BSS transformation projects tend to be large enough that proprietary software vendors can be paid to make the requested changes. Choosing open-source implies accepting that the product (and its roadmap) is defined by its developer community unless you wish to develop your own updates
  • Support – Telcos run 24x7x365, so they often expect their OSS/BSS vendors to provide round-the-clock support as well. There’s a belief that open-source comes with a best-effort support model with no contracted service obligations, and that if something does go drastically wrong, open-source disclaims all responsibility and liability
  • Continuity – Telcos not only run 24x7x365, but also expect to maintain this cadence for decades to come. They need to know that they can rely on their OSS/BSS today but also expect a roadmap of updates into the future. They can’t become dependent upon a hobbyist or community that decides they don’t want to develop their open-source project anymore

Luckily, these perceptions around open-source have changed in telco circles in recent years. The success of open-source organisations like Red Hat (acquired by IBM for $34 billion on annual revenues of $3.4 billion) has shown that valuable business models can be underpinned by open-source. There are many examples of open-source OSS/BSS projects driving valuable business models and associated professionalism. The change in perception has possibly also been driven by shifts in application architectures, from monolithic OSS/BSS to more modular ones. Having smaller modules has opened the door to utilisation of building block solutions like the Apache projects.

So let’s look at the same five factors above again, but through the lens of the pros rather than the cons.

  • Security – There’s no doubt that security is always a challenge, whether for open-source or proprietary software, especially in an industry like OSS/BSS where all organisations are still investing more heavily in innovation (new features/capabilities) than in security optimisations. Clearly the openness of code means vulnerabilities are more easily spotted in open-source than in “walled-garden” proprietary solutions, not just by nefarious actors but by the development community as well. Linus’ Law suggests that “given enough eyeballs, all bugs (and security flaws) are shallow.” The question for open-source OSS/BSS is whether there actually are many eyeballs. All commercially successful open-source OSS/BSS vendors that I’m aware of have their own teams of professional developers who control any changes to the code base, even on the rare occasions when there are community contributions. However, many modern open-source OSS/BSS leverage other open-source modules that do have many eyes (eg Linux, SNMP libraries, Apache projects, etc)
  • Quality – There’s no doubt that many open-source OSS/BSS have matured and found valuable business models to sustain them. With profitable business models have come increased resources, professionalism and quality. With the increased modularity of modern architectures, open-source OSS/BSS projects are able to focus on very specific, niche functionality. Contrast this with the monolithic proprietary solutions that have needed to spread their resources more thinly across a much wider functional estate. Also, successful open-source OSS/BSS organisations tend to focus on product development and product-related services (eg support), whereas the largest OSS/BSS firms tend to derive a much larger percentage of revenues from value-added services (eg transformations, customisations, consultancy, managed services, etc). The latter are more services-oriented companies than product companies.
  • Flexibility – There has been a significant shift in telco mindsets in recent years, from an off-the-shelf to a build-your-own OSS/BSS stack. Telcos like AT&T have seen the achievements of the hyperscalers, observed the increased virtualisation of networks and realised they needed to have more in-house software development skills. Having in-house developers and access to the code-base of open-source means that telcos have (almost) complete control over their OSS/BSS destinies. They don’t need to wait for proprietary vendors to acknowledge, quote, develop and release new feature requests. They can just slip the required changes into their CI/CD pipeline and prioritise according to resource availability
  • Support – Remember when I mentioned above that OSS/BSS organisations have found ways to build profitable business models around open-source software? In most cases, their revenues are derived from annual support contracts. The quality and coverage of their support (and the products that back it up) is directly tied to their income stream, so there’s commensurate professionalism assigned to support
  • Continuity – This is perhaps the most interesting one for me. There is an assumption that big, commercial software vendors are more reliable than open-source vendors. This may or may not be the case. Plenty of commercial vendors have gone out of business, just as plenty of open-source projects have burned out or dwindled away. To counter the first risk, telcos pay to enter into software escrow agreements with proprietary vendors to ensure critical fixes and the roadmap can continue even if a vendor ceases to operate. But the escrow contract may not cover situations where a commercial vendor chooses to obsolete a line of software or simply fails to invest in new features or patches. Telcos are effectively paying an insurance fee to have access to the code for operational continuity purposes, but escrow may still not be as open as open-source, which is available under any scenario. But the more important continuity consideration is the data, and data is the reason OSS/BSS exist. When choosing a commercial provider, especially a cloud software / service provider, the data goes into a black box. What happens to the data inside the black box is proprietary, and often so is what comes out of it. Telcos will tend to have far more control of their data destinies for operational continuity if using open-source solutions

Now, I’m not advocating one or the other for your particular situation. As cited above, there are clearly pros and cons for each approach as well as different products of best-fit for different operators. However, open-source can no longer be as summarily dismissed as it was when I first started on my OSS/BSS journey. There are many fine OSS and BSS products and vendors in our Blue Book OSS/BSS Vendor Directory that are worthy of your consideration too when looking into your next product or transformation.

How to calculate the right jeopardy metrics in your end-to-end workflows

Last week we created an article that described how to use your OSS/BSS log data to generate reliable / quantifiable process flow diagrams.

We’ve expanded upon this research to identify a reliable calculation of jeopardy metrics. Jeopardy Management is the method for notifying operators when an in-flight workflow (eg customer order, etc) is likely to breach targets such as its RFS date (ie when the customer’s service will be available for use) and/or SLAs (service level agreements).

Jeopardy management techniques are intended to predict a breach before it has occurred. For example, if an Order to Activate workflow for a particular product type consists of 10 steps and only the first 2 steps have been completed after 29 days, against a 30-day RFS target, then we could expect that the RFS date is likely to be missed. The customer should be alerted. If the right trackers were built, this order should’ve had a jeopardy notification long before 29 days had elapsed.

In the past, jeopardy indicators have tended to be estimated thresholds. Operators have tended to set notifications based on gut-feel (eg step 2 must be completed by day 5). But through the use of log data, we can now provide a more specific jeopardy indicator for every step in the process.

The chart above shows every activity within a workflow across the horizontal axis. The vertical axis shows the number of days elapsed since the start of the workflow.

By looking at all past instances of this workflow, we can show the jeopardy indicator as a series of yellow dots. In other words, if any activity has ever been finished later than its corresponding yellow dot, then the E2E workflow it was part of has breached its SLA.
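As a minimal sketch of that definition (in Python, assuming each historical workflow has been reduced to the elapsed day on which each activity completed, plus its total end-to-end duration), the yellow-dot value for an activity is the latest completion time seen in any workflow that still met its SLA. Any later completion has, by construction, only ever occurred in breached workflows. The function name, field names and sample history are hypothetical, and treating "met SLA" as less than or equal to the SLA is an assumption.

```python
def jeopardy_indicators(cases, sla_days):
    """Return {activity: jeopardy indicator in elapsed days}.

    The indicator is the latest completion time (days from workflow start)
    seen for that activity in any workflow that still met its SLA. An
    activity that has only ever appeared in breached workflows maps to None.
    """
    latest_ok = {}
    seen = set()
    for case in cases:
        met_sla = case["e2e_days"] <= sla_days  # assumption: "met" means <=
        for activity, elapsed in case["activities"].items():
            seen.add(activity)
            if met_sla:
                latest_ok[activity] = max(elapsed, latest_ok.get(activity, elapsed))
    return {activity: latest_ok.get(activity) for activity in sorted(seen)}


# Hypothetical history: elapsed days at which each activity completed.
HISTORY = [
    {"activities": {"Validate": 1, "Design": 4, "Activate": 9}, "e2e_days": 10},
    {"activities": {"Validate": 2, "Design": 7, "Activate": 20}, "e2e_days": 22},
    {"activities": {"Validate": 1, "Design": 5, "Rework": 12, "Activate": 25}, "e2e_days": 27},
]

print(jeopardy_indicators(HISTORY, sla_days=14))
# {'Activate': 9, 'Design': 4, 'Rework': None, 'Validate': 1}
```

The more history you feed in, the more representative the indicators become. An activity that maps to None has never appeared in an SLA-compliant flow.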

To use a more familiar analogy, it’s the latest possible date that you can start studying for exams and still be able to pass the subject, calculated from time-stamped historical data. Not that I ever left it late to study for exams back in uni days!! 🙂

And yet if you look closely, you’ll notice that some blue dots (average elapsed time for this activity) in this example are higher than the jeopardy indicator. You’ll also notice that the orange dots (the longest elapsed time to complete this task across all instances of this workflow according to log data) are almost all above the jeopardy indicator. Those examples highlight significant RFS / SLA breaches in this data set (over 10% are in breach).
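The other dots can be derived from the same kind of history. This sketch (function names and data assumed for illustration) computes the grey (min), blue (mean) and orange (max) values per activity, plus the share of workflows in breach. Comparing each blue or orange value with the matching jeopardy indicator reproduces the kind of observations described above.

```python
from statistics import mean


def activity_spread(cases):
    """Grey (min), blue (mean) and orange (max) elapsed days per activity."""
    elapsed = {}
    for case in cases:
        for activity, days in case["activities"].items():
            elapsed.setdefault(activity, []).append(days)
    return {a: {"min": min(d), "mean": mean(d), "max": max(d)} for a, d in elapsed.items()}


def breach_rate(cases, sla_days):
    """Fraction of workflows whose end-to-end duration exceeded the SLA."""
    return sum(case["e2e_days"] > sla_days for case in cases) / len(cases)


# Hypothetical history: elapsed days at which each activity completed.
HISTORY = [
    {"activities": {"Validate": 1, "Design": 4, "Activate": 9}, "e2e_days": 10},
    {"activities": {"Validate": 2, "Design": 7, "Activate": 20}, "e2e_days": 22},
    {"activities": {"Validate": 1, "Design": 5, "Rework": 12, "Activate": 25}, "e2e_days": 27},
]

spread = activity_spread(HISTORY)
print(spread["Activate"])
print(round(breach_rate(HISTORY, sla_days=14), 2))  # 0.67
```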

Leave us a note below if you’d like us to assist with capturing your jeopardy indicators and identifying whether process interventions are required across your OSS/BSS.


How to Document, Benchmark and Optimise Operational Processes

Have you been tasked with:

  1. Capturing as-is process flows (eg swim-lane charts or BPMN [Business Process Model and Notation] diagrams)
  2. Starting a new project where understanding the current state is important
  3. Finding ways to optimise day-to-day activities performed by your team
  4. Creating a baseline process to identify automation opportunities
  5. Comparing your current processes with recommendations such as eTOM or ITIL
  6. Identifying which tasks are leading to SLA / OLA breaches

As you may’ve experienced during project kick-off phases, many customers’ as-is processes are not well defined, captured or adequately quantified (eg transaction volumes, duration times, fall-outs, etc).

If process diagrams have been captured, they’re often theoretical workflow maps developed by Business Analysts and Subject Matter Experts to the best of their knowledge. As such, they don’t always reflect real and/or complete flows. Their authors may be unaware of the rare flows / tasks / situations that can often trip up our OSS/BSS tools and operators. The rarer the sub-flow, the less likely it is to be documented.

Even if the flows have been fully documented, real metrics / benchmarks are rarely recorded. Metrics such as end-to-end completion times and times taken between each activity within the flow can be really challenging to capture and visualise, especially when you have large numbers of flows underway at any point in time.

Do you struggle to know where the real bottlenecks are in your process flows? Which tasks cause fall-outs? Which team members need advanced training? Which process steps have the largest differences in max / min / average durations? Which steps are justified to build automations for? As the old saying goes, if you can’t measure it, you can’t manage it.

You need a quantitative, not qualitative, understanding of your workflows

As a result, we’ve developed a technique to reverse-engineer log data to map and quantify processes, using logs that our OSS/BSS routinely collect and automatically time-stamp. By using time-stamped logs, we can trace every step, every flow variant, every sequence in the chain and every duration between them. This technique can be used on fulfilment, assurance and other flows. The sample below shows transaction volumes / sequences, but can also show durations within the flows:
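A minimal sketch of the reverse-engineering step, assuming the logs can be reduced to (case ID, activity, timestamp) rows: sort each case’s events by time, then count every flow variant and every direct hop from one activity to the next. This is a simple directly-follows count of the kind used in many process-mining tools, not necessarily the exact technique behind our diagrams; the event data and names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def discover_flows(events):
    """Rebuild flow variants and step-to-step transitions from time-stamped logs.

    events: iterable of (case_id, activity, ISO-8601 timestamp) tuples.
    Returns (transitions, variants) as Counters.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    transitions, variants = Counter(), Counter()
    for steps in by_case.values():
        steps.sort()  # order each case's events by timestamp
        path = ["START"] + [activity for _, activity in steps] + ["END"]
        variants[tuple(path[1:-1])] += 1
        transitions.update(zip(path, path[1:]))  # count each A -> B hop
    return transitions, variants


# Hypothetical log extract.
EVENTS = [
    ("ORD-1", "Receive order", "2024-03-01T09:00"),
    ("ORD-1", "Validate", "2024-03-01T10:30"),
    ("ORD-1", "Activate", "2024-03-03T16:00"),
    ("ORD-2", "Receive order", "2024-03-02T08:15"),
    ("ORD-2", "Validate", "2024-03-02T09:00"),
    ("ORD-2", "Rework", "2024-03-04T11:00"),
    ("ORD-2", "Activate", "2024-03-06T14:00"),
]

transitions, variants = discover_flows(EVENTS)
for (src, dst), count in transitions.most_common():
    print(f"{src} -> {dst}: {count}")
```

The transition counts map directly onto line thickness in a flow diagram, and the variant counts show how many distinct paths the "same" process actually takes.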

Note that this and subsequent diagrams have been intentionally left in low-res format here on this page.

Beyond just volumes, we can compare the max / mean / min processing times to identify the duration of activities and show bottlenecks (thicker red lines in the diagram below), as well as identifying hold-ups and inconsistencies in processing times:
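Extending the same assumed log format, the time between consecutive activities can be grouped by transition to give min / mean / max hours, with the slowest average transition as a candidate bottleneck. Again, this is a sketch with hypothetical data, not our production tooling.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def transition_durations(events):
    """Min / mean / max hours spent on each A -> B transition.

    events: iterable of (case_id, activity, ISO-8601 timestamp) tuples.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    hours = defaultdict(list)
    for steps in by_case.values():
        steps.sort()  # chronological order within each case
        for (t1, a1), (t2, a2) in zip(steps, steps[1:]):
            hours[(a1, a2)].append((t2 - t1).total_seconds() / 3600)

    return {
        edge: {"n": len(h), "min": min(h), "mean": mean(h), "max": max(h)}
        for edge, h in hours.items()
    }


# Hypothetical log extract.
EVENTS = [
    ("ORD-1", "Receive order", "2024-03-01T09:00"),
    ("ORD-1", "Validate", "2024-03-01T10:30"),
    ("ORD-1", "Activate", "2024-03-03T16:00"),
    ("ORD-2", "Receive order", "2024-03-02T08:15"),
    ("ORD-2", "Validate", "2024-03-02T09:00"),
    ("ORD-2", "Rework", "2024-03-04T11:00"),
    ("ORD-2", "Activate", "2024-03-06T14:00"),
]

stats = transition_durations(EVENTS)
bottleneck = max(stats, key=lambda edge: stats[edge]["mean"])
print("Slowest average transition:", bottleneck, stats[bottleneck])
```

A large gap between min and max on the same transition is often as telling as a high mean, since it points to inconsistent handling rather than uniformly slow work.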

By combining insights from flow volumes and timings, we can also recommend the processes and/or steps for which optimisation / automation is most justified.
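One simple way to combine volumes and timings (a heuristic assumed here for illustration, not a prescribed formula) is to rank steps by the total time they consume, ie volume multiplied by mean duration. The step names and figures below are hypothetical.

```python
def rank_for_automation(step_stats):
    """Rank steps by total hours consumed (volume x mean duration), highest first.

    Assumed heuristic: steps that are both frequent and slow are where
    automation is likely to recover the most time.
    """
    scored = {step: s["volume"] * s["mean_hours"] for step, s in step_stats.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-step figures derived from log data.
STEP_STATS = {
    "Validate order": {"volume": 1200, "mean_hours": 0.5},
    "Design circuit": {"volume": 300, "mean_hours": 6.0},
    "Manual rework": {"volume": 40, "mean_hours": 30.0},
}

for step, hours in rank_for_automation(STEP_STATS):
    print(f"{step}: {hours:.0f} hours")
```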

We can also monitor the flows to identify failure situations that have occurred within a given process, such as the examples highlighted in red below.

We can also use various visualisation techniques to identify changes / trends in processing over time. These techniques can even assist in identifying whether interventions (eg process improvements or automations) are having the intended impacts.
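For trends, a basic approach is to bucket workflows by the month they started and track the mean end-to-end duration, so a before / after comparison around an intervention becomes visible. The sketch assumes each case has been reduced to (case ID, start, end) timestamps; the function name and data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_mean_e2e_days(cases):
    """Mean end-to-end duration in days, grouped by the month each case started."""
    by_month = defaultdict(list)
    for case_id, start, end in cases:
        t0, t1 = datetime.fromisoformat(start), datetime.fromisoformat(end)
        by_month[t0.strftime("%Y-%m")].append((t1 - t0).total_seconds() / 86400)
    return {month: mean(days) for month, days in sorted(by_month.items())}


# Hypothetical cases: (case ID, start, end).
CASES = [
    ("ORD-1", "2024-01-05T09:00", "2024-01-25T09:00"),  # 20 days
    ("ORD-2", "2024-01-10T09:00", "2024-02-05T09:00"),  # 26 days
    ("ORD-3", "2024-02-03T09:00", "2024-02-15T09:00"),  # 12 days
    ("ORD-4", "2024-02-20T09:00", "2024-03-05T09:00"),  # 14 days (2024 is a leap year)
]

print(monthly_mean_e2e_days(CASES))  # {'2024-01': 23.0, '2024-02': 13.0}
```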

The following chart can be used to identify which tasks are consistently leading to SLA (Service Level Agreement) breaches.

The yellow dots indicate the maximum elapsed time (from the start of a given workflow) that has not resulted in an SLA breach. In other words, if this activity has ever been finished later than the yellow dot, then the E2E workflow it was part of has breached its SLA. These numbers can be used during a workflow to predict the likelihood that it will breach SLA. They can also be used for setting jeopardy values to notify operators of workflow slippages.
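To use the indicators while a workflow is in flight, a minimal sketch (hypothetical names, assuming every listed activity is expected in this workflow variant) flags any activity that finished later than its indicator, or that hasn’t finished yet even though the workflow clock has already passed its indicator. Activities with no indicator (never part of an SLA-compliant flow) are flagged as soon as they occur.

```python
def jeopardy_alerts(completed, elapsed_days, indicators):
    """Return the activities that put an in-flight workflow in jeopardy.

    completed:    {activity: elapsed day it finished} for finished steps
    elapsed_days: days since this workflow started
    indicators:   {activity: jeopardy indicator in days, or None}
    """
    flagged = []
    for activity, threshold in indicators.items():
        done_at = completed.get(activity)
        if threshold is None:
            # Never seen in an SLA-compliant flow: its presence is the warning.
            if done_at is not None:
                flagged.append(activity)
            continue
        if done_at is not None:
            late = done_at > threshold
        else:
            # Still open: it can now only finish after the indicator.
            late = elapsed_days > threshold
        if late:
            flagged.append(activity)
    return flagged


# Hypothetical indicators, eg produced from historical log data.
INDICATORS = {"Validate": 1, "Design": 4, "Activate": 9, "Rework": None}

print(jeopardy_alerts({"Validate": 1}, elapsed_days=5, indicators=INDICATORS))  # ['Design']
```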

There are a few other points of interest in this chart:

  • Orange dots indicate the longest elapsed time for this activity seen within all flows in the log data
  • Grey dots indicate the shortest elapsed time from the beginning of a workflow
  • Blue dots indicate the average elapsed time
  • Yellow dots are the Jeopardy indicator, meaning that if the elapsed time of this activity has ever exceeded this value then it has gone on to breach SLA
  • The red line is the SLA threshold for this particular workflow type
  • You’ll notice that many average values (blue dots) are above jeopardy, which indicates this activity is regularly appearing in flows that go on to breach SLA levels
  • Almost all max values are above jeopardy (most are so high that they’re off the top of the scale) so most activities have been part of an E2E flow that has breached SLA
  • The shaded blue box shows tasks that have never been in an E2E flow that has met SLA!!
  • Needless to say, there were some interventions required with this example!

Operational Process Summary

As described above, using log data that you probably already have ready access to in your OSS/BSS, we can assist you to develop quantifiable process flow information. Having quantifiable data in turn can lead to greater confidence in initiating process interventions, whether they are people (eg advanced training), process (eg business process re-engineering) or technology (eg automations).

This technique works equally well for:

  • Understanding the current situation before commencing an OSS/BSS transformation project
  • Benchmarking and refining processes on an OSS/BSS stack that is running in business-as-usual mode
  • Highlighting the impact a process intervention has had (ie comparing before and after)

Would you like to book a free consultation to discuss the challenges you face with your (or a customer’s) as-is process situation? Please leave your details and list of challenges in the contact form below.

019 – Modern OSS/BSS Transformation Techniques that start with the Customer Journey with Martin Pittard

Digital transformation is a term that’s entered the modern vernacular, but here in the world of OSS/BSS it’s just what we’ve been doing for decades. Whether aimed at delivering digital services, collecting data from all points of an organisation’s compass, increasing the internal efficiencies of operational teams or improving user experiences externally, this is just what our OSS/BSS tools and projects do.

Our guest on today’s episode, Martin Pittard, has been leading digital transformations since long before the term “digital transformation” existed. As Principal IT Architect at Vocus, Martin is in the midst of leading his most recent digital transformation (ie OSS/BSS transformation project). On this latest transformation, Martin is using a number of new techniques plus well-established architectural principles including the use of dynamic / Open APIs (a TM Forum initiative), being catalog-driven, standards-based, model-based and having an intense focus on separation of concerns. Of perhaps even greater focus is the drive to improve customer journeys as well as ensuring solution flexibility to support customer interactions across future business and service models.

It was a recent talk at a TM Forum event in Sydney that reinforced our interest in having Martin on as a guest. During this presentation, Martin shared some fantastic ideas on how Vocus is tackling the specific challenges and techniques of its OSS/BSS transformation. So good was it that we turned it into an article on our blog. A video of Martin’s in-depth presentation plus a summary of key points can be found here:

In addition to the Vocus transformation, Martin also shares stories and insights from past transformations at organisations like Rockwell (building combat systems for submarines), Fujitsu, the structural separation of British Telecom to form Openreach, Alcatel-Lucent (transforming the Telstra network and OSS/BSS) and then nbn. On the latter, Martin spent 8+ years leading the build of mission-critical systems across industry integrations (ie customer-facing systems) and network assurance for nbn. During that time, Martin led a large team through the transition to Agile delivery and recounts some of the challenges, benefits and insights from embarking on that journey.

For any further questions you may have, Martin can be found at:

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

How to improve user experience with a headless OSS

The first OSS/BSS I used, back in 2000, was built around an Oracle relational database and the GUI was built using Oracle Forms (as well as some C++ apps). The developers had implemented a concept they referred to as a boilerplate. It basically allowed the customers to modify any label they wished on the forms.

When I say labels, I mean any of the default text fields, such as the following in the diagram below:

  • Connecting Links
  • Connect Selected Endpoints
  • Disconnect Both Sides
  • Disconnect A Side
  • Disconnect B Side

By default, the forms in my first OSS were all written in English. But the boilerplate functionality offered the customer flexibility. Instead of “connecting links,” they may’ve preferred to call it “cross-connects” or “enlaces de conexión.” It was perfect for supporting different languages without changing the code. At the time, I thought this was a really neat feature. It was the first sign I’d seen of codeless flexibility in the UI of an OSS.
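The boilerplate idea is easy to picture as a lookup with a fallback: the application ships default labels, and a customer- or language-specific table overrides only the ones it cares about. Below is a minimal Python sketch of the concept. The original was built in Oracle Forms, so this is an illustration rather than how that product worked, and the keys and tables are hypothetical.

```python
# Labels shipped with the application (hypothetical keys).
DEFAULT_LABELS = {
    "connecting_links": "Connecting Links",
    "disconnect_a_side": "Disconnect A Side",
    "disconnect_b_side": "Disconnect B Side",
}

# Customer-maintained overrides, per language; no code change needed.
CUSTOMER_OVERRIDES = {
    "en": {"connecting_links": "Cross-Connects"},
    "es": {"connecting_links": "Enlaces de conexión"},
}


def label(key, overrides, locale):
    """Return the customer's override for this locale, else the built-in default."""
    return overrides.get(locale, {}).get(key, DEFAULT_LABELS[key])


print(label("connecting_links", CUSTOMER_OVERRIDES, "es"))   # Enlaces de conexión
print(label("disconnect_a_side", CUSTOMER_OVERRIDES, "es"))  # Disconnect A Side
```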

These days, most OSS/BSS need drastic improvements in their UI. As we described previously, they tend to be non-intuitive and take too long to master. We need to go further than boilerplate functionality. This is where headless or decoupled architectures appear to hold some promise. But we’ll loop back to the how in a moment. First, we’ll take a look at the why.

Apple has shown the tech world the importance of customer experience (CX) and elegant UI / industrial design. If we borrow the contrast between the iPod and the MP3 players that came before it, our OSS/BSS are still the anonymous, poorly made MP3 players. We have some catching up to do.

But first, let’s ask who the customer is that we’re designing a Customer Experience for. There are two distinct categories of customer that we have to design our OSS/BSS and workflows for, as shown in the sample Order to Cash (O2C) flow infographic below.

  • We have the first level of customers, Customer 2, the operators in the diagram below. These use our OSS/BSS directly but often behind the scenes.
  • Then there’s the second level of customers, Customer 1, the end users who also interact with our OSS/BSS, but often indirectly

The end users need to have a CX that appears highly integrated, smooth and seamless. It has to appear consistent, even though there are multiple channels that often aren’t linked (or even linkable – eg a customer might interact with an IVR or chatbot without revealing personal identifiers that can be linked with a customer ID in the OSS/BSS).

The end user follows the journey through the left-hand column of the infographic from start to finish. However, to deliver upon the flow on the right-side of the infographic, the CSP side, it’s likely that dozens of operators using many completely unrelated applications / UIs will perform disparate activities. They’ll perform a small subset of activities, but for many different end-users within a given day. It’s highly unlikely that there will be a single person (right side) mirroring the end-user journey (left side), so CSPs have to hope their workflows, data flows and operators all work in unison, without any fall-outs along the way.

Along the right-hand path, the operators tend to have a plethora of different back-end tools (as implied in the comments in the O2C flow above). They might be integrated (via API), but the tools often come from different vendors, so there is no consistency in UI.

The “headless” approach (see article from for further info), allows the user interface to be decoupled from the application logic and data (as per the diagram below). If all the OSS/BSS tools along the right-hand path of the O2C infographic were headless, it might allow an integrator to design a smooth, consistent and robust end-to-end customer experience across many different vendor applications.
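A minimal sketch of the headless pattern, with hypothetical class and field names: each vendor application sits behind the same logic-and-data contract, and a single UI “skin” renders against that contract, so swapping one vendor for another doesn’t change what the operator sees. Real adapters would call each vendor’s API; here they return canned data.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class OrderView:
    order_id: str
    status: str  # normalised status vocabulary shared by all back-ends


class OrderBackend(Protocol):
    """Headless contract: application logic and data only, no presentation."""

    def get_order(self, order_id: str) -> OrderView: ...


class VendorABackend:
    def get_order(self, order_id: str) -> OrderView:
        # A real adapter would call vendor A's API here.
        return OrderView(order_id, "IN_PROGRESS")


class VendorBBackend:
    def get_order(self, order_id: str) -> OrderView:
        # Vendor B uses its own status codes; the adapter normalises them.
        raw = {"state": "wip"}
        status = {"wip": "IN_PROGRESS", "done": "COMPLETE"}[raw["state"]]
        return OrderView(order_id, status)


def render_order_card(backend: OrderBackend, order_id: str) -> str:
    """One consistent front-end, designed by UX specialists, for any back-end."""
    order = backend.get_order(order_id)
    return f"Order {order.order_id}: {order.status.replace('_', ' ').title()}"


for backend in (VendorABackend(), VendorBBackend()):
    print(render_order_card(backend, "ORD-1"))  # Order ORD-1: In Progress
```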

A few other thoughts too:

  • Engineers / Developers can create the application logic and data manipulation. They only need to solve this problem once and then move onto solving for the next use-case. It can then be handed to UX / UI experts to “skin” the products, potentially trialing many different UI variants until they find an optimal solution. UX experts tell me trial and error is the real science behind great user interface design. This is also reflected in great industrial design by organisations like Apple that create hundreds of prototypes before going to market. Our engineers / devs are too precious a resource to have them trialing many different UI variants
  • Most OSS/BSS have a lot of functionality baked into the product and UI. Unfortunately, most operators won’t need to use a lot of the functionality. Therefore it’s just cluttering up the UI and their daily activities. The better approach is to only show the functionality that the operators actually need
  • This “just show the operators what they need” concept can also allow for separation by skill. For example, a lot of workflows (eg an O2C or T2R) will have high-volume, highly repeatable, fast turnover variants that can be easily taught to new operators. However, the same O2C workflow might have less common or obscure variants, such as fall-outs that require handling by exception by more highly experienced / skilled / trained operators. You might choose to have a fast-lane (high-volume) UX versus a high-control UX for these scenarios 
  • A consistent UI could be underpinned by any number of applications. Theoretically, one OSS/BSS app could be swapped out for another without any change in workflow for the end user or even the CSP’s operator
  • Consistent UIs that are matched to context-sensitive workflows could also streamline the automation and/or RPA of each
  • Consistent UIs allow for easier benchmarking for speed and effectiveness, which in turn could help with feedback loops, such as autonomous network concepts
  • UI / UX experts can design style guides that ensure consistency across all applications in the OSS/BSS stack
  • With the iPod, Apple arrived at the insight that it made more sense to take much of the logic “off-device”
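The “just show the operators what they need” idea can be expressed as persona-specific view profiles over the same headless actions. The action names and the two profiles below (fast-lane and high-control) are hypothetical.

```python
# Every action the headless back-end exposes (hypothetical names).
ALL_ACTIONS = [
    "view_order",
    "retry_activation",
    "edit_design",
    "override_sla",
    "cancel_order",
    "manual_assign_port",
]

# Persona definitions maintained by UX / operations, not by developers.
VIEW_PROFILES = {
    "fast_lane": {"view_order", "retry_activation", "cancel_order"},
    "high_control": set(ALL_ACTIONS),
}


def visible_actions(profile):
    """Return only the actions this persona needs, in a stable display order."""
    allowed = VIEW_PROFILES[profile]
    return [action for action in ALL_ACTIONS if action in allowed]


print(visible_actions("fast_lane"))  # ['view_order', 'retry_activation', 'cancel_order']
```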

Hat tip to George for the headless UI concept as a means of improving UX!

PS. I should also point out that another parallel universe I’m diving into is Augmented Reality based OSS/BSS and it provides a slightly different context when mentioning headless UX. In AR, headless implies having no screen to interact with, so the UX comes from audio and visual presentation of data that diverges significantly from the OSS/BSS of today. Regardless, the separation of logic / data from the front-end that we described earlier also seems like a step towards future AR/VR-based OSS/BSS.

018 – How a NaaS Transformation can Revolutionise your OSS/BSS Stack with Johanne Mayer

OSS/BSS stacks can be incredibly complex and cumbersome beasts, especially in large carriers with many different product, process and network variants. We don’t make that any easier by creating many unique product offerings to take to market. Time to market can be a significant competitive advantage, or a serious impediment. NaaS, or Network as a Service, is a novel approach to increasing flexibility in our OSS/BSS stacks, inserting an API layer that provides a separation of concerns.

Our guest on today’s episode, Johanne Mayer, is so passionate about the benefits of NaaS-based transformations that she’s formed a company named NaaS Compass to assist others with their transformations. She brings hard-won experience from being involved with NaaS transformations at organisations like Telstra.

Prior to embarking on this latest venture, Johanne worked with many of the most iconic organisations in the telco / OSS/BSS industries. These include Nortel, Alcatel-Lucent (now part of Nokia), Ericsson, Oracle, Ciena Blue Planet and Analysis Mason. Johanne takes us on a journey through a career that has seen her work on exciting projects from the days of NMS and X.25 networks to more recent projects leading collaborative transformation with standards organisations like TM Forum (where she is a Distinguished Fellow), MEF and ETSI.

For any further questions you may have, Johanne can be found at:

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

Just Launched: Do you Want to Learn more about OSS/BSS?

The world of OSS/BSS is changing rapidly. There’s so much to learn. New forms of information, approaches and technologies are proliferating. Are you wrestling with the challenge of trying to determine what OSS/BSS training you and/or your team requires? We’ve just revised and re-launched our core OSS/BSS training offerings.

Click on the image below to open the OSS/BSS Training Plan, which includes details about each course we offer.

The courses described in the OSS/BSS Training Plan include:

  1. An Introduction to OSS/BSS (PAOSS-INT-01)
  2. Strategic Analysis of Your OSS/BSS (PAOSS-INT-02)
  3. Creating an OSS/BSS Transformation Plan (PAOSS-PRE-01)
  4. OSS/BSS Persona and Workflow Mapping (PAOSS-PRE-02)
  5. Gathering Your Specific OSS/BSS Requirements (PAOSS-PRE-03)
  6. Choosing the Right OSS/BSS Products for Your Needs (PAOSS-PRE-04)
  7. Preparing Your OSS/BSS Business Case (PAOSS-PRE-05)
  8. OSS/BSS Project Planning (PAOSS-PRE-06)
  9. Developing your OSS/BSS Roadmap (PAOSS-PRE-07)
  10. Identifying and Mitigating Your Biggest OSS/BSS Risks (PAOSS-EE-01)
  11. Developing a Data Integration Plan (PAOSS-EE-02)
  12. Integrating with other Systems (PAOSS-EE-03)
  13. Defining OSS Naming Conventions (PAOSS-EE-04)

The attached Training Plan also describes how we can assist you to develop long-term training and mentoring programs to supplement the day-to-day tasks performed by your trainees in their OSS/BSS-related roles.

Are there other subjects you’re interested in that aren’t outlined above? Are you interested in making a group booking? We’d be delighted to work with you to develop customised training for your organisation’s specific needs. Feel free to leave us a note via the contact form below.

017 – Leading Global OSS/BSS Transformation through Collaboration with George Glass

Our OSS and BSS are highly complex by nature. However, we seem to do a great job of making them more complex, more challenging, less repeatable and hence, more difficult to change. Perhaps that caters to our deeper desires – so many of us in this industry love to prove our worth by solving complex problems.

Our guest on this episode, George Glass, has spent a career looking for ways to remove complexity and increase re-use in our OSS/BSS stacks. First during 31 years (to the day) working as a developer, architect and executive at BT. Now at TM Forum, where he’s CTO and continuing to carry the flame of next generation architectural concepts like ODA and the Open API initiative that started when George was still at BT.

George walks us through a career that started with cutting code on BT’s NMS solutions and the charging systems that allowed BT to drive (significant) revenue. He talks us through very early separation of charging, taking it away from mainframes and onto Unix server farms. He also discusses how he was instrumental in the development of BT’s SOA (Service Oriented Architecture) circa 2008-9, which generated over £300M in cost-benefit for BT and remains in use (in a modified form) to this day. He also discusses how BT’s structural separation to form Openreach had architectural ramifications and learnings that propagated to other carrier environments around the world.

George then goes on to talk about the origins of TM Forum’s modern flagship architectural models and how they’re assisting their members with digital transformations globally. Not just telcos and their supporting vendors / partners / integrators, but also across other industries (including George’s favourite, the automotive industry).

For any further questions you may have, George can be found at:

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

How Network Operations Centre (NOC) Efficiency is Powered by Your OSS/BSS

We just launched a new video series describing the Fundamentals of OSS/BSS yesterday. One of the videos in the series describes Network Operations Centres (or NOCs) for telcos or network operators. It also provides examples of the OSS/BSS tools and data sets that help to power them. 

Whilst creating this video, it dawned on me that we’ve done over 2,500 posts, but none specifically about Network Operations Centres (aka Network Management Centres). A significant oversight that must be addressed!! NOCs are the telco’s nerve centre through which the network is monitored and maintained. They are the ultimate insurance policy for any carrier. They also act as the first line of defence against cyber-security attacks.

The video above provides a picture of Telstra’s GOC (or Global Operations Centre, which is just a glorified name for a NOC). The one below shows AT&T’s NOC (image courtesy of …). [More about AT&T’s GNOC in a video later in this article]

In the middle band of the picture above, you’ll notice an impressive video wall with data presented from a number of OSS/BSS. These tend to show rolled-up information that gives a perspective on the current topology, traffic patterns and health of the network. There’s not a lot of red showing, so I’d assume the network was in a fairly healthy state at the time this photo was taken… Either that, or the “green screener” was activated to ensure that any visitors or VIPs weren’t scared off by the number of catastrophes that were being handled / remediated by the network operators. 🙂

Speaking of operators, you’ll notice all the operator pods in the foreground. You can see the workstations, where each operator is enveloped by multiple screens. Like the video wall, each of these operator screens will typically have multiple OSS/BSS applications open at any given point in time. Generally speaking, the operators will be dealing with more granular data-sets via OSS/BSS views on their workstations compared with those shown on the video wall. This is because they’ll be performing more specific tasks such as dealing with a specific device outage and will need to drill down to more detailed data.

Each operator has this wealth of visual real-estate for a reason. Our OSS/BSS gather, generate and process huge amounts of data with updated information arriving all the time. Operators need to pick through all these different data points and derive insights that allow them to perform BAU (Business as Usual) activities. These activities generally focus on assuring the health of the network and the customer services that are carried over that network, but can cover a broader scope too. When things get out of control, they become crisis management centres.

Information can be presented to these operators in a range of different ways depending on the task at hand, from trying to identify the root-cause of an event / situation through to coordinating routine / preventative maintenance or remedial actions. Note that the second part of the video above gives some examples of OSS/BSS data visualisation techniques to perform these different functions.

Speaking of routine maintenance, many types of routine maintenance activities are coordinated through the NOC. This ensures the many changes happening during change windows are managed in a coordinated way. This includes maintenance of our OSS/BSS tools, which can require regular updates and patching as well as routine administrative activities. 

Our OSS and BSS assist NOC operators, mostly across the middle bands of TM Forum’s TAM (represented by blue clusters 7 to 11, plus 12 to 14, in the TAM diagram below), but potentially many others. Operators may also interact closely with the live network devices to retrieve or update device configurations. This might be achieved via command line interfaces (CLI) on devices or via EMS (Element Management Systems) and NMS (Network Management System) tools, which supplement our OSS/BSS.

Due to the complexity, variability and sheer volume of events that the NOC has to handle (and therefore the costs), our OSS/BSS can become an important efficiency engine. OSS/BSS business cases can often be built around the automation of processes, data processing and IT transactions because of the cost-benefit possibilities. The related benefits tend to be driven by reductions in human effort, but also by faster fault resolution times.
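The human-effort side of such a business case often boils down to simple arithmetic. Here’s a minimal Python sketch of that calculation. It deliberately doesn’t try to value faster fault resolution, and the class name, fields and example figures are all hypothetical, chosen purely for illustration rather than taken from any operator.

```python
from dataclasses import dataclass


@dataclass
class AutomationCase:
    """Hypothetical inputs for an OSS/BSS automation business case."""

    events_per_year: int            # NOC events the automation could touch
    automation_rate: float          # share of those events handled without a human (0..1)
    minutes_saved_per_event: float  # operator time avoided per automated event
    loaded_cost_per_hour: float     # fully loaded operator cost per hour

    def annual_effort_saving(self) -> float:
        """Labour cost avoided per year by automating a share of events."""
        if not 0.0 <= self.automation_rate <= 1.0:
            raise ValueError("automation_rate must be between 0 and 1")
        automated_events = self.events_per_year * self.automation_rate
        hours_saved = automated_events * self.minutes_saved_per_event / 60
        return hours_saved * self.loaded_cost_per_hour

    def payback_years(self, implementation_cost: float) -> float:
        """Simple (undiscounted) payback period for the automation project."""
        saving = self.annual_effort_saving()
        if saving <= 0:
            raise ValueError("no saving, so the project never pays back")
        return implementation_cost / saving


if __name__ == "__main__":
    case = AutomationCase(
        events_per_year=120_000,
        automation_rate=0.5,
        minutes_saved_per_event=15,
        loaded_cost_per_hour=80,
    )
    print(case.annual_effort_saving())    # 1200000.0
    print(case.payback_years(1_800_000))  # 1.5
```

Undiscounted payback is intentionally simplistic. A real business case would typically add discounting, ramp-up time and some estimate of the value of faster fault resolution.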

The video below, courtesy of AT&T, shows some of their OSS/BSS in action on the giant video wall within their NOC:

If you’re wondering about the role of NOC operators and shift managers, a great article describes a day (night) in the life of Paul Harrison, Telstra’s National Emergency Response Manager, who works from the Telstra GOC. As the article highlights, most NOCs are operational 24x7x365, which requires multiple shifts (eg 3 x 8-hour shifts) to ensure 24-hour coverage. That is, they need at least 3 teams of operators to make sure the network and services are being monitored around the clock.
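The “at least 3 teams” figure comes from dividing a 24-hour day by the shift length. The minimal Python sketch below makes that arithmetic explicit. It assumes equal-length, non-overlapping shifts and ignores handovers, leave and training. The weekly figure uses a hypothetical cap on hours per team, and it shows why real rosters usually need more than the daily minimum.

```python
import math

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 24 * 7  # 168 hours of 24x7 coverage


def min_teams_per_day(shift_hours: float) -> int:
    """Teams needed so back-to-back shifts cover one 24-hour day.

    Assumes equal-length, non-overlapping shifts with no handover overlap.
    """
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    return math.ceil(HOURS_PER_DAY / shift_hours)


def min_teams_per_week(hours_per_team_per_week: float) -> int:
    """Teams needed to staff one operator seat 24x7 for a whole week.

    Assumes each team works at most `hours_per_team_per_week` hours
    (a hypothetical cap) and ignores leave, training and sickness.
    """
    if hours_per_team_per_week <= 0:
        raise ValueError("hours_per_team_per_week must be positive")
    return math.ceil(HOURS_PER_WEEK / hours_per_team_per_week)


if __name__ == "__main__":
    print(min_teams_per_day(8))    # 3 teams for 3 x 8-hour shifts
    print(min_teams_per_day(12))   # 2 teams for 2 x 12-hour shifts
    print(min_teams_per_week(40))  # 5 teams if each works a 40-hour week
```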

Depending on the functional coverage required at any given organisation, NOCs might also be considered SOCs (Service Operations Centres), SOCs (Security Operations Centres) or go by other names too. FWIW, in this earlier article, I pose the slightly novel concept of a DOC as well.

Also, if you’re interested, you might like to check out this video of an XR simulation of a NOC that was inspired by AT&T’s control centre. More about this project here. I’d love to get your thoughts on use-cases relating to how this could be applied. Leave us a comment below.


And speaking about mixed reality, I’m excited about how these technologies can be used to improve the command and control functionality of NOCs. Whether that’s in the form of:

  • Visual Collaboration – allowing operators in various locations to “see” the same thing (eg a person in the NOC, a worker in the field and an equipment vendor SME all viewing what the field worker can see on-site and discussing the best way to fix the on-site problem)
  • Decision Support – providing operators, especially field workers, with information provided by the NOC and/or our OSS/BSS that helps the operator perform their tasks effectively
  • Optimised UX – providing operators with OSS/BSS user interfaces (UIs) that are more intuitive and efficient to perform tasks with. Due to the massive amounts of information at NOC operator fingertips from which they have to derive actions, it seems that we need to provide far better UIs for them. Heads-up Displays (HUDs) seem like the natural progression for NOC UIs, so it’s something we’re already investing effort into. Watch this space closely in coming years

Just Launched: Fundamentals of OSS/BSS Video

We’ve just launched a multi-part video series to provide an introduction to OSS and BSS. It answers these fundamental questions and more:

  • Part 1 – What is an OSS? What is a BSS? Why are OSS and BSS even a thing?
  • Part 2 – Who uses an OSS and/or BSS?
  • Part 3 – What Functions do OSS and BSS Perform?
  • Part 4 – What Business Benefits do OSS/BSS Generate?
  • Part 5 – What’s the difference between an OSS and BSS?
  • Part 6 – How do OSS & BSS interact with a Comms Network?
  • Part 7 – What does an OSS/BSS look like?
  • Part 8 – Where can I Find Out More About OSS and BSS?

Check it out. And give us a Like if you think it’s useful.

016 – Leading the Network Strategy and Operations at a Tier 1 Carrier with Carolyn Phiddian

If the network is ultimately the product for any network operator, then OSS/BSS are the great connectors, connecting customers to that product. Both for initial activation, but also ongoing utilisation of network resources. Whilst everyone has a different perspective on the relevance / importance of OSS/BSS, there tend to be even broader divergences of opinion across networks, operations and executive teams.

Our guest on this episode, Carolyn Phiddian, has formed a career around network strategy and is now a gun-for-hire broadband industry strategist. Networks are her main specialty, but she’s also held executive roles, including leading a team of nearly 500 people in network operations for a Tier 1 carrier. Carolyn has held roles with iconic telco organisations such as Telstra, British Telecom, Cable & Wireless, Alcatel-Lucent and more recently with nbn. Having experienced roles across network, operations and the C-suite within these organisations gives Carolyn a somewhat unique perspective on OSS/BSS.

Carolyn walks us through her career highlights to date, sharing some really important insights and experiences along the way. These include:

  • When Architects / Engineers are detail-oriented rather than big-picture people, that detail tends to become hard-wired into their OSS/BSS, which has both benefits and ramifications
  • One of the biggest challenges for executives and sponsors of OSS/BSS projects is the mismatch between what people want and what they get, often because stakeholders don’t know what they want or can’t adequately articulate it
  • Diversity of thought combined with inquisitiveness is a powerful trait for OSS/BSS implementation and operations teams

For any further questions you may have, Carolyn can be found at:

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

How to Transform your OSS/BSS with Open APIs

The video below, starring Martin Pittard, the Principal IT Architect at Vocus Group, provides a number of important OSS/BSS Transformation call-outs that we’ll dive into in this article.

Vocus has embarked on a journey to significantly overhaul its OSS/BSS stack and has heavily leveraged TM Forum’s Open API suite to do so. One of the primary objectives of the overhaul was to provide Vocus with business agility via its IT and digital assets.

As Martin indicates, all transformation journeys start with the customer, but there are a few other really interesting call-outs to make in relation to this slide below:

  • The journey started with 6 legacy networks and 8 legacy BSS stacks. This is a common situation facing many telcos. Legacy / clutter has developed over the years. It results in what I refer to as The Chessboard Analogy, which precludes business agility. Without a significant transformation, this clutter constrains you to incremental modifications, which leads to a Strangulation of Feature Releases.
  • Over 60,000 SKUs (Stock-keeping Units or distinct product offerings) and 100+ order characteristics is also sadly not unusual. The complexity ramifications of having this many SKUs are significant. I refer to it as The Colour Palette Analogy. The complexity is not just in IT systems, but also for customers and internal teams (eg sales, contact centre, operations, etc)
  • No end-to-end view of inventory. AKA Information of all sorts, not just inventory, spread across many siloes. The ramifications of this are also many, but ultimately it impacts speed of insight (and possibly related SLA implications). Whether that’s via swivel-chairing between products, data scientists having to determine ways to join disparate (and possibly incompatible) data sets, data quality / consistency implications, manual workarounds, etc
  • Manual Sales Processes and “Price on Availability” also tend to imply a lack of repeatability / consistency across sales (and other) processes. More repeatability means more precision, more room for continual improvement and more opportunity to automate and apply algorithmic actions (eg AI/ML-led self-healing). It’s the Mona Lisa of OSS. Or to quote Karl Popper, “Non-reproducible single occurrences are of no significance to science”… or OSS/BSS.

The Vocus transformation was built upon the following Architectural Principles:

Interestingly, Martin describes how Vocus has blended TM Forum SID and MEF data models in preparing its common data model, taking a best-fit approach to the standards.

Now, this is the diagram that I most want to bring to your attention.

It shows how Vocus has leveraged TM Forum’s Open APIs across OSS/BSS functionalities / workflows in its stack. It shows actual Open API numbers (eg TMF620). You can find a table of all available Open APIs here, including specifications, Swagger and Postman resources.

A key to the business agility objective is making product offerings catalog-driven. Vocus’ broad list of product offerings is described as:

And underpinning the offerings is a hierarchical catalog of building blocks:

Martin also raises the importance of tying operational processes / data-flows like telemetry, alarms, fault-fix, etc to the CFS (Customer Facing Service) entities, referring to it as, “Having a small set of tightly bounded reusable components.”
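To illustrate why a hierarchical, catalog-driven model helps here, the following minimal Python sketch links offerings to reusable CFS and RFS (Resource Facing Service) building blocks and then walks the hierarchy to find everything an alarm on one block would affect. The offering-to-CFS-to-RFS layering is loosely borrowed from TM Forum SID terminology. The class, function and catalog entry names are hypothetical and are not Vocus’ actual catalog or implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """A node in a hierarchical catalog: an offering, a CFS or an RFS."""

    name: str
    kind: str  # "offering", "cfs" (customer facing) or "rfs" (resource facing)
    children: list[CatalogItem] = field(default_factory=list)

    def add(self, *items: CatalogItem) -> CatalogItem:
        """Attach reusable building blocks beneath this node."""
        self.children.extend(items)
        return self


def impacted_by(offerings: list[CatalogItem], component: str) -> set[str]:
    """Return every catalog entry that (transitively) depends on `component`.

    This is how an alarm raised against one building block can be traced up
    to the services and offerings it affects.
    """
    impacted: set[str] = set()

    def walk(node: CatalogItem, ancestors: list[CatalogItem]) -> None:
        if node.name == component:
            impacted.update(a.name for a in ancestors)
        for child in node.children:
            walk(child, ancestors + [node])

    for offering in offerings:
        walk(offering, [])
    return impacted


if __name__ == "__main__":
    # Hypothetical building blocks, with "Fibre access" shared by two services.
    fibre_access = CatalogItem("Fibre access", "rfs")
    l2_vlan = CatalogItem("L2 VLAN", "rfs")
    internet = CatalogItem("Internet CFS", "cfs").add(fibre_access, l2_vlan)
    ethernet = CatalogItem("Ethernet CFS", "cfs").add(fibre_access)

    catalog = [
        CatalogItem("Business Internet", "offering").add(internet),
        CatalogItem("Business Ethernet", "offering").add(ethernet),
    ]
    print(sorted(impacted_by(catalog, "Fibre access")))
    # ['Business Ethernet', 'Business Internet', 'Ethernet CFS', 'Internet CFS']
```

Because building blocks are reused rather than copied per offering, a change or fault on one small, tightly bounded component is traceable to every offering that depends on it.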

Martin then provides a helpful example to show flows between actual suppliers within their transformed stack, including the Open API reference number. In this case, he shows the CPQ (Configure, Price, Quote) process as part of the L2Q (Lead to Quote) process:

Note that the continuation of decomposition of actions is also described in the video, but not shown in this article.

And finally, Martin outlines the key learnings during the Vocus OSS/BSS transformation:

Final call-outs from Martin include:

  • Following standards as much as possible means less unique tech-debt to maintain
  • The dynamic architecture was the key to this transformation
  • The Open APIs have varying levels of maturity and vendor support. These Open APIs are definitely a work in progress, with evolving contributions from TM Forum members
  • Partnering with organisations that have experience with these technologies is important, but even more important is retaining design and architecture accountability within Vocus and similar carriers
  • Careful consideration was needed of whether data should be retained or orphaned when migrating to the new EIM (Enterprise Information Model)
  • There remain swivel-chair processes on some of the smaller-volume flows that didn’t warrant an API
  • Federated customer, network and services inventory information will continue to inter-work with legacy networks

Martin has also been kind enough to be a guest on The Passionate About OSS Podcast (Episode 019), where he provides additional layers of discussion to this article.

And a final call-out from me. Martin and Vocus have been really detailed and generous in sharing information about their recent OSS/BSS Transformation. It’s well worth taking a closer look and listen for anyone embarking on an OSS/BSS transformation.

015 – Using Modern OSS/BSS Architectures to get Offerings to Market Fast with Greg Tilton

When it comes to OSS/BSS implementations (and products), Time to Market (TTM) is one of our most important metrics. Not just for the network operator to deliver new offerings to market, but also in getting solutions up and running quickly. Faster TTM provides the benefits of cost reduction and faster turn-on of revenue, but also potentially allows the operator to beat competitors to the acquisition of new customers.

Our guest on this episode, Greg Tilton, has spent many years building OSS and BSS with this key metric in mind. Initially with carriers / ISPs such as Telstra, Request, AAPT and nbn, but more recently with DGIT Systems. Greg is a founder and CEO of DGIT, a company that has been creating BSS products since 2011 across order management, CPQ (Configure, Price, Quote), product catalog and billing (the latter through its acquisition of Inomial in 2018).

Greg provides us with a range of helpful hints for improving TTM across a number of facets. These include project implementations, architecture and product design, as well as the importance of standardisation. On the latter, Greg and DGIT have long held a mutually beneficial relationship with TM Forum, being both a consumer of and contributor to many of the standards that are widely used by the OSS/BSS / telco industries (and beyond).

For any further questions you may have, Greg can be found at: and via

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

014 – The challenges and pitfalls awaiting OSS implementation teams with Michael De Boer

There are three distinct categories of organisations that interact with OSS/BSS – those who create them, those who use them and those who implement them. But no matter how good the first two are (ie the products / creators and the users), if the implementation isn’t done well, then the OSS/BSS is almost pre-destined to fail. There are many, many challenges and pitfalls that await implementation teams. There’s a reason why the Passionate About OSS logo is an octopus (the OctopOSS). Just when you think you have all the implementation tentacles under control, another comes and whacks you.

Our guest on this episode, Michael De Boer, has spent many years wrangling the OctopOSS. He’s had implementation roles on the buyer / user side with companies like NextGen, but also on the implementer / integrator side with Pitney Bowes. He’s also had ultimate accountability for OSS/BSS delivery as Managing Director of Dynamic Design Australia, where he had to quote and sell but also get hands-on with implementations. Now as Director of GQI, Michael leads integrations and consultancies that extend beyond OSS/BSS and into other areas of ICT.

Michael describes some of his important learnings on how to ensure your OSS/BSS implementation runs smoothly. He pays particular attention to the people management and data management aspects of any implementation, noting that these are vital components of any build.

For any further questions you may have, Michael can be found at: and via

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

013 – Using a Commercial and Open Source approach to Tackle Network Assurance with Keith Sinclair

Have you noticed the rise in trust, but also the rise in sophistication in Open Source OSS/BSS in recent years? There are many open-source OSS/BSS tools out there. Some have been built as side-projects by communities that have day jobs, whilst others have many employed developers / contributors. Generally speaking, the latter are able to employ developers because they have a reliable revenue stream to support the wages.

Our guest on this episode, Keith Sinclair, has made the leap from side-project to thriving OSS/BSS vendor whilst retaining an open-source model. His product, NMIS, has been around since the 1990s, building on the legendary work of other open-source developers like Tobias Oetiker. NMIS has since become one of the flagship products for his company, Opmantek. Keith and the team have succeeded in creating a commercial construct around their open-source roots, offering product support and value-add products.

Keith retraces those steps, from the initial discussion that triggered the creation of NMIS, its evolution whilst he simultaneously worked at organisations like Cisco, Macquarie Bank and Anixter, through to the IP buy-out and formation of Opmantek, where he’s been CTO for over 10 years. He also describes some of the core beliefs that have guided this journey, from open-source itself, to the importance of automation, scalability and refactoring. The whole conversation is underpinned by a clear passion for helping SysAdmins and Network Admins tackle network assurance challenges at service providers and enterprises alike. Having done these roles himself, he has a powerful empathy for what these people face each day and how tools can help improve their consistency and effectiveness.

For any further questions you may have, Keith can be found at:

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

Which approach is better for your OSS? Hedgehog or fox?

Jim Collins’ book, “Good to Great,” has achieved iconic status in the world of corporate strategy. One of the ideas he shares in this book is The Hedgehog Concept.

I’d encourage you to take a look at the link above. It provides excerpts from the book, outlining why the “great” companies in his study acted like hedgehogs, whereas his “comparison” companies (ie the lesser competitors) acted more like foxes.

This video of Jim provides context around hedgehogs and foxes for his comparison. The following is a transcript…

There’s this wonderful essay called “The Hedgehog and the Fox” that was written by Isaiah Berlin, a philosopher-thinker; and he basically said there are two types of thinkers. There are hedgehogs and there are foxes. Now, the hedgehog and the fox are different in the following way. The foxes, they love complexity. They love all the moving parts. They love basically showing how smart they are by making things so complex that other people can’t understand them. Hedgehogs, on the other hand, are a different breed. Hedgehogs tend to take the approach of saying, “You know, I know the world is complex, but we can’t function if we don’t simplify it.” What hedgehogs tend to do is get one big idea and focus on that, simplifying a complex world down to a fundamental, simple idea that is essentially right.

Extending this analogy, how does this relate to your OSS and/or BSS?

Is it the fox – Lots of complexity. Lots of moving parts. Really difficult for people to understand intuitively?

Or is it the hedgehog – A single big idea. Singular focus on doing that. Doing it simply. Plugged into a world of complexity around it?

Looking at OSS/BSS (and the people that design and implement them), do you think we as an industry have a tendency towards being foxes? Are we, and our products, actually trying too hard to prove that we’re smart?

How about the users? What do you think OSS/BSS users want? Hedgehog or Fox? I’m leaning towards the answer being the hedgehog. If Jim Collins’s “great” companies were hedgehogs, then maybe this could also be true of OSS/BSS companies? When evaluating OSS tools (and related sales-pitches of them), I often go back to the phrase, “The confused mind says no!”

To re-use the link from above, perhaps the Venn diagram in The Hedgehog Concept can provide guidance on what to focus on more singularly… assuming you agree that you are too foxy!


Just an aside. Could it be that the fox is the single vendor and the best-of-breed approach is the hedgehog? Possibly, although best-of-breed can still be unfathomably complex too.

012 – Building an OSS/BSS from Scratch for a Mid-Market Telco with Steven White

While it’s the tier-1 telco OSS/BSS that get all the attention, it’s actually the mid-market that makes up the largest number of OSS/BSS by customer count (in most deregulated telco markets). The mid-market consists of Tier 2/3 telcos and ISPs with subscriber counts measured in the thousands rather than hundreds of thousands or millions. However, the OSS/BSS of this mid-market still has to cover the same broad estate of functionality as the tier-1s.

The dilemma for most of the mid-market is whether to build their own, often highly specific OSS/BSS solutions, or use off-the-shelf (COTS) solutions. Each approach comes with its own pros and cons.

Our guest on this episode, Steven White, has lived with this dilemma and navigated through it whilst working for Swoop (formerly known as Cirrus Communications). Along with his CTO, Steven and Cirrus / Swoop took the portal / OSS / BSS (named ATMOS) from concept to home-grown implementation and then ongoing refinement over the last 12 years.

As the designer and lead developer of this solution, Steven describes some of the key steps in this journey. These include the development path they took (including key re-factoring decisions along the way), their journey to cloud, the basis of some truly unique features ATMOS has and why they chose to offer unprecedented visibility to their customers using ATMOS.

For any further questions you may have, Steven can be found at: and as well as

Disclaimer. All the views and opinions shared in this podcast, and others in the series, are solely those of our guest and do not reflect the opinions or beliefs of the organisations discussed.

Where does OSS R&D come from?

A question came in from a reader this morning, “What is the role (inputs) of the service providers in the research (technology modernization) compared to the equipment vendors and standard bodies (3GPP, TMForum, IETF, etc.)? Do operators have any influence in the research and how the coordination happens (except being the members) between the vendors, operators, and standard bodies?”

Brilliant question/s! I bet you have some thoughts on this!! It would be great if you could share your ideas in the comments section below.

A few notes on this question:

  1. In the old days, the telcos did a lot of R&D, including the invention of OSS/BSS and even the programming languages that were used to build these tools. Check out the book, “The Idea Factory: Bell Labs and the Great Age of American Innovation” by Jon Gertner (read more about it here). This is probably the most inspiring book I’ve read in relation to the communications industry. The groundbreaking innovations developed within R&D powerhouses like Bell Labs during the 1900s are staggering and something that we can barely even aspire to today. British Telecom and many other government-owned telcos, including Telstra’s R&D Labs here in Australia, also did amazing primary research. But many years ago, they all offloaded that R&D responsibility to the vendors and, to a lesser extent, to the standards bodies. That’s a pity.
  2. A pity for the reasons mentioned in this old post that talks a little about the subject matter.
  3. Some of the telcos do contribute to product development, guiding the vendors with use-cases and requirements… not to mention providing the data and test cases for those use-cases to be validated against. Strategic product development partnerships exist between telcos and vendors
  4. Some of the telcos do create a lot of their own code and / or customisations to COTS products. Much of it is for their own internal use, but some of it is reaching the outside world… sometimes via the telcos’ own efforts and sometimes via their Solution Integrators who develop something at one telco and then repurpose it at other telcos
  5. Similar to #4, some code is getting out into the share / collaborative domain via projects like ONAP
  6. The more virtualised and modular nature of 5G has the potential for internally developed product modules (eg the NWDAF or RIC functions), but I suspect most telcos will continue to delegate this task to the vendors
  7. As you probably know, the standards bodies have real difficulty getting R&D to market quickly just because of the nature of having to take contributions from the cooperative. They perform a really important function nonetheless
  8. TM Forum’s Catalyst program is brilliant at getting vendors and carriers together to work on specific use-cases. It’s probably more integration effort around existing solutions than R&D in many cases, but it does perform an important role.
  9. There are a lot of open source software solutions (the other OSS) that contribute to the body of research and development being conducted in the OSS/BSS industry

How to Avoid the Pitfalls of OSS Sharecropping

Firstly, what is Digital Sharecropping? Nicholas Carr coined the term Digital Sharecropping all the way back in 2006, citing the phenomenon where, “One of the fundamental economic characteristics of Web 2.0 is the distribution of production into the hands of the many and the concentration of the economic rewards into the hands of the few.” In other words, the landholder or platform owner (eg Facebook) is able to derive the majority of the financial benefits of the toil of the platform users (eg content creators). In isolation, the platform users don’t deliver much value, but they do in aggregate. Yet the platform owner potentially derives the majority of the aggregated commercial benefit.

But you’re probably wondering what Digital Sharecropping has to do with OSS/BSS. First let me ask you the question:

Who are the Landholders / Platform Owners of OSS and BSS Industries?

Let’s answer that question by taking a Dickensian look at OSS of past, present and future.

OSS Past: The OSS and BSS of the past were often built using third-party tools such as Oracle Forms, AutoCAD, CORBA, SNMP collection libraries, various GIS, programming languages such as C++, etc.

OSS Present: The OSS, and particularly the BSS, of today are increasingly being built on cloud environments, Salesforce and similar.

OSS Future: In future, we’re likely to use even more of the hyperscaler tools, leveraging their pre-packaged AI, mixed reality, data ETL, visualisation and other tools. The benefits of SaaS and subscription revenue models to software providers mean that many more platform plays are likely to be built for OSS vendors to use in future.

What are the Benefits of using Third-Party Platforms?

Put simply, using third-party tools can be a very effective way for OSS vendors to deliver products to market faster and cheaper. The reality is that most OSS companies don’t have the time or requisite skills to be able to create some of these tools (eg AI). OSS vendors get to stand on the shoulders of giants!

So what’s all the fuss I’m making about this then? The answer can be quickly summarised in the phrase – Locus of Control.

The Pitfalls of Relying on Third-Party Platforms

The dangers of being heavily reliant on specific third-party platforms are varied and numerous. The following factors apply to OSS/BSS vendors and their network operator customers alike:

  • Roadmap Alignment – when you first choose to align with a landowner (third-party product / platform), you decide on the product of best alignment, then design your own product to fit within the constraints of all that it offers. However, over time the third-party is making their own roadmap decisions. Unless you’re a keystone customer of theirs, they’re probably making architectural decisions that have no alignment with where you want to take your own product
  • Refactoring – the landowner is probably making their own enhancements to their platform. In some cases, those changes mean your OSS also needs resources assigned to be refactored, taking away from the resources required to make enhancements to your own product
  • Price changes – the landowner can choose to make changes to their pricing structure at any time. What might have been a profitable long-term contract between OSS vendor and customer can easily become loss-making if the landowner decides to raise their fees, either through direct price increases or restructuring the cost model. If your product is tightly coupled to theirs, you don’t have the option of quickly transitioning to an alternate supplier. As Nicholas also described, “the sharecroppers operate happily in an attention economy while their overseers operate happily in a cash economy.” The platform players are in a position of power where they can strip out all the benefits of the cash economy, leaving OSS sharecroppers with only the attention economy (including the support of OSS products for their customers, the network operators)
  • Reputation (bug fix) – when you’re reliant on “black box” functionality supplied by others, where you’re unable to control the inner workings of that functionality, you don’t have end-to-end control over issues customers have with your OSS products. Your reputation is inextricably linked to the reliability, scalability, etc of the platform you rely upon
  • Change of Ownership (Orphaning) – if the landholder decides to sell, there’s the possibility that the platform you rely upon could be obsoleted, switched to their own alternative product or just left to languish (see next point)
  • Change in Support – changes to landowner support can take many different forms:
    • Functional Obsolescence – No additional functionality or roadmap is assigned
    • Professional Services – No professional services are provided to assist with ongoing development of your product
    • Patching – No patches or security fixes are supplied. This could also include an inability to support other third-party software dependencies (eg the latest version of Java)
    • Service Levels – Changes in support models and SLAs
  • Loss of Relevance – Some OSS today rely on a platform choice made decades ago. Millions of developer hours have since been invested into the product and that platform. However, structural shifts can mean the platform, technology or processes are no longer preferred practice. For example, the CAD drawings (and A1 print-outs) used by field techs in the past have begun to be replaced by interactive applications viewed on smartphones or tablets.

So, how to avoid the Pitfalls of Digital Sharecropping?

The benefits of leveraging third-party platforms are too significant to ignore. However, strategies to avoid lock-in / dependence should also be considered. The secret is to spend most of your effort building the assets that you have direct control over. For example, rather than building your code / logic within the platform's tools, look to extricate that logic into your own modules, microservices, etc.
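One common way to keep logic on land you own is a "ports and adapters" style: your own code depends on an interface you define, and each rented platform sits behind a thin adapter. The sketch below is illustrative only. `InventoryPlatform`, `VendorPlatformAdapter`, the `lookup()` call and the `oper_state` field are all hypothetical names, not any real vendor's API.

```python
from typing import Protocol


class InventoryPlatform(Protocol):
    """The only platform surface our own logic may touch (hypothetical interface)."""

    def get_port_status(self, port_id: str) -> str: ...


class VendorPlatformAdapter:
    """Hypothetical adapter around a rented platform's client object.

    If we change landowner, only this class should need rewriting.
    """

    def __init__(self, client) -> None:
        self._client = client  # third-party client, injected rather than imported

    def get_port_status(self, port_id: str) -> str:
        raw = self._client.lookup(port_id)  # hypothetical vendor call and response shape
        # Translate the vendor's representation into our own vocabulary.
        return "up" if raw.get("oper_state") == 1 else "down"


class InMemoryPlatform:
    """Stand-in for tests, or a landing spot while migrating between platforms."""

    def __init__(self, statuses: dict[str, str]) -> None:
        self._statuses = statuses

    def get_port_status(self, port_id: str) -> str:
        return self._statuses.get(port_id, "unknown")


def ports_needing_attention(platform: InventoryPlatform, port_ids: list[str]) -> list[str]:
    """Logic we own: it depends only on the interface, never on a specific vendor."""
    return [p for p in port_ids if platform.get_port_status(p) != "up"]


if __name__ == "__main__":
    local = InMemoryPlatform({"port-1": "up", "port-2": "down"})
    print(ports_needing_attention(local, ["port-1", "port-2", "port-3"]))
```

Because `ports_needing_attention` never imports vendor code, a change of landowner (or of their pricing) becomes a question of writing one new adapter rather than refactoring the product.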

To follow the old adage, don’t build on rented land… which reminds me of a story a Sydney taxi driver once shared. He described in great detail how he’d invested $250,000 and years of his own effort building extensions to his home (right down to the placement of concrete gargoyles). However, my jaw hit the floor of his taxi when he finished his story with, “… I hope my landlord doesn’t mind!!” 


You’ve heard of a NOC and a SOC. What about a DOC?

You’ve no doubt heard about NOC (Network Operations Centres) and SOC (Security Operations Centres) [or perhaps alternatively, Service Operations Centres], which are the people / processes / infrastructure / tools that allow a network operator to manage the network health and security posture of their networks. The NOC and SOC are vitally important to keeping a modern network running. Unfortunately though, we’re missing a DOC, and I’m not talking about a word processing file here.

So what exactly is a DOC? Well, we’ll get to that shortly.

But first, let's consider how OSS/BSS and their data contribute to the NOC and SOC. Our tools collect all the events and telemetry data, then aggregate it for use within the NOC and SOC tools. They then help NOC and SOC teams to process the data, triage the problems, and coordinate and manage the remediation efforts.

Speaking of data processing, Data Integrity / Data Quality (DI / DQ) is a significant challenge, and cost, for network operators globally. Most have to invest in systematic programs of work to maintain or improve data quality and ensure the data can be relied upon by the many people who interact with it. Operators know that if their OSS/BSS data goes into a death spiral, then the OSS/BSS tools become useless, no matter how good they are.

The problem with data-fix programs is that operators tend to prefer algorithmic fixes (the little loop, rather than the big loop) to maintain data quality. Algorithmic fixes, designed and implemented by data scientists, tend to be cheaper and easier. However, this preference has two ramifications. Firstly, little-loop fixes tend to reach an asymptote (diminishing returns) long before reaching 100% data accuracy. Secondly, algorithmic fixes can only be cost-justified if they're repairing batches of data.
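The asymptote can be illustrated with a toy model. Assume (purely for illustration) that each algorithmic pass repairs a fixed share of the errors an algorithm can reach, while some share of the original errors, such as passive assets or one-off faults, can only be fixed manually. The parameter names and values below are hypothetical, not measurements.

```python
def accuracy_after_passes(initial_error: float, manual_only_share: float,
                          fix_rate: float, passes: int) -> float:
    """Toy model of the little loop (all parameters are illustrative assumptions).

    initial_error      share of data points that start out wrong
    manual_only_share  share of those errors no algorithm can reach
                       (eg passive assets, one-off faults)
    fix_rate           share of the remaining reachable errors each pass repairs
    """
    reachable = initial_error * (1 - manual_only_share)
    manual_only = initial_error * manual_only_share
    remaining = manual_only + reachable * (1 - fix_rate) ** passes
    return 1 - remaining


def accuracy_ceiling(initial_error: float, manual_only_share: float) -> float:
    """The asymptote: the little loop approaches it but never crosses it."""
    return 1 - initial_error * manual_only_share


if __name__ == "__main__":
    for n in range(5):
        print(n, accuracy_after_passes(0.2, 0.25, 0.5, n))
    print("ceiling", accuracy_ceiling(0.2, 0.25))
```

With these made-up numbers, accuracy climbs from 0.80 to 0.875 after one pass and 0.9125 after two, but it can never exceed the 0.95 ceiling. Closing that last gap needs the big loop.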

The reality is that some data, particularly data that can't be reconciled via an API request, can only be fixed via manual intervention. For example, passive infrastructure such as conduits can't provide status or configuration updates. Similarly, some data faults are single-instance only and need to be fixed on a data-point by data-point basis. Unfortunately, most carrier processes don't have a mechanism for immediate data fixes, such as when a field tech is still on site and in a position to trace out the real situation. That's where the DOC comes in. As you've probably worked out, the DOC I'm proposing is a Data Operations Centre.

We don't have any pre-built data-fix tools like we do for network fault or security health management today (only analytics tools built for the ad-hoc needs of each customer). Unlike with network or security faults, individual users or field workers (or perhaps even customers) can't log a data fault, or be notified when it's repaired.

The proposed DOC would be fitted out with the tools and processes required to log a data fault, apply triage to identify the problem / priority, determine a set of remedial activities and then ensure every prioritised fault is repaired. Our OSS/BSS tools potentially have a big part to play in supporting a DOC, but first we'll describe how OSS/BSS data can be better utilised.
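The intake side of such a DOC (log, triage, then repair in priority order) might be sketched as a priority queue. The three-level priority scale and the triage rule are assumptions made for illustration; a real DOC would define its own.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class DataFault:
    priority: int  # 1 = most urgent (hypothetical three-level scale)
    seq: int  # tie-breaker: equal priorities are worked first-in, first-out
    data_point: str = field(compare=False)
    description: str = field(compare=False)
    reported_by: str = field(compare=False)


class DataFaultQueue:
    """Minimal sketch of DOC intake: log a fault, triage it, repair in priority order."""

    def __init__(self) -> None:
        self._heap: list[DataFault] = []
        self._seq = itertools.count()

    def log(self, data_point: str, description: str, reported_by: str,
            affects_live_service: bool = False, tech_on_site: bool = False) -> DataFault:
        # Assumed triage rule: live-service impact first, then faults a field tech
        # can verify while still on site, then everything else.
        if affects_live_service:
            priority = 1
        elif tech_on_site:
            priority = 2
        else:
            priority = 3
        fault = DataFault(priority, next(self._seq), data_point, description, reported_by)
        heapq.heappush(self._heap, fault)
        return fault

    def next_to_repair(self) -> Optional[DataFault]:
        """Pop the most urgent open fault, or None when the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None
```

Only `priority` and `seq` take part in ordering, so two faults of equal urgency are worked in the order they were logged.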

The connected nature of our networks means that faults in the network often ripple out to other parts of the network. Our OSS/BSS store the proximity effects, by time (log files), geo-position (location), topology (connections to other devices) and hierarchy (relationships such as a card belonging to a device), which allow cascading faults to be traced back to a root cause.
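As a simplified illustration of how hierarchy alone helps, the sketch below collapses a set of alarmed items onto their highest alarmed ancestor. It ignores time, location and topology, and `parent_of` is a hypothetical child-to-parent mapping rather than any particular OSS data model.

```python
def root_causes(alarmed: set[str], parent_of: dict[str, str]) -> set[str]:
    """Collapse cascading alarms onto their highest alarmed ancestor.

    Hierarchy-only sketch: `parent_of` maps a child to its parent (eg port -> card
    -> device). Time, location and topology proximity are ignored here.
    """
    causes = set()
    for node in alarmed:
        top = node
        seen = {node}  # guards against loops caused by bad hierarchy data
        current = parent_of.get(node)
        while current is not None and current not in seen:
            seen.add(current)
            if current in alarmed:
                top = current
            current = parent_of.get(current)
        causes.add(top)
    return causes


if __name__ == "__main__":
    parent_of = {"port-1": "card-A", "port-2": "card-A", "card-A": "device-X",
                 "port-9": "card-B", "card-B": "device-X"}
    print(root_causes({"port-1", "port-2", "card-A", "port-9"}, parent_of))
```

Here the two port alarms under card-A fold into card-A itself, while port-9 (whose card is not alarmed) is reported as a separate root cause.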

Some network health issues can be immediate (eg a card failure), whilst others can be more pernicious (eg the slow deterioration of a signal as a connector corrodes over time). Just as a network fault can propagate, so too can a data fault. Data faults tend to be pernicious though, and cascading data faults can be harder to pinpoint, so they need to be fixed before they cause ripple-out impacts.

As with network adjacencies, data proximity factors are fundamental to a more repeatable approach to data fault management.

The data proximity factors are shown in the diagram above:

  1. Nodal / Hierarchical Proximity (list #1 above), which shows how data points can have parent-child relationships (eg a port is a child of a card, which is a child of a device, which is a child of a rack, and so on)
  2. Connected Proximity (list #2), where data points can be cross-linked with each other (eg a port on an antenna is connected to a port on a radio unit)
  3. Associated Proximity (not shown on diagram), where different data points can be associated with each other (eg a customer service can relate to a circuit, an IP address can relate to a port and/or subnet, a device can relate to a project, and many more)
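The three proximity factors could be held in a simple graph structure, so that when a fault is logged against one data point, its near neighbours become the first candidates to check for ripple-out. `ProximityGraph` and its methods are hypothetical names used only for this sketch.

```python
from collections import defaultdict


class ProximityGraph:
    """Hypothetical store of the three data proximity factors."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.children: dict[str, set[str]] = defaultdict(set)
        self.connected: dict[str, set[str]] = defaultdict(set)
        self.associated: dict[str, set[str]] = defaultdict(set)

    def add_child(self, parent: str, child: str) -> None:
        """Nodal / hierarchical proximity, eg a port belongs to a card."""
        self.parent[child] = parent
        self.children[parent].add(child)

    def connect(self, a: str, b: str) -> None:
        """Connected proximity (symmetric), eg antenna port to radio unit port."""
        self.connected[a].add(b)
        self.connected[b].add(a)

    def associate(self, a: str, b: str) -> None:
        """Associated proximity (symmetric), eg an IP address relates to a port."""
        self.associated[a].add(b)
        self.associated[b].add(a)

    def neighbours(self, point: str) -> set[str]:
        """Every data point one proximity step away, of any kind."""
        near = set(self.children.get(point, set()))  # copy so stored sets stay untouched
        if point in self.parent:
            near.add(self.parent[point])
        near |= self.connected.get(point, set())
        near |= self.associated.get(point, set())
        return near
```

For example, after `add_child("card-1", "port-1")`, `connect("port-1", "radio-port-4")` and `associate("port-1", "10.0.0.1/30")`, the neighbours of `port-1` are its card, the radio port and the IP address.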

These proximity factors can be leveraged in the following ways to support a DOC to log, categorise, visualise, then repair data faults:

  • Assign Confidence Levels* to each data point, which can be created:
    • Manually – where OSS/BSS users, field workers, customers, etc can provide a confidence rating against any given data point, particularly when they experience problematic data
    • Algorithmically – where algorithms can analyse data and identify faults (eg perform a trace and identify that only the A-end of a circuit exists, but not the Z-end)
    • By Lineage – where certain data sources are deemed less reliable than others
    • By type / category / class – where data, possibly gathered from an external source, has some data classes that are given different confidence levels (eg circuit names exist, but there’s no port-level connectivity recorded for each circuit)
  • Having systematic confidence level rankings allows the following to be created:
    • Heat-maps, which allow clusters of proximal data faults to be identified for repair
    • Fitness Functions or Scorecards, which quickly identify the level of data integrity and whether it is improving / deteriorating
    • Data Fault creation rules, which allow a data fault to be logged for repair if certain conditions are met (eg confidence is zero, implying a fault that needs remediation)
  • Faults can then be raised, either against individual data points, or jointly for systematic management through to repair / closure
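A minimal sketch of how confidence levels could feed a fault creation rule, a scorecard and a heat-map follows. The lineage weights, the rule for combining manual and algorithmic inputs, and the 0.5 heat-map threshold are all assumptions for illustration; confidence by type / category / class is left out for brevity.

```python
from collections import Counter
from statistics import mean
from typing import Optional

# Hypothetical lineage weights: some data sources are deemed less reliable than others.
LINEAGE_CONFIDENCE = {"field_audit": 1.0, "network_discovery": 0.9, "legacy_migration": 0.5}


def confidence(lineage: str, manual_rating: Optional[float] = None,
               algorithm_passed: bool = True) -> float:
    """Combine lineage, manual and algorithmic inputs into one score in [0, 1].

    The combination rule is an assumption: start from lineage, let a person's
    rating pull it down, and zero it if an algorithmic check fails.
    """
    score = LINEAGE_CONFIDENCE.get(lineage, 0.5)  # unknown sources get a middling score
    if manual_rating is not None:
        score = min(score, manual_rating)
    if not algorithm_passed:  # eg a trace finds the A-end of a circuit but no Z-end
        score = 0.0
    return score


def faults_to_raise(scores: dict[str, float]) -> list[str]:
    """Data fault creation rule: zero confidence implies a fault needing remediation."""
    return sorted(p for p, s in scores.items() if s == 0.0)


def scorecard(scores: dict[str, float]) -> float:
    """Fitness function: average confidence; compare snapshots to see the trend."""
    return mean(scores.values())


def heat_map(scores: dict[str, float], parent_of: dict[str, str],
             threshold: float = 0.5) -> Counter:
    """Count low-confidence points under each parent (eg device or site) to expose clusters."""
    return Counter(parent_of[p] for p, s in scores.items()
                   if s < threshold and p in parent_of)
```

Running `scorecard` on successive snapshots gives a simple improving / deteriorating signal, while `heat_map` points repair crews at the devices or sites where low-confidence data clusters.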

* Note: I’ve only seen one OSS/BSS tool where data confidence functionality was built in. You can read more about it in the link.

Interestingly, the success of the NOC and SOC is dependent upon the quality of the data, so you could argue that a DOC should actually take precedence.

The key call-out in this article comes from drawing a distinction between a DOC and the way data is managed in most organisations, as follows:

  • Data quality issues should be treated as data faults
  • They need to be treated individually, as unique data points, not just as a collective to apply an algorithm to (although, like network faults, we may choose to aggregate unique data faults and treat them as a collective)
  • Each data fault needs to be managed systematically (eg itemised, acknowledged, actioned, possibly assigned remediation workflows, repaired and closed)
  • There is an urgency around the fix of each data fault, just like network faults. People who experience a data fault may expect time-based data-fix SLAs to apply: firstly, so they can perform their actions with greater confidence / reliability; secondly, so the data faults don't ripple out and cause additional problems
  • There is a known contact point (eg phone number, drop-box, etc) for the DOC, so anyone who experiences a data issue knows how to log a fault. By comparison, in many organisations, if a field worker notes a discrepancy between their design pack and the real situation in the field, they just work around the problem and leave without fixing the data fault/s. They invariably have no mechanism for providing feedback. The data problem continues to exist and will cause problems for the next field tech who comes to the same site. Note that there may also be algorithms / rules generating faults, not just humans
  • There are notifications upon closure and/or fix of a data fault (if needed)
  • We provide the DOC with fault management tools, like the ITSM tools we use to monitor and manage IT or network faults, but for managing data faults. It’s possible that we could even use our standard fault management tools, but with customisation to handle data type faults
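The systematic lifecycle described above (itemised, acknowledged, actioned, repaired, closed, with a notification on closure) maps naturally onto a small state machine. This is a sketch only: workflow assignment is folded into the "actioned" state, and the `notify` hook is a hypothetical stand-in for whatever email, SMS or ITSM integration a DOC would actually use.

```python
from enum import Enum
from typing import Callable, List, Optional


class State(Enum):
    ITEMISED = "itemised"
    ACKNOWLEDGED = "acknowledged"
    ACTIONED = "actioned"  # remediation workflow assigned and underway
    REPAIRED = "repaired"
    CLOSED = "closed"


# Allowed transitions, in the order the lifecycle lists them.
TRANSITIONS = {
    State.ITEMISED: {State.ACKNOWLEDGED},
    State.ACKNOWLEDGED: {State.ACTIONED},
    State.ACTIONED: {State.REPAIRED},
    State.REPAIRED: {State.CLOSED},
    State.CLOSED: set(),
}


class DataFaultTicket:
    def __init__(self, data_point: str, reporter: str,
                 notify: Optional[Callable[[str, str], None]] = None) -> None:
        self.data_point = data_point
        self.reporter = reporter
        self.state = State.ITEMISED
        self.history: List[State] = [State.ITEMISED]
        self._notify = notify  # hypothetical hook, eg email / SMS / ITSM integration

    def advance(self, new_state: State) -> None:
        """Move to the next state, refusing any transition the lifecycle doesn't allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
        # Let the person who logged the fault know it has been dealt with.
        if new_state is State.CLOSED and self._notify is not None:
            self._notify(self.reporter, f"Data fault on {self.data_point} has been closed")
```

Refusing out-of-order transitions is what makes the process auditable: a fault cannot be closed without first being acknowledged, actioned and repaired.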