Trickle-down impact planning

We introduced the concept of The Trickle-down Effect last year, an effect that sees the most minor changes trickling down through an OSS stack, with much bigger consequences than expected.

The trickle-down effect can be insidious, turning a nice open COTS solution into a beast that needs constant attention to cope with the most minor of operational changes. The more customisations made, the more gnarly the beast tends to be.”

Here’s an example I saw recently. An internal business unit wanted to introduce a new card type into the chassis set they managed. Speaking with the physical inventory team, it seemed the change was quite small and a budget was developed for the works… but the budget (dollars / time / risk) was about to blow out in a big way.

The new card wasn’t being picked up in their fault-management or performance management engines. It wasn’t picked up in key reports, nor was it being identified in the configuration management database or logical inventory. Every one of these systems needed interface changes. Not massive change obviously, but collectively the budget blew out by 10x and expedite changes pushed out the work previously planned by each of the interface development and testing teams.

These trickle-down impacts were known…. by some people…. but weren’t communicated to the business unit responsible for managing the new card type. There’s a possibility that they may not have even added the new card type if they realised the full OSS cost consequences.

Are these trickle-down impacts known and readily communicated within your OSS change processes?

Those who rule perfect data…

A Passionate About OSS article last month spoke of how the investment strategy of a $106 billion VC fund has changed my thinking on our OSS‘ most valuable asset. Masayoshi Son is quoted in that article as follows:

“Those who rule data will rule the entire world. That’s what people of the future will say.”

But one question keeps coming back to me… if you’re ruling poor quality data, will you rule nothing whatsoever?

Along the same lines, the old adage, “practice makes perfect,” is not very helpful if you’re not practicing in a constructive way. A better (albeit somewhat impossible) variant on the adage would be “PERFECT practice makes perfect.”

Let me share an example. There is a product that is completely ground-breaking in its ability to automate and optimise designs of large-scale network roll-outs – designs that include outside plant and access network technologies. In bake-offs with some of the best available network designers, this product and its algorithm consistently beats the humans by far more than 25% (when measured by capital costs, implementation time and various other metrics).

Its one challenge in taking over the world and automating every future network design is having a base set of data that is so perfect that no re-design work is required. For example, if the base data says a duct route is available and has capacity for inserting a cable, then the product assumes it can use the duct in its optimal design. But when the field techs arrive at site, they find the duct is too badly damaged to use or already filled to capacity with other cables that can’t be overhauled. A new optimal design has to be calculated to consider the lack of availability of that duct.

The tool still gives great results, even after all the manual intervention, but perfect source data would give breathtaking results.

So I’d look to make one small tweak to Masayoshi Son’s quote. “Those who rule PERFECT* data will rule the entire world. That’s what people of the future will say.”

* whereby perfect means as high in quality as realistically possible.

So, perhaps those expensive data audits and cumbersome data quality processes will have a far greater ROI (Return on Investment) in future than any of us could ever estimate.

The pruning saw technique for OSS fall-out management

Many different user journeys flow through our OSS every day. These include external / customer journeys, internal / operator journeys and possibly even machines-to-machine or system journeys. Unfortunately, not all of these journeys are correctly completed through to resolution.

The incomplete or unsatisfactory journeys could include inter-system fall-outs, customer complaints, service quality issues, and many more.

If we categorise and graph these unsuccessful journeys, we will often find a graph like the one below. This example shows that a small number of journey categories account for a large proportion of the problematic journeys. However, it also shows a long tail of other categories that are individually insignificant, but collectively significant.

Long tail of OSS demands

The place where everybody starts (including me) is to focus on the big wins on lhe left side of the graph. lt makes sense that this is where your efforts will have the biggest impacts. Unfortunately, since that’s where everybody focusses effort, chances are the significant gains have already been achieved and optimisation is already quite high (but numbers are still high due constraints within the existing solution stack).

The long tail intrigues me. It’s harder to build a business case for because there are many causes and little return for solving them. Alternatively, if we can slowly and systematically remove many of these rarer events, we’re removing the noise and narrowing our focus on the signal, thus simplifying our overall solution.

Since they’re statistically rarer, we can often afford to be more ruthless in the ways that we prevent them from occurring. But how do we identify and prevent them?

Each of the bars on the chart above represent leaves on a decision tree (faulty leaves in this case). If we work our way back up the branches, we can ruthlessly prune. Examples could be:

  • The removal of obscure service options that lead to faults
  • Reduction (or expansion or refinement) of process designs to better cope with boundary cases that generate problems
  • Removal of grandfathered products that have few customers, generate losses and consume disproportionate support effort
  • Removal or refinement of interfaces, especially east-west systems that contribute little to an end-to-end process but lead to many fall-outs
  • Identification of improved exception handling, either within systems or at hand-off points

I’m sure you can think of many more, especially when you start tracing back up decision trees with a pruning saw in hand!!


Interaction points with fast/slow processes

Further to yesterday’s post on fast / slow processes and factory platforms, a concept presented by Sylvain Denis of Orange in Melbourne last week, here’s a diagram from Sylvain’s presentation pack :

The yellow blocks represent the fast (automated) processes. The orange blocks represent the slow processes.

The next slide showed the human interaction points (blue boxes) into this API / factory stack.

Fast / Slow OSS processes

Yesterday’s post discussed using smart contracts and Network as a Service (NaaS) to give a network the properties that will allow it to self-heal.

It mentioned a couple of key challenges, one being that there will always be physical activities such as cable cuts fixes, faulty equipment replacement, physical equipment expansion / contraction / lifecycle-management.

In a TM Forum presentation last week, Sylvain Denis of Orange proposed the theory of fast and slow OSS processes. Fast – soft factories (software and logical resources) within the operations stack are inherently automatable (notwithstanding the complexities and cost-benefit dilemma of actually building automations). Slow – physical factories are slow processes as they usually rely on human tasks and/or have location constraints.

Orchestration relies on programmatic interfaces to both. Not all physical factories have programmatic interfaces in all OSS / BSS stacks yet. It will remain a key requirement for the forseeable future to be able to handle dual-speed processes / factories.

Bringing Eminem’s blank canvas to OSS

“When you start out in your career, you have a blank canvas, so you can paint anywhere that you want because the shit ain’t been painted on yet. And then your second album comes out, and you paint a little more and you paint a little more. By the time you get to your seventh and eighth album you’ve already painted all over it. There’s nowhere else to paint.”
Eminem. (on Rick Rubin and Malcolm Gladwell’s Broken Record podcast)

To each their own. Personally, Eminem’s music has never done it for me, whether his first or eighth album, but the quote above did strike a chord (awful pun).

It takes many, many hours to paint in the detail of an OSS painting. By the time a product has been going for a few years, there’s not much room left on the canvas and the detail of the existing parts of the work is so nuanced that it’s hard to contemplate painting over.

But this doesn’t consider that over the years, OSS have been painted on many different canvases. First there were mainframes, then client-server, relational databases, XaaS, virtualisation (of servers and networks), and a whole continuum in between… not to mention the future possibilities of blockchain, AI, IoT, etc. And that’s not even considering the changes in programming languages along the way. In fact, new canvases are now presenting themselves at a rate that’s hard to keep up with.

The good thing about this is that we have the chance to start over with a blank canvas each time, to create something uniquely suited to that canvas. However, we invariably attempt to bring as much of the old thinking across as possible, immediately leaving little space left to paint something new. Constraints that existed on the old canvas don’t always apply to each new canvas, but we still have a habit of bringing them across anyway.

We don’t always ask enough questions like:

  • Does this existing process still suit the new canvas
  • Can we skip steps
  • Can we obsolete any of the old / unused functionality
  • Are old and new architectures (at all levels) easily transmutable
  • Does the user interface need to be ported or replaced
  • Do we even need a user interface (assuming the rise of machine-to-machine with IoT, etc)
  • Does the old data model have any relevance to the new canvas
  • Do the assurance rules of fixed-network services still apply to virtualised networks
  • Do the fulfillment rules of fixed-network services still apply to virtualised networks
  • Are there too many devices to individually manage or can they be managed as a cohort
  • Does the new model give us access to new data and/or techniques that will allow us to make decisions (or derive insights) differently
  • Does the old billing or revenue model still apply to the new platform
  • Can we increase modularity and abstraction between modules

“The real reason “blockchain” or “AI” may actually change businesses now or in the future, isn’t that the technology can do remarkable things that can’t be done today, it’s that it provides a reason for companies to look at new ways of working, new systems and finally get excited about what can be done when you build around technology.”
Tom Goodwin

How the investment strategy of a $106 billion VC fund changed my OSS thinking

What is a service provider’s greatest asset?

Now I’m biased when considering the title question, but I believe OSS are the puppet-master of every modern service provider. They’re the systems that pull all of the strings of the organisation. They generate the revenue by operationalising and assuring the networks as well as the services they carry. They coordinate the workforce. They form the real-time sensor networks that collect and provide data, but more importantly, insights to all parts of the business. That, and so much more.

But we’re pitching our OSS all wrong. Let’s consider first how we raise revenue from OSS, be that either internal (via internal sponsors) or external (vendor/integrator selling to customers)? Most revenue is either generated from products (fixed, leased, consumption revenue models) or services (human effort).

This article from just last month ruminated, “An organisation buys an OSS, not because it wants an Operational Support System, but because it wants Operational Support,” but I now believe I was wrong – charting the wrong course in relation to the most valuable element of our OSS.

After researching Masayoshi Son’s Vision Fund, I’m certain we’re selling a fundamentally short-term vision. Yes, OSS are valuable for the operational support they provide, but their greatest value is as vast data collection and processing engines.

“Those who rule data will rule the entire world. That’s what people of the future will say.”
Masayoshi Son.

For those unfamiliar with Masayoshi Son, he’s Japan’s richest man, CEO of SoftBank, in charge of a monster (US$106 billion) venture capital fund called Vision Fund and is seen as one of the world’s greatest technology visionaries.

As this article on Fortune explains Vision Fund’s foundational strategy, “…there’s a slide that outlines the market cap of companies during the Industrial Revolution, including the Pennsylvania Railroad, U.S. Steel, and Standard Oil. The next frontier, he [Son] believes, is the data revolution. As people like Andrew Carnegie and John D. Rockefeller were able to drastically accelerate innovation by having a very large ownership over the inputs of the Industrial Revolution, it looks like Son is trying to do something similar. The difference being he’s betting on the notion that data is one of the most valuable digital resources of modern day.”

Matt Barnard is CEO of Plenty, one of the companies that Vision Fund has invested in. He had this to say about the pattern of investments by Vision Fund, “I’d say the thing we have in common with his other investments is that they are all part of some of the largest systems on the planet: energy, transportation, the internet and food.”

Telecommunications falls into that category too. SoftBank already owns significant stakes in telecommunications and broadband network providers.

But based on the other investments made by Vision Fund so far, there appears to be less focus on operational data and more focus on customer activity and decision-making data. In particular, unravelling the complexity of customer data in motion.

OSS “own” service provider data, but I wonder whether we’re spending too much time thinking about operational data (and how to feed it into AI engines to get operational insights) and not enough on stitching customer-related insight sets together. That’s where the big value is, but we’re rarely thinking about it or pitching it that way… even though it is perhaps the most valuable asset a service provider has.

The chains of integration are too light until…

Chains of habit are too light to be felt until they are too heavy to be broken.”
Warren Buffett (although he attributed it to an unknown author, perhaps originating with Samuel Johnson).

What if I were to replace the word “habit” in the quote above with “OSS integration” or “OSS customisation” or “feature releases?”

The elegant quote reflects this image:

The chains of feature releases are light at t0. They’re much heavier at t0+100.

Like habits, if we could project forward and see the effects, would we allow destructive customisations to form in their infancy? The problem is that we don’t see them as destructive at infancy. We don’t see how they entangle.

Posing a Network Data Synchronisation Protocol (NDSP) concept

Data quality is one of the biggest challenges we face in OSS. A product could be technically perfect, but if the data being pumped into it is poor, then the user experience of the product will be awful – the OSS becomes unusable, and that in itself generates a data quality death spiral.

This becomes even more important for the autonomous, self-healing, programmable, cooperative networks being developed (think IoT, virtualised networks, Self-Organizing Networks). If we look at IoT networks for example, they’ll be expected to operate unattended for long periods, but with code and data auto-propagating between nodes to ensure a level of self-optimisation.

So today I’d like to pose a question. What if we could develop the equivalent of Network Time Protocol (NTP) for data? Just as NTP synchronises clocking across networks, Network Data Synchronisation Protocol (NDSP) would synchronise data across our networks through a feedback-loop / synchronisation algorithm.

Of course there are differences from NTP. NTP only tries to coordinate one data field (time) along a common scale (time as measured along a 64+64 bits continuum). The only parallel for network data is in life-cycle state changes (eg in-service, port up/down, etc).

For NTP, the stratum of the clock is defined (see image below from wikipedia).

This has analogies with data, where some data sources can be seen to be more reliable than others (ie primary sources rather than secondary or tertiary sources). However, there are scenarios where stratum 2 sources (eg OSS) might push state changes down through stratum 1 (eg NMS) and into stratum 0 (the network devices). An example might be renaming of a hostname or pushing a new service into the network.

One challenge would be the vast different data sets and how to disseminate / reconcile across the network without overloading it with management / communications packets. The other would be that format consistency. I once had a device type that had four different port naming conventions, and that was just within its own NMS! Imagine how many port name variations (and translations) might have existed across the multiple inventories that exist in our networks. The good thing about the NDSP concept is that it might force greater consistency across different vendor platforms.

Another would be that NDSP would become a huge security target as it would have the power to change configurations and because of its reach through the network.

So what do you think? Has the NDSP concept already been developed? Have you implemented something similar in your OSS? What are the scenarios in which it could succeed? Or fail?

I’m predicting the demise of the OSS horse

“What will telcos do about the 30% of workers AI is going to displace?”
Dawn Bushaus

That question, which is the headline of Dawn’s article on TM Forum’s Inform platform, struck me as being quite profound.

As an aside, I’m not interested in the number – the 30% – because I concur with Tom Goodwin’s sentiments on LinkedIn, “There is a lot of nonsense about AI.
Next time someone says “x% of businesses will be using AI by 2020” or “AI will be worth $xxxBn by 2025” or any of those other generic crapspeak comments, know that this means nothing.
AI is a VERY broad area within computer science that includes about 6-8 very different strands of work. It spans robotics, image recognition, machine learning, natural language processing, speech recognition and far more. Nobody agrees on what is and isn’t in this.
This means it covers everything from superintelligence to artificial creativity to chatbots

For the purpose of this article, let’s just say that in 5 years AI will replace a percentage of jobs that we in tech / telco / OSS are currently doing. I know at least a few telcos that have created updated operating plans built around a headcount reduction much greater than the 30% mentioned in Dawn’s article. This is despite the touchpoint explosion and increased complexity that is already beginning to crash down onto us and will continue apace over the next 5 years.

Now, assuming you expect to still be working in 5 years time and are worried that your role might be in the disappearing 30% (or whatever percentage), what do you do now?

First, figure out what the modern equivalents of the horse are in the context of Warren Buffett’s quote below:

“What you really should have done in 1905 or so, when you saw what was going to happen with the auto is you should have gone short horses. There were 20 million horses in 1900 and there’s about 4 million now. So it’s easy to figure out the losers, the loser is the horse. But the winner is the auto overall. [Yet] 2000 companies (carmakers) just about failed.”

It seems impossible to predict how AI (all strands) might disrupt tech / telco / OSS in the next 5 years – and like the auto industry, more impossible to predict the winners (the technologies, the companies, the roles). However, it’s almost definitely easier to predict the losers.

Massive amounts are being invested into automation (by carriers, product vendors and integrators), so if the investments succeed, operational roles are likely to be net losers. OSS are typically built to make operational roles more efficient – but if swathes of operator roles are automated, then does operational support also become a net loser? In its current form, probably yes.

Second, if you are a modern-day horse, ponder which of your skills are transferable into the future (eg chassis building, brakes, steering, etc) and which are not (eg buggy-whip making, horse-manure collecting, horse grooming, etc). Assuming operator-driven OSS activities will diminish, but automation (with or without AI) will increase, can you take your current networks / operations knowledge and combine that with up-skilling in data, software and automation tools?

Even if OSS user interfaces are made redundant by automation and AI, we’ll still need to feed the new technologies with operations-style data, seed their learning algorithms and build new operational processes around them.

The next question is double-edged – for both individuals and telcos alike – how are you up-skilling for a future without horses?

Are your various device inventory repositories in synch?

Does your organisation have a number of different device inventory repositories?
Hint: You might even be surprised by how many you have.

Examples include:

  • Physical network inventory
  • Logical network inventory
  • DNS records
  • CMDB (Config Management DB)
  • IPAM (IP Address Management)
  • EMS (Element Management Systems)
  • SIEM (Security Information and Event Management)
  • Desktop / server management
  • not to mention the management information base on the devices themselves (the only true primary data source)

Have you recently checked how in-synch they are? As a starting point, are device counts aligned (noting that different repositories perhaps only cover subsets of device ranges)? If not, how many cross-match between repositories?

If they’re out of synch, do you have a routine process for triangulating / reconciling / cleansing? Even better, do you have an automated, closed-loop process for ensuring synchronisation across the different repositories?

Would anyone like to offer some thoughts on the many reasons why it’s important to have inventory alignment?

I’ll give a few little hints:

  • Security
  • Assurance
  • Activations
  • Integrations
  • Asset lifecycle management
  • Financial management (ie asset depreciation)

Torturous OSS version upgrades

Have you ever worked on an OSS where a COTS (Commercial Off-The-Shelf) solution has been so heavily customised that implementing the product’s next version upgrade has become a massive challenge? The solution has become so entangled that if the product was upgraded, it would break the customisations and/or integrations that are dependent upon that product.

This trickle-down effect is the perfect example of The Chess-board Analogy or The Tech-debt Wreck at work. Unfortunately, it is far too common, particularly in large, complex OSS environments.

The OSS then either has to:

  • skip the upgrade or
  • take a significant cost/effort hit and perform an upgrade that might otherwise be quite simple.

If the operator decides to take the “skip” path for a few upgrades in a row, then it gets further from the vendor’s baseline and potentially misses out on significant patches, functionality or security hardening. Then, when finally making the decision to upgrade, a much more complex project ensues.

It’s just one more reason why a “simple” customisation often has a much greater life-cycle cost than was initially envisaged.

How to reduce the impact?

  1. We’ve recently spoken about using RPA tools for pseudo-integrations, allowing the operator to leave the COTS product un-changed, but using RPA to shift data between applications
  2. Attempt to achieve business outcomes via data / process / config changes to the COTS product rather than customisations
  3. Enforce a policy of integration as a last resort as a means of minimising the chess-board implications (ie attempting to solve problems via processes, in data, etc before considering any integration or customisation)
  4. Enforcing modularity in the end-to-end architecture via carefully designed control points, microservices, etc

There are probably many other methods that I’m forgetting about whilst writing the article. I’d love to hear the approach/es you use to minimise the impact of COTS version upgrades. Similarly, have you heard of any clever vendor-led initiatives that are designed to minimise upgrade costs and/or simplify the upgrade path?

A summary of RPA uses in an OSS suite

This is the sixth and final post in a series about the four styles of RPA (Robotic Process Automation) in OSS.

Over the last few days, we’ve looked into the following styles of RPA used in OSS, their implementation approaches, pros / cons and the types of automation they’re best suited to:

  1. Automating repeatable tasks – using an algorithmic approach to completing regular, mundane tasks
  2. Streamlining processes / tasks – using an algorithmic approach to assist an operator during a process or as an alternate integration technique
  3. Predefined decision support – guiding operators through a complex decision process
  4. As part of a closed-loop system – that provides a learning, improving solution

RPA tools can significantly improve the usability of an OSS suite, especially for end-to-end processes that jump between different applications (in the many ways mentioned in the above links).

However, there can be a tendency to use the power of RPAs to “solve all problems” (see this article about automating bad processes). That can introduce a life-cycle of pain for operators and RPA admins alike. Like any OSS integration, we should look to keep the design as simple and streamlined as possible before embarking on implementation (subtraction projects).

RPA in OSS feedback loops

This is the fifth in a series about the four styles of RPA (Robotic Process Automation) in OSS.

The fourth of those styles is as part of a closed-loop system such as the one described here. Here’s a diagram from that link:
OSS / DSS feedback loop

This is the most valuable style of RPA because it represents a learning and improving system.

Note though that RPA tools only represent the DSS (Decision Support System) component of the closed-loop so they need to be supplemented with the other components. Also note that an RPA tool can only perform the DSS role in this loop if it can accept feedback (eg via an API) and modify its output in response. The RPA tool could then perform fully automated tasks (ie machine-to-machine) or semi-automated (Decision support for humans).

Setting up this type of solution can be far more challenging than the earlier styles of RPA use, but the results are potentially the most powerful too.

Almost any OSS process could be enhanced by this closed-loop model. It’s just a case of whether the benefits justify the effort. Broad examples include assurance (network health / performance), fulfilment / activations, operations, strategy, etc.

The OSS / RPA parrot on the shoulder analogy

This is the fourth in a series about the four styles of RPA (Robotic Process Automation) in OSS.

The third style is Decision Support. I refer to this style as the parrot on the shoulder because the parrot (RPA) guides the operator through their daily activities. It isn’t true automation but it can provide one of the best cost-benefit ratios of the different RPA styles. It can be a great blend of human-computer decision making.

OSS processes tend to have complex decision trees and need different actions performed depending on the information being presented. An example might be a customer on-boarding, which includes credit and identity check sub-processes, followed by the customer service order entry.

The RPA can guide the operator to perform each of the steps along the process including the mandatory fields to populate for regulatory purposes. It can also recommend the correct pull-down options to select so that the operator traverses the correct branch of the decision tree of each sub-process.

This functionality can allow organisations to deliver less training than they would without decision support. It can be highly cost-effective in situations where:

  • There are many inexperienced operators, especially if there is high staff turnover such as in NOCs, contact centres, etc
  • It is essential to have high process / data quality
  • The solution isn’t intuitive and it is easy to miss steps, such as a process that requires an operator to swivel-chair between multiple applications
  • There are many branches on the decision tree, especially when some of the branches are rarely traversed, even by experienced operators

In these situations the cost of training can far outweigh the cost of building an OSS (RPA) parrot on each operator’s shoulder.

Using RPA as an alternate OSS integration

This is the third in a series about the four styles of RPA (Robotic Process Automation) in OSS.

The second of those styles is Streamlining processes / tasks by following an algorithmic approach to simplify processes for operators.

These can be particularly helpful during swivel-chair processes where multiple disparate systems are partially integrated but each needs the same data (ie reducing the amount of duplicated data entry between systems). As well as streamlining the process it also improves data consistency rates.

The most valuable aspect of this style of RPA is that it can minimise the amount of integration between systems, thus potentially reducing solution maintenance into the future. The RPA can even act as the integration technique where an API isn’t available or documentation isn’t available (think legacy systems here).

Using RPA to automate OSS activities

This is the second in a series about the four styles of RPA (Robotic Process Automation) in OSS.

The first of those styles is automating repeatable tasks by following an algorithmic approach to complete regular, mundane tasks.

Running an OSS has many high value, challenging tasks for operators to perform. Unfortunately, they also have many repetitive, simple (brain-dead?) tasks that need to be done too.

This might include collecting data from various sources and aggregating it into a single file or report for consumption by humans or machines. Other examples include admin clean-up tasks like accounts / tempfiles / processes / sessions and myriad simple process automations.

When we think of OSS automations, we often think of high value but complicated tasks like orchestrations, network self-healing, etc. They can be expensive and inflexible, not always delivering the perceived worth for the investment.

However, when thinking of RPA I think about the simplest stuff first. They are basic and consistent processes that are straightforward to define an algorithm for, making them the “low-hanging fruit” of OSS / RPA activities. They help to build momentum towards the bigger automation fish. Best of all, they free up your talented OSS operators to do more valuable activities.

Automating repeatable tasks is the most basic RPA style. We’ll step up the value chain with each additional style over the next few days.

The four styles of RPA used in OSS

You’re probably already aware of RPA (Robotic Process Automation) tools. You’ve possibly even used one (or more) to enhance your OSS experience. In some ways, they’re a really good addition to your OSS suite. In some ways, potentially not. That all comes down to the way you use them.

There are four main ways that I see them being used (but happy for you to point out others):

  1. Automating repeatable tasks – following an algorithmic approach to getting regular, mundane tasks done (eg weekly report generation)
  2. Streamlining processes / tasks – again following an algorithmic approach to assist an operator during a process (eg reducing the amount of data entry when there is duplication between systems)
  3. Predefined decision support – to guide operators through a process that involves making different decisions based on the information being presented (eg in a highly regulated or complex process, with many options, RPA rules can ensure quality remains high)
  4. As part of a closed-loop system – if your RPA tool can handle changes to its rules through feedback (ie not just static rules) then it can become an important part of a learning, improving solution

You’ll notice an increasing level of sophistication from 1-4. Not just sophistication but potential value to operators too.

We’ll take a closer look at the use of RPA in OSS over the next couple of days.

The evolving complexity of RCA

Root cause analysis (RCA) is one of the great challenges of OSS. As you know, it aims to identify the probable cause of an alarm-storm, where all alarms are actually related to a single fault.

In the past, my go-to approach was to start with a circuit hierarchy-based algorithm. If you had an awareness of the hierarchy of circuits, usually through an awareness in inventory, if you have a lower-order fault (eg Loss of Signal on a transmission link caused by a cable break), then you could suppress all higher-order alarms (ie from bearers or tributaries that were dependent upon the L1 link. That works well in the fixed networks of distant past (think SDH / PDH). This approach worked well because it was repeatable between different customer environments.

Packet-switching data networks changed that to an extent, because a data service could traverse any number of links, on-net or off-net (ie leased links). The circuit hierarchy approach was still applicable, but needed to be supplemented with other rules.

Now virtualised networking is changing it again. RCA loses a little relevance in the virtualised layer. Workloads and resource allocations are dynamic and transient, making them less suited to fixed algorithms. The objective now becomes self-healing – if a failure is identified, failed resources are spun down and new ones spun up to take the load. The circuit hierarchy approach loses relevance, but perhaps infrastructure hierarchy still remains useful. Cable breaks, server melt-downs, hanging controller applications are all examples of root causes that will cause problems in higher layers.]

Rather than fixed-rules, machine-based pattern-matching is the next big hope to cope with the dynamically changing networks.

The number of layers and complexity of the network seems to be ever increasing, and with it RCA becomes more sophisticated…. If only we could evolve to simpler networks rather than more complex ones. Wishful thinking?

If the customer thinks they have a problem, they do have a problem

Omni-channel is an interesting concept because it generates two distinctly different views.
The customer will use whichever channel (eg digital, apps, contact-centre, IVR, etc) that they want to use.
The service provider will try to push the customer onto whichever channel suits the service provider best.

The customer will often want to use digital or apps, back-ended by OSS – whether that’s to place an order, make configuration changes, etc. The service provider is happy for the customer to use these low-cost, self-service channels.

But when the customer has a problem, they’ll often try to self-diagnose, then prefer to speak with a person who has the skills to trouble-shoot and work with the back-end systems and processes. Unfortunately, the service provider still tries to push the customer into low-cost, self-service channels. Ooops!

If the customer thinks they have a problem, they do have a problem (even if technically, they don’t).
Omni-channel means giving customers the channels that they want to work via, not the channels the service provider wants them to work via.
Call Volume Reduction (CVR) projects (which can overlap into our OSS) sometimes lose sight of this fact just because the service provider has their heart set on reducing costs.