Where are the reliability hotspots in your OSS?

As you already know, there are two categories of downtime – unplanned (eg failures) and planned (eg upgrades / maintenance).

Planned downtime sounds a lot nicer (for operators) but the reality is that you could call both types “incidents” – they both impact (or potentially impact) the customer. We sometimes underestimate that fact.

Today’s question is whether you’re able to identify where the hotspots are in your OSS suite when you combine both types of downtime. Can you tell which outages are service-impacting?

In a round-about way, I’m asking whether you already have a dashboard that monitors uptime of all the components (eg applications, probes, middleware, infra, etc) that make up your complete OSS / BSS estate? If you do, does it tell you what you anecdotally know already, or are there sometimes surprises?

Does the data give you the evidence you need to negotiate with the implementers of problematic components (eg patch cadence, the need for reliability fixes, streamlining the patch process, reduction in customisations, etc)? Does it give you reason to make architectural changes (eg webscaling)?

Stop looking for exciting new features for your OSS

The iPhone disrupted the handset business, but has not disrupted the cellular network operators at all, though many people were convinced that it would. For all that’s changed, the same companies still have the same business model and the same customers that they did in 2006. Online flight booking doesn’t disrupt airlines much, but it was hugely disruptive to travel agents. Online booking (for the sake of argument) was sustaining innovation for airlines and disruptive innovation for travel agents.
Meanwhile, the people who are first to bring the disruption to market may not be the people who end up benefiting from it, and indeed the people who win from the disruption may actually be doing something different – they may be in a different part of the value chain. Apple pioneered PCs but lost the PC market, and the big winners were not even other PC companies. Rather, most of the profits went to Microsoft and Intel, which both operated at different layers of the stack. PCs themselves became a low-margin commodity with fierce competition, but PC CPUs and operating systems (and productivity software) turned out to have very strong winner-takes-all effects
.”
Ben Evans
on his blog about Tesla.

As usual, Ben makes some thought-provoking points. The ones above have coaxed me into thinking about OSS from a slightly perspective.

I’d tended to look at OSS as a product to be consumed by network operators (and further downstream by the customers of those network operators). I figured that if our OSS delivered benefit to the downstream customers, the network operators would thrive and would therefore be prepared to invest more into OSS projects. In a way, it’s a bit like a sell-through model.

But the ideas above give some alternatives for OSS providers to reduce dependence on network operator budgets.

Traditional OSS fit within a value-chain that’s driven by customers that wish to communicate. In the past, the telephone network was perceived as the most valuable part of that value-chain. These days, digitisation and competition has meant that the perceived value of the network has dropped to being a low-margin commodity in most cases. We’re generally not prepared to pay a premium for a network service. The Microsofts and Intels of the communications value-chain is far more diverse. It’s the Googles, Facebooks, Instagrams, YouTubes, etc that are perceived to deliver most value to end customers today.

If I were looking for a disruptive OSS business model, I wouldn’t be looking to add exciting new features within the existing OSS model. In fact, I’d be looking to avoid our current revenue dependence on network operators (ie the commoditising aspects of the communications value-chain). Instead I’d be looking for ways to contribute to the most valuable aspects of the chain (eg apps, content, etc). Or even better, to engineer a more exceptional comms value-chain than we enjoy today, with an entirely new type of OSS.

OSS operationalisation at scale

We had a highly flexible network design team at a previous company. Not because we wanted to necessarily, but because we were forced to by the client’s allocation of work.

Our team was largely based on casual workers because there was little to predict whether we needed 2 designers or 50 in any given week. The workload being assigned by the client was incredibly lumpy.

But we were lucky. We only had design work. The lumpiness in design effort flowed down through the work stack into construction, test and deployment teams. The constructors had millions of dollars of equipment that they needed to mobilise and demobilise as the work ebbed and flowed. Unfortunately for the constructors, they’d prepared their rate cards on the assumption of a fairly consistent level of work coming through (it was a very big project).

This lumpiness didn’t work out for anyone in the delivery pipeline, the client included. It was actually quite instrumental in a few of the constructors going into liquidation. The client struggled to meet roll-out targets.

The allocation of work was being made via the client’s B/OSS stack. The B/OSS teams were blissfully unaware of the downstream impact of their sporadic allocation of designs. Towards the end of the project, they were starting to get more consistent and delivery teams started to get into more of a rhythm… just as the network was coming to the end of build.

As OSS builders, we sometimes get so wrapped up in delivering functionality that we can forget that one of the key requirements of an OSS is to operationalise at scale. In addition to UI / CX design, this might be something as simple as smoothing the effort allocation for work under our OSS‘s management.

If your partners don’t have to talk to you then you win

If your partners don’t have to talk to you then you win.”
Guy Lupo
.

Put another way, the best form of customer service is no customer service (ie your customers and/or partners are so delighted with your automated offerings that they have no reason to contact you). They don’t want to contact you anyway (generally speaking). They just want to consume a perfectly functional and reliable solution.

In the deep, distant past, our comms networks required operators. But then we developed automated dialling / switching. In theory, the network looked after itself and people made billions of calls per year unassisted.

Something happened in the meantime though. Telco operators the world over started receiving lots of calls about their platform and products. You could say that they’re unwanted calls. The telcos even have an acronym called CVR – Call Volume Reduction – that describes their ambitions to reduce the number of customer calls that reach contact centre agents. Tools such as chatbots and IVR have sprung up to reduce the number of calls that an operator fields.

Network as a Service (NaaS), the context within Guy’s comment above, represents the next new tool that will aim to drive CVR (amongst a raft of other benefits). NaaS theoretically allows customers to interact with network operators via impersonal contracts (in the form of APIs). The challenge will be in the reliability – ensuring that nothing falls between the cracks in any of the layers / platforms that combine to form the NaaS.

In the world of NaaS creation, Guy is exactly right – “If your partners [and customers] don’t have to talk to you then you win.” As always, it’s complexity that leads to gaps. The more complex the NaaS stack, the less likely you are to achieve CVR.

The OSS self-driving vehicle

I was lucky enough to get some time of a friend recently, a friend who’s running a machine-learning network assurance proof-of-concept (PoC).

He’s been really impressed with the results coming out of the PoC. However, one of the really interesting factors he’s been finding is how frequently BAU (business as usual) changes in the OSS data (eg changes in naming conventions, topologies, etc) would impact results. Little changes made by upstream systems effectively invalidated baselines identified by the machine-learning engines to key in on. Those little changes meant the engine had to re-baseline / re-learn to build back up to previous insight levels. Or to avoid invalidating the baseline, it would require re-normalising all of data prior to the identification of BAU changes.

That got me wondering whether DevOps (or any other high-change environment) might actually hinder our attempts to get machine-led assurance optimisation. But more to the point, does constant change (at all levels of a telco business) hold us back from reaching our aim of closed-loop / zero-touch assurance?

Just like the proverbial self-driving car, will we always need someone at the wheel of our OSS just in case a situation arises that the machines hasn’t seen before and/or can’t handle? How far into the future will it be before we have enough trust to take our hands off the OSS wheel and let the machines drive closed-loop processes without observation by us?

Designing an Operational Domain Manager (ODM)

A couple of weeks ago, Telstra and the TM Forum held an event in Melbourne on OSS for next gen architectures.

The diagram below comes from a presentation by Corey Clinger. It describes Telstra’s Operational Domain Manager (ODM) model that is a key component of their Network as a Service (NaaS) framework. Notice the API stubs across the top of the ODM? Corey went on to describe the TM Forum Open API model that Telstra is building upon.
Operational Domain Manager (ODM)

In a following session, Raman Balla indicated an perspective that differs from many existing OSS. The service owner (and service consumer) must know all aspects of a given service (including all dimensions, lifecycle, etc) in a common repository / catalog and it needs to be attribute-based. Raman also indicated that the aim he has for architecting NaaS is to not only standardise the service, but the entire experience around the service.

In the world of NaaS, operators can no longer just focus separately on assurance or fulfillment or inventory / capacity, etc. As per DevOps, operators are accountable for everything.

Expanding your bag of OSS tricks

Let me ask you a question – when you’ve expanded your bag of tricks that help you to manage your OSS, where have they typically originated?

By reading? By doing? By asking? Through mentoring? Via training courses?
Relating to technical? People? Process? Product?
Operations? Network? Hardware? Software?
Design? Procure? Implement / delivery? Test? Deploy?
By retrospective thinking? Creative thinking? Refinement thinking?
Other?

If you were to highlight the questions above that are most relevant to the development of your bag of tricks, how much coverage does your pattern show?

There are so many facets to our OSS (ie. tentacles on the OctopOSS) aren’t there? We have to have a large bag of tricks. Not only that, we need to be constantly adding new tricks too right?

I tend to find that our typical approaches to OSS knowledge transfer cover only a small subset (think about discussion topics at OSS conferences that tend to just focus on the technical / architectural)… yet don’t align with how we (or maybe just I) have developed capabilities in the past.

The question then becomes, how do we facilitate the broader learnings required to make our OSS great? To introduce learning opportunities for ourselves and our teams across vaguely related fields such as project management, change management, user interface design, process / workflows, creative thinking, etc, etc.

Zero touch network & Service Management (ZSM)

Zero touch network & Service Management (ZSM) is a next-gen network management approach using closed-loop principles hosted by ETSI. An ETSI blog has just demonstrated the first ZSM Proof of Concept (PoC). The slide deck describing the PoC, supplied by EnterpriseWeb, can be found here.

The diagram below shows a conceptual closed-loop assurance architecture used within the PoC
ETSI ZSM PoC.

It contains some similar concepts to a closed-loop traffic engineering project designed by PAOSS back in 2007, but with one big difference. That 2007 project was based on a single-vendor solution, as opposed to the open, multi-vendor PoC demonstrated here. Both were based on the principle of using assurance monitors to trigger fulfillment responses. For example, ours used SLA threshold breaches on voice switches to trigger automated remedial response through the OSS‘s provisioning engine.

For this newer example, ETSI’s blog details, “The PoC story relates to a congestion event caused by a DDoS (Denial of Service) attack that results in a decrease in the voice quality of a network service. The fault is detected by service monitoring within one or more domains and is shared with the end-to-end service orchestrator which correlates the alarms to interpret the events, based on metadata and metrics, and classifies the SLA violations. The end-to-end service orchestrator makes policy-based decisions which trigger commands back to the domain(s) for remediation.”

You’ll notice one of the key call-outs in the diagram above is real-time inventory. That was much harder for us to achieve back in 2007 than it is now with virtualised network and compute layers providing real-time telemetry. We used inventory that was only auto-discovered once daily and had to build in error handling, whilst relying on over-provisioned physical infrastructure.

It’s exciting to see these types of projects being taken forward by ETSI, EnterpriseWeb, et al.

Network slicing, another OSS activity

One business customer, for example, may require ultra-reliable services, whereas other business customers may need ultra-high-bandwidth communication or extremely low latency. The 5G network needs to be designed to be able to offer a different mix of capabilities to meet all these diverse requirements at the same time.
From a functional point of view, the most logical approach is to build a set of dedicated networks each adapted to serve one type of business customer. These dedicated networks would permit the implementation of tailor-made functionality and network operation specific to the needs of each business customer, rather than a one-size-fits-all approach as witnessed in the current and previous mobile generations which would not be economically viable.
A much more efficient approach is to operate multiple dedicated networks on a common platform: this is effectively what “network slicing” allows. Network slicing is the embodiment of the concept of running multiple logical networks as virtually independent business operations on a common physical infrastructure in an efficient and economical way.
.”
GSMA’s Introduction to Network Slicing.

Engineering a network is one of compromises. There are many different optimisation levers to pull to engineer a set of network characteristics. In the traditional network, it was a case of pulling all the levers to find a middle-ground set of characteristics that supported all their service offerings.

QoS striping of traffic allowed for a level of differentiation of traffic handling, but the underlying network was still a balancing act of settings. Network virtualisation offers new opportunities. It allows unique segmentation via virtual networks, where each can be optimised for the specific use-cases of that network slice.

For years, I’ve been posing the concept of telco offerings being like electricity networks – that we don’t need so many service variants. I should note that this analogy is not quite right. We do have a few different types of “electricity” such as highly available (health monitoring), high-bandwidth (content streaming), extremely low latency (rapid reaction scenarios such as real-time sensor networks), etc.

Now what do we need to implement and manage all these network slices?? Oh that’s right, OSS! It’s our OSS that will help to efficiently coordinate all the slicing and dicing that’s coming our way… to optimise all the levers across all the different network slices!

The OSS co-op business model

A co-operative is a member-owned business structure with at least five members, all of whom have equal voting rights regardless of their level of involvement or investment. All members are expected to help run the cooperative.”
Small Business WA.

The co-op business model has fascinated me since doing some tech projects in the dairy industry in the deep distant past. The dairy co-ops empower collaboration of dairy farmers where the might of the collective outweighs that of each individually. As the collective, they’ve been able to establish massive processing plants, distribution lines, bargaining power, etc. The dairy co-ops are a sell-side collaboration.

By contrast open source projects like ONAP represent an interesting hybrid – part buy-side collaboration (ie the service providers acquiring software to run their organisations) and part sell-side (ie the vendors contributing code to the project alongside the service providers).

I’ve long been intrigued by the potential for a pure sell-side co-operative in OSS.

As we all know, the OSS market is highly fragmented (just look the number of vendors / products on this page), which means inefficiency because of the duplicated effort across vendors. A level of market efficiency comes from mergers and acquisitions. In addition, some comes from vendors forming partnerships to offer more complete solutions to a given customer requirement list.

But the key to a true sell-side OSS co-operative would be in the definition above – “at least five members.” Perhaps it’s an open-source project that brings them together. Perhaps it’s an extended partnership.

As Tom Nolle stated in an article that prompted the writing of today’s post, “On the vendor side, commoditization tends to force consolidation. A vendor who doesn’t have a nice market share has little to hope for but slow decline. A couple such vendors (like Infinera and Coriant, recently) can combine with the hope that the combination will be more survivable than the individual companies were likely to be. Consolidation weeds out industry inefficiencies like parallel costly operations structures, and so makes the remaining players stronger.

Imagine for a moment if instead of having developers spread across 100 alarm management tools, that same developer pool can take a consolidated 5 alarm management products forward? Do you think we’d get better, more innovative, more complete products faster?

Having said that, co-ops have their weaknesses too.

What do you think? Could such a model work? Would it be a disaster?

Orchestration looks a bit like provisioning

The following is the result of a survey question posed by TM Forum:
Number 1 Driver for Orchestration

I’m not sure how the numbers tally, but conceptually the graph above paints an interesting perspective of why orchestration is important. The graph indicates the why.

But in this case, for me, the why is the by-product of the how. The main attraction of orchestration models is in how we can achieve modularity. All of the business outcomes mentioned in the graph above will only be achievable as a result of modularity.

Put another way, rather than having the integration spaghetti of an “old-school” OSS / BSS stack, orchestration (and orchestration plans) potentially provides the ability to provide clearer demarcation and abstraction all the way from product design down into transactions that hit the network… not to mention the meet-in-the-middle points between business units.

Demarcation points support catalog items (perhaps as APIs / microservices with published contracts), allowing building-block design of products rather than involvement of (and disputes between) business units all down the line of product design. This facilitates the speed (34%) and services on demand (28%) objectives stated in the graph.

But I used the term “old-school” with intent above. The modularity mentioned above was already achieved in some older OSS too. The ability to carve up, sequence, prioritise and re-construct a stream of service orders was already achievable by some provisioning + workflow engines of the past.

The business outcomes remain the same now as they were then, but perhaps orchestration takes it to the next level.

There is no differentiation left in out-bundling competitors

In 1998 Berkshire Hathaway acquired a reinsurance company called General Re. “The only significant staff change that followed the merger was the elimination of General Re’s investment unit. Some 150 people had been in charge of deciding where to invest the company’s funds; they were replaced with just one individual – Warren Buffett.
Robert G. Hagstrom
in, “The Warren Buffett Way.”

Buffett was able to replace 150 people, and significantly outperform them, because they were conducting (relatively) small value, high volume transactions and he did the exact opposite.

Compare this with Gemini Waghmare’s thoughts on BSS, “It used to be that operators differentiated by pricing. Complex bundles, friends and family plans, rollover minutes and megabytes were used as ways to win over consumers. This drove significant investment into charging platforms and product catalogs. The internet economy runs on one-click purchases and a recurring flat rate. Roaming and overages are going away and transactional VOD (video on-demand) makes way for subscription VOD.
It’s not uncommon for operators to have 10,000 price plans while Netflix has three. Facebook and Google make billions of dollars without charging a cent.
Operators would do well to deprecate the value of their charging systems and invest instead in cloud and flat-rate billing with added focus on collecting, normalizing and monetizing user data. By simplifying subscription models with lightweight billing platforms, the scale and cost of BSS will drop dramatically. After all, there is no differentiation left in out-bundling competitors
,” quoted here on Inform. There are some brilliant insights in this link, so I recommend you taking a closer look BTW.

10,000+ pricing plans definitely sounds like the equivalent to General Re before Buffett arrived. Having only 3 pricing plans would be more like the Buffett approach, change the dynamic of BSS tools and the size of the teams that use them! Having only 3 pricing plans would certainly change the dynamic for OSS too. The number of variants we’d be asked to handle would diminish, making it much easier to build and operate our OSS. Due to all the down-stream inefficiencies, you could actually argue that there is only negative-differentiation left in out-bundling competitors.

As an aside… Interesting comment that, “Facebook and Google make billions of dollars without charging a cent.” I’d beg to differ. Whilst consumers of the service aren’t billed, advertisers certainly are, which I assume still needs a billing engine… one that probably has quite a bit of algorithmic complexity.

Shooting the OSS messenger

NPS, or Net Promoter Score, has become commonly used in the telecoms industry in recent years. In effect, it is a metric that measures friction in the business. If NPS is high, the business runs more smoothly. Customers are happy with the service and want to buy more of it. They’re happy with the service so they don’t need to contact the business. If NPS is low, it’s harder to make sales and there’s the additional cost of time dealing with customer complaints, etc (until the customer goes away of course).

NPS can be easy to measure via survey, but a little more challenging as a near-real-time metric. What if we used customer contacts (via all channels such as phone, IVR, email, website, live-chat, etc) as a measure of friction? But more importantly, how does any of this relate to OSS / BSS? We’ll get to that shortly (I hope).

BSS (billing, customer relationship management, etc) and OSS (service health, network performance, etc) tend to be the final touchpoints of a workflow before reaching a customer. When the millions of workflows through a carrier are completing without customer contact, then friction is low. When there are problems, calls go up and friction / inefficiency is also going up. When there are problems, the people (or systems) dealing with the calls (eg contact centre operators) tend to start with OSS / BSS tools and then work their way back up the funnel to identify the cause of friction and attempt to resolve it.

The problem is that the OSS / BSS tools are often seen as the culprit because that’s where the issue first becomes apparent. It’s easier to log an issue against the OSS than to keep tracking back to the real source of the problem. Many times, it’s a case of shooting the messenger. Not only that, but if we’re not actually identifying the source of the problem then it becomes systemic (ie the poor customer experience perpetuates).

Maybe there’s a case for us to get better at tracking the friction caused further upstream of our OSS / BSS and to give more granular investigative tools to the call takers. Even if we do, our OSS / BSS are still the ones delivering the message.

The OSS Matrix – the blue or the red pill?

OSS Matrix
OSS tend to be very good at presenting a current moment in time – the current configuration of the network, the health of the network, the activities underway.

Some (but not all) tend to struggle to cope with other moments in time – past and future.

Most have tools that project into the future for the purpose of capacity planning, such as link saturation estimation (based on projecting forward from historical trend-lines). Predictive analytics is a current buzz-word as research attempts to predict future events and mitigate for them now.

Most also have the ability to look into the past – to look at historical logs to give an indication of what happened previously. However, historical logs can be painful and tend towards forensic analysis. We can generally see who (or what) performed an action at a precise timestamp, but it’s not so easy to correlate the surrounding context in which that action occurred. They rarely present a fully-stitched view in the OSS GUI that shows the state of everything else around it at that snapshot in time past. At least, not to the same extent that the OSS GUI can stitch and present current state together.

But the scenario that I find most interesting is for the purpose of network build / maintenance planning. Sometimes these changes occur as isolated events, but are more commonly run as projects, often with phases or milestone states. For network designers, it’s important to differentiate between assets (eg cables, trenches, joints, equipment, ports, etc) that are already in production versus assets that are proposed for installation in the future.

And naturally those states cross over at cut-in points. The proposed new branch of the network needs to connect to the existing network at some time in the future. Designers need to see available capacity now (eg spare ports), but be able to predict with confidence that capacity will still be available for them in the future. That’s where the “reserved” status comes into play, which tends to work for physical assets (eg physical ports) but can be more challenging for logical concepts like link utilisation.

In large organisations, it can be even more challenging because there’s not just one augmentation project underway, but many. In some cases, there can be dependencies where one project relies on capacity that is being stood up by other future projects.

Not all of these projects / plans will make it into production (eg funding is cut or a more optimal design option is chosen), so there is also the challenge of deprecating planned projects. Capability is required to find whether any other future projects are dependent on this deprecated future project.

It can get incredibly challenging to develop this time/space matrix in OSS. If you’re a developer of OSS, the question becomes whether you want to take the blue or red pill.

OSS stepping stone or wet cement

Very often, what is meant to be a stepping stone turns out to be a slab of wet cement that will harden around your foot if you do not take the next step soon enough.”
Richelle E. Goodrich
.

Not sure about your parts of the world, but I’ve noticed the terms “tactical” (ie stepping stone solution) and “strategic” (ie long-term solution) entering the architectural vernacular here in Australia.

OSS seem to be full of tactical solutions. We’re always on a journey to somewhere else. I love that mindset – getting moving now, but still keeping the future in mind. There’s just one slight problem… how many times have we seen a tactical solution that was build years before? Perhaps it’s not actually a problem at all in some cases – the short-term fix is obviously “good enough” to have survived.

As a colleague insightfully pointed out last week – “if you create a tactical solution without also preparing a strategic solution, you don’t have a tactical solution, you have a solution.

When architecting your OSS solutions, do you find yourself more easily focussing on the tactical, the strategic, or is having an eye on both the essential part of your solution?

OSS – just in time rather than just in case

We all know that once installed, OSS tend to stay in place for many years. Too much effort to air-lift in. Too much effort to air-lift back out, especially if tightly integrated over time.

The monolithic COTS (off-the-shelf) tools of the past would generally be commissioned and customised during the initial implementation project, with occasional integrations thereafter. That meant we needed to plan out what functionality might be required in future years and ask for it to be implemented, just in case. Along with all the baked-in functionality that is never needed, and the just in case but possibly never used, we ended up with a lot of bloat in our OSS.

With the current approach of implementing core OSS building blocks, then utilising rapid release and microservice techniques, we have an ongoing enhancement train. This provides us with an opportunity to build just in time, to build only functionality that we know to be essential.

This has pluses and minuses. On the plus side, we have more opportunity to restrict delivery to only what’s needed. On the minus side, a just in time mindset can build a stop-gap culture rather than strategic, long-term thinking. It’s always good to have long-term thinkers / planners on the team to steer the rapid release implementations (and reductions / refactoring) and avoid a new cause of bloat.

An OSS doomsday scenario

If I start talking about doomsday scenarios where the global OSS job industry is decimated, most people will immediately jump to the conclusion that I’m predicting an artificial intelligence (AI) takeover. AI could have a role to play, but is not a key facet of the scenario I’m most worried about.
OSS doomsday scenario

You’d think that OSS would be quite a niche industry, but there must be thousands of OSS practitioners in my home town of Melbourne alone. That’s partly due to large projects currently being run in Australia by major telcos such as nbn, Telstra, SingTel-Optus and Vodafone, not to mention all the smaller operators. Some of these projects are likely to scale back in coming months / years, meaning less seats in a game of OSS musical chairs. But this isn’t the doomsday scenario I’m hinting at in the title either. There will still be many roles at the telcos and the vendors / integrators that support them.

There are hundreds of OSS vendors in the market now, with no single dominant player. It’s a really fragmented market that would appear to be ripe for M&A (mergers and acquisitions). Ripe for consolidation, but massive consolidation is still not the doomsday scenario because there would still be many OSS roles in that situation.

The doomsday scenario I’m talking about is one where only one OSS gains domination globally. But how?

Most traditional telcos have a local geographic footprint with partners/subsidiaries in other parts of the world, but are constrained by the costs and regulations of a wired or cellular footprint to be able to reach all corners of the globe. All that uniqueness currently leads to the diversity of OSS offerings we see today. The doomsday scenario arises if one single network operator usurps all the traditional telcos and their legacy network / OSS / BSS stacks in one technological fell swoop.

How could a disruption of that magnitude happen? I’m not going to predict, but a satellite constellation such as the one proposed by Starlink has some of the hallmarks of such a scenario. By using low-earth orbit (LEO) satellites (ie lower latency than geostationary satellite solutions), point-to-point laser interconnects between them and peering / caching of data in the sky, it could fundamentally change the world of communications and OSS.

It has global reach, no need for carrier interconnect (hence no complex contract negotiations or OSS/BSS integration for that matter), no complicated lead-in negotiations or reinstatements, no long-haul terrestrial or submarine cable systems. None of the traditional factors that cost so much time and money to get customers connected and keep them connected (only the complication of getting and keeping the constellation of birds in the sky – but we’ll put that to the side for now). It would be hard for traditional telcos to compete.

I’m not suggesting that Starlink can or will be THE ubiquitous global communications network. What if Google, AWS or Microsoft added this sort of capability to their strengths in hosting / data? Such a model introduces a new, consistent network stack without the telcos’ tech debt burdens discussed here. The streamlined network model means the variant tree is millions of times simpler. And if the variant tree is that much simpler, so is the operations model and so is the OSS… with one distinct contradiction. It would need to scale for billions of customers rather than millions and trillions of events.

You might be wondering about all the enterprise OSS. Won’t they survive? Probably not. Comms networks are generally just an important means-to-an-end for enterprises. If the one global network provider were to service every organisation with local or global WANs, as well as all the hosting they would need, and hosted zero-touch network operations like Google is already pre-empting, would organisation have a need to build or own an on-premises OSS?

One ubiquitous global network, with a single pared back but hyperscaled OSS, most likely purpose-built with self-healing and/or AI as core constructs (not afterthoughts / retrofits like for existing OSS). How many OSS roles would survive that doomsday scenario?

Do you have an alternative OSS doomsday scenario that you’d like to share?

Hat tip again to Jay Fenton for pointing out what Starlink has been up to.

Using OSS machine learning to predict backwards not forwards

There’s a lot of excitement about what machine-led decisioning can introduce into the world of network operations, and rightly so. Excitement about predictions, automation, efficiency, optimisation, zero-touch assurance, etc.

There are so many use-cases that disruptors are proposing to solve using Artificial Intelligence (AI), Machine Learning (ML) and the like. I might have even been guilty of proposing a few ideas here on the PAOSS blog – ideas like closed-loop OSS learning / optimisation of common processes, the use of Robotic Process Automation (RPA) to reduce swivel-chairing and a few more.

But have you ever noticed that most of these use-cases are forward-looking? That is, using past data to look into the future (eg predictions, improvements, etc). What if we took a slightly different approach and used past data to reconcile what we’ve just done?

AI and ML models tend to rely on lots of consistent data. OSS produce or collect lots of consistent data, so we get a big tick there. The only problem is that a lot of our manually created OSS data is riddled with inconsistency, due to nuances in the way that each worker performs workflows, data entry, etc as well as our own personal biases and perceptions.

For example, zero-touch automation has significant cachet at the moment. To achieve that, we need network health indicators (eg alarms, logs, performance metrics), but also a record of interventions that have (hopefully) improved on those indicators. If we have inconsistent or erroneous trouble-ticketing information, our AI is going to struggle to figure out the right response to automate. Network operations teams tend to want to fix problems quickly, get the ticketing data populated as fast as possible and move on to the next fault, even if that means creating inconsistent ticketing data.

So, to get back to looking backwards, a machine-learning use-case to consider today is to look at what an operator has solved, compare it to past resolutions and either validate the accuracy of their resolution data or even auto-populate fields based on the operator’s actions.

Hat tip to Jay Fenton for the idea seed for this post.

1.045 Trillion reasons to re-consider your OSS strategy

The global Internet of Things (IoT) market will be worth $1.1 trillion in revenue by 2025 as market value shifts from connectivity to platforms, applications and services. By that point, there will be more than 25 billion IoT connections (cellular and non-cellular), driven largely by growth in the industrial IoT market. The Asia Pacific region is forecast to become the largest global IoT region in terms of both connections and revenue.
Although connectivity revenue will grow over the period, it will only account for 5 per cent of the total IoT revenue opportunity by 2025, underscoring the need for operators to expand their capabilities beyond connectivity in order to capture a greater share of market value
.”
GSMA Intelligence
, referred to here.

Let’s look at these projected numbers. The GSMA Intelligence report forecasts only 5 cents in every dollar of IoT spend (of a $1.1T market opportunity) will be allocated to connectivity. That leaves $1.045T on the table if network operators just focus on connectivity.

Traditional OSS tend to focus on managing connectivity – less so on managing marketplaces, customer-facing platforms and applications. Does that headline number – $1.045T – provide you with an incentive to re-consider what your OSS manages and future use cases?

IoT OSS market opportunity

IoT requires slightly different OSS thinking:

  • Rather than integrating to a (relatively) small number of device types, IoT will have an almost infinite number of sensor types from a huge range of suppliers.
  • Rather than managing devices individually, their sheer volume means that devices will need to be increasingly managed in cohorts via policy controls.
  • Rather than a fairly narrow set of network-comms based services, functionality explodes into diverse areas like metering, vehicle fleets, health-care, manufacturing, asset controls, etc, etc so IoT controllers will need to be developed by a much longer-tail of suppliers (meaning open development platforms and/or scalable certification processes to integrate into the IoT controller platforms).
  • There are undoubtedly many, many additional differences.

Caveat: I haven’t evaluated the claims / numbers in the GSMA Intelligence report. This blog is just to prompt a thought-experiment around hypothetical projections.

The paint the fence automation analogy

There are so many actions that could be automated by / with / in our OSS. It can be hard to know where to start can’t it? One approach is to look at where the largest amounts of manual effort is being expended by operators. Another way is to employ the “paint the fence” analogy.

When envisaging fulfilment workflows, it’s easiest to picture actions that start with a customer and wipe down through the OSS / BSS stack.

When envisaging assurance workflows, it’s easiest to picture actions that start in the network and wipe up through the OSS / BSS stack.
Paint the fence OSS analogy

Of course there are exceptions to these rules, but to go a step further, wipe down = revenue, wipe up = costs. We want to optimise both through automation of course.

Like ensuring paint coverage when painting a fence, OSS automation has the potential to best improve Customer Experience coverage when we use brushstrokes down and up.

On the downstroke, it’s through faster service activations, quotes, response times, etc. On the upstroke, it’s through network reliability (downtime reduction), preventative maintenance, expedited notifications, etc.

You’ll notice that these are indicators that are favourable to the customers. I’m sure it won’t take much sluething to see the association to trailing metrics that are favourable to the network operators though right?