Designing OSS to cope with greater transience

There are three broad models of networking in use today. The first is the adaptive model where devices exchange peer information to discover routes and destinations. This is how IP networks, including the Internet, work. The second is the static model where destinations and pathways (routes) are explicitly defined in a tabular way, and the final is the central model where destinations and routes are centrally controlled but dynamically set based on policies and conditions.”
Tom Nolle here.

OSS of decades past worked best with static networks. Services / circuits that were predominantly “nailed up” and (relatively) rarely changed after activation. This took the real-time aspect out of play and justified the significant manual effort required to establish a new service / circuit.

However, adaptive and centrally managed networks have come to dominate the landscape now. In fact, I’m currently working on an assignment where DWDM, a technology that was once largely static, is now being augmented to introduce an SDN controller and dynamic routing (at optical level no less!).

This paradigm shift changes the fundamentals of how OSS operate. Apart from the physical layer, network connectivity is now far more transient, so our tools must be able to cope with that. Not only that, but the changes are too frequent to justify the manual effort of the past.

To tie in with yesterday’s post, we are again faced with the option of abstract / generic modelling or specific modelling.

Put another way, we have to either come up with adaptive / algorithmic mechanisms to deal with that transience (the specific model), or need to mimic “nailed-up” concepts (the abstract model).

More on the implications of this tomorrow.

Which OSS tool model do you prefer – Abstract or Specific?

There’s something I’ve noticed about OSS products – they are either designed to be abstract / flexible or they are designed to cater for specific technologies / topologies.

When designed from the abstract perspective, the tools are built around generic core data models. For example, whether a virtual / logical device, a physical device, a customer premises device, a core network device, an access network device, a trasmission device, a security device, etc, they would all be lumped into a single object called a device (but with flexibility to cater for the many differences).

When designed from the specific perspective, the tools are built with exact network and/or service models in mind. That could be 5G, DWDM, physical plant, etc and when any new technology / topology comes along, a new plug-in has to be built to augment the tools.

The big advantage of the abstract approach is obvious (if they truly do have a flexible core design) – you don’t have to do any product upgrades to support a new technology / topology. You just configure the existing system to mimic the new technology / topology.

The first OSS product I worked with had a brilliant core design and was able to cope with any technology / topology that we were faced with. It often just required some lateral thinking to make the new stuff fit into the abstract data objects.

What I also noticed was that operators always wanted to customise the solution so that it became more specific. They were effectively trying to steer the product from an open / abstract / flexible tool-set to the other end of the scale. They generally paid significant amounts to achieve specificity – to mould the product to exactly what their needs were at that precise moment in time.

However, even as an OSS newbie, I found that to be really short-term thinking. A network (and the services it carries) is constantly evolving. Equipment goes end-of-life / end-of-support every few years. Useful life models are usually approx 5-7 years and capital refresh projects tend to be ongoing. Then, of course, vendors are constantly coming up with new products, features, topologies, practices, etc.

Given this constant change, I’d much rather have a set of tools that are abstract and flexible rather than specific but less malleable. More importantly, I’ll always try to ensure that any customisations should still retain the integrity of the abstract and flexible core rather than steering the product towards specificity.

How about you? What are your thoughts?

An OSS conundrum with many perspectives

Even aside from the OSS impact, it illustrates the contrast between “bottom-up” planning of networks (new card X is cheaper/has more ports) and “top down” (what do we need to change to reduce our costs/increase capacity).”
Robert Curran

Robert’s quote above is in response to a post called “Trickle-down impact planning.”

Robert makes a really interesting point. Adding a new card type is a relatively common event for a big network operator. It’s a relatively minor challenge for the networks team – a BAU (Business as Usual) activity in fact. But if you follow the breadcrumbs, the impact to other parts of the business can be quite significant.

Your position in your organisation possibly dictates your perspective on the alternative approaches Robert discusses above. Networks, IT, planning, operations, sales, marketing, projects/delivery, executive – all will have different impacts and a different field of view on the conundrum. This makes it an interesting problem to solve – which viewpoint is the “right” one to tackle the challenge from?

My “solutioning” background tends to align with the top down viewpoint, but today we’ll take a look at this from the perspective of how OSS can assist from either direction.

Bottom Up: In an ideal world, our OSS and associated processes would be able to identify a new card (or similar) and just ripple changes out without interface changes. The first OSS I worked on did this really well. However, it was a “single-vendor” solution so the ripples were self-contained (mostly). This is harder to control in the more typical “best-of-breed” OSS stacks of today. There are architectural mechanisms for controlling the ripples out but it’s still a significant challenge to solve. I’d love to hear from you if you’re aware of any vendors or techniques that do this really well.

Top Down: This is where things get interesting. Should top-down impact analysis even be the task of an OSS/BSS? Since it’s a common / BAU operational task, then you could argue it is. If so, how do we create OSS tools* that help with organisational impact / change / options analysis and not just network impact analysis? How do we build the tools* that can:

  1. Predict the rippling impacts
  2. Allow us to estimate the impact of each
  3. Present options (if relevant) and
  4. Provide a cost-benefit comparison to determine whether any of the options are viable for development

* When I say “tools,” this might be a product, but it could just mean a process, data extract, etc.

I have the sense that this type of functionality falls into the category of, “just because you can, doesn’t mean you should… build it into your OSS.” Have you seen an OSS/BSS with this type of impact analysis functionality built-in?

Blown away by one innovation. Now to extend on it

Our most recent two posts, from yesterday and Friday, have talked about one stunningly simple idea that helps to overcome one of OSS‘ biggest challenges – data quality. Those posts have stimulated quite a bit of dialogue and it seems there is some consensus about the cleverness of the idea.

I don’t know if the idea will change the OSS landscape (hopefully), or just continue to be a strong selling point for CROSS Network Intelligence, but it has prompted me to think a little longer about innovating around OSS‘ biggest challenges.

Our standard approach of just adding more coats of process around our problems, or building up layers of incremental improvements isn’t going to solve them any time soon (as indicated in our OSS Call for Innovation). So how?

Firstly, we have to be able to articulate the problems! If we know what they are, perhaps we can then take inspiration from the CROSS innovation to spur us into new ways of thinking?

Our biggest problem is complexity. That has infiltrated almost every aspect of our OSS. There are so many posts about identifying and resolving complexity here on PAOSS that we might skip over that one in this post.

I decided to go back to a very old post that used the Toyota 5-whys approach to identify the real cause of the problems we face in OSS [I probably should update that analysis because I have a whole bunch of additional ideas now, as I’m sure you do too… suggested improvements welcomed BTW].

What do you notice about the root-causes in that 5-whys analysis? Most of the biggest causes aren’t related to system design at all (although there are plenty of problems to fix in that space too!). CROSS has tackled the data quality root-cause, but almost all of the others are human-centric factors – change controls, availability of skilled resources, requirement / objective mis-matches, stakeholder management, etc. Yet, we always seem to see OSS as a technical problem.

How do you fix those people challenges? Ken Segal puts it this way, “When process is king, ideas will never be. It takes only common sense to recognize that the more layers you add to a process, the more watered down the final work will become.” Easier said than done, but a worthy objective!

Blown away by one innovation – a follow-up concept

Last Friday’s blog discussed how I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades.

You can read more detail via the link, but the three major factors in this simple, elegant solution to data quality problems (probably OSS‘ biggest kryptonite) are:

  1. Being able to make connections that break standard object hierarchy rules; but
  2. Having the ability to mark that standard rules haven’t been followed; and
  3. Being able to uses the markers to prioritise the fixing of data at a more convenient time

It’s effectively point 2 that has me most excited. So novel, yet so obvious in hindsight. When doing data migrations in the past, I’ve used confidence flags to indicate what I can rely on and what needs further audit / remediation / cleansing. But the recent demo I saw of the CROSS product is the first time I’ve seen it built into the user interface of an OSS.

This one factor, if it spreads, has the ability to change OSS data quality in the same way that Likes (or equivalent) have changed social media by acting as markers of confidence / quality.

Think about this for a moment – what if everyone who interacts with an OSS GUI had the ability to rank their confidence in any element of data they’re touching, with a mechanism as simple as clicking a like/dislike button (or similar)?

Bad example here but let’s say field techs are given a design pack, and upon arriving at site, find that the design doesn’t match in-situ conditions (eg the fibre pairs they’re expecting to splice a customer lead-in cable to are already carrying live traffic, which they diagnose is due to data problems in an upstream distribution joint). Rather than jeopardising the customer activation window by having to spend hours/days fixing all the trickle-down effects of the distribution joint data, they just mark confidence levels in the vicinity and get the customer connected.

The aggregate of that confidence information is then used to show data quality heat maps and help remediation teams prioritise the areas that they need to work on next. It helps to identify data and process improvements using big circle and/or little circle remediation techniques.

Possibly the most important implication of the in-built ranking system is that everyone in the end-to-end flow, from order takers to designers through to coal-face operators, can better predict whether they need to cater for potential data problems.

Your thoughts?? In what scenarios do you think it could work best, or alternatively, not work?

I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades

Looking back, I now consider myself extremely lucky to have worked with an amazing product on the first OSS project I worked on (all the way back in 2000). And I say amazing because the underlying data models and core product architecture are still better than any other I’ve worked with in the two decades since. The core is the most elegant, simple and powerful I’ve seen to date. Most importantly, the models were designed to cope with any technology, product or service variant that could be modelled as a hierarchy, whether physical or virtual / logical. I never found a technology that couldn’t be modelled into the core product and it required no special overlays to implement a new network model. Sadly, the company no longer exists and the product is languishing on the books of the company that bought out the assets but isn’t leveraging them.

Having been so spoilt on the first assignment, I’ve been slightly underwhelmed by the level of elegant innovation I’ve observed in OSS since. That’s possibly part of the reason for the OSS Call for Innovation published late last year. There have been many exciting innovations introduced since, but many modern tools are still more complex and complicated than they should be, for implementers and operators alike.

But during a product demo last week, I was blown away by an innovation that was so simple in concept, yet so powerful that it is probably the single most impressive innovation I’ve seen since that first OSS. Like any new elegant solution, it left me wondering why it hasn’t been thought of previously. You’re probably wondering what it is. Well first let me start by explaining the problem that it seeks to overcome.

Many inventory-based OSS rely on highly structured and hierarchical data. This is a double-edged sword. Significant inter-relationship of data increases the insight generation opportunities, but the downside is that it can be immensely challenging to get the data right (and to maintain a high-quality data state). Limited data inter-relationships make the project easier to implement, but tend to allow less rich data analyses. In particular, connectivity data (eg circuits, cables, bearers, VPNs, etc) can be a massive challenge because it requires the linking of separate silos of data, often with no linking key. In fact, the data quality problem was probably one of the most significant root-causes of the demise of my first OSS client.

Now getting back to the present. The product feature that blew me away was the first I’ve seen that allows significant inter-relationship of data (yet in a simple data model), but still copes with poor data quality. Let’s say your OSS has a hierarchical data model that comprises Location, Rack, Equipment, Card, Port (or similar) and you have to make a connection from one device’s port to another’s. In most cases, you have to build up the whole pyramid of data perfectly for each device before you can create a customer connection between them. Let’s also say that for one device you have a full pyramid of perfect data, but for the other end, you only know the location.

The simple feature is to connect a port to a location now, or any other point to point on the hierarchy (and clean up the far-end data later on if you wish). It also allows the intermediate hops on the route to be connected at any point in the hierarchy. That’s too simple right, yet most inventory tools don’t allow connections to be made between different levels of their hierarchies. For implementers, data migration / creation / cleansing gets a whole lot simpler with this approach. But what’s even more impressive is that the solution then assigns a data quality ranking to the data that’s just been created. The quality ranking is subsequently considered by tools such as circuit design / routing, impact analysis, etc. However, you’ll have noted that the data quality issue still hasn’t been fixed. That’s correct, so this product then provides the tools that show where quality rankings are lower, thus allowing remediation activities to be prioritised.

If you have an inventory data quality challenge and / or are wondering the name of this product, it’s CROSS, from the team at CROSS Network Intelligence (

Is your data AI-ready (part 2)

Further to yesterday’s post that posed the question about whether your data was AI ready for virtualised network assurance use cases, I thought I’d raise a few more notes.

The two reasons posed were:

  1. Our data sets haven’t had time to collect much elastic / dynamic network data yet
  2. Our data is riddled with human-generated data that is error-prone

On the latter case in particular, I sense that we’re going to have to completely re-architect the way we collect and store assurance data. We’re almost definitely going to have to think in terms of automated assurance actions and related logging to avoid the errors of human data creation / logging. The question becomes whether it’s worthwhile trying to wrangle all of our old data into formats that the AI engines can cope with or do we just start afresh with new models? (This brings to mind the recent “perfect data” discussion).

It will be one thing to identify patterns, but another thing entirely to identify optimum response activities and to automate those.

If we get these steps right, does it become logical that the NOC (network) and SOC (security operations centre) become conjoined… at least much more so than they tend to be today? In other words, does incident management merge network incidents and security incidents onto common analysis and response platforms? If so, does that imply another complete re-architecture? It certainly changes the operations model.

I’d love to hear your thoughts and predictions.

Are your existing data sets actually suited to seeding an AI engine?

In the virtualization domain, the old root cause technology is becoming obsolete because resources and workloads move around dynamically – we no longer have fixed network and compute resources. Existing service assurance systems in the telecommunication network were designed to manage a fixed set of resources and these assurance systems fall short in monitoring dynamic virtualized networks. Code was written using a rule based approach on known problems. Some advances have been made to develop signature patterns to determine the root cause of a problem, but this approach will also fall short in a dynamic virtualized network where autonomous changes will occur continuously.”
Patrick Kelly

This quote is taken from a really interesting article by Patrick Kelly (see link above).

The old models of determining service impact and root-cause certainly struggle to hold up in the transient world of virtualised networks. Artificial Intelligence or Machine Learning or machine-led pattern identification, or whatever the technologies will be called by their developers, have a really important part to play in networks that are not just dynamic, but undergoing a touchpoint explosion.

The fascinating part of this story is that these clever new models will rely on data. Lots of data. We already have lots of data to feed into the new models. Buuuuut…. I’ve long held the reservation that there might be one slight problem… does all of our existing data actually suit the “AI” models available today?

Firstly, our existing data doesn’t include much of a history on dynamically transient networks. But the more important factor is that our networks have been managed by humans – operators who have a tendency of recording the quickest, dirtiest (and not necessarily correct or complete) set of data that allows them to restore service quickly.

Following a recent discussion with someone who’s running an AI assurance PoC for a big telco, it seems this reservation is turning out to be true. Their existing data sets just aren’t suited to the AI models. They’re having to reconsider their whole approach to their data model and how to collect / store it. They’re now starting to get positive results from the custom-built data sets.

It’s coming back to the same story as a post from last week – having connectors that can translate the different languages of ops, data, AI, etc and building a people / process / technology solution that the AI models can cope with.

You might not be ready to start an AI experiment yet, but you may like to start the journey by understanding whether your existing data is suited to AI modelling. If not, you get the chance to change it and have a great repository of data to seed an AI engine when you are ready in future. The first step on an exponential OSS journey.

Trickle-down impact planning

We introduced the concept of The Trickle-down Effect last year, an effect that sees the most minor changes trickling down through an OSS stack, with much bigger consequences than expected.

The trickle-down effect can be insidious, turning a nice open COTS solution into a beast that needs constant attention to cope with the most minor of operational changes. The more customisations made, the more gnarly the beast tends to be.”

Here’s an example I saw recently. An internal business unit wanted to introduce a new card type into the chassis set they managed. Speaking with the physical inventory team, it seemed the change was quite small and a budget was developed for the works… but the budget (dollars / time / risk) was about to blow out in a big way.

The new card wasn’t being picked up in their fault-management or performance management engines. It wasn’t picked up in key reports, nor was it being identified in the configuration management database or logical inventory. Every one of these systems needed interface changes. Not massive change obviously, but collectively the budget blew out by 10x and expedite changes pushed out the work previously planned by each of the interface development and testing teams.

These trickle-down impacts were known…. by some people…. but weren’t communicated to the business unit responsible for managing the new card type. There’s a possibility that they may not have even added the new card type if they realised the full OSS cost consequences.

Are these trickle-down impacts known and readily communicated within your OSS change processes?

Those who rule perfect data…

A Passionate About OSS article last month spoke of how the investment strategy of a $106 billion VC fund has changed my thinking on our OSS‘ most valuable asset. Masayoshi Son is quoted in that article as follows:

“Those who rule data will rule the entire world. That’s what people of the future will say.”

But one question keeps coming back to me… if you’re ruling poor quality data, will you rule nothing whatsoever?

Along the same lines, the old adage, “practice makes perfect,” is not very helpful if you’re not practicing in a constructive way. A better (albeit somewhat impossible) variant on the adage would be “PERFECT practice makes perfect.”

Let me share an example. There is a product that is completely ground-breaking in its ability to automate and optimise designs of large-scale network roll-outs – designs that include outside plant and access network technologies. In bake-offs with some of the best available network designers, this product and its algorithm consistently beats the humans by far more than 25% (when measured by capital costs, implementation time and various other metrics).

Its one challenge in taking over the world and automating every future network design is having a base set of data that is so perfect that no re-design work is required. For example, if the base data says a duct route is available and has capacity for inserting a cable, then the product assumes it can use the duct in its optimal design. But when the field techs arrive at site, they find the duct is too badly damaged to use or already filled to capacity with other cables that can’t be overhauled. A new optimal design has to be calculated to consider the lack of availability of that duct.

The tool still gives great results, even after all the manual intervention, but perfect source data would give breathtaking results.

So I’d look to make one small tweak to Masayoshi Son’s quote. “Those who rule PERFECT* data will rule the entire world. That’s what people of the future will say.”

* whereby perfect means as high in quality as realistically possible.

So, perhaps those expensive data audits and cumbersome data quality processes will have a far greater ROI (Return on Investment) in future than any of us could ever estimate.

The pruning saw technique for OSS fall-out management

Many different user journeys flow through our OSS every day. These include external / customer journeys, internal / operator journeys and possibly even machines-to-machine or system journeys. Unfortunately, not all of these journeys are correctly completed through to resolution.

The incomplete or unsatisfactory journeys could include inter-system fall-outs, customer complaints, service quality issues, and many more.

If we categorise and graph these unsuccessful journeys, we will often find a graph like the one below. This example shows that a small number of journey categories account for a large proportion of the problematic journeys. However, it also shows a long tail of other categories that are individually insignificant, but collectively significant.

Long tail of OSS demands

The place where everybody starts (including me) is to focus on the big wins on lhe left side of the graph. lt makes sense that this is where your efforts will have the biggest impacts. Unfortunately, since that’s where everybody focusses effort, chances are the significant gains have already been achieved and optimisation is already quite high (but numbers are still high due constraints within the existing solution stack).

The long tail intrigues me. It’s harder to build a business case for because there are many causes and little return for solving them. Alternatively, if we can slowly and systematically remove many of these rarer events, we’re removing the noise and narrowing our focus on the signal, thus simplifying our overall solution.

Since they’re statistically rarer, we can often afford to be more ruthless in the ways that we prevent them from occurring. But how do we identify and prevent them?

Each of the bars on the chart above represent leaves on a decision tree (faulty leaves in this case). If we work our way back up the branches, we can ruthlessly prune. Examples could be:

  • The removal of obscure service options that lead to faults
  • Reduction (or expansion or refinement) of process designs to better cope with boundary cases that generate problems
  • Removal of grandfathered products that have few customers, generate losses and consume disproportionate support effort
  • Removal or refinement of interfaces, especially east-west systems that contribute little to an end-to-end process but lead to many fall-outs
  • Identification of improved exception handling, either within systems or at hand-off points

I’m sure you can think of many more, especially when you start tracing back up decision trees with a pruning saw in hand!!


Interaction points with fast/slow processes

Further to yesterday’s post on fast / slow processes and factory platforms, a concept presented by Sylvain Denis of Orange in Melbourne last week, here’s a diagram from Sylvain’s presentation pack :

The yellow blocks represent the fast (automated) processes. The orange blocks represent the slow processes.

The next slide showed the human interaction points (blue boxes) into this API / factory stack.

Fast / Slow OSS processes

Yesterday’s post discussed using smart contracts and Network as a Service (NaaS) to give a network the properties that will allow it to self-heal.

It mentioned a couple of key challenges, one being that there will always be physical activities such as cable cuts fixes, faulty equipment replacement, physical equipment expansion / contraction / lifecycle-management.

In a TM Forum presentation last week, Sylvain Denis of Orange proposed the theory of fast and slow OSS processes. Fast – soft factories (software and logical resources) within the operations stack are inherently automatable (notwithstanding the complexities and cost-benefit dilemma of actually building automations). Slow – physical factories are slow processes as they usually rely on human tasks and/or have location constraints.

Orchestration relies on programmatic interfaces to both. Not all physical factories have programmatic interfaces in all OSS / BSS stacks yet. It will remain a key requirement for the forseeable future to be able to handle dual-speed processes / factories.

Bringing Eminem’s blank canvas to OSS

“When you start out in your career, you have a blank canvas, so you can paint anywhere that you want because the shit ain’t been painted on yet. And then your second album comes out, and you paint a little more and you paint a little more. By the time you get to your seventh and eighth album you’ve already painted all over it. There’s nowhere else to paint.”
Eminem. (on Rick Rubin and Malcolm Gladwell’s Broken Record podcast)

To each their own. Personally, Eminem’s music has never done it for me, whether his first or eighth album, but the quote above did strike a chord (awful pun).

It takes many, many hours to paint in the detail of an OSS painting. By the time a product has been going for a few years, there’s not much room left on the canvas and the detail of the existing parts of the work is so nuanced that it’s hard to contemplate painting over.

But this doesn’t consider that over the years, OSS have been painted on many different canvases. First there were mainframes, then client-server, relational databases, XaaS, virtualisation (of servers and networks), and a whole continuum in between… not to mention the future possibilities of blockchain, AI, IoT, etc. And that’s not even considering the changes in programming languages along the way. In fact, new canvases are now presenting themselves at a rate that’s hard to keep up with.

The good thing about this is that we have the chance to start over with a blank canvas each time, to create something uniquely suited to that canvas. However, we invariably attempt to bring as much of the old thinking across as possible, immediately leaving little space left to paint something new. Constraints that existed on the old canvas don’t always apply to each new canvas, but we still have a habit of bringing them across anyway.

We don’t always ask enough questions like:

  • Does this existing process still suit the new canvas
  • Can we skip steps
  • Can we obsolete any of the old / unused functionality
  • Are old and new architectures (at all levels) easily transmutable
  • Does the user interface need to be ported or replaced
  • Do we even need a user interface (assuming the rise of machine-to-machine with IoT, etc)
  • Does the old data model have any relevance to the new canvas
  • Do the assurance rules of fixed-network services still apply to virtualised networks
  • Do the fulfillment rules of fixed-network services still apply to virtualised networks
  • Are there too many devices to individually manage or can they be managed as a cohort
  • Does the new model give us access to new data and/or techniques that will allow us to make decisions (or derive insights) differently
  • Does the old billing or revenue model still apply to the new platform
  • Can we increase modularity and abstraction between modules

“The real reason “blockchain” or “AI” may actually change businesses now or in the future, isn’t that the technology can do remarkable things that can’t be done today, it’s that it provides a reason for companies to look at new ways of working, new systems and finally get excited about what can be done when you build around technology.”
Tom Goodwin

How the investment strategy of a $106 billion VC fund changed my OSS thinking

What is a service provider’s greatest asset?

Now I’m biased when considering the title question, but I believe OSS are the puppet-master of every modern service provider. They’re the systems that pull all of the strings of the organisation. They generate the revenue by operationalising and assuring the networks as well as the services they carry. They coordinate the workforce. They form the real-time sensor networks that collect and provide data, but more importantly, insights to all parts of the business. That, and so much more.

But we’re pitching our OSS all wrong. Let’s consider first how we raise revenue from OSS, be that either internal (via internal sponsors) or external (vendor/integrator selling to customers)? Most revenue is either generated from products (fixed, leased, consumption revenue models) or services (human effort).

This article from just last month ruminated, “An organisation buys an OSS, not because it wants an Operational Support System, but because it wants Operational Support,” but I now believe I was wrong – charting the wrong course in relation to the most valuable element of our OSS.

After researching Masayoshi Son’s Vision Fund, I’m certain we’re selling a fundamentally short-term vision. Yes, OSS are valuable for the operational support they provide, but their greatest value is as vast data collection and processing engines.

“Those who rule data will rule the entire world. That’s what people of the future will say.”
Masayoshi Son.

For those unfamiliar with Masayoshi Son, he’s Japan’s richest man, CEO of SoftBank, in charge of a monster (US$106 billion) venture capital fund called Vision Fund and is seen as one of the world’s greatest technology visionaries.

As this article on Fortune explains Vision Fund’s foundational strategy, “…there’s a slide that outlines the market cap of companies during the Industrial Revolution, including the Pennsylvania Railroad, U.S. Steel, and Standard Oil. The next frontier, he [Son] believes, is the data revolution. As people like Andrew Carnegie and John D. Rockefeller were able to drastically accelerate innovation by having a very large ownership over the inputs of the Industrial Revolution, it looks like Son is trying to do something similar. The difference being he’s betting on the notion that data is one of the most valuable digital resources of modern day.”

Matt Barnard is CEO of Plenty, one of the companies that Vision Fund has invested in. He had this to say about the pattern of investments by Vision Fund, “I’d say the thing we have in common with his other investments is that they are all part of some of the largest systems on the planet: energy, transportation, the internet and food.”

Telecommunications falls into that category too. SoftBank already owns significant stakes in telecommunications and broadband network providers.

But based on the other investments made by Vision Fund so far, there appears to be less focus on operational data and more focus on customer activity and decision-making data. In particular, unravelling the complexity of customer data in motion.

OSS “own” service provider data, but I wonder whether we’re spending too much time thinking about operational data (and how to feed it into AI engines to get operational insights) and not enough on stitching customer-related insight sets together. That’s where the big value is, but we’re rarely thinking about it or pitching it that way… even though it is perhaps the most valuable asset a service provider has.

The chains of integration are too light until…

Chains of habit are too light to be felt until they are too heavy to be broken.”
Warren Buffett (although he attributed it to an unknown author, perhaps originating with Samuel Johnson).

What if I were to replace the word “habit” in the quote above with “OSS integration” or “OSS customisation” or “feature releases?”

The elegant quote reflects this image:

The chains of feature releases are light at t0. They’re much heavier at t0+100.

Like habits, if we could project forward and see the effects, would we allow destructive customisations to form in their infancy? The problem is that we don’t see them as destructive at infancy. We don’t see how they entangle.

Posing a Network Data Synchronisation Protocol (NDSP) concept

Data quality is one of the biggest challenges we face in OSS. A product could be technically perfect, but if the data being pumped into it is poor, then the user experience of the product will be awful – the OSS becomes unusable, and that in itself generates a data quality death spiral.

This becomes even more important for the autonomous, self-healing, programmable, cooperative networks being developed (think IoT, virtualised networks, Self-Organizing Networks). If we look at IoT networks for example, they’ll be expected to operate unattended for long periods, but with code and data auto-propagating between nodes to ensure a level of self-optimisation.

So today I’d like to pose a question. What if we could develop the equivalent of Network Time Protocol (NTP) for data? Just as NTP synchronises clocking across networks, Network Data Synchronisation Protocol (NDSP) would synchronise data across our networks through a feedback-loop / synchronisation algorithm.

Of course there are differences from NTP. NTP only tries to coordinate one data field (time) along a common scale (time as measured along a 64+64 bits continuum). The only parallel for network data is in life-cycle state changes (eg in-service, port up/down, etc).

For NTP, the stratum of the clock is defined (see image below from wikipedia).

This has analogies with data, where some data sources can be seen to be more reliable than others (ie primary sources rather than secondary or tertiary sources). However, there are scenarios where stratum 2 sources (eg OSS) might push state changes down through stratum 1 (eg NMS) and into stratum 0 (the network devices). An example might be renaming of a hostname or pushing a new service into the network.

One challenge would be the vast different data sets and how to disseminate / reconcile across the network without overloading it with management / communications packets. The other would be that format consistency. I once had a device type that had four different port naming conventions, and that was just within its own NMS! Imagine how many port name variations (and translations) might have existed across the multiple inventories that exist in our networks. The good thing about the NDSP concept is that it might force greater consistency across different vendor platforms.

Another would be that NDSP would become a huge security target as it would have the power to change configurations and because of its reach through the network.

So what do you think? Has the NDSP concept already been developed? Have you implemented something similar in your OSS? What are the scenarios in which it could succeed? Or fail?

I’m predicting the demise of the OSS horse

“What will telcos do about the 30% of workers AI is going to displace?”
Dawn Bushaus

That question, which is the headline of Dawn’s article on TM Forum’s Inform platform, struck me as being quite profound.

As an aside, I’m not interested in the number – the 30% – because I concur with Tom Goodwin’s sentiments on LinkedIn, “There is a lot of nonsense about AI.
Next time someone says “x% of businesses will be using AI by 2020” or “AI will be worth $xxxBn by 2025” or any of those other generic crapspeak comments, know that this means nothing.
AI is a VERY broad area within computer science that includes about 6-8 very different strands of work. It spans robotics, image recognition, machine learning, natural language processing, speech recognition and far more. Nobody agrees on what is and isn’t in this.
This means it covers everything from superintelligence to artificial creativity to chatbots

For the purpose of this article, let’s just say that in 5 years AI will replace a percentage of jobs that we in tech / telco / OSS are currently doing. I know at least a few telcos that have created updated operating plans built around a headcount reduction much greater than the 30% mentioned in Dawn’s article. This is despite the touchpoint explosion and increased complexity that is already beginning to crash down onto us and will continue apace over the next 5 years.

Now, assuming you expect to still be working in 5 years time and are worried that your role might be in the disappearing 30% (or whatever percentage), what do you do now?

First, figure out what the modern equivalents of the horse are in the context of Warren Buffett’s quote below:

“What you really should have done in 1905 or so, when you saw what was going to happen with the auto is you should have gone short horses. There were 20 million horses in 1900 and there’s about 4 million now. So it’s easy to figure out the losers, the loser is the horse. But the winner is the auto overall. [Yet] 2000 companies (carmakers) just about failed.”

It seems impossible to predict how AI (all strands) might disrupt tech / telco / OSS in the next 5 years – and like the auto industry, more impossible to predict the winners (the technologies, the companies, the roles). However, it’s almost definitely easier to predict the losers.

Massive amounts are being invested into automation (by carriers, product vendors and integrators), so if the investments succeed, operational roles are likely to be net losers. OSS are typically built to make operational roles more efficient – but if swathes of operator roles are automated, then does operational support also become a net loser? In its current form, probably yes.

Second, if you are a modern-day horse, ponder which of your skills are transferable into the future (eg chassis building, brakes, steering, etc) and which are not (eg buggy-whip making, horse-manure collecting, horse grooming, etc). Assuming operator-driven OSS activities will diminish, but automation (with or without AI) will increase, can you take your current networks / operations knowledge and combine that with up-skilling in data, software and automation tools?

Even if OSS user interfaces are made redundant by automation and AI, we’ll still need to feed the new technologies with operations-style data, seed their learning algorithms and build new operational processes around them.

The next question is double-edged – for both individuals and telcos alike – how are you up-skilling for a future without horses?

Are your various device inventory repositories in synch?

Does your organisation have a number of different device inventory repositories?
Hint: You might even be surprised by how many you have.

Examples include:

  • Physical network inventory
  • Logical network inventory
  • DNS records
  • CMDB (Config Management DB)
  • IPAM (IP Address Management)
  • EMS (Element Management Systems)
  • SIEM (Security Information and Event Management)
  • Desktop / server management
  • not to mention the management information base on the devices themselves (the only true primary data source)

Have you recently checked how in-synch they are? As a starting point, are device counts aligned (noting that different repositories perhaps only cover subsets of device ranges)? If not, how many cross-match between repositories?

If they’re out of synch, do you have a routine process for triangulating / reconciling / cleansing? Even better, do you have an automated, closed-loop process for ensuring synchronisation across the different repositories?

Would anyone like to offer some thoughts on the many reasons why it’s important to have inventory alignment?

I’ll give a few little hints:

  • Security
  • Assurance
  • Activations
  • Integrations
  • Asset lifecycle management
  • Financial management (ie asset depreciation)

Torturous OSS version upgrades

Have you ever worked on an OSS where a COTS (Commercial Off-The-Shelf) solution has been so heavily customised that implementing the product’s next version upgrade has become a massive challenge? The solution has become so entangled that if the product was upgraded, it would break the customisations and/or integrations that are dependent upon that product.

This trickle-down effect is the perfect example of The Chess-board Analogy or The Tech-debt Wreck at work. Unfortunately, it is far too common, particularly in large, complex OSS environments.

The OSS then either has to:

  • skip the upgrade or
  • take a significant cost/effort hit and perform an upgrade that might otherwise be quite simple.

If the operator decides to take the “skip” path for a few upgrades in a row, then it gets further from the vendor’s baseline and potentially misses out on significant patches, functionality or security hardening. Then, when finally making the decision to upgrade, a much more complex project ensues.

It’s just one more reason why a “simple” customisation often has a much greater life-cycle cost than was initially envisaged.

How to reduce the impact?

  1. We’ve recently spoken about using RPA tools for pseudo-integrations, allowing the operator to leave the COTS product un-changed, but using RPA to shift data between applications
  2. Attempt to achieve business outcomes via data / process / config changes to the COTS product rather than customisations
  3. Enforce a policy of integration as a last resort as a means of minimising the chess-board implications (ie attempting to solve problems via processes, in data, etc before considering any integration or customisation)
  4. Enforcing modularity in the end-to-end architecture via carefully designed control points, microservices, etc

There are probably many other methods that I’m forgetting about whilst writing the article. I’d love to hear the approach/es you use to minimise the impact of COTS version upgrades. Similarly, have you heard of any clever vendor-led initiatives that are designed to minimise upgrade costs and/or simplify the upgrade path?