Using OSS machine learning to predict backwards not forwards

There’s a lot of excitement about what machine-led decisioning can introduce into the world of network operations, and rightly so. Excitement about predictions, automation, efficiency, optimisation, zero-touch assurance, etc.

There are so many use-cases that disruptors are proposing to solve using Artificial Intelligence (AI), Machine Learning (ML) and the like. I might have even been guilty of proposing a few ideas here on the PAOSS blog – ideas like closed-loop OSS learning / optimisation of common processes, the use of Robotic Process Automation (RPA) to reduce swivel-chairing and a few more.

But have you ever noticed that most of these use-cases are forward-looking? That is, using past data to look into the future (eg predictions, improvements, etc). What if we took a slightly different approach and used past data to reconcile what we’ve just done?

AI and ML models tend to rely on lots of consistent data. OSS produce or collect lots of consistent data, so we get a big tick there. The only problem is that a lot of our manually created OSS data is riddled with inconsistency, due to nuances in the way that each worker performs workflows, data entry, etc as well as our own personal biases and perceptions.

For example, zero-touch automation has significant cachet at the moment. To achieve that, we need network health indicators (eg alarms, logs, performance metrics), but also a record of interventions that have (hopefully) improved on those indicators. If we have inconsistent or erroneous trouble-ticketing information, our AI is going to struggle to figure out the right response to automate. Network operations teams tend to want to fix problems quickly, get the ticketing data populated as fast as possible and move on to the next fault, even if that means creating inconsistent ticketing data.

So, to get back to looking backwards, a machine-learning use-case to consider today is to look at what an operator has solved, compare it to past resolutions and either validate the accuracy of their resolution data or even auto-populate fields based on the operator’s actions.

Hat tip to Jay Fenton for the idea seed for this post.

How economies of unscale change the OSS landscape

For more than a century, economies of scale made the corporation an ideal engine of business. But now, a flurry of important new technologies, accelerated by artificial intelligence (AI), is turning economies of scale inside out. Business in the century ahead will be driven by economies of unscale, in which the traditional competitive advantages of size are turned on their head.
Economies of unscale are enabled by two complementary market forces: the emergence of platforms and technologies that can be rented as needed. These developments have eroded the powerful inverse relationship between fixed costs and output that defined economies of scale. Now, small, unscaled companies can pursue niche markets and successfully challenge large companies that are weighed down by decades of investment in scale — in mass production, distribution, and marketing
.”
Hemant Taneja with Kevin Maney
in their Sloan Review article, “The End of Scale.”

There are two pathways I can envisage OSS playing a part in the economies of unscale indicated in the Sloan Review quote above.

The first is the changing way of working towards smaller, more nimble organisations, which includes increasing freelancing. There are already many modularised activities managed within an OSS, such as field work, designs, third-party service bundling, where unscale is potentially an advantage. OSS natively manages all these modules with existing tools, whether that’s ticketing, orchestration, provisioning, design, billing, contract management, etc.

Add smart contract management and John Reilly’s value fabric will undoubtedly increase in prevalence. John states that a value fabric is a mesh of interwoven, cooperating organizations and individuals, called parties, who directly or indirectly deliver value to customers. It gives the large, traditional network operators the chance to be more creative in their use of third parties when they look beyond their “Not Invented Here” syndrome of the past. It also provides the opportunity to develop innovative supply and procurement chains (meshes) that can generate strategic competitive advantage.

The second comes with an increasing openness to using third-party platforms and open-source OSS tools within operator environments. The OSS market is already highly fragmented, from multi-billion dollar companies (by market capitalisation) through to niche, even hobby, projects. However, there tended to be barriers to entry for the small or hobbyist OSS provider – they either couldn’t scale their infrastructure or they didn’t hold the credibility mandated by risk averse network operators.

As-a-Service platforms have changed the scale dynamic because they now allow OSS developers to rent infrastructure on a pay-as-you-eat model. In other words, the more their customers consume, the more infrastructure an OSS supplier can afford to rent from platforms such as AWS. More importantly, this become a possibility because operators are now increasingly open to renting third-party services on shared (but compartmentalised / virtualised) infrastructure. BTW. When I say “infrastructure” here, I’m not just talking about compute / network / storage but also virtualisation, containerisation, databases, AI, etc, etc.

Similarly, the credibility barrier-to-entry is being pulled down like the Berlin Wall as operators are increasingly investing in open-source projects. There are large open-source OSS projects / platforms being driven by the carriers themselves (eg ONAP, OpenStack, OPNFV, etc) that are accommodative of smaller plug-in modules. Unlike the proprietary, monolithic OSS/BSS stacks of the past, these platforms are designed with collaboration and integration being front-of-mind.

However, there’s an element of “potential” in these economies of unscale. Andreas Hegers likens open-source to the wild west, as many settlers seek to claim their patch of real-estate in an uncharted map. Andreas states further, “In theory, vendor interoperability from open source should be convenient — even harmonious — with innovations being shared like recipes. Unfortunately for many, the system has not lived up to this reality.”

Where do you sit on the potential of economies of unscale and open-source OSS?

Getting lost in the flow of OSS

The myth is that people play games because they want to avoid challenging work. The reality is, people play games to engage in well-designed, challenging work. The only thing they are avoiding is poorly designed work. In essence, we are replacing poorly designed work with work that provides a more meaningful challenge and offers a richer sense of progress.
And we should note at this point that just because something is a game, it doesn’t mean it’s good. As we’ll soon see, it can be argued that everything is a game. The difference is in the design.
Really good games have been ruthlessly play-tested and calibrated to the point where achieving a state of flow is almost guaranteed for many. Play-testing is just another word for iterative development, which is essentially the conducting of progressive experiments
.”
Dr Jason Fox
in his book, “The Game Changer.”

Reflect with me for a moment – when it comes to your OSS activities, in which situations do you consistently get into a state of flow?

For me, it’s in quite a few different scenarios, but one in particular stands out – building up a network model in an inventory management tool. This activity starts with building models / patterns of devices, services, connections, etc, then using the models to build a replica of the network, either manually or via data migration, within the inventory tool(s). I can lose complete track of time when doing this task. In fact I have almost every single time I’ve performed this task.

Whilst not being much of a gamer, I suspect it’s no coincidence that by far my favourite video game genre is empire-building strategy games like the Civilization series. Back in the old days, I could easily get lost in them for hours too. Could we draw a comparison from getting that same sense of achievement, seeing a network (of devices in OSS, of cities in the empire strategy games) grow rapidly as a result of your actions?

What about fans of first-person shooter games? I wonder whether they get into a state of flow on assurance activities, where they get to hunt down and annihilate every fault in their terrain?

What about fans of horse grooming and riding games? Well…. let’s not go there. 🙂

Anyway, enough of all these reflections and musings. I would like to share three concepts with you that relate to Dr Fox’s quote above:

  1. Gamification – I feel that there is MASSIVE scope for gamification of our OSS, but I’ve yet to hear of any OSS developers using game design principles
  2. Play-testing – How many OSS are you aware of that have been, “ruthlessly play-tested and calibrated?” In almost every OSS situation I’ve seen, as soon as functionality meets requirements, we stop and move on to the next feature. We don’t pause and try a few more variants to see which is most likely to result in a great design, refining the solution, “to the point where achieving a state of flow is almost guaranteed for many
  3. Richer Progress – How many of our end-to-end workflows are designed with, “a richer sense of progress” in mind? Feedback tends to come through retrospective reporting (if at all), rarely through the OSS game-play itself. Chances are that our end-to-end processes actually flow through multiple un-related applications, so it comes back to clever integration design to deliver more compelling feedback. We simply don’t use enough specialist creative designers in OSS

An OSS automation mind-flip

I recently had something of a perspective-flip moment in relation to automation within the realm of OSS.

In the past, I’ve tended to tackle the automation challenge from the perspective of applying automated / scripted responses to tasks that are done manually via the OSS. But it’s dawned on me that I have it around the wrong way! It is an incremental perspective on the main objective of automations – global zero-touch networks.

If we take all of the tasks performed by all of the OSS around the globe, the number of variants is incalculable… which probably means the zero-touch problem is unsolvable (we might be able to solve for many situations, but not all).

The more solvable approach would be to develop a more homogeneous approach to network self-care / self-optimisation. In other words, the majority of the zero-touch challenge is actually handled at the equivalent of EMS level and below (I’m perhaps using out-dated self-healing terminology, but hopefully terminology that’s familiar to readers) and only cross-domain issues bubble up to OSS level.

As the diagram below describes, each layer up abstracts but connects (as described in more detail in “What an OSS shouldn’t do“). That is, each higher layer in the stack reduces the amount if information/control within a domain that it’s responsible for, but it assumes more a broader responsibility for connecting domains together.
OSS abstract and connect

The abstraction process reduces the number of self-healing variants the OSS needs to handle. But to cope with the complexity of self-caring for connected domains, we need a more homogeneous set of health information being presented up from the network.

Whereas the intent model is designed to push actions down into the lower layers with a standardised, simplified language, this would be the reverse – pushing network health knowledge up to higher layers to deal with… in a standard, consistent approach.

And BTW, unlike the pervading approach of today, I’m clearly saying that when unforeseen (or not previously experienced) scenarios appear within a domain, they’re not just kicked up to OSS, but the domains are stoic enough to deal with the situation inside the domain.

Using OSS/BSS to steer the ship

For network operators, our OSS and BSS touch most parts of the business. The network, and the services they carry, are core business so a majority of business units will be contributing to that core business. As such, our OSS and BSS provide many of the metrics used by those business units.

This is a privileged position to be in. We get to see what indicators are most important to the business, as well as the levers used to control those indicators. From this privileged position, we also get to see the aggregated impact of all these KPIs.

In your years of working on OSS / BSS, how many times have you seen key business indicators that are conflicting between business units? They generally become more apparent on cross-team projects where the objectives of one internal team directly conflict with the objectives of another internal team/s.

In theory, a KPI tree can be used to improve consistency and ensure all business units are pulling towards a common objective… [but what if, like most organisations, there are many objectives? Does that mean you have a KPI forest and the trees end up fighting for light?]

But here’s a thought… Have you ever seen an OSS/BSS suite with the ability to easily build KPI trees? I haven’t. I’ve seen thousands of standalone reports containing myriad indicators, but never a consolidated roll-up of metrics. I have seen a few products that show operational metrics rolled-up into a single dashboard, but not business metrics. They appear to have been designed to show an information hierarchy, but not necessarily with KPI trees in mind specifically.

What do you think? Does it make sense for us to offer KPI trees as base product functionality from our reporting modules? Would this functionality help our OSS/BSS add more value back into the businesses we support?

The Goldilocks OSS story

We all know the story of Goldilocks and the Three Bears where Goldilocks chooses the option that’s not too heavy, not too light, but just right.

The same model applies to OSS – finding / building a solution that’s not too heavy, not too light, but just right. To be honest, we probably tend to veer towards the too heavy, especially over time. We put more complexity into our architectures, integrations and customisations… because we can… which end up burdening us and our solutions.

A perfect example is AT&T offering its ECOMP project (now part of the even bigger Linux Foundation Network Fund) up for open source in the hope that others would contribute and help mature it. As a fairytale analogy, it’s an admission that it’s too heavy even for one of the global heavyweights to handle by itself.

The ONAP Charter has some great plans including, “…real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.”

These are fantastic ambitions to strive for, especially at the Pappa Bear end of the market. I have huge admiration for those who are creating and chasing bold OSS plans. But what about for the large majority of customers that fall into the Goldilocks category? Is our field of vision so heavy (ie so grand and so far into the future) that we’re missing the opportunity to solve the business problems of our customers and make a difference for them with lighter solutions today?

TM Forum’s Digital Transformation World is due to start in just over two weeks. It will be fascinating to see how many of the presentations and booths consider the Goldilocks requirements. There probably won’t be many because it’s just not as sexy a story as one that mentions heavy solutions like policy-driven orchestration, zero-touch automation, AI / ML / analytics, self-scaling / self-healing networks, etc.

[I should also note that I fall into the category of loving to listen to the heavy solutions too!! ]

Networks lead. OSS are an afterthought. Time for a change?

In a recent post, we described how changing conditions in networks (eg topologies, technologies, etc) cause us to reconsider our OSS.

Networks always lead and OSS (or any form of network management including EMS/NMS) is always an afterthought. Often a distant afterthought.

But what if we spun this around? What if OSS initiated change in our networks / services? After all, OSS is the platform that operationalises the network. So instead of attempting to cope with a huge variety of network options (which introduces a massive number of variants and in turn, massive complexity, which we’re always struggling with in our OSS), what if we were to define the ways that networks are operationalised?

Let’s assume we want to lead. What has to happen first?

Network vendors tend to lead currently because they’re offering innovation in their networks, but more importantly on the associated services supported over the network. They’re prepared to take the innovation risk knowing that operators are looking to invest in solutions they can offer to their customers (as products / services) for a profit. The modus operandi is for operators to look to network vendors, not OSS vendors / integrators, to help to generate new revenues. It would take a significant perception shift for operators to break this nexus and seek out OSS vendors before network vendors. For a start, OSS vendors have to create a revenue generation story rather than the current tendency towards a cost-out business case.

ONAP provides an interesting new line of thinking though. As you know, it’s an open-source project that represents multiple large network operators banding together to build an innovative new approach to OSS (even if it is being driven by network change – the network virtualisation paradigm shift in particular). With a white-label, software-defined network as a target, we have a small opening. But to turn this into an opportunity, our OSS need to provide innovation in the services being pushed onto the SDN. That innovation has to be in the form of services/products that are readily monetisable by the operators.

Who’s up for this challenge?

As an aside:
If we did take the lead, would our OSS look vastly different to what’s available today? Would they unequivocally need to use the abstract model to cope with the multitude of scenarios?

A purple cow in our OSS paddock

A few years ago, I read a book that had a big impact on the way I thought about OSS and OSS product development. Funnily enough, the book had nothing to do with OSS or product development. It was a book about marketing – a subject that I wasn’t very familiar with at the time, but am now fascinated with.

And the book? Purple Cow by Seth Godin.
Purple Cow

The premise behind the book is that when we go on a trip into the countryside, we notice the first brown or black cows, but after a while we don’t pay attention to them anymore. The novelty has worn off and we filter them out. But if there was a purple cow, that would be remarkable. It would definitely stand out from all the other cows and be talked about. Seth promoted the concept of building something into your products that make them remarkable, worth talking about.

I recently heard an interview with Seth. Despite the book being launched in 2003, apparently he’s still asked on a regular basis whether idea X is a purple cow. His answer is always the same – “I don’t decide whether your idea is a purple cow. The market does.”

That one comment brought a whole new perspective to me. As hard as we might try to build something into our OSS products that create a word-of-mouth buzz, ultimately we don’t decide if it’s a purple cow concept. The market does.

So let me ask you a question. You’ve probably seen plenty of different OSS products over the years (I know I have). How many of them are so remarkable that you want to talk about them with your OSS colleagues, or even have a single feature that’s remarkable enough to discuss?

There are a lot of quite brilliant OSS products out there, but I would still classify almost all of them as brown cows. Brilliant in their own right, but unremarkable for their relative sameness to lots of others.

The two stand-out purple cows for me in recent times have been CROSS’ built-in data quality ranking and Moogsoft’s Incident Room model. But it’s not for me to decide. The market will ultimately decide whether these features are actual purple cows.

I’d love to hear about your most memorable OSS purple cows.

You may also be wondering how to go about developing your own purple OSS cow. Well I start by asking, “What are people complaining about?” or “What are our biggest issues?” That’s where the opportunities lie. Once discovering those issues, the challenge is solving the problem/s in an entirely different, but better, way. I figure that if people care enough to complain about those issues, then they’re sure to talk about any product that solves the problem for them.

Designing OSS to cope with greater transience

There are three broad models of networking in use today. The first is the adaptive model where devices exchange peer information to discover routes and destinations. This is how IP networks, including the Internet, work. The second is the static model where destinations and pathways (routes) are explicitly defined in a tabular way, and the final is the central model where destinations and routes are centrally controlled but dynamically set based on policies and conditions.”
Tom Nolle here.

OSS of decades past worked best with static networks. Services / circuits that were predominantly “nailed up” and (relatively) rarely changed after activation. This took the real-time aspect out of play and justified the significant manual effort required to establish a new service / circuit.

However, adaptive and centrally managed networks have come to dominate the landscape now. In fact, I’m currently working on an assignment where DWDM, a technology that was once largely static, is now being augmented to introduce an SDN controller and dynamic routing (at optical level no less!).

This paradigm shift changes the fundamentals of how OSS operate. Apart from the physical layer, network connectivity is now far more transient, so our tools must be able to cope with that. Not only that, but the changes are too frequent to justify the manual effort of the past.

To tie in with yesterday’s post, we are again faced with the option of abstract / generic modelling or specific modelling.

Put another way, we have to either come up with adaptive / algorithmic mechanisms to deal with that transience (the specific model), or need to mimic “nailed-up” concepts (the abstract model).

More on the implications of this tomorrow.

When your ideas get stolen

When your ideas get stolen.
A few meditations from Seth Godin:
“Good for you. Isn’t it better that your ideas are worth stealing? What would happen if you worked all that time, created that book or that movie or that concept and no one wanted to riff on it, expand it or run with it? Would that be better?
You’re not going to run out of ideas. In fact, the more people grab your ideas and make magic with them, the more of a vacuum is sitting in your outbox, which means you will prompted to come up with even more ideas, right?
Ideas that spread win. They enrich our culture, create connection and improve our lives. Isn’t that why you created your idea in the first place?
The goal isn’t credit. The goal is change.”

A friend of mine has lots of great ideas. Enough to write a really valuable blog. Unfortunately he’s terrified that someone else will steal those ideas. In the meantime, he’s missing out on building a really important personal brand for himself. Do you know anyone like him?

The great thing about writing a daily blog is that it forces you to generate lots of ideas. It forces you to be constantly thinking about your subject matter and how it relates to the world. Putting them out there in the hope that others want to run with them, in the hope that they spread. In the hope that others will expand upon them and make them more powerful, teaching you along the way. At over 2000 posts now, it’s been an immensely enriching experience for me anyway. As Seth states, the goal is definitely change and we can all agree that OSS is in desperate need for change.

It is incumbent on all of us in the OSS industry to come up with a constant stream of ideas – big and small. That’s what we tend to do on a daily basis right? Do yours tend towards the smaller end of the scale, to resolve daily delivery tasks or the larger end of the scale, to solve the industry’s biggest problems?

Of your biggest ideas, how do you get them out into the world for others to riff on? How many of your ideas have been stolen and made a real difference?

If someone rips off your ideas, it’s a badge of honour and you know that you’ll always generate more…unless you don’t let your idea machine run.

Re-writing the Sales vs Networks cultural divide

Brand, marketing, pricing and sales were seen as sexy. Networks and IT were the geeks no one seemed to speak to or care about. … This isolation and excommunication of our technical team had created an environment of disillusion. If you wanted something done the answer was mostly ‘No – we have no budget and no time for that’. Our marketing team knew more about loyalty points … than about our own key product, the telecommunications network.”
Olaf Swantee
, from his book, “4G Mobile Revolution”

Great note here (picked up by James Crawshaw at Heavy Reading). It talks about the great divide that always seems to exist between Sales / Marketing and Network / Ops business units.

I’m really excited about the potential for next generation OSS / orchestration / NaaS (Network as a Service) architectures to narrow this divide though.

In this case:

  1. The Network is offered as a microservice (let’s abstractly call them Resource Facing Services [RFS]);
  2. Sales / Marketing construct customer offerings (let’s call them Customer Facing Services [CFS]) from those RFS; and
  3. There’s a catalog / orchestration layer that marries the CFS with the cohesive set of RFS

The third layer becomes a meet-in-the-middle solution where Sales / Marketing comes together with Network / Ops – and where they can discuss what customers want and what the network can provide.

The RFS are suitably abstracted that Sales / Marketing doesn’t need to understand the network and complexity that sits behind the veil. Perhaps it’s time for Networks / Ops to shine, where the RFS can be almost as sexy as CFS (am I falling too far into the networks / geeky side of the divide?  🙂  )

The CFS are infinitely composable from RFS (within the constraints of the RFS that are available), allowing Sales / Marketing teams to build whatever they want and the Network / Ops teams don’t have to be constantly reacting to new customer offerings.

I wonder if this revolution will give Olaf cause to re-write this section of his book in a few years, or whether we’ll still have the same cultural divide despite the exciting new tools.

Does the death of ATM bear comparison with telco-grade open-source OSS?

Hands up if you’re old enough to remember ATM here? And I don’t mean the type of ATM that sits on the side of a building dispensing cash – no I mean Asynchronous Transfer Mode.

For those who aren’t familiar with ATM, a little background. ATM was THE telco-grade packet-switching technology of choice for most carriers globally around the turn of the century. Who knows, there might still be some ATM switches/routers out there in the wild today.

ATM was a powerful beast, with enormous configurability and custom-designed with immense scale in mind. It was created by telco-grade standards bodies with the intent of carrying voice, video, data, whatever, over big data pipes.

With such pedigree, you may be wondering then, how it was beaten out by a technology that was designed to cheaply connect small groups of computers clustered within 100 metres of each other (and a theoretical maximum bandwidth of 10Mbps).

Why does the technology that scaled up to become carrier Ethernet exist in modern telco networks, whereas ATM is largely obsoleted? Others may beg to differ, and there are probably a multitude of factors, but I feel it boils down to operational simplicity. Customers wanted operational simplicity and operators didn’t want to have a degree in ATM just to be able to drive it. By being designed to be all things to all people (carriers), did that make ATM compromised from the start?

Now I’ll state up front that I love the initiative and collaboration being shown by many of the telcos in committing to open-source programs like ONAP. It’s a really exciting time for the industry. It’s a sign that the telcos are wresting control back from the vendors in terms of driving where the collective innovation goes.

Buuuuuuut…..

Just like with ATM, are the big open source programs just too big and too complicated? Do you need a 100% focus on ONAP to be able to make it work, or even to follow all the moving parts? Are these initiatives trying to be all things to all carriers instead of changing needs to more simplified use cases?

Sometimes the ‘right’ way to do it just doesn’t exist yet, but often it does exist but is very expensive. So, the question is whether the ‘cheap, bad’ solution gets better faster than the ‘expensive, good’ solution gets cheap. In the broader tech industry (as described in the ‘disruption’ concept), generally the cheap product gets good. The way that the PC grew and killed specialized professional hardware vendors like Sun and SGi is a good example. However, in mobile it has tended to be the other way around – the expensive good product gets cheaper faster than the cheap bad product can get good.”
Ben Evans
here.

Is there an Ethernet equivalent in the OSS world, something that’s “cheap, bad” but getting better (and getting customer buy-in) rapidly?

Blown away by one innovation. Now to extend on it

Our most recent two posts, from yesterday and Friday, have talked about one stunningly simple idea that helps to overcome one of OSS‘ biggest challenges – data quality. Those posts have stimulated quite a bit of dialogue and it seems there is some consensus about the cleverness of the idea.

I don’t know if the idea will change the OSS landscape (hopefully), or just continue to be a strong selling point for CROSS Network Intelligence, but it has prompted me to think a little longer about innovating around OSS‘ biggest challenges.

Our standard approach of just adding more coats of process around our problems, or building up layers of incremental improvements isn’t going to solve them any time soon (as indicated in our OSS Call for Innovation). So how?

Firstly, we have to be able to articulate the problems! If we know what they are, perhaps we can then take inspiration from the CROSS innovation to spur us into new ways of thinking?

Our biggest problem is complexity. That has infiltrated almost every aspect of our OSS. There are so many posts about identifying and resolving complexity here on PAOSS that we might skip over that one in this post.

I decided to go back to a very old post that used the Toyota 5-whys approach to identify the real cause of the problems we face in OSS [I probably should update that analysis because I have a whole bunch of additional ideas now, as I’m sure you do too… suggested improvements welcomed BTW].

What do you notice about the root-causes in that 5-whys analysis? Most of the biggest causes aren’t related to system design at all (although there are plenty of problems to fix in that space too!). CROSS has tackled the data quality root-cause, but almost all of the others are human-centric factors – change controls, availability of skilled resources, requirement / objective mis-matches, stakeholder management, etc. Yet, we always seem to see OSS as a technical problem.

How do you fix those people challenges? Ken Segal puts it this way, “When process is king, ideas will never be. It takes only common sense to recognize that the more layers you add to a process, the more watered down the final work will become.” Easier said than done, but a worthy objective!

Blown away by one innovation – a follow-up concept

Last Friday’s blog discussed how I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades.

You can read more detail via the link, but the three major factors in this simple, elegant solution to data quality problems (probably OSS‘ biggest kryptonite) are:

  1. Being able to make connections that break standard object hierarchy rules; but
  2. Having the ability to mark that standard rules haven’t been followed; and
  3. Being able to uses the markers to prioritise the fixing of data at a more convenient time

It’s effectively point 2 that has me most excited. So novel, yet so obvious in hindsight. When doing data migrations in the past, I’ve used confidence flags to indicate what I can rely on and what needs further audit / remediation / cleansing. But the recent demo I saw of the CROSS product is the first time I’ve seen it built into the user interface of an OSS.

This one factor, if it spreads, has the ability to change OSS data quality in the same way that Likes (or equivalent) have changed social media by acting as markers of confidence / quality.

Think about this for a moment – what if everyone who interacts with an OSS GUI had the ability to rank their confidence in any element of data they’re touching, with a mechanism as simple as clicking a like/dislike button (or similar)?

Bad example here but let’s say field techs are given a design pack, and upon arriving at site, find that the design doesn’t match in-situ conditions (eg the fibre pairs they’re expecting to splice a customer lead-in cable to are already carrying live traffic, which they diagnose is due to data problems in an upstream distribution joint). Rather than jeopardising the customer activation window by having to spend hours/days fixing all the trickle-down effects of the distribution joint data, they just mark confidence levels in the vicinity and get the customer connected.

The aggregate of that confidence information is then used to show data quality heat maps and help remediation teams prioritise the areas that they need to work on next. It helps to identify data and process improvements using big circle and/or little circle remediation techniques.

Possibly the most important implication of the in-built ranking system is that everyone in the end-to-end flow, from order takers to designers through to coal-face operators, can better predict whether they need to cater for potential data problems.

Your thoughts?? In what scenarios do you think it could work best, or alternatively, not work?

I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades

Looking back, I now consider myself extremely lucky to have worked with an amazing product on the first OSS project I worked on (all the way back in 2000). And I say amazing because the underlying data models and core product architecture are still better than any other I’ve worked with in the two decades since. The core is the most elegant, simple and powerful I’ve seen to date. Most importantly, the models were designed to cope with any technology, product or service variant that could be modelled as a hierarchy, whether physical or virtual / logical. I never found a technology that couldn’t be modelled into the core product and it required no special overlays to implement a new network model. Sadly, the company no longer exists and the product is languishing on the books of the company that bought out the assets but isn’t leveraging them.

Having been so spoilt on the first assignment, I’ve been slightly underwhelmed by the level of elegant innovation I’ve observed in OSS since. That’s possibly part of the reason for the OSS Call for Innovation published late last year. There have been many exciting innovations introduced since, but many modern tools are still more complex and complicated than they should be, for implementers and operators alike.

But during a product demo last week, I was blown away by an innovation that was so simple in concept, yet so powerful that it is probably the single most impressive innovation I’ve seen since that first OSS. Like any new elegant solution, it left me wondering why it hasn’t been thought of previously. You’re probably wondering what it is. Well first let me start by explaining the problem that it seeks to overcome.

Many inventory-based OSS rely on highly structured and hierarchical data. This is a double-edged sword. Significant inter-relationship of data increases the insight generation opportunities, but the downside is that it can be immensely challenging to get the data right (and to maintain a high-quality data state). Limited data inter-relationships make the project easier to implement, but tend to allow less rich data analyses. In particular, connectivity data (eg circuits, cables, bearers, VPNs, etc) can be a massive challenge because it requires the linking of separate silos of data, often with no linking key. In fact, the data quality problem was probably one of the most significant root-causes of the demise of my first OSS client.

Now getting back to the present. The product feature that blew me away was the first I’ve seen that allows significant inter-relationship of data (yet in a simple data model), but still copes with poor data quality. Let’s say your OSS has a hierarchical data model that comprises Location, Rack, Equipment, Card, Port (or similar) and you have to make a connection from one device’s port to another’s. In most cases, you have to build up the whole pyramid of data perfectly for each device before you can create a customer connection between them. Let’s also say that for one device you have a full pyramid of perfect data, but for the other end, you only know the location.

The simple feature is to connect a port to a location now, or any other point to point on the hierarchy (and clean up the far-end data later on if you wish). It also allows the intermediate hops on the route to be connected at any point in the hierarchy. That’s too simple right, yet most inventory tools don’t allow connections to be made between different levels of their hierarchies. For implementers, data migration / creation / cleansing gets a whole lot simpler with this approach. But what’s even more impressive is that the solution then assigns a data quality ranking to the data that’s just been created. The quality ranking is subsequently considered by tools such as circuit design / routing, impact analysis, etc. However, you’ll have noted that the data quality issue still hasn’t been fixed. That’s correct, so this product then provides the tools that show where quality rankings are lower, thus allowing remediation activities to be prioritised.

If you have an inventory data quality challenge and / or are wondering the name of this product, it’s CROSS, from the team at CROSS Network Intelligence (www.cross-ni.com).

Is your data AI-ready (part 2)

Further to yesterday’s post that posed the question about whether your data was AI ready for virtualised network assurance use cases, I thought I’d raise a few more notes.

The two reasons posed were:

  1. Our data sets haven’t had time to collect much elastic / dynamic network data yet
  2. Our data is riddled with human-generated data that is error-prone

On the latter case in particular, I sense that we’re going to have to completely re-architect the way we collect and store assurance data. We’re almost definitely going to have to think in terms of automated assurance actions and related logging to avoid the errors of human data creation / logging. The question becomes whether it’s worthwhile trying to wrangle all of our old data into formats that the AI engines can cope with or do we just start afresh with new models? (This brings to mind the recent “perfect data” discussion).

It will be one thing to identify patterns, but another thing entirely to identify optimum response activities and to automate those.

If we get these steps right, does it become logical that the NOC (network) and SOC (security operations centre) become conjoined… at least much more so than they tend to be today? In other words, does incident management merge network incidents and security incidents onto common analysis and response platforms? If so, does that imply another complete re-architecture? It certainly changes the operations model.

I’d love to hear your thoughts and predictions.

Are your existing data sets actually suited to seeding an AI engine?

In the virtualization domain, the old root cause technology is becoming obsolete because resources and workloads move around dynamically – we no longer have fixed network and compute resources. Existing service assurance systems in the telecommunication network were designed to manage a fixed set of resources and these assurance systems fall short in monitoring dynamic virtualized networks. Code was written using a rule based approach on known problems. Some advances have been made to develop signature patterns to determine the root cause of a problem, but this approach will also fall short in a dynamic virtualized network where autonomous changes will occur continuously.”
Patrick Kelly
here.

This quote is taken from a really interesting article by Patrick Kelly (see link above).

The old models of determining service impact and root-cause certainly struggle to hold up in the transient world of virtualised networks. Artificial Intelligence or Machine Learning or machine-led pattern identification, or whatever the technologies will be called by their developers, have a really important part to play in networks that are not just dynamic, but undergoing a touchpoint explosion.

The fascinating part of this story is that these clever new models will rely on data. Lots of data. We already have lots of data to feed into the new models. Buuuuut…. I’ve long held the reservation that there might be one slight problem… does all of our existing data actually suit the “AI” models available today?

Firstly, our existing data doesn’t include much of a history on dynamically transient networks. But the more important factor is that our networks have been managed by humans – operators who have a tendency of recording the quickest, dirtiest (and not necessarily correct or complete) set of data that allows them to restore service quickly.

Following a recent discussion with someone who’s running an AI assurance PoC for a big telco, it seems this reservation is turning out to be true. Their existing data sets just aren’t suited to the AI models. They’re having to reconsider their whole approach to their data model and how to collect / store it. They’re now starting to get positive results from the custom-built data sets.

It’s coming back to the same story as a post from last week – having connectors that can translate the different languages of ops, data, AI, etc and building a people / process / technology solution that the AI models can cope with.

You might not be ready to start an AI experiment yet, but you may like to start the journey by understanding whether your existing data is suited to AI modelling. If not, you get the chance to change it and have a great repository of data to seed an AI engine when you are ready in future. The first step on an exponential OSS journey.

One sentence to make most OSS experts cringe

Let me warn you. The following sentence is going to make many OSS experts cringe, maybe even feel slightly disgusted, but take the time to read the remainder of the post and ponder how it fits within your specific OSS context/s.

“Our OSS need to help people spend money!”

Notice the word is “help” and not “coerce?” This is not a post about turning our OSS into sales tools, well, not directly anyway.

May I ask you a question – Do you ever spend time thinking about how your OSS is helping your customer’s customer (which I’ll refer to as the end-customer) to spend their money? And I mean making it easier for them to buy the stuff they want to buy in return for some form of value / utility, not trick or coerce them into buying stuff they don’t want.

Let me step you through the layers of thinking here.

The first layer for most OSS experts is their direct customer, which is usually the service provider or enterprise that buys and operates the OSS. We might think they are buying an OSS, but we’re wrong. An organisation buys an OSS, not because it wants an Operational Support System, but because it wants Operational Support.

The second layer is a distinct mindset change for most OSS experts. Following on from the first layer, OSS has the potential to be far more than just operational support. Operational support conjures up the image of being a cost-centre, or something that is a necessary evil of doing business (ie in support of other revenue-raising activities). To remain relevant and justify OSS project budgets, we have to flip the cost-centre mentality and demonstrate a clear connection with revenue chains. The more obvious the connection, the better. Are you wondering how?

That’s where the third layer comes in. We have to think hard about the end-customer and empathise with their experiences. These experiences might be a consumer to a service provider’s (your direct customer) product offerings. It might even be a buying cycle that the service provider’s products facilitate. Either way, we need to simplify their ability to buy.

So let’s work back up through those layers again:
Layer 3 – If end-customers find it easier to buy stuff, then your customer wins more revenue (and brand value)
Layer 2 – If your customer sees that its OSS / BSS has unquestionably influenced revenue increase, then more is invested on OSS projects
Layer 1 – If your customer recognises that your OSS / BSS has undeniably influenced the increased OSS project budget, you too get entrusted with a greater budget to attempt to repeat the increased end-customer buy cycle… but only if you continue to come up with ideas that make it easier for people (end-customers) to spend their money.

At what layer does your thinking stop?

How smart contracts might reduce risk and enhance trust on OSS projects

Last Friday, we spoke about all wanting to develop trusted OSS supplier / customer relationships but rarely finding them and a contrarian factor for why trust is so hard to achieve in OSS – complexity.

Trust is the glue that allows OSS projects to happen. Not only that, it becomes a catch-22 with complexity. If OSS partners don’t trust each other, requirements, contracts, etc get more complex as a self-protection barrier. But with every increase in complexity, there becomes an increasing challenge to deliver and hence, risk of further reduction in trust.

On a smaller scale, you’ve seen it on all projects – if the project starts to falter, increased monitoring attention is placed on the project, which puts increased administrative load on the project team and reduces the time they have to deliver the intended outcomes. Sometimes the increased admin / report gains the attention of sponsors and access to additional resources, but usually it just detracts from the available delivery capability.

Vish Nandlall also associates trust and complexity in organisational models in his LinkedIn post below:

This is one of the reasons I’m excited about what smart contracts can do for the organisations and OSS projects of the future. Just as “Likes” and “Supplier Rankings” have facilitated online trust models, smart contracts success rankings have the ability to do the same for OSS suppliers, large and small. For example, rather than needing to engage “Big Vendor A” to build your entire, monolithic OSS stack, if an operator develops simpler, more modular work breakdowns (eg microservices), then they can engage “Freelancer B” and “Small Vendor C” to make valuable contributions on smaller risk increments. Being lower in complexity and risk means B and C have a greater chance of engendering trust, but their historical contract success ranking forces them to develop trust as a key metric.

The challenges in transforming network assurance to network healing

A couple of interesting concepts have the ability to fundamentally change the way networks and services are maintained. If they can be harnessed, we could replace the term “network assurance” with “network healing.”

The first concept is SON, which has been formulated specifically with mobile radio networks in mind, but has the potential to extend into all network types.

A Self-Organizing Network (SON) is an automation technology designed to make the planning, configuration, management, optimization and healing of mobile radio access networks simpler and faster.”
Wikipedia

One of the challenges of creating self-organising, self-optimising, self-healing networks is that every network has physical points of failure – cable cuts, equipment failure, etc. These can’t be fixed with software alone. That’s where the second concept comes in.

The second concept is smart-contract technology (possibly facilitated by Blockchain), which provides the potential for a more automated way of engaging a mini procurement / delivery / test / payment process to fix physical problems (or logical for that matter). Whilst the work might be done in the physical world, it could be done by third-parties, initiated by the OSS via microservice. Network Fix as a Service (NFaaS), with implementation, test, acceptance and payment all done in software as far as the OSS sees it.

To an extent this already happens via the issuance of ToW (Tickets of Work) to third party fault-fix teams, but it’s normally a significantly manual process currently.

However, the bigger challenge of transforming network assurance to network healing is to find a way to self-heal services that span multiple network domains. This could be physical network functions (PNF), virtual network functions (VNF) and the myriad topologies, technologies and protocols that interconnect them.

I can’t help but think that to simplify (self-healing) we first have to simplify (network variant minimisation).

If we can drastically reduce the number of variants, we have a better chance of building self-heal automations… and don’t just tell me that AI engines will solve all these problems! Maybe one day, but perhaps we can start with baby steps first.