In recent years, we’ve been trying to generate at least one deep-dive tech report per year here at PAOSS. Last year it was AIOps of the Future, the year before it was The Most Exciting OSS/BSS Innovations, etc. They tend to be an itch that we want to scratch for ourselves, but they’ve had an awesome side-benefit. Some really big names have reached out and asked us to present the findings of these reports.
Just last week, we gave a presentation to one of the big-name consulting firms about our AIOps report. Such an honour! We used a specially modified version of the presentation below. We’ve also used it as the basis for presentations to experts within two of the world’s biggest-name telcos.
What’s even cooler than being asked to give the presentations is the opportunity to share ideas with the many insightful attendees during the in-depth Q&A after each session.
The follow-on session last week was a prime example. This prestigious consulting firm is renowned for organisational / tech transformations, so naturally the conversation centred around the progressive benefit realisation and evolving ops models (at around 25:30 in the video).
They rightly latched onto a looming dilemma facing the telco industry (although it’s probably also true of almost any firm that comes to rely heavily on AI).
AIOps has the potential to progressively handle more and more event situations automatically, as per this asymptote diagram in the presentation.
As we move into the future of this graph from T0 to T10 and beyond, AIOps and associated automations progressively handle more incidents without human involvement (the blue bars get ever-closer to 100%). We aim to pick the easiest, most important and most impactful incident types and automate those first (T1). Then we take the next most important (T2), then the next (T3), etc. But we tend to see a diminishing return as we get closer to the orange asymptote. None of this is the dilemma. It’s the exact reason for AIOps – to reduce the noise and the human involvement. We want the smallest possible number of incidents bubbling up that require a human to action (the white-space between the blue bars and the orange line) and for the machines to handle the rest.
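To make the diminishing return concrete, here is a minimal Python sketch of that prioritisation. Everything in it is an assumption for illustration rather than data from the report: the function name `coverage_by_phase`, the 200 hypothetical incident types, the Zipf-like volumes (a few very common types and a long tail of rare ones) and the choice to automate five types per phase. It ranks incident types by volume, automates the biggest first, and reports the cumulative share handled by machines after each phase.

```python
def coverage_by_phase(volumes, per_phase, phases):
    """Return the cumulative share of incident volume automated after each phase.

    Incident types are automated greedily, highest volume first.
    """
    total = sum(volumes)
    ranked = sorted(volumes, reverse=True)
    coverage = []
    automated = 0.0
    for phase in range(phases):
        # Automate the next batch of most frequent incident types.
        batch = ranked[phase * per_phase:(phase + 1) * per_phase]
        automated += sum(batch)
        coverage.append(automated / total)
    return coverage


# Hypothetical volumes: type N occurs 1/N as often as the most common type.
volumes = [1000 / rank for rank in range(1, 201)]

coverage = coverage_by_phase(volumes, per_phase=5, phases=10)
# Gain per phase: how much extra coverage each phase adds over the previous one.
gains = [coverage[0]] + [b - a for a, b in zip(coverage, coverage[1:])]

for phase, (cov, gain) in enumerate(zip(coverage, gains), start=1):
    print(f"T{phase}: automated {cov:.1%} of incidents (+{gain:.1%})")
```

Under these assumptions each phase adds less coverage than the one before it, and the share left for humans is made up entirely of the rare, long-tail incident types.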
But this is also the crux of the dilemma we face. The tiny number of incidents that do bubble up will tend to either be the obscure or highly complex event patterns that the machines can’t easily resolve and/or have never been seen before (black swan events). In a recent network resilience study we did for a high-profile T1 telco, we found that almost all Sev1 outages were complex combinations of events – black swans.
Network virtualisation is arguably making the combinations even more complex. It’s in these patterns where highly specialised resolution skills are required. Let’s call these experts “ninjas” as they tend to have highly advanced training combined with decades of experience. These ninjas have already done their apprenticeships. They’ve generally worked their way up from solving basic problems and have seen any number of event patterns along the way. They’ve followed the path from apprentice to master.
What happens though when there is no longer a need for the apprentices? AIOps is handling the apprentice-level patterns. In doing so, it removes the pathway to becoming the masters / ninjas that every telco will desperately need to resolve / avoid Sev1s (for the foreseeable future anyway). [As an analogy, they won’t have the chance to make lots of pots, but will be unrealistically expected to make perfect pots].
Exacerbating this problem further, many telcos have been outsourcing / off-shoring these apprenticeship roles for the last few decades. They’re now incredibly concerned about the skills shortage (skills cliff?) because their current ninjas are approaching retirement and the next generation of workers are either employed by other companies or do/will rely on machines to resolve most situations automatically. It seems that our people, process and technology decisions are looking at the short game, not the long game. We haven’t created a skills shortage as much as we’ve created a skills chasm – an ever-widening gap between apprentice and ninja – a chasm that we may no longer be able to bridge.
One other problem is that most operational tools we use today were designed back in the days when all operations were handled by humans. Few OSS tools have gone through a radical re-framing exercise to design user interfaces that cater for the new norm that will soon arise, where the machines handle the volume and the ninjas solve only the complex resolutions. The UI needs to be designed to decipher complexity, not to handle volume. The user experience is far different, as are the tools required.
I also see the parallels with yesterday’s article about navigating our way across the OSS savannah. Due to tech proliferation, it’s less likely that any of us will be able to know the entire tech estate as a single, all-powerful ninja. If we’ve done the apprenticeships, we’ll have a general feel for the “lay of the land,” but it’s impossible to personally have ninja-level expertise across all facets of our OSS, network, infrastructure, data architectures, service offerings, etc. We need to work as teams – a combination of the super-connectors that know who to call for any given situation and the super-experts that know how to efficiently navigate their small sections of the savannah. We need to develop better Ninja Academies (hat-tip to Carolyn for creating that trademark 🙂 )
Correspondingly, we’ll also need to develop tools that coordinate the collaboration between super-experts in much more coherent ways than the trouble ticket “tick and flick” approach of the past.
What are you seeing? Do you agree that there’s a skills chasm? Do you agree that the gap is widening? If so, what mechanisms are you putting in place to avoid it? Do you disagree? Are you seeing other patterns evolving? How are you positioning yourself careers-wise to play into the opportunities that this will create into the future? As always, I’d love to hear your thoughts via a personal message and/or in the comments below.