The layers of ITIL redundancy

Today’s post is something of a heretical one, especially for believers in ITIL. In the world of OSS, we look to build in layers of resiliency, not layers of redundancy.

The following diagram and subsequent text in italics describe a typical ITIL process and are all taken from https://www.computereconomics.com/article.cfm?id=1074

Example of relationship between ITIL incidents, problems, and changes

The sequence of events as shown in Figure 1 is as follows:

  • At TIME = 0, an External Event is detected by the Incident Management process. This could be as simple as a customer calling to say that service is unavailable, or it could be an automated alert from a system monitoring device. The incident owner logs and classifies this as incident i2. Then, the incident owner tries to match i2 to known errors, work-arounds, or temporary fixes, but cannot find a match in the database.
  • At TIME = 1, the incident owner dispatches a problem request to the Problem Management process anticipating a work-around, temporary fix, or other assistance. In doing so, the incident owner has prompted the creation of Problem p2.
  • At TIME = 2, the problem owner of p2 returns the expected temporary fix to the incident owner of i2.  Note that both i2 and p2 are active and exist simultaneously. The incident owner for i2 applies the temporary fix.
  • In this case, the work-around requires a change request. So, at TIME = 3, the incident owner for i2 initiates change request c2.
  • The change request c2 is applied successfully, and at TIME = 4, c2 is closed. Note that for a while i2, p2 and c2 all exist simultaneously.
  • Because c2 was successful, the incident owner for i2 can now confirm that the incident is resolved. At TIME = 5, i2 is closed. However, p2 remains active while the problem owner searches for a permanent fix. The problem owner for p2 would be responsible for implementing the permanent fix and initiating any necessary change requests.

But I look at it slightly differently. At their root, why do Incident Management, Problem Management and Change Management exist? They’re all just mechanisms for resolving a system health issue. If we can detect an event (or events) and fix it, we don’t have to expend all the effort of flicking tickets around.

Thinking within the T2R paradigm of trouble -> incident -> problem -> change -> resolve holds us back. If we can skip the middle steps and immediately associate a resolution with the event (or events), we get a whole lot more efficient. If we can immediately relate a trigger with a reaction, we can also get rid of the intermediate ticket-flicking and its processing cycle time.

So, the NOC of the future surely requires us to build a trigger -> reaction recommendation engine (and body of knowledge). That’s a more powerful tool to supply to our NOC operators than incidents, problems and change requests. (Easier to write about than to actually solve, of course.)
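If that sounds abstract, here’s a minimal sketch of what such a trigger -> reaction recommendation engine could look like. Everything in it (the event signature, the reaction names, the scoring) is a hypothetical stand-in for a real body of knowledge:

```python
from collections import defaultdict

class TriggerReactionRecommender:
    """Toy body of knowledge: maps event signatures to reactions that worked."""

    def __init__(self):
        # signature -> {reaction: count of successful resolutions}
        self.history = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def signature(event):
        # Crude signature: device type + alarm code. A real engine would use
        # richer features (topology, time of day, correlated events, etc).
        return (event["device_type"], event["alarm_code"])

    def record_outcome(self, event, reaction, resolved):
        if resolved:
            self.history[self.signature(event)][reaction] += 1

    def recommend(self, event, top_n=3):
        candidates = self.history.get(self.signature(event), {})
        return sorted(candidates, key=candidates.get, reverse=True)[:top_n]

# Seed with past outcomes, then ask for a recommendation on a fresh event.
engine = TriggerReactionRecommender()
engine.record_outcome({"device_type": "router", "alarm_code": "LOS"}, "restart_port", True)
engine.record_outcome({"device_type": "router", "alarm_code": "LOS"}, "restart_port", True)
engine.record_outcome({"device_type": "router", "alarm_code": "LOS"}, "swap_sfp", True)
print(engine.recommend({"device_type": "router", "alarm_code": "LOS"}))
# -> ['restart_port', 'swap_sfp']
```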

The TMN model suffers from modern network anxiety

As the TMN diagram below describes, each layer up in the network management stack abstracts but connects (as described in more detail in “What an OSS shouldn’t do“). That is, each higher layer reduces the amount of information/control within the domain it’s responsible for, but assumes a broader responsibility for connecting multiple domains together.
OSS abstract and connect

There’s just one problem with the diagram. It’s a little dated when we take modern virtualised infrastructure into account.

In the old days, despite what the layers may imply, it was common for an OSS to actually touch every layer of the pyramid to resolve faults. That is, OSS regularly connected to NMS, EMS and even devices (NE) to gather network health data. The services defined at the top of the stack (BSS) could be traced to the exact devices (NE / NEL) via the circuits that traversed them, regardless of the layers of abstraction. This helped with root-cause analysis (RCA) and service impact analysis (SIA).

But with modern networks, the infrastructure is virtualised, load-balanced and since they’re packet-switched, they’re completely circuitless (I’m excluding virtual circuits here by the way). The bottom three layers of the diagram could effectively be replaced with a cloud icon, a cloud that the OSS has little chance of peering into (see yellow cloud in the diagram later in this post).

The concept of virtualisation adds many sub-layers of complexity too, by the way, as highlighted in the diagram below.

ONAP triptych

So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the bottom, other than to say the services consume from a known pool of resources. Fault resolution becomes more abstracted as a result.

But what’s interesting is that there’s another layer that’s not shown on the typical TMN model above. That is the physical network inventory (PNI) layer. The cables, splices, joints, patch panels, equipment cards, etc that underpin every network. Yes, even virtual networks.

In the old networks the OSS touched every layer, including the missing layer. That functionality was provided by PNI management. Fault resolution also occurred at this layer through tickets of work conducted by the field workforce (Workforce Management – WFM).

In new networks, OSS/BSS tie services to resource pools (the top two layers). They also still manage PNI / WFM (the bottom, physical layer). But then there’s potentially an invisible cloud in the middle. Three distinctly different pieces, probably each managed by a different business unit or operational group.
BSS OSS cloud abstract

Just wondering – has your OSS/BSS developed control anxiety issues from losing some of the control that it once had?

OSS data that’s even more useless than useless

About 6-8 years ago, I was becoming achingly aware that I’d passed well beyond an information overload (I-O) threshold. More information was reaching my brain each day than I was able to assimilate, process and archive. What to do?

Well, I decided to stop reading newspapers and watching the news, in fact almost all television. I figured that those information sources were empty calories for the brain. At first it was just a trial, but I found that I didn’t miss it much at all and continued. Really important news seemed to find me at the metaphorical water-cooler anyway.

To be completely honest, I’m still operating beyond the I-O threshold, but at least it’s (arguably) now a more healthy information diet. I’m now far more useless at trivia game shows, which could be embarrassing if I ever sign up as a contestant on “Who Wants to be a Millionaire.” And missing out on the latest news sadly makes me far less capable of advising the Queen on how to react to Meghan Markle’s latest royal “atrocity.” The crosses we bear.

But I’m digressing markedly (and Markle-ey) from what this blog is all about – O.S.S.

Let me ask you a question – Is your OSS data like almost everybody else’s (ie also in I-O mode)?

Seth Godin recently quoted 3 rules of data:
“First, don’t collect data unless it has a non-zero chance of changing your actions.
Second, before you seek to collect data, consider the costs of processing that data.
Third, acknowledge that data collected isn’t always accurate, and consider the costs of acting on data that’s incorrect.”

If I remove the double-negative from rule #1, it reads: “Only collect data if it has even the slightest chance of changing your actions.”

Most people take the perspective that we might as well collect everything because storage is just getting so cheap (and we should keep it, not because we ever use it, but just in case our AI tools eventually find some relevance locked away inside it).

In the meantime, pre-AI (rolls eyes), Seth’s other two rules provide further sanity to the situation. Storing data is cheap; processing it so that it’s timely and accurate enough to take decisive, reliable actions on is not.

So, let me go back to the revised quote above. How much of the data in your OSS database / data-lake / data-warehouse / etc has even the slightest chance of changing your actions? As a percentage?

I suspect a majority is never used. And as most of it ages, it becomes even more useless than useless. One wonders, why are we storing it then?

Zero Touch Assurance – ZTA (part 3)

This is the third in a series on ZTA, following on from yesterday’s post that suggested intentionally triggering events to allow the accumulation of a much larger library of historical network data.

Today we’ll look at the impact of data collection on our ability to achieve ZTA and refer back to part 1 in the series too.

  1. Monitoring – There is monitoring the events that happen in the network and responding manually
  2. Post-cognition – There is monitoring events that happen in the network, comparing them to past events and actions (using analytics to identify repeating patterns), using the past to recommend (or automate) a response
  3. Pre-cognition – There is identifying events that have never happened in the network before, yet still being able to provide a recommended / automated response

In my early days of OSS projects, it was common for network performance data to be collected at 15-minute intervals at best. Sometimes even less frequently, if polling put too much load on the processor of a given network device. That was useful for long- and medium-term trend analysis, but averaging across the 15-minute period meant that significant performance events could be missed. Back in those days it was mostly Stage 1 – Monitoring. Stage 2, Post-cognition, was unsophisticated at a system level (eg manually adjusting threshold-crossing event levels), so it relied on talented operators who could remember similar events in the past.

If we want to reach the goal of ZTA, we have to drastically reduce measurement / polling / notification intervals. Ideally, we want to get to near-real-time speeds at every step of the pipeline:

  • To extract (from the device/EMS)
  • To transform / normalise the data (different devices may use different counter models for example)
  • To load
  • To identify patterns (15 minute poll cycles disguise too many events)
  • To compare with past patterns
  • To compare with past responses / results
  • To recommend or automate a response

I’m sure you can see the challenge here. The faster the poll cycle, the more data that needs to be processed. It can be a significant feat of engineering to process large data volumes at near-real-time speeds (streaming analytics) on large networks.
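To make the poll-cycle trade-off tangible, here’s a toy sketch of a streaming check over per-device counters. The window size, threshold and values are all assumptions, not a real streaming-analytics implementation:

```python
import statistics
from collections import deque

WINDOW = 60  # eg 60 x 1-second polls, versus one 15-minute average

windows = {}  # device_id -> recent samples

def ingest(device_id, value):
    """Keep a short sliding window and flag samples that deviate sharply."""
    window = windows.setdefault(device_id, deque(maxlen=WINDOW))
    anomaly = False
    if len(window) >= 10:
        mean = statistics.mean(window)
        stdev = statistics.pstdev(window) or 1e-9
        anomaly = abs(value - mean) > 3 * stdev  # simple 3-sigma test
    window.append(value)
    return anomaly

# A single-sample spike that a 15-minute average would hide entirely.
for t in range(120):
    value = 50 if t != 100 else 500
    if ingest("device-1", value):
        print(f"t={t}s: pattern worth comparing against past events")
```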

Zero Touch Assurance – ZTA (part 2)

Yesterday we described the three steps on the path to Zero Touch Assurance:

  1. Monitoring – Monitoring the events that happen in the network and responding manually
  2. Post-cognition – Monitoring events / trends that happen in the network, comparing them to past situations (using analytics to identify repeating patterns), using the past to recommend (or automate) a response
  3. Pre-cognition – Identification of events / trends that have never happened in the network before, yet still being able to provide a recommended / automated response

At face value, it seems that we need pre-cognition to be able to achieve ZTA, but we also seem to be some distance away from achieving step 3 technologically (please correct me if I’m wrong here!). So today we propose a possible alternative that uses only the more achievable step 2 technology.

The weakness of Post-cognition is that it’s only as useful as the history of past events that it can call upon. But rather than waiting for events to naturally occur, perhaps we could constantly trigger simulated events and reactions to seed the historical database with a far greater set of data to call upon. In other words, pull all the levers to ensure that there is no event that has never happened before. The problem with this brute-force approach is that the constant tinkering could trigger a catastrophic network failure. We want to build up a library of all possible situations, but without putting live customer services at risk.

So we could run many of the more risky, cascading or long-run variants on what other industries might call a “digital twin” of the network instead. By their nature of storing all the current operating data about a given network, an OSS could already be considered to be a digital twin. We’d just need to build the sophisticated, predictive simulations to run on the twin.
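Here’s a minimal sketch of the seeding idea. The fault dimensions, reactions and simulator below are all invented stand-ins for a real digital twin:

```python
import itertools
import random

# Hypothetical fault dimensions to sweep on the twin, rather than waiting
# for each combination to occur naturally in the live network.
FAULTS = ["fibre_cut", "card_failure", "congestion"]
LOCATIONS = ["core", "metro", "access"]
REACTIONS = ["reroute", "restart", "dispatch_field_tech"]

def simulate_on_twin(fault, location, reaction):
    """Stand-in for a real simulation run; returns whether service recovered."""
    rng = random.Random(f"{fault}:{location}:{reaction}")  # repeatable toy outcome
    return rng.random() > 0.5

# Brute-force the combination space safely on the twin, recording outcomes
# to seed the post-cognition history.
history = [
    {"fault": f, "location": l, "reaction": r, "recovered": simulate_on_twin(f, l, r)}
    for f, l, r in itertools.product(FAULTS, LOCATIONS, REACTIONS)
]
print(f"{len(history)} seeded scenarios, "
      f"{sum(h['recovered'] for h in history)} with a working reaction")
```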

More to come tomorrow when we discuss how data collection impacts our ability to achieve ZTA.

Zero Touch Assurance – ZTA (part 1)

A couple of years ago, we published a series on pre-cognitive OSS based on the following quote by Ben Evans about three classes of search/discovery:

  1. There is giving you what you already know you want (Amazon, Google)
  2. There is working out what you want (Amazon and Google’s aspiration)
  3. And then there is suggesting what you might want (Heywood Hill).

Today, I look to apply a similar model towards the holy grail of OSS – Zero Touch Assurance (ZTA).

  1. Monitoring – There is monitoring the events that happen in the network and responding manually
  2. Post-cognition – There is monitoring events that happen in the network, comparing them to past events and actions (using analytics to identify repeating patterns), using the past to recommend (or automate) a response
  3. Pre-cognition – There is identifying events that have never happened in the network before, yet still being able to provide a recommended / automated response

The third step, pre-cognition, is where the supposed holy grail lies. It’s where everyone talks about the prospect of AI solving all of our problems. It seems we’re still a way off this happening.

But I wonder whether the actual ZTA solution might be more of a brute-force version of step 2 – post-cognition?

More on that tomorrow.

To link or not to link your OSS. That is the question

The first OSS project I worked on had a full-suite, single-vendor solution. All products within the suite were integrated into a single database, which allowed their product developers to introduce a lot of cross-linking. That has its strengths and weaknesses.

The second OSS suite I worked with came from one of the world’s largest network vendors and integrators. Their suite primarily consisted of third-party products that they integrated together for the customer. It was (arguably) best-of-breed, all implemented as a single solution, but since the products were disparate, there was very little cross-linking. This approach also has strengths and weaknesses.

I’d become so used to the massive data migration and cross-referencing exercise required by the first OSS that I was stunned by the lack of time allocated by the second vendor for their data migration activities. The first took months and a significant level of expertise. The second took days and only required fairly simple data sets. That’s a plus for the second OSS.

However, the second OSS was severely lacking in cross-domain data, which impacted the richness of insight that could be easily unlocked.

Let me give an example to provide better context.

We know that a trouble ticketing system is responsible for managing the tracking, reporting and resolution of problems in a network operator’s network. This could be as simple as a repository for storing a problem identifier and a list of notes recording the actions performed to resolve the problem. There’s almost no cross-linking required.

A more referential ticketing system might have links to:

  • Alarm management – to show the events linked to the problem
  • Inventory management – to show the impacted (or possibly impacted) resources
  • Service management – to show the services impacted
  • Customer management – to show the customers impacted and possibly the related customer interactions
  • Spares management – to show the life-cycle of physical resources impacted
  • Workforce management – to manage the people / teams performing restorative actions
  • etc

The referential ticketing system gives far richer information, obviously, but you have to trade that off against the amount of integration and data maintenance that needs to go into supporting it. The question to ask is what level of linking is justifiable from a cost-benefit perspective.
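To put the trade-off in data-model terms, here’s a bare-bones sketch (field names are mine, not from any particular product). The simple ticket stands alone; the referential one carries a linking key into each neighbouring system, and every one of those keys implies an integration to build and data quality to maintain:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimpleTicket:
    # Self-contained: nothing to integrate, nothing to keep in sync.
    problem_id: str
    notes: List[str] = field(default_factory=list)

@dataclass
class ReferentialTicket:
    problem_id: str
    notes: List[str] = field(default_factory=list)
    alarm_ids: List[str] = field(default_factory=list)       # alarm management
    resource_ids: List[str] = field(default_factory=list)    # inventory management
    service_ids: List[str] = field(default_factory=list)     # service management
    customer_ids: List[str] = field(default_factory=list)    # customer management
    spare_part_ids: List[str] = field(default_factory=list)  # spares management
    work_order_ids: List[str] = field(default_factory=list)  # workforce management
```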

TM Forum’s Open API links

Those of you who know TM Forum are already quite familiar with the Frameworx enterprise architecture model. It’s as close as we get to a standard used across the OSS industry.

Frameworx consists of four main modules, with eTOM, TAM and SID being the most widely referred to.

But there’s a newer weapon in the TM Forum arsenal that appears to be gaining widespread use. It’s the TM Forum Open API suite, which comprises over 50 REST-based APIs, with many more under development. This link provides the full list of APIs, including specifications, swagger files and postman collections.

The following diagram comes from the Open API Map (GB992) (accurate as of 9 Jan 2018).
GB992_Open_API_Map_R17.0.1

It’s well worth reviewing, as I think you’ll be hearing about TMF642 (Alarm Management API) and its sister APIs as commonly as you hear eTOM or SID mentioned today.
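To give a feel for how these APIs are consumed, here’s a sketch of a TMF642-style alarm query. The host, version, auth token and response fields below are placeholder assumptions; check the published swagger for the exact contract:

```python
import requests

# Placeholder base URL; real deployments publish their own host and version.
BASE = "https://api.example-telco.com/tmf-api/alarmManagement/v4"

response = requests.get(
    f"{BASE}/alarm",
    params={"state": "raised", "limit": 10},
    headers={"Authorization": "Bearer <token>"},  # auth scheme varies by operator
    timeout=10,
)
response.raise_for_status()
for alarm in response.json():
    print(alarm.get("id"), alarm.get("perceivedSeverity"))
```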

How to kill the OSS RFP (part 4)

This is the fourth, and final, part (I think) in the series on killing the OSS RFI/RFP process, a process that suppliers and customers alike find to be inefficient. The concept is based on an initiative currently being investigated by TM Forum.

The previous three posts focused on the importance of trusted partnerships and the methods to develop them via OSS procurement events.

Today’s post takes a slightly different tack. It proposes a structural obsolescence that may lead to the death of the RFP. We might not have to kill it. It might die a natural death.

Actually, let me take that back. I’m sure RFPs won’t die out completely as a procurement technique. But I can see a time when RFPs are far less common and significantly different in nature to today’s procurement events.

How??
Technology!
That’s the answer all technologists cite for any form of problem, of course. But there’s a growing trend that provides a portent of the future here.

It comes via the XaaS (As a Service) model of software delivery. We’re increasingly building and consuming cloud-native services. OSS of the future, the small-grid model, are likely to consume software as services from multiple suppliers.

And rather than having to go through a procurement event like an RFP to form each supplier contract, the small grid model will simply be a case of consuming one/many services via API contracts. The API contract (eg OpenAPI specification / swagger) will be available for the world to see. You either consume it or you don’t. No lengthy contract negotiation phase to be had.

Now, as mentioned above, the RFP won’t die, but evolve. We’ll probably see more RFPs formed between customers and the services companies that will create customised OSS solutions (utilising one/many OSS supplier services). And these RFPs may not be with the massive multinational services companies of today, but increasingly with smaller niche service companies. These micro-RFPs represent the future of OSS work, the gig economy, and will surely be facilitated by smart-RFP / smart-contract models (like the OSS Justice League model).

The Theory of Evolution, OSS evolution

“Evolution says that biological change is a property of populations — that every individual is a trial run of an experimental combination of traits, and that at the end of the trial, you are done and discarded, and the only thing that matters is what aggregate collection of traits end up in the next generation. The individual is not the focus, the population is. And that’s hard for many people to accept, because their entire perception is centered on self and the individual.”
FreeThoughtBlog.

Have we almost reached the point where the same can be said for OSS workflows? In the past (and the present?) we had pre-defined process flows. There may be an occasional if/else decision gate, but we could capture most variants on a process diagram. These pre-defined processes were / are akin to a production line.

Process diagrams are becoming harder to lock down as our decision trees get more complicated. Technologies proliferate, legacy product lines don’t get obsoleted, the number of customer contact channels increases. Not only that, but we’re now marketing to a segment of one, treating every one of our customers as unique, whilst trying not to break our OSS / BSS.

Do we have the technology yet that allows each transaction / workflow instance to be treated as an experimental combination of attributes / tasks? More importantly, do we have the ability to identify any successful mutations that allow the population (ie the combination of all transactions) to get progressively better, faster, stronger?

It seems that to get to CX nirvana, being able to treat every customer completely uniquely, we need to first master an understanding of the population at scale. Conversely, to achieve the benefits of scale, we need to understand and learn from every customer interaction uniquely.

That’s evolution. The benchmark sets the pattern for future workflows until a variant / mutation identifies a better benchmark, which then establishes the new pattern for future workflows, and so on.

The production line workflow model of the past won’t get us there. We need an evolution workflow model that is designed to accommodate infinite optionality and continually learn from it.

Does such a workflow tool exist yet? Actually, it’s more than a workflow tool. It’s a continually-improving workflow loop.
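As a thought experiment only, here’s a minimal sketch of that “benchmark until a mutation beats it” loop, borrowing a simple epsilon-greedy choice between hypothetical workflow variants:

```python
import random

# [successes, trials] per workflow variant; stand-ins for real instrumentation.
stats = {"variant_a": [0, 0], "variant_b": [0, 0]}
EPSILON = 0.1  # fraction of transactions used to trial non-benchmark variants

def pick_variant():
    if random.random() < EPSILON:
        return random.choice(list(stats))  # occasional mutation / trial
    # Otherwise exploit the current benchmark (best observed success rate).
    return max(stats, key=lambda v: stats[v][0] / max(stats[v][1], 1))

def record(variant, success):
    stats[variant][0] += int(success)
    stats[variant][1] += 1

# Simulate transactions where variant_b is genuinely better (70% vs 50%).
true_rate = {"variant_a": 0.5, "variant_b": 0.7}
for _ in range(5000):
    v = pick_variant()
    record(v, random.random() < true_rate[v])

print({v: round(s / max(t, 1), 3) for v, (s, t) in stats.items()})
```

Over time the population of transactions converges on variant_b, without anyone having pre-defined it as the process.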

The OSS proof-of-worth dilemma

Earlier this week we posted an article describing Sutton’s Law of OSS, which effectively tells us to go where the money is. The article suggested that in OSS we instead tend towards the exact opposite – the inverse-Sutton – we go to where the money isn’t. Instead of robbing banks like Willie Sutton, we break into a cemetery and aimlessly look for the cash register.

A good friend responded with the following, “Re: The money trail in OSS … I have yet to meet a senior exec. / decision maker in any telco who believes that any OSS component / solution / process … could provide benefit or return on any investment made. In telco, OSS = cost. I’ve tried very hard and worked with many other clever people also trying hard to find a way to pitch OSS which overcomes this preconception. BSS is often a little easier … in many cases it’s clear that “real money” flows through BSS and needs to be well cared for.”

He has a strong argument. The cost-out mentality is definitely ingrained in our industry.

We are saddled with the burden of proof. We need to prove, often to multiple layers of stakeholders, the true value of the (often intangible) benefits that our OSS deliver.

The same friend also posited, “The consequence is the necessity to establish beneficial working relationships with all key stakeholders – those who specify needs, those who design and implement solutions and those, especially, who hold or control the purse strings. [To an outsider] It’s never immediately obvious who these people are, nor what are their key drivers. Sometimes it’s ambition to climb the ladder, sometimes political need to “wedge” peers to build empires, sometimes it’s necessity to please external stakeholders – sometimes these stakeholders are vendors or government / international agencies. It’s complex and requires true consultancy – technical, business, political … at all levels – to determine needs and steer interactions.”

Again, so true. It takes more than just technical arguments.

I’m big on feedback loops, but also surprised at how little they’re used in telco – at all levels.

  • We spend inordinate amounts of time building and justifying business cases, but relatively little measuring the actual benefits produced after we’ve finished our projects (or gaining the learnings to improve the next round of business cases)
  • We collect data in our databases, obliviously let it age, realise at some point in the future that we have a data quality issue and perform remedial actions (eg audits, fixes, etc) instead of designing closed-loop improvement cycles that ensure DQ isn’t constantly deteriorating
  • We’re great at spending huge effort in gathering / arguing / prioritising requirements, but don’t always run requirements traceability all the way into testing and operational rollout.
  • etc

Which leads us back to the burden of proof. Our OSS have all the data in the world, but how often do we use it to justify and persuade – to prove?

Our OSS products have so many modules and so much technical functionality (much of it effectively duplicated from vendor to vendor). But I’ve yet to see any vendor product that allows their customers, the OSS operators, to automatically gather proof-of-worth stats (ie executive-ready reports). Nor have I seen any integrator build proof-of-worth consultancy into their offer, whereby they work closely with their customer to define and collect the metrics that matter. BTW. If this sounds hard, I’d be happy to discuss how I approach this task.
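To show the kind of executive-ready stat I mean, here’s a tiny sketch that derives proof-of-worth metrics from ticket records. The fields and figures are invented for illustration; a real report would query the OSS database:

```python
from datetime import datetime, timedelta

# Invented ticket records standing in for a real trouble-ticket table.
tickets = [
    {"opened": datetime(2018, 1, 1, 9), "closed": datetime(2018, 1, 1, 13), "automated": False},
    {"opened": datetime(2018, 1, 2, 9), "closed": datetime(2018, 1, 2, 9, 20), "automated": True},
    {"opened": datetime(2018, 1, 3, 9), "closed": datetime(2018, 1, 3, 10), "automated": True},
]

durations = [t["closed"] - t["opened"] for t in tickets]
mttr = sum(durations, timedelta()) / len(durations)
automation_rate = sum(t["automated"] for t in tickets) / len(tickets)

# The sort of one-liner an executive can read without an OSS glossary.
print(f"MTTR: {mttr}, resolved without manual touch: {automation_rate:.0%}")
```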

So let me leave you with three important questions today:

  1. Have you also experienced the overwhelming burden of the “OSS = cost” mentality?
  2. If so, do you have any suggestions / experiences on how you’ve overcome it?
  3. Does the proof-of-worth product functionality sound like it could be useful (noting that it doesn’t even have to be a product feature, but could be a custom report / portal using data that’s constantly coursing through our OSS databases)?

OSS’ Rosetta Stone

When working on OSS projects, I find that linking or reference keys are so valuable at so many levels. Not just data management within the OSS database, but project management, design activities, task assignment / delivery, etc.

People might call things all sorts of different names, which leads to confusion.

Let me cite an example. When a large organisation has lots of projects underway and many people are maintaining project lists, but each with a slightly (or massively!!) different variant of the project name, there can be confusion, and possibly duplication. But introduce a project number and it becomes a lot easier to compare project lists (if everyone cites the project number in their documentation).

Trying to pattern match text / language can be really difficult. But if data sets have linking keys, we can easily use Excel’s vlookup function (or the human brain, or any other equivalent tool of choice) to make comparison a whole lot easier.

Linking keys could be activity codes in a WBS, a device name in a log file, a file number, a process number, part numbers in a designer’s pick-lists, device configuration code, etc.

Correlation and reconciliation tasks are really important in OSS. An OSS does a lot of data-set joins at application / database level. We can take a lead from the code-level use of linking keys.
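A minimal example of that vlookup-style join, with invented data:

```python
# Two project lists maintained by different teams. The project number is the
# Rosetta-stone key that reconciles them despite the name drift.
finance_list = [
    {"project_no": "P-1001", "name": "OSS Upgrade Phase 2", "budget": 250000},
    {"project_no": "P-1002", "name": "Fibre rollout (North)", "budget": 900000},
]
pmo_list = [
    {"project_no": "P-1001", "name": "OSS upgrade ph.2", "status": "green"},
    {"project_no": "P-1002", "name": "Northern fibre roll-out", "status": "amber"},
]

# The join is trivial once the linking key exists...
by_key = {p["project_no"]: p for p in pmo_list}
for f in finance_list:
    print(f["project_no"], f["budget"], by_key.get(f["project_no"], {}).get("status"))

# ...whereas matching on the project names alone would need fuzzy text comparison.
```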

Just like the Rosetta stone proved to be the key to unlocking Egyptian hieroglyphs, introducing linking keys can be useful for translating different “languages” at many levels in OSS projects.

How OSS/BSS facilitated Telkomsel’s structural revenue changes

The following two slides were presented by Monty Hong of Indonesia’s Telkomsel at Digital Transformation Asia 2018 last week. They provide a fascinating insight into the changing landscape of comms revenues that providers are grappling with globally and the associated systems decisions that Telkomsel has made.

The first shows the drastic declines in revenues from Telkomsel’s traditional telco products (orange line), contrasted with the rapid rise in revenues from content such as video and gaming.
Telkomsel Revenue Curve

The second shows where Telkomsel is repositioning itself into additional segments of the content value-chain (red chevrons at top of page show where Telkomsel is playing).
Telkomsel gaming ecosystem

Telkomsel has chosen to transform its digital core to specifically cater for this new revenue model with one API ecosystem. One of the focuses of this transformation is to support a multi-speed architectural model. Traditional back-end systems (eg OSS/BSS and systems of record) are expected to change rarely, whilst customer-facing systems are expected to be highly agile to cater to changing customer needs.

More about the culture of this change tomorrow.

Is your data getting too heavy for your OSS to lift?

“Data mass is beginning to exhibit gravitational properties – it’s getting heavy – and eventually it will be too big to move.”
Guy Lupo
in this article on TM Forum’s Inform that also includes contributions from George Glass and Dawn Bushaus.

Really interesting concept, and article, linked above.

The touchpoint explosion is helping to make our data sets ever bigger… and heavier.

In my earlier days in OSS, I was tasked with leading the migration of large sets of data into relational databases for use by OSS tools. I was lucky enough to spend years working on a full-scope OSS (ie its central database housed data for inventory management, alarm management, performance management, service order management, provisioning, etc, etc).

Having all those data sets in one database made it incredibly powerful as an insight generation tool. With a few SQL joins, you could correlate almost any data sets imaginable. But it was also a double-edged sword. Firstly, ensuring that all of the sets would have linking keys (and with high data quality / reliability) was a data migrator’s nightmare. Secondly, all those joins being done by the OSS made it computationally heavy. It wasn’t uncommon for a device list query to take the OSS 10 minutes to provide a response in the PROD environment.

There’s one concept that makes GIS tools more inherently capable of lifting heavier data sets than OSS – they generally load data in layers (that can be turned on and off in the visual pane) and unlike OSS, don’t attempt to stitch the different sets together. The correlation between data sets is achieved through geographical proximity scans, either algorithmically, or just by the human eye of the operator.

If we now consider real-time data (eg alarms/events, performance counters, etc), we can take a leaf out of Einstein’s book and correlate by space and time (ie by geographical and/or time-series proximity between otherwise unrelated data sets). Just wondering – How many OSS tools have you seen that use these proximity techniques? Very few in my experience.

BTW. I’m the first to acknowledge that a stitched data set (ie via linking keys such as device ID between data sets) is definitely going to be richer than uncorrelated data sets. Nonetheless, this might be a useful technique if your data is getting too heavy for your OSS to lift (eg simple queries are causing minutes of downtime / delay for operators).
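Here’s a small sketch of what such a space / time proximity scan might look like over unstitched alarm sets. The coordinates, timestamps and thresholds are all invented for illustration:

```python
from datetime import datetime, timedelta
from math import hypot

# Unstitched data sets: alarms with coordinates and timestamps but no shared keys.
alarms = [
    {"src": "power_sensor", "x": 10.0, "y": 20.0, "time": datetime(2018, 11, 1, 3, 0)},
    {"src": "transmission", "x": 10.2, "y": 20.1, "time": datetime(2018, 11, 1, 3, 1)},
    {"src": "transmission", "x": 90.0, "y": 5.0, "time": datetime(2018, 11, 1, 3, 1)},
]
MAX_DIST = 1.0                 # same-site heuristic (arbitrary units)
MAX_GAP = timedelta(minutes=5)

def near(a, b):
    """Correlate by geographic and time-series proximity, not linking keys."""
    return (hypot(a["x"] - b["x"], a["y"] - b["y"]) <= MAX_DIST
            and abs(a["time"] - b["time"]) <= MAX_GAP)

# Pairwise proximity scan in lieu of a linking-key join.
for i, a in enumerate(alarms):
    for b in alarms[i + 1:]:
        if near(a, b):
            print(f"possible common cause: {a['src']} <-> {b['src']}")
```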

Are telco services and SLAs no longer relevant?

I wonder if we’re reaching the point where “telecommunication services” is no longer a relevant term? By association, SLAs are also a bust. But what are they replaced by?

A telecommunication service used to effectively be the allocation of a carrier’s resources for use by a specific customer. Now? Well, less so:

  1. Service consumption channel alternatives are increasing, from TV and radio, to PC, mobile, tablet, YouTube, Insta, Facebook and a million others. Consumption sources are even more prolific.
  2. Customer contact channel alternatives are also increasing, from contact centres, to IVR, online, mobile apps, Twitter, etc.
  3. A service bundle often utilises third-party components, some of which are “off-net”.
  4. Virtualisation is increasingly abstracting services from specific resources. They’re now loosely coupled with resource pools and rely on high availability / elasticity to ensure customer service continuity. Not only that, but those resource pools might extend beyond the carrier’s direct control and out to cloud provider infrastructure.

The growing variant tree is taking the concept beyond the reach of “customer services” and evolving it into “customer experiences.”

The elements that made up a customer service in the past tended to fall within the locus of control of a telco and its OSS. The modern customer experience extends far beyond the control of any one company or its OSS. An SLA – Service Level Agreement – only pertains to the sub-set of an experience that can be measured by the OSS. We can only aspire to offer an ELA – Experience Level Agreement – because we don’t yet have the mechanisms by which to measure or manage the entire experience.

The metrics that matter most for telcos today tend to revolve around customer experience (eg NPS). But aside from customer surveys, ratings and derived / contrived metrics, we don’t have electronic customer experience measurements.

Customer services are dead; long live the customer experience king… if only we can invent a way to measure the whole scope of what makes up a customer experience.

Intent to simplify our OSS

The left-hand panel of the triptych below shows the current state of interactions with most OSS. There are hundreds of variants inbound via external sources (ie multi-channel) and even internal sources (eg different service types). Similarly, there are dozens of networks (and downstream systems), each with different interface models. Each needs different formatting, and integration costs escalate.
Intent model OSS

The intent model of network provisioning standardises the network interface, drastically simplifying the task of the OSS and the variants required for it to handle. This becomes particularly relevant in a world of NFVs, where it doesn’t matter which vendor supplies a given device type (a router, say): it can be handled via a single intent command rather than a separate interface to each vendor’s device / EMS northbound interface. The unique aspects of each vendor’s implementation are abstracted from the OSS.
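As a sketch of the concept (the vendor names and command syntax are invented), the OSS issues one vendor-neutral intent and per-vendor drivers absorb the differences:

```python
# Toy intent layer: one northbound command, per-vendor southbound translation.
class VendorADriver:
    def create_l3_interface(self, port, ip):
        return f"set interfaces {port} unit 0 family inet address {ip}"

class VendorBDriver:
    def create_l3_interface(self, port, ip):
        return f"interface {port}\n ip address {ip}"

DRIVERS = {"vendor_a": VendorADriver(), "vendor_b": VendorBDriver()}

def apply_intent(device, intent):
    """Single intent from the OSS; vendor detail absorbed by the driver."""
    driver = DRIVERS[device["vendor"]]
    if intent["type"] == "create_l3_interface":
        return driver.create_l3_interface(intent["port"], intent["ip"])
    raise ValueError(f"unsupported intent: {intent['type']}")

intent = {"type": "create_l3_interface", "port": "ge-0/0/1", "ip": "192.0.2.1/30"}
for vendor in ("vendor_a", "vendor_b"):
    print(apply_intent({"vendor": vendor}, intent))
```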

The next step would be in standardising the interface / data model upstream of the OSS. That’s a more challenging task!!

Facebook’s algorithmic feed for OSS

“This is the logic that led Facebook inexorably to the ‘algorithmic feed’, which is really just tech jargon for saying that instead of this random (i.e. ‘time-based’) sample of what’s been posted, the platform tries to work out which people you would most like to see things from, and what kinds of things you would most like to see. It ought to be able to work out who your close friends are, and what kinds of things you normally click on, surely? The logic seems (or at any rate seemed) unavoidable. So, instead of a purely random sample, you get a sample based on what you might actually want to see. Unavoidable as it seems, though, this approach has two problems. First, getting that sample ‘right’ is very hard, and beset by all sorts of conceptual challenges. But second, even if it’s a successful sample, it’s still a sample… Facebook has to make subjective judgements about what it seems that people want, and about what metrics seem to capture that, and none of this is static or even in principle perfectible. Facebook surfs user behaviour.”
Ben Evans
here.

Most of the OSS I’ve seen tend to be akin to Facebook’s old ‘chronological feed’ (where users need to sift through thousands of posts to find what’s most interesting to them).

The typical OSS GUI has thousands of functions (usually displayed on a screen all at once – via charts, menus, buttons, pull-downs, etc). But of all of those available functions, any given user probably only interacts with a handful.
Current-style OSS interface

Most OSS give their users the opportunity to customise their menus, colour schemes, even filters. For some roles such as network ops, designers, order entry operators, there are activity lists, often with sophisticated prioritisation and skills-based routing, which starts to become a little more like the ‘algorithmic feed.’

However, unlike the random nature of information hitting the Facebook feed, there is a more explicit set of things that an OSS user is tasked to achieve. It is a little more directed, like a Google search.

That’s why I feel the future OSS GUI will be more like a simple search bar (like Google) that will provide a direction of intent as well as some recent / regular activity icons. Far less clutter than the typical OSS. The graphs and activity lists that we know and love would still be available to users, but the way of interacting with the OSS to find the most important stuff quickly needs to get more intuitive. In future it may even get predictive in knowing what information will be of interest to you.
OSS interface of the future
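To make the idea concrete, here’s a toy sketch of such a search bar, ranking suggestions by a user’s own usage history. The function names and counts are invented:

```python
from collections import Counter

# How often this particular user has invoked each OSS function (invented).
usage = Counter({"show_alarms": 42, "show_orders": 17,
                 "run_sia_report": 9, "show_inventory": 3})

def suggest(query, top_n=3):
    """Match the typed text, then rank by this user's past behaviour."""
    matches = [f for f in usage if query.lower() in f]
    return sorted(matches, key=usage.__getitem__, reverse=True)[:top_n]

print(suggest("show"))  # -> ['show_alarms', 'show_orders', 'show_inventory']
```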

Extending the OSS beyond a customer’s locus of control

“While the 20th century was all about centralizing command and control to gain efficiency through vertical integration and mass standardization, 21st century automation is about decentralization – gaining efficiency through horizontal integration of partner ecosystems and mass customization, as in the context-aware cloud where personalized experience across channels is dynamically orchestrated.
The operational challenge of our time is to coordinate these moving parts into coherent and manageable value chains. Instead of building yet another siloed and brittle application stack, the age of distributed computing requires that we re-think business architecture to avoid becoming hopelessly entangled in a ‘big ball of CRUD’.”
Dave Duggal
here on TM Forum’s Inform back in May 2016.

We’ve quickly transitioned from a telco services market driven by economies of scale (Dave’s 20th century comparison) to a “market of one” (21st century), where the market wants a personalised experience that seamlessly crosses all channels.

By and large, the OSS world is stuck between the two centuries. Our tools are largely suited to the 20th century model (in some cases, today’s OSS originated in the 20th century after all), but we know we need to get to personalisation at scale and have tried to retrofit them. We haven’t quite made the jump to the model Dave describes yet, although there are positive signs.

It’s interesting. Telcos have the partner ecosystems, but the challenge is that the entire ecosystem still tends to be centrally controlled by the telco. This is the so-called best-of-breed model.

In the truly distributed model Dave talks about, the telcos would get the long tail of innovation / opportunity by extending their value chain beyond their own OSS stack. They could build an ecosystem that includes partners outside their locus of control. Outside their CAPEX budget too, which is the big attraction. The telcos get to own their customers, build products that are attractive to those customers, and gain revenues from those products / customers, but not incur the big capital investment of building the entire OSS stack. Their partners build (and share profits from) the external components.

It sounds attractive right? As-a-service models are proliferating and some are steadily gaining take-up, but why is it still not happening much yet, relatively speaking? I believe it comes down to control.

Put simply, the telcos don’t yet have the right business architectures to coordinate all the moving parts. From my customer observations at least, there are too many fall-outs as customer journeys hand off between components within the internally controlled partner ecosystem. This is especially true when we talk omni-channel. A fully personalised solution leaves too many integration / data variants to provide complete test coverage. For example, just at the highest level of an architecture, I’ve yet to see a solution that tracks end-to-end customer journeys across all components of the OSS/BSS as well as channels such as online, IVR, apps, etc.

Dave rightly points out that this is the challenge of our times. If we can coherently and confidently manage moving parts across the entire value chain, we have more chance of extending the partner ecosystem beyond the telco’s locus of control.

OSS holds the key to network slicing

“Network slicing opens new business opportunities for operators by enabling them to provide specialized services that deliver specific performance parameters. Guaranteeing stringent KPIs enables operators to charge premium rates to customers that value such performance. The flip side is that such agreements will inevitably come with tough contractual obligations and penalties when the agreed KPIs are not met… even high numbers of slices could be managed without needing to increase the number of operational staff. The more automation applied, the lower the operating costs. At 100 percent automation, there is virtually no cost increase with the number of slices. Granted this is a long-term goal and impractical in the short to medium term, yet even 50 percent automation will bring very significant benefits.”
From a paper by Nokia – “Unleashing the economic potential of network slicing.”

With typical communications services tending towards commoditisation, operators will naturally seek out premium customers. Customers with premium requirements such as latency, throughput, reliability, mobility, geography, security, analytics, etc.

These custom requirements often come with unique network configuration requirements. This is why network slicing has become an attractive proposition. The white paper quoted above makes an attempt at estimating profitability of network slicing including some sensitivity analyses. It makes for an interesting read.

The diagram below is one of many contained in the White Paper:
Nokia Network Slicing

It indicates that a significant level of automation is going to be required to achieve an equivalent level of operational cost to a single network. To re-state the quote, “The more automation applied, the lower the operating costs. At 100 percent automation, there is virtually no cost increase with the number of slices. Granted this is a long-term goal and impractical in the short to medium term, yet even 50 percent automation will bring very significant benefits.”

Even 50% operational automation is a significant ambition. OSS hold the key to delivering on this ambition. Such ambitious automation goals mean we have to look at massive simplification of operational variant trees. Simplifications that include, but go far beyond, OSS, BSS and networks. This implies whole-stack simplification.

OSS designed as a bundle, or bundled after?

Over the years I’m sure you’ve seen many different OSS demonstrations. You’ve probably also seen presentations by vendors / integrators that have shown multiple different products from their suite.

How integrated have they appeared to you?

  1. Have they seemed tightly integrated, as if carved from a single piece of stone?
  2. Or have they seemed loosely integrated, a series of obviously different stones joined together with some mortar?
  3. Or perhaps even barely associated, a series of completely different objects (possibly through product acquisition) branded under a common marketing name?

There are different pros and cons with each approach. Tight integration possibly suits a greenfields OSS. Looser integration perhaps better suits carve-off for best-of-breed customer architecture models.

I don’t know about you, but I always prefer to be given the impression that an attempt has been made to ensure consistency in the bundling. Consistency of user-interface, workflow, data modelling/presentation, reports, etc. With modern presentation layers, database technologies and the availability of UX / CX expertise, this should be less of a hurdle than it has been in the past.