OSS feature parity. A functionality arms race

OSS Vendor 1. “I have 1 million features.” (Dr Evil puts finger in mouth)
OSS Vendor 2. “Yeah, well I have 1,000,001 features in my OSS.”

This is the arms-race that we see in OSS, just like almost any other tech product. I imagine that vendors get into this arms-race because they wish to differentiate. Better to differentiate on functionality than price. If there’s feature parity, then the only differentiator is price. We all know that doesn’t end well!

But I often ask myself a few related questions:

  • Of those million features, how many are actually used regularly?
  • As a vendor, do you have logging that actually allows you to know what features are being used?
  • Taking the Whale Curve perspective, even if being used, how many of those features are actually contributing to the objectives of the vendor?
    • Do they clearly contribute towards making sales?
    • Do customers delight in using them?
    • Would customers be irate if you removed them?
    • etc
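
As a rough illustration of the logging question above, here's a minimal sketch of how usage events could be tallied against a feature catalogue. The feature names, the log format and the summarise_usage helper are all hypothetical, made up purely for this example.

```python
from collections import Counter

# Hypothetical catalogue of features the product ships with.
FEATURE_CATALOGUE = {
    "alarm_list", "alarm_filter", "ticket_create", "topology_view", "report_export",
}

# Hypothetical usage log: one (user, feature) entry per invocation.
usage_log = [
    ("alice", "alarm_list"),
    ("bob", "alarm_list"),
    ("alice", "alarm_filter"),
    ("carol", "ticket_create"),
    ("alice", "alarm_list"),
]


def summarise_usage(log, catalogue):
    """Return (invocation counts per feature, sorted list of never-used features)."""
    counts = Counter(feature for _user, feature in log if feature in catalogue)
    unused = sorted(catalogue - set(counts))
    return counts, unused


counts, unused = summarise_usage(usage_log, FEATURE_CATALOGUE)
for feature, n in counts.most_common():
    print(f"{feature}: {n}")
print("never used:", unused)
```

Even a tally this crude would start to answer the first two questions. The Whale Curve questions need more than counts, because a feature can be heavily used without contributing to sales or delight.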

Earlier this week, I spoke about a friend who created an alarm management tool by himself over a weekend. It didn’t have a million features, but it did have all of what I’d consider to be the most important ones. It did look like a lot of other alarm managers that are now on the market. The GUI based on alarm lists still pervades.

If they all look alike, and all have feature parity, how do you differentiate? If you try to add more features, is it safe to assume that those features will deliver diminishing returns?

But is an alarm list and the flicking of tickets the best way to manage network health?

What if, instead of seeking incremental improvement, someone went back to the most important requirements and considered whether the current approach is meeting those customer needs? I have a strong suspicion that customer feedback will indicate that there are definitely flaws to overcome, especially on high event volume networks.

Clever use of large data volumes provides a level of pre-cognition and automation that wasn’t available when simple alarm lists were first invented. This in turn potentially changes the way that operators can engage with network monitoring and management.

What if someone could identify a whole new user interface / approach that overcame the current flaws and exceeded the key requirements? Would that be more of a differentiator than adding a 1,000,002nd feature?

If you’re looking for a comparison, there were plenty of MP3 players on the market with a heap of features, many more than the iPod. We all know how that one played out!

Pitching an OSS? Don’t call it OSS.

“‘If you asked me how to sell cybersecurity, I wouldn’t call it cybersecurity.’ The raw truth of the statement hit me like a lightning bolt between the eyes. Cybersecurity might loosely describe what we do, and we tell people it’s what we’re selling, but it’s not what people buy.
Safety. Assurance. Peace of mind. Confidence. These are the kinds of things that people buy, concepts which ordinary people can understand and relate to because they are feelings which they have experienced themselves. Cybersecurity is not a next gen firewall, or multi-layered endpoint protection with machine learning and threat sandbox technology. Cybersecurity is not risk management or ISO27001 policies. Cybersecurity is being able to use the Internet in any way I can imagine without having to worry I might lose my family photos, get robbed, or get in trouble with my boss. If you could (honestly) sell me “worry free Internet”, I’d buy it in a heartbeat, and so would everyone you know.”
Corch X, here.

Sound familiar?
If you asked me how to sell OSS, I wouldn’t call it OSS. Doh! Now you enlighten me… after I’ve already chosen the domain name, PassionateAboutOSS.com. After I’ve already written over 2,000 posts on topics like orchestration, microservices, cloud-native, DevOps, and every other technical buzzword. Time to start again from scratch.

One thing in my favour is that you, the audience I’m interacting with, also speak in the same jargon. These are the terms we use to communicate with each other. To get things started. To get things done. To get things delivered.

That’s all fine if we’re only interacting with like-minded OSS experts. However, of the thousands of people who interact with our OSS / BSS, only a small percentage are OSS experts. A majority of people use the tools rather than designing, building or commissioning them.

The people who use the tools have a huge range of job roles and reasons for needing to use our OSS / BSS. Just like with cybersecurity, the core reasons could be Safety. Assurance. Peace of mind. Confidence. But they might also include Speed. Efficiency. Reliability. Repeatability. Simplicity. Monetisation. Insight. And more.

The challenge we have is that so much of the benefit that our OSS and BSS deliver is intangible. We might talk about orchestration delivering speed, simplicity, reliability, etc. But how do we establish a more tangible link?

How do we achieve the equivalent of what the “Intel Inside” marketing ploy delivered, which made people associate an otherwise obscure integrated circuit with a premium feature to consider when they bought their next computing device? How do we ensure that people know that our OSS / BSS is the master of puppets that makes our networks dance? It’s our OSS / BSS that are pulling all the strings of operationalisation, connecting customers with networks.

Would an EoL be beneficial for OSS?

In the world of networking, it’s common for devices to go EOL (end-of-life). Capital spend and depreciation models are based around refresh cycles of around 5-7 years. Vendors reinforce this refresh cycle by designing obsolescence into maintenance, support and part supplies. Customers tend to simply submit to the risk of having no vendor support by buying the next generation replacements.

But how often do you hear of an OSS going EOL? Not often, right? They tend to get written off only when the cost of upkeep outweighs new revenues.

I know, I can hear you saying that software is different from hardware and of course I agree with you. I’d partially counter by claiming that software architectures and development platforms also have a discernibly useful life just like physical network devices. If you doubt that, I’m sure you’ve seen OSS tools with origins in the 1990s that are still being developed upon. I tend to believe that product usefulness becomes asymptotic for its vendors. With the speed of change and proliferation of new platforms, useful lives are getting ever-shorter.

Would a pre-ordained product replacement life-cycle be beneficial for the OSS industry? It has some merits.

For a start, planned obsolescence enforces designs with interchangeability, in line with the small-grid OSS described yesterday. It promotes short-term enhancements to long-term visions. It becomes easier for customers to write off their investment and inject new capital into the vendor market. It penalises the amount of Frankenstein integrations that tend to become increasingly burdensome (to vendor and customer) into the future. It enforces those mythical beasts of telco software – subtraction projects. It promotes innovation to avoid the asymptotic benefit deterioration curve shown below:
[Figure: Asymptotic OSS feature development]

As the asymptote is being reached, a new jumping-off point commences with the new product.

But it’s a difficult status-quo to break. Vendors have invested millions of developer hours into their products. Taking a product EoL is effectively throwing that invested effort away. For carriers, it means the risk and cost of breaking integrations / processes and replacing them with new ones.

I’d love to hear your thoughts on whether an EOL model might be relevant / useful for your OSS.

The future of work and its impact on OSS

Many years ago, I worked on a seriously big OSS transformation for one of the region’s biggest telcos. Everything was big on the project: the investment, the resources, the documentation. Everything except the outcomes. There was so much inefficiency that I often spoke about making one day of progress for every ten on site. Meetings, bureaucracy, impossible approval cycles, customer re-organisations, over-analysis, etc all added up to stagnation.

This contrasted so much with some of the amazing small teams I’ve worked alongside. Teams that worked cohesively, cleverly and just got stuff done with almost no resources. It’s one of the reasons I feel that the future of work, even for the very large organisations, will be via small teams. Outsourced to small, efficient teams / organisations. The gig economy, and the proliferation of tools that support it, make it an obvious approach to take, especially for very large organisations to leverage. Proof of work technologies, such as those building upon the discovery of blockchain, will provide further impetus to use smaller teams of experts.

Experts like a friend and colleague of mine who once built an alarm management tool in a weekend, by himself. It also happened to be more sophisticated than his employer’s existing tool that had taken years of combined developer effort by a larger team.

Maybe I’ll be proven wrong, but I see the transition to this model of work as being inevitable. The question I have is how to make our OSS more accommodating of this work model. Behemoth OSS stacks won’t. Highly modular OSS made up of many smaller components probably will, as long as they don’t succumb to the OSS chessboard analogy. The pulleys and strings will make it impossible for small, interchangeable teams to decipher and manage.

A small-grid OSS model is the one I’d be backing in.

OSS – like a duck on a pond

Let’s start with a basic question. “What does an OSS need to do?”

The basic answer is, “make operations easier.”

The real answer(s) is so much more nuanced than that of course. The term easier can also encapsulate other words such as faster, more accurate, more repeatable, cheaper, etc.

Designing, building, operating and maintaining a sizable network is extremely challenging, despite network operators around the world, and the vendors that supply to them, employing some of the best and brightest. So we design OSS and related tools / processes to make operations easier.

Yet I sometimes wonder whether we achieve that aim – to make operations easier. Seems to me that we tend to focus more on just replicating functions at a higher layer in the management stack. That is, moving the function to the OSS rather than EMS/NMS, without really making it much easier operationally.

Let’s start at the user interface (UI). How often are they intuitive enough for an experienced network operator to start doing tasks with negligible OSS expert guidance?
Let’s look at deployments. How often are the projects low on effort, risk, cost and complexity?
Let’s look at flexibility (ie in-flight modifications or transformations). How often do we actually deliver flexibility to our customers through our OSS? To ask the same as above, how often are our changes low on effort, risk, cost and complexity?

As a small step towards providing an answer, I wonder whether it’s a case of making the hard things look easy and the easy things look hard.

We want to make the really hard operational things much easier to do within an OSS because that’s the primary purpose of an OSS. That’s the example of a duck on a pond. The OSS is gliding along effortlessly across the top of the water, but under the water it is paddling furiously.

Conversely, we want to make the really easy* operational things look hard to do within an OSS so that we’re not constantly being asked to build functionality / complexity into our OSS that doesn’t warrant being there. It diffuses the intent of the OSS. Just because we can, doesn’t mean we should.

For fear of OSS investment

Friday’s post discussed three analogies about the challenges of performing an OSS pivot.

The biggest challenge in initiating the transformation / replacement of any significant OSS is fear. There are many OSS out there whose “owners” want to change and need to change… but fear changing because a significant pivot would mean a “bet the farm” decision.

The fear is completely understandable. These are highly complex projects that consume massive amounts of resource (time, money, people) and have many potential pitfalls. The risks can be huge for sponsors / stakeholders / investors. Failure of these projects can be career changing. The upside potential rarely balances the downside risk.

So, the only choice we have is to present pivots that aren’t “bet the farm” decisions.

The delivery approach of a bet the farm pivot tends to look like this:
[Figure: The Bet-the-farm OSS Transformation Approach]

The less risky, dependency-reduced, stepping-stone transformation tends to look a bit like this, but probably with a lot more verticals, as described here:
[Figure: The Stepping-Stone OSS Transformation Approach]

Do the laws of physics prevent you from making an OSS pivot?

[Image: Aircraft carrier]
Image linked from GCaptain.com.

As you already know, the word pivot has become common in the world of business, particularly the world of start-ups. It’s a euphemism for a significant change in strategic direction. In the context of today’s post, I love the word pivot because it implies a rapid change in direction, something that’s seemingly impossible for most of our OSS and the customers who use them.

I like to use analogies. It’s no coincidence that some of the analogies posted here on PAOSS relate to the challenge in making strategic change in our OSS. Here are just three of those analogies:

The OSS inertia principle relates classical physics with our OSS, where Force equals Mass x Acceleration (F = ma). In other words, the greater the mass (of your OSS), the more force must be applied to reach a given acceleration (ie to effect a change).

The OSS chess-board analogy talks about the rubber bands and pulleys (ie integrations) that enmesh the pieces on our OSS chessboard. This means that other pieces get dragged out of position whenever we try to move any individual piece and chaos ensues.

The aircraft carrier analogy compares OSS (and the CSPs they service) with navies of old. In days gone by, CSPs enjoyed command of the sea. Their boats were big, powerful and mobile enough to move around the world. However, their size requires significant planning to change course. The newer application and content communications models are analogous to the advent of aviation. The over the top (OTT) business model has the speed, flexibility, lower cost base and diversity of aircraft. Air supremacy has changed the competitive dynamic. CSPs and our OSS can’t quickly change from being a navy to being an airforce, so the aircraft carrier approach looks to the future whilst working within the constraints of the past.

When making day-to-day changes within, and to, your OSS, does the ability to pivot ever come to mind?

Do you intentionally keep it small and modular, and limit its integrations, to simplify your game of OSS chess?
If constrained by existing mass that you simply can’t eliminate, do you seek to transform via OSS’s aviation equivalents?
Or, like many of the OSS around the world, are you just making them larger, enmeshed behemoths that will never be able to change the laws of physics and achieve a pivot?

Do any of our global target architectures represent such behemoths?

Falsely rewarding based on OSS existence rather than excellence

There’s a common belief that most jobs see people rewarded for presence rather than performance. That is, they’re encouraged to be on site from 9am to 5pm rather than being given free rein over their work schedules as long as key outcomes are met / exceeded.

In OSS vendor / product selection there’s a similar concept. Contracts are often awarded based on existence rather than excellence. When evaluating a product, if it’s able to do a majority of the functions in the long list of requirements then the box is ticked.

However, this doesn’t take into account that there are usually only a very small number of functions that any given customer’s OSS needs to perform at a very high level of efficiency. All the others are effectively just nice to have. That’s the 80/20 rule at work.

When guiding a customer through their vendor selections, I always take them through an exercise to identify the use-cases / functions that really matter. Then we ensure that the demos or proofs of concept focus closely on how excellent the OSS is at those most important factors.
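
To make the existence-versus-excellence contrast concrete, here's a minimal sketch comparing a tick-box count with a weighted score. The vendors, requirements, weights and demo scores are all invented for illustration, and the two scoring functions are just one simple way of expressing the idea.

```python
# Hypothetical requirement list: name -> weight (importance to this customer).
weights = {
    "alarm_correlation": 10,
    "service_activation": 8,
    "report_export": 1,
    "theming": 1,
    "sso": 1,
}

# Hypothetical demo scores per vendor: 0 = absent, 5 = excellent.
vendor_scores = {
    "Vendor A": {"alarm_correlation": 2, "service_activation": 2,
                 "report_export": 5, "theming": 5, "sso": 5},
    "Vendor B": {"alarm_correlation": 5, "service_activation": 4,
                 "report_export": 2, "theming": 0, "sso": 3},
}


def existence_score(scores):
    """Tick-box scoring: one point for every requirement that exists at all."""
    return sum(1 for s in scores.values() if s > 0)


def excellence_score(scores, weights):
    """Weighted scoring: how well the vendor performs on what matters most."""
    return sum(weights[req] * s for req, s in scores.items())


for vendor, scores in vendor_scores.items():
    print(vendor, existence_score(scores), excellence_score(scores, weights))
```

With these made-up numbers, Vendor A wins on ticked boxes (5 versus 4) but Vendor B wins comfortably on weighted excellence (87 versus 51), because it is far stronger on the two functions that carry the weight.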

OSS automations – just because we can, doesn’t mean we should

Automation is about using machines / algorithms to respond faster than humans can, or more efficiently than humans can, or more accurately than humans can… but only if the outcomes justify the costs. When it comes to automations, it’s a case of, “just because we can, doesn’t mean we should.”

The more complex the decision tree you’re trying to automate, the higher the costs and therefore the harder it becomes to cost-justify. So the first step in any automation is taking a lateral thinking approach to simplifying the decision tree.
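
Here's a back-of-envelope sketch of why simplification comes first. It assumes (and it is only an assumption) that each decision point in a process is independent, so the number of end-to-end paths an automation must handle is the product of the options at each point. The process and the option counts are hypothetical.

```python
from math import prod


def count_paths(options_per_decision):
    """Number of distinct end-to-end paths if every decision point is independent."""
    return prod(options_per_decision)


# Hypothetical order-handling process: number of options at each decision point.
current = [6, 4, 5, 3]      # e.g. product variants, access types, regions, exception routes
simplified = [3, 2, 5, 1]   # after retiring variants and removing exception routes

print(count_paths(current))     # 360
print(count_paths(simplified))  # 30
```

Real decision trees are rarely this tidy, but the multiplication effect is the point: trimming a few options at each step shrinks what has to be built, tested and maintained far more than it first appears.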

This recent post highlighted a graph from Nokia’s Bell Labs and the financial dependency that network slicing has on operational automation:
[Figure: Nokia Network Slicing]

Let’s use the Toyota Five Whys technique to work our way through the implications of this:

Statement 0: As CSPs, we need to drastically reduce complexity in the processes / decision-trees across our whole organisation.

Why 1? So that we can apply significant levels of automation

Why 2? So that we can apply technologies / techniques such as network slicing or virtualisation that are cost-justifiable

Why 3? So that we can offer differentiated, premium services

Why 4? So that our offerings don’t become commodities

Why 5? So that we retain corporate profitability to return to shareholders and/or invest in further interesting projects

I love that we’re looking to any number of automation technologies / techniques to apply to our OSS. However, we’re bypassing the all-important statement 0. We’re starting at Why 1 and partially missing the cost-justifiable part of Why 2. If our automation projects don’t prove cost-justifiable, then we never get the chance to reach whys 3, 4 and 5.

OSS that are profitable, difficult, or important?

“Apple became the first company to be worth a trillion dollars. They did that by spending five years single-mindedly focusing on doing profitable work. They’ve consistently pushed themselves toward high margin luxury goods and avoided just about everything else. Belying their first two decades, when they focused on breakthrough work that was difficult and perhaps important, nothing they’ve done recently has been either…
Profitable, difficult, or important – each is an option. A choice we get to make every day. ‘None of the above’ is also available, but I’m confident we can seek to do better than that.”
Seth Godin in this post.

I encourage you to view the entire post at the link above. It gives definitions (and examples) of organisations that focus on profitable, difficult or important activities.

In OSS, the organisations that focus on the profitable are the ones investing heavily on glossy sales / marketing and only making incremental improvements to products that have been around for years.

Then there are others that are doing the difficult and innovative and complex work (ie the sexy work for all of us tech-heads). This recent article about ONAP talks about the fantastic tech-driven ambitions of that program, but then distills it down to the business objectives.

That leaves us with the important – the business needs / objectives – and this is where the customers come in. Speak with any OSS customer (or customer’s customer for that matter) and you’ll tend to find frustrations with their OSS. Frustration with complexity, time to deliver / modify, cost to deliver / modify, risks, functionality constraints, etc.

This is a simplification of course, but do you notice that as an industry, our keen focus on the profitable and difficult might just be holding us back from doing the important?

OSS holds the key to network slicing

“Network slicing opens new business opportunities for operators by enabling them to provide specialized services that deliver specific performance parameters. Guaranteeing stringent KPIs enables operators to charge premium rates to customers that value such performance. The flip side is that such agreements will inevitably come with tough contractual obligations and penalties when the agreed KPIs are not met… even high numbers of slices could be managed without needing to increase the number of operational staff. The more automation applied, the lower the operating costs. At 100 percent automation, there is virtually no cost increase with the number of slices. Granted this is a long-term goal and impractical in the short to medium term, yet even 50 percent automation will bring very significant benefits.”
From a paper by Nokia – “Unleashing the economic potential of network slicing.”

With typical communications services tending towards commoditisation, operators will naturally seek out premium customers. Customers with premium requirements such as latency, throughput, reliability, mobility, geography, security, analytics, etc.

These custom requirements often come with unique network configuration requirements. This is why network slicing has become an attractive proposition. The white paper quoted above makes an attempt at estimating profitability of network slicing including some sensitivity analyses. It makes for an interesting read.

The diagram below is one of many contained in the White Paper:
[Figure: Nokia Network Slicing]

It indicates that a significant level of automation is going to be required to achieve an equivalent level of operational cost to a single network. To re-state the quote, “The more automation applied, the lower the operating costs. At 100 percent automation, there is virtually no cost increase with the number of slices. Granted this is a long-term goal and impractical in the short to medium term, yet even 50 percent automation will bring very significant benefits.”
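
The shape of that relationship can be sketched with a toy model. To be clear, this is not the model used in the Nokia paper. It simply assumes that only the non-automated share of per-slice effort adds operating cost, and the base_cost and manual_cost_per_slice figures are arbitrary units chosen for illustration.

```python
def operating_cost(slices, automation, base_cost=100.0, manual_cost_per_slice=10.0):
    """Toy model: only the non-automated share of per-slice effort adds cost.

    automation is a fraction between 0.0 (fully manual) and 1.0 (fully automated).
    """
    if not 0.0 <= automation <= 1.0:
        raise ValueError("automation must be between 0 and 1")
    return base_cost + slices * manual_cost_per_slice * (1.0 - automation)


for automation in (0.0, 0.5, 1.0):
    costs = [operating_cost(n, automation) for n in (1, 10, 100)]
    print(f"automation={automation:.0%}: {costs}")
```

Under these assumptions the cost line is flat at 100 percent automation, and at 50 percent the cost added by each slice halves, which is consistent with the direction of the quote even though the real curves in the paper will differ.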

Even 50% operational automation is a significant ambition. OSS hold the key to delivering on this ambition. Such ambitious automation goals mean we have to look at massive simplification of operational variant trees. Simplifications that include, but go far beyond, OSS, BSS and networks. This implies whole-stack simplification.

OSS designed as a bundle, or bundled after?

Over the years I’m sure you’ve seen many different OSS demonstrations. You’ve probably also seen presentations by vendors / integrators that have shown multiple different products from their suite.

How integrated have they appeared to you?

  1. Have they seemed tightly integrated, as if carved from a single piece of stone?
  2. Or have they seemed loosely integrated, a series of obviously different stones joined together with some mortar?
  3. Or perhaps even barely associated, a series of completely different objects (possibly through product acquisition) branded under a common marketing name?

There are different pros and cons with each approach. Tight integration possibly suits a greenfields OSS. Looser integration perhaps better suits carve-off for best-of-breed customer architecture models.

I don’t know about you, but I always prefer to be given the impression that an attempt has been made to ensure consistency in the bundling. Consistency of user-interface, workflow, data modelling/presentation, reports, etc. With modern presentation layers, database technologies and the availability of UX / CX expertise, this should be less of a hurdle than it has been in the past.

If ONAP is the answer, what are the questions?

“ONAP provides a comprehensive platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.
By unifying member resources, ONAP is accelerating the development of a vibrant ecosystem around a globally shared architecture and implementation for network automation–with an open standards focus–faster than any one product could on its own.”
Part of the ONAP charter from onap.org.

The ONAP project is gaining attention in service provider circles. The Steering Committee of the ONAP project hints at the types of organisations investing in the project. The statement above summarises the mission of this important project. You can bet that the mission has been carefully crafted. As such, one can assume that it represents what these important stakeholders jointly agree to be the future needs of their OSS.

I find it interesting that there are quite a few technical terms (eg policy-driven orchestration) in the mission statement, terms that tend to pre-empt the solution. However, I don’t feel that pre-emptive technical solutions are the real mission, so I’m going to try to reverse-engineer the statement into business needs. Hopefully the business needs (the “why? why? why?” column below) articulate a set of questions / needs that all OSS can work to, as opposed to replicating the technical approach that underpins ONAP.

| Phrase | Interpretation | Why? Why? Why? |
| --- | --- | --- |
| real-time | The ability to make instantaneous decisions | Why 1: To adapt to changing conditions |
| | | Why 2: To take advantage of fleeting opportunities or resolve threats |
| | | Why 3: To optimise key business metrics such as financials |
| | | Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics |
| policy-driven orchestration | To use policies to increase the repeatability of key operational processes | Why 1: Repeatability provides the opportunity to improve efficiency, quality and performance |
| | | Why 2: Allows an operator to service more customers at less expense |
| | | Why 3: Improves corporate profitability and customer perceptions |
| | | Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics |
| policy-driven automation | To use policies to increase the amount of automation that can be applied to key operational processes | Why 1: Automated processes provide the opportunity to improve efficiency, quality and performance |
| | | Why 2: Allows an operator to service more customers at less expense |
| | | Why 3: Improves corporate profitability and customer perceptions |
| physical and virtual network functions | Our networks will continue to consist of physical devices, but we will increasingly introduce virtualised functionality | Why 1: Physical devices will continue to exist into the foreseeable future but virtualisation represents an exciting approach into the future |
| | | Why 2: Virtual entities are easier to activate and manage (assuming sufficient capacity exists) |
| | | Why 3: Physical equipment supply, build, deploy and test cycles are much longer and labour intensive |
| | | Why 4: Virtual assets are more flexible, faster and cheaper to commission |
| | | Why 5: Customer services can be turned up faster and cheaper |
| software, network, IT and cloud providers and developers | With this increase in virtualisation, we find an increasingly large and diverse array of suppliers contributing to our value-chain. These suppliers contribute via software, network equipment, IT functions and cloud resources | Why 1: CSPs can access innovation and efficiency occurring outside their own organisation |
| | | Why 2: CSPs can leverage the opportunities those innovations provide |
| | | Why 3: CSPs can deliver more attractive offers to customers |
| | | Why 4: Key metrics such as profitability and customer satisfaction are enhanced |
| rapidly automate new services | We want the flexibility to introduce new products and services far faster than we do today | Why 1: CSPs can deliver more attractive offers to customers faster than competitors |
| | | Why 2: Key metrics such as market share, profitability and customer satisfaction are enhanced as well as improved cashflow |
| support complete lifecycle management | The components that make up our value-chain are changing and evolving so quickly that we need to cope with these changes without impacting customers across any of their interactions with their service | Why 1: Customer satisfaction is a key metric and a customer’s experience spans the entire lifecycle of their service |
| | | Why 2: CSPs don’t want customers to churn to competitors |
| | | Why 3: Key metrics such as market share, profitability and customer satisfaction are enhanced |
| unifying member resources | To reduce the amount of duplicated and under-synchronised development currently being done by the member bodies of ONAP | Why 1: Collaboration and sharing reduces the effort each member body must dedicate to their OSS |
| | | Why 2: A reduced resource pool is required |
| | | Why 3: Costs can be reduced whilst still achieving a required level of outcome from OSS |
| vibrant ecosystem | To increase the level of supplier interchangeability | Why 1: To reduce dependence on any supplier/s |
| | | Why 2: To improve competition between suppliers |
| | | Why 3: Lower prices, greater choice and greater innovation tend to flourish in competitive environments |
| | | Why 4: CSPs, as customers of the suppliers, benefit |
| globally shared architecture | To make networks, services and support systems easier to interconnect across the global communications network | Why 1: Collaboration on common standards reduces the integration effort between each member at points of interconnect |
| | | Why 2: A reduced resource pool is required |
| | | Why 3: Costs can be reduced whilst still achieving interconnection benefits |

As indicated in earlier posts, ONAP is an exciting initiative for the CSP industry for a number of reasons. My fear for ONAP is that it becomes such a behemoth of technical complexity that it becomes too unwieldy for use by any of the member bodies. I use the analogy of ATM versus Ethernet here, where ONAP is equivalent to ATM in power and complexity. The question is whether there’s an Ethernet answer to the whys that ONAP is trying to solve.

I’d love to hear your thoughts.

(BTW. I’m not saying that the technologies the ONAP team is investigating are the wrong ones. Far from it. I just find it interesting that the mission is starting with a technical direction in mind. I see parallels with the OSS radar analogy.)

Very little OSS data is ever actually used

We keep shiploads of data in our OSS, don’t we? Just think about how much storage your OSS estate consumes.

Technically, it doesn’t cost much (relatively) to retain all that potential for insight generation with the cost of storage diminishing. The real cost of storing the data goes a little deeper than the $/MB though. Other cost factors include data curation, cleansing, database search performance, etc.

There’s a whole field of study relating to this, named Information Lifecycle Management (ILM), but let’s look at it in terms of relevance to OSS.

We collect information across different timescales including real-time processing, short-term correlations, longer-term trending and long-term statutory / regulatory.

[Figure: Information Lifecycle Management]
Note: I suspect the “Less Archive” box actually should say “Less Active”.
Diagram above sourced from here.

But rather than blindly just storing everything, we could ask ourselves at what stage does each data sub-set lose relevance. As our OSS data ages, it can tend to deteriorate because the models it uses also deteriorate. Model deterioration factors, such as those described in this recent post about a machine-learning PoC and the following, are numerous:

  • Network devices change (including cards, naming conventions used, life-cycle upgrades, capacity, new alarm types, etc)
  • Network topologies change
  • Business processes change
  • Customer behaviours change
  • Product / Service offerings change
  • Regulations change
  • New datasets become available
  • Data model factors change to cope with gaps in original models

Each of these factors (and more) leads to deterioration in the usefulness of baseline data. This means the insight signals in the data become less clear, or at worst the baseline needs to be re-established, making old data invalid. If it’s invalid, then retention would appear to be pointless. Shifting it to the right through the storage types shown in the diagram above could also be pointless.

Very little of the OSS data you store is ever actually used, decreasingly so as it ages. Do you have a heatmap of what data you use in your OSS?
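As a rough illustration of what such a heatmap could be built from, here is a minimal sketch in Python. It assumes you can extract a query / access log that records which dataset was read and how old the record was at the time. The age bands, dataset names and log entries are all made up for the example, not taken from any real OSS.

```python
from collections import Counter

# Hypothetical age bands as (upper bound in days, label); tune to your own retention tiers.
AGE_BANDS = [(7, "0-7d"), (90, "8-90d"), (365, "91-365d")]


def age_band(age_days):
    """Return the label of the first band whose upper bound covers age_days."""
    for upper, label in AGE_BANDS:
        if age_days <= upper:
            return label
    return ">365d"


def usage_heatmap(access_log):
    """Count reads per (dataset, age band) from (dataset, age_days) tuples."""
    return Counter((dataset, age_band(age)) for dataset, age in access_log)


# Made-up access log entries, purely for illustration.
sample_log = [
    ("alarms", 1), ("alarms", 3), ("alarms", 45),
    ("performance", 2), ("performance", 400), ("inventory", 120),
]

heatmap = usage_heatmap(sample_log)
for (dataset, band), reads in sorted(heatmap.items()):
    print(f"{dataset:12} {band:8} {reads}")
```

Cells with few or no reads are the candidates for asking whether that data sub-set has already lost its relevance.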

Where are the reliability hotspots in your OSS?

As you already know, there are two categories of downtime – unplanned (eg failures) and planned (eg upgrades / maintenance).

Planned downtime sounds a lot nicer (for operators) but the reality is that you could call both types “incidents” – they both impact (or potentially impact) the customer. We sometimes underestimate that fact.

Today’s question is whether you’re able to identify where the hotspots are in your OSS suite when you combine both types of downtime. Can you tell which outages are service-impacting?

In a round-about way, I’m asking whether you already have a dashboard that monitors uptime of all the components (eg applications, probes, middleware, infra, etc) that make up your complete OSS / BSS estate? If you do, does it tell you what you anecdotally know already, or are there sometimes surprises?
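If you don’t have that dashboard yet, the underlying roll-up is not complicated. The sketch below (Python) assumes each incident record carries a component name, a planned / unplanned flag, a duration in minutes and a service-impacting flag. Those field names and the sample records are hypothetical, so treat it as a starting point rather than a finished design.

```python
from collections import defaultdict


def downtime_hotspots(incidents):
    """Sum downtime per component across planned and unplanned incidents.

    Returns (component, total_minutes, service_impacting_minutes) tuples,
    sorted so the component with the most total downtime comes first.
    """
    totals = defaultdict(lambda: {"total": 0, "impacting": 0})
    for incident in incidents:
        entry = totals[incident["component"]]
        entry["total"] += incident["minutes"]
        if incident["service_impacting"]:
            entry["impacting"] += incident["minutes"]
    rows = [(name, t["total"], t["impacting"]) for name, t in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up incident records, purely for illustration.
incidents = [
    {"component": "alarm-manager", "kind": "unplanned", "minutes": 30, "service_impacting": True},
    {"component": "alarm-manager", "kind": "planned", "minutes": 60, "service_impacting": False},
    {"component": "mediation", "kind": "planned", "minutes": 120, "service_impacting": True},
    {"component": "mediation", "kind": "unplanned", "minutes": 45, "service_impacting": True},
    {"component": "inventory", "kind": "unplanned", "minutes": 15, "service_impacting": False},
]

hotspots = downtime_hotspots(incidents)
for name, total, impacting in hotspots:
    print(f"{name:15} total={total:4} min  service-impacting={impacting:4} min")
```

Note that planned and unplanned minutes are deliberately added together here, in line with treating both as “incidents”.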

Does the data give you the evidence you need to negotiate with the implementers of problematic components (eg patch cadence, the need for reliability fixes, streamlining the patch process, reduction in customisations, etc)? Does it give you reason to make architectural changes (eg webscaling)?

Persona mapping for OSS PoCs

When selecting new applications for an OSS or to augment an existing OSS, it always makes sense to me to run a Proof of Concept. But what do we want to demonstrate in that PoC? For me, we want to run demonstrations of the factors (eg features, use-cases, processes, etc) that justify the investment.

A simple exercise you can use is to identify the personas / roles that interact with the OSS. This could include personas such as NOC operator, strategic planner, network engineer, order entry, field ops, data / analytics, application administrator, etc. The actual personas will differ within each organisation of course.

For each of those personas, we can identify and interview an individual that represents that persona.

Interview questions include:

  1. What are the key responsibilities of your role
  2. What is the most important goal / KPI for your role
  3. How does this OSS (or proposed OSS) support you meeting this goal
  4. Describe the single most important process / function that you perform using the OSS
  5. Why is it so important
  6. How often do you perform this process / function
  7. Please provide a short list of other important processes / functions you perform with this OSS

We can then build this into a matrix and seek to prioritise into a set of use-cases. Based on time and cost constraints, we can then build the top-n of those use-cases into implementation scenarios for the PoC.
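To make the matrix-and-prioritise step a little more concrete, here is a minimal Python sketch. It assumes each interview answer has been captured as a persona, a use-case, an importance rating (1 to 5) and an approximate number of uses per month. The scoring rule (importance multiplied by frequency) is just one simple assumption of mine, and the personas and numbers are invented for the example.

```python
from collections import defaultdict


def top_use_cases(responses, n):
    """Rank use-cases by summed (importance x uses per month) and return the top n.

    Each result is (use_case, score, sorted list of personas that raised it).
    """
    scores = defaultdict(int)
    personas = defaultdict(set)
    for response in responses:
        use_case = response["use_case"]
        scores[use_case] += response["importance"] * response["uses_per_month"]
        personas[use_case].add(response["persona"])
    ranked = sorted(scores, key=lambda uc: scores[uc], reverse=True)
    return [(uc, scores[uc], sorted(personas[uc])) for uc in ranked[:n]]


# Made-up interview results, purely for illustration.
responses = [
    {"persona": "NOC operator", "use_case": "alarm triage", "importance": 5, "uses_per_month": 600},
    {"persona": "network engineer", "use_case": "alarm triage", "importance": 3, "uses_per_month": 40},
    {"persona": "strategic planner", "use_case": "capacity report", "importance": 5, "uses_per_month": 4},
    {"persona": "order entry", "use_case": "service order entry", "importance": 5, "uses_per_month": 200},
]

poc_scenarios = top_use_cases(responses, n=2)
for use_case, score, raised_by in poc_scenarios:
    print(f"{use_case:20} score={score:5}  raised by: {', '.join(raised_by)}")
```

A pure frequency-weighted score will favour high-volume operational tasks, so you may want a different weighting if rare-but-critical processes (eg regulatory reporting) need to make it into the PoC.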

If your partners don’t have to talk to you then you win

“If your partners don’t have to talk to you then you win.”
Guy Lupo

Put another way, the best form of customer service is no customer service (ie your customers and/or partners are so delighted with your automated offerings that they have no reason to contact you). They don’t want to contact you anyway (generally speaking). They just want to consume a perfectly functional and reliable solution.

In the deep, distant past, our comms networks required operators. But then we developed automated dialling / switching. In theory, the network looked after itself and people made billions of calls per year unassisted.

Something happened in the meantime though. Telco operators the world over started receiving lots of calls about their platform and products. You could say that they’re unwanted calls. The telcos even have an acronym called CVR – Call Volume Reduction – that describes their ambitions to reduce the number of customer calls that reach contact centre agents. Tools such as chatbots and IVR have sprung up to reduce the number of calls that an operator fields.

Network as a Service (NaaS), the context of Guy’s comment above, represents the next new tool that will aim to drive CVR (amongst a raft of other benefits). NaaS theoretically allows customers to interact with network operators via impersonal contracts (in the form of APIs). The challenge will be in the reliability – ensuring that nothing falls between the cracks in any of the layers / platforms that combine to form the NaaS.

In the world of NaaS creation, Guy is exactly right – “If your partners [and customers] don’t have to talk to you then you win.” As always, it’s complexity that leads to gaps. The more complex the NaaS stack, the less likely you are to achieve CVR.

The OSS self-driving vehicle

I was lucky enough to get some time with a friend recently, a friend who’s running a machine-learning network assurance proof-of-concept (PoC).

He’s been really impressed with the results coming out of the PoC. However, one of the really interesting factors he’s been finding is how frequently BAU (business as usual) changes in the OSS data (eg changes in naming conventions, topologies, etc) impact results. Little changes made by upstream systems effectively invalidated the baselines that the machine-learning engines had identified and keyed in on. Those little changes meant the engine had to re-baseline / re-learn to build back up to previous insight levels. Alternatively, to avoid invalidating the baseline, all of the data from before the BAU change would need to be re-normalised.
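I don’t know how my friend’s PoC detects these changes, but one simple way to at least notice that a naming convention has shifted is to compare the identifiers seen in the baseline period with those seen now. The Python sketch below uses Jaccard similarity for that comparison. This is my own illustrative choice, and the device names and the 0.8 threshold are hypothetical.

```python
def identifier_overlap(baseline_ids, current_ids):
    """Jaccard similarity of two identifier collections (1.0 means identical sets)."""
    baseline, current = set(baseline_ids), set(current_ids)
    if not baseline and not current:
        return 1.0
    return len(baseline & current) / len(baseline | current)


def needs_rebaseline(baseline_ids, current_ids, threshold=0.8):
    """Flag a likely BAU change when overlap drops below the (assumed) threshold."""
    return identifier_overlap(baseline_ids, current_ids) < threshold


# Made-up device names: the Sydney routers have been renamed upstream.
baseline = ["MEL-RTR-01", "MEL-RTR-02", "SYD-RTR-01", "SYD-RTR-02"]
renamed = ["MEL-RTR-01", "MEL-RTR-02", "SYD-CORE-01", "SYD-CORE-02"]

print(f"overlap: {identifier_overlap(baseline, renamed):.2f}")
print(f"re-baseline needed: {needs_rebaseline(baseline, renamed)}")
```

A check like this only tells you that something changed. It doesn’t re-normalise the old data for you, which is the harder and more expensive part.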

That got me wondering whether DevOps (or any other high-change environment) might actually hinder our attempts to get machine-led assurance optimisation. But more to the point, does constant change (at all levels of a telco business) hold us back from reaching our aim of closed-loop / zero-touch assurance?

Just like the proverbial self-driving car, will we always need someone at the wheel of our OSS just in case a situation arises that the machines haven’t seen before and/or can’t handle? How far into the future will it be before we have enough trust to take our hands off the OSS wheel and let the machines drive closed-loop processes without observation by us?

Designing an Operational Domain Manager (ODM)

A couple of weeks ago, Telstra and the TM Forum held an event in Melbourne on OSS for next gen architectures.

The diagram below comes from a presentation by Corey Clinger. It describes Telstra’s Operational Domain Manager (ODM) model that is a key component of their Network as a Service (NaaS) framework. Notice the API stubs across the top of the ODM? Corey went on to describe the TM Forum Open API model that Telstra is building upon.
Operational Domain Manager (ODM)

In a following session, Raman Balla indicated a perspective that differs from many existing OSS. The service owner (and service consumer) must know all aspects of a given service (including all dimensions, lifecycle, etc) in a common repository / catalog and it needs to be attribute-based. Raman also indicated that the aim he has for architecting NaaS is to not only standardise the service, but the entire experience around the service.

In the world of NaaS, operators can no longer just focus separately on assurance or fulfillment or inventory / capacity, etc. As per DevOps, operators are accountable for everything.

The OSS Ferrari analogy

A friend and colleague has recently been talking about a Ferrari analogy on a security project we’ve been contributing to.

The end customers have decided they want a Ferrari solution, a shiny, super-specified new toy (or in this case toys!). There’s just one problem though. The customer has a general understanding of what it is to drive, but doesn’t have driving experience or a driver’s license yet (ie they have a general understanding of what they want but haven’t described what they plan to do with the shiny toys operationally once the keys are handed over).

To take a step further back, since the project hasn’t articulated exactly where the customers want to go with the solution, we’re asking whether a Ferrari is even the right type of vehicle to take them there. As amazing as Ferraris are, might it actually make more sense to buy a 4WD vehicle?

As indicated in yesterday’s post, sometimes the requirements gathering process identifies the goal-based expectations (ie the business requirements – where the customer wants to go), but often it just identifies a set of product features (ie the functional requirements such as a turbo-charged V8 engine, mid-mount engine, flappy-paddle gear change, etc, etc). The latter leads to buying a Ferrari. The former is more likely to lead to buying the vehicle best-suited to getting to the desired destination.

The OSS Ferrari sounds nice, but…