As a network owner….

….I want to make my network so observable, reliable, predictable and repeatable that I don’t need anyone to operate it.

That’s clearly a highly ambitious goal. Probably even unachievable if we say it doesn’t need anyone to run it. But I wonder whether this has to be the starting point we take on behalf of our network operator customers?

If we look at most networks, OSS, BSS, NOC, SOC, etc (I’ll call this whole stack “the black box” in this article), they’ve been designed from the ground up to be human-driven. We’re now looking at ways to automate as many steps of operations as possible.

If we were to instead design the black-box to be machine-driven, how different would it look?

In fact, before we do that, perhaps we have to take two unique perspectives on this question:

  1. Retro-fitting existing black-boxes to increase their autonomy
  2. Designing brand new autonomous black-boxes

I suspect our approaches / architectures will be vastly different.

The first will require an incredibly complex measure, command and control engine to sit over the top of the existing black box. It will probably also need to reach into many of the components that make up the black box and exert control over them. This approach has many similarities with what we already do in the OSS world. The only exception would be that we’d need to be a lot more “closed-loop” in our thinking. I should also re-iterate that this is incredibly complex because it inherits an existing “decision tree” of enormous complexity and adds further convolution.

The second approach holds a great deal more promise. However, it will require a vastly different approach on many levels:

  1. We have to take a chainsaw to the decision tree inside the black box. For example:
    • We start by removing as much variability from the network as possible. Think of this like other utilities such as water or power. Our electricity service only has one feed-type for almost all residential and business customers. Yet it still allows us great flexibility in what we plug into it. What if a network operator were to simply offer a “broadband dial-tone” service and let end users decide what they overlay on that bit-stream?
    • This reduces the “protocol stack” in the network (think of this in terms of the long list of features / tick-boxes on any router’s brochure)
    • As well as reducing network complexity, it drastically reduces the number of options an end-user needs to choose from. The operator no longer needs 50 grandfathered, legacy products
    • This also reduces the decision tree in BSS-related functionality like billing, rating, charging, clearing-house
    • We achieve a (globally?) standardised network services catalog that’s completely independent of vendor offerings
    • We achieve a more standardised set of telemetry data coming from the network
    • In turn, this drives a more standardised and minimal set of service-impact and root-cause analyses
  2. We design data input/output methods and interfaces (to the black box and to any of its constituent components) to have closed-loop immediacy in mind. At the moment we tend to have interfaces that allow us to interrogate the network and push changes into the network separately rather than tasking the network to keep itself within expected operational thresholds
  3. We allow networks to self-regulate and self-heal, not just within a node, but between neighbours without necessarily having to revert to centralised control mechanisms like OSS
  4. All components within the black-box, down to device level, are programmable. [As an aside, we need to consider how to make the physical network more programmable or reconcilable, considering that cables, (most) patch panels, joints, etc don’t have APIs. That’s why the physical network tends to give us the biggest data quality challenges, which ripples out into our ability to automate networks]
  5. End-to-end data flows (ie controls) are to be near-real-time, not constrained by processing lags (eg 15 minute poll cycles, hourly log processing cycles, etc) 
  6. Data minimalism engineering. It’s currently not uncommon for network devices to produce dozens, if not hundreds, of different metrics. Most are never used by operators manually, nor are they likely to be used by learning machines. This increases data processing, distribution and storage overheads. If we only produce what is useful, then it should improve data flow times (point 5 above). Therefore learning machines should be able to control which data sets they need from network devices and at what cadence. The learning engine can start off collecting all metrics, then progressively turn them off as it deems them unnecessary. This could also extend to controlling log-levels (ie how much granularity of data is generated for a particular log, event, performance counter)
  7. Perhaps we even offer AI-as-a-service, whereby any of the components within the black-box can call upon a centralised AI service (and the common data lake that underpins it) to assist with localised self-healing, self-regulation, etc. This facilitates closed-loop decisions throughout the stack rather than just an over-arching command and control mechanism
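To make point 6 a little more concrete, here’s a minimal sketch of a learning engine that starts by collecting every metric and then switches off the ones it never uses. Everything in it is an assumption for illustration only: the `MetricSubscription` class, the metric names and the "used fewer than N times in a window" rule are hypothetical, not a real device or telemetry API.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSubscription:
    """Tracks which metrics a (hypothetical) learning engine still wants from a device."""
    enabled: set[str]
    usage: dict[str, int] = field(default_factory=dict)

    def record_use(self, metric: str) -> None:
        # Called whenever the engine actually consumes a metric in a decision.
        self.usage[metric] = self.usage.get(metric, 0) + 1

    def prune(self, min_uses: int) -> set[str]:
        """Disable metrics used fewer than min_uses times; return what was turned off."""
        unused = {m for m in self.enabled if self.usage.get(m, 0) < min_uses}
        self.enabled -= unused
        self.usage.clear()  # start a fresh observation window
        return unused


# Start by collecting everything (metric names are made up for the example).
sub = MetricSubscription(enabled={"cpu_util", "rx_errors", "fan_rpm", "temp_c"})
for metric in ["cpu_util", "rx_errors", "cpu_util"]:
    sub.record_use(metric)

# At the end of the window, stop collecting anything that was never used.
turned_off = sub.prune(min_uses=1)
```

The same pattern could be applied to log-levels or polling cadence, although a real engine would presumably want a more cautious rule than a simple usage count before it stops listening to a metric.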

I’m barely exposing the tip of the iceberg here. I’d love to get your thoughts on what else it will take to bring fully autonomous networks to reality.

Net Simplicity Score (NSS) gets a little more complex

In last Tuesday’s post, I asked the community here on PAOSS and on TM Forum’s Engage platform for ideas about how you would benchmark complexity.

I also provided a reference to an old post that described the concept of a NSS (Net Simplicity Score) for our OSS/BSS.

Due to the complexity of factors that contribute to a complexity score, the NSS is a “catch-all” simplicity metric. Hopefully it will allow subtraction projects to be easily justified, just as the NPS (Net Promoter Score) metric has helped justify customer experience initiatives.

The NSS (Net Simplicity Score) could be further broken down into:

  • The NCSS (Net Customer Simplicity Score) – A ranking from 0 (lowest) to 10 (highest) of how easy it is to choose and use the company / product / service. This is an external metric (ie the ranking of the level of difficulty that your customers face)
  • The NOSS (Net Operator Simplicity Score) – A ranking from 0 (lowest) to 10 (highest) of how easy it is to choose and use the company / product / service. This is an internal metric (ie for operators to rank complexity of systems and their constituent applications / data / processes)
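This post doesn’t spell out how the individual 0-10 rankings get rolled up into a single number. One option, and it’s purely my assumption here, is to borrow the standard NPS arithmetic: the percentage of respondents answering 9 or 10 minus the percentage answering 0 to 6. The function name and the sample ratings below are hypothetical.

```python
def net_score(ratings: list[int]) -> float:
    """NPS-style score: % of ratings 9-10 minus % of ratings 0-6 (range -100 to +100)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Made-up survey responses, for illustration only.
customer_ratings = [9, 10, 7, 4, 8, 10]   # external view (NCSS)
operator_ratings = [3, 6, 9, 5]           # internal view (NOSS)

ncss = net_score(customer_ratings)
noss = net_score(operator_ratings)
```

A gap between the two scores is interesting in its own right: a product that customers find simple but operators find painful is exactly the situation Ronald describes below.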

One interesting item of feedback came from Ronald Hasenberger. He rightly pointed out that just because something is simple for users to interact with, doesn’t mean it’s simple behind the scenes – often exactly the opposite. The iPod example I used in earlier posts is a case in point. The iPod was more intuitive than existing MP3 players, but a huge amount of design and engineering went into making it that way. The underlying “system” certainly wasn’t simple.

So perhaps there’s a third simplicity factor to add to the two bullets listed above:

  • The NSSS (Net System Simplicity Score) – and this one does require a more sophisticated algorithm than just an aggregate of perceptions. Not only that, but it’s the one that truly reflects the systems we design and build. I wonder whether the first two are an initial set of proxies that help drive complexity out of our solutions, but we need to develop Ronald’s third one to make the biggest impact?

Again, I’d love to hear your thoughts!

The digital transformation paradox twins

There’s an old adage that “the confused mind always says no.”

Consider this from your own perspective. If you’re in a state of confusion about something, are you likely to commit wholeheartedly or will you look to delay / procrastinate?

The paradox for digital transformation is that our projects are almost always complex, but complexity breeds confusion and uncertainty. Transformation may be urgently needed, but it’s really hard to persuade stakeholders and sponsors to commit to change if they don’t have a clear picture of the way forward.

As change agents, we face another paradox. It’s our task to simplify the messaging, but our messaging should not imply that the project will be simple. That will just set unrealistic expectations for our stakeholders (“but this project was supposed to be simple,” they say).

Like all paradoxes, there’s no perfect solution. However, one technique that I’ve found to be useful is to narrow down the choices. Not by discarding them outright, but by figuring out filters – ways to quickly include or exclude branches of the decision tree.

Let’s take the example of OSS vendor selection. An organisation asks itself, “what is the best-fit OSS/BSS for our needs?” The Blue Book OSS/BSS Vendor Directory will show that there are well over 400 OSS/BSS providers to choose from. Confusion!

So let’s figure out what our needs are. We could dive into really detailed requirement gathering, but that in itself requires many complex decisions. What if we instead just use a few broad needs as our first line of filtering? We know we need an outside plant management tool. Our list of 400+ now becomes 20. There’s still confusion, but we’re now more targeted.

But 20 is still a lot to choose from. A slightly deeper level of filtering should allow us to get to a short list of 3-5. The next step is to test those 3-5 to see which does the best at fulfilling the most important needs of the organisation. Chances are that the best-fit won’t fulfil every requirement, but generally it will clearly fulfil more than any of the other alternatives. It’s best-fit, not perfect fit.
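The staged filtering above can be sketched in a few lines. This is only an illustration under my own assumptions: the `Vendor` structure, the capability tags and the vendor names are hypothetical, and a real long list would be filtered on far richer criteria than simple tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vendor:
    name: str
    capabilities: frozenset[str]


def filter_vendors(vendors: list[Vendor], required: set[str]) -> list[Vendor]:
    """Keep only vendors whose capabilities cover every required need."""
    needed = frozenset(required)
    return [v for v in vendors if needed <= v.capabilities]


# A tiny stand-in for the 400+ entries in a vendor directory.
vendors = [
    Vendor("Vendor A", frozenset({"outside_plant", "inventory", "gis"})),
    Vendor("Vendor B", frozenset({"billing", "charging"})),
    Vendor("Vendor C", frozenset({"outside_plant", "inventory"})),
    Vendor("Vendor D", frozenset({"assurance"})),
]

# First filter: one broad need. Second filter: a slightly deeper need.
long_list = filter_vendors(vendors, {"outside_plant"})
short_list = filter_vendors(long_list, {"outside_plant", "gis"})
```

Each filter is cheap to apply and easy to explain to stakeholders, which is the point: the decision gets simpler even though the project doesn’t.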

We haven’t made the project less complex, but we have simplified the decision. We’ve arrived at the “best” option, so the way forward should be clear, right?

Unfortunately, it’s not always that easy. Even though the best way forward has been identified, there are still uncertainties in the minds of stakeholders caused purely by the complexity of the upcoming project. I’ve seen examples where the choice of vendor has been clear, with the best-fit clearly surpassing the next-best, but the buyer is still indecisive. I completely get it. Our task as change agents is to reduce doubts and increase transformation confidence.

What will get your CEO fired? (part 4)

In Monday’s article, we suggested that the three technical factors that could get the big boss fired are probably only limited to:

  1. Repeated and/or catastrophic failure (of network, systems, etc)
  2. Inability to serve the market (eg offerings, capacity, etc)
  3. Inability to operate network assets profitably

In that article, we looked closely at a human factor and how current trends of open-source, Agile and microservices might actually exacerbate it. In yesterday’s article we looked at market-serving factors for us to investigate and monitor.

But let’s look at point 3 today. The profitability factors we could consider that reduce the chances of the big boss getting fired are:

  1. Ability to see revenues in near-real-time (revenues are relatively easy to collect, so we use these numbers a lot. Much harder are profitability measures because of the shared allocation of fixed costs)

  2. Ability to see cost breakdown (particularly which parts of the technical solution are most costly, such as what device types / topologies are failing most often)

  3. Ability to measure profitability by product type, customer, etc

  4. Are there more profitable or cost-effective solutions available

  5. Is there greater profitability that could be unlocked by simplification

What will get your CEO fired? (part 3)

In Monday’s article, we suggested that the three technical factors that could get the big boss fired are probably only limited to:

  1. Repeated and/or catastrophic failure (of network, systems, etc)
  2. Inability to serve the market (eg offerings, capacity, etc)
  3. Inability to operate network assets profitably

In that article, we looked closely at a human factor and how current trends of open-source, Agile and microservices might actually exacerbate it. In yesterday’s article we looked at the broader set of catastrophic failure factors for us to investigate and monitor.

But let’s look at some of the broader examples under point 2 today. The market-serving factors we could consider that reduce the chances of the big boss getting fired are:

  1. Immediate visibility of key metrics by boss and execs (what are the metrics that matter, eg customer numbers, ARPU, churn, regulatory, media hot-buttons, network health, etc)

  2. Response to “voice of customer” (including customer feedback, public perception, etc)

  3. Human resources (incl up-skill for new tech, etc)

  4. Ability to implement quickly / efficiently

  5. Ability to handle change (to network topology, devices/vendors, business products, systems, etc)

  6. Measuring end-to-end user experience, not just “nodal” monitoring

  7. Scalability / Capacity (ability to serve customer demand now and into a foreseeable future)

What will get your CEO fired? (part 2)

In Monday’s article, we suggested that the three technical factors that could get the big boss fired are probably only limited to:

  1. Repeated and/or catastrophic failure (of network, systems, etc)
  2. Inability to serve the market (eg offerings, capacity, etc)
  3. Inability to operate network assets profitably

In that article, we looked closely at a human factor and how current trends of open-source, Agile and microservices might actually exacerbate it.

But let’s look at some of the broader examples under point 1 today. The failure factors we could consider that might result in the big boss getting fired are:

  1. Availability (nodal and E2E)

  2. Performance (nodal and E2E)

  3. Security (security trust model – cloud vs corporate vs active network and related zones)

  4. Remediation times, systems & processes (Assurance), particularly effectiveness of process for handling P1 (Priority 1) incidents

  5. Resilience Architecture

  6. Disaster Recovery Plan (incl Backup and Restore process, what black-swan events the organisation is susceptible to, etc)

  7. Supportability and Maintenance Routines

  8. Change and Release Management approaches

  9. Human resources (incl business continuity risk of losing IP, etc)

  10. Where are the SPoFs (Single Points of Failure)

We should note too that these should be viewed through two lenses:

  • The lens of the network our OSS/BSS is managing and
  • The lens of the systems (hardware/software/cloud) that make up our OSS/BSS

A lighter-touch OSS procurement approach (part 3)

We’ve spoken at length about TM Forum’s, “Time to kill the RFP? Reinventing IT procurement for the 2020s,” report so far this week. We’ve also spoken about the feeling that the OSS/BSS RFP (Request For Proposal) still has relevance in some situations… as long as it’s more of a lighter-touch than most. We’ve spoken about a more pragmatic approach that aims to find best available fit (for key objectives through stages of filtering) rather than perfect fit (for all requirements through detailed analyses). And I should note that “best available fit” includes measurement against these three contrarian procurement KPIs ahead of the traditional ones.

Yesterday’s post discussed how we get to a short list with minimal involvement of buyers and sellers, with the promise that we’d discuss the detailed analysis stage today.

It’s where we do use an RFP, but with thought given to the many pain-points cited so brilliantly by Mark Newman and team in the abovementioned TM Forum report.

The RFP provides the mechanism to firm up pricing and architecture, but is also closely tied to a PoC (Proof of Concept) demonstration. The RFP helps to prioritise the order in which PoCs are performed. PoCs tend to be very time consuming for buyer and seller. So if there’s a clear leader from the paper studies so far, then they will demonstrate first.

If there’s not a clear difference, or if the prime candidate’s demonstration identified significant gaps, then additional PoCs are run.

And to ensure the PoCs are run against the objectives that matter most, we use scenarios that were prioritised during part 1 of this series.

Next steps are to form the more detailed designs, commercials / contracts and ratify that the business case still holds up.

In yesterday’s post, I also promised to share our “starting-point” procurement methodology. I say starting point because each buyer situation is different and we tend to customise it to each buyer’s needs. It’s useful for starting discussions.

The overall methodology diagram is shown below:

PAOSS vendor selection process

A few key notes here:

  1. The process looks much heavier than it really is… if you use traditional procurement processes as an indicator
  2. We have existing templates for all the activities marked in yellow
  3. The activity marked in blue partially represents the project we’re getting really excited to introduce to you tomorrow

 

A lighter-touch OSS procurement approach (part 2)

Yesterday’s post described the approach to get from 400+ possible OSS/BSS suppliers/products down to a more manageable list without:

  1. Having to get into significant discussions with vendors (yet)
  2. Gathering all your stakeholders together to prepare a detailed list of requirements

We’ll call this “the long list,” which might consist of 5-20 suppliers. We use this evaluation technique (which we’ll share more about on Monday) to ensure we’ve looked at the broad market of suppliers rather than just the few the buyer already knows.

The next step we follow helps us to get to a much smaller list, which we’ll call “the short list.”

For this, we do need to contact vendors (the long list) and we do need to prepare a list of requirements to add to the objectives and key workflows we’ve previously identified. The requirements won’t need to be detailed, but will still probably number into the 100s – some from our pick-list, others customised to each client’s needs.

Then we engage in what we refer to as an EOI (Expression of Interest) phase. Our EOIs are not just a generic market capability analysis like many buyers conduct. Ours seek indicative vendor compliance (to objectives and requirements) and indicative pricing based on the dimensions we supply. We’ve refined this model over the years to make it quite quick and (relatively) easy for vendors to respond to.

Using compliance to measure suitability and indicative pricing to plug in to our long-term TCO (Total Cost of Ownership) model, the long list usually becomes a clear short list of 1-5 very quickly.
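As a rough illustration of how compliance and indicative pricing might be combined, here’s a minimal sketch. It is not our actual model: the requirement IDs, weights, prices and the simple undiscounted TCO formula are all assumptions made up for the example.

```python
def compliance_score(responses: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-requirement compliance (0.0 = none, 1.0 = full)."""
    total_weight = sum(weights.values())
    # A requirement the vendor did not answer is treated as non-compliant.
    return sum(w * responses.get(req, 0.0) for req, w in weights.items()) / total_weight


def tco(upfront: float, annual: float, years: int) -> float:
    """Undiscounted total cost of ownership over the evaluation period."""
    return upfront + annual * years


# Hypothetical requirements: R1 matters three times as much as R2.
weights = {"R1": 3.0, "R2": 1.0}

vendor_x = compliance_score({"R1": 1.0, "R2": 0.5}, weights)
vendor_y = compliance_score({"R1": 0.5}, weights)

vendor_x_tco = tco(upfront=100_000, annual=20_000, years=5)
```

A real TCO model would include many more cost lines (and probably discounting), but even this level of arithmetic is often enough to separate a long list into clear leaders and laggards.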

Now we can get into detailed discussions with a very small number of best-fit suppliers without having wasted much time of buyer or seller. 

More on the detailed discussions tomorrow!

A lighter-touch OSS procurement approach (part 1)

You may have noticed that we’ve run a series of posts about OSS/BSS procurement, and about the RFP process by association.

One of the first steps in the traditional procurement process is preparing a strategy and detailed set of requirements.

As TM Forum’s, “Time to kill the RFP? Reinventing IT procurement for the 2020s,” report describes:
“Before an RFP can be issued, the CSP’s IT or network team must produce a document detailing the strategy for implementing a technology or delivering a service, which is a lengthy process because of the number of stakeholders involved and the need to describe requirements in a way that satisfies them all.”

The problem with most requirements documents, the ones I’ve seen at least, is that they tend to get down into a deep, deep level of detail. And when it’s down in that level of detail, contrasting opinions from different stakeholders can make it really difficult to reach agreement. Have you ever been in a room with many high-value (and high cost) stakeholders spending days debating the semantics (and wording) of requirements? Every stakeholder group needs a say and needs to be heard.

The theory is that you need a great level of detail to evaluate supplier offerings for best-fit. Well, maybe, but not in the initial stages.

First things first – I seek to find out what’s really important for the organisation. That rarely comes from a detailed requirements spreadsheet, but by determining the things that are done most often and/or add the most value to the buyer’s organisation. I use persona mapping, long-tail and perhaps whale-curve mapping approaches to determine this.

Persona mapping means identifying all the groups within the buyer’s organisation that need to interact with the OSS/BSS (current and proposed). Then sitting with each group to determine what they need to achieve, who they need to interact with and what their workflows look like. That also gives a chance for all groups to be heard.

From this, we can collaboratively determine some high-level evaluation criteria, maybe only 15-20 to start with. You’d be surprised at how quickly these 15-20 criteria can help with initial supplier filtering.

Armed with the initial 15-20 evaluation criteria and the project we’re getting excited to launch on Monday, we can get to a relevant list of possible suppliers quite quickly. It allows us to do a broad market search to compile a list of suppliers, not just from the 5-10 suppliers the buyer already knows about, but from the 400+ suppliers/products available on the market. And we don’t even have to ask the suppliers to fill out any lengthy requirement response spreadsheets / forms yet.

We’ll continue the discussion over the next two days. We’ll also share our procurement methodology pack on Sunday.

OSS that make men feel more masculine and in command

“From watching ESPN, I’d learned about the power of information bombardment. ESPN strafes its viewers with an almost hysterical amount of data and details. Scrolling boxes. Panels. Bars. Graphics. Multi-angle camera perspectives. When exposed to a surfeit of data, men tend to feel more masculine and in command. Do most men bother to decipher these boxes, panels, bars and graphics? No – but that’s not really the point.”
Martin Lindstrom, in his book, “Small Data.”

I’ve just finished reading Small Data, a fascinating book that espouses forensic analysis of the lives of users (ie small data) rather than using big data methods to identify market opportunities. I like the idea of applying both approaches to our OSS products. After all, we need to make them more intuitive, endearing and ultimately, effective.

The quote above struck a chord in particular. Our OSS GUIs (user interfaces) can tend towards the ESPN model can’t they? The following paraphrasing doesn’t seem completely at odds with most of the OSS that we interact with – “[the OSS] strafes its viewers with an almost hysterical amount of data and details.”

And if what Lindstrom says is an accurate psychological analysis, does it mean:

  1. The OSS GUIs we’re designing help make their developers “feel more masculine and in command” or
  2. Our OSS operators “feel more masculine and in command” or
  3. Both

Intriguingly, does the feeling of being more masculine and in command actually help or hinder their effectiveness?

I find it fascinating that:

  1. Our OSS/BSS form a multi-billion-dollar industry
  2. Our OSS/BSS are the beating heart of the telecoms industry, being wholly responsible for operationalising the network assets that so much capital is invested in
  3. So little effort is invested in making the human-to-OSS interface far more effective than it is today
  4. I keep hearing operators bemoan the complexities and challenges of wrangling their OSS, yet only hear “more functionality” being mentioned by vendors, never “better usability”

Maybe the last point comes from me being something of a rarity. Almost every one of the thousands of people I know in OSS either works for the vendor/supplier or the customer/operator. Conversely, I’ve represented both sides of the fence and often even sit in the middle acting as a conduit between buyers and sellers. Or am I just being a bit precious? Do you also spot the incongruence of point 4 on a regular basis?

Whether you’re buy-side or sell-side, would you love to make your OSS more effective? Let us know and we can discuss some of the optimisation techniques that might work for you.

Going to the OSS zoo

“There’s the famous quote that if you want to understand how animals live, you don’t go to the zoo, you go to the jungle. The Future Lab has really pioneered that within Lego, and it hasn’t been a theoretical exercise. It’s been a real design-thinking approach to innovation, which we’ve learned an awful lot from.”
Jorgen Vig Knudstorp.

This quote prompted me to ask the question – how many times during OSS implementations had I sought to understand user behaviour at the zoo versus the jungle?

By that, how many times had I simply spoken with the user’s representative on the project team rather than directly with end users? What about the less obvious personas as discussed in this earlier post about user personas? Had I visited the jungles where internal stakeholders like project sponsors, executives, data consumers, etc. or external stakeholders such as end-customers, regulatory bodies, etc go about their daily lives?

I can truthfully, but regretfully, say I’ve spent far more time at OSS zoos than in jungles. This is something I need to redress.

But, at least I can claim to have spent most time in customer-facing roles.

Too many of the product development teams I’ve worked closely with don’t even visit OSS zoos let alone jungles in any given year. They never get close to observing real customers in their native environments.

 

OSS Persona 10:10:10 Mapping

We sometimes attack OSS/BSS planning at a quite transactional level. For example, think about the process of gathering detailed requirements at the start of a project. They tend to be detailed and transactional don’t they? This type of requirement gathering is more like the WHAT and HOW rings in Simon Sinek’s Golden Circle.

Just curious, do you have a persona map that shows all of the different user groups that interact with your OSS/BSS?
More importantly, do you deeply understand WHY they interact with your OSS/BSS? Not just on a transaction-by-transaction level, but in the deeper context of how the organisation functions? Perhaps even on a psychological level?

If you do, you’re in a great position to apply the 10:10:10 mapping rule. That is, to describe how you’re adding value to each user group 10 minutes from now, 10 days from now and 10 months from now…

OSS Persona 10:10:10 Mapping

The mapping table could describe the current state (ie how your OSS/BSS is currently adding value), or serve as a planning mechanism for the future (ie how your OSS/BSS can add value in the future).
This mapping table can act as a guide for the evolution of your solution.

I should also point out that the diagram above only shows a sample of the internal personas that directly interact with your OSS/BSS. But I’d encourage you to look further. There are other personas that have direct and indirect engagement with your OSS/BSS. These include internal stakeholders like project sponsors, executives, data consumers, etc. They also include external stakeholders such as end-customers, regulatory bodies, etc.

If you need assistance to unlock your current state through persona mapping, real process mapping, etc and then planning out your target-state, Passionate About OSS would be delighted to help.

I’m really excited by a just-finished OSS analysis (part 3)

This is the third part of a series describing a really exciting analysis I’ve just finished.

Part 1 described how we can turn simple log files into a Sankey diagram that shows real-life process flows (not just a theoretical diagram drawn by BAs and SMEs), like below:

Part 2 described how the logs are broken down into a decision tree and how we can assign weightings to each branch based on the data stored in the logs, as below:
OSS Decision Tree Analysis

I’ve already had lots of great feedback in relation to the Part 1 blog, especially from people who’ve had challenges capturing as-is process. The feedback has been greatly appreciated so I’m looking forward to helping them draw up their flow-charts on the way to helping optimise their process flows.

But that’s just the starting point. Today’s post is where things get really exciting (for me at least). Today we build on part 2 and not just record weightings, but use them to assist future decisions.

We can use the decision tree to “predict forward” and help operators / algorithms make optimal decisions whilst working towards process completion. We can use a feedback loop to steer an operator (or application) down the most optimal branches of the tree (and/or avoid the fall-out variants).

This allows us to create a closed-loop, self-optimising, Decision Support System (DSS), as follows:

Note: Diagram sourced from https://passionateaboutoss.com/closing-the-loop-to-make-better-decisions, where further explanation is provided

Using log data alone, we can perform decision optimisation based on “likelihood of success” or “time to complete” as per the weightings table. If supplemented with additional data, the weightings table could also allow decisions to be optimised by “cost to complete” or many other factors.
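To illustrate the “predict forward” idea, here’s a minimal sketch of a weightings table being used to recommend the next branch from a given decision point. The structure of the table, the field names and the numbers are all hypothetical; the real weightings table in the analysis may well look different.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    count: int            # instances that took this branch
    successes: int        # of those, how many went on to complete
    total_minutes: float  # summed time-to-complete of the successful instances

    @property
    def success_rate(self) -> float:
        return self.successes / self.count if self.count else 0.0

    @property
    def mean_minutes(self) -> float:
        return self.total_minutes / self.successes if self.successes else float("inf")


def recommend(weightings: dict[str, dict[str, BranchStats]], state: str,
              objective: str = "likelihood_of_success") -> str:
    """Pick the next state from `state` that best meets the chosen objective."""
    branches = weightings[state]
    if objective == "likelihood_of_success":
        return max(branches, key=lambda b: branches[b].success_rate)
    if objective == "time_to_complete":
        return min(branches, key=lambda b: branches[b].mean_minutes)
    raise ValueError(f"unknown objective: {objective}")


# Made-up weightings for a single decision point with two outgoing branches.
weightings = {
    "DP1": {
        "DP2": BranchStats(count=100, successes=90, total_minutes=450.0),
        "DP3": BranchStats(count=40, successes=30, total_minutes=60.0),
    }
}

safest = recommend(weightings, "DP1")
fastest = recommend(weightings, "DP1", objective="time_to_complete")
```

Adding a “cost to complete” objective would just mean carrying one more column in the table and one more branch in `recommend`.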

The model has the potential to be used in “real-time” mode, using the constant stream of process logs to continually refine and adapt. For example:

  • If the long-term average of a process path is 1 minute, but there’s currently a problem with it and that path is failing, then another path (one that is otherwise slightly less optimised over the long-term), could be used until the first path is repaired
  • An operator happens to choose a new, more optimal path than has ever been identified previously (the delta function in the diagram). It then sets a new benchmark and informs the new approach via the DSS (Darwinian selection)
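Both behaviours can be sketched with a small rolling window of recent outcomes per path. This is a simplified illustration rather than the DSS itself: the `PathSelector` class, the window size, the failure limit and the path names are all assumptions, and a newly discovered path is registered from a single observation here, which a real system would treat with more caution.

```python
from collections import deque
from typing import Optional


class PathSelector:
    """Prefers the fastest path long-term, but steers around paths that are failing right now."""

    def __init__(self, long_term_minutes: dict[str, float],
                 window: int = 5, failure_limit: int = 3) -> None:
        self.long_term_minutes = dict(long_term_minutes)
        self.window = window
        self.failure_limit = failure_limit
        self.recent: dict[str, deque] = {}

    def record(self, path: str, succeeded: bool, minutes: Optional[float] = None) -> None:
        self.recent.setdefault(path, deque(maxlen=self.window)).append(succeeded)
        if succeeded and minutes is not None and path not in self.long_term_minutes:
            # A path never seen before: register it as a candidate benchmark.
            self.long_term_minutes[path] = minutes

    def is_healthy(self, path: str) -> bool:
        failures = sum(1 for ok in self.recent.get(path, ()) if not ok)
        return failures < self.failure_limit

    def best_path(self) -> str:
        healthy = [p for p in self.long_term_minutes if self.is_healthy(p)]
        candidates = healthy or list(self.long_term_minutes)
        return min(candidates, key=self.long_term_minutes.get)


selector = PathSelector({"path_a": 1.0, "path_b": 1.5})
before = selector.best_path()

for _ in range(3):                      # path_a starts failing
    selector.record("path_a", succeeded=False)
during = selector.best_path()

for _ in range(5):                      # path_a is repaired
    selector.record("path_a", succeeded=True)
after = selector.best_path()

selector.record("path_c", succeeded=True, minutes=0.8)  # operator finds a faster path
new_best = selector.best_path()
```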

If you’re wondering how the DSS could be implemented, I can envisage a few ways:

  1. Using existing RPA (Robotic Process Automation) tools [which are particularly relevant if the workflow box in the diagram above crosses multiple different applications (not just a single monolithic OSS/BSS)]
  2. Providing a feedback path into the functionality of the OSS/BSS and its GUI
  3. Via notifications (eg email, Slack, etc) to operators
  4. Via a simple, more manual process like flow diagrams, work instructions, scorecards or similar
  5. You can probably envisage other methods

I’m really excited by a just-finished OSS analysis (part 2)

As the title suggests, this is the second part in a series describing a process flow visualisation, optimisation and decision support methodology that uses simple log data as input.

Yesterday’s post, part 1 in the series, showed the visualisation aspect in the form of a Sankey flow diagram.

This visualisation is exciting because it shows how your processes are actually flowing (or not), as opposed to the theoretical process diagrams that are laboriously created by BAs in conjunction with SMEs. It also shows which branches in the flow are actually being utilised and where inefficiencies are appearing (and are therefore optimisation targets).

Some people have wondered how simple activity logs can be used to show the Sankey diagrams. Hopefully the diagram below helps to describe this. You scan the log data looking for variants / patterns of flows and overlay those onto a map of decision states (DPs). In the diagram above, there are only 3 DPs, but 303 different variants (sounds implausible, but there are many variants that do multiple loops through the 3 states and are therefore considered to be a different variant).

OSS Decision Tree Analysis

The numbers / weightings you see on the Sankey diagram are the number* of instances (of a single flow type) that have transitioned between two DPs / states.

* Note that this is not the same as the count value that appears in the Weightings table. We’ll get to that in tomorrow’s post when we describe how to use the weightings data for decision support.
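For those who prefer code to diagrams, here's a minimal sketch of the scanning step. The log rows, state names and function names are made up for illustration, and it assumes each row carries a flow instance identifier, an activity name and a start timestamp. One further assumption: a looping instance contributes to a transition count once per traversal, which may or may not match how your own tooling counts.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (flow_instance_id, activity, start_timestamp)
LOG = [
    ("ord-1", "A", 1), ("ord-1", "B", 2), ("ord-1", "C", 3),
    ("ord-2", "A", 1), ("ord-2", "B", 2), ("ord-2", "A", 3),
    ("ord-2", "B", 4), ("ord-2", "C", 5),
    ("ord-3", "A", 1), ("ord-3", "C", 2),
]


def build_paths(rows):
    """Group rows by flow instance and return each instance's time-ordered path."""
    by_instance = defaultdict(list)
    for instance_id, activity, started in rows:
        by_instance[instance_id].append((started, activity))
    return {
        instance_id: tuple(activity for _, activity in sorted(steps))
        for instance_id, steps in by_instance.items()
    }


def count_variants(paths):
    """A variant is a distinct sequence of states; count instances per variant."""
    return Counter(paths.values())


def count_transitions(paths):
    """Count state-to-state hops: the link weightings for a Sankey diagram."""
    transitions = Counter()
    for path in paths.values():
        transitions.update(zip(path, path[1:]))
    return transitions


paths = build_paths(LOG)
variants = count_variants(paths)
transitions = count_transitions(paths)
```

Even this toy log with 3 states and 3 instances yields 3 variants, which hints at how a real data set can balloon into hundreds.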

I’m really excited by a just-finished OSS analysis

In your travels, I don’t suppose you’ve ever come across anyone having challenges to capture and/or optimise their as-is OSS/BSS process flows? Once or twice?? 🙂

Well I’ve just completed an analysis that I’m really excited about. It’s something I’ve been thinking about for some time, but have just finished proving on the weekend. I thought it might have relevance to you too. It quickly helps to visualise as-is processes and identify areas to optimise.

The method takes activity logs (eg from OSS, ITIL, WFM, SAP or similar) and turns them into a process diagram (a Sankey diagram) like below with real instance volumes. Much better than a theoretical process map designed by BAs and SMEs don’t you think?? And much faster and more accurate too!!

OSS Sankey process diagram

A theoretical process map might just show a sequence of 3 steps, but the diagram above has used actual logs to show what’s really occurring. It highlights governance issues (skipped steps) and inefficiencies (ie the various loops) in the process too. Perfect for process improvement.
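As a rough sketch of how those two findings could be flagged programmatically, assume each flow instance has already been reduced to an ordered tuple of activity names. The `EXPECTED` sequence and the function names below are hypothetical.

```python
# Hypothetical theoretical process: the 3 steps a BA might have drawn.
EXPECTED = ("A", "B", "C")


def has_loop(path):
    """True if any activity is visited more than once (an inefficiency)."""
    return len(set(path)) < len(path)


def skipped_steps(path, expected=EXPECTED):
    """Expected steps that never appear in the path (a governance issue)."""
    return [step for step in expected if step not in path]


def classify(path, expected=EXPECTED):
    """Label a path with every issue found, or 'as designed' if none."""
    issues = []
    if skipped_steps(path, expected):
        issues.append("skipped: " + ", ".join(skipped_steps(path, expected)))
    if has_loop(path):
        issues.append("loop")
    return issues or ["as designed"]


print(classify(("A", "B", "A", "B", "C")))
print(classify(("A", "C")))
```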

But more excitingly, it proves a path towards real-time “predict-forward” decision support without having to get into the complexities of AI. More has been included in the analysis!

If this is of interest to you, let me know and I’ll be happy to walk you through the full analysis. Or if you want to know how your real as-is processes perform, I’d be happy to help turn your logs into visuals like the one above.

PS1. You might think you need a lot of fields to prepare the diagrams above. The good news is the only mandatory fields would be something like:

  1. Flow type – eg Order type, project type or similar (only required if the extract contains multiple flow types mixed together. The diagram above represents just one flow type)
  2. Flow instance identifier – eg Order number, project number or similar (the diagram above was based on data that had around 600,000 flow instances)
  3. Activity identifier – eg Activity name (as per the 3 states in the diagram above), recorded against each flow instance. Note that they will ideally be an enumerated list (ie from a finite pick-list)
  4. Timestamps – Start/end timestamp on each activity instance

If the log contains other details, such as the name of the operator who completed each activity, that can help add richness, but it’s not mandatory.
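If it helps, the field list above could be expressed as a simple record type with a loader that rejects extracts missing the mandatory columns. This is only a sketch under my own assumptions: the column names, the CSV format and the ISO-style timestamps are illustrative, and your extract will almost certainly name things differently.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ActivityRecord:
    flow_instance_id: str             # eg order number
    activity: str                     # ideally from a finite pick-list
    started: datetime
    ended: datetime
    flow_type: Optional[str] = None   # only needed if flow types are mixed
    operator: Optional[str] = None    # optional richness


# Hypothetical column names for the mandatory fields.
REQUIRED = ("flow_instance_id", "activity", "started", "ended")


def parse_log(text):
    """Parse CSV text into ActivityRecord objects, checking mandatory columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    return [
        ActivityRecord(
            flow_instance_id=row["flow_instance_id"],
            activity=row["activity"],
            started=datetime.fromisoformat(row["started"]),
            ended=datetime.fromisoformat(row["ended"]),
            flow_type=row.get("flow_type") or None,
            operator=row.get("operator") or None,
        )
        for row in reader
    ]


SAMPLE = """flow_instance_id,activity,started,ended
ord-1,A,2020-01-01T09:00:00,2020-01-01T09:30:00
ord-1,B,2020-01-01T09:30:00,2020-01-01T11:00:00
"""

records = parse_log(SAMPLE)
```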

PS2. The main objective of the analysis was to test concepts raised in the following blog posts:

The OSS “out of control” conundrum

Over the years in OSS, I’ve spent a lot of my time helping companies create their OSS / BSS strategies and roadmaps. Sometimes clients come from the buy side (eg carriers, utilities, enterprise), other times clients come from the sell side (eg vendors, integrators). There’s one factor that seems to be most commonly raised by these clients, and it comes from both sides.

What is that one factor? Well, we’ll come back to what that factor is a little later, but let’s cover some background first.

OSS / BSS covers a fairly broad estate of functionality:
OSS and BSS overlaid onto the TAM

Even if only covering a simplified version of this map, very few suppliers can provide coverage of the entire estate. That implies two things:

  1. Integrations; and
  2. Relationships

If you’re from the buy-side, you need to manage both to build a full-function OSS/BSS suite. If you’re from the sell-side, you’re either forced into dealing with both (reactive) or sometimes you can choose to develop those to bring a more complete offering to market (proactive).

You will have noticed that both are double-ended. Integrations bring two applications / functions together. Relationships bring two organisations together.

This two-ended concept means there’s always a “far-side” that’s outside your control. It’s in our nature to worry about what’s outside our control. We tend to want to put controls around what we can’t control. Not only that, but it’s incumbent on us as organisation planners to put mitigation strategies in place.

Which brings us back to the one factor that is raised by clients on most occasions – substitution – how do we minimise our exposure to lock-in with an OSS product / service partner/s if our partnership deteriorates?

Well, here are some thoughts:

  1. Design your own architecture with product / partner substitution in mind (and regularly review your substitution plan because products are always evolving)
  2. Develop multiple integrations so that you always have active equivalency. This is easier for sell-side “reactives” because their different customers will have different products to integrate to (eg an OSS vendor that is able to integrate with four different ITSM tools because they have different customers with each of those variants)
  3. Enhance your own offerings so that you no longer require the partnership, but can do it yourself
  4. Invest in your partnerships to ensure they don’t deteriorate. This is the OSS marriage analogy where ongoing mutual benefits encourage the relationship to continue.
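To illustrate the first two thoughts, here's a minimal sketch of designing for substitution at the integration layer. Everything in it is hypothetical (the `TicketingAdapter` contract, the two vendor adapters and their ticket number formats). The point is only that the rest of the stack depends on a contract you own rather than on any one partner's product.

```python
from typing import Dict, List, Protocol


class TicketingAdapter(Protocol):
    """The only ticketing contract the rest of the stack is allowed to depend on."""

    def create_ticket(self, summary: str) -> str:
        ...


class VendorAAdapter:
    """Hypothetical adapter for one ITSM product."""

    def __init__(self) -> None:
        self.tickets: Dict[str, str] = {}

    def create_ticket(self, summary: str) -> str:
        ticket_id = f"A-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = summary
        return ticket_id


class VendorBAdapter:
    """Hypothetical adapter for a second ITSM product with its own numbering."""

    def __init__(self) -> None:
        self.incidents: List[str] = []

    def create_ticket(self, summary: str) -> str:
        self.incidents.append(summary)
        return f"INC{len(self.incidents):05d}"


def raise_fault_ticket(itsm: TicketingAdapter, alarm_text: str) -> str:
    """OSS-side logic: unaware of which ITSM product sits behind the adapter."""
    return itsm.create_ticket(f"Network fault: {alarm_text}")


# Swapping partners means swapping one adapter, not rewriting the caller.
print(raise_fault_ticket(VendorAAdapter(), "LOS on port 1"))
print(raise_fault_ticket(VendorBAdapter(), "LOS on port 1"))
```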

Stealing fire for OSS

I’ve recently started reading a book called Stealing Fire: How Silicon Valley, the Navy SEALs, and Maverick Scientists Are Revolutionizing the Way We Live and Work. To completely over-generalise the subject matter, it’s about finding optimal performance states, aka finding flow. Not the normal topic of conversation for here on the PAOSS blog!!

However, the book’s content has helped to make the link between flow and OSS more palpable than you might think.

In the early days of working on OSS delivery projects, I found myself getting into a flow state on a daily basis – achieving more than I thought I was capable of, learning more effectively than I thought possible and completely losing track of time. In those days of project delivery, I was lucky enough to get hours at a time without interruptions, to focus on what was an almost overwhelming list of tasks to be done. Over the first 5-ish years in OSS, I averaged an 85 hour week because I was just so absorbed by it. It was the source of my passion for OSS. Or was it??

The book now has me pondering a chicken or egg conundrum – did I become so passionate about OSS that I could get into a state of flow or did I only become passionate about OSS because I was able to readily get into a state of flow with it? That’s where the book provides the link between getting in the zone and the brain chemicals that leave us with a feeling of ecstasis or happiness (not to mention the addictive nature of it). The authors describe this state of consciousness as Selflessness, Timelessness, Effortlessness, and Richness, or STER for short. OSS definitely triggered STER for me, but chicken or egg??

Having spent much of the last few years embedded in big corporate environments, I’ve found a decreased ability to get into the same flow state. Meetings, emails, messenger pop-ups, distractions from surrounding areas in open-plan offices, etc. They all interrupt. It’s left me with a diminishing opportunity to get in the zone. With that has come a growing unease and sense of sub-optimal productivity during “office hours.” It was increasingly disheartening that I could generally only get into the zone outside office hours. For example, whilst writing blogs on the train-trip or in the hours after the rest of my family was asleep.

Since making the concerted effort to leave that “office state,” I’ve been both surprised and delighted at the increased productivity. Not just that, but the ability to make better lateral connections of ideas and to learn more effectively again.

I’d love to hear your thoughts on this in the comments section below. Some big questions for you:

  1. Have you experienced a similar productivity gap between “flow state” and “office state” on your OSS projects?
  2. Have you had the same experience as me, where modern ways of working seem to be lessening the long chunks of time required to get into flow state?
  3. If yes, how can our sponsor organisations and our OSS products continue to progress if we’re increasingly working only in office state?

Step-by-step guide to build a systematic root-cause analysis (RCA) pipeline

Fault / Alarm management tools have lots of strings to their functionality bows to help operators focus on the target/s that matter most. ITU-T’s recommendation X.733 provided an early framework and common model for classification of alarms. This allowed OSS vendors to build a standardised set of filters (eg severity, probable cause, etc). ITU-T’s recommendation M.3703 then provided a set of guiding use cases for managing alarms. These recommendations have been around for a long time (X.733 since the early 1990s).

Despite these “noise reduction” tools being readily available, they’re still not “compressing” event lists enough in all cases.

I imagine, like me, you’ve heard many customer stories where so many new events are appearing in an event list each day that the NOC (network operations centre) just can’t keep up. Dozens of new events are appearing on the screen, then scrolling off the bottom of it before an operator has even had a chance to stop and think about a resolution.

So if humans can’t keep up with the volume, we need to empower machines with their faster processing capabilities to do the job. But to do so, we first have to take a step away from the noise and help build a systematic root-cause analysis (RCA) pipeline.

I call it a pipeline because there are generally a lot of RCA rules that are required. There are a few general RCA rules that can be applied “out of the box” on a generic network, but most need to be specifically crafted to each network.
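To give a feel for what a single rule in that pipeline might look like, here's a minimal sketch of a topology-based rule. The `PARENT` map, device names and function names are all hypothetical. It assumes you have a simple "depends on" relationship per device and treats the highest contiguous alarmed device upstream as the probable root cause.

```python
# Hypothetical topology: child device -> upstream parent it depends on.
PARENT = {
    "access-1": "agg-1",
    "access-2": "agg-1",
    "agg-1": "core-1",
    "access-3": "agg-2",
    "agg-2": "core-1",
}


def upstream_chain(device, parent=PARENT):
    """Yield each device upstream of `device`, nearest first (cycle-safe)."""
    seen = set()
    while device in parent and device not in seen:
        seen.add(device)
        device = parent[device]
        yield device


def root_causes(alarmed, parent=PARENT):
    """Map each alarmed device to the highest alarmed device directly above it,
    or to itself if its immediate upstream neighbour is healthy."""
    alarmed = set(alarmed)
    result = {}
    for device in alarmed:
        root = device
        for ancestor in upstream_chain(device, parent):
            if ancestor not in alarmed:
                break  # a healthy hop breaks the cause-effect chain
            root = ancestor
        result[device] = root
    return result


events = ["access-1", "access-2", "agg-1", "access-3"]
mapping = root_causes(events)
probable_roots = sorted(set(mapping.values()))
```

In this toy case, four events compress to two probable root causes (`agg-1` and `access-3`), with the other two linked as symptoms rather than discarded.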

So here’s a step-by-step guide to build your RCA pipeline:

  1. Scope – Identify your initial target / scope. For example, what are you seeking to prioritise:
    1. Event volume reduction to give the NOC breathing space to function better
    2. Identifying “most important” events (which first requires defining what “most important” means)
    3. Minimising SLA breaches
    4. etc
  2. Gather Data – Gather incident and ticket data. Your OSS is probably already doing this, but you may need to pull data together from various sources (eg alarms/events, performance, tickets, external sources like weather data, etc)
  3. Pattern Identification – Pattern identification and categorisation of incidents. This generally requires a pattern identification tool, ideally supplied by your alarm management and/or analytics supplier
  4. Prioritise – Using a long-tail graph like below, prioritise pattern groups by the following (and in line with item #1 above):
    1. Number of instances of the pattern / group (ie frequency)
    2. Priority of instances (ie urgency of resolution)
    3. Number of linked incidents (ie volume)
    4. Another technique, such as a cumulative/blended metric

  5. Gather Resolution Knowledge – Understand current NOC approaches to fault-identification and triage, as well as what’s important to them (noting that they may have biases such as managing to vanity metrics)
  6. Note any Existing Resolutions – Identify and categorise any existing resolutions and/or RCA rules (if data supports this)
  7. Short-list Remaining Patterns – Overlay resolution pattern on long-tail (to show which patterns are already solved for). Then identify remaining priority patterns on the long-tail that don’t have a resolution yet.
  8. Codify Patterns – Progressively set out to identify possible root-cause by analysing cause-effect such as:
    1. Topology-based
    2. Object hierarchy
    3. Time-based ripple
    4. Geo-based ripple
    5. Other (as helped to be defined by NOC operators)
  9. Knowledge base – Create a knowledge base that itemises root-causes and supporting information
  10. Build Algorithm / Automation – Create an algorithm for identifying root-cause and related alarms. Identify level of complexity, risks, unknowns, likelihood, control/monitoring plan for post-install, etc. Then build pilot algorithm (and possibly roll-back technique??). This might not just be an RCA rule, but could also include other automations. Automations could include creating a common problem and linking all events (not just root cause event but all related events), escalations, triggering automated workflows, etc
  11. Test pilot algorithm (with analytics??)
  12. Introduce algorithm into production use – But continue to monitor what’s being suppressed
  13. Repeat – Repeat steps 7 to 12 to codify the next most important pattern
  14. Leading metrics – Identify leading metrics and/or preventative measures that could precede the RCA rule. Establish closed-loop automated resolution
  15. Improve – Manage and maintain process improvement
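Steps 4, 6 and 7 above lend themselves to a small worked example. The sketch below is illustrative only: the pattern groups, the weights in `blended_score` and the 90% coverage cut-off are numbers I've invented, and in practice they would be tuned to whatever you chose to prioritise in step 1.

```python
from dataclasses import dataclass


@dataclass
class PatternGroup:
    name: str
    instances: int            # frequency
    priority: int             # 1 = most urgent (must be >= 1)
    linked_incidents: int     # volume
    has_resolution: bool = False


def blended_score(group, w_freq=1.0, w_incidents=2.0, w_priority=50.0):
    """Hypothetical blended metric; the weights are illustrative only."""
    return (w_freq * group.instances
            + w_incidents * group.linked_incidents
            + w_priority / group.priority)


def long_tail(groups):
    """Rank pattern groups from the head of the long tail to the tail."""
    return sorted(groups, key=blended_score, reverse=True)


def shortlist(groups, coverage=0.9):
    """Walk the long tail until `coverage` of the total score is reached,
    keeping only the groups that don't have a resolution yet."""
    ranked = long_tail(groups)
    threshold = coverage * sum(blended_score(g) for g in ranked)
    cumulative, picked = 0.0, []
    for group in ranked:
        if cumulative >= threshold:
            break
        cumulative += blended_score(group)
        if not group.has_resolution:
            picked.append(group)
    return picked


GROUPS = [
    PatternGroup("high-temp", 120, 3, 5),
    PatternGroup("link-flap", 900, 2, 40),
    PatternGroup("config-mismatch", 20, 4, 1),
    PatternGroup("power-fail", 300, 1, 60, has_resolution=True),
]

ranked_names = [g.name for g in long_tail(GROUPS)]
todo_names = [g.name for g in shortlist(GROUPS)]
```

Here `power-fail` ranks second but drops off the to-do list because it already has a resolution, which leaves `link-flap` as the first pattern to codify in step 8.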

What if most OSS/BSS are overkill? Planning a simpler version

You may recall a recent article that provided a discussion around the demarcation between OSS and BSS, which included the following graph:

Note that this mapping is just my interpretation of the demarc, not the definitive guide. It’s definitely open to differing opinions (ie religious wars).

Many of you will be familiar with the framework that the mapping is overlaid onto – TM Forum’s TAM (The Application Map). Version R17.5.1 in this case. It is as close as we get to a standard mapping of OSS/BSS functionality modules. I find it to be a really useful guide, so today’s article is going to call on the TAM again.

As you would’ve noticed in the diagram above, there are many, many modules that make up the complete OSS/BSS estate. And you should note that the diagram above only includes Level 2 mapping. The TAM recommendation gets a lot more granular than this. This level of granularity can be really important for large, complex telcos.

For the OSS/BSS that support smaller telcos, network providers or utilities, this might be overkill. Similarly, there are OSS/BSS vendors that want to cover all or large parts of the entire estate for these types of customers. But as you’d expect, they don’t want to provide the same depth of functionality coverage that the big telcos might need.

As such, I thought I’d provide the cut-down TAM mapping below for those who want a less complex OSS/BSS suite.

It’s a really subjective mapping because each telco, provider or vendor will have their own perspective on mandatory features or modules. Hopefully it provides a useful starting point for planning a low complexity OSS/BSS.

Then what high-level functionality goes into these building blocks? That’s possibly even more subjective, but here are some hints:

In an OSS, what are O2A, T2R, U2C, P2O and DBA?

Let’s start with the last one first – DBA.

In the context of OSS/BSS, DBA has multiple meanings but I think the most relevant is Death By Acronym (don’t worry all you Database Administrators out there, I haven’t forgotten about you). Our industry is awash with TLAs (Three-Letter Acronyms) that lead to DBA.

Having said that, today’s article is about four that are commonly used in relation to end to end workflows through our OSS/BSS stacks. They often traverse different products, possibly even multiple different vendors’ products. They are as follows:

  • P2O – Prospect to Order – This workflow operates across the boundary between the customer and the customer-facing staff at the service provider. It allows staff to check what products can be offered to a customer. This includes service qualification (SQ), feasibility checks, then design, assign and reserve resources.
  • O2A – Order to Activate – This workflow includes all activities to manage customer services across entire life-cycles. That is, not just the initial activation of a service, but in-flight changes during activation and post-activation changes as well
  • U2C – Usage to Cash – This workflow allows customers or staff to evaluate the usage or consumption of a service (or services) that has already been activated for a customer
  • T2R – Trouble to Resolve – This “workflow” is more like a bundle of workflows that relate to assuring health of the services (and the network that carries them). They can be categorised as reactive (ie a customer triggers a resolution workflow by flagging an issue to the service provider) or proactive (ie the service provider identifies an issue, degradation or potential for issue and triggers a resolution workflow internally)

If you’re interested in seeing how these workflows relate to the TM Forum APIs and specifically to NaaS (Network as a Service) designs, there’s a great document (TMF 909A v1.5) that can be found at the provided link. It shows the sub-elements (and associated APIs) that each of these workflows rely on.

PS. I recently read a vendor document that described additional flows: I2I (Idea to Implementation – service onboarding, through a catalog presumably), P2P (Plan to Production – resource provisioning) and O2S (Order to Service). There’s also C2M (Concept to Market), L2C (Lead to Cash) and I’m sure I’m forgetting a number of others. Are there any additional TLAs that I should be listing here to describe end-to-end workflows?