“There’s the famous quote that if you want to understand how animals live, you don’t go to the zoo, you go to the jungle. The Future Lab has really pioneered that within Lego, and it hasn’t been a theoretical exercise. It’s been a real design-thinking approach to innovation, which we’ve learned an awful lot from.”
Jorgen Vig Knudstorp.
This quote prompted me to ask the question – how many times during OSS implementations had I sought to understand user behaviour at the zoo versus the jungle?
By that, how many times had I simply spoken with the user’s representative on the project team rather than directly with end users? What about the less obvious personas as discussed in this earlier post about user personas? Had I visited the jungles where internal stakeholders like project sponsors, executives, data consumers, etc. or external stakeholders such as end-customers, regulatory bodies, etc go about their daily lives?
I can truthfully, but regretfully, say I’ve spent far more time at OSS zoos than in jungles. This is something I need to redress.
But, at least I can claim to have spent most time in customer-facing roles.
Too many of the product development teams I’ve worked closely with don’t even visit OSS zoos let alone jungles in any given year. They never get close to observing real customers in their native environments.
Interestingly, I also just stumbled upon OpenTelemetry, an open source project designed to capture traces / metrics / logs from apps / microservices. It intrigued me because it introduces the concept of telemetry on spans (not just application nodes). Tomorrow’s article will explore how the concept of spans / traces / metrics / logs for apps might provide insight into the challenge we face getting true end-to-end metrics from our networks (as opposed to the easy to come by nodal metrics).
In the network world, we’re good at getting nodal metrics / logs / events, but not very good at getting trace data (ie end-to-end service chains, or an aggregation of spans in OpenTelemetry nomenclature). And if we can’t monitor traces, we can’t easily interpret a customer’s experience whilst they’re using their network service. We currently do “service assurance” by reverse-engineering nodal logs / events, which seems a bit backward to me.
Table 4 (from the Netrounds white paper link above) provides a view of the most common AI/ML techniques used. Classification and Clustering are useful techniques for alarm / event “optimisation,” (filter, group, correlate and prioritize alarms). That is, to effectively minimise the number of alarms / events a NOC operator needs to look at. In effect, traditional data collection allows AI / ML to remove the noise, but still leaves the problem to be solved manually (ie network assurance, not service assurance)
They’re helping optimise network / resource problems, but not solving the more important service-related problems, as articulated in Table 5 below (again from Netrounds).
If we can directly collect trace data (ie the “active measurements” described in yesterday’s post), we have the data to answer specific questions (which better aligns with our narrow AI technologies of today). To paraphrase questions in the Netrounds white paper, we can ask:
Has the digital service been properly activated.
What service level is currently being experienced by customers (and are SLAs being met)
Is there an outage or degradation of end to end service chains (established over multi-domain, hybrid and multi-layered networks)
Does feedback need to be applied (eg via an orchestration solution) to heal the network
Today we’ll discuss the approach/es to overcome the constraints described in yesterday’s post.
As shown via the inserted blue row in Table 6 below (source: Netrounds), a proposed solution is to use active measurements that reflect the end-to-end user experience.
The blue row only talks about the real-time monitoring of “synthetic user traffic,” in the table below. However, there are at least two other active measurement techniques that I can think of:
We can monitor real user traffic if we use port-mirroring techniques
We can also apply techniques such as TR-069 to collect real-time customer service meta data
Note: There are strengths and weaknesses of each of the three approaches, but we won’t dive into that here. Maybe another time.
You may recall in yesterday’s post that we couldn’t readily ask service-related questions of our traditional systems or data. Excitingly though, active measurement solutions do allow us to ask more customer-centric questions, like those shown in the orange box below. We can start to collect metrics that do relate directly to what the customer is paying for (eg real data throughput rates on a storage backup service). We can start to monitor real SLA metrics, not just proxy / vanity metrics (like device up-time).
Interestingly, I’ve only had the opportunity to use one vendor’s active measurement solutions so far (one synthetic transaction tool and one port-mirror tool). [The vendor is not Netrounds’ I should add. I haven’t seen Netround’s solution yet, just their insightful white paper]. Figure 3 actually does a great job of articulating why the other vendor’s UI (user interface) and APIs are currently lacking.
Whilst they do collect active metrics, the UI doesn’t allow the user to easily ask important service health questions of the data like in the orange box. Instead, the user has to dig around in all the metrics and make their own inferences. Similarly the APIs don’t allow for the identification of events (eg threshold crossing) or automatic push of notifications to external systems.
This leaves a gap in our ability to apply self-healing (automated resolution) and resolution prior to failure (prediction) algorithms like discussed in yesterday’s post. Excitingly, it can collect service-centric data. It just can’t close the loop with it yet!
Below are three insightful tables from the Netrounds white paper:
Table 1 looks at the typical components (systems) that service assurance is comprised of. But more interestingly, it looks at the types of questions / challenges each traditional system is designed to resolve. You’ll have noticed that none of them directly answer any service quality questions (except perhaps inventory systems, which can be prone to having sketchy associations between services and the resources they utilise).
Table 2 takes a more data-centric approach. This becomes important when we look at the big picture here – ensuring reliable and effective delivery of customer services. Infrastructure failures are a fact of life, so improved service assurance models of the future will depend on automated and predictive methods… which rely on algorithms that need data. Again, we notice an absence of service-related data sets here (apart from Inventory again). You can see the constraints of the traditional data collection approach can’t you?
Table 3 instead looks at the goals of an ideal service-centric assurance solution. The traditional systems / data are convenient but clearly don’t align well to those goals. They’re constrained by what has been presented in tables 1 and 2. Even the highly touted panaceas of AI and ML are likely to struggle against those constraints.
What if we instead start with Table 3’s assurance of customer services in mind and work our way back? Or even more precisely, what if we start with an objective of perfect availability and performance of every customer service?
That might imply self-healing (automated resolution) and resolution prior to failure (prediction) as well as resilience. But let’s first give our algorithms (and dare I say it, AI/ML techniques) a better chance of success.
Working back – What must the data look like? What should the systems look like? What questions should these new systems be answering?
We sometimes attack OSS/BSS planning at a quite transactional level. For example, think about the process of gathering detailed requirements at the start of a project. They tend to be detailed and transactional don’t they? This type of requirement gathering is more like the WHAT and HOW rings in Simon Sinek’s Golden Circle.
Just curious, do you have a persona map that shows all of the different user groups that interact with your OSS/BSS?
More importantly, do you deeply understand WHY they interact with your OSS/BSS? Not just on a transaction-by-transaction level, but in the deeper context of how the organisation functions? Perhaps even on a psychological level?
If you do, you’re in a great position to apply the 10:10:10 mapping rule. That is, to describe how you’re adding value to each user group 10 minutes from now, 10 days from now and 10 months from now…
The mapping table could describe current tense (ie how your OSS/BSS is currently adding value), or as a planning mechanism for a future tense (ie how your OSS/BSS can add value in the future).
This mapping table can act as a guide for the evolution of your solution.
I should also point out that the diagram above only shows a sample of the internal personas that directly interact with your OSS/BSS. But I’d encourage you to look further. There are other personas that have direct and indirect engagement with your OSS/BSS. These include internal stakeholders like project sponsors, executives, data consumers, etc. They also include external stakeholders such as end-customers, regulatory bodies, etc.
If you need assistance to unlock your current state through persona mapping, real process mapping, etc and then planning out your target-state, Passionate About OSS would be delighted to help.
This visualisation is exciting because it shows how your processes are actually flowing (or not), as opposed to the theoretical process diagrams that are laboriously created by BAs in conjunction with SMEs. It also shows which branches in the flow are actually being utilised and where inefficiencies are appearing (and are therefore optimisation targets).
Some people have wondered how simple activity logs can be used to show the Sankey diagrams. Hopefully the diagram below helps to describe this. You scan the log data looking for variants / patterns of flows and overlay those onto a map of decision states (DPs). In the diagram above, there are only 3 DPs, but 303 different variants (sounds implausible, but there are many variants that do multiple loops through the 3 states and are therefore considered to be a different variant).
The numbers / weightings you see on the Sankey diagram are the number* of instances (of a single flow type) that have transitioned between two DPs / states.
* Note that this is not the same as the count value that appears in the Weightings table. We’ll get to that in tomorrow’s post when we describe how to use the weightings data for decision support.
“Whatever is well conceived is clearly said,
And the words to say it flow with ease.”
I’d like to hijack this quote and re-direct it towards architectures. Could we equally state that a well conceived architecture can be clearly understood? Some modern OSS/IT frameworks that I’ve seen recently are hugely complex. The question I’ve had to ponder is whether they’re necessarily complex. As the aphorism states, “Everything should be made as simple as possible, but not simpler.”
Just take in the complexity of this triptych I prepared to overlay SDN, NFV and MANO frameworks.
Yet this is only a basic model. It doesn’t consider networks with a blend of PNF and VNF (Physical and Virtual Network Functions). It doesn’t consider closed loop assurance. It doesn’t consider other automations, or omni-channel, or etc, etc.
Yesterday’s post raised an interesting concept from Tom Nolle that as our solutions become more complex, our ability to make a basic assessment of value becomes more strained. And by implication, we often need to upskill a team before even being able to assess the value of a proposed project.
It seems to me that we need simpler architectures to be able to generate persuasive business cases. But it poses the question, do they need to be complex or are our solutions just not well enough conceived yet?
To borrow a story from Wikiquote, “Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi Dirac statistics. Rising to the challenge, he said, “I’ll prepare a freshman lecture on it.” But a few days later he told the faculty member, “You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.“
“…as technology gets more complicated, it becomes more difficult for buyers to acquire the skills needed to make even a basic assessment of value. Without such an assessment, it’s hard to get a project going, and in particular hard to get one going the right way.”
Have you noticed that over the last few years, OSS choice has proliferated, making project assessment more challenging? Previously, the COTS (Commercial Off-the-Shelf) product solution dominated. That was already a challenge because there are hundreds to choose from (there are around 400 on our vendors page alone). But that’s just the tip of the iceberg.
We now also have choices to make across factors such as:
Building OSS tools with open-source projects
An increasing amount of in-house development (as opposed to COTS implementations by the product’s vendors)
Smaller niche products that need additional integration
An increase in the number of “standards” that are seeking to solve traditional OSS/BSS problems (eg ONAP, ETSI’s ZSM, TM Forum’s ODA, etc, etc)
Revolutions from the IT world such as cloud, containerisation, virtualisation, etc
As Tom indicates in the quote above, the diversity of skills required to make these decisions is broadening. Broadening to the point where you generally need a large team to have suitable skills coverage to make even a basic assessment of value.
At Passionate About OSS, we’re seeking to address this in the following ways:
We have two development projects underway (more news to come)
One to simplify the vendor / product selection process
One to assist with up-skilling on open-source and IT tools to build modern OSS
In addition to existing pages / blogs, we’re assembling more content about “standards” evolution, which should appear on this blog in coming days
Use our “Finding an Expert” tool to match experts to requirements
And of course there are the variety of consultancy services we offer ranging from strategy, roadmap, project business case and vendor selection through to resource identification and implementation. Leave us a message on our contact page if you’d like to discuss more
We contrasted this with the mechanisms used in most OSS that actually prevent flow-state from occurring. Today I’m going to dive into the work that goes into creating a new design (to activate a customer), and how our current OSS designs / processes inhibit flow.
“Being switched on at all times and expected to pick things up immediately makes us miserable, says [Cal] Newport. “It mismatches with the social circuits in our brain. It makes us feel bad that someone is waiting for us to reply to them. It makes us anxious.”
Because it is so easy to dash off a quick reply on email, Slack or other messaging apps, we feel guilty for not doing so, and there is an expectation that we will do it. This, says Newport, has greatly increased the number of things on people’s plates. “The average knowledge worker is responsible for more things than they were before email. This makes us frenetic. We should be thinking about how to remove the things on their plate, not giving people more to do…
Going cold turkey on email or Slack will only work if there is an alternative in place. Newport suggests, as many others now do, that physical communication is more effective. But the important thing is to encourage a culture where clear communication is the norm.
Newport is advocating for a more linear approach to workflows. People need to completely stop one task in order to fully transition their thought processes to the next one. However, this is hard when we are constantly seeing emails or being reminded about previous tasks. Some of our thoughts are still on the previous work – an effect called attention residue.”
That resonates completely with me. So let’s consider that and look into the collaboration process of a stylised order activation:
Customer places an order via an order-entry portal
Perform SQ (Service Qualification) and Credit Checks, automated processes
Order is broken into work order activities (automated process)
Designer1 picks up design work order activity from activity list and commences outside plant design (cables, pits, pipes). Her design pack includes:
Updating AutoCAD / GIS drawings to show outside plant (new cable in existing pit/pipe, plus lead-in cable)
Updating OSS to show splicing / patching changes
Creates project BoQ (bill of quantities) in a spreadsheet
Designer2 picks up next work order activity from activity list and commences active network design. His design pack includes:
Allocation of CPE (Customer Premises Equipment) from warehouse
Allocation of IP address from ranges available in IPAM (IP address manager)
Configuration plan for CPE and network edge devices
FieldWorkTeamLeader reviews inside plant and outside plant designs and allocates to FieldWorker1. FieldWorker1 is also issued with a printed design pack and the required materials
FieldWorker1 commences build activities and finds out there’s a problem with the design. It indicates splicing the customer lead-in to fibres 1/2, but they appear to already be in use
So, what does FieldWorker1 do next?
The activity list / queue process has worked reasonably well up until this step in the process. It allowed each person to work autonomously, stay in deep focus and in the sequence of their own choosing. But now, FieldWorker1 needs her issue resolved within only a few minutes or must move on to her next job (and next site). That would mean an additional truck-roll, but also annoying the customer who now has to re-schedule and take an additional day off work to open their house for the installer.
FieldWorker1 now needs to collaborate quickly with Designer1, Designer2 and FieldWorkTeamLeader. But most OSS simply don’t provide the tools to do so. The go-forward decision in our example draws upon information from multiple sources (ie AutoCAD drawing, GIS, spreadsheet, design document, IPAM and the OSS). Not only that, but the print-outs given to the field worker don’t reflect real-time changes in data. Nor do they give any up-stream context that might help her resolve this issue.
So FieldWorker1 contacts the designers directly (and separately) via phone.
Designer1 and Designer2 have to leave deep-think mode to respond urgently to the notification from FieldWorker1 and then take minutes to pull up the data. Designer1 and Designer2 have to contact each other about conflicting data sets. Too much time passes. FieldWorker1 moves to her next job.
Our challenge as OSS designers is to create a collaborative workspace that has real-time access to all data (not just the local context as the issue probably lies in data that’s upstream of what’s shown in the design pack). Our workspace must also provide all participants with the tools to engage visually and aurally – to choreograph head-office and on-site resources into “group flow” to resolve the issue.
Even if such tools existed today, the question I still have is how we ensure our designers aren’t interrupted from their all-important deep-think mode. How do we prevent them from having to drop everything multiple times a day/hour? Perhaps the answer is in an organisational structure – where all designers have to cycle through the Design Support function (eg 1 day in a fortnight), to take support calls from field workers and help them resolve issues. It will give designers a greater appreciation for problems occurring in the field and also help them avoid responding to emails, slack messages, etc when in design mode.
TM Forum’s Open Digital Architecture (ODA) White Paper begins with the following statement:
Telecoms is at a crucial turning point. The last decade has dealt a series of punishing blows to an industry that had previously enjoyed enviable growth for more than 20 years. Services that once returned high margins are being reduced to commodities in the digital world, and our insatiable appetite for data demands continuous investment in infrastructure. On the other hand, communications service providers (CSPs) and their partners are in an excellent position to guide and capitalize on the next wave of digital revolution.
Clearly, a reduction in profitability leads to a reduction in cash available for projects – including OSS transformation projects. And reduced profitability almost inevitably leads executives to start thinking about head-count reduction too.
As Luke Clifton of Macquarie Telecom observed here, “Telstra is reportedly planning to shed 1,200 people from its enterprise business with many of these people directly involved in managing small-to-medium sized business customers. More than 10,000 customers in this segment will no longer have access to dedicated Account Managers, instead relegated to being managed by Telstra’s “Digital Hub”… Telstra, like the big banks once did, is seemingly betting that customers won’t leave them nor will they notice the downgrade in their service. It will be interesting to see how 10,000 additional organisations will be managed through a Digital Hub.
Simply put, you cannot cut quality people without cutting the quality of service. Those two ideals are intrinsically linked…”
As a fairly broad trend across the telco sector, projects and jobs are being cut, whilst technology change is forcing transformation. And as suggested in Luke’s “Digital Hub” quote above, it all leads to increased expectations on our OSS/BSS.
Pressure is coming at our OSS from all angles, and with no signs of abating.
To quote Queen, “Pressure. Pushing down on me.Pressing down on you.”
So it seems to me there are only three broad options when planning our OSS roadmaps:
We learn to cope with increased pressure (although this doesn’t seem like a viable long-term option)
We reduce the size (eg functionality, transaction volumes, etc) of our OSS footprint [But have you noticed that all of our roadmaps seem expansionary in terms of functionality, volumes, technologies incorporated, etc??]
We look beyond the realms of traditional OSS/BSS functionality (eg just servicing operations) and into areas of opportunity
TM Forum’s ODA White Paper goes on to state, “The growth opportunities attached to new 5G ecosystems are estimated to be worth over $580 billion in the next decade. Servicing these opportunities requires transformation of the entire industry. Early digital transformation efforts focused on improving customer experience and embracing new technologies such as virtualization, with promises of wide-scale automation and greater agility. It has become clear that these ‘projects’ alone are not enough. CSPs’ business and operating models, choice of technology partners, mindset, decision-making and time to market must also change. True digital business transformation is not an easy or quick path, but it is essential to surviving and thriving in the future digital market.”
BTW. I’m not suggesting 5G is the panacea or single opportunity here. My use of the quote above is drawing more heavily on the opportunities relating to digital transformation. Not of the telcos themselves, but digital transformation of their customers. If data is the oil of the 21st century, then our OSS/BSS and telco assets have the potential to be the miners and pipelines of that oil.
If / when our OSS go from being cost centres to revenue generators (directly attributable to revenue, not the indirect attribution by most OSS today), then we might feel some of the pressure easing off us.
Note that this mapping is just my demarc interpretation, but isn’t the definitive guide. It’s definitely open to differing opinions (ie religious wars).
Many of you will be familiar with the framework that the mapping is overlaid onto – TM Forum’s TAM (The Application Map). Version R17.5.1 in this case. It is as close as we get to a standard mapping of OSS/BSS functionality modules. I find it to be a really useful guide, so today’s article is going to call on the TAM again.
As you would’ve noticed in the diagram above, there are many, many modules that make up the complete OSS/BSS estate. And you should note that the diagram above only includes Level 2 mapping. The TAM recommendation gets a lot more granular than this. This level of granularity can be really important for large, complex telcos.
For the OSS/BSS that support smaller telcos, network providers or utilities, this might be overkill. Similarly, there are OSS/BSS vendors that want to cover all or large parts of the entire estate for these types of customers. But as you’d expect, they don’t want to provide the same depth of functionality coverage that the big telcos might need.
As such, I thought I’d provide the cut-down TAM mapping below for those who want a less complex OSS/BSS suite.
It’s a really subjective mapping because each telco, provider or vendor will have their own perspective on mandatory features or modules. Hopefully it provides a useful starting point for planning a low complexity OSS/BSS.
Then what high-level functionality goes into these building blocks? That’s possibly even more subjective, but here are some hints:
Let me start today with a question: Does your future OSS/BSS need to be drastically different to what it is today?
Please leave me a comment below, answering yes or no.
I’m going to take a guess that most OSS/BSS experts will answer yes to this question, that our future OSS/BSS will change significantly. It’s the reason I wrote the OSS Call for Innovation manifesto some time back. As great as our OSS/BSS are, there’s still so much need for improvement.
But big improvement needs big change. And big change is scary, as Tom Nolle points out:
“IT vendors, like most vendors, recognize that too much revolution doesn’t sell. You have to creep up on change, get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening.”
Do you feel like we’re already in the midst of a revolution? Cloud computing, web-scaling and virtualisation (of IT and networks) have been partly responsible for it. Agile and continuous integration/delivery models too.
The following diagram shows a “from the moon” level view of how I approach (almost) any new project.
The key to Tom’s quote above is in step 2. Just how far, or how ambitious, into the future are you projecting your required change? Do you even know what that future will look like? After all, the environment we’re operating within is changing so fast. That’s why Tom is suggesting that for many of us, step 2 is just a “creep up on it change.” The gap is essentially small.
The “creep up on it change” means just adding a few new relatively meaningless features at the end of the long tail of functionality. That’s because we’ve already had the most meaningful functionality in our OSS/BSS for decades (eg customer management, product / catalog management, service management, service activation, network / service health management, inventory / resource management, partner management, workforce management, etc). We’ve had the functionality, but that doesn’t mean we’ve perfected the cost or process efficiency of using it.
So let’s say we look at step 2 with a slightly different mindset. Let’s say we don’t try to add any new functionality. We lock that down to what we already have. Instead we do re-factoring and try to pull the efficiency levers, which means changes to:
Platforms (eg cloud computing, web-scaling and virtualisation as well as associated management applications)
Methodologies (eg Agile, DevOps, CI/CD, noting of course that they’re more than just methodologies, but also come with tools, etc)
Process (eg User Experience / User Interfaces [UX/UI], supply chain, business process re-invention, machine-led automations, etc)
It’s harder for most people to visualise what the Step 2 Future State looks like. And if it’s harder to envisage Step 2, how do we then move onto Steps 3 and 4 with confidence?
This is the challenge for OSS/BSS vendors, supplier, integrators and implementers. How do we, “get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening?” And I should point out, that it’s not just buyers we need to get disconnected from the comfortable past, but ourselves, myself definitely included.
In the context of OSS/BSS, DBA has multiple meanings but I think the most relevant is Death By Acronym (don’t worry all you Database Administrators out there, I haven’t forgotten about you). Our industry is awash with TLAs (Three-Letter Acronyms) that lead to DBA.
Having said that, today’s article is about four that are commonly used in relation to end to end workflows through our OSS/BSS stacks. They often traverse different products, possibly even multiple different vendors’ products. They are as follows:
P2O – Prospect to Order – This workflow operates across the boundary between the customer and the customer-facing staff at the service provider. It allows staff to check what products can be offered to a customer. This includes service qualification (SQ), feasibility checks, then design, assign and reserve resources.
O2A – Order to Activate – This workflow includes all activities to manage customer services across entire life-cycles. That is, not just the initial activation of a service, but in-flight changes during activation and post-activation changes as well
U2C – Usage to Cash – This workflow allows customers or staff to evaluate the usage or consumption of a service (or services) that has already been activated for a customer
T2R – Trouble to Resolve – This “workflow” is more like a bundle of workflows that relate to assuring health of the services (and the network that carries them). They can be categorised as reactive (ie a customer triggers a resolution workflow by flagging an issue to the service provider) or a proactive (ie the service provider identifies and issue, degradation or potential for issue and triggers a resolution workflow internally)
PS. I recently read a vendor document that described additional flows:- I2I (Idea to Implementation – service onboarding, through a catalog presumably), P2P (Plan to Production – resource provisioning) and O2S (Order to Service). There’s also C2M (Concept to Market), L2C (Lead to Cash) and I’m sure I’m forgetting a number of others. Are there any additional TLAs that I should be listing here to describe end-to-end workflows?
There’s a famous Zig Ziglar quote that goes something like, “You can have everything in life you want, if you will just help enough other people get what they want.”
You could safely assume that this was written for the individual reader, but there is some truth in it within the OSS context too. For the OSS designer, builder, integrator, does the statement “You can have everything in your OSS you want, if you will just help enough other people get what they want,” apply?
We often just think about the O in OSS – Operations people, when looking for who to help. But OSS/BSS has the ability to impact far wider than just the Ops team/s.
The halcyon days of OSS were probably in the 1990’s to early 2000’s when the term OSS/BSS was at its most sexy and exciting. The big telcos were excitedly spending hundreds of millions of dollars. Those projects were huge… and hugely complex… and hugely fun!
With that level of investment, there was the expectation that the OSS/BSS would help many people. And they did. But the lustre has come off somewhat since then. We’ve helped sooooo many people, but perhaps didn’t help enough people enough. Just speak with anybody involved with an OSS/BSS stack and you’ll hear hints of a large gap that exists between their current state and a desired future state.
Do you mind if I ask two questions?
When you reflect on your OSS activities, do you focus on the technology, the opportunities or the problems
Do you look at the local, day-to-day activities or the broader industry
I tend to find myself focusing on the problems – how to solve them within the daily context on customer challenges, but the broader industry problems when I take the time to reflect, such as writing these blogs.
The part I find interesting is that we still face most of the same problems today that we did back in the 1990’s-2000’s. The same source of risks. We’ve done a fantastic job of helping many people get what they want on their day-to-day activities (the incremental). We still haven’t cracked the big challenges though. That’s why I wrote the OSS Call for Innovation, to articulate what lays ahead of us.
It’s why I’m really excited about two of the concepts we’ve discussed this week:
OSS projects are full of risks we all know it. OSS projects have “earned” a bad name because of all those risks. On the other side of that same coin, OSS projects disappoint, in part I suspect because stakeholders expect such big things from their resource investments.
Ask anyone familiar with OSS projects and you’ll be sure to hear a long list of failings.
For those less familiar with what an OSS project has in store for you, I’d like to share a list of the most common risks I’ve seen on OSS projects.
Most people working in the OSS industry are technology-centric, so they’ll tend to cite risks that relate to the tech. That’s where I used to focus attention too. Now technology risk definitely exists, but as you’ll see below, I tend to start by looking at other risk factors first these days.
Most common OSS project risks / issues:
Complexity (to be honest, this is probably more the root-cause / issue that manifests as many of the following risks). However, complexity across many aspects of OSS projects is one of the biggest problem sources
Change Management – OSS tend to introduce significant change to an organisation – operationally, organisationally, processes, training, etc. This is probably the most regularly underestimated component of any large OSS build
Stakeholder Support / Politics – Challenges appear on every single OSS project. They invariably need strong support from stakeholders and sponsors to clear a path through the biggest challenges. If the project’s leaders aren’t fully committed and in unison, the delivery teams will be heavily constrained
Ill-defined Scope – Over-scoping, scope omission and scope creep all represent risks to an OSS project. But scope is never perfectly defined or static, so scope management mechanisms need to be developed up-front rather than in-flight. Tying back to point 1 above, complexity minimisation should be a key element of scope planning. To hark back to my motto for OSS, “just because we can, doesn’t mean we should)
Financial and commercial – As with scope, it’s virtually impossible to plan an OSS project to perfection. There are always unknowns.These unknowns can directly impact the original estimates. Projects with blow-outs and no contingency for time or money increase pressure on point 3 (stakeholders/sponsors) to maintain their support
Client resource skills / availability – An OSS has to be built to the needs of a client. If the client is unable to provide resources to steer the implementation, then it’s unlikely for the client to get a solution that is perfectly adapted to the client’s needs. One challenge for the client is that their most valuable guides, those with the client’s tribal knowledge, are also generally in high demand by “business as usual” teams. It becomes a challenge to allocate enough of their time to guide the OSS delivery team. Another challenge is augmenting the team with the required skill-set when a project introduces new skill requirements
Communication – OSS projects aren’t built in a vacuum. They have many project contributors and even more end-users. There are many business units that touch an OSS/BSS, each with their own jargon and interpretations. For example, how many alternate uses of the term “service” can you think of? I think an important early-stage activity is to agree on and document naming conventions
Culture – Of the client team and/or project team. Culture contributes to (or detracts from) motivation, morale, resource turnover, etc, which can have an impact on the team’s ability to deliver
Design / Integration – Finally, a technology risk. This item is particularly relevant with complex projects, it can be difficult for all of the planned components to operate and integrate as planned. A commonly unrecognised risk relates to the viability of implementing a design. It’s common for an end-state design to be specified but with no way of navigating through a series of steps / phases and reach the end-state
Technology – Similar to the previous point, there are many technology risks relating to items such as quality, scalability, resiliency, security, supportability, obsolescence, interoperability, etc
There’s one thing you will have probably noticed about this list. Most of the risks are common to other projects, not just OSS projects. However, the risks do tend to amplify on OSS projects because of their inherent complexity.
The traditional telco (and OSS) ran at different speeds. Some tasks had to happen immediately (eg customers calling one another) while others took time (eg getting a connection to a customer’s home, which included designs, approvals, builds, etc), often weeks.
Our OSS have processes that must happen sequentially and expediently. They also have processes that must wait for dependencies, conditional events and time delays. Some roles need “fast,” others can cope with “slow.” Who wins out in this dilemma?
Even the data we rely on can transact at different speeds. For capacity planning, we’re generally interested in longer-term data. We don’t have to process at real-time. Therefore we can choose to batch process at longer cycle times and with summarised data sets. For network assurance, we’re generally interested in getting data as quick as is viable.
Today’s post is about that word, viable, and pragmatism we sometimes have to apply to our OSS.
For example, if our operations teams want to reduce network performance poll cycles from every 15 mins down to once a minute, we increase the amount of data to process by 15x. That means our data storage costs go up by 15x (assuming a flat-rate cost structure applies). The other hidden cost is that our compute and network costs also go up because we have to transfer and process 15x as much data.
The trade-off we have to make in responses to this rapid escalation of cost (when going from 15 to 1 min) is in the benefits we might derive. Can we avoid SLA (Service Level Agreement) breach costs? Can we avoid costly outages? Can we avoid damage to equipment? Can we reduce the risk of losing our carrier license?
The other question is whether our operators actually have the ability to respond to 15x as much data. Do we have enough people to respond at an increased cycle time? Do we have OSS tools that are capable of filtering what’s important and disregarding “background” activity? Do we have OSS tools that are capable of learning from every single metric (eg AI), at volumes the human brain could never cope with?
Does it make sense that we have a single platform for handling fast and slow processes? For example, do we use the same platform to process 1 minute-cycle performance data for long-term planning (batch-processed once daily) and quick-fire assurance (processed as fast as possible)?
If we stick to one platform, can our OSS apply data reduction techniques (eg selective discard of records) to get the benefits of speed, but with the cost reduction of slow?
Is your OSS a single pane of glass, or a single glass of pain?
You can tell I’m being a little flippant here. People often (perhaps idealistically) talk about OSS as being the single pane of glass (SPOG) to manage a network.
I say “idealistically” for a couple of reasons:
There are usually many personas who interact with an OSS, each with vastly different user interface (UI) needs
There is usually more than one OSS product in a client’s OSS suite, often from different vendors, with varying levels of integration
Where a single pane of glass can be a true ambition is as a consolidated health-status dashboard / portal, Invariably, this portal is used by executive / leader / manager personas who want to quickly see a single-screen health status that covers all networks and/or parts of the OSS suite. When things go wrong, this portal becomes the single glass of pain.
These single panes tend to be heavily customised for each organisation as every one has a unique set of metrics-that-matter. For those designing these panes, the key is to not just include vanity metrics, but to show information that the leader can action.
But the interesting perspective here is whether the single glass of pain is even relevant within your organisation’s culture. It’s just my opinion, but I prefer for coal-face workers to be empowered to make rapid recovery actions rather than requiring direction from up high in the org-chart. Coal-face workers generally have different tools with UIs that *should* help them monitor, manage and repair super-efficiently.
To get back to the “idealistic” comment above, each OSS UI needs to be fit-for-purpose for each unique persona (eg designers, product owners, network operations, etc). To me this implies that there is no single pane of glass…
I should caveat that by citing the example of an OSS search interface, something I’ve yet to see in OSS… although that’s just a front end to dozens of persona-specific panes of glass.
“There’s a fable of a man stuck in a flood. Convinced that God is going to save him, he says no to a passing canoe, boat, and helicopter that offer to help. He dies, and in heaven asks God why He didn’t save him. God says, “I sent you a canoe, a boat, and a helicopter!”
We all have vivid imaginations. We get a goal in our mind and picture the path so clearly. Then it’s hard to stop focusing on that vivid image, to see what else could work.
New technologies make old things easier, and new things possible. That’s why you need to re-evaluate your old dreams to see if new means have come along.”
Derek Sivers, here.
In the past, we could make OSS platform decisions with reasonable confidence that our choices would remain viable for many years. For example, in the 1990s if we decided to build our OSS around a particular brand of relational database then it probably remained a valid choice until after 2010.
But today, there are so many more platforms to choose from, not to mention the technologies that underpin them. And it’s not just the choices currently available but the speed with which new technologies are disrupting the existing tech. In the 1990s, it was a safe bet to use AutoCAD for outside plant visualisation without the risk of heavy re-tooling within a short timeframe.
If making the same decision today, the choices are far less clear-cut. And the risk that your choice will be obsolete within a year or two has skyrocketed.
With the proliferation of open-source projects, the decision has become harder again. That means the skill-base required to service each project has also spread thinner. In turn, decisions for big investments like OSS projects are based more on the critical mass of developers than the functionality available today. If many organisations and individuals have bought into a particular project, you’re more likely to get your new features developed than from a better open-source project that has less community buy-in.
We end up with two ends of a continuum to choose between. We can either chase every new bright shiny object and re-factor for each, or we can plan a course of action and stick to it even if it becomes increasingly obsoleted over time. The reality is that we probably fit somewhere between the two ends of the spectrum.
To be brutally honest I don’t have a solution to this conundrum. The closest technique I can suggest is to design your solution with modularity in mind, as opposed to the monolithic OSS of the past. That’s the small-grid OSS architecture model. It’s easier to replace one building than an entire city.
In that example, the OSS/BSS, and possibly the associated people / process, had a direct impact on poor customer experience. Admittedly, that 7 truck-roll experience was a number of years ago now.
We have fewer excuses these days. Smart phones and network connected devices allow us to get OSS/BSS data into the field in ways we previously couldn’t. There’s no need for printed job lists, design packs and the like. Our OSS/BSS can leverage these connected devices to give far better decision intelligence in real time.
If we look to the logistics industry, we can see how parcel tracking technologies help to automatically provide status / progress to parcel recipients. We can see how recipients can also modify their availability, which automatically adjusts logistics delivery sequencing / scheduling.
This has multiple benefits for the logistics company:
It increases first time delivery rates
Improves the ability to automatically notify customers (eg email, SMS, chatbots)
Decreases customer enquiries / complaints
Decreases the amount of time the truck drivers need to spend communicating back to base and with clients
But most importantly, it improves the customer experience
Logistics is an interesting challenge for our OSS/BSS due to the sheer volume of customer interaction events handled each day.
But it’s another area that excites me even more, where CX is improved through improved data quality:
It’s the ability for field workers to interact with OSS/BSS data in real-time
To see the design packs
To compare with field situations
To update the data where there is inconsistency.
Even more excitingly, to introduce augmented reality to assist with decision intelligence for field work crews:
To provide an overlay of what fibres need to be spliced together
To show exactly which port a patch-lead needs to connect to
“This minimum feature set (sometimes called the “minimum viable product”) causes lots of confusion. Founders act like the “minimum” part is the goal. Or worse, that every potential customer should want it. In the real world not every customer is going to get overly excited about your minimum feature set. Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.
The reality is that the minimum feature set is 1) a tactic to reduce wasted engineering hours (code left on the floor) and 2) to get the product in the hands of early visionary customers as soon as possible.
You’re selling the vision and delivering the minimum feature set to visionaries not everyone.” Steve Blankhere.
Note that you can replace the term pilot in these posts with MVP – Minimum Viable Product.
The attraction in getting an MVP / pilot version of your OSS into the hands of users is familiarity and momentum. The solution becomes more tangible and therefore needs less documentation (eg architecture, designs, requirement gathering, etc) to describe foreign concepts to customers. The downside of the MVP / pilot is that not every customer will “get overly excited about your minimum feature set.”
As Steve says, “Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.” The challenge for all of us in OSS is articulating the long-term vision and making it compelling…. and not just leaving the product in its pilot state (we’ve all seen this happen haven’t we?)
We’ll provide an example of a long-term vision tomorrow.