“There’s the famous quote that if you want to understand how animals live, you don’t go to the zoo, you go to the jungle. The Future Lab has really pioneered that within Lego, and it hasn’t been a theoretical exercise. It’s been a real design-thinking approach to innovation, which we’ve learned an awful lot from.”
Jorgen Vig Knudstorp.
This quote prompted me to ask the question – how many times during OSS implementations had I sought to understand user behaviour at the zoo versus the jungle?
By that I mean, how many times had I simply spoken with the user’s representative on the project team rather than directly with end users? What about the less obvious personas, as discussed in this earlier post about user personas? Had I visited the jungles where internal stakeholders (project sponsors, executives, data consumers, etc.) or external stakeholders (end-customers, regulatory bodies, etc.) go about their daily lives?
I can truthfully, but regretfully, say I’ve spent far more time at OSS zoos than in jungles. This is something I need to redress.
But at least I can claim to have spent most of my time in customer-facing roles.
Too many of the product development teams I’ve worked closely with don’t even visit OSS zoos let alone jungles in any given year. They never get close to observing real customers in their native environments.
The diagram below attempts to demonstrate the concept visually, in the form of three important sliders.
When it comes to the technical delivery, it makes sense that most of the responsibility falls upon the supplier. They obviously have the greater know-how from building and implementing their own products. However, despite what some clients expect, you’ll notice that the slider isn’t all the way to the left. The client can’t just “throw the hand grenade over the fence” and expect the supplier to build the solution in isolation. The client needs to be involved to ensure the solution is configured to their unique requirements. This covers factors such as network types, service types, process models, naming conventions, personas supported, integrations, approvals, etc.
Unfortunately, organisational change is an afterthought far too often on OSS projects. Not only that, but the client often expects the supplier to handle that too. They expect this slider to fall far to the left as well. In my opinion, this is completely unrealistic. In most cases, the supplier simply doesn’t have the knowledge of, or influence over, the individuals within the client’s organisation. That’s why the middle slider falls mostly towards the right-hand (client) side. Not all the way though, because the supplier will have suggestions / input / training based on learnings from past implementations. BTW, the link above also describes an important perspective shift to help the org change aspect of OSS transformation.
And lastly, the success of a project relies on the strength of the relationship throughout, and far beyond, the initial implementation. You’d expect that most OSS implementations will have a useful life of many years. Due to the complexity of OSS transformations, clients want to stay with the same supplier for long periods because they don’t want to endure a change-out. As in any relationship, trust plays an important role. The relationship clearly has to be beneficial to both parties. Unfortunately, three factors often doom OSS relationships from the outset.
Firstly, the sliders above show my unbiased perspective of the weight of responsibility on a generic OSS project. If each party has a vastly different expectation of slider positioning, then the project can be off to a difficult (but all-too-common) start.
Secondly, the nature of the vendor selection process can also gnaw away at trust quite quickly. The client wants an as-low-as-possible cost in the contract (obviously). The supplier wants to win the bid, so they keep costs as low as possible, often hoping to make up the difference through the inevitable variations that happen on these complex projects.
And thirdly, the complexity of these projects means challenges almost always arise, which can lead to cynicism being hurled across the fence by both parties.
You may be wondering why the third slider isn’t perfectly centred between both. You may claim that significant responsibility for humility, fairness and forgiveness lies with each participant to ensure a long-lasting, trusted relationship. I’d agree with you on that, but I’d also argue that the supplier carries slightly more responsibility as they (usually) hold slightly more of the balance of power. They know the client doesn’t want to endure another OSS change-out project any time soon, so the client generally has more to lose from a relationship breakdown. Unfortunately, I’ve seen this leveraged by vendors too many times.
Do you agree/disagree with these observations? I’d love to hear your thoughts.
Oh, and if you ever need an independent third-party to help set the right balance of expectations across these sliders on your project, you’re welcome to call upon Passionate About OSS to assist.
OSS implementations / transformations are always challenging. Stakeholders seem to easily get their heads around the fact that there will be technical challenges (even if they / we can’t always get our heads around the actual changes initially).
When a supplier is charged with doing an OSS implementation, the client (perhaps rightly) expects the supplier to lead the technical implementation and guide the client through any challenges. It’s the, “Over to you!” client mentality at times.
However, it’s the change management challenges that are often overlooked and/or underestimated (by client and supplier alike). It’s far less realistic for a client to delegate these activities and challenges to the supplier. The supplier simply doesn’t have the reach or influence within the client’s organisation (unless they’re long-term trusted partners). Just running a two-week training course at the end of the implementation rarely works.
Now, if you do represent the client, change management starts all the way back at the start of the project – from the time we start to gather current and desired future state, including process and persona mappings.
At that time we can put ourselves in the shoes of each person impacted by OSS change and consider, “If your current normal is exactly what you need, then different isn’t worth exploring” (a Seth Godin quote).
How many times have you heard about operators bypassing their sophisticated new OSS and reverting back to their old spreadsheets (thus keeping an offline store of data that would be valuable to be stored in the OSS)?
Interestingly though, if you approach those same people before the OSS implementation and ask them whether their as-is spreadsheet model gives them exactly what they need, you will undoubtedly get some great insights (either yes it is and here’s why… or no it’s not because…).
You have a stronger position of influence with these operators if you involve and listen to them pre-implementation than if you enforce change afterwards.
To again quote Seth, they’re not always, “hesitant about this new idea because it’s a risky, problematic, defective idea… [but] because it’s simply different than [they’re] used to.”
“Whatever is well conceived is clearly said,
And the words to say it flow with ease.”
I’d like to hijack this quote and re-direct it towards architectures. Could we equally state that a well conceived architecture can be clearly understood? Some modern OSS/IT frameworks that I’ve seen recently are hugely complex. The question I’ve had to ponder is whether they’re necessarily complex. As the aphorism states, “Everything should be made as simple as possible, but not simpler.”
Just take in the complexity of this triptych I prepared to overlay SDN, NFV and MANO frameworks.
Yet this is only a basic model. It doesn’t consider networks with a blend of PNF and VNF (Physical and Virtual Network Functions). It doesn’t consider closed loop assurance. It doesn’t consider other automations, or omni-channel, or etc, etc.
Yesterday’s post raised an interesting concept from Tom Nolle that as our solutions become more complex, our ability to make a basic assessment of value becomes more strained. And by implication, we often need to upskill a team before even being able to assess the value of a proposed project.
It seems to me that we need simpler architectures to be able to generate persuasive business cases. But it poses the question, do they need to be complex or are our solutions just not well enough conceived yet?
To borrow a story from Wikiquote, “Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi-Dirac statistics. Rising to the challenge, he said, ‘I’ll prepare a freshman lecture on it.’ But a few days later he told the faculty member, ‘You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.’”
“…as technology gets more complicated, it becomes more difficult for buyers to acquire the skills needed to make even a basic assessment of value. Without such an assessment, it’s hard to get a project going, and in particular hard to get one going the right way.”
Have you noticed that over the last few years, OSS choice has proliferated, making project assessment more challenging? Previously, the COTS (Commercial Off-the-Shelf) product solution dominated. That was already a challenge because there are hundreds to choose from (there are around 400 on our vendors page alone). But that’s just the tip of the iceberg.
We now also have choices to make across factors such as:
Building OSS tools with open-source projects
An increasing amount of in-house development (as opposed to COTS implementations by the product’s vendors)
Smaller niche products that need additional integration
An increase in the number of “standards” that are seeking to solve traditional OSS/BSS problems (eg ONAP, ETSI’s ZSM, TM Forum’s ODA, etc, etc)
Revolutions from the IT world such as cloud, containerisation, virtualisation, etc
As Tom indicates in the quote above, the diversity of skills required to make these decisions is broadening. Broadening to the point where you generally need a large team to have suitable skills coverage to make even a basic assessment of value.
At Passionate About OSS, we’re seeking to address this in the following ways:
We have two development projects underway (more news to come)
One to simplify the vendor / product selection process
One to assist with up-skilling on open-source and IT tools to build modern OSS
In addition to existing pages / blogs, we’re assembling more content about “standards” evolution, which should appear on this blog in coming days
Use our “Finding an Expert” tool to match experts to requirements
And of course there are the variety of consultancy services we offer ranging from strategy, roadmap, project business case and vendor selection through to resource identification and implementation. Leave us a message on our contact page if you’d like to discuss more
Over the years in OSS, I’ve spent a lot of my time helping companies create their OSS / BSS strategies and roadmaps. Sometimes clients come from the buy side (eg carriers, utilities, enterprise), other times clients come from the sell side (eg vendors, integrators). There’s one factor that seems to be most commonly raised by these clients, and it comes from both sides.
What is that one factor? Well, we’ll come back to what that factor is a little later, but let’s cover some background first.
OSS / BSS covers a fairly broad estate of functionality:
If you’re from the buy-side, you need to manage both integrations and relationships to build a full-function OSS/BSS suite. If you’re from the sell-side, you’re either forced into dealing with both (reactive) or sometimes you can choose to develop those to bring a more complete offering to market (proactive).
You will have noticed that both are double-ended. Integrations bring two applications / functions together. Relationships bring two organisations together.
This two-ended concept means there’s always a “far-side” that’s outside your control. It’s in our nature to worry about what’s outside our control. We tend to want to put controls around what we can’t control. Not only that, but it’s incumbent on us as organisation planners to put mitigation strategies in place.
Which brings us back to the one factor that is raised by clients on most occasions – substitution – how do we minimise our exposure to lock-in with an OSS product / service partner/s if our partnership deteriorates?
Well, here are some thoughts:
Design your own architecture with product / partner substitution in mind (and regularly review your substitution plan because products are always evolving)
Develop multiple integrations so that you always have active equivalency. This is easier for sell-side “reactives” because their different customers will have different products to integrate to (eg an OSS vendor that is able to integrate with four different ITSM tools because they have different customers with each of those variants)
Enhance your own offerings so that you no longer require the partnership, but can do it yourself
Invest in your partnerships to ensure they don’t deteriorate. This is the OSS marriage analogy where ongoing mutual benefits encourage the relationship to continue.
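To make the first two suggestions a little more concrete, here’s a minimal sketch of what “designing with substitution in mind” might look like in code. It assumes a thin adapter contract between the OSS and whichever ITSM tool sits on the far side. All names here (`TicketingAdapter`, `ItsmToolA`, `ItsmToolB`, `FaultManager`, `raise_ticket`) are hypothetical and don’t refer to any real product or API.

```python
from typing import Protocol


class TicketingAdapter(Protocol):
    """Hypothetical contract the OSS relies on, whatever ITSM product is behind it."""

    def raise_ticket(self, summary: str, severity: str) -> str:
        ...


class ItsmToolA:
    """Stand-in for one partner's ITSM tool (illustrative only)."""

    def __init__(self) -> None:
        self._count = 0

    def raise_ticket(self, summary: str, severity: str) -> str:
        self._count += 1
        return f"A-{self._count:04d}"


class ItsmToolB:
    """Stand-in for a substitute ITSM tool with a different ticket ID format."""

    def __init__(self) -> None:
        self._count = 0

    def raise_ticket(self, summary: str, severity: str) -> str:
        self._count += 1
        return f"B-{self._count}"


class FaultManager:
    """OSS-side logic depends only on the adapter contract, not on a product."""

    def __init__(self, ticketing: TicketingAdapter) -> None:
        self.ticketing = ticketing

    def escalate(self, alarm_text: str) -> str:
        return self.ticketing.raise_ticket(alarm_text, severity="major")


oss = FaultManager(ItsmToolA())
first = oss.escalate("Link down on edge router")

# Substitution: swap the far side without touching FaultManager.
oss.ticketing = ItsmToolB()
second = oss.escalate("Link down on edge router")
```

The point of the sketch is only that the “far side” can be swapped without rewriting the near side. Real integrations carry far more baggage (data models, authentication, error handling), which is why the substitution plan still needs regular review.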
The last four posts have discussed how our OSS/BSS need to cope with different modes of working to perform effectively. We started off with the thread of “group flow,” where multiple different users of our tools can work cohesively. Then we talked about how flow requires a lack of interruptions, yet many of the roles using our OSS actually need constant availability (ie to be constantly interrupted).
From a user experience (UI/UX) perspective, we need an awareness of the state the operator/s needs to be in to perform each step of an end-to-end process, be it:
Deep think or flow mode – where the operator needs uninterrupted time to resolve a complex and/or complicated activity (eg a design activity)
Constant availability mode – where the operator needs to quickly respond to the needs of others and therefore needs a stream of notifications / interruptions (eg network fault resolutions)
Group flow mode – where a group of operators need to collaborate effectively and cohesively to resolve a complex and/or complicated activity (eg resolve a cross-domain fault situation)
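As a rough illustration of what “awareness of operator state” could mean inside a tool, here’s a minimal sketch that routes notifications differently for each of the three modes above. The routing rule and every name in it (`Mode`, `Operator`, `route`, the session label) are my own assumptions for illustration, not a description of any existing OSS.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    DEEP_WORK = "deep think / flow"
    CONSTANT_AVAILABILITY = "constant availability"
    GROUP_FLOW = "group flow"


@dataclass
class Operator:
    name: str
    mode: Mode
    session: Optional[str] = None  # shared activity when in group flow
    inbox: List[str] = field(default_factory=list)     # shown immediately
    deferred: List[str] = field(default_factory=list)  # held until the operator surfaces


def route(operator: Operator, message: str, session: Optional[str] = None) -> str:
    """Interrupt only when the operator's current mode calls for it (assumed rule)."""
    if operator.mode is Mode.CONSTANT_AVAILABILITY:
        interrupt = True
    elif operator.mode is Mode.GROUP_FLOW:
        # Only messages belonging to the shared activity get through.
        interrupt = session is not None and session == operator.session
    else:
        interrupt = False
    (operator.inbox if interrupt else operator.deferred).append(message)
    return "delivered" if interrupt else "deferred"


noc = Operator("noc-operator", Mode.CONSTANT_AVAILABILITY)
designer = Operator("designer", Mode.DEEP_WORK)
triage = Operator("triage-lead", Mode.GROUP_FLOW, session="cross-domain-fault-42")

results = [
    route(noc, "New critical alarm"),
    route(designer, "New critical alarm"),
    route(triage, "Core team update", session="cross-domain-fault-42"),
    route(triage, "Unrelated ticket comment"),
]
```

In this sketch the designer’s message isn’t lost, it just waits in `deferred` until the design activity is finished. Deciding who sets the mode, and when, is exactly the kind of question a UI/UX expert is better placed to answer than a coder.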
This is a strong argument for every OSS/BSS supplier to have UI/UX experts on their team. Yet most leave their UI/UX with their coders. They tend to take the perspective that if the function can be performed, it’s time to move on to building the next function. That was the same argument used by all MP3 player suppliers before the iPod came along with its beautiful form and function and dominated the market.
Interestingly, modern architectural principles potentially make UI/UX design more challenging. With old, monolithic OSS/BSS, you at least had more control over end-to-end workflows (I’m not suggesting we should go back to the monoliths BTW). These days, you need to accommodate the unique nuances / inconsistencies of third-party modules like APIs / microservices.
As Evan Linwood incisively identified, “I guess we live in the age of cloud based API providers, theoretically enabling loads of pre-canned integration patterns but these may not be ideal for a large service provider… Definitely if the underlying availability isn’t there, but could also occur through things like schema mismanagement across multiple providers? (Which might actually be an argument for better management / B/OSS, rather than against the use of microservices!)”
Am I convincing any of you to hire more UI/UX resources? Or convincing you to register for UI/UX as your next training course instead of learning a ninth programming language?
Put simply, we need your assistance to take our OSS from this…
“I believe in the principle that deep work and constant availability are repulsive concepts (in the magnetic sense).”
Tyler Mumford in comment 2 to this post.
This blogging thing really amazes me at times. I’m regularly left shocked at the serendipitous connections that form when writing posts. Take today’s post. I did a web search looking for the thread of an idea that had no relation at all to yesterday’s post. But of the millions of possible authors that could’ve come up in the search, the article I read first was by Cal Newport. The same Cal Newport as quoted in yesterday’s post. The two articles weren’t even from the same domain (BBC.com vs calnewport.com).
Not only that but the quote above from Tyler Mumford, in serendipitous response to Cal’s article, perfectly articulates what I was struggling to describe to close out yesterday’s post. Deep work and constant availability are indeed repulsive (ie mutually exclusive). Yet both exist within the activities performed using our OSS!!
Think about that for a moment.
There are some tasks that require constant availability (think about the NOC operators who have to respond urgently to any degradation in their network’s health).
There are other tasks that require deep work (think about the NOC operators who have to identify the root-cause of a really gnarly and catastrophic fault).
But the OSS user interfaces we build do little to separate them. The processes we design don’t consider their repulsiveness. Even the way we resource our OSS implementation projects suffers from this magnetic repulsion.
As an OSS implementer, I’ve always found it interesting that clients struggle to provide suitable expertise to steer the build, to ensure it’s configured precisely for their needs. I often quote the old parable of “you get back what you put in.” I still believe the saying, but there’s more to it than that.
An OSS implementation team needs significant input from the most knowledgeable end-users. They provide the local context, the tribal knowledge. But the most knowledgeable end-users are also most valuable at performing BAU (business as usual) tasks [assuming you’re transforming an OSS whilst still maintaining an existing network]. But I’ve rarely seen a client get the balance right between providing expertise to the “build” and “run” streams in parallel. Even rarer have I seen a client expert who can quickly task-switch between build and run activities. It seems to be much more effective if client expert/s can be seconded to work on the OSS project team with few BAU activities. Tyler’s quote above helps to explain why.
Build mode requires deep work, for the most part (eg coding, process design, solution architecture, data mapping, etc). Run mode tends to require constant availability, with a few key exceptions (eg network designs, root-cause identification, etc). The two require separation.
So perhaps the parable should be, “you get back what you put in and separate out.” 🙂
We contrasted this with the mechanisms used in most OSS that actually prevent flow-state from occurring. Today I’m going to dive into the work that goes into creating a new design (to activate a customer), and how our current OSS designs / processes inhibit flow.
“Being switched on at all times and expected to pick things up immediately makes us miserable, says [Cal] Newport. “It mismatches with the social circuits in our brain. It makes us feel bad that someone is waiting for us to reply to them. It makes us anxious.”
Because it is so easy to dash off a quick reply on email, Slack or other messaging apps, we feel guilty for not doing so, and there is an expectation that we will do it. This, says Newport, has greatly increased the number of things on people’s plates. “The average knowledge worker is responsible for more things than they were before email. This makes us frenetic. We should be thinking about how to remove the things on their plate, not giving people more to do…
Going cold turkey on email or Slack will only work if there is an alternative in place. Newport suggests, as many others now do, that physical communication is more effective. But the important thing is to encourage a culture where clear communication is the norm.
Newport is advocating for a more linear approach to workflows. People need to completely stop one task in order to fully transition their thought processes to the next one. However, this is hard when we are constantly seeing emails or being reminded about previous tasks. Some of our thoughts are still on the previous work – an effect called attention residue.”
That resonates completely with me. So let’s consider that and look into the collaboration process of a stylised order activation:
Customer places an order via an order-entry portal
Perform SQ (Service Qualification) and Credit Checks (automated processes)
Order is broken into work order activities (automated process)
Designer1 picks up design work order activity from activity list and commences outside plant design (cables, pits, pipes). Her design pack includes:
Updating AutoCAD / GIS drawings to show outside plant (new cable in existing pit/pipe, plus lead-in cable)
Updating OSS to show splicing / patching changes
Creates project BoQ (bill of quantities) in a spreadsheet
Designer2 picks up next work order activity from activity list and commences active network design. His design pack includes:
Allocation of CPE (Customer Premises Equipment) from warehouse
Allocation of IP address from ranges available in IPAM (IP address manager)
Configuration plan for CPE and network edge devices
FieldWorkTeamLeader reviews inside plant and outside plant designs and allocates to FieldWorker1. FieldWorker1 is also issued with a printed design pack and the required materials
FieldWorker1 commences build activities and finds out there’s a problem with the design. It indicates splicing the customer lead-in to fibres 1/2, but they appear to already be in use
So, what does FieldWorker1 do next?
The activity list / queue process has worked reasonably well up until this step in the process. It allowed each person to work autonomously, stay in deep focus and in the sequence of their own choosing. But now, FieldWorker1 needs her issue resolved within only a few minutes or must move on to her next job (and next site). That would mean an additional truck-roll, but also annoying the customer who now has to re-schedule and take an additional day off work to open their house for the installer.
FieldWorker1 now needs to collaborate quickly with Designer1, Designer2 and FieldWorkTeamLeader. But most OSS simply don’t provide the tools to do so. The go-forward decision in our example draws upon information from multiple sources (ie AutoCAD drawing, GIS, spreadsheet, design document, IPAM and the OSS). Not only that, but the print-outs given to the field worker don’t reflect real-time changes in data. Nor do they give any up-stream context that might help her resolve this issue.
So FieldWorker1 contacts the designers directly (and separately) via phone.
Designer1 and Designer2 have to leave deep-think mode to respond urgently to the notification from FieldWorker1 and then take minutes to pull up the data. Designer1 and Designer2 have to contact each other about conflicting data sets. Too much time passes. FieldWorker1 moves to her next job.
Our challenge as OSS designers is to create a collaborative workspace that has real-time access to all data (not just the local context as the issue probably lies in data that’s upstream of what’s shown in the design pack). Our workspace must also provide all participants with the tools to engage visually and aurally – to choreograph head-office and on-site resources into “group flow” to resolve the issue.
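To show why real-time data matters in the example above, here’s a minimal sketch that compares the splicing steps in a design pack against live allocations before the field worker arrives on site. The data layout and all names (`SpliceInstruction`, `find_conflicts`, `free_fibres`, the cable and service IDs) are hypothetical; a real OSS would draw this from its own inventory model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpliceInstruction:
    cable_id: str
    fibre: int
    service_id: str


# Maps (cable_id, fibre) to the service currently using that fibre.
Allocations = Dict[Tuple[str, int], str]


def find_conflicts(
    design: List[SpliceInstruction], live: Allocations
) -> List[Tuple[SpliceInstruction, str]]:
    """Return design steps whose fibre is already allocated to another service."""
    conflicts = []
    for step in design:
        owner = live.get((step.cable_id, step.fibre))
        if owner is not None and owner != step.service_id:
            conflicts.append((step, owner))
    return conflicts


def free_fibres(cable_id: str, fibre_count: int, live: Allocations) -> List[int]:
    """List fibres on the cable that have no live allocation."""
    return [f for f in range(1, fibre_count + 1) if (cable_id, f) not in live]


# The printed design pack says: splice the lead-in to fibres 1 and 2.
design_pack = [
    SpliceInstruction("CBL-17", 1, "SVC-900"),
    SpliceInstruction("CBL-17", 2, "SVC-900"),
]
# Live data says those fibres were taken after the design was produced.
live_allocations = {("CBL-17", 1): "SVC-455", ("CBL-17", 2): "SVC-455"}

conflicts = find_conflicts(design_pack, live_allocations)
alternatives = free_fibres("CBL-17", 6, live_allocations)
```

A check like this only catches the conflict. Choosing an alternative fibre pair still needs the designers’ judgement, which is where the collaborative workspace comes in.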
Even if such tools existed today, the question I still have is how we ensure our designers aren’t interrupted from their all-important deep-think mode. How do we prevent them from having to drop everything multiple times a day/hour? Perhaps the answer is in an organisational structure – where all designers have to cycle through the Design Support function (eg 1 day in a fortnight), to take support calls from field workers and help them resolve issues. It will give designers a greater appreciation for problems occurring in the field and also help them avoid responding to emails, Slack messages, etc when in design mode.
Yesterday’s post talked about the difference between “flow state” and “office state” in relation to OSS delivery. It referenced a book I’m currently reading called Stealing Fire.
The post mainly focused on how the interruptions of “office state” actually inhibit our productivity, learning and ability to think laterally on our OSS. But that got me thinking that perhaps flow doesn’t just relate to OSS project delivery. It also relates to post-implementation use of the OSS we implement.
If we think about the various personas who use an OSS (such as NOC operators, designers, order entry operators, capacity planners, etc), do our user interfaces and workflows assist or inhibit them to get into the zone? More importantly, if those personas need to work collaboratively with others, do we facilitate them getting into “group flow?”
Stealing Fire suggests that it costs around $500k to train each Navy SEAL and around $4.25m to train each elite SEAL (DEVGRU). It also describes how this level of training allows DEVGRU units to quickly get into group flow and function together almost as if choreographed, even in high-pressure / high-noise environments.
Contrast this with collaborative activities within our OSS. We use tickets, emails, Slack notifications, work order activity lists, etc to collaborate. It seems to me that these are the precise instruments that prevent us from getting into flow individually. I assume it’s the same collectively. I can’t think back to any end-to-end OSS workflows that seem highly choreographed or seamlessly effective.
Think about it. If you experience significant rates of process fall-out / error, then it would seem to indicate an OSS that’s not conducive to group flow. Ditto for lengthy O2A (order to activate) or T2R (trouble to resolve) times. Ditto for bringing new products to market.
I’d love to hear your thoughts. Has any OSS environment you’ve worked in facilitated group flow? If so, was it the people and/or the tools? Alternatively, have the OSS you’ve used inhibited group flow?
PS. Stealing Fire details how organisations such as Google and DARPA are investing heavily in flow research. They can obviously see the pay-off from those investments (or potential pay-offs). We seem to barely even invest in UI/UX experts to assist with the designs of our OSS products and workflows.
TM Forum’s Open Digital Architecture (ODA) White Paper begins with the following statement:
Telecoms is at a crucial turning point. The last decade has dealt a series of punishing blows to an industry that had previously enjoyed enviable growth for more than 20 years. Services that once returned high margins are being reduced to commodities in the digital world, and our insatiable appetite for data demands continuous investment in infrastructure. On the other hand, communications service providers (CSPs) and their partners are in an excellent position to guide and capitalize on the next wave of digital revolution.
Clearly, a reduction in profitability leads to a reduction in cash available for projects – including OSS transformation projects. And reduced profitability almost inevitably leads executives to start thinking about head-count reduction too.
As Luke Clifton of Macquarie Telecom observed here, “Telstra is reportedly planning to shed 1,200 people from its enterprise business with many of these people directly involved in managing small-to-medium sized business customers. More than 10,000 customers in this segment will no longer have access to dedicated Account Managers, instead relegated to being managed by Telstra’s “Digital Hub”… Telstra, like the big banks once did, is seemingly betting that customers won’t leave them nor will they notice the downgrade in their service. It will be interesting to see how 10,000 additional organisations will be managed through a Digital Hub.
Simply put, you cannot cut quality people without cutting the quality of service. Those two ideals are intrinsically linked…”
As a fairly broad trend across the telco sector, projects and jobs are being cut, whilst technology change is forcing transformation. And as suggested in Luke’s “Digital Hub” quote above, it all leads to increased expectations on our OSS/BSS.
Pressure is coming at our OSS from all angles, and with no signs of abating.
To quote Queen, “Pressure. Pushing down on me. Pressing down on you.”
So it seems to me there are only three broad options when planning our OSS roadmaps:
We learn to cope with increased pressure (although this doesn’t seem like a viable long-term option)
We reduce the size (eg functionality, transaction volumes, etc) of our OSS footprint [But have you noticed that all of our roadmaps seem expansionary in terms of functionality, volumes, technologies incorporated, etc??]
We look beyond the realms of traditional OSS/BSS functionality (eg just servicing operations) and into areas of opportunity
TM Forum’s ODA White Paper goes on to state, “The growth opportunities attached to new 5G ecosystems are estimated to be worth over $580 billion in the next decade. Servicing these opportunities requires transformation of the entire industry. Early digital transformation efforts focused on improving customer experience and embracing new technologies such as virtualization, with promises of wide-scale automation and greater agility. It has become clear that these ‘projects’ alone are not enough. CSPs’ business and operating models, choice of technology partners, mindset, decision-making and time to market must also change. True digital business transformation is not an easy or quick path, but it is essential to surviving and thriving in the future digital market.”
BTW. I’m not suggesting 5G is the panacea or single opportunity here. My use of the quote above is drawing more heavily on the opportunities relating to digital transformation. Not of the telcos themselves, but digital transformation of their customers. If data is the oil of the 21st century, then our OSS/BSS and telco assets have the potential to be the miners and pipelines of that oil.
If / when our OSS go from being cost centres to revenue generators (directly attributable to revenue, not the indirect attribution by most OSS today), then we might feel some of the pressure easing off us.
Fault / Alarm management tools have lots of strings to their functionality bows to help operators focus in on the target/s that matter most. ITU-T’s recommendation X.733 provided an early framework and common model for classification of alarms. This allowed OSS vendors to build a standardised set of filters (eg severity, probable cause, etc). ITU-T’s recommendation M.3703 then provided a set of guiding use cases for managing alarms. These recommendations have been around since the 1990s (or possibly even before).
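For anyone who hasn’t seen these filters in action, here’s a minimal sketch of a severity and probable-cause filter. The six perceived severity values are the ones X.733 defines. The ranking order, the `Alarm` record layout, the device names and the `filter_alarms` function are my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Perceived severity values from X.733, ranked here from most to least urgent.
# Where "indeterminate" sits in the ranking is an assumption of this sketch.
SEVERITY_RANK = {
    "critical": 0,
    "major": 1,
    "minor": 2,
    "warning": 3,
    "indeterminate": 4,
    "cleared": 5,
}


@dataclass
class Alarm:
    managed_object: str
    perceived_severity: str
    probable_cause: str


def filter_alarms(
    alarms: Iterable[Alarm],
    least_severe: str = "major",
    probable_causes: Optional[Set[str]] = None,
) -> List[Alarm]:
    """Keep alarms at least as severe as `least_severe`, optionally by cause."""
    threshold = SEVERITY_RANK[least_severe]
    kept = [
        a
        for a in alarms
        if SEVERITY_RANK[a.perceived_severity] <= threshold
        and (probable_causes is None or a.probable_cause in probable_causes)
    ]
    # Most urgent first; sorted() is stable, so arrival order is kept within a severity.
    return sorted(kept, key=lambda a: SEVERITY_RANK[a.perceived_severity])


event_list = [
    Alarm("router-1", "minor", "powerProblem"),
    Alarm("router-2", "major", "lossOfSignal"),
    Alarm("switch-9", "critical", "lossOfSignal"),
    Alarm("router-2", "cleared", "lossOfSignal"),
]

urgent = filter_alarms(event_list)
signal_loss = filter_alarms(
    event_list, least_severe="minor", probable_causes={"lossOfSignal"}
)
```

Filters like this narrow what the operator sees, but they don’t relate events to each other.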
Despite these “noise reduction” tools being readily available, they’re still not “compressing” event lists enough in all cases.
I imagine, like me, you’ve heard many customer stories where so many new events are appearing in an event list each day that the NOC (network operations centre) just can’t keep up. Dozens of new events are appearing on the screen, then scrolling off the bottom of it before an operator has even had a chance to stop and think about a resolution.
So if humans can’t keep up with the volume, we need to empower machines with their faster processing capabilities to do the job. But to do so, we first have to take a step away from the noise and help build a systematic root-cause analysis (RCA) pipeline.
I call it a pipeline because there are generally a lot of RCA rules that are required. There are a few general RCA rules that can be applied “out of the box” on a generic network, but most need to be specifically crafted to each network.
So here’s a step-by-step guide to build your RCA pipeline:
Scope – Identify your initial target / scope. For example, what are you seeking to prioritise:
Event volume reduction to give the NOC breathing space to function better
Identifying “most important” events (which first requires defining what “most important” means)
Minimising SLA breaches
Gather Data – Gather incident and ticket data. Your OSS is probably already doing this, but you may need to pull data together from various sources (eg alarms/events, performance, tickets, external sources like weather data, etc)
Pattern Identification – Pattern identification and categorisation of incidents. This generally requires a pattern identification tool, ideally supplied by your alarm management and/or analytics supplier
Prioritise – Using a long-tail graph like below, prioritise pattern groups by the following (and in line with item #1 above):
Number of instances of the pattern / group (ie frequency)
Priority of instances (ie urgency of resolution)
Number of linked incidents (ie volume)
Other technique, such as a cumulative/blended metric
Gather Resolution Knowledge – Understand current NOC approaches to fault-identification and triage, as well as what’s important to them (noting that they may have biases such as managing to vanity metrics)
Note any Existing Resolutions – Identify and categorise any existing resolutions and/or RCA rules (if data supports this)
Short-list Remaining Patterns – Overlay resolution patterns on the long-tail (to show which patterns are already solved for). Then identify remaining priority patterns on the long-tail that don’t have a resolution yet.
Codify Patterns – Progressively set out to identify possible root-cause by analysing cause-effect such as:
Other (as helped to be defined by NOC operators)
Knowledge base – Create a knowledge base that itemises root-causes and supporting information
Build Algorithm / Automation – Create an algorithm for identifying root-cause and related alarms. Identify level of complexity, risks, unknowns, likelihood, control/monitoring plan for post-install, etc. Then build pilot algorithm (and possibly roll-back technique??). This might not just be an RCA rule, but could also include other automations. Automations could include creating a common problem and linking all events (not just root cause event but all related events), escalations, triggering automated workflows, etc
Test pilot algorithm (with analytics??)
Introduce algorithm into production use – But continue to monitor what’s being suppressed to
Repeat – Then repeat from steps 7 to 12 to codify the next most important pattern
Leading metrics – Identify leading metrics and/or preventative measures that could precede the RCA rule. Establish closed-loop automated resolution
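Steps 4 and 7 above (Prioritise and Short-list Remaining Patterns) can be sketched in a few lines of Python. This is only one possible way to build a blended metric: the PatternGroup fields, the 1 to 5 priority scale, the weights and the sample data are all hypothetical and would need to be tuned to the scope chosen in step 1.

```python
from dataclasses import dataclass


@dataclass
class PatternGroup:
    """Hypothetical summary of one pattern group from step 3."""
    name: str
    instances: int          # frequency of the pattern
    avg_priority: float     # urgency, assumed scale of 1 (low) to 5 (high)
    linked_incidents: int   # volume of incidents tied to the pattern
    has_rca_rule: bool = False  # True if an existing resolution covers it


def rank_patterns(groups, weights=(0.4, 0.3, 0.3)):
    """Order groups by a blended score of frequency, urgency and volume.

    Each measure is normalised against the largest value seen, then
    combined using the (assumed) weights.
    """
    max_inst = max(g.instances for g in groups) or 1
    max_prio = max(g.avg_priority for g in groups) or 1
    max_inc = max(g.linked_incidents for g in groups) or 1
    w_inst, w_prio, w_inc = weights

    def score(g):
        return (w_inst * g.instances / max_inst
                + w_prio * g.avg_priority / max_prio
                + w_inc * g.linked_incidents / max_inc)

    return sorted(groups, key=score, reverse=True)


def shortlist_unresolved(groups, weights=(0.4, 0.3, 0.3)):
    """Step 7: keep the ranked patterns that have no resolution yet."""
    return [g for g in rank_patterns(groups, weights) if not g.has_rca_rule]


# Illustrative data only.
groups = [
    PatternGroup("link-flap", 900, 2.0, 40, has_rca_rule=True),
    PatternGroup("power-loss-site", 120, 5.0, 60),
    PatternGroup("bgp-session-drop", 300, 4.0, 90),
    PatternGroup("fan-warning", 50, 1.0, 2),
]

ranked_names = [g.name for g in rank_patterns(groups)]
todo_names = [g.name for g in shortlist_unresolved(groups)]
print(ranked_names)
print(todo_names)
```

The first entry in the short-list is the pattern to codify next (step 8). Changing the weights changes what “most important” means, so they should be agreed with the NOC rather than assumed.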
Let me start today with a question: Does your future OSS/BSS need to be drastically different to what it is today?
Please leave me a comment below, answering yes or no.
I’m going to take a guess that most OSS/BSS experts will answer yes to this question, that our future OSS/BSS will change significantly. It’s the reason I wrote the OSS Call for Innovation manifesto some time back. As great as our OSS/BSS are, there’s still so much need for improvement.
But big improvement needs big change. And big change is scary, as Tom Nolle points out:
“IT vendors, like most vendors, recognize that too much revolution doesn’t sell. You have to creep up on change, get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening.”
Do you feel like we’re already in the midst of a revolution? Cloud computing, web-scaling and virtualisation (of IT and networks) have been partly responsible for it. Agile and continuous integration/delivery models too.
The following diagram shows a “from the moon” level view of how I approach (almost) any new project.
The key to Tom’s quote above is in step 2. Just how far, or how ambitious, into the future are you projecting your required change? Do you even know what that future will look like? After all, the environment we’re operating within is changing so fast. That’s why Tom is suggesting that for many of us, step 2 is just a “creep up on it change.” The gap is essentially small.
The “creep up on it change” means just adding a few new relatively meaningless features at the end of the long tail of functionality. That’s because we’ve already had the most meaningful functionality in our OSS/BSS for decades (eg customer management, product / catalog management, service management, service activation, network / service health management, inventory / resource management, partner management, workforce management, etc). We’ve had the functionality, but that doesn’t mean we’ve perfected the cost or process efficiency of using it.
So let’s say we look at step 2 with a slightly different mindset. Let’s say we don’t try to add any new functionality. We lock that down to what we already have. Instead we do re-factoring and try to pull the efficiency levers, which means changes to:
Platforms (eg cloud computing, web-scaling and virtualisation as well as associated management applications)
Methodologies (eg Agile, DevOps, CI/CD, noting of course that they’re more than just methodologies, but also come with tools, etc)
Process (eg User Experience / User Interfaces [UX/UI], supply chain, business process re-invention, machine-led automations, etc)
It’s harder for most people to visualise what the Step 2 Future State looks like. And if it’s harder to envisage Step 2, how do we then move onto Steps 3 and 4 with confidence?
This is the challenge for OSS/BSS vendors, suppliers, integrators and implementers. How do we “get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening?” And I should point out that it’s not just buyers we need to get disconnected from the comfortable past, but ourselves, myself definitely included.
In the context of OSS/BSS, DBA has multiple meanings but I think the most relevant is Death By Acronym (don’t worry all you Database Administrators out there, I haven’t forgotten about you). Our industry is awash with TLAs (Three-Letter Acronyms) that lead to DBA.
Having said that, today’s article is about four that are commonly used in relation to end to end workflows through our OSS/BSS stacks. They often traverse different products, possibly even multiple different vendors’ products. They are as follows:
P2O – Prospect to Order – This workflow operates across the boundary between the customer and the customer-facing staff at the service provider. It allows staff to check what products can be offered to a customer. This includes service qualification (SQ), feasibility checks, then design, assign and reserve resources.
O2A – Order to Activate – This workflow includes all activities to manage customer services across entire life-cycles. That is, not just the initial activation of a service, but in-flight changes during activation and post-activation changes as well
U2C – Usage to Cash – This workflow allows customers or staff to evaluate the usage or consumption of a service (or services) that has already been activated for a customer
T2R – Trouble to Resolve – This “workflow” is more like a bundle of workflows that relate to assuring health of the services (and the network that carries them). They can be categorised as reactive (ie a customer triggers a resolution workflow by flagging an issue to the service provider) or proactive (ie the service provider identifies an issue, degradation or potential for issue and triggers a resolution workflow internally)
PS. I recently read a vendor document that described additional flows:- I2I (Idea to Implementation – service onboarding, through a catalog presumably), P2P (Plan to Production – resource provisioning) and O2S (Order to Service). There’s also C2M (Concept to Market), L2C (Lead to Cash) and I’m sure I’m forgetting a number of others. Are there any additional TLAs that I should be listing here to describe end-to-end workflows?
There’s a famous Zig Ziglar quote that goes something like, “You can have everything in life you want, if you will just help enough other people get what they want.”
You could safely assume that this was written for the individual reader, but there is some truth in it within the OSS context too. For the OSS designer, builder, integrator, does the statement “You can have everything in your OSS you want, if you will just help enough other people get what they want,” apply?
We often just think about the O in OSS – Operations people, when looking for who to help. But OSS/BSS has the ability to impact far wider than just the Ops team/s.
The halcyon days of OSS were probably in the 1990s to early 2000s when the term OSS/BSS was at its most sexy and exciting. The big telcos were excitedly spending hundreds of millions of dollars. Those projects were huge… and hugely complex… and hugely fun!
With that level of investment, there was the expectation that the OSS/BSS would help many people. And they did. But the lustre has come off somewhat since then. We’ve helped sooooo many people, but perhaps didn’t help enough people enough. Just speak with anybody involved with an OSS/BSS stack and you’ll hear hints of a large gap that exists between their current state and a desired future state.
Do you mind if I ask two questions?
When you reflect on your OSS activities, do you focus on the technology, the opportunities or the problems?
Do you look at the local, day-to-day activities or the broader industry?
I tend to find myself focusing on the problems – how to solve them within the daily context of customer challenges, and the broader industry problems when I take the time to reflect, such as when writing these blogs.
The part I find interesting is that we still face most of the same problems today that we did back in the 1990s-2000s. The same sources of risk. We’ve done a fantastic job of helping many people get what they want in their day-to-day activities (the incremental). We still haven’t cracked the big challenges though. That’s why I wrote the OSS Call for Innovation, to articulate what lies ahead of us.
It’s why I’m really excited about two of the concepts we’ve discussed this week:
I’d like to introduce the concept of CT/IR – Continual Test / Incremental Resilience. Analogous to CI/CD (Continuous Integration / Continuous Delivery) before it, CT/IR is a method to systematically and programmatically test the resilience of the network, and then to ensure that resilience is continually improving.
This is done by storing a knowledge base of failure cases, pre-emptively triggering them and then recording the results as seed data (for manual or AI / ML observations). Using traditional techniques, we look at event logs and try to reverse-engineer what the root-cause MIGHT be. In the case of CT/IR, the root-cause is certain. We KNOW the root-cause because we systematically and intentionally triggered it.
The continual, incremental improvement in resiliency potentially comes via multiple feedback loops:
Ideally, the existing resilience mechanisms work around or overcome any degradation or failure in the network
The continual triggering of faults into the network will provide additional seed data for AI/ML tools to learn from and improve upon, especially root-cause analysis
We can program the network to overcome the problem (eg turn up extra capacity, re-engineer traffic flows, change configurations, etc). Having the NaaS that we spoke about yesterday provides greater programmability for the network, by the way.
We can implement systematic programs / projects to fix endemic faults or weak spots in the network *
Perform regression tests to constantly stress-test the network as it evolves through network augmentation, new device types, etc
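To make the CT/IR idea a little more concrete, here is a minimal Python sketch of one test cycle. It assumes a toy digital twin in which a failed node makes everything downstream of it unreachable. The TwinNetwork class, the event strings and the topology are all invented for illustration, and a real twin or PSUP environment would expose its own interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCase:
    """One entry in the knowledge base of failure cases."""
    case_id: str
    description: str
    target: str  # node in the twin where the fault is triggered


class TwinNetwork:
    """Toy stand-in for a digital twin (hypothetical, not a real API)."""

    def __init__(self, topology):
        # topology maps each node to the nodes directly downstream of it
        self.topology = topology

    def _downstream(self, node):
        """Breadth-first walk of every node fed by the given node."""
        seen, queue = [], list(self.topology.get(node, []))
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.topology.get(current, []))
        return seen

    def inject(self, case):
        """Trigger the fault and return the events it produces."""
        events = [f"{case.target}:DOWN"]
        events += [f"{n}:UNREACHABLE" for n in self._downstream(case.target)]
        return events


def run_ctir_cycle(twin, knowledge_base):
    """Trigger each known failure case and label the resulting events.

    Because we triggered the fault ourselves, the root cause attached to
    each record is certain rather than inferred.
    """
    return [
        {"root_cause": case.case_id, "target": case.target,
         "events": twin.inject(case)}
        for case in knowledge_base
    ]


# Illustrative data only.
twin = TwinNetwork({"core-1": ["agg-1", "agg-2"], "agg-1": ["access-1"]})
knowledge_base = [
    FailureCase("FC-001", "Core router power loss", "core-1"),
    FailureCase("FC-002", "Aggregation switch failure", "agg-1"),
]

seed_data = run_ctir_cycle(twin, knowledge_base)
for record in seed_data:
    print(record["root_cause"], record["events"])
```

Each record pairs a known root cause with the events it generated, which is the labelled seed data described above for manual or AI/ML analysis.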
Now, you may argue that no carrier in their right mind will allow intentional faults to be triggered. So that’s where we unleash the chaos monkeys on our digital twin technology and/or PSUP (Production Support) environments at first. Then on our prod network if we develop enough trust in it.
I live in Australia, which suffers from severe bushfires every summer. Our fire-fighters spend a lot of time back-burning during the cooler months to reduce flammable material and therefore the severity of summer fires. Occasionally the back-burns get out of control, causing problems. But they’re still done for the greater good. The same principle could apply to unleashing chaos monkeys on a production network… once you’re confident in your ability to control the problems that might follow.
* When I say network, I’m referring not only to the physical and logical network, but also to support functions such as EMS (Element Management Systems), NCM (Network Configuration Management tools), backup/restore mechanisms, service order replay processes in the event of an outage, OSS/BSS, NaaS, etc.
As the title suggests above, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.
There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.
The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.
But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.
As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:
Order Entry, Order Management
Product Catalog (Product / Offer Management)
SLA (Service Level Agreement) Management
If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).
For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require formation of a project team that includes multiple business units (eg products, marketing, IT, networks, change management to support all the workers impacted by system/process change, etc)
A new product or product bundle is to be taken to market
An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
A new network type is added into the network
System and / or process transformations occur in the IT stack
If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.
We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact it now becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently. Each has their own nuances in terms of over-arching management.
The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.
The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].
NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples
The products / marketing teams (who generate customer offerings / bundles) from
The networks / operations teams (who design, build and maintain the network), and
The IT teams (who design, build and maintain the IT stack)
It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.
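The Lego analogy can be sketched as a tiny catalog in Python. This is an illustration of the de-coupling idea only: the ServiceCatalog class, its method names and the sample services are hypothetical, and a real NaaS layer would sit behind standards-based APIs rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RFS:
    """A Resource Facing Service published by a network domain."""
    name: str
    domain: str


class ServiceCatalog:
    """Hypothetical catalog that composes CFS offerings from RFS blocks."""

    def __init__(self):
        self._rfs = {}
        self._cfs = {}

    def publish_rfs(self, rfs):
        """Network / ops teams add a building block."""
        self._rfs[rfs.name] = rfs

    def define_cfs(self, name, rfs_names):
        """Product teams assemble an offering from existing blocks only."""
        missing = [n for n in rfs_names if n not in self._rfs]
        if missing:
            raise KeyError(f"unknown RFS building blocks: {missing}")
        self._cfs[name] = list(rfs_names)

    def decompose(self, cfs_name):
        """Return the RFS blocks (and their domains) behind an offering."""
        return [self._rfs[n] for n in self._cfs[cfs_name]]


# Illustrative data only.
catalog = ServiceCatalog()
catalog.publish_rfs(RFS("ethernet-access", "access"))
catalog.publish_rfs(RFS("l3vpn", "ip-core"))
catalog.publish_rfs(RFS("virtual-firewall", "nfv"))

catalog.define_cfs("managed-branch-office",
                   ["ethernet-access", "l3vpn", "virtual-firewall"])

domains = [rfs.domain for rfs in catalog.decompose("managed-branch-office")]
print(domains)
```

The product team never touches a network domain directly. It only references building blocks by name, and a request for a block that doesn't exist fails straight away, which is the signal that a new custom block is needed.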
You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.
You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.
The NaaS layer comprises:
A TMF standards-based API Gateway
A Master Services Catalog
A common / consistent framework of presentation of all domains
The ramifications of this excite me even more than what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.
You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.
Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.
If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.
Agree or disagree? Leave me a comment below.
PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.
PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in its own right. In reality, it is actually a much thinner shim layer architecturally.
PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together
PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.
OSS projects are full of risks. We all know it. OSS projects have “earned” a bad name because of all those risks. On the other side of that same coin, OSS projects disappoint, in part I suspect because stakeholders expect such big things from their resource investments.
Ask anyone familiar with OSS projects and you’ll be sure to hear a long list of failings.
For those less familiar with what an OSS project has in store for you, I’d like to share a list of the most common risks I’ve seen on OSS projects.
Most people working in the OSS industry are technology-centric, so they’ll tend to cite risks that relate to the tech. That’s where I used to focus attention too. Now technology risk definitely exists, but as you’ll see below, I tend to start by looking at other risk factors first these days.
Most common OSS project risks / issues:
Complexity (to be honest, this is probably more the root-cause / issue that manifests as many of the following risks). However, complexity across many aspects of OSS projects is one of the biggest problem sources
Change Management – OSS tend to introduce significant change to an organisation – operationally, organisationally, processes, training, etc. This is probably the most regularly underestimated component of any large OSS build
Stakeholder Support / Politics – Challenges appear on every single OSS project. They invariably need strong support from stakeholders and sponsors to clear a path through the biggest challenges. If the project’s leaders aren’t fully committed and in unison, the delivery teams will be heavily constrained
Ill-defined Scope – Over-scoping, scope omission and scope creep all represent risks to an OSS project. But scope is never perfectly defined or static, so scope management mechanisms need to be developed up-front rather than in-flight. Tying back to point 1 above, complexity minimisation should be a key element of scope planning. To hark back to my motto for OSS, “just because we can, doesn’t mean we should”
Financial and commercial – As with scope, it’s virtually impossible to plan an OSS project to perfection. There are always unknowns. These unknowns can directly impact the original estimates. Projects with blow-outs and no contingency for time or money increase pressure on point 3 (stakeholders/sponsors) to maintain their support
Client resource skills / availability – An OSS has to be built to the needs of a client. If the client is unable to provide resources to steer the implementation, then it’s unlikely for the client to get a solution that is perfectly adapted to the client’s needs. One challenge for the client is that their most valuable guides, those with the client’s tribal knowledge, are also generally in high demand by “business as usual” teams. It becomes a challenge to allocate enough of their time to guide the OSS delivery team. Another challenge is augmenting the team with the required skill-set when a project introduces new skill requirements
Communication – OSS projects aren’t built in a vacuum. They have many project contributors and even more end-users. There are many business units that touch an OSS/BSS, each with their own jargon and interpretations. For example, how many alternate uses of the term “service” can you think of? I think an important early-stage activity is to agree on and document naming conventions
Culture – Of the client team and/or project team. Culture contributes to (or detracts from) motivation, morale, resource turnover, etc, which can have an impact on the team’s ability to deliver
Design / Integration – Finally, a technology risk. This item is particularly relevant with complex projects, where it can be difficult for all of the planned components to operate and integrate as planned. A commonly unrecognised risk relates to the viability of implementing a design. It’s common for an end-state design to be specified but with no way of navigating through a series of steps / phases to reach the end-state
Technology – Similar to the previous point, there are many technology risks relating to items such as quality, scalability, resiliency, security, supportability, obsolescence, interoperability, etc
There’s one thing you will have probably noticed about this list. Most of the risks are common to other projects, not just OSS projects. However, the risks do tend to amplify on OSS projects because of their inherent complexity.
Network operators spend huge amounts on building and maintaining their OSS/BSS every year. There are many reasons they invest so heavily, but in most cases it can be distilled back to one thing – improving operational efficiency.
And our OSS/BSS definitely do improve operational efficiency, but there are still so many sources of friction. They’re squeaking like un-oiled bearings. Here are just a few of the common sources:
Identifying best-fit tools
Procurement of new tools
Update / release processes
Continuous data quality / consistency improvement
Navigating to all features through the user interface
Non-intuitive functionality / processes
So many variants / complexity that end-users take years to attain expert-level capability
Integration / interconnect
Getting new starters up to speed
Getting proficient operators to expertise
Unlocking actionable insights from huge data piles
Resolving the root-cause of complex faults
Onboarding new customers
Productionising new functionality
Exception and fallout handling
Access to supplier expertise to resolve challenges
The list goes far deeper than that too. The challenge for many OSS product teams, for any number of reasons, is that their focus is on adding new features rather than reducing friction in what already exists.
The challenge for product teams is diagnosing where the friction and risks are for their customers / stakeholders. How do you get that feedback?
Every vendor has a product support team, so that’s a useful place to start, both in terms of what’s generating the most support calls and in terms of first-hand feedback from customers
Do you hold user forums on a regular basis, where you get many of your customers together to discuss their challenges, your future roadmap, new improvements / features
Does your process “flow” data show where the sticking points are for operators
Do you conduct gemba walks with your customers
Do you have a program of ensuring all developers spend at least a few days a year interacting directly with customers on their site/s
Do you observe areas of difficulty when delivering training
Do you go out of your way to ask your customers / stakeholders questions that are framed around their pain-points, not just framed within the context of your existing OSS
Do you conduct customer surveys? More importantly, do you conduct surveys through an independent third-party?
On the last dot-point, I’ve been surprised at some of the profound insights end-users have shared with me when I’ve been conducting these reviews as the independent interviewer. I’ve tended to find answers are more open / honest when being delivered to an independent third-party than if the supplier asks directly. If you’d like assistance running a third-party review, leave us a note on the contact page. We’d be delighted to assist.
One of the longer lead-time items in relation to OSS data and processes is in network build and customer connections. From the time when capacity planning or a customer order creates the signal to build, it can be many weeks or months before the physical infrastructure work is complete and appearing in the OSS.
There are two financial downsides to this. Firstly, it tends to be CAPEX-heavy with equipment, construction, truck-rolls, government approvals, etc burning through money. Meanwhile, it’s also a period where there is no money coming in because the services aren’t turned on yet. The time-to-cash cycle of new build (or augmentation) is the bane of all telcos.
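A small worked example helps show why the time-to-cash cycle hurts so much. The Python sketch below uses made-up figures and a deliberately simple model (flat monthly spend during the build, flat monthly revenue afterwards), so the numbers are assumptions for illustration rather than industry data.

```python
def cumulative_cash(build_months, monthly_capex, monthly_revenue, horizon_months):
    """Return the running cash position at the end of each month.

    Money only goes out while the build is under way, and revenue only
    starts once the services are turned on.
    """
    position, balance = [], 0.0
    for month in range(1, horizon_months + 1):
        if month <= build_months:
            balance -= monthly_capex
        else:
            balance += monthly_revenue
        position.append(balance)
    return position


def breakeven_month(position):
    """First month in which the cash position is back at or above zero."""
    for month, balance in enumerate(position, start=1):
        if balance >= 0:
            return month
    return None


# Hypothetical figures: same spend rate and revenue, different build times.
physical_build = cumulative_cash(build_months=6, monthly_capex=100.0,
                                 monthly_revenue=50.0, horizon_months=24)
faster_build = cumulative_cash(build_months=2, monthly_capex=100.0,
                               monthly_revenue=50.0, horizon_months=24)

print(breakeven_month(physical_build))
print(breakeven_month(faster_build))
```

In this toy model, shortening the build phase both reduces the total outlay and brings forward the first revenue, so the break-even point moves earlier by much more than the build time saved.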
This is one of the exciting aspects of network virtualisation for telcos. In a time where connectivity is nearly ubiquitous in most countries, often with high-speed broadband access, physical build becomes less essential (except over-builds). Technologies such as uCPE (Universal Customer Premises Equipment), NFV (Network Function Virtualisation), SD WAN (Software-Defined Wide Area Networks), SDN (Software Defined Networks) and others mean that we can remotely upgrade and reconfigure the network without field work.
Network virtualisation gives the potential to speed up many of the slowest and costliest processes that run through our OSS… but only if our OSS can support efficient orchestration of virtualised networks. And that means having an OSS with the flexibility to easily swap out slow processes and replace them with fast ones without massive overhauls.