Seems this post from last week has triggered some really interesting debate – Is your service assurance really service assurance?? (Part 5). It was a post that looked into collecting end-to-end service metrics rather than our traditional method of collecting network device events/metrics and trying to reverse-engineer to form a service-level perspective.
Thought I’d give you an update. I’m thinking along the following lines, but admit that I don’t have it all worked out by any means yet:
We need to concept of span like OpenTelemetry does between microservices (in a way, it’s like nearest-neighbour of where each packet is getting pushed).
Note that for us a span is on a service-by-service basis between nodes, not just a network link-by-link basis between nodes
We need to be able to measure the real-time metrics of the performance of each span as well as any events/faults impacting them
One challenge (one of probably many) is how to avoid flooding the data/management planes. Possibly a telemetry beacon at each node that’s aggregating performance/events of each packet passed for each service?? But what aggregation-window / cache-size to use? Still too impossibly huge to process except with ridiculously low sampling rates??
By chaining the spans we get a real-time, end-to-end trace of services and the performance (and real-time snapshot of service-by-service resource usage in a packet-switched network)
How to efficiently get the beacon data to a centralised logging/management point? Send beacons via management plane? Send via data plane? Take an approach similar to Netflow / IPFIX-style protocols?
How to store data for a short period (ie for real-time analysis/reporting) as well as for long periods. Due to volumes, we’d have to apply aging policies to the data, but it would still be valuable for the purpose of mid and long-term SLA, network health, optimisation, capacity management, etc
As you can see, there are still so many wide-open questions about the feasibility of the concept. But getting feedback from multiple very clever people who read this blog is definitely helping! Thank you!!
I also just stumbled upon OpenTelemetry, an open source project designed to capture traces / metrics / logs from apps / microservices. It intrigued me because just as you have the concept of traces / metrics / logs for apps, you similarly have traces / metrics / logs for networks.
In the network world, we’re good at getting metrics / logs / events, but not very good at getting trace data (ie end-to-end service chains) as described earlier in this blog series. And if we can’t monitor traces, we can’t easily interpret a customer’s experience whilst they’re using their network service. We currently do “service assurance” by reverse-engineering logs / events, which seems a bit backward to me.
Take a closer look at the OpenTelemetry link above, which provides an overview of how their team is going to gather application telemetry. With increasing software-ification of our networks (eg SDN / NFV) and the use of microservices / NaaS / APIs in our management stacks, could this actually be our path to the holy grail of service assurance (ie capturing trace data – network service telemetry)?? Is it data plane? Is it control / management plane? Is it something in between?
Note: The “active measurements” approach described in part 3 is slightly compromised in current form, which is why I’m so intrigued by the potential of extending the concepts of OpenTelemetry into our software / virtual networks.
I’d really love your take on this one because I’m sure there are many elements to this that I haven’t thought through yet. Please leave your thoughts on the viability of the approach.
“Whatever is well conceived is clearly said,
And the words to say it flow with ease.”
I’d like to hijack this quote and re-direct it towards architectures. Could we equally state that a well conceived architecture can be clearly understood? Some modern OSS/IT frameworks that I’ve seen recently are hugely complex. The question I’ve had to ponder is whether they’re necessarily complex. As the aphorism states, “Everything should be made as simple as possible, but not simpler.”
Just take in the complexity of this triptych I prepared to overlay SDN, NFV and MANO frameworks.
Yet this is only a basic model. It doesn’t consider networks with a blend of PNF and VNF (Physical and Virtual Network Functions). It doesn’t consider closed loop assurance. It doesn’t consider other automations, or omni-channel, or etc, etc.
Yesterday’s post raised an interesting concept from Tom Nolle that as our solutions become more complex, our ability to make a basic assessment of value becomes more strained. And by implication, we often need to upskill a team before even being able to assess the value of a proposed project.
It seems to me that we need simpler architectures to be able to generate persuasive business cases. But it poses the question, do they need to be complex or are our solutions just not well enough conceived yet?
To borrow a story from Wikiquote, “Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi Dirac statistics. Rising to the challenge, he said, “I’ll prepare a freshman lecture on it.” But a few days later he told the faculty member, “You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.“
“…as technology gets more complicated, it becomes more difficult for buyers to acquire the skills needed to make even a basic assessment of value. Without such an assessment, it’s hard to get a project going, and in particular hard to get one going the right way.”
Have you noticed that over the last few years, OSS choice has proliferated, making project assessment more challenging? Previously, the COTS (Commercial Off-the-Shelf) product solution dominated. That was already a challenge because there are hundreds to choose from (there are around 400 on our vendors page alone). But that’s just the tip of the iceberg.
We now also have choices to make across factors such as:
Building OSS tools with open-source projects
An increasing amount of in-house development (as opposed to COTS implementations by the product’s vendors)
Smaller niche products that need additional integration
An increase in the number of “standards” that are seeking to solve traditional OSS/BSS problems (eg ONAP, ETSI’s ZSM, TM Forum’s ODA, etc, etc)
Revolutions from the IT world such as cloud, containerisation, virtualisation, etc
As Tom indicates in the quote above, the diversity of skills required to make these decisions is broadening. Broadening to the point where you generally need a large team to have suitable skills coverage to make even a basic assessment of value.
At Passionate About OSS, we’re seeking to address this in the following ways:
We have two development projects underway (more news to come)
One to simplify the vendor / product selection process
One to assist with up-skilling on open-source and IT tools to build modern OSS
In addition to existing pages / blogs, we’re assembling more content about “standards” evolution, which should appear on this blog in coming days
Use our “Finding an Expert” tool to match experts to requirements
And of course there are the variety of consultancy services we offer ranging from strategy, roadmap, project business case and vendor selection through to resource identification and implementation. Leave us a message on our contact page if you’d like to discuss more
One of the benefits of virtualisation or NaaS (Network as a Service) is that it provides a layer of programmability to your network. That is, to be able to instantiate network services by software through a network API. Virtualisation also tends to assume/imply that there is a huge amount of available capacity (the resource pool) that it can shift workloads between. If one virtual service instance dies or deteriorates, then just automatically spin up another. If one route goes down, customer services are automatically re-directed via alternate routes and the service is maintained. No problem…
But there are some problems that can’t be solved in software. You can’t just use software to fix a cable that’s been cut by an excavator. You can’t just use software to fix failed electronics. Modern virtualised networks can do a great job of self-healing, routing around the problem areas. But there are still physical failures that need to be repaired / replaced / maintained by a field workforce. NSA doesn’t tend to cover that.
Looking at the diagram below, NSA does a great job of the closed-loop assurance within the red circle. But it then needs to kick out to the green closed-loop assurance processes that are already driven by our OSS/BSS.
As described in the link above, “Perhaps if the NSA was just assuring the yellow cloud/s, any time it identifies any physical degradation / failure in the resource pool, it kicks a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just like it already does today. The additional benefit of this two-tiered assurance approach is that NSA can handle the NFV / VNF world, whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).”
Therefore, a key part of the NSA process is how it kicks up from closed-loop 1 to closed-loop 2. Then, after closed-loop 2 has repaired the physical problem, NSA needs to be aware that the repaired resource is now back in the pool of available resources. Does your NSA automatically notice this, or must it receive a notification from closed loop 2?
It could be as simple as NSA sending alarms into the alarm list with a clearly articulate root-cause. The alarm has a ticket/s raised against it. The ticket triggers the field workforce to rectify it and the triggers customer assurance teams/tools to send notifications to impacted customers (if indeed they send notifications to customers who may not actually be effected yet due to the resilience measures that have kicked in). Standard OSS/BSS practice!
Let me start today with a question: Does your future OSS/BSS need to be drastically different to what it is today?
Please leave me a comment below, answering yes or no.
I’m going to take a guess that most OSS/BSS experts will answer yes to this question, that our future OSS/BSS will change significantly. It’s the reason I wrote the OSS Call for Innovation manifesto some time back. As great as our OSS/BSS are, there’s still so much need for improvement.
But big improvement needs big change. And big change is scary, as Tom Nolle points out:
“IT vendors, like most vendors, recognize that too much revolution doesn’t sell. You have to creep up on change, get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening.”
Do you feel like we’re already in the midst of a revolution? Cloud computing, web-scaling and virtualisation (of IT and networks) have been partly responsible for it. Agile and continuous integration/delivery models too.
The following diagram shows a “from the moon” level view of how I approach (almost) any new project.
The key to Tom’s quote above is in step 2. Just how far, or how ambitious, into the future are you projecting your required change? Do you even know what that future will look like? After all, the environment we’re operating within is changing so fast. That’s why Tom is suggesting that for many of us, step 2 is just a “creep up on it change.” The gap is essentially small.
The “creep up on it change” means just adding a few new relatively meaningless features at the end of the long tail of functionality. That’s because we’ve already had the most meaningful functionality in our OSS/BSS for decades (eg customer management, product / catalog management, service management, service activation, network / service health management, inventory / resource management, partner management, workforce management, etc). We’ve had the functionality, but that doesn’t mean we’ve perfected the cost or process efficiency of using it.
So let’s say we look at step 2 with a slightly different mindset. Let’s say we don’t try to add any new functionality. We lock that down to what we already have. Instead we do re-factoring and try to pull the efficiency levers, which means changes to:
Platforms (eg cloud computing, web-scaling and virtualisation as well as associated management applications)
Methodologies (eg Agile, DevOps, CI/CD, noting of course that they’re more than just methodologies, but also come with tools, etc)
Process (eg User Experience / User Interfaces [UX/UI], supply chain, business process re-invention, machine-led automations, etc)
It’s harder for most people to visualise what the Step 2 Future State looks like. And if it’s harder to envisage Step 2, how do we then move onto Steps 3 and 4 with confidence?
This is the challenge for OSS/BSS vendors, supplier, integrators and implementers. How do we, “get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening?” And I should point out, that it’s not just buyers we need to get disconnected from the comfortable past, but ourselves, myself definitely included.
Back in the old days, Network Service Assurance probably had a different meaning than it might today.
Clearly it’s assurance of a network service. That’s fairly obvious. But it’s in the definition of “network service” where the old and new terminologies have the potential to diverge.
In years past, telco networks were “nailed up” and network functions were physical appliances. I would’ve implied (probably incorrectly, but bear with me) that a “network service” was “owned” by the carrier and was something like a bearer circuit (as distinct from a customer service or customer circuit). Those bearer circuits, using protocols such as in DWDM, SDH, SONET, ATM, etc potentially carried lots of customer circuits so they were definitely worth assuring. And in those nailed-up networks, we knew exactly which network appliances / resources / bearers were being utilised. This simplified service impact analysis (SIA) and allowed targeted fault-fix.
In those networks the OSS/BSS was generally able to establish a clear line of association from customer service to physical resources as per the TMN pyramid below. Yes, some abstraction happened as information permeated up the stack, but awareness of connectivity and resource utilisation was generally retained end-to-end (E2E).
But in the more modern computer or virtualised network, it all goes a bit haywire, perhaps starting right back at the definition of a network service.
The modern “network service” is more aligned to ETSI’s NFV definition – “a composition of network functions and defined by its functional and behavioral specification. The Network Service contributes to the behaviour of the higher layer service, which is characterised by at least performance, dependability, and security specifications. The end-to-end network service behaviour is the result of a combination of the individual network function behaviours as well as the behaviours of the network infrastructure composition mechanism.”
They are applications running at OSI’s application layer that can be consumed by other applications. These network services include DNS, DHCP, VoIP, etc, but the concept of NaaS (Network as a Service) expands the possibilities further.
So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the physical layer, other than to say the customer services consume from a pool of resources (the yellow cloud below). Assurance becomes more disconnected as a result.
OSS/BSS are able to tie customer services to pools of resources (the yellow cloud). And OSS/BSS tools also include PNI / WFM (Physical Network Inventory / Workforce Management) to manage the bottom, physical layer. But now there’s potentially an opaque gulf in the middle where virtualisation / NaaS exists.
The end-to-end association between customer services and the physical resources that carry them is lost. Unless we can find a way to establish E2E association, we just have to hope that our modern Network Service Assurance (NSA) tools make the yellow cloud robust to the point of infallibility. BTW. If the yellow cloud includes NaaS, then the NSA has to assure the NaaS gateway, catalog and all services instantiated through the gateway.
But as we know, there will always be failures in physical infrastructure (cable cuts, electronic malfunctions, etc). The individual resources can’t afford to be infallible, even if the resource pool seeks to provide collective resiliency.
Modern NSA has to find a way to manage the resource pool but also coordinate fault-fix in the physical resources that underpin it like the OSS used to do (still do??). They have to do more than just build policies and actions to ensure SLAs don’t they? They can seek to manage security, power, performance, utilisation and more. Unfortunately, not everything can be fixed programmatically, although that is a great place for NSA to start.
Perhaps if the NSA was just assuring the yellow cloud, any time it identifies any physical degradation / failure in the resource pool, it kicks a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just like it already does today. The additional benefit of this two-tiered assurance approach is that NSA can handle the NFV / VNF world, whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).
I’d love to hear your thoughts. Hopefully you can even correct me if/where I’m wrong.
As the title suggests above, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.
There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.
The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.
But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.
As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:
Order Entry, Order Management
Product Catalog (Product / Offer Management)
SLA (Service Level Agreement) Management
If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).
For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require formation of a project team that includes multiple business units (eg products, marketing, IT, networks, change management to support all the workers impacted by system/process change, etc)
A new product or product bundle is to be taken to market
An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
A new network type is added into the network
System and / or process transformations occur in the IT stack
If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.
We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact it now becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently. Each has their own nuances in terms of over-arching management.
The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.
The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].
NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples
The products / marketing teams (who generate customer offerings / bundles) from
The networks / operations teams (who design, build and maintain the network).and
The IT teams (who design, build and maintain the IT stack)
It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.
You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.
You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.
The NaaS layer comprises:
A TMF standards-based API Gateway
A Master Services Catalog
A common / consistent framework of presentation of all domains
The ramifications of this excites me even more that what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.
You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.
Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.
If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.
Agree or disagree? Leave me a comment below.
PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.
PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in it’s own right. In reality, it is actually a much thinner shim layer architecturally
PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together
PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.
One of the longer lead-time items in relation to OSS data and processes is in network build and customer connections. From the time when capacity planning or a customer order creates the signal to build, it can be many weeks or months before the physical infrastructure work is complete and appearing in the OSS.
There are two financial downsides to this. Firstly, it tends to be CAPEX-heavy with equipment, construction, truck-rolls, government approvals, etc burning through money. Meanwhile, it’s also a period where there is no money coming in because the services aren’t turned on yet. The time-to-cash cycle of new build (or augmentation) is the bane of all telcos.
This is one of the exciting aspects of network virtualisation for telcos. In a time where connectivity is nearly ubiquitous in most countries, often with high-speed broadband access, physical build becomes less essential (except over-builds). Technologies such as uCPE (Universal Customer Premises Equipment), NFV (Network Function Virtualisation), SD WAN (Software-Defined Wide Area Networks), SDN (Software Defined Networks) and others mean that we can remotely upgrade and reconfigure the network without field work.
Network virtualisation gives the potential to speed up many of the slowest, and costliest processes that run through our OSS… but only if our OSS can support efficient orchestration of virtualised networks. And that means having an OSS with the flexibility to easily change out slow processes to replace them with fast ones without massive overhauls.
One popular approach is to build a proof-of-concept or sandpit quickly on cloud hosting or in lab environments. It’s fast for a number of reasons including reduced number of approvals, faster activation of infrastructure, reduced safety checks (eg security, privacy, etc), minimised integration with legacy systems and many other reasons. The cloud hosting business model is thriving for all of these reasons.
However, it’s one thing to speed up development of an OSS PoC and another entirely to speed up deployment to a PROD environment. As soon as you wish to absorb the PoC-proven solution back into PROD, all the items listed above (eg security sign-offs) come back into play. Something that took days/weeks to stand up in PoC now takes months to productionise.
Have you noticed that the safety checks currently being used were often defined for the old world? They often aren’t designed with transition from cloud to PROD in mind. Similarly, the culture of design cross-checks and approvals can also be re-framed (especially when the end-to-end solution crosses multiple different business units). Lastly, and way outside my locus of competence, is in re-visiting security / privacy / deployment / etc models to facilitate easier transition.
One consideration to make is just how much absorption is required. For example, there are examples of services being delivered to the large entity’s subscribers by a smaller, external entity. The large entity then just “clips-the-ticket,” gaining a revenue stream with limited involvement. But the more common (and much more challenging) absorption model is for the partner to fold the solution back into the large entity’s full OSS/BSS stack.
So let’s consider your opportunity in terms of the absorption continuum that ranges between:
Perhaps it’s feasible for your opportunity to fit somewhere in between (partially absorbed)? Perhaps part of that answer resides in the cloud model you decide to use (public, private, hybrid, cloud-managed private cloud) as well as the partnership model?
Modularity and reduced complexity (eg integrations) are also a factor to consider (as always).
I haven’t seen an ideal response to the absorption challenge yet, but I believe the solution lies in re-framing corporate culture and technology stacks. We’ll look at that in more detail tomorrow.
How about you? Have you or your organisation managed to speed up your transition from PoC to PROD? What techniques have you found to be successful?
As the TMN diagram below describes, each layer up in the network management stack abstracts but connects (as described in more detail in “What an OSS shouldn’t do“). That is, each higher layer reduces the amount if information/control within a domain that it’s responsible for, but it assumes a broader responsibility for connecting multiple domains together.
There’s just one problem with the diagram. It’s a little dated when we take modern virtualised infrastructure into account.
In the old days, despite what the layers may imply, it was common for an OSS to actually touch every layer of the pyramid to resolve faults. That is, OSS regularly connected to NMS, EMS and even devices (NE) to gather network health data. The services defined at the top of the stack (BSS) could be traced to the exact devices (NE / NEL) via the circuits that traversed them, regardless of the layers of abstraction. It helped for root-cause analysis (RCA) and service impact analysis (SIA).
But with modern networks, the infrastructure is virtualised, load-balanced and since they’re packet-switched, they’re completely circuitless (I’m excluding virtual circuits here by the way). The bottom three layers of the diagram could effectively be replaced with a cloud icon, a cloud that the OSS has little chance of peering into (see yellow cloud in the diagram later in this post).
The concept of virtualisation adds many sub-layers of complexity too by the way, as higlighted in the diagram below.
So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the bottom, other than to say the services consume from a known pool of resources. Fault resolution becomes more abstracted as a result.
But what’s interesting is that there’s another layer that’s not shown on the typical TMN model above. That is the physical network inventory (PNI) layer. The cables, splices, joints, patch panels, equipment cards, etc that underpin every network. Yes, even virtual networks.
In the old networks the OSS touched every layer, including the missing layer. That functionality was provided by PNI management. Fault resolution also occurred at this layer through tickets of work conducted by the field workforce (Workforce Management – WFM).
In new networks, OSS/BSS tie services to resource pools (the top two layers). They also still manage PNI / WFM (the bottom, physical layer). But then there’s potentially an invisible cloud in the middle. Three distinctly different pieces, probably each managed by a different business unit or operational group.
Just wondering – has your OSS/BSS developed control anxiety issues from losing some of the control that it once had?
The advertisement includes the following text:
“Amazon Web Services (AWS) is leading the next paradigm shift in computing and is looking for a world class candidate to manage an elite portfolio of strategic AWS technology partners focused on the Operation support System (OSS) and Business Support System (BSS) applications within telecommunications segment. Your job will be to use these strategic partners to develop OSS and BSS applications on AWS infrastructure and platform.”
How do you read this advertisement? I have a few different perspectives to pose to you:
I can’t predict AWS’ future success with this initiative, but I’m assuming they’re creating the role because they see a big opportunity that they wish to capture. They have plenty of places they could otherwise invest, so they must believe the opportunity is big (eg the industry of OSS suppliers selling to CSPs is worth multi-billions of dollars and is waiting to be disrupted).
OSS/BSS are typically seen by CSPs as a very expensive (and risky) cost of doing business. I’m certain there’s a business model for any organisation (possibly AWS and its tech partners) that can significantly improve the OSS/BSS delivery costs/risks for CSPs.
The ad identifies CSPs (specifically the term, “major telecom infrastructure providers”) as the target customer. You could pose the concept that the CSPs won’t want to support a competitor in AWS. The CSPs I’m dealing with can’t get close to matching AWS cost structures so are partnering with AWS etc. Not just for private cloud, but also public and hybrid cloud too. The clip-the-ticket / partnership selling model appears to be becoming more common for telcos globally, so the fear-of-competition barrier “seems” to be coming down a little.
The other big challenge facing the role is network and data security. What’s surprised me most are core network services like directory services (used for internal authentication/AAA purposes). I never thought I’d see these outsourced to third-party cloud providers, but have seen the beginnings of it recently. If CSPs consume those, then OSS/BSS must be up for grabs at some CSPs too. For example, I’d imagine that OSS/BSS tools were amongst the 1,000 business apps that Verizon is moving to AWS.
The really interesting future consideration could be the advanced innovation that AWS et al could bring to the OSS space, and in ways that the telcos and OSS suppliers simply can’t. This recent post showed Google’s intent to bring AI to network operations. It could revolutionise the OSS/BSS industry. Not just for CSPs, but for their customers as well (eg their enterprise-grade OSS). Could it even represent another small step towards the OSS Doomsday Scenario posed here?
This is the fourth, and final part (I think) in the series on killing the OSS RFI/RFP process, a process that suppliers and customers alike find to be inefficient. The concept is based on an initiative currently being investigated by TM Forum.
The previous three posts focused on the importance of trusted partnerships and the methods to develop them via OSS procurement events.
Today’s post takes a slightly different tack. It proposes a structural obsolescence that may lead to the death of the RFP. We might not have to kill it. It might die a natural death.
Actually, let me take that back. I’m sure RFPs won’t die out completely as a procurement technique. But I can see a time when RFPs are far less common and significantly different in nature to today’s procurement events.
That’s the answer all technologists cite to any form of problem of course. But there’s a growing trend that provides a portent to the future here.
It comes via the XaaS (As a Service) model of software delivery. We’re increasingly building and consuming cloud-native services. OSS of the future, the small-grid model, are likely to consume software as services from multiple suppliers.
And rather than having to go through a procurement event like an RFP to form each supplier contract, the small grid model will simply be a case of consuming one/many services via API contracts. The API contract (eg OpenAPI specification / swagger) will be available for the world to see. You either consume it or you don’t. No lengthy contract negotiation phase to be had.
Now as mentioned above, the RFP won’t die, but evolve. We’ll probably see more RFPs formed between customers and the services companies that will create customised OSS solutions (utilising one/many OSS supplier services). And these RFPs may not be with the massive multinational services companies of today, but increasingly through smaller niche service companies. These micro-RFPs represent the future of OSS work, the gig economy, and will surely be facilitated by smart-RFP / smart-contract models (like the OSS Justice League model).
I wonder if we’re reaching the point where “telecommunication services” is no longer a relevant term? By association, SLAs are also a bust. But what are they replaced by?
A telecommunication service used to effectively be the allocation of a carrier’s resources for use by a specific customer. Now? Well, less so
Service consumption channel alternatives are increasing, from TV and radio; to PC, to mobile, to tablet, to YouTube, to Insta, to Facebook, to a million others.
Consumption sources are even more prolific.
Customer contact channel alternatives are also increasing, from contact centres; to IVR, to online, to mobile apps, to Twitter, etc.
A service bundle often utilises third-party components, some of which are “off-net”
Virtualisation is increasingly abstracting services from specific resources. They’re now loosely coupled with resource pools and rely on high availability / elasticity to ensure customer service continuity. Not only that, but those resource pools might extend beyond the carrier’s direct control and out to cloud provider infrastructure
The growing variant-tree is taking the concept beyond the reach of “customer services” and evolves to become “customer experiences.”
The elements that made up a customer service in the past tended to fall within the locus of control of a telco and its OSS. The modern customer experience extends far beyond the control of any one company or its OSS. An SLA – Service Level Agreement – only pertains to the sub-set of an experience that can be measured by the OSS. We can aspire to offer an ELA – Experience Level Agreement – because we don’t have the mechanisms by which to measure or manage the entire experience yet.
The metrics that matter most for telcos today tend to revolve around customer experience (eg NPS). But aside from customer surveys, ratings and derived / contrived metrics, we don’t have electronic customer experience measurements.
Customer services are dead; Long live the customer experiences king… if only we can invent a way to measure the whole scope of what makes up customer experiences.
The left-hand panel of the triptych below shows the current state of interactions with most OSS. There are hundreds of variants inbound via external sources (ie multi-channel) and even internal sources (eg different service types). Similarly, there are dozens of networks (and downstream systems), each with different interface models. Each needs different formatting and integration costs escalate.
The intent model of network provisioning standardises the network interface, drastically simplifying the task of the OSS and the variants required for it to handle. This becomes particularly relevant in a world of NFVs, where it doesn’t matter which vendor’s device type (router say) can be handled via a single command intent rather than having separate interfaces to each different vendor’s device / EMS northbound interface. The unique aspects of each vendor’s implementation are abstracted from the OSS.
The next step would be in standardising the interface / data model upstream of the OSS. That’s a more challenging task!!
“ONAP provides a comprehensive platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.
By unifying member resources, ONAP is accelerating the development of a vibrant ecosystem around a globally shared architecture and implementation for network automation–with an open standards focus–faster than any one product could on its own.”
Part of the ONAP charter from onap.org.
The ONAP project is gaining attention in service provider circles. The Steering Committee of the ONAP project hints at the types of organisations investing in the project. The statement above summarises the mission of this important project. You can bet that the mission has been carefully crafted. As such, one can assume that it represents what these important stakeholders jointly agree to be the future needs of their OSS.
I find it interesting that there are quite a few technical terms (eg policy-driven orchestration) in the mission statement, terms that tend to pre-empt the solution. However, I don’t feel that pre-emptive technical solutions are the real mission, so I’m going to try to reverse-engineer the statement into business needs. Hopefully the business needs (the “why? why? why?” column below) articulates a set of questions / needs that all OSS can work to, as opposed to replicating the technical approach that underpins ONAP.
Why? Why? Why?
The ability to make instantaneous decisions
Why1: To adapt to changing conditions
Why2: To take advantage of fleeting opportunities or resolve threats
Why 3: To optimise key business metrics such as financials
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics
To use policies to increase the repeatability of key operational processes
Why 1: Repeatability provides the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics
To use policies to increase the amount of automation that can be applied to key operational processes
Why 1: Automated processes provide the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions
physical and virtual network functions
Our networks will continue to consist of physical devices, but we will increasingly introduce virtualised functionality
Why 1: Physical devices will continue to exist into the foreseeable future but virtualisation represents an exciting approach into the future
Why 2: Virtual entities are easier to activate and manage (assuming sufficient capacity exists)
Why 3: Physical equipment supply, build, deploy and test cycles are much longer and labour intensive
Why 4: Virtual assets are more flexible, faster and cheaper to commission
Why 5: Customer services can be turned up faster and cheaper
software, network, IT and cloud providers and developers
With this increase in virtualisation, we find an increasingly large and diverse array of suppliers contributing to our value-chain. These suppliers contribute via software, network equipment, IT functions and cloud resources
Why 1: CSPs can access innovation and efficiency occurring outside their own organisation
Why 2: CSPs can leverage the opportunities those innovations provide
Why 3: CSPs can deliver more attractive offers to customers
Why 4: Key metrics such as profitability and customer satisfaction are enhanced
rapidly automate new services
We want the flexibility to introduce new products and services far faster than we do today
Why 1: CSPs can deliver more attractive offers to customers faster than competitors
Why 2: Key metrics such as market share, profitability and customer satisfaction are enhanced as well as improved cashflow
support complete lifecycle management
The components that make up our value-chain are changing and evolving so quickly that we need to cope with these changes without impacting customers across any of their interactions with their service
Why 1: Customer satisfaction is a key metric and a customer’s experience spans the entire lifecyle of their service.
Why 2: CSPs don’t want customers to churn to competitors
Why 3: Key metrics such as market share, profitability and customer satisfaction are enhanced
unifying member resources
To reduce the amount of duplicated and under-synchronised development currently being done by the member bodies of ONAP
Why 1: Collaboration and sharing reduces the effort each member body must dedicate to their OSS
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving a required level of outcome from OSS
To increase the level of supplier interchangability
Why 1: To reduce dependence on any supplier/s
Why 2: To improve competition between suppliers
Why 3: Lower prices, greater choice and greater innovation tend to flourish in competitive environments
Why 4: CSPs, as customers of the suppliers, benefit
globally shared architecture
To make networks, services and support systems easier to interconnect across the global communications network
Why 1: Collaboration on common standards reduces the integration effort between each member at points of interconnect
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving interconnection benefits
As indicated in earlier posts, ONAP is an exciting initiative for the CSP industry for a number of reasons. My fear for ONAP is that it becomes such a behemoth of technical complexity that it becomes too unwieldy for use by any of the member bodies. I use the analogy of ATM versus Ethernet here, where ONAP is equivalent to ATM in power and complexity. The question is whether there’s an Ethernet answer to the whys that ONAP is trying to solve.
I’d love to hear your thoughts.
(BTW. I’m not saying that the technologies the ONAP team is investigating are the wrong ones. Far from it. I just find it interesting that the mission is starting with a technical direction in mind. I see parallels with the OSS radar analogy.)
“If your partners don’t have to talk to you then you win.”
Put another way, the best form of customer service is no customer service (ie your customers and/or partners are so delighted with your automated offerings that they have no reason to contact you). They don’t want to contact you anyway (generally speaking). They just want to consume a perfectly functional and reliable solution.
In the deep, distant past, our comms networks required operators. But then we developed automated dialling / switching. In theory, the network looked after itself and people made billions of calls per year unassisted.
Something happened in the meantime though. Telco operators the world over started receiving lots of calls about their platform and products. You could say that they’re unwanted calls. The telcos even have an acronym called CVR – Call Volume Reduction – that describes their ambitions to reduce the number of customer calls that reach contact centre agents. Tools such as chatbots and IVR have sprung up to reduce the number of calls that an operator fields.
Network as a Service (NaaS), the context within Guy’s comment above, represents the next new tool that will aim to drive CVR (amongst a raft of other benefits). NaaS theoretically allows customers to interact with network operators via impersonal contracts (in the form of APIs). The challenge will be in the reliability – ensuring that nothing falls between the cracks in any of the layers / platforms that combine to form the NaaS.
In the world of NaaS creation, Guy is exactly right – “If your partners [and customers] don’t have to talk to you then you win.” As always, it’s complexity that leads to gaps. The more complex the NaaS stack, the less likely you are to achieve CVR.
The diagram below comes from a presentation by Corey Clinger. It describes Telstra’s Operational Domain Manager (ODM) model that is a key component of their Network as a Service (NaaS) framework. Notice the API stubs across the top of the ODM? Corey went on to describe the TM Forum Open API model that Telstra is building upon.
In a following session, Raman Balla indicated an perspective that differs from many existing OSS. The service owner (and service consumer) must know all aspects of a given service (including all dimensions, lifecycle, etc) in a common repository / catalog and it needs to be attribute-based. Raman also indicated that the aim he has for architecting NaaS is to not only standardise the service, but the entire experience around the service.
In the world of NaaS, operators can no longer just focus separately on assurance or fulfillment or inventory / capacity, etc. As per DevOps, operators are accountable for everything.
“One business customer, for example, may require ultra-reliable services, whereas other business customers may need ultra-high-bandwidth communication or extremely low latency. The 5G network needs to be designed to be able to offer a different mix of capabilities to meet all these diverse requirements at the same time.
From a functional point of view, the most logical approach is to build a set of dedicated networks each adapted to serve one type of business customer. These dedicated networks would permit the implementation of tailor-made functionality and network operation specific to the needs of each business customer, rather than a one-size-fits-all approach as witnessed in the current and previous mobile generations which would not be economically viable.
A much more efficient approach is to operate multiple dedicated networks on a common platform: this is effectively what “network slicing” allows. Network slicing is the embodiment of the concept of running multiple logical networks as virtually independent business operations on a common physical infrastructure in an efficient and economical way..”
GSMA’s Introduction to Network Slicing.
Engineering a network is one of compromises. There are many different optimisation levers to pull to engineer a set of network characteristics. In the traditional network, it was a case of pulling all the levers to find a middle-ground set of characteristics that supported all their service offerings.
QoS striping of traffic allowed for a level of differentiation of traffic handling, but the underlying network was still a balancing act of settings. Network virtualisation offers new opportunities. It allows unique segmentation via virtual networks, where each can be optimised for the specific use-cases of that network slice.
For years, I’ve been posing the concept of telco offerings being like electricity networks – that we don’t need so many service variants. I should note that this analogy is not quite right. We do have a few different types of “electricity” such as highly available (health monitoring), high-bandwidth (content streaming), extremely low latency (rapid reaction scenarios such as real-time sensor networks), etc.
Now what do we need to implement and manage all these network slices?? Oh that’s right, OSS! It’s our OSS that will help to efficiently coordinate all the slicing and dicing that’s coming our way… to optimise all the levers across all the different network slices!
As an implementer of OSS, what’s the single factor that makes it challenging for us to deliver on any of the three constraints of project delivery? Complexity. Or put another way, variants. The more variants, the less chance we have of delivering on time, cost or functionality.
So let me ask you, is our next evolution simpler? No, actually. At least, it doesn’t seem so to me.
For all their many benefits, are virtualised networks simpler? We can apply abstractions to give a simpler view to higher layers in the stack, but we’ve actually only introduced more layers. Virtualisation will also bring an even higher volume of devices, transactions, etc to monitor, so we’re going to have to develop complex ways of managing these factors in cohorts.
We’re big on automations to simplify the roles of operators. But automations don’t make the task simpler for OSS implementers. Once we build a whole bunch of complex automations it might give the appearance of being simpler. But under the hood, it’s not. There are actually more moving parts.
Are we making it simpler through repetition across the industry? Nope, with the proliferation of options we’re getting more diverse. For example, back in the day, we only had a small number of database options to store our OSS data in (I won’t mention the names, I’m sure you know them!). But what about today? We have relational databases of course, but also have so many more options. What about virtualisation options? Mediation / messaging options? Programming languages? Presentation / reporting options? The list goes on. Each different OSS uses a different suite of tools, meaning less standardisation.
Our OSS lives seem to be getting harder by the day!