OSS that repair virtualised networks – the dual loop approach

In a recent article, we talked about Network Service Assurance (NSA) in an environment where network virtualisation exists.

One of the benefits of virtualisation or NaaS (Network as a Service) is that it provides a layer of programmability to your network. That is, to be able to instantiate network services by software through a network API. Virtualisation also tends to assume/imply that there is a huge amount of available capacity (the resource pool) that it can shift workloads between. If one virtual service instance dies or deteriorates, then just automatically spin up another. If one route goes down, customer services are automatically re-directed via alternate routes and the service is maintained. No problem…

But there are some problems that can’t be solved in software. You can’t just use software to fix a cable that’s been cut by an excavator. You can’t just use software to fix failed electronics. Modern virtualised networks can do a great job of self-healing, routing around the problem areas. But there are still physical failures that need to be repaired / replaced / maintained by a field workforce. NSA doesn’t tend to cover that.

Looking at the diagram below, NSA does a great job of the closed-loop assurance within the red circle. But it then needs to kick out to the green closed-loop assurance processes that are already driven by our OSS/BSS.

As described in the link above, “Perhaps if the NSA was just assuring the yellow cloud/s, any time it identifies any physical degradation / failure in the resource pool, it kicks a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just like it already does today. The additional benefit of this two-tiered assurance approach is that NSA can handle the NFV / VNF world, whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).

Therefore, a key part of the NSA process is how it kicks up from closed-loop 1 to closed-loop 2. Then, after closed-loop 2 has repaired the physical problem, NSA needs to be aware that the repaired resource is now back in the pool of available resources. Does your NSA automatically notice this, or must it receive a notification from closed loop 2?

It could be as simple as NSA sending alarms into the alarm list with a clearly articulate root-cause. The alarm has a ticket/s raised against it. The ticket triggers the field workforce to rectify it and the triggers customer assurance teams/tools to send notifications to impacted customers (if indeed they send notifications to customers who may not actually be effected yet due to the resilience measures that have kicked in). Standard OSS/BSS practice!

OSS change…. but not too much… oh no…..

Let me start today with a question:
Does your future OSS/BSS need to be drastically different to what it is today?

Please leave me a comment below, answering yes or no.

I’m going to take a guess that most OSS/BSS experts will answer yes to this question, that our future OSS/BSS will change significantly. It’s the reason I wrote the OSS Call for Innovation manifesto some time back. As great as our OSS/BSS are, there’s still so much need for improvement.

But big improvement needs big change. And big change is scary, as Tom Nolle points out:
IT vendors, like most vendors, recognize that too much revolution doesn’t sell. You have to creep up on change, get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening.”

Do you feel like we’re already in the midst of a revolution? Cloud computing, web-scaling and virtualisation (of IT and networks) have been partly responsible for it. Agile and continuous integration/delivery models too.

The following diagram shows a “from the moon” level view of how I approach (almost) any new project.

The key to Tom’s quote above is in step 2. Just how far, or how ambitious, into the future are you projecting your required change? Do you even know what that future will look like? After all, the environment we’re operating within is changing so fast. That’s why Tom is suggesting that for many of us, step 2 is just a “creep up on it change.” The gap is essentially small.

The “creep up on it change” means just adding a few new relatively meaningless features at the end of the long tail of functionality. That’s because we’ve already had the most meaningful functionality in our OSS/BSS for decades (eg customer management, product / catalog management, service management, service activation, network / service health management, inventory / resource management, partner management, workforce management, etc). We’ve had the functionality, but that doesn’t mean we’ve perfected the cost or process efficiency of using it.

So let’s say we look at step 2 with a slightly different mindset. Let’s say we don’t try to add any new functionality. We lock that down to what we already have. Instead we do re-factoring and try to pull the efficiency levers, which means changes to:

  1. Platforms (eg cloud computing, web-scaling and virtualisation as well as associated management applications)
  2. Methodologies (eg Agile, DevOps, CI/CD, noting of course that they’re more than just methodologies, but also come with tools, etc)
  3. Process (eg User Experience / User Interfaces [UX/UI], supply chain, business process re-invention, machine-led automations, etc)

It’s harder for most people to visualise what the Step 2 Future State looks like. And if it’s harder to envisage Step 2, how do we then move onto Steps 3 and 4 with confidence?

This is the challenge for OSS/BSS vendors, supplier, integrators and implementers. How do we, “get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening?” And I should point out, that it’s not just buyers we need to get disconnected from the comfortable past, but ourselves, myself definitely included.

In an OSS, what are O2A, T2R, U2C, P2O and DBA?

Let’s start with the last one first – DBA.

In the context of OSS/BSS, DBA has multiple meanings but I think the most relevant is Death By Acronym (don’t worry all you Database Administrators out there, I haven’t forgotten about you). Our industry is awash with TLAs (Three-Letter Acronyms) that lead to DBA.

Having said that, today’s article is about four that are commonly used in relation to end to end workflows through our OSS/BSS stacks. They often traverse different products, possibly even multiple different vendors’ products. They are as follows:

  • P2O – Prospect to Order – This workflow operates across the boundary between the customer and the customer-facing staff at the service provider. It allows staff to check what products can be offered to a customer. This includes service qualification (SQ), feasibility checks, then design, assign and reserve resources.
  • O2A – Order to Activate – This workflow includes all activities to manage customer services across entire life-cycles. That is, not just the initial activation of a service, but in-flight changes during activation and post-activation changes as well
  • U2C – Usage to Cash – This workflow allows customers or staff to evaluate the usage or consumption of a service (or services) that has already been activated for a customer
  • T2R – Trouble to Resolve – This “workflow” is more like a bundle of workflows that relate to assuring health of the services (and the network that carries them). They can be categorised as reactive (ie a customer triggers a resolution workflow by flagging an issue to the service provider) or a proactive (ie the service provider identifies and issue, degradation or potential for issue and triggers a resolution workflow internally)

If you’re interested in seeing how these workflows relate to the TM Forum APIs and specifically to NaaS (Network as a Service) designs, there’s a great document (TMF 909A v1.5) that can be found at the provided link. It shows the sub-elements (and associated APIs) that each of these workflows rely on.

PS. I recently read a vendor document that described additional flows:- I2I (Idea to Implementation – service onboarding, through a catalog presumably), P2P (Plan to Production – resource provisioning) and O2S (Order to Service). There’s also C2M (Concept to Market), L2C (Lead to Cash) and I’m sure I’m forgetting a number of others. Are there any additional TLAs that I should be listing here to describe end-to-end workflows?

Cool new feature – An OSS masquerading as…

I spent some time with a client going through their OSS/BSS yesterday. They’re an Australian telco with a primarily home-grown, browser-based OSS/BSS. One of its features was something I’ve never seen in an OSS/BSS before. But really quite subtle and cool.

They have four tiers of users:

  1. Super-admins (the carrier’s in-house admins),
  2. Standard (their in-house users),
  3. Partners (they use many channel partners to sell their services),
  4. Customer (the end-users of the carrier’s services).

All users have access to the same OSS/BSS, but just with different levels of functionality / visibility, of course.

Anyway, the feature that I thought was really cool was that the super-admins have access to what they call the masquerade function. It allows them to masquerade as any other user on the system without having to log-out / login to other accounts. This allows them to see exactly what each user is seeing and experience exactly what they’re experiencing (notwithstanding any platform or network access differences such as different browsers, response times, etc).

This is clearly helpful for issue resolution, but I feel it’s even more helpful for design, feature release and testing across different personas.

In my experience at least, OSS/BSS builders tend to focus on a primary persona (eg the end-user) and can overlook multi-persona design and testing. The masquerade function can make this task easier.

Network slicing and a seismic shift in OSS responsibility

Network slicing allows operators to segment their network and configure each different slice to the specific needs of that customer (or group of customers). So rather than the network infrastructure being configured for the best compromise that suits all use-cases, instead each slice can be configured optimally for each use-case. That’s an exciting concept.

The big potential roadblock however, falls almost entirely on our OSS/BSS. If our operational tools require significant manual intervention on just one network now, then what chance do operators have of efficiently looking after many networks (ie all the slices).

This article describes the level of operational efficiency / automation required to make network slicing cost effective. It clearly shows that we’ll have to deliver massive sophistication in our OSS/BSS to handle automation, not to mention the huge number of variants we’d have to cope with across all the slices. If that’s the case, network slicing isn’t going to be viable any time soon.

But something just dawned on me today. I was assuming that the onus for managing each slice would fall on the network operator. What if we take the approach that telcos use with security on network pipes instead? That is, the telco shifts the onus of security onto their customer (in most cases). They provide a dumb pipe and ask the customer to manage their own security mechanisms (eg firewalls) on the end.

In the case of network slicing, operators just provide “dumb slices.” The operator assumes responsibility for providing the network resource pool (VNFs – Virtual Network Functions) and the automation of slice management including fulfilment (ie adds, modifies, deletes, holds, etc) and assurance. But the customers take responsibility for actually managing their network (slice) with their own OSS/BSS (which they probably already have a suite of anyway).

This approach doesn’t seem to require the same level of sophistication. The main impacts I see (and I’m probably overlooking plenty of others) are:

    1. There’s a new class of OSS/BSS required by the operators, that of automated slice management
    2. The customers already have their own OSS/BSS, but they currently tend to focus on monitoring, ticketing, escalations, etc. Their new customer OSS/BSS would need to take more responsibility for provisioning, including traffic engineering
    3. And I’d expect that to support customer-driven provisioning, the operators would probably need to provide ways for customers to programmatically interface with the network resources that make up their slice. That is, operators would need to offer network APIs or NaaS to their customers externally, not just for internal purposes
    4. Determining the optimal slice model. For example, does the carrier offer:
      1. A small number of slice types (eg video, IoT low latency, IoT low chat, etc), where each slice caters for a category of customers, but with many slice instances (one for each customer)
      2. A small number of slice instances, where all customers in that category share the single slice
      3. Customised slices for premium customers
      4. A mix of the above

.In the meantime, changes could be made as they have in the past, via customer portals, etc.

Thoughts?

Network Service Assurance has new meaning

Back in the old days, Network Service Assurance probably had a different meaning than it might today.

Clearly it’s assurance of a network service. That’s fairly obvious. But it’s in the definition of “network service” where the old and new terminologies have the potential to diverge.

In years past, telco networks were “nailed up” and network functions were physical appliances. I would’ve implied (probably incorrectly, but bear with me) that a “network service” was “owned” by the carrier and was something like a bearer circuit  (as distinct from a customer service or customer circuit). Those bearer circuits, using protocols such as in DWDM, SDH, SONET, ATM, etc potentially carried lots of customer circuits so they were definitely worth assuring. And in those nailed-up networks, we knew exactly which network appliances / resources / bearers were being utilised. This simplified service impact analysis (SIA) and allowed targeted fault-fix.

In those networks the OSS/BSS was generally able to establish a clear line of association from customer service to physical resources as per the TMN pyramid below. Yes, some abstraction happened as information permeated up the stack, but awareness of connectivity and resource utilisation was generally retained end-to-end (E2E).
OSS abstract and connect

But in the more modern computer or virtualised network, it all goes a bit haywire, perhaps starting right back at the definition of a network service.

The modern “network service” is more aligned to ETSI’s NFV definition – “a composition of network functions and defined by its functional and behavioral specification. The Network Service contributes to the behaviour of the higher layer service, which is characterised by at least performance, dependability, and security specifications. The end-to-end network service behaviour is the result of a combination of the individual network function behaviours as well as the behaviours of the network infrastructure composition mechanism.”

They are applications running at OSI’s application layer that can be consumed by other applications. These network services include DNS, DHCP, VoIP, etc, but the concept of NaaS (Network as a Service) expands the possibilities further.

So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the physical layer, other than to say the customer services consume from a pool of resources (the yellow cloud below). Assurance becomes more disconnected as a result.

BSS OSS cloud abstract

OSS/BSS are able to tie customer services to pools of resources (the yellow cloud). And OSS/BSS tools also include PNI / WFM (Physical Network Inventory / Workforce Management) to manage the bottom, physical layer. But now there’s potentially an opaque gulf in the middle where virtualisation / NaaS exists.

The end-to-end association between customer services and the physical resources that carry them is lost. Unless we can find a way to establish E2E association, we just have to hope that our modern Network Service Assurance (NSA) tools make the yellow cloud robust to the point of infallibility. BTW. If the yellow cloud includes NaaS, then the NSA has to assure the NaaS gateway, catalog and all services instantiated through the gateway.

But as we know, there will always be failures in physical infrastructure (cable cuts, electronic malfunctions, etc). The individual resources can’t afford to be infallible, even if the resource pool seeks to provide collective resiliency.

Modern NSA has to find a way to manage the resource pool but also coordinate fault-fix in the physical resources that underpin it like the OSS used to do (still do??). They have to do more than just build policies and actions to ensure SLAs don’t they? They can seek to manage security, power, performance, utilisation and more. Unfortunately, not everything can be fixed programmatically, although that is a great place for NSA to start.

Perhaps if the NSA was just assuring the yellow cloud, any time it identifies any physical degradation / failure in the resource pool, it kicks a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just like it already does today. The additional benefit of this two-tiered assurance approach is that NSA can handle the NFV / VNF world, whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).

I’d love to hear your thoughts. Hopefully you can even correct me if/where I’m wrong.

Two concepts to help ease long-standing OSS problems

There’s a famous Zig Ziglar quote that goes something like, “You can have everything in life you want, if you will just help enough other people get what they want.”

You could safely assume that this was written for the individual reader, but there is some truth in it within the OSS context too. For the OSS designer, builder, integrator, does the statement “You can have everything in your OSS you want, if you will just help enough other people get what they want,” apply?

We often just think about the O in OSS – Operations people, when looking for who to help. But OSS/BSS has the ability to impact far wider than just the Ops team/s.

The halcyon days of OSS were probably in the 1990’s to early 2000’s when the term OSS/BSS was at its most sexy and exciting. The big telcos were excitedly spending hundreds of millions of dollars. Those projects were huge… and hugely complex… and hugely fun!

With that level of investment, there was the expectation that the OSS/BSS would help many people. And they did. But the lustre has come off somewhat since then. We’ve helped sooooo many people, but perhaps didn’t help enough people enough. Just speak with anybody involved with an OSS/BSS stack and you’ll hear hints of a large gap that exists between their current state and a desired future state.

Do you mind if I ask two questions?

  1. When you reflect on your OSS activities, do you focus on the technology, the opportunities or the problems
  2. Do you look at the local, day-to-day activities or the broader industry

I tend to find myself focusing on the problems – how to solve them within the daily context on customer challenges, but the broader industry problems when I take the time to reflect, such as writing these blogs.

The part I find interesting is that we still face most of the same problems today that we did back in the 1990’s-2000’s. The same source of risks. We’ve done a fantastic job of helping many people get what they want on their day-to-day activities (the incremental). We still haven’t cracked the big challenges though. That’s why I wrote the OSS Call for Innovation, to articulate what lays ahead of us.

It’s why I’m really excited about two of the concepts we’ve discussed this week:

Auto-releasing chaos monkeys to harden your network (CT/IR)

In earlier posts, we’ve talked about using Netflix’s chaos monkey approach as a way of getting to Zero Touch Assurance (ZTA). The chaos monkeys intentionally trigger faults in the network as a means of ensuring resilience. Not just for known degradation / outage events, but to unknown events too.

I’d like to introduce the concept of CT/IR – Continual Test / Incremental Resilience. Analogous to CI/CD (Continuous Integration / Continuous Delivery) before it, CT/IR is a method to systematically and programmatically test the resilience of the network, then ensuring resilience is continually improving.

The continual, incremental improvement in resiliency potentially comes via multiple feedback loops:

  1. Ideally, the existing resilience mechanisms work around or overcome any degradation or failure in the network
  2. The continual triggering of faults into the network will provide additional seed data for AI/ML tools to learn from and improve upon, especially root-cause analysis (noting that in the case of CT/IR, the root-cause is certain – we KNOW the cause – because we triggered it – rather than reverse engineering what the cause may have been)
  3. We can program the network to overcome the problem (eg turn up extra capacity, re-engineer traffic flows, change configurations, etc). Having the NaaS that we spoke about yesterday, provides greater programmability for the network by the way.
  4. We can implement systematic programs / projects to fix endemic faults or weak spots in the network *
  5. Perform regression tests to constantly stress-test the network as it evolves through network augmentation, new device types, etc

Now, you may argue that no carrier in their right mind will allow intentional faults to be triggered. So that’s where we unleash the chaos monkeys on our digital twin technology and/or PSUP (Production Support) environments at first. Then on our prod network if we develop enough trust in it.

I live in Australia, which suffers from severe bushfires every summer. Our fire-fighters spend a lot of time back-burning during the cooler months to reduce flammable material and therefore the severity of summer fires. Occasionally the back-burns get out of control, causing problems. But they’re still done for the greater good. The same principle could apply to unleashing chaos monkeys on a production network… once you’re confident in your ability to control the problems that might follow.

* When I say network, I’m also referring to the physical and logical network, but also support functions such as EMS (Element Management Systems), NCM (Network Configuration Management tools), backup/restore mechanisms, service order replay processes in the event of an outage, OSS/BSS, NaaS, etc.

Who attended TM Forum’s DTW event in Nice this week?

Some of the world’s leading OSS thinkers met at TM Forum’s Digital Transformation World in Nice, France, this week.

Were you one of them?

I was unable to attend due to a date with a surgeon (no, not the romantic kind of date). If you attended, I’d love to live vicariously through you and hear your key takeaways from the event.

Please leave us a comment below, either with your thoughts or via links to blogs/articles you’ve authored!

NaaS is to networks what Agile is to software

After Telstra’s NaaS (Network as a Service) program won a TM Forum excellence award, I promised yesterday to share a post that describes why I’m so excited about the concept of NaaS.

As the title suggests above, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.

There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.

The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.

But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.

Networks become Agile with NaaS (the TMN model)

As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:

  • Order Entry, Order Management
  • Product Catalog (Product / Offer Management)
  • Service Management
  • SLA (Service Level Agreement) Management
  • Billing
  • Problem Management
  • Customer Management
  • Partner Management
  • etc

If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).

For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require formation of a project team that includes multiple business units (eg products, marketing, IT, networks, change management to support all the workers impacted by system/process change, etc)

  1. A new product or product bundle is to be taken to market
  2. An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
  3. A new network type is added into the network
  4. System and / or process transformations occur in the IT stack

If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.

Networks become Agile with NaaS (the virtualisation model)

We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact it now becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently. Each has their own nuances in terms of over-arching management.

The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.

The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].

NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples

  • The products / marketing teams (who generate customer offerings / bundles) from
  • The networks / operations teams (who design, build and maintain the network).and
  • The IT teams (who design, build and maintain the IT stack)

It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.

You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.

Networks become Agile with NaaS (the NaaS model)

You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.

The NaaS layer comprises:

  • A TMF standards-based API Gateway
  • A Master Services Catalog
  • A common / consistent framework of presentation of all domains

The ramifications of this excites me even more that what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Networks become Agile with NaaS (the developer model)

Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.

You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.

Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.

If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.

Agree or disagree? Leave me a comment below.

PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.

PS2. I use the terms OSS/BSS as per TMN pyramid. The actual demarcation line between what OSS and BSS does tend to be grey and trigger religious wars, as per the post earlier this week.

PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in it’s own right. In reality, it is actually a much thinner shim layer architecturally

PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together

PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.

TM Forum announces 2019 award winners

On Monday evening in Nice, TM Forum announced the list of 12 Excellence Awards for 2019. They also anointed Distinguished Fellow and Distinguished Engineer status on two of the industry’s leading contributors.

Huge congratulations to each of the following award winners:

  1. Business Transformation – Royal KPN, Vlocity and Salesforce
  2. IT Transformation – China Mobile & Huawei
  3. Operational Transformation & Agility – Netcracker
  4. Network Transformation – Telstra
  5. Digital Service Innovator of the Year – Orange
  6. Open Digital Ecosystem Platform of the Year – TELUS
  7. Disruptive Innovation – Etiya & Fizz, Zeotap
  8. Outstanding Customer Centricity – Whale Cloud & China Telecom
  9. Open API – DGiT, Netcracker
  10. CIO of the Year – Cody Sanford, T-Mobile USA
  11. CTO of the Year – Giovanni Chiarelli, MTN South Africa
  12. Future Digital Leader – Ye Ouyang, AsiaInfo
  13. Distinguished Fellow – Dr. Lester Thomas, Vodafone
  14. Distinguished Engineer – Takayuki Nakamura, NTT Group

I’m most excited about number 4 on the list, partly because I was involved in the early concept / PoC stages of Telstra’s NaaS (Network as a Service) project and because of what NaaS represents to our industry [more on that in tomorrow’s post]. I’m also excited to see a little Australian company, DGiT, appearing amongst a list that’s mostly made up of industry heavyweights.

Where does BSS end and OSS begin?

Over the years, I’ve been asked the question many times, “what’s the difference between OSS (Operational Support Systems) and BSS (Business Support Systems)?” I’ve also been asked, albeit slightly less regularly, how OSS and BSS map to TM Forum standards like the TAM and eTOM.

To my knowledge, TM Forum has never attempted to map OSS vs BSS. It sets off too many religious wars.

Just for fun, I thought I’d have a crack at trying to map OSS and BSS onto the TAM. Click on the image for a larger PDF version.

OSS and BSS overlaid onto the TAM

I’ve taken the perspective that customer or business-facing functionality is generally considered to be BSS. Alternatively, network / operations-facing functionality is generally considered to be OSS.
And these two tend to overlap at the service layer.

Or, you could just simply call them business operations systems (BOS) that cover the entire TAM estate.

What do you think? Does it trigger a religious war for you? Comments welcomed below.

FWIW. I come from an era when my “OSS” tools had a lot of functionality that could arguably be classified as BSS-centric (eg product management, customer relationship management, service order entry, etc). They also happened to deliver functionality that others might classify as NMS or EMS (Network Management System or Element Management System) in nature. In my mind, they’ve always just been software that supports operationalisation of a network, whether customer or network/resource-facing. It’s one of the reasons this site is called Passionate About OSS, not Passionate About OSS/BSS/NMS/EMS.

Top 10 most common OSS project risks

OSS projects are full of risks we all know it. OSS projects have “earned” a bad name because of all those risks. On the other side of that same coin, OSS projects disappoint, in part I suspect because stakeholders expect such big things from their resource investments.

Ask anyone familiar with OSS projects and you’ll be sure to hear a long list of failings.

For those less familiar with what an OSS project has in store for you, I’d like to share a list of the most common risks I’ve seen on OSS projects.

Most people working in the OSS industry are technology-centric, so they’ll tend to cite risks that relate to the tech. That’s where I used to focus attention too. Now technology risk definitely exists, but as you’ll see below, I tend to start by looking at other risk factors first these days.

Most common OSS project risks / issues:

  1. Complexity (to be honest, this is probably more the root-cause / issue that manifests as many of the following risks). However, complexity across many aspects of OSS projects is one of the biggest problem sources
  2. Change ManagementOSS tend to introduce significant change to an organisation – operationally, organisationally, processes, training, etc. This is probably the most regularly underestimated component of any large OSS build
  3. Stakeholder Support / Politics – Challenges appear on every single OSS project. They invariably need strong support from stakeholders and sponsors to clear a path through the biggest challenges. If the project’s leaders aren’t fully committed and in unison, the delivery teams will be heavily constrained
  4. Ill-defined Scope – Over-scoping, scope omission and scope creep all represent risks to an OSS project. But scope is never perfectly defined or static, so scope management mechanisms need to be developed up-front rather than in-flight. Tying back to point 1 above, complexity minimisation should be a key element of scope planning. To hark back to my motto for OSS, “just because we can, doesn’t mean we should)
  5. Financial and commercial – As with scope, it’s virtually impossible to plan an OSS project to perfection. There are always unknowns.These unknowns can directly impact the original estimates. Projects with blow-outs and no contingency for time or money increase pressure on point 3 (stakeholders/sponsors) to maintain their support
  6. Client resource skills / availability – An OSS has to be built to the needs of a client. If the client is unable to provide resources to steer the implementation, then it’s unlikely for the client to get a solution that is perfectly adapted to the client’s needs. One challenge for the client is that their most valuable guides, those with the client’s tribal knowledge, are also generally in high demand by “business as usual” teams. It becomes a challenge to allocate enough of their time to guide  the OSS delivery team. Another challenge is augmenting the team with the required skill-set when a project introduces new skill requirements
  7. CommunicationOSS projects aren’t built in a vacuum. They have many project contributors and even more end-users. There are many business units that touch an OSS/BSS, each with their own jargon and interpretations.  For example, how many alternate uses of the term “service” can you think of? I think an important early-stage activity is to agree on and document naming conventions
  8. Culture – Of the client team and/or project team. Culture contributes to (or detracts from) motivation, morale, resource turnover, etc, which can have an impact on the team’s ability to deliver
  9. Design / Integration – Finally, a technology risk. This item is particularly relevant with complex projects, it can be difficult for all of the planned components to operate and integrate as planned. A commonly unrecognised risk relates to the viability of implementing a design. It’s common for an end-state design to be specified but with no way of navigating through a series of steps / phases and reach the end-state
  10. Technology – Similar to the previous point, there are many technology risks relating to items such as quality, scalability, resiliency, security, supportability, obsolescence, interoperability, etc

There’s one thing you will have probably noticed about this list. Most of the risks are common to other projects, not just OSS projects. However, the risks do tend to amplify on OSS projects because of their inherent complexity.

TM Forum’s Digital Transformation World (DTW) starts tomorrow

For those who don’t already know, one of the peak OSS industry events starts tomorrow in Nice, France. It’s known as Digital Transformation World (DTW) and is run by TM Forum.

OSS and BSS (and so much more) experts will be there, collaborating via multiple different methods – talks, presentations, demonstrations, industry proofs-of-concept, campfire events, etc.

Leave us a comment below if you’re attending and tell us what you’re most excited about.

Inverting the pyramid of OSS and network innovation

Back in the earliest days of OSS (and networks for that matter), it was the telcos that generated almost all of the innovation. That effectively limited innovation to being developed by the privileged few, those who worked for the government-owned, monopoly telcos.

But over time, the financial leaders at those telcos felt the costs of their amazing research and development labs outweighed the benefits and shut them down (or starved them at best). OSS (and network) vendors stepped into the void to assume responsibility for most of the innovation. But there was a dilemma for the vendors (and for telcos and consumers too) – they needed to innovate fast enough to win work against their competitors, but slow enough to accrue revenues from the investment in their earlier innovations. And innovation was still being constrained to the privileged few, those who worked for vendors and integrators.

Now, the telcos are increasingly pushing to innovate wider and faster than the current vendor collective can accommodate. It means we have to reach further out to the long-tail of innovators. To open the floor beyond the privileged few. Excitingly, this opportunity appears to be looming.

“How?” you may ask.

Network as a Service (NaaS) and API platform offerings.

If every telco offers consumption of their infrastructure via API, it provides the opportunity for any developer to bundle their own unique offering of products, services, applications, hosting, etc and take it to market. If you’re heading to TM Forum’s Digital Transformation World (DTW) in Nice next week, there are a number of Catalyst projects on display in this space, including:

Zero-touch partnering could make platform ‘utopia’ real for telcos

Packaging Open APIs for NaaS

The challenge for the telcos is in how to support the growth of this model. To foster the vendor market, it was easy enough for the telcos to identify the big suppliers and funnel projects (and funding) through them. But now they have to figure out a funnel that’s segmented at a much smaller scale – to facilitate take-up by the millions of developers globally who might consume their products (network APIs in this case) rather than the hundreds/thousands of large suppliers.

This brings us back to smart contracts and micro-procurement as well as the technologies such as blockchain that support these models. This ties in with another TM Forum initiative to revolutionise the procurement event:

Time to kill the RFP? Reinventing IT procurement for the 2020s: Volume 1

But an additional benefit for the telcos, if and when the NaaS platform model takes hold, is that the developers also become a unpaid salesforce for the telcos. The developers will be responsible for marketing and selling their own bundles, which will drive consumption and revenues on the telcos’ assets.

Exciting new business models and supply chains are bound to evolve out of this long tail of innovation.

Could you believe it? An OSS with less features that helps more?

All OSS products are excellent these days. And all OSS vendors know what the most important functionality is. They already have those features built into their products. That is, they’ve already added the all-important features at the left side of the graph.
Long-tail features

But it also means product teams are tending to only add the relatively unimportant new features to the right edge of the graph (ie inside the red box). Relatively unimportant and therefore delivering minimal differential advantage.

The challenge for users is that there is a huge amount of relatively worthless functionality that they have to navigate around. This tends to make the user interfaces non-intuitive.

In a previous post, we mentioned that it’s the services wrapper where OSS suppliers have the potential to differentiate.

But another approach, a product-led differentiator, dawned on me when discussing the many sources of OSS friction in yesterday’s post. What if we asked our product teams to take a focus on designing solutions that remove friction instead of the typical approach of adding features (and complexity)?

Almost every OSS I’m aware of has many areas of friction. It’s what gives the OSS industry a bad name. But what if one vendor reduced friction to levels far less than any other competitor? Would it be a differentiator? I’m quite certain customers would be lining up to buy a frictionless OSS even if it didn’t have every perceivable feature.

But can it work? What do you think?

Is your OSS squeaking like an un-oiled bearing?

Network operators spend huge amounts on building and maintaining their OSS/BSS every year. There are many reasons they invest so heavily, but in most cases it can be distilled back to one thing – improving operational efficiency.

And our OSS/BSS definitely do improve operational efficiency, but there are still so many sources of friction. They’re squeaking like un-oiled bearings. Here are just a few of the common sources:

  1. First-time Installation
  2. Identifying best-fit tools
  3. Procurement of new tools
  4. Update / release processes
  5. Continuous data quality / consistency improvement
  6. Navigating to all features through the user interface
  7. Non-intuitive functionality / processes
  8. So many variants / complexity that end-users take years to attain expert-level capability
  9. Integration / interconnect
  10. Getting new starters up to speed
  11. Getting proficient operators to expertise
  12. Unlocking actionable insights from huge data piles
  13. Resolving the root-cause of complex faults
  14. Onboarding new customers
  15. Productionising new functionality
  16. Exception and fallout handling
  17. Access to supplier expertise to resolve challenges

The list goes on far deeper than that list too. The challenge for many OSS product teams, for any number of reasons, is that their focus is on adding new features rather than reducing friction in what already exists.

The challenge for product teams is diagnosing where the friction  and risks are for their customers / stakeholders. How do you get that feedback?

  • Every vendor has a product support team, so that’s a useful place to start, both in terms of what’s generating the most support calls and in terms of first-hand feedback from customers
  • Do you hold user forums on a regular basis, where you get many of your customers together to discuss their challenges, your future roadmap, new improvements / features
  • Does your process “flow” data show where the sticking points are for operators
  • Do you conduct gemba walks with your customers
  • Do you have a program of ensuring all developers spend at least a few days a year interacting directly with customers on their site/s
  • Do you observe areas of difficulty when delivering training
  • Do you go out of your way to ask your customers / stakeholders questions that are framed around their pain-points, not just framed within the context of your existing OSS
  • Do you conduct customer surveys? More importantly, do you conduct surveys through an independent third-party?

On the last dot-point, I’ve been surprised at some of the profound insights end-users have shared with me when I’ve been conducting these reviews as the independent interviewer. I’ve tended to find answers are more open / honest when being delivered to an independent third-party than if the supplier asks directly. If you’d like assistance running a third-party review, leave us a note on the contact page. We’d be delighted to assist.

Fast and slow OSS, where uCPE and network virtualisation fits in

Yesterday’s post talked about one of the many dichotomies in OSSfast and slow data / processes.

One of the longer lead-time items in relation to OSS data and processes is in network build and customer connections. From the time when capacity planning or a customer order creates the signal to build, it can be many weeks or months before the physical infrastructure work is complete and appearing in the OSS.

There are two financial downsides to this. Firstly, it tends to be CAPEX-heavy with equipment, construction, truck-rolls, government approvals, etc burning through money. Meanwhile, it’s also a period where there is no money coming in because the services aren’t turned on yet. The time-to-cash cycle of new build (or augmentation) is the bane of all telcos.

This is one of the exciting aspects of network virtualisation for telcos. In a time where connectivity is nearly ubiquitous in most countries, often with high-speed broadband access, physical build becomes less essential (except over-builds). Technologies such as uCPE (Universal Customer Premises Equipment), NFV (Network Function Virtualisation), SD WAN (Software-Defined Wide Area Networks), SDN (Software Defined Networks) and others mean that we can remotely upgrade and reconfigure the network without field work.

Network virtualisation gives the potential to speed up many of the slowest, and costliest processes that run through our OSS… but only if our OSS can support efficient orchestration of virtualised networks. And that means having an OSS with the flexibility to easily change out slow processes to replace them with fast ones without massive overhauls.

Give me a fast OSS and I might ask you to slooooow doooown

The traditional telco (and OSS) ran at different speeds. Some tasks had to happen immediately (eg customers calling one another) while others took time (eg getting a connection to a customer’s home, which included designs, approvals, builds, etc), often weeks.

Our OSS have processes that must happen sequentially and expediently. They also have processes that must wait for dependencies, conditional events and time delays. Some roles need “fast,” others can cope with “slow.” Who wins out in this dilemma?

Even the data we rely on can transact at different speeds. For capacity planning, we’re generally interested in longer-term data. We don’t have to process at real-time. Therefore we can choose to batch process at longer cycle times and with summarised data sets. For network assurance, we’re generally interested in getting data as quick as is viable.

Today’s post is about that word, viable, and pragmatism we sometimes have to apply to our OSS.

For example, if our operations teams want to reduce network performance poll cycles from every 15 mins down to once a minute, we increase the amount of data to process by 15x. That means our data storage costs go up by 15x (assuming a flat-rate cost structure applies). The other hidden cost is that our compute and network costs also go up because we have to transfer and process 15x as much data.

The trade-off we have to make in responses to this rapid escalation of cost (when going from 15 to 1 min) is in the benefits we might derive. Can we avoid SLA (Service Level Agreement) breach costs? Can we avoid costly outages? Can we avoid damage to equipment? Can we reduce the risk of losing our carrier license?

The other question is whether our operators actually have the ability to respond to 15x as much data. Do we have enough people to respond at an increased cycle time? Do we have OSS tools that are capable of filtering what’s important and disregarding “background” activity? Do we have OSS tools that are capable of learning from every single metric (eg AI), at volumes the human brain could never cope with?

Does it make sense that we have a single platform for handling fast and slow processes? For example, do we use the same platform to process 1 minute-cycle performance data for long-term planning (batch-processed once daily) and quick-fire assurance (processed as fast as possible)?

If we stick to one platform, can our OSS apply data reduction techniques (eg selective discard of records) to get the benefits of speed, but with the cost reduction of slow?

Hansen acquires Sigma Systems

Acquisition of Sigma Systems.

Hansen Technologies Limited announced the signing of definitive agreements for the acquisition of Sigma Systems (“Sigma”). The acquisition is expected to close on 31 May 2019.

Key Points:

Founded in 1996, Sigma is a leading global provider of catalog-driven software products for telecommunications, media, and high-tech companies. Its software is designed to streamline complex product and service offerings and provide a faster path to creating, selling and delivering new digital products and services, combined and packaged with traditional core services.
It is being acquired for an enterprise value (EV) of CAD157.0m (AUD166.2m1) – which equates to an EV/EBITDA acquisition multiple of 8.3 times calendar year 2018 (CY182) normalised EBITDA3
Funding for the acquisition will be 100% provided by a new bank debt facility of AUD225m underwritten by RBC Capital Markets.
Sigma has been majority owned by private equity investor Birch Hill Equity Partners Management since 2015.
CY18 revenue was CAD73.1m (AUD75.5m4) and CY18 normalised EBITDA3 was CAD18.8m (AUD19.4m4).
CY18 revenue split was Americas 56%, EMEA 29% and APAC 15%.
Sigma has more than 70 customers with deployments in some 40 countries. Tier 1 customers include Liberty Global, Telstra, Vodafone, Inmarsat, Telkomsel, Altice, and Cox Communications.
Sigma has more than 480 staff with major offices located in Toronto, Canada (Head Office), London and Wales (UK) and Pune, India.
The strong strategic rationale for the acquisition includes:
The business is a high-quality asset – being a global leader in providing enterprise catalog-driven software products to the communications, media and high-tech sectors
It significantly expands Hansen’s scale and scope in the telecommunications sector – revenue from the telecommunications sector would have been 38% of total revenue in CY18 on a pro-forma basis if Sigma was owned during that period, compared to actual of 17%
Cross-sell opportunities exist into Hansen’s large utilities customer base by integrating the Catalog product into our energy product offerings, as well as PayTV.
The acquisition is expected to be earnings per share (EPS) accretive in FY202, excluding amortisation of acquired intangibles5.

Sigma Overview
Products
The Sigma product portfolio comprises catalog-driven software solutions that streamlines complex product and service offerings for communications, media, and high-tech companies.

The product portfolio includes:

Catalog – which provides a single source of “knowledge” for all of the service provider’s products and services, enabling the introduction and management of new and existing products and services with a single point of control, thus reducing the time-to-market for new offerings.
Configure Price Quote (CPQ) – Catalog-driven, this product applies real-time, enterprise-wide pricing structures to quote and capture orders, from standardised consumer offerings to complex tailored enterprise services.
Order Management – provides end-to-end management of an order, from when it is placed to when it is fulfilled and operational.
Portfolio Inventory – provides a single source of “knowledge” on all the products customers have ordered, the services that were activated for those products, and the resources that were provisioned for those services.
Provisioning – a network service and device activation product that manages, tracks and activates a complete range of network communication services and devices from a set of preconfigured activation solutions.
Insights – an analytics tool that provides service providers with real-time visibility of operational and sales performance at a granular level, allowing them to adjust sales strategies as necessary.
Sigma’s products enable business growth from new digital services combined and packaged with traditional core services. The product suite is highly complementary in nature and drives cross-sell expansion after initial deployment of one product. Product deployment can be either cloud or on-premise.

Sigma’s go-to-market strategy comprises a global direct sales force, combined with partnering with several systems integrators and CRM providers such as PwC, Infosys, Tech Mahindra, Microsoft and Salesforce to expand reach.

Customer Base
Sigma has a diversified revenue base with over 70 customers globally with deployments in approximately 40 countries. The average customer age is more than 8 years and includes many Tier 1 operators across the globe.

Customers include: Vodafone, Liberty Global, Telstra (Australia), Altice, Cox Communications (USA), Ziggo (Netherlands), Telkomsel (Indonesia), J:Com (Japan), Inmarsat (UK), Telmex (Mexico), Tiscali (Italy), Telus (Canada), Sky (UK), EWE TEL (Germany) and ViaSat (USA).

Financial Profile
Revenue in calendar year 2018 (CY18) was CAD73.1m (AUD75.5m4) and CY18 normalised EBITDA3 was CAD18.8m (AUD19.4m4), equating to a normalised EBITDA margin of 25.7%

Sigma has high levels of recurring revenue – derived from maintenance & support fees, periodic licence fees and managed professional services, while non-recurring revenue is derived from professional services and one-off licence fees.

CY18 revenue split was Americas 56%, EMEA 29% and APAC 15%.

Strategic Rationale
There is strong strategic rationale for the acquisition:

The business is a high-quality asset – being a global leader in providing enterprise catalog-driven software products to the communications, media and high-tech sectors
Sigma’s proprietary products sit within or adjacent to our core business of billing and customer management, and are well designed to capture growth opportunities from the rollout of new telecommunications services such as 5G
It significantly expands Hansen’s scale and scope in the telecommunications sector – revenue from the telecommunications sector would have been 38% of total revenue in CY18 on a pro-forma basis if Sigma was owned during that period, compared to actual of 17%
Cross-sell opportunities exist into Hansen’s large utilities customer base by integrating the Catalog product into our energy product offerings, as well as PayTV
Sigma further expands the depth and breadth of our global presence
It is expected to be earnings per share accretive in FY20, excluding amortisation of acquired intangibles5.