NaaS is to networks what Agile is to software

After Telstra’s NaaS (Network as a Service) program won a TM Forum excellence award, I promised yesterday to share a post that describes why I’m so excited about the concept of NaaS.

As the title suggests above, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.

There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.

The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.

But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.

Networks become Agile with NaaS (the TMN model)

As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:

  • Order Entry, Order Management
  • Product Catalog (Product / Offer Management)
  • Service Management
  • SLA (Service Level Agreement) Management
  • Billing
  • Problem Management
  • Customer Management
  • Partner Management
  • etc

If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).

For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require formation of a project team that includes multiple business units (eg products, marketing, IT, networks, change management to support all the workers impacted by system/process change, etc)

  1. A new product or product bundle is to be taken to market
  2. An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
  3. A new network type is added into the network
  4. System and / or process transformations occur in the IT stack

If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.

Networks become Agile with NaaS (the virtualisation model)

We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact it now becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently. Each has their own nuances in terms of over-arching management.

The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.

The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].

NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples

  • The products / marketing teams (who generate customer offerings / bundles) from
  • The networks / operations teams (who design, build and maintain the network).and
  • The IT teams (who design, build and maintain the IT stack)

It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.

You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.

Networks become Agile with NaaS (the NaaS model)

You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.

The NaaS layer comprises:

  • A TMF standards-based API Gateway
  • A Master Services Catalog
  • A common / consistent framework of presentation of all domains

The ramifications of this excites me even more that what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Networks become Agile with NaaS (the developer model)

Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.

You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.

Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.

If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.

Agree or disagree? Leave me a comment below.

PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.

PS2. I use the terms OSS/BSS as per TMN pyramid. The actual demarcation line between what OSS and BSS does tend to be grey and trigger religious wars, as per the post earlier this week.

PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in it’s own right. In reality, it is actually a much thinner shim layer architecturally

PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together

PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.

Top 10 most common OSS project risks

OSS projects are full of risks we all know it. OSS projects have “earned” a bad name because of all those risks. On the other side of that same coin, OSS projects disappoint, in part I suspect because stakeholders expect such big things from their resource investments.

Ask anyone familiar with OSS projects and you’ll be sure to hear a long list of failings.

For those less familiar with what an OSS project has in store for you, I’d like to share a list of the most common risks I’ve seen on OSS projects.

Most people working in the OSS industry are technology-centric, so they’ll tend to cite risks that relate to the tech. That’s where I used to focus attention too. Now technology risk definitely exists, but as you’ll see below, I tend to start by looking at other risk factors first these days.

Most common OSS project risks / issues:

  1. Complexity (to be honest, this is probably more the root-cause / issue that manifests as many of the following risks). However, complexity across many aspects of OSS projects is one of the biggest problem sources
  2. Change ManagementOSS tend to introduce significant change to an organisation – operationally, organisationally, processes, training, etc. This is probably the most regularly underestimated component of any large OSS build
  3. Stakeholder Support / Politics – Challenges appear on every single OSS project. They invariably need strong support from stakeholders and sponsors to clear a path through the biggest challenges. If the project’s leaders aren’t fully committed and in unison, the delivery teams will be heavily constrained
  4. Ill-defined Scope – Over-scoping, scope omission and scope creep all represent risks to an OSS project. But scope is never perfectly defined or static, so scope management mechanisms need to be developed up-front rather than in-flight. Tying back to point 1 above, complexity minimisation should be a key element of scope planning. To hark back to my motto for OSS, “just because we can, doesn’t mean we should)
  5. Financial and commercial – As with scope, it’s virtually impossible to plan an OSS project to perfection. There are always unknowns.These unknowns can directly impact the original estimates. Projects with blow-outs and no contingency for time or money increase pressure on point 3 (stakeholders/sponsors) to maintain their support
  6. Client resource skills / availability – An OSS has to be built to the needs of a client. If the client is unable to provide resources to steer the implementation, then it’s unlikely for the client to get a solution that is perfectly adapted to the client’s needs. One challenge for the client is that their most valuable guides, those with the client’s tribal knowledge, are also generally in high demand by “business as usual” teams. It becomes a challenge to allocate enough of their time to guide  the OSS delivery team. Another challenge is augmenting the team with the required skill-set when a project introduces new skill requirements
  7. CommunicationOSS projects aren’t built in a vacuum. They have many project contributors and even more end-users. There are many business units that touch an OSS/BSS, each with their own jargon and interpretations.  For example, how many alternate uses of the term “service” can you think of? I think an important early-stage activity is to agree on and document naming conventions
  8. Culture – Of the client team and/or project team. Culture contributes to (or detracts from) motivation, morale, resource turnover, etc, which can have an impact on the team’s ability to deliver
  9. Design / Integration – Finally, a technology risk. This item is particularly relevant with complex projects, it can be difficult for all of the planned components to operate and integrate as planned. A commonly unrecognised risk relates to the viability of implementing a design. It’s common for an end-state design to be specified but with no way of navigating through a series of steps / phases and reach the end-state
  10. Technology – Similar to the previous point, there are many technology risks relating to items such as quality, scalability, resiliency, security, supportability, obsolescence, interoperability, etc

There’s one thing you will have probably noticed about this list. Most of the risks are common to other projects, not just OSS projects. However, the risks do tend to amplify on OSS projects because of their inherent complexity.

Is your OSS squeaking like an un-oiled bearing?

Network operators spend huge amounts on building and maintaining their OSS/BSS every year. There are many reasons they invest so heavily, but in most cases it can be distilled back to one thing – improving operational efficiency.

And our OSS/BSS definitely do improve operational efficiency, but there are still so many sources of friction. They’re squeaking like un-oiled bearings. Here are just a few of the common sources:

  1. First-time Installation
  2. Identifying best-fit tools
  3. Procurement of new tools
  4. Update / release processes
  5. Continuous data quality / consistency improvement
  6. Navigating to all features through the user interface
  7. Non-intuitive functionality / processes
  8. So many variants / complexity that end-users take years to attain expert-level capability
  9. Integration / interconnect
  10. Getting new starters up to speed
  11. Getting proficient operators to expertise
  12. Unlocking actionable insights from huge data piles
  13. Resolving the root-cause of complex faults
  14. Onboarding new customers
  15. Productionising new functionality
  16. Exception and fallout handling
  17. Access to supplier expertise to resolve challenges

The list goes on far deeper than that list too. The challenge for many OSS product teams, for any number of reasons, is that their focus is on adding new features rather than reducing friction in what already exists.

The challenge for product teams is diagnosing where the friction  and risks are for their customers / stakeholders. How do you get that feedback?

  • Every vendor has a product support team, so that’s a useful place to start, both in terms of what’s generating the most support calls and in terms of first-hand feedback from customers
  • Do you hold user forums on a regular basis, where you get many of your customers together to discuss their challenges, your future roadmap, new improvements / features
  • Does your process “flow” data show where the sticking points are for operators
  • Do you conduct gemba walks with your customers
  • Do you have a program of ensuring all developers spend at least a few days a year interacting directly with customers on their site/s
  • Do you observe areas of difficulty when delivering training
  • Do you go out of your way to ask your customers / stakeholders questions that are framed around their pain-points, not just framed within the context of your existing OSS
  • Do you conduct customer surveys? More importantly, do you conduct surveys through an independent third-party?

On the last dot-point, I’ve been surprised at some of the profound insights end-users have shared with me when I’ve been conducting these reviews as the independent interviewer. I’ve tended to find answers are more open / honest when being delivered to an independent third-party than if the supplier asks directly. If you’d like assistance running a third-party review, leave us a note on the contact page. We’d be delighted to assist.

Fast and slow OSS, where uCPE and network virtualisation fits in

Yesterday’s post talked about one of the many dichotomies in OSSfast and slow data / processes.

One of the longer lead-time items in relation to OSS data and processes is in network build and customer connections. From the time when capacity planning or a customer order creates the signal to build, it can be many weeks or months before the physical infrastructure work is complete and appearing in the OSS.

There are two financial downsides to this. Firstly, it tends to be CAPEX-heavy with equipment, construction, truck-rolls, government approvals, etc burning through money. Meanwhile, it’s also a period where there is no money coming in because the services aren’t turned on yet. The time-to-cash cycle of new build (or augmentation) is the bane of all telcos.

This is one of the exciting aspects of network virtualisation for telcos. In a time where connectivity is nearly ubiquitous in most countries, often with high-speed broadband access, physical build becomes less essential (except over-builds). Technologies such as uCPE (Universal Customer Premises Equipment), NFV (Network Function Virtualisation), SD WAN (Software-Defined Wide Area Networks), SDN (Software Defined Networks) and others mean that we can remotely upgrade and reconfigure the network without field work.

Network virtualisation gives the potential to speed up many of the slowest, and costliest processes that run through our OSS… but only if our OSS can support efficient orchestration of virtualised networks. And that means having an OSS with the flexibility to easily change out slow processes to replace them with fast ones without massive overhauls.

Would you hire a furniture maker as an OSS CEO?

Well, would you hire a furniture maker as CEO of an OSS vendor?

At face value, it would seem to be an odd selection right? There doesn’t seem to be much commonality between furniture and OSS does there? It seems as likely as hiring a furniture maker to be CEO of a car maker?

Oh wait. That did happen.

Ford Motor Company made just such a decision last year when appointing Jim Hackett, a furniture industry veteran, as its CEO. Whether the appointment proves successful or not, it’s interesting that Ford made the decision. But why? To focus on user experience and design as it’s next big differentiator. Clever line of thinking Bill Ford!!

I’ve prepared a slightly light-hearted table for comparison purposes between cars and OSS. Both are worth comparing as they’re both complex feats of human engineering:

Idx Comparison Criteria Car OSS
1 Primary objective Transport passengers between destinations Operationalise and monetise a comms network
2 Claimed “Business” justification Personal freedom Reducing the cost of operations
3 Operation of common functionality without conscious thought (developed through years of operator practice) Steering

Changing gears

Indicating

Hmmm??? Depends on which sales person or operator you speak with
4 Error detection and current-state monitoring Warning lights and instrument cluster/s Alarm lists, performance graphs
5 Key differentiator for customers (1970’s) Engine size Database / CPU size
6 Key differentiator for customers (2000’s) Gadgets / functions / cup-holders Functionality
7 Key differentiator for customers (2020+) User Experience

Self-driving

Connected car (car as an “experience platform”)

User Experience??

Zero-touch assurance?

Connected OSS (ie OSS as an experience platform)???

I’d like to focus on three key areas next:

  1. Item 3
  2. Item 4 and
  3. The transition between items 6 and 7

Item 3 – operating on auto-pilot

If we reference against item 1, the primary objective, experienced operators of cars can navigate from point A to point B with little conscious thought. Key activities such as steering, changing gears and Indicating can be done almost as a background task by our brains whilst doing other mental processing (talking, thinking, listening to podcasts, etc).

Experienced operators of OSS can do primary objectives quickly, but probably not on auto-pilot. There are too many “levers” to pull, too many decisions to make, too many options to choose from, for operators to background-process key OSS activities. The question is, could we re-architect to achieve key objectives more as background processing tasks?

Item 4 – error detection and monitoring

In a car, error detection is also a background task, where operators are rarely notified, only for critical alerts (eg engine light, fuel tank empty, etc). In an OSS, error detection is not a background task. We need full-time staff monitoring all the alarms and alerts popping up on our consoles! Sometimes they scroll off the page too fast for us to even contemplate.

In a car, monitoring is kept to the bare essentials (speedo, tacho, fuel guage, etc). In an OSS, we tend to be great at information overload – we have a billion graphs and are never sure which ones, or which thresholds, actually allow us to operate our “vehicle” effectively. So we show them all.

Transitioning from current to future-state differentiators

In cars, we’ve finally reached peak-cup-holders. Manufacturers know they can no longer differentiate from competitors just by having more cup-holders (at least, I think this claim is true). They’ve also realised that even entry-level cars have an astounding list of features that are only supplementary to the primary objective (see item 1). They now know it’s not the amount of functionality, but how seamlessly and intuitively the users interact with the vehicle on end-to-end tasks. The car is now seen as an extension of the user’s phone rather than vice versa, unlike the recent past.

In OSS, I’ve yet to see a single cup holder (apart from the old gag about CD trays). Vendors mark that down – cup holders could be a good differentiator. But seriously, I’m not sure if we realise the OSS arms race of features is no longer the differentiator. Intuitive end-to-end user experience can be a huge differentiator amongst the sea of complex designs, user interfaces and processes available currently. But nobody seems to be talking about this. Go to any OSS event and we only hear from engineers talking about features. Where are the UX experts talking about innovative new ways for users to interact with machines to achieve primary objectives (see item 1)?

But a functionality arms race isn’t a completely dead differentiator. In cars, there is a horizon of next-level features that can be true differentiators like self-driving or hover-cars. Likewise in OSS, incremental functionality increases aren’t differentiators. However, any vendor that can not just discuss, but can produce next-level capabilities like zero touch assurance (ZTA) and automated O2A (Order to Activate) will definitely hold a competitive advantage.

Hat tip to Jerry Useem, whose article on Atlantic provided the idea seed for this OSS post.

What are OSS “platform wrapper” roadblocks?

OSS can be cumbersome at times. Making change can be difficult. We tend to build layers of protections around them and the networks we manage. I get that. Change can be risky (although the protections are often implemented because the OSS and/or network platforms might not be as robust as they could be).

Contrast this with the OSS we want to create. We want to create a platform for rapid innovation, the platform that helps us and our clients generate opportunities and advantages.

For us to build a platform that allows our customers (and their customers) to revolutionise their markets, we might have to consider whether the protective layers around our OSS that are stymying change. Things like firewall burns, change review boards, documentation, approvals, politics, individuals with a reticence to change, etc.

For example, Netflix takes a contrarian, whitelist approach to access by its engineers rather than a blacklist. It assumes that its engineers are professional enough to only use the tools that they need to get their tasks done. They enable their engineers to use commonly off-limits functionality such as adding their own DNS records (ie to support the stand-up of new infrastructure). But they also take a use-it-or-lose-it approach, monitoring the tools that the engineer uses and rescinding access to tools they haven’t used within 90 days. But if they do need access again, it’s as simple as a message on Slack to reinstate it.

This is just one small example of streamlining the platform wrapper. There are probably a million others.

When working on OSS projects as the integrator / installer, I’ve seen many of these “platform wrapper” roadblocks. I’m sure you have too. If you see them as the installer, chances are the ops team you hand over to will also experience these roadblocks.

Question though. Do you flag these platform wrapper roadblocks for improvement, or do you treat them as non-platform and therefore just live with them?

Only do the OSS that only you can do

A friend of mine has a great saying, “only do what only you can do.”

Do you think that this holds true for the companies undergoing digital transformation? Banks are now IT companies. Insurers are IT companies. Car manufacturers are now IT companies. Telcos are, well, some are IT companies.

We’ve spoken before about the skill transformations that need to happen within telcos if they’re to become IT companies. Some are actively helping their workforce to become more developer-centric. Some of the big telcos that I’ve been assisting in the last few years are embarking on bold Agile-led IT transformations. They’re cutting more of their own code and managing their own IT developments.

That’s exciting news for all of us in OSS. Even if it loses the name OSS in future, telcos will still need software that efficiently operationalises their networks. We have the overlapping skills in software, networks, business and operations.

But I wonder about the longevity of the in-house approach unless we come focus clearly on the first quote above. If all development is brought in-house, we end up with a lot of duplication across the industry. I’m not really sure that it makes sense doing all the heavy-lifting of all custom OSS tools when the heavy-lifting has already been done elsewhere.

It’s the old ebb and flow between in-house and outsourced OSS.

In my very humble opinion, it’s not just a choice between in-house and outsourced that matters. The more important decisions are around choosing to only develop the tools in-house that only you can do (ie the strategic differentiators).

I sent you an OSS helicopter

There’s a fable of a man stuck in a flood. Convinced that God is going to save him, he says no to a passing canoe, boat, and helicopter that offer to help. He dies, and in heaven asks God why He didn’t save him. God says, “I sent you a canoe, a boat, and a helicopter!”
We all have vivid imaginations. We get a goal in our mind and picture the path so clearly. Then it’s hard to stop focusing on that vivid image, to see what else could work.
New technologies make old things easier, and new things possible. That’s why you need to re-evaluate your old dreams to see if new means have come along
.”
Derek Sivers
, here.

In the past, we could make OSS platform decisions with reasonable confidence that our choices would remain viable for many years. For example, in the 1990s if we decided to build our OSS around a particular brand of relational database then it probably remained a valid choice until after 2010.

But today, there are so many more platforms to choose from, not to mention the technologies that underpin them. And it’s not just the choices currently available but the speed with which new technologies are disrupting the existing tech. In the 1990s, it was a safe bet to use AutoCAD for outside plant visualisation without the risk of heavy re-tooling within a short timeframe.

If making the same decision today, the choices are far less clear-cut. And the risk that your choice will be obsolete within a year or two has skyrocketed.

With the proliferation of open-source projects, the decision has become harder again. That means the skill-base required to service each project has also spread thinner. In turn, decisions for big investments like OSS projects are based more on the critical mass of developers than the functionality available today. If many organisations and individuals have bought into a particular project, you’re more likely to get your new features developed than from a better open-source project that has less community buy-in.

We end up with two ends of a continuum to choose between. We can either chase every new bright shiny object and re-factor for each, or we can plan a course of action and stick to it even if it becomes increasingly obsoleted over time. The reality is that we probably fit somewhere between the two ends of the spectrum.

To be brutally honest I don’t have a solution to this conundrum. The closest technique I can suggest is to design your solution with modularity in mind, as opposed to the monolithic OSS of the past. That’s the small-grid OSS architecture model. It’s easier to replace one building than an entire city.

Life-cycles of key platforms are likely to now be a few years at best (rather than decades if starting in the 1990s). Hence, we need to limit complexity (as per the triple-constraint of OSS) and functionality to support the most high-value objectives.

I’m sure you face the same conundrums on a regular basis. Please leave a comment below to tell us how you overcome them.

OSS orgitecture

So far this week we’ve been focusing on ways to improve the OSS transformation process. Monday provided 7 models for achieving startup-like efficiency for larger OSS transformations. Tuesday provided suggestions for speeding up the transition from OSS PoC to getting the solution into production, specifically strategies for absorbing an OSS PoC into production.

Both of these posts talk about the speed of getting things done outside the bureaucracy of big operators, big networks and big OSS. Today, as the post title suggests, we’re going to look at orgitecture – how re-designing the structure and culture of an organisation can help streamline digital transformations.

Do you agree with the premise that smaller entities (eg Agile autonomous groups, partners, consultants, etc) can get OSS tasks done more efficiently when operating at arms-length of the larger entity (eg the carrier)? I believe that this is a first principle of physics at play.

If you’ve worked under this arms-length arrangement in the past, you’ll also know that at some point those delivery outcomes need to get integrated back into the big entity. It’s what we referred to yesterday as absorption, where the level of integration effort falls on a continuum between minimally absorbed to fully absorbed.

OSS orgitecture is the re-architecture of the people, processes, culture and org structure to better allow for the absorption process. In the past, all the safety-checks (eg security, approvals, ops handover, etc) were designed on the assumption that internal teams were doing the work. They’re not always a great fit, especially when it comes to documentation review and approval.

For example, I have a belief that the effectiveness of documentation review and approval is inversely proportional to the number of reviewers (in most, but not all cases). Unfortunately, when an external entity is delivering, there tends to be inherently less trust than if an internal entity was delivering. As such, the safety-checks increase.

Another example is when the large organisation uses Agile delivery models, but use supply partners to deliver scope of works. The partners are able to assign effort in a sequential / waterfall manner, but can be delayed by only getting timeslices of attention from client’s staff (ie resources are available according to Agile sprint planning).

Security and cutover planning mechanisms such as Change Review Boards (CRB) have also been designed around old internal delivery models. They also need to be reconsidered to facilitate a pipeline of externally-implemented change.

Perhaps the biggest orgitecture factor is in getting multiple internal business units to work together effectively. In the old world we needed all the business units to reach consensus for a new product to come to market. Sales/Marketing/Products had to work with OSS/IT and Networks. Each of these units tend to have vastly different cultures and different cadences for getting their tasks done. Delivering a new product was as much an organisational challenge as it was a technical challenge and often took months. Those times-to-market are not feasible in a world of software where competitive advantages are fleeting. External entities can potentially help or hinder these timeframes. Careful design of small autonomous teams have the potential to improve abstraction at the interlocks, but culture remains the potential roadblock.

I’m excited by the opportunity for OSS delivery improvement coming from leveraging the gig economy. But if big OSS transformations are to make use of these efficiency gains, then we may also need to consider culture and process refinement as part of the change management.

Speeding up your OSS transition from PoC to PROD

In yesterday’s article, we discussed 7 models for achieving startup-like efficiency on large OSS transformations.

One popular approach is to build a proof-of-concept or sandpit quickly on cloud hosting or in lab environments. It’s fast for a number of reasons including reduced number of approvals, faster activation of infrastructure, reduced safety checks (eg security, privacy, etc), minimised integration with legacy systems and many other reasons. The cloud hosting business model is thriving for all of these reasons.

However, it’s one thing to speed up development of an OSS PoC and another entirely to speed up deployment to a PROD environment. As soon as you wish to absorb the PoC-proven solution back into PROD, all the items listed above (eg security sign-offs) come back into play. Something that took days/weeks to stand up in PoC now takes months to productionise.

Have you noticed that the safety checks currently being used were often defined for the old world? They often aren’t designed with transition from cloud to PROD in mind. Similarly, the culture of design cross-checks and approvals can also be re-framed (especially when the end-to-end solution crosses multiple different business units). Lastly, and way outside my locus of competence, is in re-visiting security / privacy / deployment / etc models to facilitate easier transition.

One consideration to make is just how much absorption is required. For example, there are examples of services being delivered to the large entity’s subscribers by a smaller, external entity. The large entity then just “clips-the-ticket,” gaining a revenue stream with limited involvement. But the more common (and much more challenging) absorption model is for the partner to fold the solution back into the large entity’s full OSS/BSS stack.

So let’s consider your opportunity in terms of the absorption continuum that ranges between:

clip-the-ticket (minimally absorbed) <-----------|-----------> folded-in (fully absorbed)

Perhaps it’s feasible for your opportunity to fit somewhere in between (partially absorbed)? Perhaps part of that answer resides in the cloud model you decide to use (public, private, hybrid, cloud-managed private cloud) as well as the partnership model?

Modularity and reduced complexity (eg integrations) are also a factor to consider (as always).

I haven’t seen an ideal response to the absorption challenge yet, but I believe the solution lies in re-framing corporate culture and technology stacks. We’ll look at that in more detail tomorrow.

How about you? Have you or your organisation managed to speed up your transition from PoC to PROD? What techniques have you found to be successful?

Seven OSS transformation efficiency models

Do you work in a large organisation? Have you also worked in smaller organisations?
Where have you felt more efficient?

I’ve been lucky enough to work on some massive OSS transformations for large T1 telcos. But I’ve always noticed the inefficiency of working on these projects when embedded inside the bureaucracy of the beast. With all of the documentation, sign-offs, meetings, politics, gaining consensus, budget allocations, etc it can sometimes feel so inefficient. On some past projects, I’ve felt I can accomplish more in a day outside than a week or more inside the beast.

This makes sense when applying the fundamental law of physics F = M x a to OSS projects. In other words, the greater the mass (of the organisation), the more force must be applied to reach a given acceleration (ie to effect a change).

It’s one of the reasons I love working within a small entity (Passionate About OSS), but into big entities (the big telcos and utilities). It’s also why I strongly believe that the big entities need to better leverage smaller working groups to facilitate big OSS change. Not just OSS transformation, but any project where the size of the culture and technology stack are prohibitive.

Here are a few ways you can use to bring a start-up’s efficiency to a big OSS transformation:

  1. Agile methodologies – If done well, Agile can be great at breaking transformations down into smaller, more manageable pieces. The art is in designing small autonomous teams / responsibilities and breakdown of work to minimise dependencies
  2. Partnerships – Using smaller, external partners to deliver outcomes (eg product builds or service offerings) that can be absorbed into the big organisation. There are varying levels of absorption here – from an external, “clip-the-ticket” offering to offerings that are fully absorbed into the large entity’s OSS/BSS stack
  3. Consultancies – Similar to partnerships, but using smaller teams to implement professional services
  4. Spin-out / spin-in teams – Separating small teams of experts out from the bureaucracy of the large organisation so that they can achieve rapid progress
  5. Smart contracts / RFPs – I love the potential for smart contracts to automate the offer of small chunks of work to trusted partners to bid upon and then deliver upon
  6. Externalised Proofs of Concept (PoC) – One of the big challenges in implementing for large organisations is all of the safety checks that slow progress. Many, such as security and privacy mechanisms, are completely justified for a production network. But when a concept needs to be proved, such as user journeys, product integrations, sand-pit environments, etc, then cloud-based PoCs can be brilliant
  7. Alternate brands – Have you also noticed that some of the tier-1 telcos have been spinning out low-cost and/or niche brands with much leaner OSS/BSS stacks, offerings and related culture lately? It’s a clever business model on many levels. Combined with the strangler fig transformation approach, this might just represent a pathway for the big brand to shed many of their OSS/BSS legacy constraints

Can you think of other models that I’ve missed?

The key to these strategies is not so much the carve-out, the process of getting small teams to do tasks efficiently, but the absorb-in process. For example, how to absorb a cloud-based PoC back into the PROD network, where all safety checks (eg security, privacy, operations acceptance, etc) still need to be performed. More on that in tomorrow’s post.

OSS resilience vs precision

Resilience is what happens when we’re able to move forward even when things don’t fit together the way we expect.[OSS project anyone???] And tolerances are an engineer’s measurement of how well the parts meet spec.
One way to ensure that things work out the way you hope is to spend the time and money to ensure that every part, every form, every worker meets spec. Tighten your spec, increase precision and you’ll discover that systems become more reliable.
The other alternative is to embrace the fact that nothing is ever exactly on spec, and to build resilient systems.
You’ll probably find that while precision feels like the way forward, resilience, the ability to thrive when things go wrong, is a much safer bet
.”
Seth Godin
here.

Yesterday’s post talked about the difference between having a team of artisans versus a team that paints by numbers. Seth’s blog provides a similar comparison. Instead of comparing by talent, Seth compares by attitude.

I’m really conflicted by Seth’s comparison.

From the side of past experience, resilience is a massive factor in overcoming the many obstacles faced on implementation projects. I’ve yet to work on an OSS project where all challenges were known at inception.

From the side of an ideal future, precision and repeatability are essential factors in improving the triple constraint of OSS delivery and increasing reliability for customers. And whilst talking about the future, the concept of network slicing (which holds the key for 5G) is dependent upon OSS repeatability and efficiency.

So which do we focus on? Building a vastly talented, experienced and resilient implementation team? Or building a highly reliable, repeatable implementation system? Both, most likely.

But what if you only get to choose one? Which do you focus on (for you and your team/system)?

Can OSS/BSS assist CX? We’re barely touching the surface

Have you ever experienced an epic customer experience (CX) fail when dealing a network service operator, like the one I described yesterday?

In that example, the OSS/BSS, and possibly the associated people / process, had a direct impact on poor customer experience. Admittedly, that 7 truck-roll experience was a number of years ago now.

We have fewer excuses these days. Smart phones and network connected devices allow us to get OSS/BSS data into the field in ways we previously couldn’t. There’s no need for printed job lists, design packs and the like. Our OSS/BSS can leverage these connected devices to give far better decision intelligence in real time.

If we look to the logistics industry, we can see how parcel tracking technologies help to automatically provide status / progress to parcel recipients. We can see how recipients can also modify their availability, which automatically adjusts logistics delivery sequencing / scheduling.

This has multiple benefits for the logistics company:

  • It increases first time delivery rates
  • Improves the ability to automatically notify customers (eg email, SMS, chatbots)
  • Decreases customer enquiries / complaints
  • Decreases the amount of time the truck drivers need to spend communicating back to base and with clients
  • But most importantly, it improves the customer experience

Logistics is an interesting challenge for our OSS/BSS due to the sheer volume of customer interaction events handled each day.

But it’s another area that excites me even more, where CX is improved through improved data quality:

  • It’s the ability for field workers to interact with OSS/BSS data in real-time
  • To see the design packs
  • To compare with field situations
  • To update the data where there is inconsistency.

Even more excitingly, to introduce augmented reality to assist with decision intelligence for field work crews:

  • To provide an overlay of what fibres need to be spliced together
  • To show exactly which port a patch-lead needs to connect to
  • To show where an underground cable route goes
  • To show where a cable runs through trayway in a data centre
  • etc, etc

We’re barely touching the surface of how our OSS/BSS can assist with CX.

The 7 truck-roll fail

In yesterday’s post we talked about the cost of quality. We talked about examples of primary, secondary and tertiary costs of bad data quality (DQ). We also highlighted that the tertiary costs, including the damage to brand reputation, can be one of the biggest factors.

I often cite an example where it took 7 truck rolls to connect a service to my house a few years ago. This provider was unable to provide an estimate of when their field staff would arrive each day, so it meant I needed to take a full day off work on each of those 7 occasions.

The primary cost factors are fairly obvious, for me, for the provider and for my employer at the time. On the direct costs alone, it would’ve taken many months, if not years, for the provider to recoup their install costs. Most of it attributable to the OSS/BSS and associated processes.

Many of those 7 truck rolls were a direct result of having bad or incomplete data:

  • They didn’t record that it was a two storey house (and therefore needed a crew with “working at heights” certification and gear)
  • They didn’t record that the install was at a back room at the house (and therefore needed a higher-skilled crew to perform the work)
  • The existing service was installed underground, but they had no records of the route (they went back to the designs and installed a completely different access technology because replicating the existing service was just too complex)

Customer Experience (CX), aka brand damage, is the greatest of all cost of quality factors when you consider studies such as those mentioned below.

A dissatisfied customer will tell 9-15 people about their experience. Around 13% of dissatisfied customers tell more than 20 people.”
White House Office of Consumer Affairs
(according to customerthink.com).

Through this page alone, I’ve told a lot more than 20 (although I haven’t mentioned the provider’s name, so perhaps it doesn’t count! 🙂  ).

But the point is that my 7 truck-roll example above could’ve been avoided if the provider’s OSS/BSS gave better information to their field workers (or perhaps enforced that the field workers populated useful data).

We’ll talk a little more tomorrow about modern Field Services tools and how our OSS/BSS can impact CX in a much more positive way.

Waiting for the disaster to invest in the data

Have you seen OSS tools where the applications are brilliant but consigned to failure by bad data? I definitely have! I call it the data death spiral. It’s a well known fact in the industry that bad data can ruin an OSS. You know it. I know it. Everyone knows it.

But how many companies do you know that invest in data quality? I mean truly invest in it.

The status quo is not to invest in the data, but the disaster. That is the disaster caused by the data!

Being a data nerd, it boggles my brain to understand why that is. My only assumption to date is that we don’t adequately measure the cost of quality. Or more to the point, what the cost impact is resulting from bad data.

I recently attempted to model the cost of quality. My model focuses on the ripple-out impacts from poor PNI (Physical Network Inventory) quality data alone. Using conservative numbers, the cost of quality is in the millions for the first carrier I applied it to.

Why do you think operators wait for the disaster before investing in the data? What alternate techniques do you use to focus attention, and investment, on the data?

The OSS Tinder effect

On Friday, we provided a link to an inspiring video showing Rolls-Royce’s vision of an operations centre. That article is a follow-on from other recent posts about to pros and cons of using MVPs (Minimum Viable Products) as an OSS transformation approach.

I’ve been lucky to work on massive OSS projects. Projects that have taken months / years of hard implementation grind to deliver an OSS for clients. One was as close to perfect (technically) as I’ve been involved with. But, alas, it proved to be a failure.

How could that be you’re wondering? Well, it’s what I refer to as the Tinder Effect. On Tinder, first appearances matter. Liked or disliked at the swipe of a hand.

Many new OSS are delivered to users who are already familiar with one or more OSS. If they’re not as pretty or as functional or as intuitive as what the users are accustomed to, then your OSS gets a swipe to the left. As we found out on that project (a ‘we’ that included all the client’s stakeholders and sponsors), first impressions can doom an otherwise successful OSS implementation.

Since then, I’ve invested a lot more time into change management. Change management that starts long before delivery and handover. Long before designs are locked in. Change management that starts with hearts and minds. And starts by involving the end users early in the change process. Getting them involved in the vision, even if not quite as elaborate as Rolls-Royce’s.

The Rolls Royce vision of OSS

Yesterday’s post mentioned the importance of setting a future vision as part of your MVP delivery strategy.

As Steve Blank said here, Founders act like the “minimum” part is the goal. Or worse, that every potential customer should want it. In the real world not every customer is going to get overly excited about your minimum feature set…You’re selling the vision and delivering the minimum feature set to visionaries not everyone.”

Yesterday’s post promised to give you an example of an exciting vision. Not just any vision, the Rolls-Royce version of a vision.

We’ve all seen examples of customers wanting a Rolls-Royce OSS solution. Here’s a video that’s as close as possible to Rolls-Royce’s own vision of an OSS solution.

The OSS Minimum Feature Set is Not The Goal

This minimum feature set (sometimes called the “minimum viable product”) causes lots of confusion. Founders act like the “minimum” part is the goal. Or worse, that every potential customer should want it. In the real world not every customer is going to get overly excited about your minimum feature set. Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.

The reality is that the minimum feature set is 1) a tactic to reduce wasted engineering hours (code left on the floor) and 2) to get the product in the hands of early visionary customers as soon as possible.

You’re selling the vision and delivering the minimum feature set to visionaries not everyone.”
Steve Blank here.

A recent blog series discussed the use of pilots as an OSS transformation and augmentation change agent.
I have the need for OSS speed
Re-framing an OSS replacement strategy
OSS transformation is hard. What can we learn from open source?

Note that you can replace the term pilot in these posts with MVP – Minimum Viable Product.

The attraction in getting an MVP / pilot version of your OSS into the hands of users is familiarity and momentum. The solution becomes more tangible and therefore needs less documentation (eg architecture, designs, requirement gathering, etc) to describe foreign concepts to customers. The downside of the MVP / pilot is that not every customer will “get overly excited about your minimum feature set.”

As Steve says, “Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.” The challenge for all of us in OSS is articulating the long-term vision and making it compelling…. and not just leaving the product in its pilot state (we’ve all seen this happen haven’t we?)

We’ll provide an example of a long-term vision tomorrow.

PS. I should also highlight that the maximum feature set also isn’t the goal either.

The layers of ITIL redundancy

Today’s is something of a heretical post, especially for the believers in ITIL. In the world of OSS, we look to build in layers of resiliency and not layers of redundancy.

The following diagram and subsequent text in italics describes a typical ITIL process and is all taken from https://www.computereconomics.com/article.cfm?id=1074

Example of relationship between ITIL incidents, problems, and changes

The sequence of events as shown in Figure 1 is as follows:

  • At TIME = 0, an External Event is detected by the Incident Management process. This could be as simple as a customer calling to say that service is unavailable or it could be an automated alert from a system monitoring device.The incident owner logs and classifies this as incident i2. Then, the incident owner tries to match i2 to known errors, work-arounds, or temporary fixes, but cannot find a match in the database.
  • At TIME = 1, the incident owner dispatches a problem request to the Problem Management process anticipating a work-around, temporary fix, or other assistance. In doing so, the incident owner has prompted the creation of Problem p2.
  • At TIME = 2, the problem owner of p2 returns the expected temporary fix to the incident owner of i2.  Note that both i2 and p2 are active and exist simultaneously. The incident owner for i2 applies the temporary fix.
  • In this case, the work-around requires a change request.  So, at Time = 3, the incident owner for i2 initiates change request, c2.
  • The change request c2 is applied successfully, and at TIME = 4, c2 is closed. Note that for a while i2, p2 and c2 all exist simultaneously.
  • Because c2 was successful, the incident owner for i2 can now confirm that the incident is resolved. At TIME = 5, i2 is closed. However, p2 remains active while the problem owner searches for a permanent fix. The problem owner for p2 would be responsible for implementing the permanent fix and initiating any necessary change requests.

But I look at it slightly differently. At their root, why do Incident Management, Problem Management and Change Management exist? They’re all just mechanisms for resolving a system* health issue. If we detect an event/s and fix it, we don’t have to expend all the effort of flicking tickets around.

Thinking within the T2R paradigm of trouble -> incident -> problem -> change -> resolve holds us back. If we can skip the middle steps and immediately associate a resolution with the event/s, we get a whole lot more efficient. If we can immediately relate a trigger with a reaction, we can also get rid of the intermediate ticket flickers and the processing cycle time.

So, the NOC of the future surely requires us to build a trigger -> reaction recommendation engine (and body of knowledge). That’s a more powerful tool to supply to our NOC operators than incidents, problems and change requests. (Easier to write about than to actually solve though of course)

OSS transformation is hard. What can we learn from open source?

Have you noticed an increasing presence of open-source tools in your OSS recently? Have you also noticed that open-source is helping to trigger transformation? Have you thought about why that might be?

Some might rightly argue that it is the cost factor. You could also claim that they tend to help resolve specific, but common, problems. They’re smaller and modular.

I’d argue that the reason relates to our most recent two blog posts. They’re fast to install (don’t need to get bogged down in procurement) and they’re easy to run in parallel for comparison purposes.

If you’re designing an OSS can you introduce the same concepts? Your OSS might be for internal purposes or to sell to market. Either way, if you make it fast to build and easy to use, you have a greater chance of triggering transformation.

If you have a behemoth OSS to “sell,” transformation persuasion is harder. The customer needs to rally more resources (funds, people, time) just to compare with what they already have. If you have a behemoth on your hands, you need to try even harder to be faster, easier and more modular.