Cool new feature – An OSS masquerading as…

I spent some time with a client going through their OSS/BSS yesterday. They’re an Australian telco with a primarily home-grown, browser-based OSS/BSS. It had a feature I’d never seen in an OSS/BSS before, one that’s subtle but really quite cool.

They have four tiers of users:

  1. Super-admins (the carrier’s in-house admins),
  2. Standard (their in-house users),
  3. Partners (they use many channel partners to sell their services),
  4. Customers (the end-users of the carrier’s services).

All users have access to the same OSS/BSS, but just with different levels of functionality / visibility, of course.

Anyway, the feature I thought was really cool is what they call the masquerade function, which is available to super-admins. It allows them to masquerade as any other user on the system without having to log out and log in to other accounts. This lets them see exactly what each user is seeing and experience exactly what they’re experiencing (notwithstanding any platform or network access differences such as different browsers, response times, etc).

This is clearly helpful for issue resolution, but I feel it’s even more helpful for design, feature release and testing across different personas.

In my experience at least, OSS/BSS builders tend to focus on a primary persona (eg the end-user) and can overlook multi-persona design and testing. The masquerade function can make this task easier.
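To make the idea a little more concrete, here’s a minimal Python sketch of how a masquerade session might work. It’s purely my assumption, not the client’s implementation, and the `Tier`, `User`, `Session` and `login` names are hypothetical. The key design point is that the session holds two identities. The first is the real, logged-in user, whose tier is checked before switching and who is recorded in an audit log. The second is the effective user, whose permissions drive what’s visible.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical tiers mirroring the four user types listed above
    SUPER_ADMIN = 1
    STANDARD = 2
    PARTNER = 3
    CUSTOMER = 4


@dataclass(frozen=True)
class User:
    name: str
    tier: Tier
    permissions: frozenset


@dataclass
class Session:
    real_user: User       # who actually logged in
    effective_user: User  # whose view is currently being rendered
    audit_log: list = field(default_factory=list)

    def masquerade_as(self, target: User) -> None:
        # The check uses the real user, so only a logged-in super-admin can
        # switch identities, and no log-out / log-in is needed
        if self.real_user.tier != Tier.SUPER_ADMIN:
            raise PermissionError("only super-admins may masquerade")
        self.effective_user = target
        self.audit_log.append(f"{self.real_user.name} -> {target.name}")

    def stop_masquerade(self) -> None:
        self.audit_log.append(f"{self.real_user.name} <- {self.effective_user.name}")
        self.effective_user = self.real_user

    def can(self, permission: str) -> bool:
        # Visibility is evaluated against the effective user, so the admin
        # sees what the target user would see
        return permission in self.effective_user.permissions


def login(user: User) -> Session:
    return Session(real_user=user, effective_user=user)


admin = User("alice", Tier.SUPER_ADMIN, frozenset({"view_all"}))
partner = User("bob", Tier.PARTNER, frozenset({"view_own_customers"}))
session = login(admin)
session.masquerade_as(partner)
print(session.can("view_all"), session.can("view_own_customers"))  # False True
```

Keeping the real user separate from the effective user means anything done while masquerading can still be traced back to the super-admin who did it.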

Network slicing and a seismic shift in OSS responsibility

Network slicing allows operators to segment their network and configure each slice to the specific needs of a customer (or group of customers). So rather than the network infrastructure being configured as the best compromise across all use-cases, each slice can be configured optimally for its own use-case. That’s an exciting concept.

The big potential roadblock, however, falls almost entirely on our OSS/BSS. If our operational tools require significant manual intervention on just one network now, what chance do operators have of efficiently looking after many networks (ie all the slices)?

This article describes the level of operational efficiency / automation required to make network slicing cost effective. It clearly shows that we’ll have to deliver massive sophistication in our OSS/BSS to handle automation, not to mention the huge number of variants we’d have to cope with across all the slices. If that’s the case, network slicing isn’t going to be viable any time soon.

But something just dawned on me today. I was assuming that the onus for managing each slice would fall on the network operator. What if we take the approach that telcos use with security on network pipes instead? That is, the telco shifts the onus of security onto their customer (in most cases). They provide a dumb pipe and ask the customer to manage their own security mechanisms (eg firewalls) on the end.

In the case of network slicing, operators just provide “dumb slices.” The operator assumes responsibility for providing the network resource pool (VNFs – Virtual Network Functions) and the automation of slice management, including fulfilment (ie adds, modifies, deletes, holds, etc) and assurance. But the customers take responsibility for actually managing their network (slice) with their own OSS/BSS (they probably already have a suite of these anyway).

This approach doesn’t seem to require the same level of sophistication. The main impacts I see (and I’m probably overlooking plenty of others) are:

    1. There’s a new class of OSS/BSS required by the operators, that of automated slice management
    2. The customers already have their own OSS/BSS, but they currently tend to focus on monitoring, ticketing, escalations, etc. Their new customer OSS/BSS would need to take more responsibility for provisioning, including traffic engineering
    3. And I’d expect that to support customer-driven provisioning, the operators would probably need to provide ways for customers to programmatically interface with the network resources that make up their slice. That is, operators would need to offer network APIs or NaaS to their customers externally, not just for internal purposes
    4. Determining the optimal slice model. For example, does the carrier offer:
      1. A small number of slice types (eg video, IoT low latency, IoT low chat, etc), where each slice caters for a category of customers, but with many slice instances (one for each customer)
      2. A small number of slice instances, where all customers in that category share the single slice
      3. Customised slices for premium customers
      4. A mix of the above

In the meantime, changes could be made as they have been in the past, via customer portals, etc.

Thoughts?

Two concepts to help ease long-standing OSS problems

There’s a famous Zig Ziglar quote that goes something like, “You can have everything in life you want, if you will just help enough other people get what they want.”

You could safely assume that this was written for the individual reader, but there is some truth in it within the OSS context too. For the OSS designer, builder, integrator, does the statement “You can have everything in your OSS you want, if you will just help enough other people get what they want,” apply?

When looking for who to help, we often think only of the O in OSS: Operations people. But OSS/BSS can have an impact far wider than just the Ops team/s.

The halcyon days of OSS were probably the 1990s to early 2000s, when the term OSS/BSS was at its most sexy and exciting. The big telcos were excitedly spending hundreds of millions of dollars. Those projects were huge… and hugely complex… and hugely fun!

With that level of investment, there was the expectation that the OSS/BSS would help many people. And they did. But the lustre has come off somewhat since then. We’ve helped sooooo many people, but perhaps didn’t help enough people enough. Just speak with anybody involved with an OSS/BSS stack and you’ll hear hints of a large gap that exists between their current state and a desired future state.

Do you mind if I ask two questions?

  1. When you reflect on your OSS activities, do you focus on the technology, the opportunities or the problems?
  2. Do you look at the local, day-to-day activities or the broader industry?

I tend to find myself focusing on the problems – how to solve them within the daily context of customer challenges, but also the broader industry problems when I take the time to reflect, such as when writing these blogs.

The part I find interesting is that we still face most of the same problems today that we did back in the 1990s-2000s, with the same sources of risk. We’ve done a fantastic job of helping many people get what they want in their day-to-day activities (the incremental). We still haven’t cracked the big challenges though. That’s why I wrote the OSS Call for Innovation, to articulate what lies ahead of us.

It’s why I’m really excited about two of the concepts we’ve discussed this week:

NaaS is to networks what Agile is to software

After Telstra’s NaaS (Network as a Service) program won a TM Forum excellence award, I promised yesterday to share a post that describes why I’m so excited about the concept of NaaS.

As the title suggests, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.

There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.

The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.

But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.

Networks become Agile with NaaS (the TMN model)

As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:

  • Order Entry, Order Management
  • Product Catalog (Product / Offer Management)
  • Service Management
  • SLA (Service Level Agreement) Management
  • Billing
  • Problem Management
  • Customer Management
  • Partner Management
  • etc

If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).

For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require formation of a project team that includes multiple business units (eg products, marketing, IT, networks, change management to support all the workers impacted by system/process change, etc).

  1. A new product or product bundle is to be taken to market
  2. An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
  3. A new network type is added into the network
  4. System and / or process transformations occur in the IT stack

If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.

Networks become Agile with NaaS (the virtualisation model)

We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact, it becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently. Each has its own nuances in terms of over-arching management.

The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.

The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].

NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples

  • The products / marketing teams (who generate customer offerings / bundles) from
  • The networks / operations teams (who design, build and maintain the network) and
  • The IT teams (who design, build and maintain the IT stack)

It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.

You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.

Networks become Agile with NaaS (the NaaS model)

You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.
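A toy Python sketch may help illustrate the CFS / RFS de-coupling. The `RFS`, `CFS` and `ServiceCatalog` names and the example services are hypothetical. They aren’t drawn from Telstra’s implementation or from any TMF API. In the sketch, network domains publish RFS building blocks and product teams compose CFS offerings only from blocks that already exist. The catalog can then report which domains an order would touch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RFS:
    # Resource Facing Service: a building block published by a network domain
    name: str
    domain: str


@dataclass(frozen=True)
class CFS:
    # Customer Facing Service: an offering composed by product teams
    name: str
    components: tuple  # names of the RFS building blocks it is built from


class ServiceCatalog:
    """Toy stand-in for a master services catalog that sits between
    product teams (who define CFS) and network domains (who publish RFS)."""

    def __init__(self) -> None:
        self._rfs = {}
        self._cfs = {}

    def publish_rfs(self, rfs: RFS) -> None:
        self._rfs[rfs.name] = rfs

    def define_cfs(self, cfs: CFS) -> None:
        # Product teams compose only from building blocks that already exist
        missing = [c for c in cfs.components if c not in self._rfs]
        if missing:
            raise KeyError(f"unknown RFS building blocks: {missing}")
        self._cfs[cfs.name] = cfs

    def domains_for(self, cfs_name: str) -> set:
        # The network domains an order for this CFS would need to touch
        return {self._rfs[c].domain for c in self._cfs[cfs_name].components}


catalog = ServiceCatalog()
catalog.publish_rfs(RFS("fibre-access", "access"))
catalog.publish_rfs(RFS("ip-transit", "ip-core"))
catalog.publish_rfs(RFS("vfirewall", "nfv"))
catalog.define_cfs(CFS("secure-business-internet", ("fibre-access", "ip-transit", "vfirewall")))
print(sorted(catalog.domains_for("secure-business-internet")))  # ['access', 'ip-core', 'nfv']
```

Because each domain only publishes into the catalog, a new domain (a virtualised one, for example) can be added without touching the CFS definitions that don’t use it.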

The NaaS layer comprises:

  • A TMF standards-based API Gateway
  • A Master Services Catalog
  • A common / consistent framework of presentation of all domains

The ramifications of this excite me even more than what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Networks become Agile with NaaS (the developer model)

Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.

You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.

Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.

If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.

Agree or disagree? Leave me a comment below.

PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.

PS2. I use the terms OSS/BSS as per the TMN pyramid. The actual demarcation line between what OSS and BSS do tends to be grey and triggers religious wars, as per the post earlier this week.

PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in its own right. In reality, it is a much thinner shim layer architecturally.

PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together.

PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.

Where does BSS end and OSS begin?

Over the years, I’ve been asked the question many times, “what’s the difference between OSS (Operational Support Systems) and BSS (Business Support Systems)?” I’ve also been asked, albeit slightly less regularly, how OSS and BSS map to TM Forum standards like the TAM and eTOM.

To my knowledge, TM Forum has never attempted to map OSS vs BSS. It sets off too many religious wars.

Just for fun, I thought I’d have a crack at trying to map OSS and BSS onto the TAM.

OSS and BSS overlaid onto the TAM

I’ve taken the perspective that customer or business-facing functionality is generally considered to be BSS, whereas network / operations-facing functionality is generally considered to be OSS. The two tend to overlap at the service layer.

Or, you could just simply call them business operations systems (BOS) that cover the entire TAM estate.

What do you think? Does it trigger a religious war for you? Comments welcomed below.

FWIW. I come from an era when my “OSS” tools had a lot of functionality that could arguably be classified as BSS-centric (eg product management, customer relationship management, service order entry, etc). They also happened to deliver functionality that others might classify as NMS or EMS (Network Management System or Element Management System) in nature. In my mind, they’ve always just been software that supports operationalisation of a network, whether customer or network/resource-facing. It’s one of the reasons this site is called Passionate About OSS, not Passionate About OSS/BSS/NMS/EMS.

Top 10 most common OSS project risks

OSS projects are full of risks; we all know it. OSS projects have “earned” a bad name because of all those risks. On the other side of that same coin, OSS projects disappoint, in part, I suspect, because stakeholders expect such big things from their resource investments.

Ask anyone familiar with OSS projects and you’ll be sure to hear a long list of failings.

For those less familiar with what an OSS project has in store for you, I’d like to share a list of the most common risks I’ve seen on OSS projects.

Most people working in the OSS industry are technology-centric, so they’ll tend to cite risks that relate to the tech. That’s where I used to focus attention too. Now, technology risk definitely exists, but as you’ll see below, these days I tend to start by looking at other risk factors first.

Most common OSS project risks / issues:

  1. Complexity – To be honest, this is probably more the root cause that manifests as many of the following risks. However, complexity across many aspects of OSS projects is one of the biggest problem sources
  2. Change Management – OSS tend to introduce significant change to an organisation (operationally, organisationally, processes, training, etc). This is probably the most regularly underestimated component of any large OSS build
  3. Stakeholder Support / Politics – Challenges appear on every single OSS project. They invariably need strong support from stakeholders and sponsors to clear a path through the biggest challenges. If the project’s leaders aren’t fully committed and in unison, the delivery teams will be heavily constrained
  4. Ill-defined Scope – Over-scoping, scope omission and scope creep all represent risks to an OSS project. But scope is never perfectly defined or static, so scope management mechanisms need to be developed up-front rather than in-flight. Tying back to point 1 above, complexity minimisation should be a key element of scope planning. To hark back to my motto for OSS, “just because we can, doesn’t mean we should”
  5. Financial and commercial – As with scope, it’s virtually impossible to plan an OSS project to perfection. There are always unknowns. These unknowns can directly impact the original estimates. Projects with blow-outs and no contingency for time or money increase pressure on point 3 (stakeholders/sponsors) to maintain their support
  6. Client resource skills / availability – An OSS has to be built to the needs of a client. If the client is unable to provide resources to steer the implementation, the client is unlikely to get a solution that is well adapted to its needs. One challenge for the client is that their most valuable guides, those with the client’s tribal knowledge, are also generally in high demand by “business as usual” teams. It becomes a challenge to allocate enough of their time to guide the OSS delivery team. Another challenge is augmenting the team with the required skill-set when a project introduces new skill requirements
  7. Communication – OSS projects aren’t built in a vacuum. They have many project contributors and even more end-users. There are many business units that touch an OSS/BSS, each with their own jargon and interpretations. For example, how many alternate uses of the term “service” can you think of? I think an important early-stage activity is to agree on and document naming conventions
  8. Culture – Of the client team and/or project team. Culture contributes to (or detracts from) motivation, morale, resource turnover, etc, which can have an impact on the team’s ability to deliver
  9. Design / Integration – Finally, a technology risk. This item is particularly relevant to complex projects, where it can be difficult for all of the planned components to operate and integrate as planned. A commonly unrecognised risk relates to the viability of implementing a design. It’s common for an end-state design to be specified but with no way of navigating through a series of steps / phases to reach the end-state
  10. Technology – Similar to the previous point, there are many technology risks relating to items such as quality, scalability, resiliency, security, supportability, obsolescence, interoperability, etc

There’s one thing you will have probably noticed about this list. Most of the risks are common to other projects, not just OSS projects. However, the risks do tend to amplify on OSS projects because of their inherent complexity.

Inverting the pyramid of OSS and network innovation

Back in the earliest days of OSS (and networks for that matter), it was the telcos that generated almost all of the innovation. That effectively limited innovation to being developed by the privileged few, those who worked for the government-owned, monopoly telcos.

But over time, the financial leaders at those telcos felt the costs of their amazing research and development labs outweighed the benefits and shut them down (or starved them at best). OSS (and network) vendors stepped into the void to assume responsibility for most of the innovation. But there was a dilemma for the vendors (and for telcos and consumers too) – they needed to innovate fast enough to win work against their competitors, but slow enough to accrue revenues from the investment in their earlier innovations. And innovation was still being constrained to the privileged few, those who worked for vendors and integrators.

Now, the telcos are increasingly pushing to innovate wider and faster than the current vendor collective can accommodate. It means we have to reach further out to the long-tail of innovators. To open the floor beyond the privileged few. Excitingly, this opportunity appears to be looming.

“How?” you may ask.

Network as a Service (NaaS) and API platform offerings.

If every telco offers consumption of their infrastructure via API, it provides the opportunity for any developer to bundle their own unique offering of products, services, applications, hosting, etc and take it to market. If you’re heading to TM Forum’s Digital Transformation World (DTW) in Nice next week, there are a number of Catalyst projects on display in this space, including:

  • Zero-touch partnering could make platform ‘utopia’ real for telcos
  • Packaging Open APIs for NaaS

The challenge for the telcos is in how to support the growth of this model. To foster the vendor market, it was easy enough for the telcos to identify the big suppliers and funnel projects (and funding) through them. But now they have to figure out a funnel that’s segmented at a much smaller scale – to facilitate take-up by the millions of developers globally who might consume their products (network APIs in this case) rather than the hundreds/thousands of large suppliers.

This brings us back to smart contracts and micro-procurement as well as the technologies such as blockchain that support these models. This ties in with another TM Forum initiative to revolutionise the procurement event:

Time to kill the RFP? Reinventing IT procurement for the 2020s: Volume 1

But an additional benefit for the telcos, if and when the NaaS platform model takes hold, is that the developers also become an unpaid salesforce for the telcos. The developers will be responsible for marketing and selling their own bundles, which will drive consumption and revenues on the telcos’ assets.

Exciting new business models and supply chains are bound to evolve out of this long tail of innovation.

Could you believe it? An OSS with fewer features that helps more?

All OSS products are excellent these days. And all OSS vendors know what the most important functionality is. They already have those features built into their products. That is, they’ve already added the all-important features at the left side of the graph.
Long-tail features

But it also means product teams tend to add only relatively unimportant new features at the right edge of the graph (ie inside the red box). These features are relatively unimportant and therefore deliver minimal differential advantage.

The challenge for users is that there is a huge amount of relatively worthless functionality that they have to navigate around. This tends to make the user interfaces non-intuitive.

In a previous post, we mentioned that it’s the services wrapper where OSS suppliers have the potential to differentiate.

But another approach, a product-led differentiator, dawned on me when discussing the many sources of OSS friction in yesterday’s post. What if we asked our product teams to focus on designing solutions that remove friction instead of the typical approach of adding features (and complexity)?

Almost every OSS I’m aware of has many areas of friction. It’s what gives the OSS industry a bad name. But what if one vendor reduced friction to levels far below any other competitor? Would it be a differentiator? I’m quite certain customers would be lining up to buy a frictionless OSS even if it didn’t have every conceivable feature.

But can it work? What do you think?

Is your OSS squeaking like an un-oiled bearing?

Network operators spend huge amounts on building and maintaining their OSS/BSS every year. There are many reasons they invest so heavily, but in most cases it can be distilled back to one thing – improving operational efficiency.

And our OSS/BSS definitely do improve operational efficiency, but there are still so many sources of friction. They’re squeaking like un-oiled bearings. Here are just a few of the common sources:

  1. First-time Installation
  2. Identifying best-fit tools
  3. Procurement of new tools
  4. Update / release processes
  5. Continuous data quality / consistency improvement
  6. Navigating to all features through the user interface
  7. Non-intuitive functionality / processes
  8. So many variants / complexity that end-users take years to attain expert-level capability
  9. Integration / interconnect
  10. Getting new starters up to speed
  11. Getting proficient operators to expertise
  12. Unlocking actionable insights from huge data piles
  13. Resolving the root-cause of complex faults
  14. Onboarding new customers
  15. Productionising new functionality
  16. Exception and fallout handling
  17. Access to supplier expertise to resolve challenges

The list goes far deeper than that too. The challenge for many OSS product teams, for any number of reasons, is that their focus is on adding new features rather than reducing friction in what already exists.

The next challenge is diagnosing where the friction and risks are for their customers / stakeholders. How do you get that feedback?

  • Every vendor has a product support team, so that’s a useful place to start, both in terms of what’s generating the most support calls and in terms of first-hand feedback from customers
  • Do you hold user forums on a regular basis, where you get many of your customers together to discuss their challenges, your future roadmap and new improvements / features?
  • Does your process “flow” data show where the sticking points are for operators?
  • Do you conduct gemba walks with your customers?
  • Do you have a program to ensure all developers spend at least a few days a year interacting directly with customers on their site/s?
  • Do you observe areas of difficulty when delivering training?
  • Do you go out of your way to ask your customers / stakeholders questions that are framed around their pain-points, not just framed within the context of your existing OSS?
  • Do you conduct customer surveys? More importantly, do you conduct surveys through an independent third-party?

On the last dot-point, I’ve been surprised at some of the profound insights end-users have shared with me when I’ve been conducting these reviews as the independent interviewer. I’ve tended to find answers are more open / honest when being delivered to an independent third-party than if the supplier asks directly. If you’d like assistance running a third-party review, leave us a note on the contact page. We’d be delighted to assist.

Give me a fast OSS and I might ask you to slooooow doooown

The traditional telco (and OSS) ran at different speeds. Some tasks had to happen immediately (eg customers calling one another) while others took time (eg getting a connection to a customer’s home, which included designs, approvals, builds, etc), often weeks.

Our OSS have processes that must happen sequentially and expediently. They also have processes that must wait for dependencies, conditional events and time delays. Some roles need “fast,” others can cope with “slow.” Who wins out in this dilemma?

Even the data we rely on can transact at different speeds. For capacity planning, we’re generally interested in longer-term data. We don’t have to process it in real time. Therefore we can choose to batch process at longer cycle times and with summarised data sets. For network assurance, we’re generally interested in getting data as quickly as is viable.

Today’s post is about that word, viable, and the pragmatism we sometimes have to apply to our OSS.

For example, if our operations teams want to reduce network performance poll cycles from every 15 mins down to once a minute, we increase the amount of data to process by 15x. That means our data storage costs go up by 15x (assuming a flat-rate cost structure applies). The other hidden cost is that our compute and network costs also go up because we have to transfer and process 15x as much data.
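The arithmetic is easy to sketch in Python. This example assumes a hypothetical network of one million performance counters, each costing 100 bytes per stored sample. Only the 15x ratio comes from the discussion above; the absolute figures are purely illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def samples_per_day(poll_interval_min: int) -> int:
    # Polls per counter per day; assumes the interval divides a day evenly
    if MINUTES_PER_DAY % poll_interval_min:
        raise ValueError("poll interval should divide 1440 minutes evenly")
    return MINUTES_PER_DAY // poll_interval_min


def daily_volume_gb(counters: int, poll_interval_min: int, bytes_per_sample: int) -> float:
    # Raw storage per day; under a flat-rate cost model, cost scales the same way
    total_bytes = counters * samples_per_day(poll_interval_min) * bytes_per_sample
    return total_bytes / 1e9


# Hypothetical figures: one million counters, 100 bytes stored per sample
before = daily_volume_gb(1_000_000, 15, 100)
after = daily_volume_gb(1_000_000, 1, 100)
ratio = samples_per_day(1) // samples_per_day(15)
print(before, after, ratio)  # 9.6 144.0 15
```

The same multiplier applies to the bytes moved across the network and the records pushed through processing, which is where the hidden compute and network costs come from.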

The trade-off we have to make in response to this rapid escalation of cost (when going from 15 to 1 min) is in the benefits we might derive. Can we avoid SLA (Service Level Agreement) breach costs? Can we avoid costly outages? Can we avoid damage to equipment? Can we reduce the risk of losing our carrier license?

The other question is whether our operators actually have the ability to respond to 15x as much data. Do we have enough people to respond at a shorter cycle time? Do we have OSS tools that are capable of filtering what’s important and disregarding “background” activity? Do we have OSS tools that are capable of learning from every single metric (eg AI), at volumes the human brain could never cope with?

Does it make sense that we have a single platform for handling fast and slow processes? For example, do we use the same platform to process 1 minute-cycle performance data for long-term planning (batch-processed once daily) and quick-fire assurance (processed as fast as possible)?

If we stick to one platform, can our OSS apply data reduction techniques (eg selective discard of records) to get the benefits of speed, but with the cost reduction of slow?
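As a rough illustration of the single-platform option, here’s a Python sketch of two common data reduction techniques. It assumes samples arrive as `(minute, value)` pairs. `deadband_filter` is a report-by-exception form of selective discard: it keeps a sample only when the value moves meaningfully. `rollup` summarises 1-minute data into 15-minute averages for planning. Both function names and the threshold are hypothetical.

```python
from statistics import mean


def deadband_filter(samples, threshold):
    """Keep a (minute, value) sample only when it moves more than `threshold`
    away from the last kept value. The first sample is always kept."""
    kept = []
    for minute, value in samples:
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((minute, value))
    return kept


def rollup(samples, window_min=15):
    """Summarise 1-minute samples into per-window averages, keyed by the
    starting minute of each window, for longer-term planning."""
    buckets = {}
    for minute, value in samples:
        start = minute // window_min * window_min
        buckets.setdefault(start, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}
```

In this sketch the rollup should run on the raw 1-minute feed rather than the deadband-filtered one, since the discarded records would otherwise skew the planning averages.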

Would you hire a furniture maker as an OSS CEO?

Well, would you hire a furniture maker as CEO of an OSS vendor?

At face value, it would seem to be an odd selection, right? There doesn’t seem to be much commonality between furniture and OSS, does there? It seems about as likely as hiring a furniture maker to be CEO of a car maker.

Oh wait. That did happen.

Ford Motor Company made just such a decision last year when appointing Jim Hackett, a furniture industry veteran, as its CEO. Whether the appointment proves successful or not, it’s interesting that Ford made the decision. But why? To focus on user experience and design as its next big differentiator. Clever line of thinking, Bill Ford!!

I’ve prepared a slightly light-hearted table for comparison purposes between cars and OSS. Both are worth comparing as they’re both complex feats of human engineering:

Idx | Comparison criteria | Car | OSS
1 | Primary objective | Transport passengers between destinations | Operationalise and monetise a comms network
2 | Claimed “Business” justification | Personal freedom | Reducing the cost of operations
3 | Operation of common functionality without conscious thought (developed through years of operator practice) | Steering; changing gears; indicating | Hmmm??? Depends on which sales person or operator you speak with
4 | Error detection and current-state monitoring | Warning lights and instrument cluster/s | Alarm lists, performance graphs
5 | Key differentiator for customers (1970’s) | Engine size | Database / CPU size
6 | Key differentiator for customers (2000’s) | Gadgets / functions / cup-holders | Functionality
7 | Key differentiator for customers (2020+) | User Experience; self-driving; connected car (car as an “experience platform”) | User Experience??; zero-touch assurance?; connected OSS (ie OSS as an experience platform)???

I’d like to focus on three key areas next:

  1. Item 3
  2. Item 4 and
  3. The transition between items 6 and 7

Item 3 – operating on auto-pilot

If we reference against item 1, the primary objective, experienced operators of cars can navigate from point A to point B with little conscious thought. Key activities such as steering, changing gears and indicating can be done almost as a background task by our brains whilst doing other mental processing (talking, thinking, listening to podcasts, etc).

Experienced operators of OSS can do primary objectives quickly, but probably not on auto-pilot. There are too many “levers” to pull, too many decisions to make, too many options to choose from, for operators to background-process key OSS activities. The question is, could we re-architect to achieve key objectives more as background processing tasks?

Item 4 – error detection and monitoring

In a car, error detection is also a background task: operators are rarely notified, and then only for critical alerts (eg engine light, fuel tank empty, etc). In an OSS, error detection is not a background task. We need full-time staff monitoring all the alarms and alerts popping up on our consoles! Sometimes they scroll off the page too fast for us to even contemplate.

In a car, monitoring is kept to the bare essentials (speedo, tacho, fuel gauge, etc). In an OSS, we tend to be great at information overload – we have a billion graphs and are never sure which ones, or which thresholds, actually allow us to operate our “vehicle” effectively. So we show them all.
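The “warning lights” idea can be expressed as a simple filter. The sketch below applies two steps. First, only alarms at or above a threshold reach the operator, while everything else stays in the background for automation or later analysis. Second, repeated alarms are collapsed into one line with a count so they don't scroll off the console. The severity scale, the threshold and the alarm fields are hypothetical and are not taken from any particular OSS.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordering; real alarm models define their own severity sets.
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Alarm:
    source: str
    text: str
    severity: Severity


def split_foreground(alarms: list[Alarm], threshold: Severity = Severity.CRITICAL):
    """Only alarms at or above the threshold reach the operator ("warning lights");
    everything else is kept as background data for automation and later analysis."""
    foreground = [a for a in alarms if a.severity >= threshold]
    background = [a for a in alarms if a.severity < threshold]
    return foreground, background


def collapse(alarms: list[Alarm]) -> list[tuple[str, str, Severity, int]]:
    """Collapse repeats into one line with a count, so the console doesn't scroll away."""
    counts = Counter((a.source, a.text, a.severity) for a in alarms)
    return [(src, text, sev, n) for (src, text, sev), n in counts.items()]
```

The hard part in practice is not the filter. It is choosing the threshold and the bare-essential metrics so that the operator can still drive the “vehicle” safely.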

Transitioning from current to future-state differentiators

In cars, we’ve finally reached peak-cup-holders. Manufacturers know they can no longer differentiate from competitors just by having more cup-holders (at least, I think this claim is true). They’ve also realised that even entry-level cars have an astounding list of features that are only supplementary to the primary objective (see item 1). They now know it’s not the amount of functionality, but how seamlessly and intuitively the users interact with the vehicle on end-to-end tasks. The car is now seen as an extension of the user’s phone, rather than vice versa as it was in the recent past.

In OSS, I’ve yet to see a single cup holder (apart from the old gag about CD trays). Vendors, mark that down: cup holders could be a good differentiator. But seriously, I’m not sure if we realise the OSS arms race of features is no longer the differentiator. Intuitive end-to-end user experience can be a huge differentiator amongst the sea of complex designs, user interfaces and processes available currently. But nobody seems to be talking about this. Go to any OSS event and we only hear from engineers talking about features. Where are the UX experts talking about innovative new ways for users to interact with machines to achieve primary objectives (see item 1)?

But a functionality arms race isn’t a completely dead differentiator. In cars, there is a horizon of next-level features that can be true differentiators like self-driving or hover-cars. Likewise in OSS, incremental functionality increases aren’t differentiators. However, any vendor that can not just discuss, but actually produce, next-level capabilities like zero touch assurance (ZTA) and automated O2A (Order to Activate) will definitely hold a competitive advantage.

Hat tip to Jerry Useem, whose article in The Atlantic provided the idea seed for this OSS post.

The no accounts receivable OSS model

Unfortunately for OSS vendors / integrators, their business models have a dependency (and major risk) on accounts receivable.

Investopedia states, “Accounts receivable are amounts of money owed by customers to another entity for goods or services delivered or used on credit but not yet paid for by clients.”

One of the earliest OSS projects I worked on was worth in excess of $30m for the vendor. It was a multi-year implementation. Two years in, they’d only received the initial mobilisation payment. With implementation costs blowing out, it was proving to be a major challenge for the company to continue operating.

The team had delivered a majority of the functionality written into the contract, as well as many other features negotiated in-flight. It was successfully being used in production, helping to deliver revenues to the customer. Unfortunately for the vendor, there was some key functionality that was still a way off being delivered. That meant contractual objectives hadn’t all lined up for payments to occur.

The balance of financial power was definitely in the hands of the customer.

Whether it’s in a large, complex implementation or ongoing license fees, accounts receivable can be the bane of OSS vendors.

That’s why I try to establish a no accounts receivable model for OSS vendors. That means up-front payment, but as the example below shows, it also means up-front value needs to be delivered. It’s one of the attractive aspects of cloud-delivery business models.

The project I mentioned above had a product suite that worked out of the box, but only delivered value after features, data, integrations and automations were custom built… over a period of years.

So a couple of questions for the OSS vendors out there:

  1. How to deliver value, not just functionality, early in a project and then ongoing through the product lifecycle?
  2. How to give the customer enough confidence that they’ll receive up-front (and recurring) value that they’re prepared to pay up-front (and recurring)?

Leave me a comment below if accounts receivable is a bane of your organisation’s existence or whether you’ve found a way to have less reliance on AR.

What are OSS “platform wrapper” roadblocks?

OSS can be cumbersome at times. Making change can be difficult. We tend to build layers of protections around them and the networks we manage. I get that. Change can be risky (although the protections are often implemented because the OSS and/or network platforms might not be as robust as they could be).

Contrast this with the OSS we want to create. We want to create a platform for rapid innovation, the platform that helps us and our clients generate opportunities and advantages.

For us to build a platform that allows our customers (and their customers) to revolutionise their markets, we might have to consider whether the protective layers around our OSS are stymying change. Things like firewall burns, change review boards, documentation, approvals, politics, individuals with a reticence to change, etc.

For example, Netflix takes a contrarian, whitelist approach to access by its engineers rather than a blacklist. It assumes that its engineers are professional enough to only use the tools that they need to get their tasks done. They enable their engineers to use commonly off-limits functionality such as adding their own DNS records (ie to support the stand-up of new infrastructure). But they also take a use-it-or-lose-it approach, monitoring the tools that the engineer uses and rescinding access to tools they haven’t used within 90 days. But if they do need access again, it’s as simple as a message on Slack to reinstate it.
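The use-it-or-lose-it rule can be sketched in a few lines. This is not Netflix’s tooling. It is a hypothetical illustration that assumes each engineer’s grants are tracked as a map from tool name to last-used time, with the grant time standing in for tools that have never been used. Only the 90-day window comes from the text above.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(days=90)  # the 90-day window described above


def tools_to_revoke(last_used: dict[str, datetime], now: datetime,
                    idle_limit: timedelta = IDLE_LIMIT) -> list[str]:
    """Return tools the engineer hasn't used within the idle limit.

    last_used maps tool name -> last time it was used (or granted, if never used).
    """
    return sorted(tool for tool, ts in last_used.items() if now - ts > idle_limit)


def revoke(last_used: dict[str, datetime], now: datetime) -> list[str]:
    """Remove idle grants and report which ones were taken away."""
    idle = tools_to_revoke(last_used, now)
    for tool in idle:
        del last_used[tool]
    return idle


def reinstate(last_used: dict[str, datetime], tool: str, now: datetime) -> None:
    """The 'message on Slack' path: re-grant access and reset the idle clock."""
    last_used[tool] = now
```

Because reinstating access is cheap, the default grant can be generous while the standing attack surface stays small.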

This is just one small example of streamlining the platform wrapper. There are probably a million others.

When working on OSS projects as the integrator / installer, I’ve seen many of these “platform wrapper” roadblocks. I’m sure you have too. If you see them as the installer, chances are the ops team you hand over to will also experience these roadblocks.

Question though. Do you flag these platform wrapper roadblocks for improvement, or do you treat them as non-platform and therefore just live with them?

Only do the OSS that only you can do

A friend of mine has a great saying, “only do what only you can do.”

Do you think that this holds true for the companies undergoing digital transformation? Banks are now IT companies. Insurers are IT companies. Car manufacturers are now IT companies. Telcos are, well, some are IT companies.

We’ve spoken before about the skill transformations that need to happen within telcos if they’re to become IT companies. Some are actively helping their workforce to become more developer-centric. Some of the big telcos that I’ve been assisting in the last few years are embarking on bold Agile-led IT transformations. They’re cutting more of their own code and managing their own IT developments.

That’s exciting news for all of us in OSS. Even if it loses the name OSS in future, telcos will still need software that efficiently operationalises their networks. We have the overlapping skills in software, networks, business and operations.

But I wonder about the longevity of the in-house approach unless we focus clearly on the quote above. If all development is brought in-house, we end up with a lot of duplication across the industry. I’m not really sure that it makes sense doing all the heavy-lifting for every custom OSS tool when that heavy-lifting has already been done elsewhere.

It’s the old ebb and flow between in-house and outsourced OSS.

In my very humble opinion, it’s not just a choice between in-house and outsourced that matters. The more important decisions are around choosing to only develop the tools in-house that only you can do (ie the strategic differentiators).

A single glass of pain or single pane of glass??

Is your OSS a single pane of glass, or a single glass of pain?

You can tell I’m being a little flippant here. People often (perhaps idealistically) talk about OSS as being the single pane of glass (SPOG) to manage a network.

I say “idealistically” for a couple of reasons:

  1. There are usually many personas who interact with an OSS, each with vastly different user interface (UI) needs
  2. There is usually more than one OSS product in a client’s OSS suite, often from different vendors, with varying levels of integration

Where a single pane of glass can be a true ambition is as a consolidated health-status dashboard / portal. Invariably, this portal is used by executive / leader / manager personas who want to quickly see a single-screen health status that covers all networks and/or parts of the OSS suite. When things go wrong, this portal becomes the single glass of pain.

These single panes tend to be heavily customised for each organisation as every one has a unique set of metrics-that-matter. For those designing these panes, the key is to not just include vanity metrics, but to show information that the leader can action.
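A common way to produce the single-screen status is a worst-of roll-up. The sketch below is one way to do it, and it also returns the domains responsible so the leader has something to act on rather than just a colour. The three-level health scale and the domain names are hypothetical.

```python
from enum import IntEnum


class Health(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2


def rollup(domain_health: dict[str, Health]) -> tuple[Health, list[str]]:
    """Worst-of roll-up for a single-screen status, plus the domains causing it.

    Returning the culprits (not just a colour) gives the leader something to act on.
    An empty input is treated as GREEN here; a real dashboard might flag it as unknown.
    """
    if not domain_health:
        return Health.GREEN, []
    worst = max(domain_health.values())
    if worst == Health.GREEN:
        return worst, []
    culprits = sorted(d for d, h in domain_health.items() if h == worst)
    return worst, culprits
```

A worst-of rule is simple and conservative. The trade-off is that one noisy domain can keep the whole pane red, which is exactly when it turns into the single glass of pain.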

But the interesting perspective here is whether the single glass of pain is even relevant within your organisation’s culture. It’s just my opinion, but I prefer for coal-face workers to be empowered to make rapid recovery actions rather than requiring direction from up high in the org-chart. Coal-face workers generally have different tools with UIs that *should* help them monitor, manage and repair super-efficiently.

To get back to the “idealistic” comment above, each OSS UI needs to be fit-for-purpose for each unique persona (eg designers, product owners, network operations, etc). To me this implies that there is no single pane of glass…

I should caveat that by citing the example of an OSS search interface, something I’ve yet to see in OSS… although that’s just a front end to dozens of persona-specific panes of glass.

An OSS without the shackles of topology

It’s been nearly two decades since I designed my first root-cause analysis (RCA) rule. It was completely reliant on network topology – more specifically, it relied on a network hierarchy to determine which alarms could be suppressed.
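A hierarchy-based suppression rule of that older kind can be sketched as follows. An alarmed node is treated as a symptom if any of its ancestors is also alarmed, and the remaining alarmed nodes are the probable root causes. The node names and the parent map are hypothetical. The sketch assumes the topology is a tree with no cycles, which is precisely the dependency on inventory data quality discussed below.

```python
from typing import Optional


def probable_root_causes(alarmed: set[str], parent_of: dict[str, Optional[str]]) -> set[str]:
    """Hierarchy-based suppression: an alarmed node is suppressed if any ancestor
    is also alarmed; the remaining alarmed nodes are the probable root causes.

    Assumes parent_of describes a tree (no cycles). Its output is only as good
    as the topology data it is given.
    """
    roots = set()
    for node in alarmed:
        ancestor = parent_of.get(node)
        while ancestor is not None and ancestor not in alarmed:
            ancestor = parent_of.get(ancestor)
        if ancestor is None:  # walked to the top without meeting an alarmed ancestor
            roots.add(node)
    return roots
```

If the parent map is wrong, for example because a DSLAM is recorded under the wrong aggregation switch, the rule silently suppresses the wrong alarms.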

I had a really interesting discussion today with some colleagues who are using much more modern RCA techniques. I was somewhat surprised, but not surprised at all in hindsight, that their Machine Learning engine doesn’t even use topology data. It just looks at events and tries to identify patterns.
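To illustrate how patterns can emerge from events alone, the sketch below uses a deliberately simple stand-in: co-occurrence counting of alarm types within fixed time windows. It is not the colleagues’ ML engine, and real engines use far richer techniques. The window length, the support threshold and the alarm type names are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(events: list[tuple[int, str]], window_s: int = 60,
                      min_support: int = 3) -> dict[tuple[str, str], int]:
    """Count pairs of alarm types that fire in the same time window.

    events: (timestamp_seconds, alarm_type). No topology is consulted; a pair
    becomes a candidate pattern once it has been seen in at least min_support windows.
    Fixed windows can split a burst across a boundary; sliding windows would be
    more robust at higher cost.
    """
    windows: dict[int, set[str]] = {}
    for ts, alarm_type in events:
        windows.setdefault(ts // window_s, set()).add(alarm_type)
    counts: Counter = Counter()
    for types in windows.values():
        for pair in combinations(sorted(types), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Each returned pair is only a candidate. A human still has to confirm it is a genuine cause-and-symptom relationship before it is codified, which mirrors the confirmation bottleneck described below.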

That’s a really interesting insight that hadn’t dawned on me before. But it’s an exciting one because it effectively unshackles our fault management tools from data quality perfection in our inventory / asset databases. It also possibly lessens the need for integrations that share topological data.

Equally interesting, the ML engine had identified over 4,000 patterns, but only a dozen had been codified and put into use so far. In other words, the machine was learning, but humans still needed to get involved in the process to confirm that the machine had learned correctly.

Makes me wonder whether the ML pre-seeding technique we discussed in an earlier post might actually be useful for confirmations at a greater scale than the team had achieved with 12 of 4000+ to date.

The standard approach is to let the ML loose and identify patterns. This is the reactive approach. The ML reacts to the alarms that are pushed up from the network. It looks at alarms and determines what the root cause is based on historical data. A human then has to check that the root cause is correct by reverse engineering the alarm stream (just like a network operator used to do before RCA tools came along) and comparing. If the comparison is successful, the person then approves this pattern.

My proposed alternate approach is the proactive method. If we proactively trigger a fault (e.g. pull a patch lead, take a port down, etc), we start from a position of already knowing what the root cause is. This has three benefits:
1. We can check if the ML’s interpretation of root cause is right
2. We’ve proactively seeded the ML’s data with this root cause example
3. We categorically know what the root cause is, unlike the reactive mode which only assumes the operator has correctly diagnosed the root cause
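The proactive method can be framed as a small scoring loop. This is a minimal sketch that assumes each deliberate fault has an identifier and a known root cause, and that the ML engine can be queried for its diagnosis. The `predict` callable is a hypothetical stand-in for that engine, not a real API.

```python
from typing import Callable, Optional


def run_injection_campaign(
    injections: list[tuple[str, str]],
    predict: Callable[[str], Optional[str]],
) -> tuple[list[tuple[str, str]], list[tuple[str, str, Optional[str]]], float]:
    """Score an ML engine against faults we triggered ourselves.

    injections: (fault_id, known_root_cause) for each deliberate fault.
    predict: stand-in for the ML engine; returns its root cause for a fault_id.
    Returns (labelled examples to seed the ML, mismatches to review, hit rate).
    """
    labelled = []
    mismatches = []
    for fault_id, known_cause in injections:
        predicted = predict(fault_id)
        # Benefits 2 and 3: the label is ground truth because we caused the fault.
        labelled.append((fault_id, known_cause))
        # Benefit 1: check the ML's interpretation against the known cause.
        if predicted != known_cause:
            mismatches.append((fault_id, known_cause, predicted))
    hit_rate = 1 - len(mismatches) / len(injections) if injections else 0.0
    return labelled, mismatches, hit_rate
```

Only the mismatches need human attention. That is where the approach could scale confirmations beyond reverse-engineering each alarm stream by hand.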

Then we just have to figure out a whole bunch of proactive failures to test safely. Where to start? Well, I’d speak with the NOC operators to find out what their most common root causes are and look to trigger those events.

More tomorrow on intentionally triggering failures in production systems.

I sent you an OSS helicopter

“There’s a fable of a man stuck in a flood. Convinced that God is going to save him, he says no to a passing canoe, boat, and helicopter that offer to help. He dies, and in heaven asks God why He didn’t save him. God says, ‘I sent you a canoe, a boat, and a helicopter!’
We all have vivid imaginations. We get a goal in our mind and picture the path so clearly. Then it’s hard to stop focusing on that vivid image, to see what else could work.
New technologies make old things easier, and new things possible. That’s why you need to re-evaluate your old dreams to see if new means have come along.”
Derek Sivers, here.

In the past, we could make OSS platform decisions with reasonable confidence that our choices would remain viable for many years. For example, in the 1990s if we decided to build our OSS around a particular brand of relational database then it probably remained a valid choice until after 2010.

But today, there are so many more platforms to choose from, not to mention the technologies that underpin them. And it’s not just the choices currently available but the speed with which new technologies are disrupting the existing tech. In the 1990s, it was a safe bet to use AutoCAD for outside plant visualisation without the risk of heavy re-tooling within a short timeframe.

If making the same decision today, the choices are far less clear-cut. And the risk that your choice will be obsolete within a year or two has skyrocketed.

With the proliferation of open-source projects, the decision has become harder again. It also means the skill-base required to service each project has spread thinner. In turn, decisions for big investments like OSS projects are based more on the critical mass of developers than on the functionality available today. If many organisations and individuals have bought into a particular project, you’re more likely to get your new features developed there than with a technically better open-source project that has less community buy-in.

We end up with two ends of a continuum to choose between. We can either chase every new bright shiny object and re-factor for each, or we can plan a course of action and stick to it even if it becomes increasingly obsoleted over time. The reality is that we probably fit somewhere between the two ends of the spectrum.

To be brutally honest I don’t have a solution to this conundrum. The closest technique I can suggest is to design your solution with modularity in mind, as opposed to the monolithic OSS of the past. That’s the small-grid OSS architecture model. It’s easier to replace one building than an entire city.
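One concrete form of that modularity is to have the rest of the OSS depend on a narrow interface rather than on a specific product. The sketch below uses Python’s `typing.Protocol` for this. The `InventoryStore` interface and both backends are hypothetical. The point is that `relocate` keeps working unchanged when one backend is swapped for the other.

```python
from typing import Optional, Protocol


class InventoryStore(Protocol):
    """The contract other modules depend on (hypothetical interface)."""
    def get_location(self, asset_id: str) -> Optional[str]: ...
    def set_location(self, asset_id: str, location: str) -> None: ...


class InMemoryInventory:
    """A newer backend keyed by asset id."""
    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def get_location(self, asset_id: str) -> Optional[str]:
        return self._locations.get(asset_id)

    def set_location(self, asset_id: str, location: str) -> None:
        self._locations[asset_id] = location


class LegacyRowInventory:
    """Stand-in for an older backend that appends rows; latest row wins."""
    def __init__(self) -> None:
        self._rows: list[tuple[str, str]] = []

    def get_location(self, asset_id: str) -> Optional[str]:
        for aid, loc in reversed(self._rows):
            if aid == asset_id:
                return loc
        return None

    def set_location(self, asset_id: str, location: str) -> None:
        self._rows.append((asset_id, location))


def relocate(store: InventoryStore, asset_id: str, new_location: str) -> Optional[str]:
    """Business logic written only against the interface; returns the previous location."""
    previous = store.get_location(asset_id)
    store.set_location(asset_id, new_location)
    return previous
```

Replacing a building then means writing one new class that honours the contract, rather than re-tooling every module that touches inventory.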

Life-cycles of key platforms are likely to now be a few years at best (rather than decades if starting in the 1990s). Hence, we need to limit complexity (as per the triple-constraint of OSS) and functionality to support the most high-value objectives.

I’m sure you face the same conundrums on a regular basis. Please leave a comment below to tell us how you overcome them.

Mythical OSS beasts – feature removal releases

“Life can be improved by adding, or by subtracting. The world pushes us to add, because that benefits them. But the secret is to focus on subtracting…

No amount of adding will get me where I want to be. The adding mindset is deeply ingrained. It’s easy to think I need something else. It’s hard to look instead at what to remove.

The least successful people I know run in conflicting directions, drawn to distractions, say yes to almost everything, and are chained to emotional obstacles.

The most successful people I know have a narrow focus, protect against time-wasters, say no to almost everything, and have let go of old limiting beliefs.”
Derek Sivers, here.

I’m really curious here. Have you ever heard of an OSS product team removing a feature? Nope?? Me either!

I’ve seen products re-factored, resulting in changes to features. I’ve also seen products obsoleted and their replacements not offer all of the same features. But what about a version upgrade to an existing OSS product that has features subtracted? That never happens does it?? The adding mindset is deeply ingrained.

So let’s say we do want to go on a subtraction drive and remove some of the clutter from our OSS. I know plenty of OSS GUIs where subtraction is desperately needed BTW! But how do we know what to remove?

I have no data to back this up, but I would guess that almost every OSS would have certain functions that are not used, by any of their customers, in a whole year. That functionality was probably built for a specific use-case for a specific customer that no longer has relevance. Perhaps for a service type that is no longer desired by the market or a network type that will never be used again.

Question is, does your OSS have profiling instrumentation that allows you to measure what functionality is and isn’t used across your whole client base?

Can your products team readily produce a usage profile graph like the following that shows a list of functions (x-axis) by the number of times each function is used (y-axis) in a given time window? Per client? Across all clients?
Long-tail of OSS functionality use
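The data behind such a long-tail graph could be produced along the lines of the sketch below. It assumes the product already emits one hypothetical `(client_id, function_name)` event per use and that the full list of functions is known, so that zero-use functions show up in the profile instead of silently disappearing. The `retain` set reflects the caveat that rarely-used functions such as data loaders may still need to stay.

```python
from collections import Counter
from typing import Iterable, Optional


def usage_profile(events: Iterable[tuple[str, str]], all_functions: list[str],
                  client: Optional[str] = None) -> list[tuple[str, int]]:
    """Usage counts per function, most-used first, including functions never used.

    events: (client_id, function_name) pairs captured by usage instrumentation.
    client: restrict to one client, or None for all clients.
    """
    counts = Counter(fn for cid, fn in events if client is None or cid == client)
    return sorted(((fn, counts[fn]) for fn in all_functions), key=lambda item: (-item[1], item[0]))


def removal_candidates(profile: list[tuple[str, int]], retain: frozenset = frozenset()) -> list[str]:
    """Zero-use functions, minus those deliberately kept (eg rarely-used data loaders)."""
    return [fn for fn, n in profile if n == 0 and fn not in retain]
```

Running the same profile per client and across all clients separates “nobody uses this” from “only one customer uses this”, which are very different removal conversations.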

Leave us a comment below if you’ve ever seen this type of profiling instrumentation (not for code optimisation, but for identifying client utilisation levels) and/or systematic feature subtraction initiatives.

BTW. I should make the distinction that just because a function hasn’t been used in a while, doesn’t mean it should automatically be removed. Some functionality (eg data loaders) might be rarely used, but important to retain.

The use of drones by OSS

The last few days have been all about organisational structuring to support OSS and digital transformations. Today we take a different tack – a more technical diversion – onto how drones might be relevant to the field of OSS.

A friend recently asked for help to look into the use of drones in his archaeological business. This got me thinking about how they might apply in cross-over with OSS.

I know they’re already used to perform really accurate 3D cable route / corridor surveying. Much cooler than the surveyor diagrams on A1 sheets from the old days. Apparently experts in the field can even tell if there’s rock in the surveyed area by looking at the vegetation patterns, heat and LIDAR scans.

But my main area of interest is in the physical inventory. With accurate geo-tagging available on drones and the ability to GPS correct the data, it seems like a really useful technique for getting outside plant (OSP) data into OSS inventory systems. Or geo-correcting data for brownfields assets.
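Geo-correcting brownfields assets could start with a simple comparison, sketched below. It computes the great-circle (haversine) distance between each asset’s inventory coordinates and its drone-surveyed coordinates, then flags assets that differ by more than a tolerance. The asset ids, the 5 m tolerance and the matching of survey points to asset ids are all assumptions. In practice, matching survey points to assets is likely to be the hard part.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_for_correction(inventory: dict[str, tuple[float, float]],
                        survey: dict[str, tuple[float, float]],
                        tolerance_m: float = 5.0) -> dict[str, float]:
    """Return {asset_id: offset_m} for assets whose surveyed position is off by more
    than tolerance_m. Assets missing from the survey are skipped."""
    flagged = {}
    for asset_id, (lat, lon) in inventory.items():
        if asset_id not in survey:
            continue
        s_lat, s_lon = survey[asset_id]
        offset = haversine_m(lat, lon, s_lat, s_lon)
        if offset > tolerance_m:
            flagged[asset_id] = round(offset, 1)
    return flagged
```

Flagged assets can then be reviewed before the inventory record is overwritten, rather than trusting every survey point blindly.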
Drone-based cable corridor surveys
Have you heard of drone-based OSP asset identification and mapping data being fed into inventory systems yet? I haven’t, but it seems like the logical next step. Do you know anyone who has started to dabble in this type of work? If you do, please send me a note as I’d love to be introduced.

Once loaded into the inventory system, with 3D geo-location, we then have the ability to visualise the OSP data with augmented reality solutions.

Any other applications for drone technology?

OSS orgitecture

So far this week we’ve been focusing on ways to improve the OSS transformation process. Monday provided 7 models for achieving startup-like efficiency for larger OSS transformations. Tuesday provided suggestions for speeding up the transition from OSS PoC to getting the solution into production, specifically strategies for absorbing an OSS PoC into production.

Both of these posts talk about the speed of getting things done outside the bureaucracy of big operators, big networks and big OSS. Today, as the post title suggests, we’re going to look at orgitecture – how re-designing the structure and culture of an organisation can help streamline digital transformations.

Do you agree with the premise that smaller entities (eg Agile autonomous groups, partners, consultants, etc) can get OSS tasks done more efficiently when operating at arm’s length from the larger entity (eg the carrier)? I believe that this is a first principle of physics at play.

If you’ve worked under this arm’s-length arrangement in the past, you’ll also know that at some point those delivery outcomes need to get integrated back into the big entity. It’s what we referred to yesterday as absorption, where the level of integration effort falls on a continuum from minimally absorbed to fully absorbed.

OSS orgitecture is the re-architecture of the people, processes, culture and org structure to better allow for the absorption process. In the past, all the safety-checks (eg security, approvals, ops handover, etc) were designed on the assumption that internal teams were doing the work. They’re not always a great fit, especially when it comes to documentation review and approval.

For example, I have a belief that the effectiveness of documentation review and approval is inversely proportional to the number of reviewers (in most, but not all cases). Unfortunately, when an external entity is delivering, there tends to be inherently less trust than if an internal entity was delivering. As such, the safety-checks increase.

Another example is when the large organisation uses Agile delivery models, but uses supply partners to deliver scopes of work. The partners are able to assign effort in a sequential / waterfall manner, but can be delayed by only getting timeslices of attention from the client’s staff (ie resources are available according to Agile sprint planning).

Security and cutover planning mechanisms such as Change Review Boards (CRB) have also been designed around old internal delivery models. They also need to be reconsidered to facilitate a pipeline of externally-implemented change.

Perhaps the biggest orgitecture factor is in getting multiple internal business units to work together effectively. In the old world we needed all the business units to reach consensus for a new product to come to market. Sales/Marketing/Products had to work with OSS/IT and Networks. Each of these units tends to have vastly different cultures and different cadences for getting their tasks done. Delivering a new product was as much an organisational challenge as it was a technical challenge and often took months. Those times-to-market are not feasible in a world of software where competitive advantages are fleeting. External entities can potentially help or hinder these timeframes. Careful design of small autonomous teams has the potential to improve abstraction at the interlocks, but culture remains the potential roadblock.

I’m excited by the opportunity for OSS delivery improvement coming from leveraging the gig economy. But if big OSS transformations are to make use of these efficiency gains, then we may also need to consider culture and process refinement as part of the change management.