The biggest OSS loser

“You are so much more likely to put effort into something when you know whether it will pay off and what the gains will be. Not knowing how things will turn out undermines your motivation and makes you delay taking action.”
Dr Theo Tsaousides
in his book, Brainblocks.

Have you seen the reality TV show, “The Biggest Loser?” I rarely watch TV, but have noticed that it’s been a runaway hit in the ratings here in Australia (and overseas apparently). Why has it been so successful and what does it have to do with OSS?

Well, according to Dr Tsaousides, the success of the show comes down to the obvious body-shape / fitness transformations each of the contestants makes over each season of the show. But more specifically, “You need to watch only one season from beginning to end and you will start craving to be a contestant on the show, regardless of your current weight… Seeing the people’s amazing transformation over a few months is a much more convincing way to start working out and eating well than being told by your doctor that you need to lose weight and about the cardiovascular advantages of exercise. Forecasting a positive outcome, especially when dealing with something new and unfamiliar, leads to action.”

Can you see how this might be a useful technique when planning an OSS transformation?

Change management is always a challenging task on any large OSS transformation. It’s always best to have the entire OSS user population involved in the change, but that’s not always feasible for large groups of users.

It’s one of the reasons I’m always a big advocate for getting a baseline, sandpit version of off-the-shelf OSS stood up and available for the user population to start interacting with. This is particularly helpful if the sandpit is perceptibly better than the current one.

To paraphrase, “Forecasting a positive outcome (via the OSS sandpit), especially when dealing with something new and unfamiliar (the future state after OSS transformation), leads to action (more excitement, engagement and less pushback from the user population during the course of the transformation).”

Do you think the biggest loser technique could work on your next OSS transformation?

Presence vs omni-presence and the green button of OSS design

In OSS there are some tasks that require availability (the green presence button on a communicator tool). The Network Operations Centre (NOC) is one. But does it require on-site presence in the NOC?

An earlier post showed how wrong I was about collaboration rooms. It seems that ticket flicking (and perhaps communication tools like Slack) is the preferred model. If this is the preferred model, then perhaps there is no need for a NOC… perhaps only a DR NOC (Disaster Recovery NOC).

“Truth is, there are hardly any good reasons to know if someone’s available or away at any given moment. If you truly need something from someone, ask them. If they respond, then you have what you needed. If they don’t, it’s not because they’re ignoring you – it’s because they’re busy. Respect that! Assume people are focused on their own work.
Are there exceptions? Of course. It might be good to know who’s around in a true emergency, but 1% occasions like that shouldn’t drive policy 99% of the time.”
Jason Fried on Signal v Noise

Customer service needs availability. But with a multitude of channels (for customers) and collaboration tools (for staff*), it decreasingly needs presence (except in retail outlets perhaps). You could easily argue that contact centres, online chat operators, etc don’t require presence, just availability.

The one area where I’m considering the paradox of presence is in OSS design / architecture. There are often many facets of a design that require multiple SMEs – OSS application, security, database, workflow, user-experience design, operations, IT, cloud etc.

When we get many clever SMEs in the one room, they often have so many ideas and so much expertise that the design process resembles an endless loop. Presence seems to inspire omnipresence (the need to show expertise across all facets of the design). Sometimes we achieve a lot in these design workshops. Sometimes we go around in circles almost entirely because of the cleverness of our experts. They come up with so many good ideas we end up in paralysis by analysis.

The idea I’m toying with is how to use the divide and conquer theory – carving up areas of responsibility and demarcation points so that each expert focuses on their own area. Each expert develops their best model within their black box of responsibility and connects it to adjacent demarcation points. The benefits are also the detriments. The true double-edged sword. The benefits are having one true expert work through the options within the black box. The detriments are having only one expert work through the options within the black box.

In hindsight, there are some past projects where I wish I’d tried the divide and conquer approach. In others, the collaboration model has worked extremely well.

But to get back to presence, I wonder whether thrashing up front to define black boxes and demarcation points then allows the experts to do their thing remotely and become less inclined to analyse and opine on everyone else’s areas of expertise.

* I use the term staff to represent anyone representing the organisation (staff, contractor, consultant, freelancer, etc)

Would an EoL be beneficial for OSS?

In the world of networking, it’s common for devices to go EOL (end-of-life). Capital spend and depreciation models are based around refresh cycles of around 5-7 years. Vendors reinforce this refresh cycle by designing obsolescence into maintenance, support and part supplies. Customers tend to simply submit to the risk of having no vendor support by buying the next generation replacements.

But how often do you hear of an OSS going EOL? Not often right? They tend to get written off only when the cost of upkeep outweighs new revenues.

I know, I can hear you saying that software is different from hardware and of course I agree with you. I’d partially counter by claiming that software architectures and development platforms also have a discernible useful life, just like physical network devices. If you doubt that, I’m sure you’ve seen OSS tools with origins in the 1990s that are still being developed upon. I tend to believe that product usefulness becomes asymptotic for its vendors. With the speed of change and proliferation of new platforms, useful lives are getting ever-shorter.

Would a pre-ordained product replacement life-cycle be beneficial for the OSS industry? It has some merits.

For a start, planned obsolescence enforces designs with interchangeability, in line with the small-grid OSS described yesterday. It promotes short-term enhancements to long-term visions. It becomes easier for customers to write off their investment and inject new capital into the vendor market. It penalises the Frankenstein integrations that tend to become increasingly burdensome (to vendor and customer) into the future. It enforces those mythical beasts of telco software – subtraction projects. It promotes innovation to avoid the asymptotic benefit deterioration curve shown below:
Asymptotic OSS feature development

As the asymptote is being reached, a new jumping-off point commences with the new product.

But it’s a difficult status-quo to break. Vendors have invested millions of developer hours into their products. Taking a product EoL is effectively throwing that invested effort away. For carriers, it means the risk and cost of breaking integrations / processes and replacing them with new ones.

I’d love to hear your thoughts on whether an EOL model might be relevant / useful for your OSS.

OSS – like a duck on a pond

Let’s start with a basic question. “What does an OSS need to do?”

The basic answer is, “make operations easier.”

The real answer(s) is so much more nuanced than that of course. The term easier can also encapsulate other words such as faster, more accurate, more repeatable, cheaper, etc.

Designing, building, operating and maintaining a sizable network is extremely challenging, despite network operators around the world, and the vendors that supply to them, employing some of the best and brightest. So we design OSS and related tools / processes to make operations easier.

Yet I sometimes wonder whether we achieve that aim – to make operations easier. Seems to me that we tend to focus more on just replicating functions at a higher layer in the management stack. That is, moving the function to the OSS rather than EMS/NMS, without really making it much easier operationally.

Let’s start at the user interface (UI). How often are they intuitive enough for an experienced network operator to start doing tasks with negligible OSS expert guidance?
Let’s look at deployments. How often are the projects low on effort, risk, cost and complexity?
Let’s look at flexibility (ie in-flight modifications or transformations). How often do we actually deliver flexibility to our customers through our OSS? To ask the same as above, how often are our changes low on effort, risk, cost and complexity?

As a small step towards providing an answer, I wonder whether it’s a case of making the hard things look easy and the easy things look hard.

We want to make the really hard operational things much easier to do within an OSS because that’s the primary purpose of an OSS. That’s the example of a duck on a pond. The OSS is gliding along effortlessly across the top of the water, but under the water it is paddling furiously.

Conversely, we want to make the really easy* operational things look hard to do within an OSS so that we’re not constantly being asked to build functionality / complexity into our OSS that doesn’t warrant being there. It diffuses the intent of the OSS. Just because we can, doesn’t mean we should.

OSS Road-itecture. Part-roadmap, part-architecture

A post from earlier this week discussed a less risky, dependency-reduced, stepping-stone transformation approach. It contrasted with the big-bang delivery model that’s often proposed on OSS projects.

Taking the same train of thought, have you noticed how often architects (including myself) come up with an end-state view of what an OSS, or IT, or networks will be? Have you also noticed that they often seek to demonstrate the cleverness of their architecture in the end-state?

To be honest, I’m more impressed with architectures that cleverly guide a reader through the minefield of complexity via multiple lesser steps and steer towards an intended end-state. To be equally honest, this type of architecture is probably part-roadmap, part-architecture. The journey often demonstrates the impracticality of an ideal end-state.

This may lead to an OSS with compromises but at least it’s not compromised.

The big-bang end-state might look really impressive on paper, but not be viable for the delivery team.

For fear of OSS investment

Friday’s post discussed three analogies about the challenges of performing an OSS pivot.

The biggest challenge in initiating the transformation / replacement of any significant OSS is fear. There are many OSS out there whose “owners” want to change and need to change… but fear changing because a significant pivot would mean a “bet the farm” decision.

The fear is completely understandable. These are highly complex projects with many potential pitfalls, and they consume massive amounts of resources (time, money, people). The risks can be huge for sponsors / stakeholders / investors. Failure of these projects can be career changing. The upside potential rarely balances the downside risk.

So, the only choice we have is to present pivots that aren’t “bet the farm” decisions.

The delivery approach of a bet the farm pivot tends to look like this:
The Bet-the-farm OSS Transformation Approach

The less risky, dependency-reduced, stepping-stone transformation tends to look a bit like this, but probably with a lot more verticals, as described here:
The Stepping-Stone OSS Transformation Approach

Build an OSS and they will come… or sometimes not

Build it and they will come.

This is not always true for OSS. Let me recount a few examples.

The project team is disconnected from the users – The team that’s building the OSS in parallel to existing operations doesn’t (or isn’t able to) engage with the end users of the OSS. Once it comes time for cut-over, the end users want to stick with what they know and don’t use the shiny new OSS. From painful experience I can attest that stakeholder management is under-utilised on large OSS projects.

Turf wars – Different groups within a customer are unable to gain consensus on the solution. For example, the operational design team gains the budget to build an OSS but the network assurance team doesn’t endorse this decision. The assurance team then decides not to endorse or support the OSS that is designed and built by the design team. I’ve seen an OSS worth tens of millions of dollars turned off less than 2 years after handover because of turf wars. Stakeholder management again, although this could be easier said than done in this situation.

It sounded like a good idea at the time – The very clever OSS solution team keeps coming up with great enhancements that don’t get used, for whatever reason (eg non fit-for-purpose, lack of awareness of its existence by users, lack of training, etc). I’ve seen a customer that introduced over 500 customisations to an off-the-shelf solution, yet hundreds of those customisations hadn’t been touched by users within a full year prior to doing a utilisation analysis. That’s right, not even used once in the preceding 12 months. Some made sense because they were once-off tools (eg custom migration activities), but many didn’t.

The new OSS is a scary beast – The new solution might be perfect for what the customer has requested in terms of functionality. But if the solution differs greatly from what the operators are used to, it can be too intimidating to be used. A two-week classroom-based training course at the end of an OSS build doesn’t provide sufficient learning to take up all the nuances of the new system like the operators have developed with the old solution. Each significant new OSS needs an apprenticeship, not just a short-course.

It’s obsolete before it’s finished – OSS work in an environment of rapid change: networks, IT infrastructure, organisation models, processes, product offerings, regulatory shifts, disruptive innovation, etc, etc. The longer an OSS takes to implement, the greater the likelihood of obsolescence. All the more reason for designing for incremental delivery of business value rather than big-bang delivery.

What other examples have you experienced where an OSS has been built, but the users haven’t come?

Falsely rewarding based on OSS existence rather than excellence

There’s a common belief that most jobs see people rewarded for presence rather than performance. That is, they’re encouraged to be on site from 9am to 5pm rather than being given free rein over their work schedules as long as key outcomes are met / exceeded.

In OSS vendor / product selection there’s a similar concept. Contracts are often awarded based on existence rather than excellence. When evaluating a product, if it’s able to do a majority of the functions in the long list of requirements then the box is ticked.

However, this doesn’t take into account that there are usually only a very small number of functions that any given customer’s OSS needs to perform at a very high level of efficiency. All the others are effectively just nice to have. That’s the 80/20 rule at work.

When guiding a customer through their vendor selections, I always take them through an exercise to identify the use-cases / functions that really matter. Then we ensure that the demos or proofs of concept focus closely on how excellent the OSS is at those most important factors.
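To illustrate (rather than prescribe) that weighting exercise, here’s a minimal sketch of how a weighted evaluation might be scored, where the handful of use-cases that really matter carry most of the weight. All requirement names, weights and scores below are hypothetical.

```python
# A minimal sketch of weighting vendor scores so that the vital few use-cases
# dominate the outcome. Requirement names, weights and scores are hypothetical.

# Weight the critical use-cases far more heavily than the "nice to have" long tail.
requirements = {
    "design_to_activate_order_flow": 10,   # critical use-case
    "alarm_correlation_at_scale": 10,      # critical use-case
    "custom_report_xyz": 1,                # long-tail, nice to have
    "gui_theming": 1,                      # long-tail, nice to have
}

# Scores out of 5 from demos / proofs of concept, per vendor.
vendor_scores = {
    "vendor_a": {"design_to_activate_order_flow": 4, "alarm_correlation_at_scale": 5,
                 "custom_report_xyz": 2, "gui_theming": 3},
    "vendor_b": {"design_to_activate_order_flow": 2, "alarm_correlation_at_scale": 3,
                 "custom_report_xyz": 5, "gui_theming": 5},
}

def weighted_score(scores, weights):
    """Sum of score x weight, so excellence on the vital few outweighs breadth."""
    return sum(scores.get(req, 0) * weight for req, weight in weights.items())

for vendor, scores in vendor_scores.items():
    print(vendor, weighted_score(scores, requirements))
```

Under an existence-based tick-the-box evaluation the two vendors look similar; under the weighted view, the vendor that excels at the few functions that matter comes out clearly ahead.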

OSS implementation, but without the dependencies

One of the challenges with getting a new OSS or OSS transformation project completed can be the large number of dependencies that can cause momentum gridlock. If you’re looking to deliver business value in one big-bang, which is a really common approach to delivering OSS projects, then you end up juggling many different activities and hoping they all align at the right times.

I’ve noticed that the vendors tend to design their delivery schedules around big-bang / waterfall approaches like below.
Big-bang OSS delivery

Many vendors will even assure you that this is their standard practice and are hesitant to consider changes to their “best practice” delivery scheduling. Having been involved in many of these types of deliveries in the past, on both vendor and customer side, I can assure you that they rarely work well.

Generally speaking, the gridlocks occur on the customer-side, but the result is detrimental to customer and vendor alike. Hold-ups mean inefficient allocation of resources as well as the resultant cost / time over-runs.

The alternative is to apply a bit more lateral thinking to how you break down the work into smaller chunks. The lateral thinking work breakdown aims are two-fold:

  1. How to break up the work so that it best avoids dependencies; whilst also
  2. Delivering some sort of value to the customer

There are many dependencies on a typical OSS project – hardware, procurement, IT infrastructure, network connectivity, security, approvals, integrations, licensing, resource availability, data quality and many more. However, each customer, org chart and project has its own unique mix of dependencies, so I don’t subscribe to the “best practice” argument for project delivery.

The diagram below shows an example of an alternate breakdown. The business value chunks that are delivered might be tiny in some cases, but at least momentum can be demonstrated. Rather than having a mass of entwined dependencies, you can isolate and minimise dependencies for that sliver of business value. When the dependency/ies has cleared, you can jump straight onto the next activity from an existing build-state rather than having to align all the activities to land in perfect precision.
Incremental OSS work breakdown
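As a rough illustration of that sequencing logic (not a prescribed method), the sketch below repeatedly picks up any chunk of work whose dependencies have already cleared, so each sliver of business value waits only on its own blockers rather than on the whole project. The chunk names and dependencies are hypothetical.

```python
# A minimal sketch of sequencing small "slivers" of business value so that
# each one only waits on its own dependencies. Chunk names are hypothetical.

chunks = {
    "stand_up_sandpit": set(),
    "connect_first_ems": {"stand_up_sandpit"},
    "alarm_collection_domain_a": {"connect_first_ems"},
    "alarm_collection_domain_b": {"connect_first_ems"},
    "ticketing_integration": {"alarm_collection_domain_a"},
}

def delivery_order(chunks):
    """Repeatedly pick any chunk whose dependencies have already been delivered."""
    delivered, order = set(), []
    while len(order) < len(chunks):
        ready = [c for c, deps in chunks.items()
                 if c not in delivered and deps <= delivered]
        if not ready:
            raise ValueError("Circular or unresolvable dependencies")
        for chunk in ready:
            order.append(chunk)
            delivered.add(chunk)
    return order

print(delivery_order(chunks))
```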

If ONAP is the answer, what are the questions?

“ONAP provides a comprehensive platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.
By unifying member resources, ONAP is accelerating the development of a vibrant ecosystem around a globally shared architecture and implementation for network automation–with an open standards focus–faster than any one product could on its own.”
Part of the ONAP charter from onap.org.

The ONAP project is gaining attention in service provider circles. The Steering Committee of the ONAP project hints at the types of organisations investing in the project. The statement above summarises the mission of this important project. You can bet that the mission has been carefully crafted. As such, one can assume that it represents what these important stakeholders jointly agree to be the future needs of their OSS.

I find it interesting that there are quite a few technical terms (eg policy-driven orchestration) in the mission statement, terms that tend to pre-empt the solution. However, I don’t feel that pre-emptive technical solutions are the real mission, so I’m going to try to reverse-engineer the statement into business needs. Hopefully the business needs (the “why? why? why?” column below) articulates a set of questions / needs that all OSS can work to, as opposed to replicating the technical approach that underpins ONAP.

Phrase: real-time
Interpretation: The ability to make instantaneous decisions
Why 1: To adapt to changing conditions
Why 2: To take advantage of fleeting opportunities or resolve threats
Why 3: To optimise key business metrics such as financials
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics

Phrase: policy-driven orchestration
Interpretation: To use policies to increase the repeatability of key operational processes
Why 1: Repeatability provides the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics

Phrase: policy-driven automation
Interpretation: To use policies to increase the amount of automation that can be applied to key operational processes
Why 1: Automated processes provide the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions

Phrase: physical and virtual network functions
Interpretation: Our networks will continue to consist of physical devices, but we will increasingly introduce virtualised functionality
Why 1: Physical devices will continue to exist into the foreseeable future but virtualisation represents an exciting approach into the future
Why 2: Virtual entities are easier to activate and manage (assuming sufficient capacity exists)
Why 3: Physical equipment supply, build, deploy and test cycles are much longer and labour intensive
Why 4: Virtual assets are more flexible, faster and cheaper to commission
Why 5: Customer services can be turned up faster and cheaper

Phrase: software, network, IT and cloud providers and developers
Interpretation: With this increase in virtualisation, we find an increasingly large and diverse array of suppliers contributing to our value-chain. These suppliers contribute via software, network equipment, IT functions and cloud resources
Why 1: CSPs can access innovation and efficiency occurring outside their own organisation
Why 2: CSPs can leverage the opportunities those innovations provide
Why 3: CSPs can deliver more attractive offers to customers
Why 4: Key metrics such as profitability and customer satisfaction are enhanced

Phrase: rapidly automate new services
Interpretation: We want the flexibility to introduce new products and services far faster than we do today
Why 1: CSPs can deliver more attractive offers to customers faster than competitors
Why 2: Key metrics such as market share, profitability and customer satisfaction are enhanced, as well as improved cashflow

Phrase: support complete lifecycle management
Interpretation: The components that make up our value-chain are changing and evolving so quickly that we need to cope with these changes without impacting customers across any of their interactions with their service
Why 1: Customer satisfaction is a key metric and a customer’s experience spans the entire lifecycle of their service
Why 2: CSPs don’t want customers to churn to competitors
Why 3: Key metrics such as market share, profitability and customer satisfaction are enhanced

Phrase: unifying member resources
Interpretation: To reduce the amount of duplicated and under-synchronised development currently being done by the member bodies of ONAP
Why 1: Collaboration and sharing reduces the effort each member body must dedicate to their OSS
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving a required level of outcome from OSS

Phrase: vibrant ecosystem
Interpretation: To increase the level of supplier interchangeability
Why 1: To reduce dependence on any supplier/s
Why 2: To improve competition between suppliers
Why 3: Lower prices, greater choice and greater innovation tend to flourish in competitive environments
Why 4: CSPs, as customers of the suppliers, benefit

Phrase: globally shared architecture
Interpretation: To make networks, services and support systems easier to interconnect across the global communications network
Why 1: Collaboration on common standards reduces the integration effort between each member at points of interconnect
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving interconnection benefits

As indicated in earlier posts, ONAP is an exciting initiative for the CSP industry for a number of reasons. My fear for ONAP is that it becomes such a behemoth of technical complexity that it becomes too unwieldy for use by any of the member bodies. I use the analogy of ATM versus Ethernet here, where ONAP is equivalent to ATM in power and complexity. The question is whether there’s an Ethernet answer to the whys that ONAP is trying to solve.

I’d love to hear your thoughts.

(BTW. I’m not saying that the technologies the ONAP team is investigating are the wrong ones. Far from it. I just find it interesting that the mission is starting with a technical direction in mind. I see parallels with the OSS radar analogy.)

What OSS environments do you need?

When we’re planning a new OSS, we tend to be focused on the production (PROD) environment. After all, that’s where its primary purpose is served, to operationalise a network asset. That is where the majority of an OSS’s value gets created.

But we also need some (roughly) equivalent environments for separate purposes. We’ll describe some of those environments below.

By default, vendors will tend to only offer licensing for a small number of database instances – usually just PROD and a development / test environment (DEV/TEST). You may not envisage that you will need more than this, but you might want to negotiate multiple / unlimited instances just in case. If nothing else, it’s worth bringing to the negotiation table even if it gets shot down because budgets are tight and / or vendor pricing is inflexible relating to extra environments.

Examples where multiple instances may be required include:

  1. Production (PROD) – as indicated above, that’s where the live network gets managed. User access and controls need to be tight here to prevent catastrophic events from happening to the OSS and/or network
  2. Disaster Recovery (DR) – depending on your high-availability (HA) model (eg cold standby, primary / redundant, active / active), you may require a DR or backup environment
  3. Sandpit (DEV / TEST) – these environments are essential for OSS operators to be able to prototype and learn freely without the risk of causing damage to production environments. There may need to be multiple versions of this environment depending on how reflective of PROD they need to be and how viable it is to take refresh / updates from PROD (aka PROD cuts). Sometimes also known as non-PROD (NP)
  4. Regression testing (REG TEST) – regression testing requires a baseline data set to continually test and compare against, flagging any variations / problems that have arisen from any change within the OSS or networks (eg new releases). This implies a need for data and applications to be shielded from the constant change occurring on other types of environments (eg DEV / TEST). In situations where testing transforms data (eg activation processes), REG TEST needs to have the ability to roll-back to the previous baseline state
  5. Training (TRAIN) – your training environments may need to be established with a repeatable set of training scenarios that also need to be re-set after each training session. This should also be separated from the constant change occurring on dev/test environments. However, due to a shortage of environments, and the relative rarity of training needed at some customers, TRAIN often ends up as another DEV or TEST environment
  6. Production Support (PROD-SUP) – this type of environment is used to prototype patches, releases or defect fixes (for defects on the PROD environment) prior to release into PROD. PROD-SUP might also be used for stress and volume testing, or SVT may require its own environment
  7. Data Migration (DATA MIG) – At times, data creation and loading needs to be prototyped in a non-PROD environment. Sometimes this can be done in PROD-SUP or even a DEV / TEST environment. On other occasions it needs its own dedicated environment so as to not interrupt BAU (business as usual) activities on those other environments
  8. System Integration Testing (SIT) – OSS integrate with many other systems and often require dedicated integration testing environments
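If it helps with licence negotiations or refresh planning, the environment list above can be captured as a simple inventory. The sketch below is purely illustrative; the environment set and refresh cadences are assumptions, not recommendations.

```python
# A minimal sketch of an environment inventory. Field values are illustrative only.
environments = [
    {"name": "PROD",     "purpose": "live network management",       "prod_cut": None},
    {"name": "DR",       "purpose": "disaster recovery",             "prod_cut": "continuous replication"},
    {"name": "DEV/TEST", "purpose": "sandpit / prototyping",         "prod_cut": "monthly"},
    {"name": "REG TEST", "purpose": "regression baseline",           "prod_cut": "per release"},
    {"name": "TRAIN",    "purpose": "repeatable training scenarios", "prod_cut": "per course"},
    {"name": "PROD-SUP", "purpose": "patch / defect prototyping",    "prod_cut": "weekly"},
    {"name": "DATA MIG", "purpose": "migration dry-runs",            "prod_cut": "ad hoc"},
    {"name": "SIT",      "purpose": "integration testing",           "prod_cut": "per release"},
]

# e.g. list every environment that needs a regular refresh from PROD
print([env["name"] for env in environments if env["prod_cut"]])
```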

Am I forgetting any? What other environments do you find to be essential on your OSS?

Your OSS Justice League

Is it just me or has there been a proliferation of superhero movies coming out at cinemas lately? Not only that, but movies where teams of superheroes link up to defeat the baddies (eg Deadpool 2, Justice League, etc)?

The thing that strikes me as interesting is that there’s rarely an overlap of super-powers within the team. They all have their different strengths and points of difference. The sum of the parts… blah, blah, blah.

Anyway, I’m curious whether you’ve noticed the same thing as me on OSS projects, that when there are multiple team members with significant skill / experience overlap, the project can bog down in indecision? I’ve noticed this particularly when there are many architects, often super-talented ones, on a project. Instead of getting the benefit of collaboration of great minds, we can end up with too many possibilities (and possibly egos) to work through and the project stagnates.

If you were to hand-pick your all-star cast for your OSS Justice League, just like in the movies, you’d look for a team of differentiated, but hopefully complementary, super-heroes I assume. But I’m diverting away from my main point here.

Each project, just like each formidable foe in the movies, is slightly different and needs slightly different super-powers to tackle it. When selecting a cast for a movie, directors have a global pool to choose from. When selecting a cast for an OSS project, directors have traditionally chosen from their own organisation, possibly with some outside hires to fill the long-term gaps.

With the increasing availability of freelance resources (ie people who aren’t intrinsically tied to carriers or vendors), the proposition of selecting a purpose-built project team of OSS super-heroes is actually beginning to become more possible. I’m wondering how much the gig economy will change the traditional OSS project team model in coming years.

I’d love to hear your thoughts and experiences on this.

An alternate way of slicing OSS (part 2)

Last week we talked about an alternate way of slicing OSS projects. Today, we’ll look a little deeper and include some diagrams.

The traditional (aka waterfall) approach to delivering an OSS project sees one big-bang delivery of business value at the end of the implementation.
OSS project delivery via waterfall

The yellow arrows indicate the sequential nature of this style of delivery. The implications include:

  1. If the project runs out of funds before it finishes, no (or negligible) value is delivered
  2. If there’s no modularity of delivery then the project team must stay the course of the original project plan. There’s no room for prioritising or dropping or including delivery modules. Project plans are rarely perfect at first after all
  3. Any changes in project plan tend to have knock-on effects into the rest of the delivery
  4. There is only one true delivery of value, even though interim milestones that demonstrate momentum are a key for change management and team morale
  5. Large deliverables tend to overload one segment of the project delivery team and under-utilise the rest at each stage. This isn’t great for project flow or team utilisation

The alternate approach seeks to deliver in multiple phases by business value, not artefacts, as shown in the sample model below:
OSS project delivery via Agile

Phased enhancements following a base platform build (eg Sandpit and/or Single-site above) could include the following, where each provides a tangible outcome / benefit for the business, thus maintaining perception of momentum (assurance use-cases cited):

  • Additional event collection (ie additional collectors / probes / mediation-devices can be added or configured)
  • Additional filters / sorting of events
  • Event prioritisation mapping / presentation
  • Event correlation
  • Fault suppression
  • Fault escalation
  • Alarm augmentation
  • Alarm thresholding
  • Root-cause analysis (intra, then inter-domain)
  • Other configurations such as latching, auto-acknowledgement, visualisation parameters, etc
  • Heart-beat function (ie devices are unreachable for a user-defined period)
  • Knowledge base (ie developing a database of activities to respond to certain events)
  • Interfacing with other systems (eg trouble-ticket, work-force management, inventory, etc)
  • Setup of roles/groups
  • Setup of skills-based routing
  • Setup of reporting
  • Setup of notifications (eg email, SMS, etc)
  • Naming convention refinements
  • etc, etc
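To make one of those slivers more concrete, here’s a minimal sketch of what an alarm thresholding enhancement might look like in isolation. The event fields, threshold and window below are hypothetical assumptions, not a reference design.

```python
# A minimal sketch of one sliver from the list above: alarm thresholding.
# Event fields, threshold and window values are hypothetical assumptions.
from collections import defaultdict

THRESHOLD = 5          # raise a derived alarm after this many matching events...
WINDOW_SECONDS = 300   # ...within this sliding time window

# key: (device, event_type) -> list of recent event timestamps
event_buffer = defaultdict(list)

def ingest(event):
    """Buffer raw events and emit a derived alarm once the threshold is breached."""
    key = (event["device"], event["event_type"])
    now = event["timestamp"]
    # keep only the events still inside the window, then add the new one
    event_buffer[key] = [t for t in event_buffer[key] if now - t <= WINDOW_SECONDS]
    event_buffer[key].append(now)
    if len(event_buffer[key]) >= THRESHOLD:
        return {"device": event["device"],
                "alarm": f"{event['event_type']} threshold breached",
                "count": len(event_buffer[key])}
    return None

# e.g. the fifth link_down event on the same device within the window raises an alarm
for ts in (10, 20, 30, 40, 50):
    alarm = ingest({"device": "router-01", "event_type": "link_down", "timestamp": ts})
print(alarm)
```

Each sliver like this can be demonstrated, accepted and put to use on its own, which is the whole point of the phased breakdown.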

The latter is a more Agile-style breakdown of work, but doesn’t need to be delivered using Agile methodology.

Of course there are pros and cons of each approach. I’d love to hear your thoughts and experiences with different OSS delivery approaches.

The OSS Ferrari analogy

A friend and colleague has recently been talking about a Ferrari analogy on a security project we’ve been contributing to.

The end customers have decided they want a Ferrari solution, a shiny, super-specified new toy (or in this case toys!). There’s just one problem though. The customer has a general understanding of what it is to drive, but doesn’t have driving experience or a driver’s license yet (ie they have a general understanding of what they want but haven’t described what they plan to do with the shiny toys operationally once the keys are handed over).

To take a step further back, since the project hasn’t articulated exactly where the customers want to go with the solution, we’re asking whether a Ferrari is even the right type of vehicle to take them there. As amazing as Ferraris are, might it actually make more sense to buy a 4WD vehicle?

As indicated in yesterday’s post, sometimes the requirements gathering process identifies the goal-based expectations (ie the business requirements – where the customer wants to go), but can often just identify a set of product features (ie the functional requirements such as a turbo-charged V8 engine, mid-mount engine, flappy-paddle gear change, etc, etc). The latter leads to buying a Ferrari. The former is more likely to lead to buying the vehicle best-suited to getting to the desired destination.

The OSS Ferrari sounds nice, but…

Optimisation Support Systems

We’ve heard of OSS being an acronym for operational support systems, operations support systems, even open source software. I have a new one for you today – Optimisation Support Systems – that exists for no purpose other than to drive a mindset shift.

“I think we have to transition from “expectations” in a hype sense to “expectations” in a goal sense. NFV is like any technology; it depends on a business case for what it proposes to do. There’s a lot wrong with living up to hype (like, it’s impossible), but living up to the goals set for a technology is never unrealistic. Much of the hype surrounding NFV was never linked to any real business case, any specific goal of the NFV ISG.”
Tom Nolle
in his blog here.

This is a really profound observation (and entire blog) from Tom. Our technology, OSS included, tends to be surrounded by “hyped” expectations – partly from our own optimistic desires, partly from vendor sales pitches. It’s far easier to build our expectations from hype than to actually understand and specify the goals that really matter. Goals that are end-to-end in manner and preferably quantifiable.

When embarking on a technology-led transformation, our aim is to “make things better,” obviously. A list of hundreds of functional requirements might help. However, having an up-front, clear understanding of the small number of use cases you’re optimising for tends to define much clearer goal-driven expectations.

New OSS functionality or speed and scale?

We all know that revenue per bit (of data transferred across comms networks) is trending lower. How could we not? It’s posited as one of the reasons for declining profitability of the industry. The challenge for telcos is how to engineer an environment of low revenue per bit but still be cost viable.

I’m sure there are differentiated comms products out there in the global market. However, for the many products that aren’t differentiated, there’s a risk of commoditisation. Customers of our OSS are increasingly moving into a paradigm of commoditisation, which in turn impacts the form our OSS must mould themselves to.

The OSS we deliver can either be the bane or the saviour. They can be a differentiator where otherwise there is none. For example, getting each customer’s order ready for service (RFS) faster than competitors. Or by processing orders at scale, yet at a lower cost-base through efficiencies / repeatability such as streamlined products, processes and automations.

OSS exist to improve efficiency at scale of course, but I wonder whether we lose sight of that sometimes? I’ve noticed that we have a tendency to focus on functionality (ie delivering new features) rather than scale.

This isn’t just the OSS vendors or implementation teams either by the way. It’s often apparent in customer requirements too. If you’ve been lucky enough to be involved with any OSS procurement processes, which side of the continuum was the focus – on introducing a raft of features, or narrowing the field of view down to doing the few really important things at scale and speed?

Zero touch network & Service Management (ZSM)

Zero touch network & Service Management (ZSM) is a next-gen network management initiative, hosted by ETSI, that uses closed-loop principles. An ETSI blog has just demonstrated the first ZSM Proof of Concept (PoC). The slide deck describing the PoC, supplied by EnterpriseWeb, can be found here.

The diagram below shows a conceptual closed-loop assurance architecture used within the PoC.
ETSI ZSM PoC.

It contains some similar concepts to a closed-loop traffic engineering project designed by PAOSS back in 2007, but with one big difference. That 2007 project was based on a single-vendor solution, as opposed to the open, multi-vendor PoC demonstrated here. Both were based on the principle of using assurance monitors to trigger fulfillment responses. For example, ours used SLA threshold breaches on voice switches to trigger automated remedial response through the OSS’s provisioning engine.

For this newer example, ETSI’s blog details, “The PoC story relates to a congestion event caused by a DDoS (Denial of Service) attack that results in a decrease in the voice quality of a network service. The fault is detected by service monitoring within one or more domains and is shared with the end-to-end service orchestrator which correlates the alarms to interpret the events, based on metadata and metrics, and classifies the SLA violations. The end-to-end service orchestrator makes policy-based decisions which trigger commands back to the domain(s) for remediation.”
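As a simple illustration of that closed-loop pattern (detect, correlate / classify, decide via policy, remediate), here’s a minimal sketch. The metric names, policies and actions are assumptions for illustration only, not ETSI ZSM or EnterpriseWeb APIs.

```python
# A minimal sketch of the closed-loop pattern described above: monitoring detects
# an SLA violation, the orchestrator classifies it against policy, and a
# remediation command is sent back to the domain. All names are illustrative.

policies = {
    # (metric, severity) -> remediation action
    ("voice_mos", "critical"): "scale_out_media_function",
    ("voice_mos", "major"): "reroute_traffic",
}

def classify(violation):
    """Classify the SLA violation by how far the metric has degraded."""
    return "critical" if violation["value"] < 2.5 else "major"

def closed_loop(violation):
    """Policy-based decision: map the classified violation to a remediation command."""
    severity = classify(violation)
    # fall back to a manual ticket if no automated policy matches
    action = policies.get((violation["metric"], severity), "raise_ticket_for_operator")
    return {"domain": violation["domain"], "command": action}

# e.g. a DDoS-driven drop in voice quality (MOS) detected in the core domain
print(closed_loop({"metric": "voice_mos", "value": 2.1, "domain": "core"}))
```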

You’ll notice one of the key call-outs in the diagram above is real-time inventory. That was much harder for us to achieve back in 2007 than it is now with virtualised network and compute layers providing real-time telemetry. We used inventory that was only auto-discovered once daily and had to build in error handling, whilst relying on over-provisioned physical infrastructure.

It’s exciting to see these types of projects being taken forward by ETSI, EnterpriseWeb, et al.

Just in time design

It’s interesting how we tend to go in cycles. Back in the early days of OSS, the network operators tended to build their OSS from the ground up. Then we went through a phase of using Commercial off-the-shelf (COTS) OSS software developed by third-party vendors. We now seem to be cycling back towards in-house development, but with collaboration that includes vendors and external assistance through open-source projects like ONAP. Interesting too how Agile fits in with these cycles.

Regardless of where we are in the cycle for our OSS, as implementers we’re always challenged with finding the Goldilocks amount of documentation – not too heavy, not too light, but just right.

The Agile Manifesto espouses, “working software over comprehensive documentation.” Sounds good to me! It perplexes me that some OSS implementations are bogged down by lengthy up-front documentation phases, especially if we’re basing the solution on COTS offerings. These can really stall the momentum of a project.

Once a solution has been selected (which often does require significant analysis and documentation), I’m more of a proponent of getting the COTS software stood up, even if only in a sandpit environment. This is where just-in-time (JIT) documentation comes into play. Rather than having every aspect of the solution documented (eg process flows, data models, high availability models, physical connectivity, logical connectivity, databases, etc, etc), we only need enough documentation for collaborative stakeholders to do their parts (eg IT to set up hardware / hosting, networks to set up physical connectivity, vendor to provide software, integrator to perform build, etc) to stand up a vanilla solution.

Then it’s time to start building trial scenarios through the solution. There’s usually quite a bit of trial and error in this stage, as we seek to optimise the scenarios for the intended users. Then we add a few more scenarios.

There’s little point trying to document the solution in detail before a scenario is trialled, but some documentation can be really helpful. For example, if the scenario is to build a small sub-section of a network, then draw up some diagrams of that sub-network that include the intended naming conventions for each object (eg device, physical connectivity, addresses, logical connectivity, etc). That allows you to determine whether there are unexpected challenges with naming conventions, data modelling, process design, etc. There are always unexpected challenges that arise!
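As a small example of catching those challenges early, the trial sub-network can be checked against the intended naming convention before any bulk data load. The convention below is a hypothetical pattern (site-role-index), not a recommendation.

```python
# A minimal sketch of validating device names in a trial sub-network against an
# intended naming convention. The pattern and names are hypothetical examples.
import re

DEVICE_NAME_PATTERN = re.compile(r"^[A-Z]{3}-(CORE|AGG|ACC)-\d{2}$")

trial_devices = ["SYD-CORE-01", "SYD-AGG-01", "Sydney-Access-1"]

for name in trial_devices:
    status = "ok" if DEVICE_NAME_PATTERN.match(name) else "breaks convention"
    print(f"{name}: {status}")
```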

I figure you’re better off documenting the real challenges than theorising on the “what if?” challenges, which is what often happens with up-front documentation exercises. There are always brilliant stakeholders who can imagine millions of possible challenges, but these often bog the design phase down.

With JIT design, once the solution evolves, the documentation can evolve too… if there is an ongoing reason for its existence (eg as a user guide, for a test plan, as a training cheat-sheet, a record of configuration for fault-finding purposes, etc).

Interestingly, the first value in the Agile Manifesto is, “individuals and interactions over processes and tools.” This is where the COTS vs in-house-dev comes back into play. When using COTS software, individuals, interactions and processes are partly driven by what the tools support. COTS functionality constrains us but we can still use Agile configuration and customisation to optimise our solution for our customers’ needs (where cost-benefit permits).

Having a working set of vanilla tools allows our customers to get a much better feel for what needs to be done rather than trying to understand the intent of up-front design documentation. And that’s the key to great customer outcomes – having the customers knowledgeable enough about the real solution (not hypothetical solutions) to make the most informed decisions possible.

Of course there are always challenges with this JIT design model too, especially when third-party contracts are involved!

Aggregated OSS buying models

Last week we discussed a sell-side co-op business model. Today we’ll look at buy-side co-op models.

In other industries, we hear of buying groups getting great deals through aggregated buying volumes. This is a little harder to achieve with products that are as uniquely customised as OSS. It’s possible that OSS buy-side aggregation could occur for operators that are similar in nature but don’t compete (eg regional operators). Having said that, I’ve yet to see any co-ops formed to gain OSS group-purchase benefits. If you have, I’d love to hear about it.

In OSS, there are three approaches that aren’t exactly co-op buying models but do aggregate the evaluation and buying decision.

The most obvious is for corporations that run multiple carriers under one umbrella such as Telefonica (see Telefonica’s various OSS / BSS contract notifications here), SingTel (group contracts here), etisalat, etc. There would appear to be benefits in standardising OSS platforms across each of the group companies.

A far less formal co-op buying model I’ve noticed is the social-proof approach. This is where one, typically large, network operator in a region goes through an extensive OSS / BSS evaluation and chooses a vendor. Then there’s a domino effect where other, typically smaller, network operators also buy from the same vendor.

Even less formal again is using third-party organisations like Passionate About OSS to assist with a standard vendor selection methodology. The vendors selected aren’t standardised because each operator’s needs are different, but the product / vendor selection methodology builds on the learnings of past selection processes across multiple operators. The benefit comes in the evaluation and decision frameworks.

How an OSS is like an F1 car

A recent post discussed the challenge of getting a timeslice of operations people to help build the OSS. That post surmised, “as the old saying goes, you get back what you put in. In the case of OSS I’ve seen it time and again that operations need to contribute significantly to the implementation to ensure they get a solution that fits their needs.”

I have a new saying for you today, this time from T.D. Jakes, “You can’t be committed to the dream. You have to be committed to the process.”

If you’re representing an organisation that is buying an OSS solution from a vendor / integrator, please consider these two adages above. Sometimes we’re good at forming the dream (eg business requirements, business case, etc) and expecting the vendor to conduct almost all of the process. While our network operations teams are hired for the process of managing the network, we also need their significant input on the process of building / configuring an OSS. The vendor / integrator can’t just develop it in isolation and then hand it over to ops with a few days of training at the end.

The process of bringing a new OSS into an organisation is not like buying a road car. With an OSS, you can’t just place an order with some optional features like paint and trim specified, then expect to start driving it as soon as it leaves the vendor’s assembly line. It’s more like an F1 car where the driver is in constant communications with the pit-crew, changing and tweaking and refining to optimise the car to the driver’s unique needs (and in turn to hopefully optimise the results).

At least, that’s what current-state OSS are like. Perhaps in the future… we’ll strive to refine our OSS to be more like a road-car – standardised and intuitive enough for operators to drive straight off the assembly line.