Over the years I’m sure you’ve seen many different OSS demonstrations. You’ve probably also seen presentations by vendors / integrators that have shown multiple different products from their suite.
How integrated have they appeared to you?
Have they seemed tightly integrated, as if carved from a single piece of stone?
Or have they seemed loosely integrated, a series of obviously different stones joined together with some mortar?
Or perhaps even barely associated, a series of completely different objects (possibly through product acquisition) branded under a common marketing name?
There are different pros and cons with each approach. Tight integration possibly suits a greenfields OSS. Looser integration perhaps better suits carve-off for best-of-breed customer architecture models.
I don’t know about you, but I always prefer to be given the impression that an attempt has been made to ensure consistency in the bundling. Consistency of user-interface, workflow, data modelling/presentation, reports, etc. With modern presentation layers, database technologies and the availability of UX / CX expertise, this should be less of a hurdle than it has been in the past.
“ONAP provides a comprehensive platform for real-time, policy-driven orchestration and automation of physical and virtual network functions that will enable software, network, IT and cloud providers and developers to rapidly automate new services and support complete lifecycle management.
By unifying member resources, ONAP is accelerating the development of a vibrant ecosystem around a globally shared architecture and implementation for network automation–with an open standards focus–faster than any one product could on its own.”
Part of the ONAP charter from onap.org.
The ONAP project is gaining attention in service provider circles. The Steering Committee of the ONAP project hints at the types of organisations investing in the project. The statement above summarises the mission of this important project. You can bet that the mission has been carefully crafted. As such, one can assume that it represents what these important stakeholders jointly agree to be the future needs of their OSS.
I find it interesting that there are quite a few technical terms (eg policy-driven orchestration) in the mission statement, terms that tend to pre-empt the solution. However, I don’t feel that pre-emptive technical solutions are the real mission, so I’m going to try to reverse-engineer the statement into business needs. Hopefully the business needs (the “why? why? why?” column below) articulate a set of questions / needs that all OSS can work to, as opposed to replicating the technical approach that underpins ONAP.
Why? Why? Why?
The ability to make instantaneous decisions
Why 1: To adapt to changing conditions
Why 2: To take advantage of fleeting opportunities or resolve threats
Why 3: To optimise key business metrics such as financials
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics
To use policies to increase the repeatability of key operational processes
Why 1: Repeatability provides the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions
Why 4: As CSPs are under increasing pressure from shareholders to deliver on key metrics
To use policies to increase the amount of automation that can be applied to key operational processes
Why 1: Automated processes provide the opportunity to improve efficiency, quality and performance
Why 2: Allows an operator to service more customers at less expense
Why 3: Improves corporate profitability and customer perceptions
physical and virtual network functions
Our networks will continue to consist of physical devices, but we will increasingly introduce virtualised functionality
Why 1: Physical devices will continue to exist into the foreseeable future but virtualisation represents an exciting approach into the future
Why 2: Virtual entities are easier to activate and manage (assuming sufficient capacity exists)
Why 3: Physical equipment supply, build, deploy and test cycles are much longer and labour intensive
Why 4: Virtual assets are more flexible, faster and cheaper to commission
Why 5: Customer services can be turned up faster and cheaper
software, network, IT and cloud providers and developers
With this increase in virtualisation, we find an increasingly large and diverse array of suppliers contributing to our value-chain. These suppliers contribute via software, network equipment, IT functions and cloud resources
Why 1: CSPs can access innovation and efficiency occurring outside their own organisation
Why 2: CSPs can leverage the opportunities those innovations provide
Why 3: CSPs can deliver more attractive offers to customers
Why 4: Key metrics such as profitability and customer satisfaction are enhanced
rapidly automate new services
We want the flexibility to introduce new products and services far faster than we do today
Why 1: CSPs can deliver more attractive offers to customers faster than competitors
Why 2: Key metrics such as market share, profitability and customer satisfaction are enhanced as well as improved cashflow
support complete lifecycle management
The components that make up our value-chain are changing and evolving so quickly that we need to cope with these changes without impacting customers across any of their interactions with their service
Why 1: Customer satisfaction is a key metric and a customer’s experience spans the entire lifecycle of their service.
Why 2: CSPs don’t want customers to churn to competitors
Why 3: Key metrics such as market share, profitability and customer satisfaction are enhanced
unifying member resources
To reduce the amount of duplicated and under-synchronised development currently being done by the member bodies of ONAP
Why 1: Collaboration and sharing reduces the effort each member body must dedicate to their OSS
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving a required level of outcome from OSS
To increase the level of supplier interchangeability
Why 1: To reduce dependence on any supplier/s
Why 2: To improve competition between suppliers
Why 3: Lower prices, greater choice and greater innovation tend to flourish in competitive environments
Why 4: CSPs, as customers of the suppliers, benefit
globally shared architecture
To make networks, services and support systems easier to interconnect across the global communications network
Why 1: Collaboration on common standards reduces the integration effort between each member at points of interconnect
Why 2: A reduced resource pool is required
Why 3: Costs can be reduced whilst still achieving interconnection benefits
As indicated in earlier posts, ONAP is an exciting initiative for the CSP industry for a number of reasons. My fear for ONAP is that it becomes such a behemoth of technical complexity that it becomes too unwieldy for use by any of the member bodies. I use the analogy of ATM versus Ethernet here, where ONAP is equivalent to ATM in power and complexity. The question is whether there’s an Ethernet answer to the whys that ONAP is trying to solve.
I’d love to hear your thoughts.
(BTW. I’m not saying that the technologies the ONAP team is investigating are the wrong ones. Far from it. I just find it interesting that the mission is starting with a technical direction in mind. I see parallels with the OSS radar analogy.)
We keep shiploads of data in our OSS don’t we? Just think about how much storage your OSS estate consumes.
Technically, it doesn’t cost much (relatively) to retain all that potential for insight generation with the cost of storage diminishing. The real cost of storing the data goes a little deeper than the $/MB though. Other cost factors include data curation, cleansing, database search performance, etc.
There’s a whole field of study relating to this, named Information Lifecycle Management (ILM), but let’s look at it in terms of relevance to OSS.
We collect information across different timescales including real-time processing, short-term correlations, longer-term trending and long-term statutory / regulatory.
But rather than blindly just storing everything, we could ask ourselves at what stage does each data sub-set lose relevance. As our OSS data ages, it can tend to deteriorate because the models it uses also deteriorate. Model deterioration factors, such as those described in this recent post about a machine-learning PoC and the following, are numerous:
Data model factors change to cope with gaps in original models
Each of these factors (and more) leads to deterioration in the usefulness of baseline data. This means the insight signals in the data become less clear, or at worst the baseline needs to be re-established, making old data invalid. If it’s invalid, then retention would appear to be pointless. Shifting it to the right through the storage types shown in the diagram above could also be pointless.
Very little of the OSS data you store is ever actually used, decreasingly so as it ages. Do you have a heatmap of what data you use in your OSS?
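If you don’t have such a heatmap yet, a rough first cut can be built from whatever query or audit log your OSS exposes. The sketch below is only an illustration: the log shape (data set, record date, access date), the age buckets and the sample entries are all assumptions rather than anything taken from a real OSS.

```python
from collections import Counter
from datetime import date

# Hypothetical age buckets (upper bound in days, label); adjust to your own retention tiers.
AGE_BUCKETS = [(7, "0-7d"), (90, "8-90d"), (365, "91-365d")]
OLDEST_LABEL = ">365d"


def age_bucket(record_date: date, accessed_on: date) -> str:
    """Label how old a record was at the time it was read."""
    age_days = (accessed_on - record_date).days
    for upper, label in AGE_BUCKETS:
        if age_days <= upper:
            return label
    return OLDEST_LABEL


def usage_heatmap(access_log):
    """Count reads per (data set, age bucket).

    access_log: iterable of (data_set, record_date, accessed_on) tuples,
    an assumed shape for whatever query / audit log is available.
    """
    heat = Counter()
    for data_set, record_date, accessed_on in access_log:
        heat[(data_set, age_bucket(record_date, accessed_on))] += 1
    return heat


# Made-up sample log entries, for illustration only.
sample_log = [
    ("alarms", date(2024, 5, 30), date(2024, 6, 1)),
    ("alarms", date(2024, 5, 31), date(2024, 6, 1)),
    ("alarms", date(2023, 1, 1), date(2024, 6, 1)),
    ("perf_counters", date(2024, 4, 1), date(2024, 6, 1)),
]
heat = usage_heatmap(sample_log)
for (data_set, bucket), reads in sorted(heat.items()):
    print(f"{data_set:15} {bucket:8} {reads}")
```

Cells that stay at or near zero across a long observation window are the candidates for archiving or deletion.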
As you already know, there are two categories of downtime – unplanned (eg failures) and planned (eg upgrades / maintenance).
Planned downtime sounds a lot nicer (for operators) but the reality is that you could call both types “incidents” – they both impact (or potentially impact) the customer. We sometimes underestimate that fact.
Today’s question is whether you’re able to identify where the hotspots are in your OSS suite when you combine both types of downtime. Can you tell which outages are service-impacting?
In a round-about way, I’m asking whether you already have a dashboard that monitors uptime of all the components (eg applications, probes, middleware, infra, etc) that make up your complete OSS / BSS estate? If you do, does it tell you what you anecdotally know already, or are there sometimes surprises?
Does the data give you the evidence you need to negotiate with the implementers of problematic components (eg patch cadence, the need for reliability fixes, streamlining the patch process, reduction in customisations, etc)? Does it give you reason to make architectural changes (eg webscaling)?
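If that dashboard doesn’t exist yet, the underlying calculation is simple enough to prototype. The following is a minimal sketch, assuming outage records of the form (component, kind, hours, service-impacting flag); the component names, the 720-hour reporting period and the figures are invented for illustration.

```python
from collections import defaultdict


def downtime_hotspots(outages, period_hours):
    """Rank components by total downtime, planned and unplanned combined.

    outages: iterable of (component, kind, hours, service_impacting), where
    kind is "planned" or "unplanned". This record shape is an assumption.
    """
    totals = defaultdict(lambda: {"planned": 0.0, "unplanned": 0.0, "impacting": 0.0})
    for component, kind, hours, service_impacting in outages:
        totals[component][kind] += hours
        if service_impacting:
            totals[component]["impacting"] += hours

    report = []
    for component, t in totals.items():
        down = t["planned"] + t["unplanned"]
        report.append({
            "component": component,
            "downtime_hours": down,
            "impacting_hours": t["impacting"],
            "availability_pct": round(100 * (1 - down / period_hours), 3),
        })
    # Worst offenders first.
    return sorted(report, key=lambda r: r["downtime_hours"], reverse=True)


# Made-up outage records for a 30-day (720 hour) period.
outages = [
    ("inventory-app", "planned", 4.0, False),
    ("inventory-app", "unplanned", 1.5, True),
    ("mediation-probe", "unplanned", 0.5, True),
    ("middleware-bus", "planned", 2.0, True),
]
report = downtime_hotspots(outages, period_hours=720)
for row in report:
    print(row)
```

Treating planned and unplanned hours as one total is deliberate here, since both count as incidents from the customer’s point of view. The separate service-impacting column helps show which hotspots matter most.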
“The iPhone disrupted the handset business, but has not disrupted the cellular network operators at all, though many people were convinced that it would. For all that’s changed, the same companies still have the same business model and the same customers that they did in 2006. Online flight booking doesn’t disrupt airlines much, but it was hugely disruptive to travel agents. Online booking (for the sake of argument) was sustaining innovation for airlines and disruptive innovation for travel agents.
Meanwhile, the people who are first to bring the disruption to market may not be the people who end up benefiting from it, and indeed the people who win from the disruption may actually be doing something different – they may be in a different part of the value chain. Apple pioneered PCs but lost the PC market, and the big winners were not even other PC companies. Rather, most of the profits went to Microsoft and Intel, which both operated at different layers of the stack. PCs themselves became a low-margin commodity with fierce competition, but PC CPUs and operating systems (and productivity software) turned out to have very strong winner-takes-all effects.”
Ben Evans on his blog about Tesla.
As usual, Ben makes some thought-provoking points. The ones above have coaxed me into thinking about OSS from a slightly different perspective.
I’d tended to look at OSS as a product to be consumed by network operators (and further downstream by the customers of those network operators). I figured that if our OSS delivered benefit to the downstream customers, the network operators would thrive and would therefore be prepared to invest more into OSS projects. In a way, it’s a bit like a sell-through model.
But the ideas above give some alternatives for OSS providers to reduce dependence on network operator budgets.
Traditional OSS fit within a value-chain that’s driven by customers that wish to communicate. In the past, the telephone network was perceived as the most valuable part of that value-chain. These days, digitisation and competition have meant that the perceived value of the network has dropped to being a low-margin commodity in most cases. We’re generally not prepared to pay a premium for a network service. The Microsofts and Intels of the communications value-chain are far more diverse. It’s the Googles, Facebooks, Instagrams, YouTubes, etc that are perceived to deliver most value to end customers today.
If I were looking for a disruptive OSS business model, I wouldn’t be looking to add exciting new features within the existing OSS model. In fact, I’d be looking to avoid our current revenue dependence on network operators (ie the commoditising aspects of the communications value-chain). Instead I’d be looking for ways to contribute to the most valuable aspects of the chain (eg apps, content, etc). Or even better, to engineer a more exceptional comms value-chain than we enjoy today, with an entirely new type of OSS.
The diagram below attempts to show how the entire market (whether that’s the supplier-side or the buyer-side) will absorb a given new feature.
The leaders pick up the concept at T0 and then it takes another few years before the laggards implement it.
Most of us in the OSS implementation world crave to be at the leading edge of change, and I know I do. The right-side of the curve is definitely the sexier side to be on. It’s part of the reason this blog exists – to stay abreast of the exciting new ideas, projects and technologies that are coming through in OSS. Funnily enough, there are probably even people within most of the laggards who are already excited about a new concept not long after T0, but are just unable to implement it until much later.
Supplier sales-pitches also tend to focus on the right side of the curve. That’s where the buzz is. That’s where the premiums are, the rewards for being first to market. It’s the customers on the right-side of the curve that are most attractive as sales targets for many suppliers.
But I also wonder whether the increasing proliferation of tech options within OSS means there’s also increasing inefficiency for suppliers (and possibly buyers) on the right side of the curve? Do we focus all our development efforts on ONAP or [insert any of millions of other alternative platforms, technologies, ideas, etc] today? What if the mass-market goes down an alternate path to the one you’ve chosen? How long before you identify a divergence from the mass-market trend? What’s the impact of changing direction (or not)? Are you bound to spill some blood by playing on the bleeding edge?
The left side of the graph is arguably more predictable. You can already see where the market is trending. Has the whole concept just been hype or has this new thing really made a difference for customers? Most of the implementation hurdles are likely to have already been resolved. Products have matured. More integrations, reports, etc have been developed. Waters have already been charted.
I don’t have the numbers to back this up, but I also have a suspicion that there’s less supplier competition for the business of laggard or follower customers. I’ve seen some companies that have thrived on this model. They get a nice unimpeded ride on the back of the wave whilst everyone else is fighting to catch the front-edge of it.
Chasing the left side of the curve might seem counter-intuitive because it clearly represents a falling market. But there’s always the next wave to jump onto, each with similar predictability and reduced competition.
Not only that, but a majority of the most important OSS use-cases have been around for many years. It’s increasingly difficult to find new functionality that delivers tangible benefits. Whilst other suppliers have jumped off to chase the next big thing, the followers can keep refining their solutions for what matters most.
Let me pose the question this way – Can you think of a single OSS product that is so refined that it can’t do the basics any better than it already does? Nope??
When selecting new applications for an OSS or to augment an existing OSS, it always makes sense to me to run a Proof of Concept. But what do we want to demonstrate in that PoC? For me, we want to run demonstrations of the factors (eg features, use-cases, processes, etc) that justify the investment.
A simple exercise you can use is to identify the personas / roles that interact with the OSS. This could include personas such as NOC operator, strategic planner, network engineer, order entry, field ops, data / analytics, application administrator, etc. The actual personas will differ within each organisation of course.
For each of those personas, we can identify and interview an individual that represents that persona.
Interview questions include:
What are the key responsibilities of your role?
What is the most important goal / KPI for your role?
How does this OSS (or proposed OSS) support you meeting this goal?
Describe the single most important process / function that you perform using the OSS
Why is it so important?
How often do you perform this process / function?
Please provide a short list of other important processes / functions you perform with this OSS
We can then build this into a matrix and seek to prioritise into a set of use-cases. Based on time and cost constraints, we can then build the top-n of those use-cases into implementation scenarios for the PoC.
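As a rough illustration of that matrix, the interview answers can be reduced to a score per use-case and ranked. Everything in this sketch is an assumption made for the example: the persona weights, the 1 to 5 importance scale, the sample answers and the scoring formula itself (persona weight × importance × frequency). Your own prioritisation criteria will almost certainly differ.

```python
from collections import defaultdict

# Hypothetical weights: how much each persona's needs count towards the business case.
PERSONA_WEIGHT = {"NOC operator": 3, "network engineer": 2, "order entry": 1}


def prioritise(interviews, top_n):
    """Score use-cases across personas and return the top_n names.

    interviews: list of (persona, use_case, importance 1-5, uses_per_week).
    Score = persona weight * importance * uses_per_week (an assumed formula).
    """
    scores = defaultdict(float)
    for persona, use_case, importance, uses_per_week in interviews:
        weight = PERSONA_WEIGHT.get(persona, 1)  # unknown personas default to 1
        scores[use_case] += weight * importance * uses_per_week
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Made-up interview results, one row per persona / use-case pair.
interviews = [
    ("NOC operator", "alarm triage", 5, 50),
    ("NOC operator", "raise trouble ticket", 4, 20),
    ("network engineer", "capacity report", 4, 2),
    ("network engineer", "alarm triage", 3, 5),
    ("order entry", "service order capture", 5, 40),
]
poc_scope = prioritise(interviews, top_n=3)
print(poc_scope)
```

The value of top_n is where the time and cost constraints of the PoC come in.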
We had a highly flexible network design team at a previous company. Not necessarily because we wanted to be, but because we were forced to be by the client’s allocation of work.
Our team was largely based on casual workers because there was little to predict whether we needed 2 designers or 50 in any given week. The workload being assigned by the client was incredibly lumpy.
But we were lucky. We only had design work. The lumpiness in design effort flowed down through the work stack into construction, test and deployment teams. The constructors had millions of dollars of equipment that they needed to mobilise and demobilise as the work ebbed and flowed. Unfortunately for the constructors, they’d prepared their rate cards on the assumption of a fairly consistent level of work coming through (it was a very big project).
This lumpiness didn’t work out for anyone in the delivery pipeline, the client included. It was actually quite instrumental in a few of the constructors going into liquidation. The client struggled to meet roll-out targets.
The allocation of work was being made via the client’s B/OSS stack. The B/OSS teams were blissfully unaware of the downstream impact of their sporadic allocation of designs. Towards the end of the project, they were starting to get more consistent and delivery teams started to get into more of a rhythm… just as the network was coming to the end of build.
As OSS builders, we sometimes get so wrapped up in delivering functionality that we can forget that one of the key requirements of an OSS is to operationalise at scale. In addition to UI / CX design, this might be something as simple as smoothing the effort allocation for work under our OSS’s management.
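To make the smoothing idea concrete, here is a minimal sketch of a weekly release cap with carry-over. The demand figures and the capacity of 20 designs per week are invented, and this is not how the client’s B/OSS actually worked. It is just one simple way a work allocator could level the load.

```python
def smooth_allocation(weekly_demand, capacity):
    """Release at most `capacity` designs per week, carrying the rest forward.

    weekly_demand: designs arriving each week (the lumpy input).
    Returns (released_per_week, backlog_left_over).
    """
    released, backlog = [], 0
    for arriving in weekly_demand:
        backlog += arriving
        out = min(backlog, capacity)  # never release more than the teams can absorb
        released.append(out)
        backlog -= out
    return released, backlog


# Made-up demand that swings between 0 and 50 designs a week.
lumpy = [50, 2, 0, 40, 3, 25]
released, leftover = smooth_allocation(lumpy, capacity=20)
print(released, leftover)
```

The trade-off is that some work is released later than it arrived, so the cap needs to be set against roll-out targets as well as downstream capacity.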
Interesting table below in relation to the customer satisfaction and costs of delivering various styles of customer assurance activities:
The ambition for any organisation is to perform a shift to the left on this table. In other words, to introduce assurance mechanisms that increase the likelihood of an event being captured towards the left of the table (ie before becoming a field operations issue to solve). In theory, every shift left results in greater customer satisfaction and reduced cost to the operator.
Of course it’s a generic table (eg some proactive assurance programs can be higher than a “low” cost classification), but it does tell a story.
Our OSS cover the full scope of this table. Our OSS don’t perform L1/2/3 assurance or Field Ops, but they certainly help to coordinate and manage those activities.
If you were to use this table to classify your operational costs, what does the cost profile look like? Is it heavily weighted to the right side of the table? Does your operational cost profile justify further investment in your OSS to shift some of those costs to the left?
This post from sysaid has some further shift-left concepts as they relate to service management within IT.
Ciena Corporation has entered into a definitive agreement to acquire privately-held DonRiver, a global software and services company specializing in federated network and service inventory management solutions within the service provider Operational Support Systems (OSS) environment.
DonRiver will bring new capabilities to Ciena’s Blue Planet software and services portfolio that significantly enhance the company’s ability to deliver on its Adaptive Network vision through intelligent, closed-loop automation. Specifically, with the addition of DonRiver’s federated network and service inventory management solutions, Ciena’s Blue Planet capabilities will extend beyond network orchestration and control to also provide a unified inventory view of all elements across a provider’s network. Additionally, the DonRiver team of specialized OSS software, integration and consulting experts will complement and scale the Blue Planet organization to form a truly unique and specialized services group that is able to manage modernization projects across both IT and network operations.
“The combination of Blue Planet and DonRiver will enhance our ability to deliver closed loop automation of network services and the underlying operational processes across IT/operations and the network,” said Rick Hamilton, senior vice president of Global Software and Services at Ciena. “With this new set of technology and expertise, we can help customers realize the full benefits of network automation by helping them move away from highly complex and fragmented OSS environments to those that accurately reflect the real-time state and utilization of network resources.”
The transaction is expected to close during Ciena’s fiscal fourth quarter 2018 and is subject to customary closing conditions.
“The more data you have, the more data you need to understand the data you have. You are engaged in a data ponzi scheme…Could it be in service assurance and IT ops that more data equals less understanding?”
Phil Tee in the opening address at the AIOps Symposium.
Interesting viewpoint, right?
Given that our OSS hold shed-loads of data, Phil is saying we need lots of data to understand that data. Well, yes… and possibly no.
I have a theory that data alone doesn’t talk, but it’s great at answering questions. You could say that you need lots of data, although I’d argue in semantics that you actually need lots of knowledge / awareness to ask great questions. Perhaps that knowledge / awareness comes from seeding machine-led analysis tools (or our data scientists’ brains) with lots of data.
The more data you have, the more noise you need to sift through to find the signal. That means you have to ask more questions of your data if you want to drive a return that justifies the cost of collecting and curating it all. Machine-led analytics certainly assist us in handling the volume and velocity of data our OSS create / collect. That’s just asking the same question/s over and over. There’s almost no end to the questions that can be asked of our data, just a limit on the time in which we can ask them.
Does that make data a Ponzi scheme? A Ponzi scheme pays profits to earlier investors using funds obtained from newer investors. Eventually it must collapse because the scheme runs out of new investors to fund profits. A data Ponzi scheme pays out insights from earlier (seed) data by obtaining new (streaming) data. The stream of data reaching an OSS never runs out. If we need to invest heavily in data (eg AI / ML, etc), at what point in the investment lifecycle will we stop creating new insights?
Comarch and LG U+ have signed a major contract in a bid to completely revamp the Korean operator’s network resource and service management, covering Operations Support & Readiness, Fulfilment and Assurance domains, in preparation for the operator’s planned large-scale 5G rollout.
The upcoming implementation will allow LG U+ to migrate from its old in-house solution to a modern and comprehensive telco ecosystem.
Comarch will oversee the implementation of a complete stack of solutions consolidating the existing tools landscape into one unified, scalable platform in the areas of mobile and fixed networks. LG U+’s goals in the project are to optimize internal company processes, and to improve the overall end-user experience.
The planned OSS stack overhaul will also be instrumental in the Korean operator’s plans to launch one of the first commercial 5G networks. While supporting the operator in pursuing the latest network technology, the Comarch system will also serve 3G, 4G and fixed network domains.
The solution delivered by Comarch will help LG U+ break IT architecture silos, prepare for efficient fulfilment of modern, 5G-based services, increase network management effectiveness and cut its costs through automation. It will also provide the tools to create logical connectivity layers in a unified format, support network virtualization, and handle the monitoring of network, service and customer layers. Comarch OSS will also empower LG U+ Intelligent Assurance & Analytics including an embedded machine learning engine.
The LG U+ contract is an important milestone for Comarch. Supporting one of the first deployments of a commercial 5G network puts our company at the true forefront of innovation. The delivery of our OSS platform, which comprises close to 20 modules, will bring our customer a world-class, integrated solution enabling the efficient management of services delivered via mobile and fixed networks. Additionally, a major implementation for a key Korean mobile carrier will definitely help us expand our presence in the Asian markets – noted Jacek Lonc, EVP Sales Telco Division at Comarch.
At LG U+ we currently use an in-house developed OSS stack. As the current IT architecture is silo-based, we experience a number of challenges regarding the introduction of new technologies such as 5G and network virtualization. The successful implementation of Comarch’s comprehensive platform will enable us to achieve a competitive advantage and increase business process efficiency – noted Hokyung Kwon, NMS Development Team Leader at LG U+.
Ericsson has agreed to acquire 100 percent of the shares in CENX, boosting Ericsson’s Operations Support Systems (OSS) portfolio with vendor-agnostic service assurance and closed-loop automation capability. Ericsson has held a minority stake in CENX since 2012.
Ericsson has a market leading position in NFV and orchestration. This capability will be further enhanced with CENX’s closed-loop automation and service assurance capabilities. To unleash the potential of 5G, telecom operators need to leverage network virtualization and orchestrate and automate network slices to serve the needs of enterprise customers towards their digital transformation – all while reducing operational costs.
Mats Karlsson, Head of Solution Area OSS, Ericsson, says: “Dynamic orchestration is crucial in 5G-ready virtualized networks. By bringing CENX into Ericsson, we can continue to build upon the strong competitive advantage we have started as partners. I look forward to meeting and welcoming our new colleagues into Ericsson.”
Closed-loop automation ensures Ericsson can offer its service provider customers an orchestration solution that is optimised for 5G use cases like network slicing, taking full advantage of Ericsson’s distributed cloud offering. Ericsson’s global sales and delivery presence – along with its strong R&D – will also create economies of scale in the CENX portfolio and help Ericsson to offer in-house solutions for OSS automation and assurance.
Ed Kennedy, CEO, CENX, says: “Ericsson has been a great partner – and for us to take the step to fully join Ericsson gives us the best possible worldwide platform to realize CENX’s ultimate goal – autonomous networking for all. Our closed-loop service assurance automation capability complements Ericsson’s existing portfolio very well. We look forward to seeing our joint capability add great value to the transformation of both Ericsson and its customers.”
CENX, founded in 2009, is headquartered in Jersey City, New Jersey. The company achieved significant year-over-year revenue growth in the fiscal year that ended December 31, 2017. CENX employs 185 people.
The transaction is subject to customary regulatory approvals.
I’d like to call in a favour today if I may. I’m on the hunt for any existing use-cases and / or project sites that have integrated a significant sensor network into their OSS and existing operational processes.
That includes a strategy for handling IoT-scale integration of data collection, event / alarm processing, device management, data contextualization, data analytics, end-to-end security and applications management / enablement within existing OSS tools.
I’m looking for examples where an OSS had previously managed thousands of (network) devices and is now managing hundreds of thousands of (IoT) devices. Not necessarily IoT devices of customers as services but within an operator’s own network.
Obviously that’s an unprecedented change in scale in traditional OSS terms, but will be commonplace if our OSS are to play a part in the management of large sensor networks in the future.
There’s an element of mutual exclusivity between what an IoT management platform and an OSS need to do, but there are also some similarities. I’d love to speak with anyone who has actually bridged the gap.
“If your partners don’t have to talk to you then you win.”
Put another way, the best form of customer service is no customer service (ie your customers and/or partners are so delighted with your automated offerings that they have no reason to contact you). They don’t want to contact you anyway (generally speaking). They just want to consume a perfectly functional and reliable solution.
In the deep, distant past, our comms networks required operators. But then we developed automated dialling / switching. In theory, the network looked after itself and people made billions of calls per year unassisted.
Something happened in the meantime though. Telco operators the world over started receiving lots of calls about their platform and products. You could say that they’re unwanted calls. The telcos even have an acronym called CVR – Call Volume Reduction – that describes their ambitions to reduce the number of customer calls that reach contact centre agents. Tools such as chatbots and IVR have sprung up to reduce the number of calls that an operator fields.
Network as a Service (NaaS), the context within Guy’s comment above, represents the next new tool that will aim to drive CVR (amongst a raft of other benefits). NaaS theoretically allows customers to interact with network operators via impersonal contracts (in the form of APIs). The challenge will be in the reliability – ensuring that nothing falls between the cracks in any of the layers / platforms that combine to form the NaaS.
In the world of NaaS creation, Guy is exactly right – “If your partners [and customers] don’t have to talk to you then you win.” As always, it’s complexity that leads to gaps. The more complex the NaaS stack, the less likely you are to achieve CVR.
When we’re planning a new OSS, we tend to be focused on the production (PROD) environment. After all, that’s where its primary purpose is served, to operationalise a network asset. That is where the majority of an OSS’s value gets created.
But we also need some (roughly) equivalent environments for separate purposes. We’ll describe some of those environments below.
By default, vendors will tend to only offer licensing for a small number of database instances – usually just PROD and a development / test environment (DEV/TEST). You may not envisage that you will need more than this, but you might want to negotiate multiple / unlimited instances just in case. If nothing else, it’s worth bringing to the negotiation table, even if it gets shot down because budgets are tight and / or vendor pricing for extra environments is inflexible.
Examples where multiple instances may be required include:
Production (PROD) – as indicated above, that’s where the live network gets managed. User access and controls need to be tight here to prevent catastrophic events from happening to the OSS and/or network
Disaster Recovery (DR) – depending on your high-availability (HA) model (eg cold standby, primary / redundant, active / active), you may require a DR or backup environment
Sandpit (DEV / TEST) – these environments are essential for OSS operators to be able to prototype and learn freely without the risk of causing damage to production environments. There may need to be multiple versions of this environment depending on how reflective of PROD they need to be and how viable it is to take refresh / updates from PROD (aka PROD cuts). Sometimes also known as non-PROD (NP)
Regression testing (REG TEST) – regression testing requires a baseline data set to continually test and compare against, flagging any variations / problems that have arisen from any change within the OSS or networks (eg new releases). This implies a need for data and applications to be shielded from the constant change occurring on other types of environments (eg DEV / TEST). In situations where testing transforms data (eg activation processes), REG TEST needs to have the ability to roll-back to the previous baseline state
Training (TRAIN) – your training environments may need to be established with a repeatable set of training scenarios that also need to be re-set after each training session. This should also be separated from the constant change occurring on dev/test environments. However, due to a shortage of environments, and the relative rarity of training needed at some customers, TRAIN often ends up as another DEV or TEST environment
Production Support (PROD-SUP) – this type of environment is used to prototype patches, releases or defect fixes (for defects on the PROD environment) prior to release into PROD. PROD-SUP might also be used for stress and volume testing, or SVT may require its own environment
Data Migration (DATA MIG) – At times, data creation and loading needs to be prototyped in a non-PROD environment. Sometimes this can be done in PROD-SUP or even a DEV / TEST environment. On other occasions it needs its own dedicated environment so as to not interrupt BAU (business as usual) activities on those other environments
System Integration Testing (SIT) – OSS integrate with many other systems and often require dedicated integration testing environments
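To make the REG TEST idea above a little more concrete, here is a minimal Python sketch of "compare against a baseline, then roll back". The RegTestEnvironment class, its method names and the service records are all hypothetical, purely for illustration. A real OSS would more likely rely on database snapshots or environment refreshes than on in-memory copies.

```python
import copy


class RegTestEnvironment:
    """Hypothetical stand-in for a REG TEST data set (not a vendor API)."""

    def __init__(self, baseline):
        # Keep a private copy of the baseline so tests can't alter it.
        self._baseline = copy.deepcopy(baseline)
        self.data = copy.deepcopy(baseline)

    def diff_from_baseline(self):
        """Return {key: (baseline_value, current_value)} for every variation."""
        keys = set(self._baseline) | set(self.data)
        return {
            key: (self._baseline.get(key), self.data.get(key))
            for key in sorted(keys)
            if self._baseline.get(key) != self.data.get(key)
        }

    def roll_back(self):
        """Restore the working data to the baseline state."""
        self.data = copy.deepcopy(self._baseline)


# Assumed example records: service id -> activation state.
env = RegTestEnvironment({"svc-001": "inactive", "svc-002": "active"})
env.data["svc-001"] = "active"  # eg an activation test transforms the data
variations = env.diff_from_baseline()
env.roll_back()
```

After the roll-back, the next test run starts from the same baseline, which is the property that separates REG TEST from the constantly changing DEV / TEST environments.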
Am I forgetting any? What other environments do you find to be essential on your OSS?
I was lucky enough to get some time with a friend recently, a friend who’s running a machine-learning network assurance proof-of-concept (PoC).
He’s been really impressed with the results coming out of the PoC. However, one of the really interesting factors he’s been finding is how frequently BAU (business as usual) changes in the OSS data (eg changes in naming conventions, topologies, etc) would impact results. Little changes made by upstream systems effectively invalidated the baselines that the machine-learning engines had learned to key in on. Those little changes meant the engine had to re-baseline / re-learn to build back up to previous insight levels. Or, to avoid invalidating the baseline, it would require re-normalising all of the data prior to the identification of BAU changes.
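As a rough illustration of the re-normalising option, the Python sketch below maps device names to a canonical key before any baseline is built, so that a purely cosmetic naming-convention change doesn't invalidate it. The normalise_name rule, the baseline_counts helper and the device names are assumptions made up for this example, not anything from my friend's PoC. It also does nothing for topology changes, which are a much harder problem.

```python
import re


def normalise_name(raw):
    """Map naming variants to one canonical key (hypothetical convention)."""
    # Lower-case, then collapse any run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


def baseline_counts(device_names):
    """Count events per normalised device name (a toy 'baseline')."""
    counts = {}
    for name in device_names:
        key = normalise_name(name)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Same devices, before and after an assumed upstream naming change.
before = baseline_counts(["SYD_RTR_01", "SYD_RTR_01", "MEL_RTR_02"])
after = baseline_counts(["syd-rtr-01", "syd-rtr-01", "mel-rtr-02"])
```

Both calls produce the same counts here, so the toy baseline survives the rename without any re-learning.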
That got me wondering whether DevOps (or any other high-change environment) might actually hinder our attempts to get machine-led assurance optimisation. But more to the point, does constant change (at all levels of a telco business) hold us back from reaching our aim of closed-loop / zero-touch assurance?
Just like the proverbial self-driving car, will we always need someone at the wheel of our OSS just in case a situation arises that the machines haven’t seen before and/or can’t handle? How far into the future will it be before we have enough trust to take our hands off the OSS wheel and let the machines drive closed-loop processes without observation by us?
Is it just me or has there been a proliferation of superhero movies coming out at cinemas lately? Not only that, but movies where teams of superheroes link up to defeat the baddies (eg Deadpool 2, Justice League, etc)?
The thing that strikes me as interesting is that there’s rarely an overlap of super-powers within the team. They all have their different strengths and points of difference. The sum of the parts… blah, blah, blah.
Anyway, I’m curious whether you’ve noticed the same thing as me on OSS projects, that when there are multiple team members with significant skill / experience overlap, the project can bog down in indecision? I’ve noticed this particularly when there are many architects, often super-talented ones, on a project. Instead of getting the benefit of collaboration of great minds, we can end up with too many possibilities (and possibly egos) to work through and the project stagnates.
If you were to hand-pick your all-star cast for your OSS Justice League, just like in the movies, you’d look for a team of differentiated, but hopefully complementary, super-heroes I assume. But I’m diverting away from my main point here.
Each project, just like each formidable foe in the movies, is slightly different and needs slightly different super-powers to tackle it. When selecting a cast for a movie, directors have a global pool to choose from. When selecting a cast for an OSS project, directors have traditionally chosen from their own organisation, possibly with some outside hires to fill the long-term gaps.
With the increasing availability of freelance resources (ie people who aren’t intrinsically tied to carriers or vendors), the proposition of selecting a purpose-built project team of OSS super-heroes is actually beginning to become more possible. I’m wondering how much the gig economy will change the traditional OSS project team model in coming years.
I’d love to hear your thoughts and experiences on this.
The diagram below comes from a presentation by Corey Clinger. It describes Telstra’s Operational Domain Manager (ODM) model that is a key component of their Network as a Service (NaaS) framework. Notice the API stubs across the top of the ODM? Corey went on to describe the TM Forum Open API model that Telstra is building upon.
In a following session, Raman Balla indicated a perspective that differs from many existing OSS. The service owner (and service consumer) must know all aspects of a given service (including all dimensions, lifecycle, etc) in a common repository / catalog and it needs to be attribute-based. Raman also indicated that the aim he has for architecting NaaS is to not only standardise the service, but the entire experience around the service.
In the world of NaaS, operators can no longer just focus separately on assurance or fulfillment or inventory / capacity, etc. As per DevOps, operators are accountable for everything.
The traditional (aka waterfall) approach to delivering an OSS project sees one big-bang delivery of business value at the end of the implementation.
The yellow arrows indicate the sequential nature of this style of delivery. The implications include:
If the project runs out of funds before the project finishes, no (or negligible) value is delivered
If there’s no modularity of delivery then the project team must stay the course of the original project plan. There’s no room for prioritising or dropping or including delivery modules. Project plans are rarely perfect at first after all
Any changes in project plan tend to have knock-on effects into the rest of the delivery
There is only one true delivery of value, but milestones demonstrate momentum for the project… a key for change management and team morale
Large deliverables tend to overload one segment of the project delivery team while under-utilising the rest at each stage. This isn’t great for project flow or team utilisation
The alternate approach seeks to deliver in multiple phases by business value, not artefacts, as shown in the sample model below:
Phased enhancements following a base platform build (eg Sandpit and/or Single-site above) could include the following, where each provides a tangible outcome / benefit for the business, thus maintaining perception of momentum (assurance use-cases cited):
Additional event collection (ie additional collectors / probes / mediation-devices can be added or configured)
Additional filters / sorting of events
Event prioritisation mapping / presentation
Root-cause analysis (intra, then inter-domain)
Other configurations such as latching, auto-acknowledgement, visualisation parameters, etc
Heart-beat function (ie devices are unreachable for a user-defined period)
Knowledge base (ie developing a database of activities to respond to certain events)
Interfacing with other systems (eg trouble-ticket, work-force management, inventory, etc)
Setup of roles/groups
Setup of skills-based routing
Setup of reporting
Setup of notifications (eg email, SMS, etc)
Naming convention refinements
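To show how small some of these phased enhancements can be, here is a minimal Python sketch of the heart-beat function from the list above: flag any device that hasn't been heard from within a user-defined period. The unreachable_devices function, the device names and the five-minute threshold are all assumptions for illustration. A real assurance tool would feed this from its collectors / probes rather than a hard-coded dictionary.

```python
from datetime import datetime, timedelta


def unreachable_devices(last_seen, now, threshold):
    """Return the names of devices not heard from within `threshold`."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > threshold
    )


# Assumed sample data: device name -> time of last received event.
now = datetime(2024, 1, 1, 12, 0, 0)
last_seen = {
    "router-a": now - timedelta(minutes=2),
    "router-b": now - timedelta(minutes=12),
}

# The user-defined period is five minutes in this example.
overdue = unreachable_devices(last_seen, now, timedelta(minutes=5))
```

Each item in the list could be scoped in a similar way, as a self-contained piece of work with a visible outcome for the operations team.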
This phased approach is a more Agile-style breakdown of work, but doesn’t need to be delivered using Agile methodology.
Of course there are pros and cons of each approach. I’d love to hear your thoughts and experiences with different OSS delivery approaches.