The overlaps of DCIM with inventory, asset and config management

A regular reader of the PAOSS blog recently wrote, “I follow your blog with passion; the latest posts about Inventory are great [Ed. the reader is talking about this post about LNI and PNI and this one about Inventory vs Asset vs CMDB Management]. I ask if you could possibly do a post on Inside Plant vs Outside Plant vs Virtual network creation… we usually use CAD-based tools for Inside Plant design, both for TLC equipment, cabling, cross-connection, Distribution Frames, rooms, virtual rooms, row structures, etc, but also for power, conditioning, lighting, etc. We also use Network Inventory for Data Centre and server farm modelling. Outside Plant typically deals with GIS tools for cabling infrastructure. And now virtualisation of the Network is also coming with NFV and SDN. What do you think about this?”

Great question.

In the post about Inventory vs Asset vs CMDB, we used the following Venn Diagram:

Unfortunately, there’s another circle that’s not shown on this diagram, but should be – the DCIM (Data Centre Infrastructure Management) circle. The overlaps between OSS and DCIM partially answer the questions above. We wrote a 5 part series on DCIM back in 2014 (part one, two, three, four, five), so perhaps it’s time for a re-visit.

The last of those five posts even included another Venn Diagram, as follows:

OSS, DCIM, ITSM Venn Diagram

Data Centre Infrastructure Management (DCIM) shares much of its DNA with OSS, but also has a number of unique differences.

Similarities:

  • IT and network device / inventory management
  • CSPs and Data Centres tend to have many Enterprise customers, and therefore a need to align with their IT service and life-cycle management (ITIL / ITSM) methodologies
  • Electronic data collection and storage to support fulfillment and assurance workflows
  • Analytics and operational decision support
  • Planning and design tools
  • Predictive modelling
  • Process and change management
  • Capacity planning, resource allocation and provisioning

Differences (ie what Data Centres have that traditional CSP networks don’t):

  • Facilities / Building Management Systems (FMS/BMS)
  • Energy / Power management
  • Environment and heat management (HVAC) including management of hot/cold zones
  • Data Centres tend to have less outside plant or inter-site connectivity* (ie most power and network connectivity tends to reside within the Data Centres)
  • However, Data Centre cable management has some slight differences. Network links are more likely to be managed within 3D spatial systems (x, y and height), if at all, rather than the 2D (x and y coordinates) typically plotted by most OSS inventory via GIS (Geographical Information Systems) or CAD (Computer Aided Design) drawings. Data Centre cables tend to be run in spatially-dense above-rack or below-floor trayways. By comparison, cables between sites tend to be less dense and at a fairly consistent height (eg a standard depth underground or a standard height when mounted on towers/poles aboveground)
  • Alternatively, DCs may manage spatial infrastructure through naming conventions such as rooms, rack-rows, racks and rack-positions rather than 3D spatial systems (see the sketch after this list)
  • Data Centres have traditionally had a higher proportion of virtualised assets than traditional CSPs, although that is now changing with the operator network embracing network virtualisation
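To make the naming-convention idea concrete, here's a tiny sketch of a canonical DC location key standing in for full 3D coordinates. The format is purely illustrative; real conventions vary between organisations:

```python
def dc_location(room: str, row: str, rack: str, ru: int) -> str:
    """Build a canonical location string for DC infrastructure, eg
    'DC1-ROOM2/ROW-C/RACK-07/RU-22', as a stand-in for 3D coordinates."""
    return f"{room}/{row}/{rack}/RU-{ru:02d}"

print(dc_location("DC1-ROOM2", "ROW-C", "RACK-07", 22))
```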

 

So let’s now look at how it “might” all hang together (noting that each company is likely to be different depending on their systems and processes):

  • DCIM manages facilities, building, power / PLCs and heating/cooling/HVAC
  • PNI manages physical connectivity (between sites and within the DC) as it can generally manage connectivity to physical ports on patch-panels / frames and physical devices (eg switches and routers) inside the DC. PNI also handles splicing and patching. PNI tools can generally also manage power cabling, although not everyone uses PNI for this
  • LNI (in conjunction with EMS [Element Management Systems] and virtual resource managers) will tend to manage the virtual / logical networks including resource management and orchestration
  • LNI will also tend to provide topological views of the network (often point-to-point links between physical/logical ports rather than the cable routes shown in PNI). LNI may also potentially include rack layouts and other forms of network visualisation. However, LNI tends to only partially show spatial presentation of the data (eg physical locations of “circuit” end-points rather than spatial location of all racks and equipment in 3D)
  • Related compute / storage infrastructure could be managed by DCIM, LNI, VIM, etc
  • And any of this could be cross-referenced as assets in the Asset Management System and/or Configuration Management Database (CMDB)

I can see that CAD might still be required for trayway, HVAC ducting, etc, because PNI isn’t really designed with this kind of 3D modelling in mind.

Having said that, I’d probably still attempt to get all connectivity and supporting infrastructure designed into a spatial visualisation tool like PNI rather than CAD. After all, connectivity of any type can be modelled as nodes and arcs (same as PNI). It’s just that ducting tends to have a greater 3D heft than the single line / arc of a typical comms cable.

Why is it important to have this data in a single spatial system rather than CAD? Well, I figure it should help future augmented reality (AR) use-cases like the ones described in the link.

So here’s the updated diagram:

* There are of course multi-site DC organisations that have links between their sites, but they tend to outsource their long-haul network links to traditional carriers.

Softwarisation of 5G

As you have undoubtedly noticed, 5G is generating quite a bit of buzz in telco and OSS circles.

For many it’s just an n+1 generation of mobile standards, where n is currently 4 (well, the number of recent introductions into the market means n is probably now getting closer to 5  🙂  ).

But 5G introduces some fairly big changes from an OSS perspective. As usual with network transformations / innovations, OSS/BSS are key to operationalising (ie monetising) the tech. This report from TM Forum suggests that more than 60% of revenues from 5G use-cases will be dependent on OSS/BSS transformation.

And this great image from the 5G PPP Architecture Working Group shows how the 5G architecture becomes a lot more software-driven than previous architectures. Interesting how all 5 “software dimensions” are the domain of our OSS/BSS isn’t it? We could replace “5G architecture” with “OSS/BSS” in the diagram below and it wouldn’t feel out of place at all.

So, you may be wondering in what ways 5G will impact our OSS/BSS:

  • Network slicing – being able to carve up the network virtually, to generate network slices that are completely different functionally, means operators will be able to offer tailored, premium service offerings to different clients. This differs from the one-size-fits-all approach used previously. However, it also means OSS/BSS complexity increases. It’s almost like you need an OSS/BSS stack for each network slice. Unless we can create massive operational efficiencies through automation, the cost to run the network will increase significantly. Definitely a no-no for the execs!!
  • Fibre deeper – since 5G will introduce increased cell density in many locations, and offer high throughput services, we’ll need to push fibre deeper into the network to support all those nano-cells, pico-cells, etc. That means an increased reliance on good outside plant (PNI – Physical Network Inventory) and workforce management (WFM) tools
  • Software defined networks, virtualisation and virtual infrastructure management (VIM) – since the networks become a lot more software-centric, that means there are more layers (and complexity) to manage.
  • Mobile Edge Compute (MEC) and virtualisation – 5G will help to serve use-cases that may need more compute at the edge of the radio network (ie base stations and cell sites). This means more cross-domain orchestration for our OSS/BSS to coordinate
  • And other use-cases where OSS/BSS will contribute including:
    • Multi-tenancy to support new business models
    • Programmability of disparate networks to create a homogenised solution (access, aggregation, core, mobile edge, satellite, IoT, cloud, etc)
    • Self-healing automations
    • Energy efficiency optimisation
    • Monitoring end-user experience
    • Zero-touch administration aspirations
    • Drone survey and augmented reality asset management
    • etc, etc

Fun times ahead for OSS transformations! I just hope we can keep up and allow the operator market to get everything it wants / needs from the possibilities of 5G.

The differences between Inventory, Asset and Config Management in an OSS

We recently discussed the differences between PNI (Physical Network Inventory) and LNI (Logical Network Inventory) solutions that appear as part of many OSS stacks. 

As promised, today we’ll talk about the subtle differences between:

  • Inventory Management Systems 
  • Asset Management Systems and
  • Configuration Management Databases (CMDB)
  • We might even discuss Virtual Infrastructure Managers (VIM) and Resource Managers, as well as Config Managers (different from CMDB), too

Inventory vs Asset vs CMDB

To be honest, the diagram above doesn’t show adequate overlap. Each of these systems has a slightly different purpose, usually for a slightly different set of personas. However, they all play a part in managing the resources that make up an organisation’s Active Network (the network segment dedicated to carrying customer traffic, as opposed to internal corporate traffic).

Let’s start with Inventory Management Systems (IMS) because IMHO, these are the tools that were traditionally responsible for managing service-provider networks. These are the tools typically used by network planners, network engineers, capacity planners and other back-office operational staff.  As mentioned in the link above, these tools can be further broken down into:

  • PNI (Physical Network Inventory) – The physical devices like switches, routers, firewalls as well as the outside plant (OSP) like cables, joints, etc. Generally only used by operators with large, wide-spread networks of physical assets, especially outside plant.
  • LNI (Logical Network Inventory) – The set of objects that are formed using physical infrastructure (and possibly associations to other logical objects). This could include circuits, VLANs, and other overlay network topologies as well as the management of attributes like bandwidth, protocols and other network functionality

These tools tend to focus on the key physical/logical/virtual resources that comprise an operator’s active network (AN). However, they often also support functionality that crosses into other domains such as asset and config management.

Asset Management Systems (AMS), as the name implies, have a more “financial” purpose; where assets are objects of intrinsic financial value to an organisation. AMS tools tend to be used by the accounting and asset management teams. They’re used to track current value (purchase price minus depreciation), warranties, spares management, life-cycles / refresh / end-of-life of assets and their contracts, as well as reactive and predictive maintenance. AMS will tend to store information about most of the Active Network Physical devices. This means they will have records for the same devices as PNI, but often with different information / attributes. They won’t tend to store LNI-related data. However, AMS will often keep information about assets in addition to Active Network devices. This could include software licenses and more.

The Configuration Management Database (CMDB) is more of an IT Service Management (ITSM) concept. Like many IT concepts, ITSM has been increasingly used in parts of service provider networks. CMDBs are databases of Configuration Items (CIs), where CIs can be logical or physical entities. CIs may (or may not) be physical devices (PNI) or logical resource entities (LNI) and may (or may not) represent tangible value (assets). The main purpose of CIs is to store information about IT services that will allow other ITSM processes, such as Incident, Problem and Change Management, to be performed efficiently.

Not only is there functional overlap between these systems, there’s often also terminology overlap and/or misalignments. Different vendors have different levels of functionality and support alternate use-cases, so the areas of overlap differ between organisations.

Oh, and I also promised to mention VIMs and Config Managers:

Virtual Infrastructure Managers (VIM) are responsible for managing the virtual resources made available by physical infrastructure like compute, storage and network devices. In some cases, VIMs instantiate virtual network functions (VNFs) or virtual machines (VMs) that can look almost identical to any other device stored in LNI, PNI, AMS and/or CMDB. In fact, instances of these VNFs and VMs may even appear in those systems.

Config Management (as opposed to, but also potentially overlapping with, CMDB) is all about managing the configurations of devices in the network (often both active network and corporate network). Each device, such as a router, has a configuration that tells the hardware how to function, where to route traffic, which packets to prioritise, where to send management logs (to the OSS), etc. Being able to monitor and manage these configurations centrally and consistently is the purpose of Config Managers. These are mostly used by network engineers to set policies and golden-configs (ie the config templates that all devices of that type must adhere to consistently). For example, you may have hundreds/thousands of devices in your network and want to re-point all management traffic to a new server as part of an OSS upgrade. Rather than configuring each device separately and manually, you can use the config management tool to push config changes out to the network.
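To make that example more tangible, here's a minimal sketch of a golden-config push. It assumes the open-source netmiko library; the device list, credentials and syslog server address are hypothetical placeholders:

```python
from netmiko import ConnectHandler

# Hypothetical device records; in practice these would come from the
# inventory / CMDB rather than a hard-coded list.
DEVICES = [
    {"device_type": "cisco_ios", "host": "10.0.0.1"},
    {"device_type": "cisco_ios", "host": "10.0.0.2"},
]

# The "golden config" fragment every device should converge on, eg
# re-pointing management logs at a new OSS server (address is made up).
GOLDEN_LOGGING_CONFIG = ["logging host 192.0.2.50"]

def push_golden_config(devices, config_lines, username, password):
    """Push the same config fragment to every device in the list."""
    for device in devices:
        conn = ConnectHandler(username=username, password=password, **device)
        try:
            output = conn.send_config_set(config_lines)
            print(f"{device['host']}:\n{output}")
        finally:
            conn.disconnect()

# push_golden_config(DEVICES, GOLDEN_LOGGING_CONFIG, "oss_admin", "********")
```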

Leave us a message to describe how your organisation uses these (and other) tools.

OSS discovers a network

Following yesterday’s post about OSS Inventory, I received another great follow-up question from another avid reader of the PAOSS blog:

Interesting thoughts Ryan! In addition to ‘faults up’, perhaps there is a case also (obvious?) for ‘discovery up’ to capture ongoing non-planned changes? Wondering, have you come across any sort of reconciliation / adaptive inventory patterns like this? Workflow based? Autonomous? (Going too far into chaos theory territory?)

Yes, we did exactly that with the same tool discussed yesterday, which I used back in 2000. In fact, a very clever dev and I got that company’s first-ever auto-discovery tool working on site (using a product supplied by head-office). Discovering the nodal elements (ie equipment, cards, ports) was fairly easy. Discovering the connectivity within a domain (we started with SDH) was tricky, but achievable. Auto-discovering cross-domain connectivity (ie DSL circuits through physical, SDH transit, ATM and logical connectivity onto the IP cloud) was much trickier, as we needed to find/make linking keys across different data sources.

It was definitely workflow based, with a routine-driven back-end. We didn’t just want anything that was discovered to be automatically stuffed into (or removed from) the database (think flapping ports or equipment going down temporarily). It could’ve been autonomous, but we introduced a manual step to approve any of the discoveries made by each automated discovery iteration.
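Here's a rough sketch of that staged-approval pattern. All names are hypothetical, and the discovery sources and inventory database are stubbed out as plain dicts keyed by a linking key:

```python
from dataclasses import dataclass, field

@dataclass
class DiscoveryCandidate:
    key: str          # linking key, eg "site/device/card/port"
    action: str       # "add" or "remove"
    payload: dict = field(default_factory=dict)

def reconcile(discovered: dict, inventory: dict) -> list[DiscoveryCandidate]:
    """Diff the discovered network state against the inventory DB, returning
    candidates for human approval rather than auto-committing (which protects
    against flapping ports / equipment going down temporarily)."""
    candidates = []
    for key, payload in discovered.items():
        if key not in inventory:
            candidates.append(DiscoveryCandidate(key, "add", payload))
    for key in inventory:
        if key not in discovered:
            candidates.append(DiscoveryCandidate(key, "remove"))
    return candidates

def apply_approved(candidates, approved_keys, inventory):
    """Commit only the candidates a human operator has approved."""
    for c in candidates:
        if c.key not in approved_keys:
            continue
        if c.action == "add":
            inventory[c.key] = c.payload
        else:
            inventory.pop(c.key, None)
```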

As you know, modern networks / EMS / VIM (resource managers) are much more discoverable. They need to be for modern orchestration and resilience techniques. I don’t think it would be quite so tricky to stitch circuits together as we’re no longer so circuit-oriented as back in 2000.

However, I’d be fascinated to hear from other readers how much of a problem they have trying to marry up different data sources for discovery purposes. I’d also love to hear whether they’re using fully autonomous discovery or using the manual intervention step for the same reason we were. I imagine most are automating, because orchestration plans just need to make use of whatever resources are being presented by the underlying resource managers in near-real-time.

PS. For those wondering what “discovery” is, it’s shown in the lower grey arrow in this diagram from “Orders down, Faults up.”

Discovery is the process that allows data to be passed from NMS/EMS/NEs (ie the network or resource managers) directly into the inventory management database. It should be a more reliable and expedient way of synchronising the inventory with the live network.

The reason for the upper grey arrow is because not all networks have APIs that can be “discovered.” Passive equipment like cable joints and patch-panels don’t have programmatic interfaces. Therefore we need to find other ways to get that data into the Inventory Manager.

Various forms of OSS Inventory

After reading other recent posts such as “Orders Down, Faults Up” and “How is OSS/BSS service and resource availability supposed to work?” an avid reader of the PAOSS blog posed the following brilliant question:

Do you have any thoughts on geospatial vs non geospatial network inventory systems? How often do you see physical plant mapping in a separate system from network inventory, with linkages or integrations between them, vs how often do you see physical and logical inventory being captured primarily in a geospatially oriented system?

Boy do I ever have some thoughts on this topic!! I’m sure you do too, so I’d love to hear what you think in the comments section below.

I was lucky. The first OSS/BSS that I worked on (all the way back in 2000), had both geo and non-geo (topology) views. It also had a brilliantly flexible data model that accommodated physical and logical inventory. All tightly integrated into one package. There aren’t many tools that can do all of that even today. Like I said, I was lucky to have this as a starting point!!

Like all things OSS/BSS, it starts with the personas and the key tasks they need to perform. Or from the supplier’s perspective, which customer personas they’re most actively targeting.

For example, if you have a significant Outside Plant (OSP) Network, then geo-positioning is vital. The exchanges and comms huts are easy enough to find, but pits, cable routes, easements, etc are often harder to find. It’s not uncommon for a field tech to waste time searching for a pit that’s covered in dirt, grass or snow. And knowing the exact cable route in geo view is helpful for sending field techs to the exact location of a fault (ie helping them to pinpoint the location of the bright yellow excavator that has just sliced through your inter-capital link). Geo-view is also important for OSP designers and the field workforce that builds the OSP network.

But other personas don’t care about seeing the detailed cable route. They just want to see a point-to-point topological link to represent physical connections between the ports on adjacent devices. This helps them to quickly understand the network or circuit / service view. They may also like to see an alarm overlay on the topology to quickly determine which parts of the network aren’t performing as expected. For these personas, seeing all the geo-detail just acts as visual noise that they need to subconsciously filter out to understand the topology view.

These personas also tend to want topological views of the network, not just the physical but the logical and virtual network / service overlays too.

In most cases that I can think of, the physical / OSP inventory tools show the physical devices (ports even) that the OSP network connects into. Their main focus is on the cables, joints, pits, pipes, catenaries, poles, lead-ins, patch-panels, patch-leads, splitters, etc. But showing the termination of cables onto active equipment (Inside Plant or ISP) is an important linking key between the physical and logical views.

The physical port (on the physical device) becomes the key demarcation between physical and logical worlds. The physical port connects physical cables / leads, but it also acts as the anchor point from which to create logical ports to which logical connections are made. As a result, the physical device and port tend to be shown in both physical (geo) and logical inventory tools. They also tend to be shown in both physical and logical network topology views.

In the case of the original OSS/BSS I worked on, it had separate visualisation tools for geo, network and circuit/service, but all underpinned by a common data model.

What’s the best way? Different personas will have different perspectives of course. I prefer for physical and logical inventories to be integrated out of the box (to allow simple cross-ref visually and in queries)…. but I also prefer for them to have different views (eg geo, topology, network, circuit/service) to suit different situations.

I also find it helpful if each of those views allow the ability to drill down deeper into specific sections of the graph if necessary. I’d prefer not to have all of those different views overlaid onto a geo visualisation. Too much visual clutter IMHO, but others may love it that way.

Oh, and having separate LNI (Logical Network Inventory) and PNI (Physical Network Inventory) can be a tricky thing to reconcile. The LNI will almost always have programmatic interfaces (APIs) to collect data from, but will generally have to amalgamate many different sources. Meanwhile, the PNI consists of mostly passive equipment and therefore has no API to collect latest info from. I tend to use strategies at the above-mentioned demarcation point (ie physical ports) to help establish linking keys between LNI and PNI.
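As a rough illustration of that linking-key strategy, here's a sketch that normalises the physical-port reference both systems hold and joins on it. The field names and key format are purely illustrative:

```python
def port_key(site: str, rack: str, device: str, port: str) -> str:
    """Build a canonical key for a physical port, the demarcation point
    shared by PNI (cable termination) and LNI (logical port anchor)."""
    return "/".join(p.strip().upper() for p in (site, rack, device, port))

def join_lni_to_pni(lni_ports: list[dict], pni_terminations: list[dict]) -> list[tuple]:
    """Match LNI logical-port records to PNI cable terminations via the key."""
    pni_index = {
        port_key(t["site"], t["rack"], t["device"], t["port"]): t
        for t in pni_terminations
    }
    matches = []
    for lp in lni_ports:
        key = port_key(lp["site"], lp["rack"], lp["device"], lp["port"])
        if key in pni_index:
            matches.append((lp, pni_index[key]))
    return matches
```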

BTW. There’s one aspect of the question, “How often do you see physical plant mapping in a separate system from network inventory” that I haven’t fully answered. I’ll cover the question of asset management vs inventory management vs CMDB (Configuration Management Database) in more detail in an upcoming post. [Ed. See link here]

Diamonds are Forever and so is OSS OPEX

[Image sourced from: www.couponraja.in]

I sometimes wonder whether OPEX is underestimated when considering OSS investments, or at least some facets (sorry, awful pun there!) of it.

Cost-out (aka head-count reduction) seems to be the most prominent OSS business case justification lever. So that’s clearly not underestimated. And the move to cloud is also an OPEX play in most cases, so it’s front of mind during the procurement process too. I’m nought for two so far! Hopefully the next examples are a little more persuasive!

Large transformation projects tend to have a focus on the up-front cost of the project, rightly so. There’s also an awareness of ongoing license costs (usually 20-25% of OSS software list price per annum). Less apparent costs can be found in the exclusions / omissions. This is where third-party OPEX costs (eg database licenses, virtualisation, compute / storage, etc) can be (not) found.

That’s why you should definitely consider preparing a TCO (Total Cost of Ownership) model that includes CAPEX and OPEX, normalised across all options, when making a buying decision.
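As a simple illustration, a normalised TCO comparison might look something like the sketch below. The discount rate, time horizon and cost figures are illustrative assumptions only:

```python
def tco(capex: float, annual_opex: float, years: int = 5,
        discount_rate: float = 0.08) -> float:
    """Total cost of ownership: up-front CAPEX plus discounted annual OPEX
    (recurring licenses, third-party DBs, cloud / compute / storage, etc)."""
    opex_npv = sum(annual_opex / (1 + discount_rate) ** y
                   for y in range(1, years + 1))
    return capex + opex_npv

# Normalise all options over the same horizon before comparing them:
options = {
    "vendor_a": tco(capex=2_000_000, annual_opex=450_000),
    "vendor_b": tco(capex=1_200_000, annual_opex=700_000),
}
for name, total in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name}: {total:,.0f}")
```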

But the more subtle OPEX leakage occurs through customisation. The more customisation from “off-the-shelf” capability, the greater the variation from baseline, the larger the ongoing costs of maintenance and upgrade. This is not just on proprietary / commercial software, but open-source products as well.

And choosing Agile almost implies ongoing customisation. One of the things about Agile is it keeps adding stuff (apps, data, functions, processes, code, etc) via OPEX. It’s stack-ranked, so it’s always the most important stuff (in theory). But because it’s incremental, it tends to be less closely scrutinised than during a CAPEX / procurement event. Unless carefully monitored, there’s a greater chance for OPEX leakage to occur.

And as we know, OPEX, like diamonds, is forever (ie the costs re-appear year after year).

Inventory Management re-states its case

In a post last week we posed the question of whether Inventory Management still retains relevance. There are certainly use cases where it remains unquestionably needed. But perhaps there are others that are no longer required, relics of old-school processes and data flows.
 
If you have an extensive OSP (Outside Plant) network, you have almost no option but to store all this passive infrastructure in an Inventory Management solution. You don’t have the option of having an EMS (Element Management System) console / API to tell you the current design/location/status of the network. 
 
In the modern world of ubiquitous connection and overlay / virtual networks, Inventory Management might be less essential than it once was. For service qualification, provisioning and perhaps even capacity planning, everything you need to know is available on demand from the EMS/s. The network is a more correct version of the network inventory than an external repository (ie Inventory Management) can hope to be, even if you have great success with synchronisation.
 
But I have a couple of other new-age use-cases to share with you where Inventory Management still retains relevance.
 
One is for connectivity (okay, so this isn’t exactly a new-age use-case, but the scenario I’m about to describe is). If we have a modern overlay / virtual network, anything that stays within a domain is likely to be better served by its EMS equivalent, especially since connectivity is no longer as simple as physical connections or nearest neighbours with advanced routing protocols. But anything that goes cross-domain and/or off-net needs a mechanism to correlate, coordinate and connect. That’s the role the Inventory Manager is able to play (conceptually).
 
The other is for digital twinning. OSS (including Inventory Management) was the “original twin.” It was an offline mimic of the production network. But I cite Inventory Management as having a new-age requirement for the digital twin. I increasingly foresee the need for predictive scenarios to be modelled outside the production network (ie in the twin!). We want to try failure / degradation scenarios. We want to optimise our allocation of capital. We want to simulate and optimise customer experience under different network states and loads. We’re beginning to see the compute power that’s able to drive these scenarios (and more) at scale.
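As a toy illustration of a what-if scenario run against the twin rather than production, here's a sketch using the open-source networkx library. The topology and node names are invented:

```python
import networkx as nx

# A toy inventory graph standing in for the digital twin.
twin = nx.Graph()
twin.add_edges_from([
    ("POP-A", "CORE-1"), ("POP-A", "CORE-2"),
    ("CORE-1", "POP-B"), ("CORE-2", "POP-B"),
])

def survives_failure(graph, failed_link, src, dst) -> bool:
    """Simulate a link failure offline (never touching production) and
    test whether a path between the service end-points still exists."""
    scenario = graph.copy()
    scenario.remove_edge(*failed_link)
    return nx.has_path(scenario, src, dst)

print(survives_failure(twin, ("POP-A", "CORE-1"), "POP-A", "POP-B"))  # True
```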
 
Is it possible to handle these without an Inventory Manager (or equivalent)?

When OSS experts are wrong

“When experts are wrong, it’s often because they’re experts on an earlier version of the world.”
Paul Graham.
 
OSS experts are often wrong. Not only because of the “earlier version of the world” paradigm mentioned above, but also the “parallel worlds” paradigm that’s not explicitly mentioned. That is, they may be experts on one organisation’s OSS (possibly from spending years working on it), but have relatively little transferable expertise on other OSS.
 
It would be nice if the OSS world view never changed and we could just get more and more expert at it, approaching an asymptote of expertise. Alas, it’s never going to be like that. Instead, we experience a world that’s changing across some of our most fundamental building blocks.
 
“We are the sum total of our experiences.”
B.J. Neblett.
 
My earliest forays into OSS had a heavy focus on inventory. The tie-in between services, logical and physical inventory (and all use-cases around it) was probably core to me becoming passionate about OSS. I might even go as far as saying I’m “an Inventory guy.”
 
Those early forays occurred when there was a scarcity mindset in network resources. You provisioned what you needed and only expanded capacity within tight CAPEX envelopes. Managing inventory and optimising revenue using these scarce resources was important. We did that with the help of Inventory Management (IM) tools. Even end-users had a mindset of resource scarcity. 
 
But the world has changed. We now operate with a cloud-inspired abundance mindset. We over-provision physical resources so that we can just spin up logical / virtual resources whenever we wish. We have meshed, packet-switched networks rather than nailed up circuits. Generally speaking, cost per resource has fallen dramatically so we now buy a much higher port density, compute capacity, dollar per bit, etc. Customers of the cloud generation assume abundance of capacity that is even available in small consumption-based increments. In many parts of the world we can also assume ubiquitous connectivity.
 
So, as “an inventory guy,” I have to question whether the scarcity to abundance transformation might even fundamentally change my world-view on inventory management. Do I even need an inventory management solution or should I just ask the network for resources when I want to turn on new customers and assume the capacity team has ensured there’s surplus to call upon?
 
Is the enormous expense we allocate to building and reconciling a digital twin of the network (ie the data gathered and used by Inventory Management) justified? Could we circumvent many of the fallouts (and a multitude of other problems) that occur because the inventory data doesn’t accurately reflect the real network?
 
For example, in the old days I always loved how much easier it was to provision a customer’s mobile / cellular or IN (Intelligent Network) service than a fixed-line service. It was easier because fixed-line service needed a whole lot more inventory allocation and reservation logic and process. Mobile / IN services didn’t rely on inventory, only an availability of capacity (mostly). Perhaps the day has almost come where all services are that easy to provision?
 
Yes, we continue to need asset management and capacity planning. Yes, we still need inventory management for physical plant that has no programmatic interface (eg cables, patch-panels, joints, etc). Yes, we still need to carefully control the capacity build-out to CAPEX to revenue balance (even more so now in a lower-profitability operator environment). But do many of the other traditional Inventory Management and resource provisioning use cases go away in a world of abundance?
 
I’d love to hear your opinions, especially from all you other “inventory guys” (and gals)!! Are your world-views, expertise and experiences changing along these lines too or does the world remain unchanged from your viewing point?
 
Hat tip to Garry for the seed of this post!

Over 30 Autonomous Networking User Stories

The following is a set of user stories I’ve provided to TM Forum to help with their current Autonomous Networking initiative.

They’re just an initial discussion point for others to riff off. We’d love to get your comments, additions and recommended refinements too.

As a Head of Network Operations, I want to Automatically maintain the health of my network (within expected tolerances if necessary) So that Customer service quality is kept to an optimal level with little or no human intervention
As a Head of Network Operations, I want to Ensure the overall solution is designed with machine-led automations as a guiding principle So that Human intervention can not be easily engineered into the systems/processes
As a Head of Network Operations, I want to Automatically identify any failures of resources or services within the entire network So that All relevant data can be collected, logged, codified and earmarked for effective remedial action without human interaction
As a Head of Network Operations, I want to Automatically identify any degradation of resource or service performance within the network So that All relevant data can be collected, logged, codified and earmarked for effective remedial action without human interaction
As a Head of Network Operations, I want to Map each codified data set (for failure or degradation cases) to a remedial action plan So that Remedial activities can be initiated without human interaction
As a Head of Network Operations, I want to Identify which remedial activities can be initiated via a programmatic interface and which activities require manual involvement such as a truck roll So that Even manual activities can be automatically initiated
As a Head of Network Operations, I want to Ensure that automations are able to resolve all known failure / degradation scenarios So that Activities can be initiated for any failure or degradation and be automatically resolved through to closure (with little or no human intervention)
As a Head of Network Operations, I want to Ensure there is sufficient network resilience So that Any failure or degradation can be automatically bypassed (temporarily or permanently)
As a Head of Network Operations, I want to Ensure there is sufficient resilience within all support systems So that Any failure or degradation can be automatically bypassed (temporarily or permanently) to ensure customer service is maintained
As a Head of Network Operations, I want to Ensure that operator initiated changes (eg planned maintenance, software upgrades, etc) automatically generate change tracking, documentation and logging So that The change can be monitored (by systems and humans where necessary) to ensure there is minimal or no impact to customer services, but also to ensure resolution data is consistently recorded
As a Head of Network Operations, I want to Ensure that customer initiated changes (eg by raising an incident) automatically generate change tracking, documentation and logging So that The change can be monitored (by systems and humans where necessary) to ensure the incident is closed expediently, but also to ensure resolution data is consistently recorded
As a Head of Network Operations, I want to Initiate planned outages with or without triggering automated remedial activities So that The change agents can decide to use automations or not and ensure automations don’t adversely affect the activities that are scheduled for the planned outage window
As a Head of Network Operations, I want to Ensure that if an unplanned outage does occur, impacted customers are automatically notified (on first instance and via a communications sequence if necessary throughout the outage window) So that Customer experience can be managed as best possible
As a Head of Network Operations, I want to Ensure that if an unplanned outage does occur without a remedial action being triggered, a post-mortem analysis is initiated So that Automations can be revised to cope with this previously unhandled outage scenario
As a Head of Network Operations, I want to Ensure that even previously unseen failure scenarios can be handled by remedial automations So that Customer service quality is kept to an optimal level with little or no human intervention
As a Head of Network Operations, I want to Automatically monitor the effects of remedial actions So that Remedial automations don’t trigger race conditions that result in further degradation and/or downstream impacts
As a Head of Network Operations, I want to Be able to manually override any automations by following a documented sequence of events So that If a race condition is inadvertently triggered by an automation, it can be negated quickly and effectively before causing further degradation
As a Head of Network Operations, I want to Intentionally trigger network/service outages and/or degradations, including cascaded scenarios on a scheduled and/or randomised basis So that The resilience of the network and systems can be thoroughly tested (and improved if necessary)
As a Head of Network Operations, I want to Intentionally trigger network/service outages and/or degradations, including cascaded scenarios on an ad-hoc basis So that The resilience of the network and systems can be thoroughly tested (and improved if necessary)
As a Head of Network Operations, I want to Perform scheduled compliance checks on the network So that Expected configurations and policies are in place across the network
As a Head of Network Operations, I want to Automatically generate scheduled reports relating to the effectiveness of the network, services and automations So that The overall solution health (including automations) can be monitored
As a Head of Network Operations, I want to Automatically generate dashboards (in near-real-time) relating to the effectiveness of the network, services and automations So that The overall solution health (including automations) can be monitored
As a Head of Network Operations, I want to Ensure that automations are able to extend across all domains within the solution So that Remedial actions aren’t constrained by system hand-offs
As a Head of Network Operations, I want to Ensure configuration backups are performed automatically on all relevant systems (eg EMS, OSS, etc) So that A recent good solution configuration can be stored as protection in case automations fail and corrupt configurations within the system
As a Head of Network Operations, I want to Ensure configuration restores are performed and tested automatically on all relevant systems (eg EMS, OSS, etc) So that A recent good solution configuration can be reverted to in case automations fail and corrupt configurations within the system
As a Head of Network Operations, I want to Ensure automations are able to manage the entire service lifecycle (add, modify/upgrade, suspend, restore, delete) So that Customer services can evolve to meet client expectations with little or no human intervention
As a Head of Network Operations, I want to Have a design and architecture that uses intent-based and/or policy-based actions So that The complexity of automations is minimised (eg automations don’t need to consider custom rules for different device makes/models, etc)
As a Head of Network Operations, I want to Ensure as many components of the solution (eg EMS, OSS, customer portals, etc) have programmatic interfaces (even if manual activities are required in back-end processes) So that Automations can initiate remedial actions in near real time
As a Head of Network Operations, I want to Ensure all components and data flows within the solution are securely hardened (eg encryption of data in motion and at rest) So that The power of the autonomous platform can not be leveraged for nefarious purposes
As a Head of Network Operations, I want to Ensure that all required metrics can be automatically sourced from the network / systems in as near real time as feasible / useful So that Automations have the full set of data they need to initiate remedial actions and it is as up-to-date as possible for precise decision-making
As a Head of Network Operations, I want to Use the power of learning machines So that The sophistication and speed of remedial response is faster, more accurate and more reliable than if manual interaction were used
As a Head of Network Operations, I want to Record actual event patterns and replay scenarios offline So that Event clusters and response patterns can be thoroughly tested as part of the certification process prior to being released into production environments
As a Head of Network Operations, I want to Capture metrics that can be cross-referenced against event patterns and remedial actions So that Regressions and/or refinements can improve existing automations (ie continuous retraining of the model)
As a Head of Network Operations, I want to Be able to seed a knowledge base with relevant event/action data, whether the pattern source is from Production, an offline environment, a digital twin environment or other production-like environments So that The database is able to identify real scenarios, even if scenarios are intentionally initiated, but could potentially cause network degradation to a production environment
As a Head of Network Operations, I want to Ensure that programmatic interfaces also allow for revert / rollback capabilities So that Remedial actions that aren’t beneficial can be rolled back to the previous state; OR other remedial actions are performed, allowing the automation to revert to original configuration / state
As a Head of Network Operations, I want to Be able to initiate circuit breakers to override any automations So that If a race condition is inadvertently triggered by an automation, it can be negated quickly and effectively before causing further degradation (see the sketch after this list)
As a Head of Network Operations, I want to Manually or automatically generate response-plans (ie documented sequences of activities) for any remedial actions fed back into the system So that Internal (eg quality control) or external (eg regulatory) bodies can review “best-practice” remedial activities at any point in time
As a Head of Network Operations, I want to Intentionally trigger catastrophic network failures (in non-prod environments) So that We can trial many remedial actions until we find an optimal solution to seed the knowledge base with
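As an illustration of the circuit-breaker story above, here's a minimal sketch of a guard that suspends an automation after repeated bad outcomes. The class name and threshold are illustrative only:

```python
class AutomationCircuitBreaker:
    """Suspend an automation after repeated failed remedial actions,
    handing control back to humans before a race condition compounds."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0
        self.open = False   # an open breaker means the automation is suspended

    def record_outcome(self, success: bool):
        if success:
            self.failures = 0              # healthy again; reset the count
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True           # trip the breaker

    def allow_action(self) -> bool:
        return not self.open

breaker = AutomationCircuitBreaker()
for outcome in [False, False, False]:
    breaker.record_outcome(outcome)
print(breaker.allow_action())  # False: automation halted, humans take over
```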

Is your service assurance really service assurance?? (Part 6)

Seems this post from last week has triggered some really interesting debate – Is your service assurance really service assurance?? (Part 5). It was a post that looked into collecting end-to-end service metrics rather than our traditional method of collecting network device events/metrics and trying to reverse-engineer to form a service-level perspective.

Thought I’d give you an update. I’m thinking along the following lines, but admit that I don’t have it all worked out by any means yet:

  1. We need the concept of a span, like OpenTelemetry uses between microservices (in a way, it’s like a nearest-neighbour view of where each packet is getting pushed) – see the sketch after this list.
    Note that for us a span is on a service-by-service basis between nodes, not just a network link-by-link basis between nodes
  2. We need to be able to measure the real-time metrics of the performance of each span as well as any events/faults impacting them
  3. One challenge (one of probably many) is how to avoid flooding the data/management planes. Possibly a telemetry beacon at each node that’s aggregating performance/events of each packet passed for each service?? But what aggregation-window / cache-size to use? Still too impossibly huge to process except with ridiculously low sampling rates??
  4. By chaining the spans we get a real-time, end-to-end trace of services and the performance (and real-time snapshot of service-by-service resource usage in a packet-switched network)
  5. How to efficiently get the beacon data to a centralised logging/management point? Send beacons via management plane? Send via data plane? Take an approach similar to Netflow / IPFIX-style protocols?
  6. How to store data for a short period (ie for real-time analysis/reporting) as well as for long periods. Due to volumes, we’d have to apply aging policies to the data, but it would still be valuable for the purpose of mid and long-term SLA, network health, optimisation, capacity management, etc
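To make point 1 more concrete, here's a speculative sketch that borrows OpenTelemetry's application-tracing API (the opentelemetry-api / opentelemetry-sdk packages) to mock up what a per-hop network span chain might look like. The node names, service identifier and metric values are invented:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("network-service-assurance")

# One parent span per customer service; one child span per node-to-node hop.
# Chaining the child spans gives the real-time end-to-end trace of point 4.
with tracer.start_as_current_span("service:CUST-0042-VPN"):
    for src, dst in [("node-A", "node-B"), ("node-B", "node-C")]:
        with tracer.start_as_current_span(f"hop:{src}->{dst}") as span:
            span.set_attribute("net.latency_ms", 1.2)   # from a telemetry beacon
            span.set_attribute("net.packet_loss", 0.0)
```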

As you can see, there are still so many wide-open questions about the feasibility of the concept. But getting feedback from multiple very clever people who read this blog is definitely helping! Thank you!!

Is your service assurance really service assurance?? (Part 5)

In yesterday’s fourth part of this series about modern network service assurance, we wrote this:

I also just stumbled upon OpenTelemetry, an open source project designed to capture traces / metrics / logs from apps / microservices. It intrigued me because just as you have the concept of traces / metrics / logs for apps, you similarly have traces / metrics / logs for networks.

In the network world, we’re good at getting metrics / logs / events, but not very good at getting trace data (ie end-to-end service chains) as described earlier in this blog series. And if we can’t monitor traces, we can’t easily interpret a customer’s experience whilst they’re using their network service. We currently do “service assurance” by reverse-engineering logs / events, which seems a bit backward to me.

Take a closer look at the OpenTelemetry link above, which provides an overview of how their team is going to gather application telemetry. With increasing software-ification of our networks (eg SDN / NFV) and the use of microservices / NaaS / APIs in our management stacks, could this actually be our path to the holy grail of service assurance (ie capturing trace data – network service telemetry)?? Is it data plane? Is it control / management plane? Is it something in between?

Note: The “active measurements” approach described in part 3 is slightly compromised in current form, which is why I’m so intrigued by the potential of extending the concepts of OpenTelemetry into our software / virtual networks.

I’d really love your take on this one because I’m sure there are many elements to this that I haven’t thought through yet. Please leave your thoughts on the viability of the approach.

Are modern OSS architectures well conceived?

“Whatever is well conceived is clearly said,
And the words to say it flow with ease.”
Nicolas Boileau-Despréaux.

I’d like to hijack this quote and re-direct it towards architectures. Could we equally state that a well conceived architecture can be clearly understood? Some modern OSS/IT frameworks that I’ve seen recently are hugely complex. The question I’ve had to ponder is whether they’re necessarily complex. As the aphorism states, “Everything should be made as simple as possible, but not simpler.”

Just take in the complexity of this triptych I prepared to overlay SDN, NFV and MANO frameworks.

Yet this is only a basic model. It doesn’t consider networks with a blend of PNF and VNF (Physical and Virtual Network Functions). It doesn’t consider closed loop assurance. It doesn’t consider other automations, or omni-channel, or etc, etc.

Yesterday’s post raised an interesting concept from Tom Nolle that as our solutions become more complex, our ability to make a basic assessment of value becomes more strained. And by implication, we often need to upskill a team before even being able to assess the value of a proposed project.

It seems to me that we need simpler architectures to be able to generate persuasive business cases. But it poses the question, do they need to be complex or are our solutions just not well enough conceived yet?

To borrow a story from Wikiquote, “Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi-Dirac statistics. Rising to the challenge, he said, ‘I’ll prepare a freshman lecture on it.’ But a few days later he told the faculty member, ‘You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.’”

Making a basic assessment of OSS value

“…as technology gets more complicated, it becomes more difficult for buyers to acquire the skills needed to make even a basic assessment of value. Without such an assessment, it’s hard to get a project going, and in particular hard to get one going the right way.”
Tom Nolle.

Have you noticed that over the last few years, OSS choice has proliferated, making project assessment more challenging? Previously, the COTS (Commercial Off-the-Shelf) product solution dominated. That was already a challenge because there are hundreds to choose from (there are around 400 on our vendors page alone). But that’s just the tip of the iceberg.

We now also have choices to make across factors such as:

  • Building OSS tools with open-source projects
  • An increasing amount of in-house development (as opposed to COTS implementations by the product’s vendors)
  • Smaller niche products that need additional integration
  • An increase in the number of “standards” that are seeking to solve traditional OSS/BSS problems (eg ONAP, ETSI’s ZSM, TM Forum’s ODA, etc, etc)
  • Revolutions from the IT world such as cloud, containerisation, virtualisation, etc

As Tom indicates in the quote above, the diversity of skills required to make these decisions is broadening. Broadening to the point where you generally need a large team to have suitable skills coverage to make even a basic assessment of value.

At Passionate About OSS, we’re seeking to address this in the following ways:

  • We have two development projects underway (more news to come)
    • One to simplify the vendor / product selection process
    • One to assist with up-skilling on open-source and IT tools to build modern OSS
  • In addition to existing pages / blogs, we’re assembling more content about “standards” evolution, which should appear on this blog in coming days
  • Use our “Finding an Expert” tool to match experts to requirements
  • And of course there are the variety of consultancy services we offer ranging from strategy, roadmap, project business case and vendor selection through to resource identification and implementation. Leave us a message on our contact page if you’d like to discuss more

OSS that repair virtualised networks – the dual loop approach

In a recent article, we talked about Network Service Assurance (NSA) in an environment where network virtualisation exists.

One of the benefits of virtualisation or NaaS (Network as a Service) is that it provides a layer of programmability to your network. That is, to be able to instantiate network services by software through a network API. Virtualisation also tends to assume/imply that there is a huge amount of available capacity (the resource pool) that it can shift workloads between. If one virtual service instance dies or deteriorates, then just automatically spin up another. If one route goes down, customer services are automatically re-directed via alternate routes and the service is maintained. No problem…
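As a toy illustration of that "just spin up another" mindset, here's a sketch in plain Python. The pool and instance names are hypothetical stand-ins for a real NaaS / VIM API:

```python
class ResourcePool:
    """A stand-in for the huge pool of capacity virtualisation assumes."""

    def __init__(self, capacity: int):
        self.available = capacity

    def spin_up_instance(self) -> str:
        assert self.available > 0, "abundance assumption violated!"
        self.available -= 1
        return f"vnf-instance-{self.available}"

def ensure_service(pool: ResourcePool, healthy: bool, instance: str) -> str:
    """If the current instance dies or deteriorates, automatically replace
    it from the pool; otherwise keep serving from the existing instance."""
    return instance if healthy else pool.spin_up_instance()

pool = ResourcePool(capacity=100)
svc = ensure_service(pool, healthy=False, instance="vnf-instance-dead")
print(svc)  # a freshly spun-up replacement, no human intervention
```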

But there are some problems that can’t be solved in software. You can’t just use software to fix a cable that’s been cut by an excavator. You can’t just use software to fix failed electronics. Modern virtualised networks can do a great job of self-healing, routing around the problem areas. But there are still physical failures that need to be repaired / replaced / maintained by a field workforce. NSA doesn’t tend to cover that.

Looking at the diagram below, NSA does a great job of the closed-loop assurance within the red circle. But it then needs to kick out to the green closed-loop assurance processes that are already driven by our OSS/BSS.

As described in the link above, “Perhaps if the NSA was just assuring the yellow cloud/s, any time it identifies any physical degradation / failure in the resource pool, it kicks a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just like it already does today. The additional benefit of this two-tiered assurance approach is that NSA can handle the NFV / VNF world, whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).”

Therefore, a key part of the NSA process is how it kicks up from closed-loop 1 to closed-loop 2. Then, after closed-loop 2 has repaired the physical problem, NSA needs to be aware that the repaired resource is now back in the pool of available resources. Does your NSA automatically notice this, or must it receive a notification from closed loop 2?

It could be as simple as NSA sending alarms into the alarm list with a clearly articulated root-cause. The alarm has a ticket (or tickets) raised against it. The ticket triggers the field workforce to rectify it and then triggers customer assurance teams/tools to send notifications to impacted customers (if indeed they send notifications to customers who may not actually be affected yet, due to the resilience measures that have kicked in). Standard OSS/BSS practice!
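To illustrate, here's a minimal sketch of that loop-1 to loop-2 hand-off. All functions are hypothetical stubs standing in for real alarm, ticketing, workforce and notification integrations:

```python
import itertools

_ids = itertools.count(1)

def raise_alarm(root_cause: str) -> str:
    alarm_id = f"ALM-{next(_ids)}"
    print(f"{alarm_id}: {root_cause}")              # into the alarm list
    return alarm_id

def on_physical_fault(root_cause: str, resource_id: str) -> str:
    """NSA (closed loop 1) detects a fault it can't fix in software and
    kicks it up to the OSS/BSS assurance stack (closed loop 2)."""
    alarm_id = raise_alarm(root_cause)
    ticket_id = f"TKT-{next(_ids)}"                 # ticket raised on the alarm
    print(f"{ticket_id}: dispatch field tech for {resource_id} ({alarm_id})")
    print(f"{ticket_id}: notify impacted customers")
    return ticket_id

def on_field_fix_closed(ticket_id: str, resource_id: str):
    """Loop 2 closes the ticket; explicitly tell NSA the repaired resource
    is back in the pool (rather than hoping NSA notices by itself)."""
    print(f"{ticket_id} closed: {resource_id} returned to resource pool")

ticket = on_physical_fault("fibre cut on route X", "OLT-07")
on_field_fix_closed(ticket, "OLT-07")
```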

OSS change…. but not too much… oh no…..

Let me start today with a question:
Does your future OSS/BSS need to be drastically different to what it is today?

Please leave me a comment below, answering yes or no.

I’m going to take a guess that most OSS/BSS experts will answer yes to this question, that our future OSS/BSS will change significantly. It’s the reason I wrote the OSS Call for Innovation manifesto some time back. As great as our OSS/BSS are, there’s still so much need for improvement.

But big improvement needs big change. And big change is scary, as Tom Nolle points out:
“IT vendors, like most vendors, recognize that too much revolution doesn’t sell. You have to creep up on change, get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening.”

Do you feel like we’re already in the midst of a revolution? Cloud computing, web-scaling and virtualisation (of IT and networks) have been partly responsible for it. Agile and continuous integration/delivery models too.

The following diagram shows a “from the moon” level view of how I approach (almost) any new project.

The key to Tom’s quote above is in step 2. Just how far, or how ambitious, into the future are you projecting your required change? Do you even know what that future will look like? After all, the environment we’re operating within is changing so fast. That’s why Tom is suggesting that for many of us, step 2 is just a “creep up on it change.” The gap is essentially small.

The “creep up on it change” means just adding a few new relatively meaningless features at the end of the long tail of functionality. That’s because we’ve already had the most meaningful functionality in our OSS/BSS for decades (eg customer management, product / catalog management, service management, service activation, network / service health management, inventory / resource management, partner management, workforce management, etc). We’ve had the functionality, but that doesn’t mean we’ve perfected the cost or process efficiency of using it.

So let’s say we look at step 2 with a slightly different mindset. Let’s say we don’t try to add any new functionality. We lock that down to what we already have. Instead we do re-factoring and try to pull the efficiency levers, which means changes to:

  1. Platforms (eg cloud computing, web-scaling and virtualisation as well as associated management applications)
  2. Methodologies (eg Agile, DevOps, CI/CD, noting of course that they’re more than just methodologies, but also come with tools, etc)
  3. Process (eg User Experience / User Interfaces [UX/UI], supply chain, business process re-invention, machine-led automations, etc)

It’s harder for most people to visualise what the Step 2 Future State looks like. And if it’s harder to envisage Step 2, how do we then move onto Steps 3 and 4 with confidence?

This is the challenge for OSS/BSS vendors, supplier, integrators and implementers. How do we, “get buyers disconnected from the comfortable past and then get them to face not the ultimate future but a future that’s not too frightening?” And I should point out, that it’s not just buyers we need to get disconnected from the comfortable past, but ourselves, myself definitely included.

Network Service Assurance has new meaning

Back in the old days, Network Service Assurance probably had a different meaning than it might today.

Clearly it’s assurance of a network service. That’s fairly obvious. But it’s in the definition of “network service” where the old and new terminologies have the potential to diverge.

In years past, telco networks were “nailed up” and network functions were physical appliances. I would’ve implied (probably incorrectly, but bear with me) that a “network service” was “owned” by the carrier and was something like a bearer circuit (as distinct from a customer service or customer circuit). Those bearer circuits, using protocols such as DWDM, SDH, SONET, ATM, etc, potentially carried lots of customer circuits, so they were definitely worth assuring. And in those nailed-up networks, we knew exactly which network appliances / resources / bearers were being utilised. This simplified service impact analysis (SIA) and allowed targeted fault-fix.

In those networks the OSS/BSS was generally able to establish a clear line of association from customer service to physical resources as per the TMN pyramid below. Yes, some abstraction happened as information permeated up the stack, but awareness of connectivity and resource utilisation was generally retained end-to-end (E2E).
OSS abstract and connect
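
To make that line of association tangible, here’s a minimal sketch in Python (all class and field names are my own invention, not taken from any real OSS) of how a nailed-up network’s inventory could walk from a customer service, through its bearer circuits, to the exact network elements. It’s this traceability that made SIA and targeted fault-fix so tractable.

```python
# Minimal, hypothetical sketch of E2E association in a "nailed-up" network.
# All class and field names are illustrative, not drawn from any real OSS.
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkElement:
    name: str                      # eg an SDH add-drop multiplexer

@dataclass
class BearerCircuit:
    circuit_id: str
    elements: list                 # ordered NetworkElements the bearer traverses

@dataclass
class CustomerService:
    service_id: str
    bearers: list                  # bearer circuits carrying this service

def impacted_services(services, failed_element):
    """Service impact analysis: which services traverse the failed element?"""
    return [s for s in services
            if any(failed_element in b.elements for b in s.bearers)]

# Tiny worked example
adm1, adm2 = NetworkElement("ADM-1"), NetworkElement("ADM-2")
bearer = BearerCircuit("STM16-001", [adm1, adm2])
svc = CustomerService("CUST-0042", [bearer])
print([s.service_id for s in impacted_services([svc], adm2)])  # ['CUST-0042']
```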

But in the more modern computer or virtualised network, it all goes a bit haywire, perhaps starting right back at the definition of a network service.

The modern “network service” is more aligned to ETSI’s NFV definition – “a composition of network functions and defined by its functional and behavioral specification. The Network Service contributes to the behaviour of the higher layer service, which is characterised by at least performance, dependability, and security specifications. The end-to-end network service behaviour is the result of a combination of the individual network function behaviours as well as the behaviours of the network infrastructure composition mechanism.”

Under this definition, network services are applications running at OSI’s application layer that can be consumed by other applications. They include DNS, DHCP, VoIP, etc, but the concept of NaaS (Network as a Service) expands the possibilities further.

So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the physical layer, other than to say the customer services consume from a pool of resources (the yellow cloud below). Assurance becomes more disconnected as a result.

BSS OSS cloud abstract

OSS/BSS are able to tie customer services to pools of resources (the yellow cloud). And OSS/BSS tools also include PNI / WFM (Physical Network Inventory / Workforce Management) to manage the bottom, physical layer. But now there’s potentially an opaque gulf in the middle where virtualisation / NaaS exists.

The end-to-end association between customer services and the physical resources that carry them is lost. Unless we can find a way to establish E2E association, we just have to hope that our modern Network Service Assurance (NSA) tools make the yellow cloud robust to the point of infallibility. BTW. If the yellow cloud includes NaaS, then the NSA has to assure the NaaS gateway, catalog and all services instantiated through the gateway.

But as we know, there will always be failures in physical infrastructure (cable cuts, electronic malfunctions, etc). The individual resources can’t afford to be infallible, even if the resource pool seeks to provide collective resiliency.

Modern NSA has to find a way to manage the resource pool, but also to coordinate fault-fix in the physical resources that underpin it, like the OSS used to do (and still does??). It has to do more than just build policies and actions to ensure SLAs are met, doesn’t it? It can seek to manage security, power, performance, utilisation and more. Unfortunately, not everything can be fixed programmatically, although that is a great place for NSA to start.

Perhaps, if the NSA were just assuring the yellow cloud, any time it identified a physical degradation / failure in the resource pool it could kick a notification up to the Customer Service Assurance (CSA) tools in the OSS/BSS layers? The OSS/BSS would then coordinate 1) any required customer notifications and 2) any truck rolls or fixes that can’t be achieved programmatically; just as it already does today. The additional benefit of this two-tiered assurance approach is that the NSA can handle the NFV / VNF world whilst not trying to replicate the enormous effort that’s already been invested into the CSA (ie the existing OSS/BSS assurance stack that looks after PNFs, other physical resources and the field workforce processes that look after it all).
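
As a thought experiment only, the hand-off might look something like the sketch below (the event fields and the triage rule are my own assumptions, not any vendor’s NSA/CSA interface): the NSA keeps whatever it can remediate programmatically and escalates physical degradations up to the CSA for customer notifications and truck rolls.

```python
# Illustrative sketch of the two-tiered assurance hand-off described above.
# Event fields and the triage rule are assumptions, not any real NSA/CSA interface.

def handle_nsa_event(event, csa_queue):
    """NSA-side triage: self-heal virtual faults programmatically,
    escalate physical faults to the Customer Service Assurance layer."""
    if event["layer"] == "virtual":
        # eg respawn a failed VNF or re-route within the resource pool
        print(f"NSA self-heal: {event['fault']}")
    else:
        # Physical degradation (cable cut, card failure, etc) can't be
        # fixed in software, so notify the OSS/BSS assurance stack
        csa_queue.append({
            "fault": event["fault"],
            "actions": ["notify impacted customers", "raise WFM ticket / truck roll"],
        })

csa_queue = []
handle_nsa_event({"layer": "virtual", "fault": "VNF instance unresponsive"}, csa_queue)
handle_nsa_event({"layer": "physical", "fault": "fibre cut on route A-B"}, csa_queue)
print(csa_queue)
```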

I’d love to hear your thoughts. Hopefully you can even correct me if/where I’m wrong.

NaaS is to networks what Agile is to software

After Telstra’s NaaS (Network as a Service) program won a TM Forum excellence award, I promised yesterday to share a post that describes why I’m so excited about the concept of NaaS.

As the title above suggests, NaaS has the potential to be as big a paradigm shift for networks (and OSS/BSS) as Agile has been for software development.

There are many facets to the Agile story, but for me one of the most important aspects is that it has taken end-to-end (E2E), monolithic thinking and has modularised it. Agile has broken software down into pieces that can be worked on by smaller, more autonomous teams than the methods used prior to it.

The same monolithic, E2E approach pervades the network space currently. If a network operator wants to add a new network type or a new product type/bundle, large project teams must be stood up. And these project teams must tackle E2E complexity, especially across an IT stack that is already a spaghetti of interactions.

But before I dive into the merits of NaaS, let me take you back a few steps, back into the past. Actually, for many operators, it’s not the past, but the current-day model.

Networks become Agile with NaaS (the TMN model)

As per the orange arrow, customers of all types (Retail, Enterprise and Wholesale) interact with their network operator through BSS (and possibly OSS) tools. [As an aside, see this recent post for a “religious war” discussion on where BSS ends and OSS begins]. The customer engagement occurs (sometimes directly, sometimes indirectly) via BSS tools such as:

  • Order Entry, Order Management
  • Product Catalog (Product / Offer Management)
  • Service Management
  • SLA (Service Level Agreement) Management
  • Billing
  • Problem Management
  • Customer Management
  • Partner Management
  • etc

If the customer wants a new instance of an existing service, then all’s good with the current paradigm. Where things become more challenging is when significant changes occur (as reflected by the yellow arrows in the diagram above).

For example, if any of the following are introduced, there are end-to-end impacts. They necessitate E2E changes to the IT spaghetti and require the formation of a project team that spans multiple business units (eg products, marketing, IT, networks, and change management to support all the workers impacted by system/process change):

  1. A new product or product bundle is to be taken to market
  2. An end-customer needs a custom offering (especially in the case of managed service offerings for large corporate / government customers)
  3. A new network type is added into the network
  4. System and / or process transformations occur in the IT stack

If we just narrow in on point 3 above, fundamental changes are happening in network technology stacks already. Network virtualisation (SDN/NFV) and 5G are currently generating large investments of time and money. They’re fundamental changes because they also change the shape of our traditional OSS/BSS/IT stacks, as follows.

Networks become Agile with NaaS (the virtualisation model)

We now not only have Physical Network Functions (PNF) to manage, but Virtual Network Functions (VNF) as well. In fact it now becomes even more difficult because our IT stacks need to handle PNF and VNF concurrently, and each has its own nuances in terms of over-arching management.

The virtualisation of networks and application infrastructure means that our OSS see greater southbound abstraction. Greater southbound abstraction means we potentially lose E2E visibility of physical infrastructure. Yet we still need to manage E2E change to IT stacks for new products, network types, etc.

The diagram below shows how NaaS changes the paradigm. It de-couples the network service offerings from the network itself. Customer Facing Services (CFS) [as presented by BSS/OSS/NaaS] are de-coupled from Resource Facing Services (RFS) [as presented by the network / domains].

NaaS becomes a “meet-in-the-middle” tool. It effectively de-couples

  • The products / marketing teams (who generate customer offerings / bundles) from
  • The networks / operations teams (who design, build and maintain the network), and
  • The IT teams (who design, build and maintain the IT stack)

It allows product teams to be highly creative with their CFS offerings from the available RFS building blocks. Consider it like Lego. The network / ops teams create the building blocks and the products / marketing teams have huge scope for innovation. The products / marketing teams rarely need to ask for custom building blocks to be made.
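
Continuing the Lego analogy, here’s a hypothetical sketch (all catalog entries and names invented for illustration) of that de-coupling: the network teams publish RFS building blocks into a catalog, and a product team composes a CFS bundle from them without requesting anything custom.

```python
# Hypothetical sketch of composing a CFS from RFS building blocks.
# All catalog entries and names are invented for illustration.

# RFS building blocks, published by the network / operations teams
rfs_catalog = {
    "broadband-access": {"owner": "access-domain"},
    "static-ip": {"owner": "ip-domain"},
    "voip-line": {"owner": "voice-domain"},
}

def compose_cfs(name, rfs_names):
    """Product team assembles a customer-facing service from existing blocks."""
    missing = [r for r in rfs_names if r not in rfs_catalog]
    if missing:
        raise ValueError(f"No such building block(s): {missing}")
    return {"cfs": name, "composed_of": rfs_names}

# A marketing bundle built purely from the available blocks - no custom Lego needed
offer = compose_cfs("Home Office Bundle", ["broadband-access", "static-ip", "voip-line"])
print(offer)
```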

You’ll notice that the entire stack shown in the diagram below is far more modular than the diagram above. Being modular makes the network stack more suited to being worked on by smaller autonomous teams. The yellow arrows indicate that modularity, both in terms of the IT stack and in terms of the teams that need to be stood up to make changes. Hence my claim that NaaS is to networks what Agile has been to software.

Networks become Agile with NaaS (the NaaS model)

You will have also noted that NaaS allows the Network / Resource part of this stack to be broken into entirely separate network domains. Separation in terms of IT stacks, management and autonomy. It also allows new domains to be stood up independently, which accommodates the newer virtualised network domains (and their VNFs) as well as platforms such as ONAP.

The NaaS layer comprises (see the sketch after this list):

  • A TMF standards-based API Gateway
  • A Master Services Catalog
  • A common / consistent framework for presenting all domains
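
For illustration only, placing a service order through such a gateway might look like the snippet below. The host, path and payload are simplified approximations in the style of TM Forum’s Service Ordering API (TMF641), so treat the URL and field names as assumptions rather than a spec.

```python
# Sketch of placing a service order through a NaaS API gateway.
# The host, path and payload are simplified approximations in the style of
# TM Forum's Service Ordering API (TMF641); treat all names as illustrative.
import requests

NAAS_GATEWAY = "https://naas.example.com/serviceOrdering/v4"  # hypothetical host

order = {
    "externalId": "ORD-0001",
    "orderItem": [{
        "action": "add",
        "service": {"serviceSpecification": {"name": "broadband-access"}},
    }],
}

response = requests.post(f"{NAAS_GATEWAY}/serviceOrder", json=order, timeout=10)
response.raise_for_status()
print(response.json().get("state"))  # eg "acknowledged"
```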

The ramifications of this excite me even more than what’s shown in the diagram above. By offering access to the network via APIs and as a catalog of services, it allows a large developer pool to provide innovative offerings to end customers (as shown in the green box below). It opens up the long tail of innovation that we discussed last week.
Networks become Agile with NaaS (the developer model)

Some telcos will open up their NaaS to internal or partner developers. Others are drooling at the prospect of offering network APIs for consumption by the market.

You’ve probably already identified this, but the awesome thing for the developer community is that they can combine services/APIs not just from the telcos but any other third-party providers (eg Netflix, Amazon, Facebook, etc, etc, etc). I could’ve shown these as East-West services in the diagram but decided to keep it simpler.
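
A trivial sketch of what such a mashup could look like (both endpoints are invented examples, not real APIs): a higher-order “movie night” service that boosts the customer’s bandwidth via the telco’s NaaS API, then starts a session with a third-party content provider.

```python
# Illustrative mashup of a hypothetical telco NaaS API with an equally
# hypothetical third-party API. Neither endpoint is real.
import requests

def movie_night(customer_id):
    """A higher-order service: boost bandwidth, then start a streaming session."""
    # Telco piece: request a temporary QoS boost via the NaaS gateway
    requests.post("https://naas.example.com/qos/boost",
                  json={"customer": customer_id, "profile": "4k-streaming"},
                  timeout=10)
    # Third-party piece: kick off the content session
    requests.post("https://thirdparty.example.com/sessions",
                  json={"customer": customer_id}, timeout=10)

movie_night("CUST-0042")
```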

Developers are not constrained to offering communications services. They can now create / offer higher-order services that also happen to have communications requirements.

If you weren’t already on board with the concept, hopefully this article has convinced you that NaaS will be to networks what Agile has been to software.

Agree or disagree? Leave me a comment below.

PS1. I’ve used the old TMN pyramid as the basis of the diagram to tie the discussion to legacy solutions, not to imply size or emphasis of any of the layers.

PS2. I use the terms OSS/BSS as per the TMN pyramid. The actual demarcation line between what OSS and BSS do tends to be grey and to trigger religious wars, as per the post earlier this week.

PS3. Similarly, the size of the NaaS layer is to bring attention to it rather than to imply it is a monolithic stack in its own right. In reality, it is actually a much thinner shim layer architecturally.

PS4. The analogy between NaaS and Agile is to show similarities, not to imply that NaaS replaces Agile. They can definitely be used together.

PS5. I’ve used the term IT quite generically (operationally and technically) just to keep the diagram and discussion as simple as possible. In reality, there are many sub-functions like data centre operations, application monitoring, application control, applications development, product owner, etc. These are split differently at each operator.

Fast and slow OSS, where uCPE and network virtualisation fit in

Yesterday’s post talked about one of the many dichotomies in OSS: fast and slow data / processes.

One of the longer lead-time items in relation to OSS data and processes is network build and customer connections. From the time when capacity planning or a customer order creates the signal to build, it can be many weeks or months before the physical infrastructure work is complete and appears in the OSS.

There are two financial downsides to this. Firstly, it tends to be CAPEX-heavy with equipment, construction, truck-rolls, government approvals, etc burning through money. Meanwhile, it’s also a period where there is no money coming in because the services aren’t turned on yet. The time-to-cash cycle of new build (or augmentation) is the bane of all telcos.

This is one of the exciting aspects of network virtualisation for telcos. In a time where connectivity is nearly ubiquitous in most countries, often with high-speed broadband access, physical build becomes less essential (except over-builds). Technologies such as uCPE (Universal Customer Premises Equipment), NFV (Network Function Virtualisation), SD WAN (Software-Defined Wide Area Networks), SDN (Software Defined Networks) and others mean that we can remotely upgrade and reconfigure the network without field work.

Network virtualisation offers the potential to speed up many of the slowest and costliest processes that run through our OSS… but only if our OSS can support efficient orchestration of virtualised networks. And that means having an OSS with the flexibility to easily swap slow processes out for fast ones without massive overhauls.
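
To illustrate what swapping a slow process for a fast one could mean in practice, here’s a hypothetical sketch of an orchestration decision point: the same activation intent is dispatched as a truck roll for physical builds, or as a remote push for uCPE-equipped sites. The site structure and process names are mine, for illustration only.

```python
# Hypothetical sketch: one activation intent, two fulfilment processes.
# A flexible OSS lets the slow path be swapped for the fast one per site.

def activate_service(site):
    """Route the same intent to a slow (physical) or fast (virtual) process."""
    if site["ucpe"]:
        # Fast path: push config to the uCPE remotely - minutes, not weeks
        return f"remote push of VNF config to {site['name']}"
    # Slow path: physical build with approvals, construction and truck rolls
    return f"schedule construction + truck roll for {site['name']}"

print(activate_service({"name": "site-A", "ucpe": True}))
print(activate_service({"name": "site-B", "ucpe": False}))
```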

Speeding up your OSS transition from PoC to PROD

In yesterday’s article, we discussed 7 models for achieving startup-like efficiency on large OSS transformations.

One popular approach is to build a proof-of-concept or sandpit quickly on cloud hosting or in lab environments. It’s fast for a number of reasons: fewer approvals, faster activation of infrastructure, reduced safety checks (eg security, privacy, etc), minimised integration with legacy systems and more. The cloud hosting business model is thriving for all of these reasons.

However, it’s one thing to speed up development of an OSS PoC and another entirely to speed up deployment to a PROD environment. As soon as you wish to absorb the PoC-proven solution back into PROD, all the items listed above (eg security sign-offs) come back into play. Something that took days/weeks to stand up in PoC now takes months to productionise.

Have you noticed that the safety checks currently in use were often defined for the old world? They often aren’t designed with the transition from cloud to PROD in mind. Similarly, the culture of design cross-checks and approvals can also be re-framed (especially when the end-to-end solution crosses multiple business units). Lastly, and way outside my locus of competence, there’s the re-visiting of security / privacy / deployment / etc models to facilitate easier transition.

One consideration to make is just how much absorption is required. There are examples of services being delivered to a large entity’s subscribers by a smaller, external entity. The large entity then just “clips the ticket,” gaining a revenue stream with limited involvement. But the more common (and much more challenging) absorption model is for the partner to fold the solution back into the large entity’s full OSS/BSS stack.

So let’s consider your opportunity in terms of the absorption continuum that ranges between:

clip-the-ticket (minimally absorbed) <-----------|-----------> folded-in (fully absorbed)

Perhaps it’s feasible for your opportunity to fit somewhere in between (partially absorbed)? Perhaps part of that answer resides in the cloud model you decide to use (public, private, hybrid, cloud-managed private cloud) as well as the partnership model?

Modularity and reduced complexity (eg fewer integrations) are also factors to consider (as always).

I haven’t seen an ideal response to the absorption challenge yet, but I believe the solution lies in re-framing corporate culture and technology stacks. We’ll look at that in more detail tomorrow.

How about you? Have you or your organisation managed to speed up your transition from PoC to PROD? What techniques have you found to be successful?

The TMN model suffers from modern network anxiety

As the TMN diagram below describes, each layer up in the network management stack abstracts but connects (as described in more detail in “What an OSS shouldn’t do“). That is, each higher layer reduces the amount of information/control within the domain it’s responsible for, but assumes a broader responsibility for connecting multiple domains together.
OSS abstract and connect

There’s just one problem with the diagram. It’s a little dated when we take modern virtualised infrastructure into account.

In the old days, despite what the layers may imply, it was common for an OSS to actually touch every layer of the pyramid to resolve faults. That is, OSS regularly connected to NMS, EMS and even devices (NE) to gather network health data. The services defined at the top of the stack (BSS) could be traced to the exact devices (NE / NEL) via the circuits that traversed them, regardless of the layers of abstraction. It helped for root-cause analysis (RCA) and service impact analysis (SIA).

But with modern networks, the infrastructure is virtualised and load-balanced, and since it’s packet-switched, it’s completely circuitless (I’m excluding virtual circuits here by the way). The bottom three layers of the diagram could effectively be replaced with a cloud icon, a cloud that the OSS has little chance of peering into (see the yellow cloud in the diagram later in this post).

The concept of virtualisation adds many sub-layers of complexity too, by the way, as highlighted in the diagram below.

ONAP triptych

So now the customer services at the top of the pyramid (BSS / BML) are quite separated from the resources at the bottom, other than to say the services consume from a known pool of resources. Fault resolution becomes more abstracted as a result.

But what’s interesting is that there’s another layer that’s not shown on the typical TMN model above. That is the physical network inventory (PNI) layer. The cables, splices, joints, patch panels, equipment cards, etc that underpin every network. Yes, even virtual networks.

In the old networks the OSS touched every layer, including the missing layer. That functionality was provided by PNI management. Fault resolution also occurred at this layer through tickets of work conducted by the field workforce (Workforce Management – WFM).
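
As a purely illustrative sketch (the record structures are invented for the purpose), the missing PNI layer and its WFM tie-in could be modelled as simply as this: physical assets in an inventory, with faults against them becoming tickets of work for the field workforce.

```python
# Purely illustrative sketch of the "missing" PNI layer and its WFM tie-in.
# Record structures are invented for the purpose of illustration.

pni = [
    {"type": "cable", "id": "CBL-100", "route": "exchange-to-pit-7"},
    {"type": "splice", "id": "SPL-017", "housed_in": "JNT-3"},
    {"type": "patch-panel", "id": "PP-2", "rack": "R12"},
]

def raise_wfm_ticket(asset_id, fault):
    """A fault at the physical layer becomes a ticket of work for the field workforce."""
    asset = next(a for a in pni if a["id"] == asset_id)
    return {"ticket": f"WFM-{asset_id}", "asset": asset, "fault": fault}

print(raise_wfm_ticket("CBL-100", "suspected cut near pit 7"))
```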

In new networks, OSS/BSS tie services to resource pools (the top two layers). They also still manage PNI / WFM (the bottom, physical layer). But then there’s potentially an invisible cloud in the middle. Three distinctly different pieces, probably each managed by a different business unit or operational group.
BSS OSS cloud abstract

Just wondering – has your OSS/BSS developed control anxiety issues from losing some of the control that it once had?