Bollinger bands and candlestick charts in OSS

No doubt all of you have seen network performance graphs. The one below is an example (from Flowmon 8.03). This example shows throughput, jitter and round-trip time amongst other metrics. No doubt you use many additional metrics to track the health of your network.

Network Performance Graph

Most performance management tools show the range of metrics as line or bar graphs, as above, which is helpful of course. They often also have threshold-crossing alert capabilities. For example, if utilisation crosses a certain threshold (eg 70%), then send an alert to do something about it. Or even better, trigger an automation to do something about it.

The only problem with this approach is that the discrete values you see don’t tell the whole story. You’re generally seeing the averaged value for a given poll-cycle (eg 5 mins), not the fluctuations that occurred during the poll-cycle.

That got me thinking that candlestick charts might also show some useful information when visualising network performance. Candlesticks, such as the one shown below, are more often used for financial analysis.

They show the open and close (for a given recording period) in the thick body of the “candle” (red for a drop, green for a rise); as well as the high and low during the recording period in the candle wicks.

This gives you a much better feel for the fluctuations during a given recording period. However, in my many years in OSS/BSS, I’ve yet to see this technique used. Have you?

Whilst discussing the possible merits of this technique with a good friend and super-talented OSS/BSS developer, I was alerted to an additional concept – Bollinger Bands. Hat-tip Jay!

Bollinger Bands are also generally used in financial analysis. They are the lines marked in blue, overlaid onto the candlestick diagram below. They’re similar to moving average envelopes, but include standard deviations in their calculation. They’re useful because they identify periods of volatility, such as the one highlighted by the yellow box in the diagram below. The yellow box identifies abnormal behaviour well before a typical threshold-crossing event would have occurred, thus giving advance warning.

Generally speaking, performance is tending towards upper limits when the candlestick is near the upper blue line and tending towards lower limits when near the lower blue line.
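To make the calculation concrete, here’s a minimal Python sketch (assuming pandas is available; the window size, metric name and sample values are illustrative only) of deriving Bollinger Bands from per-poll averages:

```python
import pandas as pd

def bollinger_bands(series: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean +/- num_std standard deviations over the given window."""
    middle = series.rolling(window).mean()   # moving average ("middle band")
    std = series.rolling(window).std()       # standard deviation over the same window
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + num_std * std,     # upper band = MA + k * sigma
        "lower": middle - num_std * std,     # lower band = MA - k * sigma
    })

# Example: link utilisation (%) averaged per 5-minute poll cycle (illustrative values)
utilisation = pd.Series(
    [42, 44, 41, 43, 45, 47, 44, 46, 60, 62, 48, 45],
    index=pd.date_range("2020-07-01 00:00", periods=12, freq="5min"),
)
print(bollinger_bands(utilisation, window=4).round(2))
```

A sudden widening of the bands, or a candle closing outside them, is the kind of volatility signal described above – often visible well before a fixed utilisation threshold (eg 70%) would trip.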

What do you think? Could candlesticks and Bollinger Bands help you keep your network within acceptable performance bounds?

BTW. Time can be an important factor for monitoring network health trending, especially since 5, 10 and 15 minute poll cycles are still common. Let’s say the start of the 15m poll cycle is at 00:00, so the EMS/NE logs the performance record for that 15-minute window at 00:15 (or perhaps later depending on its ETL efficiency – let’s say 00:20). The log then gets pulled, processed and visualised by a performance visualisation tool, which requires further time (let’s say 10 mins). So now it’s 00:30 before any person or system can react to what started happening 30 mins earlier. Most performance graphs just show the average value over each 15m poll period. If your EMS/NE can also export max and min during each period, you have the ability to generate a candlestick, and from that your Bollinger Bands. It doesn’t remove the delays mentioned above, but it does give you additional granularity on a single graph.
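If you can collect raw samples more frequently than the poll cycle, the candlestick roll-up itself is straightforward. A minimal sketch, assuming pandas and one-minute samples (the values and period are illustrative only):

```python
import pandas as pd

def to_candles(samples: pd.Series, period: str = "15min") -> pd.DataFrame:
    """Roll raw metric samples up into open/high/low/close 'candles' per poll period."""
    return samples.resample(period).ohlc()

# Example: throughput (Mbps) sampled every minute, rolled up into 15-minute candles
samples = pd.Series(
    [830, 845, 870, 910, 860, 840, 835, 900, 955, 940, 905, 880, 865, 850, 845,
     842, 860, 875, 890, 905, 920, 915, 900, 895, 885, 880, 870, 860, 855, 850],
    index=pd.date_range("2020-07-01 00:00", periods=30, freq="1min"),
)
print(to_candles(samples))  # columns: open, high, low, close - ready for a candlestick plot
```

If your EMS/NE only exports the average, max and min per period, the same idea still applies – treat the period average as both open and close, with max and min forming the wicks.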

Note: Candlesticks are similar to box plots, which are also used to show variability within time-series data. A box plot (aka box and whisker chart) shows:

  1. Outliers (optional)
  2. Upper end of range excluding outliers
  3. Lower end of range excluding outliers
  4. First and Third quartile marks
  5. Median

Hat-tip to Jim for recommending box plots.

OSS / BSS in the clouds

Have you noticed the recent up-tick in headlines around telco offerings by hyperscalers AWS, Google and Microsoft? Edge compute (both on-prem / device-edge and provider edge) and related services to support 5G use-cases appear to be the leading drivers. These use-cases will need to be managed by our OSS/BSS for the telco operators and their customers.

Meanwhile, top-tier OSS/BSS users are also continuing to adopt cloud-native initiatives, as described in this Infographic from Analysys Mason / Amdocs. Analysys Mason estimates that over 90% of CSPs in North America, Asia–Pacific and Europe will have their OSS/BSS stacks running on cloud infrastructure by 2022, with well over 60% on hybrid cloud.

However, just how much of the CSP OSS/BSS stack will be on the cloud remains in question. According to TM Forum’s research, most CSPs have deployed less than 5% of their operations software in the public cloud.

In today’s article, we take a closer look into cloud offerings for OSS/BSS. The providers we’ll cover are hyperscalers:

AWS

https://aws.amazon.com/telecom/

The following diagrams come from the Amazon Telco Symposium. The first diagram shows the AWS Telecom Engagement Model (noting the OSS/BSS bubble).

The second diagram provides some insight into important offerings in AWS’ push into the 5G / telco edge, such as Greengrass, SiteWise, SageMaker and more.

 

AWS services such as the following have been used as part of home-grown offerings for years:

  • Wavelength (low latency), Lambda (serverless) or EC2 – compute services for processing applications/code
  • S3, EFS, Glacier, Elastic, Snow Family, etc – data storage for collecting logs, etc
  • EKS or ECS – for Kubernetes / Docker container / cluster management
  • VPC – for separate environment deployments
  • VPN – to tie VPCs to networks / clouds / DCs
  • ELB – for load balancing
  • ELK – for log management consisting of three open source projects: Elasticsearch, Logstash, and Kibana
  • Aurora, RDS, Redshift, DynamoDB, Neptune, KDB, etc – databases
  • Cassandra, Kibana, etc – data visualisation
  • SageMaker, Augmented AI, Lex, etc – AI / ML tools
  • And much more

Telco architects have leveraged these, alongside commercial and open-source products like Apache’s Kafka, NiFi and Spark, to build home-grown OSS/BSS tools.

However, there’s been an increasing trend for OSS/BSS vendors to publish their offerings on the AWS marketplace too, including:

  • Moogsoft
  • Zoho / ManageEngine (eg OpManager, Network Config Mgr)
  • Solarwinds
  • Domotz Pro
  • Lumeta CloudVisibility
  • Flowmon
  • Hyperglance
  • Mphasis InfraGraf (Forecasting & Planning)
  • KloudGin (Work Order Mgmt)
  • Kx (network performance)
  • AND Bosch, ThingsBoard, ThingPark, ThingLogix, etc, etc (if we extend into IoT device management)

AWS Marketplace tends to show the solutions that are more standardised / fixed-price in nature (Telecoms section in Marketplace). Many other OSS/BSS vendors such as Netcracker, CSG, Intraway and Camvio don’t appear in the AWS marketplace but have customisable, AWS-ready solutions for clients. These companies have their own sales arms obviously, but also train the AWS global salesforce in their products.

Google Cloud

https://cloud.google.com/solutions/telecommunications

According to Google Cloud’s strategy for the telecom industry, Google Cloud is focusing on three strategic areas to support telecommunications companies:

  • Helping telecommunications companies monetise 5G as a business services platform, including:

    • The Global Mobile Edge Cloud (GMEC) strategy, which will deliver a portfolio and marketplace of 5G solutions built jointly with telecommunications companies; an open cloud platform for developing network-centric applications; and a global distributed edge for deploying these solutions

    • Anthos for Telecom, which will bring its Anthos cloud application platform to the network edge, allowing telecommunications companies to run their applications wherever it makes the most sense. Anthos for Telecom—based on open-source Kubernetes—will provide an open platform for network-centric applications.
  • Empowering telecommunications companies to better engage their customers through data-driven experiences by:

    • Empowering telecommunications companies to transform their customer experiences through data- and AI-driven technologies. Google’s BigQuery platform provides a scalable data analytics solution—with machine learning built-in so telecommunications companies can store, process, and analyze data in real time, and build personalization models on top of this data

    • Contact Center AI assists telecommunications companies with customer service. Contact Center AI gives companies 24/7 access to conversational self-service, with seamless hand-offs to human agents for more complex issues. It also empowers human agents with continuous support during their calls by identifying intent and providing real-time, step-by-step assistance
    • AI and retail solutions including omni-channel marketing, sales and service, personalisation and recommendations, and virtual-agent presence in stores
  • Assisting them in improving operational efficiencies across core telecom systems. This allows operators to move OSS, BSS and network functions from their own environments to the Google Cloud

This LightReading report even highlights how Google has been engaged to provide extensive knowledge transfer to some telcos.

This press release from March 2020 announced that Google would partner with Amdocs to support the telecom industry to:

  • Deliver Amdocs solutions to Google Cloud: Amdocs will run its digital portfolio on Google Cloud’s Anthos, enabling communications service providers (CSPs) to deploy across hybrid and multi-cloud configurations
  • Develop new enterprise-focused 5G edge computing solutions: Amdocs and Google Cloud will create new industry solutions for CSPs to monetize over 5G networks at the edge
  • Help CSPs leverage data and analytics to improve services: Amdocs will make its Data Hub and Data Intelligence analytics solutions available on Google Cloud. Amdocs and Google Cloud will also develop a new, comprehensive analytics solution to help CSPs leverage data to improve the reliability of their services and customer experiences.
  • Partner on Site Reliability Engineering (SRE) services: The companies will share tools, frameworks, and best practices for SRE and DevOps

On the same day the Google / Amdocs partnership was announced, Netcracker Technology announced it would deploy its entire Digital BSS/OSS and Orchestration stack on Google Cloud. These applications are cloud native, deployed as a set of reusable microservices that run over on-prem or public cloud on top of container platforms such as Google Kubernetes Engine (GKE).

Optiva (formerly Redknee) has also adopted a Google Cloud strategy, using Google Cloud Spanner and the Google Cloud Platform (GCP) to underpin its Charging Engine.

This article from CIMI Corp provides some great additional reading about why Google is a credible telco cloud provider.

Microsoft Azure

https://www.microsoft.com/en-us/industry/telecommunications

Microsoft has also announced an intention to better serve telecom operators at the convergence of cloud and comms networks through its Azure platform.

“We will continue to partner with existing suppliers, emerging innovators and network equipment partners to share roadmaps and explore expanded opportunities to work together, including in the areas of radio access networks (RAN), next-generation core, virtualized services, orchestration and operations support system/business support system (OSS/BSS) modernization,” states Yousef Khalidi in this Microsoft post.

The acquisitions of Metaswitch Networks (a provider of virtualised network software) and Affirmed Networks (a provider that sells virtualised, cloud-native mobile network solutions) show further evidence of Microsoft’s ambitions in the telco / cloud domain.

Like the partnerships described with AWS and Google above, Netcracker has also partnered with Microsoft, offering its Netcracker Digital BSS/OSS and Orchestration applications on Microsoft Azure. This article also describes that, “Netcracker is collaborating with Microsoft to integrate Azure Machine Learning (ML) and AI services with Netcracker’s Advanced Analytics to add intelligent contextual decisioning and recommendations to enable more personalized customer engagements.”

Meanwhile, Amdocs and Microsoft have been working on making ONAP available on the Azure platform. Nokia and Microsoft are partnering on “…cloud, Artificial Intelligence (AI) and Internet of Things (IoT), bringing together Microsoft cloud solutions and Nokia’s expertise in mission-critical networking.”

Microsoft Azure Edge Zones are offered through Azure, with select carriers and operators, or as private customer zones. They bring compute, storage, and service availability closer to the customer / device for low-latency + high-throughput use cases.

The recent announcement of an AT&T and Microsoft alliance as well as deals involving Telefónica (with Aura, its AI-powered digital assistant), SK Telecom (5G-based cloud gaming), Reliance Jio (cloud solutions), NTT (enterprise solution offerings), and Etisalat (future networks) show an increasing presence for Azure within the telco domain.

Netcracker has integrated its solution with Microsoft business applications (eg Office 365, Dynamics 365 and OneDrive), as other OSS/BSS providers undoubtedly have too.

Microsoft also has its own OSS/BSS offering in Dynamics 365 Field Service Management.

Summary

CSPs (Communications Service Providers) find themselves in a catch-22 position with cloud providers. Their own OSS/BSS, and those of their suppliers, have an increasing reliance on cloud provider services and infrastructure. Due to economies of scale, efficiency of delivery, scalability and a long-tail of service offerings (from the cloud providers and their marketplaces), CSPs aren’t able to compete. The complexity of public cloud (security, scalability, performance, interoperability, etc) also makes it a quandary for CSPs. It’s already a challenge (commercially and technically) to run the networks they do, but prohibitively difficult to expand coverage further to include public cloud.

Yet, by investing heavily in cloud services, CSPs are funding further growth of said cloud providers, thus making CSPs less competitive against, and more reliant on, the cloud providers. Telco architects are becoming ever more adept at leveraging the benefits of cloud. An example is being able to spin up apps without having to wait for massive infrastructure projects to be completed first, which has been a massive dependency (ie time delay) for many OSS/BSS projects.

In the distant past, CSPs had the killer apps, being voice and WAN data. These services supported the long-tail of business (eg salespeople from every industry in the world would make sales calls via telephony services) and customers were willing to pay a premium for these services.

The long-tail of business is now omni-channel, and the killer apps are content, experiences, data and the apps that support them. Being the killer apps, whoever supplies them also takes the premium and share-of-wallet. AWS, Google and Microsoft are supplying more of today’s killer apps (or the platforms that support them) than CSPs are.

The risk for CSPs is that cloud providers and over-the-top players will squeeze most of the profits from massive global investments in 5G. This is exacerbated if telco architects get their cloud architectures wrong and OPEX costs spiral out of control. Whether architectures are optimal or not, CSPs will fund much of the cloud infrastructure. But if CSPs don’t leverage cloud provider offerings, the infrastructure will cost even more, take longer to get to market and constrain them to local presence, leaving them at a competitive disadvantage with other CSPs.

If I were a cloud provider, I’d be happy for CSPs to keep providing the local, physical, outside plant networks (however noting recent investments in local CSPs such as Amazon’s $2B stake in Bharti Airtel and Google’s $4.7 billion investment in Jio Platforms* not to mention Google Fiber and sub-sea fibre roll-outs such as this). It’s CAPEX intensive and needs a lot of human interaction to maintain / augment the widely distributed infrastructure. That means a lot is paid on non-effective time (ie the travel-time of techs travelling to site to fix problems, managing resources and/or coordinating repairs with property owners). Not only that, but there tends to be a lot of regulatory overhead managing local infrastructure / services as well as local knowledge / relationships. Governments want to ensure all their constituents have access to communications services at affordable prices. All the while, revenue per bit is continuing to drop, so merely shuffling bits around is a business model with declining profitability.

With declining profitability, operational efficiency improvements and cost reductions become even more important. OSS/BSS tools are vital for delivering improved productivity. But CSPs are faced with the challenge of transforming from legacy, monolithic OSS/BSS to more modern, nimble solutions. The more modular, flexible OSS/BSS of today and in future roadmaps are virtualised, microservice-based and designed for continuous delivery / DevOps. This is painting CSPs into a cloud-based future.

Like I said, a catch-22 for CSPs!

But another interesting strategy by Google is that its Anthos hybrid cloud platform will run multi-cloud workloads, including workloads on AWS and Microsoft Azure. Gartner predicts that >75% of midsize and large organisations will have adopted a multi-cloud and/or hybrid IT strategy by 2021 to prevent vendor lock-in. VMware (Dell) and Red Hat (IBM) are others creating multi-cloud / hybrid-cloud offerings. This gives CSPs the potential to develop a near-global presence for virtualised telco functions. But will cloud providers get there before the telcos do?

For those of us supporting or delivering OSS/BSS, our future is in the clouds either way. It’s a rapidly evolving landscape, so watch this space.

 

* Note: Google is not the only significant investor in Jio:

Investor | US$B | Stake
Facebook | 5.7 | 10%
Silver Lake Partners | 1.43 | 2.08%
Mubadala | 1.3 | 1.85%
ADIA (UAE sovereign) | 0.8 | 1.16%
Saudi Arabia sovereign (PIF) | 1.6 | 2.32%
TPG | 0.64 | 0.93%
Catterton | 0.27 | 0.39%
Intel | 0.253 | 0.39%
Qualcomm | 0.097 | 0.15%
Google | 4.7 | 7.70%
TOTALS | 16.79 | ~27%

 

OSS/BSS Testing – The importance of test data

Today’s post is the third part in a series about OSS/BSS testing (part 1, part 2).

Many people think about OSS/BSS testing in terms of application functionality and non-functional requirements. They also think about entry criteria / pre-requisites such as the environments, application builds / releases, test case development and maybe even the integrations required.

However, an often overlooked aspect of OSS/BSS functionality testing is the data required to underpin the tests.

Factors to be considered will include:

  • Programmatically collectable data – this refers to data that can be collected from data sources. Great examples are near-real-time alarm and performance data that can be collected by connecting to the network, either to real devices and NMS or simulators
  • Manually created or migrated data – this refers to data that is seeded into the OSS/BSS database. This could be manually created or script-loaded / migrated. Common examples of this are inventory data, especially information about passive infrastructure like buildings, cables, patch-panels, racks, etc. In some cases, even data that can be collected via a programmatic interface still needs to be augmented with manually created or migrated data
  • Base data – for consistency of test results, there usually needs to be a consistent data baseline to underpin the tests.
  • Reversion to baseline – If/when there are test cases that modify the base data set (eg provisioning a new service that reserves resources such as ports on a device), there may need to be a method of roll-back (or reinstatement) to base state. In other cases, a series of tests may require a cascading series of dependent changes (eg reserving a port before activating a service). These examples may need to be rolled-back as a sequence
  • Automated / Regression testing – if automations and/or regression testing are to be performed, automated reversion to consistent base data will also be required; otherwise discrepancies will appear between automated test cycles (see the sketch after this list)
  • Migration of base data – for consistency of results between phases, there also needs to be consistency of data across the different environments on which testing is performed. This may require migration of base data between environments (see yesterday’s post about transitions between environments)
  • Multi-homing of data – particularly for real-time data, sources such as devices / NMS may issue data to more than one destination (eg different environments).
  • Reconciliation / Equivalency testing – when multi-homing of data sources is possible, or when Current and Future PROD are running in parallel, equivalency testing can be performed by comparing the processed data in each destination system (eg between current and future state mediation devices / probes / MDDs). Transition planning will also be important here as we plan to migrate from Current PROD to Future PROD environments
  • Data Migration Testing – this is testing to confirm the end-to-end validity and completeness of migrated data sets
  • PROD data cuts – once a system is cut over to production status, it’s common to take copies of PROD data and load it onto lower environments. However, care should be taken that the PROD data doesn’t overwrite specially crafted base data (see “base data” dot-point above)
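To illustrate the “reversion to baseline” and automated regression points above, here’s a minimal sketch of automated reversion to consistent base data. The SQLite-file-copy approach and the file names are assumptions for illustration only; in practice this might be a database snapshot/restore, a container rebuild or a data-load script.

```python
import shutil
import pytest

BASELINE_DB = "test_data/baseline_inventory.db"   # hypothetical seeded base data set
WORKING_DB = "test_data/working_inventory.db"     # the copy that test cases actually modify

@pytest.fixture(autouse=True)
def revert_to_baseline():
    """Restore the seeded base data before every test so results stay repeatable."""
    shutil.copyfile(BASELINE_DB, WORKING_DB)
    yield
    # post-test checks (eg confirming the baseline itself wasn't touched) could go here
```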

 

OSS/BSS Testing – Transitions

One of the most vital, but underestimated, aspects of OSS/BSS project implementation is ensuring that momentum is maintained. These large and complex projects are prone to stagnating at different stages, which can put pressure on the implementation team.

As mentioned in yesterday’s post, the first in this week’s series, the test strategy and scheduling are regularly overlooked as a means of maintaining OSS project momentum. More specifically, careful planning of transitions between test phases, and between the environments they’re run on, can demonstrate progress – where progress is seen through the introduction of business value.

The following diagram provides a highly stylised indicative timeline (x-axis) of activities, showing how to leverage multiple different environments (y-axis). See here for examples of additional environments that you may have on your project.

You’ll notice that this diagram covers:

  • Environments
  • Test phases (eg FAT, SAT, SIT, DMT, NFT, UAT – see descriptions of these test phases here)
  • Build phases (build, configure, integrate)
  • Data loads (reference data, symbolic data and/or real data extracts)
  • How builds and data loads can be cascaded between environments to reduce duplicated effort

OSS Phasing - Testing, Environments, Data

Some other important call-outs from this stylised diagram include:

  • The Builds and Data Loads cascade from lower environments (eg from PROD-SUPPORT to PRE-PROD). Thought needs to be given as to which builds and data sets need to be cascaded from which environments
  • However, after PRE-PROD is handed over and becomes PROD, it is common for cuts of production data to be regularly loaded back into the lower environments so that they are PROD-like
  • Stand-up of new PROD environments is often a long lead-time item (because of size and complexity such as resilience architectures, security, etc) compared to lower environments. Environments such as DEV/TEST could be as easy to stand up as creating one or more new virtual machines on existing hosting
  • Different environments may have access to different data sources / integrations. For example, the lower environments may be connected to lab versions of devices and NMS/EMS. Alternatively, the network might be mimicked by simulators in non-PROD environments. Other integrations could be for active directory (AD), environment logging, patch management, etc. Sometimes production data sources can be connected to non-PROD OSS environments, but this is not so common
  • The item marked as Base-build on the PRE-PROD / PROD environment reflects the initial build and configuration of virtualisation, databases, storage, management networking, resilience / failover mechanisms, backup/restore, logging, security hardening and much more
  • Careful transition planning needs to go into PROD cutover. In the sample diagram, a final cut of data comes from the Current PROD environment to PRE-PROD before it becomes the Future PROD environment during official handover
  • You’ll notice though that there may still be a period of overlap between cutover from Current PROD to Future PROD. This is because there needs to be staged data source cutover in cases where data sources like network devices can’t multi-home data feeds to both environments in parallel.

This earlier post provides some insights into novel ways to slice and dice your OSS implementation by planning regular drops and a consistent release of business value.

OSS/BSS Testing – the V-Model

On major software projects like the OSS you’re building, testing is an important phase of course. You’ll have undoubtedly incorporated testing into your planning. After all, testing is a key component of any Software Development Life Cycle (SDLC). There are various SDLC models / methodologies such as Waterfall, V-Model, Agile and others that you can consider.

Unfortunately, most OSS project teams tend to underestimate the testing phase, thinking it can just fit in around other major activities towards the end of the implementation. Experienced testers will suggest that they should be involved right from the requirement capture phase, because they’ll have to design test cases to prove that each requirement is met.

More importantly, your test strategy and test phase transitioning can play a major part in maintaining momentum through a project’s delivery phase. We’ll look into a number of related details in a series of posts this week.

Today we’ll look at the V-Model. It can be a helpful model for mapping requirements to test phases / cases. The diagram below, which comes from my book, Mastering your OSS, shows a simplified, sample version of the V-Model. It highlights the relationship between key test artefacts (eg plans / designs / specifications / requirements) on the left with the corresponding test phases on the right.

V-Model Testing

Your documentation and test phases will probably differ. You can find a discussion about some other possible OSS test phases here.

We’ll take a closer look tomorrow at how your different test phases could map to the OSS environments you might have available.

Getting confused by key Assurance metrics?

Are you a bit slow like me and sometimes have to stop and think to differentiate your key assurance metrics like your MTTRs from your MTBFs?

If so, I thought this useful diagram from researchgate.net might help.

The metrics are:

MTBF (Mean Time Between Failures) – the average elapsed time between failures of a system, service or device. It’s the basic measure of availability / reliability of the system / service / device. The higher, the better.

MTTR (Mean Time to Repair) – generally used to denote the average time to close a trouble ticket (to repair a failed system / service / device). It’s the basic measure of corrective action efficiency. The lower, the better.

Some also use MTTR as a Mean Time to Recover / Resolve (ie MTTD + MTTR in the diagram above) or Mean Time to Respond (MTTD in the diagram above to acknowledge an event and create a ticket). See why I get confused?

MTTD (Mean Time to Detect / Diagnose) – the average time taken from when an event is first generated and timestamped to when the NOC detects / diagnoses the cause and generates a ticket. The lower, the better.

MTTF (Mean Time to Failure) – the average system / service / device up-time (ie the average elapsed time from restoration until the next failure). The higher, the better.
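As a worked example, here’s a minimal Python sketch showing how these averages can be derived from a set of incident records, using the decomposition described above (MTBF = MTTF + MTTD + MTTR). The timestamps and field ordering are illustrative only and, as noted above, your organisation’s definitions may differ.

```python
from datetime import datetime
from statistics import mean

# (fault occurred, ticket raised, service restored) - illustrative records only
incidents = [
    (datetime(2020, 7, 1, 2, 0), datetime(2020, 7, 1, 2, 10), datetime(2020, 7, 1, 3, 0)),
    (datetime(2020, 7, 5, 14, 0), datetime(2020, 7, 5, 14, 5), datetime(2020, 7, 5, 15, 30)),
]

HOURS = 3600
mttd = mean((raised - occurred).total_seconds() for occurred, raised, _ in incidents) / HOURS
mttr = mean((restored - raised).total_seconds() for _, raised, restored in incidents) / HOURS
mtbf = (incidents[-1][0] - incidents[0][0]).total_seconds() / HOURS / (len(incidents) - 1)
mttf = mtbf - (mttd + mttr)   # average up-time between restoration and the next failure

print(f"MTTD={mttd:.2f}h  MTTR={mttr:.2f}h  MTBF={mtbf:.2f}h  MTTF={mttf:.2f}h")
```

With these example records, MTBF works out to roughly 108 hours, of which only about 1.25 hours is detection and repair time.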

The common data store trend (part 2)

Last month we posted an article that described using a common data model (CDM) for our OSS / BSS data. It mostly looked at the situation within the context of typical operational data sources (the blue boxes on the left side of the diagram below):

Today’s article pushes the vision a little further. If your CDM is built as part of an enterprise-wide data warehouse, then you may get the opportunity to think beyond the boundaries of network and operations data.

We’ve long said that our OSS/BSS impact most other parts of a business, yet we tend to spend very little time proactively seeking value-add opportunities outside network operations, in marketing, products, finance, the C-suite and beyond. 

Traditional OSS/BSS were built around highly structured, relational databases, usually designed by OSS/BSS product vendors. Each data architecture was designed to support the specific, baked-in use-cases offered by each product. It was like a building architect designing a building, let’s say a new wing of a university, from the ground up for a very specific purpose.

You don’t get the same luxury with your CDM. You have to take a multitude of existing platforms, applications and data models and attempt to turn them into a cohesive data set. This is a bit like taking a row of existing houses, extending and combining them to form a university wing. The “existing houses” might represent disparate OSS or BSS or network systems, but they could also be IT / data silos from various other parts of the organisation.

You know the latter “university” design will be compromised – discrepancies in data standards, data flows, siloed data knowledge, disjointed data governance, etc. However, it also comes with a big benefit. You can keep appending new sets of data that were never part of the initial considerations of any of the IT / data silos. It could be weather data, social trending, building approvals or so much more. It could be any data set that you think could unlock new insights.

But to form a coherent and valuable data set, you still need a common blueprint. As Stephanie Shen describes on towardsdatascience.com:

“…the following areas need to be considered and planned at this conceptual stage:

– The core data entities and data elements such as those about customers, products, sales.
– The output data needed by the clients and customers.
– The source data to be gathered and transformed or referenced to produce the output data.
– Ownership of each data entity and how it should be consumed and distributed based on business use cases.
– Security policies to be applied to each data entity.
– The relationships between the data entities, such as reference integrity, business rules, execution sequence.
– Standard data classification and taxonomy.
– Standards of data quality, operations, and Service Level Agreements (SLAs).
 
This conceptual level of design consists of the underlying data entities that support each business function.”
 
As with all valuable OSS tripods, your value in the CDM chain is being able to connect the silos of business, IT and operations.

An OSS Security Summary

Our OSS / BSS manage some of the world’s most vital comms infrastructure, don’t they? That makes them pretty important assets to protect from cyber-intrusion. Therefore, security is a key, but often underestimated, component of any OSS / BSS project.

Let me start by saying I’m no security expert. However, I have worked with quite a few experts tasked with securing my OSS projects and picked up a few ideas along the way. I’ll share a few of those ideas in today’s post.

We look at:

  1. Security Trust Zones / Realms
  2. Restricting Access to OSS / BSS systems and data
  3. OSS / BSS Data Security
  4. Real-time Security Logging / Monitoring
  5. Patch Management
  6. Security Testing / Hardening
  7. Useful Security Standards

 

1. Security Trust Zones / Realms

For me, security starts with how you segment and segregate your network and related systems. The aim of segmentation / segregation is to restrict malicious access to sensitive data / systems. The diagram below shows a highly simplified three-realm design, starting at the bottom:

  1. The operator’s Active Network realm – the network that carries live customer traffic and is managed by the CSP / operator [Noting though, that these are possibly managed as virtual and/or leased entities rather than owned]. It comprises the routers, switches, muxes, etc that make up the network. As such, this zone needs to be highly secure. Customers connect to the Active Network at the edge of the organisation’s network, often via CPE (Customer Premises Equipment), NTU (Network Termination Units) or similar. Dedicated Network Operation Centre (NOC) operator terminals tend to connect inside the Active Network
  2. The operator’s Corporate / Enterprise realm – the network that houses the organisation’s corporate IT assets. This is where most corporate staff engage with core business services like desktop tools and so much more. If network operations staff need to connect to the Corporate / Enterprise realm but also reach into the Active Network realm, then an air-gap is usually established by the SCP between the two. This is bridged through technologies like Citrix, RDP (Remote Desktop Protocol) or similar
  3. The Cloud / Internet realm –  the external networks / infrastructure utilised by the organisation that are outside the organisation’s direct control. This includes Internet services, which many corporate users rely on of course. However, it may also include some important components of your OSS/BSS stack if provided as public cloud services, an increasingly common software supply model these days
  4. You’ll also notice the all-important Security Control Points (SCP) like firewalls that provide segregation between the zones

OSS BSS Cloud Security Control Points

In all likelihood, your security trust model will contain more than three zones, but these should be the absolute minimum.

The Active Network should be segregated from the Corporate / Enterprise network so that it can continue to provide service to customers even if the connection between them is lost (or intentionally severed if a security breach is identified).

This is where things get interesting. The Active Network and our Network Management stack rely on Shared Services such as DNS (Domain Naming System), NTP (Network Time Protocol), Identity / Access Management, Anti-Virus and more. These tend to be housed in Corporate / Enterprise realms. If we want the Active Network to be able to operate in complete standalone mode then we need to provide special consideration to the shared services architectures. 

Aside: Traditionally, we’ve focused on perimeter defense and authenticated users are granted authorised access to a broad collection of resources. We now see the trend towards more remote users and cloud-based assets outside the enterprise-owned boundary in our OSS architectures. There’s currently debate around whether zero-trust architectures are required to segment more holistically – to restrict lateral movement within a network, assuming an attacker is already present on the network.
The NIST ZTA draft discusses this emerging approach in more detail

Once we have the security trust zones identified, we now have to determine where our OSS / BSS / management stack resides within the zones. If we use the layers of the TMN Pyramid as a guide:

  • The Network Element Layer (NEL) is the heart of the Active Network
  • The EMS / NMS (Element / Network Management Systems) will also usually reside within the Active Network
  • The OSS / BSS are interesting. They have to interface with the network and EMS / NMS. But they also usually have to interface with corporate systems like data warehouses, reporting tools, etc. They’re so critical to managing the Active Network, they need to be highly secure. That means they could be placed inside the Active Network realm or even have their own special Central Management realm. In other cases, different components of the OSS / BSS might be spread across different realms.

OSS abstract and connect

Note that we also have to consider the systems (eg user portals, asset management systems, etc, etc) that our OSS / BSS need to interface to and where they reside in the trust model.

2. Restricting Access to OSS / BSS systems and data

We want to uniquely control who has access to what systems and data using our OSS / BSS stack.

The Security Trust model also impacts the architectures of Identity Management (Directory Services like Active Directory), User Access Management (UAM) and Privileged Access Management (PAM) solutions, and how they control access to our OSS / BSS.

They serve three purposes:

  • To provide fine-grained management of access to privileged / restricted data and systems within our OSS / BSS
  • To simplify the administrative overhead of managing user access to our OSS / BSS by defining group-based user access policies
  • To log the activities of individual users whilst they use the OSS/BSS and related systems / networks

Most OSS / BSS allow user authentication via Directory Services these days. Most, but not all, also allow roles / privileges to be assigned via Directory Services. For example, RBAC (Role Based Access Control) is policy that is defined by our OSS / BSS applications. It controls what functions users / groups can perform via permission management. For central user administration purposes, it’s ideal that the Directory Service can pass role-based information to our OSS / BSS.
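As a simple illustration of group-based access policies, here’s a minimal sketch of mapping directory-service groups to OSS/BSS roles and permissions. The group names, roles and permissions are purely hypothetical.

```python
# Map directory-service groups to OSS/BSS roles, and roles to permitted functions.
GROUP_TO_ROLE = {
    "CN=NOC-Level1": "alarm_viewer",
    "CN=NOC-Level2": "alarm_manager",
    "CN=Network-Design": "inventory_editor",
}

ROLE_PERMISSIONS = {
    "alarm_viewer": {"view_alarms"},
    "alarm_manager": {"view_alarms", "ack_alarms", "create_tickets"},
    "inventory_editor": {"view_inventory", "edit_inventory"},
}

def permissions_for(directory_groups: list[str]) -> set[str]:
    """Resolve a user's effective OSS/BSS permissions from their directory groups."""
    perms: set[str] = set()
    for group in directory_groups:
        role = GROUP_TO_ROLE.get(group)
        if role:
            perms |= ROLE_PERMISSIONS[role]
    return perms

print(permissions_for(["CN=NOC-Level1", "CN=Network-Design"]))
```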

3. OSS / BSS Data Security

The first step in the data security process is to identify categories of data such as unclassified, confidential, secret, etc.

We then need to consider what security mechanisms need to be applied to each category. There are four main OSS / BSS data security considerations:

  1. Data Anonymisation / Privacy – is the process of removing / redacting / encrypting personally identifiable information from the data sets stored in our OSS / BSS (particularly the latter). Our solutions need to store personal data such as names, addresses, contact details, billing details, etc. We can use techniques to control the pervasiveness of access to that data. For example, we may use a tightly restricted system to store personal details as well as a non-identifiable code (eg LocationID or ServiceID) for use by our other more widely accessed tools (eg PNI / LNI) – see the sketch after this list
  2. Encryption of data at rest – is the process of encrypting the large stores of data used by our OSS / BSS, whether a local database used by each application or in centralised data warehouses
  3. Encryption of data in transit – is the process of encrypting data as it transits between components within your OSS/BSS stack (and possibly beyond). Techniques such as VPNs and IPSec protocols can be used. As we increasingly see OSS / BSS built as web-based applications, we’re using encrypted connections (eg HTTPS, SSL, TLS, etc) to protect our data
  4. Physical security – is the process of restricting physical access to data stores (eg locked cabinets, facilities access management, etc). This isn’t always within our control as an OSS / BSS project team.
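Picking up the anonymisation / privacy point (item 1 above), here’s a minimal sketch of deriving a non-identifiable, ServiceID-style token for use in widely accessed tools, while the restricted system keeps the mapping back to personal details. The HMAC approach and key handling shown here are illustrative assumptions, not a security recommendation.

```python
import hashlib
import hmac

SECRET_KEY = b"keep-this-in-the-restricted-realm"   # hypothetical key material

def pseudonymise(customer_ref: str) -> str:
    """Derive a stable, non-reversible token from a customer reference."""
    return hmac.new(SECRET_KEY, customer_ref.encode(), hashlib.sha256).hexdigest()[:16]

# The PNI/LNI record carries only the token; the restricted CRM keeps the lookup table.
print(pseudonymise("ACC-0042 / Jane Citizen"))
```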

 

4. Real-time Security Logging / Monitoring

Ensure all systems in the management stack (OSS, BSS, NMS, EMS, the network, out-of-band management, etc) are logging to a central SIEM (Security Information and Event Management) tool. Oh, and don’t do what I saw one big bank do – they had so many hits occurring just on their IPS / IDS tool that they just left it sitting in the corner unmonitored and in the too-hard basket. By having the tools, they’d ticked their compliance box, but there was no checkbox asking them to actually look at the results or respond to the incidents identified!!

 

5. Patch Management

Software patch management is theoretically one of the simplest security management techniques to implement. It ensures you have the latest, hopefully most secure, version of all software.

OSS / BSS / Management stacks tend to have many, many different components. Not just at the obvious application level, but operating systems, third-party software (eg runtime environments, databases, application servers, message buses, antivirus software, syslog, etc). 

Patch management is often well maintained by IT teams within the Corporate / Enterprise trust zone discussed above. They have access to the Internet to download patches and tools to help push updates out. However, the Active Network zone shouldn’t have direct access to the Internet, so routine patch management could be easily overlooked and/or difficult to implement. Sometimes the software components reside on servers that are rarely logged into and patches can be easily overlooked.

The other problem is that OSS / BSS applications are often heavily customised, making it hard to follow a standard upgrade path. I’ve seen OSS / BSS that haven’t been patched for years, even with something as simple as Java runtime environments, because applying the patch causes the OSS / BSS to fail.

 

6. Security Testing / Hardening

Your organisation probably already has standards and checklists in place to ensure that all of your IT assets are as secure as possible. Your OSS / BSS environments are just one of those assets. However, as the “manager of managers” of your Active Network, the OSS / BSS is probably more important to secure than most.  

Your organisation might also insist that all applications, including the OSS / BSS, are built on a hardened Standard Operating Environment (SOE). However, some suppliers provide OSS / BSS as appliances, built on their own environments. These then have to go through a hardening process in alignment with your corporate IT standards.

If using a vendor-supplied off-the-shelf application, it will be quite common for it to have a default admin account on the application and database. This makes it easier for the system implementation team to navigate their way around the solution when building it. However, one of the first steps in a hardening process is to rename or disable these built-in accounts.

As “manager of managers,” your OSS / BSS‘s primary purpose is to collect (or request) information from a variety of sources. Some of these sources reside in the Active Network. Others reside in the Corporate Network or elsewhere. As such, careful consideration needs to be given to what Ports / Protocols are allowed. Some systems will come pre-configured with default / open settings. However, these should be restricted to necessary protocols only, including SNMP, HTTPS, SSH, FTPS and/or similar.

Speaking of SNMP, its original design was inherently insecure as it uses a primitive method of authentication. It uses clear-text community strings to secure access to the management plane. Only version 3 of SNMP (ie SNMPv3) has the ability to authenticate and encrypt payloads, so this should be used wherever possible. Some of you may have legacy device types that precede SNMPv3 though. Alert TA17-156A provides suggestions to minimise exposure to SNMP abuse.
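For illustration, here’s a minimal sketch of an authenticated and encrypted SNMPv3 GET using the pysnmp library (assuming pysnmp is installed and the device has a matching SNMPv3 user configured; the host, user and key values are placeholders only):

```python
from pysnmp.hlapi import (
    SnmpEngine, UsmUserData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
    usmHMACSHAAuthProtocol, usmAesCfb128Protocol,
)

error_indication, error_status, error_index, var_binds = next(getCmd(
    SnmpEngine(),
    UsmUserData(
        'oss-poller',                       # SNMPv3 security name (placeholder)
        authKey='auth-pass-placeholder',    # authentication (integrity) key
        privKey='priv-pass-placeholder',    # privacy (encryption) key
        authProtocol=usmHMACSHAAuthProtocol,
        privProtocol=usmAesCfb128Protocol,
    ),
    UdpTransportTarget(('192.0.2.10', 161)),
    ContextData(),
    ObjectType(ObjectIdentity('SNMPv2-MIB', 'sysDescr', 0)),
))

print(error_indication or var_binds)
```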

Also consider the environment on which you’re performing your security testing. As described in this post about OSS / BSS environments and test transitions, you’ll probably have multiple environments – PROD environments that are connected to the live Active Network devices and non-PROD environments that are connected to test lab devices and/or simulators. Where should you perform your penetration / security testing? Probably not on PROD, because you want to ensure the solution is already secure before letting it loose into Production. But you also want to ensure it’s as PROD-like as possible. You could possibly use PRE-PROD (ie a state before a solution is cut-over to PROD), before it’s fully connected to the Active Network. Or, you could use the most PROD-like lower environment (eg Staging).

One other thing when conducting security tests and hardening – penetration testing often breaks things by injecting malicious code / data. Ensure you take a backup of any environment so you can roll-back to a working state after conducting your pen-tests.

 

7. Useful Security Standards

The following is a list of security standards that I’ve used in the past:

As I mentioned at the start, I’m far from being an expert in the field of network or data security. I’d love to get your feedback if I’m missing anything important!!

Industry News: Netcracker and Google Cloud announce strategic partnership

Breaking news: “Netcracker and Google Cloud announce strategic partnership” has been published on our Industry News stream.

Industry News includes: contract wins, new product releases, job openings, EOIs/RFPs, etc.

To publish news about your organisation, first claim or register your organisation’s listing on The Blue Book OSS/BSS Supplier Directory then create a news post.

The overlaps of DCIM with inventory, asset and config management

A regular reader of the PAOSS blog recently wrote, “I follow with passion your blog, latest posts about Inventory are great [Ed. the reader is talking about this post about LNI and PNI and this one about Inventory vs Asset vs CMDB Management]. I ask you if possible have a post on Inside Plant vs Outside Plant vs Virtual network creation… we usually use CAD based tools for Inside Plant design both for TLC equipment, cabling, cross connection, Distribution Frame, rooms, virtual rooms, rows structure, etc but also for power, conditioning, lighting, etc. We also use Network Inventory for Datacenter and server farm modelling. Outside Plant typically deals with GIS tools for cabling infrastructure. And now also virtualisation of Network is coming with NFV and SDN. What do you think about?”

Great question.

In the post about Inventory vs Asset vs CMDB, we used the following Venn Diagram:

Unfortunately, there’s another circle that’s not shown on this diagram, but should be – the DCIM (Data Centre Infrastructure Management) circle. The overlaps between OSS and DCIM partially answer the questions above. We wrote a 5 part series on DCIM back in 2014 (part one, two, three, four, five), so perhaps it’s time for a re-visit.

The last of those five posts even included another Venn Diagram, as follows:

OSS, DCIM, ITSM Venn Diagram

Data Centre Infrastructure Management (DCIM) shares much of its DNA with OSS, but also has a number of unique differences.

Similarities:

  • IT and network device / inventory management
  • CSPs and Data Centres tend to have many Enterprise customers, and therefore a need to align with their IT service and life-cycle management (ITIL / ITSM) methodologies
  • Electronic data collection and storage to support fulfillment and assurance workflows
  • Analytics and operational decision support
  • Planning and design tools
  • Predictive modelling
  • Process and change management
  • Capacity planning, resource allocation and provisioning

Differences (ie what Data Centres have that traditional CSP networks don’t):

  • Facilities / Building Management Systems (FMS/BMS)
  • Energy / Power management
  • Environment and heat management (HVAC) including management of hot/cold zones
  • Data Centres tend to have less outside plant or inter-site connectivity* (ie most power and network connectivity tends to reside within the Data Centres)
  • However, Data Centre cable management has some slight differences. Network links are more likely to be managed within 3D spatial systems (x, y and height), if at all, rather than the 2D (x and y coordinates) typically plotted by most OSS inventory via GIS (Geographical Information Systems) or CAD (Computer Aided Design) drawings. Data Centre cables tend to be run in spatially-dense above-rack or below-floor trayways. By comparison, cables between sites tend to be less dense and at a fairly consistent height (eg a standard depth underground or a standard height when mounted on towers/poles aboveground)
  • Alternatively, DCs may manage spatial infrastructure through naming conventions such as rooms, rack-rows, racks, rack-position rather than 3D spatial systems
  • Data Centres have traditionally had a higher proportion of virtualised assets than traditional CSPs, although that is now changing with the operator network embracing network virtualisation

 

So let’s now look at how it “might” all hang together (noting that each company is likely to be different depending on their systems and processes):

  • DCIM manages facilities, building, power / PLCs and heating/cooling/HVAC
  • PNI manages physical connectivity (between sites and within the DC) as it can generally manage connectivity to physical ports on patch-panels / frames and physical devices (eg switches and routers) inside the DC. PNI also handles splicing and patching. PNI tools can generally also manage power cabling, although not everyone uses PNI for this
  • LNI (in conjunction with EMS [Element Management Systems] and virtual resource managers) will tend to manage the virtual / logical networks including resource management and orchestration
  • LNI will also tend to provide topological views of the network (often point-to-point links between physical/logical ports rather than the cable routes shown in PNI). LNI may also potentially include rack layouts and other forms of network visualisation. However, LNI tends to only partially show spatial presentation of the data (eg physical locations of “circuit” end-points rather than spatial location of all racks and equipment in 3D)
  • Related compute / storage infrastructure could be managed by DCIM, LNI, VIM, etc
  • And any of this could be cross-referenced as assets in the Asset Management System and/or Configuration Management Database (CMDB)

I can see that CAD might still be required for trayway, HVAC ducting, etc because PNI isn’t really designed with this in mind in 3D. 

Having said that, I’d probably still attempt to get all connectivity and support designed into a spatial visualisation tool like PNI rather than CAD. Afterall, connectivity of any type can be modelled as nodes and arcs (same as PNI). It’s just that ducting tends to have a greater 3D heft than a single line / arc of a typical comms cable. 

Why is it important to have this data in a single spatial system rather than CAD? Well, I figure it should help future augmented reality (AR) use-cases like the ones described in the link.

So here’s the updated diagram:

* There are of course multi-site DC organisations that have links between their sites, but they tend to outsource their long-haul network links to traditional carriers.

The common data store trend

Some time back, we discussed A modern twist on OSS architecture that is underpinned by a common data model.
 
Time to discuss this a little more visually.
 
As the blue boxes on the left side of the diagram below show, you may have many different data sources (some master, some slaved). You may have a single OSS tool (monolithic solution) or you may have many OSS tools (best-of-breed approach).
 
You may have multiple BSS, NMS and even direct connections to network devices. You may even have other sources of data that you’ve never used before such as weather patterns, lightning strikes, asset management prediction modelling, SCADA data, HVAC data, building access / security events, etc, etc.
 
The common data model allows you to aggregate those sets to provide insights that have never been readily accessible to you previously.
 
So let’s look at a few key points:
  1. Existing network layer systems (eg NMS, NE and their mediation devices) are currently sucking (near)real-time (ie alarm and perf) data out of the network and feeding it to an OSS directly. They may also be pushing inventory discovery data to the OSS, although probably loading it less frequently (typically once daily).
  2. The common data model provides a few options for data flows: 
    1. If the data store is performant enough, the network layer could feed real-time data to the data store which on-forwards to OSS
    2. multi-home the data from the network to the data store and OSS simultaneously
    3. feed data from the network to the OSS, which may (or not) process before pushing to the data store
  3. Just a quick note regarding data flows: The network will tend to be the master for real-time / assurance flows. However, manual input tends to be the master for design/fulfil flows, so the OSS becomes the master of inventory data as per this link 
  4. The question then becomes where the data enrichment happens (ie appending inventory-related data to alarms) to help with root-cause and service-impact calculations. Enrichment / correlation probably needs to happen in the OSS’s real-time engine, but it could source enrichment data directly from the network, from the OSS’s inventory, or from the common data store (see the sketch after this list)
  5. If the modern ETL tools (eg SNMP and syslog collectors, etc) allow you to do your own ETL to a common data store, a vendor OSS would only need one mediation device (ie to take data from the data store), rather than needing separate ones to pull from all the different NMS/EMS/NE) in your network. This has the potential to reduce mediation license costs from your OSS vendor
  6. Having said that, if you have difficult / proprietary interfaces that make it a challenge to do all of your own ETL then it might be best to let your OSS vendor build your mediation / ETL engines
  7. The big benefit of the common data store is you can choose a best-of-breed approach but still have a common data model to build Business Intelligence queries and reports around
  8. The common data store also takes load off the production OSS application / data servers. Queries and reports can be run against the common data platform, freeing up CPU cycles on the OSS for faster user interactions
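To illustrate the enrichment step from point 4 above, here’s a minimal sketch of appending inventory and service context to a raw alarm, keyed on the device name. The in-memory stores and field names are illustrative only; in practice the lookups could hit the OSS inventory or the common data store.

```python
# Illustrative lookup "stores" - in reality these would be inventory / service queries
inventory_store = {
    "MEL-PE-01": {"site": "Melbourne Exchange", "role": "PE router", "vendor": "ACME"},
}
service_store = {
    "MEL-PE-01": ["ENT-000123", "ENT-000456"],   # services riding on this device
}

def enrich(alarm: dict) -> dict:
    """Join an alarm to inventory and service records keyed by the device name."""
    device = alarm.get("device")
    return {
        **alarm,
        "inventory": inventory_store.get(device, {}),
        "impacted_services": service_store.get(device, []),
    }

raw_alarm = {"device": "MEL-PE-01", "severity": "major", "text": "LOS on port 1/1/2"}
print(enrich(raw_alarm))
```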

The Common Data Model is supported by a few key advancements:

  1. In the past, the mediation layer (ie getting data out of the network and into the OSS) was a challenge. Network operators didn’t tend to want to do this themselves. This introduced a dependency on software suppliers / integrators to build mediation devices and sell them to operators as part of their OSS/BSS solutions. But there’s been a proliferation of highly scalable ETL (Extract, Transform, Load) tools in recent years
  2. Many networks used to have proprietary interfaces that required significant expertise to integrate with. The increasing ubiquity of IP networking and common interfaces (eg SNMP and web interfaces like RESTful, JSON, SOAP, XML) to the network layer makes ETL simpler.
  3. Massively scalable databases that don’t have as much dependency on relational integrity and can ingest data from myriad sources
  4. A proliferation of data visualisation tools that are user-friendly enough that you no longer need to be a coder capable of writing complex SQL queries
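As a simple illustration of the ETL point above, here’s a minimal sketch of normalising events from two different source formats into one common event schema before loading them into the shared store. The field names and parsing rules are illustrative assumptions only.

```python
from datetime import datetime, timezone
from typing import Optional

def from_trap(trap: dict) -> dict:
    """Map a (pre-parsed) SNMP trap into the common event schema."""
    return {
        "ts": trap["received_at"],
        "source": trap["agent_address"],
        "event_type": trap["trap_oid"],
        "severity": trap.get("severity", "indeterminate"),
        "raw": trap,
    }

def from_syslog(host: str, message: str, received_at: Optional[datetime] = None) -> dict:
    """Map a syslog message into the same schema, so both land in one store/table."""
    return {
        "ts": (received_at or datetime.now(timezone.utc)).isoformat(),
        "source": host,
        "event_type": "syslog",
        "severity": "indeterminate",
        "raw": message,
    }
```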
 

Softwarisation of 5G

As you have undoubtedly noticed, 5G is generating quite a bit of buzz in telco and OSS circles.

For many it’s just an n+1 generation of mobile standards, where n is currently 4 (well, the number of recent introductions into the market means n is probably now getting closer to 5  🙂  ).

But 5G introduces some fairly big changes from an OSS perspective. As usual with network transformations / innovations, OSS/BSS are key to operationalising (ie monetising) the tech. This report from TM Forum suggests that more than 60% of revenues from 5G use-cases will be dependent on OSS/BSS transformation.

And this great image from the 5G PPP Architecture Working Group shows how the 5G architecture becomes a lot more software-driven than previous architectures. Interesting how all 5 “software dimensions” are the domain of our OSS/BSS isn’t it? We could replace “5G architecture” with “OSS/BSS” in the diagram below and it wouldn’t feel out of place at all.

So, you may be wondering in what ways 5G will impact our OSS/BSS:

  • Network slicing – being able to carve up the network virtually, to generate network slices that are completely different functionally, means operators will be able to offer tailored, premium service offerings to different clients. This differs from the one-size-fits-all approach used previously. However, it also means the OSS/BSS complexity increases – it’s almost like you need an OSS/BSS stack for each network slice. Unless we can create massive operational efficiencies through automation, the cost to run the network will increase significantly. Definitely a no-no for the execs!!
  • Fibre deeper – since 5G will introduce increased cell density in many locations, and offer high throughput services, we’ll need to push fibre deeper into the network to support all those nano-cells, pico-cells, etc. That means an increased reliance on good outside plant (PNI – Physical Network Inventory) and workforce management (WFM) tools
  • Software defined networks, virtualisation and virtual infrastructure management (VIM) – since the networks become a lot more software-centric, that means there are more layers (and complexity) to manage.
  • Mobile Edge Compute (MEC) and virtualisation – 5G will help to serve use-cases that may need more compute at the edge of the radio network (ie base stations and cell sites). This means more cross-domain orchestration for our OSS/BSS to coordinate
  • And other use-cases where OSS/BSS will contribute, including:
    • Multi-tenancy to support new business models
    • Programmability of disparate networks to create a homogenised solution (access, aggregation, core, mobile edge, satellite, IoT, cloud, etc)
    • Self-healing automations
    • Energy efficiency optimisation
    • Monitoring end-user experience
    • Zero-touch administration aspirations
    • Drone survey and augmented reality asset management
    • etc, etc
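
To illustrate the network slicing point above, here's a tiny, purely hypothetical sketch of why slicing adds operational complexity: each slice effectively carries its own assurance targets, so the same measurement can be fine for one slice and a breach for another. Slice names and thresholds below are invented:

    # Hypothetical per-slice assurance profiles - each slice behaves like a
    # different network, so the OSS/BSS needs per-slice targets and checks.
    SLICE_PROFILES = {
        "enhanced-mobile-broadband":  {"max_latency_ms": 50,  "min_throughput_mbps": 100},
        "massive-iot":                {"max_latency_ms": 500, "min_throughput_mbps": 1},
        "ultra-reliable-low-latency": {"max_latency_ms": 5,   "min_throughput_mbps": 10},
    }

    def breaches(slice_name, measured):
        """Return which targets a given slice is currently breaching."""
        profile = SLICE_PROFILES[slice_name]
        issues = []
        if measured["latency_ms"] > profile["max_latency_ms"]:
            issues.append("latency")
        if measured["throughput_mbps"] < profile["min_throughput_mbps"]:
            issues.append("throughput")
        return issues

    # The same measurement is OK for some slices and a breach for others.
    sample = {"latency_ms": 40, "throughput_mbps": 20}
    for name in SLICE_PROFILES:
        print(name, breaches(name, sample) or "OK")

Multiply that by fulfilment, billing and fault management and it's easy to see why automation becomes non-negotiable.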

Fun times ahead for OSS transformations! I just hope we can keep up and allow the operator market to get everything it wants / needs from the possibilities of 5G.

An Asset Management / Inventory trick

Last week we discussed the nuances between Inventory, Asset and Config Management within an OSS stack. Each of these tools is designed to support functionality for different users / persona-groups. However, they also tend to have significant functional overlap. Chances are your organisation doesn’t have separate dedicated tools for each.

So today I’m going to share a trick I’ve used in the past when I’ve only had a PNI (Physical Network Inventory) system to work with, but have needed to perform asset management style functionality.

Most inventory tools are great at storing the current state of a device that exists in a network. However, they don’t tend to be so great at an asset manager’s primary function – tracking the entire life-cycle of an asset from procurement to decommissioning and sparing / maintenance along the way.

Normally the PNI just records the locations of all the active network equipment – in buildings, exchanges, comms-huts, cabinets, etc. The trick I use is to create one or more additional locations to represent warehouses. They may (or may not) reside in the physical location of your real warehouse/s.

In almost all PNI systems, you have control over the status of the device (eg IN-SERVICE, etc). You can use this functionality to include status of SPARE, UNDER REPAIR, etc and switch a device between active network locations and the warehouse.

These status-change records give you the ability to pin-point the location of a given asset at any point in time. It also gives you trending stats, either as an individual device or as a cohort of devices (eg by make/model).
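
As a rough illustration of the trick (not tied to any particular PNI product), the status-change history might be worked with something like this, assuming your PNI can export status changes with timestamps, locations and serial numbers:

    from datetime import date

    # Hypothetical export of status-change records for one device (keyed by serial number).
    history = [
        {"serial": "SN12345", "date": date(2018, 3, 1),  "status": "SPARE",        "location": "Warehouse-East"},
        {"serial": "SN12345", "date": date(2018, 7, 14), "status": "IN-SERVICE",   "location": "SiteA"},
        {"serial": "SN12345", "date": date(2019, 2, 2),  "status": "UNDER REPAIR", "location": "Warehouse-East"},
    ]

    def where_was(serial, on_date, records):
        """Pin-point a device's status and location at any point in time."""
        relevant = [r for r in records if r["serial"] == serial and r["date"] <= on_date]
        return max(relevant, key=lambda r: r["date"]) if relevant else None

    print(where_was("SN12345", date(2018, 12, 25), history))
    # -> the IN-SERVICE record at SiteA

    # Cohort trending (eg time between repairs by make/model) falls out of the
    # same history by grouping on make/model instead of serial number.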

You can even build processes around it for check-in / check-out of the warehouse and maintenance scheduling.

I should point out that this works if your PNI allows you to uniquely identify a device (eg by make/model + serial number or perhaps a unique naming convention instance). If your PNI device records only show the current function of a device (eg a naming convention like SiteA-Router-0001), then you might lose sight of the device’s trail when it moves through life-cycle states (eg to the warehouse).

The differences between Inventory, Asset and Config Management in an OSS

We recently discussed the differences between PNI (Physical Network Inventory) and LNI (Logical Network Inventory) solutions that appear as part of many OSS stacks. 

As promised, today we’ll talk about the subtle differences between:

  • Inventory Management Systems 
  • Asset Management Systems and
  • Configuration Management Databases (CMDB)
  • We might even discuss Virtual Infrastructure Managers (VIM) and Resource Managers as well as Config Managers (different from CMDB) too

Inventory vs Asset vs CMDB

To be honest, the diagram above doesn’t show adequate overlap. Each of these systems has a slightly different purpose, usually for a slightly different set of personas. However, they all play a part in managing the resources that make up an organisation’s Active Network (the network segment dedicated to carrying customer traffic, as opposed to internal corporate traffic).

Let’s start with Inventory Management Systems (IMS) because IMHO, these are the tools that were traditionally responsible for managing service-provider networks. These are the tools typically used by network planners, network engineers, capacity planners and other back-office operational staff.  As mentioned in the link above, these tools can be further broken down into:

  • PNI (Physical Network Inventory) – The physical devices like switches, routers, firewalls as well as the outside plant (OSP) like cables, joints, etc. Generally only used by operators with large, wide-spread networks of physical assets, especially outside plant.
  • LNI (Logical Network Inventory) – The set of objects that are formed using physical infrastructure (and possibly associations to other logical objects). This could include circuits, VLANs, and other overlay network topologies as well as the management of attributes like bandwidth, protocols and other network functionality

These tools tend to focus on the key physical/logical/virtual resources that comprise an operator’s active network (AN). However, they often also support functionality that crosses into other domains such as asset and config management.

The main differences between Inventory and Asset Management systems are that:

  • Inventory tends to have the concept of connectivity (eg cables, patch panels, circuits, networks, topologies), which asset management rarely cares about
  • Inventory tends to align customer services / circuits to devices to help identify utilisation, available capacity and service impact analysis (see the small sketch after this list), which asset management rarely cares about
  • This includes service hierarchies, multiplexing and networks / meshes
  • More importantly, IMS often have network project planning tools to allow design teams to plan for network augmentation
  • Since connectivity is important to IMS, they are more likely to have geographic and schematic / topology views of the network
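
Because connectivity is a first-class concept in an IMS, questions like "which services ride over this cable?" are natural to answer. Here's a toy sketch of that kind of service impact analysis (circuit, cable and customer names are all invented):

    # Toy IMS-style data: circuits are built over ordered lists of cables / bearers.
    circuits = {
        "CIRCUIT-001": {"customer": "BankCo",  "route": ["CABLE-A", "CABLE-B"]},
        "CIRCUIT-002": {"customer": "MediaCo", "route": ["CABLE-B", "CABLE-C"]},
    }

    def impacted_services(failed_cable):
        """Service impact analysis: which circuits (and customers) traverse a cable."""
        return [
            (name, c["customer"])
            for name, c in circuits.items()
            if failed_cable in c["route"]
        ]

    # An excavator has just found CABLE-B...
    print(impacted_services("CABLE-B"))
    # -> [('CIRCUIT-001', 'BankCo'), ('CIRCUIT-002', 'MediaCo')]

An asset manager rarely needs to answer this type of question, which is why connectivity barely features in AMS data models.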

Asset Management Systems (AMS), as the name implies, have a more “financial” purpose, where assets are objects of intrinsic financial value to an organisation. AMS tools tend to be used by the accounting and asset management teams. Asset / device life-cycle management, such as the use-cases described below, is sometimes performed by operational teams.

They’re used to track current value (purchase price minus depreciation), warranties, spares management, life-cycles / refresh / end-of-life of assets and their contracts, as well as reactive and predictive maintenance and reliability management. AMS will tend to store information about most of the Active Network Physical devices. This means they will have records for the same devices as PNI, but often with different information / attributes.

They won’t tend to store LNI-related data. However, AMS will often keep information about assets in addition to Active Network devices. This could include software licenses and more.

AMS will tend to consider unique network devices down to the level of serial numbers so that they can be tracked through a unique device’s life-cycle from order to warehouse to in-service to scrapping. IMS tools can store device serial numbers but tend to track devices by function (eg Router-0001 connects SiteA to SiteB). If the device in that function (eg Router-0001) fails, then a new one is inserted with the same name.
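
The by-function versus by-serial-number distinction might look something like the sketch below. All names, prices and dates are invented, and the straight-line depreciation is just one simple way an AMS might derive current value:

    from datetime import date

    # IMS view: keyed by function. When the unit in that function fails, the
    # replacement takes the same name, so the IMS record barely changes.
    ims_record = {"name": "Router-0001", "role": "connects SiteA to SiteB", "serial": "SN-NEW-999"}

    # AMS view: keyed by serial number, carrying financial / life-cycle attributes.
    # The failed unit keeps its own record and history.
    ams_record = {
        "serial": "SN-OLD-111",
        "make_model": "AcmeRouter 9000",
        "purchase_price": 20000.0,
        "purchase_date": date(2016, 1, 1),
        "useful_life_years": 5,
        "status": "SCRAPPED",
    }

    def book_value(asset, as_at):
        """Current value = purchase price minus straight-line depreciation."""
        years_held = (as_at - asset["purchase_date"]).days / 365.25
        remaining_fraction = max(0.0, 1 - years_held / asset["useful_life_years"])
        return round(asset["purchase_price"] * remaining_fraction, 2)

    print(book_value(ams_record, date(2019, 1, 1)))   # book value after ~3 years of a 5-year life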

Some providers don’t even manage financial assets to a device level. They clump costs by site or project, with no relationships to asset details or the asset management use-cases described above.

Configuration Management Databases (CMDB) come from IT Service Management (ITSM) terminology. Like many IT concepts, ITSM has been increasingly used in parts of service provider networks. CMDBs are a database of Configuration Items (CIs), where CIs can be logical or physical entities. CIs may (or may not) be physical devices (PNI) or logical resource entities (LNI) and may (or may not) represent tangible values (assets). The main purpose of CIs is to store information about IT services that will allow other ITSM processes, such as Incident, Problem and Change Management, to be performed efficiently.

Not only is there functional overlap between these systems, there’s often also terminology overlap and/or misalignments. Different vendors have different levels of functionality and support alternate use-cases, so the areas of overlap differ between organisations.

CMDBs tend to have great flexibility to create associations (eg parent, child, related to, etc) that can establish the connectivity and hierarchies of an IMS. However, they tend to require significant effort to establish and maintain rather than having pre-established relationships like IMS do. An example might be a cable that connects from port 1 on device A to port 5 on device B, but then gets re-patched to port 8 – this scenario tends to be easier in an IMS than a CMDB.
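
The re-patching example might look roughly like the following. In an IMS-style model the cable end is just an attribute you update; in a CMDB-style model the connection tends to be an association record you retire and recreate (structures below are purely illustrative, not any vendor's schema):

    # IMS-style: the cable record owns its terminations; a re-patch is a field update.
    ims_cable = {"id": "CABLE-42", "a_end": ("DeviceA", "port 1"), "b_end": ("DeviceB", "port 5")}
    ims_cable["b_end"] = ("DeviceB", "port 8")      # re-patched: one attribute change

    # CMDB-style: connectivity lives in associations between CIs, so a re-patch
    # means retiring one association and creating another.
    cmdb_associations = [
        {"type": "connected_to", "from": "CABLE-42", "to": "DeviceA:port1", "active": True},
        {"type": "connected_to", "from": "CABLE-42", "to": "DeviceB:port5", "active": True},
    ]
    for assoc in cmdb_associations:
        if assoc["to"] == "DeviceB:port5":
            assoc["active"] = False                 # retire the old association
    cmdb_associations.append(
        {"type": "connected_to", "from": "CABLE-42", "to": "DeviceB:port8", "active": True}
    )

Neither is wrong; the IMS simply pre-bakes the relationship types that matter for networks, while the CMDB leaves them for you to define and maintain.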

Oh, and I also promised to mention VIMs and Config Managers:

Virtual Infrastructure Managers (VIM) are responsible for managing the virtual resources made available by physical infrastructure like compute, storage and network devices. In some cases, VIMs generate virtual network functions (VNFs) or virtual machines (VMs) that could look almost identical to any other device stored in LNI, PNI, AMS and/or CMDB. In fact, instances of these VNFs and VMs may even appear in those systems.

Config Management (as opposed to, but also potentially overlapping with, CMDB), is all about managing the configurations of devices in the network (often active network and corporate network). Each device, such as a router, has a configuration that tells the hardware how to function, where to route traffic, which packets to prioritise, where to send management logs (to the OSS), etc. Being able to monitor and manage these configurations centrally and consistently is the purpose of Config Managers. These are mostly used by network engineers to set policies and golden-configs (ie the config templates that all devices of that type must adhere to consistently). For example, you may have hundreds/thousands of devices in your network and want to re-point all management traffic to a new server as part of an OSS upgrade. Rather than configuring each device separately and manually, you can use the config management tool to push config changes out to the network.
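
As a heavily simplified sketch of that management-server re-pointing example, assuming a hypothetical golden-config fragment and push mechanism (a real config manager would use the vendor's CLI, NETCONF or REST APIs):

    # Hypothetical golden-config fragment that every device of this type must carry.
    GOLDEN_SYSLOG = "logging host {mgmt_server}"

    NEW_MGMT_SERVER = "10.0.0.50"                           # the new OSS collector
    devices = ["SiteA-Router-0001", "SiteB-Router-0002"]    # in reality, hundreds or thousands

    def render_config(mgmt_server):
        """Render the golden-config line from the template and current policy."""
        return GOLDEN_SYSLOG.format(mgmt_server=mgmt_server)

    def push_config(device, config_line):
        """Stand-in for the real push mechanism (SSH/NETCONF/REST in practice)."""
        print(f"{device}: applying '{config_line}'")

    # One policy change, pushed consistently, instead of thousands of manual edits.
    for device in devices:
        push_config(device, render_config(NEW_MGMT_SERVER))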

Leave us a message to describe how your organisation uses these (and other) tools.

Bleak sentiments

“People in a tough spot often focus on their own problems, when the answer usually lies in fixing someone else’s.”
Steve Schwarzman.

The telco industry is in a tough spot in many areas around the globe. Sadly, there were more stories of wholesale retrenchments here in Australia this week, including good friends. Revenues falling. Sentiment bleak.

Yet it’s not like most declining industries. Telecom services and remote communications are more in demand than ever before. It’s just that the dynamic has flipped. Previously there was a scarcity mindset around telco services. Now there’s an abundance (in many parts of the world). Connectivity is less of a problem now. Customers have stepped up the hierarchy of needs.

Maybe as industries, telco and OSS, we’re focusing on our own problems? The O in OSS, Operations, can trigger an inward-facing viewpoint (and retrenchments only exacerbate internalisation).

5G isn’t the solution if we’re just looking at it as a new, better model of connectivity. It might be if it can make us better at solving other people’s problems. I suspect OSS will have a big part to play in generating those solutions (or stymieing them!).

Hello Trouble!

Hello, Trouble.
It’s been a while since we last met.
But I know you’re still out there.
And I have a feeling you’re looking for me.
You wish I’d forget ya.. Don’t ya trouble?
Perhaps it is you, that has forgotten me.
Perhaps I need to come find you.
Remind you, who I am.

Sounds like an apt mindset for working in the OSS industry doesn’t it?

 

Or for marketing knives.

I’ll be honest. I like the OSS perspective better!

 

OSS discovers a network

Following yesterday’s post about OSS Inventory, I received another great follow-up question from another avid reader of the PAOSS blog:

Interesting thoughts Ryan! In addition to ‘faults up’, perhaps there is a case also (obvious?) for ‘discovery up’ to capture ongoing non-planned changes? Wondering have you come across any sort of reconciliation / adaptive inventory patterns like this? Workflow based? Autonomous? (Going too far into chaos theory territory?)

Yes, we did exactly that with the same tool discussed yesterday that I used back in 2000. In fact, a very clever dev and I got that company’s first-ever auto-discovery tool working on site (using a product supplied by head-office). Discovering the nodal elements (ie equipment, cards, ports) was fairly easy. Discovering the connectivity within a domain (we started with SDH) was tricky, but achievable. Auto-discovering cross-domain connectivity (ie DSL circuits through physical, SDH transit, ATM and logical connectivity onto the IP cloud) was much trickier as we needed to find/make linking keys across different data sources.

It was definitely workflow based with a routine-driven back-end. We didn’t just want anything that was discovered to be automatically stuffed into (or removed from) the database (think flapping ports or equipment going down temporarily). It could’ve been autonomous, but we introduced a manual step to approve any of the discoveries made by each automated discovery iteration.
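
A bare-bones sketch of that pattern: diff what discovery saw against what inventory holds, and queue the differences as candidate changes for approval rather than writing them straight into the database (all data below invented):

    # What auto-discovery returned this iteration vs what the inventory database holds.
    discovered = {"Router-01": {"cards": 4}, "Router-03": {"cards": 2}}
    inventory  = {"Router-01": {"cards": 4}, "Router-02": {"cards": 8}}

    def reconcile(discovered, inventory):
        """Build a candidate change set instead of updating inventory directly."""
        changes = []
        for node, attrs in discovered.items():
            if node not in inventory:
                changes.append(("ADD", node, attrs))
            elif inventory[node] != attrs:
                changes.append(("UPDATE", node, attrs))
        for node in inventory:
            if node not in discovered:
                # Might just be a flapping port or a temporary outage, so a human
                # (or a policy engine) decides whether it's a real removal.
                changes.append(("REMOVE?", node, inventory[node]))
        return changes

    for change in reconcile(discovered, inventory):
        print("PENDING APPROVAL:", change)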

As you know, modern networks / EMS / VIM (resource managers) are much more discoverable. They need to be for modern orchestration and resilience techniques. I don’t think it would be quite so tricky to stitch circuits together as we’re no longer so circuit-oriented as back in 2000.

However, I’d be fascinated to hear from other readers how much of a problem they have trying to marry up different data sources for discovery purposes. I’d also love to hear whether they’re using fully autonomous discovery or using the manual intervention step for the same reason we were. I imagine most are automating, because orchestration plans just need to make use of whatever resources are being presented by the underlying resource managers in near-real-time.

PS. For those wondering what “discovery” is, it’s shown in the lower grey arrow in this diagram from “Orders down, Faults up”.

Discovery is the process that allows data to be passed from NMS/EMS/NEs (ie the network or resource managers) directly into the inventory management database. It should be a more reliable and expedient way of synchronising the inventory with the live network. 

The reason for the upper grey arrow is because not all networks have APIs that can be “discovered.” Passive equipment like cable joints and patch-panels don’t have programmatic interfaces. Therefore we need to find other ways to get that data into the Inventory Manager.

Various forms of OSS Inventory

After reading other recent posts such as “Orders Down, Faults Up” and “How is OSS/BSS service and resource availability supposed to work?” an avid reader of the PAOSS blog posed the following brilliant question:

Do you have any thoughts on geospatial vs non geospatial network inventory systems? How often do you see physical plant mapping in a separate system from network inventory, with linkages or integrations between them, vs how often do you see physical and logical inventory being captured primarily in a geospatially oriented system?

Boy do I ever have some thoughts on this topic!! I’m sure you do too, so I’d love to hear what you think in the comments section below.

I was lucky. The first OSS/BSS that I worked on (all the way back in 2000), had both geo and non-geo (topology) views. It also had a brilliantly flexible data model that accommodated physical and logical inventory. All tightly integrated into one package. There aren’t many tools that can do all of that even today. Like I said, I was lucky to have this as a starting point!!

Like all things OSS/BSS, it starts with the personas and the key tasks they need to perform. Or from the supplier’s perspective, which customer personas they’re most actively targeting.

For example, if you have a significant Outside Plant (OSP) Network, then geo-positioning is vital. The exchanges and comms huts are easy enough to find, but pits, cable routes, easements, etc are often harder to find. It’s not uncommon for a field tech to waste time searching for a pit that’s covered in dirt, grass or snow. And knowing the exact cable route in geo view is helpful for sending field techs to the exact location of a fault (ie helping them to pinpoint the location of the bright yellow excavator that has just sliced through your inter-capital link). Geo-view is also important for OSP designers and the field workforce that builds the OSP network.

But other personas don’t care about seeing the detailed cable route. They just want to see a point-to-point topological link to represent physical connections between the ports on adjacent devices. This helps them to quickly understand the network or circuit / service view. They may also like to see an alarm overlay on the topology to quickly determine which parts of the network aren’t performing as expected. For these personas, seeing all the geo-detail just acts as visual noise that they need to subconsciously filter out to understand the topology view.

These personas also tend to want topological views of the network, not just the physical but the logical and virtual network / service overlays too.

In most cases that I can think of, the physical / OSP inventory tools show the physical devices (ports even) that the OSP network connects into. Their main focus is on the cables, joints, pits, pipes, catenaries, poles, lead-ins, patch-panels, patch-leads, splitters, etc. But showing the termination of cables onto active equipment (Inside Plant or ISP) is an important linking key between the physical and logical views.

The physical port (on the physical device) becomes the key demarcation between physical and logical worlds. The physical port connects physical cables / leads, but it also acts as the anchor point from which to create logical ports to which logical connections are made. As a result, the physical device and port tend to be shown in both physical (geo) and logical inventory tools. They also tend to be shown in both physical and logical network topology views.

In the case of the original OSS/BSS I worked on, it had separate visualisation tools for geo, network and circuit/service, but all underpinned by a common data model.

What’s the best way? Different personas will have different perspectives of course. I prefer for physical and logical inventories to be integrated out of the box (to allow simple cross-ref visually and in queries)…. but I also prefer for them to have different views (eg geo, topology, network, circuit/service) to suit different situations.

I also find it helpful if each of those views allows you to drill down deeper into specific sections of the graph when necessary. I’d prefer not to have all of those different views overlaid onto a geo visualisation. Too much visual clutter IMHO, but others may love it that way.

Oh, and having separate LNI (Logical Network Inventory) and PNI (Physical Network Inventory) can be a tricky thing to reconcile. The LNI will almost always have programmatic interfaces (APIs) to collect data from, but will generally have to amalgamate many different sources. Meanwhile, the PNI consists of mostly passive equipment and therefore has no API to collect latest info from. I tend to use strategies at the above-mentioned demarcation point (ie physical ports) to help establish linking keys between LNI and PNI.
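
A trivial sketch of using the physical port as that linking key between PNI and LNI records (identifiers invented; the hard part in practice is agreeing a consistent port naming convention across both systems):

    # PNI knows the physical / passive world; LNI knows the logical overlays.
    pni_ports = {
        "SiteA-Router-0001/1/1": {"cable": "CABLE-42", "rack": "R07", "geo": (-33.86, 151.21)},
    }
    lni_ports = {
        "SiteA-Router-0001/1/1": {"circuits": ["CIRCUIT-001"], "vlans": [101, 102]},
    }

    def stitched_view(port_key):
        """Join the physical and logical views on the shared physical-port key."""
        return {"port": port_key, **pni_ports.get(port_key, {}), **lni_ports.get(port_key, {})}

    print(stitched_view("SiteA-Router-0001/1/1"))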

BTW. There’s one aspect of the question, “How often do you see physical plant mapping in a separate system from network inventory” that I haven’t fully answered. I’ll cover the question of asset management vs inventory management vs CMDB (Configuration Management Database) in more detail in an upcoming post. [Ed. See link here]

In need of an OSS transformation translator

As OSS Architects, we have an array of elegant frameworks to call upon when designing our transformational journeys – from current state to a target state architecture.

For example, when providing data mapping, we have tools to prepare current and/or target-state data diagrams such as the following:

Source here.

These diagrams are really elegant and powerful for communicating with other data experts and delivery teams. It’s a data expert language.

Data experts are experts in the ETL (Extract, Transform, Load) process, but often have less expertise with the actual meaning and importance of the data sets themselves. For example, a data expert may know there’s a product offerings table, and that each offering has 23 associated attributes (eg bandwidth, SLA class, etc) available. But they may have less understanding of the 245 product types that are housed in the product’s data table, and even less awareness of the meanings of the thousands of product attributes. You need to be a subject matter expert (SME) to understand that detail about the data. In some cases, the SME might be from your client and hold far more tribal knowledge than you do.

We often need other SMEs (the products expert in this case) to help us understand what has to happen with the data during transformation. What do we keep, what do we change, what do we discard, etc.

Just one problem – SMEs might not always speak the same language as the data experts.

As elegant as it is, the data relationships diagram above might not be the most intuitive format for product experts to review and comment on.

As with many aspects of Architecture and transformation, if we’re to be understood, it’s best to communicate in our audience’s language.

In this case, it might be best to show data mappings as overlays on screenshots that the Product owner is familiar with:

  • From
    • Their current GUI
    • Existing sales order forms
    • Current report templates
  • To
    • Their next-generation GUI
    • New order forms
    • Post-Transform report templates

Such an approach might not look elegant to our data expert colleagues. The question is whether it quickly makes enough sense to the SMEs for you to elicit concise responses from them.

The “right” approach is not always the most effective.

I’d love to hear your tips, tricks and recommendations for speaking / listening in the audience’s language.

New functionality added to Blue Book OSS/BSS Vendor Directory

We’re excited to announce the release of some new functionality on The Blue Book OSS/BSS Vendor Directory (which now hosts nearly 450 different OSS/BSS supplier listings).

We’ve introduced:

  • An Industry News feed

    If you wish to publish news / press-releases on products, contract wins, changes in ownership structure, job advertisements, tradeshow attendance or any other related news, you can click here to create a news item (note that you have to be logged in and be an authorised listing owner – you can find a how-to guide here).
    If tagged to a vendor, these news items also appear in the listing of that vendor.

    If you are an OSS/BSS buyer and wish to publish news about an Expression of Interest (EOI), Request for Proposal (RFP), or similar,  please leave us a note at the Contact Page and we’ll publish your details.

  • Twitter Feed
    Each of the OSS/BSS vendors that has registered a Twitter handle now has its Twitter feed visible within its listing.

We hope that these two new features make it easier for buyers to find up-to-date information about the suppliers they’re most interested in.

We have plenty of additional new functionality under development and data being loaded, so be sure to check back in from time to time.

If you have any additional features that you think we should add, we’d love to hear from you, so please let us know via the Contact Page too.