OSS Sandpit – 5G Network Inventory Prototype

5G networks seem to be the big investment trend in telco at the moment. They come with a lot of tech innovation, such as network slicing and an increased use of virtualised network functions (VNFs). This article provides an example of building 5G network components into the inventory module of our Personal OSS Sandpit Project.

This prototype build includes components such as:

  • Hosting infrastructure
  • NFVI / VIM (NFV Infrastructure and Virtualised Infrastructure Management)
  • A 5GCN (5G Core Network)
  • An IMS (IP Multimedia Subsystem)
  • An RIC (RAN Intelligent Controller)
  • Virtualised Network Functions (AUSF, AMF, NRF, CU, DU, etc – a more extensive list of examples is provided later in this article)
  • Mobile Edge Compute (MEC)
  • MEC Applications like gaming servers, CDN (Content Delivery Networks)
  • Radio Access Network (RAN) and Remote Radio Units (RRU)
  • Outside Plant for fibre fronthaul and backhaul
  • Patching between physical infrastructure
  • End to end circuits between DN (Data Network), IMS, 5GCN, gNodeB, RRU
  • Logical Modelling of 5G Reference Points

Our prototype (a Standalone 5G model) is summarised in the diagram below:

We describe this via the following use-cases:

  • Building Reference Data like data hierarchies, device types, connectivity types, containment, device layouts, templates, flexible data models, etc
  • Creating Device Instances including rack views and the virtualised layers within them
  • Creating Physical Connections between devices
  • Creating Logical Connections between devices
  • Creating Network Slices in the form of services
  • Performing Service Impact Analysis (SIA)

Reference Data

Starting off with the data hierarchy, we had to develop some new building blocks (data classes) to support the virtualisation used in 5G networks. This included some new network slice types, virtualisation concepts and various other things:

In our prototype, we’ve developed a custom containment model as follows:

  • Country
    • Site
      • System (Network Domain)
        • Rack
          • Hosting
            • NFVI / VIM
              • VNF-Groupings (eg CU, DU, MEC, IMS, etc)
                • VNF
                  • Apps (like Gaming Servers)

In a real situation, you probably wouldn’t bother to model to this level of detail as it just makes more data to maintain. We’ve just included this detail to show some of the attributes of our sample 5G network.
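To make the containment hierarchy above a little more concrete, here's a minimal sketch of how it could be expressed as reference data outside the tool. The class names mirror our custom model rather than any built-in Kuwaiba classes (Kuwaiba manages containment through its own data model manager), so treat this as purely illustrative:

```python
# Illustrative only: our custom 5G containment hierarchy expressed as
# parent-class -> allowed child-classes. Class names match our sandpit
# model, not any built-in Kuwaiba classes.
CONTAINMENT = {
    "Country":   ["Site"],
    "Site":      ["System"],       # System = network domain
    "System":    ["Rack"],
    "Rack":      ["Hosting"],
    "Hosting":   ["NFVI_VIM"],
    "NFVI_VIM":  ["VNF_Group"],
    "VNF_Group": ["VNF"],          # eg CU, DU, MEC, IMS groupings
    "VNF":       ["App"],          # eg gaming servers, CDN nodes
}

def print_hierarchy(cls: str = "Country", depth: int = 0) -> None:
    """Walk the containment model and print it as an indented tree."""
    print("  " * depth + cls)
    for child in CONTAINMENT.get(cls, []):
        print_hierarchy(child, depth + 1)

if __name__ == "__main__":
    print_hierarchy()
```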

5G also required some new templates, especially for core infrastructure that can house dozens of VNFs, 5G reference points and apps (eg gaming servers, CDN, etc) that you don’t want to recreate each time.

The 5G System architecture includes the following network functions (modelled here as VNFs), among others:

  • Authentication Server Function (AUSF).
  • Access and Mobility Management Function (AMF).
  • Data Network (DN), e.g. operator services, Internet access or 3rd party services.
  • Unstructured Data Storage Function (UDSF).
  • Network Exposure Function (NEF).
  • Network Repository Function (NRF).
  • Network Slice Specific Authentication and Authorization Function (NSSAAF).
  • Network Slice Selection Function (NSSF).
  • Policy Control Function (PCF).
  • Session Management Function (SMF).
  • Unified Data Management (UDM).
  • Unified Data Repository (UDR).
  • User Plane Function (UPF).
  • UE radio Capability Management Function (UCMF).
  • Application Function (AF).
  • User Equipment (UE).
  • (Radio) Access Network ((R)AN).
  • 5G-Equipment Identity Register (5G-EIR).
  • Network Data Analytics Function (NWDAF).
  • CHarging Function (CHF).

Device Instances

We then create the devices to build the prototype network model shown in the first diagram above. This includes:

  • Hosting infrastructure
  • NFVI / VIM (NFV Infrastructure and Virtualised Infrastructure Management)
  • A 5GCN (5G Core Network)
  • An IMS (IP Multimedia Subsystem)
  • An RIC (RAN Intelligent Controller)
  • Virtualised Network Functions (AUSF, AMF, NRF, CU, DU, etc)
  • Mobile Edge Compute (MEC)
  • MEC Applications like gaming servers, CDN (Content Delivery Networks)
  • Radio Access Network (RAN) and Remote Radio Units (RRU)

The diagram below shows a small snapshot of the 5G Core. The templates we created earlier sure came in handy to avoid re-creating these hierarchies for each device type:

Note that the VirtualPorts are used for 5G reference points to support logical links, which we’ll cover later.

The diagrams below show the rack-layout views of core and edge hosting respectively. You’ll notice the hierarchy of device, NFVI, VNF-group, VNFs and applications is shown:

Physical Connections

To create the physical connectivity between core, edge and RRU, we’ve re-used the fibre cables, splice joints and ODFs that we demonstrated in the introduction to the OSS Sandpit inventory module.

In this case, we’ve just used fibres that were spare from last time and patched onto the 5G network’s physical infrastructure. The diagram below shows the physical path all the way from the Data Network (DN – aka a core router) to the transmitting antenna at site 2040.

This diagram includes router, core hosting, ODFs (optical patch panels), cables, splice joints, edge hosting, Radio Units and antenna, as well as fibre front and backhaul circuits.

Logical Connections

We also decided to create the various logical connections – for the most part these are interfaces between VNFs – via the standardised 5G Reference Points.

You can also find a reference to the various logical interfaces / reference-points in the top-right corner of the prototype diagram (first diagram above).

You can also see the full list of reference points from any given VNF, as shown in the example of the AMF below. You’ll notice that these have already been set up as logical links to other components, as shown under “mplsLink” in the bottom pane. (ie the top pane shows the “ports” on the AMF, the bottom pane shows the logical links to other VNFs)

The upper pane shows the instance of AMF (on the core) and its various interface points (the A-end of each interface as VirtualPorts). The lower pane shows the relationships to Z-end components via logical circuits (note that I had to model them as MPLS links, which is not quite right, but it’s the workaround needed in the tool).
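For context, the reference points modelled as VirtualPorts on the AMF follow the 3GPP architecture. The sketch below shows a subset of that mapping; the peer-function names come from 3GPP TS 23.501, while the VirtualPort naming convention is just an assumption for illustration, not something exported from the tool.

```python
# A subset of 3GPP reference points that terminate on the AMF, expressed as
# reference-point -> peer function. This could act as template data used to
# stamp out VirtualPorts for each new AMF instance (illustrative only).
AMF_REFERENCE_POINTS = {
    "N1":  "UE",     # NAS signalling between UE and AMF
    "N2":  "(R)AN",  # control plane between gNodeB and AMF
    "N8":  "UDM",    # subscriber data access
    "N11": "SMF",    # session management interactions
    "N12": "AUSF",   # authentication
    "N15": "PCF",    # access and mobility policy
    "N22": "NSSF",   # network slice selection
}

def virtual_port_names(vnf_instance: str) -> list[str]:
    """Generate VirtualPort names for an AMF instance (naming is hypothetical)."""
    return [f"{vnf_instance}-{rp}" for rp in AMF_REFERENCE_POINTS]

print(virtual_port_names("2000-COR-AMF-01"))
```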

You’ll also notice that the AMF is used by a number of network slices (under “uses” in the bottom pane), but we’ll get to that next.

Network Slices

Whilst not really technically correct, we’ve simulated some network slices in the form of “internal” services. To simplify, for each network slice type we’ve created a separate service terminating at each RRU. So, we’ve associated each RRU, Mobile Edge Infra (RAN), AMF (the Access and Mobility Management Function within the core) and the NSSF (the Network Slice Selection Function within the core) to these network slice “services.”

Some samples are shown below.

BTW 3GPP has defined the following Slice Types:

  • MIoT – Massive Internet of Things – to support huge device counts with enhanced coverage and low power usage
  • URLLC – Ultra-Reliable Low-Latency Communications – to support low-latency, mission-critical applications
  • eMBB – Enhanced Mobile Broadband – to provide high-speed data for application use (eg video conferencing, etc)
  • V2X – Vehicle to Everything
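Each of these slice types maps to a standardised Slice/Service Type (SST) value within an S-NSSAI (Single Network Slice Selection Assistance Information), which is how a slice would be identified in a real 5G core. The sketch below uses the SST values standardised in 3GPP TS 23.501; the slice-differentiator value is invented for illustration.

```python
# Standardised Slice/Service Type (SST) values from 3GPP TS 23.501.
SST = {"eMBB": 1, "URLLC": 2, "MIoT": 3, "V2X": 4}

def s_nssai(slice_type: str, slice_differentiator: int | None = None) -> dict:
    """Build an S-NSSAI-like identifier for a network slice 'service'.

    The SD (slice differentiator) is optional and operator-assigned;
    the example value used below is invented for illustration.
    """
    nssai = {"sst": SST[slice_type]}
    if slice_differentiator is not None:
        nssai["sd"] = f"{slice_differentiator:06X}"  # 24-bit value as hex string
    return nssai

# eg an eMBB slice terminating at RRU site 2040 (hypothetical SD value)
print(s_nssai("eMBB", 0x002040))   # {'sst': 1, 'sd': '002040'}
```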

Service Impact Analysis (SIA)

We can also use the service relationships to determine which Network Slices would be affected if the AMF failed. In the example below, there would be seven slices affected (see under “Uses” in the bottom pane), including all of those supported via sites 2040 and 2052.

Similar analysis could be done using the getAffectedServices API that we demonstrated in the OSS Sandpit Inventory Intro post.

SigScale RIM

Over the last few weeks, I’ve also been using another open-source inventory management tool from SigScale called RIM (a Resource Inventory Manager designed to support service assurance use cases). It shines a light on mobile networks in particular.

The project creators authored the TM Forum best practice document IG1217 Resource Inventory of 3GPP NRM for Service Assurance which details the rationale for, and process of, mapping 3GPP information models to TM Forum’s TMF634 (Resource Catalog Mgmt) and TMF639 (Resource Inventory Mgmt) standards.

I plan to also use RIM’s REST interface (based on TM Forum’s OpenAPIs) to share data both ways with the Kuwaiba inventory module in the future. 
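To give a feel for what that integration could look like, here’s a hedged sketch of querying a TMF639-style resource inventory endpoint. The host, port, API version, base path and query parameters are assumptions for illustration – TMF OpenAPI base paths vary between implementations and versions, so check your RIM deployment’s documentation for the real values.

```python
# Sketch only: query a TMF639 (Resource Inventory Management) style endpoint.
# Base URL, version and filter/field names are assumptions for illustration.
import requests

BASE_URL = "http://rim.example.local:8080/resourceInventoryManagement/v4"

def list_resources(category: str | None = None) -> list[dict]:
    """Return resources from the inventory, optionally filtered by category."""
    params = {"category": category} if category else {}
    resp = requests.get(f"{BASE_URL}/resource", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

for res in list_resources(category="AMF"):
    # 'id' and 'name' are standard TMF639 resource attributes
    print(res.get("id"), res.get("name"))
```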

Summary

I hope you enjoyed this brief introduction to how we’ve modelled a sample 5G network in the Inventory module of our Personal OSS Sandpit Project. Click on the link to step back to the parent page and see what other modules and/or use-cases are available for review.

If you think there are better ways of modelling the 5G network, or if I’ve missed some of the nuances or practicalities, I’d love to hear your feedback. Leave us a note in the contact form below.

OSS Sandpit – Resource / Inventory Module

This article provides a description of the inventory baseline, one module of our Personal OSS Sandpit Project.

As outlined in the diagram below, this incorporates the Inventory solution (by Kuwaiba), the graph database that underpins it as well as its APIs and data query tools. The greyed out sections are to be described in separate articles.

OSS Sandpit Inventory Baseline

We’ve tackled inventory first as this provides the base data set about resources in the network that other tools rely upon.

As the baseline introduction to the inventory module, we’ll provide a quick introduction to the following use-cases:

  • Building Reference Data like data hierarchies, device types, connectivity types, containment, device layouts, templates, flexible data models, etc
  • Creating Device Instances including rack views
  • Creating Physical Connections between devices
  • Creating Logical Connections between devices
  • Creating Services and their relationships with resources / inventory
  • Creating Outside Plant Views on geo-maps that include buildings, pits, splice cases, cable management, splicing, towers, antenna, end-to-end L1 circuits
  • Assigning IP Addresses and subnets with an IPAM tool
  • Creating an MPLS network
  • Creating an SDH network
  • Data import / export / updates via APIs including Service Impact Analysis (SIA)
  • Data import / export / updates via a Graph Database Query Language

Reference Data

Kuwaiba has a highly flexible and extensible data model. We’ve added many custom data classes (eg device categories like routers, switches, etc) such as those shown below:

And selectively added custom attributes to each of the classes (such as the Router class below):

Once the classes are created, we then create the Containment model (ie hierarchy of data objects). In our prototype, we’ve developed a custom containment model as follows:

  • Country
    • Site
      • System (Network Domain)
        • Rack
          • Equipment and so on.

We’ve also created a series of data templates to simplify data entry, such as the Cisco ASR 9001 and Generic Router examples below:

But we can also create templates for other objects, such as cables. The following sample shows a 24 fibre cable with two loose-tubes, each containing 12 fibre strands. (Note that colour-coding on tubes and strands is important for splicing technicians and designers)

Site and Device Instances

Next, we created some sites and devices within the sites, as shown below:

You’ll notice that some devices are placed inside a rack whilst others aren’t. You’ll also notice the naming convention for all devices (eg site – system – type – index, where site = 2052, system = DIS (distribution), type = CD (CD player for messaging) and index = 01 (the first instance of CD player at this site)).

It even allows us to show rack layouts (of equipment positions inside a rack)

And even patching-level details inside the rack (pink and blue lines represent patch-leads connecting to ports on the Cisco ASR 9001 router in rack position 2):

Physical Connections

Physical connections can take the form of patch-leads or of strands / conductors inside cables.

The diagram below represents a stylised optical fibre connection that we’ve created between a CODEC at site 2000 and another CODEC at site 2052. As you’ll also notice, it traverses two patch panels (ODFs – optical distribution frames), two splice joints and three optical fibre cables.

In our inventory tool, the stylised connection above presents as follows, where A and B have been added to indicate the patch-leads from the CODECs to patch-panels (ODFs):

Logical Connections

We can also represent logical and virtual connections. In the case below, we show a logical connection from the Waveguide of an antenna, to the broadcast of that signal to a neighbouring site, which then picks up the signal at the UAST (receiver).

Outside Plant Views

Outside plant (OSP) comprises the cables, joints, manholes, etc that help connect sites and equipment together. In the example below, we see the OSP view of the fibre circuits we described above in “Physical Connections.” If you look closely at the GIS (map overlay) below, you’ll spot sites 2000 and 2052, as well as the cables and splice joints. The lines show the physical route that the cables follow.

You may also have noticed that the green line is showing a radio broadcast link, which is point-to-point radio and therefore follows a straight line path from antenna to antenna.

Cable Management

Cable management and splicing / connections are supported, with tubes/strands being selected and then terminated at each end of the cable (in this case CABLE1 and its strands connect the splice case on the left pane with the ODF on the right pane). These can be managed on a strand-by-strand basis via the central pane. From the diagram, we can see that fibre 001 in CABLE 1 is connected to F1-001 in the splice case and 001-back on the ODF, per the A-end and B-end details in the bottom left corner.

From the naming convention, you’ll notice that there are two sets of cable “ports” in the splice case, as indicated by fibre numbers starting with F1 and F2 respectively.

Topology Views

The diagram below shows a topological view of the devices within a site, helping operators to visualise connectivity relationships.

Services

One of the most important roles that inventory solutions play is as a repository of equipment and capacity. They also assist in allocating available resources to customer services. In the example below, service number “2052-ABC_LR_97.3FM-BSO” has a dependency on a tower, antenna, antenna switch frame and many more devices. If any of these devices fails, it will impact this customer service, as we’ll describe in more detail below.

 

IP Address Management (IPAM) and IP Assignment

We can manage IP address ranges / subnets, such as the examples below:

And then allocate individual IP addresses to devices, such as assigning IP address 222.22.22.1 to the CODEC, as shown on the “Physical Connection” diagram above.
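As a quick aside on how the address plan hangs together, the sketch below uses Python’s standard ipaddress module to check that the CODEC’s address sits within a managed subnet. The 222.22.22.0/24 prefix is an assumption based on the single address shown above; the real IPAM pools may be carved up differently.

```python
# Sanity-check an IP assignment against a managed subnet (illustrative).
# The /24 prefix is assumed; only the host address 222.22.22.1 appears above.
import ipaddress

subnet = ipaddress.ip_network("222.22.22.0/24")
codec_ip = ipaddress.ip_address("222.22.22.1")

print(codec_ip in subnet)          # True - the CODEC address belongs to the pool
print(subnet.num_addresses - 2)    # 254 usable host addresses in a /24

# First few free addresses, given 222.22.22.1 is already assigned
assigned = {codec_ip}
free = [h for h in subnet.hosts() if h not in assigned][:3]
print(free)                        # the next three unassigned host addresses
```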

MPLS Network

The following provides a simple MPLS network cloud for a customer:

APIs (including Service Impact Analysis Query)

The solution has hundreds of in-built APIs that facilitate queries, additions, modifications and deletions of data.

The example shown below is getAffectedServices, which performs a service impact analysis. In this case, if we know that the device TEST-CD-02 fails, it will affect service number “2052-ABC_LR_97.3FM-BSO.” We can also look up the attributes of that service, which could include customer and customer contact details so that we can inform them their service is degraded and that repair processes have been initiated.

Note that the left-side pane is the Request and the right-side pane is the Response across the getAffectedServices API.
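For readers who want to script this rather than drive the API console, Kuwaiba’s web services are SOAP-based, so a library like zeep can call them. The sketch below is heavily hedged: the WSDL URL, the session-handling calls and the exact parameters of getAffectedServices are assumptions, so inspect the WSDL of your Kuwaiba version for the real operation signatures.

```python
# Hedged sketch of calling a SOAP-based service impact analysis operation.
# The WSDL URL and the argument names passed to createSession /
# getAffectedServices are assumptions - check your Kuwaiba WSDL.
from zeep import Client

client = Client("http://kuwaiba.example.local:8080/kuwaiba/KuwaibaService?wsdl")

# Most operations require a session token from a login call first
# (operation and argument names assumed here).
session = client.service.createSession("admin", "password")

# Ask which services depend on the failed device TEST-CD-02
# (class name, object id and session attribute are assumed placeholders).
affected = client.service.getAffectedServices(
    "GenericCommunicationsElement",
    "TEST-CD-02-object-id",
    session.sessionId,
)

for service in affected:
    print(service)   # eg "2052-ABC_LR_97.3FM-BSO"
```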

Data management via queries of the Graph Database

This inventory tool uses a Neo4j graph database. Using Neo4j Browser, we can connect to the database and issue Cypher queries (Cypher is analogous to SQL – a query language that allows you to read/write data from/to the database).

The screenshot below shows the constellation of linked data returned after issuing the cypher query (MATCH (n:InventoryObjects…. etc)). The data can also be exported in other formats, not just the graphical form shown here. 
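The same queries can be run programmatically. Below is a minimal sketch using the official Neo4j Python driver; the connection details are placeholders, and the node label and property names are based on what appears in the screenshot, so treat them as assumptions about Kuwaiba’s internal schema.

```python
# Minimal sketch: run a Cypher query against the inventory graph database.
# URI, credentials, node label and property names are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (n:InventoryObjects)
WHERE n.name CONTAINS $fragment
RETURN n.name AS name
LIMIT 25
"""

with driver.session() as session:
    for record in session.run(query, fragment="2052"):
        print(record["name"])

driver.close()
```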

I hope you enjoyed the brief introduction to the Inventory module of our Personal OSS Sandpit Project. Click on the link to step back to the parent page and see what other modules and/or use-cases are available for review.

Building a Personal OSS Sandpit

Being Passionate About OSS, I’ve used this blog / site to share this passion with the OSS community. The aim is to evangelise and make operational support tools even more impactful than they already are.

But there’s a big stumbling block – the barrier to entry into the OSS industry is huge. The barrier manifests in the following ways:

  • Opportunities – due to the breadth of knowledge required to be proficient, there aren’t many entry-level, OSS-related roles
  • Knowledge / Information – such is the diversity of knowledge, no single person has expertise across all facets of OSS – facets such as software, large-scale networks, IT infrastructure, business processes, project implementation, etc, etc. Even within OSS, there’s too huge a range of functional capability for anyone to know it all. Similarly, there’s no single repository of information, although organisations like TM Forum do a great job at sharing knowledge. On top of all that, the tech-centric worlds that OSS operate within are constantly evolving and proliferating
  • Access to Tools – OSS / BSS tools tend to be highly flexible, covering all aspects of a telco’s business operations (from sales to design to operations to build). OSS tend to cost a lot and take a long time to build / configure. That means that unless you already work for a telco or an OSS product vendor, you may struggle to get hands-on experience using the tools

It’s long been an ambition to help reduce the third barrier to entry by making personal OSS sandpit environments accessible to anyone with the time and interest.

The plan is to build a step-by-step guide that allows anyone to build their own small-scale OSS sandpit and try out realistic use-cases. The aim is to keep costs to almost nothing to ensure nobody is limited from tackling the project/s.

The building blocks of the sandpit are to be open-source and ideally reflect cutting-edge technologies / architectures. The main building blocks are:

  1. Network – a simulated, multi-domain network that can be configured and tested
  2. Fulfilment / BSS – the ability to create product offerings, take customer orders for those products, then implement as services into a network
  3. Assurance / Real-time – to perform (near) real-time monitoring of the network and services using alarms / performance / telemetry / logging
  4. Resource / Inventory – to design and store records of multi-domain networks that span PNI (Physical Network Inventory), LNI (Logical Network Inventory), OSP (Outside Plant), ISP (Inside Plant) and more
  5. Data Visualisation & Management – being able to interact with data generated via the abovementioned building blocks. Interact via search / queries, reports, dashboards, analytics, APIs and other forms of data import / export

OSS Sandpit Concept Diagram

Until recently, this sandpit has only been an ambition. But I’m pleased to say that some of these building blocks are starting to take shape. I’ll share more details in coming days and update this page.

This includes:

  • Introduction to The Resource / Inventory Module with the following use-cases:
    • Building Reference Data like location hierarchies, device types, connectivity types, containment, device layouts, templates, flexible data models, etc
    • Creating Device Instances including rack views
    • Creating Physical Connections between devices
    • Creating Logical Connections between devices
    • Creating Services and their relationships with resources / inventory
    • Creating Outside Plant Views on geo-maps that include buildings, pits, splice cases, cable management, splicing, towers, antenna, end-to-end L1 circuits
    • Assigning IP Addresses and subnets with an IPAM tool
    • Creating an MPLS network
    • Creating an SDH network
    • Data import / export / updates via APIs including Service Impact Analysis (SIA)
    • Data import / export / updates via a Graph Database Query Language
  • Designing the Inventory / Resources of a 5G Network including:
    • Hosting infrastructure
    • NFVI / VIM
    • A 5GCN (5G Core Network)
    • An IMS (IP Multimedia Subsystem)
    • An RIC (RAN Intelligent Controller)
    • Virtualised Network Functions (AUSF, AMF, NRF, CU, DU, etc)
    • Mobile Edge Compute (MEC)
    • MEC Applications like gaming servers, CDN (Content Delivery Networks)
    • Radio Access Network (RAN) and Remote Radio Units (RRU)
    • Outside Plant for fibre fronthaul and backhaul
    • Patching between physical infrastructure
    • End to end circuits between DN (Data Network), IMS, 5GCN, gNodeB, RRU
    • Logical Modelling of 5G Reference Points

More details and use-cases to come, including:

  1. Inventory modelling of:
    1. Satellite services
    2. Fixed wireless and Rural ISP network models
    3. Internet of Things (IoT)
    4. GPON / FTTH
    5. HFC / CableCo
    6. Data Centre (DC)
    7. MPLS
    8. SDH Transmission
    9. More network virtualisation (SDN), in addition to the virtualisation scenarios covered in the 5G prototype (see above)
    10. Are there any other scenarios you’d like to see???

 

If you’d like to know more about our Personal OSS Sandpit Project, fill in the contact form below.

How fragmentation is harming the OSS/BSS industry

Our Blue Book OSS/BSS Vendors Directory provides a list of over 400 vendors. That alone shows it’s a highly fragmented market. This amount of fragmentation hurts the industry in many ways, including:

  • Duplication – Let’s say 100 of the 400 vendors offer alarm / fault management capabilities. That means there are 100 teams duplicating effort in creating similar functionality. Isn’t that re-inventing the wheel, again and again? Wouldn’t the effort be better spent developing new / improved functionality, rather than repeating what’s already available (more or less)? And it’s not just coding, but testing, training, etc. The talent pool overlaps on duplicated work at the expense of taking us faster into an innovative future
  • Profit-share – The collective revenues of all those vendors need to be spread across many investors. Consolidated profits would most likely lead to more coordination of innovation (and less of the duplication above). And just think how much capability has been lost in tools developed by companies that are no longer commercially viable
  • Overhead – Closely related is that every one of these organisations has an overhead that supports the real work (ie product development, project implementation, etc). Consolidation would bring greater economies of scale
  • Consistency – With 400+ vendors, there are 400+ visions / designs / architectures / approaches. This means the cost and complexity of integration hurts us and our customers. The number of variants makes it impossible for everything to easily bolt together – notwithstanding the wonderful alignment mechanisms that TM Forum, MEF, etc create (via their products Frameworx, OpenAPIs, MEF 3.0 Framework, LSO, etc). At the end of the day, they create recommendations that vendors can interpret as they see fit. It seems the integration points are proliferating rather than consolidating
  • Repeatability and Quality – Repeatability tends to provide a platform for continual improvement. If you do something repeatedly, you have more opportunities to refine. Unfortunately, the bespoke nature of OSS/BSS implementations (and products) means there’s not a lot of repeatability. Linus’s Law of OSS defects also applies, with eyeballs spread across many code-bases. And the spread of our variant trees means that we can never have sufficient test coverage, meaning higher end-to-end failure / fall-out rates than should be acceptable 
  • Portability – Because each product and implementation is so different, it can be difficult to readily transfer skills between organisations. An immensely knowledgeable, talented and valuable OSS expert at one organisation will likely still need to do an apprenticeship period at a new organisation before becoming nearly as valuable
  • Analysis Paralysis – If you’re looking for a new vendor / product, you generally need to consider dozens of alternatives. And it’s not like the decisions are easy. Each vendor provides a different set of functionality, pros and cons. It’s never a simple “apples-for-apples” comparison (although we at PAOSS have refined ways to make comparisons simpler). It’s certainly not like a cola-lover having to choose between Coke and Pepsi. The cost and ramifications of an OSS/BSS procurement decision are obviously far more significant too
  • Requirement Spread – Because there are so many vendors with so many niches and such a willingness to customise, our customers tend to have few constraints when compiling a list of requirements for their OSS/BSS. As described in the Lego analogy, reducing the number of building blocks, perhaps counter-intuitively, can actually enhance creativity and productivity
  • Shared Insight – Our OSS/BSS collect eye-watering amounts of data. However, every data set is unique – collection / ETL approach, network under management, product offerings, even naming conventions, etc. This makes it challenging to truly benchmark between organisations, or share insights, or share seeded data for cognitive tools

However, I’m very cognisant that OSS come in all shapes and sizes. They all have nuanced requirements and need unique consideration. Yet many of our customers stand on a burning platform and desperately need us to create better outcomes for them.

From the points listed above, the industry is calling out for consolidation – especially in the foundational functionality that is so heavily duplicated – inventory / resource management, alarms, performance, workflows, service ordering, provisioning, security, infrastructure scalability, APIs, etc, etc. 

If we had a consistent foundation for all to work on, we could then more easily take the industry forward. It becomes a platform for continuous improvement of core functionality, whilst allowing more widespread customisation / innovation at its edges.

But who could provide such a platform and lead its over-arching vision?

  • I don’t think it can be a traditional vendor. Despite there being 400+ vendors, I’m not aware of any that cover the entire scope of TM Forum’s TAM map. Nor do any hold enough market share currently to try to commandeer the foundational platform
  • TM Forum wouldn’t want to compromise their subscriber base by creating something that overlaps with existing offerings
  • Solution Integrators often perform a similar role today, combining a multitude of different OSS/BSS offerings on behalf of their customers. But none have a core of foundational products that they’ve rolled out to enough customers to achieve critical mass
  • I like the concept of what ONAP is trying to do to rally global carriers to a common cause. However, its size and complexity also worries me. That it’s a core component of the Linux Foundation (LF Networking) gives it more chance of creating a core foundation via collaborative means rather than dictatorial ones

We’d love to hear your thoughts. Is fragmentation a good thing or a bad thing? Do you suggest a better way? Leave us a message below.

A new revenue line just waiting for OSS/BSS to grab

I’m assuming that if you’re reading this blog, chances are you’re already an OSS/BSS expert, or spend a lot of your working life thinking about them. Perhaps you do more than think about them and actually help to implement them in some way. Perhaps you don’t implement them yet, but have been tasked with understanding them in more detail.

During those activities do you spend much time thinking about the end user (EU)?

But I guess I should first start by classifying who I think our EUs actually are. They can fall into a few different categories:

  • Internal EUs (IEUs) – If you’re developing an OSS/BSS in-house, then you might be providing a product / service to your colleagues in network operations, IT, sales, etc that helps them do their job
  • Client EUs (CEUs) – Similar to IEUs, but you’re providing a product / service to a client’s team in network operations, IT, sales, etc. This generally implies that you’re an OSS/BSS vendor or integrator that is supplying a solution to a client like a telco, utility or enterprise customer. Your solution helps them to provide a network-related service
  • Ultimate End Users (UEUs) – These are the people who consume network services. The IEUs and CEUs simply provide support to the UEUs to ensure their network and related services are all operational and usable.

My gut feel is that we don’t tend to think about UEUs very often. After all, I sometimes wonder whether our clients (eg telcos) only think about them in terms of how to avoid them.

The UEU calls the help line because they want someone to help them. But that’s expensive for our clients (eg telcos), so they’d rather:

  • IVR them or
  • Chat-bot them or
  • Point them to canned URLs / FAQs or
  • App them

“When you are the upset customer, you want a full, uninterrupted hearing, you want to deal with someone with the authority to fix the problem and you want a fair resolution. You don’t want to be sent a copy of the company’s warranty policy.”
Jeffrey J. Fox.

The “typical” approach to handling upset customers (UEUs) for our telco clients is akin to sending a copy of the company’s warranty policy.

But the telcos are in a bit of a catch-22 situation. They want to keep customers happy (ie they don’t want churn). But they generally don’t have the tools that give sufficient insights about why the customer is upset. An expensive call centre operator (CEU) often adds little extra value than the cheap customer avoidance channels dot-pointed above. So customer avoidance mechanisms are invested in.

For example, NOC operators (CEUs) can see logs, alarms, performance within their network, but that doesn’t often directly translate into a user’s experience. It certainly doesn’t translate the network telemetry into words / actions that a contact centre operative (CEU) can clearly communicate with UEUs (especially non-tech-savvy UEUs).

Personally, I think this is a massive opportunity for OSS/BSS developers. If we could create the tools that:

  1. Could reliably interpret real UEU experiences (ie health of the service as experienced by the UEU, not reverse-engineering nodal info and guessing what the experience might be)
  2. Diagnose degradations / failures / fallouts / problems in UEU services
  3. Translate the diagnosis into actions / recommendations – either for the UEU to perform as self-help, or for CEU / backend-systems to repair
  4. Inform UEU of what’s happening and when the problem will be solved
  5. If the customer calls before a proactive notification can be sent, ensure the collective telemetry insights / recommendations are available for contact centre operators (CEU) to help resolve rather than deflect the problem

… then there would be less incentive for building the annoying customer avoidance mechanisms. If we can do that, OSS/BSS would surely pick up more of the investment that’s currently carved out for chat-bots, IVR, etc. It’s a new/improved revenue line, but only because it better helps the people further down the line who ultimately pay our bills.

OSS/BSS Testing – The importance of test data

Today’s post is the third in a series about OSS/BSS testing (part 1, part 2).

Many people think about OSS/BSS testing in terms of application functionality and non-functional requirements. They also think about entry criteria / pre-requisites such as the environments, application builds / releases, test case development and maybe even the integrations required.

However, an often overlooked aspect of OSS/BSS functionality testing is the data that is required to underpin the tests.

Factors to be considered will include:

  • Programmatically collectable data – this refers to data that can be collected from data sources. Great examples are near-real-time alarm and performance data that can be collected by connecting to the network, either to real devices and NMS or simulators
  • Manually created or migrated data – this refers to data that is seeded into the OSS/BSS database. This could be manually created or script-loaded / migrated. Common examples of this are inventory data, especially information about passive infrastructure like buildings, cables, patch-panels, racks, etc. In some cases, even data that can be collected via a programmatic interface still needs to be augmented with manually created or migrated data
  • Base data – for consistency of test results, there usually needs to be a consistent data baseline to underpin the tests.
  • Reversion to baseline – If/when there are test cases that modify the base data set (eg provisioning a new service that reserves resources such as ports on a device), there may need to be a method of roll-back (or reinstatement) to base state. In other cases, a series of tests may require a cascading series of dependent changes (eg reserving a port before activating a service). These examples may need to be rolled-back as a sequence
  • Automated / Regression testing – if automation and/or regression testing is to be performed, automated reversion to consistent base data will also be required. Otherwise discrepancies will appear between automated test cycles
  • Migration of base data – for consistency of results between phases, there also needs to be consistency of data across the different environments on which testing is performed. This may require migration of base data between environments (see yesterday’s post about transitions between environments)
  • Multi-homing of data – particularly for real-time data, sources such as devices / NMS may issue data to more than one destination (eg different environments).
  • Reconciliation / Equivalency testing – When multi-homing of data sources is possible, or when Current and Future PROD are running in parallel, comparing the processed data in the destination systems allows for equivalency testing (eg between current and future state mediation devices / probes / MDDs). Transition planning will also be important here as we plan to migrate from Current PROD to Future PROD environments
  • Data Migration Testing – this is testing to confirm the end-to-end validity and completeness of migrated data sets
  • PROD data cuts – once a system is cut over to production status, it’s common to take copies of PROD data and load it onto lower environments. However, care should be taken that the PROD data doesn’t overwrite specially crafted base data (see “base data” dot-point above)

 

OSS/BSS Testing – Transitions

One of the most vital, but underestimated, aspects of OSS/BSS project implementation is ensuring momentum is maintained. These large and complex projects are prone to stagnating at different stages, which can introduce pressure onto the implementation team.

As mentioned in yesterday’s post, the first in this week’s series, the test strategy and scheduling is regularly overlooked as a means of maintaining OSS project momentum. More specifically, careful planning of transitions between test phases and the environments that they’re run on, can demonstrate progress – where progress is seen through the introduction of business value.

The following diagram provides a highly stylised indicative timeline (x-axis) of activities, showing how to leverage multiple different environments (y-axis). See here for examples of additional environments that you may have on your project.

You’ll notice that this diagram covers:

  • Environments
  • Test phases (eg FAT, SAT, SIT, DMT, NFT, UAT – see descriptions of these test phases here)
  • Build phases (build, configure, integrate)
  • Data loads (reference data, symbolic data and/or real data extracts)
  • How builds and data loads can be cascaded between environments to reduce duplicated effort

OSS Phasing - Testing, Environments, Data

Some other important call-outs from this stylised diagram include:

  • The Builds and Data Loads cascade from lower environments (eg from PROD-SUPPORT to PRE-PROD). Thought needs to be given as to which builds and data sets need to be cascaded from which environments
  • However, after PRE-PROD is handed over and becomes PROD, it is common for cuts of production data to be regularly loaded back into the lower environments so that they are PROD-like
  • Stand-up of new PROD environments is often a long lead-time item (because of size and complexity such as resilience architectures, security, etc), compared to lower environments. Environments such as DEV/TEST could be as easy to stand up as creating new Virtual Machine(s) on existing hosting
  • Different environments may have access to different data sources / integrations. For example, the lower environments may be connected to lab versions of devices and NMS/EMS. Alternatively, the network might be mimicked by simulators in non-PROD environments. Other integrations could be for active directory (AD), environment logging, patch management, etc. Sometimes production data sources can be connected to non-PROD OSS environments, but this is not so common
  • The item marked as Base-build on the PRE-PROD / PROD environment reflects the initial build and configuration of virtualisation, databases, storage, management networking, resilience / failover mechanisms, backup/restore, logging, security hardening and much more
  • Careful transition planning needs to go into PROD cutover. In the sample diagram, a final cut of data comes from the Current PROD environment to PRE-PROD before it becomes the Future PROD environment during official handover
  • You’ll notice though that there may still be a period of overlap between cutover from Current PROD to Future PROD. This is because there needs to be staged data source cutover in cases where data sources like network devices can’t multi-home data feeds to both environments in parallel.

This earlier post provides some insights into novel ways to slice and dice your OSS implementation by planning regular drops and consistent releases of business value.

OSS/BSS Testing – the V-Model

On major software projects like the OSS you’re building, testing is an important phase of course. You’ll have undoubtedly incorporated testing into your planning. After all, testing is a key component of any Software Development Life Cycle (SDLC). There are various SDLC models / methodologies such as Waterfall, V-Model, Agile and others that you can consider.

Unfortunately, most OSS project teams tend to underestimate the testing phase, thinking it can just fit in around other major activities towards the end of the implementation. Experienced testers will suggest that they should be involved right from the requirement capture phase, because they’ll have to design test cases to prove that each requirement is met.

More importantly, your test strategy and test phase transitioning can play a major part in maintaining momentum through a project’s delivery phase. We’ll look into a number of related details in a series of posts this week.

Today we’ll look at the V-Model. It can be a helpful model for mapping requirements to test phases / cases. The diagram below, which comes from my book, Mastering your OSS, shows a simplified, sample version of the V-Model. It highlights the relationship between key test artefacts (eg plans / designs / specifications / requirements) on the left with the corresponding test phases on the right.

V-Model Testing

Your documentation and test phases will probably differ. You can find a discussion about some other possible OSS test phases here.

We’ll take a closer look tomorrow at how your different test phases could map to the OSS environments you might have available.

Getting confused by key Assurance metrics?

Are you a bit slow like me and sometimes have to stop and think to differentiate your key assurance metrics like your MTTRs from your MTBFs?

If so, I thought this useful diagram from researchgate.net might help

The metrics are:

MTBF (Mean Time Between Failures) – the average elapsed time between failures of a system, service or device. It’s the basic measure of availability / reliability of the system / service / device. The higher, the better.

MTTR (Mean Time to Repair) – generally used to denote the average time to close a trouble ticket (to repair a failed system / service / device). It’s the basic measure of corrective action efficiency. The lower, the better.

Some also use MTTR as a Mean Time to Recover / Resolve (ie MTTD + MTTR in the diagram above) or Mean Time to Respond (MTTD in the diagram above to acknowledge an event and create a ticket). See why I get confused?

MTTD (Mean Time to Detect / Diagnose) – the average time taken from when an event is first generated and timestamped to when the NOC detects / diagnoses the cause and generates a ticket. The lower, the better.

MTTF (Mean Time to Failure) – the average system / service / device up-time
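If it helps to anchor the definitions above, here’s a small worked example with invented figures, showing how the metrics relate for a repairable system (where MTBF spans a full failure-to-failure cycle, ie uptime plus downtime, and availability falls out of the same numbers).

```python
# Worked example with invented figures for a single device over ~30 days.
# Three failures; all durations in hours.
uptime_hours = [200.0, 240.0, 250.0]   # time in service before each failure
detect_hours = [0.5, 1.0, 0.5]         # event -> ticket raised
repair_hours = [3.5, 5.0, 3.0]         # ticket raised -> service restored

failures = len(uptime_hours)

mttf = sum(uptime_hours) / failures    # mean time to failure (average up-time)
mttd = sum(detect_hours) / failures    # mean time to detect
mttr = sum(repair_hours) / failures    # mean time to repair (ticket -> restored)
mean_time_to_resolve = mttd + mttr     # the 'recover / resolve' variant of MTTR
mtbf = mttf + mean_time_to_resolve     # full failure-to-failure cycle
availability = mttf / mtbf

print(f"MTTF = {mttf:.1f} h, MTTD = {mttd:.2f} h, MTTR = {mttr:.2f} h")
print(f"MTBF = {mtbf:.1f} h, Availability = {availability:.2%}")
# MTTF = 230.0 h, MTTD = 0.67 h, MTTR = 3.83 h
# MTBF = 234.5 h, Availability = 98.08%
```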

An OSS Security Summary

Our OSS / BSS manage some of the world’s most vital comms infrastructure don’t they? That makes them pretty important assets to protect from cyber-intrusion. Therefore security is a key, but often underestimated, component of any OSS / BSS project.

Let me start by saying I’m no security expert. However, I have worked with quite a few experts tasked with securing my OSS projects and picked up a few ideas along the way. I’ll share a few of those ideas in today’s post.

We look at:

  1. Security Trust Zones / Realms
  2. Restricting Access to OSS / BSS systems and data
  3. OSS / BSS Data Security
  4. Real-time Security Logging / Monitoring
  5. Patch Management
  6. Security Testing / Hardening
  7. Useful Security Standards

 

1. Security Trust Zones / Realms

For me, security starts with how you segment and segregate your network and related systems. The aim of segmentation / segregation is to restrict malicious access to sensitive data / systems. The diagram below shows a highly simplified three-realm design, starting at the bottom:

  1. The operator’s Active Network realm – the network that carries live customer traffic and is managed by the CSP / operator [Noting though, that these are possibly managed as virtual and/or leased entities rather than owned]. It comprises the routers, switches, muxes, etc that make up the network. As such, this zone needs to be highly secure. Customers connect to the Active Network at the edge of the organisation’s network, often via CPE (Customer Premises Equipment), NTU (Network Termination Units) or similar. Dedicated Network Operation Centre (NOC) operator terminals tend to connect inside the Active Network
  2. The operator’s Corporate / Enterprise realm – the network that houses the organisation’s corporate IT assets. This is where most corporate staff engage with core business services like desktop tools and so much more. If network operations staff need to connect to the Corporate / Enterprise realm but also reach into the Active Network realm, then an air-gap is usually established by the SCP between the two. This is bridged through technologies like Citrix, RDP (Remote Desktop Protocol) or similar
  3. The Cloud / Internet realm –  the external networks / infrastructure utilised by the organisation that are outside the organisation’s direct control. This includes Internet services, which many corporate users rely on of course. However, it may also include some important components of your OSS/BSS stack if provided as public cloud services, an increasingly common software supply model these days
  4. You’ll also notice the all-important Security Control Points (SCP) like firewalls that provide segregation between the zones

OSS BSS Cloud Security Control Points

In all likelihood, your security trust model will contain more than three zones, but these should be the absolute minimum.

The Active Network should be segregated from the Corporate / Enterprise network so that it can continue to provide service to customers even if the connection between them is lost (or intentionally severed if a security breach is identified).

This is where things get interesting. The Active Network and our Network Management stack rely on Shared Services such as DNS (Domain Naming System), NTP (Network Time Protocol), Identity / Access Management, Anti-Virus and more. These tend to be housed in Corporate / Enterprise realms. If we want the Active Network to be able to operate in complete standalone mode then we need to provide special consideration to the shared services architectures. 

Aside: Traditionally, we’ve focused on perimeter defense and authenticated users are granted authorised access to a broad collection of resources. We now see the trend towards more remote users and cloud-based assets outside the enterprise-owned boundary in our OSS architectures. There’s currently debate around whether zero-trust architectures are required to segment more holistically – to restrict lateral movement within a network, assuming an attacker is already present on the network.
The NIST ZTA draft discusses this emerging approach in more detail

Once we have the security trust zones identified, we now have to determine where our OSS / BSS / management stack resides within the zones. If we use the layers of the TMN Pyramid as a guide:

  • The Network Element Layer (NEL) is the heart of the Active Network
  • The EMS / NMS (Element / Network Management Systems) will also usually reside within the Active Network
  • The OSS / BSS are interesting. They have to interface with the network and EMS / NMS. But they also usually have to interface with corporate systems like data warehouses, reporting tools, etc. They’re so critical to managing the Active Network, they need to be highly secure. That means they could be placed inside the Active Network realm or even have their own special Central Management realm. In other cases, different components of the OSS / BSS might be spread across different realms.

OSS abstract and connect

Note that we also have to consider the systems (eg user portals, asset management systems, etc, etc) that our OSS / BSS need to interface to and where they reside in the trust model.

2. Restricting Access to OSS / BSS systems and data

We want to uniquely control who has access to what systems and data using our OSS / BSS stack.

The Security Trust model also impacts the architectures of Identity Management (Directory Services like Active Directory), User Access Management (UAM) and Privileged Access Management (PAM) solutions and how they control access to our OSS / BSS

They serve three purposes:

  • To provide fine-grained management of access to privileged / restricted data and systems within our OSS / BSS
  • To simplify the administrative overhead of managing user access to our OSS / BSS by defining group-based user access policies
  • To log the activities of individual users whilst they use the OSS/BSS and related systems / networks

Most OSS / BSS allow user authentication via Directory Services these days. Most, but not all, also allow roles / privileges to be assigned via Directory Services. For example, RBAC (Role Based Access Control) is a policy model defined by our OSS / BSS applications. It controls what functions users / groups can perform via permission management. For central user administration purposes, it’s ideal that the Directory Service can pass role-based information to our OSS / BSS.
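As a purely illustrative sketch of that group-to-role idea (none of the group, role or permission names below come from a real product), the mapping and access check might look something like this:

```python
# Illustrative RBAC mapping: directory groups -> OSS/BSS roles -> permissions.
# All group, role and permission names are invented for this example.
GROUP_TO_ROLE = {
    "CN=OSS-NOC-Operators": "noc_operator",
    "CN=OSS-Designers":     "network_designer",
    "CN=OSS-ReadOnly":      "viewer",
}

ROLE_PERMISSIONS = {
    "noc_operator":     {"view_alarms", "ack_alarms", "raise_ticket"},
    "network_designer": {"view_inventory", "edit_inventory", "create_service"},
    "viewer":           {"view_alarms", "view_inventory"},
}

def permissions_for(directory_groups: list[str]) -> set[str]:
    """Resolve a user's effective OSS/BSS permissions from their AD groups."""
    perms: set[str] = set()
    for group in directory_groups:
        role = GROUP_TO_ROLE.get(group)
        if role:
            perms |= ROLE_PERMISSIONS[role]
    return perms

user_groups = ["CN=OSS-NOC-Operators", "CN=Corp-All-Staff"]
print("edit_inventory" in permissions_for(user_groups))   # False - NOC can't edit
```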

3. OSS / BSS Data Security

The first step in the data security process is to identify categories of data such as unclassified, confidential, secret, etc.

We then need to consider what security mechanisms need to be applied to each category. There are four main OSS / BSS data security considerations:

  1. Data Anonymisation / Privacy – is the process of removing / redacting / encrypting personally identifiable information from the data sets stored in our OSS / BSS (particularly the latter). Our solutions need to store personal data such as names, addresses, contact details, billing details, etc. We can use techniques to control the pervasiveness of access to that data. For example, we may use a tightly restricted system to store personal details as well as a non-identifiable code (eg LocationID or ServiceID) for use by our other more widely accessed tools (eg PNI / LNI) – see the sketch after this list
  2. Encryption of data at rest – is the process of encrypting the large stores of data used by our OSS / BSS, whether a local database used by each application or in centralised data warehouses
  3. Encryption of data in transit – is the process of encrypting data as it transits between components within your OSS/BSS stack (and possibly beyond). Techniques such as VPNs and IPSec protocols can be used. As we increasingly see OSS / BSS built as web-based applications, we’re using encrypted connections (eg HTTPS, SSL, TLS, etc) to protect our data
  4. Physical security – is the process of restricting physical access to data stores (eg locked cabinets, facilities access management, etc). This isn’t always within our control as an OSS / BSS project team.
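The sketch below shows one way the anonymisation idea from point 1 could work: personal details stay in a restricted store, while the widely accessed tools only ever see a keyed, non-reversible token. The key handling and token format are assumptions for illustration, not a recommended production scheme.

```python
# Illustrative pseudonymisation: widely accessed OSS tools only ever see a
# keyed token, never the customer's personal details. Key management here
# is deliberately simplified - don't hard-code secrets in real systems.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"

def service_token(customer_id: str) -> str:
    """Derive a stable, non-identifiable code (eg a ServiceID-style reference)."""
    digest = hmac.new(SECRET_KEY, customer_id.encode(), hashlib.sha256)
    return "SVC-" + digest.hexdigest()[:12].upper()

# The restricted store keeps the mapping of token -> personal details;
# inventory / assurance tools only carry the token.
restricted_store = {
    service_token("CUST-000123"): {"name": "Jane Citizen", "phone": "+61 ..."},
}

print(service_token("CUST-000123"))   # same input always yields the same token
```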

 

4. Real-time Security Logging / Monitoring

Ensure all systems in the management stack (OSS, BSS, NMS, EMS, the network, out-of-band management, etc) are logging to a central SIEM (Security Information and Event Management) tool. Oh, and don’t do what I saw one big bank do – they had so many hits occurring just on their IPS / IDS tool that they just left it sitting in the corner unmonitored and in the too-hard basket. By having the tools, they’d ticked their compliance box, but there was no checkbox asking them to actually look at the results or respond to the incidents identified!!

 

5. Patch Management

Software patch management is theoretically one of the simplest security management techniques to implement. It ensures you have the latest, hopefully most secure, version of all software.

OSS / BSS / Management stacks tend to have many, many different components. Not just at the obvious application level, but operating systems, third-party software (eg runtime environments, databases, application servers, message buses, antivirus software, syslog, etc). 

Patch management is often well maintained by IT teams within the Corporate / Enterprise trust zone discussed above. They have access to the Internet to download patches and tools to help push updates out. However, the Active Network zone shouldn’t have direct access to the Internet, so routine patch management could be easily overlooked and/or difficult to implement. Sometimes the software components reside on servers that are rarely logged into and patches can be easily overlooked.

The other problem is that OSS / BSS applications are often heavily customised, making it hard to follow a standard upgrade path. I’ve seen OSS / BSS that haven’t been patched for years, even with something as simple as Java runtime environments, because patching causes the OSS / BSS to fail.

 

6. Security Testing / Hardening

Your organisation probably already has standards and checklists in place to ensure that all of your IT assets are as secure as possible. Your OSS / BSS environments are just one of those assets. However, as the “manager of managers” of your Active Network, the OSS / BSS is probably more important to secure than most.  

Your organisation might also insist that all applications, including the OSS / BSS, are built on a hardened Standard Operating Environment (SOE). However, some suppliers provide OSS / BSS as appliances, built on their own environments. These then have to go through a hardening process in alignment with your corporate IT standards.

If using a vendor-supplied off-the-shelf application, it will be quite common for it to have a default admin account on the application and database. This makes it easier for the system implementation team to navigate their way around the solution when building it. However, one of the first steps in a hardening process is to rename or disable these built-in accounts.

As “manager of managers,” your OSS / BSS‘s primary purpose is to collect (or request) information from a variety of sources. Some of these sources reside in the Active Network. Others reside in the Corporate Network or elsewhere. As such, careful consideration needs to be given to what Ports / Protocols are allowed. Some systems will come pre-configured with default / open settings. However, these should be restricted to necessary protocols only, including SNMP, HTTPS, SSH, FTPS and/or similar.

Speaking of SNMP, its original design was inherently insecure as it uses a primitive method of authentication. It uses clear-text community strings to secure access to the management plane. Only version 3 of SNMP (ie SNMPv3) has the ability to authenticate and encrypt payloads, so this should be used wherever possible. Some of you may have legacy device types that precede SNMPv3 though. Alert TA17-156A provides suggestions to minimise exposure to SNMP abuse.
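To make the SNMPv3 point concrete, the sketch below polls a device with authentication and privacy enabled, using the pysnmp library’s high-level API (pysnmp 4.x style). The device address, user name and keys are placeholders, and your devices may require different auth/priv protocols than the SHA/AES pair assumed here.

```python
# Sketch: SNMPv3 GET with authentication (SHA) and privacy (AES-128) enabled,
# instead of SNMPv1/v2c clear-text community strings. Placeholders throughout.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, UsmUserData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, usmHMACSHAAuthProtocol, usmAesCfb128Protocol,
)

error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        UsmUserData(
            "oss-monitor",                      # SNMPv3 user (placeholder)
            authKey="auth-passphrase",          # authentication passphrase
            privKey="priv-passphrase",          # privacy (encryption) passphrase
            authProtocol=usmHMACSHAAuthProtocol,
            privProtocol=usmAesCfb128Protocol,
        ),
        UdpTransportTarget(("192.0.2.10", 161)),  # device address (placeholder)
        ContextData(),
        ObjectType(ObjectIdentity("SNMPv2-MIB", "sysDescr", 0)),
    )
)

if error_indication:
    print(error_indication)
else:
    for name, value in var_binds:
        print(f"{name} = {value}")
```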

Also consider the environment on which you’re performing your security testing. As described in this post about OSS / BSS environments and test transitions, you’ll probably have multiple environments – PROD environments that are connected to the live Active Network devices and non-PROD environments that are connected to test lab devices and/or simulators. Where should you perform your penetration / security testing? Probably not on PROD, because you want to ensure the solution is already secure before letting it loose into Production. But you also want to ensure it’s the most PROD-like as possible. You could possibly use PRE-PROD (ie a state before a solution is cut-over to PROD), before it’s fully connected to the Active Network. Or, you could use the most PROD-like lower environment (eg Staging).

One other thing when conducting security tests and hardening – penetration testing often breaks things by injecting malicious code / data. Ensure you take a backup of any environment so you can roll-back to a working state after conducting your pen-tests.

 

7. Useful Security Standards

The following is a list of security standards that I’ve used in the past:

As I mentioned at the start, I’m far from being an expert in the field of network or data security. I’d love to get your feedback if I’m missing anything important!!

The common data store trend

Some time back, we discussed  A modern twist on OSS architecture that is underpinned by a common data model.
 
Time to discuss this a little more visually.
 
As the blue boxes on the left side of the diagram below show, you may have many different data sources (some master, some slaved). You may have a single OSS tool (monolithic solution) or you may have many OSS tools (best-of-breed approach).
 
You may have multiple BSS, NMS and even direct connections to network devices. You may even have other sources of data that you’ve never used before such as weather patterns, lightning strikes, asset management prediction modelling, SCADA data, HVAC data, building access / security events, etc, etc.
 
The common data model allows you to aggregate those sets to provide insights that have never been readily accessible to you previously.
 
So let’s look at a few key points
  1. Existing network layer systems (eg NMS, NE and their mediation devices) are currently sucking (near)real-time (ie alarm and perf) data out of the network and feeding it to an OSS directly. They may also be pushing inventory discovery data to the OSS, although that’s probably loaded less frequently (typically once daily).
  2. The common data model provides a few options for data flows: 
    1. If the data store is performant enough, the network layer could feed real-time data to the data store, which then forwards it on to the OSS
    2. Multi-home the data from the network to both the data store and the OSS simultaneously
    3. Feed data from the network to the OSS, which may (or may not) process it before pushing it to the data store
  3. Just a quick note regarding data flows: the network will tend to be the master for real-time / assurance flows. However, manual input tends to be the master for design/fulfil flows, so the OSS becomes the master of inventory data as per this link
  4. The question then becomes where the data enrichment happens (ie appending inventory-related data to alarms) to help with root-cause and service-impact calculations. Enrichment / correlation probably needs to happen in the OSS’s real-time engine, but it could source enrichment data directly from the network, from the OSS’s inventory, or from the common data store (a hedged enrichment sketch follows this list)
  5. If modern ETL tools (eg SNMP and syslog collectors, etc) allow you to do your own ETL to a common data store, a vendor OSS would only need one mediation device (ie to take data from the data store), rather than separate ones to pull from all the different NMS / EMS / NEs in your network. This has the potential to reduce mediation license costs from your OSS vendor
  6. Having said that, if you have difficult / proprietary interfaces that make it a challenge to do all of your own ETL, then it might be best to let your OSS vendor build your mediation / ETL engines
  7. The big benefit of the common data store is you can choose a best-of-breed approach but still have a common data model to build Business Intelligence queries and reports around
  8. The common data store also takes load off the production OSS application / data servers. Queries and reports can be run against the common data platform, freeing up CPU cycles on the OSS for faster user interactions
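To make point 4 a little more concrete, here’s a hedged sketch of alarm enrichment sourced from a common data store. The class, table and field names (inventory, circuit_to_services, circuit_id, etc) are hypothetical, purely to illustrate the pattern.

```python
# Hedged sketch: enriching raw alarms with inventory / service context pulled
# from a common data store, so root-cause and service-impact tools have more
# to work with. All table and field names here are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alarm:
    device: str
    port: str
    severity: str
    # Enrichment fields appended from the common data store
    site: Optional[str] = None
    circuit_id: Optional[str] = None
    impacted_services: List[str] = field(default_factory=list)


def enrich_alarm(alarm: Alarm, data_store: dict) -> Alarm:
    """Append inventory and service context to a raw alarm (point 4 above)."""
    inventory = data_store.get('inventory', {}).get((alarm.device, alarm.port), {})
    alarm.site = inventory.get('site')
    alarm.circuit_id = inventory.get('circuit_id')
    # Service-impact lookup: which services ride on the affected circuit?
    alarm.impacted_services = data_store.get('circuit_to_services', {}).get(
        alarm.circuit_id, []
    )
    return alarm


# In-memory stand-in for the common data store, for illustration only
data_store = {
    'inventory': {('gnb-001', 'eth0/1'): {'site': 'SYD-EXCH-01', 'circuit_id': 'CCT-1234'}},
    'circuit_to_services': {'CCT-1234': ['SLICE-URLLC-ACME', 'SLICE-eMBB-RETAIL']},
}
print(enrich_alarm(Alarm('gnb-001', 'eth0/1', 'MAJOR'), data_store))
```

The same lookup could just as easily be sourced from the OSS’s inventory or directly from the network, as per the options above.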

The Common Data Model is supported by a few key advancements:

  1. In the past, the mediation layer (ie getting data out of the network and into the OSS) was a challenge. Network operators didn’t tend to want to do this themselves. This introduced a dependency on software suppliers / integrators to build mediation devices and sell them to operators as part of their OSS/BSS solutions. But there’s been a proliferation of highly scalable ETL (Extract, Transform, Load) tools in recent years
  2. Many networks used to have proprietary interfaces that required significant expertise to integrate with. The increasing ubiquity of IP networking and common interfaces (eg SNMP and web interfaces like REST, JSON, SOAP, XML) to the network layer makes ETL simpler (see the sketch after this list)
  3. Massively scalable databases that don’t have as much dependency on relational integrity and can ingest data from myriad sources
  4. A proliferation of data visualisation tools that are user-friendly enough that you no longer have to be a coder capable of writing complex SQL queries
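As a simple illustration of point 2, here’s a hedged sketch of a DIY ETL step: extracting device records from an NMS’s REST interface and loading them into a common data store over HTTP. The endpoints, payload shapes and field names are hypothetical, not a real product API.

```python
# Hedged ETL sketch: extract from a (hypothetical) NMS REST API, apply a light
# transform, and load into a (hypothetical) common data store's HTTP bulk
# endpoint. Endpoints, credentials and field names are placeholders.
import requests

NMS_URL = 'https://nms.example.net/api/v1/devices'        # hypothetical endpoint
DATA_STORE_URL = 'https://datastore.example.net/ingest'   # hypothetical endpoint


def extract() -> list:
    """Pull raw device records from the NMS REST API."""
    response = requests.get(NMS_URL, timeout=30)
    response.raise_for_status()
    return response.json()


def transform(devices: list) -> list:
    """Normalise records into the common data model (field names illustrative)."""
    return [
        {
            'device_name': d.get('name'),
            'vendor': d.get('vendor'),
            'role': d.get('role', 'unknown'),
            'source_system': 'NMS-A',
        }
        for d in devices
    ]


def load(records: list) -> None:
    """Push the normalised records to the common data store's bulk endpoint."""
    response = requests.post(DATA_STORE_URL, json=records, timeout=30)
    response.raise_for_status()


if __name__ == '__main__':
    load(transform(extract()))
```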
 

Softwarisation of 5G

As you have undoubtedly noticed, 5G is generating quite a bit of buzz in telco and OSS circles.

For many it’s just an n+1 generation of mobile standards, where n is currently 4 (well, the number of recent introductions into the market means n is probably now getting closer to 5  🙂  ).

But 5G introduces some fairly big changes from an OSS perspective. As usual with network transformations / innovations, OSS/BSS are key to operationalising (ie monetising) the tech. This report from TM Forum suggests that more than 60% of revenues from 5G use-cases will be dependent on OSS/BSS transformation.

And this great image from the 5G PPP Architecture Working Group shows how the 5G architecture becomes a lot more software-driven than previous architectures. Interesting how all 5 “software dimensions” are the domain of our OSS/BSS, isn’t it? We could replace “5G architecture” with “OSS/BSS” in the diagram below and it wouldn’t feel out of place at all.

So, you may be wondering in what ways 5G will impact our OSS/BSS:

  • Network slicing – being able to carve up the network virtually, into slices that are functionally quite different, means operators will be able to offer tailored, premium services to different clients. This differs from the one-size-fits-all approach used previously. However, it also means OSS/BSS complexity increases; it’s almost like you need an OSS/BSS stack for each network slice (a minimal slice-record sketch follows this list). Unless we can create massive operational efficiencies through automation, the cost to run the network will increase significantly. Definitely a no-no for the execs!!
  • Fibre deeper – since 5G will introduce increased cell density in many locations, and offer high throughput services, we’ll need to push fibre deeper into the network to support all those nano-cells, pico-cells, etc. That means an increased reliance on good outside plant (PNI – Physical Network Inventory) and workforce management (WFM) tools
  • Software defined networks, virtualisation and virtual infrastructure management (VIM) – since the networks become a lot more software-centric, that means there are more layers (and complexity) to manage.
  • Mobile Edge Compute (MEC) and virtualisation – 5G will help to serve use-cases that may need more compute at the edge of the radio network (ie base stations and cell sites). This means more cross-domain orchestration for our OSS/BSS to coordinate
  • And other use-cases where OSS/BSS will contribute including:
    • Multi-tenancy to support new business models
    • Programmability of disparate networks to create a homogenised solution (access, aggregation, core, mobile edge, satellite, IoT, cloud, etc)
    • Self-healing automations
    • Energy efficiency optimisation
    • Monitoring end-user experience
    • Zero-touch administration aspirations
    • Drone survey and augmented reality asset management
    • etc, etc
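To ground the network slicing point above, here’s a minimal sketch of how a slice might be recorded in an OSS inventory, using the 3GPP S-NSSAI convention (SST 1 = eMBB, 2 = URLLC, 3 = MIoT). The class names and SLA attributes are illustrative assumptions only.

```python
# Minimal sketch of a network slice record as it might appear in an OSS inventory.
# SST values follow the 3GPP convention; everything else (class names, SLA
# attributes) is illustrative only.
from dataclasses import dataclass
from enum import IntEnum


class SST(IntEnum):
    """Slice/Service Types per 3GPP convention."""
    EMBB = 1    # enhanced Mobile Broadband
    URLLC = 2   # Ultra-Reliable Low-Latency Communications
    MIOT = 3    # Massive IoT


@dataclass(frozen=True)
class NetworkSlice:
    sst: SST
    sd: str                  # Slice Differentiator (6 hex digits)
    customer: str
    max_latency_ms: float    # illustrative SLA attributes from here down
    guaranteed_mbps: int
    availability_pct: float


slices = [
    NetworkSlice(SST.URLLC, '000001', 'Acme Robotics', 5.0, 50, 99.999),
    NetworkSlice(SST.EMBB, '0000A2', 'Retail Broadband', 50.0, 500, 99.9),
]
for s in slices:
    print(f'{s.customer}: SST={s.sst.name}, SD={s.sd}, latency <= {s.max_latency_ms} ms')
```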

Fun times ahead for OSS transformations! I just hope we can keep up and allow the operator market to get everything it wants / needs from the possibilities of 5G.

Hello Trouble!

Hello, Trouble.
It’s been a while since we last met.
But I know you’re still out there.
And I have a feeling you’re looking for me.
You wish I’d forget ya.. Don’t ya trouble?
Perhaps it is you, that has forgotten me.
Perhaps I need to come find you.
Remind you, who I am.

Sounds like an apt mindset for working in the OSS industry, doesn’t it?

 

Or for marketing knives.

I’ll be honest. I like the OSS perspective better!

 

In need of an OSS transformation translator

As OSS Architects, we have an array of elegant frameworks to call upon when designing our transformational journeys – from current state to a target state architecture.

For example, when providing data mapping, we have tools to prepare current and/or target-state data diagrams such as the following:

Source here.

These diagrams are really elegant and powerful for communicating with other data experts and delivery teams. It’s a data expert language.

Data experts are experts in the ETL (Extract, Transform, Load) process, but often have less expertise in the actual meaning and importance of the data sets themselves. For example, a data expert may know there’s a product offerings table and that each offering has 23 associated attributes (eg bandwidth, SLA class, etc) available. But they may have less understanding of the 245 product types housed in the product data table, and even less awareness of the meanings of the thousands of product attributes. You need to be a subject matter expert (SME) to understand that level of detail about the data. In some cases, the SME might come from your client and hold far more tribal knowledge than you do.

We often need other SMEs (the products expert in this case) to help us understand what has to happen with the data during transformation. What do we keep, what do we change, what do we discard, etc.

Just one problem – SMEs might not always speak the same language as the data experts.

As elegant as it is, the data relationships diagram above might not be the most intuitive format for product experts to review and comment on.

As with many aspects of Architecture and transformation, if we’re to be understood, it’s best to communicate in our audience’s language.

In this case, it might be best to show data mappings as overlays on screenshots that the Product owner is familiar with:

  • From
    • Their current GUI
    • Existing sales order forms
    • Current report templates
  • To
    • Their next-generation GUI
    • New order forms
    • Post-Transform report templates

Such an approach might not look elegant to our data expert colleagues. The question is whether it quickly makes enough sense to the SMEs for you to elicit concise responses from them.

The “right” approach is not always the most effective.

I’d love to hear your tips, tricks and recommendations for speaking / listening in the audience’s language.

Setting a challenge for clever OSS Architects

Back in the old days, there was really only one OSS build model – big milestone / functionality delivery. You followed a waterfall-style delivery where you designed the end-solution up-front, then tried to build, test and hand over to that design. The business value was delivered at the end of the project (or perhaps at major phases along the way). For the large operators, there may have been multiple projects in-flight at any one time, but the value was still only delivered at the end of each project.

Some clients still follow this model, particularly if they outsource build/transformation projects to suppliers. These clients tend to be smaller and have less simultaneous change underway. OSS build/transform projects tend to be occasional for these clients.

But the larger operators now tend to undertake a constant, Agile evolution of their OSS. There are constant transitions and delivery of value at a regular cadence (eg fortnightly sprints). The operating environments (and the constraints associated with them) tend to be in dynamic flux, at an ever-increasing speed. There is no project end-state. The OSS simply doesn’t stay in one state for long because change is happening every day (give or take).

For this reason, I find it interesting that Architects tend to design solutions for a particular end-state. This can be a good thing because it stops Agile projects from meandering off track incrementally. 

Unfortunately, this type of traditional solution design doesn’t fully suit modern delivery. It doesn’t show the many stepping stones that delivery teams have to implement – the multiple states required to transition from current-state to end-state. I’ve seen delivery teams unable to determine a workable sequence of stepping stones that allows the designed end-state to be reached.

So, a challenge for the many clever Architects out there – help delivery teams out by designing the stepping stones, not just the ideal end-state. Update your design templates to describe all the intermediate states and the transition steps between them.

Include diagrams, pre-requisites, dependencies, etc as you normally do, but only as bite-sized chunks for easy consumption by delivery teams rather than even more massive documents than today.

After all, those intermediate states are only experienced briefly and then gone. Obsoleted. In fact all OSS delivery states are only transitory, so we may need to re-think our document template models. Pre-build, during-build and post-build documentation requirements are all waiting to be accommodated within a stepping-stone timeline in a new style of design template.

In most cases, it’s a greater skill to navigate an OSS journey rather than simply predict a destination.

I’d love to hear from Architects and Delivery Teams regarding how you’ve overcome this challenge at your organisation! What models do you use?

What’s in your OSS for me?

May I ask you a question?  Do the senior executives at your organisation ever USE your OSS/BSS?

I’d love to hear your answer.

My guess is that few, if any, do. Not directly anyway. They may depend on reports whose data comes from our OSS, but is that all?

Execs are ultimately responsible for signing off large budget allocations (in CAPEX and OPEX) for our OSS. But if they don’t see any tangible benefits, do the execs just see OSS as cost centres? And cost centres tend to become targets for cost reduction right?

Building on last week’s OSS Scoreboard Analogy, the senior execs are the head coaches of the team. They don’t need the transactional data our OSS are brilliant at collating (eg every network device’s health metrics). They need insights at a corporate objective level.

How can we increase the executives’ “what’s in it for me?” ranking of the OSS/BSS we implement? We can start by considering OSS design through the lens of senior executive responsibilities:

  • Strategy / objective development
  • Strategy execution (planning and ongoing management to targets)
  • Clear communication of priorities and goals
  • Optimising productivity
  • Risk management / mitigation
  • Optimising capital allocation
  • Team development

And they are busy, so they need concise, actionable information.

Do we deliver functionality that helps with any of those responsibilities? Rarely!

Could we? Definitely!

Should we? Again, I’d love to hear your thoughts!

 

An OSS checksum

Yesterday’s post discussed two waves of decisions stemming from our increasing obsession with data collection.

“…the first wave had [arisen] because we’d almost all prefer to make data-driven decisions (ie decisions based on “proof”) rather than “gut-feel” decisions.

We’re increasingly seeing a second wave come through – to use data not just to identify trends and guide our decisions, but to drive automated actions.”

Unfortunately, the second wave has an even greater need for data correctness / quality than we’ve experienced before.

The first wave allowed for human intervention after the collection of data. That meant human logic could be applied to any unexpected anomalies that appeared.

With the second wave, we don’t have that luxury. It’s all processed by the automation. Even learning algorithms struggle with “dirty data.” Therefore, the data needs to be perfect and the automation’s algorithm needs to flawlessly cope with all expected and unexpected data sets.

Our OSS have always had a dependence on data quality so we’ve responded with sophisticated ways of reconciling and maintaining data. But the human logic buffer afforded a “less than perfect” starting point, as long as we sought to get ever-closer to the “perfection” asymptote.

Does wave 2 require us to solve the problem from a fundamentally different starting point? We have to assume perfection akin to a checksum of correctness.
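To illustrate the “checksum of correctness” idea, here’s a hedged sketch of a data-quality gate that an automation might run before acting on a record, quarantining anything that fails. The rules and field names are purely illustrative.

```python
# Hedged sketch of a "checksum of correctness": a validation gate that an
# automation runs before acting on a record. Rules and field names are
# illustrative assumptions, not a real OSS schema.
REQUIRED_FIELDS = {'device', 'port', 'severity', 'timestamp'}
VALID_SEVERITIES = {'CRITICAL', 'MAJOR', 'MINOR', 'WARNING', 'CLEAR'}


def passes_quality_gate(record: dict) -> bool:
    """Return True only if the record is clean enough for automated action."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    if record['severity'] not in VALID_SEVERITIES:
        return False
    if not str(record['device']).strip():
        return False
    return True


def handle(record: dict) -> None:
    if passes_quality_gate(record):
        print(f'Automated action triggered for: {record}')
    else:
        print(f'Quarantined for manual review: {record}')


handle({'device': 'gnb-001', 'port': 'eth0/1', 'severity': 'MAJOR',
        'timestamp': '2023-01-01T00:00:00Z'})
handle({'device': '', 'severity': 'BOGUS'})
```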

Perfection isn’t something I’m very qualified at, so I’m open to hearing your ideas. 😉

 

OSS diamonds are forever (part 2)

Wednesday’s post discussed how OPEX is forever, just like the slogan for diamonds.
 
As discussed, some aspects of Operational Expenses are well known when kicking off a new OSS project (eg annual OSS license / support costs). Others can slip through the cracks – what I referred to as OPEX leakage (eg third-party software, ongoing maintenance of software customisations).
 
OPEX leakage might be an unfair phrase. If there’s a clear line of sight from the expenses to a profitable return, then it’s not leakage. If costs (of data, re-work, cloud services, applications, etc) are proliferating with no clear benefit, then the term “leakage” is probably fair.
 
I’ve seen examples of Agile and cloud implementation strategies where leakage has occurred. And even the supposedly “cheap” open-source strategies have led to surprises. OPEX leakage has caused project teams to scramble as their financial year progressed and budgets were unexpectedly being exceeded.
 
Oh, and one other observation to share that you may’ve seen examples of, particularly if you’ve worked on OSS in large organisations – Having OPEX incurred by one business unit but the benefit derived by different business units. This can cause significant problems for the people responsible for divisional budgets, even if it’s good for the business as a whole. 
 
Let me explain by example: An operations delivery team needs extra logging capability, so they stand up a new open-source tool. They make customisations so that log data can be collected for all of their network types. All log data is then sent to the organisation’s cloud instance. The operations delivery team now owns the lifecycle maintenance costs. However, the costs of cloud (compute and storage) and data lake licensing have now escalated, but Operations doesn’t foot that bill. They’ve just handed that “forever” budgetary burden to another business unit.
 
The opposite can also be true. The costs of building and maintaining might be borne by IT or Ops, but the benefits in revenue or CX (customer experience) are gladly accepted by business-facing units.
 
Both types of project could give significant whole-of-company benefit. But the unit doing the funding will tend to choose less effective projects if it means their own business unit derives the benefit (especially if individuals’ bonuses are tied to those results).
 
OSS can be powerful tools, giving and receiving benefit from many different business units. However, the more OPEX-centric OSS projects that we see today are introducing new challenges to get funded and then supported across their whole life-cycle.
 
PS. Just like diamonds bought at retail prices, there’s a risk that the financials won’t look so great a year after purchase. If that’s the case, you may have to seek justification on intangible benefits.  😉
 
PS2. Check out Robert’s insightful comment to the initial post, including the following question, “I wonder how many OSS procurements are justified on the basis of reducing the Opex only *of the current OSS*, rather than reducing the cost of achieving what the original OSS was created to do? The former is much easier to procure (but may have less benefit to the business). The latter is harder (more difficult analysis to do and change to manage, but payoff potentially much larger).”

Crossing the OSS chasm

Geoffrey Moore’s seminal book, “Crossing the Chasm,” described the psychological chasm between early buyers and the mainstream market.

Crossing the Chasm

Seth Godin cites Moore’s work, “Moore’s Crossing the Chasm helped marketers see that while innovation was the tool to reach the small group of early adopters and opinion leaders, it was insufficient to reach the masses. Because the masses don’t want something that’s new, they want something that works…

The lesson is simple:

– Early adopters are thrilled by the new. They seek innovation.

– Everyone else is wary of failure. They seek trust.”
 

I’d reason that almost all significant OSS buyer decisions fall into the “mainstream market” section of the diagram above. Why? Well, an organisation might have the 15% of innovators / early adopters conceptualising a new OSS project. However, sign-off of that project usually depends on a team of approvers / sponsors. Statistics suggest that roughly 85% of that team is likely to sit in a mindset beyond the chasm, outweighing the 15%.

The mainstream mindset is seeking something that works and something they can trust.

But OSS / digital transformation projects are hard to trust. They’re all complex and unique. They often fail to deliver on their promises. They’re rarely reliable or repeatable. They almost all require a leap of faith (and/or a burning platform) for the buyer’s team to proceed.

OSS sellers seek to differentiate from the 400+ other vendors (of course). How do they do this? Interestingly, mostly by pitching their innovations and uniqueness.

Do you see the gap here? The seller is pitching the left side of the chasm and the buyer cohort is on the right.

I wonder whether our infuriatingly lengthy sales cycles (often 12-18 months) could be reduced if only we could engineer our products and projects to be more mainstream, repeatable, reliable and trustworthy, whilst being less risky.

This is such a dilemma though. We desperately need to innovate, to take the industry beyond the chasm. Should we innovate by doing new stuff? Or should we do the old, important stuff in new and vastly improved ways? A bit of both??

Do we improve our products and transformations so that they can be used / performed by novices rather than designed for use by all the massive intellects that our industry seems to currently consist of?


Diamonds are Forever and so is OSS OPEX


I sometimes wonder whether OPEX is underestimated when considering OSS investments, or at least some facets (sorry, awful pun there!) of it.

Cost-out (aka head-count reduction) seems to be the most prominent OSS business case justification lever. So that’s clearly not underestimated. And the move to cloud is also an OPEX play in most cases, so it’s front of mind during the procurement process too. I’m nought for two so far! Hopefully the next examples are a little more persuasive!

Large transformation projects tend to have a focus on the up-front cost of the project, rightly so. There’s also an awareness of ongoing license costs (usually 20-25% of OSS software list price per annum). Less apparent costs can be found in the exclusions / omissions. This is where third-party OPEX costs (eg database licenses, virtualisation, compute / storage, etc) can be (not) found.

That’s why you should definitely consider preparing a TCO (Total Cost of Ownership) model that includes CAPEX and OPEX, normalised across all options, when making a buying decision.
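As a trivial illustration of that normalisation, here’s a hedged sketch that rolls up CAPEX plus N years of OPEX for each option. The figures and cost categories are invented for the example.

```python
# Hedged sketch: a trivial TCO roll-up (CAPEX + N years of OPEX) to normalise
# buying options. All figures and cost categories are invented for illustration.
def total_cost_of_ownership(capex: float, annual_opex: dict, years: int = 5) -> float:
    """Roll up CAPEX plus N years of recurring OPEX into a single comparable figure."""
    return capex + years * sum(annual_opex.values())


options = {
    'Vendor A (on-prem)': total_cost_of_ownership(
        capex=1_200_000,
        annual_opex={'license_support': 250_000, 'db_licenses': 40_000, 'infra': 60_000},
    ),
    'Vendor B (SaaS)': total_cost_of_ownership(
        capex=300_000,
        annual_opex={'subscription': 420_000, 'integration_maintenance': 80_000},
    ),
}
for name, tco in sorted(options.items(), key=lambda kv: kv[1]):
    print(f'{name}: 5-year TCO = ${tco:,.0f}')
```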

But the more subtle OPEX leakage occurs through customisation. The more customisation from “off-the-shelf” capability, the greater the variation from baseline and the larger the ongoing costs of maintenance and upgrade. This applies not just to proprietary / commercial software, but to open-source products as well.

And choosing Agile almost implies ongoing customisation. One of the things about Agile is it keeps adding stuff (apps, data, functions, processes, code, etc) via OPEX. It’s stack-ranked, so it’s always the most important stuff (in theory). But because it’s incremental, it tends to be less closely scrutinised than during a CAPEX / procurement event. Unless carefully monitored, there’s a greater chance for OPEX leakage to occur.

And as we know, OPEX, like diamonds, is forever (ie the costs re-appear year after year).