Unexpected OSS indicators

Yesterday’s post talked about using customer contacts as a real-time proxy metric for friction in the business, which could also be a directional indicator for customer experience.

That got me wondering what other proxy metrics might be used by to provide predictive indicators of what’s happening in your network, OSS and/or BSS. Apparently, “Colt aims to enhance its service assurance capabilities by taking non-traditional data (signal strength, power, temperature, etc.) from network elements (cards, links, etc.) to predict potential faults,” according to James Crawshaw here on LightReading.

What about environmental metrics like humidity, temperature, movement, power stability/disturbance?

I’d love to hear about what proxies you use or what unexpected metrics you’ve found to have shone the spotlight on friction in your organisation.

Shooting the OSS messenger

NPS, or Net Promoter Score, has become commonly used in the telecoms industry in recent years. In effect, it is a metric that measures friction in the business. If NPS is high, the business runs more smoothly. Customers are happy with the service and want to buy more of it. They’re happy with the service so they don’t need to contact the business. If NPS is low, it’s harder to make sales and there’s the additional cost of time dealing with customer complaints, etc (until the customer goes away of course).

NPS can be easy to measure via survey, but a little more challenging as a near-real-time metric. What if we used customer contacts (via all channels such as phone, IVR, email, website, live-chat, etc) as a measure of friction? But more importantly, how does any of this relate to OSS / BSS? We’ll get to that shortly (I hope).

BSS (billing, customer relationship management, etc) and OSS (service health, network performance, etc) tend to be the final touchpoints of a workflow before reaching a customer. When the millions of workflows through a carrier are completing without customer contact, then friction is low. When there are problems, calls go up and friction / inefficiency is also going up. When there are problems, the people (or systems) dealing with the calls (eg contact centre operators) tend to start with OSS / BSS tools and then work their way back up the funnel to identify the cause of friction and attempt to resolve it.

The problem is that the OSS / BSS tools are often seen as the culprit because that’s where the issue first becomes apparent. It’s easier to log an issue against the OSS than to keep tracking back to the real source of the problem. Many times, it’s a case of shooting the messenger. Not only that, but if we’re not actually identifying the source of the problem then it becomes systemic (ie the poor customer experience perpetuates).

Maybe there’s a case for us to get better at tracking the friction caused further upstream of our OSS / BSS and to give more granular investigative tools to the call takers. Even if we do, our OSS / BSS are still the ones delivering the message.

The OSS Matrix – the blue or the red pill?

OSS Matrix
OSS tend to be very good at presenting a current moment in time – the current configuration of the network, the health of the network, the activities underway.

Some (but not all) tend to struggle to cope with other moments in time – past and future.

Most have tools that project into the future for the purpose of capacity planning, such as link saturation estimation (based on projecting forward from historical trend-lines). Predictive analytics is a current buzz-word as research attempts to predict future events and mitigate for them now.

Most also have the ability to look into the past – to look at historical logs to give an indication of what happened previously. However, historical logs can be painful and tend towards forensic analysis. We can generally see who (or what) performed an action at a precise timestamp, but it’s not so easy to correlate the surrounding context in which that action occurred. They rarely present a fully-stitched view in the OSS GUI that shows the state of everything else around it at that snapshot in time past. At least, not to the same extent that the OSS GUI can stitch and present current state together.

But the scenario that I find most interesting is for the purpose of network build / maintenance planning. Sometimes these changes occur as isolated events, but are more commonly run as projects, often with phases or milestone states. For network designers, it’s important to differentiate between assets (eg cables, trenches, joints, equipment, ports, etc) that are already in production versus assets that are proposed for installation in the future.

And naturally those states cross over at cut-in points. The proposed new branch of the network needs to connect to the existing network at some time in the future. Designers need to see available capacity now (eg spare ports), but be able to predict with confidence that capacity will still be available for them in the future. That’s where the “reserved” status comes into play, which tends to work for physical assets (eg physical ports) but can be more challenging for logical concepts like link utilisation.

In large organisations, it can be even more challenging because there’s not just one augmentation project underway, but many. In some cases, there can be dependencies where one project relies on capacity that is being stood up by other future projects.

Not all of these projects / plans will make it into production (eg funding is cut or a more optimal design option is chosen), so there is also the challenge of deprecating planned projects. Capability is required to find whether any other future projects are dependent on this deprecated future project.

It can get incredibly challenging to develop this time/space matrix in OSS. If you’re a developer of OSS, the question becomes whether you want to take the blue or red pill.

Front-loading with OSS auto-discovery

Yesterday’s post discussed the merits of front-loading effort on knowledge transfer of new starters and automated testing, whilst acknowledging the challenges that often prevent that from happening.

Today we look at the front-loading benefits of building OSS / network auto-discovery tools.

We all know that OSS are only as good as the data we seed them with. As the old saying goes, garbage in, garbage out.

Assurance / network-health data is generally collected directly from the network in near real time, typically using SNMP traps, syslog events and similar. The network is generally the data master for this assurance-style data, so it makes sense to pull data from the network wherever possible (ie bottom-up data flows).

Fulfilment data, in the form of customer orders, network designs, etc are often captured in external systems first (ie as master) and pushed into the network as configurations (ie top-down data flows).
Bottom-up and top-down OSS data flows

These two flows meet in the middle, as they both tend to rely on network inventory and/or resources. Bottom-up – Network faults to be tied to inventory / resource / topology information (eg fibre cuts, port failure, device failure, etc) which are important for fault identification (Root Cause Analysis [RCA]). Similarly for top-down – customer services / circuits / contracts tend to consume inventory, capacity and/or other resources.

Looking end-to-end and correlating network health (assurance) to customer service health (fulfilment) (eg as Service Level Agreement [SLA] analysis, Service Impact Analysis [SIA]) tends to only be possible due to reconciliation via inventory / resource data sets as linking keys.

Seeding an OSS with inventory / resource data can be done via three methods:

  1. Data migration (eg script-loading from external sources such as spreadsheet or CSV files)
  2. Manual data creation
  3. Auto-discovery (ie collection of data directly from the network)
  4. (or a combination of the above)

Options 1 and 2 are probably the more traditional method of initial seeding of OSS databases, mainly because they tend to be faster to demonstrate progress.

Option 3 is the front-loading option that can be challenging in the short-to-medium term but will hopefully prove beneficial in the longer term (just like knowledge transfer and automated testing).

It might seem easy to just suck data directly out of the network, but the devil is definitely in the detail, details such as:

  • Choosing optimal timings to poll the network without saturating it (if notification aren’t being pushed by the network), not to mention session handling of these long-running data transfers
  • Building the mediation layer to perform protocol conversion and data mappings
  • Field translation / mapping to common naming standards. This can be much more difficult than it sounds and is key to allow the assurance and fulfilment meet-in-the-middle correlations to occur (as described above)
  • Reconciliation between the data being presented by the network and what’s already in the OSS database. In theory, the data presented by the network should always be right, but there are scenarios where it’s not (eg flapping ports giving the appearance of assets being present / absent from network audit data, assets in test / maintenance mode that aren’t intended be accepted into the OSS inventory pool, lifecycle transitions from planned to built, etc)
  • Discrepancy and exception handling rules
  • All of the above make it challenging for “siloed” data sets, but perhaps even more challenging is in the discovery and auto-stitching of cross-domain data sets (eg cross-domain circuits / services / resource chains)

The vexing question arises – do you front-load and seed via auto-discovery or perform a creation / migration that requires more manual intervention? In some cases, it could even be a combination of both as some domains are easier to auto-discover than others.

From PoC to OSS sandpit

You all know I’m a fan of training operators in OSS sandpits (and as apprenticeships during the build phase) rather than a week or two of classroom training at the end of a project.

To reduce the re-work in building a sandpit environment, which will probably be a dev/test environment rather than a production environment, I like to go all the way back to the vendor selection process.
From PoC to OSS sandpit

Running a Proof of Concept (PoC) is a key element of vendor selection in my opinion. The PoC should only include a small short-list of pre-selected solutions so as to not waste time of operator or vendor / integrator. But once short-listed, the PoC should be a cut-down reflection of the customer’s context. Where feasible, it should connect to some real devices / apps (maybe lab devices / apps, possibly via a common/simple interface like SNMP). This takes some time on both sides to set up, but it shows how easily (or not) the solution can integrate with the customer’s active network, BSS, etc. It should be specifically set up to show the device types, alarm types, naming conventions, workflows, etc that fit into the customer’s specific context. That allows the customer to understand the new OSS in terms they’re familiar with.

And since the effort has been made to set up the PoC, doesn’t it make sense to make further use of it and not just throw it away? If the winning bidder then leaves the PoC environment in the hands of the customer, it becomes the sandpit to play in. The big benefit for the winning bidder is that hopefully the customer will have less “what if?” questions that distract the project team during the implementation phase. Questions can be demonstrated, even if only partially, using the sandpit environment rather than empty words.

Post Implementation Review (PIR)

Have you noticed that OSS projects need to go through extensive review to get funding of business cases? That makes sense. They tend to be a big investment after all. Many OSS projects fail, so we want to make sure this one doesn’t and we perform thorough planing / due-diligence.

But I do find it interesting that we spend less time and effort on Post Implementation Reviews (PIRs). We might do the review of the project, but do we compare with the Cost Benefit Analysis (CBA) that undoubtedly goes into each business case?

OSS Project Analysis Scales

Even more interesting is that we spend even less time and effort performing ongoing analysis of an implemented OSS
against against the CBA.

Why interesting? Well, if we took the time to figure out what has really worked, we might have better (and more persuasive) data to improve our future business cases. Not only that, but more chance to reduce the effort on the business case side of the scale compared with current approaches (as per diagrams above).

What do you think?

Would you ever alarm your lab equipment?

Something curious dawned on me the other day – I wondered how many people / organisations actively manage alarms / alerts being generated by their lab equipment?

At first glance, this would seem silly. Lab environments are in constant flux, in all sorts of semi-configured situations, and therefore likely to be alarming their heads off at different times.

As such, it would seem even sillier to send alarms, syslogs, performance counters, etc from your lab boxes through to your production alarm management platform. Events on your lab equipment simply shouldn’t be distracting your NOC teams from handling events on your active network right?

But here’s why I’m curious, in a word, DevOps. Does the DevOps model now mean that some lab equipment stays in a relatively stable state and performs near-mission-critical activities like automated regression testing?? Therefore, does it make sense to selectively monitor / manage some lab equipment?

In what other scenarios might we wish to send lab alarms to the NOC (and I’m not talking about system integration testing type scenarios, but ongoing operational scenarios)?

Using OSS machine learning to predict backwards not forwards

There’s a lot of excitement about what machine-led decisioning can introduce into the world of network operations, and rightly so. Excitement about predictions, automation, efficiency, optimisation, zero-touch assurance, etc.

There are so many use-cases that disruptors are proposing to solve using Artificial Intelligence (AI), Machine Learning (ML) and the like. I might have even been guilty of proposing a few ideas here on the PAOSS blog – ideas like closed-loop OSS learning / optimisation of common processes, the use of Robotic Process Automation (RPA) to reduce swivel-chairing and a few more.

But have you ever noticed that most of these use-cases are forward-looking? That is, using past data to look into the future (eg predictions, improvements, etc). What if we took a slightly different approach and used past data to reconcile what we’ve just done?

AI and ML models tend to rely on lots of consistent data. OSS produce or collect lots of consistent data, so we get a big tick there. The only problem is that a lot of our manually created OSS data is riddled with inconsistency, due to nuances in the way that each worker performs workflows, data entry, etc as well as our own personal biases and perceptions.

For example, zero-touch automation has significant cachet at the moment. To achieve that, we need network health indicators (eg alarms, logs, performance metrics), but also a record of interventions that have (hopefully) improved on those indicators. If we have inconsistent or erroneous trouble-ticketing information, our AI is going to struggle to figure out the right response to automate. Network operations teams tend to want to fix problems quickly, get the ticketing data populated as fast as possible and move on to the next fault, even if that means creating inconsistent ticketing data.

So, to get back to looking backwards, a machine-learning use-case to consider today is to look at what an operator has solved, compare it to past resolutions and either validate the accuracy of their resolution data or even auto-populate fields based on the operator’s actions.

Hat tip to Jay Fenton for the idea seed for this post.

OSS / BSS security getting a little cloudy

Many systems are moving beyond simple virtualization and are being run on dynamic private or even public clouds. CSPs will migrate many to hybrid clouds because of concerns about data security and regulations on where data are stored and processed.
We believe that over the next 15 years, nearly all software systems will migrate to clouds provided by third parties and be whatever cloud native becomes when it matures. They will incorporate many open-source tools and middleware packages, and may include some major open-source platforms or sub-systems (for example, the size of OpenStack or ONAP today)
.”
Dr Mark H Mortensen
in an article entitled, “BSS and OSS are moving to the cloud: Analysys Mason” on Telecom Asia.

Dr Mortensen raises a number of other points relating to cloud models for OSS and BSS in the article linked above. Included is definition of various cloud / virtualisation related terms.

He also rightly points out that many OSS / BSS vendors are seeking to move to virtualised / cloud / as-a-Service delivery models (for reasons including maintainability, scalability, repeatability and other “ilities”).

The part that I find interesting with cloud models (I’ll use the term generically) is positioning of the security control point(s). Let’s start by assuming a scenario where:

  1. The Active Network (AN) is “on-net” – the network that carries live customer traffic (the routers, switches, muxes, etc) is managed by the CSP / operator [Noting though, that these too are possibly managed as virtual entities rather than owned].
  2. The “cloud” OSS/BSS is “off-net” – some vendors will insist on their multi-tenanted OSS/BSS existing within the public cloud

The diagram below shows three separate realms:

  1. The OSS/BSS “in the cloud”
  2. The operator’s enterprise / DC realm
  3. The operator’s active network realm

as well as the Security Control Points (SCPs) between them.

OSS BSS Cloud Security Control Points

The most important consideration of this architecture is that the Active Network remains operational (ie carry customer traffic) even if the link to the DC and/or the link to the cloud is lost.

With that in mind, our second consideration is what aspects of network management need to reside within the AN realm. It’s not just the Active Network devices, but anything else that allows the AN to operate in an isolated state. This means that shared services like NTP / synch needs a presence in the AN realm (even if not of the highest stratum within the operator’s time-synch solution).

What about Element Managers (EMS) that look after the AN devices? How about collectors / probes? How about telemetry data stores? How about network health management tools like alarm and performance management? How about user access management (LDAP, AD, IAM, etc)? Do they exist in the AN or DC realm?

Then if we step up the stack a little to what I refer to as East-West OSS / BSS tools like ticket management, workforce management, even inventory management – do we collect, process, store and manage these within the DC or are we prepared to shift any of this functionality / data out to the cloud? Or do we prefer it to remain in the AN realm and ensure only AN privileged users have access?

Which OSS / BSS tools remain on-net (perhaps as private cloud) and which can (or must) be managed off-net (public cloud)?

Climb further up the stack and we get into the interesting part of cloud offerings. Not only do we potentially have the OSS/BSS (including East-West tools), but more excitingly, we could bring in services or solutions like content from external providers and bundle them with our own offerings.

We often hear about the tight security that’s expected (and offered) as part of the vendor OSS/BSS cloud solutions, but as you see, the tougher consideration for network management architects is actually the end-to-end security and where to position the security control points relative to all the pieces of the OSS/BSS stack.

A new phenomenon for IT

In the past, business-oriented groups have had ideas about what they want to do and then they come to us… Now, they want to know what technology can bring to the table and then they’ll work on the business plan.
So there’s a big gap here. It’s a phenomenon that’s been happening in the last year and it’s an uncomfortable place for IT. We’re not used to having to lead in that way. We have been more in the order-taker business.

Veenod Kurup
, Liberty Global, Group CIO.

That’s a really thought-provoking insight from Veenod isn’t it? Technology driving the business rather than business driving the technology. Technology as the business advantage.

I’ll be honest here – I never thought I’d see that day although I… guess… as e-business increases, the dependency on tech increases in lockstep. I’m passionate about tech, but also of the opinion that the tech is only a means to an end.

So if what Veenod says is reflective of a macro-trend across all industry then he’s right in saying that we’re going to have some very uncomfortable situations for some tech experts. Many will have to widen their field of view from a tech-only vision to a business vision.

Maybe instead the business-oriented groups could just come to the OSS / BSS department and speak with our valuable tripods. After all, we own and run the information and systems where business (BSS) meets technology (OSS) right? Or as previously reiterated on this blog, we have the opportunity to take the initiative and demonstrate the business value within our OSS / BSS.

PS. hat-tip to Dawn Bushaus for unearthing the quote from Veenod in her article on Inform.

Dematerialisation of OSS

In 1972, the Club of Rome in its report The Limits to Growth predicted a steadily increasing demand for material as both economies and populations grew. The report predicted that continually increasing resource demand would eventually lead to an abrupt economic collapse. Studies on material use and economic growth show instead that society is gaining the same economic growth with much less physical material required. Between 1977 and 2001, the amount of material required to meet all needs of Americans fell from 1.18 trillion pounds to 1.08 trillion pounds, even though the country’s population increased by 55 million people. Al Gore similarly noted in 1999 that since 1949, while the economy tripled, the weight of goods produced did not change.
Wikipedia on the topic of Dematerialisation.

The weight of OSS transaction volumes appears to be increasing year on year as we add more stuff to our OSS. The touchpoint explosion is amplifying this further. Luckily, our platforms / middleware, compute, networks and storage have all been scaling as well so the increased weight has not been as noticeable as it might have been (even though we’ve all worked on OSS that have been buckling under the weight of transaction volumes right?).

Does it also make sense that when there is an incremental cost per transaction (eg via the increasingly prevalent cloud or “as a service” offerings) that we pay closer attention to transaction volumes because there is a great perception of cost to us? But not for “internal” transactions where there is little perceived incremental cost?

But it’s not so much the transaction processing volumes that are the problem directly. It’s more by implication. For each additional transaction there’s the risk of a hand-off being missed or mis-mapped or slowing down overall activity processing times. For each additional transaction type, there’s additional mapping, testing and regression testing effort as well as an increased risk of things going wrong.

Do you measure transaction flow volumes across your entire OSS suite? Does it provide an indication of where efficiency optimisation (ie dematerialisation) could occur and guide your re-factoring investments? Does it guide you on process optimisation efforts?

An OSS automation mind-flip

I recently had something of a perspective-flip moment in relation to automation within the realm of OSS.

In the past, I’ve tended to tackle the automation challenge from the perspective of applying automated / scripted responses to tasks that are done manually via the OSS. But it’s dawned on me that I have it around the wrong way! It is an incremental perspective on the main objective of automations – global zero-touch networks.

If we take all of the tasks performed by all of the OSS around the globe, the number of variants is incalculable… which probably means the zero-touch problem is unsolvable (we might be able to solve for many situations, but not all).

The more solvable approach would be to develop a more homogeneous approach to network self-care / self-optimisation. In other words, the majority of the zero-touch challenge is actually handled at the equivalent of EMS level and below (I’m perhaps using out-dated self-healing terminology, but hopefully terminology that’s familiar to readers) and only cross-domain issues bubble up to OSS level.

As the diagram below describes, each layer up abstracts but connects (as described in more detail in “What an OSS shouldn’t do“). That is, each higher layer in the stack reduces the amount if information/control within a domain that it’s responsible for, but it assumes more a broader responsibility for connecting domains together.
OSS abstract and connect

The abstraction process reduces the number of self-healing variants the OSS needs to handle. But to cope with the complexity of self-caring for connected domains, we need a more homogeneous set of health information being presented up from the network.

Whereas the intent model is designed to push actions down into the lower layers with a standardised, simplified language, this would be the reverse – pushing network health knowledge up to higher layers to deal with… in a standard, consistent approach.

And BTW, unlike the pervading approach of today, I’m clearly saying that when unforeseen (or not previously experienced) scenarios appear within a domain, they’re not just kicked up to OSS, but the domains are stoic enough to deal with the situation inside the domain.

Using OSS/BSS to steer the ship

For network operators, our OSS and BSS touch most parts of the business. The network, and the services they carry, are core business so a majority of business units will be contributing to that core business. As such, our OSS and BSS provide many of the metrics used by those business units.

This is a privileged position to be in. We get to see what indicators are most important to the business, as well as the levers used to control those indicators. From this privileged position, we also get to see the aggregated impact of all these KPIs.

In your years of working on OSS / BSS, how many times have you seen key business indicators that are conflicting between business units? They generally become more apparent on cross-team projects where the objectives of one internal team directly conflict with the objectives of another internal team/s.

In theory, a KPI tree can be used to improve consistency and ensure all business units are pulling towards a common objective… [but what if, like most organisations, there are many objectives? Does that mean you have a KPI forest and the trees end up fighting for light?]

But here’s a thought… Have you ever seen an OSS/BSS suite with the ability to easily build KPI trees? I haven’t. I’ve seen thousands of standalone reports containing myriad indicators, but never a consolidated roll-up of metrics. I have seen a few products that show operational metrics rolled-up into a single dashboard, but not business metrics. They appear to have been designed to show an information hierarchy, but not necessarily with KPI trees in mind specifically.

What do you think? Does it make sense for us to offer KPI trees as base product functionality from our reporting modules? Would this functionality help our OSS/BSS add more value back into the businesses we support?

Are today’s platform benefits tomorrow’s constraints?

One of the challenges facing OSS / BSS product designers is which platform/s to tie the roadmap to.

Let’s use a couple of examples.
In the past, most outside plant (OSP) designs were done in AutoCAD, so it made sense to build OSP design tools around AutoCAD. However in making that choice, the OSP developer becomes locked into whatever product directions AutoCAD chooses in future. And if AutoCAD falls out of favour as an OSP design format, the earlier decision to align with it becomes an even bigger challenge to work around.

A newer example is for supply chain related tools to leverage Salesforce. The tools and marketplace benefits make great sense as a platform to leverage today. But it also represents a roadmap lock-in that developers hope will remain beneficial in the years / decades to come.

I haven’t seen an OSS / BSS product built around Facebook, but there are plenty of other tools that have been. Given Facebook’s recent travails, I wonder how many product developers are feeling at risk due to their reliance on the Facebook platform?

The same concept extends to the other platforms we choose – operating systems, programming languages, messaging models, data storage models, continuous development tools, etc, etc.

Anecdotally, it seems that new platforms are coming online at an ever greater rate, so the chance / risk of platform churn is surely much higher than when AutoCAD was chosen back in the 1980s – 1990s.

The question becomes how to leverage today’s platform benefits, whilst minimising dependency and allowing easy swap-out. There’s too much nuance to answer this definitively, but the one key strategy is to try to build key logic models outside the dependent platform (ie ensuring logic isn’t built into AutoCAD or similar).

Designing OSS to cope with greater transience

There are three broad models of networking in use today. The first is the adaptive model where devices exchange peer information to discover routes and destinations. This is how IP networks, including the Internet, work. The second is the static model where destinations and pathways (routes) are explicitly defined in a tabular way, and the final is the central model where destinations and routes are centrally controlled but dynamically set based on policies and conditions.”
Tom Nolle here.

OSS of decades past worked best with static networks. Services / circuits that were predominantly “nailed up” and (relatively) rarely changed after activation. This took the real-time aspect out of play and justified the significant manual effort required to establish a new service / circuit.

However, adaptive and centrally managed networks have come to dominate the landscape now. In fact, I’m currently working on an assignment where DWDM, a technology that was once largely static, is now being augmented to introduce an SDN controller and dynamic routing (at optical level no less!).

This paradigm shift changes the fundamentals of how OSS operate. Apart from the physical layer, network connectivity is now far more transient, so our tools must be able to cope with that. Not only that, but the changes are too frequent to justify the manual effort of the past.

To tie in with yesterday’s post, we are again faced with the option of abstract / generic modelling or specific modelling.

Put another way, we have to either come up with adaptive / algorithmic mechanisms to deal with that transience (the specific model), or need to mimic “nailed-up” concepts (the abstract model).

More on the implications of this tomorrow.

Which OSS tool model do you prefer – Abstract or Specific?

There’s something I’ve noticed about OSS products – they are either designed to be abstract / flexible or they are designed to cater for specific technologies / topologies.

When designed from the abstract perspective, the tools are built around generic core data models. For example, whether a virtual / logical device, a physical device, a customer premises device, a core network device, an access network device, a trasmission device, a security device, etc, they would all be lumped into a single object called a device (but with flexibility to cater for the many differences).

When designed from the specific perspective, the tools are built with exact network and/or service models in mind. That could be 5G, DWDM, physical plant, etc and when any new technology / topology comes along, a new plug-in has to be built to augment the tools.

The big advantage of the abstract approach is obvious (if they truly do have a flexible core design) – you don’t have to do any product upgrades to support a new technology / topology. You just configure the existing system to mimic the new technology / topology.

The first OSS product I worked with had a brilliant core design and was able to cope with any technology / topology that we were faced with. It often just required some lateral thinking to make the new stuff fit into the abstract data objects.

What I also noticed was that operators always wanted to customise the solution so that it became more specific. They were effectively trying to steer the product from an open / abstract / flexible tool-set to the other end of the scale. They generally paid significant amounts to achieve specificity – to mould the product to exactly what their needs were at that precise moment in time.

However, even as an OSS newbie, I found that to be really short-term thinking. A network (and the services it carries) is constantly evolving. Equipment goes end-of-life / end-of-support every few years. Useful life models are usually approx 5-7 years and capital refresh projects tend to be ongoing. Then, of course, vendors are constantly coming up with new products, features, topologies, practices, etc.

Given this constant change, I’d much rather have a set of tools that are abstract and flexible rather than specific but less malleable. More importantly, I’ll always try to ensure that any customisations should still retain the integrity of the abstract and flexible core rather than steering the product towards specificity.

How about you? What are your thoughts?

An OSS conundrum with many perspectives

Even aside from the OSS impact, it illustrates the contrast between “bottom-up” planning of networks (new card X is cheaper/has more ports) and “top down” (what do we need to change to reduce our costs/increase capacity).”
Robert Curran
.

Robert’s quote above is in response to a post called “Trickle-down impact planning.”

Robert makes a really interesting point. Adding a new card type is a relatively common event for a big network operator. It’s a relatively minor challenge for the networks team – a BAU (Business as Usual) activity in fact. But if you follow the breadcrumbs, the impact to other parts of the business can be quite significant.

Your position in your organisation possibly dictates your perspective on the alternative approaches Robert discusses above. Networks, IT, planning, operations, sales, marketing, projects/delivery, executive – all will have different impacts and a different field of view on the conundrum. This makes it an interesting problem to solve – which viewpoint is the “right” one to tackle the challenge from?

My “solutioning” background tends to align with the top down viewpoint, but today we’ll take a look at this from the perspective of how OSS can assist from either direction.

Bottom Up: In an ideal world, our OSS and associated processes would be able to identify a new card (or similar) and just ripple changes out without interface changes. The first OSS I worked on did this really well. However, it was a “single-vendor” solution so the ripples were self-contained (mostly). This is harder to control in the more typical “best-of-breed” OSS stacks of today. There are architectural mechanisms for controlling the ripples out but it’s still a significant challenge to solve. I’d love to hear from you if you’re aware of any vendors or techniques that do this really well.

Top Down: This is where things get interesting. Should top-down impact analysis even be the task of an OSS/BSS? Since it’s a common / BAU operational task, then you could argue it is. If so, how do we create OSS tools* that help with organisational impact / change / options analysis and not just network impact analysis? How do we build the tools* that can:

  1. Predict the rippling impacts
  2. Allow us to estimate the impact of each
  3. Present options (if relevant) and
  4. Provide a cost-benefit comparison to determine whether any of the options are viable for development

* When I say “tools,” this might be a product, but it could just mean a process, data extract, etc.

I have the sense that this type of functionality falls into the category of, “just because you can, doesn’t mean you should… build it into your OSS.” Have you seen an OSS/BSS with this type of impact analysis functionality built-in?

Blown away by one innovation. Now to extend on it

Our most recent two posts, from yesterday and Friday, have talked about one stunningly simple idea that helps to overcome one of OSS‘ biggest challenges – data quality. Those posts have stimulated quite a bit of dialogue and it seems there is some consensus about the cleverness of the idea.

I don’t know if the idea will change the OSS landscape (hopefully), or just continue to be a strong selling point for CROSS Network Intelligence, but it has prompted me to think a little longer about innovating around OSS‘ biggest challenges.

Our standard approach of just adding more coats of process around our problems, or building up layers of incremental improvements isn’t going to solve them any time soon (as indicated in our OSS Call for Innovation). So how?

Firstly, we have to be able to articulate the problems! If we know what they are, perhaps we can then take inspiration from the CROSS innovation to spur us into new ways of thinking?

Our biggest problem is complexity. That has infiltrated almost every aspect of our OSS. There are so many posts about identifying and resolving complexity here on PAOSS that we might skip over that one in this post.

I decided to go back to a very old post that used the Toyota 5-whys approach to identify the real cause of the problems we face in OSS [I probably should update that analysis because I have a whole bunch of additional ideas now, as I’m sure you do too… suggested improvements welcomed BTW].

What do you notice about the root-causes in that 5-whys analysis? Most of the biggest causes aren’t related to system design at all (although there are plenty of problems to fix in that space too!). CROSS has tackled the data quality root-cause, but almost all of the others are human-centric factors – change controls, availability of skilled resources, requirement / objective mis-matches, stakeholder management, etc. Yet, we always seem to see OSS as a technical problem.

How do you fix those people challenges? Ken Segal puts it this way, “When process is king, ideas will never be. It takes only common sense to recognize that the more layers you add to a process, the more watered down the final work will become.” Easier said than done, but a worthy objective!

Blown away by one innovation – a follow-up concept

Last Friday’s blog discussed how I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades.

You can read more detail via the link, but the three major factors in this simple, elegant solution to data quality problems (probably OSS‘ biggest kryptonite) are:

  1. Being able to make connections that break standard object hierarchy rules; but
  2. Having the ability to mark that standard rules haven’t been followed; and
  3. Being able to uses the markers to prioritise the fixing of data at a more convenient time

It’s effectively point 2 that has me most excited. So novel, yet so obvious in hindsight. When doing data migrations in the past, I’ve used confidence flags to indicate what I can rely on and what needs further audit / remediation / cleansing. But the recent demo I saw of the CROSS product is the first time I’ve seen it built into the user interface of an OSS.

This one factor, if it spreads, has the ability to change OSS data quality in the same way that Likes (or equivalent) have changed social media by acting as markers of confidence / quality.

Think about this for a moment – what if everyone who interacts with an OSS GUI had the ability to rank their confidence in any element of data they’re touching, with a mechanism as simple as clicking a like/dislike button (or similar)?

Bad example here but let’s say field techs are given a design pack, and upon arriving at site, find that the design doesn’t match in-situ conditions (eg the fibre pairs they’re expecting to splice a customer lead-in cable to are already carrying live traffic, which they diagnose is due to data problems in an upstream distribution joint). Rather than jeopardising the customer activation window by having to spend hours/days fixing all the trickle-down effects of the distribution joint data, they just mark confidence levels in the vicinity and get the customer connected.

The aggregate of that confidence information is then used to show data quality heat maps and help remediation teams prioritise the areas that they need to work on next. It helps to identify data and process improvements using big circle and/or little circle remediation techniques.

Possibly the most important implication of the in-built ranking system is that everyone in the end-to-end flow, from order takers to designers through to coal-face operators, can better predict whether they need to cater for potential data problems.

Your thoughts?? In what scenarios do you think it could work best, or alternatively, not work?

I’ve just been blown away by the most elegant OSS innovation I’ve seen in decades

Looking back, I now consider myself extremely lucky to have worked with an amazing product on the first OSS project I worked on (all the way back in 2000). And I say amazing because the underlying data models and core product architecture are still better than any other I’ve worked with in the two decades since. The core is the most elegant, simple and powerful I’ve seen to date. Most importantly, the models were designed to cope with any technology, product or service variant that could be modelled as a hierarchy, whether physical or virtual / logical. I never found a technology that couldn’t be modelled into the core product and it required no special overlays to implement a new network model. Sadly, the company no longer exists and the product is languishing on the books of the company that bought out the assets but isn’t leveraging them.

Having been so spoilt on the first assignment, I’ve been slightly underwhelmed by the level of elegant innovation I’ve observed in OSS since. That’s possibly part of the reason for the OSS Call for Innovation published late last year. There have been many exciting innovations introduced since, but many modern tools are still more complex and complicated than they should be, for implementers and operators alike.

But during a product demo last week, I was blown away by an innovation that was so simple in concept, yet so powerful that it is probably the single most impressive innovation I’ve seen since that first OSS. Like any new elegant solution, it left me wondering why it hasn’t been thought of previously. You’re probably wondering what it is. Well first let me start by explaining the problem that it seeks to overcome.

Many inventory-based OSS rely on highly structured and hierarchical data. This is a double-edged sword. Significant inter-relationship of data increases the insight generation opportunities, but the downside is that it can be immensely challenging to get the data right (and to maintain a high-quality data state). Limited data inter-relationships make the project easier to implement, but tend to allow less rich data analyses. In particular, connectivity data (eg circuits, cables, bearers, VPNs, etc) can be a massive challenge because it requires the linking of separate silos of data, often with no linking key. In fact, the data quality problem was probably one of the most significant root-causes of the demise of my first OSS client.

Now getting back to the present. The product feature that blew me away was the first I’ve seen that allows significant inter-relationship of data (yet in a simple data model), but still copes with poor data quality. Let’s say your OSS has a hierarchical data model that comprises Location, Rack, Equipment, Card, Port (or similar) and you have to make a connection from one device’s port to another’s. In most cases, you have to build up the whole pyramid of data perfectly for each device before you can create a customer connection between them. Let’s also say that for one device you have a full pyramid of perfect data, but for the other end, you only know the location.

The simple feature is to connect a port to a location now, or any other point to point on the hierarchy (and clean up the far-end data later on if you wish). It also allows the intermediate hops on the route to be connected at any point in the hierarchy. That’s too simple right, yet most inventory tools don’t allow connections to be made between different levels of their hierarchies. For implementers, data migration / creation / cleansing gets a whole lot simpler with this approach. But what’s even more impressive is that the solution then assigns a data quality ranking to the data that’s just been created. The quality ranking is subsequently considered by tools such as circuit design / routing, impact analysis, etc. However, you’ll have noted that the data quality issue still hasn’t been fixed. That’s correct, so this product then provides the tools that show where quality rankings are lower, thus allowing remediation activities to be prioritised.

If you have an inventory data quality challenge and / or are wondering the name of this product, it’s CROSS, from the team at CROSS Network Intelligence (www.cross-ni.com).