The broken watch analogy.
When the second hand on your analog watch stops moving, you’ll quickly realise that your watch has broken. You’ll know exactly what to do next. You won’t assume that time is broken or is somehow standing still. You’ll immediately recognise that your watch needs fixing, re-winding or replacing.
A watch is an amazing piece of machinery, with a level of complexity that few of us can fully comprehend, much less build ourselves. Despite this, the user interface, the watch-face, is so elegantly simple that almost anyone can immediately understand not only the current metrics (time and date) but also its operational status (working, broken, or perhaps degraded if the time shown isn’t accurate).
Once we start dealing with more complex systems, such as multi-technology telecommunications networks, we can easily lose awareness of the current metrics and status of the network:
- Is it working correctly?
- Is it working within expected tolerances?
- Do we even know whether it’s working at all?
It’s also important to note that we need to ask these same three questions not just of the network, but of each customer service running across it. It may also make sense to ask them of each of the devices and links that make up the network.
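As a rough illustration, the three questions can be collapsed into a single status per metric, much like the watch-face does. The sketch below is only one possible reading: the `Reading` type, the `classify` function, the thresholds and the five-minute staleness window are all hypothetical names and values chosen for the example, not taken from any particular NPM product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    WORKING = "working"    # on target, within tolerance
    DEGRADED = "degraded"  # outside tolerance but not yet failed
    BROKEN = "broken"      # beyond the hard limit
    UNKNOWN = "unknown"    # we cannot tell whether it is working at all


@dataclass
class Reading:
    value: Optional[float]  # None means no measurement arrived
    age_seconds: float      # how long ago the measurement was taken


def classify(reading: Reading, target: float, tolerance: float,
             hard_limit: float, max_age_seconds: float = 300.0) -> Status:
    """Map one reading onto a watch-face style status (assumed thresholds)."""
    # Question 3: do we even know? Missing or stale data means we do not.
    if reading.value is None or reading.age_seconds > max_age_seconds:
        return Status.UNKNOWN
    deviation = abs(reading.value - target)
    # Questions 1 and 2: working correctly, and within expected tolerances?
    if deviation <= tolerance:
        return Status.WORKING
    if deviation <= hard_limit:
        return Status.DEGRADED
    return Status.BROKEN


# Hypothetical latency metric: 20 ms target, 5 ms tolerance, 30 ms hard limit.
fresh_ok = classify(Reading(22.0, 10.0), 20.0, 5.0, 30.0)
stale = classify(Reading(22.0, 900.0), 20.0, 5.0, 30.0)
```

The same function could be applied to a network, a service, a device or a link, as long as each has a target and a tolerance defined for it.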
An end-user of a customer service is generally only interested in the watch-face for their specific service/s. Is the watch-face you provide them elegant enough to be able to answer the three questions above?
A network operator doesn’t have just one watch-face to monitor, but hundreds, thousands or even millions. A single screen showing that many watch-faces is simply too inefficient to work with (although I have seen examples of OSS that do just that).
Instead, operators need to be able to toggle between an individual watch-face (of a single user) and an aggregated watch-face that answers the three questions above for all services, devices, sub-networks and the network holistically.
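One simple way to picture the aggregated watch-face is a roll-up that reports the worst status seen plus a count of each status. This is a minimal sketch under stated assumptions: the status names, the `SEVERITY` ordering (in particular, treating "unknown" as worse than "degraded") and the `roll_up` function are hypothetical choices for illustration.

```python
from collections import Counter

# Assumed ordering: higher number means more urgent for the operator.
SEVERITY = {"working": 0, "degraded": 1, "unknown": 2, "broken": 3}


def roll_up(statuses):
    """Summarise many individual watch-faces as one aggregated view.

    `statuses` maps an entity name (service, device, link) to its status.
    """
    counts = Counter(statuses.values())
    # The overall status is the most severe one present; with nothing to
    # look at, we honestly do not know, so fall back to "unknown".
    worst = max(counts, key=SEVERITY.__getitem__, default="unknown")
    return {"overall": worst, "counts": dict(counts)}


# Hypothetical services; the individual view is just statuses["svc-b"].
statuses = {"svc-a": "working", "svc-b": "degraded", "svc-c": "working"}
summary = roll_up(statuses)
```

Toggling between views then amounts to choosing between a single entry in `statuses` and the `summary` computed over all of them.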
For a network operator, there are likely to be other layers of granularity too. Not just “whole of network” and “individual user” but filters that allow an operator to narrow in on network performance by:
- Time ranges
- Geographic regions
- Topology zones
- Network / vendor / domain types
- Metric types
- Customer/s
- etc
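The filters listed above can be thought of as predicates applied to metric samples before any roll-up. The following is a sketch only: the `Sample` fields and the `select` function are hypothetical, and a real NPM would apply such filters in its data store rather than in a Python loop.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    timestamp: datetime
    region: str    # geographic region
    domain: str    # network / vendor / domain type, e.g. "5G"
    metric: str    # metric type, e.g. "latency_ms"
    customer: str
    value: float


def select(samples, start=None, end=None, **equals):
    """Keep samples inside [start, end) that match every field=value given."""
    result = []
    for s in samples:
        if start is not None and s.timestamp < start:
            continue
        if end is not None and s.timestamp >= end:
            continue
        if all(getattr(s, name) == wanted for name, wanted in equals.items()):
            result.append(s)
    return result


# Illustrative data only.
samples = [
    Sample(datetime(2024, 1, 1, 9), "north", "5G", "latency_ms", "acme", 21.0),
    Sample(datetime(2024, 1, 1, 10), "south", "LTE", "latency_ms", "acme", 35.0),
    Sample(datetime(2024, 1, 2, 9), "north", "5G", "latency_ms", "globex", 19.0),
]
north_day_one = select(samples, end=datetime(2024, 1, 2), region="north")
```

Because each filter is independent, an operator can combine them freely, for example one customer in one region over one time range.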
Some network operators use standard business intelligence (BI) tools to plot their watch-faces. That often leads to the problem described above: a screen full of dials that is hard to work with and hard to base decisions on.
Dedicated Network Performance Management (NPM) solutions are generally designed with operators in mind. Not only do they collect metrics at big-data volumes (in some cases billions of xDRs), they also do so across different networks, domains and technologies (IP/MPLS, 5G, LTE, etc.).
NPMs provide operators with flexibility in the layers of granularity of data presented and the network intelligence shown. More importantly, they (hopefully) provide a user-interface that allows operational staff (and/or integrated systems) to respond to any deviations in expected behaviour, at macro or micro levels.
Just as with a watch, what matters isn’t only the metric (time / date) or whether it accurately presents the current situation. More important is how you use that information to decide what to do next.
If your current network performance management solution is not giving you an elegant way of deciding what to do next, it might be time for a new NPM solution. Don’t let your NPM solution be like a broken analog watch that only shows a seemingly accurate result twice a day.