At face value, you’d think that NASA’s Mission Control (MC) would have quite a lot in common with the typical service provider’s Network Operations Centre (NOC), apart from the fact that NASA probably IS largely run by rocket scientists. Both MC and NOCs exist to operate and coordinate a multitude of different, complex systems, as well as to provide event / crisis management mechanisms.
However, there is one distinct difference, and it manifests in customer experience in the world of OSS.
NASA is generally only managing one “payload,” which could be a single vehicle, for example. Mission Control is managing that vehicle and all of its supporting systems to ensure it gets into space, fulfils its objectives and then returns to Earth (albeit not in all cases). All attention is on the success of the single payload. If there are any deviations from expected results (eg a system failure or a process that didn’t complete as planned), then Mission Control immediately recalibrates to overcome the fall-out.
However, in a NOC, there isn’t just a single “payload” but thousands (if not millions) of customer services running simultaneously at different stages of their life-cycles. This means that processing tends to be batched / queued and handled in bulk, often in siloes that don’t communicate beyond their respective hand-offs. The challenge for the NOC and its supporting systems is noticing any deviations from expected results. Fall-outs are much more easily missed in the throng than when focussing on a single payload.
The challenge for us in OSS is two-fold (and probably more):
- How do we build a mechanism that follows a single payload from end-to-end, one that tracks it through all the networks, systems, processes and transformations it passes through, then raises an alert if there is a fall-out? That means a single linking key that allows the payload to be traced through each step of its journey (which is usually a lot harder than it sounds!!) and a trigger that fires if it deviates from its expected journey (see the first sketch after this list)
- Given the limitations of software, how do we guarantee that we catch all fall-out scenarios? We can cater for the expected unexpected cases (ie the failures we can envisage happening), but it’s the unexpected unexpected cases that are harder to catch (ie the failure cases we never thought possible) (see the watchdog sketch after this list)
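As a loose sketch of the first challenge, the snippet below assumes each customer order carries a single linking key (order_id here, a hypothetical name) that every downstream system stamps onto the events it emits. A tracker compares incoming events against the order’s expected journey and raises an alert the moment a step fails, arrives out of sequence, or appears after the journey should already have finished. This is a minimal illustration under those assumptions, not a reference implementation.

```python
# Minimal sketch: per-order, end-to-end tracking keyed on a single linking key.
# All names (order_id, EXPECTED_JOURNEY, the alert method) are illustrative only.

from dataclasses import dataclass, field

# The journey every order of this type is expected to follow, in sequence.
EXPECTED_JOURNEY = ["validated", "designed", "activated", "tested", "billed"]


@dataclass
class OrderTracker:
    order_id: str                        # the single linking key stamped by every system
    completed: list = field(default_factory=list)

    def record_event(self, step: str, status: str = "ok") -> None:
        """Record a milestone reported by any downstream system."""
        if status != "ok":
            self.alert(f"step '{step}' reported failure")
        elif len(self.completed) >= len(EXPECTED_JOURNEY):
            self.alert(f"unexpected step '{step}' after journey completed")
        elif step != EXPECTED_JOURNEY[len(self.completed)]:
            self.alert(f"expected '{EXPECTED_JOURNEY[len(self.completed)]}' "
                       f"but received '{step}'")
        else:
            self.completed.append(step)

    def alert(self, reason: str) -> None:
        # In a real OSS this would raise a ticket or notification, not just print.
        print(f"FALL-OUT on order {self.order_id}: {reason}")


# Usage: a happy start to the journey, followed by an out-of-sequence event.
tracker = OrderTracker("ORD-0001")
tracker.record_event("validated")
tracker.record_event("designed")
tracker.record_event("tested")   # 'activated' was skipped, so an alert fires here
```

The important part isn’t the data structure, it’s the single key that every silo shares; without it, the per-payload view can’t be stitched together across hand-offs.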
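The second challenge can’t be solved by enumerating failure cases, because by definition we can’t enumerate the ones we never thought of. One common mitigation, sketched below with assumed step names and thresholds, is to watch for the absence of expected progress instead: if an order has sat at any step longer than that step normally takes, something has gone wrong, whatever the cause.

```python
# Sketch of a catch-all watchdog: flag any order that has stalled at a step
# longer than that step's normal duration, regardless of the failure's cause.
# Step names, thresholds and field names are illustrative assumptions.

from datetime import datetime, timedelta

# How long each step is normally allowed to take before we consider it stalled.
MAX_STEP_DURATION = {
    "validated": timedelta(hours=1),
    "designed": timedelta(hours=8),
    "activated": timedelta(hours=4),
    "tested": timedelta(hours=2),
}


def find_stalled_orders(orders: list, now: datetime) -> list:
    """Return every order that hasn't progressed within its step's allowance.

    Each order dict is assumed to hold the linking key plus its current step
    and the timestamp at which it entered that step.
    """
    stalled = []
    for order in orders:
        allowance = MAX_STEP_DURATION.get(order["current_step"], timedelta(hours=1))
        if now - order["entered_step_at"] > allowance:
            stalled.append(order)
    return stalled


# Usage: one healthy order and one that has silently stalled in design.
now = datetime(2020, 1, 1, 18, 0)
orders = [
    {"order_id": "ORD-0001", "current_step": "activated",
     "entered_step_at": datetime(2020, 1, 1, 17, 0)},
    {"order_id": "ORD-0002", "current_step": "designed",
     "entered_step_at": datetime(2020, 1, 1, 6, 0)},
]
for order in find_stalled_orders(orders, now):
    print(f"Possible fall-out: {order['order_id']} stalled at {order['current_step']}")
```

A watchdog like this won’t tell us why an order stalled, but it does guarantee the stall itself gets noticed rather than being lost in the throng.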
Customer experience is an “it’s all about me” syndrome. As a customer, I don’t care if 99.9999999% of orders are activated successfully if mine is the one that fails. If our OSS can be realigned to monitor each individual payload end-to-end, rather than the more common current approach of monitoring disparate queues as siloes and bulk hand-offs, then we’re more likely to deliver a better ratio of great customer experience… We are, aren’t we?