“We are made out of oppositions; we live between two poles. There is a philistine and an aesthete in all of us, and a murderer and a saint. You don’t reconcile the poles. You just recognize them.”
Orson Welles.
One of the holy grails of OSS – automation – has many forms, whether in flow-through provisioning, self-healing networks, advanced event filtering or elsewhere. Each is a huge achievement by the implementation and product teams that make it happen. Bravo!
Just one small word of warning though. Don’t be lulled into a false sense of security that these automations have solved your OSS problems for all of eternity.
I strongly recommend building in processes and / or tools that perform scheduled reconciliation against your automations to make sure they are capturing all circumstances correctly.
For example, once your up-front testing has revealed that no exceptions are slipping through the cracks in your shiny new automation, you may wish to perform a reconciliation daily, or perhaps weekly or monthly thereafter.
You may wonder why.
Automations tend to rely on many things being in alignment (e.g. naming conventions, specific events, strict data structuring / formatting). Over time, these tend to change, often imperceptibly. The cause might be a firmware upgrade on a device, a software upgrade on a probe, a new release of OSS software, a refinement to a process, changes to (or non-adherence to) naming conventions, network topology changes, a failover state change or an ultra-rare scenario. Each of these can cause exceptions that your automation hasn’t been designed to cope with.
As such, you always need to design a way of comparing raw variables (e.g. number of raw events) with post-automation variables (e.g. number of events processed) to confirm that every exception has been captured by an exception-handling process. This might even require a lengthy manual audit, which gives back some of the hours gained through your automation’s efficiency improvements.
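To make the comparison concrete, here is a minimal sketch of a scheduled reconciliation check. It assumes each event carries a unique identifier and that you can export three lists: the raw events received, the events the automation processed, and the events routed to exception handling. The names `reconcile`, `ReconciliationResult` and the sample `ev-…` identifiers are hypothetical, made up for illustration rather than taken from any particular OSS product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationResult:
    raw_count: int
    processed_count: int
    exception_count: int
    unaccounted_ids: frozenset  # raw events that nothing downstream claims

    @property
    def is_clean(self) -> bool:
        # Clean means every raw event was either processed or
        # explicitly captured by exception handling.
        return not self.unaccounted_ids


def reconcile(raw_ids, processed_ids, exception_ids) -> ReconciliationResult:
    """Compare raw event IDs against post-automation event IDs."""
    raw = set(raw_ids)
    processed = set(processed_ids)
    exceptions = set(exception_ids)
    # Anything in the raw feed that is neither processed nor flagged
    # as an exception has slipped through the cracks.
    unaccounted = raw - processed - exceptions
    return ReconciliationResult(
        raw_count=len(raw),
        processed_count=len(processed),
        exception_count=len(exceptions),
        unaccounted_ids=frozenset(unaccounted),
    )


# Hypothetical sample data: four raw events, two processed, one exception.
result = reconcile(
    raw_ids=["ev-1", "ev-2", "ev-3", "ev-4"],
    processed_ids=["ev-1", "ev-2"],
    exception_ids=["ev-3"],
)

if not result.is_clean:
    print(f"{len(result.unaccounted_ids)} event(s) unaccounted for: "
          f"{sorted(result.unaccounted_ids)}")
```

Comparing identifiers rather than bare counts is a design choice: counts alone can match by coincidence, whereas the set difference names the exact events to audit. A check like this could be run daily, weekly or monthly by whatever scheduler you already use.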