What is your spares policy?

“You need to understand what you are buying, and why, how it will affect your business, and what the potential risks are. That detailed understanding may be beyond the scope of a procurement department.”
Owen Williams

It is standard practice for an OSS to have in-built resiliency: if a single device or even a whole site fails, failover mechanisms keep operations running or at least minimise downtime.

Don’t let this high-availability mantra cause you to lose sight of your spares process, though. I was recently whacked by this tentacle of the OctopOSS whilst helping an integrator roll out a well-known vendor’s OSS to an enterprise customer.

The vendor’s OSS solution comprised software running on customised hardware appliances, deployed as master nodes plus a number of collectors spread throughout the enterprise customer’s network. Murphy’s Law teaches an integrator to expect device failures, but certainly not catastrophic failures of different types on almost every device within a six-month period.

Exacerbating the problem, these appliances sometimes took months to replace because of a dispute between the OSS vendor and its hardware suppliers. Compounding it further, the end-customer’s stringent change management process delayed the replacement of failed devices in its data centres, on the premise that the devices had yet to be signed into production.

The moral of the story is to hold spares of all key components of your OSS and/or contract your vendor to hold spares on your behalf, and to insist on contracted response and replacement times.

It probably also pays to test the cycle time of each piece of the end-to-end puzzle. In our case, the pieces were:

  1. The hardware supplier providing a replacement customised hardware platform
  2. The OSS vendor undertaking a diagnosis and determining whether it is a hardware, software or configuration fault
  3. The integrators collecting logs or data to help with the diagnosis
  4. Freight / logistics / customs entities to ship kit to sites
  5. The enterprise customer’s change management processes
  6. Not to mention the time taken for various responsibility hand-offs, inter-party discussions, approvals and potential disputes on responsibility and costs
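The reason to time each piece separately is that, when the stages run one after another, the end-to-end replacement time is roughly their sum, and a single slow stage can blow out a contracted replacement window. Here is a minimal sketch in Python, assuming purely sequential stages. The stage names echo the list above, but every duration and the 30-day contracted target are hypothetical placeholders, not figures from this rollout.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    days: float  # hypothetical cycle time for this stage, in days


def end_to_end_days(stages):
    """Total replacement time, assuming each stage waits for the previous one."""
    return sum(s.days for s in stages)


def slowest_stage(stages):
    """The stage contributing most to the total: the first place to push back."""
    return max(stages, key=lambda s: s.days)


# Hypothetical durations, for illustration only.
stages = [
    Stage("Hardware supplier ships replacement platform", 21),
    Stage("OSS vendor diagnosis (hardware/software/config)", 3),
    Stage("Integrator log and data collection", 2),
    Stage("Freight, logistics and customs", 7),
    Stage("Customer change management approval", 10),
    Stage("Hand-offs, discussions, approvals, disputes", 5),
]

CONTRACTED_REPLACEMENT_DAYS = 30  # hypothetical contracted target

total = end_to_end_days(stages)
worst = slowest_stage(stages)
print(f"End-to-end: {total} days (target {CONTRACTED_REPLACEMENT_DAYS})")
print(f"Slowest stage: {worst.name} ({worst.days} days)")
if total > CONTRACTED_REPLACEMENT_DAYS:
    print("Target missed: hold local spares or renegotiate the slowest stage")
```

If some stages genuinely overlap (say, log collection running alongside the vendor's diagnosis), the simple sum overstates the total; it is the pessimistic case, which is usually the one worth sizing spares around.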

In the not-too-distant future I’m sure almost all new OSS will reside on large clusters of commodity hardware, but in the meantime it pays to understand your end-to-end spares process and plan contingencies.

Have you ever experienced similar sagas with hardware replacement that took far longer than you ever expected?

Read the Passionate About OSS Blog for more or Subscribe to the Passionate About OSS Blog by Email
