What is your spares policy?

“You need to understand what you are buying, and why, how it will affect your business, and what the potential risks are. That detailed understanding may be beyond the scope of a procurement department.”
Owen Williams.

It is standard practice for an OSS to have built-in resiliency. If one device, or even a whole site, fails, failover mechanisms ensure operational continuity or minimal downtime.

Don’t let this high-availability mantra cause you to lose sight of your spares process, though. I’ve recently been whacked by this tentacle of the OctopOSS whilst assisting an integrator to roll out a well-known vendor’s OSS to an enterprise customer.

The vendor’s OSS solution consisted of both software and customised hardware appliances, deployed as master nodes and a number of collectors spread throughout the enterprise customer’s network. Murphy’s Law teaches an integrator to expect device failures, but certainly not catastrophic failures of different types on almost every device within a 6-month period.

Exacerbating this problem was the fact that these appliances sometimes took months to replace, due to a dispute between the OSS vendor and their hardware suppliers. Compounding it further was the end-customer’s stringent change management process, which delayed the replacement of failed devices in their data centres on the premise that the devices had not yet been signed into production.

The moral of the story is to hold spares of all key components of your OSS and/or contract your vendor to hold spares on your behalf, and to insist upon contracted response and replacement times.

It probably also pays to test the cycle time of each piece of the end-to-end puzzle. In our case these were:

The hardware supplier providing a replacement customised hardware platform
The OSS vendor undertaking a diagnosis and determining whether it is a hardware, software or configuration fault
The integrators collecting logs or data to help with the diagnosis
Freight / logistics / customs entities to ship kit to sites
The enterprise customer’s change management processes
Not to mention the time taken for various responsibility hand-offs, inter-party discussions, approvals and potential disputes on responsibility and costs
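One way to put the cycle-time test into practice is to record how long each stage actually takes, then sum the stages to estimate the end-to-end time to restore a failed appliance and compare it against the contracted replacement time. The sketch below is a minimal illustration. It assumes the stages run strictly one after another, and the names `Stage`, `end_to_end_days`, `slowest_stage` and `breaches_contract` are hypothetical helpers, not part of any vendor tool. The durations in the example are placeholders, not figures from this rollout.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step in the end-to-end replacement chain (hypothetical model)."""
    name: str
    days: float  # measured or estimated duration of this step, in days


def end_to_end_days(stages: list[Stage]) -> float:
    """Total replacement time, assuming each stage starts only after the previous one ends."""
    return sum(stage.days for stage in stages)


def slowest_stage(stages: list[Stage]) -> Stage:
    """Return the stage contributing the most time: the first place to look for improvement."""
    return max(stages, key=lambda stage: stage.days)


def breaches_contract(stages: list[Stage], contracted_days: float) -> bool:
    """True if the estimated end-to-end time exceeds the contracted replacement time."""
    return end_to_end_days(stages) > contracted_days


if __name__ == "__main__":
    # Placeholder durations for illustration only; substitute your own measurements.
    chain = [
        Stage("Vendor diagnosis (hardware, software or configuration fault?)", 3),
        Stage("Integrator log/data collection", 1),
        Stage("Hardware supplier provides replacement appliance", 30),
        Stage("Freight, logistics and customs", 10),
        Stage("Customer change management approval", 14),
        Stage("Hand-offs, approvals and responsibility disputes", 7),
    ]
    worst = slowest_stage(chain)
    print(f"Estimated end-to-end replacement time: {end_to_end_days(chain)} days")
    print(f"Slowest stage: {worst.name} ({worst.days} days)")
    print(f"Breaches a 14-day contract: {breaches_contract(chain, 14)}")
```

Summing the stages treats the chain as purely sequential. In practice some steps (log collection and vendor diagnosis, for instance) may overlap, so the total is best read as a pessimistic estimate.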

In the not-too-distant future I’m sure almost all new OSS will reside on large clusters of commodity hardware, but in the meantime it pays to understand your end-to-end spares process and plan contingencies.

Have you ever experienced similar sagas with hardware replacement that took far longer than you ever expected?

January 29, 2014
Ryan
