A recurring impediment to successful implementation of AIOps: and one way to treat it

With our latest report, “AIOps of the Future: A Definitive Guide,” tentatively penciled in for launch two weeks from today, I thought I’d share a message that came through loud and clear from AIOps vendors and buyers alike: get the data right! Especially cross-domain data!

This has always been true in the world of OSS (how many OSS data death spirals have you witnessed?), but even more so in an AI-driven OSS of the future. Whereas humans can (arguably) help to smooth over discrepancies or inconsistencies in data and still proceed, an algorithm either makes incorrect inferences or needs to be heavily programmed / trained to accommodate data quality problems. If you’ve had problems with your data sources in the past, then you’ll probably want to initiate data fix programs before considering how AI / ML / algorithms can solve all your other problems.

And this reminds me of a potential solution to this problem (one of many possible solutions!).

It was around 2.5 years ago that we first proposed the idea of a DOC (Data Operations Centre) (probably not an ideal acronym, I must admit!!): a systematic approach to resolving every identified data fault, in much the same way a NOC / SOC resolves network / service faults. I haven’t implemented a DOC with any clients in the ensuing period. However, we currently have 3 separate clients who share my belief that a novel approach is required for systematically improving data integrity, and they are putting some of the building blocks in place to stand up a DOC. Exciting times.

As soon as any of our clients get a DOC up and running, we’ll be sure to inform you of the wins, losses and learnings over the journey.

Some of the main features of a DOC are as follows:

  • Data quality issues should be treated as data faults in much the same way as network / service faults
  • Each data fault needs to be treated individually, as a unique data point, not just as a collective to apply an algorithm to (although, as with network faults, we may choose to aggregate unique data faults and treat them as a collective)
  • Each data fault needs to be managed systematically (eg itemised, acknowledged, actioned, possibly assigned remediation workflows, repaired and closed)
  • There is an urgency to fixing each data fault, just as with network faults. People who experience the data fault may expect time-based data-fix SLAs to apply: firstly, so they can perform their actions with greater confidence / reliability; secondly, so the data faults don’t ripple out and cause contagion
  • There is a known contact point (eg phone number, drop-box, etc) for the DOC, so anyone who experiences a data issue knows how to log a fault. By comparison, in many organisations, if a field worker notes a discrepancy between their design pack and the real situation in the field, they just work around the problem and leave site without fixing the data fault/s. They invariably have no mechanism for providing feedback. The data problem continues to exist and will cause problems for the next field tech who comes to the same site. Note that there may also be algorithms / rules generating faults, not just humans
  • There are notifications upon closure and/or fix of a data fault (if needed), which helps to identify patterns and opportunities for automated or process-driven resolutions
  • We provide the DOC with fault management tools, like the ITSM tools we use to monitor and manage IT or network faults, but for managing data faults. It’s possible that we could even use our standard incident management tools, but with customisation to handle data type faults
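To make the lifecycle above a little more concrete, here’s a minimal sketch of how a data fault could be modelled as a ticket with enforced state transitions and a time-based fix SLA. All the names (`DataFault`, `FaultState`, the 24-hour SLA, the example sources) are illustrative assumptions, not part of any particular ITSM tool:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class FaultState(Enum):
    ITEMISED = "itemised"
    ACKNOWLEDGED = "acknowledged"
    IN_REMEDIATION = "in_remediation"
    REPAIRED = "repaired"
    CLOSED = "closed"

# Allowed transitions, mirroring the itemised -> acknowledged -> actioned ->
# repaired -> closed flow described above
TRANSITIONS = {
    FaultState.ITEMISED: {FaultState.ACKNOWLEDGED},
    FaultState.ACKNOWLEDGED: {FaultState.IN_REMEDIATION, FaultState.REPAIRED},
    FaultState.IN_REMEDIATION: {FaultState.REPAIRED},
    FaultState.REPAIRED: {FaultState.CLOSED},
    FaultState.CLOSED: set(),
}

@dataclass
class DataFault:
    fault_id: str
    source: str       # who/what logged it, e.g. "field-tech" or a rule engine
    description: str
    raised_at: datetime = field(default_factory=datetime.utcnow)
    sla: timedelta = timedelta(hours=24)   # illustrative data-fix SLA
    state: FaultState = FaultState.ITEMISED
    history: list = field(default_factory=list)

    def transition(self, new_state: FaultState) -> None:
        """Move the fault along its lifecycle, rejecting invalid jumps."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"Cannot move from {self.state.value} to {new_state.value}"
            )
        self.history.append((datetime.utcnow(), self.state, new_state))
        self.state = new_state

    def sla_breached(self, now: Optional[datetime] = None) -> bool:
        """True if the fault is still open past its fix SLA."""
        now = now or datetime.utcnow()
        return self.state is not FaultState.CLOSED and now > self.raised_at + self.sla
```

A fault logged by a field tech who spots a design-pack discrepancy might then flow through `ACKNOWLEDGED`, `IN_REMEDIATION`, `REPAIRED` and `CLOSED`, with the `history` list giving the audit trail needed to spot recurring patterns. The same structure works whether the fault was raised by a human or by a reconciliation rule.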

I’d love to hear from you if you’ve already introduced a DOC or something similar. Has it proved fruitful for you? Are there any lessons learned that you’d like to share?

I’d also love to hear your thoughts about an alternative acronym we should consider using as clearly the DOC nomenclature could cause problems (and maybe even litigation!) 🙂

If this article was helpful, subscribe to the Passionate About OSS Blog to get each new post sent directly to your inbox. 100% free of charge and free of spam.
