We keep shiploads of data in our OSS, don’t we? Just think about how much storage your OSS estate consumes.
With the cost of storage diminishing, it’s relatively cheap to retain all that potential for insight generation. The real cost of storing the data goes a little deeper than the $/MB though. Other cost factors include data curation, cleansing, database search performance, etc.
There’s a whole field of study relating to this, named Information Lifecycle Management (ILM), but let’s look at it in terms of relevance to OSS.
We collect information across different timescales, including real-time processing, short-term correlations, longer-term trending and long-term statutory / regulatory retention.
Note: I suspect the “Less Archive” box actually should say “Less Active”.
Diagram above sourced from here.
But rather than blindly storing everything, we could ask ourselves at what stage each data sub-set loses relevance. As our OSS data ages, it tends to deteriorate because the models it relies on also deteriorate. The factors behind model deterioration are numerous. They include those described in this recent post about a machine-learning PoC, as well as the following:
- Network devices change (including cards, naming conventions used, life-cycle upgrades, capacity, new alarm types, etc.)
- Network topologies change
- Business processes change
- Customer behaviours change
- Product / Service offerings change
- Regulations change
- New datasets become available
- Data models change to cope with gaps in the original models
Each of these factors (and more) leads to deterioration in the usefulness of baseline data. This means the insight signals in the data become less clear or, at worst, the baseline needs to be re-established, making old data invalid. If it’s invalid, then retention would appear to be pointless. Shifting it to the right through the storage types shown in the diagram above could also be pointless.
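To make the "baseline re-established" idea a little more concrete, here is a minimal sketch in Python. It assumes you keep a log of the dates on which each dataset's baseline had to be reset (for example after a topology change or a naming-convention change). The dataset names, dates and function names are all hypothetical, purely for illustration.

```python
from datetime import date

# Hypothetical log of events that forced a baseline to be re-established.
# Dataset names and dates are invented for illustration only.
baseline_resets = {
    "alarm_events": [date(2021, 3, 1), date(2023, 7, 15)],
    "performance_counters": [date(2022, 11, 30)],
}

def valid_from(dataset, resets=baseline_resets):
    """Return the most recent baseline reset date, or None if never reset."""
    events = resets.get(dataset, [])
    return max(events) if events else None

def is_pre_baseline(dataset, record_date, resets=baseline_resets):
    """True if a record predates the latest baseline reset for its dataset."""
    cutoff = valid_from(dataset, resets)
    return cutoff is not None and record_date < cutoff

# A record from before the latest reset is flagged, a later one is not.
print(is_pre_baseline("alarm_events", date(2022, 1, 10)))
print(is_pre_baseline("alarm_events", date(2024, 1, 10)))
```

A flag like this only marks data as a candidate for review. Anything held for statutory / regulatory reasons would still need to be retained regardless of whether its baseline is still valid.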
Very little of the OSS data you store is ever actually used, and even less of it as it ages. Do you have a heatmap of what data you use in your OSS?
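If you don’t have one, a rough heatmap can be built from query or access logs. The sketch below is one possible approach, not a definitive implementation. It assumes you can extract, for each access, the dataset name, the date of the record that was read and the date it was read. The log format, dataset names and age-bucket boundaries are all assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical age buckets: (label, upper bound in days, inclusive).
AGE_BUCKETS = [("0-30d", 30), ("31-365d", 365), ("1-3y", 1095)]
OLDEST_LABEL = "3y+"

def age_bucket(record_date, accessed_on):
    """Map the age of a record at the time it was accessed to a bucket label."""
    age_days = (accessed_on - record_date).days
    for label, upper in AGE_BUCKETS:
        if age_days <= upper:
            return label
    return OLDEST_LABEL

def usage_heatmap(access_log):
    """Count accesses per (dataset, age bucket) pair."""
    counts = Counter()
    for dataset, record_date, accessed_on in access_log:
        counts[(dataset, age_bucket(record_date, accessed_on))] += 1
    return counts

# Invented sample log: (dataset, record date, access date).
sample_log = [
    ("alarm_events", date(2024, 5, 30), date(2024, 6, 1)),
    ("alarm_events", date(2024, 5, 20), date(2024, 6, 1)),
    ("alarm_events", date(2023, 12, 1), date(2024, 6, 1)),
    ("inventory_snapshots", date(2020, 1, 1), date(2024, 6, 1)),
]

heat = usage_heatmap(sample_log)

# Print one row per dataset, one column per age bucket.
labels = [label for label, _ in AGE_BUCKETS] + [OLDEST_LABEL]
for dataset in sorted({d for d, _ in heat}):
    row = "  ".join(f"{label}:{heat[(dataset, label)]}" for label in labels)
    print(f"{dataset:<22}{row}")
```

Cells that stay at or near zero over a long observation period point to data sub-sets that may have lost their relevance and are worth reviewing before you keep paying to curate them.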