My six laws of data integrity

Data integrity law #1 – When being handled, the accuracy / integrity of a data set tends to degrade over time.

Data integrity law #2 – To prevent law #1 from making the data unusable, the data needs to be curated.

Data integrity law #3 – Curating data always carries a cost.

Data integrity law #4 – The more data, and the more referential integrity (i.e. cross-linking), the greater the costs.

Data integrity law #5 – If the same data is maintained in more than one place (without automated synchronisation), the decay described in law #1 is faster and the cost described in law #3 is higher.
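
As a rough illustration of law #5, here is a minimal Python sketch of how two unsynchronised copies of the same records can be compared for drift. The record layout and the names `find_drift`, `inventory` and `billing` are hypothetical, made up purely for this example.

```python
def find_drift(copy_a, copy_b):
    """Return keys whose records differ between the copies or exist in only one."""
    drift = {}
    for key in sorted(set(copy_a) | set(copy_b)):
        a = copy_a.get(key)  # None if the key is missing from this copy
        b = copy_b.get(key)
        if a != b:
            drift[key] = (a, b)
    return drift


# Two hand-maintained copies of the same device data (hypothetical).
inventory = {"dev-1": {"site": "SYD"}, "dev-2": {"site": "MEL"}}
billing = {
    "dev-1": {"site": "SYD"},
    "dev-2": {"site": "BNE"},
    "dev-3": {"site": "PER"},
}

drift = find_drift(inventory, billing)
print(drift)
```

Every key reported here is a record that someone has to investigate and reconcile, which is the extra curation cost that law #5 describes.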

Data integrity law #6 – To reduce costs and optimise integrity, retain only essential data, don’t duplicate it and keep cross-linking to a minimum.

The problem with law #6 is that it’s the cross-linking that often unearths the most dramatic insights.

[Edit: Dougie Stevenson rightly suggested a seventh data integrity rule – always use data snapshots rather than production databases to work on your data for BI purposes such as building new reports]
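
One way the snapshot rule might look in practice is sketched below, assuming the data lives in SQLite and using Python's standard `sqlite3` module. The `snapshot` helper, the `devices` table and its contents are hypothetical stand-ins for a real production system, not a description of any particular tool.

```python
import sqlite3


def snapshot(production, snapshot_path=":memory:"):
    """Copy a live SQLite database into a separate snapshot database."""
    snap = sqlite3.connect(snapshot_path)
    production.backup(snap)  # Connection.backup is available from Python 3.7
    return snap


# Stand-in for a production database (hypothetical schema and data).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE devices (id TEXT PRIMARY KEY, site TEXT)")
prod.execute("INSERT INTO devices VALUES ('dev-1', 'SYD')")
prod.commit()

snap = snapshot(prod)

# Experimental BI work touches the snapshot only.
snap.execute("UPDATE devices SET site = 'TEST'")
snap.commit()

prod_site = prod.execute("SELECT site FROM devices").fetchone()[0]
snap_site = snap.execute("SELECT site FROM devices").fetchone()[0]
print(prod_site, snap_site)
```

Changes made while building a report stay in the snapshot, so the production copy is not exposed to the handling that law #1 warns about.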

2 thoughts on “My six laws of data integrity”

  1. Nice. I tend to agree.

    When I do BI sorts of things, reporting, etc., I want to leave the reference data alone and use snapshots to do my sort of work.

    In the snapshots – I consider them to be just that – SNAPSHOTS.

    Anyway, Application data structures may not be the right thing for Reporting… 😉

  2. Great additional advice Dougie.
    Especially in highly-available systems, like we tend to use, working with an offline snapshot of data is a great rule too!!
