My six laws of data integrity

Data integrity law #1 – As a data set is handled, its accuracy and integrity tend to degrade over time.

Data integrity law #2 – To prevent law #1 from making the data unusable, the data needs to be curated.

Data integrity law #3 – Curating data always carries a cost.

Data integrity law #4 – The more data there is, and the more referential integrity (i.e. cross-linking) it carries, the greater the cost of curating it.

Data integrity law #5 – If the same data is maintained in more than one place (without automated synchronisation), the decay of law #1 is faster and the cost of law #3 is higher.

Data integrity law #6 – To reduce costs and optimise integrity, retain only essential data, don’t duplicate it and keep cross-linking to a minimum.

The problem with law #6 is that it’s the cross-linking that often unearths the most dramatic insights.
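To make law #5 a little more concrete, here is a minimal Python sketch of the kind of check you end up paying for when the same records live in two places. Everything in it is an assumption for illustration: the `find_drift` helper, the `inventory` and `billing` copies, and the circuit IDs and fields are all hypothetical, and a real reconciliation would need to cope with far messier data.

```python
def find_drift(primary, secondary):
    """Compare two copies of the same records, keyed by record id.

    Returns the ids missing from either side and the ids whose
    contents differ between the two copies.
    """
    primary_ids = set(primary)
    secondary_ids = set(secondary)
    return {
        "missing_in_secondary": sorted(primary_ids - secondary_ids),
        "missing_in_primary": sorted(secondary_ids - primary_ids),
        "mismatched": sorted(
            record_id
            for record_id in primary_ids & secondary_ids
            if primary[record_id] != secondary[record_id]
        ),
    }


# Hypothetical example: the same circuits held in two systems
# that are maintained by hand, with no automated synchronisation.
inventory = {
    "CCT-001": {"status": "active", "bandwidth_mbps": 100},
    "CCT-002": {"status": "active", "bandwidth_mbps": 50},
    "CCT-003": {"status": "retired", "bandwidth_mbps": 10},
}
billing = {
    "CCT-001": {"status": "active", "bandwidth_mbps": 100},
    "CCT-002": {"status": "active", "bandwidth_mbps": 200},  # edited in one place only
    "CCT-004": {"status": "active", "bandwidth_mbps": 20},   # never added to inventory
}

drift = find_drift(inventory, billing)
for category, record_ids in drift.items():
    print(category, record_ids)
```

Every extra copy adds another pairing like this to reconcile, which is one way of seeing why duplication pushes up the curation cost in law #3.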

[Edit: Dougie Stevenson rightly suggested a seventh data integrity law – for BI purposes such as building new reports, always work on data snapshots rather than on production databases.]
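As a rough illustration of the snapshot suggestion above, the sketch below uses SQLite purely because it ships with Python; it is not a claim about which database your systems run on. The `snapshot_database` helper, the `circuits` table and its contents are hypothetical. The one real library call it leans on is `sqlite3.Connection.backup`, which copies a live database into another connection.

```python
import sqlite3


def snapshot_database(source, snapshot_path=":memory:"):
    """Copy the whole of `source` into a new SQLite database and return it."""
    snapshot = sqlite3.connect(snapshot_path)
    source.backup(snapshot)  # standard-library online backup API
    return snapshot


# Hypothetical "production" database, held in memory for the demo.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE circuits (id TEXT PRIMARY KEY, status TEXT)")
production.execute("INSERT INTO circuits VALUES ('CCT-001', 'active')")
production.commit()

snapshot = snapshot_database(production)

# Experiment freely on the copy; the production data is not touched.
snapshot.execute("UPDATE circuits SET status = 'retired'")
snapshot.commit()

production_status = production.execute("SELECT status FROM circuits").fetchone()[0]
snapshot_status = snapshot.execute("SELECT status FROM circuits").fetchone()[0]
print("production:", production_status)
print("snapshot:", snapshot_status)
```

In practice you would pass a file path rather than `":memory:"` so the snapshot survives the session, and the report-building work would only ever connect to that file.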


2 Responses

  1. Nice. I tend to agree.

    When I do BI sorts of things, reporting, etc., I want to leave the reference data alone and use snapshots to do my sort of work.

    In the snapshots – I consider them to be just that – SNAPSHOTS.

    Anyway, application data structures may not be the right thing for reporting… 😉

  2. Great additional advice Dougie.
    Especially in highly-available systems, like we tend to use, working with an offline snapshot of data is a great rule too!!
