Data integrity law #1 – As a data set is handled, its accuracy and integrity tend to degrade over time.
Data integrity law #2 – To prevent law #1 from making the data unusable, the data needs to be curated.
Data integrity law #3 – Curating data always carries a cost.
Data integrity law #4 – The more data there is, and the more referential integrity (i.e. cross-linking) it carries, the greater the cost of curating it.
Data integrity law #5 – If the same data is maintained in more than one place (without automated synchronisation), the decay described in law #1 is faster and the cost described in law #3 is higher.
Data integrity law #6 – To reduce costs and optimise integrity, retain only essential data, don’t duplicate it and keep cross-linking to a minimum.
The problem with law #6 is that it’s the cross-linking that often unearths the most dramatic insights.

[Edit: Dougie Stevenson rightly suggested a seventh data integrity rule – always use data snapshots rather than production databases when working on your data for BI purposes, such as building new reports.]
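
To make the snapshot rule concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the production data lives in an SQLite database; the `devices` table, its rows and the `snapshot` helper are all hypothetical. It uses the standard library's `sqlite3.Connection.backup` to copy the data, then runs the experimental BI work against the copy only.

```python
import sqlite3


def snapshot(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy a database into an independent in-memory snapshot (hypothetical helper)."""
    snap = sqlite3.connect(":memory:")
    production.backup(snap)  # standard library call, available since Python 3.7
    return snap


# Hypothetical stand-in for a production inventory database.
production = sqlite3.connect(":memory:")
production.execute(
    "CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT, site TEXT)"
)
production.executemany(
    "INSERT INTO devices (name, site) VALUES (?, ?)",
    [("router-1", "SYD"), ("switch-7", "MEL")],
)
production.commit()

# All report-building experiments happen on the snapshot, never on production.
bi = snapshot(production)
bi.execute("UPDATE devices SET site = 'UNKNOWN' WHERE name = 'switch-7'")
bi.commit()

prod_sites = [row[0] for row in production.execute("SELECT site FROM devices ORDER BY id")]
snap_sites = [row[0] for row in bi.execute("SELECT site FROM devices ORDER BY id")]
```

The careless update changes only the snapshot, so the production rows are left untouched. To stay on the right side of law #5, the snapshot is probably best treated as disposable: throw it away and take a fresh one rather than maintaining it as a second copy of the data.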