Data integrity law #1 – When being handled, the accuracy / integrity of a data set tends to degrade over time.
Data integrity law #2 – To prevent law #1 from making the data unusable, the data needs to be curated.
Data integrity law #3 – Curating data always carries a cost.
Data integrity law #4 – The more data and the more referential integrity (i.e. cross-linking), the greater the cost of curation.
Data integrity law #5 – If the same data is maintained in more than one place (without automated synchronisation), the decay described in law #1 is faster and the cost described in law #3 is higher.
Data integrity law #6 – To reduce costs and optimise integrity, retain only essential data, don’t duplicate it and keep cross-linking to a minimum.
The problem with law #6 is that it’s the cross-linking that often unearths the most dramatic insights.
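To make the curation cost in laws #3 and #4 a little more concrete, here is a minimal sketch of one kind of curation check: finding cross-links that have gone stale. The `customers` and `orders` records and the `find_orphans` helper are hypothetical, invented purely for illustration.

```python
def find_orphans(orders, customers):
    """Return the orders whose customer_id no longer matches any customer."""
    known_ids = {customer["id"] for customer in customers}
    return [order for order in orders if order["customer_id"] not in known_ids]


# Hypothetical data: customer 3 was removed, but an order still points at it.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 3},  # broken cross-link
]

orphans = find_orphans(orders, customers)
print([order["id"] for order in orphans])
```

Every extra cross-link between data sets needs a check like this one, which is roughly why the cost in law #4 grows with the amount of linking.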
[Edit: Dougie Stevenson rightly suggested a seventh data integrity rule – always use data snapshots rather than production databases to work on your data for BI purposes such as building new reports]
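As a rough illustration of the snapshot rule, the sketch below copies a database and does the experimental work on the copy only. It assumes SQLite and Python's built-in `sqlite3` module (whose `Connection.backup` method needs Python 3.7 or later). The `sales` table and the in-memory stand-in for "production" are made up for the example.

```python
import sqlite3


def take_snapshot(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the production database into a separate in-memory database."""
    snapshot = sqlite3.connect(":memory:")
    production.backup(snapshot)  # copies the whole database into the snapshot
    return snapshot


# Stand-in for a production database (hypothetical schema and rows).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
production.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("north", 100), ("south", 250)]
)
production.commit()

snapshot = take_snapshot(production)

# Experimental report work touches the snapshot only.
snapshot.execute("DELETE FROM sales WHERE region = 'south'")
snapshot.commit()

production_rows = production.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
snapshot_rows = snapshot.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(production_rows, snapshot_rows)
```

Whatever goes wrong while building the report, the production rows are left untouched. A real setup would more likely snapshot to a file or a separate reporting server, but the principle is the same.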
2 Responses
Nice. I tend to agree.
When I do BI sorts of things (reporting, etc.), I want to leave the reference data alone and use snapshots to do my sort of work.
In the snapshots – I consider them to be just that – SNAPSHOTS.
Anyway, Application data structures may be the right thing for Reporting…
Great additional advice Dougie.
Especially in highly-available systems, like we tend to use, working with an offline snapshot of data is a great rule too!!