To reduce OSS dark data (or not)?

Dark data is the name for data that is collected but never used.
It’s said that 96-98% of all data is dark data (not that I can confirm or deny those claims).

Dark data forms the bottom layer in the DIKW (Data, Information, Knowledge, Wisdom) hierarchy below.
[Image: DIKW hierarchy]

What do you think the dark data percentage would be within OSS? Or, more specifically, within your OSS?
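If you wanted a rough first answer for your own OSS, one way (of many) is to compare the fields you collect against the fields that anything downstream actually reads. The snippet below is only a minimal sketch under that assumption. The function name and the field names are hypothetical, and it counts fields rather than data volume, so treat the result as an indication rather than a measurement.

```python
def dark_data_percentage(collected, used):
    """Return the share of collected fields that are never used, as a percentage."""
    collected = set(collected)
    if not collected:
        # Nothing collected means nothing can be dark.
        return 0.0
    # A field is "dark" if it is collected but never read by anything.
    dark = collected - set(used)
    return 100.0 * len(dark) / len(collected)


# Hypothetical field names, purely for illustration.
collected_fields = ["alarm_id", "severity", "raw_syslog", "cpu_temp", "fan_speed"]
used_fields = ["alarm_id", "severity"]

pct = dark_data_percentage(collected_fields, used_fields)
print(f"{pct:.0f}% of collected fields are dark")
```

In practice the hard part is building the "used" list, which might come from report definitions, query logs or integration mappings, depending on what your OSS exposes.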

If you’re not going to use it, then why collect it?

I have two conflicting trains of thought here:

  • The Minimum Viable Data perspective; and
  • It’s relatively cheap and easy to collect and store raw data if an interface is already built, so hoard it all just in case your data scientists (or automated algorithms) ever need it.

Where do you sit on the data collection spectrum?

