Mining OSS data for gold

An OSS’s reason for being is to deliver efficiency and insights (and monitoring).

If we focus on the insights component of that, where do we look for insights? In many cases, we home in on particular insights because we’re looking for evidence of something specific, to either prove or disprove a concept. In a way, it’s like an investigator asking a suspect a range of questions from different angles, with the aim of establishing their guilt or innocence.

But if we’re looking for more general insights in the future, perhaps a machine learning future, where would we start looking in the enormous lake of data that an OSS collects and curates?

Do we use the academic citation principle (i.e. how many other papers cite a given paper), much in the way that Google search weighs the relevance of a page for any given search? To do this, do we look at the number of “hits” a certain data set has had, or at the number of similar questions that have been asked of the data? Or do we go completely contrarian and look at the data that hasn’t already been mined?
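To make the “hits” idea a little more concrete, here is a minimal Python sketch. It assumes we have a log of which data set each past query touched, plus a catalogue of every data set the OSS holds. The data set names, the log and both helper functions are hypothetical, invented purely for illustration; a real OSS would need its own way of capturing query history.

```python
from collections import Counter

# Hypothetical query log: each entry names the data set a past query touched.
query_log = [
    "alarms", "alarms", "performance_counters", "alarms",
    "inventory", "performance_counters",
]

# Hypothetical catalogue of every data set the OSS holds.
catalogue = [
    "alarms", "performance_counters", "inventory",
    "trouble_tickets", "syslog",
]


def rank_by_hits(log, datasets):
    """Order data sets from most-queried to least-queried (the citation-style view)."""
    hits = Counter(log)
    # sorted() is stable, so data sets with equal hit counts keep catalogue order.
    return sorted(datasets, key=lambda name: hits[name], reverse=True)


def unmined(log, datasets):
    """List data sets that no past query has touched (the contrarian view)."""
    seen = set(log)
    return [name for name in datasets if name not in seen]


popular = rank_by_hits(query_log, catalogue)
untouched = unmined(query_log, catalogue)

print("Most-mined first:", popular)
print("Never mined:", untouched)
```

The same log supports both approaches: the top of the ranked list points to the vicinity of previous strikes, while the untouched list points to ground nobody has dug yet.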

To put it another way, when prospecting for insights from your OSS, do you dig for gold in the vicinity of previous gold strikes (like the tailings and mines of past prospectors), do you look where nobody else has looked before, or do you use the patterns of past successes and look for similar formations?

Speaking as an OSS consultant who has worked for many clients, I’d probably choose the last of these (similar formations between companies, i.e. asking the questions that have produced gold previously). For an operator who knows their mining history and site well, though, perhaps it’s the first (the vicinity of previous strikes).
