“If you torture the data long enough, it will confess.”
A subscriber to this blog named Paul recently asked a thought-provoking question about the context of data, as discussed in this earlier post. He asked whether by “context” I meant coming at the data with a question in mind that you’re looking to answer.
My definition of “context” is slightly different, though related, and the question is thought-provoking on many levels. Importantly, it made me rack my brain about the different approaches I’ve used when trying to gain insight from the many data sets I’ve processed in the past.
In some cases, I definitely do tackle the data with a question in mind. Examples include identifying why exceptions or faults have occurred, looking for particular trends or anomalies that showed up in similar earlier data sets, investigating a hunch, trying to better understand the internal workings of a system, or simply trying to answer a question posed by a colleague or customer.
But in other cases, insights evolve from observing the data with no preconceived ideas. Examples include analysis of outliers or rare events, analysis of the most common events or trends, variations from standards (eg naming convention mismatches), identification of possible linking keys to join data sets, and pattern or time matching that identifies root causes or relationships.
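To make a few of those open-ended checks concrete, here is a minimal sketch of what a first pass might look like. Everything in it is an assumption for illustration only: the alarm and inventory records, the field names, the SITE-TYPE-NNN naming convention and the helper functions are all invented, and it only covers event frequency, rare events, naming mismatches and candidate linking keys.

```python
import re
from collections import Counter

# Hypothetical alarm records (device names and event types are invented).
alarms = [
    {"device": "SYD-RTR-001", "event": "LINK_DOWN"},
    {"device": "SYD-RTR-001", "event": "LINK_DOWN"},
    {"device": "MEL-RTR-002", "event": "LINK_DOWN"},
    {"device": "MEL-RTR-002", "event": "HIGH_TEMP"},
    {"device": "bne_rtr_3", "event": "FAN_FAIL"},
]

# Hypothetical inventory records from a second data source.
inventory = [
    {"name": "SYD-RTR-001", "site": "Sydney"},
    {"name": "MEL-RTR-002", "site": "Melbourne"},
    {"name": "PER-RTR-004", "site": "Perth"},
]

# Assumed naming convention: three letters, three letters, three digits.
NAMING_CONVENTION = re.compile(r"[A-Z]{3}-[A-Z]{3}-\d{3}")


def event_frequencies(records):
    """Count how often each event type appears."""
    return Counter(r["event"] for r in records)


def rare_events(records, max_count=1):
    """Return event types seen no more than max_count times."""
    counts = event_frequencies(records)
    return sorted(e for e, n in counts.items() if n <= max_count)


def naming_mismatches(records, field="device"):
    """Return distinct values that do not follow the assumed convention."""
    return sorted(
        {r[field] for r in records if not NAMING_CONVENTION.fullmatch(r[field])}
    )


def key_overlap(left, left_field, right, right_field):
    """Fraction of distinct left-hand values that also appear on the right."""
    left_vals = {r[left_field] for r in left}
    right_vals = {r[right_field] for r in right}
    return len(left_vals & right_vals) / len(left_vals) if left_vals else 0.0


most_common = event_frequencies(alarms).most_common(1)
rare = rare_events(alarms)
mismatches = naming_mismatches(alarms)
overlap = key_overlap(alarms, "device", inventory, "name")

print(most_common, rare, mismatches, round(overlap, 2))
```

None of these checks starts from a question. They simply surface what is most common, what is rare, what breaks the pattern and which fields might join two data sets, and the questions tend to follow from what turns up.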
How do you approach it? Using Paul’s approach or a more zen-like approach?
Would you like me to earmark this question for further research into how expert data analysts approach their data?
BTW, when I referred to “context” in the previous article, I meant a level of understanding of what the data means. For example, if you’re looking at inventory data from a particular network type, it’s important to know about the devices, chassis, cards, ports, services, configurations, etc that make up that network, and not just to understand the stanzas of XML that the network’s northbound interface (NBI) spits out. “Context” relates to today’s topic too, because the more you understand the context of the data you’re looking at (ie the deeper your subject matter expertise), the more questions you can ask of the data presented to you.