Down with big data?

The amount of data that enterprises are storing and managing is growing rapidly – various industry estimates indicate that data volume is doubling every 2-3 years. The rapid growth of data presents daunting challenges for IT, both in cost and performance.
Although the cost of storage keeps declining, fast-growing data volumes make storage one of the costliest elements of most IT budgets. In addition, the accelerating growth of data makes it difficult to meet performance requirements while staying within budget.
Information Lifecycle Management (ILM) is intended to address these challenges by storing data in different storage and compression tiers, according to the enterprise’s current business and performance needs. This approach offers the possibility of optimizing storage for both cost savings and maximum performance.
From a data optimisation white paper by Oracle.

I recently heard that the typical organisation uses 0.05% of the data it collects. I haven’t been able to find the research that backs this up, but let’s assume that this is correct. This implies that 99.95% of the data that is collected, stored and (perhaps) curated is never utilised.

As stated by Wikipedia, “The Efficiency Movement was a major movement in the United States, Britain and other industrial nations in the early 20th century that sought to identify and eliminate waste in all areas of the economy and society, and to develop and implement best practices. The concept covered mechanical, economic, social, and personal improvement. The quest for efficiency promised effective, dynamic management rewarded by growth. As a result of the influence of an early proponent, it is more often known as Taylorism [after Frederick Winslow Taylor].”

But that was the Industrial Age. Are there parallels to be drawn in the Information Age? Are we awaiting a Data Efficiency Movement to eliminate wastage within the digital businesses of today? If so, we’re starting from a very low base (i.e. 0.05% utilisation), aren’t we?

But what does this mean for OSS? We do spend a lot of time and money collecting data with our OSS. How many of us know how much of the collected data is actually used? Our database administrators probably have the tools to indicate the “heat” of each data set stored in our databases (as discussed in the Oracle article).
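As a rough illustration of what a “heat” report might look like, here is a minimal Python sketch. It assumes you can export a simple inventory of data sets with their size and last read date. The thresholds (30 and 365 days), the function names and the sample inventory are all hypothetical, not taken from the Oracle paper or any particular database tool.

```python
from datetime import date

# Hypothetical thresholds (days since last read); tune to your own policy.
HOT_DAYS = 30
WARM_DAYS = 365


def classify_heat(last_read, today):
    """Return 'hot', 'warm' or 'cold' for a data set, based on its last read date.

    A data set that has never been read (last_read is None) counts as cold.
    """
    if last_read is None:
        return "cold"
    age_days = (today - last_read).days
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    return "cold"


def used_fraction(datasets, today):
    """Fraction of stored bytes held in data sets that are hot or warm."""
    total = sum(size for _, size, _ in datasets)
    if total == 0:
        return 0.0
    used = sum(
        size
        for _, size, last_read in datasets
        if classify_heat(last_read, today) != "cold"
    )
    return used / total


if __name__ == "__main__":
    today = date(2024, 1, 1)
    # (name, size in bytes, last read date): made-up inventory for illustration
    inventory = [
        ("alarm_events", 500, date(2023, 12, 20)),
        ("perf_counters", 9000, None),
        ("inventory_snapshots", 500, date(2023, 6, 1)),
    ]
    for name, _, last_read in inventory:
        print(name, classify_heat(last_read, today))
    print(f"used fraction: {used_fraction(inventory, today):.2%}")
```

Even a crude report like this would tell us how far from (or close to) that 0.05% figure our own OSS really is, and which data sets are candidates for cheaper storage tiers rather than deletion.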

Like the Efficiency Movement, we could use that knowledge to reduce costs by minimising the data we collect and manage. An earlier post on Minimum Viable Data (MVD) discusses some of the opportunities that exist. However, unlike the Efficiency Movement of the twentieth century, we don’t necessarily want to discard all that waste. Instead, with modern-day big data tools (and perhaps more advanced machine learning in the future), we now have the means to do more with the remaining 99.95%. That could be a very profitable 99.95% if properly harnessed, perhaps by the OSS gold mining technique?
