We are drowning in data
The possibilities of predictive analytics with big data are rapidly evolving. Never before have we had the volume of data that is currently at our disposal. Not only are these data sources growing exponentially in size, but new sources are also being made available all the time.
Note: 1 petabyte = 1,000,000,000 megabytes; Source: IBM, Domo.com, The Economist, The Guardian, IDC
It has got to the stage where there is so much data out there that it has become impossible to digest it all. Current estimates suggest that the world produces an average of 400 MB of information for every person on the planet… every single day.
Thinking about that another way, it’s like having a 95,000-page report land on your desk every morning, and everybody else’s desk.
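The arithmetic behind that comparison is simple to check. A rough sketch, assuming (my figure, not the article's) about 4.2 KB of plain text per printed page:

```python
# Rough check of the "95,000-page report" comparison.
DATA_PER_PERSON_MB = 400   # MB of data produced per person per day (from the article)
KB_PER_PAGE = 4.2          # assumed size of one plain-text page, in KB

pages_per_day = DATA_PER_PERSON_MB * 1000 / KB_PER_PAGE
print(round(pages_per_day))  # ~95,000 pages
```

Any page-size assumption in that ballpark lands on roughly the same figure, which is why the comparison holds up.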
The crucial aspect, though, is that the context of this data is so diverse it covers every imaginable industry. Perhaps the most exciting opportunity is that data consumers (machines and not people) are becoming more intelligent. Data is no longer restricted to numbers and now includes text, speech, and even imagery.
Finding the important data points
With this deluge of data comes the problem of filtering out what’s irrelevant and finding the important data points. It’s the proverbial needle in a haystack. If the cost of discovering value exceeds the benefit, then the whole point of big data is lost. We may as well send it all to the recycling bin.
Fortunately, alongside the explosion of data, we have seen enormous leaps in the technology that helps us make sense of it. Current computing power and internet speeds also allow us to acquire and process data in time frames that were unimaginable in the past.
Recent advances in cloud computing, and even on-premises clusters running Hadoop distributions such as IBM BigInsights or Cloudera, allow us to process and organise data with unbelievable speed. This capability is relatively affordable for all but the smallest of firms.
There is an international technology competition, running since the 1980s, in which contestants pitch super-fast computers against each other. The test is simple: each contestant must sort as much data as possible in 60 seconds or less.
Note: 1 terabyte = 1,000 gigabytes; Source: http://sortbenchmark.org/
In 1995 the record was around 1.1 gigabytes per minute, but technology improved rapidly over the next few decades, and in 2010 the winner hit a fantastic 500 gigabytes per minute.
Furthermore, just five years later, in 2015, that record was smashed by FuxiSort, which achieved an incredible 15 terabytes per minute. That's like being able to skim-read 38,000 of those 95,000-page reports, or 3.5 billion pages, every minute.
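The same back-of-the-envelope arithmetic connects the sort record to the earlier report analogy; a quick sketch, reusing the article's 400 MB-per-person-per-day figure:

```python
# Rough check of the FuxiSort comparison.
SORT_RATE_TB_PER_MIN = 15   # 2015 record, from the article
REPORT_SIZE_MB = 400        # one person's daily data, i.e. one "95,000-page report"
PAGES_PER_REPORT = 95_000

reports_per_min = SORT_RATE_TB_PER_MIN * 1_000_000 / REPORT_SIZE_MB
pages_per_min = reports_per_min * PAGES_PER_REPORT
print(round(reports_per_min))       # ~38,000 reports per minute
print(pages_per_min / 1e9)          # ~3.5 billion pages per minute
```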
Analytical tools take big leaps
This evolution has provided tools and opportunities to explore and analyse large, diverse data sets very, very quickly. We can run analyses that produce results with such speed that we can experiment and explore the fringe. It means we have the data, and the tools, to find correlations that are not obvious to the human eye.
Modern day data analytics allow us to dig deeper into the data and find underlying sentiments, emotions, or even concepts.
We are now armed with a broader range of tools and approaches that open up areas of research that were previously inaccessible, and complex analytical models are more commonplace thanks to the efforts of the open source community. For example, optimisation has benefited from many different fields of study:
- Biological advances in neurology have given us neural networks
- Newton's laws of gravity have been used to solve logistics problems
- Botany has given us heuristic models that use plant root paths to simulate optimal solutions
- Machine learning and hyper-heuristics come from developments in artificial intelligence
- Simulated annealing has evolved from the Markov chain Monte Carlo method and quantum mechanics
Possibly the most significant outcome has been the accessibility of these tools, due to the proliferation of the open source mindset. Were it not for the determination and collective focus of the open source community, many of the tools we have today would simply not have reached consumers as off-the-shelf products.
And this open source revolution has spurred on the technology leaders to provide platforms and mechanisms for collaboration and development. Because let’s face it, it’s only through collaboration that we make those big leaps.
Dynamic models become the norm
Dynamic predictive analytics is made possible by the advances in big data and computing power.
Governments around the world are opening up their data sets to the public, as are organisations like eBay, Google, etc. Many of these public and commercial sources are available in real-time or near real-time. This provides opportunities to understand what is happening now, rather than then.
What an enormous advantage in any industry!
Connected to real-time data and dynamic in nature, models no longer need to be static. They can be live. With continually streaming data, immensely powerful machines, and numerous candidate models, exploration can co-exist alongside prediction. You can even explore the fringe of your data as you forecast where your business is heading.
Add backcasting into the mix, and you quickly understand the strengths in your models and how they can be improved.
The future beckons
The availability of vast volumes of real-time external data provides the backbone for a richer and stronger basis for forecasting.
Not only does it broaden the scope of what is achievable, but it also enables us to ask the right questions of the data.
Technology is evolving at an increasing pace. What we can achieve now is far superior to the capabilities we had even five years ago. The problem, though, is that many of us struggle to keep pace with the changes. As a result, we rarely have time to reassess how we think about our issues. Worse, we still think in terms of the technology we know and learned in the past, and are unsure what this new era means for our organisations and us.
I argue that organisations and individuals need to cast off the skeuomorphic view of technology and embrace the avalanche of data with a curious mindset. Those who do will thrive. Those who don't will merely survive.
Note: This article was first presented at the Credit Suisse 2016 Technology Conference held in Sydney on 15 March 2016.