
Big data, bigger aspiration, little expertise

People from every sector, whether industry, sports, health care or national policymaking, are now obsessed with big data and aspire to use it in every aspect of life

Atanu Biswas
Last Updated : Sep 03 2018 | 11:40 PM IST
Barack Obama might have a reason for special enthusiasm over big data, as the 2012 US presidential election was perhaps the first major electoral battle in the world to be dominated by big data analytics more than anything else. The Obama campaign spent $1.5 billion on, among other things, 66,000 computer simulations every day, and the rest is history. Hillary Clinton, however, might have had a completely different experience with big data: her supporters are still trying to understand how her analytics-driven campaign in the 2016 presidential election, which used an algorithm called ‘Ada’, failed to defeat Trump’s campaign, which was apparently more atmospheric than scientific. Big data analytics, then, might not be Aladdin's lamp!
 
Big data is perhaps the biggest hype of recent years. People from every sector, whether industry, sports, health care or national policymaking, are now obsessed with big data and aspire to use it in every aspect of life.
 
Big data comprises a large number of variables, each carrying loads of data. Collecting data from every possible source is the fashion nowadays, quite often without any idea of what to do, or what can be done, with it. And quite often we do not know how to analyse data with so many variables, many of which may have complicated and unknown relationships with one another. The number of possible pairs of variables, and hence the number of pairs that can show a significant correlation, grows in the order of the square of the number of variables. Even ‘independent’ pairs of variables might exhibit high correlation; for example, the divorce rate in Maine, US, during 2000-2009 correlates nicely with per capita consumption of margarine in those years. The number of such ‘spurious’ or ‘nonsense’ correlations also grows in the order of the square of the number of variables. More than five years ago, Nassim Nicholas Taleb, author of the bestseller The Black Swan: The Impact of the Highly Improbable, illustrated through a simulation exercise that with 500 'independent' variables the number of 'significant' spurious correlations is nearly 6,000, whereas this number grows to 140,000 for 2,500 'independent' variables! Certainly, correlation does not imply causation, but in real life it is almost impossible to pick out the 'spurious' ones among millions of correlations involving thousands of variables.
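The quadratic growth described above can be illustrated with a small simulation. The sketch below is not Taleb's exercise; it is a minimal illustration under stated assumptions: every variable is drawn independently from a standard normal distribution, each has 20 observations, and a pair is counted as 'significant' when the absolute Pearson correlation exceeds 0.444, which is approximately the two-sided 5 per cent critical value for 20 observations. The function names (pair_count, pearson, count_spurious) and all parameter values are hypothetical choices made for illustration.

```python
import itertools
import math
import random


def pair_count(num_vars):
    """Number of distinct pairs among num_vars variables: n * (n - 1) / 2."""
    return num_vars * (num_vars - 1) // 2


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_vars, num_obs=20, threshold=0.444, seed=0):
    """Count pairs of independent variables whose |r| exceeds the threshold.

    Assumption: 0.444 approximates the 5% two-sided critical value of r
    for 20 observations; change it if num_obs changes.
    """
    rng = random.Random(seed)
    # Every variable is generated independently, so any 'significant'
    # correlation found below is spurious by construction.
    data = [[rng.gauss(0.0, 1.0) for _ in range(num_obs)]
            for _ in range(num_vars)]
    return sum(
        1
        for a, b in itertools.combinations(data, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, "variables:", pair_count(n), "pairs,",
              count_spurious(n), "spurious 'significant' correlations")
```

Under these assumptions roughly one pair in twenty should cross the threshold by chance alone, so each doubling of the number of variables should roughly quadruple the count of spurious correlations, although individual runs will vary with the seed.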
 
With the ever-expanding horizon of the Internet of Things (IoT), big data is continuously becoming bigger. The growth of data is exponential: the size of the digital universe will double every two years beyond 2020. And we do not know how to leverage that volume of data, for we have neither the statistical expertise to handle thousands of variables and eliminate 'spurious' correlations, nor suitable computational algorithms and equipment to handle billions of data points. Even where algorithms are available, standard computers are inadequate for this gigantic volume of data.
 
However, the ocean of big data contains limitless possibilities, and the aspiration to extract knowledge from the heartbeats of big data is also huge. The problem is that present technology and expertise are still primitive. Let us be honest enough to admit that. Still, our adventure might succeed in some particular cases, given special prior knowledge and expertise in the topic and, of course, effective use of ‘instinct’, but certainly not in general. That's why I'm very sceptical about running routine software packages to analyse big data; we need instead to develop the required tools very carefully, case by case. And that is a time-consuming research exercise that can only be performed by top statisticians and computer scientists working together.
 
There are, of course, some big data success stories. In their 2003 book Scoring Points: How Tesco Continues to Win Customer Loyalty, Clive Humby, Terry Hunt and Tim Phillips discussed how the UK-based grocer Tesco fuelled rapid growth by analysing data on customer purchase behaviour. Today we have an unprecedented ability to collect and store data. But we should always be very careful in monitoring infrastructure to understand individuals' life pathways from loads of data.
 
In May 2017, Cisco reported that only 26 per cent of survey respondents were successful with IoT initiatives, indicating a 74 per cent failure rate. In November 2017, Gartner analyst Nick Heudecker estimated that about 85 per cent of big data projects fail. My personal belief is that the actual failure percentage is even higher, as 'success' is not well defined in most situations involving big data, making it difficult to gauge the quantum of failures, or even to understand what a 'failure' is. When an organisation is happy with the apparent 'success' of a strategy framed by big data analytics, it fails to understand what more could have been done, unless the endeavour collapses like the Google Flu Trends experiment. There is also serious doubt about data quality in most cases: according to a Harvard Business Review article of September 2017, only 3 per cent of companies’ data meets basic quality standards.
 
In the 1958 Hollywood movie The Blob, a meteorite lands in a small Pennsylvania town carrying an alien amoeba, which expands and swallows up people and structures, threatening to envelop the whole town. Today's 'big data' sometimes resembles that amoeba, which devours everything. In the process, big data is getting bigger, and so are our aspirations. However, our capacity to handle data has not grown proportionately. In the six-decade-old movie, the air force finally has to swoop in and airlift the amoeba to the Arctic. Well, is that the appropriate way to stop the 'Blob' until one is equipped to handle it?

The writer is professor of statistics at the Indian Statistical Institute, Kolkata

