Monday 30 April 2018

Beware the big data rush

Bank of England chief economist Andy Haldane today gave a speech entitled Will Big Data Keep Its Promise? in which he assessed the contribution that big data can make to improving decision making in finance and macroeconomics. Whilst I agree that this is indeed a subject that offers significant potential, we do have to be mindful of the downsides associated with the data trails we leave as we live our lives in a digital world.

In 2005 there were around 1 billion global internet users; today there are estimated to be almost 3.5 billion. Just as important, there has been a significant switch from the one-way flow of traffic from suppliers to consumers, which characterised the early years of internet use, to a more interactive medium. Today, users send around 6,000 tweets, make 40,000 Google searches and send 2 million emails every second. The volume of text on the internet is estimated at 1.1 zettabytes, which is approximately 305.5 billion pages of A4 paper, and is projected to rise to 2 zettabytes by 2019 (more than 550 billion pages). And that is without the pictures! To take another example, the Large Hadron Collider generates 15 petabytes of data each year, equivalent to around 15,000 years of digital music.

Where does all this data come from? Some of it is merely the transcription of existing data into an electronic form that makes it more accessible. Wikipedia, for example, has helped to democratise knowledge in a way that was previously impossible. But a lot of it has come into being as a result of technological developments which allow the capture of much greater volumes of information. This has been facilitated by the rise of cloud computing which allows users to store, manage and process vast amounts of information in a network of remote servers (ironically, this is a reversal of the trend of recent decades which saw a shift from centralised towards local data storage). Perhaps even more important, the rise of social media such as Twitter and Facebook has vastly increased the volume of information pumped out (not to mention the rise of microblogging sites in China such as Tencent or Sina Weibo).

Clearly, a lot of this information does not yield any valuable insight, but given the vast amount available, even a small fraction of it is still too much for humans to reasonably digest. Even if we only require 0.5% of the information stored online, we would still need 1.5 billion sheets of A4. The problem is compounded by the fact that we do not necessarily know what is useful information and what can easily be discarded, so we have to scan far more than we require in order to sift out the good stuff. As a result, much progress has been made in recent years in devising methods of scanning large datasets in order to search for relevant information.
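For readers who want to check the paper arithmetic, here is a minimal sketch in Python. It takes the figures quoted above at face value and assumes, purely for illustration, that the page count scales linearly with the volume of text; the function and variable names are my own.

```python
# Figures quoted in the post; the linear scaling is an assumption for illustration.
TEXT_ZB_TODAY = 1.1      # zettabytes of text online today
PAGES_TODAY_BN = 305.5   # equivalent A4 pages, in billions
TEXT_ZB_2019 = 2.0       # projected zettabytes by 2019


def pages_bn(zettabytes: float) -> float:
    """Scale the quoted page count linearly with the volume of text."""
    return PAGES_TODAY_BN * zettabytes / TEXT_ZB_TODAY


pages_2019_bn = pages_bn(TEXT_ZB_2019)    # projected page count for 2019
useful_pages_bn = 0.005 * PAGES_TODAY_BN  # 0.5% of today's page count

print(f"2019 projection: {pages_2019_bn:.1f} billion pages")
print(f"0.5% of today's text: {useful_pages_bn:.2f} billion pages")
```

On these inputs the projection comes out at roughly 555 billion pages and the 0.5% slice at roughly 1.5 billion, in line with the numbers in the text.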

To the extent that knowledge is power, it stands to reason that those with the data in the digital age are those with the power. This raises a big question of how much control we should be prepared to give up, and there are legal issues about who owns the information that most of us have until now simply given away for free – something that the recent Facebook furore brought into the open.

But whilst social media platforms contain huge amounts of data that can be extracted at relatively little cost, and are often a useful barometer of public opinion, they are biased towards younger, urban-dwelling, higher-income users. Relying on tweets, for example, without accounting for this bias risks repeating the classic mistake made when trying to predict the US presidential results in 1936 and 1948, when the polling samples were skewed by the inclusion of those picked at random from the phonebook, at a time when telephone penetration was low.
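The mechanics of this kind of sampling bias can be sketched with a toy simulation. Everything here is hypothetical: the phone ownership rate, the support levels and the sample sizes are invented for illustration and are not historical data. The point is only that a large sample drawn from an unrepresentative group can miss badly, while a smaller sample drawn from the whole population does not.

```python
import random

random.seed(1936)  # fixed seed so the sketch is repeatable

# All parameters below are hypothetical, chosen only to illustrate the effect.
POPULATION = 100_000
PHONE_RATE = 0.40          # share of voters who own a telephone
SUPPORT_WITH_PHONE = 0.60  # support for candidate A among phone owners
SUPPORT_WITHOUT = 0.40     # support for candidate A among everyone else

# Each voter is a pair: (has_phone, votes_for_a)
voters = []
for _ in range(POPULATION):
    has_phone = random.random() < PHONE_RATE
    support = SUPPORT_WITH_PHONE if has_phone else SUPPORT_WITHOUT
    voters.append((has_phone, random.random() < support))


def share_a(sample):
    """Fraction of a sample that votes for candidate A."""
    return sum(votes_a for _, votes_a in sample) / len(sample)


true_share = share_a(voters)

# Poll drawn only from phone owners (the "phonebook" sample).
phone_owners = [v for v in voters if v[0]]
phone_poll = share_a(random.sample(phone_owners, 2_000))

# Poll drawn at random from the whole electorate.
random_poll = share_a(random.sample(voters, 2_000))

print(f"True support for A:   {true_share:.1%}")
print(f"Phonebook-only poll:  {phone_poll:.1%}")
print(f"Random-sample poll:   {random_poll:.1%}")
```

With these made-up parameters the phonebook poll points to a comfortable win for candidate A even though A is actually behind. The same logic applies if "owns a telephone" is replaced with "has a Twitter account".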

Thus, whilst I agree with Haldane’s sentiment that “economics and finance needs to make an on-going investment in Big Data and data analytics”, we need to beware of the headlong rush. As I wrote in a piece last year, “before too long, there will almost certainly be a spectacular miss which will bring out the critics in droves” and it could yet be that the Facebook problems will be a catalyst for a rethink. At present, much of society is still only in the foothills of the big data revolution. The real trick, as former boss of Hewlett-Packard Carly Fiorina once said, will be to turn data into information, and information into insight. We are not quite there yet.
