Bank of England chief economist Andy Haldane today gave a speech
entitled “Will Big Data Keep Its Promise?”
in which he assessed the contribution that big data can make to improving
decision making in finance and macroeconomics. Whilst I agree that this is
indeed a subject that offers significant potential, we do have to be mindful of
the downsides associated with the data trails we leave as we live our lives in
a digital world.
In 2005 there were around 1 billion global internet users;
today there are estimated to be almost 3.5 billion. Just as important, there
has been a significant switch from the one-way flow of traffic from suppliers
to consumers, which characterised the early years of internet use, to a more
interactive medium. Today, users send around 6,000 tweets, make 40,000 Google
searches and send 2 million emails every second. The volume of text on the
internet is estimated at 1.1 zettabytes, which is approximately 305.5 billion
pages of A4 paper and which is projected to rise to 2 zettabytes by 2019 (more
than 550 billion sheets). And that is without the pictures! To take another
example, the Large Hadron Collider generates 15 petabytes of data each year,
equivalent to around 15,000 years of digital music.
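As a rough cross-check of the sheet counts quoted above, the sketch below scales the stated ratio (1.1 zettabytes to 305.5 billion A4 pages) linearly to the projected 2 zettabytes. The linear scaling and the function name `pages_for` are assumptions made for illustration; the figures themselves are taken as given from the text.

```python
# Figures quoted in the text above (taken as given, not independently verified).
TEXT_ZB_NOW = 1.1        # zettabytes of text currently online
PAGES_NOW = 305.5e9      # A4 pages said to correspond to 1.1 ZB
TEXT_ZB_2019 = 2.0       # projected zettabytes by 2019


def pages_for(zettabytes: float) -> float:
    """Scale the quoted pages-per-zettabyte ratio linearly (an assumption)."""
    pages_per_zb = PAGES_NOW / TEXT_ZB_NOW
    return zettabytes * pages_per_zb


projected_pages = pages_for(TEXT_ZB_2019)
print(f"{projected_pages / 1e9:.1f} billion pages")  # prints "555.5 billion pages"
```

On that basis the projection comes out at roughly 555 billion sheets, which is consistent with the "more than 550 billion" quoted.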
Where does all this data come from? Some of it is merely the
transcription of existing data into an electronic form that makes it more
accessible. Wikipedia, for example, has helped to democratise knowledge in a way
that was previously impossible. But a lot of it has come into being as a result
of technological developments which allow the capture of much greater volumes
of information. This has been facilitated by the rise of cloud computing which
allows users to store, manage and process vast amounts of information in a
network of remote servers (ironically, this is a reversal of the trend of
recent decades which saw a shift from centralised towards local data storage).
Perhaps even more important, the rise of social media such as Twitter and
Facebook has vastly increased the volume of information pumped out (not to
mention the rise of microblogging sites in China such as Tencent or Sina
Weibo).
Clearly, a lot of this information does not yield any
valuable insight, but given the vast amount of available information, even a
small fraction of it is still too much for humans to reasonably digest. Even if
we only require 0.5% of the information stored online, we would still need 1.5
billion sheets of A4. The problem is compounded by the fact that we do not
necessarily know what is useful information and what can easily be discarded, so
we have to scan far more than we require in order to sift out the good stuff.
As a result, much progress has been made in recent years to devise methods of
scanning large datasets in order to search for relevant information.
To the extent that knowledge is power, it stands to reason
that those with the data in the digital age are those with the power. This
raises a big question of how much control we should be prepared to give up, and
there are legal issues about who owns the information that most of us have
until now simply given away for free – something that the recent Facebook
furore brought into the open.
But whilst social media platforms contain huge amounts of
data that can be extracted at relatively little cost, and are often a useful
barometer of public opinion, they are biased towards younger, urban-dwelling,
high-income users. Relying on tweets, for example, without accounting for this
bias risks repeating the classic mistake made when trying to predict the US
presidential election results in 1936 and 1948, when the polling samples were skewed by
the inclusion of those picked at random from the phonebook, at a time when
telephone penetration was low.
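The effect of this kind of skew can be illustrated with a small simulation. This is only a sketch under invented assumptions: the 30% phone ownership rate, the support levels for a hypothetical candidate A, and the function name `simulate_poll` are all made up for illustration and are not historical figures.

```python
import random


def simulate_poll(seed: int = 0, population_size: int = 100_000,
                  sample_size: int = 2_000):
    """Compare a phonebook-only sample with a sample of the whole population."""
    rng = random.Random(seed)

    # Hypothetical population: 30% own a phone. Phone owners back candidate A
    # 65% of the time, everyone else 40% of the time (all invented numbers).
    population = []
    for _ in range(population_size):
        has_phone = rng.random() < 0.30
        backs_a = rng.random() < (0.65 if has_phone else 0.40)
        population.append((has_phone, backs_a))

    true_share = sum(backs for _, backs in population) / population_size

    # Biased poll: drawn at random, but only from people listed in the phonebook.
    phone_owners = [person for person in population if person[0]]
    phone_sample = rng.sample(phone_owners, sample_size)
    phone_share = sum(backs for _, backs in phone_sample) / sample_size

    # Unbiased poll: drawn at random from the whole population.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(backs for _, backs in random_sample) / sample_size

    return true_share, phone_share, random_share


true_share, phone_share, random_share = simulate_poll()
print(f"true: {true_share:.3f}  phonebook: {phone_share:.3f}  random: {random_share:.3f}")
```

Under these assumptions the phonebook sample overstates support for candidate A by a wide margin, even though it was picked at random within the phonebook, while a sample of the same size drawn from the whole population lands close to the true figure. The same logic applies to an unweighted sample of social media users.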
Thus, whilst I agree with Haldane’s sentiment that “economics and finance needs to make an
on-going investment in Big Data and data analytics”, we need to beware of
the headlong rush. As I wrote in a piece last year, “before too long, there will almost certainly be a spectacular miss
which will bring out the critics in droves” and it could yet be that the
Facebook problems will be a catalyst for a rethink. At the present time, much
of society is only operating in the foothills of the big data revolution. The
real trick, as former boss of Hewlett-Packard Carly Fiorina once said, will be
to turn data into information, and information into insight. We are not quite
there yet.