Friday 29 November 2019

Second guessing the pollsters

With polling day less than two weeks away, this curiously flat election that nobody really wants has yet to take off. The opinion polls, for what they are worth, suggest that the Conservatives will easily gain a parliamentary majority as they enjoy a lead of 13 points over Labour. The Lib Dems are struggling to make themselves relevant and the Brexit Party has virtually disappeared as a coherent political force.

It was all so different in the spring, when the Conservatives, Labour, the Lib Dems and the Brexit Party were all capturing a similar share of the vote, around 20% (see chart). Whatever else you might say about Boris Johnson, he has the ability to tap into what a lot of people want to hear, and by promising to “get Brexit done” he has consigned Nigel Farage to the political sidelines, taking votes away from the limited company masquerading as a political party. Jeremy Corbyn is unable to generate any form of feelgood factor. As I have long suspected, he is in no position to repeat his decent performance of 2017: he has dithered on Brexit, and in the eyes of many voters he is simply untrustworthy. Meanwhile, the Liberal Democrats’ new leader Jo Swinson does not come across well with voters, and it increasingly looks as though her party’s commitment to revoke Article 50 was a major tactical blunder.

Whilst the opinion polls can be wrong, it certainly looks as though the opposition parties will have their work cut out to limit the extent of the Tories’ majority. Since the headline polls have proven a poor guide to electoral outcomes in the recent past, the commentariat paid a lot of attention to the release this week of YouGov’s Multilevel Regression and Post-stratification (MRP) model results. This model, which correctly called a hung parliament in 2017 when predictions based on aggregate survey results pointed to a large Conservative majority, suggests that on the basis of current data the Tories could win a 68-seat majority on 12 December. I briefly touched on MRP models in the wake of the 2017 election (here), but it is worth reminding ourselves what political commentators – who would not normally care about regression models – are getting excited about.

The MRP model proceeds in two steps. First, YouGov builds a detailed description of UK local populations to determine the characteristics of each parliamentary constituency. The modellers then use YouGov survey data to determine how voting intentions are associated with individual population characteristics (e.g. how likely a person is to vote Conservative or Labour given their education level or age). Combining these two pieces of information, using survey data from the preceding seven days, allows pollsters to predict voting intentions at the constituency level. It all sounds very scientific but a few points are worth noting. For one thing, a track record based on one set of observations is not very useful. As YouGov themselves note, “despite the strong performance of the method in the 2017 election, it is not magic and there are important limitations to keep in mind.” Second, it does not offer a prediction of what will happen at the election, since the data may change in the interim. Third, the model is only as reliable as the data fed into it, and we can never be sure whether respondents are telling the truth about their voting intentions. Finally, the sample sizes used in each constituency are very small and thus subject to significant sampling error.
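
To make the mechanics concrete, here is a minimal sketch of the second step – post-stratification – in Python. The demographic cells, counts and cell-level probabilities are all invented stand-ins for YouGov’s actual data and multilevel model output.

```python
# Minimal sketch of the post-stratification step of an MRP model.
# Assumes we already have (a) a demographic breakdown of a constituency
# and (b) model-based vote-intention probabilities for each demographic
# cell -- both illustrative, not YouGov's data.

# Cell counts for one hypothetical constituency: (age band, education)
constituency_cells = {
    ("18-34", "degree"): 18000,
    ("18-34", "no degree"): 9000,
    ("35-64", "degree"): 14000,
    ("35-64", "no degree"): 21000,
    ("65+", "degree"): 5000,
    ("65+", "no degree"): 13000,
}

# Estimated P(vote Conservative) for each cell, as would come from a
# multilevel model fitted to the last seven days of survey data
p_con = {
    ("18-34", "degree"): 0.18,
    ("18-34", "no degree"): 0.32,
    ("35-64", "degree"): 0.35,
    ("35-64", "no degree"): 0.48,
    ("65+", "degree"): 0.52,
    ("65+", "no degree"): 0.64,
}

total = sum(constituency_cells.values())
# Post-stratified estimate: population-weighted average of cell probabilities
con_share = sum(n * p_con[cell] for cell, n in constituency_cells.items()) / total
print(f"Predicted Conservative share: {con_share:.1%}")
```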

Moreover, despite all the work which has gone into constructing the model, it does not generate significantly different results from a simple method which applies a uniform swing to each constituency. Whilst it would be nice to have access to all the data in order to generate an MRP model of my own, it is impossible for individuals to recreate a sample of 100,000 interviews in a short space of time. This got me thinking about whether there are other ways to generate constituency models and I report the results here, albeit subject to huge caveats.
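
For comparison, the uniform swing benchmark really is trivial to implement: shift every constituency’s previous result by the change in national vote shares. A sketch with made-up numbers:

```python
# Uniform national swing: shift each constituency's 2017 shares by the
# change in national vote share implied by current polls. All numbers
# below are illustrative, not actual polling data.

national_2017 = {"con": 43.5, "lab": 41.0}   # GB vote shares, %
current_poll  = {"con": 43.0, "lab": 30.0}
swing = {p: current_poll[p] - national_2017[p] for p in national_2017}

def project(shares_2017):
    """Apply the national swing to one constituency's 2017 shares."""
    return {p: s + swing.get(p, 0.0) for p, s in shares_2017.items()}

marginal_seat = {"con": 45.0, "lab": 44.0, "ld": 8.0}  # hypothetical seat
projected = project(marginal_seat)
print(max(projected, key=projected.get))  # predicted winner: "con"
```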

Using Logit models to predict the election outcome

The starting point is to try to find readily available information on a constituency basis that might help us. I start from the premise that MPs who have strong local support and who recorded a solid majority last time out are more likely to be re-elected. Even if the MP is no longer standing for re-election, I assume the party enjoys the benefit bequeathed by the previous incumbent. This is proxied by the size of the sitting MP’s majority relative to the total number of votes cast (or alternatively, the share of the vote achieved by the winning candidate). Since Brexit is such an important factor in this election we can also assess whether the constituency’s pro- or anti-Brexit bias is important in determining the outcome (see here for the results collated by Chris Hanretty). A final variable is the regional polling data which, although not available at the local constituency level, is assumed to be representative of each constituency in the region (e.g. there are 73 constituencies in London and I assume that support for each party is broadly the same as the London average).
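
The incumbency proxy is simple to compute from the 2017 results; a toy example with invented vote counts:

```python
# Incumbency proxy described above, computed from one constituency's
# (hypothetical) 2017 result: the winner's majority as a share of all
# votes cast, plus the winner's vote share as an alternative measure.
votes = {"con": 25000, "lab": 21000, "ld": 6000}
total = sum(votes.values())
ranked = sorted(votes.values(), reverse=True)

majority_share = (ranked[0] - ranked[1]) / total
winner_share = ranked[0] / total
print(f"majority: {majority_share:.1%}, winner share: {winner_share:.1%}")
```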

My model is designed to predict whether the incumbent party retains the seat at the 2019 election. To do this, I ran a series of qualitative choice models (technically akin to a Logit model with fixed effects) for each of the five main parties (Tories, Labour, Lib Dems, SNP and Plaid Cymru) across all constituencies in GB (Northern Ireland excluded). Comparing the results of the five models in each constituency, I assigned the seat to the party with the highest probability of winning it. The central case forecasts gave the Conservatives between 333 and 351 seats (with corresponding Labour figures of 243 and 220). The SNP took anywhere between 37 and 44 seats in Scotland (though I reckon it could go as high as 50) whilst Plaid Cymru took between 3 and 5 Welsh seats. The model struggled to give the Lib Dems many seats at all. Even making some manual adjustments, it is difficult to see the Lib Dems picking up more than 15 seats.
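
A sketch of the mechanics, in Python with scikit-learn rather than EViews, with synthetic data standing in for the real inputs (the column names and random values are purely illustrative):

```python
# Per-party Logit sketch: one model per party, then assign each seat to
# the party with the highest predicted win probability. All data here
# are synthetic stand-ins for the real inputs described above.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 632  # GB constituencies (Northern Ireland excluded)
parties = ["con", "lab", "ld", "snp", "pc"]

df = pd.DataFrame({
    "majority_share": rng.uniform(0.0, 0.5, n),  # incumbent majority / votes cast
    "leave_share": rng.uniform(0.2, 0.75, n),    # estimated 2016 Leave share
})

win_prob = pd.DataFrame(index=df.index)
for p in parties:
    # Hypothetical per-party inputs: regional poll share and a 0/1 flag
    # for whether the party held the seat last time
    df[f"poll_{p}"] = rng.uniform(0.0, 0.5, n)
    df[f"held_{p}"] = rng.integers(0, 2, n)
    X = df[["majority_share", "leave_share", f"poll_{p}"]]
    model = LogisticRegression().fit(X, df[f"held_{p}"])
    win_prob[p] = model.predict_proba(X)[:, 1]

# The predicted winner in each seat is the party with the highest probability
df["winner"] = win_prob.idxmax(axis=1)
print(df["winner"].value_counts())
```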

How do the results compare with YouGov? The answer is: pretty well. Their central scenario gives the Tories 359 seats; Labour 211; Lib Dems 13; the SNP 43 and Plaid Cymru 4. For a lot less effort (basically, some playing around with the data in a spreadsheet and a few lines of code in EViews) I can broadly replicate the results. Crucially, the evidence from both models suggests that the Tories can win an outright majority in the December election. As noted above, this is not a done deal by any means – there is a large margin of error associated with any such model. Electoral Calculus, which runs a similar model to YouGov, also expects the Tories to win 331 seats (Labour 235), but with a range of 252 to 429 (Labour 141 to 304). You might think that margin is wide enough to be meaningless, but they do ascribe a 63% probability to a Tory majority.

Reality does, of course, make fools of us all. But I am satisfied that my low-budget modelling exercise replicates the work of the highly-paid pollsters. I can thus either get it right at a much lower cost – or save someone a lot of money by getting it wrong for a lot less.

Sunday 11 June 2017

Why do pollsters get it wrong?

Whilst the headline writers have spent the last two days poring over the entrails of the UK general election and what it means for the future direction of economic policy, I have been wondering how the pollsters could get it so wrong – again. After all, this is the third successive UK plebiscite since 2015 in which the electoral pundits have failed to call the result – not to mention their failure to call the US presidential result. If the polls this week had been even slightly more accurate, the result would not have come as such a surprise and we would not have spent the past three days having quite so many debates about Theresa May’s future.

First, however, some sense of perspective is in order. The opinion polls did a pretty good job in getting the voting shares right. The analysts at Electoral Calculus, for example, predicted that the Conservative vote share would rise from 37.8% in 2015 to 43.5% this time around whilst the Labour share would increase from 31.2% to 40%. In the event, the final vote shares were 42.4% and 40% respectively, so there was a modest overshoot on the Conservative share but they got Labour spot on. But it is a lot harder to go from there to actually predicting the election result, because the regional vote distribution matters hugely. In the UK’s first-past-the-post system, parties only need to outperform their local rivals by the tiniest of margins to win a seat (indeed, one constituency was decided by a margin of just 2 votes – 0.00478% of the votes cast). Once we start digging below the national level, the issue becomes fraught with sample size problems and the margins of error become much wider.

The opinion polls clearly narrowed over the course of the campaign. But even by the time the final polls were published on Wednesday night, the 15-poll average was still showing a Conservative lead of 6 points – down from 20 in the first half of May – with their polling share only back where it was when the election was called (43%). The central case forecast was thus not suggestive of a hung parliament. But if we apply a 5% margin of error – scaling the Conservative share down by 5% and the Labour share up by the same proportion – the trend does not change but the extent of the lead does. Rather than a 6-point margin, the Conservatives went into the election with a 2-point lead on this basis (see chart).
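
The arithmetic is easy to check (the 37% Labour share is implied by a 6-point lead on a 43% Conservative share):

```python
# Apply the 5% error adjustment described above: scale the Conservative
# share down and the Labour share up by 5% (relative, not points).
con, lab = 43.0, 37.0  # final 15-poll averages, %
adj_con, adj_lab = con * 0.95, lab * 1.05
print(f"headline lead: {con - lab:.0f} points")          # 6 points
print(f"adjusted lead: {adj_con - adj_lab:.1f} points")  # 2.0 points
```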

Applying this level of statistical inaccuracy in trying to predict the number of seats becomes a whole order of magnitude more difficult. In addition to Electoral Calculus (EC), I have also been tracking the results derived by the Election Forecast group (EF). EC predicted that the Conservatives would win 358 seats whilst EF’s projection was 366 (though EF’s low case scenario did predict that the Conservatives would win 318 seats – the right answer as it happens – whilst EC’s low estimate was 314). The central case predictions were thus off by more than 10%. They were even further off in their predictions for Labour, with EC predicting 218 seats and EF (where the outturn was even higher than their upper limit) projecting 207. One group which did predict a hung parliament was YouGov, whose “big data” model proved to be right, but their final call based on conventional survey methods was for a wider Conservative majority.

One of the reasons for the apparent failure of conventional methods was that most polling organisations discounted the evidence suggesting that younger age groups would vote Labour, and assumed that many of them would stay at home, as happened in 2015. This is an object lesson in the perils of manual adjustment – something I do all the time when using structural macroeconometric models, and which more often than not turns out to be justified. It is always galling when the model beats your prior view, but ironically, the next time you let the model run without overriding the results, you often find that you would have been justified in overriding it.

YouGov provided a non-technical summary of their Multilevel Regression and Post-stratification (MRP) model, which seemed to work so well (here). It takes polling data from the preceding seven days to estimate a model relating interview date, constituency, voter demographics, past voting behaviour and other profile variables to current voting intentions. This is used to estimate the probability that a voter with specified characteristics will vote for a particular party. Obviously, it is not infallible: it is a snapshot of intentions at the time the survey is made. In addition, the constituency-level estimates are based on very small sample sizes, so they suffer from the usual small-sample problems. Like all models, they are subject to significant margins of error, and as this blog post highlights they need to be treated cautiously. Indeed, I assign a huge degree of mistrust to regression models for predictive purposes because, as noted in the post, MRP “is a useful tool, but potentially misleading if used carelessly or indiscriminately.”
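
To put a rough number on the small-sample point, the standard error of an estimated vote share gives a feel for how noisy a constituency-level figure can be (the sample size of 100 is my assumption, purely for illustration):

```python
# Standard error of an estimated vote share p from n respondents:
# se = sqrt(p * (1 - p) / n). A 95% interval is roughly +/- 2 se.
import math

def stderr(p, n):
    return math.sqrt(p * (1 - p) / n)

p, n = 0.40, 100  # a 40% share estimated from 100 interviews
print(f"40% from n={n}: +/- {2 * stderr(p, n):.1%}")  # about +/- 9.8 points
```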

Ultimately, I suspect that trying to predict the detailed results of elections in the multi-media age is going to become ever harder. As information is thrown at us ever more rapidly, we will have to learn to assimilate it more quickly, and our quantitative models will have to take on board information from sources such as Twitter (already possible in statistical packages such as R). I am sure that in the course of the next week, some bright spark will ask me why I failed to get the election result right. The simple answer is because it’s hard to do, so I leave it to those with the expertise, time and resources to do it. And even they struggle, so what chance have I got?