
Friday, 29 November 2019

Second guessing the pollsters

With polling day less than two weeks away, this curiously flat election that nobody really wants has yet to take off. The opinion polls, for what they are worth, suggest that the Conservatives will easily gain a parliamentary majority as they enjoy a lead of 13 points over Labour. The Lib Dems are struggling to make themselves relevant and the Brexit Party has virtually disappeared as a coherent political force.

It was all so different in the spring, when the Conservatives, Labour, the Lib Dems and the Brexit Party were all capturing a similar share of the vote, around 20% (chart). Whatever else you might say about Boris Johnson, he has the ability to tap into what a lot of people want to hear, and by promising to “get Brexit done” he has consigned Nigel Farage to the political sidelines by taking votes away from the limited company masquerading as a political party. Jeremy Corbyn is unable to generate any form of feelgood factor. As I have long suspected, he will not be in any position to repeat his decent performance in 2017 because he has dithered on Brexit, and in the eyes of many voters he is simply untrustworthy. Meanwhile, the Liberal Democrats’ new leader Jo Swinson does not come across well with voters and it increasingly looks as though her party’s commitment to revoke Article 50 was a major tactical blunder.

Whilst the opinion polls can be wrong, it certainly looks as though the opposition parties will have their work cut out to limit the extent of the Tories’ majority. Since the headline polls have proven to be a poor guide to electoral outcomes in the recent past, the commentariat paid a lot of attention to the release this week of YouGov’s Multilevel Regression and Post-stratification (MRP) model results. This model, which correctly called a hung parliament in 2017 when predictions based on aggregate survey results indicated a large Conservative majority, suggests that on the basis of current data the Tories could win a 68-seat majority on 12 December. I briefly touched on MRP models in the wake of the 2017 election (here) but it is worth reminding ourselves of what political commentators – who would not normally care about regression models – are getting excited about.

The MRP model proceeds in two steps. First, YouGov builds a detailed description of UK local populations to determine the characteristics of each parliamentary constituency. The modellers then use YouGov survey data to determine how voting intentions are associated with individual population characteristics (e.g. how likely a person is to vote Conservative or Labour based on their education levels or their age). Combining these two pieces of information, using survey data from the preceding seven days, allows pollsters to predict voting intentions at the constituency level. It all sounds very scientific but a few points are worth noting. First, a track record based on one set of observations is not very useful. As YouGov themselves note, “despite the strong performance of the method in the 2017 election, it is not magic and there are important limitations to keep in mind.” Second, it does not offer a prediction for what will happen at the election since data may change in the interim. Third, the model is only as reliable as the data input and we can never be sure whether respondents are telling the truth about their voting intentions. Finally, the sample sizes used in each constituency are very small and thus subject to significant sampling error.
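To make the post-stratification idea concrete, here is a minimal Python sketch. It is not YouGov's code: the demographic cells, the support rates and the population counts are all invented for illustration, and the function name poststratify is my own. It simply weights the modelled support in each demographic cell by the number of people in that cell in one constituency.

```python
# Hypothetical illustration of post-stratification; every number below is made up.

def poststratify(cell_counts, cell_support):
    """Weight each cell's modelled support by its population in the constituency."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("constituency has no population")
    weighted = sum(cell_counts[cell] * cell_support[cell] for cell in cell_counts)
    return weighted / total

# Assumed output of the regression step: probability of backing one party, by cell.
cell_support = {
    ("18-34", "degree"): 0.20,
    ("18-34", "no degree"): 0.35,
    ("35+", "degree"): 0.40,
    ("35+", "no degree"): 0.55,
}

# Assumed description of one constituency: number of people in each cell.
cell_counts = {
    ("18-34", "degree"): 10_000,
    ("18-34", "no degree"): 15_000,
    ("35+", "degree"): 20_000,
    ("35+", "no degree"): 25_000,
}

estimate = poststratify(cell_counts, cell_support)
print(f"Estimated support in this constituency: {estimate:.1%}")
```

The same cell-level support rates are reused for every constituency; only the population counts change from seat to seat, which is why the small survey sample in any one seat matters less than it first appears.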

Moreover, despite all the work which has gone into constructing the model, it does not generate significantly different results from a simple method which applies a uniform swing to each constituency. Whilst it would be nice to have access to all the data in order to generate an MRP model of my own, it is impossible for individuals to recreate a sample of 100,000 interviews in a short space of time. This got me thinking about whether there are other ways to generate constituency models and I report the results here, albeit subject to huge caveats.
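For comparison, the uniform swing method mentioned above can be sketched in a few lines of Python. The vote shares and national changes here are hypothetical, and the function names are mine: each party's national change in vote share (in percentage points) is added to its previous share in the seat, and the party with the highest projected share is taken as the winner.

```python
# Hypothetical illustration of a uniform swing projection; the figures are invented.

def apply_uniform_swing(previous_shares, national_change):
    """Add each party's national change (percentage points) to its share in the seat."""
    return {
        party: max(0.0, share + national_change.get(party, 0.0))
        for party, share in previous_shares.items()
    }

def projected_winner(shares):
    """Return the party with the highest projected share."""
    return max(shares, key=shares.get)

previous_shares = {"Con": 42.0, "Lab": 45.0, "LD": 8.0}   # assumed result last time
national_change = {"Con": 1.0, "Lab": -7.0, "LD": 5.0}    # assumed change in national polls

projected = apply_uniform_swing(previous_shares, national_change)
winner = projected_winner(projected)
print(projected, winner)
```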

Using Logit models to predict the election outcome

The starting point is to try and find readily available information on a constituency basis that might help us. I start from the premise that MPs who have strong local support and who recorded a solid majority last time out are more likely to be re-elected. Even if the MP is no longer standing for re-election, I assume the party’s new candidate enjoys the benefit bequeathed by the previous incumbent. This is proxied by the size of the sitting MP’s majority relative to the total number of votes cast (or alternatively, the share of the vote achieved by the winning candidate). Since Brexit is such an important factor in this election we can also assess whether the constituency’s pro- or anti-Brexit bias is important in determining the outcome (see here for the results collated by Chris Hanretty). A final variable is the regional polling data which, although not available at the local constituency level, is assumed to be representative for each constituency in the region (e.g. there are 73 constituencies in London and I assume that support for each party is broadly the same as the London average).
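The incumbency proxy is simple arithmetic, and a short Python sketch may help. The vote counts are hypothetical and the function names are my own: majority_share is the winner's lead over the runner-up divided by total votes cast, and winner_share is the alternative measure, the winning candidate's share of the vote.

```python
# Hypothetical illustration of the two incumbency proxies; vote counts are invented.

def majority_share(votes):
    """Winner's lead over the runner-up as a fraction of all votes cast."""
    ranked = sorted(votes.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidates")
    return (ranked[0] - ranked[1]) / sum(ranked)

def winner_share(votes):
    """Winning candidate's votes as a fraction of all votes cast."""
    return max(votes.values()) / sum(votes.values())

votes_last_time = {"Con": 25_000, "Lab": 20_000, "LD": 5_000}  # assumed result

maj = majority_share(votes_last_time)
win = winner_share(votes_last_time)
print(f"Majority share: {maj:.1%}, winner's share: {win:.1%}")
```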

My model is designed to predict whether the incumbent party retains the seat at the 2019 election. To do this, I ran a series of qualitative choice models (technically akin to a Logit model with fixed effects) for each of the five main parties (Tories, Labour, Lib Dems, SNP and Plaid Cymru) across all constituencies in GB (Northern Ireland was excluded). Comparing the results for the five models across constituencies, I looked for the party with the highest probability of winning the seat. The central case forecasts gave the Conservatives between 333 and 351 seats (corresponding Labour figures: between 243 and 220). The SNP took anywhere between 37 and 44 seats in Scotland (though I reckon it could go as high as 50) whilst Plaid Cymru took between 3 and 5 Welsh seats. The model struggled to give the Lib Dems many seats at all. Even making some manual adjustments, it is difficult to see the Lib Dems picking up more than 15 seats.
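The step of comparing probabilities across party models can be sketched as follows. This is an illustration in Python rather than my EViews code, and every coefficient and input value is invented: I assume one logit equation per party, with the majority share, the constituency's Leave share and the party's regional poll share as regressors, and the seat is awarded to the party with the highest fitted probability.

```python
import math

# Hypothetical illustration: coefficients and inputs are invented, not estimated.

def logistic(z):
    """Map a linear index onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def win_probability(coefs, features):
    """Fitted probability from one party's logit equation."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())
    return logistic(z)

# Assumed coefficients for two of the party models.
party_coefs = {
    "Con": {"intercept": -2.0, "majority_share": 4.0, "leave_share": 2.0, "regional_poll": 3.0},
    "Lab": {"intercept": -2.0, "majority_share": 4.0, "leave_share": -2.0, "regional_poll": 3.0},
}

# Assumed inputs for one Conservative-held seat (majority_share is zero for the challenger).
seat_features = {
    "Con": {"majority_share": 0.10, "leave_share": 0.60, "regional_poll": 0.40},
    "Lab": {"majority_share": 0.00, "leave_share": 0.60, "regional_poll": 0.35},
}

probabilities = {
    party: win_probability(party_coefs[party], seat_features[party])
    for party in party_coefs
}
predicted = max(probabilities, key=probabilities.get)
print(probabilities, predicted)
```

Note that the probabilities come from separate equations and so need not sum to one across parties; only their ranking is used to call the seat.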

How do the results compare with YouGov’s? The answer is pretty well. Their central scenario gives the Tories 359 seats; Labour 211; Lib Dems 13; the SNP 43 and Plaid Cymru 4. For a lot less effort (basically, some playing around with the data in a spreadsheet and a few lines of code in EViews) I can broadly replicate the results. Crucially, the evidence from both models suggests that the Tories can win an outright majority in the December election. As noted above, this is not a done deal by any means – there is a large margin of error associated with any such model. Electoral Calculus, which runs a similar model to YouGov, also looks for the Tories to win, with 331 seats (Labour 235), but with a range between 252 and 429 (Labour 141 to 304). You might think that is a margin so wide as to be meaningless, but they do ascribe a 63% probability to the chance of a Tory majority.
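For readers who want to translate seat counts into majorities, the arithmetic is easy to check. The sketch below assumes the full 650-seat House of Commons and computes a nominal majority (seats held minus seats held by everyone else); it ignores the Speaker and any MPs who do not take their seats, so working majorities will differ slightly.

```python
COMMONS_SEATS = 650  # total seats in the House of Commons

def nominal_majority(seats, total_seats=COMMONS_SEATS):
    """Seats held by one party minus seats held by all other parties combined."""
    return seats - (total_seats - seats)

forecasts = {
    "YouGov MRP": 359,
    "Electoral Calculus": 331,
    "Logit model, low": 333,
    "Logit model, high": 351,
}

majorities = {name: nominal_majority(seats) for name, seats in forecasts.items()}
for name, majority in majorities.items():
    print(f"{name}: {forecasts[name]} seats, nominal majority {majority}")
```

On this basis YouGov's 359 seats corresponds to the 68-seat majority quoted earlier, whilst my central range of 333 to 351 seats implies a majority of between 16 and 52.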

Reality does, of course, make fools of us all. But I am satisfied that my low-budget modelling exercise replicates the work of the highly paid pollsters. I can thus either get it right at a much lower cost – or can save someone a lot of money by getting it wrong for a lot less.