Sunday, 13 October 2024

The table doesn't lie (and other football tales)

It is quite some time since I last made a foray into football-related matters, but I was motivated to revisit the application of statistical techniques to predicting match outcomes after reading Ian Graham’s fascinating book How to Win The Premier League. Graham was previously head of data analytics at Liverpool FC, and he describes in detail how the collection of data and its use in tracking player performance and value made a significant contribution to Liverpool’s resurgence as a footballing force under Jürgen Klopp. This culminated in the club winning the Champions League in 2019 and, in 2020, securing its first league title in 30 years.

One area of the book that piqued my interest was the discussion of how a couple of academics[1] in the 1990s set out to assess whether it was possible to predict the outcome of football matches in order to beat the odds offered by bookmakers (answer: yes, but only in certain circumstances). The so-called Dixon-Coles model (no relation) used a Poisson distribution to estimate the number of goals scored and conceded by each team in a head-to-head match. This was particularly interesting to me because I used a similar approach when trying to assess who would win the World Cup in 2022, and I wanted to see whether I could improve the code and extend the analysis to Premier League football. You may ask why an economist would want to do that. One reason is that it satisfies my long-standing interest in statistics and growing interest in data science methods, and with time on my hands during the summer holiday, why not? It also provided an opportunity to use ChatGPT to fill in gaps in my coding knowledge and to assess whether it really is the game changer for the coding industry that people say it is (my experience in this regard was very positive).

How can we model match outcomes?

Turning to our problem, the number of goals scored by each team is a discrete quantity (we cannot have fractions of goals), so we need a discrete probability distribution, and it turns out that the Poisson distribution does the job rather well. As Ian Graham put it, “the Poisson distribution, also known as the ‘Law of Small Numbers’, governs the statistics of rare events”. And when you think about it, goals are rare events in the context of all the action that takes place on a football pitch. In aggregate, the number of goals scored across all teams does indeed follow a Poisson distribution. However, since the sample size for each team is much smaller (each team plays only 19 home and 19 away games), deviations from the expected average are more likely at team level.
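For anyone who wants to check that aggregate claim against the data, a minimal Python sketch is set out below. It assumes a table of one season's matches with hypothetical home_goals and away_goals columns (one row per match); the column names and data source are illustrative rather than taken from the analysis in this post.

```python
import pandas as pd
from scipy.stats import poisson

def poisson_fit_check(results: pd.DataFrame) -> pd.DataFrame:
    """Compare the observed distribution of goals per team per game
    with a Poisson distribution whose mean is estimated from the data."""
    # Pool home and away goals: one observation per team per game.
    goals = pd.concat([results["home_goals"], results["away_goals"]])
    lam = goals.mean()  # maximum-likelihood estimate of the Poisson mean

    observed = goals.value_counts(normalize=True).sort_index()
    ks = observed.index.to_numpy()
    expected = poisson.pmf(ks, lam)

    return pd.DataFrame({"goals": ks,
                         "observed_share": observed.to_numpy(),
                         "poisson_share": expected})
```

If the observed and Poisson shares line up closely across goal counts, the distributional assumption is doing its job.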

That caveat aside, following the literature, if a variable follows the Poisson distribution it is defined by the probability mass function set out below:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

In this instance we are trying to find the probability of each goal count k, given that we know λ. But what do we know about λ – the expected number of goals each team scores per game? One piece of information is that teams generally score more goals per game at home than away, and typically concede fewer goals at home (and vice versa). We also know that the number of goals each team scores depends on the quality of the team’s attacking players as well as the quality of the opposition defence. Intuitively, it therefore makes sense to define λ as a weighted average of the team’s expected goals scored (a proxy for the quality of its attack) and the expected number of goals conceded by the opposition (an indication of their defensive qualities). I assumed weights of 0.5 for each. We calculate this separately for home and away games, using the previous season’s averages as a starting point. As a result, for each team λ can take one of four values, depending on its expected goals scored home and away and on the opposition’s defensive performance home and away.

By simulating each game 1000 times we can derive an estimate of the average result and construct the expected league rankings. To add a little spice, we can adjust λ by adding a random number drawn from the range [−n, n], where 0 < n < 1, in a bid to capture the element of luck inherent in any sporting contest, on the basis that over the course of a number of games this element averages out to zero. One thing to take into account is that promoted teams will not perform as strongly in the Premier League as they did in the Championship, with the evidence suggesting that they score around 0.7 goals per game fewer and concede around 0.7 goals per game more. We thus amend the λ value for promoted teams accordingly. It is possible to add a number of other tweaks[2] but for the purposes of this exercise this is a good starting point. We construct λ as follows:

$$\lambda_{i}^{home} = 0.5\,GS_{i}^{home} + 0.5\,GC_{j}^{away}, \qquad \lambda_{j}^{away} = 0.5\,GS_{j}^{away} + 0.5\,GC_{i}^{home}$$

where, for home team i and away team j, GS is the average number of goals scored per game and GC the average number conceded per game in the previous season, measured at home and away respectively.
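To make the mechanics concrete, here is a simplified Python sketch of the simulation described above. The 0.5 weights, the 0.7 goals-per-game adjustment for promoted teams and the 1,000 repetitions follow the description in the text; the team data, the size of the luck bound and the small floor applied to λ are illustrative assumptions of mine.

```python
import itertools
import numpy as np

rng = np.random.default_rng(42)

# Previous-season averages per team (illustrative numbers, not real data):
# sh/ch = goals scored/conceded per home game, sa/ca = per away game.
teams = {
    "Team A": dict(sh=2.1, ch=0.9, sa=1.7, ca=1.1, promoted=False),
    "Team B": dict(sh=1.4, ch=1.3, sa=1.1, ca=1.6, promoted=False),
    "Team C": dict(sh=1.5, ch=1.2, sa=1.2, ca=1.4, promoted=True),
}

PROMOTED_ADJ = 0.7   # promoted sides score ~0.7 fewer and concede ~0.7 more
NOISE = 0.2          # luck term drawn from [-NOISE, NOISE]; 0 < NOISE < 1

def adjusted(stats):
    """Apply the promoted-team adjustment to last season's averages."""
    s = dict(stats)
    if s["promoted"]:
        s["sh"] = max(s["sh"] - PROMOTED_ADJ, 0.1)
        s["sa"] = max(s["sa"] - PROMOTED_ADJ, 0.1)
        s["ch"] += PROMOTED_ADJ
        s["ca"] += PROMOTED_ADJ
    return s

def expected_goals(home, away):
    """lambda for each side: 0.5 * own attack + 0.5 * opposition defence."""
    h, a = adjusted(teams[home]), adjusted(teams[away])
    lam_home = 0.5 * h["sh"] + 0.5 * a["ca"]
    lam_away = 0.5 * a["sa"] + 0.5 * h["ch"]
    return lam_home, lam_away

def simulate_season():
    """Play every fixture once (each pair home and away) with Poisson scores."""
    points = {t: 0 for t in teams}
    for home, away in itertools.permutations(teams, 2):
        lam_h, lam_a = expected_goals(home, away)
        lam_h = max(lam_h + rng.uniform(-NOISE, NOISE), 0.05)  # luck term
        lam_a = max(lam_a + rng.uniform(-NOISE, NOISE), 0.05)
        gh, ga = rng.poisson(lam_h), rng.poisson(lam_a)
        if gh > ga:
            points[home] += 3
        elif gh < ga:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points

# Repeat the season 1,000 times and average the points totals.
n_sims = 1000
totals = {t: 0 for t in teams}
for _ in range(n_sims):
    for t, p in simulate_season().items():
        totals[t] += p

expected_table = sorted(((p / n_sims, t) for t, p in totals.items()), reverse=True)
for pts, team in expected_table:
    print(f"{team}: {pts:.1f} expected points")
```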

How well does the model predict outcomes?

I ran the analysis over the past six seasons to assess the usefulness of the model, measuring each team’s expected league position against the outturn. On average, the model predicts a team’s league position to within three places with 64% accuracy. That is not bad, although I would have hoped for a little better.

But excluding the Covid-impacted 2019-20 season, when the absence of crowds had a significant impact on results, we can raise that slightly to 66%. Ironically, it is the 2022-23 season which imparts a serious downward bias – excluding that season raises the accuracy rate to 69%. A popular cliché associated with football is that the table doesn’t lie. What we have discovered here is either that it does, or that the model needs a bit of improvement. At some point I will try to incorporate a dynamic estimate of λ that gives a higher weight to more recent performance, but that is a topic for another day.
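For what it is worth, one simple way to build such a dynamic estimate would be an exponentially weighted average of per-game scoring rates, so that recent matches count for more. The sketch below is purely illustrative; the half-life parameter and function name are my own choices rather than anything used in the model above.

```python
import numpy as np

def weighted_lambda(goals_per_game, half_life=10.0):
    """Exponentially weighted average of a team's goals per game.
    The most recent match is the last element; the half-life (in matches)
    controls how quickly older results are discounted."""
    goals = np.asarray(goals_per_game, dtype=float)
    ages = np.arange(len(goals))[::-1]          # 0 = most recent match
    weights = 0.5 ** (ages / half_life)
    return float(np.sum(weights * goals) / np.sum(weights))

# e.g. weighted_lambda([2, 0, 1, 3, 1, 2]) leans towards recent form
```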

A model which predicts the past is all very well, but what about the future? Looking at expected outcomes for the 2024-25 season, prior to the start of the season the model predicted that Manchester City would win the league. This is hardly a controversial choice – City have won the title in six of the last seven years – but in contrast to previous years the model assigned only a 48% probability to a City title win, compared to an average of 74% over the preceding six seasons. Arsenal ran them close with a 45% chance of winning the league. At the other end of the table, Southampton, Nottingham Forest and Ipswich were the primary relegation candidates (probabilities of 38%, 39% and 67% respectively). Re-running the analysis to include data through last weekend, covering the first seven games of the season, it is still nip and tuck at the top, with Manchester City and Arsenal maintaining a 48% and 44% probability, respectively, of winning the title. At the bottom, Wolverhampton, Ipswich and Southampton are now the three relegation candidates, with assigned probabilities of 58%, 70% and 72% respectively.
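As an aside on where those percentages come from: the title and relegation probabilities are simply the share of simulated seasons in which a team finishes first, or in the bottom three. A minimal sketch, assuming a list of simulated final points tables like those produced by the earlier sketch:

```python
def outcome_probabilities(simulated_seasons):
    """simulated_seasons: list of {team: points} dicts, one per simulated season.
    Returns the share of simulations in which each team wins the title
    and the share in which it finishes in the bottom three."""
    teams = list(simulated_seasons[0])
    titles = {t: 0 for t in teams}
    relegations = {t: 0 for t in teams}
    for season in simulated_seasons:
        ranked = sorted(teams, key=season.get, reverse=True)
        titles[ranked[0]] += 1
        for t in ranked[-3:]:
            relegations[t] += 1
    n = len(simulated_seasons)
    return ({t: c / n for t, c in titles.items()},
            {t: c / n for t, c in relegations.items()})
```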


What is the point of it all?

As an economist who spends time looking at macro trends, it may seem difficult to justify such an apparently trivial pursuit as predicting football outcomes. But there are some lessons that we can carry over into the field of macro and market forecasting. In the first instance, the outcomes of football matches involve inherent uncertainty, and the stochastic simulation techniques applied here can equally be used to account for randomness and uncertainty in economic and financial predictions. Indeed, they are often used in the construction of error bands around forecast outcomes. A second application is to demonstrate that probabilistic forecasting is a viable statistical technique to use when faced with a priori uncertainty. We do not know the outcome of a football match in advance any more than we know what the closing price of the stock market will be on any given day. But simulating past performance can give us a guide to the possible range of outcomes. A third justification is to demonstrate the value of scenario analysis: by changing model parameters we are able to generate different outcomes and, with modern computing rendering the cost of such analysis trivially small, it is possible to run a huge number of alternatives to assess model sensitivities.

Forecasting often throws up surprises and we will never be right 100% of the time. If you can come up with a system that predicts outcomes significantly better than 50% of the time, you are on the right track. Sometimes the outlandish does happen, and I am hoping that the model’s current prediction that Newcastle United can finish fourth in the Premier League (probability: 37%) will prove to be one of those occasions.



[1] Dixon, M. J. and S. G. Coles (1997) ‘Modelling Association Football Scores and Inefficiencies in the Football Betting Market’ Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 46, Issue 2, 265–280

[2] For example, if we want to model the overall points outcome, a better approximation to historical outturns is obtained if we adjust λ by a multiplicative constant designed to reduce it relative to the mean for teams where it is very high and raise it for those where it is low. This is a way of accounting for the fact that a team which performed well in the previous season may not perform so well in the current season, and teams which performed badly may do rather better. This does not have a material impact on the ranking but it does reduce the dispersion of points between the highest and lowest ranking teams, in line with realised performance. Another tweak is to draw the random luck term from a Gaussian distribution rather than the standard uniform assumption.
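A hedged sketch of how those two tweaks might look in code, with the shrinkage factor and noise parameter as purely illustrative choices of mine:

```python
import numpy as np

rng = np.random.default_rng()

def shrink_towards_mean(lam, league_mean, factor=0.8):
    """Regression to the mean: keep only `factor` of a team's deviation
    from the league average (factor=1 leaves lambda unchanged)."""
    return league_mean + factor * (lam - league_mean)

def gaussian_luck(lam, sigma=0.15):
    """Gaussian luck term instead of the uniform [-n, n] draw."""
    return max(lam + rng.normal(0.0, sigma), 0.05)
```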

Sunday, 7 July 2024

Changing of the guard

For once the pollsters got it right. Unlike in 2017 and 2019 when they predicted, respectively, a handsome and narrow Conservative victory, the 2024 election produced the landslide Labour win that was long expected. Back in 2021 it did not look likely that Keir Starmer would be leading a government in Downing Street, particularly after Labour lost one of their safest seats in a by-election. Not for the first time, I was proved wrong, but at least it offers a chance for a policy reset after years of fractious governance.

Labour’s position less secure than it looks

The 2024 result was a rejection of the Conservatives rather than a ringing endorsement of Labour. Indeed, while it won an overwhelming majority of seats, the roots of Labour’s win were not deep. A large parliamentary majority, which gave the party two-thirds of the seats, was achieved with just one-third of the votes on a very low turnout slightly below 60%. It is not the lowest turnout of recent times – that occurred in 2001 when it dropped to 59.4% – but things are different today. In 2001, the electorate voted for an incumbent government and was expecting more of the same. In 2024, however, the electorate voted for change, and it matters whether it was simply voting against the previous government as a protest or in favour of the alternatives on offer.

Without wishing to strike a discordant note after one of the least popular governments of modern times has been banished into history, the narrow foundations of Labour’s win do matter. Although Labour appears to have a strong mandate, which many advocate as a reason to set out bold policy prescriptions, unpopular measures will simply encourage those who sat out the election last week to vote against them next time around. And there is no guarantee that Reform UK and the Conservatives will split the vote as they did on 4 July. Indeed, the combined vote of the Tories and Reform UK was larger than that of Labour.

This makes it all the more imperative that Starmer’s government gets the big things right quickly. Making voters’ lives better is the one thing that will raise the chances of a second term in office – a second term that will undoubtedly be required to properly fix many of the things in the economy that require improvement. At least the new government comprises members who share the experiences of the people they represent. For example, only 4% of the cabinet was educated at a private school vs. 63% of the previous one. If accusations of being out of touch plagued the Conservatives, it is not an accusation that can so easily be levelled at Labour.

What next for the Conservatives?

After a chastening defeat, which produced the worst result by the Conservatives since their foundation in 1834 and the worst by either of the two main parties since 1931 (when Labour won just 52 seats), a period of soul-searching is in order. Not only does the party need a new leader following the resignation of Rishi Sunak, it needs to decide what it stands for. The party has become increasingly out of touch since the Brexit referendum in 2016, burning through five prime ministers and spending more time pandering to right-wing MPs than listening to what voters want. It failed to improve public services – indeed their deterioration can be traced back to the austerity policy introduced by George Osborne in 2010; it failed to reach its immigration targets; and it failed to make Brexit work.

At least the more reflective MPs recognised that fact as they trooped out of office yesterday (Sunak and Chancellor Jeremy Hunt among them). But is that recognition shared by the 172,000 members of the Conservative Party, who will be responsible for choosing the next leader? The Tories made a mistake in tacking to the right after their defeat by Blair’s Labour Party in 1997, a move which helped keep them out of office for 13 years. Although circumstances are different today, the general view is that elections are won from the centre ground. A tie-up with Nigel Farage, as proposed by many excitable political commentators recently, would probably be a mistake. If Labour are smart (and they are), they will know that reducing NHS waiting lists and improving the quality of public services will draw the sting out of the immigration debate. The Tories would be well advised not to go too far down that path.

The fate of the smaller parties

The Liberal Democrats returned after three drubbings to record their best performance in terms of seats since 1923 (72). The Greens outperformed expectations to win four seats in parliament – a record for them – while Reform UK came from nowhere, grabbing the headlines with five seats and a 14.3% vote share. This was largely down to the charisma of Nigel Farage – love him or loathe him, he knows how to whip up the populist vote. Farage and his band of fellow travellers will be noisy and consume a lot of political oxygen in the months ahead. They are too small to be politically decisive, but they will have an influence at the margin by shaping the debate in parts of the Tory party as it ponders its future.

The SNP had a bad day in Scotland, going from the dominant political force holding 48 of the country’s 59 seats in 2019 to just 9 of 57 today. This is the result of many domestic factors, including allegations of corruption at the top of the party, but the truth is that independence is no longer the burning issue it was a decade ago. This will at least make Starmer’s job a bit easier as he will no longer have to contend with demands for an independence referendum for the foreseeable future.

Stacked in-tray: What to do?

Aside from the high-profile issues of the NHS and overcrowded prisons, which Starmer mentioned in his first press conference yesterday, reform of the social care, welfare and benefit systems is an area where the government will have to act quickly. It has long been recognised that the rollout of the Universal Credit system has been plagued with difficulties, particularly as people migrate from legacy benefits to the new system. Access to welfare benefits is increasingly wrapped up in red tape as claimants are subject to conditionality requirements, while there are mounting problems in accessing disability benefits as regulatory changes are introduced. In 2019 I advocated reducing the taper rate on Universal Credit as a gesture of goodwill to those voters who lent their votes to the Tories (a reduction which, in fairness, was introduced in 2022, though more can be done here), and reducing the time between claiming benefits and receiving payments. If the government wants to improve the lot of the poorest in society, there are low-cost wins to be had.

Final thoughts

As parts of Europe swing to the right of the political spectrum, notably France which goes to the polls today, the European landscape will become more fractured. As a result the UK may stand out as a beacon of stability after a tumultuous few years. That does not mean that the UK should expect a huge wave of foreign investment immediately but it may at the margin become less unattractive vis-à-vis other EU markets. Building some bridges back to the EU will definitely help.

Undoubtedly, the new government will have to prioritise its policy agenda, and it says that one of its primary tasks is to boost growth. In truth, this will be hard to achieve – so many of the factors which bear on economic performance are outside its control. Having made few tangible economic promises, the government will find it difficult to underdeliver, but that is not enough – voters want a bit of stability and a return of the feelgood factor. Don’t we all?