
Monday, 1 September 2025

The silly season: A numbers game

As August gives way to September and the media’s so-called “silly season” winds down, attention shifts to the return of football. With the new season having kicked off, and today’s transfer window now having closed, it can sometimes feel as though a different kind of silly season is only just beginning. Yet football is merely a reflection of our society with its passions, excesses and contradictions. The sport magnifies our tribal instincts, celebrates collective joy and exposes the widening gulf between ordinary fans and the vast sums of money that swirl around the game. It is also – as I have frequently noted – a great test bed for applying economic and statistical analysis.

Some economic reflections on this year’s big stories

While football can provoke great debate, it is just as likely to be treated with indifference by a large section of the population. But it is hard to ignore. Even the Financial Times is taking notice, with its fun online challenge (Can you run a Premier League football club?) and an article equating the decline of Manchester United with the fall of the Berlin Wall. I have long thought that the diminished prowess of the Red Devils speaks more to the industrial economics literature on why dominant firms decline. Empirical research conducted by Paul Geroski in the 1980s[1] challenged the conventional view that dominant firms inevitably decline. Paul is alas no longer with us, but he did develop a surprising affinity for football, and he would almost certainly conclude that Manchester United’s recent travails do not represent a permanent shift in the club’s fortunes.

To the sports journalist or the casual fan, the standoff between Alexander Isak and Newcastle United, in which the player refused to train with the team as he tried – eventually successfully – to force a move to Liverpool, may seem like the petulant actions of a spoiled star. To the economist, his actions represent calculated moves in a high-stakes financial negotiation. Isak’s motivation is clear: moving to one of Europe’s top clubs will propel him into the footballing elite, generating more trophies and a higher income. A footballer’s career is short, and could be ended tomorrow by injury. It is thus rational for him to attempt to maximise his income.

What about the clubs? Newcastle are about to reap the benefit of Champions League revenues, and must weigh the immediate windfall of a record-breaking player sale against his value on the pitch, where his contribution could see the team progress further in the competition, thus generating additional broadcast, matchday and win-related revenue. For their part, Liverpool must balance ambition with fiscal prudence. A blockbuster signing is no guarantee of success, and the club must be careful that a fee in the region of £130 million does not undermine their carefully maintained financial model under the Premier League’s Profit and Sustainability Rules (PSR).

Indeed, one of the features of the transfer window – the player trading period that closed today – is what it reveals about how modern football teams manage assets, revenue streams and strategic risk. The economics of football transfers are increasingly shaped not just by player ability but by contract dynamics. The Isak case is unusual because he had three years left on his contract. But when a player has only a year left on his deal, his transfer value typically falls sharply because the selling club risks losing him for free under the Bosman ruling. This shifts bargaining power towards the player and the buying club: the player can threaten to run down his contract, forcing a cut-price sale, while suitors know they can secure him on a free the following summer. As a result, clubs often face a strategic dilemma – cash in now at a reduced fee, or gamble on retaining the player’s services for another season and risk losing a valuable asset without compensation. In a financial landscape constrained by PSR regulations, these contractual time horizons are as important to balance sheets as the players’ performances on the pitch.

The real action is on the pitch

While much of the economics focuses on the finances, the action on the field lends itself to statistical analysis. Last autumn, I took a look at Premier League prospects for season 2024-25 on the basis of a Poisson simulation model[2]. How did I do? First the bad news: I gave Liverpool only an 8% chance of winning the title (they won comfortably). More positively, I correctly tipped the five clubs that would win Champions League places, predicted two of the three relegated teams and called eight of the top 10 teams. I will leave it to the reader to judge whether that was an acceptable performance.

The method has a substantial academic pedigree[3] and last year’s performance was sufficiently robust that it is worth trying again as a means to forecast outcomes for season 2025-26. As a reminder, the model simulates each game 1000 times and adjusts expected goals (λ) by adding a random number drawn from the interval (-1, 1) in a bid to capture the element of luck inherent in any sporting contest. The results are shown in the table below.
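For the curious, the simulation can be sketched in a few lines of Python. This is a minimal illustration of the approach described above, not my actual code: the λ values are invented for a toy four-team league, and the noise and scoring rules follow the description in the text.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw from a Poisson(lam) distribution using Knuth's algorithm."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1

def simulate_match(lam_home, lam_away, rng):
    """Simulate one match, perturbing each lambda with uniform noise in (-1, 1)."""
    lam_h = max(0.05, lam_home + rng.uniform(-1, 1))  # floor keeps the rate positive
    lam_a = max(0.05, lam_away + rng.uniform(-1, 1))
    return poisson_sample(lam_h, rng), poisson_sample(lam_a, rng)

def title_probability(team, lams, n_sims=1000):
    """Share of simulated double round-robin seasons won by `team`."""
    rng = random.Random(42)  # fixed seed for reproducibility
    teams = list(lams)
    titles = 0
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for home in teams:
            for away in teams:
                if home == away:
                    continue
                gh, ga = simulate_match(lams[home], lams[away], rng)
                if gh > ga:
                    points[home] += 3
                elif ga > gh:
                    points[away] += 3
                else:
                    points[home] += 1
                    points[away] += 1
        if max(points, key=points.get) == team:
            titles += 1
    return titles / n_sims

# Invented lambdas for a toy four-team league -- the real model derives
# per-team, home/away values from last season's results.
lams = {"Liverpool": 1.9, "Arsenal": 1.7, "Newcastle": 1.5, "Burnley": 0.9}
print(title_probability("Liverpool", lams))
```

Note the simplifications: a single λ per team rather than separate home and away values, and title ties broken arbitrarily. The full model works with twenty teams and per-fixture λ values.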

Even before a ball was kicked, the model suggested that Liverpool were favourites to retain their title (a 57% probability, slightly better than evens), with Arsenal, Manchester City and Newcastle making up the rest of the top 4. The unfortunate favourites for relegation are Burnley, Wolves and Sunderland (probabilities of 47.5%, 45% and 76% respectively). It is notable that the pre-season rankings generated by the model broadly accord with the bookmakers’ rankings. In the chart below, I use the bookies’ odds of achieving a top 4 finish as a proxy for teams’ relative strength. Clubs placed above the diagonal line are those for which the bookies are more optimistic than the ranking predicted by the model (take comfort, fans of Tottenham and Manchester United); for clubs placed below the diagonal, the bookmakers are less optimistic (fans of Leeds, Brentford and Bournemouth take note). Given that last year’s results form the basis of the model’s expected goals parameter (λ), and the weight of money placed with the bookmakers is heavily influenced by last year’s performance, congruence in the results should not be a great surprise.

One of the weaknesses that I tried (unsuccessfully) to address is the momentum effect. If a team starts the season well (badly), does it represent a temporary deviation from the mean or does it represent a genuine improvement (deterioration) in performance relative to last season? In an attempt to address this problem, I experimented with a dynamic estimate of λ based on an exponentially weighted average of recent performance. In this approach, the starting value for λ is last season’s average although over time this plays a diminishing role. Early random match results feed back into the calculation of subsequent expected goals (λ), and each week these noisy current-season averages are blended with last season’s stats. This repeated averaging pulls temporary leads or deficits toward the league mean, reducing persistent differences between strong and weak teams. This feedback loop compressed variation across simulated seasons, causing title probabilities to bunch up, making the league appear artificially balanced compared with a static model where team strengths remain fixed. Perhaps with a bit more time I could develop a dynamic approach that improves on the current method, but for now the fixed λ approach appears to generate a better approximation to reality.
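To make the dynamic idea concrete, the exponentially weighted update can be sketched as below. The smoothing weight alpha is an assumption for illustration; the text above does not specify the value I used.

```python
def update_lambda(lam_prev, goals_this_match, alpha=0.1):
    """Exponentially weighted update: blend the running expected-goals
    estimate with the latest observed result. The starting value is last
    season's average, whose influence decays as matches accumulate.
    alpha is an assumed smoothing weight, not a figure from the text."""
    return (1 - alpha) * lam_prev + alpha * goals_this_match

# Starting from last season's average of 1.8 goals per game,
# a 3-0 win nudges the estimate up only slightly:
lam = update_lambda(1.8, 3)  # 0.9 * 1.8 + 0.1 * 3 = 1.92
```

The feedback problem described above arises because each noisy simulated result feeds back into λ, and repeated blending pulls every team toward the league mean.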

Last word

Applying statistical methods to football outcomes is both fascinating and practically valuable because it allows us to move beyond intuition and anecdote, quantifying the uncertainty inherent in each match and across an entire season. By modelling goals, team strengths and dynamic interactions, we can simulate outcomes that would be impossible to assess reliably by eye. This not only deepens our understanding of the game’s underlying patterns but also provides actionable insights for analysts, coaches and fans. It illustrates how a combination of mathematics, probability and real-world data can illuminate the complex, dynamic and often unpredictable world of football. Nor is the approach limited to football: the same principles can be applied to virtually any competitive or stochastic system where outcomes depend on multiple interacting factors, from other sports to business forecasting, financial markets or epidemiology. In all these contexts, statistical modelling informs decision-making and helps anticipate outcomes in complex and uncertain environments. Just don’t assume that my model is going to make you rich.


[1] Geroski, P. A. and A. Jacquemin (1984) ‘Dominant firms and their alleged decline’, International Journal of Industrial Organization (2) 1, pp. 1-27

[2] The Poisson distribution – a probability distribution that describes discrete events – is commonly applied in football analytics to model and predict match outcomes because goals in a match can be thought of as rare, discrete events that occur independently over time. In this framework, each team is assumed to score goals at a constant average rate (λ), and the Poisson distribution gives the probability of scoring exactly 0, 1, 2, … goals in a match. The probability mass function for a variable following the Poisson distribution is defined as P(X = k) = (λ^k e^(−λ)) / k!, where X is the number of goals a team scores in a match and k is a specific outcome (a non-negative integer: 0, 1, 2, …). For example, k = 2 means “the team scores exactly 2 goals”.

[3] Dixon, M. J. and S. G. Coles (1997) ‘Modelling Association Football Scores and Inefficiencies in the Football Betting Market’ Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 46, Issue 2, 265–280

Sunday, 13 October 2024

The table doesn't lie (and other football tales)

It is quite some time since I last made a foray into any football-related matters, but I was motivated to revisit the application of statistical techniques to predict match outcomes after reading Ian Graham’s fascinating book How to Win The Premier League. Graham was previously head of data analytics at Liverpool FC, and he describes in detail how the collection of data and its use in tracking player performance and value made a significant contribution to Liverpool’s resurgence as a footballing force under Jürgen Klopp. This culminated in the club winning the Champions League in 2019 and, in 2020, securing its first league title in 30 years.

One area of the book that piqued my interest was the discussion of how a couple of academics[1] in the 1990s set out to assess whether it was possible to predict the outcome of football matches in order to beat the odds offered by bookmakers (answer: yes, but only in certain circumstances). The so-called Dixon-Coles model (no relation) used a Poisson distribution to estimate the number of goals scored and conceded by each team in a head-to-head match. This was particularly interesting to me because I used a similar approach when trying to assess who would win the World Cup in 2022 and I was motivated to see whether I could improve the code to extend the analysis to Premier League football. You may ask why an economist would want to do that. One reason is that it satisfies my long-standing interest in statistics and growing interest in data science methods, and with time on my hands during the summer holiday, why not? It also provided an opportunity to use ChatGPT to fill in any gaps in my coding knowledge in order to assess whether it really is the game changer for the coding industry that people say (my experience in this regard was very positive).

How can we model match outcomes?

Turning to our problem, the number of goals scored by each team is a discrete number (i.e. we cannot have fractions of goals) so we have to use a discrete probability distribution and it turns out that the Poisson distribution does this job rather well. As Ian Graham put it, “the Poisson distribution, also known as the ‘Law of Small Numbers’, governs the statistics of rare events”. And when you think about it, goals are rare events in the context of all the action that takes place on a football pitch. In aggregate the number of goals scored across all teams does indeed follow a Poisson distribution (see above). However, since the sample size for each team is much smaller (each team plays only 19 home and 19 away games), this increases the likelihood of deviations from the expected average.

That caveat aside, following the literature, if a variable follows the Poisson distribution, it is defined by a probability mass function as set out below:

P(X = k) = (λ^k e^(−λ)) / k!

In this instance we are trying to find the value of k given that we know λ. But what do we know about λ – the expected number of goals each team scores per game? One piece of information is that teams will generally score more goals per game at home than in an away match, and typically concede fewer goals at home (and vice versa). We also know that the number of goals each team scores depends on the quality of the team’s attacking players as well as the quality of the opposition defence. Intuitively, it makes sense to define λ as a weighted average of the expected goals scored (a proxy for quality of a team’s attack) and the expected number of goals conceded by the opposition as an indication of their defensive qualities. I assumed weights of 0.5 for each. We calculate this for home games and away games, using the previous season’s average as a starting point. As a result, for each team, λ can take one of four values, depending on expected goals scored home and away, and the opposition’s defensive performance home and away.
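With the probability mass function in hand and a pair of λ values, the win/draw/loss probabilities for a single fixture can be computed directly, without simulation, by treating the two teams’ goal counts as independent Poisson variables (the basic Dixon-Coles setup, without their low-score correction). The λ values below are illustrative only:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam**k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def match_outcome_probs(lam_home, lam_away, max_goals=10):
    """Home-win, draw and away-win probabilities, treating the two teams'
    goal counts as independent Poisson variables. Scorelines are
    truncated at max_goals, which loses only negligible probability mass
    for realistic lambda values."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Illustrative lambdas: a stronger home side against a weaker visitor.
home_win, draw, away_win = match_outcome_probs(1.8, 1.1)
```

Summing the grid of scoreline probabilities like this is the analytical counterpart of the repeated simulation used later to build the league table.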

By simulating each game 1000 times we can derive an estimate of what the average result is likely to be and construct the expected league rankings. To add a little spice, we can adjust λ by adding a random number drawn from the interval (-n, n), where 0 < n < 1, in a bid to capture the element of luck inherent in any sporting contest, on the basis that over the course of a number of games this element averages out to zero. One thing to take into account is that the performance of promoted teams will not be so strong in the Premier League compared to the Championship, with the evidence suggesting that promoted teams score around 0.7 goals per game fewer and concede 0.7 goals per game more. We thus amend the λ value for promoted teams accordingly. It is possible to add a number of other tweaks[2] but for the purposes of this exercise, this is a good starting point. We construct λ as follows:

λ = 0.5 × (team’s average goals scored) + 0.5 × (opposition’s average goals conceded)

calculated separately for home and away fixtures, using the previous season’s averages.
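A minimal sketch of the λ construction, using the 0.5 weights and the 0.7 goals-per-game adjustment for promoted teams described above (the input averages are invented for illustration):

```python
def expected_goals(team_scored_avg, opp_conceded_avg,
                   promoted=False, w=0.5, promo_adj=0.7):
    """Lambda as an equally weighted blend of a team's average goals
    scored and the opposition's average goals conceded (home or away
    figures, as appropriate). Promoted teams are docked 0.7 goals per
    game, as in the text; the floor avoids a non-positive Poisson rate."""
    lam = w * team_scored_avg + (1 - w) * opp_conceded_avg
    if promoted:
        lam -= promo_adj
    return max(lam, 0.05)

# Home lambda for a team averaging 2.1 goals at home against opponents
# conceding 1.3 away (illustrative figures):
lam_home = expected_goals(2.1, 1.3)  # 0.5 * 2.1 + 0.5 * 1.3 = 1.7
```

Calling this four times per team, with home/away scoring and conceding averages, yields the four λ values referred to above.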

How well does the model predict outcomes?

I ran the analysis over the past six seasons to assess the usefulness of the model, measuring each team’s expected league position against the outturn. On average, the model predicts each team’s league position to within three places with 64% accuracy. That’s not bad, although I would have hoped for a bit better.

But excluding the Covid-impacted season 2019-20, when the absence of crowds had a significant impact on expected results, we can raise that slightly to 66%. Ironically, it is season 2022-23 which imparts a serious downward bias to the results – excluding that season raises the accuracy rate to 69%. A popular cliché associated with football is that the table doesn’t lie. What we have discovered here is either that it does, or that the model needs a bit of improvement. At some point I will try to incorporate a dynamic estimate of λ to give a higher weight to more recent performance, but that is a topic for another day.

A model which predicts the past is all very well, but what about the future? Looking at expected outcomes for season 2024-25, prior to the start of the season, the model predicted that Manchester City would win the league. This is hardly a controversial choice – the club has won the title in six of the last seven years – but in contrast to previous years, the model assigned only a 48% probability to a City title win, compared to an average of 74% over the preceding six seasons. Arsenal ran them close with a 45% chance of winning the league. At the other end of the table, Southampton, Nottingham Forest and Ipswich were the primary relegation candidates (probabilities of 38%, 39% and 67% respectively). Running the analysis to include data through last weekend, covering the first seven games of the season, it is still nip and tuck at the top with Manchester City and Arsenal maintaining a 48% and 44% probability, respectively, of winning the title. But at the bottom, Wolverhampton, Ipswich and Southampton are the three relegation candidates with assigned probabilities of 58%, 70% and 72% respectively.


What is the point of it all?

For an economist who spends time looking at macro trends, it may seem difficult to justify such an apparently trivial pursuit as predicting football outcomes. But there are some lessons that we can carry over into the field of macro and market forecasting. In the first instance, the outcomes of football matches involve inherent uncertainty, and stochastic simulation techniques such as those applied here can equally be used to account for randomness and uncertainty in economic and financial predictions. Indeed, they are often used in the construction of error bands around forecast outcomes. A second application is to demonstrate that probabilistic forecasting is a viable statistical technique when faced with a priori uncertainty. We do not know the outcome of a football match in advance any more than we know the closing price of the stock market on any given day, but simulating past performance can give us a guide to the possible range of outcomes. A third justification is to demonstrate the value of scenario analysis: by changing model parameters we can generate different outcomes, and with modern computing rendering the cost of such analysis trivially small, it is possible to run a huge number of alternatives to assess model sensitivities.

Forecasting often throws up surprises and we will never be right 100% of the time. If you can come up with a system that predicts the outcome significantly better than 50% of the time, you are on the right track. Sometimes the outlandish does happen and I am hoping that the model’s current prediction that Newcastle United can finish fourth in the Premiership (probability: 37%) will be one of them.



[1] Dixon, M. J. and S. G. Coles (1997) ‘Modelling Association Football Scores and Inefficiencies in the Football Betting Market’ Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 46, Issue 2, 265–280

[2] For example, if we want to model the overall points outcome, a better approximation to historical outturns is derived if we adjust λ by a multiplicative constant designed to reduce it relative to the mean for teams where it is very high and raise it for those where it is low. This is a way of accounting for the fact that a team which performed well in the previous season may not perform so well in the current season, and teams which performed badly may do rather better. This does not have a material impact on the ranking but does reduce the dispersion of points between the highest and lowest ranking teams in line with realised performance. Another tweak is to adjust the random variable to follow a Gaussian distribution rather than the standard uniform assumption.
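One simple way to implement such a pull-toward-the-mean adjustment is sketched below. The footnote does not give the exact multiplicative form used, so the linear shrinkage and the constant c here are assumptions for illustration:

```python
def shrink_lambda(lam, league_mean, c=0.2):
    """Pull a team's lambda toward the league mean by a fraction c,
    reducing the points spread between the highest and lowest ranked
    teams. c is an assumed shrinkage constant, not a figure from the
    text; c = 0 leaves lambda unchanged, c = 1 sets it to the mean."""
    return lam + c * (league_mean - lam)

# High lambdas come down, low lambdas come up; the ordering is preserved.
lams = [2.2, 1.4, 0.9]
mean = sum(lams) / len(lams)  # 1.5
shrunk = [shrink_lambda(l, mean) for l in lams]  # [2.06, 1.42, 1.02]
```

Because the transformation is monotonic, the expected ranking is unaffected, which matches the observation in the footnote that only the dispersion of points changes.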