Showing posts with label football. Show all posts

Sunday, 13 October 2024

The table doesn't lie (and other football tales)

It is quite some time since I last made a foray into any football related matters but I was motivated to revisit the application of statistical techniques to predict match outcomes after reading Ian Graham’s fascinating book How to Win The Premier League. Graham was previously head of data analytics at Liverpool FC and he described in detail how the collection of data and its use in tracking player performance and value made a significant contribution to Liverpool’s resurgence as a footballing force under Jürgen Klopp. This culminated in the club winning the Champions League in 2019, and in 2020 it secured its first league title in 30 years.

One area of the book that piqued my interest was the discussion of how a couple of academics[1] in the 1990s set out to assess whether it was possible to predict the outcome of football matches in order to beat the odds offered by bookmakers (answer: yes, but only in certain circumstances). The so-called Dixon-Coles model (no relation) used a Poisson distribution to estimate the number of goals scored and conceded by each team in a head-to-head match. This was particularly interesting to me because I used a similar approach when trying to assess who would win the World Cup in 2022 and I was motivated to see whether I could improve the code to extend the analysis to Premier League football. You may ask why an economist would want to do that. One reason is that it satisfies my long-standing interest in statistics and growing interest in data science methods, and with time on my hands during the summer holiday, why not? It also provided an opportunity to use ChatGPT to fill in any gaps in my coding knowledge in order to assess whether it really is the game changer for the coding industry that people say (my experience in this regard was very positive).

How can we model match outcomes?

Turning to our problem, the number of goals scored by each team is a discrete number (i.e. we cannot have fractions of goals) so we have to use a discrete probability distribution and it turns out that the Poisson distribution does this job rather well. As Ian Graham put it, “the Poisson distribution, also known as the ‘Law of Small Numbers’, governs the statistics of rare events”. And when you think about it, goals are rare events in the context of all the action that takes place on a football pitch. In aggregate the number of goals scored across all teams does indeed follow a Poisson distribution (see above). However, since the sample size for each team is much smaller (each team plays only 19 home and 19 away games), this increases the likelihood of deviations from the expected average.

That caveat aside, following the literature, if a variable follows the Poisson distribution, it is defined by a probability mass function as set out below:
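For a Poisson variable with mean λ, the standard form is P(X = k) = λ^k e^(-λ) / k! for k = 0, 1, 2, and so on. As a quick numerical illustration, a minimal Python sketch follows. The function name poisson_pmf and the example value of λ = 1.5 are assumptions made for illustration and are not taken from the model itself.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Illustrative value only: a team expected to score 1.5 goals per game.
lam = 1.5
for goals in range(5):
    print(f"P({goals} goals) = {poisson_pmf(goals, lam):.3f}")
```

The probabilities over all k sum to one, so the printed values show how quickly the chance of a high score tails off.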

In this instance we are trying to find the probability that a team scores k goals given that we know λ. But what do we know about λ – the expected number of goals each team scores per game? One piece of information is that teams will generally score more goals per game at home than in an away match, and typically concede fewer goals at home (and vice versa). We also know that the number of goals each team scores depends on the quality of the team’s attacking players as well as the quality of the opposition defence. Intuitively, it makes sense to define λ as a weighted average of the expected goals scored (a proxy for quality of a team’s attack) and the expected number of goals conceded by the opposition as an indication of their defensive qualities. I assumed weights of 0.5 for each. We calculate this for home games and away games, using the previous season’s average as a starting point. As a result, for each team, λ can take one of four values, depending on expected goals scored home and away, and the opposition’s defensive performance home and away.
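The weighted average can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the results: the class TeamAverages, the function names and all the goal averages are hypothetical. Only the 0.5 weights come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TeamAverages:
    """Previous-season goals per game (figures used below are invented)."""
    scored_home: float
    conceded_home: float
    scored_away: float
    conceded_away: float


def expected_goals(attack: float, opposition_defence: float, weight: float = 0.5) -> float:
    """Weighted average of a team's scoring rate and the opposition's conceding rate."""
    return weight * attack + (1.0 - weight) * opposition_defence


def match_lambdas(home: TeamAverages, away: TeamAverages) -> Tuple[float, float]:
    """Return (lambda for the home side, lambda for the away side) for one fixture."""
    lam_home = expected_goals(home.scored_home, away.conceded_away)
    lam_away = expected_goals(away.scored_away, home.conceded_home)
    return lam_home, lam_away


# Hypothetical teams, for illustration only.
team_a = TeamAverages(scored_home=2.0, conceded_home=0.8, scored_away=1.6, conceded_away=1.2)
team_b = TeamAverages(scored_home=1.2, conceded_home=1.4, scored_away=0.9, conceded_away=1.8)

lam_home, lam_away = match_lambdas(team_a, team_b)
print(f"lambda home = {lam_home:.2f}, lambda away = {lam_away:.2f}")
```

Swapping the order of the arguments to match_lambdas gives the values for the reverse fixture.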

By simulating each game 1000 times we can derive an estimate of what the average result is likely to be and construct the expected league rankings. To add a little spice, we can adjust λ by adding a random number in the range [-n, n], where 0 < n < 1, in a bid to capture the element of luck inherent in any sporting contest, on the basis that over the course of a number of games this element averages out to zero. One thing to take into account is that promoted teams will not perform as strongly in the Premier League as they did in the Championship, with the evidence suggesting that promoted teams score around 0.7 goals per game fewer and concede 0.7 goals per game more. We thus amend the λ value for promoted teams accordingly. It is possible to add a number of other tweaks[2] but for the purposes of this exercise, this is a good starting point. We construct λ as follows:
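A minimal Python sketch of this construction and of the repeated simulation might look like the following. Several details are assumptions made for illustration: the promoted-team shift is applied to the goal averages before weighting, λ is floored at a small positive number after the luck term is added, n is set to 0.2, and every team figure is invented. The Poisson draws use Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson variate (Knuth's method, adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def promoted_adjustment(scored: float, conceded: float, shift: float = 0.7):
    """Assumed form of the adjustment: score `shift` fewer and concede `shift` more."""
    return max(scored - shift, 0.0), conceded + shift


def simulate_match(lam_home: float, lam_away: float,
                   n_sims: int = 1000, n: float = 0.2, seed: int = 0) -> dict:
    """Share of simulations ending in a home win, a draw and an away win."""
    rng = random.Random(seed)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_sims):
        # Luck term: uniform on [-n, n]; the floor keeps lambda strictly positive.
        lam_h = max(lam_home + rng.uniform(-n, n), 0.01)
        lam_a = max(lam_away + rng.uniform(-n, n), 0.01)
        goals_h = sample_poisson(lam_h, rng)
        goals_a = sample_poisson(lam_a, rng)
        if goals_h > goals_a:
            counts["home"] += 1
        elif goals_h < goals_a:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {outcome: c / n_sims for outcome, c in counts.items()}


# Hypothetical fixture: an established side at home to a promoted side.
home_scored, home_conceded = 1.8, 1.0                    # home averages, invented
away_scored, away_conceded = promoted_adjustment(1.6, 1.1)  # Championship away averages, invented

lam_home = 0.5 * home_scored + 0.5 * away_conceded
lam_away = 0.5 * away_scored + 0.5 * home_conceded
result = simulate_match(lam_home, lam_away)
print(result)
```

Fixing the seed makes the run reproducible; changing n shows how sensitive the outcome shares are to the size of the luck term.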

How well does the model predict outcomes?

I ran the analysis over the past six seasons to assess the usefulness of the model and measured each team’s expected league position against outturn. On average, the model predicts a team’s final league position to within three places 64% of the time. That’s not bad although I would have hoped for a bit better.
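One way to compute a hit rate of this kind is to count the share of teams whose predicted position is within three places of where they finished. The Python sketch below assumes that definition. The function name share_within and the five-team table are hypothetical and are not the actual results.

```python
def share_within(predicted: dict, actual: dict, tolerance: int = 3) -> float:
    """Share of teams whose predicted rank is within `tolerance` places of the outturn."""
    hits = sum(1 for team in predicted
               if abs(predicted[team] - actual[team]) <= tolerance)
    return hits / len(predicted)


# Invented positions, for illustration only.
predicted = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
actual = {"A": 2, "B": 1, "C": 7, "D": 4, "E": 9}

print(f"{share_within(predicted, actual):.0%} of teams within three places")
```

Pooling the team-by-team hits across several seasons before dividing gives a multi-season figure.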

Excluding the Covid-impacted season 2019-20, when the absence of crowds had a significant impact on expected results, raises that slightly to 66%. Ironically, it is season 2022-23 which imparts a serious downward bias to results – excluding that season raises the accuracy rate to 69%. A popular cliche associated with football is that the table doesn't lie. What we have discovered here is either that it does, or that the model needs a bit of improvement. At some point I will try to incorporate a dynamic estimate of λ to give a higher weight to more recent performance, but that is a topic for another day.

A model which predicts the past is all very well, but what about the future? Looking at expected outcomes for season 2024-25, prior to the start of the season, the model predicted that Manchester City would win the league. This is hardly a controversial choice – it has won in six of the last seven years – but in contrast to previous years, the model assigned only a 48% probability to a City title win, compared to an average of 74% over the preceding six seasons. Arsenal ran them close with a 45% chance of winning the league. At the other end of the table, Southampton, Nottingham Forest and Ipswich were the primary relegation candidates (probabilities of 38%, 39% and 67% respectively). Running the analysis to include data through last weekend, covering the first seven games of the season, it is still nip and tuck at the top with Manchester City and Arsenal maintaining a 48% and 44% probability, respectively, of winning the title. But at the bottom, Wolverhampton, Ipswich and Southampton are the three relegation candidates with assigned probabilities of 58%, 70% and 72% respectively.
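Probabilities of this kind can be read as the share of simulated seasons in which a team finishes top. The Python sketch below is a simplified stand-in rather than the full model. The four team names and their goal averages are invented, the luck term and promoted-team adjustment are left out, and ties on points and goal difference are broken by the order in which teams are listed.

```python
import math
import random
from collections import Counter
from itertools import permutations


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson variate (Knuth's method, adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_season(rates: dict, rng: random.Random) -> str:
    """Play every home-and-away fixture once and return the champion."""
    points = {team: 0 for team in rates}
    goal_diff = {team: 0 for team in rates}
    for home, away in permutations(rates, 2):  # each ordered pair is one fixture
        scored_h, conceded_h, _, _ = rates[home]
        _, _, scored_a, conceded_a = rates[away]
        goals_h = sample_poisson(0.5 * scored_h + 0.5 * conceded_a, rng)
        goals_a = sample_poisson(0.5 * scored_a + 0.5 * conceded_h, rng)
        goal_diff[home] += goals_h - goals_a
        goal_diff[away] += goals_a - goals_h
        if goals_h > goals_a:
            points[home] += 3
        elif goals_h < goals_a:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Rank on points, then goal difference; remaining ties go to the first listed team.
    return max(rates, key=lambda team: (points[team], goal_diff[team]))


def title_probabilities(rates: dict, n_seasons: int = 1000, seed: int = 0) -> dict:
    """Share of simulated seasons won by each team."""
    rng = random.Random(seed)
    winners = Counter(simulate_season(rates, rng) for _ in range(n_seasons))
    return {team: winners[team] / n_seasons for team in rates}


# team -> (scored home, conceded home, scored away, conceded away); invented figures.
rates = {
    "Alpha": (2.2, 0.7, 1.8, 1.0),
    "Beta": (1.6, 1.0, 1.2, 1.4),
    "Gamma": (1.3, 1.3, 1.0, 1.6),
    "Delta": (0.9, 1.8, 0.6, 2.2),
}

probs = title_probabilities(rates)
for team, p in probs.items():
    print(f"{team}: {p:.1%}")
```

Relegation probabilities follow in the same way by counting how often a team finishes in the bottom places instead of the top one.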


What is the point of it all?

As an economist who spends time looking at macro trends, it may seem difficult to justify such an apparently trivial pursuit as predicting football outcomes. But there are some lessons that we can carry over into the field of macro and market forecasting. In the first instance, the outcomes of football matches involve inherent uncertainty and the application of stochastic simulation techniques such as those applied here can equally be used to account for randomness and uncertainty in economic and financial predictions. Indeed they are often used in the construction of error bands around forecast outcomes. A second application is to demonstrate that probabilistic forecasting is a viable statistical technique to use when faced with a priori uncertainty. We do not know the outcome of a football match in advance any more than we know what will be the closing price of the stock market on any given day. But simulating past performance can give us a guide as to the possible range of outcomes. A third justification is to demonstrate the impact of scenario analysis: by changing model parameters we are able to generate different outcomes and with modern computing techniques rendering the cost of such analysis to be trivially small, it is possible to run a huge number of different alternatives to assess model sensitivities.

Forecasting often throws up surprises and we will never be right 100% of the time. If you can come up with a system that predicts the outcome significantly better than 50% of the time, you are on the right track. Sometimes the outlandish does happen and I am hoping that the model’s current prediction that Newcastle United can finish fourth in the Premier League (probability: 37%) will be one such case.



[1] Dixon, M. J. and S. G. Coles (1997) ‘Modelling Association Football Scores and Inefficiencies in the Football Betting Market’ Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 46, Issue 2, 265–280

[2] For example, if we want to model the overall points outcome, a better approximation to historical outturns is derived if we adjust λ by a multiplicative constant designed to reduce it relative to the mean for teams where it is very high and raise it for those where it is low. This is a way of accounting for the fact that a team which performed well in the previous season may not perform so well in the current season, and teams which performed badly may do rather better. This does not have a material impact on the ranking but does reduce the dispersion of points between the highest and lowest ranking teams in line with realised performance. Another tweak is to adjust the random variable to follow a Gaussian distribution rather than the standard uniform assumption.

Wednesday, 21 April 2021

The not-so-super league

Regular readers will know that I have a long-standing interest in football (or soccer as American readers know it), partly driven by the extent to which it is an area ripe for economic analysis. The recent attempt by 12 of Europe’s top football clubs to join the breakaway European Super League (ESL) in opposition to the Champions League is thus a fascinating topic, as well as a major sporting/cultural issue. As I started writing this piece the news came through that all six of the English clubs which signed up have pulled out with two Spanish clubs reportedly considering following suit. The project thus seems destined to collapse - a conclusion I came to in my original (non-published) post. But by shining a light on the reasons for the collapse we can illuminate more clearly some important aspects of the economics of football.

The project has been heavily criticised for a number of reasons – the most common being that it reflects greed on the part of the owners who wish to maximise their income irrespective of the consequences for grassroots football (including the women’s game which is now gaining traction across Europe). It is indeed notable that no German clubs signed up, which may have a lot to do with the ownership structure (the 50+1 rule which gives fans majority voting rights). Support for the ESL appears to be confined to the board room as players past and present, football administrators from across the continent and, most importantly, fans lined up to condemn the idea. Obviously UEFA was not pleased that some of the continent’s best-known clubs were planning an alternative to its money-spinning Champions League competition and threatened the imposition of retaliatory sanctions. But there are a lot of issues at play here, not to mention a nice line in hypocrisy from many of those in football who have suddenly discovered an interest in the welfare of fans.

The financial angle

Looking first at the finances of the G12 (or the dirty dozen), 11 of them occupy the top 14 places in the annual Deloitte’s Football Money League revenue ranking (the 12th is AC Milan which occupies 30th spot). But according to Swiss Ramble (one of the best commentators on European football finances), the G12 made a financial loss of £1.2 billion (€1.05 billion) in season 2019-20 before player sales were taken into account. He also calculates that they owe a “staggering” €7.4 billion of debt (chart) on €5.59 billion of revenue (my calculations), implying a debt-to-income ratio of 132%. On that basis it is not difficult to understand why they are keen to take part in a competition which increases their revenue stream, particularly in the wake of Covid which has had a dramatic effect on finances.

But whilst the elite clubs have the option of being able to join a super league which protects their revenue stream, most do not and the enforced absence of spectators since March 2020 has had a major impact on their revenues. Even before Covid struck, the finances of English Premier League (EPL) teams were shaky. The top teams in England have generated a huge rise in income over the last 30 years thanks to the money pumped in by TV companies keen to secure the broadcasting rights. Some of this has been used to fund the construction of more modern stadiums fit for the 21st century but to a large extent it has ended up in the pockets of players with wages making up an average of 65% of clubs’ income.

The finances of teams lower down the pyramid have not kept pace as the gap between the rich and poor continues to widen. We should not kid ourselves that top-level football is an altruistic institution with clubs at the top looking out for those lower down the scale. In 2019 Bury FC, one of the oldest professional clubs in England, was forced into bankruptcy over a debt of less than £2 million. The two EPL clubs closest to Bury, Manchester United and Manchester City, have a combined weekly wage bill of over £6 million. Indeed, for all the outrage generated by the EPL over the breakaway league, we should not forget that the EPL itself was formed in 1992 as a breakaway from the Football League (the administrator of league football in England) to allow clubs to maximise revenue from the sale of TV rights and sponsorship arrangements. As I noted in this post, its record on financial probity is spotty and it cannot be said to be looking after the interests of fans.

Business or sport?

The advent of the ESL is perhaps an inevitable consequence of allowing the big clubs to grab an ever larger slice of the pie. This prescient film clip from 1994 accurately foreshadowed the consequences for the sport of allowing TV to call the shots, with participants in the documentary predicting with uncanny accuracy how little the voice of the fans would count in the brave new footballing world (although the extent of fans’ discontent did clearly convince club owners that the ESL was a step too far).

Football is not a conventional business and it is therefore difficult to ascribe standard business practices. I have long characterised football as operating in an imperfect oligopolistic market in which the products are differentiated by branding and where there are significant barriers to entry. Matters are made more complex by the fact that it is a product which has global appeal but is rooted in domestic structures. This makes valuing the ESL a difficult prospect. However, I would argue that the owners of elite clubs have miscalculated the value of their brand and arguably they do not understand the underpinnings of their industry.

This is reinforced by the findings of Peter Sloane, one of the pioneers in the economics of football who has been studying the area since the early-1970s. In a paper published in 2015[1] he noted that there are significant differences in the conduct of North American and European team sports management: “While it is assumed by most protagonists in North America that clubs attempt to maximise profits, in Europe the most common assumption is the maximisation of playing success subject to a break-even constraint.” Sloane went on to point out that “North American leagues are closed to new entrants through the granting of exclusive territorial rights, though with allowance for some franchise mobility, whereas in Europe leagues are open to entry through a system of promotion and relegation.” It is notable that three of the English clubs signing up to the ESL are owned by Americans and arguably they made a mistake by applying the American model in the wrong setting.

Sloane touches on another interesting point: although there may appear to be little solidarity between clubs in the same league, “mutual inter-dependence is generally regarded as a sine qua non of professional sporting leagues.” An inherent paradox of competition is that while clubs strive for playing success at the expense of the opposition, they each have an interest in the survival of rivals as they require healthy teams to play against. A revenue sharing structure, rather than a profit maximising model, best achieves this by ensuring that smaller clubs can receive additional revenue to buy better players in order to improve their performance, thereby raising the quality of the product. Although the ESL model does allow for revenue sharing, it only does so for elite clubs already in the clique. The uncertainty of result required to ensure continued consumer interest is correspondingly reduced. Accordingly, a successful league can best be described as a joint venture between the administrative body which sets the overarching competitive framework and the clubs which operate as independent entities within it. Changing this fragile balance will lead to system failure.

The future

The attempt to form the ESL is not the first time that big clubs have tried to increase their revenue at the expense of smaller clubs and it is unlikely to be the last. As I have pointed out before, if football is allowed to be conducted along casino capitalism lines with light-touch self-regulation it is inevitable that the more powerful will try to assert their market power. In its 2019 election manifesto, the Conservative Party promised to “set up a fan-led review of football governance, which will include consideration of the Owners and Directors Test.” There have also been calls to set up an independent regulator to oversee governance of the sport although past performance suggests that they generally tend to be toothless bodies.

However, football finances do clearly need to be overhauled. Some of the lessons football can learn from US team sports are the introduction of wage caps, restrictions on transfer fees and restrictions on stock market flotation. There is also a case for limiting the amount of debt that clubs are able to carry. Over the last 30 years football has failed to reform itself. Maybe it is time to impose reform upon it.


[1] Sloane (2015) ‘The Economics of Professional Football Revisited’, Scottish Journal of Political Economy 62(1), 1-7