
Sunday, 13 October 2024

The table doesn't lie (and other football tales)

It is quite some time since I last made a foray into football-related matters, but I was motivated to revisit the application of statistical techniques to predicting match outcomes after reading Ian Graham’s fascinating book How to Win The Premier League. Graham was previously head of data analytics at Liverpool FC, and he describes in detail how the collection of data, and its use in tracking player performance and value, made a significant contribution to Liverpool’s resurgence as a footballing force under Jürgen Klopp. This culminated in the club winning the Champions League in 2019 and, in 2020, securing its first league title in 30 years.

One area of the book that piqued my interest was the discussion of how a couple of academics[1] in the 1990s set out to assess whether it was possible to predict the outcome of football matches in order to beat the odds offered by bookmakers (answer: yes, but only in certain circumstances). The so-called Dixon-Coles model (no relation) used a Poisson distribution to estimate the number of goals scored and conceded by each team in a head-to-head match. This was particularly interesting to me because I used a similar approach when trying to assess who would win the World Cup in 2022, and I was motivated to see whether I could improve the code to extend the analysis to Premier League football. You may ask why an economist would want to do that. One reason is that it satisfies my long-standing interest in statistics and growing interest in data science methods, and with time on my hands during the summer holiday, why not? It also provided an opportunity to use ChatGPT to fill in any gaps in my coding knowledge, in order to assess whether it really is the game changer for the coding industry that people say it is (my experience in this regard was very positive).

How can we model match outcomes?

Turning to our problem, the number of goals scored by each team is a discrete number (i.e. we cannot have fractions of goals) so we have to use a discrete probability distribution and it turns out that the Poisson distribution does this job rather well. As Ian Graham put it, “the Poisson distribution, also known as the ‘Law of Small Numbers’, governs the statistics of rare events”. And when you think about it, goals are rare events in the context of all the action that takes place on a football pitch. In aggregate the number of goals scored across all teams does indeed follow a Poisson distribution (see above). However, since the sample size for each team is much smaller (each team plays only 19 home and 19 away games), this increases the likelihood of deviations from the expected average.

That caveat aside, and following the literature, a variable which follows the Poisson distribution is defined by the probability mass function set out below:

P(X = k) = λ^k e^(−λ) / k!,   for k = 0, 1, 2, …

where λ is the expected number of events – in our case, goals – per game.

In this instance we want the probability of each value of k, given that we know λ – the expected number of goals each team scores per game. But what do we know about λ? One piece of information is that teams generally score more goals per game at home than in an away match, and typically concede fewer goals at home (and vice versa). We also know that the number of goals each team scores depends on the quality of its attacking players as well as the quality of the opposition defence. Intuitively, it makes sense to define λ as a weighted average of a team’s expected goals scored (a proxy for the quality of its attack) and the expected number of goals conceded by the opposition (an indication of its defensive qualities). I assumed weights of 0.5 for each. We calculate this for home games and away games, using the previous season’s averages as a starting point. As a result, for each team λ can take one of four values, depending on expected goals scored home and away, and the opposition’s defensive performance home and away.
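To illustrate with purely hypothetical numbers, suppose a team scored an average of 2.0 goals per home game last season, while its opponent conceded an average of 1.2 goals per away game. The weighted average gives λ = 0.5 × 2.0 + 0.5 × 1.2 = 1.6, and the probability that the home side scores exactly two goals in that fixture is then P(X = 2) = 1.6² e^(−1.6) / 2! ≈ 0.26.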

By simulating each game 1,000 times we can derive an estimate of the likely average result and construct the expected league rankings. To add a little spice, we can adjust λ by adding a random number drawn from the range [−n, n], where 0 < n < 1, in a bid to capture the element of luck inherent in any sporting contest, on the basis that over the course of a number of games this element averages out to zero. One thing to take into account is that promoted teams will not perform as strongly in the Premier League as they did in the Championship, with the evidence suggesting that promoted teams score around 0.7 goals per game fewer and concede around 0.7 goals per game more. We thus amend the λ value for promoted teams accordingly. It is possible to add a number of other tweaks[2] but for the purposes of this exercise this is a good starting point. Putting this together, we construct λ for each fixture as follows:

λ (home team) = 0.5 × (home team’s average goals scored at home) + 0.5 × (away team’s average goals conceded away)
λ (away team) = 0.5 × (away team’s average goals scored away) + 0.5 × (home team’s average goals conceded at home)

with the averages taken from the previous season, adjusted for promoted teams and perturbed by the random luck term.
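The post does not reproduce the underlying code, but purely as an illustrative sketch of the approach described above – with made-up team names, scoring averages and luck parameter rather than the actual inputs used – a Poisson-based season simulation might look something like this:

```python
import random
import numpy as np

def expected_goals(attack_avg, opp_defence_avg, luck=0.2):
    """Lambda as a 50/50 weighted average of a team's scoring average and the
    opposition's conceding average, plus a uniform 'luck' term with mean zero."""
    lam = 0.5 * attack_avg + 0.5 * opp_defence_avg + random.uniform(-luck, luck)
    return max(lam, 0.05)  # keep the Poisson mean positive

def simulate_season(teams, n_sims=1000):
    """teams maps a name to its previous-season per-game averages (already
    adjusted by +/-0.7 goals for promoted sides). Returns mean points per team."""
    points = {t: 0.0 for t in teams}
    for _ in range(n_sims):
        for home in teams:
            for away in teams:
                if home == away:
                    continue
                lam_h = expected_goals(teams[home]["scored_home"], teams[away]["conceded_away"])
                lam_a = expected_goals(teams[away]["scored_away"], teams[home]["conceded_home"])
                goals_h, goals_a = np.random.poisson(lam_h), np.random.poisson(lam_a)
                if goals_h > goals_a:
                    points[home] += 3
                elif goals_a > goals_h:
                    points[away] += 3
                else:
                    points[home] += 1
                    points[away] += 1
    return {t: p / n_sims for t, p in points.items()}

# Hypothetical previous-season averages (illustrative only)
teams = {
    "Team A": {"scored_home": 2.3, "scored_away": 1.9, "conceded_home": 0.9, "conceded_away": 1.1},
    "Team B": {"scored_home": 1.6, "scored_away": 1.2, "conceded_home": 1.3, "conceded_away": 1.6},
    "Team C": {"scored_home": 1.1, "scored_away": 0.9, "conceded_home": 1.7, "conceded_away": 2.0},
}
for rank, (team, pts) in enumerate(sorted(simulate_season(teams).items(),
                                          key=lambda kv: kv[1], reverse=True), start=1):
    print(f"{rank}. {team}: {pts:.1f} expected points")
```

The same simulated seasons also yield probabilities for each finishing position – for example, the share of simulations in which a team finishes first is its estimated probability of winning the title.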

How well does the model predict outcomes?

I ran the analysis over the past six seasons to assess the usefulness of the model, measuring each team’s expected league position against the outturn. On average, the model predicts each team’s final league position to within three places 64% of the time. That’s not bad, although I would have hoped for a bit better.

Excluding the Covid-impacted season 2019-20, when the absence of crowds had a significant impact on results, raises that slightly to 66%. Ironically, it is season 2022-23 which imparts a serious downward bias to the results – excluding that season raises the accuracy rate to 69%. A popular cliché associated with football is that the table doesn't lie. What we have discovered here is either that it does, or that the model needs a bit of improvement. At some point I will try to incorporate a dynamic estimate of λ which gives a higher weight to more recent performance, but that is a topic for another day.
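For what it’s worth, the “within three places” accuracy measure used above is straightforward to compute; the sketch below uses hypothetical predicted and actual positions purely to show the metric:

```python
def within_k_accuracy(predicted, actual, k=3):
    """Share of teams whose predicted league position is within k places of the outturn."""
    hits = sum(1 for team, pos in predicted.items() if abs(pos - actual[team]) <= k)
    return hits / len(predicted)

# Hypothetical rankings for illustration
predicted = {"Team A": 1, "Team B": 5, "Team C": 12}
actual = {"Team A": 2, "Team B": 9, "Team C": 13}
print(within_k_accuracy(predicted, actual))  # 0.67 – two of the three teams are within three places
```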

A model which predicts the past is all very well, but what about the future? Looking at expected outcomes for season 2024-25, prior to the start of the season, the model predicted that Manchester City would win the league. This is hardly a controversial choice – it has won in six of the last seven years – but in contrast to previous years, the model assigned only a 48% probability to a City title win, compared to an average of 74% over the preceding six seasons. Arsenal ran them close with a 45% chance of winning the league. At the other end of the table, Southampton, Nottingham Forest and Ipswich were the primary relegation candidates (probabilities of 38%, 39% and 67% respectively). Running the analysis to include data through last weekend, covering the first seven games of the season, it is still nip and tuck at the top with Manchester City and Arsenal maintaining a 48% and 44% probability, respectively, of winning the title. But at the bottom, Wolverhampton, Ipswich and Southampton are the three relegation candidates with assigned probabilities of 58%, 70% and 72% respectively.


What is the point of it all?

For an economist who spends time looking at macro trends, such an apparently trivial pursuit as predicting football outcomes may seem difficult to justify. But there are some lessons that we can carry over into the field of macro and market forecasting. In the first instance, the outcomes of football matches involve inherent uncertainty, and stochastic simulation techniques such as those applied here can equally be used to account for randomness and uncertainty in economic and financial predictions – indeed, they are often used to construct error bands around forecast outcomes. A second application is to demonstrate that probabilistic forecasting is a viable statistical technique when faced with a priori uncertainty. We do not know the outcome of a football match in advance any more than we know the closing price of the stock market on any given day, but simulating past performance can give us a guide to the possible range of outcomes. A third justification is to demonstrate the value of scenario analysis: by changing model parameters we can generate different outcomes, and with modern computing rendering the cost of such analysis trivially small, it is possible to run a huge number of alternatives to assess model sensitivities.
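As a purely illustrative sketch of that first point – not the model used in this post – the same Monte Carlo logic can generate error bands around a central economic forecast by simulating many paths and reading off percentiles (the growth and volatility figures below are placeholders):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical quarterly GDP growth: 0.3% central case with 0.4pp of noise
n_paths, horizon = 10_000, 8
growth = rng.normal(loc=0.3, scale=0.4, size=(n_paths, horizon))
paths = 100 * np.cumprod(1 + growth / 100, axis=1)  # index-level paths starting from 100

# Percentile bands analogous to a forecast fan chart
for pct, band in zip((10, 30, 50, 70, 90), np.percentile(paths, [10, 30, 50, 70, 90], axis=0)):
    print(f"{pct}th percentile after {horizon} quarters: {band[-1]:.1f}")
```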

Forecasting often throws up surprises and we will never be right 100% of the time. If you can come up with a system that predicts the outcome significantly better than 50% of the time, you are on the right track. Sometimes the outlandish does happen and I am hoping that the model’s current prediction that Newcastle United can finish fourth in the Premiership (probability: 37%) will be one of them.



[1] Dixon, M. J. and S. G. Coles (1997), ‘Modelling Association Football Scores and Inefficiencies in the Football Betting Market’, Journal of the Royal Statistical Society Series C: Applied Statistics, 46(2), pp. 265–280

[2] For example, if we want to model the overall points outcome, a better approximation to historical outturns is derived if we adjust λ by a multiplicative constant designed to reduce it relative to the mean for teams where it is very high and raise it for those where it is low. This is a way of accounting for the fact that a team which performed well in the previous season may not perform so well in the current season, and teams which performed badly may do rather better. This does not have a material impact on the ranking but does reduce the dispersion of points between the highest and lowest ranking teams in line with realised performance. Another tweak is to adjust the random variable to follow a Gaussian distribution rather than the standard uniform assumption.

Monday, 11 May 2020

The limits of modelling


The British government has made it clear throughout the Covid-19 crisis that it has been “following the science.” But at this relatively early stage of our understanding of the disease there is no single body of knowledge to draw on. There is a lot that epidemiologists agree on, but there are also areas where they do not. Moreover, the science upon which the UK lockdown is based is derived from a paper published almost two months ago, when our understanding of Covid was rather different from what it is now. I was thus fascinated by this BBC report by medical editor Deborah Cohen, who posed questions about the current strategy and interviewed experts in the field who expressed some reservations about how the facts have been reported. Whilst the report gave an interesting insight into epidemiology, it also reminded me of the criticism directed at economic forecasting.

One of the most interesting issues to arise out of the discussion was the use of models to track the progression of the disease. The epidemiologists quoted were unanimous in their view that models were only useful if backed up by data. As Dame Deirdre Hine, the author of a report on the 2009 H1N1 pandemic, pointed out, models are not always useful in the early stages of a pandemic given the lack of data upon which they are based. She further noted that “politicians and the public are often dazzled by the possibilities that modelling affords” and that models often “overstate the possibilities of deaths in the early stages” of a pandemic due to a lack of data. As Hine pointed out, epidemiological models only start to become useful once we implement a thorough programme of tracing and tracking people’s contacts, for only then can we start to get a decent handle on the spread of any disease.

This approach has great parallels with empirical macroeconomics where many of the mainstream models used for analytical purposes are not necessarily congruent with the data. Former member of the Bank of England Monetary Policy Committee Danny Blanchflower gave a speech on precisely this topic back in 2007 with the striking title The Economics of Walking About. The objective of Blanchflower’s speech was to encourage policymakers to look at what is going on around them rather than uncritically accept the outcomes derived from a predetermined set of ideas, and to put “the data before the theory where this seems warranted.”

I have always thought this to be very sensible advice, particularly where DSGE models are used for forecasting purposes. These models are theoretical constructs based on a particular economic structure, resting on a number of assumptions whose existence in the real world is subject to question (Calvo pricing and rational expectations, to name but two). Just as in epidemiology, models which are not consistent with the data do not have a good forecasting record. In fact, economic models do not have a great track record, full stop. But we are still forced to rely on them because the alternative is either not to provide a forecast at all, or simply to make a guess. As the statistician George Box once famously said, “all models are wrong, but some are useful.”

Epidemiologists make the point that models can be a blunt instrument which gives a false sense of security. The researchers at Imperial College whose paper formed the basis of the government’s strategy might well come up with different estimates if, instead of basing their analysis on data derived from China and Italy, they updated their results on the basis of the latest UK data. They may indeed have already done so (though I have not seen it), but this does not change the fact that the government appears to have accepted the original paper at face value. Of course, we cannot blame the researchers for the way in which the government interpreted the results. But having experienced the uncritical media acceptance of economic forecasts produced by the likes of the IMF, it is important to be aware of the limitations of model-driven results.

Another related issue pointed out by the epidemiologists is the way in which the results are communicated. For example, the government’s strategy is based on the modelled worst-case outcomes for Covid-19, but this has been criticised as misleading because it implies an event which is unlikely rather than one which is close to the centre of the distribution. The implication is that the government based its strategy on a worst-case outcome rather than on a more likely one, with the result that the damage to the economy is far greater than it needed to be. That is a highly contentious suggestion and not one I would necessarily buy into. After all, a government has a duty of care to all its citizens, and if the lives of more vulnerable members of society are saved by imposing a lockdown then it may be a price worth paying.

But it nonetheless raises a question about the way in which potential outcomes are reported. I have made the point (here) in an economics context that whilst we need to focus on the most likely outcomes (e.g. for GDP growth projections), there is a wide range of possibilities around the central case which we also need to account for. Institutions that prepare forecast fan charts recognise that there are alternatives around the central case to which we can ascribe a lower probability. Whilst the likes of the Bank of England have in the past expressed frustration that too much emphasis is placed on the central case, they would be far more concerned if the worst-case outcomes grabbed all the attention. The role of the media in reporting economic or financial outcomes does not always help. How often do we see headlines reporting that markets could fall 20% (to pick an arbitrary figure) without any discussion of the conditions necessary to produce such an outcome? The lesson is that we need to be aware of the whole range of outcomes, but to apply the appropriate weighting when reporting them.

None of this is to criticise the efforts of epidemiologists to model the spread of Covid-19. Nor is it necessarily to criticise the government’s interpretation of their work. But it does highlight the difficulties inherent in forecasting outcomes based on models using incomplete information. As Niels Bohr reputedly once said, “forecasting is hard, especially when it’s about the future.” He might have added, “but it’s impossible without accurate inputs.”