Eomics: Bayes theorem

Wednesday, 31 July 2019

Bayes says "no" to no-deal

One of the questions most frequently asked of me is what are the chances of a no-deal Brexit? In common with most analysts I tend to give an answer couched in subjective probability terms. This has the advantage that it is my view – nobody can say it is definitively wrong. Others may not agree with it, but they have the right to apply their own subjective assessments. However, this does strike me as a bit imprecise and led me to wonder whether it might be possible to derive a more accurate data-driven assessment of the odds of no deal.

One way to approach the problem is in terms of Bayesian statistics. In simple terms, Bayesian statistics assesses probability in terms of the degree of belief in an event. This contrasts with the more traditional frequentist school of statistics, which represents probability as the relative frequency with which an event would occur over an infinite number of repetitions of the process (I looked at this issue in more detail here). Bayesian statistics has always struck me as a sensible way to approach problems like Brexit where belief plays a big role and where the frequentist approach, which is based on the assumption that an event can be repeated, is unrealistic.

Bayesian statistics is based on Bayes’ Theorem which describes the conditional probability of an event based upon prior beliefs (or information) and which is written thus:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

This says that P(A | B), the probability of event A conditional on event B (the so-called posterior probability), depends on P(B | A), the probability of event B given that A is true (the likelihood), and on the prior probabilities of A and B.

To put this into Brexit terms, let us assume that event A is a no-deal Brexit and B is the event Remain. We are interested in the probability of a no-deal Brexit, P(A). We can rewrite this:

$$P(A) = \frac{P(A \mid B)\,P(B)}{P(B \mid A)}$$

$$P(\text{No-deal Brexit}) = \frac{P(\text{No-deal Brexit} \mid \text{Remain})\,P(\text{Remain})}{P(\text{Remain} \mid \text{No-deal Brexit})}$$

Survey data from the website What UK Thinks provide some of the evidence. Based on the question of how people would be likely to vote in a second referendum, we can derive a value for P(Remain) which is currently 44% (or 51% once we strip out those voters who either do not intend to vote or who have reported as “don’t know”). We proxy the value for P(Remain | No-deal Brexit) by looking at the survey evidence which asks respondents to choose between no-deal and Remain. Faced with this choice, 44% of respondents indicated that they would vote Remain (54% once we strip out “don’t knows”). Similarly, we estimate a value for P(No-deal Brexit | Remain) by looking at the proportion who would vote for no-deal if the alternative is Remain (38%, or 46% after adjusting for “don’t knows”).

Putting these numbers together gives a constrained probability of no-deal of 44% (where the constrained probability is derived by stripping out “don’t knows” and constraining the remaining categories to sum to 100%). The unconstrained probability, where we do not adjust for the “don’t knows”, comes out around 36%. The good news is that however we slice it, the numbers come out at less than 50%; the bad news is that they are higher than my subjective probability of 30%.
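For readers who want to check the arithmetic, here is a minimal sketch of the rearranged formula using only the rounded percentages quoted in the text. The function name `implied_prior` is purely illustrative. Because the inputs are rounded, the results (roughly 43% constrained and 38% unconstrained) land close to, but not exactly on, the figures quoted above, which presumably rest on unrounded survey data; that is an assumption, not something the surveys confirm.

```python
def implied_prior(p_a_given_b, p_b, p_b_given_a):
    """Rearranged Bayes' Theorem: P(A) = P(A|B) * P(B) / P(B|A)."""
    return p_a_given_b * p_b / p_b_given_a


# Rounded survey shares as quoted in the text (not the raw survey data).
# A = no-deal Brexit, B = Remain.
unconstrained = implied_prior(p_a_given_b=0.38, p_b=0.44, p_b_given_a=0.44)
constrained = implied_prior(p_a_given_b=0.46, p_b=0.51, p_b_given_a=0.54)

print(f"unconstrained P(no-deal): {unconstrained:.1%}")
print(f"constrained P(no-deal):   {constrained:.1%}")
```

Either way the estimate stays below 50%, which is the point that matters for the argument.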

To the extent that politicians are looking at the polling numbers to assess whether a no-deal policy makes sense, this Bayesian interpretation of the evidence suggests that there is more support for a no-deal Brexit than is often supposed. However, it is not high enough to push through with it in the event that it proves to be economically disastrous, since more than 50% of voters do not support it and the backlash from the disaffected half is likely to be severe. One caveat is that some of the surveys have not been updated for a few weeks and therefore do not reflect any possible change of stance since Boris Johnson became prime minister. Moreover, Bayesian probabilities change as more information becomes available, so these are not set in stone by any means.

Bayesian statistics are the big thing these days but as the pollster Nate Silver pointed out, “under Bayes' theorem, no theory is perfect. Rather, it is a work in progress, always subject to further refinement and testing.” The prime minister’s new chief adviser, Dominic Cummings, pointed out in a blog post in 2017 “Rationality is more than ‘Bayesian updating’”. But cold hard statistics do catch up with you eventually and the evidence suggests that support for a no-deal Brexit is limited. If I were a politician it would not be the ground on which I would want to fight a battle – irrespective of what the prime minister says these days.

Sunday, 20 November 2016

Brexit: A Bayesian view

The Reverend Thomas Bayes was an English clergyman who lived in the first half of the eighteenth century, and who also happened to be a mathematician. He gave his name to a branch of statistics which has emerged from relative obscurity in recent years, and which helps us better understand the world around us. The insight of Bayesian statistics is that it characterises probability as uncertainty, which represents a belief about a particular outcome. The only real thing is the data, and as a result some outcomes are more believable than others based on the data and the observer's prior beliefs.

So-called classical statistics, which is most people’s introduction to the subject, relies on the insight that probability represents a fixed long-run relative frequency in which the likelihood of an event emerges as a ratio from an infinitely large sample size. In other words, the more observations we have, the more likely it is that the most frequently observed outcome represents the true mean of a given distribution.

To illustrate how these two schools of thought differ, consider the case of horse racing. Two horses – let’s call them True Blue and Knackers Yard – have raced against each other 15 times. True Blue has beaten Knackers Yard on 9 occasions. A classical statistician would thus assign a probability of 60% to the likelihood that True Blue wins (9/15), implying a 40% chance that Knackers Yard will win. But we have additional information that on 5 of the 6 occasions when Knackers Yard has won, the weather has been wet, whilst True Blue won two wet races. The question of interest here is what are the odds that Knackers Yard will win knowing that the weather ahead of the sixteenth race is wet? To answer it, we can combine two pieces of information: the head-to-head performance of the two horses, and their performance dependent on weather conditions.

In order to do this, we make use of Bayes’ Theorem which is written thus:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

P(A|B) is the probability that event A occurs conditional on event B. In this case, we want to know the probability that Knackers Yard wins conditional on the fact it is raining. P(B|A) is the probability of the evidence turning up, given the outcome. In this case, we want to know the likelihood that it is raining given that Knackers Yard wins. Since Knackers Yard won 6 races in total and five of them were run in the wet, the answer is 5/6 or 83.3%. P(A) is the prior probability that the event occurs given no additional evidence. In this case, the probability that Knackers Yard wins is 40% (it has won 6 out of 15 races). P(B) is the probability of the evidence arising, without regard for the outcome – in this case, the probability of rain irrespective of which horse won. Since we know there were 7 rainy days out of 15 races, P(B) = 7/15 = 46.7%. Plugging all this information into the formula, we can calculate that P(A|B) = (0.833 × 0.4) / 0.467 = 71.4%.
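The calculation can be reproduced in a few lines. This is only a sketch: the helper `bayes_posterior` and the variable names are illustrative, and the race counts are taken from the example above (15 races, 9 wins for True Blue, 7 wet races of which Knackers Yard won 5). Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


races = 15
knackers_wins = races - 9   # True Blue won 9, so Knackers Yard won 6
knackers_wet_wins = 5       # wet races won by Knackers Yard
wet_races = 5 + 2           # plus the two wet races won by True Blue

p_rain_given_win = Fraction(knackers_wet_wins, knackers_wins)  # P(B|A) = 5/6
p_win = Fraction(knackers_wins, races)                         # P(A)   = 6/15
p_rain = Fraction(wet_races, races)                            # P(B)   = 7/15

p_win_given_rain = bayes_posterior(p_rain_given_win, p_win, p_rain)
print(p_win_given_rain, f"{float(p_win_given_rain):.1%}")  # 5/7 71.4%
```

The result, 5/7, is simply the share of wet races that Knackers Yard won, which is a useful sanity check on the formula.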

Now all this might appear to be a bit geeky but it is an interesting way to look at the problem of how the UK economy is likely to perform given that Brexit happens. Our variable of interest is thus P(A|B): the UK’s economic growth performance conditional on Brexit; P(B) is the likelihood of Brexit and assuming (as the government seems to suggest) that it is set in stone, we set it to a value of 1. Moreover, assuming that Brexit will happen regardless of the economic cost (i.e. ministers are not overly concerned about accepting a hard Brexit) then P(B|A) is also close to unity.
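Under these two assumptions the theorem collapses to the prior. A short worked substitution, treating P(B|A) as exactly 1 rather than merely close to it (a simplification for illustration only), with A standing for the growth outcome and B for Brexit:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \approx \frac{1 \cdot P(A)}{1} = P(A)$$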

In effect, the Bayesian statistician might suggest that P(Growth | Brexit) = P(Growth). Since the only concrete information we have on economic performance is past performance, it is easy to make the case from a Bayesian perspective that the UK's future growth prospects can be extrapolated from past evidence. Those pro-Brexiteers who say that the UK’s post-Brexit performance will not be damaged by leaving the EU may unwittingly have statistical theory on their side. But one of the key insights of Bayesian statistics is that we change our prior beliefs as new information becomes available. If growth slows over the next year or so, then other things being equal, it would be rational to reduce our assessment of post-Brexit growth prospects.

Incidentally, a joke doing the rounds of the statistics community at present suggests that although Bayes first published the theorem which bears his name, it was the French mathematician Laplace who developed the mathematics underpinning this branch of statistics. As a result, Brexit may present a good opportunity to give due credit to the Frenchman by naming it Laplacian statistics. It’s enough to make arch-Bayesian Nigel Farage choke on his croissant.