xkcd commentary - Frequentists vs. Bayesians
I found this xkcd comic hilarious and, at the same time, brilliant:
The reason why I like it so much is that it shows very plainly what is wrong with frequentist hypothesis testing, and why a Bayesian approach might be preferable. And mind you, this is not just a philosophical issue, devoid of real-world value, that we statisticians cannot agree on. On the contrary, it has serious consequences: I am sure you have heard of the replication crisis plaguing some fields of science. Essentially, people realized that it is not possible to reproduce the conclusions of many published studies, and one of the culprits is the misunderstanding and misapplication of hypothesis testing and p-values.
Here, I just want to explain what is going on in that comic. The two scientists have a hypothesis they want to test (the sun exploded), and do so by gathering some data (the answer from the machine). A good scientific hypothesis should be falsifiable: it should be possible to show that it is false. If this is not possible, then that theory is just pseudo-science, on the same level as magic, witches, and dragons. At least, this was Popper's reaction to the other scientific paradigm of his time, namely showing that a theory is true by means of repeated observations. Things have changed since then; most notably, falsifiability was rejected as a criterion for separating science from non-science.
The frequentist approach to this task is to assume that the hypothesis is false (to be precise, that there is “no effect”) and compute the probability of the observations under this assumption; if this probability (the p-value) is low enough, then we can be reasonably sure that the hypothesis is true, otherwise nothing can be said. Now, people mindlessly use 0.05 as a threshold to declare that something is statistically significant, even though there is no particular reason to use this value and not another. The true story is that Fisher, who first developed the theory behind p-values, used this value as a cut-off to establish that something fishy is going on (ha-ha) and worthy of more investigation. In practice, nowadays, the investigation just stops at that threshold, as if we had found the truth and nothing more needed to be done. (Check here and here for other xkcd comics pointing out how silly this is.)
Going back to the comic, we assume that the sun has not exploded. Since the machine answered “yes”, it must be lying; as the machine only lies when the outcome of a two-dice roll is a double six, the probability that it did lie is

$$P(\text{“yes”} \mid \text{sun not exploded}) = P(\text{double six}) = \frac{1}{36} \approx 0.027,$$

which is below the 0.05 threshold, so the frequentist declares the result statistically significant and concludes that the sun has exploded.
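As a quick sanity check, the frequentist's number can be reproduced in a few lines of Python; this is a sketch of the comic's setup, not code from the original argument:

```python
import random

# Analytical p-value: the machine lies only on a double six,
# so P(machine answers "yes" | sun has not exploded) = 1/36.
p_value = 1 / 36
print(f"p-value: {p_value:.4f}")  # 0.0278, below the 0.05 threshold

# The same number via simulation: roll two dice many times and
# count how often both come up six.
random.seed(0)
trials = 100_000
double_sixes = sum(
    1
    for _ in range(trials)
    if random.randint(1, 6) == 6 and random.randint(1, 6) == 6
)
print(f"simulated: {double_sixes / trials:.4f}")  # close to 1/36
```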
This is obviously ridiculous, but why? Given our understanding of the working of
the sun, it is inconceivable that it will explode anytime soon, and we do not
wish a dice roll to change our opinion on that. There is also another, more
subtle, issue. Sure, the reasoning seems to work. Call the observations $D$ (the machine answered “yes”) and the hypothesis $H$ (the sun exploded). The p-value is the probability of the observations under the assumption that the hypothesis is false, $P(D \mid \neg H) = 1/36$, and since it is small we reject $\neg H$.
Essentially, the flaw of the frequentist reasoning is that it does not consider the probability that the hypothesis is true in the first place, $P(H)$. All these quantities are tied together by Bayes' theorem:

$$P(H \mid D)\,P(D) = P(D \mid H)\,P(H)$$

(this is not the standard form, but I find it more illuminating and easier to remember), where the vertical bar indicates conditioning: $P(H \mid D)$ is the probability of $H$ given that $D$ occurred.
Let's now look at the Bayesian approach. We want to know $P(H \mid D)$, the probability that the sun exploded given that the detector answered “yes”. Bayes' theorem gives

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$

(this is actually how Bayes' theorem is presented in the first place).
Here, $P(D)$ is the overall probability that the detector answers “yes”, given by the law of total probability:

$$P(D) = P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H) \approx P(D \mid \neg H) = \frac{1}{36},$$

where the approximation holds because $P(H)$ is tiny. This makes intuitive sense: we really do not expect the sun to explode, so when the detector says “yes” we would rather believe the dice roll did not go well. Putting it all together, noting that the detector answers truthfully with probability $P(D \mid H) = 35/36$, and writing $\pi = P(H)$ for our prior probability that the sun exploded, we have:

$$P(H \mid D) = \frac{\frac{35}{36}\,\pi}{\frac{35}{36}\,\pi + \frac{1}{36}\,(1 - \pi)} \approx 35\,\pi,$$
i.e. the detector’s answer barely changed the Bayesian’s opinion about the state of the sun. Well, given the answer, now he thinks that the sun is 35 times more likely to have exploded, but it is still a tiny probability.
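To see these numbers concretely, here is a small Python sketch of the same computation; the prior of $10^{-9}$ is an arbitrary illustrative value, not something taken from the comic:

```python
def posterior_sun_exploded(prior: float) -> float:
    """P(sun exploded | detector says "yes") via Bayes' theorem."""
    p_yes_given_exploded = 35 / 36  # the machine tells the truth
    p_yes_given_not = 1 / 36        # the machine lies (double six)
    # Law of total probability for P(detector says "yes").
    p_yes = p_yes_given_exploded * prior + p_yes_given_not * (1 - prior)
    return p_yes_given_exploded * prior / p_yes

prior = 1e-9  # an arbitrary, suitably tiny prior, for illustration only
post = posterior_sun_exploded(prior)
print(post / prior)  # ≈ 35: the answer multiplies our belief by about 35
```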
In this case, computing $P(D)$ was easy, but in general it requires summing (or integrating) over all competing hypotheses, which can be very hard. A common way to sidestep it is to reason with odds, the ratio between the probability of an event and the probability of its complement: $\text{odds}(H) = P(H) / P(\neg H)$.
Going back to our sun detector, though, we can express the reasoning above using the odds:

$$\frac{P(H \mid D)}{P(\neg H \mid D)} = \frac{P(D \mid H)}{P(D \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)},$$

which are much simpler to compute: $P(D)$ cancels out, and the likelihood ratio is

$$\frac{P(D \mid H)}{P(D \mid \neg H)} = \frac{35/36}{1/36} = 35;$$

note that it is equal to the factor of 35 by which, as we saw above, the detector's answer multiplied our belief that the sun exploded.
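In code, the odds form is even shorter, since $P(D)$ never needs to be computed; the tiny prior is again a hypothetical value chosen only for illustration:

```python
# Posterior odds = likelihood ratio * prior odds; P(D) cancels out.
likelihood_ratio = (35 / 36) / (1 / 36)  # = 35: how much "yes" shifts the odds
prior = 1e-9                             # hypothetical tiny prior P(sun exploded)
prior_odds = prior / (1 - prior)
posterior_odds = likelihood_ratio * prior_odds
posterior = posterior_odds / (1 + posterior_odds)  # back to a probability
print(likelihood_ratio)  # ≈ 35
print(posterior)         # still astronomically small
```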
All of this depends on our strong belief that the sun has not exploded. If we were indifferent to it, like our frequentist friend, we would indeed reach the same conclusion. It can feel weird to let our prior beliefs (or biases) influence our conclusions. Our dream is to only use data to make decisions. After all, math is unbiased, right? Why not let the data speak for itself? As this comic shows, this is simply not possible. Sure, priors are subjective, and when they are too strong, no amount of data can make Bayesians change their mind. On the other hand, no prior at all makes them very gullible, almost like a child. You don't believe everything you read on the internet, do you? Then why do you believe everything your data tells you?
An alternative interpretation of the Bayesian scientist's reasoning is that the $50 bet is safe no matter what: if the sun really exploded, losing the money will not matter anyway. This is the real genius behind this comic.