# xkcd commentary - Frequentists vs. Bayesians

I found this xkcd comic hilarious and, at the same time, brilliant:

The reason I like it so much is that it shows, very plainly, what is wrong with frequentist hypothesis testing, and why a Bayesian approach might be preferable. And mind you, this isn’t just a philosophical issue, devoid of real-world value, that we statisticians cannot agree on. On the contrary, it has serious consequences: I am sure you have heard of the replication crisis plaguing some fields of science. Essentially, people realized that the conclusions of many published studies could not be reproduced, and one of the culprits is the misunderstanding and misapplication of hypothesis testing and p-values.

Here, I just want to explain what is going on in that comic. The two scientists
have a hypothesis they want to test (the sun exploded), and do so by gathering
some data (the answer from the machine). A good scientific hypothesis should be
*falsifiable*: it should be possible to show that it is false. If this is not
possible, then the theory is just pseudo-science, on the same level as magic,
witches, and dragons. At least, this was Popper’s reaction to the other
scientific paradigm of the time, namely showing that a theory is true by means
of repeated observations. Things have changed since then; most notably,
falsifiability was rejected as a criterion for separating science from
non-science.

The frequentist approach to this task is to assume that the hypothesis is false
(to be precise, that there is “no effect”) and compute the probability of
observations at least as extreme as ours under this assumption; if this
probability (the *p-value*) is low enough, then we can be reasonably sure that
the hypothesis must be true, otherwise *nothing can be said*. Now, people
mindlessly use 0.05 as a threshold to say that something is statistically
significant, even though there is no particular reason to use this value and not
another. The true story is that Fisher, who popularized p-values, used this
value as a cut-off to establish that something *fishy* is going on (ha-ha) and
worthy of more investigation. In practice, nowadays, the investigation just
stops at that threshold, as if we had found the truth and nothing more needed to
be done. (Other xkcd comics also point out how silly this is.)

Going back to the comic, we assume that the sun has *not* exploded. Since the
machine answered “yes”, it must be lying; as the machine only lies when a roll
of two dice comes up double six, the probability of this happening is
$1/36\approx 0.028$. This is below the usual 0.05 threshold, so we reject the
assumption that the sun has not exploded and conclude that it did: better save
yourselves.
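The frequentist computation fits in a few lines. This is a minimal sketch assuming the comic’s rules (the machine lies exactly when both dice show six) and the conventional 0.05 threshold; it simply enumerates the 36 equally likely outcomes of two dice.

```python
from fractions import Fraction
from itertools import product

# Null hypothesis: the sun has NOT exploded, so a "yes" means the machine lied.
# Assumption (from the comic): the machine lies only when both dice show six.
rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes
lies = [r for r in rolls if r == (6, 6)]       # outcomes in which the machine lies
p_value = Fraction(len(lies), len(rolls))      # P("yes" | sun not exploded)

alpha = 0.05                                   # the conventional threshold
reject_null = p_value < alpha                  # True means "the sun exploded!"
print(p_value, float(p_value), reject_null)
```

Nothing in this computation ever asks how plausible an exploding sun was to begin with.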

This is obviously ridiculous, but why? Given our understanding of how the sun
works, it is inconceivable that it will explode anytime soon, and we do not
want a dice roll to change our opinion on that. There is also another, more
subtle, issue. Sure, the reasoning seems to work. Call the observation (the
machine answered “yes”) $D$ and the hypothesis (the sun exploded) $H$: $D$ is very
unlikely to happen if $H$ is false; since $D$ happened, $H$ must be true. This
works in mathematical logic, since $\neg H\Rightarrow \neg D$ is equivalent to
$D\Rightarrow H$, but only because of a property of logical implication: when
the implication holds, a false conclusion forces the premise to be false too.
Things cannot work like that in probability, though: what if $D$ is *even more*
unlikely to happen if $H$ is true? (If the sun exploded, we would have around 8
minutes to roll the dice before being annihilated; it seems reasonable to assume
that the scientists in the comic are performing the experiment at least ten
minutes after sunset, unless they are very anxious individuals.)

Essentially, the flaw of the frequentist reasoning is that it does not consider the probability that $H$ itself is true, or, to be precise, it assumes that the two outcomes are equally probable. We know this is not true, and we can express this formally using Bayes’ theorem, which states:

\[p(H\vert D)p(D)=p(D\vert H)p(H)=p(H\cap D)\]
(this is not the standard form, but I find it more illuminating and easier to
remember), where the vertical bar indicates conditioning: $D\vert H$ means
observing $D$ *after* we observed (or assumed) $H$, and $\overline H$ denotes
that $H$ is false. With this notation, the frequentist reasoning can be
summarized as follows: “$p(D\vert\overline{H})$ is too low, so $H$ is true”.
Silly, right?
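A quick way to convince yourself of this form is to build a joint distribution over $H$ and $D$ and check that both products give $p(H\cap D)$. The numbers below are illustrative assumptions (a prior of $10^{-6}$ and a machine that is honest with probability $35/36$); exact fractions make the equality hold exactly rather than up to rounding.

```python
from fractions import Fraction

# Illustrative assumptions: H = "sun exploded", D = "machine says yes".
p_H = Fraction(1, 10**6)
p_D_given_H = Fraction(35, 36)
p_D_given_notH = Fraction(1, 36)

# Joint distribution over (H, D), built from the prior and the likelihoods.
joint = {
    (True, True): p_D_given_H * p_H,
    (True, False): (1 - p_D_given_H) * p_H,
    (False, True): p_D_given_notH * (1 - p_H),
    (False, False): (1 - p_D_given_notH) * (1 - p_H),
}

p_D = joint[(True, True)] + joint[(False, True)]  # marginal probability of D
p_H_given_D = joint[(True, True)] / p_D           # conditional read off the joint

# Both sides of the product form equal the joint probability p(H and D).
lhs = p_H_given_D * p_D
rhs = p_D_given_H * p_H
print(lhs == rhs == joint[(True, True)])  # True
```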

Let’s now look at the Bayesian approach. We want to know $p(H\vert D)$, which can be derived from Bayes’ theorem:

\[p(H\vert D)=\frac{p(D\vert H)p(H)}{p(D)}\]
(this is actually how Bayes’ theorem is usually presented). $p(D\vert
H)$ is the probability of the detector being honest about our terrible fate,
$p(H)$ our believed probability of the sun exploding *before* we ask the
detector, and $p(D)$ the probability of the detector saying “yes” (regardless
of the state of the sun). Given how the machine works, we know $p(D\vert
H)=35/36$, and given our knowledge of physics we might say that $p(H)=10^{-6}$
or so. $p(D)$ is in general a bit more complicated to compute, and later I will
show you how to avoid computing it altogether. In this case, however, we can
easily get it using the *law of total probability*:

\[p(A)=\sum_{i=1}^n p(A\cap B_i)\]
where $B_1,\ldots,B_n$ is any partition of the sample space. In other words,
they are $n$ different alternatives, exactly one of which happens at any given
time; then, the probability of $A$ is the probability that it happens when $B_1$
happens *or* when $B_2$ happens, *or* … *or* when $B_n$ happens. Since we can
choose anything we want for these sets, we will choose $H$ and $\overline H$,
and we use Bayes’ formula once again to get:

\[p(D)=p(D\vert H)p(H)+p(D\vert\overline H)p(\overline H)=\frac{35}{36}10^{-6}+\frac{1}{36}\left(1-10^{-6}\right)\approx\frac{1}{36}\]
which makes intuitive sense: we really do not expect the sun to explode, so when the detector says “yes” we would rather believe the dice roll did not go well. Putting it all together, we have:

\[p(H\vert D)\approx\frac{35}{36}10^{-6}\cdot 36=3.5\cdot 10^{-5}\]
i.e. the detector’s answer barely changed the Bayesian’s opinion about the state of the sun. Well, given the answer, he now thinks that the sun is 35 times more likely to have exploded, but it is still a tiny probability.
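The same numbers can be checked with exact rational arithmetic. This sketch assumes, as above, $p(H)=10^{-6}$, $p(D\vert H)=35/36$ and $p(D\vert\overline H)=1/36$; Python’s `fractions.Fraction` avoids any rounding until the final conversion to `float`.

```python
from fractions import Fraction

p_H = Fraction(1, 10**6)          # prior: the sun exploded (a rough guess)
p_D_given_H = Fraction(35, 36)    # machine tells the truth (no double six)
p_D_given_notH = Fraction(1, 36)  # machine lies (double six)

# Law of total probability with the partition {H, not H}.
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)

# Bayes' theorem.
p_H_given_D = p_D_given_H * p_H / p_D

print(float(p_D))                # very close to 1/36
print(float(p_H_given_D))        # very close to 3.5e-5
print(float(p_H_given_D / p_H))  # how much the answer multiplied our belief, about 35
```

The approximation in the text only drops the tiny $34\cdot10^{-6}$ correction in $p(D)$, so the exact posterior and $3.5\cdot10^{-5}$ agree to several digits.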

In this case, computing $p(D)$ was simple, but it usually is not, because it
requires precise knowledge about the *data generating distribution*, i.e. how
the data is “produced”. In this case, we know how the detector works, but in the
general case you cannot write this down. What is the probability of that picture
you took last week with your friends? This is why we usually try our best to get
rid of this term whenever it pops up (i.e. always). Usually we just ignore it:
we always take the observations $D$ as fixed, so $p(D)$ is just an annoying
constant. In other cases we can actually remove it. Note that $p(D\vert H)$ is a
lot easier to compute, because it corresponds to our *model* of the world, i.e.
to how we *assume* the data is generated. On the other hand, $p(D)$ reflects the
*true* way in which data is generated, and we cannot afford any assumption there
(or we could use the law of total probability, but that requires considering all
possible data-generating processes, of which there are infinitely many).

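Going back to our sun detector, though, we can express the reasoning above using
the *odds*, i.e. the ratio between the probability that $H$ is true and the
probability that it is false:

\[O(H\vert D)=\frac{p(H\vert D)}{p(\overline H\vert D)}=\frac{p(D\vert H)}{p(D\vert\overline H)}\cdot\frac{p(H)}{p(\overline H)}\]
which is (are?) much simpler to compute, since $p(D)$ cancels out:

\[O(H\vert D)=\frac{35/36}{1/36}\cdot\frac{10^{-6}}{1-10^{-6}}\approx3.5\cdot10^{-5}\]
Note that this is approximately equal to $p(H\vert D)$ only because $H$ is so
unlikely: the odds $p/(1-p)$ are close to $p$ whenever $p$ is small. Such a low
value means that we are *extremely* sure that $H$ is false.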
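Here is a sketch of the odds computation with the same assumed numbers; note that $p(D)$ never appears. The last step converts odds back into a probability with $p=O/(1+O)$, which recovers exactly the value given by Bayes’ theorem.

```python
from fractions import Fraction

p_H = Fraction(1, 10**6)          # assumed prior
p_D_given_H = Fraction(35, 36)    # machine honest
p_D_given_notH = Fraction(1, 36)  # machine lies

likelihood_ratio = p_D_given_H / p_D_given_notH  # 35; p(D) is not needed
prior_odds = p_H / (1 - p_H)
posterior_odds = likelihood_ratio * prior_odds

# Convert odds back to a probability: p = O / (1 + O).
posterior_prob = posterior_odds / (1 + posterior_odds)
print(likelihood_ratio, float(posterior_odds), float(posterior_prob))
```

Working with odds turns Bayes’ theorem into a single multiplication: the prior odds times the likelihood ratio, which is why the normalizing constant can be dropped.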

All of this depends on our strong belief that the sun will not explode. If we were indifferent to it, like our frequentist friend (i.e. $p(H)=1/2$), we would indeed reach the same conclusion. It can be weird to let our prior beliefs (or biases) change our conclusions. Our dream is to use only data to make decisions. After all, math is unbiased, right? Why not let the data speak for itself? As this comic shows, this is simply not possible. Sure, priors are subjective, and when they are too strong, no amount of data can make Bayesians change their minds. On the other hand, no prior at all can make them very gullible, almost like children. You don’t believe everything you read on the internet, do you? Then why do you believe everything your data tells you?
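To see how much the conclusion hinges on the prior, here is a small sketch (the function name `posterior` is my own) that reruns Bayes’ theorem with the comic’s machine for a few priors. With $p(H)=1/2$ it gives $35/36$, the frequentist’s conclusion; with a prior of exactly 0 or 1, no answer from the machine can move the posterior at all.

```python
from fractions import Fraction

def posterior(prior: Fraction) -> Fraction:
    """p(sun exploded | machine says yes) for a given prior p(sun exploded)."""
    p_D_given_H = Fraction(35, 36)    # machine honest (no double six)
    p_D_given_notH = Fraction(1, 36)  # machine lies (double six)
    evidence = p_D_given_H * prior + p_D_given_notH * (1 - prior)
    return p_D_given_H * prior / evidence

# From an indifferent prior down to the physicist's prior.
for prior in [Fraction(1, 2), Fraction(1, 100), Fraction(1, 10**6)]:
    print(prior, float(posterior(prior)))
```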

An alternative interpretation of the reasoning conducted by the Bayesian scientist is that betting \$50 (or anything else) that the sun has not exploded is a sure bet: if the sun really exploded, it will not matter that you lost the money. This is the real genius behind this comic.