This article is available in a Jupyter notebook: click here to run it on Colab.
I’ve been enjoying Aubrey Clayton’s new book Bernoulli’s Fallacy. The first chapter, which is about the historical development of competing definitions of probability, is worth the price of admission alone.
One of the examples in Chapter 1 is a simplified version of a problem posed by Thomas Bayes. The original version, which I wrote about here, involves a billiards (pool) table; Clayton’s version uses dice:
Your friend rolls a six-sided die and secretly records the outcome; this number becomes the target T. You then put on a blindfold and roll the same six-sided die over and over. You’re unable to see how it lands, so each time your friend […] tells you only whether the number you just rolled was greater than, equal to, or less than T.
Suppose in one round of the game we had this sequence of outcomes, with G representing a greater roll, L a lesser roll, and E an equal roll:
G, G, L, E, L, L, L, E, G, LClayton, Bernoulli’s Fallacy, pg 36.
Based on this data, what is the posterior distribution of T?
There are two parts of my solution; computing the likelihood of the data under each hypothesis and then using those likelihoods to compute the posterior distribution of T.
To compute the likelihoods, I’ll demonstrate one of my favorite idioms, using a meshgrid to apply an operation, like
>, to all pairs of values from two sequences.
In this case, the sequences are
hypos: The hypothetical values of T, and
outcomes: possible outcomes each time we roll the dice
hypos = [1,2,3,4,5,6] outcomes = [1,2,3,4,5,6]
If we compute a meshgrid of
hypos, the result is two arrays.
import numpy as np O, H = np.meshgrid(outcomes, hypos)
The first contains the possible outcomes repeated down the columns.
array([[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]])
The second contains the hypotheses repeated across the rows.
array([[1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 4, 4], [5, 5, 5, 5, 5, 5], [6, 6, 6, 6, 6, 6]])
If we apply an operator like
>, the result is a Boolean array.
O > H
array([[False, True, True, True, True, True], [False, False, True, True, True, True], [False, False, False, True, True, True], [False, False, False, False, True, True], [False, False, False, False, False, True], [False, False, False, False, False, False]])
Now we can use
axis=1 to compute the fraction of
True values in each row.
(O > H).mean(axis=1)
array([0.83333333, 0.66666667, 0.5 , 0.33333333, 0.16666667, 0. ])
The result is the probability that the outcome is greater than T, for each hypothetical value of T. I’ll name this array
gt = (O > H).mean(axis=1)
The first element of the array is 5/6, which indicates that if T is 1, the probability of exceeding it is 5/6. The second element is 2/3, which indicates that if T is 2, the probability of exceeding it is 2/3. And do on.
Now we can compute the corresponding arrays for less than and equal.
lt = (O < H).mean(axis=1)
array([0. , 0.16666667, 0.33333333, 0.5 , 0.66666667, 0.83333333])
eq = (O == H).mean(axis=1)
array([0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667])
In the next section, we’ll use these arrays to do a Bayesian update.
In this example, computing the likelihoods was the hard part. The Bayesian update is easy. Since T was chosen by rolling a fair die, the prior distribution for T is uniform. I’ll use a Pandas
Series to represent it.
import pandas as pd pmf = pd.Series(1/6, hypos) pmf
1 0.166667 2 0.166667 3 0.166667 4 0.166667 5 0.166667 6 0.166667
Now here’s the sequence of data, encoded using the likelihoods we computed in the previous section.
data = [gt, gt, lt, eq, lt, lt, lt, eq, gt, lt]
The following loop updates the prior distribution by multiplying by each of the likelihoods.
for datum in data: pmf *= datum
Finally, we normalize the posterior.
pmf /= pmf.sum()
1 0.000000 2 0.016427 3 0.221766 4 0.498973 5 0.262834 6 0.000000
Here’s what it looks like.
As an aside, you might have noticed that the values in
eq are all the same. So when the value we roll is equal to T, we don’t get any new information about T. We could leave the instances of
eq out of the data, and we would get the same answer.