Field Sobriety Tests and the Base Rate Fallacy

In Chapter 9 of Probably Overthinking It I wrote about Drug Recognition Experts (DREs), who are law enforcement officers trained to recognize impaired drivers.

I reviewed the research papers that were supposed to evaluate the accuracy of DREs and I summarized my impressions like this:

What I found was a collection of studies that are, across the board, deeply flawed. Every one of them features at least one methodological error so blatant it would be embarrassing at a middle school science fair.

Recently the related topic of Field Sobriety Tests (FSTs) came up in this Reddit discussion, which links to this TV news report about sober drivers who were arrested based on FST results.

The TV report refers to this 2023 paper in JAMA Psychiatry. Because it’s recent, published in a good quality journal, and called “Evaluation of Field Sobriety Tests for Identifying Drivers Under the Influence of Cannabis: A Randomized Clinical Trial”, I thought it might address the problems I found in previous research.

Unfortunately, it has the same problems:

  • Selection bias: It excludes as subjects people with conditions that might cause them to fail an FST while sober – but these are exactly the people most vulnerable to false positive results.
  • Wrong metrics: The paper focuses on the true positive and false positive rates, and neglects the predictive value of the test – which is more relevant to the policy question.
  • Unrealistic base rate: In the test conditions, two thirds of the participants were impaired, which is almost certainly higher than the relevant fraction in the real world.

Despite all that, the false positive rate they reported is 49%, which means that nearly half of the sober participants were wrongly classified as impaired.

Let’s look at each of these problems more closely.

False Positives

The study tested 184 participants, 121 randomly assigned to the THC group and 63 to the placebo group. The THC group smoked cannabis cigarettes containing THC; the placebo group smoked cigarettes with almost none. Each participant was evaluated by one officer, who was “blinded to treatment assignment”. The paper reports

Officers classified 98 participants (81.0%) in the THC group and 31 (49.2%) in the placebo group as FST impaired.

The following table summarizes these results as a confusion matrix:

                 FST Positive   FST Negative   Total
THC Group              98             23         121
Placebo Group          31             32          63
Total                 129             55         184

Let’s start with the most obvious problem: of 63 people in the placebo group, 31 were wrongly classified as impaired, so the false positive rate was 49%.
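As a quick check, here is that arithmetic in Python:

```python
# Counts from the confusion matrix: placebo group,
# classified impaired (false positive) / not impaired (true negative)
fp, tn = 31, 32

false_positive_rate = fp / (fp + tn)
print(round(false_positive_rate, 3))  # 0.492
```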

Although the tests “were administered by certified DRE instructors, the highest training level for impaired driving detection”, the results for sober participants were no better than a coin toss. That’s pretty bad, but in reality it’s probably worse, because of selection bias.

Selection Bias

The study recruited 261 people who met these requirements: “age 21 to 55 years, cannabis use 4 or more times in the past month, holding a valid driver’s license, and driving at least 1000 miles in the past year.”

But it excluded 62 recruits for reasons including “history of traumatic brain injury [and] significant medical conditions or psychiatric conditions”. They also excluded people with a positive urine test for nonprescription drugs or substance use disorder in the past year.

That’s a problem because people with these kinds of medical conditions are more likely to fail an FST – even if they are not actually impaired. By excluding them, the study excludes exactly the people most vulnerable to a false positive result.

A better experiment would recruit a representative sample of drivers, including people older than 55 and people with conditions that make it hard to pass a field sobriety test. The TV report highlights an example: an autistic man who was arrested for DUI because his autism-related differences were mistaken for impairment. I assume he would have been excluded from the study.

To see how much difference the selection criteria could make, suppose 20 of the excluded participants (about one third) had been assigned to the placebo group. And suppose that because of their conditions 16 of them were wrongly classified as impaired – that’s 80%, somewhat higher than the rate among included participants.

That would increase the number of false positives by 16 and the number of true negatives by 4, so the unbiased false positive rate might be 57%.
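Here is that back-of-the-envelope calculation, using the hypothetical numbers above:

```python
# Reported placebo group: 31 false positives out of 63 sober participants
fp, n = 31, 63

# Hypothetical: 20 excluded recruits join the placebo group,
# and 16 of them (80%) are wrongly classified as impaired
fp_extra, n_extra = 16, 20

adjusted_fpr = (fp + fp_extra) / (n + n_extra)
print(round(adjusted_fpr, 2))  # 0.57
```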

This is just a guess: it’s not clear how many were excluded specifically for medical conditions or how many of the excluded would have failed the FST. But this calculation gives us a sense of how big the bias could be.

As I wrote in Probably Overthinking It:

How can you estimate the number of false positives if you exclude from the study everyone likely to yield a false positive? You can’t.

And that brings us to the next problem.

Predictive Value

The paper reports:

Officers classified 98 participants (81.0%) in the THC group and 31 (49.2%) in the placebo group as FST impaired at the first evaluation

They quantify this difference as 31.8 percentage points, with 95% CI, 16.4-47.2 percentage points, and report a p-value < .001. Based on this analysis, they conclude:

FSTs administered by highly trained law enforcement officers differentiated between individuals receiving THC vs placebo

This conclusion is true in the sense that the difference in percentages is statistically significant, but the policy question is not whether THC exposure changes FST performance under laboratory conditions. The question is whether an FST result provides sufficiently strong evidence to justify detention or arrest.

For that, the false positive rate is relevant, and as we have discussed, it is probably more than 50%.

But even more important is the positive predictive value (PPV), which is the probability that a positive test is correct. In the confusion matrix, there are 129 positive tests, of which 98 are correct and 31 incorrect, so the PPV is 98 out of 129, about 76%.
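In code:

```python
# Positive tests from the confusion matrix:
# 98 correct (THC group), 31 incorrect (placebo group)
tp, fp = 98, 31

ppv = tp / (tp + fp)
print(round(ppv, 2))  # 0.76
```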

Of the people who failed the FST, 76% were actually impaired. That might sound good enough for probable cause, but that conclusion is misleading because there is still another problem – the base rate.

Base Rate

In the study, two thirds of the participants were impaired. In the real world, it is unlikely that two thirds of drivers are impaired – or even two thirds of drivers who take an FST. So the base rate in the study is too high.

To see why that matters, we have to do a little math. First we’ll use the confusion matrix to compute one more metric, sensitivity, which is the percentage of impaired participants who were classified correctly.
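In the confusion matrix, 98 of the 121 participants in the THC group were classified correctly, so sensitivity is about 81%:

```python
# Sensitivity: fraction of the THC group correctly classified as impaired
sensitivity = 98 / 121
print(round(sensitivity, 2))  # 0.81
```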

We can use sensitivity, along with the false positive rate we already computed, to figure out the positive predictive value of a test with a more realistic base rate.

Of all people pulled over and given a field sobriety test, how many do you think are impaired by THC? That’s a hard question to answer, so we’ll try a couple of values.

First, suppose the base rate is one third, rather than the two thirds in the study. If we imagine 100 drivers:

  • If 33 are impaired, and sensitivity is 81%, we expect 27 true positive results.
  • If 67 are not impaired, and the false positive rate is 49%, we expect 33 false positive results.

In that case the positive predictive value is 27 / (27 + 33), which means that only 45% of positive tests are correct. If we put those numbers in a table, the calculation might be clearer.

               Tests   Prob pos   Pos tests   % of pos tests
Impaired          33      0.810      26.727           44.773
Not impaired      67      0.492      32.968           55.227

With a lower base rate, PPV is lower, which means that a positive test is weaker evidence of impairment. But even 45% might be too high.

If we suppose that 15% of drivers who take an FST are impaired, we can run the numbers again.

               Tests   Prob pos   Pos tests   % of pos tests
Impaired          15      0.810      12.149           22.508
Not impaired      85      0.492      41.825           77.492

With 15% base rate, the predictive value of the test is only 23% – which means 77% of drivers identified as impaired would actually be sober.
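The calculations in both tables can be wrapped in a small function (my sketch, with sensitivity and false positive rate taken from the study):

```python
def ppv(base_rate, sensitivity=0.810, fpr=0.492):
    """Positive predictive value of the FST for a given base rate."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)

print(round(ppv(1/3), 2))   # 0.45
print(round(ppv(0.15), 2))  # 0.23
```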

In reality, the base rate depends on the context. At a checkpoint where every driver is stopped, the base rate might be lower than 15%. If a driver is stopped for driving erratically, the base rate might be relatively high. But even then, it is unlikely to be as high as 66%, as in the study.

Discussion

The JAMA Psychiatry study provides valuable data, but it suffers from the same methodological problems as previous DRE validation studies:

  1. High false positive rate: Nearly half of sober participants were incorrectly classified as impaired.
  2. Selection bias: The study excluded exactly the people most likely to be falsely accused, making it impossible to assess the true false positive rate in the general population.
  3. Unrealistic base rate: The base rate in the study is higher than what we expect in real-world use, which inflates the predictive value of the test.

Although I have been critical of the study, I agree with their interpretation of the results:

…the substantial overlap of FST impairment between groups and the high frequency at which FST impairment was suspected to be due to THC suggest that absent other indicators, FSTs alone may be insufficient to identify THC-specific driving impairment.

Emphasis mine.

Notes

In my interpretation of the results, I follow the methodology of the study, which treats assignment to the THC group as ground truth – that is, we assume that participants in the THC group were actually impaired and participants in the placebo group were not. And the paper reports:

Median self-reported highness (scale of 0 to 100, with higher scores indicating more impairment) at 30 minutes was 64 (IQR, 32-76) for the THC group and 13 (IQR, 1-28) for the placebo group (P < .001).

The THC group felt that they were more impaired, but based on the IQRs, it looks like there might be overlap. That complicates the interpretation of “impaired”, but for this analysis I use the study’s operational definition.

Click here to run this notebook on Colab.

Don’t Bet on the Super Bowl

If you have studied probability, you might be familiar with fractional odds, which represent the ratio of the probability something happens to the probability it doesn’t. For example, if the Seahawks have a 75% chance of winning the Super Bowl, they have a 25% chance of losing, so the ratio is 75 to 25, sometimes written 3:1 and pronounced “three to one”.

But if you search for “the odds that the Seahawks win”, you will probably get moneyline odds, also known as American odds. Right now, the moneyline odds are -240 for the Seahawks and +195 for the Patriots. If you are not familiar with this format, that means:

  • If you bet $100 on the Patriots and they win, you gain $195 – otherwise you lose $100.
  • If you bet $240 on the Seahawks and they win, you gain $100 – otherwise you lose $240.

If you are used to fractional odds, this format might make your head hurt. So let’s unpack it.

Suppose you think the Patriots have a 25% chance of winning. Under that assumption, we can compute the expected value of the first wager like this:

def expected_value(p, wager, payout):
    return p * payout - (1-p) * wager
expected_value(p=0.25, wager=100, payout=195)
-26.25

If the Patriots actually have a 25% chance of winning, the first wager has negative expected value – so you probably don’t want to make it.

Now let’s compute the expected value of the second wager – assuming the Seahawks have a 75% chance of winning:

expected_value(p=0.75, wager=240, payout=100)
15.0

The expected value of this wager is positive, so you might want to make it – but only if you have good reason to think the Seahawks have a 75% chance of winning.

Implied Probability

More generally, we can compute the expected value of each wager for a range of probabilities from 0 to 1.

import numpy as np

ps = np.linspace(0, 1)
ev_patriots = expected_value(ps, 100, 195)
ev_seahawks = expected_value(1-ps, 240, 100)

Here’s what they look like.

plt.plot(ps, ev_patriots, label='Bet on Patriots')
plt.plot(ps, ev_seahawks, label='Bet on Seahawks')
plt.axhline(0, color='gray', alpha=0.4)

decorate(xlabel='Actual probability Patriots win',
        ylabel='Expected value of wager')

To find the crossover point, we can set the expected value to 0 and solve for p. This function computes the result:

def crossover(wager, payout):
    return wager / (wager + payout)

Here’s crossover for a bet on the Patriots at the offered odds.

p1 = crossover(100, 195)
p1
0.3389830508474576

If you think the Patriots have a probability higher than the crossover, the first bet has positive expected value.

And here’s the crossover for a bet on the Seahawks.

p2 = crossover(240, 100)
p2
0.7058823529411765

If you think the Seahawks have a probability higher than this crossover, the second bet has positive expected value.
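In general, the break-even probability can be computed directly from moneyline odds. Here is a small helper – `moneyline_to_prob` is my name for it, not a standard function:

```python
def moneyline_to_prob(odds):
    """Break-even probability implied by moneyline odds."""
    if odds < 0:
        # negative odds: bet |odds| to win 100
        return -odds / (-odds + 100)
    else:
        # positive odds: bet 100 to win odds
        return 100 / (100 + odds)

print(round(moneyline_to_prob(-240), 3))  # 0.706
print(round(moneyline_to_prob(195), 3))   # 0.339
```

These are the same as the crossover probabilities for the two wagers.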

So the offered odds imply that the consensus view of the betting market is that the Patriots have a 33.9% chance of winning and the Seahawks have a 70.6% chance. But you might notice that the sum of those probabilities exceeds 1.

p1 + p2
1.0448654037886342

What does that mean?

The Take

The sum of the crossover probabilities determines “the take”, which is the share of the betting pool taken by “the house” – that is, the entity that takes the bets.

For example, suppose 1000 people take the first wager and bet $100 each on the Patriots. And 1000 people take the second wager and bet $240 on the Seahawks.

Here’s the total expected value of all of those wagers.

total = expected_value(ps, 100_000, 195_000) + expected_value(1-ps, 240_000, 100_000) 
plt.plot(ps, total, label='Total')
plt.axhline(0, color='gray', alpha=0.4)

decorate(xlabel='Actual probability Patriots win',
        ylabel='Total expected value of all wagers')

The total expected value is negative for all probabilities (or zero if the Patriots have no chance at all) – which means the house wins.

How much the house wins depends on the actual probability. As an example, suppose the actual probability is the midpoint of the probabilities implied by the odds:

p = (p1 + (1-p2)) / 2
p
0.31655034895314055

In that case, here’s the expected take, assuming that the implied probability is correct.

take = -expected_value(p, 100_000, 195_000) - expected_value(1-p, 240_000, 100_000) 
take
14244.765702891316

As a percentage of the total betting pool, it’s a little more than 4%.

take / (100_000 + 240_000)
0.04189636971438623

We could have approximated this by computing the “overround”, which is the amount by which the sum of the implied probabilities exceeds 1.

(p1 + p2) - 1
0.04486540378863424

Don’t Bet

In summary, here are the reasons you should not bet on the Super Bowl:

  • If the implied probabilities are right (within a few percent) all wagers have negative expected value.
  • If you think the implied probabilities are wrong, you might be able to make a good bet – but only if you are right. The odds represent the aggregated knowledge of everyone who places a bet, which probably includes a lot of people who know more than you.
  • If you spend a lot of time and effort, you might find instances where the implied probabilities are wrong, and you might even make money in the long run. But there are better things you could do with your time.

Betting is a zero-sum game if you include the house and a negative-sum game for people who bet. If you make money, someone else loses – there is no net creation of economic value.

So, if you have the skills to beat the odds, find something more productive to do.

The Girl Born on Tuesday

Some people have strong opinions about this question:

In a family with two children, if at least one of the children is a girl born on Tuesday, what are the chances that both children are girls?

In this article, I hope to offer

  1. A solution to one interpretation of this question,
  2. An explanation of why the solution seems so counterintuitive,
  3. A discussion of other interpretations, and
  4. An implication of this problem for teaching and learning probability.

Let’s get started.

One interpretation

One reason this problem is contentious is that it is open to multiple interpretations. I’ll start by presenting just one – then we’ll get back to the ambiguity.

First, to avoid real-world complications, let’s assume an imaginary world where:

  • Every family has two children.
  • 50% of children are boys and 50% are girls.
  • All days of the week are equally likely birth days.
  • Genders and birth days are independent.

Second, we will interpret the question in terms of conditional probability; that is, we’ll compute P(B|A), where

  • A is “at least one of the children is a girl born on Tuesday”, and
  • B is “both children are girls”.

Under these assumptions and this interpretation, the answer is unambiguous – and it turns out to be 13/27 (about 48.1%).
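Before explaining why, here is a brute-force simulation (a sketch using NumPy) that agrees with that answer:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Each family has two children: a sex (0 = boy, 1 = girl)
# and a birth day of the week (0-6, with Tuesday = 2)
sex = rng.integers(0, 2, size=(n, 2))
day = rng.integers(0, 7, size=(n, 2))

# Condition: at least one child is a girl born on Tuesday
girl_tuesday = (sex == 1) & (day == 2)
condition = girl_tuesday.any(axis=1)

# Among families meeting the condition, what fraction have two girls?
both_girls = sex.sum(axis=1) == 2
print(both_girls[condition].mean())  # close to 13/27 ≈ 0.481
```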

But why?

This problem is counterintuitive because it elicits confusion between causation and evidence.

  • If a family has a girl born on a Tuesday, that does not cause the other child to be a girl.
  • But the fact that a family has a girl born on Tuesday is evidence that the other child is a girl.

To see why, imagine two families: the first has one girl and the other has ten girls. Suppose I choose one of the families at random, check to see whether they have a girl born on Tuesday, and find that they do.

Which family do you think I chose?

  • If I chose the family with one girl, the chance is only 1/7 (about 14%) that she was born on Tuesday.
  • If I chose the family with ten girls, the chance is about 79% that at least one of them was born on a Tuesday.
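Here are those two probabilities, computed:

```python
def at_least_one_tuesday(k):
    """Probability that at least one of k girls was born on a Tuesday."""
    return 1 - (6/7)**k

print(round(at_least_one_tuesday(1), 2))   # 0.14
print(round(at_least_one_tuesday(10), 2))  # 0.79
```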

And that’s the key to understanding the problem:

A family with more than one girl is more likely to have one born on Tuesday. Therefore, if a family has a girl born on a Tuesday, it is more likely that they have more than one girl.

That’s the qualitative argument. Now we’ll make it quantitative – with Bayes’s Theorem.

Bayes’s Theorem

Let’s start with four kinds of two-child families.

kinds = ['Boy Boy', 'Boy Girl', 'Girl Boy', 'Girl Girl']

Under our simplifying assumptions, these combinations are equally likely, so their prior probabilities are equal.

from fractions import Fraction
import pandas as pd

prior = pd.Series(Fraction(1, 4), kinds)
prior

Boy Boy     1/4
Boy Girl    1/4
Girl Boy    1/4
Girl Girl   1/4

Now for each kind of family, let’s compute the likelihood of a girl born on Tuesday:

  • If there are two boys, the probability of a girl born on Tuesday is 0.
  • If there is one girl, the probability she is born on Tuesday is 1/7.
  • If there are two girls, the probability at least one is born on Tuesday is 1 - (6/7)**2.

Let’s put those values in a list.

p = Fraction(1, 7)
likelihood = [0, p, p, 1 - (1-p)**2]
likelihood
[0, Fraction(1, 7), Fraction(1, 7), Fraction(13, 49)]

To compute the posterior probabilities, we multiply the prior and likelihood, then normalize so the results add up to 1.

posterior = prior * likelihood
posterior /= posterior.sum()
posterior

Boy Boy        0
Boy Girl    7/27
Girl Boy    7/27
Girl Girl  13/27

The posterior probability of two girls is 13/27. As always, Bayes’s Theorem is the chainsaw that cuts through the knottiest problems in probability.

Other versions

Everything so far is based on the interpretation of the question as a conditional probability. But many people have pointed out that the question is ambiguous because it does not specify how we learn that the family has a girl born on a Tuesday.

This objection is valid:

  1. The answer depends on how we get the information, and
  2. The statement of the problem does not say how.

There are many versions of this problem that specify different ways you might learn that a family has a girl born on a Tuesday, and you might enjoy the challenge of solving them.

In general, if we specify the process that generates the data, we can use simulation, enumeration, or Bayes’s Theorem to compute the conditional probability given the data.

But what should we do if the data-generating process is not uniquely specified?

  • One option is to say that the question has no answer because it is ambiguous.
  • Another option is to specify a prior distribution of possible data-generating processes, compute the answer under each process, and apply the law of total probability.

Some of the people who choose the second option also choose a prior distribution so that the answer turns out to be 1/2. In my view, that is a correct answer to one interpretation, but that interpretation seems arbitrary – by choosing different priors, we can make the answer almost anything.

I prefer the interpretation I presented, because

  1. I believe it is what was intended by the people who posed the problem,
  2. It is consistent with the conventional interpretation of conditional probability,
  3. It yields an answer that seems paradoxical at first, so it is an interesting problem,
  4. The apparent paradox can be resolved in a way that sheds light on conditional probability and the idea of independent events.

So I think it’s a perfectly good problem – it’s just hard to express it unambiguously in natural language (as opposed to math notation).

But you don’t have to agree with me. If you prefer a different interpretation of the question, and it leads to a different answer, feel free to write a blog post about it.

What about independence?

I think the girl born on Tuesday carries a lesson about how we teach. In introductory probability, students often learn two ways to compute the probability of a conjunction. First they learn the easy way:

  • P(A and B) =  P(A) P(B)

But they are warned that this only applies if A and B are independent. Otherwise, they have to do it the hard way:

  • P(A and B) =  P(A) P(B|A)

But how do we know whether A and B are independent? Formally, they are independent if

  • P(B|A) = P(B)

So, in order to know which formula to use, you have to know P(B|A). But if you know P(B|A), you might as well use the second formula.

Rather than check independence by computing conditional probabilities, it is more common to assert independence by intuition. For example, if we flip two coins, we have a strong intuition that the outcomes are independent. And if the coins are known to be fair, this intuition is correct. But if there is any uncertainty about the probability of heads, it is not.
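To make that last point concrete, here is a made-up example: a coin that is equally likely to be fair or two-headed. The first toss provides evidence about which coin we have, so the tosses are not independent:

```python
# Mixture model: the coin is fair (p=0.5) or two-headed (p=1.0),
# each with prior probability 0.5
biases = [0.5, 1.0]
prior = [0.5, 0.5]

# P(first toss is heads)
p_h1 = sum(w * b for w, b in zip(prior, biases))

# P(both tosses are heads)
p_h1h2 = sum(w * b**2 for w, b in zip(prior, biases))

# P(second heads | first heads) exceeds P(heads),
# so the tosses are not independent
p_h2_given_h1 = p_h1h2 / p_h1
print(p_h1, round(p_h2_given_h1, 3))  # 0.75 0.833
```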

The coin example – along with Monty Hall, Bertrand’s Boxes, and many more – demonstrates the real lesson of the girl born on Tuesday: our intuition for independence is wildly unreliable.

Which means we might want to rethink the way we teach it.

In general

Previously I wrote about a version of this problem where the girl is named Florida. In general, if we are given that a family has at least one girl with a particular property, and the prevalence of the property is p, we can use Bayes’s Theorem to compute the probability of two girls.

I’ll use SymPy to represent the priors and the probability p.

from sympy import Rational

prior = pd.Series(Rational(1, 4), kinds)
prior

Boy Boy    1/4
Boy Girl   1/4
Girl Boy   1/4
Girl Girl  1/4

Here are the likelihoods in terms of p.

from sympy import symbols

p = symbols('p')

likelihood = [0, p, p, 1 - (1-p)**2]
likelihood
[0, p, p, 1 - (1 - p)**2]

And here are the posteriors.

posterior = prior * likelihood
posterior /= posterior.sum()

for kind, prob in posterior.items():
    print(kind, prob.simplify())
Boy Boy 0
Boy Girl -1/(p - 4)
Girl Boy -1/(p - 4)
Girl Girl (p - 2)/(p - 4)

So the general answer is (p-2) / (p-4).

If we plug in p = 1/7, we get 13/27 again.

prob = posterior['Girl Girl'].subs({p: Rational(1, 7)})
prob
13/27

Or for the girl named Florida, let’s assume one girl out of 1000 is named Florida.

prob = posterior['Girl Girl'].subs({p: Rational(1, 1000)})
prob
1999/3999

The following figure shows the probability of two girls as a function of the prevalence of the property.

xs = np.linspace(0, 1)
ys = (xs-2) / (xs-4)

plt.plot(xs, ys)
plt.xlabel('Prevalence of the property')
plt.ylabel('Conditional probability of two girls')

If the property is rare – like the name Florida – the conditional probability is close to 1/2. If the property is common – like having a name – the conditional probability is close to 1/3.
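We can check both limits with SymPy:

```python
from sympy import symbols, Rational, limit

p = symbols('p')
prob_two_girls = (p - 2) / (p - 4)

# A rare property (prevalence near 0) pushes the answer toward 1/2
print(limit(prob_two_girls, p, 0))  # 1/2

# A universal property (prevalence near 1) pushes it toward 1/3
print(limit(prob_two_girls, p, 1))  # 1/3

# And plugging in p = 1/7 recovers the Tuesday answer
print(prob_two_girls.subs(p, Rational(1, 7)))  # 13/27
```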

Objections

Here are some objections to the “girl born on Tuesday” problem along with my responses.

You have to model the message, not just the event

Objection.
The statement “at least one child is a girl born on Tuesday” should not be treated as a bare event in a probability space. It should be treated as the outcome of a random process that generates messages or facts we learn. Therefore, the probability space must include not only family composition, but also the mechanism by which that information is produced. Any solution that conditions only on the family outcomes is incomplete.

Response.
I agree that if the problem is interpreted as conditioning on a message (something that is said, reported, or chosen from among several true statements), then the reporting mechanism matters and must be modeled explicitly. However, I don’t think such a mechanism is required in all cases. It is standard and meaningful to interpret a question as conditioning on an event – an extensional property of outcomes – without introducing an additional random variable for how the information was obtained. That is the interpretation I adopt here.

Without a specified selection rule, symmetry forces the answer to 1/2

Objection.
If the problem does not specify how the information was obtained, then we must assume a symmetric rule for selecting which true statement is revealed. Under that assumption, conditioning on “at least one boy” or “at least one girl” must give the same answer, and applying the law of total probability forces the posterior probability to equal the prior. Therefore, the correct answer must be 1/2.

Response.
This conclusion follows only if we assume that the conditioning is on a message chosen from a symmetric set of alternatives. Under that interpretation, the result does depend on the selection rule, and 1/2 is a valid answer for one particular choice of rule. But if the conditioning is on an event rather than a message, there is no requirement that different events form a symmetric partition or that the law of total probability be applied across them in this way. Under the event-based interpretation, the argument forcing 1/2 does not apply.

The problem is ambiguous and therefore has no answer

Objection.
Because the problem does not specify how we learn that there is a girl born on Tuesday, it is fundamentally ambiguous. Since different interpretations lead to different answers, the question has no single correct solution.

Response.
It’s true that the problem is ambiguous as stated in natural language. One option is to declare it unanswerable. Another is to resolve the ambiguity by adopting a conventional default interpretation. I choose the latter: I interpret the question as a conditional probability defined on an explicit probability model and make that interpretation clear by enumerating the sample space. Under that interpretation, the answer is unambiguous and, in my view, interesting and instructive – even if other interpretations lead to different answers.

You are changing the sampling procedure

Objection.
Some people object that the 13/27 result comes from changing how families are selected. Conditioning on “at least one child is a girl born on Tuesday” oversamples families with more girls, so the conditional distribution no longer represents the original population of two-child families. From this perspective, the result feels like an artifact of biased sampling rather than a genuine probability update.

Response.
That description is accurate, but it is not a flaw. Conditioning is biased sampling: evidence changes the distribution of outcomes. Families with more girls really are more likely to satisfy the condition, and the conditional probability reflects that fact.

The day of the week seems irrelevant

Objection.
Tuesday has nothing to do with gender, so it feels wrong that adding this detail should change the probability. Since the day of the week does not cause a child to be a girl, it seems irrelevant to the question.

Response.
This objection reflects a common confusion between causal independence and evidential relevance. While the day of the week does not cause the other child’s gender, it provides evidence about the number of girls in the family. Evidence can change probabilities even when there is no causal connection.

The result depends on unrealistic independence assumptions

Objection.
The solution assumes that genders and days of the week are independent and uniformly distributed, which is not true in the real world. If those assumptions are relaxed, the answer changes.

Response.
That is correct, but those assumptions are not the source of the puzzle. Relaxing them changes the numerical value of the answer, but not the underlying logic. The same kind of reasoning applies under more realistic models.

The problem is artificial or pathological

Objection.
Some readers reject the problem not because the calculation is wrong, but because the setup feels artificial or unlike how information is learned in real life. From this view, the problem is a trick rather than a meaningful probability question.

Response.
Whether this is a flaw or a feature depends on the goal. The problem is artificial, but it is intended to expose how unreliable our intuitions about conditional probability and independence can be. In that sense, its artificiality is what makes it pedagogically useful. The underlying issue – determining how evidence bears on hypotheses – comes up in real-world problems all the time. And getting it wrong has real-world consequences.

The Lost Chapter

I’m happy to report that Probably Overthinking It is available now in paperback. If you would like a copy, you can order from Bookshop.org and Amazon (affiliate links).

To celebrate, I’m publishing The Lost Chapter — that is, the chapter I cut from the published book. It’s about The Girl Named Florida problem, which might be the most counterintuitive problem in probability — even more than the Monty Hall problem.

When I started writing the book, I thought it would include more puzzles and paradoxes like this, but as the project evolved, it shifted toward real world problems where data help us answer questions and make better decisions. As much as The Girl Named Florida is challenging and puzzling, it doesn’t have much application in the real world.

But it got a new life on the internet recently, so I think this is a good time to publish! The following is an excerpt; you can read the complete chapter here.

The Girl Named Florida

The Monty Hall Problem is famously contentious. People have strong feelings about the answer, and it has probably started more fights than any other problem in probability. But there’s another problem that I think is even more counterintuitive – and it has started a good number of fights as well. It’s called The Girl Named Florida.

I’ve written about this problem before, and I’ve demonstrated the correct answer, but I don’t think I really explained why the answer is what it is. That’s what I’ll try to do here.

As far as I have found, the source of the problem is Leonard Mlodinow’s book, The Drunkard’s Walk, which poses the question like this:

In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?

If you have not encountered this problem before, your first thought is probably that the girl’s name is irrelevant – but it’s not. In fact, the answer depends on how common the name is.

If you feel like that can’t possibly be right, you are not alone. Solving this puzzle requires conditional probability, which is one of the most counterintuitive areas of probability. So I suggest we approach it slowly – like we’re defusing a bomb.

We’ll start with two problems involving coins and dice, where the probabilities are relatively simple. These examples demonstrate three principles that will help when things get strange:

  • It is not always clear when the condition in a conditional probability is relevant, and our intuition can be unreliable.
  • A reliable way to compute conditional probabilities is to enumerate equally likely possibilities and count.
  • If someone does something rare, it is likely that they made more than one attempt.

Then, finally, we’ll solve The Girl Named Florida.

Tossing Coins

Let’s warm up with two problems related to coins and dice.

We’ll assume that coins are fair, so the probability of getting heads or tails is 1/2. And the outcome of one coin toss does not affect another, so even if the coin comes up heads ten times, the probability of heads on the next toss is 1/2.

Now, suppose I toss a coin twice where I can see the outcome and you can’t. I tell you that I got heads at least once, and ask you the probability that I got heads both times.

You might think, if the outcome of one coin does not affect the other, it doesn’t matter if one of the coins came up heads – the probability for the other coin is still 1/2.

But that’s not right; the correct answer is 1/3. To see why, consider this:

  1. After I toss the coins, there are four equally likely outcomes: two heads, two tails, heads first and then tails, or tails first and then heads.
  2. When I tell you that I got heads at least once, I rule out one of the possibilities, two tails.
  3. The remaining three possibilities are still equally likely, so the probability of each is 1/3.
  4. In one of the remaining possibilities, the other coin is also heads.

So the conditional probability is 1/3.
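We can check this argument by brute force. Here is a minimal sketch that enumerates the four outcomes and counts; the variable names (`condition`, `statement`) are mine, chosen to match the terminology introduced in the next section.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two coin tosses.
outcomes = list(product("HT", repeat=2))

# Condition: at least one toss came up heads.
condition = [o for o in outcomes if "H" in o]

# Statement: both tosses came up heads.
statement = [o for o in condition if o == ("H", "H")]

# The conditional probability is the fraction of the condition
# where the statement is also true.
prob = Fraction(len(statement), len(condition))
print(prob)  # 1/3
```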

If that argument doesn’t entirely convince you, there’s another way to solve problems like this, called enumeration.

Enumeration

A conditional probability has two parts: a statement and a condition. Both are claims about the world that might be true or not, but they play different roles. A conditional probability is the probability that the statement is true, given that the condition is true. In the coin toss example, the statement is “I got heads both times” and the condition is “I tossed a coin twice and got heads at least once”.

We’ve seen that it can be tricky to compute conditional probabilities, so let me suggest what I think is the most reliable way to get the right answer and be confident that it’s correct. Here are the steps:

  1. Make a list of equally likely outcomes,
  2. Select the subset where the condition is true,
  3. Within the subset where the condition is true, compute the fraction where the statement is also true.

This method is called enumerating the sample space, where the “sample space” is the list of outcomes. In the coin toss example, there are four possible outcomes, as shown in the following diagram.

[Diagram: the four equally likely outcomes of two coin tosses]

The shaded cells (both light and dark) show the three outcomes where the condition is true; the darker cell shows the one outcome where the statement is true. So the conditional probability is 1/3.

This example demonstrates one of the principles we’ll need to understand the puzzles: you have to count the combinations. If we know that the number of heads is either one or two, it is tempting to think these possibilities are equally likely. But there is only one way to get two heads, and there are two ways to get heads exactly once. So the one-heads possibility is more likely.

In the next section, we’ll use this method to solve a problem involving dice. But I’ll start with a story that sets the scene.

Can We Get Serious Now?

The 2016 film Sully is based on the true story of Captain Chesley Sullenberger, who famously and improbably landed a large passenger jet in the Hudson River near New York City, saving the lives of all 155 people on board.

In the aftermath of this emergency landing, investigators questioned his decision to ditch the airplane rather than attempt to land at one of two airports nearby. To demonstrate that these alternatives were feasible, they showed simulations of pilots landing successfully at both airports.

In the movie version of the hearing, Tom Hanks, who played Captain Sullenberger, memorably asks, “Can we get serious now?” Having seen the simulations, he says, “I’d like to know how many times the pilot practiced that maneuver before he actually pulled it off. Please ask how many practice runs they had.”

One of the investigators replies, “Seventeen. The pilot […] had seventeen practice attempts before the simulation we just witnessed.” And the audience gasps.

Of course this scene is fictionalized, but the logic of this exchange is consistent with the actual investigation. It is also consistent with the laws of probability.

If someone accomplishes an unlikely feat, you are right to suspect it was not their first try. And the more unlikely the feat, the more attempts you might guess they made.

I will demonstrate this point with coins and dice. Suppose I toss a coin and, based on the outcome, roll a die either once or twice. I don’t let you see the coin or the die, and you don’t know how many times I rolled, but I report that I rolled at least one six. Which do you think is more likely, that I rolled once or twice?

You might suspect that I rolled twice – and this time your intuition is correct. If I report at least one six, it is more likely that I rolled twice.

To see how much more likely, let’s enumerate the possibilities. The following diagram shows 72 equally likely outcomes.

[Diagram: the 72 equally likely outcomes of rolling a die once or twice]

The left side shows 36 cases where I roll the die once, using a single digit to represent the outcomes. The right side shows 36 cases where I roll the die twice: the first digit represents the first roll; the second digit represents the second roll.

The shaded area indicates the outcomes where at least one die is a six. There are 17 in total, 6 when I roll the die once and 11 when I roll it twice. So if I tell you I rolled at least one six, the probability is 11/17 that I rolled the die twice, which is about 65%.

If you succeed at something difficult, it is likely you had more than one chance.
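The dice example can be checked the same way. In this sketch (the variable names are mine), each one-roll result is listed six times so that all 72 entries are equally likely, mirroring the diagram.

```python
from fractions import Fraction
from itertools import product

# Build 72 equally likely outcomes: each one-roll result is repeated
# six times so it carries the same weight as a two-roll pair.
outcomes = []
for first in range(1, 7):
    outcomes.extend([("once", (first,))] * 6)
for pair in product(range(1, 7), repeat=2):
    outcomes.append(("twice", pair))

# Condition: at least one six was rolled.
condition = [o for o in outcomes if 6 in o[1]]

# Statement: the die was rolled twice.
twice = [o for o in condition if o[0] == "twice"]

prob = Fraction(len(twice), len(condition))
print(len(condition), prob)  # 17 11/17
```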

The Two Child Problems

Next we’ll solve two puzzles made famous by Martin Gardner in his Scientific American column in 1959. He posed the first like this:

Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?

The real world is complicated, so let’s assume that these problems are set in a world where all children are boys or girls with equal probability. With that simplification, there are four equally likely combinations of two children, shown in the following diagram.

[Diagram: the four equally likely combinations of two children, shaded where the older child is a girl]

The shaded areas show the families where the condition is true – that is, the first child is a girl. The darker area shows the only family where the statement is true – that is, both children are girls.

There are two possibilities where the condition is true, and in one of them the statement is true, so the conditional probability is 1/2. This result confirms what you might have suspected: the sex of the older child is irrelevant. The probability that the second child is a girl is 1/2, regardless.

Now here’s the second problem, which I have revised to make it easier to compare with the first part:

Mr. Smith has two children. At least one of them is a [girl]. What is the probability that both children are [girls]?

Again, there are four equally likely combinations of two children, shown in the following diagram.

[Diagram: the four equally likely combinations of two children, shaded where at least one child is a girl]

Now there are three possibilities where the condition is true – that is, at least one child is a girl. In one of them, the statement is true – that is, both children are girls. So the conditional probability is 1/3.

This problem is identical to the coin example, and it demonstrates the same principle: you have to count the combinations. There is only one way to have two girls, but there are two ways to have a boy and a girl.
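Both of Gardner’s problems can be checked with a few lines of enumeration. A minimal sketch, with variable names of my own:

```python
from fractions import Fraction
from itertools import product

# Four equally likely families of two children: G for girl, B for boy.
families = list(product("GB", repeat=2))

# Mr. Smith's problem: at least one child is a girl.
at_least_one = [f for f in families if "G" in f]
both_girls = [f for f in at_least_one if f == ("G", "G")]
print(Fraction(len(both_girls), len(at_least_one)))  # 1/3

# Mr. Jones's problem: the older (first) child is a girl.
older_girl = [f for f in families if f[0] == "G"]
both_girls2 = [f for f in older_girl if f == ("G", "G")]
print(Fraction(len(both_girls2), len(older_girl)))  # 1/2
```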

More Variations

Now let’s consider a series of related questions where:

  • One of the children is a girl born on Saturday,
  • One of the children is a left-handed girl, and finally
  • One of the children is a girl named Florida.

To avoid real-world complications, let’s assume:

  • Children are equally likely to be born on any day of the week.
  • One child in 10 is left-handed.
  • One child out of 1000 is named Florida.
  • Children are independent of one another in the sense that the attributes of one (birth day, handedness, and name) don’t affect the attributes of the other.

Let’s also assume that “one of the children” means at least one, so a family could have two girls born on Saturday, or even two girls named Florida.

Saturday’s Child

In a family with two children, what are the chances, if one of the children is a girl born on Saturday, that both children are girls?

To answer this question, we’ll divide each of the four boy-girl combinations into 49 day-of-the-week combinations.

I’ll represent days with the digits 1 through 7, with 1 for Sunday and 7 for Saturday. And I’ll represent families with two-digit numbers; for example, the number 17 represents a family where the first child was born on Sunday and the second child was born on Saturday.

The following diagram shows the four possible orders for boys and girls, and within them, the 49 possible orders for days of the week.

[Diagram: the four boy-girl orders, each divided into 49 day-of-the-week combinations]

The shaded area shows the possibilities where the condition is true – that is, at least one of the children is a girl born on Saturday. And the darker area shows the possibilities where the statement is true – that is, both children are girls.

There are 27 cases where the condition is true. In 13 of them, the statement is true, so the conditional probability is 13/27, which is about 48%.

So the day of the week is not irrelevant. If at least one child is a girl, the probability of two girls is about 33%. If at least one child is a girl born on Saturday, the probability of two girls is about 48%.
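The 13/27 result can be verified by enumerating the 196 equally likely (sex, day) combinations. A minimal sketch, with helper names of my own:

```python
from fractions import Fraction
from itertools import product

# Each child is a (sex, day) pair; days run 1..7, with 7 for Saturday.
children = list(product("GB", range(1, 8)))
families = list(product(children, repeat=2))  # 196 equally likely families

def girl_born_saturday(child):
    sex, day = child
    return sex == "G" and day == 7

# Condition: at least one child is a girl born on Saturday.
condition = [f for f in families if any(girl_born_saturday(c) for c in f)]

# Statement: both children are girls.
statement = [f for f in condition if all(c[0] == "G" for c in f)]

print(Fraction(len(statement), len(condition)))  # 13/27
```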

Now let’s see what happens if the girl is left-handed.

Left-handed girl

In a family with two children, what are the chances, if one of the children is a left-handed girl, that both children are girls?

Let’s assume that 1 child in 10 is left-handed, and if one sibling is left-handed, it doesn’t change the probability that the other is.

The following diagram shows the possible combinations, using “R” to represent a right-handed child and “L” to represent a left-handed child. Again, the shaded areas show where the condition is true; the darker area shows where the statement is true.

[Diagram: sex and handedness combinations for two children, with R for right-handed and L for left-handed]

There are 39 combinations where at least one child is a left-handed girl. Of them, there are 19 combinations where both children are girls. So the conditional probability is 19/39, which is about 49%.

Now we’re starting to see a pattern. If the probability of a particular attribute, like birthday or handedness, is 1 in n, the number of cases where the condition is true is 4n-1 and the number of cases where the statement is true is 2n-1. So the conditional probability is (2n-1) / (4n-1).

Looking at the diagram, we can see where the terms in this expression come from. The multiple of four represents the segments of the L-shaped region where the condition is true; the multiple of two represents the segments where the statement is true. And we subtract one from the numerator and denominator so we don’t count the case in the lower-right corner twice.

In the days-of-the-week example, n is 7 and the conditional probability is 13/27, about 48%. In the handedness example, n is 10 and the conditional probability is 19/39, about 49%. And for the girl named Florida, who is 1 in 1000, the conditional probability is 1999/3999, about 49.99%. As n increases, the conditional probability approaches 1/2.

Going in the other direction, if we choose an attribute that’s more common, like 1 in 2, the conditional probability is 3/7, around 43%. And if we choose an attribute that everyone has, n is 1 and the conditional probability is 1/3.

In general for problems like these, the answer is between 1/3 and 1/2, closer to 1/3 if the attribute is common, and closer to 1/2 if it is rare.
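The general pattern can be captured in a small function; a sketch (the function name is mine), reproducing the values computed above:

```python
from fractions import Fraction

def prob_two_girls(n):
    """Probability that both children are girls, given at least one
    girl with an attribute that 1 child in n has."""
    return Fraction(2 * n - 1, 4 * n - 1)

for n in [1, 2, 7, 10, 1000]:
    p = prob_two_girls(n)
    print(f"n={n}: {p} = {float(p):.4f}")
```

As n grows, the value climbs from 1/3 toward the limit of 1/2.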

But Why?

At this point I hope you are satisfied that the answers we calculated are correct. Enumerating the sample space, as we did, is a reliable way to compute conditional probabilities. But it might not be clear why additional information, like the name of a child or the day they were born, is relevant to the probability that a family has two girls.

The key is to remember what we learned from Sully: if you succeed at something improbable, you probably made more than one attempt.

If a family has a girl born on Saturday, they have done something moderately improbable, which suggests that they had more than one chance, that is, more than one girl. If they have a girl named Florida, which is more improbable, it is even more likely that they have two girls.

These problems seem paradoxical because we have a strong intuition that the additional information is irrelevant. The resolution of the paradox is that our intuition is wrong. In these examples, names and birthdays are relevant because they make the condition of the conditional probability more strict. And if you meet a strict condition, it is likely that you had more than one chance.

Be Careful What You Ask For

When I first wrote about these problems in 2011, a reader objected that the wording of the questions is ambiguous. For example, here’s Gardner’s version again (with my revision):

Mr. Smith has two children. At least one of them is a [girl]. What is the probability that both children are [girls]?

And here is the objection:

  • If we pick a family at random, ask if they have a girl, and learn that they do, the probability is 1/3 that the family has two girls.
  • But if we pick a family at random, choose one of the children at random, and find that she’s a girl, the probability is 1/2 that the family has two girls.

In either case, we know that the family has at least one girl, but the answer depends on how we came to know that.

I am sympathetic to this objection, up to a point. Yes, the question is ambiguous, but natural language is almost always ambiguous. As readers, we have to make assumptions about the author’s intent.

If I tell you that a family has at least one girl, without specifying how I came to know it, it’s reasonable to assume that all families with a girl are equally likely. I think that’s the natural interpretation of the question and, based on the answers Gardner and Mlodinow provide, that’s the interpretation they intended.

I offered this explanation to the reader who objected, but he was not satisfied. He replied at length, writing almost 4000 words about this problem, which is longer than this chapter. Sadly, we were not able to resolve our differences.

But this exchange helped me understand the difficulty of explaining this problem clearly, which helped when I wrote this chapter. So, if you think I succeeded, it’s probably because I had more than one chance.

The Girl Named Florida

In The Drunkard’s Walk, Leonard Mlodinow presents “The Girl Named Florida Problem”:

“In a family with two children, what are the chances, if [at least] one of the children is a girl named Florida, that both children are girls?”

I added “at least” to Mlodinow’s statement of the problem to avoid a subtle ambiguity.

I wrote about this problem in a previous article from 2011. As you can see in the comments, my explanation was not met with universal acclaim.

This time, I want to take a different approach.

First, to avoid some real-world complications, let’s assume that this question takes place in an imaginary city called Statesville where:

  • Every family has two children.
  • 50% of children are male and 50% are female.
  • All children are named after U.S. states, and all state names are chosen with equal probability.
  • Genders and names within each family are chosen independently.

Second, rather than solve it mathematically, I’ll demonstrate it computationally:
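The full demonstration is in the complete chapter; as a stand-in, here is a minimal simulation sketch under the stated Statesville assumptions (all names here, like `STATES` and `simulate`, are mine, not from the original notebook). With 50 equally likely state names, the attribute “named Florida” is 1 in 50, so the formula from the Lost Chapter above predicts (2·50 − 1)/(4·50 − 1) = 99/199, about 0.497.

```python
import random

# Hypothetical setup: 50 equally likely state names, with "florida"
# standing in for the name of interest.
STATES = ["florida"] + [f"state_{i}" for i in range(49)]

def make_family():
    """Two children, each with an independently chosen sex and name."""
    return [(random.choice("GB"), random.choice(STATES)) for _ in range(2)]

def simulate(num_families=1_000_000, seed=1):
    random.seed(seed)
    condition = statement = 0
    for _ in range(num_families):
        family = make_family()
        # Condition: at least one child is a girl named Florida.
        if any(sex == "G" and name == "florida" for sex, name in family):
            condition += 1
            # Statement: both children are girls.
            if all(sex == "G" for sex, name in family):
                statement += 1
    return statement / condition

print(simulate())  # close to 99/199, about 0.497
```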

Either way, I hope you enjoy getting your head around this problem.