
The Raven Paradox

Suppose you are not sure whether all ravens are black. If you see a white raven, that clearly refutes the hypothesis. And if you see a black raven, that supports the hypothesis in the sense that it increases our confidence, maybe slightly. But what if you see a red apple – does that make the hypothesis any more or less likely?

This question is the core of the Raven paradox, a problem in the philosophy of science posed by Carl Gustav Hempel in the 1940s. It highlights a counterintuitive aspect of how we evaluate evidence and confirm hypotheses.

No resolution of the paradox is universally accepted; the most common is what I will call the standard Bayesian response. In this article, I’ll present this response, explain why I think it is incomplete, and propose an extension that might resolve the paradox.

Click here to run this notebook on Colab.

The Problem

The paradox starts with the hypothesis

A: All ravens are black

And the contrapositive hypothesis

B: All non-black things are non-ravens

Logically, these hypotheses are identical – if A is true, B must be true, and vice versa. So if we have a certain level of confidence in A, we should have exactly the same confidence in B. And if we observe evidence in favor of A, we should also accept it as evidence in favor of B, to the same degree.

Also, if we accept that a black raven is evidence in favor of A, we should also accept that a non-black non-raven is evidence in favor of B.

Finally, if a non-black non-raven is evidence in favor of B, we should also accept that it is evidence in favor of A.

Therefore, a red apple (which is a non-black non-raven) is evidence that all ravens are black.

If you accept this conclusion, it seems like every time you see a red apple (or a blue car, or a green leaf, etc.) you should think, “Now I am slightly more confident that all ravens are black”.

But that seems absurd, so we have two options:

  1. Discover an error in the argument, or
  2. Accept the conclusion.

As you might expect, many versions of (1) and (2) have been proposed.

The standard Bayesian response is to accept the conclusion but, to quote Wikipedia, “argue that the amount of confirmation provided is very small, due to the large discrepancy between the number of ravens and the number of non-black objects. According to this resolution, the conclusion appears paradoxical because we intuitively estimate the amount of evidence provided by the observation of a green apple to be zero, when it is in fact non-zero but extremely small.”

It is true that when the number of non-ravens is large, the amount of evidence we get from each non-black non-raven is so small it is negligible. But I don’t think that’s why the conclusion is so acutely counterintuitive.

To clarify my objection, let me present a smaller example I’ll call the Roulette paradox.

The Roulette Paradox

An American roulette wheel has 36 pockets with the numbers 1 to 36, and two pockets labeled 0 and 00. The non-zero pockets are red or black, and the zero pockets are green.

Suppose we work in quality control at the roulette factory and our job is to check that all zero pockets are green. If we observe a green zero, that’s evidence that all zeros are green. But what if we observe a red 19?

In this example, the standard Bayesian response fails:

  • First, the number of non-zeros is not particularly large, so the weight of the evidence is not negligible.
  • Also, the Bayesian response doesn’t address what I think is actually the key: The non-green non-zero may or may not be evidence, depending on how it was sampled.

As I will demonstrate,

  1. If we choose a pocket at random and it turns out to be a non-green non-zero, that is not evidence that all zeros are green.
  2. But if we choose a non-green pocket and it turns out to be non-zero, that is evidence that all zeros are green.

In both cases we observe a non-green non-zero, but “observe” is ambiguous. Whether the observation is evidence or not depends on the sampling process that generated the observation. And I think confusion between these two scenarios is the foundation of the paradox.

The Setup

Let’s get into the details. Switching from roulette back to ravens, we will consider four scenarios:

  1. You choose a random thing and it turns out to be a black raven.
  2. You choose a random thing and it turns out to be a non-black non-raven.
  3. You choose a random raven and it turns out to be black.
  4. You choose a random non-black thing and it turns out to be a non-raven.

The key to the raven paradox is the difference between scenarios 2 and 4.

  • Scenario 2 is what most people imagine when they picture “observing a red apple”. And in this scenario, the red apple is irrelevant, exactly as intuition insists.
  • In Scenario 4, a red apple is evidence in favor of A, because we’re systematically checking non-black things to ensure they’re not ravens – so finding they aren’t is confirmation. But this sampling process is a more contrived interpretation of “observing a red apple”.

The reason for the paradox is that we imagine Scenario 2 and we are given the conclusion from Scenario 4.

It might not be obvious why the red apple is evidence in Scenario 4, but not Scenario 2. I think it will be clearer if we do the math.

The Math

We’ll start with a small world where there are only N = 9 ravens and M = 19 non-ravens. Then we’ll see what happens as we vary N and M.

I’ll use i to represent the unknown number of black ravens, which could be any value from 0 to N, and j to represent the unknown number of black non-ravens, from 0 to M.

We’ll use a joint distribution to represent beliefs about i and j; then we’ll use Bayes’s Theorem to update these beliefs when we see new data.

Let’s start with a uniform prior over all possible combinations of (i, j). For this prior, the probability of A is 10%. We’ll see later that the prior affects the strength of the evidence, but it doesn’t affect whether an observation is in favor of A or not.
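Here is a minimal sketch of this setup in Python. The array names are my own; the notebook linked above may organize things differently.

```python
import numpy as np

# Small world: N ravens and M non-ravens
N, M = 9, 19

# Joint prior over i (number of black ravens, 0..N) and
# j (number of black non-ravens, 0..M): uniform over all combinations
prior = np.ones((N + 1, M + 1))
prior /= prior.sum()

# Hypothesis A ("all ravens are black") corresponds to i == N
prob_A = prior[N, :].sum()
print(prob_A)   # about 0.1
```

With a uniform prior over the 10 possible values of i, the probability that i = N is 1/10, which is where the 10% comes from.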

Scenario 1

Now let’s consider the first scenario: we choose a thing at random from the universe of things, and we find that it is a black raven.

The likelihood for this observation is: i / (N + M), because i is the number of black ravens and N + M is the total number of things.

In this scenario the posterior probability of A is 20%. The posterior probability is higher than the prior, so the black raven is evidence in favor of A.

To quantify the strength of the evidence, we’ll use the log odds ratio, which is 0.81. Later we’ll see how the strength of the evidence depends on the prior distribution of i and j.
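The Scenario 1 update can be sketched like this, under the same assumptions (the variable names are mine, not necessarily the notebook’s):

```python
import numpy as np

N, M = 9, 19
prior = np.ones((N + 1, M + 1))
prior /= prior.sum()

# Likelihood of choosing a random thing and getting a black raven: i / (N + M)
i = np.arange(N + 1)[:, None]       # column vector so it broadcasts over j
likelihood = i / (N + M)            # depends only on i, not j

posterior = prior * likelihood
posterior /= posterior.sum()

prob_A_prior = prior[N, :].sum()    # 0.1
prob_A_post = posterior[N, :].sum() # 0.2

# Log odds ratio as a measure of the strength of the evidence
def log_odds(p):
    return np.log(p / (1 - p))

lor = log_odds(prob_A_post) - log_odds(prob_A_prior)
print(prob_A_post, lor)   # 0.2 and about 0.81
```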

Before we go on, let’s also look at the marginal distribution of i (number of black ravens) before and after.

[Figure: marginal distributions of i before and after the update]

As expected, observing a black raven increases our confidence that all ravens are black. The posterior distribution shifts toward higher values of i, and the probability that i = N increases.

In Scenario 1, the likelihood depends only on i, not on j, so the update doesn’t change our beliefs about j (the number of black non-ravens).

Finally, let’s visualize the posterior joint distribution of i and j.

[Figure: posterior joint distribution of i and j]

Because we started with a uniform distribution and the data has no bearing on j, the joint posterior probabilities don’t depend on j.

In summary, Scenario 1 is consistent with intuition: a black raven is evidence in favor of A.

Scenario 2

In this scenario, we choose a thing at random from the universe of N + M things, and it turns out to be a red apple – which we will treat generally as a non-black non-raven.

The likelihood of this observation is: (M - j) / (N + M), because M - j is the number of non-black non-ravens and N + M is the total number of things.

In this scenario, the posterior probability of A is the same as the prior. In fact, the entire distribution of i is unchanged.

So the red apple is not evidence in favor of A or against it. This is consistent with the intuition that the red apple (or any non-black non-raven) is irrelevant.
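A sketch of the Scenario 2 update, which confirms that the probability of A is unchanged:

```python
import numpy as np

N, M = 9, 19
prior = np.ones((N + 1, M + 1))
prior /= prior.sum()

# Likelihood of choosing a random thing and getting a non-black non-raven
j = np.arange(M + 1)[None, :]       # row vector so it broadcasts over i
likelihood = (M - j) / (N + M)      # depends only on j, not i

posterior = prior * likelihood
posterior /= posterior.sum()

# The marginal distribution of i -- and hence P(A) -- is unchanged
print(prior[N, :].sum(), posterior[N, :].sum())   # both about 0.1
```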

However, the red apple is evidence about j, as we can confirm by comparing the marginal distribution of j before and after.

[Figure: marginal distributions of j before and after the update]

And here’s the posterior joint distribution of i and j.

[Figure: posterior joint distribution of i and j]

Because the red apple has no bearing on i, the posterior probabilities in this scenario don’t depend on i.

In summary, Scenario 2 matches our intuition: a red apple (chosen at random) is not evidence about whether all ravens are black.

Scenario 3

In this scenario, we choose a raven first and then observe that it is black.

The likelihood for this observation is: i / N, because i is the number of black ravens and N is the total number of ravens.

In this scenario, the posterior probability of A is 20%, the same as in Scenario 1. So we conclude that the black raven is evidence in favor of A, with the same strength regardless of whether we are in:

  • Scenario 1: Select a random thing and it turns out to be a black raven or
  • Scenario 3: Select a random raven and it turns out to be black.

In fact, the entire posterior distribution is the same in both scenarios. That’s because the likelihoods in Scenarios 1 and 3 differ only by a constant factor, which is removed when the posterior distributions are normalized.
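We can verify the equivalence numerically. This sketch compares the two posteriors directly (the setup mirrors the earlier scenarios):

```python
import numpy as np

N, M = 9, 19
prior = np.ones((N + 1, M + 1))
prior /= prior.sum()
i = np.arange(N + 1)[:, None]

# Scenario 1 likelihood: i / (N + M); Scenario 3 likelihood: i / N.
# They differ only by the constant factor N / (N + M),
# which normalization removes.
post1 = prior * (i / (N + M))
post1 /= post1.sum()

post3 = prior * (i / N)
post3 /= post3.sum()

print(np.allclose(post1, post3))   # True
```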

In summary, Scenario 3 is consistent with intuition: if we choose a raven and find that it is black, that is evidence in favor of A.

Scenario 4

In the last scenario, we first choose a non-black thing (from all non-black things in the universe), and then observe that it is a non-raven.

The likelihood of this observation is: (M - j) / (N - i + M - j) because M - j is the number of non-black non-ravens and N - i + M - j is the total number of non-black things.

This likelihood depends on both i and j, unlike Scenario 2. This is the key difference that makes Scenario 4 informative about whether all ravens are black.

The posterior probability of A is about 15%, which is greater than the prior, so the non-black non-raven is evidence in favor of A. The log odds ratio is about 0.46, which is smaller than in Scenarios 1 and 3, because there are more non-ravens than ravens. As we’ll see, the strength of the evidence gets smaller as M gets bigger.
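A sketch of the Scenario 4 update. The only subtlety is the impossible case where i = N and j = M: then there are no non-black things to choose from, so I set the likelihood to zero there.

```python
import numpy as np

N, M = 9, 19
prior = np.ones((N + 1, M + 1))
prior /= prior.sum()
i = np.arange(N + 1)[:, None]
j = np.arange(M + 1)[None, :]

# Likelihood of choosing a random non-black thing and getting a non-raven:
# (M - j) / (N - i + M - j), with 0 where there are no non-black things
num = (M - j) + np.zeros_like(i)    # broadcast numerator to the full grid
den = (N - i) + (M - j)             # total number of non-black things
likelihood = np.divide(num, den,
                       out=np.zeros((N + 1, M + 1)),
                       where=(den > 0))

posterior = prior * likelihood
posterior /= posterior.sum()

prob_A_post = posterior[N, :].sum()
print(prob_A_post)   # about 0.149
```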

Here is the marginal distribution of i (number of black ravens) before and after.

[Figure: marginal distributions of i before and after the update]

And here’s the marginal distribution of j (number of black non-ravens) before and after.

[Figure: marginal distributions of j before and after the update]

Finally, here’s the posterior joint distribution of i and j.

[Figure: posterior joint distribution of i and j]

In Scenario 4, the likelihood depends on both i and j, so the update changes our beliefs about both parameters.

And in Scenario 4 a non-black non-raven (chosen from non-black things) is evidence in favor of A. This might still be surprising, but let me suggest a way to think about it: in this scenario we are checking non-black things to make sure they are not ravens. If we find a non-black raven, that contradicts A. If we don’t, that supports A.

In all four scenarios, the results are consistent with intuition. So as long as you are clear about which scenario you are in, there is no paradox. The paradox is only apparent if you think you are in Scenario 2 and you imagine the result from Scenario 4.

In the context of the original problem:

  1. If you walk out of your house and the first thing you see is a red apple (or a blue car, or a green leaf), that has no bearing on whether ravens are black.
  2. But if you deliberately select a non-black thing and check whether it’s a raven, and you find that it is not, that actually is evidence that all ravens are black – but consistent with the standard Bayesian response, it is so weak it is negligible.

Successive updates

In these examples, we started with a uniform prior over all combinations of i and j. Of course that’s not a realistic representation of what we believe about the world. So let’s consider the effect of other priors.

In general, different priors lead to different posterior distributions, and in this case they lead to different conclusions about the strength of the evidence. But they lead to the same conclusion about the direction of the evidence.

To demonstrate, let’s see what happens if we observe a series of black ravens (in Scenario 1 or 3). For simplicity, assume that we sample with replacement.

The following function computes multiple updates, starting with the uniform prior and then using the posterior from each update as the prior for the next.
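The notebook’s function isn’t shown here, but it might look something like this sketch (function and variable names are my own):

```python
import numpy as np

N, M = 9, 19

def log_odds(p):
    return np.log(p / (1 - p))

def successive_updates(likelihood, num_iters=10):
    """Run repeated updates (sampling with replacement), starting from a
    uniform prior, using each posterior as the prior for the next update.
    Returns (prior P(A), posterior P(A), LOR) for each iteration."""
    prior = np.ones((N + 1, M + 1))
    prior /= prior.sum()
    rows = []
    for _ in range(num_iters):
        posterior = prior * likelihood
        posterior /= posterior.sum()
        p_prior = prior[N, :].sum()
        p_post = posterior[N, :].sum()
        rows.append((p_prior, p_post, log_odds(p_post) - log_odds(p_prior)))
        prior = posterior            # posterior becomes the next prior
    return rows

# Scenario 1/3 likelihood is proportional to i
i = np.arange(N + 1)[:, None]
rows = successive_updates(i / (N + M))
for p_prior, p_post, lor in rows[:3]:
    print(f"{p_prior:.6f}  {p_post:.6f}  {lor:.6f}")
```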

This table shows the results in Scenario 1 (which is the same as in Scenario 3). For each iteration, the table shows the prior and posterior probability of A, and the log odds ratio.

Iteration  Prior     Posterior  LOR
0          0.100000  0.200000   0.810930
1          0.200000  0.284211   0.462624
2          0.284211  0.360000   0.348307
3          0.360000  0.427901   0.284942
4          0.427901  0.488715   0.245274
5          0.488715  0.543171   0.218261
6          0.543171  0.591920   0.198796
7          0.591920  0.635551   0.184196
8          0.635551  0.674590   0.172914
9          0.674590  0.709512   0.163995

As we see more ravens, the posterior probability of A increases, but the LOR decreases – which means that each raven provides weaker evidence than the previous one. In the long run the LOR converges to a value greater than 0 (about 0.11), which means that each raven provides at least some additional evidence, even when the prior is far from the uniform distribution we started with.

In the worst case, if the prior probability of A is 0 or 1, nothing we observe can change those beliefs, so nothing is evidence for or against A. But there is no prior where a black raven provides evidence against A.

[Proof: The likelihood of the observation is maximized when all ravens are black (i = N). Therefore, for any prior that gives non-zero probability to both A and its complement, the LOR is positive: these observations can never be evidence against A.]

The following table shows the results in Scenario 4, where we select a non-black thing and check that it is not a raven.

Iteration  Prior     Posterior  LOR
0          0.100000  0.149403   0.457933
1          0.149403  0.201006   0.359272
2          0.201006  0.253991   0.302582
3          0.253991  0.307217   0.264273
4          0.307217  0.359496   0.235611
5          0.359496  0.409837   0.212911
6          0.409837  0.457528   0.194344
7          0.457528  0.502141   0.178860
8          0.502141  0.543477   0.165785
9          0.543477  0.581514   0.154644

The pattern is similar. Each non-black thing that turns out not to be a raven is weaker evidence than the previous one. But it is always in favor of A – in this scenario, there is no prior where a non-black non-raven is evidence against A.

Varying M

Finally, let’s see how the strength of the evidence varies as we increase M, the number of non-ravens. The following function computes results in Scenario 4 for a range of values of M, holding constant the number of ravens, N = 9.
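Again, the notebook’s function isn’t shown; here is one possible sketch:

```python
import numpy as np

N = 9   # number of ravens, held constant

def scenario4_posterior(M):
    """P(A) after one Scenario 4 observation, starting from a uniform
    prior over (i, j), for a given number of non-ravens M."""
    prior = np.ones((N + 1, M + 1))
    prior /= prior.sum()
    i = np.arange(N + 1)[:, None]
    j = np.arange(M + 1)[None, :]
    num = (M - j) + np.zeros_like(i)
    den = (N - i) + (M - j)
    likelihood = np.divide(num, den,
                           out=np.zeros((N + 1, M + 1)),
                           where=(den > 0))
    posterior = prior * likelihood
    posterior /= posterior.sum()
    return posterior[N, :].sum()

for M in [20, 50, 100, 200, 500, 1000]:
    print(M, scenario4_posterior(M))
```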

M     Prior  Posterior  LOR
20    0.1    0.147655   0.444110
50    0.1    0.124515   0.246875
100   0.1    0.114530   0.151946
200   0.1    0.108495   0.091022
500   0.1    0.104100   0.044751
1000  0.1    0.102331   0.025640

As M increases (more non-ravens in the universe), the strength of the evidence decreases. This is consistent with the standard Bayesian response, which notes that in a realistic scenario, the evidence is negligible.

Conclusion

The standard Bayesian response to the Raven paradox is correct in the sense that when a non-black non-raven is evidence that all ravens are black, the evidence is extremely weak. But that doesn’t explain why the roulette example – where the number of non-green non-zero pockets is relatively small – is still so contrary to intuition.

I think a better explanation for the paradox is the ambiguity of the word “observe”. If we are explicit about the sampling process that generates the observation, we find that a non-black non-raven may or may not be evidence that all ravens are black.

  • Scenario 2: If we choose a random thing and find that it is a non-black non-raven, that is not evidence.
  • Scenario 4: If we choose a non-black thing and find that it is a non-raven, that is evidence.

The first case is entirely consistent with intuition. The second case is less obvious, but if we consider smaller examples like a roulette wheel, and do the math, it can be reconciled with intuition.

Confusion between these scenarios causes the apparent paradox, and clarity about the scenarios resolves it.

Symmetry and Asymmetry

It might still seem strange that a black raven is always evidence for A and B, but a non-black non-raven may or may not be, depending on the sampling process. If A and B are logically identical, and a black raven supports A, it’s still not clear why a non-black non-raven doesn’t always support B.

After all, if we start with B, we conclude that a non-black non-raven is always evidence for B (and A), and a black raven may or may not be. Where does this asymmetry come from?

We broke the symmetry when we formulated “All ravens are black” as “Out of all ravens, how many are black?” This formulation first divides the world into ravens and non-ravens, then asks how many in each group are black.

Conversely, if we start with “All non-black things are non-ravens”, we formulate it as “Out of all non-black things, how many are ravens?” In this formulation, we divide the world into black and non-black things, then ask how many in each group are ravens.

The asymmetry is apparent when we parameterize the models. If we start with A, we define i to be the number of ravens that are black. And we find that in Scenario 1, the likelihood of a black raven depends on i, and in Scenario 2, the likelihood of a non-black non-raven does not.

If we start with B, we define i to be the number of non-black things that are non-ravens. Then in Scenario 1 we find that a non-black non-raven pertains to i, but a black raven does not.

So the symmetry is broken when we formulate the hypothesis in a way that is testable with data. In propositional logic, A and B are equivalent in the sense that evidence for one must be evidence for the other. In the Bayesian formulation, “How many ravens are black?” and “How many non-black things are non-ravens?” are not equivalent; evidence for one is not necessarily evidence for the other.

A critic might say that the Bayesian formulation is a non-resolution – that is, it doesn’t solve the original problem posed by Hempel; it only solves a related problem by making additional assumptions.

A Bayesian response is that the Raven Paradox is only problematic in the abstract world of propositional logic; as soon as we formulate the question in a way that connects it to the real world through observation, it disappears. So the Raven Paradox is similar to the principle of explosion – it demonstrates a brittleness in propositional logic that makes it unsuitable for reasoning about many real-world hypotheses.

Related Reading

I am not the first to notice that the interpretation of evidence depends on a model of the data-generating process. In the context of the Raven Problem, Richard Royall wrote:

We see that the observation of a red pencil can be evidence that all ravens are black. To make the proper interpretation, we must have an additional piece of information. Whether the observation is or is not evidence supporting the hypothesis (A) that all ravens are black versus the hypothesis (B) that only a fraction … are black is determined by the sampling procedure. A randomly selected pencil that proves to be red is not evidence that all ravens are black, but a randomly selected red object that proves to be a pencil is.

This analysis appears in an appendix of Statistical evidence: a likelihood paradigm, first published in 1997. I found it in a footnote of Belief, Evidence, and Uncertainty: Problems of Epistemic Inference, published in 2016:

Royall in his commentary on the Raven Paradox … observes that how one got the white shoes is inferentially important. If you grabbed a non-raven object at random, then it does not bear on the question of whether all ravens are black. If on the other hand you grabbed a random non-black object, and it turned out to be a pair of shoes, then it provides a very tiny amount of evidence for the hypothesis that all ravens are black …

Royall is right that the sampling process determines whether a red pencil (or white shoe) is evidence about ravens, and he analyzes a version of what I’m calling Scenario 4. But I don’t think his analysis quite explains why the paradox feels so counterintuitive, and it seems to have had little impact on the discussion of the Raven paradox in the confirmation theory literature.


Here is a longer version of this article that includes all of the code and a list of objections with my responses. You can also click here to run the notebook on Colab.

The Lost Chapter

I’m happy to report that Probably Overthinking It is available now in paperback. If you would like a copy, you can order from Bookshop.org and Amazon (affiliate links).

To celebrate, I’m publishing The Lost Chapter — that is, the chapter I cut from the published book. It’s about The Girl Named Florida problem, which might be the most counterintuitive problem in probability — even more than the Monty Hall problem.

When I started writing the book, I thought it would include more puzzles and paradoxes like this, but as the project evolved, it shifted toward real world problems where data help us answer questions and make better decisions. As much as The Girl Named Florida is challenging and puzzling, it doesn’t have much application in the real world.

But it got a new life on the internet recently, so I think this is a good time to publish it! The following is an excerpt; you can read the complete chapter here.

The Girl Named Florida

The Monty Hall Problem is famously contentious. People have strong feelings about the answer, and it has probably started more fights than any other problem in probability. But there’s another problem that I think is even more counterintuitive – and it has started a good number of fights as well. It’s called The Girl Named Florida.

I’ve written about this problem before, and I’ve demonstrated the correct answer, but I don’t think I really explained why the answer is what it is. That’s what I’ll try to do here.

As far as I have found, the source of the problem is Leonard Mlodinow’s book, The Drunkard’s Walk, which poses the question like this:

In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls?

If you have not encountered this problem before, your first thought is probably that the girl’s name is irrelevant – but it’s not. In fact, the answer depends on how common the name is.

If you feel like that can’t possibly be right, you are not alone. Solving this puzzle requires conditional probability, which is one of the most counterintuitive areas of probability. So I suggest we approach it slowly – like we’re defusing a bomb.

We’ll start with two problems involving coins and dice, where the probabilities are relatively simple. These examples demonstrate three principles that will help when things get strange:

  • It is not always clear when the condition in a conditional probability is relevant, and our intuition can be unreliable.
  • A reliable way to compute conditional probabilities is to enumerate equally likely possibilities and count.
  • If someone does something rare, it is likely that they made more than one attempt.

Then, finally, we’ll solve The Girl Named Florida.

Tossing Coins

Let’s warm up with two problems related to coins and dice.

We’ll assume that coins are fair, so the probability of getting heads or tails is 1/2. And the outcome of one coin toss does not affect another, so even if the coin comes up heads ten times, the probability of heads on the next toss is 1/2.

Now, suppose I toss a coin twice where I can see the outcome and you can’t. I tell you that I got heads at least once, and ask you the probability that I got heads both times.

You might think, if the outcome of one coin does not affect the other, it doesn’t matter if one of the coins came up heads – the probability for the other coin is still 1/2.

But that’s not right; the correct answer is 1/3. To see why, consider this:

  1. After I toss the coins, there are four equally likely outcomes: two heads, two tails, heads first and then tails, or tails first and then heads.
  2. When I tell you that I got heads at least once, I rule out one of the possibilities, two tails.
  3. The remaining three possibilities are still equally likely, so the probability of each is 1/3.
  4. In one of the remaining possibilities, the other coin is also heads.

So the conditional probability is 1/3.

If that argument doesn’t entirely convince you, there’s another way to solve problems like this, called enumeration.

Enumeration

A conditional probability has two parts: a statement and a condition. Both are claims about the world that might be true or not, but they play different roles. A conditional probability is the probability that the statement is true, given that the condition is true. In the coin toss example, the statement is “I got heads both times” and the condition is “I tossed a coin twice and got heads at least once”.

We’ve seen that it can be tricky to compute conditional probabilities, so let me suggest what I think is the most reliable way to get the right answer and be confident that it’s correct. Here are the steps:

  1. Make a list of equally likely outcomes,
  2. Select the subset where the condition is true,
  3. Within the subset where the condition is true, compute the fraction where the statement is also true.

This method is called enumerating the sample space, where the “sample space” is the list of outcomes. In the coin toss example, there are four possible outcomes, as shown in the following diagram.

[Figure: the four equally likely outcomes of two coin tosses, with the condition and statement shaded]

The shaded cells (both light and dark) show the three outcomes where the condition is true; the darker cell shows the one outcome where the statement is true. So the conditional probability is 1/3.

This example demonstrates one of the principles we’ll need to understand the puzzles: you have to count the combinations. If we know that the number of heads is either one or two, it is tempting to think these possibilities are equally likely. But there is only one way to get two heads, and there are two ways to get exactly one head. So the one-head possibility is more likely.
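If you’d rather let the computer count, here is the same enumeration in Python (a small sketch, not the chapter’s code):

```python
from itertools import product

# Enumerate the four equally likely outcomes of two coin tosses
outcomes = list(product("HT", repeat=2))

# Condition: at least one heads
condition = [o for o in outcomes if "H" in o]

# Statement: heads both times
statement = [o for o in condition if o == ("H", "H")]

print(len(statement), "/", len(condition))   # 1 / 3
```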

In the next section, we’ll use this method to solve a problem involving dice. But I’ll start with a story that sets the scene.

Can We Get Serious Now?

The 2016 film Sully is based on the true story of Captain Chesley Sullenberger, who famously and improbably landed a large passenger jet in the Hudson River near New York City, saving the lives of all 155 people on board.

In the aftermath of this emergency landing, investigators questioned his decision to ditch the airplane rather than attempt to land at one of two airports nearby. To demonstrate that these alternatives were feasible, they showed simulations of pilots landing successfully at both airports.

In the movie version of the hearing, Tom Hanks, who played Captain Sullenberger, memorably asks, “Can we get serious now?” Having seen the simulations, he says, “I’d like to know how many times the pilot practiced that maneuver before he actually pulled it off. Please ask how many practice runs they had.”

One of the investigators replies, “Seventeen. The pilot […] had seventeen practice attempts before the simulation we just witnessed.” And the audience gasps.

Of course this scene is fictionalized, but the logic of this exchange is consistent with the actual investigation. It is also consistent with the laws of probability.

If someone accomplishes an unlikely feat, you are right to suspect it was not their first try. And the more unlikely the feat, the more attempts you might guess they made.

I will demonstrate this point with coins and dice. Suppose I toss a coin and, based on the outcome, roll a die either once or twice. I don’t let you see the coin or the die, and you don’t know how many times I rolled, but I report that I rolled at least one six. Which do you think is more likely, that I rolled once or twice?

You might suspect that I rolled twice – and this time your intuition is correct. If I get a six, it is more likely that I rolled twice.

To see how much more likely, let’s enumerate the possibilities. The following diagram shows 72 equally likely outcomes.

[Figure: the 72 equally likely outcomes of one or two die rolls, with at-least-one-six shaded]

The left side shows 36 cases where I roll the die once, using a single digit to represent the outcomes. The right side shows 36 cases where I roll the die twice: the first digit represents the first roll; the second digit represents the second roll.

The shaded area indicates the outcomes where at least one die is a six. There are 17 in total, 6 when I roll the die once and 11 when I roll it twice. So if I tell you I rolled at least one six, the probability is 11/17 that I rolled the die twice, which is about 65%.

If you succeed at something difficult, it is likely you had more than one chance.
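The enumeration in this section is easy to replicate; this sketch counts the 72 outcomes directly:

```python
from itertools import product

# 72 equally likely outcomes: a fair coin, then two die rolls.
# If the coin is heads I roll once (only d1 counts);
# if tails I roll twice (d1 and d2 both count).
sixes = 0        # outcomes where I report at least one six
twice = 0        # ... of which I rolled twice
for coin, d1, d2 in product("HT", range(1, 7), range(1, 7)):
    reported = (d1,) if coin == "H" else (d1, d2)
    if 6 in reported:
        sixes += 1
        if coin == "T":
            twice += 1

print(twice, "/", sixes)   # 11 / 17
```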

The Two Child Problems

Next we’ll solve two puzzles made famous by Martin Gardner in his Scientific American column in 1959. He posed the first like this:

Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?

The real world is complicated, so let’s assume that these problems are set in a world where all children are boys or girls with equal probability. With that simplification, there are four equally likely combinations of two children, shown in the following diagram.

[Figure: the four equally likely combinations of two children, with the condition and statement shaded]

The shaded areas show the families where the condition is true – that is, the first child is a girl. The darker area shows the only family where the statement is true – that is, both children are girls.

There are two possibilities where the condition is true and one of them where the statement is true, so the conditional probability is 1/2. This result confirms what you might have suspected: the sex of the older child is irrelevant. The probability that the second child is a girl is 1/2, regardless.

Now here’s the second problem, which I have revised to make it easier to compare with the first part:

Mr. Smith has two children. At least one of them is a [girl]. What is the probability that both children are [girls]?

Again, there are four equally likely combinations of two children, shown in the following diagram.

[Figure: the four equally likely combinations of two children, with the condition and statement shaded]

Now there are three possibilities where the condition is true – that is, at least one child is a girl. In one of them, the statement is true – that is, both children are girls. So the conditional probability is 1/3.

This problem is identical to the coin example, and it demonstrates the same principle: you have to count the combinations. There is only one way to have two girls, but there are two ways to have a boy and a girl.
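Both of Gardner’s problems can be checked by enumeration (again, a sketch of my own, not the chapter’s code):

```python
from itertools import product

# All equally likely families with two children, in birth order
families = list(product("GB", repeat=2))

# First problem: the older (first) child is a girl
cond1 = [f for f in families if f[0] == "G"]
both1 = [f for f in cond1 if f == ("G", "G")]
print(len(both1), "/", len(cond1))   # 1 / 2

# Second problem: at least one child is a girl
cond2 = [f for f in families if "G" in f]
both2 = [f for f in cond2 if f == ("G", "G")]
print(len(both2), "/", len(cond2))   # 1 / 3
```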

More Variations

Now let’s consider a series of related questions where:

  • One of the children is a girl born on Saturday,
  • One of the children is a left-handed girl, and finally
  • One of the children is a girl named Florida.

To avoid real-world complications, let’s assume:

  • Children are equally likely to be born on any day of the week.
  • One child in 10 is left-handed.
  • One child out of 1000 is named Florida.
  • Children are independent of one another in the sense that the attributes of one (birth day, handedness, and name) don’t affect the attributes of the others.

Let’s also assume that “one of the children” means at least one, so a family could have two girls born on Saturday, or even two girls named Florida.

Saturday’s Child

In a family with two children, what are the chances, if one of the children is a girl born on Saturday, that both children are girls?

To answer this question, we’ll divide each of the four boy-girl combinations into 49 day-of-the-week combinations.

I’ll represent days with the digits 1 through 7, with 1 for Sunday and 7 for Saturday. And I’ll represent families with two-digit numbers; for example, the number 17 represents a family where the first child was born on Sunday and the second child was born on Saturday.

The following diagram shows the four possible orders for boys and girls, and within them, the 49 possible orders for days of the week.

[Diagram: the four boy-girl orders, each divided into the 49 day-of-the-week combinations]

The shaded area shows the possibilities where the condition is true – that is, at least one of the children is a girl born on Saturday. And the darker area shows the possibilities where the statement is true – that is, both children are girls.

There are 27 cases where the condition is true. In 13 of them, the statement is true, so the conditional probability is 13/27, which is about 48%.

So the day of the week is not irrelevant. If at least one child is a girl, the probability of two girls is about 33%. If at least one child is a girl born on Saturday, the probability of two girls is about 48%.
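We can confirm the counts by enumerating the full sample space in Python (a sketch of my own; each child is a sex paired with a day of the week, 1 through 7 with 7 for Saturday, as in the text):

```python
from fractions import Fraction
from itertools import product

# Each child is a (sex, day) pair; 14 equally likely kinds of child
children = list(product("GB", range(1, 8)))
# 14 * 14 = 196 equally likely two-child families
families = list(product(children, repeat=2))

def is_saturday_girl(child):
    sex, day = child
    return sex == "G" and day == 7

# Condition: at least one child is a girl born on Saturday
condition = [f for f in families if any(is_saturday_girl(c) for c in f)]
# Statement: both children are girls
both_girls = [f for f in condition if all(c[0] == "G" for c in f)]

print(len(condition), len(both_girls))            # 27 13
print(Fraction(len(both_girls), len(condition)))  # 13/27
```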

Now let’s see what happens if the girl is left-handed.

Left-Handed Girl

In a family with two children, what are the chances, if one of the children is a left-handed girl, that both children are girls?

Let’s assume that 1 child in 10 is left-handed, and if one sibling is left-handed, it doesn’t change the probability that the other is.

The following diagram shows the possible combinations, using “R” to represent a right-handed child and “L” to represent a left-handed child. Again, the shaded areas show where the condition is true; the darker area shows where the statement is true.

[Diagram: sex and handedness combinations for two children, with “R” for right-handed and “L” for left-handed]

There are 39 combinations where at least one child is a left-handed girl. Of them, there are 19 combinations where both children are girls. So the conditional probability is 19/39, which is about 49%.
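Because handedness is not equally likely, the combinations have to be weighted by probability rather than simply counted. Here is a sketch of my own that does the weighted sum with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Each child is (sex, handedness); P(left-handed) = 1/10, independent of sex
p_child = {
    ("G", "L"): Fraction(1, 2) * Fraction(1, 10),
    ("G", "R"): Fraction(1, 2) * Fraction(9, 10),
    ("B", "L"): Fraction(1, 2) * Fraction(1, 10),
    ("B", "R"): Fraction(1, 2) * Fraction(9, 10),
}

p_condition = Fraction(0)   # at least one left-handed girl
p_statement = Fraction(0)   # ... and both children are girls
for c1, c2 in product(p_child, repeat=2):
    weight = p_child[c1] * p_child[c2]
    if ("G", "L") in (c1, c2):
        p_condition += weight
        if c1[0] == "G" and c2[0] == "G":
            p_statement += weight

print(p_statement / p_condition)  # 19/39
```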

Now we’re starting to see a pattern. If the probability of a particular attribute, like birthday or handedness, is 1 in n, the number of cases where the condition is true is 4n-1 and the number of cases where the statement is true is 2n-1. So the conditional probability is (2n-1) / (4n-1).

Looking at the diagram, we can see where the terms in this expression come from. The multiple of four represents the segments of the L-shaped region where the condition is true; the multiple of two represents the segments where the statement is true. And we subtract one from the numerator and denominator so we don’t count the case in the lower-right corner twice.

In the days-of-the-week example, n is 7 and the conditional probability is 13/27, about 48%. In the handedness example, n is 10 and the conditional probability is 19/39, about 49%. And for the girl named Florida, who is 1 in 1000, the conditional probability is 1999/3999, about 49.99%. As n increases, the conditional probability approaches 1/2.

Going in the other direction, if we choose an attribute that’s more common, like 1 in 2, the conditional probability is 3/7, around 43%. And if we choose an attribute that everyone has, n is 1 and the conditional probability is 1/3.

In general for problems like these, the answer is between 1/3 and 1/2, closer to 1/3 if the attribute is common, and closer to 1/2 if it is rare.
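The formula (2n-1) / (4n-1) is easy to check against the values quoted above:

```python
from fractions import Fraction

def prob_two_girls(n):
    """Conditional probability of two girls, given at least one girl
    with an attribute whose probability is 1 in n."""
    return Fraction(2 * n - 1, 4 * n - 1)

for n in [1, 2, 7, 10, 1000]:
    p = prob_two_girls(n)
    print(n, p, float(p))
```

As n grows, the output approaches 1/2, matching the pattern in the text.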

But Why?

At this point I hope you are satisfied that the answers we calculated are correct. Enumerating the sample space, as we did, is a reliable way to compute conditional probabilities. But it might not be clear why additional information, like the name of a child or the day they were born, is relevant to the probability that a family has two girls.

The key is to remember what we learned from Sully: if you succeed at something improbable, you probably made more than one attempt.

If a family has a girl born on Saturday, they have done something moderately improbable, which suggests that they had more than one chance, that is, more than one girl. If they have a girl named Florida, which is more improbable, it is even more likely that they have two girls.

These problems seem paradoxical because we have a strong intuition that the additional information is irrelevant. The resolution of the paradox is that our intuition is wrong. In these examples, names and birthdays are relevant because they make the condition of the conditional probability more strict. And if you meet a strict condition, it is likely that you had more than one chance.

Be Careful What You Ask For

When I first wrote about these problems in 2011, a reader objected that the wording of the questions is ambiguous. For example, here’s Gardner’s version again (with my revision):

Mr. Smith has two children. At least one of them is a [girl]. What is the probability that both children are [girls]?

And here is the objection:

  • If we pick a family at random, ask if they have a girl, and learn that they do, the probability is 1/3 that the family has two girls.
  • But if we pick a family at random, choose one of the children at random, and find that she’s a girl, the probability is 1/2 that the family has two girls.

In either case, we know that the family has at least one girl, but the answer depends on how we came to know that.
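A simulation makes the difference between the two protocols concrete (a sketch of my own, assuming equally likely sexes):

```python
import random

random.seed(1)

def random_family():
    return tuple(random.choice("GB") for _ in range(2))

# Protocol 1: pick a family at random and ask whether they have a girl
kept = hits = 0
for _ in range(100_000):
    f = random_family()
    if "G" in f:
        kept += 1
        hits += (f == ("G", "G"))
p1 = hits / kept

# Protocol 2: pick a family, then meet one child at random,
# and keep only the cases where that child is a girl
kept = hits = 0
for _ in range(100_000):
    f = random_family()
    if random.choice(f) == "G":
        kept += 1
        hits += (f == ("G", "G"))
p2 = hits / kept

print(p1, p2)  # p1 is near 1/3, p2 is near 1/2
```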

I am sympathetic to this objection, up to a point. Yes, the question is ambiguous, but natural language is almost always ambiguous. As readers, we have to make assumptions about the author’s intent.

If I tell you that a family has at least one girl, without specifying how I came to know it, it’s reasonable to assume that all families with a girl are equally likely. I think that’s the natural interpretation of the question and, based on the answers Gardner and Mlodinow provide, that’s the interpretation they intended.

I offered this explanation to the reader who objected, but he was not satisfied. He replied at length, writing almost 4000 words about this problem, which is longer than this chapter. Sadly, we were not able to resolve our differences.

But this exchange helped me understand the difficulty of explaining this problem clearly, which helped when I wrote this chapter. So, if you think I succeeded, it’s probably because I had more than one chance.