{"id":1593,"date":"2025-09-25T21:46:17","date_gmt":"2025-09-25T21:46:17","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=1593"},"modified":"2025-09-25T21:46:20","modified_gmt":"2025-09-25T21:46:20","slug":"the-poincare-problem","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/","title":{"rendered":"The Poincar\u00e9 Problem"},"content":{"rendered":"\n<p><strong>Selection bias<\/strong> is the hardest problem in statistics because it\u2019s almost unavoidable in practice, and once the data have been collected, it\u2019s usually not possible to quantify the effect of selection or recover an unbiased estimate of what you are trying to measure.<\/p>\n\n\n\n<p>And because the effect is systematic, not random, it doesn\u2019t help to collect more data. In fact, larger sample sizes make the problem worse, because they give the false impression of precision.<\/p>\n\n\n\n<p>But sometimes, if we are willing to make assumptions about the data generating process, we can use <strong>Bayesian methods<\/strong> to infer the effect of selection bias and produce an unbiased estimate.<\/p>\n\n\n\n<p><a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ThinkBayes2\/blob\/master\/examples\/bread.ipynb\">Click here to run this notebook on Colab<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Poincar\u00e9 and the Baker<\/h2>\n\n\n\n<p>As an example, let\u2019s solve an exercise from <a href=\"https:\/\/allendowney.github.io\/ThinkBayes2\/chap07.html\">Chapter 7 of <em>Think Bayes<\/em><\/a>. It\u2019s based on a fictional anecdote about the mathematician Henri Poincar\u00e9:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Supposedly Poincar\u00e9 suspected that his local bakery was selling loaves of bread that were lighter than the advertised weight of 1 kg, so every day for a year he bought a loaf of bread, brought it home and weighed it. 
At the end of the year, he plotted the distribution of his measurements and showed that it fit a normal distribution with mean 950 g and standard deviation 50 g. He brought this evidence to the bread police, who gave the baker a warning.<\/p>\n\n\n\n<p>For the next year, Poincar\u00e9 continued to weigh his bread every day. At the end of the year, he found that the average weight was 1000 g, just as it should be, but again he complained to the bread police, and this time they fined the baker.<\/p>\n\n\n\n<p>Why? Because the shape of the new distribution was asymmetric. Unlike the normal distribution, it was skewed to the right, which is consistent with the hypothesis that the baker was still making 950 g loaves, but deliberately giving Poincar\u00e9 the heavier ones.<\/p>\n\n\n\n<p>To see whether this anecdote is plausible, let\u2019s suppose that when the baker sees Poincar\u00e9 coming, he hefts <code>k<\/code> loaves of bread and gives Poincar\u00e9 the heaviest one. How many loaves would the baker have to heft to make the average of the maximum 1000 g?<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">How Many Loaves?<\/h2>\n\n\n\n<p>Here are distributions with the same underlying normal distribution and different values of <code>k<\/code>.<\/p>\n\n\n\n<pre id=\"codecell1\" class=\"wp-block-preformatted\">mu_true, sigma_true = 950, 50<br><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"442\" height=\"292\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\" alt=\"\" class=\"wp-image-1600\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png 442w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca-300x198.png 300w, 
https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca-409x270.png 409w\" sizes=\"auto, (max-width: 442px) 100vw, 442px\" \/><\/figure>\n\n\n\n<p>As <code>k<\/code> increases, the mean increases and the standard deviation decreases.<\/p>\n\n\n\n<p>When <code>k=4<\/code>, the mean is close to 1000. So let\u2019s assume the baker hefted four loaves and gave the heaviest to Poincar\u00e9.<\/p>\n\n\n\n<p>At the end of one year, can we tell the difference between the following possibilities?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Innocent: The baker actually increased the mean to 1000, and <code>k=1<\/code>.<\/li>\n\n\n\n<li>Shenanigans: The mean was still 950, but the baker selected with <code>k=4<\/code>.<\/li>\n<\/ul>\n\n\n\n<p>Here\u2019s a sample under the <code>k=4<\/code> scenario, compared to 10 samples with the same mean and standard deviation, and <code>k=1<\/code>. <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"442\" height=\"292\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/7b6da6e593659e1009facc4f1dee2ef9641cd61e691f7c197a0c60a0d56d4308.png\" alt=\"\" class=\"wp-image-1598\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/7b6da6e593659e1009facc4f1dee2ef9641cd61e691f7c197a0c60a0d56d4308.png 442w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/7b6da6e593659e1009facc4f1dee2ef9641cd61e691f7c197a0c60a0d56d4308-300x198.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/7b6da6e593659e1009facc4f1dee2ef9641cd61e691f7c197a0c60a0d56d4308-409x270.png 409w\" sizes=\"auto, (max-width: 442px) 100vw, 442px\" \/><\/figure>\n\n\n\n<p>The <code>k=4<\/code> distribution falls mostly within the range of variation we\u2019d expect from the <code>k=1<\/code> distribution (with the same mean and standard deviation). 
If you were on the jury and saw this evidence, would you convict the baker?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ask a Bayesian<\/h2>\n\n\n\n<p>As a Bayesian approach to this problem, let\u2019s see if we can use this data to estimate <code>k<\/code> and the parameters of the underlying distribution. Here\u2019s a PyMC model that<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Defines prior distributions for <code>mu<\/code>, <code>sigma<\/code>, and <code>k<\/code>, and<\/li>\n\n\n\n<li>Uses a custom distribution that computes the likelihood of the data for a hypothetical set of parameters (<a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ThinkBayes2\/blob\/master\/examples\/bread.ipynb\">see the notebook for details<\/a>).<\/li>\n<\/ul>\n\n\n\n<pre id=\"codecell6\" class=\"wp-block-preformatted\">import pymc as pm\n\n# max_normal_logp is the custom log-PDF defined in the linked notebook\ndef make_model(sample):\n    with pm.Model() as model:\n        mu = pm.Normal(\"mu\", mu=950, sigma=30)\n        sigma = pm.HalfNormal(\"sigma\", sigma=30)\n        k = pm.Uniform(\"k\", lower=0.5, upper=15)\n\n        obs = pm.CustomDist(\n            \"obs\",\n            mu, sigma, k,\n            logp=max_normal_logp,\n            observed=sample,\n        )\n    return model\n<\/pre>\n\n\n\n<p>Notice that we treat <code>k<\/code> as continuous. That\u2019s because continuous parameters are much easier to sample (and the log PDF function allows non-integer values of <code>k<\/code>). But it also makes sense in the context of the problem \u2013 for example, if the baker sometimes hefts three loaves and sometimes four, we can approximate the distribution of the maximum with <code>k=3.5<\/code>.<\/p>\n\n\n\n<p>The model runs quickly and the diagnostics look good. 
Here are the posterior distributions of the parameters compared to their known values.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"292\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/478c2e093c381a97829fb67575579940c9647ce76044eb7599ca124764f0c8bd.png\" alt=\"Posterior distribution of mu showing the posterior mean is 940 compared to the true value 950.\" class=\"wp-image-1599\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/478c2e093c381a97829fb67575579940c9647ce76044eb7599ca124764f0c8bd.png 441w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/478c2e093c381a97829fb67575579940c9647ce76044eb7599ca124764f0c8bd-300x199.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/478c2e093c381a97829fb67575579940c9647ce76044eb7599ca124764f0c8bd-408x270.png 408w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"292\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/064e15e7bb9b7a88a9b4118964b00919934f5b1e6fe04ebd195cfbf9d7da40a9.png\" alt=\"Posterior distribution of sigma showing the posterior mean is 54 compared to the true value 50.\" class=\"wp-image-1601\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/064e15e7bb9b7a88a9b4118964b00919934f5b1e6fe04ebd195cfbf9d7da40a9.png 441w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/064e15e7bb9b7a88a9b4118964b00919934f5b1e6fe04ebd195cfbf9d7da40a9-300x199.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/064e15e7bb9b7a88a9b4118964b00919934f5b1e6fe04ebd195cfbf9d7da40a9-408x270.png 408w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"440\" 
height=\"292\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/image.png\" alt=\"Posterior distribution of k showing the posterior mean is 5.5 compared to the true value 4.\" class=\"wp-image-1597\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/image.png 440w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/image-300x199.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/image-407x270.png 407w\" sizes=\"auto, (max-width: 440px) 100vw, 440px\" \/><\/figure>\n\n\n\n<p>With one year of data, we can recover the parameters pretty well. The true values fall comfortably inside the posterior distributions, and the posterior mode of <code>k<\/code> is close to the true value, <code>4<\/code>.<\/p>\n\n\n\n<p>But the posterior distributions are still quite wide. There is even some possibility that the baker is innocent, although it is small.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>This example shows that we can use the shape of an observed distribution to estimate the effect of selection bias and recover the unbiased latent distribution. But we might need a lot of data, and the inference depends on strong assumptions about the data generating process.<\/p>\n\n\n\n<p>Credits: I don\u2019t remember where I got this example from (maybe <a href=\"https:\/\/everything2.com\/title\/%2522true%2522+story+about+Poincar%25C3%25A9%2527s+baker\">here<\/a>?), but it appears in Leonard Mlodinow, <em>The Drunkard\u2019s Walk<\/em> (2008). Mlodinow credits Bart Holland, <em>What Are the Chances?<\/em> (2002). 
The <a href=\"https:\/\/hsm.stackexchange.com\/questions\/7299\/poincar%C3%A9-and-the-baker-was-the-anecdote-true\">ultimate source<\/a> seems to be George Gamow and Marvin Stern, <em>Puzzle Math<\/em> (1958) \u2013 but their version is about a German professor, not Poincar\u00e9.<\/p>\n\n\n\n<p>You can order print and ebook versions of <em>Think Bayes 2e<\/em> from <a href=\"https:\/\/bookshop.org\/a\/98697\/9781492089469\">Bookshop.org<\/a> and <a href=\"https:\/\/amzn.to\/334eqGo\">Amazon<\/a>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Selection bias is the hardest problem in statistics because it\u2019s almost unavoidable in practice, and once the data have been collected, it\u2019s usually not possible to quantify the effect of selection or recover an unbiased estimate of what you are trying to measure. And because the effect is systematic, not random, it doesn\u2019t help to collect more data. In fact, larger sample sizes make the problem worse, because they give the false impression of precision. 
But sometimes, if we are&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[101],"tags":[71,100],"class_list":["post-1593","post","type-post","status-publish","format-standard","hentry","category-bayesian-methods","tag-bayesian-statistics","tag-selection-bias"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>The Poincar\u00e9 Problem - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The Poincar\u00e9 Problem - Probably Overthinking It\" \/>\n<meta property=\"og:description\" content=\"Selection bias is the hardest problem in statistics because it\u2019s almost unavoidable in practice, and once the data have been collected, it\u2019s usually not possible to quantify the effect of 
selection or recover an unbiased estimate of what you are trying to measure. And because the effect is systematic, not random, it doesn\u2019t help to collect more data. In fact, larger sample sizes make the problem worse, because they give the false impression of precision. But sometimes, if we are... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2025-09-25T21:46:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-25T21:46:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\" \/>\n\t<meta property=\"og:image:width\" content=\"442\" \/>\n\t<meta property=\"og:image:height\" content=\"292\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"The Poincar\u00e9 Problem\",\"datePublished\":\"2025-09-25T21:46:17+00:00\",\"dateModified\":\"2025-09-25T21:46:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\"},\"wordCount\":810,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\",\"keywords\":[\"bayesian statistics\",\"selection bias\"],\"articleSection\":[\"Bayesian methods\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\",\"name\":\"The Poincar\u00e9 Problem - Probably Overthinking 
It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\",\"datePublished\":\"2025-09-25T21:46:17+00:00\",\"dateModified\":\"2025-09-25T21:46:20+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png\",\"width\":442,\"height\":292},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The Poincar\u00e9 Problem\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other 
ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"The Poincar\u00e9 Problem - Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/","og_locale":"en_US","og_type":"article","og_title":"The Poincar\u00e9 Problem - Probably Overthinking It","og_description":"Selection bias is the hardest problem in statistics because it\u2019s almost unavoidable in practice, and once the data have been collected, it\u2019s usually not possible to quantify the effect of selection or recover an unbiased estimate of what you are trying to measure. And because the effect is systematic, not random, it doesn\u2019t help to collect more data. In fact, larger sample sizes make the problem worse, because they give the false impression of precision. But sometimes, if we are... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/","og_site_name":"Probably Overthinking It","article_published_time":"2025-09-25T21:46:17+00:00","article_modified_time":"2025-09-25T21:46:20+00:00","og_image":[{"width":442,"height":292,"url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png","type":"image\/png"}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"The Poincar\u00e9 Problem","datePublished":"2025-09-25T21:46:17+00:00","dateModified":"2025-09-25T21:46:20+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/"},"wordCount":810,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png","keywords":["bayesian statistics","selection bias"],"articleSection":["Bayesian methods"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/","url":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/","name":"The Poincar\u00e9 Problem - Probably Overthinking 
It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png","datePublished":"2025-09-25T21:46:17+00:00","dateModified":"2025-09-25T21:46:20+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#primaryimage","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/09\/35cfe517511163559ff87ce035853e291ea5625f0b9eba3d13b81592c0427cca.png","width":442,"height":292},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2025\/09\/25\/the-poincare-problem\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"The Poincar\u00e9 Problem"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":1684,"url":"https:\/\/www.allendowney.com\/blog\/2025\/12\/16\/sat-math-scores-gender-difference-or-selection-bias\/",
"url_meta":{"origin":1593,"position":0},"title":"SAT math scores: gender difference or selection bias?","author":"AllenDowney","date":"December 16, 2025","format":false,"excerpt":"The video from my PyData Boston talk is up now: https:\/\/www.youtube.com\/watch?v=6pwtbNVgyzg Resources The slides are here Run the first notebook (Poincar\u00e9 problem) on Colab Run the second notebook (analysis of SAT data) on Colab If you want to learn to do this kind of analysis, you can sign up for\u2026","rel":"","context":"In \"bayesian statistics\"","block_context":{"text":"bayesian statistics","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian-statistics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/img.youtube.com\/vi\/6pwtbNVgyzg\/0.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":569,"url":"https:\/\/www.allendowney.com\/blog\/2021\/04\/25\/bayesian-and-frequentist-results-are-not-the-same-ever\/","url_meta":{"origin":1593,"position":1},"title":"Bayesian and frequentist results are not the same, ever","author":"AllenDowney","date":"April 25, 2021","format":false,"excerpt":"I often hear people say that the results from Bayesian methods are the same as the results from frequentist methods, at least under certain conditions. And sometimes it even comes from people who understand Bayesian methods. Today I saw this tweet from Julia Rohrer: \"Running a Bayesian multi-membership multi-level probit\u2026","rel":"","context":"In \"bayesian\"","block_context":{"text":"bayesian","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":442,"url":"https:\/\/www.allendowney.com\/blog\/2020\/04\/13\/bayesian-hypothesis-testing\/","url_meta":{"origin":1593,"position":2},"title":"Bayesian hypothesis testing","author":"AllenDowney","date":"April 13, 2020","format":false,"excerpt":"I have mixed feelings about Bayesian hypothesis testing. 
On the positive side, it's better than null-hypothesis significance testing (NHST). And it is probably necessary as an onboarding tool: Hypothesis testing is one of the first things future Bayesians ask about; we need to have an answer. On the negative side,\u2026","rel":"","context":"In \"bayesian statistics\"","block_context":{"text":"bayesian statistics","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian-statistics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/04\/image.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":609,"url":"https:\/\/www.allendowney.com\/blog\/2021\/05\/07\/founded-upon-an-error\/","url_meta":{"origin":1593,"position":3},"title":"Founded Upon an Error","author":"AllenDowney","date":"May 7, 2021","format":false,"excerpt":"A recent post on Reddit asks, \"Why was Bayes' Theory not accepted\/popular historically until the late 20th century?\" Great question! As always, there are many answers to a question like this, and the good people of Reddit provide several. But the first and most popular answer is, in my humble\u2026","rel":"","context":"In \"bayesian\"","block_context":{"text":"bayesian","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1704,"url":"https:\/\/www.allendowney.com\/blog\/2026\/01\/09\/bayesian-decision-analysis\/","url_meta":{"origin":1593,"position":4},"title":"Bayesian Decision Analysis","author":"AllenDowney","date":"January 9, 2026","format":false,"excerpt":"At PyData Global 2025 I presented a workshop on Bayesian Decision Analysis with PyMC. The video is available now. This workshop is based on the first session of the Applied Bayesian Modeling Workshop I teach along with my colleagues at PyMC Labs. 
If you would like to learn more, it\u2026","rel":"","context":"In \"bayesian statistics\"","block_context":{"text":"bayesian statistics","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian-statistics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/img.youtube.com\/vi\/PLGVZCDnMOq0qmerwB1eITnr5AfYRGm0DF\/0.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":883,"url":"https:\/\/www.allendowney.com\/blog\/2023\/03\/20\/the-bayesian-killer-app\/","url_meta":{"origin":1593,"position":5},"title":"The Bayesian Killer App","author":"AllenDowney","date":"March 20, 2023","format":false,"excerpt":"It's been a while since anyone said \"killer app\" without irony, so let me remind you that a killer app is software \"so necessary or desirable that it proves the core value of some larger technology,\" quoth Wikipedia. For example, most people didn't have much use for the internet until\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/img.youtube.com\/vi\/fsdbneHgi58\/0.jpg?resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1593","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=1593"}],"version-history":[{"count":4,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1593\/revisions"}],"predecessor-version":[{"id":1602,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1593\/revisions\/1602"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/medi
a?parent=1593"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=1593"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=1593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}