{"id":1425,"date":"2024-11-17T15:24:26","date_gmt":"2024-11-17T15:24:26","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=1425"},"modified":"2024-11-17T15:24:26","modified_gmt":"2024-11-17T15:24:26","slug":"comparing-distributions","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","title":{"rendered":"Comparing Distributions"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This is the second in a series of excerpts from <em>Elements of Data Science<\/em>, which is <a href=\"https:\/\/www.lulu.com\/shop\/allen-downey\/elements-of-data-science\/paperback\/product-9dyrwn.html\">available from Lulu.com<\/a> and online booksellers. It&#8217;s from Chapter 8, which is about representing distributions using PMFs and CDFs. This section explains why I think CDFs are often better for plotting and comparing distributions. You can read the complete chapter <a href=\"https:\/\/allendowney.github.io\/ElementsOfDataScience\/08_distributions.html\">here<\/a>, or run the Jupyter notebook on <a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/v1\/08_distributions.ipynb\">Colab<\/a>.<\/p>\n<\/blockquote>\n\n\n\n<p>So far we\u2019ve seen two ways to represent distributions, PMFs and CDFs. Now we\u2019ll use PMFs and CDFs to compare distributions, and we\u2019ll see the pros and cons of each. One way to compare distributions is to plot multiple PMFs on the same axes. For example, suppose we want to compare the distribution of age for male and female respondents. 
First we\u2019ll create a Boolean <code>Series<\/code> that\u2019s true for male respondents and another that\u2019s true for female respondents.<\/p>\n\n\n\n<pre id=\"codecell32\" class=\"wp-block-preformatted\">male = (gss['sex'] == 1)\nfemale = (gss['sex'] == 2)\n<\/pre>\n\n\n\n<p>We can use these <code>Series<\/code> to select ages for male and female respondents.<\/p>\n\n\n\n<pre id=\"codecell33\" class=\"wp-block-preformatted\">male_age = age[male]\nfemale_age = age[female]\n<\/pre>\n\n\n\n<p>And plot a PMF for each.<\/p>\n\n\n\n<pre id=\"codecell34\" class=\"wp-block-preformatted\">pmf_male_age = Pmf.from_seq(male_age)\npmf_male_age.plot(label='Male')\n\npmf_female_age = Pmf.from_seq(female_age)\npmf_female_age.plot(label='Female')\n\nplt.xlabel('Age (years)') \nplt.ylabel('PMF')\nplt.title('Distribution of age by sex')\nplt.legend();\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"414\" height=\"264\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\" alt=\"\" class=\"wp-image-1426\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png 414w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f-300x191.png 300w\" sizes=\"auto, (max-width: 414px) 100vw, 414px\" \/><\/figure>\n\n\n\n<p>A plot as variable as this is often described as <strong>noisy<\/strong>. If we ignore the noise, it looks like the PMF is higher for men between ages 40 and 50, and higher for women between ages 70 and 80. 
But both of those differences might be due to randomness.<\/p>\n\n\n\n<p>Now let\u2019s do the same thing with CDFs \u2013 everything is the same except we replace <code>Pmf<\/code> with <code>Cdf<\/code>.<\/p>\n\n\n\n<pre id=\"codecell35\" class=\"wp-block-preformatted\">cdf_male_age = Cdf.from_seq(male_age)\ncdf_male_age.plot(label='Male')\n\ncdf_female_age = Cdf.from_seq(female_age)\ncdf_female_age.plot(label='Female')\n\nplt.xlabel('Age (years)') \nplt.ylabel('CDF')\nplt.title('Distribution of age by sex')\nplt.legend();\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"401\" height=\"264\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/1361d5ca0371bdc4a2d705fa7b394b0a4a55e1256a410a903d20f89ba4a5b206.png\" alt=\"\" class=\"wp-image-1427\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/1361d5ca0371bdc4a2d705fa7b394b0a4a55e1256a410a903d20f89ba4a5b206.png 401w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/1361d5ca0371bdc4a2d705fa7b394b0a4a55e1256a410a903d20f89ba4a5b206-300x198.png 300w\" sizes=\"auto, (max-width: 401px) 100vw, 401px\" \/><\/figure>\n\n\n\n<p>Because CDFs smooth out randomness, they provide a better view of real differences between distributions. In this case, the lines are close together until age 40 \u2013 after that, the CDF is higher for men than women.<\/p>\n\n\n\n<p>So what does that mean? One way to interpret the difference is that the fraction of men below a given age is generally more than the fraction of women below the same age. For example, about 77% of men are 60 or less, compared to 75% of women.<\/p>\n\n\n\n<pre id=\"codecell36\" class=\"wp-block-preformatted\">cdf_male_age(60), cdf_female_age(60)\n<\/pre>\n\n\n\n<pre id=\"codecell37\" class=\"wp-block-preformatted\">(array(0.7721998), array(0.7474241))\n<\/pre>\n\n\n\n<p>Going the other way, we could also compare percentiles. 
For example, the median age of women is about one year higher than the median age of men.<\/p>\n\n\n\n<pre id=\"codecell38\" class=\"wp-block-preformatted\">cdf_male_age.inverse(0.5), cdf_female_age.inverse(0.5)\n<\/pre>\n\n\n\n<pre id=\"codecell39\" class=\"wp-block-preformatted\">(array(44.), array(45.))\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Comparing Incomes<\/h2>\n\n\n\n<p>As another example, let\u2019s look at household income and compare the distribution before and after 1995 (I chose 1995 because it\u2019s roughly the midpoint of the survey). We\u2019ll make two Boolean <code>Series<\/code> objects, one that selects respondents interviewed before 1995 and one that selects respondents interviewed in 1995 or later.<\/p>\n\n\n\n<pre id=\"codecell40\" class=\"wp-block-preformatted\">pre95 = (gss['year'] &lt; 1995)\npost95 = (gss['year'] &gt;= 1995)\n<\/pre>\n\n\n\n<p>Now we can plot the PMFs of <code>realinc<\/code>, which records household income converted to 1986 dollars.<\/p>\n\n\n\n<pre id=\"codecell41\" class=\"wp-block-preformatted\">realinc = gss['realinc']\n\nPmf.from_seq(realinc[pre95]).plot(label='Before 1995')\nPmf.from_seq(realinc[post95]).plot(label='After 1995')\n\nplt.xlabel('Income (1986 USD)')\nplt.ylabel('PMF')\nplt.title('Distribution of income')\nplt.legend();\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"421\" height=\"264\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/9bd2d1e3c22973960561c56ada44da714902c900e0077a399697b205f338e765.png\" alt=\"\" class=\"wp-image-1428\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/9bd2d1e3c22973960561c56ada44da714902c900e0077a399697b205f338e765.png 421w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/9bd2d1e3c22973960561c56ada44da714902c900e0077a399697b205f338e765-300x188.png 300w\" sizes=\"auto, (max-width: 421px) 100vw, 421px\" \/><\/figure>\n\n\n\n<p>There are a lot of unique values in this distribution, and none of 
them appear very often. As a result, the PMF is so noisy that we can\u2019t really see the shape of the distribution. It\u2019s also hard to compare the distributions. It looks like there are more people with high incomes after 1995, but it\u2019s hard to tell. We can get a clearer picture with a CDF.<\/p>\n\n\n\n<pre id=\"codecell42\" class=\"wp-block-preformatted\">Cdf.from_seq(realinc[pre95]).plot(label='Before 1995')\nCdf.from_seq(realinc[post95]).plot(label='After 1995')\n\nplt.xlabel('Income (1986 USD)')\nplt.ylabel('CDF')\nplt.title('Distribution of income')\nplt.legend();\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"401\" height=\"264\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/5c3184ea9ce15063868ee31f65fec22c0a7ac3dd86c19a9070b2d4c0653b6d1f.png\" alt=\"\" class=\"wp-image-1429\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/5c3184ea9ce15063868ee31f65fec22c0a7ac3dd86c19a9070b2d4c0653b6d1f.png 401w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/5c3184ea9ce15063868ee31f65fec22c0a7ac3dd86c19a9070b2d4c0653b6d1f-300x198.png 300w\" sizes=\"auto, (max-width: 401px) 100vw, 401px\" \/><\/figure>\n\n\n\n<p>Below $30,000 the CDFs are almost identical; above that, we can see that the post-1995 distribution is shifted to the right. In other words, the fraction of people with high incomes is about the same, but the income of high earners has increased.<\/p>\n\n\n\n<p>In general, I recommend CDFs for exploratory analysis. They give you a clear view of the distribution, without too much noise, and they are good for comparing distributions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the second in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It&#8217;s from Chapter 8, which is about representing distributions using PMFs and CDFs. 
This section explains why I think CDFs are often better for plotting and comparing distributions. You can read the complete chapter here, or run the Jupyter notebook on Colab. So far we\u2019ve seen two ways to represent distributions, PMFs and CDFs. Now we\u2019ll use PMFs and CDFs&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-1425","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Comparing Distributions - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Comparing Distributions - Probably Overthinking It\" \/>\n<meta property=\"og:description\" 
content=\"This is the second in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It&#8217;s from Chapter 8, which is about representing distributions using PMFs and CDFs. This section explains why I think CDFs are often better for plotting and comparing distributions. You can read the complete chapter here, or run the Jupyter notebook on Colab. So far we\u2019ve seen two ways to represent distributions, PMFs and CDFs. Now we\u2019ll use PMFs and CDFs... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-17T15:24:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"Comparing Distributions\",\"datePublished\":\"2024-11-17T15:24:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\"},\"wordCount\":531,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\",\"name\":\"Comparing Distributions - Probably Overthinking 
It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\",\"datePublished\":\"2024-11-17T15:24:26+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png\",\"width\":414,\"height\":264},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Comparing Distributions\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other 
ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Comparing Distributions - Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","og_locale":"en_US","og_type":"article","og_title":"Comparing Distributions - Probably Overthinking It","og_description":"This is the second in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It&#8217;s from Chapter 8, which is about representing distributions using PMFs and CDFs. This section explains why I think CDFs are often better for plotting and comparing distributions. You can read the complete chapter here, or run the Jupyter notebook on Colab. So far we\u2019ve seen two ways to represent distributions, PMFs and CDFs. Now we\u2019ll use PMFs and CDFs... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","og_site_name":"Probably Overthinking It","article_published_time":"2024-11-17T15:24:26+00:00","og_image":[{"url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png","type":"","width":"","height":""}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"Comparing Distributions","datePublished":"2024-11-17T15:24:26+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/"},"wordCount":531,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","name":"Comparing Distributions - Probably Overthinking 
It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png","datePublished":"2024-11-17T15:24:26+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#primaryimage","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/09005abc9b2fe1bc9adfb183ae2de0515a2461f79699d3985f53a43934bf1b3f.png","width":414,"height":264},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Comparing Distributions"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":279,"url":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","url_meta":{"origin":1425,"position":
0},"title":"Watch your tail!","author":"AllenDowney","date":"August 13, 2019","format":false,"excerpt":"For a long time I have recommended using CDFs to compare distributions. If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model. Now I want to amend my advice. CDFs give you a good\u2026","rel":"","context":"In \"CDF\"","block_context":{"text":"CDF","link":"https:\/\/www.allendowney.com\/blog\/tag\/cdf\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-16.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":703,"url":"https:\/\/www.allendowney.com\/blog\/2022\/05\/02\/how-gaussian-is-it\/","url_meta":{"origin":1425,"position":1},"title":"How Gaussian Is It?","author":"AllenDowney","date":"May 2, 2022","format":false,"excerpt":"This article is an excerpt from the current draft of my book Probably Overthinking It, to be published by the University of Chicago Press in early 2023. If you would like to receive infrequent notifications about the book (and possibly a discount), please sign up for this mailing list.This book\u2026","rel":"","context":"In \"Gaussian\"","block_context":{"text":"Gaussian","link":"https:\/\/www.allendowney.com\/blog\/tag\/gaussian\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/04\/ansur_30_0.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":996,"url":"https:\/\/www.allendowney.com\/blog\/2023\/08\/20\/how-correlated-are-you\/","url_meta":{"origin":1425,"position":2},"title":"How Correlated Are You?","author":"AllenDowney","date":"August 20, 2023","format":false,"excerpt":"This post is an offshoot from Chapter 1 of Probably Overthinking It, which is available for pre-order now! Suppose you measure the arm and leg lengths of 4082 people. 
You would expect those measurements to be correlated, and you would be right. In the ANSUR-II dataset, among male members of\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1009,"url":"https:\/\/www.allendowney.com\/blog\/2023\/08\/27\/taming-black-swans\/","url_meta":{"origin":1425,"position":3},"title":"Taming Black Swans","author":"AllenDowney","date":"August 27, 2023","format":false,"excerpt":"At SciPy 2023 I presented a talk called \"Taming Black Swans: Long-tailed distributions in the natural and engineered world\". Here's the abstract: Long-tailed distributions are common in natural and engineered systems; as a result, we encounter extreme values more often than we would expect from a short-tailed distribution. 
If we\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/img.youtube.com\/vi\/-rE3DfeZ_jE\/0.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":956,"url":"https:\/\/www.allendowney.com\/blog\/2023\/06\/10\/abstracts-and-keywords\/","url_meta":{"origin":1425,"position":4},"title":"Abstracts and keywords","author":"AllenDowney","date":"June 10, 2023","format":false,"excerpt":"As Probably Overthinking It approaches the finish line, there are just a few more tasks: I am working on the index and -- as I have recently learned -- I also have to write a 200-word abstract, a list of keywords for each chapter, and a 250-word abstract for the\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":721,"url":"https:\/\/www.allendowney.com\/blog\/2022\/05\/09\/name-that-distribution\/","url_meta":{"origin":1425,"position":5},"title":"The Student-t model of Long-Tailed Distributions","author":"AllenDowney","date":"May 9, 2022","format":false,"excerpt":"As I've mentioned, I'm working on a book called Probably Overthinking It, to be published in early 2023. It's intended for a general audience, so I'm not trying to do research, but I might have found something novel while working on a chapter about power law distributions. 
If you are\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=1425"}],"version-history":[{"count":1,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1425\/revisions"}],"predecessor-version":[{"id":1430,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1425\/revisions\/1430"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=1425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=1425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=1425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}