{"id":1431,"date":"2024-11-19T16:04:41","date_gmt":"2024-11-19T16:04:41","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=1431"},"modified":"2024-11-20T18:13:44","modified_gmt":"2024-11-20T18:13:44","slug":"whats-a-chartist","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/","title":{"rendered":"What&#8217;s a Chartist?"},"content":{"rendered":"\n<p>Recently I heard the word \u201cchartist\u201d for the first time in my life (that I recall). And then later the same day, I heard it again. So that raises two questions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What are the chances of going 57 years without hearing a word, and then hearing it twice in one day?<\/li>\n\n\n\n<li>Also, what\u2019s a chartist?<\/li>\n<\/ul>\n\n\n\n<p>To answer the second question first, it\u2019s someone who supported chartism, which was \u201ca working-class movement for political reform in the United Kingdom that erupted from 1838 to 1857\u201d, quoth <a href=\"https:\/\/en.wikipedia.org\/wiki\/Chartism\">Wikipedia<\/a>. The name comes from the People\u2019s Charter of 1838, which called for voting rights for unpropertied men, among other reforms.<\/p>\n\n\n\n<p>To answer the first question, we\u2019ll do some Bayesian statistics. My solution is based on a model that\u2019s not very realistic, so we should not take the result too seriously, but it demonstrates some interesting methods, I think. 
And as you\u2019ll see, there is a connection to Zipf\u2019s law, <a href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\">which I wrote about last week<\/a>.<\/p>\n\n\n\n<p>Since last week\u2019s post was at the beginner level, I should warn you that this one is more advanced \u2013 in rapid succession, it involves the beta distribution, the <em>t<\/em> distribution, the negative binomial, and the binomial.<\/p>\n\n\n\n<p>This post is based on <em>Think Bayes 2e<\/em>, which is available from <a href=\"https:\/\/bookshop.org\/a\/98697\/9781492089469\">Bookshop.org<\/a> and <a href=\"https:\/\/amzn.to\/334eqGo\">Amazon<\/a>.<\/p>\n\n\n\n<p><a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ThinkBayes2\/blob\/master\/examples\/zipf.ipynb\">Click here to run this notebook on Colab<\/a>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Word Frequencies<\/h2>\n\n\n\n<p>If you don\u2019t hear a word for more than 50 years, that suggests it is not a common word. We can use Bayes\u2019s theorem to quantify this intuition. First we\u2019ll compute the posterior distribution of the word\u2019s frequency, then the posterior predictive distribution of hearing it again within a day.<\/p>\n\n\n\n<p>Because we have only one piece of data \u2013 the time until first appearance \u2013 we\u2019ll need a good prior distribution. Which means we\u2019ll need a large, good quality sample of English text. For that, I\u2019ll use a free sample of the COCA dataset from <a href=\"https:\/\/www.corpusdata.org\/formats.asp\">CorpusData.org<\/a>. 
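<\/p>\n\n\n\n<p>The <code>download<\/code> function used below is a helper from the book\u2019s <code>utils<\/code> module, not part of the standard library. If you are running the code on its own, a minimal stand-in \u2013 my sketch, not the book\u2019s implementation \u2013 needs only the standard library:<\/p>

```python
# Minimal stand-in for the book's download() helper (an assumption:
# the real helper lives in the ThinkBayes2 utils module).
from pathlib import Path
from urllib.request import urlretrieve

def download(url):
    filename = Path(url).name          # keep only the basename
    if not Path(filename).exists():    # skip if already downloaded
        urlretrieve(url, filename)
    return filename
```

<p>It skips the fetch when the file is already present, so re-running the notebook does not download the corpus sample again. <\/p>\n\n\n\n<p>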
The following cells download and read the data.<\/p>\n\n\n\n<pre id=\"codecell3\" class=\"wp-block-preformatted\">download(\"https:\/\/www.corpusdata.org\/coca\/samples\/coca-samples-text.zip\")\n<\/pre>\n\n\n\n<pre id=\"codecell4\" class=\"wp-block-preformatted\">import zipfile\n\n\ndef generate_lines(zip_path=\"coca-samples-text.zip\"):\n    with zipfile.ZipFile(zip_path, \"r\") as zip_file:\n        file_list = zip_file.namelist()\n        for file_name in file_list:\n            with zip_file.open(file_name) as file:\n                lines = file.readlines()\n                for line in lines:\n                    yield (line.decode(\"utf-8\"))\n<\/pre>\n\n\n\n<p>We\u2019ll use a <code>Counter<\/code> to count the number of times each word appears.<\/p>\n\n\n\n<pre id=\"codecell5\" class=\"wp-block-preformatted\">import re\nfrom collections import Counter\n\npattern = r\"[ \/\\n]+|--\"\n\ncounter = Counter()\nfor line in generate_lines():\n    words = re.split(pattern, line)[1:]\n    counter.update(word.lower() for word in words if word)\n<\/pre>\n\n\n\n<p>The dataset includes about 188,000 unique strings, but not all of them are what we would consider words.<\/p>\n\n\n\n<pre id=\"codecell6\" class=\"wp-block-preformatted\">len(counter), counter.total()\n<\/pre>\n\n\n\n<pre id=\"codecell7\" class=\"wp-block-preformatted\">(188086, 11503819)\n<\/pre>\n\n\n\n<p>To narrow it down, I\u2019ll remove anything that starts or ends with a non-alphabetical character \u2013 so hyphens and apostrophes are allowed in the middle of a word.<\/p>\n\n\n\n<pre id=\"codecell8\" class=\"wp-block-preformatted\">for s in list(counter.keys()):\n    if not s[0].isalpha() or not s[-1].isalpha():\n        del counter[s]\n<\/pre>\n\n\n\n<p>This filter reduces the number of unique words to about 151,000.<\/p>\n\n\n\n<pre id=\"codecell9\" class=\"wp-block-preformatted\">num_words = counter.total()\nlen(counter), num_words\n<\/pre>\n\n\n\n<pre id=\"codecell10\" 
class=\"wp-block-preformatted\">(151414, 8889694)\n<\/pre>\n\n\n\n<p>Of the 50 most common words, all of them have one syllable except number 38. Before you look at the list, can you guess the most common two-syllable word? Here\u2019s a theory about <a href=\"https:\/\/news.mit.edu\/2011\/words-count-0210\">why common words are short<\/a>.<\/p>\n\n\n\n<pre id=\"codecell11\" class=\"wp-block-preformatted\">for i, (word, freq) in enumerate(counter.most_common(50)):\n    print(f'{i+1}\\t{word}\\t{freq}')\n<\/pre>\n\n\n\n<pre id=\"codecell12\" class=\"wp-block-preformatted\">1\tthe\t461991\n2\tto\t237929\n3\tand\t231459\n4\tof\t217363\n5\ta\t203302\n6\tin\t153323\n7\ti\t137931\n8\tthat\t123818\n9\tyou\t109635\n10\tit\t103712\n11\tis\t93996\n12\tfor\t78755\n13\ton\t64869\n14\twas\t64388\n15\twith\t59724\n16\the\t57684\n17\tthis\t51879\n18\tas\t51202\n19\tn't\t49291\n20\twe\t47694\n21\tare\t47192\n22\thave\t46963\n23\tbe\t46563\n24\tnot\t43872\n25\tbut\t42434\n26\tthey\t42411\n27\tat\t42017\n28\tdo\t41568\n29\twhat\t35637\n30\tfrom\t34557\n31\this\t33578\n32\tby\t32583\n33\tor\t32146\n34\tshe\t29945\n35\tall\t29391\n36\tmy\t29390\n37\tan\t28580\n38\tabout\t27804\n39\tthere\t27291\n40\tso\t27081\n41\ther\t26363\n42\tone\t26022\n43\thad\t25656\n44\tif\t25373\n45\tyour\t24641\n46\tme\t24551\n47\twho\t23500\n48\tcan\t23311\n49\ttheir\t23221\n50\tout\t22902\n<\/pre>\n\n\n\n<p>There are about 72,000 words that only appear once in the corpus, technically known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hapax_legomenon\">hapax legomena<\/a>.<\/p>\n\n\n\n<pre id=\"codecell13\" class=\"wp-block-preformatted\">singletons = [word for (word, freq) in counter.items() if freq == 1]\nlen(singletons), len(singletons) \/ counter.total() * 100\n<\/pre>\n\n\n\n<pre id=\"codecell14\" class=\"wp-block-preformatted\">(72159, 0.811715228893143)\n<\/pre>\n\n\n\n<p>Here\u2019s a random selection of them. 
Many are proper names, typos, or other non-words, but some are legitimate but rare words.<\/p>\n\n\n\n<pre id=\"codecell15\" class=\"wp-block-preformatted\">np.random.choice(singletons, 100)\n<\/pre>\n\n\n\n<pre id=\"codecell16\" class=\"wp-block-preformatted\">array(['laneer', 'emc', 'literature-like', 'tomyworld', 'roald',\n       'unreleased', 'basemen', 'kielhau', 'clobber', 'feydeau',\n       'symptomless', 'channelmaster', 'v-i', 'tipsha', 'mjlkdroppen',\n       'harlots', 'phaetons', 'grlinger', 'naniwa', 'dadian',\n       'banafionen', 'ceramaseal', 'vine-covered', 'terrafirmahome.com',\n       'hesten', 'undertheorized', 'fantastycznie', 'kaido', 'noughts',\n       'hannelie', 'cacoa', 'subelement', 'mestothelioma', 'gut-level',\n       'abis', 'potterville', 'quarter-to-quarter', 'lokkii', 'telemed',\n       'whitewood', 'dualmode', 'plebiscites', 'loubrutton',\n       'off-loading', 'abbot-t-t', 'whackaloons', 'tuinal', 'guyi',\n       'samanthalaughs', 'editor-sponsored', 'neurosciences', 'lunched',\n       'chicken-and-brisket', 'korekane', 'ruby-colored',\n       'double-elimination', 'cornhusker', 'wjounds', 'mendy', 'red.ooh',\n       'delighters', 'tuviera', 'spot-lit', 'tuskarr', 'easy-many',\n       'timepoint', 'mouthfuls', 'catchy-titled', 'b.l', 'four-ply',\n       \"sa'ud\", 'millenarianism', 'gelder', 'cinnam',\n       'documentary-filmmaking', 'huviesen', 'by-gone', 'boy-friend',\n       'heartlight', 'farecompare.com', 'nurya', 'overstaying',\n       'johnny-turn', 'rashness', 'mestier', 'trivedi', 'koshanska',\n       'tremulousness', 'movies-another', 'womenfolks', 'bawdy',\n       'all-her-life', 'lakhani', 'screeeeaming', 'marketings', 'girthy',\n       'non-discriminatory', 'chumpy', 'resque', 'lysing'], dtype='&lt;U24')\n<\/pre>\n\n\n\n<p>Now let\u2019s see what the distribution of word frequencies looks like.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Zipf\u2019s Law<\/h2>\n\n\n\n<p>One way to visualize the distribution is a Zipf 
plot, which shows the ranks on the x-axis and the frequencies on the y-axis.<\/p>\n\n\n\n<pre id=\"codecell17\" class=\"wp-block-preformatted\">freqs = np.array(sorted(counter.values(), reverse=True))\n<\/pre>\n\n\n\n<pre id=\"codecell18\" class=\"wp-block-preformatted\">n = len(freqs)\nranks = range(1, n + 1)\n<\/pre>\n\n\n\n<p>Here\u2019s what it looks like on a log-log scale.<\/p>\n\n\n\n<pre id=\"codecell19\" class=\"wp-block-preformatted\">plt.plot(ranks, freqs)\n\ndecorate(\n    title=\"Zipf plot\", xlabel=\"Rank\", ylabel=\"Frequency\", xscale=\"log\", yscale=\"log\"\n)\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png\" alt=\"_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png\"\/><\/figure>\n\n\n\n<p>Zipf\u2019s law suggests that the result should be a straight line with slope close to -1. It\u2019s not exactly a straight line, but it\u2019s close, and the slope is about -1.1.<\/p>\n\n\n\n<pre id=\"codecell20\" class=\"wp-block-preformatted\">rise = np.log10(freqs[-1]) - np.log10(freqs[0])\nrise\n<\/pre>\n\n\n\n<pre id=\"codecell21\" class=\"wp-block-preformatted\">-5.664633515191604\n<\/pre>\n\n\n\n<pre id=\"codecell22\" class=\"wp-block-preformatted\">run = np.log10(ranks[-1]) - np.log10(ranks[0])\nrun\n<\/pre>\n\n\n\n<pre id=\"codecell23\" class=\"wp-block-preformatted\">5.180166032638616\n<\/pre>\n\n\n\n<pre id=\"codecell24\" class=\"wp-block-preformatted\">rise \/ run\n<\/pre>\n\n\n\n<pre id=\"codecell25\" class=\"wp-block-preformatted\">-1.0935235433575892\n<\/pre>\n\n\n\n<p>The Zipf plot is a well-known visual representation of the distribution of frequencies, but for the current problem, we\u2019ll switch to a different representation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tail Distribution<\/h2>\n\n\n\n<p>Given the number of times each word appears in the corpus, we 
can compute the rates, which are the number of times we expect each word to appear in a sample of a given size, and the inverse rates, which are the number of words we need to see before we expect a given word to appear.<\/p>\n\n\n\n<p>We will find it most convenient to work with the distribution of inverse rates on a log scale. The first step is to use the observed frequencies to estimate word rates \u2013 we\u2019ll estimate the rate at which each word would appear in a random sample.<\/p>\n\n\n\n<p>We\u2019ll do that by creating a beta distribution that represents the posterior distribution of word rates, given the observed frequencies (see <a href=\"https:\/\/allendowney.github.io\/ThinkBayes2\/chap18.html#the-conjugate-prior\">this section of <em>Think Bayes<\/em><\/a>) \u2013 and then drawing a random sample from the posterior. So words that have the same frequency will not generally have the same inferred rate.<\/p>\n\n\n\n<pre id=\"codecell26\" class=\"wp-block-preformatted\">from scipy.stats import beta\n\nnp.random.seed(17)\nalphas = freqs + 1\nbetas = num_words - freqs + 1\ninferred_rates = beta(alphas, betas).rvs()\n<\/pre>\n\n\n\n<p>Now we can compute the inverse rates, which are the number of words we have to sample before we expect to see each word once.<\/p>\n\n\n\n<pre id=\"codecell27\" class=\"wp-block-preformatted\">inverse_rates = 1 \/ inferred_rates\n<\/pre>\n\n\n\n<p>And here are their magnitudes, expressed as logarithms base 10.<\/p>\n\n\n\n<pre id=\"codecell28\" class=\"wp-block-preformatted\">mags = np.log10(inverse_rates)\n<\/pre>\n\n\n\n<p>To represent the distribution of these magnitudes, we\u2019ll use a <code>Surv<\/code> object, which represents survival functions, but we\u2019ll use a variation of the survival function, which is the probability that a randomly chosen value is greater than or equal to a given quantity. 
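<\/p>\n\n\n\n<p>To make the definition concrete, here is the same tail probability spelled out in plain NumPy \u2013 a sketch for illustration only; the rest of the post uses the <code>Surv<\/code> object. The only difference from the standard survival function is <code>&gt;=<\/code> versus <code>&gt;<\/code>:<\/p>

```python
import numpy as np

def tail_prob(sample, x):
    # P(X >= x): the fraction of the sample at or above x
    return np.mean(np.asarray(sample) >= x)

sample = [1, 2, 2, 3]
print(tail_prob(sample, 2))  # 0.75, whereas the standard P(X > 2) is 0.25
print(tail_prob(sample, 1))  # 1.0; the smallest value has tail probability 1
```

<p>With this definition the smallest observed value gets probability 1, which is why the function below pins <code>surv.iloc[0] = 1<\/code>. <\/p>\n\n\n\n<p>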
The following function computes this version of a survival function, which is called a tail probability.<\/p>\n\n\n\n<pre id=\"codecell29\" class=\"wp-block-preformatted\">from empiricaldist import Pmf, Surv\n\n\ndef make_surv(seq):\n    \"\"\"Make a non-standard survival function, P(X&gt;=x)\"\"\"\n    pmf = Pmf.from_seq(seq)\n    surv = pmf.make_surv() + pmf\n\n    # correct for numerical error\n    surv.iloc[0] = 1\n    return Surv(surv)\n<\/pre>\n\n\n\n<p>Here\u2019s how we make the survival function.<\/p>\n\n\n\n<pre id=\"codecell30\" class=\"wp-block-preformatted\">surv = make_surv(mags)\n<\/pre>\n\n\n\n<p>And here\u2019s what it looks like.<\/p>\n\n\n\n<pre id=\"codecell31\" class=\"wp-block-preformatted\">options = dict(marker=\".\", ms=2, lw=0.5, label=\"data\")\nsurv.plot(**options)\ndecorate(xlabel=\"Inverse rate (log10 words per appearance)\", ylabel=\"Tail probability\")\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/a85267e340ee9e92988ee9ce9ec80c2edcf7c526f04c09a28b53b4ca43f0fa1a.png\" alt=\"_images\/a85267e340ee9e92988ee9ce9ec80c2edcf7c526f04c09a28b53b4ca43f0fa1a.png\"\/><\/figure>\n\n\n\n<p>The tail distribution has the sigmoid shape that is characteristic of normal distributions and <em>t<\/em> distributions, although it is notably asymmetric.<\/p>\n\n\n\n<p>And here\u2019s what the tail probabilities look like on a log-y scale.<\/p>\n\n\n\n<pre id=\"codecell32\" class=\"wp-block-preformatted\">surv.plot(**options)\ndecorate(xlabel=\"Inverse rate (log10 words per appearance)\", yscale=\"log\")\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/ad17e7857447f99903d3a718f91bcaae09fdcf067df4d4f48f5127d3e8151c5d.png\" alt=\"_images\/ad17e7857447f99903d3a718f91bcaae09fdcf067df4d4f48f5127d3e8151c5d.png\"\/><\/figure>\n\n\n\n<p>If this distribution were normal, we would expect this curve to drop off 
with increasing slope. But for the words with the lowest frequencies \u2013 that is, the highest inverse rates \u2013 it is almost a straight line. And that suggests that a <em>t<\/em> distribution might be a good model for this data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Fitting a Model<\/h2>\n\n\n\n<p>To estimate the frequency of rare words, we will need to model the tail behavior of this distribution and extrapolate it beyond the data. So let\u2019s fit a <em>t<\/em> distribution and see how it looks. I\u2019ll use code from <a href=\"https:\/\/allendowney.github.io\/ProbablyOverthinkingIt\/longtail.html\">Chapter 8 of <em>Probably Overthinking It<\/em><\/a>, which is all about these long-tailed distributions.<\/p>\n\n\n\n<p>The following function makes a <code>Surv<\/code> object that represents a <em>t<\/em> distribution with the given parameters.<\/p>\n\n\n\n<pre id=\"codecell33\" class=\"wp-block-preformatted\">from scipy.stats import t as t_dist\n\n\ndef truncated_t_sf(qs, df, mu, sigma):\n    \"\"\"Makes Surv object for a t distribution.\n    \n    Truncated on the left, assuming all values are greater than min(qs)\n    \"\"\"\n    ps = t_dist.sf(qs, df, mu, sigma)\n    surv_model = Surv(ps \/ ps[0], qs)\n    return surv_model\n<\/pre>\n\n\n\n<p>If we are given the <code>df<\/code> parameter, we can use the following function to find the values of <code>mu<\/code> and <code>sigma<\/code> that best fit the data, focusing on the central part of the distribution.<\/p>\n\n\n\n<pre id=\"codecell34\" class=\"wp-block-preformatted\">from scipy.optimize import least_squares\n\n\ndef fit_truncated_t(df, surv):\n    \"\"\"Given df, find the best values of mu and sigma.\"\"\"\n    low, high = surv.qs.min(), surv.qs.max()\n    qs_model = np.linspace(low, high, 2000)\n    ps = np.linspace(0.1, 0.8, 20)\n    qs = surv.inverse(ps)\n\n    def error_func_t(params, df, surv):\n        mu, sigma = params\n        surv_model = truncated_t_sf(qs_model, df, mu, 
sigma)\n\n        error = surv(qs) - surv_model(qs)\n        return error\n\n    pmf = surv.make_pmf()\n    pmf.normalize()\n    params = pmf.mean(), pmf.std()\n    res = least_squares(error_func_t, x0=params, args=(df, surv), xtol=1e-3)\n    assert res.success\n    return res.x\n<\/pre>\n\n\n\n<p>But since we are not given <code>df<\/code>, we can use the following function to search for the value that best fits the tail of the distribution.<\/p>\n\n\n\n<pre id=\"codecell35\" class=\"wp-block-preformatted\">from scipy.optimize import minimize\n\n\ndef minimize_df(df0, surv, bounds=[(1, 1e3)], ps=None):\n    low, high = surv.qs.min(), surv.qs.max()\n    qs_model = np.linspace(low, high * 1.2, 2000)\n\n    if ps is None:\n        t = surv.ps[0], surv.ps[-5]\n        low, high = np.log10(t)\n        ps = np.logspace(low, high, 30, endpoint=False)\n\n    qs = surv.inverse(ps)\n\n    def error_func_tail(params):\n        (df,) = params\n        mu, sigma = fit_truncated_t(df, surv)\n        surv_model = truncated_t_sf(qs_model, df, mu, sigma)\n\n        errors = np.log10(surv(qs)) - np.log10(surv_model(qs))\n        return np.sum(errors**2)\n\n    params = (df0,)\n    res = minimize(error_func_tail, x0=params, bounds=bounds, method=\"Powell\")\n    assert res.success\n    return res.x\n<\/pre>\n\n\n\n<pre id=\"codecell36\" class=\"wp-block-preformatted\">df = minimize_df(25, surv)\ndf\n<\/pre>\n\n\n\n<pre id=\"codecell37\" class=\"wp-block-preformatted\">array([22.52401171])\n<\/pre>\n\n\n\n<pre id=\"codecell38\" class=\"wp-block-preformatted\">mu, sigma = fit_truncated_t(df, surv)\ndf, mu, sigma\n<\/pre>\n\n\n\n<pre id=\"codecell39\" class=\"wp-block-preformatted\">(array([22.52401171]), 6.433323515095857, 0.49070837962997577)\n<\/pre>\n\n\n\n<p>Here\u2019s the <code>t<\/code> distribution that best fits the data.<\/p>\n\n\n\n<pre id=\"codecell40\" class=\"wp-block-preformatted\">low, high = 
surv.qs.min(), surv.qs.max()\nqs = np.linspace(low, 10, 2000)\nsurv_model = truncated_t_sf(qs, df, mu, sigma)\n<\/pre>\n\n\n\n<pre id=\"codecell41\" class=\"wp-block-preformatted\">surv_model.plot(color=\"gray\", alpha=0.4, label=\"model\")\nsurv.plot(**options)\ndecorate(xlabel=\"Inverse rate (log10 words per appearance)\", ylabel=\"Tail probability\")\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/6a78db0b7207d7492f70ad6ec717b9441d528c6bae9ad44e8e5673a7982f4777.png\" alt=\"_images\/6a78db0b7207d7492f70ad6ec717b9441d528c6bae9ad44e8e5673a7982f4777.png\"\/><\/figure>\n\n\n\n<p>With the y-axis on a linear scale, we can see that the model fits the data reasonably well, except for a range between 5 and 6 \u2013 that is, for words that appear about once per million words.<\/p>\n\n\n\n<p>Here\u2019s what the model looks like on a log-y scale.<\/p>\n\n\n\n<pre id=\"codecell42\" class=\"wp-block-preformatted\">surv_model.plot(color=\"gray\", alpha=0.4, label=\"model\")\nsurv.plot(**options)\ndecorate(\n    xlabel=\"Inverse rate (log10 words per appearance)\",\n    ylabel=\"Tail probability\",\n    yscale=\"log\",\n)\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/8486cbced76150d86f5639eb3320a68b34fd31325bd8d20f9caf3d994dd0c669.png\" alt=\"_images\/8486cbced76150d86f5639eb3320a68b34fd31325bd8d20f9caf3d994dd0c669.png\"\/><\/figure>\n\n\n\n<p>The model fits the data well in the extreme tail, which is exactly where we need it. 
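<\/p>\n\n\n\n<p>The reason a <em>t<\/em> distribution can follow that nearly straight line is that its tail decays like a power law, roughly <code>x**(-df)<\/code>, while a normal tail decays faster than any power. Here is a quick standalone check with SciPy \u2013 <code>df=3<\/code> is chosen only to make the effect easy to see, not the fitted value:<\/p>

```python
from scipy.stats import norm, t as t_dist

# In the far tail, the t survival function behaves like C * x**(-df),
# so doubling x scales the tail probability by roughly 2**(-df).
df = 3
ratio_t = t_dist.sf(20, df) / t_dist.sf(10, df)
print(ratio_t)  # close to 2**-3 = 0.125

# The normal tail has no such scaling; the same doubling wipes it out.
ratio_norm = norm.sf(20.0) / norm.sf(10.0)
print(ratio_norm)
```

<p>That power-law tail is what keeps the fitted curve nearly straight on the log-y plot, so it can be extrapolated past the observed inverse rates. <\/p>\n\n\n\n<p>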
And we can use the model to extrapolate a little beyond the data, to make sure we cover the range that will turn out to be likely in the scenario where we hear a word for the first time after 50 years.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Update<\/h2>\n\n\n\n<p>The model we\u2019ve developed is the distribution of inverse rates for the words that appear in the corpus and, by extrapolation, for additional rare words that didn\u2019t appear in the corpus. This distribution will be the prior for the Bayesian update. We just have to convert it from a survival function to a PMF (remembering that these are equivalent representations of the same distribution).<\/p>\n\n\n\n<pre id=\"codecell43\" class=\"wp-block-preformatted\">prior = surv_model.make_pmf()\nprior.plot(label=\"prior\")\ndecorate(\n    xlabel=\"Inverse rate (log10 words per appearance)\",\n    ylabel=\"Density\",\n)\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/baed114476768c8511a77423e7217807b462ce1d867fe098a75626df9ed59042.png\" alt=\"_images\/baed114476768c8511a77423e7217807b462ce1d867fe098a75626df9ed59042.png\"\/><\/figure>\n\n\n\n<p>To compute the likelihood of the observation, we have to transform the inverse rates to probabilities.<\/p>\n\n\n\n<pre id=\"codecell44\" class=\"wp-block-preformatted\">ps = 1 \/ np.power(10, prior.qs)\n<\/pre>\n\n\n\n<p>Now suppose that in a given day, you read or hear 10,000 words in a context where you would notice if you heard a word for the first time. 
Here\u2019s the number of words you would hear in 50 years.<\/p>\n\n\n\n<pre id=\"codecell45\" class=\"wp-block-preformatted\">words_per_day = 10_000\ndays = 50 * 365\nk = days * words_per_day\nk\n<\/pre>\n\n\n\n<pre id=\"codecell46\" class=\"wp-block-preformatted\">182500000\n<\/pre>\n\n\n\n<p>Now, what\u2019s the probability that you fail to encounter a word in <code>k<\/code> attempts and then encounter it on the next attempt? We can answer that with the negative binomial distribution, which computes the probability of getting the <code>n<\/code>th success after <code>k<\/code> failures, for a given probability \u2013 or in this case, for a sequence of possible probabilities.<\/p>\n\n\n\n<pre id=\"codecell47\" class=\"wp-block-preformatted\">from scipy.stats import nbinom\n\nn = 1\nlikelihood = nbinom.pmf(k, n, ps)\n<\/pre>\n\n\n\n<p>With this likelihood and the prior, we can compute the posterior distribution in the usual way.<\/p>\n\n\n\n<pre id=\"codecell48\" class=\"wp-block-preformatted\">posterior = prior * likelihood\nposterior.normalize()\n<\/pre>\n\n\n\n<pre id=\"codecell49\" class=\"wp-block-preformatted\">1.368245917258196e-11\n<\/pre>\n\n\n\n<p>And here\u2019s what it looks like.<\/p>\n\n\n\n<pre id=\"codecell50\" class=\"wp-block-preformatted\">prior.plot(alpha=0.5, label=\"prior\")\nposterior.plot(label=\"posterior\")\ndecorate(\n    xlabel=\"Inverse rate (log10 words per appearance)\",\n    ylabel=\"Density\",\n)\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/9ee20cb48e5268c6716d40b69fd8eede6f4a244ae418222ecfaaa83a632dd371.png\" alt=\"_images\/9ee20cb48e5268c6716d40b69fd8eede6f4a244ae418222ecfaaa83a632dd371.png\"\/><\/figure>\n\n\n\n<p>If you go 50 years without hearing a word, that suggests that it is a rare word, and the posterior distribution reflects that logic.<\/p>\n\n\n\n<p>The posterior distribution represents a range of possible values for the inverse 
rate of the word you heard. Now we can use it to answer the question we started with: what is the probability of hearing the same word again on the same day \u2013 that is, within the next 10,000 words you hear?<\/p>\n\n\n\n<p>To answer that, we can use the survival function of the <a href=\"https:\/\/allendowney.github.io\/ThinkBayes2\/chap18.html?highlight=binomial#binomial-likelihood\">binomial distribution<\/a> to compute the probability of more than 0 successes in the next <code>n_pred<\/code> attempts. We\u2019ll compute this probability for each of the <code>ps<\/code> that correspond to the inverse rates in the posterior.<\/p>\n\n\n\n<pre id=\"codecell51\" class=\"wp-block-preformatted\">from scipy.stats import binom\n\nn_pred = words_per_day\nps_pred = binom.sf(0, n_pred, ps)\n<\/pre>\n\n\n\n<p>And we can use the probabilities in the posterior to compute the expected value \u2013 by the law of total probability, the result is the probability of hearing the same word again within a day.<\/p>\n\n\n\n<pre id=\"codecell52\" class=\"wp-block-preformatted\">p = np.sum(posterior * ps_pred)\np, 1 \/ p\n<\/pre>\n\n\n\n<pre id=\"codecell53\" class=\"wp-block-preformatted\">(0.00016019406802217392, 6242.42840166579)\n<\/pre>\n\n\n\n<p>The result is about 1 in 6000.<\/p>\n\n\n\n<p>With all of the assumptions we made in this calculation, there\u2019s no reason to be more precise than that. And as I mentioned at the beginning, we should probably not take this conclusion too seriously. 
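<\/p>\n\n\n\n<p>Before the caveats, the two likelihood pieces in this calculation have simple closed forms that make good sanity checks: with <code>n=1<\/code> the negative binomial reduces to the geometric distribution, and <code>binom.sf(0, n, p)<\/code> is the complement of seeing no successes. The rate below is illustrative, not a value from the posterior:<\/p>

```python
import numpy as np
from scipy.stats import binom, nbinom

p = 1e-8          # an illustrative rate for a very rare word
k = 182_500_000   # words heard in 50 years at 10,000 per day

# P(k failures, then a success) = (1-p)**k * p, the geometric pmf
geom = (1 - p) ** k * p
assert np.isclose(nbinom.pmf(k, 1, p), geom)

# P(at least one success in n trials) = 1 - (1-p)**n
n_pred = 10_000
direct = 1 - (1 - p) ** n_pred
assert np.isclose(binom.sf(0, n_pred, p), direct)
```

<p>Neither check validates the modeling assumptions, of course. <\/p>\n\n\n\n<p>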
For one thing, it&#8217;s likely that my experience is an example of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Frequency_illusion\">frequency illusion<\/a>, which is &#8220;a cognitive bias in which a person notices a specific concept, word, or product more frequently after recently becoming aware of it.&#8221; Also, if you hear a word for the first time after 50 years, there\u2019s a good chance the word is having a moment, which greatly increases the chance you\u2019ll hear it again. I can\u2019t think of why chartism might be in the news at the moment, but maybe this post will go viral and make it happen.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","categories":[1],"tags":[]}
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"What&#8217;s a Chartist?","datePublished":"2024-11-19T16:04:41+00:00","dateModified":"2024-11-20T18:13:44+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/"},"wordCount":1672,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#primaryimage"},"thumbnailUrl":"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/","url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/","name":"What's a Chartist? 
- Probably Overthinking It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#primaryimage"},"thumbnailUrl":"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png","datePublished":"2024-11-19T16:04:41+00:00","dateModified":"2024-11-20T18:13:44+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#primaryimage","url":"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png","contentUrl":"https:\/\/allendowney.github.io\/ThinkBayes2\/_images\/204f4aae3fe537fefdbe43abadd4be2a854bc627c7d5e064d9efebd9cc6a58df.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What&#8217;s a Chartist?"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":1661,"url":"https:\/\/www.allendowney.com\/blog\/2025\/12\/04\/the-lost-chapter\/","url_meta":{"origin":1431,"position
":0},"title":"The Lost Chapter","author":"AllenDowney","date":"December 4, 2025","format":false,"excerpt":"I'm happy to report that Probably Overthinking It is available now in paperback. If you would like a copy, you can order from Bookshop.org and Amazon (affiliate links). To celebrate, I'm publishing The Lost Chapter -- that is, the chapter I cut from the published book. It's about The Girl\u2026","rel":"","context":"In \"paradox\"","block_context":{"text":"paradox","link":"https:\/\/www.allendowney.com\/blog\/tag\/paradox\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/12\/502b755caff85849b479f54f200cc4eeb645981da95d8f8a1c908832b6539896.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1715,"url":"https:\/\/www.allendowney.com\/blog\/2026\/01\/31\/the-girl-born-on-tuesday\/","url_meta":{"origin":1431,"position":1},"title":"The Girl Born on Tuesday","author":"AllenDowney","date":"January 31, 2026","format":false,"excerpt":"Some people have strong opinions about this question: In a family with two children, if at least one of the children is a girl born on Tuesday, what are the chances that both children are girls? 
In this article, I hope to offer A solution to one interpretation of this\u2026","rel":"","context":"In \"Bayes&#039;s Theorem\"","block_context":{"text":"Bayes&#039;s Theorem","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayess-theorem\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2026\/01\/c81aa262e67d9b56ecabe5664c2397cdd0375ce23e2d2c683d8d281e36c47726.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2026\/01\/c81aa262e67d9b56ecabe5664c2397cdd0375ce23e2d2c683d8d281e36c47726.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2026\/01\/c81aa262e67d9b56ecabe5664c2397cdd0375ce23e2d2c683d8d281e36c47726.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":442,"url":"https:\/\/www.allendowney.com\/blog\/2020\/04\/13\/bayesian-hypothesis-testing\/","url_meta":{"origin":1431,"position":2},"title":"Bayesian hypothesis testing","author":"AllenDowney","date":"April 13, 2020","format":false,"excerpt":"I have mixed feelings about Bayesian hypothesis testing. On the positive side, it's better than null-hypothesis significance testing (NHST). And it is probably necessary as an onboarding tool: Hypothesis testing is one of the first things future Bayesians ask about; we need to have an answer. 
On the negative side,\u2026","rel":"","context":"In \"bayesian statistics\"","block_context":{"text":"bayesian statistics","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayesian-statistics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/04\/image.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1145,"url":"https:\/\/www.allendowney.com\/blog\/2023\/12\/12\/smoking-causes-cancer-2\/","url_meta":{"origin":1431,"position":3},"title":"Smoking Causes Cancer","author":"AllenDowney","date":"December 12, 2023","format":false,"excerpt":"In the preface of Probably Overthinking It, I wrote: Sometimes interpreting data is easy. For example, one of the reasons we know that smoking causes lung cancer is that when only 20% of the population smoked, 80% of people with lung cancer were smokers. If you are a doctor who\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/12\/image-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/12\/image-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/12\/image-1.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/12\/image-1.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":826,"url":"https:\/\/www.allendowney.com\/blog\/2022\/11\/14\/overthinking-the-question\/","url_meta":{"origin":1431,"position":4},"title":"Overthinking the question","author":"AllenDowney","date":"November 14, 2022","format":false,"excerpt":"\"Tell me if you agree or disagree with this statement: \u00a0Most men are better suited emotionally for politics than are most women.\" That's one of the questions on the General Social Survey. 
In 1974, when it was first asked, 47% of respondents said they agreed. In 2018, the most recent\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":426,"url":"https:\/\/www.allendowney.com\/blog\/2020\/01\/28\/the-elvis-problem-revisited\/","url_meta":{"origin":1431,"position":5},"title":"The Elvis problem revisited","author":"AllenDowney","date":"January 28, 2020","format":false,"excerpt":"Here's a problem from Bayesian Data Analysis: Elvis Presley had a twin brother (who died at birth). What is the probability that Elvis was an identical twin? I will answer this question in three steps: First, we need some background information about the relative frequencies of identical and fraternal twins.Then\u2026","rel":"","context":"In \"Elvis\"","block_context":{"text":"Elvis","link":"https:\/\/www.allendowney.com\/blog\/tag\/elvis\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/01\/birth_data_1935.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/01\/birth_data_1935.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/01\/birth_data_1935.png?resize=525%2C300&ssl=1 
1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1431","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=1431"}],"version-history":[{"count":3,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1431\/revisions"}],"predecessor-version":[{"id":1435,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1431\/revisions\/1435"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=1431"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=1431"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=1431"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}