{"id":1411,"date":"2024-11-10T14:08:57","date_gmt":"2024-11-10T14:08:57","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=1411"},"modified":"2024-11-14T20:06:47","modified_gmt":"2024-11-14T20:06:47","slug":"zipfs-law","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/","title":{"rendered":"War and Peace and Zipf&#8217;s Law"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em>Elements of Data Science<\/em> is in print now, <a href=\"https:\/\/www.lulu.com\/shop\/allen-downey\/elements-of-data-science\/paperback\/product-9dyrwn.html\">available from Lulu.com<\/a> and online booksellers. To celebrate, I&#8217;ll post some excerpts here, starting with one of my favorite examples, Zipf&#8217;s Law. It&#8217;s from Chapter 6, which is about plotting data, and it uses Python dictionaries, which are covered in the previous chapter. You can read the complete chapter <a href=\"https:\/\/allendowney.github.io\/ElementsOfDataScience\/06_plotting.html\">here<\/a>, or run the Jupyter notebook on <a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/v1\/06_plotting.ipynb\">Colab<\/a>.<\/p>\n<\/blockquote>\n\n\n\n<p>In almost any book, in almost any language, if you count the number of unique words and the number of times each word appears, you will find a remarkable pattern: the most common word appears about twice as often as the second most common, about three times as often as the third most common, and so on.<\/p>\n\n\n\n<p>In general, if we sort the words in descending order of frequency, there is an inverse relationship between the rank of the words \u2013 first, second, third, etc. \u2013 and the number of times they appear. 
This observation was most famously made by George Kingsley Zipf, so it is called Zipf\u2019s law.<\/p>\n\n\n\n<p>To see if this law holds for the words in <em>War and Peace<\/em>, we\u2019ll make a Zipf plot, which shows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The frequency of each word on the y-axis, and<\/li>\n\n\n\n<li>The rank of each word on the x-axis, starting from 1.<\/li>\n<\/ul>\n\n\n\n<p>In the previous chapter, we looped through the book and made a string that contains all punctuation characters. Here is the result, which we will need again.<\/p>\n\n\n\n<pre id=\"codecell28\" class=\"wp-block-preformatted\">all_punctuation = ',.-:[#]*\/\u201c\u2019\u2014\u2018!?\u201d;()%@'\n<\/pre>\n\n\n\n<p>The following program reads through the book and makes a dictionary that maps from each word to the number of times it appears. It skips the Project Gutenberg header and footer, which are delimited by lines that start with <code>***<\/code>.<\/p>\n\n\n\n<pre id=\"codecell29\" class=\"wp-block-preformatted\">with open('2600-0.txt', encoding='utf-8') as fp:\n    # skip the header, which ends at the first line that starts with ***\n    for line in fp:\n        if line.startswith('***'):\n            break\n\n    # count words until the next *** line, which marks the start of the footer\n    unique_words = {}\n    for line in fp:\n        if line.startswith('***'):\n            break\n\n        for word in line.split():\n            word = word.lower()\n            word = word.strip(all_punctuation)\n            if word in unique_words:\n                unique_words[word] += 1\n            else:\n                unique_words[word] = 1\n<\/pre>\n\n\n\n<p>In <code>unique_words<\/code>, the keys are words and the values are their frequencies. We can use the <code>values<\/code> method to get the values from the dictionary. 
The result has the type <code>dict_values<\/code>:<\/p>\n\n\n\n<pre id=\"codecell30\" class=\"wp-block-preformatted\">freqs = unique_words.values()\ntype(freqs)\n<\/pre>\n\n\n\n<pre id=\"codecell31\" class=\"wp-block-preformatted\">dict_values\n<\/pre>\n\n\n\n<p>Before we plot the frequencies, we have to sort them, but a <code>dict_values<\/code> object doesn\u2019t have a <code>sort<\/code> method.<\/p>\n\n\n\n<pre id=\"codecell32\" class=\"wp-block-preformatted\">freqs.sort()\n<\/pre>\n\n\n\n<pre id=\"codecell33\" class=\"wp-block-preformatted\">AttributeError: 'dict_values' object has no attribute 'sort'\n<\/pre>\n\n\n\n<p>We can use <code>list<\/code> to make a list of frequencies:<\/p>\n\n\n\n<pre id=\"codecell34\" class=\"wp-block-preformatted\">freq_list = list(unique_words.values())\ntype(freq_list)\n<\/pre>\n\n\n\n<pre id=\"codecell35\" class=\"wp-block-preformatted\">list\n<\/pre>\n\n\n\n<p>And now we can use the <code>sort<\/code> method. By default it sorts in ascending order, but we can pass the keyword argument <code>reverse=True<\/code> to sort in descending order.<\/p>\n\n\n\n<pre id=\"codecell36\" class=\"wp-block-preformatted\">freq_list.sort(reverse=True)\n<\/pre>\n\n\n\n<p>Now, for the ranks, we need a sequence that counts from 1 to <code>n<\/code>, where <code>n<\/code> is the number of elements in <code>freq_list<\/code>. We can use the <code>range<\/code> function, which returns a value with type <code>range<\/code>. As a small example, here\u2019s the range from 1 to 5.<\/p>\n\n\n\n<pre id=\"codecell37\" class=\"wp-block-preformatted\">range(1, 5)\n<\/pre>\n\n\n\n<pre id=\"codecell38\" class=\"wp-block-preformatted\">range(1, 5)\n<\/pre>\n\n\n\n<p>However, there\u2019s a catch. 
If we use the range to make a list, we see that \u201cthe range from 1 to 5\u201d includes 1, but it doesn\u2019t include 5.<\/p>\n\n\n\n<pre id=\"codecell39\" class=\"wp-block-preformatted\">list(range(1, 5))\n<\/pre>\n\n\n\n<pre id=\"codecell40\" class=\"wp-block-preformatted\">[1, 2, 3, 4]\n<\/pre>\n\n\n\n<p>That might seem strange, but this definition is often convenient: <code>range(a, b)<\/code> contains exactly <code>b - a<\/code> values, and <code>range(0, n)<\/code> yields the <code>n<\/code> valid indices of a sequence with <code>n<\/code> elements. To include 5, we can increase the second argument by one:<\/p>\n\n\n\n<pre id=\"codecell41\" class=\"wp-block-preformatted\">list(range(1, 6))\n<\/pre>\n\n\n\n<pre id=\"codecell42\" class=\"wp-block-preformatted\">[1, 2, 3, 4, 5]\n<\/pre>\n\n\n\n<p>So, finally, we can make a range that represents the ranks from <code>1<\/code> to <code>n<\/code>:<\/p>\n\n\n\n<pre id=\"codecell43\" class=\"wp-block-preformatted\">n = len(freq_list)\nranks = range(1, n+1)\nranks\n<\/pre>\n\n\n\n<pre id=\"codecell44\" class=\"wp-block-preformatted\">range(1, 20484)\n<\/pre>\n\n\n\n<p>And now we can plot the frequencies versus the ranks:<\/p>\n\n\n\n<pre id=\"codecell45\" class=\"wp-block-preformatted\">import matplotlib.pyplot as plt\n\nplt.plot(ranks, freq_list)\n\nplt.xlabel('Rank')\nplt.ylabel('Frequency')\nplt.title(\"War and Peace and Zipf's law\");\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"417\" height=\"264\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\" alt=\"\" class=\"wp-image-1413\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png 417w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517-300x190.png 300w\" sizes=\"auto, (max-width: 417px) 100vw, 417px\" 
\/><\/figure>\n\n\n\n<p>According to Zipf\u2019s law, these frequencies should be inversely proportional to the ranks. If that\u2019s true, we can write:<\/p>\n\n\n\n<p>f = k \/ r<\/p>\n\n\n\n<p>where r is the rank of a word, f is its frequency, and k is an unknown constant of proportionality. If we take the logarithm of both sides, we get:<\/p>\n\n\n\n<p>log f = log k &#8722; log r<\/p>\n\n\n\n<p>This equation implies that if we plot f versus r on a log-log scale, we expect to see a straight line with intercept log k and slope -1.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">6.6. Logarithmic Scales<\/h2>\n\n\n\n<p>We can use <code>plt.xscale<\/code> to plot the x-axis on a log scale.<\/p>\n\n\n\n<pre id=\"codecell46\" class=\"wp-block-preformatted\">plt.plot(ranks, freq_list)\n\nplt.xlabel('Rank')\nplt.ylabel('Frequency')\nplt.title(\"War and Peace and Zipf's law\")\nplt.xscale('log')\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"417\" height=\"268\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/329072caf8939213ad7f866bc6fc1fd42224d40d6f1b8e0db2e204a0e709c7eb.png\" alt=\"\" class=\"wp-image-1416\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/329072caf8939213ad7f866bc6fc1fd42224d40d6f1b8e0db2e204a0e709c7eb.png 417w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/329072caf8939213ad7f866bc6fc1fd42224d40d6f1b8e0db2e204a0e709c7eb-300x193.png 300w\" sizes=\"auto, (max-width: 417px) 100vw, 417px\" \/><\/figure>\n\n\n\n<p>And we can use <code>plt.yscale<\/code> to plot the y-axis on a log scale, too.<\/p>\n\n\n\n<pre id=\"codecell47\" class=\"wp-block-preformatted\">plt.plot(ranks, freq_list)\n\nplt.xlabel('Rank')\nplt.ylabel('Frequency')\nplt.title(\"War and Peace and Zipf's law\")\nplt.xscale('log')\nplt.yscale('log')\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"404\" height=\"268\" 
src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/71629c618864d20a59f2a2168ed3b1947dc58cc329557fb85cf41bdf0b7baae0-1.png\" alt=\"\" class=\"wp-image-1415\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/71629c618864d20a59f2a2168ed3b1947dc58cc329557fb85cf41bdf0b7baae0-1.png 404w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/71629c618864d20a59f2a2168ed3b1947dc58cc329557fb85cf41bdf0b7baae0-1-300x199.png 300w\" sizes=\"auto, (max-width: 404px) 100vw, 404px\" \/><\/figure>\n\n\n\n<p>The result is not quite a straight line, but it is close. We can get a sense of the slope by connecting the endpoints with a line. First, we\u2019ll select the first and last elements from <code>ranks<\/code> and store them in <code>xs<\/code>.<\/p>\n\n\n\n<pre id=\"codecell48\" class=\"wp-block-preformatted\">xs = ranks[0], ranks[-1]\nxs\n<\/pre>\n\n\n\n<pre id=\"codecell49\" class=\"wp-block-preformatted\">(1, 20483)\n<\/pre>\n\n\n\n<p>And the first and last elements from <code>freq_list<\/code>, which we store in <code>ys<\/code>.<\/p>\n\n\n\n<pre id=\"codecell50\" class=\"wp-block-preformatted\">ys = freq_list[0], freq_list[-1]\nys\n<\/pre>\n\n\n\n<pre id=\"codecell51\" class=\"wp-block-preformatted\">(34389, 1)\n<\/pre>\n\n\n\n<p>Then we plot a gray line between these endpoints, along with the data.<\/p>\n\n\n\n<pre id=\"codecell52\" class=\"wp-block-preformatted\">plt.plot(xs, ys, color='gray')\nplt.plot(ranks, freq_list)\n\nplt.xlabel('Rank')\nplt.ylabel('Frequency')\nplt.title(\"War and Peace and Zipf's law\")\nplt.xscale('log')\nplt.yscale('log')\n<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"404\" height=\"268\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/92b6d5e8cb9065d4b5b7e8ea0d56738437c9b08f97e05d0ceab47925c228ab62.png\" alt=\"\" class=\"wp-image-1417\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/92b6d5e8cb9065d4b5b7e8ea0d56738437c9b08f97e05d0ceab47925c228ab62.png 404w, 
https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/92b6d5e8cb9065d4b5b7e8ea0d56738437c9b08f97e05d0ceab47925c228ab62-300x199.png 300w\" sizes=\"auto, (max-width: 404px) 100vw, 404px\" \/><\/figure>\n\n\n\n<p>The slope of this line is the \u201crise over run\u201d, that is, the difference on the y-axis divided by the difference on the x-axis. To compute the rise, we can use <code>np.log10<\/code> to get the log base 10 of the first and last values:<\/p>\n\n\n\n<pre id=\"codecell53\" class=\"wp-block-preformatted\">import numpy as np\n\nnp.log10(ys)\n<\/pre>\n\n\n\n<pre id=\"codecell54\" class=\"wp-block-preformatted\">array([4.53641955, 0.        ])\n<\/pre>\n\n\n\n<p>Then we can use <code>np.diff<\/code> to compute the difference between the elements:<\/p>\n\n\n\n<pre id=\"codecell55\" class=\"wp-block-preformatted\">rise = np.diff(np.log10(ys))\nrise\n<\/pre>\n\n\n\n<pre id=\"codecell56\" class=\"wp-block-preformatted\">array([-4.53641955])\n<\/pre>\n\n\n\n<p><strong>Exercise:<\/strong> Use <code>log10<\/code> and <code>diff<\/code> to compute the run, that is, the difference on the x-axis. Then divide the rise by the run to get the slope of the gray line. Is it close to -1, as Zipf\u2019s law predicts? Hint: yes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Elements of Data Science is in print now, available from Lulu.com and online booksellers. To celebrate, I&#8217;ll post some excerpts here, starting with one of my favorite examples, Zipf&#8217;s Law. It&#8217;s from Chapter 6, which is about plotting data, and it uses Python dictionaries, which are covered in the previous chapter. You can read the complete chapter here, or run the Jupyter notebook on Colab. 
In almost any book, in almost any language, if you count the number of unique&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-1411","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>War and Peace and Zipf&#039;s Law - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"War and Peace and Zipf&#039;s Law - Probably Overthinking It\" \/>\n<meta property=\"og:description\" content=\"Elements of Data Science is in print now, available from Lulu.com and online booksellers. To celebrate, I&#8217;ll post some excerpts here, starting with one of my favorite examples, Zipf&#8217;s Law. 
It&#8217;s from Chapter 6, which is about plotting data, and it uses Python dictionaries, which are covered in the previous chapter. You can read the complete chapter here, or run the Jupyter notebook on Colab. In almost any book, in almost any language, if you count the number of unique... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-10T14:08:57+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-14T20:06:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"War and Peace and Zipf&#8217;s Law\",\"datePublished\":\"2024-11-10T14:08:57+00:00\",\"dateModified\":\"2024-11-14T20:06:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\"},\"wordCount\":738,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\",\"name\":\"War and Peace and Zipf's Law - Probably Overthinking 
It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\",\"datePublished\":\"2024-11-10T14:08:57+00:00\",\"dateModified\":\"2024-11-14T20:06:47+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png\",\"width\":417,\"height\":264},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"War and Peace and Zipf&#8217;s Law\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other 
ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"War and Peace and Zipf's Law - Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/","og_locale":"en_US","og_type":"article","og_title":"War and Peace and Zipf's Law - Probably Overthinking It","og_description":"Elements of Data Science is in print now, available from Lulu.com and online booksellers. To celebrate, I&#8217;ll post some excerpts here, starting with one of my favorite examples, Zipf&#8217;s Law. It&#8217;s from Chapter 6, which is about plotting data, and it uses Python dictionaries, which are covered in the previous chapter. You can read the complete chapter here, or run the Jupyter notebook on Colab. In almost any book, in almost any language, if you count the number of unique... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/","og_site_name":"Probably Overthinking It","article_published_time":"2024-11-10T14:08:57+00:00","article_modified_time":"2024-11-14T20:06:47+00:00","og_image":[{"url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png","type":"","width":"","height":""}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"War and Peace and Zipf&#8217;s Law","datePublished":"2024-11-10T14:08:57+00:00","dateModified":"2024-11-14T20:06:47+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/"},"wordCount":738,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/","url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/","name":"War and Peace and Zipf's Law - Probably Overthinking 
It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png","datePublished":"2024-11-10T14:08:57+00:00","dateModified":"2024-11-14T20:06:47+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#primaryimage","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/8a72a3aca82bbeb573bd4e9b47c5a0e1c9516921c67b531a1ab90ac836d60517.png","width":417,"height":264},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/10\/zipfs-law\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"War and Peace and Zipf&#8217;s Law"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":748,"url":"https:\/\/www.allendowney.com\/blog\/2022\/07\/23\/almost-done\/","url_meta":{"origin":1411,"position":0},"
title":"Almost done?","author":"AllenDowney","date":"July 23, 2022","format":false,"excerpt":"I thought I would be done this week, but it looks like there will be one more chapter. If you don't know, I am working on a book that includes updated articles from this blog, plus new examples, and pulls the whole thing together. So far, it's going well. I\u2026","rel":"","context":"In \"book\"","block_context":{"text":"book","link":"https:\/\/www.allendowney.com\/blog\/tag\/book\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/07\/simpson_penguin.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1357,"url":"https:\/\/www.allendowney.com\/blog\/2024\/08\/23\/probably-the-book\/","url_meta":{"origin":1411,"position":1},"title":"Probably the Book","author":"AllenDowney","date":"August 23, 2024","format":false,"excerpt":"Last week I had the pleasure of presenting a keynote at posit::conf(2024). When the video is available, I will post it here [UPDATE here it is]. https:\/\/www.youtube.com\/watch?v=YKMZIzYBgTk In the meantime, you can read the slides, if you don't mind spoilers. 
For people at the conference who don't know me, this\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/08\/are_you_normal_windshield_wiper.gif?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/08\/are_you_normal_windshield_wiper.gif?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/08\/are_you_normal_windshield_wiper.gif?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/08\/are_you_normal_windshield_wiper.gif?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":1431,"url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/19\/whats-a-chartist\/","url_meta":{"origin":1411,"position":2},"title":"What&#8217;s a Chartist?","author":"AllenDowney","date":"November 19, 2024","format":false,"excerpt":"Recently I heard the word \u201cchartist\u201d for the first time in my life (that I recall). And then later the same day, I heard it again. So that raises two questions: What are the chances of going 57 years without hearing a word, and then hearing it twice in one\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":941,"url":"https:\/\/www.allendowney.com\/blog\/2023\/05\/10\/causation-collision-and-confusion\/","url_meta":{"origin":1411,"position":3},"title":"Causation, Collision, and Confusion","author":"AllenDowney","date":"May 10, 2023","format":false,"excerpt":"Today I presented a talk about Berkson's paradox at ODSC East 2023. If you missed it, the slides are here. When the video is available, I'll post it here. 
Abstract: Collision bias is the most treacherous error in statistics: it can be subtle, it is easy to induce it by\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":721,"url":"https:\/\/www.allendowney.com\/blog\/2022\/05\/09\/name-that-distribution\/","url_meta":{"origin":1411,"position":4},"title":"The Student-t model of Long-Tailed Distributions","author":"AllenDowney","date":"May 9, 2022","format":false,"excerpt":"As I've mentioned, I'm working on a book called Probably Overthinking It, to be published in early 2023. It's intended for a general audience, so I'm not trying to do research, but I might have found something novel while working on a chapter about power law distributions. If you are\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":956,"url":"https:\/\/www.allendowney.com\/blog\/2023\/06\/10\/abstracts-and-keywords\/","url_meta":{"origin":1411,"position":5},"title":"Abstracts and keywords","author":"AllenDowney","date":"June 10, 2023","format":false,"excerpt":"As Probably Overthinking It approaches the finish line, there are just a few more tasks: I am working on the index and -- as I have recently learned -- I also have to write a 200-word abstract, a list of keywords for each chapter, and a 250-word abstract for the\u2026","rel":"","context":"Similar 
post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1411","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=1411"}],"version-history":[{"count":7,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1411\/revisions"}],"predecessor-version":[{"id":1424,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1411\/revisions\/1424"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=1411"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=1411"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=1411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}