{"id":1436,"date":"2024-11-24T15:55:39","date_gmt":"2024-11-24T15:55:39","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=1436"},"modified":"2024-11-24T15:55:39","modified_gmt":"2024-11-24T15:55:39","slug":"download-the-world-in-data","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/","title":{"rendered":"Download the World in Data"},"content":{"rendered":"\n<p>Our World in Data <a href=\"https:\/\/ourworldindata.org\/easier-to-reuse-our-data\">recently announced<\/a> that they are providing APIs to access their data. Coincidentally, I am using one of their datasets in my <a href=\"https:\/\/global2024.pydata.org\/cfp\/talk\/KLXYKX\/\">workshop on time series analysis at PyData Global 2024<\/a>. So I took this opportunity to update my example using the new API \u2013 this notebook shows what I learned.<\/p>\n\n\n\n<p><a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ThinkStats\/blob\/v3\/examples\/temperature.ipynb\">Click here to run this notebook on Colab<\/a>. It is based on Chapter 12 of <a href=\"https:\/\/allendowney.github.io\/ThinkStats\/\"><em>Think Stats<\/em>, third edition<\/a>.<\/p>\n\n\n\n<pre id=\"codecell0\" class=\"wp-block-preformatted\">import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Air Temperature<\/h2>\n\n\n\n<p>In the chapter on time series analysis, in an exercise on seasonal decomposition, I use monthly average surface temperatures in the United States, from a <a href=\"https:\/\/ourworldindata.org\/grapher\/average-monthly-surface-temperature\">dataset from Our World in Data<\/a> that includes \u201ctemperature [in Celsius] of the air measured 2 meters above the ground, encompassing land, sea, and in-land water surfaces,\u201d for most countries in the world from 1941 to 2024.<\/p>\n\n\n\n<p>The following cells download and display the metadata that describes the dataset.<\/p>\n\n\n\n<pre id=\"codecell1\" class=\"wp-block-preformatted\">import requests<br><br>url = (<br>    \"https:\/\/ourworldindata.org\/grapher\/\"<br>    \"average-monthly-surface-temperature.metadata.json\"<br>)<br>query_params = {<br>    \"v\": \"1\",<br>    \"csvType\": \"full\",<br>    \"useColumnShortNames\": \"true\"<br>}<br>headers = {'User-Agent': 'Our World In Data data fetch\/1.0'}<br><br>response = requests.get(url, params=query_params, headers=headers)<br>metadata = response.json()<br><\/pre>\n\n\n\n<p>The result is a nested dictionary. Here are the top-level keys.<\/p>\n\n\n\n<pre id=\"codecell2\" class=\"wp-block-preformatted\">metadata.keys()\n<\/pre>\n\n\n\n<pre id=\"codecell3\" class=\"wp-block-preformatted\">dict_keys(['chart', 'columns', 'dateDownloaded'])\n<\/pre>\n\n\n\n<p>Here\u2019s the chart-level documentation.<\/p>\n\n\n\n<pre id=\"codecell4\" class=\"wp-block-preformatted\">from pprint import pprint\n\npprint(metadata['chart'])\n<\/pre>\n\n\n\n<pre id=\"codecell5\" class=\"wp-block-preformatted\">{'citation': 'Contains modified Copernicus Climate Change Service information '\n             '(2019)',\n 'originalChartUrl': 'https:\/\/ourworldindata.org\/grapher\/average-monthly-surface-temperature?v=1&amp;csvType=full&amp;useColumnShortNames=true',\n 'selection': ['World'],\n 'subtitle': 'The temperature of the air measured 2 meters above the ground, '\n             'encompassing land, sea, and in-land water surfaces.',\n 'title': 'Average monthly surface temperature'}\n<\/pre>\n\n\n\n<p>And here\u2019s the documentation of the column we\u2019ll use.<\/p>\n\n\n\n<pre id=\"codecell6\" class=\"wp-block-preformatted\">pprint(metadata['columns']['temperature_2m'])<br><\/pre>\n\n\n\n<pre id=\"codecell7\" class=\"wp-block-preformatted\">{'citationLong': 'Contains modified Copernicus Climate Change Service '\n                 'information (2019) \u2013 with major processing by Our World in '\n                 'Data. \u201cAnnual average\u201d [dataset]. Contains modified '\n                 'Copernicus Climate Change Service information, \u201cERA5 monthly '\n                 'averaged data on single levels from 1940 to present 2\u201d '\n                 '[original data].',\n 'citationShort': 'Contains modified Copernicus Climate Change Service '\n                  'information (2019) \u2013 with major processing by Our World in '\n                  'Data',\n 'descriptionKey': [],\n 'descriptionProcessing': '- Temperature measured in kelvin was converted to '\n                          'degrees Celsius (\u00b0C) by subtracting 273.15.\\n'\n                          '\\n'\n                          '- Initially, the temperature dataset is provided '\n                          'with specific coordinates in terms of longitude and '\n                          'latitude. To tailor this data to each country, we '\n                          'utilize geographical boundaries as defined by the '\n                          'World Bank. The method involves trimming the global '\n                          'temperature dataset to match the exact geographical '\n                          'shape of each country. To correct for potential '\n                          \"distortions caused by the Earth's curvature on a \"\n                          'flat map, we apply a latitude-based weighting. This '\n                          'step is essential for maintaining accuracy, '\n                          'especially in high-latitude regions where '\n                          'distortion is more pronounced. The result of this '\n                          'process is a latitude-weighted average temperature '\n                          'for each nation.\\n'\n                          '\\n'\n                          \"- It's important to note, however, that due to the \"\n                          'resolution constraints of the Copernicus dataset, '\n                          'this methodology might not be as effective for '\n                          'countries with very small landmasses. In these '\n                          'cases, the process may not yield reliable data.\\n'\n                          '\\n'\n                          '- The derived 2-meter temperature readings for each '\n                          'country are calculated based on administrative '\n                          'borders, encompassing all land surface types within '\n                          'these defined areas. As a result, temperatures over '\n                          'oceans and seas are not included in these averages, '\n                          'focusing the data primarily on terrestrial '\n                          'environments.\\n'\n                          '\\n'\n                          '- Global temperature averages and anomalies are '\n                          'calculated over all land and ocean surfaces.',\n 'descriptionShort': 'The temperature of the air measured 2 meters above the '\n                     'ground, encompassing land, sea, and in-land water '\n                     'surfaces. The 2024 data is incomplete and was last '\n                     'updated 13 October 2024.',\n 'fullMetadata': 'https:\/\/api.ourworldindata.org\/v1\/indicators\/819532.metadata.json',\n 'lastUpdated': '2023-12-20',\n 'owidVariableId': 819532,\n 'shortName': 'temperature_2m',\n 'shortUnit': '\u00b0C',\n 'timespan': '1940-2024',\n 'titleLong': 'Annual average',\n 'titleShort': 'Annual average',\n 'type': 'Numeric',\n 'unit': '\u00b0C'}\n<\/pre>\n\n\n\n<p>The following cells download the data for the United States \u2013 to see data from another country, change <code>country_code<\/code> to almost any <a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_ISO_3166_country_codes#Current_ISO_3166_country_code\">three-letter ISO 3166 country codes<\/a>.<\/p>\n\n\n\n<pre id=\"codecell8\" class=\"wp-block-preformatted\">country_code = 'USA'    # replace this with other three-letter country codes<br>base_url = (<br>    \"https:\/\/ourworldindata.org\/grapher\/\" <br>    \"average-monthly-surface-temperature.csv\"<br>)<br><br>query_params = {<br>    \"v\": \"1\",<br>    \"csvType\": \"filtered\",<br>    \"useColumnShortNames\": \"true\",<br>    \"tab\": \"chart\",<br>    \"country\": country_code  <br>}<br><\/pre>\n\n\n\n<pre id=\"codecell9\" class=\"wp-block-preformatted\">from urllib.parse import urlencode\n\nurl = f\"{base_url}?{urlencode(query_params)}\"\ntemp_df = pd.read_csv(url, storage_options=headers)\n<\/pre>\n\n\n\n<p>In general, you can find out which query parameters are supported by exploring the dataset online and pressing the download icon, which displays a URL with query parameters corresponding to the filters you selected by interacting with the chart.<\/p>\n\n\n\n<pre id=\"codecell10\" class=\"wp-block-preformatted\">temp_df.head()\n<\/pre>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>Entity<\/th><th>Code<\/th><th>year<\/th><th>Day<\/th><th>temperature_2m<\/th><th>temperature_2m.1<\/th><\/tr><\/thead><tbody><tr><th>0<\/th><td>United States<\/td><td>USA<\/td><td>1941<\/td><td>1941-12-15<\/td><td>-1.878019<\/td><td>8.016244<\/td><\/tr><tr><th>1<\/th><td>United States<\/td><td>USA<\/td><td>1942<\/td><td>1942-01-15<\/td><td>-4.776551<\/td><td>7.848984<\/td><\/tr><tr><th>2<\/th><td>United States<\/td><td>USA<\/td><td>1942<\/td><td>1942-02-15<\/td><td>-3.870868<\/td><td>7.848984<\/td><\/tr><tr><th>3<\/th><td>United States<\/td><td>USA<\/td><td>1942<\/td><td>1942-03-15<\/td><td>0.097811<\/td><td>7.848984<\/td><\/tr><tr><th>4<\/th><td>United States<\/td><td>USA<\/td><td>1942<\/td><td>1942-04-15<\/td><td>7.537291<\/td><td>7.848984<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The resulting <code>DataFrame<\/code> includes the column that\u2019s documented in the metadata, <code>temperature_2m<\/code>, and an additional undocumented column, which might be an annual average.<\/p>\n\n\n\n<p>For this example, we\u2019ll use the monthly data.<\/p>\n\n\n\n<pre id=\"codecell11\" class=\"wp-block-preformatted\">temp_series = temp_df['temperature_2m']\ntemp_series.index = pd.to_datetime(temp_df['Day'])\n<\/pre>\n\n\n\n<p>Here\u2019s what it looks like.<\/p>\n\n\n\n<pre id=\"codecell12\" class=\"wp-block-preformatted\">temp_series.plot(label=country_code)\nplt.ylabel(\"Surface temperature (\u2103)\");\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\" alt=\"_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\"\/><\/figure>\n\n\n\n<p>Not surprisingly, there is a strong seasonal pattern. We can use <code>seasonal_decompose<\/code> from StatsModels to identify a long-term trend, a seasonal component, and a residual.<\/p>\n\n\n\n<pre id=\"codecell13\" class=\"wp-block-preformatted\">from statsmodels.tsa.seasonal import seasonal_decompose\n\ndecomposition = seasonal_decompose(temp_series, model=\"additive\", period=12)\n<\/pre>\n\n\n\n<p>We\u2019ll use the following function to plot the results.<\/p>\n\n\n\n<pre id=\"codecell14\" class=\"wp-block-preformatted\">def plot_decomposition(original, decomposition):\n    plt.figure(figsize=(6, 5))\n\n    plt.subplot(4, 1, 1)\n    plt.plot(original, label=\"Original\", color=\"C0\")\n    plt.ylabel(\"Original\")\n\n    plt.subplot(4, 1, 2)\n    plt.plot(decomposition.trend, label=\"Trend\", color=\"C1\")\n    plt.ylabel(\"Trend\")\n\n    plt.subplot(4, 1, 3)\n    plt.plot(decomposition.seasonal, label=\"Seasonal\", color=\"C2\")\n    plt.ylabel(\"Seasonal\")\n\n    plt.subplot(4, 1, 4)\n    plt.plot(decomposition.resid, label=\"Residual\", color=\"C3\")\n    plt.ylabel(\"Residual\")\n\n    plt.tight_layout()\n<\/pre>\n\n\n\n<pre id=\"codecell15\" class=\"wp-block-preformatted\">plot_decomposition(temp_series, decomposition)\n<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/c5738f871cf8982612f4921ae8479cf1c5ad9d583b5f0cebabb401af92abce1f.png\" alt=\"_images\/c5738f871cf8982612f4921ae8479cf1c5ad9d583b5f0cebabb401af92abce1f.png\"\/><\/figure>\n\n\n\n<p>As always, I\u2019m grateful to Our World in Data for making datasets like this available, and now easier to use programmatically.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our World in Data recently announced that they are providing APIs to access their data. Coincidentally, I am using one of their datasets in my workshop on time series analysis at PyData Global 2024. So I took this opportunity to update my example using the new API \u2013 this notebook shows what I learned. Click here to run this notebook on Colab. It is based on Chapter 12 of Think Stats, third edition. import numpy as np import pandas as&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-1436","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Download the World in Data - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Download the World in Data - Probably Overthinking It\" \/>\n<meta property=\"og:description\" content=\"Our World in Data recently announced that they are providing APIs to access their data. Coincidentally, I am using one of their datasets in my workshop on time series analysis at PyData Global 2024. So I took this opportunity to update my example using the new API \u2013 this notebook shows what I learned. Click here to run this notebook on Colab. It is based on Chapter 12 of Think Stats, third edition. import numpy as np import pandas as... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-24T15:55:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"Download the World in Data\",\"datePublished\":\"2024-11-24T15:55:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\"},\"wordCount\":367,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\",\"name\":\"Download the World in Data - Probably Overthinking It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\",\"datePublished\":\"2024-11-24T15:55:39+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage\",\"url\":\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\",\"contentUrl\":\"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Download the World in Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Download the World in Data - Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/","og_locale":"en_US","og_type":"article","og_title":"Download the World in Data - Probably Overthinking It","og_description":"Our World in Data recently announced that they are providing APIs to access their data. Coincidentally, I am using one of their datasets in my workshop on time series analysis at PyData Global 2024. So I took this opportunity to update my example using the new API \u2013 this notebook shows what I learned. Click here to run this notebook on Colab. It is based on Chapter 12 of Think Stats, third edition. import numpy as np import pandas as... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/","og_site_name":"Probably Overthinking It","article_published_time":"2024-11-24T15:55:39+00:00","og_image":[{"url":"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png","type":"","width":"","height":""}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"Download the World in Data","datePublished":"2024-11-24T15:55:39+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/"},"wordCount":367,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage"},"thumbnailUrl":"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/","url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/","name":"Download the World in Data - Probably Overthinking It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage"},"thumbnailUrl":"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png","datePublished":"2024-11-24T15:55:39+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#primaryimage","url":"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png","contentUrl":"https:\/\/allendowney.github.io\/ThinkStats\/_images\/ae897b58e9b8541f265025d79764a87d73740b1fad45b044244e957e896889fe.png"},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2024\/11\/24\/download-the-world-in-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Download the World in Data"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":1496,"url":"https:\/\/www.allendowney.com\/blog\/2025\/01\/20\/1496\/","url_meta":{"origin":1436,"position":0},"title":"Algorithmic Fairness","author":"AllenDowney","date":"January 20, 2025","format":false,"excerpt":"This is the last in a series of excerpts from Elements of Data Science, now available from Lulu.com and online booksellers. This article is based on the Recidivism Case Study, which is about algorithmic fairness. The goal of the case study is to explain the statistical arguments presented in two\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":239,"url":"https:\/\/www.allendowney.com\/blog\/2019\/07\/25\/matplotlib-animation-in-jupyter\/","url_meta":{"origin":1436,"position":1},"title":"Matplotlib animation in Jupyter","author":"AllenDowney","date":"July 25, 2019","format":false,"excerpt":"For two of my books, Think Complexity and Modeling and Simulation in Python, many of the examples involve animation. Fortunately, there are several ways to do animation with Matplotlib in Jupyter. Unfortunately, none of them is ideal. FuncAnimation Until recently, I was using FuncAnimation, provided by the matplotlib.animation package, as\u2026","rel":"","context":"In \"animation\"","block_context":{"text":"animation","link":"https:\/\/www.allendowney.com\/blog\/tag\/animation\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1449,"url":"https:\/\/www.allendowney.com\/blog\/2024\/12\/04\/multiple-regression-with-statsmodels\/","url_meta":{"origin":1436,"position":2},"title":"Multiple Regression with StatsModels","author":"AllenDowney","date":"December 4, 2024","format":false,"excerpt":"This is the third is a series of excerpts from Elements of Data Science which available from Lulu.com and online booksellers. It's from Chapter 10, which is about multiple regression. You can read the complete chapter here, or run the Jupyter notebook on Colab. In the previous chapter we used\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/12\/ecc1aef34032bb07cf1639367d00ddbe2fc8a8ed7532628b9ddddafed10f7f38.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1655,"url":"https:\/\/www.allendowney.com\/blog\/2025\/11\/04\/think-dsp-second-edition\/","url_meta":{"origin":1436,"position":3},"title":"Think DSP second edition","author":"AllenDowney","date":"November 4, 2025","format":false,"excerpt":"I have started work on a second edition of Think DSP! You can see the current draft here. I started this project in part because of this announcement: Once in a while, a few of the Scicloj friends will meet to learn about signal processing, following the Think DSP book\u2026","rel":"","context":"In \"DSP\"","block_context":{"text":"DSP","link":"https:\/\/www.allendowney.com\/blog\/tag\/dsp\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":659,"url":"https:\/\/www.allendowney.com\/blog\/2021\/08\/09\/bayesian-dice\/","url_meta":{"origin":1436,"position":4},"title":"Bayesian Dice","author":"AllenDowney","date":"August 9, 2021","format":false,"excerpt":"This article is available in a Jupyter notebook: click here to run it on Colab. I\u2019ve been enjoying Aubrey Clayton\u2019s new book Bernoulli\u2019s Fallacy. The first chapter, which is about the historical development of competing definitions of probability, is worth the price of admission alone. One of the examples in\u2026","rel":"","context":"In \"Bayes&#039;s Theorem\"","block_context":{"text":"Bayes&#039;s Theorem","link":"https:\/\/www.allendowney.com\/blog\/tag\/bayess-theorem\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":201,"url":"https:\/\/www.allendowney.com\/blog\/2019\/04\/01\/local-regression-in-python\/","url_meta":{"origin":1436,"position":5},"title":"Local regression in Python","author":"AllenDowney","date":"April 1, 2019","format":false,"excerpt":"I love data visualization make-overs (like this one I wrote a few months ago), but sometimes the tone can be too negative (like this one I wrote a few months ago). Sarah Leo, a data journalist at The Economist, has found the perfect solution: re-making your own visualizations. Here's her\u2026","rel":"","context":"In \"local regression\"","block_context":{"text":"local regression","link":"https:\/\/www.allendowney.com\/blog\/tag\/local-regression\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/04\/image.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1436","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=1436"}],"version-history":[{"count":3,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1436\/revisions"}],"predecessor-version":[{"id":1439,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/1436\/revisions\/1439"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=1436"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=1436"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=1436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}