{"id":279,"date":"2019-08-13T17:14:42","date_gmt":"2019-08-13T17:14:42","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=279"},"modified":"2019-08-13T17:14:42","modified_gmt":"2019-08-13T17:14:42","slug":"watch-your-tail","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","title":{"rendered":"Watch your tail!"},"content":{"rendered":"\n<p>For a long time <a href=\"https:\/\/allendowney.blogspot.com\/2013\/08\/are-my-data-normal.html\">I have recommended using CDFs<\/a> to compare distributions.  If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model.<\/p>\n\n\n\n<p>Now I want to amend my advice.  CDFs give you a good view of the distribution between the 5th and 95th percentiles, but they are not as good for the tails.<\/p>\n\n\n\n<p>To compare both tails, as well as the &#8220;bulk&#8221; of the distribution, I recommend a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Triptych\">triptych<\/a> that looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"341\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png\" alt=\"\" class=\"wp-image-280\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png 1024w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-300x100.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-768x256.png 768w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-604x201.png 604w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2.png 1800w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>There&#8217;s a lot of information in that figure.  
So let me explain.<\/p>\n\n\n\n<p><a href=\"https:\/\/nbviewer.jupyter.org\/github\/AllenDowney\/ProbablyOverthinkingIt\/blob\/master\/sp500.ipynb\">The code for this article is in this Jupyter notebook.<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Daily changes<\/h2>\n\n\n\n<p>Suppose you observe a random process, like daily changes in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/S%26P_500_Index\">S&amp;P 500<\/a>. And suppose you have <a href=\"https:\/\/finance.yahoo.com\/quote\/%5EGSPC\/history?period1=-630961200&amp;period2=1565150400&amp;interval=1d&amp;filter=history&amp;frequency=1d\">collected historical data<\/a> in the form of percent changes from one day to the next. The distribution of those changes might look like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"280\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-7.png\" alt=\"\" class=\"wp-image-282\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-7.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-7-300x198.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-7-409x270.png 409w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" \/><\/figure>\n\n\n\n<p>If you fit a Gaussian model to this data, it looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"280\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-10.png\" alt=\"\" class=\"wp-image-285\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-10.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-10-300x198.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-10-409x270.png 409w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" 
\/><\/figure>\n\n\n\n<p>It looks like there are small discrepancies between the model and the data, but if you follow my previous advice, you might look at these CDFs and conclude that the Gaussian model is pretty good.<\/p>\n\n\n\n<p>If we zoom in on the middle of the distribution, we can see the discrepancies more clearly:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"280\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-11.png\" alt=\"\" class=\"wp-image-286\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-11.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-11-300x198.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-11-409x270.png 409w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" \/><\/figure>\n\n\n\n<p>In this figure it is clearer that the Gaussian model does not fit the data particularly well.  And, as we&#8217;ll see, the tails are even worse.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Survival on a log-log scale<\/h2>\n\n\n\n<p>In my opinion, the best way to compare tails is to plot the survival curve (which is the complementary CDF) on a log-log scale.<\/p>\n\n\n\n<p>In this case, because the dataset includes positive and negative values, I shift them right to view the right tail, and left to view the left tail.  
<\/p>\n\n\n\n<p>Here&#8217;s what the right tail looks like:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"406\" height=\"278\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-12.png\" alt=\"\" class=\"wp-image-287\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-12.png 406w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-12-300x205.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-12-394x270.png 394w\" sizes=\"auto, (max-width: 406px) 100vw, 406px\" \/><\/figure>\n\n\n\n<p>This view is like a microscope for looking at tail behavior; it compresses the bulk of the distribution and expands the tail.  In this case we can see a small discrepancy between the data and the model around 1 percentage point.  And we can see a substantial discrepancy above 3 percentage points.<\/p>\n\n\n\n<p>The Gaussian distribution has &#8220;thin tails&#8221;; that is, the probabilities it assigns to extreme events drop off very quickly.  In the dataset, extreme values are much more common than the model predicts.<\/p>\n\n\n\n<p>The results for the left tail are similar:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"277\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-13.png\" alt=\"\" class=\"wp-image-288\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-13.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-13-300x196.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-13-413x270.png 413w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" \/><\/figure>\n\n\n\n<p>Again, there is a small discrepancy near -1 percentage points, as we saw when we zoomed in on the CDF.  
And there is a substantial discrepancy in the leftmost tail.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Student&#8217;s t-distribution<\/h2>\n\n\n\n<p>Now let&#8217;s try the same exercise with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Student%27s_t-distribution\">Student&#8217;s t-distribution<\/a>. There are two ways I suggest you think about this distribution:<\/p>\n\n\n\n<p>1) Student&#8217;s t is similar to a Gaussian distribution in the middle, but it has heavier tails. The heaviness of the tails is controlled by a third parameter, \u03bd.<\/p>\n\n\n\n<p>2) Also, Student&#8217;s t is a mixture of Gaussian distributions with different variances. The tail parameter, \u03bd, is related to the variance of the variances.<\/p>\n\n\n\n<p>For a demonstration of the second interpretation, I recommend <a href=\"http:\/\/www.sumsar.net\/blog\/2013\/12\/t-as-a-mixture-of-normals\/\">this animation by Rasmus B\u00e5\u00e5th<\/a>.<\/p>\n\n\n\n<p>I used PyMC to estimate the parameters of a Student&#8217;s t model and generate a posterior predictive distribution.  
You can see the details in <a href=\"https:\/\/nbviewer.jupyter.org\/github\/AllenDowney\/ProbablyOverthinkingIt\/blob\/master\/sp500.ipynb\">this Jupyter notebook<\/a>.<\/p>\n\n\n\n<p>Here is the CDF of the Student&#8217;s t model compared to the data and the Gaussian model:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"280\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-14.png\" alt=\"\" class=\"wp-image-290\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-14.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-14-300x198.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-14-409x270.png 409w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" \/><\/figure>\n\n\n\n<p>In the bulk of the distribution, Student&#8217;s t-distribution is clearly a better fit.<\/p>\n\n\n\n<p>Now here&#8217;s the right tail, again comparing survival curves on a log-log scale:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"406\" height=\"278\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-15.png\" alt=\"\" class=\"wp-image-291\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-15.png 406w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-15-300x205.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-15-394x270.png 394w\" sizes=\"auto, (max-width: 406px) 100vw, 406px\" \/><\/figure>\n\n\n\n<p>Student&#8217;s t-distribution is a better fit than the Gaussian model, but in the right tail it overestimates the probability of extreme values.  The problem is that the left tail of the empirical distribution is heavier than the right.  
But the model is symmetric, so it can only match one tail or the other, not both.<\/p>\n\n\n\n<p>Here is the left tail:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"424\" height=\"277\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-16.png\" alt=\"\" class=\"wp-image-292\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-16.png 424w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-16-300x196.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/image-16-413x270.png 413w\" sizes=\"auto, (max-width: 424px) 100vw, 424px\" \/><\/figure>\n\n\n\n<p>The model fits the left tail about as well as possible.  <\/p>\n\n\n\n<p>If you are primarily worried about predicting extreme losses, this model would be a good choice.  But if you need to model both tails well, you could try one of the <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0304407610000266#sec3\">asymmetric generalizations of Student&#8217;s t<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The old six sigma<\/h2>\n\n\n\n<p>The tail behavior of the Gaussian distribution is the key to understanding &#8220;six sigma events&#8221;.<\/p>\n\n\n\n<p>John Cook explains six sigmas in <a href=\"https:\/\/www.johndcook.com\/blog\/2018\/05\/31\/six-sigma-events\/\">this excellent article<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;Six sigma means six standard deviations away from the mean of a probability distribution, sigma (\u03c3) being the common notation for a standard deviation. Moreover, the underlying distribution is implicitly a normal (Gaussian) distribution; people don\u2019t commonly talk about &#8216;six sigma&#8217; in the context of other distributions.&#8221;<\/p><\/blockquote>\n\n\n\n<p>This is important. 
John also explains:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;A six-sigma event isn\u2019t that rare unless your probability distribution is normal\u2026 The rarity of six-sigma events comes from the assumption of a normal distribution more than from the number of sigmas per se.&#8221;<\/p><\/blockquote>\n\n\n\n<p>So, if you see a six-sigma event, you should probably not think, &#8220;That was extremely rare, according to my Gaussian model.&#8221;  Instead, you should think, &#8220;Maybe my Gaussian model is not a good choice.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a long time I have recommended using CDFs to compare distributions. If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model. Now I want to amend my advice. CDFs give you a good view of the distribution between the 5th and 95th percentiles, but they are not as good for the tails. 
To compare both tails, as well as the &#8220;bulk&#8221; of the&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[33,36,30,35,32,31,34],"class_list":["post-279","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-cdf","tag-distribution","tag-gaussian","tag-heavy-tail","tag-stock-market","tag-student-t","tag-survival-function"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Watch your tail! - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Watch your tail! - Probably Overthinking It\" \/>\n<meta property=\"og:description\" content=\"For a long time I have recommended using CDFs to compare distributions. 
If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model. Now I want to amend my advice. CDFs give you a good view of the distribution between the 5th and 95th percentiles, but they are not as good for the tails. To compare both tails, as well as the &#8220;bulk&#8221; of the... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2019-08-13T17:14:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"Watch your tail!\",\"datePublished\":\"2019-08-13T17:14:42+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\"},\"wordCount\":855,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png\",\"keywords\":[\"CDF\",\"distribution\",\"Gaussian\",\"heavy tail\",\"stock market\",\"Student t\",\"Survival function\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\",\"name\":\"Watch your tail! 
- Probably Overthinking It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png\",\"datePublished\":\"2019-08-13T17:14:42+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2.png\",\"width\":1800,\"height\":600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Watch your tail!\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other 
ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Watch your tail! 
- Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","og_locale":"en_US","og_type":"article","og_title":"Watch your tail! - Probably Overthinking It","og_description":"For a long time I have recommended using CDFs to compare distributions. If you are comparing an empirical distribution to a model, the CDF gives you the best view of any differences between the data and the model. Now I want to amend my advice. CDFs give you a good view of the distribution between the 5th and 95th percentiles, but they are not as good for the tails. To compare both tails, as well as the &#8220;bulk&#8221; of the... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","og_site_name":"Probably Overthinking It","article_published_time":"2019-08-13T17:14:42+00:00","og_image":[{"url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png","type":"","width":"","height":""}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"Watch your tail!","datePublished":"2019-08-13T17:14:42+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/"},"wordCount":855,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png","keywords":["CDF","distribution","Gaussian","heavy tail","stock market","Student t","Survival function"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","url":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/","name":"Watch your tail! 
- Probably Overthinking It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2-1024x341.png","datePublished":"2019-08-13T17:14:42+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#primaryimage","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2019\/08\/sp550.2.png","width":1800,"height":600},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2019\/08\/13\/watch-your-tail\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Watch your tail!"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":1425,"url":"https:\/\/www.allendowney.com\/blog\/2024\/11\/17\/comparing-distributions\/","url_meta":{"origin":279,"po
sition":0},"title":"Comparing Distributions","author":"AllenDowney","date":"November 17, 2024","format":false,"excerpt":"This is the second in a series of excerpts from Elements of Data Science, which is available from Lulu.com and online booksellers. It's from Chapter 8, which is about representing distributions using PMFs and CDFs. This section explains why I think CDFs are often better for plotting and comparing distributions. You can\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/11\/5c3184ea9ce15063868ee31f65fec22c0a7ac3dd86c19a9070b2d4c0653b6d1f.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":721,"url":"https:\/\/www.allendowney.com\/blog\/2022\/05\/09\/name-that-distribution\/","url_meta":{"origin":279,"position":1},"title":"The Student-t model of Long-Tailed Distributions","author":"AllenDowney","date":"May 9, 2022","format":false,"excerpt":"As I've mentioned, I'm working on a book called Probably Overthinking It, to be published in early 2023. It's intended for a general audience, so I'm not trying to do research, but I might have found something novel while working on a chapter about power law distributions. 
If you are\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/05\/djia-1.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":1390,"url":"https:\/\/www.allendowney.com\/blog\/2024\/10\/15\/bootstrapping-a-proportion\/","url_meta":{"origin":279,"position":2},"title":"Bootstrapping a Proportion","author":"AllenDowney","date":"October 15, 2024","format":false,"excerpt":"It's another installment in Data Q&A: Answering the real questions with Python. Previous installments are available from the Data Q&A landing page. Here\u2019s a question from the Reddit statistics forum. How do I use bootstrapping to generate confidence intervals for a proportion\/ratio? The situation is this: I obtain samples of\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2024\/10\/image-4.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":849,"url":"https:\/\/www.allendowney.com\/blog\/2023\/01\/28\/never-test-for-normality\/","url_meta":{"origin":279,"position":3},"title":"Never Test for Normality","author":"AllenDowney","date":"January 28, 2023","format":false,"excerpt":"Way back in 2013, I wrote this blog post explaining why you should never use a statistical test to check whether a sample came from a Gaussian distribution. 
I argued that data from the real world never come from a Gaussian distribution, or any other simple mathematical model, so the\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/01\/anderson1-1.png?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":808,"url":"https:\/\/www.allendowney.com\/blog\/2022\/10\/03\/the-long-tail-of-disaster\/","url_meta":{"origin":279,"position":4},"title":"The Long Tail of Disaster","author":"AllenDowney","date":"October 3, 2022","format":false,"excerpt":"In honor of NASA's successful DART mission, here's a relevant excerpt from my forthcoming book, Probably Overthinking It. On March 11, 2022, an astronomer near Budapest, Hungary detected a new asteroid, now named 2022 EB5, on a collision course with Earth. 
Less than two hours later, it exploded in the\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2022\/10\/longtail_183_0.png?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":1122,"url":"https:\/\/www.allendowney.com\/blog\/2023\/11\/29\/superbolts\/","url_meta":{"origin":279,"position":5},"title":"Superbolts","author":"AllenDowney","date":"November 29, 2023","format":false,"excerpt":"Probably Overthinking It is available to preorder now. You can get a 30% discount if you order from the publisher and use the code UCPNEW. You can also order from Amazon or, if you want to support independent bookstores, from Bookshop.org. 
Recently I read a Scientific American article about superbolts,\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/11\/superbolt1-1.png?resize=1400%2C800&ssl=1 4x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=279"}],"version-history":[{"count":5,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/279\/revisions"}],"predecessor-version":[{"id":296,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/279\/revisions\/296"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=279"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=279"},{"taxonomy":"post_tag","embeddabl
e":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}