{"id":434,"date":"2020-02-20T18:02:58","date_gmt":"2020-02-20T18:02:58","guid":{"rendered":"https:\/\/www.allendowney.com\/blog\/?p=434"},"modified":"2023-10-02T13:29:46","modified_gmt":"2023-10-02T13:29:46","slug":"correlation-determination-and-prediction-error","status":"publish","type":"post","link":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/","title":{"rendered":"Correlation, determination, and prediction error"},"content":{"rendered":"\n<p><a href=\"https:\/\/twitter.com\/mumbrainstats\/status\/1230140083631775745\">This tweet<\/a> appeared in my feed recently:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\" alt=\"\" class=\"wp-image-439\" style=\"width:476px;height:235px\" width=\"476\" height=\"235\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png 736w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56-300x148.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56-547x270.png 547w\" sizes=\"auto, (max-width: 476px) 100vw, 476px\" \/><\/figure>\n\n\n\n<p>I wrote about this topic in <em><a href=\"https:\/\/allendowney.github.io\/ElementsOfDataScience\/\">Elements of Data Science<\/a><\/em>\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/master\/09_relationships.ipynb\" target=\"_blank\">Notebook 9<\/a>, where I suggest that using Pearson&#8217;s coefficient of correlation, usually denoted r, to summarize the relationship between two variables is problematic because:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Correlation only quantifies the linear relationship between variables; 
if the relationship is non-linear, correlation tends to underestimate it.<\/li>\n\n\n\n<li>Correlation does not quantify the &#8220;strength&#8221; of the relationship in terms of slope, which is often more important in practice.<\/li>\n<\/ol>\n\n\n\n<p>For an explanation of either of those points, see <a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/master\/09_relationships.ipynb\">the discussion in Notebook 9<\/a>.  But that tweet and the responses got me thinking, and now I think there are even more reasons correlation is not a great statistic:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It is hard to interpret as a measure of predictive power.<\/li>\n\n\n\n<li>It makes the relationship between variables sound more impressive than it is.<\/li>\n<\/ol>\n\n\n\n<p>As an example, I&#8217;ll quantify the relationship between SAT scores and IQ tests. I know this is a contentious topic; people have strong feelings about the SAT, IQ, and the consequences of using standardized tests for college admissions.<\/p>\n\n\n\n<p>I chose this example <em>because<\/em> it is a topic people care about, and I think the analysis I present can contribute to the discussion.<\/p>\n\n\n\n<p>But a similar analysis applies in any domain where we use a correlation to quantify the strength of a relationship between two variables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"SAT-scores-and-IQ\">SAT scores and IQ<\/h3>\n\n\n\n<p>According to Frey and Detterman, &#8220;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/15147489\" target=\"_blank\">Scholastic Assessment or g? The relationship between the Scholastic Assessment Test and general cognitive ability<\/a>&#8220;, the correlation between SAT scores and general intelligence (<em>g<\/em>) is 0.82.<\/p>\n\n\n\n<p>That&#8217;s just one study, and if you read the paper, you might have questions about the methodology. 
But for now I will take this estimate at face value. If you find another source that reports a different correlation, feel free to plug in another value and run my analysis again.  <\/p>\n\n\n\n<p><a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/master\/correlation.ipynb\">In the notebook<\/a>, I generate fake datasets with the same mean and standard deviation as the SAT and the IQ, and with a correlation of 0.82.<\/p>\n\n\n\n<p>Then I use them to compute <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The coefficient of determination, <em>R<\/em>\u00b2,<\/li>\n\n\n\n<li>The mean absolute error (MAE),<\/li>\n\n\n\n<li>Root mean squared error (RMSE), and<\/li>\n\n\n\n<li>Mean absolute percentage error (MAPE).<\/li>\n<\/ul>\n\n\n\n<p>In the SAT-IQ example, the correlation is 0.82, which is a strong correlation, but I think it sounds stronger than it is.<\/p>\n\n\n\n<p><em>R<\/em>\u00b2&nbsp;is 0.66, which means we can reduce variance by 66%. But that also makes the relationship sound stronger than it is.<\/p>\n\n\n\n<p>Using SAT scores to predict IQ, we can reduce MAE by 44%, we can reduce RMSE by 42%, and we can reduce MAPE also by 42%.  <\/p>\n\n\n\n<p>Admittedly, these are substantial reductions.  If you have to guess someone&#8217;s IQ (for some reason) your guesses will be more accurate if you know their SAT scores.<\/p>\n\n\n\n<p>But any of these reductions in error is substantially more modest than the correlation might lead you to believe.<\/p>\n\n\n\n<p>The same pattern holds over the range of possible correlations.  
The following figure shows <em>R<\/em>\u00b2 and the fractional improvement in RMSE as a function of correlation:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/corr3.png\" alt=\"\" class=\"wp-image-435\" style=\"width:503px;height:396px\" width=\"503\" height=\"396\" srcset=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/corr3.png 372w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/corr3-300x236.png 300w, https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/corr3-343x270.png 343w\" sizes=\"auto, (max-width: 503px) 100vw, 503px\" \/><\/figure>\n\n\n\n<p>For all correlations strictly between 0 and 1, <em>R<\/em>\u00b2&nbsp;is less than the correlation, and the reduction in RMSE is even less than that.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Summary<\/h3>\n\n\n\n<p>Correlation is a problematic statistic because it sounds more impressive than it is.<\/p>\n\n\n\n<p>Coefficient of determination,&nbsp;<em>R<\/em>\u00b2,&nbsp;is a little better because it has a more natural interpretation: percentage reduction in variance. But reducing variance is usually not what we care about.<\/p>\n\n\n\n<p>A better option is to choose a measure of error that is meaningful in context, possibly MAE, RMSE, or MAPE.<\/p>\n\n\n\n<p>Which one of these is most meaningful depends on the cost function. Does the cost of being wrong depend on the absolute error, squared error, or percentage error? If so, that should guide your choice.<\/p>\n\n\n\n<p>One advantage of RMSE is that we don&#8217;t need the data to compute it; we only need the variance of the dependent variable and either\u00a0r\u00a0or\u00a0<em>R<\/em>\u00b2.  
So if you read a paper that reports <em>r<\/em>, you can compute the corresponding reduction in RMSE.<\/p>\n\n\n\n<p>But any measure of predictive error is more meaningful than reporting correlation or&nbsp;<em>R<\/em>\u00b2.<\/p>\n\n\n\n<p><a href=\"https:\/\/colab.research.google.com\/github\/AllenDowney\/ElementsOfDataScience\/blob\/master\/correlation.ipynb\">The details of my analysis are in this Jupyter notebook.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tweet appeared in my feed recently: I wrote about this topic in Elements of Data Science\u00a0Notebook 9, where I suggest that using Pearson&#8217;s coefficient of correlation, usually denoted r, to summarize the relationship between two variables is problematic because: For an explanation of either of those points, see the discussion in Notebook 9. But that tweet and the responses got me thinking, and now I think there are even more reasons correlation is not a great statistic: As an&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\"> Read More<span class=\"screen-reader-text\">  Read 
More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[67,69,68,70],"class_list":["post-434","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-correlation","tag-error","tag-regression","tag-rmse"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Correlation, determination, and prediction error - Probably Overthinking It<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Correlation, determination, and prediction error - Probably Overthinking It\" \/>\n<meta property=\"og:description\" content=\"This tweet appeared in my feed recently: I wrote about this topic in Elements of Data Science\u00a0Notebook 9, where I suggest that using Pearson&#8217;s coefficient of correlation, usually denoted r, to summarize the relationship between two variables is problematic because: For an explanation of either of those points, see the discussion in 
Notebook 9. But that tweet and the responses got me thinking, and now I think there are even more reasons correlation is not a great statistic: As an... Read More Read More\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\" \/>\n<meta property=\"og:site_name\" content=\"Probably Overthinking It\" \/>\n<meta property=\"article:published_time\" content=\"2020-02-20T18:02:58+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-10-02T13:29:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\" \/>\n<meta name=\"author\" content=\"AllenDowney\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:site\" content=\"@AllenDowney\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"AllenDowney\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\"},\"author\":{\"name\":\"AllenDowney\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\"},\"headline\":\"Correlation, determination, and prediction error\",\"datePublished\":\"2020-02-20T18:02:58+00:00\",\"dateModified\":\"2023-10-02T13:29:46+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\"},\"wordCount\":697,\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\",\"keywords\":[\"correlation\",\"error\",\"regression\",\"RMSE\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\",\"name\":\"Correlation, determination, and prediction error - Probably Overthinking 
It\",\"isPartOf\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\",\"datePublished\":\"2020-02-20T18:02:58+00:00\",\"dateModified\":\"2023-10-02T13:29:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png\",\"width\":736,\"height\":363},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.allendowney.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Correlation, determination, and prediction error\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#website\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"name\":\"Probably Overthinking It\",\"description\":\"Data science, Bayesian Statistics, and other 
ideas\",\"publisher\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#organization\",\"name\":\"Probably Overthinking It\",\"url\":\"https:\/\/www.allendowney.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"contentUrl\":\"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png\",\"width\":714,\"height\":784,\"caption\":\"Probably Overthinking It\"},\"image\":{\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/AllenDowney\",\"https:\/\/www.linkedin.com\/in\/allendowney\/\",\"https:\/\/bsky.app\/profile\/allendowney.bsky.social\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207\",\"name\":\"AllenDowney\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g\",\"caption\":\"AllenDowney\"},\"url\":\"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Correlation, determination, and prediction error - Probably Overthinking It","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/","og_locale":"en_US","og_type":"article","og_title":"Correlation, determination, and prediction error - Probably Overthinking It","og_description":"This tweet appeared in my feed recently: I wrote about this topic in Elements of Data Science\u00a0Notebook 9, where I suggest that using Pearson&#8217;s coefficient of correlation, usually denoted r, to summarize the relationship between two variables is problematic because: For an explanation of either of those points, see the discussion in Notebook 9. But that tweet and the responses got me thinking, and now I think there are even more reasons correlation is not a great statistic: As an... Read More Read More","og_url":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/","og_site_name":"Probably Overthinking It","article_published_time":"2020-02-20T18:02:58+00:00","article_modified_time":"2023-10-02T13:29:46+00:00","og_image":[{"url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png","type":"","width":"","height":""}],"author":"AllenDowney","twitter_card":"summary_large_image","twitter_creator":"@AllenDowney","twitter_site":"@AllenDowney","twitter_misc":{"Written by":"AllenDowney","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#article","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/"},"author":{"name":"AllenDowney","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207"},"headline":"Correlation, determination, and prediction error","datePublished":"2020-02-20T18:02:58+00:00","dateModified":"2023-10-02T13:29:46+00:00","mainEntityOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/"},"wordCount":697,"publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png","keywords":["correlation","error","regression","RMSE"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/","url":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/","name":"Correlation, determination, and prediction error - Probably Overthinking 
It","isPartOf":{"@id":"https:\/\/www.allendowney.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage"},"thumbnailUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png","datePublished":"2020-02-20T18:02:58+00:00","dateModified":"2023-10-02T13:29:46+00:00","breadcrumb":{"@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#primaryimage","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/02\/Screenshot-at-2020-02-20-12-51-56.png","width":736,"height":363},{"@type":"BreadcrumbList","@id":"https:\/\/www.allendowney.com\/blog\/2020\/02\/20\/correlation-determination-and-prediction-error\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.allendowney.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Correlation, determination, and prediction error"}]},{"@type":"WebSite","@id":"https:\/\/www.allendowney.com\/blog\/#website","url":"https:\/\/www.allendowney.com\/blog\/","name":"Probably Overthinking It","description":"Data science, Bayesian Statistics, and other 
ideas","publisher":{"@id":"https:\/\/www.allendowney.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.allendowney.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.allendowney.com\/blog\/#organization","name":"Probably Overthinking It","url":"https:\/\/www.allendowney.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","contentUrl":"https:\/\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/03\/probably_logo.png","width":714,"height":784,"caption":"Probably Overthinking It"},"image":{"@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/AllenDowney","https:\/\/www.linkedin.com\/in\/allendowney\/","https:\/\/bsky.app\/profile\/allendowney.bsky.social"]},{"@type":"Person","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/4e5bfb2e9af6c3446cb0031a7bf83207","name":"AllenDowney","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.allendowney.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb01b3a7f7190bea1bbf7f0852e686c2f8c03b099222df2ce4bc7926f15bcb43?s=96&d=mm&r=g","caption":"AllenDowney"},"url":"https:\/\/www.allendowney.com\/blog\/author\/allendowney_6dbrc4\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":482,"url":"https:\/\/www.allendowney.com\/blog\/2020\/10\/13\/whatever-the-question-was-correlation-is-not-the-answer\
/","url_meta":{"origin":434,"position":0},"title":"Whatever the question was, correlation is not the answer","author":"AllenDowney","date":"October 13, 2020","format":false,"excerpt":"Pearson's coefficient of correlation, r, is one of the most widely-reported statistics. But in my opinion, it is useless; there is no good reason to report it, ever. Most of the time, what you really care about is either effect size or predictive value: To quantify effect size, report the\u2026","rel":"","context":"In \"BRFSS\"","block_context":{"text":"BRFSS","link":"https:\/\/www.allendowney.com\/blog\/tag\/brfss\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2020\/10\/image-1.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":1034,"url":"https:\/\/www.allendowney.com\/blog\/2023\/10\/02\/what-size-is-that-correlation\/","url_meta":{"origin":434,"position":1},"title":"What size is that correlation?","author":"AllenDowney","date":"October 2, 2023","format":false,"excerpt":"This article is related to Chapter 6 of Probably Overthinking It, which is available for preorder now. It is also related to a new course at Brilliant.org, Explaining Variation. Suppose you find a correlation of 0.36. How would you characterize it? 
I posed this question to the stalwart few still\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/10\/Screenshot-at-2023-10-01-13-51-49.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/10\/Screenshot-at-2023-10-01-13-51-49.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/10\/Screenshot-at-2023-10-01-13-51-49.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/10\/Screenshot-at-2023-10-01-13-51-49.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":1618,"url":"https:\/\/www.allendowney.com\/blog\/2025\/10\/16\/simpsons-what\/","url_meta":{"origin":434,"position":2},"title":"Simpson&#8217;s What?","author":"AllenDowney","date":"October 16, 2025","format":false,"excerpt":"I like Simpson\u2019s paradox so much I wrote three chapters about it in Probably Overthinking It. In fact, I like it so much I have a Google alert that notifies me when someone publishes a new example (or when the horse named Simpson\u2019s Paradox wins a race). 
So I was\u2026","rel":"","context":"In \"epidemiology\"","block_context":{"text":"epidemiology","link":"https:\/\/www.allendowney.com\/blog\/tag\/epidemiology\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2025\/10\/image-1.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":536,"url":"https:\/\/www.allendowney.com\/blog\/2021\/04\/07\/berkson-goes-to-college\/","url_meta":{"origin":434,"position":3},"title":"Berkson Goes to College","author":"AllenDowney","date":"April 7, 2021","format":false,"excerpt":"This article is like a pre-excerpt from my forthcoming book, Probably Overthinking It: a revised version of this article make up part of the chapter about Berkson's paradox. If you would like to get an occasional update about the book, please join my mailing list. Suppose one day you visit\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2021\/04\/berkson5.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2021\/04\/berkson5.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2021\/04\/berkson5.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2021\/04\/berkson5.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":996,"url":"https:\/\/www.allendowney.com\/blog\/2023\/08\/20\/how-correlated-are-you\/","url_meta":{"origin":434,"position":4},"title":"How Correlated Are You?","author":"AllenDowney","date":"August 20, 2023","format":false,"excerpt":"This post is an offshoot from Chapter 1 of Probably Overthinking It, which is available for pre-order now! Suppose you measure the arm and leg lengths of 4082 people. 
You would expect those measurements to be correlated, and you would be right. In the ANSUR-II dataset, among male members of\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/www.allendowney.com\/blog\/wp-content\/uploads\/2023\/08\/image-7.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":902,"url":"https:\/\/www.allendowney.com\/blog\/2023\/04\/02\/llm-assisted-programming\/","url_meta":{"origin":434,"position":5},"title":"LLM-Assisted Programming","author":"AllenDowney","date":"April 2, 2023","format":false,"excerpt":"I've been experimenting with programming assisted by Large Language Models (LLMs) like ChatGPT. I am amazed at how good it is, and it seems clear to me that the great majority of programming work will be LLM-assisted, starting now. Here are some of the examples I've tried. 
Think Python For\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/434","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/comments?post=434"}],"version-history":[{"count":6,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/434\/revisions"}],"predecessor-version":[{"id":1046,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/posts\/434\/revisions\/1046"}],"wp:attachment":[{"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/media?parent=434"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/categories?post=434"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.allendowney.com\/blog\/wp-json\/wp\/v2\/tags?post=434"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}