The library of data visualization
Getting ready for my Data Science class (starting next week!), I am updating my data visualization library and looking for resources to help students learn about visualization.
Last week I asked Twitter to help me find resources, especially new ones. Here’s the thread. Thank you to everyone who responded!
I’ll try to summarize and organize the responses. I am mostly interested in books and web pages about visualization, rather than examples of it or tools for doing it.
There are lots of good books; to impose some order, I put them in three categories: newer work, the usual suspects, and moldy oldies.
Newer books
The following are some newer books (or at least new to me).
Fundamentals of Data Visualization, by Claus O. Wilke (online preview of a book forthcoming from O’Reilly)
Data Visualization: A Practical Introduction by Kieran Healy (free online draft)
Data Visualization: Charts, Maps, and Interactive Graphics by Robert Grant
Data Visualisation: A Handbook for Data Driven Design by Andy Kirk
Dear Data by Giorgia Lupi, Stefanie Posavec
Established books
The following are more established books that appear on most lists.
The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo
The Truthful Art: Data, Charts, and Maps for Communication by Alberto Cairo
Interactive Data Visualization for the Web by Scott Murray
Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
Beautiful Visualization: Looking at Data through the Eyes of Experts by Julie Steele
Designing Data Visualizations: Representing Informational Relationships by Noah Iliinsky, Julie Steele
Visualization Analysis and Design by Tamara Munzner
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics by Nathan Yau
Data Points: Visualization That Means Something by Nathan Yau
Show Me the Numbers: Designing Tables and Graphs to Enlighten by Stephen Few
Now You See It: Simple Visualization Techniques for Quantitative Analysis by Stephen Few
Older books
The Visual Display of Quantitative Information by Edward R. Tufte
The Elements of Graphing Data by William S. Cleveland
Websites and blogs
Again, I mostly went for sites that are about visualization, rather than examples of it.
The Data Visualisation Catalogue
More references and resources from MPA 635: DATA VISUALIZATION
Videos and podcasts
The Art of Data Visualization | Off Book | PBS Digital Studios
Data Stories: a podcast on data visualization with Enrico Bertini and Moritz Stefaner
Python-specific resources
Python Plotting for Exploratory Data Analysis
How to visualize data in Python

Nelis Willers “wrote a 510 page book with LaTeX, using 

For the first 9 months, from September to May, we see what we would expect if at least some of the excess diagnoses are due to age-related behavior differences. For each month of difference in age, we see an increase in the number of diagnoses.
This pattern breaks down for the last three months, June, July, and August. This might be explained by random variation, but it also might be due to parental intervention; if some parents hold back students born near the deadline, the observations for these months include some children who are relatively old for their grade and therefore less likely to be diagnosed.
We could test this hypothesis by checking the actual ages of these students when they started school, rather than just looking at their months of birth. I will see whether that additional data is available; in the meantime, I will proceed taking the data at face value.
I fit the data using a Bayesian logistic regression model, assuming a linear relationship between month of birth and the log-odds of diagnosis. The following figure shows the fitted models superimposed on the data.
Most of these regression lines fall within the credible intervals of the observed rates, so in that sense this model is not ruled out by the data. But it is clear that the lower rates in the last 3 months bring down the estimated slope, so we should probably consider the estimated effect size to be a lower bound on the true effect size.
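For readers who want to see the shape of such a model, here is a minimal sketch of a Bayesian logistic regression with log-odds linear in month of birth, computed by grid approximation. It is not the code from my notebook: the counts are made up for illustration, the uniform prior and the grid ranges are assumptions, and the names (`children`, `diagnoses`, `log_likelihood`) are hypothetical.

```python
import math

# Hypothetical counts for illustration only (NOT the data from the study).
# Month 0 = September ... month 11 = August.
children = [10_000] * 12
diagnoses = [70, 72, 75, 74, 78, 80, 83, 82, 86, 84, 80, 79]


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def log_likelihood(intercept, slope):
    """Binomial log-likelihood (constant term dropped) with
    log-odds of diagnosis linear in month of birth."""
    total = 0.0
    for month, (n, k) in enumerate(zip(children, diagnoses)):
        p = 1 / (1 + math.exp(-(intercept + slope * month)))
        total += k * math.log(p) + (n - k) * math.log1p(-p)
    return total


# Assumed grid ranges, chosen to cover the plausible values for these counts.
intercepts = linspace(-5.2, -4.6, 101)
slopes = linspace(-0.02, 0.06, 101)

# Uniform prior over the grid, so the posterior is proportional to the likelihood.
log_post = {(a, b): log_likelihood(a, b) for a in intercepts for b in slopes}
peak = max(log_post.values())  # subtract the peak to avoid underflow
posterior = {ab: math.exp(lp - peak) for ab, lp in log_post.items()}
norm = sum(posterior.values())
posterior = {ab: w / norm for ab, w in posterior.items()}

# Posterior mean of the slope: change in log-odds per month of birth.
slope_mean = sum(b * w for (a, b), w in posterior.items())
print(f"posterior mean slope: {slope_mean:.4f}")
```

A grid works here because there are only two parameters; with more predictors, an MCMC library would be the more practical choice.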
To express this effect size in a way that’s easier to interpret, I used the posterior predictive distributions to estimate the difference in diagnosis rate for children born in September and August. The difference is 21 diagnoses per 10,000, with 95% credible interval (13, 30).
As a percentage of the baseline (71 diagnoses per 10,000), that’s an increase of 30%, with credible interval (18%, 42%).
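As a quick check on the arithmetic, the rounded figures above are consistent with each other. This sketch just divides the reported summaries by the baseline; it assumes nothing beyond the numbers in the text, and it is not how the percentages were derived in the analysis (those would come from the posterior samples directly).

```python
baseline = 71          # diagnoses per 10,000, from the text
difference = 21        # estimated difference, per 10,000
interval = (13, 30)    # 95% credible interval, per 10,000


def percent_of_baseline(value, base):
    """Express value as a percentage of base."""
    return 100 * value / base


relative = round(percent_of_baseline(difference, baseline))
relative_interval = tuple(round(percent_of_baseline(x, baseline)) for x in interval)

print(relative, relative_interval)  # 30 (18, 42)
```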
However, if it turns out that the observed rates for June, July, and August are brought down by red-shirting (parents holding back children born near the deadline), the effect could be substantially higher. Here’s what the model looks like if we exclude those months:
Of course, it is hazardous to exclude data points because they violate expectations, so this result should be treated with caution. But under this assumption, the difference in diagnosis rate would be 42 per 10,000. On a base rate of 67, that’s an increase of 62%.
Here is the notebook with the details of my analysis:
Here’s another Bayes puzzle: