Backlash of Homophobia?

Last week I published an excerpt from Probably Overthinking It that showed a long-term decline in homophobic responses to questions in the General Social Survey, starting around 1990 and continuing in the most recent data.

Then I heard from a friend that Gallup published an article just a few weeks ago, with the title “Fewer in U.S. Say Same-Sex Relations Morally Acceptable”.

It features this graph, which shows that after a consistent increase from 2001 to 2022, the percentage of respondents who said same-sex relations are morally acceptable declined from 71% to 64% in 2023.

Looking at the whole time series, there are several reasons I don’t think this change reflects a long-term reversal in the population:

1) The variation from year to year is substantial. This year’s drop is bigger than most, but not an outlier. I conjecture that some of the variation from year to year is due to short-term period effects — like whatever people were reading about in the news in the interval before they were surveyed.

2) Even with the drop, the most recent point is not far below the long-term trend.

3) Last year was a record high, so a part of the drop is regression to the mean.

4) A large part of the trend is due to generational replacement, so unless young people die and are replaced by old people, that can’t go into reverse.

5) The other part of the trend is due to changed minds. While it’s possible for that to go into reverse, I start with a strong prior that it will not. In general, the moral circle expands.

Taken together, I would make a substantial bet that next year’s data point will be 3 or more percentage points higher, and I would not be surprised by 7-10.

The Data

Gallup makes it easy to download the data from the article, so I’ll use it to make my argument more quantitative. Here’s the time series.

The responses vary from year to year. Here is the distribution of the differences in percentage points.

Changes of 4 percentage points in either direction are not unusual. This year’s decrease of 7 points is bigger than what we’ve seen in the past, but not by much.

This figure shows the time series again, along with a smooth curve fit by local regression (LOWESS).

Since last year’s point was above the long-term trend, we would have expected this year’s point to be lower by about 1 percentage point, just by returning to the trend line.

That leaves 6 points unaccounted for. To get a sense of how unexpected a drop that size is, we can compute the average and standard deviation of the distances from the points to the regression line. The mean is 1.7 points, and the standard deviation is 1.3.

So a two-sigma event is a 4.2 point distance, and a three-sigma event is a 5.4 point distance.
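Here’s a sketch of that computation. The column names are placeholders for whatever the Gallup download actually uses, and the LOWESS frac is a guess, not the exact value from my notebook:

import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

# hypothetical column names; substitute the ones in the Gallup download
df = pd.read_csv("gallup_same_sex_relations.csv")
year = df["year"].values
pct = df["acceptable"].values

# year-to-year changes, in percentage points
diffs = np.diff(pct)

# smooth curve fit by local regression
smooth = lowess(pct, year, frac=0.5, return_sorted=False)

# distances from the points to the regression line
dist = np.abs(pct - smooth)
print(dist.mean(), dist.std())   # about 1.7 and 1.3 points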

Of the 7-point drop:

  • 1 point is what we’d expect from a return to the long-term trend.
  • 4-5 points are within the range of random variation we’ve seen from year to year.

Which leaves 1-2 points that could be a genuine period effect.

But I think it’s likely to be short term. As the Gallup article notes, “From a longer-term perspective, Americans’ opinions of most of these issues have trended in a more liberal direction in the 20-plus years Gallup has asked about them.”

And there are two reasons I think these trends are likely to continue.

One reason is the expansion of the moral circle, an idea proposed by historian William Lecky in 1869. He wrote:

“At one time the benevolent affections embrace merely the family, soon the circle expanding includes first a class, then a nation, then a coalition of nations, then all humanity, and finally, its influence is felt in the dealings of man with the animal world.”

Lecky, A History of European Morals from Augustus to Charlemagne

Historically, the expansion of the moral circle seldom goes in reverse, and never for long.

The other reason is generational replacement. Older people are substantially more likely to think homosexuality is not moral. As they die, they are replaced by younger people who have no problem with it.

The only way for that trend to go in reverse is if a very large, long-term period effect somehow convinces Gen Z and their successors that they were mistaken and — actually — homosexuality is wrong.

I predict that next year’s data point will be substantially higher than this year’s.

Here’s the notebook where I created these plots.

Go Get the Data

My mantra when I was working on Probably Overthinking It was “Go Get the Data.” If I wanted to use a result from prior work, I would get the data whenever possible and make my own visualization. Of course, that’s more work than copying and pasting a figure, but there are a lot of benefits. One is that I can often get newer data. Another is that I can check the results.

I was reminded of these benefits a few months ago when I was reading a very good book that I won’t name because I’m about to point out an error, and I don’t want to be a jerk about it. The book includes the following figure:

Source: A book I’m not identifying because the point of this example is not to name and shame.

The top line got my attention because I have worked on several projects with data from the General Social Survey (GSS), and this particular question is one of the examples in my Political Alignment Case Study. Plotting the fraction of people who think homosexuality is wrong, I generated this figure:

Source: Downey, Political Alignment Case Study.

These results are not the same (even accounting for the inverted y-axis).

So I checked the source, which is Steven Pinker’s The Better Angels of Our Nature, specifically this figure:

Source: Page 452 of Pinker, The Better Angels of Our Nature

It looks like Pinker’s figure is consistent with mine. So what went wrong with the first figure?

Here’s what I think happened: they wanted to invert the y-axis to show the percentage who think homosexuality is not wrong. But instead of flipping the y-axis, they seem to have flipped the x-axis, reversing the arrow of time! If you compare the other lines, it looks like they have all been reversed left-for-right.

Now, this is just one error in an otherwise excellent book. But it’s a reminder that this kind of thing can be avoided, if you Go Get the Data.

Abstracts and keywords

As Probably Overthinking It approaches the finish line, there are just a few more tasks: I am working on the index and — as I have recently learned — I also have to write a 200-word abstract, a list of keywords for each chapter, and a 250-word abstract for the whole book. I have come to two conclusions:

  • There is no good software for making an index from a PDF document.
  • ChatGPT is pretty good at writing chapter abstracts.

By converting each chapter to a simple text file (no figures or tables), I was able to get most of them under the ~4000-token limit. Then I asked, “write a one-paragraph abstract for this chapter with no more than 180 words. write about the chapter in the third person”.
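For anyone doing the same, here’s a rough way to check the count with OpenAI’s tiktoken library (the encoding name and the filename are assumptions):

import tiktoken

# cl100k_base is the encoding used by recent OpenAI chat models
encoding = tiktoken.get_encoding("cl100k_base")

with open("chapter12.txt") as f:   # hypothetical filename
    text = f.read()

print(len(encoding.encode(text)))   # needs to fit under the ~4000-token limit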

Here’s the unedited abstract for the last chapter:

The chapter titled “Chasing the Overton Window” explores the phenomenon of people considering themselves more conservative as they age, despite their views actually becoming more liberal over time. The chapter begins by examining the trends of self-identification and political views across different age groups. It then delves into the concept of the Overton window, which represents the range of acceptable public opinion on a given issue. The chapter explains how the center of public opinion has shifted toward liberalism over the past 50 years and how the relative terms “liberal” and “conservative” depend on the perceived center. Furthermore, it demonstrates that both liberals and conservatives have become more liberal over time. The chapter concludes by proposing an explanation for why people think they are becoming more conservative, even though their views are becoming more liberal, which involves the interplay of generational effects, the shifting center of public opinion, and the connotations associated with political labels.

ChatGPT June 10, 2023

It’s not great prose, but I think I can revise it into something acceptable without much effort.

Three of the chapters exceeded the token limit, so I asked for a summary of the first half, then a summary of the second half, then I asked, “Combine the following two abstracts into a single paragraph with no more than 180 words”. Here’s the combined abstract of Chapter 8:

This chapter delves into the distribution of natural and human-caused disasters, investigating their sizes, costs, prevalence, and characteristics within long-tailed distributions. Understanding the probabilities of major disasters is crucial for effective preparedness and response, despite the challenge of comprehending rare and large-scale events. By analyzing a dataset of 125 disasters, including hurricanes, earthquakes, floods, nuclear disasters, and terror attacks, the author demonstrates a pattern where doubling the rank of a disaster corresponds to halving its costs when plotted on a logarithmic scale. While exploring the limitations of the lognormal distribution in predicting the probabilities of large disasters, the author introduces Student’s t-distribution as a more suitable model for estimating the probabilities of extreme events. The chapter also examines lunar craters and their abundance and sizes, revealing the prevalence of long-tailed distributions and their connection to asteroid sizes. Additionally, it explores the occurrence of long-tailed distributions in stock market crashes and introduces the concept of black swans to emphasize their relevance in understanding rare and impactful events. Concluding the chapter, it discusses the challenges associated with predicting and comprehending rare, large events in a long-tailed world, with a specific focus on earthquake magnitudes and a comparison of prediction models.

ChatGPT June 10, 2023

Again, I think that’s within editing distance of acceptable — and a near-perfect 198 words.

It does pretty well with keywords, too:

  1. Disasters
  2. Long-tailed distributions
  3. Probabilities
  4. Preparedness
  5. Response
  6. Natural events
  7. Human-made incidents
  8. Lognormal distribution
  9. Student’s t-distribution
  10. Black swans

To generate the abstract for the whole book, I pasted the chapter abstracts and keywords into a single document and asked, “the following are chapter abstracts for a book called ‘Probably Overthinking It’. write a one-paragraph abstract for the whole book, more than 200 words, but not more than 250 words. write about the book in the third person. not too much like an ad”

Here’s the unedited result:

“Probably Overthinking It” is a thought-provoking book that delves into various statistical and cognitive paradoxes, challenging common intuitions and shedding light on the complexities of human behavior and measurement. Through a series of captivating chapters, the author explores topics such as normalcy and weirdness, inspection paradoxes, family size and population growth, lognormal distributions, expected remaining time, Berkson’s paradox, low birthweight paradox, long-tailed distributions, fairness and fallacy in algorithmic decision-making, Simpson’s paradox, the moral circle, the evolution of societal values, and the shifting perceptions of political ideologies. By analyzing real-world data and employing statistical models, the book uncovers the nuances and contradictions that emerge when multiple factors interact, leading to counterintuitive outcomes. Readers will gain a deeper understanding of statistical biases, the limitations of measurement, and the challenges inherent in making accurate predictions and drawing meaningful conclusions. “Probably Overthinking It” encourages readers to question their assumptions, approach complex problems with skepticism, and develop a more nuanced and informed perspective on the intricacies of human behavior and the world around us.

ChatGPT June 10, 2023

Surprisingly, the book abstract is only 169 words, despite my prompt. In general, the word counts are not precise, and more often over than under (which is why I asked for 180, not 200).

So, I have some editing to do, but I’d say ChatGPT saved me at least a few hours — and spared me from exactly the kind of writing I dislike the most.

Absence of Evidence

If anyone tells you that absence of evidence is not evidence of absence, you have my permission to slap them. Of course, my permission will not prevent you from getting slapped back or charged with assault. Regardless, absence of evidence is very often evidence of absence, and sometimes strong evidence.

To make this claim precise, I propose we use the Bayesian definition of evidence:

If an observation, D, would be more likely under a hypothesis, H, than under the alternative hypothesis, then D is evidence in favor of H. Conversely, if D is less likely under H than under the alternative, D is evidence against H.

As an example, suppose H is the hypothesis that unicorns exist. Since people have explored most of the world’s land mass, I’d say there’s a 99% chance we would have found unicorns if they existed.

So if D is the fact that we have not found unicorns, the probability of D is only 1% if unicorns exist, and 100% if they don’t. Therefore, D is evidence that unicorns don’t exist, with a likelihood ratio of 100:1.
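In odds form, the update is a single multiplication. Here’s the arithmetic for the unicorn example, starting (arbitrarily) from even prior odds:

# probability of the data (no unicorns found) under each hypothesis
p_d_given_h = 0.01       # if unicorns exist, 99% chance we'd have found them
p_d_given_not_h = 1.00   # if they don't exist, we certainly haven't found them

# Bayes's rule in odds form: posterior odds = prior odds * likelihood ratio
likelihood_ratio = p_d_given_h / p_d_given_not_h   # 1:100 in favor of H
prior_odds = 1.0                                   # even odds, an arbitrary starting point
posterior_odds = prior_odds * likelihood_ratio     # 0.01, that is, 100:1 against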

Let’s consider a more realistic example. In a recent article, The Economist discusses the hypothesis that social media use is a major cause of recent increases in rates of self-harm and suicide among teenage girls. To test this hypothesis, they propose an experiment:

Because smartphones were adopted at different rates in different countries, the timing of any increases they caused in suicides or self-harm should vary on this basis.

But their experiment came up empty:

[W]e could not find any statistical link between changes over time in the prevalence of either mobile-internet subscriptions or self-reported social-media use in a country, and changes over time in that country’s suicide or self-harm hospitalisation rates, for either boys or girls.

They conclude:

But if social media were the sole or main cause of rising levels of suicide or self-harm—rather than just one part of a complex problem—country-level data would probably show signs of their effect.

Since it did not, this negative result is evidence against the hypothesis. It may not be strong evidence; there are other reasons the experiment might have failed. And in light of other evidence, it is still plausible that social media is harmful to mental health.

Nevertheless, in this example, as in any reasonable experiment, absence of evidence is evidence of absence.

[In this 2015 article, I made a similar claim that we should stop saying correlation does not imply causation.]

Causation, Collision, and Confusion

Today I presented a talk about Berkson’s paradox at ODSC East 2023. If you missed it, the slides are here. When the video is available, I’ll post it here.

Abstract: Collision bias is the most treacherous error in statistics: it can be subtle, it is easy to induce it by accident, and the error it causes can be bigger than the effect you are trying to measure. It is the cause of Berkson’s paradox, the low birthweight paradox, and the obesity paradox, among other famous historical errors. And it might be the cause of your next blunder! Although it is best known in epidemiology, it appears in other fields of science, engineering, and business.

In this talk, I will present examples of collision bias and show how it can be caused by a biased sampling process or induced by inappropriate statistical controls; and I will introduce causal diagrams as a tool for representing causal hypotheses and diagnosing collision bias.

So, don’t tell anyone, but this talk is part of my stealth book tour!

  • It started in 2019, when I presented a talk at PyData NYC based on Chapter 2: Relay Races and Revolving Doors.
  • In 2022, I presented another talk at PyData NYC, based on Chapter 12: Chasing the Overton Window.
  • Today’s talk is based on Chapter 7: Causation, Collision, and Confusion.
  • In July I’m presenting a talk at SciPy based on Chapter 9: The Long Tail of Disaster.

And, if things go according to plan, I’ll present Chapter 1 at a book event at the Needham Public Library on December 7.

More chapters coming soon!

The Overton Paradox

Chapter 12 of Probably Overthinking It is about three trends that form what I’m calling the Overton Paradox:

  • Older people are more likely to say they are conservative.
  • And older people hold more conservative views.
  • But people don’t become more conservative as they get older — on average they get a little more liberal.

To demonstrate these trends, I used data from the General Social Survey.

Older people are more likely to say they are conservative:

And older people hold more conservative views:

But if we split people up by decade of birth, most cohorts don’t become more conservative as they get older; on average they become a little more liberal.

So if people become more liberal as they age, why are they more likely to say they are conservative?

I think the reason is that the perceived center of mass changes over time. Here’s how the average number of conservative responses has changed over the ~50 years of the GSS:

And it’s not just liberals going off the rails — all three groups have changed:

Let’s compare these changes to the average for people born in the 1940s:

In 1970, when they were in their 20s, this cohort was about as liberal as the average liberal. In 1990, when they were in their 40s, they were indistinguishable from the average moderate. In 2020, when they were in their 70s, they found themselves substantially right of center.

On average, they are more liberal now than they were in 1970, but the world has moved faster. They are more likely to say they are conservative because, relative to the center of mass, they are.

LLM-Assisted Programming

I’ve been experimenting with programming assisted by Large Language Models (LLMs) like ChatGPT. I am amazed at how good it is, and it seems clear to me that the great majority of programming work will be LLM-assisted, starting now. Here are some of the examples I’ve tried.

Think Python

For most of the exercises in Think Python, if you paste the question into ChatGPT, you get a correct answer on the first try. For example, here’s one of the exercises in the chapter on dictionaries:

Prompt: If you did Exercise 7, you already have a function named has_duplicates that takes a list as a parameter and returns True if there is any object that appears more than once in the list.

Use a dictionary to write a faster, simpler version of has_duplicates.

And here’s the response:

ChatGPT: Here’s one possible implementation of the has_duplicates function using a dictionary to keep track of the elements in the list:

def has_duplicates(lst):
    """
    Returns True if there are any duplicates in the given list, False otherwise.
    """
    seen = {}
    for item in lst:
        if item in seen:
            return True
        seen[item] = True
    return False

ChatGPT: The function works by iterating through the list and adding each item to the seen dictionary. If an item is already in the dictionary, it means that we’ve seen it before and therefore there’s a duplicate in the list, so we return True. If we iterate through the entire list without finding any duplicates, we return False.

This implementation is faster and simpler than the previous one because checking for the presence of an item in a dictionary is much faster than iterating through the entire list each time. Also, using a dictionary allows us to keep track of the elements in the list using less code.

One of the later exercises asks what’s wrong with this class definition:

class Kangaroo:
    """A Kangaroo is a marsupial."""
    
    def __init__(self, name, contents=[]):
        """Initialize the pouch contents.
        name: string
        contents: initial pouch contents.
        """
        self.name = name
        self.pouch_contents = contents

Here’s the response:

ChatGPT: The Kangaroo class has a potential issue in its __init__ method with the contents parameter. Specifically, the default value for the contents parameter is a mutable object (i.e., a list), which can cause unexpected behavior.

When the __init__ method is called without a contents argument, it creates a new instance of the list object and assigns it to the contents parameter. However, since the default value for contents is a mutable object, the same list object is used as the default value for all Kangaroo instances that don’t provide a contents argument.

This means that if you create two Kangaroo instances without providing a contents argument, both instances will share the same list object. This can lead to unexpected behavior if you modify the list of one Kangaroo instance, as the changes will be reflected in the other instance as well.

To fix this issue, you can use a more appropriate default value for the contents parameter, such as None, and then create a new list instance inside the __init__ method if no contents argument is provided, like this:

class Kangaroo:
    
    def __init__(self, name, contents=None):
        """Initialize the pouch contents.
        name: string
        contents: initial pouch contents.
        """
        self.name = name
        if contents is None:
            self.pouch_contents = []
        else:
            self.pouch_contents = contents

So that’s pretty good.

On Twitter, many people have helpfully pointed out that the exercises in Think Python, and solutions available on the web, are part of ChatGPT’s training corpus, so I am “testing on the training set”.

While that may be true, I think it misses the point several times over:

  1. The responses I saw were not my solutions, and based on a few Google searches, I could not find a source they were copied from. At worst, ChatGPT makes it easier and faster to find a correct solution than a conventional search. But it seems to be doing more than that, recombining elements of the training corpus in useful (possibly novel) ways.
  2. Even if LLMs do nothing more than retrieve solutions from a large database, that alone would accelerate the great majority of programming — so substantially that I think it will change the day-to-day work of most programmers.

Data Structures

One of my complaints about data structures classes is that most of the work students do is re-implementation of basic algorithms. Most of them have been available in libraries for a long time, which means there is an opportunity to spend less time on how they work and more time on how we use them to do more interesting things. I hope the prevalence of LLMs will accelerate this process.

Here’s one of the exercises from Data Structures and Information Retrieval in Python (aka DSIRP):

Prompt: Write a function called intersect that takes two LetterSet objects and returns a new LetterSet that represents the intersection of the two sets. In other words, the new LetterSet should contain only elements that appear in both sets. In a comment, identify the order of growth of this function.

I also provided the introduction to the question, which introduced the LetterSet object. Here’s the code ChatGPT generated:

def intersect(set1, set2):
    bits = [a and b for a, b in zip(set1.bits, set2.bits)]
    return LetterSet(bits)

And its explanation:

ChatGPT: The intersect function first creates a new list bits by using a list comprehension that computes the element-wise logical AND of the bits in the two LetterSet objects. The resulting list contains True values only for the elements that appear in both sets, and False values for all other elements.

The intersect function then creates a new LetterSet object using the bits list and returns it.

The order of growth of the intersect function is O(1), because the size of the LetterSet objects is fixed at 26 elements, and the function always performs a constant number of operations (i.e., create a new list using a list comprehension, create a new LetterSet object, and return it).
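To make the example self-contained, here’s a minimal sketch of the LetterSet object — not DSIRP’s exact definition, but enough to run the generated code:

class LetterSet:
    """A set of lowercase letters, represented as 26 booleans."""
    def __init__(self, bits):
        self.bits = bits

letters = "abcdefghijklmnopqrstuvwxyz"
set1 = LetterSet([c in "bat" for c in letters])
set2 = LetterSet([c in "tar" for c in letters])
both = intersect(set1, set2)   # using the generated function above; contains only 'a' and 't'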

Unlike Think Python, DSIRP is not well known, and if there are answers to these exercises on the web, they are not easy to find.

But again, I don’t think it matters where the answers are coming from. The impact on the practice of programming is the same: the solution to basic programming problems can be generated quickly, which means programmers will be able to spend more time on higher-value activities.

Probability of Superiority

Let’s see how ChatGPT does with something a little harder:

Prompt: write a python function that takes the parameters of two poisson distributions and computes the probability of superiority

Here’s the code (I’ll skip the explanatory text):

from math import exp, factorial

def poisson_probability_of_superiority(lambda1, lambda2):
    """
    Calculates the probability of superiority between two Poisson distributions.
    
    Parameters:
    lambda1 (float): the parameter of the first Poisson distribution.
    lambda2 (float): the parameter of the second Poisson distribution.
    
    Returns:
    float: the probability that a random variable drawn from the first Poisson distribution is greater than a random variable drawn from the second Poisson distribution.
    """
    prob = 0
    for i in range(int(lambda1) + 1):
        for j in range(int(lambda2)):
            prob += (exp(-lambda1) * lambda1**i / factorial(i)) * (exp(-lambda2) * lambda2**j / factorial(j))
    
    return prob

I asked for a version that uses SciPy and NumPy, and got this:

import numpy as np
from scipy.stats import poisson

def poisson_probability_of_superiority(lambda1, lambda2):
    """
    Calculates the probability of superiority between two Poisson distributions.
    
    Parameters:
    lambda1 (float): the parameter of the first Poisson distribution.
    lambda2 (float): the parameter of the second Poisson distribution.
    
    Returns:
    float: the probability that a random variable drawn from the first Poisson distribution is greater than a random variable drawn from the second Poisson distribution.
    """
    k = np.arange(int(lambda1) + 1)
    pmf1 = poisson.pmf(k, lambda1)
    cdf2 = poisson.cdf(k - 1, lambda2)
    prob = np.sum(pmf1 * (1 - cdf2))
    
    return prob

This is close to being a good solution, but there are still a couple of problems. However, after just a few more iterations, I got a good solution to this problem, and a similar solution that works with exponential distributions. You can see the sequence of prompts and responses in this notebook.

I have no idea where this solution came from. If you Google for a solution to this problem, the results are not very useful. If anyone reading this can find the source of this code, please let me know.
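For reference, here’s a sketch of the kind of corrected version I ended up with (not the exact code from the notebook). As far as I can tell from the generated code, the two problems are that it truncates the sums at int(lambda1) + 1, and that pmf1 * (1 - cdf2) adds up the probability that the second variable is greater than or equal to the first, not less. This sketch computes P(X > Y) by summing P(X = k) P(Y < k) over enough of the support to capture essentially all of the probability mass:

import numpy as np
from scipy.stats import poisson

def poisson_probability_of_superiority(lambda1, lambda2):
    """Probability that a draw from Poisson(lambda1) exceeds a draw from Poisson(lambda2)."""
    # sum over enough of the support to capture essentially all of the mass
    high = int(poisson.ppf(1 - 1e-12, max(lambda1, lambda2))) + 1
    k = np.arange(high + 1)
    pmf1 = poisson.pmf(k, lambda1)
    cdf2 = poisson.cdf(k - 1, lambda2)   # P(Y < k) = P(Y <= k-1)
    return np.sum(pmf1 * cdf2)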

Looking at the sequence of solutions ChatGPT generated, I got several ideas for solving this problem that I might not have come up with. The solutions I got still need some work, but I could polish them off without much additional effort. And I got ChatGPT to generate a unit test as well:

def test_exponential_probability_of_superiority():
    # Test case with known result
    lambda1 = 0.5
    lambda2 = 1.0
    expected_prob = 0.368
    tol = 0.001
    prob = exponential_probability_of_superiority(lambda1, lambda2)
    assert abs(prob - expected_prob) < tol, f"Test case failed: expected {expected_prob}, but got {prob}"

I would not have confidence in this solution without additional refactoring and testing, but ChatGPT got me off to a very fast start on a non-trivial problem.

Correlation in Elm

If you find yourself working with an unfamiliar language or libraries, LLMs can accelerate your learning — by a lot. Recently I started learning Elm, so I am not very good at it. And it is not a commonly used language, so I expect the corpus to be limited. Let’s see how it does on a basic stats question:

Prompt: write an elm function to compute the correlation of two lists of floats

Here’s the response:

correlation : List Float -> List Float -> Maybe Float
correlation xs ys =
    let
        n = List.length xs
        meanX = List.sum xs / Float.fromInt n
        meanY = List.sum ys / Float.fromInt n
        numerator = List.foldl (\(x, y) acc -> acc + ((x - meanX) * (y - meanY))) 0 (List.zip xs ys)
        denominatorX = List.foldl (\x acc -> acc + ((x - meanX) * (x - meanX))) 0 xs
        denominatorY = List.foldl (\y acc -> acc + ((y - meanY) * (y - meanY))) 0 ys
        denominator = sqrt (denominatorX * denominatorY)
    in
    if denominator == 0 then
        Nothing
    else
        Just (numerator / denominator)

That’s pretty good, and the explanation is even better:

ChatGPT: The correlation function takes in two lists of floats xs and ys and computes the Pearson correlation coefficient between them. The result is returned as a Maybe Float, as it is possible for the denominator to be 0 if one of the lists is empty or if all the elements in one of the lists are the same, in which case the correlation is undefined.

The function first computes the mean of each list and then uses these means to calculate the numerator of the correlation coefficient using a fold over the zipped lists. It then calculates the denominators for each list separately, and computes the denominator of the correlation coefficient by taking the product of these two denominators and then taking the square root. Finally, the function checks if the denominator is 0 and returns Nothing in that case, or returns the correlation coefficient as a Just Float.

To someone getting the hang of Elm’s type system, this is very helpful.

I asked it to generate a version using map instead of foldl, and another using existing functions to compute mean and variance. You can see the entire exchange in this notebook.

Coding is different now

When I wrote about these examples on Twitter, I got more disagreement than I expected. Lots of people reminded me of the limitations of LLMs for generating code. But again, I think this is missing the point. Even if LLMs only solve simple programming problems, there are a lot of simple programming problems! And I conjecture that most programmers spend most of their time on things that ChatGPT could greatly accelerate — or just solve.

And we’ve only been using them for a few weeks! LLMs will get better, and we will get better at using them. So I stand by my conclusion: The great majority of coding will be LLM-assisted, starting now.

LLMs will also have a huge effect on how we teach and learn programming, but I’ll get to that later.

Addendum: The New Skills

Most of the skills programmers use now are also the skills they will need to work with LLMs. Breaking a problem down into smaller problems, and designing good interfaces between components, are still essential skills. One difference is that now, for each of those smaller problems, programmers need to decide whether it would be easier and faster to solve it themselves or start a conversation with an LLM.

After deciding whether to start a conversation, the next big question is how to compose the prompt. In particular, it makes a big difference how much information is included as a preamble. Just as we all got better at composing search terms, we’ll get better at composing prompts. (I have a hard time not being polite to ChatGPT. I wonder if that will persist, or we’ll start writing blunt imperatives.)

And a final question is when to stop a conversation and work with the code you have, or ask for further refinements. In my experiments, it felt like I reached a point of diminishing returns, where further refinements were likely to introduce new errors. On the other hand, asking for at least two versions of a function produced useful variations.

At least for now, we cannot assume that code produced by an LLM is correct, which means it needs extensive testing. People who are used to test-driven development (TDD) will have a head start with LLM-assisted programming. Of course, we can use LLMs to generate unit tests as well, but then we have to validate the unit tests, too.

Which brings me to what I think will be the most important skill for LLM-assisted programming: reading code. LLMs can generate code much faster than we can understand it, so the ability to read, understand, and check code will be critical.

The other skill that will become more important is meta-language, that is, the vocabulary we use to talk about programs. In my correlation in Elm example, I asked ChatGPT to “factor out the anonymous function”, and it knew exactly what I meant. In general, it seems to understand the meta-language of programming well, so it will be useful if we can speak it.

Most of the skills programmers need to work with LLMs are the skills they already have, but some of them will become more important, especially problem decomposition, reading code, and speaking the meta-language of programming.

How Many Typos?

When I started work at Brilliant a couple of weeks ago, I learned that one of my new colleagues, Michelle McSweeney, just published a book called OK, which is all about the word OK.

As we discussed the joys and miseries of publishing, Michelle mentioned that she had found a typo in the book after publication. So naturally I took it as a challenge to find the typo. While I was searching, I enjoyed the book very much. If you are interested in etymology, linguistics, and history, I recommend it!

As it turned out, I found exactly one typo. When I told Michelle, she asked me nervously which page it was on. Page 17. She looked disappointed – that was not the same typo she found.

Now, for people who like Bayesian statistics, this scenario raises some questions:

  1. After our conversation, how many additional typos should we expect there to be?
  2. If she and I had found the same typo, instead of different ones, how many typos would we expect?

As it happens, I used a similar scenario as an example in Think Bayes. So I was able to reuse some code and answer these questions.
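Here’s the gist: if each reader independently finds each typo with some probability, the counts of typos found by only me, only her, or both of us follow a multinomial distribution, and we can compute a posterior over the total number of typos. Below is a minimal sketch that assumes known find-probabilities and a uniform prior; a fuller treatment would put priors on the find-probabilities as well:

import numpy as np
from scipy.stats import multinomial

def posterior_num_typos(k1_only, k2_only, k_both, p1, p2, ns):
    """Posterior over the total number of typos, given what two readers found,
    assuming each reader finds each typo independently with probabilities
    p1 and p2, and a uniform prior over the hypotheses in ns."""
    probs = [p1 * (1 - p2), (1 - p1) * p2, p1 * p2, (1 - p1) * (1 - p2)]
    k_found = k1_only + k2_only + k_both
    likes = np.zeros(len(ns))
    for i, n in enumerate(ns):
        if n >= k_found:
            counts = [k1_only, k2_only, k_both, n - k_found]
            likes[i] = multinomial.pmf(counts, n, probs)
    return likes / likes.sum()

# we each found one typo, and they were different; 0.5 is an arbitrary find-probability
posterior = posterior_num_typos(1, 1, 0, 0.5, 0.5, np.arange(2, 50))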

You can read my solution here.

You can also click here to run the notebook with the solution on Colab.