# Handicapping pub trivia

## Introduction

Here is the question:

> If there is a quiz of `x` questions with varying results between teams of different sizes, how could you logically handicap the larger teams to bring some sort of equivalence in performance measure?
>
> [Suppose there are] 25 questions and a team of two scores 11/25. A team of 4 scores 17/25. Who did better […]?

One respondent suggested a binomial model, in which every player has the same probability of answering any question correctly.
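A minimal sketch of the binomial model, assuming each of `n` players independently gets each question with probability `p`. It also borrows the group rule used in the simulations described in this post: the team gets a question if any member does. The function names are hypothetical and not taken from the notebook.

```python
import numpy as np

def team_success_prob(p, n):
    """Probability that at least one of n players gets a question,
    assuming each player independently gets it with probability p."""
    return 1 - (1 - p) ** n

def simulate_team_score(p, n, num_questions=25, rng=None):
    """Simulate one quiz under the binomial model.

    Each player answers each question correctly with probability p, and
    (assumption) the team gets a question if any member does.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Boolean matrix: one row per question, one column per player
    correct = rng.random((num_questions, n)) < p
    # A question counts for the team if any player in that row got it
    return int(correct.any(axis=1).sum())

print(team_success_prob(0.25, 2), team_success_prob(0.25, 4))  # 0.4375 0.68359375
print(simulate_team_score(0.25, 4, rng=np.random.default_rng(17)))
```

Under this rule, doubling the team from two to four players raises the per-question success probability from 0.4375 to about 0.684 when `p = 0.25`, which is why a raw comparison of 11/25 and 17/25 needs some adjustment.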

I suggested a model based on item response theory, in which each question has a level of difficulty, `d`, each player has a level of efficacy, `e`, and the probability that a player answers a question correctly is

`expit(e - d + c)`

where `c` is a constant offset for all players and questions, and `expit` is the inverse of the logit function, `expit(x) = 1 / (1 + exp(-x))`.
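The formula above translates directly into code. This is a small sketch with NumPy; `scipy.special.expit` computes the same function, but it is written out here to keep the example dependency-light. The name `prob_correct` is hypothetical.

```python
import numpy as np

def expit(x):
    """Inverse of the logit function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))

def logit(p):
    """Log-odds of a probability p."""
    return np.log(p / (1 - p))

def prob_correct(e, d, c=0.0):
    """Item-response probability that a player with efficacy e gets a
    question with difficulty d, given a global offset c."""
    return expit(e - d + c)

# When efficacy equals difficulty and c = 0, the player is a coin flip
print(prob_correct(0.0, 0.0))  # 0.5
# One player facing an easy, a medium, and a hard question
print(prob_correct(0.5, np.array([-1.0, 0.0, 1.0])))
```

Because `e - d + c` is on the log-odds scale, raising a player's efficacy by one unit has the same effect on the odds as lowering a question's difficulty by one unit.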

Another respondent pointed out that group dynamics will come into play. On a given team, it is not enough if one player knows the answer; they also have to persuade their teammates.

I wrote some simulations to explore this question. You can see a static version of my notebook, or you can run the code on Colab.

I implemented a binomial model and a model based on item response theory. Interestingly, for the scenario in the question, they yield opposite results: under the binomial model, we would judge that the team of two performed better; under the item response model, the team of four performed better.
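For the binomial model, the comparison can be checked with a quick plug-in calculation. Assuming every player has the same probability `p` and the team gets a question if any member does, the expected score for a team of `n` on `N` questions is `N * (1 - (1 - p)**n)`. Setting that equal to the observed score `k` and solving gives `p = 1 - (1 - k/N)**(1/n)`. This is a point estimate for illustration, not necessarily how the notebook fits the model, and `implied_player_prob` is a hypothetical name.

```python
def implied_player_prob(score, num_questions, team_size):
    """Per-player probability p that makes the expected team score equal
    the observed score under the binomial model with the any-member rule:
    1 - (1 - p)**team_size == score / num_questions."""
    team_rate = score / num_questions
    return 1 - (1 - team_rate) ** (1 / team_size)

p_two = implied_player_prob(11, 25, 2)
p_four = implied_player_prob(17, 25, 4)
print(f"team of two:  p = {p_two:.4f}")
print(f"team of four: p = {p_four:.4f}")
```

The team of two comes out with a slightly higher implied `p` (about 0.252 versus 0.248), consistent with the binomial result above.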

In both cases I use a simple model of group dynamics: if anyone on the team gets a question, that means the whole team gets the question. So one way to think of this model is that “getting” a question means something like “knowing the answer and successfully convincing your team”.
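One possible intuition for why the two models can disagree is sketched below. This is an illustration with hypothetical parameters, not the notebook's code or fitted values. Under item response theory, hard questions are hard for everyone, so teammates tend to miss the same questions and adding players helps less than the binomial model predicts. Because `1 - (1 - p)**n` is concave in `p`, spreading the per-question probabilities around the same average always lowers the expected team score under the any-member rule.

```python
import numpy as np

def expit(x):
    """Inverse of the logit function."""
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))

def expected_team_score(player_probs):
    """Expected number of questions a team gets.

    player_probs has one row per question and one column per player.
    Assumes the team gets a question if any member does, and that players
    answer independently once the question is fixed.
    """
    p = np.asarray(player_probs, dtype=float)
    return float(np.sum(1 - np.prod(1 - p, axis=1)))

num_questions, team_size = 25, 4

# Hypothetical item response setup: difficulties spread evenly from easy
# to hard, every player has efficacy e = 0, and the offset c = 0.
difficulties = np.linspace(-3, 3, num_questions)
irt_probs = np.tile(expit(0.0 - difficulties)[:, None], (1, team_size))

# Binomial comparison: one probability for every question, chosen so that
# a single player's expected score matches the item response setup.
p_bar = irt_probs[:, 0].mean()
binom_probs = np.full((num_questions, team_size), p_bar)

single_irt = expected_team_score(irt_probs[:, :1])
single_binom = expected_team_score(binom_probs[:, :1])
team_irt = expected_team_score(irt_probs)
team_binom = expected_team_score(binom_probs)

print(f"one player:   IRT {single_irt:.2f}, binomial {single_binom:.2f}")
print(f"team of four: IRT {team_irt:.2f}, binomial {team_binom:.2f}")
```

With the same expected score for a single player, the team of four scores lower in expectation under the item response setup. So a high score from a large team plausibly says more about its players under the item response model than under the binomial model. That fits the direction of the result described above, though this sketch does not reproduce it.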

Anyway, I’m not sure I really answered the question, other than to show that the answer depends on the model.