# 22. Birthday paradox

[status: usable-but-incomplete]

## 22.1. To get started

We start by going around the room and seeing how many people there are. Then we ask everyone to estimate “what’s the chance that at least two people have the same birthday?”

We look at the estimates and see how they vary, and ask students why they picked the probability they did.

Then we talk about picking *one person* and asking what’s the
probability that someone else has their same birthday.

How is that a different question, and would the probability is the same, smaller, or bigger?

## 22.2. A practical demonstration

Look at the code in Listing 22.2.1:

```
#! /usr/bin/env python3
"""Simulates the birthday paradox. You set how many people
are at the party. The program will assign a random birthday
to each person, and then calculate how mahy duplicate
birthdays are in that group."""
import random
def main():
n_people = 25
birthdays = make_birthdays(n_people) # randomly distributed
print('## Single example with %d people:' % n_people)
print('## birthdays:', birthdays)
print('## birthdays_sorted:', sorted(birthdays))
n_doubles, n_triplets = count_multiples(birthdays)
print(f'## doubles: {n_doubles} triplets: {n_triplets}')
# return # stop here for the simplest analysis
n_tries = 200
print('## running %d tries' % n_tries)
for n_people in range(100):
avg_doubles, avg_triplets = get_average_multiples(n_people, n_tries)
print(f'n_people: {n_people} -- frac_doubles: {avg_doubles} frac_triplets: {avg_triplets}')
def make_birthdays(n_people):
"""Generate a birthday for each person, but we do it the easy way: a
number from 1 to 365, so we don't handle leap years.
"""
# the list "birthdays" will store the birthday of each person
birthdays = [0]*n_people
for person in range(n_people):
day = random.randint(1, 365) # generate random b-day
birthdays[person] = day # store it for that person
return birthdays
def count_multiples(bdays):
"""Goes through the birthday list and sees if any two people
have the same birthday. Returns how many times we find
duplicate birthdays."""
n_people = len(bdays)
count_doubles = 0
count_triplets = 0
for person in range(n_people):
this_bday = bdays[person]
# we can look at just the "further on" entries in this list
# since we have already looked for duplicates before this point.
for other_dude in range(person + 1, n_people):
other_bday = bdays[other_dude]
# now see if two people have the same birthday
if this_bday == other_bday:
# we have a duplicate!
count_doubles += 1
# now look for a third dude; once
# again I can look beyond this point
for third_dude in range(other_dude + 1, n_people):
third_bday = bdays[third_dude]
if third_bday == other_bday:
count_triplets += 1 # I have a third!!
return count_doubles, count_triplets
def get_average_multiples(n_people, n_tries):
"""Estimates an expectation of finding people with the same birthday.
Returns the probability of two and of three people having the same
birthday.
"""
n_with_doubles = 0
n_with_triplets = 0
for i in range(n_tries):
bdays = make_birthdays(n_people)
n_doubles, n_triplets = count_multiples(bdays)
if n_doubles >= 1:
n_with_doubles += 1
if n_triplets >= 1:
n_with_triplets += 1
avg_doubles = (1.0*n_with_doubles) / n_tries
avg_triplets = (1.0*n_with_triplets) / n_tries
return avg_doubles, avg_triplets
main()
```

The result of running birthdays.py: this calculates the probability that two people at a party will share the same birthday for party. We put in population sizes of 0 to 50, but you can obviously extend that all the way to 365 or more.

We can plot this output with:

```
python3 birthdays.py > bday_prob.out
```

and plot with:

```
gnuplot
# then at the gnuplot> prompt type:
plot 'bday_prob.out' using 2:7 with linespoints
```

## 22.3. The theory

The calculation is a bit involved, but it is a very good example of
the field of *combinatorics,* and it teaches us a lot about how to
count things. Since this example was clearly counterintuitive, the
exercise of counting well is a good one.

The simplifying idea is to first calculate the opposite: the
probability that *no* two people have the same birthday.

Imagine a sequence of events in which people enter an empty room.

For one person the probability of no duplicate birthdays is 1, which is the same as \(365/365 = 1.0\).

Let us say you have 2 people, then the probability that the second person does not have the same birthday as the first is \(364/365 \approx 0.99726\).

Add a 3rd person and you have \(363/365 \approx 0.99452\) chance that the newcomer’s birthday does not match one of the other two.

To get the probability of no matches with 3 people you take the *join*
probability of the 3 situations: 1.0 when the first person
enters the room, times the probabilities of no matches with each
successive entry into the room:

The general formula after \(n\) people have entered the room is:

Using the definition of factorial, we see that the numerator is:

where we used the mathematical “choose” notation:

So our formula is:

Do we believe this? Well, let us compare it to our monte carlo calculation for the tipping point of \(n = 23\):

## 22.4. Take-home

What we have learned:

Simulate a situation (in this case people sharing birthdays).

Calculate the probability of an event with a random component. We do this by running the event many times and averaging the outcome.

Jargon: you could think of this as a very simple example of a “monte carlo” simulation.