24. Birthday paradox
[status: usable-but-incomplete]
24.1. To get started
We start by going around the room and seeing how many people there are. Then we ask everyone to estimate “what’s the chance that at least two people have the same birthday?”
We look at the estimates and see how they vary, and ask students why they picked the probability they did.
Then we talk about picking one person and asking what’s the probability that someone else has their same birthday.
How is that a different question, and would the probability is the same, smaller, or bigger?
24.2. A practical demonstration
Look at the code in Listing 24.2.1:
#! /usr/bin/env python3
"""Simulates the birthday paradox. You set how many people
are at the party. The program will assign a random birthday
to each person, and then calculate how mahy duplicate
birthdays are in that group."""
import random
def main():
n_people = 25
birthdays = make_birthdays(n_people) # randomly distributed
print('## Single example with %d people:' % n_people)
print('## birthdays:', birthdays)
print('## birthdays_sorted:', sorted(birthdays))
n_doubles, n_triplets = count_multiples(birthdays)
print(f'## doubles: {n_doubles} triplets: {n_triplets}')
# return # stop here for the simplest analysis
n_tries = 200
print('## running %d tries' % n_tries)
for n_people in range(100):
avg_doubles, avg_triplets = get_average_multiples(n_people, n_tries)
print(f'n_people: {n_people} -- frac_doubles: {avg_doubles} frac_triplets: {avg_triplets}')
def make_birthdays(n_people):
"""Generate a birthday for each person, but we do it the easy way: a
number from 1 to 365, so we don't handle leap years.
"""
# the list "birthdays" will store the birthday of each person
birthdays = [0]*n_people
for person in range(n_people):
day = random.randint(1, 365) # generate random b-day
birthdays[person] = day # store it for that person
return birthdays
def count_multiples(bdays):
"""Goes through the birthday list and sees if any two people
have the same birthday. Returns how many times we find
duplicate birthdays."""
n_people = len(bdays)
count_doubles = 0
count_triplets = 0
for person in range(n_people):
this_bday = bdays[person]
# we can look at just the "further on" entries in this list
# since we have already looked for duplicates before this point.
for other_dude in range(person + 1, n_people):
other_bday = bdays[other_dude]
# now see if two people have the same birthday
if this_bday == other_bday:
# we have a duplicate!
count_doubles += 1
# now look for a third dude; once
# again I can look beyond this point
for third_dude in range(other_dude + 1, n_people):
third_bday = bdays[third_dude]
if third_bday == other_bday:
count_triplets += 1 # I have a third!!
return count_doubles, count_triplets
def get_average_multiples(n_people, n_tries):
"""Estimates an expectation of finding people with the same birthday.
Returns the probability of two and of three people having the same
birthday.
"""
n_with_doubles = 0
n_with_triplets = 0
for i in range(n_tries):
bdays = make_birthdays(n_people)
n_doubles, n_triplets = count_multiples(bdays)
if n_doubles >= 1:
n_with_doubles += 1
if n_triplets >= 1:
n_with_triplets += 1
avg_doubles = (1.0*n_with_doubles) / n_tries
avg_triplets = (1.0*n_with_triplets) / n_tries
return avg_doubles, avg_triplets
main()
The result of running birthdays.py: this calculates the probability that two people at a party will share the same birthday for party. We put in population sizes of 0 to 50, but you can obviously extend that all the way to 365 or more.
We can plot this output with:
python3 birthdays.py > bday_prob.out
and plot with:
gnuplot
# then at the gnuplot> prompt type:
plot 'bday_prob.out' using 2:7 with linespoints
24.3. The theory
The calculation is a bit involved, but it is a very good example of the field of combinatorics, and it teaches us a lot about how to count things. Since this example was clearly counterintuitive, the exercise of counting well is a good one.
The simplifying idea is to first calculate the opposite: the probability that no two people have the same birthday.
Imagine a sequence of events in which people enter an empty room.
For one person the probability of no duplicate birthdays is 1, which is the same as \(365/365 = 1.0\).
Let us say you have 2 people, then the probability that the second person does not have the same birthday as the first is \(364/365 \approx 0.99726\).
Add a 3rd person and you have \(363/365 \approx 0.99452\) chance that the newcomer’s birthday does not match one of the other two.
To get the probability of no matches with 3 people you take the join probability of the 3 situations: 1.0 when the first person enters the room, times the probabilities of no matches with each successive entry into the room:
The general formula after \(n\) people have entered the room is:
Using the definition of factorial, we see that the numerator is:
where we used the mathematical “choose” notation:
So our formula is:
Do we believe this? Well, let us compare it to our monte carlo calculation for the tipping point of \(n = 23\):
24.4. Take-home
What we have learned:
Simulate a situation (in this case people sharing birthdays).
Calculate the probability of an event with a random component. We do this by running the event many times and averaging the outcome.
Jargon: you could think of this as a very simple example of a “monte carlo” simulation.