.. _chap-birthday-paradox: ================== Birthday paradox ================== [status: usable-but-incomplete] To get started ============== We start by going around the room and seeing how many people there are. Then we ask everyone to estimate "what's the chance that at least two people have the same birthday?" We look at the estimates and see how they vary, and ask students why they picked the probability they did. Then we talk about picking *one person* and asking what's the probability that someone else has their same birthday. How is that a different question, and would the probability is the same, smaller, or bigger? A practical demonstration ========================= Look at the code in :numref:`listing-birthdays-py`: .. _listing-birthdays-py: .. literalinclude:: birthdays.py :language: python :caption: Simulate a party with several people and calculate the probability that two of them share a birthday. .. .. command-output:: python3 birthday-paradox/birthdays.py :ellipsis: 0, 20 The result of running birthdays.py: this calculates the probability that two people at a party will share the same birthday for party. We put in population sizes of 0 to 50, but you can obviously extend that all the way to 365 or more. We can plot this output with: :: python3 birthdays.py > bday_prob.out and plot with: :: gnuplot # then at the gnuplot> prompt type: plot 'bday_prob.out' using 2:7 with linespoints The theory ========== The calculation is a bit involved, but it is a very good example of the field of *combinatorics,* and it teaches us a lot about how to count things. Since this example was clearly counterintuitive, the exercise of counting well is a good one. The simplifying idea is to first calculate the opposite: the probability that *no* two people have the same birthday. Imagine a sequence of events in which people enter an empty room. For one person the probability of no duplicate birthdays is 1, which is the same as :math:`365/365 = 1.0`. Let us say you have 2 people, then the probability that the second person does not have the same birthday as the first is :math:`364/365 \approx 0.99726`. Add a 3rd person and you have :math:`363/365 \approx 0.99452` chance that the newcomer's birthday does not match one of the other two. To get the probability of no matches with 3 people you take the *join* probability of the 3 situations: 1.0 when the first person enters the room, times the probabilities of no matches with each successive entry into the room: .. math:: P_{\rm no\_dup}(n) = \frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \approx 0.991795 \\ P_{\rm duplicates}(n) = 1 - P({\rm no\_dup}) \approx 1 - 0.991795 \approx 0.00820417 The general formula after :math:`n` people have entered the room is: .. math:: P_{\rm no\_dup}(n) & = & \frac{n}{n} & \times \frac{n-1}{n} \times \frac{n-2}{n} \times \dots \times \frac{1}{365} \\ & = & (space) & \frac{365 \times (365-1) \times \dots \times (365 - n + 1)}{365^n} Using the definition of factorial, we see that the numerator is: .. math:: {\rm numerator} = & 365 \times 364 \times 363 \times ... \times (365-n+1) = \frac{365!}{(365-n)!} \\ & = n! \times { 365 \choose n} where we used the mathematical "choose" notation: .. math:: {n \choose k} & = & \frac{n (n-1) \dots (n - k + 1)}{k (k-1) \dots 1} \\ & = & \frac{n!}{k!(n-k)!} So our formula is: .. math:: P_{\rm no\_dup}(n) = \frac{n! \times {365 \choose n}}{365^n} \\ P_{\rm duplicates}(n) = 1 - \frac{n! \times {365 \choose n}}{365^n} Do we believe this? Well, let us compare it to our monte carlo calculation for the tipping point of :math:`n = 23`: .. math:: P_{\rm duplicates}(23) = 1 - \frac{365! \times {365 \choose 23}}{365^{23}} \approx 1 - 0.492703 \approx 0.507297 Take-home ========= What we have learned: - Simulate a situation (in this case people sharing birthdays). - Calculate the probability of an event with a random component. We do this by running the event many times and averaging the outcome. - Jargon: you could think of this as a very simple example of a "monte carlo" simulation.