the plan for the future

Birthday problem

As on 19 September 2013

Taken from: Wikipedia - Birthday problem

Note: This is not the complete text as it was on Wikipedia.


In probability theory, the birthday problem or birthday paradox[1] concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are 366 possible birthdays, including February 29). However, 99% probability is reached with just 57 people, and 50% probability with 23 people. These conclusions include the assumption that each day of the year (except February 29) is equally probable for a birthday.

The mathematics behind this problem led to a well-known cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of cracking a hash function.

Understanding the problem

The birthday problem asks whether one of the people in a given group has a birthday matching any of the others — not any two people. (See "Same birthday as you" below for an analysis of this much less surprising alternative problem.)

In the example given earlier, a list of 23 people, comparing the birthday of the first person on the list to the others allows 22 chances for a matching birthday, the second person on the list to the others allows 21 chances for a matching birthday, third person has 20 chances, and so on. Hence total chances are: 22+21+20+....+1 = 253, so comparing every person to all of the others allows 253 distinct chances (combinations): in a group of 23 people there are math expression pairs.

Presuming all birthdays are equally probable,[2] the probability of a given birthday for a person chosen from the entire population at random is 1/365 (ignoring Leap Day, February 29). Although the pairings in a group of 23 people are not statistically equivalent to 253 pairs chosen independently, the birthday paradox becomes less surprising if a group is thought of in terms of the number of possible pairs, rather than as the number of individuals.

Calculating the probability

The problem is to compute the approximate probability that in a room of n people, at least two have the same birthday. For simplicity, disregard variations in the distribution, such as leap years, twins, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely.[2]

If P(A) is the probability of at least two people in the room having the same birthday, it may be simpler to calculate P(A'), the probability of there not being any two people having the same birthday. Then, because A and A' are the only two possibilities and are also mutually exclusive, P(A) = 1 − P(A').

In deference to widely published solutions concluding that 23 is the number of people necessary to have a P(A) that is greater than 50%, the following calculation of P(A) will use 23 people as an example.

When events are independent of each other, the probability of all of the events occurring is equal to a product of the probabilities of each of the events occurring. Therefore, if P(A') can be described as 23 independent events, P(A') could be calculated as P(1) × P(2) × P(3) × ... × P(23).

The 23 independent events correspond to the 23 people, and can be defined in order. Each event can be defined as the corresponding person not sharing his/her birthday with any of the previously analyzed people. For Event 1, there are no previously analyzed people. Therefore, the probability, P(1), that person number 1 does not share his/her birthday with previously analyzed people is 1, or 100%. Ignoring leap years for this analysis, the probability of 1 can also be written as 365/365, for reasons that will become clear below.

For Event 2, the only previously analyzed people is Person 1. Assuming that birthdays are equally likely to happen on each of the 365 days of the year, the probability, P(2), that Person 2 has a different birthday than Person 1 is 364/365. This is because, if Person 2 was born on any of the other 364 days of the year, Persons 1 and 2 will not share the same birthday.

Similarly, if Person 3 is born on any of the 363 days of the year other than the birthdays of Persons 1 and 2, Person 3 will not share their birthday. This makes the probability P(3) = 363/365.

This analysis continues until Person 23 is reached, whose probability of not sharing his/her birthday with people analyzed before, P(23), is 343/365.

P(A') is equal to the product of these individual probabilities:

(1) P(A') = 365/365 × 364/365 × 363/365 × 362/365 × ... × 343/365

The terms of equation (1) can be collected to arrive at:

(2) P(A') = (1/365)23 × (365 × 364 × 363 × ... × 343)

Evaluating equation (2) gives P(A') ≈ 0.492703

Therefore, P(A) ≈ 1 − 0.492703 = 0.507297 (50.7297%)

This process can be generalized to a group of n people, where p(n) is the probability of at least two of the n people sharing a birthday. It is easier to first calculate the probability p(n) that all n birthdays are different. According to the pigeonhole principle, p(n) is zero when n > 365. When n ≤ 365:

math expression

where ' ! ' is the factorial operator, math expression is the binomial coefficient and math expression denotes permutation.

The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), and in general the nth birthday cannot be the same as any of the n − 1 preceding birthdays.

The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability p(n) is

math expression

This probability surpasses 1/2 for n = 23 (with value about 50.7%). The following table shows the probability for some other values of n (this table ignores the existence of leap years, as described above):

300(100 − (3×10−80))%
350(100 − (3×10−129))%
365(100 − (1.45×10−155))%

Abstract proof

Here we prove the same result as above, but with results about sets and functions to provide a simpler proof.

Firstly, define math expression to be a set of N people and let math expression be the set of dates in a year.

Define the birthday function math expression to be the map that sends a person to their birthdate. Then it is obvious that everyone in math expression has a unique birthday if and only if the birthday function is injective.

Now we consider how many functions, and how many injective functions, exist between math expression and math expression.

Since math expression and math expression, it follows that there are math expression possible functions,[3] and math expression possible injective functions (see Twelvefold way#case i).

Let A be the statement "Everybody in the set math expression has a unique birthday" (so P(A') is what we are actually looking for). By definition, P(A) is the fraction of injective functions out of all possible functions (ie, the probability of the birthday function being one that assigns only one person to each birthdate), which gives math expression.

Hence, math expression


1. This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: an intuitive guess would suggest that the chance of two individuals sharing the same birthday in a group of 23 is much lower than 50%, but the birthday problem demonstrates that this is not the case.
2. a b In reality, birthdays are not evenly distributed throughout the year; there are more births per day in some seasons than in others, but for the purposes of this problem the distribution is treated as uniform. In particular, many children are born in the summer, especially the months of August and September (for the northern hemisphere) [1], and in the U.S. it has been noted that many children are conceived around the holidays of Christmas and New Year's Day. Also, because hospitals rarely schedule C-sections and induced labor on the weekend, more Americans are born on Mondays and Tuesdays than on weekends; where many of the people share a birth year (e.g. a class in a school), this creates a tendency toward particular dates. In Sweden 9.3% of the population is born in March and 7.3% in November when a uniform distribution would give 8.3% Swedish statistics board. See also: Murphy, Ron. "An Analysis of the Distribution of Birthdays in a Calendar Year". Retrieved 2011-12-27.
Mathers, C D; R S Harris (1983). "Seasonal Distribution of Births in Australia". International Journal of Epidemiology 12 (3): 326–331. doi:10.1093/ije/12.3.326. PMID 6629621. Retrieved 2011-12-27.
These factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case when everyone was born on three days, there would obviously be many identical birthdays). The birthday problem for such non-constant birthday probabilities was first understood by Murray Klamkin in 1967. A formal proof that the probability of two matching birthdays is least for a uniform distribution of birthdays was given by D. Bloom (1973).
3. "Sets and Functions". p. 18.