Probability

See also: numbers, Permutations and Combinations

Bayes' Theorem

Bayes' Theorem, named after Rev. Thomas Bayes, relates the conditional probabilities P(A | B) and P(B | A) of two events A and B as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

or

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})}$$

where $\bar{A}$ denotes the event that A does not occur.
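As a minimal sketch, the second form of the formula can be written as a small Python function. The function name `bayes_posterior` and its parameter names are illustrative assumptions, not part of the original text, and the inputs in the usage line are made up purely to exercise the function.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    p_not_a = 1.0 - p_a
    # P(B), expanded over the two cases "A" and "not A"
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return p_b_given_a * p_a / p_b


# Made-up inputs, chosen only to exercise the function
print(bayes_posterior(0.5, 0.8, 0.2))
```

The denominator is computed inside the function, so the caller never needs to know P(B) directly, which is usually the quantity that is hardest to obtain.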

Example of Bayes' Theorem application

In a certain country, it is known that 2% of the population suffer from a certain disease. A clinical test correctly returns a positive result for 97% of the people who carry the disease. However, when a person who does not carry the disease takes the same test, it returns a false positive result in 9% of cases.

If a random person from that country takes the test and gets a positive result, what is the probability that this person is really carrying the disease?

Intuitively, we may think that there is a very good chance that the person is carrying the disease, since the test gives a correct positive result 97% of the time. However, let's look at the math.

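Let us represent the information above as follows:

$A$: the person carries the disease
$\bar{A}$: the person does not carry the disease
$B$: the test result is positive
$\bar{B}$: the test result is negative

We also know the following probabilities:

$P(A) = 2\%$ and $P(\bar{A}) = 98\%$
$P(B \mid A) = 97\%$ and $P(\bar{B} \mid A) = 3\%$
$P(B \mid \bar{A}) = 9\%$ and $P(\bar{B} \mid \bar{A}) = 91\%$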

Using the formula for conditional probability, we can then summarize the probabilities of the events in the following table:

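| | $A$ (disease, 2%) | $\bar{A}$ (no disease, 98%) |
|---|---|---|
| $B$ (positive test) | True positive: $P(B \cap A) = P(A) \times P(B \mid A) = 2\% \times 97\% = 0.0194$ | False positive: $P(B \cap \bar{A}) = P(\bar{A}) \times P(B \mid \bar{A}) = 98\% \times 9\% = 0.0882$ |
| $\bar{B}$ (negative test) | False negative: $P(\bar{B} \cap A) = P(A) \times P(\bar{B} \mid A) = 2\% \times 3\% = 0.0006$ | True negative: $P(\bar{B} \cap \bar{A}) = P(\bar{A}) \times P(\bar{B} \mid \bar{A}) = 98\% \times 91\% = 0.8918$ |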
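The four cells of the table can be recomputed with a few lines of Python. This is only a sketch: the variable names and the dictionary layout are assumptions made for illustration, while the three input probabilities come from the example.

```python
p_a = 0.02              # P(A): person carries the disease
p_b_given_a = 0.97      # P(B | A): positive test, given disease
p_b_given_not_a = 0.09  # P(B | not A): positive test, given no disease

p_not_a = 1 - p_a

# Joint probability of each combination of disease status and test result
joint = {
    "true positive": p_a * p_b_given_a,
    "false positive": p_not_a * p_b_given_not_a,
    "false negative": p_a * (1 - p_b_given_a),
    "true negative": p_not_a * (1 - p_b_given_not_a),
}

for name, p in joint.items():
    print(f"{name:15s} {p:.4f}")

# The four cells cover every possible outcome, so they should sum to 1
total = sum(joint.values())
print(f"{'total':15s} {total:.4f}")
```

Checking that the cells sum to 1 is a quick way to catch a mistyped percentage before using the table any further.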

Now, suppose a person tests positive. What is the probability that this person is really carrying the disease?
In other words, we are trying to find the probability of A given B, or P(A | B).

From the table above, we can see that P(A | B) is the probability of a true positive divided by the probability of getting any positive result. That is 0.0194 / (0.0194 + 0.0882) ≈ 0.1803.

We can also get this result by using the above Bayes' Theorem formula:

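$$
\begin{aligned}
P(A \mid B) &= \frac{P(B \cap A)}{P(B)} \\
&= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \bar{A})\,P(\bar{A})} \\
&= \frac{97\% \times 2\%}{(97\% \times 2\%) + (9\% \times 98\%)} \\
&= \frac{0.0194}{0.0194 + 0.0882} \\
&= \frac{0.0194}{0.1076} \\
&\approx 0.1803
\end{aligned}
$$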

This result may seem counter-intuitive. The probability that a person who tests positive is actually carrying the disease is not as high as we might think. There is only about an 18% chance that the person is really carrying the disease.

Why is this so?

In estimating the probability, we often forget that the percentage of the population carrying the disease is small in the first place (only 2% of the population suffers from the disease). The people without the disease vastly outnumber those with it, so even a modest false positive rate produces more false positives than true positives. A test that is highly accurate for people with the disease therefore does not guarantee that a person who tests positive is likely to be carrying it.
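To see how strongly the answer depends on how common the disease is, the sketch below recomputes P(A | B) for several values of P(A) while holding the test's 97% and 9% rates fixed. The function name `posterior`, the parameter names, and every prevalence value other than 2% are assumptions chosen for illustration.

```python
def posterior(prevalence, sensitivity=0.97, false_positive_rate=0.09):
    """P(disease | positive test) for a given share of sick people."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Only 0.02 comes from the example; the other values are hypothetical
for prevalence in (0.001, 0.02, 0.10, 0.50):
    result = posterior(prevalence)
    print(f"prevalence {prevalence:6.1%} -> P(disease | positive) = {result:.1%}")
```

With the same test, the probability rises from about 18% at a 2% prevalence to above 90% when half the population carries the disease, which shows that the low result comes from the rarity of the disease rather than from a poor test.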

We can look at it this way. Suppose there are only 1000 people in that country. Only 20 of them suffer from the disease (2%). About 19 of those 20 will get a positive test result (97% true positive rate). Of the other 980 people who do not suffer from the disease, about 88 (9% false positive rate) will also get a positive test result.

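So we can group the 1000 people as follows:

Carrying the disease, positive test (true positive): about 19
Carrying the disease, negative test (false negative): about 1
Not carrying the disease, positive test (false positive): about 88
Not carrying the disease, negative test (true negative): about 892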

From the above, we gather that about (88 + 19) = 107 people will get a positive result, regardless of whether they suffer from the disease or not. Of these 107, how many are really carrying the disease? Only 19 out of 107 (or about 18%).
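The head count above can be reproduced in Python as a rough check. The variable names are assumptions made for illustration, and the counts are rounded to whole people, as in the text.

```python
population = 1000

sick = round(population * 0.02)          # 20 people carry the disease
healthy = population - sick              # 980 people do not

true_positives = round(sick * 0.97)      # 19.4 rounds to 19
false_positives = round(healthy * 0.09)  # 88.2 rounds to 88

all_positives = true_positives + false_positives
share = true_positives / all_positives

print(f"{true_positives} of {all_positives} positives are real: {share:.1%}")
```

Because the counts are rounded to whole people, the ratio 19/107 comes out slightly below the exact value of 0.1803, but both round to about 18%.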

By Jimmy Sie
