Okay, let’s start with a silly hypothetical. Suppose one person in every 100 (1%) is actually an alien from a distant planet. Further imagine that a brilliant scientist develops a test that identifies real aliens 100% of the time but also incorrectly flags 5% of normal people as aliens (false positives).

Question: If someone tests positive, what’s the probability that they’re a true alien and not a false positive?

95% maybe?

Well, let’s work it out. In a random sample of 100 people, we’d expect the test to identify one bona fide alien (1%) along with roughly five false positives (5%). That means that 5 out of the 6 positive tests are wrong. So, the probability of getting it right is only 1 out of 6, or a mere 17%. That’s terrible! The picture above tells the story: of the six non-blue balls, only one is green. You’ll get the same outcome regardless of the sample size too, since the ratios are always the same.

Next, suppose they discover an ET enclave (Area 51?) where 40% of the population are aliens in disguise. But before blindly diving into more frequency counting, we should point out an important nuance. False positives can only come from those who really aren’t aliens, so we need to adjust our false positive count by multiplying by this probability (not alien = 60%). This was true in the previous example too, but we glossed over it since 99% of that population was not alien, which barely changes the count (4.95 false positives rather than 5). Similarly, the number of true positives needs to be weighted by the probability that the test detects a real alien, which here is 100% and thus leaves the count unchanged. More on this important detail later when we introduce Bayes’ Theorem.

With that in mind, pick a random group of 100 people from this population. Since 60% are expected to be normal, our test should now record only three false positives: $60\% \times 5\% \times 100 = 3$. Adding this to the 40 true positives, the likelihood of nabbing an alien has increased to 40/43 or ~93%. In other words, the same test now produces way better results. Hmm… If any of this surprises you, then welcome to the base rate fallacy (a.k.a., base rate bias/neglect). Or, put another way: why screening matters. These two examples should demonstrate that if the probability of a true match is outweighed by the likelihood of a false positive, then an otherwise decent test isn’t going to produce very good results.
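The frequency counting in the two alien examples can be reproduced in a few lines. This is a minimal sketch rather than anything from the original analysis; the function name `expected_positives` and its parameters are made up for illustration.

```python
def expected_positives(n, prevalence, sensitivity, false_positive_rate):
    """Expected true and false positives when n people are tested."""
    aliens = n * prevalence
    humans = n * (1 - prevalence)
    true_pos = aliens * sensitivity           # real aliens the test catches
    false_pos = humans * false_positive_rate  # humans wrongly flagged
    return true_pos, false_pos


# Same test (100% detection, 5% false positives), two different populations.
for prevalence in (0.01, 0.40):
    tp, fp = expected_positives(100, prevalence, 1.0, 0.05)
    share = tp / (tp + fp)
    print(f"prevalence={prevalence:.0%}: TP={tp:.2f}, FP={fp:.2f}, "
          f"true share of positives={share:.1%}")
```

With a 1% base rate the expected false positive count is 4.95 rather than a round five, which is why the exact share of true positives comes out slightly above 1 in 6.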

Before continuing, it’s useful to identify a few terms that arise in examples like this. Then we’ll show a few more examples. Then a bunch of math.

#### Terminology

The probability of an occurrence in a population is referred to as the prevalence, prior, or base rate. In the first example, the prevalence is 1 in 100 or 1%. In the second example it rises to 40%.

The thing we’re mostly interested in is the probability that a positive result is a true positive, which is called the positive predictive value (PPV). In terms of frequency counting, it’s just the ratio of true positives to total positives: $PPV = \frac{TP}{TP + FP} \times 100\%$.

Next, the likelihood of a test correctly identifying a true positive is called the sensitivity: $\frac{TP}{TP + FN} \times 100\%$. Anything less than 100% indicates the possibility of false negatives. In our example, the sensitivity is 100% and the false negative rate is 0%: the test always correctly identifies the real aliens. But wait, you say! It also misclassified a few humans as aliens. How can that be 100%? With respect to its ability to detect true aliens, it did so 100% of the time. That is, if you’re from another planet, the test will find you. The errors are false positives, not false negatives, and those are described next.

The likelihood of a test correctly identifying a true negative is called the specificity: $\frac{TN}{TN + FP} \times 100\%$. Anything less than 100% indicates false positives. This one may require a moment of thought. A true negative means that a negative case does not test positive; thus the test’s failures on negative cases are false positives. In our example, the false positive rate is 5%, so the specificity is 95%.

Putting these together, consider the following example. A firefighter can always identify houses that are on fire, but he/she may also confuse a house with a smoky fireplace as being on fire. That burning houses are perfectly identified means that firefighters have 100% sensitivity to fires. That they occasionally mistake chimney smoke for house fires means their specificity is less than 100%.

One more point before the examples. One can always construct a test with 100% sensitivity by simply providing a positive result for all test subjects. But the specificity of that test would be zero. For example, a test that blindly identifies everyone as an alien will never miss a real alien (100% sensitivity) but unfortunately it generates false positives for everyone else (0% specificity).
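To make the definitions concrete, here is a small sketch that computes sensitivity, specificity, and PPV from raw counts. The cohort of 10,000 people (1% aliens) and the helper names are assumptions chosen for illustration; the counts simply apply the first example's rates to that cohort.

```python
def sensitivity(tp, fn):
    """Share of actual positives that the test catches."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that the test clears."""
    return tn / (tn + fp)


def ppv(tp, fp):
    """Share of positive results that are true positives."""
    return tp / (tp + fp)


# Hypothetical cohort: 10,000 people, 1% aliens (100 aliens, 9,900 humans).
# The alien test from the first example: finds every alien, flags 5% of humans.
alien_test = dict(tp=100, fn=0, fp=495, tn=9405)
# The degenerate test that calls everyone an alien.
always_positive = dict(tp=100, fn=0, fp=9900, tn=0)

for name, c in (("alien test", alien_test), ("always positive", always_positive)):
    print(name,
          f"sensitivity={sensitivity(c['tp'], c['fn']):.0%}",
          f"specificity={specificity(c['tn'], c['fp']):.0%}",
          f"PPV={ppv(c['tp'], c['fp']):.1%}")
```

Note that the PPV of the always-positive test is exactly the prevalence (1%): a test with zero specificity tells you nothing beyond the base rate.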

#### Disease Examples

For some of these examples we defer the mathematics behind the result until the next section.

An open screening for a low prevalence disease, say 1%, is performed using a test with perfect sensitivity and nearly perfect specificity, say 99%. If 200 people are screened then we’d expect to find two true positives and another two false positives. This means that of the four positives, half are wrong, i.e., the PPV is only 50% despite the high specificity and sensitivity of the test itself.

Similarly, if we assume 2% of the population has Covid-19 and we test at random using the swab antigen test with 75% sensitivity and 95% specificity, then the PPV is only 23%, i.e., fewer than 1 in 4 positive test results will be true positives. Once again, the test won’t be very good at predicting disease.

But if you pre-screen (fever, breathing issues, etc.) and assume a 70% probability of infection, then the PPV goes up to 97.2% for the same test.

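On the flip side, for the same screened population, 38% of negative results are wrong. That means almost 2 in 5 people who test negative are actually sick. (The share of sick people who test negative is a different number: 1 - sensitivity, or 25%.) So maybe this is worse?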
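The disease figures above can be checked by plain frequency counting, before any formal math. The sketch below is an illustration under assumed cohort sizes (200 and 10,000 people); the function name `screen` is hypothetical.

```python
def screen(n, prevalence, sensitivity, specificity):
    """Expected confusion counts when n people are tested."""
    sick = n * prevalence
    healthy = n - sick
    tp = sick * sensitivity      # sick, test positive
    fn = sick - tp               # sick, test negative
    tn = healthy * specificity   # healthy, test negative
    fp = healthy - tn            # healthy, test positive
    return tp, fp, tn, fn


scenarios = {
    "1% prevalence, perfect sensitivity, 99% specificity": (200, 0.01, 1.00, 0.99),
    "Covid-19 antigen test, random testing (2%)": (10_000, 0.02, 0.75, 0.95),
    "Covid-19 antigen test, pre-screened (70%)": (10_000, 0.70, 0.75, 0.95),
}

for label, args in scenarios.items():
    tp, fp, tn, fn = screen(*args)
    print(label)
    print(f"  positives that are right: {tp / (tp + fp):.1%}")
    print(f"  negatives that are wrong: {fn / (fn + tn):.1%}")
```

The cohort size cancels out of both ratios, so any `n` gives the same percentages.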

To play around with these numbers visit this site: test calculator. To learn how these numbers are computed, keep reading.

Finally, a recent New York Times OpEd raised a concern surrounding antibody testing. For this test we don’t want to screen, we just want an indication of how prevalent the disease is in our population. Unfortunately the base rate fallacy makes this difficult for all the reasons described above. What to do? Perform the test multiple times? Only test those with known exposure? Call it an open question.

#### Takeaways

Prevalence (prior) is the key aspect of the base rate fallacy. For the same test, an increase in the prior (by screening, for example), will also increase the PPV and thus the quality of the results. We demonstrated this using frequency analysis but it can also be shown using Bayes’ Theorem, which we do below. Improvements in test sensitivity and specificity also help, of course.
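A quick way to see this takeaway is to hold the test fixed and sweep the prior. The sketch below uses the antigen test figures from the Covid-19 example (75% sensitivity, 95% specificity) as defaults; the function name, the cohort size, and the 99% specificity variant are assumptions for illustration only.

```python
def ppv_by_counting(prevalence, sensitivity=0.75, specificity=0.95, n=1000):
    """PPV from expected counts in a cohort of n people."""
    true_pos = n * prevalence * sensitivity
    false_pos = n * (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Same test, increasing prior.
for prevalence in (0.01, 0.02, 0.10, 0.40, 0.70):
    print(f"prior {prevalence:>4.0%} -> PPV {ppv_by_counting(prevalence):.1%}")

# Same 2% prior, but a hypothetical more specific test.
better = ppv_by_counting(0.02, specificity=0.99)
print(f"99% specificity at 2% prior -> PPV {better:.1%}")
```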

As it pertains to Covid-19, if you’re having symptoms, just assume you’re sick. Quarantine. Contact your doctor. Monitor your symptoms.

#### Math Details

Now for the good stuff. We can use Bayes’ Theorem to be more specific about the math used in these examples, and get the exact value computed by the calculator above.

$D$: has the disease
$D'$: does not have the disease
$T+$: tests positive
$T-$: tests negative

Bayes’ Theorem lets us express a conditional probability we don’t know in terms of ones we do. The conditional probability we’re interested in is the probability of disease given a positive test result. This is our PPV, which we denote $P(D|T+)$.

Using Bayes’ Theorem we can express the PPV as follows:

$PPV = P(D|T+) = \frac{P(D) P(T+|D)}{P(T+)}$

As it pertains to the probability of a positive test result $P(T+)$, either the person has the disease and the test worked, or they don’t and the test reported a false positive. Using the law of total probability we can express this as:

$P(T+) = P(D) P(T+|D) + P(D') P(T+|D')$

Or in words:

probability of a positive test result =
prior * sensitivity + (1 – prior)* (false positive rate)

Substituting that into Bayes’ Theorem gives:

$P(D|T+) = \frac{P(D) P(T+|D)}{P(D) P(T+|D) + P(D') P(T+|D')}$

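Back to our very first example where the sensitivity was 100%, the false positive rate (FPR) was 5% and the prior 1%. Using this formula we can compute the PPV:

$\frac{0.01 \times 1}{0.01 \times 1 + 0.99 \times 0.05} \approx 0.168$, which matches the roughly 1-in-6 estimate from frequency counting (that estimate rounded 4.95 false positives up to 5).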

And the Covid-19 example with a 70% prior, 75% sensitivity and 95% specificity works out to

$\frac{0.7 \times 0.75}{0.7 \times 0.75 + 0.3 \times 0.05} \approx 0.972$, also as expected.
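The two worked values can be checked with a direct transcription of the formula. This is only a sketch: the function names `p_positive` and `ppv` are chosen here for illustration, and the false positive rate is passed in as 1 minus the specificity.

```python
def p_positive(prior, sensitivity, false_positive_rate):
    """P(T+) via the law of total probability."""
    return prior * sensitivity + (1 - prior) * false_positive_rate


def ppv(prior, sensitivity, false_positive_rate):
    """P(D|T+) via Bayes' Theorem."""
    return prior * sensitivity / p_positive(prior, sensitivity, false_positive_rate)


alien = ppv(prior=0.01, sensitivity=1.00, false_positive_rate=0.05)
covid = ppv(prior=0.70, sensitivity=0.75, false_positive_rate=1 - 0.95)
print(f"alien example:    {alien:.3f}")
print(f"Covid-19 example: {covid:.3f}")
```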

Related to all this is the inverse question. How much can we trust a negative result? This can be framed using the same Bayesian approach, which leads to the so-called Negative Predictive Value (NPV). That is, the probability that a negative result is “true.”

$NPV = P(D'|T-) = \frac{P(D') P(T-|D')}{P(D) P(T-|D) + P(D') P(T-|D')}$

Or in words:

NPV = (1 - prior) * specificity / ((1 - prior) * specificity + prior * (1 - sensitivity))

The quantity we’re interested in is then 1 - NPV, which is the probability that a negative result is a false negative, i.e., that someone who tests negative actually has the disease.

The Covid-19 false negative example from before:

$1 - \frac{(1 - 0.70) \times 0.95}{(1 - 0.70) \times 0.95 + 0.70 \times (1 - 0.75)} = 1 - 0.6196 \approx 38\%$, as expected.
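The same calculation as a short sketch, with `npv` as an illustrative function name. The quantity 1 - NPV is sometimes called the false omission rate; the last line prints 1 - sensitivity for contrast, since that figure (the share of sick people who test negative) does not depend on the prior.

```python
def npv(prior, sensitivity, specificity):
    """P(D'|T-) via Bayes' Theorem."""
    true_neg = (1 - prior) * specificity    # healthy and tests negative
    false_neg = prior * (1 - sensitivity)   # sick but tests negative
    return true_neg / (true_neg + false_neg)


value = npv(prior=0.70, sensitivity=0.75, specificity=0.95)
print(f"NPV = {value:.4f}, 1 - NPV = {1 - value:.4f}")

# For contrast: a property of the test alone, independent of the prior.
print(f"1 - sensitivity = {1 - 0.75:.2f}")
```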

By the same frequency counting arguments as above, as the prior increases there are fewer healthy people to produce true negatives and more sick people to produce false negatives, so the proportion of negative results that are true (the NPV) drops. The probability that a negative result is wrong (1 - NPV) therefore increases with prevalence, as shown in the previous Covid-19 example. In graphical form, the two relations look like this: