Sensitivity and specificity

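Sensitivity and specificity are the statistics most widely used to describe a diagnostic test. Unfortunately, as we learned from the example of interpreting a mammogram above, they are not very helpful to clinicians trying to revise the probability of disease. Reviewing the definitions of prevalence, sensitivity, and specificity will help us understand why:

Prevalence is the proportion of patients in the population who have the disease.

Sensitivity is the proportion of patients with disease who have a positive test.

Specificity is the proportion of patients without disease who have a negative test.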

As clinicians, though, we don’t generally know whether or not the patient has disease; that’s why we’re ordering the test in the first place! Thus, sensitivity and specificity do not give us the information we need to interpret the test results.

What do we want to know? Ideally, we’d like to know the probability of disease given a positive or negative test. Reverend Bayes first described a way to do this in the mid-1700s, when he developed an equation relating the probability of disease before ordering the test (the pre-test probability) to the probability of disease given a positive or negative test (the post-test probability). Bayes’ equation is shown below:

Probability of disease given a positive test = (prevalence x sensitivity) / ((prevalence x sensitivity) + ((1 - prevalence) x (1 - specificity)))

Yuck!  Clearly, this is not an equation you can carry around in your head! It is also not a simple transformation, which explains why it is hard to "guesstimate" the post-test probability from the sensitivity and specificity (remember our mammography example).
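Since the equation is hard to do in your head, it may help to see it as a few lines of code. The following is only a minimal sketch: the function name post_test_probability and the example numbers (1% prevalence, 90% sensitivity, 95% specificity) are hypothetical and chosen purely for illustration, not taken from any study.

```python
def post_test_probability(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test, from Bayes' equation."""
    # Diseased patients who test positive (true positives)
    true_positive = prevalence * sensitivity
    # Healthy patients who test positive anyway (false positives)
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, for illustration only
p = post_test_probability(prevalence=0.01, sensitivity=0.90, specificity=0.95)
print(f"Post-test probability after a positive test: {p:.1%}")
```

With these made-up inputs the result is roughly 15%, far lower than the 90% sensitivity might suggest, which is exactly why guessing from sensitivity and specificity alone goes wrong.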

Take a look at the "2 x 2" table below:

 

                    Patients with disease    Patients without disease
Test is positive    a                        b
Test is negative    c                        d

We’ll refer to similar tables in other discussions, so it’s a good idea to get familiar with how they work. Using the above table, the definitions of sensitivity and specificity can also be written as:

sensitivity = a / (a+c)

specificity = d / (b+d)
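The two formulas translate directly into code. This is a small sketch, not a reference implementation; the function names and the counts used in the example are hypothetical, while a, b, c, and d are the cells of the "2 x 2" table.

```python
def sensitivity(a, c):
    """Share of patients WITH disease whose test is positive: a / (a + c)."""
    return a / (a + c)


def specificity(b, d):
    """Share of patients WITHOUT disease whose test is negative: d / (b + d)."""
    return d / (b + d)


# Hypothetical counts, for illustration only
a, b, c, d = 90, 20, 10, 80
print(sensitivity(a, c))  # 0.9
print(specificity(b, d))  # 0.8
```

Note that each formula uses only one column of the table: sensitivity looks only at patients with disease, and specificity only at patients without it.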

Are sensitivity and specificity ever helpful? Occasionally. A test with very high sensitivity, when negative, rules out disease. For example, consider the complaint of "dyspnea on exertion" in the diagnosis of congestive heart failure (a critical appraisal of this article is available):

 

                       CHF    no CHF
Dyspnea on exertion    41     183
No DOE                 0      35

The sensitivity of dyspnea on exertion for the diagnosis of CHF is 100% (41/(41+0)), and the specificity is 16% (35/(183+35)). If the finding is absent (the patient does not complain of dyspnea on exertion), it is very unlikely that they have CHF (none of the 41 patients with CHF lacked this symptom). An easy way to remember this rule of thumb is the acronym "SnNOut", which is taken from the phrase: "Sensitive test when Negative rules Out disease".
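To see why only the negative result is informative here, we can work through the counts from the dyspnea-on-exertion table. This is a rough sketch using the table's numbers; the variable names are our own.

```python
# Counts from the dyspnea-on-exertion table
doe_chf, doe_no_chf = 41, 183        # cells a and b (finding present)
no_doe_chf, no_doe_no_chf = 0, 35    # cells c and d (finding absent)

sens = doe_chf / (doe_chf + no_doe_chf)
spec = no_doe_no_chf / (doe_no_chf + no_doe_no_chf)

# Share of patients with CHF among those WITHOUT the symptom (the SnNOut side)
chf_given_no_doe = no_doe_chf / (no_doe_chf + no_doe_no_chf)
# Share of patients with CHF among those WITH the symptom
chf_given_doe = doe_chf / (doe_chf + doe_no_chf)

print(f"sensitivity      = {sens:.0%}")
print(f"specificity      = {spec:.0%}")
print(f"CHF if no DOE    = {chf_given_no_doe:.0%}")
print(f"CHF if DOE       = {chf_given_doe:.0%}")
```

The absence of the symptom essentially rules out CHF in this sample, but its presence tells us little, because most patients with dyspnea on exertion did not have CHF.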

Conversely, a very specific test, when positive, rules in disease. Not surprisingly, the acronym for this kind of test is "SpPIn" ("Specific test when Positive rules In disease")! Consider a gallop (S3) murmur in the diagnosis of congestive heart failure, with data taken from the same study:

 

                      CHF    no CHF
Gallop (S3) murmur    10     3
No gallop murmur      31     215

The sensitivity of gallop for CHF is only 24% (10/41), but the specificity is 99% (215/218).  Thus, if a patient has a gallop murmur, they probably have CHF (10 out of 13).
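The "10 out of 13" figure can also be recovered from Bayes' equation, which is a useful check that the equation and the table agree. The sketch below assumes the prevalence is simply the proportion of patients with CHF in this study sample (41 of 259); that is our assumption for the illustration and may not match the prevalence in your own patients. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Counts from the gallop (S3) table
a, b, c, d = 10, 3, 31, 215
total = a + b + c + d  # 259 patients in the sample

# Assumption: prevalence equals the share of the study sample with CHF
prevalence = Fraction(a + c, total)
sens = Fraction(a, a + c)
spec = Fraction(d, b + d)

# Bayes' equation for the probability of disease given a positive test
bayes = (prevalence * sens) / (
    prevalence * sens + (1 - prevalence) * (1 - spec)
)

# Reading the same probability directly from the "test is positive" row
direct = Fraction(a, a + b)

print(bayes, direct)  # 10/13 10/13
```

Both routes give 10/13, or about 77%. With a different pre-test probability the same sensitivity and specificity would give a different answer.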

Let's consider one more example, this time of a combination of symptoms:

 

                                                   CHF    no CHF
Displaced apex and JVD, gallop, rales, or edema    18     3
No displaced apex, or displaced apex alone         23     215

So, the sensitivity is 44% (18/41), and the specificity is 99% (215/218).

Thus, another SpPIn:  patients with a displaced apex and either JVD, gallop, rales, or edema were very likely to have CHF (18 out of 21 or 86%).
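To compare the three findings side by side, the counts from the tables can be run through the same arithmetic. This is an illustrative sketch only; the summarize helper and the shortened finding labels are our own, and the counts are those shown in the tables.

```python
# (a, b, c, d) for each finding, taken from the three CHF tables
findings = {
    "Dyspnea on exertion": (41, 183, 0, 35),
    "Gallop (S3)": (10, 3, 31, 215),
    "Displaced apex plus another sign": (18, 3, 23, 215),
}


def summarize(a, b, c, d):
    """Return sensitivity, specificity, and the share of positives with CHF."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "chf_if_positive": a / (a + b),
    }


for name, counts in findings.items():
    s = summarize(*counts)
    print(
        f"{name:34s} "
        f"sens={s['sensitivity']:.0%}  "
        f"spec={s['specificity']:.0%}  "
        f"CHF if positive={s['chf_if_positive']:.0%}"
    )
```

Laid out this way, the pattern is easy to see: the highly sensitive finding is the one whose absence is informative, and the two highly specific findings are the ones whose presence is.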

In short, sensitivity and specificity by themselves are only useful when one of them is very high (typically 95% or higher). There is a useful Web site with a collection of SpPIns and SnNOuts. In the next section, we’ll learn about predictive values, which are more useful to clinicians than sensitivity and specificity.