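Sensitivity and specificity are the statistics most widely used to describe a diagnostic test. Unfortunately, as we learned from the example of interpreting a mammogram above, they are not very helpful to clinicians trying to revise the probability of disease. Reviewing the definitions of prevalence, sensitivity, and specificity will help us understand why:
Prevalence: the probability of disease in the population being studied.
Sensitivity: the probability of a positive test among patients who have the disease.
Specificity: the probability of a negative test among patients who do not have the disease.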
As clinicians, though, we don't generally know whether or not the patient has disease; that's why we're ordering the test in the first place! Thus, sensitivity and specificity do not give us the information we need to interpret the test results.
What do we want to know? Ideally, we'd like to know what the probability of disease is given a positive or negative test. Reverend Bayes first described a way to do this in the mid-1700s, when he developed an equation to relate the probability of disease before ordering the test (the pre-test probability) and the probability of disease given a positive or negative test (the post-test probability). Bayes' equation for a positive test is shown below:
Probability of disease given a positive test = (prevalence x sensitivity) / ((prevalence x sensitivity) + ((1 - prevalence) x (1 - specificity)))
Yuck! Clearly, this is not an equation you can carry around in your head! It is also not a simple transformation, which explains why it is hard to "guesstimate" the post-test probability from the sensitivity and specificity (remember our mammography example).
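For readers who would rather let a computer do the arithmetic, the equation above can be sketched as a short Python function. The function name and the input values are assumptions chosen purely for illustration; they are not taken from the mammography example or from any study.

```python
def post_test_probability_positive(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' equation)."""
    # Diseased patients who test positive, as a share of everyone tested.
    true_positives = prevalence * sensitivity
    # Non-diseased patients who test positive, as a share of everyone tested.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs for illustration only:
# 1% prevalence, 90% sensitivity, 90% specificity.
p = post_test_probability_positive(0.01, 0.90, 0.90)
print(f"{p:.3f}")  # 0.083
```

Even with a sensitivity and specificity of 90%, the post-test probability in this made-up scenario is only about 8%, which shows how hard it is to guess the answer from sensitivity and specificity alone.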
Take a look at the "2 x 2" table below:
| | Patients with disease | Patients without disease |
|---|---|---|
| Test is positive | a | b |
| Test is negative | c | d |
We'll refer to similar tables in other discussions, so it's a good idea to get familiar with how they work. Using the above table, the definitions of sensitivity and specificity can also be written as:
sensitivity = a / (a+c)
specificity = d / (b+d)
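The two formulas can also be expressed as a small Python sketch. The class name `TwoByTwo` and the counts in the example are hypothetical, introduced here only to show how the cells a, b, c, and d map onto the calculations.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # test positive, disease present
    b: int  # test positive, disease absent
    c: int  # test negative, disease present
    d: int  # test negative, disease absent

    def sensitivity(self):
        # Share of diseased patients who test positive.
        return self.a / (self.a + self.c)

    def specificity(self):
        # Share of non-diseased patients who test negative.
        return self.d / (self.b + self.d)


# Hypothetical counts for illustration only.
table = TwoByTwo(a=80, b=30, c=20, d=170)
print(table.sensitivity())  # 0.8
print(table.specificity())  # 0.85
```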
Are sensitivity and specificity ever helpful? Occasionally. A test with very high sensitivity, when negative, rules out disease. For example, consider the complaint of "dyspnea on exertion" in the diagnosis of congestive heart failure (a critical appraisal of this article is available):
| | CHF | no CHF |
|---|---|---|
| Dyspnea on exertion | 41 | 183 |
| No DOE | 0 | 35 |
The sensitivity of dyspnea on exertion for the diagnosis of CHF is 100% (41/(41+0)), and the specificity 17% (35/(183+35)). If negative (the patient does not complain of dyspnea on exertion), it is very unlikely that they have CHF (0 out of 41 patients with CHF did not have this symptom). An easy way to remember this rule of thumb is the acronym "SnNOut", which is taken from the phrase: "Sensitive test when Negative rules Out disease".
Conversely, a very specific test, when positive, rules in disease. Not surprisingly, the acronym for this kind of test is "SpPIn"! Consider a gallop (S3) murmur in the diagnosis of congestive heart failure, with data taken from the same study:
| | CHF | no CHF |
|---|---|---|
| Gallop (S3) murmur | 10 | 3 |
| No gallop murmur | 31 | 215 |
The sensitivity of gallop for CHF is only 24% (10/41), but the specificity is 99% (215/218). Thus, if a patient has a gallop murmur, they probably have CHF (10 out of 13).
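As a check on the two rules of thumb, the sketch below feeds the study's own counts into Bayes' equation using exact fractions. Two assumptions are made here that the text does not state: the prevalence is taken to be the proportion of CHF in the study sample (41 of 259 patients), and the formula for a negative test is derived by applying the same reasoning to negative results. The function names are illustrative.

```python
from fractions import Fraction


def post_test_positive(prevalence, sensitivity, specificity):
    # Probability of disease given a positive test (Bayes' equation).
    num = prevalence * sensitivity
    return num / (num + (1 - prevalence) * (1 - specificity))


def post_test_negative(prevalence, sensitivity, specificity):
    # Assumed analogue for a negative test: diseased patients who test
    # negative, divided by all patients who test negative.
    num = prevalence * (1 - sensitivity)
    return num / (num + (1 - prevalence) * specificity)


# Assumption: prevalence equals the share of CHF in the study sample.
prevalence = Fraction(41, 259)

# Gallop (SpPIn): sensitivity 10/41, specificity 215/218.
gallop = post_test_positive(prevalence, Fraction(10, 41), Fraction(215, 218))
print(gallop)  # 10/13

# Dyspnea on exertion (SnNOut): sensitivity 41/41, specificity 35/218.
doe = post_test_negative(prevalence, Fraction(41, 41), Fraction(35, 218))
print(doe)  # 0
```

Under these assumptions the equation reproduces the figures read directly from the tables: 10 out of 13 patients with a gallop had CHF, and none of the patients without dyspnea on exertion did.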
Let's consider one more example, this time of a combination of symptoms:
| | CHF | no CHF |
|---|---|---|
| Displaced apex and JVD, gallop, rales, or edema | 18 | 3 |
| No displaced apex, or displaced apex alone | 23 | 215 |
So, the sensitivity is: 18 / (18 + 23) = 18/41 = 44%
And the specificity is: 215 / (3 + 215) = 215/218 = 99%
Thus, another SpPIn: patients with a displaced apex and either JVD, gallop, rales, or edema were very likely to have CHF (18 out of 21 or 86%).
Thus, sensitivity and specificity by themselves are only useful when either is very high (typically 95% or higher). There is a useful Web site with a collection of SpPIns and SnNOuts. In the next section, we'll learn about predictive values, which are more useful to clinicians than sensitivity and specificity.