Predictive values help us answer the question:
"Given a positive (or negative) test result, what is the new probability of disease?"
Let's fill some numbers in. In his article on the clinical diagnosis of strep,
Frank Dobbs took consecutive patients, prospectively did the same history and physical
exam maneuvers on all of them, and then did a throat culture. For
practice, critically appraise the article using the form
we gave you.
One of the things our nurses ask patients who call with a sore throat is how long they've had it. If it's only been a couple of days, they are more likely to ask patients to try symptomatic remedies. If the duration is longer, they are more likely to ask them to come in. The presence of fever is another important factor in advising the patients, with febrile patients more likely to be asked to come in for evaluation. Is there any evidence to support these strategies?
Essentially we are asking, "If a patient has fever, what is the likelihood of strep pharyngitis?" and "If a patient has symptoms for 3 or more days, what is the likelihood of strep pharyngitis?" Let's start by creating 2 x 2 tables for each question, using data from Dobbs' article:
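| | Strep | No strep | Total |
|---|---|---|---|
| Fever | 58 | 80 | 138 |
| No fever | 14 | 54 | 68 |
| Total | 72 | 134 | 206 |

| | Strep | No strep | Total |
|---|---|---|---|
| Duration >= 3 days | 16 | 62 | 78 |
| Duration < 3 days | 56 | 72 | 128 |
| Total | 72 | 134 | 206 |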
Using the equations for sensitivity and specificity, we find that for fever:
sensitivity = 58 / (58 + 14) = 0.81
specificity = 54 / (54 + 80) = 0.40
Note that the sensitivity can be written as "0.81" or "81%". One is no better than the other - just be consistent! Similarly, for duration of symptoms >= 3 days,
sensitivity = 16 / (56 + 16) = 0.22
specificity = 72 / (72 + 62) = 0.54
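If you would rather let a computer do the column arithmetic, the calculation above can be sketched in a few lines of Python. This is only an illustration using the counts from the two strep tables; the function name `sensitivity_specificity` and its argument names are our own and do not come from Dobbs' article.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table.

    tp: disease present, finding present   fp: disease absent, finding present
    fn: disease present, finding absent    tn: disease absent, finding absent
    """
    sensitivity = tp / (tp + fn)  # uses only the "disease" column
    specificity = tn / (tn + fp)  # uses only the "no disease" column
    return sensitivity, specificity


# Counts taken from the strep tables above.
fever = sensitivity_specificity(tp=58, fp=80, fn=14, tn=54)
duration = sensitivity_specificity(tp=16, fp=62, fn=56, tn=72)

for name, (sens, spec) in [("fever", fever), ("duration >= 3 days", duration)]:
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Each result uses a single column of the table, which is why neither value depends on how many patients with and without strep happened to be enrolled.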
At this point, you should be getting a little uneasy about your triage policy. While most patients with strep had fever (81%), only 22% of patients with strep had symptoms for 3 or more days! We still haven't answered our question, though. To do that, we have to calculate predictive values. They are defined as:
Positive predictive value = probability of disease among patients with a positive test
Negative predictive value = probability of no disease among patients with a negative test
The probability of disease given a positive test can therefore be called the "post-test probability of disease given a positive test", the "positive predictive value", or the "posterior probability of disease given a positive test". These names are interchangeable. Similarly, the probability of disease given a negative test is called the "post-test probability of disease given a negative test" or the "posterior probability of disease given a negative test"; this is equal to one minus the negative predictive value. Note this last point: the negative predictive value does not equal the post-test probability of disease given a negative test. The two are complements of one another: they sum to one.
What about our old friend, the 2 x 2 table? Here is the standard 2 x 2 table:
| | Patients with disease | Patients without disease |
|---|---|---|
| Test is positive | a | b |
| Test is negative | c | d |
We can now define positive and negative predictive value as follows:
Positive predictive value = a / (a+b)
Negative predictive value = d / (c+d)
Post-test probability of disease given a positive test = a / (a+b)
Post-test probability of disease given a negative test = c / (c+d)
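For readers who prefer probability notation, the same four definitions can be written compactly. The symbols here are our own shorthand, not part of the original table: D+ and D- mean disease present and absent, and T+ and T- mean a positive and a negative test.

```latex
\begin{align*}
\text{PPV} &= P(D^{+} \mid T^{+}) = \frac{a}{a+b} \\
\text{NPV} &= P(D^{-} \mid T^{-}) = \frac{d}{c+d} \\
P(D^{+} \mid T^{-}) &= \frac{c}{c+d} = 1 - \frac{d}{c+d} = 1 - \text{NPV}
\end{align*}
```

The last line makes explicit why the post-test probability of disease given a negative test and the negative predictive value always sum to one.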
Notice that we are now working along the rows of the table, rather than down the columns as we did for sensitivity and specificity. What about our original question on the diagnosis of strep throat? Recall:
| | Strep | No strep | Total |
|---|---|---|---|
| Fever | 58 | 80 | 138 |
| No fever | 14 | 54 | 68 |
| Total | 72 | 134 | 206 |

| | Strep | No strep | Total |
|---|---|---|---|
| Duration >= 3 days | 16 | 62 | 78 |
| Duration < 3 days | 56 | 72 | 128 |
| Total | 72 | 134 | 206 |
We can quickly calculate that for fever:
Positive predictive value = 58 / (58+80) = 0.42
Negative predictive value = 54 / (54+14) = 0.79
And for duration of symptoms of 3 or more days:
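Positive predictive value = 16 / (16+62) = 0.21
Negative predictive value = 72 / (72+56) = 0.56
So if a patient has fever, there is a 42% chance of strep, and if they have had symptoms for 3 or more days, only a 21% chance. Since 35% of all patients in the study had strep (72 of 206), fever raises the probability only modestly, and a longer duration of symptoms actually lowers it. It appears that it may be appropriate to revise our triage policy!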
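The row arithmetic can be sketched the same way. The snippet below is a minimal illustration that assumes the a, b, c, d layout of the standard 2 x 2 table shown earlier; the helper name `predictive_values` and the `findings` dictionary are hypothetical names chosen for this example.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) using the a, b, c, d cells of the standard 2 x 2 table."""
    ppv = a / (a + b)  # top row: everyone with a positive test
    npv = d / (c + d)  # bottom row: everyone with a negative test
    return ppv, npv


# Cells (a, b, c, d) taken from the strep tables above.
findings = {
    "fever": (58, 80, 14, 54),
    "duration >= 3 days": (16, 62, 56, 72),
}

results = {}
for name, cells in findings.items():
    ppv, npv = predictive_values(*cells)
    results[name] = (ppv, npv)
    # 1 - NPV is the post-test probability of strep when the finding is absent.
    print(f"{name}: PPV = {ppv:.3f}, NPV = {npv:.3f}, "
          f"P(strep | finding absent) = {1 - npv:.3f}")
```

Printing one minus the negative predictive value alongside the other two figures is a handy reminder that a "negative" finding still leaves a real chance of strep.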
Let's consider another example. The CAGE score is a useful screening tool for alcoholism, which has been validated in adolescent and adult populations using a detailed psychiatric interview as the reference standard for the diagnosis of alcoholism. It asks four questions:
Answer the above questions for yourself. If you prefer, answer the questions for your use of coffee! How many positive responses did you have?
The CAGE score is an example of a test with several possible results (here, a score from 0 to 4). Most people consider a score less than 2 to be a negative screening test for alcoholism, and a score >= 2 to be a positive test. Let's put the results from a study by Bush and colleagues in a 2 x 2 table:
| | Alcoholic | Not alcoholic |
|---|---|---|
| CAGE >= 2 | 88 | 15 |
| CAGE < 2 | 29 | 386 |
There were 103 patients (88 + 15) with a "positive" CAGE score, of whom 88 were actually alcoholic by the reference standard. Thus, the positive predictive value is 88/103, or 85%. The likelihood that a patient with a "negative" CAGE score is not an alcoholic is 386/(386+29), or 93%; recall that this is the negative predictive value. Conversely, this means that 7% of patients with a "negative" CAGE score are actually alcoholic, and were misclassified by the score.
Incidentally, the sensitivity and specificity are 75% and 96%, respectively. Make sure you understand how we arrived at these figures, and go back to the original definitions of sensitivity, specificity, and predictive value if necessary.
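If you want to check all of these figures at once, the sketch below puts the column measures and the row measures side by side for the CAGE table. It is only an illustration: the `TwoByTwo` class and its `summary` method are names we made up for this example, using the a, b, c, d layout of the standard 2 x 2 table.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    a: int  # test positive, disease present
    b: int  # test positive, disease absent
    c: int  # test negative, disease present
    d: int  # test negative, disease absent

    def summary(self):
        total = self.a + self.b + self.c + self.d
        return {
            "prevalence": (self.a + self.c) / total,
            "sensitivity": self.a / (self.a + self.c),  # column
            "specificity": self.d / (self.b + self.d),  # column
            "ppv": self.a / (self.a + self.b),          # row
            "npv": self.d / (self.c + self.d),          # row
        }


# Counts from the CAGE table above.
cage = TwoByTwo(a=88, b=15, c=29, d=386).summary()

for measure, value in cage.items():
    print(f"{measure:<12}{value:6.1%}")
```

Keeping the prevalence in the same summary is deliberate, since it is the one quantity that ties the row measures to the population the test was studied in.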
Notice that in the above example, about 23% of patients overall (117 of 518) were alcoholic. This seems a little high for a typical outpatient practice. Sure enough, these data are for inpatients, who probably have a higher rate of substance abuse than the typical outpatient. What if we had fewer alcoholic patients? Cutting the number of alcoholics roughly in half gives you the following 2 x 2 table:
| | Alcoholic | Not alcoholic |
|---|---|---|
| CAGE >= 2 | 44 | 15 |
| CAGE < 2 | 14 | 386 |
Since the sensitivity and specificity are calculated using the columns, their values don't change when the overall likelihood of alcoholism changes. Don't take my word for it:
sensitivity = 44 / (44 + 14) = 76%
specificity = 386 / (386 + 15) = 96%
Note that these are essentially the same values we got earlier; the one-point difference in sensitivity arises only because 29 false negatives cannot be halved exactly. This is an important point, and one of the strengths of sensitivity and specificity:
"The sensitivity and specificity do not depend on the prevalence or pre-test probability of disease"
In other words, they are not affected by how common or rare the disease is! On the other hand, let's look at the predictive values in our new, "lower prevalence" population:
positive predictive value = 44/59 = 75%
negative predictive value = 386/400 = 97%
The values have changed! The positive predictive value dropped from 85% to 75%, while the negative predictive value went up from 93% to 97%. Thus:
"The predictive value varies with the pre-test probability of disease"
When I give a medical student lecture, I sometimes tell them that I'll kick the podium when I say something that will be on the test. Well, I'm now kicking the "virtual" podium! This is a significant limitation of using predictive values and/or post-test probabilities. Each pair of predictive values or post-test probabilities is associated with a single pre-test probability. Changing the pre-test probability changes the predictive value in non-linear ways (remember, we use the complicated, non-linear Bayes formula to calculate the post-test probability of disease). The same test result may therefore give you one post-test probability in the emergency room, and a different one in your office, if the pre-test probabilities differ.
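To see how strongly the post-test probability depends on where you start, here is a minimal sketch of Bayes' formula for a test with only two results. It assumes the CAGE sensitivity and specificity from the Bush table; the function name `post_test_probabilities` and the list of pre-test probabilities are illustrative choices of ours, not values from any study.

```python
def post_test_probabilities(sensitivity, specificity, pretest):
    """Bayes' formula for a dichotomous test.

    Returns (probability of disease given a positive test,
             probability of disease given a negative test).
    """
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return true_pos / (true_pos + false_pos), false_neg / (false_neg + true_neg)


# CAGE sensitivity and specificity computed from the Bush table above.
SENS = 88 / 117
SPEC = 386 / 401

# Illustrative pre-test probabilities (assumed values, not from any study).
for pretest in (0.01, 0.05, 0.10, 0.226, 0.50):
    pos, neg = post_test_probabilities(SENS, SPEC, pretest)
    print(f"pre-test {pretest:5.1%} -> positive test {pos:5.1%}, negative test {neg:5.1%}")
```

Plugging in the study's own pre-test probability (117 of 518, about 22.6%) gives back the predictive values read directly from the table, while the other rows show how the same CAGE result means something quite different in a low-prevalence office practice.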
Think again about the CAGE score. It can have a value of 0, 1, 2, 3, or 4. Does a patient with a CAGE score of 4 have a greater probability of alcoholism than one with a score of 2? Using a single cutpoint lumps patients with scores of 2, 3, and 4 together. Similarly, do patients with a score of 0 have the same likelihood of alcoholism as ones with a score of 1? (The answers are a resounding "Yes!" and "No!", respectively.)
Therefore, an important limitation of using a single cutpoint is that it results in a loss of information. In the next section we will learn about likelihood ratios, and how they overcome this and other limitations.