Predictive values

Predictive values help us answer the question:

    "Given a positive (or negative) test result, what is the new probability of disease?"

Let’s fill some numbers in. In his article on the clinical diagnosis of strep, Frank Dobbs took consecutive patients, prospectively did the same history and physical exam maneuvers on all of them, and then did a throat culture. For practice, critically appraise the article using the form we gave you.

One of the things our nurses ask patients who call with a sore throat is how long they’ve had it. If it’s only been a couple of days, they are more likely to ask patients to try symptomatic remedies. If the duration is longer, they are more likely to ask them to come in. The presence of fever is another important factor in advising the patients, with febrile patients more likely to be asked to come in for evaluation. Is there any evidence to support these strategies?

Essentially we are asking, "If a patient has fever, what is the likelihood of strep pharyngitis?" and "If a patient has symptoms for 3 or more days, what is the likelihood of strep pharyngitis?" Let’s start by creating 2 x 2 tables for each question, using data from Dobbs’ article:

 

                      Strep    No strep    Total
Fever                   58        80        138
No fever                14        54         68
Total                   72       134

                      Strep    No strep    Total
Duration >= 3 days      16        62         78
Duration < 3 days       56        72        128
Total                   72       134

 

Using the equations for sensitivity and specificity, we find that for fever:

sensitivity = 58 / (58 + 14) = 0.81

specificity = 54 / (54 + 80) = 0.40

Note that the sensitivity can be written as "0.81" or "81%".   One is no better than the other - just be consistent!   Similarly, for duration of symptoms >= 3 days,

sensitivity = 16 / (56 + 16) = 0.22

specificity = 72 / (72 + 62) = 0.54
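The column calculations above can be checked with a short script. This is only a minimal sketch: the function names and the tp/fp/fn/tn labels are assumptions introduced here for illustration, and the counts are copied from the two tables above.

```python
def sensitivity(tp, fn):
    """Proportion of patients WITH disease who have a positive test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of patients WITHOUT disease who have a negative test."""
    return tn / (tn + fp)


# Counts from the two 2 x 2 tables (strep is the "disease"):
# tp = finding present, strep      fp = finding present, no strep
# fn = finding absent, strep       tn = finding absent, no strep
tables = {
    "fever": {"tp": 58, "fp": 80, "fn": 14, "tn": 54},
    "duration >= 3 days": {"tp": 16, "fp": 62, "fn": 56, "tn": 72},
}

results = {}
for name, t in tables.items():
    sens = sensitivity(t["tp"], t["fn"])
    spec = specificity(t["tn"], t["fp"])
    results[name] = (sens, spec)
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both functions use only one column of the table each, which is why they say nothing yet about the probability of strep in a patient who has the finding.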

At this point, you should be getting a little uneasy about your triage policy. While most patients with strep had fever (81%), only 22% of patients with strep had symptoms for 3 or more days! We still haven’t answered our question, though. To do that, we have to calculate predictive values. They are defined as:

Positive predictive value = probability of disease among patients with a positive test

Negative predictive value = probability of no disease among patients with a negative test

The probability of disease given a positive test can therefore be called the "post-test probability of disease given a positive test", the "positive predictive value", or the "posterior probability of disease given a positive test". These names are interchangeable. Similarly, the probability of disease given a negative test is called the "post-test probability of disease given a negative test" or the "posterior probability of disease given a negative test"; this is equal to one minus the negative predictive value. Note this last point: the negative predictive value does not equal the post-test probability of disease given a negative test. They are complements of one another, so the two always add up to 1.

What about our old friend, the 2 x 2 table? Here is the standard 2 x 2 table:

 

                    Patients with disease    Patients without disease
Test is positive              a                         b
Test is negative              c                         d

We can now define positive and negative predictive value as follows:

Positive predictive value = a / ( a+b)

Negative predictive value = d / (c+d)

Post-test probability of disease given a positive test = a / (a+b)

Post-test probability of disease given a negative test = c / (c+d)

Notice that we are now using the rows of the table, rather than the columns we used for sensitivity and specificity. What about our original question on the diagnosis of strep throat? Recall:

 

                      Strep    No strep    Total
Fever                   58        80        138
No fever                14        54         68
Total                   72       134

                      Strep    No strep    Total
Duration >= 3 days      16        62         78
Duration < 3 days       56        72        128
Total                   72       134

 

We can quickly calculate that for fever:

Positive predictive value = 58 / (58+80) = 0.42

Negative predictive value = 54 / (54+14) = 0.79

And for duration of symptoms of 3 or more days:

Positive predictive value = 16 / (16+62) = 0.21

Negative predictive value = 72 / (72+56) = 0.56

So, if a patient has fever, there is a 42% chance of strep, and if they have symptoms for 3 or more days, only a 21% chance. It appears that it may be appropriate to revise our triage policy!
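The row calculations can be sketched in the same way. The function name below is an assumption made for illustration; the arguments a, b, c, d follow the cell labels of the standard 2 x 2 table given earlier, and the counts come from the strep tables.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) for a 2 x 2 table with cells labeled as in the text:
    a = test positive, disease      b = test positive, no disease
    c = test negative, disease      d = test negative, no disease
    """
    ppv = a / (a + b)  # top row: disease among positive tests
    npv = d / (c + d)  # bottom row: no disease among negative tests
    return ppv, npv


fever_ppv, fever_npv = predictive_values(58, 80, 14, 54)
duration_ppv, duration_npv = predictive_values(16, 62, 56, 72)

# Post-test probability of disease after a NEGATIVE test is c / (c + d),
# which is the complement of the negative predictive value.
fever_posttest_neg = 1 - fever_npv
duration_posttest_neg = 1 - duration_npv

print(f"fever: PPV = {fever_ppv:.3f}, NPV = {fever_npv:.3f}")
print(f"duration >= 3 days: PPV = {duration_ppv:.3f}, NPV = {duration_npv:.3f}")
print(f"P(strep | no fever) = {fever_posttest_neg:.3f}")
print(f"P(strep | duration < 3 days) = {duration_posttest_neg:.3f}")
```

The last two lines are worth a look for the triage question: they give the chance of strep in afebrile patients and in patients with a short duration of symptoms.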

Let’s consider another example. The CAGE score is a useful screening tool for alcoholism, which has been validated in adolescent and adult populations using a detailed psychiatric interview as the reference standard for the diagnosis of alcoholism. It asks four questions:

  1. Have you ever tried to cut down on your drinking?
  2. Have you ever gotten annoyed by criticism of your drinking?
  3. Have you ever felt guilty about your drinking?
  4. Do you ever have a drink first thing in the morning to calm your nerves or get over a hangover?

Answer the above questions for yourself. If you prefer, answer the questions for your use of coffee! How many positive responses did you have?

The CAGE score is itself a kind of diagnostic test. Most people consider a score less than 2 to be a negative screening test for alcoholism, and a score >= 2 to be a positive test. Let’s put the results from a study by Bush and colleagues in a 2 x 2 table:

               Alcoholic    Not alcoholic
CAGE >= 2          88             15
CAGE < 2           29            386

There were 103 patients (88 + 15) with a "positive" CAGE score, of whom 88 were actually alcoholics by the reference standard. Thus, the positive predictive value is 88/103, or 85%. The likelihood that a patient with a "negative" CAGE score is not an alcoholic is 386/(386+29), or 93%; recall that this is the negative predictive value. Conversely, this means that 7% of patients with a "negative" CAGE score are actually alcoholic, and were mistakenly classified by the score.

Incidentally, the sensitivity and specificity are 75% and 96%, respectively. Make sure you understand how we arrived at these figures, and go back to the original definitions of sensitivity, specificity, and predictive value if necessary.

Notice that in the above example, about 23% of patients overall (117 of 518) were alcoholic. This seems a little high for a typical outpatient practice. Sure enough, these data are for inpatients, who probably have a higher rate of substance abuse than the typical outpatient. What if we had fewer alcoholic patients? Cutting the number of alcoholics in half gives you the following 2 x 2 table:

               Alcoholic    Not alcoholic
CAGE >= 2          44             15
CAGE < 2           14            386

Since the sensitivity and specificity are calculated using the columns, their values don’t change when the overall likelihood of alcoholism changes. Don’t take my word for it:

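sensitivity = 44 / (44 + 14) = 76%

specificity = 386 / (386 + 15) = 96%

Note that these are essentially the same values we got earlier; the one-point difference in sensitivity arises only because half of 29 was rounded down to 14 whole patients. This is an important point, and one of the strengths of sensitivity and specificity: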

"The sensitivity and specificity do not depend
on the prevalence or pre-test probability of disease"
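To see that the invariance is exact rather than approximate, the alcoholic column can be halved without rounding to whole patients. The sketch below does this with exact fractions; the helper names are hypothetical, and the fractional "half patient" is purely an arithmetic device.

```python
from fractions import Fraction


def sens_spec(a, b, c, d):
    """Sensitivity and specificity from the columns of a 2 x 2 table."""
    return a / (a + c), d / (b + d)


def scale_diseased(a, b, c, d, factor):
    """Rescale only the diseased column (cells a and c), leaving b and d alone."""
    return a * factor, b, c * factor, d


# Original CAGE table, held as exact fractions
cage = (Fraction(88), Fraction(15), Fraction(29), Fraction(386))

# Halve the number of alcoholic patients: 44 and 14.5, with no rounding
halved = scale_diseased(*cage, Fraction(1, 2))

original = sens_spec(*cage)
rescaled = sens_spec(*halved)

print("sensitivity:", original[0], "->", rescaled[0])
print("specificity:", original[1], "->", rescaled[1])
print("unchanged:", original == rescaled)
```

Because the scaling factor multiplies both the numerator and the denominator of the sensitivity, it cancels, and the specificity never involves the diseased column at all.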

In other words, they are not affected by how common or rare the disease is! On the other hand, let’s look at the predictive values in our new, "lower prevalence" population:

positive predictive value = 44/59 = 75%

negative predictive value = 386/400 = 97%

The values have changed! The positive predictive value dropped from 85% to 75%, while the negative predictive value went up from 93% to 97%. Thus:

"The predictive value varies with the pre-test probability of disease"

When I give a medical student lecture, I sometimes tell them that I'll kick the podium when I say something that will be on the test.  Well, I'm now kicking the "virtual" podium!  This is a significant limitation of using predictive values and/or post-test probabilities. Each pair of predictive values or post-test probabilities is associated with a single pre-test probability. Changing the pre-test probability changes the predictive value in non-linear ways (remember, we use the complicated, non-linear Bayes formula to calculate the post-test probability of disease).   The same test result may therefore give you one post-test probability in the emergency room, and a different one in your office, if the pre-test probabilities differ.
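The dependence on pre-test probability can be made concrete with Bayes' formula. The following is a rough sketch, not a clinical tool: the function names are assumptions, the sensitivity and specificity are taken from the original CAGE table, and the list of prevalences is purely illustrative apart from the study's own value of 117/518.

```python
def ppv_from_prevalence(sens, spec, prevalence):
    """Positive predictive value via Bayes' formula."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_prevalence(sens, spec, prevalence):
    """Negative predictive value via Bayes' formula."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Sensitivity and specificity of CAGE >= 2 from the original table
sens = 88 / (88 + 29)
spec = 386 / (386 + 15)

# Illustrative pre-test probabilities; 117/518 is the prevalence in the study
prevalences = [0.01, 0.05, 0.10, 117 / 518, 0.50]

for p in prevalences:
    ppv = ppv_from_prevalence(sens, spec, p)
    npv = npv_from_prevalence(sens, spec, p)
    print(f"prevalence = {p:.3f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At the study's own prevalence the formula reproduces the predictive values read directly from the table (88/103 and 386/415); at lower prevalences the positive predictive value falls while the negative predictive value rises.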

Think again about the CAGE score. It can have a value of 0, 1, 2, 3, or 4. Does a patient with a CAGE score of 4 have a greater probability of alcoholism than one with a score of 2? Using a single cutpoint lumps patients with scores of 2, 3, and 4 together. Similarly, do patients with a score of 0 have the same likelihood of alcoholism as ones with a score of 1? (The answers are a resounding "Yes!" to the first question and "No!" to the second: the probability of alcoholism differs from one score to the next.)

Therefore, an important limitation of using a single cutpoint is that it results in a loss of information. In the next section we will learn about likelihood ratios, and how they overcome this and other limitations.