Evaluating the validity of an article about diagnosis

Articles about diagnosis, like any study, can be of varying quality. Evaluating the quality of an article involves first determining whether the results are relevant to your practice, then whether they are valid. We’ll tackle each of these in turn, and consider some examples along the way.

Relevance

First, the test should be one that is feasible for you in your community. PET scans may give us useful information about central nervous system disease, but aren’t a practical option in my community. Similarly, while brain biopsy is an accurate test for diagnosing dementia, it’s not practical for my (living) patients!

Second, the population should be reasonably similar to those seen in your practice or community. A study which only includes very sick patients (who have more dramatic findings) will make a test look better than it is. For example, a stress thallium is more likely to be abnormal in patients with severe coronary artery disease than in patients with mild disease.

Applying the test only to patients with disease and healthy controls can also make the test look better than it is, because healthy controls are very unlikely to have an abnormal test result. Ideally, a study should include patients with and without disease, selected because they have symptoms that would ordinarily cause you to consider ordering the test, and with a spectrum of severity from mild to advanced disease. It should also include patients with diseases that have similar manifestations; for example, a study of MI diagnosis should include patients who turned out to have esophageal spasm, gall bladder disease, dyspepsia, and panic disorder.
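To see how much the choice of population can matter, here is a minimal Python sketch comparing the same test in two study designs. All of the counts are hypothetical, invented only to illustrate the direction of the effect; they do not come from any real study.

```python
def sensitivity(tp, fn):
    """Proportion of patients with disease who have a positive test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of patients without disease who have a negative test."""
    return tn / (tn + fp)


# Hypothetical counts, invented for illustration only.
studies = {
    "severe disease vs. healthy controls": {"tp": 95, "fn": 5, "tn": 98, "fp": 2},
    "consecutive symptomatic patients": {"tp": 70, "fn": 30, "tn": 80, "fp": 20},
}

for label, c in studies.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In the first (made-up) design, the very sick patients rarely have a falsely negative test and the healthy controls rarely have a falsely positive one, so both numbers look better than they do in patients like the ones you would actually test.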

Let’s test our knowledge of relevance in diagnostic tests. Consider a study of ultrasound in the diagnosis of appendicitis. Who should the study population consist of?

Validity

Once you’ve determined that the test and population studied are relevant to your practice, you have to assess the validity of the information. There are only a few key indicators of diagnostic study quality.

First, did the authors use a reasonable reference standard? For example, in a study of the diagnosis of streptococcal pharyngitis, while ASO titers might be the ideal reference standard, for practical purposes we would probably have to accept throat culture. Should we accept rapid antigen tests as a reference standard, though? Probably not. In some cases we are limited by ethical considerations. For example, doing an invasive test such as a biopsy in a patient with a negative test result, for the sole purpose of doing a study, might be considered unethical. As a result, different reference standards are often used for patients with positive and negative tests. Let’s consider an example:

In our study of ultrasound in appendicitis, we decide to enroll all patients with abdominal pain and suspected appendicitis who present to our community hospital. We perform the ultrasound, but need a reference standard to definitively diagnose appendicitis. Unfortunately, we doubt that we can persuade the surgeons in our hospital to operate on every patient with suspected appendicitis, or persuade every patient to agree, so we can’t use pathology of the removed appendix as the reference standard for everyone. We therefore decide on the following (reasonable) compromise. Surgeons will not be told the results of the ultrasound, so it doesn’t bias their decision of whether or not to operate. For patients who go to surgery, pathology will be the reference standard. Patients who don’t go to surgery will be followed for 30 days, to make sure their symptoms resolve or an alternative diagnosis is made. If they do not undergo appendectomy in the following 30 days, we assume that they do not have appendicitis.
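The compromise reference standard described above can be written down as a small decision rule. The Python sketch below is only an illustration: the `Patient` class and its field names are hypothetical, and it assumes that a patient who does undergo appendectomy during follow-up is classified by the pathology from that operation, a case the description above leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record structure, for illustration only.
    initial_surgery: bool
    appendectomy_during_follow_up: bool = False
    pathology_positive: Optional[bool] = None  # known only if an appendix was removed


def reference_diagnosis(p: Patient) -> bool:
    """Return True if the reference standard says the patient had appendicitis."""
    operated = p.initial_surgery or p.appendectomy_during_follow_up
    if not operated:
        # No appendectomy within 30 days: assume no appendicitis.
        return False
    if p.pathology_positive is None:
        raise ValueError("operated patients need a pathology result")
    # Assumption: pathology is the reference standard for anyone operated on,
    # including patients operated on during follow-up.
    return p.pathology_positive


print(reference_diagnosis(Patient(initial_surgery=True, pathology_positive=True)))
print(reference_diagnosis(Patient(initial_surgery=False)))
```

Writing the rule out this way makes it obvious that every enrolled patient ends up with a reference diagnosis, which is the point of the 30-day follow-up.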

Second, once you’ve determined that the authors used a reasonable reference standard, check to see whether the examiners doing the test were "blinded" to the reference standard result. As Goethe said, "Man sieht nur, was man weiß", which translates to "One only sees what one knows". Knowing that the reference standard test is positive makes it much more likely that we will read the test being studied as positive, and vice versa!

Third, the decision to perform the reference standard should ideally be independent of the results of the test being studied. As we saw in the appendicitis example, though, that isn’t always possible. If the reference standard is not applied independently of the test being studied, at least make sure that the authors follow up patients who don’t get it, and have a good justification for designing their study this way.
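A little arithmetic shows why following up the patients who don’t get the reference standard matters. The Python sketch below uses hypothetical counts, and assumes a flawed study in which every test-positive patient is verified but only one in ten test-negative patients is, with the unverified patients simply dropped from the analysis.

```python
def accuracy(tp, fn, fp, tn):
    """Return (sensitivity, specificity) for a 2x2 table of counts."""
    return tp / (tp + fn), tn / (tn + fp)


def drop_unverified(counts, verify_one_in):
    """Keep all test-positives, but only 1 in `verify_one_in` test-negatives."""
    return {
        "tp": counts["tp"],
        "fp": counts["fp"],
        "fn": counts["fn"] // verify_one_in,
        "tn": counts["tn"] // verify_one_in,
    }


# Hypothetical counts if every patient had received the reference standard.
full = {"tp": 80, "fn": 20, "fp": 90, "tn": 810}
partial = drop_unverified(full, verify_one_in=10)

true_sens, true_spec = accuracy(**full)
apparent_sens, apparent_spec = accuracy(**partial)
print(f"all patients verified:  sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"unverified dropped:     sensitivity {apparent_sens:.2f}, specificity {apparent_spec:.2f}")
```

With these made-up numbers, dropping the unverified test-negative patients makes sensitivity look higher and specificity look lower than they really are, because most of the missed cases and most of the true negatives never enter the table.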

Fourth, data should be collected prospectively whenever possible. This is especially true when studying the history and physical examination, since we all know how inaccurate the medical record is as a source of information about what physicians actually did! For example, if a physician didn’t record that a patient with sore throat had an exudate, does that mean the patient really didn’t have one, or that the physician didn’t look, or that the patient did have one but the physician forgot to record it?

Finally, the diagnostic test should be applied to a reasonable number (at least 100 is a good rule of thumb) of patients with an appropriate "spectrum" of disease. That is, you want patients with mild, moderate, and severe disease, as well as patients with similar symptoms who don’t have the disease in question.
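One way to see why the number of patients matters is to look at how wide the confidence interval around a reported sensitivity is at different sample sizes. The Python sketch below uses the Wilson score interval, which is one common choice and not something the rule of thumb above depends on; the 90% sensitivity and the sample sizes are hypothetical. Note that for sensitivity, what counts is the number of patients who actually have the disease, not the total number enrolled.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical studies that all observe 90% sensitivity:
# (true positives, patients with disease)
for positives, diseased in [(18, 20), (45, 50), (90, 100), (360, 400)]:
    low, high = wilson_interval(positives, diseased)
    print(f"{positives}/{diseased}: 95% CI {low:.2f} to {high:.2f}")
```

The point estimate is the same in every row, but the interval narrows steadily as the number of patients with disease grows, so a small study leaves much more room for the true sensitivity to be disappointing.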

We’ve developed a form for you to use when evaluating articles about diagnosis. Go ahead and print it. Now, let’s try using the form to evaluate a paper about diagnosis. Use the article by Williams on sinusitis (Williams JW, Simel DL, Roberts L, Samsa GP. Clinical evaluation for sinusitis: making the diagnosis by history and physical examination. Ann Intern Med 1992; 117: 705-10.)

Next, let’s use the form to evaluate a secondary source of information. "Secondary sources" are services that critically appraise recent literature, and provide a synopsis. Examples include the ACP Journal Club and the Journal of Family Practice POEMs section. Take a look at some recent synopses here, and see if they answer your questions about the diagnostic test.