Articles about diagnosis, like any study, can be of varying quality. Evaluating the quality of an article involves first determining whether the results are relevant to your practice, then whether they are valid. We'll tackle each of these in turn, and consider some examples along the way.
First, the test should be one that is feasible for you in your community. PET scans may give us useful information about central nervous system disease, but aren't a practical option in my community. Similarly, while brain biopsy is an accurate test for diagnosing dementia, it's not practical for my (living) patients!
Second, the population should be reasonably similar to those seen in your practice or community. A study which only includes very sick patients (who have more dramatic findings) will make a test look better than it is. For example, a stress thallium is more likely to be abnormal in patients with severe coronary artery disease than in patients with mild disease.
Applying the test only to patients with disease and healthy controls can also make the test look better than it is, because healthy controls are very unlikely to have an abnormal test result. Ideally, a study should include patients with and without disease, selected because they have symptoms that would ordinarily cause you to consider ordering the test, and with a wide spectrum of severity from mild to advanced disease. It should also include patients with diseases that have similar manifestations; for example, a study of MI diagnosis should include patients who turned out to have esophageal spasm, gall bladder disease, dyspepsia, and panic disorder.
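To make the spectrum effect concrete, here is a minimal sketch in Python. The sensitivities by severity and the two case mixes are hypothetical numbers chosen only for illustration; they do not come from any study. The point is simply that the same test yields a different overall sensitivity depending on how sick the studied patients are.

```python
# Hypothetical sensitivity of a test at each level of disease severity.
# These numbers are illustrative assumptions, not data from any study.
SENSITIVITY_BY_SEVERITY = {"mild": 0.60, "moderate": 0.80, "severe": 0.95}


def overall_sensitivity(case_mix):
    """Weighted-average sensitivity for a given mix of disease severity.

    case_mix maps each severity level to the fraction of diseased
    patients at that level; the fractions must sum to 1.
    """
    total = sum(case_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("case-mix fractions must sum to 1")
    return sum(SENSITIVITY_BY_SEVERITY[level] * fraction
               for level, fraction in case_mix.items())


# Hypothetical study populations.
referral_center = {"mild": 0.0, "moderate": 0.2, "severe": 0.8}
primary_care = {"mild": 0.6, "moderate": 0.3, "severe": 0.1}

print(f"Referral center: {overall_sensitivity(referral_center):.3f}")
print(f"Primary care:    {overall_sensitivity(primary_care):.3f}")
```

Under these assumed numbers, the test looks considerably more sensitive in the referral population than it would in a primary care practice, even though the test itself has not changed.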
Let's test our knowledge of relevance in diagnostic tests. Consider a study of ultrasound in the diagnosis of appendicitis. Who should the study population consist of?
Once you've determined that the test and population studied are relevant to your practice, you have to assess the validity of the information. There are only a few key indicators of diagnostic study quality.
First, did the authors use a reasonable reference standard? For example, in a study of the diagnosis of streptococcal pharyngitis, while ASO titers might be the ideal reference standard, for practical purposes we would probably have to accept throat culture. Should we accept rapid antigen tests as a reference standard, though? Probably not. In some cases we are limited by ethical considerations. For example, doing an invasive test such as a biopsy in a patient with a negative test result, for the sole purpose of doing a study, might be considered unethical. In many cases, different reference standards are used for patients with positive and negative tests. Let's consider an example:
In our study of ultrasound in appendicitis, we decide to enroll all patients with abdominal pain and suspected appendicitis who present to our community hospital. We perform the ultrasound, but need a reference standard to definitively diagnose appendicitis. Unfortunately, we doubt that we will be able to convince either the surgeons in our hospital or their patients to operate on every patient with suspected appendicitis, so we cannot use pathology of the removed appendix as our reference standard for everyone. We therefore decide on the following (reasonable) compromise. Surgeons will not be told the results of the ultrasound, so it doesn't bias their decision of whether or not to operate. For patients who go to surgery, pathology will be the reference standard. Patients who don't go to surgery will be followed for 30 days, to make sure their symptoms resolve or an alternative diagnosis is made. If they do not undergo appendectomy in the following 30 days, we assume that they do not have appendicitis.
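The compromise reference standard described above can be written as a small decision rule. This is only a sketch: the function names and the five example patients are hypothetical, and treating an appendectomy during the 30-day follow-up the same way as surgery at presentation (pathology decides) is our assumption, since the study design above does not spell that case out.

```python
from typing import Optional


def reference_diagnosis(had_appendectomy: bool,
                        pathology_positive: Optional[bool] = None) -> bool:
    """Return True if the patient is classified as having appendicitis.

    had_appendectomy covers surgery at presentation or during the 30-day
    follow-up (counting follow-up surgery here is an assumption).
    """
    if had_appendectomy:
        if pathology_positive is None:
            raise ValueError("pathology result is required after appendectomy")
        return pathology_positive
    # No appendectomy within 30 days: assumed not to have appendicitis.
    return False


def two_by_two(records):
    """Tally (ultrasound_positive, has_disease) pairs into a 2x2 table."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for ultrasound_positive, has_disease in records:
        if ultrasound_positive:
            key = "TP" if has_disease else "FP"
        else:
            key = "FN" if has_disease else "TN"
        counts[key] += 1
    return counts


# Hypothetical patients: (ultrasound_positive, had_appendectomy, pathology_positive)
patients = [
    (True, True, True),
    (True, True, False),
    (False, True, True),
    (False, False, None),
    (True, False, None),
]
records = [(us, reference_diagnosis(surgery, pathology))
           for us, surgery, pathology in patients]
print(two_by_two(records))
```

Note that the ultrasound result is never an input to `reference_diagnosis`. That mirrors the decision to keep the surgeons unaware of the ultrasound result.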
Once you've determined that the authors used a reasonable reference standard, check to see whether the examiners doing the test were "blinded" to the reference standard result. As Goethe said, "Man sieht nur, was man weiß", which translates to "One only sees what one knows". Knowing that the reference standard test is positive makes it much more likely that we find the result of the test being studied positive, and vice versa!
Third, the decision to perform the reference standard should ideally be independent of the results of the test being studied. As we already learned above, though, that isn't always possible. If the reference standard is not applied independently of the test being studied, at least make sure that the authors follow up patients who don't get it, and have a good justification for designing their study this way.
Fourth, data should be collected prospectively whenever possible. This is especially true when studying the history and physical examination, since we all know how inaccurate the medical record is as a source of information about what physicians actually did! For example, if a physician didn't record that a patient with sore throat had an exudate, does that mean the patient really didn't have one, or that the physician didn't look, or that the patient did have one but the physician forgot to record it?
Finally, the diagnostic test should be applied to a reasonable number (at least 100 is a good rule of thumb) of patients with an appropriate "spectrum" of disease. That is, you want patients with mild, moderate, and severe disease, as well as patients with similar symptoms who don't have the disease in question.
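One way to see why the number of patients matters is to look at how precisely a study can estimate sensitivity. The sketch below computes a 95% Wilson score confidence interval for a proportion. The choice of the Wilson method and the example counts (18 of 20 versus 90 of 100 diseased patients with a positive test) are our own illustrative assumptions, not figures from any study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion.

    successes: number of positive results, n: number of patients tested,
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical studies that both observe 90% sensitivity.
for positives, diseased in [(18, 20), (90, 100)]:
    low, high = wilson_interval(positives, diseased)
    print(f"{positives}/{diseased}: 95% CI {low:.2f} to {high:.2f}")
```

Both hypothetical studies report the same point estimate, but the smaller one leaves a much wider range of plausible values for the true sensitivity.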
We've developed a form for you to use when evaluating articles about diagnosis. Go ahead and print it. Now, let's try using the form to evaluate a paper about diagnosis. Use the article by Williams on sinusitis (Williams JW, Simel DL, Roberts L, Samsa GP. Clinical evaluation for sinusitis: making the diagnosis by history and physical examination. Ann Intern Med 1992; 117: 705-10.)
Next, let's use the form to evaluate a secondary source of information.
"Secondary sources" are services that critically appraise recent literature and provide a synopsis. Examples include the ACP Journal Club and the Journal of Family Practice POEMs section. Take a look at some recent synopses here, and see if they answer your questions about the diagnostic test.