
* From the Harborview Medical Center and the Division of Pulmonary and Critical Care Medicine (Drs. Rubenfeld, Hudson, and Ms. Caldwell), University of Washington, Seattle, WA; Toronto Hospital and the Critical Care Medicine Program (Dr. Granton), University of Toronto, Toronto, Canada; and the Cardiovascular Research Institute (Dr. Matthay), University of California, San Francisco, CA.
See Appendix for a complete list of participants who read chest radiographs in the study.
Correspondence to: Gordon D. Rubenfeld, MD, MSc, Pulmonary & Critical Care Medicine, Harborview Medical Center, Box 359762, 325 9th Ave, Seattle WA 98104; e-mail: nodrog{at}u.washington.edu
| Abstract |
|---|
Objective: To study the interobserver variability in applying the AECC radiographic criterion for ALI-ARDS.
Design: Survey.
Participants: A convenience sample of 21 experts selected from participants attending the 1997 Toronto Mechanical Ventilation Workshop and from members of the National Institutes of Health ARDS Network.
Outcome measures: Participants reviewed 28 randomly selected chest radiographs from critically ill, hypoxemic (PaO2/fraction of inspired oxygen ratio, < 300) patients and decided whether each radiograph fulfilled the AECC definition for ALI-ARDS.
Results: Interobserver agreement in applying the AECC definition for ALI-ARDS was moderate (κ = 0.55; 95% confidence interval, 0.52 to 0.57). Thirteen radiographs (43%) showed nearly complete agreement (defined as 20 or 21 readers in agreement). Nine radiographs (32%) had five or more dissenting readers. The percentage of radiographs interpreted as consistent with ALI-ARDS by individual readers ranged from 36 to 71%. Participants commented that mild infiltrates, pleural effusions, atelectasis, isolated lower lobe involvement, radiographic technique, and overlying monitoring equipment posed the most difficulties.
Conclusions: The radiographic criterion used in the current AECC definition for ALI-ARDS showed high interobserver variability when applied by expert investigators in the fields of mechanical ventilation and ARDS. This variability may result in differences in ALI-ARDS populations at different clinical research centers and may make it difficult for clinicians to apply the results of clinical trials to their patients. Modifications to the radiographic criterion or annotated reference radiographs may improve the reliability of future definitions for ALI-ARDS.
Key Words: ARDS; chest radiography; interobserver variability; lung injury
| Introduction |
|---|
Critics have hypothesized that the variability in reports of the incidence, risk factors, and outcomes of ARDS was due in part to poorly characterized definitions and to heterogeneous patient populations.6 7 To address this problem and to set standard definitions, an American-European Consensus Conference (AECC) was convened in 1992.8 In their report, members of the conference defined ALI as the acute onset of arterial hypoxemia (PaO2/fraction of inspired oxygen [FIO2] ratio, ≤ 300), a pulmonary artery wedge pressure ≤ 18 mm Hg or no clinical evidence of left atrial hypertension, and bilateral infiltrates consistent with pulmonary edema on frontal chest radiograph. The authors specifically noted that the pulmonary infiltrates could be mild. ARDS is defined by the same criteria as ALI, but with more severe hypoxemia (PaO2/FIO2, ≤ 200). No radiographic distinction was made between ALI and ARDS, and therefore, for this report, we will refer to a common entity, ALI-ARDS.
Interobserver variability in the interpretation of diagnostic radiographs has been reported by a number of investigators. Evaluations of mammograms, ventilation-perfusion scans, and chest radiographs in cases of pneumonia and pneumoconiosis may demonstrate poor agreement between readers.9 10 11 We hypothesized that the AECC radiographic definition was not specific enough to lead readers to a reliable and reproducible interpretation of chest radiographs. Therefore, in applying the definition, we hypothesized that there would be a wide range of individual thresholds for determining that the infiltrates were consistent with pulmonary edema, and that interobserver variability would be high.
| Materials and Methods |
|---|
Eighteen participants read the radiographs at the Toronto meeting, and 3 others received the series of radiographs by mail. All readers who volunteered to participate in the study were included. Identical instructions, provided to each reader, stated:
"All radiographs were taken from intubated patients with PaO2/FIO2 < 300. Does this chest radiograph fulfill the AECC definition for ALI and ALI-ARDS, bilateral infiltrates consistent with pulmonary edema? Note that the American-European Consensus Conference definition specifically included mild and patchy infiltrates."
No clinical history or additional information was provided. No time constraint was placed on the readers. Responses reported as "positive" indicated that the chest radiographs fulfilled the definition of the AECC. The readers were asked to indicate aspects of the radiographs that made the definition difficult to apply to a specific radiograph.
Data were analyzed to determine the percentage of readers who interpreted each radiograph as positive and the percentage of radiographs read as positive or negative by each reader, and to measure interobserver variability (κ-statistic).13 The approximate normal test was used to test for statistical significance between κ-statistic values. All analyses were performed on a computer (IBM-PC; Danbury, CT) using appropriate software (SAS; SAS Institute; Cary, NC).
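The three quantities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which multi-reader κ variant was computed in SAS, so a Fleiss-type κ for two categories is assumed here, and the function names and the `reads[reader][radiograph]` layout (1 = positive, 0 = negative) are hypothetical.

```python
from typing import List, Sequence


def percent_positive_by_radiograph(reads: Sequence[Sequence[int]]) -> List[float]:
    """Percentage of readers who called each radiograph positive.

    reads[r][x] is 1 if reader r called radiograph x positive, else 0
    (hypothetical layout, not taken from the study).
    """
    n_readers = len(reads)
    n_films = len(reads[0])
    return [100.0 * sum(reads[r][x] for r in range(n_readers)) / n_readers
            for x in range(n_films)]


def percent_positive_by_reader(reads: Sequence[Sequence[int]]) -> List[float]:
    """Percentage of radiographs that each reader called positive."""
    return [100.0 * sum(row) / len(row) for row in reads]


def fleiss_kappa(reads: Sequence[Sequence[int]]) -> float:
    """Fleiss-type kappa for binary calls (assumed variant, see lead-in)."""
    n_readers = len(reads)
    n_films = len(reads[0])
    if n_readers < 2:
        raise ValueError("at least two readers are required")

    # Positive and negative calls per radiograph.
    pos = [sum(reads[r][x] for r in range(n_readers)) for x in range(n_films)]
    neg = [n_readers - p for p in pos]

    # Observed agreement: share of reader pairs that agree, averaged over radiographs.
    pairs = n_readers * (n_readers - 1)
    p_obs = sum((p * (p - 1) + n * (n - 1)) / pairs
                for p, n in zip(pos, neg)) / n_films

    # Chance agreement from the overall share of positive calls.
    share_pos = sum(pos) / (n_readers * n_films)
    p_exp = share_pos ** 2 + (1.0 - share_pos) ** 2
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when every call is identical
    return (p_obs - p_exp) / (1.0 - p_exp)
```

For the study design, `reads` would hold 21 rows of 28 calls each; the individual readings are not reproduced in the text, so no attempt is made here to recompute the reported values.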
| Results |
|---|
The percentages of readers who scored each radiograph as fulfilling the radiographic definition "bilateral infiltrates consistent with pulmonary edema" are listed in Table 1. The number of readers who agreed on the interpretation of chest radiographs varied. Thirteen interpretations of radiographs (8 read as consistent with ALI-ARDS and 5 as negative for ALI-ARDS) showed nearly complete agreement (0 or 1 dissenting reader). Nine interpretations of radiographs had five or more dissenting readers. The κ-statistic for interobserver agreement was 0.55 (95% confidence interval, 0.52 to 0.57). There was no statistically significant difference in agreement between the 7 NIH ARDS Clinical Trial Network readers and the 14 other readers. To evaluate whether digital imaging had an effect on agreement, we compared the κ-statistic values for the two radiographic techniques. Agreement on the analog radiographs was superior to that on the digital radiographs (κ-statistic, 0.72 vs 0.38, respectively; p < 0.0001).
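The formula behind the approximate normal test is not given in the text. One common form, assumed here, treats the two κ estimates as independent and approximately normal, so that z = (κ1 − κ2) / sqrt(SE1² + SE2²). The sketch below uses hypothetical function names, and the standard errors passed to it would have to come from the analysis itself, since the text does not report them for the analog and digital subsets.

```python
import math
from typing import Tuple


def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Back out a standard error from a symmetric normal-theory 95% interval."""
    return (upper - lower) / (2.0 * z_crit)


def kappa_difference_test(k1: float, se1: float,
                          k2: float, se2: float) -> Tuple[float, float]:
    """Approximate normal test for two independent kappa estimates.

    Returns (z, two-sided p). Independence of the two estimates is an
    assumption of this sketch, not something stated in the text.
    """
    z = (k1 - k2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

As a usage note, `se_from_ci(0.52, 0.57)` recovers the standard error implied by the reported overall interval, assuming that interval was built as κ ± 1.96 × SE.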
| Discussion |
|---|
The κ-statistic value of 0.55 indicates only moderate agreement.13 κ-statistic values in this range have raised concerns in the interpretation of mammograms, ventilation-perfusion scans, and chest radiographs in community-acquired pneumonia.10 14 15 There was full agreement on fewer than half of the radiographs. Chest radiographs that were interpreted consistently as positive were obtained from patients with dense alveolar infiltrates in four lung quadrants. Infiltrates limited to lower lung zones, atelectasis, small lung volumes, mild involvement, pleural effusions, and overlying monitoring devices all were identified as contributing factors for high variability of radiograph interpretations. There was a twofold difference in the positive radiograph rate between the reader least likely to call a radiograph positive (36%) and the reader most likely to call a radiograph positive (71%). This wide range in the percentage of radiographs that readers termed positive was not due to isolated outliers, but reflected a continuous distribution across all readers (Table 2).
These data provide empiric evidence for the concerns that have been raised regarding the reproducibility of the ALI-ARDS definition for different institutions.6 7 If the variability in radiographic interpretation causes variability in the clinical diagnosis of ALI-ARDS, this finding may account for some of the geographic and institutional variation in the incidence, risk factors, resource use, and outcome of ALI-ARDS.
Only one other published study has evaluated interobserver variability in ARDS chest radiographs.16 These investigators found excellent agreement between two radiologists and poor agreement between four nonradiologist clinicians in calculating a radiographic score in a series of chest radiographs from patients who had ARDS diagnoses. Ours is the first study to explore the reliability of an accepted consensus conference radiographic definition (as opposed to a scoring system) in a set of chest radiographs. In addition, the radiographs in our study were selected only on the basis of intubation and hypoxemia, therefore representing a broad range of radiographs, unlike the earlier study, in which radiographs were selected from patients who had already received diagnoses of ARDS. An additional strength of our study was the group of 21 international experts in ALI-ARDS and mechanical ventilation who were the participating readers. The previous study used readers from a single institution who interpreted the radiographs together. Our study sample may be more representative of interobserver variability. Finally, we report feedback from the readers on specific aspects of the radiographs that led to variability in interpretation; definitions can be modified based on these findings.
There was more agreement on the analog radiographs than on those taken with computed radiography. There are several possible explanations for this unexpected observation. All of the analog radiographs were selected from patients who were actually enrolled in a clinical trial, reflecting, therefore, only a subset of patients considered for inclusion. Thus, difficult radiographs may be underrepresented. If this is true, then our κ-statistic estimate of 0.55 overestimates the agreement that would be seen in a sample of radiographs evaluated in ALI-ARDS screening. It is possible that the digital radiographs were of poorer quality than the analog radiographs or that the smaller size of the digital radiographs made them more difficult to interpret. Participants may have lacked experience in evaluating digital radiographs. However, the seven readers from the ARDS Network reviewed digital radiographs from patients with ALI-ARDS at several planning sessions, and their agreement, as a group, on radiograph interpretations was no different from that of other readers. It is not possible to distinguish among these possibilities without studying the interpretation of digital and analog radiographs on the same patients.
There are several potential limitations to this study. We did not study the readers' accuracy in diagnosing ALI-ARDS using the entire AECC definition. This would have required a presentation of clinical data including onset, history, physical examination, and laboratory tests in the form of a vignette. Had we used vignettes with this information, the resulting agreement on the diagnosis of ALI-ARDS would have reflected our skills, or lack thereof, in abstracting clinical information and writing vignettes. Because we were interested specifically in the readers' agreement with each other in applying the radiographic definition, and because there is no "gold standard" with which to compare the readers' decisions, accuracy was not particularly relevant. The question was not whether the readers were right or wrong in some objective sense, but whether they agreed with each other in applying a standard radiographic definition.
Because any diagnostic test may perform poorly in a specific spectrum of cases, it is possible that the poor agreement in this study reflects the sample of radiographs.17 We tried to simulate the broad range of chest radiographs that would be encountered in screening patients for ALI-ARDS for enrollment in a clinical trial by using a sample of patients who were critically ill, intubated, and met the oxygenation criterion for ALI-ARDS. It is possible that our sample size of 21 readers and 28 radiographs was too small to estimate the true κ-statistic value. This uncertainty is reflected in the confidence intervals around the κ-statistic value, which exclude excellent agreement.
The κ-statistic is affected by the prevalence of positive readings in the sample. If, for example, the average reader had read 90% of the radiographs as positive or 90% as negative for ALI-ARDS, the κ-statistic might have appeared low when, in fact, considerable agreement existed among readers. However, this limitation does not apply to our study, because the wide variability among readers led to a broad range of positive readings, and the average prevalence was nearly optimal at 54% (Table 2).
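The prevalence effect can be seen in a small two-reader illustration. The sketch below uses Cohen's two-reader κ rather than the multi-reader statistic used in the study, purely because the arithmetic is easier to follow, and every count in it is hypothetical. Both tables have the same raw agreement (90 of 100 radiographs), yet κ is much lower when nearly all readings are positive.

```python
def cohen_kappa(both_pos: int, only_first_pos: int,
                only_second_pos: int, both_neg: int) -> float:
    """Cohen's kappa from a 2x2 table of paired binary readings."""
    n = both_pos + only_first_pos + only_second_pos + both_neg
    p_obs = (both_pos + both_neg) / n

    # Marginal positive rates for each reader.
    first_pos = (both_pos + only_first_pos) / n
    second_pos = (both_pos + only_second_pos) / n

    # Agreement expected by chance if the readers were independent.
    p_exp = first_pos * second_pos + (1.0 - first_pos) * (1.0 - second_pos)
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts, both with 90% raw agreement.
balanced = cohen_kappa(45, 5, 5, 45)  # each reader calls about 50% positive
skewed = cohen_kappa(85, 5, 5, 5)     # each reader calls about 90% positive
```

With these made-up counts, `balanced` works out to 0.8 and `skewed` to 4/9 (about 0.44), which is the pattern the paragraph describes: identical raw agreement, lower κ at extreme prevalence.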
Some aspects of the chest radiograph presentation process may have contributed to the level of observed agreement. Serial radiographs were not available for review as they might be in clinical practice, and such review could have improved apparent agreement on radiographs with pleural effusions or overlying monitoring devices. In addition, we chose to study the readings of pulmonary and critical care physician experts rather than radiologists. We consider it a unique opportunity to have studied their performance; however, it is possible that a group of radiologists would have interpreted the chest radiographs with less variability. Because the diagnosis of ALI-ARDS and the decision to enroll in clinical trials are evaluations frequently made by a clinician at the bedside, we believe the study participants were a valid choice to address the research question. We did not proctor the readers when they interpreted radiographs. However, any collaborative reading would only have biased the study toward finding a higher level of agreement than actually exists.
To reduce interobserver variability, future panels charged with revising the definition for ALI-ARDS should consider the issues raised by this study. An annotated set of training radiographs clarifying the interpretation of the difficult radiographic patterns identified by our readers might be a useful adjunct to a written definition. The effect of analog vs digital technique on agreement needs to be further evaluated. Modifications to the definition may also improve agreement. For example, a "negative" definition that specifies which radiographic patterns are inconsistent with ALI-ARDS may lead to greater agreement than the current version. Specific instructions to interpret radiographs strictly by the definition, even if it results in positive radiographs that the reader might personally consider negative for ALI-ARDS, may facilitate consistent readings. It is interesting to note that the ARDS Network investigators, who have read chest radiographs together as part of clinical trial planning, interpreted radiographs no more consistently as a group than other participants. Group reading exercises may be insufficient to ensure agreement, and a modified definition or example radiograph may be necessary. Finally, it is important that definitions proposed by consensus panels be empirically evaluated for interobserver agreement by readers who will be using the definition.
This study has important implications for consensus panels charged with defining critical care syndromes in general and for the interpretation of clinical trials. Sepsis syndrome, multiple organ dysfunction syndrome, and ARDS are diagnosed on the basis of operational definitions proposed by experts.8 18 19 These definitions frequently make clinical sense and therefore seem valid. However, they are rarely subjected to empiric testing to evaluate their reliability. As we have shown, interobserver variability may be high, particularly with regard to radiographic or clinical features that are difficult to standardize among clinicians. Because the accuracy of critical care syndrome definitions cannot be verified, as accepted "gold standard" diagnostic tests do not exist, it is particularly important that future critical care syndrome definitions demonstrate their reliability. It is important to appreciate that the absence of effective therapies for ALI-ARDS limits the clinical consequences of its diagnosis. Therefore, the findings of this study are largely a challenge to the research community. However, when effective treatments are found, their applicability at the bedside will depend on clinicians' abilities to identify and treat patients similar to those enrolled in the clinical trials.12 20 If the efficacy of an ALI-ARDS therapy depends, in part, on radiographic aspects of the syndrome, and if clinicians cannot identify similar patients because of variability in applying the definition, then the treatment will not be as effective in their patients as in the clinical trial subjects. Before we can expect clinicians to consistently identify those chest radiographs that meet criteria for ALI-ARDS, tools should be developed to help experts apply the definition consistently.
| Appendix |
|---|
| Footnotes |
|---|
Supported by NIH grants SCORHL96014 (Drs. Rubenfeld, Hudson, and Ms. Caldwell) and RO1HL51856 (Dr. Matthay).
Received for publication January 11, 1999. Accepted for publication May 5, 1999.
| References |
|---|