Dr. Wood is the Director of the Trauma and Life Support Center and the Assistant Professor of Internal Medicine; Dr. Coursin is the Associate Director of the Trauma and Life Support Center and the Professor of Anesthesiology and Internal Medicine, University of Wisconsin Hospitals and Clinics; and Dr. Grounds is the Consultant Anaesthetist/Intensivist at St. Georges Hospital.
Correspondence to: Douglas B. Coursin, MD, Professor of Anesthesiology and Internal Medicine, B6/319 UW CSC, Madison, WI 53792-3272; e-mail: dcoursin@facstaff.wisc.edu
It has been suggested that physicians should use statistics "as the drunken man uses the lamppost, for support rather than illumination."1 In this issue of CHEST (see page 802), Pappachan and colleagues report on their attempt to perform a validation study of APACHE III (acute physiology and chronic health evaluation) in 17 general adult ICUs in the South of England (hospital size range, 300 to 800 beds). The significantly high standardized mortality ratio (SMR = observed mortality/predicted mortality) in the UK hospitals raises the question of whether a sobering wake-up call to a health-care system has been sounded or whether the statistical foundation upon which the lamppost is situated needs to be stabilized and reevaluated. Although the use of SMR as a measure of ICU performance is debatable,2,3,4,5 the significant excess in mortality beyond the severity-stratified predictions warrants close examination and explanation.
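For readers less familiar with the SMR, the arithmetic can be sketched as follows. This is a minimal illustration only: the function name and the eight-patient cohort are hypothetical, and the predicted risks are invented numbers, not output from the proprietary APACHE III equations.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """Observed deaths divided by expected deaths.

    died           -- 1 if the patient died in hospital, 0 otherwise
    predicted_risk -- model-predicted probability of death for each patient
    Expected deaths are the sum of the individual predicted risks.
    """
    if len(died) != len(predicted_risk):
        raise ValueError("one outcome and one predicted risk per patient")
    observed = sum(died)
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Hypothetical cohort (assumption for illustration, not study data).
died = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_risk = [0.60, 0.10, 0.05, 0.40, 0.10, 0.50, 0.15, 0.10]

smr = standardized_mortality_ratio(died, predicted_risk)
print(f"SMR = {smr:.2f}")
```

Here three patients died against roughly two expected deaths, so the SMR is about 1.5. An SMR above 1 means that more patients died than the model predicted; it does not by itself say whether the care or the model is at fault.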
As with other external validation studies of general ICU scoring systems,6,7,8 APACHE III showed good discrimination (ability to assign a higher probability of mortality to patients who die) but poor calibration (correspondence between the estimated probability and the actual mortality), indicating poor fit. Accordingly, the APACHE III risk predictions were consistently lower than the actual mortality. This excess of observed over predicted mortality was evident in virtually all risk groups and was particularly striking in the low-risk groups. According to APACHE III, the admission of this group to US hospitals would have resulted in 25% less mortality, the equivalent of a 747 jet crashing in southern England annually. If APACHE III is as robust as its developers and entrepreneurial marketing experts would lead us to believe, one could only conclude that the care provided to the population in southern England is inferior to that in the US. With candid honesty, the authors readily acknowledge this point and thoroughly review the recognized differences and perceived shortcomings in their system. These include the following: less resource allocation; previous failure to recognize critical care as a specialty; fewer ICU directors and dedicated training programs; the logistics of refusal/denial of admissions to the ICU; early ICU discharge with increased readmission rates; and the high requirements for interhospital transfer of critically ill patients (7.7% in the UK vs 2.3% in the US). Therefore, it is quite possible that the standard of care in major ICUs in the US is superior to that in the UK. Alternatively, the black box of APACHE III,9 which forms the foundation upon which the lamppost is built, may be unstable or inaccurate when applied cross-culturally.
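The distinction between discrimination and calibration can be made concrete with a small sketch. It assumes a hypothetical eight-patient cohort with invented risks, measures discrimination as the area under the ROC curve computed by pairwise comparison, and uses a simple observed-versus-expected tally by risk band as a stand-in for a formal goodness-of-fit test; it is not necessarily the procedure used in the study under discussion.

```python
def auc(died, risk):
    """Discrimination: probability that a randomly chosen non-survivor was
    given a higher predicted risk than a randomly chosen survivor.
    Ties count as one half."""
    pos = [r for d, r in zip(died, risk) if d == 1]
    neg = [r for d, r in zip(died, risk) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_by_band(died, risk, edges=(0.0, 0.2, 0.5, 1.0)):
    """Calibration: for each risk band, return
    (lower, upper, patients, observed deaths, expected deaths)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # The top band is closed on the right so a risk of 1.0 is counted.
        idx = [i for i, r in enumerate(risk)
               if lo <= r < hi or (hi == edges[-1] and r == hi)]
        rows.append((lo, hi, len(idx),
                     sum(died[i] for i in idx),
                     sum(risk[i] for i in idx)))
    return rows


# Hypothetical data (assumption): risks rank patients well but are too low.
risk = [0.05, 0.05, 0.10, 0.15, 0.30, 0.30, 0.40, 0.60]
died = [0, 0, 0, 1, 0, 1, 1, 1]

auc_value = auc(died, risk)
rows = calibration_by_band(died, risk)
print(f"AUC = {auc_value:.3f}")
for lo, hi, n, obs, exp in rows:
    print(f"risk {lo:.1f}-{hi:.1f}: n={n}, observed={obs}, expected={exp:.2f}")
```

In this invented cohort the model ranks patients well (an AUC of about 0.91), yet observed deaths exceed expected deaths in every risk band. That is the pattern described above: good discrimination alongside poor calibration.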
Selection bias and case mix variability, lead time bias, and methodologic problems are criticisms of APACHE III that potentially render it unreliable for ICU performance comparison.2,10,11 Selection bias and case mix variability refer to the potential differences between the original database and the population in question. Insofar as APACHE III is newer, expensive, and not frequently reported, a review and comparison of reported issues with APACHE II is also instructive. The APACHE II database was not broad enough to allow for accurate predictions in specific disease groups.12,13,14,15,16 Despite the expanded database in APACHE III, it cannot predict mortality within a specific disease group. Multiple disease groups (424) are combined into the 78 disease categories for which predictive equations exist. Therefore, one can compare only severity, not outcome, among patients within the same disease group.
The published APACHE III database is derived from 17,440 patients at 40 US hospitals; 14 were tertiary care centers, and 26 were randomly chosen hospitals with over 200 beds, half of which had medical school affiliations. Of the 12,793 patients evaluated in the present UK study, 94% were from district general hospitals and only 6% were from one teaching center, the size of which was not reported. The case mix in the United Kingdom was also significantly different: older men; greater comorbidity; increased incidence of transfer into the ICU from the hospital wards; fewer patients directly admitted from the ED; and fewer elective, but more emergency, surgical patients. The relative weights of these factors in the predictive linear regression APACHE III equations are for all practical purposes in the proprietary domain of the for-profit APACHE Medical Systems company, and consequently, true investigative comparison is virtually impossible. Lead time bias and its analogue, pre-ICU treatment bias, can significantly contribute to the underestimation of mortality, as was evident in APACHE II.17,18,19,20,21 APACHE III allegedly corrects for this shortcoming by accounting for the patient's pre-ICU location, although the actual statistical weight remains unpublished and unknown. As a greater number of UK patients originated from the ward, one can only speculate that the APACHE II shortcomings persist. Lastly, errors in diagnostic labeling and data collection remain problematic. APACHE III requires a single diagnosis for each patient (424 diseases placed in 78 disease categories), and disease labeling in the United Kingdom may differ from that in the United States. This could significantly affect the mortality prediction. Accurate and reliable data collection is crucial to the success of any predictive system.
In APACHE II, there was a high degree of interobserver reliability when trained and dedicated data collectors were used.22 In this era of cost containment, few hospitals can afford such a luxury. When data collection is accomplished by registered nurses and residents, the error rate for the principal diagnostic category was between 9% and 18%, and there was a clinically significant lack of agreement once predicted mortality exceeded 20%.23 As APACHE III retains this essential data collection plus five new physiologic variables, one can only wonder whether the error is further magnified, particularly in this study, in which no attempt was made to correct "illogical, extreme or unlikely values." In our unit (University of Wisconsin multidisciplinary 24-bed unit), because of increasing cost containment strategies, 30% of patients do not have all of the reference laboratory values (eg, albumin and bilirubin) required for the accurate generation of an APACHE III score.
Despite the preceding criticisms, previous applications of the US-derived APACHE database to international populations have shown reasonable correlation,24,25,26,27 although results from the United Kingdom have been mixed.8,28,29 An increased SMR in Brazil30 and Tunisia6 was ascribed to lack of technology and senior physicians and nurses, respectively. As such, it is crucial to reconcile the data reported in this issue of CHEST by Pappachan and coworkers. A desirable characteristic of risk-adjusted morbidity predictors is that they be "open to inspection and testing,"31 so prior to offering a sobering indictment of a health-care system, it would be prudent to inspect the black box foundation upon which the statistical lamppost is secured. To do less would only provide validation that entrepreneurial investigation is supplanting scientific investigation in critical care medicine.
References