Durham, NC
Correspondence to: Neil MacIntyre, MD, FCCP, Duke University Medical Center, PO Box 3911, Durham, NC 27710; e-mail: neil.macintyre{at}duke.edu
Pulmonary function testing has been a mainstay in guiding patient care and assessing clinical trials for over a century. The most commonly used test is FEV1, which assesses how respiratory mechanical function responds to various disease states, medications, and environmental exposures. More recently, diffusing capacity of the lung for carbon monoxide (DLCO) has been used to assess the response of the alveolar-capillary interface to a variety of injuries to the lung parenchyma and vascular bed.
As with any clinical test, it is important to understand the signal-to-noise characteristics of pulmonary function tests. The signals are the values or the changes in values that are of interest to clinicians and/or investigators. The noise is all the factors that produce variability and can cloud or mask the signal. In pulmonary function testing, the sources of noise can be either technical (eg, instrument, software, technician performance) or biological (eg, activity, endocrinologic state, comorbidities, posture). A low signal-to-noise ratio can come from either a weak signal or overwhelming noise. With a low signal-to-noise ratio, separating signal from noise can be very difficult; depending on the interpretation cutoff chosen, large numbers of false-negative (missed signals) or false-positive (erroneous signals) results can occur.
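The trade-off between false-positive and false-negative results at a given cutoff can be illustrated with a minimal sketch. It assumes the noise is Gaussian with a known SD and considers only an increase in the measured value; the function name `error_rates` and all numbers are hypothetical and are not taken from the studies discussed here.

```python
from statistics import NormalDist

def error_rates(signal, noise_sd, cutoff):
    """Return (false_positive_rate, false_negative_rate) for a one-sided cutoff.

    Assumption: observed change = true change + Gaussian noise with SD noise_sd.
    """
    # No true change, but the noise alone pushes the observed change past the cutoff.
    false_positive = 1 - NormalDist(0, noise_sd).cdf(cutoff)
    # A true change of size `signal` exists, but the observed change stays below the cutoff.
    false_negative = NormalDist(signal, noise_sd).cdf(cutoff)
    return false_positive, false_negative

# Hypothetical example: true change of 10 units, noise SD of 5 units.
strict_fp, strict_fn = error_rates(signal=10, noise_sd=5, cutoff=10)
lenient_fp, lenient_fn = error_rates(signal=10, noise_sd=5, cutoff=5)
print(f"cutoff 10: FP={strict_fp:.3f}, FN={strict_fn:.3f}")
print(f"cutoff  5: FP={lenient_fp:.3f}, FN={lenient_fn:.3f}")
```

Under these assumptions, lowering the cutoff reduces missed signals at the cost of more erroneous ones, which is the dependence on the interpretation cutoff described above.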
The American Thoracic Society (ATS)/European Respiratory Society (ERS) pulmonary function task force recently published standards (references 1-3) for pulmonary function testing and addressed the issue of interpretation using the available published evidence. Their recommendations are based on the coefficient of variation (CV [SD divided by the mean of multiple measurements]) of the reference or baseline value. A range of two times the CV on either side of the mean encompasses approximately 95% of those measurements (95% confidence interval [CI]). For a single determination of normal/abnormal in an individual patient, an abnormal test result is defined as being outside of the 95% CI from a reference population. For a change in a measurement either over time or as a response to an intervention, the ATS/ERS group has made recommendations based on published week-to-week CVs for various tests. In an individual, the ATS/ERS states that the FEV1 would have to change 12 to 15% to be considered a "signal" outside the range of "noise." For the DLCO, they state that a change would have to be > 6 mL/min/mm Hg to be considered significant. Importantly, smaller signals than these can be detected in clinical trials or population studies by making measurements on multiple patients (sample size) to create a 95% CI around the signal.
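As a minimal sketch of the CV arithmetic described above, the following computes the CV from repeated measurements and the range of two CVs on either side of the mean. The FEV1 values are made-up illustrative numbers, and the helper names are hypothetical; treating mean ± 2 SD as covering about 95% of measurements assumes approximately normally distributed values.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV (sample SD divided by the mean) as a fraction."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return sd / mean

def approx_95_range(values):
    """Return (low, high) = mean -/+ 2 * CV * mean, ie, mean -/+ 2 SD."""
    mean = statistics.mean(values)
    half_width = 2 * coefficient_of_variation(values) * mean
    return mean - half_width, mean + half_width

# Hypothetical repeated FEV1 measurements in liters (illustrative only).
fev1_l = [3.10, 3.25, 3.05, 3.30, 3.20]
cv = coefficient_of_variation(fev1_l)
low, high = approx_95_range(fev1_l)
print(f"CV = {cv:.1%}, approximate 95% range = {low:.2f} to {high:.2f} L")
```

A later measurement falling outside this range would be a candidate "signal"; one falling inside it cannot be distinguished from noise on the basis of a single test.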
In this issue of CHEST (see pages 388 and 396), Jensen and colleagues (references 4 and 5) add significantly to our understanding of pulmonary function testing variability, and go on to demonstrate how this variability can have a profound effect on clinical trial design. In essence, they took five new, state-of-the-art pulmonary function devices and carefully separated instrument variability from clinical variability. Instrument variability was assessed using an ATS waveform generator for FEV1 and a prototype DLCO simulator for diffusing capacity. They then used these same instruments to perform repeated testing on normal human subjects over several weeks, which yielded the total variability. The instrument variability was then subtracted from this total variability to determine the patient or clinical variability.
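One common way to perform such a subtraction is on the variance scale, which is sketched below. This is an assumption for illustration, not necessarily the method Jensen and colleagues used: it presumes the instrument and clinical sources of variability are independent, so that their variances (and, at a common mean, their squared CVs) add. The function name and input values are hypothetical.

```python
import math

def clinical_cv(total_cv, instrument_cv):
    """Estimate the clinical (patient) CV from total and instrument CVs.

    Assumption: independent sources, so total_cv**2 = instrument_cv**2 + clinical_cv**2.
    """
    residual = total_cv**2 - instrument_cv**2
    if residual < 0:
        raise ValueError("instrument CV exceeds total CV; cannot subtract")
    return math.sqrt(residual)

# Hypothetical example: total CV of 5% with an instrument CV of 3%.
example = clinical_cv(0.05, 0.03)
print(f"clinical CV = {example:.1%}")
```

Note that under this assumption the components do not subtract linearly: a 5% total CV with a 3% instrument CV leaves a 4% clinical CV, not 2%.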
In general, the technical performance of all five devices was quite good, with FEV1 (and vital capacity) performance well within ATS/ERS guidelines. One device appeared to underestimate some of the mid-expiratory flow measurements, a result that may be related to how a specific analysis algorithm performed with some of the simulated flows. An important limitation in generalizing these results, of course, is that only one device from each manufacturer was tested. Moreover, the testing was done in a very skilled laboratory, with very skilled technicians and very motivated subjects. Thus, these results probably represent the best achievable performance, and testing in the general population would likely produce wider variability.
The CVs for the FEV1 and DLCO with repeated patient testing (5.12 to 8.48% for FEV1; 9.86 to 19.66% for DLCO) were lower than the week-to-week variability cited by the ATS/ERS review (reference 1). Interestingly, the patient contribution to the variability was generally on the same order of magnitude as the technical contribution. However, the clinical contribution to variability was proportionally higher than the instrument contribution with FEV1 but less than the instrument contribution with DLCO.
There are implications from these studies for clinicians, who are usually asking one of two questions of pulmonary function testing: Is the patient outside the normal range? Has function changed? The Jensen results suggest that the instrument variability of the current generation of pulmonary function devices is as good as or better than that of the earlier devices that were used to generate the commonly used reference equations. Importantly, the lower variability for FEV1 and DLCO over multiple weeks observed by Jensen et al (references 4 and 5) compared to the ATS/ERS review (reference 1) suggests that smaller changes over time than those recommended by the ATS/ERS might be considered significant in quality laboratories such as the one in this study.
There are implications from these studies for clinical researchers and trial designers as well. Researchers are usually asking one question: Does the intervention being studied affect pulmonary function? An important signal in an individual, however, may not be outside the testing variability or noise. To address this, multiple, repeated measurements are required in order to characterize the signal variability and statistically compare it to the baseline. In a clinical trial, these repeated measurements, usually in multiple patients, are termed the sample size. In general, the smaller the signal or the larger the noise, the more repeated measurements are needed and the larger the sample size required to detect the signal. The Jensen analysis puts this into context for studies using FEV1 and DLCO. From Figure 6 of reference 5, it can be seen that severalfold increases in sample size may be required to detect a signal when using instruments and procedures with large variability characteristics. The Jensen results also emphasize that standardizing equipment and procedures in a clinical trial can help reduce interdevice and interlaboratory sources of variability.
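The dependence of sample size on noise can be sketched with the textbook normal-approximation formula for comparing the means of two parallel groups. This is offered only as an illustration under stated assumptions; it is not the analysis of Jensen and colleagues and does not reproduce their Figure 6. The function name, the 5% significance level, the 80% power, and the example CVs are all hypothetical choices.

```python
import math
from statistics import NormalDist

def patients_per_group(cv, signal_fraction, alpha=0.05, power=0.80):
    """Approximate patients per group needed to detect a mean difference.

    cv: measurement CV as a fraction of the mean (noise).
    signal_fraction: difference to detect, as a fraction of the mean (signal).
    Assumption: two parallel groups, equal variance, normal approximation,
    two-sided test: n = 2 * (z_alpha + z_power)**2 * (cv / signal)**2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * (cv / signal_fraction) ** 2
    return math.ceil(n)

# Hypothetical example: detect a 5% change with a 5% CV versus a 10% CV.
n_low_noise = patients_per_group(cv=0.05, signal_fraction=0.05)
n_high_noise = patients_per_group(cv=0.10, signal_fraction=0.05)
print(n_low_noise, n_high_noise)
```

Because the required sample size scales with the square of the noise-to-signal ratio in this approximation, doubling the CV roughly quadruples the number of patients needed, which is consistent with the severalfold differences noted above.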
These two studies by Jensen et al (references 4 and 5) are important reminders to both clinicians and researchers that pulmonary function testing still has considerable noise, and that the best chance at finding a signal is to minimize that noise. These articles are both reassuring and sobering. It is reassuring to know that the modern devices can meet standards reasonably well and that good technicians and well-motivated subjects can keep testing noise to relatively low levels. It is sobering, however, to realize that there is still significant noise around these signals and that this noise can impair our ability to find important clinical signals both in individuals and in populations in clinical trials.
Footnotes
Dr. MacIntyre is Professor of Medicine, Duke University Medical Center.
Dr. MacIntyre serves as a consultant to Viasys Health Care.