* From the Pulmonary Divisions, LDS Hospital and University of Utah School of Medicine, Salt Lake City UT.
The Family Heart Study is a National Institutes of Health-funded multicenter study.
Correspondence to: Robert L. Jensen, PhD, Pulmonary Division, LDS Hospital, Salt Lake City, UT 84143; e-mail: Ldrjens1{at}ihc.com
Abstract
The purpose of the study was to determine the best surrogate for FVC when performing spirometry to detect obstruction or restriction. Volume-time curves from 3,539 participants in the Family Heart Study with acceptable quality test sessions were analyzed. An index of the variability of each timed volume (FEVx) from 1 to 12 s was determined for each subject. The least within-test session variability was seen for forced expired volume in 6 s (FEV6) and forced expired volume in 7 s (for both, mean range was 95 mL). The sensitivity and specificity for detecting obstruction and restriction when substituting the FEV6 for the FVC were then determined before and after allowing for measurement errors of 50 mL (approximately the lower limit of a spirometer's ability to detect volume). Sensitivity was 76% before the 50-mL error analysis and 95% after. Specificity was 98% before the 50-mL error analysis and 99.5% after. We conclude that use of FEV6 to replace the FVC for spirometry testing will result in improved reproducibility, with no significant loss of sensitivity or specificity, after allowing a 50-mL measurement error, for detecting obstruction or restriction.
Key Words: airway obstruction; FEV1; spirometry
Since the development of modern spirometry, the use of vital capacity, FVC, and FEV1 has been standard practice. A low FEV1/FVC ratio is the current standard for determining the presence of airway obstruction, and the degree of reduction in FEV1 is used to determine its severity.1 When only spirometric variables are available, FVC may be used to suggest the presence of restriction.1 One major reason behind the utility of these spirometric parameters is their relatively good reproducibility within an individual, making it possible to measure change over time and to make reference value comparisons with confidence.
Epidemiologic studies2,3 have shown spirometry measurements to correlate with a number of morbid and comorbid events. A statement from the National Lung Health Education Program4 proposed expansion of spirometry into primary care offices to detect early COPD. As the use of spirometry expands from hospital- or clinic-based pulmonary function laboratories into settings with less training and experience, the level of intraindividual variability ("noise") is likely to increase, and we can expect increased misclassifications as well as reduced ability to detect change over time.
One strategy to minimize the rate of false-positive results is to use parameters with the least amount of intraindividual variation. When the noise of the spirometry signal is high, and an individual's measured values are close to a lower limit, the interpreter should have less confidence as to which side of the classification threshold the subject lies. Parameters that have less variability (noise) will allow an interpreter to more confidently classify values close to thresholds. Take for example a subject with a measured FEV1 of 1.65 L, which is 77% of predicted. If the lower limit of normal (LLN) was 79% for this subject, a simple classification would tag the subject as abnormal. However, if the noise in the test measurement was ± 4%, the classification would not be so evident, whereas if the noise was ± 1.5% the classification would seem more certain.
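The threshold example above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and treating the noise as a symmetric band of percentage points around the measured percent-predicted value is an assumption, not a method taken from this study.

```python
def classify_with_noise(measured_pct, lln_pct, noise_pct):
    """Classify a percent-predicted value against the LLN, given a noise band.

    Hypothetical helper: noise is assumed to be +/- noise_pct percentage
    points around the measured value.
    """
    low = measured_pct - noise_pct
    high = measured_pct + noise_pct
    if high < lln_pct:
        return "abnormal"    # the whole band lies below the LLN
    if low >= lln_pct:
        return "normal"      # the whole band lies at or above the LLN
    return "uncertain"       # the band straddles the LLN


# Example from the text: FEV1 at 77% of predicted, LLN at 79% of predicted.
wide = classify_with_noise(77.0, 79.0, 4.0)     # band 73 to 81 straddles 79
narrow = classify_with_noise(77.0, 79.0, 1.5)   # band 75.5 to 78.5 is below 79
print(wide, narrow)
```

With the wider band the classification cannot be made with confidence, whereas the narrower band places the subject below the LLN.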
The National Lung Health Education Program4 recommended the replacement of the FVC with forced expired volume in 6 s (FEV6) to simplify spirometric testing. FEV6 was probably chosen because it was the minimum expiratory time recommended by the American Thoracic Society (ATS) and reference values were available. We examined intraindividual variability in spirometric tracings from the Family Heart Study (FHS) in order to determine which timed volume (FEVx) candidates would be a good surrogate for FVC. After candidates were selected, their performance was compared to the traditional, "gold standard" FEV1/FVC and FVC in diagnosing airway obstruction and restriction by spirometry.
Materials and Methods
Spirometry was performed in 4,827 subjects recruited for the FHS, a community-based, multicenter study5 of genetic and nongenetic determinants of coronary heart disease. Data were collected at four field centers (Framingham, MA; Minneapolis, MN; Forsyth County, NC; and Salt Lake City, UT). Spirometric data were obtained by trained technicians using the 1987 ATS standards.6 Each participant performed five to eight spirometric maneuvers using a water-sealed spirometer connected to a linear potentiometer (Survey II; W.E. Collins; Braintree, MA). The potentiometer signal was analyzed using software running on a personal computer (S&M; Quakertown, PA). An electronic sensor measured spirometer temperature for each waveform to allow automated correction of measured values to body temperature and pressure, saturated conditions.7 Both raw curve data and calculated data were stored to floppy disks for long-term storage.
Copies of the floppy disks were sent to a reading center (Salt Lake City, UT) to be scored for quality. For each test, overall quality scores similar to academic scores (A to F) were assigned for flow and volume by two of the authors (R.L.J. or R.O.C.). Volume scores were assigned by reviewing both FVC and FEV1 data and tracings. The flow scores were assessed after reviewing the flow volume tracings and peak flow data. A score of "F" indicated that the study failed ATS criteria and the reviewer judged that there were no usable data. A score of "D" indicated ATS criteria were not met but the reviewer judged that some data were usable. For example, a test might have a good initial effort and very short expiratory time, so that FEV1 would be usable but FVC would not. A score of "C" indicated that ATS criteria for acceptability and reproducibility were met. Scores of "A" and "B" were assigned subjectively, indicating significant improvement in effort, duration, or reproducibility over the minimum ATS criteria. Quality scores and suggestions for improvement were regularly returned to the technicians and supervisors at the field centers. The scoring of the data occurred during the collection of the FHS (1993 to 1995). The present study was initially proposed in 1997 to the FHS as an abstract presentation, and later in 1999 as a manuscript. The quality scores have not been reassessed, and the scorers had no knowledge of the present study at the time of the quality scoring.
Data for Analysis of Intraindividual Variability
At the reading center, raw tracings and computed data with quality scores were transferred to a database for analysis. Volume for each individual curve was measured at 100 samples per second (10 ms). It was only necessary to examine data at 250-ms intervals because no consequential changes in the variability parameters were observed between adjacent 10-ms data. Analysis was started at time zero (determined from the back-extrapolated volume) and continued every 250 ms thereafter up to 12 s (Fig 1).
10 s (defined as time to FVC); and (3) measured FVC from 1.0 to 9 L. The 10-s limit was selected so that the variability index of each FEVx could be examined when most individuals reach their FVC and all subjects had FEV6 data. The 10-s exhalation time may limit the data set because some individuals did not have an exhalation time of 10 s. From the 4,827 subjects, 3,539 subjects (73%) met inclusion criteria. This defined group 1 and did not restrict the data to meet ATS reproducibility criteria. Group 1 was defined in this way so that no bias on the range of variability would be introduced. The three best curves for each individual (defined as having the three highest sums of FVC plus FEV1) were selected for analysis. A second group, group 2, was created from group 1 with the additional requirement that the tests meet ATS reproducibility criteria of ± 200 mL and that their flow or volume quality scores were A, B, or C. Group 2 contained 3,398 subjects. Group 2 was created so that categorizations of obstruction would be reliable and based only on good quality tests.
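The selection of the three best curves described above can be sketched as follows. The dictionary keys `fvc` and `fev1` and the function name are hypothetical, chosen only for illustration; the study's own software is not described at this level of detail.

```python
def best_three(curves):
    """Return the three curves with the highest FVC + FEV1 sums.

    `curves` is a list of dicts with hypothetical keys "fvc" and "fev1",
    both in liters. Ties keep their original order (sorted() is stable).
    """
    ranked = sorted(curves, key=lambda c: c["fvc"] + c["fev1"], reverse=True)
    return ranked[:3]


# Made-up session of five maneuvers (not data from the study).
session = [
    {"fvc": 4.00, "fev1": 3.00},
    {"fvc": 4.20, "fev1": 3.10},
    {"fvc": 3.50, "fev1": 2.80},
    {"fvc": 4.10, "fev1": 3.05},
    {"fvc": 2.00, "fev1": 1.90},   # a spurious low effort, dropped by the ranking
]
best = best_three(session)
print([c["fvc"] for c in best])
```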
Statistical Methods
Analysis of Reproducibility:
Data from group 1 were used to analyze variability across the exhalation time. For each individual, variability was estimated as the range of the three volume points at each 250-ms time interval (range of FEVx, where x is the time point). Figure 1 illustrates the computation for one time point for one subject. Only subjects who had at least three acceptable curves were included, and only data from the best three curves were used to calculate the range. This prevented spurious efforts with very low FVCs from being included in the analysis. At each 250-ms time point, the average range of FEVx for all subjects was plotted to illustrate overall variability as a function of expiratory time up to 12 s (Fig 2). We chose to analyze the data by range because the ATS and now the ATS/European Respiratory Society (ERS) standards emphasize that meeting repeatability is in fact meeting a difference criterion.8 This also reflects the way most pulmonologists, technicians, and manufacturers approach the quality of a spirometry test; further, error codes and computer warnings are based on differences.
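The range-based variability index described in this section can be illustrated with a short sketch. The data layout (each curve as a list of volumes sampled every 250 ms) and the function names are assumptions made for the example, and the volumes are made up.

```python
def fevx_range(curves, t_index):
    """Range (max - min) of volume across one subject's curves at one time point."""
    volumes = [curve[t_index] for curve in curves]
    return max(volumes) - min(volumes)


def mean_range_by_time(subjects):
    """Average the per-subject FEVx range at each time point.

    `subjects` is a list of subjects; each subject is a list of three curves;
    each curve is a list of volumes (L) sampled every 250 ms from time zero.
    All curves are assumed to have the same number of samples.
    """
    n_points = len(subjects[0][0])
    means = []
    for t in range(n_points):
        ranges = [fevx_range(curves, t) for curves in subjects]
        means.append(sum(ranges) / len(ranges))
    return means


# Toy data: two subjects, three curves each, three time points.
subjects = [
    [[0.0, 2.0, 3.0], [0.0, 2.1, 3.1], [0.0, 1.9, 3.05]],
    [[0.0, 1.5, 2.5], [0.0, 1.6, 2.5], [0.0, 1.5, 2.6]],
]
means = mean_range_by_time(subjects)
print(means)
```

Plotting `means` against time would give a curve of the kind the text describes, with the time point of minimum mean range identifying the least variable FEVx.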
Analysis of Diagnostic Categorizations
Using the subjects in group 2, each subject was classified as normal, obstructed, or restricted, using the Third National Health and Nutrition Examination Survey (NHANES III) reference equations.9 These equations provide predicted values and LLNs for FVC, FEV6, FEV1/FVC, and FEV1/FEV6 that are based on statistical lower limits. At present, FEV6 is the FVC surrogate candidate for which reference data are available. Hankinson reference data were chosen because they are based on a random sample of the general population of the United States, the measurements were made with good quality control, and equations are provided for three ethnic groups.9 Each subject's FEV1/FVC and FEV1/FEV6 ratios were calculated using the highest FEV1, FEV6, and FVC from acceptable curves. Each subject was then classified as obstructed or not obstructed using each of these two ratios and the Hankinson LLN calculated for each subject's age, height, gender, and race. Obstruction was defined as present if either measured ratio fell below the lower limit. The severity of obstruction was graded according to ATS recommendations.1
In a similar manner, a restrictive pattern was defined as either a low FVC or FEV6 in the absence of obstruction. For each comparison, a 2 x 2 table was constructed. Sensitivity and specificity and their 95% confidence intervals (CIs) were calculated. Defining restriction based on spirometry has limitations.1 Absence of a reduced FVC or FEV6 is rarely associated with a total lung capacity (TLC) below the LLN range.
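As an illustration of the 2 x 2 comparison, the sketch below counts agreement between a reference classification and a surrogate classification and derives sensitivity and specificity. The text does not state how the 95% CIs were computed; the Wilson score interval used here is an assumption, and all names and counts are hypothetical.

```python
import math


def two_by_two(reference, surrogate):
    """Count (TP, FN, FP, TN) from paired boolean classifications.

    reference: True where the standard (e.g., FEV1/FVC below its LLN) is positive.
    surrogate: True where the candidate (e.g., FEV1/FEV6 below its LLN) is positive.
    """
    pairs = list(zip(reference, surrogate))
    tp = sum(r and s for r, s in pairs)
    fn = sum(r and not s for r, s in pairs)
    fp = sum((not r) and s for r, s in pairs)
    tn = sum((not r) and (not s) for r, s in pairs)
    return tp, fn, fp, tn


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity, each paired with its 95% CI."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Made-up classifications for eight subjects (not data from the study).
reference = [True, True, True, False, False, False, False, False]
surrogate = [True, True, False, False, False, False, False, True]
table = two_by_two(reference, surrogate)
result = sens_spec(*table)
print(table, result)
```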
The obstructive categorizations were reanalyzed after introducing the possibility of a ± 50-mL error into FEV1 and FEV6, only for the FEV1/FEV6 ratio. Fifty milliliters is near the limit at which spirometers can detect a volume change. It represents approximately 1 to 2% of most FVCs and is approximately half of the variability of FEV6 seen in this study. To simulate the effect of introducing the "50-mL error," the following procedure was performed on each subject's data. First, the largest possible FEV1/FEV6 ratio was calculated by adding 50 mL to FEV1 and subtracting 50 mL from FEV6 (upper limit). Second, the smallest FEV1/FEV6 ratio was calculated by subtracting 50 mL from FEV1 and adding 50 mL to FEV6 (lower limit). When the calculated FEV1/FVC fell between these upper and lower FEV1/FEV6 ratio limits, the clinical categorizations were considered equivalent. A 100-mL error was similarly introduced into only FEV6 for the comparison of the restrictive patterns. Sensitivity and specificity were then recalculated and compared to the original estimates.
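The two-step procedure above can be written compactly. This is a minimal sketch with hypothetical function names; treating the limits as inclusive is an assumption, since the text does not say how a value exactly on a limit was handled.

```python
def ratio_limits(fev1_ml, fev6_ml, error_ml=50):
    """Lower and upper FEV1/FEV6 limits after shifting each volume by error_ml."""
    upper = (fev1_ml + error_ml) / (fev6_ml - error_ml)   # largest possible ratio
    lower = (fev1_ml - error_ml) / (fev6_ml + error_ml)   # smallest possible ratio
    return lower, upper


def equivalent_after_error(fev1_ml, fev6_ml, fvc_ml, error_ml=50):
    """True when FEV1/FVC falls between the lower and upper FEV1/FEV6 limits."""
    lower, upper = ratio_limits(fev1_ml, fev6_ml, error_ml)
    return lower <= fev1_ml / fvc_ml <= upper


# Made-up volumes in mL (not data from the study).
close = equivalent_after_error(3000, 4000, 4100)   # FEV1/FVC is inside the band
far = equivalent_after_error(3000, 4000, 4300)     # FEV1/FVC is below the band
print(close, far)
```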
Results
There were 4,827 FHS participants; 3,539 subjects were entered into group 1 to analyze the FEVx variability. Group 1 contained spirometry tests for all individuals who completed at least three maneuvers with at least 10 s of exhalation, and data were analyzed up to 12 s. Some of the tests did not meet the ATS reproducibility requirement of 200 mL (n = 141) but were included so that no systematic bias was introduced in the estimates of the FEVx variability. Average spirometric data for these subjects are summarized in Table 1.
Based on FEV1/FVC, the prevalence of obstruction in the sample of 3,398 individuals (the group that met ATS acceptability criteria, scores A, B, or C) was 15% (525 individuals); of these, 68% had mild obstruction, 24% had moderate obstruction, and 8% had severe obstruction. Using exact LLNs,9 the sensitivity of FEV1/FEV6 for predicting obstruction based on FEV1/FVC was 76%, and specificity was 98.3% (Table 2). Most discrepant results were close to the lower limits of the reference range. After allowing for a hypothetical 50-mL error, sensitivity increased to 99.2% and specificity increased to 99.5% (Table 2).
Reference value comparisons require both biological and technical comparability between the person being tested and the reference values.10 Reducing intraindividual variation in spirometry measures will create closer alignment between reference populations and should decrease errors in clinical classification: first, by ensuring that each measured value more precisely represents the individual being tested; second, by more accurately placing an individual's measurements above or below the LLN.
The FVC has a higher intrasubject variability than FEV6, in part, because it depends on expiratory time. This problem is most pronounced in patients with airway obstruction in whom good volume plateaus may not occur even after 20 s of exhalation.11 Longer exhalations also occur in older healthy individuals due to loss of elastic recoil. As exhalation time increases with age and the FVC becomes more dependent on exhalation time, the potential for diagnostic misclassification increases. For example, suppose a 65-year-old patient is being tested and performs the test well, exhaling vigorously, but only for 6 s. This patient meets one ATS end-of-test standard; however, the reference equations for FVC are based on data obtained from subjects with average expiratory times of 15 s. The patient's FVC will be low relative to the reference set simply because the expiratory time is shorter. As a result, the measured FEV1/FVC ratio will be high relative to the reference value and a false-negative classification error is more likely to occur.
The FEV6 is a potential surrogate for FVC. It is approximately 90 to 95% of FVC, it is at a fixed and therefore reproducible time, and it should improve diagnostic accuracy by improving the comparability between patient and reference data. The FEV6 expiratory time is short enough to be achievable by most subjects, is acceptably reproducible, and performs well in comparison to the traditionally used FEV1/FVC ratio in categorizing patients. The FEV6 has advantages as the surrogate for FVC because of the following: (1) it is the minimum expiratory time defined as acceptable by both the ATS and ERS; (2) intraindividual variation in FEV6 is close to the minimum of all FEVx variability, and not significantly different than FEV7, where the minimum occurs; and (3) reference data are presently available for FEV6 and FEV1/FEV6 ratio from NHANES III data that are not available for other FEVx candidates.
During this study, we did not reexamine the individual spirometry curves. The authors (R.L.J. and R.O.C.) visually examined the curves from each participant and graded their quality only during the initial data collection phase of the FHS. This analysis relied on these original grades.
The use of spirometry measurements to support a diagnosis of restriction has been shown to be difficult. When FVC is normal, there is a low probability that the TLC is below the LLN12,13; however, if the FVC falls below the normal limits, the probability for a clinically low TLC only increases to 55 to 57%. Even though this is a relatively high probability that the TLC may be low, it still requires that additional lung volume testing be performed to determine if a true restrictive pattern is present.
Swanney et al14 published comparisons in 337 individuals referred to a pulmonary laboratory in Christchurch, NZ, for clinical testing. The prevalence of obstruction in her sample was 66%. The sensitivity and specificity of FEV1/FVC6 in predicting airway obstruction by a low FEV1/FVC were 95% and 97%, respectively. When a possible error of ± 100 mL was introduced, both sensitivity and specificity were essentially 100%. In addition, she found that interindividual variability in FEV1/FVC6 was 25 to 30% lower than in FEV1/FVC.
There are technical reasons to select FVC6 as a surrogate for FVC. Exhalations of 6 s are easier for patients and are less likely to fatigue them. The use of FEV6 allows for a well-defined end of test as opposed to the FVC. Several manufacturers have already incorporated calculations of FEV6 into their algorithms.
The only FEV6 reference data available are those from the NHANES III data of US residents, and they may or may not be applicable to other populations. Reference data from other populations or comparisons that establish comparability with the NHANES III equations are necessary before FEV6 can be widely applied. Modern reference equations for continental Europe, India, and China are being developed; however, until these are published we suggest that the NHANES III prediction equations for the FEV1/FEV6 ratio might be a reasonable alternative. This is because the FEV1/FEV6 ratio is somewhat insensitive to height, gender, and ethnic background and uses only age to predict the ratio. For example, in a 40-year-old man, these equations show very small differences between white, African-American, and Mexican-American races in either predicted FEV1/FEV6 (range 1.8%) or the corresponding LLN for FEV1/FEV6 (range 1.9%).
An additional issue arises with a surrogate such as FEV6; namely, what to do if a subject does not exhale for 6 s. This issue is especially relevant for young individuals who routinely exhale for < 6 s. For the young who are within the age range covered by NHANES III (≥ 8 years), one can use the NHANES III equations with confidence. In NHANES III, only FVCs associated with a plateau in the volume-time curve were included in the FEV6 analysis. Therefore, complete short exhalations are comparable between the measured FEV6 in children and the FEV6 NHANES III reference values. More work on defining end of test is clearly required for children. For adults who do not exhale for 6 s, a potential solution is to predict FEV6 from curves with shorter exhalation times. This has been demonstrated to predict FEV6 within ± 40 mL but has only been published in abstract form.15
An error analysis is consistent with ATS recommendations to interpret measured values that lie near thresholds with caution.1 There is always a real possibility that a single subject can cross back and forth over the LLN after a retest or even within a single test session if their measured values are close to the LLN. The error analysis in this study for the diagnostic categorizations was considered conservative since 50 mL is only one fourth of the ATS reproducibility standard of ± 200 mL6 and one third of the new ATS/ERS repeatability standard of ± 150 mL.8 We initially used ± 200 mL for the error analysis. With this level, the sensitivities and specificities were 100%, that is, perfect agreement. We then reduced the error to ± 50 mL and applied it only to the FEV1/FEV6 ratio, leaving the FEV1/FVC ratio fixed. This is a relative change of ± 2.2% for a FEV1/FEV6 ratio of 75%, based on a FEV1 of 3,000 mL and a FEV6 of 4,000 mL.
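The ± 2.2% figure can be checked by applying the ± 50-mL procedure from the methods to the stated volumes. The symbols R_upper and R_lower are labels introduced here for illustration only.

```latex
\[
R_{\text{upper}} = \frac{3000 + 50}{4000 - 50} = \frac{3050}{3950} \approx 0.772,
\qquad
R_{\text{lower}} = \frac{3000 - 50}{4000 + 50} = \frac{2950}{4050} \approx 0.728
\]
```

Both limits lie roughly 2.2 percentage points from the nominal ratio of 0.75.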
Performing an analysis of the effect of noise or measurement error is critical to understanding comparability of potential new methods to classify patients with clinically accepted methods for diagnostic classification. First, LLNs are only estimated from healthy populations and have their own variability and imprecision. In a study16 of American Indians, the lower 95th percentiles for FVC and FEV1 had CIs of approximately ± 2%. Second, pulmonary equipment introduces some noise. In recent evaluations (2005) of Collins Survey spirometers (W.E. Collins; Braintree, MA; the same make and model used in the FHS), we found the average repeatability with a pulmonary waveform generator, using only room air for testing, to be between 1% and 2% for FEV1 and approximately 1% for FVC. These variations were observed even with an input signal repeatable to within < 1 mL. Human testing will further increase intratest variability, which is estimated at 2.7% for FVC and 3.3% for FEV1.17 Third, the 50-mL error was approximately half of the repeatability in the present study for FVC and FEV1. Small changes well within the normal noise for a given individual may account for the observed differences in classifications between the FEV1/FEV6 and the FEV1/FVC ratio. Finally, in group 2 subjects in whom the difference between the highest and lowest FEV6 was < 50 mL, 36.6% of the corresponding differences in FEV1 were in the opposite direction, and the scatter about the mean FEV1 was ± 200 mL. In general, as FEV6 decreases so will FEV1, so our choice of the ± 50-mL error might cause more of the FEV1/FEV6 ratios to receive the same classification as the FEV1/FVC ratios. However, this would only occur with values extremely close to the LLN.
Diagnostic classifications are obvious when the measured data are either far above or far below the LLN. When the measured values are close to the LLN, there is uncertainty about the classification. There were 468 subjects (13.2%) in group 2 who were within a borderline region defined as ± 2.2% of the LLN. In 363 of these subjects (77.6%), the FEV1/FEV6 classification agreed with the FEV1/FVC classification. Of the 105 results that did not agree, all but 18 results (0.51%) changed classifications to agree with the standard FEV1/FVC with the introduction of the ± 50-mL error.
The present study primarily gives evidence that intraindividual FEVx variability minimizes around the FEV6 to FEV7. Therefore, not only the FEV6 but also parameters derived from the FEV6 will have smaller inherent variability. We agree with Pedersen,18 who noted in a recent editorial that there is no real advantage of the FEV6 in diagnosis of restriction. It performs as poorly as the FVC and at best can only increase the probability of a TLC below the LLN to approximately 55%. Our work further supports findings by Akpinar-Elci et al19 and Vandevoorde et al,20 showing acceptable levels of specificity and sensitivity for the diagnosis of obstruction (the levels of Akpinar-Elci et al19 were 92% and 98%, respectively, and for Vandevoorde et al20 were 94% and 93%, respectively). The lower variability found in the FEV6 may lead to more precise categorizations of obstruction than are presently found using the FVC.
Footnotes
Abbreviations: ATS = American Thoracic Society; CI = confidence interval; ERS = European Respiratory Society; FEV6 = forced expired volume in 6 s; FEV7 = forced expired volume in 7 s; FEV12 = forced expiratory volume in 12 s; FEVx = timed volume; FHS = Family Heart Study; LLN = lower limit of normal; NHANES III = Third National Health and Nutrition Examination Survey; TLC = total lung capacity
The authors have no conflicts of interest to disclose.
Received for publication January 18, 2006. Accepted for publication June 15, 2006.