* From the Departments of Medicine (Drs. Harpole, Kelley, Schreiber, and McCrory) and Surgery (Dr. Toloza) and Center for Clinical Health Policy Research (Ms. Kolimaga), Duke University Medical Center, Durham, NC.
Correspondence to: Linda H. Harpole, MD, MPH, Duke Center for Clinical Health Policy Research, 2200 W Main St, Suite 220, Durham, NC 27705; e-mail: harpo003{at}mc.duke.edu
| Abstract |
|---|
Design, setting, and participants: A systematic search was performed for relevant literature from MEDLINE, Cancerlit, CINAHL, HealthStar, the Cochrane Library, and the National Guidelines Clearinghouse published from January 1989 to July 2001.
Measurement and results: From 369 citations, 51 relevant guidelines were identified. Each guideline was evaluated by at least four reviewers using the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument and was coded for clinical topics covered. The recommendations included in each guideline also were abstracted. Of the 51 guidelines evaluated, 27 (53%) were evidence-based. Clinical topics identified by the ACCP for their guideline effort each were represented by at least one existing guideline. Of the 880 clinical recommendations abstracted from the guidelines, only 253 (29%) were evidence-based. The AGREE instrument rates guidelines along six domains. As a group, the guidelines performed well in the scope and purpose domain, with only six guidelines (12%) scoring < 50%. For the remaining domains, however, the guidelines did not perform as well, as follows: for stakeholder involvement, 41 guidelines (80%) scored < 50%; for rigor of development, 29 guidelines (57%) scored < 50%; for clarity and presentation, 17 guidelines (33%) scored < 50%; for applicability, 46 guidelines (90%) scored < 50%; and for editorial independence, 47 guidelines (92%) scored < 50%. After considering the domain scores, the reviewers recommended only 19 of the guidelines (37%).
Conclusions: All major clinical lung cancer topics are covered by at least one guideline, but no single guideline addresses all areas. Furthermore, although existing guidelines may accurately reflect clinical practice, most performed poorly when evaluated for quality. Future guideline efforts that address each item of the AGREE instrument would add substantially to the literature.
Key Words: evidence-based medicine; lung neoplasms; practice guidelines
| Introduction |
|---|
Clinical practice guidelines have been defined as "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances."2 However, the recent increase in the production of clinical practice guidelines has been accompanied by growing concern about the variations in guideline recommendations3 4 and quality.5 6 7
In 2001, a multidisciplinary panel was convened by the American College of Chest Physicians to develop evidence-based clinical practice guidelines for lung cancer diagnosis and treatment. To avoid potential duplication of effort, the first step taken was to identify and determine the quality of already published guidelines in this area.
| Materials and Methods |
|---|
Appraisal Instrument
The methodological quality of existing clinical practice guidelines was evaluated using the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument,8 an international, rigorously developed, and validated instrument that compares well with other instruments designed for this purpose.9
This instrument allowed for the assessment of several components that are integral to guideline development, as follows: (1) scope and purpose; (2) stakeholder involvement; (3) rigor of development; (4) clarity and presentation; (5) applicability; and (6) editorial independence. Five reviewers (LH, DM, ET, MK, and GS) used the AGREE instrument to evaluate the scientific quality of the lung cancer guidelines. A minimum of four reviewers completed the AGREE instrument for each guideline and also determined whether the guideline was evidence-based or consensus-based.
Each guideline was coded for the following topics covered: prevention; screening and early detection; initial evaluation; diagnosis; clinical staging; pathologic/surgical staging; treatment-early stage; treatment-stage I; treatment-stage II; treatment-stage IIIA/potentially resectable; treatment-stage IIIB/nonresectable; treatment-stage IV; treatment-Pancoast tumor, T4, and those requiring special consideration; treatment-small cell lung cancer; treatment-solitary pulmonary nodule; follow-up/surveillance; palliative care; palliative treatment; and practice organization.
The text of the recommendations included in each guideline also was abstracted. Each recommendation was coded for topic, subtopic, and type of evidence utilized when formulating the statement (ie, A, strong evidence; B, weak evidence; and C, consensus). Since different guidelines used different scales to rate the strength and quality of evidence supporting a particular recommendation, we often had to map published grades to our scale. In general, statements were graded as strong evidence (A) if they were supported by randomized controlled trials, weak evidence (B) if they were supported by evidence other than that from randomized controlled trials, and consensus (C) if they were not supported by clear study data. For guidelines that did not grade recommendations, we categorized statements based on the above schema.
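The grading rule described above can be sketched in Python. This is only an illustration of the stated schema, not the authors' actual coding procedure: the function name, the boolean inputs, and the example crosswalk from a published three-level scale are all hypothetical.

```python
from enum import Enum


class EvidenceGrade(Enum):
    A = "strong evidence"
    B = "weak evidence"
    C = "consensus"


def grade_recommendation(has_rct_support: bool,
                         has_other_study_support: bool) -> EvidenceGrade:
    """Apply the A/B/C schema to a single recommendation."""
    if has_rct_support:
        # Supported by randomized controlled trials
        return EvidenceGrade.A
    if has_other_study_support:
        # Supported by evidence other than randomized controlled trials
        return EvidenceGrade.B
    # Not supported by clear study data
    return EvidenceGrade.C


# Hypothetical crosswalk for a guideline that published its own
# three-level scale; a real mapping would be built per guideline.
EXAMPLE_CROSSWALK = {
    "I": EvidenceGrade.A,
    "II": EvidenceGrade.B,
    "III": EvidenceGrade.C,
}
```

For a guideline that did not grade its recommendations, `grade_recommendation` would be applied directly to each statement; for one with its own scale, a crosswalk such as `EXAMPLE_CROSSWALK` would translate the published grade first.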
Evaluation of Guidelines
The 23-item AGREE instrument is divided into the following six domains (see "Appendix"): scope and purpose (three items); stakeholder involvement (four items); rigor of development (seven items); clarity and presentation (four items); applicability (three items); and editorial independence (two items). The score for each domain is obtained by summing up all the scores of the individual items in a domain and then standardizing as follows:
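$$\text{standardized domain score} = \frac{\text{obtained score} - \text{minimum possible score}}{\text{maximum possible score} - \text{minimum possible score}} \times 100\%$$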
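As a minimal sketch of the standardization, the domain score is the obtained total minus the minimum possible total, divided by the difference between the maximum and minimum possible totals. The code below assumes the four-point item scale of the original AGREE instrument (1 = strongly disagree, 4 = strongly agree); the function name and the example ratings are hypothetical.

```python
from typing import Sequence


def standardized_domain_score(ratings: Sequence[Sequence[int]],
                              min_item: int = 1,
                              max_item: int = 4) -> float:
    """Return the standardized score (0 to 100) for one AGREE domain.

    ratings[r][i] is reviewer r's score on item i of the domain.
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every reviewer must rate every item in the domain")

    obtained = sum(sum(row) for row in ratings)
    min_possible = min_item * n_items * n_reviewers
    max_possible = max_item * n_items * n_reviewers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Four reviewers rating a three-item domain (made-up numbers)
example = [[4, 3, 3], [3, 3, 2], [4, 4, 3], [2, 3, 3]]
score = standardized_domain_score(example)  # about 69.4
```

Because the minimum possible total is subtracted, a guideline rated "strongly disagree" on every item by every reviewer scores 0% rather than 25%.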
The final component of the AGREE instrument involves a recommendation regarding the use of the guidelines in practice as "strongly recommended," "recommended (with provisos or alterations)," "would not recommend," or "unsure." On this item, the investigators reached consensus for each guideline. For ease of interpretation, we considered "strongly recommended" and "recommended with provisos or alterations" as a response of "recommended," and "would not recommend" or "unsure" as a response of "would not recommend." The AGREE instrument instructs the raters to make a judgment as to the quality of the guideline, taking each of the appraisal criteria into consideration. In our ratings, we took into account the date of the guideline and considered whether we would recommend the document as a useful tool that could be adapted locally by a health-care provider who was considering implementing the guideline in a health-care practice or system. We placed relatively more weight on the quality of development than on whether the recommendations matched our particular clinical practice or were feasible in our particular practice environments.
Before evaluating the guidelines included in the review, each of us rated a superseded guideline,10 and we then compared ratings among reviewers, discussed discrepancies, and reached consensus about the interpretation of each question.
We used the κ statistic as a measure of the agreement among reviewers.11 However, before performing any calculations, the response categories were dichotomized into strongly agree/agree vs strongly disagree/disagree, as we thought that an analysis of agreement at this level was sufficient. The κ statistic was then applied to each of the 23 items of the AGREE instrument. The simple proportion of agreement also was calculated.
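The agreement analysis for a single AGREE item can be sketched as follows. The text does not say which multi-rater form of the statistic was used, so this example assumes Fleiss' κ, assumes the same number of reviewers for every guideline, and takes the simple proportion of agreement to be the mean pairwise agreement; the function names and the sample ratings are hypothetical.

```python
from typing import List, Tuple


def dichotomize(score: int) -> int:
    """Collapse a four-point response: 3 or 4 (agree) -> 1, 1 or 2 (disagree) -> 0."""
    return 1 if score >= 3 else 0


def fleiss_kappa_binary(ratings: List[List[int]]) -> Tuple[float, float]:
    """Return (kappa, observed agreement) for one item.

    ratings[g] holds the dichotomized ratings (0 or 1) that each reviewer
    gave guideline g; every guideline needs the same number of reviewers.
    """
    n_guidelines = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need at least two raters and equal raters per guideline")

    agree_counts = [sum(row) for row in ratings]
    # Proportion of agreeing reviewer pairs within each guideline
    pairwise = [
        (a * a + (n_raters - a) ** 2 - n_raters) / (n_raters * (n_raters - 1))
        for a in agree_counts
    ]
    observed = sum(pairwise) / n_guidelines

    # Agreement expected by chance from the overall category proportions
    p_agree = sum(agree_counts) / (n_guidelines * n_raters)
    expected = p_agree ** 2 + (1 - p_agree) ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")

    return (observed - expected) / (1 - expected), observed


# Made-up four-point ratings of one item: four guidelines, four reviewers each
raw = [[4, 4, 3, 3], [1, 2, 1, 2], [4, 3, 2, 1], [3, 3, 3, 1]]
kappa, observed = fleiss_kappa_binary([[dichotomize(s) for s in row] for row in raw])
```

The two numbers can diverge: observed agreement can be high while κ stays low when most ratings fall into the same category, because chance agreement is then also high.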
| Results |
|---|
The quality of the guidelines is represented by the AGREE domain scores in Table 3, 3A, and 3B.
Stakeholder Involvement
This domain evaluates the degree to which the guideline represents the views of its intended users. Included are questions regarding the composition of the guideline development group (specifically, whether individuals from all relevant professional groups were represented), whether patients' experiences and expectations informed the development of the guideline, whether the target users of the guideline were well-defined, and whether the guideline was piloted among end-users. Overall, the mean score for this domain was 35% (range, 3 to 70%), with 41 guidelines (80%) scoring < 50%. Only 6% of guidelines included individuals from all relevant professional groups in the development stage, and none was piloted among end-users.
Rigor of Development
This domain specifically evaluates whether systematic methods were used to search for evidence, whether the criteria for selecting the evidence and the methods used to formulate the recommendations were clearly described, whether there was an explicit link between the recommendations and the supporting evidence, whether health benefits, side effects, and risks were considered when formulating the recommendations, whether the guideline was externally reviewed by experts prior to publication, and whether a procedure for updating the guideline was provided. Overall, the mean score for this domain was 52% (range, 2 to 95%), with 29 guidelines (57%) scoring < 50%. Specifically, only 16 guidelines (31%) described systematic methods for searching and selecting the evidence, 18 guidelines (35%) considered health benefits, side effects, and risks when formulating the recommendations, and 20 guidelines (39%) described the methods used to formulate the recommendations. Moreover, only 18 guidelines (35%) were externally reviewed prior to publication.
Clarity and Presentation
This domain describes the clarity of the guidelines. Specifically, it describes whether the recommendations were specific and unambiguous, whether the different management options were clearly presented, whether key recommendations were easily identifiable, and whether the guideline was supported with tools for application. Overall, the mean score for this domain was 57% (range, 15 to 90%). Only two guidelines (4%) included tools for application. Seventeen guidelines (33%) scored < 50% for this domain.
Applicability
This domain evaluates issues that are pertinent to guideline implementation. More specifically, it considers organizational barriers, cost implications, and monitoring criteria. The score on this domain was the lowest of all, with a mean score of 20% (range, 0 to 98%). Only five guidelines (10%) scored at least 50%. Two guidelines provided review criteria for monitoring purposes, and six discussed potential organizational barriers. No guideline discussed cost implications.
Editorial Independence
This domain addresses conflict of interest, specifically whether the guideline was editorially independent from the funding body and whether potential conflicts of interest were reported for the members of the guideline development group. The score in this domain was also poor, with a mean score of 24% (range, 0 to 83%). Four guidelines (8%) scored ≥ 50%. In 48 guidelines (94%), potential conflicts of interest on the part of guideline developers were not recorded.
Overall Recommendations
After reviewing all 51 guidelines and completing the AGREE instrument, the reviewers came to a consensus with respect to an overall recommendation for each guideline. As described in the "Materials and Methods" section, we recommended guidelines that we thought would be useful to health-care providers and that demonstrated good quality on the AGREE instrument. In total, we recommended 19 of 51 guidelines (37%). As noted in Table 3, for some guidelines (specifically, the British Thoracic Society recommendations to respiratory physicians for organizing the care of patients with lung cancer34 and the Scottish Intercollegiate Guidelines Network guideline on the management of lung cancer58), we based our recommendation on the guidelines' superior performance on the AGREE instrument, while recognizing that many of the guideline statements specific to practice organization and review criteria would be directly relevant only within the system for which they were developed. Nonetheless, these documents serve as examples of well-constructed and well-communicated guidelines.
Agreement Among Reviewers
Table 4 demonstrates both the degree of agreement beyond chance (κ statistic) and the observed simple agreement among the reviewers for the 23 items of the AGREE instrument. The κ values indicate that overall agreement was poor to fair for 65% of the items and was moderate to substantial for 35% of the items. Observed agreement among reviewers was high, with 74% of items having moderate-to-substantial agreement and 26% of items having excellent agreement. The degree of agreement appeared to be consistent across domains and did not appear to be correlated with domains that were quantitative vs qualitative in nature.
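The verbal labels for κ used here (poor, slight, fair, moderate, substantial) match the widely used Landis and Koch benchmarks. The text does not state which cut points were applied, so the sketch below assumes those benchmarks; the function name is hypothetical.

```python
def kappa_label(kappa: float) -> str:
    """Map a kappa value to a Landis and Koch descriptive label (assumed cut points)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Under these cut points, a κ of 0.14 would be labeled "slight" and a κ just above 0.6 would be labeled "substantial".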
| Discussion |
|---|
Although many of the guidelines are classified as evidence-based, a thorough review of their quality utilizing the AGREE instrument led the authors to recommend fewer than half of the guidelines. The reasons for this are multifactorial. Overall, almost all the guidelines performed poorly with respect to applicability and editorial independence. Even those guidelines that explicitly based their recommendations on evidence, such as those developed by the Cancer Care Ontario Practice Guidelines Initiative,35 36 37 38 39 40 41 42 43 44 45 failed to address issues of barriers to implementation, monitoring criteria, and evidence of pilot testing. Addressing such issues is necessary if the guideline movement is to continue successfully. Although few studies have assessed the impact of guideline development on patient outcomes,63 it has been demonstrated that explicit guidelines can improve clinical practice; however, improvement requires rigorous evaluation. Well-developed guidelines should include the consideration of potential barriers to guideline implementation, should supply monitoring criteria to assess the guideline's impact, and should provide evidence of pilot testing to ensure that the guideline can be practically put to clinical use.
Another area where the lung cancer guidelines consistently failed to perform was in the domain of editorial independence. Poor performance in this domain could represent true conflicts of interest between funding sources and guideline development panels; alternatively, it may simply reflect poor reporting on these topics. The developers of the AGREE instrument contacted the authors of each guideline they reviewed to obtain background material that could inform the reviewers' ratings. For some items, this additional communication may have provided more information than we were able to obtain from reviewing the guidelines and any accompanying material we could obtain from additional references or the World Wide Web. For the lung cancer guidelines, documentation regarding the issue of an individual's conflict of interest was rarely stated. Explicit statements about whether or not the funding body was independent editorially from the guideline committee were also infrequent. Therefore, poor performance in this domain could have been due to our failure to obtain further information from each guideline author. However, future guideline efforts would benefit from clear documentation on this matter within the text of the guideline document so that readers will be able to determine for themselves whether or not a conflict of interest potentially exists.
One of the key factors regarding the adequacy of the guidelines pertains to the rigor of development. Despite the fact that most of the guidelines included references to published literature, many did not clearly delineate the literature review methodology used or the mechanism by which recommendations were formulated. This step is crucial in determining whether the recommendations are truly based on the evidence and in understanding how the evidence is synthesized.
Patient preferences and experiences should be factored into decisions regarding clinical care, especially in diseases such as lung cancer in which treatments can cause significant morbidity and can affect quality of life. Almost all of the guidelines we reviewed would have benefited from more attention to this issue. This could be accomplished by ensuring that all guideline committees have patient representatives and that literature reviews specifically address quality of life when such data are available. It would also be helpful if greater efforts were made in the research community to ensure that quality of life and patient preferences are incorporated into research protocols.
Implementation of practice guidelines also requires attention to local practice patterns. For example, the Cancer Guidance Group46 guideline provides specific implementation strategies and review criteria for their recommendations. Many of these strategies and some of the corresponding monitoring criteria are specific to the United Kingdom. Nonetheless, this guideline provides an example of items that should be incorporated by those undertaking a guideline effort. Furthermore, those looking to apply currently available guidelines can learn from their detailed efforts to address implementation issues and adapt what is relevant locally.
The recommendations that result from interpretation of the evidence also can vary among guidelines. This variation could reflect local bias, differences in data interpretation, or differences in available resources. Nonetheless, one needs to be careful when considering others' guidelines for local use and needs to ensure that the clinical data are concordant with the evidence and clinical judgment. Because of the variability noted in guidelines that referenced the same studies, it is crucial that a guideline effort have a clear methodology for going from the evidence to the recommendations so that the possibility of bias is minimized.
In this study, many of the conclusions are based on a review utilizing the AGREE instrument. Although this instrument is fairly new, it is one of the few guideline assessment tools to demonstrate validity and reliability. Furthermore, the areas covered by the instrument are logical for anyone to consider when conducting guideline development or evaluation. A guideline that addresses the issues raised by the AGREE instrument is more likely to be a rigorously developed guideline. Nevertheless, the AGREE instrument has some limitations. For one, the interrater reliability was primarily slight to moderate. Some of the variability may be due to differences in interpretation of several items where the instructions were broad. For example, for item 22, which is contained within the domain of editorial independence, the κ statistic was only 0.14. This slight agreement probably arises from the fact that this question, which asks whether or not the guideline is editorially independent from the funding body, is open to interpretation, with some reviewers stating that the criterion was not met unless the statement was explicitly made in the guideline, while others interpreted the criterion to be met if the funding agency was the government, as in several of the UK and Canadian guidelines.34 35 36 37 38 39 40 41 42 43 44 45 58 However, in this case, the simple agreement was still 58%. In contrast, for item 12, which addresses whether or not there is an explicit link between the recommendations and the supporting evidence, the κ statistic was > 0.6. This is not unexpected, given that this item is relatively straightforward. The issue of low interrater reliability was observed in the previous version of this instrument64 and was accommodated by the use of multiple reviewers, as well as through the refinement of the instrument's questions and instructions. Moreover, the observed simple agreement among raters was good, with all items having moderate or better agreement.
Another potential limitation of the AGREE instrument concerns the validity of the responses to the question on the overall assessment of the guideline. Although the reviewers were instructed to consider the domain scores when making a decision about whether or not to recommend the guideline, no clear rules were established as to how to weight the differing domains. However, when reviewing the assessments compared with the domain scores, the responses appear to have validity. For each guideline that was recommended, the overall domain scores were > 50% for at least three domains, with an average of four domains with a score of > 50%. For guidelines that were not recommended, on average only 1.3 domains had a score of > 50%. Furthermore, the score on the domain "rigor of development" for recommended guidelines was high. All scores were > 50%, with an average score of 84%. Conversely, for guidelines we did not recommend, the average score for this domain was only 33%.
In conclusion, a review of current lung cancer guidelines demonstrates that many of the clinical topics of interest have been considered by at least one guideline. None covers all the necessary elements. Furthermore, although prior guidelines may accurately reflect clinical practice, few adhere to the standards set forth by the AGREE instrument. A guideline effort for lung cancer, which adheres to all the criteria explicit in the AGREE instrument and clearly addresses each item in the guideline text, would add substantially to the literature.
| Appendix 1 |
|---|
Scope and Purpose
Stakeholder Involvement
Rigor of Development
Clarity and Presentation
Applicability
Editorial Independence
| Acknowledgements |
|---|
| Footnotes |
|---|
Abbreviation: AGREE = Appraisal of Guidelines for Research and Evaluation
This research was supported by a contract from the American College of Chest Physicians.