* From the Departments of Medicine, and Clinical Epidemiology and Biostatistics (Dr. Guyatt), McMaster University, Hamilton, ON, Canada; the Department of Medicine (Dr. Gutterman), Medical College of Wisconsin, Milwaukee, WI; the University of Mississippi Medical Center (Dr. Baumann), Jackson, MS; New York University School of Medicine (Dr. Addrizzo-Harris), New York, NY; the Department of Medicine (Dr. Hylek), Research Unit-Section of General Internal Medicine, Boston University School of Medicine, Boston, MA; the University of Kentucky College of Medicine (Dr. Phillips), Lexington, KY; College of Public Health (Dr. Raskob), University of Oklahoma Health Sciences Center, Oklahoma City, OK; the American College of Chest Physicians (Dr. Lewis), Northbrook, IL; and the Department of Clinical Epidemiology, Italian National Cancer Institute (Dr. Schünemann), Rome, Italy.
Correspondence to: Gordon Guyatt, MD, MSc, FCCP, Department of Clinical Epidemiology and Biostatistics, HSC-2C12, McMaster University, 1200 Main St West, Hamilton, ON, Canada L8N 3Z5; e-mail: guyatt{at}mcmaster.ca
Abstract
While grading the strength of recommendations and the quality of underlying evidence enhances the usefulness of clinical guidelines, the profusion of guideline grading systems undermines the value of the grading exercise. An American College of Chest Physicians (ACCP) task force formulated the criteria for a grading system to be utilized in all ACCP guidelines that included simplicity and transparency, explicitness of methodology, and consistency with current methodological approaches to the grading process. The working group examined currently available systems, and ultimately modified an approach formulated by the international GRADE group. The grading scheme classifies recommendations as strong (grade 1) or weak (grade 2), according to the balance among benefits, risks, burdens, and possibly cost, and the degree of confidence in estimates of benefits, risks, and burdens. The system classifies quality of evidence as high (grade A), moderate (grade B), or low (grade C) according to factors that include the study design, the consistency of the results, and the directness of the evidence. For all future ACCP guidelines, the College has adopted a simple, transparent approach to grading recommendations that is consistent with current developments in the field. The trend toward uniformity of approaches to grading will enhance the usefulness of practice guidelines for clinicians.
Key Words: grading recommendations grading system methodology
Treatment decisions involve a tradeoff between benefits on the one hand, and risks, burdens, and, potentially, costs on the other. Guideline panels provide recommendations for the management of typical patients. To integrate these recommendations with their own clinical judgment, and with individual patient values and preferences, clinicians need to understand the basis for the recommendations that expert guidelines offer. A systematic approach to grading the strength of management recommendations can minimize bias and aid interpretation.3 Indeed, most guideline groups have accepted the necessity for some sort of grading scheme.
While the grading of recommendations represents a positive development for guideline development and interpretation, the proliferation of grading systems has proved to be an unfortunate consequence. Methodologists and guideline developers have given much thought and effort to considering the criteria and approaches to an optimal grading system. The American College of Chest Physicians (ACCP) convened a working group to review the issue and to agree on a grading system that would be consistent with the latest developments in the field.
The task force began by developing criteria that define an optimal grading system (Table 1), placing them in an order that approximates their relative importance. These criteria guided the decisions of the group in the choice of the grading system that follows.
Guideline panels should make recommendations to administer, or not administer, an intervention, on the basis of tradeoffs between benefits on the one hand, and risks, burdens, and, potentially, costs on the other. If benefits outweigh risks and burdens, experts will recommend that clinicians offer a treatment to appropriately chosen patients. The uncertainty associated with the tradeoff between the benefits and the risks and burdens will determine the strength of recommendations.
The ACCP task force chose to classify recommendations into two levels, strong and weak (Table 2). If guideline panelists are very certain that benefits do, or do not, outweigh risks and burdens, they will make a strong recommendation, grade 1. If they think that the benefits and the risks and burdens are finely balanced, or if appreciable uncertainty exists about the magnitude of the benefits and risks, they must offer a weak, grade 2 recommendation.
Clinicians are becoming increasingly aware of the importance of patient values and preferences in clinical decision making.4 A second way to interpret strong and weak recommendations is in relation to patient values and preferences. For decisions in which it is clear that benefits far outweigh risks, or risks far outweigh benefits, virtually all patients will make the same choice (see box 1 for an example). In such instances, guideline panels can offer a strong (grade 1) recommendation. In contrast, there are other choices in which patient values and preferences will play a crucial role and in which patients will, as a result, make different choices. See boxes 2 and 3 for examples. When, across the range of patient values, fully informed patients are liable to make different choices, guideline panels should offer weak (grade 2) recommendations.
Box 1: Short-term aspirin reduces the relative risk of death after myocardial infarction by approximately 25%. Aspirin has minimal side effects and very low cost. People's values and preferences are such that virtually all patients suffering a myocardial infarction would, if they understood the choice they were making, opt to receive aspirin. Guideline panels can thus offer a strong recommendation for aspirin administration in this setting.
Box 2: Consider a patient, a 40-year-old man, who has suffered an idiopathic deep venous thrombosis (DVT) and has been taking adjusted-dose warfarin for one year. If the patient continues on standard-intensity warfarin, his risk of recurrent DVT will be reduced by approximately 10% per year.1 The inevitable burdens of the treatment include taking a warfarin pill daily, keeping dietary intake of vitamin K constant, monitoring the intensity of anticoagulation with blood tests, and living with the increased risk of both minor and major bleeding. Some patients who are very averse to a recurrent DVT may consider the downsides of taking warfarin well worth it. Others are likely to consider the benefit not worth the risks and inconvenience.
Box 3: A systematic review of randomized trials suggests that in 1,000 patients with ST-elevation myocardial infarction who are receiving thrombolytic therapy and aspirin and who are treated with heparin (versus no treatment with heparin), 5 fewer will die, 3 fewer will have reinfarction, and 1 fewer will have a pulmonary embolus, while 3 more will have major bleeds.2 Further, these estimates are not precise, and the advantage in decreased infarctions may be lost after six months. The small, imprecise, and possibly transient benefit leaves us less confident about any recommendation to use heparin in this situation. Hence, the recommendation is likely to be weak.
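The per-1,000 figures in Box 3 can be restated as numbers needed to treat or harm, which some readers find easier to weigh. The sketch below is a minimal illustration in Python; the function name `number_needed` and the dictionary layout are assumptions made for this example, and the calculation ignores the imprecision of the estimates that the box itself stresses.

```python
def number_needed(events_per_1000: float) -> float:
    """Patients who must be treated for one additional event to be
    prevented (or caused), given an absolute difference per 1,000."""
    if events_per_1000 <= 0:
        raise ValueError("absolute difference must be positive")
    return 1000 / events_per_1000


# Absolute differences per 1,000 treated patients, taken from Box 3.
fewer_with_heparin = {"death": 5, "reinfarction": 3, "pulmonary embolus": 1}
more_with_heparin = {"major bleed": 3}

for outcome, n in fewer_with_heparin.items():
    print(f"{outcome}: treat about {number_needed(n):.0f} patients to prevent one")
for outcome, n in more_with_heparin.items():
    print(f"{outcome}: treat about {number_needed(n):.0f} patients to cause one")
```

Seen this way, roughly 200 patients must receive heparin to prevent one death, while about 333 must receive it to cause one major bleed, which illustrates how closely balanced the tradeoff is.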
Following closely from this reasoning, a third way for clinicians to interpret strong recommendations is, for typical patients, to just do it. On the other hand, when clinicians face weak recommendations, or when they face patients with very atypical circumstances or values, they should carefully consider the benefits, risks, and burdens in the context of the individual patient before them.
How to individualize decision making in weak recommendations remains a challenge. One strategy uses decision aids that present patients with both the benefits and downsides of therapy.5 Because of time constraints, clinicians cannot use decision aids in all patients. For strong recommendations, using a decision aid is likely, for most patients, to constitute a poor use of time and energy. For weak recommendations, clinicians should consider the use of a decision aid or, alternatively, a detailed conversation with the patient to ensure that the ultimate decision is consistent with the patient's values.
Factors That Influence the Strength of a Recommendation
Guideline panels must consider a number of factors in grading recommendations (Table 3). One issue is their confidence in the best estimates of benefit and harm. The rating of methodological quality, which we discuss below, captures that degree of confidence.
The choice of adjusted-dose warfarin vs aspirin for the prevention of stroke in patients with atrial fibrillation illustrates a number of the factors that will influence the strength of a recommendation. A systematic review and metaanalysis8 found a relative risk reduction (RRR) of 46% in all strokes with warfarin vs aspirin. This large effect supports a strong recommendation for warfarin. Furthermore, the relatively narrow 95% confidence interval (RRR, 29 to 57%) suggests that warfarin provides an RRR of at least 29%, and further supports a strong recommendation. At the same time, warfarin is associated with the inevitable burdens of keeping the dietary intake of vitamin K constant, monitoring the intensity of anticoagulation with blood tests, and living with the increased risk of both minor and major bleeding. Most patients, however, are much more stroke averse than they are bleeding averse.8 As a result, almost all patients with high risk of stroke would choose therapy with warfarin, suggesting the appropriateness of a strong recommendation.
This last point emphasizes the importance of the patient's baseline risk of the adverse outcome that treatment is designed to avoid. Consider a 65-year-old patient with atrial fibrillation and no other risk factors for stroke. This individual's risk for stroke in the next year is approximately 2%. Therapy with dose-adjusted warfarin can, relative to aspirin, reduce the risk to approximately 1%. Some patients who are very stroke-averse may consider the downside of receiving warfarin therapy to be well worth it. Others are likely to consider the benefit not worth the risks and inconvenience. When, across the range of patient values, fully informed patients are liable to make different choices, guideline panels should offer weak (grade 2) recommendations.
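The arithmetic behind this example can be made explicit. The sketch below is an illustration in Python, not part of the task force's method: it assumes that the relative risk reduction from the meta-analysis (46%; 95% confidence interval, 29 to 57%) applies unchanged to a patient with a 2% baseline risk, and the function names are chosen for this example only.

```python
def treated_risk(baseline_risk: float, rrr: float) -> float:
    """Risk on treatment, assuming a constant relative risk reduction."""
    return baseline_risk * (1 - rrr)


def absolute_risk_reduction(baseline_risk: float, rrr: float) -> float:
    """Absolute difference in risk between comparator and treatment."""
    return baseline_risk * rrr


def number_needed_to_treat(baseline_risk: float, rrr: float) -> float:
    """Patients treated for one year to prevent one event."""
    return 1 / absolute_risk_reduction(baseline_risk, rrr)


baseline = 0.02  # approximate 1-year stroke risk from the text
for label, rrr in [("point estimate", 0.46), ("lower bound", 0.29), ("upper bound", 0.57)]:
    print(
        f"{label}: risk on warfarin {treated_risk(baseline, rrr):.2%}, "
        f"ARR {absolute_risk_reduction(baseline, rrr):.2%}, "
        f"NNT about {number_needed_to_treat(baseline, rrr):.0f}"
    )
```

Under that assumption, the point estimate gives a risk on warfarin of about 1.1%, consistent with the "approximately 1%" in the text, and roughly 109 such patients would need treatment for a year to prevent one stroke.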
As benefits and risks become more finely balanced, or more uncertain, decisions to administer an effective therapy also become more sensitive to resource use (cost) implications. When dealing with resource allocation issues, guideline panels face challenges of limited expertise, paucity of rigorous and unbiased cost-effectiveness analyses, and wide variability of costs across jurisdictions or health-care systems. Ignoring the issue of resource use (costs) is, however, becoming less and less tenable for guideline panels.9
When guideline developers make recommendations, they assume a particular set of values as they weigh the possible beneficial and detrimental outcomes. When value or preference judgments are particularly salient, guideline panels should describe the key values that they attached to these outcomes and that influenced the direction of a recommendation or its grade. Guideline panels often do not elicit direct or indirect representation from patients in arriving at these values. Moreover, recommendations can only reflect average values. These considerations emphasize the importance of guideline panels making explicit the key values and preference judgments that drive their recommendations.
Wording of Recommendations
Given the proliferation of grading systems, and the resulting confusion, it is desirable to provide clinicians with as many indicators as possible in interpreting the strength of recommendations. ACCP panels, when they are making a strong recommendation, will use the terminology, "We recommend ..." When they make a weak recommendation, ACCP guideline panels will use less definitive wording, such as, "We suggest ..." Further, the clarity of recommendations requires that the target patient population be defined and, when appropriate, the details of how clinicians should administer the intervention.
Confidence in Estimates of Magnitude of Benefits, Risks, Burdens, and Costs
Early systems of grading methodological quality relied primarily on the basic study design (ie, randomized controlled trials [RCTs], or observational studies). The fundamental study design remains critically important in determining our confidence in estimates of beneficial and detrimental treatment effects. Because of prognostic differences between groups, and the lack of safeguards such as blinding that can avoid biased ascertainment of outcomes, evidence based on observational studies will, in general, be appreciably weaker than evidence from RCTs. The last several years have seen, however, an increased awareness of a number of other factors that influence our confidence in our estimates of risk and benefit (Table 4).
The moderate quality category is populated by randomized trials with important limitations and by exceptionally strong observational studies. Observational studies, and on occasion RCTs with multiple serious limitations, will fill the low-quality evidence category. This categorization follows the principle that all relevant clinical studies provide evidence, the quality of which varies. Following this principle, the ACCP does not use a threshold for "acceptable evidence" in the peer-reviewed published medical literature.
Factors That Modify the Quality of Evidence: Limitations in RCTs
When RCTs have addressed the impact of alternative management strategies (both benefits and harms) on all relevant outcomes, they will yield high-quality evidence unless they have one of a number of limitations. The following limitations may decrease the quality of evidence supporting a recommendation (Table 4).
Factors That Modify the Quality of Evidence: Observational Studies Can Provide Moderate or Strong Evidence
While observational studies will generally yield only low-quality evidence, there may be unusual circumstances in which guideline panels will classify such evidence as of moderate quality, or even high quality.
The investigators postulated two likely sources of bias. The first was residual confounding with disease severity. It is likely that, if anything, patients in the not-for-profit hospitals were sicker than those in the for-profit hospitals. Thus, to the extent that residual confounding existed, it would bias results against the not-for-profit hospitals.
The second likely bias was the possibility that higher numbers of patients with excellent private insurance coverage could lead to a hospital having more resources and to a "spillover" effect that would benefit those without such coverage. Since for-profit hospitals are likely to admit a larger proportion of such well-insured patients than are not-for-profit hospitals, the bias is once again against the not-for-profit hospitals. Because the plausible biases would all diminish the demonstrated treatment effect, one might consider the evidence from these observational studies as being of moderate quality rather than of low quality.
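The way study design and modifying factors combine into a quality rating can be sketched in code. The example below is a simplified illustration in Python: the starting levels follow the text (RCTs begin as high quality, observational studies as low quality), but the rule that each limitation or exceptional strength moves the rating by exactly one level is an assumption made for this sketch, and the names `rate_quality`, `limitations`, and `strengths` are hypothetical. In practice, panels judge how far to move a rating rather than count factors.

```python
LEVELS = ["C", "B", "A"]  # low, moderate, high quality of evidence

# Starting position in LEVELS by basic study design, as described in the text.
START = {"rct": 2, "observational": 0}


def rate_quality(design: str, limitations: int = 0, strengths: int = 0) -> str:
    """Return grade A, B, or C for a body of evidence.

    ASSUMPTION: one level down per serious limitation and one level up
    per exceptional strength; the result is clamped to the A to C range.
    """
    if design not in START:
        raise ValueError(f"unknown design: {design!r}")
    if limitations < 0 or strengths < 0:
        raise ValueError("counts must be non-negative")
    index = START[design] - limitations + strengths
    index = max(0, min(len(LEVELS) - 1, index))
    return LEVELS[index]


print(rate_quality("rct"))                       # RCTs without limitations
print(rate_quality("rct", limitations=1))        # RCTs with an important limitation
print(rate_quality("observational"))             # typical observational studies
print(rate_quality("observational", strengths=1))  # e.g. all plausible biases oppose the effect
```

In the hospital example, the observation that every plausible bias would diminish the demonstrated effect corresponds to the last call, which moves the observational evidence from low to moderate quality.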
What to Do When Quality of Evidence Differs Across Outcomes?
When RCT results are available, the quality of evidence will often differ between primary efficacy and toxicity outcomes, usually between efficacy outcomes and cost, and almost always between efficacy outcomes and rare but serious side effects. On most occasions, efficacy outcomes will be the most important, and guideline panels can base their rating of the quality of the evidence exclusively on these end points. Panels should, however, consider whether toxicity end points are also crucial to the decision regarding the optimal management strategy. If they are, panels should consider the quality of evidence regarding those end points, and should make a final rating about the quality of evidence accordingly.
For instance, consider a guideline panel addressing the use of long-term oral steroids for patients with stage 2 or 3 sarcoidosis with moderate-to-severe symptoms and radiographic changes. Randomized trials have addressed the impact of steroids on radiographic findings, symptoms, and spirometry over a period of 2 years.17 These trials failed, however, to address steroid toxicity. If a guideline panel ignored toxicity, they might well rate the quality of evidence as high. If, however, they consider steroid toxicity as crucial in their decision, the uncertainty about the impact of treatment increases. If they look for observational studies to estimate steroid toxicity, the quality of the evidence about toxicity is likely to be low, and this may be the most appropriate rating for the overall quality of evidence. Alternatively, they may seek randomized trials of steroids in other conditions and face limitations of directness. They may then conclude that the evidence regarding steroid toxicity, and the overall quality of the evidence, is moderate.
The ACCP Grading System and Initiatives Toward Uniform Grading Across Guideline Panels
In considering alternative grading systems, we found that the structure and guides for application and interpretation suggested by the GRADE group largely met the criteria in Table 1.18 As a result, the categories presented in Table 2 permit similar interpretation to those of the GRADE group. The important aspect in which the ACCP task force approach differs is in combining low-quality and very low-quality evidence. While we achieved the primary goal of the ACCP task force, to identify a unified grading system for all future ACCP evidence-based guidelines, this exercise went beyond that goal. This article will facilitate the adoption of uniform guidelines through a simple, straightforward presentation that any guideline panel interested in the principles underlying Table 2 will find useful.
Clinicians' understanding of systems of grading the strength of recommendations and quality of evidence will also benefit if systems map easily onto one another. The ACCP mapping onto the GRADE system is obvious, and the approach that the ACCP has adopted also maps easily onto other systems, including that of the ACC/AHA and prior ACCP guideline grading systems, further facilitating understanding and usefulness.
Summary
In the system that the ACCP has adopted, the strength of any recommendation depends on the following two factors: the tradeoff between the benefits and the risks and burdens; and the quality of the evidence regarding treatment effect. We grade the tradeoff between the benefits and the risks and burdens into the following two categories: category 1, in which the tradeoff is clear enough that most patients, despite differences in values, would make the same choice, leading to a strong recommendation; and category 2, in which the tradeoff is less clear, and individual patient values will likely lead to different choices, leading to a weak recommendation. We grade methodological quality in terms of the following three categories: randomized trials that show consistent results, or observational studies with very strong treatment effects; randomized trials with limitations, or observational studies with exceptional strengths; and observational studies without exceptional strengths and case series. The framework summarized in Table 2 generates recommendations from the very strong (benefit/risk tradeoff unequivocal, high-quality evidence, grade 1A) to the very weak (benefit/risk questionable, low-quality evidence, grade 2C). Whatever the grade of the recommendation, clinicians must use their judgment, considering both local and individual patient circumstances, and patient values, in making individual decisions. In general, however, they should place progressively greater weight on expert recommendations as they move from grade 2C to grade 1A.
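As an illustration of how the two dimensions combine into a single label, the sketch below encodes a recommendation as a strength (grade 1 or 2) plus an evidence quality (A, B, or C) and produces the corresponding wording ("We recommend" or "We suggest"). The class and its field names are hypothetical, and appending the grade in parentheses is a formatting choice made for this example rather than a convention stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    strong: bool   # True = grade 1 (strong), False = grade 2 (weak)
    quality: str   # "A" (high), "B" (moderate), or "C" (low)
    action: str    # what the panel advises, in plain words

    def __post_init__(self) -> None:
        if self.quality not in ("A", "B", "C"):
            raise ValueError("quality must be 'A', 'B', or 'C'")

    @property
    def grade(self) -> str:
        """Combined label, from '1A' (strongest) to '2C' (weakest)."""
        return f"{1 if self.strong else 2}{self.quality}"

    def wording(self) -> str:
        """Strong recommendations 'recommend'; weak ones 'suggest'."""
        verb = "recommend" if self.strong else "suggest"
        return f"We {verb} {self.action} (grade {self.grade})."


# Placeholder action text; no specific clinical recommendation is implied.
print(Recommendation(strong=True, quality="A", action="the intervention").wording())
print(Recommendation(strong=False, quality="C", action="the intervention").wording())
```

Keeping strength and quality as separate fields mirrors the point that the two are judged independently: a strong recommendation can rest on low-quality evidence, and high-quality evidence can still yield a weak recommendation when the tradeoff is close.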
Footnotes
Abbreviations: ACCP = American College of Chest Physicians; RCT = randomized controlled trial; RRR = relative risk reduction
Received for publication August 16, 2005. Accepted for publication August 21, 2005.