* From the Critical Care Program (Dr. Hébert), University of Ottawa, Ottawa; University of Toronto (Dr. Marshall), Toronto; Clinical Epidemiology Unit (Dr. Wells), University of Ottawa, Ottawa; and Department of Epidemiology and Biostatistics, McMaster University (Dr. Cook), Hamilton, ON, Canada.
Correspondence to: Paul C. Hébert, MD, MHSc(Epid), Ottawa Health Research Institute, The Ottawa Hospital/General Campus, 501 Smyth Rd, Room 1812H, Box 201, Ottawa, ON, K1H 8L6 Canada
| Abstract |
|---|
Key Words: critical care; methodology; randomized trials; study protocols
Randomized clinical trials (RCTs) have evolved to become the "gold standard" clinical research design used to distinguish the risks and benefits of therapeutic interventions.1 2 3 4 In 1948, a controlled clinical trial made use of random allocation, a control group, and blinding for the first time. Additional principles guiding the design of RCTs were first elaborated by Sir Austin Bradford Hill in the 1960s.5 6 7
Many important questions regarding the management of critically ill patients have not been subjected to well-designed and well-executed RCTs. Consequently, clinicians frequently base their therapeutic decisions on lower levels of clinical evidence, including observational studies, poorly controlled clinical trials, or laboratory studies.3 8 The complex nature of critical illness and a host of methodologic challenges have hampered the development and execution of clinical trials in this discipline. In this article, we outline some of the methodologic issues central to the development and conduct of critical care RCTs.
| What Is Unique About Critical Care RCTs? |
|---|
In critical care, as in other disciplines such as surgery, study interventions are administered in conjunction with other treatment modalities by skilled multidisciplinary teams.9 10 The large number of therapeutic interventions required in the care of the critically ill also creates special challenges for clinical trials in this field, because the investigator faces a correspondingly large number of choices about which of these therapies to control.
The selection of outcomes for RCTs in the critical care setting also poses unique challenges. Until recently, mortality was widely advocated as the RCT outcome of choice by critical care researchers, the pharmaceutical industry, and government agencies such as the US Food and Drug Administration. A mortality rate ascertained at 28 or 30 days is still considered the "gold standard" for evaluating ICU therapeutic interventions seeking licensure. However, over the past several years, a large number of clinical trials with negative results have led investigators to suggest that mortality as an outcome may have significant limitations.11 As a primary outcome in an RCT, mortality might be too insensitive to detect the benefits of interventions when small but clinically important differences truly exist. These unique aspects of patients, interventions, and outcomes in critical care must first be considered in light of the research questions being asked and the design philosophy chosen to address them.
| Overall Design Approaches |
|---|
The most important consequence of these conflicting objectives is that choices made in the design of RCTs must focus on whether an intervention works or whether it results in more good than harm for patients.12 13 Trials that attempt to determine therapeutic "efficacy" address the question of "Will the therapy work under optimal conditions?" while trials attempting to determine therapeutic "effectiveness" address the question of "Will the therapy do more good than harm under usual practice conditions in all patients who are offered the intervention?" Clearly, both questions will yield useful information for health practitioners. Efficacy is often established first, and then the intervention is evaluated for its effectiveness. In pivotal RCTs used in the final phase of obtaining regulatory approval (phase III trials), pharmaceutical companies primarily wish to demonstrate that their product has proven efficacy. Rarely are attempts made to demonstrate therapeutic effectiveness in larger RCTs.
The design characteristics of efficacy and effectiveness trials tend to differ considerably (Table 1). As a consequence of these design choices, the inferences that can be drawn from efficacy and effectiveness trials, and the threats to their validity, also differ. Therefore, one of the first steps in planning an RCT is to determine which of these two design approaches best reflects the primary study question. Efficacy trials often opt for restricted eligibility, rigorous treatment protocols, and disease-specific outcomes responsive to the potential benefits of the experimental intervention. With this approach, efficacy studies attempt to maximize internal validity, defined as the extent to which the experimental findings represent the true effect in study participants. Effectiveness trials, by contrast, enroll most patients with the condition of interest, introduce the interventions into the community at large with few controls, and monitor easily measured outcomes that are considered important to patients. As a consequence, the effectiveness approach attempts to maximize external validity, defined as the extent to which the experimental findings represent the true effect in the target population. Hence, there are often trade-offs between the two forms of validity and their design approaches: efficacy studies maximize internal validity at the expense of external validity, while effectiveness studies favor external validity (Fig 1).14
The level of control exercised in all aspects of study design by Morris and colleagues15 may be contrasted with the approach adopted in a study comparing restrictive and liberal transfusion strategies in the critically ill.16 In the Transfusion Requirements in Critical Care (TRICC) trial,17 a large number of clinical centers enrolled patients using broad eligibility criteria, followed simple treatment strategies for the administration of packed RBCs, and ascertained mortality and organ failure rates. This would be considered more of a hybrid, or combined, approach, especially when compared with the prototypical effectiveness trials, the very large International Study of Infarct Survival18 19 trials in acute myocardial infarction. In critical care, there are few examples of such large trials.20 Indeed, the largest trials have enrolled only a few thousand patients. RCTs in sepsis syndrome and septic shock have been successfully conducted using a hybrid approach21 22 23 rather than a true large, simple trial design. Most of the studies in this field collected substantial amounts of data, implemented reasonably detailed but flexible treatment protocols, enrolled heterogeneous patient populations, and evaluated 28-day mortality rates.20 21 Therefore, many of the design characteristics adopted in sepsis RCTs could be considered a compromise between the efficacy and effectiveness approaches. This approach was used successfully in the recent activated protein C study published in the New England Journal of Medicine.24
At this juncture, we suggest that a compromise between the two extreme design approaches would be desirable for most multicenter trials. Although providing important information in cardiovascular and cancer care, large simple trials (effectiveness trials) have not been used successfully in the intensive-care setting.
| RCT Design Alternative |
|---|
Factorial designs imply concurrent comparisons between at least two therapies. It is also possible to implement a design that compares interventions sequentially. For example, one might compare two therapies in the early treatment of a disease, followed several days later by the evaluation of one or more additional interventions in the late phase of care. One such example is the approach adopted by the National Institutes of Health ARDS trials network, in which the RCT evaluated two ventilatory strategies (12 mL/kg vs 6 mL/kg tidal volume) in conjunction with ketoconazole, 400 mg/d, vs placebo. Optimal use of this design requires that the outcome of the initial portion of the trial be ascertained before the second study is initiated.
Both the simple parallel-group design and the factorial design are generally implemented with the understanding that the sample size is fixed according to pre-established assumptions before enrollment begins. Other experimental designs are more responsive to patient outcomes as the study progresses. Sequential designs26 27 28 set boundaries for significance levels that account for the increasing number of comparisons and the growing sample size throughout the study. True sequential studies randomly allocate patients to receive one of two therapies. Pairs of patients are then compared sequentially, and the study is terminated as soon as one of the significance boundaries is crossed. This design was successfully used by Meduri and colleagues29 to establish the benefits of IV methylprednisolone in the treatment of late-phase ARDS. The authors demonstrated that high-dose methylprednisolone was associated with improvements in lung injury, multiple-organ dysfunction syndrome (MODS) score, and mortality in 24 patients. This study question had all of the necessary attributes for this design: the population was well defined and homogeneous, and, more importantly, the study end points could be ascertained within a very short time following randomization. In critical care, the sequential design may be limited to selected patient groups in whom a dichotomous outcome, for example progression of disease or intubation status (yes or no), is promptly available for analysis. This approach may therefore be considered for efficacy evaluations. Major concerns with the design may include its inability to conceal the randomization process and the uncertainty of not knowing the exact sample size in advance. Building on this methodology, biostatisticians have developed methods for performing interim analyses in large clinical trials, referred to as group sequential methods.30 31
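The pairwise stopping rule described above can be illustrated with Wald's sequential probability ratio test (SPRT) applied to matched pairs. This is a minimal sketch, not the boundaries used by Meduri and colleagues: it assumes each untied pair yields a single preference for therapy A or therapy B, tests a preference probability of 0.5 (H0) against a hypothetical alternative of 0.75, and uses illustrative error rates α = 0.05 and β = 0.20. The function name and decision labels are hypothetical, and the test is one-sided (it only looks for a preference for A).

```python
from math import log


def sprt_paired_preferences(preferences, theta1=0.75, alpha=0.05, beta=0.20):
    """Wald SPRT on untied matched pairs (illustrative sketch).

    preferences: iterable of booleans, True if the pair favored therapy A.
    H0: P(prefer A) = 0.5; H1: P(prefer A) = theta1 (hypothetical value).
    Returns (decision, number_of_pairs_examined).
    """
    upper = log((1 - beta) / alpha)      # crossing it -> reject H0 (A preferred)
    lower = log(beta / (1 - alpha))      # crossing it -> accept H0 (no preference)
    step_if_a = log(theta1 / 0.5)        # log-likelihood-ratio increment, pair favors A
    step_if_b = log((1 - theta1) / 0.5)  # increment when the pair favors B
    llr = 0.0
    n = 0
    for n, prefers_a in enumerate(preferences, start=1):
        llr += step_if_a if prefers_a else step_if_b
        if llr >= upper:
            return "reject_h0", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", n  # no boundary crossed yet: keep enrolling pairs


# Hypothetical sequence of pair preferences (True = pair favored A).
example = [True, True, False] + [True] * 8
print(sprt_paired_preferences(example))
```

With these settings, seven consecutive preferences for A are enough to cross the upper boundary and three consecutive preferences for B cross the lower one; the hypothetical sequence above stops after 10 pairs. A two-sided trial would run a second test for a preference for B.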
Another RCT design option particularly amenable to an efficacy evaluation is the two-period crossover study, in which patients serve as their own controls. In a two-period crossover trial,27 28 31 patients are randomized to one of two therapies for a fixed period of time and then receive the other therapy during a second, comparable interval. Significant gains in efficiency are made by minimizing "between-subject" variability in this manner. Cooper et al32 determined the hemodynamic consequences of sodium bicarbonate by randomly allocating critically ill patients with lactic acidosis to receive either 2 mmol/kg of sodium bicarbonate or an equimolar amount of sodium chloride, followed by the other therapy. The authors found that sodium bicarbonate and sodium chloride equally increased left ventricular filling pressures and cardiac output without significantly changing arterial BP. One of the fundamental assumptions underlying this design is the absence of a "carryover effect"; that is, the effect of the treatment given in the first period does not persist into the second period. In this example, the administration of sodium bicarbonate in the first period may have altered acid-base status or calcium homeostasis in the second treatment period, potentially biasing the results toward the null hypothesis. As a second example, Wright and colleagues33 examined whether bronchodilators decreased airflow resistance in patients with ARDS and demonstrated that airway resistance, a short-term physiologic end point, was decreased. This was the optimal design choice given the reversibility of both the outcome and the intervention. Crossover studies are therefore best suited to relatively stable conditions (stability is required during the study), interventions with a rapid onset of action and a very short half-life (the biological effect must disappear before the second treatment period), and rapidly modifiable end points such as hemodynamic and respiratory measures.
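A two-period (AB/BA) crossover is commonly analyzed through within-patient differences between periods, which is where the gain in efficiency comes from. The sketch below assumes a simple additive model (period effect plus treatment effect, no carryover) and uses hypothetical cardiac output values, not data from the studies cited; the function name is illustrative.

```python
from statistics import mean


def ab_ba_estimates(seq_ab, seq_ba):
    """Estimate effects in a two-period AB/BA crossover (additive-model sketch).

    seq_ab: list of (period1, period2) values for patients given A then B.
    seq_ba: list of (period1, period2) values for patients given B then A.
    Returns (treatment effect A - B, period effect 1 - 2, carryover contrast).
    """
    d_ab = [p1 - p2 for p1, p2 in seq_ab]  # (A - B) + period effect
    d_ba = [p1 - p2 for p1, p2 in seq_ba]  # (B - A) + period effect
    treatment_effect = (mean(d_ab) - mean(d_ba)) / 2  # period effect cancels
    period_effect = (mean(d_ab) + mean(d_ba)) / 2     # treatment effect cancels
    # Difference in patient totals between sequences; a nonzero value hints at
    # unequal carryover, although this comparison has low statistical power.
    carryover_contrast = (mean(p1 + p2 for p1, p2 in seq_ab)
                          - mean(p1 + p2 for p1, p2 in seq_ba))
    return treatment_effect, period_effect, carryover_contrast


# Hypothetical cardiac output values (L/min), not data from the cited studies.
sequence_ab = [(5.0, 4.0), (6.0, 5.2), (5.5, 4.6)]
sequence_ba = [(4.2, 5.0), (4.8, 5.4), (4.5, 5.3)]
effect, period, carryover = ab_ba_estimates(sequence_ab, sequence_ba)
print(f"A - B: {effect:.2f}  period: {period:.2f}  carryover contrast: {carryover:.2f}")
```

Because the carryover contrast relies on between-patient totals, it is an insensitive check; the no-carryover assumption is better justified on pharmacologic grounds, as the text suggests.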
All of the designs discussed so far evaluate interventions in individual patients. However, it is sometimes necessary to evaluate therapies, protocols, guidelines, or treatment programs for groups of individuals.34 35 36 37 In such cluster designs, groups such as ICUs or physician practices, referred to as clusters, are randomized to receive alternative interventions. A cluster design may be the most appropriate for evaluating interventions such as antibiotic protocols, weaning guidelines, or early discharge programs. One of the major concerns is the possibility of large variation between clusters, which may make it difficult to detect actual differences between therapies. Partial clustering in the allocation of patients was used in a study38 of hyperbaric oxygen therapy for acute carbon monoxide poisoning. With access to a single chamber, the investigators had to choose between enrolling only one patient at a time and allocating all patients poisoned in the same incident as one cluster to be treated at the same time. In this well-conducted RCT,38 the authors not only found no benefit from hyperbaric oxygen but also raised the possibility of harm.
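Between-cluster variation is commonly summarized by the intracluster correlation coefficient (ICC), and the resulting loss of efficiency by the design effect, 1 + (m − 1) × ICC, where m is the average cluster size. This sketch assumes equal cluster sizes; the ICC of 0.02, the 50 patients per ICU, and the 1,000-patient individually randomized baseline are hypothetical values chosen for illustration.

```python
from math import ceil


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (mean_cluster_size - 1) * icc


def clusters_required(n_individual: int, mean_cluster_size: int, icc: float) -> int:
    """Total clusters (both arms) needed to match the precision of an
    individually randomized trial of n_individual patients."""
    inflated_n = n_individual * design_effect(mean_cluster_size, icc)
    return ceil(inflated_n / mean_cluster_size)


# Hypothetical: 1,000 patients would suffice with individual randomization,
# each ICU contributes about 50 patients, and the ICC is 0.02.
print(round(design_effect(50, 0.02), 2), clusters_required(1000, 50, 0.02))
```

Even this small ICC nearly doubles the number of patients required (design effect 1.98, about 40 ICUs in total), which is why large between-cluster variation is a central concern in cluster trials.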
| The Patient Population in Critical Care RCTs |
|---|
Second, selection is often based on disease definitions composed of a constellation of physiologic and other biological abnormalities rather than on well-defined disease entities. One of the major challenges facing critical care investigators in the past several years has been to develop definitions for disease processes such as sepsis, septic shock, and ARDS.39 40 41 42 43 44 Unfortunately, few diseases in critical care are as simple and clearly defined by a pathologic process as myocardial infarction or cancer. Consequently, clinical syndromes are often characterized by alterations in physiologic, immunologic, and biochemical parameters. Recently, expert opinion from consensus conferences has helped in formulating these definitions.45 46 Before undertaking any clinical trial, investigators should not only understand the pathophysiology of the disease or syndrome but also have a sound appreciation of its epidemiology.47 A detailed understanding of disease incidence and risk factors, as well as of the limitations of using the syndrome definition to select patients, is essential in planning an RCT.48 In assessing proposed or established syndrome definitions, investigators should question their validity and reproducibility (or reliability), as well as whether the definitions are sufficiently well established and user friendly to warrant their use in a clinical trial.
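The reproducibility of a syndrome definition is often assessed by having two observers apply it independently to the same patients and computing Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The 2 × 2 counts below are hypothetical, and the function name is illustrative.

```python
def cohens_kappa(both_yes: int, only_a_yes: int, only_b_yes: int, both_no: int) -> float:
    """Cohen's kappa for two observers classifying patients as meeting a
    syndrome definition (yes/no)."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n       # raw proportion of agreement
    p_a_yes = (both_yes + only_a_yes) / n     # observer A's "yes" rate
    p_b_yes = (both_yes + only_b_yes) / n     # observer B's "yes" rate
    # Agreement expected if the two observers classified independently.
    expected = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical: two intensivists apply an ARDS definition to 100 patients.
kappa = cohens_kappa(both_yes=40, only_a_yes=5, only_b_yes=10, both_no=45)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 85% of patients, but chance alone would produce 50% agreement given their "yes" rates, so kappa is about 0.70.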
In addition to concerns related to location and disease definitions, the choice of either an efficacy or an effectiveness approach will have a substantial impact on the selection of the study population. Specifically, in choosing an efficacy approach, investigators usually perform the study in a well-defined patient population in which the intervention has the highest probability of demonstrating an effect. This may be done by narrowly defining the patient population through restrictive eligibility criteria and disease definitions, as well as by selecting specialized centers with clinical expertise in the field. Choosing a narrowly defined study population will decrease overall variability attributable to patient selection but may hamper patient recruitment and jeopardize the generalizability of the study results.49 Despite these concerns, this approach has been used successfully by the National Institutes of Health ARDS Network.50
When defining the eligibility criteria for an effectiveness trial, investigators should consider utilizing more liberal criteria in a wide range of clinical settings. Thus, as the study is being designed, medical or surgical critically ill patients with a broad range of primary diagnoses (or underlying conditions) from a range of tertiary-care centers might be considered for enrollment in the study. Liberal selection of study centers and more permissive eligibility have been adopted in many studies performed by the Canadian Critical Care Trials Group, including the TRICC trial,17 and the study by Cook et al51 comparing sucralfate and ranitidine.
On the spectrum between highly selected patients (efficacy) and all ICU patients (effectiveness), we suggest that critical care investigators weigh a number of factors in deciding where their trial should lie. In practice, considerations such as the spectrum of biological activity of the intervention (wide or narrow), funding and resource constraints, the prevalence of the specific condition or disease process, the frequency of the primary outcome, and the scientific or clinical interests of the investigative team will influence the selection criteria for potential study participants and study sites. Targeting high-risk patients, and possibly centers where the condition of interest is more prevalent, may be used as a strategy to make the most of study resources.
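The resource gain from targeting high-risk patients can be quantified with the usual normal-approximation sample size formula for comparing two proportions. This sketch assumes a constant relative risk reduction of 20% across risk groups, a two-sided α of 0.05, and 80% power; the baseline mortality rates of 25% and 40% are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def total_sample_size(p_control: float, relative_risk_reduction: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total patients (two equal arms) for a two-sided comparison of two
    proportions, normal approximation without continuity correction."""
    p_treat = p_control * (1 - relative_risk_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    term = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat)))
    per_arm = ceil(term ** 2 / (p_control - p_treat) ** 2)
    return 2 * per_arm


# Same 20% relative benefit, enrolled in a lower-risk vs a higher-risk population.
for baseline in (0.25, 0.40):
    print(f"baseline mortality {baseline:.0%}: {total_sample_size(baseline, 0.20)} patients")
```

Under these assumptions, enrolling patients with 40% rather than 25% baseline mortality roughly halves the required sample size (about 1,130 vs 2,190 patients), provided the relative benefit really is constant across risk groups, which is itself an assumption to be checked.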
| Study Interventions |
|---|
To cope with concerns regarding the complexity of care, investigators must consider the degree of control or constraints imposed on experimental and nonexperimental interventions that will be adopted in an RCT.52 Experimental constraints on study interventions can be implemented by instituting rigorous treatment protocols or by the selection of study centers. Thus, a number of choices face the clinical investigator. For example, should the administration of antibiotics be standardized in a septic shock trial? Should the ventilatory management be tightly controlled in an RCT of a weaning intervention?
As outlined in the previous section, the answers to these questions will partly depend on whether investigators wish to evaluate therapeutic efficacy or effectiveness. Efficacy evaluations are characterized by elaborate study protocols detailing the use of experimental and nonexperimental therapies. Such protocols are expected to decrease overall variability attributable to the confounding influence of co-interventions.53 Decreased variation in the study may enable the detection of smaller, clinically important treatment differences, if truly present. However, the treatment protocols may themselves improve patient care by decreasing unnecessary practice variation, increasing the general knowledge of practitioners, or promoting evidence-based practices in participating centers. Just as critical paths are not easily implemented at a site that did not participate in their development, elaborate study protocols may not be easily adopted in a wide variety of practice settings. Elaborate treatment protocols may also jeopardize the accrual of study participants and physician compliance.
An alternative approach is to leave therapeutic decisions (other than the experimental therapy) to the attending physician. Allowing the attending physician complete autonomy will maximize the generalizability of study results (increased external validity) but potentially increase confounding from co-interventions (decreased internal validity). The number and intensity of co-interventions invariably magnify the underlying random error in an RCT, potentially leading investigators to falsely conclude that a promising new therapy has no benefit. To compensate for the increased variation and the diluted apparent benefit of the experimental therapy, investigators must plan for a substantially larger sample size. Selecting specific ICUs to participate in an RCT may also be a worthwhile method of ensuring compliance with the protocols for the study intervention and co-interventions.
For all interventions, blinding the care team to the study interventions should be seriously considered, because this maneuver has been shown to minimize co-interventions and biases in ascertaining outcomes.54 Although blinding is vitally important, feasibility and patient safety sometimes do not permit blinding of the study intervention; this is more problematic for nonpharmacologic interventions. However, there are a number of examples in which investigators have successfully implemented complex blinding maneuvers without jeopardizing either safety or feasibility.55 Pilot studies are recommended to determine whether blinding can be maintained safely and successfully. If, during a pilot study, caregivers are able to discern which treatment is being administered, the blinding process should be reconsidered, improved, or possibly abandoned. When double blinding is not possible, for example in the evaluation of surgical techniques and new devices, other safeguards to minimize bias include the selection of objective outcomes and, if more subjective outcomes are chosen, independent and blinded outcome assessment. To minimize differences in therapy due to the inability to blind, regimented treatment protocols should also be considered. In addition, the influence of co-interventions can be examined post hoc using multivariable statistical techniques.
When treatment protocols are complex or controversial, compliance may also be a concern. We suggest that investigators consider some or all of the following strategies to improve adherence to study protocols: (1) making study protocols simple and easy to implement; (2) developing the protocols with as many stakeholders as possible; (3) disseminating the study protocol and its rationale extensively in participating study centers; (4) obtaining formal agreements to respect the protocol from all ICU physicians and other potential collaborators; (5) implementing a mechanism to minimize crossovers (such as consulting the site investigator and the study chair before crossing over); and (6) developing objective crossover criteria when crossovers are a concern.
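Crossovers between study arms dilute the observed treatment difference, which is one reason poor adherence erodes power. A commonly used approximation is that if a fraction of the experimental arm receives the control therapy (drop-out) and a fraction of the control arm receives the experimental therapy (drop-in), the observed difference shrinks by a factor of (1 − drop-out − drop-in), so the required sample size grows by the inverse square of that factor. The 10% crossover rates and the 1,000-patient planned size below are hypothetical.

```python
from math import ceil


def crossover_inflation(drop_out: float, drop_in: float) -> float:
    """Sample size inflation factor when crossovers dilute the treatment contrast."""
    dilution = 1 - drop_out - drop_in
    if dilution <= 0:
        raise ValueError("crossover rates leave no detectable treatment contrast")
    return 1 / dilution ** 2


def adjusted_sample_size(n_planned: int, drop_out: float, drop_in: float) -> int:
    """Sample size needed once the anticipated crossovers are accounted for."""
    return ceil(n_planned * crossover_inflation(drop_out, drop_in))


# Hypothetical: 1,000 patients planned, 10% crossing over in each direction.
print(adjusted_sample_size(1000, 0.10, 0.10))
```

With 10% crossover in each direction, the trial would need about 56% more patients (1,563 instead of 1,000), so measures that keep crossovers rare can save substantial resources.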
| Outcome Measures |
|---|
Outcomes should also measure what they are supposed to measure (validity), and they should be precise and reproducible. There is little doubt that all-cause mortality meets all of these criteria, but cause-specific mortality or a quality-of-life scale may not be a valid and reliable assessment of a patient's health status following an episode of septic shock. Finally, an outcome must be able to detect a clinically important true positive or negative change in the patient's condition following a therapy. In critically ill patients, the ability to discriminate or detect the potential benefits of therapy may be less than optimal when mortality ascertained at 30 days is the primary outcome.11 Because few sepsis studies have shown any significant impact on 30-day mortality, many investigators11 47 have suggested that other outcomes should be considered in RCTs evaluating therapeutic effectiveness.
However, the ability to discriminate between beneficial and harmful therapies may be modified by specific design choices, including many related to outcomes. Discriminability can be increased simply by increasing the sample size. Using mortality as an example of a primary outcome, the sample size in a clinical trial comparing two therapies is based on the baseline event rate, the expected incremental benefit, the level of significance (α), and the power to detect differences (1 − β). Establishing the anticipated incremental benefit of a new therapy is vitally important because of its enormous repercussions on sample size. A sample size calculation for an RCT requires that investigators establish the minimum therapeutic effect detectable within the trial. This difference in outcomes between interventions is referred to as the minimally important difference or minimal clinically important difference. In essence, the minimally important difference establishes the level of discrimination in the study population exposed to the interventions, given acceptable levels of type I and type II error and the baseline event rate. Too often, investigators calculate a sample size based on very large and unrealistic expected differences in outcomes. To determine a plausible effect size, investigators should ask themselves the following questions: (1) what difference or incremental benefit can realistically be expected of the experimental therapy (anticipated biological effect of therapy); (2) is the required number of patients available to participate in the clinical trial (feasibility); and (3) how much of a survival benefit, given the added costs and expected side effects of therapy, would clinicians, patients, and administrators require before adopting a new therapy (overall benefit of therapy)?
As a concrete example, assume that a given study population has an expected mortality rate of 25% in the standard-therapy group and that the experimental therapy is expected to decrease mortality by an absolute difference of 12.5% (a 50% relative risk reduction). The total number of patients required would approximate 250. Most therapies used in the ICU would not be expected to decrease mortality so dramatically. More realistic expectations may be in the range of a 5% absolute decrease (a 20% relative risk reduction), which would require a total sample size of about 2,200 patients if the baseline mortality were 25%. Investigators need to consider whether an absolute incremental benefit in the range of 5 to 10% is attainable with the experimental therapy. If not, another, more discriminating outcome should be sought.
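The arithmetic in this example can be reproduced with the normal-approximation formula for comparing two proportions. The sketch assumes a two-sided α of 0.05 and 80% power, which the text does not state. Under these assumptions the formula gives roughly 2,190 patients for the 25% vs 20% scenario, in line with the figure quoted above, and roughly 300 patients for the 25% vs 12.5% scenario, somewhat more than 250, so the exact totals depend on the α, power, and formula chosen.

```python
from math import ceil, sqrt
from statistics import NormalDist


def total_sample_size(p_control: float, p_treat: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total patients (two equal arms) for a two-sided comparison of two
    proportions, normal approximation without continuity correction."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for 1 - beta
    p_bar = (p_control + p_treat) / 2
    term = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat)))
    return 2 * ceil(term ** 2 / (p_control - p_treat) ** 2)


print(total_sample_size(0.25, 0.125))  # 12.5% absolute (50% relative) reduction
print(total_sample_size(0.25, 0.20))   # 5% absolute (20% relative) reduction
```

Because the required size scales with the inverse square of the absolute difference, shrinking the expected benefit from 12.5 to 5 percentage points increases the trial roughly sevenfold under the same assumptions.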
As an alternative approach, discriminability may be improved by altering the ascertainment period. In other words, mortality may be determined at 24 h, at 7 days, or at ICU discharge rather than over longer intervals such as 30 days or 6 months. The timing of ascertainment has opposing influences on an outcome's ability to discriminate and on its clinical relevance. As the ascertainment period for mortality is lengthened, the clinical importance of the outcome increases. However, as the time from administration of the therapy to assessment of mortality increases, the relationship between the effects of the therapy and the outcome may be confounded by extraneous factors and interventions. Therefore, the ability of an intervention to discriminate between groups on the basis of mortality may decrease as time progresses (Fig 2).
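This loss of discrimination can be illustrated with a hypothetical scenario: suppose a therapy prevents 5 deaths per 100 patients during the first week (15% vs 10% mortality at day 7), and unrelated deaths add 10 percentage points to both arms by day 30 (25% vs 20%). The absolute difference is unchanged, yet the higher event rates add variance, so power at a fixed sample size falls. The sketch uses a normal approximation, a two-sided α of 0.05, and a hypothetical 500 patients per arm; all rates are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control: float, p_treat: float, n_per_arm: int,
                      alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (unpooled
    standard error; the negligible opposite rejection tail is ignored)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    return NormalDist().cdf(abs(p_control - p_treat) / se - z_alpha)


# Hypothetical: same 5-point difference, ascertained early vs late.
power_day7 = approximate_power(0.15, 0.10, 500)
power_day30 = approximate_power(0.25, 0.20, 500)
print(f"day 7: {power_day7:.2f}, day 30: {power_day30:.2f}")
```

In this scenario power falls from about 0.67 at day 7 to under 0.50 at day 30; if part of the early benefit also eroded among survivors, the loss would be larger still.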
| Conclusion |
|---|
| Acknowledgements |
|---|
| Footnotes |
|---|
Drs. Hébert and Cook are Career Scientists of the Ontario Ministry of Health.
Received for publication March 29, 2000. Accepted for publication September 24, 2001.
| References |
|---|