It’s increasingly clear that surrogate endpoints don’t tell the entire story when it comes to a treatment’s effectiveness. Just a couple of weeks ago, we learned that a drug which raises “good” cholesterol (a surrogate for cardiovascular disease risk) had no effect on the incidence of heart attacks and strokes. Previous research found that aggressively lowering blood sugar (a surrogate for diabetes complications) actually increased the risk of death among individuals with type 2 diabetes.
Such findings remind us to focus more closely on the real outcomes that matter to patients: things like death, disease severity, time spent in the hospital, and patient quality of life. But even these supposedly “real” outcomes can give us an inflated sense of how well a treatment works if we don’t evaluate the results carefully. This is especially true for studies that employ so-called composite endpoints, or outcomes that consist of two or more components that are combined into a single result.
Such outcomes are commonly used in studies testing new treatments for cardiovascular disease. An example might be a composite outcome consisting of non-fatal heart attacks, cardiovascular deaths, or emergency surgery to treat a blocked coronary artery.
Why use composites? The main advantage of this approach is increased statistical efficiency. By measuring more than one result and combining the data in a single outcome, researchers have an easier time showing a statistically significant difference between the treatment group and controls. This allows for studies that require fewer patients, take less time, and ultimately are more cost-effective. However, this approach can also open the door to misdirection and statistical sleight of hand.
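To see why composites make for smaller trials, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The event rates are invented for illustration and do not come from any study discussed here; the function name `patients_per_arm` is likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_control vs p_treated
    with a two-sided test of two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical event rates, both with the same 25% relative reduction.
n_single = patients_per_arm(0.04, 0.03)     # e.g. cardiovascular death alone
n_composite = patients_per_arm(0.12, 0.09)  # e.g. death, heart attack, or surgery

print(n_single, n_composite)
```

Under these assumed rates, the composite trial needs well under half as many patients per arm, simply because more events occur. The calculation says nothing about whether the extra events are as important as the ones patients care most about.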
One way this can happen is by combining components with varying clinical importance. As Gloria Cordoba, of the University of Copenhagen, and colleagues noted in a BMJ analysis, such combinations can make a treatment seem more effective than it really is:
For example, suppose a drug leads to a large reduction in a composite outcome of “death or chest pain.” This finding could mean that the drug resulted in fewer deaths and less chest pain. But it is also possible that the composite was driven entirely by a reduction in chest pain with no change, or even an increase, in death.
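The “death or chest pain” scenario can be made concrete with some made-up numbers. The counts below are purely hypothetical, and the sketch assumes for simplicity that no patient experiences both events, so the composite is just the sum of the two components.

```python
def relative_risk(events_treated, events_control, n_treated, n_control):
    """Risk in the treated group divided by risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)


N = 1000  # hypothetical patients per arm

# Hypothetical event counts; assumes no patient has both events.
control = {"death": 50, "chest pain": 200}
treated = {"death": 52, "chest pain": 120}

rr = {name: relative_risk(treated[name], control[name], N, N) for name in control}
rr["composite"] = relative_risk(sum(treated.values()), sum(control.values()), N, N)

for name, value in rr.items():
    print(f"{name}: relative risk {value:.2f}")
```

In this invented example the composite falls by roughly 30 percent, yet the entire benefit comes from chest pain, and deaths are slightly higher in the treated group. A headline based on the composite alone would hide that.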
To help readers understand which components of the composite are most responsible for any treatment effect, many experts emphasize the importance of presenting data for all composite components in published research studies. But in their systematic review of 40 randomized trials that were published in 2008, Cordoba et al found that only 60% of the studies they looked at provided reliable estimates for all composite components. In many studies, there was a misleading implication that the results applied to the most important clinical component of the composite, when the results were primarily attributable to less serious components.
Here’s another concern with composites: Many studies use components, such as hospital admissions, that are based on a judgment call made by the clinicians conducting the study. And these are often the components of the composite that show the largest effect and contribute most to an overall positive result. This is problematic, Cordoba and colleagues note, because clinicians often aren’t blinded to the treatment that study patients are receiving (i.e. they know whether the patient is in the experimental treatment group or the placebo/control group). And so their judgment in these cases could easily be biased by their knowledge of the patient’s study group allocation. Not surprisingly, studies that include such “clinician driven” components are more likely to report a statistically significant result for the primary outcome.
A final warning involves studies that “cherry pick” data to include in the composite. When the components of the composite aren’t clearly identified prior to the study, researchers may be tempted to mix and match outcome components until they arrive at a statistically significant result (something that’s bound to happen eventually due to chance). In one study singled out by Cordoba et al, the primary outcome was a composite of 8 different components that wasn’t statistically significant. However, the authors also reported on a number of secondary composites that consisted of “combinations of primary end points as well as death from any cause.” These combinations weren’t specified in the study, but Cordoba and colleagues calculated 502 ways that these components could be combined. It’s no shock that the researchers ultimately turned up a statistically significant result for one of these combinations, a finding that was singled out for emphasis in the study abstract, but which is of uncertain clinical importance.
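A rough sketch shows why a “significant” result is nearly guaranteed with that many options. It rests on two assumptions that are mine, not the authors’: that the 502 figure counts every combination of two or more items drawn from the 8 primary components plus death from any cause (this happens to reproduce the number, but may not be how it was derived), and that each test is independent, which likely overstates the risk because overlapping composites share components.

```python
from math import comb

# Assumption: 8 primary components plus death from any cause.
n_components = 9

# Count every combination containing at least two components.
n_combos = sum(comb(n_components, k) for k in range(2, n_components + 1))


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests is
    'significant' at level alpha when no true effect exists."""
    return 1 - (1 - alpha) ** n_tests


print(n_combos)
for n_tests in (1, 10, 50, n_combos):
    print(n_tests, round(chance_of_false_positive(n_tests), 3))
```

Even at 10 candidate composites, the chance of at least one spurious hit is about 40 percent under these assumptions; at several hundred it is close to certain.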
The bottom line is that we need to be careful when reporting on studies that use composite outcomes. When these studies report a benefit, reporters should evaluate whether there was a similar effect on all components of the composite; if not, they should identify which component of the composite was primarily responsible for the result, and explain whether that component is more or less important than the others. Be especially careful when a component depends on a judgment call by the clinician (e.g. hospital admissions, referral for surgery, initiation of new antibiotics), as these measures are more likely to show a positive result that may reflect bias on the part of the researchers.
Lastly, it’s also important to check whether the components of the composite were determined before the study was initiated (a priori) or after it was completed (post hoc). This can often be gleaned either from a careful reading of the study itself or by checking its registry listing (if one exists) at clinicaltrials.gov. Trial registries provide a record of what outcomes were specified before the study started, so that researchers can’t later decide to cherry pick other results that showed a benefit. Post hoc changes to the composite components should generally be viewed with skepticism.