A “composite endpoint” is a single result that researchers in a clinical trial create by combining several measurable outcomes.
Here’s a real-world example from 2018 involving Repatha, a heart disease drug made by Amgen. The components of the composite are listed in parentheses:
“Amgen announced that the FOURIER trial evaluating whether Repatha reduces the risk of cardiovascular events met its primary composite endpoint (cardiovascular death, non-fatal myocardial infarction (MI), non-fatal stroke, hospitalization for unstable angina or coronary revascularization).”
Composite outcomes are commonly used in studies testing new treatments for cardiovascular disease, but they can be found across medical research. A study may look at how a single intervention impacted several things, such as:
- rate of heart attack, stroke, or sudden death
- rate of death or chronic lung disease in preterm babies
- rate of complete or partial organ rejection in transplants
Using composite outcomes has its advantages and disadvantages, but for journalists and news consumers, it’s good to be wary anytime you come across them.
Why use composite outcomes in clinical trials?
The main advantage of this approach is increased statistical efficiency. By measuring more than one result and combining the data in a single outcome, researchers have an easier time showing a statistically significant difference between the treatment group and controls. This allows for studies that require fewer patients, take less time, and ultimately are more cost-effective.
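To see why a composite can shrink a trial, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The event rates (3% for death alone, 12% for a composite), the 20% relative reduction, and the function name `patients_per_group` are all hypothetical values chosen for illustration; they are not taken from any trial mentioned here.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(control_rate, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a relative risk
    reduction, using the normal approximation for two proportions."""
    treated_rate = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    difference = control_rate - treated_rate
    return ceil((z_alpha + z_power) ** 2 * variance / difference ** 2)


# Hypothetical event rates: death alone is rare, the composite is more common.
death_only = patients_per_group(control_rate=0.03, relative_reduction=0.20)
composite = patients_per_group(control_rate=0.12, relative_reduction=0.20)

print(f"death alone: {death_only:,} patients per group")
print(f"composite:   {composite:,} patients per group")
```

Under these assumptions, the trial that counts only deaths needs several times as many patients as the one that counts the more frequent composite event, even though the drug's relative effect is identical in both.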
However, this approach can also open the door to misdirection and statistical sleight of hand.
One way this can happen is by combining components with varying clinical importance. As Gloria Cordoba, of the University of Copenhagen, and colleagues noted in a BMJ analysis, such combinations can make a treatment seem more effective than it really is:
For example, suppose a drug leads to a large reduction in a composite outcome of “death or chest pain.” This finding could mean that the drug resulted in fewer deaths and less chest pain. But it is also possible that the composite was driven entirely by a reduction in chest pain with no change, or even an increase, in death.
To help readers understand which components of the composite are most responsible for any treatment effect, many experts emphasize the importance of presenting data for all composite components in published research studies.
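A small worked example shows what a component-by-component breakdown can reveal. The counts below are invented for illustration (per 1,000 patients in each arm), and the sketch assumes, for simplicity, that no patient experiences both events, so the composite count is just the sum of the two components.

```python
# Hypothetical event counts per 1,000 patients in each arm (not real data).
events = {
    "death":      {"control": 20,  "treatment": 24},
    "chest pain": {"control": 200, "treatment": 120},
}


def relative_change(control, treatment):
    """Change in event count relative to the control arm."""
    return (treatment - control) / control


# Simplifying assumption: no patient has both events, so counts can be added.
composite = {
    arm: sum(counts[arm] for counts in events.values())
    for arm in ("control", "treatment")
}

report = {
    name: relative_change(counts["control"], counts["treatment"])
    for name, counts in events.items()
}
report["composite"] = relative_change(composite["control"], composite["treatment"])

for name, change in report.items():
    print(f"{name:>10}: {change:+.0%}")
```

With these made-up numbers, the composite falls by about a third while deaths actually rise by a fifth. A reader shown only the composite would have no way to know that.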
But in their systematic review of 40 randomized trials, Cordoba et al found that only 60% of the studies they looked at provided reliable estimates for all composite components. In many studies, there was a misleading implication that the results applied to the most important clinical component of the composite, when the results were primarily attributable to less serious components.
Some composites are based on ‘judgment calls’
Here’s another concern with composites: Many studies use components, such as hospital admissions, that are based on a judgment call made by the clinicians conducting the study. And these are often the components of the composite that show the largest effect and contribute most to an overall positive result.
Researchers on a major NIH-funded clinical trial recently faced criticism for adding these “judgment call” outcomes to their study after it was already underway.
This is problematic, Cordoba and colleagues note, because clinicians often aren’t blinded to the treatment that study patients are receiving (i.e. they know whether the patient is in the experimental treatment group or the placebo/control group). And so their judgment in these cases could easily be biased by their knowledge of the patient’s study group allocation.
Not surprisingly, studies that include such “clinician driven” components are more likely to report a statistically significant result for the primary outcome.
Shifting endpoints can lead to cherry-picked data
A final warning involves studies that “cherry pick” data to include in the composite. When the components of the composite aren’t clearly identified prior to the study, researchers may be tempted to mix and match outcome components until they arrive at a statistically significant result (something that’s bound to happen eventually due to chance). In one study singled out by Cordoba et al, the primary outcome was a composite of 8 different components that wasn’t statistically significant. However, the authors also reported on a number of secondary composites that consisted of “combinations of primary end points as well as death from any cause.”
These combinations weren’t specified in the study, but Cordoba calculated 502 ways that these components could be combined. It’s no shock that the researchers ultimately turned up a statistically significant result for one of these combinations—a finding that was singled out for emphasis in the study abstract, but which is of uncertain clinical importance.
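The arithmetic behind a figure like 502 can be sketched in a few lines. The counting rule here is our assumption, not something spelled out above: nine candidate components (the 8 primary components plus death from any cause), combined in groups of two or more. The second calculation treats each combination as an independent test at the conventional 0.05 threshold, which is a simplification because overlapping composites are correlated.

```python
from math import comb

# Assumption: 8 primary components plus death from any cause.
N_COMPONENTS = 9

# Count every way to combine two or more components into a composite.
composites = sum(comb(N_COMPONENTS, k) for k in range(2, N_COMPONENTS + 1))
print(composites)  # 502

# Simplified illustration: if each composite were an independent test at
# alpha = 0.05 and the treatment truly did nothing, how likely is at least
# one "significant" result?
ALPHA = 0.05
p_any_false_positive = 1 - (1 - ALPHA) ** composites
expected_false_positives = ALPHA * composites

print(f"chance of at least one false positive: {p_any_false_positive:.4f}")
print(f"expected false positives: {expected_false_positives:.1f}")
```

Even allowing for the correlation between overlapping composites, the point stands: with hundreds of possible combinations to choose from, a statistically significant result somewhere is close to guaranteed.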
We need to be careful when reporting on studies that use composite outcomes. When these studies report a benefit, reporters should evaluate whether there was a similar effect on all components of the composite; if not, they should identify which component of the composite was primarily responsible for the result, and explain whether that component is more or less important than the others. Be especially careful when a component depends on a judgment call by the clinician (e.g. hospital admissions, referral for surgery, initiation of new antibiotics), as these measures are more likely to show a positive result that may reflect bias on the part of the researchers.
Lastly, it’s also important to check whether the components of the composite were determined before the study was initiated (a priori) or after it was completed (post hoc). This can often be gleaned either from a careful reading of the study itself or by checking its registry listing (if one exists) at clinicaltrials.gov.
Trial registries provide a record of what outcomes were specified before the study started, so that researchers can’t later decide to cherry pick other results that showed a benefit. Post hoc changes to the composite components should generally be viewed with skepticism.