
Why you should be careful with composite outcomes in clinical trials



A “composite endpoint” is a single result that researchers in a clinical trial create by combining several measurable outcomes.

Here’s a real-world example from 2018 involving a heart disease drug called Repatha, made by the drug company Amgen. The components of the composite endpoint appear in parentheses:

“Amgen announced that the FOURIER trial evaluating whether Repatha reduces the risk of cardiovascular events met its primary composite endpoint (cardiovascular death, non-fatal myocardial infarction (MI), non-fatal stroke, hospitalization for unstable angina or coronary revascularization).”

Composite outcomes are commonly used in studies testing new treatments for cardiovascular disease, but they can be found across medical research. A study may look at how a single intervention affects several things, such as:

  • rate of heart attack, stroke, or sudden death
  • rate of death or chronic lung disease in preterm babies
  • rate of complete or partial organ rejection in transplants

Composite outcomes have their advantages and disadvantages, but journalists and news consumers should be wary whenever they come across them.

Why use composite outcomes in clinical trials?

The main advantage of this approach is increased statistical efficiency. By combining several types of events into a single outcome, researchers accumulate more events, which makes it easier to show a statistically significant difference between the treatment group and controls. This allows for studies that enroll fewer patients, take less time, and ultimately cost less.
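To see why, consider a rough back-of-the-envelope power calculation. Here’s a minimal Python sketch; the 5% and 15% event rates and the 20% relative risk reduction are our own made-up assumptions, not figures from any real trial:

    from statsmodels.stats.proportion import proportion_effectsize
    from statsmodels.stats.power import NormalIndPower

    # Hypothetical trial: the treatment cuts each event's risk by 20% (relative).
    # Single endpoint: death, with a 5% event rate in the control group.
    # Composite endpoint: death, MI, or stroke, with a 15% control event rate.
    solver = NormalIndPower()
    for label, p_control in [("death only", 0.05), ("composite", 0.15)]:
        p_treated = 0.8 * p_control                           # 20% relative risk reduction
        effect = proportion_effectsize(p_treated, p_control)  # Cohen's h
        n = solver.solve_power(effect_size=effect, alpha=0.05,
                               power=0.8, alternative="two-sided")
        print(f"{label}: about {n:,.0f} patients per arm")

Under these assumed rates, the composite version of the trial needs roughly a third as many patients per arm to achieve the same 80% statistical power.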

However, this approach can also open the door to misdirection and statistical sleight of hand.

One way this can happen is by combining components with varying clinical importance. As Gloria Cordoba, of the University of Copenhagen, and colleagues noted in a BMJ analysis, such combinations can make a treatment seem more effective than it really is:

For example, suppose a drug leads to a large reduction in a composite outcome of “death or chest pain.” This finding could mean that the drug resulted in fewer deaths and less chest pain. But it is also possible that the composite was driven entirely by a reduction in chest pain with no change, or even an increase, in death.
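To make the arithmetic concrete, here’s a minimal sketch with invented numbers (ours, not from any actual trial) in which the composite improves sharply even though deaths go up:

    # Invented event counts per 1,000 patients in each arm of a hypothetical
    # trial with a composite outcome of "death or chest pain". For simplicity,
    # assume no patient experiences both events.
    control = {"death": 20, "chest pain": 180}
    treated = {"death": 25, "chest pain": 100}   # deaths UP, chest pain down

    for arm, events in [("control", control), ("treated", treated)]:
        composite = sum(events.values())
        print(f"{arm}: composite {composite}/1,000, deaths {events['death']}/1,000")

Here the composite falls from 200 to 125 events per 1,000 patients, a reduction of nearly 40%, even though deaths rose by 25%. A headline touting the composite result would hide that increase.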

Misleading implications

To help readers understand which components of the composite are most responsible for any treatment effect, many experts emphasize the importance of presenting data for all composite components in published research studies.

But in their systematic review of 40 randomized trials, Cordoba et al found that only 60% provided reliable estimates for all composite components. In many studies, there was a misleading implication that the results applied to the most important clinical component of the composite, when the results were primarily attributable to less serious components.

Some composites are based on ‘judgment calls’

Here’s another concern with composites: Many studies use components, such as hospital admissions, that are based on a judgment call made by the clinicians conducting the study. And these are often the components of the composite that show the largest effect and contribute most to an overall positive result.

Researchers on a major NIH-funded clinical trial recently faced criticism for adding these “judgment call” outcomes to their study after it was already underway.

This is problematic, Cordoba and colleagues note, because clinicians often aren’t blinded to the treatment that study patients are receiving (i.e., they know whether the patient is in the experimental treatment group or the placebo/control group). And so their judgment in these cases could easily be biased by their knowledge of the patient’s study group allocation.

Not surprisingly, studies that include such “clinician driven” components are more likely to report a statistically significant result for the primary outcome.
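A toy simulation shows how this can play out. In the Python sketch below, every rate is an assumption we made up: the drug has no effect at all on death, yet a modest bias in admission decisions is enough to make a “death or hospital admission” composite statistically significant:

    import random
    from statsmodels.stats.proportion import proportions_ztest

    random.seed(1)

    def composite_events(n_patients, p_death, p_admit):
        """Count patients who die or are admitted (the composite event)."""
        count = 0
        for _ in range(n_patients):
            died = random.random() < p_death
            admitted = random.random() < p_admit
            if died or admitted:
                count += 1
        return count

    n = 2000                                     # patients per arm
    # Identical death risk in both arms; only the judgment-based admission
    # rate differs, reflecting unblinded clinicians' bias.
    control = composite_events(n, p_death=0.05, p_admit=0.20)
    treated = composite_events(n, p_death=0.05, p_admit=0.15)

    z, p_value = proportions_ztest([control, treated], [n, n])
    print(f"composite events: {control} vs {treated}, p = {p_value:.4f}")

The death rate is identical in both arms, but the composite comes out “significant” purely on the strength of the judgment-based component.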

Shifting endpoints can lead to cherry-picked data

A final warning involves studies that “cherry pick” data to include in the composite. When the components of the composite aren’t clearly identified prior to the study, researchers may be tempted to mix and match outcome components until they arrive at a statistically significant result (something that’s bound to happen eventually due to chance). In one study singled out by Cordoba et al, the primary outcome was a composite of 8 different components that wasn’t statistically significant. However, the authors also reported on a number of secondary composites that consisted of “combinations of primary end points as well as death from any cause.”

These combinations weren’t pre-specified, but Cordoba and colleagues calculated that there were 502 ways these components could be combined. It’s no shock that the researchers ultimately turned up a statistically significant result for one of these combinations, a finding that was singled out for emphasis in the study abstract, but which is of uncertain clinical importance.
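That figure checks out if you treat the trial’s 8 primary components plus death from any cause as 9 candidate outcomes and require a composite to contain at least 2 of them; this is our reading of how the number was derived, since the calculation isn’t spelled out:

    from math import comb

    # 8 primary components plus death from any cause = 9 candidate outcomes.
    # Count every subset with at least 2 members (a composite needs >= 2):
    n_candidates = 9
    n_composites = sum(comb(n_candidates, k) for k in range(2, n_candidates + 1))
    print(n_composites)   # 502, i.e. 2**9 minus the empty set and the 9 singletons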

Bottom line

We need to be careful when reporting on studies that use composite outcomes. When these studies report a benefit, reporters should evaluate whether there was a similar effect on all components of the composite; if not, they should identify which component was primarily responsible for the result, and explain whether that component is more or less important than the others. Be especially careful when a component involves a judgment call on the part of the clinician (e.g. hospital admissions, referral for surgery, initiation of new antibiotics), as these measures are more likely to show a positive result that may reflect bias on the part of the researchers.

Lastly, it’s also important to check whether the components of the composite were determined before the study was initiated (a priori) or after it was completed (post hoc). This can often be gleaned either from a careful reading of the study itself or by checking its registry listing (if one exists) at clinicaltrials.gov.

Trial registries provide a record of what outcomes were specified before the study started, so that researchers can’t later decide to cherry pick other results that showed a benefit. Post hoc changes to the composite components should generally be viewed with skepticism.
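If you’re comfortable with a little code, you can pull a trial’s registered outcomes straight from the ClinicalTrials.gov API and compare them with the published paper. This sketch uses the v2 API; the NCT number is a placeholder, and the JSON field names reflect our understanding of the current schema, so treat both as assumptions to verify:

    import requests

    nct_id = "NCT00000000"   # placeholder: substitute the trial's actual NCT number
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    study = requests.get(url, timeout=30).json()

    # Field paths below are our reading of the v2 schema and may change.
    outcomes = study["protocolSection"]["outcomesModule"]
    for outcome in outcomes.get("primaryOutcomes", []):
        print("PRIMARY:", outcome.get("measure"))

Compare these registered outcomes, and the dates they were entered, with what the published paper reports.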


Comments (1)


Terry Corbin

January 4, 2016 at 9:48 am

Unfortunately, the FDA requires its own composites to be included in spine implant studies. These always require much larger studies (typically triple the number of participants) and yet result in muddy conclusions. This seems to be a matter of convenience for the FDA: it wants a single outcome to judge whether a product is worthy of approval. When the data are published, journal editors inevitably require publication of the individual components, as they should.
