If you’re living with diabetes, you might be tempted to click on a headline like this one from a National Institutes of Health news release touting cardiovascular benefits from a cholesterol drug.
“Fenofibrate may reduce heart disease risk in some patients with type 2 diabetes”
The release goes on to suggest that among patients with type 2 diabetes who also had high levels of triglycerides and low levels of “good” cholesterol, the drug lowered the risk of cardiovascular events compared with placebo.
Sounds like it might be time to ask your doctor about the benefits of fenofibrate, right?
Before you schedule that appointment, you might want to consider this detail that wasn’t included in the news release: The findings of benefit came from a small group of patients within a larger study whose results had previously been reported. And that larger study, known as ACCORD, found no overall cardiovascular benefit among patients treated with fenofibrate.
It’s only when the researchers looked more closely at the data – specifically at patients with high triglycerides and low HDL cholesterol – that a possible benefit became apparent.
As our expert reviewers pointed out in their evaluation of the news release, deeper-dive analyses such as this one – called subgroup analyses – need to be interpreted with great caution. They can be considered “hypothesis-generating,” meaning they can point to trends that merit further study. They cannot, however, provide conclusive evidence that a treatment provides a health benefit.
1. Subgroup results are more likely to be skewed
Randomization – assigning patients in a study to treatment or control groups according to chance – is one of the greatest protections we have against bias in medical research. It helps ensure that different groups in a study are balanced with respect to key patient characteristics, and that any differences observed are due to the experimental intervention rather than some other factor.
But you lose some of that protection in a subgroup analysis. Because they’re smaller than the overall study, subgroups are more likely to harbor subtle imbalances that can skew the findings.
Consider what might happen if there were slightly more older, sicker patients in the placebo arm of a study subgroup than in the active treatment group. The group receiving the active therapy might start to look pretty good in comparison with these sicker folks. But how much of that apparent benefit would be real, and how much due to there being more underlying illness in the placebo group?
It would be impossible to say for certain, but here’s something we can say with confidence: Subgroups have a lousy track record of identifying legitimate health benefits. In a recent analysis of clinical trials that included a claim from a subgroup, researchers found that only 5 out of the 46 results they looked at were ever tested in a follow-up study. None of the claims that were retested held up in the subsequent research.
“Attempts to corroborate statistically significant subgroup differences are rare; when done, the initially observed subgroup differences are not reproduced,” the authors, led by John P. A. Ioannidis, MD, DSc, of Stanford University, concluded.
In other words, the findings rarely get tested on their own, and when they do, the results are lackluster.
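The point about chance imbalances in smaller groups can be illustrated with a short simulation. The sketch below is not drawn from any of the studies discussed here: the 30% share of high-risk patients, the group sizes, and the function name `mean_abs_imbalance` are all assumptions chosen for illustration. It randomizes patients by coin flip and measures how far apart the two arms end up in their share of high-risk patients.

```python
import random


def mean_abs_imbalance(n_patients, high_risk_rate=0.3, n_trials=500, seed=0):
    """Average gap between arms in the share of high-risk patients.

    All inputs are hypothetical; nothing here comes from a real trial.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Per arm: [number of patients, number of high-risk patients]
        arms = {"treatment": [0, 0], "placebo": [0, 0]}
        for _ in range(n_patients):
            arm = "treatment" if rng.random() < 0.5 else "placebo"
            arms[arm][0] += 1
            if rng.random() < high_risk_rate:
                arms[arm][1] += 1
        # Skip the (very unlikely) case where one arm ends up empty.
        if arms["treatment"][0] == 0 or arms["placebo"][0] == 0:
            continue
        share_treatment = arms["treatment"][1] / arms["treatment"][0]
        share_placebo = arms["placebo"][1] / arms["placebo"][0]
        gaps.append(abs(share_treatment - share_placebo))
    return sum(gaps) / len(gaps)


full_study = mean_abs_imbalance(n_patients=1000)
subgroup = mean_abs_imbalance(n_patients=100)
print(f"Typical gap, 1,000 patients: {full_study:.1%}")
print(f"Typical gap,   100 patients: {subgroup:.1%}")
```

Under these assumptions, the smaller group shows a noticeably larger typical gap between arms. That is the kind of imbalance that can make a treatment look better or worse by chance alone.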
2. Positive subgroup results are more likely to be due to chance
Another problem with subgroups is the higher likelihood of false-positive findings. Simply put, the more subgroups you look at, the greater the risk that you’ll find a positive result that just reflects chance rather than the intervention.
For every comparison that’s made in a study, there’s typically a 5% chance that you’ll randomly hit on a false-positive, experts say. That risk increases quickly with more comparisons, such that a study looking at 10 subgroups has a 40% chance of hitting on at least one false-positive result.
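The arithmetic behind that 40% figure can be sketched in a few lines. This is a simplified calculation that assumes every comparison is independent and uses the same 5% threshold; real subgroups often overlap, so the result is an approximation. The function name `familywise_error_rate` is ours, chosen for illustration.

```python
def familywise_error_rate(num_comparisons, alpha=0.05):
    """Chance of at least one false positive across independent comparisons."""
    # Probability that every comparison avoids a false positive is
    # (1 - alpha) ** num_comparisons; subtract from 1 for "at least one".
    return 1.0 - (1.0 - alpha) ** num_comparisons


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroups: {familywise_error_rate(k):.0%}")
# Prints:
#  1 subgroups: 5%
#  5 subgroups: 23%
# 10 subgroups: 40%
# 20 subgroups: 64%
```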
It’s especially problematic when researchers don’t specify which subgroups they are going to look at before they conduct the study. Exploring subgroups after the fact (known as a “post hoc” analysis) is sometimes referred to as “data dredging” or “p-hacking.” It greatly increases the chances that any positive findings are random and meaningless.
Having a limited number of subgroups that are prespecified (as in the NIH study on fenofibrate mentioned above) is more reassuring, but the findings should still be considered speculative until confirmed by stronger research.
We’ve written about these concerns before in the context of diet soda and heart disease risk.
These problems can be mitigated, but not entirely eliminated, through a statistical correction known as a “multiple comparisons adjustment,” which University of Minnesota statistician Susan Wei, PhD, discusses in an accompanying video.
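There are several kinds of multiple comparisons adjustment, and the text above does not say which one is meant. As one hedged illustration, the sketch below uses the Bonferroni correction, a common and conservative choice: it divides the 5% threshold by the number of comparisons. The p-values are made up for this example and do not come from any study mentioned here.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after a Bonferroni correction."""
    if not p_values:
        raise ValueError("need at least one p-value")
    # Each comparison must clear a stricter bar: alpha / number of comparisons.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from five subgroup comparisons.
subgroup_p_values = [0.003, 0.02, 0.04, 0.30, 0.65]

unadjusted = [p <= 0.05 for p in subgroup_p_values]
adjusted = bonferroni_significant(subgroup_p_values)

print("Significant before adjustment:", sum(unadjusted))  # 3
print("Significant after adjustment: ", sum(adjusted))    # 1
```

In this made-up example, three subgroup results look significant at the usual threshold, but only one survives the stricter bar. The adjustment lowers the false-positive risk at the cost of making real effects harder to detect.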
3. Subgroups have reduced statistical power
While we often see news reports that inappropriately emphasize benefits seen in subgroups, it’s also possible for subgroup analyses to miss important beneficial effects. Subgroups, by definition, are smaller than the main study and typically have less statistical power to identify subtle effects. False-negative results may mislead readers into thinking there is no benefit in a subgroup, when in fact there is a benefit that the analysis doesn’t have enough statistical power to detect.
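To see how shrinking the sample erodes statistical power, consider a rough calculation. The sketch below uses a standard normal approximation for comparing two event rates. The event rates (12% on placebo, 10% on treatment) and the sample sizes are hypothetical, and the function name `approx_power` is ours. It is a simplified estimate, not a substitute for a formal power analysis.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two event rates.

    Uses a normal approximation and ignores the negligible chance of a
    significant result in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed rates.
    std_err = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    return std_normal.cdf(abs(p_control - p_treated) / std_err - z_crit)


# Hypothetical: the same true effect, in a full study versus a subgroup.
full_power = approx_power(0.12, 0.10, n_per_arm=2500)
subgroup_power = approx_power(0.12, 0.10, n_per_arm=500)
print(f"Full study (2,500 per arm): {full_power:.0%}")
print(f"Subgroup (500 per arm):     {subgroup_power:.0%}")
```

With these assumed numbers, the full study has a better-than-even chance of detecting the effect, while the subgroup would miss the same real benefit most of the time.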
Journalists can do more to acknowledge subgroup limitations
Journalists tend to cover health care in a way that favors simple, feel-good messages but shortchanges nuance. That’s one reason why subgroup limitations are rarely discussed in health care news. They get in the way of a tight narrative.
But we have occasionally seen articles that deftly navigate this topic, helping readers understand why subgroups matter without getting bogged down in detail.
Take for example this STAT piece about a drug therapy for prostate cancer that seemed to work particularly well in certain subgroups of men. STAT cautioned:
…looking at those subsets of patients was not what the study was initially designed to do, which limits the statistical power of those conclusions. More research will be needed to confirm which men benefit most from taking the drug.
But all in all, it’s more common to see PR managers and journalists inadequately address subgroup limitations. In fact, we’ve reviewed many news releases and stories where positive subgroup findings are the only results discussed, with no mention at all of the main study’s (typically negative) outcome! This is highly misleading.
Take this Penn State news release about the apparent benefits of omega-3 fatty acid supplements for the prevention of breast cancer in women taking the osteoporosis drug raloxifene. Our reviewers commented:
According to the release, at the end of two years, women considered obese (their body mass index — BMI — was greater than 29) showed a reduction in breast tissue density, which the researchers argued reduced their cancer risk. However, this is misleading. This finding is from a small subgroup of obese women within the already-small study. The main finding of the study — which is never even mentioned in the news release — is that the supplement had no impact on reducing high breast density whether taken alone or in tandem with Raloxifene.
Here’s another example in an NBC story about a device to treat headaches. Our reviewers noted that the study showed no benefit for the device across all study subjects – a fact neglected by the NBC coverage:
The apparent benefit of the device over the sham device was only seen after a pre-specified subgroup analysis. As noted in the accompanying editorial: “The primary endpoint was not achieved in this study – ie, differences in reduction in pain severity between active and sham groups were not significant … conclusions drawn in this way are notorious for failing to replicate.”
The release didn’t mention the sizeable limitations of the study. The published study itself states that the research, based on a 2,000-patient cohort, did “not reach its primary objectives.” The evidence of efficacy was for a subgroup of patients who have specific genetic characteristics, and that sample size was limited … None of this made it into the news release.
Analyze subgroups sparingly and report on them cautiously
Despite these limitations, researchers say subgroup analyses have a legitimate role in scientific studies. A treatment that doesn’t work in a broad group of patients might well provide benefits to a smaller, better-defined group. Subgroup analyses are designed to find those potential treatment candidates, and allow elusive hints of benefit to be tested in a larger follow-up study.
But until such confirmatory research is performed, you should view these analyses with caution and heed the conclusions of experts in study design, including James F. Burke, MD, of the University of Michigan, and colleagues.
Writing in the BMJ, Burke and his coauthors outline a handful of very limited circumstances under which subgroup analyses may provide useful information (criteria which aren’t met by the vast majority of studies published today). They suggest that “subgroup analyses that do not meet these criteria should never be performed because false positives will greatly outnumber true positives and could be integrated into clinical decisions in spite of the best intentions of researchers.”
“Subgroup analyses have historically misinformed as much as they have informed,” they conclude.
We close with a video that explores some of the problems discussed on this page — although not in all cases and not perfectly. It deals with the “Texas Sharpshooter Fallacy,” which is a logical error that occurs when we inappropriately attach significance to a random finding. Many subgroup analyses are based on this flawed thinking and we hope the video will help clarify the issues at play for some readers.
Special thanks to Steven Woloshin, MD and Lisa M. Schwartz, MD of The Dartmouth Institute for Health Policy and Clinical Practice, for editing assistance. They’re co-authors of Know Your Chances, a book about understanding medical risks and seeing through misleading statistics.