Kathlyn Stone is an associate editor at HealthNewsReview.org. She tweets at @KatKStone.
If you’re a runner, you might be drawn to a headline like “Turns out running may actually be good for your knees: Running lowers knee inflammation and may even protect against arthritis, according to a new study.” You might even scan the article and file the information away for future reference. Then one day your knees start giving you a lot of pain, and you decide to just run through it, since you recently read about a new study saying that’s what you should do.
Every week we see dozens of news releases and news stories publicizing studies that involve very few patient volunteers, like the running and inflammation study above that was based on data from just six participants. Some of these trials make good clickable stories since they have broad appeal and deal with seemingly practical things, but they really don’t serve the public well. In fact, they could be doing more harm than good. Why?
Let’s review some of the key reasons:
Sometimes serious problems with a drug, device or procedure won’t show up until the intervention has been tested in a very large group of people. It’s also not unheard of for serious, even life-threatening, harms to be observed only once a drug or device has been released commercially and given to hundreds of thousands of people.
The laparoscopic gastric band procedure (“lap band surgery”) reminds us of what can go wrong. In 2011 lap bands accounted for 35 percent of U.S. weight loss surgeries, but by 2015 that share had dipped to under 6 percent. Why? Long-term studies showed that a large number of people undergoing the procedure had complications, some of which were serious enough to necessitate expensive new surgeries. The lap bands also proved less effective than other surgical methods.
And yet as recently as January 2017, the University of Adelaide in Australia was touting the procedure as “a first option” for very obese adolescents based on a study of just 21 patients — 7 of whom had the device removed during the course of the study! Our reviewers called the news release out for its unjustifiably cheery tone, omission of any potential harms, and lack of any caution about the study’s tiny sample size.
Similar problems were on display with a story on a deep brain stimulation (DBS) device intended to help people with severe anorexia gain weight. The story downplayed the harms, even though those harms were stated clearly in the study. As our reviewers noted:
The story minimizes potential harms, stating “Implanting the DBS device requires minimally invasive surgery which can be completely reversed if problems occur.” There was no other discussion of potential harms.
In fact, the researchers reported:
“DBS was associated with several adverse events, only one of which (seizure during programming, roughly 2 weeks after surgery) was serious. Other related adverse events were panic attack during surgery, nausea, air embolus, and pain.”
And this study only involved six people.
Under-powered studies undermine confidence in scientific research. When a study is under-powered, it means it doesn’t have enough participants to reliably detect an effect or benefit — even though that effect or benefit may very well exist. A study may conclude that there’s “no statistically significant effect” from a test or treatment, but if it’s under-powered we’ll always be left wondering: Did the researchers miss an effect that an adequately powered study would have picked up?
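For readers who want to see what “under-powered” means in numbers, here is a minimal Python sketch. It assumes a simple hypothetical trial comparing two equal-sized groups on a continuous outcome, a standardized effect size (Cohen’s d), a two-sided test at the conventional 0.05 significance level, and a normal approximation. The function name `approx_power` is ours, for illustration only; real trials often use other designs and exact t-based calculations.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d: true standardized effect size (difference in means / common SD).
    n_per_group: participants in each of two equal-sized arms.
    Uses a normal (z-test) approximation, which slightly overstates power
    compared with an exact t-test when groups are small.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = d * (n_per_group / 2) ** 0.5     # expected z-statistic under the true effect
    # Probability that the observed z-statistic lands beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical medium effect (d = 0.5) at several group sizes
for n in (3, 10, 30, 64):
    print(f"{n:>3} per group: power ~ {approx_power(0.5, n):.2f}")
```

Under these assumptions, a hypothetical six-person trial split three and three has well under a 10 percent chance of detecting a real medium-sized effect, while 64 people per group gets close to the conventional 80 percent target. A “no significant effect” result from the small trial tells us very little.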
The distortion produced by low-power studies also cuts the other way, as this helpful explainer in Nature Reviews Neuroscience points out. It states that “even when an under-powered study discovers a true effect, it is likely that the estimate of the magnitude of that effect provided by that study will be exaggerated.” Effect inflation, sometimes called the “winner’s curse,” is worse in small, low-powered studies, which can only detect large effects. “If, for example, the true effect is medium-sized, only those small studies that, by chance, overestimate the magnitude of the effect will pass the threshold for discovery.”
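The winner’s curse is easy to reproduce in a quick simulation. The Python sketch below is illustrative only: it assumes hypothetical two-arm studies with a true medium effect (d = 0.5), outcomes that are normally distributed with a standard deviation of 1 in each arm, and a “discovery” rule of z > 1.96 in the direction of benefit. It then averages the effect estimates from only the studies that cleared that bar, roughly mimicking which results get published and publicized. The function name is ours.

```python
import random
import statistics


def mean_significant_estimate(true_effect, n_per_group, n_studies=1000, seed=1):
    """Simulate many hypothetical two-arm studies and return the average effect
    estimate among only those that reach z > 1.96 (a 'discovery' of benefit)."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # standard error of a difference in means when SD = 1 per arm
    winners = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        if estimate / se > 1.96:  # passes the threshold for "discovery"
            winners.append(estimate)
    return statistics.fmean(winners) if winners else float("nan")


for n in (10, 200):
    print(f"{n:>3} per group: mean 'significant' estimate ~ "
          f"{mean_significant_estimate(0.5, n):.2f} (true effect 0.5)")
```

With 10 people per group, only the lucky overestimates clear the threshold, so the “significant” studies report an average effect well above the true 0.5, roughly twice as large. With 200 per group nearly every study passes, and the average lands close to the true value.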
To determine how big a study needs to be, researchers do what’s called a “power calculation.” The number of participants who need to be recruited depends on how big an effect the researchers expect to see from the intervention; the smaller the expected effect, the more people will be required to demonstrate a statistically significant result. (Here’s a primer on calculating sample sizes that’s reasonably easy to follow for the non-statistician.)
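A common textbook sample-size formula for comparing two group means shows why smaller expected effects demand more people: the required number grows with 1/d², so halving the expected effect roughly quadruples the sample. The Python sketch below assumes two equal-sized arms, a standardized effect size d, a two-sided significance level of 0.05, 80 percent power and a normal approximation. The function name is ours, and the primer linked above may present the calculation differently.

```python
import math
from statistics import NormalDist


def sample_size_per_group(d, alpha=0.05, power=0.80):
    """Participants needed in each of two equal arms to detect a standardized
    effect d with a two-sided test, via the normal approximation
    n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Cohen's conventional "large", "medium" and "small" effects
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: {sample_size_per_group(d)} per group")
```

Under these assumptions, moving from a large expected effect to a small one raises the requirement from 25 to 393 people per group. Exact t-test calculations give slightly larger numbers, but the pattern is the same.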
There are, of course, numerous reasons to undertake small studies (for example, in rare diseases, where trial participants are hard to find). But these studies shouldn’t be promoted to a mass audience as evidence for undergoing a procedure, or used to steer patients into asking their physician for a drug or buying a supplement off the shelf.
Why are small, under-powered studies unethical? Because they involve too few people to accurately determine whether an intervention works, which leaves them “scientifically useless,” according to a trio of experts on bioethics and medicine who wrote a 2002 JAMA commentary. These studies are also ethically deficient, the authors wrote, because volunteers enroll believing that their participation will help other patients or society at large. “Because patients often choose to participate in research because they want to help people with similar diseases, or help advance the cause of medicine, it is reasonable to assume that they would rather enroll in studies that have a better ability to provide healthcare benefits than in studies that have a lesser ability to provide healthcare benefits,” said one author. (Very large “over-powered” studies may also present ethical concerns, by the way, but that’s a topic for a different post.)
How big does a study need to be before we should take its results seriously? There’s no hard and fast answer to that question. Studies of dozens to hundreds of patients can be used to support the safety and efficacy of new treatments, according to the FDA, but research sufficient to support a new drug approval typically involves several thousand participants who are studied for at least a year.
In the case of the story on running and knee inflammation mentioned above, our reviewers zeroed in on the study’s small size: “This new study, of only 6 people, is low in quality and adds little to the evidence base. It probably did not deserve a news release at all, no less one that overstates the findings as carelessly as this one.” Much larger and more practically relevant studies have examined the effects of running on arthritis symptoms and have come to conflicting results, something the story (and the news release it was based on) should have mentioned but didn’t.
Reviewers also thought a news release on using blueberry extract for improved cognition was problematic because of the small sample size. They wrote, “The study from the get-go wasn’t sufficiently powered (meaning it didn’t have enough study volunteers) to detect changes in cognitive performance. The published report notes: ‘Cognitive function was assessed, as a secondary outcome, although the study was not sufficiently powered to detect changes in cognitive performance.’”
HealthNewsReview.org is full of blog posts and reviewed news releases and stories where the size of the study was a top concern for reviewers.
When a news story or news release we’ve reviewed has gotten it right, we’ll point that out, as in this review of a HealthDay story on a stem cell “patch” for people with heart failure. Reviewers wrote:
Preliminary studies involving small numbers of people need to be handled with care–you don’t want to create false hope in patients or exaggerate the findings. This story does a good job placing the research findings in context, in terms of both how far this work is removed from clinical relevance, and how the work fits into the broader field of stem cell therapeutics for heart disease.
We’re not saying small study results should never be reported, but the limitations of such research need to be put front and center. And if they aren’t, you should be very skeptical.