‘Not statistically significant but clinically meaningful’: A researcher calls ‘BS’ on cancer drug spin

Kevin Lomangino is the managing editor of HealthNewsReview.org. He tweets as @KLomangino.

This post has been updated; scroll to the bottom for details.

An egregious example of pharma spin was highlighted by Dr. Vinay Prasad, an oncologist at Oregon Health & Science University, this week on Twitter.

He pointed to a Novartis promotional website for the immunosuppressant drug everolimus (brand name Afinitor) that’s used to treat kidney and other cancers.

His annotation of a graphic on the site called attention to some startling doublespeak:



The graphic claims that there was a 6.3-month difference in overall survival between groups who did and didn’t receive the drug for the treatment of pancreatic tumors. The fine print describes the difference as “not statistically significant but clinically meaningful.”

Come again?

In the conference abstract where these results were presented, the study authors reported the findings as “not statistically significant” – full stop. Nothing about “clinically meaningful” improvement.

Spinning a negative into a positive

In academic communications, when the difference between a treatment and control group is not statistically significant, it’s customary for researchers to conclude that there was “no difference” between the groups.

Podcast: Dr. Vinay Prasad


The oncologist is one of our go-to sources for un-spinning health news on cancer and other topics. He previously went deep with publisher Gary Schwitzer on issues ranging from surrogate endpoints to the flailing promise of precision oncology. Take a listen here.

When results aren’t statistically significant, researchers can’t be sufficiently confident that any benefit they observed is real. Such findings are considered speculative until confirmed by other studies.

Sometimes, a result that was initially “not significant” might well reach the threshold of significance in a bigger study group with more patients, which is what this promotional material seems to anticipate.

But that’s a massive leap of logic, because nobody knows what would happen in a larger study, or whether the benefits seen in a smaller group of patients would hold up.
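To make that concrete, here is a minimal sketch of how the same observed difference can be non-significant in a small trial yet significant in a larger one. All of the numbers except the 6.3-month difference are hypothetical, and the simple two-arm z-test stands in for the survival analysis an actual trial would use:

```python
from statistics import NormalDist

def z_test_p(diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means between two arms,
    using a simple z-test with a common standard deviation `sd`.
    Illustrative only: real survival data would need a log-rank test."""
    se = sd * (2 / n_per_arm) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: the same 6.3-month difference, an assumed
# standard deviation of 24 months, and two different trial sizes.
p_small = z_test_p(6.3, 24.0, 50)    # 50 patients per arm
p_large = z_test_p(6.3, 24.0, 500)   # 500 patients per arm

print(p_small)  # not significant (p > 0.05)
print(p_large)  # significant (p < 0.05)
```

The point is not that a larger trial would actually come out this way: the observed difference could just as easily shrink or vanish with more patients, which is exactly the uncertainty the promotional material glosses over.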

Despite that uncertainty, the Novartis promotional material treats this negative finding as evidence of benefit and hypes it as “clinically meaningful” to doctors visiting its website.

Dr. Susan Molchan, a HealthNewsReview.org contributor who has extensive experience in clinical research at the National Institutes of Health, called the advertising “blatantly misleading.”

“People, doctors included, are more easily taken in because it’s a medical product and has FDA approval, which unfortunately is becoming less and less meaningful, as Dr. Prasad and others have pointed out with the increasingly lenient use of surrogate endpoints that haven’t been well studied enough to show that they really predict anything that will really be helpful to a patient,” she said.

Why does this matter?

Expensive and toxic cancer drugs are often approved and used despite the fact that they don’t work very well.

Everolimus specifically has been approved by the FDA for treatment of a wide range of aggressive cancers even though its side effects are very serious and there’s no proof that it extends life – as reported in this excellent Milwaukee Journal Sentinel/MedPage Today investigation.

Research has repeatedly shown that doctors are likely to overestimate the benefits and underestimate the harms of many drugs and procedures.

While misleading pharmaceutical spin isn’t the only reason for this, it’s clearly an important contributor to the problem.

Update 3/24/17

Dr. Susan Wei, a biostatistician with the University of Minnesota and HealthNewsReview.org contributor, shared some thoughts on the issue of statistical significance and how it relates to “clinically meaningful” results. She told me that while it’s desirable to have “high power” in clinical studies (achieved through larger sample size/more patients), there is actually such a thing as too much power in a study:

“Specifically, studies with excessive power may detect statistically significant results that are not clinically meaningful. Researchers and practitioners would do well to keep this in mind: statistically significant does not mean clinically meaningful.”
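Dr. Wei’s point about excessive power can be sketched the same way: with enough patients, even a clinically trivial difference clears the significance bar. The numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
from statistics import NormalDist

def z_test_p(diff, sd, n_per_arm):
    """Two-sided p-value for a simple two-arm z-test (illustration only)."""
    se = sd * (2 / n_per_arm) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(diff / se)))

# A 0.2-month (roughly six-day) survival difference is almost certainly
# not clinically meaningful, yet with 200,000 patients per arm it comes
# out "statistically significant" at the conventional 5% level.
p = z_test_p(0.2, 24.0, 200_000)
print(p < 0.05)  # True
```

This is the mirror image of the Novartis spin: significance and meaningfulness are separate questions, and a huge sample can manufacture the first without the second.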

With respect to the current study and its portrayal by Novartis, however, she wrote:

“A result that is statistically insignificant is not meaningful, period. Thus, we cannot say a result is statistically insignificant and clinically meaningful at the same time. The only hope is that perhaps the current study was inadequately powered so that it could not detect a statistically significant effect, which can be remedied in future studies. But since we are beholden to current evidence, we cannot make any meaningful statements from a statistically insignificant result.”


Comments (6)

Please note, comments are no longer published through this website. All previously made comments are still archived and available for viewing through select posts.

Brad Flansbaum

March 24, 2017 at 8:14 pm

Statistically significant is an arbitrary construct. A p = 0.04 versus 0.06? Depending on the decision required one might call either “clinically significant.”

    Kevin Lomangino

    March 25, 2017 at 10:07 am


    I agree that the significance of a p value of 0.04 vs. 0.06 might be open to debate. The cutoff at 5% is arbitrary. However, in this case the p value is 0.30, which is very large and not even close to being significant.

    Kevin Lomangino
    Managing Editor

William M. London

March 27, 2017 at 10:17 am

I’m glad to see the point made that a statistically insignificant result cannot be clinically meaningful. However, I do have a quibble with the first paragraph under the heading “Spinning a negative into a positive”:

It may be customary for many researchers to conclude that p values greater than a specified cutoff indicate no difference between groups, but they are wrong when they do so. A p value is generated by analyzing the data based on the initial assumption (null hypothesis) that there is no difference. The p value tells us how likely it is to get a result at least as extreme as what was found, assuming the null hypothesis is true. It cannot justify the conclusion that there is no difference–what was assumed to begin with.

A better message is in the second paragraph of that section: “When results aren’t statistically significant, researchers can’t be sufficiently confident that any benefit they observed is real. Such findings are considered speculative until confirmed by other studies.”

Sam Watson

March 29, 2017 at 8:13 am

It has been pointed out in many places, over and again, that one should not conflate clinical and statistical significance. But on the basis of the article it seems you are arguing that statistical significance is a necessary condition for clinical significance, which is false.

The update that “A result that is statistically insignificant is not meaningful, period” would suggest that without statistical significance, we have learned *nothing* from the new data that could aid in a decision about the use of this drug. If this were true, then random chance would dictate whether we’ve learned something from new data, which is a perhaps tenuous epistemological position. Data from a trial must tell us *something*. Even if it is uncertain, it is not “not meaningful”. The claim that statistical significance is required to make any meaningful conclusion is the whole reason p-hacking and its consequences are a problem for the biomedical literature.

Finally, this entire discussion revolves around an arbitrary test of an indefensible hypothesis. Everolimus is an mTOR inhibitor. As a class of drugs I think that there is good reason to believe that they should have an effect on cancer cell proliferation, even if it may be small. But we have consigned ourselves to test if its effect is *exactly* zero, and then decide that this may actually be the case if p > 0.05 or |t| < 1.96. This would seem to go against any rational thinking. Was the trial well designed and conducted? Can we consider the estimate reliable?

I have no skin in the game, no desire to defend Novartis or any other drug company, and know full well industry sponsored trials can be biased. But here they can only be said to have overstepped the mark if: (i) the trial was poorly done; or (ii) the benefit is not actually clinically significant in context. Focusing on the statistical significance just goes to show how distracting this little statistic can be from a decent health care decision making process.

    Charles L Carter

    March 31, 2017 at 9:18 pm

    Yes, statistical significance and clinical significance are distinct. Both should be weighed when considering clinical decisions.
    But a large absolute effect that is statistically insignificant should be considered too likely due to chance to have actual meaning. Confidence intervals using the p deemed best (a different discussion) will show overlap in possible results comparing treatment to placebo. Using such results in clinical practice is tantamount to experimenting on patients.

      Sam Watson

      April 3, 2017 at 3:54 am

      I don’t agree with the line of thinking that says large p-values suggest results are ‘due to chance’ and using statistically insignificant evidence is tantamount to experimenting on patients (which all RCTs do, by the way! But I get the point.). If everything but the p-value from a statistically insignificant study was reported, would you conclude ‘this drug is completely ineffective’, or would you conclude, ‘this drug likely reduces cancer cell proliferation but we are uncertain about the magnitude of the effect’? It is purely an accident of history which drug is used as standard practice, and decisions have to be made today about how to treat patients. Relying on p-values skews the decision making process and research towards demonstrating statistical significance, rather than a convincing estimate of the magnitude of effectiveness. An under-powered but unbiased study is far preferable to an adequately powered but biased study.