Statistically significant: where to draw the line? Part 1 of series to help journalists/readers

I’ve decided it’s time to address statistical significance –  because I don’t think many journalists understand this concept very well. At a very high, introductory level, you could say that it’s an attempt to judge the probability that something we think we have found is due only to random chance.

(Addendum 2 hours later: I should have emphasized that while “you could say” this – and it’s a commonly used definition – it misses the mark and will be refuted in part 3 of this series.)

Recently our story reviewers offered a constructive criticism of a story that didn’t report on a potential harm that fell just slightly beyond what’s considered statistically significant. But the harm was a possible increase in the rate of stillbirth or neonatal death, and the p-value – the recognized yardstick for statistical significance – was .06, not the gold standard of less than .05.

Mosby’s defines the p-value as:

“the statistical probability of the occurrence of a given finding by chance alone in comparison with the known distribution of possible findings, considering the kinds of data, the technique of analysis, and the number of observations. The P value may be noted as a decimal: P <.01 means that the likelihood that the phenomena tested occurred by chance alone is less than 1%. The lower the P value, the less likely the finding would occur by chance alone.”

I was prompted to write about statistical significance after receiving the following note from Randy Dotinga, a freelance health care journalist who is also president of the American Society of Journalists and Authors.  Randy wrote:

Could you folks provide some guidance to reporters about when to report statistically insignificant findings?

I tend to avoid reporting them at all. If they’re not statistically significant, then why tell layperson readers about them? But this review (see Harms criterion) suggests paying more attention to such numbers, at least in this case, which would be a challenge: Hey reader, these numbers are statistically insignificant but we’re going to tell them to you anyway because they might mean something or they might not.

What are some best practices in this kind of situation?

I wrote back:

Thanks for your note on this important topic.

I’m happy to take a shot at guidance but we should acknowledge that hard-and-fast rules on statistical significance are somewhat problematic.

The somewhat arbitrary choice to set the p-value for statistical significance at less than 5% was made nearly 100 years ago. There’s nothing magical about it; it’s just become a time-honored norm. In reality, the difference between “not quite statistically significant” – as our reviewers noted with the stillbirth/neonatal death rate – and statistically significant at the .05 level can be minuscule. And when we’re talking about stillbirth/neonatal death, I agree with our reviewers that coming that close is important enough to mention. In this case, the p-value was .06; that’s even clear in the abstract. I agree with our reviewers’ apparent intent: let’s not split hairs over a one percentage point (not “1%,” as I originally wrote – see the comment below that led to this addendum two hours later) difference in our estimate of what’s statistically significant when we’re talking about stillbirth/neonatal death.

While the story mentioned “a slight bump in the rate of stillbirths,” our criterion on harms calls for quantification.  What does a slight bump in the rate mean to readers?  Why not give them the numbers and let them decide what’s “slight”?

Your note is an excellent reminder to me that we need to update our “Toolkit” and the primers therein.   This is a topic that we should explore in more depth.

I hope this is helpful for the moment. If we were talking about ingrown toenails or hair loss, I don’t think our reviewers would have urged giving more attention to a result that just missed the time-honored mark of statistical significance, the way they did with the stillbirth/neonatal death rate.

Two other things I should have addressed but didn’t:

  • The journalist who wrote to me asked how to explain to readers that these numbers “might mean something or they might not.”  My not-intended-to-be-glib answer would be, “You should be including caveats like that in almost every story you do.”  We need to help people understand and embrace uncertainty, rather than promoting false certainty where it doesn’t exist.
  • In focusing on statistical significance, let’s not forget to question whether even results with a p-value < 0.05 are clinically significant.  In other words, did they make a meaningful difference in people’s lives? (A quick sketch of that distinction follows this list.)
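
To make the distinction concrete, here is a minimal sketch with purely illustrative, made-up numbers (not from any study discussed here): a huge hypothetical trial in which a 0.2 percentage point difference comfortably clears the p < .05 bar, yet may mean very little to any individual patient.

```python
# Purely illustrative numbers: a hypothetical mega-trial in which a tiny
# difference is statistically significant but arguably not clinically so.
from math import sqrt
from statistics import NormalDist

n = 500_000            # patients per arm (hypothetical)
p_control = 0.200      # 20.0% event rate with the old treatment
p_treated = 0.198      # 19.8% with the new one: 0.2 percentage points better

# Two-proportion z-test for the difference in event rates
pooled = (p_control + p_treated) / 2
se = sqrt(2 * pooled * (1 - pooled) / n)
z = (p_control - p_treated) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")  # roughly p = 0.01, comfortably "significant"
print(f"Absolute benefit: {(p_control - p_treated) * 100:.1f} percentage points")
# Statistical significance asks "is this likely to be noise?" Whether a
# 0.2 percentage point benefit matters to patients is a clinical judgment.
```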

This is a very important topic, and I’ve asked some experts to weigh in on this – people who are far better with statistics than I am.  I readily admit that I am as math-phobic as any journalist.  But if I can get a toe-hold on some of these topics, it’s clear that anyone can.

In parts two and three of this mini-series in the next two days, I’ll pass along some experts’ analyses of the question of where to draw the line on statistical significance.  I think these upcoming examples will help readers better understand – and further question – claims about evidence.

Part two: Are P-values greater than .05 really just statistical noise?

Part three: “Acting as though statistical significance implies truth isn’t even approximately correct.”

 



Comments (4)


David Miller

March 10, 2015 at 10:45 am

The difference between p=0.05 and p=0.06 is not “1%”. It’s one percentage point, if we’re thinking about p-values as percentages. This may seem like a meaningless quibble, but it can be quite important in many areas of study, particularly in healthcare.

For example, consider a new drug that has a 10% rate of serious side effects leading to death, compared with an older drug’s 5% rate. If this is reported as a “5%” increase in the rate of death (10% - 5% = 5%), not only is that description incorrect (the difference is 5 percentage points, not 5 percent), it also dramatically understates the fact that the new drug DOUBLES the risk of death.

If the new drug DOUBLES the risk of death, its clinical benefit over the old drug needs to be considerable – a very different calculus, perhaps, than for a mere “5%” increase in the risk of death.
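
A quick sketch of that arithmetic, using the illustrative 5% and 10% rates from the example above:

```python
# Illustrative rates from the example above: absolute vs. relative change.
old_rate = 0.05   # 5% rate of fatal side effects on the older drug
new_rate = 0.10   # 10% rate on the new drug

absolute_increase = new_rate - old_rate               # 0.05 -> 5 percentage points
relative_risk = new_rate / old_rate                   # 2.0 -> the risk doubles
relative_increase = (new_rate - old_rate) / old_rate  # 1.0 -> a 100% increase

print(f"Absolute increase: {absolute_increase * 100:.0f} percentage points")
print(f"Relative risk: {relative_risk:.1f}x, i.e. a {relative_increase * 100:.0f}% increase")
```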

    Gary Schwitzer

    March 10, 2015 at 10:48 am

    David,

    Thanks for your note, and for the correction.

    I knew I would be on the road at a conference this week, so before I left I put this series “on the shelf,” scheduled to be published while I was away.

    Everybody needs an editor, and you provided some editing help.

    It’s also been pointed out to me that I didn’t raise strong enough caveats about the commonly accepted description of statistical significance, when I simply wrote, “At a very high, introductory level, you could say that it’s an attempt to judge the probability that something we think we have found is due only to random chance.”

    There will be a refutation of that description in part 3.

    So, to critics of that description, be patient and please read the entire series.

Matthew

March 11, 2015 at 8:04 am

Mosby’s definition of p-value is wrong. I know a lot of people think about p-values in that way, but it is simply not accurate, or even necessarily close to accurate, to say that “P <.01 means that the likelihood that the phenomena tested occurred by chance alone is less than 1%.” To determine the probability of something being due to chance, you need to know the prior distribution, and depending on that prior, a p-value of 0.01 could correspond to a very high probability of being due to chance. All we can say is that the lower the p-value, the less likely it is that the results were due to chance; the p-value is a measure of evidence, and it has no literal real-world interpretation.

More specifically, the p-value is conditional on the assumption that the results were just due to chance (i.e., that the null hypothesis is true), and therefore cannot itself be the probability that the results were due to chance. I’ve written this up further here: http://www.separatinghyperplanes.com/2013/05/a-few-thoughts-on-statistical.html
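
A minimal sketch of that point, using assumed (purely illustrative) values for the prior probability of the null hypothesis and the study’s power, shows how even a result significant at the .01 level can carry a high probability of being a chance finding:

```python
# Assumed, purely illustrative inputs: the prior probability that the null
# hypothesis ("just chance") is true, the significance threshold, and the
# study's power to detect a real effect.
def prob_null_given_significant(prior_null, alpha=0.01, power=0.8):
    """Posterior probability that the null is true, given a result
    significant at level alpha, via Bayes' theorem."""
    false_positive = alpha * prior_null        # null true, yet p < alpha by chance
    true_positive = power * (1 - prior_null)   # a real effect is detected
    return false_positive / (false_positive + true_positive)

for prior in (0.5, 0.9, 0.99):
    print(f"prior P(null) = {prior:.2f} -> "
          f"P(null | p < 0.01) ~ {prob_null_given_significant(prior):.2f}")
# With a prior of 0.99, more than half of results "significant at .01"
# would still be chance findings under these assumptions.
```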

    Gary Schwitzer

    March 11, 2015 at 8:23 am

    Matthew,

    Thanks for your note. As mentioned in my comment above, this definition will be refuted in part 3 of this series.