I’ve decided it’s time to address statistical significance – because I don’t think many journalists understand this concept very well. At a very high, introductory level, you could say that it’s an attempt to judge the probability that something we think we have found is due only to random chance.
(Addendum 2 hours later: I should have emphasized that while “you could say” this…and it’s a commonly-used definition, it misses the mark and will be refuted in part 3 of this series.)
Recently our story reviewers offered constructive criticism of a story that didn’t report on a potential harm that was just slightly beyond what’s considered statistically significant. But the harm was a possible increase in the rate of stillbirth or neonatal death, and the p-value – the recognized yardstick for statistical significance – was .06, not the gold standard of less than .05.
Mosby’s defines the p-value as:
“the statistical probability of the occurrence of a given finding by chance alone in comparison with the known distribution of possible findings, considering the kinds of data, the technique of analysis, and the number of observations. The P value may be noted as a decimal: P <.01 means that the likelihood that the phenomena tested occurred by chance alone is less than 1%. The lower the P value, the less likely the finding would occur by chance alone.”
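To make that definition concrete, here’s a minimal sketch in Python. The counts are invented for illustration (they’re not from any study discussed here): it builds the “chance alone” world directly and counts how often chance produces a result at least as extreme as the one observed. That frequency is, roughly, the p-value.

```python
import random

random.seed(42)

# Hypothetical example: a study observes 60 "events" in 100 trials.
# If the true rate were 50%, how often would chance alone produce a
# result at least this far from 50? That frequency approximates the
# two-sided p-value.
observed = 60
n_trials = 100
n_sims = 100_000

extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    # Count simulated results at least as far from 50 as the observed 60
    if abs(heads - 50) >= abs(observed - 50):
        extreme += 1

p_value = extreme / n_sims
print(f"simulated p-value: {p_value:.3f}")
# close to the exact two-sided binomial p-value of about 0.057
```

Note that this invented example lands right in the gray zone this post is about: a p-value of roughly .057 would be called “not statistically significant” by the conventional cutoff, even though chance alone produces a result that extreme only about 6 times in 100.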
I was prompted to write about statistical significance after receiving the following note from Randy Dotinga, a freelance health care journalist who is also president of the American Society of Journalists and Authors. Randy wrote:
Could you folks provide some guidance to reporters about when to report statistically insignificant findings?
I tend to avoid reporting them at all. If they’re not statistically significant, then why tell layperson readers about them? But this review (see Harms criterion) suggests paying more attention to such numbers, at least in this case, which would be a challenge: Hey reader, these numbers are statistically insignificant but we’re going to tell them to you anyway because they might mean something or they might not.
What are some best practices in this kind of situation?
I wrote back:
Thanks for your note on this important topic.
I’m happy to take a shot at guidance but we should acknowledge that hard-and-fast rules on statistical significance are somewhat problematic.
The somewhat arbitrary choice to set the p-value for statistical significance at less than 5% was made nearly 100 years ago. There’s nothing magical about it. It’s just become a time-honored norm. In reality, the difference between “not quite statistically significant” – as our reviewers noted with the stillbirth/neonatal death rate – and statistically significant at the .05 level can be minuscule. And when we’re talking about stillbirth/neonatal death, I agree with our reviewers that close is important enough to mention. In this case, the p-value was .06. That’s even clear in the abstract. I agree with our reviewers’ apparent intent: let’s not split hairs over a one percentage point (see comment below that led to this addendum two hours later) difference in our estimate of what’s statistically significant when we’re talking about stillbirth/neonatal death.
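To show just how minuscule that difference can be, here’s a quick sketch using hypothetical counts and a standard pooled two-proportion z-test (not the actual data or methods of the study in question): a single additional event can move a result from one side of the .05 line to the other.

```python
from math import sqrt, erfc

def two_prop_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in proportions,
    using the pooled normal-approximation z-test."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    return erfc(z / sqrt(2))

# Hypothetical counts, chosen only to illustrate the cutoff:
# 2,000 patients per arm, 20 events in one arm vs. 34 or 35 in the other.
print(round(two_prop_p(20, 2000, 34, 2000), 3))  # ~0.055: "not significant"
print(round(two_prop_p(20, 2000, 35, 2000), 3))  # ~0.042: "significant"
```

One extra event out of 2,000 patients flips the verdict. Nothing about the underlying harm changed meaningfully; only its position relative to an arbitrary line did.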
While the story mentioned “a slight bump in the rate of stillbirths,” our criterion on harms calls for quantification. What does a slight bump in the rate mean to readers? Why not give them the numbers and let them decide what’s “slight”?
Your note is an excellent reminder to me that we need to update our “Toolkit” and the primers therein. This is a topic that we should explore in more depth.
I hope this is helpful for the moment. If we were talking about ingrown toenails or hair loss, I don’t think our reviewers would have urged mentioning a harm that fell just short of the time-honored mark of statistical significance. Because the harm here was stillbirth/neonatal death, the near miss mattered.
Two other things I should have addressed but didn’t:
This is a very important topic, and I’ve asked some experts to weigh in on this – people who are far better with statistics than I am. I readily admit that I am as math-phobic as any journalist. But if I can get a toe-hold on some of these topics, it’s clear that anyone can.
In parts two and three of this mini-series in the next two days, I’ll pass along some experts’ analyses of the question of where to draw the line on statistical significance. I think these upcoming examples will help readers better understand – and further question – claims about evidence.