It’s very difficult to capture the essence of statistical significance in a brief primer. (Journalist Christie Aschwanden published a thoughtful piece, “Not even scientists can easily explain P values.”)
First, let’s be at least as concerned about clinical significance as we are about statistical significance. In other words, did the result make a meaningful difference in people’s lives? Journalists should keep that distinction in mind as they write stories about studies, and it’s a good idea for the general public to understand it as well.
From a Duke website:
Clinical versus Statistical Significance
“Although it is tempting to equate statistical significance with clinical importance, critical readers should avoid this temptation. To be clinically important requires a substantial change in an outcome that matters. Statistically significant changes, however, can be observed with trivial outcomes. And because statistical significance is powerfully influenced by the number of observations, statistically significant changes can be observed with trivial (small) changes in important outcomes. Large studies can be significant without being clinically important and small studies may be important without being significant.” (Effective Clinical Practice, July/August 2001, ACP)
Clinical significance has little to do with statistics and is a matter of judgment. Clinical significance often depends on the magnitude of the effect being studied. It answers the question “Is the difference between groups large enough to be worth achieving?” Studies can be statistically significant yet clinically insignificant.
For example, a large study might find that a new antihypertensive drug lowered blood pressure, on average, 1 mm Hg more than conventional treatments did. The results were statistically significant, with a p-value of less than .05, because the study was large enough to detect a very small difference. However, most clinicians would not find a 1 mm Hg difference in blood pressure large enough to justify switching to a new drug. This would be a case where the results were statistically significant (p-value less than .05) but clinically insignificant.
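A quick calculation shows how sample size drives this. The sketch below runs a simple two-sample z-test on the same hypothetical 1 mm Hg difference, once for a small trial and once for a large one; the standard deviation (10 mm Hg) and group sizes are illustrative assumptions, not figures from any real study.

```python
import math

def two_sample_p(diff, sd, n):
    """Two-sided p-value from a two-sample z-test: each group has n
    patients, a common standard deviation sd, and an observed mean
    difference of diff between the groups."""
    se = math.sqrt(2 * sd**2 / n)   # standard error of the difference
    z = diff / se
    # two-sided tail probability under the standard normal curve
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# The same 1 mm Hg blood-pressure difference, assumed SD of 10 mm Hg:
p_small = two_sample_p(diff=1, sd=10, n=50)     # small trial: p ≈ 0.62
p_large = two_sample_p(diff=1, sd=10, n=2000)   # large trial: p ≈ 0.002

print(f"n=50 per group:   p = {p_small:.3f}")
print(f"n=2000 per group: p = {p_large:.4f}")
```

The clinical effect is identical in both cases; only the number of observations changed, yet one result clears the .05 bar and the other does not.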
Guyatt G, Rennie D, Meade MO, Cook DJ. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 2nd Edition, 2008.
The somewhat arbitrary choice to set the threshold for statistical significance at a p-value of less than 5% was made nearly 100 years ago. There’s nothing magical or sacrosanct about it; it has simply become a time-honored norm. In reality, the difference between “not quite statistically significant” and statistically significant at the .05 level can be minuscule.
Hilda Bastian wrote a Scientific American blog post titled “Statistical significance and its part in science downfalls.” An excerpt:
Get a “p value” over or under 0.05 and you can be 95% certain it’s either a fluke or it isn’t. You can eliminate the play of chance! You can separate the signal from the noise!
Except that you can’t. That’s not really what testing for statistical significance does. And therein lies the rub.
Testing for statistical significance estimates the probability of getting roughly that result if the study hypothesis [strictly, the null hypothesis of no effect] is assumed to be true. It can’t on its own tell you whether this assumption was right, or whether the results would hold true in different circumstances. It provides a limited picture of probability, taking limited information about the data into account and giving only “yes” or “no” as options.
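That definition can be made concrete with a simulation. In this hypothetical example (a made-up coin experiment, not from any of the studies discussed here), we observe 60 heads in 100 flips and ask: if the coin were actually fair, how often would chance alone produce a result at least that far from 50/50? The fraction of simulated fair-coin runs that are at least as extreme is an estimate of the p-value.

```python
import random

random.seed(1)

# Observed result: 60 heads in 100 flips of a coin of unknown fairness.
observed_heads = 60
trials = 20_000

# Simulate the null hypothesis: a genuinely fair coin, flipped 100 times.
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(100))
    # Count runs at least as far from 50 as the observed result (two-sided).
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_value = extreme / trials
print(f"simulated p-value: {p_value:.3f}")   # about 0.06
```

Note what the number does and does not say: it is the probability of data this extreme *assuming* the coin is fair; it is not the probability that the coin is fair, and it says nothing about whether 60/100 matters in practice.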
Dartmouth’s Lisa Schwartz, Steve Woloshin and Gil Welch wrote “Fat or Fiction? Is There a Link Between Dietary Fat and Cancer Risk? Why Two Big Studies Reached Different Conclusions” in a column for the Washington Post years ago that touched on some of these issues.
- It reflected on an “apparent flip-flop” in recent news about low-fat diet and breast cancer. One month, a front page Post headline read, “Low-Fat Diet’s Benefit Rejected: Study Finds No Drop in Risk for Disease.” But less than a year before, a headline sent a different message: “Study of Breast Cancer Patients Finds Benefit in Low-Fat Diet.”
- They wrote: “The p values for the effect of low-fat diet on breast cancer in the two studies were quite similar. For women with breast cancer, the p value was 3 percent. For women without breast cancer, the p value was 7 percent. So even though, by convention, one finding is called ‘statistically significant’ and the other ‘not significant,’ we would say that the statistics of the two studies are not that different: Both are close to the conventional cutoff point of 5 percent…. if you believe one is real, you should probably believe the other is real.”
- Read the “Research Basics: Accounting for Chance” sidebar in their Post column for an explanation of how close the two can be.
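The closeness of 3 percent and 7 percent is easy to see numerically. The sketch below uses illustrative test statistics (not the actual statistics from the two diet studies) that land just on either side of the .05 cutoff; the underlying evidence they represent is nearly identical, yet convention labels one “significant” and the other not.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical test statistics chosen to straddle the 0.05 cutoff.
p1 = two_sided_p(2.17)   # p ≈ 0.03 -> "statistically significant"
p2 = two_sided_p(1.81)   # p ≈ 0.07 -> "not significant"

print(f"study 1: p = {p1:.3f}")
print(f"study 2: p = {p2:.3f}")
```

Two test statistics only 0.36 standard errors apart produce opposite verdicts under the dichotomous .05 rule, which is the point Schwartz, Woloshin, and Welch were making.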
Dr. Donald Berry of MD Anderson Cancer Center is a fellow of the American Statistical Association. He wrote an article in the Journal of the National Cancer Institute, “Multiplicities in Cancer Research: Ubiquitous and Necessary Evils.” It might be difficult to digest all of this in one sitting, but I’ll excerpt from the section on statistical significance:
“[S]tatistical significance is an arcane concept. Few researchers can even repeat the definition of P value. People usually convert it to something they do understand, but the conversion—almost always an inversion—is essentially always wrong. For example: “The P value is the probability that the results could have occurred by chance alone.” This interpretation is ambiguous at best. When pressed for the meaning of “chance” and “could have occurred,” the response is usually circular or otherwise incoherent. Such incoherence is more than academic. Much of the world acts as though statistical significance implies truth, which is not even approximately correct.
Statistical significance is widely regarded to be difficult to understand, perhaps even impossible to understand. Some educators go so far as to recommend not teaching it at all.”
Still, as Berry wrote to me, “the cutpoint of 0.05 for statistical significance has become standard in many fields, including medicine.”
For example, many highly capable and highly intelligent MDs regard p > 0.05 versus p < 0.05 as defining truth. This attitude is sacrosanct in a sense but at the same time it is preposterous. As you say, the cutpoint is arbitrary. Moreover, essentially no one knows what a p-value means. And the rare scientist who can give the correct mathematical interpretation can’t put it into a non-mathematical language that someone else can understand. That’s because p-value is fundamentally a perversion of common logic. For example, if you read what I’ve written about p-values in the attached and you come away being able to repeat what you read (whether you understand it or not), you would be a rare bird indeed!
If there is a conclusion to this discussion, it may be Berry’s line:
“One thing is clear: there is no one-size-fits-all approach.”