p Values, Statistical Significance, and the Magic Number
For as long as I’ve been in academia, the term ‘p < 0.05’ has been a constant.
From my days in medical school, where my classmates and I learned to read and interpret scientific journals and articles, all the way to grad school (where I am now), p values have, for better or for worse, been ever-present.
We accept reality the way it is presented to us. In much the same fashion, I accepted p values, and their role in declaring statistical significance, as a fundamental part of basic science.
And why wouldn’t I?
P values are everywhere. In articles from nearly every field related to healthcare, even tangentially, p values seem not just important but essential for backing up the validity of an observation or a statistical test.
The Dichotomization of Science
Why have p values become so inextricably linked with scientific investigation? Millions of articles use them in conjunction with the term statistical significance. Is it really as simple as a bright line in the sand being the only criterion that separates meaningless noise from meaningful findings?
To be more precise,
Is a p value of 0.051 so different from a p value of 0.049? Is one really so much more significant than the other?
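To make that concrete, here is a small sketch in Python (using SciPy, and the same two hypothetical p values from the question above) showing how close the underlying test statistics are on either side of the 0.05 line:

```python
from scipy import stats

# Two-sided p values sitting just on either side of the conventional 0.05 cutoff.
for p in (0.049, 0.051):
    # The |z| statistic a two-sided z test would need to produce this p value.
    z = stats.norm.isf(p / 2)
    print(f"p = {p:.3f}  ->  |z| = {z:.3f}")

# The two implied z statistics differ by less than 0.02, yet only one of them
# would earn the label "statistically significant" under a hard 0.05 rule.
```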
First, what does statistically significant even mean to begin with?
The term’s original use (all the way back in 1885!) was as a simple tool to flag whether a particular observation or inference merited further exploration. It was never meant to be a yardstick for what is scientifically important and what is not. Are p values really such infallible arbiters of something being, for example,
statistically significant versus clinically significant?
Where do we go from here?
Okay, I think you get the point. I’m hardly the first person to bring this up.
But where do we go from here? How do we navigate the uncertainty that is inherent to data?
Do we replace p values with something else? Perhaps confidence intervals that include or exclude the null value?
A Bonferroni correction?
Much like how there is no clear line between noise and signal, there is no one-size-fits-all solution to this problem. As long as statistical tests exist, the urge to use them to explain patterns and give meaning to uncertainty will also exist.
We cannot simply swap one statistical test for another.
To me, at least, the way forward is what has been recommended by the American Statistical Association (ASA).
Honest and open communication of results and inferences, such as reporting the p value as a continuous quantity and providing both point and interval estimates, would go a long way toward fostering a thoughtful, open environment.
It would also be more conducive to replicable and reproducible research, one of the cornerstones of responsible science!
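As a rough illustration of that kind of reporting, here is a minimal Python sketch (with simulated, purely hypothetical data and group names) that presents the point estimate, a 95% interval estimate, and the exact p value rather than a bare “p < 0.05” verdict:

```python
import numpy as np
from scipy import stats

# Simulated outcome data for two groups; the group names, sample sizes, and
# effect size are entirely made up and only serve to illustrate the report.
rng = np.random.default_rng(42)
treatment = rng.normal(loc=1.0, scale=2.0, size=60)
control = rng.normal(loc=0.3, scale=2.0, size=60)

# Standard two-sample t test (equal variances assumed, SciPy's default).
result = stats.ttest_ind(treatment, control)

# Point estimate and a 95% confidence interval for the mean difference,
# computed from the pooled variance so it matches the test above.
diff = treatment.mean() - control.mean()
n1, n2 = len(treatment), len(control)
pooled_var = ((n1 - 1) * treatment.var(ddof=1)
              + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

print(f"mean difference = {diff:.2f}")
print(f"95% CI = ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
print(f"p = {result.pvalue:.3f}")  # reported as a continuous value, not a verdict
```

Nothing here is prescriptive; the point is simply that the estimate and its uncertainty carry far more information than whether p happened to clear an arbitrary threshold.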
This is a learning process, and I’m still working on it too. But hopefully, the growing pains of moving to a world beyond p < 0.05 will ease with thoughtful research and open communication!