Don’t give p-values more credit than they deserve

DS202 Blog Post

week04
p-values
Published: 19 October 2022

TLDR

There are dangers in relying too much on p-values.

When you run a linear or logistic regression and find that a regression coefficient has a low associated p-value, it is tempting to scream THIS FEATURE IS SIGNIFICANT AND I CAN PROVE IT!

In reality, although p-values might suggest a non-zero relationship between variables, you shouldn’t judge the performance or explainability of a model simply by the p-values of its coefficients, or by the p-value associated with the full model (say, that of the F-statistic).

When assessing a model, look beyond goodness-of-fit. Use train/test splits, cross-validation, or the bootstrap, and pick measures of success that are appropriate to the problem at hand. Come to the 🗓️ Week 04 workshop (the lecture) this Friday 21 October to learn more about this.
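To make "look beyond goodness-of-fit" a bit more concrete, here is a minimal sketch of k-fold cross-validation for a one-variable linear regression, written with the Python standard library only. The data are simulated, and the function names (`fit_line`, `mse`, `k_fold_mse`) are made up for this illustration; in practice you would probably reach for a modelling library instead.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average out-of-sample MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse([xs[i] for i in fold], [ys[i] for i in fold], b0, b1))
    return sum(scores) / k


# Simulated data (an assumption for illustration): y = 2 + 0.5 x + noise
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

cv_mse = k_fold_mse(xs, ys)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

The point of the design is that every observation is scored by a model that never saw it during fitting, so the number you get reflects predictive performance rather than how well the model memorised the sample.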

I am saying all this because p-values are very easy to hack. In fact, there is even a term for the misuse of p-values in the scientific literature: p-hacking.
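One way to see how easy this is: test enough unrelated features against an outcome and some of them will look "significant" by chance alone. The simulation below is a rough sketch under stated assumptions, not a formal analysis. Both the outcome and all 200 features are pure random noise, the function names are made up for this example, and the p-value comes from the Fisher z approximation for a correlation rather than the exact t-test a regression package would report.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_obs=100, n_features=200, alpha=0.05, seed=42):
    """Count noise features that pass the significance threshold anyway."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]  # outcome: pure noise
    hits = 0
    for _ in range(n_features):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # feature: unrelated noise
        if approx_p_value(pearson_r(x, y), n_obs) < alpha:
            hits += 1
    return hits


hits = count_false_positives()
print(f"{hits} of 200 pure-noise features have p < 0.05")
```

With a 0.05 threshold you should expect roughly 5% of the noise features (around 10 out of 200) to clear the bar. Report only those and you have a "finding" that means nothing.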

Where do I inform myself about this?

I have put together a list of articles and commentaries on this topic. Check them out:

💡 If you are in a hurry and want to read just ONE thing, read the “Science Isn’t Broken” piece at FiveThirtyEight. They have a cool visualisation to illustrate the problem.