Understanding the Limitations of p-Values
In the field of people analytics, p-values are often treated as the gold standard for determining statistical significance. A result with a p-value below 0.05 is commonly considered significant, while anything above this threshold is often dismissed. However, treating 0.05 as a hard cutoff can be misleading. The threshold is a convention rather than a natural boundary, and relying too heavily on it gives an oversimplified view of statistical outcomes that can lead to flawed decision-making.
In this article, we’ll explore the limitations of using p-values, especially the 0.05 threshold, and explain why it’s essential to also consider the practical significance of your findings when working in people analytics.
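A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis (for example, that a program has no effect) is true. It is not the probability that the null hypothesis is true, and it says nothing about how large or important an effect is. P-values are useful for judging whether an observed effect could plausibly be due to chance alone, but they, and the 0.05 threshold in particular, have several limitations that can distort how we interpret data.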
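To make the definition concrete, here is a minimal sketch of a permutation test, one of several ways to compute a p-value for a difference in mean ratings. The article does not say which test the HR team used, so treat this as an illustrative substitute: under the null hypothesis the "trained" and "untrained" labels are interchangeable, so we shuffle them many times and count how often the shuffled difference is at least as extreme as the observed one. The function name `permutation_p_value` and the ratings are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Under the null hypothesis the group labels carry no information, so
    reshuffling them shows how large a gap chance alone tends to produce.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_gap(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)          # gap with the real labels
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # randomly reassign the labels
        # small tolerance so floating-point rounding does not hide ties
        if mean_gap(pooled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # add-one correction: the observed labelling counts as one permutation
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical ratings on a 10-point scale (illustrative numbers only)
trained = [7.1, 6.8, 7.5, 7.0, 7.3, 6.9]
untrained = [6.9, 7.0, 6.7, 7.2, 6.8, 6.6]
print(f"permutation p-value: {permutation_p_value(trained, untrained):.3f}")
```

Standard parametric tests such as a two-sample t-test approximate the same quantity analytically; the permutation version is shown here only because it mirrors the definition step by step.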
Let’s say you’re analyzing the effect of a new leadership training program. The HR department wants to evaluate the program's effectiveness by comparing the performance ratings of employees who underwent the training (500 participants) versus those who didn't (4,500 non-participants). After running the analysis, you find a p-value of 0.049, suggesting statistical significance.
Based solely on the p-value being less than 0.05, the HR team might conclude that the training program has a positive effect on employee performance and recommend expanding it, potentially investing substantial resources. However, this conclusion is problematic because it ignores the effect size. While the difference is statistically significant (p < 0.05), the actual difference in performance ratings is only 0.1 points on a 10-point scale. An effect this small might not justify the cost and effort of the training program.
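An effect size puts that 0.1-point gap on a standardized scale. The article does not report the spread of ratings, so the sketch below backs out the standard deviation implied by the stated numbers (a 0.1-point difference, p = 0.049, groups of 500 and 4,500). It assumes a two-sample test with a common standard deviation and groups large enough that the t distribution is essentially normal, then computes Cohen's d, the mean difference divided by that standard deviation. The helper name `implied_cohens_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def implied_cohens_d(mean_diff, p_value, n1, n2):
    """Back out Cohen's d from a reported two-sided p-value.

    Assumes a two-sample test with a pooled (common) standard deviation and
    large groups, so the test statistic is approximately normal.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)   # test statistic implied by p
    se = mean_diff / z                          # standard error of the difference
    pooled_sd = se / sqrt(1 / n1 + 1 / n2)      # SD implied by that standard error
    return mean_diff / pooled_sd, pooled_sd


d, sd = implied_cohens_d(mean_diff=0.1, p_value=0.049, n1=500, n2=4500)
print(f"implied SD = {sd:.2f}, Cohen's d = {d:.2f}")
```

Under these assumptions the implied standard deviation is about 1.08 rating points and Cohen's d is about 0.09, less than half of the 0.2 that Cohen's common rule of thumb labels a "small" effect. Note that d simplifies to z * sqrt(1/n1 + 1/n2), so with these group sizes any difference that only just reaches p ≈ 0.05 is necessarily this small.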
Furthermore, the large sample size (5,000 employees) plays an important role. The standard error of a difference in means shrinks as the groups grow (roughly in proportion to 1/√n), so in large datasets even tiny, practically unimportant differences can produce statistically significant results. A large sample makes small differences easy to detect, but not every detectable difference is meaningful in practice.
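The sketch below holds a 0.1-point difference fixed and varies only the group sizes, keeping the article's 1:9 ratio of trained to untrained employees. It uses a large-sample z-test and an assumed standard deviation of 1.0 rating point; both the SD and the helper name `two_sided_p` are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n1, n2):
    """Large-sample (z-test) two-sided p-value for a difference in means,
    assuming both groups share the same standard deviation `sd`."""
    # The standard error depends on sqrt(1/n1 + 1/n2), so it is driven
    # mostly by the smaller group.
    se = sd * sqrt(1 / n1 + 1 / n2)
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))


# Same 0.1-point difference and assumed SD of 1.0; only the sizes change.
for n_trained in (50, 500, 5000):
    n_untrained = 9 * n_trained
    p = two_sided_p(0.1, 1.0, n_trained, n_untrained)
    print(f"{n_trained:>5} trained vs {n_untrained:>6} untrained: p = {p:.4f}")
```

With 50 trained employees the gap is nowhere near significant, at 500 it slips under 0.05, and at 5,000 the p-value is vanishingly small, even though the underlying difference is identical in every row.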
The HR team should also consider practical versus statistical significance. A 0.1 point increase in performance ratings may not translate into any meaningful improvement in actual job performance or organizational outcomes.
Additionally, there’s the matter of cost-benefit considerations. The cost of training 500 employees for a 0.1 point increase in performance ratings may far outweigh the benefits, making the program economically unsound, despite its "statistical significance."
Lastly, many factors influence performance ratings, so this small difference could be due to variables not accounted for in the analysis. If employees were not randomly assigned to the training, participants may differ from non-participants in tenure, role, or manager, for example. Without controlling for such factors, the HR team risks attributing the small performance improvement solely to the training program.
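The simulation below illustrates how an uncontrolled variable can manufacture a training "effect". It is entirely hypothetical: tenure (an assumed confounder) raises both the chance of being selected for training and the performance rating, while training itself has zero true effect. A naive comparison of group means shows a clear gap; comparing trained and untrained employees within one-year tenure bands (simple stratification, one of several ways to control for a covariate, with regression adjustment being another) removes most of it. All function names and parameters are illustrative.

```python
import math
import random
from collections import defaultdict


def simulate(n=5000, seed=42):
    """Hypothetical employees: tenure drives both training selection and
    ratings, while training itself has NO effect on ratings."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tenure = rng.uniform(0, 10)                        # years at the company
        p_train = 1 / (1 + math.exp(-(tenure - 5) / 2))    # longer tenure, likelier trained
        trained = rng.random() < p_train
        rating = 6.0 + 0.15 * tenure + rng.gauss(0, 1)     # no training term at all
        rows.append((tenure, trained, rating))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Trained mean minus untrained mean, ignoring tenure."""
    trained = [rating for _, is_trained, rating in rows if is_trained]
    untrained = [rating for _, is_trained, rating in rows if not is_trained]
    return mean(trained) - mean(untrained)


def stratified_difference(rows, band_width=1.0):
    """Compare trained vs untrained within tenure bands, then average the
    within-band gaps weighted by each band's number of trained employees
    (an estimate of the effect among those who were trained)."""
    bands = defaultdict(lambda: ([], []))   # band -> (trained ratings, untrained ratings)
    for tenure, is_trained, rating in rows:
        bands[int(tenure // band_width)][0 if is_trained else 1].append(rating)
    weighted_sum, total_weight = 0.0, 0
    for trained, untrained in bands.values():
        if trained and untrained:           # a band needs both groups to compare
            weighted_sum += len(trained) * (mean(trained) - mean(untrained))
            total_weight += len(trained)
    return weighted_sum / total_weight


rows = simulate()
print(f"naive difference:    {naive_difference(rows):+.2f}")
print(f"within tenure bands: {stratified_difference(rows):+.2f}")
```

Stratification only adjusts for variables you have measured. If selection also depends on something unrecorded, such as motivation or manager nominations, adjusting for tenure alone cannot remove that bias.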
This example shows that even when a p-value suggests statistical significance, a small effect size, cost considerations, and possible confounding factors can all undermine the case for action. HR analysts need to look beyond p-values and focus on practical significance when making decisions.
Another major issue with relying on p-values, particularly the 0.05 threshold, is the increased likelihood of false positives when conducting multiple comparisons. In large-scale HR studies that test many hypotheses, the probability that at least one result is statistically significant purely by chance increases with each additional test. This is known as the multiple comparisons problem.
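The arithmetic behind the multiple comparisons problem is short. If m tests are independent and every null hypothesis is true, each test has a 5% chance of a false positive, so the chance of at least one false positive is 1 - (1 - 0.05)^m. Independence is an assumption here; with correlated tests the exact figure differs. The function name is illustrative.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, m)
    print(f"{m:>2} tests at alpha = 0.05: {rate:.0%} chance of a false positive")
```

With 20 independent tests, the chance of at least one spurious "significant" finding is about 64%.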
Correcting for multiple comparisons (for example, with the Bonferroni correction, which divides the 0.05 threshold by the number of tests) can help mitigate this risk, but the need for such corrections further highlights the limitations of a fixed threshold like 0.05.
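Here is a minimal sketch of the Bonferroni correction: with m tests, a result counts as significant only if its p-value is at most alpha / m, or equivalently if m times its p-value (capped at 1) is at most alpha. The function name and the ten p-values below are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject/keep decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]   # scale each p-value by m
    reject = [p <= alpha / m for p in p_values]      # compare with the stricter threshold
    return adjusted, reject


# Ten hypothetical tests from one HR study
p_values = [0.049, 0.003, 0.20, 0.012, 0.0004, 0.31, 0.07, 0.45, 0.02, 0.66]
adjusted, reject = bonferroni(p_values)
for p, p_adj, r in zip(p_values, adjusted, reject):
    print(f"p = {p:<6} adjusted = {p_adj:.3f}  significant: {r}")
```

Only two of the ten results survive (0.003 and 0.0004, against a threshold of 0.005), and a p-value of 0.049 would no longer count as significant. Bonferroni is deliberately conservative, which is another reason not to treat any single cutoff as absolute.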
The 0.05 threshold for p-values, while useful, can create a false sense of certainty. It’s important to recognize the limitations of this threshold and avoid treating it as the ultimate arbiter of significance.
For HR professionals and people analysts, the key is not to stop at the p-value. Always consider the effect size and practical impact to ensure that your insights are both statistically and practically significant. In doing so, you’ll be better equipped to guide decisions that improve organizational outcomes and foster a data-driven culture in HR.
At DataSkillUp, we specialize in helping people analysts develop the skills needed to interpret both p-values and effect sizes effectively. If you’re looking to deepen your understanding of people analytics, reach out to us for personalized coaching and training.