Statistical Analysis is an essential part of Data Science. Two of the most important concepts in statistics are Hypothesis Testing and P-Values. Interpreting a P-Value can be tricky, and you might be doing it wrong. Beware of P-Hacking!
By the end of this tutorial, you will know:
- How to decide whether to reject the null hypothesis
- What is P-Hacking and how to avoid it
- What is Statistical Power
Let’s dive right in!
What are P-Values?
A P-value measures how compatible your sample data are with the null hypothesis. Formally, it is the probability of obtaining a result at least as extreme as the one you observed, assuming the null hypothesis is true.
Before performing a statistical test, a threshold value, called alpha (the significance level), needs to be set. A common choice for alpha is 0.05. If the P-value falls below alpha, the observed result is considered too unlikely under the null hypothesis to be put down to chance alone.
Therefore, if our P-value comes out less than alpha, we call the result statistically significant. So if our P-value is, say, 0.04, we reject the Null Hypothesis.
A low P-value suggests that your sample provides enough evidence to reject the null hypothesis for the entire population. If you get a P-value below 0.05 in our case, you can reject the null hypothesis: the effect you observed in your sample is unlikely to be the product of pure chance, and the experiment appears to have had a significant effect.
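As a minimal sketch of this workflow, the following compares recovery times between a (hypothetical, made-up) vaccine group and a placebo group with a two-sample t-test from SciPy. All the numbers here are simulated for illustration only:

```python
# Sketch: a two-sample t-test on simulated recovery-time data (days).
# The distributions below are made up for illustration; SciPy computes
# the p-value for us.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
placebo = rng.normal(loc=14.0, scale=2.0, size=50)  # mean recovery ~14 days
vaccine = rng.normal(loc=11.0, scale=2.0, size=50)  # mean recovery ~11 days

alpha = 0.05  # significance level, fixed before the test
t_stat, p_value = stats.ttest_ind(vaccine, placebo)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```

Note that alpha is set before running the test; the p-value is only compared against it afterwards.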
So what can go wrong?
Just because a P-value comes out below alpha does not automatically give us the liberty to reject the Null Hypothesis. If the experiment itself is not showing the right picture, the result may be a false positive.
What is P-Hacking?
We P-hack when we misuse statistical analysis and falsely conclude that the null hypothesis can be rejected. Let's understand this in detail.
# Hack 1
Consider we have 5 candidate CoronaVirus vaccines, and we need to check which one, if any, actually reduces the recovery time of patients. So let's say we run a hypothesis test for each of the 5 vaccines, one by one. We set alpha to 0.05, and hence if the P-value for any vaccine comes in below that, we say we can reject the Null Hypothesis. Or can we?
Say, Vaccine A gives a P-Value of 0.2, Vaccine B gives 0.058, Vaccine C gives 0.4, Vaccine D gives 0.02, Vaccine E gives 0.07.
Now, from the above results, the naive conclusion would be that Vaccine D is the one that significantly reduces recovery time and can be used as the CoronaVirus vaccine. But can we really say that just yet? No. If we do, we might be P-Hacking, because this can be a false positive.
Okay, let’s take it another way. Consider we have a Vaccine X, and we know for certain that this vaccine is useless and has no effect on recovery time. Still, we carry out 10 hypothesis tests with a different random sample each time, keeping alpha at 0.05. Say we get the following P-values in our 10 tests: 0.8, 0.7, 0.78, 0.65, 0.03, 0.1, 0.4, 0.09, 0.6, 0.75. The test with a surprisingly low P-value of 0.03 would have made us reject the Null Hypothesis, but in reality the vaccine has no effect: a false positive.
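We can make this concrete with a small simulation. Below, every "experiment" compares two samples drawn from the same distribution, so the null hypothesis is true every single time; yet a noticeable fraction of p-values still dip below 0.05 (the sample sizes and distributions are arbitrary choices for illustration):

```python
# Sketch: repeatedly testing a vaccine we *know* is useless.
# Both groups come from the same distribution, so every rejection
# is a false positive - and we still get plenty of them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_tests = 1000

false_positives = 0
for _ in range(n_tests):
    control = rng.normal(14.0, 2.0, size=30)  # no real effect:
    treated = rng.normal(14.0, 2.0, size=30)  # same distribution
    _, p = stats.ttest_ind(treated, control)
    if p < alpha:
        false_positives += 1

print(f"{false_positives} of {n_tests} tests were 'significant' by chance")
```

Roughly 5% of the tests reject the null even though it is true in every one of them.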
So what do we see from the above examples? In essence, setting alpha = 0.05 means working at a 95% confidence level. And that means that when the null hypothesis is true, about 5% of the tests will still produce false positives, as above.
Multiple Testing Problem
Repeating the experiment more often can help: the more tests you run, the more confidently you can judge whether most of them reject the null. But more tests also mean more false positives in absolute terms (5% of the total in our case): 5 out of 100, 50 out of 1,000, or 500 out of 10,000! This is called the Multiple Testing Problem.
False Discovery Rate
One of the ways to tackle the above problem is to adjust all the P-values using a mechanism called the False Discovery Rate (FDR). FDR procedures, such as the Benjamini-Hochberg method, mathematically adjust the P-values upwards, and in the end, P-values that came out spuriously low may get adjusted to values higher than 0.05.
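As a sketch, here is one common implementation of FDR adjustment, the Benjamini-Hochberg procedure, applied to the five vaccine p-values from the example above:

```python
# Sketch: Benjamini-Hochberg FDR adjustment of a set of p-values.
import numpy as np

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = len(p)
    order = np.argsort(p)                       # ranks, smallest first
    ranked = p[order] * n / np.arange(1, n + 1)  # p * n / rank
    # Enforce monotonicity: adjusted values never decrease with rank.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    result = np.empty(n)
    result[order] = adjusted
    return result

# Vaccines A..E from the example above
pvals = [0.2, 0.058, 0.4, 0.02, 0.07]
adjusted = benjamini_hochberg(pvals)
for name, raw, adj in zip("ABCDE", pvals, adjusted):
    print(f"Vaccine {name}: raw p = {raw:.3f}, adjusted p = {adj:.3f}")
```

After adjustment, Vaccine D's p-value rises from 0.02 to 0.10, so it is no longer significant at alpha = 0.05, which is exactly the protection FDR is meant to provide.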
# Hack 2
Now consider the case from the example above where Vaccine B gave a P-value of 0.058. Wouldn’t you be tempted to add some more data and retest to see if the P-value decreases? Say you add a few more data points, and the P-value for Vaccine B comes out to 0.048. Is this legit? No, you’d again be P-Hacking. We cannot change or add data to suit our tests after the fact; the exact sample size needs to be decided before performing the tests, by doing a Power Analysis.
Power Analysis tells us the right sample size needed to maximise the chance of correctly rejecting the null hypothesis, so we don't get fooled.
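As a minimal sketch, the sample size per group for a two-sided, two-sample test can be approximated with the standard normal-approximation formula. The effect size here (Cohen's d = 0.5) and the target power (0.8) are assumed values for illustration; in practice you estimate the effect size from pilot data or domain knowledge:

```python
# Sketch: approximate sample size per group for a two-sample test,
# using the normal approximation n = 2 * ((z_alpha/2 + z_beta) / d)^2.
from scipy import stats

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided two-sample test."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = stats.norm.ppf(power)           # ~0.84 for power = 0.80
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

n = sample_size_per_group(effect_size=0.5)
print(f"Need about {n:.0f} patients per group")
```

This sample size is fixed before the experiment begins; it is not revised once results start coming in.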
# Hack 3
One more mistake you shouldn’t make is changing alpha after you perform the experiment. Once you see a P-value of 0.058, you might think: what if my alpha were 0.06? But the significance level must be fixed before your experiment starts and cannot be changed afterwards.
Before you go
Hypothesis Testing and P-values are a tricky subject that needs to be carefully understood before drawing any conclusions. Statistical Power and Power Analysis are an important part of this and need to be considered before starting the tests.