The parameter of interest when we deal with categorical data is the proportion, which we estimate from the data with the sample proportion.
An example of a sample proportion could be the proportion of people who use cannabis. Imagine that we want to compare the proportion of males who are cannabis users to the proportion of females who are cannabis users. Our two groups here are independent. To compare them, we usually use a Z test.
In order to start the analysis, we calculate 2 values:
1. The difference between the proportions of the 2 groups, and
2. The standard error
The difference between the two sample proportions is easily calculated with the following formula:
p1 – p2
The formula for the standard error is more complicated, as it involves calculating a pooled estimate of the proportion. You can use the following formula:
Pooled = (x1 + x2)/(n1 + n2)
x1 stands for the number of successes in group 1, x2 for the number of successes in group 2, n1 for the sample size of group 1 and n2 for the sample size of group 2. Note that x1 and x2 are counts, while the sample proportions are p1 = x1/n1 and p2 = x2/n2.
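As a rough sketch, the pooled estimate could be computed in Python as follows. The function name pooled_proportion, the names x1 and x2 for the success counts, and the counts themselves are hypothetical, chosen only for illustration.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Pooled estimate: total successes divided by total sample size."""
    return (x1 + x2) / (n1 + n2)

# Hypothetical counts, made up for illustration only (not real survey data).
x1, n1 = 60, 200   # successes and sample size in group 1
x2, n2 = 40, 200   # successes and sample size in group 2

p1 = x1 / n1       # sample proportion in group 1
p2 = x2 / n2       # sample proportion in group 2
pooled = pooled_proportion(x1, n1, x2, n2)
```

Note that the pooled estimate is built from the counts, not from the two proportions, so groups of different sizes are weighted correctly.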
Once we have the pooled estimate, we can calculate the standard error. The formula for the standard error is the following:
se = square root(pooled × (1 − pooled) × (1/n1 + 1/n2))
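The standard error formula translates almost directly into code. This is a minimal sketch; the function name standard_error and the input values are assumptions made up for illustration.

```python
import math

def standard_error(pooled, n1, n2):
    """Standard error of the difference, based on the pooled estimate."""
    return math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Hypothetical inputs: a pooled estimate of 0.25 and 200 people per group.
se = standard_error(0.25, 200, 200)
```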
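Knowing the standard error and the difference between the proportions, we are now able to calculate the Z value.
The formula is as follows:
Z = (p1 – p2 – null value)/se
The null value is the difference between the proportions that the null hypothesis assumes. When we test whether the two proportions are equal, it is 0.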
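A small sketch of the Z calculation is shown here. The function name z_statistic and the numbers are hypothetical; the null value defaults to 0, which assumes we are testing whether the two proportions are equal.

```python
import math

def z_statistic(p1, p2, se, null_value=0.0):
    """Z value: observed difference minus the null value, in units of se."""
    return (p1 - p2 - null_value) / se

# Hypothetical inputs: proportions of 0.30 and 0.20 with 200 people per
# group, so the pooled estimate is 0.25.
se = math.sqrt(0.25 * (1 - 0.25) * (1 / 200 + 1 / 200))
z = z_statistic(0.30, 0.20, se)
```

With these made-up numbers the Z value comes out at roughly 2.31.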
The last step in the process is to calculate the p value associated with the Z value and to compare it to the significance level.
In order to check whether this p value is small enough to reject our null hypothesis, we first have to know two things:
1. What is the significance level against which we are testing?
2. Are we doing a one-sided or two-sided hypothesis test?
In a two-sided hypothesis test with a significance level of 0.05, half of your alpha is allotted to testing statistical significance in one direction and half to testing it in the other direction. This means that 0.025 is in each tail of the distribution of your test statistic. If you sum both tails, you end up with a total significance level of 0.05. In a one-sided test, the whole 0.05 sits in a single tail.
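To make the split of alpha concrete, here is a sketch that converts a significance level into the matching cut-off on the Z scale, using NormalDist from Python's standard library. The function name critical_z is an assumption for illustration.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, two_sided=True):
    """Z cut-off that leaves the allotted share of alpha in the upper tail."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

two_sided_cutoff = critical_z(0.05, two_sided=True)    # about 1.96
one_sided_cutoff = critical_z(0.05, two_sided=False)   # about 1.645
```

The two-sided cut-off is larger because only 0.025 is left in each tail.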
Once you have your Z value, you can get the p value from the standard normal distribution, for example with a statistical table or a calculator. The one-tailed p value is the area in the tail beyond your Z value. For two-tailed testing, multiply it by 2.
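If you prefer to compute the p value yourself, the following sketch uses NormalDist from Python's standard library. The function name p_value_from_z is hypothetical, and the one-tailed value here is taken as the area beyond the absolute Z value, which assumes the observed difference lies in the direction of the alternative hypothesis.

```python
from statistics import NormalDist

def p_value_from_z(z, two_sided=True):
    """p value for a Z value under the standard normal distribution."""
    # Area in the tail beyond |z| (the one-tailed p value).
    one_tailed = 1 - NormalDist().cdf(abs(z))
    # For a two-tailed test the one-tailed value is doubled.
    return 2 * one_tailed if two_sided else one_tailed

# Hypothetical Z value, for illustration only.
z = 2.31
p_two_tailed = p_value_from_z(z)
p_one_tailed = p_value_from_z(z, two_sided=False)

# Compare against the significance level to decide about the null hypothesis.
reject_null = p_two_tailed < 0.05
```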
For an example see our next post.