\[ H_0: \pi_1 - \pi_2 = 0 \\ H_a: \pi_1 - \pi_2 > 0 \]
| | Seed observed | Seed not observed | Total |
|---|---|---|---|
| Subject yawned | 11 | 3 | 14 |
| Did not yawn | 23 | 13 | 36 |
| Total | 34 | 16 | 50 |
\[ \begin{align} \hat{p}_1 &= 11/34 \approx 0.32 \\ \hat{p}_2 &= 3/16 \approx 0.19 \\ \hat{p}_1-\hat{p}_2 &= 11/34-3/16 \approx 0.136 = \mbox{test statistic} \\ \end{align} \]
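The arithmetic above, plus one way to judge how unusual the observed difference is, can be sketched in R. The shuffling step is an assumption about how a simulation-based test would proceed here (re-dealing the 14 yawners at random into groups of 34 and 16); the variable names, seed, and number of repetitions are illustrative and not from the original notes.

```r
# Counts from the yawning table: group 1 = seed observed, group 2 = no seed
yawned <- c(seed = 11, no_seed = 3)
n      <- c(seed = 34, no_seed = 16)

p_hat    <- yawned / n
obs_diff <- unname(p_hat["seed"] - p_hat["no_seed"])   # observed test statistic
round(p_hat, 2)
round(obs_diff, 3)

# Simulate the null hypothesis (pi_1 - pi_2 = 0) by shuffling outcomes
set.seed(1)                                  # arbitrary seed, for reproducibility
outcome <- rep(c(1, 0), times = c(14, 36))   # 14 yawners, 36 non-yawners
null_diffs <- replicate(5000, {
  shuffled <- sample(outcome)
  mean(shuffled[1:34]) - mean(shuffled[35:50])
})

# One-sided p-value for H_a: pi_1 - pi_2 > 0 (small tolerance for rounding)
p_value <- mean(null_diffs >= obs_diff - 1e-12)
p_value
```

The p-value is the proportion of shuffles that produce a difference at least as large as the observed one, matching the direction of \(H_a\).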
How does parents’ behavior affect the sex of their children? Fukuda et al., 2002 (Japan) found the following:
Other studies have shown a reduced male to female birth ratio where high concentrations of other environmental chemicals are present (e.g. industrial pollution, pesticides).
##      Nonsmokers Smokers  Sum
## Boy        1975     255 2230
## Girl       1627     310 1937
## Sum        3602     565 4167

##      Nonsmokers   Smokers
## Boy   0.5483065 0.4513274
## Girl  0.4516935 0.5486726

##      Nonsmokers Smokers
## Boy        1975     255
## Girl       1627     310
\[ \mbox{test statistic} = \hat{p}_1 - \hat{p}_2 \approx 0.5483 - 0.4513 \approx 0.097 \]
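Tables like the ones above can be rebuilt in R from the four cell counts. The following is a minimal sketch, not necessarily the code that produced the original output; the object names `births` and `col_props` are hypothetical.

```r
# Two-way table of counts: rows = sex of child, columns = parents' smoking status
births <- matrix(c(1975, 1627, 255, 310), nrow = 2,
                 dimnames = list(c("Boy", "Girl"),
                                 c("Nonsmokers", "Smokers")))

addmargins(births)                            # counts with row and column totals

# Conditional proportions within each column (each column sums to 1)
col_props <- prop.table(births, margin = 2)
col_props

# Test statistic: difference in the proportion of boys between the two groups
test_stat <- col_props["Boy", "Nonsmokers"] - col_props["Boy", "Smokers"]
round(test_stat, 3)
```

Using `margin = 2` conditions on the explanatory variable (smoking status), which is what makes the two "Boy" proportions directly comparable.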
\[ H_0: \pi_1 - \pi_2 = 0 \\ H_a: \pi_1 - \pi_2 \neq 0 \]
Let’s enter the data into the Two Proportion Applet.
\[ \mbox{standard error} \approx \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2} \right)} \]
where \(n_1\) and \(n_2\) are the sizes of the two groups, and \(\hat{p}\) is the pooled proportion of “successes” in both groups.
When the validity conditions are met, the standardized statistic is
\[ \begin{align} z &= \frac{\mbox{observed statistic} - \mbox{null value}}{\mbox{standard error of statistic}} \\ &\approx \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2} \right)}} \end{align} \]
For our data:
\[ \begin{align} z &= \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2} \right)}} \\ &\approx \frac{(0.548 - 0.451) - 0}{\sqrt{0.535(1-0.535)\left(\frac{1}{3602} + \frac{1}{565} \right)}} \\ &\approx 4.30 \end{align} \]
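The same calculation can be checked in R. This is a sketch under the notes' own formula (pooled standard error, no continuity correction); the variable names are illustrative.

```r
# Boys out of total births: group 1 = nonsmokers, group 2 = smokers
x <- c(1975, 255)
n <- c(3602, 565)

p_hat  <- x / n              # sample proportions
p_pool <- sum(x) / sum(n)    # pooled proportion of boys across both groups

# Standard error under H_0: pi_1 = pi_2
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

z <- (p_hat[1] - p_hat[2] - 0) / se_pool
p_value <- 2 * pnorm(-abs(z))   # two-sided, matching H_a: pi_1 - pi_2 != 0

round(z, 2)
p_value
```

Because the alternative is two-sided, the p-value doubles the tail area beyond \(|z|\).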
Luckily, there’s an applet: Theory-Based Inference
##      Nonsmokers Smokers  Sum
## Boy        1975     255 2230
## Girl       1627     310 1937
## Sum        3602     565 4167
When computing a confidence interval, there is no \(H_0\), so we can’t assume that \(\pi_1 = \pi_2\). So the standard error is:
\[ \mbox{standard error} \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
So a confidence interval for \(\pi_1 - \pi_2\) has the form
\[ (\hat{p}_1-\hat{p}_2) \pm \mbox{multiplier} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
where the multiplier depends on the confidence level (e.g., 1.96 for 95% confidence).
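As an illustration, the interval formula can be applied to the smoking data in R. This sketch uses the unpooled standard error exactly as written above; the variable names are hypothetical.

```r
# Boys out of total births: group 1 = nonsmokers, group 2 = smokers
x <- c(1975, 255)
n <- c(3602, 565)
p_hat <- x / n

# Unpooled standard error: no assumption that pi_1 = pi_2
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))

multiplier <- qnorm(0.975)   # about 1.96 for 95% confidence
ci <- (p_hat[1] - p_hat[2]) + c(-1, 1) * multiplier * se_unpooled
round(ci, 3)
```

The `prop.test` output shown in these notes reports a slightly wider interval because it also applies a continuity correction.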
For example, use the Theory-Based Inference applet.
##
##  2-sample test for equality of proportions with continuity correction
##
## data:  c(1975, 255) out of c(3602, 565)
## X-squared = 18.077, df = 1, p-value = 2.122e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.05182158 0.14213654
## sample estimates:
##    prop 1    prop 2
## 0.5483065 0.4513274
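Output of this form is what R's built-in `prop.test` prints. The original call is not shown in these notes, so the following is a presumed reconstruction based on the `data:` line; the object names are hypothetical.

```r
# Successes (boys) and group sizes: nonsmokers first, then smokers
res <- prop.test(x = c(1975, 255), n = c(3602, 565))
res

# prop.test applies a continuity correction by default. Turning it off
# gives a chi-squared statistic equal to the square of the z statistic
# computed from the pooled standard error.
res_nc <- prop.test(x = c(1975, 255), n = c(3602, 565), correct = FALSE)
sqrt(unname(res_nc$statistic))
```

The confidence interval is available as `res$conf.int` and the p-value as `res$p.value`, which is convenient when the numbers are needed for further calculation.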
Are Americans any more or less generous about donating blood in some years than others? Are women any more or less willing to give blood than men? To investigate these questions we can analyze data from the General Social Survey, which is a national survey conducted every two years on a nationwide random sample of adult Americans.
We are going to look at data from the years 2002 and 2004.
##                Y2002 Y2004  Sum
## Donated blood    210   230  440
## Did not donate  1152  1106 2258
## Sum             1362  1336 2698
3. Use the Two Proportion Applet to record the test statistic \(\hat{p}_{2002} - \hat{p}_{2004}\), and record a p-value for the test in #2. Also record the mean and SD of the null distribution.
4. Record a 95% 2SD confidence interval for \(\pi_{2002} - \pi_{2004}\). Is zero a plausible value? What does this fact mean in the context of blood donation?
5. Explain why the validity conditions for the theory-based test are met.
6. Check the box for Overlay normal distribution. Is the normal distribution a good approximation for the null distribution? Record a theory-based p-value, and compare it to #3.
7. Now use the Two Proportion scenario in the Theory-Based Inference applet to record a 95% confidence interval. How close is your interval to the 2SD interval in #4?
8. Find the theory-based p-value, and compare it to what you got in #6 (it should be the same).
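For readers without access to the applets, the simulation-based quantities in the questions above can be approximated in R. This is only a sketch: it assumes the applet builds its null distribution by shuffling the response among the two years, and the seed, number of repetitions, and variable names are arbitrary choices, so results will differ slightly from the applet's.

```r
# Donors and sample sizes from the 2002 vs. 2004 table
donated <- c(Y2002 = 210, Y2004 = 230)
n       <- c(Y2002 = 1362, Y2004 = 1336)
obs_diff <- donated[["Y2002"]] / n[["Y2002"]] - donated[["Y2004"]] / n[["Y2004"]]

# Shuffle the pooled outcomes to simulate "year makes no difference"
set.seed(2004)   # arbitrary seed, for reproducibility
outcome <- rep(c(1, 0), times = c(sum(donated), sum(n) - sum(donated)))
null_diffs <- replicate(2000, {
  shuffled <- sample(outcome)
  mean(shuffled[1:n[["Y2002"]]]) - mean(shuffled[(n[["Y2002"]] + 1):sum(n)])
})

null_mean <- mean(null_diffs)   # center of the null distribution
null_sd   <- sd(null_diffs)     # SD of the null distribution

# Two-sided p-value and 2SD interval for pi_2002 - pi_2004
p_value <- mean(abs(null_diffs) >= abs(obs_diff) - 1e-12)
ci_2sd  <- obs_diff + c(-2, 2) * null_sd

c(statistic = obs_diff, null_mean = null_mean, null_sd = null_sd, p_value = p_value)
ci_2sd
```

The 2SD interval here reuses the SD of the simulated null distribution as the standard error, which is the usual shortcut behind a "2SD" interval.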
Consider the combined data for 2002 and 2004, classified by sex:
##                Male Female  Sum
## Donated blood   239    201  440
## Did not donate 1032   1226 2258
## Sum            1271   1427 2698