Sections 3.4-3.5

February 11, 2021

Sections 3.4-3.5: Properties of Confidence Intervals

Goals for today

What factors affect the width of a confidence interval?
How can we think about the meaning of “95% Confidence”?
What are some pitfalls to watch out for when using statistical inference?

Holiday Spending Habits

A November 7-10, 2013, Gallup poll asked 1,039 U.S. adults how much they planned to personally spend on Christmas gifts this year. The report cited an average of $704.

observational units?
variable? Is it quantitative or categorical?
observed statistic? (use appropriate symbols)
parameter of interest (in words and symbols)?
sample size?

Recall: 2SD confidence interval for a mean

\[ \left( \bar{x} - 2\frac{s}{\sqrt{n}}, \bar{x} + 2\frac{s}{\sqrt{n}} \right) \]

Now work on 1-5 in your table groups.

The impact of standard deviation

The Gallup report did not provide $s$, the sample standard deviation.

What property of the sample does $s$ measure?
Suppose $s$ was equal to \$150. Use the 2SD method to record a 95% confidence interval for $\mu$.
Suppose $s$ was equal to \$300. Use the 2SD method to record the 95% confidence interval for $\mu$.

The impact of sample size

The poll involved 562 men and 477 women. Suppose the women reported planning to spend an average of $704 with standard deviation $150. Record a 2SD interval for the mean amount that women plan to spend.
Compare the intervals you recorded in #4, #5, and #6. How do sample standard deviation and sample size affect the width of a confidence interval? What else can affect the width of a confidence interval?
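The three intervals can be compared side by side with a short Python sketch of the 2SD formula. The function name `two_sd_interval` and the scenario labels are illustrative only, and the standard deviations are the hypothetical values posed in the questions, not figures from the Gallup report.

```python
import math

def two_sd_interval(xbar, s, n):
    """Return the 2SD interval (xbar - 2*s/sqrt(n), xbar + 2*s/sqrt(n))."""
    margin = 2 * s / math.sqrt(n)
    return (xbar - margin, xbar + margin)

# (label, sample mean, sample SD, sample size)
# The SD values are the hypothetical ones from the questions above.
scenarios = [
    ("all adults, s = 150", 704, 150, 1039),
    ("all adults, s = 300", 704, 300, 1039),
    ("women only, s = 150", 704, 150, 477),
]

for label, xbar, s, n in scenarios:
    low, high = two_sd_interval(xbar, s, n)
    print(f"{label}: ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

Doubling $s$ doubles the width, while cutting $n$ widens the interval by a factor of $\sqrt{1039/477}$, since $n$ enters only through $\sqrt{n}$.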

Analogy: Catching Fish

The fish: parameter of interest
$s$: the range over which the fish swims
Size of net: size of confidence interval
Confidence that you will catch it: confidence level

Exploration 3.4B

What proportion is orange?

Reese’s Pieces candies come in three colors: orange, yellow, and brown. Suppose that you take a random sample of candies and want to estimate the long-run proportion of candies that is orange. Let’s assume for now (although we would not know this when conducting the study) that this long-run proportion, symbolized by $\pi$, is equal to 0.50.

Sampling Variability

Suppose that you take a random sample of 100 Reese’s Pieces candies and find the sample proportion of orange. Is there any guarantee that the sample proportion will equal 0.50?
Suppose we calculate a confidence interval from this sample proportion. Is there any guarantee that the interval will contain the value 0.50?
Suppose that you select another random sample of 100 Reese’s Pieces candies. How would a new confidence interval compare?

Studies are random processes

A statistical study, done with random sampling, is a random process.
Two identical studies, done with exactly the same methodology, can/will have different samples.
- This is called sampling variability.
Two identical studies, done with exactly the same methodology, could come to different conclusions, one right, and one wrong.
- This is called sampling error. It doesn’t mean anybody made a mistake. It is just “bad luck.”
- We can control sampling error, but it can’t be eliminated.
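Sampling variability is easy to see in a small simulation. The sketch below repeats the "same study" ten times, each time drawing 100 candies from a process with $\pi = 0.5$. The helper `sample_proportion` and the seed are arbitrary choices made for illustration.

```python
import random

def sample_proportion(pi, n, rng):
    """Proportion of successes in one random sample of size n."""
    successes = sum(rng.random() < pi for _ in range(n))
    return successes / n

rng = random.Random(2021)  # arbitrary seed so the run is repeatable

# Ten "identical" studies: same process, same sample size, same method.
proportions = [sample_proportion(0.5, 100, rng) for _ in range(10)]
print(proportions)
```

The ten sample proportions differ from one another even though nothing about the methodology changed between studies.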

Simulating Confidence Intervals Applet

Use the Simulating Confidence Intervals applet to make a confidence interval for $\pi$. Leave the Method part as Proportions, Binomial, Wald, and use $\pi = 0.5$. Set $n=100$ and Intervals to 1. Click Sample to make one 95% confidence interval. A colored horizontal line segment should appear on one of the graphs. Click on this line to see the endpoints of the confidence interval, and record these endpoints.

Make more intervals

Now click Reset, and change Intervals to 200, and click Sample, to generate 200 random samples, along with their corresponding confidence intervals.

What is the applet doing?

On the “Sample Proportions” dotplot, click on one of the red dots. Doing this will reveal the value of the sample proportion and the corresponding 95% confidence interval. Also click on one of the green dots, and observe the corresponding interval. Operator: Use this form to answer the following questions:

What is the difference between the red confidence intervals and the green confidence intervals?
What does the location of the dot on the “sample proportions” dot plot have to do with the corresponding confidence interval?

The Main Point

Based on your observations, fill in the blanks:

Thus, 95% confidence means that if we repeatedly sampled from a process and used the sample statistic to construct a 95% confidence interval, in the long run, roughly _______% of all those intervals would manage to capture the actual value of the long-run proportion, and the remaining _______% would not.

Confidence Level

Before you change the confidence level to 90%, have your group predict what will happen. Widths? Percent green/red?
Now change Conf level to 90% and Recalculate. Record how the widths of the intervals change. Why does this make sense? Record how the running total changes. Why does this make sense?

Confidence Level

The confidence level indicates the long-run percentage of confidence intervals that would succeed in capturing the (unknown) value of the parameter if random samples were to be taken repeatedly from the population/process and a confidence interval produced from each sample.
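The long-run interpretation can be checked with a simulation that works in the same spirit as the applet. This is a minimal sketch, assuming the Wald formula $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$ for each interval; the function names and the seed are hypothetical, and the applet's internals may differ.

```python
import math
import random
from statistics import NormalDist

def wald_interval(successes, n, level=0.95):
    """Wald confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

def coverage(pi, n, level, num_intervals, rng):
    """Fraction of simulated intervals that capture the true value pi."""
    hits = 0
    for _ in range(num_intervals):
        successes = sum(rng.random() < pi for _ in range(n))
        low, high = wald_interval(successes, n, level)
        if low <= pi <= high:
            hits += 1
    return hits / num_intervals

rng = random.Random(34)  # arbitrary seed so the run is repeatable
estimated = coverage(pi=0.5, n=100, level=0.95, num_intervals=5000, rng=rng)
print(f"Fraction of intervals capturing pi: {estimated:.3f}")
```

The fraction should land near 0.95 but not exactly on it, partly because of simulation noise and partly because the Wald method is itself an approximation.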

Sample Size

Before you change the sample size to 400, have your group predict what will happen. Widths? Percent green/red?
Change the sample size to 400 and press Sample. Record how the widths of the intervals change. Why does this make sense? Record how the running total changes. Why does this make sense?
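The effect of confidence level and sample size on width can also be read off the formula directly. The sketch below assumes the Wald interval and evaluates its width at $\hat{p} = 0.5$; `wald_width` is an illustrative helper, not part of the applet.

```python
import math
from statistics import NormalDist

def wald_width(p_hat, n, level):
    """Width of the Wald interval: 2 * z* * sqrt(p_hat * (1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

for level in (0.90, 0.95):
    for n in (100, 400):
        width = wald_width(0.5, n, level)
        print(f"level = {level:.2f}, n = {n}: width = {width:.3f}")
```

Lowering the confidence level shrinks $z^*$ and so narrows the interval, and quadrupling the sample size halves the width. Only the confidence level changes the long-run percentage of intervals that capture $\pi$.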

Concept Review

Suppose that you calculate a 99% confidence interval for the long-run proportion of orange to be (0.461, 0.589). Decide whether each statement is valid or invalid.
- There is a 99% chance that the long-run proportion of Reese’s Pieces candy that is orange is between 0.461 and 0.589.
- We are 99% confident that the long-run proportion of Reese’s Pieces candy that is orange is between 0.461 and 0.589.
- If we were to repeat the process of taking random samples and making 99% CIs, then in the long run, 99% of all those CIs would contain the long-run proportion $\pi$.

Section 3.5: Cautions!

The Bradley Effect?

1982: Tom Bradley lost to George Deukmejian, despite being ahead in the polls.

Did people lie to pollsters?

2016: Trump v. Clinton?

Bias

Sampling Bias refers to bias of the sampling method.
- Random sampling is an unbiased sampling method.
- Convenience sampling can often be a biased sampling method.
- Voluntary response (e.g., web polls) is almost always a biased method.
There are other types of bias (e.g., response bias).
- Suggestively worded questions
- Reluctance to express an unpopular opinion

Errors

Type I and II errors are “errors” due to sampling variability.
- Not mistakes.
- $\alpha$ = probability of Type I error
- Power = 1 - probability of Type II error
Nonrandom errors are mistakes.
- Errors due to response bias
- Errors due to experimenter interference
- Errors due to contamination of materials
- etc.

Did you vote?

Voting for President

In the 1998 General Social Survey, a random sample of 2613 eligible voters found that 1783 said they had voted in the 1996 US presidential election. Use the Theory-Based Inference applet to record a 99% confidence interval for the proportion of eligible US voters who voted in 1996.

According to the Federal Election Commission, the actual proportion of eligible US voters who voted in 1996 was 0.49. Did your confidence interval capture the true value of the parameter it was estimating? If not, which is the best explanation?

The sample was not representative of the population because we used a biased sampling method.
The sample was not representative of the population, because we were just unlucky.
The data is misleading because of response bias.
Something else?

Explainer: Write up your group’s choice, and explain why you made it.
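For groups who want to check the applet's output, the interval can be approximated by hand. This sketch assumes the theory-based interval is the usual Wald interval with $z^* \approx 2.576$ for 99% confidence; the applet may round or compute slightly differently.

```python
import math
from statistics import NormalDist

n = 2613        # eligible voters sampled
voted = 1783    # number who said they voted in 1996
p_hat = voted / n

# Assumption: a Wald interval, p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n).
z_star = NormalDist().inv_cdf(0.995)   # two-sided 99% confidence
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.4f}")
print(f"99% CI: ({low:.4f}, {high:.4f})")
print("Captures 0.49:", low <= 0.49 <= high)
```

The interval sits far above 0.49, so "just unlucky" would require a sample proportion many standard errors away from the true value.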

Cat Households

Cat Ownership

A survey of 47000 Americans found that 32.4% own a cat. Use the Theory-Based Inference applet to test $H_0: \pi = 0.33333$ versus the alternative $H_a:\pi < 0.33333$. Record one sentence interpreting the p-value in the context of this problem.
Record a theory-based 95% confidence interval for the proportion of Americans who own a cat. Why is the interval so narrow?
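Both calculations can be sketched in a few lines of Python. This assumes the theory-based test is a one-proportion z-test that uses the null value in the standard error, and that the interval is a Wald interval; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 47000
p_hat = 0.324
pi_0 = 0.33333   # null value, as written in the hypotheses

# One-sided z-test: the standard error uses the null value pi_0.
se_null = math.sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / se_null
p_value = NormalDist().cdf(z)          # lower-tail area, since H_a: pi < pi_0

# 95% Wald interval: the standard error uses the sample proportion.
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)
low, high = p_hat - z_star * se_hat, p_hat + z_star * se_hat

print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

With $n = 47000$ the standard error is only about 0.002, which is why the interval is so narrow and why a difference of less than one percentage point produces a very small p-value.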

“significant” vs “a lot”

The narrow confidence interval and small p-value can be interpreted in a couple of ways:

The sample data provide very strong (highly significant) evidence that less than one-third of all American households own a cat.
The sample data indicate that a lot less than one-third of all American households own a cat.

What is the difference between these two explanations? Explainer: Write up your group’s choice and why you made it.

Statistical Significance vs. Practical Importance

“Statistically significant” means “very unlikely to have occurred by chance, assuming $H_0$.”
“Practically important” means “the difference is large enough to matter in the real world.”

Compute a rejection region

Suppose you survey a random sample of 100 households to assess cat ownership. Use the Theory-Based Inference applet to determine (by trial and error) the largest count that would result in rejecting $H_0: \pi = 0.33333$ in favor of the alternative $H_a:\pi < 0.33333$, at a significance level of $\alpha = 0.05$. Record this value: anything less than or equal to it will lie in the rejection region of this test.
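The trial-and-error search can be mimicked in Python by computing a p-value for every possible count. This is a sketch under the assumption that the applet runs a one-proportion z-test without a continuity correction; `lower_tail_p_value` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def lower_tail_p_value(count, n, pi_0):
    """One-sided z-test p-value for H_a: pi < pi_0 (no continuity correction)."""
    p_hat = count / n
    se = math.sqrt(pi_0 * (1 - pi_0) / n)
    return NormalDist().cdf((p_hat - pi_0) / se)

n, pi_0, alpha = 100, 0.33333, 0.05

# Try every count and keep the largest one whose p-value is at most alpha.
critical_count = max(
    count for count in range(n + 1)
    if lower_tail_p_value(count, n, pi_0) <= alpha
)

print("Largest count that rejects H_0:", critical_count)
print(f"p-value at that count: {lower_tail_p_value(critical_count, n, pi_0):.4f}")
print(f"p-value one count higher: {lower_tail_p_value(critical_count + 1, n, pi_0):.4f}")
```

If the applet applies a continuity correction or an exact binomial calculation, its cutoff could differ from this one by a count.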

Simulate 1000 surveys

Now suppose that the actual percentage of American households that own a cat is 30%. Use the One Proportion Applet to simulate 1000 samples of size 100 (assuming that the population parameter is 0.3) and make a dot plot of the number of successes for these 1000 samples.
Regard these 1000 samples as 1000 different surveys, each of 100 households. In how many of these surveys would you have rejected the null hypothesis of the test in #15? (Use the rejection region.) Record the proportion of samples in which your survey count lies in the rejection region.

Cats and Power

If the null hypothesis is $H_0: \pi = 0.33333$, but the actual population parameter is 0.3, is the null true or false?
In how many of your 1000 simulated surveys did you fail to reject the null? If you fail to reject a false null, what type of error have you made? Record the power of this test.
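The 1000 simulated surveys and the resulting power estimate can be reproduced with a short script. The value `cutoff = 25` is an assumption: it is the cutoff a one-sided z-test without continuity correction gives at $\alpha = 0.05$, so substitute the value your group recorded if it differs. The seed is arbitrary.

```python
import math
import random

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 100
true_pi = 0.30   # the "actual" parameter assumed in this activity
cutoff = 25      # assumption: reject H_0 when the count is <= 25

rng = random.Random(35)  # arbitrary seed so the run is repeatable
counts = [sum(rng.random() < true_pi for _ in range(n)) for _ in range(1000)]
rejections = sum(count <= cutoff for count in counts)

simulated_power = rejections / 1000
exact_power = binomial_cdf(cutoff, n, true_pi)

print(f"Surveys rejecting H_0: {rejections} of 1000")
print(f"Simulated power: {simulated_power:.3f}")
print(f"Exact binomial power: {exact_power:.3f}")
print(f"Probability of a Type II error: {1 - exact_power:.3f}")
```

Every survey that fails to reject here commits a Type II error, because the null value 0.33333 is false when the true parameter is 0.3.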

More power?

What can be done to improve the power of the test in #6?
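One option to explore is a larger sample. The sketch below recomputes the power for several sample sizes, assuming the same one-sided z-test rejection rule and a true parameter of 0.3; the function names and the sample sizes other than 100 are illustrative choices.

```python
import math
from statistics import NormalDist

def critical_count(n, pi_0, alpha):
    """Largest count that a lower-tailed z-test rejects at level alpha."""
    se = math.sqrt(pi_0 * (1 - pi_0) / n)
    cutoff_proportion = pi_0 + NormalDist().inv_cdf(alpha) * se
    return math.floor(n * cutoff_proportion)

def binomial_pmf(i, n, p):
    """Binomial probability, computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log(1 - p))
    return math.exp(log_pmf)

def power(n, pi_0=0.33333, true_pi=0.30, alpha=0.05):
    """Probability the count lands in the rejection region when pi = true_pi."""
    k = critical_count(n, pi_0, alpha)
    return sum(binomial_pmf(i, n, true_pi) for i in range(k + 1))

for n in (100, 400, 1600):
    print(f"n = {n}: power is about {power(n):.3f}")
```

Passing a larger `alpha` to `power` also raises the power, at the cost of a higher probability of a Type I error.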