Listen to the following music clip.
Now write down how many seconds long you think that clip was.
Based on what you wrote down, answer the poll in the #ma-005 chat.
-poll "How long did the music play?" "Less than 10 seconds" "More than 10 seconds" "I wrote down 10 seconds exactly."
timedata <-
read.table("http://www.isi-stats.com/isi/data/chap3/TimeEstimate.txt",
header = TRUE)
timedata
Time
1 6
2 7
3 7
4 9
5 9
6 9
7 10
8 10
9 10
10 10
11 12
12 12
13 12
14 14
15 15
16 15
17 15
18 15
19 20
20 20
21 20
22 20
23 20
24 22
25 26
summary(timedata)
Time
Min. : 6.0
1st Qu.:10.0
Median :12.0
Mean :13.8
3rd Qu.:20.0
Max. :26.0
The median is the middle data value when the data are sorted in order from smallest to largest.
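As a quick check of this definition, here is a small R sketch using the 25 time estimates listed above, typed in by hand so the block runs on its own. With an odd number of values, the median is the value in position (n + 1)/2 of the sorted list. The names `time_est`, `sorted`, and `middle` are only illustrative.

```r
# The 25 time estimates from TimeEstimate.txt, typed in by hand
time_est <- c(6, 7, 7, 9, 9, 9, 10, 10, 10, 10, 12, 12, 12, 14,
              15, 15, 15, 15, 20, 20, 20, 20, 20, 22, 26)

sorted <- sort(time_est)        # smallest to largest
n <- length(sorted)             # 25 values, an odd number
middle <- sorted[(n + 1) / 2]   # the 13th value is the middle one

middle            # median found "by hand"
median(time_est)  # R's built-in median, for comparison
```

With an even number of values there is no single middle value, so the median is taken to be the average of the two middle values.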
Here are the sleep times of a small class of six statistics students: 6, 7.5, 5.5, 8, 6.5, 7.5. What is the median for this small class? (Compute this by hand.)
-poll "What is the median of the numbers 6, 7.5, 5.5, 8, 6.5, 7.5?" 5 5.5 6 6.5 6.75 7 7.5 8 "none of these"
We simulate a large (>6000) population of made-up time estimates that conform to our null hypothesis. Population properties:
Estimate
Min. : 1
1st Qu.: 5
Median : 9
Mean :10
3rd Qu.:15
Max. :25
Number of simulated time estimates = 6215
Find out where our observed mean of 13.8 sec is located.
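The idea can be sketched in R. This is only a rough illustration: the population used in class is not available here, so the code builds its own made-up, right-skewed stand-in population and shifts it to have mean 10 (an assumption, as are the seed and the names `population`, `sim_means`, and `obs_mean`). It then draws 1000 samples of size 25 and checks how often a simulated sample mean is at least as large as the mean of the 25 observed time estimates.

```r
set.seed(2021)  # arbitrary seed so the sketch is reproducible

# ASSUMPTION: a made-up, right-skewed stand-in population.
# It is NOT the population used in class; it is only shifted to have mean 10.
raw <- rgamma(6215, shape = 2.5, scale = 4)
population <- raw - mean(raw) + 10

# Draw 1000 samples of size 25 and record each sample mean
sim_means <- replicate(1000, mean(sample(population, size = 25)))

obs_mean <- 13.8  # mean of the 25 observed time estimates

mean(sim_means)              # center of the simulated null distribution
sd(sim_means)                # its spread
mean(sim_means >= obs_mean)  # proportion of simulated means at least as large as observed
```

A small proportion in the last line means the observed mean sits far out in the right tail of the simulated sample means.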
Question: How many hours do Westmont students sleep on a typical night? Let’s make the question more specific and ask about last night. Is the average less than the recommended eight hours? How can we estimate this average?
\[ H_0: \mu = 8 \\ H_a: \mu < 8 \]
Since we didn’t collect the data using random sampling, it is uncertain whether our results will generalize to the population of all Westmont students.
Use the Descriptive Statistics applet to examine a dotplot of the sample data. To do this, press Clear to delete the existing data in the applet and then copy and paste your class data (including the one-word variable name at the top) into the Sample data box and press Use Data.
A distribution is called skewed if it is not symmetric and, instead, the bulk of the observed values falls on one side of the distribution, with a longer “tail” on the other. Right-skewed distributions have their tail on the right, and left-skewed distributions have their tail on the left.
Describe the shape of the distribution of sleep times in the sample as symmetric, right skewed, left skewed, or something else.
One way to summarize the center of a distribution is with the mean. Check the box next to Actual in the Mean row and record the value of the average sleep time for your class. Record the mean, using the appropriate symbol. Also, record the sample size.
Use the applet to find the median sleep time for your class by checking the Actual box in the Median row. Record the median of the class data.
Do the mean and median for your class differ by much? What does this suggest about the skewness of the data?
What is the standard deviation of sleep times for students in your class? Use the applet to find this by checking the Actual box next to Std dev. Record this value.
Are there any sleep hours in your class that you would characterize as unusual? In particular, are there sleep times that are far away from the bulk of the data (outliers)?
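The same summaries can also be computed in R rather than in the applet. Here is a minimal sketch, using the 25 time estimates from earlier in these notes as stand-in data (an assumption: your class sleep data would replace `x`).

```r
# Stand-in data: the 25 time estimates (replace with your class data)
x <- c(6, 7, 7, 9, 9, 9, 10, 10, 10, 10, 12, 12, 12, 14,
       15, 15, 15, 15, 20, 20, 20, 20, 20, 22, 26)

n <- length(x)    # sample size
xbar <- mean(x)   # sample mean
med <- median(x)  # sample median
s <- sd(x)        # sample standard deviation

c(n = n, mean = xbar, median = med, sd = s)

# A mean noticeably larger than the median hints at right skew
xbar - med
```

For these stand-in data the mean (13.8) is larger than the median (12), which is consistent with a longer right tail. A simple dotplot of the same values can be drawn with `stripchart(x, method = "stack")`.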
Assume that the population of sleep hours follows a normal distribution with mean \(\mu = 8\) hours (as indicated under the null hypothesis) and standard deviation \(\sigma = 1.5\). Open the One Mean applet. Notice that the population distribution on the left looks to be normally distributed with mean very close to 8 hours and SD about 1.5 hours. Check the Show Sampling Options box. Keep Number of Samples set to one for now. Set the Sample Size to match the class data. Press Draw Samples and notice that a dotplot of the simulated sample appeared in the middle, and one dot appeared on the rightmost dotplot, corresponding to the mean of this simulated sample.
Change the Number of Samples to 1,000 and press Draw Samples a few times to make a null distribution of simulated sample means. Record the mean and standard deviation of the null distribution.
Record a p-value using the Count Samples settings. For this study, the observed statistic is the sample mean \(\bar{x}\) that you recorded in #2.
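The applet's simulation can be imitated in R. This is a rough sketch, not the applet's actual code. The sample size (58) and observed mean (7.504) are taken from the class sleep data summarized later in these notes; substitute your own values. The seed and the helper name `one_sample_mean` are arbitrary choices.

```r
set.seed(8)  # arbitrary seed

n <- 58            # class sample size (from df = 57 in the t-test output)
xbar_obs <- 7.504  # observed class mean

# Null population: normal with mean 8 and SD 1.5
one_sample_mean <- function() mean(rnorm(n, mean = 8, sd = 1.5))
sim_means <- replicate(1000, one_sample_mean())

mean(sim_means)  # should be close to 8
sd(sim_means)    # should be close to 1.5 / sqrt(58)

# One-sided p-value for H_a: mu < 8
p_value <- mean(sim_means <= xbar_obs)
p_value
```

Because the samples are random, the p-value changes a little from run to run, just as it does each time you press Draw Samples.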
Record the value of a standardized statistic by using the same formula we used in Section 1.3.
\[ \mbox{standardized statistic} = \frac{\mbox{statistic} - \mbox{null value}}{\mbox{SD of null distribution}} \]
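As a worked instance of this formula, here is a small sketch for the sleep study. It assumes the SD of the null distribution is close to its theoretical value \(\sigma/\sqrt{n}\) with \(\sigma = 1.5\) and \(n = 58\); in practice you would plug in the SD that you recorded from your own simulation.

```r
statistic <- 7.504         # observed sample mean (class sleep data)
null_value <- 8            # value of mu under H_0
sd_null <- 1.5 / sqrt(58)  # ASSUMPTION: theoretical stand-in for the simulated SD

standardized <- (statistic - null_value) / sd_null
standardized  # roughly -2.5
```

A standardized statistic of about -2.5 says the observed mean is about 2.5 standard deviations below the null value.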
Remember that the context of this problem concerns how much sleep Westmont students get.
timedata <-
read.table("http://www.isi-stats.com/isi/data/chap3/TimeEstimate.txt",
header = TRUE)
mean(timedata$Time)
[1] 13.8
sd(timedata$Time)
[1] 5.416026
Note: you will use these commands in Investigation #2.
Here, \(s\) is the sample standard deviation.
\[ t \approx \frac{13.8 - 10}{5.416/\sqrt{25}} \approx 3.5 \]
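The theory-based test for a single mean requires either a symmetric distribution for the quantitative variable, or at least 20 observations with a sample distribution that is not strongly skewed.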
Last time, we did a simulation to get a null distribution for this data.
Using the formula, we get simulated t-statistics:
To get a p-value from a t-statistic, you need some sort of calculator or software (or, gasp, a table in a book).
tstat <- (mean(timedata$Time) - 10)/(sd(timedata$Time)/sqrt(25))
pt(tstat, 24, lower.tail = FALSE)*2
[1] 0.001805775
t.test(timedata$Time, mu = 10)
One Sample t-test
data: timedata$Time
t = 3.5081, df = 24, p-value = 0.001806
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
11.56437 16.03563
sample estimates:
mean of x
13.8
Research question: Do Westmont students get less than the recommended 8 hours of sleep per night?
\[ H_0: \mu = 8 \\ H_a: \mu < 8 \]
Validity Conditions: The quantitative variable should have a symmetric distribution, or you should have at least 20 observations and a sample distribution that is not strongly skewed.
The distribution of sample means is normal when the population distribution is normal, and the central limit theorem implies that it is approximately normal when the sample size is large. Moreover,
\[ \begin{aligned} \mbox{MEAN}(\bar{x}) & = \mu \\ \mbox{SD}(\bar{x}) &= \sigma/\sqrt{n} \approx s/\sqrt{n} \end{aligned} \]
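These two facts can be checked by simulation. The sketch below is only an illustration: it assumes an exponential population with mean 8 (so \(\sigma\) is also 8), chosen because it is strongly right-skewed, and compares the mean and SD of 5000 simulated sample means with \(\mu\) and \(\sigma/\sqrt{n}\) for three sample sizes. The population, the seed, and the name `results` are all assumptions made for this example.

```r
set.seed(3)  # arbitrary seed

# ASSUMPTION: an exponential population with mean 8 (so sigma is also 8),
# chosen only because it is strongly right-skewed.
mu <- 8
sigma <- 8
sizes <- c(5, 25, 100)

# One row per sample size: simulated vs. theoretical center and spread
results <- t(sapply(sizes, function(n) {
  sim_means <- replicate(5000, mean(rexp(n, rate = 1 / mu)))
  c(n = n,
    mean_xbar = mean(sim_means),
    sd_xbar = sd(sim_means),
    theory_sd = sigma / sqrt(n))
}))

round(results, 2)
```

The `sd_xbar` and `theory_sd` columns should agree closely at every sample size. A histogram of the simulated means (for example with `hist()`) would show the shape becoming more symmetric as \(n\) grows.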
Have someone from your group put this calculation on the Jamboard.
Have someone post a screenshot of the overlaid t-distribution on the Jamboard.
Use the applet to count the number of simulated samples with a t-statistic less than the observed value of your t-statistic from #5 (less than, because our alternative hypothesis is \(\mu < 8\)). The proportion of such samples is the approximate p-value based on t-statistics. Record this p-value.
The theory-based p-value (one-sample t-test) is also provided in the output. Record the theory-based p-value. How well do the simulation-based and theory-based p-values match?
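The comparison between the two p-values can also be sketched in R. This is a rough imitation of the applet, not its actual code. It assumes the class values reported later in these notes (\(n = 58\), observed \(t = -3.0095\)) and a normal null population with mean 8 and SD 1.5; the seed and the helper name `one_t` are arbitrary.

```r
set.seed(5)  # arbitrary seed

n <- 58           # class sample size
t_obs <- -3.0095  # observed t-statistic from the t-test output on the class data

# For each simulated sample from the null population, compute its own t-statistic
one_t <- function() {
  x <- rnorm(n, mean = 8, sd = 1.5)
  (mean(x) - 8) / (sd(x) / sqrt(n))
}
sim_t <- replicate(10000, one_t())

p_sim <- mean(sim_t <= t_obs)      # simulation-based p-value for H_a: mu < 8
p_theory <- pt(t_obs, df = n - 1)  # theory-based p-value from the t-distribution

c(simulated = p_sim, theory = p_theory)
```

The two values should be close but not identical, since the simulated one depends on the random samples drawn.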
The theory-based test can also be done using the Theory-Based Inference applet. Select the One mean scenario, check the Test of significance box, and enter the appropriate values for \(n\), \(\bar{x}\), \(s\), and the hypotheses. Record your p-value and t-statistic. Compare these values to your answers to #9 and #5. Are they the same?
Keep the hypotheses and \(n\) and \(s\) the same, and experiment with different values of \(\bar{x}\). Record the largest value of \(\bar{x}\) that gives a p-value less than 0.05. (This is the largest class average that would still count as so-called “strong” evidence against \(H_0\).)
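Instead of trial and error in the applet, the cutoff can be worked out directly from the t-distribution. Here is a sketch, assuming the class data summarized later in these notes (\(n = 58\), \(\bar{x} = 7.50431\), \(t = -3.0095\)); the standard error is recovered from those printed values, and `se`, `t_cut`, and `xbar_cut` are illustrative names.

```r
n <- 58
df <- n - 1

# Standard error s / sqrt(n), recovered from the t-test output on the class data:
# t = (xbar - 8) / se, with xbar = 7.50431 and t = -3.0095
se <- (7.50431 - 8) / -3.0095

# For H_a: mu < 8, the p-value is below 0.05 when t < qt(0.05, df)
t_cut <- qt(0.05, df)
xbar_cut <- 8 + t_cut * se

xbar_cut                     # sample means below this give p < 0.05
pt((xbar_cut - 8) / se, df)  # p-value at the cutoff itself, equal to 0.05
```

The applet answer should land just below `xbar_cut`, up to the rounding of the values you enter.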
Read the data and print a summary:
sleepdata <-
read.table("http://math.westmont.edu/ma5/classSleep.txt",
header = TRUE)
summary(sleepdata)
SleepHours
Min. : 5.000
1st Qu.: 6.562
Median : 7.125
Mean : 7.504
3rd Qu.: 8.188
Max. :11.000
We set \(H_0: \mu = 8\), and the default alternative is two-sided:
t.test(sleepdata$SleepHours, mu = 8)
One Sample t-test
data: sleepdata$SleepHours
t = -3.0095, df = 57, p-value = 0.003892
alternative hypothesis: true mean is not equal to 8
95 percent confidence interval:
7.17449 7.83413
sample estimates:
mean of x
7.50431
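Since the research question is one-sided (\(H_a: \mu < 8\)), the two-sided p-value in the output can be converted to a one-sided one. Here is a minimal sketch that uses only the numbers printed in the output, so it runs without downloading the data; the variable names are illustrative.

```r
t_obs <- -3.0095  # t-statistic from the output above
df <- 57          # degrees of freedom from the output above

p_two_sided <- 2 * pt(-abs(t_obs), df)  # matches the default two-sided test
p_less <- pt(t_obs, df)                 # one-sided p-value for H_a: mu < 8

c(two_sided = p_two_sided, less = p_less)

# With the data loaded, the same one-sided test can be requested directly:
# t.test(sleepdata$SleepHours, mu = 8, alternative = "less")
```

Because the observed t-statistic is negative, the one-sided p-value is half of the two-sided one.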