drivers <- read.table("http://math.westmont.edu/ma5/carstatus.txt", header = TRUE, stringsAsFactors = TRUE)
plot(Behavior ~ Status, data = drivers)
How is the time a student spends on an exam related to the student’s score on the exam?
## time score
## 1 30 100
## 2 41 84
## 3 41 94
## 4 43 90
## 5 47 88
## 6 48 99
## 7 51 85
## 8 54 84
## 9 54 94
## 10 56 100
## 11 56 65
## 12 56 64
## 13 57 65
## 14 58 89
## 15 58 83
## 16 60 85
## 17 61 86
## 18 61 92
## 19 62 74
## 20 63 73
## 21 64 75
## 22 66 53
## 23 66 91
## 24 69 85
## 25 72 62
## 26 78 68
## 27 79 72
## 28 93 93
## 29 96 93
## 30 100 97
When we describe data in a scatterplot, we describe the direction, form, and strength of the association, and we note any unusual observations.
How would you describe the scatterplot of time and score?
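To make the question concrete, here is a minimal sketch of how the scatterplot could be drawn in R. The data frame name `exam` and the axis labels are assumptions for illustration; the values are transcribed from the listing above.

```r
# Exam data transcribed from the printed listing; the name `exam` is hypothetical.
exam <- data.frame(
  time  = c(30, 41, 41, 43, 47, 48, 51, 54, 54, 56, 56, 56, 57, 58, 58,
            60, 61, 61, 62, 63, 64, 66, 66, 69, 72, 78, 79, 93, 96, 100),
  score = c(100, 84, 94, 90, 88, 99, 85, 84, 94, 100, 65, 64, 65, 89, 83,
            85, 86, 92, 74, 73, 75, 53, 91, 85, 62, 68, 72, 93, 93, 97)
)

# Explanatory variable (time) on the x-axis, response (score) on the y-axis.
plot(score ~ time, data = exam,
     xlab = "Time spent on exam", ylab = "Exam score",
     main = "Exam score versus time")
```

The formula interface `score ~ time` mirrors the `plot(Behavior ~ Status, ...)` call used earlier, with the response on the left of the tilde.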
\[ r= \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i -\bar{y}}{s_y} \right) \]
Range of \(\lvert r \rvert\) | Strength | Meaning |
---|---|---|
\(0.7 \leq \lvert r \rvert \leq 1\) | Strong | Points almost form a line. |
\(0.3 \leq \lvert r \rvert < 0.7\) | Moderate | Clear pattern, but bloblike. |
\(0.1 \leq \lvert r \rvert < 0.3\) | Weak | Slight pattern. |
\(0 \leq \lvert r \rvert < 0.1\) | None | No discernible trend. |
Note: \(-1\leq r \leq 1\) always.
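As a rough check of the formula, the sketch below computes \(r\) term by term from standardized values and compares it with R's built-in `cor()`. The data frame name `exam` is an assumption; the values are all 30 rows of the time and score listing.

```r
# Exam data transcribed from the printed listing; the name `exam` is hypothetical.
exam <- data.frame(
  time  = c(30, 41, 41, 43, 47, 48, 51, 54, 54, 56, 56, 56, 57, 58, 58,
            60, 61, 61, 62, 63, 64, 66, 66, 69, 72, 78, 79, 93, 96, 100),
  score = c(100, 84, 94, 90, 88, 99, 85, 84, 94, 100, 65, 64, 65, 89, 83,
            85, 86, 92, 74, 73, 75, 53, 91, 85, 62, 68, 72, 93, 93, 97)
)

n  <- nrow(exam)
# Standardize each variable: subtract the mean, divide by the sample SD.
zx <- (exam$time  - mean(exam$time))  / sd(exam$time)
zy <- (exam$score - mean(exam$score)) / sd(exam$score)

# Sum the products of the z-scores and divide by n - 1.
r_manual  <- sum(zx * zy) / (n - 1)
r_builtin <- cor(exam$time, exam$score)

# TRUE if the two agree up to floating-point tolerance.
all.equal(r_manual, r_builtin)
```

Because `sd()` already uses the \(n-1\) denominator, the hand computation and `cor()` should agree.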
Original data: \(r = -0.5636557\)
After adding 3 “unusual” points: \(r = -0.124997\)
## time score
## time 1.000000 -0.124997
## score -0.124997 1.000000
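The effect of the unusual points can be sketched by computing the correlation with and without them. This assumes that the three "unusual" points are the last three rows of the listing, (93, 93), (96, 93), and (100, 97), which is consistent with the two values of \(r\) reported above. The name `exam` is hypothetical.

```r
# Exam data transcribed from the printed listing; the name `exam` is hypothetical.
exam <- data.frame(
  time  = c(30, 41, 41, 43, 47, 48, 51, 54, 54, 56, 56, 56, 57, 58, 58,
            60, 61, 61, 62, 63, 64, 66, 66, 69, 72, 78, 79, 93, 96, 100),
  score = c(100, 84, 94, 90, 88, 99, 85, 84, 94, 100, 65, 64, 65, 89, 83,
            85, 86, 92, 74, 73, 75, 53, 91, 85, 62, 68, 72, 93, 93, 97)
)

# Assumption: rows 28-30 are the three added "unusual" points.
original <- exam[1:27, ]

r_original <- cor(original$time, original$score)  # without the unusual points
r_all      <- cor(exam$time, exam$score)          # with all 30 rows

c(original = r_original, with_unusual = r_all)

# Correlation matrix for the full data, as in the output above.
cor(exam)
```

Three points far from the rest of the cloud are enough to move the association from moderate to weak, which is why a scatterplot should be inspected before interpreting \(r\).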
A scatterplot is a graph showing a dot for each observational unit, where the location of the dot indicates the values of the observational unit for both the explanatory and response variables. Typically, the explanatory variable is placed on the x-axis and the response variable is placed on the y-axis.
Is the association between year and plate size positive or negative? Record a complete sentence explaining what this means in context.
Does the association between year and size appear to be linear or nonlinear?
In your opinion, would you say that the association between plate size and year appears to be strong, moderate, or weak?
Complete this form with your answers.
Two types of unusual observations:
\[ r= \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i -\bar{y}}{s_y} \right) \]
Range of \(\lvert r \rvert\) | Strength | Meaning |
---|---|---|
\(0.7 \leq \lvert r \rvert \leq 1\) | Strong | Points almost form a line. |
\(0.3 \leq \lvert r \rvert < 0.7\) | Moderate | Clear pattern, but bloblike. |
\(0.1 \leq \lvert r \rvert < 0.3\) | Weak | Slight pattern. |
\(0 \leq \lvert r \rvert < 0.1\) | None | No discernible trend. |
Note: \(-1\leq r \leq 1\) always.
Will the value of the correlation coefficient for the year-plate size data be negative or positive? Why?
Without using the applet, give an estimated range for the value of the correlation coefficient \(r\) between plate size and year based on the scatterplot.
Now, check the Correlation coefficient box in the applet to reveal the actual value of the correlation coefficient \(r\).
Join one of groups 1-6 based on who you are sitting near and work on the corresponding page of the Jamboard. Answer questions 10 and 11 with your group.
Is there a correlation between body temperature and heart rate?
If there were no association between heart rate and body temperature, what is the probability that we would get a correlation as high as 0.378 just by chance?
If there is no association, we can break apart the temperatures and their corresponding heart rates. We will do this by shuffling one of the variables. (The applet shuffles the response.)
Paste the data into the Correlation/Regression applet.
tmp HR
98.30 72
98.20 69
98.70 72
98.50 71
97.00 80
98.80 81
98.50 68
98.70 82
99.30 68
97.80 65
98.20 71
99.90 79
98.60 86
98.60 82
97.80 58
98.40 84
98.37 73
97.40 57
96.70 62
98.00 89
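The shuffling procedure described above can also be sketched directly in R instead of in the applet. This is a minimal illustration, not the applet's implementation: the data frame name `temps`, the seed, and the choice of 1000 shuffles are assumptions. The observed correlation is computed from the rows listed above rather than hard-coded, so it may not match the rounded value quoted in the text exactly.

```r
# Body temperature and heart rate, copied from the listing above.
temps <- read.table(header = TRUE, text = "
tmp HR
98.30 72
98.20 69
98.70 72
98.50 71
97.00 80
98.80 81
98.50 68
98.70 82
99.30 68
97.80 65
98.20 71
99.90 79
98.60 86
98.60 82
97.80 58
98.40 84
98.37 73
97.40 57
96.70 62
98.00 89
")

set.seed(1)  # arbitrary seed, used only so the sketch is reproducible

r_obs <- cor(temps$tmp, temps$HR)

# Shuffle the response (heart rate) to break any association with
# temperature, and record the correlation for each shuffle.
null_r <- replicate(1000, cor(temps$tmp, sample(temps$HR)))

# One-sided p-value: proportion of shuffles at least as large as observed.
p_value <- mean(null_r >= r_obs)

hist(null_r, main = "Null distribution of r", xlab = "Shuffled correlation")
abline(v = r_obs, lty = 2)  # dashed line marks the observed correlation
```

The null distribution should be centered near zero, since shuffling removes any real link between the two variables.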
Let’s look at a different (and larger) data set comparing temperature and heart rate (Example 10.5A) using the Correlation/Regression applet.
During the Vietnam War, young men in the US were drafted into the Army in an order determined by a random lottery.
## [1] -0.2260414
Could this correlation coefficient just be a product of randomness?
Was the draft order truly random? A “fair” draft should have a correlation coefficient of zero.
Each of the 366 birthdays in a year (including February 29) was assigned a draft number.
Work together with your proximity group to answer questions 1 and 4 on the Jamboard.
Generate a null distribution with at least 1000 simulated statistics. Where is this distribution centered? Why does this make sense?
Use the null distribution to obtain a p-value for this test. Paste a screenshot of the null distribution into the Jamboard, and record the p-value, along with a sentence explaining what it means in the context of this study.
The irregularity can be attributed to improper mixing of the balls used in the lottery drawing process. (Balls with birthdays early in the year were placed in the bin first, and balls with birthdays late in the year were placed in the bin last. Without thorough mixing, balls with birthdays late in the year settled near the top of the bin and so tended to be selected earlier.)
The following year, in 1971, the mixing process was improved. The correlation coefficient turned out to be \(r = 0.014\).
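As a rough illustration of why the two results are judged so differently, a fair lottery can be simulated in R. This sketch assumes the correlation is taken between day of the year (1 to 366) and draft number, and that a fair lottery is a uniformly random ordering of the 366 draft numbers. The variable names and the seed are hypothetical, and the original lottery data are not used here, only the two reported values of \(r\).

```r
set.seed(1970)  # arbitrary seed, used only so the sketch is reproducible

day <- 1:366              # sequential birthday, including February 29

# Correlations reported in the text for the two lotteries.
r_first  <- -0.2260414
r_second <- 0.014

# Under a fair lottery, draft numbers are a random permutation of 1..366.
null_r <- replicate(1000, cor(day, sample(366)))

# Two-sided p-values: how often is a simulated fair lottery at least
# as far from zero as each reported correlation?
p_first  <- mean(abs(null_r) >= abs(r_first))
p_second <- mean(abs(null_r) >= abs(r_second))

c(center = mean(null_r), p_first = p_first, p_second = p_second)
```

Under these assumptions, correlations from a fair lottery cluster tightly around zero, so a value near \(-0.226\) is very rarely produced by chance, while a value near \(0.014\) is entirely typical.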
Enter your explanations into this form.