AP Statistics Standards
I. Exploring Data: (continued)
D. Exploring bivariate data
- Analyzing patterns in scatterplots
- Correlation and linearity
- Least-squares regression line
- Residual plots, outliers, and influential points
|
|
Objectives |
| Essential Question:
How can we establish and quantify
a cause and effect relationship between two variables? |
Chapter 3: Two-Variable Relationships
- Identify the response and explanatory variables
from a plot.
- Identify positive and negative associations
from scatter plots.
Note: an association does not establish
cause and effect
- Detect linear and non-linear relationships
using scatter plots.
- Judge the relative strength of a relationship
by the amount of scatter around the curve of best fit.
- Identify outliers on scatter plots.
- Within the expected range of X-values
- Outside the expected range of Y-values for
a given X-value
- Identify "influential outliers"
on scatter plots.
- Outside the expected range of X-values
- Often outside the expected range of Y-values
- Make scatter plots using the TI-83 calculator and Excel.
- State why any analysis of 2 variable (bivariate) data
should always begin with a scatter plot regardless of which tools are used to
further analyze the data.
Homefun: 3.11, 3.13
|
|
|
Activities |
- Lesson 1
- Key Concept: How to represent 2 variable data.
- Purpose: Lay the foundation for correlation and
regression.
Interactive Discussion:
Objectives--How to interpret scatterplots. How
to identify outliers.
|
|
|
| Essential Question:
Why is it important to quantify
correlation instead of just estimating it by looking at a graph? |
Correlation
- State the meaning of correlation and how it is typically indicated.
- Strength
- Direction
- Linear Relationship
- r
- Be as one with the 5 facts about correlation on p. 132.
- Calculate r using the formula.
Homefun: prob. 3.19, 3.23
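For checking hand calculations of r, here is a minimal Python sketch. It assumes "the formula" is the usual one: the sum of the products of standardized x- and y-values, divided by n - 1. The function names and the five data points are made up for illustration and do not come from the textbook.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up illustrative data, not from the textbook.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Because r is built from standardized values, multiplying every x by a positive constant (a change of units) leaves it unchanged, and r always lands between -1 and 1.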
|
- Lesson 2
- Key Concept: Correlation
- strength of the relationship
- Purpose: Develop the
ability to evaluate the strength of the relationship.
Interactive Discussion: The dog
barked and the tree fell down. Was there an association? Was there
causation?
2-person teams: Perform
correlations on SAT data using a TI-83. (This will carry over
into the regression section.)
|
| Essential Question:
Why would we need to find a
mathematical relationship between variables? Isn't correlation enough? |
Regression
- Explain the difference between correlation and regression.
- Perform regression/correlation analysis with the TI-83
calculator and Excel spreadsheets.
- What type of error does least squares regression minimize?
- Interpret regression equations.
- Calculate y-bar using a regression equation, given x-bar.
- Properly state the meaning of slope according to the
official statistics definition. (p142)
- Properly interpret the intercept. (Example: sales vs.
advertising dollars. What are the sales with no advertising?)
- Describe the region where a given regression equation
will give a meaningful association.
- Define and decry the use of extrapolation.
- Be aware that the point (x-bar, y-bar) always lies on the
least-squares regression line, at its center.
- Solve problems using the equation for b on p.144.
b = r (s_y / s_x)
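The slope formula b = r (s_y / s_x) and the fact that the line passes through (x-bar, y-bar) can be checked with a short Python sketch. The intercept rule a = y-bar - b * x-bar is assumed here as the standard companion formula. The function names and data are made up for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope from the correlation and the two standard deviations
    a = y_bar - b * x_bar      # forces the line through (x-bar, y-bar)
    return a, b, r

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data, not from the textbook.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares_line(xs, ys)

# Plugging x-bar into the equation gives back y-bar.
y_at_x_bar = a + b * mean(xs)

# Nudging the slope or intercept away from the fitted values makes the error larger.
best = sum_squared_errors(xs, ys, a, b)
worse_slope = sum_squared_errors(xs, ys, a, b + 0.1)
worse_intercept = sum_squared_errors(xs, ys, a + 0.1, b)
```

The last three lines speak to the objective about which error least squares minimizes: the sum of the squared vertical distances, so any other line through the same data gives a larger total.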
|
- Lesson 3
- Key Concept: Regression -
finding the mathematical relationship between two variables.
- Purpose: Obtain and
understand regression equations.
Demo: Using
Fathom software, demonstrate the
reasoning process behind least squares regression analysis.
Interactive Discussion:
On objectives
2-person teams: See above
|
| Essential Question:
Would changing the units of the
variables affect the R-square value? |
The Meaning of R-Square
- State the meaning of SSM and SSE. Use them to calculate
R-square.
- Give the official interpretation of r-squared.
- Use the proper magic words p.149
- Evaluates the entire equation
- Explain why care must be taken in using the official
interpretation of r-squared.
- Susceptible to outliers
- Data points furthest from the center of the line have
more influence
- There may be no causative relationship between explanatory
and response variables.
Homefun: prob. 3.33, 3.35
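A minimal Python sketch of the SSM/SSE calculation follows. It assumes SSM is the sum of squares for the model (predicted values about y-bar), SSE is the sum of squared residuals, and R-square = SSM / (SSM + SSE). Check these definitions against the textbook before relying on them. The function names, the data, and the inches-to-centimeters conversion are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def r_square_from_sums(xs, ys):
    """Return (SSM, SSE, R-square), with R-square = SSM / (SSM + SSE)."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [a + b * x for x in xs]
    ssm = sum((p - y_bar) ** 2 for p in predicted)            # variation captured by the line
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))    # variation left in the residuals
    return ssm, sse, ssm / (ssm + sse)

# Made-up illustrative data, not from the textbook.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ssm, sse, r2 = r_square_from_sums(xs, ys)

# Same data with x converted from hypothetical inches to centimeters.
xs_cm = [x * 2.54 for x in xs]
_, _, r2_cm = r_square_from_sums(xs_cm, ys)
```

Comparing `r2` with `r2_cm` is one way to explore the essential question about units: rescaling x changes the slope but not the predicted values, so SSM and SSE stay the same.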
|
- Lesson 4
- Key Concept: The use and
misuse of R-square
- Purpose: To understand
how R-square is often overused as a measure of regression
analysis "goodness".
Interactive Discussion: Objectives
2-person teams: See Stats
investigation below
|
|
Stats
Investigation: Meaning of
R-Square - time approx 2 class periods (individual work) |
|
Purpose: Determine if a regression analysis using random
numbers can yield an r-square value of 50% or more.
Instructions: Set up a
regression analysis in Excel using integer x-values from 0
to 9. Use a random number from 0 to 10 for the y-values. Run
this simulation 100 times. Calculate the average r-square
and record the highest r-squared value. Record the three
highest r-square values obtained in the class.
Save the data sets from your 4
regression/correlation results with the highest
R-square values. You will use them again at the end of the
year.
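The investigation is meant to be done in Excel, but the same simulation can be sketched in Python as a cross-check. This sketch assumes the y-values are continuous random numbers drawn uniformly from 0 to 10. The function names and the seed are made up for illustration, and the numbers it produces are for comparison with your own, not a substitute for them.

```python
import random

def r_squared(xs, ys):
    """R-square of the least-squares line of y on x (the square of r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy ** 2 / (s_xx * s_yy)

def run_investigation(runs=100, seed=None):
    """Regress random y-values on x = 0..9 `runs` times and collect each R-square."""
    rng = random.Random(seed)      # a seed makes the run repeatable
    xs = list(range(10))           # integer x-values 0 through 9
    results = []
    for _ in range(runs):
        ys = [rng.uniform(0, 10) for _ in xs]   # assumed: uniform random y between 0 and 10
        results.append(r_squared(xs, ys))
    return results

results = run_investigation(runs=100, seed=1)
average_r2 = sum(results) / len(results)
highest_r2 = max(results)
top_four = sorted(results, reverse=True)[:4]
print(f"average R-square: {average_r2:.3f}, highest: {highest_r2:.3f}")
```

Changing or removing the seed gives a fresh set of 100 runs, which is closer to what each student's spreadsheet will do.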
Questions /Conclusions:
- Based on your data, does a high
r-square value by itself indicate a meaningful association
or causation?
- Is the random number generator used in this
investigation truly random?
- Is it possible to get a high r-squared value merely
from random events?
- What does it really mean when
we say that r-square
represents the fraction of the variation in the values of
y that is "explained" by the least squares regression of y
on x? Discuss things like the SSM and SSE.
|
|
|
| Essential Question:
Can a regression equation with a
high R-square be inappropriate? |
Residuals
- Define y-hat.
- Calculate residuals using a TI-83 calculator.
- State 2 ways to plot residuals.
- State a major assumption behind all regression lines.
- Variability around the line is constant
- Interpret residual plot patterns.
- Random
- Smiley or Frowning Face (Mr. R's Terms)
- Pattern in the scatter
- Make residual plots using a TI-83.
- State the sum of the residuals.
Homefun: prob. 3.43, 3.45, 3.49
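As a check on calculator work, residuals (observed y minus y-hat) can be computed with a short Python sketch. The function names and both data sets are made up for illustration. The second data set is deliberately curved to show the kind of pattern a residual plot reveals.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def residuals(xs, ys):
    """Residual = observed y minus predicted y-hat, one per data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up data with roughly linear scatter.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res_linear = residuals(xs, ys)
total = sum(res_linear)   # least-squares residuals sum to zero, up to rounding

# Made-up curved data (y = x squared) fitted with a straight line anyway.
xs_curved = [0, 1, 2, 3, 4]
ys_curved = [x ** 2 for x in xs_curved]
res_curved = residuals(xs_curved, ys_curved)
# Positive at both ends, negative in the middle: a "smiley face" pattern.
```

Plotting `res_curved` against `xs_curved` would show the U-shape, even though a straight line fitted to that data still has a high R-square.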
|
- Lesson 5
- Key Concept: Residual
Plots.
- Purpose: Understand when
a given regression equation is appropriate.
Interactive Discussion:
Objectives.
Individual work:
Perform residual plots on TI-83 calculators and with Excel software.
|
|
Stats
Investigation: Determining if a
Regression Equation is Appropriate - time approx 1 class
period (individual work) |
|
Purpose: Determine if a linear regression equation is
appropriate for two different situations.
Background: Commercial
resistors follow Ohm's law, while light bulbs, due to their
high temperatures, do not. Ohm's law is as follows:
I = (1/R) V
Where: I = current, V = voltage, and R =
resistance.
Plotting I vs. V will theoretically
yield a straight line passing through the origin.
Instructions: Set up a least
squares linear regression analysis in Excel to find the
association between current (response variable) and voltage
(explanatory variable) for a commercial resistor and for a
light bulb. Remember that this means a scatter plot as well
as finding the slope, intercept, and R-square for the data.
Set up the formulas needed to plot a residual plot and make
such a plot for the two sets of data.
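Since Ohm's law gives I = (1/R) V, the slope of the regression of current on voltage estimates 1/R. The Python sketch below shows that relationship only. The readings are placeholders generated from an ideal, hypothetical 100-ohm resistor, not measured data, and the function name is made up. Replace the lists with your own measurements.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# PLACEHOLDER readings: an ideal, hypothetical 100-ohm resistor (not real measurements).
HYPOTHETICAL_R_OHMS = 100.0
voltages = [1.0, 2.0, 3.0, 4.0, 5.0]                   # explanatory variable, volts
currents = [v / HYPOTHETICAL_R_OHMS for v in voltages]  # response variable, amperes

intercept, slope = fit_line(voltages, currents)
resistance = 1.0 / slope   # slope estimates 1/R, so R is its reciprocal
print(f"slope = {slope:.4f} A/V, estimated R = {resistance:.1f} ohms")
```

With real measurements the intercept will not be exactly zero and the points will scatter around the line, so the residual plot is still needed to judge whether the straight-line model fits.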
Questions /Conclusions:
- Based on your data, does a high
r-square value by itself indicate a meaningful association
or causation?
- Find the resistance value in ohms for the commercial
resistor.
- Is a linear equation appropriate for the commercial
resistor? How about the light bulb? Explain your answers.
|
|
|