Mr. Rogers - AP Statistics Objectives

Chapter 3: Regression Analysis

AP Statistics Standards

I. Exploring Data:  (continued)

D. Exploring bivariate data

  1. Analyzing patterns in scatterplots

  2. Correlation and linearity

  3. Least-squares regression line

  4. Residual plots, outliers, and influential points

Objectives

Essential Question: How can we establish and quantify a cause and effect relationship between two variables?

Chap 3: 2 Variable (Bivariate) Relationships

  1. Identify the response and explanatory variables from a plot.
  • response: y-variable, dependent
  • explanatory: x-variable, independent
  2. Identify positive and negative associations from scatter plots.

Note: an association does not establish cause and effect

  3. Detect linear and non-linear relationships using scatter plots.

Note: ALWAYS make a scatter plot when analyzing bivariate data

  4. Judge the relative strength of a relationship by the amount of scatter around the curve of best fit.
  5. Identify outliers on scatter plots.
    • Within the expected range of X-values
    • Outside the expected range of Y-values for a given X-value
  6. Identify "influential outliers" on scatter plots.
    • Outside the expected range of X-values
    • Note: in this region the expected range of Y-values is undefined
  7. Make scatter plots using the TI-83 calculator and Excel.
  8. State why any analysis of 2 variable (bivariate) data should always begin with a scatter plot regardless of which tools are used to further analyze the data.
  • identifies outliers
  • reveals gaps and clusters in the data
  • displays patterns such as linearity or non-linearity

Note: in the ideal situation all the data points would have equal influence and be uniformly distributed.

Homefun (formative/summative assessment): 3.11, 3.13

Relevance: Many scientific constants and predictions are based on measurements of the slopes of lines.

Activities

Lesson 1
Key Concept: How to represent 2 variable data.
Purpose: Lay the foundation for correlation and regression.

Interactive Discussion: Objectives--How to interpret scatter plots. How to identify outliers.

Essential Question: Why is it important to quantify correlation instead of just estimating it by looking at a graph?

Correlation

  9. State the meaning of correlation and how it is typically indicated.
    • Strength
    • Direction
    • Linear Relationship
    • r
  10. Be as one with the 7 facts about correlation on p. 143-144.
  11. Calculate r using the formula (p. 140).

r = [1 / (n - 1)] Σ [(xi - xbar) / sx] [(yi - ybar) / sy]

Note: r-square is bullet-proof

Adding a constant to either the y-variable or the x-variable or both has no effect on r-square or slope.

Multiplying either the y-variable or the x-variable or both by a nonzero constant has no effect on r-square.
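
The formula for r and the "bullet-proof" note can be checked with a short Python sketch. The data values below are made up purely for illustration (they are not from the textbook), and the function names are hypothetical helpers, not TI-83 or Excel commands.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = [1 / (n - 1)] * sum of (standardized x) * (standardized y)
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
# Add constants to both variables.
r_shifted = correlation([x + 10 for x in xs], [y - 3 for y in ys])
# Multiply both variables by (different) nonzero constants.
r_scaled = correlation([2.5 * x for x in xs], [-4 * y for y in ys])

print("r-square, original:", round(r ** 2, 4))
print("r-square, shifted: ", round(r_shifted ** 2, 4))
print("r-square, scaled:  ", round(r_scaled ** 2, 4))
```

All three r-square values should agree. Note that multiplying only one variable by a negative constant flips the sign of r itself, which is one reason the note is stated in terms of r-square.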

 

Homefun (formative/summative assessment): prob. 3.29, 3.31

 

Lesson 2
Key Concept: Correlation - strength of the relationship
Purpose: Develop the ability to evaluate the strength of the relationship.

Interactive Discussion: The dog barked and the tree fell down. Was there an association? Was there causation?

Individuals: Perform correlations on SAT data (p. 71) using a TI-83. (This will carry over into the regression section.)

Essential Question: Why would we need to find a mathematical relationship between variables? Isn't correlation enough?

Regression

  12. Explain the difference between correlation and regression.
  13. Perform regression/correlation analysis with the TI-83 calculator and Excel spreadsheets.
  14. What type of error does least squares regression minimize?
  • Error measured in y-dimension (y = response variable)
  • x-dimension (explanatory variable) considered error-free
  15. Interpret regression equations.
    • Single     y-hat = a + bx   (a = intercept, b = slope)
    • Multiple  y-hat = a0 + a1x1 + a2x2 + ... + anxn
  16. Calculate y-bar using a regression equation, given x-bar.
  17. Properly state the meaning of slope according to the official statistics definition. (p. 155)

For every increase of one unit in the x-variable, the predicted y-variable changes by the slope, on average.

  18. Properly interpret the intercept. (Example: for sales vs. advertising dollars, what are the sales with no advertising?)
  19. Describe the region where a given regression equation will give a meaningful association.
  20. Define and decry the use of extrapolation.
  21. Be aware that the regression line always passes through the point (x-bar, y-bar), which lies at the center of the line.
  22. Solve problems using the following equations (b = slope, a = intercept):

       b =  r (sy/ sx)                 a = ybar - b(xbar)
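
As a rough sketch of how these two equations work together, the Python below computes b and a from r, sx, sy, x-bar, and y-bar, then checks that the line passes through (x-bar, y-bar). The data set is made up for illustration and `predict` is a hypothetical helper name.

```python
import math

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * (s_y / s_x)    # slope
a = y_bar - b * x_bar  # intercept

def predict(x):
    # Regression equation: y-hat = a + bx
    return a + b * x

print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"prediction at x-bar: {predict(x_bar):.3f}, y-bar: {y_bar:.3f}")
```

Plugging x-bar into the equation returns y-bar, which is the calculation asked for in the objective on finding y-bar from a regression equation.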

Effects of transforming the data on the regression line (b = original slope):

Action                        Effect on slope       Effect on intercept
Multiply by constant = k
  x-data points               multiplies by 1 / k   none
  y-data points               multiplies by k       multiplies by k
  both                        none                  multiplies by k
Add a constant = k
  x-data points               none                  adds -bk
  y-data points               none                  adds k
  both                        none                  adds (k - bk)
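
The rows of the table can be spot-checked numerically. The following is a minimal sketch with made-up data and an arbitrary k = 3; `fit_line` is a hypothetical helper that uses the standard least-squares formulas, not a calculator or Excel function.

```python
def fit_line(xs, ys):
    # Least-squares slope b and intercept a for y-hat = a + bx.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return b, a

# Made-up illustrative data and constant.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
k = 3

b, a = fit_line(xs, ys)

results = {
    "multiply x by k": fit_line([k * x for x in xs], ys),
    "multiply y by k": fit_line(xs, [k * y for y in ys]),
    "multiply both": fit_line([k * x for x in xs], [k * y for y in ys]),
    "add k to x": fit_line([x + k for x in xs], ys),
    "add k to y": fit_line(xs, [y + k for y in ys]),
    "add k to both": fit_line([x + k for x in xs], [y + k for y in ys]),
}

print(f"{'original':>16}: slope = {b:.4f}, intercept = {a:.4f}")
for action, (slope, intercept) in results.items():
    print(f"{action:>16}: slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Comparing each printed row with the original slope and intercept should reproduce the table, for example the intercept changing by -bk when k is added to the x-data.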

 

Homefun (formative/summative assessment): prob. 3.33, 3.35

Relevance: Regression and correlation are the mathematical tools on which much of the social sciences, as well as many business tools, are founded.

 

Lesson 3
Key Concept: Regression - finding the mathematical relationship between two variables.

 

Purpose: Obtain and understand regression equations.

Demo: Using Fathom software, demonstrate the reasoning process behind least squares regression analysis.

Interactive Discussion: On objectives

2-person teams: See above

 
Essential Question: Would changing the units of the variables affect the R-square value?

The Meaning of R-Square

  23. State the meaning of SST and SSE. Use them to calculate R-square.
  • SST = ∑ (yi - y-bar)^2    is a measure of the scatter or variability of the y-data points about the y-data's mean.
  • SSE = ∑ (yi - y-hat)^2    is a measure of the scatter or variability of the y-data points about the regression line.
  • (SST - SSE) is a measure of the amount of variability in the y-data points explained by the regression line.
  • r^2 = (SST - SSE) / SST is a measure of the fraction of the variability in the y-data points explained by the regression line.
  24. Give the official interpretation of r-square (coefficient of determination).
    • Use the proper magic words (p. 162)
    • r-square evaluates the entire equation
  25. Explain why care must be taken in using the official interpretation of r-squared.
    • Susceptible to outliers
    • Data points furthest from the center of the line have more influence
    • There may be no causative relationship between explanatory and response variables.
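
The SST and SSE definitions above translate directly into a few lines of Python. This is only a sketch using made-up data; the variable names are chosen to mirror the symbols in the objectives.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope (b) and intercept (a).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
y_hats = [a + b * x for x in xs]

# SST: scatter of the y-data about their mean.
sst = sum((y - y_bar) ** 2 for y in ys)
# SSE: scatter of the y-data about the regression line.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
# Fraction of the variability in y explained by the regression line.
r_squared = (sst - sse) / sst

print(f"SST = {sst:.3f}, SSE = {sse:.3f}, r-square = {r_squared:.3f}")
```

For this data set SST is 6 and SSE is 2.4, so the line accounts for 3.6 of the 6 units of variability, giving an r-square of 0.6.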

 

Homefun (formative/summative assessment): prob. 3.43, 3.45

Relevance: Sometimes major political decisions are made or social theories proposed based on questionable evidence. It's impossible to evaluate this evidence without knowing something about the meaning of r-square.

 

 

Lesson 4
Key Concept: The use and misuse of R-square

 

Purpose: To understand how R-square is often overused as a measure of regression analysis "goodness".

Interactive Discussion: Objectives

2-person teams: See Stats investigation below

 

Stats Investigation (formative/summative assessment): Meaning of R-Square - time approx 2 class periods (individual work)

Purpose: Determine if a regression analysis using random numbers can yield an r-square value of 50% or more.

Instructions: Set up a regression analysis in Excel using integer x-values from 0 to 9. Use a random number from 0 to 10 for the y-values. Run this simulation 100 times. Calculate the average r-square and record the highest r-squared value. Record the three highest r-square values obtained in the class.

Save the data sets from your 4 regression/correlation results with the highest R-square values. You will use them again at the end of the year.
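
The investigation is meant to be done in Excel, but the same simulation can be sketched in Python for comparison. This sketch assumes the y-values are uniform random numbers between 0 and 10 and uses a fixed seed only so the run can be repeated; `r_squared` and `run_simulation` are hypothetical helper names.

```python
import random

def r_squared(xs, ys):
    # r-square for a least-squares line of y on x.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if syy == 0:
        return 0.0  # all y-values identical: nothing to explain
    return (sxy * sxy) / (sxx * syy)

def run_simulation(trials=100, seed=None):
    rng = random.Random(seed)
    xs = list(range(10))  # integer x-values from 0 to 9
    values = []
    for _ in range(trials):
        ys = [rng.uniform(0, 10) for _ in xs]  # random y-values from 0 to 10
        values.append(r_squared(xs, ys))
    return values

values = run_simulation(trials=100, seed=1)
average = sum(values) / len(values)
highest = max(values)
top_four = sorted(values, reverse=True)[:4]

print(f"average r-square: {average:.3f}")
print(f"highest r-square: {highest:.3f}")
print("four highest:", [round(v, 3) for v in top_four])
```

Changing or removing the seed gives a different set of 100 runs, which is closer to what each student will see in Excel.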

Questions/Conclusions:

  1. Based on your data, does a high r-square value by itself indicate a meaningful association or causation?
  2. Is the random number generator used in this investigation truly random?
  3. Is it possible to get a high r-squared value merely from random events?
  4. What does it really mean when we say that r-square represents the fraction of the variation in the values of y that is "explained" by the least squares regression of y on x? Discuss things like SST and SSE.

 

Essential Question: Can a regression equation with a high R-square be inappropriate?

Residuals

  26. Define what is meant by a residual.
  27. Calculate residuals using a TI-83 calculator.
  28. State 2 ways to plot residuals.
  • Residuals vs x: commonly used with a straight line equation
  • Residuals vs predicted y (y-hat): commonly used with multiple regression analysis
  29. State a major assumption behind all regression lines.
    Variability around the line is constant
  30. Interpret residual plot patterns.

Residual Plot conclusion: either appropriate or inappropriate

    • Random--appropriate
    • Smiley or Frowning Face (Mr. R's Terms)--inappropriate
    • Pattern in the scatter--inappropriate

Note: residual plots merely magnify the patterns that can be observed in a scatter plot. The horizontal line at zero on a residual plot represents the regression line. A person skilled at interpreting scatter plots will arrive at the same conclusions that can be drawn from a residual plot.

  31. Make residual plots using a TI-83.
  32. State the sum of the residuals.
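
As a quick check on the last objective, the sketch below computes the residuals (observed y minus predicted y) for a least-squares line and adds them up. The data are made up for illustration; the same residuals could be plotted against x to make a residual plot.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = observed y - predicted y (y-hat).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

for x, res in zip(xs, residuals):
    print(f"x = {x}: residual = {res:+.3f}")
print(f"sum of residuals = {residual_sum:.10f}")
```

For a least-squares line the residuals sum to zero, apart from tiny floating-point rounding.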

 

Homefun (formative/summative assessment): prob. 3.47, 3.61, 3.71

Relevance: Even though the world is largely non-linear, parts of it can often be accurately described with linear models. Knowing when a linear model is inappropriate is essential to building effective models. Various types of regression models are used in everything from predicting grades on AP tests to computer control of chemical plants.

Summative Assessment: Test--Objectives 1-32

 

Lesson 5
Key Concept: Residual Plots.

 

Purpose: Understand when a given regression equation is appropriate.

Interactive Discussion: Objectives.

Individual work: Perform residual plots on TI-83 calculators and with Excel software.

 

Stats Investigation (formative/summative assessment): Determining if a Regression Equation is Appropriate - time approx 1 class period (individual work)

Purpose: Determine if a linear regression equation is appropriate for two different situations.

Background: Commercial resistors follow Ohm's law, while light bulbs, due to their high temperatures, do not. Ohm's law is as follows:

I = (1/R) V

Where: I = current, V = voltage, and R = resistance.

Plotting I vs. V will theoretically yield a straight line passing through the origin.

 

Instructions: Set up a least squares linear regression analysis in Excel to find the association between current (response variable) and voltage (explanatory variable) for a commercial resistor and for a light bulb. Remember that this means a scatter plot as well as finding the slope, intercept, and R-square for the data. Set up the formulas needed to plot a residual plot and make such a plot for the two sets of data.
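
The analysis itself is to be done in Excel with measured data, but the steps can be sketched in Python. The voltage and current values below are invented placeholders (not measurements) chosen only to show what a roughly linear resistor and a curved light bulb might look like; `analyze` is a hypothetical helper name.

```python
def analyze(voltages, currents):
    # Least-squares regression of current (response) on voltage (explanatory).
    n = len(voltages)
    v_bar = sum(voltages) / n
    i_bar = sum(currents) / n
    sxy = sum((v - v_bar) * (i - i_bar) for v, i in zip(voltages, currents))
    sxx = sum((v - v_bar) ** 2 for v in voltages)
    syy = sum((i - i_bar) ** 2 for i in currents)
    slope = sxy / sxx
    intercept = i_bar - slope * v_bar
    residuals = [i - (intercept + slope * v) for v, i in zip(voltages, currents)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": (sxy * sxy) / (sxx * syy),
        "residuals": residuals,
    }

# Invented placeholder data: volts and amps.
voltages = [1, 2, 3, 4, 5]
resistor_currents = [0.0102, 0.0199, 0.0301, 0.0398, 0.0502]
bulb_currents = [0.10, 0.14, 0.17, 0.195, 0.215]

resistor = analyze(voltages, resistor_currents)
bulb = analyze(voltages, bulb_currents)

# From I = (1/R) V, the slope of I vs. V is 1/R.
resistance = 1 / resistor["slope"]

print(f"resistor: R = {resistance:.1f} ohms, r-square = {resistor['r_squared']:.4f}")
print(f"bulb: r-square = {bulb['r_squared']:.4f}")
print("bulb residuals:", [round(res, 4) for res in bulb["residuals"]])
```

With these placeholder numbers the bulb still gets a high r-square, yet its residuals are negative at both ends and positive in the middle, the "frowning face" pattern that marks a linear equation as inappropriate.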

Questions/Conclusions:

  1. Based on your data, does a high r-square value by itself indicate a meaningful association or causation?
  2. What is the resistance value in ohms for the commercial resistor?
  3. Is a linear equation appropriate for the commercial resistor? How about the light bulb? Explain your answers.

The Practice of Statistics, Yates, Moore, McCabe


Copyright © 1996-2009 T. K. Rogers, all rights reserved. Forchess ® is a registered trademark of T. K. Rogers.
No part of this website may be reproduced in any form, electronic or otherwise, without express written approval.