|
Chapter 4: Nonlinear Regression
AP Statistics Standards
I. Exploring Data:
Observing patterns and departures from patterns (continued)
D. Exploring bivariate data
- Transformations to achieve
linearity: logarithmic and power transformations
|
|
Objectives |
| Essential Question:
Is everything we'd like to study
and model linear? |
Chapter 4 : 2 Variable Data Continued
Modeling Exponential Data
- Explain how data can be transformed so that
linear regression produces an exponential function.
- First: convert all y data
points to ln y. On a
TI-83 calculator, if x-data is stored in L1 and y-data is stored in L2, do LN
L2 sto L4
- Second: do linear
regression for L1, L4
- Finally: manipulate the
regression equation as shown below.
$\ln y = ax + b$
$e^{\ln y} = e^{ax + b}$
$y = e^{ax} \cdot e^{b}$
$y = e^{b} \cdot e^{ax}$
let $e^{b} = k$
$y = k e^{ax}$
- Give examples where an exponential regression model would
be appropriate.
Growth or decay situations
(response variable multiplied by a fixed
factor in each time interval).
- Explain how to determine if an exponential model is
appropriate.
- theoretical basis such as
objective 2 above
- random residuals
(this means exponential regression is appropriate.
It does not necessarily mean it's right.)
- Explain why an
exponential model should not be selected on
the basis of maximizing r-squared.
A different form of
nonlinear equation may have a higher $R^2$ value but be less appropriate.
- Perform exponential regression on a TI-83
calculator using the NON-transformed data
and note that these results and the results obtained with transformed data are
mathematically the same.
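- NON-transformed data:
$y = ab^{x}$
- Transformed data:
$y = k e^{ax}$
- $y = k (e^{a})^{x}$
- Let the calculator's coefficient $a$ equal $k$ and its base $b$ equal $e^{a}$, where the $a$ in $e^{a}$ is the slope from the transformed regression.
- By substitution:
- $y = ab^{x}$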
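The transform-then-regress steps in the objectives above can also be sketched in Python instead of on a TI-83. This is only a minimal sketch under stated assumptions: the data set is invented (generated from y = 3(0.8)^x), and the helper name `fit_line` is hypothetical. It illustrates that the slope and intercept of ln y versus x give k = e^b, and that k e^(ax) and k (e^a)^x describe the same curve.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented decay data, generated from y = 3 * 0.8**x for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 0.8 ** x for x in xs]

# First: convert y to ln y. Second: linear regression of ln y on x.
a, b = fit_line(xs, [math.log(y) for y in ys])

# Finally: undo the transform. ln y = ax + b becomes y = k * e**(a*x), k = e**b.
k = math.exp(b)
base = math.exp(a)  # plays the role of "b" in the y = a * b**x form

for x in xs:
    form_e = k * math.exp(a * x)  # y = k e^(ax)
    form_b = k * base ** x        # y = k (e^a)^x
    print(x, round(form_e, 4), round(form_b, 4))
```

Both printed columns should agree row by row, since the two forms differ only in how the base is written.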
Homefun (formative/summative
assessment): Read section 4.1; prob. 4.7,
4.11
Relevance: Exponential data are
common in business, biology, chemistry, physics, and other
fields. Knowing how to recognize and model them is a significant career
skill.
|
|
|
Activities |
- Lesson 1
- Key Concept: Transforms
- Purpose:
Create a linear plot from nonlinear data
Seat Work: have students graph an
exponential example and do linear regression and residuals.
Interactive Discussion: Objectives.
- Explain the terms concave upward and concave downward.
- Review exponents and logarithms.
Seat Work:
perform the transform on the data below and derive an
exponential model from it. Compare this with the linear model.
| Time | Microbes |
| 0 | 1 |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 16 |
| 5 | 32 |
| 6 | 64 |
| 7 | 128 |
| 8 | 256 |
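The seat work on the microbe data can be checked with a short Python sketch. It assumes ordinary least squares for both fits, and the helper name `fit_line` is hypothetical. It fits a straight line to the raw data, fits ln(Microbes) against Time, and prints the residuals of each model so the two can be compared.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

time = [0, 1, 2, 3, 4, 5, 6, 7, 8]
microbes = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Linear model on the raw data.
m, c = fit_line(time, microbes)
linear_residuals = [y - (m * x + c) for x, y in zip(time, microbes)]

# Exponential model from the transformed data (ln y versus x).
a, b = fit_line(time, [math.log(y) for y in microbes])
k = math.exp(b)
exp_residuals = [y - k * math.exp(a * x) for x, y in zip(time, microbes)]

print("linear residuals:", [round(r, 1) for r in linear_residuals])
print("exponential model: y = %.3f * e^(%.4f x)" % (k, a))
print("exponential residuals:", [round(r, 6) for r in exp_residuals])
```

The linear residuals are positive at both ends and negative in the middle, a curved pattern that signals a poor fit. Because the counts double at every step, the transformed fit recovers a slope of ln 2 and k = 1, which is the model y = 2^x.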
|
|
| Essential Question:
If
a mouse weighing 0.5 lb were scaled up by a factor of 100, how much would it
weigh? |
Modeling Power Function Data
- Explain how data can be transformed so that
linear regression produces a power function.
- First: convert all
x and y data points to ln x and ln y.
On a TI-83 calculator, if
x-data is stored in L1 and y-data is stored in L2, do LN
L1 sto L3 and LN
L2 sto L4
- Second: do linear
regression for L3, L4
- Finally: manipulate
the regression equation as shown below.
$\ln y = a \ln x + b$
$e^{\ln y} = e^{a \ln x + b}$
$y = (e^{\ln x})^{a} \cdot e^{b}$
$y = e^{b} \cdot x^{a}$
let $e^{b} = k$
$y = k x^{a}$
- Give examples where a power regression model would be
appropriate.
Definition of scaling
factor:
If an object is to be scaled up to a larger size without
changing the appearance of the object, all the dimensions of the object have
to be multiplied by a common factor. This factor is called the scaling factor.
Scaling problems:
- Volume & mass
scale with the cube of the scaling factor
- Area scales with the square of the scaling factor
- Explain how to determine if a power model is
appropriate.
- theoretical basis such as
objective 6 above
- random residuals
- Explain why a power model should not
be selected on the basis of maximizing r-squared. A
different form of nonlinear equation may have a higher $R^2$ value but be
less appropriate.
- Perform power regression on a TI-83 calculator
using the NON-transformed data
and note that these results and the results obtained with transformed data are
the same.
Homefun (formative/summative
assessment): prob. 4.13, 4.15
Relevance: Power-function data
are common in business, biology, chemistry, physics, and other
fields. Knowing how to recognize and model them is a significant
career skill. |
|
Activities |
- Lesson 2
- Key Concept: The transform needed for a power model
- Purpose:
Recognize the situations where a power model is appropriate and
create one.
Interactive Discussion:
Objectives.
Define scaling factor.
Demonstrate how scaling factors work
using spheres, cubes and rectangular prisms.
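A small Python sketch of the scaling rules may help with the demonstration. It assumes the object keeps its shape and density when scaled, so mass follows volume. The function names are hypothetical. The last line applies the cube rule to the mouse in the essential question.

```python
def scaled_length(length, factor):
    """Every linear dimension is multiplied by the scaling factor."""
    return length * factor

def scaled_area(area, factor):
    """Area scales with the square of the scaling factor."""
    return area * factor ** 2

def scaled_volume_or_mass(value, factor):
    """Volume and mass scale with the cube of the scaling factor."""
    return value * factor ** 3

# A cube with a 2-unit side, scaled up by a factor of 5.
side = 2.0
factor = 5
new_side = scaled_length(side, factor)

# Compute the new surface area and volume directly and by the scaling rules.
print(6 * new_side ** 2, scaled_area(6 * side ** 2, factor))
print(new_side ** 3, scaled_volume_or_mass(side ** 3, factor))

# The 0.5 lb mouse scaled up by a factor of 100.
mouse_weight = scaled_volume_or_mass(0.5, 100)
print(mouse_weight)  # 500000.0 lb
```

The direct calculation and the scaling rule agree for the cube (600 square units and 1000 cubic units), and the scaled mouse comes out at 500,000 lb.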
Seat Work: plot the following simulated
data
| Pumpkin Dia. | Surface Area |
| 1 | 1.2 |
| 2 | 4.7 |
| 3 | 9.0 |
| 4 | 16.9 |
| 5 | 25.1 |
| 6 | 36.5 |
| 7 | 49.0 |
| 8 | 64.9 |
Perform linear regression analysis and find $R^2$.
Transform both x and y data and repeat the process. Convert the linear
regression equation to a power model. Work Example 4.9,
Fishing Tournament, p. 216 |
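The pumpkin seat work can be sketched in Python as follows. It assumes ordinary least squares, and the helper name `fit_line` is hypothetical. The sketch fits a line to the raw data, fits ln(Surface Area) against ln(Diameter), and converts the second fit into a power model y = k x^a.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope, intercept, and r-squared for ys versus xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

diameter = [1, 2, 3, 4, 5, 6, 7, 8]
area = [1.2, 4.7, 9.0, 16.9, 25.1, 36.5, 49.0, 64.9]

# Linear regression on the raw data.
_, _, r2_linear = fit_line(diameter, area)

# Linear regression on the transformed data (ln y versus ln x).
a, b, r2_loglog = fit_line([math.log(x) for x in diameter],
                           [math.log(y) for y in area])

# ln y = a ln x + b becomes y = k * x**a with k = e**b.
k = math.exp(b)

print("R^2, raw data:         %.4f" % r2_linear)
print("R^2, transformed data: %.4f" % r2_loglog)
print("power model: y = %.3f * x^%.3f" % (k, a))
```

The fitted exponent should come out close to 2, which matches the rule that area scales with the square of the scaling factor. A higher r-squared alone does not justify the model; the residual plot and the theoretical basis still need to be checked.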
For more information about scaling and why it is so
important, read:
Insultingly Stupid Movie Physics,
Chapter 4, Scaling Problems: Big Bugs and Little People, pp. 51-66
|
|
|
| Essential Question:
What is the most common form of
extrapolation? |
Interpreting Correlation and
Regression
- Decry the evils
of extrapolation but also be aware that it's commonly used.
- projected sales--in
order to plan ahead, companies will often attempt to predict the next
year's sales and earnings based on regression
analysis of data from previous years.
- projected population growth
- projected impact of advertising
dollars spent--used for determining what the
future advertising budget should be.
- radioactive dating--there's
sound theory backing radioactive dating but obviously no one collected
data on it thousands of years ago.
- Be aware of ways the risks of extrapolation
can be moderated.
- Sound theoretical basis
for the regression model
- Strong supporting data
from independent sources. For example:
a limited amount of extrapolation using the
recent exponential growth in American wind-power electrical generation
is reasonable based on the
ready availability of wind resources,
low cost compared to other forms of generation, and concerns about
global warming, three factors which make wind power attractive.
- Extrapolating only slightly beyond the range
of the actual data. The greater the distance
beyond the data's range, the greater the risk.
- Simple regression model such as linear,
exponential, or power. Extrapolation with high order polynomials is
very dangerous.
- Positive results with various indicators
such as outlier-free scatter plot, high r-square, random residuals,
etc.
- Identify
possible lurking variables. A lurking variable is an important
variable which is not included in the study.
- Name the most common lurking variable
(time).
- State the pitfall of using averaged
data. Averaging hides the variation among individuals, which makes the r-squared value higher.
Hence, the results look better than they really are.
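The pitfall of averaged data can be illustrated with a small Python sketch. The data are invented purely for illustration: five individuals at each x value, scattered symmetrically around the line y = 2x. The helper name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """Square of the correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Invented individual data: y = 2x plus a fixed set of offsets at each x.
offsets = [-4, -2, 0, 2, 4]
individual_x = []
individual_y = []
for x in range(1, 6):
    for d in offsets:
        individual_x.append(x)
        individual_y.append(2 * x + d)

# Averaged data: one mean y value per x value.
group_x = list(range(1, 6))
group_y = [sum(2 * x + d for d in offsets) / len(offsets) for x in group_x]

r2_individual = r_squared(individual_x, individual_y)
r2_averaged = r_squared(group_x, group_y)
print("r-squared, individual data:", r2_individual)  # 0.5
print("r-squared, averaged data:  ", r2_averaged)    # 1.0
```

In this constructed case the averages fall exactly on the line, so r-squared jumps from 0.5 to 1.0 even though the underlying individuals are no more predictable than before.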
Homefun
(formative/summative assessment): prob. 4.27, 4.32
|
- Lesson
3
- Key Concept:
Extrapolation and lurking variables
- Purpose: Understand how
conclusions drawn from data can be disastrously wrong
Interactive Discussion:
Objectives. Explain real growth curves--usually sigmoidal.
Seat Work:
Uncover the lurking variable
of time in the example on p. 228. Plot math classes per student vs.
time.
Work problem 4.19 on p.222.
|
| Essential Question:
Can we ever be completely sure
that causation exists? |
|
Causation |
| In other words, is the
association between the x and y variables due to the x-variable
actually causing a response in the y-variable? |
-
State 4 possible explanations for
getting a strong association based on regression/correlation analysis.

- Causation
--Sometimes it's true: x causes y
- Common response variables
(affect both x & y
variables), example:
rum (y) and
Methodist Ministers (x) are both affected by the common response variable,
population growth (z).
- Confounding variables
(affect the y variable
but not the x), example:
The shaman chants an incantation (x) and five
days later the patient who seemed near death gets well. The patient's
immune system (z) was the real cause.
- Random chance
(the association is temporary in nature and
is the result of numerous unidentifiable factors that are not
reproducible), example:
Bob finds a 1957 penny on the sidewalk as
he enters the casino. When he subsequently wins $2000 at
roulette, he concludes that the penny is his good luck charm.
- Explain 4 steps toward
establishing causation. Generally all 4 steps
are required especially for controversial situations.
- Carefully controlled experiments--the
gold standard. Can sometimes be as simple as turning the causative
variable on and off. Weakness = experiments often
are run in an artificial
environment.
- Multiple independent observational
studies of different types
- Account for, control, or eliminate lurking
variables--Must be done
in both observational and experimental studies. Accounting for lurking
variables usually means including them in multiple linear regression
analysis.
- Develop a plausible theory--without
a plausible theory, even experimental data can be questioned.
Homefun
(formative/summative assessment): Read section 4.2; prob.
4.35, 4.41, 4.45
Summative Assessment:
Test objectives 1-14 and previous regression/correlation objectives |
- Lesson
4
- Key Concept:
Extrapolation and lurking variables
- Purpose: Understand how
conclusions drawn from data can be disastrously wrong
Interactive Discussion:
Objectives.
Questions
|
The dog
barked and the tree fell down.
- Did the dog cause the tree to
fall?
- What are the possible common
response variables?
- What are the possible
confounding variables?
- Could the two events coincide due
to random events?
- Could the tree-felling dog be
tested in an experiment?
- Is there a plausible theory for
why the tree could be felled by the noise of a dog barking?
|
Video: use video on smoking
|
AP Statistics Standards
I. Exploring Data:
Observing patterns and departures from patterns (continued)
E. Exploring categorical data
1. Frequency tables and bar charts
2. Marginal and joint frequencies for two-way tables
3. Conditional relative frequencies and association
4. Comparing distributions using bar charts
|
| Essential Question:
How can categorical data
be represented and interpreted? |
Categorical Data
- Create frequency tables for categorical data.
- Convert the above tables into bar charts.
- Use conditional distributions based on relative frequencies to establish
associations.
- Compare distributions using bar charts.
- Interpret 2-way tables.
- Interpret marginal distributions.
- 2 for each table, horizontal & vertical
- Histogram-like
- Single variable only
- Calculate and interpret
conditional distributions
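Marginal and conditional distributions can be computed from a two-way table with a few lines of Python. The table below is invented for illustration (a hypothetical survey of favorite subject by grade level), and all names in it are assumptions rather than data from the text.

```python
# Invented two-way table: rows are grade levels, columns are favorite subjects.
table = {
    "Junior": {"Math": 20, "Science": 30, "English": 50},
    "Senior": {"Math": 40, "Science": 40, "English": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())
subjects = list(next(iter(table.values())))

# Marginal distributions: one for the rows, one for the columns.
row_marginal = {g: sum(row.values()) / grand_total for g, row in table.items()}
col_marginal = {s: sum(table[g][s] for g in table) / grand_total
                for s in subjects}

# Conditional distribution of subject within each grade level.
conditional = {
    g: {s: count / sum(row.values()) for s, count in row.items()}
    for g, row in table.items()
}

print("row marginal:", row_marginal)
print("column marginal:", col_marginal)
for grade, dist in conditional.items():
    print(grade, dist)
```

If the conditional distributions differ noticeably between rows, as they do in this invented table, that suggests an association between the two categorical variables.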
| Essential Question:
Can data sets be added together
to obtain a larger sample size and hence more meaningful conclusion? |
Simpson's Paradox
- Analyze data for
Simpson's paradox.
- Conclusions based on parts can be reversed when
considering the whole
- Conclusions based on parts are more likely to be
valid.
-
State two conditions which must exist for
Simpson's Paradox to occur.
- One or more lurking variables
- Data from unequal sized groups being combined into a
single group.
- State how Simpson's paradox can be prevented.
- Avoid combining data from unequal groups into a
single study
- Identify and include lurking variables in the study
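A minimal Python sketch of how the reversal arises is given below. The hospital counts are invented for illustration and are not taken from the textbook example. Patient condition plays the role of the lurking variable, and the two hospitals treat very different numbers of patients in poor condition, which supplies the unequal group sizes.

```python
# Invented (survived, total) counts for two hypothetical hospitals.
hospitals = {
    "A": {"good": (590, 600), "poor": (1350, 1500)},
    "B": {"good": (580, 600), "poor": (170, 200)},
}

def rate(survived, total):
    """Survival rate as a fraction."""
    return survived / total

# Survival rate within each patient-condition group (the parts).
by_condition = {
    h: {c: rate(*counts) for c, counts in groups.items()}
    for h, groups in hospitals.items()
}

# Survival rate after combining the groups (the whole).
overall = {
    h: rate(sum(s for s, _ in groups.values()),
            sum(t for _, t in groups.values()))
    for h, groups in hospitals.items()
}

for h in hospitals:
    print(h, "good: %.3f  poor: %.3f  overall: %.3f"
          % (by_condition[h]["good"], by_condition[h]["poor"], overall[h]))
```

With these invented counts, hospital A has the higher survival rate for patients in good condition and for patients in poor condition, yet hospital B has the higher rate once the groups are combined.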
Homefun (formative/summative
assessment): Read
Simpson's Paradox
- When Big Data Sets Go Bad
prob. 4.37, 4.39, 4.45
|
- Lesson 5
- Key Concept:
Simpson's Paradox
- Purpose: Understand how
conclusions drawn from data can be disastrously wrong
Interactive Discussion:
Objectives. Work through hospital example of Simpson's paradox.
Individual work:
Work through Simpson's paradox
worksheet provided by teacher.
Use Titanic data to determine if the
class of one's ticket had an association with the chances of
survival.
Materials: Simpson's Paradox
Worksheet and Titanic data.
p. 247
|
Summative Assessment: Test
Objectives 1 - 27
|
|