|
Chapter 1:
Distributions
AP Statistics Standards
I. Exploring Data: Describing patterns and departures from patterns (20%–30%)
A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot)
- Center and spread
- Clusters and gaps
- Outliers and other unusual features
- Shape
B. Summarizing distributions of univariate data
- Measuring center: median, mean
- Measuring spread: range, interquartile range, standard deviation
- Measuring position: quartiles, percentiles, standardized scores (z-scores)
- Using boxplots
- The effect of changing units on summary measures
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
- Comparing center and spread: within group, between group variation
- Comparing clusters and gaps
- Comparing outliers and other unusual features
- Comparing shapes
|
|
Objectives |
|
Class Start Up: Distribute & discuss syllabi. |
| Essential Question:
How many numbers are needed to
describe a complex event or object? |
- State the difference between categorical and quantitative variables and give examples of each.
  Note: Categorical data is drawn only on bar graphs or pie charts.
- Define distribution and state two key pieces of information required to produce a distribution.
  The pattern of variation of a single variable
  - Quantitative data (numbers)
  - Frequency--How often various values are expected
- State the 3 key ways a distribution can be described.
  - Central tendency or center
  - Spread or variability
  - Shape
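As a rough illustration of the two pieces of information behind a distribution (values and how often each occurs), here is a minimal Python sketch. The quiz scores are made up for the example and are not from the text.

```python
from collections import Counter

# Hypothetical quiz scores, made up for illustration.
scores = [7, 8, 8, 9, 6, 8, 7, 10, 8, 9]

# A distribution pairs each value with its frequency.
frequency = Counter(scores)

for value in sorted(frequency):
    # One dot per observation gives a rough text dot plot.
    print(f"{value:>2} | {'.' * frequency[value]}")
```

Reading the printout top to bottom gives a first look at center, spread, and shape.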
Homefun
(formative/summative assessment): Read "Statistical Thinking", section 1.1 and
1.2, work exercises 1.3 (p.7), 1.5 (p.10)
|
|
|
Activities |
-
- Lesson 1
-
- Warm Up: Have each person
describe themselves with 2 words and 2 numbers. Object: Complex
objects and phenomena are frequently described with a few
numbers. How these numbers are produced is critical.
The Key Elements
- Design--the systematic way in which the data is collected
- Analysis--the systematic use of graphical and mathematical tools to describe and evaluate the data
- Conclusions--the systematic manner in which inferences are drawn from the data
- Key Concept:
What is a distribution?
- Purpose:
Lay the foundation for describing a set of
data.
Interactive Discussion:
Objectives
Resources/Materials:
Picture of histogram with various sized increments to
illustrate the key weakness of histograms. |
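The weakness in question (the picture changes with the interval size) can be sketched in Python. The helper `bin_counts` and the data are hypothetical, chosen only to show one data set producing two different pictures.

```python
from collections import Counter

def bin_counts(data, width, start=0):
    """Count observations in intervals [start + k*width, start + (k+1)*width)."""
    counts = Counter((x - start) // width for x in data)
    # Key each count by the left edge of its interval.
    return {start + k * width: counts[k] for k in sorted(counts)}

# Hypothetical data with two clusters, made up for illustration.
data = [11, 12, 13, 14, 26, 27, 28, 29]

narrow = bin_counts(data, width=5)   # two separate bars, gap between them
wide = bin_counts(data, width=30)    # a single bar hides the gap
print(narrow)
print(wide)
```

With a width of 5 the two clusters are visible; with a width of 30 the same data looks like one lump.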
|
| Essential Question:
When using a number to describe a
complex event or object is
there a difference between using a single number and using a
single data point? |
Ch1.2 Describing Distributions
- Name and define the 3 key measures of central tendency.
- Mean = $\frac{\sum x_i}{n}$ or
- Mean = $\frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
- Median - midpoint, 50% above, 50% below
- Mode - most common data point or highest peak
-
Given a set of data determine the
mean, median and mode.
- Define and ID outliers.
Outliers are data points that are thought to
belong to a different distribution, hence, any influence they have
on the properties of a distribution causes errors.
- Data point not in distribution
- Gaps
Outliers and skew are not the
same thing. Skew is part of a distribution; outliers are not.
- State which measure of central tendency is
generally most influenced by outliers.
- Using the Mr. Rogers Rat Tail Rule, state
whether a distribution is skewed left or right, high or low.
- Give examples of data that would tend to be
symmetrical and data that would be skewed left or right.
- Easy Test - skewed
left or skewed low
- Hard Test - skewed
right or skewed high
- Normal Test -
symmetrical
- Incomes - skewed
right or skewed high
Homefun
(formative/summative assessment):
work exercises
1.69 |
- Lesson 2
- Key Concept: Central
tendency and shape of a distribution.
- Purpose: Lay the
foundation for describing a set of data.
Interactive Discussion:
Objective
|
Conclusions based on a single
or poorly chosen data point are statistically
indefensible! -- these are
called: Anecdotes |
|
Conclusions unduly
influenced by a single data point are statistically
indefensible! --these data points
are Outliers |
Individual
Work: Find the mean, median, and mode in simulated sets of data with
both odd and even numbers of data points.
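For checking hand calculations, a minimal sketch using Python's standard `statistics` module. Both data sets are invented for illustration, one with an odd and one with an even number of points.

```python
import statistics

# Hypothetical data sets, made up for illustration.
odd_set = [3, 5, 5, 8, 14]        # 5 data points
even_set = [3, 5, 5, 8, 10, 14]   # 6 data points

for name, data in [("odd", odd_set), ("even", even_set)]:
    mean = statistics.mean(data)
    # With an even count, the median is the average of the two middle values.
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{name}: mean={mean}, median={median}, mode={mode}")
```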
|
The Mr. Rogers Rat Tail Rule--FAQ
Skewed distributions often look like a rat with a long tail.
The tail points in the direction of skew.
What gets skewed? The mean gets skewed or moved in the
direction the rat tail points.
Why does skew matter? For a skewed distribution, the
mean is less representative of the bulk of the data points.
What gets skewed very little? The median. It will be
more representative of the bulk of the data points than the
mean. |
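One way to see the rule in numbers is to compare the mean with the median, since the mean is pulled toward the tail. The sketch below uses a hypothetical helper, `tail_direction`, and made-up data. Comparing mean and median is only a rule of thumb and can mislead for unusual shapes, so it does not replace looking at a plot.

```python
import statistics

def tail_direction(data):
    """Guess the skew direction from where the mean sits relative to the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right (high)"
    if mean < median:
        return "left (low)"
    return "no skew indicated"

# Hypothetical values, made up for illustration.
incomes = [28, 31, 35, 38, 42, 55, 75, 120]     # in $1000s, long tail to the right
easy_test = [55, 78, 85, 88, 90, 92, 95, 97]    # scores, long tail to the left

print(tail_direction(incomes))
print(tail_direction(easy_test))
```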
|
|
Stats
Investigation:
Investigation School Evaluation - time approx 3 class periods
(individual work) |
| Purpose:
Determine if it is reasonable
for 50% of all schools receiving a school report card to be scored below average.
Instructions:
Perform the simulation of school ratings using the Excel Spread
Sheet provided.
Questions /Conclusions:
(see Excel spread sheet.)
|
|
| Essential Question:
Is there a difference between
looking at tables of numbers and looking at plots or graphs of numbers? |
- Make dot plots.
- Make histograms using the TI-83 calculator and in Minitab.
- State the key weakness of histograms (see "Four Histograms").
- Understand the meaning of percentile.
  Example: if a score of 57 is at the 10th percentile, then 10% of the observations or data points fall below a score of 57.
- Convert distribution data into a relative cumulative frequency graph (generally expressed in %), often called an ogive (see page 27).
  Note: Ogives are often drawn in the otherwise abhorrent connect-the-dots style.
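The percentile and ogive ideas above can be sketched in a few lines of Python. The function names and scores are hypothetical. Note that this sketch plots a point at every data value, whereas a textbook ogive usually uses class boundaries.

```python
def percent_below(data, score):
    """Percent of observations that fall strictly below the given score."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

def ogive_points(data):
    """(value, cumulative percent at or below that value) pairs for an ogive."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            points[-1] = (x, 100 * i / n)   # repeated value: keep the highest percent
        else:
            points.append((x, 100 * i / n))
    return points

# Hypothetical test scores, made up for illustration.
scores = [52, 57, 61, 64, 68, 70, 73, 77, 81, 90]

print(percent_below(scores, 57))   # share of scores below 57
print(ogive_points(scores))        # connect these points to draw the ogive
```

In this made-up set, 1 of the 10 scores falls below 57, so 57 sits at the 10th percentile.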
Homefun
(formative/summative assessment): read 1.2,
work exercises
1.15,
1.17, 1.19 p. 24-31 |
- Lesson 3
- Key Concept: Central
tendency, spread, and shape of a distribution must be visualized
in order to analyze them.
- Purpose: Lay the
foundation for describing a set of data.
Interactive Discussion:
Objectives
Individual
Work: Given a set of data, make a dot plot
on paper, make a histogram using the TI-83, and make a frequency
plot on paper.
Group Work: Using an ogive drawn on a white board,
answer the question: was Bill Clinton a young president (page 28)?
|
| Essential Question:
Can the type of plot influence the
conclusions drawn and if so how can this be prevented? |
Stem and Leaf Plots
- Draw and interpret stem and leaf plots.
- Draw and interpret back-to-back
stem and leaf plots.
- State why a time plot should always
be used in an analysis of data.
Virtually everything is a function
of time.
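A stem and leaf plot can also be built mechanically, which may help when checking one drawn on paper. This is a minimal sketch that assumes non-negative whole numbers and splits each value at the tens digit. The function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} for non-negative integers, split at the tens digit."""
    plot = defaultdict(list)
    for x in sorted(data):
        plot[x // 10].append(x % 10)
    return dict(plot)

# Hypothetical values, made up for illustration.
values = [23, 25, 31, 31, 38, 42, 47, 47, 49, 56]
plot = stem_and_leaf(values)

# Print every stem in range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```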
Homefun
(formative/summative assessment): prob. 1.21, 1.29, 1.31,
1.51
|
Lesson 4
Key Concept: All data
varies with time. Stem and Leaf Plot
Purpose: Understand the
reasons all data should be plotted against time.
Interactive Discussion:
Objectives
- What can a stem and leaf plot
reveal?
- What variable is virtually everything
dependent on?
Seat Work: Draw a stem and leaf plot on paper and a histogram with a
TI-83 calculator, using the hot dog data (p. 59, prob. 1.47).
|
| Essential Question:
Is there a difference between skew
and outliers? |
Box and Whiskers Plots
- Calculate quartiles, Q1 and Q3.
- Interpret 5 number summaries.
Low, Q1, Median, Q3, High
- Find the IQR or interquartile range
for a data set.
IQR = Q3 - Q1
- Draw a box and whiskers plot.
- State the Mr. Rogers Rat Whisker
Rule for determining skew using a box and whiskers plot.
Long whisker indicates
direction of skew.
- State the % of the data expected in
each whisker and in the box for a box and whiskers plot.
25% in each whisker and 50% in the box
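A minimal Python sketch of the 5 number summary and IQR follows. It assumes one common textbook convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd. Other software may use a different quartile rule and give slightly different answers. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (low, Q1, median, Q3, high); needs at least 2 data points."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

# Hypothetical data, made up for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 15, 21]
low, q1, med, q3, high = five_number_summary(data)
iqr = q3 - q1   # IQR = Q3 - Q1
print(low, q1, med, q3, high, iqr)
```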
Homefun
(formative/summative assessment): prob. 1.39, 1.47
|
- Lesson 5
-
- Key Concept: Using box
and whiskers plots to describe distributions
- Purpose: Box and whiskers
plots are an outstanding tool for communicating information
about data in a straightforward manner
Interactive Discussion:
Objectives
Individual Work: Draw
box and whiskers plots both on paper and with TI-83 calculators using
simulated data.
|
| Essential Question:
Why are outliers important? |
Modified Box and Whiskers Plot
- Identify outliers using a modified
box and whiskers plot.
- Whisker's End
= the most extreme data pt still within 1.5 IQR
of the nearest quartile (below Q1 for the low whisker, above Q3 for the high whisker)
-
Outlier = data pt beyond the whisker's end
- Create box and whisker plots on
the TI-83.
- Create and interpret parallel box and whisker plots on
the TI-83 and in Minitab.
Note that a box and whiskers plot cannot detect gaps, clusters, or
multi-modes, but here's the problem with other types of graphs such as
dot plots, stem and leaf plots, and histograms: the ability to detect
patterns depends on the interval size. There's no perfect plot for
visualizing distributions.
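The 1.5 IQR rule for a modified box and whiskers plot can be sketched as below. As an assumption, quartiles are taken as the medians of the lower and upper halves, so results may differ slightly from software that uses another quartile rule. The function names and data are hypothetical.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (needs 2+ points)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

def modified_boxplot(data):
    """Whisker ends and outliers under the 1.5 IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    # Each whisker stops at the last data point inside its fence.
    return {"whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": outliers}

# Hypothetical data with one suspicious value, made up for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 15, 40]
result = modified_boxplot(data)
print(result)
```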
Homefun
(formative/summative assessment): prob. 1.55, 1.59
|
- Lesson 6
-
- Key Concept: ID outliers
using modified B&W plots
- Purpose: The modified box
and whiskers plots are an outstanding tool for identifying
outliers.
Interactive Discussion:
Objectives
Which type of plot(s) is(are) best at
identifying outliers in a consistent manner?
- Which type of plot(s)
is(are) best at identifying clusters?
- Which type of plot(s)
is(are) best at identifying multiple-modes?
- Which type of plot(s)
is(are) best at identifying gaps?
Individual Work: Draw modified box and whiskers
plots both on paper and with TI-83 calculators using simulated data.
|
| Essential Question:
Ideally, how many data points in a
set of data are needed to characterize spread? |
Standard Deviation
- Calculate range.
range = (highest) -
(lowest)
- Write the mathematical
definition for standard
deviation from memory and explain its meaning.
Calculated from an entire population:
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{n}}$
Calculated from a sample:
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
The standard deviation is a way to
express how much a typical data point differs from the mean but is
calculated so that large deviations have more influence.
- State how standard deviation and
variance are related.
$\text{variance} = (\text{standard deviation})^2$
- Calculate standard deviations by
hand and with a calculator
- Note the difference between s and sigma.
- s = an estimate of a population's std dev based on a sample
- sigma = the actual standard deviation of a population
- Be as one with the 3 points about
standard deviation in the magic box on page 51.
- State why the standard deviation
is a better indicator of spread than range.
Std
dev uses all the data points, range uses only 2 pts.
- State an approximate relationship
between range and standard deviation.
(range roughly = 6 sigma.)
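To check a hand calculation, the two definitions can be written out directly in Python and compared with the standard `statistics` module. The data set is made up for illustration, and the function names are hypothetical.

```python
import math
import statistics

def population_sd(data):
    """sigma: divide the sum of squared deviations by n."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def sample_sd(data):
    """s: divide the sum of squared deviations by n - 1."""
    xbar = sum(data) / len(data)
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))

# Hypothetical data, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data), statistics.pstdev(data))   # should agree
print(sample_sd(data), statistics.stdev(data))        # should agree
print(population_sd(data) ** 2)                       # variance = (std dev)^2
print(max(data) - min(data))                          # range uses only 2 points
```

The sample value comes out a little larger than the population value because it divides by n - 1 instead of n.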
Homefun
(formative/summative assessment): work exercises 1.73
|
- Lesson 7
-
- Key Concept: Measuring
spread - Range, standard deviation, and IQR
- Purpose: Understand the
pros and cons of various spread measuring techniques.
- Quantities represented as Greek
alphabet symbols are considered true (known by
Zeus).
- Quantities represented in our
normal alphabet (known by mere mortals) are
estimates of the ones represented as Greek alphabet symbols.
|
Interactive Discussion:
Objectives
Individual Work: Calculate ranges, IQR,
standard deviations, and variances both on paper and with TI-83 calculators
using monthly temperature data from the
Greenville-Spartanburg Airport.
|
| Essential Question:
What will changing the units of
measurement do to measures of spread and central tendency? |
Linear Transforms
- What is a linear transform? (See p. 53).
$x_{new} = a + b \cdot x_{old}$
Example:
the linear transform
to change from Celsius to Fahrenheit
(ºF) = 32 + 1.8 (ºC)
- State the effect that multiplying each number by
a constant and/or adding a
constant to each number in a data set has on the following:
(This effect is important to
know when changing the data points' units.)
| | (each data point) + a | (each data point) * b |
|---|---|---|
| mean | (mean) + a | (mean) * b |
| median | (median) + a | (median) * b |
| standard deviation | no change | (std dev) * b |
| IQR | no change | (IQR) * b |
| range | no change | (range) * b |
| Q1 and Q3 | (Q1) + a & (Q3) + a | (Q1) * b & (Q3) * b |
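The effects of a linear transform can be checked numerically with the Celsius to Fahrenheit example. This is a minimal sketch: the temperatures are made up (they are not the airport data), `summary` is a hypothetical helper, and b is assumed to be positive.

```python
import statistics

# Hypothetical monthly temperatures in Celsius, made up for illustration.
celsius = [10, 15, 20, 25, 30]

a, b = 32, 1.8   # x_new = a + b * x_old, with b > 0
fahrenheit = [a + b * c for c in celsius]

def summary(data):
    """Center and spread measures for one data set."""
    return {"mean": statistics.mean(data),
            "median": statistics.median(data),
            "stdev": statistics.stdev(data),
            "range": max(data) - min(data)}

c_stats, f_stats = summary(celsius), summary(fahrenheit)
for key in c_stats:
    print(f"{key}: {c_stats[key]:.2f} C -> {f_stats[key]:.2f} F")
```

The mean and median pick up both the multiplier and the added constant, while the standard deviation and range are only multiplied by b.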
Homefun
(formative/summative assessment): work exercises 1.45,
1.55 |
- Lesson 8
-
- Key Concept:
The effects of linear transforms
- Purpose:
Interactive Discussion:
Objectives
Individual Work:
Calculate ranges, IQR, standard deviations, and variances with TI-83
calculators using monthly temperature data in Fahrenheit from the
Greenville-Spartanburg Airport. Repeat the process using the same
temperature data converted to Celsius. Record the mean, median, standard
deviation, IQR, range, Q1, and Q3 for both Fahrenheit and Celsius.
|
| Essential Question:
How can I make an "A" on
the test? |
Distributions Review
- Work the practice test.
- Review the objectives.
- Correctly interpret 5 number
summaries.
- Look over
free response problems
from previous years.
- Memorize the mathematical definitions of variance and standard
deviation for samples and populations.
- Master the vocabulary (see example
below).
|
| Descriptive Term | | Comments |
|---|---|---|
| Central Tendency | | |
| | Mean | Sensitive to outliers & skew |
| | Median | good when outliers or skew present |
| | Mode | rarely used |
| Spread | | |
| | range | Very sensitive to outliers & skew |
| | variance | Sensitive to outliers & skew |
| | standard deviation | Sensitive to outliers & skew |
| | IQR | good when outliers or skew present |
| Shape | | |
| | Symmetrical | can have multiple peaks |
| | Skewed left | Skewed low, easy test |
| | Skewed right | Skewed high, hard test, income |
|
|