Chapter 1:
Exploring Data
AP Statistics Standards I. Exploring Data:
Describing patterns and departures from patterns
(20%–30%)
 A.
Constructing and interpreting graphical displays of distributions of
univariate data (dotplot, stemplot, histogram, cumulative frequency
plot)

 Center and spread
 Clusters and gaps
 Outliers and other unusual features
 Shape
B. Summarizing distributions of
univariate data
 Measuring center:
median, mean
 Measuring spread: range, interquartile
range, standard deviation
 Measuring position: quartiles,
percentiles, standardized scores (z-scores)
 Using boxplots

The effect of changing units on summary
measures
C. Comparing distributions of univariate data
(dotplots, back-to-back stemplots, parallel boxplots)
 Comparing center and spread: within
group, between group variation
 Comparing clusters and gaps
 Comparing outliers and other unusual
features
 Comparing shapes


Essential Question: How many numbers are needed to
describe a complex event or object? 
Introduction: what is statistics about?
 Given a complex system or object, describe it adequately with a limited number of indicators or measurements.
Describe yourself with 2 words and 2 numbers. See if your classmates can identify you from the description. Formative assessment: What did you learn about the power of indicators or measurements from this exercise?
 State the key elements used for answering a research question in a statistically acceptable manner. Statistical analysis is an internationally recognized way of answering research questions and communicating data. It is a powerful international communication tool.
Design: the systematic way in which the data are collected.
Analysis: the systematic use of graphical and mathematical tools to describe and evaluate the data.
Conclusions: the systematic manner in which inferences are drawn from the data and uncertainties are evaluated.
 Evaluate information to determine if it is anecdotal evidence. Anecdotal evidence is based on data that's collected in a haphazard manner. It usually consists of a small sample size, often a single data point, frequently chosen for emotional impact.
Evidence consisting of a single data point is always considered anecdotal.
Conclusions based on anecdotal evidence are not statistically defensible.
Homefun (formative/summative assessment): Find an article that uses anecdotal evidence. Briefly describe the evidence and how it is used. Provide a reference to the source.

Essential Question:
Is data always expressed as numbers? 

State the difference between categorical and
quantitative variables and give examples of each.
Quantitative variable: numerical values that could reasonably be averaged.
height 
weight 
age 
categorical variable: a classification system 
zip codes

grade (freshmen, sophomores, juniors, seniors)

size (small, medium, large)

Note: Categorical data are displayed only with bar graphs or pie charts.
 Evaluate the effectiveness of bar charts and other graphs.
Formative assessment: Evaluate the effectiveness of the above charts 

Create frequency tables for categorical data. In other words, convert the "count" data to % data.
 Convert the above tables into bar charts. A two-way table will contain:
a vertical and a horizontal marginal distribution
multiple conditional distributions
 Use conditional distributions based on relative frequencies to establish
associations. This is typically done by looking at bar charts of the distributions
Association: a pattern exists between the values of one variable and the values of another. Association does not establish that one variable causes the other.
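The marginal and conditional distributions of a two-way table can be sketched in a few lines of Python. The counts, the row and column labels, and the helper names `marginal_rows` and `conditional_by_row` are all made up for illustration; they are not from the textbook.

```python
# Hypothetical two-way table of counts: rows are grade levels, columns are answers.
counts = {
    "freshmen": {"yes": 30, "no": 70},
    "seniors": {"yes": 60, "no": 40},
}

def marginal_rows(table):
    """Row totals as a fraction of the grand total (one marginal distribution)."""
    grand = sum(sum(row.values()) for row in table.values())
    return {name: sum(row.values()) / grand for name, row in table.items()}

def conditional_by_row(table):
    """Distribution of the column variable within each row (conditional distributions)."""
    result = {}
    for name, row in table.items():
        total = sum(row.values())
        result[name] = {col: n / total for col, n in row.items()}
    return result

marginals = marginal_rows(counts)
conditionals = conditional_by_row(counts)

print(marginals)
print(conditionals)
```

If the conditional distributions differ noticeably from row to row, as they do in this made-up table, that suggests an association between the two variables. It does not show that one causes the other.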
Homefun (formative/summative assessment): Read section 1.1, work exercises 1, 11, 17 pages 22 to 24
Essential Question: Can data sets be added together
to obtain a larger sample size and hence more meaningful conclusion? 
Simpson's Paradox
 Analyze data for
Simpson's paradox.
 Conclusions based on parts can be reversed when
considering the whole
 Conclusions based on the parts are more likely to be valid.
 State two conditions which must exist for
Simpson's Paradox to occur.
 One or more lurking variables
 Data from unequal sized groups being combined into a
single group.
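A small numerical sketch can make the reversal concrete. The pass counts below are invented for two hypothetical schools, X and Y, and the "group" split stands in for a lurking variable. None of these numbers come from the readings.

```python
# Made-up (passed, total) counts for two hypothetical schools,
# split by a lurking variable into two groups of very unequal size.
data = {
    "group 1": {"X": (90, 100), "Y": (800, 1000)},
    "group 2": {"X": (300, 1000), "Y": (20, 100)},
}

def rate(passed, total):
    """Fraction of students who passed."""
    return passed / total

# Pass rate within each group (the "parts").
within = {
    group: {school: rate(*cell) for school, cell in schools.items()}
    for group, schools in data.items()
}

# Pass rate after pooling the groups (the "whole").
pooled = {}
for school in ("X", "Y"):
    passed = sum(data[group][school][0] for group in data)
    total = sum(data[group][school][1] for group in data)
    pooled[school] = rate(passed, total)

print(within)
print(pooled)
```

In this sketch school X has the higher pass rate inside each group, yet school Y has the higher pass rate once the groups are pooled. The reversal depends on both conditions: a lurking variable and unequal group sizes.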
Homefun (formative/summative
assessment):

Read Simpson's Paradox
 When Big Data Sets Go Bad

Read "A closer Look at SAT Scores Decline". Summarize in a paragraph how Simpson's paradox might be involved.

Work exercises 20, 35, pages 25-26


Essential Question:
When using a number to describe a
complex event or object is
there a difference between using a single number and using a
single data point? 
Ch1.2 Describing Distributions
 Define distribution and state two key pieces
of information required to produce a distribution.
The pattern of variation of a single variable
 Quantitative data (numbers along the horizontal or x-axis)
 Frequency: how often various values are expected (along the vertical or y-axis)
 State the 3 key ways a distribution can be
described.
 Central tendency or center
 Spread or variability
 Shape
 Name and define the 3 key measures of central tendency.
 Mean = Σx_{i} / n, or Mean = (x_{1} + x_{2} + x_{3} + ... + x_{n}) / n
 Median: the midpoint, with 50% of the data above and 50% below
 Mode: the most common data value, or the highest peak

Given a set of data determine the
mean, median and mode.
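As a quick check on hand calculations, the three measures can be computed with Python's standard `statistics` module. The data set below is hypothetical, and `multimode` needs Python 3.8 or later.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data set

data_mean = mean(scores)       # sum of the values divided by n: 30 / 6
data_median = median(scores)   # average of the two middle values, 3 and 5
data_modes = multimode(scores) # list of the most common value(s)

print(data_mean, data_median, data_modes)
```

`multimode` returns a list because a data set can have more than one mode.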
 Define and ID outliers.
Outliers are data points thought to belong to a different distribution;
any influence they have on the properties of a distribution
is therefore an error.
 Data point not in distribution
 Gaps
Outliers and skew are not the same thing. Skew is part of a distribution; outliers are not.
Conclusions unduly influenced by a single data point are statistically indefensible! Such data points are outliers.
 State which measure of central tendency is
generally most influenced by outliers.
 Using the Mr. Rogers Rat Tail Rule, state
whether a distribution is skewed left or right, high or low.
The Mr. Rogers Rat Tail Rule: FAQ
Skewed distributions often look like a rat with a long tail.
The tail points in the direction of skew.
What gets skewed? The mean gets skewed or moved in the
direction the rat tail points.
Why does skew matter? For a skewed distribution, the
mean poorly represents the bulk of the data points.
What gets skewed very little? The median. It represents the bulk of the data points better than the mean.
 Give examples of data that would tend to be
symmetrical and data that would be skewed left or right.
 Easy test: skewed left (skewed low)
 Hard test: skewed right (skewed high)
 Normal test: symmetrical
 Incomes: skewed right (skewed high)
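The effect of a long right tail on the mean can be seen in a short sketch. The incomes below (in thousands of dollars) are invented for illustration.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars: bunched at the low end,
# with a long tail stretching toward high values (skewed right).
incomes = [20, 25, 30, 30, 35, 40, 50, 70, 100, 200]

center_mean = mean(incomes)      # pulled toward the long right tail
center_median = median(incomes)  # stays with the bulk of the data

print(center_mean, center_median)
```

Here the mean lands well above the median, so only three of the ten incomes are at or above the "average". That is the sense in which the mean poorly represents the bulk of a skewed distribution.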
Homefun
(formative/summative assessment): Read section 1.2

Stats Investigation:
School Evaluation, approx. 3 class periods (individual work)
Purpose:
Determine if it is reasonable
for 50% of all schools receiving a school report card to be scored below average.
Instructions:
Perform the simulation of school ratings using the Excel spreadsheet provided.
Questions/Conclusions:
(See Excel spreadsheet.)


Essential Question:
Is there a difference between
looking at tables of numbers and looking at plots or graphs of numbers? 

Make dot plots.
gasoline consumption analysis
Old Faithful analysis
foreign born analysis
Is IQ a bell-curve distribution?

Make histograms using the TI-83 calculator and in Minitab.
 State the key weakness of histograms (see "Four
Histograms").
Homefun
(formative/summative assessment):
work exercises
37, 41, 55, 57, pages 42-46
Essential Question:
Can the type of plot influence the
conclusions drawn and if so how can this be prevented? 
Stem and Leaf Plots
 Draw and interpret stem and leaf plots.
 clusters
 skew
 gaps
 multiple modes: these imply that the data come from more than one distribution.
 Draw and interpret back-to-back stem and leaf plots.
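For checking a hand-drawn plot, a stem and leaf plot can be generated with a short function. This is a minimal sketch that assumes non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines

data = [12, 15, 21, 23, 23, 28, 47, 49]  # hypothetical data
for line in stemplot(data):
    print(line)
```

Printing the empty stem 3 is a deliberate choice: it makes the gap between the cluster in the 10s and 20s and the values in the 40s visible.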
 State why a time plot should always
be used in an analysis of data.
Virtually everything is a function
of time.
Homefun
(formative/summative assessment): Read section 1.3; work exercises 45, 47, 49, pages 44-45

Essential Question:
Is there a difference between skew
and outliers? 
Box and Whiskers Plots
 Calculate quartiles, Q1 and Q3.
 Interpret 5 number summaries.
Low, Q1, Med., Q3, Hi
 Find the IQR or interquartile range
for a data set.
IQR = Q3 - Q1
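The five number summary and the IQR can be sketched as follows. Quartile conventions vary between textbooks and software. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, skipping the middle value when n is odd. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Low, Q1, median, Q3, high; quartiles are medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # values above it (middle value skipped when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

low, q1, med, q3, high = five_number_summary([1, 3, 4, 6, 7, 9, 12, 15, 20])
iqr = q3 - q1  # the width of the box in a box and whiskers plot

print(low, q1, med, q3, high, iqr)
```

A calculator or statistics package may use a different quartile rule and give slightly different Q1 and Q3 for small data sets.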
 Draw a box and whiskers plot.
 State the Mr. Rogers Rat Whisker
Rule for determining skew using a box and whiskers plot.
Long whisker indicates
direction of skew.
 State the % of the data expected in
each whisker and in the box for a box and whiskers plot.
25% in each whisker, 50% in the box
Homefun
(formative/summative assessment):

Essential Question:
Why are outliers important? 
Modified Box and Whiskers Plot
 Identify outliers using a modified
box and whiskers plot.
 Whisker's end = the most extreme data point within 1.5 × IQR of the box (below Q1 or above Q3)
 Outlier = any data point beyond the whisker's end
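The 1.5 × IQR rule can be sketched in code. This assumes the fences sit 1.5 × IQR below Q1 and above Q3, and that quartiles are the medians of the lower and upper halves of the sorted data. The function name `iqr_outliers` and the data are hypothetical.

```python
from statistics import median

def iqr_outliers(values):
    """Return (outliers, whisker_ends) using the 1.5 x IQR rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return outliers, (inside[0], inside[-1])

outliers, whiskers = iqr_outliers([1, 3, 4, 6, 7, 9, 12, 15, 40])
print(outliers, whiskers)
```

With this made-up data, Q1 = 3.5 and Q3 = 13.5, so the upper fence is 28.5. The value 40 is flagged as an outlier and the upper whisker ends at 15.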

Create box and whisker plots on the TI-83.
 Create and interpret parallel box and whisker plots on the TI-83 and in Minitab.
Note that a box and whiskers plot cannot detect gaps, clusters, or
multiple modes. Other types of graphs, such as dot plots, stem and leaf
plots, and histograms, can, but their ability to reveal patterns depends
on the interval size. There's no perfect plot for
visualizing distributions.
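The dependence on interval size can be illustrated by binning the same data two ways. This is a minimal sketch with invented whole-number data, and `bin_counts` is a hypothetical helper, not a calculator or Minitab function.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // width) * width for v in values)
    lo, hi = min(counts), max(counts)
    # List empty bins too, so gaps show up as zeros.
    return {edge: counts.get(edge, 0) for edge in range(lo, hi + width, width)}

data = [1, 2, 3, 4, 11, 12, 13, 14]  # hypothetical data with a gap between 4 and 11

narrow = bin_counts(data, 5)   # bins of width 5: the gap shows as an empty bin
wide = bin_counts(data, 20)    # one bin of width 20: the gap disappears

print(narrow)
print(wide)
```

The narrow bins show two clusters separated by an empty interval, while the wide bin lumps everything together. The data are the same; only the interval size changed.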
Formative assessments:
 Which type of plot(s)
is(are) best at identifying clusters?
 Which type of plot(s)
is(are) best at identifying multiple modes?
 Which type of plot(s)
is(are) best at identifying gaps?
Homefun
(formative/summative assessment): exercises 91, 93, 95, p. 71. Work the Chapter 1 Practice Test, TI.1 to TI.15, pages 78-81.

Essential Question:
Ideally, how many data points in a
set of data are needed to characterize spread? 
Standard Deviation
 Quantities written as Greek letters are the true values for the
whole population (known by Zeus).
 Quantities written in our normal alphabet (known by mere mortals) are
estimates of those true values, calculated from a sample.


Calculate the range and explain why it is a poor indicator of spread.
 Write the mathematical
definition for standard
deviation from memory and explain its meaning.

Calculated from an entire population:
σ = [ Σ(x_{i} - μ)^{2} / n ]^{1/2}

Calculated from a sample:
s = [ Σ(x_{i} - xbar)^{2} / (n - 1) ]^{1/2}
The standard deviation is a way to
express how much a typical data point differs from the mean but it is
weighted so that large deviations have more influence.
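The two definitions can be checked against each other in code. This sketch writes out the population and sample formulas directly and compares them with `pstdev` and `stdev` from Python's standard `statistics` module. The data set and the function names `population_sd` and `sample_sd` are made up for illustration.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """sigma: divide the sum of squared deviations by n."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s: divide the sum of squared deviations by n - 1."""
    xbar = sum(values) / len(values)
    return sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the mean is 5

sigma = population_sd(data)  # sum of squared deviations is 32, so sqrt(32 / 8)
s = sample_sd(data)          # sqrt(32 / 7), slightly larger than sigma

print(sigma, s)
print(pstdev(data), stdev(data))  # library versions of the same two formulas
```

Squaring each result gives the corresponding variance. Because each deviation is squared before averaging, the two largest deviations (from 2 and 9) contribute 25 of the 32 units in the sum.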
 State how standard deviation and
variance are related.
variance = (standard
deviation)^{2}

Calculate standard deviations by
hand and with a calculator

Explain the difference between s and σ (sigma).
 State why the standard deviation
is a better indicator of spread than range.
The standard deviation uses all the data points; the range uses only 2 points.
 State an approximate relationship
between range and standard deviation.
(Range is roughly 6σ.)
Rank the distributions shown here from lowest to highest standard deviation.
Formative assessment: What does a distribution with high or low standard deviation look like?
Homefun
(formative/summative assessment): exercises 97, 99, p. 72. Work the Chapter 1 Practice Test, TI.1 to TI.15, pages 78-81.

Essential Question:
How can I make an "A" on
the test? 
Exploring Data Review
 Work the practice test.
 Review the objectives.
 Correctly interpret 5 number
summaries.
 Look over
free response problems
from previous years.
 Memorize the mathematical definitions of variance and standard
deviation for samples and populations.
 Master the vocabulary (see example
below).
Descriptive Term           Comments
Central Tendency
    Mean                   Sensitive to outliers & skew
    Median                 Good when outliers or skew present
    Mode                   Rarely used
Spread
    Range                  Very sensitive to outliers & skew
    Variance               Sensitive to outliers & skew
    Standard deviation     Sensitive to outliers & skew
    IQR                    Good when outliers or skew present
Shape
    Symmetrical            Can have multiple peaks
    Skewed left            Skewed low, easy test
    Skewed right           Skewed high, hard test, income
Summative Assessment:
Test: Objectives 1-36
