Big Idea 3: Analyze and Summarize Data Sets - Math GR. 6-8

Analyze and Summarize Data Sets
Grade 8
Big Idea 3
Big Idea 3: Analyze and Summarize Data Sets
Select, organize and construct appropriate
data displays, including box and whisker
plots, scatter plots, and lines of best fit to
convey information and make conjectures
about possible relationships.
Determine and describe how changes in data
values impact measures of central
tendency.
Box and Whisker plots • Scatter plots • Lines of Best Fit
MA.8.S.3.1
Select, organize, and construct appropriate data displays,
including box-and-whisker-plots, scatter plots, and lines of
best fit to convey information and make conjectures about
possible relationships.
Graphical Displays
Taught in Previous
Grades
•Pictographs
•Histograms
•Bar graphs (single
and multiple)
•Stem and leaf plots
•Line graphs
•Circle graphs
• Box-and-whisker plots
• Scatter plots
• Lines of best fit
MA.8.S.3.1
Scatterplots and Lines of Best Fit
Annenberg Learner: “Against All Odds: Inside Statistics”
Video 8 “Describing Relationships” (28:35)
Discussion Questions
• 1. “Correlation does not imply causation.” Explain.
• 2. This video covers advanced statistics content, beyond
what your students are expected to know. What basics
must your students understand about linear regression in
order to master this standard?
• 3. Discuss the role of real-world situational context in
the interpretation of a scatterplot/regression.
• 4. What are some cautions about linear regression?
Scatterplots: Ask a Question
• Have you ever wondered
whether tall people have
longer arms than short
people?
Scatterplots: Ask a Question
Is there a positive association
between height and arm span?
Do people with above-average
arm spans tend to have above-average heights?
Do people with below-average
arm spans tend to have below-average heights?
Person #   Arm Span (cm)   Height (cm)      Person #   Arm Span (cm)   Height (cm)
1          156             162              13         177             173
2          157             160              14         177             176
3          159             162              15         178             178
4          160             155              16         184             180
5          161             160              17         188             188
6          161             162              18         188             187
7          162             170              19         188             182
8          165             166              20         188             181
9          170             170              21         188             192
10         170             167              22         194             193
11         173             185              23         196             184
12         173             176              24         200             186
• a. Measure the arm span (fingertip to
fingertip) and height (without shoes) to the
nearest centimeter for six people, including
yourself.
• b. Does the information you collected
generally support or reject the observation
you made?
• c. Identify the person in the table whose
arm span and height are closest to your own
arm span and height.
Take it further
• How strong is the positive
association between arm
span and height?
• How do you compare to
the “average” adult?
Take it further
• Are your arm span and height
above the averages of these
24 adults?
• How many of the 24 people
have above-average arm
spans?
• How many of the 24 people
have above-average heights?
Mean arm span = 175.5 cm
Mean height = 174.8 cm
Take it further
• Adding a vertical line to the scatter
plot that intersects the arm span (X)
axis at the mean, 175.5 cm,
separates the points into two
groups.
• Note that there are 12 arm spans
above the mean and 12 below. Will
this always happen? Why or why
not?
• What is true about anyone whose
point in the scatter plot appears to
the right of this line? What is true
about anyone whose point appears
to the left of this line?
Take it further
• Adding a horizontal line to
the scatter plot that
intersects the height (Y) axis at
the mean, 174.8 cm, also
separates the points into
two groups.
• What is true about anyone
whose scatter plot point
appears above this line?
How many such points are
there?
Take it further
                        Below-Average Arm Span   Above-Average Arm Span
Above-average height    2                        11
Below-average height    10                       1
Mean arm span = 175.5 cm
Mean height = 174.8 cm
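The means and the four quadrant counts can be checked with a short Python sketch. It assumes the 24 (arm span, height) pairs from the table earlier in this session; the variable names are illustrative only.

```python
# (arm span, height) in cm for persons 1-24, copied from the table
people = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]

mean_arm = sum(arm for arm, _ in people) / len(people)
mean_height = sum(height for _, height in people) / len(people)

# Count the points in each quadrant formed by the two mean lines
counts = {}
for arm, height in people:
    key = (
        "above" if arm > mean_arm else "below",       # arm span vs. its mean
        "above" if height > mean_height else "below", # height vs. its mean
    )
    counts[key] = counts.get(key, 0) + 1

print(round(mean_arm, 1), round(mean_height, 1))
for key, count in sorted(counts.items()):
    print(key, count)
```

Most points land in the "above, above" and "below, below" quadrants, which is what a positive association looks like.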
A New Challenge
The drawing below suggests that
a person's arm span should be
the same as her or his height -- in
which case, a person could be
considered a "square." Is this
correct?
Do most people have heights and
arm spans that are approximately
the same? That is, are most
people "square"?
Arm span = height?
A New Challenge
Why is this not the same
as establishing an
association between
height and arm span?
Arm span = height?
Take it further
• Compare the measurements for
the six heights and arm spans
you collected, including your
own.
• How many people are "squares"
-- i.e., their arm spans and
heights are the same? For how
many people are these
measurements approximately
the same?
• Consider the difference:
Height - Arm Span
If you know only that this difference is
positive, what does it tell you about a person?
What does it not tell you?
If you know that this difference is negative,
what does it tell you? What does it not tell
you?
If you know that this difference is 0, what does
it tell you?
• How many of the 24
people have heights
greater than their
arm spans?
• How many of the 24
people have heights
less than their arm
spans?
• How many of the 24
people have heights
equal to their arm
spans?
• Which three points
represent the
greatest differences
between height and
arm span?
• Other than the
points that fall on
the line Height =
Arm Span, which six
points represent the
smallest differences
between height and
arm span?
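The questions about Height - Arm Span can be explored with a small sketch, again assuming the 24 (arm span, height) pairs from the table earlier in this session. The names are illustrative, not part of the original activity.

```python
# (arm span, height) in cm for persons 1-24, copied from the table
people = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]

# Height minus arm span, keyed by person number
diffs = {person: height - arm for person, (arm, height) in enumerate(people, start=1)}

taller_than_span = [p for p, d in diffs.items() if d > 0]   # points above Height = Arm Span
shorter_than_span = [p for p, d in diffs.items() if d < 0]  # points below the line
squares = [p for p, d in diffs.items() if d == 0]           # points on the line

# Order people by the size of the difference, ignoring its sign
by_size = sorted(diffs, key=lambda p: abs(diffs[p]))
largest_three = by_size[-3:]
smallest_six_off_line = [p for p in by_size if diffs[p] != 0][:6]

print(len(taller_than_span), len(shorter_than_span), len(squares))
print(largest_three, smallest_six_off_line)
```

A positive difference only says that height exceeds arm span. It says nothing about whether the person is tall or short overall, which is why this comparison is different from the association studied earlier.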
Annenberg Learner: “Learning Math: Data Analysis, Statistics,
and Probability”
Session 7 Part C “Modeling Linear Relationships” (1:15)
• Which of the four people
have heights greater than
their arm spans?
• Which of the four people
have heights that are less
than their arm spans?
• Which of the four has the
greatest difference between
height and arm span?
• Which of the four has the
smallest difference between
height and arm span?
Line of Best Fit
“trend line”
Helps us describe the strength
and direction of the relationship
between two variables
Dependent
Independent
The tighter the scatter, the
better the line
describes/predicts the
relationship
Minimizes residuals
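"Minimizes residuals" is usually made precise as least squares: the slope and intercept are chosen so that the sum of the squared residuals is as small as possible. The sketch below assumes that interpretation and uses the arm span and height table from earlier in this session; the function names are hypothetical.

```python
arm_spans = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
             177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
heights = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
           173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (observed y - predicted y)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

slope, intercept = least_squares_line(arm_spans, heights)
best = sum_squared_residuals(arm_spans, heights, slope, intercept)
square_line = sum_squared_residuals(arm_spans, heights, 1.0, 0.0)  # Height = Arm Span

print(f"height = {slope:.3f} * arm_span + {intercept:.2f}")
print(f"squared residuals: fitted line {best:.1f}, Height = Arm Span {square_line:.1f}")
```

Comparing the two totals shows that the fitted line leaves smaller squared residuals than the line Height = Arm Span, even though neither line has to pass through any data point.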
Misconceptions About the Line of Best Fit
The line may go through some
data points, but not necessarily
any data points
Sometimes there are half the
observations above the line, and
half the observations below it,
but not always
Misconceptions About the Line of Best Fit
Just because you can draw a
“line of best fit” does not mean
the data is linear, or even that
there is necessarily a
relationship
You must consider the strength
of the regression, not just that a
regression was possible
Using the Correlation Coefficient to Evaluate the
Strength of a Regression
• To evaluate how well a regression describes or predicts
a relationship, we use the correlation coefficient, r.
• $-1 \le r \le 1$
• The closer r is to 1 or -1, the stronger the correlation.
(r = 1, perfect positive correlation; r = -1, perfect negative correlation)
• The sign of r matches the sign of the slope of the line of best fit and
gives the direction of the relationship.
Using the Correlation Coefficient to Evaluate the
Strength of a Regression
• Four distributions
whose r = 0.816
• For which ones is
the line a good
description of the
trend? A good
predictor of
future behavior?
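To make r concrete, here is a minimal sketch that computes the Pearson correlation coefficient from its standard formula. It uses the arm span and height table from earlier in this session, not the four distributions on this slide, whose data the slides do not list. The function name is hypothetical.

```python
from math import sqrt

arm_spans = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
             177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
heights = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
           173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(arm_spans, heights)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear association here, but r by itself does not show whether a line is a sensible description of the data. The scatter plot still has to be inspected.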
Shodor
Interactivate
http://www.shodor.org/interactivate/activities/Regression/
Shodor
Interactivate
• Plot points
• Click checkbox for “fit your own line”
• Attempt to place the line in a way which minimizes residuals
(Hint: click “Show Residuals”)
• Check your work by clicking checkbox for “display line of best fit”
NCTM
Illuminations
Activities: “Line of Best Fit”
http://illuminations.nctm.org/ActivityDetail.aspx?id=146
Almost identical to the Shodor
Interactivate applet
NCTM
Illuminations
• Enter coordinates as ordered pairs
• Click checkbox for “Student Fit”
• Attempt to place the line in a way which minimizes residuals
• Check your work by clicking checkbox for “Computer Fit”
GeoGebra Applet
MA.8.S.3.1
Box and Whisker Plots
Noodles: The Five Noodle Summary
• For the following activities, you will need
these materials:
• a package of spaghetti or linguine
• a metric ruler with millimeter markings
• three pieces of paper or cardboard
• a pen or pencil
Noodles: The Five Number Summary
• Break several spaghetti noodles into pieces to
obtain 11 noodles of varying lengths.
• Make sure that no two noodles in your set are the
same length.
• Draw a horizontal line on a piece of paper or
cardboard large enough to display all the noodles
in a row.
• Next, arrange the 11 noodles in order from
shortest to longest along the horizontal line.
Noodles: Minimum and Maximum
• Your arrangement should look something like this:
• Label the MINIMUM and the MAXIMUM noodle
length.
Noodles: A Two Number Summary
• This is your two-noodle summary:
• Make a vertical axis and label the min and the max.
Noodles: The Two Number Summary
• If you knew only the values of MAX and MIN,
describe some information you would not know
about the remaining nine noodles.
• Suppose someone asked you to
find the "typical" value of a noodle
in this data set. How would you
answer this question if you only
had the information from the Two-Number Summary?
Noodles: Finding the Median
• Begin with the 11 noodles
arranged in order from
shortest to longest.
• Remove two noodles at a
time, one from each end,
and put them to the side.
• Continue this process until only one noodle remains.
• This noodle is the median, which we'll label "Med."
Noodles: Understanding the Median
• If you could see only
the median noodle,
what would you know
about the other
noodles?
• If you could see only the median noodle,
describe some information you would not
know about the other noodles.
Noodles: The Three Number Summary
• If you could see only
the min, med, and
max, what would you
know about the other
noodles?
• If you could see only the min, med, and the
max noodles, describe some information you
still would not know about the other noodles.
Noodles: The Three Number Summary
• If we call the length of
the fourth noodle N4,
how does N4 compare
to Min, Med, and
Max?
• What wouldn't you know about N4 if you only
knew Min, Med, and Max?
Median: Even vs. Odd Data Sets
Noodles: Observations with Even Sample Sizes
• Add a 12th noodle, with a different length from
the other 11 noodles, to the original collection.
Arrange the noodles in order from shortest to
longest.
• Using the method of removing pairs of noodles
(the longest and the shortest), try to determine
the median noodle length. What happens?
Noodles: Finding the 5 Number Summary
• Return to the 12 noodles
arranged in order from
shortest to longest.
• Divide your noodles into
four groups with an equal
number of noodles in each.
• What is the median of the six noodles to the left of Q2?
• What is the median of the six noodles to the right of
Q2?
The Five Number Summary
Min, Q1, Med (Q2), Q3, and Max
The Five Number Summary
Min, Q1, Q2 (Med), Q3, Max
Using the information
given in this Five-Noodle Summary,
describe what you
know about the 12
noodles. For example,
what do you know
about the ninth
noodle, and what
information are you
still missing?
Noodles: Finding the 5 Number Summary
• To convert the Five-Noodle
Summary to the Five-Number Summary, use the
same procedure you've
followed throughout this
session.
• Add a vertical number line so that you can indicate the
lengths of the five noodles.
• Remove the noodles, and you're left with the Five-Number Summary.
Noodles: Finding the 5 Number Summary
• If N4 is the length of the
fourth noodle, what
information would you
know about N4 from the
Five-Number Summary?
• Ralph claims that the Five-Number Summary is enough
to know that N4 is closer to Q1 than it is to Med. He
says, "Since N4, N5, and N6 are all between Q1 and
Med, N4 has to be closer to Q1 than it is to Med." Is his
reasoning valid? Why or why not?
Median: Even vs. Odd Data Sets
Making Conjectures
• Explain how you would create a Five-Noodle
Summary for 14 noodles. How many noodles are
in each of the four groups?
• Explain how you would create a Five-Number
Summary for 15 noodles. How many numbers are
in each of the four groups?
Taking it Further
• How many numbers are in each of the four
groups if you started with 57 noodles? With
112 noodles? Can you find a rule that would
allow you to determine the number of values
in each group without creating a Five-Number
Summary?
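One possible rule can be tested with a short sketch. It assumes the convention used in the noodle activity: a noodle that itself serves as the median or as a quartile is set aside and does not belong to any of the four groups. The function name is hypothetical, and other quartile conventions would give different counts.

```python
def quartile_group_size(n):
    """Number of noodles in each of the four groups for n ordered noodles.

    Assumes a noodle that sits exactly at the median or at a quartile
    is set aside rather than counted in a group.
    """
    half = n // 2      # noodles on each side of the median
    return half // 2   # noodles on each side of Q1 (or Q3) within a half

for n in (12, 14, 15, 57, 112):
    print(n, "noodles ->", quartile_group_size(n), "per group")
```

Under this convention the two integer divisions by 2 collapse to a single integer division of n by 4.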
Taking it Further
• What information is
learned from the
interquartile range, the
length of the interval
between Q1 and Q3?
Creating A Boxplot from the
Five Number Summary
Creating a Box Plot
Now we'll look at how you can
represent the five-number summary
graphically, using a box plot. For this
activity, we will work with a set of 12
noodles with the following
measurements (in millimeters):
23 28 33 41 56 74 81 91 102 109 118 122
Creating a Box Plot
Notice these observations are
ordered. Why is it necessary to
order the data before creating a
five-number summary?
23 28 33 41 56 74 81 91 102 109 118 122
Ordering Observations
• Now determine min, Q1, median, Q3, and
max.
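The five values can be checked with a minimal Python sketch. It assumes the median-of-halves method from the noodle activity (Q1 is the median of the lower half and Q3 is the median of the upper half); the function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)      # the summary only makes sense on ordered data
    n = len(data)
    half = n // 2
    lower = data[:half]        # values below the median position
    upper = data[n - half:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

noodles = [23, 28, 33, 41, 56, 74, 81, 91, 102, 109, 118, 122]
low, q1, med, q3, high = five_number_summary(noodles)
print(low, q1, med, q3, high)  # 23 37.0 77.5 105.5 122
print("IQR =", q3 - q1)        # IQR = 68.5
```

With 12 values, each quartile falls between two noodles, so Q1, the median, and Q3 are averages of neighboring lengths rather than actual noodle lengths.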
Finding the Five Noodle Summary
Five Noodle Summary
Five Noodle Summary
Interpreting the Length of Boxes and Whiskers
• Long boxes and long whiskers
simply mean the data is more
spread out.
• Counterintuitively, data points
are less dense in a longer (more
‘spread out’) box section or whisker.
When one side is much longer than
the other, this indicates skewness
(lack of symmetry).
Why Interpreting Boxplots is “Hard” for Kids
• Students think: A “box” has
area (square) and volume
(rectangular prism)
• Students think: when a box
has more area/volume, it
means more “stuff” fits in/on
there. With boxplots this is
not true.
• By definition, every quartile
has the same* number of
observations.
Why Interpreting Boxplots is “Hard” for Kids
• A student sees a long box
and thinks: “oh, there are
lots of observations piled
up there…”
• … when, in truth, the
opposite is more likely!
The observations
are all spread out. This is
counterintuitive for many
middle schoolers.
Why Interpreting Boxplots is “Hard” for Kids
Indicators of large spread
• long ‘box’
• long whiskers
Indicators of clusters of data, consistent observations,
and small spread/narrow range
• short ‘box’
• short whiskers
Interpreting the Orientation of the Box Plot
• Box plots can be drawn
vertically or horizontally,
depending on whether you
display the Five-Number
Summary along a vertical or a
horizontal axis.
Interpreting the Length of the Box (IQR)
• The entire rectangle spans
the middle part of the ordered
data, from Q1 to Q3.
• How much of the data is
represented by the box?
• The box represents the middle
50% of the data; its length is the
interquartile range (IQR).
Interpreting the Length of the Whiskers
• The lower whisker extends from
Min to Q1. The length of this
whisker indicates the range of the
lowest (or, in this case, the
shortest) fourth of the ordered
data.
• The upper whisker extends from
Q3 to Max. The length of this
whisker indicates the range of the
highest (or, in this case, the
longest) fourth of the ordered
data.
A Note About
Language
• Note: Glencoe Math Connects
Plus textbooks refer to the
quartiles as “upper quartile”
and “lower quartile.”
• Florida’s Test Specifications
Manual says specifically “first
quartile” and “third quartile”
will be used.
Interpreting the Length of the Whiskers
• How do we interpret a boxplot
when the whiskers are an equal
length?
• How do we interpret a boxplot
when one of the whiskers is
much longer than the other?
Interactive Demonstration
Raisins: How many in each box?
Using the same scale for each plot, create a box plot for
each of the data sets below. Each is an ordered list of the
number of raisins in a group of boxes from a particular
brand.
Brand A
23 25 25 26 26 26
26 27 27 27 27 28
29 29 29 30 30 31
31 31 32 32 32 33
34 34 35 35 36 39
Brand B
17 22 24 24 25 25
25 25 26 26 26 26
26 26 27 27 27 27
28 29 29 29 29 29
29 30 30
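Before drawing the two box plots on a shared scale, the five-number summary for each brand can be computed with a short sketch. It assumes the median-of-halves method from the noodle activity, with the median itself left out of both halves when the count is odd. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # excludes the median when n is odd
    upper = data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

brand_a = [23, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 29, 29, 29,
           30, 30, 31, 31, 31, 32, 32, 32, 33, 34, 34, 35, 35, 36, 39]
brand_b = [17, 22, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26,
           27, 27, 27, 27, 28, 29, 29, 29, 29, 29, 29, 30, 30]

summary_a = five_number_summary(brand_a)
summary_b = five_number_summary(brand_b)
print("Brand A:", summary_a)   # (23, 27, 29.5, 32, 39)
print("Brand B:", summary_b)   # (17, 25, 26, 29, 30)
```

The two summaries already suggest what the plots will show: Brand B has a long lower whisker caused by the single box of 17 raisins, while Brand A's values sit higher overall.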
Multiple Box Plots
Multiple Box Plots
Annenberg Learner: “Learning Math: Data
Analysis, Statistics, and Probability”
Session 4, Part D
mean • median • mode
MA.8.S.3.2
Determine and describe how changes in data values
impact measures of central tendency.
NCTM Illuminations
The Effect of Outliers
• Consider these quiz scores. The two sets of scores above
are identical except for the first score. The set on the left
shows the actual scores. The set on the right shows what
would happen if the lowest score had been a zero instead.
The effect of outliers
• Here are Monica’s quiz
grades. What is her
average?
• If her teacher drops
her lowest score, what
is her new average?
• By how many points did
her average change?
Monica’s grades: 85, 83, 93, 76, 87, 88
Average now:
85.3333…
Average after lowest
score is dropped:
87.2
The effect of outliers
• Here are Jack’s grades.
What is his average?
• If his teacher drops his
lowest score, what is his
new average?
• By how many points did
his average change?
Jack’s grades: 85, 83, 93, 76, 87, 0
Average now:
70.666…
Average after lowest
score is dropped:
84.8
The effect of outliers
Monica: Average now: 85.333… Average after lowest score is dropped: 87.2
Jack: Average now: 70.666… Average after lowest score is dropped: 84.8
• Which student realized the most benefit from the
teacher dropping the lowest score? Why?
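The two averages for each student can be reproduced with a minimal sketch. The helper names are illustrative; the scores are the ones shown on the slides.

```python
def mean(scores):
    return sum(scores) / len(scores)

def drop_lowest(scores):
    """Return the scores with one copy of the lowest score removed."""
    remaining = list(scores)
    remaining.remove(min(remaining))
    return remaining

grades = {
    "Monica": [85, 83, 93, 76, 87, 88],
    "Jack": [85, 83, 93, 76, 87, 0],
}

gains = {}
for name, scores in grades.items():
    before = mean(scores)
    after = mean(drop_lowest(scores))
    gains[name] = after - before
    print(f"{name}: {before:.2f} -> {after:.2f} (change {after - before:+.2f})")
```

Jack's zero is far from the rest of his scores, so removing it moves his mean by about 14 points, while Monica's mean moves by less than 2.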
Thinking about measures of center
The median of five numbers is 15. The mode is 6. The
mean is 12. What are the five numbers?
6, 6, 15, n, n
$$\frac{6 + 6 + 15 + n + n}{5} = \frac{27 + 2n}{5} = 12$$
$$27 + 2n = 60$$
$$2n = 33$$
$$n = 16.5$$
Thinking about measures of center
The median of five numbers is 15. The mode is 6. The
mean is 12. What are the five numbers?
6, 6, 15, a, b
$$\frac{6 + 6 + 15 + a + b}{5} = \frac{27 + a + b}{5} = 12$$
$$27 + a + b = 60$$
$$a + b = 33$$
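Candidate answers to this puzzle can be checked with a short sketch. The helper `fits` is hypothetical and relies on `statistics.multimode` (Python 3.8 or later), treating "the mode is 6" as meaning that 6 is the only mode.

```python
from statistics import mean, median, multimode

def fits(numbers):
    """True if the numbers have mean 12, median 15, and 6 as the only mode."""
    return mean(numbers) == 12 and median(numbers) == 15 and multimode(numbers) == [6]

# Two equal unknowns: n = 16.5 makes the mean work, but 16.5 becomes a second mode
print(fits([6, 6, 15, 16.5, 16.5]))   # False

# Two different unknowns with a + b = 33, both greater than 15
print(fits([6, 6, 15, 16, 17]))       # True

# All whole-number pairs (a, b) with 15 < a < b and a + b = 33
whole_number_pairs = [(a, 33 - a) for a in range(16, 33)
                      if a < 33 - a and fits([6, 6, 15, a, 33 - a])]
print(whole_number_pairs)             # [(16, 17)]
```

If fractions or decimals are allowed, other pairs such as 15.5 and 17.5 also satisfy a + b = 33, so the whole-number answer is not the only one.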
Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests:
80 82 75 79
What score will she need to earn on the fifth test
for her test average (mean) to be an 80%?
A × N = T
          Average   Number   Total
Before    79        4        316
After     80        5        400
Jane has 316 points. She needs 400 points.
How many more does she need?
84 points
Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests:
80 82 75 79
What score will she need to earn on the fifth test
for her test average (mean) to be an 80%?
$$\frac{80 + 82 + 75 + 79 + n}{5} = 80$$
$$\frac{316 + n}{5} = 80$$
$$400 = 316 + n$$
$$n = 84$$
Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests:
80 82 75 79
There is one more test. Is there any way Jane can
earn an A in this class?
(Note: An “A” is 90% or above)
What measure of center
are we asking students to
consider?
Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests:
80 82 75 79
There is one more test. Is there any way Jane can earn an A in this
class? (An “A” is 90% or above)
$$\frac{80 + 82 + 75 + 79 + n}{5} = 90$$
$$\frac{316 + n}{5} = 90$$
$$450 = 316 + n$$
$$n = 134$$
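The "Average × Number = Total" reasoning can be written as a one-line function. This is a minimal sketch; `score_needed` is a hypothetical name, and the 100-point ceiling in the last comment is an assumption about how the tests are graded.

```python
def score_needed(scores, target_mean):
    """Score required on the next test so the overall mean equals target_mean."""
    total_needed = target_mean * (len(scores) + 1)   # Average x Number = Total
    return total_needed - sum(scores)

jane = [80, 82, 75, 79]
print(score_needed(jane, 80))   # 84
print(score_needed(jane, 90))   # 134, out of reach if a test is worth at most 100
```

The same function works for any target mean, so students can test their own "what if" questions.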
Missing Observations: Median
Here are Jane’s scores on her first 4 math tests:
80 82 75 79
What score will she need to earn on the fifth
test for the median of her scores to be an
80%?
75 79 80 82
Missing Observations: Median
What score will she need to earn on the fifth test for the median
of her scores to be an 80%?
75 79 80 82
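A minimal sketch for exploring this question: it appends each candidate fifth score to Jane's four scores and reports the resulting median. The list of candidates is an illustrative choice.

```python
from statistics import median

jane = [80, 82, 75, 79]
candidates = [70, 75, 79, 80, 81, 82, 83, 84]

# Median of all five scores for each possible fifth score
results = {score: median(jane + [score]) for score in candidates}

for score, med in results.items():
    verdict = "median is 80" if med == 80 else f"median is {med}"
    print(f"fifth score {score}: {verdict}")
```

Unlike the mean, the median does not pin down a single required score: among these candidates, every fifth score of 80 or higher leaves 80 as the middle value.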
Candidate fifth scores: 70? 75? 79? 80? 81? 82? 83? 84?
• Construct a collection of numbers that has the
following properties. If this is not possible,
explain why not.
mean = 6 median = 4 mode = 4
What is the fewest number of observations
needed to accomplish this?
• Construct a collection of numbers that has the
following properties. If this is not possible,
explain why not.
mean = 6 median = 6 mode = 4
What is the fewest number of observations
needed to accomplish this?
• Construct a collection of 5 counting numbers
that has the following properties. If this is not
possible, explain why not.
mean = 5 median = 5 mode = 10
What is the fewest number of observations
needed to accomplish this?
• Construct a collection of 5 real numbers that
has the following properties. If this is not
possible, explain why not.
mean = 5 median = 5 mode = 10
What is the fewest number of observations
needed to accomplish this?
• Construct a collection of 4 numbers that has
the following properties. If this is not
possible, explain why not.
mean = 6, mean > mode
• Construct a collection of 5 numbers that has
the following properties. If this is not
possible, explain why not.
mean = 6, mean > mode
Adding a constant k
mean = 5
median = 5.5
mode = 6
range = 8
Adding a constant k
Original value   Value + 2
5                7
6                8
7                9
9                11
2                4
4                6
1                3
6                8
Original: mean = 5, median = 5.5, mode = 6, range = 8
After adding 2: mean = 7, median = 7.5, mode = 8, range = 8
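The effect of adding a constant can be verified for several values of k with a short sketch. The `summarize` helper is hypothetical; the data set is the one on the slide.

```python
from statistics import mean, median, mode

def summarize(values):
    """Mean, median, mode, and range of a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
    }

data = [5, 6, 7, 9, 2, 4, 1, 6]
original = summarize(data)
print("original:", original)

for k in (2, 10, -3):
    shifted = summarize([x + k for x in data])
    print(f"after adding {k}:", shifted)
```

For each k, the mean, median, and mode all move by exactly k, while the range stays at 8 because every value moves by the same amount.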
Multiplying by a constant k
• Suppose a constant k is multiplied by each
value in a data set. How will this affect the
measures of center and spread?
5 6 7 9 2 4 1 6
mean = 5
median = 5.5
mode = 6
range = 8
Multiplying by a constant k
Original value   Value × 2
5                10
6                12
7                14
9                18
2                4
4                8
1                2
6                12
Original: mean = 5, median = 5.5, mode = 6, range = 8
After multiplying by 2: mean = 10, median = 11, mode = 12, range = 16
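Multiplying every value by a constant can be checked in the same way. This is a minimal sketch using positive multipliers only; the `summarize` helper is hypothetical, and the data set is the one on the slide.

```python
from statistics import mean, median, mode

def summarize(values):
    """Mean, median, mode, and range of a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
    }

data = [5, 6, 7, 9, 2, 4, 1, 6]
original = summarize(data)

for k in (2, 3, 10):   # positive whole-number multipliers only
    scaled = summarize([x * k for x in data])
    print(f"after multiplying by {k}:", scaled)
```

Here the measures of center and the range are all multiplied by k, in contrast to adding a constant, which leaves the range unchanged.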