Analyze and Summarize Data Sets
Grade 8 Big Idea 3

Big Idea 3: Analyze and Summarize Data Sets
Select, organize and construct appropriate data displays, including box and whisker plots, scatter plots, and lines of best fit to convey information and make conjectures about possible relationships.
Determine and describe how changes in data values impact measures of central tendency.
• Box and whisker plots
• Scatter plots
• Lines of best fit

MA.8.S.3.1
Select, organize, and construct appropriate data displays, including box-and-whisker plots, scatter plots, and lines of best fit to convey information and make conjectures about possible relationships.

Graphical Displays Taught in Previous Grades
• Pictographs
• Histograms
• Bar graphs (single and multiple)
• Stem and leaf plots
• Line graphs
• Circle graphs

• Box-and-whisker plots
• Scatter plots
• Lines of best fit

MA.8.S.3.1 Scatterplots and Lines of Best Fit

Annenberg Learner: “Against All Odds: Inside Statistics” Video 8 “Describing Relationships” (28:35)

Discussion Questions
• 1. “Correlation does not imply causation.” Explain.
• 2. This video covers advanced statistics content, beyond what your students are expected to know. What basics must your students understand about linear regression in order to master this standard?
• 3. Discuss the role of real-world situational context in the interpretation of a scatterplot/regression.
• 4. What are some cautions about linear regression?

Scatterplots: Ask a Question
• Have you ever wondered whether tall people have longer arms than short people?
• Is there a positive association between height and arm span?
• Do people with above-average arm spans tend to have above-average heights?
• Do people with below-average arm spans tend to have below-average heights?
Person #   Arm Span (cm)   Height (cm)
1          156             162
2          157             160
3          159             162
4          160             155
5          161             160
6          161             162
7          162             170
8          165             166
9          170             170
10         170             167
11         173             185
12         173             176
13         177             173
14         177             176
15         178             178
16         184             180
17         188             188
18         188             187
19         188             182
20         188             181
21         188             192
22         194             193
23         196             184
24         200             186

• a. Measure the arm span (fingertip to fingertip) and height (without shoes) to the nearest centimeter for six people, including yourself.
• b. Does the information you collected generally support or reject the observation you made?
• c. Identify the person in the table whose arm span and height are closest to your own arm span and height.

Take it further
• How strong is the positive association between arm span and height?
• How do you compare to the “average” adult?

Take it further
• Are your arm span and height above the averages of these 24 adults?
• How many of the 24 people have above-average arm spans?
• How many of the 24 people have above-average heights?
Mean arm span = 175.5 cm
Mean height = 174.8 cm

Take it further
• Adding a vertical line to the scatter plot that intersects the arm span (X) axis at the mean, 175.5 cm, separates the points into two groups.
• Note that there are 12 arm spans above the mean and 12 below. Will this always happen? Why or why not?
• What is true about anyone whose point in the scatter plot appears to the right of this line? What is true about anyone whose point appears to the left of this line?

Take it further
• Adding a horizontal line to the scatter plot that intersects the height (Y) axis at the mean, 174.8 cm, also separates the points into two groups.
• What is true about anyone whose scatter plot point appears above this line? How many such points are there?
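The two mean lines split the scatter plot into four regions. As a minimal sketch (the variable and function names are illustrative, not part of the activity), the means and the number of points in each region can be checked from the 24 measurements in the table:

```python
from collections import Counter
from statistics import mean

# (arm span, height) in cm for the 24 adults in the table
people = [
    (156, 162), (157, 160), (159, 162), (160, 155), (161, 160), (161, 162),
    (162, 170), (165, 166), (170, 170), (170, 167), (173, 185), (173, 176),
    (177, 173), (177, 176), (178, 178), (184, 180), (188, 188), (188, 187),
    (188, 182), (188, 181), (188, 192), (194, 193), (196, 184), (200, 186),
]

mean_arm = mean(arm for arm, _ in people)
mean_height = mean(height for _, height in people)


def region(arm, height):
    """Classify a point relative to the two mean lines.

    A value exactly equal to a mean would be counted as "below";
    that tie rule is an assumption (no such value occurs in this table).
    """
    arm_side = "above" if arm > mean_arm else "below"
    height_side = "above" if height > mean_height else "below"
    return (arm_side, height_side)


counts = Counter(region(arm, height) for arm, height in people)

print(f"mean arm span = {mean_arm:.1f} cm, mean height = {mean_height:.1f} cm")
for (arm_side, height_side), n in sorted(counts.items()):
    print(f"{arm_side}-average arm span, {height_side}-average height: {n}")
```

Most points fall in the "both above" or "both below" regions, which is what a positive association looks like when it is counted rather than eyeballed.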
Take it further
                        Below-Average Arm Span   Above-Average Arm Span
Above-average height    2                        11
Below-average height    10                       1
Mean arm span = 175.5 cm
Mean height = 174.8 cm

A New Challenge
The drawing below suggests that a person's arm span should be the same as her or his height, in which case a person could be considered a "square." Is this correct? Do most people have heights and arm spans that are approximately the same? That is, are most people "square"?
Arm span = height?
Why is this not the same as establishing an association between height and arm span?

Take it further
• Compare the measurements for the six heights and arm spans you collected, including your own.
• How many people are "squares" (i.e., their arm spans and heights are the same)? For how many people are these measurements approximately the same?
• Consider the difference: Height - Arm Span
  If you know only that this difference is positive, what does it tell you about a person? What does it not tell you?
  If you know that this difference is negative, what does it tell you? What does it not tell you?
  If you know that this difference is 0, what does it tell you?
• How many of the 24 people have heights greater than their arm spans?
• How many of the 24 people have heights less than their arm spans?
• How many of the 24 people have heights equal to their arm spans?
• Which three points represent the greatest differences between height and arm span?
• Other than the points that fall on the line Height = Arm Span, which six points represent the smallest differences between height and arm span?

Annenberg Learner: “Learning Math: Data Analysis, Statistics, and Probability” Session 7 Part C “Modeling Linear Relationships” (1:15)
• Which of the four people have heights greater than their arm spans?
• Which of the four people have heights that are less than their arm spans?
• Which of the four has the greatest difference between height and arm span?
• Which of the four has the smallest difference between height and arm span?

Line of Best Fit (“trend line”)
• Helps us describe the strength and direction of the relationship between two variables
• Dependent variable, independent variable
• The tighter the scatter, the better the line describes/predicts the relationship
• Minimizes residuals

Misconceptions About the Line of Best Fit
• The line may go through some data points, but it does not necessarily go through any of them
• Sometimes half the observations are above the line and half are below it, but not always
• Just because you can draw a “line of best fit” does not mean the data is linear, or even that there is necessarily a relationship
• You must consider the strength of the regression, not just that a regression was possible

Using the Correlation Coefficient to Evaluate the Strength of a Regression
• To evaluate how well a regression describes or predicts a relationship, we use the correlation coefficient, r.
• $-1 \le r \le 1$
  The closer r is to 1 or -1, the stronger the correlation. (r = 1, perfect positive correlation)
• The sign of r matches the sign of the slope of the line of best fit and tells the direction of the relationship.
• Four distributions whose r = 0.816
• For which ones is the line a good description of the trend? A good predictor of future behavior?
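To make "minimizes residuals" and r concrete, here is a minimal sketch. It assumes the line of best fit means the ordinary least-squares line (the one minimizing the sum of squared residuals) and that r is the Pearson correlation coefficient; the slides do not name either method. The function name `least_squares` is illustrative, and the data are the 24 arm span and height measurements from the earlier table.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Arm span (independent) and height (dependent), in cm, for the 24 adults
arm_spans = [156, 157, 159, 160, 161, 161, 162, 165, 170, 170, 173, 173,
             177, 177, 178, 184, 188, 188, 188, 188, 188, 194, 196, 200]
heights = [162, 160, 162, 155, 160, 162, 170, 166, 170, 167, 185, 176,
           173, 176, 178, 180, 188, 187, 182, 181, 192, 193, 184, 186]

slope, intercept, r = least_squares(arm_spans, heights)
print(f"height = {slope:.3f} * arm span + {intercept:.3f}")
print(f"r = {r:.3f}")

# Residual = observed height minus height predicted by the line
residuals = [h - (slope * a + intercept) for a, h in zip(arm_spans, heights)]
print(f"sum of residuals = {sum(residuals):.6f}")
```

A single number like r cannot show whether a line is the right shape for the data, so the scatter plot still needs to be examined alongside it.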
Shodor Interactivate
http://www.shodor.org/interactivate/activities/Regression/
• Plot points
• Click checkbox for “fit your own line”
• Attempt to place the line in a way which minimizes residuals (Hint: click “Show Residuals”)
• Check your work by clicking checkbox for “display line of best fit”

NCTM Illuminations Activities: “Line of Best Fit”
http://illuminations.nctm.org/ActivityDetail.aspx?id=146
Almost identical to the Shodor Interactivate applet
• Enter coordinates as ordered pairs
• Click checkbox for “Student Fit”
• Attempt to place the line in a way which minimizes residuals
• Check your work by clicking checkbox for “Computer Fit”

Line of Best Fit (“trend line”)
• Helps us describe the strength of the relationship between two variables
• Dependent variable, independent variable
• The tighter the scatter, the better the line describes/predicts the relationship
• Minimizes residuals

GeoGebra Applet

MA.8.S.3.1 Box and Whisker Plots

Noodles: The Five-Noodle Summary
• For the following activities, you will need these materials:
• a package of spaghetti or linguine
• a metric ruler with millimeter markings
• three pieces of paper or cardboard
• a pen or pencil

Noodles: The Five-Number Summary
• Break several spaghetti noodles into pieces to obtain 11 noodles of varying lengths.
• Make sure that no two noodles in your set are the same length.
• Draw a horizontal line on a piece of paper or cardboard large enough to display all the noodles in a row.
• Next, arrange the 11 noodles in order from shortest to longest along the horizontal line.

Noodles: Minimum and Maximum
• Your arrangement should look something like this:
• Label the MINIMUM and the MAXIMUM noodle length.

Noodles: A Two-Number Summary
• This is your two-noodle summary:
• Make a vertical axis and label the min and the max.

Noodles: The Two-Number Summary
• If you knew only the values of MAX and MIN, describe some information you would not know about the remaining nine noodles.
• Suppose someone asked you to find the "typical" value of a noodle in this data set. How would you answer this question if you only had the information from the Two-Number Summary?

Noodles: Finding the Median
• Begin with the 11 noodles arranged in order from shortest to longest.
• Remove two noodles at a time, one from each end, and put them to the side.
• Continue this process until only one noodle remains.
• This noodle is the median, which we'll label "Med."

Noodles: Understanding the Median
• If you could see only the median noodle, what would you know about the other noodles?
• If you could see only the median noodle, describe some information you would not know about the other noodles.

Noodles: The Three-Number Summary
• If you could see only the min, med, and max, what would you know about the other noodles?
• If you could see only the min, med, and the max noodles, describe some information you still would not know about the other noodles.
• If we call the length of the fourth noodle N4, how does N4 compare to Min, Med, and Max?
• What wouldn't you know about N4 if you only knew Min, Med, and Max?

Median: Even vs. Odd Data Sets

Noodles: Observations with Even Sample Sizes
• Add a 12th noodle, with a different length from the other 11 noodles, to the original collection. Arrange the noodles in order from shortest to longest.
• Using the method of removing pairs of noodles (the longest and the shortest), try to determine the median noodle length. What happens?

Noodles: Finding the Five-Number Summary
• Return to the 12 noodles arranged in order from shortest to longest.
• Divide your noodles into four groups with an equal number of noodles in each.
• What is the median of the six noodles to the left of Q2?
• What is the median of the six noodles to the right of Q2?
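The pair-removal method for finding the median can be sketched in a few lines. This is an illustration only: the noodle lengths below are example values in millimeters, and averaging the two remaining values when the count is even is the usual convention, assumed here rather than stated on the slides.

```python
def median_by_pair_removal(lengths):
    """Find the median by repeatedly removing the shortest and longest values."""
    if not lengths:
        raise ValueError("need at least one length")
    remaining = sorted(lengths)
    # Remove one value from each end until one or two values are left
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    # Odd count: one value remains. Even count: two remain, so average them.
    return sum(remaining) / len(remaining)


# Illustrative lengths in mm (assumed example values)
eleven_noodles = [23, 28, 33, 41, 56, 74, 81, 91, 102, 109, 118]
twelve_noodles = eleven_noodles + [122]

print(median_by_pair_removal(eleven_noodles))  # the single middle noodle
print(median_by_pair_removal(twelve_noodles))  # midpoint of the two middle noodles
```

With 11 noodles the process ends on one actual noodle; with 12 it ends on two, so the median is a length that no noodle in the set has.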
The Five-Number Summary
Min, Q1, Med (Q2), Q3, and Max
Using the information given in this Five-Noodle Summary, describe what you know about the 12 noodles. For example, what do you know about the ninth noodle, and what information are you still missing?

Noodles: Finding the Five-Number Summary
• To convert the Five-Noodle Summary to the Five-Number Summary, use the same procedure you've followed throughout this session.
• Add a vertical number line so that you can indicate the lengths of the five noodles.
• Remove the noodles, and you're left with the Five-Number Summary.

Noodles: Finding the Five-Number Summary
• If N4 is the length of the fourth noodle, what information would you know about N4 from the Five-Number Summary?
• Ralph claims that the Five-Number Summary is enough to know that N4 is closer to Q1 than it is to Med. He says, "Since N4, N5, and N6 are all between Q1 and Med, N4 has to be closer to Q1 than it is to Med." Is his reasoning valid? Why or why not?

Median: Even vs. Odd Data Sets

Making Conjectures
• Explain how you would create a Five-Noodle Summary for 14 noodles. How many noodles are in each of the four groups?
• Explain how you would create a Five-Number Summary for 15 noodles. How many numbers are in each of the four groups?

Taking it Further
• How many numbers are in each of the four groups if you started with 57 noodles? With 112 noodles? Can you find a rule that would allow you to determine the number of values in each group without creating a Five-Number Summary?
• What information is learned from the interquartile range, the length of the interval between Q1 and Q3?

Creating a Box Plot from the Five-Number Summary

Creating a Box Plot
Now we'll look at how you can represent the five-number summary graphically, using a box plot.
For this activity, we will work with a set of 12 noodles with the following measurements (in millimeters):
23 28 33 41 56 74 81 91 102 109 118 122

Creating a Box Plot
Notice these observations are ordered. Why is it necessary to order the data before creating a five-number summary?

Ordering Observations
• Now determine min, Q1, median, Q3, and max.

Finding the Five-Noodle Summary

Interpreting the Length of Boxes and Whiskers
• Long boxes and long whiskers simply mean the data is more spread out.
• Counterintuitively, data points are less dense in a longer (more ‘spread out’) region or tail; a longer region on one side indicates skewness (lack of symmetry).

Why Interpreting Boxplots is “Hard” for Kids
• Students think: A “box” has area (square) and volume (rectangular prism)
• Students think: when a box has more area/volume, it means more “stuff” fits in/on there. With boxplots this is not true.
• By definition, every quartile has the same* number of observations.
• Student sees a long box and thinks: “oh, there are lots of observations piled up there…”
• … when, in truth, the opposite is more likely! The observations are all spread out. This is counterintuitive for many middle schoolers.

Why Interpreting Boxplots is “Hard” for Kids
Indicators of large spread
• long ‘box’
• long whiskers
Indicators of clustered data (consistent observations, small spread/narrow range)
• short ‘box’
• short whiskers

Interpreting the Orientation of the Box Plot
• Box plots can be drawn vertically or horizontally, depending on whether you display the Five-Number Summary along a vertical or a horizontal axis.

Interpreting the Length of the Box (IQR)
• The entire rectangle indicates the range of the middle part (the interquartile range) of the ordered data.
• How much of the data is represented by the box?
• The interquartile range (IQR) represents the middle 50% of the data.
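As a check on the five-number summary for the 12 noodle lengths, here is a minimal sketch. It follows the method used in this session, taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data. Leaving the median out of both halves when the count is odd is an assumption, since quartile conventions vary between textbooks and software, and the function name is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2:]   # values above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


noodles_mm = [23, 28, 33, 41, 56, 74, 81, 91, 102, 109, 118, 122]

minimum, q1, med, q3, maximum = five_number_summary(noodles_mm)
iqr = q3 - q1  # length of the box

print(f"Min = {minimum}, Q1 = {q1}, Med = {med}, Q3 = {q3}, Max = {maximum}")
print(f"IQR = {iqr}")
```

For these noodles the summary is Min = 23, Q1 = 37, Med = 77.5, Q3 = 105.5, Max = 122, so the box is 68.5 mm long even though it holds the same number of noodles as the two whiskers combined.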
Interpreting the Length of the Whiskers
• The lower whisker extends from Min to Q1. The length of this whisker indicates the range of the lowest (or, in this case, the shortest) fourth of the ordered data.
• The upper whisker extends from Q3 to Max. The length of this whisker indicates the range of the highest (or, in this case, the longest) fourth of the ordered data.

A Note About Language
• Note: Glencoe Math Connects Plus textbooks refer to the quartiles as “upper quartile” and “lower quartile.”
• Florida’s Test Specifications Manual says specifically “first quartile” and “third quartile” will be used.

Interpreting the Length of the Whiskers
• How do we interpret a boxplot when the whiskers are an equal length?
• How do we interpret a boxplot when one of the whiskers is much longer than the other?

Interactive Demonstration

Raisins: How many in each box?
Using the same scale for each plot, create a box plot for each of the data sets below. Each is an ordered list of the number of raisins in a group of boxes from a particular brand.

Brand A
23 25 25 26 26 26
26 27 27 27 27 28
29 29 29 30 30 31
31 31 32 32 32 33
34 34 35 35 36 39

Brand B
17 22 24 24 25 25
25 25 26 26 26 26
26 26 27 27 27 27
28 29 29 29 29 29
29 30 30

Multiple Box Plots

Annenberg Learner: “Learning Math: Data Analysis, Statistics, and Probability” Session 4, Part D

Mean, median, mode

MA.8.S.3.2
Determine and describe how changes in data values impact measures of central tendency.

NCTM Illuminations

The Effect of Outliers
• Consider these quiz scores. The two sets of scores above are identical except for the first score. The set on the left shows the actual scores. The set on the right shows what would happen if the lowest score had been a zero instead.

The effect of outliers
• Here are Monica’s quiz grades. What is her average?
• If her teacher drops her lowest score, what is her new average?
• By how many points did her average change?
Monica’s grades: 85 83 93 76 87 88
Average now: 85.333…
Average after lowest score is dropped: 87.2

The effect of outliers
• Here are Jack’s grades. What is his average?
• If his teacher drops his lowest score, what is his new average?
• By how many points did his average change?
Jack’s grades: 85 83 93 76 87 0
Average now: 70.666…
Average after lowest score is dropped: 84.8

The effect of outliers
            Average now   Average after lowest score is dropped
Monica      85.333…       87.2
Jack        70.666…       84.8
• Which student realized the most benefit from the teacher dropping the lowest score? Why?

Thinking about measures of center
The median of five numbers is 15. The mode is 6. The mean is 12. What are the five numbers?
6  6  15  n  n
$\frac{6 + 6 + 15 + n + n}{5} = 12$
$\frac{27 + 2n}{5} = 12$
$27 + 2n = 60$
$2n = 33$
$n = 16.5$

Thinking about measures of center
The median of five numbers is 15. The mode is 6. The mean is 12. What are the five numbers?
6  6  15  a  b
$\frac{6 + 6 + 15 + a + b}{5} = 12$
$\frac{27 + a + b}{5} = 12$
$27 + a + b = 60$
$a + b = 33$

Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests: 80 82 75 79
What score will she need to earn on the fifth test for her test average (mean) to be an 80%?
A × N = T
          Average   Number   Total
Before    79        4        316
After     80        5        400
Jane has 316 points. She needs 400 points. How many more does she need? 84 points

Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests: 80 82 75 79
What score will she need to earn on the fifth test for her test average (mean) to be an 80%?
$\frac{80 + 82 + 75 + 79 + n}{5} = 80$
$\frac{316 + n}{5} = 80$
$316 + n = 400$
$n = 84$

Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests: 80 82 75 79
There is one more test. Is there any way Jane can earn an A in this class?
(Note: An “A” is 90% or above)
What measure of center are we asking students to consider?

Missing Observations: Mean
Here are Jane’s scores on her first 4 math tests: 80 82 75 79
There is one more test. Is there any way Jane can earn an A in this class? (An “A” is 90% or above)
$\frac{80 + 82 + 75 + 79 + n}{5} = 90$
$\frac{316 + n}{5} = 90$
$316 + n = 450$
$n = 134$

Missing Observations: Median
Here are Jane’s scores on her first 4 math tests: 80 82 75 79
What score will she need to earn on the fifth test for the median of her scores to be an 80%?
Ordered scores: 75 79 80 82
Possible fifth scores: 70? 75? 79? 80? 81? 82? 83? 84?

• Construct a collection of numbers that has the following properties. If this is not possible, explain why not.
  mean = 6, median = 4, mode = 4
  What is the fewest number of observations needed to accomplish this?
• Construct a collection of numbers that has the following properties. If this is not possible, explain why not.
  mean = 6, median = 6, mode = 4
  What is the fewest number of observations needed to accomplish this?
• Construct a collection of 5 counting numbers that has the following properties. If this is not possible, explain why not.
  mean = 5, median = 5, mode = 10
  What is the fewest number of observations needed to accomplish this?
• Construct a collection of 5 real numbers that has the following properties. If this is not possible, explain why not.
  mean = 5, median = 5, mode = 10
  What is the fewest number of observations needed to accomplish this?
• Construct a collection of 4 numbers that has the following properties. If this is not possible, explain why not.
  mean = 6, mean > mode
• Construct a collection of 5 numbers that has the following properties. If this is not possible, explain why not.
  mean = 6, mean > mode

Adding a constant k
5 6 7 9 2 4 1 6
mean = 5
median = 5.5
mode = 6
range = 8

Adding a constant k
Original data: 5 6 7 9 2 4 1 6 (mean = 5, median = 5.5, mode = 6, range = 8)
5 + 2 = 7
6 + 2 = 8
7 + 2 = 9
9 + 2 = 11
2 + 2 = 4
4 + 2 = 6
1 + 2 = 3
6 + 2 = 8
mean = 7
median = 7.5
mode = 8
range = 8

Multiplying by a constant k
• Suppose a constant k is multiplied by each value in a data set. How will this affect the measures of center and spread?
5 6 7 9 2 4 1 6
mean = 5
median = 5.5
mode = 6
range = 8

Multiplying by a constant k
Original data: 5 6 7 9 2 4 1 6 (mean = 5, median = 5.5, mode = 6, range = 8)
5 × 2 = 10
6 × 2 = 12
7 × 2 = 14
9 × 2 = 18
2 × 2 = 4
4 × 2 = 8
1 × 2 = 2
6 × 2 = 12
mean = 10
median = 11
mode = 12
range = 16
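
The pattern in the two examples above can be checked for any constant. The following is a minimal sketch using the same eight data values and k = 2. The helper name `summarize` is illustrative, and `statistics.mode` is assumed to be adequate here because the data set has a single mode.

```python
from statistics import mean, median, mode


def summarize(values):
    """Return the measures of center and the range for a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
    }


data = [5, 6, 7, 9, 2, 4, 1, 6]
k = 2

original = summarize(data)
shifted = summarize([x + k for x in data])  # add k to every value
scaled = summarize([x * k for x in data])   # multiply every value by k

for label, summary in [("original", original),
                       ("plus k", shifted),
                       ("times k", scaled)]:
    print(label, summary)
```

Adding k moves the mean, median, and mode up by k and leaves the range unchanged; multiplying by k multiplies all four by k. Trying other values of k, including negative ones, is a quick way to test the conjecture before reasoning about why it holds.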