BIOL 283 Lab 5: Confidence Intervals

Lab Objectives:
1. Review frequency distributions
2. Practice making confidence intervals for population means
3. Practice making confidence intervals for differences in population means
4. Get a feel for the R language
5. Develop a resourceful attitude

Background:
A species of bear, Ursus gumbis, is found in both North America and Europe. It is believed that the North American bears grow larger than the European bears (although the reason for this phenomenon is unclear). It is also believed that they have different distributions of color morphs. You and your team of researchers will sample bears from the Kroger National Forest of Ohio and the Black Forest of Germany to (1) determine their color and (2) measure their mass (you might try to imagine what this would really involve). In this and subsequent labs, you will perform inferential statistical tests to determine if current conjectures are substantiated. Make sure to keep these data handy for the future!

Part I. Defining two populations.
Open up one bag of gummi bears, representing one entire population (from either North America or Europe). Pour the gummi bears into a plastic cup. One by one, weigh each bear on the provided scale (make sure that the scale is zeroed), by placing it in the provided plastic dish. Once measured, record the mass and color in the table provided in the Excel file found on Blackboard. Place the measured gummi bear in another cup (do not return it to the source cup). After every bear is measured, repeat with the second population. After both populations are measured and all data are collected, simulate a mass extinction event. Hint: using extra cups to sort the bears by color will save time later.

Make sure to save a file with a name that will allow you to find it again. (You might consider saving it on the desktop during lab. At the end of lab, save it to a jump drive or email it to yourself as a back-up.)
Have one person per group email the file to the instructor before leaving class.

Part II. Summarizing populations.
The Excel spreadsheet asks you to create frequency and relative frequency tables, as well as find some summary statistics for the populations. In R, create variables for color and mass, for each population, and make sure that you can re-create the same information in tabular or graphical form. This exercise might require you to refer to previous labs to find the appropriate functions, or to look in the R help files.

How did you create these variables?

Copy and paste tabular or graphical output below.

Part III. Creating population confidence intervals.
Recall that the way to create a sample from a population in R is to do something like the following:

    a.s.10 = sample(mass.American.bears, 10)
    e.s.10 = sample(mass.European.bears, 10)

Explain whether your populations and/or your samples are reasonably normally distributed. (Hint: you might need to recall the function for normal probability plots.)

Create two samples of mass, one from each bear population, of the same size (you decide the sample size), and record the statistics in the table below.

Sample Statistics     American     European
n:
$\bar{y}$:
s:
SE:

Based on the sample size you chose, find the $t_{0.025}$ value from a t-table and write it below.

$t_{0.025}$:

As you now have all the information you need, calculate a 95% confidence interval for each population mean, and record it in the table below.

                                                      American     European
LCL = $\bar{y} - t_{0.025} \times \mathrm{SE}$:
UCL = $\bar{y} + t_{0.025} \times \mathrm{SE}$:

Where do the population means fall relative to the confidence intervals? Explain.

Part IV. Varying sample size and α.
Using the accompanying R script, repeat the procedure above multiple times and record the effects below. You can choose one population or comment on how the choice of population alters your conclusion.
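For reference, the Part III calculation that is being repeated here might look like the sketch below for a single sample. This is only an illustration, not the accompanying R script: the population values are made-up placeholders (replace them with the masses you recorded in Part I), the names `n`, `y.bar`, `SE`, `t.crit`, `LCL`, and `UCL` are chosen for this example, and `qt()` is used in place of a printed t-table.

```r
# Placeholder population: made-up masses in grams, NOT real lab data.
# Replace this vector with the masses you recorded in Part I.
set.seed(283)
mass.American.bears = round(rnorm(60, mean = 2.3, sd = 0.15), 2)

n = 10                                  # sample size (your choice)
a.s = sample(mass.American.bears, n)    # random sample, without replacement

y.bar = mean(a.s)                       # sample mean
s = sd(a.s)                             # sample standard deviation
SE = s / sqrt(n)                        # standard error of the mean

alpha = 0.05                            # 95% confidence
t.crit = qt(1 - alpha / 2, df = n - 1)  # t0.025 with n - 1 degrees of freedom

LCL = y.bar - t.crit * SE               # lower confidence limit
UCL = y.bar + t.crit * SE               # upper confidence limit
print(c(LCL = LCL, UCL = UCL))

# Cross-check against R's built-in one-sample t interval
print(t.test(a.s, conf.level = 1 - alpha)$conf.int)
```

Changing `n` or `alpha` and re-running the lines from `a.s = ...` onward is one way to see how the interval width responds before you work through the questions.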
Make sure you read the directions in the R script!

What is the effect of increasing sample size for a constant value of α?

What is the effect of changing α for small sample sizes? For large sample sizes?

What general conclusions can you make about estimation of population means?

Part V. Comparing two population means.
From the sample statistics table in Part III and the formula in your textbook or notes, make the calculations that allow you to fill in the following table.

$\bar{y}_A - \bar{y}_E$:
Pooled SE:
df:

As you now have all the information you need, calculate a 95% confidence interval for the difference in population means, and record it in the table below.

LCL = $(\bar{y}_A - \bar{y}_E) - t_{0.025} \times \mathrm{SE}$:
UCL = $(\bar{y}_A - \bar{y}_E) + t_{0.025} \times \mathrm{SE}$:

Does this 95% confidence interval contain the true difference between population means? If so (or if not), what does it suggest about your confidence interval?

Does this 95% confidence interval contain 0? If it did contain 0, what would that say about the two population means?

Using the second accompanying R script, repeat the procedure as in Part IV, this time also varying the sample sizes, and answer the questions below.

What is the effect of increasing sample size for a constant value of α?

What is the effect of changing the similarity of sample sizes?

What is the effect of changing α for small sample sizes? For large sample sizes?

What general conclusions can you make about estimation of the difference between population means?

CHALLENGE
What would happen if you took two samples of the same size from one population and calculated a 95% confidence interval for the difference in population means? What is the expected difference? What do you notice about the confidence interval? If you repeat the procedure 100 times, how many times does the confidence interval not contain 0? How does this compare to α?
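One way to set up the challenge in R is sketched below. It is a sketch under stated assumptions, not the accompanying script: `diff.ci` is a hypothetical helper written for this example, the population is a made-up placeholder vector, and the pooled SE is assumed to be the classical pooled-variance form with df = n1 + n2 - 2 (check this against the formula in your textbook or notes, which may differ).

```r
# Hypothetical helper (not part of the lab scripts): confidence interval for a
# difference in means, ASSUMING the pooled-variance SE with df = n1 + n2 - 2.
diff.ci = function(x1, x2, alpha = 0.05) {
  n1 = length(x1)
  n2 = length(x2)
  df = n1 + n2 - 2
  s2.pooled = ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / df  # pooled variance
  SE = sqrt(s2.pooled * (1 / n1 + 1 / n2))                    # pooled SE
  t.crit = qt(1 - alpha / 2, df)                              # e.g. t0.025
  d = mean(x1) - mean(x2)                                     # observed difference
  c(LCL = d - t.crit * SE, UCL = d + t.crit * SE)
}

# Placeholder population: made-up masses in grams, NOT real lab data.
set.seed(283)
population = round(rnorm(60, mean = 2.3, sd = 0.15), 2)

n = 10        # size of each sample
reps = 100    # number of repetitions
misses = 0    # intervals that do not contain 0

for (i in 1:reps) {
  x1 = sample(population, n)   # two samples from the SAME population
  x2 = sample(population, n)
  ci = diff.ci(x1, x2)
  if (ci["LCL"] > 0 || ci["UCL"] < 0) misses = misses + 1
}

print(misses)          # how many of the 100 intervals exclude 0
print(misses / reps)   # proportion, for comparison with alpha
```

Removing `set.seed(283)` and re-running gives a fresh set of 100 intervals each time, which shows how much the count itself varies from run to run.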