STA 200 Spring 2011
CHAPTER 2
Objective
We want to be able to extrapolate results
from a sample to the population at large.
In order to do this (and reach meaningful
conclusions), the sample should be
representative of the population.
Bad Sampling
Convenience Sampling
Select the individuals who are the easiest to reach
Voluntary Response Sampling
The sample selects itself via response to a general
appeal (call-in polls, write-in polls)
Example (Convenience)
Suppose you want to find out if UK faculty
members think there should be more math and
statistics as part of the USP requirements.
To obtain the sample, suppose you visit faculty
members on the 7th, 8th, and 9th floors of the
Patterson Office Tower (where the math and
statistics departments are located).
What’s wrong with this?
Example (Voluntary Response)
Consider a write-in poll concerning a
maximum salary for athletes/actors.
Some people are going to be more motivated
than others to participate in the poll. What
kind of opinion might they have?
Bias
When using a bad sampling method, you tend to get
biased results. (With regard to percentages,
this means the sample percentage is systematically
higher or lower than the true population percentage.)
Bias occurs when certain outcomes are
systematically favored because the population is
incorrectly represented by the sample.
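To see bias in numbers, here is a small simulation of the faculty example. The population below is entirely hypothetical: 1,000 faculty, of whom 100 work on the math/statistics floors (90 in favor) and 900 work elsewhere (270 in favor), so the true percentage in favor is 36%. Because only 10 math/stat faculty are against, any convenience sample of 50 drawn from those floors is at least 80% in favor, while an SRS of 50 from everyone varies from run to run but tends to land near 36%.

```python
import random

# Hypothetical population of 1,000 faculty (department, in_favor).
# Assumed counts: math/stat floors 90 for / 10 against; elsewhere 270 for / 630 against.
population = (
    [("math_stat", True)] * 90
    + [("math_stat", False)] * 10
    + [("other", True)] * 270
    + [("other", False)] * 630
)

def percent_in_favor(sample):
    """Percentage of individuals in the sample who are in favor."""
    return 100 * sum(in_favor for _, in_favor in sample) / len(sample)

# Convenience sample: 50 faculty, all taken from the math/stat floors.
math_stat_faculty = [person for person in population if person[0] == "math_stat"]
convenience = random.sample(math_stat_faculty, 50)

# Simple random sample: 50 faculty chosen from the whole population.
srs = random.sample(population, 50)

print(f"Population:         {percent_in_favor(population):.0f}%")
print(f"Convenience sample: {percent_in_favor(convenience):.0f}%")
print(f"SRS:                {percent_in_favor(srs):.0f}%")
```

Running it several times shows the convenience estimate stuck far above 36%, which is exactly the systematic overstatement described above.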
Good Sampling
Simple Random Sample
Consists of n individuals chosen in such a way that
every set of n individuals has the same chance of
being selected
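Choosing a sample randomly significantly
reduces bias, because neither convenience nor
anyone's personal choice decides who ends up
in the sample. In other words, the sample will
tend to reflect the population much better.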
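One way to check the definition of an SRS is to draw many samples and count how often each set of individuals turns up. This sketch assumes a tiny hypothetical population of 4 individuals and samples of size 2, so there are 6 possible sets; Python's `random.sample` picks distinct individuals at random, and each set should appear about 1/6 of the time.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]   # hypothetical population of 4 individuals
n = 2                               # sample size
trials = 60_000

# Sort each sample so that {A, B} and {B, A} are counted as the same set.
counts = Counter(tuple(sorted(random.sample(population, n))) for _ in range(trials))

# Every one of the C(4, 2) = 6 possible sets should have roughly equal frequency.
for subset in combinations(population, n):
    print(subset, round(counts[subset] / trials, 3))
```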
Choosing an SRS
Nowadays, an SRS is usually chosen using a
computer. However, we can also use a table of
random digits (like the one in the back of the
textbook).
The process:
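Assign a numerical label to each individual in the
population. Make sure all of the labels are the same
length.
Use software or a table of random digits to select
labels. With a table, read the digits in groups the
same length as the labels, skipping any group that
is not a label or that has already been chosen.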
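The two steps of the process can be sketched in Python as follows. The function names `assign_labels` and `choose_srs` are hypothetical, and the sketch assumes labels start at 1 and are zero-padded to the number of digits in the population size (some textbooks start at 0 instead). The "software" step uses the standard library's `random.sample`.

```python
import random

def assign_labels(population):
    """Step 1: give each individual a numerical label, all with the same length."""
    width = len(str(len(population)))          # e.g. 500 individuals -> 3 digits
    return {f"{i:0{width}d}": person for i, person in enumerate(population, start=1)}

def choose_srs(population, n):
    """Step 2: let software pick n distinct labels at random."""
    labels = assign_labels(population)
    chosen = random.sample(sorted(labels), n)
    return {label: labels[label] for label in chosen}

boxes = [f"box #{k}" for k in range(1, 501)]   # hypothetical shipment of 500 boxes
print(choose_srs(boxes, 5))
```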
Example (Using a Table of Random Digits)
A food distributor wants to know if the boxes
of cereal in a particular shipment contain the
correct amount of cereal. The distributor
intends to randomly select five boxes out of a
shipment of 500 and weigh them.
What labels should we use?
Example (cont.)
Use the following line from the table to pick
the SRS:
19223 95034 05756 28713 96409 12531 …
Now use a different line to pick another SRS:
05007 16632 81194 14873 04197 85576 …
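Reading the table by hand can be checked with a short script. This sketch assumes the boxes are labeled 001 to 500 (three digits each) and uses only the digits printed above, not the elided "…" part of each line; `srs_from_table` is a hypothetical helper that reads three digits at a time and skips groups outside 001 to 500 as well as repeats. If a line runs out before n labels are found, it returns fewer, and you would continue on the next line of the table.

```python
def srs_from_table(digit_line, n, num_labels, width):
    """Read digits in groups of `width`, keeping labels 1..num_labels.

    Groups outside the label range, and labels already chosen, are skipped.
    """
    digits = "".join(ch for ch in digit_line if ch.isdigit())
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = digits[start:start + width]
        if 1 <= int(label) <= num_labels and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

line_1 = "19223 95034 05756 28713 96409 12531"
line_2 = "05007 16632 81194 14873 04197 85576"
print(srs_from_table(line_1, 5, 500, 3))  # ['192', '239', '405', '287', '139']
print(srs_from_table(line_2, 5, 500, 3))  # ['050', '071', '281', '194', '148']
```

In the first line, 503 and 756 are skipped because no box has those labels; in the second, 663 is skipped. Different lines of the table give different samples.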
Trusting a Sample
If an SRS (or another, more complicated, good sampling
method) is used, the sample should be quite
representative of the population.
If a poor sampling method is used, there is no such
assurance.
Thus, if we try to extrapolate results from a
poorly obtained sample to the entire population,
the conclusions we reach are likely to be rubbish.