Hypothesis Testing - Step 1

6.6 The Central Limit Theorem
6.6.1 State the Central Limit Theorem
The Central Limit Theorem states that for large random samples, the sampling
distribution of the sample means is close to a normal probability distribution.
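A small simulation can make the theorem concrete. The sketch below is illustrative only: the exponential population, the sample size of 40, and the number of repetitions are assumptions chosen for the demonstration, not values from these notes. It draws many samples from a positively skewed population and checks that the sample means center on the population mean, with a spread close to the population standard deviation divided by the square root of n.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from a skewed population; return their means."""
    rng = random.Random(seed)
    # Assumed population: exponential with mean 1 and standard deviation 1.
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = simulate_sample_means(n=40, reps=5000)
center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(40), about 0.158
print(round(center, 2), round(spread, 2))
```

Even though the individual observations are strongly skewed, a histogram of `means` would look roughly bell-shaped.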
6.6.2 Apply the Central Limit Theorem to make predictions about and calculate
probabilities for sample means.
The following steps use the Central Limit Theorem to calculate the
probability for a given sample mean.
Step 1: Calculate the z-score for the sample mean, $\bar{X}$, using the following formula:
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
(if the POPULATION standard deviation, $\sigma$, is known)
where: $\bar{X}$ : is the SAMPLE mean
$\mu$ : is the population mean
$\sigma$ : is the population standard deviation
$n$ : the sample size
OR
$$z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
(if the SAMPLE standard deviation is known)
where: $\bar{X}$ : is the SAMPLE mean
$\mu$ : is the population mean
$s$ : is the sample standard deviation
$n$ : the sample size
Step 2: Look up the probability in Appendix D and determine the desired
probability using the same methods as before.
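The two steps can also be sketched in code. This is a minimal illustration, not a replacement for Appendix D: the function names are hypothetical, and Python's standard-library `statistics.NormalDist` stands in for the table lookup. The numbers are taken from Example 1 in the Examples list (mean 60, standard deviation 12, sample of 9).

```python
from math import sqrt
from statistics import NormalDist

def z_score(sample_mean, pop_mean, sd, n):
    """z = (X-bar - mu) / (sd / sqrt(n)); sd is sigma if known, otherwise s."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))

def prob_less_than(x, pop_mean, sd, n):
    """P(sample mean < x), from the standard normal CDF instead of a table."""
    return NormalDist().cdf(z_score(x, pop_mean, sd, n))

def prob_between(lo, hi, pop_mean, sd, n):
    """P(lo < sample mean < hi)."""
    return prob_less_than(hi, pop_mean, sd, n) - prob_less_than(lo, pop_mean, sd, n)

# Example 1: mean 60, standard deviation 12, sample of 9.
p_c = prob_less_than(56, 60, 12, 9)     # part (c), z = -1.00
p_d = prob_between(56, 63, 60, 12, 9)   # part (d), z from -1.00 to 0.75
print(round(p_c, 4), round(p_d, 4))
```

Results computed this way may differ from table-based answers in the last digit, because a printed table rounds z to two decimal places.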
Examples:
1. A normal population has a mean of 60 and a standard deviation of 12. You select a
random sample of 9. Compute the probability that the sample mean is:
a. between 60 and 63
b. greater than 63
c. less than 56
d. between 56 and 63
e. between 50 and 56
2. A population of 100 with an unknown shape has a mean of 75. You select a sample
of 40. The standard deviation of the sample is 5. Compute the probability that the
sample mean is:
a. less than 74
b. between 74 and 77
c. between 76 and 77
d. greater than 77
e. less than 76
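As a worked illustration of the two steps, here is Example 1, parts (a) and (b). It assumes Appendix D lists the area between 0 and z, which is how the table is described later in these notes; 0.2734 is the standard tabulated area for z = 0.75.

```latex
\frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{9}} = 4,
\qquad
z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{63 - 60}{4} = 0.75
\\
P(60 < \bar{X} < 63) = P(0 < z < 0.75) = 0.2734
\\
P(\bar{X} > 63) = 0.5000 - 0.2734 = 0.2266
```

A sample mean of 60 corresponds to z = 0, so part (a) is read directly from the table, while part (b) subtracts that area from the 0.5000 lying above the mean.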
Remember:
For finding sample mean probabilities:
Step 1: Calculate the z-score
Step 2: Use Appendix D (z-score chart) and interpret your answer.
Extra Examples:
1. A normal population has a mean of 60 and a standard deviation of 8. A random
sample of 9 is taken.
a. What is the probability that the sample mean is between 60 and 65?
b. What is the probability that the sample mean is between 54 and 60?
c. What is the probability that the sample mean is between 54 and 65?
2. A population of unknown shape has a mean of 70. You select a sample of 42. The
standard deviation of the sample is 5.
a. Compute the probability the sample mean is greater than 71.
b. Compute the probability the sample mean is less than 68.8.
c. Compute the probability the sample mean is greater than 68.8.
d. Compute the probability the sample mean is less than 71.
3. A trucking company claims that the mean weight of their delivery trucks when
they are fully loaded is 6000 pounds and the standard deviation is 250 pounds.
Assume that the population follows the normal distribution. Ninety trucks are
randomly selected and weighed.
a. What is the probability that the sample mean is between 6020 and 6070 pounds?
b. What is the probability that the sample mean is between 5970 and 5980
pounds?
c. What is the probability that the sample mean is between 5970 and 6020?
d. What is the probability that the sample mean is more than 6020?
e. What is the probability that the sample mean is less than 6020?
f. What is the probability the sample mean is more than 5970?
g. What is the probability that the sample mean is less than 5970?
h. What is the probability that the sample mean is between 5970 and 6000?
More Practice!
Worksheet 6.6
1. The mean rent for a one-bedroom apartment in Southern California is
$2,200 per month. The distribution of the monthly costs does not follow the
normal distribution. In fact, it is positively skewed. What is the probability
of selecting a sample of 50 one-bedroom apartments and finding the mean to
be at least $1,950 per month? The standard deviation of the sample is $250.
2. According to an IRS study, it takes an average of 330 minutes for taxpayers
to prepare, copy, and electronically file a 1040 tax form. A consumer
watchdog agency selects a random sample of 40 taxpayers and finds the
standard deviation of the time to prepare, copy, and electronically file form
1040 is 80 minutes.
a. What assumption or assumptions do you need to make about the shape
of the population?
b. What is the standard error of the mean in this example?
c. What is the likelihood the sample mean is greater than 320 minutes?
d. What is the likelihood the sample mean is between 320 and 350
minutes?
e. What is the likelihood the sample mean is greater than 350 minutes?
3. Recent studies indicate that the typical 50-year-old woman spends $350 per
year for personal-care products. The distribution of the amounts spent is
positively skewed. We select a random sample of 40 women. The mean
amount spent for those sampled is $335, and the standard deviation of the
sample is $45. What is the likelihood of finding a sample mean this large or
larger from the specified population?
4. Information from the American Institute of Insurance indicates the mean
amount of life insurance per household in the United States is $110,000.
This distribution is positively skewed. The standard deviation of the
population is not known.
a. A random sample of 50 households revealed a mean of $112,000 and a
standard deviation of $40,000. What is the standard error of the
mean?
b. Suppose that you selected 50 samples of households. What is the
expected shape of the distribution of the sample mean?
c. What is the likelihood of selecting a sample with a mean of at least
$112,000?
d. What is the likelihood of selecting a sample with a mean of more than
$100,000?
e. Find the likelihood of selecting a sample with a mean of more than
$100,000 but less than $112,000.
5. The mean age at which men in the United States marry for the first time is
24.8 years. The shape and the standard deviation of the population are both
unknown. For a random sample of 60 men, what is the likelihood that the age
at which they were married for the first time is less than 25.1 years?
Assume that the standard deviation of the sample is 2.5 years.
6. A recent study by the Greater Los Angeles Taxi Drivers Association showed
that the mean fare charged for service from Hermosa Beach to the Los
Angeles International Airport is $18.00 and the standard deviation is $3.50.
We select a sample of 15 fares.
a. What is the likelihood that the sample mean is between $17.00 and
$20.00?
b. What must you assume to make the above calculation?
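Several worksheet questions ask for the standard error of the mean, which is the denominator of the z formula from Step 1. A minimal sketch, with a hypothetical helper name `standard_error`, using the figures from question 2 (sample standard deviation of 80 minutes, sample of 40 taxpayers):

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n), with sd = sigma if known, else s."""
    return sd / sqrt(n)

se = standard_error(80, 40)   # question 2: s = 80 minutes, n = 40
print(round(se, 2))
```

The standard error shrinks as the sample size grows, which is why the mean of a larger sample is less likely to fall far from the population mean.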
7.0 Hypothesis Testing
7.1 Introduction
7.1.1 Describe the purpose of Hypothesis Testing
What is Hypothesis Testing?
Hypothesis testing is a statistical procedure which involves a decision-making process for evaluating
claims about a certain parameter of a population.
As a researcher of data, you may be interested in answering many types of questions. Automobile
manufacturers may be interested in determining whether seat belts will reduce the severity of
injuries caused by accidents. A ladies' wear store may want to know whether the general public
prefers a certain colour in a new line of fashion swim wear. These types of questions can be
answered using the methods of hypothesis testing.
Hypothesis testing starts with a statement about a population parameter such as the mean.
What is a Hypothesis?
In statistical analysis we make a claim, that is, state a hypothesis, then follow up with tests to
verify the assertion or to determine that it is untrue.
Because we utilize statistical inference, it is not necessary to measure the entire population; instead,
we take a sample from the population to determine whether the empirical evidence from the sample
does or does not support the statement concerning the population.
As noted, hypothesis testing starts with a statement about a population parameter such as the
mean.


Example: One statement about the performance of a new model car is that the mean miles
per gallon is 30.
Another statement is that the mean miles per gallon is not 30.
Only one of these statements is correct.
To test the validity of the assumption (hypothesis) that the mean miles per gallon is 30, we must
select a sample from the population, calculate sample statistics, and based on certain decision rules
either accept or reject the hypothesis.
7.2 Hypothesis Tests
7.2.1 Name and describe the components of a statistical hypothesis test
Five-Step Procedure for Hypothesis Testing
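When conducting hypothesis tests we actually employ a strategy of "proof by contradiction." We
hope to accept a statement to be true by rejecting or ruling out another statement. Statistical
hypothesis testing is a five-step procedure:
Step 1: State the null and alternate hypotheses.
Step 2: State the level of significance.
Step 3: Compute the test statistic.
Step 4: Formulate the decision rule.
Step 5: Select the sample and make a decision.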
Hypothesis Testing - Step 1
The first step is to state the null and alternate hypotheses. What is the null hypothesis?
For example, a recent newspaper report made the claim that the mean length of a hospital stay was
3.3 days. You think that the true length of stay is some other length than 3.3 days.
The null hypothesis is written
Ho: µ = 3.3
It is the statement about the value of the population parameter - in this case the population mean.
The null hypothesis is established for the purpose of testing. On the basis of the sample evidence,
it is either rejected or not rejected. In other words, it is accepted or rejected.
If the null hypothesis is rejected, then we accept the alternate hypothesis.
The alternate hypothesis is written
H1: µ ≠ 3.3
There are two other formats for writing the null and alternate hypotheses. Suppose you think that
the mean length of stay is greater than 3.3 days. The null and alternate hypotheses would be
written:
Ho: µ ≤ 3.3
H1: µ > 3.3
Note that in this case the null hypothesis indicates "no change or that µ is less than 3.3." The
alternate hypothesis states that the mean length of stay is greater than 3.3 days.
Suppose you think that the mean length of stay is less than 3.3 days. The null and alternate
hypotheses would be written:
Ho: µ ≥ 3.3
H1: µ < 3.3
It is important to remember that no matter how the problem is stated, the null hypothesis will
always contain the equal sign. The equality sign will never appear in the alternate hypothesis.
One-tailed versus two-tailed test
When a direction is expressed in the alternate hypothesis, such as > or <, the test is referred to as
being a one-tailed test. When the alternate hypothesis is that of "≠" (not equal to), the test will
be a two-tailed test.
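The three formats can be set side by side, using the hospital-stay value of 3.3 days from the text. The row labels are added here for illustration:

```latex
\begin{array}{lll}
\text{Two-tailed:}         & H_0\colon \mu = 3.3   & H_1\colon \mu \neq 3.3 \\
\text{One-tailed (upper):} & H_0\colon \mu \le 3.3 & H_1\colon \mu > 3.3 \\
\text{One-tailed (lower):} & H_0\colon \mu \ge 3.3 & H_1\colon \mu < 3.3
\end{array}
```

In every row the null hypothesis carries the equality, and the direction of the sign in the alternate hypothesis names the tail in which the rejection region lies.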
Hypothesis Testing - Step 2
After setting up the null hypothesis and alternate hypothesis, the next step is to state the level of
significance.
The level of significance is designated α, the Greek letter alpha. It is the probability of rejecting
the null hypothesis when it is actually true, and it will indicate when the sample
mean is too far away from the hypothesized mean for the null hypothesis to be true.


When a true null hypothesis is rejected it is referred to as a Type I error.
If the null hypothesis is not true, but our sample results indicate that it is, we have a Type
II error.
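The link between the level of significance and a Type I error can be checked by simulation. The sketch below is an illustration under assumed values: a normal population whose true mean really is 3.3 (so the null hypothesis is true), a hypothetical population standard deviation of 1.0, samples of 30, and a two-tailed test at the 0.05 level. Every rejection it counts is therefore a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0, sigma, n, alpha, reps, seed=1):
    """Fraction of samples that wrongly reject a TRUE null hypothesis H0: mu = mu0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
    rejections = 0
    for _ in range(reps):
        # The population really has mean mu0, so any rejection is a Type I error.
        sample_mean = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        z = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / reps

rate = type_one_error_rate(mu0=3.3, sigma=1.0, n=30, alpha=0.05, reps=10000)
print(rate)   # expected to land close to alpha
```

Choosing a smaller level of significance lowers the chance of a Type I error, at the cost of making a Type II error more likely for the same sample size.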
Hypothesis Testing - Step 3
Step 3 of the hypothesis testing procedure is to compute the test statistic. What is a test
statistic?
Which test statistic do I use? The answer to this question is determined by factors such as
whether the population standard deviation is known and the size of the sample.
The standard normal distribution, the z value, is used if the population is normally distributed,
if the population standard deviation is known, and when the sample size is greater than 30.
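Under those conditions the test statistic takes the same form as the z-score in section 6.6, with the mean stated in Ho in place of the population mean. The subscript 0 below is notation assumed here for that hypothesized value, not notation taken from these notes:

```latex
z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
```

Here $\bar{X}$ is the sample mean, $\mu_0$ is the value stated in Ho, $\sigma$ is the population standard deviation, and $n$ is the sample size. When only the sample standard deviation is available, $s$ replaces $\sigma$, as in the second formula of section 6.6.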
Hypothesis Testing - Step 4
Formulate the Decision Rule: A decision rule is based on Ho and H1 , the level of significance, and
the test statistic. The decision rule is formulated by finding the critical values for z.
If we are applying a one-tailed test, there is only one critical value. If we are applying a two-tailed
test, there are two critical values.
The following diagram illustrates the critical values for a two-tailed test, at the 0.01 level of
significance. Since this is a two-tailed test, half of the 0.01 is found in each tail - 0.005. The area
where Ho is not rejected is therefore 0.99. Since Appendix D is based on half of the area under
the curve, we locate 0.99/2 = 0.4950 in the body of the table to find the corresponding critical
z values, -2.58 and +2.58.
Therefore, our decision rule is:
Reject the null hypothesis and accept the alternate hypothesis if the computed value of z does not
fall in the region between -2.58 and +2.58.
To find the critical value for a one-tailed test, at the 0.01 level of significance, place the 0.01 of
the total area in the upper or lower tail. This means that 0.5000 - 0.01 = 0.4900 of the area is
located between the z value of 0 and the critical value. We locate 0.4900 in the body of Appendix
D and our decision rule is to reject the null hypothesis if the computed value from the test statistic
exceeds 2.33 for an upper-tailed test or is less than -2.33 for a lower tailed test.
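The same critical values can be recovered without the table. This is a minimal sketch with a hypothetical helper name, using the inverse of the standard normal CDF from Python's standard library in place of the Appendix D lookup:

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed):
    """Positive critical z value for a given level of significance."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

z_two = critical_z(0.01, two_tailed=True)    # 0.005 in each tail
z_one = critical_z(0.01, two_tailed=False)   # 0.01 in a single tail
print(round(z_two, 2), round(z_one, 2))
```

For a lower-tailed test the critical value is the negative of the one-tailed result.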
The following diagrams will illustrate the acceptance and rejection area for an upper-tailed test.
Hypothesis Testing - Step 5
Select the Sample and Make a Decision: The final step is to select the sample and compute the
value of the test statistic. This value is compared to the critical value, or values, and a decision is
made whether to reject or accept the null hypothesis.
In the following example the critical values for z are -2.58 and +2.58 (a two-tailed test). The
computed value of z = 1.55. Since the computed value falls in the acceptance range, we do not
reject, we accept the null hypothesis.
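The comparison in Step 5 can be written as a small decision function. This is a sketch under assumptions: the function name and the `tail` labels are hypothetical, and the critical values come from the standard normal distribution rather than from Appendix D.

```python
from statistics import NormalDist

def decide(z, alpha, tail):
    """Return the decision on H0. `tail` is 'two', 'upper', or 'lower'."""
    if tail == "two":
        reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    elif tail == "upper":
        reject = z > NormalDist().inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z < NormalDist().inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return "reject H0" if reject else "do not reject H0"

decision = decide(1.55, alpha=0.01, tail="two")   # the example from the text
print(decision)
```

With z = 1.55 and critical values of about -2.58 and +2.58, the function reports that Ho is not rejected, matching the conclusion above.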
7.0 Hypothesis test examples:
1. A company manufactures desks. Their production follows the normal distribution,
with a mean of 200 per week and a standard deviation of 16. The president would
like to investigate whether the mean number of desks is different from 200 at the
0.01 significance level. A sample accumulated over 150 weeks has a mean of 203.5.
Is the president right in assuming that the mean number of desks is different
from 200?
2. The rate at which a stock of aspirin turns over each year has a mean of 6.0 and a
standard deviation of 0.50. A random sample of 64 aspirin revealed a mean of 5.84.
It is suspected that the mean turnover has changed and is no longer 6.0. Use the
0.05 significance level to test the hypothesis that the mean turnover is not 6.0.
3. The mean age of passenger cars in the US is 8.4 years. A sample of 40 cars in
the student lots at the University of Tennessee showed the mean age to be 9.2
years. The standard deviation of this sample was 2.8 years. At the 0.1 significance
level, can we conclude the mean age is more than 8.4 years for the cars of
Tennessee students?
4. The manager of a store wants to find whether the mean unpaid balance is more
than $400. The level of significance is set at 0.05. A random sample of 60 unpaid
balances revealed the sample mean is $407 and the standard deviation is $22.50.
Should she conclude that the mean is greater than $400?
5. The mean amount of time spent watching TV per day for eighth graders is 1.6
hours. A sample of 35 eighth graders showed the mean number of hours to be 1.3
hours with a standard deviation of 1.0 hours. At the 0.01 significance level, can we
conclude that the mean time is less than 1.6 hours?
6. The mean number of minutes spent on the phone by employees is said to be 37 with
a standard deviation of 2.1. The owner of a company wants to determine whether
the mean number of minutes is less than 37. She takes a sample of 43 employees
and finds that the mean amount of time spent is 33. Can we conclude that the mean
number of minutes is less than 37? (Use the 0.05 significance level.)
7. A town council claims that the mean time citizens spend commuting
to work is 28 minutes. A company believes that the mean is not 28 minutes and
takes a sample of 50 citizens. They determine that the mean commuting time of
the sample is 36 minutes with a standard deviation of 11 minutes. At the 0.01
significance level, can the company conclude that the mean commuting time for the
town is different from 28 minutes?
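As a worked sketch of the five steps, take Example 1 above (the desk manufacturer). The hypotheses are Ho: µ = 200 and H1: µ ≠ 200, and the test is two-tailed at the 0.01 level, so the critical values are -2.58 and +2.58 as found in Step 4. The arithmetic below is an illustration under those stated values, not an official answer key:

```latex
z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
  = \frac{203.5 - 200}{16 / \sqrt{150}}
  \approx \frac{3.5}{1.306}
  \approx 2.68
```

Because 2.68 falls outside the region between -2.58 and +2.58, Ho is rejected: the sample supports the president's belief that the mean number of desks differs from 200.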
Worksheet for 7.0
1. The following information is available.
H0: µ = 50
H1: µ ≠ 50
The sample mean is 49, and the sample size is 36. The population follows the normal
distribution and the standard deviation is 5. Use the .05 significance level.
2. The following information is available.
H0: µ ≤ 10
H1: µ > 10
The sample mean is 12 for a sample of 36. The population follows the normal distribution and
the standard deviation is 3. Use the .02 significance level.
3. A sample of 36 observations is selected from a normal population. The sample mean is 21,
and the sample standard deviation is 5. Conduct the following test of hypothesis using the
.05 significance level.
H0: µ ≤ 20
H1: µ > 20
4. A sample of 64 observations is selected from a normal population. The sample mean is 215,
and the sample standard deviation is 15. Conduct the following test of hypothesis using the
.03 significance level.
H0: µ ≥ 220
H1: µ < 220
For Exercises 5-8: (a) State the null hypothesis and the alternate hypothesis. (b)
State the decision rule. (c) Compute the value of the test statistic. (d) What is your
decision regarding H0? (e) What is the p-value? Interpret it.
5. The manufacturer of the X-15 steel-belted radial truck tire claims that the mean mileage
the tire can be driven before the tread wears out is 60,000 miles. The Crosset Truck
Company bought 48 tires and found that the mean mileage for their trucks is 59,500 miles
with a standard deviation of 5,000 miles. Is Crosset’s experience different from that
claimed by the manufacturer at the .05 significance level?
6. The MacBurger restaurant chain claims that the waiting time of customers for service is
normally distributed, with a mean of 3 minutes and a standard deviation of 1 minute. The
quality-assurance department found in a sample of 50 customers at the Warren Road
MacBurger that the mean waiting time was 2.75 minutes. At the .05 significance level, can
we conclude that the mean waiting time is less than 3 minutes?
7. A recent national survey found that high school students watched an average (mean) of 6.8
DVDs per month. A random sample of 36 college students revealed that the mean number of
DVDs watched last month was 6.2, with a standard deviation of 0.5. At the .05 significance
level, can we conclude that college students watch fewer DVDs a month than high school
students?
8. At the time she was hired as a server at the Grumney Family Restaurant, Beth Brigden was
told, “You can average more than $80 a day in tips.” Over the first 35 days she was
employed at the restaurant, the mean daily amount of her tips was $84.85, with a standard
deviation of $11.38. At the .01 significance level, can Ms. Brigden conclude that she is
earning an average of more than $80 in tips?
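Part (e) of Exercises 5-8 asks for a p-value, which these notes do not define. In the usual definition, it is the probability, assuming Ho is true, of a test statistic at least as extreme as the one observed; Ho is rejected when the p-value is smaller than the level of significance. The sketch below uses a hypothetical function name and the figures from Exercise 5, treated as a two-tailed test:

```python
from math import sqrt
from statistics import NormalDist

def p_value(sample_mean, mu0, sd, n, tail):
    """p-value of a one-sample z test. `tail` is 'two', 'upper', or 'lower'."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    lower_area = NormalDist().cdf(z)   # P(Z <= z)
    if tail == "lower":
        return lower_area
    if tail == "upper":
        return 1 - lower_area
    if tail == "two":
        # Double the smaller tail so both directions count as "extreme".
        return 2 * min(lower_area, 1 - lower_area)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

# Exercise 5: claimed mean 60,000 miles, sample mean 59,500, s = 5,000, n = 48.
p = p_value(59500, 60000, 5000, 48, tail="two")
print(round(p, 2))
```

A p-value far above 0.05, as here, means a sample mean this far from 60,000 would be unremarkable if the manufacturer's claim were true.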