732G21/732A35/732G28
Formal statement
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$ is the $i$th response value
$\beta_0$, $\beta_1$ are the model parameters (regression parameters: intercept and slope)
$X_i$ is the $i$th predictor value
$\varepsilon_i$ are i.i.d. normally distributed random variables with expectation zero and variance $\sigma^2$
Inference about regression coefficients and response:
Interval estimates and test concerning coefficients
Confidence interval for Y
Prediction interval for Y
ANOVA-table
After fitting the data, we may obtain a regression line
$$\hat{y} = 1.5 + 0.00005\,x$$
Is 0.00005 significantly different from zero, or is it just the result of random variation
(in which case there is no linear dependence between Y and X)?
How to decide?
◦ Use hypothesis testing (later)
◦ Derive a confidence interval for β1. If 0 does not fall within this
interval, there is a linear dependence
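Estimated slope b1 is a random variable (look at the formula):
$$b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Properties of b1
Normally distributed (show)
E(b1) = β1
Variance
$$\sigma^2\{b_1\} = \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Further:
Test statistic
$$\frac{b_1 - \beta_1}{s\{b_1\}}$$
is distributed as t(n-2)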
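The claim that b1 is a random variable with mean β1 and variance σ²/Σ(Xi − X̄)² can be checked by simulation. The sketch below is only an illustration: the true parameters (β0 = 2, β1 = 0.5, σ = 1), the predictor values 1..10 and the number of replications are assumptions, not values from the course.

```python
import random
import statistics

def slope(xs, ys):
    """Least-squares slope b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical true model (assumed for illustration): Y = 2 + 0.5 X + eps, eps ~ N(0, 1)
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)]           # fixed predictor values 1..10
x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

rng = random.Random(0)                           # fixed seed for reproducibility
slopes = []
for _ in range(20000):                           # each pass draws a new sample of Y values
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    slopes.append(slope(xs, ys))

mean_b1 = statistics.fmean(slopes)
var_b1 = statistics.variance(slopes)
theory_var = SIGMA ** 2 / sxx                    # sigma^2{b1} from the formula above
print(f"mean of b1     = {mean_b1:.4f} (beta1 = {BETA1})")
print(f"variance of b1 = {var_b1:.5f} (theory = {theory_var:.5f})")
```

The simulated mean and variance should land close to β1 and σ²/Σ(Xi − X̄)², up to simulation noise.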
See table B.2 (p. 1317)
Example: one-sided 95% interval, 15 observations (n − 2 = 13 degrees of freedom)
$$t(0.95;\, 13) = 1.771$$
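When no table is at hand, the tabulated value can be reproduced numerically. The following is a minimal sketch using only the Python standard library: it integrates the t density with Simpson's rule and inverts the resulting CDF by bisection. The function names, step count and search bracket are illustrative choices, not part of the course material.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry around zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Invert the CDF by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 15                                   # number of observations in the example
t_crit = t_quantile(0.95, n - 2)         # one-sided 95%, 13 degrees of freedom
print(round(t_crit, 3))                  # agrees with the tabulated 1.771
```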
Confidence interval for β1 (show…)
$$b_1 \pm t(1-\alpha/2;\, n-2)\, s\{b_1\}$$
If the variance σ² is unknown, use the estimate
$$s^2\{b_1\} = \frac{s^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Example: compute a confidence interval for the slope, Salary dataset
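The Salary data are not reproduced in these notes, so the sketch below runs the same computation on a small hypothetical dataset (five made-up points). It assumes s² is taken as MSE = SSE/(n − 2) and uses the tabulated value t(0.975; 3) = 3.182 for a 95% interval.

```python
import math

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

# s^2 = MSE = SSE / (n - 2) estimates the unknown error variance (assumption stated above)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)
s_b1 = math.sqrt(mse / sxx)

T_CRIT = 3.182   # t(0.975; 3) from a t table: 95% interval, n - 2 = 3
lower, upper = b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1
print(f"b1 = {b1:.3f}, s{{b1}} = {s_b1:.4f}")
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

For these toy numbers the interval contains 0, so the data would not rule out a zero slope at the 95% level.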
[Figure: Salary (y) plotted against Age (x), with fitted line y = 0.5471x + 8.4545]
Often we have a sample and test at some significance level α
$$H_0: \lambda = \lambda_0, \qquad H_a: \lambda \neq \lambda_0$$
or
$$H_0: \lambda \le \lambda_0, \qquad H_a: \lambda > \lambda_0$$
or
$$H_0: \lambda \ge \lambda_0, \qquad H_a: \lambda < \lambda_0$$
How to do it?
Step 1: Find and compute an appropriate test function
T = T(sample, λ0)
Step 2: Plot the test function's distribution and mark a critical
area that depends on α
If T is in the critical area, reject H0 (accept Ha); otherwise do not reject H0
Test
$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$
Step 1: compute
$$t^* = \frac{b_1}{s\{b_1\}}$$
Step 2: Plot the t(n-2) distribution, mark the points $\pm t(1-\alpha/2;\, n-2)$ and
the critical area.
Step 3: Determine where t* lies and reject H0 if it is in the critical area
Example
Test the hypothesis for Salary dataset:
Manually, compute also P-values
By Minitab
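The manual steps, including the P-value, can be sketched in plain Python. The example below uses a small hypothetical dataset rather than the Salary data, takes the critical value t(0.975; 3) = 3.182 from a table, and approximates the two-sided P-value by integrating the t density with Simpson's rule. The helper name and step count are illustrative assumptions.

```python
import math

def t_two_sided_p(t_star, df, steps=2000):
    """Two-sided P-value P(|T| > |t*|), via Simpson's rule on the t density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t_star)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 1.0 - 2.0 * (total * h / 3)      # 1 - P(-|t*| <= T <= |t*|)

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# Step 1: test statistic for H0: beta1 = 0
t_star = b1 / math.sqrt(mse / sxx)
# Steps 2-3: compare with the critical points +/- t(0.975; n - 2), alpha = 0.05
T_CRIT = 3.182
reject_h0 = abs(t_star) > T_CRIT
p_value = t_two_sided_p(t_star, n - 2)
print(f"t* = {t_star:.4f}, reject H0: {reject_h0}, P-value = {p_value:.3f}")
```

Rejecting H0 when |t*| exceeds the critical value is the same decision as rejecting when the P-value is below α.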
Sometimes we need to know whether β0 = 0
Construct confidence intervals and hypothesis tests in the same way,
using the formulas below
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Properties of b0
Normally distributed (show)
E(b0) = β0
Variance (show..)
$$\sigma^2\{b_0\} = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Further:
Test statistic
$$\frac{b_0 - \beta_0}{s\{b_0\}}$$
is distributed as t(n-2)
If the error distribution is not normal: slight departures are OK; otherwise
the results hold only asymptotically (for large samples)
Spacing of the X values affects the variance (larger spacing gives smaller variance)
Example: test β0 = 0 for the Salary data
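The effect of spacing follows directly from σ²{b1} = σ²/Σ(Xi − X̄)²: spreading the X values out makes the denominator larger. A small sketch with two hypothetical designs (same number of points, same mean, σ² = 1 assumed):

```python
def slope_variance(xs, sigma2=1.0):
    """sigma^2{b1} = sigma^2 / sum((Xi - Xbar)^2)."""
    x_bar = sum(xs) / len(xs)
    return sigma2 / sum((x - x_bar) ** 2 for x in xs)

narrow = [4.0, 4.5, 5.0, 5.5, 6.0]   # hypothetical design, spacing 0.5
wide = [1.0, 3.0, 5.0, 7.0, 9.0]     # same mean, spacing 2.0

var_narrow = slope_variance(narrow)  # 1 / 2.5
var_wide = slope_variance(wide)      # 1 / 40
print(var_narrow, var_wide)
```

Quadrupling the spacing multiplies Σ(Xi − X̄)² by 16, so the slope variance drops by the same factor.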
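Estimate at X = Xh (Xh can be any level of X):
$$\hat{Y}_h = b_0 + b_1 X_h$$
Properties of $\hat{Y}_h$
Normally distributed (show)
$$E(\hat{Y}_h) = E(Y_h)$$
Variance
$$\sigma^2\{\hat{Y}_h\} = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$
Further:
Test statistic
$$\frac{\hat{Y}_h - E(Y_h)}{s\{\hat{Y}_h\}}$$
is distributed as t(n-2)
Confidence interval
$$\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\, s\{\hat{Y}_h\}$$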
Make a plot…
CONFIDENCE INTERVAL
We estimate the position of the mean in the population with X = Xh
POINT ESTIMATE
PREDICTION INTERVAL
We estimate the position of the individual observation in the
population with X = Xh
When parameters are unknown, the mean E(Yh) may have
more than one possible location
New observation = mean + random error
-> prediction interval should be wider
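Further:
Test statistic
$$\frac{Y_{h(new)} - \hat{Y}_h}{s\{pred\}}$$
is distributed as t(n-2)
Prediction interval
$$\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\, s\{pred\}$$
How to estimate s{pred}? The new observation is modeled as b0 + b1 Xh + ε,
with ε independent of b0 and b1. Hence
$$\sigma^2\{pred\} = \sigma^2\{b_0 + b_1 X_h + \varepsilon\} = \sigma^2\{b_0 + b_1 X_h\} + \sigma^2\{\varepsilon\} = \sigma^2\{\hat{Y}_h\} + \sigma^2$$
Standard error (show)
$$s^2\{pred\} = MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$$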
Example
Calculate confidence and prediction intervals for a 35-year-old
person
Compare with output in Minitab
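A sketch of the same calculation on hypothetical data (the Salary dataset is not included here, so the five points and the level Xh = 4 below are made up). It computes both standard errors from MSE and uses the tabulated t(0.975; 3) = 3.182 for 95% intervals.

```python
import math

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_h = 4.0          # level of X at which we estimate / predict (assumed)
T_CRIT = 3.182     # t(0.975; 3) from a t table: 95% level, n - 2 = 3

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

y_hat_h = b0 + b1 * x_h
h_term = 1 / n + (x_h - x_bar) ** 2 / sxx
s_mean = math.sqrt(mse * h_term)          # s{Yhat_h}: uncertainty of the mean response
s_pred = math.sqrt(mse * (1 + h_term))    # s{pred}: adds the new observation's own error

conf_int = (y_hat_h - T_CRIT * s_mean, y_hat_h + T_CRIT * s_mean)
pred_int = (y_hat_h - T_CRIT * s_pred, y_hat_h + T_CRIT * s_pred)
print(f"point estimate      : {y_hat_h:.3f}")
print(f"confidence interval : ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"prediction interval : ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

Both intervals are centred at the same point estimate; the prediction interval is always the wider one because of the extra MSE term.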
Total sum of squares
$$SSTO = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
Error sum of squares
$$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
Regression sum of squares
$$SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$
$$SSTO = SSR + SSE$$
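SSTO has n-1 degrees of freedom (the deviations Yi − Ȳ sum to zero)
SSE has n-2 degrees of freedom (2 model parameters are estimated)
SSR has 1 degree of freedom (the fitted values lie on the regression line, which gives 2 degrees; the deviations Ŷi − Ȳ sum to zero, which removes 1 degree)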
n-1 = n-2 + 1
SSTO =SSE + SSR
Important :
MSxx= SSxx/degrees_of_freedom
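The decomposition and the mean squares can be verified numerically. This is a minimal sketch on hypothetical data (five made-up points, not the Salary dataset).

```python
# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ssto = sum((y - y_bar) ** 2 for y in ys)               # total, df = n - 1
sse = sum((y - f) ** 2 for y, f in zip(ys, y_hat))     # error, df = n - 2
ssr = sum((f - y_bar) ** 2 for f in y_hat)             # regression, df = 1

msr = ssr / 1          # mean square = sum of squares / degrees of freedom
mse = sse / (n - 2)
print(f"SSTO = {ssto:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}")
```

Note that the sums of squares and the degrees of freedom both add up, but the mean squares do not.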
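ANOVA table

| Source of variation | SS | df | MS |
|---|---|---|---|
| Regression | $SSR = \sum(\hat{Y}_i - \bar{Y})^2$ | 1 | $MSR = SSR/1$ |
| Error | $SSE = \sum(Y_i - \hat{Y}_i)^2$ | n - 2 | $MSE = SSE/(n-2)$ |
| Total | $SSTO = \sum(Y_i - \bar{Y})^2$ | n - 1 | |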
Expected mean squares
$$E(MSE) = \sigma^2$$
$$E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n}(X_i - \bar{X})^2$$
E(MSE) does not depend on the slope; it equals σ² even when the slope is zero
E(MSR) = E(MSE) when the slope is zero
-> If MSR is much larger than MSE, the slope is not zero; if they are
approximately equal, the slope can be zero
$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$
Test statistic F* = MSR/MSE, use F(1, n-2) (see p. 1320)
Decision rules:
If F* > F(1-α;1, n-2) conclude Ha
If F* ≤ F(1-α;1, n-2) conclude H0
Note: F test and t test about β1 are equivalent
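The equivalence can be seen numerically: F* equals the square of t* = b1/s{b1}, and the critical values satisfy F(1−α; 1, n−2) = [t(1−α/2; n−2)]². The sketch below uses hypothetical data and the tabulated t(0.975; 3) = 3.182; it is an illustration, not the Salary example.

```python
import math

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

msr = sum((f - y_bar) ** 2 for f in y_hat) / 1
mse = sum((y - f) ** 2 for y, f in zip(ys, y_hat)) / (n - 2)

f_star = msr / mse                       # F test statistic
t_star = b1 / math.sqrt(mse / sxx)       # t test statistic for H0: beta1 = 0

T_CRIT = 3.182                           # t(0.975; 3) from a t table, alpha = 0.05
F_CRIT = T_CRIT ** 2                     # equals F(0.95; 1, 3), about 10.13
conclusion = "Ha" if f_star > F_CRIT else "H0"
print(f"F* = {f_star:.4f}, (t*)^2 = {t_star ** 2:.4f}, conclude {conclusion}")
```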
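General approach
$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$
Full model: (linear)
$$SSE(F) = \sum_{i=1}^{n}\bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = SSE$$
Reduced model: (constant)
$$SSE(R) = \sum_{i=1}^{n}(Y_i - b_0)^2 = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = SSTO$$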
It is known (why?..) that
SSE(F) ≤ SSE(R).
A large difference suggests the models are different; with a small difference
they can be the same
Test statistic
$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}$$
For the univariate linear model, this is equivalent to F* = MSR/MSE
F* follows the F(dfR-dfF, dfF) distribution (plot critical area..)
Test rule: if F* > F(1-α; dfR-dfF, dfF), reject H0
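As a sketch of the general approach, the helper below (a hypothetical function name, not from the course) takes the error sums of squares and degrees of freedom of the two models. Applied to hypothetical data with the linear model as full and the constant model as reduced, it reproduces MSR/MSE.

```python
def general_linear_test(sse_r, df_r, sse_f, df_f):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Full model: Y = b0 + b1 X, with n - 2 error degrees of freedom
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
sse_full = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Reduced model under H0: Y = b0 with b0 = Ybar, so SSE(R) = SSTO, df = n - 1
sse_reduced = sum((y - y_bar) ** 2 for y in ys)

f_star = general_linear_test(sse_reduced, n - 1, sse_full, n - 2)
msr_over_mse = ((sse_reduced - sse_full) / 1) / (sse_full / (n - 2))
print(f"SSE(F) = {sse_full:.2f}, SSE(R) = {sse_reduced:.2f}, F* = {f_star:.4f}")
```

The same helper would apply to other nested model pairs, as long as the reduced model is a special case of the full one.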
Example For Salary dataset
Compose ANOVA table and compare with MINITAB
Perform F-test and compare with MINITAB
Coefficient of determination:
$$R^2 = \frac{SSR}{SSTO}$$
Coefficient of correlation:
$$r = \pm\sqrt{R^2}$$
Limitations:
High R does not mean a good fit
Low R does not mean that X and Y are not related
Example: For Salary dataset, compute R2 and compare with
MINITAB
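A sketch of the computation on hypothetical data (not the Salary dataset). It takes the sign of r from the sign of the slope b1 and, as a cross-check, also computes the sample correlation directly from the sums of cross products.

```python
import math

# Hypothetical toy data (NOT the Salary dataset)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ssto = sum((y - y_bar) ** 2 for y in ys)
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)

r_squared = ssr / ssto
r = math.copysign(math.sqrt(r_squared), b1)   # r takes the sign of the slope
r_direct = sxy / math.sqrt(sxx * ssto)        # sample correlation computed directly
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}, direct r = {r_direct:.4f}")
```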
Chapter 2 up to page 78