Topic 6: Estimation and Prediction of Yh

Provided by: Georg397
1
Topic 6: Estimation and Prediction of Yh
2
Outline
  • Estimation and inference of E(Yh)
  • Prediction of a new observation
  • Construction of a confidence band for the entire
    regression line

3
Estimation of E(Yh)
  • $E(Y_h) = \mu_h = \beta_0 + \beta_1 X_h$, the mean value of Y for
    the subpopulation with $X = X_h$
  • We will estimate E(Yh) by $\hat{\mu}_h = b_0 + b_1 X_h$
  • KNNL use $\hat{Y}_h$ for this estimate; see equation
    (2.28) on p. 52

4
Theory for Estimation of E(Yh)
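  • $\hat{\mu}_h$ is Normal with mean $\mu_h$ and variance
    $\sigma^2(\hat{\mu}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
  • The Normality is a consequence of the fact that
    $b_0 + b_1 X_h$ is a linear combination of the $Y_i$'s
  • See KNNL pp. 52-54 for details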
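One way to see the linear-combination claim, sketched in KNNL-style notation. The weights $k_i$ and the shorthand $S_{XX}$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
k_i &= \frac{X_i - \bar{X}}{S_{XX}}, \qquad S_{XX} = \sum_i (X_i - \bar{X})^2 \\
b_1 &= \sum_i k_i Y_i, \qquad b_0 = \bar{Y} - b_1 \bar{X} \\
\hat{\mu}_h &= b_0 + b_1 X_h = \bar{Y} + b_1 (X_h - \bar{X})
             = \sum_i \left[ \frac{1}{n} + (X_h - \bar{X}) k_i \right] Y_i \\
\sigma^2(\hat{\mu}_h) &= \sigma^2 \sum_i \left[ \frac{1}{n} + (X_h - \bar{X}) k_i \right]^2
             = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}} \right]
\end{align*}
```

The last step uses $\sum_i k_i = 0$ and $\sum_i k_i^2 = 1 / S_{XX}$, together with the independence of the $Y_i$.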

5
Application of the Theory
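  • We estimate $\sigma^2(\hat{\mu}_h)$ by
    $s^2(\hat{\mu}_h) = s^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$,
    where $s^2$ is the MSE
  • It then follows that
    $\frac{\hat{\mu}_h - E(Y_h)}{s(\hat{\mu}_h)} \sim t(n-2)$
  • Confidence intervals and significance tests follow
    from this result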

6
95% Confidence Interval for E(Yh)
  • $\hat{\mu}_h \pm t_c \, s(\hat{\mu}_h)$
  • where $t_c = t(.975, n-2)$
  • NOTE: significance tests can be constructed, but
    they are rarely used in practice
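A minimal Python sketch of the interval formula, offered as a cross-check rather than as part of the course's SAS workflow. The estimates and standard errors are copied from the SAS Output Statistics table that appears later in this transcript. The multiplier `T_C` is t(.975, 23) read from a standard t table, which is an assumption here because the slides do not print it. The helper name `mean_ci` is hypothetical.

```python
# 95% CI for E(Yh): point estimate +/- t_c * s(mu_hat_h).
# T_C is t(.975, 23) for n = 25, taken from a standard t table;
# it is an assumption here, not a value printed in the slides.
T_C = 2.0687


def mean_ci(estimate, std_err, t_c=T_C):
    """Return (lower, upper) limits of the CI for the mean response."""
    half_width = t_c * std_err
    return estimate - half_width, estimate + half_width


# (lotsize, Predicted Value, Std Error Mean Predict) from the SAS output
rows = [(80, 347.9820, 10.3628), (70, 312.2800, 9.7647),
        (65, 294.4290, 9.9176), (100, 419.3861, 14.2723)]

for lotsize, fit, se in rows:
    lower, upper = mean_ci(fit, se)
    print(f"lotsize={lotsize:3d}  95% CL Mean: {lower:.2f} to {upper:.2f}")
```

With the rounded multiplier, the limits should agree with the "95% CL Mean" columns of the SAS output to about two decimal places.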

7
Toluca Company Example (pg 19)
  • Manufactures refrigeration equipment
  • One replacement part manufactured in lots of
    varying sizes
  • Company wants to determine the optimum lot size
  • To do this, company needs to first describe the
    relationship between work hours and lot size

8
Scatterplot with regression line
9
SAS CODE
*Generating the data set;
data toluca;
  infile '../data/CH01TA01.txt';
  input lotsize hours;
data other;
  lotsize=65; output; lotsize=100; output;
data toluca1;
  set toluca other;
proc print data=toluca1; run;
10
SAS CODE
*Generating the confidence intervals for all values of X in the data set;
proc reg data=toluca1;
  model hours=lotsize/clm;
  id lotsize;
run;
The clm option generates confidence intervals for the mean.
11
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1            62.36586        26.17743     2.38    0.0259
lotsize     1             3.57020         0.34697    10.29    <.0001
Output Statistics
Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Mean Lower  95% CL Mean Upper
  1       80            399.0000         347.9820                 10.3628           326.5449           369.4191
 25       70            323.0000         312.2800                  9.7647           292.0803           332.4797
 26       65                   .         294.4290                  9.9176           273.9129           314.9451
 27      100                   .         419.3861                 14.2723           389.8615           448.9106
12
Notes
  • Standard error affected by how far $X_h$ is from
    $\bar{X}$ (see Figure 2.6)
  • Recall the teeter-totter idea: a change in the slope
    has a bigger impact on $\hat{Y}$ as you move away from $\bar{X}$
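The teeter-totter effect can be checked numerically with a short Python sketch. It relies on the identity $s^2(\hat{\mu}_h) = MSE/n + (X_h - \bar{X})^2 s^2(b_1)$, which follows from $s^2(b_1) = MSE / \sum (X_i - \bar{X})^2$. The slope standard error and the standard error at lot size 70 are copied from the SAS output in this transcript. The value $\bar{X} = 70$ is an assumption (the slides do not state the sample mean), and the helper name `se_mean` is hypothetical.

```python
import math

# Values read from the SAS output in this transcript.
S_B1 = 0.34697        # standard error of the slope (Parameter Estimates)
SE_AT_XBAR = 9.7647   # Std Error Mean Predict at lotsize = 70

# Assumption: the Toluca sample mean lot size is 70 (not stated in the
# slides), so SE_AT_XBAR plays the role of sqrt(MSE / n).
X_BAR = 70.0


def se_mean(x_h):
    """s(mu_hat_h) = sqrt(MSE/n + (x_h - x_bar)^2 * s(b1)^2)."""
    return math.sqrt(SE_AT_XBAR ** 2 + (x_h - X_BAR) ** 2 * S_B1 ** 2)


for x_h in (65, 70, 80, 100):
    print(f"Xh={x_h:3d}  s(mu_hat_h)={se_mean(x_h):.4f}")
```

Under these assumptions the computed values should track the "Std Error Mean Predict" column: smallest at 70 and growing as the lot size moves away from it.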

13
Prediction of Yh(new)
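  • Want to predict the value for a new observation at
    $X = X_h$
  • Model: $Y_{h(new)} = \beta_0 + \beta_1 X_h + \varepsilon$
  • Since $E(\varepsilon) = 0$, the predicted value is the same as the estimate of E(Yh)
  • Note: the prediction interval, however, relies heavily on
    the assumption that the $\varepsilon$ are Normally distributed
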
14
Prediction of Yh(new)
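  • $Var(Y_{h(new)} - \hat{\mu}_h) = Var(\hat{\mu}_h) + Var(\varepsilon)$
  • It then follows that
    $s^2(pred) = s^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
    and $\frac{Y_{h(new)} - \hat{\mu}_h}{s(pred)} \sim t(n-2)$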

15
Notes
  • Procedure can be modified for the mean of m new
    observations at $X = X_h$ (see 2.39a and 2.39b on page
    60)
  • Standard error affected by how far $X_h$ is from
    $\bar{X}$ (see Figure 2.6)
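A rough Python sketch of the modification for the mean of m new observations, using $s^2(predmean) = MSE/m + s^2(\hat{\mu}_h)$, which is my reading of KNNL (2.39a). The MSE is not printed in the slides; here it is recovered as $n \cdot s^2(\hat{\mu}_h)$ at $X_h = \bar{X}$, assuming $\bar{X} = 70$ for the Toluca data. The helper name `se_pred_mean` is hypothetical.

```python
import math

# Assumptions (not printed in the slides): MSE is recovered as
# n * s(mu_hat_h)^2 at Xh = X-bar, taking X-bar = 70 for the Toluca data.
N = 25
MSE = N * 9.7647 ** 2   # roughly 2383.7


def se_pred_mean(se_mean, m, mse=MSE):
    """s(predmean) = sqrt(MSE/m + s(mu_hat_h)^2) for the mean of m new obs."""
    return math.sqrt(mse / m + se_mean ** 2)


se_100 = 14.2723  # Std Error Mean Predict at lotsize = 100 (SAS output)
for m in (1, 3, 10, 1000):
    print(f"m={m:4d}  s(predmean)={se_pred_mean(se_100, m):.3f}")
```

With m = 1 this reduces to s(pred) for a single new observation. As m grows, the result approaches $s(\hat{\mu}_h)$, the standard error for the mean.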

16
SAS CODE
*Generating the prediction intervals for all values of X in the data set;
proc reg data=toluca1;
  model hours=lotsize/cli;
  id lotsize;
run;
The cli option generates a prediction interval for a new observation.
17
Output Statistics
Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Predict Lower  95% CL Predict Upper
  1       80            399.0000         347.9820                 10.3628              244.7333              451.2307
 25       70            323.0000         312.2800                  9.7647              209.2811              415.2789
 26       65                   .         294.4290                  9.9176              191.3676              397.4904
 27      100                   .         419.3861                 14.2723              314.1604              524.6117
The Std Error Mean Predict values are wrong for prediction: they are the
same as before and do not include the variability about the regression line.
18
Notes
  • The standard error (Std Error Mean Predict) given
    in this output is the standard error of $\hat{\mu}_h$,
    not s(pred)
  • The prediction interval is correct and wider than
    the previous confidence interval

19
Notes
  • To get the correct standard error, add the
    variance about the regression line: $s^2(pred) = s^2 + s^2(\hat{\mu}_h)$
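The correction can be sketched in Python as follows. Two inputs are assumptions, since the slides do not print them: the MSE, recovered here as $n \cdot s^2(\hat{\mu}_h)$ at lot size 70 (taking $\bar{X} = 70$), and `T_C`, which is t(.975, 23) from a standard t table. The helper name `prediction_interval` is hypothetical.

```python
import math

T_C = 2.0687            # assumed t(.975, 23) from a standard t table
MSE = 25 * 9.7647 ** 2  # assumed: n * s(mu_hat_h)^2 at Xh = X-bar = 70


def prediction_interval(fit, se_mean, mse=MSE, t_c=T_C):
    """Return (s_pred, lower, upper) with s_pred^2 = MSE + s(mu_hat_h)^2."""
    s_pred = math.sqrt(mse + se_mean ** 2)
    return s_pred, fit - t_c * s_pred, fit + t_c * s_pred


# (lotsize, Predicted Value, Std Error Mean Predict) from the SAS output
for lotsize, fit, se in [(80, 347.9820, 10.3628), (100, 419.3861, 14.2723)]:
    s_pred, lower, upper = prediction_interval(fit, se)
    print(f"lotsize={lotsize:3d}  s(pred)={s_pred:.2f}  "
          f"95% CL Predict: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the limits should land close to the "95% CL Predict" columns of the SAS output, and s(pred) comes out several times larger than the reported Std Error Mean Predict.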

20
Confidence band for regression line
  • $\hat{\mu}_h \pm W s(\hat{\mu}_h)$
  • where $W^2 = 2F(1-\alpha; 2, n-2)$
  • This gives combined confidence intervals for
    all $X_h$
  • Boundary values of the confidence band define a
    hyperbola
  • The band will be wider at any given $X_h$ than the single CI
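To illustrate how much wider the band is, a small Python sketch compares the two multipliers at the lot sizes shown in the regression output. Both multipliers (W and tsingle, for n = 25 and alpha = .10) are copied from the SAS output later in this transcript, so they give 90% limits. The helper name `interval` is hypothetical.

```python
# Multipliers copied from the SAS output later in this transcript
# (n = 25, alpha = .10): W for the band, tsingle for one interval.
W = 2.25800
T_SINGLE = 1.71387


def interval(fit, se_mean, multiplier):
    """Return (lower, upper) = fit -/+ multiplier * s(mu_hat_h)."""
    return fit - multiplier * se_mean, fit + multiplier * se_mean


# (lotsize, Predicted Value, Std Error Mean Predict) from the SAS output
rows = [(65, 294.4290, 9.9176), (70, 312.2800, 9.7647),
        (80, 347.9820, 10.3628), (100, 419.3861, 14.2723)]

for lotsize, fit, se in rows:
    b_lo, b_hi = interval(fit, se, W)
    s_lo, s_hi = interval(fit, se, T_SINGLE)
    print(f"lotsize={lotsize:3d}  band: {b_lo:.1f} to {b_hi:.1f}  "
          f"single CI: {s_lo:.1f} to {s_hi:.1f}")
```

Because W exceeds tsingle, the band limits lie outside the single-interval limits at every lot size.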

21
Confidence band for regression line
  • Theory comes from the joint confidence region for
    $(\beta_0, \beta_1)$, which is an ellipse (Stat 524)
  • We can find an alpha for $t_c$ that gives the same
    results
  • We find $W^2$ and then find the alpha for $t_c$ that
    will give $W = t_c$

22
SAS CODE
data a1;
  n=25; alpha=.10; dfn=2; dfd=n-2;
  tsingle=tinv(1-alpha/2,dfd);
  w2=2*finv(1-alpha,dfn,dfd); w=sqrt(w2);
  alphat=2*(1-probt(w,dfd));
  t_c=tinv(1-alphat/2,dfd); output;
proc print data=a1; run;
23
SAS OUTPUT
 n  alpha  dfn  dfd  tsingle       w2        w    alphat      t_c
25    0.1    2   23  1.71387  5.09858  2.25800  0.033740  2.25800
w (= t_c): used for the 90% confidence band
tsingle: used for a single 90% CI
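The W computation can be cross-checked without SAS. When the numerator degrees of freedom equal 2, the F quantile has a closed form, $F(p; 2, d) = \frac{d}{2}\left[(1-p)^{-2/d} - 1\right]$, so the Python standard library is enough. This is a sketch for that special case only, and the function names are hypothetical.

```python
import math


def f_quantile_2df(p, dfd):
    """Quantile of F(2, dfd); closed form valid only for numerator df = 2."""
    return (dfd / 2.0) * ((1.0 - p) ** (-2.0 / dfd) - 1.0)


def working_hotelling_w(alpha, n):
    """W = sqrt(2 * F(1 - alpha; 2, n - 2)) for simple linear regression."""
    return math.sqrt(2.0 * f_quantile_2df(1.0 - alpha, n - 2))


w = working_hotelling_w(alpha=0.10, n=25)
print(f"w2={w * w:.5f}  w={w:.5f}")
```

For n = 25 and alpha = .10 the result should agree with the w2 and w columns of the SAS output above.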
24
SAS CODE
symbol1 v=circle i=rlclm97;
proc gplot data=toluca;
  plot hours*lotsize;
run;
25
(No Transcript)
26
Estimation of E(Yh) and Prediction of Yh
  • $\hat{\mu}_h = b_0 + b_1 X_h$

27
SAS CODE
symbol1 v=circle i=rlclm95;
proc gplot data=toluca;
  plot hours*lotsize;
symbol1 v=circle i=rlcli95;
proc gplot data=toluca;
  plot hours*lotsize;
run;
rlclm95: confidence intervals
rlcli95: prediction intervals
28
Confidence band
29
Confidence intervals
30
Prediction intervals
31
Background Reading
  • Program topic6.sas has the code for the various
    plots and calculations
  • Sections 2.7, 2.8, and 2.9