1
Longitudinal Data Analysis with Discrete and Continuous Responses using Proc Mixed
Part 6: Generalized Linear Models for Longitudinal Data
June 9, 2004
Charlie Hallahan
2

Introduction

These notes are based on a 3-day course taught by SAS Institute.
Part 1: Introduction to Longitudinal Analysis
        Exploratory Analysis
Part 2: General Linear Mixed Model
        Evaluating Covariance Structures
Part 3: Model Development and Interpretation
Part 4: Random Coefficient Models
Part 5: Model Assessment
Part 6: Generalized Linear Models for Longitudinal Data (this part)
Part 7: Generalized Linear Mixed Models

3
Review
So far, we've covered the following.
In Part 1, we applied a number of EDA tools in looking for possible relationships between the dependent variable, CD4 cell counts, and the candidate explanatory variables, some continuous and some discrete.
Based on this preliminary examination, in Part 2 we looked for a reasonable covariance structure for our model, starting with a full model containing all the interactions and higher-order terms (e.g., time entered the model as a cubic). The main tool used at this stage was the variogram, a graphical device that generalizes the autocovariance function and can be used for highly irregular and unbalanced data sets. The spatial power covariance structure with the LOCAL option for measurement error was selected. Random effects were also indicated by the variogram.
In Part 3, once a covariance structure was selected (using REML as the estimation method), the mean part of the model was developed (using ML). Plots of interactions were examined. Heterogeneity was addressed, and it was found that the variance of CD4 cell counts was greater before seroconversion than after. This heteroskedasticity was modeled with the GROUP= option on the REPEATED statement.
4
Review
In Part 4, the RANDOM statement was introduced as a way to model between-subject variation in the form of random coefficient models.
[Figure: plot of Y against X. Subject-specific regression lines deviate from the population regression line.]
In Part 5, we examined residual plots for the final random coefficients model.
6
Review
The final model fit in Part 4 used both the REPEATED and RANDOM statements and allowed for a linear random effect for time.
proc mixed data=mixed.aids covtest;
   class timegroup;
   model cd4_scale = time age cigarettes drug partners depression
                     time*age time*cigarettes time*time time*time*time
                     / solution ddfm=kr;
   random intercept time / type=un subject=id;
   repeated / type=sp(pow)(time) local subject=id group=timegroup;
   title 'Longitudinal Model with Random Effects and Serial Correlation';
run;
7
Review
Significant variance estimates for the random effects associated with the intercept and the linear time effect suggest that the intercepts and time slopes vary across subjects.

Covariance Parameter Estimates

                                                Standard      Z
Cov Parm   Subject   Group         Estimate      Error     Value     Pr Z
UN(1,1)    id                        5.1340     0.6045      8.49   <.0001
UN(2,1)    id                       -0.1736     0.1366     -1.27   0.2036
UN(2,2)    id                        0.2149    0.06590      3.26   0.0006
Variance   id        timegroup 1     7.9399     0.8688      9.14   <.0001
SP(POW)    id        timegroup 1     0.1971    0.06965      2.83   0.0047
Variance   id        timegroup 2     1.5141     0.5168      2.93   0.0017
SP(POW)    id        timegroup 2     0.5059     0.2173      2.33   0.0199
Residual                             2.6003     0.3190      8.15   <.0001

Fit Statistics
-2 Res Log Likelihood        11503.5
AIC (smaller is better)      11519.5
AICC (smaller is better)     11519.5
BIC (smaller is better)      11550.7

Null Model Likelihood Ratio Test
DF    Chi-Square    Pr > ChiSq
 7       1146.82        <.0001
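As a quick check on the fit statistics, the reported AIC can be reproduced from the -2 residual log likelihood. The sketch below is illustrative Python (not part of the course material) and assumes the REML convention in which only the covariance parameters are counted in the penalty; the count of eight is read off the Covariance Parameter Estimates table, and the variable names are made up for the example.

```python
# Values copied from the PROC MIXED output on this slide.
neg2_res_loglik = 11503.5

# Assumption: the penalty counts the eight covariance parameters listed
# in the table: UN(1,1), UN(2,1), UN(2,2), two Variance terms,
# two SP(POW) terms, and Residual.
n_cov_parms = 8

aic = neg2_res_loglik + 2 * n_cov_parms
print(f"AIC = {aic:.1f}")  # compare with the reported AIC of 11519.5
```

The result agrees with the value in the Fit Statistics table, which supports reading the penalty as twice the number of covariance parameters.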
8
Review

Solution for Fixed Effects

                                 Standard
Effect              Estimate        Error      DF    t Value    Pr > |t|
Intercept             7.7302       0.2227     875      34.71      <.0001
time                 -1.0433      0.07799     664     -13.38      <.0001
age                  0.01524      0.01902     338       0.80      0.4236
cigarettes            0.3562      0.07409     895       4.81      <.0001
drug                  0.2702       0.1702    2038       1.59      0.1125
partners             0.04505      0.02056    2099       2.19      0.0286
depression          -0.01811     0.007413    2079      -2.44      0.0146
time*age            -0.01326     0.006273     237      -2.11      0.0355
time*cigarettes      -0.1081      0.03024     550      -3.58      0.0004
time*time           -0.08501      0.02942    1005      -2.89      0.0039
time*time*time       0.03698     0.006708     941       5.51      <.0001
Inferences for the fixed effects are similar to those from the model with cubic random time effects and from the final model in Part 3, which modeled the within-subject variation through the R matrix only.
9
Generalized Linear Models for Longitudinal Data

Objectives
- Explain the differences between generalized linear models for independent observations, based on likelihood methods, and those for correlated longitudinal data, based on generalized estimating equations (GEE).
- Show the available correlation structures in the GENMOD procedure.
- Fit a longitudinal data model in PROC GENMOD with the REPEATED statement.
The discussion is based on Zeger and Liang, "Longitudinal Data Analysis for Discrete and Continuous Outcomes", Biometrics 42, 121-130, March 1986.
Another good reference is "Generalized Estimating Equations" by J. Hardin and J. Hilbe, Chapman & Hall, 2003.

10
Generalized Linear Models for Longitudinal Data
Generalized linear models, introduced by Nelder and Wedderburn in 1972 and treated at length in McCullagh and Nelder's 1983 monograph, generalize the ordinary Gaussian linear regression model. The response variable y is assumed to have a distribution from the exponential family, which includes the Gaussian, binomial, gamma, and Poisson distributions; categorical and ordinal responses are handled through the multinomial distribution. The data are assumed to come from independent observations, and estimation is based on likelihood principles.
11
Generalized Linear Models for Longitudinal Data
A key concept is the link function $g$, which relates the mean $E(y) = \mu$ to a linear expression of the explanatory variables, $X\beta$, i.e., $X\beta = g(\mu)$. Having distributions from the exponential family means that generalized linear models also involve a variance function $V$ relating the variance of $y$ to its mean, i.e., $\operatorname{var}(y) = \phi V(\mu)$, where $\phi$ is a constant.
Examples
Model      Distribution   Link Function                       Variance Function
OLS        Normal         $g(\mu) = \mu$                      $V(\mu) = 1$
Logistic   Binomial       $g(\mu) = \log(\mu / (1 - \mu))$    $V(\mu) = \mu(1 - \mu)$
Poisson    Poisson        $g(\mu) = \log(\mu)$                $V(\mu) = \mu$
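The three link and variance function pairs in the examples table can be written out directly. This is a minimal Python sketch for illustration only; the function names and the mean value 0.15 are hypothetical and do not come from the course notes.

```python
import math

# Link functions g(mu): map the mean onto the scale of the linear predictor.
def identity_link(mu):
    return mu                          # OLS / Normal

def logit_link(mu):
    return math.log(mu / (1.0 - mu))   # Logistic / Binomial

def log_link(mu):
    return math.log(mu)                # Poisson

# Variance functions V(mu): var(y) = phi * V(mu).
def var_normal(mu):
    return 1.0

def var_binomial(mu):
    return mu * (1.0 - mu)

def var_poisson(mu):
    return mu

mu = 0.15  # hypothetical mean, e.g. a probability of wheezing
print(logit_link(mu), var_binomial(mu))
```

For the binomial case the variance is largest at a mean of 0.5 and shrinks toward 0 as the mean approaches 0 or 1, which is why a constant-variance model is a poor fit for binary outcomes.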
19
Example Wheezing Study
An example is now presented of a binomial model for longitudinal data. The study assesses the health effects of air pollution on children, and the data consist of repeated measures of wheezing status for each of 537 children from Steubenville, Ohio. The measurements were taken at ages 7, 8, 9, and 10 years. The smoking status of the mother in the first year of the study is also recorded. The data are available on the OZDATA web site in Australia.
The variables in the data set are:
case     patient identification number
wheeze   wheezing status of the child (1 = yes, 0 = no)
age      age of child when measurement was taken (in years)
smoker   smoking status of mother (YES versus NO)
21
Example Wheezing Study

Line Listing of Wheezing Data

Obs    smoker    case    age    wheeze
  1    No           1      7         0
  2    No           1      8         0
  3    No           1      9         0
  4    No           1     10         0
  5    No           2      7         0
  6    No           2      8         0
  7    No           2      9         0
  8    No           2     10         0
  9    No           3      7         0
 10    No           3      8         0
 11    No           3      9         0
 12    No           3     10         0
 13    No           4      7         0
 14    No           4      8         0
 15    No           4      9         0
 16    No           4     10         0
 17    No           5      7         0
 18    No           5      8         0
 19    No           5      9         0
 20    No           5     10         0
Note that smoker is a time-independent variable while age is time-dependent. Also note that age is in sort order within each subject. If this were not the case, or if some subjects had missing time observations, then the WITHIN=age option would have to be used in the REPEATED statement and age would have to be included in the CLASS statement.
22
Example Wheezing Study
Calculate the logits of wheeze for each age.
proc means data=mixed.wheeze noprint nway;
   class age;
   var wheeze;
   output out=bins sum(wheeze)=wheeze;
run;

data bins;
   set bins;
   /* adding 1 to each count keeps the logit finite when a count is 0 */
   logit = log((wheeze+1)/(_freq_-wheeze+1));
run;

proc gplot data=bins;
   plot logit*age;
   symbol v=star i=none;
   title 'Estimated Logit Plot of Age of Child';
run;
quit;
The NWAY option requests that the output data set contain only one observation for each combination of levels of the CLASS variables (here, one per age).
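The logit calculation in the DATA step can be sketched outside SAS as well. The Python below is an illustration only; it assumes the add-one form log((wheeze + 1)/(_freq_ - wheeze + 1)), and the function name is hypothetical. Because the per-age counts are not listed in these notes, the example uses the overall counts from the Response Profile in the PROC GENMOD output (326 wheezing observations out of 2148).

```python
import math

def empirical_logit(events, total):
    """Add-one empirical logit: log((events + 1) / (total - events + 1)).

    Adding 1 to both counts keeps the result finite when a group has
    zero events or zero non-events.
    """
    return math.log((events + 1) / (total - events + 1))

# Overall counts taken from the Response Profile table: 326 of 2148.
overall_logit = empirical_logit(326, 2148)
print(round(overall_logit, 4))

# A group with no events still yields a finite value.
print(empirical_logit(0, 10))
```

The overall value comes out near -1.72, which sits between the per-group logits quoted on the following slides, as one would expect for a pooled estimate.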
23
Example Wheezing Study
The logit plot of age shows that there is little
change in wheezing status from age 7 to 8, but
after age 8, there seems to be a large drop in
the probability of wheezing.
24
Example Wheezing Study
Calculate the logits of wheeze for each value of smoker.
proc means data=mixed.wheeze noprint nway;
   class smoker;
   var wheeze;
   output out=bins sum(wheeze)=wheeze;
run;

data bins;
   set bins;
   logit = log((wheeze+1)/(_freq_-wheeze+1));
run;

proc gplot data=bins;
   plot logit*smoker;
   symbol v=star i=none;
   title 'Estimated Logit Plot of Smoking Status of Mother';
run;
quit;
25
Example Wheezing Study
The logit plot of smoker shows that mothers who smoke have higher probabilities of having children who wheeze. However, the difference in logits (-1.54 versus -1.82, a gap of 0.28) is smaller than the difference in the logits between ages 8 and 10 (-1.58 versus -2.00, a gap of 0.42).
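To put the logits quoted on this slide on a probability scale, the inverse logit can be applied. This is a small Python sketch for illustration; it assumes the higher logit (-1.54) belongs to children of smoking mothers, which is how the text reads, and the dictionary keys are made-up labels.

```python
import math

def inv_logit(x):
    """Inverse of the logit link: maps a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Logits quoted in the text; the group labels are an assumption.
logits = {
    "mother smokes": -1.54,
    "mother does not smoke": -1.82,
    "age 8": -1.58,
    "age 10": -2.00,
}
probs = {label: inv_logit(value) for label, value in logits.items()}
for label, p in probs.items():
    print(f"{label}: {p:.3f}")

# Gaps on the logit scale, i.e. log odds ratios between the groups.
smoke_gap = logits["mother smokes"] - logits["mother does not smoke"]
age_gap = logits["age 8"] - logits["age 10"]
print(f"smoking gap = {smoke_gap:.2f}, age gap = {age_gap:.2f}")
```

On the probability scale the smoking groups differ by roughly four percentage points and ages 8 and 10 by roughly five, so both raw effects are modest.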
26
Example Wheezing Study
The goal now is to fit a longitudinal model using the unstructured correlation structure. Also, compute the odds ratio for smokers versus non-smokers and for a one-year decrease in age.
proc genmod data=mixed.wheeze desc;
   class case smoker;
   model wheeze = smoker age / dist=bin type3;
   repeated subject=case / corrw covb modelse type=unstr;
   estimate 'smoking' smoker -1 1 / exp;
   estimate 'age' age -1 / exp;
   title 'Longitudinal Model of Wheezing among Children';
run;
27
Example Wheezing Study
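Some comments on the options used on the previous page:
DESC (DESCENDING) reverses the sort order for the levels of the outcome variable, so that PROC GENMOD models the probability that wheeze = 1.
Selected MODEL statement options:
DIST= specifies the probability distribution. DIST=BINOMIAL uses the default logit link function.
TYPE3 requests Type 3 statistics for each effect specified in the MODEL statement. These are likelihood ratio statistics for an ordinary generalized linear model; when a REPEATED statement is present, PROC GENMOD reports score statistics for the GEE analysis instead.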
28
Example Wheezing Study
Selected REPEATED statement options:
CORRW specifies that the final working correlation matrix be printed.
COVB specifies that the parameter estimate covariance matrix be printed.
MODELSE displays an analysis of parameter estimates table using model-based standard errors.
Selected ESTIMATE statement option:
EXP requests that the exponentiated estimate, its standard error, and the confidence bounds be computed. With the logit link, the exponentiated estimate is an odds ratio.
29
Example Wheezing Study

Longitudinal Model of Wheezing among Children

The GENMOD Procedure

Model Information
Data Set              MIXED.WHEEZE
Distribution          Binomial
Link Function         Logit
Dependent Variable    wheeze
Observations Used     2148

Class Level Information
Class     Levels    Values
case         537    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
                    27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
                    44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
                    61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
                    78 79 80 81 82 83 84 85 86 87 ...
smoker         2    No Yes

Response Profile
Ordered                 Total
Value      wheeze    Frequency
1          1               326
2          0              1822
30
Example Wheezing Study
PROC GENMOD is modeling the probability that wheeze='1'.

Parameter Information
Parameter    Effect       smoker
Prm1         Intercept
Prm2         smoker       No
Prm3         smoker       Yes
Prm4         age

Criteria For Assessing Goodness Of Fit
Criterion               DF        Value     Value/DF
Deviance              2145    1819.8893       0.8484
Scaled Deviance       2145    1819.8893       0.8484
Pearson Chi-Square    2145    2146.1646       1.0005
Scaled Pearson X2     2145    2146.1646       1.0005
Log Likelihood                -909.9447
Algorithm converged.

In general, a good-fitting model has a value/degrees of freedom of around 1.0.
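The Value/DF column is simply each statistic divided by its degrees of freedom. The short Python check below is illustrative only; it assumes the degrees of freedom equal the 2148 observations used minus three estimated parameters (the intercept, one smoker contrast, and age), which is consistent with the reported 2145.

```python
# Values copied from the goodness-of-fit table.
deviance = 1819.8893
pearson_chi_sq = 2146.1646

# Assumption: DF = observations used - estimated parameters.
n_obs = 2148
n_params = 3  # Intercept, smoker (No vs Yes), age
df = n_obs - n_params

print(df)
print(round(deviance / df, 4), round(pearson_chi_sq / df, 4))
```

Both ratios are close to the tabulated values, with the Pearson ratio essentially 1 and the deviance ratio somewhat below it.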
31
Example Wheezing Study

Analysis Of Initial Parameter Estimates

                                    Standard     Wald 95% Confidence      Chi-
Parameter          DF   Estimate       Error            Limits          Square    Pr > ChiSq
Intercept           1    -0.5909      0.4648     -1.5019     0.3201       1.62        0.2036
smoker      No      1    -0.2721      0.1235     -0.5141    -0.0301       4.86        0.0275
smoker      Yes     0     0.0000      0.0000      0.0000     0.0000        .           .
age                 1    -0.1134      0.0541     -0.2194    -0.0074       4.40        0.0360
Scale               0     1.0000      0.0000      1.0000     1.0000
NOTE: The scale parameter was held fixed.

GEE Model Information
Correlation Structure           Unstructured
Subject Effect                  case (537 levels)
Number of Clusters              537
Correlation Matrix Dimension    4
Maximum Cluster Size            4
Minimum Cluster Size            4

The initial parameter estimates assume the data are independent. Note that both smoker and age are initially significant at the 0.05 level.
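The Wald chi-square and p-value columns can be reproduced approximately from the printed estimates and standard errors. The Python sketch below is an illustration, not SAS's internal computation; because it starts from rounded table values, the results match the output only to about two decimal places. The function name is hypothetical.

```python
import math

def wald_test(estimate, std_error):
    """Return the 1-df Wald chi-square and its p-value."""
    chi_sq = (estimate / std_error) ** 2
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Estimates and standard errors from the initial (independence) fit.
smoker_chi_sq, smoker_p = wald_test(-0.2721, 0.1235)
age_chi_sq, age_p = wald_test(-0.1134, 0.0541)

print(f"smoker: chi-square = {smoker_chi_sq:.2f}, p = {smoker_p:.4f}")
print(f"age:    chi-square = {age_chi_sq:.2f}, p = {age_p:.4f}")
```

Both p-values fall below 0.05, in line with the statement that smoker and age are initially significant when the repeated measurements are treated as independent.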
32
Example Wheezing Study

Covariance Matrix (Model-Based)
              Prm1          Prm2          Prm4
Prm1       0.16394      -0.01960      -0.01744
Prm2      -0.01960       0.03152     0.0000391
Prm4      -0.01744     0.0000391      0.002104

Covariance Matrix (Empirical)
              Prm1          Prm2          Prm4
Prm1       0.15244      -0.01842      -0.01612
Prm2      -0.01842       0.03175     -0.000153
Prm4      -0.01612     -0.000153      0.001957
Algorithm converged.

Working Correlation Matrix
            Col1      Col2      Col3      Col4
Row1      1.0000    0.3519    0.3096    0.3043
Row2      0.3519    1.0000    0.4715    0.3199
Row3      0.3096    0.4715    1.0000    0.3780
Row4      0.3043    0.3199    0.3780    1.0000

The model-based and empirical covariance matrices are similar, which provides evidence that the correlation structure specified in the model is a reasonable choice.
The Working Correlation Matrix shows that there is some positive autocorrelation within subject, but it does not decrease much over time.
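The diagonal entries of the two covariance matrices are the squared standard errors of the parameter estimates, so the standard errors in the GEE parameter estimate tables can be recovered by taking square roots. A minimal Python sketch, with dictionary keys chosen for readability (Prm1 = Intercept, Prm2 = smoker No, Prm4 = age, following the Parameter Information table):

```python
import math

# Diagonal entries copied from the two covariance matrices.
model_based_var = {"Intercept": 0.16394, "smoker No": 0.03152, "age": 0.002104}
empirical_var = {"Intercept": 0.15244, "smoker No": 0.03175, "age": 0.001957}

model_based_se = {name: math.sqrt(v) for name, v in model_based_var.items()}
empirical_se = {name: math.sqrt(v) for name, v in empirical_var.items()}

for name in model_based_var:
    print(f"{name}: model-based SE = {model_based_se[name]:.4f}, "
          f"empirical SE = {empirical_se[name]:.4f}")
```

The two sets of standard errors differ by only a few percent for each parameter, which is the numerical sense in which the matrices are similar.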
33
Example Wheezing Study

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                               Standard       95% Confidence
Parameter          Estimate       Error            Limits             Z    Pr > |Z|
Intercept           -0.6011      0.3904     -1.3663     0.1642    -1.54      0.1237
smoker      No      -0.2534      0.1782     -0.6026     0.0959    -1.42      0.1550
smoker      Yes      0.0000      0.0000      0.0000     0.0000      .         .
age                 -0.1149      0.0442     -0.2016    -0.0282    -2.60      0.0094

Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates

                               Standard       95% Confidence
Parameter          Estimate       Error            Limits             Z    Pr > |Z|
Intercept           -0.6011      0.4049     -1.3947     0.1925    -1.48      0.1377
smoker      No      -0.2534      0.1776     -0.6014     0.0946    -1.43      0.1536
smoker      Yes      0.0000      0.0000      0.0000     0.0000      .         .
age                 -0.1149      0.0459     -0.2048    -0.0250    -2.51      0.0122
Scale                1.0000       .           .          .          .         .
NOTE: The scale parameter was held fixed.

Since the MODELSE option is used, both the model-based and empirical standard errors are listed. The empirical standard errors are more robust and do not depend on the working correlation matrix being correctly specified. The model-based standard errors are based on the assumed correlation structure and are better estimates if that assumption is correct. Note that smoker is no longer significant at the 0.05 level (p = 0.1550 with empirical standard errors, versus 0.0275 in the initial fit), while age is more significant than before (p = 0.0094 versus 0.0360).
34
Example Wheezing Study

Score Statistics For Type 3 GEE Analysis

                    Chi-
Source     DF     Square    Pr > ChiSq
smoker      1       1.90        0.1677
age         1       6.34        0.0118

Contrast Estimate Results

                            Standard                                      Chi-
Label           Estimate       Error    Alpha    Confidence Limits      Square    Pr > ChiSq
smoking           0.2534      0.1782     0.05    -0.0959    0.6026        2.02        0.1550
Exp(smoking)      1.2884      0.2296     0.05     0.9086    1.8269
age               0.1149      0.0442     0.05     0.0282    0.2016        6.75        0.0094
Exp(age)          1.1218      0.0496     0.05     1.0286    1.2234

The score statistics for the Type 3 tests are similar to the results on the previous page based on the empirical and model-based standard errors. For small sample sizes, the score test is recommended. The Contrast Estimate Results table is produced by the ESTIMATE statements. The EXP option produces the exponentiated values, which are the odds ratios. So the odds of wheezing are an estimated 29% higher if the mother smokes, and there is a 12% increase in the odds of wheezing for each one-year decrease in age (the ESTIMATE statement uses a coefficient of -1 on age; equivalently, the odds fall as age increases, since the age coefficient is negative).
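The Exp(...) rows of the Contrast Estimate Results table can be reproduced from the contrast estimates and their standard errors. The Python sketch below is illustrative; it assumes a delta-method standard error (odds ratio times the standard error of the log odds ratio) and Wald limits exponentiated from the log scale, which is consistent with the printed values but is not taken from SAS documentation here. The function name is hypothetical.

```python
import math

def odds_ratio_summary(estimate, std_error, z=1.959964):
    """Odds ratio, delta-method SE, and 95% limits from a log odds ratio."""
    odds_ratio = math.exp(estimate)
    se_odds_ratio = odds_ratio * std_error      # delta method (assumption)
    lower = math.exp(estimate - z * std_error)  # limits computed on the log
    upper = math.exp(estimate + z * std_error)  # scale, then exponentiated
    return odds_ratio, se_odds_ratio, lower, upper

# 'smoking' contrast: smoker Yes minus smoker No.
smoking = odds_ratio_summary(0.2534, 0.1782)
# 'age' contrast: coefficient -1 on age, i.e. a one-year decrease in age.
age_decrease = odds_ratio_summary(0.1149, 0.0442)

print("smoking:", [round(v, 4) for v in smoking])
print("one-year decrease in age:", [round(v, 4) for v in age_decrease])
```

The confidence interval for the smoking odds ratio includes 1, matching the non-significant p-value of 0.1550, while the interval for the age contrast lies entirely above 1.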
