Causal Relationships with measurement error in the data - PowerPoint PPT Presentation

1
Causal Relationships with measurement error in
the data

A brief introduction
by
Willem E. Saris

2
Basic concepts
[Figure: path diagrams with the variables x, y, z and w, illustrating a direct effect, an indirect effect, a spurious relation and a joint effect]
3
An example of a model

How can these effects be estimated?

4
Decomposition rule

The correlation between two variables is equal to
the sum of
- the direct effect,
- indirect effects,
- spurious relationships and
- joint effects between these variables.
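As a minimal numerical sketch of the decomposition rule, assume a hypothetical standardized model (not the model of the missing figure) with x -> z (coefficient a), z -> y (b) and x -> y (c). Writing every variable as a weighted sum of independent sources lets us compute the model-implied correlations directly and compare them with the components the rule names: r(x,y) is the direct effect c plus the indirect effect a·b, and r(z,y) is the direct effect b plus the spurious relation a·c (z and y share the cause x). The coefficient values are illustrative only.

```python
# Hypothetical standardized path coefficients (illustrative values only):
# x -> z (a), z -> y (b), x -> y (c).
a, b, c = 0.5, 0.4, 0.3

# Each variable is written as weights on three independent sources:
# x itself, the disturbance of z (e_z) and the disturbance of y (e_y).
x = [1.0, 0.0, 0.0]
z = [a, 1.0, 0.0]          # z = a*x + e_z
y = [c + b * a, b, 1.0]    # y = c*x + b*z + e_y, with z substituted

# Source variances chosen so that x, z and y are standardized (variance 1).
var_x = 1.0
var_ez = 1.0 - a ** 2
var_ey = 1.0 - (y[0] ** 2 * var_x + y[1] ** 2 * var_ez)
source_var = [var_x, var_ez, var_ey]


def cov(u, v):
    """Covariance of two linear combinations of independent sources."""
    return sum(ui * vi * s for ui, vi, s in zip(u, v, source_var))


r_xy, r_zy = cov(x, y), cov(z, y)
print(f"r(x,y) = {r_xy:.2f} = direct {c:.2f} + indirect {a * b:.2f}")
print(f"r(z,y) = {r_zy:.2f} = direct {b:.2f} + spurious {a * c:.2f}")
```

Running it prints `r(x,y) = 0.50 = direct 0.30 + indirect 0.20` and `r(z,y) = 0.55 = direct 0.40 + spurious 0.15`. Joint effects do not appear here because this small model has only one exogenous variable.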

5
Expression for the different components

The indirect effects, spurious relations and joint
effects are each equal to the product
of the coefficients
along the path going from one variable to the
other, where a path may not pass through the same variable
twice and may not go against the direction of an
arrow after having followed the direction of an arrow.
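A minimal sketch of this tracing rule, assuming a recursive model with standardized variables and uncorrelated disturbances (joint effects through correlated exogenous variables are left out of the sketch). A valid path from i to j first goes back against the arrows to some variable `top` and then forward along the arrows, and its two legs may share only `top`. The edge dictionary and its coefficients are hypothetical.

```python
from itertools import product

# Hypothetical recursive model: (cause, effect) -> standardized path coefficient.
EDGES = {("x", "z"): 0.5, ("z", "y"): 0.4, ("x", "y"): 0.3}


def directed_paths(start, end, edges):
    """All paths start -> ... -> end that follow the arrows, as (nodes, product)."""
    if start == end:
        return [([start], 1.0)]
    paths = []
    for (cause, effect), coef in edges.items():
        if cause == start:
            for nodes, value in directed_paths(effect, end, edges):
                paths.append(([start] + nodes, coef * value))
    return paths


def traced_correlation(i, j, edges):
    """Sum of coefficient products over all valid paths between i and j (i != j)."""
    variables = sorted({v for edge in edges for v in edge})
    total = 0.0
    for top in variables:
        legs_i = directed_paths(top, i, edges)
        legs_j = directed_paths(top, j, edges)
        for (nodes_i, value_i), (nodes_j, value_j) in product(legs_i, legs_j):
            # The two legs may only meet at `top`: no variable is passed twice.
            if set(nodes_i) & set(nodes_j) == {top}:
                total += value_i * value_j
    return total


print(round(traced_correlation("x", "y", EDGES), 2))  # direct .3 + indirect .5*.4
print(round(traced_correlation("z", "y", EDGES), 2))  # direct .4 + spurious .5*.3
```

The path z <- x -> y is counted (back to x, then forward), while x -> z <- ... or any path that revisits a variable is rejected by the intersection check.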

6
Derivations

These derivations can also be used to estimate
the parameters of this model. How?
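One possible answer, sketched for a hypothetical three-variable model with standardized variables (x -> z with a, z -> y with b, x -> y with c): write the three correlations with the decomposition rule, r(x,z) = a, r(x,y) = c + ab and r(z,y) = b + ac, and solve these equations for a, b and c. The correlation values below are assumed, not taken from the slides.

```python
def solve_path_model(r_xz, r_xy, r_zy):
    """Solve r_xz = a, r_xy = c + a*b, r_zy = b + a*c for a, b and c."""
    a = r_xz
    denom = 1.0 - r_xz ** 2      # must be nonzero: x and z not perfectly correlated
    b = (r_zy - r_xz * r_xy) / denom
    c = (r_xy - r_xz * r_zy) / denom
    return a, b, c


# Hypothetical population correlations.
a, b, c = solve_path_model(r_xz=0.5, r_xy=0.5, r_zy=0.55)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
```

With three correlations and three unknown coefficients this system has exactly one solution (here a = .5, b = .4, c = .3), so the model is just identified.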

7
A second example
8
A Structural Equations Model
9
Derivations
10
The Proof
11
The correlations between the variables

The effects are equal to the correlations with x1

12
What if x1 is not observed? Can we still
estimate the effects?
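A minimal sketch of why this can work with three observed variables, assuming one unobserved variable x1 with effects b11, b21, b31 on y1, y2, y3 and uncorrelated errors: each observed correlation is a product of two effects, r(y1y2) = b11b21, r(y1y3) = b11b31 and r(y2y3) = b21b31, so b11² = r(y1y2)r(y1y3)/r(y2y3). The correlations .42, .56 and .48 are those of the first LISREL example later in this presentation.

```python
import math


def three_indicator_effects(r12, r13, r23):
    """Effects of one unobserved variable on three indicators, from their correlations."""
    b1 = math.sqrt(r12 * r13 / r23)   # sign fixed by assuming b11 > 0
    return b1, r12 / b1, r13 / b1


b11, b21, b31 = three_indicator_effects(r12=0.42, r13=0.56, r23=0.48)
print(f"b11 = {b11:.2f}, b21 = {b21:.2f}, b31 = {b31:.2f}")
```

This gives b11 = .7, b21 = .6 and b31 = .8. The sign of the effects is not determined by the correlations alone, which is why the sketch assumes b11 > 0.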
13
What happens if we have 4 observed variables?

With extra info

14
Identification

Of these three equations we need only one to
determine the value of b41, once we have solved
b11 and the other coefficients from the first
three correlation coefficients.
This model is called overidentified; its
degrees of freedom are df = 2, where
df = number of correlations - number of parameters to be estimated.

15
A test is possible

If we know that $b_{11} = .7$ and that
$r(y_1y_4) = b_{11}b_{41} = .35$, it follows that $b_{41} = .5$.
Now all coefficients are known, and two correlations
have not been used yet; they can be used to test the
model:
$r(y_2y_4) = b_{21}b_{41}$ and $r(y_3y_4) = b_{31}b_{41}$
$r(y_2y_4) - \hat{r}(y_2y_4) = r(y_2y_4) - b_{21}b_{41} = .3 - .6 \times .5 = .0$
$r(y_3y_4) - \hat{r}(y_3y_4) = r(y_3y_4) - b_{31}b_{41} = .5 - .8 \times .5 = .1$
These differences are called residuals.
If these residuals are large, the model must be
wrong.
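The arithmetic of this test as a short sketch. The effects b11 = .7, b21 = .6 and b31 = .8 and the correlations r(y1y4) = .35, r(y2y4) = .3 and r(y3y4) = .5 are the values used in the residuals above.

```python
b11, b21, b31 = 0.7, 0.6, 0.8
r14, r24, r34 = 0.35, 0.3, 0.5   # observed correlations with y4

b41 = r14 / b11                  # one equation suffices to solve b41
residuals = {
    "y2y4": r24 - b21 * b41,     # observed minus expected correlation
    "y3y4": r34 - b31 * b41,
}
print(f"b41 = {b41:.2f}")
for pair, value in residuals.items():
    print(f"residual {pair}: {value:.2f}")
```

The first residual is zero and the second is .1; whether .1 counts as "large" is exactly what the statistical test discussed later has to decide.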

16
Identification again

With 3 observed variables, df = 0 and no test is
possible.
With 2 observed variables, df = -1: no test is
possible, and even the effects cannot be
estimated.
If df < 0, the model is not identified.
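A minimal sketch of this counting for one unobserved variable measured by n observed variables. It assumes, as in the slides' counting, that only the correlations are counted and that the parameters are the n effects b_i1 (variance of the unobserved variable fixed at 1, error variances following from the diagonal).

```python
def one_factor_df(n_observed):
    """df = number of correlations - number of effects to be estimated."""
    correlations = n_observed * (n_observed - 1) // 2
    parameters = n_observed          # one effect b_i1 per observed variable
    return correlations - parameters


def status(df):
    if df < 0:
        return "not identified"
    if df == 0:
        return "identified, but no test possible"
    return "overidentified: the model can be tested"


for n in (2, 3, 4):
    df = one_factor_df(n)
    print(f"{n} observed variables: df = {df} ({status(df)})")
```

For 2, 3 and 4 observed variables this reproduces df = -1, 0 and 2.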

17
Estimation

The decomposition rules only hold for the
population correlations, not for the sample
correlations.
But normally we know only the sample
correlations.
It is easily shown that the solution differs
depending on the equations used.
So an efficient estimation procedure is needed.
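To see the problem concretely, a minimal sketch that solves b11 from different triples of correlations, treating the two correlation matrices of the LISREL examples later in this presentation as if they were sample correlations. With the first matrix every choice gives the same value; with the second (where r(y2y4) = r(y3y4) = .50) the answer depends on the equations used.

```python
import math

# Correlations of the two example matrices, keyed (i, j) with i < j.
FIRST = {(1, 2): .42, (1, 3): .56, (1, 4): .35, (2, 3): .48, (2, 4): .30, (3, 4): .40}
SECOND = {(1, 2): .42, (1, 3): .56, (1, 4): .35, (2, 3): .48, (2, 4): .50, (3, 4): .50}


def b11_from(r, j, k):
    """Solve b11 from r(y1yj), r(y1yk) and r(yjyk) only."""
    return math.sqrt(r[(1, j)] * r[(1, k)] / r[(j, k)])


for name, r in (("first", FIRST), ("second", SECOND)):
    estimates = [round(b11_from(r, j, k), 3) for j, k in ((2, 3), (2, 4), (3, 4))]
    print(name, estimates)
```

For the second matrix the three triples give about .70, .54 and .63 for the same coefficient, so some rule is needed to combine all the information in the correlations.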

18
Estimation

There are several general principles.
We will discuss
- the Unweighted Least Squares (ULS) procedure
- the Weighted Least Squares (WLS) procedure.
Both procedures are based on the residuals
between the sample correlations and the expected
values of the correlations.

19
Estimation

The expected correlations are functions of the
parameters, $f_{ij}(p)$,
where $p$ represents the set of parameters of the
model
and $f_{ij}$ is the specific function that gives the
link between the population correlation and the
parameters for the variables i and j.

20
ULS estimators

The ULS procedure looks for the
parameter values that minimize the unweighted sum
of squared residuals
$F_{ULS} = \sum (r_{ij} - f_{ij}(p))^2$
where the summation is over all unique elements
of the correlation matrix.
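A minimal sketch of ULS for the one-factor model used in the LISREL examples, assuming f_ij(p) = b_i·b_j for the correlations. With free error variances te_i = 1 - b_i², the diagonal residuals are zero, so only the off-diagonal elements matter. Plain gradient descent is used as a stand-in for whatever iterative algorithm LISREL uses; the step size and number of steps are arbitrary choices. On the matrix of slide 30 it should end near the ULS estimates reported on slide 31.

```python
# Off-diagonal correlations of the second LISREL example (slide 30), keyed (i, j), i < j.
R = {(0, 1): .42, (0, 2): .56, (0, 3): .35, (1, 2): .48, (1, 3): .50, (2, 3): .50}


def f_uls(loadings, r):
    """Unweighted sum of squared residuals r_ij - f_ij(p), with f_ij(p) = b_i * b_j.

    Diagonal elements are left out: with free error variances te_i = 1 - b_i**2
    their residuals are exactly zero.
    """
    return sum((rij - loadings[i] * loadings[j]) ** 2 for (i, j), rij in r.items())


def fit_uls(r, n_vars, steps=5000, rate=0.05):
    """Minimize f_uls by plain gradient descent (a stand-in for LISREL's algorithm)."""
    b = [0.5] * n_vars                       # arbitrary positive starting values
    for _ in range(steps):
        grad = [0.0] * n_vars
        for (i, j), rij in r.items():
            resid = rij - b[i] * b[j]
            grad[i] -= 2 * resid * b[j]
            grad[j] -= 2 * resid * b[i]
        b = [bk - rate * gk for bk, gk in zip(b, grad)]
    return b


loadings = fit_uls(R, 4)
print("effects:", [round(v, 2) for v in loadings])
print("error variances:", [round(1 - v ** 2, 2) for v in loadings])
print("F_ULS at minimum:", round(f_uls(loadings, R), 4))
```

On the first example matrix, which fits the model exactly, the same routine recovers effects of .7, .6, .8 and .5 with F_ULS essentially zero.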

21
Estimation in this specific case
The program looks for the values of all the
parameters that minimize the function $F_{ULS}$.
22
WLS estimators

The WLS procedure looks for the
parameter values that minimize the weighted sum
of squared residuals
$F_{WLS} = \sum w_{ij}(r_{ij} - f_{ij}(p))^2$ where the summation
is also over all unique elements of the
correlation matrix.
These weights can be chosen in different ways.
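A minimal sketch of the weighted fitting function for the same one-factor parametrization f_ij(p) = b_i·b_j. With all weights equal to 1 it reduces to F_ULS. The alternative weights 1/(1 - r_ij²)² are only a hypothetical illustration (the reciprocal of the approximate large-sample variance of a correlation under normality, ignoring the covariances between correlations); they are not the weights LISREL uses. The effect values are the ULS estimates reported on slide 31.

```python
R = {(0, 1): .42, (0, 2): .56, (0, 3): .35, (1, 2): .48, (1, 3): .50, (2, 3): .50}
EFFECTS = [0.64, 0.67, 0.79, 0.64]   # ULS estimates of slide 31


def f_wls(effects, r, weights):
    """Weighted sum of squared residuals w_ij * (r_ij - b_i*b_j)**2."""
    return sum(
        weights[(i, j)] * (rij - effects[i] * effects[j]) ** 2
        for (i, j), rij in r.items()
    )


unit = {key: 1.0 for key in R}                                            # gives F_ULS
precision = {key: 1.0 / (1.0 - rij ** 2) ** 2 for key, rij in R.items()}  # hypothetical

print("F with unit weights:", round(f_wls(EFFECTS, R, unit), 4))
print("F with 1/(1 - r^2)^2 weights:", round(f_wls(EFFECTS, R, precision), 4))
```

Because these weights grow with |r_ij|, residuals of the larger (more precisely estimated) correlations count more heavily, so a WLS minimum can lie at different parameter values than the ULS minimum.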

23
ADF estimator

Using weights derived from the variances and
covariances of the sample covariances, the Asymptotic
Distribution Free (ADF) estimator is specified.
For any distribution of the observed variables,
this estimator is consistent and provides
standard errors and a test statistic.
The problem is that it requires very large
samples.
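A minimal sketch of where the ADF weights come from, assuming the usual fourth-order-moment estimate of the asymptotic covariance matrix of the sample covariances, Gamma(ij,kl) = m_ijkl - s_ij·s_kl, where m_ijkl is the mean product of four deviations from the mean. The ADF weight matrix is the inverse of Gamma (not computed here). The tiny data set is hypothetical. The counting function shows why large samples are needed: with 4 observed variables there are 10 distinct variances and covariances, so Gamma already has 55 distinct elements, each estimated from fourth-order moments.

```python
def adf_gamma(data):
    """Estimate Gamma[(i,j),(k,l)] = m_ijkl - s_ij * s_kl from raw data rows."""
    n, p = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    dev = [[row[k] - means[k] for k in range(p)] for row in data]
    pairs = [(i, j) for i in range(p) for j in range(i + 1)]   # unique covariance elements
    s = {(i, j): sum(d[i] * d[j] for d in dev) / n for i, j in pairs}
    gamma = [
        [sum(d[i] * d[j] * d[k] * d[l] for d in dev) / n - s[(i, j)] * s[(k, l)]
         for (k, l) in pairs]
        for (i, j) in pairs
    ]
    return pairs, gamma


def distinct_gamma_elements(p):
    q = p * (p + 1) // 2          # distinct variances and covariances
    return q * (q + 1) // 2       # distinct elements of the symmetric q x q matrix Gamma


# Hypothetical raw scores on two variables.
DATA = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]]
pairs, gamma = adf_gamma(DATA)
print(pairs)
for row in gamma:
    print([round(v, 3) for v in row])
print("p = 4:", distinct_gamma_elements(4), "distinct elements of Gamma")
```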

24
ML estimator

The most commonly used procedure, the Maximum
Likelihood (ML) estimator, can be specified as a
special case of the WLS estimator.
The ML estimator provides standard errors for the
parameters and a test statistic for the fit of
the model for much smaller samples,
but this estimator was developed under the
assumption that the observed variables have a
multivariate normal distribution.
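A minimal sketch of the normal-theory ML fitting function F_ML = ln|Sigma| + tr(S·Sigma^-1) - ln|S| - p for the one-factor model of these slides, with Sigma = bb' + diag(te). Because Sigma is a diagonal matrix plus a rank-one term, its inverse follows from the Sherman-Morrison identity and its determinant from the matrix determinant lemma. The function names are our own and this is not LISREL's implementation. With S equal to the first example matrix and the effects .7, .6, .8, .5, Sigma reproduces S and F_ML is zero; the test statistic is taken here as (n - 1)·F_ML, which is one common convention.

```python
import math


def full_matrix(lower):
    """Symmetric matrix from a list of lower-triangle rows."""
    p = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(p)] for i in range(p)]


def log_det(m):
    """log|M| by Gaussian elimination; assumes M is positive definite."""
    a = [row[:] for row in m]
    total = 0.0
    for k in range(len(a)):
        total += math.log(a[k][k])
        for i in range(k + 1, len(a)):
            factor = a[i][k] / a[k][k]
            for j in range(k, len(a)):
                a[i][j] -= factor * a[k][j]
    return total


def f_ml(s, b, te):
    """ML fitting function for Sigma = b b' + diag(te)."""
    p = len(b)
    u = [bi / ti for bi, ti in zip(b, te)]          # Theta^-1 b
    c = sum(bi * ui for bi, ui in zip(b, u))        # b' Theta^-1 b
    log_det_sigma = sum(math.log(t) for t in te) + math.log(1 + c)
    us_u = sum(u[i] * s[i][j] * u[j] for i in range(p) for j in range(p))
    trace = sum(s[i][i] / te[i] for i in range(p)) - us_u / (1 + c)
    return log_det_sigma + trace - log_det(s) - p


S = full_matrix([[1.0], [.42, 1.0], [.56, .48, 1.0], [.35, .30, .40, 1.0]])
b = [0.7, 0.6, 0.8, 0.5]
te = [1 - v ** 2 for v in b]
F = f_ml(S, b, te)
print("F_ML:", round(F, 6), " t = (n - 1) * F_ML with n = 400:", round(399 * F, 4))
```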

25
Standard Procedure for testing SE Models

Testing is essential for SE models.
The test statistic t is obtained from the value of the
fitting function at its minimum (for ML, $t = (n - 1)F_{min}$).
If the model is correct, t is $\chi^2(df)$ distributed.
Normally the model is rejected if $t > C_\alpha$,
where $C_\alpha$ is the value of $\chi^2_{df}$ for which
$\Pr(\chi^2_{df} > C_\alpha) = \alpha$.
We come back to this issue later.
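A minimal sketch of this decision rule. To stay self-contained it uses the closed-form chi-square survival function, which is exact only for even df (a limitation of the sketch; statistical libraries provide the general case), and finds C_alpha by bisection. As a usage example it applies the rule to the chi-square of 19.62 with df = 2 reported for the second correlation matrix.

```python
import math


def chi2_sf_even(x, df):
    """Pr(chi2_df > x); this closed form is only valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def critical_value(df, alpha):
    """C_alpha with Pr(chi2_df > C_alpha) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reject(t, df, alpha=0.05):
    """Reject the model when the test statistic exceeds the critical value."""
    return t > critical_value(df, alpha)


print(round(critical_value(2, 0.05), 2))        # 5.99 for df = 2
print(reject(19.62, 2), chi2_sf_even(19.62, 2))
```

For df = 2 the critical value at alpha = .05 is about 5.99, so t = 19.62 leads to rejection; its p-value is about 0.00005, in line with the "P = 0.00" printed by LISREL.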

26
LISREL input

estimation and testing a factor model
data ni=4 no=400 ma=km
km
1.0
.42 1.0
.56 .48 1.0
.35 .30 .40 1.0
model ny=4 ne=1 ly=fu,fi te=di,fi ps=di,fi
free ly 1 1 ly 2 1 ly 3 1 ly 4 1
free te 1 1 te 2 2 te 3 3 te 4 4
value 1 ps 1 1
out ULS

27
LISREL estimates of the effects of the latent
factor
28
LISREL estimates of the error variances
29
Goodness of fit test
30
LISREL input for a different correlation matrix
estimation and testing a factor model
data ni=4 no=400 ma=km
km
1.0
.42 1.0
.56 .48 1.0
.35 .50 .50 1.0
model ny=4 ne=1 ly=fu,fi te=di,fi ps=di,fi
free ly 1 1 ly 2 1 ly 3 1 ly 4 1
free te 1 1 te 2 2 te 3 3 te 4 4
value 1 ps 1 1
out ULS
31
Estimates of the effects of the latent variable
estimation and testing a factor model
Number of Iterations = 9
LISREL Estimates (Unweighted Least Squares)
LAMBDA-Y (estimate, standard error in parentheses, t-value)
          ETA 1
        --------
VAR 1     0.64 (0.05) 14.18
VAR 2     0.67 (0.04) 15.43
VAR 3     0.79 (0.05) 15.75
VAR 4     0.64 (0.05) 14.28
32
Goodness of fit test of the model on the new
correlation matrix
Goodness of Fit Statistics
W_A_R_N_I_N_G: Chi-square, standard errors, t-values and
standardized residuals are calculated under the
assumption of multi-variate normality.
Degrees of Freedom = 2
Normal Theory Weighted Least Squares Chi-Square = 19.62 (P = 0.00)
Estimated Non-centrality Parameter (NCP) = 17.62
90 Percent Confidence Interval for NCP = (6.96 ; 35.72)
33
General Approach

A model is specified with observed and latent
variables
Correlations (covariances) between the observed
variables can be expressed in the parameters of
the model (decomposition rules)
If the model is identified the parameters can be
estimated
A test of the model can be performed if df > 0
Possible misspecifications can be detected
Corrections in the models can be introduced

34
Important Result

The distinction between observed and latent
variables makes the estimation of error
variances possible
The errors in social science survey data can be
quite large.
These errors will bias the estimates if not taken
into account
So the SEM approach has important advantages
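A minimal sketch of this bias, using the decomposition rule from the start of the presentation: if an unobserved variable has a standardized effect of .5 on a second unobserved variable, and each is measured by one observed variable with effects .7 and .8, the observed correlation is .7 × .5 × .8 = .28. All three numbers are hypothetical. Ignoring measurement error would report .28 as the effect; dividing by the two measurement effects recovers .5.

```python
def observed_correlation(effect, q_x, q_y):
    """Correlation of the two observed variables: path x <- xi -> eta -> y."""
    return q_x * effect * q_y


def corrected_effect(r_observed, q_x, q_y):
    """Undo the attenuation when the measurement effects are known."""
    return r_observed / (q_x * q_y)


true_effect, q_x, q_y = 0.5, 0.7, 0.8   # hypothetical values
r_obs = observed_correlation(true_effect, q_x, q_y)
print(f"observed correlation: {r_obs:.2f}")
print(f"corrected effect: {corrected_effect(r_obs, q_x, q_y):.2f}")
```

In practice the measurement effects are not known in advance; estimating them together with the substantive effects is what the models with latent variables above make possible.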
