Causal Relationships with measurement error in the data

Slides: 35
Transcript and Presenter's Notes


1
Causal Relationships with measurement error in
the data
  • A brief introduction
  • by Willem E. Saris

2
Basic concepts
  • Direct effect
  • Indirect effect
  • Spurious relation
  • Joint effect
  • (path diagrams for these four concepts, drawn over the variables x, y, z and w)
3
An example of a model
  • How can these effects be estimated?

4
Decomposition rule
  • The correlation between two variables is equal to
    the sum of
  • - the direct effect,
  • - indirect effects,
  • - spurious relationships and
  • - joint effects between these variables.

5
Expression for the different components
  • Each indirect effect, spurious relation and joint
    effect is equal to the product of the coefficients
    along the path going from one variable to the
    other, where one cannot pass the same variable
    twice and cannot go against the direction of the
    arrows.
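
The decomposition rule can be checked numerically. The sketch below is not taken from the slides: it assumes a hypothetical standardized model with x -> z -> y plus a direct effect x -> y, and the coefficients a, b and c are invented for illustration. Under these assumptions the rule predicts r(x,y) = c + a*b, and a simulation should come close to that value.

```python
import math
import random

# Hypothetical standardized path coefficients (NOT from the presentation).
a = 0.5   # x -> z
b = 0.4   # z -> y
c = 0.3   # x -> y (direct effect)

# Decomposition rule: correlation = direct effect + indirect effect.
expected_r_xy = c + a * b


def simulate_r_xy(n=100_000, seed=1):
    """Simulate the assumed model and return the sample correlation of x and y."""
    rng = random.Random(seed)
    sd_e = math.sqrt(1 - a ** 2)                             # keeps var(z) = 1
    sd_u = math.sqrt(1 - (c ** 2 + b ** 2 + 2 * a * b * c))  # keeps var(y) = 1
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = a * x + rng.gauss(0, sd_e)
        y = c * x + b * z + rng.gauss(0, sd_u)
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print("expected:", expected_r_xy, "simulated:", round(simulate_r_xy(), 3))
```

The simulated value differs slightly from the expected one because it is a sample correlation, which is the point the Estimation slides return to.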

6
Derivations
  • These derivations can also be used to estimate
    the parameters of this model. How?

7
A second example
8
A Structural Equations Model
9
Derivations
10
The Proof
11
The correlations between the variables
  • The effects are equal to the correlations with x1

12
What if x1 is not observed? Can we still
estimate the effects?
13
What happens if we have 4 observed variables?
  • With extra information

14
Identification
  • Of these three equations we need only one to
    determine the value of b41 once we have solved
    b11 and the other coefficients from the first
    three correlation coefficients
  • This model is called overidentified: the
    degrees of freedom are df = 2
  • df = number of correlations - number of
    parameters to be estimated

15
A test is possible
  • If we know that b11 = .7 and that
    r(y1y4) = b11 b41 = .35, it follows that b41 = .5
  • Now we know all coefficients, and two correlations
    are not used yet and can be used to test the
    model
  • Expected values: r(y2y4) = b21 b41 and
    r(y3y4) = b31 b41
  • Observed r(y2y4) - expected r(y2y4)
    = r(y2y4) - b21 b41 = .3 - .6 x .5 = .0
  • Observed r(y3y4) - expected r(y3y4)
    = r(y3y4) - b31 b41 = .5 - .8 x .5 = .1
  • These differences are called residuals.
  • If these residuals are big the model must be
    wrong.
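
The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name and argument names are invented for illustration, and it assumes the one-factor model of the slide, in which the expected correlation between two indicators is the product of their coefficients.

```python
def loading_and_residuals(b11, b21, b31, r14, r24, r34):
    """Solve b41 from r(y1y4), then compute the two unused residuals."""
    b41 = r14 / b11              # from r(y1y4) = b11 * b41
    res24 = r24 - b21 * b41      # observed minus expected r(y2y4)
    res34 = r34 - b31 * b41      # observed minus expected r(y3y4)
    return b41, res24, res34


if __name__ == "__main__":
    # Values as given on the slide.
    b41, res24, res34 = loading_and_residuals(0.7, 0.6, 0.8, 0.35, 0.3, 0.5)
    print(round(b41, 3), round(res24, 3), round(res34, 3))
```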

16
Identification again
  • With 3 observed variables df = 0 and no test is
    possible
  • With 2 observed variables df = -1: no test is
    possible and even the effects cannot be
    estimated
  • If df < 0 the model is not identified
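
The df values quoted on the identification slides follow from a simple count. The sketch below assumes the one-factor model used in this presentation, with one coefficient to estimate per observed variable; the function name is hypothetical.

```python
def degrees_of_freedom(n_observed):
    """df = number of correlations - number of parameters (one-factor model)."""
    n_correlations = n_observed * (n_observed - 1) // 2  # unique off-diagonal elements
    n_parameters = n_observed                            # one coefficient per observed variable
    return n_correlations - n_parameters


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, "observed variables: df =", degrees_of_freedom(k))
```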

17
Estimation
  • The decomposition rules hold only for the
    population correlations and not for the sample
    correlations
  • But normally we know only the sample
    correlations
  • It is easily shown that the solution differs
    depending on which equations are used
  • So an efficient estimation procedure is needed.
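
A small sketch of why the solution depends on the equations used. It treats the second correlation matrix from the LISREL input later in this presentation (.42, .56, .48, .35, .50, .50) as if it were a set of sample correlations, which is an assumption made here for illustration. The first three coefficients are solved from the first three correlations with the standard one-factor identity b11 = sqrt(r12 r13 / r23); b41 is then solved from each of the three remaining equations in turn.

```python
import math

# Treated here as sample correlations (an assumption for illustration).
r12, r13, r23 = 0.42, 0.56, 0.48
r14, r24, r34 = 0.35, 0.50, 0.50

# Solve the first three coefficients from the first three correlations.
b11 = math.sqrt(r12 * r13 / r23)
b21 = math.sqrt(r12 * r23 / r13)
b31 = math.sqrt(r13 * r23 / r12)

# Each remaining equation r(yi y4) = bi1 * b41 gives its own answer for b41.
b41_candidates = [r14 / b11, r24 / b21, r34 / b31]

if __name__ == "__main__":
    print([round(v, 3) for v in b41_candidates])
```

The three candidate values for b41 disagree, so some rule is needed for combining the information in all equations.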

18
Estimation
  • There are several general principles.
  • We will discuss
  • - the Unweighted Least Squares (ULS) procedure
  • - the Weighted Least Squares (WLS) procedure.
  • Both procedures are based on the residuals
    between the sample correlations and the expected
    values of the correlations.

19
Estimation
  • The expected correlations are a function of the
    parameters: fij(p)
  • where p represents the set of parameters of the
    model
  • and fij is the specific function which gives the
    link between the population correlation and the
    parameters for the variables i and j.

20
ULS estimators
  • The ULS procedure looks for the
    parameter values that minimize the unweighted sum
    of squared residuals
  • F_ULS = Σ (rij - fij(p))²
  • where the summation is over all unique elements
    of the correlation matrix.

21
Estimation in this specific case
The program looks for the values of all the
parameters that minimize the function F_ULS
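
To make the minimization concrete, here is a minimal sketch using plain gradient descent on F_ULS for a one-factor model. It is not how LISREL works internally; the step size, starting value and function names are choices made for this illustration. It uses the first correlation matrix from the LISREL input in this presentation and, as a simplifying assumption, sums only over the off-diagonal correlations (the error variances are then 1 minus the squared coefficient).

```python
# Off-diagonal correlations from the first LISREL input (0-based variable indices).
R = {(0, 1): 0.42, (0, 2): 0.56, (1, 2): 0.48,
     (0, 3): 0.35, (1, 3): 0.30, (2, 3): 0.40}


def f_uls(b):
    """Unweighted sum of squared residuals; expected r_ij = b_i * b_j."""
    return sum((r - b[i] * b[j]) ** 2 for (i, j), r in R.items())


def fit_uls(start=0.5, step=0.1, iterations=5000):
    """Minimize f_uls by gradient descent (illustrative, not LISREL's algorithm)."""
    b = [start] * 4
    for _ in range(iterations):
        grad = [0.0] * 4
        for (i, j), r in R.items():
            resid = r - b[i] * b[j]
            grad[i] += -2 * resid * b[j]
            grad[j] += -2 * resid * b[i]
        b = [bk - step * gk for bk, gk in zip(b, grad)]
    return b


if __name__ == "__main__":
    loadings = fit_uls()
    print([round(v, 3) for v in loadings], round(f_uls(loadings), 6))
```

Because every correlation in this matrix is exactly the product of two of the values .7, .6, .8 and .5, the minimum of F_ULS is zero for this input.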
22
WLS estimators
  • The WLS procedure looks for the
    parameter values that minimize the weighted sum
    of squared residuals
  • F_WLS = Σ wij (rij - fij(p))²
  • where the summation is also over all unique
    elements of the correlation matrix.
  • These weights can be chosen in different ways.
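
The weighted function differs from the unweighted one only in the factor wij. The sketch below evaluates both for a one-factor model; the weights used in the example are invented purely for illustration and do not correspond to any particular estimator, and the correlations are those of the second LISREL input in this presentation.

```python
def f_wls(correlations, loadings, weights=None):
    """Weighted sum of squared residuals; with no weights this equals F_ULS."""
    total = 0.0
    for (i, j), r in correlations.items():
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * (r - loadings[i] * loadings[j]) ** 2
    return total


# Second correlation matrix from the LISREL input (0-based variable indices).
R2 = {(0, 1): 0.42, (0, 2): 0.56, (1, 2): 0.48,
      (0, 3): 0.35, (1, 3): 0.50, (2, 3): 0.50}
B = [0.7, 0.6, 0.8, 0.5]  # trial coefficient values

# Hypothetical weights: the residual of r(y2y4) counts double.
W = {pair: 1.0 for pair in R2}
W[(1, 3)] = 2.0

if __name__ == "__main__":
    print(round(f_wls(R2, B), 4), round(f_wls(R2, B, W), 4))
```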

23
ADF estimator
  • Using weights derived from the variances and
    covariances of the sample covariances, the
    Asymptotic Distribution Free (ADF) estimator is
    specified.
  • For any distribution of the observed variables
    this estimator is consistent and provides
    standard errors and a test statistic
  • The problem is that it requires very large
    samples
24
ML estimator
  • The most commonly used procedure, the Maximum
    Likelihood (ML) estimator, can be specified as a
    special case of the WLS estimator.
  • The ML estimator provides standard errors for the
    parameters and a test statistic for the fit of
    the model for much smaller samples
  • but this estimator is developed under the
    assumption that the observed variables have a
    multivariate normal distribution.

25
Standard Procedure for testing S E Models
  • Testing is essential for S E Models
  • The test statistic t used is the value of the
    fitting function at its minimum
  • If the model is correct, t is χ²(df) distributed
  • Normally the model is rejected if t > Cα
  • where Cα is the value of the χ² for which
  • pr(χ²(df) > Cα) = α
  • We come back to this issue later

26
LISREL input
  • estimation and testing a factor model
  • data ni=4 no=400 ma=km
  • km
  • 1.0
  • .42 1.0
  • .56 .48 1.0
  • .35 .30 .40 1.0
  • model ny=4 ne=1 ly=fu,fi te=di,fi ps=di,fi
  • free ly 1 1 ly 2 1 ly 3 1 ly 4 1
  • free te 1 1 te 2 2 te 3 3 te 4 4
  • value 1 ps 1 1
  • out ULS

27
LISREL estimates of the effects of the latent
factor
28
LISREL estimates of the error variances
29
Goodness of fit test
30
LISREL input for different correlation matrix
  • estimation and testing a factor model
  • data ni=4 no=400 ma=km
  • km
  • 1.0
  • .42 1.0
  • .56 .48 1.0
  • .35 .50 .50 1.0
  • model ny=4 ne=1 ly=fu,fi te=di,fi ps=di,fi
  • free ly 1 1 ly 2 1 ly 3 1 ly 4 1
  • free te 1 1 te 2 2 te 3 3 te 4 4
  • value 1 ps 1 1
  • out ULS
31
Estimates of the effects of the latent variable
  • estimation and testing a factor model
  • Number of Iterations = 9
  • LISREL Estimates (Unweighted Least Squares)
  • LAMBDA-Y (estimate, standard error in parentheses, t-value)
             ETA 1
    VAR 1    0.64  (0.05)  14.18
    VAR 2    0.67  (0.04)  15.43
    VAR 3    0.79  (0.05)  15.75
    VAR 4    0.64  (0.05)  14.28
32
Goodness of fit test of the model on the new
correlation matrix
  • Goodness of Fit Statistics
  • W_A_R_N_I_N_G: Chi-square, standard errors,
    t-values and standardized residuals are
    calculated under the assumption of multi-variate
    normality.
  • Degrees of Freedom = 2
  • Normal Theory Weighted Least Squares Chi-Square
    = 19.62 (P = 0.00)
  • Estimated Non-centrality Parameter (NCP) = 17.62
  • 90 Percent Confidence Interval for NCP
    = (6.96 ; 35.72)
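
The reported test result can be checked by hand for this particular case. The sketch below relies on the fact that for df = 2 the chi-square tail probability has the closed form exp(-t/2); this shortcut does not hold for other df, where a statistics library would be needed. The significance level of .05 is an assumption chosen for the example, and the variable names are hypothetical.

```python
import math

chi_square = 19.62   # test statistic reported in the output
df = 2               # degrees of freedom reported in the output
alpha = 0.05         # significance level assumed for this example

# For df = 2 only: pr(chi2 > t) = exp(-t / 2).
p_value = math.exp(-chi_square / 2)

# Critical value C_alpha for df = 2, from exp(-C / 2) = alpha.
critical_value = -2 * math.log(alpha)

# Estimated non-centrality parameter: chi-square minus df.
ncp = chi_square - df

reject_model = chi_square > critical_value

if __name__ == "__main__":
    print(p_value, round(critical_value, 3), round(ncp, 2), reject_model)
```

Under these assumptions the statistic lies far above the critical value, which agrees with the reported P of 0.00 and with the reported NCP of 17.62.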
33
General Approach
  • A model is specified with observed and latent
    variables
  • Correlations (covariances) between the observed
    variables can be expressed in the parameters of
    the model (decomposition rules)
  • If the model is identified the parameters can be
    estimated
  • A test of the model can be performed if df > 0
  • Possible misspecifications can be detected
  • Corrections to the model can be introduced

34
Important Result
  • The distinction between observed and latent
    variables makes the estimation of error
    variances possible
  • The errors in social science survey data can be
    quite large.
  • These errors will bias the estimates if not taken
    into account
  • So the SEM approach has important advantages