REALCOM - PowerPoint PPT Presentation

Slides: 32
Provided by: Gold87
1
REALCOM
  • Multilevel models for realistically complex data

An ESRC research project at Bristol University
Methodology and examples for:
  • Measurement errors
  • Multilevel structural equations
  • Multivariate responses at several levels and of
    different types
2
(No Transcript)
3
General Format
  • MATLAB software
  • Free standing executable programs
  • ASCII and worksheet input and output
  • Graphical menu based input specification
  • Model equation display
  • Monitoring of MCMC chains
  • A training manual containing:
      • Outline of methodology
      • Worked-through examples

4
Markov Chain Monte Carlo: a quick introduction
  • A Bayesian simulation-based method that, given
    starting values, samples a new set of parameter
    values at each cycle of a Markov chain.
  • This yields a final chain (after discarding a
    burn-in set) of, say, 5000 sets of values from
    the (joint) posterior distribution of the
    parameters.
  • The posterior is formed by combining the
    likelihood based on the data with a prior
    distribution, which is typically diffuse.
  • These chains are used for inference: e.g. the
    chain mean for a parameter is analogous to the
    point estimate from a likelihood analysis, and
    chain quantiles give interval estimates.
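As a rough illustration of the last point, the sketch below summarises a stored chain for one parameter by discarding a burn-in and taking the mean and a 95% interval. The function name `summarise_chain`, the nearest-rank quantile rule and the toy chain are assumptions made for this example; they are not taken from the REALCOM software.

```python
def summarise_chain(chain, burn_in):
    """Posterior mean and 95% interval from one parameter's MCMC chain."""
    kept = sorted(chain[burn_in:])          # discard the burn-in draws
    n = len(kept)
    mean = sum(kept) / n                    # analogue of a point estimate
    lower = kept[int(0.025 * (n - 1))]      # nearest-rank 2.5% quantile
    upper = kept[int(0.975 * (n - 1))]      # nearest-rank 97.5% quantile
    return mean, (lower, upper)


# Toy "chain" of 1100 stored values; the first 100 are treated as burn-in.
toy_chain = [float(v) for v in range(1100)]
mean, interval = summarise_chain(toy_chain, burn_in=100)
```

A real chain would of course come from the sampler rather than a counting sequence; the toy values are only there to make the arithmetic easy to check.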

5

Consider the simple 2-level model
The parameters in this model are the fixed
coefficients, the two variances and the level 2
residuals.
From suitable starting values the chain eventually
settles down so that sampling is from the true
posterior distribution. We then need to draw enough
samples to provide stable estimates, judged by
suitable convergence criteria. All the MATLAB
routines use MCMC sampling.
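The slide's model equation was not preserved in this transcript. As a hedged sketch, suppose the model is the usual variance components form y_ij = beta0 + u_j + e_ij, with u_j ~ N(0, var_u) and e_ij ~ N(0, var_e). A Gibbs sampler then updates the fixed coefficient, the level 2 residuals and the two variances in turn. The flat prior on beta0, the inverse-gamma(0.001, 0.001) priors, and the names `gibbs_two_level` and `simulate` are assumptions for illustration, not the REALCOM implementation.

```python
import random


def gibbs_two_level(y, n_iter=1500, burn_in=500, seed=1):
    """y: list of groups, each a list of responses. Returns the kept draws."""
    rng = random.Random(seed)
    J = len(y)
    N = sum(len(g) for g in y)
    a = b = 0.001                          # diffuse inverse-gamma prior (assumed)
    beta0, var_u, var_e = 0.0, 1.0, 1.0    # starting values
    u = [0.0] * J
    draws = []
    for it in range(n_iter):
        # 1. Fixed intercept given residuals and variances (flat prior).
        mean_b = sum(yij - u[j] for j, g in enumerate(y) for yij in g) / N
        beta0 = rng.gauss(mean_b, (var_e / N) ** 0.5)
        # 2. Level 2 residuals, one per group.
        for j, g in enumerate(y):
            prec = len(g) / var_e + 1.0 / var_u
            mean_u = sum(yij - beta0 for yij in g) / var_e / prec
            u[j] = rng.gauss(mean_u, prec ** -0.5)
        # 3. Level 2 variance: draw a precision from a gamma, then invert.
        ss_u = sum(uj * uj for uj in u)
        var_u = 1.0 / rng.gammavariate(a + J / 2, 1.0 / (b + ss_u / 2))
        # 4. Level 1 variance, same construction.
        ss_e = sum((yij - beta0 - u[j]) ** 2
                   for j, g in enumerate(y) for yij in g)
        var_e = 1.0 / rng.gammavariate(a + N / 2, 1.0 / (b + ss_e / 2))
        if it >= burn_in:                  # keep draws only after burn-in
            draws.append((beta0, var_u, var_e))
    return draws


def simulate(J=30, n=10, beta0=2.0, sd_u=1.0, sd_e=1.0, seed=0):
    """Hypothetical data: J groups of n responses each."""
    rng = random.Random(seed)
    data = []
    for _ in range(J):
        uj = rng.gauss(0.0, sd_u)
        data.append([beta0 + uj + rng.gauss(0.0, sd_e) for _ in range(n)])
    return data


draws = gibbs_two_level(simulate())
post_mean_beta0 = sum(d[0] for d in draws) / len(draws)
```

With the simulated data above, the posterior mean of the intercept should land near the generating value of 2.0. A real analysis would also inspect the chains for convergence before trusting the summaries.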
6
Measurement errors
  • Continuous variables: a simple example
  • Basic model is
  • With a model of interest e.g.

7
Some assumptions we need to make
  • The measurement error variance is assumed known,
    or alternatively the reliability.
  • We also need a distribution for the true value.
  • An important issue is the value assumed for this
    variance: a sensitivity analysis is useful, and
    we can also give it a prior.
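To see why these assumptions matter, the following sketch assumes the classical additive model (observed value = true value + independent error), which may differ in detail from the slide's lost equations. It simulates a predictor with reliability 0.7 and compares the regression slope on the true values with the slope on the observed values. All numerical settings and the helper `ols_slope` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000
var_true, var_m = 0.7, 0.3                      # hypothetical variances
reliability = var_true / (var_true + var_m)     # R = 0.7
beta = 1.0                                      # slope in the model of interest

x_true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
x_obs = [x + rng.gauss(0.0, var_m ** 0.5) for x in x_true]   # add error
y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]

slope_true = ols_slope(x_true, y)   # close to beta
slope_obs = ols_slope(x_obs, y)     # attenuated towards reliability * beta
```

Under these assumptions the naive slope is pulled towards R times the true slope, which is why the assumed error variance (or reliability) deserves a sensitivity analysis.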

8
2. Misclassification errors
  • Assume a binary (0,1) variable, for example
    whether or not a school pupil is eligible for
    free school meals (yes = 1).
  • We need the probability of observing a zero (no
    eligibility) given that the true value is zero,
    and the probability of observing a one given that
    the true value is zero; likewise the two
    probabilities given that the true value is one.
  • We now assume we know these misclassification
    probabilities, and use a similar target model to
    the one before, with a binary predictor.
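One way to see how known misclassification probabilities carry information about the true value is Bayes' theorem. The sketch below is an illustration only: the function `prob_true_one`, the assumed prevalence of 0.2 and the two error rates are hypothetical and are not taken from the slides or the software.

```python
def prob_true_one(observed, prevalence, p_obs1_given_true0, p_obs0_given_true1):
    """P(true value = 1 | observed value) for a misclassified binary variable."""
    p_obs1_given_true1 = 1.0 - p_obs0_given_true1
    p_obs0_given_true0 = 1.0 - p_obs1_given_true0
    if observed == 1:
        num = p_obs1_given_true1 * prevalence
        den = num + p_obs1_given_true0 * (1.0 - prevalence)
    else:
        num = p_obs0_given_true1 * prevalence
        den = num + p_obs0_given_true0 * (1.0 - prevalence)
    return num / den


# Hypothetical values: 20% truly eligible, 5% false positives, 10% false negatives.
p_if_obs1 = prob_true_one(1, 0.2, 0.05, 0.10)
p_if_obs0 = prob_true_one(0, 0.2, 0.05, 0.10)
```

With these made-up rates, a pupil recorded as eligible is truly eligible with probability of about 0.82, while one recorded as not eligible is truly eligible with probability of about 0.03.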

9
Modelling considerations
  • We can model multivariate continuous measurement
    errors, but only independent binary
    misclassifications.
  • We can allow different measurement error
    variances and covariances for different groups,
    e.g. by gender.
  • In the multivariate case we typically need
    non-zero correlations between measurement errors.
  • Thus, say, if R = 0.7 and the observed correlation
    is 0.8, then we require a measurement error
    correlation > 0.33.
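The 0.33 figure can be reproduced with a short calculation. Assume, as a sketch, two standardised variables that share the same reliability R and follow the additive error model. The observed correlation then equals R times the true correlation plus (1 - R) times the error correlation, and because the true correlation cannot exceed 1 this puts a lower bound on the error correlation. The function name is hypothetical.

```python
def min_error_correlation(reliability, observed_corr):
    """Smallest error correlation that keeps the true correlation <= 1.

    Assumes both variables have the same reliability, so that
    observed_corr = reliability * true_corr + (1 - reliability) * error_corr.
    """
    return (observed_corr - reliability) / (1.0 - reliability)


bound = min_error_correlation(0.7, 0.8)   # (0.8 - 0.7) / 0.3
```

For R = 0.7 and an observed correlation of 0.8 the bound is one third, matching the figure quoted on the slide.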

10
An educational example
  • Maths test score related to prior test scores and
    FSM eligibility.
  • We will look at continuous, correlated and binary
    measurement errors.

Open measurement-error.exe and read file classsize
11
Summary table for analyses
12
Factor analysis and structural equation models
Consider a single-level factor model where we
have several responses on each member of a
sample, where r indexes the response variable
and i the person. This is a special kind of
multivariate model where we assume the residuals
are independent, and the covariance between two
responses is thus given by

A constraint is needed for identifiability and
the default is to choose
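The equations on this slide were lost in transcription. As a hedged sketch, assume the common one-factor form in which response r has loading lambda_r on a single factor with variance factor_var, plus an independent residual. The implied covariance between responses r and s is then lambda_r * lambda_s * factor_var. The code below builds that matrix and shows why a constraint is needed: rescaling the loadings and the factor variance together leaves the covariances unchanged. Fixing the factor variance at 1 is used here as one common choice and is an assumption, as are the function name and the numbers.

```python
def implied_covariance(loadings, residual_vars, factor_var=1.0):
    """Covariance matrix implied by a one-factor model with independent residuals."""
    p = len(loadings)
    return [[loadings[r] * loadings[s] * factor_var
             + (residual_vars[r] if r == s else 0.0)
             for s in range(p)]
            for r in range(p)]


loadings = [0.8, 0.6, 0.5]          # hypothetical loadings
residual_vars = [0.36, 0.64, 0.75]  # hypothetical residual variances

cov = implied_covariance(loadings, residual_vars)

# Doubling every loading while quartering the factor variance gives the
# same matrix, so the data alone cannot separate the two scalings.
cov_rescaled = implied_covariance([2 * l for l in loadings],
                                  residual_vars, factor_var=0.25)
```

Here `cov[0][1]` is 0.8 * 0.6 = 0.48, and `cov_rescaled` agrees with `cov` entry by entry.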
13
Extensions: further factors
  • We can add explanatory variables in addition to
    the
  • (see later) or we can add further factors

As the number of factors increases, we require
further constraints, typically on loading values.
A popular choice is simple structure, with each
response loading on only 1 factor and non-zero
correlations between factors.
14
Extensions: structural variables
  • We can allow the factors themselves to depend on
    further variables e.g.

Or alternatively, but less commonly
15
Two level factor models
Standard formulation

Alternatively
But we shall not consider this case
16
Example PISA data
  • A survey of the reading performance of
    15-year-olds in 32 countries, carried out by the
    OECD in 2000.
  • We use one subscale of 35 items ("retrieving
    information") and look at France and England.
  • First we shall fit one- and two-level models
    assuming the responses are Normal. In fact they
    are binary and ordered, but we come to that later.
  • Open structural-equation.exe and load pisadata

17
Binary and ordered responses
  • Assume a binary response z.
  • We will use the idea of a latent Normal
    distribution. Consider the (factor) model for a
    single response

Where we observe a positive (1) response for our
binary variable z if y is positive, that is
So that we obtain the probit model
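The latent Normal idea can be checked numerically. The sketch below assumes a latent response y equal to a linear predictor eta plus a standard Normal residual, and records z = 1 whenever y is positive. The proportion of ones should then match the probit probability, the standard Normal distribution function evaluated at eta. The unit residual variance, the value eta = 0.5 and the function names are assumptions for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_binary(eta, n, seed=0):
    """Proportion of positive latent responses y = eta + e, with e ~ N(0, 1)."""
    rng = random.Random(seed)
    positives = sum(1 for _ in range(n) if eta + rng.gauss(0.0, 1.0) > 0)
    return positives / n


eta = 0.5                       # hypothetical linear predictor
p_probit = normal_cdf(eta)      # P(z = 1) under the probit model
p_sim = simulate_binary(eta, 50000)
```

The simulated proportion `p_sim` should agree with `p_probit` up to Monte Carlo error.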
18
Ordered data
Consider the cumulative probability of being in
one of the lowest s+1 categories of a p-category
variable, with categories numbered from 0 upwards
and s = 0, ..., p-2. We extend the binary response
model with a set of parameters that define
thresholds for the categories. So suppose we
have a 3-category variable; then for observed
responses
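As a sketch of how thresholds turn a latent Normal response into ordered categories, assume the latent value is y = eta + e with e standard Normal, and that the observed category is the number of thresholds lying below y. The cumulative probability of the lowest s+1 categories is then the Normal distribution function at (threshold_s - eta). The sign convention, the threshold values and the function names are assumptions, since the slide's own equations were not preserved.

```python
import math


def normal_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(thresholds, eta):
    """Category probabilities from p-1 increasing thresholds."""
    cumulative = [normal_cdf(a - eta) for a in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # difference of cumulative probabilities
        previous = c
    return probs


def observed_category(y, thresholds):
    """Category implied by a latent value: count of thresholds below y."""
    return sum(1 for a in thresholds if y > a)


thresholds = [-0.5, 0.7]             # hypothetical, 3-category variable
probs = category_probs(thresholds, eta=0.0)
```

With two thresholds there are three categories, and the three probabilities sum to one.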

19
PISA data with binary/ordered responses
  • In fact all the responses are binary except for 4
    with 3 ordered categories: C9, C14, C20 and C26.
  • Change these responses and rerun the models.
  • Finally, fit explanatory variables Country and
    Gender in the structural part of the model.

20
Multivariate models with responses at 2 levels
  • Consider first 2 Normal responses
  • Superscript indicates level
  • Models are linked via level 2 covariance matrix
  • MCMC algorithm handles missing response data and
    categorical (binary, ordered and unordered) as
    well as Normal data.
  • First example is a repeated measures growth curve
    model

21
Child heights and adult height

Child height as a cubic polynomial with intercept
and slope random at level 2
22
  • Load growthdata.txt and fit the model
  • Results

23
Adult height prediction
  • Suppose we have 2 growth measures and we want a
    regression prediction of the form
  • This leads to
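The prediction formulas on this slide did not survive transcription. As a general illustration, if adult height and two childhood measures are jointly Normal, the regression prediction of adult height is its mean plus coefficients applied to the deviations of the childhood measures from their means, where the coefficients solve (covariance of child measures) x b = (covariances of child measures with adult height). The sketch below does this for the two-measure case. Every number and function name is hypothetical, and none of it comes from the growth data.

```python
def prediction_coefficients(cov_cc, cov_ca):
    """Solve the 2x2 system cov_cc * b = cov_ca for the coefficients b."""
    (s11, s12), (s21, s22) = cov_cc
    det = s11 * s22 - s12 * s21
    b1 = (s22 * cov_ca[0] - s12 * cov_ca[1]) / det
    b2 = (s11 * cov_ca[1] - s21 * cov_ca[0]) / det
    return b1, b2


def predict_adult(child, mean_child, mean_adult, coeffs):
    """Conditional mean of adult height given the two child measures."""
    return mean_adult + sum(b * (c - m)
                            for b, c, m in zip(coeffs, child, mean_child))


cov_cc = [[25.0, 20.0], [20.0, 30.0]]   # hypothetical child covariance matrix
cov_ca = [15.0, 20.0]                   # hypothetical child/adult covariances
coeffs = prediction_coefficients(cov_cc, cov_ca)
predicted = predict_adult((115.0, 146.0), (110.0, 140.0), 175.0, coeffs)
```

In a fitted two-level model the required variances and covariances would be assembled from the estimated model parameters rather than typed in directly.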

24
Mixed response types and missing data
  • Normal and ordered data already considered in
    structural equation models
  • We now introduce unordered categorical responses
  • We can also have general Normalising
    transformations
  • Missing data via imputation is an important
    application for these models

25
Unordered categorical responses
Assume p categories where an individual responds
to just one.
  • We have

    where h indexes the response. For each
    we assume an underlying latent variable
    exists and that we have the following model
  • For identifiability we model p-1 categories and
    assume .
  • The maximum indicant model: we observe category h
    for individual i iff .
  • so that
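The conditions on this slide were lost, so the following is only a sketch of the maximum indicant idea under stated assumptions: each of the p-1 modelled categories has a latent Normal variable, the remaining reference category is given a latent value fixed at zero, and the observed category is the one with the largest latent value. The zero reference value, the independent unit-variance residuals and the function names are all assumptions for illustration.

```python
import random


def chosen_category(latent):
    """Index of the largest latent value; category 0 is the reference at 0."""
    values = [0.0] + list(latent)
    return max(range(len(values)), key=lambda h: values[h])


def category_frequencies(means, n, seed=0):
    """Simulated category proportions for given latent means."""
    rng = random.Random(seed)
    counts = [0] * (len(means) + 1)
    for _ in range(n):
        latent = [m + rng.gauss(0.0, 1.0) for m in means]
        counts[chosen_category(latent)] += 1
    return [c / n for c in counts]


# Hypothetical 3-category variable with both latent means equal to zero.
freqs = category_frequencies([0.0, 0.0], 20000)
```

With both means at zero the reference category is chosen only when both latent values are negative, which happens with probability 0.25 under these assumptions.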

26
(No Transcript)
27
Multiple imputation: briefly and simply
Consider the model of interest (MOI). We turn
this into a multivariate response model and,
from an MCMC chain, obtain residual estimates
for the values which are missing. Use these to
fill in the missing values and produce a complete
data set. Do this (independently) n (e.g. 20) times.
Fit the MOI to each data set and combine the
results according to rules to get estimates and
standard errors.
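The slide does not name the combination rules. Assuming they are the widely used Rubin's rules, the pooled estimate is the average of the per-data-set estimates, and the pooled variance is the average within-imputation variance plus (1 + 1/n) times the between-imputation variance. The function name and the three example fits are hypothetical.

```python
def pool_rubin(estimates, std_errors):
    """Combine one parameter's estimates across imputed data sets."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(se ** 2 for se in std_errors) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total_var = within + (1.0 + 1.0 / m) * between
    return pooled, total_var ** 0.5


# Hypothetical results from fitting the MOI to 3 completed data sets.
pooled_estimate, pooled_se = pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
```

The pooled standard error is larger than any single-fit standard error here because the estimates vary between imputations, which is how the extra uncertainty from the missing data enters.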

28
Class size example
  • Load classsize_impute
  • MOI is Normalised exam score as response,
    regressed on pretest score, gender, FSM and class
    size. 50 level 1 units have missing data.
    Multivariate model

29
MI estimates vs listwise deletion
  • Fixed effects in multivariate model, 50 records
    MCAR

Estimate            Listwise (SE)     MI (SE)           Complete (SE)
Post maths           0.102 (0.088)     0.134 (0.071)     0.134 (0.070)
Pre Maths            0.011 (0.088)     0.032 (0.071)     0.019 (0.071)
Gender               0.096 (0.074)     0.073 (0.047)     0.069 (0.047)
FSM                 -1.124 (0.159)    -1.090 (0.129)    -1.064 (0.129)
Class size (-30)    -4.030 (0.602)    -4.049 (0.597)    -4.267 (0.544)

30
Further extensions
  • Box-Cox normalising transformations
  • Application to survival data, treated as an
    ordered response when divided into discrete time
    intervals
  • Combination of measurement errors, structural
    models and responses at >1 level into a single
    program
  • Incorporation into MLwiN

31
General remarks
  • Report back welcome (h.goldstein_at_bristol.ac.uk)
  • A REALCOM discussion group is under consideration

Use with care!