Title: slide show for twin workshop 1998
1GENETIC ANALYSIS OF BINARY and CATEGORICAL
TRAITS PART ONE
2TABLE 1. Twin Pair Concordances for Major
Depression (Virginia Twin Study data, adapted
from Neale and Cardon, 1992)
32 x concordant affected pairs discordant
pairs 2 x Total Pairs
Prevalence e.g. for MZ pairs e.g. for
DZ pairs
29.2
34.3
Prevalance proportion of affected (alcoholic)
twins in the general population.
42 x concordant affected pairs 2 x concordant
affected pairs discordant pairs
Probandwise concordance rate e.g. for MZ
pairs e.g. for DZ pairs
48.3
41.7
Probandwise concordance rate probability that
cotwin of a depressed twin will also have a
history of depression.
5 Why do we have (2 x number of concordant
affected pairs) in the numerator and denominator
of the expression for the probandwise concordance
rate? Consider a simple example where there are
4 affected individuals, who came from 3 twin
pairs, ie, 1 0 1 0 1 1 There are 4
potential probands, so if we randomly select an
affected individual, the probability that the
cotwin of that individual is also affected will
be 50
648.329.2
e.g. for MZ pairs 1.65 e.g. for DZ pairs
1.22
41.734.3
7Odds Ratios
- for binary data, a widely used measure of
association, especially in epidemiology
MZ Odds Ratio for depression 3.46 DZ Odds Ratio
for depression 1.64
Also can be estimated via a multiple logistic
regression model to allow statistical control for
covariates. In some applications, a probit model
may be used instead (see later) in general,
logistic regression and probit regression models
lead to almost identical conclusions about the
statistical significance of an association.
8TABLE 1a. Twin Pair Concordances for Alcohol
Dependence (DSM-IIIR) (Virginia Twin Study
data, from Kendler et al., 1992)
9Number of concordant alcoholic pairs N pairs
x prevalence x probandwise concordance MZ 1
5 pairs DZ 11 pairs Number of discordant
pairs 2 x N pairs x prevalence x (1 -
probandwise concordance) MZ 65 pairs DZ 68
pairs Number of concordant unaffected
pairs MZ 510 pairs DZ 361 pairs
10a) Normal Liability Threshold Model
b) Multiple-threshold Model
t
1
t
1
t
2
0
0
UNAFFECTED
AFFECTED
UNAFFECTED
MILD
SEVERE
CASES
CASES
Alcoholism Risk
Alcoholism Risk
11CUMULATIVE NORMAL FREQUENCY DISTRIBUTION
12Table 3. Population distribution of pairs of
relatives with both alcoholic, neither alcoholic,
or only one relative alcoholic, as a function of
(i) lifetime prevalence of alcoholism, and (ii)
liability correlation for alcoholism in relatives
ai.e. Probandwise concordance rate
13EXAMPLE DATA-FILE FOR MX RAW ORDINAL DATA MZF
DEPRESSION DATA (depmzf.rec)
Twin A Twin B Frequency 0 0 329 0 1 83 1 0 95
1 1 83 See MX manual for fit function (pp 89-90)
14EXAMPLE DATA-FILE (II) DERIVED FROM PUBLISHED
SOURCES MZF ALCOHOL DEPENDENCE DATA (alcmzf.rec)
0 0 310 0 1 32.5 1 0 32.5 1 1 15
15! tetrachoric.mx ! estimating tetrachoric
correlations define nvar 1 define maxthresf 1
! number of thresholds Analysis of depression
data estimating tetrachorics confidence
intervals data NI3 NG4 LAbels twina twinb
countmz Ordinal fidepmzf.rec ! Count is a
definition variable that we use to tell MX the
frequency count! for each element of the 2x2
table!Definition_variables countmz /Begin
matricesW LO nvar nvar fr ! ww' is the
tetrachoric correlationY LO nvar nvar fr ! yy'
is 1-tetrachoric correlationM FU maxthresf nvar
fi! this is where we will store the thresholds S
DI nvar nvar ! Matrix that will store weight
variableend matricesSP M3 MATRIX M 1.5487!
This tells MX to store the definition variable
count in SSP S-1 mat w 0.7 mat y 0.7
16Begin algebraRWW'EYY'VREend
algebraFREQ S ! tells MX that S contains the
weight (frequency) variableTH MM ! tells MX
that row and column thresholds contained in
MMCO VR_ R'V ! formula for correlation
matrix!bo 0.001 1.0 y(1,1) bo 0.0001 0.999
w(1,1)bo -5.0 5.0 m(1,1)interval r(1,1) !
compute 95 confidence interval for correlation
OPT func1.E-12OPT RSEND
17Analysis of depression data DZmdata NI3LAbels
twina twinb countdzOR fidepdzf.recDefinition_va
riables countdz /Begin matrices W LO nvar nvar
fr ! ww' is the tetrachoric correlation for DZ
groupY LO nvar nvar fr ! yy' is 1-tetrachoric
correlation for DZ groupN FU maxthresf nvar frS
DI nvar nvar ! Matrix that will store weight
variableend matricesSP N6 MATRIX N 1.4487SP
S-1 mat w 0.6 mat y 0.8Begin
algebraRWW'EYY'VREend algebraFREQ
STH NNCO VR_R'V bo 0.001 1.0 y(1,1) bo
0.0001 0.999 w(1,1)bo -5.0 5.0 n(1,1)interval
r(1,1) ! compute 95 confidence interval for
correlation OPT RSEND
18Constraint function - constrain variances to
unity for MZ groupCO NI1Begin matrices group
1U unit 1 nvarend matricesCO \d2v(V)
uendConstraint function - constrain variances
to unity for DZ groupCO NI1Begin matrices
group 2U unit 1 nvarend matricesCO \d2v(V)
uend
19Summary of VL file data for group 1
COUNTMZ TWINA TWINB Code
-1.0000E00 1.0000E00 2.0000E00 Number
4.0000E00 4.0000E00 4.0000E00 Mean
1.4750E02 5.0000E-01 5.0000E-01 Variance
1.1005E04 2.5000E-01 2.5000E-01 Minimum
8.3000E01 0.0000E00 0.0000E00 Maximum
3.2900E02 1.0000E00 1.0000E00
Summary of VL file data for group 2
COUNTDZ TWINA TWINB Code
-1.0000 1.0000 2.0000 Number
4.0000 4.0000 4.0000 Mean
110.0000 0.5000 0.5000 Variance
2882.5000 0.2500 0.2500 Minimum
63.0000 0.0000 0.0000 Maximum
201.0000 1.0000 1.0000
20 PARAMETER SPECIFICATIONS GROUP NUMBER 1
Analysis of depression data estimating
tetrachorics confidence intervals
MATRIX E This is a computed FULL matrix of order
1 by 1 It has no free parameters
specified MATRIX M This is a FULL matrix of
order 1 by 1 1 1 3 MATRIX R
This is a computed FULL matrix of order 1 by
1 It has no free parameters specified
MATRIX S This is a DIAGONAL matrix of order 1
by 1 1 1 -1 MATRIX V This is a
computed FULL matrix of order 1 by 1 It
has no free parameters specified MATRIX W
This is a LOWER TRIANGULAR matrix of order 1
by 1 1 1 1 MATRIX Y This is a
LOWER TRIANGULAR matrix of order 1 by 1
1 1 2
21 GROUP NUMBER 2 Analysis of ordinal alcohol
tolerance and dependence data DZm
MATRIX E This is a computed FULL matrix of
order 1 by 1 It has no free parameters
specified MATRIX N This is a FULL matrix of
order 1 by 1 1 1 6 MATRIX R
This is a computed FULL matrix of order 1 by
1 It has no free parameters specified
MATRIX S This is a DIAGONAL matrix of order 1
by 1 1 1 -1 MATRIX V This is a
computed FULL matrix of order 1 by 1 It
has no free parameters specified MATRIX W
This is a LOWER TRIANGULAR matrix of order 1
by 1 1 1 4 MATRIX Y This is a
LOWER TRIANGULAR matrix of order 1 by 1
1 1 5
22 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of depression data estimating
tetrachorics confidence intervals
MATRIX E This is a computed FULL matrix of order
1 by 1 YY' 1 1 0.5660
MATRIX M This is a FULL matrix of order 1
by 1 1 1 0.5489 MATRIX R
This is a computed FULL matrix of order 1 by
1 WW' 1 1 0.4340 MATRIX
S This is a DIAGONAL matrix of order 1 by
1 1 1 83.0000 MATRIX V
This is a computed FULL matrix of order 1 by
1 RE 1 1 1.0000
23 MATRIX W This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.6588
MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.7523
Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 0.5489
0.5489 (OBSERVED MATRIX is nonexistent for
raw data) EXPECTED COVARIANCE MATRIX
TWINA TWINB TWINA 1.0000
TWINB 0.4340 1.0000 Function value
of this group 1383.2565 Where the fit
function is -2 Log-likelihood of raw ordinal
24 GROUP NUMBER 2 Analysis of ordinal alcohol
tolerance and dependence data DZm
MATRIX E This is a computed FULL matrix of
order 1 by 1 YY' 1 1
0.8157 MATRIX N This is a FULL matrix of
order 1 by 1 1 1 0.4038
MATRIX R This is a computed FULL matrix of order
1 by 1 WW' 1 1 0.1843
MATRIX S This is a DIAGONAL matrix of order
1 by 1 1 1 63.0000
MATRIX V This is a computed FULL matrix of order
1 by 1 RE 1 1
1.0000
25 MATRIX W This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.4294
MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.9031
Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 0.4038
0.4038 (OBSERVED MATRIX is nonexistent for
raw data) EXPECTED COVARIANCE MATRIX
TWINA TWINB TWINA 1.0000
TWINB 0.1843 1.0000 Function value
of this group 1126.3757 Where the fit
function is -2 Log-likelihood of raw ordinal
26 Your model has 6 estimated parameters and
18 Observed statistics Observed statistics
include 2 constraints. -2 times
log-likelihood of data gtgtgt 2509.632 Degrees of
freedom gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 12 1
Confidence intervals requested in group 1
Matrix Element Int. Estimate Lower
Upper Lfail Ufail R 1 1 1 95.0
0.4340 0.3086 0.5477 0 0
0 0 1 Confidence intervals requested in
group 2 Matrix Element Int. Estimate
Lower Upper Lfail Ufail R 2
1 1 95.0 0.1843 0.0306
0.3316 0 0 0 0 This problem used 0.2 of
my workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.11 Execution 0 0
0 2.85 TOTAL 0 0 0
2.96 Total number of warnings issued 0
__________________________________________________
____________________________ ____________________
__________________________________________________
________
27 Mx startup successful MX-Sunos
version 1.49 ! tetra2.mx ! estimating
tetrachoric correlations The following MX
script lines were read for group 1 DEFINE
NVAR 1 DEFINE MAXTHRESF 1 ! NUMBER OF
THRESHOLDS ANALYSIS OF ALCOHOLISM DATA
ESTIMATING TETRACHORICS CONFIDENCE INTERVALS
DATA NI3 NO2 NG4 LABELS TWINA TWINB COUNTMZ
ORDINAL FIALCMZF.REC Ordinal data read
initiated NOTE Rectangular file contained
4 records with data ! Count is a definition
variable that we use to tell MX the frequency
count ! for each element of the 2x2 table !
DEFINITION_VARIABLES COUNTMZ / NOTE
Definition yields 4 data vectors for analysis
NOTE Vectors contain a total of 8
observations
28 Summary of VL file data for group 1
COUNTMZ TWINA TWINB
Code -1.0000E00 1.0000E00 2.0000E00
Number 4.0000E00 4.0000E00 4.0000E00
Mean 1.4750E02 5.0000E-01 5.0000E-01
Variance 4.3853E04 2.5000E-01 2.5000E-01
Minimum 1.5000E01 0.0000E00 0.0000E00
Maximum 5.1000E02 1.0000E00 1.0000E00
Summary of VL file data for group 2
COUNTDZ TWINA TWINB
Code -1.0000E00 1.0000E00 2.0000E00
Number 4.0000E00 4.0000E00 4.0000E00
Mean 1.1000E02 5.0000E-01 5.0000E-01
Variance 2.1088E04 2.5000E-01 2.5000E-01
Minimum 1.1000E01 0.0000E00 0.0000E00
Maximum 3.6100E02 1.0000E00 1.0000E00
29 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of alcoholism data estimating
tetrachorics confidence intervals
MATRIX E This is a computed FULL matrix of order
1 by 1 YY' 1 1 0.4688
MATRIX M This is a FULL matrix of order 1
by 1 1 1 1.4017 MATRIX
R This is a computed FULL matrix of order 1
by 1 WW' 1 1 0.5312
MATRIX S This is a DIAGONAL matrix of order 1
by 1 1 1 15.0000
30 MATRIX V This is a computed FULL matrix of
order 1 by 1 RE 1 1
1.0000 MATRIX W This is a LOWER TRIANGULAR
matrix of order 1 by 1 1 1
0.7288 MATRIX Y This is a LOWER TRIANGULAR
matrix of order 1 by 1 1 1
0.6847 Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1
1.4017 1.4017 (OBSERVED MATRIX is
nonexistent for raw data) EXPECTED
COVARIANCE MATRIX TWINA TWINB
TWINA 1.0000 TWINB 0.5312 1.0000
Function value of this group 635.6429
Where the fit function is -2 Log-likelihood of
raw ordinal
31 GROUP NUMBER 2 Analysis of ordinal alcohol
tolerance and dependence data DZm
MATRIX E This is a computed FULL matrix of
order 1 by 1 YY' 1 1
0.6482 MATRIX N This is a FULL matrix of
order 1 by 1 1 1 1.2687
MATRIX R This is a computed FULL matrix of
order 1 by 1 WW' 1 1
0.3518 MATRIX S This is a DIAGONAL matrix
of order 1 by 1 1 1
11.0000 MATRIX V This is a computed FULL
matrix of order 1 by 1 RE
1 1 1.0000
32 MATRIX W This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.5931
MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.8051
Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 1.2687
1.2687 (OBSERVED MATRIX is nonexistent for
raw data) EXPECTED COVARIANCE MATRIX
TWINA TWINB TWINA 1.0000
TWINB 0.3518 1.0000 Function value
of this group 572.2531 Where the fit
function is -2 Log-likelihood of raw ordinal
33 MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.8051
Your model has 6 estimated parameters and
18 Observed statistics Observed statistics
include 2 constraints. -2 times
log-likelihood of data gtgtgt 1207.896 Degrees of
freedom gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 12 1
Confidence intervals requested in group 1
Matrix Element Int. Estimate Lower
Upper Lfail Ufail R 1 1 1 95.0
0.5312 0.3367 0.6903 0 0
0 0 1 Confidence intervals requested in
group 2 Matrix Element Int. Estimate
Lower Upper Lfail Ufail R 2
1 1 95.0 0.3518 0.1190
0.5558 0 0 0 0 This problem used 0.2 of
my workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.11 Execution 0 0
0 3.41 TOTAL 0 0 0
3.52 Total number of warnings issued 0
34ESTIMATED TETRACHORIC CORRELATIONS (estimating
separate thresholds for each zygosity group)
35TEST FOR ZYGOSITY DIFFERENCE IN PREVALENCE (takes
into account non-independence!)
1
1
36This approach extends naturally to fitting
univariate genetic models.
37! univariate.mx! fitting a univariate genetic
model to 2x2 datadefine nvar 1define
maxthresf 1 ! number of thresholdsAnalysis of
depression data fitting ACE model data NI3
NG3LAbels twina twinb countmzOrdinal
fidepmzf.rec! Count is a definition variable
that we use to tell MX the frequency count! for
each element of the 2x2 table!Definition_variabl
es countmz /Begin matricesW LO nvar nvar fr !
additive genetic path (Aww')X LO nvar nvar fr
! shared environmental path (Cxx') Y LO nvar
nvar fr ! non-shared environmental path (Eyy')
Z LO nvar nvar fi ! non-additive genetic path
(Dzz')M FU maxthresf nvar fi ! matrix of
thresholdsS DI nvar nvar ! Matrix that will
store weight variableend matricesSP M4
MATRIX M 1.5487! This tells MX to store the
definition variable count in SSP S-1 mat w
0.5mat x 0.5 mat y 0.7
38Begin algebraAWW'CXX'EYY'DZZ'VA
CDEend algebraFREQ S ! tells MX that S
contains the weight (frequency) variableTH MM
! tells MX that row and column thresholds
contained in MMCO VADC_ A'D'C'V !
formula for correlation matrix!bo 0.001 1.0
y(1,1) bo 0.0001 0.999 w(1,1) x(1,1)bo -5.0 5.0
m(1,1)interval a(1,1) c(1,1) e(1,1) ! compute
95 confidence interval for correlation OPT
func1.E-12OPT RSEND
39Analysis of depression data DZmdata NI3
NO4LAbels twina twinb countdzOR
fidepdzf.recDefinition_variables countdz
/Begin matrices group 1 S DI nvar nvar !
Matrix that will store weight variableg DI 1 1
! constant (0.5) for coefficient of additive
genetic componenth DI 1 1 ! constant (0.25)
for coefficient of dominance genetic component n
FU maxthresf nvar fi ! matrix of thresholdsend
matricesSP N5 MATRIX N 1.4487MAT g 0.5MAT h
0.25SP S-1 FREQ STH NNCO Vg_at_Ah_at_DC_
g_at_A'h_at_D'C'V ! formula for correlation
matrix!bo -5.0 5.0 n(1,1)OPT RSEND
40Constraint function - constrain variance to
unity CO NI1Begin matrices group 1U unit 1
nvarend matricesCO \d2v(V) uend
41 Mx startup successful MX-Sunos
version 1.49 ! univariate.mx ! fitting a
univariate genetic model to 2x2 data The
following MX script lines were read for group 1
DEFINE NVAR 1 DEFINE MAXTHRESF 1 ! NUMBER OF
THRESHOLDS ANALYSIS OF DEPRESSION DATA FITTING
ACE MODEL DATA NI3 NO2 NG3 LABELS TWINA
TWINB COUNTMZ ORDINAL FIDEPMZF.REC Ordinal
data read initiated NOTE Rectangular file
contained 4 records with data ! Count is
a definition variable that we use to tell MX the
frequency count ! for each element of the 2x2
table ! DEFINITION_VARIABLES COUNTMZ /
NOTE Definition yields 4 data vectors for
analysis NOTE Vectors contain a total of 8
observations BEGIN MATRICES W LO NVAR NVAR
FR ! ADDITIVE GENETIC PATH (AWW') X LO NVAR
NVAR FR ! SHARED ENVIRONMENTAL PATH (CXX') Y
LO NVAR NVAR FR ! NON-SHARED ENVIRONMENTAL PATH
(EYY') Z LO NVAR NVAR FI ! NON-ADDITIVE
GENETIC PATH (DZZ') M FU MAXTHRESF NVAR FI !
MATRIX OF THRESHOLDS S DI NVAR NVAR ! MATRIX
THAT WILL STORE WEIGHT VARIABLE END MATRICES
42 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of depression data fitting ACE model
MATRIX A This is a computed FULL matrix of
order 1 by 1 WW' 1 1
0.4250 MATRIX C This is a computed FULL
matrix of order 1 by 1 XX'
1 1 1.0000E-08 MATRIX D This is a
computed FULL matrix of order 1 by 1
ZZ' 1 1 0.0000 MATRIX E
This is a computed FULL matrix of order 1 by
1 YY' 1 1 0.5750 MATRIX
M This is a FULL matrix of order 1 by 1
1 1 0.5493
43 MATRIX S This is a DIAGONAL matrix of order
1 by 1 1 1 83.0000 MATRIX
V This is a computed FULL matrix of order 1
by 1 ACDE 1 1
1.0000 MATRIX W This is a LOWER TRIANGULAR
matrix of order 1 by 1 1 1
0.6519 MATRIX X This is a LOWER TRIANGULAR
matrix of order 1 by 1 1 1
1.0000E-04 MATRIX Y This is a LOWER
TRIANGULAR matrix of order 1 by 1
1 1 0.7583 MATRIX Z This is a LOWER
TRIANGULAR matrix of order 1 by 1
1 1 0.0000
44 Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 0.5493
0.5493 (OBSERVED MATRIX is nonexistent for raw
data) EXPECTED COVARIANCE MATRIX
TWINA TWINB TWINA 1.0000 TWINB
0.4250 1.0000 Function value of this
group 1383.2782 Where the fit function is
-2 Log-likelihood of raw ordinal
45 Your model has 5 estimated parameters and
17 Observed statistics Observed statistics
include 1 constraints. -2 times
log-likelihood of data gtgtgt 2509.788 Degrees of
freedom gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 12 3
Confidence intervals requested in group 1
Matrix Element Int. Estimate Lower
Upper Lfail Ufail A 1 1 1 95.0
0.4250 0.1045 0.5325 0 0
0 0 C 1 1 1 95.0 0.0000
0.0000 0.2609 0 0 0 1 E 1 1 1
95.0 0.5750 0.4675 0.6940 0
0 0 0 This problem used 0.1 of my
workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.10 Execution 0 0
015.76 TOTAL 0 0
015.86 Total number of warnings issued 1
__________________________________________________
____________________________ ____________________
__________________________________________________
________
46 Mx startup successful MX-Sunos
version 1.49 ! univar2.mx ! fitting a
univariate genetic model to 2x2 data The
following MX script lines were read for group 1
DEFINE NVAR 1 DEFINE MAXTHRESF 1 ! NUMBER OF
THRESHOLDS ANALYSIS OF ALCOHOL DEPENDENCE DATA
FITTING ACE MODEL DATA NI3 NO2 NG3 LABELS
TWINA TWINB COUNTMZ ORDINAL FIALCMZF.REC
Ordinal data read initiated NOTE
Rectangular file contained 4 records with
data ! Count is a definition variable that we
use to tell MX the frequency count ! for each
element of the 2x2 table ! DEFINITION_VARIABLES
COUNTMZ / NOTE Definition yields 4 data
vectors for analysis NOTE Vectors contain a
total of 8 observations
47 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of alcohol dependence data fitting ACE
model
MATRIX A This is a computed FULL
matrix of order 1 by 1 WW'
1 1 0.3588 MATRIX C This is a computed
FULL matrix of order 1 by 1 XX'
1 1 0.1724 MATRIX D This is a
computed FULL matrix of order 1 by 1
ZZ' 1 1 0.0000 MATRIX E
This is a computed FULL matrix of order 1 by
1 YY' 1 1 0.4688
48 MATRIX M This is a FULL matrix of order 1 by
1 1 1 1.4017 MATRIX S
This is a DIAGONAL matrix of order 1 by 1
1 1 15.0000 MATRIX V This
is a computed FULL matrix of order 1 by 1
ACDE 1 1 1.0000
MATRIX W This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.5990
MATRIX X This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.4152
MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.6847
MATRIX Z This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.0000
49 Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 1.4017
1.4017 (OBSERVED MATRIX is nonexistent for
raw data) EXPECTED COVARIANCE MATRIX
TWINA TWINB TWINA 1.0000
TWINB 0.5312 1.0000 Function value
of this group 635.6429 Where the fit
function is -2 Log-likelihood of raw ordinal
50 Your model has 5 estimated parameters and
17 Observed statistics Observed statistics
include 1 constraints. -2 times
log-likelihood of data gtgtgt 1207.896 Degrees of
freedom gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 12 3
Confidence intervals requested in group 1
Matrix Element Int. Estimate Lower
Upper Lfail Ufail A 1 1 1 95.0
0.3588 0.0000 0.6902 0 0
0 0 C 1 1 1 95.0 0.1724
0.0000 0.5542 0 0 0 0 E 1 1 1
95.0 0.4688 0.3097 0.6628 0
1 0 0 This problem used 0.1 of my
workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.10 Execution 0 0
0 7.55 TOTAL 0 0 0
7.65 Total number of warnings issued 1
__________________________________________________
____________________________ ____________________
__________________________________________________
________
51VIRGINIA TWIN STUDY Female Like-Sex
Pairs Summary Model-Fitting Results
52Model-fitting results Depression in the Virginia
Twin Study
53Singleton Twins?
We can easily handle data where only one twin has
responded. HOWEVER, we are assuming that missing
data are MCAR - Missing Completely at Random. We
can include twins with missing cotwins (indicated
by .) in the same data-file as complete
pairs. Alternatively, if we want to test for
differences in prevalence for complete pairs
versus singles (suggestive of an ascertainment
bias), we can include singleton twins as separate
groups, allowing a test of equality of thresholds.
54EXAMPLE Alcohol Dependence Data from 1992
Survey of the Australian Twin Panel (1981 cohort)
55Table 2. Numbers of twin pairs concordant and
discordant for smoking status in the
Australian twin panel 1981 survey.
56MULTIPLE THRESHOLD MODEL
For n categories, we need to estimate (n-1)
thresholds. The safest way to estimate multiple
thresholds is to estimate t0 t1 t0
t1 (t1 gt 0) t2 t1 t2 (t2 gt 0) and so on.
This is especially important when we estimate
confidence intervals. Note that if L and M
then LM etc. Hence, we merely need
to constrain t1 etc. gt 0.
57! univariate3x3.mx! fitting a univariate genetic
model to 3x3 datadefine nvar 1define
maxthresf 2 ! number of thresholdsAnalysis of
smoking data fitting ACE model data NI3
NG3LAbels twina twinb countmzOrdinal
fismkmzf.rec! Count is a definition variable
that we use to tell MX the frequency count! for
each element of the 3x3 table!Definition_variabl
es countmz /Begin matricesW LO nvar nvar fr !
additive genetic path (Aww')X LO nvar nvar fr
! shared environmental path (Cxx') Y LO nvar
nvar fr ! non-shared environmental path (Eyy')
Z LO nvar nvar fi ! non-additive genetic path
(Dzz')M FU maxthresf nvar fi ! matrix of
thresholdsL LO maxthresf maxthresf ! used to
ensure t1 lt t2S DI nvar nvar ! Matrix that
will store weight variableend matricesSP M45
MATRIX M 1.5487 0.5
58MATRIX L 11 1 ! This tells MX to store the
definition variable count in SSP S-1 mat w
0.5mat x 0.5 mat y 0.7Begin algebraAWW'C
XX'EYY'DZZ'VACDETLMend
algebraFREQ S ! tells MX that S contains the
weight (frequency) variableTH TT ! tells MX
that row and column thresholds contained in
TTCO VADC_ A'D'C'V ! formula for
correlation matrix!bo 0.001 1.0 y(1,1) m(2,1)
bo 0.0001 0.999 w(1,1) x(1,1)bo -5.0 5.0
m(1,1)interval a(1,1) c(1,1) e(1,1) ! compute
95 confidence interval for correlation OPT
func1.E-12OPT RSEND
59Analysis of ordinal smoking data DZmdata NI3
X LAbels twina twinb countdzOR
fismkdzf.recDefinition_variables countdz
/Begin matrices group 1 S DI nvar nvar !
Matrix that will store weight variableg DI 1 1
! constant (0.5) for coefficient of additive
genetic componenth DI 1 1 ! constant (0.25)
for coefficient of dominance genetic component n
FU maxthresf nvar fi ! matrix of thresholdsend
matricesSP N67 MATRIX N 1.4487 0.5MAT g
0.5MAT h 0.25SP S-1 Begin algebraTLNend
algebraFREQ STH TTCO Vg_at_Ah_at_DC_
g_at_A'h_at_D'C'V ! formula for correlation
matrix!bo -5.0 5.0 n(1,1)bo 0.001 1.0
n(2,1)OPT RSEND
60Constraint function - constrain variances to
unityCO NI1Begin matrices group 1U unit 1
nvarend matricesCO \d2v(V) uend
61 Mx startup successful MX-Sunos
version 1.49 ! univariate3x3.mx ! fitting a
univariate genetic model to 3x3 data The
following MX script lines were read for group 1
DEFINE NVAR 1 DEFINE MAXTHRESF 2 ! NUMBER OF
THRESHOLDS ANALYSIS OF SMOKING DATA FITTING ACE
MODEL DATA NI3 NO9 NG3 LABELS TWINA TWINB
COUNTMZ ORDINAL FISMKMZF.REC Ordinal data
read initiated NOTE Rectangular file
contained 9 records with data and
1 records where all data were missing ! Count
is a definition variable that we use to tell MX
the frequency count ! for each element of the
3x3 table ! DEFINITION_VARIABLES COUNTMZ /
NOTE Definition yields 9 data vectors for
analysis NOTE Vectors contain a total of 18
observations BEGIN MATRICES W LO NVAR NVAR
FR ! ADDITIVE GENETIC PATH (AWW') X LO NVAR
NVAR FR ! SHARED ENVIRONMENTAL PATH (CXX') Y
LO NVAR NVAR FR ! NON-SHARED ENVIRONMENTAL PATH
(EYY') Z LO NVAR NVAR FI ! NON-ADDITIVE
GENETIC PATH (DZZ') M FU MAXTHRESF NVAR FI !
MATRIX OF THRESHOLDS L LO MAXTHRESF MAXTHRESF !
USED TO ENSURE T1 lt T2 S DI NVAR NVAR !
MATRIX THAT WILL STORE WEIGHT VARIABLE END
MATRICES
62 Summary of VL file data for group 1
COUNTMZ TWINA TWINB
Code -1.0000E00 1.0000E00 2.0000E00
Number 9.0000E00 9.0000E00 9.0000E00
Mean 1.3689E02 1.0000E00 1.0000E00
Variance 3.1949E04 6.6667E-01 6.6667E-01
Minimum 5.5000E01 0.0000E00 0.0000E00
Maximum 6.2900E02 2.0000E00 2.0000E00
Summary of VL file data for group 2
COUNTDZ TWINA TWINB
Code -1.0000 1.0000 2.0000 Number
9.0000 9.0000 9.0000 Mean
83.0000 1.0000 1.0000 Variance
6923.2778 0.6667 0.6667 Minimum
30.5000 0.0000 0.0000 Maximum
310.0000 2.0000 2.0000
63 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of smoking data fitting ACE model
MATRIX A This is a
computed FULL matrix of order 1 by 1
WW' 1 1 0.5836 MATRIX C
This is a computed FULL matrix of order 1 by
1 XX' 1 1 0.1823 MATRIX
D This is a computed FULL matrix of order 1
by 1 ZZ' 1 1 0.0000
MATRIX E This is a computed FULL matrix of order
1 by 1 YY' 1 1 0.2341
MATRIX L This is a LOWER TRIANGULAR matrix of
order 2 by 2 1 2 1
1.0000 2 1.0000 1.0000
64 MATRIX M This is a FULL matrix of order 2 by
1 1 1 0.2809 2 0.3979
MATRIX S This is a DIAGONAL matrix of order 1
by 1 1 1 190.0000 MATRIX
T This is a computed FULL matrix of order 2
by 1 LM 1 1 0.2809 2
0.6788 MATRIX V This is a computed FULL
matrix of order 1 by 1 ACDE
1 1 1.0000 MATRIX W This is a
LOWER TRIANGULAR matrix of order 1 by 1
1 1 0.7639 MATRIX X This is a
LOWER TRIANGULAR matrix of order 1 by 1
1 1 0.4269
65 MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.4839
MATRIX Z This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.0000
Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 0.2809 0.2809
Threshold 2 0.6788 0.6788 (OBSERVED
MATRIX is nonexistent for raw data)
EXPECTED COVARIANCE MATRIX TWINA
TWINB TWINA 1.0000 TWINB 0.7659
1.0000 Function value of this group
4094.3378 Where the fit function is -2
Log-likelihood of raw ordinal
66 MATRIX W This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.7639
MATRIX X This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.4269
MATRIX Y This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.4839
MATRIX Z This is a LOWER TRIANGULAR matrix of
order 1 by 1 1 1 0.0000
Matrix of EXPECTED thresholds
TWINA TWINB Threshold 1 0.1992 0.1992
Threshold 2 0.6100 0.6100 (OBSERVED
MATRIX is nonexistent for raw data)
EXPECTED COVARIANCE MATRIX TWINA
TWINB TWINA 1.0000 TWINB 0.4741
1.0000 Function value of this group
2765.7220 Where the fit function is -2
Log-likelihood of raw ordinal
67 Your model has 7 estimated parameters and
37 Observed statistics Observed statistics
include 1 constraints. -2 times
log-likelihood of data gtgtgt 6860.060 Degrees of
freedom gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 30 3
Confidence intervals requested in group 1
Matrix Element Int. Estimate Lower
Upper Lfail Ufail A 1 1 1 95.0
0.5836 0.3990 0.7781 0 0
0 1 C 1 1 1 95.0 0.1823
0.0000 0.3510 0 0 0 0 E 1 1 1
95.0 0.2341 0.1958 0.2777 0
1 0 0 This problem used 0.1 of my
workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.24 Execution 0 0
029.75 TOTAL 0 0
030.00 Total number of warnings issued 2
__________________________________________________
____________________________ ____________________
__________________________________________________
________
68SMOKING IN WOMEN
69BIVARIATE GENETIC APPLICATIONS
It is a simple step to modify the univariate
script to allow for bivariate (or even
trivariate) genetic analyses. If the traits
being analyzed have varying numbers of
thresholds, maxthres will be the maximum number
of thresholds, and we will have,
say, MAT M 0.5 -0.5 0 1.0 In the next example,
we analyze Australian twin data on lifetime
history of major depression and current smoking
status. Here, the original raw data are given in
depsmkmf.rec and depsmkdf.rec. Notice that the
data have been sorted -- this will improve the
efficiency of the MX run.
70! ordinal_bivariate.mxdefine nvar 2define
nvar2 4define maxthres 2Analysis of ordinal
depression (0/1) and smoking !
initiation/persistence (0/1/2) data NInvar2
NG3Ordinal fidepsmkmf.recBegin matricesM FU
maxthres nvar frL LO maxthres maxthresW LO nvar
nvar frX LO nvar nvar frY LO nvar nvar frend
matricesMAT L1.01.0 1.0MATRIX M 0.5294
0.7191 0.0 0.5 SP M1 2 0 3 st 0.7 y(1,1)
y(2,2) w(1,1) w(2,2)st 0.2 x(1,1) x(2,2)st 0.2
w(2,1) x(2,1) y(2,1)
71Begin algebraAWW'O\stnd(A)CXX'r\stnd
(C)EYY'q\stnd(E)PACEend algebraTH
LMLMCO P A C _ A' C' P bo
0.001 1.0 y(1,1) y(2,2)bo 0.0001 0.999 x(1,1)
x(2,2) w(1,1) w(2,2) bo -0.999 0.999 x(2,1)
y(2,1) w(2,1)bo 0.001 3.0 m(2,2)bo -5.0 5.0
m(1,1)! interval a(1,1) a(2,2) c(1,1) c(2,2)
e(1,1) e(2,2) o(1,2) r(1,2) q(1,2) OPT
func1.E-12OPT RSEND
72Analysis of ordinal depression and smoking data
DZFdata NInvar2Ordinal fidepsmkdf.recBegin
matrices group 1N FU maxthres nvar frg fu 1
1end matricesMATRIX N0.5781 0.6884 0
0.72 SP N101 102 0 103 mat g0.5TH LN
LN CO P g_at_A C _ g_at_A' C' P bo
0.001 3.0 n(2,2)bo -5.0 5.0 n(1,1)OPT RSEND
73Data constraintCO NI1Begin matrices group
1U unit 1 nvarend matricesCO \d2v(P) uend
74 Summary of VL file data for group 1
Code 1.0000 2.0000 3.0000
4.0000 Number 1013.0000 1286.0000
982.0000 1254.0000 Mean 0.1925
0.6454 0.2169 0.6555 Variance
0.1554 0.7234 0.1699 0.7426 Minimum
0.0000 0.0000 0.0000 0.0000
Maximum 1.0000 2.0000 1.0000
2.0000 Summary of VL file data for group
2 Code 1.0000 2.0000 3.0000
4.0000 Number 598.0000 826.0000
586.0000 786.0000 Mean 0.1940
0.6525 0.2526 0.7468 Variance
0.1564 0.7086 0.1888 0.7820 Minimum
0.0000 0.0000 0.0000 0.0000
Maximum 1.0000 2.0000 1.0000
2.0000
75 WARNING! I am not sure I have
found a solution that satisfies
Kuhn-Tucker conditions for a minimum.
NAG's IFAIL parameter is 6 Looks like I got
stuck here. Check the following 1. The model
is correctly specified 2. Starting values are
good 3. You are not already at the solution
The error can arise if the Hessian is
ill-conditioned You can try resetting it to
an identity matrix and fit from the solution by
putting TH-n on the OU line where n is the
number of refits that you want to do If all
else fails try putting NAG30 on the OU line
and examine the file NAGDUMP.OUT and the NAG
manual
76 MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of ordinal depression (0/1) and smoking
initiation/persistence (0/1/2)
MATRIX A This
is a computed FULL matrix of order 2 by 2
WW' 1 2 1 0.3551 0.0782
2 0.0782 0.6147 MATRIX C This is a
computed FULL matrix of order 2 by 2
XX' 1 2 1
3.5944E-08 5.0292E-05 2 5.0292E-05
1.5118E-01 MATRIX E This is a computed FULL
matrix of order 2 by 2 YY'
1 2 1 0.6451 0.0270 2 0.0270
0.2343 MATRIX L This is a LOWER TRIANGULAR
matrix of order 2 by 2 1 2
1 1.0000 2 1.0000 1.0000
77 MATRIX M This is a FULL matrix of order 2
by 2 1 2 1 0.8249 0.2698
2 0.0000 0.4014 MATRIX O This is a
computed FULL matrix of order 2 by 2
\STND(A) 1 2 1 1.0000
0.1675 2 0.1675 1.0000 MATRIX P This is
a computed FULL matrix of order 2 by 2
ACE 1 2 1 1.0003
0.1053 2 0.1053 1.0001 MATRIX
Q This is a computed FULL matrix of order 2
by 2 \STND(E) 1 2 1
1.0000 0.0695 2 0.0695 1.0000
78 MATRIX R This is a computed FULL matrix of
order 2 by 2 \STND(C) 1
2 1 1.0000 0.6822 2 0.6822 1.0000
MATRIX W This is a LOWER TRIANGULAR matrix of
order 2 by 2 1 2 1
0.5959 2 0.1313 0.7729 MATRIX X This is
a LOWER TRIANGULAR matrix of order 2 by 2
1 2 1 1.8959E-04 2
2.6527E-01 2.8428E-01 MATRIX Y This is a
LOWER TRIANGULAR matrix of order 2 by 2
1 2 1 0.8032 2 0.0337 0.4829
79 Matrix of EXPECTED thresholds
1 2 3 4 Threshold 1
0.8249 0.2698 0.8249 0.2698 Threshold 2
0.8249 0.6712 0.8249 0.6712 (OBSERVED
MATRIX is nonexistent for raw data)
EXPECTED COVARIANCE MATRIX 1
2 3 4 1 1.0003 2
0.1053 1.0001 3 0.3551 0.0783
1.0003 4 0.0783 0.7659 0.1053
1.0001 Function value of this group
6219.6048 Where the fit function is -2
Log-likelihood of raw ordinal
80 WARNING! Minimization may not be
successful. See above CODE RED -
Hessian/precision problem Your model has 15
estimated parameters and 7333 Observed
statistics Observed statistics include 2
constraints. -2 times log-likelihood of data
gtgtgt 10511.502 Degrees of freedom
gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 7318 This problem used
1.2 of my workspace Task
Time elapsed (DDHHMMSS) Reading script data
0 0 0 6.63 Execution 0
0 518.42 TOTAL 0 0
525.04 Total number of warnings issued 2
__________________________________________________
____________________________ ____________________
__________________________________________________
________
81Controlling for Covariates
An advantage of fitting models to raw (binary or
ordinal) data is that we can simultaneously
control for covariates (e.g., age) while fitting
genetic models. To do this, we need to include
covariates as definition variables in our
analysis, and simultaneously model the probit
regression of liability on covariates, so that we
are now testing for genetic effects on the
residual variance in the outcome of interest.
This approach can also be extended to test for
genotype x environment interaction effects
(beyond the scope of this workshop)! Previously,
we have directly estimated threshold values t
that are assumed to be the same for al
individuals of a given gender (and sometimes
zygosity group). Now we must allow thresholds to
differ between individuals as a function of their
covariate values Ci1, Ci2, etc. ti to B1Ci1
B2Ci2 , etc. The regression coefficients B1,
B2, etc. are probit regression coefficients
thus good starting values can be obtained from
standard statistical software such STATA. In the
next program, we estimate probit regression
coefficients and residual twin pair correlation.
82Analysis of regression of alcdep on conduct
(broad), majdep, panatt ! /data2/boulder/probitmu
lt_withpairs.mx ! Second group is DZF complete
pairs data NI9 la cond majdep panatt alcdep
xcond xmjdep xpnatt xlcdep wt ordinal
fifemdzf.dat definition_variables cond majdep
panatt xcond xmjdep xpnatt wt / Begin matrices O
FU 1 6 ! store definition variable (i.e. vara)
here K FU 6 2 fi ! regression coefficient T FU 1
1 fi ! threshold value V FU 1 1 ! variance
(unity) R FU 1 1 fr ! tetrachoric correlation
(to be estimated) W FU 1 1 ! weight variable end
matrices SP O -1 -2 -3 -4 -5 -6 SP W -7 MAT V
1.00 MAT R 0.30 SP T 50 SP K 100 0 101 0 102 0 0
100 0 101 0 102 FREQ W TH -(TT)-OK ! Note
that we now have thresholds for both twins CO
VR_ RV bo -0.999 0.999 r(1,1) interval
r(1,1) OPT RS END
83Analysis of regression of alcdep on conduct
(broad), majdep, panatt ! This group is for
singleton women data NI5 la cond majdep panatt
alcdep wt ordinal fifemsing.dat definition_varia
bles cond majdep panatt wt / Begin matrices O FU
1 3 ! store definition variable (i.e. vara) here
K FU 3 1 fr ! regression coefficient T FU 1 1 fr
! threshold value V FU 1 1 ! variance (unity) W
FU 1 1 ! weight variable end matrices SP O -1 -2
-3 SP W -4 MAT V 1.00 SP T 50 MAT T 0.1 SP
K 100 101 102 MAT K 0.05 0.05 0.05 FREQ W TH
-T-OK CO V OPT func1.E-12 OPT RS END
84 Mx startup successful MX-Sunos
version 1.50c The following MX script
lines were read for group 1 ANALYSIS OF
REGRESSION OF ALCDEP ON CONDUCT (BROAD), MAJDEP,
PANATT ! estimating probit regression
coefficients ! ! /data2/boulder/probitmult_withp
airs.mx ! First group is MZF complete pairs
DATA NI9 NG3 LA COND MAJDEP PANATT ALCDEP
XCOND XMJDEP XPNATT XLCDEP WT ORDINAL
FIFEMMZF.DAT Ordinal data read initiated
NOTE Rectangular file contained 91 records
with data that contained a total of
819 observations DEFINITION_VARIABLES COND
MAJDEP PANATT XCOND XMJDEP XPNATT WT / NOTE
Definition yields 91 data vectors for analysis
NOTE Vectors contain a total of 182
observations BEGIN MATRICES O FU 1 6 !
STORE DEFINITION VARIABLE (I.E. VARA) HERE K FU
6 2 FR ! REGRESSION COEFFICIENT T FU 1 1 FR !
THRESHOLD VALUE V FU 1 1 ! VARIANCE (UNITY) R
FU 1 1 FR ! TETRACHORIC CORRELATION (TO BE
ESTIMATED) W FU 1 1 ! WEIGHT VARIABLE END
MATRICES SP O -1 -2 -3 -4 -5 -6 SP W -7 MAT V
1.00 MAT R 0.60 SP T 50 MAT T 0.1 SP K 100
0 101 0 102 0 0 100 0 101 0 102 MAT K 0.05
0 0.05 0 0.05 0 0 0.05 0 0.05 0 0.05 FREQ
W TH -(TT)-OK ! NOTE THAT WE NOW HAVE
THRESHOLDS FOR BOTH TWINS CO VR_ RV BO -5.0
5.0 T(1,1) K(1,1) K(2,1) K(3,1) BO -0.999 0.999
R(1,1) INTERVAL K(1,1) K(2,1) K(3,1) T(1,1)
R(1,1) OPT FUNC1.E-12 OPT RS END
85 Summary of VL file data for group 1
WT XPNATT XMJDEP XCOND
PANATT MAJDEP Code -7.0000
-6.0000 -5.0000 -4.0000 -3.0000
-2.0000 Number 91.0000 91.0000
91.0000 91.0000 91.0000 91.0000
Mean 7.5385 0.3297 0.5495 0.3077
0.2967 0.5495 Variance 761.9408
0.2210 0.2476 0.2130 0.2087
0.2476 Minimum 1.0000 0.0000
0.0000 0.0000 0.0000 0.0000 Maximum
255.0000 1.0000 1.0000 1.0000
1.0000 1.0000 COND
ALCDEP XLCDEP Code -1.0000
1.0000 2.0000 Number 91.0000
91.0000 91.0000 Mean 0.3516
0.4176 0.4066 Variance 0.2280
0.2432 0.2413 Minimum 0.0000
0.0000 0.0000 Maximum 1.0000
1.0000 1.0000 Summary of VL file data
for group 2 WT XPNATT
XMJDEP XCOND PANATT MAJDEP
Code -7.0000 -6.0000 -5.0000 -4.0000
-3.0000 -2.0000 Number 81.0000
81.0000 81.0000 81.0000 81.0000
81.0000 Mean 6.0864 0.3086
0.5185 0.3704 0.3086 0.5679 Variance
399.1407 0.2134 0.2497 0.2332
0.2134 0.2454 Minimum 1.0000
0.0000 0.0000 0.0000 0.0000
0.0000 Maximum 170.0000 1.0000
1.0000 1.0000 1.0000 1.0000
COND ALCDEP XLCDEP Code
-1.0000 1.0000 2.0000 Number
81.0000 81.0000 81.0000 Mean
0.2840 0.3333 0.4691 Variance
0.2033 0.2222 0.2490 Minimum
0.0000 0.0000 0.0000 Maximum
1.0000 1.0000 1.0000 Summary of
VL file data for group 3
WT PANATT MAJDEP COND
ALCDEP Code -4.0000E00 -3.0000E00
-2.0000E00 -1.0000E00 1.0000E00 Number
1.6000E01 1.6000E01 1.6000E01
1.6000E01 1.6000E01 Mean 6.5000E01
5.0000E-01 5.0000E-01 5.0000E-01
5.0000E-01 Variance 1.7240E04 2.5000E-01
2.5000E-01 2.5000E-01 2.5000E-01 Minimum
1.0000E00 0.0000E00 0.0000E00 0.0000E00
0.0000E00 Maximum 5.4400E02 1.0000E00
1.0000E00 1.0000E00 1.0000E00
86MX PARAMETER ESTIMATES GROUP NUMBER 1
Analysis of regression of alcdep on conduct
(broad), majdep, panatt
MATRIX K
This is a FULL matrix of order 6 by 2
1 2 1 0.6831 0.0000 2 0.3117
0.0000 3 0.2724 0.0000 4 0.0000 0.6831 5
0.0000 0.3117 6 0.0000 0.2724 MATRIX
O This is a FULL matrix of order 1 by 6
1 2 3 4 5 6
1 1.0000 1.0000 1.0000 0.0000 1.0000
0.0000 MATRIX R This is a FULL matrix of
order 1 by 1 1 1 0.5321
MATRIX T This is a FULL matrix of order 1 by
1 1 1 -1.2307 MATRIX V
This is a FULL matrix of order 1 by 1
1 1 1.0000 MATRIX W This is a FULL
matrix of order 1 by 1 1 1
1.0000 Matrix of EXPECTED thresholds
ALCDEP XLCDEP Threshold 1 -0.0365
0.9190 (OBSERVED MATRIX is nonexistent for
raw data) EXPECTED COVARIANCE MATRIX
ALCDEP XLCDEP ALCDEP 1.0000 XLCDEP
0.5321 1.0000 Function value of this group
1017.3444 Where the fit function is -2
Log-likelihood of raw ordinal
87 GROUP NUMBER 2 Analysis of regression of
alcdep on conduct (broad), majdep, panatt
MATRIX K This is a FULL matrix of order
6 by 2 1 2 1 0.6831
0.0000 2 0.3117 0.0000 3 0.2724 0.0000 4
0.0000 0.6831 5 0.0000 0.3117 6 0.0000
0.2724 MATRIX O This is a FULL matrix of
order 1 by 6 1 2 3
4 5 6 1 1.0000 1.0000 1.0000
1.0000 1.0000 1.0000 MATRIX R This is a
FULL matrix of order 1 by 1 1 1
0.2637 MATRIX T This is a FULL matrix of
order 1 by 1 1 1 -1.2307
MATRIX V This is a FULL matrix of order 1
by 1 1 1 1.0000 MATRIX W
This is a FULL matrix of order 1 by 1
1 1 1.0000 Matrix of EXPECTED
thresholds ALCDEP XLCDEP
Threshold 1 -0.0365 -0.0365 (OBSERVED
MATRIX is nonexistent for raw data)
EXPECTED COVARIANCE MATRIX ALCDEP
XLCDEP ALCDEP 1.0000 XLCDEP 0.2637 1.0000
Function value of this group 808.7919
Where the fit function is -2 Log-likelihood of
raw ordinal
88 GROUP NUMBER 3 Analysis of regression of
alcdep on conduct (broad), majdep, panatt
MATRIX K This is a FULL matrix of order
3 by 1 1 1 0.6831 2 0.3117
3 0.2724 MATRIX O This is a FULL matrix
of order 1 by 3 1 2
3 1 1.0000 1.0000 1.0000 MATRIX T This
is a FULL matrix of order 1 by 1
1 1 -1.2307 MATRIX V This is a FULL
matrix of order 1 by 1 1 1
1.0000 MATRIX W This is a FULL matrix of
order 1 by 1 1 1 7.0000
Matrix of EXPECTED thresholds
ALCDEP Threshold 1 -0.0365 (OBSERVED
MATRIX is nonexistent for raw data)
EXPECTED COVARIANCE MATRIX ALCDEP
ALCDEP 1.0000 Function value of this group
916.5047 Where the fit function is -2
Log-likelihood of raw ordinal Your model has
6 estimated parameters and 360 Observed
statistics -2 times log-likelihood of data
gtgtgt 2742.641 Degrees of freedom
gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgt 354 5 Confidence
intervals requested in group 1 Matrix
Element Int. Estimate Lower
Upper Lfail Ufail K 1 1 1 95.0
0.6831 0.5195 0.8462 0 0 0 0
K 1 2 1 95.0 0.3117 0.2012
0.4222 0 0 0 0 K 1 3 1 95.0
0.2724 0.1145 0.4283 0 0 0 0
T 1 1 1 95.0 -1.2307 -1.3034
-1.1590 0 0 0 0 R 1 1 1 95.0
0.5321 0.3863 0.6553 0 0 0 0
1 Confidence intervals requested in group
2 Matrix Element Int. Estimate
Lower Upper Lfail Ufail R 2 1 1
95.0 0.2637 0.0712 0.4408 0
0 0 0 This problem used 0.4 of my
workspace Task Time
elapsed (DDHHMMSS) Reading script data
0 0 0 0.80 Execution 0 0
113.76 TOTAL 0 0
114.56 Total number of warnings issued 11
__________________________________________________
____________________________ ____________________
__________________________________________________
________
89HIGH-RISK SAMPLING SCHEMES
The Ordinal data-option in MX allows us to
analyze twin or family data collected under a
two-stage sampling scheme, where in the first
stage we study a random sample of families, but
in the second stage the probability that a family
will be assigned for interview is a function of
phenotypic values observed at the first stage.
For example, we may decide that we will do
follow-up assessments with all pairs where at
least one twin is affected at stage one, but only
10 of pairs where neither twin was affected at
stage one.
90 To illustrate this, we have created a simulated
data-set, with the following parameters, using
multsim2_2mz.mx and multsim2_2dz.mx.
First, we analyze this data assuming that all
twin pairs (1000 MZ, 1000 DZ pairs) are assessed
at both waves (ordinal_bivariate_simulated.mx).
91SIMULATED TWO-WAVE DATA
92(No Transcript)
93HIGH-RISK SAMPLING SCHEMES (II)
Next, we analyze the data-set that would arise
under our two-stage sampling scheme, using
ordinal_bivariate_hirisk.mx. This is exactly the
same program as in the previous case, except that
we have changed file names! In 90 of cases
where neither twin was affected at stage one, the
stage two phenotypic values are set to missing.
What parameter estimates do we recover in this
case?
94Two-Wave data simulating high-risk sampling
95(No Transcript)
96HIGH-RISK SAMPLING SCHEMES (III)
Notice that we included all pairs who were
assessed at stage one. What happens if we focus
on the stage two phenotype and include only those
pairs who have data at stage two? Data are in
mzlistwise.rec and dzlistwise.rec the program is
univariate_listwise.mx.
When we ignore the wave one data, our estimates
of population prevalence (not unexpectedly) and
genetic and environmental parameters, are
seriously biased!
97HIGH-RISK SAMPLING SCHEMES (IV)
Suppose that instead we acknowledge that our
population is drawn from a population where the
prevalence of the observed trait is 25, and fix
our estimate of the threshold value, t0.67449.
As in the previous example, we limit ourselves to
twin pairs where wave two assessments occurred.
The bias to parameter estimates is substantially
reduced! (There is still a bias, however in
particular, our estimate of the shared
environmental variance is now zero.)
98HIGH-RISK SAMPLING SCHEMES (V)
How do we explain these results? Missing data
theory is an active area of research in
statistics which is concerned with how we should
adjust for missing observations -- which may be
missing because of subject non-response, or
because of sampling design (e.g. our two-stage
sampling design). Missing data theory
distinguishes between data that are
(i) MCAR -- missing completely at random, i.e.
non-response is completely unrelated to the
variable we are studying (plausible for variables
such as finger ridge count). (ii) MAR -- missing
at random, i.e. non-response is random, but the
probability of non-response may vary as a
function of observed trait values (or underlying
latent variables).
99 Suppose we have a 5-level variable with the
following probabilities of missing data at
subsequent follow-up
These data are certainly not MCAR, but they do
meet the definition of MAR.
100 If probability of non-response is (i) determined
by one or more correlated phenotypes that are not
included in the analyses or (ii) partly a
function of the stage-two phenotype (such as
would be the case if individuals who were
unaffected at wave one but had become affected by
the time wave two were more likely not to agree
to be assessed at wave two than individuals who
remained unaffected throughout), missing data
will be non-ignorable. In the case where we
analyzed only the wave two data, but fixed the
prevalence at 25 (i.e. assuming that missingness
is determined by the stage two phenotype),
missingness was still strictly non-ignorable,
since it was determined by wave two and not wave
one phenotypic values. However, since we
simulated a very high test-retest correlation
between wave one and wave two data, analyzing the
data as though they were MAR greatly reduced
biases to estimates of genetic and environmental
parameters.
101HIGH-RISK SAMPLING SCHEMES (VI)
Under certain conditions, missingness is said to
be ignorable, i.e. we can recover estimates of
the underlying population parameters without
needing to adjust for differential rates of
non-response. For our two-stage high-risk
sampling scheme, where we assumed random sampling
at the first stage, but that only 10 of
concordant unaffected pairs are assessed at the
second stage, the stage-two data are MAR.
Provided that we use the Ordinal option in MX (or
the Raw data option, for continuous variables),
and analyze all pairs observed at stage one, we
can recover correct estimates of population
prevalence and of genetic and environmental
parameters.
102 Missing data theory provides a framework for
thinking about several important classes of
problems in behavior genetics (i) clinically
ascertained samples (ii) cooperation or
retention bias (iii) hierarchical or
stage-dependent models of genetic and
environmental influences on substance use
initiation and outcome (e.g. smoking initiation
and persisence) or risk of psychopathology.