Revealing the riddle of REML - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

Revealing the riddle of REML

Description:

Weight gain (g) of chicks fed on one of 4 diets. ANOVA vs ... Chick stratum. Diet 3 0.01847 0.00616 0.10 0.958. Residual 12 0.73230 0.06103. Total 15 0.75078 ... – PowerPoint PPT presentation

Number of Views:234
Avg rating:3.0/5.0
Slides: 47
Provided by: MickO5
Category:
Tags: reml | chick | revealing | riddle

less

Transcript and Presenter's Notes

Title: Revealing the riddle of REML


1
Revealing the riddle of REML
  • Mick ONeill
  • Faculty of Agriculture, Food Natural Resources,
    University of Sydney

2
Background
  • Biometry 1 and 2 are core units with an applied
    stats focus. Many students have only Maths in
    Society on entry to the Faculty
  • Biometry 3 is a Third Year elective
  • Biometry 4 is (still) a possible major
  • All students are now expected to design and
    analyse their fourth year experiments with little
    or no help from the Biometry Unit

3
Third year Biometry students can
  • Design and analyse multi-strata factorial
    experiments (split-plots, strip-plots)
  • Perform binomial ordinal logistic regression,
    Poisson regression,
  • Analyse repeated measures data using REML

4
Step 1. What is Maximum Likelihood?
  • The likelihood is the prior probability of
    obtaining the actual data in your sample
  • This requires you to assume that the data follow
    some distribution, typically
  • Binomial or Poisson for count data
  • Normal or LogNormal for continuous data

5
Step 1. What is Maximum Likelihood?
The likelihood is the prior probability of
obtaining the actual data in your sample Each of
these distributions involves at least one unknown
parameter (probability, mean, standard deviation,
) which must be estimated from the data.
6
Step 1. What is Maximum Likelihood?
The likelihood is the prior probability of
obtaining the actual data in your
sample Estimation is achieved by finding that
parameter value which maximises the likelihood
(or equivalently the log-likelihood)
7
Example 1. Binomial data
  • Guess p P(seed germinates)
  • Evaluate LogL
  • Maximise LogL by varying p

8
Example 2. Normal data
  • Guess m and s
  • Evaluate LogL
  • Maximise LogL by varying m and s

9
Step 2. What is REML?
It is possible to partition the likelihood into
two terms
  • a likelihood that involves m (as well as s2)
  • and
  • a residual likelihood that involves only s2

10
Step 2. What is REML?
It is possible to partition the likelihood into
two terms, in such a way that
  • the first likelihood can be maximised to estimate
    m (and its solution does not depend on the value
    of s2)
  • the residual likelihood can be maximised to
    estimate s2 ? REML estimate

11
How?
12
Solutions
  • ML estimate of variance is
  • REML estimate is
  • In each case the estimate of m is the sample mean

13
sn-1
sn
ML estimate
REML estimate
14
Extensions
15
(No Transcript)
16
Example 3. One-way (no blocking)Fixed effects

17
ANOVA vs REML
  • ANOVA
  • Source of variation d.f. s.s.
    m.s. v.r. F pr.
  • Chick stratum
  • Diet 3 0.01847
    0.00616 0.10 0.958
  • Residual 12 0.73230
    0.06103
  • Total 15 0.75078
  • REML Variance Components Analysis
  • Fixed model ConstantDiet
  • Random model Chick
  • Chick used as residual term
  • Residual variance model
  • Term Factor Model(order) Parameter
    Estimate S.e.
  • Chick Identity Sigma2
    0.0610 0.02491
  • Wald tests for fixed effects

18
  • ANOVA
  • Tables of means
  • Grand mean 3.129
  • Diet Diet_1 Diet_2 Diet_3 Diet_4
  • 3.107 3.188 3.112 3.107
  • Standard errors of differences of means
  • Table Diet
  • rep. 4
  • d.f. 12
  • s.e.d. 0.1747
  • REML Variance Components Analysis

19
Example 4a. One-way (in randomised blocks)
fixed treatments
  • ANOVA
  • Source of variation d.f. s.s.
    m.s. v.r. F pr.
  • Block stratum 5 5.410
    1.082 0.29
  • Block.Units stratum
  • Spacing 4 125.661
    31.415 8.50 lt.001
  • Residual 20 73.919
    3.696
  • REML Variance Components Analysis
  • (a) With Block Spacing both fixed effects
  • Term Factor Model(order) Parameter
    Estimate S.e.
  • Residual Identity Sigma2
    3.696 1.169
  • Fixed term Wald statistic d.f.
    Wald/d.f. Chi-sq prob
  • Block 1.46 5
    0.29 0.917
  • Spacing 34.00 4
    8.50 lt0.001

20
Random blocks, fixed treatments
  • ANOVA
  • Source of variation d.f. s.s.
    m.s. v.r. F pr.
  • Block stratum 5 5.410
    1.082 0.29
  • Block.Units stratum
  • Spacing 4 125.661
    31.415 8.50 lt.001
  • Residual 20 73.919
    3.696
  • REML Variance Components Analysis
  • (b) With Spacing fixed and Block random
  • Estimated Variance Components
  • Random term Component S.e.
  • Block 0.000 BOUND
  • Residual variance model
  • Term Factor Model(order) Parameter
    Estimate S.e.
  • Residual Identity Sigma2
    3.173 0.897

Estimated Source
Value Block -0.5228 Error
3.6959
21
Model for RCBD
  • Yield of soybean Overall mean Block effect
    Spacing effect

  • Error
  • Overall mean and Spacing effects are fixed
    effects
  • Block effect is a random term
  • Error is a random term

22
General Linear Mixed Model
  • Yield of soybean Overall mean Block effect
    Spacing effect

  • Error
  • Y Fixed effects Random effects Error term
  • Y Xt Zu e
  • The random effects can be correlated
  • The error term can be correlated
  • The random effects are uncorrelated with the
    error term

23
General Linear Mixed Model
  • Y Fixed effects Random effects Error term
  • Y Xt Zu e

is a scaling factor, often set to 1
24
  • REML is used as the default to estimate to
    variance and covariance parameters of the model
  • The algorithm does not depend on the data being
    balanced

25
Furthermore, different choices for the variance
matrices allow for
  • an appropriate repeated measures analysis for
    normal data
  • an appropriate spatial analysis for field trials

Nested models can be compared using the change in
deviance which is approximately c2 with df
change in number of parameters
26
Example 5. Adjusting thesis marks for random
markers
27
For students
For markers
28
Example 6. Use of devianceWidths (in mm) of the
dorsal shield of larvae of ticks found on 4
rabbits
29
  • Minitabs analysis
  • Source DF Adj MS F P
  • Rabbit 3 602.6 5.26 0.004
  • Error 33 114.5
  • Estimated
  • Source Term Source Value
  • 1 Rabbit (2) 9.0090 (1) Rabbit 54.18
  • 2 Error (2) Error 114.48
  • Rabbit Mean
  • 1 372.3
  • 2 354.4
  • 3 355.3
  • 4 361.3 these are sample means

30
  • GenStats Linear Mixed Models analysis
  • Random term Component S.e.
  • Rabbit 55.0 55.8
  • Residual variance model
  • Term Model(order) Parameter Estimate
    S.e.
  • Residual Identity Sigma2 114.4
    28.2
  • Table of predicted means for Rabbit (these are
    BLUPs)
  • Rabbit 1 2 3 4
  • 369.9 355.5 356.0 361.2
  • Standard error of differences Average
    4.613
  • Maximum
    5.055
  • Minimum
    4.133
  • Average variance of differences
    21.38

31
  • Test H0
  • Method drop Rabbit as a random term
  • Deviance d.f.
  • 221.21 35 for reduced model
  • 215.22 34
  • Change in deviance 6.0 with 1 df
  • P-value 0.014
  • The variation in the widths of the dorsal shield
    of larvae of ticks found among rabbits differs
    significantly across rabbits (P 0.014)
  • The variance among rabbits is estimated to be
    55.0 (? 55.7) compared to the variance within
    rabbits, namely 114.4 (? 28.2)

32
Example 7 - Repeated Measures
Growth of a fungus (in cm) over time
33
Growth of fungus
34
  • Split plot without Greenhouse-Geisser adjustmment
  • (assumes equi-correlation structure among times)
  • Source of variation d.f. m.s. v.r.
    F pr.
  • Rep.Fungus stratum
  • Fungus 2 8.104 97.30
    lt.001
  • Residual 9 0.083 3.37
  • Rep.Fungus.Time stratum
  • Time 5 55.231 2233.21
    lt.001
  • Fungus.Time 10 0.933 37.71
    lt.001
  • Residual 45 0.025
  • Estimated stratum variances
  • Stratum variance d.f. variance
    component
  • Rep.Fungus 0.0833 9
    0.0098
  • Rep.Fungus.Time 0.0247 45
    0.0247

35
  • Split plot with Greenhouse-Geisser adjustmment
  • (tests equi-correlation structure among times)
  • Source of variation d.f. m.s. v.r.
    F pr.
  • Rep.Fungus stratum
  • Fungus 2 8.104 97.30
    lt.001
  • Residual 9 0.083 3.37
  • Rep.Fungus.Time stratum
  • Time 5 55.231 2233.21
    lt.001
  • Fungus.Time 10 0.933 37.71
    lt.001
  • Residual 45 0.025
  • (d.f. are multiplied by the correction factors
    before calculating F probabilities)
  • Box's tests for symmetry of the covariance
    matrix
  • Chi-square 57.47 on 19 df probability
    0.000
  • F-test 2.93 on 19, 859 df probability
    0.000

36
Split plot via REML ignoring changing
variances
  • Fixed model ConstantFungusTimeFungus.Tim
    e
  • Random model Rep.FungusRep.Fungus.Time
  • Estimated Variance Components
  • Random term Component S.e.
  • Rep.Fungus 0.00976 0.00660
  • Residual variance model
  • Term Model(order) Parameter
    Estimate S.e.
  • Rep.Fungus.Time Identity Sigma2
    0.0247 0.0052
  • Deviance d.f.
  • -109.90 52
  • Fixed term Wald statistic d.f. Wald/d.f.
    Chi-sq prob
  • Fungus 194.60 2 97.30
    lt0.001
  • Time 11166.05 5 2233.21
    lt0.001
  • Fungus.Time 377.08 10 37.71
    lt0.001

37
Split plot via REML accounting for changing
variances (a)
  • Fixed model ConstantFungusTimeFungus.Tim
    e
  • Random model Rep.FungusRep.Fungus.Time
  • Estimated Variance Components
  • Random term Component S.e.
  • Rep.Fungus 0.01053 0.00539
  • Residual variance model
  • Term Model(order) Parameter
    Estimate S.e.
  • Rep.Fungus.Time Identity Sigma2
    0.0082 .00428
  • Rep Identity - -
    -
  • Fungus Identity - -
    -
  • Time Diagonal d_1
    1.000 FIXED
  • d_2
    1.102 0.815
  • d_3
    0.227 0.215
  • d_4
    0.262 0.253
  • d_5
    1.965 1.443
  • d_6
    13.550 9.580

38
Split plot via REML accounting for changing
variances (b)
  • Fixed model ConstantFungusTimeFungus.Tim
    e
  • Random model Rep.FungusRep.Fungus.Time
  • Estimated Variance Components
  • Random term Component S.e.
  • Rep.Fungus 0.01053 0.00539
  • Residual variance model
  • Term Model(order) Parameter
    Estimate S.e.
  • Rep.Fungus.Time Identity Sigma2
    1.000 FIXED
  • Rep Identity - -
    -
  • Fungus Identity - -
    -
  • Time Diagonal d_1
    0.0082 0.0043
  • d_2
    0.0091 0.0047
  • d_3
    0.0019 0.0015
  • d_4
    0.0022 0.0017
  • d_5
    0.0162 0.0081
  • d_6
    0.1116 0.0530

39
Split plot via REML accounting for changing
variances
and an AR(1) correlation structure
  • Fixed model ConstantFungusTimeFungus.Tim
    e
  • Random model Rep.FungusRep.Fungus.Time
  • Estimated Variance Components
  • Random term Component S.e.
  • Rep.Fungus 0.010616 0.005572
  • Residual variance model
  • Term Model(order) Parameter
    Estimate S.e.
  • Rep.Fungus.Time Identity Sigma2
    0.0085 0.0045
  • Rep Identity - -
    -
  • Fungus Identity - -
    -
  • Time AR(1) het phi_1
    0.148 0.209
  • d_1
    1.000 FIXED
  • d_2
    1.202 0.895
  • d_3
    0.260 0.249
  • d_4
    0.264 0.266
  • d_5
    1.829 1.347
  • d_6
    13.560 9.620

40
Split plot via REML accounting for changing
variances
and an AR(1) correlation structure
  • Deviance d.f.
    Change d.f.
  • Same variance, uncorrelated -109.90 52
  • Different variances over time -151.03 47
    41.13 5
  • AR(1) correlation structure -151.59 46
    0.56 1

41
Example 8 Spatial analysis
RCBD (fixed) fertilisers Potato
yields (t/ha)
42
  • Source of variation d.f. m.s.
    v.r. F pr.
  • Block stratum 3 2.929
    1.43
  • Block.Treatment stratum
  • Treatment 5 92.359
    45.07 lt.001
  • Residual 15 2.049
  • Treatment A B C D
    E F
  • 17.55 25.60 27.67 25.94
    30.51 30.63
  • Standard errors of differences of means
  • Table Treatment
  • rep. 4
  • d.f. 15
  • s.e.d. 1.012

43
Contour plot of residuals
44
(No Transcript)
45
  • REML
  • Random term Component S.e.
  • Block 0.395 0.500
  • Residual variance model
  • Term Factor Model Parameter Estimate
    S.e.
  • Y.X Sigma2 2.849
    1.739
  • Y AR(1) phi_1
    0.7054 0.2078
  • X AR(1) phi_1
    -0.2508 0.3397
  • Deviance d.f.
  • 36.54 14
  • Treatment A B C D
    E F
  • 17.74 26.29 26.79 26.34
    30.41 29.37
  • Standard error of differences Average
    0.7749
  • Maximum
    0.8942

46
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com