Multilevel models: concept and application - PowerPoint PPT Presentation

Provided by: svsubra
1
Modelling Count Data: Outline
  • Characteristics of count data and the Poisson
    distribution
  • Applying the Poisson: flying bomb strikes in
    South London
  • Deaths by horse-kick as a single-level
    Poisson model fitted in MLwiN
  • Overdispersion: types and consequences, the
    unconstrained Poisson, the Negative Binomial
  • Taking stock: 4 distributions for modeling counts
  • Number of extramarital affairs: the incidence
    rate ratio (IRR); handling categorical and
    continuous predictors; comparing models with DIC
  • Titanic survivor data: taking account of
    exposure, the offset
  • Multilevel Poisson and NBD models: estimation and
    VPC
  • Applications: HIV in India and teenage employment
    in Glasgow
  • Spatial models: lip cancer in Scotland,
    respiratory cancer in Ohio

2
Some characteristics of count data
  • Very common in the social sciences:
    number of children, number of marriages,
    number of arrests, number of traffic accidents,
    number of flows, number of deaths
  • Counts have particular characteristics:
    integers, cannot be negative;
    often positively skewed, with a floor of zero;
    in practice often rare events, which peak at 1, 2
    or 3 and are rare at higher values
  • Modelled by:
    Logit regression, which models the log odds of an
    underlying propensity of an outcome;
    Poisson regression, which models the log of the
    underlying rate of occurrence of a count

3
Theoretical Poisson distribution I
  • The Poisson distribution results if the
    underlying number of random events per unit time
    or space has a constant mean rate (λ) of
    occurrence, and each event is independent.
    Siméon-Denis Poisson (1838) Research on the
    Probability of Judgments in Criminal and Civil
    Matters
  • Applying the Poisson: flying bomb strikes in
    South London
  • Key research question: were the bombs falling at
    random or under a guidance system?
  • If random, independent events should be
    distributed spatially as a Poisson distribution
  • Divide south London into 576 equally sized small
    areas (0.24 km²)
  • Count the number of bombs in each area and
    compare to a Poisson
  • Mean rate λ = (229(0) + 211(1) + 93(2) + 35(3)
    + 7(4) + 1(5))/576
    = 0.929 hits per unit square
  • Very close fit: concluded random
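
The comparison described on this slide can be sketched in a few lines. The observed frequencies are those quoted above; the expected values are computed here from the Poisson PMF and are not taken from the slides, and the function name `poisson_pmf` is illustrative only.

```python
import math

# Number of squares receiving 0, 1, ..., 5 hits (frequencies quoted on the slide).
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n_squares = sum(observed.values())                     # total number of squares
total_hits = sum(k * f for k, f in observed.items())   # total number of hits
lam = total_hits / n_squares                           # mean hits per square

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of squares with k hits if strikes were purely random.
expected = {k: n_squares * poisson_pmf(k, lam) for k in observed}

for k in observed:
    print(f"{k} hits: observed {observed[k]:3d}, expected {expected[k]:6.1f}")
```

Observed and expected columns that track each other closely are what the slide summarises as a "very close fit".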

4
Theoretical Poisson distribution II
Probability mass function (PMF) for 3 different
mean occurrences
  • When mean = 1, very positively skewed
  • As mean occurrence increases (more common event),
    distribution approaches Gaussian
  • So use Poisson for rare-ish events: mean below
    10
  • Fundamental property of the Poisson: mean =
    variance
  • Simulated 10,000 observations according to
    Poisson:
      Mean     Variance   Skewness
      0.993    0.98       1.00
      4.001    4.07       0.50
      10.03    10.38      0.33
  • Variance is not a freely estimated parameter as
    in Gaussian
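
As a cross-check on the simulated values, the theoretical moments can be summed directly from the PMF. This is a minimal sketch rather than the simulation used for the slide; the means 1, 4 and 10 are inferred from the simulated table, and the truncation point `max_k` is an assumption that is ample for these means.

```python
import math

def poisson_moments(lam, max_k=200):
    """Mean, variance and skewness of a Poisson(lam), summed from its PMF."""
    # Build the PMF iteratively: p(k) = p(k-1) * lam / k, starting at p(0) = exp(-lam).
    pmf = [math.exp(-lam)]
    for k in range(1, max_k + 1):
        pmf.append(pmf[-1] * lam / k)
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(pmf)) / var ** 1.5
    return mean, var, skew

for lam in (1, 4, 10):
    mean, var, skew = poisson_moments(lam)
    print(f"lam={lam:2d}: mean={mean:.3f}, variance={var:.3f}, skewness={skew:.3f}")
```

The variance equals the mean in every case, and the skewness falls as 1/sqrt(mean), which is why the distribution looks increasingly Gaussian as the mean grows.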

5
Death by Horse-kick I: the data
  • Bortkewicz, L. von (1898) The Law of Small
    Numbers, Leipzig
  • Number of soldiers killed annually by horse-kicks
    in the Prussian cavalry: 10 corps over 20 years
    (occurrences per unit time)
  • The full data: 200 corps-years of observations
  • As a frequency distribution (grouped data):
      Deaths      0    1    2    3    4    5
      Frequency   109  65   22   3    1    0
  • Mean = 0.61, variance = 0.611, number of
    observations = 200
  • Interpretation: mean rate of 0.61 deaths per
    corps-year (i.e. rare)
  • Mean approximately equals variance, which is
    consistent with a Poisson distribution
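
The summary statistics can be recovered from the grouped frequencies. A short sketch, assuming the variance quoted on the slide is the sample variance with an n - 1 divisor:

```python
# Grouped horse-kick data as quoted on the slide.
deaths = [0, 1, 2, 3, 4, 5]
freq = [109, 65, 22, 3, 1, 0]

n = sum(freq)                                         # corps-years observed
mean = sum(d * f for d, f in zip(deaths, freq)) / n   # deaths per corps-year
# Sample variance (n - 1 divisor) computed from the grouped data.
var = sum(f * (d - mean) ** 2 for d, f in zip(deaths, freq)) / (n - 1)

print(n, round(mean, 3), round(var, 3))
```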

6
Death by Horse-kick II: as a Poisson
Again: the Poisson results if the underlying number
of random events per unit time or space has a
constant mean rate (λ) of occurrence, and each
event is independent
  • With a mean (and therefore a variance) of 0.61:
      Deaths      0    1    2    3    4    5
      Frequency   109  65   22   3    1    0
      Theory      109  66   20   4    1    0
  • Formula for the Poisson PMF:
    $P(X = x) = \dfrac{e^{-\lambda}\,\lambda^{x}}{x!}$
  • e is the base of the natural logarithm (2.7182)
  • λ is the mean (shape parameter): the average
    number of events in a given time interval
  • x! is the factorial of x

E.g. with a mean rate λ of 0.6 accidents per
corps-year, what is the probability of getting 3
accidents in a corps in a year?
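
The question can be answered by plugging the numbers into the Poisson PMF. A minimal sketch (the function name is illustrative):

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for a Poisson with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probability of exactly 3 accidents in a corps-year when the mean rate is 0.6.
p3 = poisson_pmf(3, 0.6)
print(round(p3, 4))  # 0.0198
```

So roughly 2 corps-years in 100 would be expected to see three accidents.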
7
Horse-kick III: as a single-level Poisson model
General form of the single-level model
  • Observed count is distributed as an underlying
    Poisson with a mean rate of occurrence of λ
  • That is, as an underlying mean plus a level-1
    random term multiplied by z0 (the Poisson weight)
  • Mean rate is related to predictors non-linearly,
    as an exponential relationship
  • Model loge of the mean rate to get a linear model
    (log link)
  • The Poisson weight is the square root of the
    estimated underlying count, re-estimated at each
    iteration
  • Variance of level-1 residuals is constrained to 1,
    so that the level-1 variance is an exact
    Poisson (variance = mean)
  • Modeling on the loge scale, the model cannot
    predict a negative count on the raw scale
8
Horse-kick IV: null single-level Poisson model in
MLwiN
  • The raw ungrouped counts are modeled with a log
    link and a variance constrained to be equal to
    the mean
  • -0.494 is the mean rate of occurrence on the
    log scale
  • Exponentiate -0.494 to get the mean rate of 0.61
  • 0.61 is interpreted as a RATE: the number of
    events per unit time (or space), i.e. 0.61
    horse-kick deaths per corps-year
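
For a null (intercept-only) Poisson model with a log link, the maximum-likelihood intercept is the log of the sample mean, so the estimate can be checked by hand. The sketch below only verifies that arithmetic; it does not reproduce MLwiN's quasi-likelihood fitting.

```python
import math

mean_rate = 0.61                   # observed deaths per corps-year
intercept = math.log(mean_rate)    # mean rate on the log scale
back_transformed = math.exp(-0.494)  # reported estimate returned to the raw scale

print(round(intercept, 3), round(back_transformed, 2))
```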

9
Overdispersion I: types and consequences
  • So far equi-dispersion: variance equal to the
    mean
  • Overdispersion: variance > mean; long tail, e.g.
    LOS (common)
  • Underdispersion: variance < mean; data more alike
    than a pure Poisson process; in multilevel,
    possibility of a missing level
  • Consequences of overdispersion:
  • Fixed part SEs: "point estimates are accurate
    but they are not as precise as you think they
    are"
  • In multilevel, mis-estimate the higher-level
    random part
  • Apparent and true overdispersion: thought
    experiment
  • Number of extra-marital affairs: men and women
    with different means
  • Apparent: mis-specified fixed part, not
    separating out distributions with different means
  • True: genuine stochastic property of more
    inherent variability
  • In practice: model the fixed part as well as
    possible, and allow for overdispersion

10
Overdispersion II: the unconstrained Poisson
  • Deaths by horse-kick:
  • estimate an over-dispersed Poisson
  • allow the level-1 variance to be estimated
  • The estimate is not significantly different from 1
  • No evidence that this is not a Poisson
    distribution


11
Overdispersion III: the Negative Binomial
  • Instead of fitting an overdispersed Poisson,
    could fit an NBD model
  • Handles long-tailed distributions
  • An explicit model in which the variance is greater
    than the mean
  • Can even have an over-dispersed NBD
  • Same log link, but the NBD has 2 parameters for
    the level-1 variance, which is a quadratic
    function of the mean; v is the overdispersion
    parameter
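
The variance equation itself did not survive on this slide. As an illustration only, the sketch below assumes the common "NB2" form, variance = mean + v × mean², which is quadratic in the mean and collapses to the Poisson when v = 0; the exact parameterisation used by MLwiN may differ.

```python
def nb_variance(mean, v):
    """Assumed NB2 variance function: linear Poisson part plus a quadratic part."""
    return mean + v * mean ** 2

# With v = 0 the variance equals the mean (pure Poisson).
print(nb_variance(0.61, 0.0))
# With v > 0 the variance grows faster than the mean.
print(nb_variance(2.0, 0.5))
```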

12
Overdispersion IV: the Negative Binomial
  • Horse-kick analysis
  • Null single-level NBD model: essentially no
    change, v is estimated to be 0.00 (compare the
    stored models)
  • Overdispersed negative binomial
  • No evidence of overdispersion: deaths are
    independent

13
Linking the Binomial and the Poisson I
  • First, Bernoulli and Binomial
  • Bernoulli is a distribution for binary discrete
    events
  • y is the observed outcome, i.e. 1 or 0
  • E(y) = π, the underlying propensity/probability
    of occurrence
  • Var(y) = π(1 - π)
  • Binomial is a distribution for discrete events
    out of a number of trials
  • y is the observed outcome (as a proportion); n is
    the number of trials
  • E(y) = π, the underlying propensity of occurrence
  • Var(y) = π(1 - π)/n
  • Least variation when the denominator is large (more
    reliable), and as the underlying probability
    approaches 0 or 1

14
Linking the Binomial and the Poisson II
  • Poisson is the limit of a binomial process in
    which prob → 0 and n → ∞
  • Poisson describes the probability that a random
    event will occur in a time or space interval
    when the probability is very small, the number
    of trials is very large, and each event is
    independent
  • E.g. the probability that any automobile is
    involved in an accident is very small, but there
    are very many cars on a road in a day, so that
    the distribution (if each crash is independent)
    follows a Poisson count
  • If crashes are not independent (a pile-up),
    then over-dispersed Poisson/NBD; the latter is
    used for contagious processes
  • In practice, Poisson and NBD are used for rare
    occurrences: less than 10 cases per interval,
    with hundreds or even thousands for the
    denominator/trials. Clayton, D. and Hills, M.
    (1993) Statistical Models in Epidemiology, OUP
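
The limiting relationship can be checked numerically. In this sketch the mean is held at 1 event per interval (an arbitrary illustrative choice) while the number of trials n grows and the probability p = mean/n shrinks.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k events in n trials with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 1.0  # illustrative mean, kept fixed while n grows

def max_gap(n):
    """Largest absolute PMF difference over k = 0..10 for p = lam / n."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))

gaps = {n: max_gap(n) for n in (10, 100, 10_000)}
for n, gap in gaps.items():
    print(f"n={n:6d}: largest difference from Poisson = {gap:.6f}")
```

The gap shrinks steadily as the denominator grows, which is the sense in which the Poisson is the rare-event limit of the binomial.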

15
Taking stock: 4 distributions for counts
  • If common rate of occurrence (mean > 10), then use
    raw counts and the Gaussian distribution (assess
    the Normality assumption of the residuals)
  • If rare rate of occurrence, then use the
    over-dispersed Poisson or NBD; the level-1
    unconstrained variance estimate will allow
    assessment of departure from equi-dispersion and
    gives improved SEs, but estimates are biased if
    apparent overdispersion is due to model
    mis-specification
  • Use the Binomial distribution if the count is out
    of some total and the event is not rare, that is,
    numerator and denominator are of the same order
  • Mean-variance relations for the 4 different
    distributions that could be used for counts


16
Modeling number of extra-marital affairs:
single-level Poisson with a single categorical
predictor
Extract of raw data (601 individuals from Fair
1978)
Fair, R. C. (1978) A theory of extramarital affairs,
Journal of Political Economy, 86(1), 45-61
  • Single categorical predictor: Children, with
    NoKids as the base
  • Understanding customized predictions

17
Single-level Poisson with a single categorical
predictor
Understanding customized predictions
  • Log scale:
  • NoKids: -0.092; WithKids: -0.092 + 0.606 = 0.514
  • First use the equation to get the underlying
    log-number of events, then exponentiate to get the
    estimated count (since married)
  • As mean/median counts:
  • NoKids: exp(-0.092) = 0.91211
  • WithKids: exp(-0.092 + 0.606) = 1.6720
  • Those with children have a higher average rate of
    affairs (but have they been married longer?)

18
Modeling number of extra-marital affairs: the
incidence rate ratio (IRR)
So far, mean counts:
NoKids: exp(-0.092) = 0.91211
WithKids: exp(-0.092 + 0.606) = 1.6720
But also as an IRR, comparing the ratio of those
with and without kids:
IRR = 1.6720/0.91211 = 1.8331
That is, WithKids have an 83% higher rate.
BUT can get this directly from the model by
exponentiating the estimate for the contrasted
category: exp(0.606) = 1.8331
  • Rules:
  • a) exponentiating the estimates for the (constant
    plus the contrasted category) gives the mean rate
    for the contrasted category
  • b) exponentiating the contrasted category gives
    the IRR in comparison to the base category
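
Both rules can be verified with the two estimates quoted on the slide. A small sketch (variable names are illustrative):

```python
import math

# Estimates on the log scale, as reported on the slide.
B0 = -0.092      # constant: NoKids (base category)
B_KIDS = 0.606   # contrast: WithKids versus NoKids

rate_nokids = math.exp(B0)               # mean rate for the base category
rate_withkids = math.exp(B0 + B_KIDS)    # rule (a): constant plus contrast
irr_from_rates = rate_withkids / rate_nokids
irr_direct = math.exp(B_KIDS)            # rule (b): contrast alone

print(round(rate_nokids, 5), round(rate_withkids, 4))
print(round(irr_from_rates, 4), round(irr_direct, 4))
```

The two routes to the IRR agree because the constant cancels when the rates are divided.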

19
Why is the exponentiated coefficient an IRR?
  • As always, the estimated coefficient is the
    change in response corresponding to a one unit
    change in the predictor
  • Response is the underlying logged count
  • When Xi is 0 (NoKids): log(count | Xi = 0) = β0
  • But when Xi is 1 (WithKids): log(count | Xi = 1)
    = β0 + β1
  • Subtracting the first equation from the second
    gives
  • log(count | Xi = 1) - log(count | Xi = 0) = β1
  • Exponentiating both sides gives (note the
    division sign)
  • (count | Xi = 1)/(count | Xi = 0) = exp(β1)
  • Thus, exp(β1) is a rate ratio: the ratio of the
    mean number of affairs for a with-child person
    to the mean number of affairs for a
    without-child person
  • Incidence: number of new cases
  • Rate: because it is the number of events per time
    or space
  • Ratio: because it is the ratio of two rates
20
Modeling number of extra-marital affairs:
changing the base category
  • Previously, contrasted category WithKids:
    0.606
  • Now, contrasted category NoKids: -0.606
  • Changing the base simply produces a change of sign
    on the loge scale
  • Exponentiating the contrasted category:
  • Before: exp(0.606) = 1.8331
  • Now: exp(-0.606) = 0.5455
  • Doubling the rate is 0.693 on the loge scale;
    halving the rate is -0.693
  • An IRR of 0.111 is the reciprocal of an IRR of 9
    (a 9-fold increase), which is difficult to
    appreciate
  • Advice: choose the base category to have the
    lowest mean rate
  • get positive contrasted estimates
  • always then comparing a larger value to a base
    of 1

21
Affairs: modeling a set of categorical predictors
  • A model with years married included, with < 4
    years as base

Customised predictions: mean rate, IRR, graph
with 95% CIs

22
Affairs: modeling a continuous predictor, Age
Age as a 2nd order polynomial centred around 17
years (the youngest person in the survey, also the
lowest rate)
  • To get the mean rate as it changes with age:
  • exp(-0.990 + 0.149(Age-17) - 0.003(Age-17)^2)
  • To get the IRR for a person aged 33
    compared to 17:
  • exp(0.149 × 16 - 0.003 × 16^2) (drop the
    constant!)
  • Easiest interpreted as graphs!
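
The two calculations can be sketched as follows, using the coefficients quoted on the slide. The function names are illustrative, and the coefficients are rounded as shown on the slide, so results are approximate.

```python
import math

# Coefficients on the log scale, with age centred on 17 (as on the slide).
B0, B_AGE, B_AGE2 = -0.990, 0.149, -0.003

def mean_rate(age):
    """Predicted mean rate at a given age (constant included)."""
    c = age - 17
    return math.exp(B0 + B_AGE * c + B_AGE2 * c ** 2)

def irr_vs_17(age):
    """IRR relative to a 17-year-old (constant dropped)."""
    c = age - 17
    return math.exp(B_AGE * c + B_AGE2 * c ** 2)

irr_33 = irr_vs_17(33)
print(round(mean_rate(17), 3), round(mean_rate(33), 3), round(irr_33, 2))
```

Dropping the constant works because it cancels in the ratio mean_rate(33) / mean_rate(17).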


23
Affairs: a SET of predictors, models
  • Notice substantial overdispersion
  • Poisson to extra-Poisson: no change in estimates;
    some change with NBD
  • Notice larger SEs when allowing for
    overdispersion; NBD most conservative
  • In the full model, WithKids is not significant

24
NBD model for Marital Affairs
  • IRR of 1 for Under 4 years married, Very
    religious, No children, aged 17
  • Previous Age effect is really length of marriage
  • Used comparable vertical axes, range of 4

25
NBD model for number of Extra-Marital Affairs
  • With 95% confidence intervals
  • NB they are asymmetric on the unlogged scale

26
Affairs: evaluating a sequence of models using DIC
  • Likelihood and hence the Deviance are not
    available for Poisson and NBD models fitted by
    quasi-likelihood
  • DIC criterion is available through MCMC; typically
    needs a larger number of iterations than Normal or
    Binomial (suggested default is 50k not 5k)
  • Currently MCMC is not available in MLwiN for
    over-dispersed Poisson nor NBD models, so have to
    use Wald tests in the Intervals and tests window

27
Titanic survivor data: taking account of exposure
  • So far, the response is the observed count; now we
    want to model a count given exposure. E.g. only 1
    high-class female child survived, but only 1 was
    exposed!

Here there are 2 possible measures of exposure:
a) the number of potential cases: could use a
binomial
b) the expected number if everyone had the same
exposure (i indexes cell):
Death rate = total deaths / total exposed = 817/1316,
so the overall survival rate = 499/1316 = 0.379179
Exp_i = Cases_i × survival rate
The latter is often used to treat the exposure as a
nuisance parameter; it allows calculation of
Standardised Rates: SR_i = Obs_i/Exp_i × 100
Previous examples:
Horse-kick: exposure removed by design, 200
corps-years
Affairs: included length of marriage, theoretically
interesting
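
A minimal sketch of the standardised-rate calculation, using the totals quoted on the slide (817 deaths among 1316 exposed) and the single-cell example of one high-class female child. The function name is illustrative.

```python
TOTAL_EXPOSED = 1316
TOTAL_DEATHS = 817

# Overall survival rate if everyone shared the same risk.
survival_rate = (TOTAL_EXPOSED - TOTAL_DEATHS) / TOTAL_EXPOSED

def standardised_rate(observed_survivors, cases):
    """SR_i = Obs_i / Exp_i * 100, where Exp_i = Cases_i * overall survival rate."""
    expected = cases * survival_rate
    return 100 * observed_survivors / expected

# One high-class female child was exposed, and she survived.
sr_child = standardised_rate(observed_survivors=1, cases=1)
print(round(survival_rate, 6), round(sr_child, 1))
```

An SR above 100 means more survivors than expected under the overall rate, but with a single exposed case the value is very imprecise, which is why exposure has to enter the model.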
28
Modeling SRs: the use of the OFFSET
Model: SR_i = (Obs_i/Exp_i) = F(Age_i, Gender_i,
Class_i), where i is a cell, a group with the same
characteristics
Aim: are observed survivors greater or less than
expected, and how are these differences related to
a set of predictor variables?
As a non-linear model: E(SR_i) = E(Obs_i/Exp_i)
As a linear model (division of raw data is
subtraction of a log):
loge(Obs_i) - loge(Exp_i)
As a model with an offset, moving loge(Exp_i) to
the right-hand side and constraining its
coefficient to be 1, i.e. Exp becomes a predictor
variable:
loge(Obs_i) = 1.0 × loge(Exp_i) + predictor terms
NB MLwiN automatically loge transforms the observed
response; you have to create the loge of the
expected and declare it as an offset
Sir John Nelder
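
To show what an offset does, here is a toy intercept-only Poisson model fitted by Newton-Raphson with loge(Exp_i) entered with its coefficient fixed at 1. The four cells of observed and expected counts are hypothetical numbers made up for illustration, and this is not MLwiN's estimation procedure.

```python
import math

# Hypothetical observed and expected counts for four cells (illustration only).
obs = [3, 10, 1, 7]
expected = [5.0, 8.0, 2.5, 6.0]
offset = [math.log(e) for e in expected]  # loge(Exp_i), coefficient fixed at 1

def fit_intercept_with_offset(obs, offset, n_iter=25):
    """Newton-Raphson for the Poisson model log(mu_i) = offset_i + b0."""
    b0 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + o) for o in offset]
        score = sum(y - m for y, m in zip(obs, mu))  # first derivative of log-likelihood
        info = sum(mu)                               # minus the second derivative
        b0 += score / info
    return b0

b0 = fit_intercept_with_offset(obs, offset)
overall_ratio = math.exp(b0)  # modelled ratio of observed to expected

print(round(b0, 4), round(overall_ratio, 4))
```

With only a constant in the model, the exponentiated intercept equals total observed divided by total expected, that is, an overall standardised ratio; adding predictors then models departures from it.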
29
Surviving on the Titanic as a log-linear model
  • Include the offset
  • As a saturated model, i.e. Age × Gender × Class
    (2 × 2 × 3), 12 terms for 12 cells
  • Make predictions on the loge scale (must include
    the constant); exponentiate all terms to get
    departures from the expected rate, that is
    modeled SRs

30
Titanic survival: parsimonious model
  • Remove insignificant terms, starting with the
    3-way interactions for High × women × children

Customized predictions: very low rates of
survival for Low and Middle class adult men;
large gender gap for adults, but not for children
31
Titanic survival: parsimonious model
  • Modeled SRs and descriptive SRs
  • Ordered by worst survival
  • Estimated SRs only shown if 95% CIs do not
    include 1.0


32
Two-level multilevel Poisson
One new term, the level 2 differential, on the
loge scale, is assumed to come from Normal
distribution with a variance of
  • Can also fit Poisson multilevel with offset and
    NBD multilevel in MLwiN

33
Estimation of multilevel Poisson and NBD in MLwiN
I
  • Same options as for binary and binomial
  • Quasi-likelihood, and therefore MQL or PQL, fitted
    using IGLS/RIGLS: fast, but no deviance (have to
    use Wald tests); may be troubled by a small number
    of higher-level units; simulations have shown
    that MQL tends to overestimate the higher-level
    variance parameters
  • MCMC: estimates are of good quality and can use DIC
    to compare Poisson models, but currently MCMC is
    not possible for extra-Poisson nor for NBD
  • MCMC in MLwiN often produces highly correlated
    chains (in part due to the fact that the
    parameters of the model are highly correlated:
    variance = mean). It therefore requires a
    substantial number of simulations, typically much
    larger than for Normal or for Binomial

34
Estimation of multilevel Poisson and NBD in MLwiN
II
  • Possibility to output to WinBUGS and use the
    univariate AR sampler and the Gamerman (1997)
    method, which tends to have less correlated
    chains, but WinBUGS is generally considerably
    slower.
    Gamerman, D. (1997) Sampling from the posterior
    distribution in generalized linear mixed models.
    Statistics and Computing 7, 57-68
  • Advice: start with IGLS PQL; switch to MCMC and
    be prepared to make 500,000 simulations (suggest
    using 1 in 10 thinning to store the chains); use
    the effective sample size (ESS) to assess the
    required length of chain, e.g. need an ESS of at
    least 500 for key parameters of interest; compare
    results and contemplate using PQL and the
    over-dispersed Poisson
  • Freely available software: MIXPREG for multilevel
    Poisson counts including offsets; uses full
    information maximum likelihood estimated using
    quadrature: http://tigger.uic.edu/hedeker/mixpcm.PDF
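
The idea behind the effective sample size can be sketched as below. This uses one simple estimator (n divided by 1 plus twice the sum of autocorrelations, truncated at the first non-positive lag); it is an assumption for illustration and not necessarily the formula MLwiN reports. The AR(1) chains are simulated stand-ins for MCMC output.

```python
import random

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(chain)
    mean = sum(chain) / n
    centred = [x - mean for x in chain]
    var = sum(c * c for c in centred) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:          # stop once the autocorrelation has died away
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def ar1_chain(n, phi, seed=1):
    """Simulated chain with lag-1 correlation phi (phi = 0 gives independent draws)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

ess_independent = effective_sample_size(ar1_chain(5000, 0.0))
ess_correlated = effective_sample_size(ar1_chain(5000, 0.9))
print(round(ess_independent), round(ess_correlated))
```

A highly correlated chain of 5,000 draws carries the information of only a few hundred independent ones, which is why the slide recommends very long runs for Poisson models.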

35
VPC for Poisson models
  • Can either use:
  • Simulation method to derive VPC (modify the
    binomial procedure)
  • Exact method: http://people.upei.ca/hstryhn/iccpoisson.ppt
    (Henrik Stryhn)
  • VPC for two-level random intercepts model
    (available for other models)



Clearly VPC depends on ? and
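
The formula on this slide was lost in extraction. As a hedged sketch, one closed form for a two-level random-intercept Poisson model follows from log-normal moments: with intercept `beta0` and level-2 variance `sigma2_u` on the log scale, the marginal mean is M = exp(beta0 + sigma2_u/2), the level-2 variance of the rate is M² (exp(sigma2_u) - 1), and the level-1 (Poisson) variance is M. These symbols are assumptions made here, and the expression may not match the slide's notation exactly.

```python
import math

def poisson_vpc(beta0, sigma2_u):
    """VPC on the count scale for log(rate_j) = beta0 + u_j, u_j ~ N(0, sigma2_u)."""
    marginal_mean = math.exp(beta0 + sigma2_u / 2)
    level2_var = marginal_mean ** 2 * (math.exp(sigma2_u) - 1)  # between-unit variance
    level1_var = marginal_mean                                  # Poisson variance = mean
    return level2_var / (level2_var + level1_var)

# Same level-2 variance, different underlying rates.
for beta0 in (-2.0, 0.0, 2.0):
    print(beta0, round(poisson_vpc(beta0, 0.5), 3))
```

Unlike the Normal case, the VPC here changes with the mean rate as well as with the level-2 variance, so it has to be reported for stated values of the predictors.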
36
Modeling Counts in MLwiN: HIV in India
Aim: investigate the State geography of HIV in
terms of risk
Data: nationally representative sample of 100k
individuals in 2005-2006
Response: HIV sero-status from blood samples
Structure: 1720 cells within 28 States; cells are a
group of people who share common characteristics:
Age-Groups (4), Education (4), Sex (2), Urbanity (2)
and State (28)
Rarity: only 467 sero-positives were found
Model: log count of the number of seropositives in a
cell, related to an offset of the log expected count
if national rates applied
Predictors: Age, Sex, Education and Urbanity
Two-level multilevel Poisson, extra-Poisson and NBD
37
HIV in India Standardized Morbidity Rates
Higher educated females have the lowest risk,
across the age-groups
38
HIV in India some results
Risks for different States relative to living in
urban and rural areas nationally.
39
Modeling proportions as a binomial in MLwiN
  • exactly the same procedure as for binary models
  • except that the observed y is a proportion (not
    just 1 and 0), the denominator (n) is variable
    (not just 1), and extra-dispersion at level 1 is
    allowed (not just exact binomial)

Reading: Subramanian S V, Duncan C, Jones K
(2001) Multilevel perspectives on modeling census
data. Environment and Planning A 33(3) 399-417
40
Data: teenage employment in Glasgow districts
  • Ungrouped data, that is individual data
  • Model binary outcome of employed or not and two
    individual predictors

41
Same data as a multilevel structure: a set of
tables for each district

                 GENDER
  QUALIF   MALE         FEMALE        Postcode   UnErate
  LOW      5 out of 6   3 out of 12   G1A        15
  HIGH     2 out of 7   7 out of 9
  LOW      5 out of 9   7 out of 11   G1B        12
  HIGH     8 out of 8   7 out of 9
  LOW      3 out of 3   -             G99Z       3
  HIGH     2 out of 3   out of 5

  • Level 1: cell in table
  • Level 2: Postcode sector
  • Margins: define the two categorical predictors
  • Internal cells: the response, e.g. 5 out of 6 are
    employed

42
Teenage unemployment some results from a
binomial, two-level logit model
43
Spatial models as a combination of strict
hierarchy and multiple membership; counts are
commonly used
44
Scottish Lip Cancer: spatial multiple-membership
model
  • Response: observed counts of male lip cancer
    for the 56 regions of Scotland (1975-1980)
  • Predictor: % of workforce working in outdoor
    occupations (agriculture, forestry, fishing);
    expected count based on population size
  • Structure: areas and their neighbours, defined as
    having a common border (up to 11); equal
    weights for each neighbouring region that sum
    to 1
  • Rate of lip cancer in each region is affected by
    both the region itself and its nearest neighbours,
    after taking account of outdoor activity
  • Model: log of the response related to the fixed
    predictor, with an offset; Poisson
    distribution for counts
  • NB: two sets of random effects:
  • 1) area random effects (i.e. unstructured
    non-spatial variation)
  • 2) multiple membership set of random effects for
    the neighbours of each region
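
The equal-weights rule for the multiple-membership structure can be sketched as follows. The four-area adjacency list is hypothetical and stands in for the 56 Scottish regions; the function name is illustrative.

```python
# Hypothetical adjacency list: each area maps to the areas it shares a border with.
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def equal_mm_weights(neighbours):
    """Give every neighbour of an area the same weight, so the weights sum to 1."""
    return {
        area: {nb: 1.0 / len(nbs) for nb in nbs}
        for area, nbs in neighbours.items()
        if nbs  # areas without neighbours (e.g. islands) get no weights
    }

weights = equal_mm_weights(neighbours)
for area, w in weights.items():
    print(area, w)
```

An area with three neighbours gives each a weight of 1/3, while an area with a single neighbour gives it a weight of 1.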

45
MCMC estimation 50,000 draws
Poisson model
Fixed effects Offset and Well-supported
relation
Well-supported Residual neighbourhood effect
NB Poisson highly correlated chains
46
Scottish Lip Cancer: CAR model
CAR (conditional autoregressive): one set of random
effects, each of which has an expected value equal
to the average of the surrounding random effects;
the weights are divided by the number of neighbours,
where ni is the number of neighbours for area i
and the weights are typically all 1
MLwiN has limited capabilities for the CAR model,
i.e. at one level only (unlike BUGS)
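
The equation for this slide did not survive extraction. One common way of writing an intrinsic CAR prior that matches the verbal description is sketched below; the symbols u_i (area random effect), w_ij (neighbour weight), n_i (number of neighbours) and sigma_u^2 (variance) are notation assumed here rather than recovered from the slide.

```latex
u_i \mid u_{j \neq i} \sim N\!\left(\bar{u}_i,\ \frac{\sigma_u^2}{n_i}\right),
\qquad
\bar{u}_i = \frac{1}{n_i} \sum_{j \in \mathrm{neigh}(i)} w_{ij}\, u_j
```

With all weights equal to 1, the conditional mean is simply the average of the neighbouring effects, and areas with more neighbours have a tighter conditional variance.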
47
MCMC estimation CAR model, 50,000 draws
Poisson model
Fixed effects Offset and Well-supported
relation
Well-supported Residual neighbourhood effect
48
NB Scales shrinkage
49
  • Ohio cancer: repeated measures (space and time!)
  • Response: counts of respiratory cancer deaths
    in Ohio counties
  • Aim: are there hotspot counties with distinctive
    trends? (small numbers, so borrow strength
    from neighbours)
  • Structure: annual repeated measures (1979-1988)
    for counties
  • Classification 3: neighbourhoods as MM (3-8
    neighbourhoods)
  • Classification 2: counties (88)
  • Classification 1: occasion (88 × 10)
  • Predictors: expected deaths, time
  • Model: log of the response related to the fixed
    predictor, with an offset; Poisson
    distribution for counts (C1)
  • Two sets of random effects:
  • 1) area random effects allowed to vary over time:
    trend for each county from the Ohio
    distribution (C2)
  • 2) multiple membership set of random effects for
    the neighbours of each region (C3)

50
MCMC estimation repeated measures model, 50,000
draws
General trend
Nhood variance
Variance function for between county time trend
Default priors
51
Respiratory cancer trends in Ohio: raw and
modelled
Red: County 41 in 1988, SMR 77/49 = 1.57
Blue: County 80 in 1988, SMR 6/19 = 0.31
52
General References on Modeling Counts
  • Agresti, A. (2001) Categorical Data Analysis
    (2nd ed.). New York: Wiley.
  • Cameron, A.C. and Trivedi, P.K. (1998)
    Regression Analysis of Count Data. Cambridge
    University Press.
  • Hilbe, J.M. (2007) Negative Binomial Regression.
    Cambridge University Press.
  • McCullagh, P. and Nelder, J. (1989) Generalized
    Linear Models, Second Edition. Chapman &
    Hall/CRC.
On spatial models
  • Browne, W.J. (2003) MCMC Estimation in MLwiN,
    Chapter 16: Spatial models.
  • Lawson, A.B., Browne, W.J. and Vidal Rodeiro,
    C.L. (2003) Disease Mapping using WinBUGS and
    MLwiN. Wiley, London (Chapter 8: GWR).