1
Clustering in Generalized Linear Mixed Model
Using Dirichlet Process Mixtures
  • Ya Xue, Xuejun Liao
  • April 1, 2005

2
Introduction
  • Concept drift fits in the framework of the
    generalized linear mixed model, but it raises a
    new question: how to exploit the structure of the
    auxiliary data.
  • Mixtures with a countably infinite number of
    components can be handled in a Bayesian framework
    by employing Dirichlet process priors.

3
Outline
  • Part I: generalized linear mixed model
  • Generalized linear model (GLM)
  • Generalized linear mixed model (GLMM)
  • Advanced applications
  • Bayesian feature selection in GLMM
  • Part II: nonparametric method
  • Chinese restaurant process
  • Dirichlet process (DP)
  • Dirichlet process mixture models
  • Variational inference for Dirichlet process
    mixtures

4
  • Part I
  • Generalized Linear Mixed Model

5
Generalized Linear Model (GLM)
  • A linear model specifies the relationship between
    a dependent (or response) variable Y and a set of
    predictor variables X_1, ..., X_p, so that
    $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$.
  • GLM is a generalization of the normal linear
    regression model to the exponential family (normal,
    Poisson, Gamma, binomial, etc.).

6
Generalized Linear Model (GLM)
  • GLM differs from a linear model in two major
    respects:
  • The distribution of Y can be non-normal and does
    not have to be continuous.
  • Y can still be predicted from a linear combination
    of the Xs, but the two are connected via a link
    function: $g(E[Y]) = X\beta$.

7
Generalized Linear Model (GLM)
  • DDE example: binomial distribution.
  • Scientific interest: does DDE exposure increase
    the risk of cancer? Tested on rats; let i index
    rat.
  • Dependent variable: $y_i \in \{0, 1\}$, indicating
    whether rat i develops a tumor.
  • Independent variable: dose of DDE exposure,
    denoted by $x_i$.

8
Generalized Linear Model (GLM)
  • Likelihood function of $y_i$:
    $p(y_i \mid x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}$.
  • Choosing the canonical (logit) link
    $\eta_i = \mathrm{logit}(p_i) = \beta_0 + \beta_1 x_i$,
    the likelihood function becomes
    $p(y_i \mid x_i) = \exp(y_i \eta_i) / (1 + \exp(\eta_i))$.
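
A minimal Python sketch of this log-likelihood under the logit link (the dose values, outcomes, and coefficients below are made up for illustration):

```python
import numpy as np

def bernoulli_glm_loglik(beta, x, y):
    """Log-likelihood of a Bernoulli GLM with the canonical (logit) link.

    eta_i = beta[0] + beta[1] * x_i is the linear predictor;
    p_i = sigmoid(eta_i) is the tumor probability for rat i.
    """
    eta = beta[0] + beta[1] * x
    # sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ], computed stably
    return np.sum(y * eta - np.logaddexp(0.0, eta))

# Hypothetical dose levels and binary tumor outcomes
x = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
y = np.array([0, 0, 1, 1, 1])
print(bernoulli_glm_loglik(np.array([-1.0, 1.0]), x, y))
```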

9
GLMM Basic Model
  • Returning to the DDE example, 19 labs all over
    the world participated in this bioassay.
  • There are unmeasured factors that vary between
    the different labs.
  • For example, rodent diet.
  • GLMM is an extension of the generalized linear
    model by adding random effects to the linear
    predictor (Schall 1991).

10
GLMM Basic Model
  • The previous linear predictor is modified as
    $\eta_{ij} = x_{ij}^\top \beta + z_{ij}^\top b_i$,
    where $i$ indexes lab and $j$ indexes rat within
    lab $i$.
  • $\beta$ are fixed effects: parameters common to
    all rats.
  • $b_i$ are random effects: deviations for lab $i$.

11
GLMM Basic Model
  • If we choose $x_{ij} = z_{ij}$, then all the
    regression coefficients are assumed to vary across
    the different labs.
  • If we choose $z_{ij} = 1$, then only the intercept
    varies across the different labs (random intercept
    model).
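
A minimal simulation sketch of the random intercept model (the 19 labs match the example; the rat count per lab, the coefficients, and the random-effect scale are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)

n_labs, n_rats = 19, 50          # 19 labs as in the example; 50 rats/lab assumed
beta = np.array([-2.0, 1.5])     # fixed effects: intercept and dose slope
sigma_b = 0.8                    # std. dev. of the lab-level random intercept

b = rng.normal(0.0, sigma_b, size=n_labs)            # one intercept deviation per lab
dose = rng.uniform(0.0, 4.0, size=(n_labs, n_rats))  # x_ij: dose for rat j in lab i

# Random intercept model: eta_ij = beta_0 + beta_1 * x_ij + b_i  (z_ij = 1)
eta = beta[0] + beta[1] * dose + b[:, None]
p = 1.0 / (1.0 + np.exp(-eta))   # inverse logit link
y = rng.binomial(1, p)           # tumor indicator for rat j in lab i
```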

12
GLMM - Implementation
  • Gibbs sampling
  • Disadvantage: slow convergence.
  • Solution: hierarchical centering
    reparametrization (Gelfand 1994; Gelfand 1995).
  • Deterministic methods are only available for
    logit and probit models:
  • EM algorithm (Anderson 1985)
  • Simplex method (Im 1988)

13
GLMM Advanced Applications
  • Nested GLMM: within each lab, rats were group
    housed with three rats per cage.
  • Let i index lab, j index cage, and k index rat.
  • Crossed GLMM: across all labs, four dose protocols
    were applied to different rats.
  • Let i index lab, j index rat, and k indicate
    the protocol applied to rat (i, j).

14
GLMM Advanced Applications
  • Nested GLMM: within each lab, rats were group
    housed with three rats per cage.
  • Two-level GLMM:
  • level I: lab; level II: cage.
  • Crossed GLMM: across all labs, four dose protocols
    were applied to different rats.
  • Rats are sorted into 19 groups by lab.
  • Rats are sorted into 4 groups by protocol.

15
GLMM Advanced Applications
  • Temporal/spatial statistics
  • Account for correlation between the random
    effects at different times/locations.
  • Dynamic latent variable model (Dunson 2003):
    let i index patient and t index follow-up time.

16
GLMM Advanced Applications
  • Spatially varying coefficient processes (Gelfand
    2003): random effects are modeled as a spatially
    correlated process.

Possible application: a landmine field where
landmines tend to be close together.
17
Bayesian Feature Selection in GLMM
  • Simultaneous selection of fixed and random
    effects in GLMM (Cai and Dunson 2005).
  • Mixture prior: a point mass at zero mixed with a
    continuous density, so that an effect can be
    dropped from the model entirely.

18
Bayesian Feature Selection in GLMM
  • Fixed effects: choose mixture priors for the
    fixed-effects coefficients.
  • Random effects: reparameterization.
  • LDU decomposition of the random-effects covariance
    (sketched below).
  • Choose a mixture prior for the elements of the
    diagonal matrix.
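
A sketch of the decomposition in symbols, assuming the modified-Cholesky form used in this line of work (the specific prior on $\lambda_k$ below is our illustration, not necessarily the exact one in Cai and Dunson 2005):

```latex
% Decompose the random-effects covariance as
\Sigma = \Lambda \,\Gamma\,\Gamma^{\top} \Lambda, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_q), \quad
\Gamma \ \text{lower triangular with unit diagonal.}
% Setting \lambda_k = 0 zeroes out the k-th row and column of \Sigma,
% i.e. it removes the k-th random effect, so a prior with a point mass
% at zero on each \lambda_k performs random-effects selection:
\lambda_k \sim \pi_k\,\delta_0 + (1 - \pi_k)\,N^{+}(0, \sigma_k^2).
```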

19
Missing Identification in GLMM
  • Data table of the DDE bioassay: each row is a rat;
    the first column identifies the lab.
  • What if the first column is missing?
  • This is an unusual case in statistics, so few
    people work on it.
  • But it is the problem we have to solve for
    concept drift.

20
Concept Drift
  • Primary data
  • Auxiliary data
  • If we treat the drift variable as a random
    variable, concept drift is a random intercept
    model: a special case of GLMM.
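
A sketch of this correspondence in symbols (the sigmoid link and the notation $w$, $\delta_i$ are our assumptions; the slide's own equations did not survive):

```latex
% Primary data: a standard logistic GLM
P(y = 1 \mid x) = \sigma(w^{\top} x)
% Auxiliary data: the same classifier shifted by a drift variable
% \delta_i, which acts as a group-specific random intercept in a GLMM
P(y = 1 \mid x, \text{group } i) = \sigma(w^{\top} x + \delta_i),
\qquad \delta_i \sim N(0, \sigma_{\delta}^{2}).
```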

21
Clustering in Concept Drift
[Histogram of drift values over the auxiliary data:
K = 51 clusters (including 0) out of 300 auxiliary
data points; bin resolution 1.]
22
Clustering in Concept Drift
  • There are intrinsic clusters in the auxiliary data
    with respect to drift value.
  • The simplest explanation is best.
  • Occam's razor
  • Why don't we instead give each cluster its own
    random effect variable?

23
Clustering in Concept Drift
  • In typical statistics applications, we know which
    individuals share the same random effect.
  • However, in concept drift, we do not know which
    individuals (data points or features) share the
    same random intercept.
  • Can we train the classifier and cluster the
    auxiliary data simultaneously? This is a new
    problem we aim to solve.

24
Clustering in Concept Drift
  • How many clusters (K) should we include in our
    model?
  • Does choosing K actually make sense?
  • Is there a better way?

25
  • Part II
  • Nonparametric Method

26
Nonparametric method
  • Parametric methods: the forms of the underlying
    density functions are assumed known.
  • Nonparametric methods are a wide category, e.g.
    nearest neighbors, minimax, bootstrapping, ...
  • Nonparametric Bayesian methods make use of the
    Bayesian calculus without a fixed, finite
    parameterization.

27
Cornerstones of NBM
  • Dirichlet process (DP)
  • allows flexible structures to be learned and
    allows sharing of statistical strength among sets
    of related structures.
  • Gaussian process (GP)
  • allows sharing in the context of multiple
    nonparametric regressions.
  • (We suggest a separate seminar on GP.)

28
Chinese Restaurant Process
  • Chinese restaurant process (CRP) is a
    distribution on partitions of integers.
  • CRP is used to represent uncertainty over the
    number of components in a mixture model.

29
Chinese Restaurant Process
  • Unlimited number of tables
  • Each table has an unlimited capacity to seat
    customers.

30
Chinese Restaurant Process
The (m+1)th customer sits at a table drawn from the
following distribution:
  $P(\text{occupied table } i) = \dfrac{m_i}{\alpha + m}$
  $P(\text{next unoccupied table}) = \dfrac{\alpha}{\alpha + m}$
where $m_i$ is the number of previous customers at
table $i$ and $\alpha$ is a parameter.
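
This seating rule is straightforward to simulate; a minimal Python sketch (the customer count and $\alpha$ below are arbitrary choices):

```python
import numpy as np

def crp(n_customers, alpha, rng):
    """Simulate table assignments under a Chinese restaurant process."""
    counts = []                          # counts[i] = customers at table i
    assignments = []
    for m in range(n_customers):
        # m_i / (alpha + m) for each occupied table,
        # alpha / (alpha + m) for the next unoccupied table
        probs = np.array(counts + [alpha]) / (alpha + m)
        table = rng.choice(len(counts) + 1, p=probs)
        if table == len(counts):
            counts.append(1)             # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(1)
seats, counts = crp(300, alpha=1.0, rng=rng)
print(len(counts), "tables occupied by 300 customers")
```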
31
Chinese Restaurant Process
Example: suppose 3 customers sit at table 1 and 1
customer sits at table 2, with $\alpha = 1$. The next
(5th) customer sits at table 1 with probability 3/5,
at table 2 with probability 1/5, and at a new table
with probability 1/5.
32
Chinese Restaurant Process
  • CRP yields an exchangeable distribution on
    partitions of integers, i.e., the specific
    ordering of the customers is irrelevant.
  • An infinite set of random variables is said to be
    infinitely exchangeable if for every finite subset
    $\{x_1, \dots, x_n\}$, we have
    $p(x_1, \dots, x_n) = p(x_{\pi(1)}, \dots, x_{\pi(n)})$
    for any permutation $\pi$.
33
Dirichlet Process
Let $G_0$ be any probability measure on the reals and
let $(T_1, \dots, T_K)$ be a finite partition of the
reals. $G$ is a Dirichlet process, written
$G \sim \mathrm{DP}(\alpha, G_0)$, if the following
holds for all such partitions:
  $(G(T_1), \dots, G(T_K)) \sim \mathrm{Dir}(\alpha G_0(T_1), \dots, \alpha G_0(T_K))$,
where $\alpha$ is a concentration parameter.
Note: Dir denotes the Dirichlet distribution; DP the
Dirichlet process.
34
Dirichlet Process
  • Denote a sample from the Dirichlet process as
    $G \sim \mathrm{DP}(\alpha, G_0)$.
  • $G$ is itself a distribution.
  • Denote a sample from the distribution $G$ as
    $\theta \sim G$.

[Figure: graphical model for a DP generating the
parameters $\theta_1, \dots, \theta_n$.]
35
Dirichlet Process
  • Properties of DP:
  • $E[G] = G_0$.
  • Draws $G$ are discrete distributions with
    probability one.
  • The posterior is again a DP:
    $G \mid \theta_1, \dots, \theta_n \sim \mathrm{DP}\big(\alpha + n,\; \frac{\alpha G_0 + \sum_{i=1}^n \delta_{\theta_i}}{\alpha + n}\big)$.
36
Dirichlet Process
  • The marginal probabilities for a new
    $\theta_{n+1}$, with $G$ integrated out:
    $\theta_{n+1} \mid \theta_1, \dots, \theta_n \sim \frac{1}{\alpha + n} \sum_{i=1}^{n} \delta_{\theta_i} + \frac{\alpha}{\alpha + n} G_0$

This is the Chinese restaurant process.
37
DP Mixtures
$G \sim \mathrm{DP}(\alpha, G_0), \quad \theta_i \mid G \sim G, \quad y_i \mid \theta_i \sim F(\theta_i)$
If $F$ is a normal distribution, this is a Gaussian
mixture model.
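
A minimal sketch of drawing data from a DP Gaussian mixture via a truncated stick-breaking representation (the truncation level, the base measure $N(0, 3^2)$, and the unit observation noise are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, truncation = 1.0, 100     # concentration; truncation level (assumed)

# Stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j)
v = rng.beta(1.0, alpha, size=truncation)
pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))

# Atoms drawn from the base measure G0 = N(0, 3^2) (assumed)
mu = rng.normal(0.0, 3.0, size=truncation)

# Sample 300 observations: pick a component, then y ~ N(mu_k, 1)
z = rng.choice(truncation, size=300, p=pi / pi.sum())
y = rng.normal(mu[z], 1.0)
print("distinct components used:", len(np.unique(z)))
```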
38
Applications of DP
  • Infinite Gaussian Mixture Model (Rasmussen 2000)
  • Infinite Hidden Markov Model (Beal 2002)
  • Hierarchical Topic Models and the Nested Chinese
    Restaurant Process (Blei 2004)

39
Implementation of DP
  • Gibbs sampling:
  • when $G_0$ is a conjugate prior for the likelihood
    given by $F$ (Escobar 1995);
  • non-conjugate priors (Neal 1998).

40
Variational Inference for DPM
  • The goal is to compute the predictive density
    under a DP mixture.
  • To do so, we minimize the KL divergence between a
    variational distribution q and the true posterior p.
  • This algorithm is based on the stick-breaking
    representation of the DP.
  • (I would suggest a separate seminar on the
    stick-breaking view of DP and variational DP.)

41
Open Questions
  • Can we apply ideas of infinite models beyond
    identifying the number of states or components in
    a mixture?
  • Under what conditions can we expect these models
    to give consistent estimates of densities?
  • ...
  • Specific to our problem: the prior is
    non-conjugate due to the sigmoid link function.