DIRICHLET PROCESS MIXTURE MODELS WITH MULTIPLE MODALITIES
John Paisley and Lawrence Carin
Department of Electrical and Computer Engineering
Duke University, Durham, NC 27708, USA
{jwp4, lcarin}@ee.duke.edu
Introduction
Application: The Gaussian-HMM Mixture Model
Graphical Representations
Toy example: Data were generated from three Gaussians and two HMMs, with each observation drawn from one of the six possible (Gaussian, HMM) combinations. Mixing on either modality separately will group the data only according to the Gaussian or only according to the HMM, but mixing on all the data will produce the correct six clusters.
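The toy setup can be sketched as follows; this is an illustrative reconstruction, not the authors' code, and all parameters (the Gaussian means, the Markov transition matrices standing in for the HMMs, the sequence length) are assumptions.

```python
# Illustrative toy data: each observation pairs a draw from one of three
# Gaussians with a sequence from one of two Markov chains (a simplified HMM).
# All parameter values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

gauss_means = [np.array([-5.0, 0.0]), np.array([0.0, 5.0]), np.array([5.0, 0.0])]
# Two Markov chains over a 3-symbol alphabet (rows: current symbol, cols: next).
chains = [np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
          np.array([[0.1, 0.45, 0.45], [0.45, 0.1, 0.45], [0.45, 0.45, 0.1]])]

def sample_sequence(T, length=20):
    # Roll out a symbol sequence from transition matrix T.
    seq = [rng.integers(3)]
    for _ in range(length - 1):
        seq.append(rng.choice(3, p=T[seq[-1]]))
    return np.array(seq)

observations = []
for _ in range(60):
    g = rng.integers(3)          # which Gaussian generated modality 1
    h = rng.integers(2)          # which chain generated modality 2
    x1 = rng.multivariate_normal(gauss_means[g], np.eye(2))
    x2 = sample_sequence(chains[h])
    observations.append((x1, x2, (g, h)))   # (g, h) is one of 6 true clusters
```

Clustering on `x1` alone can recover at most the three Gaussian groups, and on `x2` alone at most the two chain groups; only using both modalities jointly distinguishes all six `(g, h)` labels.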
The Dirichlet process is a useful means for
partitioning datasets into subgroups, or
clusters. It allows for a potentially infinite
number of clusters, while simultaneously
promoting sparseness. The output of a Dirichlet
process mixture model is the partitioning of data
observations into the correct number of groups,
as well as the corresponding parameters that
characterize each group. These parameters are
typically limited to a single distribution, or
modality. This can be too restrictive when an
observation is composed of multiple pieces of
data, each of which can be considered as coming
from separate data-generating processes; for example, an observation may consist of both time-series data and a feature vector in $\mathbb{R}^d$. To resolve this
issue, we propose an extension to the Dirichlet
process, called the Dirichlet process with
product base measure (DP-PBM). This formulation
allows for all data in an observation to be used
in the mixing process, which can provide for a
more refined and meaningful partitioning into
subgroups.
The Dirichlet Process Mixture Model
Dirichlet Process with Product Base Measure
Interpolating Missing Data
A potential use of the DP-PBM framework is interpolating missing data. Consider the likelihood function in its factorized form,
$$p(X_i \mid \theta_{c_i}) = \prod_{m=1}^{M} F_m\big(X_i^{(m)} \mid \theta_{c_i, m}\big),$$
where $X_i^{(m)}$ denotes the piece of observation $X_i$ generated by modality $m$.
Major League Baseball Dataset (obtained from www.retrosheet.org): 2007 season, 252 players, with each observation consisting of
X1: [AVG, OBP, SLG]^T
X2: a sequence of length 300, with codebook 1. Strikeout, 2. Fielded Out, 3. Hit
A mixture model is composed of a set of locations and probability weights for those locations. The stick-breaking construction of the Dirichlet process,
$$V_k \sim \mathrm{Beta}(1, \alpha), \quad \theta_k \sim G_0, \quad G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}, \quad \pi_k = V_k \prod_{j<k} (1 - V_j),$$
is used to constitute $G \sim \mathrm{DP}(\alpha G_0)$. As can be seen, the parameter $\alpha$ controls the number of significant mixture components. Draws from the mixture model defined by $G$ take the form $c_i \sim \mathrm{Mult}(\pi)$, $X_i \sim F(\theta_{c_i})$. When two observations $X_i$ and $X_k$ share the same parameter, which occurs when $c_i = c_k$, they will exhibit similar statistical properties via the distribution function $F$.
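The stick-breaking weights above can be sketched with a truncated draw; the truncation level `K` and the example $\alpha$ values are illustrative choices, not from the poster.

```python
# Truncated stick-breaking sketch of the weights of G ~ DP(alpha * G0):
# V_k ~ Beta(1, alpha), pi_k = V_k * prod_{j<k} (1 - V_j).
import numpy as np

def stick_breaking(alpha, K, rng):
    # Draw the first K stick-breaking weights (they sum to less than 1;
    # the remainder belongs to the untruncated tail).
    V = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - V[:-1])])
    return V * remaining

rng = np.random.default_rng(1)
pi_small = stick_breaking(alpha=1.0, K=50, rng=rng)
pi_large = stick_breaking(alpha=20.0, K=50, rng=rng)
# In expectation, a small alpha concentrates mass on a few components,
# while a large alpha spreads it over many -- this is the sparseness
# control described in the text.
```

Counting components with, say, weight above 0.01 in each draw illustrates how $\alpha$ governs the number of significant mixture components.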
When performing inference, the missing data can be skipped when calculating the component membership probabilities. For example, if the $M$-th piece of data is missing, the posterior of the latent membership indicator is proportional to
$$p(c_i = k \mid X_i, \Theta, \pi) \propto \pi_k \prod_{m=1}^{M-1} F_m\big(X_i^{(m)} \mid \theta_{k, m}\big).$$
Upon convergence, the distribution on this latent indicator can be used, along with the posterior distribution on the component parameters, to form a mixture of distributions for the missing data.
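The membership computation can be sketched as below for a single observed Gaussian modality (the missing modality's likelihood factor is simply absent from the product); the component weights, means, and variance are illustrative stand-ins, not values from the paper.

```python
# Sketch: component membership probabilities using only observed modalities.
# The product of likelihood factors skips any missing piece of data.
import numpy as np

def gaussian_logpdf(x, mean, var):
    # Isotropic Gaussian log-density, standing in for one modality's F_m.
    d = x.shape[0]
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mean) ** 2) / var)

def membership_probs(x_obs, weights, means, var):
    # p(c_i = k | x_obs) proportional to pi_k * prod over OBSERVED modalities.
    logp = np.log(weights) + np.array(
        [gaussian_logpdf(x_obs, mu, var) for mu in means])
    logp -= logp.max()                      # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Hypothetical 3-component posterior state (weights and means are made up).
weights = np.array([0.5, 0.3, 0.2])
means = [np.zeros(2), np.full(2, 3.0), np.full(2, -3.0)]
probs = membership_probs(np.array([2.9, 3.1]), weights, means, var=1.0)
# The observation sits near the second component's mean, so probs[1] dominates.
```

Averaging each component's predictive distribution for the missing modality under `probs` yields the mixture of distributions described above.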
MCMC Gibbs Sampler
The Dirichlet Process with Product Base Measure
For the MLB data, we built two models: an HMM mixture model and a Gaussian-HMM mixture model. As a measure of performance, we took a weighted average of the entropy of the Gaussian and the HMM for each component. For the HMM-only mixture, we compute the Gaussian entropy from the empirical covariance of each cluster, and we estimate the HMM entropy using the original sequences.
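The Gaussian part of this measure can be sketched as follows; the cluster data, dimensions, and equal component weights are illustrative assumptions, and the HMM entropy term is omitted.

```python
# Sketch of the Gaussian part of the performance measure: per-cluster
# differential entropy from the empirical covariance, combined as a
# weighted average over components. Illustrative data, not the MLB dataset.
import numpy as np

def gaussian_entropy(X):
    # Entropy of a Gaussian fit to the rows of X:
    # 0.5 * log((2*pi*e)^d * det(Sigma_hat)).
    d = X.shape[1]
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(2)
tight = rng.normal(0.0, 0.1, size=(200, 3))   # low-variance cluster
loose = rng.normal(0.0, 2.0, size=(200, 3))   # high-variance cluster
weights = np.array([0.5, 0.5])                 # hypothetical occupancy weights
score = weights @ np.array([gaussian_entropy(tight), gaussian_entropy(loose)])
# Tighter clusters have lower entropy, so a lower weighted average indicates
# a more refined partitioning.
```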
To accommodate multiple data-generating modalities, we extend the Dirichlet process to multiple base distributions. Now, for each component from which an observation is drawn (indexed by $c_i$), data are drawn from each of $M$ modalities according to distribution functions $F_m$. The space over which the DP-PBM is defined is the product space $\Theta = \Theta_1 \times \cdots \times \Theta_M$, with product base measure $G_0 = G_{0,1} \times \cdots \times G_{0,M}$.
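A draw from such a process can be sketched with a truncated construction; the two base measures chosen here (a Gaussian-mean prior and a Dirichlet prior over a 3-symbol alphabet), the concentration, and the truncation level are all illustrative assumptions.

```python
# Sketch of a truncated draw from a DP with product base measure: each atom
# theta_k is a TUPLE drawn from G0 = G0_1 x G0_2, and a single indicator c_i
# selects the parameters for both modalities at once.
import numpy as np

rng = np.random.default_rng(3)
alpha, K = 2.0, 30                      # concentration and truncation level

V = rng.beta(1.0, alpha, size=K)
pi = V * np.concatenate([[1.0], np.cumprod(1.0 - V[:-1])])
pi /= pi.sum()                          # renormalize the truncated sticks

atoms = [(rng.normal(0.0, 5.0, size=2),   # theta_k,1 ~ G0_1 (Gaussian modality)
          rng.dirichlet(np.ones(3)))      # theta_k,2 ~ G0_2 (sequence modality)
         for _ in range(K)]

c = rng.choice(K, p=pi)                 # one indicator shared by both modalities
mean_c, probs_c = atoms[c]
x1 = rng.normal(mean_c, 1.0)            # modality-1 draw from F_1
x2 = rng.choice(3, size=10, p=probs_c)  # modality-2 draw from F_2
```

Because both pieces of an observation are tied to the same atom, mixing uses all the data jointly, which is what yields the more refined partitioning described in the introduction.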