Transcript and Presenter's Notes

Title: Inferring Data Inter-Relationships


1
Inferring Data Inter-Relationships Via Fast Hierarchical Models
Lawrence Carin, Duke University
www.ece.duke.edu/lcarin
2
Sensors Deployed Previously Across the Globe
Previous deployments
New deployment
Deploy to a New Location: Can the Algorithm Infer Which Data from the Past Is Most Relevant for the New Sensing Task?
3
Semi-Supervised Active Learning
  • Enormous quantity of unlabeled data - exploit
    context via semi-supervised learning
  • Focus the analyst on most-informative data -
    active learning

4
Technology Employed: Motivation
  • Appropriately exploit related data from previous experience over the sensor lifetime
    - Transfer learning
  • Place learning with labeled data in the context of unlabeled data, thereby exploiting manifold information
    - Semi-supervised learning
  • Reduce the load on the analyst: only request labels for the subset of data for which label acquisition would be most informative
    - Active learning

5
Bayesian Hierarchical Models: Dirichlet Processes
  • Principled setting for transfer learning
  • Avoids problems with model selection
    - Number of mixture components
    - Number of HMM states

[iGMM: Rasmussen, '00; iHMM: Teh et al., '04, '06; Escobar & West, '95]
6
Data Sharing: Stick-Breaking View of DP (1/2)
  • The Dirichlet process (DP) is a prior on a density function, i.e., G(Θ) ~ DP(α, G₀(Θ))
  • One draw of G(Θ) from DP(α, G₀(Θ))

[Sethuraman, '94]
7
Data Sharing: Stick-Breaking View of DP (2/2)
  • As α → 0, Beta(1, α) is more likely to yield large stick weights, implying more sharing
    - a few larger sticks, with corresponding likely parameters
  • As α → ∞, the sticks are very small and roughly the same size, so the draw reduces to G₀
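As a rough illustration of the stick-breaking behavior described on this slide, the minimal Python sketch below draws truncated stick weights for a DP(α, G₀); the function names, truncation level, and base measure are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def stick_breaking_weights(alpha, num_sticks=50, rng=None):
    """Truncated stick-breaking weights pi_k for one draw from DP(alpha, G0).

    beta_k ~ Beta(1, alpha); pi_k = beta_k * prod_{j<k} (1 - beta_j).
    Small alpha -> a few large sticks (more sharing);
    large alpha -> many small, similar sticks (the draw approaches G0).
    """
    rng = np.random.default_rng(rng)
    betas = rng.beta(1.0, alpha, size=num_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, rng=rng)
theta = rng.normal(size=pi.size)   # atom locations from an assumed base measure G0 = N(0, 1)
print(pi[:5], pi.sum())            # a truncated draw G = sum_k pi_k * delta_{theta_k}
```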

8
Non-Parametric Mixture Models
  • Data sample d_i drawn from a Gaussian/HMM with associated parameters
  • Posterior on the model parameters indicates which parameters are shared, yielding a Gaussian/HMM mixture model; no model selection on the number of mixture components
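To make the "no model selection on the number of mixture components" point concrete, here is a minimal sketch using scikit-learn's truncated variational DP Gaussian mixture as a stand-in; this is an off-the-shelf illustration, not the presenter's implementation, and the data and hyper-parameters are made up.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data from three well-separated Gaussian clusters.
X = np.vstack([rng.normal(loc, 0.3, size=(100, 2))
               for loc in ([0, 0], [3, 3], [-3, 3])])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                     # truncation level, not the true count
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                      # DP concentration alpha
    random_state=0,
).fit(X)

# Only a few components receive non-negligible weight; their count is
# inferred from the data rather than fixed a priori.
print(np.round(dpgmm.weights_, 3))
```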

9
Dirichlet Process as a Shared Prior
  • Cumulative set of data D = {d_1, d_2, ..., d_n}, with associated parameters
  • When parameters are shared, the associated data are also shared; data sharing implies learning from previous/other experiences → life-long learning
  • Posterior reflects a balance between the DP-based desire for sharing, constituted by the prior, and the likelihood function that rewards parameters that match the data well

DP's Desire for Sharing Parameters
Likelihood's Desire to Fit the Data
Posterior Balances These Objectives
10
Hierarchical Dirichlet Process (1/2)
  • A DP prior on the parameters of a Gaussian model yields a GMM in which the number of mixture components need not be set a priori (non-parametric)
  • Assume we wish to build N GMMs, each designed using a DP prior
  • We link the N GMMs via an overarching DP hyper-prior

[Teh et al., '06]
11
Hierarchical Dirichlet Process (2/2)
  • Coefficients a_{n,k} represent the probability of transitioning from state n to state k
  • Naturally yields the structure of an HMM; the number of large-amplitude coefficients a_{n,k} implicitly determines the most probable number of states
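A rough sketch of how linked DPs yield transition rows a_{n,k} that reuse a shared set of states: the truncated "weak-limit" construction below is a standard approximation of the HDP-HMM, and the hyper-parameter values and names are assumptions for illustration only.

```python
import numpy as np

def sample_hdp_hmm_transitions(num_states=10, gamma=2.0, alpha=4.0, seed=0):
    """Weak-limit (truncated) HDP-HMM transition matrix.

    A top-level stick-breaking measure beta is shared by all states;
    each row a_n ~ Dirichlet(alpha * beta) is a DP draw centered on it,
    so the same few states carry the large-amplitude coefficients a_{n,k}.
    """
    rng = np.random.default_rng(seed)
    sticks = rng.beta(1.0, gamma, size=num_states)
    beta = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    beta /= beta.sum()
    return rng.dirichlet(alpha * beta, size=num_states)

A = sample_hdp_hmm_transitions()
print(np.round(A[0], 2))  # row a_{0,k}: only a few entries are large -> few effective states
```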

12
Computational Challenges in Performing Inference
  • We have the general challenge of estimating the posterior p(Θ | D) ∝ p(D | Θ) p(Θ)
  • The normalizing denominator p(D) is typically an integral over a high-dimensional parameter space (the number of parameters in the model) and cannot be computed exactly in reasonable time
  • Approximations required

[Diagram: MCMC, Variational Bayes (VB), and Laplace approximations, traded off between accuracy and computational complexity]
[Blei & Jordan, '05]
13
Graphical Model of the nDP-iHMM
[Ni, Dunson & Carin, ICML '07]
14
How Do You Convince the Navy That Data Search Works? Validation Is Not as Simple as Text Search. Consider a Special Kind of Acoustic Data: Music
15
Multi-Task HMM Learning
  • Assume we have N sequential data sets
  • Wish to learn an HMM for each of the data sets
  • Believe that data can be shared between the learning tasks; the tasks are not independent
  • All N HMMs learned jointly, with appropriate data sharing
  • Use of the iHMM avoids the problem of selecting the number of states in each HMM
  • Validation on a large music database; VB yields fast inference

16
Demonstration: Music Database (525 Jazz, 975 Classical, 997 Rock)
Rock
Jazz
17
Classical
18
Inter-Task Similarity Matrix
19
Typical Recommendations from Three Genres
Classical Jazz Rock
20
(No Transcript)
21
(No Transcript)
22
Applications of Interest to the Navy
  • Music search provides a fairly good objective demonstration of the technology
  • Other than the use of acoustic/speech features (MFCCs), nothing in the previous analysis is specifically tied to music; it is simply data search
  • Use similar technology for underwater acoustic sensing (MCM) - generative
  • Use related technology for synthetic aperture radar and EO/IR detection and classification - discriminative
  • Technology delivered to NSWC Panama City, and demonstrated independently on mission-relevant MCM data

23
Underwater Mine Countermeasures (MCM)
24
Generative Model - iHMM
[Ni & Carin, '07]
25
(No Transcript)
26
(No Transcript)
27
Full Posterior on Number of HMM States
28
Anti-Submarine Warfare (ASW)
29
Design HMM for all Targets of Interest Over
Sensor Lifetime
30
State Sharing Between ASW Targets
31
Semi-Supervised Multi-Task Learning
32
Semi-Supervised Discriminative Multi-Task Learning
  • Semi-supervised learning implemented via graphical techniques
  • Multi-task learning implemented via DP
  • Exploits all available data-driven context
    - Data available from previous collections, labeled and unlabeled
    - Labeled and unlabeled data from the current data set

33
Graph representation of partially labeled data manifolds (1/2)
  • Construct the graph G(X, W), with the affinity matrix W, where the (i, j)-th element of W is defined by a Gaussian kernel
  • Define a Markov random walk on the graph by the transition matrix A, whose (i, j)-th element (the row-normalized affinity) gives the probability of walking from x_i to x_j in a single step of the Markov random walk (see the sketch below)
  • The one-step Markov random walk provides a local similarity measure between data points.

[Lu, Liao & Carin, '07; Szummer & Jaakkola, '02]
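A minimal sketch of the graph construction above, assuming the usual Gaussian-kernel affinity and row normalization into a one-step transition matrix; the kernel bandwidth sigma and the variable names are illustrative choices rather than values from the cited work.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_random_walk(X, sigma=1.0):
    """Affinity matrix W from a Gaussian kernel, row-normalized into the
    one-step Markov transition matrix A over the data points X."""
    W = np.exp(-cdist(X, X, metric="sqeuclidean") / (2.0 * sigma ** 2))
    A = W / W.sum(axis=1, keepdims=True)   # A_ij = W_ij / sum_k W_ik
    return W, A

X = np.random.default_rng(0).normal(size=(20, 2))
W, A = build_random_walk(X, sigma=0.5)
print(np.allclose(A.sum(axis=1), 1.0))     # each row of A is a probability distribution
```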
34
Graph representation (2/2)
  • To account for global similarity between data points, we consider a t-step random walk, where the transition matrix is given by A raised to the power of t (see the sketch below)
  • It was demonstrated [1] that the t-step Markov random walk yields a volume of paths connecting the data points, instead of the shortest path, which is susceptible to noise; thus it permits us to incorporate the global manifold structure of the training data set.
  • The t-step neighborhood of x_i is defined as the set of data points x_j reachable from x_i with non-negligible t-step transition probability

[1] Tishby and Slonim, "Data clustering by Markovian relaxation and the information bottleneck method," NIPS 13, 2000
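Continuing the earlier sketch, the t-step transition matrix is simply A raised to the power t, and a t-step neighborhood can be read off its rows; the probability threshold below is an assumed stand-in for the condition elided on the slide.

```python
import numpy as np

def t_step_walk(A, t):
    """A^t: probability of walking from x_i to x_j in t steps, aggregating a
    volume of paths rather than relying on a single, noise-sensitive shortest path."""
    return np.linalg.matrix_power(A, t)

def t_step_neighborhood(A, i, t, threshold=1e-3):
    """Indices j reachable from x_i in t steps with non-negligible probability
    (the threshold is an illustrative choice, not the slide's definition)."""
    return np.flatnonzero(t_step_walk(A, t)[i] > threshold)

# Example, reusing A from the previous sketch:
# print(t_step_neighborhood(A, i=0, t=4))
```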
35
Semi-Supervised Learning Algorithm (1/2)
  • Neighborhood-based classifier: define the probability of the label y_i given the t-step neighborhood of x_i in terms of p(y_i | x_j), the probability of label y_i given a single data point x_j, represented by a standard probabilistic classifier with its own parameters
  • The label y_i implicitly propagates over the neighborhood. Thus it is possible to learn a classifier with only a few labels present.
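One natural reading of the neighborhood-based classifier above, sketched in Python: each point's label probability is a mixture of per-point classifier outputs weighted by the t-step walk. The logistic parameterization and the exact weighting are assumptions for illustration; the cited work's precise form may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neighborhood_label_prob(A_t, X, w):
    """p(y = 1 | t-step neighborhood of x_i) for every point i.

    A_t : precomputed t-step transition matrix A^t
    X   : data matrix (one row per point)
    w   : weights of an assumed logistic classifier p(y = 1 | x_j, w) = sigmoid(w^T x_j)
    """
    per_point = sigmoid(X @ w)   # label probability at each individual point x_j
    return A_t @ per_point       # mix over the neighborhood with weights [A^t]_{ij}
```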

36
The Algorithm (2/2)
  • For binary classification problems, we choose the form of the per-point classifier to be logistic regression
  • To enforce sparseness, we impose a normal prior with zero mean and a diagonal precision matrix on the classifier weights, and each precision hyper-parameter has an independent Gamma prior.
  • Important for transfer learning: the semi-supervised algorithm is inductive and parametric
  • Place a DP prior on the parameters, shared among all tasks
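A compact sketch of the sparseness-promoting prior described above: MAP logistic regression with a zero-mean Gaussian prior and diagonal precision matrix. In the full model each precision would carry its own Gamma hyper-prior and be inferred (driving unneeded weights toward zero), and a DP prior would tie the parameters across tasks; both are omitted here, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sparse_logistic(X, y, alpha):
    """MAP logistic-regression weights under the prior w ~ N(0, diag(alpha)^{-1}).

    alpha : per-weight precisions (fixed here; Gamma hyper-priors on them
            would let the model automatically prune irrelevant weights).
    """
    def neg_log_posterior(w):
        p = sigmoid(X @ w)
        log_lik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        log_prior = -0.5 * np.sum(alpha * w ** 2)
        return -(log_lik + log_prior)

    w0 = np.zeros(X.shape[1])
    return minimize(neg_log_posterior, w0, method="L-BFGS-B").x
```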

37
Toy Data for Tasks 1-6
38
Sharing Data
Pooling tasks 1-3
Pooling tasks 1-6
39
(No Transcript)
40
Task similarity for MTL tasks 1-6
41
Navy-Relevant Data: Synthetic Aperture Radar (SAR) Data Collected at 19 Different Locations Across the USA
42
Real Radar Sensor Data
  • Data from 19 tasks, or geographical regions
  • 10 of these regions are relatively highly foliated
  • 9 regions are bare earth or desert
  • The algorithm adaptively and autonomously clusters the task-dependent classifier weights into two basic pools, which agree with ground truth
  • Active learning used to define labels of interest for the site under test
  • Other sites used as auxiliary data, in a life-long-learning setting

43
Supervised MTL [JMLR '07]
44
(No Transcript)
45
Previous deployments
New deployment
  • Classifier at the new site placed appropriately within the context of all available previous data
  • Both labeled and unlabeled data employed
  • Found that the algorithm is relatively insensitive to the particular labeled data selected
  • Validation with a relatively large music database

46
Reconstruction of Random-Bars with hybrid CS. Example (a) is from [3], and (b-c) are images modified from (a) by us to represent similar tasks for simultaneous CS inversion. The intensities of all the rectangles are randomly permuted, and the positions of all the rectangles are shifted by distances randomly sampled from a uniform distribution on [-10, 10].
47
(No Transcript)