1
Hidden Process Models
  • Rebecca Hutchinson
  • May 26, 2006
  • Thesis Proposal
  • Carnegie Mellon University
  • Computer Science Department

2
Talk Outline
  • Motivation: fMRI (functional Magnetic Resonance
    Imaging) data.
  • Problem: a new kind of probabilistic time series
    modeling.
  • Solution: Hidden Process Models (HPMs).
  • Results: preliminary experiments with HPMs.
  • Extensions: proposed improvements to HPMs.

3
fMRI Data: High-Dimensional and Sparse
  • Imaged once per second for 15-20 minutes
  • Only a few dozen trials (i.e. training examples)
  • 10,000-15,000 voxels per image

4
The Hemodynamic Response
  • fMRI measures an indirect, temporally blurred
    correlate of neural activity.
  • Also called the BOLD (Blood Oxygen Level
    Dependent) response.

[Figure: hemodynamic response curve, signal amplitude vs. time (seconds), after a subject reads a word and indicates whether it is a noun or verb in less than a second.]
5
Study: Pictures and Sentences
[Figure: trial timeline starting at t0: View Picture and Read Sentence events (4 sec. each, in either order) with a button press, followed by Fixation/Rest (8 sec.).]
  • Task: Decide whether the sentence correctly
    describes the picture; indicate with a button press.
  • 13 normal subjects, 40 trials per subject.
  • Sentences and pictures describe 3 symbols using
    the relations above, below, not above, and not
    below.
  • Images are acquired every 0.5 seconds.

6
Motivation
  • To track cognitive processes over time.
  • Estimate process hemodynamic responses.
  • Estimate process timings.
  • Allowing processes that do not directly
    correspond to the stimulus timing is a key
    contribution of HPMs!
  • To compare hypotheses of cognitive behavior.

7
The Thesis
  • It is possible to simultaneously estimate the
    parameters and timing of temporally and spatially
    overlapped, partially observed processes, using
    many features and a small number of noisy
    training examples.
  • We are developing a class of probabilistic models
    called Hidden Process Models (HPMs) for this task.

8
Related Work in fMRI
  • General Linear Model (GLM)
  • Must assume timing of process onset to estimate
    hemodynamic response
  • [Dale99]
  • 4-CAPS and ACT-R
  • Predict fMRI data rather than learning parameters
    of processes from the data
  • [Anderson04, Just99]

9
Related Work in Machine Learning
  • Classification of windows of fMRI data
  • Does not typically estimate hemodynamic response
  • [Cox03, Haxby01, Mitchell04]
  • Dynamic Bayes Networks
  • HPM assumptions/constraints are difficult to
    encode in DBNs
  • [Murphy02, Ghahramani97]

10
HPM Modeling Assumptions
  • Model the latent time series at the process level.
  • Process instances share parameters based on their
    process types.
  • Use prior knowledge from experiment design.
  • Sum process responses linearly.
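
To make these assumptions concrete, here is a minimal sketch of the implied generative model for one trial (hypothetical names; per-voxel Gaussian noise, as in the formalism on the next slide):

```python
import numpy as np

def hpm_generate(T, V, instances, signatures, sigma, rng):
    """One trial of fMRI data under the HPM assumptions.

    instances:  list of (process_id, start_time) pairs for this trial
    signatures: dict process_id -> response signature array, shape (d_h, V)
    sigma:      per-voxel noise standard deviations, shape (V,)
    """
    mean = np.zeros((T, V))
    for h, start in instances:
        W = signatures[h]                    # response signature of process h
        end = min(start + W.shape[0], T)
        mean[start:end] += W[: end - start]  # process responses sum linearly
    return mean + rng.normal(0.0, sigma, size=(T, V))
```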

11
HPM Formalism [Hutchinson06]
  • HPM = ⟨H, C, F, Σ⟩
  • H = ⟨h1, ..., hH⟩, a set of processes
  • h = ⟨W, d, Ω, Θ⟩, a process
    • W: response signature
    • d: process duration
    • Ω: allowable offsets
    • Θ: multinomial parameters over values in Ω
  • C = ⟨c1, ..., cC⟩, a set of configurations
  • c = ⟨π1, ..., πL⟩, a set of process instances
  • π = ⟨h, λ, O⟩, a process instance
    • h: process ID
    • λ: associated stimulus landmark
    • O: offset (takes values in Ω(h))
  • F = ⟨f1, ..., fC⟩, priors over C
  • Σ = ⟨σ1, ..., σV⟩, a standard deviation for each voxel

Notation: parameter(entity), e.g. W(h) is the
response signature of process h, and O(π) is
the offset of process instance π.
12
Processes of the HPM
[Figure: two example processes and one configuration, shown over voxels v1 and v2.]
  • Process 1: ReadSentence. Response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = {θ0, θ1}.
  • Process 2: ViewPicture. Response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = {θ0, θ1}.
  • Input stimulus (sentence, picture) defines the timing landmarks λ1, λ2.
  • Example process instance π2: process h = 2, timing landmark λ2, offset time O = 1 sec. (start time = λ2 + O).
  • One configuration c of process instances π1, π2, ..., πk (with prior fc).
  • The predicted mean is the sum of the instances' response signatures; per-voxel noise is N(0, σ1), N(0, σ2).
13
HPMs: the graphical model
[Figure: graphical model with nodes for the configuration c, timing landmark λ, process type h, offset o, start time s, and Σ, over process instances π1, ..., πk; the observations are Y(t,v) for t = 1..T, v = 1..V. Y is observed; the other variables are unobserved.]
The set C of configurations constrains the
joint distribution on h(πk), o(πk) ∀ k.
14
Encoding Experiment Design
[Figure: processes ReadSentence = 1, ViewPicture = 2, Decide = 3; the input stimulus defines timing landmarks λ1, λ2; four candidate configurations of process instances are consistent with the constraints.]
Constraints encoded: h(π1) ∈ {1, 2}, h(π2) ∈ {1, 2}, h(π1) ≠ h(π2), o(π1) = 0, o(π2) = 0, h(π3) = 3, o(π3) ∈ {1, 2}.
15
Inference
  • Over configurations.
  • Choose the most likely configuration:
    c* = argmax_c P(C = c | Y, D, HPM)
  • where C = configuration, Y = observed data,
    D = input stimuli, HPM = model.
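
A minimal sketch of this computation under the Gaussian noise model (hypothetical names; `signatures` and `sigma` as in the formalism above; likelihood terms that are constant across configurations cancel in the normalization):

```python
import numpy as np

def config_posterior(Y, configs, priors, signatures, sigma):
    """P(C = c | Y) proportional to f_c * P(Y | c), normalized over configs.

    Y:       observed data, shape (T, V)
    configs: each configuration is a list of (process_id, start_time) pairs
    priors:  the prior f_c for each configuration
    """
    T, V = Y.shape
    log_post = np.empty(len(configs))
    for i, (c, f_c) in enumerate(zip(configs, priors)):
        mean = np.zeros((T, V))
        for h, start in c:                    # predicted mean: sum of responses
            W = signatures[h]
            end = min(start + W.shape[0], T)
            mean[start:end] += W[: end - start]
        z = (Y - mean) / sigma                # standardized residuals
        log_post[i] = np.log(f_c) - 0.5 * np.sum(z ** 2)
    log_post -= log_post.max()                # stabilize the exponentials
    post = np.exp(log_post)
    return post / post.sum()
```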

16
Learning
  • Parameters to learn:
    • Response signature W for each process
    • Timing distribution Θ for each process
    • Standard deviation σ for each voxel
  • Case 1: Known Configuration.
    • Least-squares problem to estimate W.
    • Standard MLEs for Θ and σ.
  • Case 2: Unknown Configuration.
    • Expectation-Maximization (EM) algorithm to
      estimate W and Θ.
    • E step: estimate a probability distribution over
      configurations.
    • M step: update the estimates of W (using
      reweighted least squares) and Θ (using standard
      MLEs) based on the E step.
    • Standard MLEs for σ.

17
Case 1: Known Configuration
  • Following [Dale99], use the GLM.
  • The (known) configuration generates a T × D
    convolution matrix X:

[Figure: D = Σh d(h). The known configuration (π1: h = 1, start = 1; π2: h = 2, start = 2; π3: h = 3, start = 2) generates a T × D binary convolution matrix X; row t has a 1 in each column whose process-instance response overlaps time t. For this example, d(1) = d(2) = d(3) = 3.]
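
Given X, the estimates follow from a single least-squares solve; a minimal sketch (X as built above, Y the T × V data matrix):

```python
import numpy as np

def estimate_signatures(X, Y):
    """Least-squares estimate of the stacked response signatures W (D x V)
    from the model Y ~ X @ W, solved for all voxels in one call."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def estimate_sigma(X, Y, W):
    """MLE of the per-voxel noise standard deviation from the residuals."""
    resid = Y - X @ W
    return np.sqrt(np.mean(resid ** 2, axis=0))
```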
18
Case 1: Known Configuration
[Figure: the least-squares problem in matrix form: the T × V data matrix Y ≈ X times the D × V parameter matrix formed by stacking the response signatures W(1), W(2), W(3), each of height d(h).]
19
Case 2: Unknown Configuration
  • E step: Use the inference equation to estimate a
    probability distribution over the set of
    configurations.
  • M step: Use the probabilities computed in the
    E step to form weights for the least-squares
    procedure for estimating W.
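
A minimal sketch of that reweighted least squares, equivalent in spirit to the expanded weighted convolution matrix shown on the next two slides (hypothetical helper `build_X` forms the T × D design for one configuration; `post` is the E-step posterior):

```python
import numpy as np

def m_step_W(Y, configs, post, build_X):
    """Reweighted least squares for W: stack one copy of the design per
    configuration, each scaled by the square root of its E-step weight,
    so the solution minimizes sum_c post_c * ||Y - X_c W||^2."""
    blocks_X, blocks_Y = [], []
    for c, p in zip(configs, post):
        w = np.sqrt(p)
        blocks_X.append(w * build_X(c))   # (T, D) design for configuration c
        blocks_Y.append(w * Y)
    W, *_ = np.linalg.lstsq(np.vstack(blocks_X), np.vstack(blocks_Y), rcond=None)
    return W
```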

20
Case 2: Unknown Configuration
  • The convolution matrix models several choices for
    each time point.

[Figure: an expanded convolution matrix with several rows per time point, one for each set of configurations consistent with that row (e.g. rows for t = 1 under configurations {3, 4} and {1, 2}, ..., rows for t = 18 under configurations 3, 4, 1, and 2 separately), so the expanded matrix has T′ > T rows.]
21
Case 2: Unknown Configuration
  • Weight each row with the probabilities from the
    E step.

[Figure: each row of the expanded convolution matrix is multiplied by a weight et equal to the total posterior probability of the configurations it represents, e.g. e1 = P(C=3 | Y, Wold, Θold, σold) + P(C=4 | Y, Wold, Θold, σold); the weighted system Y ≈ XW is then solved for W.]
22
Learned HPM with 3 processes (S, P, D), and d = 13 sec.
[Figure: learned response signatures for the Sentence (S), Picture (P), and Decide (D?) processes, plotted with the observed data.]
23
Results: Model Selection
  • Use cross-validation to choose a model.
  • GNB: Gaussian Naïve Bayes
  • HPM-2: HPM with ViewPicture, ReadSentence
  • HPM-3: HPM-2 + Decide

[Figure: cross-validated accuracy predicting picture vs. sentence (random = 0.5) and data log-likelihood, per model.]
24
Synthetic Data Results
  • Timing of the synthetic data mimics the real
    data, but we have ground truth.
  • Can be used to investigate the effects of:
    • signal-to-noise ratio
    • number of voxels
    • number of training examples
  • on:
    • training time
    • cross-validated classification accuracy
    • cross-validated data log-likelihood

25
[Figure: synthetic data results (image slide; no transcript).]
26
Recall Motivation
  • To track cognitive processes over time.
  • Estimate process hemodynamic responses.
  • Estimate process timings.
  • Allowing processes that do not directly
    correspond to the stimulus timing is a key
    contribution of HPMs!
  • To compare hypotheses of cognitive behavior.

27
Proposed Work
  • Goals:
    • Increase efficiency:
      • fewer parameters
      • better accuracy from fewer examples
      • faster inference and learning
    • Handle larger, more complex problems:
      • more voxels
      • more processes
      • fewer assumptions
  • Research areas:
    • Model Parameterization
    • Timing Constraints
    • Learning Under Uncertainty

28
Model Parameterization
  • Goals:
    • Improve the biological plausibility of learned
      responses.
    • Decrease the number of parameters to be estimated
      (improving sample complexity).
  • Tasks:
    • Parameter sharing across voxels
    • Parametric forms for response signatures
    • Temporally smoothed response signatures

29
Timing Constraints
  • Goals:
    • Specify experiment-design domain knowledge more
      efficiently.
    • Improve the computational and sample complexities
      of the HPM algorithms.
  • Tasks:
    • Formalize limitations in terms of fMRI experiment
      design.
    • Improve the specification of timing constraints.
    • Develop more efficient exact and/or approximate
      algorithms.

30
Learning Under Uncertainty
  • Goals:
    • Relax the current modeling assumptions.
    • Allow more types of uncertainty about the data.
  • Tasks:
    • Learn process durations.
    • Learn the number of processes in the model.

31
HPM Parameter Sharing [Niculescu05]
Special case: HPMs with known configuration.
Scaling parameter per voxel per process: each voxel's response to process h is a shared response W(h) scaled by s(h,v), so there is no longer a voxel index on the weights.
Parameter reduction: from Σh d(h) · V parameters to Σh d(h) + H · V.
New mean for voxel v at time t: μ(t,v) = Σπ s(h(π), v) · W(h(π), t − start(π)).
32
Extension to Unknown Timing
  • Simplifying assumptions:
    • No clustering: all voxels share a response.
    • Voxels that share a response for one process
      share a response for all processes.
  • Algorithm notes:
    • The residual is linear in the shared response
      parameters and in the scaling parameters, so
      minimize iteratively (see the sketch below).
    • Empirically, convergence occurs within 3-5
      iterations.
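
A minimal sketch of that alternating minimization (assumed shapes; the dense per-voxel stack built in step 1 is exactly the memory concern raised on slide 35):

```python
import numpy as np

def alternating_m_step(Y, X_blocks, n_iter=5):
    """Alternating least squares for the shared responses W and the
    per-process, per-voxel scales S (steps 1 and 2 on the next slides).

    Y:        data, shape (T, V)
    X_blocks: list of H binary convolution blocks, each of shape (T, d_h)
    """
    T, V = Y.shape
    H = len(X_blocks)
    widths = [X.shape[1] for X in X_blocks]
    S = np.ones((H, V))                          # initial scales
    for _ in range(n_iter):                      # 3-5 iterations usually converge
        # Step 1: fix S, solve one (T*V) x sum(d_h) system for the shared W.
        A = np.vstack([np.hstack([S[h, v] * X_blocks[h] for h in range(H)])
                       for v in range(V)])
        w, *_ = np.linalg.lstsq(A, Y.T.reshape(-1), rcond=None)
        W = np.split(w, np.cumsum(widths)[:-1])
        # Step 2: fix W, solve a T x H system per voxel for the scales.
        R = np.column_stack([X_blocks[h] @ W[h] for h in range(H)])
        S, *_ = np.linalg.lstsq(R, Y, rcond=None)
    return W, S
```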

33
Iterative M-step: Step 1
  • Using the current estimates of S, re-estimate W.

[Figure: the T × D convolution matrix is replicated once per voxel, with its ones replaced by the scaling parameters s(h,v) (s11, s21, ... for v1; s12, s22, ... for v2; and so on for all v). Stacking the blocks gives a TV × D system; Y is reshaped to TV × 1, and the single column of unknowns holds the shared responses W(1), W(2), W(3) with no voxel index.]
34
Iterative M-step: Step 2
  • Using the current estimates of W, re-estimate S.

[Figure: the original-size T × V data matrix Y is modeled with the original-size convolution matrix whose ones are replaced by the current W estimates (w11, w21, ...). The unknown matrix has one column per voxel holding that voxel's scaling parameters, with each process's parameter replicated over its duration d(h); these replicated parameter sets must be constrained to be equal.]
35
Next Step?
  • Implement this approach.
  • Anticipated memory issues:
    • Replicating the convolution matrix for each voxel
      in step 1.
    • Working on exploiting the sparsity/structure of
      these matrices.
  • Add clustering back in.
  • Adapt for other parameterizations of response
    signatures.

36
Response Signature Parameters
  • Temporal smoothing
  • Gamma functions
  • Hemodynamic basis functions

37
Temporally Smooth Responses
  • Idea: Add a regularizer to the loss function to
    penalize large jumps between time points.
    • e.g. minimize ‖Y − XW‖² + λ Σt (Wt − Wt−1)²
    • choose λ by cross-validation
    • should be a straightforward extension to the
      optimization code
  • Concerns:
    • this adds λ instead of reducing the number of
      parameters!
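
A minimal sketch of this penalized fit via the normal equations (assumption: the first-difference penalty runs down the stacked rows of W; a real version would restart the differences at each process boundary):

```python
import numpy as np

def smooth_lstsq(X, Y, lam):
    """Solve min ||Y - X W||^2 + lam * sum_t (W_t - W_{t-1})^2
    via (X'X + lam * D'D) W = X'Y, with D the first-difference matrix."""
    n = X.shape[1]
    D = (np.eye(n) - np.eye(n, k=-1))[1:]   # rows compute W_t - W_{t-1}
    A = X.T @ X + lam * (D.T @ D)
    return np.linalg.solve(A, X.T @ Y)
```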

38
Gamma-shaped Responses
  • Idea: Use a gamma function with 3 parameters for
    each process response signature [Boynton96].
    • a controls the amplitude
    • τ controls the width of the peak
    • n controls the delay of the peak
  • Questions:
    • Are gamma functions a reasonable modeling
      assumption?
    • How should the parameters be fit in the M-step?

[Figure: gamma-shaped response, signal amplitude vs. seconds, annotated with a (amplitude), τ (width), and n (delay).]
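
A minimal sketch of the [Boynton96] gamma form under the parameterization named above; the exact normalization here is an assumption:

```python
import numpy as np
from math import factorial

def gamma_response(t, a, tau, n):
    """Gamma-shaped response signature, assumed form
    h(t) = a * (t/tau)^(n-1) * exp(-t/tau) / (tau * (n-1)!)
    with n a small positive integer."""
    t = np.asarray(t, dtype=float)
    h = a * (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * factorial(n - 1))
    return np.where(t >= 0, h, 0.0)   # causal: no response before onset

# Example: an 11-second signature sampled twice per second
t = np.arange(0, 11, 0.5)
signature = gamma_response(t, a=1.0, tau=1.2, n=3)
```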
39
Hemodynamic Basis Functions
  • Idea: Process response signatures are a weighted
    sum of basis functions.
    • the parameters are the weights on n basis
      functions, e.g. gammas with different parameter
      settings
    • learn process durations for free with
      variable-length basis functions
    • share basis functions across voxels and processes
  • Questions:
    • How to choose/learn the basis? [Dale99]
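
A minimal sketch of the representation, reusing the gamma_response sketch above as a hypothetical basis family; only the weights beta would be learned per process:

```python
import numpy as np

def basis_matrix(t, params):
    """Stack basis functions (here: unit-amplitude gammas with different
    parameters) into a (len(t), n_basis) matrix B."""
    return np.column_stack([gamma_response(t, 1.0, tau, n) for tau, n in params])

def signature_from_weights(B, beta):
    """Response signature as a weighted sum of basis functions: W = B @ beta."""
    return B @ beta

t = np.arange(0, 11, 0.5)
B = basis_matrix(t, [(1.0, 2), (1.2, 3), (1.5, 4)])   # assumed basis settings
W = signature_from_weights(B, beta=np.array([0.4, 0.5, 0.1]))
```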

40
Schedule
  • August 2006:
    • Parameter sharing.
    • Progress on model parameterization.
  • December 2006:
    • Improved expression of timing constraints.
    • Corresponding updates to the HPM algorithms.
  • June 2007:
    • Application of HPMs to an open cognitive science
      problem.
  • December 2007:
    • Projected completion.

41
References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036-1060, 2004. http://act-r.psy.cmu.edu/about/.

Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207-4221, 1996.

David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261-270, 2003.

Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109-114, 1999.

Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245-275, 1997.

James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425-2430, September 2001.

Rebecca A. Hutchinson, Tom M. Mitchell, and Indrayana Rustandi. Hidden Process Models. To appear at International Conference on Machine Learning, 2006.

Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128-136, 1999. http://www.ccbi.cmu.edu/project10modeling4CAPS.htm.

Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145-175, 2004.

Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.

Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS-05-147.