1
Hidden Process Models
  • Rebecca Hutchinson
  • May 26, 2006
  • Thesis Proposal
  • Carnegie Mellon University
  • Computer Science Department

2
Talk Outline
  • Motivation: fMRI (functional Magnetic Resonance
    Imaging) data.
  • Problem: a new kind of probabilistic time series
    modeling.
  • Solution: Hidden Process Models (HPMs).
  • Results: preliminary experiments with HPMs.
  • Extensions: proposed improvements to HPMs.

3
fMRI Data: High-Dimensional and Sparse
  • Imaged once per second for 15-20 minutes
  • Only a few dozen trials (i.e. training examples)
  • 10,000-15,000 voxels per image

4
The Hemodynamic Response
  • fMRI measures an indirect, temporally blurred
    correlate of neural activity.
  • Also called the BOLD (Blood Oxygen Level
    Dependent) response.

[Figure: hemodynamic response curve, signal amplitude vs. time (seconds), after a subject reads a word and indicates whether it is a noun or verb in less than a second.]
5
Study: Pictures and Sentences
[Figure: trial timeline starting at t0: View Picture and Read Sentence events (4 sec. each, in either order) with a button press, followed by Fixation/Rest (8 sec.).]
  • Task: Decide whether the sentence correctly
    describes the picture; indicate with a button press.
  • 13 normal subjects, 40 trials per subject.
  • Sentences and pictures describe 3 symbols using
    the relations above, below, not above, and not
    below.
  • Images are acquired every 0.5 seconds.

6
Motivation
  • To track cognitive processes over time.
  • Estimate process hemodynamic responses.
  • Estimate process timings.
  • Allowing processes that do not directly
    correspond to the stimulus timing is a key
    contribution of HPMs!
  • To compare hypotheses of cognitive behavior.

7
The Thesis
  • It is possible to simultaneously estimate the
    parameters and timing of temporally and spatially
    overlapped, partially observed processes, using
    many features and a small number of noisy
    training examples.
  • We are developing a class of probabilistic models
    called Hidden Process Models (HPMs) for this task.

8
Related Work in fMRI
  • General Linear Model (GLM)
  • Must assume timing of process onset to estimate
    hemodynamic response
  • [Dale99]
  • 4-CAPS and ACT-R
  • Predict fMRI data rather than learning parameters
    of processes from the data
  • [Anderson04, Just99]

9
Related Work in Machine Learning
  • Classification of windows of fMRI data
  • Does not typically estimate hemodynamic response
  • [Cox03, Haxby01, Mitchell04]
  • Dynamic Bayes Networks
  • HPM assumptions/constraints are difficult to
    encode in DBNs
  • [Murphy02, Ghahramani97]

10
HPM Modeling Assumptions
  • Model the latent time series at the process level.
  • Process instances share parameters based on their
    process types.
  • Use prior knowledge from experiment design.
  • Sum process responses linearly.
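
To make these assumptions concrete, here is a minimal sketch of the implied generative model for one trial (hypothetical names; per-voxel Gaussian noise, as in the formalism on the next slide):

```python
import numpy as np

def hpm_generate(T, V, instances, signatures, sigma, rng):
    """One trial of fMRI data under the HPM assumptions.

    instances:  list of (process_id, start_time) pairs for this trial
    signatures: dict process_id -> response signature array, shape (d_h, V)
    sigma:      per-voxel noise standard deviations, shape (V,)
    """
    mean = np.zeros((T, V))
    for h, start in instances:
        W = signatures[h]                    # response signature of process h
        end = min(start + W.shape[0], T)
        mean[start:end] += W[: end - start]  # process responses sum linearly
    return mean + rng.normal(0.0, sigma, size=(T, V))
```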

11
HPM Formalism [Hutchinson06]
  • HPM = ⟨H, C, F, Σ⟩
  • H = ⟨h1, ..., hH⟩, a set of processes
  • h = ⟨W, d, Ω, Θ⟩, a process
    • W: response signature
    • d: process duration
    • Ω: allowable offsets
    • Θ: multinomial parameters over values in Ω
  • C = ⟨c1, ..., cC⟩, a set of configurations
  • c = ⟨π1, ..., πL⟩, a set of process instances
  • π = ⟨h, λ, O⟩, a process instance
    • h: process ID
    • λ: associated stimulus landmark
    • O: offset (takes values in Ω(h))
  • F = ⟨f1, ..., fC⟩, priors over C
  • Σ = ⟨σ1, ..., σV⟩, a standard deviation for each voxel

Notation: parameter(entity), e.g. W(h) is the
response signature of process h, and O(π) is
the offset of process instance π.
12
Processes of the HPM
[Figure: two example processes and one configuration, shown over voxels v1 and v2.]
  • Process 1: ReadSentence. Response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = {θ0, θ1}.
  • Process 2: ViewPicture. Response signature W; duration d = 11 sec.; offsets Ω = {0, 1}; P(O) = {θ0, θ1}.
  • Input stimulus (sentence, picture) defines the timing landmarks λ1, λ2.
  • Example process instance π2: process h = 2, timing landmark λ2, offset time O = 1 sec. (start time = λ2 + O).
  • One configuration c of process instances π1, π2, ..., πk (with prior fc).
  • The predicted mean is the sum of the instances' response signatures; per-voxel noise is N(0, σ1), N(0, σ2).
13
HPMs: the graphical model
[Figure: graphical model with nodes for the configuration c, timing landmark λ, process type h, offset o, start time s, and Σ, over process instances π1, ..., πk; the observations are Y(t,v) for t = 1..T, v = 1..V. Y is observed; the other variables are unobserved.]
The set C of configurations constrains the
joint distribution on h(πk), o(πk) ∀ k.
14
Encoding Experiment Design
[Figure: processes ReadSentence = 1, ViewPicture = 2, Decide = 3; the input stimulus defines timing landmarks λ1, λ2; four candidate configurations of process instances are consistent with the constraints.]
Constraints encoded: h(π1) ∈ {1, 2}, h(π2) ∈ {1, 2}, h(π1) ≠ h(π2), o(π1) = 0, o(π2) = 0, h(π3) = 3, o(π3) ∈ {1, 2}.
15
Inference
  • Over configurations.
  • Choose the most likely configuration:
    c* = argmax_c P(C = c | Y, D, HPM)
  • where C = configuration, Y = observed data,
    D = input stimuli, HPM = model.
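
A minimal sketch of this computation under the Gaussian noise model (hypothetical names; `signatures` and `sigma` as in the formalism above; likelihood terms that are constant across configurations cancel in the normalization):

```python
import numpy as np

def config_posterior(Y, configs, priors, signatures, sigma):
    """P(C = c | Y) proportional to f_c * P(Y | c), normalized over configs.

    Y:       observed data, shape (T, V)
    configs: each configuration is a list of (process_id, start_time) pairs
    priors:  the prior f_c for each configuration
    """
    T, V = Y.shape
    log_post = np.empty(len(configs))
    for i, (c, f_c) in enumerate(zip(configs, priors)):
        mean = np.zeros((T, V))
        for h, start in c:                    # predicted mean: sum of responses
            W = signatures[h]
            end = min(start + W.shape[0], T)
            mean[start:end] += W[: end - start]
        z = (Y - mean) / sigma                # standardized residuals
        log_post[i] = np.log(f_c) - 0.5 * np.sum(z ** 2)
    log_post -= log_post.max()                # stabilize the exponentials
    post = np.exp(log_post)
    return post / post.sum()
```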

16
Learning
  • Parameters to learn:
    • Response signature W for each process
    • Timing distribution Θ for each process
    • Standard deviation σ for each voxel
  • Case 1: Known Configuration.
    • Least-squares problem to estimate W.
    • Standard MLEs for Θ and σ.
  • Case 2: Unknown Configuration.
    • Expectation-Maximization (EM) algorithm to
      estimate W and Θ.
    • E step: estimate a probability distribution over
      configurations.
    • M step: update the estimates of W (using
      reweighted least squares) and Θ (using standard
      MLEs) based on the E step.
    • Standard MLEs for σ.

17
Case 1: Known Configuration
  • Following [Dale99], use the GLM.
  • The (known) configuration generates a T × D
    convolution matrix X:

[Figure: D = Σh d(h). The known configuration (π1: h = 1, start = 1; π2: h = 2, start = 2; π3: h = 3, start = 2) generates a T × D binary convolution matrix X; row t has a 1 in each column whose process-instance response overlaps time t. For this example, d(1) = d(2) = d(3) = 3.]
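
Given X, the estimates follow from a single least-squares solve; a minimal sketch (X as built above, Y the T × V data matrix):

```python
import numpy as np

def estimate_signatures(X, Y):
    """Least-squares estimate of the stacked response signatures W (D x V)
    from the model Y ~ X @ W, solved for all voxels in one call."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def estimate_sigma(X, Y, W):
    """MLE of the per-voxel noise standard deviation from the residuals."""
    resid = Y - X @ W
    return np.sqrt(np.mean(resid ** 2, axis=0))
```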
18
Case 1: Known Configuration
[Figure: the least-squares problem in matrix form: the T × V data matrix Y ≈ X times the D × V parameter matrix formed by stacking the response signatures W(1), W(2), W(3), each of height d(h).]
19
Case 2: Unknown Configuration
  • E step: Use the inference equation to estimate a
    probability distribution over the set of
    configurations.
  • M step: Use the probabilities computed in the
    E step to form weights for the least-squares
    procedure for estimating W.
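
A minimal sketch of that reweighted least squares, equivalent in spirit to the expanded weighted convolution matrix shown on the next two slides (hypothetical helper `build_X` forms the T × D design for one configuration; `post` is the E-step posterior):

```python
import numpy as np

def m_step_W(Y, configs, post, build_X):
    """Reweighted least squares for W: stack one copy of the design per
    configuration, each scaled by the square root of its E-step weight,
    so the solution minimizes sum_c post_c * ||Y - X_c W||^2."""
    blocks_X, blocks_Y = [], []
    for c, p in zip(configs, post):
        w = np.sqrt(p)
        blocks_X.append(w * build_X(c))   # (T, D) design for configuration c
        blocks_Y.append(w * Y)
    W, *_ = np.linalg.lstsq(np.vstack(blocks_X), np.vstack(blocks_Y), rcond=None)
    return W
```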

20
Case 2: Unknown Configuration
  • The convolution matrix models several choices for
    each time point.

[Figure: an expanded convolution matrix with several rows per time point, one for each set of configurations consistent with that row (e.g. rows for t = 1 under configurations {3, 4} and {1, 2}, ..., rows for t = 18 under configurations 3, 4, 1, and 2 separately), so the expanded matrix has T′ > T rows.]
21
Case 2: Unknown Configuration
  • Weight each row with the probabilities from the
    E step.

[Figure: each row of the expanded convolution matrix is multiplied by a weight et equal to the total posterior probability of the configurations it represents, e.g. e1 = P(C=3 | Y, Wold, Θold, σold) + P(C=4 | Y, Wold, Θold, σold); the weighted system Y ≈ XW is then solved for W.]
22
Learned HPM with 3 processes (S, P, D), and d = 13 sec.
[Figure: learned response signatures for the Sentence (S), Picture (P), and Decide (D?) processes, plotted with the observed data.]
23
Results: Model Selection
  • Use cross-validation to choose a model.
  • GNB: Gaussian Naïve Bayes
  • HPM-2: HPM with ViewPicture, ReadSentence
  • HPM-3: HPM-2 + Decide

[Figure: cross-validated accuracy predicting picture vs. sentence (random = 0.5) and data log-likelihood, per model.]
24
Synthetic Data Results
  • Timing of the synthetic data mimics the real
    data, but we have ground truth.
  • Can be used to investigate the effects of:
    • signal-to-noise ratio
    • number of voxels
    • number of training examples
  • on:
    • training time
    • cross-validated classification accuracy
    • cross-validated data log-likelihood

25
[Figure: synthetic data results (image slide; no transcript).]
26
Recall Motivation
  • To track cognitive processes over time.
  • Estimate process hemodynamic responses.
  • Estimate process timings.
  • Allowing processes that do not directly
    correspond to the stimulus timing is a key
    contribution of HPMs!
  • To compare hypotheses of cognitive behavior.

27
Proposed Work
  • Goals:
    • Increase efficiency:
      • fewer parameters
      • better accuracy from fewer examples
      • faster inference and learning
    • Handle larger, more complex problems:
      • more voxels
      • more processes
      • fewer assumptions
  • Research areas:
    • Model Parameterization
    • Timing Constraints
    • Learning Under Uncertainty

28
Model Parameterization
  • Goals:
    • Improve the biological plausibility of learned
      responses.
    • Decrease the number of parameters to be estimated
      (improving sample complexity).
  • Tasks:
    • Parameter sharing across voxels
    • Parametric forms for response signatures
    • Temporally smoothed response signatures

29
Timing Constraints
  • Goals:
    • Specify experiment-design domain knowledge more
      efficiently.
    • Improve the computational and sample complexities
      of the HPM algorithms.
  • Tasks:
    • Formalize limitations in terms of fMRI experiment
      design.
    • Improve the specification of timing constraints.
    • Develop more efficient exact and/or approximate
      algorithms.

30
Learning Under Uncertainty
  • Goals:
    • Relax the current modeling assumptions.
    • Allow more types of uncertainty about the data.
  • Tasks:
    • Learn process durations.
    • Learn the number of processes in the model.

31
HPM Parameter Sharing [Niculescu05]
Special case: HPMs with known configuration.
Scaling parameter per voxel per process: each voxel's response to process h is a shared response W(h) scaled by s(h,v), so there is no longer a voxel index on the weights.
Parameter reduction: from Σh d(h) · V parameters to Σh d(h) + H · V.
New mean for voxel v at time t: μ(t,v) = Σπ s(h(π), v) · W(h(π), t − start(π)).
32
Extension to Unknown Timing
  • Simplifying assumptions:
    • No clustering: all voxels share a response.
    • Voxels that share a response for one process
      share a response for all processes.
  • Algorithm notes:
    • The residual is linear in the shared response
      parameters and in the scaling parameters, so
      minimize iteratively (see the sketch below).
    • Empirically, convergence occurs within 3-5
      iterations.
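
A minimal sketch of that alternating minimization (assumed shapes; the dense per-voxel stack built in step 1 is exactly the memory concern raised on slide 35):

```python
import numpy as np

def alternating_m_step(Y, X_blocks, n_iter=5):
    """Alternating least squares for the shared responses W and the
    per-process, per-voxel scales S (steps 1 and 2 on the next slides).

    Y:        data, shape (T, V)
    X_blocks: list of H binary convolution blocks, each of shape (T, d_h)
    """
    T, V = Y.shape
    H = len(X_blocks)
    widths = [X.shape[1] for X in X_blocks]
    S = np.ones((H, V))                          # initial scales
    for _ in range(n_iter):                      # 3-5 iterations usually converge
        # Step 1: fix S, solve one (T*V) x sum(d_h) system for the shared W.
        A = np.vstack([np.hstack([S[h, v] * X_blocks[h] for h in range(H)])
                       for v in range(V)])
        w, *_ = np.linalg.lstsq(A, Y.T.reshape(-1), rcond=None)
        W = np.split(w, np.cumsum(widths)[:-1])
        # Step 2: fix W, solve a T x H system per voxel for the scales.
        R = np.column_stack([X_blocks[h] @ W[h] for h in range(H)])
        S, *_ = np.linalg.lstsq(R, Y, rcond=None)
    return W, S
```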

33
Iterative M-step: Step 1
  • Using the current estimates of S, re-estimate W.

[Figure: the T × D convolution matrix is replicated once per voxel, with its ones replaced by the scaling parameters s(h,v) (s11, s21, ... for v1; s12, s22, ... for v2; and so on for all v). Stacking the blocks gives a TV × D system; Y is reshaped to TV × 1, and the single column of unknowns holds the shared responses W(1), W(2), W(3) with no voxel index.]
34
Iterative M-step: Step 2
  • Using the current estimates of W, re-estimate S.

[Figure: the original-size T × V data matrix Y is modeled with the original-size convolution matrix whose ones are replaced by the current W estimates (w11, w21, ...). The unknown matrix has one column per voxel holding that voxel's scaling parameters, with each process's parameter replicated over its duration d(h); these replicated parameter sets must be constrained to be equal.]
35
Next Step?
  • Implement this approach.
  • Anticipated memory issues:
    • Replicating the convolution matrix for each voxel
      in step 1.
    • Working on exploiting the sparsity/structure of
      these matrices.
  • Add clustering back in.
  • Adapt for other parameterizations of response
    signatures.

36
Response Signature Parameters
  • Temporal smoothing
  • Gamma functions
  • Hemodynamic basis functions

37
Temporally Smooth Responses
  • Idea: Add a regularizer to the loss function to
    penalize large jumps between time points.
    • e.g. minimize ‖Y − XW‖² + λ Σt (Wt − Wt−1)²
    • choose λ by cross-validation
    • should be a straightforward extension to the
      optimization code
  • Concerns:
    • this adds λ instead of reducing the number of
      parameters!
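
A minimal sketch of this penalized fit via the normal equations (assumption: the first-difference penalty runs down the stacked rows of W; a real version would restart the differences at each process boundary):

```python
import numpy as np

def smooth_lstsq(X, Y, lam):
    """Solve min ||Y - X W||^2 + lam * sum_t (W_t - W_{t-1})^2
    via (X'X + lam * D'D) W = X'Y, with D the first-difference matrix."""
    n = X.shape[1]
    D = (np.eye(n) - np.eye(n, k=-1))[1:]   # rows compute W_t - W_{t-1}
    A = X.T @ X + lam * (D.T @ D)
    return np.linalg.solve(A, X.T @ Y)
```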

38
Gamma-shaped Responses
  • Idea: Use a gamma function with 3 parameters for
    each process response signature [Boynton96].
    • a controls the amplitude
    • τ controls the width of the peak
    • n controls the delay of the peak
  • Questions:
    • Are gamma functions a reasonable modeling
      assumption?
    • How should the parameters be fit in the M-step?

[Figure: gamma-shaped response, signal amplitude vs. seconds, annotated with a (amplitude), τ (width), and n (delay).]
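
A minimal sketch of the [Boynton96] gamma form under the parameterization named above; the exact normalization here is an assumption:

```python
import numpy as np
from math import factorial

def gamma_response(t, a, tau, n):
    """Gamma-shaped response signature, assumed form
    h(t) = a * (t/tau)^(n-1) * exp(-t/tau) / (tau * (n-1)!)
    with n a small positive integer."""
    t = np.asarray(t, dtype=float)
    h = a * (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * factorial(n - 1))
    return np.where(t >= 0, h, 0.0)   # causal: no response before onset

# Example: an 11-second signature sampled twice per second
t = np.arange(0, 11, 0.5)
signature = gamma_response(t, a=1.0, tau=1.2, n=3)
```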
39
Hemodynamic Basis Functions
  • Idea: Process response signatures are a weighted
    sum of basis functions.
    • the parameters are the weights on n basis
      functions, e.g. gammas with different parameter
      settings
    • learn process durations for free with
      variable-length basis functions
    • share basis functions across voxels and processes
  • Questions:
    • How to choose/learn the basis? [Dale99]
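
A minimal sketch of the representation, reusing the gamma_response sketch above as a hypothetical basis family; only the weights beta would be learned per process:

```python
import numpy as np

def basis_matrix(t, params):
    """Stack basis functions (here: unit-amplitude gammas with different
    parameters) into a (len(t), n_basis) matrix B."""
    return np.column_stack([gamma_response(t, 1.0, tau, n) for tau, n in params])

def signature_from_weights(B, beta):
    """Response signature as a weighted sum of basis functions: W = B @ beta."""
    return B @ beta

t = np.arange(0, 11, 0.5)
B = basis_matrix(t, [(1.0, 2), (1.2, 3), (1.5, 4)])   # assumed basis settings
W = signature_from_weights(B, beta=np.array([0.4, 0.5, 0.1]))
```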

40
Schedule
  • August 2006:
    • Parameter sharing.
    • Progress on model parameterization.
  • December 2006:
    • Improved expression of timing constraints.
    • Corresponding updates to the HPM algorithms.
  • June 2007:
    • Application of HPMs to an open cognitive science
      problem.
  • December 2007:
    • Projected completion.

41
References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036-1060, 2004. http://act-r.psy.cmu.edu/about/.

Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207-4221, 1996.

David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261-270, 2003.

Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109-114, 1999.

Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245-275, 1997.

James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425-2430, September 2001.

Rebecca A. Hutchinson, Tom M. Mitchell, and Indrayana Rustandi. Hidden Process Models. To appear at International Conference on Machine Learning, 2006.

Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128-136, 1999. http://www.ccbi.cmu.edu/project10modeling4CAPS.htm.

Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145-175, 2004.

Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.

Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS-05-147.