# Learning Tree Conditional Random Fields - PowerPoint PPT Presentation

## Learning Tree Conditional Random Fields

Description: Learning Tree Conditional Random Fields. Joseph K. Bradley, Carlos Guestrin.
Transcript and Presenter's Notes

Title: Learning Tree Conditional Random Fields

1
Learning Tree Conditional Random Fields
• Joseph K. Bradley
• Carlos Guestrin

2
(Application from Palatucci et al., 2009)
X = fMRI voxels
Y = semantic features
• Metal?
• Found in house?
• ...

We want to model conditional correlations.
Predict independently? P(Yi | X), for all i
Image from http://en.wikipedia.org/wiki/File:FMRI.jpg
3
Conditional Random Fields (CRFs)
• (Lafferty et al., 2001)

In fMRI, X = 500 to 10,000 voxels
Pro: Avoid modeling P(X)
4
Conditional Random Fields (CRFs)
[Figure: CRF over outputs Y1...Y4]
Pro: Avoid modeling P(X)
5
Conditional Random Fields (CRFs)
[Figure: CRF over outputs Y1...Y4]
Pro: Avoid modeling P(X)
6
Conditional Random Fields (CRFs)
Con: Compute Z(x) for each inference
Pro: Avoid modeling P(X)
7
Conditional Random Fields (CRFs)
Exact inference intractable in general. Approximate inference expensive.
Use tree CRFs!
Con: Compute Z(x) for each inference
Pro: Avoid modeling P(X)
8
Conditional Random Fields (CRFs)
Use tree CRFs!
Pro: Fast, exact inference
Con: Compute Z(x) for each inference
Pro: Avoid modeling P(X)
9
CRF Structure Learning
Feature selection
Tree CRFs: Fast, exact inference; avoid modeling P(X)
10
CRF Structure Learning
(scalable)
Local inputs
Tree CRFs: Fast, exact inference; avoid modeling P(X)
11
This work
Goals
• Structured conditional models P(Y|X)
• Scalable methods
• Tree structures
• Local inputs Xij
• Max spanning trees
• Outline
• Gold standard
• Max spanning trees
• Generalized edge weights
• Heuristic weights
• Experiments: synthetic & fMRI

12
Related work
| Method | Feature selection? | Tractable models? |
| --- | --- | --- |
| Torralba et al. (2004): Boosted Random Fields | Yes | No |
| Schmidt et al. (2008): Block-L1 regularized pseudolikelihood | No | No |
| Shahaf et al. (2009): Edge weight low-treewidth model | No | Yes |
• Vs. our work
• Choice of edge weights
• Local inputs

13
Chow-Liu
For generative models
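Chow-Liu weights each candidate edge by the empirical mutual information between the two variables, then keeps a max spanning tree. A minimal sketch of that mutual-information estimate for discrete samples (my own illustration, not code from the talk):

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats, from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log[ p(x,y) / (p(x) p(y)) ], in counts: (c/n) * log(c*n / (cx*cy))
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi
```

Perfectly correlated pairs give I = H(X) (log 2 for a fair binary variable); independent pairs give I = 0.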
14
Chow-Liu for CRFs?
For CRFs with global inputs
• Global CMI (Conditional Mutual Information)
• Pro: Gold standard
• Con: I(Yi; Yj | X) intractable for big X

15
Where now?
• Global CMI (Conditional Mutual Information)
• Pros: Gold standard
• Cons: I(Yi; Yj | X) intractable for big X
• Algorithmic framework
• Given data {(y(i), x(i))}
• Given input mapping Yi → Xi
• Weight potential edge (Yi,Yj) with Score(i,j)
• Choose max spanning tree

Local inputs!
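The max-spanning-tree step of the framework above can be sketched with Kruskal's algorithm; `score` here is a hypothetical stand-in for whichever edge score (global CMI, local CMI, PWL, DCI) is plugged in:

```python
def max_spanning_tree(n, score):
    """Kruskal's algorithm: max-weight spanning tree over nodes 0..n-1.
    score(i, j) gives the weight of candidate edge (Yi, Yj)."""
    edges = sorted(((score(i, j), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    parent = list(range(n))          # union-find forest
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    tree = []
    for w, i, j in edges:            # take heaviest edges that join components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```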
16
Generalized edge scores
• Key step: Weight edge (Yi,Yj) with Score(i,j).

Local Linear Entropy Scores: Score(i,j) is a linear combination of entropies over Yi, Yj, Xi, Xj
E.g., Local Conditional Mutual Information
17
Generalized edge scores
• Key step: Weight edge (Yi,Yj) with Score(i,j).

Local Linear Entropy Scores: Score(i,j) is a linear combination of entropies over Yi, Yj, Xi, Xj
• Theorem
• Assume true P(Y|X) is a tree CRF
• (w/ non-trivial parameters).
• ⇒ No Local Linear Entropy Score can recover all such tree CRFs
• (even with exact entropies).

18
Heuristics
• Outline
• Gold standard
• Max spanning trees
• Generalized edge weights
• Heuristic weights
• Experiments: synthetic & fMRI

→ Piecewise likelihood → Local CMI → DCI
19
Piecewise likelihood (PWL)
Sutton and McCallum (2005, 2007): PWL for parameter learning. Main idea: Bound Z(X).
For tree CRFs, optimal parameters give:
• Edge score w/ local inputs Xij
• Bounds log likelihood
• Fails on a simple counterexample
• Helps explain other edge scores

20
Piecewise likelihood (PWL)
True P(Y,X)
21
Local Conditional Mutual Info
• Decomposable score w/ local inputs Xij
• Theorem: Local CMI bounds log likelihood gain
• Does pretty well in practice
• Can fail with strong potentials
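As an illustration (not from the slides), the Local CMI score I(Yi; Yj | Xi, Xj) can be estimated from empirical counts when Yi, Yj, and the local inputs are discrete:

```python
from collections import Counter
from math import log

def local_cmi(samples):
    """Estimate I(Yi; Yj | Xi, Xj) in nats from samples (yi, yj, x),
    where x = (xi, xj) is the local input pair."""
    n = len(samples)
    c_yyx = Counter((yi, yj, x) for yi, yj, x in samples)
    c_ix = Counter((yi, x) for yi, _, x in samples)
    c_jx = Counter((yj, x) for _, yj, x in samples)
    c_x = Counter(x for _, _, x in samples)
    cmi = 0.0
    for (yi, yj, x), c in c_yyx.items():
        # p(yi,yj,x) * log[ p(yi,yj|x) / (p(yi|x) p(yj|x)) ]
        cmi += (c / n) * log(c * c_x[x] / (c_ix[(yi, x)] * c_jx[(yj, x)]))
    return cmi
```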

22
Local Conditional Mutual Info
True P(Y,X)
[Figure: chain Y1-Y2-Y3; arrow marks a strong potential on one edge]
23
Decomposable Conditional Influence (DCI)
• Exact measure of gain for some edges
• Edge score w/ local inputs Xij
• Succeeds on counterexample
• Does best in practice

24
Experiments
Algorithmic details
• Given: Data {(y(i), x(i))}; input mapping Yi → Xi
• Compute edge scores
• Choose max spanning tree
• Parameter learning
• Conjugate gradient on L2-regularized log likelihood
• 10-fold CV to choose regularization

25
Synthetic experiments
P(Y|X)
P(X)
[Figure: inputs X1, X2, X3, ..., Xn]
• Experiments
• Binary Y, X; tabular edge factors
• Use natural input mapping Yi → Xi

26
Synthetic experiments
P(Y|X)
P(X)
[Figure: outputs Y1...Y5; tractable vs. intractable P(Y,X); factors F(Yij, Xij)]
• P(Y|X), P(X): chains & trees
• P(Y,X): tractable & intractable
27
Synthetic experiments
P(Y|X)
[Figure: chain Y1, Y2, Y3, ..., Yn with cross factors to inputs X1, X2, X3, ..., Xn; factors F(Yij, Xij)]
• P(Y|X): chains & trees
• P(Y,X): tractable & intractable
• With & without cross-factors
• Associative (all positive / alternating +/-) & random factors

28
Synthetic: vary train exs.
29
Synthetic: vary train exs.
Tree; intractable P(Y,X); associative F (alternating +/-); |Y| = 40; 1000 test examples
30-34
Synthetic: vary train exs.
35
Synthetic: vary model size
Fixed: 50 train exs., 1000 test exs.
36
fMRI experiments
X (500 fMRI voxels)
Y (218 semantic features)
predict
• Metal?
• Found in house?
• ...

Data, setup from Palatucci et al. (2009)
Zero-shot learning: can predict objects not in training data (given decoding).
Image from http://en.wikipedia.org/wiki/File:FMRI.jpg
37
fMRI experiments
X (500 fMRI voxels)
Y (218 semantic features)
predict
Input mapping: regressed Yi on Y-i, X
Chose top K inputs
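A rough sketch of a top-K input mapping. The talk regresses Yi on Y-i and X; this illustration instead ranks candidate inputs by absolute Pearson correlation as a simpler stand-in, so treat it as an assumption, not the authors' method:

```python
from statistics import mean

def top_k_inputs(y, X_cols, k):
    """Return indices of the k columns of X most correlated (|Pearson r|) with y."""
    def abs_corr(col):
        my, mc = mean(y), mean(col)
        num = sum((a - my) * (b - mc) for a, b in zip(y, col))
        dy = sum((a - my) ** 2 for a in y) ** 0.5
        dc = sum((b - mc) ** 2 for b in col) ** 0.5
        return abs(num / (dy * dc)) if dy and dc else 0.0
    ranked = sorted(range(len(X_cols)), key=lambda j: abs_corr(X_cols[j]), reverse=True)
    return ranked[:k]
```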
38
fMRI experiments
Accuracy (for zero-shot learning): Hold out objects i, j. Predict Ŷ(i), Ŷ(j). If ||Ŷ(i) - Y(i)||² < ||Ŷ(j) - Y(i)||², then we got i right.
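The accuracy criterion on this slide, written out as a small check (squared L2 distances; `y_hat` denotes a prediction):

```python
def zero_shot_correct(y_hat_i, y_true_i, y_hat_j):
    """Slide's criterion: the prediction for held-out object i counts as correct
    when it is closer (squared L2) to i's true code than j's prediction is."""
    sq_dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return sq_dist(y_hat_i, y_true_i) < sq_dist(y_hat_j, y_true_i)
```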
39
fMRI experiments
Accuracy: CRFs a bit worse
40
fMRI experiments
Accuracy: CRFs a bit worse. Log likelihood: CRFs better.
41
fMRI experiments
Accuracy: CRFs a bit worse. Log likelihood: CRFs better. Squared error: CRFs better.
43
Conclusion
• Scalable learning of CRF structure
• Analyzed edge scores for spanning tree methods
• Local Linear Entropy Scores imperfect
• Heuristics
• Pleasing theoretical properties
• Empirical success; we recommend DCI
• Future work
• Templated CRFs
• Learning edge score
• Assumptions on model/factors which give learnability

Thank you!
44
Thank you!
• References
• M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. AAAI 1998.
• J. Lafferty, A. McCallum, F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML 2001.
• M. Palatucci, D. Pomerleau, G. Hinton, T. Mitchell. Zero-Shot Learning with Semantic Output Codes. NIPS 2009.
• M. Schmidt, K. Murphy, G. Fung, R. Rosales. Structure Learning in Random Fields for Heart Motion Abnormality Detection. CVPR 2008.
• D. Shahaf, A. Chechetka, C. Guestrin. Learning Thin Junction Trees via Graph Cuts. AISTATS 2009.
• C. Sutton, A. McCallum. Piecewise Training of Undirected Models. UAI 2005.
• C. Sutton, A. McCallum. Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. ICML 2007.
• A. Torralba, K. Murphy, W. Freeman. Contextual Models for Object Detection Using Boosted Random Fields. NIPS 2004.

45
(extra slides)
46
B: Score Decay Assumption
47
B: Example complexity
48
Future work: Templated CRFs
• Learn template, e.g.,
• Score(i,j) = DCI(i,j)
• Parametrization
• WebKB (Craven et al., 1998)
• Given webpages (Yi = page type, Xi = content)
• Use template to: choose tree over pages; instantiate parameters
• ⇒ P(Y | X = x) = P(pages' types | pages' content)
• Requires local inputs
• Potentially very fast

49
Future work: Learn score
• Given training queries
• Data
• Ground-truth model (e.g., from an expensive structure learning method)
• Learn function Score(Yi,Yj) for MST algorithm.