Dynamic Programming presentation

1
Dynamic Programming and Hidden Markov Models

Alan Yuille
Dept. of Statistics, UCLA

2
Goal of this Talk

This talk introduces one of the major algorithms:
dynamic programming (DP).
It then describes how DP can be used in
conjunction with EM for learning.

3
Dynamic Programming

Dynamic Programming exploits the graphical
structure of the probability distribution. It can
be applied to any structure without closed loops.
Consider the two-headed coin example given in Tom
Griffiths' talk (Monday).

4
Probabilistic Grammars

By the Markov Condition
Hence we can exploit the graphical structure to
efficiently compute

The structure means that the sum over x2 drops
out. We need only sum over x1 and x3. Only four
operations instead of eight.
5
Dynamic Programming Intuition

Suppose you wish to travel to Boston from Los
Angeles by car.
To determine the cost of going via Chicago you
only need to calculate the shortest cost from Los
Angeles to Chicago and then, independently, the
shortest cost from Chicago to Boston.
Decomposing the route in this way gives an
efficient algorithm which is polynomial in the
number of nodes and feasible for computation.
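
The route decomposition above can be sketched in a few lines of Python. This is only an illustration: the intermediate cities and all costs below are made up, and the road network is assumed to have no closed loops (every edge points eastward). Caching the cost of each sub-route is what keeps the work polynomial in the number of nodes.

```python
from functools import lru_cache

# Hypothetical road network; the cities other than Los Angeles, Chicago and
# Boston, and all of the costs, are invented for this example.
EDGES = {
    "Los Angeles": {"Denver": 15, "Dallas": 20},
    "Denver": {"Chicago": 14},
    "Dallas": {"Chicago": 13, "Boston": 30},
    "Chicago": {"Boston": 14},
    "Boston": {},
}


@lru_cache(maxsize=None)
def shortest_cost(src, dst):
    """Cheapest cost from src to dst; each sub-route is solved only once."""
    if src == dst:
        return 0
    options = [cost + shortest_cost(nxt, dst) for nxt, cost in EDGES[src].items()]
    return min(options, default=float("inf"))  # inf if dst is unreachable


# The cost of going via Chicago is the sum of two independent sub-problems.
via_chicago = shortest_cost("Los Angeles", "Chicago") + shortest_cost("Chicago", "Boston")
best = shortest_cost("Los Angeles", "Boston")
print(via_chicago, best)
```

With these made-up costs the best route happens to pass through Chicago, so the two numbers agree.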

6
Dynamic Programming Diamond

Compute the shortest cost from A to B.

7
Application to a 1-dim chain.

Consider a distribution defined on a 1-dim chain.
Important property: directed and undirected
graphs are equivalent (for a 1-dim chain).
P(A,B) = P(A|B) P(B)
or P(A,B) = P(B|A) P(A)
For these simple graphs with two nodes -- you
cannot distinguish causation from correlation
without intervention (Wu's lecture, Friday).
For this lecture we will treat a simple
one-dimensional chain and cover directed and
undirected models simultaneously. (Translating
between directed and undirected is generally
possible for graphs without closed loops, but has
subtleties.)

8
Probability distribution on 1-D chain
9
1-D Chain.
10
1-Dim Chain

(Proof by induction).
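
The slide equations are not reproduced in this transcript, so the following is a sketch under an assumed standard formulation: an undirected chain with P(x_1,...,x_N) = (1/Z) prod_i psi_i(x_i, x_{i+1}). The function names and the potential values are hypothetical. The recursion sums out one variable at a time, which is the inductive step, and is checked against brute-force enumeration.

```python
import itertools
import math


def chain_partition_function(potentials, num_states):
    """Z = sum over all configurations of prod_i psi_i(x_i, x_{i+1}).

    potentials[i][a][b] is the assumed pairwise potential psi_i(a, b).
    Cost is O(N * K^2) rather than O(K^N).
    """
    # message[b]: sum over all earlier variables of the product of the
    # potentials seen so far, with the current end variable fixed to b.
    message = [1.0] * num_states
    for psi in potentials:
        message = [
            sum(message[a] * psi[a][b] for a in range(num_states))
            for b in range(num_states)
        ]
    return sum(message)


def brute_force_partition_function(potentials, num_states):
    """Same quantity by explicit enumeration of all K^N configurations."""
    n = len(potentials) + 1
    total = 0.0
    for x in itertools.product(range(num_states), repeat=n):
        total += math.prod(potentials[i][x[i]][x[i + 1]] for i in range(n - 1))
    return total


# Made-up potentials for a chain of four binary variables.
potentials = [
    [[1.0, 0.5], [0.5, 2.0]],
    [[3.0, 1.0], [1.0, 1.0]],
    [[1.0, 4.0], [2.0, 1.0]],
]
z_dp = chain_partition_function(potentials, 2)
z_bf = brute_force_partition_function(potentials, 2)
print(z_dp, z_bf)
```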

11
1-Dim Chain

We can also use DP to compute other properties,
e.g. to convert the distribution from undirected
form
to directed form.
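
One way to carry out this conversion, assuming the undirected form is a product of strictly positive pairwise potentials psi_i(x_i, x_{i+1}) (the slide's own formulas are missing from the transcript), is to run a backward DP pass and divide. The function name and the numbers below are illustrative only.

```python
def undirected_to_directed(potentials, num_states):
    """Return (prior, conditionals, z) for a chain of pairwise potentials.

    prior[a]              = P(x_1 = a)
    conditionals[i][a][b] = P(x_{i+1} = b | x_i = a)
    Assumes every potential is strictly positive.
    """
    n = len(potentials) + 1
    states = range(num_states)
    # beta[i][a]: sum over x_{i+1}..x_N of the remaining potentials, x_i = a.
    beta = [[1.0] * num_states for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [
            sum(potentials[i][a][b] * beta[i + 1][b] for b in states)
            for a in states
        ]
    z = sum(beta[0])
    prior = [beta[0][a] / z for a in states]
    conditionals = [
        [
            [potentials[i][a][b] * beta[i + 1][b] / beta[i][a] for b in states]
            for a in states
        ]
        for i in range(n - 1)
    ]
    return prior, conditionals, z


# Made-up potentials for a chain of four binary variables.
potentials = [
    [[1.0, 0.5], [0.5, 2.0]],
    [[3.0, 1.0], [1.0, 1.0]],
    [[1.0, 4.0], [2.0, 1.0]],
]
prior, conditionals, z = undirected_to_directed(potentials, 2)

# The directed factorisation reproduces the undirected probability.
x = (0, 1, 1, 0)
directed = prior[x[0]]
undirected = 1.0 / z
for i in range(3):
    directed *= conditionals[i][x[i]][x[i + 1]]
    undirected *= potentials[i][x[i]][x[i + 1]]
print(directed, undirected)
```

The product of the prior and the conditionals telescopes, leaving the product of potentials divided by Z.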

12
1-Dim Chain

13
Special Case 1-D Ising Spin Model
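
The slide content is not in the transcript. As a hedged illustration, assume the usual open-chain Ising model with spins s_i in {-1, +1} and P(s) proportional to exp(J sum_i s_i s_{i+1} + h sum_i s_i), with temperature absorbed into J and h. The same DP recursion gives the partition function, and for h = 0 it can be compared with the closed form Z = 2 (2 cosh J)^(N-1).

```python
import math


def ising_chain_log_z(num_spins, coupling, field=0.0):
    """log Z for an open 1-D Ising chain, computed by DP over the spins.

    Intended for short chains; no rescaling is done to avoid overflow.
    """
    spins = (-1, +1)
    # message[s]: total weight of all configurations of the spins processed
    # so far whose last spin equals s.
    message = {s: math.exp(field * s) for s in spins}
    for _ in range(num_spins - 1):
        message = {
            t: sum(
                message[s] * math.exp(coupling * s * t + field * t) for s in spins
            )
            for t in spins
        }
    return math.log(sum(message.values()))


n, j = 10, 0.7  # illustrative values
log_z_dp = ising_chain_log_z(n, j)
log_z_exact = math.log(2.0) + (n - 1) * math.log(2.0 * math.cosh(j))
print(log_z_dp, log_z_exact)
```
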
14
Dynamic Programming Summary

Dynamic Programming can be applied to perform
inference on all graphical models defined on
trees. The key insight is that, for trees, we can
define an order on the nodes (not necessarily
unique) and process nodes in sequence (never
needing to return to a node that has already
been processed).
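
The node-ordering idea can be sketched as follows. The representation is an assumption made for this example: nodes are numbered so that every child has a larger index than its parent, and each edge carries a pairwise potential. Processing nodes from the highest index down to the root then visits every node exactly once.

```python
import itertools


def tree_partition_function(parent, edge_potential, num_states):
    """Sum of the product of edge potentials over all configurations.

    parent[v] is the parent of node v (None for the root, node 0).
    edge_potential[v][a][b] couples x_parent = a with x_v = b.
    """
    n = len(parent)
    states = range(num_states)
    # belief[v][a]: sum over the subtree below v of its potentials, x_v = a.
    belief = [[1.0] * num_states for _ in range(n)]
    for v in range(n - 1, 0, -1):  # children are always processed before parents
        p = parent[v]
        psi = edge_potential[v]
        for a in states:
            belief[p][a] *= sum(psi[a][b] * belief[v][b] for b in states)
    return sum(belief[0])


def brute_force(parent, edge_potential, num_states):
    """Same quantity by enumerating every configuration."""
    n = len(parent)
    total = 0.0
    for x in itertools.product(range(num_states), repeat=n):
        weight = 1.0
        for v in range(1, n):
            weight *= edge_potential[v][x[parent[v]]][x[v]]
        total += weight
    return total


# Made-up tree: node 0 is the root, nodes 1 and 2 are its children,
# nodes 3 and 4 are children of node 1.
parent = [None, 0, 0, 1, 1]
edge_potential = {
    1: [[2.0, 1.0], [1.0, 3.0]],
    2: [[1.0, 0.5], [0.5, 1.0]],
    3: [[1.0, 2.0], [4.0, 1.0]],
    4: [[3.0, 1.0], [1.0, 1.0]],
}
z_tree = tree_partition_function(parent, edge_potential, 2)
z_check = brute_force(parent, edge_potential, 2)
print(z_tree, z_check)
```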

15
Extensions of Dynamic Programming

What to do if you have a graph with closed loops?
There are a variety of advanced ways to exploit
the graphical structure and obtain efficient
exact algorithms.
Prof. Adnan Darwiche (CS, UCLA) is an expert on
this topic. There will be an introduction to his
SamIam code.
Approximate methods such as belief propagation
(BP) can also be used.

16
Junction Trees.

It is also possible to take a probability
distribution defined on a graph with closed loops
and reformulate it as a distribution on a new
set of nodes without closed loops. (Lauritzen and
Spiegelhalter 1990).
This led to a variety of algorithms generally
known as junction trees.
This is not a universal solution because the
resulting new graphs may have too many nodes to
make them practical.
Google "junction trees" to find nice tutorials.

17
Graph Conversion

Convert graph by a set of transformations.

18
Triangles Augmented Variables

From triangles to ordered triangles.

Original variables: loops.
Augmented variables: no loops.
19
Summary of Dynamic Programming.

Dynamic Programming can be used to efficiently
compute properties of a distribution for graphs
defined on trees.
Directed graphs on trees can be reformulated as
undirected graphs on trees, and vice versa.
DP can be extended to apply to graphs with closed
loops by restructuring the graphs (junction
trees).
It is an active research area to determine
efficient inference algorithms which exploit the
graphical structures of these models.
Relationship between DP and reinforcement
learning (week 2).
DP and A*. DP and pruning.

20
HMMs Learning and Inference

So far we have considered inference only.
This assumes that the model is known.
How can we learn the model?
For 1D models -- this uses DP and EM.

21
A simple HMM for Coin Tossing

Two coins, one biased and the other fair, with
the coins switched occasionally.
The observable (0 or 1) is whether the coin lands
heads or tails.
The hidden state (A or B) is which coin is used.
There are unknown transition probabilities
between the hidden states A and B, and unknown
probabilities for the observations conditioned on
the hidden states.
The learning task is to estimate these
probabilities from a sequence of measurements.
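
A minimal sketch of how DP and EM combine for this learning task is given below. It is the standard Baum-Welch update with a scaled forward-backward pass, not necessarily the exact procedure used in the talk. The toss sequence, the starting parameters, and the function names are made up, and all probabilities are assumed to stay strictly positive.

```python
import math


def forward_backward(obs, init, trans, emit):
    """Scaled forward-backward (DP) pass. Returns (alpha, beta, scale)."""
    n, k = len(obs), len(init)
    alpha, scale = [], []
    for t, y in enumerate(obs):
        if t == 0:
            a = [init[i] * emit[i][y] for i in range(k)]
        else:
            a = [emit[j][y] * sum(alpha[-1][i] * trans[i][j] for i in range(k))
                 for j in range(k)]
        c = sum(a)  # P(y_t | y_1..y_{t-1}); rescaling avoids underflow
        alpha.append([v / c for v in a])
        scale.append(c)
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(k))
            / scale[t + 1]
            for i in range(k)
        ]
    return alpha, beta, scale


def em_step(obs, init, trans, emit):
    """One EM update. Returns new parameters and the log-likelihood before it."""
    n, k, m = len(obs), len(init), len(emit[0])
    alpha, beta, scale = forward_backward(obs, init, trans, emit)
    # gamma[t][i] = P(hidden state at t is i | all observations)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(k)] for t in range(n)]
    # xi_sum[i][j] = expected number of transitions from state i to state j
    xi_sum = [[0.0] * k for _ in range(k)]
    for t in range(n - 1):
        for i in range(k):
            for j in range(k):
                xi_sum[i][j] += (alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]]
                                 * beta[t + 1][j] / scale[t + 1])
    new_trans = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(k)] for i in range(k)]
    new_emit = [
        [sum(gamma[t][i] for t in range(n) if obs[t] == y)
         / sum(gamma[t][i] for t in range(n))
         for y in range(m)]
        for i in range(k)
    ]
    return gamma[0], new_trans, new_emit, sum(math.log(c) for c in scale)


# Made-up toss sequence and starting guesses.
obs = [0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]  # rows: from coin A, from coin B
emit = [[0.5, 0.5], [0.2, 0.8]]   # coin A roughly fair, coin B biased

log_likelihoods = []
for _ in range(10):
    init, trans, emit, ll = em_step(obs, init, trans, emit)
    log_likelihoods.append(ll)
print(log_likelihoods[0], log_likelihoods[-1])
```

Each EM iteration should not decrease the log-likelihood of the observed tosses. With so short a sequence the estimates are not expected to be reliable; the sketch only shows the mechanics.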

22
HMM for Speech
23
HMM for Speech
24
HMM Summary

HMMs define a class of Markov models with hidden
variables. They are used for speech recognition
and many other applications.
Tasks for HMMs include learning, inference, and
model selection.
These can often be performed by algorithms based
on EM and DP.
