Dynamic Programming as Sequential Decision Making - PowerPoint PPT Presentation

Provided by: seanwa6

1
Dynamic Programming as Sequential Decision Making
  • Lecture 20: Dynamic Programming as Sequential
    Decision Making
      • Sequential Decision Making
      • Example: Inventory Control
      • Value of Information
      • Dynamic Programming Algorithm
      • Principle of Optimality
      • Deterministic Finite-State Systems
      • Shortest Paths
  • Lecture 18: Introduction to Dynamic Programming
      • Structure of Dynamic Programs
      • Calculating Binomial Coefficients
      • Pascal's Triangle
      • Coins Revisited
  • Lecture 19: DNA Sequence Alignment
      • Review: Primer of Genome Science handout
      • Needleman-Wunsch example
      • Review Assignment 5

Reference: Dynamic Programming and Optimal
Control, D. Bertsekas
2
Sequential Decision Making
  • Given a discrete-time system (difference
    equation) $x_{k+1} = f_k(x_k, u_k, w_k)$, $k = 0, 1, \ldots, N-1$
  • $w_k$ are disturbances over which we have no control
  • Model the impact of external influences by
    independent, identically distributed random
    variables (think of a roll of the dice)
  • $u_k$ are the control variables, which we choose
    from an admissible set

Is this sequence well defined?
3
Sequential Decision Making

Does this make sense?
4
Sequential Decision Making
  • Since $w_k$ is random, what does it mean to choose
    $u_k$ to optimize the value of the cost function?

Does this make sense?
5
Sequential Decision Making
  • Choose $u_k$ to optimize an additive cost function
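The cost formula on this slide did not survive extraction. In the general notation of the Bertsekas reference (assumed here: $g_k$ is a per-stage cost and $g_N$ a terminal cost, neither of which is named on the slides), the additive cost is averaged over the random disturbances, which is one way to answer the question on the previous slide. The quantity to be minimized is:

```latex
E_{w_0, \ldots, w_{N-1}} \Big\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \Big\}
```

For the inventory example on the following slides, this corresponds to $g_k = r(x_k) + c\,u_k$ and $g_N = R(x_N)$.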

6
Example: Inventory Management
  • Inventory management is one primary function of
    Enterprise Resource Planning (ERP) software
  • Model the system: $x_{k+1} = x_k + u_k - w_k$
  • $x_k$ is the stock available at the beginning of
    period $k$
  • $u_k$ is the stock ordered (and immediately
    delivered) at the beginning of the $k$th period,
    $u_k \ge 0$
  • $w_k$ is the demand during the $k$th period, with a
    known probability distribution
  • Excess demand (resulting in negative values of $x_k$)
    is backlogged and filled as soon as inventory is
    available
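A minimal simulation sketch of this model, assuming integer stock levels and a small discrete demand distribution. The function name `simulate_inventory` and the demand values and probabilities are hypothetical, not from the slides. Negative stock levels represent backlogged demand.

```python
import random


def simulate_inventory(x0, orders, demand_values, demand_probs, seed=0):
    """Simulate x_{k+1} = x_k + u_k - w_k for a fixed sequence of orders.

    Demand w_k is drawn i.i.d. from a discrete distribution (hypothetical
    example data). A negative stock level means backlogged demand.
    """
    rng = random.Random(seed)
    xs = [x0]
    for u in orders:
        if u < 0:
            raise ValueError("orders must satisfy u_k >= 0")
        w = rng.choices(demand_values, weights=demand_probs)[0]
        xs.append(xs[-1] + u - w)
    return xs


# Example: start with empty stock and order 2 units in each of 3 periods.
print(simulate_inventory(0, [2, 2, 2], demand_values=[0, 1, 2, 3],
                         demand_probs=[0.1, 0.3, 0.4, 0.2]))
```

Because the demand is random, the same order sequence produces different stock trajectories for different seeds.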

7
Example: Inventory Management
  • Define cost function
  • Pay $r(x_k)$ either for storing excess inventory
    ($x_k > 0$) or for shortage ($x_k < 0$)
  • Pay purchasing cost $c\,u_k$, where $c$ is the cost per unit
    ordered
  • Pay an end-of-season cost, $R(x_N)$, for inventory
    left after $N$ periods
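The three cost terms can be totaled for one realized demand sequence as sketched below. The linear shape of $r$ (with made-up holding and shortage rates) and the function names are assumptions for illustration only.

```python
def stage_cost_r(x, holding=1.0, shortage=3.0):
    """Hypothetical r(x): `holding` per unit stored, `shortage` per unit backlogged."""
    return holding * max(x, 0) + shortage * max(-x, 0)


def trajectory_cost(x0, orders, demands, r, c, R):
    """Sum of r(x_k) + c*u_k over k = 0..N-1, plus R(x_N), for given demands."""
    x, total = x0, 0.0
    for u, w in zip(orders, demands):
        total += r(x) + c * u   # storage/shortage on current stock, plus purchase cost
        x = x + u - w           # x_{k+1} = x_k + u_k - w_k
    return total + R(x)         # end-of-season cost on what is left, x_N


# Example with a (hypothetical) zero end-of-season cost.
cost = trajectory_cost(0, [2, 1], [1, 3], r=stage_cost_r, c=0.5, R=lambda x: 0.0)
print(cost)  # 2.5
```

Here period 0 costs 0 + 0.5·2 = 1.0, stock becomes 1, period 1 costs 1.0 + 0.5·1 = 1.5, and the season ends with 1 unit backlogged.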

8
Example: Inventory Management
  • Want to choose $(u_0, u_1, \ldots, u_{N-1})$ to minimize the
    total expected cost. There are two important ways of doing
    this:
  • Open-loop control: decide $(u_0, u_1, \ldots, u_{N-1})$ all
    at once
  • Closed-loop control: use $x_k$ to improve each
    decision. Need to find a function at each time $k$
    that maps $x_k$ to $u_k$, $u_k = \mu_k(x_k)$. Decide
    $(\mu_0, \mu_1, \ldots, \mu_{N-1})$ all at once, not the $u_k$ themselves
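To make the distinction concrete, the sketch below computes the exact expected cost of a policy by enumerating every demand sequence, which is only feasible for tiny $N$. An open-loop plan is just a policy that ignores $x_k$. The closed-loop rule shown ("order up to 1 unit") is a hypothetical example, not necessarily optimal, and all numbers are made up.

```python
from itertools import product


def expected_cost(x0, N, policy, demand_dist, r, c, R):
    """Exact expected total cost of the policy u_k = policy(k, x_k).

    Enumerates every demand sequence, so it is only practical for tiny N.
    """
    total = 0.0
    for seq in product(demand_dist, repeat=N):
        prob, x, cost = 1.0, x0, 0.0
        for k, w in enumerate(seq):
            u = policy(k, x)
            cost += r(x) + c * u          # stage cost r(x_k) + c*u_k
            x = x + u - w                 # x_{k+1} = x_k + u_k - w_k
            prob *= demand_dist[w]
        total += prob * (cost + R(x))     # add terminal cost R(x_N)
    return total


def r(x):
    """Hypothetical: holding cost 1 per unit stored, shortage cost 3 per unit backlogged."""
    return max(x, 0) + 3 * max(-x, 0)


demand = {0: 0.5, 1: 0.5}                 # hypothetical demand distribution
orders = [1, 1]


def open_loop(k, x):
    return orders[k]                      # decided in advance, ignores x_k


def closed_loop(k, x):
    return max(1 - x, 0)                  # mu_k(x_k): order up to 1 unit


print(expected_cost(0, 2, open_loop, demand, r, 1, lambda x: 0))    # 2.5
print(expected_cost(0, 2, closed_loop, demand, r, 1, lambda x: 0))  # 2.0
```

In this instance the feedback rule does better because it skips the second order when stock is already on hand. In general the optimal closed-loop cost is never worse than the optimal open-loop cost; the difference between them is the value of information.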

9
Example: Inventory Management

10
Value of Information
11
Dynamic Programming Algorithm
  • Principle of Optimality (why DP works)
  • Let $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ be an optimal policy
    for the basic problem
  • Suppose that when using $\pi$, a state $x_i$ occurs at
    time $i$ with positive probability
  • Consider the subproblem starting from $x_i$ at time
    $i$ and minimizing the cost-to-go from time $i$ to $N$
  • Then the truncated policy $\{\mu_i, \mu_{i+1}, \ldots, \mu_{N-1}\}$
    is optimal for the subproblem.

Why?
12
Optimality in Driving
Shortest route from AF to Provo passes through
Orem, so...
13
Optimality in Driving
The shortest route from AF to Orem is the initial
portion of the shortest route from AF to Provo.
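The driving argument can be checked numerically on a small road graph. The node names A to E and all distances are made up for illustration. Every prefix of the shortest route that Dijkstra's algorithm finds is itself a shortest route to its endpoint.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances and predecessors from source; graph[u] = {v: weight >= 0}."""
    dist, pred = {source: 0}, {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def path_to(pred, target):
    """Follow predecessor links back to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


# Hypothetical road graph with nonnegative distances.
graph = {"A": {"B": 4, "C": 1}, "B": {"D": 3, "E": 8},
         "C": {"B": 2, "D": 7}, "D": {"E": 1}, "E": {}}
dist, pred = dijkstra(graph, "A")
route = path_to(pred, "E")
print(route)  # ['A', 'C', 'B', 'D', 'E']

# Principle of optimality: each prefix of the route is a shortest route to its end node.
length = 0
for a, b in zip(route, route[1:]):
    length += graph[a][b]
    assert length == dist[b]
```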
14
Dynamic Programming Algorithm

Prove it!
15
Dynamic Programming Algorithm
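  • Reconsider the inventory management example
  • Use the principle of optimality, working backward in
    time
  • Period $N-1$: assume $x_{N-1}$ is given
    $J(x_{N-1}) = E\{R(x_N)\} + r(x_{N-1}) + c\,u_{N-1}$
  • $r(x_{N-1})$ is fixed, regardless of the choice
    of $u_{N-1}$
  • Choose $u_{N-1} \ge 0$ to minimize
  • $J_{N-1}(x_{N-1}) = r(x_{N-1}) + \min_{u_{N-1} \ge 0} \big[ c\,u_{N-1} + E\{R(x_N)\} \big]$
    $= r(x_{N-1}) + \min_{u_{N-1} \ge 0} \big[ c\,u_{N-1} + E_{w_{N-1}}\{R(x_{N-1} + u_{N-1} - w_{N-1})\} \big]$
  • Need to compute $J_{N-1}$ for all values of
    $x_{N-1}$; the minimizing $u_{N-1}$ gives $\mu_{N-1}(x_{N-1})$
  • Period $N-2$: assume $x_{N-2}$ is given
  • Choose $u_{N-2} \ge 0$ to minimize
  • $J_{N-2}(x_{N-2}) = r(x_{N-2}) + \min_{u_{N-2} \ge 0} \big[ c\,u_{N-2} + E_{w_{N-2}}\{J_{N-1}(x_{N-2} + u_{N-2} - w_{N-2})\} \big] \;\rightarrow\; \mu_{N-2}(x_{N-2})$
  • Period $k$: $J_k(x_k) = r(x_k) + \min_{u_k \ge 0} \big[ c\,u_k + E_{w_k}\{J_{k+1}(x_k + u_k - w_k)\} \big] \;\rightarrow\; \mu_k(x_k)$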
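A minimal sketch of this backward recursion, assuming integer stock levels, a hypothetical cap `max_order` on orders so the minimization is over a finite set, and a small discrete demand distribution. All numeric parameters, including the shapes of `r` and `R`, are made up. It uses memoized recursion: each $J_k$ is built from $J_{k+1}$ at the next states, only for states that are actually reachable.

```python
from functools import lru_cache

# Hypothetical problem data (not from the slides).
N = 3                                # number of periods
c = 1.0                              # purchase cost per unit ordered
demand = {0: 0.2, 1: 0.5, 2: 0.3}    # P(w_k = w), same distribution every period
max_order = 4                        # assumed cap on u_k, so the minimization is finite


def r(x):
    """Hypothetical r(x): holding cost 1 per stored unit, shortage cost 3 per backlogged unit."""
    return 1.0 * max(x, 0) + 3.0 * max(-x, 0)


def R(x):
    """Hypothetical end-of-season cost: unfilled backlog costs 3 per unit."""
    return 3.0 * max(-x, 0)


@lru_cache(maxsize=None)
def J(k, x):
    """Return (J_k(x), mu_k(x)): optimal cost-to-go and the minimizing order."""
    if k == N:
        return R(x), None                # J_N(x_N) = R(x_N)
    best_cost, best_u = float("inf"), None
    for u in range(max_order + 1):       # admissible orders u_k >= 0
        expected_next = sum(p * J(k + 1, x + u - w)[0] for w, p in demand.items())
        cost = c * u + expected_next     # c*u_k + E[J_{k+1}(x_k + u_k - w_k)]
        if cost < best_cost:
            best_cost, best_u = cost, u
    return r(x) + best_cost, best_u      # r(x_k) does not depend on u_k


for x0 in range(-2, 3):
    cost, u0 = J(0, x0)
    print(f"x0={x0:+d}  J0={cost:.3f}  mu0={u0}")
```

A bottom-up table over a fixed range of stock levels would compute the same values; memoization simply avoids having to choose that range in advance.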

16
Dynamic Programming Algorithm
  • Theorem: For every initial state $x_0$, the
    optimal cost $J^*(x_0)$ of the basic problem is equal
    to $J_0(x_0)$, where the function $J_0$ is given by the
    last step of the following algorithm, which
    proceeds backward in time from period $N-1$ to
    period 0
  • The $u_k$ that minimizes the right-hand side,
    given $x_k$, is a function $\mu_k(x_k)$ for each $k$, and
    the policy $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ is optimal.
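The recursion that the theorem refers to did not survive extraction. In the general notation of the Bertsekas reference (assumed here: $g_k$ and $g_N$ are stage and terminal costs and $U_k(x_k)$ is the admissible control set, none of which is named on the slides), it reads:

```latex
J_N(x_N) = g_N(x_N), \\
J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k} \Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\},
\quad k = N-1, \ldots, 1, 0.
```

With $g_k = r(x_k) + c\,u_k$, $g_N = R(x_N)$, $f_k(x_k, u_k, w_k) = x_k + u_k - w_k$, and $U_k(x_k) = \{u_k \ge 0\}$, this reduces to the inventory recursion on the previous slide.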

17
Special Case: Deterministic Finite-State Systems
  • Suppose $w_k$ can take on only one value
  • There is no value of information: even with $u_k = \mu_k(x_k)$,
    $x_k$ is already determined by $x_0$ and the previous controls
  • No need for feedback
  • Choose $u_0, u_1, \ldots, u_{N-1}$ directly, instead of
    $\mu_0, \mu_1, \ldots, \mu_{N-1}$
  • Suppose the state space is finite, i.e., $x_k$ takes values
    in a finite set for every $k$
  • Given a state $x_k$, a control $u_k$ is associated
    with the transition $x_{k+1} = f_k(x_k, u_k)$ and a cost
    $g_k(x_k, u_k)$
  • Equivalently represented as a graph
  • Nodes are states
  • Edges are transitions
  • Every edge has a cost associated with it
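Because $w_k$ is fixed, the expectation drops out and the DP recursion becomes a backward shortest-path computation over the stage graph. The sketch below assumes each stage is given as a dictionary mapping a state to its outgoing transitions (control, next state, edge cost). The function name `backward_dp` and the two-stage example are hypothetical.

```python
import math


def backward_dp(stages, terminal_cost):
    """Deterministic finite-state DP, solved backward like a shortest-path problem.

    stages[k][x]     -> list of (u, next_state, edge_cost) for state x at stage k
    terminal_cost[x] -> cost of ending in state x at stage N
    Returns (J_0, policy) with policy[k][x] the optimal control at stage k.
    Ties in cost are broken by comparing the control labels.
    """
    J = dict(terminal_cost)                       # J_N
    policy = [{} for _ in stages]
    for k in reversed(range(len(stages))):        # stages N-1, ..., 0
        new_J = {}
        for x, transitions in stages[k].items():
            # Edge cost g_k(x, u) plus optimal cost-to-go from the next node.
            options = [(cost + J.get(nxt, math.inf), u) for u, nxt, cost in transitions]
            new_J[x], policy[k][x] = min(options)
        J = new_J
    return J, policy


# Hypothetical two-stage example: nodes are states, edges are transitions.
stages = [
    {"s": [("up", "a", 1), ("down", "b", 4)]},                            # stage 0
    {"a": [("up", "t1", 5), ("down", "t2", 1)], "b": [("up", "t1", 1)]},  # stage 1
]
terminal = {"t1": 0, "t2": 2}
J0, policy = backward_dp(stages, terminal)
print(J0["s"], policy[0]["s"], policy[1]["a"])  # 4 up down
```

The optimal cost 4 corresponds to the path s, a, t2 (edge costs 1 and 1 plus terminal cost 2), which is exactly the shortest path from s through the stage graph.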

18
Special Case: Deterministic Finite-State Systems
DP = Shortest Path!
19
Special Case: Deterministic Finite-State Systems
[Figure: staged graph from the initial state (Stage 0) through Stages 1, 2, ..., N-1, N to an artificial terminal node; edges are labeled with the coins Senine, Seon, Shum, and Limnah.]
Amount to make change: $N = 7$
Edge weights are 1, except when transitioning from
zero to zero, in which case they're zero.
Shortest path is optimal!
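The construction above can be sketched as a backward DP over the stage graph. The coin values are not preserved in the extracted figure, so the sketch assumes values 1, 2, 4, and 7 for Senine, Seon, Shum, and Limnah; treat these as an assumption. The state is the amount still owed, each coin edge costs 1, and the zero-to-zero edge costs 0, so the shortest-path length is the minimum number of coins.

```python
import math


def min_coins_staged(amount, coins):
    """Minimum number of coins that make `amount`, via backward DP on the stage graph.

    From state s > 0, paying with a coin of value d <= s moves to s - d at cost 1;
    state 0 moves to 0 at cost 0. Only state 0 at stage N reaches the terminal node.
    Assumes positive integer coin values; returns math.inf if no combination works.
    """
    N = amount                                   # enough stages: each coin pays at least 1
    J = [0 if s == 0 else math.inf for s in range(amount + 1)]   # cost-to-go at stage N
    for _ in range(N):                           # stages N-1, ..., 0
        new_J = [math.inf] * (amount + 1)
        new_J[0] = J[0]                          # zero-to-zero edge, weight 0
        for s in range(1, amount + 1):
            for d in coins:
                if d <= s:
                    new_J[s] = min(new_J[s], 1 + J[s - d])
        J = new_J
    return J[amount]                             # cost-to-go of the initial state


coins = {"Senine": 1, "Seon": 2, "Shum": 4, "Limnah": 7}   # assumed values
print(min_coins_staged(7, coins.values()))  # 1 (a single Limnah)
print(min_coins_staged(6, coins.values()))  # 2 (Shum + Seon)
```

Using $N$ equal to the amount guarantees enough stages for any combination of coins, since every coin reduces the amount owed by at least 1.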
20
Dynamic Programming as Sequential Decision Making
  • Life can only be understood going backwards,
  • But it must be lived going forwards.
  • Kierkegaard