Dynamic Programming Applications presentation

1
Dynamic Programming Applications

2
Preview

3
The RM problem

4
The RM problem - results

5
Optimal Stopping

At each stage a control is available that stops the evolution of the system.
At stage k there are 2 options:
  Stop the process (get a certain reward)
  Continue the process, perhaps at a certain cost, and select one of the next available choices.
If there is only one other choice besides stopping, the policy is characterized by the set of stopping states.

6
Secretary Problems

7
Archetype problem

8
Max P(get best)

9
Objective

$J_t$ = P(under the optimal policy we select the best candidate, given that we've rejected t-1 so far)
$J_t(0)$ = P(under the optimal policy we select the best candidate, given that we've seen t so far and the last one was NOT the best so far)
$J_t(1)$ = the corresponding probability given that the last one WAS the best so far
P(best of N | best of first t) = $t/N$

10
DP equation

11
Recursion

12
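Policy

For large N: $\sum_{s=t_0}^{N-1} 1/s \approx \log_e(N/t_0)$
The threshold $t_0$ is where this sum drops to about 1, therefore $t_0 \approx N/e$
Policy: interview the first $\approx N/e$ candidates and reject them, then select the first candidate who is the best you have seen so far.
P(success) = $J(t_0) \approx t_0/N \approx 1/e \approx 0.3679$
Empirical validation?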
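
One way to approach the empirical validation asked about above is a small Monte Carlo check. The sketch below is an illustration only, not the presenter's code: the function names, the trial count, and the seed are assumptions. It uses the standard closed form for the success probability of a "reject the first t, then take the first best-so-far" rule, $(t/N) \sum_{s=t}^{N-1} 1/s$, and compares it with simulation.

```python
import random


def success_probability(n, t):
    """Exact P(select the best of n) when the first t candidates are rejected
    and the first later candidate who beats everyone seen so far is chosen."""
    if t == 0:
        # With no rejections we simply take the first candidate.
        return 1.0 / n
    return (t / n) * sum(1.0 / s for s in range(t, n))


def best_cutoff(n):
    """Number of initial rejections t that maximizes the exact success probability."""
    return max(range(n), key=lambda t: success_probability(n, t))


def simulate(n, t, trials=20000, seed=0):
    """Monte Carlo estimate of the success probability for cutoff t.
    The trial count and seed are arbitrary illustrative choices."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))      # score n-1 marks the best candidate
        rng.shuffle(scores)
        best_seen = max(scores[:t]) if t > 0 else -1
        for s in scores[t:]:
            if s > best_seen:        # first candidate better than all seen so far
                wins += (s == n - 1)
                break
    return wins / trials


if __name__ == "__main__":
    n = 100
    t0 = best_cutoff(n)
    print("cutoff:", t0)
    print("exact:", success_probability(n, t0))
    print("simulated:", simulate(n, t0))
```

For moderate N the exact and simulated values should both sit close to 1/e, with the cutoff close to N/e.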

13
The Last Shall be First

14
Asset selling

Like maximizing interview score, but with discounting/investment.
Offers $w_0, w_1, \dots, w_{N-1}$ are i.i.d. with a fixed, known distribution (if not known: inference, learning).
Stage k choices:
  Accept, and invest $w_k$ at rate r
  Reject, and wait until stage k+1
Objective: maximize revenue at the end of period N.

15
Formulation

16
Costs

17
Policy

18
Structural properties

Fact: $a_k \ge a_{k+1}$ for all k
Intuition: if an offer is good enough to be acceptable at time k, it should be so at time k+1.
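
The monotonicity of the thresholds can be checked numerically. The formulation slides are not preserved in this transcript, so the sketch below assumes a common textbook version of the problem: an offer x is accepted at stage k when it exceeds a threshold $a_k$, the final offer must be accepted if nothing was accepted earlier, and the thresholds satisfy $a_{N-1} = E[w]/(1+r)$ and $a_k = E[\max(w, a_{k+1})]/(1+r)$. The function name, the discrete offer distribution, and the parameter values are all illustrative assumptions.

```python
def acceptance_thresholds(values, probs, r, n_stages):
    """Backward recursion for acceptance thresholds a_0 .. a_{N-1}.

    Assumed model (not taken from the slides): offers are i.i.d. on the
    discrete support `values` with probabilities `probs`, accepted money is
    invested at rate r, and the last offer must be taken if none was before.
    """
    expected_offer = sum(p * v for v, p in zip(values, probs))
    a = [0.0] * n_stages
    # At the last decision stage, rejecting yields the final offer, E[w] on average.
    a[n_stages - 1] = expected_offer / (1.0 + r)
    for k in range(n_stages - 2, -1, -1):
        # Rejecting at stage k means facing threshold a[k+1] one period later.
        a[k] = sum(p * max(v, a[k + 1]) for v, p in zip(values, probs)) / (1.0 + r)
    return a


if __name__ == "__main__":
    thresholds = acceptance_thresholds([1.0, 2.0, 3.0], [1 / 3] * 3, r=0.05, n_stages=6)
    for k, a_k in enumerate(thresholds):
        print(k, round(a_k, 4))
```

Under these assumptions the printed thresholds are non-increasing in k, which matches the stated fact: the seller becomes less demanding as the deadline approaches.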

19
General stopping OLA

Stopping is mandatory at or before stage N.
Stationary: the state, control, and disturbance spaces, and the cost per stage, are constant over time.
Extra action: go to the termination state at cost $t(x_k)$.
DP algorithm:
$J_N(x_N) = t(x_N)$
$J_k(x_k) = \min\big[\, t(x_k),\ \min_{u_k \in U(x_k)} E_{w_k}\{ g(x_k,u_k,w_k) + J_{k+1}(f(x_k,u_k,w_k)) \} \,\big]$

20
Stopping set

21
Absorbance

Condition: $T_{N-1}$ is absorbing if, whenever $x \in T_{N-1}$ and termination is not selected, the next state is in $T_{N-1}$.
That is, $f(x,u,w) \in T_{N-1}$ for all $x \in T_{N-1}$, $u \in U(x)$, $w$.
Intuition: if you reach a state that's optimal to stop at, but you don't stop, then you move to a state that's also optimal to stop at.
Theorem: If $T_{N-1}$ is absorbing then $T_k = T_{N-1}$ for all k.
OLA policy optimal iff $T_{N-1}$ (the 1-step stopping set) is absorbing.
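
The theorem can be illustrated on a small finite example. The sketch below is a hypothetical instance, not taken from the slides: it assumes a single "continue" control, a five-state chain that either stays put or moves up one state with probability 1/2, a constant continuation cost, and a termination cost that decreases with diminishing steps. It runs the backward DP recursion, records the stopping set at every stage, and checks that the one-step stopping set is absorbing.

```python
def stopping_sets(t_cost, g_cost, P, horizon):
    """Backward DP for a stopping problem with one 'continue' control.

    t_cost[x]: cost of terminating in state x
    g_cost[x]: cost of continuing from state x
    P[x][y]:   probability of moving from x to y when continuing
    Returns [T_0, ..., T_{N-1}], where T_k holds the states in which stopping
    is no worse than continuing at stage k.
    """
    n = len(t_cost)
    J = list(t_cost)                     # J_N(x) = t(x): stopping is forced at N
    sets = [None] * horizon
    for k in range(horizon - 1, -1, -1):
        cont = [g_cost[x] + sum(P[x][y] * J[y] for y in range(n)) for x in range(n)]
        sets[k] = {x for x in range(n) if t_cost[x] <= cont[x]}
        J = [min(t_cost[x], cont[x]) for x in range(n)]
    return sets


def is_absorbing(states, P):
    """True if every transition out of `states` lands back inside `states`."""
    return all(y in states for x in states for y in range(len(P)) if P[x][y] > 0)


# Hypothetical example data (illustrative values only).
T_COST = [10.0, 6.0, 3.5, 2.5, 2.4]
G_COST = [1.0] * 5
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    sets = stopping_sets(T_COST, G_COST, P, horizon=6)
    one_step = sets[-1]                  # T_{N-1}, the one-step stopping set
    print("one-step stopping set:", sorted(one_step))
    print("absorbing:", is_absorbing(one_step, P))
    print("all stages agree:", all(s == one_step for s in sets))
```

In this instance the one-step stopping set is absorbing, so every $T_k$ coincides with it, and comparing $t(x)$ against one step of continuation is enough to decide whether to stop.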
