Dynamic Programming Applications PowerPoint PPT Presentation

Title: Dynamic Programming Applications


1
Dynamic Programming Applications
  • Lecture 5

2
Preview
  • Last time: structural properties
  • Today: optimal stopping and the OLA rule (secretary problem, asset selling)
  • Next time: infinite horizon

3
The RM problem
  • $J_t(x,i) = \max\{J_{t-1}(x),\ R_i + J_{t-1}(x-1)\} = (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
  • Optimal policy: accept class $i$ iff $R_i \ge OC_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$
  • Results
  • 1. $J_t(x)$ increasing in $x$: by induction
  • 2. $OC_t(x)$ decreasing in $x$: single crossing
  • 3. $OC_t(x)$ increasing in $t$: by induction and result 2
  • $J_t(x) = \sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
  • $J_t(x-1) = \sum_i p_i (R_i - OC_{t-1}(x-1))^+ + J_{t-1}(x-1)$
  • $OC_t(x) - OC_{t-1}(x) = \sum_i p_i \left[ (R_i - OC_{t-1}(x))^+ - (R_i - OC_{t-1}(x-1))^+ \right] \ge 0$
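
The recursion and the three monotonicity results above can be checked numerically. The following is a minimal sketch, assuming at most one request per period, class i arriving with probability p_i (any leftover probability meaning no arrival), and boundary values J_0(x) = 0 and J_t(0) = 0. The function names `rm_value`, `opportunity_cost`, and `accept` are hypothetical, not from the lecture.

```python
from typing import List


def rm_value(T: int, C: int, R: List[float], p: List[float]) -> List[List[float]]:
    """Return J with J[t][x] = max expected revenue, t periods to go, x units left.

    Assumptions (not from the slides): J[0][x] = 0, J[t][0] = 0, and
    sum(p) <= 1 with the remainder being the no-arrival probability.
    """
    J = [[0.0] * (C + 1) for _ in range(T + 1)]
    for t in range(1, T + 1):
        for x in range(1, C + 1):
            oc = J[t - 1][x] - J[t - 1][x - 1]  # OC_{t-1}(x)
            # J_t(x) = sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)
            J[t][x] = J[t - 1][x] + sum(
                pi * max(Ri - oc, 0.0) for Ri, pi in zip(R, p)
            )
    return J


def opportunity_cost(J: List[List[float]], t: int, x: int) -> float:
    """OC_t(x) = J_t(x) - J_t(x-1), defined for x >= 1."""
    return J[t][x] - J[t][x - 1]


def accept(J: List[List[float]], t: int, x: int, Ri: float) -> bool:
    """Accept a fare-Ri request with t periods to go and x >= 1 units left."""
    return Ri >= opportunity_cost(J, t - 1, x)
```

With illustrative fares R = [100, 60, 30] and probabilities p = [0.2, 0.3, 0.4], the table returned by `rm_value(10, 5, R, p)` can be scanned to confirm that J is increasing in x, and that the opportunity cost is decreasing in x and increasing in t.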

4
The RM problem - results
  • The optimal policy is characterized by threshold levels $b_i^t$ as follows:
  • Accept class $i$ at time $t$ iff $0 \le x < b_i^t$,
  • where $b_i^t = \min\{x : OC_{t-1}(x) > R_i\}$
  • Moreover, $b_1^t \ge \cdots \ge b_m^t$, where $R_1 \ge \cdots \ge R_m$

5
Optimal Stopping
  • At each stage a control is available that stops the evolution of the system.
  • At stage $k$ there are 2 options:
  • Stop the process (get a certain reward)
  • Continue the process, perhaps at a certain cost, and select one of the next available choices.
  • If there is only one other choice besides stopping, the policy is characterized by the set of stopping states.

6
Secretary Problems
  • Cayley 1875
  • Interview N candidates for a job
  • Must accept/reject at end of interview
  • Objectives
  • Maximize expected score
  • Maximize P(get the best)
  • (you risk hiring nobody!)

7
Archetype problem
  • Make an irrevocable choice from a fixed number of opportunities whose values are revealed sequentially.
  • Asset selling
  • Purchasing with a deadline
  • Exercising stock options (in your next HW)

8
Max P(get best)
  • $W_t$ = history of relative ranks of candidates seen by time $t$ (inclusive)
  • $x_t = 1$ if the $t$-th candidate is the best seen so far, $0$ otherwise
  • Relevant information: $t$ and $x_t$
  • Fact: the event $x_t = 1$ and $W_{t-1}$ are statistically independent

9
Objective
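  • $J_t$ = P(under the optimal policy we select the best candidate, given that we've rejected $t-1$ so far)
  • $J_t(0)$ = P(under the optimal policy we select the best candidate, given that we've seen $t$ so far and the last one was NOT the best so far)
  • $J_t(1)$ = P(under the optimal policy we select the best candidate, given that we've seen $t$ so far and the last one WAS the best so far)
  • P(best of $N$ | best of first $t$) = $t/N$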
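
The conditional probability in the last bullet equals t/N, which is the stopping reward used in the DP equation. The sketch below confirms this by brute-force enumeration for small N, assuming all N! arrival orders are equally likely. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def p_best_overall_given_best_so_far(N: int, t: int) -> Fraction:
    """P(t-th candidate is best of all N | t-th candidate is best of the first t).

    Enumerates every arrival order; perm[k] is the quality of candidate k+1
    (larger is better). Only practical for small N.
    """
    hits = 0
    total = 0
    for perm in permutations(range(N)):
        if perm[t - 1] == max(perm[:t]):      # t-th is best seen so far
            total += 1
            if perm[t - 1] == N - 1:          # ...and is the overall best
                hits += 1
    return Fraction(hits, total)
```

For example, calling `p_best_overall_given_best_so_far(5, t)` for t = 1, ..., 5 can be compared against `Fraction(t, 5)`.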

10
DP equation
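  • $J_{N+1} = 0$
  • $J_t = \frac{t-1}{t} J_t(0) + \frac{1}{t} J_t(1)$
  • $J_t(0) = J_{t+1}$ (must continue)
  • $J_t(1) = \max(t/N,\ J_{t+1})$ (accept or continue)
  • Fact 1: $J_{t-1} \ge J_t$
  • Fact 2: $J_t \downarrow$ in $t$ and $t/N \uparrow$ in $t$ $\Rightarrow$ single crossing
  • Define $t_0 = \min\{t : J_{t+1} \le t/N\}$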
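
The DP equation on this slide can be run backward directly. The sketch below is one possible implementation using exact rational arithmetic; the function name `secretary_dp` and the list layout (index t holds J_t, index N+1 holds the terminal value 0) are choices made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def secretary_dp(N: int) -> Tuple[List[Fraction], int]:
    """Backward recursion for max P(get best) with N candidates.

    Returns (J, t0) where J[t] is the success probability given that the
    first t-1 candidates were rejected (J[0] is unused, J[N+1] = 0), and
    t0 is the smallest t with J[t+1] <= t/N.
    """
    J = [Fraction(0)] * (N + 2)
    for t in range(N, 0, -1):
        J0 = J[t + 1]                           # not best so far: must continue
        J1 = max(Fraction(t, N), J[t + 1])      # best so far: accept or continue
        J[t] = Fraction(t - 1, t) * J0 + Fraction(1, t) * J1
    t0 = min(t for t in range(1, N + 1) if J[t + 1] <= Fraction(t, N))
    return J, t0
```

For N = 4 this gives t0 = 2 and J[1] = 11/24: reject the first candidate, then accept the first one who is the best seen so far.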

11
Recursion
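  • $J_t = J_{t_0}$ if $t < t_0$
  • $J_t = \frac{t-1}{t} J_{t+1} + \frac{1}{N}$ if $t \ge t_0$
  • Dividing by $t-1$: $\frac{J_t}{t-1} = \frac{J_{t+1}}{t} + \frac{1}{N(t-1)}$
  • Therefore $J_{t+1} = \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s}$ (after telescoping)
  • By definition, $t_0$ is the smallest $t$ s.t. $J_{t+1} \le t/N$, so
  • $t_0 = \min\{t : \sum_{s=t}^{N-1} 1/s \le 1\}$
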
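
The telescoping step can be written out as follows. This is a worked sketch that uses only the recursion for the stopping region and the terminal condition J_{N+1} = 0; it applies for t + 1 at or beyond the threshold (and s at least 2, so that s - 1 is nonzero).

```latex
\begin{align*}
\frac{J_s}{s-1} - \frac{J_{s+1}}{s} &= \frac{1}{N(s-1)}
  \qquad \text{(recursion in the stopping region)} \\
\frac{J_{t+1}}{t}
  &= \sum_{s=t+1}^{N} \left( \frac{J_s}{s-1} - \frac{J_{s+1}}{s} \right)
   = \frac{1}{N} \sum_{s=t+1}^{N} \frac{1}{s-1}
   = \frac{1}{N} \sum_{s=t}^{N-1} \frac{1}{s} \\
J_{t+1} &= \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s},
  \qquad \text{so} \qquad
  J_{t+1} \le \frac{t}{N} \iff \sum_{s=t}^{N-1} \frac{1}{s} \le 1
\end{align*}
```
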
12
Policy
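  • For large $N$: $\sum_{s=t_0}^{N-1} 1/s \approx \log_e(N/t_0)$
  • Therefore $t_0 \approx N/e$
  • Policy: interview $\approx N/e$ candidates and reject them, then select the first candidate who is the best seen so far.
  • $P(\text{success}) = J_{t_0} \approx t_0/N \approx 1/e \approx 0.3679$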
  • Empirical validation?
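
One way to approach the validation question numerically is to compute the threshold from the harmonic-sum condition and compare the resulting success probability with a Monte Carlo estimate. The sketch below assumes candidate scores are i.i.d. uniform draws (only their ranks matter); all function names are hypothetical.

```python
import math
import random


def threshold(N: int) -> int:
    """Smallest t in 1..N with sum_{s=t}^{N-1} 1/s <= 1."""
    tail = 0.0          # running value of sum_{s=t}^{N-1} 1/s
    t0 = N              # empty sum at t = N always satisfies the condition
    for t in range(N - 1, 0, -1):
        tail += 1.0 / t
        if tail > 1.0:
            break
        t0 = t
    return t0


def success_probability(N: int) -> float:
    """J_{t0} = ((t0-1)/N) * sum_{s=t0}^{N-1} 1/s + 1/N, from the recursion."""
    t0 = threshold(N)
    tail = sum(1.0 / s for s in range(t0, N))
    return (t0 - 1) / N * tail + 1.0 / N


def simulate(N: int, t0: int, trials: int, rng: random.Random) -> float:
    """Estimate P(success): reject the first t0-1, then take the first best-so-far."""
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(N)]
        best_seen = max(scores[: t0 - 1], default=float("-inf"))
        chosen = None
        for s in scores[t0 - 1:]:
            if s > best_seen:
                chosen = s
                break
        if chosen is not None and chosen == max(scores):
            wins += 1
    return wins / trials
```

A call such as `simulate(100, threshold(100), 10000, random.Random(0))` can be compared with `success_probability(100)` and with `1 / math.e`.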

13
The Last Shall be First
  • "...The last person interviewed for a job gets it 55.8% of the time, according to Runzheimer Canada, Inc. Early applicants are hired only 17.6% of the time; the management consulting firm suggests that job-seekers who find they are among the first to be grilled tactfully ask to be rescheduled for a later date. Mondays are also poor days to be interviewed, and any day just before quitting time is also bad."
  • (The Globe and Mail, Sept. 12, 1990, pg. A22)

14
Asset selling
  • Like maximizing interview score, but with
    discounting/investment
  • Offers $w_0, w_1, \ldots, w_{N-1}$ i.i.d. with a fixed, known distribution (if not known: inference, learning)
  • Stage $k$ choices:
  • Accept, and invest $w_k$ at rate $r$
  • Reject, and wait until stage $k+1$
  • Objective: maximize revenue at the end of period $N$

15
Formulation
  • State
  • $x_k \ne T$: asset has not been sold, current offer is $x_k$
  • $x_k = T$: asset has been sold
  • Decision
  • $u_k = u^1$: sell; $u_k = u^2$: don't sell
  • Plant equation
  • $x_{k+1} = T$ if $x_k = T$, or if $x_k \ne T$ and $u_k = u^1$ (sell)
  • $x_{k+1} = w_k$ otherwise

16
Costs
  • $g_N(x_N) = x_N$ if $x_N \ne T$; $0$ otherwise
  • $g_k(x_k) = (1+r)^{N-k} x_k$ if $x_k \ne T$ and $u_k = u^1$ (sell); $0$ otherwise
  • $J_N(x_N) = x_N$ if $x_N \ne T$; $0$ otherwise
  • $J_k(x_k) = \max\{(1+r)^{N-k} x_k,\ E_w[J_{k+1}(w_k)]\}$ if $x_k \ne T$; $0$ otherwise

17
Policy
  • Accept offer $x_k$ if $x_k > \alpha_k$
  • Reject offer $x_k$ if $x_k < \alpha_k$
  • Indifferent if $x_k = \alpha_k$
  • The optimal policy is determined by the sequence $\{\alpha_k\}$
  • $\alpha_k = E_w[J_{k+1}(w_k)] / (1+r)^{N-k}$

18
Structural properties
  • Fact: $\alpha_k \ge \alpha_{k+1}$ for all $k$
  • Intuition: if an offer is good enough to be acceptable at time $k$, it should also be acceptable at time $k+1$.
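
The acceptance thresholds can be computed by carrying the expected value of the next-stage cost-to-go backward in time. The sketch below assumes a discrete offer distribution given as parallel lists of values and probabilities, which is an illustrative choice; the function name and argument names are hypothetical.

```python
from typing import List


def asset_selling_thresholds(
    offers: List[float], probs: List[float], r: float, N: int
) -> List[float]:
    """Return the thresholds for stages k = 0, ..., N-1.

    Uses J_N(x) = x and J_k(x) = max((1+r)^(N-k) x, E_w[J_{k+1}(w)]);
    the threshold at stage k is E_w[J_{k+1}(w)] / (1+r)^(N-k).
    """
    # EJ holds E_w[J_{k+1}(w)]; at k + 1 = N this is simply E[w].
    EJ = sum(p * w for w, p in zip(offers, probs))
    alphas = [0.0] * N
    for k in range(N - 1, -1, -1):
        growth = (1.0 + r) ** (N - k)
        alphas[k] = EJ / growth
        # Update to E_w[J_k(w)] for use at stage k - 1.
        EJ = sum(p * max(growth * w, EJ) for w, p in zip(offers, probs))
    return alphas
```

With offers 1, 2, 3 equally likely and r = 0, the last two thresholds come out as 2 and 7/3, and the returned list is non-increasing in k, matching the structural property.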

19
General stopping OLA
  • Stopping is mandatory at or before stage $N$
  • Stationary: the state, control, and disturbance spaces and the cost per stage are constant over time
  • Extra action: go to the termination state at cost $t(x_k)$
  • DP algorithm
  • $J_N(x_N) = t(x_N)$
  • $J_k(x_k) = \min\{t(x_k),\ \min_{u_k} E_w[g(x_k,u_k,w_k) + J_{k+1}(f(x_k,u_k,w_k))]\}$

20
Stopping set
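  • It is optimal to stop at time $k$ for states $x$ in the set
  • $T_k = \{x : t(x) \le \min_u E[g(x,u,w) + J_{k+1}(f(x,u,w))]\}$
  • Fact: $J_{N-1}(x) \le J_N(x)$, so $J_{k-1}(x) \le J_k(x)$ for all $k$, $x$.
  • Cor.: $T_0 \subseteq \cdots \subseteq T_k \subseteq T_{k+1} \subseteq \cdots \subseteq T_{N-1}$
  • Question: how to guarantee equality?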

21
Absorbance
  • Condition: $T_{N-1}$ is absorbing if, whenever $x \in T_{N-1}$ and termination is not selected, the next state is in $T_{N-1}$.
  • That is, $f(x,u,w) \in T_{N-1}$ for all $x \in T_{N-1}$, $u \in U(x)$, $w$.
  • Intuition: if you reach a state that's optimal to stop at, but you don't stop, then you move to a state that's also optimal to stop at.
  • Theorem: if $T_{N-1}$ is absorbing, then $T_k = T_{N-1}$ for all $k$.
  • OLA (one-step look-ahead) policy: stop iff $x \in T_{N-1}$ (the 1-step stopping set); optimal when $T_{N-1}$ is absorbing.
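
The theorem can be illustrated on a small example that is not from the slides: selling an asset when past offers are retained. Here the state x is the best offer so far, continuing costs c per stage and moves the state to max(x, w), and stopping costs t(x) = -x. Since the state never decreases, the one-step stopping set is absorbing. The sketch below, with hypothetical function names and exact rational arithmetic, computes every stopping set by the DP algorithm so they can be compared.

```python
from fractions import Fraction
from typing import Dict, List, Set


def stopping_sets(
    values: List[int], probs: List[Fraction], c: Fraction, N: int
) -> List[Set[int]]:
    """Return [T_0, ..., T_{N-1}] for the retained-offer example.

    values: possible offers (also the state space); probs: their probabilities.
    Cost form: J_N(x) = -x, J_k(x) = min(-x, c + E[J_{k+1}(max(x, w))]).
    """
    J_next: Dict[int, Fraction] = {x: Fraction(-x) for x in values}
    sets: List[Set[int]] = [set() for _ in range(N)]
    for k in range(N - 1, -1, -1):
        J_k: Dict[int, Fraction] = {}
        for x in values:
            stop = Fraction(-x)
            cont = c + sum(p * J_next[max(x, w)] for w, p in zip(values, probs))
            if stop <= cont:          # stopping is optimal at (k, x)
                sets[k].add(x)
            J_k[x] = min(stop, cont)
        J_next = J_k
    return sets


def is_absorbing(T: Set[int], values: List[int]) -> bool:
    """True if every successor max(x, w) of a state x in T stays in T."""
    return all(max(x, w) in T for x in T for w in values)
```

With offers uniform on 0, ..., 10 and c = 1, the one-step set is {6, ..., 10}, it passes `is_absorbing`, and every earlier stopping set returned by `stopping_sets` coincides with it.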