Dynamic Programming Applications PowerPoint PPT Presentation
Title: Dynamic Programming Applications
1
Dynamic Programming Applications
Lecture 5
2
Preview
Last time:
Structural properties
Today:
Optimal stopping: the OLA rule
(Secretary problem, Asset selling)
Next time:
Infinite horizon
3
The RM problem
$J_t(x,i) = \max\{J_{t-1}(x),\ R_i + J_{t-1}(x-1)\} = (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
Optimal policy: accept class $i$ iff $R_i \ge OC_{t-1}(x) = J_{t-1}(x) - J_{t-1}(x-1)$
Results
1. $J_t(x)$ increasing in $x$ - by induction
2. $OC_t(x)$ decreasing in $x$ - single crossing
3. $OC_t(x)$ increasing in $t$ - by induction and result 2:
$J_t(x) = \sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x)$
$J_t(x-1) = \sum_i p_i (R_i - OC_{t-1}(x-1))^+ + J_{t-1}(x-1)$
$OC_t(x) - OC_{t-1}(x) = \sum_i p_i \left[ (R_i - OC_{t-1}(x))^+ - (R_i - OC_{t-1}(x-1))^+ \right] \ge 0$
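The three results can be checked numerically on a small instance. The following is a minimal sketch, assuming $t$ counts periods remaining, $x$ is remaining capacity, and the boundary conditions $J_0(x) = 0$ and $J_t(0) = 0$; the fares and arrival probabilities are made-up illustration values, not data from the lecture.

```python
from fractions import Fraction

def rm_value_functions(fares, probs, capacity, horizon):
    """Backward recursion J_t(x) = sum_i p_i (R_i - OC_{t-1}(x))^+ + J_{t-1}(x).

    Assumed boundary conditions: J_0(x) = 0 and J_t(0) = 0.
    Returns a table J with J[t][x] for t = 0..horizon, x = 0..capacity.
    """
    J = [[Fraction(0)] * (capacity + 1) for _ in range(horizon + 1)]
    for t in range(1, horizon + 1):
        for x in range(1, capacity + 1):
            oc = J[t - 1][x] - J[t - 1][x - 1]          # OC_{t-1}(x)
            gain = sum(p * max(R - oc, 0) for R, p in zip(fares, probs))
            J[t][x] = J[t - 1][x] + gain
    return J

def opportunity_cost(J, t, x):
    """OC_t(x) = J_t(x) - J_t(x-1), defined for x >= 1."""
    return J[t][x] - J[t][x - 1]

# Hypothetical instance: three fare classes; the probabilities sum to 9/10,
# leaving 1/10 for "no arrival" in a period.
fares = [100, 60, 40]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(2, 5)]
J = rm_value_functions(fares, probs, capacity=5, horizon=10)
```

Exact rational arithmetic is used so that the monotonicity comparisons are not affected by rounding.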
4
The RM problem - results
The optimal policy is characterized by threshold levels $b_{it}$ as follows:
Accept class $i$ at time $t$ iff $0 \le x < b_{it}$,
where $b_{it} = \min\{x : OC_{t-1}(x) > R_i\}$
Moreover, $b_{1t} \ge \dots \ge b_{mt}$, where $R_1 \ge \dots \ge R_m$
5
Optimal Stopping
At each stage a control is available that stops
the evolution of the system.
At stage $k$ there are 2 options:
Stop the process (get a certain reward)
Continue the process, perhaps at a certain cost, and select one of the next available choices.
If there is only one other choice besides stopping, the policy is characterized by the set of stopping states.
6
Secretary Problems
Cayley 1875
Interview N candidates for a job
Must accept/reject at end of interview
Objectives
Maximize expected score
Maximize P(get the best)
(you risk hiring nobody!)
7
Archetype problem
Make irrevocable choice from a fixed
number of opportunities whose values
are revealed sequentially.
Asset selling
Purchasing with a deadline
Exercising stock options (in your next HW)
8
Max P(get best)
$W_t$ = history of relative ranks of candidates seen by time $t$ (inclusive)
$x_t = 1$ if the $t$th candidate is the best seen so far, $0$ otherwise
Relevant: $t$ and $x_t$
Fact: the event $x_t = 1$ and $W_{t-1}$ are statistically independent
9
Objective
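$J_t$ = P(under the optimal policy we select the best candidate, given that we've rejected $t-1$ so far)
$J_t(0)$ = P(under the optimal policy we select the best candidate, given that we've seen $t$ so far and the last one was NOT the best so far)
$J_t(1)$ = P(under the optimal policy we select the best candidate, given that we've seen $t$ so far and the last one WAS the best so far)
P(best of $N$ | best of first $t$) = ?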
10
DP equation
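$J_{N+1} = 0$
$J_t = \frac{t-1}{t} J_t(0) + \frac{1}{t} J_t(1)$
$J_t(0) = J_{t+1}$ (must continue)
$J_t(1) = \max(t/N,\ J_{t+1})$ (accept or continue)
Fact 1: $J_{t-1} \ge J_t$
Fact 2: $J_t$ is decreasing in $t$ and $t/N$ is increasing in $t$ $\Rightarrow$ single crossing
Define $t_0 = \min\{t : J_{t+1} \le t/N\}$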
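The recursion on this slide is short enough to run directly. Below is a minimal sketch that evaluates it backward from $J_{N+1} = 0$ in exact arithmetic and reads off the threshold stage $t_0 = \min\{t : J_{t+1} \le t/N\}$; the function name `secretary_dp` is an illustrative choice, not something from the lecture.

```python
from fractions import Fraction

def secretary_dp(N):
    """Return (J, t0): J[t] = J_t for t = 1..N+1, t0 = min{t : J_{t+1} <= t/N}."""
    J = [Fraction(0)] * (N + 2)                  # J[N+1] = 0; index 0 is unused
    for t in range(N, 0, -1):
        J0 = J[t + 1]                            # J_t(0): must continue
        J1 = max(Fraction(t, N), J[t + 1])       # J_t(1): accept or continue
        J[t] = Fraction(t - 1, t) * J0 + Fraction(1, t) * J1
    t0 = min(t for t in range(1, N + 1) if J[t + 1] <= Fraction(t, N))
    return J, t0

# Small case that can be checked by hand: N = 4 gives t0 = 2 and J_1 = 11/24.
J, t0 = secretary_dp(4)
```

For $N = 4$ the policy is therefore to reject the first candidate and then accept the first candidate who is the best seen so far.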
11
Recursion
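$J_t = J_{t_0}$, if $t < t_0$
$J_t = \frac{t-1}{t} J_{t+1} + \frac{1}{N}$, if $t \ge t_0$
Dividing by $t-1$: $\frac{J_t}{t-1} = \frac{J_{t+1}}{t} + \frac{1}{N(t-1)}$
Therefore $J_{t+1} = \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s}$ (after telescoping)
By definition, $t_0$ is the smallest $t$ s.t. $J_{t+1} \le t/N$, so
$t_0 = \min\{t : \sum_{s=t}^{N-1} \frac{1}{s} \le 1\}$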
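The telescoping step can be written out as follows. This is a worked sketch of the algebra only, writing $t_0$ for the threshold stage and using the boundary condition $J_{N+1} = 0$; it applies for stages $s \ge \max(t_0, 2)$, where the recursion with the $1/N$ term holds and division by $s-1$ is allowed.

```latex
\begin{align*}
\frac{J_s}{s-1} - \frac{J_{s+1}}{s} &= \frac{1}{N(s-1)},
  && s \ge \max(t_0, 2) \\
\sum_{s=t+1}^{N} \left( \frac{J_s}{s-1} - \frac{J_{s+1}}{s} \right)
  &= \frac{J_{t+1}}{t} - \frac{J_{N+1}}{N} = \frac{J_{t+1}}{t}
  && \text{(left side telescopes)} \\
\frac{J_{t+1}}{t} &= \frac{1}{N} \sum_{s=t+1}^{N} \frac{1}{s-1}
  = \frac{1}{N} \sum_{s=t}^{N-1} \frac{1}{s} \\
J_{t+1} &= \frac{t}{N} \sum_{s=t}^{N-1} \frac{1}{s}
\end{align*}
```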
12
Policy
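For large $N$: $\sum_{s=t_0}^{N-1} \frac{1}{s} \approx \log_e(N/t_0)$
Therefore $t_0 \approx N/e$
Policy: interview $\approx N/e$ candidates and reject them, then select the first candidate who is the best seen so far.
$P(\text{success}) = J_{t_0} \approx t_0/N \approx 1/e \approx 0.3679$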
Empirical validation?
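One way to probe the $1/e$ approximation is a quick Monte Carlo experiment. The sketch below assumes candidates arrive in uniformly random order and that stage $t_0$ is the first stage at which a candidate may be accepted; the function names, trial count, and seed are illustrative choices.

```python
import math
import random

def threshold(N):
    """Smallest t with sum_{s=t}^{N-1} 1/s <= 1."""
    tail = sum(1.0 / s for s in range(1, N))     # the sum for t = 1
    for t in range(1, N + 1):
        if tail <= 1.0:
            return t
        tail -= 1.0 / t                          # drop the s = t term
    return N

def simulate(N, t0, trials, seed=0):
    """Estimate P(select the best) when the first t0 - 1 candidates are rejected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ranks = list(range(N))                   # N - 1 is the best candidate
        rng.shuffle(ranks)
        best_seen = max(ranks[:t0 - 1], default=-1)
        for r in ranks[t0 - 1:]:
            if r > best_seen:                    # first best-so-far from stage t0 on
                wins += (r == N - 1)
                break
    return wins / trials

N = 100
t0 = threshold(N)
estimate = simulate(N, t0, trials=20000)
print(t0, N / math.e, estimate, 1 / math.e)
```

If no candidate from stage $t_0$ onward beats the rejected prefix, the run counts as a failure, which matches the "you risk hiring nobody" caveat.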
13
The Last Shall be First
...The last person interviewed for a job gets it 55.8% of the time, according to Runzheimer Canada, Inc. Early applicants are hired only 17.6% of the time; the management consulting firm suggests that job-seekers who find they are among the first to be grilled tactfully ask to be rescheduled for a later date. Mondays are also poor days to be interviewed, and any day just before quitting time is also bad.
(The Globe and Mail, Sept. 12, 1990, pg. A22)
14
Asset selling
Like maximizing interview score, but with discounting/investment
Offers $w_0, w_1, \ldots, w_{N-1}$ are i.i.d. with a fixed, known distribution (if not known: inference, learning)
Stage $k$ choices:
Accept, and invest $w_k$ at rate $r$
Reject, and wait until stage $k+1$
Objective: maximize revenue at the end of period $N$
15
Formulation
State:
$x_k \ne T$: asset has not been sold, current offer is $x_k$
$x_k = T$: asset has been sold
Decision:
$u_k = u^1$: sell; $u_k = u^2$: don't sell
Plant equation:
$x_{k+1} = T$, if $x_k = T$, or if $x_k \ne T$ and $u_k = u^1$ (sell)
$x_{k+1} = w_k$, otherwise
16
Costs
$g_N(x_N) = x_N$, if $x_N \ne T$; $0$, else
$g_k(x_k) = (1+r)^{N-k} x_k$, if $x_k \ne T$ and $u_k = u^1$; $0$, else
$J_N(x_N) = x_N$, if $x_N \ne T$; $0$, else
$J_k(x_k) = \max\left( (1+r)^{N-k} x_k,\ E_w[J_{k+1}(w_k)] \right)$, if $x_k \ne T$; $0$, else
17
Policy
Accept offer $x_k$ if $x_k > a_k$
Reject offer $x_k$ if $x_k < a_k$
Indifferent if $x_k = a_k$
The optimal policy is determined by the sequence $a_k$:
$a_k = E_w[J_{k+1}(w_k)] / (1+r)^{N-k}$
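The thresholds can be computed by running the value recursion backward from $J_N(x) = x$. The sketch below assumes offers take finitely many values with known probabilities; the offer values, probabilities, rate, and horizon are made-up illustration numbers.

```python
def asset_selling_thresholds(offers, probs, r, N):
    """Return [a_0, ..., a_{N-1}] with a_k = E_w[J_{k+1}(w)] / (1 + r)**(N - k).

    Assumes a discrete offer distribution: offers[j] occurs with probability probs[j].
    """
    J_next = list(offers)                        # J_N(x) = x on the offer grid
    thresholds = [0.0] * N
    for k in range(N - 1, -1, -1):
        cont = sum(p * j for p, j in zip(probs, J_next))    # E_w[J_{k+1}(w_k)]
        growth = (1 + r) ** (N - k)
        thresholds[k] = cont / growth
        J_next = [max(growth * x, cont) for x in offers]    # J_k(x) for unsold asset
    return thresholds

# Hypothetical instance: three possible offers, 5% interest, 5 stages.
a = asset_selling_thresholds([80, 100, 120], [0.25, 0.5, 0.25], r=0.05, N=5)
for k, a_k in enumerate(a):
    print(k, round(a_k, 3))
```

In this instance the last threshold is the mean offer discounted by one period, and the earlier thresholds are at least as large.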
18
Structural properties
Fact: $a_k \ge a_{k+1}$ for all $k$
Intuition:
if an offer is good enough to be acceptable at time $k$, it should be so at time $k+1$.
19
General stopping: OLA (one-step look-ahead)
Stopping is mandatory at or before stage $N$
Stationary: the state, control, and disturbance spaces, and the cost per stage, are constant over time
Extra action: go to the termination state at cost $t(x_k)$
DP algorithm:
$J_N(x_N) = t(x_N)$
$J_k(x_k) = \min\left\{ t(x_k),\ \min_{u_k \in U(x_k)} E_w\left[ g(x_k,u_k,w_k) + J_{k+1}(f(x_k,u_k,w_k)) \right] \right\}$
20
Stopping set
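It is optimal to stop at time $k$ for states $x$ in the set
$T_k = \{x : t(x) \le \min_u E[g(x,u,w) + J_{k+1}(f(x,u,w))]\}$
Fact: $J_{N-1}(x) \le J_N(x)$, so $J_{k-1}(x) \le J_k(x)$ for all $k$, $x$.
Cor.: $T_0 \subseteq \dots \subseteq T_k \subseteq T_{k+1} \subseteq \dots \subseteq T_{N-1}$
Question: how to guarantee equality?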
21
Absorbance
Condition: $T_{N-1}$ is absorbing: if $x \in T_{N-1}$ and termination is not selected, then the next state is in $T_{N-1}$.
That is, $f(x,u,w) \in T_{N-1}$ for all $x \in T_{N-1}$, $u \in U(x)$, $w$.
Intuition: if you reach a state that's optimal to stop at, but you don't stop, then you move to a state that's also optimal to stop at.
Theorem: If $T_{N-1}$ is absorbing then $T_k = T_{N-1}$ for all $k$.
The OLA policy is optimal iff $T_{N-1}$ (the 1-step stopping set) is absorbing.
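The theorem can be illustrated on a toy problem. The sketch below assumes a single "continue" control, a constant cost per stage, and a made-up example in which the state either stays put or moves up by one with equal probability and the termination cost is $t(x) = (M - x)^2$; none of these specifics come from the lecture.

```python
from fractions import Fraction

def stopping_sets(states, t_cost, step_cost, next_states, N):
    """Backward DP for a stopping problem with a single 'continue' control.

    next_states(x) returns a list of (probability, next_state) pairs.
    Returns [T_0, ..., T_{N-1}], T_k = {x : t(x) <= g + E[J_{k+1}(f(x, w))]}.
    """
    J = {x: t_cost(x) for x in states}           # J_N = t
    sets = [None] * N
    for k in range(N - 1, -1, -1):
        cont = {x: step_cost + sum(p * J[y] for p, y in next_states(x))
                for x in states}
        sets[k] = {x for x in states if t_cost(x) <= cont[x]}
        J = {x: min(t_cost(x), cont[x]) for x in states}
    return sets

def is_absorbing(S, next_states):
    """True if every successor of a state in S is again in S."""
    return all(y in S for x in S for _, y in next_states(x))

# Hypothetical example: states 0..M, termination cost (M - x)^2, cost 2 per stage.
M = 10
half = Fraction(1, 2)
states = range(M + 1)
t_cost = lambda x: Fraction((M - x) ** 2)
next_states = lambda x: [(half, x), (half, min(x + 1, M))]

sets = stopping_sets(states, t_cost, Fraction(2), next_states, N=6)
one_step_set = sets[-1]                          # T_{N-1}
print(sorted(one_step_set), is_absorbing(one_step_set, next_states))
print(all(T == one_step_set for T in sets))
```

Here the one-step stopping set is upward closed and the state never moves down, so it is absorbing and every $T_k$ coincides with it.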