Linearly-solvable Markov decision problems, Emanuel Todorov (UCSD)


Description:

Linearly-solvable Markov decision problems, Emanuel Todorov (UCSD). Figures are borrowed from the paper in NIPS 2006. (IBM TRL)


Transcript and Presenter's Notes



1
Linearly-solvable Markov decision problems
Emanuel Todorov (UCSD)
Figures are borrowed from the paper in NIPS2006
  • ?????? ???? (IBM TRL)

2
??????? ??????????????
  • ??????????????(MDP)????(?linearMDP????????MDP?????
    ??MDP?)?????
  • ???????????????????MDP????
  • ????????????????????
  • ?????????????MDP???MDP?????
  • ?????????MDP????????????
  • ?????????????MDP???MDP????????????

3
??????
  • ???Emanuel Todorov ?UCSD ??
  • ??(?????)???????????????
  • NIPS???????
  • ????????NIPS??????????
  • ????4?(!)????????????????????????

4
????????(MDP)?????
5
????????(MDP)???????????????????????????????
  • ??? ??????????????????????????????????????????????
    ?
  • ?? i ???? j ??????????? i ??????????? u ?????
  • ?????? pij(u) (??????????????u??????)
  • ?? i ?????? u ???????? l(i, u) ?????
  • ????????????????
  • ? ????????????????????u??????
  • ??? ???????? v(i) ???????????
  • ???? v(i) ?? i ????????????????????
  • ???value iteration (? policy iteration)
    ??????(???????)
  • ????????????????????????????
  • ??????????????????????????

[Figure: diagram with edge labels 0.3, 0.7, 0.7, 0.3]
red ? blue ?2????????????????????????
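
The slide above names the ingredients of a standard MDP (transition probabilities pij(u), immediate cost l(i, u), value function v(i)) and value iteration as the solution method. A minimal sketch of value iteration under those names follows. The three-state example, its costs, and the choice of an undiscounted first-exit setting are assumptions made for illustration; only the 0.3 / 0.7 probabilities echo the figure labels.

```python
import numpy as np

def value_iteration(P, L, tol=1e-10, max_iter=10_000):
    """Undiscounted, cost-minimizing value iteration.

    P[u, i, j] is p_ij(u); L[i, u] is l(i, u).
    Assumes a zero-cost absorbing state is reached with probability 1,
    so that the values stay finite without a discount factor.
    """
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[i, u] = l(i, u) + sum_j p_ij(u) * v(j)
        Q = L + np.einsum("uij,j->iu", P, v)
        v_new = Q.min(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, Q.argmin(axis=1)

# Hypothetical example: states 0 and 1 cost 1 per step, state 2 is the goal.
# Action 0 advances with probability 0.7, action 1 with probability 0.3.
P = np.array([
    [[0.3, 0.7, 0.0], [0.0, 0.3, 0.7], [0.0, 0.0, 1.0]],  # action 0
    [[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],  # action 1
])
L = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
v, policy = value_iteration(P, L)
```

Each sweep takes a minimum over actions at every state, which is the step the linearly-solvable formulation is designed to avoid.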
6
?????MDP
7
MDP?????????????????????????????????MDP????????
?
  • ?????????????????????????MDP????
  • ?? i ?????????? j ?????????? uj ???????????
  • ????? ?uj ?????????????????
  • ????????????????? v(i) ????
  • ??????(???1?)????????????????????
  • ???? ???? ??????????
  • ???????????(?)? ???? v(i) ??????????????

?????????????
????????
?? j ???????? (?i ??????)
??????? (?????)
given
8
????MDP????????????????????? ??????????????????
????
  • ??????? i ???? ?? ? ?KL-divergence
    ?????
  • ??) KL-divergence ???
  • ????

?????????????????? ? KL-divergence
?? i ??? u ????
?? i ????
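
The slide above ties the immediate cost to a KL-divergence. Assuming the formulation of the NIPS 2006 paper the figures are borrowed from, the cost at state i is a state cost plus the KL-divergence between the controlled transition distribution and the uncontrolled (passive) one. A small sketch of that cost follows; the names `q_i`, `p_controlled` and `p_passive` are illustrative and do not come from the slides.

```python
import numpy as np

def kl_divergence(p, p_bar):
    """KL(p || p_bar) for two discrete distributions over next states."""
    p = np.asarray(p, dtype=float)
    p_bar = np.asarray(p_bar, dtype=float)
    support = p > 0
    # The controller may not put mass where the passive dynamics has none.
    if np.any(p_bar[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log(p[support] / p_bar[support])))

def control_cost(q_i, p_controlled, p_passive):
    """Assumed cost form: state cost plus KL control cost."""
    return q_i + kl_divergence(p_controlled, p_passive)

# Leaving the passive dynamics unchanged incurs only the state cost.
cost_passive = control_cost(1.0, [0.5, 0.5], [0.5, 0.5])
# Reshaping the distribution toward the first next state costs extra.
cost_steered = control_cost(1.0, [0.7, 0.3], [0.5, 0.5])
```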
9
??)?????
  • ?????MDP??
  • ???????????????
  • ??????????????????? ? ?????????
  • ?????????????
  • ?????MDP???????(min????)??????????

????????
?? i ????
?
?????????1
???????
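
With a KL control cost, the minimization in the Bellman equation can be done analytically, and the equation becomes linear in z(i) = exp(-v(i)). The sketch below assumes the first-exit formulation from the NIPS 2006 paper: z(i) = exp(-q(i)) Σ_j p̄ij z(j) at non-terminal states and z(i) = exp(-q(i)) at terminal states. The function name and argument names are hypothetical.

```python
import numpy as np

def solve_first_exit_lmdp(P_bar, q, terminal):
    """Solve z = exp(-q) * (P_bar @ z) on non-terminal states by one linear solve.

    P_bar[i, j]: passive transition probabilities.
    q[i]: state cost. terminal[i]: True for absorbing goal states.
    Returns the value function v and the optimal controlled transition matrix.
    """
    P_bar = np.asarray(P_bar, dtype=float)
    q = np.asarray(q, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    interior = ~terminal

    z = np.empty(len(q))
    z[terminal] = np.exp(-q[terminal])

    G = np.diag(np.exp(-q[interior]))
    A = np.eye(int(interior.sum())) - G @ P_bar[np.ix_(interior, interior)]
    b = G @ P_bar[np.ix_(interior, terminal)] @ z[terminal]
    z[interior] = np.linalg.solve(A, b)

    v = -np.log(z)
    # Optimal controlled dynamics: passive dynamics reweighted by z, renormalized
    # so that each row sums to 1.
    P_opt = P_bar * z[None, :]
    P_opt /= P_opt.sum(axis=1, keepdims=True)
    return v, P_opt

# Hypothetical three-state chain with state 2 as the goal.
P_bar = np.array([[0.3, 0.7, 0.0], [0.0, 0.3, 0.7], [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0])
terminal = np.array([False, False, True])
v, P_opt = solve_first_exit_lmdp(P_bar, q, terminal)
```

No loop over actions and no min operator appears; the only numerical work is a single call to `np.linalg.solve`.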
10
????????
  • ???????????????????????????????i ??j
    ??????????????????MDP??????
  • ????????????????????????????min???????????????????
    ????
  • ?????????KL-divergence??????????
  • KL-divergence ??????????????????????????????
  • ?????? ?????????????????????
  • ????KL-divergence?????????????????????????????????
    ??
  • ? ?????????????????????
  • ????????? ??????????
  • ??????????MDP???MDP??????

11
????????MDP???
12
????????MDP??????
  • ?????? ?????????????????????
  • Dijkstra??O(? log???)
  • ????????MDP????
  • ????????????????????
  • ????? ?????????????
  • ????
  • ??????
  • ????????????(??????????????????) KL
  • ???????????? ?(??????????????????) ??????
  • ? ????????????????????????

i ? j ??????1
(???????????)
???????????????????????
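
The slide above relates the framework to shortest paths and mentions Dijkstra. One way to illustrate the connection, assuming the construction in the NIPS 2006 paper: let the passive dynamics be a uniform random walk on the graph, charge a state cost ρ at every non-goal node, and observe that v(i)/ρ approaches the shortest path length as ρ grows. The graph, the value of `rho`, and the fixed-point iteration used here are illustrative choices; the graph is assumed to be connected.

```python
import numpy as np

def lmdp_shortest_paths(neighbors, goal, rho=30.0):
    """Approximate hop counts to `goal` via a linearly-solvable MDP.

    neighbors[i] lists the nodes adjacent to node i.
    Returns v / rho, which is close to the shortest path length for large rho.
    """
    n = len(neighbors)
    P_bar = np.zeros((n, n))
    for i, nbrs in enumerate(neighbors):
        if i == goal:
            P_bar[i, i] = 1.0                  # goal is absorbing
        else:
            P_bar[i, nbrs] = 1.0 / len(nbrs)   # uniform random walk

    g = np.full(n, np.exp(-rho))  # exp(-q) with q = rho off the goal
    g[goal] = 1.0                 # q = 0 at the goal

    z = np.zeros(n)
    z[goal] = 1.0
    # Fixed-point iteration z <- exp(-q) * (P_bar @ z); all terms are positive,
    # so very small entries of z keep their relative accuracy.
    for _ in range(2 * n):
        z = g * (P_bar @ z)
    return -np.log(z) / rho

# Hypothetical graph: edges 0-1, 0-3, 1-2, 2-3, 2-4, with node 4 as the goal.
neighbors = [[1, 3], [0, 2], [1, 3, 4], [0, 2], [2]]
estimate = lmdp_shortest_paths(neighbors, goal=4)
hops = np.rint(estimate).astype(int)
```

The estimate exceeds the true hop count by a term of order log(degree)/ρ per step, so rounding recovers the exact lengths on small graphs.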
13
???????????
  • ??????
  • ??????

(??????)
14
????????MDP??MDP?????
15
???????????MDP?????MDP??????????????????????????
?????????????
  • ???(????????)MDP ??MDP???????????????????????????
    ????????????????????????
  • ???????????????
  • ??????????????
  • ??????? ? ? ????

??MDP??????
?MDP??????
????????
? (?????a?)
???MDP?????(??????? a ?????)
??MDP????
?MDP????
?? i ????
?????????????????? ? KL-divergence
16
?????MDP???????MDP????????????????????????????????
  • ??????? ? ?
    ??????????????MDP??????
  • ?????MDP???????? a ????????????????
    ???????????????????MDP??????
  • ?? ?????MDP???????????????
    ?????????? a ? ???????
  • ??? ??????
  • ??? ? ??????MDP??? ???
  • ???????? ???????? a
    ???????
  • ?????????????
  • ? ???????????????????????????

17
????????
  • ?????????MDP??????
  • ????????????????????????????????
  • ??????????????????
  • ????KL-divergence??????????????KL?????????????????
    ??KL????????????????
  • ??????????MDP???MDP?????????
  • ???????MDP??????MDP????
  • ?MDP????????????????MDP????????????????????
  • ???MDP??????????????????????????

18
???
  • ???????MDP?????????????(???? Q-learning)?????????
    ??????????????????stochastic approximation????????
    ??
  • ???Q-learning ????????
  • ?????max(or min)??????????????????????????????????
    ?????????????????
  • parsing ????structure output ????
  • ???????????
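
The closing slide mentions Q-learning and stochastic approximation. In the NIPS 2006 paper, the sample-based counterpart for this problem class is Z-learning, which estimates z(i) = exp(-v(i)) from transitions sampled under the passive dynamics. The sketch below illustrates that update under assumed settings (a 1/visits learning rate, episodes restarted from a fixed state); it is not recovered from the slide text.

```python
import numpy as np

def z_learning(P_bar, q, terminal, start, episodes=5_000, seed=0):
    """Estimate z = exp(-v) from passive-dynamics samples.

    Update: z(i) <- (1 - alpha) * z(i) + alpha * exp(-q(i)) * z(j),
    where j is the sampled next state. Unlike Q-learning, the target
    contains no max or min over actions.
    """
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    z[terminal] = np.exp(-q[terminal])
    visits = np.zeros(n)
    for _ in range(episodes):
        i = start
        while not terminal[i]:
            j = rng.choice(n, p=P_bar[i])   # sample from passive dynamics
            visits[i] += 1
            alpha = 1.0 / visits[i]         # assumed learning-rate schedule
            z[i] = (1 - alpha) * z[i] + alpha * np.exp(-q[i]) * z[j]
            i = j
    return z

# Hypothetical three-state chain with state 2 as the goal.
P_bar = np.array([[0.3, 0.7, 0.0], [0.0, 0.3, 0.7], [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0])
terminal = np.array([False, False, True])
z_hat = z_learning(P_bar, q, terminal, start=0)
```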