Decision Making Under Uncertainty

1
Decision Making Under Uncertainty
  • Russell and Norvig: ch. 16, 17
  • CMSC 671 Fall 2005

material from Lise Getoor, Jean-Claude Latombe,
and Daphne Koller
2
Decision Making Under Uncertainty
  • Many environments have multiple possible outcomes
  • Some of these outcomes may be good; others may be
    bad
  • Some may be very likely; others unlikely
  • What's a poor agent to do??

3
Non-Deterministic vs. Probabilistic Uncertainty
  • Non-deterministic model: an action may lead to any outcome in a set
    {a, b, c}, with no probabilities attached; pick the decision that is
    best for the worst case (cf. adversarial search)
  • Probabilistic model: each outcome has a probability; pick the decision
    with the best expected outcome
4
Expected Utility
  • Random variable X with n values x1, ..., xn and
    distribution (p1, ..., pn)
    E.g., X is the state reached after doing
    action A under uncertainty
  • Function U of X
    E.g., U is the utility of a state
  • The expected utility of A is
    EU[A] = Σi=1,...,n P(xi | A) U(xi)
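A minimal sketch of this formula in Python (not part of the original slides); the helper name is an assumption and the numbers reuse the one-state/one-action example that follows.

```python
# Minimal sketch of EU[A] = sum_i P(x_i | A) U(x_i); the helper below is an
# assumption, and the numbers reuse the one-state/one-action example slide.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for the states an action can reach."""
    return sum(p * u for p, u in outcomes)

# An action that reaches states with utilities 100, 50, 70 with probs 0.2, 0.7, 0.1:
print(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]))   # 62.0
```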

5
One State/One Action Example
U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1
      = 20 + 35 + 7 = 62
6
One State/Two Actions Example
  • U1(S0) = 62
  • U2(S0) = 74
  • U(S0) = max{U1(S0), U2(S0)}
          = 74
7
Introducing Action Costs
  • U1(S0) = 62 - 5 = 57
  • U2(S0) = 74 - 25 = 49
  • U(S0) = max{U1(S0), U2(S0)}
          = 57
8
MEU Principle
  • A rational agent should choose the action that
    maximizes the agent's expected utility
  • This is the basis of the field of decision theory
  • The MEU principle provides a normative criterion
    for rational choice of action

AI is Solved!!!
9
Not quite
  • Must have a complete model of
  • Actions
  • Utilities
  • States
  • Even if you have a complete model, decision making
    may be computationally intractable
  • In fact, a truly rational agent takes into
    account the utility of reasoning as
    well---bounded rationality
  • Nevertheless, great progress has been made in
    this area recently, and we are able to solve much
    more complex decision-theoretic problems than
    ever before

10
We'll look at
  • Decision-Theoretic Planning
  • Simple decision making (ch. 16)
  • Sequential decision making (ch. 17)

11
Axioms of Utility Theory
  • Orderability
  • (A ≻ B) ∨ (A ≺ B) ∨ (A ~ B)
  • Transitivity
  • (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
  • Continuity
  • A ≻ B ≻ C ⇒ ∃p [p, A; 1-p, C] ~ B
  • Substitutability
  • A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
  • Monotonicity
  • A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≽ [q, A; 1-q, B])
  • Decomposability
  • [p, A; 1-p, [q, B; 1-q, C]] ~ [p, A; (1-p)q, B;
    (1-p)(1-q), C]

12
Money Versus Utility
  • Money ≠ Utility
  • More money is better, but not always in a linear
    relationship to the amount of money
  • Expected Monetary Value (EMV)
  • (L is a lottery; S_EMV(L) is receiving L's expected
    monetary value for sure)
  • Risk-averse: U(L) < U(S_EMV(L))
  • Risk-seeking: U(L) > U(S_EMV(L))
  • Risk-neutral: U(L) = U(S_EMV(L))
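An illustrative sketch of risk aversion (not from the slides): with a concave utility of money such as log, the utility of a lottery is below the utility of its EMV for sure. The log utility and the dollar amounts are assumptions chosen for illustration.

```python
import math

# Illustrative sketch of risk aversion: with a concave utility of money,
# U(L) < U(S_EMV(L)). The log utility and the lottery amounts are assumptions.

def u(money):
    return math.log(1 + money)        # concave utility => risk-averse

lottery = [(0.5, 0), (0.5, 1_000_000)]             # 50/50 chance of $0 or $1,000,000
emv = sum(p * m for p, m in lottery)               # expected monetary value: $500,000
u_lottery = sum(p * u(m) for p, m in lottery)      # U(L), about 6.9
u_sure = u(emv)                                    # U(S_EMV(L)), about 13.1
print(u_lottery < u_sure)                          # True: the sure amount is preferred
```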

13
Value Function
  • Provides a ranking of alternatives, but not a
    meaningful metric scale
  • Also known as an ordinal utility function
  • Remember the expectiminimax example
  • Sometimes, only relative judgments (value
    functions) are necessary
  • At other times, absolute judgments (utility
    functions) are required

14
Multiattribute Utility Theory
  • A given state may have multiple utilities
  • ...because of multiple evaluation criteria
  • ...because of multiple agents (interested
    parties) with different utility functions
  • We will talk about this more later in the
    semester, when we discuss multi-agent systems and
    game theory

15
Decision Networks
  • Extend BNs to handle actions and utilities
  • Also called influence diagrams
  • Use BN inference methods to solve
  • Perform Value of Information calculations

16
Decision Networks cont.
  • Chance nodes: random variables, as in BNs
  • Decision nodes: actions that the decision maker can
    take
  • Utility/value nodes: the utility of the outcome
    state.

17
R&N example
18
Umbrella Network
(Decision network: decision node "umbrella" (take / don't take); chance nodes
"weather", "forecast", and "have umbrella"; utility node "happiness")
  • P(rain) = 0.4
  • P(have | take) = 1.0,  P(¬have | ¬take) = 1.0
  • Forecast model P(f | w):
    P(sunny | rain) = 0.3      P(rainy | rain) = 0.7
    P(sunny | no rain) = 0.8   P(rainy | no rain) = 0.2
  • Utilities:
    U(have, rain) = -25      U(have, ¬rain) = 0
    U(¬have, rain) = -100    U(¬have, ¬rain) = 100
19
Evaluating Decision Networks
  • Set the evidence variables for the current state
  • For each possible value of the decision node
  • Set the decision node to that value
  • Calculate the posterior probability of the parent
    nodes of the utility node, using BN inference
  • Calculate the resulting expected utility for the action
  • Return the action with the highest expected utility
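A sketch of this procedure on the umbrella network shown earlier, with no forecast evidence observed (so the posterior over weather is just the prior). The Python encoding, and the placement of the negations in the utility table, are reconstructions rather than course code.

```python
# Sketch: evaluate the umbrella decision network with no forecast evidence.
# The probabilities and utilities come from the umbrella-network slide; the
# encoding (and the negation placement in the utility table) is a reconstruction.

P_RAIN = 0.4
UTILITY = {(True, True): -25, (True, False): 0,        # keyed by (have umbrella, rain)
           (False, True): -100, (False, False): 100}

def expected_utility(take, p_rain):
    have = take          # P(have | take) = 1 and P(not have | not take) = 1
    return p_rain * UTILITY[(have, True)] + (1 - p_rain) * UTILITY[(have, False)]

for take in (True, False):
    print("take" if take else "don't take", expected_utility(take, P_RAIN))
# take -10.0, don't take 20.0  =>  the MEU action is: don't take the umbrella
```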

20
Decision Making: Umbrella Network
Should I take my umbrella??
(Same umbrella network as above: decision node "umbrella", chance nodes
"weather", "forecast", and "have umbrella", utility node "happiness", with
the CPTs and utilities shown on the earlier network slide.)
21
Value of Information (VOI)
  • Suppose an agent's current knowledge is E.  The
    value of the current best action α is
    EU(α | E) = maxA Σi U(Resulti(A)) P(Resulti(A) | Do(A), E)
  • The value of perfect information about a variable Ej is then
    VPI_E(Ej) = ( Σe P(Ej = e | E) EU(αe | E, Ej = e) ) - EU(α | E)

22
Value of Information: Umbrella Network
What is the value of knowing the weather forecast?
(Same umbrella network as above, with the CPTs and utilities shown on the
earlier network slide.)
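A sketch of the answer using the CPTs from the network slide. The code, and the VOI definition used here (expected utility when acting after the observation, minus expected utility when acting now), are reconstructions.

```python
# Sketch: value of observing the forecast in the umbrella network.
# Numbers from the slide; the code itself is an illustrative reconstruction.

P_RAIN = 0.4
P_F_GIVEN_W = {('sunny', True): 0.3, ('rainy', True): 0.7,     # P(forecast | rain)
               ('sunny', False): 0.8, ('rainy', False): 0.2}   # P(forecast | no rain)
UTILITY = {(True, True): -25, (True, False): 0,
           (False, True): -100, (False, False): 100}

def eu(take, p_rain):
    return p_rain * UTILITY[(take, True)] + (1 - p_rain) * UTILITY[(take, False)]

def best_eu(p_rain):
    return max(eu(take, p_rain) for take in (True, False))

eu_without = best_eu(P_RAIN)                                   # act now: 20
eu_with = 0.0
for f in ('sunny', 'rainy'):
    p_f = P_F_GIVEN_W[(f, True)] * P_RAIN + P_F_GIVEN_W[(f, False)] * (1 - P_RAIN)
    p_rain_given_f = P_F_GIVEN_W[(f, True)] * P_RAIN / p_f     # Bayes' rule
    eu_with += p_f * best_eu(p_rain_given_f)                   # 0.6*60 + 0.4*(-17.5) = 29
print(eu_with - eu_without)                                    # VOI(forecast) = 9.0
```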
23
Sequential Decision Making
  • Finite Horizon
  • Infinite Horizon

24
Simple Robot Navigation Problem
  • In each state, the possible actions are U, D, R,
    and L

25
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)

26
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)
  • With probability 0.1 the robot moves right one
    square (if the robot is already in the
    rightmost column, then it does not move)

27
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)
  • With probability 0.1 the robot moves right one
    square (if the robot is already in the
    rightmost column, then it does not move)
  • With probability 0.1 the robot moves left one
    square (if the robot is already in the
    leftmost column, then it does not move)
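A minimal runnable sketch of this transition model on the 4x3 grid of the figures. The coordinate encoding, the helper names, and the wall at square (2,2) are assumptions (the wall follows the standard Russell & Norvig grid), not something stated in the slide text.

```python
# Sketch of the stochastic transition model on the 4x3 grid world.
# Assumptions: squares are (column, row) with column 1..4 and row 1..3,
# and square (2,2) is a wall, as in the standard Russell & Norvig grid.

WALLS = {(2, 2)}
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
# Each action succeeds with probability 0.8 and slips sideways with 0.1 each.
SLIPS = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}

def step(state, direction):
    """Deterministic move; bumping into a wall or the grid edge leaves the robot in place."""
    nxt = (state[0] + MOVES[direction][0], state[1] + MOVES[direction][1])
    if nxt in WALLS or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
        return state
    return nxt

def transition(state, action):
    """Return {successor: probability} for taking `action` in `state`."""
    dist = {}
    for direction, p in [(action, 0.8), (SLIPS[action][0], 0.1), (SLIPS[action][1], 0.1)]:
        nxt = step(state, direction)
        dist[nxt] = dist.get(nxt, 0.0) + p
    return dist

# From [3,2], action U: up to (3,3) with 0.8, right to (4,2) with 0.1,
# left into the wall (stay at (3,2)) with 0.1 - cf. the probabilities used later.
print(transition((3, 2), 'U'))
```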

28
Markov Property
The transition probabilities depend only on the
current state, not on the previous history (how that
state was reached)
29
Sequence of Actions
(Figure: 4x3 grid world; the robot starts in square [3,2].)
  • Planned sequence of actions (U, R)

30
Sequence of Actions
  • Planned sequence of actions (U, R)
  • U is executed

31
Histories
  • Planned sequence of actions (U, R)
  • U has been executed
  • R is executed
  • There are 9 possible sequences of states
    called histories and 6 possible final states
    for the robot!

32
Probability of Reaching the Goal
Note the importance of the Markov property in this
derivation
  • P([4,3] | (U,R).[3,2]) =
    P([4,3] | R.[3,3]) x P([3,3] | U.[3,2])
    + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2])
  • P([3,3] | U.[3,2]) = 0.8
  • P([4,2] | U.[3,2]) = 0.1
  • P([4,3] | R.[3,3]) = 0.8
  • P([4,3] | R.[4,2]) = 0.1
  • P([4,3] | (U,R).[3,2]) = 0.8 x 0.8 + 0.1 x 0.1 = 0.65

33
Utility Function
  • [4,3] provides power supply
  • [4,2] is a sand area from which the robot cannot
    escape

34
Utility Function
  • [4,3] provides power supply
  • [4,2] is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries

35
Utility Function
  • [4,3] provides power supply
  • [4,2] is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries
  • [4,3] and [4,2] are terminal states

36
Utility of a History
  • [4,3] provides power supply
  • [4,2] is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries
  • [4,3] and [4,2] are terminal states
  • The utility of a history is defined by the
    utility of the last state (+1 or -1) minus
    n/25, where n is the number of moves
37
Utility of an Action Sequence
(Figure: 4x3 grid world with terminal rewards +1 at [4,3] and -1 at [4,2].)
  • Consider the action sequence (U,R) from 3,2

38
Utility of an Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability

39
Utility of an Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories:
    U = Σh U(h) P(h)

40
Optimal Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories
  • The optimal sequence is the one with maximal
    utility

41
Optimal Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories
  • The optimal sequence is the one with maximal
    utility
  • But is the optimal action sequence what we want
    to compute?

42
Reactive Agent Algorithm
  • Repeat
  • s ← sensed state
  • If s is terminal then exit
  • a ← choose action (given s)
  • Perform a

43
Policy (Reactive/Closed-Loop Strategy)
  • A policy Π is a complete mapping from states to
    actions

44
Reactive Agent Algorithm
  • Repeat
  • s ← sensed state
  • If s is terminal then exit
  • a ← Π(s)
  • Perform a

45
Optimal Policy
  • A policy Π is a complete mapping from states to
    actions
  • The optimal policy Π* is the one that always
    yields a history (ending at a terminal state)
    with maximal expected utility

46
Optimal Policy
  • A policy Π is a complete mapping from states to
    actions
  • The optimal policy Π* is the one that always
    yields a history with maximal expected utility

47
Additive Utility
  • History H = (s0, s1, ..., sn)
  • The utility of H is additive iff
    U(s0, s1, ..., sn) = R(0) + U(s1, ..., sn) = Σi R(i)
    (R(i) is the reward received in state si)

48
Additive Utility
  • History H = (s0, s1, ..., sn)
  • The utility of H is additive iff
    U(s0, s1, ..., sn) = R(0) + U(s1, ..., sn) = Σi R(i)
  • Robot navigation example:
  • R(n) = +1 if sn = [4,3]
  • R(n) = -1 if sn = [4,2]
  • R(i) = -1/25 for i = 0, ..., n-1
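A small sketch of this additive utility on one concrete history. The rewards are those given on the slide; the tuple encoding of squares is an assumption.

```python
# Sketch: additive utility of one history in the navigation example.
# Rewards from the slide: +1 at [4,3], -1 at [4,2], -1/25 elsewhere;
# the (column, row) tuple encoding of squares is an assumption.

def reward(state):
    if state == (4, 3):
        return 1.0
    if state == (4, 2):
        return -1.0
    return -1.0 / 25.0

def history_utility(history):
    """U(s0, ..., sn) = sum_i R(i) for an additive utility."""
    return sum(reward(s) for s in history)

# A history that starts at [3,2], moves up to [3,3], then right into [4,3]:
print(history_utility([(3, 2), (3, 3), (4, 3)]))   # -0.04 - 0.04 + 1 = 0.92
```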

49
Principle of Max Expected Utility
  • History H = (s0, s1, ..., sn)
  • Utility of H: U(s0, s1, ..., sn) = Σi R(i)
  • First-step analysis →
  • U(i) = R(i) + maxa Σk P(k | a.i) U(k)
  • Π*(i) = arg maxa Σk P(k | a.i) U(k)

50
Value Iteration
  • Initialize the utility of each non-terminal
    state si to U0(i) = 0
  • For t = 0, 1, 2, ..., do
    Ut+1(i) ← R(i) + maxa Σk P(k | a.i) Ut(k)

51
Value Iteration
Note the importance of terminal states
and connectivity of the state-transition graph
  • Initialize the utility of each non-terminal
    state si to U0(i) = 0
  • For t = 0, 1, 2, ..., do
    Ut+1(i) ← R(i) + maxa Σk P(k | a.i) Ut(k)

(Figure: converged utilities on the 4x3 grid:
 row 3: 0.812  0.868  0.918  +1
 row 2: 0.762   --    0.660  -1
 row 1: 0.705  0.655  0.611  0.388)
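A runnable sketch of value iteration on this grid. The wall at (2,2), no discounting (gamma = 1), and the convergence threshold are assumptions not spelled out on the slide; with the per-step reward of -1/25 it reproduces utilities close to those in the figure above.

```python
# Sketch of value iteration on the 4x3 grid world of these slides.
# Assumptions: wall at (2,2), gamma = 1, per-step reward R(i) = -1/25.

WALLS = {(2, 2)}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) not in WALLS]
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SLIPS = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}
REWARD = -1.0 / 25.0

def step(s, d):
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    return s if nxt in WALLS or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3) else nxt

def transition(s, a):
    """Successor distribution: intended move 0.8, each perpendicular slip 0.1."""
    dist = {}
    for d, p in [(a, 0.8), (SLIPS[a][0], 0.1), (SLIPS[a][1], 0.1)]:
        dist[step(s, d)] = dist.get(step(s, d), 0.0) + p
    return dist

def value_iteration(eps=1e-6):
    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    while True:
        new_U = dict(U)
        for s in STATES:
            if s not in TERMINALS:
                # U_{t+1}(i) <- R(i) + max_a sum_k P(k | a.i) U_t(k)
                new_U[s] = REWARD + max(sum(p * U[k] for k, p in transition(s, a).items())
                                        for a in MOVES)
        if max(abs(new_U[s] - U[s]) for s in STATES) < eps:
            return new_U
        U = new_U

U = value_iteration()
print(round(U[(1, 1)], 3), round(U[(3, 2)], 3), round(U[(3, 3)], 3))
# Expected to be close to the figure's utilities, e.g. about 0.705, 0.660, 0.918.
```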
52
Policy Iteration
  • Pick a policy Π at random

53
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σk P(k | Π(i).i) Ut(k)

54
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σk P(k | Π(i).i) Ut(k)
  • Compute the policy Π' given these utilities:
    Π'(i) = arg maxa Σk P(k | a.i) U(k)

55
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σk P(k | Π(i).i) Ut(k)
  • Compute the policy Π' given these utilities:
    Π'(i) = arg maxa Σk P(k | a.i) U(k)
  • If Π' = Π then return Π
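A sketch of policy iteration on the same grid. It repeats the grid setup from the value-iteration sketch so it runs on its own; the truncated iterative policy-evaluation step (rather than solving the linear system exactly) and the random seed are assumptions.

```python
# Sketch of policy iteration on the 4x3 grid (same assumed layout as above).
# The truncated policy-evaluation step and the random seed are assumptions.

import random

WALLS = {(2, 2)}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) not in WALLS]
MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
SLIPS = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}
REWARD = -1.0 / 25.0

def step(s, d):
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    return s if nxt in WALLS or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3) else nxt

def transition(s, a):
    dist = {}
    for d, p in [(a, 0.8), (SLIPS[a][0], 0.1), (SLIPS[a][1], 0.1)]:
        dist[step(s, d)] = dist.get(step(s, d), 0.0) + p
    return dist

def q(s, a, U):
    """sum_k P(k | a.i) U(k)"""
    return sum(p * U[k] for k, p in transition(s, a).items())

def policy_iteration(eval_sweeps=50):
    random.seed(0)
    pi = {s: random.choice(list(MOVES)) for s in STATES if s not in TERMINALS}
    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    while True:
        # Policy evaluation: U_{t+1}(i) <- R(i) + sum_k P(k | Pi(i).i) U_t(k)
        for _ in range(eval_sweeps):
            U = {s: U[s] if s in TERMINALS else REWARD + q(s, pi[s], U) for s in STATES}
        # Policy improvement: Pi'(i) = argmax_a sum_k P(k | a.i) U(k)
        new_pi = {s: max(MOVES, key=lambda a: q(s, a, U)) for s in pi}
        if new_pi == pi:
            return pi, U
        pi = new_pi

pi, U = policy_iteration()
print(pi[(3, 2)], round(U[(3, 2)], 3))   # expected: 'U' and a value near 0.66
```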

56
n-Step decision process
  • Assume that
  • Each state reached after n steps is terminal,
    hence has known utility
  • There is a single initial state
  • Any two states reached after i and j steps (i ≠ j) are
    different

57
n-Step Decision Process
  • Π*(i) = arg maxa Σk P(k | a.i) U(k)
  • U(i) = R(i) + maxa Σk P(k | a.i) U(k)
  • For j = n-1, n-2, ..., 0 do
  • For every state si attained after step j
  • Compute the utility of si
  • Label that state with the corresponding action

58
What is the Difference?
  • Π*(i) = arg maxa Σk P(k | a.i) U(k)
  • U(i) = R(i) + maxa Σk P(k | a.i) U(k)

59
Infinite Horizon
In many problems, e.g., the robot navigation
example, histories are potentially unbounded and
the same state can be reached many times
What if the robot lives forever?
One trick: use discounting to make the
infinite-horizon problem mathematically tractable
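A brief sketch of why discounting helps (the discount factor 0.9 is an assumption chosen for illustration): weighting the reward received t steps in the future by gamma^t keeps the utility of an unbounded history finite.

```python
# Sketch: with 0 <= gamma < 1 (gamma = 0.9 assumed here), the discounted utility
# U = sum_t gamma^t R(s_t) of an unbounded history stays finite: it is bounded
# by R_max / (1 - gamma), the geometric-series bound.

gamma, r_max = 0.9, 1.0
bound = r_max / (1 - gamma)                            # 10.0
partial = sum(gamma ** t * r_max for t in range(1000))
print(round(partial, 6), bound)                        # both approximately 10.0
```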
60
Example: Tracking a Target
  • The robot must keep the target in view
  • The target's trajectory is not known in
    advance
  • The environment may or may not be known
  • An optimal policy cannot be computed ahead
    of time:
  • - The environment might be unknown
  • - The environment may only be partially observable
  • - The target may not wait
  • → A policy must be computed on-the-fly

61
POMDP (Partially Observable Markov Decision
Problem)
  • A sensing operation returns multiple states,
    with a probability distribution
  • Choosing the action that maximizes the
    expected utility of this state distribution,
    assuming state utilities are computed as
    above, is not good enough, and actually
    does not make sense (it is not rational)

62
Example: Target Tracking
63
Summary
  • Decision making under uncertainty
  • Utility function
  • Optimal policy
  • Maximal expected utility
  • Value iteration
  • Policy iteration