1
CS 188: Artificial Intelligence, Fall 2006
  • Lecture 13: Advanced Reinforcement Learning
  • 10/12/2006

Dan Klein, UC Berkeley
2
Midterms
  • Exams are graded and will be in glookup today;
    they will be returned and gone over in section Monday
  • My impression: a long and fairly hard exam, but the
    class generally did fine
  • You should expect the final to be equally hard,
    but to feel much less long
  • We added 25 points and took it out of 100

3
Midcourse Reviews
  • You liked
  • Projects (19)
  • Lectures (19)
  • Visual / demo presentations (14)
  • Newsgroup (4)
  • Pacman (3)
  • You wanted
  • Debugging help / coding advice / see staff code
    (8)
  • More time between projects (6)
  • More problem sets (4)
  • A webcast or podcast (4)
  • More coding or more projects (4)
  • Slides / reading earlier (3)
  • Class later in the day (3)
  • Lecture to be less fast / dense / technical /
    confusing (3)

4
Midcourse Reviews II
  • Difficulty / workload is:
  • Hard (15)
  • Medium (9)
  • Easy (7)
  • Dan's office hours:
  • Thursday (7)
  • Not Thursday (9)

5
Midcourse Reviews
  • I propose:
  • I'll hang out after class on Tuesdays; we can
    walk to my office if there are more questions
  • I'll add Thursday OH for a few weeks, and keep
    them if attended
  • We'll spend more section time devoted to projects
  • I'll link to last term's slides so you can get a
    preview
  • I'll keep coding demos for you
  • There will be a (slight) shift from projects to
    written questions
  • Rough midterm grades soon, to let you know
    where you stand (will incorporate at least the
    first two projects)
  • Other:
  • I've asked about webcasting / podcasting, but it
    seems very unlikely this term
  • There are limits to how early I can get new slides
    up, since I revise extensively from last term
  • Can't really change the programming language or
    time of day

6
Midcourse Reviews
7
Pacman 1.2 Honors
  • Best Admissible Heuristics
  • Best Fast Search
  • Example of an inadmissible heuristic:
  • NumFood + ManhattanDistToClosestFood

8
Today
  • How advanced reinforcement learning works for
    large problems
  • Some previews of fundamental ideas we'll see
    throughout the rest of the term
  • Next class we'll start on probabilistic reasoning
    and reasoning about beliefs

9
Recap: Q-Learning
  • Learn Q(s,a) values from samples
  • Receive a sample (s, a, s', r)
  • On one hand: the old estimate of the return
  • But now we have a new estimate from this sample
  • Nudge the old estimate towards the new sample
  • Equivalently, average samples over time (see the
    update sketched below)
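As a concrete sketch of the update these bullets describe (this is the
standard Q-learning rule; the Python names and default parameters below
are illustrative, not from the course code):

    # One Q-learning update for an observed transition (s, a, s', r).
    # Q is a dict mapping (state, action) pairs to value estimates;
    # alpha is the learning rate and gamma the discount (assumed values).
    def q_update(Q, s, a, s_next, r, legal_actions, alpha=0.1, gamma=0.9):
        # New sample estimate of the return from (s, a)
        sample = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in legal_actions)
        # Nudge the old estimate toward the sample (a running average over time)
        Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample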

10
Q-Learning
  • Q-learning produces tables of q-values

11
Q-Learning
  • In realistic situations, we cannot possibly learn
    about every single state!
  • Too many states to visit them all in training
  • Too many states to even hold the q-tables in
    memory
  • Instead, we want to generalize
  • Learn about some small number of training states
    from experience
  • Generalize that experience to new, similar states
  • This is a fundamental idea in machine learning,
    and we'll see it over and over again

12
Example: Pacman
  • Let's say we discover through experience that
    this state is bad
  • In naïve q-learning, we know nothing about this
    state or its q-states
  • Or even this one!

13
Feature-Based Representations
  • Solution: describe a state using a vector of
    features
  • Features are functions from states to real
    numbers (often 0/1) that capture important
    properties of the state
  • Example features:
  • Distance to closest ghost
  • Distance to closest dot
  • Number of ghosts
  • 1 / (distance to closest dot)^2
  • Is Pacman in a tunnel? (0/1)
  • etc.
  • Can also describe a q-state (s, a) with features
    (e.g. action moves closer to food); a minimal
    feature extractor is sketched below
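As a rough illustration of what such a feature function might look like
in code (the state attributes and helpers here are hypothetical, not the
actual Pacman project API):

    # Hypothetical feature extractor: maps a (state, action) q-state to a
    # dict of named, real-valued features.
    def manhattan(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    def get_features(state, action):
        pos = state.successor_position(action)          # assumed helper
        d_ghost = min(manhattan(pos, g) for g in state.ghost_positions)
        d_dot = min(manhattan(pos, d) for d in state.dot_positions)
        return {
            "dist-to-closest-ghost": d_ghost,
            "num-ghosts": len(state.ghost_positions),
            "inv-dist-to-dot-squared": 1.0 / (d_dot ** 2) if d_dot else 1.0,
        }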

14
Linear Feature Functions
  • Using a feature representation, we can write a
    q-function (or value function) for any state
    using a few weights (see the linear form below)
  • Advantage: our experience is summed up in a few
    powerful numbers
  • Disadvantage: states may share features but be
    very different in value!
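Written out, the linear form is just a weighted sum of the feature
values, Q(s,a) = w_1 f_1(s,a) + ... + w_n f_n(s,a). A minimal sketch,
reusing the hypothetical get_features from the previous slide:

    # Linear q-function: Q(s, a) = w_1 f_1(s, a) + ... + w_n f_n(s, a)
    def q_value(weights, state, action):
        feats = get_features(state, action)
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())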

15
Function Approximation
  • Q-learning with linear q-functions (a sketch of
    the weight update follows this list)
  • Intuitive interpretation:
  • Adjust the weights of active features
  • E.g. if something unexpectedly bad happens,
    disprefer all states with that state's features
  • Formal justification: online least squares (much
    later)
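A sketch of the resulting update, in the usual approximate Q-learning
style: compute the error of the current estimate, then adjust each
active feature's weight in proportion to its value. It reuses q_value
and get_features from the earlier sketches; the learning rate and
discount are assumed:

    # Approximate Q-learning update: shift the weights of the active
    # features toward the observed sample.
    def approx_q_update(weights, s, a, s_next, r, legal_actions,
                        alpha=0.05, gamma=0.9):
        best_next = max((q_value(weights, s_next, a2) for a2 in legal_actions),
                        default=0.0)
        difference = (r + gamma * best_next) - q_value(weights, s, a)
        for name, value in get_features(s, a).items():
            weights[name] = weights.get(name, 0.0) + alpha * difference * value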

16
Example: Q-Pacman
17
Hierarchical Learning
18
Hierarchical RL
  • Stratagus: an example of a large RL task, from
    Bhaskara Marthi's thesis (with Stuart Russell)
  • Stratagus is hard for reinforcement learning
    algorithms
  • > 10^100 states
  • > 10^30 actions at each point
  • Time horizon ~ 10^4 steps
  • Stratagus is hard for human programmers
  • Typically takes several person-months for game
    companies to write a computer opponent
  • Still, no match for experienced human players
  • Programming involves much trial and error
  • Hierarchical RL:
  • Humans supply high-level prior knowledge as a
    partial program
  • The learning algorithm fills in the details

19
Partial ALisp Program
    (defun top ()
      (loop
        (choose
          (gather-wood)
          (gather-gold))))

    (defun gather-wood ()
      (with-choice (dest forest-list)
        (nav dest)
        (action get-wood)
        (nav base-loc)
        (action dropoff)))

    (defun gather-gold ()
      (with-choice (dest goldmine-list)
        (nav dest)
        (action get-gold)
        (nav base-loc)
        (action dropoff)))

    (defun nav (dest)
      (until (= (pos (get-state)) dest)
        (with-choice (move (N S E W NOOP))
          (action move))))

20
Hierarchical RL
  • They then define a hierarchical Q-function which
    learns a linear feature-based mini-Q-function at
    each choice point
  • Very good at balancing resources and directing
    rewards to the right region
  • Still not very good at the strategic elements of
    these kinds of games (i.e. the Markov game aspect)

DEMO
21
Policy Search
22
Policy Search
  • Problem: often the feature-based policies that
    work well aren't the ones that approximate V / Q
    best
  • E.g. your value functions from project 1.3 were
    probably horrible estimates of future rewards, but
    they still produced good decisions
  • We'll see this distinction between modeling and
    prediction again later in the course
  • Solution: learn the policy that maximizes rewards
    rather than the value that predicts rewards
  • This is the idea behind policy search, such as
    what controlled the upside-down helicopter

23
Policy Search
  • Simplest policy search:
  • Start with an initial linear value function or
    q-function
  • Nudge each feature weight up and down and see if
    the resulting policy is better than before
    (a hill-climbing sketch follows this list)
  • Problems:
  • How do we tell whether the policy got better?
  • We need to run many sample episodes!
  • If there are a lot of features, this can be
    impractical
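A minimal sketch of this naive hill-climbing search over feature
weights. evaluate_policy is an assumed helper that runs many sample
episodes under the greedy policy induced by the current weights and
returns the average total reward; it is not given here:

    import random

    # Naive policy search: perturb one weight at a time and keep the
    # change only if the estimated return improves.
    def hill_climb(weights, evaluate_policy, step=0.1, iterations=100):
        best_score = evaluate_policy(weights)
        for _ in range(iterations):
            k = random.choice(list(weights))
            delta = random.choice([-step, step])
            weights[k] += delta                 # nudge one feature weight
            score = evaluate_policy(weights)
            if score > best_score:
                best_score = score              # keep the improvement
            else:
                weights[k] -= delta             # revert the nudge
        return weights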

24
Policy Search
  • Advanced policy search:
  • Write a stochastic (soft) policy (one common form
    is sketched below)
  • It turns out you can efficiently approximate the
    derivative of the returns with respect to the
    parameters w (details in the book, but you don't
    have to know them)
  • Take uphill steps, recalculate derivatives, etc.
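One common form of such a soft policy is a softmax over the linear
q-values: actions are sampled with probability proportional to
exp(Q_w(s, a)). This is a generic sketch reusing q_value from earlier,
not necessarily the exact parameterization on the original slide:

    import math, random

    # Stochastic (soft) policy: sample an action with probability
    # proportional to exp(Q_w(s, a) / temperature).
    def soft_policy(weights, s, legal_actions, temperature=1.0):
        prefs = [q_value(weights, s, a) / temperature for a in legal_actions]
        m = max(prefs)                                   # for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        probs = [e / total for e in exps]
        return random.choices(legal_actions, weights=probs, k=1)[0]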

25
Take a Deep Breath
  • We're done with search and planning!
  • Next, we'll look at how to reason with
    probabilities
  • Diagnosis
  • Tracking objects
  • Speech recognition
  • Robot mapping
  • lots more!
  • Last part of the course: machine learning

26
Digression / Preview
27
Linear regression
  [Figure: scatter plot of temperature observations against an input
   variable]
  • Given examples (x_i, y_i)
  • Given a new point x, predict its y value
28
Linear regression
  [Figure: the same temperature data with a fitted regression line]
29
Ordinary Least Squares (OLS)
  [Figure: fitted line with one observation highlighted; the vertical
   gap between the observation and the prediction is the error, or
   residual]
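For concreteness, a minimal ordinary-least-squares fit in one dimension,
using the standard closed-form formulas (nothing here is recovered from
the slide image):

    # Fit y ~ w1 * x + w0 by minimizing the sum of squared residuals.
    def ols_fit(xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
             sum((x - mean_x) ** 2 for x in xs)
        w0 = mean_y - w1 * mean_x
        return w0, w1

    # Each residual is the gap between an observation y_i and the
    # prediction w1 * x_i + w0.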
30
Overfitting
  [Figure: data fit with a degree 15 polynomial, illustrating
   overfitting]
DEMO
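A quick way to reproduce this kind of picture; an illustrative sketch
using numpy with made-up data, not taken from the course demo:

    import numpy as np

    # Fit a degree-15 polynomial to a small set of noisy points; with so
    # many parameters, the curve passes near every sample and oscillates
    # between them -- the classic overfitting picture.
    rng = np.random.default_rng(0)
    xs = np.linspace(0, 20, 18)
    ys = 15 + 5 * np.sin(xs / 3) + rng.normal(size=xs.size)
    poly = np.polynomial.Polynomial.fit(xs, ys, deg=15)   # high-degree fit
    dense = np.linspace(0, 20, 200)
    preds = poly(dense)                                    # wiggly between samples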