Computational Discovery of Communicable Knowledge - PowerPoint PPT Presentation

Learn more at: http://www.isle.org
1
Challenges in Learning Plan Knowledge
Pat Langley
School of Computing and Informatics, Arizona State University, Tempe, Arizona USA
Institute for the Study of Learning and Expertise, Palo Alto, California USA

Thanks to D. Choi, T. Konik, U. Kuter, N. Li, D. Nau, N. Nejati, and D. Shapiro
for their many contributions. This talk reports research funded by grants from
DARPA IPTO, which is not responsible for its contents.
2
Outline of the Talk
  • Brief review of learning plan knowledge
  • Learning from different sources
  • Learning for new performance tasks
  • Learning in different scenarios
  • Learning with novel representations
  • Some responses to these challenges
  • Concluding remarks

3
The Problem: Learning Plan Knowledge
  • Given: Basic knowledge about some action-oriented
    domain (e.g., state/goal representations,
    operators).
  • Given: A set of training problems (e.g., initial
    states, goals, and possibly more).
  • Given: Some performance task that the system must
    carry out.
  • Given: A performance mechanism that can use
    knowledge to carry out that task.
  • Learn: Knowledge that lets the system improve
    its ability to perform new tasks from the same
    or similar domains.
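As a minimal sketch, the givens of this learning problem can be captured in a few data structures; the names (Problem, PlanLearningTask) and the toy blocks-world literals are illustrative, not from any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(frozen=True)
class Problem:
    initial_state: frozenset    # ground literals, e.g. ("on", "a", "table")
    goals: frozenset

@dataclass
class PlanLearningTask:
    operators: list                   # given: basic domain knowledge
    training_problems: List[Problem]  # given: training problems
    performer: Callable               # given: mechanism that applies knowledge
    learned_knowledge: list = field(default_factory=list)  # to be learned

task = PlanLearningTask(
    operators=["stack", "unstack"],
    training_problems=[Problem(frozenset({("on", "a", "table")}),
                               frozenset({("on", "a", "b")}))],
    performer=lambda knowledge, problem: None,
)
```

The learner's job is then to populate `learned_knowledge` so that `performer` handles new problems from the same or similar domains better.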

4
Topics Not Covered
This talk will range widely, but I will not cover
issues related to:
  • Learning with impoverished representations
    - Interested in human-like, intelligent behavior
    - Most work on reinforcement learning is irrelevant
  • Acquiring basic knowledge about a domain
    - Interested in building on such knowledge
    - Most work on learning action models is too basic
  • Nonincremental learning from large data sets
    - Interested in human-like incremental learning
    - This rules out most data-mining approaches

5
Historical Topics
There has been a long history of work on learning
plan knowledge:
  • Forming macro-operators
    - Fikes et al. (1972), Iba (1988), Mooney (1989),
      Botea et al. (2005)
  • Inducing forward-chaining control rules
    - Anzai & Simon (1978), Mitchell et al. (1981),
      Langley (1982)
  • Learning control rules analytically
    - Laird et al. (1986), Mitchell et al. (1986),
      Minton (1988)
  • Problem solving by analogy
    - Veloso (1994), Jones & Langley (1995), VanLehn &
      Jones (1994)
  • Inducing control rules for partial-order plans
    - Katukam & Kambhampati (1994), Estlin & Mooney
      (1997)

6
Historical Trends
Work on learning plan knowledge has seen many
shifts in fashion:
  • Early hope for improving problem
    solvers/planners (1978-1985)
  • Excitement/confusion introduced by the EBL
    movement (1986-1992)
  • Some doubts raised by the utility problem
    (1988-1993)
  • Mass migration to the reinforcement learning
    paradigm (1993-2003)
  • Resurgence of interest in learning plan
    knowledge (2004-present)

Throughout these changes, the problems and
potential of learning plan knowledge have
remained.
7
Traditional Sources of Information
Most research on learning for planning has
assumed the system uses search to generate:
  • Successful paths that achieve the goals (positive
    instances)
  • Failed paths that do not achieve the goals
    (negative instances)
  • Alternative paths of different desirability
    (preferred instances)

But humans learn from other sources of
information and our AI systems should as well.
8
Challenge: Learn from Many Sources
There has been relatively little research on plan
learning from
  • Demonstrations of solved problems (Nejati et al.,
    2006)
  • Explicit instruction from teacher (Blythe et al.,
    2007)
  • Advice or hints from teacher (Mostow, 1983)
  • Mental simulations or daydreaming (Mueller, 1985)
  • Undesirable side effects during execution

Humans learn from all of these sources, and our
learning systems should support the same
capabilities. Moreover, we should develop
single systems that integrate plan knowledge
learned from all of them (Oblinger, 2006).
9
Traditional Performance Tasks
Most research on learning for planning has
assumed the system aims to improve:
  • The efficiency of plan generation (nodes
    expanded, time)
  • The quality of generated plans (path length,
    utility)
  • The coverage of plan knowledge (problems solved)

But humans learn and use plan knowledge for
other purposes that are just as valid.
10
Challenge: Learn for Plan Execution
Many important domains require executing plan
knowledge in some environment that includes
  • operators with likely but nonguaranteed effects
  • external events not directly under the agent's
    control
  • other agents that are pursuing their own goals

Urban driving is one setting that raises all
three of these issues. Complex board games like
chess, although deterministic, still require
interleaving of planning and execution. We need
more research on plan learning in contexts of
this sort (e.g., Benson, 1995; Fern et al.,
2004).
11
Challenge: Learn for Plan Understanding
Another understudied problem is learning for plan
understanding.
  • Given: A partially observed sequence of states
    influenced by another agent's actions.
  • Given: Learned knowledge about how to achieve
    goals.
  • Find: The other agent's goals and the plans it
    is pursuing to achieve them.

Plan understanding is important not only in
complex games, but also in military planning,
politics, and other settings. This performance
task suggests new learning problems, methods, and
evaluation criteria.
12
Traditional Learning Scenarios
Most research on learning for planning has
assumed the system:
  • Trains on problems from a given distribution /
    domain
  • Tests on problems from the same distribution /
    domain

Success depends on the extent to which the
learner generalizes well to new problems from the
same domain. But humans also use their learned
plan knowledge in other, more flexible ways to
improve performance.
13
Challenge: Cumulative Learning
In complex domains, humans learn plan knowledge
gradually:
  • Starting with small, relatively easy problems
  • Moving to complex problems after mastering
    simpler ones

Later acquisitions build naturally on earlier
experience, leading to cumulative learning.
Our education system depends heavily on such
vertical transfer of learned knowledge. We
need more learning systems that demonstrate this
form of cumulative improvement (e.g., Reddy &
Tadepalli, 1997).
14
Challenge: Cross-Domain Transfer
In other cases, humans exhibit a form of transfer
that involves
  • Learning to solve problems in one domain
  • Reusing this knowledge to solve problems in
    another domain that is superficially quite
    different

Such cross-domain transfer is related to
within-domain analogical reasoning, but it is far
more challenging. In its extreme form, the two
domains support similar solutions but have no
shared symbols or predicates. We need more
learning systems that demonstrate this radical
form of knowledge reuse.
15
Traditional Learned Representations
Most research on learning for planning has
focused on learning
  • Control rules that reduce effective branching
    factor
  • Macro-operators that reduce effective solution
    depth

These grew naturally from representations used to
create hand-crafted expert problem solvers. But
now we have other representations of plan
knowledge that suggest new learning tasks and
methods. This does not mean POMDPs,
workflows, or other highly constrained
formalisms.
16
Challenge: Learn HTNs
Hierarchical task networks (HTNs) offer the most
effective planning formalism available, but they
are expensive to build manually. HTNs provide an
ideal target for learning because they have:
  • the modularity and flexibility of search-control
    rules
  • the large-scale structure of macro-operators

Machine learning has automated the creation of
expert classifiers. We should do the same for
HTNs, which are effectively expert planning
systems.
17
Challenge: Learn HTNs
We can define the task of learning hierarchical
task networks as:
  • Given: Basic knowledge about some action-oriented
    domain.
  • Given: A set of training problems (initial states
    and goals).
  • Given: Some performance task the system must
    carry out.
  • Given: Some module that uses HTNs to perform this
    task.
  • Learn: An HTN that lets the system improve its
    performance on new tasks from the same or
    similar domains.

We need more research on this important topic
(e.g., Reddy & Tadepalli, 1997; Ilghami et al.,
2005).
18
Some Responses
Our recent research attempts to respond to these
challenges by developing methods that
  • acquire a constrained but important class of HTNs
  • that one can use for both planning and reactive
    control
  • from both successful problem solving and expert
    traces
  • that extend naturally to support cross-domain
    transfer

Moreover, these ideas are embedded in an
integrated architecture that supports many
capabilities: ICARUS (Langley, 2006).
19
Conceptual Knowledge in ICARUS
Nonprimitive concept: (patient-form-filled ?patient)
Primitive concept: (assigned-mission ?patient ?mission)
  • Conceptual knowledge is cast as Horn clauses that
    specify relevant relations in the environment
  • Memory is organized hierarchically
  • Divided into primitive and non-primitive
    predicates

20
HTN Methods in ICARUS
[diagram: an HTN method connects a goal concept to
subgoals; each method has a precondition concept
and bottoms out in operators]
  • Similar to SHOP2, but methods are indexed by the
    goals they achieve
  • Each method decomposes a goal into subgoals
  • If a method's goal is active and its precondition
    is satisfied, then try to achieve its subgoals or
    apply its operators
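The goal-indexed control scheme in these bullets can be sketched as a small recursive procedure. Everything below is a simplification under stated assumptions (ground, variable-free goals and a hypothetical method table); it is not ICARUS code, which matches methods with variables against a belief state.

```python
# goal -> list of (precondition, subgoals, operator); illustrative only
methods = {
    "patient-form-filled": [
        (set(), ["form-retrieved"], None),
    ],
    "form-retrieved": [
        (set(), [], "retrieve-form"),   # this method bottoms out in an operator
    ],
}

def achieve(goal, state, execute):
    """Try each method indexed under `goal` whose precondition holds."""
    if goal in state:                        # goal already satisfied
        return True
    for precondition, subgoals, operator in methods.get(goal, []):
        if precondition <= state:            # precondition concept holds
            if all(achieve(g, state, execute) for g in subgoals):
                if operator:
                    execute(operator)        # apply the method's operator
                state.add(goal)              # the method achieves its goal
                return True
    return False

log = []
achieve("patient-form-filled", set(), log.append)
# log now records the one operator executed: ["retrieve-form"]
```

Indexing methods by the goals they achieve is what lets the same structure serve both planning (decompose top-down) and reactive control (skip goals already true in the state).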

21
Operators in ICARUS
Action: (get-arrival-time ?patient ?from ?to)
Precondition concept: (patient ?p) and
  (travel-from ?p ?from) and (travel-to ?p ?to)
Effects concept: (arrival-time ?patient)
  • Operators describe low-level actions that agents
    can execute directly in the environment
  • Preconditions: legal conditions for action
    execution
  • Effects: expected changes when the action is executed
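The precondition/effect reading of operators can be illustrated in a generic STRIPS style; the set-based state representation and the travel literals below are my own assumptions, not the ICARUS implementation.

```python
def applicable(state, preconditions):
    """An action is legal to execute only when every precondition holds."""
    return preconditions <= state

def apply_operator(state, add_effects, delete_effects=frozenset()):
    """Effects describe the expected changes once the action executes."""
    return (state - delete_effects) | add_effects

state = {("patient", "p1"), ("travel-from", "p1", "SFO"), ("travel-to", "p1", "LAX")}
precondition = {("patient", "p1"), ("travel-from", "p1", "SFO")}
if applicable(state, precondition):
    state = apply_operator(state, add_effects={("arrival-time", "p1")})
# state now also contains ("arrival-time", "p1")
```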

22
Training Input: Expert Traces and Goals
Operator instance: (get-arrival-time P2)
Goal concept: (all-patients-arranged)
State: concept instances such as (assigned-flight P1 M1)
  • Expert demonstration traces: the operators the
    expert uses and the resulting belief states
  • State: a set of concept instances
  • Goal: a concept instance in the final state
  • ICARUS learns generalized skills that achieve
    similar goals

23
Learning Plan Knowledge from Demonstration
[diagram: an expert provides demonstration traces
(states and actions); trace analysis, using
operators and concept definitions as background
knowledge, produces learned plan knowledge (HTNs);
a reactive executor applies this knowledge to a
problem (initial state and goal), with a fallback
path taken if an impasse occurs]
24
Learning HTNs by Trace Analysis
[diagram: concepts defined over a trace of actions]
25
Learning HTNs by Trace Analysis
Operator Chaining
26
Learning HTNs by Trace Analysis
Concept Chaining
[diagram: concepts defined over a trace of actions]
27
Explanation Structure for Trace
(transfer-hospital patient1 hospital2)
(arrange-ground-transportation SFO hospital2 1pm)
Time3
(location patient1 SFO 1pm)
(close-airport hospital2 SFO)
(assigned patient1 NW32)
(arrival-time NW32 1pm)
(dest-airport patient1 SFO)
(query-arrival-time)
(assign patient1 NW32)
Time1
Time2
(scheduled NW32)
(flight-available)
28
Hierarchical Task Network Structure
(transfer-hospital ?patient ?hospital)
(close-airport ?hospital ?loc)
(arrange-ground-transportation ?loc ?hospital
?time)
(location ?patient ?loc ?time)
(assigned ?patient ?flight)
(arrival-time ?flight ?time)
(dest-airport ?patient ?loc)
(scheduled ?flight)
(flight-available)
(query-arrival-time)
(assign ?patient ?flight)
29
Transfer by Representation Mapping
[diagram: predicate mappings link the concepts and
actions of a source domain to those of a target
domain]
30
Challenge: Learn with Richer Goals
HTNs are more expressive than classical plans
(Erol et al., 1994). Our approach loses this
advantage because it assumes the head of each
method is a goal it achieves, but we can:
  • Extend goal concepts to describe temporal
    behavior
  • Revise the execution module to handle these
    structures
  • Augment trace analysis to reason about temporal
    goals
  • Learn new methods with temporal goals in their
    heads

This scheme should acquire the full class of HTNs
while still retaining the tractability of
goal-directed learning.
31
Challenge: Extend Conceptual Vocabularies
Our approach to learning HTNs relies on the
concept hierarchy used to explain solution
traces. The method would depend less on
hand-crafted concepts if it could extend this
hierarchy itself:
  • Given: A set of concepts used in goals, states,
    and methods.
  • Given: New methods acquired from sample solution
    traces.
  • Find: New concepts that produce improved
    performance as the result of future method
    learning.

This would support a bootstrapped learner that
invents predicates to describe states, goals, and
methods.
32
Challenge: Extend Conceptual Vocabularies
Our approach to predicate invention has three
steps:
  • Define a new concept for the precondition of each
    method learned by chaining off a concept
    definition.
  • Check traces for states in which this concept
    becomes true and learn methods to achieve it.
  • During performance, treat each method's
    precondition as its first subgoal, which it can
    achieve if submethods are known.

This technique would make an HTN more complete by
growing it downward, introducing nonterminal
symbols as necessary. We have partially
implemented this scheme and hope to report
results at the next meeting.
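As a rough sketch, the three steps might look like this in code; the data structures and names are hypothetical, and the actual method learning in step 2 is elided.

```python
from itertools import count

fresh = count(1)

def invent_precondition_concept(method):
    """Step 1: name the precondition of a learned method as a new concept."""
    return {"name": f"invented-{next(fresh)}",
            "definition": method["precondition"]}

def states_where_true(concept, traces):
    """Step 2: find trace states where the invented concept becomes true;
    methods to achieve it would then be learned from those states."""
    return [(trace, i) for trace in traces
            for i, state in enumerate(trace["states"])
            if concept["definition"] <= state]

def promote_precondition(method, concept):
    """Step 3: treat the method's precondition as its first subgoal."""
    revised = dict(method)
    revised["subgoals"] = [concept["name"]] + method["subgoals"]
    return revised

method = {"goal": "g", "precondition": {"p", "q"}, "subgoals": ["s1"]}
concept = invent_precondition_concept(method)
revised = promote_precondition(method, concept)
# revised["subgoals"] == ["invented-1", "s1"]
```

The invented name acts as a new nonterminal symbol, which is how the scheme grows the HTN downward.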
33
Concluding Remarks: Research Style
Clearly, there remain many open problems to
address in learning plan knowledge. These
involve new abilities, not improvements on
existing ones, which suggests that we:
  • Look at human behavior for ideas on how to
    proceed
  • Develop integrated systems rather than component
    algorithms
  • Demonstrate their behavior on challenging domains

These strategies will help us extend the reach of
our learning systems, not just strengthen their
grasp.
34
Concluding Remarks: Evaluation
We must evaluate our new plan learners, but this
does not mean:
  • Measuring their speed in generating plans
  • Showing they run faster than existing systems
  • Entering them in planning competitions

More appropriate experiments would revolve
around:
  • Demonstrating entirely new functionalities
  • Running lesion studies to show new features are
    required
  • Using performance measures appropriate to the task

These steps will produce conceptual advances and
scientific understanding far more than will
mindless bake-offs.
35
Concluding Remarks: Summary
Learning plan knowledge is a key area with many
open problems:
  • Learning from traces, advice, and other sources
  • Transferring knowledge within and across domains
  • Learning and extending rich structures like HTNs

These challenges will benefit from earlier work
on plan learning, but they also require new
ideas. Together, they should lead us toward
learning systems that rival humans in their
flexibility and power.
36
End of Presentation
37
ICARUS Concepts for In-City Driving
((in-rightmost-lane ?self ?clane)
 percepts  ((self ?self) (segment ?seg) (line ?clane segment ?seg))
 relations ((driving-well-in-segment ?self ?seg ?clane)
            (last-lane ?clane)
            (not (lane-to-right ?clane ?anylane))))

((driving-well-in-segment ?self ?seg ?lane)
 percepts  ((self ?self) (segment ?seg) (line ?lane segment ?seg))
 relations ((in-segment ?self ?seg)
            (in-lane ?self ?lane)
            (aligned-with-lane-in-segment ?self ?seg ?lane)
            (centered-in-lane ?self ?seg ?lane)
            (steering-wheel-straight ?self)))

((in-lane ?self ?lane)
 percepts ((self ?self segment ?seg) (line ?lane segment ?seg dist ?dist))
 tests    ((gt ?dist -10) (lt ?dist 0)))
38
Representing Short-Term Beliefs/Goals
(current-street me A)              (current-segment me g550)
(lane-to-right g599 g601)          (first-lane g599)
(last-lane g599)                   (last-lane g601)
(at-speed-for-u-turn me)           (slow-for-right-turn me)
(steering-wheel-not-straight me)   (centered-in-lane me g550 g599)
(in-lane me g599)                  (in-segment me g550)
(on-right-side-in-segment me)      (intersection-behind g550 g522)
(building-on-left g288)            (building-on-left g425)
(building-on-left g427)            (building-on-left g429)
(building-on-left g431)            (building-on-left g433)
(building-on-right g287)           (building-on-right g279)
(increasing-direction me)          (buildings-on-right g287 g279)
39
ICARUS Skills for In-City Driving
((in-rightmost-lane ?self ?line)
 percepts ((self ?self) (line ?line))
 start    ((last-lane ?line))
 subgoals ((driving-well-in-segment ?self ?seg ?line)))

((driving-well-in-segment ?self ?seg ?line)
 percepts ((segment ?seg) (line ?line) (self ?self))
 start    ((steering-wheel-straight ?self))
 subgoals ((in-segment ?self ?seg)
           (centered-in-lane ?self ?seg ?line)
           (aligned-with-lane-in-segment ?self ?seg ?line)
           (steering-wheel-straight ?self)))

((in-segment ?self ?endsg)
 percepts ((self ?self speed ?speed)
           (intersection ?int cross ?cross)
           (segment ?endsg street ?cross angle ?angle))
 start    ((in-intersection-for-right-turn ?self ?int))
 actions  ((?steer 1)))
40
ICARUS Interleaves Execution and Problem Solving
[diagram: a problem is first handed to reactive
execution over the skill hierarchy; if no impasse
arises, primitive skills are executed, yielding an
executed plan; if an impasse arises, control
passes to problem solving]
This organization reflects the psychological
distinction between automatized and controlled
behavior.
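The execute-unless-impasse organization can be sketched as a simple loop; the skill table and the `solve` fallback below are illustrative assumptions, not the ICARUS implementation.

```python
def run(goal, state, skills, solve):
    """Prefer automatized skill execution; invoke controlled
    problem solving only when an impasse arises."""
    plan = []
    while goal not in state:
        skill = skills.get(goal)
        if skill and skill["start"] <= state:   # skill applies: react
            state |= skill["effects"]
            plan.append(skill["action"])
        else:                                   # impasse: deliberate
            plan.extend(solve(goal, state))
            state.add(goal)
    return plan

skills = {"at-corner": {"start": set(), "effects": {"at-corner"}, "action": "steer"}}
plan = run("at-corner", set(), skills, solve=lambda g, s: ["fallback-step"])
# plan == ["steer"]; with an empty skill table the solve fallback runs instead
```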