Title: Agent Intelligence
1Agent Intelligence
- Based on tutorials and presentations by
- J.H. Siekmann, N. Nilsson, S.J. Russell, P. Norvig, A. Geyer-Schulz, C. Dyer, J. Robin, J. Han, C. Isik, M. Kamber, A. Logvinovskiy, S. Puuronen, V. Terziyan, Wikipedia
2 Intelligent perception of the external environment, mining data and discovering knowledge about it, reasoning new facts about it, planning its own behavior within it, and acting based on plans are among the basic abilities of an intelligent agent.
[Diagram: agent and environment, connected through knowledge and facts, plans, and behavior]
3 Agent Logic, Reasoning, Planning
- Based on tutorials and presentations by
- J.H. Siekmann, N. Nilsson, S. Russell, P. Norvig, A. Geyer-Schulz, C. Dyer, J. Robin
4 Real-World Reasoning: Tackling inherent computational complexity
[Chart from a DARPA research program: example domains cast in a propositional reasoning system, plotted by number of variables against number of rules (constraints). Problem sizes grow from car repair diagnosis (around 100 variables) through deep space mission control, chess, military logistics, and hardware/software verification, up to multi-agent systems (on the order of 1M variables and 5M rules). Worst-case complexity is exponential; annotations compare the state spaces to the number of atoms on Earth, the seconds until the heat death of the Sun, and a protein-folding calculation (a petaflop-year). Technology targets: high-performance reasoning, temporal/uncertainty reasoning, strategic/multi-player reasoning.]
5 The Agent Architecture: A Model
- Head: general abilities
- Body: application-specific abilities
6 TYPE 1: Simple Reflex Agents
7 TYPE 2: State-based Agents
8 TYPE 3: Goal-based Agents
9 TYPE 4: Learning Agents / Utility-based Agents
10 A knowledge-based agent
- A knowledge-based agent includes a knowledge base and an inference system.
- A knowledge base is a set of representations of facts of the world.
- Each individual representation is called a sentence.
- The sentences are expressed in a knowledge representation language.
- The agent operates as follows:
  1. It TELLs the knowledge base what it perceives.
  2. It ASKs the knowledge base what action it should perform.
  3. It performs the chosen action.
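The TELL/ASK cycle above can be sketched in a few lines. The KB here is just a set of sentences with trivial tell/ask operations, and the percepts and rule are illustrative placeholders, not part of any specific library.

```python
class KnowledgeBase:
    """Minimal KB: a set of sentences plus trivial tell/ask operations."""
    def __init__(self):
        self.sentences = set()

    def tell(self, sentence):
        # Store a new fact (a real KB would also run inference here).
        self.sentences.add(sentence)

    def ask(self, query):
        # Trivial "inference": a query succeeds if it was told verbatim.
        return query in self.sentences

def agent_step(kb, percept):
    """One cycle of the generic knowledge-based agent."""
    kb.tell(percept)                     # 1. TELL the KB what we perceive
    if kb.ask("breeze at (1,1)"):        # 2. ASK what action to perform
        action = "retreat"
    else:
        action = "move forward"
    return action                        # 3. perform the chosen action

print(agent_step(KnowledgeBase(), "breeze at (1,1)"))   # retreat
print(agent_step(KnowledgeBase(), "glitter at (2,3)"))  # move forward
```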
11 Knowledge Base
- Knowledge Base:
  - a set of sentences
  - in a formal knowledge representation language
  - that represents assertions about the world.
- Declarative approach to building an agent:
  - TELL it what it needs to know.
  - ASK it what to do; answers should follow by inference rules from the KB.
12 Knowledge + Reasoning
- To address these issues we will introduce:
  - A knowledge base (KB): a list of facts that are known to the agent.
  - Rules to infer new facts from old facts using rules of inference.
- Logic provides the natural language for this.
13 Why a knowledge base?
- The state of the world may require lots of information to describe.
- The agent's knowledge of the state of the world:
  - If S is the world state, K(S) is what the agent knows about it.
- For economy:
  - Not everything is explicitly specified; some facts can be inferred.
  - The agent may infer whatever it does not know explicitly.
- Constraints on feature values:
  - e.g., the age of a person is not more than 200 years.
- Issues:
  - In what language to express what the agent knows about the world.
  - How explicit to make this knowledge.
  - How to infer.
14 Logic in general
- Logics are formal languages for representing information such that conclusions can be drawn.
- Syntax defines the sentences in the language.
- Semantics define the "meaning" of sentences:
  - i.e., define the truth of a sentence in a world.
- E.g., the language of arithmetic:
  - x + 2 ≥ y is a sentence; x2y+≥ is not a sentence
  - x + 2 ≥ y is true iff the number x + 2 is no less than the number y
  - x + 2 ≥ y is true in a world where x = 7, y = 1
  - x + 2 ≥ y is false in a world where x = 0, y = 6
15 Entailment
- Entailment means that one thing follows from another: KB ⊨ α
- Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true.
- E.g., the KB containing "Milan won" and "Inter won" entails "Either Milan won or Inter won".
- E.g., x + y = 4 entails 4 = x + y.
- Entailment is a relationship between sentences (i.e., syntax) that is based on semantics.
16 Models
- Logicians typically think in terms of models, which are formally structured worlds with respect to which truth can be evaluated.
- We say m is a model of a sentence α if α is true in m.
- M(α) is the set of all models of α.
- Then KB ⊨ α iff M(KB) ⊆ M(α).
17 Inference, Soundness, Completeness
- KB ⊢i α: sentence α can be derived from KB by procedure i.
- Soundness: i is sound if whenever KB ⊢i α, it is also true that KB ⊨ α.
- Completeness: i is complete if whenever KB ⊨ α, it is also true that KB ⊢i α.
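A brute-force way to test KB ⊨ α in propositional logic is to enumerate every model and check that M(KB) ⊆ M(α). A minimal sketch (the symbols and sentences are illustrative; sentences are encoded as Python functions from a model to a truth value):

```python
from itertools import product

def entails(kb, alpha, symbols):
    """KB |= alpha iff alpha is true in every model where KB is true.
    kb and alpha map a model (dict symbol -> bool) to a bool."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False  # a model of KB that is not a model of alpha
    return True

# KB = {P, P => Q}; does KB entail Q?
kb = lambda m: m["P"] and ((not m["P"]) or m["Q"])   # P and (P => Q)
alpha = lambda m: m["Q"]
print(entails(kb, alpha, ["P", "Q"]))                # True
print(entails(lambda m: m["P"], alpha, ["P", "Q"]))  # False: P alone does not entail Q
```

Enumeration is exponential in the number of symbols, which is exactly the worst-case complexity the DARPA chart earlier in the deck refers to.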
18 Knowledge Representation: defined by syntax and semantics
[Diagram: inside the agent, inference maps assertions (the knowledge base) to conclusions; semantics maps both to facts in the real world, where the asserted facts imply the concluded facts.]
19 Schematic perspective
If KB is true in the real world, then any sentence α derived from KB by a sound inference procedure is also true in the real world.
20 Logic as a KR language
[Diagram: a family of logics building on propositional logic — first-order, higher-order, modal, temporal, non-monotonic, multi-valued, probabilistic, and fuzzy logic.]
- Propositional logic studies ways of joining and/or modifying entire propositions, statements or sentences to form more complicated ones, as well as the logical relationships and properties that are derived from these methods of combining or altering statements.
- First-order logic is a system of deduction extending propositional logic by the ability to express relations between individuals (e.g. people, numbers, and "things") more generally.
- A higher-order logic allows quantification over predicates. A higher-order predicate is a predicate that takes one or more other predicates as arguments.
- A modal logic is any logic for handling modalities: concepts like possibility, existence, necessity, eventually, formerly, can, could, might, may, must, etc.
- Temporal logic describes any system of rules and symbolism for representing, and reasoning about, propositions qualified in terms of time.
- A non-monotonic logic is a formal logic in which adding a formula to a theory may reduce its set of consequences.
- A multi-valued logic has more than two truth values.
- In probabilistic logic, the truth values of sentences are probabilities.
- Fuzzy logic is derived from fuzzy set theory, dealing with reasoning that is approximate rather than precisely deduced from classical first-order logic.
21Ontology and epistemology
22 Propositional logic: Syntax
- Propositional logic is the simplest logic; it illustrates the basic ideas.
- The proposition symbols P1, P2, etc. are sentences.
- If S is a sentence, ¬S is a sentence (negation).
- If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction).
- If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction).
- If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication).
- If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional).
23 Propositional logic: Semantics
- Each model/world specifies true or false for each proposition symbol.
- Rules for evaluating truth with respect to a model m:
  - ¬S is true iff S is false
  - S1 ∧ S2 is true iff S1 is true and S2 is true
  - S1 ∨ S2 is true iff S1 is true or S2 is true
  - S1 ⇒ S2 is true iff S1 is false or S2 is true
    - i.e., it is false iff S1 is true and S2 is false
  - S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true
- A simple recursive process evaluates an arbitrary sentence, e.g.,
  ¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
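The recursive evaluation process is easy to make concrete. The tuple encoding of sentences below is an illustrative choice, not a standard API:

```python
def eval_sentence(s, model):
    """Recursively evaluate a propositional sentence in a model.
    Sentences are tuples: ('not', s), ('and', s1, s2), ('or', s1, s2),
    ('implies', s1, s2), ('iff', s1, s2), or a proposition-symbol string."""
    if isinstance(s, str):
        return model[s]          # base case: look the symbol up in the model
    op = s[0]
    if op == "not":
        return not eval_sentence(s[1], model)
    a, b = eval_sentence(s[1], model), eval_sentence(s[2], model)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "implies":
        return (not a) or b      # false only when a is true and b is false
    if op == "iff":
        return a == b
    raise ValueError(f"unknown connective {op!r}")

# The slide's example: ¬P12 ∧ (P22 ∨ P31) in a model where P12 is false,
# P22 is true and P31 is false.
model = {"P12": False, "P22": True, "P31": False}
sentence = ("and", ("not", "P12"), ("or", "P22", "P31"))
print(eval_sentence(sentence, model))  # True
```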
24 Logical equivalence
- To manipulate logical sentences we need some rewrite rules.
- Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ⊨ β and β ⊨ α.
- You need to know these!
25 Pros and cons of propositional logic
- (+) Propositional logic is declarative.
- (+) Propositional logic allows partial/disjunctive/negated information (unlike most data structures and databases).
- (+) Propositional logic is compositional: the meaning of B1,1 ∧ P1,2 is derived from the meaning of B1,1 and of P1,2.
- (+) Meaning in propositional logic is context-independent (unlike natural language, where meaning depends on context).
- (−) Propositional logic has very limited expressive power (unlike natural language).
26First-order logic
- Propositional logic assumes the world contains
facts - First-order logic (like natural language) assumes
the world contains - Objects people, houses, numbers, colors,
baseball games, wars, centuries - Relations red, round, prime, brother of, bigger
than, part of, comes between, - Functions father of, best friend, one more than,
plus,
27 Syntax of FOL: Basic elements
- Constants: KingJohn, 2, Penn, ...
- Predicates: Brother, >, ...
- Functions: Sqrt, LeftLegOf, ...
- Variables: x, y, a, b, ...
- Connectives: ¬, ∧, ∨, ⇒, ⇔
- Equality: =
- Quantifiers: ∀, ∃
28 Atomic sentences
- Term: function(term1, ..., termn) or constant or variable
- Atomic sentence: predicate(term1, ..., termn) or term1 = term2
- For example:
  - Brother(KingJohn, RichardTheLionheart)
  - >(Length(LeftLegOf(Richard)), Length(LeftLegOf(KingJohn)))
29 Complex sentences
- Complex sentences are made from atomic sentences using connectives:
  ¬S, S1 ∧ S2, S1 ∨ S2, S1 ⇒ S2, S1 ⇔ S2
- For example:
  - Sibling(KingJohn, Richard) ⇒ Sibling(Richard, KingJohn)
30 Universal quantification
- ∀⟨variables⟩ ⟨sentence⟩
- "Everyone at Penn is smart": ∀x At(x, Penn) ⇒ Smart(x)
- ∀x P is true in a model m iff P is true with x being each possible object in the model.
- Roughly speaking, equivalent to the conjunction of instantiations of P:
  (At(KingJohn, Penn) ⇒ Smart(KingJohn))
  ∧ (At(Richard, Penn) ⇒ Smart(Richard))
  ∧ (At(Penn, Penn) ⇒ Smart(Penn))
  ∧ ...
31 Existential quantification
- ∃⟨variables⟩ ⟨sentence⟩
- "Someone at Penn is smart": ∃x At(x, Penn) ∧ Smart(x)
- ∃x P is true in a model m iff P is true with x being some possible object in the model.
- Roughly speaking, equivalent to the disjunction of instantiations of P:
  (At(KingJohn, Penn) ∧ Smart(KingJohn))
  ∨ (At(Richard, Penn) ∧ Smart(Richard))
  ∨ (At(Penn, Penn) ∧ Smart(Penn))
  ∨ ...
32 Properties of quantifiers
- ∀x ∀y is the same as ∀y ∀x
- ∃x ∃y is the same as ∃y ∃x
- ∃x ∀y is not the same as ∀y ∃x
  - ∃x ∀y Loves(x, y): "There is a person who loves everyone in the world"
  - ∀y ∃x Loves(x, y): "Everyone in the world is loved by at least one person"
- Quantifier duality: each can be expressed using the other:
  ∀x Likes(x, IceCream) ≡ ¬∃x ¬Likes(x, IceCream)
  ∃x Likes(x, Broccoli) ≡ ¬∀x ¬Likes(x, Broccoli)
33 Using FOL
- Brothers are siblings:
  ∀x,y Brother(x, y) ⇒ Sibling(x, y)
- One's mother is one's female parent:
  ∀m,c Mother(c) = m ⇔ (Female(m) ∧ Parent(m, c))
- Sibling is symmetric:
  ∀x,y Sibling(x, y) ⇔ Sibling(y, x)
- A first cousin is a child of a parent's sibling:
  ∀x,y FirstCousin(x, y) ⇔ ∃p,ps Parent(p, x) ∧ Sibling(ps, p) ∧ Parent(ps, y)
34 Wumpus world
- Performance measure: gold +1000, death −1000, −1 per step, −10 for using the arrow
- Environment:
  - squares adjacent to the Wumpus are smelly
  - squares adjacent to pits are breezy
  - glitter iff gold is in the same square
  - shooting kills the Wumpus if you are facing it
  - shooting uses up the only arrow
  - grabbing picks up gold if in the same square
  - releasing drops the gold in the same square
- Sensors: breeze, glitter, smell
- Actuators: left turn, right turn, forward, grab, release, shoot
35 Wumpus world
- A four-by-four cave with locations identified by coordinates (3,4), etc.
- The agent is at a location, facing a particular direction (L, R, D, U).
- The agent starts at (1,1) facing R.
[Diagram: 4×4 grid with the agent at (1,1)]
36 Wumpus world
- In the cave is a Wumpus that smells:
  - It can kill the agent if at the same location.
  - It can be killed by the agent shooting an arrow while facing the Wumpus. When the Wumpus dies, it SCREAMs.
37 Wumpus world
- In the cave are:
  - 3 pits. Breezes blow from pits. If an agent steps into a pit, it falls to its death.
  - A heap of gold that glitters.
38 Wumpus world
- Agent goal: get the gold and get out alive.
- Agent actions:
  - Move forward one square in the current direction (Fwd)
  - Turn left or right 90° (TL, TR)
  - Shoot arrow in current direction
  - Grab gold
- Agent perceptions at each location: Stench, Breeze, Glitter, Bump, Scream
39 Wumpus world
- The cave is created randomly (location of Wumpus, pits and gold).
- Perception / action loop.
- The agent must construct a model (knowledge base) of the cave as it tries to achieve its goal.
40Wumpus world knowledge
- General knowledge (known at start)
- Location and direction
- Living
- Grab and holding
- Wumpus and stench, shooting, scream, life
- Pits and breeze
- Gold and glitter
- Movement and location, direction and bumps
- Starting state of agent
- Goal
- Facts (not known)
- Location of Wumpus, pits, gold
41 Wumpus world characterization
- Observable? No: only local perception.
- Deterministic? Yes: outcomes are explicit.
- Episodic? No: actions are sequential.
- Discrete? Yes.
- Single-agent? Yes.
42–46 Exploring Wumpus world
[Sequence of diagrams: starting from (1,1) the agent (A) perceives a breeze (B), then a stench (S); marking safe squares "ok", it infers the locations of a pit (P) and the Wumpus (W).]
How can we make these inferences automatically?
47 Wumpus world in propositional logic
- Facts are propositions:
  - e.g., W44 = "Wumpus is at square (4,4)"
- 96 propositions (16 each for wumpus, stench, pit, breeze, gold, glitter) to represent a particular cave.
- General knowledge in sentences:
  - e.g., W44 ⇒ (S44 ∧ S43 ∧ S34): if the Wumpus is at (4,4), there is stench at (4,4), (4,3) and (3,4)
  - many sentences are needed
48Wumpus world in propositional logic
- Facts that may change
- Location of agent, direction of agent
- Agent holding gold
- Agent has shot arrow
- Agent, Wumpus are alive
49 Wumpus world in FOL
- the objects in the environment
- terms: constants, variables, functions
- constants:
  - times: 0, 1, 2, ...
  - headings: R, L, D, U
  - coordinates: 1, 2, 3, 4
  - locations: 16 squares
  - percepts: Stench, Breeze, Glitter, Bump, Scream, None
  - actions: Turn(Left), Turn(Right), Forward, Grab, Shoot
  - Agent, Wumpus
50Wumpus world in FOL
- the objects in the environment
- terms constants, variables, functions
- functions
- Square(x,y)
- Home(Wumpus)
- Perception(s, b, g, h, y)
- Heading(t), Location(t)
51 Wumpus world in FOL
- the basic knowledge
- atomic sentences: predicates, term = term
- predicates (true or false):
  - properties (of one term/object):
    - Breezy(t) // agent feeling breeze at time t
    - Breeze(s) // breeze blowing on square s
    - Pit(s), Gold(s), etc.
    - Time(x) // object x is a time
    - Coordinate(x), Action(x), Heading(x), etc.
52 Wumpus world in FOL
- the basic knowledge
- atomic sentences: predicates, term = term
- predicates (true or false):
  - relations (of multiple terms/objects):
    - At(s,t) // agent on square s at time t
    - Adjacent(r,s) // squares r and s are adjacent
    - Alive(x,t) // x is alive at time t
    - Percept(p,t) // perception p at time t
    - BestAction(a,t) // action a to take at time t
53 Wumpus world in FOL
- the basic knowledge
- atomic sentences: predicates, term = term
- term = term (true or false):
  - Home(Wumpus) = (3,3)
  - Heading(5) = U
54 Exploring Wumpus world in FOL
Deciding the best action (incomplete description here) requires reasoning about the cave conditions.
Diagnostic rules: ∀s Breezy(s) ⇒ ∃r Adjacent(r,s) ∧ Pit(r)
Causal rules: ∀r Pit(r) ⇒ (∀s Adjacent(r,s) ⇒ Breezy(s))
55 Rules as a Knowledge Representation Formalism
- What is a rule? A statement that specifies that:
  - if a determined logical combination of conditions is satisfied,
  - over the set of an agent's percepts and/or facts in its Knowledge Base (KB)
  - that represent the current, past and/or hypothetical future of its environment model, its goals and/or its preferences,
  - then a logic-temporal combination of actions can or must be executed by the agent,
  - directly on its environment (through actuators) or on the facts in its KB.
- A KB agent such that the persistent part of its KB consists entirely of such rules is called a rule-based agent.
- In that case, the inference engine used by the KB agent is an interpreter or a compiler for a specific rule language.
56 Rule-Based Agent
[Diagram: sensors TELL facts from the environment into the agent; the rule engine ASKs and RETRACTs facts against the rule base and drives the effectors.]
- Rule Engine:
  - domain-class independent
  - only dependent on the rule language
  - declarative code interpreter or compiler
- Rule Base:
  - persistent intentional knowledge
  - domain-class dependent
  - declarative code
57 Rule examples
- Examples in semi-natural-language syntax:
  - IF P sells a W to N AND W is a weapon AND N is a nation AND N is hostile
    THEN P is a criminal
  - IF P is a criminal AND L is the location of P
    THEN call the police AND report "P is a criminal" AND report L
58 Toulmin's Argumentation Scheme
[Diagram, from cognitive science: facts lead ("therefore") to a qualified inference result; the step is licensed by an inference rule ("because of"), which rests on a support; an exception rule ("if not") can defeat the inference.]
59 Toulmin's argumentation scheme: example
- Facts: the offered used car is old.
- Therefore (qualified result): probably, the offered used car is cheap.
- If not (exception rule): the offered used car is a collector's item.
- Because of (inference rule): used cars are cheap most of the time.
- Support: used things lose their value as time goes by, because they break down more often, etc.
60 Basic principles of XPS (1)
Every production rule has two parts: A → B.
- A: assumption, antecedent, evidence, if-part, left-hand side (LHS), condition.
- B: conclusion, consequence, hypothesis, then-part, right-hand side (RHS), action.
Productions are evaluated over a (data) pool (not a database!) named working memory (WM), or, in applications in cognitive psychology, denoted short-term memory (STM).
61 Basic principles of XPS (2)
There are two modes of evaluating production rules A → B:
- Forward chaining: data-controlled inference, antecedent-oriented inference, bottom-up inference, "if-added" methods, LHS-controlled chaining.
- Backward chaining: goal-controlled inference, consequence-oriented inference, top-down inference, "if-needed" methods, RHS-controlled chaining.
62 Production Rule Systems
Facts:
((car_no DÜW-AW 205) (motor_status on) (oil_control on) (air_pressure 0.1 bar) ...)
Rules:
(1) IF (motor_status on) AND (oil_control on)
    THEN WRITE("Stop motor") AND SET (motor_status off)
(2) IF (car_no x) AND (air_pressure y) AND (LESS y 1.5)
    THEN WRITE("x has a flat tire")
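A minimal forward-chaining interpreter in the spirit of the rules above can be sketched as follows; the working-memory format and rule encoding are illustrative, not from a real rule engine.

```python
def forward_chain(wm, rules):
    """Repeatedly fire rules whose conditions hold until nothing changes.
    wm: dict attribute -> value (working memory).
    rules: list of (name, condition, action); condition tests wm,
    action returns a dict of attribute updates."""
    fired = []
    changed = True
    while changed:
        changed = False
        for name, cond, action in rules:
            if name not in fired and cond(wm):
                wm.update(action(wm))  # write the rule's effects into WM
                fired.append(name)
                changed = True         # new facts may enable more rules
    return fired

wm = {"motor_status": "on", "oil_control": "on", "air_pressure": 0.1}
rules = [
    ("stop-motor",
     lambda m: m["motor_status"] == "on" and m["oil_control"] == "on",
     lambda m: {"motor_status": "off", "message": "Stop motor"}),
    ("flat-tire",
     lambda m: m["air_pressure"] < 1.5,
     lambda m: {"tire": "flat"}),
]
print(forward_chain(wm, rules))        # ['stop-motor', 'flat-tire']
print(wm["motor_status"], wm["tire"])  # off flat
```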
63 General Structure of Production Rules
Simple rules:
IF B1 ∧ B2 ∧ ... ∧ Bn THEN A1 ∧ A2 ∧ ... ∧ Am ELSE C1
IF B1 ∧ B2 ∧ ... ∧ Bn THEN DO A1 ∧ A2 ∧ ... ∧ Am ELSEDO C1
Example:
IF the site of the culture is throat AND the organism is streptococcus
THEN there is strong evidence that the subtype is not of group D
64 Architecture of a Production System
[Diagram of the recognize-act cycle: the rule interpreter matches the production rules (C1 ∧ C2 → A1, C3 → A2, C1 ∧ C3 → A3, C4 → A4, C5 → A5) against the data base (containing C5, C1, C3); the matching rules (C3 → A2, C1 ∧ C3 → A3, C5 → A5) form the conflict set; one rule (C3 → A2) is selected and its action A2 is evaluated.]
65 Rules with Certainty Factors
IF C1(w1) ∧ C2(w2) ∧ ... ∧ Cn(wn) THEN DO A(W)
Example:
IF the organism is gram-positive AND the organism grows in chains AND the morphology is spherical
THEN with 0.7 evidence the organism is streptococcus
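One common way to propagate such weights is MYCIN-style certainty-factor arithmetic; the sketch below covers only positive certainty factors, and the condition weights are illustrative.

```python
def rule_cf(condition_cfs, rule_weight):
    """CF of a conclusion: the weakest condition scaled by the rule's weight."""
    return rule_weight * min(condition_cfs)

def combine(cf1, cf2):
    """Combine two positive CFs for the same conclusion from parallel rules."""
    return cf1 + cf2 * (1.0 - cf1)

# The slide's rule: three conditions believed with CF 0.9, 0.8 and 1.0,
# rule weight 0.7 ("with 0.7 evidence the organism is streptococcus").
cf = rule_cf([0.9, 0.8, 1.0], 0.7)
print(round(cf, 3))                 # 0.56
# A second rule concluding the same with CF 0.5 raises the total belief:
print(round(combine(cf, 0.5), 3))   # 0.78
```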
66 Structured Rules (mapping relations)
Parts: Condition, Action, Default, Context.
Example (causal relations):
IF (COND serious diarrhea AND longer than two days)
   (CTXT malabsorption)
   (DFLT no bicarbonate therapy)
THEN medium metabolic acidosis with normal anions
ELSE light metabolic acidosis with normal anions
68 What is AI Planning?
- Generate sequences of actions to perform tasks and achieve objectives.
- Until recently, AI planning was essentially a theoretical endeavor; it is now becoming useful in industrial applications.
- Example application areas:
  - military operations logistics
  - space exploration
  - proof planning in mathematics
  - speech and dialog planning
  - agent behavior planning
69 Planning Involves
- Given knowledge about the task domain (actions),
- given a problem specified by an initial state configuration and goals to achieve,
- the agent tries to find a solution, i.e. a sequence of actions that solves the problem.
[Diagram: an agent in Room 1 must act to reach a goal in Room 2.]
70 Notions
- Plan: a sequence of actions transforming the initial state into a final state.
- Operators: representations of actions.
- Planner: an algorithm that generates a plan from a (partial) description of initial and final state and from a specification of operators.
[Diagram: in Room 1 / Room 2, actions such as "go to the can" and "go to the basket".]
71 The Blocks World in Reality
72 What is a Planning Problem?
- A planning problem is given by an initial state and a goal state.
- Example initial state: ontable(B), ontable(C), on(D,B), on(A,D), clear(A), clear(C), handempty; the goal is a different stack configuration (shown as a diagram).
- For a transition there are certain operators available:
  - PICKUP(x): picking up x from the table
  - PUTDOWN(x): putting down x on the table
  - STACK(x, y): putting x on y
  - UNSTACK(x, y): picking up x from y
73 Representing States of the World
- State: a consistent assignment of TRUE or FALSE to every literal in the universe.
- State description: a set of ground literals that are all taken to be TRUE, e.g.
  on(c,a), ontable(a), clear(c), ontable(b), clear(b), handempty
  [Diagram: block c on block a; block b on the table]
- The negations of these literals are taken to be false.
- Truth values of other ground literals are unknown.
74 STRIPS Operators (with negation)
STRIPS = Stanford Research Institute Problem Solver
Name: name(v1, v2, ..., vn)
Preconditions: atom1, atom2, ..., atomn
Effects: literal1, literal2, ..., literalm
Example:
Name: unstack(?x,?y)
Preconditions: on(?x,?y), clear(?x), handempty
Effects: ¬on(?x,?y), ¬clear(?x), ¬handempty, holding(?x), clear(?y)
- Operator instance: replacement of variables by constants.
75 Example: The Blocks World
unstack(?x,?y)
  Pre: on(?x,?y), clear(?x), handempty
  Eff: ¬on(?x,?y), ¬clear(?x), ¬handempty, holding(?x), clear(?y)
stack(?x,?y)
  Pre: holding(?x), clear(?y)
  Eff: ¬holding(?x), ¬clear(?y), on(?x,?y), clear(?x), handempty
pickup(?x)
  Pre: ontable(?x), clear(?x), handempty
  Eff: ¬ontable(?x), ¬clear(?x), ¬handempty, holding(?x)
putdown(?x)
  Pre: holding(?x)
  Eff: ¬holding(?x), ontable(?x), clear(?x), handempty
76 Plans
- STRIPS planning domain:
  - a language L (choose the predicate and constant symbols)
  - a set of planning operators (e.g., the blocks-world operators)
- Plan: a sequence P = (o1, o2, ..., ok) of ground instances of operators, e.g.
  ⟨unstack(c,a), putdown(c), pickup(a), stack(a,c)⟩
- Each oi is called a step of P.
77 Planning Problems
- STRIPS planning problem: a triple R = (i, g, O)
  - i is the initial state description, e.g. clear(c), on(c,a), ontable(a), clear(b), ontable(b), handempty
  - g is the goal, e.g. on(a,c), ontable(c)
  - O is a set of planning operators
- P is a correct plan for R if g is true in result(i, P).
- Candidate plans:
  - ⟨unstack(c,a), putdown(c), pickup(a), stack(a,c)⟩
  - ⟨unstack(c,a), putdown(c), pickup(a), stack(a,b)⟩
  - ⟨pickup(a), stack(a,c), unstack(c,a), putdown(c)⟩
78 Search Space
[Diagram: the state space of the three-blocks world. From the state {CLEAR(A), ONTABLE(A), CLEAR(B), ONTABLE(B), CLEAR(C), ONTABLE(C), HANDEMPTY}, pickup/putdown actions lead to states holding one block (e.g. {CLEAR(B), CLEAR(C), HOLDING(A), ONTABLE(B), ONTABLE(C)}); stack/unstack actions then connect to states with one block on another, and further to the six fully stacked towers such as {CLEAR(A), ON(A,B), ON(B,C), ONTABLE(C), HANDEMPTY}. Every action is reversible, so each edge appears in both directions.]
79 State-Space Search
State-space planning is a search in the space of states.
[Diagram: paths of blocks-world states with blocks A, B, C, leading from the initial state to the goal.]
80State-Space Search Vacuum World example
Initial state
Goal
81 Depth-first search
- Not necessarily the shortest path.
- Limited memory requirement.
82Depth-First search example
83 Breadth-first search
- Finds the shortest path.
- Large memory requirement.
84Breadth-First search example
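The two strategies differ only in how the frontier is managed: a queue gives breadth-first search, a stack gives depth-first search. A sketch over an illustrative adjacency-map graph:

```python
from collections import deque

def search(graph, start, goal, frontier_pop):
    """Generic graph search; frontier_pop decides the strategy."""
    frontier = deque([[start]])      # frontier holds whole paths
    visited = {start}
    while frontier:
        path = frontier_pop(frontier)
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
bfs = search(graph, "A", "E", lambda f: f.popleft())  # queue -> breadth-first
dfs = search(graph, "A", "E", lambda f: f.pop())      # stack -> depth-first
print(bfs)  # ['A', 'B', 'D', 'E']
print(dfs)  # ['A', 'C', 'D', 'E']
```

In this small graph both paths happen to have the same length, but only BFS is guaranteed to return a shortest one.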
85 Both depth-first and breadth-first search can be:
- Forward (from the initial state to the goal)
- Backward (from the goal to the initial state)
- Bi-directional (from both starting points until a meeting point)
86 Bi-directional search
A bidirectional search is about to succeed when a branch from the start node meets a branch from the goal node. The motivation is that the area of the two small circles is less than the area of one big circle centered on the start and reaching to the goal.
87 Depth-Limited and Iterative Deepening search
- Usually, breadth-first search requires too much memory to be practical.
- Main problem with depth-first search: it can follow a dead-end path very far before this is discovered.
- Depth-Limited search:
  - impose a depth limit l
  - never explore nodes at depth > l
- Iterative Deepening search is depth-limited search with an increasing limit:
  - the solution improves with more computation time.
88Depth-Limited search (limit 3) example
89Iterative Deepening search example
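Iterative deepening simply reruns a depth-limited search with limits 0, 1, 2, ...; a sketch over the same kind of illustrative adjacency-map graph as before:

```python
def depth_limited(graph, node, goal, limit, path=None):
    """Depth-first search that never explores below the depth limit."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                  # cut off at the depth limit
    for nxt in graph.get(node, []):
        if nxt not in path:          # avoid cycles along the current path
            found = depth_limited(graph, nxt, goal, limit - 1, path)
            if found:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):   # limits 0, 1, 2, ...
        found = depth_limited(graph, start, goal, limit)
        if found:
            return found                 # first hit uses the smallest limit
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(iterative_deepening(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Because the limit grows one level at a time, the first solution found is as shallow as any BFS solution, while memory use stays that of DFS.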
90 Uniform-Cost search
- Uniform-cost search is a search algorithm for traversing a weighted tree or graph. The search begins at the start node and continues by visiting the next node with the least total cost from the root.
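"Visit the next node with the least total cost" is naturally implemented with a priority queue; a sketch over an illustrative weighted graph:

```python
import heapq

def uniform_cost(graph, start, goal):
    """Expand nodes in order of least path cost from the start."""
    frontier = [(0, start, [start])]   # (cost so far, node, path)
    best = {start: 0}                  # cheapest known cost per node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nxt, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None

graph = {"A": [("B", 1), ("C", 4)],
         "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)]}
print(uniform_cost(graph, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```

Note that the direct-looking edges A→C (cost 4) and B→D (cost 5) are skipped in favor of the cheaper detour through B and C.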
91 Partial Plans
- Partial plan: a partially ordered set of operator instances.
- The partial order gives only some constraints on the order in which the operations have to be performed.
- Start: a dummy operator; Finish: another dummy operator.
[Diagram: after Start, two partially ordered branches — unstack(c,a), putdown(c), pickup(a) and pickup(b), stack(b,c) — lead to stack(a,b) and then Finish.]
92Partial Plan Example
SM = Super Market, HS = Hardware Store
- At(Home)
- Sells(SM, Banana)
- Sells(HS, Drill)
- Have(Drill)
- Have(Milk)
- Have(Banana)
93Partial Plan Example
94 GraphPlan: the basic idea
1. Construct the (initial) planning graph.
2. Extract a solution (if possible) with fast graph-search algorithms.
3. Else expand the graph and go to 2.
95 The Planning Graph
- Alternating layers of ground literals and actions (ground instances of operators), representing the literals and actions that might occur at each time step.
- Level 0 holds the literals that are true in the initial state; an action at level i is connected to its preconditions among the literals at level i−1 and to its effects among the literals at level i+1; each literal level lists what might be true at that time step.
- Maintenance actions (no-ops) carry every literal forward unchanged from one level to the next.
96 Mutual Exclusion
- Two actions at the same level are mutex if:
  - Inconsistent effects: an effect of one negates an effect of the other.
  - Interference: one deletes a precondition of the other.
  - Competing needs: they have mutually exclusive preconditions.
- Two literals are mutex if:
  - Inconsistent support: one is the negation of the other, or all ways of achieving them are pairwise mutex.
97 The 8th of March Example
- Suppose you want to clean the room and prepare dinner as a surprise for your sweetheart, who is asleep.
Initial conditions: (and (garbage) (cleanHands) (quiet))
Goal: (and (dinner) (surprise) (not (garbage)))
Actions:
  cook:   precondition (cleanHands), effect (dinner)
  serve:  precondition (quiet), effect (surprise)
  clean:  precondition none, effect (and (not (garbage)) (not (cleanHands)))
  vacuum: precondition none, effect (and (not (garbage)) (not (quiet)))
98 The Graph for this Example (1)
- Generate the first two levels of the planning graph: literal level 0 {garb, cleanH, quiet}; action level 1 {clean, vacuum, cook, serve, plus no-ops}; literal level 2 {garb, ¬garb, cleanH, ¬cleanH, quiet, ¬quiet, dinner, surprise}.
- clean is mutex with garbage (inconsistent effects).
- vacuum is mutex with serve (interference).
- ¬quiet is mutex with surprise (inconsistent support).
99 Extraction of a Solution for the Example (1)
- Check to see whether there's a possible plan. Recall that the goal is (and (dinner) (surprise) (not (garbage))).
- Note that:
  - All goal literals are present at level 2.
  - None are mutex with each other.
- Thus there is a chance that a plan exists.
100 Solution Extraction for the Example (2)
- There are two sets of actions that could achieve the goals at level 2.
- Neither works: both sets contain actions that are mutex.
101 Solution Extraction Example (3)
- Go back and do more graph extension: generate two more levels (action level 3 and literal level 4).
102 Example: Solution extraction (4)
- Twelve combinations at level 4:
  - Three ways to achieve ¬garb
  - Two ways to achieve dinner
  - Two ways to achieve surprise
103 Example: Solution extraction (5)
- Call Solution-Extraction recursively at level 2: one combination works, so we have got a plan.
104 Constraint Satisfaction Problems
- A set of variables; each variable has a range of possible values.
- A set of constraints; find values for the variables that satisfy all the constraints.
- Dynamic constraint satisfaction problems: when we select values for some variables, this changes what the remaining variables and constraints are.
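A minimal backtracking solver makes the definition concrete: assign variables one at a time and undo any assignment that violates a constraint. The map-coloring instance below is an illustrative example, not from the slides.

```python
def solve(domains, constraints, assignment=None):
    """Backtracking search over variable domains with binary constraints.
    constraints: dict (varA, varB) -> predicate on their two values."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment                      # every variable assigned
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Check every constraint whose two variables are both assigned.
        if all(check(assignment[a], assignment[b])
               for (a, b), check in constraints.items()
               if a in assignment and b in assignment):
            result = solve(domains, constraints, assignment)
            if result:
                return result
        del assignment[var]                    # undo and try the next value
    return None

# Color three mutually adjacent regions so that neighbors differ.
domains = {"A": ["red", "green", "blue"],
           "B": ["red", "green", "blue"],
           "C": ["red", "green", "blue"]}
neq = lambda x, y: x != y
constraints = {("A", "B"): neq, ("B", "C"): neq, ("A", "C"): neq}
solution = solve(domains, constraints)
print(sorted(solution.values()))  # ['blue', 'green', 'red']
```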
105 Agent Knowledge Discovery, Classification, Prediction, Multidatabase Mining
Based on tutorials and presentations by J. Han, C. Isik, M. Kamber, A. Logvinovskiy, S. Puuronen, V. Terziyan
106 Data Mining: A KDD Process
- Data mining is the core of the knowledge discovery process.
[Diagram of the KDD pipeline: Databases → data cleaning and integration → Data Warehouse → selection → task-relevant data → Data Mining → pattern evaluation → Knowledge.]
107Data Mining Confluence of Multiple Disciplines
- Database systems
- Statistics
- Machine learning
- Visualization
- Information science
- High performance computing
- Other disciplines
- Neural networks, mathematical modeling,
information retrieval, pattern recognition, etc.
108Introduction to Classification
- Classify data (creating a model) based on the
training set and the values in a classifying
attribute
109 Classification vs. Prediction
- Classification:
  - predicts categorical class labels
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
110 Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  - The set of tuples used for model construction is the training set.
  - The model is represented as classification rules, decision trees, or mathematical formulae.
- Model usage: classifying future or unknown objects
  - Estimate the accuracy of the model:
    - The known label of each test sample is compared with the classified result from the model.
    - The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
    - The test set is independent of the training set, otherwise over-fitting will occur.
111Classification Process (I)
Classification Algorithms
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
112Classification Process (II)
(Jeff, Professor, 4) -> Tenured?
113Supervised vs. Unsupervised Learning
- Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data are classified based on the training set
- Unsupervised learning (clustering)
- We are given a set of measurements, observations, etc., with the aim of establishing the existence of classes or clusters in the data
- No training data, or the training data are not accompanied by class labels
114Data Preparation
- Data cleaning
- Preprocess data in order to reduce noise and
handle missing values - Relevance analysis (feature selection)
- Remove the irrelevant or redundant attributes
- Data transformation
- Generalize and/or normalize data
115Classification Accuracy: Estimating Error Rates
- Partition: training-and-testing
- use two independent data sets, e.g., training set (2/3) and test set (1/3)
- used for data sets with a large number of samples
- Cross-validation
- divide the data set into k subsamples
- use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation)
- for data sets of moderate size
- Bootstrapping (leave-one-out)
- for small data sets
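The estimation schemes above share one mechanic: hold some samples out, train on the rest, and score on the held-out part. A minimal sketch of the k-fold variant, assuming a generic `train`/`predict` pair (these function names are placeholders, not from the slides):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def cross_validate(samples, labels, k, train, predict):
    """Mean accuracy over k train/test splits."""
    folds = k_fold_indices(len(samples), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

With k equal to the number of samples this degenerates into leave-one-out, the small-data case mentioned above.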
116Boosting Techniques
- Boosting increases classification accuracy.
- Learn a series of classifiers, where each
classifier in the series pays more attention to
the examples misclassified by its predecessor - Boosting requires only linear time and constant
space
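The "pay more attention to misclassified examples" idea can be sketched as an AdaBoost-style loop. AdaBoost is one concrete boosting algorithm; the `weak_learners` interface and the use of +1/-1 labels are assumptions for illustration, not the slide's prescription:

```python
import math

def boost(samples, labels, weak_learners):
    """Fit each weak learner on a reweighted sample; return (classifier, vote) pairs."""
    n = len(samples)
    w = [1.0 / n] * n                       # start with uniform example weights
    ensemble = []
    for learn in weak_learners:
        h = learn(samples, labels, w)       # weak classifier fitted under weights w
        err = sum(wi for wi, x, y in zip(w, samples, labels) if h(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote of this classifier
        ensemble.append((h, alpha))
        # raise the weight of misclassified examples, lower the rest, renormalize
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, x, y in zip(w, samples, labels)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def vote(ensemble, x):
    """Weighted majority vote over +1/-1 classifiers."""
    score = sum(alpha * h(x) for h, alpha in ensemble)
    return 1 if score >= 0 else -1
```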
117What is a decision tree?
- A decision tree is a flow-chart-like tree
structure. - Internal node denotes a test on an attribute
- Branch represents an outcome of the test
- All tuples in branch have the same value for the
tested attribute. - Leaf node represents class label or class label
distribution.
118Training Dataset (Example)
Sample:         1   2   3   4   5   6   7   8   9   10  11  12  13  14
buys_computer:  no  no  yes yes yes no  yes no  yes yes yes yes yes no
119How to construct a tree?
- Algorithm
- greedy: make the locally optimal choice at each step, i.e., select the best attribute for each tree node
- top-down, recursive, divide-and-conquer manner
- from root to leaf
- split a node into several branches
- for each branch, recursively run the algorithm
120Output: A Decision Tree for buys_computer
- age = '<30' -> test student?: 'no' -> no; 'yes' -> yes
- age = '30..40' -> yes
- age = '>40' -> test credit rating?: 'fair' -> yes; 'excellent' -> no
121Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
- Tree is constructed in a top-down recursive
divide-and-conquer manner - At start, all the training examples are at the
root - Attributes are categorical (if continuous-valued,
they are discretized in advance) - Examples are partitioned recursively based on
selected attributes - Test attributes are selected on the basis of a
heuristic or statistical measure (e.g.,
information gain) - Conditions for stopping partitioning
- All samples for a given node belong to the same
class - There are no remaining attributes for further
partitioning majority voting is employed for
classifying the leaf - There are no samples left
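The basic algorithm above can be sketched in Python. This is an illustrative ID3-style implementation using information gain (defined on the next slides); representing samples as dicts of categorical attributes is an assumed convention, not the slides' code:

```python
import math
from collections import Counter

def info(labels):
    """I(p, n), generalized to any multiset of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(samples, labels, attr):
    """Information gained by branching on `attr`."""
    split = {}
    for x, y in zip(samples, labels):
        split.setdefault(x[attr], []).append(y)
    remainder = sum(len(ys) / len(labels) * info(ys) for ys in split.values())
    return info(labels) - remainder

def build(samples, labels, attrs):
    """Top-down recursive divide-and-conquer induction."""
    if len(set(labels)) == 1:                       # all samples in one class
        return labels[0]
    if not attrs:                                   # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(samples, labels, a))
    node = {}
    for v in set(x[best] for x in samples):         # one branch per attribute value
        idx = [i for i, x in enumerate(samples) if x[best] == v]
        node[(best, v)] = build([samples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return node
```

On the 14-sample buys_computer data, `info` of the root labels comes out to the 0.940 figure used on slide 126.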
122Decision Tree Construction
- Root {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}: split on age?
- age = '<30' -> {1, 2, 8, 9, 11}: split on student?; 'no' -> {1, 2, 8}: no; 'yes' -> {9, 11}: yes
- age = '30..40' -> {3, 7, 12, 13}: yes
- age = '>40' -> {4, 5, 6, 10, 14}: split on credit rating?; 'fair' -> {4, 5, 10}: yes; 'excellent' -> {6, 14}: no
123Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as
I(p, n) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n))
124Information Gain in Decision Tree Induction
- Assume that using attribute A a set S will be partitioned into sets S1, S2, ..., Sv
- If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is
E(A) = sum over i = 1..v of ((pi + ni)/(p + n)) * I(pi, ni)
- The encoding information that would be gained by branching on A is
Gain(A) = I(p, n) - E(A)
125Attribute Selection
- Root {1, ..., 14}: p = 9, n = 5
- Candidate split on age?:
- age = '<30' -> {1, 2, 8, 9, 11}: p1 = 2, n1 = 3
- age = '30..40' -> {3, 7, 12, 13}: p2 = 4, n2 = 0
- age = '>40' -> {4, 5, 6, 10, 14}: p3 = 3, n3 = 2
126Attribute Selection by Information Gain Computation
- Class P: buys_computer = 'yes'
- Class N: buys_computer = 'no'
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age:
E(age) = (5/14) * I(2, 3) + (4/14) * I(4, 0) + (5/14) * I(3, 2) = 0.694
- Hence Gain(age) = 0.940 - 0.694 = 0.246, and age is selected as the root split (see slide 122)
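The arithmetic can be checked directly; the per-branch class counts (2/3, 4/0, 3/2) are the ones shown on the attribute-selection slide:

```python
import math

def I(p, n):
    """Expected information I(p, n) for a two-class split; 0-count terms contribute 0."""
    t = p + n
    return sum(-c / t * math.log2(c / t) for c in (p, n) if c > 0)

I_root = I(9, 5)                                            # about 0.940
E_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)    # about 0.694
gain_age = I_root - E_age                                   # about 0.246

print(round(I_root, 3), round(E_age, 3), round(gain_age, 3))
```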
127Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
- Example
- IF age = '<30' AND student = 'no' THEN buys_computer = 'no'
- IF age = '<30' AND student = 'yes' THEN buys_computer = 'yes'
- IF age = '30..40' THEN buys_computer = 'yes'
- IF age = '>40' AND credit_rating = 'excellent' THEN buys_computer = 'no'
- IF age = '>40' AND credit_rating = 'fair' THEN buys_computer = 'yes'
128Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
- Standard: even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured
129Bayesian Theorem
- Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem:
P(h|D) = P(D|h) * P(h) / P(D)
- Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost
130Bayesian classification
- The classification problem may be formalized using a-posteriori probabilities
- P(C|X): probability that the sample tuple X = <x1, ..., xk> is of class C
- E.g. P(class = N | outlook = sunny, windy = true, ...)
- Idea: assign to sample X the class label C such that P(C|X) is maximal
131Estimating a-posteriori probabilities
- Bayes' theorem:
- P(C|X) = P(X|C) * P(C) / P(X)
- P(X) is constant for all classes
- P(C) = relative frequency of class C samples
- C such that P(C|X) is maximum = C such that P(X|C) * P(C) is maximum
- Problem: computing P(X|C) is unfeasible!
132Naïve Bayesian Classification
- Naïve assumption: attribute independence
- P(x1, ..., xk|C) = P(x1|C) * ... * P(xk|C)
- If the i-th attribute is categorical, P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C
- If the i-th attribute is continuous, P(xi|C) is estimated through a Gaussian density function
- Computationally easy in both cases
133Play Tennis Example Data
- N: do not play tennis
- P: play tennis
134Play-tennis example: estimating P(xi|C)
135Play-tennis example: classifying X
- An unseen sample X = <rain, hot, high, false>
- P(X|P) * P(P) = P(rain|P) * P(hot|P) * P(high|P) * P(false|P) * P(P) = 3/9 * 2/9 * 3/9 * 6/9 * 9/14 = 0.010582
- P(X|N) * P(N) = P(rain|N) * P(hot|N) * P(high|N) * P(false|N) * P(N) = 2/5 * 2/5 * 4/5 * 2/5 * 5/14 = 0.018286
- Sample X is classified in class N (don't play)
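The same computation, done with exact fractions to confirm the slide's two class scores:

```python
from fractions import Fraction as F

# class scores P(X|C) * P(C) for X = <rain, hot, high, false>,
# using the conditional frequencies from the 14-sample play-tennis table
score_P = F(3, 9) * F(2, 9) * F(3, 9) * F(6, 9) * F(9, 14)
score_N = F(2, 5) * F(2, 5) * F(4, 5) * F(2, 5) * F(5, 14)

print(float(score_P))   # about 0.010582
print(float(score_N))   # about 0.018286 -> classify X as N
```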
136Neural Networks
- Advantages
- prediction accuracy is generally high
- robust, works when training examples contain
errors - output may be discrete, real-valued, or a vector
of several discrete or real-valued attributes - fast evaluation of the learned target function.
- Criticism
- long training time
- difficult to understand the learned function
(weights). - not easy to incorporate domain knowledge
137Artificial Neuron
(Figure) Inputs x0, x1, x2 with weights w0, w1, w2; net input s = sum of xi * wi; output y = f(s)
138A Neural Network
139Network Training
- The ultimate objective of training
- obtain a set of weights that makes almost all the
tuples in the training data classified correctly - Steps
- Initialize weights with random values
- Feed the input tuples into the network one by one
- For each unit
- Compute the net input to the unit as a linear
combination of all the inputs to the unit - Compute the output value using the activation
function - Compute the error
- Update the weights and the bias
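The training steps above, sketched for a single unit. The sigmoid activation, learning rate, and epoch count are illustrative choices, not prescribed by the slide:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_unit(samples, targets, epochs=1000, rate=0.5, seed=0):
    """Train one sigmoid unit: init random weights, then repeat
    (net input -> output -> error -> weight/bias update) over the tuples."""
    rng = random.Random(seed)
    n = len(samples[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]   # random initial weights
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            s = sum(wi * xi for wi, xi in zip(w, x)) + bias   # net input
            y = sigmoid(s)                                    # output value
            err = t - y                                       # error
            delta = err * y * (1 - y)                         # error * activation slope
            w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
            bias += rate * delta
    return w, bias

def predict(w, bias, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
```

Trained on the four input/target pairs of a logical OR, the unit ends up answering below 0.5 for (0, 0) and above 0.5 for the other inputs.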
140Learning Paradigms
- (1) Supervised: adjust weights using Error = Desired - Actual
- (2) Unsupervised: adjust weights using reinforcement
(Figure) Inputs feed the network, which produces the Actual Output
141Training Neural Network
142Classification Example
(Figure) Training points plotted in the (x1, x2) plane
143Equation of a Line
2*x1 + 3*x2 - 6 = 0
(Figure) The line crosses the x1-axis at 3 and the x2-axis at 2; points on one side satisfy 2*x1 + 3*x2 - 6 > 0, points on the other satisfy 2*x1 + 3*x2 - 6 < 0
144Neural Classifier
(Figure) A single neuron with bias input x0 = 1 and weights w0 = -6, w1 = 2, w2 = 3; net input s = sum of xi * wi; output y = sgn(s), so y = 1 on one side of the line 2*x1 + 3*x2 - 6 = 0 and y = -1 on the other
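Written out as code, the neuron simply reports which side of the line from slide 143 a point falls on:

```python
def classify(x1, x2):
    """Threshold neuron with weights (w0, w1, w2) = (-6, 2, 3) and bias input x0 = 1."""
    s = -6 * 1 + 2 * x1 + 3 * x2     # net input s = sum of xi * wi
    return 1 if s >= 0 else -1       # sign (threshold) activation

print(classify(3, 2))   # 2*3 + 3*2 - 6 = 6 >= 0, so 1
print(classify(0, 0))   # -6 < 0, so -1
```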
145Genetic Algorithms
- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits
- An initial population is created consisting of randomly generated rules
- Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
- Offspring are generated by crossover and mutation
146Genetic Algorithms
147Example: Initial Population
- Each training sample is encoded as a 12-bit string: three bits each for age and income, two bits each for student, credit_rating, and the class buys_computer (yes = 10, no = 01)

Sample  Encoding          b_c
1       100 100 01 01 01  no
2       100 100 01 10 01  no
3       010 100 01 01 10  yes
4       001 010 01 01 10  yes
5       001 001 10 01 10  yes
6       001 001 10 10 01  no
7       010 001 10 10 10  yes
8       100 010 01 01 01  no
9       100 001 10 01 10  yes
10      001 010 10 01 10  yes
11      100 010 10 10 10  yes
12      010 010 01 10 10  yes
13      010 100 10 01 10  yes
14      001 010 01 10 01  no
148Example: Generated Rule
IF age = '<30' AND student = 'no' THEN buys_computer = 'no'
001 111 01 11 01
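Crossover and mutation on such bit-string rules can be sketched as follows; the two parent strings below reuse sample encodings from slide 147, and the crossover point and mutation rate are illustrative:

```python
import random

def crossover(a, b, point):
    """Single-point crossover of two equal-length bit strings."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return ''.join(c if rng.random() >= rate else ('1' if c == '0' else '0')
                   for c in bits)

# parents: encodings of samples 1 and 3, swapped after the 6th bit
child1, child2 = crossover('100100010101', '010100010110', 6)
print(child1)   # 100100010110
print(child2)   # 010100010101
```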
149Instance-Based Methods
- Instance-based learning Store training examples
and delay the processing (lazy evaluation)
until a new instance must be classified. - Typical approaches
- k-nearest neighbor approach
- Instances represented as points in a Euclidean
space. - Locally weighted regression
- Constructs local approximation.
- Case-based reasoning
- Uses symbolic representations and knowledge-based
inference.
150The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-D space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function could be discrete- or real-valued
- For discrete-valued functions, k-NN returns the most common value among the k training examples nearest to xq
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
(Figure) Positive and negative training points scattered around a query point xq
151Discussion on the k-NN Algorithm
- The k-NN algorithm for continuous-valued target functions
- calculate the mean value of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm
- weight the contribution of each of the k neighbors according to their distance to the query point xq, giving greater weight to closer neighbors
- Similarly, we can distance-weight the instances for real-valued target functions
- Robust to noisy data by averaging over the k nearest neighbors
- Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes; to overcome it, stretch the axes or eliminate the least relevant attributes
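A sketch of discrete-valued k-NN with the optional distance weighting described above; inverse-squared-distance weighting is one common choice, assumed here:

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(points, labels, xq, k, weighted=False):
    """Return the (optionally distance-weighted) majority label of the
    k training points nearest to query point xq."""
    nearest = sorted(zip(points, labels),
                     key=lambda pl: euclidean(pl[0], xq))[:k]
    votes = defaultdict(float)
    for p, label in nearest:
        d = euclidean(p, xq)
        # weight 1/d^2 for closer-is-stronger voting; an exact match keeps weight 1
        votes[label] += 1.0 / (d * d) if weighted and d > 0 else 1.0
    return max(votes, key=votes.get)
```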
152Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., via a fuzzy membership graph)
- Attribute values are converted to fuzzy values
- e.g., income is mapped into the discrete categories low, medium, high, with fuzzy membership values calculated for each
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed
153Fuzzy Sets
(Figure) Membership grade μ on the vertical axis (0 to 1); overlapping membership functions Cold, Mild, Warm over temperature in °F, with breakpoints near 30 and 60 °F
154Fuzzy Sets
(Figure) The same graph evaluated at 38 °F: membership in Cold is 0.24 and in Mild is 0.85
155A Discrete Fuzzy Set
Temperature = {cold/0.24, mild/0.85}
- Membership of cold in the set Temperature is 0.24
- Membership of mild in the set Temperature is 0.85
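Membership functions like those in the figure can be evaluated in code. The triangular shape and the breakpoint temperatures below are illustrative assumptions, not the slide's exact curves:

```python
def tri(x, left, peak, right):
    """Triangular membership: 0 outside [left, right], rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def temperature(f):
    """Discrete fuzzy set of membership grades for a temperature in °F
    (breakpoints are assumed for illustration)."""
    return {'cold': tri(f, 10, 25, 40),
            'mild': tri(f, 30, 45, 60),
            'warm': tri(f, 50, 70, 90)}
```

At 38 °F this yields nonzero grades for both cold and mild and zero for warm, mirroring how more than one fuzzy value can apply to a single crisp input.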