Agent Intelligence - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Agent Intelligence


1
Agent Intelligence
  • Based on tutorials and presentations by
  • J.H. Siekmann, N. Nilsson, S.J. Russell, P. Norvig, A. Geyer-Schulz, C. Dyer, J. Robin, J. Han, C. Isik, M. Kamber, A. Logvinovskiy, S. Puuronen, V. Terziyan, Wikipedia

2
Intelligent perception of the external environment, mining data and discovering knowledge about it, reasoning about new facts, planning its own behavior within it, and acting on those plans are among the basic abilities of an intelligent agent.
(Figure: agent-environment loop linking knowledge and facts, plans, and behavior)
3
Agent Logic, Reasoning and Planning
  • Based on tutorials and presentations by
  • J.H. Siekmann, N. Nilsson, S. Russell, P. Norvig,
  • A. Geyer-Schulz, C. Dyer, J. Robin

4
Real-World Reasoning: Tackling Inherent Computational Complexity
DARPA Research Program
Example domains cast in a propositional reasoning system (variables, rules/constraints), with worst-case exponential complexity:
  • Car repair diagnosis: ~100-200 variables, worst case ~10^30
  • Deep space mission control: ~10K-50K variables, ~10^47
  • Chess: ~50K-200K variables, ~10^3010
  • Military logistics: ~200K-600K variables, ~10^15,050
  • Hardware/software verification: ~0.5M-1M variables, ~10^150,500
  • Multi-agent systems: ~1M-5M variables, ~10^301,020
(The chart compares these numbers to physical scales such as the number of atoms on Earth and the seconds until the heat death of the sun, and notes protein folding as a petaflop-year calculation.)
Technology targets:
  • High-performance reasoning
  • Temporal/uncertainty reasoning
  • Strategic reasoning/multi-player
5
The Agent Architecture: A Model
  • Head: general abilities
  • Body: application-specific abilities
6
TYPE 1: Simple Reflex Agents
7
TYPE 2: State-based Agents
8
TYPE 3: Goal-based Agents
9
TYPE 4: Learning Agents / Utility-based Agents
10
A knowledge-based agent
  • A knowledge-based agent includes a knowledge base
    and an inference system.
  • A knowledge base is a set of representations of
    facts of the world.
  • Each individual representation is called a
    sentence.
  • The sentences are expressed in a knowledge
    representation language.
  • The agent operates as follows
  • 1. It TELLs the knowledge base what it perceives.
  • 2. It ASKs the knowledge base what action it
    should perform.
  • 3. It performs the chosen action.
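The TELL/ASK cycle above can be sketched in Python. This is a toy stand-in: the KnowledgeBase class, its fact set, and its trivial inference are invented for illustration, not part of the slides.

```python
# Minimal sketch of a knowledge-based agent's TELL/ASK cycle.
# The KB here is just a set of facts with a hard-wired rule; a real KB
# would store sentences and run an inference procedure.

class KnowledgeBase:
    def __init__(self):
        self.facts = set()

    def tell(self, percept):
        """Store a percept as a fact (TELL)."""
        self.facts.add(percept)

    def ask(self):
        """Return the action the KB recommends (ASK, toy inference)."""
        if "glitter" in self.facts:
            return "grab"
        return "forward"

def agent_step(kb, percept):
    kb.tell(percept)       # 1. TELL the KB what is perceived
    action = kb.ask()      # 2. ASK the KB what action to perform
    return action          # 3. perform the chosen action

kb = KnowledgeBase()
print(agent_step(kb, "breeze"))   # forward
print(agent_step(kb, "glitter"))  # grab
```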

11
Knowledge Base
  • Knowledge Base
  • set of sentences
  • in a formal knowledge representation language
  • that represents assertions about the world.
  • Declarative approach to building an agent
  • Tell it what it needs to know.
  • Ask it what to do: answers should follow by inference rules from the KB.

12
Knowledge Reasoning
  • To address these issues we will introduce
  • A knowledge base (KB) a list of facts that are
    known to the agent.
  • Rules to infer new facts from old facts using
    rules of inference.
  • Logic provides the natural language for this.

13
Why a knowledge base?
(Diagram: description of the world; the agent's explicit specification of what it knows)
  • The state of the world
  • may require lots of information to describe.
  • The agent's knowledge of the state of the world
  • If S is the world state, K(S) is what the agent knows about it.
  • For economy
  • Not everything is explicitly specified; some facts can be inferred.
  • The agent may infer whatever it does not know explicitly.
  • Constraints on feature values
  • e.g., the age of a person is not more than 200 years.
  • Issues
  • In what language to express what the agent knows about the world; how explicit to make this knowledge; how to infer.
14
Logic in general
  • Logics are formal languages for representing information such that conclusions can be drawn
  • Syntax defines the sentences in the language
  • Semantics defines the "meaning" of sentences
  • i.e., defines the truth of a sentence in a world
  • E.g., the language of arithmetic
  • x + 2 ≥ y is a sentence; x2 + y > is not a sentence
  • x + 2 ≥ y is true iff the number x + 2 is no less than the number y
  • x + 2 ≥ y is true in a world where x = 7, y = 1
  • x + 2 ≥ y is false in a world where x = 0, y = 6

15
Entailment
  • Entailment means that one thing follows from another
  • KB ⊨ α
  • Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true
  • E.g., the KB containing "Milan won" and "Inter won" entails "Either Milan won or Inter won"
  • E.g., x + y = 4 entails 4 = x + y
  • Entailment is a relationship between sentences (i.e., syntax) that is based on semantics

16
Models
  • Logicians typically think in terms of models, which are formally structured worlds with respect to which truth can be evaluated
  • We say m is a model of a sentence α if α is true in m
  • M(α) is the set of all models of α
  • Then KB ⊨ α iff M(KB) ⊆ M(α)
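The definition KB ⊨ α iff M(KB) ⊆ M(α) can be tested directly by enumerating models. A minimal sketch, where sentences are encoded as Python predicates over a model dict (that encoding is an assumption for illustration):

```python
from itertools import product

# Model checking for propositional entailment: KB |= alpha holds iff
# alpha is true in every model in which KB is true.

def entails(kb, alpha, symbols):
    """Enumerate all truth assignments to the symbols and check M(KB) ⊆ M(alpha)."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not alpha(model):
            return False   # found a model of KB that is not a model of alpha
    return True

# "Milan won and Inter won" entails "Either Milan won or Inter won"
kb = lambda m: m["Milan"] and m["Inter"]
alpha = lambda m: m["Milan"] or m["Inter"]
print(entails(kb, alpha, ["Milan", "Inter"]))  # True
```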

17
Inference, Soundness, Completeness
  • KB ⊢i α: sentence α can be derived from KB by procedure i
  • Soundness: i is sound if whenever KB ⊢i α, it is also true that KB ⊨ α
  • Completeness: i is complete if whenever KB ⊨ α, it is also true that KB ⊢i α

18
Knowledge Representation (defined by syntax, semantics)
(Diagram: inside the agent, inference carries assertions (the knowledge base) to conclusions; in the real world, facts imply facts; semantics connects the two levels)
19
Schematic perspective
If KB is true in the real world, then any sentence α derived from KB by a sound inference procedure is also true in the real world.
20
Logic as a KR language
Propositional logic studies ways of joining and/or modifying entire propositions, statements or sentences to form more complicated ones, as well as the logical relationships and properties that are derived from these methods of combining or altering statements.
First-order logic is a system of deduction extending propositional logic by the ability to express relations between individuals (e.g. people, numbers, and "things" more generally).
A higher-order logic is a logic in which one may quantify over predicates. A higher-order predicate is a predicate that takes one or more other predicates as arguments.
A modal logic is any logic for handling modalities: concepts like possibility, existence, necessity, eventually, formerly, can, could, might, may, must, etc.
Temporal logic describes systems of rules and symbolism for representing, and reasoning about, propositions qualified in terms of time.
A multi-valued logic is a logic in which there are more than two truth values.
Probabilistic logic is a logic in which the truth values of sentences are probabilities.
A non-monotonic logic is a formal logic in which adding a formula to a theory may produce a reduction of its set of consequences.
Fuzzy logic is derived from fuzzy set theory, dealing with reasoning that is approximate rather than precisely deduced from classical first-order logic.
21
Ontology and epistemology
22
Propositional logic: Syntax
  • Propositional logic is the simplest logic; it illustrates the basic ideas
  • The proposition symbols P1, P2 etc. are sentences
  • If S is a sentence, ¬S is a sentence (negation)
  • If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
  • If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
  • If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication)
  • If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional)

23
Propositional logic: Semantics
  • Each model/world specifies true or false for each proposition symbol
  • Rules for evaluating truth with respect to a model m:
  • ¬S is true iff S is false
  • S1 ∧ S2 is true iff S1 is true and S2 is true
  • S1 ∨ S2 is true iff S1 is true or S2 is true
  • S1 ⇒ S2 is true iff S1 is false or S2 is true
  • i.e., is false iff S1 is true and S2 is false
  • S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true
  • A simple recursive process evaluates an arbitrary sentence, e.g.,
  • ¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
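The recursive evaluation process can be sketched directly. Sentences are encoded as nested tuples such as ("and", ("not", "P12"), ("or", "P22", "P31")); that encoding is an assumption for illustration.

```python
# Recursive truth evaluation of a propositional sentence against a model,
# mirroring the semantic rules on the slide.

def pl_true(sentence, model):
    if isinstance(sentence, str):              # a proposition symbol
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)
    if op == "implies":                        # false iff antecedent true, consequent false
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError(f"unknown operator {op}")

# ¬P1,2 ∧ (P2,2 ∨ P3,1) in a model where P1,2 = false, P2,2 = true, P3,1 = false
model = {"P12": False, "P22": True, "P31": False}
s = ("and", ("not", "P12"), ("or", "P22", "P31"))
print(pl_true(s, model))  # True
```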

24
Logical equivalence
  • To manipulate logical sentences we need some rewrite rules.
  • Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ⊨ β and β ⊨ α

You need to know these!
25
Pros and cons of propositional logic
  • (+) Propositional logic is declarative
  • (+) Propositional logic allows partial/disjunctive/negated information
  • (unlike most data structures and databases)
  • (+) Propositional logic is compositional:
  • the meaning of B1,1 ∧ P1,2 is derived from the meaning of B1,1 and of P1,2
  • (+) Meaning in propositional logic is context-independent
  • (unlike natural language, where meaning depends on context)
  • (-) Propositional logic has very limited expressive power
  • (unlike natural language)

26
First-order logic
  • Propositional logic assumes the world contains facts
  • First-order logic (like natural language) assumes the world contains
  • Objects: people, houses, numbers, colors, baseball games, wars, centuries, ...
  • Relations: red, round, prime, brother of, bigger than, part of, comes between, ...
  • Functions: father of, best friend, one more than, plus, ...

27
Syntax of FOL: Basic elements
  • Constants: KingJohn, 2, Penn, ...
  • Predicates: Brother, >, ...
  • Functions: Sqrt, LeftLegOf, ...
  • Variables: x, y, a, b, ...
  • Connectives: ¬, ∧, ∨, ⇒, ⇔
  • Equality: =
  • Quantifiers: ∀, ∃

28
Atomic sentences
  • Term: function(term1, ..., termn) or constant or variable
  • Atomic sentence: predicate(term1, ..., termn) or term1 = term2
  • For example
  • Brother(KingJohn, RichardTheLionheart)
  • >(Length(LeftLegOf(Richard)), Length(LeftLegOf(KingJohn)))

29
Complex sentences
  • Complex sentences are made from atomic sentences using connectives
  • ¬S, S1 ∧ S2, S1 ∨ S2, S1 ⇒ S2, S1 ⇔ S2
  • For example
  • Sibling(KingJohn, Richard) ⇒ Sibling(Richard, KingJohn)

30
Universal quantification
  • ∀⟨variables⟩ ⟨sentence⟩
  • Everyone at Penn is smart:
  • ∀x At(x, Penn) ⇒ Smart(x)
  • ∀x P is true in a model m iff
  • P is true with x being each possible object in the model
  • Roughly speaking, equivalent to the conjunction of instantiations of P:
  • At(KingJohn, Penn) ⇒ Smart(KingJohn)
  • ∧ At(Richard, Penn) ⇒ Smart(Richard)
  • ∧ At(Penn, Penn) ⇒ Smart(Penn)
  • ∧ ...

31
Existential quantification
  • ∃⟨variables⟩ ⟨sentence⟩
  • Someone at Penn is smart: ∃x At(x, Penn) ∧ Smart(x)
  • ∃x P is true in a model m iff P is true with x being some possible object in the model
  • Roughly speaking, equivalent to the disjunction of instantiations of P:
  • At(KingJohn, Penn) ∧ Smart(KingJohn)
  • ∨ At(Richard, Penn) ∧ Smart(Richard)
  • ∨ At(Penn, Penn) ∧ Smart(Penn)
  • ∨ ...

32
Properties of quantifiers
  • ∀x ∀y is the same as ∀y ∀x
  • ∃x ∃y is the same as ∃y ∃x
  • ∃x ∀y is not the same as ∀y ∃x
  • ∃x ∀y Loves(x, y)
  • "There is a person who loves everyone in the world"
  • ∀y ∃x Loves(x, y)
  • "Everyone in the world is loved by at least one person"
  • Quantifier duality: each can be expressed using the other
  • ∀x Likes(x, IceCream) ≡ ¬∃x ¬Likes(x, IceCream)
  • ∃x Likes(x, Broccoli) ≡ ¬∀x ¬Likes(x, Broccoli)
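Over a finite domain, quantifier duality reduces to Python's all/any, so it can be checked mechanically. The domain (three people) and the Likes relations below are made up for illustration.

```python
# Quantifier duality on a finite domain:
#   forall x P(x)  ==  not exists x not P(x)
#   exists x P(x)  ==  not forall x not P(x)

people = ["Ann", "Bob", "Cid"]
likes_icecream = {"Ann": True, "Bob": True, "Cid": True}
likes_broccoli = {"Ann": False, "Bob": True, "Cid": False}

# ∀x Likes(x, IceCream) vs. ¬∃x ¬Likes(x, IceCream)
forall = all(likes_icecream[x] for x in people)
not_exists_not = not any(not likes_icecream[x] for x in people)
print(forall == not_exists_not)  # True

# ∃x Likes(x, Broccoli) vs. ¬∀x ¬Likes(x, Broccoli)
exists = any(likes_broccoli[x] for x in people)
not_forall_not = not all(not likes_broccoli[x] for x in people)
print(exists == not_forall_not)  # True
```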

33
Using FOL
  • Brothers are siblings
  • ∀x,y Brother(x, y) ⇒ Sibling(x, y)
  • One's mother is one's female parent
  • ∀m,c Mother(c) = m ⇔ (Female(m) ∧ Parent(m, c))
  • Sibling is symmetric
  • ∀x,y Sibling(x, y) ⇔ Sibling(y, x)
  • A first cousin is a child of a parent's sibling
  • ∀x,y FirstCousin(x, y) ⇔ ∃p,ps Parent(p, x) ∧ Sibling(ps, p) ∧ Parent(ps, y)

34
Wumpus world
  • Performance measure: gold +1000, death -1000, step -1, arrow -10
  • Environment: squares adjacent to the Wumpus are smelly; squares adjacent to pits are breezy; glitter iff gold is in the same square; shooting kills the Wumpus if you are facing it; shooting uses up the only arrow; grabbing picks up gold if in the same square; releasing drops the gold in the same square
  • Sensors: breeze, glitter, smell
  • Actuators: left turn, right turn, forward, grab, release, shoot

35
Wumpus world
  • A four-by-four cave with locations identified by coordinates (3,4), etc.
  • The agent is at a location, facing a particular direction (L, R, D, U)
  • The agent starts at (1,1) facing R
(Figure: the 4x4 grid)
36
Wumpus world
  • In the cave is
  • a Wumpus that smells
  • It can kill the agent if at the same location
  • It can be killed by the agent shooting an arrow if facing the Wumpus. When the Wumpus dies, it SCREAMs
37
Wumpus world
  • In the cave are
  • 3 pits. Breezes blow from pits.
  • If the agent steps into a pit, it falls to its death.
  • A heap of gold that glitters
38
Wumpus world
  • Agent goal
  • get gold and get out alive
  • Agent actions
  • Move forward one square in current direction
    (Fwd)
  • Turn left or right 90° (TL, TR)
  • Shoot arrow in current direction
  • Grab gold
  • Agent perceptions at each location
  • Stench, Breeze, Glitter, Bump, Scream

39
Wumpus world
  • The cave is created randomly (location of Wumpus, pits and gold)
  • Perception / action loop
  • The agent must construct a model (knowledge base) of the cave as it tries to achieve its goal
40
Wumpus world knowledge
  • General knowledge (known at start)
  • Location and direction
  • Living
  • Grab and holding
  • Wumpus and stench, shooting, scream, life
  • Pits and breeze
  • Gold and glitter
  • Movement and location, direction and bumps
  • Starting state of agent
  • Goal
  • Facts (not known)
  • Location of Wumpus, pits, gold

41
Wumpus world characterization
  • Observable?
  • No, only local perception
  • Deterministic?
  • Yes, outcomes are explicit
  • Episodic?
  • No, actions are sequential
  • Discrete?
  • Yes
  • Single-agent?
  • Yes

42
Exploring Wumpus world
(Figures on slides 42-46: successive exploration steps in the 4x4 grid, with A = agent, B = breeze, S = stench, P = pit, W = Wumpus, ok = known-safe square)
43
Exploring Wumpus world
44
Exploring Wumpus world
45
Exploring Wumpus world
46
Exploring Wumpus world
How can we make these inferences automatically?
47
Wumpus world in propositional logic
  • Facts are propositions
  • e.g., W44 = Wumpus is at square (4,4)
  • 96 propositions (16 each for wumpus, stench, pit, breeze, gold, glitter) to represent a particular cave
  • General knowledge in sentences
  • e.g., W44 ⇒ (S44 ∧ S43 ∧ S34): if the Wumpus is at (4,4), there is stench at (4,4), (4,3) and (3,4)
  • many sentences
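The "many sentences" of this kind can be generated mechanically for the whole cave. The string encoding of the propositions (W44, S43, ...) is an assumption for illustration.

```python
# Generate one "Wumpus here implies stench here and in adjacent squares"
# sentence per square of the 4x4 cave.

def adjacent(x, y):
    """Squares adjacent to (x, y) inside the 4x4 grid."""
    return [(x + dx, y + dy)
            for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if 1 <= x + dx <= 4 and 1 <= y + dy <= 4]

def stench_sentences():
    sentences = []
    for x in range(1, 5):
        for y in range(1, 5):
            # stench at the Wumpus's own square and all adjacent squares
            squares = [(x, y)] + adjacent(x, y)
            conj = " & ".join(f"S{a}{b}" for a, b in squares)
            sentences.append(f"W{x}{y} => ({conj})")
    return sentences

print(stench_sentences()[-1])  # W44 => (S44 & S34 & S43)
```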

48
Wumpus world in propositional logic
  • Facts that may change
  • Location of agent, direction of agent
  • Agent holding gold
  • Agent has shot arrow
  • Agent, Wumpus are alive

49
Wumpus world in FOL
  • the objects in the environment
  • terms: constants, variables, functions
  • constants
  • times: 0, 1, 2, ...
  • headings: R, L, D, U
  • coordinates: 1, 2, 3, 4
  • locations: 16 squares
  • percepts: Stench, Breeze, Glitter, Bump, Scream, None
  • actions: Turn(Left), Turn(Right), Forward, Grab, Shoot
  • Agent, Wumpus

50
Wumpus world in FOL
  • the objects in the environment
  • terms: constants, variables, functions
  • functions
  • Square(x,y)
  • Home(Wumpus)
  • Perception(s, b, g, h, y)
  • Heading(t), Location(t)

51
Wumpus world in FOL
  • the basic knowledge
  • atomic sentences: predicates, term = term
  • predicates (true or false)
  • properties (of one term/object)
  • Breezy(t) // agent feeling breeze at time t
  • Breeze(s) // breeze blowing on square s
  • Pit(s), Gold(s), etc.
  • Time(x) // object x is a time
  • Coordinate(x), Action(x), Heading(x), etc.

52
Wumpus world in FOL
  • the basic knowledge
  • atomic sentences: predicates, term = term
  • predicates (true or false)
  • relations (of multiple terms/objects)
  • At(s,t) // agent on square s at time t
  • Adjacent (r,s) // squares r and s are adjacent
  • Alive(x,t) // x is alive at time t
  • Percept(p,t) // perception at time t
  • BestAction(a,t) // action a to take at time t

53
Wumpus world in FOL
  • the basic knowledge
  • atomic sentences: predicates, term = term
  • term = term (true or false)
  • Home(Wumpus) = (3,3)
  • Heading(5) = U

54
Exploring Wumpus world in FOL
Deciding the best action (incomplete description here) requires reasoning about the cave conditions.
Diagnostic rules: ∀s Breezy(s) ⇒ ∃r Adjacent(r, s) ∧ Pit(r)
Causal rules: ∀r Pit(r) ⇒ (∀s Adjacent(r, s) ⇒ Breezy(s))
55
Rules as a Knowledge Representation Formalism
  • What is a rule?
  • A statement that specifies that
  • if a determined logical combination of conditions is satisfied,
  • over the set of an agent's percepts
  • and/or facts in its Knowledge Base (KB)
  • that represent the current, past and/or hypothetical future of its environment model, its goals and/or its preferences,
  • then a logic-temporal combination of actions can or must be executed by the agent,
  • directly on its environment (through actuators) or on the facts in its KB.
  • A KB agent such that the persistent part of its KB consists entirely of such rules is called a rule-based agent
  • In that case, the inference engine used by the KB agent is an interpreter or a compiler for a specific rule language.

56
Rule-Based Agent
(Diagram: Environment → Sensors → Tell/Retract into the Rule Base; the Rule Engine Asks the Rule Base and drives the Effectors)
  • Rule Engine
  • Domain-class independent
  • Only dependent on the rule language
  • Declarative-code interpreter or compiler
  • Rule Base
  • Persistent intentional knowledge
  • Domain-class dependent
  • Declarative code
57
Rules examples
  • Examples in semi-natural-language syntax
  • IF P sells a W to N AND W is a weapon
  • AND N is a nation AND N is hostile
  • THEN P is a criminal
  • IF P is a criminal AND L is the location of P
  • THEN call the police AND report "P is a criminal"
  • AND report L
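The first rule above can be run by a tiny forward chainer. The tuple encoding of facts and the hand-coded rule body are assumptions for illustration; the example names (West, Nono, missile) follow the classic "Colonel West" instance of this rule.

```python
# Toy forward chaining over the "criminal" rule: repeatedly derive new
# facts from old facts until nothing changes.

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        # IF P sells W to N AND W is a weapon AND N is a nation AND N is hostile
        # THEN P is a criminal
        for f in facts:
            if f[0] == "sells":
                _, p, w, n = f
                if (("weapon", w) in facts and ("nation", n) in facts
                        and ("hostile", n) in facts):
                    new.add(("criminal", p))
        if not new <= facts:          # any genuinely new fact?
            facts |= new
            changed = True
    return facts

facts = [("sells", "West", "missile", "Nono"),
         ("weapon", "missile"), ("nation", "Nono"), ("hostile", "Nono")]
print(("criminal", "West") in forward_chain(facts))  # True
```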

58
Toulmin's Argumentation Scheme
(Diagram: facts lead, via an inference rule ("because of" a support), to a qualified inference result ("therefore"), unless an exception rule applies ("if not"))
COGNITIVE SCIENCE
59
Toulmin's argumentation scheme: example
  • Facts: The offered used car is old.
  • Therefore (qualified result): Probably, the offered used car is cheap.
  • Because of (inference rule): Used cars are cheap most of the time.
  • Support: Used things lose their value when time goes by, because they break down more often, etc.
  • If not (exception rule): The offered used car is a collector's item.
60
Basic principles of XPS (1)
Every production rule has two parts: A → B
  • A: assumption, antecedent, evidence, if-part, left-hand side (LHS), condition
  • B: conclusion, consequence, hypothesis, then-part, right-hand side (RHS), action
Productions are evaluated over a (data) pool (not a database!) named the working memory (WM), or, in applications in cognitive psychology, denoted the short-term memory (STM).
61
Basic principles of XPS (2)
There are two modes of evaluating production rules (A → B):
  • Forward chaining: data-controlled inference, antecedent-oriented inference, bottom-up inference, "if-added" methods, LHS-controlled chaining
  • Backward chaining: goal-controlled inference, consequence-oriented inference, top-down inference, "if-needed" methods, RHS-controlled chaining
62
Production Rule Systems
Facts:
((car_no DÜW-AW 205) (motor_status on) (oil_control on) (air_pressure 0.1 bar) ...)
Rules:
(1) IF (motor_status on) AND (oil_control on)
    THEN WRITE("Stop motor") AND SET (motor_status off)
(2) IF (car_no x) AND (air_pressure y) AND (LESS y 1.5)
    THEN WRITE("x has a flat tire")
63
General Structure of Production Rules
Simple rules:
IF B1 ∧ B2 ∧ ... ∧ Bn THEN A1 ∧ A2 ∧ ... ∧ Am ELSE C1
IF B1 ∧ B2 ∧ ... ∧ Bn THEN DO A1 ∧ A2 ∧ ... ∧ Am ELSEDO C1
Example:
IF the site of the culture is throat AND the organism is streptococcus
THEN there is strong evidence that the subtype is not of group D
64
Architecture of a Production System
  • Data base (working memory): C5, C1, C3
  • Rules: C1 ∧ C2 → A1; C3 → A2; C1 ∧ C3 → A3; C4 → A4; C5 → A5
  • Rule interpreter (recognition-action cycle):
  • match production rules against the data base, giving the conflict set: C3 → A2, C1 ∧ C3 → A3, C5 → A5
  • select one rule (e.g. C3 → A2) and evaluate its action A2
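The recognition-action cycle can be sketched as follows. The rule format (condition set, addition set) and the "fire the first applicable rule" conflict-resolution strategy are simplifying assumptions; real systems use priorities, recency, or specificity.

```python
# Sketch of the recognize-act cycle: match rules against working memory
# to build the conflict set, select one rule, fire it, repeat.

def recognize_act(wm, rules, max_cycles=10):
    wm = set(wm)
    for _ in range(max_cycles):
        # recognition: rules whose conditions hold and whose actions add something new
        conflict_set = [(conds, adds) for conds, adds in rules
                        if conds <= wm and not adds <= wm]
        if not conflict_set:
            break                      # no applicable rule: stop
        conds, adds = conflict_set[0]  # conflict resolution: first match
        wm |= adds                     # action: apply the rule
    return wm

rules = [({"C1", "C2"}, {"A1"}),
         ({"C3"}, {"A2"}),
         ({"C1", "C3"}, {"A3"})]
print(sorted(recognize_act({"C1", "C3"}, rules)))  # ['A2', 'A3', 'C1', 'C3']
```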
65
Rules with Certainty Factors
  • General form:
IF C1(w1) ∧ C2(w2) ∧ ... ∧ Cn(wn) THEN DO A(W)
Example:
IF the organism is gram-positive AND the organism grows in chains AND the morphology is spherical
THEN by 70% evidence the organism is streptococcus
66
Structured Rules (mapping relations)
(Schema: Condition, Action, Default, Context)
Example (causal relations):
IF   (COND serious diarrhea AND longer than two days)
     (CTXT malabsorption)
     (DFLT no bicarbonate therapy)
THEN medium metabolic acidosis with normal anions
ELSE light metabolic acidosis with normal anions
67
  • Agent Planning

68
What is AI Planning?
  • Generate sequences of actions to perform tasks and achieve objectives
  • Until recently, AI planning was essentially a theoretical endeavor; it is now becoming useful in industrial applications
  • Example application areas
  • design and manufacturing
  • military operations and logistics
  • games
  • space exploration
  • proof planning in mathematics
  • speech and dialog planning
  • agent behavior planning

69
Planning Involves
  • Given knowledge about the task domain (actions)
  • Given a problem specified by an initial state configuration and goals to achieve
  • The agent tries to find a solution, i.e. a sequence of actions that solves the problem
(Figure: an agent moving between Room 1 and Room 2)
70
Notions
  • Plan: a sequence of actions transforming the initial state into a final state
  • Operators: representations of actions
  • Planner: an algorithm that generates a plan from a (partial) description of initial and final states and from a specification of operators
(Figure: Room 1 / Room 2, "go to the can", "go to the basket")
71
The Blocks World in Reality
72
What is a Planning Problem?
  • A planning problem is given by
  • an initial state and a goal state.
Initial state (blocks world): ontable(B), ontable(C), on(D,B), on(A,D), clear(A), clear(C), handempty
(Figure: blocks A, D, B stacked, C on the table, and the GOAL configuration)
For a transition there are certain operators available:
PICKUP(x): picking up x from the table
PUTDOWN(x): putting down x on the table
STACK(x, y): putting x on y
UNSTACK(x, y): picking up x from y
73
Representing States of the World
  • State: a consistent assignment of TRUE or FALSE to every literal in the universe
  • State description:
  • a set of ground literals that are all taken to be TRUE
on(c,a), ontable(a), clear(c), ontable(b), clear(b), handempty
(Figure: block c on block a; block b on the table)
  • The negations of these literals are taken to be false
  • Truth values of other ground literals are unknown

74
STRIPS Operators (with negation)
STRIPS = Stanford Research Institute Problem Solver
  • A STRIPS operator:
Name: name(v1, v2, ..., vn)
Preconditions: atom1, atom2, ..., atomn
Effects: literal1, literal2, ..., literalm
Example:
Name: unstack(?x,?y)
Preconditions: on(?x,?y), clear(?x), handempty
Effects: ¬on(?x,?y), ¬clear(?x), ¬handempty, holding(?x), clear(?y)
  • Operator instance: replacement of variables by constants

75
Example: The Blocks World
  • unstack(?x,?y)
  • Pre: on(?x,?y), clear(?x), handempty
  • Eff: ¬on(?x,?y), ¬clear(?x), ¬handempty, holding(?x), clear(?y)

stack(?x,?y)
Pre: holding(?x), clear(?y)
Eff: ¬holding(?x), ¬clear(?y), on(?x,?y), clear(?x), handempty

pickup(?x)
Pre: ontable(?x), clear(?x), handempty
Eff: ¬ontable(?x), ¬clear(?x), ¬handempty, holding(?x)

putdown(?x)
Pre: holding(?x)
Eff: ¬holding(?x), ontable(?x), clear(?x), handempty
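Applying a ground STRIPS operator is mechanical: check that the preconditions hold, delete the negative effects, add the positive ones. A minimal sketch, encoding literals as strings with a leading "-" marking negative effects (that encoding is an assumption):

```python
# Apply ground STRIPS operators to a state, then run the four-step plan
# unstack(c,a), putdown(c), pickup(a), stack(a,c).

def apply_op(state, pre, eff):
    assert set(pre) <= state, "preconditions not satisfied"
    dels = {e[1:] for e in eff if e.startswith("-")}   # negative effects
    adds = {e for e in eff if not e.startswith("-")}   # positive effects
    return (state - dels) | adds

s = {"on(c,a)", "ontable(a)", "clear(c)", "ontable(b)", "clear(b)", "handempty"}
s = apply_op(s, ["on(c,a)", "clear(c)", "handempty"],                 # unstack(c,a)
             ["-on(c,a)", "-clear(c)", "-handempty", "holding(c)", "clear(a)"])
s = apply_op(s, ["holding(c)"],                                        # putdown(c)
             ["-holding(c)", "ontable(c)", "clear(c)", "handempty"])
s = apply_op(s, ["ontable(a)", "clear(a)", "handempty"],               # pickup(a)
             ["-ontable(a)", "-clear(a)", "-handempty", "holding(a)"])
s = apply_op(s, ["holding(a)", "clear(c)"],                            # stack(a,c)
             ["-holding(a)", "-clear(c)", "on(a,c)", "clear(a)", "handempty"])
print("on(a,c)" in s)  # True
```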
76
Plans
  • STRIPS planning domain
  • A language L (choose the predicate and constant symbols)
  • A set of planning operators (e.g., the blocks-world operators)
  • Plan
  • A sequence P = (o1, o2, ..., ok) of ground instances of operators

unstack(c,a), putdown(c), pickup(a), stack(a,c)
Each oi is called a step of P
77
Planning Problems
  • STRIPS planning problem
  • a triple R = (i, g, O)
  • i is the initial state description: clear(c), on(c,a), ontable(a), clear(b), ontable(b), handempty
  • g is the goal: on(a,c), ontable(c)
  • O is a set of planning operators
  • P is a correct plan for R if g is true in result(i, P)
Candidate plans:
unstack(c,a), putdown(c), pickup(a), stack(a,c)
unstack(c,a), putdown(c), pickup(a), stack(a,b)
pickup(a), stack(a,c), unstack(c,a), putdown(c)
78
Search Space
CLEAR(A) ONTABLE(A) CLEAR(B) ONTABLE(B) CLEAR(C) ONTABLE(C) HANDEMPTY
(Figure: the search space of the three-blocks world. States such as the all-on-table state above are connected by pickup/putdown and stack/unstack transitions, through holding states, down to the six fully stacked states such as CLEAR(A) ON(A,B) ON(B,C) ONTABLE(C) HANDEMPTY.)
79
State-Space Search
State-space planning is a search in the space of states.
(Figure: blocks-world states from the initial state to the goal, connected by operator applications)
80
State-Space Search Vacuum World example
Initial state
Goal
81
Depth-first search
Does not necessarily find the shortest path; limited memory requirement.
82
Depth-First search example
83
Breadth-first search
Finds the shortest path; large memory requirement.
84
Breadth-First search example
85
Both Depth-first and Breadth-first search can be
  • Forward (from the initial state to the goal)
  • Backward (from the goal to the initial state)
  • Bi-Directional (from both starting points until
    meeting point)

86
Bi-directional search
Schematic view of a bidirectional search that is about to succeed: a branch from the start node meets a branch from the goal node. The motivation is that the area of the two small circles is less than the area of one big circle centered on the start and reaching to the goal.
87
Depth-Limited and Iterative Deepening search
  • Usually, breadth-first search requires too much memory to be practical.
  • Main problem with depth-first search:
  • it can follow a dead-end path very far before this is discovered.
  • Depth-Limited search
  • impose a depth limit l
  • never explore nodes at depth > l
  • Iterative Deepening search is depth-limited search with an increasing limit
  • the solution improves with more computation time
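Depth-limited search plus the increasing-limit loop can be sketched on an explicit graph. The adjacency dict and goal below are made up for illustration (and acyclic, so the simple recursion needs no visited set).

```python
# Depth-limited search: ordinary DFS, but give up below a depth limit.
def depth_limited(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None                    # limit reached without finding the goal
    for child in graph.get(node, []):
        path = depth_limited(graph, child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

# Iterative deepening: run depth-limited search with limit 0, 1, 2, ...
def iterative_deepening(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):
        path = depth_limited(graph, start, goal, limit)
        if path is not None:
            return path
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["F"]}
print(iterative_deepening(graph, "A", "F"))  # ['A', 'C', 'E', 'F']
```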

88
Depth-Limited search (limit 3) example
89
Iterative Deepening search example
90
Uniform-Cost search
  • Uniform-cost search is a tree-search algorithm for traversing or searching a weighted tree or graph. The search begins at the start (or goal) node and continues by visiting the frontier node that has the least total cost from the root.
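A priority queue keyed on path cost gives exactly this behavior. A minimal sketch with an invented weighted graph:

```python
import heapq

# Uniform-cost search: always expand the frontier node with the least
# total path cost from the start.

def uniform_cost(graph, start, goal):
    frontier = [(0, start, [start])]          # (cost so far, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path                 # first pop of goal is optimal
        if node in explored:
            continue
        explored.add(node)
        for child, step in graph.get(node, []):
            heapq.heappush(frontier, (cost + step, child, path + [child]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
print(uniform_cost(graph, "S", "G"))  # (5, ['S', 'A', 'B', 'G'])
```

Note that the cheap route S-A-B-G (cost 5) beats both the direct S-A-G (cost 7) and S-B-G (cost 6), which is exactly what the least-total-cost expansion order guarantees.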

91
Partial Plans
  • Partial plan: a partially ordered set of operator instances
The partial order gives only some constraints on the order in which the operations have to be performed.
  • Start: a dummy operator
  • Finish: another dummy operator
(Diagram: Start and Finish bracket the partially ordered steps unstack(c,a), putdown(c), pickup(a), pickup(b), stack(a,b), stack(b,c))
92
Partial Plan Example
SM = Super Market, HS = Hardware Store
  • At(Home)
  • Sells(SM, Banana)
  • Sells(HS, Drill)
  • Have(Drill)
  • Have(Milk)
  • Have(Banana)

93
Partial Plan Example
94
GraphPlan THE BASIC IDEA
  • 1. Construct the (initial) Planning Graph
  • 2. Extract Solution (if possible)
  • with fast
    Graph-Search-Algorithms
  • 3. Else expand Graph and goto 2.

95
The Planning Graph
  • Alternating layers of ground literals and actions (ground instances of operators) representing the literals and actions that might occur at each time step 0 < i < N
(Figure: layer 0 holds the literals true in the initial state; an action layer i connects its preconditions in literal layer i-1 to its effects in literal layer i+1; literal layer i-1 holds literals that might be true at time t = (i-1)/2 and layer i+1 those at t = (i+1)/2; maintenance no-ops carry literals forward unchanged)
96
Mutual Exclusion
  • Two actions are mutex if
  • Inconsistent effects: an effect of one negates an effect of the other
  • Interference: one deletes a precondition of the other
  • Competing needs: they have mutually exclusive preconditions
  • Two literals are mutex if
  • Inconsistent support: one is the negation of the other, or all ways of achieving them are pairwise mutex
97
The 8th March Example
  • Suppose you want to clean the room and prepare dinner as a surprise for your sweetheart, who is asleep

Initial Conditions: (and (garbage) (cleanHands) (quiet))
Goal: (and (dinner) (surprise) (not (garbage)))

Actions:
cook:   precondition (cleanHands), effect (dinner)
serve:  precondition (quiet), effect (surprise)
clean:  precondition none, effect (and (not (garbage)) (not (cleanHands)))
vacuum: precondition none, effect (and (not (garbage)) (not (quiet)))
98
The Graph for this Example (1)
  • Generate the first two levels of the planning graph
  • clean is mutex with garbage (inconsistent effects)
  • vacuum is mutex with serve (interference)
  • ¬quiet is mutex with surprise (inconsistent support)
(Figure: literal level 0 (garb, cleanH, quiet); action level 1 (clean, vacuum, cook, serve plus no-ops); literal level 2 (garb, ¬garb, cleanH, ¬cleanH, quiet, ¬quiet, dinner, surprise), with the mutex links above)
99
Extraction of a Solution for the Example (1)
  • Check to see whether there is a possible plan
  • Recall that the goal is (and (dinner) (surprise) (not (garbage)))
  • Note that
  • All goal literals are present at level 2
  • None are mutex with each other
  • Thus there is a chance that a plan exists
(Figure: the two-level planning graph with the goal literals highlighted)
100
Solution Extraction for the Example (2)
  • Two sets of actions for the goals at level 2
  • Neither works: both sets contain actions that are mutex
(Figure: the two candidate action sets marked on the level-1 action layer)
101
Solution Extraction Example (3)
  • Go back and do more graph extension: generate two more levels
(Figure: the planning graph extended to levels 3 and 4)
102
Example: Solution extraction (4)
  • Twelve combinations at level 4:
  • Three ways to achieve ¬garb
  • Two ways to achieve dinner
  • Two ways to achieve surprise
(Figure: the four-level planning graph with the candidate goal-achieving actions at level 3)

103
Example: Solution extraction (5)
Call Solution-Extraction recursively at level 2: one combination works, so we have got a plan.
(Figure: the extracted plan traced through the graph)
104
Constraint Satisfaction Problems
  • Set of variables: each variable has a range of possible values
  • Set of constraints: find values for the variables that satisfy all the constraints
  • Dynamic constraint satisfaction problems: when we select values for some variables, this changes what the remaining variables and constraints are

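Backtracking search, the basic mechanism behind CSP solving (and behind GraphPlan's solution extraction, which is itself a dynamic CSP), can be sketched as follows. A minimal illustration: the map-colouring variables, domains, and constraints here are invented for the example.

```python
# Minimal backtracking CSP solver: variables, domains, constraints.
# Example (invented for illustration): colour three regions so that
# neighbouring regions get different colours.

def backtrack(assignment, variables, domains, constraints):
    """Return a complete assignment satisfying all constraints, or None."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if all(ok(assignment) for ok in constraints):
            result = backtrack(assignment, variables, domains, constraints)
            if result is not None:
                return result
        del assignment[var]          # undo and try the next value
    return None

# Three regions, two neighbour pairs; a constraint only fires
# once both of its variables have been assigned.
variables = ["A", "B", "C"]
domains = {v: ["red", "green"] for v in variables}

def different(x, y):
    return lambda a: x not in a or y not in a or a[x] != a[y]

constraints = [different("A", "B"), different("B", "C")]
solution = backtrack({}, variables, domains, constraints)
```

With only two colours the solver is forced to alternate them along the chain A–B–C.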
105
Agent Knowledge Discovery, Classification,
Prediction, Multidatabase Mining
Based on tutorials and presentations J. Han, C.
Isik, M. Kamber, A. Logvinovskiy, S. Puuronen, V.
Terziyan
106
Data Mining: A KDD Process
  • Data mining: the core of the knowledge discovery process
[KDD process diagram: Databases → Data Cleaning and Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]
107
Data Mining: Confluence of Multiple Disciplines
  • Database systems
  • Statistics
  • Machine learning
  • Visualization
  • Information science
  • High performance computing
  • Other disciplines
  • Neural networks, mathematical modeling,
    information retrieval, pattern recognition, etc.

108
Introduction to Classification
  • Classify data (creating a model) based on the
    training set and the values in a classifying
    attribute

109
Classification vs. Prediction
  • Classification
  • predicts categorical class labels
  • classifies data (constructs a model) based on the
    training set and the values (class labels) in a
    classifying attribute and uses it in classifying
    new data
  • Prediction
  • models continuous-valued functions, i.e.,
    predicts unknown or missing values

110
Classification: A Two-Step Process
  • Model construction describing a set of
    predetermined classes
  • Each tuple/sample is assumed to belong to a
    predefined class, as determined by the class
    label attribute
  • The set of tuples used for model construction
    training set
  • The model is represented as classification rules,
    decision trees, or mathematical formulae
  • Model usage for classifying future or unknown
    objects
  • Estimate accuracy of the model
  • The known label of test sample is compared with
    the classified result from the model
  • Accuracy rate is the percentage of test set
    samples that are correctly classified by the
    model
  • Test set is independent of training set,
    otherwise over-fitting will occur

111
Classification Process (I)
Classification Algorithms
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
112
Classification Process (II)
(Jeff, Professor, 4)
Tenured?
113
Supervised vs. Unsupervised Learning
  • Supervised learning (classification)
  • Supervision The training data (observations,
    measurements, etc.) are accompanied by labels
    indicating the class of the observations
  • Based on the training set to classify new data
  • Unsupervised learning (clustering)
  • We are given a set of measurements, observations, etc., with the aim of establishing the existence of classes or clusters in the data
  • No training data, or the training data are not
    accompanied by class labels

114
Data Preparation
  • Data cleaning
  • Preprocess data in order to reduce noise and
    handle missing values
  • Relevance analysis (feature selection)
  • Remove the irrelevant or redundant attributes
  • Data transformation
  • Generalize and/or normalize data

115
Classification Accuracy: Estimating Error Rates
  • Partition: training-and-testing
  • use two independent data sets, e.g., training set (2/3), test set (1/3)
  • used for data sets with a large number of samples
  • Cross-validation
  • divide the data set into k subsamples
  • use k−1 subsamples as training data and one subsample as test data (k-fold cross-validation)
  • for data sets of moderate size
  • Bootstrapping (leave-one-out)
  • for small-size data

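The k-fold scheme above can be sketched as follows. The stand-in "classifier" (a majority-class predictor) and the toy data are invented for illustration; any train/predict pair could be plugged in.

```python
# k-fold cross-validation: split the data into k subsamples, train on k-1
# of them, test on the held-out one, and average the k accuracy estimates.

def k_fold_accuracy(samples, labels, k, train_fn, predict_fn):
    n = len(samples)
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved index folds
    accuracies = []
    for test_idx in folds:
        train_idx = [i for i in range(n) if i not in test_idx]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Stand-in learner: always predict the most frequent training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

data = list(range(10))
labels = ["yes"] * 7 + ["no"] * 3
acc = k_fold_accuracy(data, labels, k=5,
                      train_fn=train_majority, predict_fn=predict_majority)
```

Because the test folds are disjoint from the training indices, the averaged accuracy is an estimate on data the model never saw.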
116
Boosting Techniques
  • Boosting increases classification accuracy.
  • Learn a series of classifiers, where each
    classifier in the series pays more attention to
    the examples misclassified by its predecessor
  • Boosting requires only linear time and constant
    space

117
What is a decision tree?
  • A decision tree is a flow-chart-like tree
    structure.
  • Internal node denotes a test on an attribute
  • Branch represents an outcome of the test
  • All tuples in branch have the same value for the
    tested attribute.
  • Leaf node represents class label or class label
    distribution.

118
Training Dataset Example
Sample | buys_computer
1  | no
2  | no
3  | yes
4  | yes
5  | yes
6  | no
7  | yes
8  | no
9  | yes
10 | yes
11 | yes
12 | yes
13 | yes
14 | no
119
How to construct a tree?
  • Algorithm
  • greedy algorithm
  • make the optimal choice at each step: select the best attribute for each tree node
  • top-down recursive divide-and-conquer manner
  • from root to leaf
  • split node to several branches
  • for each branch, recursively run the algorithm

120
Output A Decision Tree for buys_computer
age?
  <30 → student?
    no → buys_computer = no
    yes → buys_computer = yes
  30..40 → buys_computer = yes
  >40 → credit rating?
    excellent → buys_computer = no
    fair → buys_computer = yes
121
Algorithm for Decision Tree Induction
  • Basic algorithm (a greedy algorithm)
  • Tree is constructed in a top-down recursive
    divide-and-conquer manner
  • At start, all the training examples are at the
    root
  • Attributes are categorical (if continuous-valued,
    they are discretized in advance)
  • Examples are partitioned recursively based on
    selected attributes
  • Test attributes are selected on the basis of a
    heuristic or statistical measure (e.g.,
    information gain)
  • Conditions for stopping partitioning
  • All samples for a given node belong to the same
    class
  • There are no remaining attributes for further
    partitioning majority voting is employed for
    classifying the leaf
  • There are no samples left

122
Decision Tree Construction
{1,2,3,4,5,6,7,8,9,10,11,12,13,14}: age?
  <30 → {1,2,8,9,11}: student?
    no → {1,2,8}: no
    yes → {9,11}: yes
  30..40 → {3,7,12,13}: yes
  >40 → {4,5,6,10,14}: credit rating?
    fair → {4,5,10}: yes
    excellent → {6,14}: no
123
Information Gain (ID3/C4.5)
  • Select the attribute with the highest information
    gain
  • Assume there are two classes, P and N
  • Let the set of examples S contain p elements of
    class P and n elements of class N
  • The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as
    I(p, n) = −(p / (p+n)) · log2(p / (p+n)) − (n / (p+n)) · log2(n / (p+n))

124
Information Gain in Decision Tree Induction
  • Assume that using attribute A a set S will be partitioned into sets S1, S2, …, Sv
  • If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is
    E(A) = Σi ((pi + ni) / (p + n)) · I(pi, ni)
  • The encoding information that would be gained by branching on A is
    Gain(A) = I(p, n) − E(A)

125
Attribute Selection
{1,…,14}: age?
  <30 → {1,2,8,9,11}: p1 = 2, n1 = 3
  30..40 → {3,7,12,13}: p2 = 4, n2 = 0
  >40 → {4,5,6,10,14}: p3 = 3, n3 = 2
[table of samples 1–14 with their buys_computer values, as on the Training Dataset slide]
126
Attribute Selection by Information Gain
Computation
  • Class P: buys_computer = 'yes'
  • Class N: buys_computer = 'no'
  • I(p, n) = I(9, 5) = 0.940
  • Compute the entropy for age:
    E(age) = (5/14)·I(2, 3) + (4/14)·I(4, 0) + (5/14)·I(3, 2) = 0.694
  • Hence Gain(age) = I(9, 5) − E(age) = 0.246
  • Similarly, compute the gain for the remaining attributes; age gives the highest gain and is chosen as the root

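These numbers can be checked directly from the definitions of I(p, n) and E(A); a minimal sketch, using the per-branch counts (pi, ni) from the attribute-selection slide:

```python
import math

def info(p, n):
    """I(p, n): bits needed to decide class P vs N for a p/n split."""
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:                         # 0*log2(0) is taken as 0
            f = c / total
            result -= f * math.log2(f)
    return result

# buys_computer: 9 'yes' (class P) and 5 'no' (class N) overall.
i_root = info(9, 5)                                  # about 0.940

# Branch counts for age: <30 -> (2,3), 30..40 -> (4,0), >40 -> (3,2).
branches = [(2, 3), (4, 0), (3, 2)]
e_age = sum((p + n) / 14 * info(p, n) for p, n in branches)   # about 0.694
gain_age = i_root - e_age                                     # about 0.246
```

The pure 30..40 branch contributes zero entropy, which is what makes age such an informative split.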
127
Extracting Classification Rules from Trees
  • Represent the knowledge in the form of IF-THEN
    rules
  • One rule is created for each path from the root
    to a leaf
  • Each attribute-value pair along a path forms a
    conjunction
  • The leaf node holds the class prediction
  • Rules are easier for humans to understand
  • Example
  • IF age = '<30' AND student = 'no' THEN buys_computer = 'no'
  • IF age = '<30' AND student = 'yes' THEN buys_computer = 'yes'
  • IF age = '30..40' THEN buys_computer = 'yes'
  • IF age = '>40' AND credit_rating = 'excellent' THEN buys_computer = 'no'
  • IF age = '>40' AND credit_rating = 'fair' THEN buys_computer = 'yes'

128
Bayesian Classification: Why?
  • Probabilistic learning Calculate explicit
    probabilities for hypothesis, among the most
    practical approaches to certain types of learning
    problems.
  • Incremental Each training example can
    incrementally increase or decrease the
    probability that a hypothesis is correct. Prior
    knowledge can be combined with observed data.
  • Probabilistic prediction Predict multiple
    hypotheses, weighted by their probabilities.
  • Standard Even in cases where Bayesian methods
    prove computationally intractable, they can
    provide a standard of optimal decision making
    against which other methods can be measured.

129
Bayesian Theorem
  • Given training data D, the posterior probability of a hypothesis h, P(h|D), follows from Bayes' theorem:
    P(h|D) = P(D|h) · P(h) / P(D)
  • Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost

130
Bayesian classification
  • The classification problem may be formalized using a-posteriori probabilities
  • P(C|X) = probability that the sample tuple X = <x1, …, xk> is of class C
  • E.g. P(class = N | outlook = sunny, windy = true, …)
  • Idea: assign to sample X the class label C such that P(C|X) is maximal

131
Estimating a-posteriori probabilities
  • Bayes' theorem: P(C|X) = P(X|C) · P(C) / P(X)
  • P(X) is constant for all classes
  • P(C) = relative frequency of class C samples
  • C such that P(C|X) is maximum = C such that P(X|C)·P(C) is maximum
  • Problem: computing P(X|C) is unfeasible!

132
Naïve Bayesian Classification
  • Naïve assumption: attribute independence
  • P(x1, …, xk | C) = P(x1|C) · … · P(xk|C)
  • If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as i-th attribute in class C
  • If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
  • Computationally easy in both cases

133
Play Tennis Example Data
N - not to play tennis
P - play tennis
134
Play-tennis example: estimating P(xi|C)
135
Play-tennis example: classifying X
  • An unseen sample X = <rain, hot, high, false>
  • P(X|P)·P(P) = P(rain|P)·P(hot|P)·P(high|P)·P(false|P)·P(P) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
  • P(X|N)·P(N) = P(rain|N)·P(hot|N)·P(high|N)·P(false|N)·P(N) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
  • Sample X is classified in class N (don't play)

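This computation can be reproduced directly; a minimal sketch, where the conditional probabilities are the frequencies read off the play-tennis table:

```python
# Naive Bayes scoring of the unseen sample X = <rain, hot, high, false>.
# P(xi|C) values as estimated from the training set (9 P samples, 5 N samples).
p_given_P = {"rain": 3/9, "hot": 2/9, "high": 3/9, "false": 6/9}
p_given_N = {"rain": 2/5, "hot": 2/5, "high": 4/5, "false": 2/5}
prior_P, prior_N = 9/14, 5/14

x = ["rain", "hot", "high", "false"]

score_P = prior_P
for attr in x:
    score_P *= p_given_P[attr]        # attribute-independence assumption

score_N = prior_N
for attr in x:
    score_N *= p_given_N[attr]

# score_P = 0.010582..., score_N = 0.018286..., so X is classified as N.
label = "P" if score_P > score_N else "N"
```

Note that P(X) is never computed: since it is the same for both classes, comparing P(X|C)·P(C) is enough.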
136
Neural Networks
  • Advantages
  • prediction accuracy is generally high
  • robust, works when training examples contain
    errors
  • output may be discrete, real-valued, or a vector
    of several discrete or real-valued attributes
  • fast evaluation of the learned target function.
  • Criticism
  • long training time
  • difficult to understand the learned function
    (weights).
  • not easy to incorporate domain knowledge

137
Artificial Neuron
[Artificial neuron: inputs x0, x1, x2 with weights w0, w1, w2; net input s = Σ xi·wi; output y = f(s)]
138
A Neural Network
139
Network Training
  • The ultimate objective of training
  • obtain a set of weights that makes almost all the
    tuples in the training data classified correctly
  • Steps
  • Initialize weights with random values
  • Feed the input tuples into the network one by one
  • For each unit
  • Compute the net input to the unit as a linear
    combination of all the inputs to the unit
  • Compute the output value using the activation
    function
  • Compute the error
  • Update the weights and the bias

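The training steps above can be sketched for a single unit with a step activation (a perceptron). The OR training set, learning rate, and epoch count are invented for illustration; real multi-layer networks use backpropagation instead of this one-unit rule.

```python
# One-unit sketch of the training steps: random initial weights, feed the
# tuples one by one, compute the output, the error, and the weight updates.
import random

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(3)]   # bias + two inputs
rate = 0.1

def output(x):
    """Step activation on the net input (bias handled as input x0 = 1)."""
    s = sum(w * xi for w, xi in zip(weights, [1.0] + x))
    return 1 if s >= 0 else 0

# Tiny training set (invented): learn logical OR, which is linearly separable.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

for _ in range(50):                   # epochs
    for x, desired in data:
        error = desired - output(x)   # supervised: Error = Desired - Actual
        for i, xi in enumerate([1.0] + x):
            weights[i] += rate * error * xi   # update weights and bias
```

Because OR is linearly separable, the perceptron convergence theorem guarantees this loop reaches zero error within the 50 epochs.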
140
Learning Paradigms
(1) Supervised: adjust weights using Error = Desired − Actual
(2) Unsupervised: adjust weights using reinforcement
[Diagram: Inputs → network → Actual Output]
141
Training Neural Network
142
Classification Example
x2
x1
143
Equation of a Line
2x1 + 3x2 − 6 = 0
[Plot: the line crosses the x2-axis at 2 and the x1-axis at 3; 2x1 + 3x2 − 6 > 0 on one side of the line, 2x1 + 3x2 − 6 < 0 on the other]
144
Neural Classifier
[Neuron with inputs x0 = 1, x1, x2 and weights w0 = −6, w1 = 2, w2 = 3; s = Σ xi·wi; y = sgn(s): y = +1 on one side of the line, y = −1 on the other]
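This classifier can be written out directly; a minimal sketch (the sgn(0) boundary case is arbitrarily assigned to −1 here):

```python
# The neural classifier, literally: x0 = 1, w0 = -6, w1 = 2, w2 = 3,
# output y = sgn(s). Points with 2*x1 + 3*x2 - 6 > 0 give y = +1.

def classify(x1, x2, weights=(-6, 2, 3)):
    w0, w1, w2 = weights
    s = w0 * 1 + w1 * x1 + w2 * x2    # s = sum(xi * wi) with bias input x0 = 1
    return 1 if s > 0 else -1

classify(3, 2)    # 2*3 + 3*2 - 6 = 6 > 0, so +1
classify(0, 0)    # -6 < 0, so -1
```

The weight vector just restates the line 2x1 + 3x2 − 6 = 0: a single neuron is a linear separator.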
145
Genetic Algorithms
  • GA based on an analogy to biological evolution
  • Each rule is represented by a string of bits
  • An initial population is created consisting of
    randomly generated rules
  • Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
  • Offspring are generated by crossover and mutation

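One way to sketch this loop follows. The bitstring length, fitness function, and rates are invented for illustration; a real system would score each encoded rule against the training data instead of counting bits.

```python
# One GA loop over rule bitstrings: fitness-based survival,
# single-point crossover, and bit-flip mutation.
import random

random.seed(1)
RULE_LEN = 12

def fitness(rule):
    return rule.count("1")            # stand-in fitness: prefer 1-bits

def crossover(a, b):
    point = random.randrange(1, RULE_LEN)   # single-point crossover
    return a[:point] + b[point:]

def mutate(rule, prob=0.05):
    flip = lambda bit: "1" if bit == "0" else "0"
    return "".join(flip(b) if random.random() < prob else b for b in rule)

population = ["".join(random.choice("01") for _ in range(RULE_LEN))
              for _ in range(8)]
initial_best = max(population, key=fitness)

for _ in range(20):                   # generations
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]        # survival of the fittest
    offspring = [mutate(crossover(random.choice(survivors),
                                  random.choice(survivors)))
                 for _ in range(4)]
    population = survivors + offspring

best = max(population, key=fitness)
```

Keeping the survivors unchanged (elitism) guarantees the best fitness never decreases from one generation to the next.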
146
Genetic Algorithms
147
Example Initial Population
Sample | b_c | Encoded rule
1  | n | 100 100 01 01 01
2  | n | 100 100 01 10 01
3  | y | 010 100 01 01 10
4  | y | 001 010 01 01 10
5  | y | 001 001 10 01 10
6  | n | 001 001 10 10 01
7  | y | 010 001 10 10 10
8  | n | 100 010 01 01 01
9  | y | 100 001 10 01 10
10 | y | 001 010 10 01 10
11 | y | 100 010 10 10 10
12 | y | 010 010 01 10 10
13 | y | 010 100 10 01 10
14 | n | 001 010 01 10 01
(the last two bits of each encoded rule are the class: 10 = yes, 01 = no)
148
Example Generated Rule
IF age <30 AND student no THEN buys_computer no
001 111 01 11 01
149
Instance-Based Methods
  • Instance-based learning Store training examples
    and delay the processing (lazy evaluation)
    until a new instance must be classified.
  • Typical approaches
  • k-nearest neighbor approach
  • Instances represented as points in a Euclidean
    space.
  • Locally weighted regression
  • Constructs local approximation.
  • Case-based reasoning
  • Uses symbolic representations and knowledge-based
    inference.

150
The k-Nearest Neighbor Algorithm
  • All instances correspond to points in the n-D
    space.
  • The nearest neighbor are defined in terms of
    Euclidean distance.
  • The target function could be discrete- or real-
    valued.
  • For discrete-valued, the k-NN returns the most
    common value among the k training examples
    nearest to xq.
  • Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.

[Voronoi diagram: '+' and '−' training points partition the plane into decision regions; query point xq]
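A minimal k-NN sketch along these lines (the '+'/'−' training points are invented for illustration):

```python
# k-nearest-neighbour classification: Euclidean distance in n-D space,
# majority vote among the k closest training examples.
import math
from collections import Counter

def knn_classify(train, query, k):
    """train: list of (point, label) pairs; returns the majority label
    among the k training points nearest to query."""
    by_dist = sorted(train, key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "-"), ((1, 0), "-"), ((0, 1), "-"),
         ((5, 5), "+"), ((6, 5), "+"), ((5, 6), "+")]
knn_classify(train, (1, 1), k=3)   # the three nearest points are all "-"
```

All the work happens at query time, which is exactly the "lazy evaluation" described above: training is just storing the examples.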
151
Discussion on the k-NN Algorithm
  • The k-NN algorithm for continuous-valued target
    functions.
  • Calculate the mean values of the k nearest
    neighbors.
  • Distance-weighted nearest neighbor algorithm.
  • Weight the contribution of each of the k
    neighbors according to their distance to the
    query point xq.
  • giving greater weight to closer neighbors
  • Similarly, we can distance-weight the instances
    for real-valued target functions.
  • Robust to noisy data by averaging k-nearest
    neighbors.
  • Curse of dimensionality: distance between neighbors could be dominated by irrelevant attributes. To overcome it, stretch the axes or eliminate the least relevant attributes.

152
Fuzzy Set Approaches
  • Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as using a fuzzy membership graph)
  • Attribute values are converted to fuzzy values
  • e.g., income is mapped into the discrete
    categories low, medium, high with fuzzy values
    calculated
  • For a given new sample, more than one fuzzy value
    may apply
  • Each applicable rule contributes a vote for
    membership in the categories
  • Typically, the truth values for each predicted
    category are summed

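Fuzzification can be sketched with simple triangular membership functions. The breakpoints below are invented for illustration and are not the ones behind the 0.24/0.85 values on the following slides.

```python
# Fuzzifying a crisp attribute value: triangular membership functions
# for cold / mild / warm temperature (degrees F). Breakpoints are assumed.

def triangular(x, a, b, c):
    """Membership rising from a to the peak at b, falling to c; 0 outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp_f):
    return {
        "cold": triangular(temp_f, 10, 30, 45),
        "mild": triangular(temp_f, 30, 45, 60),
        "warm": triangular(temp_f, 45, 60, 80),
    }

fuzzify(38)   # a temperature between the peaks belongs partly to cold and partly to mild
```

Because the cold and mild functions overlap, a single crisp reading gets two nonzero grades at once, which is what lets several fuzzy rules fire and vote.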
153
Fuzzy Sets
[Membership grade μ from 0 to 1 for fuzzy sets Cold, Mild, Warm over temperature in °F (roughly 30–60 shown)]
154
Fuzzy Sets
[Same membership functions, evaluated at 38 °F: μCold(38) = 0.24, μMild(38) = 0.85]
155
A Discrete Fuzzy Set
Temperature = {cold/0.24, mild/0.85}
The membership grade of cold in the set Temperature is 0.24
The membership grade of mild in the set Temperature is 0.85