Machine Learning - PowerPoint PPT Presentation

About This Presentation
Title:

Machine Learning

Description:

Machine Learning Foundations of Artificial Intelligence – PowerPoint PPT presentation

Slides: 44
Provided by: Bamsh2
Category:
Tags: learning | machine

Transcript and Presenter's Notes

Title: Machine Learning


1
Machine Learning
  • Foundations of Artificial Intelligence

2
Learning
  • What is Learning?
  • Learning in AI is also called machine learning or
    pattern recognition.
  • The basic objective is to allow an intelligent
    agent to discover autonomously knowledge from
    experience.
  • Let's examine the definition more closely:
  • "an intelligent agent": The ability to learn
    requires a prior level of intelligence and
    knowledge. Learning has to start from an
    existing level of capability.
  • "to discover autonomously": Learning is
    fundamentally about an agent recognizing new
    facts for its own use and acquiring new abilities
    that reinforce its own existing abilities.
    Literal programming, i.e. rote learning from
    instruction, is not useful.
  • "knowledge": Whatever is learned has to be
    represented in some way that the agent can use.
    "If you can't represent it, you can't learn it"
    is a corollary of the slogan "Knowledge is
    power".
  • "from experience": Experience is typically a set
    of so-called training examples; examples may be
    categorized or not. They may be random or
    selected by a teacher. They may include
    explanations or not.

3
Learning Agent
[Figure: learning agent diagram with components Critic, Percepts, Problem solver, Learning element, KB, Actions]
4
Learning element
  • Design of a learning element is affected by:
  • Which components of the performance element are
    to be learned
  • What feedback is available to learn these
    components
  • What representation is used for the components
  • Type of feedback:
  • Supervised learning: correct answers for each
    training example
  • Unsupervised learning: correct answers not given
  • Reinforcement learning: occasional
    rewards/feedback

5
Inductive Learning
  • Inductive Learning
  • inductive learning involves learning generalized
    rules from specific examples (can think of this
    as the inverse of deduction)
  • main task: given a set of examples, each
    classified as positive or negative, produce a
    concept description that matches exactly the
    positive examples
  • Some Notes
  • The examples are coded in some representation
    language, e.g. they are coded by a finite set of
    real-valued features.
  • The concept description is in a certain language
    that is presumably a superset of the language of
    possible example encodings.
  • A correct concept description is one that
    classifies correctly ALL possible examples, not
    just those given in the training set.
  • Fundamental Difficulties with Induction
  • can't generalize with perfect certainty
  • examples and concepts are NOT available
    directly; they are only available through
    representations which may be more or less
    adequate to capture them
  • some examples may be classified as both positive
    and negative
  • the features supplied may not be sufficient to
    discriminate between positive and negative
    examples

6
Inductive Learning Frameworks
  1. Function-learning formulation
  2. Logic-inference formulation

7
Inductive learning
  • Simplest form: learn a function from examples
  • f is the target function
  • An example is a pair (x, f(x))
  • Problem: find a hypothesis h
  • such that h ≈ f
  • given a training set of examples
  • This is a highly simplified model of real
    learning:
  • Ignores prior knowledge
  • Assumes examples are given

8
Inductive learning
  • Construct/adjust h to agree with f on training
    set
  • h is consistent if it agrees with f on all
    examples
  • E.g., curve fitting

13
Inductive learning
  • Construct/adjust h to agree with f on training
    set
  • h is consistent if it agrees with f on all
    examples
  • E.g., curve fitting
  • Ockham's razor: prefer the simplest hypothesis
    consistent with the data
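
As a rough illustration of consistency and Ockham's razor, the sketch below searches polynomial hypotheses in order of increasing degree and returns the first one that agrees with every training example. The function names, the small integer coefficient range, and the sample points are assumptions made for this example, not part of the original slides.

```python
from itertools import product

def evaluate(coeffs, x):
    """Value of the polynomial coeffs[0] + coeffs[1]*x + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def is_consistent(coeffs, examples):
    """h is consistent if it agrees with f on all examples (x, f(x))."""
    return all(evaluate(coeffs, x) == y for x, y in examples)

def simplest_consistent(examples, max_degree=3, coeff_range=range(-3, 4)):
    """Try hypotheses from lowest to highest degree (simplest first)."""
    for degree in range(max_degree + 1):
        for coeffs in product(coeff_range, repeat=degree + 1):
            if is_consistent(coeffs, examples):
                return coeffs
    return None  # no consistent hypothesis in this restricted space

# Hypothetical training set sampled from f(x) = 2x + 1
examples = [(0, 1), (1, 3), (2, 5)]
print(simplest_consistent(examples))  # (1, 2), i.e. h(x) = 1 + 2x
```

Higher-degree polynomials such as 1 + 2x + 0x^2 are also consistent with these points; the search order is what encodes the preference for the simplest one.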

14
Logic-Inference Formulation
  • Background knowledge KB
  • Training set D (observed knowledge) that is not
    logically implied by KB
  • Inductive inference: Find h (inductive
    hypothesis) such that KB and h imply D

h = D is a trivial, but uninteresting solution
(data caching)
Inductive inference is usually not a sound inference
15
Rewarded Card Example
  • Deck of cards, with each card designated by
    [r,s], its rank and suit, and some cards
    "rewarded"
  • Background knowledge KB:
  • ((r=1) ∨ ... ∨ (r=10)) ⇔ NUM(r)
  • ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
  • ((s=S) ∨ (s=C)) ⇔ BLACK(s)
  • ((s=D) ∨ (s=H)) ⇔ RED(s)
  • Training set D:
  • REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧
    ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
  • Possible inductive hypothesis:
  • h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))

Note: There are several possible inductive
hypotheses
16
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False
    (e.g., REWARD)
  • Observable predicates A(x), B(x), ...
    (e.g., NUM, RED)
  • Training set: values of CONCEPT for some
    combinations of values of the observable
    predicates

17
A Possible Training Set
Ex.  A      B      C      D      E      CONCEPT
1    True   True   False  True   False  False
2    True   False  False  False  False  True
3    False  False  True   True   True   False
4    True   True   True   False  True   True
5    False  True   True   False  False  False
6    True   True   False  True   True   False
7    False  False  True   False  True   False
8    True   False  True   False  True   True
9    False  False  False  True   True   False
10   True   True   True   True   False  True
Note that the training set does not say whether
an observable predicate A, ..., E is pertinent or
not
18
Learning a Predicate
  • Set E of objects (e.g., cards)
  • Goal predicate CONCEPT(x), where x is an object
    in E, that takes the value True or False
    (e.g., REWARD)
  • Observable predicates A(x), B(x), ...
    (e.g., NUM, RED)
  • Training set: values of CONCEPT for some
    combinations of values of the observable
    predicates
  • Find a representation of CONCEPT in the form
  • CONCEPT(x) ⇔ S(A,B,...)
  • where S(A,B,...) is a sentence built with the
    observable predicates, e.g.
  • CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))

19
Example set
  • An example consists of the values of CONCEPT and
    the observable predicates for some object x
  • An example is positive if CONCEPT is True, else it
    is negative
  • The set X of all examples is the example set
  • The training set is a subset of X

20
Hypothesis Space
  • A hypothesis is any sentence h of the form
  • CONCEPT(x) ⇔ S(A,B,...)
  • where S(A,B,...) is a sentence built with the
    observable predicates
  • The set of all hypotheses is called the
    hypothesis space H
  • An hypothesis h agrees with an example if it
    gives the correct value of CONCEPT
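
The notion of a hypothesis agreeing with an example can be sketched in a few lines. The code below encodes the ten rows of the training set from slide 17 and checks the sample hypothesis CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) against each of them; the variable and function names are illustrative assumptions.

```python
T, F = True, False

# Rows of the training set from slide 17: (A, B, C, D, E, CONCEPT)
TRAINING_SET = [
    (T, T, F, T, F, F),
    (T, F, F, F, F, T),
    (F, F, T, T, T, F),
    (T, T, T, F, T, T),
    (F, T, T, F, F, F),
    (T, T, F, T, T, F),
    (F, F, T, F, T, F),
    (T, F, T, F, T, T),
    (F, F, F, T, T, F),
    (T, T, T, T, F, T),
]

def hypothesis(a, b, c, d, e):
    """S(A,B,...) = A and (not B or C); D and E are ignored."""
    return a and ((not b) or c)

def agrees(h, example):
    """h agrees with an example if it gives the correct value of CONCEPT."""
    *observables, concept = example
    return h(*observables) == concept

print(all(agrees(hypothesis, ex) for ex in TRAINING_SET))  # True
```

This particular hypothesis happens to agree with all ten examples while using only three of the five observable predicates.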

21
Inductive Learning Scheme
22
Size of Hypothesis Space
  • n observable predicates
  • 2^n entries in truth table
  • In the absence of any restriction (bias), there
    are 2^(2^n) hypotheses to choose from
  • n = 6 ⇒ about 2×10^19 hypotheses!
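
The count can be checked directly. This is a small arithmetic sketch; the function names are made up for illustration.

```python
def truth_table_rows(n):
    """Number of rows in a truth table over n Boolean predicates."""
    return 2 ** n

def unrestricted_hypotheses(n):
    """Each row can independently map to True or False."""
    return 2 ** truth_table_rows(n)

print(truth_table_rows(6))         # 64
print(unrestricted_hypotheses(6))  # 18446744073709551616, about 2 x 10^19
```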

23
Multiple Inductive Hypotheses
Rewarded Card Example (Continued)
h1 ≡ NUM(x) ∧ BLACK(x) ⇔ REWARD(x)
h2 ≡ BLACK([r,s]) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])
all agree with all the examples in the training set
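
A quick way to see that these four hypotheses all fit the data, yet are not equivalent, is to encode them as Boolean functions. The sketch below is one reading of h1 to h4 and of the training set from the rewarded card example; the string encoding of ranks and suits is an assumption made for this illustration.

```python
NUM_RANKS = {str(n) for n in range(1, 11)}  # ranks 1..10
BLACK_SUITS = {"S", "C"}                    # spades and clubs

# Training set D: ((rank, suit), rewarded?)
TRAINING_SET = [
    (("4", "C"), True), (("7", "C"), True), (("2", "S"), True),
    (("5", "H"), False), (("J", "S"), False),
]

def h1(r, s):
    return r in NUM_RANKS and s in BLACK_SUITS

def h2(r, s):
    return s in BLACK_SUITS and r != "J"

def h3(r, s):
    return (r, s) in {("4", "C"), ("7", "C"), ("2", "S")}

def h4(r, s):
    return (r, s) not in {("5", "H"), ("J", "S")}

def agrees_with_all(h, examples):
    return all(h(r, s) == rewarded for (r, s), rewarded in examples)

for h in (h1, h2, h3, h4):
    print(h.__name__, agrees_with_all(h, TRAINING_SET))  # True for each

# The hypotheses still differ on unseen cards, e.g. the queen of spades:
print([h("Q", "S") for h in (h1, h2, h3, h4)])  # [False, True, False, True]
```
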
24
Inductive Bias
  • Need for a system of preferences called a bias
    to compare possible hypotheses
  • Keep-It-Simple (KIS) Bias
  • If an hypothesis is too complex it may not be
    worth learning it
  • There are far fewer simple hypotheses than
    complex ones, hence the hypothesis space is
    smaller
  • Examples
  • Use far fewer observable predicates than
    suggested by the training set
  • Constrain the learnt predicate, e.g., to use only
    high-level observable predicates such as NUM,
    FACE, BLACK, and RED and/or to have simple syntax
    (e.g., conjunction of literals)

If the bias allows only sentences S that are
conjunctions of k << n predicates picked from the
n observable predicates, then the size of H is
O(n^k)
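
To make the effect of this bias concrete, the sketch below counts conjunctions of exactly k distinct predicates chosen from n. It assumes predicates are not negated, which is an assumption beyond the slide, and compares the count with the n^k bound.

```python
from math import comb

def conjunction_count(n, k):
    """Ways to pick k distinct predicates out of n (order is irrelevant)."""
    return comb(n, k)

n, k = 6, 2
print(conjunction_count(n, k))  # 15
print(n ** k)                   # 36, the O(n^k) bound
print(2 ** (2 ** n))            # 18446744073709551616 without any bias
```
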
25
Version Spaces
  • Idea: assume you are looking for a CONJUNCTIVE
    CONCEPT
  • e.g., spade A, club 7, club 9: yes
  • club 8, heart 5: no
  • concept: odd and black
  • now notice that the set of conjunctive concepts
    is partially ordered by specificity
  • at any point, keep the most specific and least
    specific conjuncts consistent with data
  • most specific:
  • anything more specific misses some positive
    instances
  • always exists -- conjoin all OK conjunctions
  • least specific:
  • anything less specific admits some negative
    instances
  • may not be unique -- imagine all you know is
    that club 4 is not OK: then odd black is OK and
    spade is OK, but black is not OK
  • Idea is to gradually merge least and most
    specific as data comes in.

[Figure: concepts ordered from least to most specific: any card; black; odd black and spade; odd spade; 3 of spade]
26
Version Spaces Example
  • Step 0: most specific concept (msc) is the empty
    set; least specific concept (lsc) is the set of
    all cards.
  • Step 1: A-spade is found to be in target set
  • msc: A-spade
  • lsc: set of all cards
  • Step 2: 7-club is found to be in target set
  • msc: odd black cards
  • lsc: set of all cards
  • Step 3: 8-heart is not in target set
  • msc: odd black cards
  • lsc: all odd cards OR all black cards
  • . . .

The training examples are obtained incrementally
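
The three steps above can be reproduced with a brute-force sketch. It is not an incremental candidate-elimination algorithm: it enumerates every conjunction over four assumed features (parity, color, suit, rank), keeps those consistent with the examples, and reads off the most and least specific ones. The feature set, the treatment of the ace as rank 1 (odd), and all names are assumptions made for this example.

```python
from itertools import product

SUITS = {"S": "black", "C": "black", "D": "red", "H": "red"}
DECK = [(rank, suit) for rank in range(1, 14) for suit in SUITS]

def features(card):
    rank, suit = card
    return ("odd" if rank % 2 else "even", SUITS[suit], suit, rank)

# One list per feature; None means "don't care" for that feature.
VALUES = [
    [None, "odd", "even"],
    [None, "black", "red"],
    [None, *SUITS],
    [None, *range(1, 14)],
]

def extension(h):
    """Set of cards covered by the conjunction h."""
    return frozenset(
        c for c in DECK
        if all(v is None or v == f for v, f in zip(h, features(c)))
    )

def version_space(examples):
    """Map each consistent extension to one conjunction describing it."""
    space = {}
    for h in product(*VALUES):
        ext = extension(h)
        if all((card in ext) == label for card, label in examples):
            space.setdefault(ext, h)
    return space

def boundaries(space):
    """Most specific and least specific consistent conjunctions."""
    msc = [h for e, h in space.items() if not any(o < e for o in space)]
    lsc = [h for e, h in space.items() if not any(o > e for o in space)]
    return msc, lsc

examples = [((1, "S"), True), ((7, "C"), True), ((8, "H"), False)]
msc, lsc = boundaries(version_space(examples))
print(msc)  # [('odd', 'black', None, None)]
print(lsc)  # [(None, 'black', None, None), ('odd', None, None, None)]
```

With these features, the result matches Step 3: the most specific concept is "odd black", and the least specific concepts are "black" and "odd".
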
27
Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
can be represented by the following decision
tree
  • Example: A mushroom is poisonous iff it is yellow
    and small, or yellow, big and spotted
  • x is a mushroom
  • CONCEPT = POISONOUS
  • A = YELLOW
  • B = BIG
  • C = SPOTTED
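
Since the tree diagram is not reproduced in this transcript, here is one possible decision tree for the mushroom predicate, written as nested tests, together with a check that it matches the logical sentence on all eight attribute combinations. The test order (yellow, then big, then spotted) is an assumption.

```python
from itertools import product

def poisonous_formula(yellow, big, spotted):
    """CONCEPT(x) <=> A(x) and (not B(x) or C(x))."""
    return yellow and ((not big) or spotted)

def poisonous_tree(yellow, big, spotted):
    """The same predicate as a decision tree: one attribute test per node."""
    if not yellow:
        return False      # not yellow: never poisonous
    if not big:
        return True       # yellow and small
    return spotted        # yellow and big: poisonous only if spotted

print(all(
    poisonous_tree(a, b, c) == poisonous_formula(a, b, c)
    for a, b, c in product([False, True], repeat=3)
))  # True
```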

28
Decision Trees
  • What is a Decision Tree
  • it takes as input the description of a situation
    as a set of attributes (features) and outputs a
    yes/no decision (so it represents a Boolean
    function)
  • each leaf is labeled "positive" or "negative",
    each node is labeled with an attribute (or
    feature), and each edge is labeled with a value
    for the feature of its parent node
  • Attribute-value language for examples
  • in many inductive tasks, especially learning
    decision trees, we need a representation language
    for examples
  • each example is a finite feature vector
  • a concept is a decision tree where nodes are
    features

29
Decision Trees
  • Example: is it a good day to play golf?
  • a set of attributes and their possible values:
  • outlook: sunny, overcast, rain
  • temperature: cool, mild, hot
  • humidity: high, normal
  • windy: true, false

A particular instance in the training set might
be <overcast, hot, normal, false>: play
In this case, the target class is a binary
attribute, so each instance represents a
positive or a negative example.
30
Using Decision Trees for Classification
  • Examples can be classified as follows
  • 1. look at the example's value for the feature
    specified
  • 2. move along the edge labeled with this value
  • 3. if you reach a leaf, return the label of the
    leaf
  • 4. otherwise, repeat from step 1
  • Example (a decision tree to decide whether to go
    play golf)

[Figure: decision tree rooted at outlook (sunny, overcast, rain), with inner nodes humidity (high, normal) and windy (true, false)]
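
The classification procedure can be sketched as a short loop over a nested tree. The tree below assumes the commonly used golf example layout (humidity under sunny, windy under rain) and assumed leaf labels, since the transcript preserves only the node and edge names.

```python
# An inner node is (attribute, {value: subtree}); a leaf is a label string.
# The placement of subtrees and the leaf labels are assumptions.
GOLF_TREE = ("outlook", {
    "sunny": ("humidity", {"high": "don't play", "normal": "play"}),
    "overcast": "play",
    "rain": ("windy", {True: "don't play", False: "play"}),
})

def classify(tree, example):
    """Follow edges matching the example's values until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree           # 1. look at the tested feature
        tree = branches[example[attribute]]  # 2. move along the matching edge
    return tree                              # 3. return the leaf label

instance = {"outlook": "overcast", "temperature": "hot",
            "humidity": "normal", "windy": False}
print(classify(GOLF_TREE, instance))  # play
```
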
31
Classification: 3-Step Process
  • 1. Model construction (Learning)
  • Each record (instance) is assumed to belong to a
    predefined class, as determined by one of the
    attributes, called the class label
  • The set of records used for construction of the
    model is called training set
  • The model is usually represented in the form of
    classification rules, (IF-THEN statements) or
    decision trees
  • 2. Model Evaluation (Accuracy)
  • Estimate accuracy rate of the model based on a
    test set
  • The known label of test sample is compared to
    classified result from model
  • Accuracy rate: percentage of test set samples
    correctly classified by the model
  • Test set must be independent of training set;
    otherwise the accuracy estimate is inflated and
    over-fitting goes undetected
  • 3. Model Use (Classification)
  • The model is used to classify unseen instances
    (assigning class labels)
  • Predict the value of an actual attribute
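
The evaluation step can be illustrated with a few lines. The rule-based model and the four test records below are hypothetical, chosen only to show how the accuracy rate is computed on a held-out test set.

```python
def accuracy(model, test_set):
    """Fraction of test samples whose predicted label matches the known label."""
    correct = sum(1 for record, label in test_set if model(record) == label)
    return correct / len(test_set)

def rule_model(record):
    """Hypothetical IF-THEN classification rule (not learned from data)."""
    if record["outlook"] == "overcast" or record["humidity"] == "normal":
        return "play"
    return "don't play"

# Hypothetical test set, kept separate from any training data
test_set = [
    ({"outlook": "overcast", "humidity": "high"}, "play"),
    ({"outlook": "sunny", "humidity": "high"}, "don't play"),
    ({"outlook": "rain", "humidity": "normal"}, "don't play"),
    ({"outlook": "sunny", "humidity": "normal"}, "play"),
]
print(accuracy(rule_model, test_set))  # 0.75
```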

32
Memory-Based Reasoning
  • Basic Idea: classify new instances based on their
    similarity to instances we have seen before
  • also called instance-based learning
  • Simplest form of MBR: Rote Learning
  • learning by memorization
  • save all previously encountered instances; given a
    new instance, find one from the memorized set
    that most closely resembles the new one; assign
    new instance to the same class as the nearest
    neighbor
  • more general methods try to find k nearest
    neighbors rather than just one
  • but, how do we define "resembles"?
  • MBR is "lazy"
  • defers all of the real work until new instance is
    obtained; no attempts are made to learn a
    generalized model from the training set
  • less data preprocessing and model evaluation, but
    more work has to be done at classification time
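
A minimal k-nearest-neighbor sketch follows. It assumes numeric feature vectors and takes Euclidean distance as the meaning of "resembles"; the memorized instances are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_classify(memory, query, k=1):
    """Label the query by majority vote among its k nearest stored instances."""
    nearest = sorted(memory, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# "Learning" is just memorization of (feature vector, class) pairs.
memory = [
    ((1.0, 1.0), "positive"), ((1.5, 2.0), "positive"), ((2.0, 1.5), "positive"),
    ((6.0, 6.0), "negative"), ((7.0, 5.5), "negative"),
]
print(knn_classify(memory, (1.2, 1.4)))       # positive
print(knn_classify(memory, (6.5, 6.0), k=3))  # negative
```

All of the work happens inside `knn_classify` at query time, which is what makes the method lazy.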

33
MBR Collaborative Filtering
  • Collaborative Filtering or Social Learning
  • idea is to give recommendations to a user based
    on the ratings of objects by other users
  • usually assumes that features in the data are
    similar objects (e.g., Web pages, music, movies,
    etc.)
  • usually requires explicit ratings of objects by
    users based on a rating scale
  • there have been some attempts to obtain ratings
    implicitly based on user behavior (mixed results;
    the problem is that implicit ratings are often
    binary)
  • Nearest Neighbors Strategy
  • Find similar users and predict the (weighted)
    average of their ratings
  • We can use any distance or similarity measure to
    compute similarity among users (user ratings on
    items viewed as a vector)
  • In case of ratings, often the Pearson r
    correlation coefficient is used to measure
    similarity
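
The nearest-neighbors strategy can be sketched as follows. The prediction here is a plain weighted average over positively correlated users, which is one simple variant among several; the user names, item names, and ratings are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length rating lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def predict(ratings, user, item):
    """Weighted average of the item's ratings by positively correlated users."""
    weighted, total = 0.0, 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        shared = [m for m in ratings[user] if m in theirs]
        if len(shared) < 2:
            continue  # too little overlap to compute a correlation
        w = pearson([ratings[user][m] for m in shared],
                    [theirs[m] for m in shared])
        if w > 0:
            weighted += w * theirs[item]
            total += w
    return weighted / total if total else None

# Hypothetical ratings on a 1-7 scale
ratings = {
    "u1": {"a": 7, "b": 6, "c": 2, "d": 7},
    "u2": {"a": 2, "b": 1, "c": 6, "d": 1},
    "new": {"a": 6, "b": 7, "c": 1},
}
print(predict(ratings, "new", "d"))
```

Here only u1 correlates positively with the new user, so the prediction is driven entirely by u1's rating of item d.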

34
MBR Collaborative Filtering
  • Collaborative Filtering Example
  • A movie rating system
  • Ratings scale: 1 = detest, 7 = love it
  • Historical DB of users includes ratings of movies
    by Sally, Bob, Chris, and Lynn
  • Karen is a new user who has rated 3 movies, but
    has not yet seen Independence Day; should we
    recommend it to her?

Will Karen like Independence Day?
35
Clustering
Clustering is a process of partitioning a set of
data (or objects) into a set of meaningful
sub-classes, called clusters
Helps users understand the natural grouping or
structure in a data set
  • Cluster
  • a collection of data objects that are similar
    to one another and thus can be treated
    collectively as one group
  • but as a collection, they are sufficiently
    different from other groups
  • Clustering
  • unsupervised classification
  • no predefined classes

36
Distance or Similarity Measures
  • Measuring Distance
  • In order to group similar items, we need a way to
    measure the distance between objects (e.g.,
    records)
  • Note: distance is the inverse of similarity
  • Often based on the representation of objects as
    feature vectors

[Tables: Term Frequencies for Documents; An Employee DB]
37
Distance or Similarity Measures
  • Common Distance Measures
  • Manhattan distance
  • Euclidean distance
  • Cosine similarity
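
The formulas for these measures are not reproduced in the transcript. The sketch below gives their standard definitions for two equal-length numeric feature vectors; the function names are chosen for this example.

```python
from math import sqrt

def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(manhattan_distance([0, 0], [3, 4]))  # 7
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(cosine_similarity([1, 0], [0, 1]))   # 0.0
```

Note that cosine is a similarity (higher means closer), while the other two are distances (lower means closer).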

38
What Is Good Clustering?
  • A good clustering will produce high quality
    clusters in which
  • the intra-class (that is, intra-cluster)
    similarity is high
  • the inter-class similarity is low
  • The quality of a clustering result also depends
    on both the similarity measure used by the method
    and its implementation
  • The quality of a clustering method is also
    measured by its ability to discover some or all
    of the hidden patterns
  • The quality of a clustering result also depends
    on the definition and representation of cluster
    chosen

39
Applications of Clustering
  • Clustering has wide applications in:
  • Pattern Recognition
  • Spatial Data Analysis
  • create thematic maps in GIS by clustering feature
    spaces
  • detect spatial clusters and explain them in
    spatial data mining
  • Image Processing
  • Market Research
  • Information Retrieval
  • Document or term categorization
  • Information visualization and IR interfaces
  • Web Mining
  • Cluster Web usage data to discover groups of
    similar access patterns
  • Web Personalization

40
Learning by Discovery
  • One example: AM by Doug Lenat at Stanford
  • a mathematical system
  • inputs: set theory (union, intersection, etc.);
    how to do mathematics (based on a book by
    Polya), e.g., if f is an interesting function of
    two arguments, then f(x,x) is an interesting
    function of one, etc.
  • speculated about what was interesting and made
    conjectures, etc.
  • What AM discovered
  • integers (as equivalence relation on cardinality
    of sets)
  • addition (using disjoint union of sets)
  • multiplication
  • primes: 1 was interesting, the function returning
    the cardinality of set of divisors was
    interesting, etc.
  • Goldbach's conjecture: all even numbers greater
    than 2 are the sum of two prime numbers (note
    that AM did not prove it, just discovered that
    it was interesting)
  • Why was AM so successful?
  • Connection between LISP and mathematics
    (mutations of small bits of LISP code are likely
    to be interesting)
  • Doesn't extend to other domains
  • Lessons from EURISKO (fleet game)

41
Explanation-Based Learning
  • Explanation-based learning (EBL) systems try to
    explain why each training instance belongs to the
    target concept.
  • The resulting proof is then generalized and
    saved.
  • If a new instance can be explained in the same
    manner as a previous instance, then it is also
    assumed to be a member of the target concept.
  • Like macro-operators, EBL systems never learn to
    solve a problem that they couldn't solve before
    (in principle).
  • However, they can become much more efficient at
    problem-solving by reorganizing the search space.
  • One of the strengths of EBL is that the resulting
    explanations are typically easy to understand.
  • One of the weaknesses of EBL is that such systems
    rely on a domain theory to generate the
    explanations.

42
Case-Based Learning
  • Case-based reasoning (CBR) systems keep track of
    previously seen instances and apply them directly
    to new ones.
  • In general, a CBR system simply stores each
    case that it experiences in a case base which
    represents its memory of previous episodes.
  • To reason about a new instance, the system
    consults its case base and finds the most similar
    case that it has seen before. The old case is then
    adapted and applied to the new situation.
  • CBR is similar to reasoning by analogy. Many
    people believe that much of human learning is
    case-based in nature.

43
Connectionist Algorithms
  • Connectionist models (also called neural
    networks) are inspired by the interconnectivity
    of the brain.
  • Connectionist networks typically consist of many
    nodes that are highly interconnected. When a node
    is activated, it sends signals to other nodes so
    that they are activated in turn.
  • Using layers of nodes allows connectionist models
    to learn fairly complex functions.
  • Neural networks are loosely modeled after the
    biological processes involved in cognition
  • 1. Information processing involves many simple
    elements called neurons.
  • 2. Signals are transmitted between neurons using
    connecting links.
  • 3. Each link has a weight that controls the
    strength of its signal.
  • 4. Each neuron applies an activation function to
    the input that it receives from other neurons.
    This function determines its output.
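
The four points above can be sketched as a single artificial neuron: a weighted sum of inputs passed through an activation function. The sigmoid and step activations, the bias term, and the AND-gate weights are assumptions chosen for illustration.

```python
from math import exp

def sigmoid(z):
    """Smooth activation that squashes any input into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def step(z):
    """Threshold activation: fire (1) only if the net input is positive."""
    return 1 if z > 0 else 0

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Each link's weight scales its signal; the activation decides the output."""
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net_input)

print(neuron_output([0, 0], [1.0, 1.0]))  # 0.5

# With hand-picked weights, one threshold neuron computes logical AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron_output([a, b], [1.0, 1.0], bias=-1.5, activation=step))
```
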