1
Lecture 3
Data Mining Basics
Monday, January 22, 2001
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
2
Lecture Outline
  • Read Chapters 1-2, Witten and Frank; Sections 2.7-2.8, Mitchell
  • Homework 1: Due Friday, February 2, 2001 (before 12 AM CST)
  • Paper Commentary 1: Due This Friday (in class)
  • U. Fayyad, "From Data Mining to Knowledge Discovery"
  • See guidelines in course notes
  • Supervised Learning (continued)
  • Version spaces
  • Candidate elimination algorithm
  • Derivation
  • Examples
  • The Need for Inductive Bias
  • Representations (hypothesis languages): a worst-case scenario
  • Change of representation
  • Computational Learning Theory

3
Representing Version Spaces
  • Hypothesis Space
  • A finite meet semilattice (partial ordering: Less-Specific-Than; the all-? hypothesis ≤ every h)
  • Every pair of hypotheses has a greatest lower bound (GLB)
  • VS_{H,D} ≡ the consistent poset (partially ordered subset of H)
  • Definition: General Boundary
  • General boundary G of version space VS_{H,D}: set of most general members
  • Most general ≡ minimal elements of VS_{H,D} ≡ set of necessary conditions
  • Definition: Specific Boundary
  • Specific boundary S of version space VS_{H,D}: set of most specific members
  • Most specific ≡ maximal elements of VS_{H,D} ≡ set of sufficient conditions
  • Version Space
  • Every member of the version space lies between S and G
  • VS_{H,D} ≡ { h ∈ H | ∃ s ∈ S . ∃ g ∈ G . g ≤_P h ≤_P s }, where ≤_P ≡ Less-Specific-Than
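A minimal sketch (not from the original slides) of how conjunctive hypotheses and the Less-Specific-Than test might be coded; the tuple encoding, with '?' for don't-care and '0' standing in for Ø, is an assumption that the later sketches reuse:

```python
def matches(h, x):
    """True iff hypothesis h covers instance x ('0' constraints match nothing)."""
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """Less-Specific-Than test: h1 <=_P h2, i.e., h1 covers everything h2 covers
    (standard per-attribute check; the Ø corner case is ignored in this sketch)."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))
```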

4
Candidate Elimination Algorithm 1
1. Initialization
   G ← (singleton) set containing the most general hypothesis in H, denoted <?, …, ?>
   S ← set of most specific hypotheses in H, denoted <Ø, …, Ø>
2. For each training example d
   If d is a positive example (Update-S)
      Remove from G any hypotheses inconsistent with d
      For each hypothesis s in S that is not consistent with d
         Remove s from S
         Add to S all minimal generalizations h of s such that
            1. h is consistent with d
            2. Some member of G is more general than h
         (These are the greatest lower bounds, or meets, s ∧ d, in VS_{H,D})
      Remove from S any hypothesis that is more general than another hypothesis in S
      (remove any dominated elements)
5
Candidate Elimination Algorithm 2
(continued)
   If d is a negative example (Update-G)
      Remove from S any hypotheses inconsistent with d
      For each hypothesis g in G that is not consistent with d
         Remove g from G
         Add to G all minimal specializations h of g such that
            1. h is consistent with d
            2. Some member of S is more specific than h
         (These are the least upper bounds, or joins, g ∨ d, in VS_{H,D})
      Remove from G any hypothesis that is less general than another hypothesis in G
      (remove any dominating elements)
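The two update steps can be combined into one routine. Below is a hedged sketch for conjunctive hypotheses, reusing the matches / more_general_or_equal helpers sketched under slide 3; all function names are illustrative, not from the slides:

```python
def minimal_generalization(s, x):
    """Meet s ^ d: least generalization of s that also covers instance x."""
    return tuple(xv if sv == '0' else (sv if sv == xv else '?')
                 for sv, xv in zip(s, x))

def minimal_specializations(g, x, domains):
    """Joins g v d: minimal specializations of g that exclude instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    """Maintain the S and G boundary sets over (instance, label) examples."""
    n = len(domains)
    G = [('?',) * n]           # most general boundary
    S = [('0',) * n]           # most specific boundary
    for x, positive in examples:
        if positive:           # Update-S
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                else:
                    h = minimal_generalization(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            # drop members of S more general than another member (dominated)
            S = [s for s in new_S
                 if not any(s2 != s and more_general_or_equal(s, s2)
                            for s2 in new_S)]
        else:                  # Update-G
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                else:
                    for h in minimal_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # drop members of G less general than another member (dominating)
            G = [g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)]
    return S, G
```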
6
Example Trace
d1: <Sunny, Warm, Normal, Strong, Warm, Same, Yes>
d2: <Sunny, Warm, High, Strong, Warm, Same, Yes>
d3: <Rainy, Cold, High, Strong, Warm, Change, No>
d4: <Sunny, Warm, High, Strong, Cool, Change, Yes>
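Feeding this trace to the candidate_elimination sketch above should recover the boundaries of Mitchell's Chapter 2 trace; the unused attribute values in domains below are assumptions and do not affect the result:

```python
domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),   # d1
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),   # d2
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),  # d3
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),   # d4
]
S, G = candidate_elimination(examples, domains)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```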
7
What Next Training Example?
  • What Query Should The Learner Make Next?
  • How Should These Be Classified? (a voting sketch follows the list)
  • <Sunny, Warm, Normal, Strong, Cool, Change>
  • <Rainy, Cold, Normal, Light, Warm, Same>
  • <Sunny, Warm, Normal, Light, Warm, Same>
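One way to answer the classification question: predict only when every member of the version space agrees. Since each member lies between some s ∈ S and some g ∈ G, checking the boundaries suffices. A sketch, assuming the matches helper from the slide-3 sketch:

```python
def classify(x, S, G):
    """Unanimous-vote prediction over the version space."""
    if all(matches(s, x) for s in S):
        return True        # every consistent hypothesis covers x
    if not any(matches(g, x) for g in G):
        return False       # no consistent hypothesis covers x
    return None            # version space is split: abstain or query

# With S, G from the slide-6 trace, the three instances above come out
# True, False, and None (split vote), respectively.
```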

8
What Justifies This Inductive Leap?
  • Example: Inductive Generalization
  • Positive example: <Sunny, Warm, Normal, Strong, Cool, Change, Yes>
  • Positive example: <Sunny, Warm, Normal, Light, Warm, Same, Yes>
  • Induced S: <Sunny, Warm, Normal, ?, ?, ?> (computed in the sketch below)
  • Why Believe We Can Classify The Unseen?
  • e.g., <Sunny, Warm, Normal, Strong, Warm, Same>
  • When is there enough information (in a new case) to make a prediction?
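A quick check of the induced boundary, reusing the (illustrative) minimal_generalization helper from the algorithm sketch:

```python
p1 = ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
p2 = ('Sunny', 'Warm', 'Normal', 'Light',  'Warm', 'Same')
s = minimal_generalization(minimal_generalization(('0',) * 6, p1), p2)
print(s)   # ('Sunny', 'Warm', 'Normal', '?', '?', '?') -- the induced S above
```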

9
An Unbiased Learner
  • Example of A Biased H
  • Conjunctive concepts with don't cares
  • What concepts can H not express? (Hint: what are its syntactic limitations?)
  • Idea
  • Choose H' that expresses every teachable concept
  • i.e., H' is the power set of X
  • Recall: | A → B | = | B |^| A | (here A ≡ X, B ≡ labels, H' ≡ A → B)
  • {Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change} → {0, 1}
  • An Exhaustive Hypothesis Language
  • Consider H' = disjunctions (∨), conjunctions (∧), negations (¬) over previous H
  • | H' | = 2^(2 · 2 · 2 · 3 · 2 · 2) = 2^96; | H | = 1 + (3 · 3 · 3 · 4 · 3 · 3) = 973 (a numeric check follows below)
  • What Are S, G For The Hypothesis Language H'?
  • S ← disjunction of all positive examples
  • G ← conjunction of all negated negative examples
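The counting argument above, checked numerically (a sketch; the domain sizes are read off the product of attribute value sets):

```python
from math import prod

domain_sizes = [2, 2, 2, 3, 2, 2]       # value counts for the six attributes
n_instances = prod(domain_sizes)         # |X| = 96
unbiased = 2 ** n_instances              # |H'| = |power set of X| = 2**96
conjunctive = 1 + prod(d + 1 for d in domain_sizes)  # value-or-'?' per attribute, plus Ø
print(n_instances, conjunctive)          # 96 973
```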

10
Inductive Bias
  • Components of An Inductive Bias Definition
  • Concept learning algorithm L
  • Instances X, target concept c
  • Training examples D_c = {<x, c(x)>}
  • L(x_i, D_c): classification assigned to instance x_i by L after training on D_c
  • Definition
  • The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples D_c, ∀ x_i ∈ X . (B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c), where A ⊢ B means A logically entails B
  • Informal idea: preference for (i.e., restriction to) certain hypotheses by structural (syntactic) means
  • Rationale
  • Prior assumptions regarding target concept
  • Basis for inductive generalization (an example follows below)
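For concreteness, Mitchell (Ch. 2) gives the inductive bias of the candidate elimination algorithm as a single assertion, expressible as:

```latex
B = \{\, c \in H \,\}
% i.e., the assumption that the target concept c is representable
% in the conjunctive hypothesis space H
```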

11
Inductive Systems and Equivalent Deductive Systems
12
Three Learners with Different Biases
  • Rote Learner
  • Weakest bias: anything seen before, i.e., no bias
  • Store examples
  • Classify x if and only if it matches a previously observed example
  • Version Space Candidate Elimination Algorithm
  • Stronger bias: concepts belonging to conjunctive H
  • Store extremal generalizations and specializations
  • Classify x if and only if it falls within S and G boundaries (all members agree)
  • Find-S
  • Even stronger bias: most specific hypothesis
  • Prior assumption: any instance not observed to be positive is negative
  • Classify x based on S set (a Find-S sketch follows below)
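A minimal Find-S sketch (reusing the illustrative minimal_generalization helper above); it tracks only the most specific consistent hypothesis and, consistent with its bias, never consults negative examples:

```python
def find_s(examples, n_attrs):
    """Generalize the most specific hypothesis just enough per positive example."""
    s = ('0',) * n_attrs
    for x, positive in examples:
        if positive:
            s = minimal_generalization(s, x)
    return s

# On the slide-6 trace: find_s(examples, 6) == ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```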

13
Hypothesis Space: A Syntactic Restriction
  • Recall: 4-Variable Concept Learning Problem
  • Bias: Simple Conjunctive Rules
  • Only 16 simple conjunctive rules of the form y = x_i ∧ x_j ∧ x_k (enumerated below)
  • y = Ø, x1, …, x4, x1 ∧ x2, …, x3 ∧ x4, x1 ∧ x2 ∧ x3, …, x2 ∧ x3 ∧ x4, x1 ∧ x2 ∧ x3 ∧ x4
  • Example above: no simple rule explains the data (counterexamples?)
  • Similarly for simple clauses (conjunction and disjunction allowed)
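A quick enumeration confirming the count of 16 (a sketch; the k = 0 case is the empty conjunction, standing in for the y = Ø entry above):

```python
from itertools import combinations

variables = range(4)   # x1..x4, zero-indexed
rules = [c for k in range(5) for c in combinations(variables, k)]
print(len(rules))      # 16 = 1 + 4 + 6 + 4 + 1

def conjunctive_rule(conj, x):
    """y = x_i ^ x_j ^ ... over a boolean instance x."""
    return all(x[i] for i in conj)
```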

14
Hypothesis Space: m-of-n Rules
  • m-of-n Rules
  • 32 possible rules of the form "y = 1 iff at least m of the following n variables are 1" (an evaluator is sketched below)
  • Found A Consistent Hypothesis!
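A sketch of how an m-of-n rule might be evaluated; the function name and index encoding are illustrative:

```python
def m_of_n(m, idxs, x):
    """y = 1 iff at least m of the variables indexed by idxs are 1 in x."""
    return int(sum(x[i] for i in idxs) >= m)

print(m_of_n(2, (0, 2, 3), (1, 0, 1, 0)))   # 2-of-{x1, x3, x4}: 1 + 1 + 0 >= 2 -> 1
```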

15
Views of Learning
  • Removal of (Remaining) Uncertainty
  • Suppose the unknown function was known to be an m-of-n Boolean function
  • Could use training data to infer the function
  • Learning and Hypothesis Languages
  • Possible approach: guess a good, small hypothesis language (a sketch follows below)
  • Start with a very small language
  • Enlarge until it contains a hypothesis that fits the data
  • Inductive bias
  • Preference for certain languages
  • Analogous to data compression (removal of redundancy)
  • Later: coding the model versus coding the uncertainty (error)
  • We Could Be Wrong!
  • Prior knowledge could be wrong (e.g., y = x4 ∧ one-of(x1, x3) is also consistent)
  • If the guessed language was wrong, errors will occur on new cases
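A sketch of the enlarge-until-fit strategy described above; all names are illustrative, and languages is assumed ordered from most to least restrictive:

```python
def first_consistent_fit(languages, data, consistent):
    """Return the first hypothesis that fits the data, searching languages
    in order of increasing expressiveness (weakest adequate bias wins)."""
    for H in languages:            # e.g., [conjunctions, m_of_n_rules, ...]
        for h in H:
            if consistent(h, data):
                return h           # found in the smallest adequate language
    return None                    # no listed language explains the data
```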

16
Two Strategies for Machine Learning
  • Develop Ways to Express Prior Knowledge
  • Role of prior knowledge: guides search for hypotheses / hypothesis languages
  • Expression languages for prior knowledge
  • Rule grammars; stochastic models; etc.
  • Restrictions on computational models; other (formal) specification methods
  • Develop Flexible Hypothesis Spaces
  • Structured collections of hypotheses
  • Agglomeration: nested collections (hierarchies)
  • Partitioning: decision trees, lists, rules
  • Neural networks; cases; etc.
  • Hypothesis spaces of adaptive size
  • In Either Case: Develop Algorithms for Finding A Hypothesis That Fits Well
  • Ideally, one that will generalize well
  • Later: Bias Optimization (Meta-Learning, Wrappers)

17
Terminology
  • The Version Space Algorithm
  • Version space: constructive definition
  • S and G boundaries characterize the learner's uncertainty
  • Version space can be used to make predictions over unseen cases
  • Algorithms: Find-S, List-Then-Eliminate, candidate elimination
  • Consistent hypothesis: one that correctly predicts observed examples
  • Version space: space of all currently consistent (or satisfiable) hypotheses
  • Inductive Bias
  • Strength of inductive bias: how few hypotheses?
  • Specific biases: based on specific languages
  • Hypothesis Language
  • Searchable subset of the space of possible descriptors
  • m-of-n, conjunctive, disjunctive, clauses
  • Ability to represent a concept

18
Summary Points
  • Introduction to Supervised Concept Learning
  • Inductive Leaps Possible Only if Learner Is
    Biased
  • Futility of learning without bias
  • Strength of inductive bias proportional to
    restrictions on hypotheses
  • Modeling Inductive Learners with Equivalent
    Deductive Systems
  • Representing inductive learning as theorem
    proving
  • Equivalent learning and inference problems
  • Syntactic Restrictions
  • Example: m-of-n concept
  • Other examples?
  • Views of Learning and Strategies
  • Removing uncertainty (data compression)
  • Role of knowledge
  • Next Lecture: More on Knowledge Discovery in Databases (KDD)