Bayesian Classification AutoClass: Theory and Results

1
Bayesian Classification AutoClass: Theory and Results
  • Authors: Peter Cheeseman and John Stutz
  • Presenter: Ivo Everts

2
TOC
  • Introduction
  • Bayesian Classification
  • Data Representation
  • Symbols Used
  • AutoClass Model Overview
  • AutoClass Search Overview
  • AutoClass in Detail
  • AutoClass Search
  • AutoClass Attribute Models
  • Case Study
  • AutoClass vs Cobweb

3
Introduction
  • Task: automatic discovery of classes from a database
  • Classes may reflect natural causal mechanisms
  • They may reflect sample bias in the data
  • They may be a new discovery
  • Finding classes is a process

4
Bayesian Classification
  • Classification
  • Find the most likely class descriptors given the
    data X and the priors

5
Data Representation
  • Instances are ordered vectors of attribute values
  • Simple attribute values

6
Symbols

7
AutoClass Model Overview
  • Probabilistic model: a pdf of functional form T
    (for example Gaussian or Bernoulli)
  • Free parameters: V

8
AutoClass Model Overview
  • AutoClass fundamental model is a classical finite
    mixture model
  • Two parts
  • Interclass mixture probability
  • Intraclass pdf
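
The two parts listed above can be illustrated with a minimal Python sketch. It assumes a single real-valued attribute and Gaussian intraclass pdfs; the function names and the numbers are hypothetical and are not taken from the presentation.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, class_params):
    """Finite mixture density: sum over classes of weight * intraclass pdf."""
    return sum(w * gaussian_pdf(x, mean, std)
               for w, (mean, std) in zip(weights, class_params))

# Hypothetical two-class example; the numbers are illustrative only.
weights = [0.3, 0.7]                     # interclass mixture probabilities, sum to 1
class_params = [(0.0, 1.0), (5.0, 2.0)]  # (mean, std) of each intraclass Gaussian
density_at_1 = mixture_pdf(1.0, weights, class_params)
```

The interclass part only says how likely each class is; the intraclass part says how likely a value is once the class is fixed.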

10
AutoClass Model Overview
  • Interclass pdf: Bernoulli
  • Class pdf is a product over attribute pdfs, for
    example Bernoulli, Gaussian, or Poisson

11
AutoClass Model Overview
  • Instances are never assigned outright to a single
    class
  • Instead, each instance is weighted by its
    probability of class membership
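
A small sketch of what such weighting could look like, assuming one real-valued attribute and Gaussian classes. The helper `membership_weights` and its inputs are hypothetical, chosen only to illustrate the idea.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def membership_weights(x, weights, class_params):
    """Probability that instance x belongs to each class (sums to 1)."""
    joint = [w * gaussian_pdf(x, mean, std)
             for w, (mean, std) in zip(weights, class_params)]
    total = sum(joint)
    return [p / total for p in joint]

# Hypothetical instance lying between two equally likely classes.
memberships = membership_weights(2.0, [0.5, 0.5], [(0.0, 1.0), (5.0, 1.0)])
```

Every instance contributes to every class, just with a different weight, so no hard assignment is ever made.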

12
AutoClass Search Overview
  • Two things to find
  • MAP parameter values for any given T
  • The most probable T
  • The number of classes

13
AutoClass in Detail
  • Classes constitute a discrete partitioning of the
    data
  • → Bernoulli distribution over class membership
  • Parameters: the mixture probabilities

14
AutoClass in Detail
  • Class pdf is a product of pdfs modelling
    conditionally independent attributes
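
As an illustration of this product, the sketch below evaluates the log density of one instance under one class, with one discrete and one Gaussian attribute. The attribute names, the dictionary layout, and `class_log_pdf` itself are assumptions made for this example.

```python
import math

def class_log_pdf(instance, discrete_probs, gaussian_params):
    """Log density of one instance under one class.

    Attributes are treated as conditionally independent given the class,
    so the class pdf is a product (a sum in log space) of attribute pdfs.
    """
    log_p = 0.0
    for name, probs in discrete_probs.items():
        # Discrete attribute: probability of the observed value.
        log_p += math.log(probs[instance[name]])
    for name, (mean, std) in gaussian_params.items():
        # Real-valued attribute: Gaussian log density.
        z = (instance[name] - mean) / std
        log_p += -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))
    return log_p

# Hypothetical class description and instance.
instance = {"color": "red", "size": 1.0}
discrete_probs = {"color": {"red": 0.5, "blue": 0.5}}
gaussian_params = {"size": (1.0, 1.0)}
log_density = class_log_pdf(instance, discrete_probs, gaussian_params)
```

Working in log space avoids numerical underflow when many attribute pdfs are multiplied together.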

15
AutoClass in Detail
  • Combine interclass and intraclass probabilities
  • Sum over classes
  • Product over instances
  • Note: similarity between instances is accounted
    for by class membership
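
The formula on this slide did not survive in the transcript. Under assumed notation (instances X_i with i = 1..I, classes C_j with j = 1..J, mixture probabilities \pi_j, class parameters V_j, class model forms T_j), the combination described above would read roughly as:

```latex
P(X \mid V, T) \;=\; \prod_{i=1}^{I} \; \sum_{j=1}^{J} \pi_j \, P(X_i \mid X_i \in C_j, V_j, T_j)
```

The inner sum mixes the intraclass pdfs with the interclass probabilities for one instance; the outer product treats the instances as independent given the model.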

16
AutoClass in Detail
  • So far, this is only a mixture model
  • Include priors on the parameter values to convert
    it to a Bayesian model
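
One way to see the effect of a prior is on the mixture probabilities themselves. The sketch below is not the presentation's formula: it assumes a symmetric Dirichlet prior with concentration 1 + `pseudo_count`, for which the MAP estimate adds `pseudo_count` to each class's summed membership weight. Both names are hypothetical.

```python
def map_mixture_weights(class_weights, pseudo_count):
    """MAP mixture probabilities under a symmetric Dirichlet prior.

    class_weights: summed membership weight of each class.
    pseudo_count:  assumed prior strength added to every class.
    """
    n_classes = len(class_weights)
    total = sum(class_weights) + n_classes * pseudo_count
    return [(w + pseudo_count) / total for w in class_weights]

# Hypothetical case: ten instances, three classes, one class empty.
estimates = map_mixture_weights([6.0, 4.0, 0.0], pseudo_count=1.0)
```

With the prior in place, the empty class keeps a small nonzero probability instead of collapsing to zero.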

17
AutoClass Search
  • We seek
  • MAP parameter values, obtained from the
    parameters' posterior pdf
  • The MAP model form
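
In symbols, and using the assumed notation X for the data, V for the parameter values and T for the model form, the two posteriors being maximized can be sketched via Bayes' rule as:

```latex
P(V \mid X, T) = \frac{P(X, V \mid T)}{P(X \mid T)}, \qquad
P(X \mid T) = \int P(X, V \mid T)\, dV, \qquad
P(T \mid X) \propto P(T)\, P(X \mid T)
```

The first expression is maximized over V for a fixed T; the last is compared across candidate model forms T.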

18
AutoClass Search
  • All this is easily evaluated for known
    parameters
  • → EM
  • Given parameter estimates, we can compute
    weighted class assignments
  • We now have a weighted known-class case
  • Alternate between estimating the weights and the
    parameter values
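
The alternation described above can be sketched for a one-dimensional Gaussian mixture. This is a simplified illustration, not AutoClass itself: it uses plain maximum-likelihood updates without priors, and the function name, starting values, and synthetic data are all assumptions.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def em_gaussian_mixture(data, means, stds, weights, n_iter=50, min_std=1e-3):
    """Alternate between weighted assignments and parameter updates."""
    means, stds, weights = list(means), list(stds), list(weights)
    n_classes = len(means)
    for _ in range(n_iter):
        # Step 1: membership weight of every instance in every class.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], stds[j])
                     for j in range(n_classes)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # Step 2: re-estimate parameters as in a weighted known-class case.
        for j in range(n_classes):
            w_j = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / w_j
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / w_j
            stds[j] = max(math.sqrt(var), min_std)
            weights[j] = w_j / len(data)
    return means, stds, weights

# Synthetic data from two well-separated Gaussians (illustrative only).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(6.0, 1.0) for _ in range(200)])
fit_means, fit_stds, fit_weights = em_gaussian_mixture(
    data, means=[-1.0, 7.0], stds=[1.0, 1.0], weights=[0.5, 0.5])
```

EM only finds a local optimum, so the result depends on the starting values; a fuller search would restart from several initial guesses.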

19
AutoClass Search
  • More math...
  • J?
  • T?

20
AutoClass Attribute Models
  • Each class model is a product of attribute models
  • Models may differ, but must model the same
    attributes

21
AutoClass Attribute Models
  • Different data, different models
  • Discrete → Bernoulli distribution
  • Real-valued scalars → log-Gaussian
  • Missing values (discrete)
  • Modeled similarly to ordinary values
  • Hierarchical
  • Specific classes inherit model term(s) from
    general classes
  • Attribute model nodes
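
For the log-Gaussian case, a minimal sketch of the density is given below. It assumes the standard log-normal form, in which ln(x) is normally distributed; the function and parameter names are hypothetical.

```python
import math

def log_gaussian_pdf(x, log_mean, log_std):
    """Density at x > 0 when ln(x) is normal with the given mean and std."""
    if x <= 0:
        raise ValueError("a log-Gaussian model only covers positive values")
    z = (math.log(x) - log_mean) / log_std
    return math.exp(-0.5 * z * z) / (x * log_std * math.sqrt(2.0 * math.pi))

# Illustrative evaluation at x = 1 with ln(x) ~ Normal(0, 1).
density = log_gaussian_pdf(1.0, 0.0, 1.0)
```

Such a model suits positive scalar quantities, where relative rather than absolute differences matter.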

22
AutoClass Attribute Models
  • Different data, different models
  • Irrelevant attributes
  • In AutoClass, an attribute is not taken into
    account by a class description if it is irrelevant
  • This is an error
  • Solutions
  • Root node of the hierarchical model
  • In the mixture model, fix an appropriate T

23
Case Study
  • Infrared Astronomical Satellite (IRAS) data
  • 5425 spectra: 100 blue channels (7-14 microns) and
    100 red channels (10-24 microns)
  • Independent, normally distributed attributes
  • Subtle distinctions between superficially similar
    spectra
  • Discovered new carbon stars

24
Case Study
  • Lessons
  • Knowledge discovery is a process
  • How to handle experts
  • Undocumented data (pre)processing

25
AutoClass vs Cobweb
  • Similarities
  • Description of classes in terms of probability
    distributions over attributes
  • General classes summarize their children (from
    AutoClass III)
  • Differences
  • Assignment of instances to classes
  • Classification of new instances
  • Clustering process (incremental sorting vs EM)