1
Boosted Augmented Naive Bayes: Efficient
discriminative learning of Bayesian network
classifiers
  • Yushi Jing
  • GVU, College of Computing, Georgia Institute of
    Technology
  • Vladimir Pavlovic
  • Department of Computer Science, Rutgers
    University
  • James M. Rehg
  • GVU, College of Computing, Georgia Institute of
    Technology

2
Contribution
  • Boosting approach to Bayesian network
    classification
    • Additive combination of simple models (e.g. Naïve
      Bayes)
    • Weighted maximum likelihood learning
    • Generalizes Boosted Naïve Bayes (Elkan 1997)
  • Comprehensive experimental evaluation of BNB
  • Boosted Augmented Naïve Bayes (BAN)
    • Efficient training algorithm
    • Competitive classification accuracy with Naïve
      Bayes, TAN, BNC (2004), and ELR (2001)

3
Bayesian network
  • Modular and intuitive graphical representation
  • Explicit probabilistic representation

Bayesian network classifiers
  • Joint distribution P(x, y)
  • Conditional distribution P(y | x)
  • Class label y
  • How can a Bayesian network be trained
    discriminatively, yet efficiently, to improve its
    classification accuracy?
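For concreteness, the quantities above look as follows for the Naïve
Bayes structure (a sketch in standard notation; the symbols x_i, y,
and N are assumptions here, not defined on the slide):

```latex
% Joint distribution factorized by the (Naive Bayes) network
P(y, x_1, \dots, x_N) = P(y) \prod_{i=1}^{N} P(x_i \mid y)

% Conditional distribution used for classification
P(y \mid x_1, \dots, x_N)
  = \frac{P(y) \prod_{i} P(x_i \mid y)}
         {\sum_{y'} P(y') \prod_{i} P(x_i \mid y')}

% Predicted class label
\hat{y} = \arg\max_{y} P(y \mid x_1, \dots, x_N)
```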

4
Parameter Learning
  • Maximum likelihood (ML) parameter learning
  • Efficient parameter learning algorithm
  • Maximizes the log-likelihood score LL_G
  • No analytic solution for the parameters that
    maximize the conditional log-likelihood CLL_G
    (see the definitions below)
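Under the usual definitions (a sketch; M training pairs (x^(m), y^(m))
and parameters θ for graph G are assumed notation, not given on the
slide), the two objectives are:

```latex
% Generative objective: log-likelihood of the data under graph G
LL_G(\theta)  = \sum_{m=1}^{M} \log P_{\theta}\bigl(x^{(m)}, y^{(m)}\bigr)

% Discriminative objective: conditional log-likelihood
CLL_G(\theta) = \sum_{m=1}^{M} \log P_{\theta}\bigl(y^{(m)} \mid x^{(m)}\bigr)
```

LL_G is maximized in closed form by frequency counts; CLL_G has no
such closed-form maximizer, which is what makes direct discriminative
training expensive.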

5
Model selection
Graph A
  • ML does not optimize CLL_A
  • ELR optimizes CLL_A (Greiner and Zhou, 2002)
  • Excellent classification accuracy, but
    computationally expensive in training
Graph B
  • ML optimizes CLL_B when the structure B is optimal
  • The BNC algorithm searches for the optimal
    structure (Grossman and Domingos, 2004)

6
Talk outline
Our goal
  • Combine parameter and structure optimization
  • Avoid over-fitting
  • Retain training efficiency
Outline
  • Minimization function for boosted Bayesian
    network classifiers
  • Empirical evaluation of Boosted Naïve Bayes
  • Boosted Augmented Naïve Bayes (BAN)
  • Empirical evaluation of BAN

7
Exponential Loss Function (ELF)
  • A boosted Bayesian network classifier minimizes the
    exponential loss function (ELF)
  • ELF_F is an upper bound on the negative conditional
    log-likelihood -CLL_F, so minimizing ELF_F
    (approximately) maximizes CLL_F (see the bound below)
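A sketch of this bound in standard boosting notation (assumptions:
binary labels y ∈ {−1, +1}, additive ensemble output F(x), and the
usual AdaBoost link between F and the class posterior):

```latex
% Exponential loss and negative conditional log-likelihood of F
ELF_F  = \sum_{m} \exp\bigl(-y^{(m)} F(x^{(m)})\bigr)
\qquad
-CLL_F = \sum_{m} \log\Bigl(1 + \exp\bigl(-2\, y^{(m)} F(x^{(m)})\bigr)\Bigr)

% Pointwise, \log(1 + e^{-2z}) \le e^{-z} for every real z,
% hence  -CLL_F \le ELF_F :
% driving the exponential loss down pushes CLL_F up.
```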

8
Minimizing ELF via ensemble method
  • Ensemble method
  • AdaBoost (population version) constructs F(x)
    additively so as to approximately minimize ELF_F
    (sketched below)
  • Discriminatively re-weights the training data at
    each round
  • Each base model is trained with tractable
    (weighted) ML learning
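A minimal sketch of this loop (a hypothetical illustration, not the
authors' code: it assumes numpy arrays, labels in {-1, +1}, and
scikit-learn's GaussianNB as the weighted-ML base learner, whereas
the talk uses discrete Naïve Bayes and augmented structures):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # stand-in weighted-ML base learner

def boosted_naive_bayes(X, y, T=20):
    """Discrete AdaBoost with a weighted-ML Naive Bayes base learner.

    X: (M, N) numpy array, y: (M,) numpy array with labels in {-1, +1}.
    Returns a list of (alpha, model) pairs defining F(x) = sum_t alpha_t h_t(x).
    """
    m = len(y)
    w = np.full(m, 1.0 / m)               # data weights, updated discriminatively
    ensemble = []
    for _ in range(T):
        nb = GaussianNB()
        nb.fit(X, y, sample_weight=w)     # tractable weighted-ML parameter learning
        pred = nb.predict(X)
        err = w[pred != y].sum() / w.sum()
        if err == 0.0:                    # perfect weak learner: keep it and stop
            ensemble.append((1.0, nb))
            break
        if err >= 0.5:                    # no better than chance: stop boosting
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        ensemble.append((alpha, nb))
        w *= np.exp(-alpha * y * pred)    # up-weight misclassified examples
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    """Classify by the sign of the additive ensemble output F(x)."""
    F = sum(alpha * nb.predict(X) for alpha, nb in ensemble)
    return np.where(F >= 0, 1, -1)
```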

9
Results: 25 UCI datasets (BNB)
BNB vs. NB: 0.151 vs. 0.173 (average test error)
10
Results: 25 UCI datasets (BNB)
[Scatter plots of per-dataset test error; parenthesized counts give
the number of datasets on which each method wins]
  • BNB vs. NB: 0.151 vs. 0.173
  • BNB vs. TAN: 0.151 vs. 0.184
  • BNB vs. ELR-NB: 0.151 vs. 0.161
  • BNB vs. BNC-2P: 0.151 vs. 0.164
11
Evaluation of BNB
  • Computationally efficient method
    • O(MNT) for M samples, N features, and T boosting
      rounds; with T ≈ 5–20 this is close to O(MN)
  • Good classification accuracy
    • Outperforms NB, TAN
    • Competitive with ELR, BNC
  • Sparse structure plus boosting gives competitive
    accuracy
  • Potential drawback
    • Strongly correlated features (Corral, etc.)

12
Structure Learning
  • Challenge
    • Efficiency: structure learning is NP-hard
    • K2 and hill-climbing search still examine a
      polynomial number of structures
    • Resisting overfitting: structure controls
      classifier capacity
  • Our proposed solution
    • Combine sparse models to form an ensemble
    • Constrain edge selection

13
Creating G_tree
  • Step 1 (Friedman et al. 1999) — a code sketch
    follows below
    • Build the pair-wise conditional mutual information
      table
    • Create a maximum spanning tree using conditional
      mutual information as edge weights
    • Convert the undirected tree into a directed graph

[Figure: example tree over feature nodes 1–4]
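A sketch of this construction for discrete features (hypothetical
helper code, not the authors' implementation; the function names
cond_mutual_info and build_g_tree are made up here):

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(xi, xj, y):
    """I(Xi; Xj | Y) for discrete-valued 1-D arrays, in nats."""
    cmi = 0.0
    for c in np.unique(y):
        mask = (y == c)
        p_c = mask.mean()
        a, b = xi[mask], xj[mask]
        for u in np.unique(a):
            for v in np.unique(b):
                p_uv = np.mean((a == u) & (b == v))
                p_u, p_v = np.mean(a == u), np.mean(b == v)
                if p_uv > 0:
                    cmi += p_c * p_uv * np.log(p_uv / (p_u * p_v))
    return cmi

def build_g_tree(X, y):
    """Directed tree edges (parent, child): maximum spanning tree over
    the CMI weights (Prim's algorithm), rooted arbitrarily at feature 0."""
    n_feat = X.shape[1]
    W = np.zeros((n_feat, n_feat))
    for i, j in combinations(range(n_feat), 2):
        W[i, j] = W[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < n_feat:
        # greedily add the heaviest CMI edge leaving the current tree
        i, j = max(((i, j) for i in in_tree for j in range(n_feat)
                    if j not in in_tree), key=lambda e: W[e])
        edges.append((i, j))          # direct the edge away from the root
        in_tree.add(j)
    return edges
```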
14
Initial structure
  • Select Naïve Bayes
  • Create BNB via AdaBoost
  • Evaluate BNB

[Figure: initial Naïve Bayes structure over feature nodes 1–4]
15
Iteratively adding edges (a sketch of this search loop follows this
slide)
[Figure: candidate augmented structure over feature nodes 1–4;
ensemble CLL = -0.75]
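A hypothetical sketch of the edge-addition search described on slides
13–16 (the stopping rule and the helper functions boost and
ensemble_cll are assumptions for illustration, not taken from the
talk):

```python
# Hypothetical sketch of the BAN structure search: start from Naive
# Bayes, consider edges of G_tree one at a time, boost each candidate
# structure with weighted-ML learning, and keep an edge only if the
# ensemble conditional log-likelihood (CLL) improves.

def learn_ban_structure(X, y, tree_edges, boost, ensemble_cll):
    """tree_edges: candidate augmenting edges (e.g. ordered by CMI weight).
    boost(edges, X, y)      -> AdaBoost ensemble for that structure.
    ensemble_cll(ens, X, y) -> conditional log-likelihood of the ensemble.
    """
    structure = []                        # [] == plain Naive Bayes (BNB)
    best_ens = boost(structure, X, y)
    best_cll = ensemble_cll(best_ens, X, y)
    for edge in tree_edges:
        candidate = structure + [edge]
        ens = boost(candidate, X, y)
        cll = ensemble_cll(ens, X, y)
        if cll > best_cll:                # keep the edge only if CLL improves
            structure, best_ens, best_cll = candidate, ens, cll
    return structure, best_ens
```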
16
Final BAN structure
  • Ensemble of the final augmented structure, produced
    by AdaBoost
17
Analysis of BAN
  • The BAN base structure is sparser than the BNC model
  • BAN uses an ensemble of sparser models to
    approximate a densely connected structure

[Figures: example BAN model vs. example BNC-2P model]
18
Computational complexity of BAN
  • Training complexity: O(MN² + MNTS)
    • O(MN²): constructing G_tree
    • O(MNTS): structure search
    • T: boosting iterations per structure
    • S: number of structures examined, with S < N
  • Empirical training time (a rough worked instance
    follows)
    • T ≈ 5–25, S ≈ 0–5
    • Approximately 25–100 times the training time of NB
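A rough worked instance of these costs (my arithmetic, using the upper
end of the ranges quoted above; one NB fit is taken as O(MN)):

```latex
T = 25,\ S = 5:\quad
O(MN^2) + O(MN \cdot T \cdot S)
  \;=\; O(MN^2) \;+\; 125 \cdot O(MN)
```

i.e. on the order of a hundred weighted NB fits plus the O(MN²)
conditional-mutual-information table, consistent with the quoted
25–100× NB training time.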

19
Results (simulated dataset)
[Figures: true structure vs. Naïve Bayes structure]
  • 25 different distributions
    • Differing CPT entries
    • Differing numbers of features
  • 4000 samples, 5-fold cross validation

20
Results (simulated dataset)
  • BAN vs. NB
[Scatter plot of per-distribution error: BAN wins 19, NB wins 0, the
remaining 6 are not significantly different]

21
Results (simulated dataset)
  • BAN vs. BNB
  • Correct edges are added under BAN
[Scatter plot against the true structure: BNB achieves the optimal
error on 22 of the 25 datasets; BAN outperforms BNB on the
remaining 3]
22
Results: 25 UCI datasets (BAN)
  • Standard datasets for Bayesian network
    classifiers
    • Friedman et al. 1999
    • Greiner and Zhou 2002
    • Grossman and Domingos 2004
  • 5-fold cross validation
  • Implemented NB, TAN, BAN, BNB, BNC-2P
  • Obtained published results for ELR-NB and ELR-TAN

23
Results: BAN vs. standard methods
[Scatter plots of per-dataset error: BAN wins 10 vs. NB (2 wins) and
10 vs. TAN (2 wins); the remaining datasets are not significantly
different]
  • BAN vs. NB: 0.141 vs. 0.173
  • BAN vs. TAN: 0.141 vs. 0.184
24
Results: BAN vs. structure learning
[Scatter plot of per-dataset error: BAN wins 7, BNC wins 1]
  • BAN vs. BNC-2P: 0.141 vs. 0.164
  • BAN contains 0–5 augmented edges; BNC-2P contains
    4–16 augmented edges
25
Results: BAN vs. ELR
[Scatter plots of per-dataset error; parenthesized counts give the
number of datasets won by each method]
  • BAN vs. ELR-TAN: 0.141 vs. 0.155
  • BAN vs. ELR-NB: 0.141 vs. 0.161
  • ELR error statistics are taken directly from the
    published results
  • BAN is more efficient to train
26
Evaluation of BAN vs. BNB
  • Comparison under a significance test
    • BAN outperforms BNB on 7 datasets (e.g. Corral),
      by margins of 2–5
    • BNB outperforms BAN on 2 datasets, by margins of
      0.5–2
    • Not significant on 13 datasets
    • BAN chooses BNB as the base structure on the
      remaining datasets (IRIS, MOFN)
  • Average testing error
    • BAN vs. BNB: 0.141 vs. 0.151
    • BAN outperforms BNB on 16 datasets
    • BNB outperforms BAN on 6 datasets
[Scatter plot of per-dataset error: BAN (7), BNB (2), 14 not
significantly different]

27
Conclusion
  • An ensemble of sparse models as an alternative to
    structure and parameter optimization
  • Simple to implement
  • Very efficient in training
  • Competitive classification accuracy
    • NB, TAN, HGC
    • BNC
    • ELR

28
Future Work
  • Extend BAN to handle sequential data
  • Analyze the class of Bayesian network classifiers
    that can be approximated with an ensemble of
    sparse structures.
  • Can the BAN model parameters be obtained through
    parameter learning given the final model
    structure?
  • Can we use the BAN approach to learn generative
    models?