Title: Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers
1. Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers
- Yushi Jing, GVU, College of Computing, Georgia Institute of Technology
- Vladimir Pavlovic, Department of Computer Science, Rutgers University
- James M. Rehg, GVU, College of Computing, Georgia Institute of Technology
2. Contribution
- Boosting approach to Bayesian network classification
  - Additive combination of simple models (e.g. Naïve Bayes)
  - Weighted maximum likelihood learning
  - Generalizes Boosted Naïve Bayes (Elkan 1997)
- Comprehensive experimental evaluation of BNB
- Boosted Augmented Naïve Bayes (BAN)
  - Efficient training algorithm
  - Competitive classification accuracy vs. Naïve Bayes, TAN, BNC (2004), ELR (2001)
3. Bayesian network
- Modular and intuitive graphical representation
- Explicit probabilistic representation
Bayesian network classifiers
- Joint distribution
- Conditional distribution
- Class label
- How can a Bayesian network be trained efficiently and discriminatively to improve its classification accuracy?
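For reference, the factorization and classification rule these bullets refer to, in standard notation (my own summary, not text from the slides; for Naïve Bayes each feature's only parent is the class, while TAN/BAN also allow one feature parent):

```latex
% Joint distribution factorized by the network structure
P(C, X_1, \dots, X_N) = P(C) \prod_{i=1}^{N} P\big(X_i \mid \mathrm{Pa}(X_i)\big)

% Conditional distribution used for classification
P(C \mid X_1, \dots, X_N) = \frac{P(C, X_1, \dots, X_N)}{\sum_{c'} P(c', X_1, \dots, X_N)}

% Predicted class label
\hat{c} = \arg\max_{c} P(c \mid x_1, \dots, x_N)
```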
4. Parameter Learning
- Maximum Likelihood (ML) parameter learning
  - Efficient parameter learning algorithm
  - Maximizes the log likelihood (LL) score
- No analytic solution for the parameters that maximize the conditional log likelihood (CLL)
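For concreteness, the two scores being contrasted, written for training data D = {(x^m, c^m)}, m = 1..M (my notation): the first has a closed-form ML solution via frequency counts, the second does not.

```latex
% Log likelihood (LL): maximized in closed form by frequency counts
\mathrm{LL}(\theta) = \sum_{m=1}^{M} \log P_\theta(c^m, x^m)

% Conditional log likelihood (CLL): the classification-relevant score,
% with no analytic maximizer in general
\mathrm{CLL}(\theta) = \sum_{m=1}^{M} \log P_\theta(c^m \mid x^m)
                     = \sum_{m=1}^{M} \log \frac{P_\theta(c^m, x^m)}{\sum_{c'} P_\theta(c', x^m)}
```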
5. Model selection
Approach A: parameter optimization
- ML does not optimize CLL
- ELR optimizes CLL directly (Greiner and Zhou, 2002)
- Excellent classification accuracy, but computationally expensive in training
Approach B: structure optimization
- ML optimizes CLL when the structure is optimal
- The BNC algorithm searches for the optimal structure (Grossman and Domingos, 2004)
6. Talk outline
Our Goal
- Combine parameter and structure optimization
- Avoid over-fitting
- Retain training efficiency
- Minimization function for the Boosted Bayesian network
- Empirical Evaluation of Boosted Naïve Bayes
- Boosted Augmented Naïve Bayes (BAN)
- Empirical Evaluation of BAN
7. Exponential Loss Function (ELF)
- A boosted Bayesian network classifier minimizes the ELF.
- The ELF is an upper bound of the negative CLL.
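A sketch of why this bound makes minimizing ELF a sensible surrogate, in the binary case with labels y in {-1, +1} and ensemble output F(x); this is the standard argument restated, not text from the slides:

```latex
% Logistic parametrization of the posterior via the ensemble output F
P(y \mid x) = \frac{1}{1 + e^{-y F(x)}}

% Negative conditional log likelihood of one example
-\log P(y \mid x) = \log\big(1 + e^{-y F(x)}\big)

% Since \log(1 + u) \le u for u \ge 0, the exponential loss upper-bounds it:
\log\big(1 + e^{-y F(x)}\big) \le e^{-y F(x)}

% Hence minimizing \sum_m e^{-y^m F(x^m)} (the ELF) maximizes a lower bound on the CLL.
```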
8. Minimizing ELF via ensemble method
- Ensemble method
  - AdaBoost (population version) constructs F(x) additively to approximately minimize the ELF
  - Discriminatively updates the data weights
  - Tractable ML learning trains the parameters of each weak learner
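A minimal sketch of this boosting loop (discrete AdaBoost with a weighted-ML Naïve Bayes weak learner). The `make_weak_learner` factory and its fit/predict interface are assumptions for illustration, not the authors' code:

```python
import numpy as np

def boosted_naive_bayes(X, y, T, make_weak_learner):
    """Discrete AdaBoost with a weighted-ML Naive Bayes weak learner.

    X: (M, N) array of discrete features; y: (M,) labels in {-1, +1}.
    make_weak_learner() must return an object exposing
    fit(X, y, sample_weight) and predict(X) -> {-1, +1} (assumed interface).
    """
    M = len(y)
    w = np.full(M, 1.0 / M)              # start with uniform data weights
    ensemble = []                        # list of (alpha_t, weak learner h_t)
    for _ in range(T):
        h = make_weak_learner()
        h.fit(X, y, sample_weight=w)     # tractable weighted ML learning
        pred = h.predict(X)
        err = np.sum(w * (pred != y))    # weighted training error
        if err == 0.0 or err >= 0.5:     # perfect or no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        w *= np.exp(-alpha * y * pred)   # discriminative re-weighting of examples
        w /= w.sum()
    return ensemble

def predict_ensemble(ensemble, X):
    """Sign of the additive combination F(x) = sum_t alpha_t * h_t(x)."""
    F = sum(alpha * h.predict(X) for alpha, h in ensemble)
    return np.sign(F)
```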
9. Results: 25 UCI datasets (BNB)
- BNB vs. NB: 0.151 vs. 0.173 (average test error)
10. Results: 25 UCI datasets (BNB)
[Per-dataset test-error scatter plots; parenthesized counts give the number of datasets each method wins, the remainder show no significant difference]
- BNB vs. NB: 0.151 vs. 0.173 (BNB 10, NB 2, 13 not significant)
- BNB vs. TAN: 0.151 vs. 0.184 (BNB 9, TAN 2, 14 not significant)
- BNB vs. ELR-NB: 0.151 vs. 0.161 (BNB 5, ELR-NB 4, 16 not significant)
- BNB vs. BNC-2P: 0.151 vs. 0.164 (BNB 7, BNC-2P 3, 15 not significant)
11. Evaluation of BNB
- Computationally efficient method
  - Training cost O(MNT); with T ≈ 5-20 this is effectively O(MN)
- Good classification accuracy
  - Outperforms NB, TAN
  - Competitive with ELR, BNC
  - Sparse structure plus boosting gives competitive accuracy
- Potential drawbacks
  - Strongly correlated features (Corral, etc.)
12. Structure Learning
- Challenges
  - Efficiency
    - NP-hard problem
    - K2 and hill-climbing search still examine a polynomial number of structures
  - Resisting overfitting
    - Structure controls classifier capacity
- Our proposed solution
  - Combines sparse models to form an ensemble
  - Constrains edge selection
13. Creating G_tree
- Step 1 (Friedman et al. 1999)
  - Build a pair-wise conditional mutual information table
  - Create a maximum spanning tree using conditional mutual information as the edge weight
  - Convert the undirected tree into a directed graph
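A sketch of these three steps for discrete data, using empirical counts for the conditional mutual information and Prim's algorithm for the maximum spanning tree; the function names and the choice of feature 0 as the root are my own illustration, not taken from the paper:

```python
import numpy as np
from itertools import combinations

def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) for discrete variables."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = (c == cv)
        p_c = mask.mean()
        a, b = xi[mask], xj[mask]
        for av in np.unique(a):
            for bv in np.unique(b):
                p_ab = np.mean((a == av) & (b == bv))
                if p_ab > 0:
                    p_a, p_b = np.mean(a == av), np.mean(b == bv)
                    cmi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def build_g_tree(X, c):
    """Maximum spanning tree over features, weighted by I(Xi; Xj | C)."""
    N = X.shape[1]
    W = np.zeros((N, N))                     # pair-wise CMI table
    for i, j in combinations(range(N), 2):
        W[i, j] = W[j, i] = conditional_mutual_information(X[:, i], X[:, j], c)
    in_tree, edges = {0}, []                 # root the directed tree at feature 0
    while len(in_tree) < N:                  # Prim's algorithm, maximum-weight variant
        i, j = max(((u, v) for u in in_tree for v in range(N) if v not in in_tree),
                   key=lambda e: W[e])
        edges.append((i, j))                 # edge directed away from the root
        in_tree.add(j)
    return edges                             # candidate (parent, child) augmenting edges
```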
14. Initial structure
- Select Naïve Bayes
- Create BNB via AdaBoost
- Evaluate BNB
15. Iteratively adding edges
[Figure: candidate edge added to the structure; ensemble CLL = -0.75]
16. Final BAN structure
[Figure: ensemble built on the final structure produced by the iterative search]
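Putting slides 14-16 together, a sketch of the greedy structure search. `fit_boosted_bn` and `ensemble_cll` are hypothetical helpers standing in for "run AdaBoost with this augmented structure as the weak learner" and "score the resulting ensemble by conditional log likelihood"; they are assumptions for illustration, not the authors' code:

```python
def learn_ban_structure(X, y, g_tree_edges, T, fit_boosted_bn, ensemble_cll):
    """Greedy BAN structure search as sketched on the slides."""
    structure = []                                   # no augmenting edges: plain Naive Bayes
    best_model = fit_boosted_bn(structure, X, y, T)  # this boosted model is BNB
    best_cll = ensemble_cll(best_model, X, y)

    for edge in g_tree_edges:                        # candidate edges from G_tree
        candidate = structure + [edge]
        model = fit_boosted_bn(candidate, X, y, T)   # re-boost with the augmented structure
        cll = ensemble_cll(model, X, y)
        if cll > best_cll:                           # keep the edge only if ensemble CLL improves
            structure, best_model, best_cll = candidate, model, cll
    return structure, best_model
```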
17. Analysis of BAN
- The BAN base structure is sparser than the BNC model
- BAN uses an ensemble of sparser models to approximate a densely connected structure
[Figures: example of a BAN model vs. example of a BNC-2P model]
18. Computational complexity of BAN
- Training complexity: O(MN^2 + MNTS)
  - O(MN^2): building G_tree
  - O(MNTS): structure search
  - T: boosting iterations per structure
  - S: number of structures examined, with S < N
- Empirical training time
  - T ≈ 5-25, S ≈ 0-5
  - Approximately 25-100 times the training time of NB
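A back-of-the-envelope check of that ratio (illustrative values of my own choosing, not numbers from the slides): dividing O(MN^2 + MNTS) by the O(MN) cost of Naïve Bayes gives roughly N + T*S.

```python
# Rough BAN-to-NB training cost ratio: (M*N**2 + M*N*T*S) / (M*N) = N + T*S.
N, T, S = 20, 15, 5        # e.g. 20 features, 15 boosting rounds, 5 structures tried
print(N + T * S)           # 95 -> on the order of the 25-100x figure quoted above
```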
19. Results (simulated dataset)
[Figures: true structure vs. the Naïve Bayes structure]
- 25 different distributions (varying CPT tables and numbers of features)
- 4000 samples, 5-fold cross validation
20. Results (simulated dataset)
[Per-dataset test-error scatter plot: BAN 19, NB 0, 6 not significant]
21. Results (simulated dataset)
[Figures: true structure vs. the structure learned by BAN]
- Correct edges are added under BAN
- BNB achieved the optimal error in 22 datasets; BAN outperforms BNB in the remaining 3
22. Results: 25 UCI datasets (BAN)
- Standard datasets for Bayesian network classifiers
  - Friedman et al. 1999
  - Greiner and Zhou 2002
  - Grossman and Domingos 2004
- 5 fold cross validation
- Implemented NB, TAN, BAN, BNB, BNC-2P
- Obtained results for ELR-NB, ELR-TAN
23. Results: BAN vs. standard methods
[Per-dataset test-error scatter plots]
- BAN vs. NB: 0.141 vs. 0.173 (BAN 10, NB 2, rest not significant)
- BAN vs. TAN: 0.141 vs. 0.184 (BAN 10, TAN 2, rest not significant)
24. Results: BAN vs. structure learning
[Per-dataset test-error scatter plot]
- BAN vs. BNC-2P: 0.141 vs. 0.164 (BAN 7, BNC 1, rest not significant)
- BAN contains 0-5 augmented edges; BNC-2P contains 4-16 augmented edges
25. Results: BAN vs. ELR
[Per-dataset test-error scatter plots]
- BAN vs. ELR-TAN: 0.141 vs. 0.155
- BAN vs. ELR-NB: 0.141 vs. 0.161
- Error statistics taken directly from published results
- BAN is more efficient to train
26. Evaluation of BAN vs. BNB
- Comparison under significance test
  - BAN outperforms BNB on 7 datasets (e.g. Corral), by 2-5%
  - BNB outperforms BAN on 2 datasets, by 0.5-2%
  - Not significant on 13 datasets
  - BAN chooses BNB as its base structure on some datasets (IRIS, MOFN)
- Average testing error: 0.141 vs. 0.151
  - BAN outperforms BNB on 16 datasets
  - BNB outperforms BAN on 6 datasets
[Per-dataset test-error scatter plot: BAN vs. BNB, 0.141 vs. 0.151]
27. Conclusion
- An ensemble of sparse models as an alternative to structure and parameter optimization
- Simple to implement
- Very efficient in training
- Competitive classification accuracy
  - NB, TAN, HGC
  - BNC
  - ELR
28. Future Work
- Extend BAN to handle sequential data
- Analyze the class of Bayesian network classifiers that can be approximated with an ensemble of sparse structures
- Can the BAN model parameters be obtained through parameter learning given the final model structure?
- Can we use the BAN approach to learn generative models?