Model Averaging with Discrete Bayesian Network Classifiers

1
Model Averaging with Discrete Bayesian Network Classifiers
  • Denver Dash and Gregory F. Cooper
  • In the Proceedings of the Ninth International
    Workshop on Artificial Intelligence and
    Statistics (AISTATS 2003)

2
Contents
  • Model-averaging over a class of discrete Bayesian
    network classifiers
      • A partial ordering and bounded in-degree k.
  • Theoretical results (for N nodes)
      • The class has at least [expression not preserved
        in the transcript] distinct structures.
      • The summation can be performed in [expression not
        preserved in the transcript] time.
      • Approximate averaging in O(N) time.
  • Experiments
      • The technique can be beneficial even when the
        generating distribution is not a member of the
        class.
      • Characterize the performance over several
        parameters.
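To get a feel for the gap between the number of structures in such a class and the work needed to sum over them, the sketch below counts both for the simplest case, a total ordering with in-degree bound k. It is an illustration only: the function names are made up here, and the counts are for a total ordering rather than the general partial ordering on the slide.

```python
from math import comb, prod

def parent_set_counts(n_nodes, k):
    """Number of admissible parent sets per node under a total ordering.

    The node at position i (0-based) may choose at most k parents among
    the i nodes that precede it.
    """
    return [sum(comb(i, j) for j in range(min(i, k) + 1)) for i in range(n_nodes)]

def count_structures(n_nodes, k):
    """Each node picks its parent set independently, so the counts multiply."""
    return prod(parent_set_counts(n_nodes, k))

def count_family_terms(n_nodes, k):
    """Number of (node, parent set) pairs a node-by-node summation has to visit."""
    return sum(parent_set_counts(n_nodes, k))

for n in (4, 10, 20):
    print(n, count_structures(n, 2), count_family_terms(n, 2))
```

The first count grows as a product over nodes while the second grows only as a sum, which is why averaging node by node is so much cheaper than enumerating structures.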

3
Bayesian network classifiers
  • Naïve Bayes classifier
  • General Bayesian network classifiers

[Figure: two network diagrams over the class node $C$ and the feature nodes $F_1, F_2, \ldots, F_N$]
Optimal in zero-one loss
Poor generalization performance could be improved
by Bayesian model averaging. → The space of
network structures is super-exponential.

4
In this paper
  • Bayesian model-averaging over a restricted class
    of Bayesian network classifiers
      • A partial order ($\pi$) and a bounded in-degree (k).
  • Contributions
      • Applying the factorization of the conditionals to
        the task of classification.
      • Showing that MA over this class can be approximated
        by a single network $S$ → calculation in O(N)
        time.
      • Empirical evaluation of the method compared with:
          • A single naïve Bayes classifier
          • A single Bayesian network learned by a greedy
            search
          • Exact MA on naïve Bayes classifiers.

5
Notations
  • The classification problem
      • A set of features $F = \{F_1, F_2, \ldots, F_N\}$.
      • $X_0 = C, X_1 = F_1, \ldots, X_N = F_N$ → $X$ (in Bayesian
        networks)
      • A set of classes $C = \{C_1, C_2, \ldots, C_{N_C}\}$.
      • A database $D = \{D_1, D_2, \ldots, D_R\}$.
  • A Bayesian network
      • $G(X)$: a DAG structure
      • $X_i$: a multinomial distribution
      • $\Pi_i$: the parents of $X_i$
      • A parameter: $\theta_{ijk}$
      • Parameter set: $\theta$
      • Other assumptions: parameter independence,
        Dirichlet priors, ...

6
Fixed network structures
  • With the fixed network parameters $\theta$
  • Bayesian averaging over the parameters with
    conjugate priors
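The formulas for this slide did not survive in the transcript. As a hedged illustration of what averaging over parameters with a conjugate (Dirichlet) prior looks like for one node, the sketch below uses the standard posterior-mean estimate (alpha + N_ijk) / (r_i * alpha + N_ij). The symmetric hyperparameter `alpha`, the dict-per-row data layout, and the function name are assumptions made for this example, not details from the slides.

```python
from collections import Counter

def posterior_predictive(data, child, parents, arity, alpha=1.0):
    """Return prob(k, j) = P(child = k | parents = j, D) for a fixed parent set.

    data    : list of dicts mapping variable name -> integer value
    parents : list of parent names (order fixes the configuration tuple j)
    arity   : dict mapping variable name -> number of values r_i
    alpha   : assumed symmetric Dirichlet hyperparameter per cell
    """
    # N_ijk: how often the child takes value k while the parents are in state j.
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: how often the parents are in state j.
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    r = arity[child]

    def prob(k, j):
        return (alpha + n_ijk[(j, k)]) / (r * alpha + n_ij[j])

    return prob

data = [{"C": 0, "F1": 1}, {"C": 0, "F1": 1}, {"C": 1, "F1": 0}]
p_f1 = posterior_predictive(data, "F1", ["C"], {"C": 2, "F1": 2})
print(p_f1(1, (0,)))  # (1 + 2) / (2 + 2) = 0.75
```

With no data for a parent configuration the estimate falls back to the uniform prior, which is the behavior a conjugate prior is expected to give.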

7
Averaging with a fixed ordering (1)
  • For a structural feature, e.g. $X_L \to X_M$
  • The posterior probability $P(X_L \to X_M \mid D)$,
  • The structure modularity
  • The marginal likelihood (decomposable)
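A decomposable marginal likelihood means that P(D | G) is a product of one term per node, each depending only on that node and its parents. The sketch below computes the logarithm of one such per-family term under Dirichlet priors (the standard Bayesian Dirichlet form). The symmetric hyperparameter `alpha`, the data layout, and the function name are assumptions for illustration; the slide's own formula is not preserved in the transcript.

```python
from collections import Counter
from math import exp, lgamma

def log_family_score(data, child, parents, arity, alpha=1.0):
    """Log marginal likelihood contribution of one (child, parent set) family.

    Values of `child` are assumed to be integers 0 .. arity[child] - 1.
    """
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    totals = Counter(tuple(row[p] for p in parents) for row in data)
    r = arity[child]
    score = 0.0
    # Parent configurations that never occur contribute a factor of 1,
    # so it is enough to loop over the observed ones.
    for j, n_ij in totals.items():
        score += lgamma(r * alpha) - lgamma(r * alpha + n_ij)
        for k in range(r):
            score += lgamma(alpha + counts[(j, k)]) - lgamma(alpha)
    return score

# One binary variable, no parents, observations 0 then 1:
# P(0) * P(1 | 0) = 1/2 * 1/3 = 1/6 under a uniform prior.
toy = [{"X": 0}, {"X": 1}]
print(exp(log_family_score(toy, "X", [], {"X": 2})))
```

Summing these log terms over all nodes gives the log marginal likelihood of a whole structure, which is what makes the node-by-node averaging on the following slides possible.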

8
Averaging with a fixed ordering (2)
  • Then, the posterior probability of a structural
    feature can be represented as,

9
Averaging with a fixed ordering (3)
  • Enumerating the possible parents of Xi given a
    partial ordering
  • $\pi = \langle X_1, X_3, X_2, X_4 \rangle$, k = 2.
  • $\Pi_2^0 = \emptyset$, $\Pi_2^1 = \{X_1\}$, $\Pi_2^2 = \{X_3\}$, $\Pi_2^3 = \{X_1, X_3\}$.
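The enumeration in this example can be reproduced in a few lines. The sketch below assumes the ordering is given as a plain list (a total ordering, which is a special case of the partial ordering discussed in the talk); the function name is illustrative.

```python
from itertools import combinations

def candidate_parent_sets(order, node, k):
    """All subsets of size <= k of the nodes that precede `node` in `order`."""
    predecessors = order[:order.index(node)]
    return [
        subset
        for size in range(min(k, len(predecessors)) + 1)
        for subset in combinations(predecessors, size)
    ]

order = ["X1", "X3", "X2", "X4"]
print(candidate_parent_sets(order, "X2", 2))
# [(), ('X1',), ('X3',), ('X1', 'X3')]
```

For X4, which has three predecessors, the same call returns seven sets, since the full three-node set exceeds the bound k = 2.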

10
Averaging with a fixed ordering (4)

11
Averaging with a fixed ordering (5)

12
Averaging with a fixed ordering (6)
  • Dynamic programming solution
  • Finally,
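The formulas for these slides are missing from the transcript, so the following is only a sketch of the idea that makes the summation tractable: when every structure's score is a product of per-family terms and any combination of admissible parent sets is a valid DAG, the sum over structures equals a product of per-node sums. The per-family terms `f` below are random positive numbers standing in for prior-times-likelihood terms; all names are hypothetical.

```python
import random
from itertools import combinations, product
from math import isclose, prod

def candidate_parent_sets(order, node, k):
    """All subsets of size <= k of the nodes that precede `node` in `order`."""
    preds = order[:order.index(node)]
    return [s for size in range(min(k, len(preds)) + 1)
            for s in combinations(preds, size)]

def sum_over_structures(order, k, f):
    """Brute force: enumerate every structure and add up its product of terms."""
    choices = [candidate_parent_sets(order, x, k) for x in order]
    return sum(prod(f[(x, ps)] for x, ps in zip(order, combo))
               for combo in product(*choices))

def product_of_sums(order, k, f):
    """Same quantity, computed node by node."""
    return prod(sum(f[(x, ps)] for ps in candidate_parent_sets(order, x, k))
                for x in order)

def edge_posterior(order, k, f, parent, child):
    """P(parent -> child | D): every factor except the child's cancels."""
    sets = candidate_parent_sets(order, child, k)
    with_edge = sum(f[(child, ps)] for ps in sets if parent in ps)
    return with_edge / sum(f[(child, ps)] for ps in sets)

order = ["X1", "X3", "X2", "X4"]
rng = random.Random(0)
f = {(x, ps): rng.random()
     for x in order for ps in candidate_parent_sets(order, x, 2)}
brute = sum_over_structures(order, 2, f)
fast = product_of_sums(order, 2, f)
print(isclose(brute, fast))
```

The brute-force sum visits 56 structures for this four-node example while the factored version visits 14 family terms, and the two agree up to floating-point rounding.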

13
Model averaging for predictions
  • The probability of a new example can be
    calculated in the same way as the probability of a
    structural feature. Hence,
  • The parameter value $\theta_{ijk}$ is used in place of the
    Kronecker delta function.

14
Approximation on the model averaging
  • The time bound is still prohibitive even for moderate
    values of k (k = 3 or 4).
  • One approximation
      • Order the possible parent sets for $X_i$ based on
        the function $f(X_i, \Pi_i \mid D)$ and prune them.
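A minimal sketch of this kind of pruning is shown below. The slide does not say how many parent sets are kept or how the cutoff is chosen, so the `keep` parameter, the example scores, and both function names are assumptions made purely for illustration.

```python
import heapq

def prune_parent_sets(scores, keep):
    """Return the `keep` parent sets with the largest score, best first.

    scores : dict mapping a parent set (tuple of names) -> its f value
    """
    return heapq.nlargest(keep, scores, key=scores.get)

def retained_mass(scores, kept):
    """Fraction of the total score that survives the pruning."""
    return sum(scores[ps] for ps in kept) / sum(scores.values())

# Made-up scores for the four candidate parent sets of one node.
scores = {(): 0.1, ("X1",): 0.5, ("X3",): 0.2, ("X1", "X3"): 0.9}
kept = prune_parent_sets(scores, 2)
print(kept, retained_mass(scores, kept))
```

Tracking the retained mass gives a rough sense of how much of the per-node sum the approximation discards.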

15
Experimental evaluation (1)
  • Performance metric: $\delta = (R_1 - R_2) / (T - R_2)$
  • Synthetic data sets
  • Comparisons between exact averaging and
    approximation

16
Experimental evaluation (2)
  • Approximate model averaging vs. greedy thick-thin
    search

17
Experimental evaluation (3)
  • Synthetic data from the ALARM network
  • AMA (approximate model averaging) vs. GTT (greedy thick-thin search)

18
Experimental evaluation (4)
  • Real classification data sets from the UCI
    repository

19
Discussion
  • Approximate model averaging outperforms a single
    BN classifier.
  • Simplicity of the implementation.
  • Future work
      • Find a better method for optimizing the
        ordering.
      • Applications to real-world problems.
      • Relax the assumption of complete data.