Analyzing Attribute Dependencies - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Analyzing Attribute Dependencies
  • Aleks Jakulin & Ivan Bratko
  • Faculty of Computer and Information Science
  • University of Ljubljana
  • Slovenia

2
Overview
  • Problem
  • Generalize the notion of correlation from two
    variables to three or more variables.
  • Approach
  • Use Shannon's entropy as the foundation for
    quantifying interaction.
  • Application
  • Visualization, with a focus on supervised
    learning domains.
  • Result
  • We can explain several mysteries of machine
    learning through higher-order dependencies.

3
Problem: Attribute Dependencies
4
Approach: Shannon's Entropy
(Figure: Venn diagram relating the entropies of attribute A and label C.)
5
Interaction Information
I(A;B;C) = I(A;B|C) - I(A;B)
         = I(A;C|B) - I(A;C)
         = I(B;C|A) - I(B;C)
(Partial) history of independent reinventions:
  • McGill '54 (Psychometrika): interaction information
  • Han '80 (Information & Control): multiple mutual information
  • Yeung '91 (IEEE Trans. on Inf. Theory): mutual information
  • Grabisch & Roubens '99 (Int. J. of Game Theory): Banzhaf interaction index
  • Matsuda '00 (Physical Review E): higher-order mutual information
  • Brenner et al. '00 (Neural Computation): average synergy
  • Demšar '02 (a thesis in machine learning): relative information gain
  • Bell '03 (NIPS'02, ICA2003): co-information
  • Jakulin '03: interaction gain
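The three equivalent forms of interaction information on this slide expand into the same combination of joint entropies, so they are easy to check from empirical counts. A minimal Python sketch, assuming discrete attributes given as equal-length lists of symbols (the function names are mine):

```python
# Estimate the three equivalent forms of interaction information
# from counts; all three must agree. Function names are illustrative.
from collections import Counter
from math import log2

def H(*cols):
    """Joint Shannon entropy H(X1, ..., Xk) of the zipped columns, in bits."""
    counts = Counter(zip(*cols))
    n = sum(counts.values())
    return -sum(k / n * log2(k / n) for k in counts.values())

def I(a, b):
    """Mutual information I(A;B)."""
    return H(a) + H(b) - H(a, b)

def I_cond(a, b, c):
    """Conditional mutual information I(A;B|C)."""
    return H(a, c) + H(b, c) - H(c) - H(a, b, c)

a = [0, 0, 1, 1, 0, 1]
b = [0, 1, 0, 1, 1, 1]
c = [u ^ v for u, v in zip(a, b)]  # label is the XOR of a and b

# The three differences coincide, as the slide states:
forms = [I_cond(a, b, c) - I(a, b),
         I_cond(a, c, b) - I(a, c),
         I_cond(b, c, a) - I(b, c)]
print(forms)
```

Because the label is the XOR of the two attributes, the common value is positive, anticipating the synergy case on the next slides.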
6
Properties
  • Invariance with respect to attribute/label
    division:
  • I(A;B;C) = I(A;C;B) = I(C;A;B) =
    I(B;A;C) = I(C;B;A) = I(B;C;A).
  • Decomposition of mutual information:
  • I(AB;C) = I(A;C) + I(B;C) + I(A;B;C)
  • I(A;B;C) is the synergistic information.
  • A, B, C are independent ⇒ I(A;B;C) = 0.

7
Positive and Negative Interactions
  • If any pair of the attributes is conditionally
    independent with respect to the third attribute,
    the 3-information neutralizes the 2-information:
  • I(A;B|C) = 0 ⇒ I(A;B;C) = -I(A;B)
  • Interaction information may be positive or
    negative:
  • Positive: XOR problem (A ⊕ B → C), synergy
  • Negative: conditional independence, redundant
    attributes, redundancy
  • Zero: independence of one of the attributes, or
    a mix of synergy and redundancy.
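Both signs can be checked numerically. A small sketch (helper names are mine): an XOR label yields +1 bit of interaction (synergy), while a duplicated attribute paired with a fully redundant label yields -1 bit (redundancy):

```python
# Sign check for interaction information I(A;B;C) = I(A;B|C) - I(A;B),
# computed via its expansion into joint entropies. Helper names are mine.
from collections import Counter
from math import log2

def H(*cols):
    """Joint Shannon entropy of the zipped columns, in bits."""
    counts = Counter(zip(*cols))
    n = sum(counts.values())
    return -sum(k / n * log2(k / n) for k in counts.values())

def I3(a, b, c):
    """Interaction information expanded into joint entropies."""
    return (H(a, b) + H(a, c) + H(b, c)
            - H(a) - H(b) - H(c) - H(a, b, c))

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor = [u ^ v for u, v in zip(a, b)]   # C = A xor B: pure synergy
dup = list(a)                         # B = A and C = A: pure redundancy
print(I3(a, b, xor))  # 1.0  (positive: synergy)
print(I3(a, dup, a))  # -1.0 (negative: redundancy)
```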

8
Applications
  • Visualization
  • Interaction graphs
  • Interaction dendrograms
  • Model construction
  • Feature construction
  • Feature selection
  • Ensemble construction
  • Evaluation on the CMC domain: predicting the
    contraceptive method from demographics.

9
Interaction Graphs
10
CMC
11
Application: Feature Construction
  • NBC Model      Predictive perf. (Brier score)
  •                0.2157 ± 0.0013
  • Wedu, Hedu     0.2087 ± 0.0024
  • Wedu           0.2068 ± 0.0019
  • Wedu×Hedu      0.2067 ± 0.0019
  • Age, Child     0.1951 ± 0.0023
  • Age×Child      0.1918 ± 0.0026
  • A×C×W×H        0.1873 ± 0.0027
  • A, C, W, H     0.1870 ± 0.0030
  • A, C, W        0.1850 ± 0.0027
  • A×C, W×H       0.1831 ± 0.0032
  • A×C, W         0.1814 ± 0.0033
  (A, C, W, H abbreviate Age, Child, Wedu, Hedu;
  × denotes a joined, Cartesian-product attribute.)
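The models in the table are compared by Brier score. The slide does not give the exact formula; a common multiclass form (an assumption here) is the mean squared distance between the predicted class distribution and the one-hot encoding of the true class:

```python
# Multiclass Brier score: mean squared distance between predicted
# class probabilities and the one-hot true class. Lower is better.
def brier_score(probs, outcomes):
    """probs: per-example lists of class probabilities;
    outcomes: true class indices."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(outcomes)

# A confident, correct prediction scores near 0; a uniform guess over
# three classes scores (2/3)**2 + 2*(1/3)**2 = 2/3.
print(round(brier_score([[0.8, 0.1, 0.1]], [0]), 6))  # 0.06
```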

12
Alternatives
TAN: 0.1874 ± 0.0032
NBC: 0.1849 ± 0.0028
BEST (>100,000 models: A×C, W×H, MediaExp): 0.1811 ± 0.0032
GBN: 0.1815 ± 0.0029
13
Dissimilarity Measures
  • The relationships between attributes are to some
    extent transitive.
  • Algorithm:
  • Define a dissimilarity measure between two
    attributes in the context of the label C.
  • Apply hierarchical clustering to summarize the
    dissimilarity matrix.
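The two steps can be sketched in Python for discrete attributes given as lists of symbols. The particular dissimilarity used here, 1 - |I(A;B;C)| / H(A,B,C), is an illustrative assumption rather than necessarily the authors' exact measure, and all function names are mine:

```python
# Step 1: an interaction-based dissimilarity between attributes in the
# context of the label. Step 2: greedy average-linkage clustering.
# The dissimilarity formula is an illustrative assumption.
from collections import Counter
from math import log2

def H(*cols):
    """Joint Shannon entropy of the zipped columns, in bits."""
    counts = Counter(zip(*cols))
    n = sum(counts.values())
    return -sum(k / n * log2(k / n) for k in counts.values())

def dissimilarity(a, b, c):
    """Dissimilarity of attributes a, b in the context of label c."""
    i3 = H(a, b) + H(a, c) + H(b, c) - H(a) - H(b) - H(c) - H(a, b, c)
    return 1.0 - abs(i3) / H(a, b, c)

def average_linkage(names, dist):
    """Agglomerative clustering; dist maps frozenset name pairs to
    dissimilarities. Returns the clusters in merge order."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters under average linkage
        x, y = min(((p, q) for i, p in enumerate(clusters)
                    for q in clusters[i + 1:]),
                   key=lambda pq: sum(dist[frozenset((u, v))]
                                      for u in pq[0] for v in pq[1])
                                  / (len(pq[0]) * len(pq[1])))
        clusters = [cl for cl in clusters if cl not in (x, y)] + [x + y]
        merges.append(x + y)
    return merges

# Toy data: A1 and A2 are duplicates; the label is A1 xor A3, so the
# synergistic pair (A1, A3) merges first.
cols = {"A1": [0, 0, 1, 1], "A2": [0, 0, 1, 1], "A3": [0, 1, 0, 1]}
label = [u ^ v for u, v in zip(cols["A1"], cols["A3"])]
dist = {frozenset((p, q)): dissimilarity(cols[p], cols[q], label)
        for p in cols for q in cols if p < q}
print(average_linkage(sorted(cols), dist))
```

Dendrogram height then reflects the dissimilarity at which clusters merge, which is what the next slide visualizes.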

14
Interaction Dendrogram
(Dendrogram legend: attributes range from weakly to strongly interacting; cluster tightness ranges from loose to tight.)
15
Application: Feature Selection
  • Soybean domain
  • predict disease from symptoms
  • predominantly negative interactions.
  • Global optimization procedure for feature
    selection: >5,000 NBC models tested (B-Course)
  • Selected features balance dissimilarity and
    importance.
  • We can understand what global optimization did
    from the dendrogram.

16
Application: Ensembles
17
Implication: Assumptions in Machine Learning
18
Work in Progress
  • Overfitting: the interaction information
    computations do not account for the increase in
    complexity.
  • Support for numerical and ordered attributes.
  • Inductive learning algorithms which use these
    heuristics automatically.
  • Models that are based on the real relationships
    in the data, not on our assumptions about them.

19
Summary
  • There are relationships exclusive to groups of n
    attributes.
  • Interaction information is an entropy-based
    heuristic for quantifying such relationships.
  • Two visualization methods
  • Interaction graphs
  • Interaction dendrograms