Title: Simultaneous row and column partitioning: Evaluation of a heuristic
Djamel A. Zighed and Nicolas Nicoloyannis
ERIC Laboratory, University of Lyon 2 (France)
zighed_at_univ-lyon2.fr
Prague, Sept. 04
About the Computer Science Department
- In Lyon, there are 3 universities and 100,000 students
- Lumière University Lyon 2 has 22,000 students
- Lyon 2 is mainly a liberal arts university
- The Faculty of Economics has three departments, among them the computer science one
- We belong to this department
- We have Bachelor, Master and PhD programs for 300 students
ERIC Lab at the University
Faculties of the University of Lyon 2
Economics
Sociology
Linguistics
Law
ERIC
Research centers of the university
Knowledge Engineering Research Center
- The budget of ERIC does not depend on the university; it is given by the national Ministry of Education
- We have a large autonomy in decision making
ERIC Lab
- Born in 1995
- 11 professors (N. Nicoloyannis, director)
- 15 PhD students
- Grants/contracts: 200K/year
- Research topics
  - Data mining (theory, tools and applications)
  - Data warehouse management (T,T,A)
Data Mining (T,T,A)
- Theory
  - Induction graphs
  - Learning and classification
- Tools
  - SIPINA platform for data mining
- Applications
  - Medical fields
  - Chemical applications
  - Human sciences
Data mining TTA for complex data
Data mining on complex data
- An example: breast cancer diagnosis
Motivations
Association measure: it measures the strength of the relationship between X and Y.
Contingency table
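As an illustration of such a measure, here is a minimal sketch that computes Cramér's V on a contingency table stored as a list of rows. The slides do not say which association measure is used, so Cramér's V and the function name `cramers_v` are assumptions made for this example only.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V of a contingency table given as a list of rows of counts.

    Assumption: Cramér's V stands in for the unspecified association measure.
    """
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    if n == 0 or k == 0:
        return 0.0
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            if exp > 0:  # skip cells that belong to an empty row or column
                chi2 += (obs - exp) ** 2 / exp
    return sqrt(chi2 / (n * k))

print(cramers_v([[10, 0], [0, 10]]))  # perfect association: 1.0
print(cramers_v([[5, 5], [5, 5]]))    # independence: 0.0
```

The value lies between 0 (independence) and 1 (perfect association). Because the normalisation depends on the table dimensions, merging rows or columns can change it.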
According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
An example
Goal: find the groupings that maximize the association between attributes.
Yes, we can improve the association by reducing the size of the contingency table.
Extension
According to a specific association measure, can we find the optimal reduced contingency table?
Contingency table
Optimal solution (exhaustive search)
Goal: find the best cross partition on T.
According to a specific association measure, can we find the optimal reduced contingency table?
Yes, but the solution is intractable in the real world because of its high time complexity.
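To give a sense of why exhaustive search becomes intractable, the sketch below counts candidate cross partitions. It assumes nominal attributes, so that any grouping of rows and any grouping of columns is allowed (the slides do not state this). Under that assumption the number of ways to partition r categories is the Bell number B(r), and an r × c table has B(r) · B(c) cross partitions, trivial ones included. The function names are hypothetical.

```python
def bell(n):
    """Bell number B(n): number of partitions of a set of n elements.

    Computed with the Bell triangle; the first entry of row n is B(n).
    """
    row = [1]
    for _ in range(n):
        new = [row[-1]]          # each row starts with the last entry of the previous one
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]

def count_cross_partitions(r, c):
    """Number of (row partition, column partition) pairs for an r x c table.

    Assumption: nominal categories, so every set partition is admissible.
    """
    return bell(r) * bell(c)

print(count_cross_partitions(6, 6))  # 203 * 203 = 41209
for size in (8, 10, 12):
    print(size, count_cross_partitions(size, size))
```

The count grows faster than exponentially with the table size, and every candidate requires its own evaluation of the association measure.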
Heuristic
Successively group the two values (rows or columns) whose merging yields the largest increase in the association criterion.
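The greedy procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Cramér's V stands in for the unspecified association criterion, merging stops when no merge increases the criterion, and no dimension is reduced below two categories. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cramers_v(table):
    """Cramér's V (assumed stand-in for the association criterion)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    if n == 0 or k == 0:
        return 0.0
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    return sqrt(chi2 / (n * k))

def merge_rows(table, i, j):
    """Return a copy of the table with rows i and j summed into one row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    rest = [list(row) for k, row in enumerate(table) if k not in (i, j)]
    return rest + [merged]

def transpose(table):
    return [list(col) for col in zip(*table)]

def greedy_merge(table, measure=cramers_v, eps=1e-12):
    """Repeatedly apply the single row or column merge that most increases
    the measure; stop when no merge improves it (assumed stopping rule)."""
    table = [list(row) for row in table]
    best = measure(table)
    while True:
        candidates = []
        if len(table) > 2:  # candidate row merges
            candidates += [merge_rows(table, i, j)
                           for i, j in combinations(range(len(table)), 2)]
        if len(table[0]) > 2:  # candidate column merges, via the transpose
            cols = transpose(table)
            candidates += [transpose(merge_rows(cols, i, j))
                           for i, j in combinations(range(len(cols)), 2)]
        if not candidates:
            break
        top = max(candidates, key=measure)
        score = measure(top)
        if score <= best + eps:  # no real improvement: stop
            break
        table, best = top, score
    return table, best

start = [[10, 10, 0], [10, 10, 0], [0, 0, 20]]
reduced, score = greedy_merge(start)
print(cramers_v(start), score)
print(reduced)
```

Each step examines every pair of rows and every pair of columns, so one pass evaluates a number of candidates that is quadratic in the table dimensions rather than enumerating all cross partitions.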
Complexity
Simulation
Goal: how far is the quasi-optimal solution from the true optimum? The comparison is tractable for tables no larger than 6 × 6.
Simulation design
Randomly generate 200 tables. Analyze the distribution of the deviations between optima and quasi-optima.
Generating the tables
10,000 cases distributed over the c × r cells of the table with a uniform distribution (worst case).
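The table-generation step can be sketched as below: each of the 10,000 cases is assigned to one of the r × c cells with equal probability. The function name, the seed handling and the use of Python's `random` module are assumptions of this sketch; the slides give only the number of cases, the number of tables and the uniform distribution.

```python
import random

def random_table(r, c, n_cases=10000, rng=None):
    """Distribute n_cases over an r x c table, each cell equally likely."""
    rng = rng or random.Random()
    table = [[0] * c for _ in range(r)]
    for _ in range(n_cases):
        cell = rng.randrange(r * c)      # pick one of the r*c cells uniformly
        table[cell // c][cell % c] += 1
    return table

# Example: a few 6 x 6 tables with a fixed seed for reproducibility.
rng = random.Random(0)
tables = [random_table(6, 6, rng=rng) for _ in range(3)]
for t in tables:
    print(sum(sum(row) for row in t), len(t), len(t[0]))
```

Running the same generator 200 times would give the set of tables on which the quasi-optimal and optimal solutions are compared.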
Quasi-optimal solution
Conclusion
- Implementation for a new approach to decision tree induction.
  - Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Arbogodaï, a New Approach for Decision Trees, in Lavrac, N., Gamberger, D., Todorovski, L. and Blockeel, H. (eds), Knowledge Discovery in Databases: PKDD 2003, LNAI 2838, Berlin: Springer, 495--506.
  - Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Decision tree with optimal join partitioning, to appear in Journal of Information Intelligent Systems, Kluwer (2004).
- Divisive top-down approach
- Extension to the multidimensional case