Simultaneous row and column partitioning: Evaluation of an heuristic

1
Djamel A. Zighed and Nicolas Nicoloyannis
ERIC Laboratory, University of Lyon 2 (France)
zighed_at_univ-lyon2.fr
Prague, Sept. 04
2
About the Computer Science Department
  • Lyon has 3 universities with 100,000 students
  • Lumière University Lyon 2 has 22,000 students
  • Lyon 2 is mainly a liberal arts university
  • The Faculty of Economics has three departments,
    among them the computer science one
  • We belong to this department
  • We have Bachelor, Master and PhD programs for 300
    students

3
ERIC Lab at the University
Faculties of the University of Lyon 2: Economics, Sociology, Linguistics, Law
Research centers of the university: ERIC, Knowledge Engineering Research Center
  • The budget of ERIC does not depend on the
    university; it is provided by the national
    Ministry of Education
  • We have a large autonomy in decision making
4
ERIC Lab
  • Born in 1995,
  • 11 professors (N. Nicoloyannis, director)
  • 15 PhD Students
  • Grants and contracts: about 200K/year
  • Research topics
    • Data mining (theory, tools and applications)
    • Data warehouse management (T,T,A)

5
Data Mining (T,T,A)
  • Theory
    • Induction graphs
    • Learning and classification
  • Tools
    • SIPINA: platform for data mining
  • Applications
    • Medical fields
    • Chemical applications
    • Human sciences

Data mining TTA for complex data
6
Data mining on complex data
  • An example: breast cancer diagnosis

7
Motivations
8
Motivations
Association measure: it measures the strength of the relationship between X and Y.
[Figure: contingency table]
10
Motivations
Association measure: it measures the strength of the relationship between X and Y.
According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
[Figure: contingency table]
11
Motivations
According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
[Figure: contingency table]
12
An example
13
Goal: find the groupings that maximize the association between attributes.
Yes, we can improve the association by reducing the size of the contingency table.
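To make this claim concrete, here is a minimal sketch in Python. The slides do not name the association measure at this point, so Tschuprow's T is used purely as an assumption; the toy table and the helper names (`tschuprow_t`, `merge_rows`, `merge_cols`) are hypothetical as well. The sketch also assumes that no row or column total is zero.

```python
from math import sqrt

def tschuprow_t(table):
    """Tschuprow's T for a contingency table given as a list of rows.

    Assumption: T is only one possible association measure; the slides
    do not say which one they use. Row and column totals must be > 0.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square against the independence model.
    chi2 = sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    r, c = len(table), len(table[0])
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))

def merge_rows(table, i, k):
    """Return a new table in which rows i and k are summed into one row."""
    merged = [a + b for a, b in zip(table[i], table[k])]
    return [merged if idx == i else row
            for idx, row in enumerate(table) if idx != k]

def merge_cols(table, j, l):
    """Return a new table in which columns j and l are summed into one column."""
    transposed = [list(col) for col in zip(*table)]
    return [list(row) for row in zip(*merge_rows(transposed, j, l))]

# Hypothetical 3 x 3 table: rows 0 and 1 have the same profile,
# and so do columns 0 and 1.
table = [[10, 10, 0],
         [10, 10, 0],
         [0, 0, 20]]
rows_merged = merge_rows(table, 0, 1)        # 2 x 3 table
both_merged = merge_cols(rows_merged, 0, 1)  # 2 x 2 table

for name, t in [("original", table), ("rows merged", rows_merged),
                ("rows and columns merged", both_merged)]:
    print(f"{name}: T = {tschuprow_t(t):.4f}")
```

In this toy table the chi-square statistic is unchanged by the merges while the normalizing term shrinks, so T grows with each grouping.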
14
Extension
According to a specific association measure, can we find the optimal reduced contingency table?
[Figure: contingency table]
15
Optimal solution (exhaustive search)
Goal: find the best cross partition on T
16
Optimal solution (exhaustive search)
17
Optimal solution (exhaustive search)
According to a specific association measure, can we find the optimal reduced contingency table?
Yes, but the exhaustive search is intractable in the real world because of its high time complexity.
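A rough way to see why exhaustive search does not scale is to count the candidate cross partitions. The sketch below assumes nominal categories, so that any set partition of the rows can be combined with any set partition of the columns; under that assumption the count is the product of two Bell numbers. The function names are hypothetical, and the count includes the trivial partitions (everything in one group, or nothing merged).

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)] computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new triangle row starts with the last entry of the previous row;
        # every following entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

def cross_partition_count(r, c):
    """Number of (row partition, column partition) pairs for an r x c table."""
    bells = bell_numbers(max(r, c))
    return bells[r] * bells[c]

for size in (3, 4, 5, 6, 10):
    print(f"{size} x {size}: {cross_partition_count(size, size)} cross partitions")
```

The count grows faster than exponentially with the number of categories, which is consistent with restricting exact comparisons to small tables.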
18
Heuristic
At each step, group the two row values or the two
column values whose merging yields the largest
increase in the association criterion.
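The greedy procedure described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Tschuprow's T stands in for the unspecified association criterion, any two rows or any two columns may be merged (nominal categories), the search stops as soon as no merge increases the criterion, and the table is never reduced below 2 x 2. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tschuprow_t(table):
    """Tschuprow's T (assumed criterion); row and column totals must be > 0."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    r, c = len(table), len(table[0])
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))

def merge(table, i, k):
    """Sum rows i and k of the table into a single row."""
    merged = [a + b for a, b in zip(table[i], table[k])]
    return [merged if idx == i else row
            for idx, row in enumerate(table) if idx != k]

def transpose(table):
    return [list(col) for col in zip(*table)]

def greedy_grouping(table, measure=tschuprow_t):
    """Repeatedly apply the single row or column merge that most increases
    the measure; stop when no merge improves it (assumed stopping rule)."""
    best = measure(table)
    while True:
        candidates = []
        if len(table) > 2:       # keep at least two rows
            candidates += [merge(table, i, k)
                           for i, k in combinations(range(len(table)), 2)]
        if len(table[0]) > 2:    # keep at least two columns
            t = transpose(table)
            candidates += [transpose(merge(t, j, l))
                           for j, l in combinations(range(len(t)), 2)]
        if not candidates:
            return table, best
        challenger = max(candidates, key=measure)
        score = measure(challenger)
        if score <= best:
            return table, best
        table, best = challenger, score

# Hypothetical toy table with duplicated row and column profiles.
table = [[10, 10, 0],
         [10, 10, 0],
         [0, 0, 20]]
result, score = greedy_grouping(table)
print(result, round(score, 4))
```

Each step examines every pair of rows and every pair of columns, so the work per step is quadratic in the table dimensions instead of enumerating all cross partitions.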
19
Complexity
20
Simulation
Goal: how far is the quasi-optimal solution from the true optimum?
The comparison is tractable only for tables no larger than 6 × 6.
Simulation Design
Randomly generate 200 tables, then analyze the distribution of the deviations between optima and quasi-optima.
Generating the Tables
10000 cases are distributed among the c × r cells of the table with a uniform distribution (worst case).
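One possible reading of this generation step is sketched below: each of the 10000 cases is assigned independently to one of the cells, every cell being equally likely. That interpretation, the function name `random_table`, the seeding scheme and the 4 x 4 size used in the demonstration are all assumptions; the slides only state the number of cases, the number of tables and the uniform distribution.

```python
import random

def random_table(r, c, n_cases=10000, seed=None):
    """Distribute n_cases among the r x c cells, each cell equally likely."""
    rng = random.Random(seed)
    table = [[0] * c for _ in range(r)]
    for _ in range(n_cases):
        cell = rng.randrange(r * c)       # pick one of the r*c cells uniformly
        table[cell // c][cell % c] += 1   # map the flat index back to (row, col)
    return table

# 200 simulated tables, one seed per table so the run is reproducible.
tables = [random_table(4, 4, seed=s) for s in range(200)]
print(len(tables), sum(map(sum, tables[0])))
```

Because all cells have the same expected count, such tables carry almost no association, which is presumably why the slide calls this the worst case for the heuristic.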
21
Quasi-optimal solution
22
Quasi-optimal solution
23
Conclusion
  • Implementation in a new approach to decision
    tree induction.
  • Zighed, D.A., Ritschard, G., Erray, W. and
    Scuturici, V.-M. (2003), Arbogodaï, a New Approach
    for Decision Trees, in Lavrac, N., Gamberger, D.,
    Todorovski, L. and Blockeel, H. (eds), Knowledge
    Discovery in Databases: PKDD 2003, LNAI 2838,
    Berlin: Springer, 495-506.
  • Zighed D. A., Ritschard G., Erray W., Scuturici
    V.-M. (2003), Decision tree with optimal join
    partitioning, To appear in Journal of Information
    Intelligent Systems, Kluwer (2004).
  • Divisive top-down approach
  • Extension to the multidimensional case