A three-step approach for STULONG database analysis: characterization of patients

1
A three-step approach for STULONG database
analysis: characterization of patient groups
Discovery Challenge (PKDD 2004)
  • O. Couturier, H. Delalin, H. Fu, E. Kouamou, E.
    Mephu Nguifo
  • Computer Science Research Center of Lens (CRIL)
  • CNRS - Université d'Artois, IUT de Lens

2
Goal
  • What are the relations between social factors
    (social characteristics) and the other
    characteristics of men in the respective groups?

3
Overview
  • Discovery process
  • Techniques and results
    • Clustering
    • Classification
    • Association rules
  • Conclusion and further work

4
Discovery Process
  • Hypotheses on the data
    • ENTRY table
    • Groups provided by the expert
      • Merging groups 1 and 2: Normal group
      • Merging groups 3 and 4: Risk group
      • Ignoring group 6
    • Characteristics
  • Considering previous work of the LRI ML research
    team at previous PKDD Challenges

5
Discovery Process
  • Can we find a model that fits the provided
    groups?
  • Are there strong similarities among instances of
    different groups?
  • What kinds of relations exist among group
    characteristics?

6
Discovery Process
Data: ENTRY data groups
Task → Knowledge
  • Clustering → Generated clusters vs. provided ones
  • Supervised classification → Similarities among instances and groups
  • Association rules search → Affinity among group characteristics

7
Techniques and Results: Clustering
  • Goal: can the initial groups be recovered as
    they were defined?
  • Data: groups 12, 34 and 5
  • Clustering systems (WEKA package)
    • COBWEB: 2 groups
    • EM: 4 groups
    • KMEANS: 2 groups
  • Results: it is difficult to identify properties
    that allow the initial groups to be retrieved
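To compare generated clusters with the provided groups, one can cluster the patients and then cross-tabulate cluster membership against group labels. A minimal sketch, assuming plain k-means on numeric features (only one of the three WEKA algorithms listed) and a simple cluster-by-group count table; the feature vectors and labels below are hypothetical toy values, not STULONG data.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means. Initial centroids are simply the first k points, which keeps
    this illustration deterministic (real tools typically seed randomly or with k-means++)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def crosstab(clusters, groups):
    """Count patients for every (cluster, provided group) pair."""
    return Counter(zip(clusters, groups))


# Hypothetical toy patients: two numeric features each, plus an expert group label.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
groups = ["12", "5", "12", "5", "34", "34"]
clusters = kmeans(points, k=2)
print(sorted(crosstab(clusters, groups).items()))
```

In this toy run the group-34 patients are split across both clusters, so neither cluster corresponds to that provided group.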

8
Techniques and Results: Supervised Classification
  • Are Risk group patients similar to those in the
    Normal or the Pathological group?
  • Data
    • Training set: group 12 and group 5
    • Test set: group 34 (Risk)
  • System (WEKA package)
    • Decision tree C4.5
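C4.5 chooses each split attribute by gain ratio: information gain divided by split information. A minimal sketch of that criterion for categorical attributes only (it ignores C4.5's handling of continuous attributes, missing values and pruning); the rows and labels are hypothetical, with attribute names borrowed from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, normalised by the split information."""
    n = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(part) / n * math.log2(len(part) / n)
                      for part in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Attribute C4.5 would place at this node (highest gain ratio)."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical binary descriptors; "12" = Normal group, "5" = Pathological group.
rows = [
    {"HT": 0, "SMOKER": 1}, {"HT": 0, "SMOKER": 0}, {"HT": 0, "SMOKER": 1},
    {"HT": 1, "SMOKER": 0}, {"HT": 1, "SMOKER": 1}, {"HT": 1, "SMOKER": 0},
]
labels = ["12", "12", "12", "5", "5", "5"]
print(best_attribute(rows, labels, ["HT", "SMOKER"]))
```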

9
Techniques and Results: Supervised Classification
  • Training results
    • The HT descriptor is one of the most relevant
      factors of the disease
    • Thirteen instances of the Pathological group are
      classified as Normal

Confusion matrix (training set)
     a     e   <-- classified as
   276     0 | a = 12
    13   101 | e = 5

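In the training matrix, rows are the actual groups and columns the predicted ones. A short sketch computing overall accuracy and per-group recall from the counts on this slide; the dictionary layout is only an illustrative choice.

```python
# Training confusion matrix from the slide: rows = actual group, columns = predicted group.
matrix = {
    "12": {"12": 276, "5": 0},
    "5": {"12": 13, "5": 101},
}


def accuracy(m):
    """Correctly classified patients divided by all patients."""
    correct = sum(m[g][g] for g in m)
    total = sum(sum(row.values()) for row in m.values())
    return correct / total


def recall(m, group):
    """Share of patients actually in `group` that the tree assigns to `group`."""
    return m[group][group] / sum(m[group].values())


print(f"accuracy = {accuracy(matrix):.3f}")
for g in matrix:
    print(f"recall[{g}] = {recall(matrix, g):.3f}")
```

All 276 group-12 patients are recognised, while 13 of the 114 group-5 patients (about 11%) are labelled Normal.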
10
Techniques and Results: Supervised Classification
  • Test set: Risk group 34
    • The health district number is not a relevant factor
    • About 2/5 of the Risk patients are similar to Normal
      group patients (classified as group 12)

Confusion matrix (test set)
     a     c     d     e   <-- classified as
     0     0     0     0 | a = 12
   177     0     0   250 | c = 3 (odd)
   197     0     0   235 | d = 4 (even)
     0     0     0     0 | e = 5

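Because the tree was trained only on groups 12 and 5, every Risk patient is assigned to one of those two classes. The "2/5" figure follows from the test-matrix counts; a sketch of the arithmetic:

```python
# Test-set rows from the slide: Risk patients (groups 3 and 4) and the class the tree assigns.
test_rows = {
    "3 (odd)": {"12": 177, "5": 250},
    "4 (even)": {"12": 197, "5": 235},
}


def share_assigned(rows, target):
    """Number and fraction of all listed patients that the classifier assigns to `target`."""
    hits = sum(row[target] for row in rows.values())
    total = sum(sum(row.values()) for row in rows.values())
    return hits, total, hits / total


hits, total, frac = share_assigned(test_rows, "12")
print(f"{hits}/{total} Risk patients classified as Normal ({frac:.1%})")
```

The remaining 485 Risk patients (about 56%) are classified as Pathological.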
11
Techniques and Results: Association rules search
  • Goal: find relations that exist among group
    characteristics
  • Data: 1417 patients of groups 12, 34 and 5
  • System
    • Apriori (B. Goethals' implementation)
  • Preprocessing
    • Binary conversion of the 27 characteristics
  • Frequent itemsets search
  • Results
    • Frequent itemsets common to different groups
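The slides use B. Goethals' Apriori implementation, whose interface is not described here. As a stand-in, a minimal pure-Python Apriori over binarised records, where each patient is represented by the set of characteristics equal to 1 and `minsup` is a relative support (for example 0.10 for 10%); the records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: relative support} for every itemset with support >= minsup."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every single item that occurs in some record.
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while level:
        current = {c: s for c in level if (s := support(c)) >= minsup}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any candidate
        # having an infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
    return frequent


# Hypothetical binarised patients: each set lists the characteristics equal to 1.
patients = [
    {"AGE", "SMOKER", "CHOL"},
    {"AGE", "SMOKER"},
    {"AGE", "CHOL"},
    {"SMOKER"},
]
result = apriori(patients, minsup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Running it once per group (12, 34 and 5) gives per-group frequent itemsets that can then be compared. The walrus operator requires Python 3.8 or later.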

12
Techniques and Results: Association rules search
  • Preprocessing: binary conversion
    • BMI = weight / height² (height in m)
      • If BMI > 27 then 1 else 0
    • Age
      • If age > 45 then 1 else 0
    • Smoker
      • If smokerconsumption != 0 OR duration then 1 else 0
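These three rules translate directly into code. A sketch assuming weight in kilograms and height in metres, and reading "duration" as "smoking duration != 0"; the function and parameter names (`weight_kg`, `height_m`, `consumption`, `duration`) are hypothetical, not STULONG column names.

```python
def bmi_flag(weight_kg: float, height_m: float) -> int:
    """1 if BMI = weight / height^2 exceeds 27, else 0."""
    bmi = weight_kg / height_m ** 2
    return 1 if bmi > 27 else 0


def age_flag(age: float) -> int:
    """1 if the patient is older than 45, else 0."""
    return 1 if age > 45 else 0


def smoker_flag(consumption: float, duration: float) -> int:
    """1 if any tobacco consumption or any smoking duration is recorded, else 0
    (assumes 'duration' means a non-zero smoking duration)."""
    return 1 if consumption != 0 or duration != 0 else 0


# Hypothetical patient: 90 kg, 1.75 m, aged 50, no current consumption but a non-zero duration.
print(bmi_flag(90, 1.75), age_flag(50), smoker_flag(0, 10))
```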

13
Techniques and Results: Association rules search
  • Preprocessing: binary conversion
    • Bolhr (chest pain)
      • If bolhr = 1 or bolhr = 6 then 0 else 1
    • Chol
      • If chol > 2(age/100) then 1 else 0
    • Tg
      • If tg < 150 then 0 else 1
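The chest-pain and triglyceride rules can be sketched the same way; the cholesterol rule is left out here because its age-dependent threshold is not fully legible on the slide. Function names are hypothetical, and the inputs are assumed to be the coded `bolhr` value and the raw `tg` value.

```python
def chest_pain_flag(bolhr: int) -> int:
    """0 when bolhr is 1 or 6, otherwise 1 (the slide's rule)."""
    return 0 if bolhr in (1, 6) else 1


def tg_flag(tg: float) -> int:
    """0 when triglycerides are below 150, otherwise 1."""
    return 0 if tg < 150 else 1


# Hypothetical coded values for one patient.
print(chest_pain_flag(2), tg_flag(180))
```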

14
Techniques and Results: Association rules search
  • Frequent itemsets search
    • Support threshold (MinSup): 0.10
      • i.e., significant for at least 10% of the population
    • The search was done with no MinSup (i.e., MinSup
      = 0)

[Table: itemsets and their support values for Class 12, Class 34 and Class 5]

15
(No Transcript)
16
Techniques and Results: Association rules search
  • Frequent itemsets search: results
    • Support value of the Alcohol attribute = 1
    • 1-itemsets
      • Attribute IM is false for every patient of groups
        12 and 34. It is true for 33% of the patients
        of group 5.
      • HT is false for every patient of group 12.
      • STUDY is more frequent in group 12 than in group 5.
    • 3-itemsets
      • AGE & SMOKER & CHOL is less frequent in group 12
        than in group 5
      • etc.
    • The support value in group 34 lies between the
      support values in groups 12 and 5.
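These comparisons come down to computing the relative support of the same itemset in each group separately. A sketch with hypothetical binarised records, checking whether group 34's support lies between those of groups 12 and 5 (the pattern reported above).

```python
def support(itemset, records):
    """Fraction of records (sets of characteristics equal to 1) containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def lies_between(x, a, b):
    """True if x lies in the closed interval spanned by a and b."""
    return min(a, b) <= x <= max(a, b)


# Hypothetical records per group; each record lists the characteristics equal to 1.
groups = {
    "12": [{"AGE"}, {"AGE", "SMOKER"}, {"STUDY"}, {"AGE", "SMOKER", "CHOL"}],
    "34": [{"AGE", "SMOKER", "CHOL"}, {"AGE"}, {"AGE", "SMOKER", "CHOL"}, {"SMOKER"}],
    "5": [{"AGE", "SMOKER", "CHOL"}, {"AGE", "SMOKER", "CHOL", "IM"},
          {"AGE", "CHOL"}, {"AGE", "SMOKER", "CHOL"}],
}
pattern = {"AGE", "SMOKER", "CHOL"}
sup = {g: support(pattern, recs) for g, recs in groups.items()}
print(sup)
print(lies_between(sup["34"], sup["12"], sup["5"]))
```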

17
Conclusion
  • The Risk group (RG) shows similarities with both the
    Normal (NG) and Pathological (PG) groups.
  • 3 steps
    • Clustering: the initial groups are not found
    • Classification: some attributes characterize the
      pathological group, but they are already known
    • Frequent itemsets search: concrete results are
      difficult to highlight, but it yields interesting
      information

18
Further work
  • Improve the binary conversion
  • Refine the data set on the population
    • for instance, 12 patients died of
      atherosclerosis while they were in the NG
  • Refine our hypothesis
    • Data set of the ENTRY table
    • Look at the CONTROL table

19
Thanks!