A three-step approach for STULONG database analysis: characterization of patients

1
A three-step approach for STULONG database analysis: characterization of patient groups
Discovery Challenge (PKDD 2004)

O. Couturier, H. Delalin, H. Fu, E. Kouamou, E. Mephu Nguifo
Computer Science Research Center of Lens (CRIL)
CNRS - Université d'Artois - IUT de Lens

2
Goal

What are the relations between social factors
(social characteristics) and the other
characteristics of men in the respective groups?

3
Overview

Discovery process
Techniques and results
Clustering
Classification
Association rules
Conclusion and further work

4
Discovery Process

Hypothesis on data
ENTRY table
Groups provided by expert
Merging groups 1 and 2: Normal group
Merging groups 3 and 4: Risk group
Ignoring group 6
Characteristics
Considering previous work of the LRI ML research team at previous PKDD Challenges

5
Discovery Process

Can we find a model that fits the provided groups?
Are there strong similarities among instances of different groups?
What kinds of relations exist among group characteristics?

6
Discovery Process

Data               Tasks                      Knowledge
Entry data groups  Clustering                 Generated clusters vs provided ones
                   Supervised classification  Similarities among instances and groups
                   Association rules search   Affinity among group characteristics
7
Techniques and Results: Clustering

Goal: can the initial groups be recovered as they were defined?
Data: groups 12, 34 and 5
Clustering systems (WEKA package)
COBWEB: 2 groups
EM: 4 groups
KMEANS: 2 groups
Results: it is difficult to identify properties that allow the initial groups to be retrieved
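
One way to read "generated clusters vs provided ones" is to cross-tabulate the cluster labels against the expert groups and measure how pure each cluster is. The sketch below is plain Python, not the WEKA tooling used in the slides; the labels are made-up toy values, and `contingency` and `purity` are hypothetical helper names.

```python
from collections import Counter

def contingency(groups, clusters):
    """Count how many instances of each provided group fall in each cluster."""
    return Counter(zip(clusters, groups))

def purity(groups, clusters):
    """Fraction of instances that belong to the majority group of their cluster."""
    table = contingency(groups, clusters)
    best = {}
    for (cluster, _group), n in table.items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(groups)

# Toy labels (hypothetical, not STULONG data): provided groups vs. cluster ids.
groups = ["12", "12", "12", "34", "34", "5", "5", "5"]
clusters = [0, 0, 1, 0, 1, 1, 1, 0]
print(purity(groups, clusters))
```

A purity close to 1 would mean the clusters line up with the provided groups; a low value matches the situation reported on this slide, where the initial groups are hard to retrieve.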

8
Techniques and Results: Supervised Classification

Are Risk group patients similar to those in the Normal or the Pathological group?
Data
Training set: group 12 and group 5
Test set: group 34 (Risk)
System (WEKA package)
Decision tree: C4.5

9
Techniques and Results: Supervised Classification

Training results
The HT descriptor is one of the most relevant factors of the disease
Thirteen instances of the Pathological group are classified as Normal

Confusion Matrix
   a    e   <-- classified as
 276    0 |  a = 12
  13  101 |  e = 5
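
The training matrix can be summarized with two standard figures, overall accuracy and per-group recall. The sketch below assumes the usual WEKA reading (rows are actual groups, columns are predicted groups), which is consistent with the thirteen Pathological instances classified as Normal; the function names are illustrative only.

```python
# Rows are actual groups, columns are predicted groups (assumed WEKA layout).
# Counts are those of the training confusion matrix on this slide.
matrix = {
    "12": {"12": 276, "5": 0},
    "5": {"12": 13, "5": 101},
}

def accuracy(m):
    """Share of instances that lie on the diagonal of the matrix."""
    correct = sum(m[g][g] for g in m)
    total = sum(sum(row.values()) for row in m.values())
    return correct / total

def recall(m, group):
    """Share of a group's instances that are classified as that group."""
    return m[group][group] / sum(m[group].values())

print(f"accuracy: {accuracy(matrix):.3f}")
print(f"recall for group 5: {recall(matrix, '5'):.3f}")
```

Under that reading, 377 of the 390 training instances are classified correctly, and all of the errors fall on the Pathological group.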
10
Techniques and Results: Supervised Classification

Test set: Risk group 34
Health district number is not a relevant factor
About 2/5 of Risk patients are similar to Normal group patients

Confusion Matrix
   a    c    d    e   <-- classified as
   0    0    0    0 |  a = 12
 177    0    0  250 |  c = 3 (odd)
 197    0    0  235 |  d = 4 (even)
   0    0    0    0 |  e = 5
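
The "2/5" figure can be checked directly from the test matrix. This is a small arithmetic sketch, assuming each row lists the Risk patients of one subgroup as (classified as 12, classified as 5); `normal_share` is a hypothetical helper name.

```python
# Test-set rows as read from the slide: (classified as 12, classified as 5).
risk_rows = {
    "3 (odd)": (177, 250),
    "4 (even)": (197, 235),
}

def normal_share(rows):
    """Fraction of Risk patients that the tree labels as Normal (group 12)."""
    as_normal = sum(n for n, _ in rows.values())
    total = sum(n + p for n, p in rows.values())
    return as_normal / total

print(f"{normal_share(risk_rows):.3f}")
```

With these counts, 374 of the 859 Risk patients are labelled Normal, which is close to the 2/5 stated on the slide.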
11
Techniques and Results: Association rules search

Goal: find relations that exist among group characteristics
Data: 1417 patients of groups 12, 34 and 5
System
Apriori (B. Goethals implementation)
Preprocess
Binary conversion of the 27 characteristics
Frequent Itemsets Search
Results
Frequent itemsets common to different groups

12
Techniques and Results: Association rules search

Preprocessing: Binary conversion
BMI = weight / size² (size in m)
If bmi > 27 then 1 else 0
Age
If age > 45 then 1 else 0
Smoker
If smokerconsumption != 0 OR duration then 1 else 0
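
The three rules on this slide can be written as a small function. This is a minimal sketch: the parameter names are hypothetical, weight is assumed to be in kilograms, and "OR duration" is assumed to mean a non-zero smoking duration.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by squared height (height in metres)."""
    return weight_kg / height_m ** 2

def binarize(weight_kg, height_m, age, smoker_consumption, smoker_duration):
    """Return the three binary items of this slide as a dict of 0/1 values."""
    return {
        "BMI": int(bmi(weight_kg, height_m) > 27),
        "AGE": int(age > 45),
        # Assumption: "OR duration" on the slide means a non-zero duration.
        "SMOKER": int(smoker_consumption != 0 or smoker_duration != 0),
    }

# Hypothetical patient, for illustration only.
print(binarize(90, 1.75, 50, 0, 0))  # {'BMI': 1, 'AGE': 1, 'SMOKER': 0}
```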

13
Techniques and Results: Association rules search

Preprocessing: Binary conversion
Bolhr (chest pain)
If bolhr = 1 or bolhr = 6 then 0 else 1
Chol
If (chol > 2(age/100)) then 1 else 0
Tg
If tg < 150 then 0 else 1
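
The chest-pain and triglyceride rules can be sketched the same way. The code assumes the transcript lost the comparison signs, so that the rules test bolhr equal to 1 or 6 and tg below 150; the Chol rule is left out because its threshold expression is not fully legible here. Function names and sample records are hypothetical.

```python
def binarize_bolhr(bolhr):
    """Chest pain item: codes 1 and 6 map to 0, any other code maps to 1."""
    return 0 if bolhr in (1, 6) else 1

def binarize_tg(tg):
    """Triglycerides item: values below 150 map to 0, otherwise 1."""
    return 0 if tg < 150 else 1

# Hypothetical patient records, for illustration only.
patients = [{"bolhr": 1, "tg": 120}, {"bolhr": 3, "tg": 150}]
items = [
    {"BOLHR": binarize_bolhr(p["bolhr"]), "TG": binarize_tg(p["tg"])}
    for p in patients
]
print(items)  # [{'BOLHR': 0, 'TG': 0}, {'BOLHR': 1, 'TG': 1}]
```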

14
Techniques and Results: Association rules search

Frequent itemsets search
Support threshold (MinSup) = 0.10: significant for at least 10% of the population
Search was done with no MinSup (i.e. MinSup value = 0)
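
To make the role of MinSup concrete, here is a minimal level-wise (Apriori-style) search in plain Python. It is an illustrative sketch, not the B. Goethals implementation used in the slides, and the transactions are toy data.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # Keep the candidates of this level whose support reaches min_sup.
        kept = {}
        for cand in level:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_sup:
                kept[cand] = sup
        result.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        k = len(next(iter(kept))) + 1 if kept else 0
        nxt = set()
        for a, b in combinations(kept, 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(s) in kept for s in combinations(cand, k - 1)
            ):
                nxt.add(cand)
        level = list(nxt)
    return result

# Toy transactions (hypothetical): each set lists the items equal to 1.
toy = [{"AGE", "SMOKER"}, {"AGE", "SMOKER", "CHOL"}, {"AGE"}, {"CHOL"}]
print(len(frequent_itemsets(toy, 0.5)), len(frequent_itemsets(toy, 0.0)))
```

With a threshold of 0, nothing is pruned and every combination of items is enumerated, so the number of itemsets grows exponentially with the number of binary characteristics.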

Tables (contents not transcribed): itemsets and their support values for Class 12, Class 34 and Class 5
16
Techniques and Results: Association rules search

Frequent itemsets search: Results
Support value of the Alcohol attribute: 1
1-itemsets
Attribute IM is false for every patient of groups 12 and 34. It is true for 33% of the patients of group 5.
HT is false for every patient of group 12.
STUDY is more frequent in group 12 than in group 5
3-itemsets
AGE & SMOKER & CHOL is less frequent in group 12 than in group 5
etc.
The support value for group 34 lies between the support values for group 12 and group 5.
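
The per-group comparison behind these statements amounts to computing the support of one itemset separately in each group. A small sketch follows; the records are hypothetical and are not STULONG values, and `support` is an illustrative helper.

```python
def support(itemset, records):
    """Fraction of records (dicts of 0/1 items) in which every item is 1."""
    return sum(all(r[i] == 1 for i in itemset) for r in records) / len(records)

# Hypothetical binarized records per group; NOT the STULONG values.
groups = {
    "12": [{"AGE": 1, "SMOKER": 0, "CHOL": 0}, {"AGE": 0, "SMOKER": 1, "CHOL": 0},
           {"AGE": 1, "SMOKER": 1, "CHOL": 1}, {"AGE": 0, "SMOKER": 0, "CHOL": 0}],
    "34": [{"AGE": 1, "SMOKER": 1, "CHOL": 1}, {"AGE": 1, "SMOKER": 1, "CHOL": 1},
           {"AGE": 0, "SMOKER": 1, "CHOL": 0}, {"AGE": 1, "SMOKER": 0, "CHOL": 1}],
    "5": [{"AGE": 1, "SMOKER": 1, "CHOL": 1}, {"AGE": 1, "SMOKER": 1, "CHOL": 1},
          {"AGE": 1, "SMOKER": 1, "CHOL": 1}, {"AGE": 0, "SMOKER": 1, "CHOL": 1}],
}
itemset = ("AGE", "SMOKER", "CHOL")
supports = {g: support(itemset, recs) for g, recs in groups.items()}
print(supports)  # {'12': 0.25, '34': 0.5, '5': 0.75}
```

The toy values were chosen so that group 34 falls between groups 12 and 5, which is the pattern the slide describes.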

17
Conclusion

The Risk group (RG) shows similarities with both the Normal group (NG) and the Pathological group (PG).
3 steps
Clustering: the initial groups are not found
Classification: some attributes characterize the pathological group, but they were already known
Frequent itemsets search: it is difficult to highlight concrete results, but the search yields interesting information

18
Further work

Upgrade the binary conversion
Refine the data set on the population: for instance, 12 patients died of atherosclerosis while they were in the NG
Refine our hypothesis
Data set of the ENTRY table
Look at the CONTROL table

19
Thanks !
