A particle swarm based hybrid system for imbalanced medical data sampling

1
A particle swarm based hybrid system for
imbalanced medical data sampling
  • Pengyi Yang
  • yangpy_at_it.usyd.edu.au

School of Information Technologies, The University of Sydney,
NSW 2006, Australia
NICTA, Australian Technology Park, Eveleigh, NSW 2015,
Australia
2
Road map
  • Imbalanced class distribution in medical data
  • Sampling
  • Over-sampling
  • Under-sampling
  • Converting feature selection techniques into a
    sampling strategy
  • System overview
  • Results
  • Conclusion

3
Imbalanced class distribution in medical data
  • Medical data commonly have an imbalanced class
    distribution
  • Why?
  • Positive samples are special cases (rare) while
    negative samples are abundant
  • (Conversely, only positive samples are collected)
  • Data contain subtypes each with limited samples

4
Problem
  • Building a classification model with an imbalanced
    dataset causes the under-represented class to be
    overlooked or even ignored.
  • Yet the rare classes often carry important
    biological implications.
  • The challenge is how to remedy the
    imbalanced class distribution.

5
Remedy
  • Via sampling before the model building process
  • Over-sampling: increases the sample size of the
    minority class (could introduce noise and redundancy)
  • Under-sampling: decreases the sample size of the
    majority class (could remove representative samples)
  • Via cost-sensitive learning within the model
    building process
  • Need to choose an appropriate cost metric (hard
    to determine a priori)
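
To illustrate the cost-sensitive route, here is a minimal sketch of one common heuristic: weighting each class inversely to its frequency. The slides do not name a cost metric, so the heuristic, the function name `inverse_frequency_weights`, and the label values are all assumptions made for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class so that rarer classes cost more to misclassify.

    Assumed heuristic: weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label set: 90 negatives, 10 positives.
labels = ["neg"] * 90 + ["pos"] * 10
weights = inverse_frequency_weights(labels)
```

With these hypothetical counts a misclassified positive is penalised nine times as heavily as a misclassified negative. Whether that ratio is appropriate for a given medical task is exactly the part that is hard to determine a priori.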

6
Current methods
  • The most straightforward way: random
    over-sampling and under-sampling
  • Naive methods, but they work well in different
    situations
  • Clustering and sampling
  • Clustering the dataset and sampling according to
    the characteristics of each cluster
  • Synthesizing new examples
  • The most popular is SMOTE, which creates
    artificial samples to increase the size of the
    minority class
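
The random and synthetic approaches listed above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: the function names and toy data are hypothetical, and the `interpolate` helper only shows the interpolation idea behind SMOTE. Full SMOTE pairs each minority sample with one of its k nearest minority neighbours, which is omitted here.

```python
import random


def random_under_sample(majority, minority, rng):
    """Keep a random subset of the majority class, equal in size to the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def random_over_sample(majority, minority, rng):
    """Duplicate random minority samples until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def interpolate(a, b, rng):
    """SMOTE-style synthetic point on the line segment between two minority samples."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


rng = random.Random(0)
# Toy data: 20 majority and 5 minority samples with two features each.
majority = [[float(i), 0.0] for i in range(20)]
minority = [[float(i), 1.0] for i in range(5)]

maj_u, min_u = random_under_sample(majority, minority, rng)
maj_o, min_o = random_over_sample(majority, minority, rng)
synthetic = interpolate(minority[0], minority[1], rng)
```

Under-sampling discards 15 of the 20 majority samples at random, which is where representative samples can be lost; over-sampling only repeats existing minority samples, which is where redundancy comes from.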

7
Our contribution: proposing a novel sampling
strategy
  • Converting a feature selection technique into a
    sampling strategy
  • Selecting a subset of optimal samples from the
    majority class

[Figure: supervised sample selection, taking an imbalanced dataset to a balanced dataset]
8
Conceptual representation
9
Particle swarm optimization
  • Problem encoding
  • Each particle is a subset of samples from the
    majority class

[Figure: particle encoding over sample1, sample2, ..., samplem]
m is the sample size of the majority class
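
One way to read this encoding is as a bit vector of length m, where bit i says whether majority sample i is kept. The sketch below uses a standard binary PSO with a sigmoid transfer function under that assumption. The transcript does not give the update rule, parameters, or fitness function, so `binary_pso`, its default parameters, and `toy_fitness` are hypothetical. A real system would presumably score a subset by how well classifiers trained on it perform; a placeholder stands in for that here.

```python
import math
import random


def binary_pso(m, fitness, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search over bit vectors of length m; bit i = 1 keeps majority sample i."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_particles)]
    vel = [[0.0] * m for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(m):
                # Velocity: inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid turns the velocity into the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(bits, target=4):
    """Placeholder fitness: prefer subsets whose size matches a hypothetical minority size."""
    return -abs(sum(bits) - target)


best, best_fit = binary_pso(m=12, fitness=toy_fitness)
selected = [i for i, bit in enumerate(best) if bit == 1]  # indices of kept majority samples
```

The placeholder only rewards subset size, so it says nothing about which samples are informative; swapping in a classifier-based score is what would make the selection supervised.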
10
Final Schema
11
Results
(1) PSO achieved better classification
results.
(2) Different evaluation metrics can
give different indications.
12
Results (continued)
(3) Different classifiers also perform
differently under the same sampling method
13
Key observation
  • The evaluation of a data sampling strategy is
    compounded by the type of classifier applied and
    the evaluation metric used.
  • Therefore, caution should be exercised when a
    conclusion is drawn on the basis of a single type
    of classifier or evaluation metric.
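
A small worked example shows how the choice of metric can reverse a comparison. The confusion counts below are invented for illustration and are not results from the study; the transcript does not say which metrics were used, so accuracy and G-mean (the geometric mean of sensitivity and specificity) are chosen here as assumptions.

```python
import math


def metrics(tp, fn, tn, fp):
    """Compute common evaluation metrics from binary confusion counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive (minority) class
    specificity = tn / (tn + fp)          # recall on the negative (majority) class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical test set: 10 positives, 90 negatives.
always_negative = metrics(tp=0, fn=10, tn=90, fp=0)   # ignores the minority class
balanced_model = metrics(tp=8, fn=2, tn=72, fp=18)    # trades some negatives for positives
```

On these made-up counts, accuracy prefers the model that never predicts the minority class, while G-mean prefers the balanced one, so a ranking based on a single metric can mislead.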

14
Conclusion
  • The study shows that, with proper modification,
    feature selection techniques can be applied to
    the sampling of imbalanced data.
  • Applying such a technique to the medical
    domain demonstrates that it can help increase
    classification accuracy, which is valuable for
    prediction or decision support systems.

15
Publication
  • P. Yang, L. Hsu, B. Zhou, Z. Zhang, A. Zomaya. A
    particle swarm based hybrid system for imbalanced
    medical data sampling. Accepted by BMC Genomics.

16
Questions!