Identification of Subpopulations in Breast Cancer Data through Different Clustering Algorithms

1
Identification of Sub-populations in Breast Cancer Data through Different Clustering Algorithms
  • Daniele Soria
  • http://www.cs.nott.ac.uk/dqs
  • Supervisors: Jon Garibaldi, Elia Biganzoli, Ian
    Ellis
  • BIOPTRAIN Meeting 09-01-2007

2
Background
  • Identification of biologically distinct groups
    with clinical and prognostic relevance.
  • TMA (tissue microarray) technology allows
    concomitant analysis of many proteins on tumour
    samples.
  • Starting point: Abd El-Rehim et al. (2005) [1]
  • IHC (immunohistochemistry) applied to TMA
    preparations of invasive breast cancer cases
  • Aim: to study the combined protein expression
    profiles of a large panel of biomarkers.

3
Abd El-Rehim et al. (2005)
  • IHC results analyzed with hierarchical clustering
  • Artificial neural network (ANN) used to
    categorize cases into groups
  • Six groups obtained
  • Each group driven by different markers
  • Arbitrary choice of the number of clusters
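Abd El-Rehim et al. (2005) grouped the IHC profiles with hierarchical clustering; the slide does not say which distance or linkage was used. The sketch below assumes Euclidean distance and average linkage (the function name `average_linkage` and the toy data are illustrative only). It shows where the arbitrary choice enters: the tree only becomes a set of groups once it is cut at a chosen number of clusters (`n_clusters`).

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    Every case starts as its own cluster; the two clusters with the smallest
    mean pairwise distance are merged until n_clusters remain.
    Returns one integer label per case.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average distance over all cross-cluster pairs of cases.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Toy data: two tight pairs and one outlying case, cut at 3 clusters.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
print(average_linkage(data, 3))  # → [0, 0, 1, 1, 2]
```

The naive loop recomputes every linkage at each merge, which is fine for a toy example but slow for 1076 cases; in R, `hclust(d, method = "average")` followed by `cutree` does the same job efficiently.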

4
Aims of our work
  • Apply different clustering techniques
  • Integrate previous results and compare them with
    ours
  • Verify stability of results across different
    methods
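The slides do not say how agreement between methods was measured. One common option, assumed here purely for illustration, is the Rand index: the fraction of pairs of cases that two partitions treat the same way (both together or both apart). It ignores how clusters are numbered, which matters when comparing, say, KM and PAM labels. The adjusted Rand index, which corrects for chance agreement, is often preferred in practice; the function name below is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index: fraction of case pairs on which two partitions agree.

    A pair agrees when both partitions place it in the same cluster, or both
    place it in different clusters; label names themselves need not match.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same (>= 2) cases")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical labels for six cases from two methods.
km_labels = [0, 0, 1, 1, 2, 2]
pam_labels = [1, 1, 0, 0, 0, 0]
print(rand_index(km_labels, pam_labels))  # 11 of 15 pairs agree, about 0.733
print(rand_index([0, 0, 1], [5, 5, 7]))   # same grouping, different names → 1.0
```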

5
Patients
  • Patients entered into the Nottingham Tenovus
    Primary Breast Carcinoma Series between 1986 and
    1998
  • A total of 1076 cases informative for all 25
    biological markers
  • Clinical information (grade, size, age, survival,
    follow-up, etc.)

6
Methods
  • Algorithms we used (2–20 clusters)
      • Fuzzy c-means (FCM)
      • K-means (KM)
      • Partitioning Around Medoids (PAM)
  • Validity indices computed
  • Software: R (www.r-project.org)
  • Methods provided by other authors
      • Hierarchical clustering (Hier)
      • Adaptive Resonance Theory (ART)
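PAM (Kaufman and Rousseeuw, 1990) searches for k representative cases (medoids) that minimise the total distance from every case to its nearest medoid. The sketch below is a minimal pure-Python version of its two phases, BUILD and SWAP, together with the average silhouette width from the same book as one example of a validity index for choosing k. The slides do not say which validity indices were actually computed, and the names `pam`, `average_silhouette` and `best_k` are hypothetical. The range 2–20 on this slide presumably refers to the candidate numbers of clusters; the toy call below uses a smaller range.

```python
from math import dist


def pam(points, k):
    """k-medoids via Partitioning Around Medoids: greedy BUILD, then SWAP.

    Cost is the sum of distances from each case to its nearest medoid.
    Returns (medoid indices, labels).
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    def cost(medoids):
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: add medoids one at a time, each time taking the case that
    # lowers the total cost the most.
    medoids = []
    for _ in range(k):
        medoids.append(min((c for c in range(n) if c not in medoids),
                           key=lambda c: cost(medoids + [c])))

    # SWAP: apply the best medoid/non-medoid exchange while it lowers the cost.
    current = cost(medoids)
    while True:
        best_cost, best = current, None
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best = trial_cost, trial
        if best is None:
            break
        medoids, current = best, best_cost

    labels = [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
    return medoids, labels


def average_silhouette(points, labels):
    """Mean silhouette width; higher is better. Needs at least 2 clusters."""
    n = len(points)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # convention: a case alone in its cluster has width 0
            widths.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


def best_k(points, k_values):
    """Run PAM for each k; return the k with the widest mean silhouette."""
    scores = {k: average_silhouette(points, pam(points, k)[1]) for k in k_values}
    return max(scores, key=scores.get), scores


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, scores = best_k(data, range(2, 5))
print(k, {kk: round(s, 3) for kk, s in scores.items()})
```

In R, `pam` in the cluster package implements the same BUILD/SWAP algorithm and reports silhouette information, so a sketch like this is mainly useful for seeing what the method does.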

7
Results
  • FCM: quite unstable
  • KM: no clear classification (3 or 6 groups)
  • PAM: best computational method (4 groups)
  • Hier: same results as in [1]
  • ART: six (fixed) groups
  • Definition and characterisation of the classes
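Fuzzy c-means gives each case a graded membership in every cluster rather than a single label. The slides report FCM as quite unstable without saying why; one known source of run-to-run variation is the random initialisation, which the sketch below exposes through a `seed` argument. The function and parameter names are hypothetical, and `m = 2` is a common default for the fuzzifier, not a value taken from the slides. In R, fuzzy clustering is available through, for example, `cmeans` in the e1071 package.

```python
import random
from math import dist


def fuzzy_c_means(points, c, m=2.0, n_iter=100, seed=None):
    """Fuzzy c-means: alternate weighted-centre and membership updates.

    Returns (centres, u) where u[i][j] is the membership of case i in
    cluster j; each row of u sums to 1. m > 1 controls fuzziness.
    """
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    # Random initial memberships, normalised per case: this is where
    # results can differ from one seed to another.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        u.append([x / sum(row) for x in row])

    for _ in range(n_iter):
        # Centres: means weighted by membership ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centres.append(tuple(
                sum(w[i] * points[i][dim] for i in range(n)) / sum(w)
                for dim in range(dims)))
        # Memberships: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        for i in range(n):
            ds = [dist(points[i], centre) for centre in centres]
            if min(ds) == 0:  # case sits exactly on a centre
                hits = [1.0 if x == 0 else 0.0 for x in ds]
                u[i] = [h / sum(hits) for h in hits]
            else:
                u[i] = [1.0 / sum((ds[j] / ds[l]) ** (2 / (m - 1)) for l in range(c))
                        for j in range(c)]
    return centres, u


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for seed in (1, 2):
    centres, u = fuzzy_c_means(data, 2, seed=seed)
    # "Harden" each case to the cluster with its largest membership.
    hard = [max(range(2), key=row.__getitem__) for row in u]
    print(seed, hard, [tuple(round(v, 2) for v in ctr) for ctr in centres])
```

Hardening the memberships and comparing the resulting labels across seeds is one simple way to check stability; on toy data this well separated, different seeds should recover the same two groups up to label order, whereas overlapping real marker profiles give the initialisation more room to matter.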

8
KM and PAM
9
Principal Components
[Figure: principal components plot; labels km1–km6 and pam1–pam4]
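This slide appears to show the KM and PAM groups (km1–km6, pam1–pam4) in a low-dimensional view. A common way to produce such a view, assumed here since the slide gives no details, is to project each case's marker profile onto the first two principal components and colour the points by cluster. The sketch computes the components by power iteration with deflation in plain Python; the function name and defaults are hypothetical. With 25 markers on different scales one would normally standardise the variables first (in R, `prcomp(x, scale. = TRUE)` does both steps).

```python
import random


def principal_components(rows, n_components=2, n_iter=500, seed=0):
    """Leading principal components by power iteration with deflation.

    Centres each variable, builds the sample covariance matrix, extracts its
    top eigenvectors one at a time, and returns (components, scores), where
    scores[i] are case i's coordinates on the components.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(xi[a] * xi[b] for xi in x) / (n - 1) for b in range(p)]
           for a in range(p)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = sum(t * t for t in v) ** 0.5
        v = [t / norm for t in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sum(t * t for t in w) ** 0.5
            if norm == 0:  # no variance left in any direction
                break
            v = [t / norm for t in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        components.append(v)
        # Deflation: remove the variance along v before finding the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]

    scores = [[sum(xi[j] * v[j] for j in range(p)) for v in components] for xi in x]
    return components, scores


# Toy cases: most variance along the first variable.
cases = [(-2, 0), (2, 0), (0, 1), (0, -1)]
components, scores = principal_components(cases)
for case, (pc1, pc2) in zip(cases, scores):
    print(case, round(pc1, 3), round(pc2, 3))
```

Plotting `scores` with one colour per cluster label gives a picture of how well separated the groups are in the first two components.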
10
Results
  • FCM: quite unstable
  • KM: no clear classification (6 groups)
  • PAM: best computational method (4 groups)
  • Hier: same results as in [1]
  • ART: six (fixed) groups
  • Definition and characterisation of the classes

11
Classes
[Figure: classes class1–class6; 62% of data]
12
Box plots
13
Current and future work
  • Clinical interpretation of the results
  • Survival analysis (second events)
  • Recovery of all the missing information
  • Different initialization techniques
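K-means results depend on the starting centres. The slides do not say which initialisation techniques are planned; k-means++ seeding (Arthur and Vassilvitskii, 2007), which spreads the initial centres out by drawing each new one with probability proportional to its squared distance from the nearest centre already chosen, is one widely used alternative to picking random cases. The sketch below only contrasts those two options inside Lloyd's algorithm; the names `kmeans`, `kmeans_pp_init` and `init` are hypothetical.

```python
import random
from math import dist


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first centre uniform at random, each further centre
    drawn with probability proportional to its squared distance from the
    nearest centre chosen so far (assumes at least k distinct points)."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, init="kmeans++", n_iter=100, seed=None):
    """Lloyd's k-means with either k-means++ or random-case initialisation."""
    rng = random.Random(seed)
    if init == "kmeans++":
        centres = kmeans_pp_init(points, k, rng)
    else:  # "random": k distinct cases chosen uniformly as starting centres
        centres = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each case goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its cases.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for init in ("kmeans++", "random"):
    centres, labels = kmeans(data, 2, init=init, seed=1)
    print(init, labels)
```

R's `kmeans` starts from randomly chosen cases by default and offers an `nstart` argument that keeps the best of several random starts, which is another simple way to reduce dependence on initialisation.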

14
Main references
  • [1] D.M. Abd El-Rehim, G. Ball, S.E. Pinder, E.
    Rakha, C. Paish, J.F. Robertson, D. Macmillan,
    R.W. Blamey, I.O. Ellis, High-throughput protein
    expression analysis using tissue microarray
    technology of a large well-characterised series
    identifies biologically distinct classes of
    breast cancer confirming recent cDNA expression
    analyses, International Journal of Cancer, 116,
    340–350, 2005.
  • [2] L. Kaufman, P.J. Rousseeuw, Finding Groups in
    Data, Wiley Series in Probability and
    Mathematical Statistics, 1990.
  • [3] A. Weingessel, E. Dimitriadou, S. Dolnicar, An
    Examination of Indexes for Determining the Number
    of Clusters in Binary Data Sets, Working Paper
    No. 29, 1999.

15
Identification of Sub-populations in Breast Cancer Data through Different Clustering Algorithms
  • Thank you!

16
BIOPTRAIN web page
  • Talk given (ASAP group seminar)
  • Courses attended (biology and bioinformatics)
  • Research skills modules (Grad School)
  • Communication skills modules (CELE)
  • All entries organised by year