Extraction of Organism Groups from Phylogenetic Profiles Using Independent Component Analysis

1
Extraction of Organism Groups from Phylogenetic
Profiles Using Independent Component Analysis
  • Yamanishi, Y., Itoh, M., Kanehisa, M.
  • Genome Informatics 13: 61-70, 2002
  • Summarized by Jeong-Ho Chang

2
  • Goal: extract organism groups and their hierarchy
    from phylogenetic profiles using ICA.
  • Find independent components that characterize
    major organism groups.
  • Identify genes that are characteristic of each
    organism group.

Hierarchical clustering of organisms
Gene identification
3
Phylogenetic profiles
  • Definition: a bit pattern that encodes the
    presence or absence of conserved (orthologous)
    genes in a set of organisms.
  • Applications
    • Functional prediction of genes: when two genes
      share similar phylogenetic profiles, they are
      assumed to be functionally correlated.
    • Construction of genome trees: based on the
      assumption that gene losses and acquisitions
      are major evolutionary phenomena.
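
As a rough illustration of the definition above, a phylogenetic profile can be stored as a list of 0/1 values, one per organism. The sketch below uses made-up gene names and a five-organism toy table. Comparing genes by Hamming distance is an assumption made here for simplicity, not necessarily the measure used for functional prediction in the paper.

```python
# Hypothetical toy data: each gene maps to its presence (1) or absence (0)
# in five organisms O1..O5. Names and values are invented for illustration.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
    "geneD": [1, 1, 0, 0, 0],
}


def hamming(p, q):
    """Number of organisms in which two profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same organisms")
    return sum(a != b for a, b in zip(p, q))


def most_similar(gene, profiles):
    """Return (other_gene, distance) for the closest profile to `gene`."""
    return min(
        ((other, hamming(profiles[gene], prof))
         for other, prof in profiles.items() if other != gene),
        key=lambda pair: pair[1],
    )


# Genes with identical profiles are candidates for functional correlation.
closest = most_similar("geneA", profiles)
```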

4
Independent Component Analysis
  • ICA
    • A linear transformation method from the fields
      of statistics and signal processing.
    • Represents a set of variables as a linear
      combination of latent variables that are
      statistically independent of each other.
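
The idea can be sketched on a two-signal toy problem. This is not the algorithm used in the paper: it is a minimal two-dimensional illustration that whitens the observed signals and then searches for the rotation whose outputs are least Gaussian, measured by excess kurtosis. The mixing coefficients, sample size, and function names are all assumptions.

```python
import math
import random


def whiten(x1, x2):
    """Center two signals, then rotate and scale them to unit covariance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(v * v for v in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(p * q for p, q in zip(a, b)) / n
    # Closed-form principal-axis angle of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    c, s = math.cos(theta), math.sin(theta)
    u = [c * p + s * q for p, q in zip(a, b)]
    v = [-s * p + c * q for p, q in zip(a, b)]
    su = math.sqrt(sum(t * t for t in u) / n)
    sv = math.sqrt(sum(t * t for t in v) / n)
    return [t / su for t in u], [t / sv for t in v]


def excess_kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal (0 if Gaussian)."""
    return sum(t ** 4 for t in y) / len(y) - 3.0


def rotate(u, v, phi):
    c, s = math.cos(phi), math.sin(phi)
    return ([c * p + s * q for p, q in zip(u, v)],
            [-s * p + c * q for p, q in zip(u, v)])


def ica_2d(x1, x2, steps=90):
    """Grid-search the rotation that maximizes total |excess kurtosis|."""
    u, v = whiten(x1, x2)
    best = max(
        (i * (math.pi / 2) / steps for i in range(steps)),
        key=lambda phi: sum(abs(excess_kurtosis(y)) for y in rotate(u, v, phi)),
    )
    return rotate(u, v, best)


def abs_corr(a, b):
    """Absolute Pearson correlation, used only to check the recovery."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / math.sqrt(va * vb)


# Two independent (uniform) latent variables, mixed linearly.
rng = random.Random(0)
s1 = [rng.uniform(-1, 1) for _ in range(2000)]
s2 = [rng.uniform(-1, 1) for _ in range(2000)]
x1 = [1.0 * a + 0.5 * b for a, b in zip(s1, s2)]  # assumed mixing weights
x2 = [0.3 * a + 1.0 * b for a, b in zip(s1, s2)]
y1, y2 = ica_2d(x1, x2)
```

Each recovered signal should correlate strongly with one of the latent variables, up to sign and ordering, which ICA cannot determine.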

IC score
5
Experiments
  • Data set
    • Phylogenetic profiles constructed from 2875
      orthologous genes in 77 organisms.
    • KEGG/GENES database as of May 2002.
    • 6 eukaryotes, 13 archaea, and 58 bacteria.
  • Grouping of organisms
    • 2875 x 77 → 2875 x 18
    • To interpret the biological meaning of each
      IC, correlation coefficients were computed for
      all combinations of the 77 organisms and
      18 ICs.
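
The correlation step can be sketched as follows, assuming Pearson correlation between each organism's presence/absence column and each IC score column. The six-gene toy table and the IC scores are invented for illustration. Assigning each organism to the IC with the largest absolute correlation is one plausible reading of "well correlated", not a rule stated in the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: one column per organism (0/1 over six genes) and
# one column of scores per independent component.
organisms = {
    "O1": [1, 1, 1, 0, 0, 0],
    "O2": [1, 0, 1, 0, 1, 0],
    "O3": [0, 0, 0, 1, 1, 1],
}
ic_scores = {
    "IC1": [0.9, 1.1, 0.7, -1.0, -0.8, -0.9],
    "IC2": [0.8, -0.9, 1.0, -1.1, 0.9, -0.7],
}

# Correlation for every (organism, IC) combination.
table = {
    (org, ic): pearson(col, score)
    for org, col in organisms.items()
    for ic, score in ic_scores.items()
}


def best_component(org):
    """IC with the largest absolute correlation to the given organism."""
    return max(ic_scores, key=lambda ic: abs(table[(org, ic)]))
```

In this toy table O1 and O3 load on the same component with opposite signs, while O2 is captured by the other component.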

6
(Figure: groups labeled E, A, and B; image not transcribed)
  • 9 out of 18 components were well correlated with
    specific organism groups.
  • 74 organisms were well represented by the 9 ICs.
  • Exceptions: Deinococcus radiodurans, Aquifex
    aeolicus, Thermotoga maritima.

7
  • Hierarchy of organism groups
    • Original data set vs. result of ICA.
    • Distance in the original data set: Hamming
      distance.
    • Distance in the reduced set: correlation
      coefficient.
    • In the reduced set, only 9 ICs are used.
    • Complete linkage hierarchical clustering.
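
A minimal sketch of complete linkage clustering with Hamming distance, roughly as it might be applied to the original (unreduced) profiles. The four organisms and their six-gene profiles are hypothetical, and the function names are not from the paper. The distance function is a parameter, so a correlation-based distance could be passed in for the reduced set.

```python
def hamming(p, q):
    """Number of genes whose presence/absence differs between organisms."""
    return sum(a != b for a, b in zip(p, q))


def complete_linkage(items, dist):
    """Agglomerative clustering with complete linkage.

    items: dict mapping name -> vector.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of names.
    """
    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(dist(items[a], items[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical organism profiles over six genes.
organisms = {
    "O1": [1, 1, 1, 0, 0, 0],
    "O2": [1, 1, 1, 1, 0, 0],
    "O3": [0, 0, 0, 1, 1, 1],
    "O4": [0, 0, 1, 1, 1, 1],
}
merges = complete_linkage(organisms, hamming)
```

The merge history can be read as a dendrogram: early merges with small distances are tight groups, and the final merge joins the most dissimilar groups.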

9
  • Identification of Genes
  • The result of ICA can be used to identify genes
    that are clustered at high and low scores along
    each independent component.
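
One simple way to pick out such genes, sketched here under the assumption that a fixed number of genes is taken from each end of the score ranking. The slides do not say how the cut-off was chosen, and the gene names and scores below are invented.

```python
def extreme_genes(scores, k=2):
    """Return (lowest_k, highest_k) gene names for one independent component.

    scores: dict mapping gene name -> IC score.
    k: number of genes to take from each end (an assumed cut-off).
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    lowest = ranked[:k]
    highest = ranked[-k:][::-1]  # most extreme first
    return lowest, highest


# Hypothetical scores of six genes along a single IC.
ic_scores = {
    "g1": 2.3, "g2": -0.1, "g3": -1.9,
    "g4": 0.2, "g5": 1.8, "g6": -2.4,
}
low, high = extreme_genes(ic_scores, k=2)
```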

11
Discussion
  • Proposed using ICA to extract organism groups
    from phylogenetic profiles.
  • ICA is an appropriate method for detecting
    biological features.
    • ICA attempts to maximize non-Gaussianity.
    • PCA attempts to maximize variance, which can
      interfere with detecting biologically
      meaningful features.
  • Future work
    • Development from independent components to
      tree components.
    • Incorporating phylogenetic tree structure into
      the similarity of two phylogenetic profiles.
