Title: Gene Ontology-driven similarity for gene expression correlation analysis
1. Gene Ontology-driven similarity for gene expression correlation analysis
Francisco Azuaje, Haiying Wang, Olivier Bodenreider and Joaquín Dopazo
2. Two kinds of gene information
3. Gene expression analysis
- Clustering of genes based on co-regulation
- Similar expression patterns may indicate
  - a common pathway
  - a common response to experimental conditions
4. Annotation analysis
5. Limitations of annotation analysis
Not reflected in the analysis
6. Semantic similarity in taxonomies
7. Approaches to computing semantic similarity/distance in taxonomies
- Edge counting
  - Intuitive
  - Requires the density of the taxonomy to be homogeneous
- Information-theoretic metrics
  - Grounded in information theory
  - Compensates for heterogeneity in the taxonomy
d1
d2
d1
8. Information-theoretic approaches
- Information content (IC): nodes high in the hierarchy have a small IC
- The information shared by two nodes can be represented by their common ancestors (least common subsumer)
- The more information two terms share, the more similar they are
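For reference, the standard information-theoretic definitions (following Resnik, 1995) can be written as below, where $p(c)$ is the probability of encountering $c$ or one of its descendants and $S(c_1, c_2)$ is the set of nodes that subsume both $c_1$ and $c_2$ (including $c_1$ or $c_2$ itself when one subsumes the other). Because $p$ can only grow when moving up the hierarchy, nodes near the root get a small IC, and a root with $p = 1$ gets an IC of zero.

```latex
\begin{aligned}
\mathrm{IC}(c) &= -\log p(c) \\
\mathrm{LCS}(c_1, c_2) &= \arg\max_{c \in S(c_1, c_2)} \mathrm{IC}(c)
\end{aligned}
```

The information shared by $c_1$ and $c_2$ is then measured by $\mathrm{IC}(\mathrm{LCS}(c_1, c_2))$.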
9. Computing information content
- Taxonomy
- Frequency distribution of the nodes in a corpus/database
- Information content of C based on the probability of finding C or one of its descendants in the corpus/database
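A sketch of this computation, assuming a small hypothetical taxonomy and made-up direct observation counts (`PARENTS`, `DIRECT_COUNTS` and the helper names are illustrative only). Each observation of a node is credited to that node and to all of its ancestors, so freq(c) counts c and its descendants; then p(c) = freq(c)/N and IC(c) = -log p(c), here with the natural logarithm (the base only rescales IC).

```python
import math

# Hypothetical toy taxonomy: node -> list of parent nodes.
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a"], "e": ["b"]}

# Hypothetical corpus: how many times each node is observed directly.
DIRECT_COUNTS = {"root": 0, "a": 1, "b": 2, "c": 3, "d": 2, "e": 2}


def ancestors_or_self(parents, node):
    """The node itself plus every node reachable through parent links."""
    found = {node}
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def information_content(parents, direct_counts):
    """IC(c) = -log p(c), with p(c) = freq(c) / N."""
    freq = {node: 0 for node in parents}
    for node, n in direct_counts.items():
        # An observation of `node` is also an observation of each of its ancestors.
        for anc in ancestors_or_self(parents, node):
            freq[anc] += n
    total = sum(direct_counts.values())  # N: every observation is subsumed by the root
    # log(N / freq) equals -log p; unobserved nodes get an infinite IC.
    return {c: math.log(total / f) if f > 0 else math.inf for c, f in freq.items()}


ic = information_content(PARENTS, DIRECT_COUNTS)
print(ic["root"], ic["a"], ic["c"])  # IC is 0 at the root and grows from a to c
```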
10. Information content in GO
- Taxonomy: hierarchy (DAG) of is_a and part_of relations
- Frequency distribution of GO terms: annotation databases
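In GO the same idea applies to a DAG in which a term can have several parents through is_a and part_of links. The sketch below uses a hypothetical GO-like fragment (`T_root`, `T1` ... `T4`) and made-up gene annotations: it propagates each gene's annotations to all ancestors along both relation types and counts annotated gene products per term. Counting each gene product once per term, and using the number of annotated gene products as the denominator, are assumptions of this sketch; counting annotation records is another option.

```python
import math
from collections import Counter

# Hypothetical GO-like fragment: term -> list of (relation, parent term).
EDGES = {
    "T_root": [],
    "T1": [("is_a", "T_root")],
    "T2": [("is_a", "T_root")],
    "T3": [("is_a", "T1"), ("part_of", "T2")],  # two parents: a DAG, not a tree
    "T4": [("part_of", "T1")],
}

# Hypothetical direct annotations: gene product -> set of terms.
ANNOTATIONS = {"g1": {"T3"}, "g2": {"T4"}, "g3": {"T2"}, "g4": {"T3", "T4"}}


def implied_terms(edges, term, relations=("is_a", "part_of")):
    """The term plus all ancestors reachable via the given relation types."""
    found = {term}
    stack = [term]
    while stack:
        for rel, parent in edges[stack.pop()]:
            if rel in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def term_frequencies(edges, annotations):
    """Number of gene products annotated to each term or to one of its descendants."""
    freq = Counter()
    for terms in annotations.values():
        implied = set()
        for t in terms:
            implied |= implied_terms(edges, t)
        freq.update(implied)  # a set, so each gene product counts once per term
    return freq


freq = term_frequencies(EDGES, ANNOTATIONS)
n_genes = len(ANNOTATIONS)
ic = {t: math.log(n_genes / f) for t, f in freq.items()}
print(dict(freq))
```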
11. Semantic similarity in GO
Lord et al., PSB 2003; Wang et al., CIBCB 2004
- Based on the information content of the least common subsumer (LCS)
- Several variants
  - Resnik (1995)
  - Lin (1998)
  - Jiang and Conrath (1997)
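The three variants can be sketched using their usual textbook forms: Resnik takes the IC of the LCS, Lin normalises it by the ICs of the two terms, and Jiang and Conrath combine the same quantities into a distance. The toy taxonomy and probabilities (`PARENTS`, `PROB`) are hypothetical, and the LCS is taken to be the common ancestor with the highest IC.

```python
import math

# Hypothetical toy taxonomy and probabilities p(c) (c or a descendant observed).
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a"], "e": ["b"]}
PROB = {"root": 1.0, "a": 0.6, "b": 0.4, "c": 0.3, "d": 0.2, "e": 0.2}
IC = {c: math.log(1.0 / p) for c, p in PROB.items()}  # IC = -log p


def ancestors_or_self(node):
    found = {node}
    stack = [node]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def lcs_ic(c1, c2):
    """IC of the most informative common subsumer of c1 and c2."""
    common = ancestors_or_self(c1) & ancestors_or_self(c2)
    return max(IC[c] for c in common)


def resnik(c1, c2):
    return lcs_ic(c1, c2)


def lin(c1, c2):
    denom = IC[c1] + IC[c2]
    # denom is 0 only when both terms are the root, i.e. identical terms
    return 1.0 if denom == 0 else 2.0 * lcs_ic(c1, c2) / denom


def jiang_conrath_distance(c1, c2):
    return IC[c1] + IC[c2] - 2.0 * lcs_ic(c1, c2)


for pair in [("c", "d"), ("c", "e")]:
    print(pair, resnik(*pair), lin(*pair), jiang_conrath_distance(*pair))
```

How the Jiang and Conrath distance is turned into a similarity score differs between studies, so the sketch returns the distance itself.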
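12. Semantic similarity among gene products
[Figure: gene products g1 and g2 and their annotations (g1: terms c and d; g2: terms c and e); SIM(g1,g2) is derived from the term-term similarities sim(c,c), sim(c,e), sim(d,c) and sim(d,e)]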
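The figure only shows which term pairs enter SIM(g1,g2), not how they are combined. The sketch below assumes averaging over all pairs, one common choice, and also allows the maximum as an alternative; `gene_similarity`, `toy_sim` and the similarity values are hypothetical.

```python
from itertools import product
from statistics import mean


def gene_similarity(terms_g1, terms_g2, term_sim, aggregate=mean):
    """SIM(g1, g2): aggregate sim(t1, t2) over every pair of annotation terms."""
    scores = [term_sim(t1, t2) for t1, t2 in product(terms_g1, terms_g2)]
    return aggregate(scores)


# Hypothetical term-term similarities for the four pairs in the figure.
TOY_SIM = {("c", "c"): 1.0, ("c", "e"): 0.4, ("d", "c"): 0.6, ("d", "e"): 0.2}


def toy_sim(t1, t2):
    return TOY_SIM[(t1, t2)]


g1_terms, g2_terms = ["c", "d"], ["c", "e"]
print(gene_similarity(g1_terms, g2_terms, toy_sim))       # average over the four pairs
print(gene_similarity(g1_terms, g2_terms, toy_sim, max))  # best-matching pair only
```

In practice `toy_sim` would be replaced by one of the term-level metrics (Resnik, Lin, or a similarity derived from Jiang and Conrath).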
13. Experiment
14. Comparing gene-gene similarity
- Gene expression data (similar expression levels)
- Annotations (high semantic similarity)
15. Experiment
- Hypothesis: pairs of genes exhibiting similar expression levels also tend to have high semantic similarity
- Dataset from Eisen et al., 1998
  - Expression responses to several perturbations in S. cerevisiae (2460 gene products with GO annotations)
- Method: for each pair of genes (gi, gj), compute
  - the absolute correlation value between gi and gj
  - the semantic similarity between gi and gj (on each GO hierarchy and for each metric separately)
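The per-pair computation can be sketched as follows. The slides say "absolute correlation" without naming the coefficient, so Pearson correlation is an assumption here, as is the requirement that profiles have no missing values; `PROFILES` and the constant similarity function are placeholders, and `pair_table` would be run once per GO hierarchy and metric.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (no missing values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def pair_table(profiles, semantic_sim):
    """One row per gene pair: (gi, gj, |correlation|, semantic similarity)."""
    rows = []
    for gi, gj in combinations(sorted(profiles), 2):
        r = abs(pearson(profiles[gi], profiles[gj]))
        rows.append((gi, gj, r, semantic_sim(gi, gj)))
    return rows


# Hypothetical expression profiles and a placeholder similarity function.
PROFILES = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
rows = pair_table(PROFILES, lambda gi, gj: 0.5)
for row in rows:
    print(row)
```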
16(No Transcript)
17(No Transcript)
18(No Transcript)
19. Summary of results
- Confirmation of our hypothesis
  - High similarity values are significantly associated with strong expression correlation values
  - Weak similarity values are significantly associated with low expression correlation values
- Additionally
  - Similar results were obtained
    - for different numbers of intervals
    - with the three metrics tested
    - for the three GO hierarchies
  - This trend is significantly stronger in the case of the highest expression correlation values
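The slides do not say how the intervals were formed or which significance test was used. As a hedged illustration of an interval-based comparison, the sketch below assumes equal-width intervals over the semantic similarity range and reports the mean absolute expression correlation in each (binning on correlation instead works the same way with the roles swapped). It computes no significance values, and `toy_rows` is invented, not data from the study.

```python
from statistics import mean


def correlation_by_similarity_interval(rows, k):
    """Mean |correlation| for each of k equal-width semantic-similarity intervals.

    rows: iterable of (semantic_similarity, abs_correlation) pairs.
    Returns (lower, upper, mean_abs_correlation or None) per interval.
    """
    rows = list(rows)
    sims = [s for s, _ in rows]
    lo, hi = min(sims), max(sims)
    width = (hi - lo) / k or 1.0  # fall back if all similarities are equal
    bins = [[] for _ in range(k)]
    for s, r in rows:
        idx = min(int((s - lo) / width), k - 1)  # the maximum goes in the last interval
        bins[idx].append(r)
    return [
        (lo + i * width, lo + (i + 1) * width, mean(b) if b else None)
        for i, b in enumerate(bins)
    ]


# Hypothetical (similarity, |correlation|) pairs.
toy_rows = [(0.1, 0.2), (0.2, 0.3), (0.8, 0.9), (0.9, 0.7)]
for k in (2, 4):
    print(k, correlation_by_similarity_interval(toy_rows, k))
```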
20(No Transcript)