Predicting interactions between genes based on genome sequence comparisons: The genomic context component of STRING (PowerPoint presentation)

Slides: 61
Provided by: ber48

Transcript and Presenter's Notes



1
Predicting interactions between genes based on
genome sequence comparisons: The genomic
context component of STRING
Bioinformatics seminar series, 5-10-2004
Berend Snel
2
To do
  • Seminar (today): please ask questions
  • Article: "A gene co-expression network for global
    discovery of conserved genetic modules"
  • Make a schedule for the article discussion (today)
  • Read the article (next couple of days)
  • 5-minute discussion of the article per person
    (preferably Monday 11 October)

3
http://string.embl.de
4
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Gene fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

5
Complete genomes, now what?
  • Post-genomic era: we have the parts list
    (complete genomes)
  • To understand the cell, we need to know the
    functions of the genes

6
For most genes in any genome we need function
prediction
  • E. coli: the most intensively studied organism
  • Only 1924 genes (43%) have been (partially)
    experimentally characterized

7
Predicting protein function
What is function? There are various levels of
description. Sequence similarity/homology is most
relevant for molecular function, the aspect of
protein function that is best conserved.
Molecular function can often be predicted from
similarities between protein sequences (BLAST)
or structures.
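BLAST is a fast heuristic for finding local similarity between sequences. As a minimal sketch of the underlying idea, the exact Smith-Waterman local alignment score can be computed by dynamic programming. The scoring scheme below (match +2, mismatch -1, linear gap -2) is an illustrative assumption; real protein searches use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous residue of a; row 0 is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            # Clamping at 0 lets a local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two toy protein fragments (hypothetical sequences).
    print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```

Because scores are clamped at zero, an unrelated flanking region does not penalize a strong shared segment, which is what makes local (rather than global) alignment suitable for detecting homologous domains.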
8
BLAST
9
Beyond homology and molecular function
  • Homology-based function prediction works very
    well, but
  • a large fraction of genes are poorly described
    (no homologs, or only uncharacterized homologs;
    this holds for 60% of human genes)
  • There are other aspects of function: functional
    associations, e.g. the target of a protein kinase
    or of a transcriptional regulator
  • Hence the need to predict these associations

10
  • Genome sequences allow us to interpret the
    function of proteins within the context in
    which they occur
  • Reverse this process: predict the function of a
    protein from the context in which it tends to
    occur → prediction of protein function/pathways
    from genome sequences. Use the genome sequences
    (through comparative genome analysis) for
    interaction prediction: genomic context methods
  • Genomic context methods have been shown to be
    reliable indicators of functional associations

11
There are many types of functional associations
(AKA functional interactions, interactions,
functional links, functional relations) in
molecular biology
Cellular process
12
Types of functional associations
Metabolic pathways: filling gaps
13
Types of functional associations
Transcription regulation
Signalling pathways
14
Types of functional associations
Cellular process
Protein complexes
15
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Gene fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

16
Genomic context is a tool to predict functional
associations between genes
  • Use the genome sequences (through comparative
    genome analysis) for interaction prediction:
    genomic context methods
  • Genomic context methods have been shown to be
    reliable indicators of functional interactions
  • Genomic context is also known as in silico
    interaction prediction, or genomic associations

17
Genomic context methods detect evolutionary
traces in genomes of functionally associated
proteins
trpA
trpB
18
(No Transcript)
19
Three different genomic context methods in STRING
  • Gene fusion (Rosetta stone method)
  • Conserved gene order between divergent genomes
  • Co-occurrence of genes across genomes
    (phylogenetic profiles)

20
All genomic context methods use orthologs:
corresponding genes between genomes
  • Orthologs: not just homologs, but homologs
    related by speciation
  • Orthologs are very likely to have the same
    function
  • orthologs : genomes = alignment : sequences

Gene Duplication
Speciation
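The slides do not describe how STRING assigns orthologs (it works with orthologous groups, as in the COG0346 example later). One simple, widely used heuristic for pairwise ortholog calling is the reciprocal (bidirectional) best hit. The sketch below assumes similarity scores from an all-against-all search are already available; the protein identifiers and scores are hypothetical. It can miss true orthologs after a gene duplication, the case the figure distinguishes from speciation.

```python
from typing import Dict, Set, Tuple

Hits = Dict[Tuple[str, str], float]  # (query protein, subject protein) -> similarity score


def best_hits(hits: Hits) -> Dict[str, str]:
    """For each query protein, the subject with the highest score (ties: first seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


if __name__ == "__main__":
    # Hypothetical proteins x1, x2 in genome A and y1, y2 in genome B.
    a_vs_b = {("x1", "y1"): 250.0, ("x1", "y2"): 40.0, ("x2", "y2"): 300.0}
    b_vs_a = {("y1", "x1"): 245.0, ("y2", "x2"): 310.0, ("y2", "x1"): 35.0}
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('x1', 'y1'), ('x2', 'y2')]
```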
21
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Gene fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

22
Gene fusion
  • i.e. the orthologs of two genes are fused into
    one polypeptide in another organism
  • A very reliable indicator of functional
    interaction, partly because it is a relatively
    infrequent evolutionary event: 3470 distinct
    fusions when surveying 179 genomes

Fusion
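A minimal sketch of fusion (Rosetta stone) detection, assuming similarity-search hits of query-genome genes on proteins of another genome are already available: two query genes are linked when they match different, essentially non-overlapping regions of the same target protein. The `Hit` record, the coordinates and the `max_overlap` threshold are hypothetical. A real pipeline would also have to exclude pairs whose hits merely reflect a shared domain between homologous query genes.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str   # gene in the query genome
    target: str  # protein in another genome
    start: int   # first residue of the matched region on the target
    end: int     # last residue of the matched region (inclusive)


def fusion_links(hits: List[Hit], max_overlap: int = 10) -> Set[Tuple[str, str]]:
    """Pairs of query genes that match distinct regions of one target protein."""
    by_target: Dict[str, List[Hit]] = {}
    for h in hits:
        by_target.setdefault(h.target, []).append(h)
    links: Set[Tuple[str, str]] = set()
    for target_hits in by_target.values():
        for h1, h2 in combinations(target_hits, 2):
            if h1.query == h2.query:
                continue
            # Residues shared by the two matched regions (<= 0 means disjoint).
            overlap = min(h1.end, h2.end) - max(h1.start, h2.start) + 1
            if overlap <= max_overlap:
                a, b = sorted((h1.query, h2.query))
                links.add((a, b))
    return links


if __name__ == "__main__":
    hits = [
        Hit("geneA", "fusedAB", 1, 260),    # geneA matches the N-terminal part
        Hit("geneB", "fusedAB", 270, 660),  # geneB matches the C-terminal part
        Hit("geneC", "otherProt", 1, 100),
    ]
    print(fusion_links(hits))  # {('geneA', 'geneB')}
```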
23
Gene fusion: an example
24
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

25
Gene order evolves rapidly
But
26
Differential retention of divergent / convergent
gene pairs suggests that conservation implies a
functional association
27
Comparison to pathways: conservation implies a
functional association
28
Conserved gene order
  • i.e. genes that are found in the same gene
    cluster over sufficiently large evolutionary
    distances
  • Contributes by far the most predictions
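A minimal sketch of conserved-neighbourhood detection, assuming each genome is given as an ordered list of (orthologous group, strand) along its chromosome. The slide asks for conservation over sufficiently large evolutionary distances; this sketch simplifies that to counting genomes (`min_genomes` is a hypothetical threshold), so it ignores how closely related those genomes are, which a real method must account for. Genome names and COG labels are placeholders.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Genome = List[Tuple[str, str]]  # (orthologous group, strand) in chromosome order


def neighbor_pairs(genome: Genome) -> Set[FrozenSet[str]]:
    """Adjacent genes on the same strand: a crude proxy for sharing a gene cluster."""
    pairs: Set[FrozenSet[str]] = set()
    for (g1, s1), (g2, s2) in zip(genome, genome[1:]):
        if s1 == s2 and g1 != g2:
            pairs.add(frozenset((g1, g2)))
    return pairs


def conserved_neighbors(genomes: Dict[str, Genome],
                        min_genomes: int = 2) -> Dict[FrozenSet[str], int]:
    """Gene pairs that are neighbours in at least min_genomes genomes."""
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for genome in genomes.values():
        for pair in neighbor_pairs(genome):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


if __name__ == "__main__":
    genomes = {
        "genome1": [("COG_A", "+"), ("COG_B", "+"), ("COG_C", "-")],
        "genome2": [("COG_B", "-"), ("COG_A", "-"), ("COG_D", "+")],
        "genome3": [("COG_C", "+"), ("COG_A", "+"), ("COG_D", "-")],
    }
    print(conserved_neighbors(genomes))  # {frozenset({'COG_A', 'COG_B'}): 2}
```

Using unordered pairs (`frozenset`) means an inverted cluster (B before A, as in genome2) still counts as conserved.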

29
Conserved gene order
NB1: predicting operons is not trivial; in fact,
conserved gene order or functional association is
a major clue. NB2: using operons alone, without
requiring conservation, results in much less
reliable function prediction.
30
Conserved gene order: an example from metabolism
of propionyl-CoA
target
query
31
Conserved gene order: an example from metabolism
of propionyl-CoA
Biochemical assays confirm the function of
members of COG0346 as a DL-methylmalonyl-CoA
racemase
32
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Gene fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

33
Presence / absence of genes
Gene content → co-evolution (the easy case: few
genomes)
Differences in gene content reflect
differences in phenotypic potential
Genomes share genes for the phenotypes they have
in common
34
Presence / absence of genes
L. innocua (non-pathogen)
L. monocytogenes (pathogen)
35
Presence / absence of genes
Genes involved in pathogenicity
L. monocytogenes (pathogenic)
L. innocua (non-pathogenic)
36
Generalization: phylogenetic profiles /
co-occurrence
[Figure: genes 1, 2, 3, ... across species 1 to 5, ...]
        species 1  species 2  species 3  species 4  species 5  ...
Gene 1      1          0          1          1          0       1
Gene 2      1          1          0          0          1       0
Gene 3      0          1          0          0          1       0
....
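The slides do not give STRING's profile scoring function. As a sketch, binary profiles can be compared by Hamming distance, the number of genomes in which two presence/absence patterns differ; this measure is an assumption, and a real method should also correct for the phylogenetic signal discussed on the next slide. The example uses the toy profiles from the table above.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[int]  # 1 = gene present in that genome, 0 = absent


def hamming(p: Profile, q: Profile) -> int:
    """Number of genomes in which the two presence/absence patterns differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def similar_profiles(profiles: Dict[str, Profile],
                     max_distance: int = 0) -> List[Tuple[str, str, int]]:
    """Gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        d = hamming(p1, p2)
        if d <= max_distance:
            pairs.append((g1, g2, d))
    return pairs


if __name__ == "__main__":
    # The toy profiles from the table (six genomes).
    profiles = {
        "Gene 1": [1, 0, 1, 1, 0, 1],
        "Gene 2": [1, 1, 0, 0, 1, 0],
        "Gene 3": [0, 1, 0, 0, 1, 0],
    }
    print(similar_profiles(profiles, max_distance=1))  # [('Gene 2', 'Gene 3', 1)]
```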
37
But: there is phylogenetic signal in gene content!
Escherichia coli
Haemophilus influenzae
       sp1   sp2   sp3   sp4
sp1    1     0.2   0.4   0.2
sp2          1     0.9   0.1
sp3                1     0.3
sp4                      1
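The slide does not say which similarity measure fills its species-by-species matrix. As one simple, assumed choice, gene-content similarity can be computed as the Jaccard index of the sets of orthologous groups present in two genomes; closely related species score high, which is the phylogenetic signal that raw co-occurrence counts inherit. Species and group names below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared orthologous groups divided by groups present in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def gene_content_similarity(content: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Gene-content similarity for every pair of species (upper triangle only)."""
    return {(s1, s2): jaccard(content[s1], content[s2])
            for s1, s2 in combinations(sorted(content), 2)}


if __name__ == "__main__":
    content = {
        "sp1": {"A", "B", "C", "D"},
        "sp2": {"A", "B", "C", "E"},  # close relative of sp1
        "sp3": {"A", "F"},
    }
    print(gene_content_similarity(content))
    # {('sp1', 'sp2'): 0.6, ('sp1', 'sp3'): 0.2, ('sp2', 'sp3'): 0.2}
```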

38
Co-occurrence of genes across genomes
  • i.e. two genes have the same presence/absence
    pattern over multiple genomes: they have
    co-evolved
  • AKA phylogenetic profiles

39
Predicting the function of frataxin, a disease-gene
protein of unknown function, using co-occurrence
of genes across genomes
  • Friedreich's ataxia
  • No (homolog with) known function

40
Frataxin has co-evolved with hscA and hscB,
indicating that it plays a role in iron-sulfur
cluster assembly
[Figure: presence/absence profile of frataxin (cyaY / Yfh1) across genomes; species labels only partly legible, e.g. E. coli and D. melanogaster]
41
Iron-Sulfur (2Fe-2S) cluster in the Rieske protein
42
Prediction
Confirmation
43
The opposite of co-occurrence: anti-correlation /
complementary patterns predict analogous
enzymes
Genes with complementary phylogenetic profiles
tend to have a similar biochemical function.
[Figure: genes A and B with complementary presence/absence patterns]
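A minimal sketch of a complementarity score: the fraction of genomes in which exactly one of two genes is present. Because genomes that lack the whole pathway contain neither analogous enzyme, the sketch optionally restricts the count to genomes where the pathway is judged present (the `pathway_present` input is a hypothetical annotation, e.g. derived from the pathway's other genes). In the toy table on the "Generalization" slide, Gene 1 and Gene 3 happen to be perfectly complementary.

```python
from typing import Optional, Sequence


def complementarity(p: Sequence[int], q: Sequence[int],
                    pathway_present: Optional[Sequence[int]] = None) -> float:
    """Fraction of considered genomes in which exactly one of the two genes is present.

    If pathway_present is given (1 = pathway occurs in that genome), genomes
    lacking the pathway are ignored, since neither analogous enzyme is expected there.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    indices = [i for i in range(len(p))
               if pathway_present is None or pathway_present[i]]
    if not indices:
        return 0.0
    return sum(p[i] != q[i] for i in indices) / len(indices)


if __name__ == "__main__":
    gene1 = [1, 0, 1, 1, 0, 1]
    gene3 = [0, 1, 0, 0, 1, 0]
    print(complementarity(gene1, gene3))  # 1.0
```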
44
Complementary patterns in thiamin biosynthesis
predict analogous enzymes
45
Prediction of analogous enzymes is confirmed
46
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Gene fusion
  • Gene order
  • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

47
Benchmark and integration: KEGG maps
48
Integrating genomic context scores into a
single score
  • Benchmark each individual method against an
    independent reference (KEGG) and calibrate its
    scores so that they are equivalent across methods
  • Multiply the probabilities that two proteins are
    not interacting (one per method) and subtract the
    product from 1: naive Bayesian, i.e. assuming
    independence
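The two steps above can be sketched as follows, assuming a benchmark label per scored pair (True if both proteins lie on the same KEGG map). Calibration here is a simple binning of raw scores; the bin edges and toy data are hypothetical, and STRING's actual calibration procedure may differ. Integration follows the slide's rule, S = 1 - ∏(1 - S_i), which is only valid under the stated independence assumption.

```python
from bisect import bisect_right
from typing import List, Sequence


def calibration_table(raw_scores: Sequence[float], same_map: Sequence[bool],
                      edges: Sequence[float]) -> List[float]:
    """Fraction of benchmark pairs on the same KEGG map, per raw-score bin.

    edges are sorted lower bin boundaries: bin k covers [edges[k], edges[k+1]),
    and the last bin is open-ended. Scores below edges[0] are ignored.
    """
    hits = [0] * len(edges)
    totals = [0] * len(edges)
    for score, ok in zip(raw_scores, same_map):
        k = bisect_right(edges, score) - 1
        if k < 0:
            continue
        totals[k] += 1
        hits[k] += int(ok)
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


def calibrated(raw_score: float, edges: Sequence[float], table: Sequence[float]) -> float:
    """Map a raw method score onto the common benchmark scale via its bin."""
    k = bisect_right(edges, raw_score) - 1
    return table[k] if k >= 0 else 0.0


def combine(scores: Sequence[float]) -> float:
    """Naive Bayesian integration: 1 - product of per-method 'no link' probabilities."""
    p_no_link = 1.0
    for s in scores:
        p_no_link *= 1.0 - s
    return 1.0 - p_no_link


if __name__ == "__main__":
    edges = [0.0, 0.5]
    table = calibration_table([0.1, 0.2, 0.6, 0.7, 0.8],
                              [False, True, True, True, False], edges)
    print(table)                 # [0.5, 0.6666666666666666]
    print(combine([0.5, 0.5]))   # 0.75
```

Calibrating first matters: combining raw fusion, gene-order and co-occurrence scores directly would mix incomparable scales.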

[Figure: fraction of predicted pairs on the same KEGG map (y-axis, 0 to 1) versus score (x-axis, 0 to 1) for Fusion, Gene Order and Co-occurrence]
49
Benchmark
[Figure: coverage (y-axis, number of predicted links between orthologous groups, log scale 10 to 100000) versus accuracy (x-axis, fraction of confirmed predictions, i.e. same KEGG map, 0.5 to 1.0) for Integrated, Gene Order (norm.), Gene Order (abs.), Co-occurrence, Fusion (norm.) and Fusion (abs.)]
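A coverage/accuracy curve like the one in the benchmark figure can be sketched by sweeping a score cutoff. Here coverage is counted as the number of predicted links passing the cutoff (as on the benchmark figure's axis) and accuracy as the fraction of those confirmed by the reference (e.g. both proteins on the same KEGG map); the cutoffs and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def coverage_accuracy(scores: Sequence[float], confirmed: Sequence[bool],
                      thresholds: Sequence[float]) -> List[Tuple[float, int, float]]:
    """For each score cutoff: (cutoff, coverage, accuracy).

    coverage = number of predicted links scoring >= cutoff
    accuracy = fraction of those links confirmed by the benchmark; NaN if none pass
    """
    curve = []
    for t in thresholds:
        passed = [ok for s, ok in zip(scores, confirmed) if s >= t]
        n = len(passed)
        accuracy = sum(passed) / n if n else float("nan")
        curve.append((t, n, accuracy))
    return curve


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.6, 0.4]
    confirmed = [True, True, False, True]
    for cutoff, n, acc in coverage_accuracy(scores, confirmed, [0.5, 0.85]):
        print(cutoff, n, acc)
```

Raising the cutoff trades coverage for accuracy, which is why each method appears as a curve rather than a single point.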
50
Performance of genomic context compared to
high-throughput interaction data
[Figure: accuracy (fraction of data confirmed by reference set) versus coverage (fraction of reference set covered by data) for genomic context, purified complexes (TAP, HMS-PCI), yeast two-hybrid, synthetic lethality and mRNA co-expression, plus combined evidence (two methods, three methods); markers distinguish raw data, filtered data and parameter choices]
51
Genomic context: biochemistry by other means
Despite the high performance of genomic context
methods, as a tool for function prediction it is
not a push-button method; it is more like
biochemistry by other means. Often quite a lot
of manual input and expert knowledge from the
researcher is needed to distill associations into
a concrete function prediction. Small-scale
bioinformatics?
52
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Fusion
  • Gene order
  • Co-occurrence across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

53
STRING allows a network view,
i.e. you see not only which genes the query gene
is associated with, but also how these other
genes are related to each other
54
STRING
Network output (depth 1)
Assigning
uncharacterized archaeal proteins
to a network around
Archaeal flagellins
Archaeal flagellin biosynth. ATPase
55
STRING
Type IV secretion pathway
Network (depth 2)
Connecting associated cellular processes
Archaeal flagellins
Archaeal flagella components
Chemotaxis-related
56
STRING
Network (depth 3)
Zooming out to other cellular processes
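A minimal sketch of the network view at increasing depth, assuming "depth" means the number of association steps away from the query protein (STRING's exact definition and implementation may differ). A breadth-first search collects the proteins, and the network view then shows all associations among them, not only those touching the query. Protein names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected: protein -> directly associated proteins


def build_graph(links: Iterable[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighborhood(graph: Graph, query: str, depth: int) -> Set[str]:
    """Proteins at most `depth` association steps from the query (breadth-first)."""
    dist = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand beyond the requested depth
        for nb in graph.get(node, set()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def links_among(graph: Graph, nodes: Set[str]) -> Set[Tuple[str, str]]:
    """All associations among the selected proteins, each reported once as (a, b) with a < b."""
    return {(a, b) for a in nodes for b in graph.get(a, set())
            if b in nodes and a < b}


if __name__ == "__main__":
    g = build_graph([("Q", "A"), ("Q", "B"), ("A", "B"), ("B", "C"), ("C", "D")])
    nodes = neighborhood(g, "Q", depth=1)
    print(sorted(nodes), sorted(links_among(g, nodes)))
```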
57
Using the local network to detect
multi-functional proteins
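The slide does not spell out how the local network reveals multi-functional proteins. One plausible heuristic, offered here as an assumption: split a protein's direct partners into groups that are linked to each other without going through the protein itself; two or more sizeable groups suggest the protein participates in separate processes. The function names and `min_module_size` threshold are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected: protein -> directly associated proteins


def build_graph(links: Iterable[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbor_modules(graph: Graph, protein: str) -> List[Set[str]]:
    """Connected groups among a protein's partners, ignoring the protein itself."""
    unvisited = set(graph.get(protein, set())) - {protein}
    modules = []
    while unvisited:
        start = unvisited.pop()
        module = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in graph.get(node, set()):
                if nb in unvisited:  # only walk between the protein's own partners
                    unvisited.remove(nb)
                    module.add(nb)
                    stack.append(nb)
        modules.append(module)
    return modules


def is_multifunctional_candidate(graph: Graph, protein: str,
                                 min_module_size: int = 2) -> bool:
    big = [m for m in neighbor_modules(graph, protein) if len(m) >= min_module_size]
    return len(big) >= 2


if __name__ == "__main__":
    g = build_graph([("P", "A1"), ("P", "A2"), ("A1", "A2"),
                     ("P", "B1"), ("P", "B2"), ("B1", "B2")])
    print(is_multifunctional_candidate(g, "P"))  # True
```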
58
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
  • General
  • Fusion
  • Gene order
  • Co-occurrence across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

59
  • In addition, STRING currently includes:
  • Functional association data from large-scale /
    high-throughput biochemical experiments
    (functional genomics data):
  • protein complex purification
  • yeast two-hybrid
  • ChIP-on-chip
  • microarray gene expression
  • Known functional relations, so-called legacy
    data, as present in PubMed abstracts and
    databases like MIPS or KEGG.

60
(No Transcript)