Prediction of protein function and pathways in the genome era - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Prediction of protein function and pathways in
the genome era
Toni Gabaldón Estevan
Computational Genomics, CMBI-NCMLS, U. Nijmegen (The Netherlands)
T.Gabaldon_at_cmbi.kun.nl
www.cmbi.kun.nl/jagabald
2
Prediction of protein function and pathways in
the genome era
1- Analysis of non-coding regions in DNA
2- Coding regions: introduction to genomic context function-prediction techniques
3- Case studies: mitochondrial proteome
Availability: www.cmbi.kun.nl/jagabald/summercourse.html
3
1995: Haemophilus influenzae, 1.8 Mb
4
(No Transcript)
5
Gene prediction
ATGATGCAGTTCTCGAAAATGCATGGCCTTGGCAACGATTTTATGGTCGT
CGACGCGGTAACGCAGAATGTCTTTTTTTCACCGGAGCTGATTCGTCGCC
TGGCTGATCGGCACCTGGGGGTAGGGTTTGACCAACTGCTGGTGGTTGAG
CCGCCGTATGATCCTGAACTGGATTTTCACTATCGCATTTTCAATGCTGA
TGGCAGTGAAGTGGCGCAGTGCGGCAACGGTGCGCGCTGCTTTGCCCGTT
TTGTGCGTCTGAAAGGACTGACCAATAAGCGTGATATCCGCGTCAGCACC
GCCAACGGGCGGATGGTTCTGACCGTCACCGATGATGATCTGGTCCGCGT
AAATATGGGCGAACCCAACTTCGAACCTTCCGCCGTGCCGTTTCGCGCTA
ACAAAGCGGAAAAGACCTATATTATGCGCGCCGCCGAGCAGACAATCTTA
TGCGGCGTGGTGTCGATGGGAAATCCGCATTGCGTGATTCAGGTCGATGA
TGTCGATACCGCGGCGGTAGAAACGCTTGGTCCTGTTCTGGAAAGCCACG
AGCGTTTTCCGGAGCGCGCCAATATCGGTTTTATGCAAGTGGTTAAGCGC
GAGCATATTCGTTTACGCGTTTATGAGCGTGGGGCAGGAGAAACCCAGGC
CTGCGGCAGCGGCGCGTGTGCGGCGGTTGCAGTAGGGATTCAGCAAGGTT
TGCTGGCCGAAGAAGTACGCGTGGAACTCCCCGGCGGTCGTCTTGATATC
GCCTGGAAAGGTCCGGGTCACCCGTTATATATGACTGGCCCGGCGGTACA
TGTCTACGACGGATTTATTCATCTATGA
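The simplest form of gene prediction on a sequence like the one above is to scan for open reading frames. The sketch below is only an illustration and is not the method used in the talk: it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, looks at the forward strand only, and the function name `find_orfs` and its `min_codons` parameter are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs (end exclusive).

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon; min_codons counts both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return orfs

print(find_orfs("CCATGAAATGACC"))
```

Real gene finders also use codon usage, the reverse strand and signals such as ribosome binding sites; none of that is modelled here.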
6
Non-coding regions
Alignment of mouse-man genomes:
- 5% under positive selection (1.5% coding, 3.5% non-coding)
- More subtle signals → evolutionarily conserved (comparative genomics)
7
Detecting subtle signals in DNA sequences
Typical Structure of a Eukaryotic Gene
8
Control of Transcription Initiation
9
Representing motifs: Sequence Logo
Height is the information content per position.
Height of the individual nucleotides is
determined by their frequency, the most frequent
on top
Information content: $I = 2 + \sum_i p_i \log_2 p_i$ (in bits, with $p_i$ the frequency of nucleotide $i$ at that position)
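As a small worked example of the logo height, the sketch below computes the information content per position for a set of aligned DNA sites. It assumes a four-letter alphabet (hence the maximum of 2 bits) and applies no small-sample correction; the function names are illustrative only.

```python
import math
from collections import Counter

def column_information(column):
    """Information content in bits of one aligned DNA column: 2 + sum(p * log2 p)."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    return 2.0 + sum((n / total) * math.log2(n / total) for n in counts.values())

def motif_information(sites):
    """Information content for every position of equally long aligned sites."""
    return [column_information("".join(col)) for col in zip(*sites)]

# Three aligned sites: position 1 is fully conserved, position 2 is not.
print(motif_information(["AC", "AG", "AT"]))
```

A fully conserved position gives 2 bits and a position with all four bases at equal frequency gives 0 bits.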
10
Genes regulated by the same factor → find the motif for the binding site
[Figure: sequences with positions a1, a2, a3, a4, ..., ak]
11
Gibbs Sampling
  • Goal: find the best positions ak that maximize the difference
    between the motif and the background base distribution.
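The goal stated above can be sketched as a toy Gibbs sampler. This is a minimal illustration under several assumptions, not the implementation behind the slide: one site per sequence, a fixed motif width, sequences containing only A, C, G and T, at least two sequences, and pseudocounts of 1. The function name and parameters are invented for this example.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Toy Gibbs sampler: returns one motif start position a_k per sequence."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Background base frequencies from all sequences (pseudocount 1).
    joined = "".join(seqs)
    background = {b: (joined.count(b) + 1) / (len(joined) + 4) for b in alphabet}
    # Start from random positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for k, seq in enumerate(seqs):
            # Build the motif profile from every sequence except k.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, positions)) if j != k]
            profile = [{b: (col.count(b) + 1) / (len(others) + 4) for b in alphabet}
                       for col in zip(*others)]
            # Score every candidate start by its motif/background likelihood ratio.
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for offset, base in enumerate(seq[start:start + width]):
                    ratio *= profile[offset][base] / background[base]
                weights.append(ratio)
            # Sample the new a_k in proportion to that ratio.
            positions[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

seqs = ["ACGTTGACA", "TTGACAGG", "CCTTGACA", "GTTGACAT"]
print(gibbs_motif_search(seqs, width=6))
```

Because positions are sampled rather than chosen greedily, different seeds can give different answers; in practice the sampler is run several times and the highest-scoring alignment is kept.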

12
  • E. coli, the most intensively studied organism
  • only 1924 genes (43%) have been (partially)
    experimentally characterized.

13
Classic method: function prediction by homology
14
Classic method: function prediction by homology
15
  • No homolog (orphans)
  • Homologs of unknown function.

60% poorly annotated
16
What is protein function? Fuzzy term
Homology
17
A genome is more than the sum of its genes
18
Turning data into knowledge
Comparative genomics
biology
19
Genomic context
Homology
20
(Some) types of genomic context
  • Gene fusion/fission
  • Chromosomal location
  • Co-evolution
  • Co-expression

21
Gene Fusion (fission)
22
Gene fusion/fission
3 genomes → 88 gene fusions
30 genomes → 10,075 gene fusions
Example: trpA and trpB in E. coli encode tryptophan synthase subunits A and B, which are fused in yeast.
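One way to picture fusion detection: two separate genes are linked when both match different, non-overlapping parts of one composite protein in another genome. The sketch below assumes the homology search has already been done and its hits are given as coordinates on the composite protein. All names and coordinates are hypothetical placeholders, not real alignment results.

```python
from itertools import combinations

def fusion_links(hits):
    """Link genes that hit non-overlapping regions of the same composite protein.

    hits: {composite_protein: [(gene, start, end), ...]} in composite coordinates.
    Returns a set of (gene_a, gene_b, composite_protein) with genes sorted.
    """
    links = set()
    for composite, regions in hits.items():
        for (g1, s1, e1), (g2, s2, e2) in combinations(regions, 2):
            if g1 != g2 and (e1 < s2 or e2 < s1):
                a, b = sorted((g1, g2))
                links.add((a, b, composite))
    return links

# Hypothetical coordinates, for illustration only.
hits = {
    "yeast_fused_protein": [("trpA", 1, 250), ("trpB", 300, 700)],
    "other_protein": [("geneX", 1, 100), ("geneY", 50, 150)],  # overlapping hits
}
print(fusion_links(hits))
```

Requiring non-overlapping regions avoids linking two genes that are simply homologous to each other.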
23
(No Transcript)
24
GENE ORDER
Genomes are shuffled in the course of
evolution. But...
25
Gene order/neighborhood. Extreme case: bacterial
operons.
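Conserved gene neighborhood can be illustrated by counting gene pairs that are adjacent in several genomes. The sketch below assumes each genome is given as an ordered list of orthologous-group labels, and it ignores strand and chromosome circularity; the function name, the `min_genomes` parameter and the example data are hypothetical.

```python
from collections import Counter

def conserved_neighbors(genomes, min_genomes=2):
    """Return unordered gene pairs that are adjacent in at least min_genomes genomes.

    genomes: {genome_name: [orthologous group labels in chromosomal order]}
    """
    counts = Counter()
    for order in genomes.values():
        # Use a set so a pair is counted at most once per genome.
        pairs = {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}
        counts.update(pairs)
    return {pair for pair, n in counts.items() if n >= min_genomes}

genomes = {
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["C", "B", "A", "E"],
    "genome3": ["D", "A", "C", "B"],
}
print(conserved_neighbors(genomes))
```

Since genomes are shuffled during evolution, pairs that stay adjacent across distant genomes are candidates for functional association.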
26
(No Transcript)
27
Gene content → co-evolution (the easy case: few genomes)
Differences in gene content reflect
differences in phenotypic potential.
Genomes share genes for the phenotypes they have in
common.
28
L. innocua (non-pathogen)
L. monocytogenes (pathogen)
29
Genes involved in pathogenicity
L. innocua (non-pathogenic)
L. monocytogenes (pathogenic)
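With only a few genomes, the comparison reduces to set operations on gene content: genes shared by the genomes that have a phenotype and absent from those that lack it are candidates for that phenotype. The sketch below uses placeholder gene labels, not real Listeria genes, and the function name is illustrative.

```python
def differential_genes(with_trait, without_trait):
    """Genes present in every genome with the trait and in none without it.

    with_trait: non-empty list of gene sets; without_trait: list of gene sets.
    """
    shared = set.intersection(*with_trait)
    present_elsewhere = set().union(*without_trait)
    return shared - present_elsewhere

# Placeholder gene content, for illustration only.
pathogens = [{"g1", "g2", "g3"}, {"g1", "g3", "g4"}]
non_pathogens = [{"g1", "g5"}]
print(differential_genes(pathogens, non_pathogens))
```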
30
Generalization: phylogenetic profiles
         species 1  species 2  species 3  species 4  species 5  ...
Gene 1
Gene 2
Gene 3
...
31
Generalization: phylogenetic profiles
         species 1  species 2  species 3  species 4  species 5  ...
Gene 1       1          0          1          1          0       1
Gene 2       1          1          0          0          1       0
Gene 3       0          1          0          0          1       0
...
32
Generalization: phylogenetic profiles
Genes with similar phylogenetic profiles tend to
be involved in the same biological process.
[Figure: genes A, B and C]
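Profile similarity can be measured in many ways; the sketch below uses the Hamming distance (the number of genomes in which two presence/absence profiles differ) on the example values from the profile slide. The choice of distance and the `max_distance` cutoff are assumptions made for this illustration.

```python
from itertools import combinations

# Presence (1) / absence (0) per genome, taken from the example profile table.
profiles = {
    "Gene 1": (1, 0, 1, 1, 0, 1),
    "Gene 2": (1, 1, 0, 0, 1, 0),
    "Gene 3": (0, 1, 0, 0, 1, 0),
}

def hamming(a, b):
    """Number of genomes in which the two profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def most_similar_pairs(profiles, max_distance=1):
    """Gene pairs whose profiles differ in at most max_distance genomes."""
    return [(g1, g2) for g1, g2 in combinations(sorted(profiles), 2)
            if hamming(profiles[g1], profiles[g2]) <= max_distance]

print(most_similar_pairs(profiles))
```

Here Gene 2 and Gene 3 differ in a single genome, so they would be predicted to act in the same process.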
33
Generalization: phylogenetic profiles
Genes with complementary phylogenetic profiles
tend to have a similar biochemical function.
[Figure: genes A and B]
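Complementary profiles can be checked in the same presence/absence representation. The sketch below uses a strict definition chosen for this example: every genome has at least one of the two genes, and at most `max_overlap` genomes have both. Looser, correlation-based definitions are equally possible.

```python
def is_complementary(a, b, max_overlap=0):
    """True if each genome has at least one of the two genes and few have both."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)
    return neither == 0 and both <= max_overlap

gene_a = (1, 0, 1, 1, 0, 1)
gene_b = (0, 1, 0, 0, 1, 0)
print(is_complementary(gene_a, gene_b))
```

Such a pattern suggests that one gene has displaced the other while doing the same job.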
34
Co-evolution, correlation of mutations, physical
interaction
Receptor-ligand Complexes ..
35
(No Transcript)
36
Predicting gene function by conserved
co-expression after gene duplication or
speciation
  • Co-expression in one species too weak
  • Use evolutionary conservation to improve function
    prediction?
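The idea of using conservation to strengthen a weak co-expression signal can be sketched as follows: keep a gene pair only if it is co-expressed in one species and its orthologs are also co-expressed in a second species. Pearson correlation, the 0.6 threshold, the one-to-one ortholog mapping and all data below are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def conserved_coexpression(expr_a, expr_b, orthologs, threshold=0.6):
    """Pairs from species A co-expressed in A whose orthologs are co-expressed in B."""
    genes = sorted(g for g in expr_a if g in orthologs and orthologs[g] in expr_b)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r_a = pearson(expr_a[g1], expr_a[g2])
            r_b = pearson(expr_b[orthologs[g1]], expr_b[orthologs[g2]])
            if r_a >= threshold and r_b >= threshold:
                pairs.append((g1, g2))
    return pairs

# Made-up expression values across a few conditions per species.
expr_a = {"a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [1, 2, 3, 5]}
expr_b = {"b1": [1, 2, 3], "b2": [2, 3, 4], "b3": [3, 2, 1]}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
print(conserved_coexpression(expr_a, expr_b, orthologs))
```

In this toy data a1 and a3 look co-expressed in species A, but the link is dropped because their orthologs are anti-correlated in species B.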

37
Benchmarking high-throughput interaction data
[Figure: log-log plot. Vertical axis: Coverage, fraction of reference set covered by data (log). Horizontal axis: Accuracy, fraction of data confirmed by reference set (log).]
Snel B. et al. Nat. Gen. (2003)
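The two axes of this benchmark follow directly from their definitions: coverage is the fraction of the reference set recovered by the data, and accuracy is the fraction of the data confirmed by the reference set. A minimal sketch, treating interactions as unordered pairs and using made-up pairs:

```python
def coverage_and_accuracy(predicted, reference):
    """Return (coverage, accuracy) for predicted vs. reference interaction pairs."""
    predicted = {frozenset(p) for p in predicted}
    reference = {frozenset(p) for p in reference}
    confirmed = predicted & reference
    coverage = len(confirmed) / len(reference)  # fraction of reference set covered
    accuracy = len(confirmed) / len(predicted)  # fraction of data confirmed
    return coverage, accuracy

# Made-up interaction pairs, for illustration only.
predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "D"), ("E", "F"), ("G", "H"), ("A", "C")]
print(coverage_and_accuracy(predicted, reference))
```

Note that "accuracy" defined this way is a lower bound, since a reference set is never complete.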
38
http://string.embl.de
39
3- Case studies: mitochondrial proteome
40
Calcium signaling
Coenzyme synthesis
Citric acid cycle
Urea cycle
Heme synthesis
Electrical signaling
Apoptosis
ATP production
Fatty acid oxidation
Heat generation
41
Mitochondria originated from the endosymbiosis of
an alpha-proteobacterium
42
Our method
Identify eukaryotic proteins with an
alpha-proteobacterial origin based on their
phylogeny.
Eukaryotes
Common origin: endosymbiosis
Alpha-proteobacteria
43
Reconstruction of an ancestral metabolism
Gabaldón T. and Huynen M. Science (2003)
44
Eukaryotes underwent extensive lineage-specific
loss of genes from the proto-mitochondrion-derived set
45
We used this property to predict biological
interactions among our set:
identifying proteins with a similar
evolutionary history
46
Proteins that have a similar evolution tend to
function in the same biological process
[Figure: fraction of proteins functioning in the same biochemical pathway, with the average marked]
47
  • Complex I deficiency.
  • Inherited
  • Severe (patients < 5 years old)
  • No mutation in the 46 known Complex I genes.
  • ???

48
(No Transcript)
49
(No Transcript)
50
Recommended
  • Read: Prediction of protein function and
    pathways in the genome era. Gabaldón T., Huynen M.A.
    (2004) Cell. Mol. Life Sci. 2004
    Apr;61(7-8):930-44
  • www.cmbi.ru.nl/jagabald/summercourse.html
  • Try these methods with your favourite protein
  • http://string.embl.de/