Title: Detecting distant homologies among protein sequences by comparison of profile hidden Markov models
1. Detecting distant homologies among protein sequences by comparison of profile hidden Markov models
- Martin Madera
- Department of Computer Science
- University of Bristol
2. What's a protein? (1/2)
- linear polymer of amino acids
- protein sequence: a string over an alphabet of 20 amino acids
- stored in the DNA sequence of its gene
3. What's a protein? (2/2)
[Figure panels: all atoms, just backbone, cartoon]
- proteins fold up into complex 3D structures
- structure is functionally important and conserved during evolution, whereas sequences diverge rapidly
- but solving a structure experimentally is hard, while DNA sequencing is easy
4. The problem
- Find whether proteins are related (homologous) from their sequences alone
- Why?
- Homologs have similar structures and perform similar functions
5. A simple approach
?
MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQES
LEQPKALAM TVLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVG
QELLGAIKEVLGDAATDDIL DAWGKAYGVIADVLYAQAVE
PIVDSGSVSPLSDAEKNKIRAAWDLVYKDYEKTGVDILVKFFTGTPAAQA
FFPKFKGLT TADDLKQSSDVRWHAERIINAVNDAVKSMDDTEKMSMKLK
ELSIKHAQSFYVDRQYF AGIIA
- take two sequences
- calculate the least number of mutations required to turn one into the other
- compare to the numbers for random sequences
- estimate the level of significance
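The four steps above can be sketched in Python. This is a minimal illustration, not the method used in the talk: it assumes that "mutations" means unit-cost substitutions, insertions and deletions (Levenshtein distance), and that "random sequences" are shuffled copies of the second sequence. The function names `edit_distance` and `empirical_p_value` are hypothetical.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Fewest substitutions, insertions and deletions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def empirical_p_value(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled copies of b that are at least as close to a as b is."""
    rng = random.Random(seed)
    observed = edit_distance(a, b)
    hits = 0
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)  # keeps composition and length, destroys order
        if edit_distance(a, "".join(letters)) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

A small value from `empirical_p_value` suggests the two sequences are closer than chance would explain. Shuffling is only one possible null model, and the cost of each mutation type is an assumption here.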
6. Profiles of protein sequences
- ... model a family of related proteins
[Figure: multiple sequence alignment]
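One simple way to turn a multiple sequence alignment into a profile is to record per-column amino-acid frequencies. The sketch below assumes equal-length aligned rows with `-` for gaps and a flat pseudocount; the function name `column_profile` and the toy alignment are hypothetical and not taken from the talk.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Per-column amino-acid probabilities from aligned sequences ('-' is a gap)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count residues in this column, skipping gaps.
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


toy_alignment = ["MKV-A", "MRVLA", "MKI-A"]  # made-up data for illustration
profile = column_profile(toy_alignment)
```

Each entry of `profile` is a probability distribution over the 20 amino acids for one alignment column. The pseudocount keeps unseen residues from getting probability zero.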
7. Profile hidden Markov models (HMMs)
[Figure labels: deletes, inserts, matches]
- stochastic machines that emit sequences
- each state: emission probabilities; transition probabilities among states
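To make "stochastic machines that emit sequences" concrete, here is a toy sampler. It is a deliberately simplified sketch: real profile HMMs have position-specific and state-specific transition probabilities, whereas this version assumes a single `p_insert` and `p_delete` shared by all positions. All names (`draw`, `sample_sequence`) are hypothetical.

```python
import random


def draw(dist, rng):
    """Draw one residue from a dict mapping residue -> probability."""
    residues = list(dist)
    return rng.choices(residues, weights=[dist[r] for r in residues], k=1)[0]


def sample_sequence(match_emissions, insert_emission, p_insert, p_delete, rng):
    """Emit one sequence from a toy profile HMM.

    match_emissions: one emission dict per model position
    insert_emission: emission dict shared by all insert states (assumption)
    p_insert, p_delete: position-independent transition probabilities (assumption)
    """
    out = []
    pos = 0
    while pos < len(match_emissions):
        u = rng.random()
        if u < p_insert:
            out.append(draw(insert_emission, rng))       # insert: emit, stay put
        elif u < p_insert + p_delete:
            pos += 1                                     # delete: silent, advance
        else:
            out.append(draw(match_emissions[pos], rng))  # match: emit, advance
            pos += 1
    return "".join(out)
```

With `p_insert = p_delete = 0` the sampler emits exactly one residue per model position; raising either probability produces the length variation seen among family members.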
8. The traditional paradigm
?
MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQES
LEQPKALAM TVLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVG
QELLGAIKEVLGDAATDDIL DAWGKAYGVIADVLYAQAVE
unknown sequence s
HMM of a known family F
- calculate emission probability P(s | HMM),
- compare to null probability P(s | NULL),
- calibrate, estimate probability that s belongs to F
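The first two steps can be sketched as a forward algorithm followed by a log-odds ratio. This is an illustration under strong assumptions, not the scoring used by any particular tool: transition probabilities `p_insert` and `p_delete` are shared by all positions, the null model is an independent background distribution, and calibration is not shown. The names `forward_probability` and `log_odds` are hypothetical.

```python
import math


def forward_probability(seq, match_emissions, insert_emission, p_insert, p_delete):
    """P(seq | toy profile HMM), summed over all state paths."""
    k, n = len(match_emissions), len(seq)
    p_match = 1.0 - p_insert - p_delete
    # f[pos][i]: probability of having passed pos model positions and emitted seq[:i]
    f = [[0.0] * (n + 1) for _ in range(k + 1)]
    f[0][0] = 1.0
    for pos in range(k):
        for i in range(n + 1):
            here = f[pos][i]
            if here == 0.0:
                continue
            f[pos + 1][i] += here * p_delete  # delete: silent, advance
            if i < n:
                res = seq[i]
                # insert: emit, stay at the same model position
                f[pos][i + 1] += here * p_insert * insert_emission.get(res, 0.0)
                # match: emit, advance
                f[pos + 1][i + 1] += here * p_match * match_emissions[pos].get(res, 0.0)
    return f[k][n]


def log_odds(seq, match_emissions, insert_emission, p_insert, p_delete, background):
    """log P(seq | HMM) - log P(seq | NULL), with an independent-residue null."""
    p_model = forward_probability(seq, match_emissions, insert_emission, p_insert, p_delete)
    p_null = math.prod(background[r] for r in seq)
    if p_model == 0.0:
        return float("-inf")
    return math.log(p_model) - math.log(p_null)
```

A positive score means the model explains the sequence better than the background does. Plain probabilities underflow for realistic sequence lengths, so a practical version would work in log space.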
9. An obvious generalization?
?
collect similar unknown sequences, build HMM of
an unknown family A
HMM of a known family F
PRofile Comparer (PRC, http://supfam.org/PRC) calculates
10. Fold recognition results
11. Acknowledgements
Julian Gough, University of Bristol
Kevin Karplus, University of California, Santa Cruz