Hidden Markov Models for Multiple Sequence Analysis

1
Hidden Markov Models for Multiple Sequence
Analysis
  • Jeff Phillips
  • 10.14.01

2
Resources
  • Baldi, Pierre and Brunak, Soren. Bioinformatics:
    The Machine Learning Approach. 2001.
  • Clote, Peter and Backofen, Rolf. Computational
    Molecular Biology. 2000.
  • Shamir, Ron. Lecture in Algorithms for Molecular
    Biology.
    http://www.math.tau.ac.il/rshamir/algmb/scribe/html/lec06/

3
Outline
  • Description of General Technique
  • Application to DNA/RNA
  • Application to Proteins
  • Conclusions

4
Algorithm Overview
  • Used on a family of sequences.
  • Build a profile HMM from the family (Baum-Welch
    Algorithm).
  • From the profile HMM, use the Viterbi Algorithm to fit
    a new sequence.
  • The result aligns all sequences.
  • The score determines membership in the family.
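
The Viterbi step can be illustrated on a small model. The following is a minimal sketch, assuming a plain HMM in which every state emits a symbol (a real profile HMM also has silent delete states, which this sketch leaves out). The two-state model, its state names, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most likely path for obs."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: an AT-rich state and a GC-rich state.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
log_prob, path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
print(path, -log_prob)
```

The negative of the returned log-probability is the kind of score that can later be compared across sequences.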

5
Algorithm Overview
  • Family of sequences → Baum-Welch Algorithm → Profile
    HMM → Viterbi Algorithm
  • Many-to-one, not many-to-many

6
Algorithm Overview
  • What is a Profile HMM?

  • Squares: match states (correct match)
  • Diamonds: insert states (need to insert)
  • Circles: delete states (need to remove item)

7
Algorithm Overview
  • Align GGCT, ACCGAT, CT
  • GGCT: m0, m1, m2, m3, m4, m5
  • ACCGAT: m0, i0, m1, d2, m3, i3, i3, m4, m5
  • CT: m0, m1, d2, d3, m4, m5
  • . G G C . . T
  • a C - C g a T
  • . C - - . . T
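
The step from state paths to alignment rows can be sketched in code. This is a minimal sketch, assuming that m0 and the final match state are silent begin and end states, that match and insert states each consume one residue, and that delete states consume none. The function name `align_from_paths` is hypothetical.

```python
def align_from_paths(seqs, paths, n_match):
    """Turn per-sequence state paths into aligned rows.

    Assumes m0 and m{n_match + 1} are silent begin/end states.
    """
    parsed = []
    for seq, path in zip(seqs, paths):
        residues = iter(seq)
        match = {}    # match position -> uppercase residue, or "-" for a delete
        inserts = {}  # insert position -> list of lowercase residues
        for state in path:
            kind, pos = state[0], int(state[1:])
            if kind == "m" and 1 <= pos <= n_match:
                match[pos] = next(residues).upper()
            elif kind == "d":
                match[pos] = "-"
            elif kind == "i":
                inserts.setdefault(pos, []).append(next(residues).lower())
        parsed.append((match, inserts))

    # Each insert region is as wide as the longest insert seen there.
    width = {
        k: max(len(ins.get(k, [])) for _, ins in parsed)
        for k in range(n_match + 1)
    }

    rows = []
    for match, inserts in parsed:
        cols = []
        for k in range(n_match + 1):
            if k >= 1:
                cols.append(match.get(k, "-"))
            ins = inserts.get(k, [])
            cols.extend(ins + ["."] * (width[k] - len(ins)))  # pad with dots
        rows.append(" ".join(cols))
    return rows


seqs = ["GGCT", "ACCGAT", "CT"]
paths = [
    ["m0", "m1", "m2", "m3", "m4", "m5"],
    ["m0", "i0", "m1", "d2", "m3", "i3", "i3", "m4", "m5"],
    ["m0", "m1", "d2", "d3", "m4", "m5"],
]
rows = align_from_paths(seqs, paths, n_match=4)
for row in rows:
    print(row)
```

Every row has seven columns: one insert column before m1, four match columns, and two insert columns after m3.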

8
RNA/DNA Analysis
  • Long sequences (100-200 nucleotides)
  • Only four symbols: A, G, C, T
  • → Hard to create large profile HMMs
  • Free model (align long sequences)
  • Tied model (take advantage of periodicity)
  • Wheel model (take advantage of periodicity)

9
Symbology
  • AT -- A or T
  • ¬G -- not G (A, C, or T)
  • C -- C
  • GTATGACGC

10
Periodicity
  • Note periodicity.
  • AG in phase
  • CT in anti-phase
  • AGCT about every 10 base pairs.
  • A period of 10 indicates structural patterns.
  • A period of 9 would indicate triplet reading frame
    patterns.
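
One simple way to look for a period is to measure how often a base class recurs at a given lag. The sketch below is a generic statistic, not the method used in the presentation; the function name `agreement_at_lag` and the toy sequence are hypothetical.

```python
def agreement_at_lag(seq, symbols, lag):
    """Fraction of positions i where seq[i] and seq[i + lag] are both
    inside, or both outside, the given symbol set."""
    member = [base in symbols for base in seq]
    pairs = len(seq) - lag
    if pairs <= 0:
        raise ValueError("lag must be shorter than the sequence")
    agree = sum(member[i] == member[i + lag] for i in range(pairs))
    return agree / pairs


# Hypothetical toy sequence with an exact period of 10.
toy = "AGAGACTCTC" * 5
scores = {lag: agreement_at_lag(toy, "AG", lag) for lag in range(2, 13)}
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

On real sequences the signal is much weaker, so the agreement values would need to be compared against a background expectation.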

11
Tied Model
  • Diagram labels: Start, setup, repeated HMM (states di,
    mi, ii), tail, End

12
Tied Model
  • GCACATCGCATGCTATCGC
  • Periodic in 10 base pairs, so find pattern every
    period.
  • Tie sections of HMM together
  • The thicker the line, the higher the probability of
    the transition.
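
Tying means that corresponding states in each repeated unit share one set of parameters, so every repeat contributes data to the same estimate. A minimal sketch, assuming a fixed period with no inserts or deletes, so that sequence position i maps to state i mod period. Real training would pool Baum-Welch expected counts instead of raw counts. The function name and the pseudocount are hypothetical.

```python
from collections import Counter


def tied_emissions(seq, period, alphabet="ACGT", pseudocount=1.0):
    """Estimate one emission distribution per position within the period,
    pooling counts from every repeat of the unit."""
    counts = [Counter() for _ in range(period)]
    for i, base in enumerate(seq):
        counts[i % period][base] += 1  # all repeats feed the same tied state

    dists = []
    for c in counts:
        total = sum(c[b] for b in alphabet) + pseudocount * len(alphabet)
        dists.append({b: (c[b] + pseudocount) / total for b in alphabet})
    return dists


dists = tied_emissions("ACGT" * 3, period=4)
print(dists[0])
```

With three repeats, each tied state is estimated from three observations instead of one, which is the point of tying when data is scarce.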

13
Wheel Model
  • TATC in 8,9,10
  • Outside arrows represent start
  • Loop and skip arrows work like inserts and
    deletes.
  • Thicker arrows mean higher probability.
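
The wheel structure can be written down as a cyclic transition matrix. This is a minimal sketch, assuming each state has one loop arrow to itself, one main arrow to the next state, and one skip arrow that jumps over exactly one state. The probabilities `p_loop` and `p_skip` are hypothetical placeholders, not trained values.

```python
def wheel_transitions(n_states, p_loop=0.05, p_skip=0.05):
    """Row-stochastic transition matrix for a wheel of n_states states."""
    p_next = 1.0 - p_loop - p_skip
    trans = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        trans[s][s] += p_loop                    # loop arrow, acts like an insert
        trans[s][(s + 1) % n_states] += p_next   # main arrow around the wheel
        trans[s][(s + 2) % n_states] += p_skip   # skip arrow, acts like a delete
    return trans


wheel = wheel_transitions(10)
print(wheel[9])  # the last state wraps around to states 0 and 1
```

The modulo arithmetic closes the wheel, so a long sequence can travel around it as many times as needed.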

14
Wheel Model
  • Emissions of a 9-state wheel with no skip or loop
    arrows.
  • 3-period reading frame pattern visible.
  • AGATCG on 123, 456, 789
  • A 10-state wheel accounts for these patterns with
    skip and loop arrows.

15
Protein Alignment
  • Successfully applied to globins, immunoglobulins,
    kinases, and G-protein-coupled receptors.
  • Multiple alignment.
  • Family classification.
  • The profile HMM is long, so we need a lot of
    data (large families).

16
Multiple Sequence Alignment
  • Use Free model. We usually have enough data.
  • Apply the tied model where the structure repeats.
  • Sometimes stronger patterns will emerge.

17
Sugar Transport Proteins
- H. Mamitsuka 96

18
Family Classification
  • Score a new sequence against the profile HMM.
  • Scores are the negative log-likelihoods of the
    complete Viterbi paths.
  • Compare the score to the scores of known sequences in
    the family.
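
The scoring and comparison can be sketched as follows. This is a minimal sketch, assuming the Viterbi path is already known and every state on it emits a symbol. The z-score comparison is one hypothetical way to compare a score with the family's scores; the slides do not say which comparison was used.

```python
import math
from statistics import mean, stdev


def path_nll(obs, path, start_p, trans_p, emit_p):
    """Negative log-likelihood of emitting obs along a given state path."""
    logp = math.log(start_p[path[0]]) + math.log(emit_p[path[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(trans_p[path[t - 1]][path[t]])
        logp += math.log(emit_p[path[t]][obs[t]])
    return -logp


def z_score(score, family_scores):
    """How many standard deviations a score lies above the family mean."""
    return (score - mean(family_scores)) / stdev(family_scores)


# Hypothetical one-state model and hypothetical family scores.
nll = path_nll(
    "AC",
    ["s", "s"],
    start_p={"s": 1.0},
    trans_p={"s": {"s": 1.0}},
    emit_p={"s": {"A": 0.5, "C": 0.25}},
)
z = z_score(12.0, [9.0, 10.0, 11.0])
print(nll, z)
```

A lower negative log-likelihood means a better fit, so a new sequence whose score lies far above the family's scores is unlikely to be a member. Scores also grow with sequence length, so in practice sequences of similar length should be compared.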

19
Family Classification

20
Available Software
  • HMMpro: www.netid.com
  • HMMER: www.genetics.wustl.edu/eddy
  • SAM: www.cse.ucsc.edu/research/compbio/sam.html

21
Conclusion
Advantages
  • Turns a many-to-many problem into a many-to-one
    problem.
  • One profile HMM can both multiply align and classify.
  • Seems to notice some structural patterns.
  • Available software.

22
Conclusion
Disadvantages
  • Requires a large amount of base data.
  • Limited to first-order properties.