Introduction to Sequence Alignment - PowerPoint PPT Presentation

Provided by: sch17

Transcript and Presenter's Notes

1
Introduction to Sequence Alignment
3
Why Align Sequences?
  • Find homology within the same species
  • Find clues to gene function
  • Practical issues in experiments
  • Find homology in other species
  • Gather info for an evolutionary model
  • Gene families

4
The Most Visual Way of Aligning Two Sequences
5
Dot Matrix Alignment
Sequences compared: CACTAGGC and AGCTAGGA
Gibbs & McIntyre (1970)
6
Dot Matrix Alignment
7
  • Has many variations
  • Can be used to find sequence repeats
  • Find self-complementary subsequences of RNA to
    predict secondary structure
  • Still used today
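
A minimal sketch of the dot matrix idea, using the two sequences from the slide above. The function names `dot_matrix` and `render` are illustrative, not from the presentation, and this is the plainest variation (one dot per identical pair of bases, no window or threshold filtering).

```python
def dot_matrix(s, t):
    """Return a grid where cell [i][j] is True when s[i] equals t[j]."""
    return [[a == b for b in t] for a in s]


def render(s, t):
    """Format the grid as text: t runs across the top, s down the side."""
    grid = dot_matrix(s, t)
    lines = ["  " + " ".join(t)]
    for base, row in zip(s, grid):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(base + " " + cells)
    return "\n".join(lines)


# Diagonal runs of '*' mark stretches where the two sequences agree.
print(render("CACTAGGC", "AGCTAGGA"))
```

Comparing a sequence against itself with the same function is one way to look for repeats, which show up as diagonals off the main one.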

8
Alignment using Dynamic Programming
9
An Example
  • GCGCATGGATTGAGCGA
  • TGCGCCATTGATGACCA
  • A possible alignment
  • -GCGC-ATGGATTGAGCGA
  • TGCGCCATTGAT-GACC-A

10

Alignments
  • -GCGC-ATGGATTGAGCGA
  • TGCGCCATTGAT-GACC-A
  • Three elements
  • Perfect matches
  • Mismatches
  • Gaps

11
Choosing Alignments
  • There are many possible alignments
  • For example, compare
  • -GCGC-ATGGATTGAGCGA
  • TGCGCCATTGAT-GACC-A
  • to
  • ------GCGCATGGATTGAGCGA
  • TGCGCC----ATTGATGACCA--
  • Which one is better?

12
Scoring Rule
  • Example Score
  • Score = (# matches) - (# mismatches) - (# gaps) x 2

13
Example
  • -GCGC-ATGGATTGAGCGA
  • TGCGCCATTGAT-GACC-A
  • Score = (1 x 13) + (-1 x 2) + (-2 x 4) = 3
  • ------GCGCATGGATTGAGCGA
  • TGCGCC----ATTGATGACCA--
  • Score = (1 x 5) + (-1 x 6) + (-2 x 12) = -25
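
The scoring rule can be checked column by column. The sketch below assumes the example values from the scoring slide (match +1, mismatch -1, gap -2); the name `score_alignment` is illustrative.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # values taken from the example scoring rule


def score_alignment(row_s, row_t):
    """Score two gapped rows of equal length, one column at a time."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            total += GAP        # a base aligned against a gap
        elif a == b:
            total += MATCH      # identical bases
        else:
            total += MISMATCH   # two different bases
    return total


print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3
```

Because each column contributes independently, the total is a plain sum, which is the property dynamic programming relies on.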

14
Optimal Alignment
  • The optimal alignment is the one that achieves the best
    similarity score d, so it depends on the scoring rule

15
Finding the Best Alignment Score
  • Because the score is additive, dynamic programming
    can be used to find the best score efficiently
  • Guaranteed to find the best alignment

16
Assume that an Optimal Score Exists
  • d(s,t) = optimal score for globally aligning s
    and t

17
The Idea
  • The best alignment that ends at a given pair of
    bases is the best among the best alignments of the
    sequences up to that point, plus the score for
    aligning the two additional bases.

18
Dynamic Programming
  • Consider the best alignment score of two
    sequences s, t at base/residue i+1, j+1,
    respectively

19
Dynamic Programming
  • The best alignment must be in one of three cases
  • 1. Last position is (s_{i+1}, t_{j+1})
  • 2. Last position is (-, t_{j+1})
  • 3. Last position is (s_{i+1}, -)
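
The three cases above can be written as a single recurrence. This is a reconstruction under stated assumptions, not the slide's own notation: d(i, j) is shorthand for the best score of aligning the first i bases of s with the first j bases of t, σ(a, b) is the score for pairing bases a and b, and g is the gap score. With the example scoring rule, σ is +1 for a match and -1 for a mismatch, and g = -2.

```latex
\[
d(i+1,\, j+1) = \max
\begin{cases}
d(i,\, j) + \sigma(s_{i+1},\, t_{j+1}) & \text{case 1: last position is } (s_{i+1},\, t_{j+1}) \\
d(i+1,\, j) + g & \text{case 2: last position is } (-,\, t_{j+1}) \\
d(i,\, j+1) + g & \text{case 3: last position is } (s_{i+1},\, -)
\end{cases}
\]
```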

22
Dynamic Programming
23
Dynamic Programming
  • Of course, we first need to handle the base cases
    in the recursion

24
Dynamic Programming
Matrix axes: A G C and A A A C
We fill the matrix using the recurrence rule
25
Dynamic Programming
26
Dynamic Programming
Conclusion: d(AAAC, AGC) = -1
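
A minimal sketch of filling the matrix for this example, assuming the example scoring rule (match +1, mismatch -1, gap -2) and base cases that charge one gap per base when a prefix is aligned against an empty sequence. The function name `global_alignment_matrix` is illustrative.

```python
def global_alignment_matrix(s, t, match=1, mismatch=-1, gap=-2):
    """Fill the (len(s)+1) x (len(t)+1) table of best global alignment scores."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    # Base cases: a prefix aligned against nothing is all gaps.
    for i in range(1, rows):
        d[i][0] = i * gap
    for j in range(1, cols):
        d[0][j] = j * gap
    # Recurrence: best of pairing the two bases, or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(
                d[i - 1][j - 1] + pair,  # s[i-1] aligned with t[j-1]
                d[i][j - 1] + gap,       # gap aligned with t[j-1]
                d[i - 1][j] + gap,       # s[i-1] aligned with a gap
            )
    return d


d = global_alignment_matrix("AAAC", "AGC")
for row in d:
    print(row)
print(d[-1][-1])  # -1, the bottom-right cell is d(AAAC, AGC)
```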
27
Reconstructing the Best Alignment
AAAC
AG-C
28
More than one best alignment
AAAC
A-GC
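
A sketch of how one best alignment can be read back out of the filled matrix, under the same assumed scoring rule (match +1, mismatch -1, gap -2). The names `fill` and `traceback` are illustrative. When several moves explain a cell equally well, the order in which they are tested decides which of the equally good alignments is returned; the order used here is an arbitrary choice.

```python
def fill(s, t, match=1, mismatch=-1, gap=-2):
    """Build the global alignment score table (same recurrence as the slides)."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = i * gap
    for j in range(1, len(t) + 1):
        d[0][j] = j * gap
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(d[i - 1][j - 1] + pair, d[i][j - 1] + gap, d[i - 1][j] + gap)
    return d


def traceback(s, t, match=1, mismatch=-1, gap=-2):
    """Walk from the bottom-right cell back to the origin, rebuilding one best alignment."""
    d = fill(s, t, match, mismatch, gap)
    i, j = len(s), len(t)
    row_s, row_t = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and d[i][j] == d[i - 1][j] + gap:
            row_s.append(s[i - 1]); row_t.append("-"); i -= 1       # s base against a gap
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + pair:
            row_s.append(s[i - 1]); row_t.append(t[j - 1]); i -= 1; j -= 1  # two bases paired
        else:
            row_s.append("-"); row_t.append(t[j - 1]); j -= 1       # gap against a t base
    return "".join(reversed(row_s)), "".join(reversed(row_t))


print(traceback("AAAC", "AGC"))  # ('AAAC', 'AG-C')
```

Testing the diagonal move first instead would return a different alignment with the same score of -1, which is why more than one best alignment can exist.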
29
Complexity
  • Space O(mn)
  • Time O(mn)
  • Filling the matrix O(mn)
  • Backtrace O(m + n) for a single best alignment

30
Needleman & Wunsch (1970)
  • A General Method Applicable to the Search for
    Similarities in the Amino Acid Sequence of Two
    Proteins
  • J. Mol. Biol. 48: 443-453

31
Local Alignment
  • We just introduced global alignment
  • Now introduce local alignment
  • A local alignment between sequence s and sequence
    t is an alignment with maximum similarity between
    a substring of s and a substring of t.

32
Smith and Waterman (1981)
  • Identification of Common Molecular Subsequences
  • J. Mol. Biol. 147: 195-197

33
Best-aligned Subsequences
At each cell, take the best score or start over from 0
34
  • Note different scoring rule
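
A minimal sketch of local alignment in the spirit of Smith and Waterman. The scoring rule the slides use for local alignment did not survive in this transcript, so the values below (match +1, mismatch -1, gap -2) are an assumption carried over from the global example, and the name `local_alignment` is illustrative. The only structural change from the global case is the extra 0 in the maximum, which lets an alignment start over at any cell.

```python
def local_alignment(s, t, match=1, mismatch=-1, gap=-2):
    """Return (best score, aligned piece of s, aligned piece of t)."""
    h = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(
                0,                       # start over: drop everything before this cell
                h[i - 1][j - 1] + pair,
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score falls to 0.
    i, j = best_pos
    row_s, row_t = [], []
    while h[i][j] > 0:
        pair = match if s[i - 1] == t[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + pair:
            row_s.append(s[i - 1]); row_t.append(t[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            row_s.append(s[i - 1]); row_t.append("-"); i -= 1
        else:
            row_s.append("-"); row_t.append(t[j - 1]); j -= 1
    return best, "".join(reversed(row_s)), "".join(reversed(row_t))


print(local_alignment("CACTAGGC", "AGCTAGGA"))  # (5, 'CTAGG', 'CTAGG')
```

The two sequences are the ones from the dot matrix slide; the shared stretch CTAGG is the same diagonal run that a dot matrix would show.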

41
Best aligned subsequences