Global Pairwise Sequence Alignment - PowerPoint PPT Presentation

Slides: 35
Provided by: brian277
Transcript and Presenter's Notes

2
Global Pairwise Sequence Alignment
  • Methods for Aligning and Scoring Sequences
  • Brian Chen

3
Pairwise Alignment
  • Goals
  • Distance between two different sequences
  • Align sequences between conserved regions
  • Need tools and definitions for this

4
Definitions
  • $\Sigma$ is a finite alphabet.
  • A word $a \in \Sigma^*$ is a sequence $a_1 a_2 \ldots a_n$
  • The length of word a is $|a|$
  • The empty word is $\lambda$
  • The gap symbol is $-$

5
Operations on Strings
  • Edit Operations
  • (x, y) is a Substitution if
  • $x, y \in \Sigma$, with $x \neq y$, $x \neq -$, $y \neq -$
  • (x, y) is an Insertion if
  • $x = -$ (a gap), and $y \in \Sigma$
  • (x, y) is a Deletion if
  • $x \in \Sigma$ and $y = -$
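As a rough illustration of the three edit operations, the sketch below classifies a pair (x, y). The names `classify`, `GAP`, and `ALPHABET`, and the choice of a DNA alphabet, are assumptions made for this example and do not come from the slides.

```python
GAP = "-"                      # the gap symbol
ALPHABET = frozenset("ACGT")   # assumed stand-in for the finite alphabet Sigma

def classify(x: str, y: str) -> str:
    """Name the edit operation described by the pair (x, y)."""
    if x in ALPHABET and y in ALPHABET:
        # Equal symbols are not an edit operation; report them as a match.
        return "substitution" if x != y else "match"
    if x == GAP and y in ALPHABET:
        return "insertion"
    if x in ALPHABET and y == GAP:
        return "deletion"
    raise ValueError(f"({x!r}, {y!r}) is not an edit operation")
```

For instance, `classify("-", "A")` names an insertion and `classify("A", "-")` a deletion.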

6
More on Edit Operations
  • For $a, b \in (\Sigma \cup \{-\})^*$, and
  • edit operation $(a_i, b_i)$ for $a_i \neq b_i$
  • we call this $a \rightarrow_{(a_i, b_i)} b$
  • $a \Rightarrow_S b$ is a sequence of operations S
    transforming a into b.

7
Weighting Edit Operations
  • In biological systems generally, the probability
    that a position changes from one value to
    another differs between pairs of values.
  • Thus, we must support weighted Edit Operations.

8
The Cost Function
  • w(x, y) is the cost function
  • Assigns weights to the edit operation involved
    with changing from x to y
  • This holds even for insertions and deletions
    (indels for short)
  • For insertion, $x = -$ and $y \in \Sigma$
  • For deletion, $x \in \Sigma$ and $y = -$
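One way to picture a weighted cost function is the sketch below, which charges less for a change within the purines or within the pyrimidines than for a change between the two groups. The specific weights (1.0, 2.0, 2.0) and the grouping are illustrative assumptions, not values taken from the presentation.

```python
GAP = "-"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def w(x: str, y: str, transition: float = 1.0,
      transversion: float = 2.0, indel: float = 2.0) -> float:
    """Cost of the edit operation (x, y); the weights are hypothetical."""
    if x == y:
        return 0.0                       # no change, no cost
    if x == GAP or y == GAP:
        return indel                     # insertion or deletion
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion
```

The same function covers substitutions and indels, which is what lets a single table of weights drive the whole alignment.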

9
Edit Distance
  • Definition
  • Given a cost function
  • $w : (\Sigma \cup \{-\}) \times (\Sigma \cup \{-\}) \to \mathbb{R}$, two words $a, b \in \Sigma^*$, and S, a sequence of operations
    transforming a into b,
  • The Edit Distance is
  • $d_w(a,b) = \min\{\, w(S) \mid a \Rightarrow_S b \,\}$

10
Properties of the Edit Distance
  • It's a metric. A cost function w is a metric if
  • $w(x,y) = 0$ iff $x = y$
  • $w(x,y) = w(y,x)$
  • $w(x,z) \leq w(x,y) + w(y,z)$
  • Note that if w is a metric, then $d_w$ is also a
    metric (pretty obvious)
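The three metric conditions can be checked by brute force on a small alphabet. The sketch below does so for a unit cost function; `is_metric` and `unit_cost` are names introduced here for illustration only.

```python
from itertools import product

def unit_cost(x: str, y: str) -> int:
    """Cost 0 for identical symbols, 1 for any other pair."""
    return 0 if x == y else 1

def is_metric(w, symbols) -> bool:
    """Check identity, symmetry, and the triangle inequality exhaustively."""
    for x, y in product(symbols, repeat=2):
        if (w(x, y) == 0) != (x == y):   # w(x, y) = 0 iff x = y
            return False
        if w(x, y) != w(y, x):           # symmetry
            return False
    # Triangle inequality over every triple of symbols.
    return all(w(x, z) <= w(x, y) + w(y, z)
               for x, y, z in product(symbols, repeat=3))
```

Passing `"ACGT-"` as `symbols` includes the gap symbol, so indel costs are checked as well.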

11
Alignments
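  • For $x \in (\Sigma \cup \{-\})^*$, $x|_\Sigma$ is the restriction of x to
    $\Sigma$. That is, x with all instances of $-$ deleted.
  • An alignment is the pair $(a^\diamond, b^\diamond)$ with
  • $a^\diamond, b^\diamond \in (\Sigma \cup \{-\})^*$ such that $|a^\diamond| = |b^\diamond|$
  • with no $a^\diamond_i = - = b^\diamond_i$
  • (no matching gaps)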

12
Properties of alignments
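  • $a^\diamond|_\Sigma = a$
  • $b^\diamond|_\Sigma = b$
  • The cost of an alignment,
  • $w(a^\diamond, b^\diamond) = \sum_{i=1}^{|a^\diamond|} w(a^\diamond_i, b^\diamond_i)$
  • The alignment distance of a,b is
  • $d^a_w(a,b) = \min\{\, w(a^\diamond, b^\diamond) \,\}$ over all alignments $(a^\diamond, b^\diamond)$ of a, b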
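The alignment conditions and the column-wise cost translate almost directly into code. The following is a minimal sketch under a unit cost function; the helper names (`restrict`, `is_alignment`, `alignment_cost`) are assumptions of this example.

```python
GAP = "-"

def unit_cost(x: str, y: str) -> int:
    return 0 if x == y else 1

def restrict(x: str) -> str:
    """Restriction of x to the alphabet: x with every gap symbol deleted."""
    return x.replace(GAP, "")

def is_alignment(a_al: str, b_al: str, a: str, b: str) -> bool:
    """True if (a_al, b_al) is an alignment of the words a and b."""
    return (len(a_al) == len(b_al)
            and restrict(a_al) == a
            and restrict(b_al) == b
            # no column may pair a gap with a gap
            and all(not (x == GAP and y == GAP) for x, y in zip(a_al, b_al)))

def alignment_cost(a_al: str, b_al: str, w=unit_cost):
    """Sum of w over the columns of the alignment."""
    return sum(w(x, y) for x, y in zip(a_al, b_al))
```

With a = AT and b = AAGT, the pair (`A--T`, `AAGT`) passes `is_alignment` and costs 2 under unit costs.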

13
Additivity
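  • For alignments $(a^\diamond, b^\diamond)$ and $(c^\diamond, d^\diamond)$,
  • $(a^\diamond c^\diamond, b^\diamond d^\diamond)$ is an alignment
  • $w(a^\diamond c^\diamond, b^\diamond d^\diamond) = w(a^\diamond, b^\diamond) + w(c^\diamond, d^\diamond)$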

14
Theorem 1
  • Let w be a metric cost function, and
  • $a, b \in \Sigma^*$. Then for every alignment
  • $(a^\diamond, b^\diamond)$ of (a,b), there is a sequence S of edit
    operations satisfying
  • $a \Rightarrow_S b$, and
  • $w(S) = w(a^\diamond, b^\diamond)$
  • Proof obvious

15
Needleman-Wunsch Edit Distance
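  • Let $w : (\Sigma \cup \{-\}) \times (\Sigma \cup \{-\}) \to \mathbb{R}$,
  • Let $a, b \in \Sigma^*$, with $|a| = n$, $|b| = m$.
  • Define the matrix D by
  • $0 \leq i \leq |a|$ and $0 \leq j \leq |b|$
  • $D_{0,0} = 0$,
  • $D_{0,j} = \sum_{k=1}^{j} w(-, b_k)$
  • $D_{i,0} = \sum_{k=1}^{i} w(a_k, -)$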

16
  • $D_{i,j} = \min\{\, D_{i,j-1} + w(-, b_j),\; D_{i-1,j-1} + w(a_i, b_j),\; D_{i-1,j} + w(a_i, -) \,\}$
  • Then, $D_{|a|,|b|} = D(a, b)$, the minimum global
    sequence alignment distance
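The matrix definition and recurrence can be sketched as a short dynamic program. This is a minimal version under a unit cost function; `edit_distance_matrix` and `unit_cost` are names chosen for this example, and any cost function `w(x, y)` could be passed in instead.

```python
GAP = "-"

def unit_cost(x: str, y: str) -> int:
    return 0 if x == y else 1

def edit_distance_matrix(a: str, b: str, w=unit_cost):
    """Fill D so that D[i][j] is the distance between a[:i] and b[:j]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # first column: delete a_1..a_i
        D[i][0] = D[i - 1][0] + w(a[i - 1], GAP)
    for j in range(1, m + 1):            # first row: insert b_1..b_j
        D[0][j] = D[0][j - 1] + w(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i][j - 1] + w(GAP, b[j - 1]),           # insertion
                D[i - 1][j - 1] + w(a[i - 1], b[j - 1]),  # match or substitution
                D[i - 1][j] + w(a[i - 1], GAP),           # deletion
            )
    return D
```

The bottom-right entry `D[len(a)][len(b)]` is the global distance; for `"AT"` and `"AAGT"` with unit costs it is 2.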

17
TraceBacks
  • A simple modification that helps visualize the
    path the algorithm takes.
  • Replace the integer values with arrows showing which
    square in the matrix to proceed to
  • Alignments are generated by moving from the
    lower-right square to the upper-left square.
  • All alignments generated are optimal

18
TraceBack Example
  • Let a = AT, b = AAGT
  • Corresponding Trace Matrix

19
TraceBack Results
  • The red arrows indicate two possible optimal
    alignments
  • Between AT and AAGT, we get
  • A--T
  • AAGT
  • And
  • -A-T
  • AAGT
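The traceback can be sketched in code by following every predecessor that attains the minimum, from the lower-right cell back to the upper-left one. The version below assumes unit costs and uses a = AT and b = AAGT from the traceback example; the function names are introduced here for illustration.

```python
GAP = "-"

def unit_cost(x: str, y: str) -> int:
    return 0 if x == y else 1

def fill(a, b, w):
    """Distance matrix: D[i][j] is the distance between a[:i] and b[:j]."""
    D = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        D[i][0] = D[i - 1][0] + w(a[i - 1], GAP)
    for j in range(1, len(b) + 1):
        D[0][j] = D[0][j - 1] + w(GAP, b[j - 1])
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = min(D[i][j - 1] + w(GAP, b[j - 1]),
                          D[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          D[i - 1][j] + w(a[i - 1], GAP))
    return D

def tracebacks(a, b, w=unit_cost):
    """All optimal alignments, found by walking back along minimal arrows."""
    D = fill(a, b, w)

    def walk(i, j):
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            for x, y in walk(i - 1, j - 1):      # diagonal arrow
                yield x + a[i - 1], y + b[j - 1]
        if j > 0 and D[i][j] == D[i][j - 1] + w(GAP, b[j - 1]):
            for x, y in walk(i, j - 1):          # left arrow: gap in a
                yield x + GAP, y + b[j - 1]
        if i > 0 and D[i][j] == D[i - 1][j] + w(a[i - 1], GAP):
            for x, y in walk(i - 1, j):          # up arrow: gap in b
                yield x + a[i - 1], y + GAP

    return sorted(walk(len(a), len(b)))

# tracebacks("AT", "AAGT") -> [("-A-T", "AAGT"), ("A--T", "AAGT")]
```

Comparing costs with `==` is safe here because unit costs are integers; with floating-point weights a tolerance would be needed.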

20
Gap Penalties
  • Attempting to model edit systems closer and
    closer to biological systems
  • A gap penalty g(k) is a penalty for a gap of
    length k. It is affine if g(k) is a constant plus a term linear in k
  • The weight of an alignment with gaps is the
    weight of the alignment plus the weight of all
    gaps in both sequences
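To make the weight of an alignment with gaps concrete, the sketch below scores the gap-free columns with a cost function and then adds g(k) for every maximal run of gaps in either sequence. The affine form `gap_open + gap_extend * k` and its default values are assumptions for illustration, as are the function names.

```python
import re

GAP = "-"

def unit_cost(x: str, y: str) -> int:
    return 0 if x == y else 1

def affine_gap(k: int, gap_open: float = 2.0, gap_extend: float = 0.5) -> float:
    """Hypothetical affine penalty for a gap of length k."""
    return gap_open + gap_extend * k

def gap_lengths(aligned: str):
    """Lengths of the maximal runs of gap symbols in one aligned sequence."""
    return [len(run) for run in re.findall(r"-+", aligned)]

def weight_with_gaps(a_al: str, b_al: str, w=unit_cost, g=affine_gap):
    """Weight of the gap-free columns plus the penalty of every gap."""
    columns = sum(w(x, y) for x, y in zip(a_al, b_al)
                  if x != GAP and y != GAP)
    gaps = sum(g(k) for k in gap_lengths(a_al) + gap_lengths(b_al))
    return columns + gaps
```

Under these assumed parameters, `A--T` against `AAGT` (one gap of length 2) weighs 3.0, while `-A-T` against `AAGT` (two gaps of length 1) weighs 5.0, so the two alignments that tie under per-symbol costs are no longer equivalent.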

21
Further Results
  • Waterman, Smith, and Beyer, 1976
  • Developed simple traceback methods for
    evaluating sequence pairs with simple gap
    penalties

22
Further Results
  • Gotoh, 1982
  • Developed traceback evaluation of sequences under
    affine gap penalties.
  • An affine gap penalty is of the form $g(k) = \alpha + \beta k$ for constants $\alpha$, $\beta$
  • This is because in biological systems, the
    insertion of one gap is different from the
    insertion of two serial gaps.
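A common way to handle affine gaps in quadratic time is to keep three matrices, one per type of final column. The sketch below follows that three-state idea and is not a transcription of Gotoh's 1982 formulation; the penalty form `gap_open + gap_extend * k`, the default values, and all names are assumptions of this example. It computes the distance only, without a traceback.

```python
GAP = "-"
INF = float("inf")

def unit_cost(x: str, y: str) -> int:
    return 0 if x == y else 1

def affine_distance(a, b, w=unit_cost, gap_open=2.0, gap_extend=0.5):
    """Minimum alignment weight when a gap of length k costs gap_open + gap_extend * k."""
    n, m = len(a), len(b)
    # M: last column pairs a_i with b_j; X: last column is a gap in a;
    # Y: last column is a gap in b.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for j in range(1, m + 1):
        X[0][j] = gap_open + gap_extend * j
    for i in range(1, n + 1):
        Y[i][0] = gap_open + gap_extend * i
    start = gap_open + gap_extend          # cost of opening a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = w(a[i - 1], b[j - 1]) + min(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = min(M[i][j - 1] + start, Y[i][j - 1] + start,
                          X[i][j - 1] + gap_extend)   # extend the open gap
            Y[i][j] = min(M[i - 1][j] + start, X[i - 1][j] + start,
                          Y[i - 1][j] + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])
```

With the assumed defaults, `affine_distance("AT", "AAGT")` is 3.0, corresponding to a single gap of length 2 rather than two separate gaps.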

23
Tandem Repeats
  • Biological systems also exhibit the tendency to
    create tandem repeats
  • Example
  • ACGATACCG
  • ↓
  • AC GATAC GATAC GATAC GATAC CG
  • Tandem repeats should be given an affine penalty
    as well, because having one repeat is different
    from having two
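The tandem duplication in the example can be reproduced with a few lines of string slicing. The helper `tandem_duplicate` and its parameters are hypothetical names for this sketch; it only models the duplication event itself, not its scoring.

```python
def tandem_duplicate(seq: str, start: int, length: int, copies: int) -> str:
    """Replace the unit seq[start:start+length] by `copies` adjacent copies of it."""
    unit = seq[start:start + length]
    return seq[:start] + unit * copies + seq[start + length:]

# The slide's example: the unit GATAC inside ACGATACCG repeated four times.
expanded = tandem_duplicate("ACGATACCG", start=2, length=5, copies=4)
```

Here `expanded` is the string AC, then GATAC four times, then CG, written without the spaces the slide uses to separate the copies.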

24
Results with Tandem Repeats
  • Benson developed a method in 1997 to address
    insertion, deletion, and substitution, as well as
    affine gaps and affine tandem repeats
  • Assume tandem duplication occurs before other
    operations
  • There are no removals of individual positions in a
    tandemly repeated region
