Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Title: PSI-BLAST: Overview. Author: niranjan. Last modified by: niranjan. Created: 9/16/2001 10:32:06 PM. Document presentation format: On-screen Show.

Slides: 16
Provided by: nir105

1
Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs
  • By Stephen F. Altschul, Thomas L. Madden,
    Alejandro A. Schäffer, Jinghui Zhang, Zheng
    Zhang, Webb Miller and David J. Lipman

2
Introduction to BLAST
  • BLAST is a heuristic approximation to local
    alignment by dynamic programming.
  • Finds locally maximal segment pairs with scores
    over a cutoff.
  • Has a formal statistical theory to assess the
    significance of scores.

3
Basic Algorithm
  • Looks for words of length w with score of at least
    T.
  • These hits are then extended to check for segment
    pairs with score greater than S (S > T).
  • Tradeoff: lowering T reduces the probability of
    missing segment pairs (increases sensitivity) but
    increases the number of hits to be extended.
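
The effect of the threshold T on the number of candidate words can be sketched as follows. This is a minimal illustration, assuming a toy four-letter alphabet and a toy scoring function (+5 match, -1 mismatch) in place of a real substitution matrix; the names neighborhood and toy_score are hypothetical.

```python
from itertools import product

ALPHABET = "ACDE"  # toy alphabet (assumption); proteins use 20 residues


def toy_score(a, b):
    """Toy substitution score (assumption): +5 for a match, -1 otherwise."""
    return 5 if a == b else -1


def neighborhood(query_word, T, score=toy_score, alphabet=ALPHABET):
    """Return every word whose aligned score with query_word is at least T."""
    w = len(query_word)
    words = []
    for cand in product(alphabet, repeat=w):
        s = sum(score(q, c) for q, c in zip(query_word, cand))
        if s >= T:
            words.append("".join(cand))
    return words


strict = neighborhood("ACD", T=15)  # only the exact word scores 15
loose = neighborhood("ACD", T=9)    # one mismatch still scores 5 + 5 - 1 = 9
```

Lowering T from 15 to 9 grows the word list from 1 to 10 entries, which is the sensitivity-versus-work tradeoff described on the slide.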

4
Scanning for hits
  • Two approaches:
  • Positions of length-w words in the query with score
    of at least T are stored in an array of size 20^w,
    and hits are detected by array lookup.
  • A DFA for the appropriate words is generated and
    used to scan the sequences. A Mealy machine
    (acceptance on transitions) is used for
    efficiency.
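
The array-lookup approach can be sketched as below. This is a simplified illustration, not the BLAST implementation: for brevity it indexes only the exact query words (a real table would also hold the neighborhood words scoring at least T), and the function names are hypothetical.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CODE = {aa: i for i, aa in enumerate(AMINO)}


def word_index(word):
    """Encode a word as a base-20 integer in the range [0, 20**w)."""
    idx = 0
    for aa in word:
        idx = idx * 20 + CODE[aa]
    return idx


def build_table(query, w):
    """Array with 20**w slots; each slot lists the query positions of a word."""
    table = [[] for _ in range(20 ** w)]
    for i in range(len(query) - w + 1):
        table[word_index(query[i:i + w])].append(i)
    return table


def scan(table, subject, w):
    """Yield (query_pos, subject_pos) hits using one array lookup per word."""
    for j in range(len(subject) - w + 1):
        for i in table[word_index(subject[j:j + w])]:
            yield (i, j)


table = build_table("MKVLA", w=2)
hits = list(scan(table, "GGKVLG", w=2))
```

Here the words KV and VL occur in both sequences, so the scan reports the hits (1, 2) and (2, 3).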

5
Other Issues
  • Hit extension is simplified by stopping when
    score falls below a threshold compared to the
    best score found for shorter extensions.
  • The various parameters are chosen based on
    experiments using random sequences.
  • Combinations of MSPs can be used to get better
    scores for matching sequences.
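
The stopping rule for hit extension can be sketched as follows. This is a minimal one-directional illustration, assuming a toy scoring function (+5 match, -4 mismatch) and the convention that extension stops once the running score is more than X below the best score so far; extend_right and toy_score are hypothetical names.

```python
def toy_score(a, b):
    """Toy substitution score (assumption): +5 for a match, -4 otherwise."""
    return 5 if a == b else -4


def extend_right(query, subject, qi, sj, X, score=toy_score):
    """Extend rightwards without gaps from (qi, sj).

    Returns (best_score, best_length). Extension stops as soon as the
    running score drops more than X below the best score seen so far.
    """
    best = running = 0
    best_len = 0
    k = 0
    while qi + k < len(query) and sj + k < len(subject):
        running += score(query[qi + k], subject[sj + k])
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running > X:
            break
    return best, best_len


full = extend_right("AAGAA", "AATAA", 0, 0, X=6)  # survives the mismatch
cut = extend_right("AAGAA", "AATAA", 0, 0, X=3)   # stops at the mismatch
```

With X=6 the single mismatch is tolerated and the extension covers all five positions (score 16); with X=3 it stops after two positions (score 10).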

6
Two-Hit Method
  • Original BLAST (one-hit):
  • Extend each hit to determine whether it lies in a
    high-scoring alignment
  • Extension consumes >90% of processing time
  • Hit: a short word pair whose aligned score is at
    least T
  • Two-hit method:
  • Extension is invoked only if there are two
    non-overlapping word pairs on the same diagonal
  • Lowering T yields more hits, but only a few are
    extended
  • 3x faster

T = threshold parameter: as T increases, speed increases,
but so does the probability of missing weak similarities
7
Two-Hit Method Algorithm
  • Scan db for hits (word pairs scoring at least T)
  • Seek pairs of non-overlapping hits found within
    distance A of one another on the same diagonal
  • Invoke (ungapped) extension to determine if hits
    lie within a statistically significant alignment
    with query. Extend until alignment score has
    dropped X below max score yet attained.
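
The two-hit trigger can be sketched as follows. This is a simplified illustration under stated assumptions: hits are (query_pos, subject_pos) pairs, a diagonal is identified by subject_pos - query_pos, distance is measured between hit start positions, and "within distance A" is read as distance <= A. The function name is hypothetical.

```python
def two_hit_triggers(hits, w, A):
    """Return the second hit of each qualifying pair on the same diagonal.

    A pair qualifies when the hits do not overlap (start positions at
    least w apart) and lie within distance A of one another.
    """
    last = {}       # diagonal -> subject position of the most recent hit
    triggers = []
    for q, s in sorted(hits, key=lambda h: h[1]):
        d = s - q
        prev = last.get(d)
        if prev is not None:
            dist = s - prev
            if dist < w:
                continue              # overlaps the previous hit: ignore it
            if dist <= A:
                triggers.append((q, s))
        last[d] = s
    return triggers


hits = [(0, 10), (1, 11), (5, 15), (0, 30), (60, 70)]
triggers = two_hit_triggers(hits, w=3, A=40)
```

In this example only (5, 15) triggers an extension: (1, 11) overlaps the first hit, (0, 30) lies on a different diagonal, and (60, 70) is more than A positions from the previous hit on its diagonal.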

8
Gapped Alignments
  • Original BLAST:
  • Treats gapped alignments implicitly
  • Locates several distinct HSPs within the same db
    sequence
  • Calculates statistical significance on the combined
    result
  • Gapped BLAST:
  • Triggers gapped extension for any HSP exceeding a
    moderate score Sg
  • Gapped extension takes longer to execute, but few
    HSPs undergo it

HSP = high-scoring segment pair (locally optimal)
9
Advantage of New Heuristic for Generating Gapped
Alignments
  • Two or more HSPs may each have low scores
    independently, but can be statistically
    significant together
  • Only one of the constituent HSPs needs to be found
    to generate a successful combined result, so T
    can be increased

10
Older Gapped Alignments
  • Confine the dynamic programming to a banded
    section of the full path graph
  • Optimal gapped alignment may be outside this band
  • As the width of the band increases, speed decreases
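
The limitation of a fixed band can be sketched as follows. This is a minimal banded local alignment, assuming toy scores (match +2, mismatch -1, linear gap -2) and a band defined by |i - j| <= band; the function name is hypothetical, and cells outside the band are simply treated as unavailable.

```python
def banded_local_score(a, b, band, match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells with |i - j| <= band.

    Returns (best_score, number_of_cells_evaluated).
    """
    prev = {}   # column j -> score in the previous row
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        cur = {}
        for j in range(max(1, i - band), min(len(b), i + band) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0,
                        prev.get(j - 1, 0) + sub,   # aligned pair
                        prev.get(j, 0) + gap,       # gap in b
                        cur.get(j - 1, 0) + gap)    # gap in a
            cur[j] = score
            best = max(best, score)
            cells += 1
        prev = cur
    return best, cells


# The matching region sits five diagonals away from the main diagonal.
wide, wide_cells = banded_local_score("GGGGGACGTACGT", "ACGTACGT", band=5)
narrow, narrow_cells = banded_local_score("GGGGGACGTACGT", "ACGTACGT", band=1)
```

The wide band finds the full match but evaluates more cells; the narrow band is cheaper but misses the optimal alignment because it lies outside the band.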

11
New Heuristic for Generating Gapped Alignments
  • Starting from a seed residue pair in an HSP,
    dynamic programming proceeds in both directions
    through the path graph
  • Consider only cells for which the optimal local
    alignment score falls no more than Xg below the
    best score yet found
  • Region of path graph explored adapts to alignment
    being constructed
  • Seed: central residue pair of the segment with the
    highest alignment score along the HSP
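
The adaptive exploration can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it extends in one direction only from a seed placed at the start of both sequences, uses a linear gap cost instead of affine costs, assumes toy scores (match +2, mismatch -1, gap -3), and updates the best score once per row. The function name is hypothetical.

```python
def xdrop_gapped_extend(a, b, Xg, match=2, mismatch=-1, gap=-3):
    """Extend forward from the seed at (0, 0) and return the best score.

    Cells scoring more than Xg below the best score found so far are
    discarded, so the explored region follows the alignment.
    """
    best = 0
    prev = {0: 0}                        # row 0 of the path graph
    j = 1
    while j <= len(b) and j * gap >= -Xg:
        prev[j] = j * gap                # leading gap in a
        j += 1
    for i in range(1, len(a) + 1):
        cur = {}
        hi = max(prev)
        for j in range(min(prev), len(b) + 1):
            cands = []
            if j in prev:
                cands.append(prev[j] + gap)              # gap in b
            if j - 1 in prev:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                cands.append(prev[j - 1] + sub)          # aligned pair
            if j - 1 in cur:
                cands.append(cur[j - 1] + gap)           # gap in a
            if not cands:
                if j > hi:
                    break                # nothing further right is reachable
                continue
            s = max(cands)
            if s >= best - Xg:
                cur[j] = s
        if not cur:
            break                        # every cell fell below the cutoff
        best = max(best, max(cur.values()))
        prev = cur
    return best


# b contains one extra residue relative to a, so one gap is needed.
with_gap = xdrop_gapped_extend("ACGTACGTAC", "ACGTAACGTAC", Xg=5)
no_gap = xdrop_gapped_extend("ACGTACGTAC", "ACGTAACGTAC", Xg=0)
```

With Xg=5 the extension can pay for the gap and reaches a score of 17; with Xg=0 any drop ends exploration and the score stays at 10, the length of the initial ungapped match times the match score.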

12
New Gapped BLAST
  • Ungapped extension of the second hit is invoked for
    two non-overlapping hits of score at least T within
    distance A of one another
  • If the HSP generated has normalized score of at
    least Sg, gapped extension is triggered
  • Resulting gapped alignment is reported if
    statistically significant (low enough E-value)
  • Runs on average 3x faster than original BLAST
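
The normalized score and E-value used above follow from a raw score S as S' = (lambda*S - ln K) / ln 2 and E = N / 2^S', where N is the size of the search space (query length times database length) and lambda and K are the statistical parameters of the scoring system. A minimal sketch, in which the parameter values and sequence lengths are illustrative assumptions only:

```python
import math


def bit_score(raw, lam, K):
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw - math.log(K)) / math.log(2)


def e_value(raw, lam, K, m, n):
    """Expected number of chance alignments: E = N / 2**S' with N = m * n."""
    return m * n / 2 ** bit_score(raw, lam, K)


# Illustrative values only (assumptions): lambda and K depend on the
# substitution matrix and gap costs; m and n are made-up lengths.
lam, K = 0.267, 0.041
m, n = 250, 50_000_000
e_weak = e_value(60, lam, K, m, n)
e_strong = e_value(120, lam, K, m, n)
```

Because the E-value falls exponentially with the raw score, doubling the score here lowers E by many orders of magnitude.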

13
PSI-BLAST Overview
  • Results of the initial BLAST search are used to
    construct position-specific scores.
  • BLAST is repeated using the new scores until no
    new sequences are found.
  • Position-specific scores improve the ability of
    successive BLAST iterations to detect remote
    homologs.
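
The iteration can be sketched as a loop that stops when a round finds nothing new. This is only an outline: search and build_pssm are hypothetical callables supplied by the caller (they stand in for a BLAST search and for score-matrix construction), and the toy functions below exist only to exercise the loop.

```python
def psi_blast(query, search, build_pssm, max_iter=10):
    """Repeat the search until a round reports no previously unseen sequences.

    Returns (all_sequences_found, number_of_rounds_run).
    """
    model = query                 # round 1 searches with the plain query
    seen = set()
    rounds = 0
    for rounds in range(1, max_iter + 1):
        hits = set(search(model))
        if hits <= seen:
            break                 # converged: nothing new was found
        seen |= hits
        model = build_pssm(query, seen)
    return seen, rounds


def toy_search(model):
    """Toy stand-in: the plain query finds s1; any profile also finds s2."""
    return ["s1"] if model == "Q" else ["s1", "s2"]


def toy_build(query, hits):
    """Toy stand-in for building position-specific scores from the hits."""
    return frozenset(hits)


found, rounds = psi_blast("Q", toy_search, toy_build)
```

In the toy run the second round picks up s2, which the plain query missed, and the third round finds nothing new, so the loop stops after three rounds.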

14
Position-specific score matrix
  • Dimensions: L x 20 (L = query length)
  • Multiple alignment created using all segments
    with E-value below a threshold.
  • Alignment based on pairwise alignments.
  • Columns with gaps in the query are ignored.
  • For each column C a reduced alignment MC is
    created.

15
  • MC includes all sequences with a residue in C and
    only those columns in which all of these sequences
    are represented.
  • Sequence weighting method used to generate
    observed residue frequencies.
  • Score for residue i in column C given by
    log(Qi/Pi), where Pi is the background frequency
    of residue i.
  • Qi is a weighted average of the observed frequency
    of residue i and a pseudocount frequency.
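
The column score can be sketched as follows, assuming Qi takes the form (alpha*fi + beta*gi) / (alpha + beta), where fi is the weighted observed frequency, gi the pseudocount frequency, and alpha and beta their relative weights. The two-letter alphabet, the choice gi = Pi, and the weights below are toy assumptions; the function name is hypothetical.

```python
import math


def column_scores(f, P, g, alpha, beta):
    """Return log(Qi / Pi) for every residue i in one alignment column.

    f: weighted observed frequencies, P: background frequencies,
    g: pseudocount frequencies, alpha and beta: their relative weights.
    """
    scores = {}
    for i in P:
        Q = (alpha * f.get(i, 0.0) + beta * g[i]) / (alpha + beta)
        scores[i] = math.log(Q / P[i])
    return scores


P = {"A": 0.5, "B": 0.5}      # toy background frequencies (assumption)
f = {"A": 1.0}                # only residue A was observed in this column
scores = column_scores(f, P, g=P, alpha=3, beta=1)
```

The observed residue A receives a positive score and the unobserved residue B a negative but finite one; the pseudocount term is what keeps the logarithm defined when a residue never appears in the column.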