Title:

Review of BLAST

Provided by: ryang

Transcript and Presenter's Notes

Title: Review of BLAST


1
Review of BLAST
  • Heuristic, local alignment algorithm
  • Permits trade-off between speed and sensitivity
  • HSPs are dependent on matrix used
  • Statistical significance of results can be
    assessed with the E-value

sensitivity = (significant matches detected) / (significant matches in DB)
2
Refinements of BLAST
  • Two-hit method
  • Gapped Alignments
  • Position-Specific Iteration
  • Reference: SF Altschul, TL Madden, AA Schaffer, J Zhang, Z Zhang,
    W Miller, and DJ Lipman (1997). Gapped BLAST and PSI-BLAST: a new
    generation of protein database search programs.
    Nucl. Acids Res. 25: 3389-3402.
    http://nar.oxfordjournals.org/cgi/content/full/25/17/3389

3
Two-hit method
  • BLAST v1
  • Seeks short word pairs whose aligned score is at least T
  • Each hit is extended to test whether it lies within a
    high-scoring alignment (extension consumes most of the
    processing time)
  • BLAST v2
  • Seeks two non-overlapping word pairs on the same
    diagonal, within a set distance of each other
  • T is lowered, yielding more single hits
  • Far fewer pairs of non-overlapping hits than single hits
    exist, so fewer extensions are run and average compute
    time decreases
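
A minimal Python sketch of the two-hit rule described above, under simplifying assumptions: word hits are exact matches rather than neighborhood words scoring at least T, and the function names, word length `w`, and `window` default are illustrative choices, not BLAST's actual code.

```python
from collections import defaultdict

def word_hits(query, subject, w=3):
    """Return (i, j) pairs where query[i:i+w] equals subject[j:j+w].

    Simplification: exact word identity stands in for "aligned score >= T".
    """
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            hits.append((i, j))
    return hits

def two_hit_seeds(query, subject, w=3, window=40):
    """Return (diagonal, first_hit, second_hit) triples that would trigger extension."""
    by_diag = defaultdict(list)
    for i, j in word_hits(query, subject, w):
        by_diag[j - i].append(i)          # hits on the same diagonal share j - i
    seeds = []
    for diag, starts in by_diag.items():
        last = None
        for s in sorted(starts):
            if last is not None and s - last < w:
                continue                  # overlaps the most recent hit: ignore it
            if last is not None and s - last <= window:
                seeds.append((diag, last, s))
            last = s
    return seeds

seeds_same = two_hit_seeds("MKVLAAGIW", "MKVLAAGIW")
seeds_single = two_hit_seeds("MKVAC", "MKVDE")
```

A lone word hit (second call) produces no seed, so no extension would be attempted for it.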

4
Gapped Alignments
  • BLAST v1
  • Finds several alignments involving a single
    database sequence
  • When alignments are combined, resulting alignment
    is statistically significant
  • When the alignments are not combined, individual
    alignments may not meet statistical threshold to
    be reported
  • BLAST v2
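  • Introduces an algorithm to generate gapped
    alignments, overcoming this issue with BLAST v1
  • Allows T to be raised, increasing the speed of the
    initial database scan
  • Gapped alignment algorithm uses dynamic programming (DP)
    to extend a central pair of aligned residues in both
    directions; rather than being confined to a pre-defined
    strip of the DP path graph, it explores only cells whose
    score falls no more than a set amount below the best
    score found so far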

5
PSI-BLAST: Position-Specific Iterative BLAST
  • Motif or profile search methods are much more
    sensitive than pairwise comparison methods at
    detecting distant relationships
  • Basic Idea
  • BLAST searches may be iterated, with a
    position-specific score matrix generated from
    significant alignments found in round i used for
    round i + 1

6
Definition of Profile/Motif
  • an analysis representing the extent to which
    something exhibits various characteristics
  • Examples from ProSite
  • N-glycosylation site
  • N-{P}-[ST]-{P}
  • Glycosaminoglycan attachment site
  • S-G-x-G
  • cAMP- and cGMP-dependent protein kinase
    phosphorylation site
  • [RK](2)-x-[ST]
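
To make the pattern notation concrete, here is a small Python sketch that translates a subset of PROSITE pattern syntax into a regular expression. It assumes only the elements used in the examples above ("x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repeats); the helper name `prosite_to_regex` is hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}' to a regex."""
    parts = []
    for element in pattern.split("-"):
        # Split an element into its core and an optional repeat like (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # single residue or [allowed set]
        if lo:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

glyco = prosite_to_regex("N-{P}-[ST]-{P}")
kinase = prosite_to_regex("[RK](2)-x-[ST]")
match = re.search(glyco, "AANGSAA")   # made-up peptide, for illustration only
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping sites would need a lookahead-based search.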

7
Position-Specific Scoring Matrix
  • A PSSM is a motif descriptor
  • The description includes a weight (score,
    probability, likelihood) for each symbol
    occurring at each position along the motif
  • Examples of motifs
  • Protein active sites, structural elements, zinc
    finger, intron/exon boundaries,
    transcription-factor binding sites, etc.

8
PSSM Example
  • For DNA
  • [GTA]-[AT]-G-[AC]-N-[TAC]

10
Position-Specific Scoring Matrix
  • Construction of PSSM is a multi-stage process
  • Architecture of matrix
  • Create multiple alignment from which the matrix
    is derived
  • Calculate frequencies for each position
  • Applying BLAST to PSSM

11
Position-Specific Scoring Matrix
  • 10 vertebrate donor site sequences aligned at
    exon/intron boundary

12
Position-Specific Scoring Matrix
  • Calculate the absolute frequency of each
    nucleotide at each position
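
A minimal Python sketch of the counting step. The four aligned sequences below are made-up stand-ins, not the ten donor sites from the slide (which are not reproduced in this transcript), and `count_matrix` is a hypothetical helper name.

```python
def count_matrix(alignment, alphabet="ACGT"):
    """Count how often each nucleotide occurs in each alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    counts = {base: [0] * length for base in alphabet}
    for seq in alignment:
        for pos, base in enumerate(seq):
            counts[base][pos] += 1
    return counts

# Hypothetical aligned sites, for illustration only.
alignment = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGG", "TAGGTAAGT"]
counts = count_matrix(alignment)
```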

14
Position-Specific Scoring Matrix
  • Calculate the relative frequency of each
    nucleotide at each position
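
The normalization step can be sketched in Python as follows. The count values are invented for illustration, and `relative_frequencies` is a hypothetical helper name; each column is divided by its total so that the frequencies in a column sum to 1.

```python
def relative_frequencies(counts):
    """Convert per-position counts into per-position relative frequencies."""
    length = len(next(iter(counts.values())))
    totals = [sum(row[pos] for row in counts.values()) for pos in range(length)]
    return {
        base: [row[pos] / totals[pos] for pos in range(length)]
        for base, row in counts.items()
    }

# Hypothetical counts for a 3-column alignment of 4 sequences.
counts = {"A": [1, 4, 0], "C": [2, 0, 0], "G": [0, 0, 4], "T": [1, 0, 0]}
freqs = relative_frequencies(counts)
```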

16
Position-Specific Scoring Matrix
  • What is the probability of finding CAGGTTGGA?
  • The product of the frequency of each nucleotide
    at each position
  • C is 0.2 at position 1, A is 0.6 at position 2,
    etc. -> 0.2 x 0.6 x 0.7 x 1 x 1 x 0.1 x 0.1 x 0.5 x 0.1
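
The product can be checked with a few lines of Python. The nine per-position frequencies are the ones quoted on the slide for CAGGTTGGA; the variable names are illustrative.

```python
import math

# Frequencies of C, A, G, G, T, T, G, G, A at positions 1..9, as quoted above.
position_probs = [0.2, 0.6, 0.7, 1, 1, 0.1, 0.1, 0.5, 0.1]

# Probability of the whole sequence under the model, treating positions
# as independent: the product of the per-position frequencies.
p_model = math.prod(position_probs)
```

The result is about 4.2e-5, which shows how quickly raw probabilities shrink with sequence length and why log scores are used later.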

17
Position-Specific Scoring Matrix
  • The ratio of the probability of a sequence S under a
    given model M, P(S|M), to the probability of the
    sequence under a random model R, P(S|R), is a
    likelihood ratio (or odds ratio): P(S|M) / P(S|R)
  • Its logarithm is a log-likelihood ratio (or
    log-odds ratio)
  • If the log ratio is
  • = 0, S has the same probability to appear in M as
    in R
  • > 0, S is more likely to appear in M than in R
  • < 0, S is less likely to appear in M than in R
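
As a worked illustration in Python, take the CAGGTTGGA example, whose per-position model frequencies were quoted earlier. The random model here is an assumption: each of the four nucleotides is taken to be equally likely (0.25) at every position.

```python
import math

# P(S|M): product of the per-position frequencies quoted for CAGGTTGGA.
p_model = math.prod([0.2, 0.6, 0.7, 1, 1, 0.1, 0.1, 0.5, 0.1])

# P(S|R): assumed uniform background, 0.25 per nucleotide over 9 positions.
p_random = 0.25 ** 9

odds = p_model / p_random      # likelihood (odds) ratio
log_odds = math.log(odds)      # log-likelihood (log-odds) ratio, natural log
```

Here the odds ratio is roughly 11, so the log ratio is positive: the sequence is more likely under the model than under the random model.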

18
Position-Specific Scoring Matrix
  • Compute the log-likelihood values with the
    transformation ln(Mij/Pi)
  • where Mij is the probability of nucleotide i at
    position j and Pi is the background probability of
    nucleotide i
  • Assume each nucleotide is equally likely at any
    position; then Pi = 0.25 in the random model
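
A short Python sketch of the ln(Mij/Pi) transformation, assuming a uniform background of 0.25. The frequency values are invented, `log_odds_matrix` is a hypothetical helper, and mapping a zero frequency to negative infinity is one possible choice (pseudocounts are a common alternative that this sketch does not implement).

```python
import math

def log_odds_matrix(freqs, background=0.25):
    """Apply ln(M_ij / P_i) to every entry of a relative-frequency matrix."""
    return {
        base: [math.log(f / background) if f > 0 else float("-inf") for f in row]
        for base, row in freqs.items()
    }

# Hypothetical relative frequencies for a 2-position motif.
freqs = {"A": [0.5, 0.0], "C": [0.25, 0.0], "G": [0.25, 1.0], "T": [0.0, 0.0]}
scores = log_odds_matrix(freqs)
```

A nucleotide observed exactly at the background rate scores 0, enriched nucleotides score above 0, and depleted ones score below 0.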

20
Position-Specific Scoring Matrix
  • Use the profile to scan a sequence
  • Sum the coefficients from the matrix for each
    nucleotide in each position
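  • Formally, the score of a sequence s of length l
    (s = s1, ..., sl, with each sk being one of A, C, G,
    T) is the sum over positions k = 1, ..., l of the
    log-odds entry for nucleotide sk at position k,
    i.e. the sum of ln(M(sk, k) / P(sk))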

21
Position-Specific Scoring Matrix
  • Assume the sequence GTAGTAGAAGGTAAGTGTCCGTAG
  • Find the score for each window (shown in lowercase)
    GTAGTAgaaggtaagTGTCCGTAG
    GTAGTAGaaggtaagtGTCCGTAG
    GTAGTAGAAGGtaagtgtccGTAG

22
Position-Specific Scoring Matrix
  • Assume the sequence GTAGTAGAAGGTAAGTGTCCGTAG
  • Find the score for each window (shown in lowercase)
    GTAGTAgaaggtaagTGTCCGTAG (-8)
    GTAGTAGaaggtaagtGTCCGTAG
    GTAGTAGAAGGtaagtgtccGTAG

23
Position-Specific Scoring Matrix
  • Assume the sequence GTAGTAGAAGGTAAGTGTCCGTAG
  • Find the score for each window (shown in lowercase)
    GTAGTAgaaggtaagTGTCCGTAG (-8)
    GTAGTAGaaggtaagtGTCCGTAG (8.33)
    GTAGTAGAAGGtaagtgtccGTAG

24
Position-Specific Scoring Matrix
  • Assume the sequence GTAGTAGAAGGTAAGTGTCCGTAG
  • Find the score for each window (shown in lowercase)
    GTAGTAgaaggtaagTGTCCGTAG (-8)
    GTAGTAGaaggtaagtGTCCGTAG (8.33)
    GTAGTAGAAGGtaagtgtccGTAG (0.24)
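
The sliding-window scan can be sketched in Python as below. The 3-column matrix is a made-up toy, not the 9-column donor-site matrix behind the scores above (its values are not reproduced in this transcript), and `scan` is a hypothetical helper name.

```python
def scan(sequence, pssm):
    """Score every window of the motif width; return (start, window, score) tuples."""
    width = len(next(iter(pssm.values())))
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the matrix entry for each nucleotide at its position in the window.
        score = sum(pssm[base][k] for k, base in enumerate(window))
        results.append((start, window, score))
    return results

# Toy log-odds-style matrix that favors the motif "GTA" (illustrative values only).
pssm = {
    "A": [-1.0, -1.0, 1.0],
    "C": [-1.0, -1.0, -1.0],
    "G": [1.0, -1.0, -1.0],
    "T": [-1.0, 1.0, -1.0],
}
results = scan("GTAGTAGAAGGTAAGTGTCCGTAG", pssm)
best = max(results, key=lambda r: r[2])   # first highest-scoring window
```

In practice only windows scoring above a chosen threshold would be reported as candidate sites.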

25
PSI-BLAST
  • Confirming relationships of purine
    nucleotide metabolism proteins

26
PSI-BLAST
E-value cutoff for PSSM
27
Results: Initial BLASTP
Same results as protein-protein BLAST
28
Results of First PSSM Search
Other purine nucleotide metabolizing enzymes not
found by ordinary BLAST
29
Third PSSM Search: Convergence