Basic Local Alignment Search Tool (BLAST)


1
Basic Local Alignment Search Tool (BLAST)
  • Katie Moreland

2
Overview
  • Sequence Alignment
  • Dynamic Programming
  • BLAST tutorial
  • Example execution of BLAST
  • References

3
Sequence Alignment
  • In bioinformatics, a sequence alignment is a way
    of arranging the primary sequences of DNA, RNA,
    or protein to identify regions of similarity that
    may be a consequence of functional, structural,
    or evolutionary relationships between the
    sequences. (http://wikipedia.org)
  • Example alignment:
    G A A T T C A G T T A
    G G A - T C - G - - A

4
Sequence Alignment (Cont.)
  • Motivations
  • Similar primary structure in proteins implies
    similar form and function
  • Similar short sequences can lead to motif finding
    (e.g., promoter regions)
  • Similarities between gene regions can be used for
    phylogenetic classification

5
Sequence Similarity
  • Alignments are not unique
  • Need a way to compare alignments to find the
    optimal one
  • The optimal alignment is the one that maximizes
    the overall score (may not be unique)
  • Three possibilities when aligning a character from
    each string: perfect match, mismatch, indel
  • Align the two characters
    Perfect match    Mismatch
         C              C
         C              G
  • Insertion/deletion (indel)
    Gap in 1st string (S)    Gap in 2nd string (T)
              -                        C
              C                        -

6
Sequence Similarity (Cont.)
  • Simple metric
  • s(x,x) = 1 (match)
  • s(x,y) = -1 (mismatch)
  • s(x,-) = s(-,x) = -1 (indel)
  • In practice it is useful to define a substitution
    matrix such as PAM250 to take the probabilities of
    certain mutations into account.
  • i.e., the cost of a mutation to a chemically similar
    amino acid is less than the cost of a mutation to a
    dissimilar amino acid
  • The cost of indels depends on the application
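The simple metric can be applied column by column to score a finished alignment. The sketch below is a minimal illustration, not part of the original deck: the function name score_alignment is hypothetical, and it assumes both rows have equal length and use "-" for a gap. It scores the example alignment from slide 3.

```python
def score_alignment(row_s, row_t, match=1, mismatch=-1, indel=-1):
    """Sum per-column scores of two aligned rows ('-' marks a gap)."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            total += indel      # insertion/deletion column
        elif a == b:
            total += match      # perfect match column
        else:
            total += mismatch   # mismatch column
    return total


# Example alignment from slide 3: 6 matches, 1 mismatch, 4 indels.
print(score_alignment("GAATTCAGTTA", "GGA-TC-G--A"))  # prints 1
```

Comparing two candidate alignments of the same sequences then reduces to comparing their totals under the same metric.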

7
Intro to Dynamic Programming
  • Used to reduce time complexity of algorithms with
    certain properties
  • Characteristics of Dynamic Programming
  • Overlapping subproblems (otherwise plain
    recursion/divide and conquer is sufficient)
  • Optimality of subproblems (e.g., shortest path)
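To make the "overlapping subproblems" point concrete, the sketch below computes the best alignment score of two strings twice: once by plain recursion and once with memoization. It is an illustration under stated assumptions (the function names, the call counters, and the use of the simple 1/-1/-1 metric are choices made here, not taken from the deck).

```python
from functools import lru_cache


def column_score(a, b):
    """Simple metric: 1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def naive_best(s, t, i=0, j=0, calls=None):
    """Best score for aligning s[i:] with t[j:], by plain recursion."""
    if calls is not None:
        calls[0] += 1
    if i == len(s):
        return -(len(t) - j)  # only indels remain
    if j == len(t):
        return -(len(s) - i)
    return max(
        column_score(s[i], t[j]) + naive_best(s, t, i + 1, j + 1, calls),
        -1 + naive_best(s, t, i + 1, j, calls),  # gap in t
        -1 + naive_best(s, t, i, j + 1, calls),  # gap in s
    )


def memo_best(s, t, calls=None):
    """Same recurrence, but each (i, j) subproblem is solved only once."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if calls is not None:
            calls[0] += 1
        if i == len(s):
            return -(len(t) - j)
        if j == len(t):
            return -(len(s) - i)
        return max(
            column_score(s[i], t[j]) + best(i + 1, j + 1),
            -1 + best(i + 1, j),
            -1 + best(i, j + 1),
        )
    return best(0, 0)


naive_calls, memo_calls = [0], [0]
print(naive_best("GCTC", "CGTTC", calls=naive_calls), naive_calls[0])
print(memo_best("GCTC", "CGTTC", calls=memo_calls), memo_calls[0])
```

The memoized version evaluates at most (|S|+1)(|T|+1) subproblems, which is the same table that the matrix-based algorithms fill in.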

8
Intro to Dynamic Programming
  • Two types of alignment
  • Global (Needleman-Wunsch)
  • Attempt to align every residue in the sequences
  • Most useful when sequences are similar in size
    and sequence
  • Local (Smith-Waterman)
  • Finds an alignment for parts of the two strings
  • Most useful for dissimilar sequences that share
    regions of similarity or contain similar motifs

9
Needleman-Wunsch Algorithm
  • Input: two strings, S and T
  • Construct a matrix with |S|+1 rows and |T|+1
    columns
  • Label each row with a symbol from S and each
    column with a symbol from T, except for the first
    position in each, which represents an initial gap
  • Beginning at the upper left corner:
  • Move diagonally to represent aligning the two
    characters from the strings
  • Move right to represent inserting a space in S
  • Move down to represent inserting a space in T
  • Update a cell when newScore > oldScore (include an
    arrow to show which cell we came from)
  • The optimal alignment score is in the bottom right
    corner of the matrix
  • Backtrack to find the optimal alignment

10
Needleman-Wunsch Algorithm
  • Sequences to align
  • S = GCTC
  • T = CGTTC
  • Simple scoring function
  • s(x,x) = 2 (match)
  • s(x,y) = -1 (mismatch)
  • s(x,-) = s(-,x) = -1 (indel)

11-77
Tracing Needleman-Wunsch
  • Completed score matrix (rows: S = GCTC, columns:
    T = CGTTC; "-" marks the initial gap row and column)

        -   C   G   T   T   C
    -   0  -1  -2  -3  -4  -5
    G  -1  -1   1   0  -1  -2
    C  -2   1   0   0  -1   1
    T  -3   0   0   2   2   1
    C  -4  -1  -1   1   1   4

  • Optimal alignment score (bottom right cell): 4

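The trace can be reproduced with a short program. This is a minimal sketch of the matrix fill and backtracking described on slide 9, using the scoring function from slide 10; the function name and the tie-breaking order during backtracking (diagonal first, then up, then left) are assumptions made here.

```python
def needleman_wunsch(s, t, match=2, mismatch=-1, indel=-1):
    """Global alignment: returns (score matrix, aligned s, aligned t)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * indel  # first column: s aligned against gaps
    for j in range(1, cols):
        score[0][j] = j * indel  # first row: t aligned against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # diagonal: align two characters
                score[i - 1][j] + indel,    # down: space in t
                score[i][j - 1] + indel,    # right: space in s
            )

    # Backtrack from the bottom right corner to recover one optimal alignment.
    i, j = len(s), len(t)
    out_s, out_t = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return score, "".join(reversed(out_s)), "".join(reversed(out_t))


matrix, aligned_s, aligned_t = needleman_wunsch("GCTC", "CGTTC")
for row in matrix:
    print(row)
print(aligned_s)  # -GCTC
print(aligned_t)  # CGTTC
```

With S = GCTC and T = CGTTC the bottom right cell is 4, and the recovered alignment has one leading gap in S, three matches, and one mismatch (-1 + 2 - 1 + 2 + 2 = 4).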
78
Modifications for Local Alignment
  • Allow the algorithm to restart whenever it is
    advantageous to do so (start the algorithm from
    any position in S or T)
  • If newScore < 0, set the score for cell (i, j) to 0
  • The optimal score is now the maximum value over all
    cells of the matrix (stop at any position in S or
    T)
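A minimal sketch of these two modifications follows, reusing the slide 10 sequences and scoring function. The function name, the tie-breaking order, and the choice to return only the first maximum-scoring cell are assumptions made for illustration.

```python
def smith_waterman(s, t, match=2, mismatch=-1, indel=-1):
    """Local alignment: returns (best score, aligned part of s, aligned part of t)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                0,                          # restart: never go below zero
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + indel,
                score[i][j - 1] + indel,
            )
            if score[i][j] > best:          # optimum may be in any cell
                best, best_pos = score[i][j], (i, j)

    # Backtrack from the best cell until a zero cell is reached.
    i, j = best_pos
    out_s, out_t = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + indel:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))


print(smith_waterman("GCTC", "CGTTC"))  # (5, 'GCTC', 'GTTC')
```

For these inputs the local score (5) exceeds the global score (4) because the local alignment is free to skip the leading C of T instead of paying for a gap.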

79
Other Modifications
  • Use a gap penalty function that scores one large
    gap differently from many gaps of size 1
  • Biological motivations (e.g., mutations, cDNA
    matching)
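One common form of gap penalty function is the affine gap cost, with a charge for opening a gap plus a charge per gap position. The sketch below assumes the convention cost = open + extend * length; other conventions count the first position inside the opening cost. The function names are hypothetical, and the default values 11 and 1 simply echo the -G and -E defaults listed on the Advanced BLAST Options slide.

```python
def linear_gap_cost(length, per_position=1):
    """Every gap position costs the same, however the gaps are grouped."""
    return per_position * length


def affine_gap_cost(length, open_cost=11, extend_cost=1):
    """Assumed convention: opening charge plus a per-position extension charge."""
    if length == 0:
        return 0
    return open_cost + extend_cost * length


# One gap of length 5 versus five separate gaps of length 1.
print(linear_gap_cost(5), 5 * linear_gap_cost(1))  # 5 5
print(affine_gap_cost(5), 5 * affine_gap_cost(1))  # 16 60
```

Under the linear cost the two cases are indistinguishable; under the affine cost a single long gap is much cheaper than many short ones.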

80
BLAST
  • Basic Local Alignment Search Tool
  • http://www.ncbi.nlm.nih.gov/BLAST/
  • Features
  • Finds regions of local similarity between
    sequences
  • Heuristic approach achieves efficiency
    (important when searching entire databases of
    sequences)
  • Computes statistical significance of matches
  • Uses
  • Infer evolutionary/functional relationships
  • Identify members of gene families

81
BLAST Algorithm
  • Three stages
  • Find hotspots: exact matches of word length W in
    the two sequences being considered (idea: good
    alignments will share regions of similarity, so
    find those first)
  • Extend hotspots in both directions using ungapped
    alignment to increase the alignment score; pass
    high-scoring sequences to stage 3
  • Perform gapped alignment between the two sequences
    using a variation of the Smith-Waterman algorithm.
    Only statistically significant alignments are
    displayed to the user.
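The first two stages can be sketched as follows. This is a simplified illustration, not BLAST's actual implementation: the function names, the match/mismatch scores, and the x_drop stopping rule (give up once the running score falls more than x_drop below the best seen) are assumptions chosen to keep the example short.

```python
from collections import defaultdict


def find_seeds(query, subject, w):
    """Stage 1: (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Stage 2: grow a seed left and right without gaps; keep the best-scoring span."""
    best = w * match
    left = right = 0

    running, k = best, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        running += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if running > best:
            best, right = running, k
        elif best - running > x_drop:
            break

    running, k = best, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        running += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if running > best:
            best, left = running, k
        elif best - running > x_drop:
            break

    # (score, (query start, subject start, length of the ungapped segment))
    return best, (i - left, j - left, w + left + right)


seeds = find_seeds("ACGTTGCA", "TTACGTTGCC", w=4)
print(seeds)                                                    # [(0, 2), (1, 3), (2, 4), (3, 5)]
print(extend_ungapped("ACGTTGCA", "TTACGTTGCC", 0, 2, w=4))     # (7, (0, 2, 7))
```

Segments whose score clears a threshold would then be handed to a gapped aligner for stage 3.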

82
BLAST Input
  • FASTA format
    >gi|532319|pir|TVFV2E|TVFV2E envelope protein
    ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
    NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
    TVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWCHFPS
    NWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPE
    TANLWFNCHGEFFYCKMDWFLNYLNNLTVDADHNECKNTS
    GTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKKTYAPPRE
    GHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
    KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXXXXXXX
    XXXXXXXXXXXVQSQHLLAGILQQQKNL
    LAAVEAQQQMLKLTIWGVK
  • Accession/GI number
  • Found using GenBank
  • In the FASTA example, the GI number is 532319
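A FASTA record is a header line beginning with ">" followed by one or more sequence lines. The sketch below parses such text and pulls the GI number out of the header; it assumes the header uses "|" as the field separator with "gi" as the first field, and the function names are hypothetical. Only the start of the example sequence is used to keep the snippet short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gi_number(header):
    """Return the GI number from a 'gi|<number>|...' header, or None."""
    fields = header.split("|")
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None


example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
"""
header, sequence = parse_fasta(example)[0]
print(gi_number(header))  # 532319
print(len(sequence))      # 83
```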

83
BLAST Input
84
BLAST Options
  • Select Program
  • blastp, blastn, etc
  • Select database(s) to search
  • nr (default): contains GenBank, PDB, SwissProt,
    and others
  • Gapped/Ungapped Alignment
  • Search within certain organism

85
BLAST Options (Cont.)
  • Filtering on/off
  • On by default; locates low-complexity regions in
    a sequence and removes them before performing an
    alignment
  • Low-complexity region: a region with highly
    biased amino acid composition
  • E-value threshold
  • Default: 10; represents the number of hits one
    can expect to find by chance when searching the
    database
  • Substitution matrix
  • Default: BLOSUM62
  • Assigns each aligned pair of residues a score
    reflecting how often that substitution is
    observed to occur
  • Other matrices are supported, including PAM
    matrices
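The E-value mentioned above is usually computed with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length, and n the database length. The sketch below is illustrative only: the function names are hypothetical, and the values of lambda and K passed in the example are placeholder assumptions (in practice they depend on the scoring matrix and gap costs).

```python
import math


def expected_hits(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul E-value: chance hits expected with at least this score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder statistical parameters, assumed for illustration.
LAM, K = 0.267, 0.041

for s in (40, 60, 80):
    e = expected_hits(s, query_len=250, db_len=1_000_000, lam=LAM, k=K)
    print(s, round(bit_score(s, LAM, K), 1), f"{e:.3g}")
```

Because the score appears in the exponent, each additional point of raw score shrinks the E-value by a constant factor, which is why a threshold of 10 is permissive and real matches typically have E-values many orders of magnitude smaller.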

86
BLAST Options
87
Advanced BLAST Options
  • -G: cost to open a gap (Integer); default 11
  • -E: cost to extend a gap (Integer); default 1
  • -e: expectation value, E (Real); default 10.0
  • -W: word size; default 11 for blastn, 3 for other
    programs
  • -v: number of one-line descriptions, V (Integer);
    default 100
  • -b: number of alignments to show, B (Integer);
    default 100
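The single-letter flags above appear to match the legacy blastall command line. As a hedged sketch, the helper below builds an argument list for the newer BLAST+ blastp program using its long option names (-gapopen, -gapextend, -evalue, -word_size, -num_descriptions, -num_alignments); the helper name, the file name, and the pairing of legacy flags to BLAST+ options are assumptions made here, and the command is only constructed, not executed.

```python
def build_blastp_command(query_path, database, evalue=10.0, word_size=3,
                         gap_open=11, gap_extend=1,
                         num_descriptions=100, num_alignments=100):
    """Build (but do not run) a BLAST+ blastp argument list."""
    return [
        "blastp",
        "-query", query_path,                        # FASTA file with the query
        "-db", database,                             # e.g. "nr"
        "-evalue", str(evalue),                      # legacy -e
        "-word_size", str(word_size),                # legacy -W
        "-gapopen", str(gap_open),                   # legacy -G
        "-gapextend", str(gap_extend),               # legacy -E
        "-num_descriptions", str(num_descriptions),  # legacy -v
        "-num_alignments", str(num_alignments),      # legacy -b
    ]


# "query.fasta" is a hypothetical file name.
print(" ".join(build_blastp_command("query.fasta", "nr")))
```

On a machine with BLAST+ and a local database installed, the returned list could be passed to subprocess.run.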

88
BLAST Output
  • Request ID
  • Query Information
  • Database Information
  • Taxonomy Reports Link
  • Graphical Display of alignments
  • Description of significant alignments
  • Pairwise alignments

89
BLAST Output Cont
90
Taxonomy Reports
  • Lineage Report
  • Hierarchical tree structure representing how many
    hits occurred in each group
  • 'focused' on the organism which yielded the
    strongest BLAST hit
  • Organism Report
  • Groups hits by species
  • Taxonomy Report
  • Summary of relationships between organisms in
    BLAST hit list

91
Graphical Display of Alignments
  • Displays the top 100 sequence alignments for a
    search by default
  • Thick red bar at top represents query sequence,
    numbers correspond to amino acid residues
  • Hits represented by colored bars, mouse over the
    bar to view the definition and score in the text
    box, click to go to pairwise alignment
  • Bar color represents alignment similarity score
  • Color Key given above query sequence to determine
    ranges of similarities for a particular color

92
Graphical Display of Alignments
93
Description of Significant Alignments
  • Listed in order of decreasing significance
  • Default number displayed: 100

94
Pairwise Alignments
95
BLAST Demonstration
  • >gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
    MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKK
    RDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKV
    KDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKK
    SNKPVLVVKRKNS
  • http://www.ncbi.nlm.nih.gov/BLAST/

96
References
  • 1. Altschul, SF, W Gish, W Miller, EW Myers, and DJ
    Lipman. Basic local alignment search tool. J Mol
    Biol 215(3):403-10, 1990.
  • 2. BLAST Tutorials
  • http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
  • http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/blast.shtml
  • 3. http://wikipedia.org
  • 4. Hatzivassiloglou, V.
    http://www.hlt.utdallas.edu/%7Evh/Courses/Fall06/Lectures/Alignment%20part%203.ppt