BLAST II - PowerPoint PPT Presentation

Title:

BLAST II

Provided by: jimf98

Transcript and Presenter's Notes

Title: BLAST II


1
BLAST (II)
Basic Local Alignment Search Tool
Email: jimfann_at_itri.org.tw 03/11/2008
2
Reference Sources
  • Jian Ye, Scott McGinnis, and Thomas L. Madden
    (2006) "BLAST: improvements for better sequence
    analysis." Nucleic Acids Res. Jul 1; 34 (Web
    Server issue): W6-W9.
  • McGinnis, S., Madden, T.L. (2004) "BLAST: at the
    core of a powerful and diverse set of sequence
    analysis tools." Nucleic Acids Res. Jul 1; 32
    (Web Server issue): W20-5.
  • Altschul, S.F., Gish, W., Miller, W., Myers,
    E.W., Lipman, D.J. (1990) "Basic local alignment
    search tool." J. Mol. Biol. 215: 403-410.
  • Altschul, S.F., Madden, T.L., Schaffer, A.A.,
    Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.
    (1997) "Gapped BLAST and PSI-BLAST: a new
    generation of protein database search programs."
    Nucleic Acids Res. 25: 3389-3402.
  • http://www.ncbi.nlm.nih.gov/BLAST/
  • ftp://ftp.ncbi.nih.gov/blast/
  • Joseph Bedell, Ian Korf, Mark Yandell (2003)
    BLAST. O'Reilly.
    http://www.oreilly.com/catalog/blast/
  • Jonathan Pevsner (2003) Bioinformatics and
    Functional Genomics. John Wiley & Sons, Inc.
    http://www.bioinfbook.org
3
Contents
  • blastp
  • Protein-protein BLAST
  • PSI-BLAST
  • Position-Specific Iterated BLAST
  • PSSM/profile
  • PHI-BLAST
  • Pattern-Hit Initiated BLAST
  • pattern/motif
  • Choose the BLAST program
  • Tips to improve BLAST searches
  • Stand-alone BLAST

4
BLAST programs
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome
5
blastp
6
blastp
  • Enter Query Sequence
  • Choose Search Set
  • Program Selection
  • Algorithm parameters
  • General Parameters
  • Scoring Parameters
  • Filters and Masking

7
Peptide Sequence Databases (FASTA format)
8
BLAST programs
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome
9
Consensus sequences - Patterns - PSSM
  • Multiple sequence alignment (MSA) to detect
    conserved regions in protein or DNA sequences and
    to build models of these conserved regions
  • Consensus sequences
  • Patterns
  • Position Specific Score Matrices (PSSMs),
    Profiles
  • etc.

10
Consensus sequences
  • the simplest way to build a model from a
    multiple sequence alignment
  • Majority wins: each column is represented by its
    most frequent residue
  • Drawback: discards too much of the variation
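
The majority rule can be sketched in a few lines of Python. This is only an illustration: the function name, the 50% cutoff, and the toy alignment are assumptions, not part of the slides.

```python
from collections import Counter

def consensus(alignment, min_fraction=0.5, unknown="x"):
    """Majority-rule consensus of equal-length aligned sequences.

    min_fraction and the 'x' placeholder are illustrative choices.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        # Keep the most frequent residue only if it is frequent enough.
        result.append(residue if count / len(column) >= min_fraction else unknown)
    return "".join(result)

# Hypothetical toy alignment, one string per aligned sequence.
toy_alignment = ["ACDEF", "ACDKF", "ACEKF", "SCDHF"]
print(consensus(toy_alignment))
```

The fourth column (E, K, K, H) shows the weakness noted above: the consensus keeps a single letter and says nothing about the other residues seen at that position.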

11
Pattern
  • a set of alternative sequences, using regular
    expression
  • Prosite (http://www.expasy.org/prosite/)

12
The Prosite syntax for patterns
  • uses the standard IUPAC one-letter codes for
    amino acids (G=Gly, P=Pro, ...),
  • each element in a pattern is separated from its
    neighbor by a -,
  • the symbol x is used where any amino acid is
    accepted,
  • ambiguities are indicated by square brackets
    ([AG] means Ala or Gly),
  • amino acids that are not accepted at a given
    position are listed between a pair of curly
    brackets ({AG} means any amino acid except
    Ala and Gly),
  • repetitions are indicated between parentheses
    ([AG](2,4) means Ala or Gly between 2 and 4
    times, x(2) means any amino acid twice),
  • a pattern is anchored to the N-term and/or C-term
    by the symbols < and > respectively.

13
Pattern
  • <A-x-[ST](2)-x(0,1)-{V}
  • an Ala in the N-term,
  • followed by any amino acid,
  • followed by a Ser or Thr twice,
  • followed or not by any residue,
  • followed by any amino acid except Val.
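
Because a PROSITE pattern is a restricted regular expression, it can be translated mechanically. The sketch below is a minimal converter that handles only the syntax elements listed on the previous slide. The function name is hypothetical, and the example is written with full PROSITE punctuation as <A-x-[ST](2)-x(0,1)-{V}.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        # [..] alternatives and single letters are already valid regex.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
print(regex)
print(bool(re.search(regex, "AGSTLK")))   # toy peptide, made up for illustration
```

Real PROSITE entries use a few constructs not covered here (for example an anchor inside square brackets), so this should be treated as a teaching aid rather than a complete parser.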

14
PSSM (Position Specific Scoring Matrix)
15
PSSM (Position Specific Scoring Matrix)
  • set a pseudocount of 1
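
A minimal sketch of how a pseudocount of 1 enters a log-odds PSSM follows. The slide figures are not in the transcript, so the uniform background, the base-2 logarithm, the DNA alphabet, and the toy alignment are all assumptions made for illustration.

```python
import math

def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PSSM: one dict of residue scores per alignment column."""
    n_seqs = len(alignment)
    background = 1.0 / len(alphabet)      # assumption: uniform background
    pssm = []
    for column in zip(*alignment):
        scores = {}
        for residue in alphabet:
            # The pseudocount keeps unseen residues from having frequency 0,
            # which would otherwise give log(0).
            freq = (column.count(residue) + pseudocount) / (
                n_seqs + pseudocount * len(alphabet))
            scores[residue] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, segment):
    """Sum the column scores for a segment as long as the PSSM."""
    return sum(column[residue] for column, residue in zip(pssm, segment))

toy_alignment = ["ACGT", "ACGA", "ACCT", "TCGT"]   # hypothetical sites
pssm = build_pssm(toy_alignment)
print(pssm[0])
print(score(pssm, "ACGT"), score(pssm, "GTAC"))
```

In the first column, A occurs 3 times out of 4, so its frequency is (3 + 1) / (4 + 4) = 0.5 and its score is log2(0.5 / 0.25) = 1. G never occurs there, yet it still receives a finite score of -1 thanks to the pseudocount.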

16
PSSM (Position Specific Scoring Matrix)
17
PSSM (Position Specific Scoring Matrix)
18
PSI-BLAST
19
BLAST programs
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome
20
PSI-BLAST (Position-Specific Iterated BLAST)
21
PSI-BLAST
22
PSSM (NCBI)
  • To save a PSSM file
  • Run a protein BLAST search.
  • Check the PSI-BLAST box on formatting page.
  • Click the "Format" Button.
  • On the PSI-BLAST results page, click the "Run
    PSI-BLAST Iteration 2" button.
  • Now, on the Format page, select "PSSM" from the
    "Show" pull down menu.
  • Click "Format" button.
  • This will display text output with the
    ASCII-encoded PSSM. The "Save as..." option of
    the browser can be used to save this to a plain
    text file on your hard drive.
  • To use the PSSM in a new protein BLAST search
    against other databases
  • Copy the above PSSM from the browser
  • Open a new protein BLAST page
  • Paste the PSSM in the PSSM field in the page
  • provide the SAME query in the search box
  • select a different target database
  • click "BLAST" button to start the search
  • If the database is the same as when the PSSM was
    stored, you'll reproduce the iteration at which
    you saved the PSSM. A different database will
    yield a different hit list.

23
PSI-BLAST
24
PSI-BLAST (Position-Specific Iterated BLAST)
1. Select a query and search it against a protein database.
2. PSI-BLAST constructs a multiple sequence alignment, then creates a
   profile, or specialized position-specific scoring matrix (PSSM).
3. The PSSM is used as a query against the database.
4. PSI-BLAST estimates statistical significance (E values).
5. Repeat steps 3 and 4 iteratively, typically 5 times. At each new
   search, a new profile is used as the query.
From http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html
25
PSI-BLAST vs BLASTp
  • PSI-BLAST can find more distant homologs than
    a simple BLAST search.
  • PSI-BLAST uses two E-values:
  • the threshold E-value for the initial BLAST (-e
    option). The default is 10, as in standard
    BLAST.
  • the inclusion E-value for accepting sequences
    into the PSSM construction (-h option). The
    default is 0.005.
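
The roles of the two thresholds can be illustrated with a short sketch. The defaults are the values quoted on the slide; the function name, the hit names, and the use of "less than or equal" at the boundary are assumptions.

```python
REPORT_EVALUE = 10.0       # -e default quoted on the slide
INCLUSION_EVALUE = 0.005   # -h default quoted on the slide

def split_hits(hits, report=REPORT_EVALUE, include=INCLUSION_EVALUE):
    """Return (reported hits, hits accepted into the next PSSM).

    hits is a list of (name, e_value) pairs.
    """
    reported = [(name, e) for name, e in hits if e <= report]
    included = [(name, e) for name, e in reported if e <= include]
    return reported, included

# Hypothetical hit list for illustration only.
hits = [("seqA", 1e-30), ("seqB", 0.004), ("seqC", 0.5), ("seqD", 25.0)]
reported, included = split_hits(hits)
print([name for name, _ in reported])
print([name for name, _ in included])
```

Every sequence that builds the PSSM is also reported, but not the other way round: a hit such as seqC appears in the report without influencing the next iteration.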

26
PSI-BLAST advantages
  • Fast because of the BLAST heuristic.
  • Allows PSSM searches on large databases.
  • A particularly efficient algorithm for sequence
    weighting.
  • A very sophisticated statistical treatment of the
    match scores.
  • A single program.
  • User friendly interface.

27
PSI-BLAST pitfalls
  • Avoid sequences that are too similar to each
    other: they overfit the PSSM!
  • Can include false homologs! Check the matches
    carefully and include or exclude sequences
    based on biological knowledge.
  • The E-value reflects the significance of the
    match to the previous training set, not to the
    original sequence!
  • Choose your query sequence carefully.
  • Try the reverse experiment to confirm a match.

28
BLAST programs
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome
29
PHI-BLAST (Pattern-Hit Initiated BLAST)
30
PHI-BLAST (Pattern-Hit Initiated BLAST)
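PHI-BLAST reports only database sequences that
contain the user-supplied pattern and are also
similar to the query in the region around the
pattern. This dual requirement is intended to
reduce the number of database hits that contain
the pattern but are likely to have no true
homology to the query.
From http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html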
31
Choose the BLAST program
     Program   Query     Database
1    blastn    DNA       DNA
1    blastp    protein   protein
6    blastx    DNA       protein
6    tblastn   protein   DNA
36   tblastx   DNA       DNA
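
One reading of the leading numbers in the table (an interpretation, since the slide does not label them) is the number of reading-frame combinations compared: a translated DNA sequence is read in 6 frames, so translating both sides gives 6 x 6 = 36. The lookup below is a sketch under that assumption, and all names in it are hypothetical.

```python
# program: (query type, database type, query translated?, database translated?)
PROGRAMS = {
    "blastn":  ("DNA", "DNA", False, False),
    "blastp":  ("protein", "protein", False, False),
    "blastx":  ("DNA", "protein", True, False),
    "tblastn": ("protein", "DNA", False, True),
    "tblastx": ("DNA", "DNA", True, True),
}

def frame_comparisons(program):
    """Number of reading-frame combinations compared by a program."""
    _, _, query_translated, db_translated = PROGRAMS[program]
    return (6 if query_translated else 1) * (6 if db_translated else 1)

def choose_programs(query_type, db_type):
    """List the programs that accept this query/database combination."""
    return [name for name, (q, d, _, _) in PROGRAMS.items()
            if (q, d) == (query_type, db_type)]

for name in PROGRAMS:
    print(name, frame_comparisons(name))
print(choose_programs("DNA", "DNA"))
```

For a DNA query against a DNA database the lookup returns both blastn and tblastx; which one to use depends on whether the comparison should be made at the nucleotide or the protein level.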
32
Choose the BLAST program
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
33
Choose the BLAST program
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
34
Choose the BLAST program
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
36
BLAST searches
  • design experiment
  • query sequences
  • target databases
  • choose BLAST program
  • set parameters
  • run BLAST
  • data analysis

37
Tips to improve BLAST searches (1/3)
  • Don't use the default parameters
  • Treat BLAST searches as scientific experiments
  • Perform controls, especially in the twilight zone
  • View BLAST reports graphically
  • use the Karlin-Altschul equation to design
    experiments
  • when troubleshooting, read the footer first

(Bedell et al. 2003)
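
The Karlin-Altschul equation gives the expected number of chance alignments with raw score at least S as E = K m n exp(-lambda S), where m and n are the query and database lengths. The sketch below shows how it can be used when planning a search. The lambda and K values are illustrative numbers of the kind BLAST prints in its report footer, the lengths are hypothetical, and the effective-length corrections that BLAST applies are ignored.

```python
import math

def expected_hits(raw_score, m, n, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2, so E = m * n * 2**-S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def min_score_for_evalue(e_value, m, n, lam, k):
    """Solve the equation for S: raw score needed to reach a target E."""
    return math.log(k * m * n / e_value) / lam

# Illustrative values only (assumptions, not taken from the slides).
LAMBDA, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 50_000_000

print(expected_hits(100, QUERY_LEN, DB_LEN, LAMBDA, K))
print(bit_score(100, LAMBDA, K))
print(min_score_for_evalue(1e-3, QUERY_LEN, DB_LEN, LAMBDA, K))
```

Because E is proportional to m and n, the same alignment becomes less significant as the database grows, which is one reason to pick the smallest database that answers the question.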
38
Tips to improve BLAST searches (2/3)
  • know when to use complexity filters
  • mask repeats in genomic DNA
  • segment large genomic sequences
  • be skeptical of hypothetical proteins
  • expect contaminants in EST databases
  • use caution when searching raw sequencing reads
  • look for stop codons and frame-shifts to find
    pseudo-genes

(Bedell et al. 2003)
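
Segmenting a large genomic sequence can be as simple as cutting it into overlapping windows, so that a feature spanning a cut point is still fully contained in one piece. The window and overlap sizes below are arbitrary placeholders, not recommendations from the source.

```python
def segment(sequence, size=100_000, overlap=1_000):
    """Split a sequence into overlapping pieces.

    Returns (start offset, piece) pairs so hit coordinates can be mapped
    back to the original sequence by adding the offset.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    pieces = []
    for start in range(0, len(sequence), step):
        pieces.append((start, sequence[start:start + size]))
        if start + size >= len(sequence):
            break                      # the last piece reached the end
    return pieces

# Tiny demonstration with a made-up sequence and small window sizes.
for offset, piece in segment("ABCDEFGHIJ", size=4, overlap=1):
    print(offset, piece)
```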
39
Tips to improve BLAST searches (3/3)
  • consider using ungapped alignment for BLASTX,
    TBLASTN, and TBLASTX
  • look for gaps in coverage as a sign of missed
    exons
  • parse BLAST reports with BioPerl
  • perform pilot experiments
  • examine statistical outliers
  • how to lie with BLAST statistics

(Bedell et al. 2003)
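
The slide recommends BioPerl for parsing reports. As a language-neutral illustration of the same idea, here is a Python sketch (a swapped-in technique, not the BioPerl API) that reads BLAST's 12-column tab-separated tabular output, which blastall produces with -m 8. The field names are our own labels and the sample lines are made up.

```python
import csv
import io

FIELDS = ["query", "subject", "pct_identity", "length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]
INT_FIELDS = ("length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")

def parse_tabular(handle):
    """Yield one dict per hit from tab-separated BLAST tabular output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                   # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        yield hit

# Hypothetical report lines for illustration only.
sample = (
    "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n"
    "q1\ts2\t35.00\t120\t70\t4\t10\t125\t40\t160\t0.5\t30.4\n"
)
significant = [h for h in parse_tabular(io.StringIO(sample)) if h["evalue"] <= 1e-5]
print([h["subject"] for h in significant])
```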
40
Download from NCBI
Installing Stand-alone BLAST
The main advantage of stand-alone BLAST is the
ability to create your own BLAST databases.
  • Executables
  • ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
  • Databases (FASTA or preformatted)
  • ftp://ftp.ncbi.nlm.nih.gov/blast/db/

41
formatdb
  • formatdb - //to display arguments
  • formatdb -i ecoli.nt -p F -o T //E. coli DNA (-p F nucleotide input, -o T parse SeqIds and build an index)

Query > ABCD database \n
The smallest query/database
42
run blast
  • add path C:\blast
  • blastall - //to display options
  • blastall -p blastp -i query -d database -o output
  • blastall -p blastn -d ecoli.nt -i test.txt -o
    test.out
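
For scripted runs, the same command line can be assembled in Python. The sketch below only builds and prints the command, using the four flags shown on this slide (-p, -i, -d, -o). The helper name is hypothetical, and actually running the command requires blastall on the PATH and a database prepared with formatdb.

```python
import shlex

VALID_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}

def blastall_command(program, query, database, output):
    """Build the argument list for a blastall run."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    return ["blastall", "-p", program, "-i", query, "-d", database, "-o", output]

cmd = blastall_command("blastn", "test.txt", "ecoli.nt", "test.out")
print(shlex.join(cmd))

# To execute it, uncomment the following lines:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.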

43
Thank You!
44
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml
45
http://www.ncbi.nlm.nih.gov/blast/producttable.shtml