Stand alone BLAST on Linux

1
Stand alone BLAST on Linux
CBW Bioinformatics Vancouver 2004 Lab 4.1
Sohrab Shah, bioinformatics.ubc.ca, sohrab_at_bioinformatics.ubc.ca
Stephanie Minnema, University of Calgary
Will Hsiao, Simon Fraser University
2
Outline
  • What is stand alone BLAST?
  • Why stand alone BLAST?
  • Installing BLAST
  • Formatting databases for BLAST
  • Running stand alone BLAST searches
  • Changing parameters
  • Formatting BLAST output
  • Assignment

3
What is stand alone BLAST?
  • A local installation of the NCBI BLAST suite of
    programs
  • Requires CPU, disk and RAM
  • The same application that drives the NCBI WWW
    BLAST server
  • Software distribution and documentation available
    from
  • ftp://ftp.ncbi.nih.gov/blast/executables/release

4
Why stand alone BLAST?
  • Allows creation of custom databases
  • Specific data sets for specific tasks
  • Increase computational efficiency
  • Increase specificity of results
  • Secure querying
  • Important for IP protection: no internet traffic
  • Facilitates high-throughput analyses
  • No queues: only competing with internal users
  • Can automate searches

5
Some drawbacks
  • Often need significant hardware resources
  • Need to maintain the databases

6
Installing BLAST
  • The BLAST distribution
  • Point your browser to
  • ftp://ftp.ncbi.nih.gov/blast/executables/release
  • Mailing list
  • http://www.ncbi.nlm.nih.gov/mailman/listinfo/blast-announce
  • Distribution announcements
  • Bug reports/fixes

7
ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.6
We have already downloaded the distribution, but
this is the ftp directory
8
Installing BLAST
9
Unpack the distribution
Unzip the distribution
  • Helpful info
  • Standalone BLAST is distributed as a gzipped tar
    archive
  • The .gz file extension indicates that the file
    has been compressed with gzip, a standard Unix
    compression utility
  • The gunzip utility uncompresses the file
  • See > man gunzip for more info
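The command shown on this slide did not survive in the transcript. A minimal sketch of the step follows. The archive name blast-demo.tar.gz is a stand-in built on the spot so the example runs anywhere; it is not the real NCBI file name.

```shell
# Build a small stand-in for the downloaded archive (hypothetical name).
WORK=$(mktemp -d)
cd "$WORK"
echo "placeholder" > README.demo
tar -cf blast-demo.tar README.demo
gzip blast-demo.tar            # produces blast-demo.tar.gz

# The step from the slide: uncompress the .gz file, leaving a .tar file.
gunzip blast-demo.tar.gz
ls -l blast-demo.tar
```

After gunzip, the .gz file is gone and only the .tar file remains, ready for the tar step.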
10
Unpack the distribution
Untar the distribution
  • Helpful info
  • The .tar extension indicates that the file is a
    tape archive created with tar a standard Unix
    archiving tool
  • The tar command above extracts the archive into
    the current working directory
  • See > man tar for more info
  • x: extract
  • p: preserve permissions
  • f: file
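The tar command itself is missing from the transcript. Going by the three option letters listed above, it was probably of the form tar -xpf followed by the archive name. The sketch below uses a made-up archive (blast-demo.tar) so it can run without the real distribution.

```shell
# Create a tiny stand-in archive (hypothetical name and contents).
WORK=$(mktemp -d)
cd "$WORK"
mkdir src
echo "placeholder" > src/README.demo
tar -cf blast-demo.tar -C src README.demo

# x = extract, p = preserve permissions, f = read from the named file.
# Files are unpacked into the current working directory.
tar -xpf blast-demo.tar
ls -l
```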

11
List the contents of the distribution
  • A suite of tools for
  • running various blast searches
  • formatting and extracting sequences
  • Documentation
  • README.* files: read 'em!
  • Data files with scoring matrices
  • data

12
Configuring BLAST
  • We need to configure the system so the BLAST
    programs can function correctly
  • Set the PATH environment variable by editing
    ~/.bashrc

Save the file
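The line added to ~/.bashrc is not preserved in the transcript. A plausible version is sketched below. The install directory /home/guest/blast is an assumption inferred from the database path used later in the lab; adjust it to wherever the archive was unpacked.

```shell
# Assumed install location; override BLAST_HOME if yours differs.
BLAST_HOME="${BLAST_HOME:-/home/guest/blast}"

# Append the directory holding blastall, formatdb, etc. to the search path.
export PATH="$PATH:$BLAST_HOME"
```

With this in ~/.bashrc, every new shell can find the BLAST programs by name.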
13
Configuring BLAST
  • We need to set up a configuration file ~/.ncbirc
    to point to the data directory in the
    distribution
  • Open a file
  • emacs ~/.ncbirc
  • Save the file
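The contents of the configuration file are not shown in the transcript. The legacy NCBI toolkit reads an [NCBI] section with a Data entry, so the file likely looked like the one written below. The data path is an assumption based on the lab's /home/guest/blast layout, and NCBIRC is a helper variable introduced here only so the target file can be changed.

```shell
# Where to write the config; defaults to the real location in your home directory.
NCBIRC="${NCBIRC:-$HOME/.ncbirc}"

# Point BLAST at the scoring matrices in the distribution's data directory.
# The path below is an assumed install location, not taken from the slides.
cat > "$NCBIRC" <<EOF
[NCBI]
Data=${BLAST_HOME:-/home/guest/blast}/data
EOF

cat "$NCBIRC"
```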

14
Exit the shell
  • Exit the shell
  • Start a new shell
  • When you start a new shell, your environment will
    be set up to run BLAST

15
Formatting the swissprot database for BLAST
  • Change directory to /home/guest/blast/db
  • View the contents of the directory
  • Unzip the swissprot database

16
View the contents of the swissprot database
17
FASTA format
>SOME DEFINITION OF THE SEQUENCE
ACGATCGACTACGATCAGCAGCATAGCTACAGATAG
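To get a feel for the format, it helps to look at the first few lines of a FASTA file and count its records. The sketch below does this on a two-record file invented for illustration; the file name and sequences are not from the lab.

```shell
WORK=$(mktemp -d)

# Two made-up records standing in for the real swissprot file.
cat > "$WORK/demo.fasta" <<'EOF'
>seq1 first made-up entry
ACGATCGACTACGATCAGCAGCATAGCTACAGATAG
>seq2 second made-up entry
MKTAYIAKQR
EOF

# Peek at the start of the file rather than printing all of it.
head -n 4 "$WORK/demo.fasta"

# Every record starts with a ">" definition line, so counting
# those lines counts the sequences.
NSEQ=$(grep -c '^>' "$WORK/demo.fasta")
echo "sequences: $NSEQ"
```

The same grep on the real database file gives a quick sanity check against the sequence count that formatdb reports in its log.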
18
FASTA -> BLASTable
  • FASTA formatted files cannot be searched directly
    by the BLAST programs
  • You need to prepare the FASTA files for BLAST
    with formatdb
  • This indexes the entries in the FASTA file and
    enables BLAST to run much faster

19
formatdb
  • Formats FASTA formatted databases for BLAST

20
Formatting swissprot
  • Format the swissprot database using formatdb
  • List the contents of the directory
  • The formatdb command will take a few minutes
  • Useful info
  • There should be seven files that are a
    combination of indexes and data
  • Note the formatdb.log file
  • View its contents with more formatdb.log
  • Ignore WARNING errors (potential bug in new
    release)
  • You should see "Formatted 143046 sequences in
    volume 0" as the last line in the file
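The formatdb command line is missing from the transcript. A typical invocation for a protein FASTA file is sketched below using the legacy options -i (input file), -p T (protein) and -o T (parse sequence IDs and build indexes). The input name swissprot and the choice of -o T are assumptions, and the command only runs if formatdb and the input file are actually present.

```shell
DB="${DB:-swissprot}"                       # assumed name of the unzipped FASTA file
FORMATDB_CMD="formatdb -i $DB -p T -o T"    # -p T: protein, -o T: index sequence IDs

if command -v formatdb >/dev/null 2>&1 && [ -f "$DB" ]; then
    # Unquoted on purpose so the string splits into command and arguments
    # (breaks on paths containing spaces).
    $FORMATDB_CMD
    ls -l "$DB".*              # index and data files written next to the input
    tail -n 1 formatdb.log     # last log line reports how many sequences were formatted
else
    echo "formatdb or $DB not found; would run: $FORMATDB_CMD"
fi
```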

21
formatdb documentation
22
Running BLAST - parameters
23
Running BLAST - parameters
24
Running BLAST - parameters
25
Running BLAST - parameters
26
Running BLAST: try it
  • Change directory to /home/guest/Lab4.1
  • List the contents
  • Useful info
  • bact_genome.fna: 12 kb of genomic sequence of
    Pseudomonas aeruginosa for the assignment
  • hs_tryp_trna_synth.aa: human tryptophanyl tRNA
    synthetase, for trying PSI-BLAST on the command
    line
  • test_blast.aa: test protein for trying blastp and
    rpsblast
  • unknown1.aa: mystery protein for the assignment
  • unknown2.aa: mystery protein for the assignment

27
Running BLAST: try it
Run the blastall command below. What will this
command do?
What is the protein in test_blast.aa? Repeat the
search with a higher e-value cut-off (10). How
does the output change?
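The blastall command from this slide is not preserved. A hedged reconstruction is given below using the legacy options -p (program), -d (database), -i (query), -e (e-value cut-off) and -o (output file). The database path, the starting cut-off of 1e-5 and the output file name are assumptions; the search only runs if blastall and the query file are available.

```shell
DB="${DB:-/home/guest/blast/db/swissprot}"   # assumed path to the formatted database
QUERY="${QUERY:-test_blast.aa}"
EVALUE="${EVALUE:-1e-5}"                      # assumed starting cut-off; the slide's value is unknown
BLAST_CMD="blastall -p blastp -d $DB -i $QUERY -e $EVALUE -o test_blast.out"

if command -v blastall >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Unquoted on purpose so the string splits into command and arguments.
    $BLAST_CMD
    more test_blast.out
else
    echo "blastall or $QUERY not found; would run: $BLAST_CMD"
fi
```

To repeat the search with the higher cut-off, set EVALUE=10 before running the same lines and compare the two hit lists.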
28
BLAST output
29
BLAST output
30
rpsblast
  • Reverse Position Specific BLAST
  • Query: protein sequence
  • Database: domains
  • We have installed Pfam on your laptop
  • http://pfam.wustl.edu/
  • Other domain databases: SMART
  • http://smart.embl-heidelberg.de/
  • CDD
  • http://www.ncbi.nih.gov/Structure/cdd/cdd.shtml
  • For creating local blastable domain databases,
    consult
  • ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/README

31
Running rpsblast
32
Run rpsblast
  • Search test_blast.aa against Pfam
  • Produce HTML output with -T
  • Open the results in your browser

What domains are present?
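The rpsblast command is not in the transcript. One way the search could be written with the legacy options -i (query), -d (database), -T T (HTML output) and -o (output file) is sketched below. The database name Pfam and the output file name are assumptions; use the path where the Pfam profiles were installed on your machine.

```shell
RPS_DB="${RPS_DB:-Pfam}"            # assumed name of the local domain database
QUERY="${QUERY:-test_blast.aa}"
RPS_CMD="rpsblast -i $QUERY -d $RPS_DB -T T -o test_blast_rps.html"

if command -v rpsblast >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Unquoted on purpose so the string splits into command and arguments.
    $RPS_CMD
    echo "Open test_blast_rps.html in your browser"
else
    echo "rpsblast or $QUERY not found; would run: $RPS_CMD"
fi
```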
33
rpsblast output
34
Running psiblast
  • Preferred option when dealing with an unknown
    protein
  • or trying to find distant homologues
  • Much more sensitive than blastall
  • Less specific with each iteration
  • Use blastpgp to run psiblast on the command line

35
blastpgp parameters
36
blastpgp parameters
37
blastpgp parameters
38
blastpgp parameters
39
Running psiblast (blastpgp)
  • Search swissprot with human tryp tRNA synthetase
    using psiblast with 4 iterations. Generate HTML
    output

How does the hit list change with each
iteration? How can the matrix.ctx file be used in
downstream analysis?
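The blastpgp command is also missing from the transcript. A sketch under stated assumptions: -j 4 sets four iterations, -T T requests HTML, and -C writes a checkpoint file, which is presumably how matrix.ctx was produced. The database path and output file name are assumptions.

```shell
DB="${DB:-/home/guest/blast/db/swissprot}"   # assumed path to the formatted database
QUERY="${QUERY:-hs_tryp_trna_synth.aa}"
PSI_CMD="blastpgp -i $QUERY -d $DB -j 4 -T T -C matrix.ctx -o psiblast.html"

if command -v blastpgp >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Unquoted on purpose so the string splits into command and arguments.
    $PSI_CMD
    echo "Open psiblast.html in your browser"
else
    echo "blastpgp or $QUERY not found; would run: $PSI_CMD"
fi
```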
40
psiblast results
41
Further information
  • Consult README files in BLAST distribution

42
Summary
  • A standalone BLAST server enables custom, secure,
    high throughput searches
  • BLAST distribution available from
  • ftp://ftp.ncbi.nih.gov/blast/executables/release
  • Use command line parameters to tune your
    searches and format your results
  • Use different BLAST tools for different purposes
  • Regular (blastall: blastp, blastn, blastx,
    tblastn, tblastx)
  • Searching for domains (rpsblast: CDD search)
  • Distant homologues (blastpgp: PSI/PHI-BLAST)

43
Assignment
  • Four questions
  • Running
  • blastp: identify a protein
  • rpsblast: search for domains in a protein
  • blastx: annotate a genomic sequence
  • psiblast: find a function for an unknown protein
  • Some searches may take a few minutes
  • Where applicable, report the e-value of hits,
    their locations on the query sequence, and the
    command you used to run the search
  • No longer than 2 printed pages
  • Submit to Saara by Fri 9am