Bioperl modules

Provided by: xiaodo5
1
Bioperl modules
2
Object Oriented Programming in Perl (1)
  • Defining a class
  • A class is simply a package with subroutines that
    function as methods.

#!/usr/local/bin/perl
package Cat;
sub new { ... }
sub meow { ... }
3
Object Oriented Programming in Perl (2)
  • Perl Object
  • To instantiate an object from a class, call the
    class's new method.

$new_object = new ClassName;
  • Using Method
  • To use the methods of an object, use the ->
    operator.

$cat->meow();
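
The two slides above can be combined into one small script. This is a minimal sketch, not the presenter's code: the name attribute, the constructor arguments, and the text returned by meow are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A class is just a package; its subroutines act as methods.
package Cat;

# Constructor: the class name arrives as the first argument.
sub new {
    my ($class, %args) = @_;
    # 'name' is a hypothetical attribute, not taken from the slides.
    my $self = { name => $args{name} || 'unnamed' };
    return bless $self, $class;    # associate the hash reference with the class
}

# Instance method: the object itself arrives as the first argument.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package main;

# Create an object, then call a method on it with the -> operator.
my $cat = Cat->new(name => 'Tom');
print $cat->meow(), "\n";
```

The sketch writes Cat->new(...) rather than the indirect form new Cat(...); both call the same constructor, and the arrow form is generally the less ambiguous of the two.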
4
Object Oriented Programming in Perl (3)
  • Inheritance
  • Declare a class array called @ISA.
  • This array stores the names of the parent
    class(es) of the new species.

package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new { ... }
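
As a rough illustration of how @ISA drives method lookup, the sketch below lets NorthAmericanCat inherit both new and meow from Cat. The method bodies and the habitat method are hypothetical and only serve the example.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class) = @_;
    return bless {}, $class;    # $class is whichever class the call was made on
}

sub meow { return 'meow' }

package NorthAmericanCat;

# List the parent class(es); methods not found here are looked up in Cat.
our @ISA = ('Cat');

# Hypothetical method that exists only in the subclass.
sub habitat { return 'North America' }

package main;

my $cat = NorthAmericanCat->new;    # new() is inherited from Cat
print ref($cat), "\n";              # class the object was blessed into
print $cat->meow, "\n";             # inherited method
print $cat->habitat, "\n";          # subclass method
print $cat->isa('Cat') ? "is a Cat\n" : "is not a Cat\n";
```

Because the inherited constructor blesses into the class it was called on, the object belongs to NorthAmericanCat even though new is defined only in Cat.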
5
Perl Modules
  • A Perl module is a reusable package defined in a
    library file whose name is the same as the name
    of the package.

6
Names of perl modules
  • Each Perl module has a unique name.
  • To minimize name space collision, Perl provides a
    hierarchical name space for modules.
  • Components of a module name are separated by
    double colons (::).
  • For example,
  • Math::Complex
  • Math::Approx
  • String::BitCount
  • String::Approx

7
Module files
  • Each module is contained in a single file.
  • Module files are stored in a subdirectory
    hierarchy that parallels the module name
    hierarchy.
  • All module files have an extension of .pm.

Module           Is stored in
Config           Config.pm
Math::Complex    Math/Complex.pm
String::Approx   String/Approx.pm
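
The mapping in the table can be sketched in a few lines of Perl. The helper name module_to_path is hypothetical; it only mirrors the rule stated above (replace each :: with a directory separator and append .pm).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a module name into its relative file path.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;    # Math::Complex becomes Math/Complex
    return "$path.pm";
}

# Print the same three rows as the table on the slide.
for my $module (qw(Config Math::Complex String::Approx)) {
    printf "%-16s %s\n", $module, module_to_path($module);
}
```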
8
Module libraries
  • The Perl interpreter has a list of directories in
    which it searches for modules.
  • Global array @INC
  • > perl -V
  • @INC:
  • /usr/local/lib/perl5/5.00503/sun4-solaris
  • /usr/local/lib/perl5/5.00503
  • /usr/local/lib/perl5/site_perl/5.005/sun4-solaris
  • /usr/local/lib/perl5/site_perl/5.005
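
The search path can also be inspected from inside a script. The sketch below prints @INC and then walks it to see where one module file would be found; the helper find_module_file is hypothetical, and File::Spec is used as the lookup target only because it ships with Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Show every directory the interpreter searches, in order.
print "$_\n" for @INC;

# Hypothetical helper: return the first file in @INC matching a relative path.
sub find_module_file {
    my ($relative_path) = @_;
    for my $dir (@INC) {
        next if ref $dir;    # @INC may also hold code-reference hooks
        my $candidate = File::Spec->catfile($dir, $relative_path);
        return $candidate if -f $candidate;
    }
    return;                  # not found
}

my $found = find_module_file('File/Spec.pm');
print defined $found ? "File::Spec found at $found\n" : "File::Spec not found\n";
```

The directories printed will differ from the Solaris paths on the slide, since they depend on the Perl version and platform.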

9
Using Modules
  • A module can be loaded by calling the use
    function.
  • use Foo;
  • bar( $a );   # using bar method
  • blat( $b );  # using blat method
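
Since Foo, bar and blat are placeholders, here is the same pattern with a real module. This is only an illustrative sketch: List::Util is a core Perl module, and the list of numbers is made up.

```perl
use strict;
use warnings;

# Load the module and import two of its functions into this namespace.
use List::Util qw(sum max);

my @lengths = (120, 450, 87);    # made-up sequence lengths

my $total   = sum(@lengths);     # using the sum function
my $longest = max(@lengths);     # using the max function

print "total=$total longest=$longest\n";
```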

10
Bioperl toolkit
  • Core package (bioperl-live)
  • The basic package; it is required by all the
    other packages
  • Run package (bioperl-run)
  • Provides wrappers for executing some 60 common
    bioinformatics applications
  • DB package (bioperl-db)
  • Subproject to store sequence and annotation data
    in a BioSQL relational database
  • Network package (bioperl-network)
  • Parses and analyzes protein-protein interaction
    data
  • Dev package (bioperl-dev)
  • New and exploratory bioperl development

11
(No Transcript)
12
Bioperl Object-Oriented
  • Bioperl takes advantage of OO design to create a
    consistent, well-documented object model for
    interacting with biological data in the life
    sciences.
  • Bioperl Name space
  • The Bioperl package installs everything in the
    Bio:: namespace.
  • (Where are the packages stored?)

13
Bioperl Objects
  • Sequence handling objects
  • Sequence objects
  • Alignment objects
  • Location objects
  • Other Objects
  • 3D structure objects, tree objects and
    phylogenetic trees, map objects, bibliographic
    objects and graphics objects

14
Sequence handling
  • Typical sequence handling tasks
  • Access the sequence
  • Format the sequence
  • Sequence alignment and comparison
  • Search for similar sequences
  • Pairwise comparisons
  • Multiple alignment

15
Sequence Annotation
  • Bio::SeqFeature: A Seq object can have multiple
    sequence feature (SeqFeature) objects (e.g. Gene,
    Exon, or Promoter objects) associated with it.
  • Bio::Annotation: A Seq object can also have an
    Annotation object (used to store database links,
    literature references and comments) associated
    with it.

16
Sequence Input/Output
  • The Bio::SeqIO system was designed to make
    getting and storing sequences to and from the
    myriad of formats as easy as possible.

17
Accessing sequence data
  • Bioperl supports accessing remote databases as
    well as local databases.
  • Bioperl currently supports sequence data
    retrieval from the GenBank, GenPept, RefSeq,
    SwissProt, and EMBL databases.

18
Format the sequences
  • A SeqIO object can read a stream of sequences in
    one format (Fasta, EMBL, GenBank, Swissprot, PIR,
    GCG, SCF, phd/phred, Ace, or raw plain sequence),
    then write them to another file in another
    format.
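
A conversion of this kind might look like the sketch below. It assumes BioPerl is installed and is not the seq_formating.pl script used later in the presentation; the helper name convert_sequences and the argument order are assumptions made for illustration.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every sequence from one file and write it
# to another file in a different format. Returns the number converted.
sub convert_sequences {
    my ($in_file, $in_format, $out_file, $out_format) = @_;

    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);

    my $count = 0;
    while (my $seq = $in->next_seq) {    # one sequence object per record
        $out->write_seq($seq);
        $count++;
    }
    $out->close;                         # flush the output file
    return $count;
}

if (@ARGV == 4) {
    my $converted = convert_sequences(@ARGV);
    print "Converted $converted sequence(s)\n";
}
else {
    print STDERR "Usage: $0 <in_file> <in_format> <out_file> <out_format>\n";
}
```

Saved under a name of your choice, it could be run for example as perl convert.pl sequences.gb genbank sequences.fasta fasta. Format names are passed straight to Bio::SeqIO, so any format it supports should work.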

19
Manipulating sequence data
  • $seqobj->display_id();   # the human readable id of
    the sequence
  • $seqobj->subseq(5,10);   # part of the sequence as
    a string
  • $seqobj->desc();         # a description of the
    sequence
  • $seqobj->trunc(5,10);    # truncation from 5 to 10
    as new object
  • $seqobj->revcom;         # reverse complements sequence
  • $seqobj->translate;      # translation of the
    sequence
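
To try the methods listed above, a sequence object can be built by hand. This is a minimal sketch that assumes BioPerl is installed; the identifier, description, and 12-base sequence are made up for the example.

```perl
use strict;
use warnings;
use Bio::Seq;

# Build a small DNA sequence object (all values are illustrative).
my $seqobj = Bio::Seq->new(
    -display_id => 'example1',
    -desc       => 'made-up example sequence',
    -alphabet   => 'dna',
    -seq        => 'ATGGCCAAGTAA',
);

print $seqobj->display_id, "\n";          # the human readable id
print $seqobj->desc, "\n";                # the description
print $seqobj->subseq(5, 10), "\n";       # bases 5 to 10 as a plain string

my $piece = $seqobj->trunc(5, 10);        # bases 5 to 10 as a new object
print $piece->seq, "\n";

print $seqobj->revcom->seq, "\n";         # reverse complement (new object)
print $seqobj->translate->seq, "\n";      # protein translation (new object)
```

Note that subseq returns a string, while trunc, revcom and translate return new sequence objects, which is why the sketch calls seq on them before printing.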

20
Search result parsing
  • The Bio::SearchIO system was designed for
    parsing sequence database searches (BLAST, sim4,
    waba, FASTA, HMMER, exonerate, etc.)

21
Manipulating alignment
  • The Bio::AlignIO system was designed for
    manipulating alignment objects in different
    formats including aln, phylip, fasta, etc.

22
Example: Format the sequences
  • Example
  • Using seq_formating.pl to convert
    sequences.gb to another format

23
Copy the files to the current directory
Check whether the files are executable
Now, let's look at the GenBank file.
24
The home directory on a Windows system.
If you have Notepad++ installed, click Edit with
Notepad++. If not, try to open sequences.gb
with the Notepad program.
25
(No Transcript)
26
uncheck
27
The format of the input sequences.
28
The perl script file
29
(No Transcript)
30
If no arguments are supplied, usage
information appears with instructions.
31
<enter>
Program name
Format of the input sequences
Format of the output sequences
Input file
Output file
32
Program succeeded! Now it's time to look at the
file generated.
33
Use command prompt to run the script
34
Type cd<space>c:\BioDownload to enter the
BioDownload folder
35
  • Type dir to display the files in the current
    folder (NOT ls).
  • You should have the following files in the folder
    (you may have other files, but that's fine):
  • seq_formating.pl
  • sequences.gb.txt

36
Type perl<space>seq_formating.pl<space>sequences.gb.txt<space>genbank<space>sequences.fasta<space>fasta
37
Output file
38
The format of the output sequences.
39
Parsing the BLAST output
What's next