About This Presentation
Title: Intermediate Perl Programming
File properties: Author: Admin; Last modified by: Admin; Created: 7/15/2001 5:18:34 PM; Format: On-screen Show; Slides: 23

1
Intermediate Perl Programming
Todd Scheetz July 18, 2001
2
Review of Perl Concepts
Data Types: scalar ($), array (@), hash (%)
Input/Output:
    open(FILEHANDLE, $filename);
    $line = <FILEHANDLE>;
    print $line;
Arithmetic Operations: +, -, *, /, %, ** (and ! for logical negation)
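A minimal sketch of these review items in runnable Perl, assuming a perl recent enough for use strict and in-memory filehandles. It uses the three-argument open with a lexical filehandle rather than the two-argument bareword form above; read_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Sigils mark the three basic data types.
my $id         = 'Bt.1';                                  # scalar: one value
my @bases      = ('A', 'C', 'G', 'T');                    # array: ordered list
my %complement = (A => 'T', C => 'G', G => 'C', T => 'A'); # hash: key => value

# Hypothetical helper: read every line of a file into a list.
sub read_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Arithmetic (and logical negation) operators
my $sum       = 7 + 2;    # 9
my $remainder = 7 % 2;    # 1 (modulus)
my $power     = 2 ** 3;   # 8 (exponentiation)
my $negated   = !0;       # 1 (logical not)
```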
3
Review of Perl Concepts
Control Structures: if, if/else, if/elsif/else, foreach, for, while
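A short sketch that exercises each control structure on a toy DNA string. The classify_base helper and its 'strong'/'weak' labels are hypothetical illustrations, not something taken from the slides.

```perl
use strict;
use warnings;

# if/elsif/else: classify a single base (hypothetical example).
sub classify_base {
    my ($base) = @_;
    if ($base eq 'G' || $base eq 'C') {
        return 'strong';
    } elsif ($base eq 'A' || $base eq 'T') {
        return 'weak';
    } else {
        return 'unknown';
    }
}

# foreach: visit every element of a list.
my @seq = split //, 'GATTACA';
my %tally;
foreach my $base (@seq) {
    $tally{ classify_base($base) }++;
}

# for: C-style loop with an explicit counter.
my $total = 0;
for (my $i = 1; $i <= 4; $i++) {
    $total += $i;
}

# while: repeat until the condition becomes false.
my $n     = 10;
my $steps = 0;
while ($n > 1) {
    $n = int($n / 2);
    $steps++;
}
```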
4
Regular Expressions
A general approach to the problem of pattern matching: REs are a compact method for representing a set of possible strings without explicitly specifying each alternative. For this portion of the discussion, I will use { } to represent a set of strings: {A} holds the single string A, {A, AA} holds the strings A and AA, and Ø is the empty set.
5
Regular Expressions
In addition, | will be used to denote possible alternatives: A|B → {A, B}. With just these semantics available, we can begin building simple Regular Expressions:
(A|B)(A|B) → {AA, AB, BA, BB}
AA(A|B)BB → {AAABB, AABBB}
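To make the set notation concrete, here is a minimal Perl sketch that enumerates every string of As and Bs up to a given length and keeps the ones a pattern matches in full. language_of is a hypothetical helper; the pattern is wrapped in ^(?: ... )$ because Perl otherwise accepts a match anywhere inside the string.

```perl
use strict;
use warnings;

# Return every string over the alphabet {A, B} of length 0..$max_len
# that $pattern matches completely, in length-then-alphabetical order.
sub language_of {
    my ($pattern, $max_len) = @_;
    my @strings = ('');          # start with the empty string
    my @level   = ('');
    for (1 .. $max_len) {
        # extend every string of the previous length by A and by B
        @level = map { my $s = $_; map { $s . $_ } qw(A B) } @level;
        push @strings, @level;
    }
    return grep { /^(?:$pattern)$/ } @strings;
}

my @alt    = language_of('A|B', 3);          # A B
my @pairs  = language_of('(A|B)(A|B)', 3);   # AA AB BA BB
my @middle = language_of('AA(A|B)BB', 5);    # AAABB AABBB
```

The maximum length only bounds the enumeration; it must be at least as long as the longest string the pattern can produce for the printed set to be complete.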
6
Regular Expressions
Additional Regular Expression components:
*  0 or more of the preceding symbol
+  1 or more of the preceding symbol
A+ → {A, AA, AAA, ...}
A* → {ε, A, AA, AAA, ...}   (ε is the empty string)
AB* → {A, AB, ABB, ABBB, ...}
(A|B)* → {ε, A, B, AA, AB, BA, BB, AAA, ...}
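A sketch checking the * and + examples against a few candidate strings. full_match is a hypothetical helper that anchors the pattern so the whole string must match; it returns 1 or 0.

```perl
use strict;
use warnings;

# 1 if $pattern matches all of $string, otherwise 0.
sub full_match {
    my ($pattern, $string) = @_;
    return $string =~ /^(?:$pattern)$/ ? 1 : 0;
}

# A+ needs at least one A; A* also accepts the empty string.
my @plus   = map { full_match('A+', $_) }     ('', 'A', 'AAA', 'AB');  # 0 1 1 0
my @star   = map { full_match('A*', $_) }     ('', 'A', 'AAA', 'AB');  # 1 1 1 0
# AB* is an A followed by zero or more Bs.
my @abstar = map { full_match('AB*', $_) }    ('A', 'ABBB', 'BA');     # 1 1 0
# (A|B)* is any string made only of As and Bs, including the empty one.
my @any    = map { full_match('(A|B)*', $_) } ('', 'BABA', 'ABC');     # 1 1 0
```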
7
Regular Expressions
What if we want a specific number of iterations?
A{2,4} → {AA, AAA, AAAA}
(A|B){1,2} → {A, B, AA, AB, BA, BB}
What if we want any character except one?
[^A] → {B} (over the alphabet {A, B})
What if we want to allow any symbol?
. → {A, B}
.* → {ε, A, B, AA, AB, BA, BB, ...}
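The same hypothetical full_match helper can check counted repetition, a negated character class, and the dot. Note that in real Perl text [^A] matches any single character other than A (digits included), not only B, because the alphabet is no longer restricted to {A, B}.

```perl
use strict;
use warnings;

# 1 if $pattern matches all of $string, otherwise 0.
sub full_match {
    my ($pattern, $string) = @_;
    return $string =~ /^(?:$pattern)$/ ? 1 : 0;
}

# {2,4}: between two and four repetitions of the preceding item.
my @counted = map { full_match('A{2,4}', $_) } qw(A AA AAAA AAAAA);  # 0 1 1 0
# (A|B){1,2}: one or two symbols, each an A or a B.
my @pair    = map { full_match('(A|B){1,2}', $_) } qw(B AB ABA);     # 1 1 0
# [^A]: exactly one character that is not A.
my @not_a   = map { full_match('[^A]', $_) } qw(A B 7);              # 0 1 1
# . matches one character (except newline); .* also matches the empty string.
my @dot     = map { full_match('.', $_) } qw(A B AB);                # 1 1 0
my $dot_star_empty = full_match('.*', '');                           # 1
```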
8
Regular Expressions
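All of these operations are available in Perl, along with several shortcuts:
\d → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} (any digit)
\w → any word character (letter, digit, or underscore)
\s → any white-space character
\w+\s\w+ → {..., Hello World, ...}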
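A quick check of the shortcut classes on plain ASCII input; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# \d: a single digit, so '10' (two characters) and 'a' are rejected.
my @digits = grep { /^\d$/ } ('0' .. '9', 'a', '10');

# \w+\s\w+: a word, one white-space character, another word.
my $two_words  = ('Hello World'     =~ /^\w+\s\w+$/) ? 1 : 0;  # 1
my $three      = ('Hello big World' =~ /^\w+\s\w+$/) ? 1 : 0;  # 0
# \w includes the underscore and digits.
my $underscore = ('gene_1' =~ /^\w+$/) ? 1 : 0;                # 1
```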
9
Pattern Matching
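Perl supports built-in operations for pattern matching, substitution, and character replacement.
Pattern Matching
if ($line =~ m/Rn\.\d+/) { ... }
(The dot is escaped as \. because an unescaped . would match any character.)
In Perl, a RE succeeds if it matches any part of the string; it does not have to match the whole string. To tie a pattern to the ends of the string, use:
^ - beginning of string
$ - end of string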
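A sketch contrasting unanchored and anchored matches. The sample line is invented for illustration and is not real UniGene output.

```perl
use strict;
use warnings;

my $line = 'Cluster Rn.1234 expressed in liver';   # hypothetical input

my $anywhere = ($line =~ m/Rn\.\d+/)  ? 1 : 0;     # 1: found mid-string
my $at_start = ($line =~ m/^Rn\.\d+/) ? 1 : 0;     # 0: line starts with "Cluster"
my $whole    = ('Rn.1234' =~ m/^Rn\.\d+$/) ? 1 : 0; # 1: entire string matches
# An unescaped dot accepts any character, so this also matches.
my $loose    = ('RnX1234' =~ m/^Rn.\d+$/)  ? 1 : 0; # 1
```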
10
Pattern Matching
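Back references: after a successful match, the text matched by the first (...) group is available in $1.
if ($line =~ m/(Rn\.\d+)/) {
    $UniGene_label = $1;
}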
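A sketch of capturing with parentheses. The first part mirrors the single capture into $1; extract_labels is a hypothetical helper that adds the g modifier in a while loop to collect every label in a string, one capture per iteration.

```perl
use strict;
use warnings;

# Single capture: $1 holds the text matched by the first (...) group.
my $UniGene_label;
if ('Cluster Rn.1234 (liver)' =~ m/(Rn\.\d+)/) {
    $UniGene_label = $1;
}

# Hypothetical helper: collect every Rn.<digits> label in $text.
sub extract_labels {
    my ($text) = @_;
    my @labels;
    while ($text =~ m/(Rn\.\d+)/g) {   # each iteration resumes after the last match
        push @labels, $1;
    }
    return @labels;
}
```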
11
Regular Expressions
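use strict;
use warnings;

my $file = "my_fasta_file";
open(my $in, '<', $file) or die "Cannot open $file: $!";
my $line_count = 0;
while (my $line = <$in>) {
    $line_count++ if $line =~ m/^>/;    # each FASTA record starts with '>'
}
close($in);
print "There are $line_count FASTA sequences in $file.\n";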
12
Pattern Matching
UniGene data file:
ID          Bt.1
TITLE       Cow casein kinase II alpha
EXPRESS     placenta
PROTSIM     ORG=Caenorhabditis elegans
PROTSIM     ORG=Mus musculus; PROTGI=...
SCOUNT      2
SEQUENCE    ACC=M93665; NID=g162776
SEQUENCE    ACC=BF043619; NID=...
//
ID          Bt.2
TITLE       Bos taurus cyclin-dependent ...
13
Pattern Matching
Let's write a small Perl program to determine how many clusters there are in the Bos taurus UniGene file.
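One way to do this is sketched below, assuming every cluster record begins with a line of the form "ID <cluster>" as in the sample record layout. count_clusters is a hypothetical name, and it accepts anything the three-argument open can read (a file name or a reference to a string).

```perl
use strict;
use warnings;

# Count UniGene clusters by counting the "ID" line that opens each record.
sub count_clusters {
    my ($filename) = @_;
    open(my $in, '<', $filename) or die "Cannot open $filename: $!";
    my $clusters = 0;
    while (my $line = <$in>) {
        $clusters++ if $line =~ m/^ID\s+\S+/;
    }
    close($in);
    return $clusters;
}
```

Usage: print count_clusters('Bt.data'), " clusters\n"; where Bt.data is a placeholder for the actual Bos taurus .data file name. Counting the "//" terminators instead would work equally well if every record is terminated.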
14
Pattern Matching
Now we'll build a Perl program that writes an HTML file containing some basic links based on the Bos taurus UniGene clustering. Important: a GenBank record can be linked with
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=GID_HERE&dopt=GenBank
where GID_HERE is replaced by the sequence's GI number.
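A possible sketch of the link generator. It assumes SEQUENCE lines look like "SEQUENCE ACC=M93665; NID=g162776" and that the digits after the "g" in NID are the GI number to put in place of GID_HERE; both are assumptions to verify against the real file. unigene_links_html is a hypothetical name, and the page layout (a single bulleted list) is just one choice.

```perl
use strict;
use warnings;

# Entrez URL template; GID_HERE is replaced by a GI number.
my $URL_TEMPLATE = 'http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?'
                 . 'cmd=Retrieve&db=Nucleotide&list_uids=GID_HERE&dopt=GenBank';

# Read UniGene records from the open filehandle $in and return an HTML page
# with one link per SEQUENCE line, labelled with its cluster and accession.
sub unigene_links_html {
    my ($in) = @_;
    my $cluster = '';
    my @items;
    while (my $line = <$in>) {
        if ($line =~ m/^ID\s+(\S+)/) {
            $cluster = $1;                      # remember the current cluster
        } elsif ($line =~ m/^SEQUENCE\s+ACC=([^;\s]+);\s*NID=g(\d+)/) {
            my ($acc, $gi) = ($1, $2);          # ASSUMPTION: NID = "g" + GI
            (my $url = $URL_TEMPLATE) =~ s/GID_HERE/$gi/;
            $url =~ s/&/&amp;/g;                # escape & inside the attribute
            push @items, qq(<li>$cluster: <a href="$url">$acc</a></li>);
        }
    }
    return join("\n", '<html><body><ul>', @items, '</ul></body></html>') . "\n";
}
```

Usage: open a .data file, pass the filehandle, and print the returned string to an output file opened for writing.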
15
Substitution
Pattern matching is useful for counting or indexing items, but modifying the data requires substitution. Substitution searches a string for PATTERN and, if it is found, replaces it with REPLACEMENT:
$line =~ s/PATTERN/REPLACEMENT/;
The operator returns the number of substitutions made (at most 1 unless the g option is used), or false if PATTERN was not found:
$result = ($line =~ s/PATTERN/REPLACEMENT/);
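A sketch of the return value without any options: only the first occurrence is replaced, and a failed substitution returns false (the empty string) and leaves the string unchanged. The sample text is invented.

```perl
use strict;
use warnings;

my $line   = 'Rn.12 and Rn.34';
my $result = ($line =~ s/Rn\./Rat./);    # 1: only the first occurrence changes
# $line is now 'Rat.12 and Rn.34'
my $none   = ($line =~ s/Mm\./Mouse./);  # false: no match, $line unchanged
```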
16
Substitution
Substitution can take several options, specified after the final slash. The most useful are:
g - global (substitute at every location where the pattern matches, not just the first)
i - case-insensitive matching
$string = "One fish, Two fish, Red fish, Blue fish.";
$string =~ s/fish/dog/g;
print "$string\n";
Output: One dog, Two dog, Red dog, Blue dog.
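Combining both options: with g and i together, every occurrence is replaced regardless of case, and the return value counts all of them. The mixed-case sample sentence is made up for illustration.

```perl
use strict;
use warnings;

my $string = 'One Fish, two FISH, red fish.';
my $count  = ($string =~ s/fish/dog/gi);   # g: all occurrences, i: any case
# $string is now 'One dog, two dog, red dog.' and $count is 3
```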
17
Substitution
Example: removing leading and trailing white-space
$line =~ s/^\s*(.*?)\s*$/$1/;
A ? after a quantifier (here .*?) performs a minimal match: it stops at the first point where the remainder of the expression can be matched, so the final \s* receives the trailing white-space.
$line =~ s/^\s*(.*)\s*$/$1/;
This statement will not remove trailing white-space; instead, the greedy .* keeps it, leaving nothing for the final \s* to match.
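A sketch that wraps both versions in hypothetical helpers (trim and trim_greedy) so the difference is visible: the minimal version strips both ends, while the greedy version keeps the trailing blanks.

```perl
use strict;
use warnings;

# Minimal (non-greedy) capture: trailing white-space is left for \s*.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s*(.*?)\s*$/$1/;
    return $s;
}

# Greedy capture: (.*) swallows the trailing blanks before \s* can take them.
sub trim_greedy {
    my ($s) = @_;
    $s =~ s/^\s*(.*)\s*$/$1/;
    return $s;
}
```

For example, trim("  ACGT  ") gives "ACGT", whereas trim_greedy("  ACGT  ") gives "ACGT  ".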
18
Character Replacement
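A similar operation to substitution is character replacement (tr///), which replaces characters one-for-one in a single pass:
$line =~ tr/a-z/A-Z/;                # convert to upper case
$count_CG = ($line =~ tr/CG/CG/);    # count C and G characters; $line is unchanged
$line =~ tr/ACGT/TGCA/;              # complement a DNA sequence
The complement cannot be written as a series of substitutions:
$line =~ s/A/T/g;
$line =~ s/C/G/g;
$line =~ s/G/C/g;
$line =~ s/T/A/g;
Each s/// sees the output of the previous one, so every A and T ends up as A, and every C and G ends up as C.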
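A sketch comparing the two approaches on a short made-up sequence: tr/// counts and complements correctly in one pass, while the chained substitutions undo each other.

```perl
use strict;
use warnings;

my $seq = 'AACGTT';

# tr/// returns the number of characters matched; CG -> CG changes nothing.
my $count_CG = ($seq =~ tr/CG/CG/);          # 2

# One-pass complement on a copy of $seq.
(my $comp = $seq) =~ tr/ACGT/TGCA/;          # 'TTGCAA'

# Chained s///g calls: later substitutions undo earlier ones.
(my $broken = $seq) =~ s/A/T/g;              # 'TTCGTT'
$broken =~ s/C/G/g;                          # 'TTGGTT'
$broken =~ s/G/C/g;                          # 'TTCCTT'
$broken =~ s/T/A/g;                          # 'AACCAA'
```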
19
Character Replacement
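# assumes $in is an open filehandle on a FASTA file
my ($count_CG, $count_AT) = (0, 0);
while (my $line = <$in>) {
    next if $line =~ m/^>/;    # skip header lines
    $count_CG += ($line =~ tr/CG/CG/);
    $count_AT += ($line =~ tr/AT/AT/);
}
my $total = $count_CG + $count_AT;
my $percent_CG = $total ? 100 * ($count_CG / $total) : 0;
print "The sequence was $percent_CG% CG-rich.\n";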
20
Subroutines
One of the most important aspects of programming is managing complexity. A program written as one large section is generally harder to debug, so a major strategy in program development is modularization: break the program into smaller portions that can each be developed and tested independently. This makes the program more readable and easier to maintain and modify.
21
Subroutines
EXAMPLE: Reading in sequences from the UniGene.all.seq file, which holds multiple FASTA sequences in a single file, each annotated with the UniGene cluster it belongs to.
GOAL: Make an output file consisting only of the longest sequence from each cluster.
22
Subroutines
ISSUES
1. Want to design and implement a usable program.
2. Use subroutines where useful to reduce complexity.
3. Minimize the memory requirements (the human UniGene sequence file is > 2 GB).
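One possible design for this task, sketched under stated assumptions: each FASTA header is taken to carry its cluster as a "/ug=<cluster>" field (check this against the actual file), and cluster_of, each_record, and longest_per_cluster are hypothetical names. To keep memory low it makes two passes: the first remembers only the best header and length per cluster, and the second re-reads the file and prints the chosen records, so only one sequence is held at a time.

```perl
use strict;
use warnings;

# ASSUMPTION: the cluster ID appears in the header as "/ug=<cluster>".
sub cluster_of {
    my ($header) = @_;
    return $header =~ m{/ug=(\S+)} ? $1 : undef;
}

# Call $callback->($header, $sequence) once per FASTA record in $file.
sub each_record {
    my ($file, $callback) = @_;
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my ($header, $seq);
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ m/^>/) {
            $callback->($header, $seq) if defined $header;  # finish previous record
            ($header, $seq) = ($line, '');
        } else {
            $seq .= $line;                                  # join wrapped lines
        }
    }
    $callback->($header, $seq) if defined $header;          # last record
    close($in);
}

# Pass 1 keeps [header, length] per cluster; pass 2 prints the winners.
sub longest_per_cluster {
    my ($file, $out) = @_;
    my %best;
    each_record($file, sub {
        my ($header, $seq) = @_;
        my $cluster = cluster_of($header);
        return unless defined $cluster;
        if (!$best{$cluster} || length($seq) > $best{$cluster}[1]) {
            $best{$cluster} = [$header, length($seq)];
        }
    });
    my %keep = map { $_->[0] => 1 } values %best;
    each_record($file, sub {
        my ($header, $seq) = @_;
        print {$out} "$header\n$seq\n" if delete $keep{$header};
    });
}
```

Usage: longest_per_cluster('UniGene.all.seq', \*STDOUT); each chosen sequence is written on a single line. The trade-off is reading the file twice instead of holding all sequences in memory.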