1
Computer Programming for Biologists
Class 4, Feb 10th, 2012
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3027
2
Overview
  • Revision
  • Project
  • Loop control
  • Regular Expressions

3
Revision: program components
  • expressions
      • 42
      • $base eq 'T'
      • $num + 1 == 0
  • statements
      • $seq = 'atgaacgt';
      • print "hello world!\n";
  • operators
      • +, -, *, /, %, =, ...
  • built-in functions
      • print, shift, length, ...
  • key words
      • foreach, while, if, ...

4
Revision: Scalars
  • most basic variable type
  • indicated by dollar sign ($)
  • assign value (right to left):
      • $sequence = 'atgaacctctac';
      • $repeat = 5;
      • $in = <>;
  • default: $_

5
Revision: Arrays
  • ordered list of elements
  • indicated by at symbol (@)
  • index starts at 0
  • @letters = ('a'..'z'); -> ('a', 'b', 'c', 'd', 'e', ..., 'x', 'y', 'z')
  • Use $ and [ ] when working on individual elements:
      • $letters[0] = 'A';
      • $first = $letters[0];
      • $last = $letters[-1];
  • default: @ARGV

Index: 0, 1, 2, 3, 4, ..., 23, 24, 25
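
The array operations on this slide can be tried out with a minimal sketch like the following. The variable names follow the slide; $count is an extra name added here for illustration.

```perl
use strict;
use warnings;

# Build the alphabet with the range operator.
my @letters = ('a' .. 'z');

# Individual elements use $ and [ ]; indices start at 0.
$letters[0] = 'A';
my $first = $letters[0];       # first element
my $last  = $letters[-1];      # a negative index counts from the end
my $count = scalar @letters;   # number of elements (hypothetical helper variable)

print "first=$first last=$last count=$count\n";
```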
6
Revision: built-in functions
  • $upper = uc($in);
  • $upper_first = ucfirst($in);
  • $backwards = reverse($in);
  • $len = length($sequence);
  • $num = scalar @ARGV;
  • $file = shift @ARGV;
  • push @prot, $aa;
  • @bases = split //, $seq;
  • $out = join '_', @letters;
  • @found = keys %found;
  • @order = sort @letters;
  • @out = sort { $a <=> $b } @nums;
  • $codon = substr $seq, 0, 3, '';
  • print "Hello world!\n";
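
As a rough illustration, several of these functions can be applied to one short sequence. The input value is made up for this sketch; the comments show what each call returns for it.

```perl
use strict;
use warnings;

my $seq = 'atgaacgt';   # example input, chosen for this sketch

my $upper     = uc($seq);               # 'ATGAACGT'
my $backwards = reverse($seq);          # scalar context reverses the string
my $len       = length($seq);           # 8
my @bases     = split //, $seq;         # one element per character
my $joined    = join '_', @bases;       # 'a_t_g_a_a_c_g_t'
my @sorted    = sort @bases;            # alphabetical order
my $codon     = substr $seq, 0, 3, '';  # removes and returns the first 3 letters

print "$upper $backwards $len $joined @sorted $codon $seq\n";
```

Note that the four-argument form of substr changes $seq itself: after the call it holds only the remaining letters.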
7
Revision: structures
  • branching
      if ($base eq 't') {
        $base = 'u';
      }
    -> also: unless ()
      if ($base eq 't') {
        $base = 'a';
      } elsif ($base eq 'a') {
        $base = 't';
      } else { ... }
  • loops
      while ($in = <>) {
        $out .= uc($in);
      }
    -> also: until ()
      foreach $element (@list) {
        $i++;
        print "$i) $element";
      }
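
Branching and looping combine naturally when complementing a sequence base by base. The following is only a sketch: the input string and the choice to turn unknown characters into 'n' are assumptions made for illustration.

```perl
use strict;
use warnings;

my $seq        = 'atgc';   # example input
my $complement = '';

# foreach visits every base; if/elsif/else picks the partner base.
foreach my $base (split //, $seq) {
    if ($base eq 'a') {
        $complement .= 't';
    }
    elsif ($base eq 't') {
        $complement .= 'a';
    }
    elsif ($base eq 'g') {
        $complement .= 'c';
    }
    elsif ($base eq 'c') {
        $complement .= 'g';
    }
    else {
        $complement .= 'n';   # assumption: unknown characters become 'n'
    }
}

print "$complement\n";
```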

8
Revision: conditions
  • Representative values
  • Examples
      • while (1) { ... }         endless loop
      • if ($text) { ... }        true if $text is neither '' nor 0
      • if (@rows) { ... }        true if array is not empty
      • until ($i > 100) { ... }  comparison or expression
      • while ($out = substr $seq, 0, 3, '') { ... }

9
True and False
  • false -> 0 or empty string ('')
  • true -> value different from 0 (1 by default)
  • Comparisons are evaluated after arithmetic operators (they have lower precedence)!
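
A small sketch makes these rules concrete. The test values are chosen for illustration; note that the string '0' also counts as false in Perl, while '0.0' is true because it is a non-empty string other than '0'.

```perl
use strict;
use warnings;

# Evaluate a handful of values in boolean context.
my @results;
foreach my $value (0, '', '0', 1, 'atg', '0.0') {
    push @results, ($value ? 'true' : 'false');
}
print join(', ', @results), "\n";

# A true comparison returns 1.
my $is_greater = (5 > 3);

# Arithmetic binds tighter than comparison: this is (2 + 3) > 4.
my $check = 2 + 3 > 4;

print "is_greater=$is_greater check=$check\n";
```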

10
Revision: data input
  • word(s) from list of command line parameters:
      • $in = shift;
  • equivalent to the following:
      • $in = shift @ARGV;   (@ARGV holds the command line arguments)
11
Data Input/Output
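  • Reading from STDIN, the default input stream:
      • $in = <>;
  • equivalent to the following when no command line arguments are given:
      • $in = <STDIN>;
  • Perl reads from the file(s) instead if command line argument(s) are present:
      • perl prog.pl input.txt -> <> reads input.txt in place of STDIN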

12
Programming Strategy
  • start with pseudo code (comments)
  • code small bits and run
  • watch for warnings and errors
  • dare to try things out
  • check Perl documentation

13
Project
  • Implement the following in a program
  • Print a welcome message
  • Read input from a file
  • Separate header from sequence
  • Report length of sequence
  • Make sequence all upper case
  • Reverse-complement the sequence
  • Reformat sequence into 60 bp width
  • Provide position numbers at each line
  • Go to http://bioinf.gen.tcd.ie/GE3027/class3
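
One possible shape for the project is sketched below. It is not the course solution. Assumptions: the FASTA record is held in a string rather than read from a file, the variable names are invented, and the base swap uses Perl's tr/// operator, which these slides do not cover.

```perl
use strict;
use warnings;

print "Welcome to the sequence formatter!\n";

# Assumption: a FASTA record held in a string instead of read from a file.
my $record = ">test sequence\natgaac\nctctac\n";

# Separate header (first line) from sequence (remaining lines).
my ($header, @lines) = split /\n/, $record;
my $sequence = join '', @lines;

print "Header: $header\n";
print "Length: ", length($sequence), "\n";

# Upper case, then reverse-complement: reverse the string and swap bases.
$sequence = uc $sequence;
my $revcomp = reverse $sequence;
$revcomp =~ tr/ACGT/TGCA/;

# Reformat into lines of 60 bp, each prefixed with its start position.
my $width     = 60;
my $position  = 1;
my $formatted = '';
my $rest      = $revcomp;
while (my $chunk = substr $rest, 0, $width, '') {
    $formatted .= sprintf "%6d %s\n", $position, $chunk;
    $position  += length $chunk;
}
print $formatted;
```

To read from a file as the project asks, the string could be replaced by lines read with <> as shown in the data input slides.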

14
Project
Exercise: http://bioinf.gen.tcd.ie/GE3027/class4
15
Structures: Breaks
  • Ways of breaking the loops:
      • next: continues with next loop iteration
      • last: continues after loop
      • exit: exits program
  • example:
      print "Type y to continue, q to quit";
      while (<>) {
        chomp;              # remove the newline so that eq can match
        if ($_ eq 'y') {
          last;
        } elsif ($_ eq 'q') {
          exit;
        } else {
          next;
        }
      }
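
The same control flow can be tried without a keyboard. In this sketch the answers come from a made-up list instead of STDIN, and $tries is an invented counter that shows how far the loop got.

```perl
use strict;
use warnings;

# Assumption: answers come from a list instead of the keyboard.
my @answers = ("maybe\n", "\n", "y\n", "q\n");
my $tries   = 0;

foreach my $answer (@answers) {
    chomp $answer;   # remove the trailing newline before comparing
    $tries++;
    if ($answer eq 'y') {
        last;        # leave the loop and carry on after it
    }
    elsif ($answer eq 'q') {
        exit;        # stop the whole program
    }
    else {
        next;        # skip to the next answer
    }
}

print "Continuing after $tries answers\n";
```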

16
Regular Expressions
  • constructs that describe patterns
  • powerful methods for text processing
  • search for patterns in a string
  • search and extract patterns
  • search and replace patterns

17
Regular Expressions
  • Examples
  • Look for a motif in a DNA/protein sequence
  • Find low complexity repeats and mask with x's
  • Find start of sequence string in GenBank record
  • Extract e-mail addresses from a web-page
  • Replace strings, e.g. @tcd.ie with @gmail.com

18
Regular Expressions
Find a pattern in a string (stored in a variable):
    $sequence = 'ataggctagctaga';
    if ( $sequence =~ /ctag/ ) {
      print "Found!";
    }
  • $sequence: string in which to search
  • =~: binding operator
  • / /: delimiters
  • ctag: pattern
Without binding // to a variable, the regular expression works on $_.
19
Regular Expressions
Search modifier:
  • i: make search case-insensitive
    $sequence = 'ataggctagctaga';
    if ( $sequence =~ /TAG/i ) {
      print "Found!";
    }
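
The binding operator, the i modifier and the default variable $_ can be compared side by side. This is a minimal sketch using the slide's sequence; the result variable names are invented.

```perl
use strict;
use warnings;

my $sequence = 'ataggctagctaga';

my $has_motif   = ($sequence =~ /ctag/) ? 1 : 0;   # case-sensitive match
my $upper_plain = ($sequence =~ /TAG/)  ? 1 : 0;   # fails: wrong case
my $upper_i     = ($sequence =~ /TAG/i) ? 1 : 0;   # i ignores case

# Without =~ the pattern is tested against $_.
my $found_in_default = 0;
for ($sequence) {
    $found_in_default = 1 if /ctag/;
}

print "$has_motif $upper_plain $upper_i $found_in_default\n";
```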
20
Regular Expressions
Metacharacters:
  • ^  match at the beginning of a line
  • $  match at the end of the line
  • .  match any character (except newline)
  • \  escape the next metacharacter
    $sequence = ">sequence1\natgacctggaataggat";
    if ( $sequence =~ /^>/ ) {   # line starts with >
      print "Found Fasta header!";
    }
    /\.$/ matches dot at end of line
21
Regular Expressions
Matching repetition:
  • a?      match 'a' 1 or 0 times
  • a*      match 'a' 0 or more times, i.e., any number of times
  • a+      match 'a' 1 or more times, i.e., at least once
  • a{n,m}  match at least "n" times, but not more than "m" times
  • a{n,}   match "n" or more times
  • a{n}    match exactly "n" times
    $sequence =~ /a{5,}/;   finds repeats of 5 or more a's
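
Each quantifier can be checked against a short test string. The strings and variable names below are invented for this sketch; the anchors ^ and $ force the whole string to match so that the counts are easy to verify.

```perl
use strict;
use warnings;

my $optional   = ('ct'     =~ /^ca?t$/)     ? 1 : 0;   # 1: zero a's allowed
my $any_number = ('caaat'  =~ /^ca*t$/)     ? 1 : 0;   # 1: any number of a's
my $at_least_1 = ('ct'     =~ /^ca+t$/)     ? 1 : 0;   # 0: needs at least one a
my $exactly_2  = ('caat'   =~ /^ca{2}t$/)   ? 1 : 0;   # 1: exactly two a's
my $range      = ('caaaat' =~ /^ca{1,3}t$/) ? 1 : 0;   # 0: four a's is too many

# Collect every run of 5 or more a's in a sequence.
my $sequence = 'atgaaaaaaccgtaaat';
my @runs     = $sequence =~ /(a{5,})/g;

print "$optional $any_number $at_least_1 $exactly_2 $range @runs\n";
```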
22
Regular Expressions
Search for classes of characters:
  • \d  match a digit character
  • \w  match a word character (alphanumeric and _)
  • \D  match a non-digit character
  • \W  match a non-word character
  • \s  match a whitespace character
  • \S  match a non-whitespace character
    $date = '30 Jan 2009';
    if ( $date =~ /\d{1,2} \w+ \d{2,4}/ ) {
      print "Correct date format!";
    }
    also matches '1 February 09'
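
The date pattern can be run against a few candidate strings to see what it accepts. The third test string is an invented counter-example.

```perl
use strict;
use warnings;

my @dates = ('30 Jan 2009', '1 February 09', 'Jan 30 2009');
my @ok;

foreach my $date (@dates) {
    # 1-2 digits, a space, one or more word characters, a space, 2-4 digits
    push @ok, ($date =~ /\d{1,2} \w+ \d{2,4}/ ? 1 : 0);
}

print "@ok\n";
```

Because \w also matches digits, this pattern is a loose format check rather than a real date validation.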
23
Regular Expressions
Match special characters:
  • \t   matches a tabulator (tab)
  • \b   matches a word boundary
  • \r   matches return
  • \n   matches UNIX newline
  • \cM  matches Control-M (line-ending in Windows)
    while (my $line = <>) {
      if ($line =~ /\cM/) {
        warn "Windows line-ending detected!";
      }
    }
24
Regular Expressions
Search for range of characters:
  • [ ]   match any one of the characters specified within these brackets
  • -     specifies a range, e.g. [a-z], or [0-9]
  • [^ ]  match any character not in the list, e.g. [^A-Z]
    $sequence = 'ataggctapgctaga';
    if ( $sequence =~ /[^acgt]/ ) {
      print "Sequence contains non-DNA character: $&";
    }
$& is a special variable containing the last pattern match; $` and $' contain the strings before and after the match.
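
The three match variables can be inspected right after a successful match. This sketch uses the slide's sequence; the names $before, $match and $after are invented.

```perl
use strict;
use warnings;

my $sequence = 'ataggctapgctaga';
my ($before, $match, $after) = ('', '', '');

if ($sequence =~ /[^acgt]/) {
    # Copy the special variables straight away: the next match overwrites them.
    ($before, $match, $after) = ($`, $&, $');
    print "Sequence contains non-DNA character: $match\n";
    print "before: $before, after: $after\n";
}
```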
25
Regular Expressions
Search and replace (substitute): s/pattern1/pattern2/
    $sequence = 'ataggctagctaga';
    $rna = $sequence;
    $rna =~ s/t/u/;   -> auaggctagctaga
Only the first match will be replaced!
26
Regular Expressions
Modifiers for substitution:
  • i  case-insensitive
  • g  global
  • s  match includes newline
    $sequence = 'ataggctagctaga';
    $rna = $sequence;
    $rna =~ s/t/u/g;   -> auaggcuagcuaga
replaces all t in the line with u
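
The effect of the modifiers is easiest to see when the same substitution is run with and without them. A minimal sketch follows; $first_only, $replaced and $mixed are invented names, and the mixed-case string is made up.

```perl
use strict;
use warnings;

my $sequence = 'ataggctagctaga';

# Without g only the first t is replaced.
my $first_only = $sequence;
$first_only =~ s/t/u/;

# With g every t is replaced; s/// returns the number of replacements.
my $rna      = $sequence;
my $replaced = ($rna =~ s/t/u/g);

# Adding i also replaces upper-case T.
my $mixed = 'ATAGgcta';
$mixed =~ s/t/u/gi;

print "$first_only $rna $replaced $mixed\n";
```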
27
Regular Expressions
Example: Clean up a sequence string
    $sequence = '1 ataggctagctagat 16 ttagagctagta';
    $sequence =~ s/[^actg]//g;   -> ataggctagctagatttagagctagta
Deletes everything that is not a, c, t, or g.
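
The same clean-up works on multi-line input, since newlines and spaces are also outside the character class. The record layout below is invented for this sketch.

```perl
use strict;
use warnings;

# Assumption: two numbered sequence lines with blocks separated by spaces.
my $raw = "     1 ataggctagc tagatttaga\n    21 gctagta\n";

my $clean = $raw;
$clean =~ s/[^acgt]//g;   # drop digits, spaces and newlines

print "$clean\n";
print length($clean), " bases\n";
```

Upper-case bases would be deleted as well, so a mixed-case sequence needs the i modifier or a call to lc first.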
28
Regular Expressions
  • Extract matched patterns
  • put patterns in parentheses
  • \1, \2, \3, ... refer back to ()s within pattern match
  • $1, $2, $3, ... refer back to ()s after pattern match
      $sequence = ">test\natgtagagctagta";
      if ($sequence =~ /^>(.*)/) { $id = $1; }
    or
      $email =~ s/(.*)\@(.*)\.(.*)/$1 at $2 dot $3/;
      print "Changed address to $1 at $2 dot $3\n";

changes kahokamp@tcd.ie to kahokamp at tcd dot ie
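
Both kinds of back-reference can be exercised in a few lines. This is a sketch with the slide's values; $masked, $doubled and the string 'atggca' are invented for illustration.

```perl
use strict;
use warnings;

# $1 after the match: extract the identifier from a FASTA header.
my $sequence = ">test\natgtagagctagta";
my $id = '';
if ($sequence =~ /^>(.*)/) {
    $id = $1;   # . does not match newline, so the capture stops at the line end
}

# $1, $2, $3 in the replacement part of a substitution.
my $email = 'kahokamp@tcd.ie';
(my $masked = $email) =~ s/(.*)\@(.*)\.(.*)/$1 at $2 dot $3/;

# \1 inside the pattern: find a character that is immediately repeated.
my $doubled = ('atggca' =~ /(.)\1/) ? $1 : '';

print "$id | $masked | $doubled\n";
```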