Introduction to PERL and RegExp - PowerPoint PPT Presentation

Provided by: laurent4
1
Introduction to PERL and RegExp
"Perl is a language for getting your job done."
Larry Wall
2
Course overview
  • Introduction to the Perl language and regular
    expressions
  • Introduction to HTML and cgi-bin
  • Perl modules and one-liners
  • Perl objects and references

3
Contents
  • Definitions
    • PERL, CGI...
  • History
  • Language basics
    • reserved words
    • operators
    • variables
    • subroutines
    • modules
  • Data Structure
  • Interpolation, quoting
  • File handling
  • Context
  • Special variables
  • Documents, help
  • Regular Expressions
  • Exercises

4
Definitions
  • PERL
    • Practical Extraction and Report Language
  • CGI
    • Common Gateway Interface
  • CPAN
    • Comprehensive Perl Archive Network

5
History
  • Created by Larry Wall in 1987
  • Last major release: Perl 5 in 1994 (currently Perl 5.8)
  • 1995: creation of CPAN, the repository of Perl
    modules and documentation
  • Today most cgi-bin programs behind web pages are
    Perl scripts
  • Good starting sites
    • www.perl.org, www.perl.com, www.cpan.org
    • history.perl.org

6
Language basics (1)
  • Interpreted: the source is compiled to an internal
    form each time the script runs
  • Context dependent
  • Supports object-oriented programming
  • Reserved words
    • Many!! E.g. if, else, until, while, for, foreach
  • Large number of predefined functions
    • See http://www.perldoc.com/perl5.8.0/pod/perlfunc.html
  • On UNIX systems the interpreter is usually found at
    • /usr/local/bin/perl
    • /usr/bin/perl

7
Language basics (2)
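  • Variables
    • Scalar: $var = 12; $car = "abc";
    • Array: @var = ("a", "b", "c", "d");
    • Hash: %var = ("a", "first", "b", 3);
    • Hash, with the => notation: %var = ("a" => "first", "b" => 3);
  • Operators
    • Arithmetic: + - * /
    • Bit shift: << >>
    • Logical: && || ! and or not xor
    • String concatenation and repetition: . x .= x=
    • Equality: == != (numbers), eq ne (strings)
    • Numeric comparison: < <= > >=
    • String comparison: lt le gt ge
    • Three-way comparison: <=> (numbers), cmp (strings)
    • Pattern binding: =~ !~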
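
The variable types and a few of the operators above can be tried in a short script. This is only a sketch: the values are illustrative, and `use strict` and `use warnings` are added here as a good habit rather than taken from the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three variable types; each sigil has its own namespace.
my $var = 12;                            # scalar
my @var = ("a", "b", "c", "d");          # array
my %var = ("a" => "first", "b" => 3);    # hash

# Numeric and string operators are distinct in Perl.
my $sum     = $var + 3;          # numeric addition
my $joined  = "abc" . "def";     # string concatenation
my $line    = "-" x 5;           # string repetition
my $num_cmp = (10 <=> 9);        # numeric three-way comparison
my $str_cmp = ("10" cmp "9");    # string three-way comparison

print "sum=$sum joined=$joined line=$line\n";
print "10 <=> 9 gives $num_cmp, '10' cmp '9' gives $str_cmp\n";
print "second element: $var[1], value for b: $var{b}\n";
```

Note that `10 <=> 9` yields 1 while `"10" cmp "9"` yields -1, because the string comparison looks at the characters "1" and "9" first.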

8
Language basics (3)
  • Special variables (e.g.)
    • $_ default input
    • $$ process ID
    • $& string of last match
    • @ARGV cmd-line args
    • %ENV environment
  • Functions
    • sub funcname { expr }
    • funcname(parameters)
    • Get parameters from @_
  • Comments
    • Start with #
    • End with \n
  • Modules
    • Huge number of free modules in the CPAN repository
    • use mod_name;
    • require mod_name;
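
A minimal sketch of a subroutine that reads its parameters from `@_`. The name `average` and the sample numbers are made up for illustration.

```perl
use strict;
use warnings;

# Arguments arrive in the special array @_.
sub average {
    my @numbers = @_;
    return 0 unless @numbers;         # avoid dividing by zero on an empty list
    my $total = 0;
    $total += $_ foreach @numbers;    # $_ is the default loop variable
    return $total / @numbers;         # @numbers in numeric context is its size
}

my $mean = average(2, 4, 9);
print "mean = $mean\n";   # prints "mean = 5"
```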

9
Data structure
  • Scalars
  • Integer x 12345
  • Float x 123.45
  • Scientific x 12.34E5
  • Octal x 0123
  • Hexadec. x 0xffff
  • Binary x 0b1100_1010
  • Chains x petit texte
  • Arrays / List
  • _at_list (un, deux, trois)
  • _at_list qw(un deux trois)
  • list
  • (a,b,c) (un, deux, trois)
  • What if ???
  • _at_list (un, deux, trois)
  • list (un, deux, trois)
  • size _at_list
  • print list1
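
The "What if" cases above turn on the difference between list and scalar assignment. The sketch below shows one way to check them; the variable names other than `@list` and `$size` are chosen here for illustration.

```perl
use strict;
use warnings;

my @list = ("un", "deux", "trois");

my $size     = @list;      # array in scalar context: number of elements
my $last_idx = $#list;     # index of the last element
my ($first)  = @list;      # list assignment: takes the first element
my $second   = $list[1];   # indices start at 0, so this is "deux"

my $from_list;
{
    no warnings 'void';    # the comma operator discards "un" and "deux"
    $from_list = ("un", "deux", "trois");   # scalar assignment: last value only
}

print "size=$size last_idx=$last_idx first=$first second=$second\n";
print "scalar assigned from a parenthesised list: $from_list\n";
```

Assigning a parenthesised list to a scalar does not count the elements; the comma operator simply returns its last operand, here "trois".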

10
Data structure
  • Hashes
    • %hash = ("rouge", 0xff0000, "vert", 0x00ff00, "bleu", 0x0000ff);
    • %hash = (rouge => 0xff0000, vert => 0x00ff00, bleu => 0x0000ff);
  • @clefs = keys(%hash);
  • @valeurs = values(%hash);
  • foreach $k (keys %hash) { print $hash{$k}; }
  • Warning: a hash does not keep the insertion order
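
Because a hash does not keep its keys in order, a common approach is to sort them before looping. A small sketch using the colour hash from this slide:

```perl
use strict;
use warnings;

my %hash = (
    rouge => 0xff0000,
    vert  => 0x00ff00,
    bleu  => 0x0000ff,
);

# keys() returns the keys in no guaranteed order, so sort them
# whenever the output order matters.
my @clefs = sort keys %hash;
foreach my $k (@clefs) {
    printf "%-5s = 0x%06x\n", $k, $hash{$k};
}
```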

11
Data structure
  • References
    • \ is similar to & in C
    • $refarray = \@array;
  • What if ???
    • $refarray[1] = 12;
    • $$refarray[1] = 15;
    • $refarray->[1] = 18;
    • print $array[1];
  • Nested data structures
    • Arrays of arrays
    • Hashes of arrays
    • Arrays of hashes
    • Hashes of hashes
    • And more!
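
A sketch of the reference syntax and of one nested structure, a hash of arrays. The data is invented for illustration. The line `$refarray[1] = 12` is left out on purpose: it would assign to a separate array `@refarray`, which `use strict` rejects as undeclared.

```perl
use strict;
use warnings;

my @array    = (1, 2, 3);
my $refarray = \@array;      # reference to @array

$$refarray[1]  = 15;         # dereference, then index: changes $array[1]
$refarray->[1] = 18;         # arrow syntax, same element

# A nested structure: a hash of arrays, built from anonymous array references.
my %hoa = (
    fruits => ["apple", "pear"],
    colors => ["red", "green", "blue"],
);
push @{ $hoa{fruits} }, "plum";

print "array[1] = $array[1]\n";
print "last color = $hoa{colors}->[-1]\n";
print "number of fruits = ", scalar @{ $hoa{fruits} }, "\n";
```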

12
Data structure
  • Global vs Local
  • Lexicals
    • my
    • our
  • Dynamics
    • local
  • What if ???
      our $var = "A";
      while (<>) {
          $var = "D";
          local $var = "B";
          print $var;
          my $var = "C";
          print $var;
          last;
      }
      print $var;
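
The difference between `our`, `local` and `my` is easier to see without reading from standard input. The sketch below is a variation on the "What if" code, not the slide's own program: the subroutines `show` and `demo` and the array `@seen` are hypothetical names introduced here.

```perl
use strict;
use warnings;

our $var = "A";                  # package (global) variable
my @seen;                        # records what each step observes

sub show { push @seen, $var }    # always reads the package variable $var

sub demo {
    local $var = "B";            # temporary value for the global, until demo() returns
    show();                      # sees "B": local is dynamic, visible in called subs
    my $var = "C";               # new lexical, visible only inside demo()
    push @seen, $var;            # the lexical "C"
    show();                      # still "B": show() cannot see demo()'s lexical
}

demo();
show();                          # back to "A" once demo() has returned
print "@seen\n";                 # prints "B C B A"
```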

13
Interpolation, quoting
  • Different quotes have different meanings
    • $price = 100;
    • print "the price is $price";
    • This is called interpolation
  • Quoting
    • '' q// (string, not interpolated)
    • "" qq// (string, interpolated)
    • `` qx// (execute)
    • ( ) qw// (word list)
    • // m// (pattern match)
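
A short sketch comparing the quoting forms. `qx//` is left out so that the example does not depend on any external command.

```perl
use strict;
use warnings;

my $price = 100;

my $single = 'the price is $price';     # single quotes: no interpolation
my $double = "the price is $price";     # double quotes: interpolated
my $q      = q(the price is $price);    # same as single quotes
my $qq     = qq(the price is $price);   # same as double quotes
my @words  = qw(un deux trois);         # list of words, no quotes or commas

print "$single\n$double\n";
print scalar(@words), " words\n";
```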

14
File handling
  • Opening a file
    • open(LIRE, $filename);
  • Reading from a file
    • $line = <LIRE>;
  • Writing to a file
    • open(ECRIRE, ">$filename");
    • print ECRIRE $line;
  • Closing a file
    • close(LIRE); close(ECRIRE);
  • Special filehandles
    • STDOUT, STDIN, STDERR
    • select(STDOUT);
  • Piping
    • open(PIPE, "ls -1 |");
    • @filelist = <PIPE>;
  • Other system calls
    • system($command);
    • exec($command, $options, $filename);
15
File testing
  • File operators
    • -r -w -x -o readable, writable, executable, owned by the user
    • -e exists
    • -z empty
    • -s not empty (returns the size)
    • -f plain file
    • -d directory
    • -l symbolic link
  • Example
    • open(READ, $filename) if -f $filename;
  • What if ???
      foreach (@ARGV) {
          next unless -f;
          $fsize = -s $_;
          print("$_ is $fsize bytes long.\n");
      }
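
The same file tests can be wrapped in a subroutine, which makes them easier to reuse and to check. `file_sizes` is a hypothetical helper written for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: map each plain file among the arguments to its size.
sub file_sizes {
    my %size;
    foreach my $path (@_) {
        next unless -f $path;       # skip directories and missing files
        $size{$path} = -s $path;    # size in bytes (false for an empty file)
    }
    return %size;
}

my %size = file_sizes(@ARGV);
print "$_ is $size{$_} bytes long.\n" foreach sort keys %size;
```

Run it as `perl script.pl *` to list the plain files of the current directory with their sizes.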

16
Context
  • Scalar
    • $a = @b;
  • Boolean
    • while (@a) { shift @a }
  • List
    • ($a) = @b;
  • Empty
    • () = f($a);
  • Interpolation
    • "$a @b"
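
A sketch showing the same array and the same built-in evaluated in different contexts. The use of `localtime` is an addition made here, chosen because it is a standard function whose return value depends on context.

```perl
use strict;
use warnings;

my @b = (10, 20, 30);

my $count   = @b;      # scalar context: number of elements
my ($first) = @b;      # list context: first element
my $text    = "@b";    # interpolation: elements joined by spaces

# localtime() is a built-in that returns different things per context.
my @parts = localtime(0);    # list context: (sec, min, hour, mday, mon, year, ...)
my $stamp = localtime(0);    # scalar context: a readable date string

print "count=$count first=$first text=$text\n";
print "localtime returned ", scalar(@parts), " fields, or the string '$stamp'\n";
```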

17
Special variables
  • %INC hash of loaded modules
  • @INC list of lib directories
  • %ENV hash of env variables
  • @ARGV list of arguments
  • @_ auto list
  • $_ auto var
  • @F one-liner -a option
  • $$ pid
  • $0 program name
  • $| autoflush
  • $& string of last match
  • $/ record delimiter
  • $^V Perl version

18
Documents, help, debugging
  • perl -h
  • perldoc <keyword>
  • Web help
    • www.perl.org
    • www.perl.com
  • Books
    • O'Reilly
  • Debugging?
    • use strict;
    • use warnings; or -w
    • -d (debug mode)
    • man perldebug

19
Calling perl
  • perl file.pl
  • or, in an executable file, use a shebang line
    • #!/usr/bin/perl
  • Check syntax
    • perl -c file.pl

20
Exercises 1.1 and 1.2 on paper
  • 1.1 Write a little Hello World application
  • 1.2 Write a quadratic equation solver (polynomial
    of the 2nd degree)
    • It should solve ax² + bx + c = 0 by taking the
      values of a, b and c on the command line, and
      print x1 and x2 or report that the equation has
      no real solution (delta < 0)

21
Regular Expressions
  • Idea: a powerful way to search for text patterns
  • Literal (or normal) characters
    • Alphanumeric
      abcABC0123...
    • Punctuation
      -_ ,.()/ ?!\<>"@
  • Metacharacters
    • Ex: ls *.java
  • Flavors
    • awk, egrep, Emacs, grep, Perl, POSIX, Tcl,
      PROSITE!

22
Patterns are regular expressions
  • Pattern (PROSITE): <A-x-[ST](2)-x(0,1)-{V}
  • Regexp: ^A.[ST]{2}.?[^V]
  • Text: the sequence must start with an alanine,
    followed by any amino acid, followed by a serine
    or a threonine, two times, followed by any amino
    acid or nothing, followed by any amino acid
    except a valine.
  • Only the syntax differs
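
The regular expression above can be tried against a few strings. The peptide sequences below are made up, chosen only to exercise each part of the pattern.

```perl
use strict;
use warnings;

my $regexp = qr/^A.[ST]{2}.?[^V]/;

# Made-up sequences: two that satisfy the pattern and two that do not.
my %result;
foreach my $seq (qw(AGSTLK AGSTK AGSTV GAGSTK)) {
    $result{$seq} = ($seq =~ $regexp) ? "match" : "no match";
    print "$seq: $result{$seq}\n";
}
```

AGSTK matches because the optional `.?` can match nothing; AGSTV fails because the only residue left for `[^V]` is a valine; GAGSTK fails because the sequence does not start with A.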

23
Regular Expressions (1)
  • In Perl
  • Start and end of line
    • ^ start, $ end
  • Match any of several
    • [ ] character class, ( | ) alternation
  • Match 0, 1 or more
    • . any character, ? 0 or 1, + 1 or more, * 0 or more
    • {m,n} range
    • [^ ] negation in a character class, !~ negated match
  • Examples
    • Match every instance of a SwissProt AC
      m/[OPQ][0-9][A-Z0-9]{3}[0-9]/
      m/[OPQ]\d[A-Z0-9]{3}\d/
    • Match every instance of a SwissProt ID
      m/[A-Z0-9]{1,4}_[A-Z0-9]{3,5}/
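
A sketch applying the two SwissProt patterns to a line of text. The sample sentence is written for this example, and the `\b` word boundaries are an addition made here so that a pattern cannot match in the middle of a longer word.

```perl
use strict;
use warnings;

my $ac_re = qr/[OPQ]\d[A-Z0-9]{3}\d/;           # accession number (AC)
my $id_re = qr/[A-Z0-9]{1,4}_[A-Z0-9]{3,5}/;    # entry name (ID)

# Sample text written for this example.
my $text = "Entries: P01308 (INS_HUMAN) and Q9Y6K9 (NEMO_HUMAN)";

my @acs = $text =~ /\b($ac_re)\b/g;   # /g in list context returns every match
my @ids = $text =~ /\b($id_re)\b/g;

print "ACs: @acs\n";   # ACs: P01308 Q9Y6K9
print "IDs: @ids\n";   # IDs: INS_HUMAN NEMO_HUMAN
```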

24
Regular Expressions (2)
  • Escape character or back reference
    • \char or \num
  • Shorthand
    • \d digit [0-9]
    • \s whitespace [ \f\n\r\t]
    • \w word character [a-zA-Z0-9_]
    • \D \S \W complement of \d \s \w
  • Byte notation
    • \num character in octal
    • \xnum character in hexadecimal
    • \cchar control character
  • Match operator
    • m//
    • $var =~ m/colou?r/
    • $var !~ m/colou?r/
  • Substitution operator
    • s///
    • $var =~ s/colou?r/couleur/
  • Translate operator
    • tr///
    • $revcomp =~ tr/ACGT/tgca/
  • Modifiers
    • /i case insensitive
    • /g global match
    • Many others: /s, /m, /o, /x...
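
The three operators can be compared side by side. In this sketch the sample sentence and DNA string are invented, and the `reverse` call is an addition: `tr///` alone only complements the bases, so the string is reversed first to obtain a reverse complement.

```perl
use strict;
use warnings;

my $var = "The colour of the color wheel";

my $found = ($var =~ m/colou?r/) ? 1 : 0;      # match: is the pattern present?
(my $french = $var) =~ s/colou?r/couleur/g;    # copy, then substitute every occurrence
my $count = () = $var =~ m/colou?r/gi;         # number of matches

# tr/// maps characters one to one.
my $dna = "ATGCCG";
(my $revcomp = reverse $dna) =~ tr/ACGT/tgca/;

print "$french\n";                       # The couleur of the couleur wheel
print "found=$found count=$count\n";     # found=1 count=2
print "$dna -> $revcomp\n";              # ATGCCG -> cggcat
```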

25
Regular Expressions (3)
  • Grouping
    • External reference
      $var =~ s/sp\|(\w\d{5})/swissprot AC:$1/
    • Internal reference
      $var =~ s/tr\|(\w\d{5})\|\1/trembl AC:$1/
  • Numbering
    • $1 to $9
    • $10 and beyond if needed...
  • Exercises
    • Create a regexp to recognize any pseudo IP
      address: 123.456.789.12
    • Create a regexp to recognize any email address:
      Jean.Dupond@isb-sib.ch
    • Create a regexp to change any HTML tag to another
      <address> -> <pre>
  • On sib-dea
    • use visual_regexp-1.2.tcl to check your regular
      expressions (requires X-windows)
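
The two kinds of reference listed under Grouping behave as sketched below. The identifiers are sample strings made up for this example, and the `AC:` separator in the replacement text is an assumption.

```perl
use strict;
use warnings;

# External reference: $1 is used outside the pattern, in the replacement.
my $header = "sp|P01308";
(my $renamed = $header) =~ s/sp\|(\w\d{5})/swissprot AC:$1/;

# Internal reference: \1 inside the pattern must repeat what group 1 matched.
my $pattern = qr/tr\|(\w\d{5})\|\1/;
my $same = ("tr|Q12345|Q12345" =~ $pattern) ? 1 : 0;   # repeated identifier
my $diff = ("tr|Q12345|Q54321" =~ $pattern) ? 1 : 0;   # second part differs

print "$renamed\n";                 # swissprot AC:P01308
print "same=$same diff=$diff\n";    # same=1 diff=0
```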

26
Regular Expressions (4)
27
Solution 1.1
  • #!/usr/local/bin/perl
  • print "Hello World!\n";
  • exit 0;

28
Solution 1.2
#!/usr/local/bin/perl
# Solve a*x^2 + b*x + c = 0 for coefficients given on the command line.
my ($a, $b, $c) = @ARGV;
if ($a eq '' || $b eq '' || $c eq '') {
    die("missing variable!");
}
my $delta = $b*$b - 4*$a*$c;
if ($delta == 0) {
    my $x = -$b/(2*$a);
    print "delta = 0, one solution only: x = $x\n";
}
elsif ($delta > 0) {
    my $x1 = (-$b+sqrt($delta))/(2*$a);
    my $x2 = (-$b-sqrt($delta))/(2*$a);
    print "delta > 0, two solutions: x1 = $x1, x2 = $x2\n";
}
else {
    print "delta < 0, no solutions!!\n";
}
exit 0;
29
Solution RegExp
  • /(\d{1,3}\.){3}\d{1,3}/
  • /\w+\.\w+\@\w+\-\w+\.[a-z]{2,3}/
  • s/\<(\/?)address\>/\<$1pre\>/
  • generalized
    • address -> \w+
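
A sketch that runs the three solutions against the strings from the exercises. The `^` and `$` anchors are an addition made here so that the whole string has to match, and the `<address>` content is invented.

```perl
use strict;
use warnings;

my $ip_re   = qr/^(\d{1,3}\.){3}\d{1,3}$/;
my $mail_re = qr/^\w+\.\w+\@\w+-\w+\.[a-z]{2,3}$/;

my $ip_ok   = ('123.456.789.12' =~ $ip_re) ? 1 : 0;            # four groups of digits
my $ip_bad  = ('123.456.789'    =~ $ip_re) ? 1 : 0;            # only three groups
my $mail_ok = ('Jean.Dupond@isb-sib.ch' =~ $mail_re) ? 1 : 0;

# Rename both the opening and the closing tag; group 1 holds the optional "/".
my $html = '<address>Geneva</address>';
$html =~ s/<(\/?)address>/<${1}pre>/g;    # ${1} keeps the capture apart from "pre"

print "ip_ok=$ip_ok ip_bad=$ip_bad mail_ok=$mail_ok\n";
print "$html\n";
```

The email address is written in single quotes because `@isb` inside double quotes would be read as an array.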