Introduction to PERL

1
Introduction to PERL
  • Instructor Jon Frederick, M.S.
  • Division of Information Infrastructure,
  • UNIX/NT Systems Group
  • smiile@utk.edu, http://web.utk.edu/smiile

2
What is PERL?
  • Practical Extraction and Report Language
  • Invented by Larry Wall; first released in 1987
  • Originally for UNIX system administration
  • Based on C, sed, awk, and English

3
Why is PERL Popular?
  • Easy to use
  • Default behavior, e.g. print "hello world";
  • Free (GNU General Public License or Artistic
    License)
  • Available for nearly every O.S.; programs port
    seamlessly
  • Modern hardware makes it run fast
  • No built-in limitations

4
Why is PERL popular?
  • Well-documented and supported
  • man perl
  • O'Reilly books, esp. The Perl CD-ROM
  • comp.lang.perl
  • the open-source code movement
  • www.cpan.org, thousands of free scripts and
    modules

5
  • #!/usr/perl-5.6/bin/perl5.6.0
  • print "which file would you like to search?\n";
  • $file = <STDIN>; chomp $file;
  • open (FH, "<$file");
  • print "what pattern do you want to find?\n";
  • $pattern = <STDIN>; chomp $pattern;
  • while ($line = <FH>) {
  • chomp $line; push(@lines, $line); }
  • foreach $line2 (@lines) {
  • if ($line2 =~ /$pattern/) {
  • print "$line2\n"; } }

6
Running a Perl Program
  • In UNIX, first set file permissions
  • chmod u+x filename.pl
  • filename.pl or
  • perl filename.pl
  • In Windows/DOS,
  • C:\> filename.pl or C:\> perl filename.pl
  • or click on the file from Windows Explorer

7
Perl Programs
  • First line states the path to the perl
    interpreter. At UTK UNIX:
  • PATH                             VERSION
  • #!/usr/misc/bin/perl             4.0
  • #!/soft/script/bin/perl          5.003
  • #!/usr/perl-5.6/bin/perl5.6.0    5.6.0
  • Unsure? Type which perl and/or perl -v

8
Perl Programs
  • Comments begin with a pound sign (#) and go
    until end-of-line
  • Commands are terminated by a semicolon (;) and
    can go for multiple lines
  • Blocks of commands are surrounded by curly
    braces { }
  • The standard file suffix for a perl program is
    .pl
  • White space is polite but optional

9
Scalar Variables
  • Contain a single element: a string of characters
    or a number (can be any length)
  • Begin with a dollar sign ($)
  • Names are case-sensitive ($var ne $VAR)
  • Values are assigned with an equals sign (=)
  • Variables do not need to be pre-declared
    (automatically null or zero until you assign
    values).
  • Single quotes mean literal; double quotes mean
    interpolate. Example: $name = "$userid\n";
  • gives the userid and a line return
  • '$userid\n' is the same as "\$userid\\n"
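The quoting rules above can be sketched as follows. This is only an illustration; the userid value is a made-up placeholder, not something from the slides.

```perl
use strict;
use warnings;

my $userid = 'smiile';              # hypothetical value for illustration

my $literal      = '$userid\n';     # single quotes: nothing is interpolated
my $interpolated = "$userid\n";     # double quotes: variable and \n expanded
my $escaped      = "\$userid\\n";   # backslashes switch interpolation off

print "literal:      $literal\n";
print "interpolated: $interpolated";
print "literal and escaped are identical\n" if $literal eq $escaped;
```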

10
Scalar Variables
  • Concatenation is achieved with a dot (.)
  • $var1 = "Hello";
  • $var2 = $var1 . " world!\n";
  • print $var2; prints Hello world!
  • The default variable, $_
  • foreach $var (@list) { print $var } works
  • foreach $_ (@list) { print $_ } also works
  • print foreach (@list); also works

11
Array Variables
  • Arrays are ordered lists of scalars (strings or
    numbers)
  • Names begin with the at sign (@)
  • Individual elements of an array are specified by
    their index number. For an array named @array,
    the first element is referred to by
    $array[0]. The last element is $array[-1] or
    $array[99] (if the array has 100 elements).

12
Array Variables
  • In a scalar context @array is the number of
    elements in the array
  • $y = @array;
  • When quoted, "@array" returns all of the elements
    in the array separated by a space.
  • $#array is the index number of the last element
    in the array, i.e., $#array == (@array - 1)
  • The default array is @_ (it holds the arguments
    passed to a subroutine).
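A short sketch of the array contexts described above; the array contents are arbitrary example values.

```perl
use strict;
use warnings;

my @array = qw(zero one two three);

my $count      = @array;       # scalar context: number of elements
my $last_index = $#array;      # index of the last element
my $quoted     = "@array";     # elements separated by single spaces
my $last       = $array[-1];   # negative index counts from the end

print "count=$count last_index=$last_index last=$last\n";
print "quoted: $quoted\n";
```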

13
Array Input
  • Like scalar assignment
  • @foods = ("pizza", "salad", "beer"); or
  • @foods = qw(pizza salad beer);
  • Push adds an element to the end of the list
  • while($x=<FH>) {push (@lines,$x)}
  • Unshift adds an element to the beginning
  • while($x=<FH>) {unshift (@lines, $x)}

14
Array Input
  • Individual Array Elements can be assigned like
    scalar variables
  • $foods[0] = "pizza"; $foods[1] = "salad";
  • Can be read from the standard input
  • @lines = <STDIN>; or push (@lines,$_) while
    (<>);
  • The split command divides up the elements of a
    scalar into an array based on a delimiter
  • @line = split / /, $lines[0];

15
Array Input Split
  • Syntax: split /delimiter/, string
  • Example
  • $line = "Time flies like an arrow; fruit flies
    like a banana.";
  • @time = split /\s+/, $line;
  • print "$time[5] $time[6]\n"; prints fruit
    flies
  • A neat trick, grab only elements 5 and 6
  • ($word5,$word6) = (split " ", $line)[5,6];
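The split example can be run as the sketch below. It assumes whitespace is the intended delimiter, since the delimiter itself is partly lost in the transcript.

```perl
use strict;
use warnings;

my $line = 'Time flies like an arrow; fruit flies like a banana.';

# Split on runs of whitespace into a list of words.
my @time = split /\s+/, $line;

# List slice: keep only elements 5 and 6 of the split result.
my ($word5, $word6) = (split ' ', $line)[5, 6];

print "$time[5] $time[6]\n";
print "$word5 $word6\n";
```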

16
Array Output
  • By specifying the index
  • while ($x < @lines) {
  • print "$lines[$x]\n";
  • $x++; }
  • By using foreach
  • foreach (@lines) {
  • print "$_\n"; }

17
Array Output
  • By using join
  • $file = join "", @foods;
  • $file is now pizzasaladbeer
  • By using pop and shift
  • $first = shift @foods; $last = pop @foods;
  • Order can be sorted or reversed
  • @sorted = sort @foods; # beer pizza salad
  • @reversed = reverse @sorted; # salad pizza beer
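The output functions above, collected into one runnable sketch. Note that shift and pop change the array, so they are applied last here.

```perl
use strict;
use warnings;

my @foods = qw(pizza salad beer);

my $file     = join '', @foods;      # no separator between elements
my @sorted   = sort @foods;          # default sort is by string order
my @reversed = reverse @sorted;

my $first = shift @foods;            # removes and returns the first element
my $last  = pop @foods;              # removes and returns the last element

print "$file\n";
print "@sorted\n@reversed\n";
print "first=$first last=$last left=@foods\n";
```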

18
File Input/Output
  • Prompt the user for input
  • print "which file would you like to read?\n";
  • $filename = <STDIN>;
  • chomp $filename; # get rid of that pesky newline.
  • Use the default @ARGV array
  • @ARGV is the list of arguments supplied by the
    user from the command line.
  • $pattern = $ARGV[0];
  • $filename = $ARGV[1]; # for a script run as
  • program.pl pattern filename

19
Input/Output
  • Since @ARGV is a default variable in PERL, you
    can open files explicitly
  • open(FH,"<$ARGV[1]");
  • while($var=<FH>) {print $var}
  • close FH;
  • Or, let Perl assume you know what you're doing
  • while(<>) {print}
  • opens each file named in @ARGV in turn (or the
    standard input if there are none), assigns each
    line to the default variable, and prints.

20
Input/Output
  • The default or standard output is the terminal
    screen
  • while(<>) {
  • print $_ if ($_ =~ /pattern/) }
  • Which can be redirected from the cmd line
  • program.pl pattern filename > newfile

21
Input/Output
  • Or, you can explicitly open an output filehandle
  • open(OUT,">output.txt");
  • while($_=<>) {
  • print OUT if (/pattern/) }
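A sketch of explicit input and output filehandles, written in a more defensive style than the slides: lexical filehandles, three-argument open, and an error check. To keep it self-contained it reads from and writes to in-memory strings, which is an assumption made for illustration; with real files the scalar references would be replaced by filenames.

```perl
use strict;
use warnings;

my $input  = "alpha\nbeta\ngamma\n";   # stands in for a file on disk
my $output = '';                       # stands in for output.txt

open(my $in,  '<', \$input)  or die "cannot open input: $!";
open(my $out, '>', \$output) or die "cannot open output: $!";

# Copy only the lines that match the pattern to the output handle.
while (my $line = <$in>) {
    print {$out} $line if $line =~ /et|mm/;
}

close $in;
close $out;

print $output;
```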

22
Exercise: Suppose you have the following SAS
output for 100 variables. Write a PERL program
that extracts and prints just the variable name
and the p-value of each signed rank test, one
variable name and p-value per line.

    The SAS System      23:11 Sunday, February 4, 2001   1
    The UNIVARIATE Procedure
    Variable: C3C4CAL1ALPHA
    Test           -Statistic-    -----p Value------
    Student's t    t  0.095791    Pr > |t|    0.9246
    Sign           M  1.5         Pr >= |M|   0.6776
    Signed Rank    S  5.5         Pr >= |S|   0.8715

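One possible approach to the exercise, as a sketch only. It assumes each test sits on a single line, that the p-value is the last number on the "Signed Rank" line, and that each block starts with a "Variable" line; the subroutine name signed_rank_pvalues is made up for this example.

```perl
use strict;
use warnings;

# Returns one "NAME PVALUE" string per signed rank test found in the lines.
sub signed_rank_pvalues {
    my @lines = @_;
    my ($variable, @results);
    for my $line (@lines) {
        if ($line =~ /Variable:?\s+(\S+)/) {
            $variable = $1;                      # remember the current name
        }
        elsif (defined $variable
               && $line =~ /^\s*Signed Rank\b.*?(\d*\.\d+)\s*$/) {
            push @results, "$variable $1";       # last number on the line
        }
    }
    return @results;
}

my @sample = (
    'Variable: C3C4CAL1ALPHA',
    "Student's t    t  0.095791    Pr > |t|    0.9246",
    'Sign           M  1.5         Pr >= |M|   0.6776',
    'Signed Rank    S  5.5         Pr >= |S|   0.8715',
);
print "$_\n" for signed_rank_pvalues(@sample);
```

To run it on a real file, the sample list could be replaced by the lines read from the input, for example signed_rank_pvalues(<>).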
23
Control Structures
  • if (some_statement) {
  • do something
  • do another something }
  • elsif (other_statement) {
  • do something else }
  • else {
  • do this only if both statements false }
  • NOTE: unless (!statement) is equivalent to
    if (statement)

24
Control Structures
  • while (some_statement) {
  • do something until statement becomes false. }
  • Equivalent to
  • until (!some_statement) {
  • do something }

25
The Nature of Truth
  • "0" and "" are false; everything else is true.
  • 0 converts to "0", so false
  • 1-1 computes to 0, then converts to "0", so
    false
  • 1 converts to "1", so true
  • "" empty string, so false
  • "1" not "" or "0", so true
  • "00" not "" or "0", so true (this one is weird,
    watch out)
  • "0.000" also true for the same reason and
    warning
  • undef evaluates to "", so false
  • Schwartz, Christiansen and Wall, 1997. Learning
    Perl.
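The truth rules above can be checked directly. This sketch evaluates each listed value in a boolean context and prints the result.

```perl
use strict;
use warnings;

# The values from the list above, in the same order.
my @tests = (0, 1 - 1, 1, '', '1', '00', '0.000', undef);

# Evaluate each value in boolean context.
my @truth = map { $_ ? 'true' : 'false' } @tests;

for my $i (0 .. $#tests) {
    my $shown = defined $tests[$i] ? "'$tests[$i]'" : 'undef';
    print "$shown is $truth[$i]\n";
}
```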

26
Boolean Operators
  • && And
  • || Or
  • ! Not
  • while (($_ = <>) && ($x != 12)) {
  • print if (/Signed/ || /Student/);
  • $x++ }
  • prints lines containing Signed or Student from
    the first twelve lines of the standard input.

27
Comparison Operators
  •                            Numeric   String
  • Equal                      ==        eq
  • Greater than               >         gt
  • Less than                  <         lt
  • Greater than or equal      >=        ge
  • Less than or equal         <=        le
  • Not equal                  !=        ne
  • Compare, signed return     <=>       cmp
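Why the two operator families matter, as a small sketch with arbitrary example values: the same data sorts and compares differently as numbers and as strings.

```perl
use strict;
use warnings;

my @values = (10, 9, 100);

my @numeric = sort { $a <=> $b } @values;   # compares as numbers
my @string  = sort { $a cmp $b } @values;   # compares character by character

my $numeric_equal = ('1.0' == 1)   ? 'yes' : 'no';   # same number
my $string_equal  = ('1.0' eq '1') ? 'yes' : 'no';   # different strings

print "numeric: @numeric\n";
print "string:  @string\n";
print "== says $numeric_equal, eq says $string_equal\n";
```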

28
What's wrong with this Picture?
  • If ((x 25) (y lt 25) )
  • print y\n

29
Arithmetic Operations
  • Plus +
  • Minus -
  • Divide / (floating point mode default)
  • Multiply *
  • Exponentiate **
  • Modulus %

30
Example
  • Suppose I have a data file that has K variables
    and N observations, and I want the average for
    each K across all N
  • Obsn, varname(1), varname(2), ... varname(k)
  • 1, data(1), data(2), ... data(k)
  • 2, data(1), data(2), ... data(k)
  • N, data(1), data(2), ... data(k)

31
  • while (<>) { # first read in each line, find the sum
  • chomp; @eachline = split /\,\s*/, $_;
  • if ($x < 1) { # don't include the variable names
  • @varnames = @eachline;
  • @eachline = (); $x++; }
  • else {
  • $k = 0;
  • shift @eachline; # get rid of obsn number.
  • while ($k < @eachline) {
  • $sum[$k] = $sum[$k] + $eachline[$k];
  • $k++; }
  • $n++; } } # counts n, the number of obsns.

32
  • while ($z < $k) {
  • $average[$z] =
  • (int(1000 * $sum[$z] / $n))/1000;
  • $z++; }
  • shift @varnames;
  • print "@varnames\n";
  • print "@average\n";
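The averaging example can also be written as a subroutine, sketched below under the assumption that fields are separated by a comma and optional whitespace. The name column_averages and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Takes the lines of the data file (header line first) and returns
# references to the variable names and to their averages.
sub column_averages {
    my @lines = @_;
    chomp(my $header = shift @lines);
    my @varnames = split /,\s*/, $header;
    shift @varnames;                        # drop the "Obsn" label

    my (@sum, $n);
    $n = 0;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /,\s*/, $line;
        shift @fields;                      # drop the observation number
        $sum[$_] += $fields[$_] for 0 .. $#fields;
        $n++;                               # count the observations
    }

    # Truncate to three decimal places, as in the slide.
    my @average = map { int(1000 * $_ / $n) / 1000 } @sum;
    return (\@varnames, \@average);
}

my ($names, $avg) = column_averages(
    "Obsn, height, weight\n",
    "1, 1.5, 10\n",
    "2, 2.5, 20\n",
);
print "@$names\n";
print "@$avg\n";
```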

33
Hashes
  • Also known as associative arrays
  • Consist of pairs of keys and values.
  • Useful for database implementations
  • Hash names begin with the percent sign (%)
  • Unlike arrays which are ordered lists indexed by
    integers, hashes are unordered lists indexed by
    keys.
  • Example: %emails = (Jon => 'smiile@utk.edu',
  • AJ => 'ajw@utk.edu');
  • print "$emails{AJ}\n"; prints ajw@utk.edu
  • Note: hashes are indexed by curly, not square,
    brackets!

34
Hash Input
  • Three ways to get data into a hash
  • Assigning, with commas
  • %grades = ("Jon", "A", "Harley", "C", "Marco", "B");
  • => is a more readable synonym for comma
  • %grades = (Jon=>"A", Harley=>"C", Marco=>"B");
  • Assign each element in scalar context
  • $grades{Jon}="A"; $grades{Harley}="C";
    $grades{Marco}="B";

35
Hash Input
  • If a key already exists, adding it to the hash
    will clobber the previous value of that key. To
    prevent this
  • unless (exists ($emails{$name})) {
  • $emails{$name}=$email }
  • Or
  • if (!$emails{$name}) {
  • $emails{$name}=$email }
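The two guards above are not quite the same: exists checks whether the key is present, while the truth test also fires when the key holds an empty or "0" value. A sketch with made-up names and addresses:

```perl
use strict;
use warnings;

# 'Guest' is present but has an empty value (hypothetical data).
my %by_exists = (Jon => 'smiile@utk.edu', Guest => '');
my %by_truth  = %by_exists;

for my $name (qw(Jon Guest AJ)) {
    my $email = 'new@example.com';          # hypothetical new address
    $by_exists{$name} = $email unless exists $by_exists{$name};
    $by_truth{$name}  = $email if !$by_truth{$name};
}

print "exists guard: Guest is '$by_exists{Guest}'\n";
print "truth guard:  Guest is '$by_truth{Guest}'\n";
```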

36
Hash Output
  • Refer to the value with the key
  • print $grades{Marco};
  • Grab all the keys and sort alphanumerically
  • print sort keys %grades;
  • Just the values
  • print sort values %grades;

37
Hash Output
  • Values and keys
  • foreach $key (keys %grades) {
  • print "$key got a $grades{$key}\n" }
  • Or use each
  • while (($name,$grade) = each(%grades)) {
  • print "$name got a $grade\n" }

38
Hash Functions
  • Delete a key-value pair from a hash
  • delete $hashname{key};
  • Make all the keys values and values keys
  • reverse %hashname
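A sketch of hash output and the hash functions above, using the grades from the earlier slides. The keys are sorted so the output order is predictable, since hashes themselves are unordered.

```perl
use strict;
use warnings;

my %grades = (Jon => 'A', Harley => 'C', Marco => 'B');

# One report line per key, in sorted key order.
my @report = map { "$_ got a $grades{$_}" } sort keys %grades;
print "$_\n" for @report;

delete $grades{Harley};             # remove one key-value pair

# Swap keys and values; this only works cleanly if values are unique.
my %by_grade = reverse %grades;
print "A belongs to $by_grade{A}\n";
```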

39
A Sample CGI Form
40
A Sample CGI Script
  • #!/usr/perl-5.6/bin/perl5.6.0
  • # invoke the perl interpreter
  • read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
  • # slurp in the data from the CGI form
  • # the buffer comes in the form,
  • # lastname=Frederick&firstname=Jon&email=smiile@utk.edu&phone=555-1212
  • # so it must be parsed into the separate data
    fields.
  • @pairs = split(/&/, $buffer);

41
A Sample CGI Script
  • foreach $pair (@pairs) {
  • ($name, $value) = split(/=/, $pair);
  • $FORM{$name} = $value; }
  • open(MEM, "<memberemails.txt");
  • while (<MEM>) {
  • chomp;
  • $seen{$_} = 1; }
  • close MEM;
  • # the value of one is arbitrary for keys of %seen
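The form parsing steps can be sketched as one subroutine. This version also decodes URL-encoded characters (+ for space, %XX for other characters), which the slides do not show and is added here as an assumption about what real form data looks like. The name parse_form is made up.

```perl
use strict;
use warnings;

# Parses "name=value&name=value" form data into a hash.
sub parse_form {
    my ($buffer) = @_;
    my %form;
    for my $pair (split /&/, $buffer) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # e.g. %40 becomes @
        }
        $form{$name} = $value;
    }
    return %form;
}

my %FORM = parse_form(
    'lastname=Frederick&firstname=Jon&email=smiile%40utk.edu&phone=555-1212'
);
print "$_ is $FORM{$_}\n" for sort keys %FORM;
```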

42
A Sample CGI Script
  • $address = $FORM{email};
  • if ($seen{$address}) {
  • print "Content-type: text/html\n\n";
  • print "You're already a member!<p>"; }
  • else {
  • print "Content-type: text/html\n\n";
  • foreach $key (sort keys %FORM) {
  • print "$key is $FORM{$key}<p>"; }
  • open(MEM,">>memberemails.txt");
  • print MEM "$address\n"; close MEM; }

43
Regular Expressions
  • Regular expressions are patterns to be matched
    against a string
  • Perl regular expressions are a superset of those
    used by the UNIX utilities grep, sed, vi and awk
  • We've already seen
  • print if (/pattern/);
  • Which is shorthand for
  • print $_ if ($_ =~ m/pattern/);

44
Pattern Matching Operators/Functions
  • $var=~m/pattern/
  • the match operator
  • $var=~s/pattern/replacementpattern/g
  • the substitution operator
  • g modifier means all occurrences on each line
  • @list = split /pattern/, $var
  • splits $var into a list with pattern as
    delimiter
  • $var = join "delimiter", @list
  • joins list into a single variable (the
    delimiter is a plain string, not a pattern)
  • /pattern/i i means ignore case

45
Regular Expressions
  • Metacharacters
  • \ | ( ) [ ] { } ^ $ * + ? .
  • Backslash means escape or literal
    interpretation of metacharacters
  • $var =~ s/\|\$/pipe-dollar/;
  • means replace |$ with pipe-dollar
  • Escaping normal alphanumeric characters turns
    them (some of them) into metacharacters
  • \s means white space (space, tab, or newline)
  • \n means line return

46
Regular Expressions
  • | means or; parentheses allow grouping
  • print if (/Dept of (Psychology|Biology)/);
  • prints lines containing
  • Dept of Psychology or Dept of Biology
  • . means any character
  • * means any number of the previous character
  • /Psych.*/ matches Psychology or Psychiatry
  • + means one or more of the previous character
  • $lines=~s/\s+/\t/g;
  • replace one-or-more spaces with a tab

47
Regular Expressions
  • ^ means beginning of the line
  • $ means end of the line
  • s/^\s+// gets rid of spaces at beginning of
    line
  • [ ] identify a character class
  • s/[A-Ex2]/R/g replaces A, B, C, D, E, 2, or x
    with R.
  • [^ ] identifies a negative character class
  • \w any word character [a-zA-Z0-9_]
  • while(<>) {
  • /\@/ && print "$_\n" foreach (split /[^\w\@\.\-]/); }
  • extracts email addresses from an html file
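The email extraction idea can be sketched as a subroutine: split the text on every character that cannot appear in an address, then keep the pieces containing an at sign. This is a rough filter rather than a full address validator (a sentence-ending period directly after an address would be kept, for example). The name extract_emails and the sample text are made up.

```perl
use strict;
use warnings;

# Returns the pieces of $text that look like email addresses.
sub extract_emails {
    my ($text) = @_;
    # Split on runs of characters outside the class [\w@.-],
    # then keep only the pieces that contain an at sign.
    return grep { /\@/ } split /[^\w\@.\-]+/, $text;
}

my @found = extract_emails(
    'Contact <a href="mailto:smiile@utk.edu">Jon</a> or ajw@utk.edu today'
);
print "$_\n" for @found;
```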

48
Command Line Options
  • perl -w filename.pl
  • Warnings mode, provides extra detail about
    potential flaws in code
  • perl -c filename.pl
  • Test if file compiles successfully without
    actually running
  • perl -e 'command1; command2'
  • Command line switch runs perl code typed
    directly on the command line.
  • perl -e 'sleep(120); while (1) {print "\a"}'
  • a cheap alarm clock

49
Subroutines
  • Defining a subroutine
  • sub name { ... }
  • Invoking a subroutine
  • &name;
  • print "What's your name? ";
  • chomp ($name = <STDIN>);
  • &hello;
  • sub hello {
  • print "Hello, $name!\n"; }
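The slide's subroutine reads a global variable. A common alternative, sketched here, passes the value as an argument, which arrives in the default array @_, and returns the result instead of printing it. The name used in the call is just an example.

```perl
use strict;
use warnings;

# Arguments arrive in @_; the greeting is returned to the caller.
sub hello {
    my ($name) = @_;
    return "Hello, $name!\n";
}

print hello('Jon');
```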

50
System Calls
  • Backticks execute an expression from the command
    line and return the standard output
  • $files = `ls`;
  • @files = split /\n/,$files;
  • system( ) just executes the expression and
    returns its exit status: 0 if successful,
    nonzero if not
  • system ("mailx -s \"test mailing\" smiile\@utk.edu < file");
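A sketch of both kinds of system call and of the return value of system. To avoid depending on any particular external program, it runs the perl interpreter itself through the special variable $^X; that choice is an assumption made for this example only.

```perl
use strict;
use warnings;

# Backticks capture the standard output of the command.
my $output = `$^X -e "print 6*7"`;

# system() returns the wait status: 0 on success; on failure the
# command's exit code sits in the high byte, so shift right by 8.
my $ok_status   = system($^X, '-e', 'exit 0');
my $fail_status = system($^X, '-e', 'exit 3');
my $exit_code   = $fail_status >> 8;

print "output=$output\n";
print "success status=$ok_status, failure exit code=$exit_code\n";
```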

51
Additional Resources
  • CGI Course, March 28 and April 6. See
  • http://web.utk.edu/training
  • Another PERL tutorial
  • http://www.netcat.co.uk/rob/perl/win32perltut.html
  • A Directory of PERL tutorials
  • http://www.astentech.com/tutorials/Perl.html
  • Schwartz, R., Christiansen, T., & Wall, L.
    (1997). Learning Perl. Sebastopol, CA: O'Reilly &
    Associates.

52
Additional Resources
  • The PERL Bookshelf (CD-ROM with 6 books).
    O'Reilly & Associates. Includes Learning Perl.
  • Christiansen, T., & Torkington, N. (1998). Perl
    Cookbook. Sebastopol, CA: O'Reilly & Associates.
  • UNIX for Windows
  • http://www.research.att.com/dgk/uwin/