Title: Review: filters, bash programming
1Review filters, bash programming
- CSRU3130, Spring 2008
- Ellen Zhang
2Announcements
- Homework 4 due today
- Makeup class exercise on shell programming
- Tuesday, Feb 26, 1pm-2:15pm, Computer lab
- Wednesday, Feb 27, 10am-11:15am, Computer lab
- Midterm exam
- Friday, Feb 29
- Closed book, ok to bring a one-page cheatsheet
3Course work expectations
- At least 3 hours of study each week
- After each class, review the slides; make sure you understand and remember the important concepts
- Command substitution, filename expansion, command line arguments, standard input/output/error
- Shell loop structure, branch structure
- At least once a week, read the textbook and online tutorial
- Homework/Exercise: experiment with computers, learn by trying things out
4For the rest of the course
- Regular quizzes
- More in-class exercises, questions
- Final grades
- Will reflect not only how well you did, but also how hard you tried
5Outline
- Bash programming review
- Homework 3 Q1, Q2, Q4
- grep family and regular expression
- Homework 3 Q5
- More filters
- sed
- sort, tr, cut, paste, comp, uniq
6Last class
- Shell's filename expansion (pattern matching)
- rm *, ls *
- ls [ab].o, ls ?.o, ls [a-z].o
- Command substitution: substitute with the standard output of a command
- file_num=`ls -l | wc -l`
- cur_time=$(date)
- Or used in a script
- for student in `cat students.txt`
7set command (1)
- When invoked without arguments, set shows the values of all shell variables, including environment variables
- set | grep PATH
- echo $PATH
- If followed by ordinary arguments (none starting with -), set resets the values of $1, $2, ...
- set Hello world: $1 is set to Hello, $2 to world
- set Hello World!
- set `ls -l | wc`
8set command (2)
- When invoked with options, set turns shell options on or off
- -f: disable filename expansion (globbing)
- -v: print shell input lines as they are read
- -x: print a trace of simple commands and their arguments after they are expanded and before they are executed
- To unset these options, use + instead of -
- set +f
9Tracing shell commands
- Display commands after they are expanded
- [zhang@storm Demo]$ set -x
- [zhang@storm Demo]$ ls
- + ls --color=tty
- b.o CCodes chdir dd Examples exercise1 fe maillist.txt pick README README1 subshell subshell_nopar test
- + echo -ne '\033]0;zhang@storm:~/Demo
10Examples
- Let's see how the pick script is executed
- #!/bin/bash
- set -x
- for i
- do
- echo -n "$i (y/n)" 1>&2
- read input
- if [ "$input" = 'y' ]
- then
- str="$str $i"
- fi
- done
- echo $str
("for i" is the same as for i in "$@")
11Tracing pick script
- [zhang@storm Demo]$ echo "red green blue yellow" > color.txt
- [zhang@storm Demo]$ ./pick `cat color.txt`
- + for i in '"$@"'
- + echo -n 'red (y/n)'
- red (y/n)+ read input
- y
- + '[' y = y ']'
- + str=' red'
- + for i in '"$@"'
- + echo -n 'green (y/n)'
- green (y/n)+ read input
- n
- + '[' n = y ']'
- + for i in '"$@"'
- + echo -n 'blue (y/n)'
- blue (y/n)+ read input
- n
- + '[' n = y ']'
- + for i in '"$@"'
- + echo -n 'yellow (y/n)'
- yellow (y/n)+ read input
- y
- + '[' y = y ']'
- + str=' red yellow'
- + echo red yellow
- red yellow
- [zhang@storm Demo]$
12Homework 3 Question 1
- a script that prints out all empty files (fe).
- Syntax: fe path_name
- Function
- check all files located under the given directory, and print the list of empty files to standard output.
- e.g., suppose the directory contains two empty files, a.o and b.o; then ./fe will print
- a.o b.o
13Hw 3 Question 1: find out commands needed
- What do we need to do?
- Find the files under the given directory
- For each file, check whether it is a regular file and is empty; if so, print its name to standard output
- Go to the course web page, Advanced Bash-Scripting Guide, Part 2, 7 Tests, 7.2 File test operators
- test -f fileName
- test -s fileName
- Put them together?
14AND list construct
- statement1 && statement2 && statement3
- The first statement is executed
- If it returns false, the list returns false; if it returns true, the next statement is executed, and so on until the first false statement is found or all statements have returned true
- For example
- test -f myFile && test -s myFile
- If myFile is not a regular file (i.e., the first statement returns false), then the second test command is not executed and the whole list returns false
15AND list construct
- statement1 && statement2 && statement3
- A statement can be
- a test command: test -f fileName, [ -n "$string" ]
- other commands
- e.g., echo always returns true
- grep returns true if it finds a match
- In a script, use exit n to return an exit code
- 0: success; otherwise failure
- If a script doesn't have exit n, its return code is the return code of the last command in the script
- Ex: [ "$str" = Morning ] && echo Good $str
16OR list construct
- statement1 || statement2 || statement3 ...
- The first statement is executed
- If it returns true, the list returns true (statement2 and statement3 are not executed)
- If it returns false, the next statement is executed, and so on until the first statement that returns true is found (the construct then returns true) or all statements have returned false (the construct then returns false)
- Ex: [ "$str" = Morning ] || echo Good Afternoon
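The AND and OR lists can be tried out with a small sketch. The function names is_empty_regular and greet are hypothetical, chosen only for this illustration, and the scratch files are created just for the demo.

```shell
#!/bin/bash
# Sketch of AND (&&) and OR (||) lists; function names are illustrative only.

# Succeeds (exit status 0) only if $1 is a regular file AND its size is zero.
is_empty_regular() {
    test -f "$1" && ! test -s "$1"
}

# The || branch runs only when the string test fails.
greet() {
    [ "$1" = Morning ] && echo "Good $1" || echo "Good Afternoon"
}

tmp=$(mktemp -d)             # scratch directory for the demo
: > "$tmp/empty"             # create an empty file
echo data > "$tmp/full"      # create a non-empty file

is_empty_regular "$tmp/empty" && echo "empty: yes"
is_empty_regular "$tmp/full"  || echo "full: no"
greet Morning                # prints: Good Morning
greet Evening                # prints: Good Afternoon
rm -r "$tmp"
```

Note that a && b || c is not a full if/else: c also runs when a succeeds but b fails. Here b is echo, which is assumed to succeed.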
17Hw 3 Question 1 put them together
- Use loop and branch constructs
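- for i in (each file under the given directory)
- do
- if file $i is an empty, regular file
- then
- print the filename $i to standard output
- fi
- done
Building blocks:
- list the files: `ls $1`
- empty regular file: test -f $i && ! test -s $i
- the same with brackets: [ -f $i ] && [ ! -s $i ]
- print the name: echo $i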
18Hw 3 Question 1 the script
- #!/bin/bash
- # fe: find and print out all files of zero size
- #
- for i in `ls $1`
- do
- if [ -f $i ] && [ ! -s $i ]
- then
- echo $i
- fi
- done
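One caveat with the fe script: ls prints names without the directory prefix, so the tests only look at the right files when the given directory is the current one. A hedged variant that loops over a glob instead is sketched below; the function name fe_list and the default of "." are assumptions made for this example, not part of the homework solution.

```shell
#!/bin/bash
# Sketch: list empty regular files in a directory (default: current directory).
# fe_list is a hypothetical name; it prints one file name per line.
fe_list() {
    local dir=${1:-.}
    local f
    for f in "$dir"/*
    do
        # -f: regular file; ! -s: size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]
        then
            basename "$f"
        fi
    done
}

demo=$(mktemp -d)                 # scratch directory for the demo
: > "$demo/a.o"
: > "$demo/b.o"
echo hello > "$demo/notes.txt"
fe_list "$demo"                   # prints a.o and b.o, one per line
rm -r "$demo"
```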
19A better solution
- du -a | grep '^0' | cut -f2 | tr '\n' ' '
- Courtesy Michael Karp
20Hw3 Q2 find 10 largest files
- Original list10 script
- if [[ "$1" == -[0-9]* ]]
- then
- for path in "$@"
- do
- if [ $path != $1 ]
- then
- path_list="$path_list $path"
- fi
- done
- ls -Rl $path_list | sort -k 5 -nr | head $1
- else
- ls -Rl $* | grep ^- | sort -k 5 -nr | head -10
- fi
21Hw3 Q2 extended version
- Allow the user to specify the number (i.e., -n) anywhere
- Usage: list10 ~/Documents ~/Projects -30
- How to do this? In two steps
- Parse the command line arguments to find out
- How many files does the user want to display?
- Which directories should be checked?
- Call ls, grep, sort, head with appropriate arguments to do the work
22Hw3 Q2 parsing command line arguments
- Goal: find out
- 1. How many files does the user want to display?
- 2. Which directories should be checked?
- Use variables (lineNum, pathList) to save the answers to the above questions
- Check each command line argument
- If it is a number argument, save it to lineNum
- Otherwise, the argument is a path name, and we append it to pathList
23Now try to put them together
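One possible shape for the parsing step is sketched here. lineNum and pathList follow the variable names on the previous slide; the function name parse_args and the default of 10 lines are assumptions made for this example.

```shell
#!/bin/bash
# Sketch of the argument-parsing step for the extended list10.
# parse_args is a hypothetical name; the default of 10 is an assumption.
parse_args() {
    lineNum=10
    pathList=""
    local arg
    for arg in "$@"
    do
        case $arg in
            -[0-9]*) lineNum=${arg#-} ;;           # e.g. -30 becomes 30
            *)       pathList="$pathList $arg" ;;  # anything else is a path
        esac
    done
}

parse_args Documents Projects -30
echo "lineNum=$lineNum pathList=$pathList"

# The second step would then be along the lines of:
#   ls -Rl $pathList | grep '^-' | sort -k 5 -nr | head -n "$lineNum"
```

Because every argument is checked, the -n option can appear anywhere on the command line.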
24Hw3 Q4 delete empty files
- Goal: a script, de, that
- finds all empty files under a directory
- for each of them, asks the user yes or no
- deletes the files for which the user answers yes
- Putting them together: how?
- rm pick fe pathname
- rm `pick \`fe pathname\``
- rm $(pick $(fe pathname))
fe script
pick script
25Hw3 Q4 delete empty files (contd)
- Putting them together
- rm `pick \`fe pathname\``
- rm $(pick $(fe pathname))
- What if pick returns an empty string?
- fileChosen=`./pick $(./fe)`
- if [ -n "$fileChosen" ]
- then
- rm $fileChosen
- fi
test -n "$fileChosen" returns true if the string is not empty. Always quote the string being tested! See the Advanced Bash-Scripting Guide for a complete list of tests.
26Here document
- Pass input that is embedded in the script to a command
- #!/bin/bash
- mail << DELIMIT_OF_EMAIL
- This is a test email sending using
- here document using ~/Demo/test script.
- DELIMIT_OF_EMAIL
A here document starts with <<, followed by a special string that is repeated at the end of the document.
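A minimal sketch of how the delimiter affects a here document. cat stands in for mail so the example has no side effects; the function and variable names are made up for this illustration.

```shell
#!/bin/bash
# Sketch: here documents with and without a quoted delimiter.
name=world

# Unquoted delimiter: variables and command substitutions are expanded.
expanded() {
    cat << END_OF_TEXT
Hello, $name
END_OF_TEXT
}

# Quoted delimiter: the text is passed through literally.
literal() {
    cat << 'END_OF_TEXT'
Hello, $name
END_OF_TEXT
}

expanded    # prints: Hello, world
literal     # prints: Hello, $name
```

The quoted form is what a bundle script needs, since the packed file contents must not be altered by the shell.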
27Bundle script
- bash script, gen_bundle.sh
- Packs the files specified in the command line arguments into a bundle file
- ./gen_bundle.sh bundle.sh 411 > mybundlefile
- mybundlefile, a bundle file, is itself a shell script
- it contains the commands for unpacking, and the files themselves
- When executed, it recreates the files that were packed
28Inside mybundlefile file
- # To unbundle, bash this file
- echo bundle.sh 1>&2
- cat >bundle.sh <<'End of bundle.sh'
- #!/bin/bash
- echo '# To unbundle, bash this file'
- for i in "$@"   (or: for i)
- do
- echo "echo $i 1>&2"
- echo "cat >$i <<'End of $i'"
- cat $i
- echo "End of $i"
- done
- End of bundle.sh
- echo 411 1>&2
- cat >411 <<'End of 411'
- grep "$*" <<End
- dial-a-joke 212-976-3838
- dial-a-prayer 000000000
- dial santa 8900999
- today is `date`
- End
- End of 411
What happens if we run ./mybundlefile ?
29gen_bundle.sh
- For each file specified on the command line
- Print out the code for unpacking the file (to standard output)
- cat > FILENAME << 'END of FILENAME'
- (contents of FILE)
- END of FILENAME
30gen_bundle script
- #!/bin/bash
- echo '# To unbundle, bash this file'
- for i
- do
- echo "echo $i 1>&2"
- echo "cat >$i <<'End of $i'"
- cat $i
- echo "End of $i"
- done
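To see the bundle idea end to end, here is a hedged sketch that wraps the script body in a function and does a pack/unpack round trip in scratch directories. The file names f1 and f2 and their contents are made up for the demo.

```shell
#!/bin/bash
# Sketch: gen_bundle as a shell function, plus a pack/unpack round trip.
gen_bundle() {
    echo '# To unbundle, bash this file'
    local i
    for i
    do
        echo "echo $i 1>&2"
        echo "cat >$i <<'End of $i'"
        cat "$i"
        echo "End of $i"
    done
}

src=$(mktemp -d)     # where the original files live
dst=$(mktemp -d)     # where the bundle is unpacked
printf 'line one\nline $two\n' > "$src/f1"
printf 'other file\n' > "$src/f2"

(cd "$src" && gen_bundle f1 f2) > "$dst/mybundlefile"
(cd "$dst" && bash mybundlefile 2>/dev/null)

cmp "$src/f1" "$dst/f1" && cmp "$src/f2" "$dst/f2" && echo "round trip ok"
rm -r "$src" "$dst"
```

The sketch assumes every packed file ends with a newline and contains no line equal to its own "End of ..." delimiter; otherwise the here document would end in the wrong place.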
31HW3 Q6 gen_bundle with manifest
- modify bundle so that it packs a "manifest" into the bundle
- a list of the archived files and other info about them, i.e., what ls -l provides.
- When the bundle file is unpacked, a file named "manifest" is created with info such as
- Manifest of files bundled
- -rwxr-xr-x 1 zhang staff 104 2008-01-30 14:02 411
- -rwxr-xr-x 1 zhang staff 203 2008-02-15 11:26 bundle.sh
32Bundle how to generate manifest ?
- We know how to obtain the info needed for the manifest file
- call ls -l on each file being packed
- gen_bundle needs to generate a bash script
- that, when executed, generates a file named "manifest" with the above info
- Use a here document again!
33How to generate manifest
- #!/bin/bash
- echo '# To unbundle, bash this file'
- for i
- do
- echo "echo $i 1>&2"
- echo "cat >$i <<'End of $i'"
- cat $i
- echo "End of $i"
- done
- # generate manifest
- echo '# generate manifest'
- echo "cat >manifest <<'End of manifest'"
- for i
- do
- ls -l $i
- done
- echo "End of manifest"
34Outline
- Bash programming review
- Homework 3 Q1, Q2, Q4, Q6
- grep family and regular expression
- Homework 3 Q5
- Other filters
- sed
- sort, tr, cut, paste, comp, uniq
35Example of regular expressions
- Variable names in C
- [a-zA-Z_][a-zA-Z_0-9]*
- Dollar amount with optional cents
- \$[0-9]+(\.[0-9][0-9])?
- Time of day
- (1[012]|[1-9]):[0-5][0-9] (am|pm)
- What does [1900-2000] match?
- 1998, 2000, 1, 2, 3, 4, 8, 9 ?
- A range only applies to single characters: [1900-2000] matches one character that is 0, 1, 2 or 9
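These patterns can be checked against sample strings with grep -E. The helper name matches is hypothetical; it uses -x so that the whole string has to match, which is an added assumption for this demo.

```shell
#!/bin/bash
# Sketch: test whole strings against extended regular expressions.
# matches PATTERN STRING succeeds if the entire string matches the pattern.
matches() {
    printf '%s\n' "$2" | grep -Eqx -e "$1"
}

c_name='[a-zA-Z_][a-zA-Z_0-9]*'
dollars='\$[0-9]+(\.[0-9][0-9])?'
time_of_day='(1[012]|[1-9]):[0-5][0-9] (am|pm)'

matches "$c_name" file_num        && echo "file_num is a C name"
matches "$c_name" 2fast           || echo "2fast is not a C name"
matches "$dollars" '$12.50'       && echo '$12.50 is a dollar amount'
matches "$time_of_day" '12:45 pm' && echo "12:45 pm is a time of day"
matches '[1900-2000]' 1998        || echo "[1900-2000] does not match 1998 as a whole"
matches '[1900-2000]' 9           && echo "[1900-2000] matches the single character 9"
```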
36HW3 Q5 nstudents script
- How many student accounts are there on storm?
- The group ID for students is 501
- User accounts are stored in the file /etc/passwd
- walton:x:2747:501:Myasia D Walton, FAll Jan 17, 2006 Pro. Tran:/home/students/walton:/bin/bash
- What does each field mean?
37HW3 Q5 nstudents script (contd)
- Source of information
- Each line in /etc/passwd corresponds to a user
- A student account has a group id of 501
- How to find the # of student accounts?
- Select lines where the group id field is 501
- grep ...
- Count the number of lines using wc -l
- Use a pipeline to connect the two
- grep ... /etc/passwd | wc -l
38HW3 Q5 nstudents script (contd)
- Select accounts (lines) where the group id is 501
- grep 501 /etc/passwd
- Problem: accounts with user id 2501, or group id 5501, will be selected too
- grep :501: /etc/passwd
- Problem: accounts with user id 501 will be selected too
- grep ":501:[a-zA-Z]" /etc/passwd
- A 501 field, followed by a field starting with a letter (the info field)
- grep ":[0-9]*:501:" /etc/passwd
- A numeric (user id) field followed by a 501 field, i.e., the fourth field (group id) is 501; the safest solution
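The difference between these patterns shows up on a few sample lines. The account lines below are made up for the demo, and count_group is a hypothetical helper that uses an anchored variant of the pattern (skip exactly three fields, then match the group id), swapped in here rather than taken from the slide.

```shell
#!/bin/bash
# Sketch: count accounts whose group id (4th field) equals a given value.
# On a real system the input would be /etc/passwd.
count_group() {    # usage: count_group gid file
    grep -c "^[^:]*:[^:]*:[^:]*:$1:" "$2"
}

sample=$(mktemp)
cat > "$sample" << 'EOF'
alice:x:2747:501:Alice A:/home/students/alice:/bin/bash
bob:x:501:500:Bob B:/home/staff/bob:/bin/bash
carol:x:2501:5501:Carol C:/home/other/carol:/bin/bash
dave:x:3000:501:Dave D:/home/students/dave:/bin/bash
EOF

grep -c 501 "$sample"        # 4: the naive pattern matches every line
grep -c ':501:' "$sample"    # 3: still counts bob, whose user id is 501
count_group 501 "$sample"    # 2: only alice and dave
rm "$sample"
```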
39Grep back-references
- Find accounts whose uid = groupid
- grep ':\([0-9]*\):\1:' /etc/passwd
- Back-references
- Use \( and \) to specify sub-expressions (i.e., tagged expressions)
- \n: back-reference specifier, matches exactly the string that matched the nth tagged expression
- e.g., search all lines where the first word is the same as the last
- grep '^\([[:alpha:]]\{1,\}\) .* \1$' filename
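Both back-reference patterns can be tried on made-up data. The function names same_ids and same_first_last are hypothetical, and the uid pattern here requires at least one digit, a small tightening assumed for the demo.

```shell
#!/bin/bash
# Sketch: back-references in basic regular expressions (sample data made up).
same_ids() {          # lines whose user id equals their group id
    grep ':\([0-9][0-9]*\):\1:' "$1"
}
same_first_last() {   # lines whose first word equals their last word
    grep '^\([[:alpha:]]\{1,\}\) .* \1$' "$1"
}

data=$(mktemp)
cat > "$data" << 'EOF'
zhang:x:500:500:Staff:/home/staff/zhang:/bin/bash
walton:x:2747:501:Student:/home/students/walton:/bin/bash
EOF
same_ids "$data" | cut -d: -f1      # prints zhang

cat > "$data" << 'EOF'
home sweet home
never say always
EOF
same_first_last "$data"             # prints: home sweet home
rm "$data"
```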
40Specify pattern in files
- -f option: useful for complicated patterns; also, there is no need to worry about shell interpretation.
- Example
- $ cat alphvowels
- ^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$
- $ egrep -f alphvowels /usr/share/dict/words
- abstemious ... tragedious
41Outline
- Bash programming review
- Homework 3 Q1, Q2, Q4, Q6
- grep family and regular expression
- Homework 3 Q5
- More filters
- sed
- sort, tr, cut, paste, comp, uniq
42Sed Stream-oriented, Non-Interactive, Text Editor
- Looks for patterns one line at a time, like grep, and edits (changes) lines of the file
- Non-interactive text editor
- Editing commands come in as command line arguments, or as a script
- There is an interactive editor, ed, which accepts the same commands
- For example
- sed 's/UNIX/UNIXTM/g' filename > output
- Changes all occurrences of UNIX to UNIXTM in the given file, and writes the result to the file output
43Conceptual overview
- All editing commands are applied in order to each line in the input file
- If a command changes the input, subsequent commands and addresses are applied to the current (modified) line, not the original input line
- The original input file is unchanged (sed is a filter)
- By default, results are sent to standard output (but can be redirected to a file)
- -n: turn off automatic printing
44Sed Flow of Control
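[Figure: sed flow of control. Each input line goes through cmd 1, cmd 2, ..., cmd n of the script in order; the resulting line is written to output automatically only without -n, and a print command writes to output explicitly.]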
45Address in sed command
- Specifies the range of lines a command applies to
- Specified by line number
- 4s/old/new/ : the 4th line
- 1,5s/old/new/ : the 1st to 5th lines
- 5,$s/old/new/ : from the 5th line to the end of the file
- 5,$!s/old/new/ : all lines except from the 5th line to the end of the file
- Specified by pattern matching
- /^The/s/old/new/ : lines starting with The
- /start/,/end/s/old/new/ : from the first line matching start to the next line matching end
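A short sketch showing the same substitution under different addresses. The five-line input and the helper name input are made up for the demo.

```shell
#!/bin/bash
# Sketch: one substitution, restricted by different sed addresses.
input() { printf 'old 1\nold 2\nstart old 3\nold 4\nend old 5\n'; }

input | sed '4s/old/new/'                 # only line 4 changes
input | sed '1,2s/old/new/'               # lines 1 and 2 change
input | sed '3,$s/old/new/'               # line 3 through the last line
input | sed '3,$!s/old/new/'              # every line except 3 through the last
input | sed '/^start/,/^end/s/old/new/'   # from the "start" line to the "end" line
```

The single quotes keep the shell from treating $ and ! specially.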
46sed command substitute
- Substitute (s)
- [address1[,address2]]s/pattern/replacement/[n|g]
- Optional address fields
- Pattern specified using a regular expression
- Example: replace all numbers in a file with XXXX
- sed 's/[0-9][0-9]*/XXXX/g' file.txt
- sed '/SENSITIVE_START/,/SENSITIVE_END/s/pattern/replace/g' file.txt
47sed command substitute (2)
- Substitute (s)
- [address1[,address2]]s/pattern/replacement/[n|g]
- n or g flag: which occurrence to apply the command to (the pattern might occur multiple times in a line)
- Number (1, 2, ...): only substitute that occurrence
- g: substitute all occurrences within the line
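The effect of the occurrence flag is easiest to see on a single line. The sample string is made up for the demo.

```shell
#!/bin/bash
# Sketch: the occurrence flag of the s command on one line.
line='a-b-c-d'
echo "$line" | sed 's/-/+/'     # first occurrence only: a+b-c-d
echo "$line" | sed 's/-/+/2'    # second occurrence only: a-b+c-d
echo "$line" | sed 's/-/+/g'    # every occurrence: a+b+c+d
```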
48sed Example
- to display all online users and their login IP/host name
- $ who
- zhang pts/0 2008-02-21 17:09 (ool-44c6b5bf.dyn.optonline.net)
- root pts/3 2008-02-21 10:19 (pad.cis.fordham.edu)
- kerins pts/4 2008-02-21 14:15 (74.72.35.180)
- portela pts/8 2008-02-21 15:33 (ool-182c2a62.dyn.optonline.net)
- Use sed to select the 1st and last fields
- $ who | sed -e 's/ .* / /' -e 's/(/ /' -e 's/)/ /'
- zhang ool-44c6b5bf.dyn.optonline.net
- root pad.cis.fordham.edu
- ...
49Example
- Display /etc/passwd in a user-friendly manner
- Before
- zhang:x:3504:500:Ellen Zhang , Zhang instructional:/home/staff/zhang:/bin/bash
- After
- User: zhang UserID: 3504 GroupID: 500 UserInfo: Ellen Zhang, Zhang instructional HomeDir: /home/staff/zhang Loginshell: /bin/bash
- First step: insert "User:" at the beginning of each line
- Second step: replace :x: with " UserID:"
- Third step: replace the : before the group id with " GroupID:"
- ...
50User friendly /etc/passwd
- First step: insert "User:" at the beginning of each line
- sed -e 's/^/User:/' /etc/passwd
- User:zhang:x:3504:500:Ellen Zhang , Zhang instructional:/home/staff/zhang:/bin/bash
- Replace :x: with " UserId:"
- sed -e 's/^/User:/' -e 's/:x:/ UserId:/' /etc/passwd
- User:zhang UserId:3504:500:Ellen Zhang , Zhang instructional:/home/staff/zhang:/bin/bash
- Replace the : before the group id with " GroupId:"
- sed -e 's/^/User:/' -e 's/:x:/ UserId:/' -e 's/:/ GroupId:/3' /etc/passwd
- User:zhang UserId:3504 GroupId:500:Ellen Zhang , Zhang instructional:/home/staff/zhang:/bin/bash
51sed commands delete, editing
- Delete (d): delete the specified lines
- sed 1,5d file.txt
- sed '1,5!d' file.txt
- /<pattern>/d : delete lines containing the pattern
- Note: file.txt is not touched; the specified lines are just not written to standard output
- More editing commands: append, insert, change entire lines
- sed '/secret/c\DELETED' file.txt
52Sed command print
- Print (p): print the specified (selected) lines
- sed -n 1,5p file.txt
- Displays the first five lines only, same as head -5 file.txt
- sed can be used as grep
- sed -n '/<pattern>/p' is the same as grep '<pattern>'
- !cmd: execute the command only if the line is not selected
- sed -n '/<pattern>/!p' is the same as grep -v '<pattern>'
53sed command quit
- Quit (q): end the sed session
- sed 2q file.txt
- prints two lines and then quits
- sed '/<pattern>/q' file.txt
- prints the input up to and including the first line matching the pattern, and then quits
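The delete, print and quit commands can be compared side by side on a small input. The helper name nums and its six lines are made up for the demo.

```shell
#!/bin/bash
# Sketch: sed delete (d), print (p) and quit (q) on a six-line input.
nums() { printf '%s\n' one two three four five six; }

nums | sed '1,4d'          # prints five, six
nums | sed '1,4!d'         # prints one through four
nums | sed -n '/t/p'       # like grep t: two, three
nums | sed -n '/t/!p'      # like grep -v t: one, four, five, six
nums | sed 2q              # like head -2: one, two
nums | sed '/three/q'      # one, two, three
```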
54Outline
- Bash programming review
- Homework 3 Q1, Q2, Q4, Q6
- grep family and regular expression
- Homework 3 Q5
- More filters
- sed
- sort, tr, cut, paste, comp, uniq
55sort command
- Basic functionality
- sorts the input line by line in ASCII order
- Options
- -k: use the kth field (default field delimiter is space) as the sorting key
- -n: numeric sorting (note that as strings "20" < "3", but as numbers 3 < 20)
- -r: reverse order (from largest to smallest)
- Examples
- ls -l | sort -k 5 -nr | head -10
- ls -l | sort -k 6,7
56sort command
- The default field separator is space; use the -t option to specify another field separator
- Ex: sort all user accounts by uid
- sort -t: -k 3 -n /etc/passwd
- Ex: sort all user accounts by group id, and then by user id
- sort -t: -k 4 -k 3 -n /etc/passwd
- other options
- -f: case folding, i.e., ignore case
- -o: output file, useful for in situ sorting
- sort foo > foo is disastrous (the shell truncates foo before sort reads it).
- sort -o foo foo
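A small sketch of field-based sorting on made-up passwd-style lines. It limits each key to a single field with the k,k form and attaches n to the key, which is a slightly stricter variant assumed for this example.

```shell
#!/bin/bash
# Sketch: sorting colon-separated records by field.
accounts() {
    printf '%s\n' 'carol:x:20:501' 'alice:x:3:501' 'bob:x:100:500'
}

accounts | sort -t: -k 3              # string order from field 3 on: 100, 20, 3
accounts | sort -t: -k 3,3n           # numeric order on field 3: 3, 20, 100
accounts | sort -t: -k 4,4n -k 3,3n   # by group id, then by user id
```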
57uniq command
- Discards all but one of successive identical lines from standard input, writing to standard output
- Options to change the default behavior
- -c: precede each line with the # of repetitions
- -u: print only the lines that are not repeated (only successive lines are compared)
- -f k: skip the first k fields in the comparison
Example: the lines command, command, test, asd, test (one per line) piped through uniq give command, test, asd, test
58Command tr
- tr copies standard input to standard output with substitution or deletion of selected characters
- Syntax
- tr [options] set1 [set2]
- Input characters in set1 are translated to the corresponding characters in set2
- Ex: change all lower case to upper case
- tr a-z A-Z < file
59Command tr
- Options
- -c: complement set1
- Apply to characters that are not in set1
- -d: delete characters in set1, no translation
- -s: squeeze, i.e., replace a sequence of the same character (specified in the command) with one
- Separate words line by line
- tr -sc A-Za-z '\012'   ('\012' is the special character return/newline)
- tr -sc A-Za-z '\n'
- Ex: replace numbers in a file with x
- tr '[:digit:]' x < file | tr -s x   (the second tr replaces each sequence of x with a single x)
60Putting it together
- Ex: get a word frequency count on a set of files given on the command line. (No file names means that standard input is used.)
- #!/bin/bash
- cat $* |
- tr -sc A-Za-z '\012' |
- sort |
- uniq -c |
- sort -k 1 -nr |
- head -10
sort: sort the words, so that identical words are grouped together
uniq -c: print each distinct word preceded by its count
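The pipeline can be wrapped in a function and tried on a short input. The name wordfreq is hypothetical, and lower-casing the text first is an added assumption so that "The" and "the" are counted together.

```shell
#!/bin/bash
# Sketch: word-frequency pipeline reading files or standard input.
wordfreq() {
    cat "$@" |
    tr A-Z a-z |            # fold to lower case (assumption, not in the slide)
    tr -sc a-z '\n' |       # one word per line
    sort |                  # group identical words together
    uniq -c |               # count each distinct word
    sort -k 1,1nr -k 2 |    # highest count first, ties in word order
    head -10
}

printf 'the cat and the hat\nThe end\n' | wordfreq
```

In this sample input "the" occurs three times and every other word once.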
61Command cut
- Cut out selected bytes or fields of each line of a file
- cut {-b | -c | -f} list [-n] [-d delim] [-s] [file ...]
- -b: select given bytes; -c: select given characters; -f: select given fields
- -d: specify the field delimiter
- list
- 1,4,7
- 1-3,8
- -5,10 (short for 1-5,10)
- 3- (short for third through last field)
- Ex
- Select user name, uid and gid from /etc/passwd
- cut -d: -f 1,3,4 /etc/passwd
- To show the first four characters of each line of a file
- cut -b 1-4 file.txt
- Show users online (list no other info)
- who | cut -f1 -d' '
62Command paste
- Merge corresponding lines of files
- paste [-s] [-d list] files
- files: path names of the input files
- If - is specified, standard input is used; standard input is read one line at a time, circularly, for each instance of -. Implementations support pasting of at least 12 file operands.
- Ex. ls | paste - - - -
- Ex. ls -l | paste - comments.txt
Example: if name.txt contains the lines Alice, Betty, Candy and number.txt contains the lines 1, 2, 3, then paste name.txt number.txt prints Alice 1, Betty 2, Candy 3, one pair per line
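The cut and paste examples can be reproduced with two small files. The scratch directory is created only for this demo; the file contents follow the name.txt and number.txt example.

```shell
#!/bin/bash
# Sketch: paste and cut on two three-line files.
work=$(mktemp -d)
printf '%s\n' Alice Betty Candy > "$work/name.txt"
printf '%s\n' 1 2 3 > "$work/number.txt"

paste "$work/name.txt" "$work/number.txt"       # name and number, tab-separated
paste -d: "$work/name.txt" "$work/number.txt"   # Alice:1, Betty:2, Candy:3
paste - - - < "$work/name.txt"                  # all three names on one line
paste -d: "$work/name.txt" "$work/number.txt" | cut -d: -f2   # 1, 2, 3
cut -c 1-3 "$work/name.txt"                     # Ali, Bet, Can
rm -r "$work"
```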