Regular Expression in Java 101

Slides: 17
Provided by: csscmswaik

1
Regular Expression in Java 101
  • COMP204
  • Source: Sun tutorial

2
What are they?
  • a way to describe patterns in strings
  • similar to regex in Perl
  • cryptic syntax: "write once, ponder many times"
  • used to search, parse, modify textual data
  • Java: java.util.regex with classes Pattern,
    Matcher, and PatternSyntaxException
  • plus utility methods in String class
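
As a small illustration of how the three classes fit together, the sketch below compiles a pattern and treats a PatternSyntaxException as "not a valid regex". The class name SyntaxCheckDemo and the helper isValidRegex are made up for this example and are not part of the original slides.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not from the slides.
class SyntaxCheckDemo {
    // Returns true if the text compiles as a regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown (unchecked) when the pattern text is malformed.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("foo"));   // well-formed pattern
        System.out.println(isValidRegex("(foo"));  // unclosed group
    }
}
```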

3
String constants match
  • regex: foo
  • string: foo
  • => 0-3 "foo" (start index 0, end index 3)
  • regex: foo
  • string: foofoofoo
  • => 0-3 "foo"
  • => 3-6 "foo"
  • => 6-9 "foo"
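
The index pairs above can be reproduced with a loop over Matcher.find(). This is a minimal sketch, assuming a helper named findAll in a made-up class FindAllDemo; it is not the test harness used for the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class FindAllDemo {
    // Collects every match as: start-end "text" (end index is exclusive).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.start() + "-" + m.end() + " \"" + m.group() + "\"");
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String hit : findAll("foo", "foofoofoo")) {
            System.out.println(hit);
        }
    }
}
```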

4
Meta characters
  • Some characters are special, e.g. a single dot
    . matches any character
  • regex: cat.
  • string: cats
  • => 0-4 "cats"
  • Others are ([{\^-$|]})?*+.
  • To use a metacharacter literally, escape it with
    a backslash (e.g. \.), or quote it, e.g. \Q.\E
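
A short sketch of both escaping styles follows. Note that inside a Java string literal each regex backslash has to be doubled. The class EscapeDemo and the helper matchesLiterally are illustrative names; Pattern.quote is the standard library method that wraps text in \Q...\E.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class EscapeDemo {
    // True if input is exactly the given literal text, with no
    // character in the literal treated as a metacharacter.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unescaped dot matches any character.
        System.out.println(Pattern.matches("cat.", "cats"));
        // Escaped dot matches only a literal dot.
        System.out.println(Pattern.matches("cat\\.", "cats"));
        System.out.println(Pattern.matches("cat\\.", "cat."));
        // Quoting with \Q...\E has the same effect.
        System.out.println(Pattern.matches("\\Qcat.\\E", "cat."));
        System.out.println(matchesLiterally("cat.", "cats"));
    }
}
```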

5
Character classes
  • [abc] a, b, or c (simple class)
  • [^abc] any character except a, b, or c (negation)
  • [a-zA-Z] a through z or A through Z, inclusive
    (range)
  • [a-d[m-p]] a through d, or m through p: [a-dm-p]
    (union)
  • [a-z&&[def]] d, e, or f (intersection)
  • [a-z&&[^bc]] a through z, except for b and c:
    [ad-z] (subtraction)
  • [a-z&&[^m-p]] a through z, and not m through p:
    [a-lq-z] (subtraction)
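
The union, intersection and subtraction forms can be checked one character at a time. A minimal sketch, with a made-up class CharClassDemo and helper is:

```java
// Hypothetical demo class, not from the slides.
class CharClassDemo {
    // True if the single character c is matched by the character class.
    static boolean is(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(is("[a-d[m-p]]", 'n'));     // union: n is in m-p
        System.out.println(is("[a-d[m-p]]", 'e'));     // e is in neither range
        System.out.println(is("[a-z&&[def]]", 'e'));   // intersection keeps d, e, f
        System.out.println(is("[a-z&&[^bc]]", 'b'));   // subtraction removes b
        System.out.println(is("[a-z&&[^m-p]]", 'q'));  // q is outside m-p
    }
}
```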

6
Predefined classes (see Pattern)
  • . Any character (may or may not match line
    terminators)
  • \d digit: [0-9]
  • \D non-digit: [^0-9]
  • \s whitespace character: [ \t\n\x0B\f\r]
  • \S non-whitespace character: [^\s]
  • \w word character: [a-zA-Z_0-9]
  • \W non-word character: [^\w]
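
The predefined classes are convenient with the String utility methods. The sketch below uses made-up helper names (digitsOnly, isWord, squeeze); as before, each backslash is doubled in Java source.

```java
// Hypothetical demo class, not from the slides.
class PredefinedClassDemo {
    // Deletes every non-digit (\D), keeping only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True if the whole string is one or more word characters (\w).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // Collapses each run of whitespace (\s+) into a single space.
    static String squeeze(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("COMP204"));
        System.out.println(isWord("foo_1"));
        System.out.println(isWord("foo bar"));
        System.out.println(squeeze("a \t\n b"));
    }
}
```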

7
Greedy Quantifiers
  • X? X, once or not at all
  • X* X, zero or more times
  • X+ X, one or more times
  • X{n} X, exactly n times
  • X{n,} X, at least n times
  • X{n,m} X, at least n but not more than m times

8
Reluctant quantifiers
  • X?? X, once or not at all
  • X*? X, zero or more times
  • X+? X, one or more times
  • X{n}? X, exactly n times
  • X{n,}? X, at least n times
  • X{n,m}? X, at least n but not more than m times

9
Possessive Quantifiers
  • X?+ X, once or not at all
  • X*+ X, zero or more times
  • X++ X, one or more times
  • X{n}+ X, exactly n times
  • X{n,}+ X, at least n times
  • X{n,m}+ X, at least n but not more than m times

10
What's the difference?
  • // greedy quantifier
  • regex: .*foo
  • string: xfooxxxxxxfoo
  • => 0-13 "xfooxxxxxxfoo"
  • // reluctant quantifier
  • regex: .*?foo
  • string: xfooxxxxxxfoo
  • => 0-4 "xfoo"
  • => 4-13 "xxxxxxfoo"
  • // possessive quantifier
  • regex: .*+foo
  • string: xfooxxxxxxfoo
  • No match found.
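
The three behaviours can be compared side by side. In this sketch the class QuantifierDemo and helper matches are illustrative names. The greedy .* first consumes the whole input and backs off until foo fits; the reluctant .*? grows one character at a time; the possessive .*+ consumes everything and never gives anything back, so foo can never match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class QuantifierDemo {
    // Returns the text of every match found in input, in order.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + matches(".*foo", input));
        System.out.println("reluctant:  " + matches(".*?foo", input));
        System.out.println("possessive: " + matches(".*+foo", input));
    }
}
```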

11
Capturing groups
  • Quantifiers apply to single characters (e.g. a*,
    matches everything, why?), character classes
    (e.g. \s) or groups (e.g. (dog){2})
  • Groups are numbered left-to-right, by opening
    parenthesis
  • ((A)(B(C))) =>
  • 1: ((A)(B(C))), 2: (A), 3: (B(C)), 4: (C)
  • refer to groups with e.g. \2 for group two
  • regex: (\w)\1, string: hello => 2-4 "ll"
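
Group numbering and backreferences can be checked with Matcher.group(int). A minimal sketch, where the class GroupDemo and the helpers groupOf and firstRepeat are made-up names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class GroupDemo {
    // Returns capturing group n if regex matches the whole input, else null.
    static String groupOf(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(n) : null;
    }

    // Finds the first doubled word character using the backreference \1.
    static String firstRepeat(String input) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(input);
        return m.find() ? m.start() + "-" + m.end() + " " + m.group() : null;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + ": " + groupOf("((A)(B(C)))", "ABC", n));
        }
        System.out.println(firstRepeat("hello"));
    }
}
```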

12
Boundaries
  • ^ The beginning of a line
  • $ The end of a line
  • \b A word boundary
  • \B A non-word boundary
  • \A The beginning of the input
  • \G The end of the previous match
  • \Z The end of the input but for the final
    terminator, if any
  • \z The end of the input
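
Two of these boundaries are sketched below: \b for whole-word search, and the difference between \Z and \z when the input ends with a line terminator. The class BoundaryDemo and its helpers are illustrative names only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class BoundaryDemo {
    // True if word occurs in text as a whole word (delimited by \b).
    static boolean containsWord(String text, String word) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return Pattern.compile(regex).matcher(text).find();
    }

    // True if regex is found somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("the dog runs", "dog"));
        System.out.println(containsWord("the doggie runs", "dog"));
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(found("foo\\Z", "foo\n"));
        System.out.println(found("foo\\z", "foo\n"));
    }
}
```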

13
Pattern class
  • boolean b = Pattern.matches("a*b", "aaaaab");
  • or
  • Pattern p = Pattern.compile("a*b");
  • Matcher m = p.matcher("aaaaab");
  • boolean b = m.matches();
  • the latter compiles the pattern once, so it can
    be reused efficiently
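
The reuse point can be sketched as follows: the pattern is compiled once into a constant and applied to many inputs, whereas Pattern.matches would recompile it on every call. The class ReuseDemo and the helper countFullMatches are made-up names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class ReuseDemo {
    // Compiled once; Pattern objects are immutable and safe to share.
    private static final Pattern AB = Pattern.compile("a*b");

    // Counts how many inputs are matched in full by a*b.
    static int countFullMatches(String... inputs) {
        int count = 0;
        for (String s : inputs) {
            if (AB.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFullMatches("aaaaab", "b", "abc"));
    }
}
```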

14
Splitting a string using a regex
  • Pattern p = Pattern.compile("ab");
  • String[] items = p.split("aabbab");
  • for (String s : items)
    System.out.println(s);
  • similar to split(regex) method in class String
    (see last slide Lecture11.ppt)
  • String[] items = "aabbab".split("ab");
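
A self-contained splitting sketch follows. The comma-separated input and the class SplitDemo are assumptions made for illustration and do not come from the slides. It also shows that trailing empty strings are dropped from the result.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class SplitDemo {
    // A comma with optional surrounding whitespace acts as the delimiter.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    static String[] fields(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        for (String s : fields("a, b ,c")) {
            System.out.println(s);
        }
        // Trailing empty strings are removed: only "a" and "b" remain.
        System.out.println("a,b,,".split(",").length);
    }
}
```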

15
Matcher class
  • loads of methods, e.g. to access groups (see test
    harness) or replace expressions
  • Pattern p = Pattern.compile("dog");
  • Matcher m = p.matcher("the dog runs");
  • String result = m.replaceAll("cat");
  • System.out.println(result);
  • => the cat runs
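
Replacement strings may also refer to captured groups with $1, $2 and so on. The sketch below repeats the dog/cat replacement and adds a group-based one; the class ReplaceDemo and the helper swapFirstTwoWords are illustrative names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the slides.
class ReplaceDemo {
    // Replaces every occurrence of "dog" with "cat".
    static String dogToCat(String s) {
        return Pattern.compile("dog").matcher(s).replaceAll("cat");
    }

    // Swaps the first two words; $1 and $2 refer to the captured groups.
    static String swapFirstTwoWords(String s) {
        return Pattern.compile("^(\\w+) (\\w+)").matcher(s).replaceFirst("$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(dogToCat("the dog runs"));
        System.out.println(swapFirstTwoWords("the dog runs"));
    }
}
```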

16
String class has one-off methods
  • "the dog runs".replaceFirst("dog", "cat")
  • => the cat runs
  • "aabcbdabe".split("[ab]+")
  • => ,c,d,e (the leading empty string is kept)
  • "xfooxxxxxxfoo".matches(".*foo")
  • => true