Chapter 18 Regular Expressions

Provided by: peopleScs
1
Chapter 18: Regular Expressions
  • Presented by Yuntao Li

2
Outline
  • The Structure of a Regular Expression
  • Representing Characters
  • Special concepts in XQuery

3
Regular Expressions
  • What are they?
  • patterns that describe strings
  • How can we use them?
  • to determine whether a string value matches a
    particular pattern (matches)
  • to replace parts of a string that match a pattern
    (replace)
  • to tokenize strings based on a delimiter pattern
    (tokenize)

4
The Structure of a Regular Expression
  • The regular expression syntax of XQuery is based
    on that of XML Schema
  • A regular expression can be composed of a number
    of different parts
  • atoms
  • quantifiers
  • branches
  • An atom is one of the following
  • 1. a single character, such as d
  • 2. an escape sequence that represents one or more
    characters, like \s or \p{Lu}
  • 3. a character class expression that represents a
    range or choice of several characters, such as
    [a-z]

5
Quantifiers
  • A quantifier indicates the number of times a
    matching string may appear
  • It appears directly after an atom

6
Quantifiers
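The quantifiers of XML Schema regular expressions are ?, *, +, {n}, {n,} and {n,m}. The sketch below exercises each one using Python's re module, which is an assumption of this example and not part of XQuery; re.fullmatch is used so that the whole string has to match the pattern. The letter z and the sample strings are purely illustrative.

```python
import re

# pattern -> (strings that should match, strings that should not)
QUANTIFIER_CASES = {
    "z?":     (["", "z"], ["zz"]),          # zero or one
    "z*":     (["", "z", "zzz"], ["a"]),    # zero or more
    "z+":     (["z", "zzz"], [""]),         # one or more
    "z{2}":   (["zz"], ["z", "zzz"]),       # exactly two
    "z{2,}":  (["zz", "zzzz"], ["z"]),      # two or more
    "z{1,2}": (["z", "zz"], ["", "zzz"]),   # between one and two
}

def check_quantifiers():
    """Return True if every case behaves as its comment describes."""
    for pattern, (good, bad) in QUANTIFIER_CASES.items():
        if not all(re.fullmatch(pattern, s) for s in good):
            return False
        if any(re.fullmatch(pattern, s) for s in bad):
            return False
    return True
```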
7
Parenthesized Sub-Expressions and Branches
  • A parenthesized sub-expression can be used as an
    atom in a larger regular expression
  • fo+ matches fooo, but not fofo
  • Branches, separated by a vertical bar (|),
    specify a choice between several different
    patterns

(fo)+ matches fofo, but not fooo
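A minimal sketch of how parentheses change what a quantifier applies to, and of a branch. It uses Python's re module as a stand-in for the XQuery functions (an assumption of this example); the helper whole_match and the cat|dog pattern are hypothetical names chosen for illustration.

```python
import re

def whole_match(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Without parentheses the + applies to the o only.
fo_plus = [whole_match("fo+", s) for s in ("fooo", "fofo")]

# With parentheses the + applies to the whole group fo.
group_plus = [whole_match("(fo)+", s) for s in ("fooo", "fofo")]

# A branch (|) offers a choice between patterns.
branch = [whole_match("cat|dog", s) for s in ("cat", "dog", "cow")]
```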
8
Representing Characters
  • Representing Individual Characters
  • Representing Any Character
  • Representing Groups of Characters

9
Representing Individual Characters
  • A single character can have a quantifier
    associated with it
  • d+ef matches the strings def, ddef, dddef
  • Certain characters, in order to be taken
    literally, must be escaped
  • the asterisk (*) will be treated like a
    quantifier unless it is escaped

10
Representing Individual Characters
  • These characters are escaped by preceding them
    with a backslash.
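The effect of the backslash can be sketched as follows, again with Python's re module standing in for XQuery (an assumption of this example). The helper re.escape is a Python convenience with no counterpart named in these slides.

```python
import re

text = "2*3=6"

# Unescaped, * is a quantifier: "2*3" means zero or more 2s followed by 3.
unescaped = re.fullmatch("2*3", "2223") is not None

# Escaped, \* stands for a literal asterisk.
escaped = re.search(r"2\*3", text) is not None

# re.escape adds the backslashes for you (Python only).
auto = re.escape("2*3")
```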

11
Representing Individual Characters
  • Use the standard XML syntax for character
    references and predefined entity references in
    regular expressions
  • a space: &#x20;
  • a less-than symbol (<): &lt;

12
Representing Any Character
  • The period (.) represents one matching character
  • To represent multiple characters, follow it with
    a quantifier (such as *)

13
Representing Groups of Characters
  • Three different kinds of escapes
  • multi-character escapes
  • category escapes
  • block escapes
  • They all start with a backslash

14
Multi-Character Escapes
  • each escape represents only one character in a
    matching string
  • use a quantifier such as + to match several such
    characters
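A short sketch of the point above: an escape such as \d stands for exactly one character until a quantifier is added. Python's re module is used here as an assumption; \d, \s and \w are escapes it shares with XQuery, though the exact character sets may differ in detail.

```python
import re

def whole_match(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# \d alone matches exactly one digit.
one_digit = [whole_match(r"\d", s) for s in ("7", "42")]

# \d+ matches one or more digits.
many_digits = [whole_match(r"\d+", s) for s in ("7", "42")]

# Escapes combine freely: word characters, one whitespace, word characters.
spaced = whole_match(r"\w+\s\w+", "hello world")
```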

15
Category Escapes
  • The Unicode standard defines categories of
    characters based on their purpose.
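  • Category escapes take the form \p{XX}
  • \p{XX} matches any character in the category;
    \P{XX} matches any character not in it
  • To match only ASCII uppercase letters, it is
    better to use [A-Z] than \p{Lu}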
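Python's standard re module has no \p{...} escape, so the sketch below uses unicodedata.category as a stand-in to show why \p{Lu} and [A-Z] are not interchangeable. The helper names and sample characters are hypothetical.

```python
import unicodedata

def in_category(ch, category):
    """True if ch is in the given Unicode general category, e.g. 'Lu'."""
    return unicodedata.category(ch) == category

def is_ascii_upper(ch):
    """Equivalent of the character class [A-Z]."""
    return "A" <= ch <= "Z"

# "\u00c9" is LATIN CAPITAL LETTER E WITH ACUTE.
samples = ["A", "\u00c9", "a", "5"]
by_category = [in_category(c, "Lu") for c in samples]
by_range = [is_ascii_upper(c) for c in samples]
```

The accented capital is an uppercase letter by category but falls outside the ASCII range, which is the difference the last bullet warns about.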

16
Block Escapes
  • Unicode defines a numeric code point for each
    character.
  • Each range of characters is represented by a
    block name, also defined by Unicode
  • \p{IsLatin-1Supplement} matches any one character
    in the range #x0080 to #x00FF
  • More details are in the blocks file of the
    Unicode standard
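Since Python's re module does not offer block escapes, a block can be approximated by spelling out its code point range in a character class. This is only a sketch under that assumption; the name LATIN1_SUPPLEMENT is hypothetical.

```python
import re

# Stand-in for \p{IsLatin-1Supplement}: the code points U+0080 to U+00FF.
LATIN1_SUPPLEMENT = re.compile("[\u0080-\u00ff]")

def in_latin1_supplement(ch):
    """True if the single character ch lies in the block."""
    return LATIN1_SUPPLEMENT.fullmatch(ch) is not None

# e is U+0065, e-acute is U+00E9, the oe ligature is U+0153.
checks = [in_latin1_supplement(c) for c in ("e", "\u00e9", "\u0153")]
```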

17
Examples for Representing Groups of Characters
18
Character Class Expressions
  • Single Characters and Ranges
  • Subtraction from a Range
  • Negative Character Class Expressions
  • Escaping Rules

19
Single Characters and Ranges
  • List several characters inside square brackets.
  • [def]* matches defdef, eddfefd
  • [\p{Ll}\d] matches either a lowercase letter or a
    digit
  • It is also possible to specify a range of
    characters, such as [a-z], but an escape like \d
    cannot be a range endpoint
  • More than one range can appear in the same
    character class expression: [a-zA-Z0-9]
  • Ranges and single characters can be combined in
    any order: [abc0-9] is equivalent to [a0-9bc]
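The bracket syntax for lists and ranges is the same in Python's re module, which is used below as an assumption in place of XQuery. The helper whole_match and the test strings beyond those on the slide are illustrative.

```python
import re

def whole_match(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A list of single characters, repeated with *.
letters = [whole_match("[def]*", s) for s in ("defdef", "eddfefd", "defg")]

# Several ranges in one character class expression.
alnum = [whole_match("[a-zA-Z0-9]+", s) for s in ("abc123", "abc-123")]

# The order of ranges and single characters does not matter.
same_class = all(
    whole_match("[abc0-9]", c) == whole_match("[a0-9bc]", c)
    for c in "abcd0123x-"
)
```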

20
Subtraction from a Range
  • Subtraction lets you match a range of characters
    while leaving a few out
  • [a-z-[jkl]] matches any character from a to z
    except j, k, or l
  • The subtracted part can itself be a range:
    [a-z-[j-l]]
  • Characters can also be subtracted from an
    escape: [\p{Lu}-[ABC]]
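Python's re module does not support the subtraction syntax, so the sketch below emulates [a-z-[jkl]] with a negative lookahead. This swaps in a different technique to obtain the same set of characters; it is not how XQuery evaluates the expression.

```python
import re

# Emulates [a-z-[jkl]]: the lookahead rejects j, k and l,
# then [a-z] consumes one character.
A_TO_Z_EXCEPT_JKL = re.compile(r"(?![jkl])[a-z]")

accepted = [c for c in "ijklm" if A_TO_Z_EXCEPT_JKL.fullmatch(c)]
```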

21
Negative Character Class Expressions
  • A negative character class expression matches any
    character that is not among those specified
  • [^a-b] matches any character that is neither a
    nor b
  • Any character class expression can be negated
  • single characters
  • ranges: [^a-z0-9]
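Negation with ^ works the same way in Python's re module, used here as an assumption in place of XQuery. re.findall lists every character that the negated class accepts; the input strings are illustrative.

```python
import re

# Everything that is neither a nor b.
not_a_or_b = re.findall("[^a-b]", "abcab")

# Everything that is neither a lowercase ASCII letter nor a digit.
not_lower_or_digit = re.findall("[^a-z0-9]", "abc-123_XYZ")
```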

22
Examples for Character Class Expressions
23
Escaping Rules for Character Class Expressions
  • The characters [, ], \, and - must be escaped
    when included as single characters.
  • The character \ must be escaped if it is the
    lower bound of the range.
  • The characters [ and \ must be escaped if one of
    them is the upper bound of the range.
  • The character ^ must be escaped only if it
    appears first in the character class expression,
    directly after the opening bracket ([).
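A small sketch of escaping inside a character class, using Python's re module as an assumption. Python's rules are close to, but not identical with, the ones listed above, so the example escapes every special character, which is safe in both dialects.

```python
import re

# A class containing [, ], \ and - as literal single characters.
specials = re.findall(r"[\[\]\\\-]", r"a[b]c\d-e")

# ^ is literal when it is not the first character of the class...
caret_later = re.findall(r"[a^]", "a^b")

# ...and must be escaped when it is first, or it would negate the class.
caret_first = re.findall(r"[\^a]", "a^b")
```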

24
Special Concepts in XQuery
  • Reluctant Quantifiers
  • Anchors
  • Back-References
  • Using Flags

25
Reluctant Quantifiers
  • Reluctant quantifiers allow part of a regular
    expression to match the shortest possible string
  • Add a question mark (?) to the end of any of the
    kinds of quantifiers identified in Table 18-1

Against the string reluctant:
r.*t matches reluctant
r.*?t matches reluct
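The greedy and reluctant forms can be compared directly. Python's re module uses the same ? suffix, so it serves here as a stand-in for XQuery (an assumption of this example).

```python
import re

# Greedy: .* takes as much as possible, so the match runs to the last t.
greedy = re.search(r"r.*t", "reluctant").group()

# Reluctant: .*? takes as little as possible, so the match stops at the first t.
reluctant = re.search(r"r.*?t", "reluctant").group()
```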
26
More related reluctant quantifiers
  • Examples of the replace function that use
    reluctant quantifiers
  • Reluctant quantifiers are not supported in XML
    Schema
  • r.*?tly matches all of reluctantly, because the
    match must still end in tly
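A hedged sketch of replacement with reluctant quantifiers, using Python's re.sub in place of the XQuery replace function (an assumption of this example). The replacement text X is arbitrary.

```python
import re

# Greedy: the whole word is one match and is replaced once.
greedy_replace = re.sub(r"r.*t", "X", "reluctant")

# Reluctant: only the text up to the first t is replaced.
reluctant_replace = re.sub(r"r.*?t", "X", "reluctant")

# Reluctant does not mean short at any cost: the match must still end in tly.
forced_long = re.search(r"r.*?tly", "reluctantly").group()
```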

27
Anchors
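  • XQuery adds the concept of anchors to XML Schema
    regular expressions
  • In XML Schema the whole string must match, so str
    does not match 5str5; in XQuery it does
  • Anchors indicate that the expression should match
    the beginning or end of the string (or both)
  • ^ matches the beginning and $ matches the end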
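The difference between anchored and unanchored matching can be sketched with Python's re module, used here as an assumption: re.search behaves like an unanchored match, while re.fullmatch mimics the implicit anchoring of XML Schema.

```python
import re

# Unanchored: str is found inside 5str5.
unanchored = re.search("str", "5str5") is not None

# ^ requires the match to start at the beginning of the string.
start_only = re.search("^str", "5str5") is not None

# ^ and $ together require the whole string to match.
both = re.search("^str$", "str") is not None

# Implicit anchoring, as in XML Schema.
implicit = re.fullmatch("str", "5str5") is not None
```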

28
Anchors and Multi-Line Mode
  • Some XQuery functions let you specify that the
    processor should operate in multi-line mode
  • In multi-line mode, anchors match the beginning
    and end of any line within the string
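Multi-line mode can be illustrated with the re.M flag of Python's re module, which plays the role of the XQuery m flag in this sketch (an assumption of the example). The two-line text is illustrative.

```python
import re

text = "first line\nsecond line"

# Default mode: ^ matches only at the start of the whole string.
default_mode = re.findall(r"^\w+", text)

# Multi-line mode: ^ also matches at the start of each line.
multi_line = re.findall(r"^\w+", text, flags=re.M)
```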

29
Back-References
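  • Back-references allow you to ensure that certain
    characters in a string match each other
  • Example: a string is a product number delimited
    by either single or double quotes
  • ('|")\d{3}-[A-Z]{2}('|") also accepts mismatched
    quotes
  • '\d{3}-[A-Z]{2}'|"\d{3}-[A-Z]{2}" is correct but
    repeats the pattern
  • ('|")\d{3}-[A-Z]{2}\1 uses a back-reference to
    require the same closing quote

Sub-expressions are numbered from left to right,
starting with 1 (not 0); reference any of them by number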
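The product-number pattern with a back-reference can be tried out as follows. Python's re module supports the same \1 syntax and is used here as an assumption in place of XQuery; the sample product numbers are made up.

```python
import re

# \1 must repeat whatever the first sub-expression ('|") matched.
PRODUCT = re.compile(r"""('|")\d{3}-[A-Z]{2}\1""")

samples = ["'123-AB'", '"123-AB"', "'123-AB\""]
results = [PRODUCT.fullmatch(s) is not None for s in samples]
```

The third sample opens with a single quote and closes with a double quote, so the back-reference rejects it.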
30
Using Flags
  • The matches, replace, and tokenize functions
    accept a flags argument that allows for
    additional options
  • Options are indicated by single letters
  • the flags argument is a string
  • duplicates are allowed
  • Four options
  • s indicates dot-all mode
  • m indicates multi-line mode
  • i indicates case-insensitive mode
  • x indicates that whitespace characters should be
    ignored

31
Example for the flags argument
  • $address is the two-line string
    123 Main Street
    Traverse City, MI 49684
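A sketch of the four options applied to the address above. Python's re.S, re.M, re.I and re.X flags are used as stand-ins for the XQuery letters s, m, i and x; this mapping is an assumption of the example, and the patterns are illustrative.

```python
import re

address = "123 Main Street\nTraverse City, MI 49684"

# Without s, the period does not match the newline between the lines.
dot_default = re.search("Street.*City", address) is not None

# s (dot-all): the period matches any character, including newline.
dot_all = re.search("Street.*City", address, flags=re.S) is not None

# m (multi-line): ^ matches at the start of the second line.
line_start = re.search("^Traverse", address, flags=re.M) is not None

# i (case-insensitive).
case_insensitive = re.search("main street", address, flags=re.I) is not None

# x: whitespace in the pattern is ignored, so \s must stand for the space.
free_spacing = re.search(r"Main \s Street", address, flags=re.X) is not None
```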

32
Using Sub-Expressions with Replacement Variables
  • The replace function allows parenthesized
    sub-expressions to be referenced by number in
    the replacement string ($1, $2, and so on)
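Reusing sub-expressions in a replacement can be sketched with Python's re.sub, used here as an assumption in place of XQuery. Note the syntax difference: Python writes \1 and \2 in the replacement string where XQuery writes $1 and $2. The name being reordered is made up.

```python
import re

# Two sub-expressions capture the surname and the given name.
pattern = r"(\w+), (\w+)"

# The replacement refers to them by number, in swapped order.
swapped = re.sub(pattern, r"\2 \1", "Doe, Jane")
```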

33
3 functions
  • fn:matches
  • fn:matches("abracadabra", "^a.*a$") returns true
  • fn:replace
  • fn:replace("abracadabra", "a.*?a", "*") returns
    "*c*bra"
  • fn:tokenize
  • fn:tokenize("The cat sat on the mat", "\s+")
    returns ("The", "cat", "sat", "on", "the", "mat")
  • fn:tokenize("1, 15, 24, 50", ",\s*") returns
    ("1", "15", "24", "50")
  • fn:tokenize("1,15,,24,50,", ",") returns ("1",
    "15", "", "24", "50", "")

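For readers without an XQuery processor at hand, the three functions have rough counterparts in Python's re module: re.search for matches, re.sub for replace and re.split for tokenize. This mapping is an assumption of the sketch, and the two dialects differ in details not shown here.

```python
import re

# Counterpart of matches: is there a match anywhere in the string?
matched = re.search(r"^a.*a$", "abracadabra") is not None

# Counterpart of replace: substitute every non-overlapping match.
replaced = re.sub(r"a.*?a", "*", "abracadabra")

# Counterpart of tokenize: split on a delimiter pattern.
words = re.split(r"\s+", "The cat sat on the mat")
numbers = re.split(r",\s*", "1, 15, 24, 50")

# Adjacent or trailing delimiters produce empty strings.
with_empty = re.split(",", "1,15,,24,50,")
```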
34
Conclusion
  • The Structure of a Regular Expression
  • Representing Individual Characters
  • Representing Any Character
  • Representing Groups of Characters
  • Character Class Expressions
  • Reluctant Quantifiers
  • Anchors
  • Back-References
  • Using Flags
  • Using Sub-Expressions with Replacement Variables

35
The End