Title: Regular Expressions


1
Regular Expressions

The hidden power language
Roy Osherove, www.iserializable.com
Methodology and Team System Expert, Sela Group, www.Sela.co.il
2
Tools
  • http://tools.osherove.com
  • www.ISerializable.com

3
The Log File
4
Developer Problem: Make this log file useful
  • Old log file of entries from a *nix system
  • Converted to and from various formats
  • Searched by users
  • Format may change
  • Search fields can be added, removed or renamed
    at runtime

Date CPUs|ram|cpu HH:mm:ss [action] user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
5
About 15 minutes later
  • Done.

About 45 minutes later
Home early.
6
You can be home early too!
  • Regex is easier than you think

7
What are Regular Expressions?
  • A language to describe a language using
    patterns
  • Think SQL or XPath for text
  • Popularized by Perl and *nix shell scripting
  • Many variations and frameworks exist. Only one
    for .NET (for now)
  • Used in most languages

8
Common Regex Uses
  • Text Validation: phone numbers, emails, addresses,
    or any format requirement
  • Text Manipulation: transform text
  • Text Parsing: find in files, site scraping,
    data collection

9
What .NET brings to the table
  • Full object model
  • Extended syntax
  • Optimization techniques in the framework

10
.NET Regular Expressions
  • Show up in several places
  • In the classes of the
    System.Text.RegularExpressions namespace
  • Via the RegularExpressionValidator validator
    control (for ASP.NET)
  • Sprinkled in dozens of other places: the browser
    capabilities filter, the WSDL <match> tag,
    and many more

11
Key Classes within System.Text.RegularExpressions
  • Regex
  • Contains the pattern and matching options
  • Important methods
  • IsMatch() returns boolean
  • Replace() returns a string
  • Split() returns a string array
  • Main Use
  • Validation, Splitting, Replacing text
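
The three methods above can be sketched roughly as follows. This is Python rather than .NET (an assumption made for these examples, since the slides show no code): `re.search`, `re.sub`, and `re.split` stand in for IsMatch(), Replace(), and Split(), and the sample text is made up.

```python
import re

# Python stand-ins for the .NET Regex methods named on the slide.
pattern = re.compile(r"\d+")        # the compiled pattern, similar in role to a Regex object
text = "room 12, floor 3"           # hypothetical sample text

is_match = pattern.search(text) is not None   # like IsMatch(): a boolean
replaced = pattern.sub("#", text)             # like Replace(): a new string
parts = pattern.split(text)                   # like Split(): a list of strings

print(is_match)
print(replaced)
print(parts)
```

Note that splitting on a pattern that matches at the very end of the text leaves an empty string as the last piece.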

12
The Process
[Diagram: Input (Pattern, Text, Options) goes into Regex, which produces Matches, Splits, or Replace text]
13
Validation
14
Syntax
  • Literal text matches exactly as written in the
    pattern
  • a will match every "a" in the text.
  • Except for special symbols, which have reserved
    meanings

15
Enclosing Alternatives with [ ]
  • The square brackets allow you to specify a list
    of alternate values. Used in conjunction with
    the - operator, you can even specify character
    ranges.
  • [Cc] Capital or lowercase c
  • [A-Z] Any capital letter A through Z
  • [A-Za-z] Any capital or lowercase letter
  • [0-9] Any digit 0 through 9
  • [A-Za-z0-9] Any letter or digit
  • [0-9.-] Any digit or special char listed
  • Notice no escape is needed for the . and the
    trailing - inside the brackets
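
A small sketch of the bracket expressions listed above, written in Python as a stand-in for .NET. The bracket patterns are reconstructed from the slide text, and the candidate characters are made up for illustration.

```python
import re

# Each bracket expression matches exactly one character from its set.
samples = {
    r"[Cc]": ["C", "c", "k"],
    r"[A-Z]": ["Q", "q"],
    r"[A-Za-z0-9]": ["7", "g", "-"],
    r"[0-9.-]": ["5", ".", "-", "x"],   # '.' and a trailing '-' are literal inside brackets
}

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

results = {p: matching(p, c) for p, c in samples.items()}
for p, hits in results.items():
    print(p, hits)
```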

16
Controlling Expression Frequency with { }
  • The { } operators allow you to control the
    frequency of the preceding expression. The
    expression takes one of these two forms
  • {occurrences}
  • [A-Za-z]{3}
  • {MinOccurrences,MaxOccurrences}
  • [A-Za-z]{1,3}
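
The two brace forms can be tried out with a quick sketch like this one. It uses Python as a stand-in for .NET, the patterns are the slide's examples with their brackets and braces restored, and the test strings are hypothetical.

```python
import re

exactly_three = re.compile(r"[A-Za-z]{3}")    # {occurrences}
one_to_three = re.compile(r"[A-Za-z]{1,3}")   # {min,max}

words = ["", "a", "ab", "abc", "abcd"]        # hypothetical test strings

# fullmatch() requires the whole string to fit the pattern.
fits_exactly_three = [w for w in words if exactly_three.fullmatch(w)]
fits_one_to_three = [w for w in words if one_to_three.fullmatch(w)]

print(fits_exactly_three)
print(fits_one_to_three)
```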

17
Basic Frequency Operators
  • ? 0 or 1
  • * 0 or more
  • + 1 or more
  • So, 3+ will match 3, 33, 3333
  • but not 45, 678.
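
A minimal sketch of the three frequency operators, in Python as a stand-in for .NET. It assumes the slide's example pattern is 3+ (the operator symbol is missing from the transcript) and checks the slide's own candidates against it.

```python
import re

# The slide's candidates, checked against "one or more 3s".
candidates = ["3", "33", "3333", "45", "678"]
one_or_more = [c for c in candidates if re.fullmatch(r"3+", c)]

# The other two operators, on a few hypothetical strings.
zero_or_one = [s for s in ("", "3", "33") if re.fullmatch(r"3?", s)]
zero_or_more = [s for s in ("", "3", "33") if re.fullmatch(r"3*", s)]

print(one_or_more)
print(zero_or_one)
print(zero_or_more)
```

Both ? and * accept the empty string, which is a common source of surprising "matches" when searching inside longer text.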

18
Wildcard Operator .
  • . matches any non-newline character
  • Unless single-line mode (RegexOptions.Singleline)
    has been turned on for the pattern
  • Examples
  • A. would match a capital A followed by exactly
    one character of any kind.
  • Will not match all of Abc, only the Ab part
  • A.+ would match a capital A followed by one or
    more non-newline characters
  • \.htm.? would match ".htm" followed by
    an optional non-newline character
  • A backslash escapes characters that have reserved
    meanings in regular expressions
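
The wildcard examples can be checked with a sketch along these lines. It is Python standing in for .NET, with the second example read as A.+; `re.DOTALL` is Python's name for the option that lets the dot match a newline (the .NET counterpart is RegexOptions.Singleline). The file names are made up.

```python
import re

# "A." is a capital A plus exactly one more character.
two_chars = re.fullmatch(r"A.", "Ab") is not None
whole_abc = re.fullmatch(r"A.", "Abc") is not None     # one character too many
part_of_abc = re.search(r"A.", "Abc").group()          # a search finds only part of it

# "A.+" is a capital A plus one or more characters.
a_plus = re.fullmatch(r"A.+", "Abc") is not None

# "\.htm.?" is a literal dot, "htm", and one optional character.
extensions = [re.search(r"\.htm.?", name).group()
              for name in ("index.htm", "index.html")]

# By default the dot refuses a newline; re.DOTALL lifts that restriction.
default_dot = re.search(r"A.", "A\n") is not None
dotall_dot = re.search(r"A.", "A\n", re.DOTALL) is not None

print(two_chars, whole_abc, part_of_abc, a_plus)
print(extensions)
print(default_dot, dotall_dot)
```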

19
Convenience Expressions
  • \d
  • Any digit
  • \D
  • Any non-digit
  • Still has to match one character, just not a digit
  • \s
  • Any whitespace character (such as a space or tab)
  • \S
  • Any character other than a whitespace character
  • \w
  • Any letter, digit, or underscore
  • \W
  • Any character other than a letter, digit, or
    underscore

Many more: Unicode, hex values, negative lookarounds
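
A quick sketch of the shorthand classes on one made-up string, using Python as a stand-in for .NET. It also shows that \w accepts the underscore in addition to letters and digits.

```python
import re

text = "Log 42:\tok_1!"   # hypothetical sample: letters, digits, a tab, punctuation

digits = re.findall(r"\d", text)           # single digits
non_digits = re.findall(r"\D+", text)      # runs of anything that is not a digit
whitespace = re.findall(r"\s", text)       # the space and the tab
non_space_runs = re.findall(r"\S+", text)  # runs between whitespace
word_runs = re.findall(r"\w+", text)       # letters, digits, and underscore
non_word = re.findall(r"\W", text)         # everything else

print(digits, non_digits, whitespace)
print(non_space_runs, word_runs, non_word)
```
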
20
Quick Quiz!
  • [A-Za-z]{3}
  • 3 capital or lowercase letters
  • Abc, abc, aBC, 1bc
  • [A-Z][a-z]{2,4}
  • A capital letter followed by at least 2 but not
    more than 4 lowercase letters
  • Abc, Acbde, abcde, ABcde
  • \w{3,8}\.\w{3}
  • 3 to 8 alphanumeric characters, followed by a dot
    and 3 alphanumerics
  • Filename.txt, d0main.com, 1234.567, 34.456
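
One way to check the quiz answers is a sketch like the following, in Python as a stand-in for .NET. The three patterns are reconstructed from the slide's descriptions (the brackets and braces are missing from the transcript), and each candidate is tested as a whole string.

```python
import re

# (pattern, candidates) pairs taken from the quiz slide.
quiz = [
    (r"[A-Za-z]{3}", ["Abc", "abc", "aBC", "1bc"]),
    (r"[A-Z][a-z]{2,4}", ["Abc", "Acbde", "abcde", "ABcde"]),
    (r"\w{3,8}\.\w{3}", ["Filename.txt", "d0main.com", "1234.567", "34.456"]),
]

def answers(pattern, candidates):
    """Return the candidates that fit the pattern from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

results = [answers(p, c) for p, c in quiz]
for (pattern, _), hits in zip(quiz, results):
    print(pattern, hits)
```

With a search instead of a whole-string match, some rejected candidates would still contain a matching part (for example the Bcde in ABcde), so the answer depends on which kind of match is meant.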

21
Splitting and Manipulating
22
Text Manipulation
23
The Spammer
24
(2) Key Classes within System.Text.RegularExpressions
  • MatchCollection - Match
  • MatchCollection stores all the matches found
  • GroupCollection - Group
  • CaptureCollection - Capture
  • Regex.Match() returns Match
  • Regex.Matches() returns MatchCollection
  • Main Use
  • Parsing, searching, collecting data
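
The difference between getting one match and getting all of them can be sketched like this. It is Python standing in for .NET: `search` plays the role of Regex.Match() and `finditer` the role of Regex.Matches(). The text and pattern are made up for illustration.

```python
import re

text = "alpha 10, beta 20, gamma 30"        # hypothetical sample text
pattern = re.compile(r"([a-z]+) (\d+)")     # group 1: a name, group 2: a number

first = pattern.search(text)                # one match object, like Regex.Match()
all_matches = list(pattern.finditer(text))  # every match, like Regex.Matches()

first_text = first.group(0)                 # the whole matched text
first_groups = first.groups()               # the captured groups of that match
names = [m.group(1) for m in all_matches]
numbers = [int(m.group(2)) for m in all_matches]

print(first_text, first_groups)
print(names, numbers)
```

Python's `re` module keeps only the last capture of a repeated group, so it has no direct counterpart to CaptureCollection.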

25
Simple parsing: Parsing for emails
26
Grouping (the coolest part)
27
Grouping (pay attention!)
  • Groups give us object models
  • HTML File
  • Roy@Osherove.com
  • Create a capture hierarchy and use it in code
  • [\w\.\-]+@[\w\.\-]+\.\w{2,5}
  • (?<userName>[\w\.\-]+)@(?<domain>[\w\.\-]+\.\w{2,5})
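
A sketch of named groups pulling the user name and domain out of an HTML fragment. It uses Python as a stand-in for .NET, so the groups are spelled (?P<name>...) instead of the .NET form (?<name>...). The pattern is a reconstruction of the slide's email expression (the + quantifiers are an assumption), and the HTML snippet and second address are hypothetical.

```python
import re

# .NET: (?<userName>...)   Python: (?P<userName>...)
email = re.compile(r"(?P<userName>[\w.\-]+)@(?P<domain>[\w.\-]+\.\w{2,5})")

# Hypothetical HTML fragment containing two addresses.
html = '<a href="mailto:Roy@Osherove.com">Roy</a> or <b>someone@example.org</b>'

# Each match exposes its groups by name, which is the "object model" idea.
found = [(m.group("userName"), m.group("domain")) for m in email.finditer(html)]

for user, domain in found:
    print(user, "at", domain)
```

This pattern is deliberately loose; it is a teaching example, not a complete email validator.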

28
Grouping Emails: The Regulator
29
Getting back to the first problem: Make this log
file useful
  • Old log file of entries from a *nix system
  • Converted to and from various formats
  • Searched by users
  • Format may change
  • Search fields can be added, removed or renamed
    at runtime

Date CPUs|ram|cpu HH:mm:ss [action] user domain.machine
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
26/05/98 1|00256|x86 11:12:49 [Update] Darth Cydot.Uk.Gerry2k
27/05/98 1|00512|x86 15:34:30 [Search] Anakin Anterl.Anita1
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
30
How do I start?
  • Take a sample of the log file
  • Recognize the data pattern for each entry
  • Use groups to get each line's values
  • Create a tool that uses this regex to parse a log
    file
  • The tool will use the returned results to
    generate the log as XML
  • Load the XML into a DataSet
  • Allow the user to run Select statements on the
    DataSet
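
The steps above could be sketched roughly as follows. This is Python rather than the .NET tool from the talk, so everything here is an assumption: the log separators (| : [ ]) are reconstructed, the group names are chosen to mirror the log header, `parse` and `to_xml` are hypothetical helpers, and a plain list filter stands in for loading the XML into a DataSet and running a Select.

```python
import re
import xml.etree.ElementTree as ET

# A few sample entries (separators reconstructed, see the note above).
LOG = """\
25/05/1998 1|00512|x86 21:49:12 [Search] Anakin Antler.Anita1
25/05/98 1|00512|x86 21:51:15 [Update] Anakin Antler.Anita1
26/05/1998 1|00256|x86 11:02:45 [Search] Darth Cydot.Uk.Gerry2k
12/08/1998 2|01024|x86 10:14:53 [Search] Obi Monaco.Huarez
"""

# One named group per field; the year may have four or two digits.
ENTRY = re.compile(
    r"(?P<date>\d{2}/\d{2}/(?:\d{4}|\d{2}))\s+"
    r"(?P<cpus>\d+)\|(?P<ram>\d+)\|(?P<cpu>\w+)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<action>\w+)\]\s+"
    r"(?P<user>\w+)\s+"
    r"(?P<domain>[\w.]+)\.(?P<machine>\w+)"
)

def parse(log_text):
    """Return one dict of named-group values per line that fits ENTRY."""
    rows = []
    for line in log_text.splitlines():
        match = ENTRY.fullmatch(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows

def to_xml(rows):
    """Turn the parsed rows into an XML string, one element per field."""
    root = ET.Element("log")
    for row in rows:
        entry = ET.SubElement(root, "entry")
        for name, value in row.items():
            ET.SubElement(entry, name).text = value
    return ET.tostring(root, encoding="unicode")

rows = parse(LOG)
xml_text = to_xml(rows)

# Stand-in for a Select on the DataSet: users who ran a Search.
searches = [row["user"] for row in rows if row["action"] == "Search"]

print(xml_text)
print(searches)
```

Because the XML elements are generated from whatever named groups the pattern defines, adding, removing, or renaming a search field only means editing the pattern, not the code around it.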

31
Parsing a log file
32
Regulazy
  • Build simple expressions by example
  • No syntax knowledge needed
  • Free
  • Tools.osherove.com

33
When not to use Regex
  • When it's easier and more readable to do it
    otherwise
  • Not just because it's cool
  • Hard to read
  • Steep learning curve
  • Hard to maintain

Sometimes, when confronted with a problem, you
might decide to solve it with Regular Expressions
for the wrong reasons. Now you've got two
problems.
34
Summary
  • Amazing parsing flexibility
  • Good skill to have anywhere
  • Can save you time and nerves
  • With Power comes responsibility
  • Weigh the pros and cons before using

35
Resources
  • The Regulator: tools.osherove.com
  • Regulazy: tools.osherove.com
  • Regexlib.com: regex archive
    (http://www.regexlib.com), cheat sheet
  • http://www.regular-expressions.info

Roy Osherove Royo_at_sela.co.il Blog
www.iserializable.com
36
Thank you!
  • Questions?
