Efficient Regular Expression Evaluation: Theory to Practice Michela Becchi and Patrick Crowley

1
Efficient Regular Expression Evaluation: Theory to Practice
Michela Becchi and Patrick Crowley
ANCS'08
2
Motivation

Size and complexity of rule-sets have increased in
recent years
Snort, as of November 2007
8536 rules, 5549 Perl Compatible Regular
Expressions
99% with character ranges ([c1-ck], \s, \w)
16.3% with dot-star terms (.*, [^c1..ck]*)
44% with counting constraints (.{n,m},
[^c1..ck]{n,m})
Several proposals to accelerate regular
expression matching
FPGA
Memory-centric architectures

3
Objectives

Can we converge distinct algorithmic techniques
into a single proposal also for large data-sets?
Can we apply techniques intended for memory-centric
architectures also on FPGAs?

Provide a tool that allows anybody to implement a
high-throughput DPI system on the architecture of
choice
4
Target Architectures
Regex-Matching Engine
Memory-centric architectures
FPGA logic
FPGA / ASIC memory
General-purpose processors
Network processors
5
Challenges
DFA
NFA
FPGA logic

Logic cell utilization
Clock frequency

Memory space
Memory bandwidth

6
D2FA: default transition compression

Observations
DFA state: set of |Σ| next-state pointers
Transition redundancy
Idea
Differential state representation through use of
non-consuming default transitions
In general
[Figure: states s1 and s2 with labeled transitions on a, b and c to s3, s4, s5 and s6]
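
The default-transition idea above can be sketched in a few lines of Python. This is only an illustration: the state names follow the figure, but the exact transitions (s1 and s2 agreeing on a and b and differing on c) are an assumption, and the `labeled`, `default` and `step` names are hypothetical.

```python
# Hypothetical D2FA fragment: each state keeps only the transitions that
# differ from its default target, plus one optional default transition.
labeled = {
    "s1": {"a": "s3", "b": "s4", "c": "s5"},
    "s2": {"c": "s6"},           # a and b are inherited from s1
}
default = {"s2": "s1"}           # non-consuming default transition


def step(state, ch):
    """Return (next_state, memory_accesses) for one input character."""
    accesses = 1
    while ch not in labeled.get(state, {}):
        state = default[state]   # follow the default; no input is consumed
        accesses += 1
    return labeled[state][ch], accesses
```

Here s2 stores one pointer instead of three, at the cost of a second memory access whenever the input is a or b.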
7
D2FA algorithms

Problem: set default transitions so as to
Maximize memory compression
Minimize memory bandwidth overhead
Kumar et al., SIGCOMM'06
Bound dpMAX on max default path length
O(dpMAX+1) memory accesses per input char
Better compression for higher dpMAX
Becchi et al., ANCS'07
Only backward-directed default transitions
(skipping k levels)
Amortized memory bandwidth O(((k+1)/k)N) on N input
chars
Depth-first traversal → at DFA creation

Kumar et al.: memory bandwidth O((dpMAX+1)N), time complexity
O(n^2 log n), space complexity O(n^2)
vs.
Becchi et al.: memory bandwidth O(((k+1)/k)N), time complexity
O(n^2), space complexity O(n)
Compression w/ k=1 ≈ compression w/ dpMAX=8
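
A rough Python sketch of backward-directed default transitions for k=1 follows. It is a simplified greedy stand-in, not the published algorithm: it assumes a complete DFA whose states are all reachable from the start state, uses BFS distance from the start as "depth", and the function names are hypothetical. Every default transition lowers the depth and every labeled transition raises it by at most one, so N input characters trigger at most N default transitions, i.e. at most 2N memory accesses.

```python
from collections import deque


def bfs_depth(delta, start):
    """Depth of each state = BFS distance from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in delta[s].values():
            if t not in depth:
                depth[t] = depth[s] + 1
                queue.append(t)
    return depth


def compress(delta, start):
    """Give each state a default target of strictly smaller depth."""
    depth = bfs_depth(delta, start)
    labeled, default = {}, {}
    for s in delta:
        best, best_shared = None, 0
        for t in delta:
            if depth[t] < depth[s]:
                shared = sum(delta[s][c] == delta[t][c] for c in delta[s])
                if shared > best_shared:
                    best, best_shared = t, shared
        if best is None:
            labeled[s] = dict(delta[s])       # keep the full state
        else:
            default[s] = best
            labeled[s] = {c: n for c, n in delta[s].items()
                          if n != delta[best][c]}
    return labeled, default


def step(labeled, default, state, ch):
    """One input character: follow defaults until a labeled transition fits."""
    while ch not in labeled[state]:
        state = default[state]
    return labeled[state][ch]


# Toy DFA that finds "ab" anywhere in a stream over {a, b, c}.
delta = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}
labeled, default = compress(delta, 0)
stored = sum(len(v) for v in labeled.values())
```

On this toy DFA the nine original transitions shrink to four labeled ones plus two defaults.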
8
DFA alphabet reduction

Effective for
Ignore-case regex
Char-ranges
Never used chars

Alphabet translation table
Input chars     Symbol
a-z             0
A               1
B-Z             2
0-9             3
[^0-9a-zA-Z]    4
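
One simple way to obtain such a translation table is to merge characters whose transition columns are identical in every state. The sketch below does exactly that and nothing more; it is an assumption-laden simplification (exact column equality, hypothetical `reduce_alphabet` name) and not necessarily the clustering procedure used in the talk.

```python
def reduce_alphabet(delta, alphabet):
    """Group symbols whose transition columns are identical in every state."""
    states = sorted(delta)
    class_of_column = {}
    translation = {}
    for ch in alphabet:
        column = tuple(delta[s][ch] for s in states)
        # First time a column is seen it gets the next free class number.
        translation[ch] = class_of_column.setdefault(column, len(class_of_column))
    reduced = {
        s: {translation[ch]: delta[s][ch] for ch in alphabet}
        for s in states
    }
    return translation, reduced


# Toy ignore-case DFA for "ab" over the alphabet {a, A, b, B, c}.
delta = {
    0: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
    1: {"a": 1, "A": 1, "b": 2, "B": 2, "c": 0},
    2: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
}
translation, reduced = reduce_alphabet(delta, "aAbBc")
```

The five input characters collapse to three symbols, so each state stores three pointers instead of five.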
9
Multiple-stride DFAs

Brodie et al, ISCA 2006
Idea
Process stride input chars at a time
Observations
Mechanism used on small DFAs (1-2 regex)
No distinct accepting state handling

[Figure: a DFA and the corresponding DFA w/ stride 2]
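
Stride doubling can be sketched as composing the transition function with itself. The code below is a minimal illustration with a hypothetical `double_stride` helper; it deliberately ignores accepting states reached in the middle of a pair and inputs of odd length.

```python
from itertools import product


def double_stride(delta, alphabet):
    """Build a stride-2 table: one lookup consumes a pair of symbols."""
    return {
        s: {(a, b): delta[delta[s][a]][b]
            for a, b in product(alphabet, repeat=2)}
        for s in delta
    }


# Toy DFA that finds "ab" anywhere in a stream over {a, b, c}.
delta = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}
delta2 = double_stride(delta, "abc")
```

The number of states is unchanged, while each state now holds 3^2 = 9 transitions instead of 3.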
10
Multiple stride alphabet reduction

Stride s → alphabet Σ^s
Σ = ASCII alphabet → |Σ^2| = 256^2 = 65,536
|Σ^4| = 256^4 ≈ 4,294M
Effective alphabet much smaller
Char grouping a-cefa, b-fb
Alphabet reduction may be necessary to make
stride doubling feasible on large DFAs
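
The arithmetic behind these numbers is easy to check. In the sketch below the 4-byte next-state pointer and the 64-class reduced alphabet are illustrative assumptions, not figures from the talk.

```python
def table_entries_per_state(alphabet_size, stride):
    """Transitions stored per state in an uncompressed stride-s table."""
    return alphabet_size ** stride


POINTER_BYTES = 4  # assumption: one 32-bit next-state pointer per entry

ascii_stride2 = table_entries_per_state(256, 2)
ascii_stride4 = table_entries_per_state(256, 4)
reduced_stride2 = table_entries_per_state(64, 2)  # hypothetical 64 classes
bytes_per_state = ascii_stride2 * POINTER_BYTES
```

Under these assumptions a single stride-2 state over raw ASCII already needs 256 KB, while the same state over 64 character classes needs 4,096 entries.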

11
Multiple stride default transitions

Compression
Default transitions eliminate transition
redundancy
In multiple stride DFAs
# of states does not substantially change
# of transitions per state increases
exponentially (|Σ| → |Σ|^stride)
Fraction distinct/total transitions decreases
Increased potential for compression!
Accepting state handling
Duplicated states have same outgoing transitions
as original states but different depth
Default transition will remove all outgoing
transitions from new accepting states

12
Multiple stride default transitions (cont'd)

Problem
For large Σ and stride, uncompressed DFA may be
infeasible
Out of memory when generating a 2K-node, stride-4
DFA on a Linux machine w/ 4GB memory
Solution
Perform default transition compression during DFA
creation
Use Becchi et al., ANCS 2007 compression
algorithm
In the situation above, only 10% memory used

13
Putting everything together
DFA
1-22 regex 48-1,940 states
14
NFA

abcd
abce
abc.f
bd-fa
bdc

15
Multiple stride alphabet reduction

Stride doubling
Alphabet reduction
Clustering-based algorithm as for DFA, but sets
of target states are compared

Avoid new state creation
Keep multiple transitions on the same symbol
separated
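
For an NFA the comparison is made on sets of target states rather than on single next states. A minimal sketch, assuming exact equality of the target sets in every state (the talk mentions a clustering-based algorithm, which this does not reproduce) and a hypothetical `reduce_nfa_alphabet` helper:

```python
def reduce_nfa_alphabet(delta, alphabet):
    """Group symbols that lead to the same set of targets from every state."""
    states = sorted(delta)
    classes = {}
    translation = {}
    for ch in alphabet:
        signature = tuple(frozenset(delta[s].get(ch, ())) for s in states)
        translation[ch] = classes.setdefault(signature, len(classes))
    return translation


# Toy NFA: state 0 loops on every symbol and also moves to 1 on a or A.
delta = {
    0: {"a": {0, 1}, "A": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {},
}
translation = reduce_nfa_alphabet(delta, "aAb")
```

No states are created or merged here; only the symbols are renamed, in line with the "avoid new state creation" point above.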

16
FPGA implementation
(c1 = b OR c1 = B) AND NOT (c2 = a OR c2 = A)
17
FPGA Results - throughput
18
FPGA Results - logic utilization
s = 7,864, Σ1 = 64, Σ2 = 2,206
s = 2,086, Σ1 = 78, Σ2 = 1,969
s = 2,147, Σ1 = 68, Σ2 = 1,640

Utilization
8-46% on XC5VLX50 device (7,400 slices)
XC5VLX330 device has 51,840 slices

19
ASIC projected results
Regex partitioning into multiple DFAs
Rule-set     Stride 1                                Stride 2
             Σ    States   Memory footprint          Σ      States    Memory footprint
                           Compressed  Full                           Compressed  Full
k-NFA any    78   2,086    -           -             1,969  2,091     -           -
k-DFA any1   59   23,846   505 KB      200 KB        850    28,223    356 KB      32 MB
k-DFA any2   45   86,977   2.9 MB      55 KB         579    102,940   1.27 MB     81 MB
k-DFA any3   60   14,084   299 MB      48 KB         627    19,344    244 KB      16 MB

Throughput, SRAM at 500 MHz
2-4 Gbps for stride 1
4-8 Gbps for stride 2
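
One way to arrive at figures in this range is sketched below. The slide does not state the derivation, so the model is an assumption: one memory access per clock cycle, 8-bit input characters, and between one and two accesses per lookup (the k=1 default-transition bound). The `throughput_gbps` name is hypothetical.

```python
def throughput_gbps(clock_mhz, stride, accesses_per_step):
    """Input bits consumed per second, in Gbps, under the assumptions above."""
    bits_per_step = 8 * stride            # assumption: 8-bit characters
    steps_per_second = clock_mhz * 1e6 / accesses_per_step
    return steps_per_second * bits_per_step / 1e9


stride1_worst = throughput_gbps(500, 1, 2)   # every lookup takes a default
stride1_best = throughput_gbps(500, 1, 1)    # no default transitions taken
stride2_worst = throughput_gbps(500, 2, 2)
stride2_best = throughput_gbps(500, 2, 1)
```

Under this model the bounds come out as 2 to 4 Gbps for stride 1 and 4 to 8 Gbps for stride 2.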

20
Conclusion

Algorithm
Combination of default transition compression,
alphabet reduction and stride multiplying on
potentially large DFAs
Extension of alphabet reduction and stride
multiplying to NFAs
FPGA Implementation
Use of one-hot encoding w/ incremental
improvement schemes
Logic minimization scheme for alphabet reduction
decoding
Additional aspects
Multiple flow handling: FPGA vs. memory-centric
architectures
Design improvements tailored to specific
architectures and data-sets
Clustering into smaller NFAs and DFAs to allow
smaller alphabets w/ larger strides