Title: Efficient Regular Expression Evaluation: Theory to Practice Michela Becchi and Patrick Crowley
1. Efficient Regular Expression Evaluation: Theory to Practice
Michela Becchi and Patrick Crowley
ANCS'08
2. Motivation
- Size and complexity of rule-sets have increased in recent years
- Snort, as of November 2007
  - 8536 rules, 5549 Perl Compatible Regular Expressions
  - 99% with character ranges ([c1-ck], \s, \w)
  - 16.3% with dot-star terms (.*, [^c1..ck]*)
  - 44% with counting constraints (.{n,m}, [^c1..ck]{n,m})
- Several proposals to accelerate regular expression matching
  - FPGA
  - Memory-centric architectures
3. Objectives
- Can we converge distinct algorithmic techniques into a single proposal, also for large data-sets?
- Can we apply techniques intended for memory-centric architectures also on FPGAs?
Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of choice
4. Target Architectures
Regex-Matching Engine
- Memory-centric architectures
- FPGA logic
- FPGA / ASIC memory
- General-purpose processors
- Network processors
5. Challenges
DFA
NFA
FPGA logic
- Logic cell utilization
- Clock frequency
- Memory space
- Memory bandwidth
6. D2FA: default transition compression
- Observations
  - DFA state: set of |Σ| next-state pointers
  - Transition redundancy
- Idea
  - Differential state representation through use of non-consuming default transitions
- In general
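[Figure: states s1 and s2 both go to s3 on a and to s4 on b, and differ only on c (s1 to s5, s2 to s6); with a default transition from s2 to s1, s2 stores only its transition on c.]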
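The differential representation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the state names follow the figure, while `labeled`, `default`, and `step` are hypothetical names, and the sketch assumes every default path ends in a state that stores the needed transition.

```python
# Hypothetical D2FA fragment: s1 is stored in full, s2 stores only the
# transition that differs from s1 plus a non-consuming default transition.
labeled = {
    "s1": {"a": "s3", "b": "s4", "c": "s5"},
    "s2": {"c": "s6"},
}
default = {"s2": "s1"}  # default transitions consume no input


def step(state, char):
    """Return (next_state, memory_accesses) for one input character."""
    accesses = 1
    # Follow default transitions until a state has a labeled transition on char.
    # Assumption: such a state always exists on the default path.
    while char not in labeled[state]:
        state = default[state]
        accesses += 1
    return labeled[state][char], accesses
```

Here `step("s2", "c")` is resolved in one access, while `step("s2", "a")` needs a second access to follow the default transition to s1. This extra access is the memory bandwidth overhead that the compression trades for space.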
7. D2FA algorithms
- Problem: set default transitions so as to
  - Maximize memory compression
  - Minimize memory bandwidth overhead
- Kumar et al., SIGCOMM'06
  - Bound dpMAX on max default path length
  - O(dpMAX+1) memory accesses per input char
  - Better compression for higher dpMAX
- Becchi et al., ANCS'07
  - Only backward-directed default transitions (skipping k levels)
  - Amortized memory bandwidth O(((k+1)/k)N) on N input chars
  - Depth-first traversal → can be applied at DFA creation

Kumar et al.: memory bandwidth O((dpMAX+1)N), time complexity O(n^2 log n), space complexity O(n^2)
vs.
Becchi et al.: memory bandwidth O(((k+1)/k)N), time complexity O(n^2), space complexity O(n)

Compression w/ k=1 ≈ compression w/ dpMAX=8
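The amortized bound for backward-directed default transitions can be recovered with a short counting argument. The sketch below assumes that every default transition moves to a state at least k levels closer to the start state, and that a labeled transition increases depth by at most one. The symbols L, D, and d_final are introduced here for illustration only.

```latex
% L = number of labeled (consuming) transitions taken on N input chars, so L = N.
% D = number of default (non-consuming) transitions taken.
% Depth starts at 0, rises by at most 1 per labeled transition,
% drops by at least k per default transition, and never goes negative:
\[
  0 \le d_{\text{final}} \le L - kD
  \quad\Longrightarrow\quad
  D \le \frac{L}{k} = \frac{N}{k}.
\]
% Total state traversals (memory accesses) on N input chars:
\[
  L + D \le N + \frac{N}{k} = \frac{k+1}{k}\,N .
\]
```

With k=1 this gives at most 2N memory accesses on N input chars, regardless of how long any individual default path is.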
8. DFA alphabet reduction
- Effective for
  - Ignore-case regex
  - Char-ranges
  - Never used chars

Alphabet translation table
Σ              Σ'
a-z            0
A              1
B-Z            2
0-9            3
[^0-9a-zA-Z]   4
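One simple way to build such a translation table is to group symbols whose columns in the DFA transition table are identical. The sketch below assumes a complete DFA stored as nested dicts. `reduce_alphabet` and the toy ignore-case DFA are hypothetical, and exact column equality is used here in place of whatever clustering the authors apply.

```python
def reduce_alphabet(delta, alphabet):
    """Group symbols whose transition-table columns are identical.

    delta: dict state -> dict symbol -> next state (complete DFA).
    Returns (translation, reduced_delta).
    """
    states = sorted(delta)
    column_to_class = {}
    translation = {}
    for sym in alphabet:
        column = tuple(delta[s][sym] for s in states)
        # The first symbol with a new column opens a new class.
        translation[sym] = column_to_class.setdefault(column, len(column_to_class))
    reduced = {
        s: {translation[sym]: delta[s][sym] for sym in alphabet} for s in states
    }
    return translation, reduced


# Hypothetical ignore-case DFA that reaches state 2 after "ab" (any case).
alphabet = "aAbBc"
delta = {
    0: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
    1: {"a": 1, "A": 1, "b": 2, "B": 2, "c": 0},
    2: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
}
translation, reduced = reduce_alphabet(delta, alphabet)
```

In this toy case the five symbols collapse to three classes: {a, A}, {b, B}, and {c}.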
9. Multiple-stride DFAs
- Brodie et al., ISCA 2006
- Idea
  - Process stride input chars at a time
- Observations
  - Mechanism used on small DFAs (1-2 regex)
  - No distinct accepting state handling
[Figure: a DFA and the equivalent DFA w/ stride 2]
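Stride doubling can be sketched as composing the stride-1 transition function with itself. The code below is an illustrative sketch under stated assumptions, not the mechanism of Brodie et al. or of this talk: `double_stride` and the toy DFA are hypothetical, and mid-stride accepting states are only recorded, not handled.

```python
from itertools import product


def double_stride(delta, alphabet, accepting):
    """Return (delta2, crossed) for a complete stride-1 DFA.

    delta2[s][(a, b)] is the state reached from s after reading a then b.
    crossed lists the (state, pair) entries whose intermediate state is
    accepting; a match there is missed if only end states are checked.
    """
    delta2 = {}
    crossed = set()
    for s in delta:
        delta2[s] = {}
        for a, b in product(alphabet, repeat=2):
            mid = delta[s][a]
            delta2[s][(a, b)] = delta[mid][b]
            if mid in accepting:
                crossed.add((s, (a, b)))
    return delta2, crossed


# Hypothetical DFA that reaches state 2 after seeing "ab".
delta = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}
delta2, crossed = double_stride(delta, "abc", accepting={2})
```

Each state now has 9 outgoing transitions instead of 3. The `crossed` set shows why accepting states need separate treatment once more than one char is consumed per step.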
10. Multiple stride alphabet reduction
- Stride s → alphabet Σ^s
  - Σ = ASCII alphabet → |Σ^2| = 256^2 = 65,536; |Σ^4| = 256^4 ≈ 4,294M
- Effective alphabet much smaller
  - Char grouping a-cefa, b-fb
- Alphabet reduction may be necessary to make stride doubling feasible on large DFAs
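To see why the effective alphabet is much smaller than Σ^2, one can count how many symbol pairs produce distinct columns in the stride-2 transition table. This is a hedged sketch: `effective_stride2_alphabet` and the toy ignore-case DFA are hypothetical, and pairs are grouped by exact column equality without regard to accepting states crossed mid-stride.

```python
from itertools import product


def effective_stride2_alphabet(delta, alphabet):
    """Count symbol pairs that behave differently in at least one state."""
    states = sorted(delta)
    columns = set()
    for a, b in product(alphabet, repeat=2):
        # Column of the stride-2 table for the pair (a, b).
        columns.add(tuple(delta[delta[s][a]][b] for s in states))
    return len(columns)


# Hypothetical ignore-case DFA that reaches state 2 after "ab" (any case).
delta = {
    0: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
    1: {"a": 1, "A": 1, "b": 2, "B": 2, "c": 0},
    2: {"a": 1, "A": 1, "b": 0, "B": 0, "c": 0},
}
raw_pairs = len("aAbBc") ** 2
effective = effective_stride2_alphabet(delta, "aAbBc")
```

For this toy DFA the 25 raw pairs collapse to 3 distinct columns, so a stride-2 translation table would need only 3 symbol classes.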
11. Multiple stride default transitions
- Compression
  - Default transitions eliminate transition redundancy
  - In multiple stride DFAs
    - # of states does not substantially change
    - # of transitions per state increases exponentially (|Σ| → |Σ|^stride)
    - Fraction distinct/total transitions decreases
  - Increased potential for compression!
- Accepting state handling
  - Duplicated states have same outgoing transitions as original states but different depth
  - Default transition will remove all outgoing transitions from new accepting states
12. Multiple stride default transitions (cont'd)
- Problem
  - For large Σ and stride, uncompressed DFA may be infeasible
  - Out of memory when generating a 2K-node, stride-4 DFA on a Linux machine w/ 4GB memory
- Solution
  - Perform default transition compression during DFA creation
  - Use Becchi et al., ANCS 2007 compression algorithm
  - In the situation above, only 10% memory used
DFA
1-22 regex 48-1,940 states
14. NFA
- abcd
- abce
- abc.f
- bd-fa
- bdc
15. Multiple stride alphabet reduction
- Stride doubling
- Alphabet reduction
- Clustering-based algorithm as for DFA, but sets of target states are compared
- Avoid new state creation
- Keep multiple transitions on the same symbol separated
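For an NFA, two symbols can share a class only if they lead to the same set of target states from every state. The sketch below illustrates that comparison with exact set equality. `reduce_nfa_alphabet` and the toy NFA are hypothetical, and the clustering step mentioned above is not reproduced here.

```python
def reduce_nfa_alphabet(delta, alphabet):
    """Map each symbol to a class; symbols in one class have identical
    target-state sets in every NFA state.

    delta: dict state -> dict symbol -> set of target states
    (a missing symbol means no transition).
    """
    states = sorted(delta)
    classes = {}
    translation = {}
    for sym in alphabet:
        signature = tuple(frozenset(delta[s].get(sym, ())) for s in states)
        translation[sym] = classes.setdefault(signature, len(classes))
    return translation


# Hypothetical NFA fragment: state 0 loops on every symbol.
alphabet = "abcdefg"
delta = {
    0: {**{c: {0} for c in alphabet}, "b": {0, 1}},
    1: {"d": {2, 3}, "e": {2}, "f": {2}},
    2: {"a": {4}},
    3: {"c": {5}},
    4: {},
    5: {},
}
translation = reduce_nfa_alphabet(delta, alphabet)
```

Here e and f fall into the same class, while d stays separate because it reaches the set {2, 3} rather than {2} from state 1.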
16. FPGA implementation
(c1=b OR c1=B) AND NOT (c2=a OR c2=A)
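The expression above can be read as a predicate over two consecutive input characters. The sketch below assumes that c1 and c2 are the first and second char of a stride-2 input and that each term is an equality test. `pair_matches` is a hypothetical name, and the code models only the Boolean logic, not the FPGA circuit.

```python
def pair_matches(c1, c2):
    """Software model of (c1=b OR c1=B) AND NOT (c2=a OR c2=A)."""
    return (c1 == "b" or c1 == "B") and not (c2 == "a" or c2 == "A")
```

For example, the pair ("b", "c") satisfies the predicate, while ("B", "a") does not.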
17. FPGA Results - throughput
18. FPGA Results - logic utilization
s=7,864  Σ1=64  Σ2=2,206
s=2,086  Σ1=78  Σ2=1,969
s=2,147  Σ1=68  Σ2=1,640
- Utilization
  - 8-46% on XC5VLX50 device (7,400 slices)
  - XC5VLX330 device has 51,840 slices
19. ASIC projected results
Regex partitioning into multiple DFAs

              Stride 1                                   Stride 2
                               Memory footprint                           Memory footprint
Rule-set      Σ      States    Compressed  Full          Σ      States    Compressed  Full
                               states      states                         states      states
k-NFA any     78     2,086     -           -             1,969  2,091     -           -
k-DFA any1    59     23,846    505 KB      200 KB        850    28,223    356 KB      32 MB
k-DFA any2    45     86,977    2.9 MB      55 KB         579    102,940   1.27 MB     81 MB
k-DFA any3    60     14,084    299 MB      48 KB         627    19,344    244 KB      16 MB

- Throughput (SRAM at 500 MHz)
  - 2-4 Gbps for stride 1
  - 4-8 Gbps for stride 2
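The throughput ranges are consistent with a simple back-of-the-envelope model. The sketch below is one plausible reading, not the authors' stated method: it assumes one memory access per clock cycle, 8-bit input chars, and between one and two memory accesses per transition (the k=1 default-transition bound). `throughput_gbps` is a hypothetical helper.

```python
def throughput_gbps(clock_mhz, stride, accesses_per_transition):
    """Input bits consumed per second, in Gbps, under the stated assumptions."""
    bits_per_transition = 8 * stride
    transitions_per_second = clock_mhz * 1e6 / accesses_per_transition
    return transitions_per_second * bits_per_transition / 1e9


best_stride1 = throughput_gbps(500, stride=1, accesses_per_transition=1)
worst_stride1 = throughput_gbps(500, stride=1, accesses_per_transition=2)
best_stride2 = throughput_gbps(500, stride=2, accesses_per_transition=1)
worst_stride2 = throughput_gbps(500, stride=2, accesses_per_transition=2)
```

Under these assumptions the model gives 2 to 4 Gbps for stride 1 and 4 to 8 Gbps for stride 2.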
20. Conclusion
- Algorithm
  - Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs
  - Extension of alphabet reduction and stride multiplying to NFAs
- FPGA implementation
  - Use of one-hot encoding w/ incremental improvement schemes
  - Logic minimization scheme for alphabet reduction decoding
- Additional aspects
  - Multiple flow handling: FPGA vs. memory-centric architectures
  - Design improvements tailored to specific architectures and data-sets
  - Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides
21. Thank you!
- Questions?
- http://regex.wustl.edu