Title: Advanced Algorithms for Fast and Scalable Deep Packet Inspection
Sailesh Kumar, Jonathan Turner, John Williams
Why Regular Expression Acceleration?
- RegEx are now widely used
  - Network intrusion detection systems (NIDS)
  - Layer 7 switches, load balancing
  - Firewalls, filtering, authentication and monitoring
  - Content-based traffic management and routing
- RegEx matching is expensive
  - Space: large amount of memory
  - Bandwidth: requires 1 state traversal per byte
- RegEx is a performance bottleneck
  - In enterprise switches from Cisco, etc.
  - In many security appliances
  - These use DFAs with 1 GB of memory, yet still reach only sub-gigabit throughput
- Need to accelerate RegEx!
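To put both costs in numbers, here is a back-of-the-envelope sketch under assumed parameters (a table DFA with 2^20 states, 4-byte next-state pointers, a 10 Gb/s link; none of these figures come from the slide). Memory for an uncompressed transition table, and the lookup rate needed at one state traversal per byte:

$$
2^{20}\ \text{states} \times 256\ \text{transitions} \times 4\ \text{B} = 2^{30}\ \text{B} = 1\ \text{GiB}
$$

$$
\frac{10 \times 10^{9}\ \text{bit/s}}{8\ \text{bit/byte}} = 1.25 \times 10^{9}\ \text{lookups/s} \;\Rightarrow\; 0.8\ \text{ns per lookup}
$$

A table this large cannot sit on chip, and off-chip memory cannot sustain one dependent access every 0.8 ns, which is why throughput falls short.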
Can we do better?
- Well studied in compiler literature
- What's different in networking?
- Can we do better?
- Construction time versus execution time (grep)
  - Traditionally, (construction + execution) time is the metric
  - In the networking context, execution time is critical
  - Also, there may be thousands of patterns
- DFAs are fast
  - But can have an exponentially large number of states
  - Algorithms exist to minimize the number of states
  - Still: 1) low performance and 2) gigabytes of memory
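The exponential blow-up is easy to reproduce. The sketch below runs the textbook subset construction on the pattern `.*a.{k}` (an "a" followed by exactly k more characters), a standard illustrative pattern that does not come from the slides, over the two-letter alphabet {a, b}. The DFA must remember which of the last k+1 characters were "a", so it has 2^(k+1) states; for this pattern, minimization cannot reduce that count.

```python
# Illustrative sketch: subset construction for ".*a.{k}" over the alphabet {a, b}.
# NFA states: 0 = start (loops on every character),
# i in 1..k+1 = "an 'a' was read i-1 characters ago"; state k+1 accepts.

def nfa_step(states, ch, k):
    """Advance a set of NFA states by one input character."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)          # the leading ".*" keeps the start state alive
            if ch == "a":
                nxt.add(1)      # begin a new candidate match
        elif s <= k:
            nxt.add(s + 1)      # "." consumes any character
        # s == k + 1 is accepting and has no outgoing transitions
    return frozenset(nxt)


def count_dfa_states(k, alphabet="ab"):
    """Number of DFA states reachable by the subset construction."""
    start = frozenset({0})
    seen = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for ch in alphabet:
            nxt = nfa_step(current, ch, k)
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return len(seen)


for k in range(1, 11):
    print(k, count_dfa_states(k))
```

Each extra "." in the pattern doubles the DFA; a rule set with many such patterns combined into one DFA compounds the effect.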
Delayed Input DFA (D2FA), SIGCOMM'06
- Many transitions
  - 256 transitions per state
  - 50 distinct transitions per state (real-world datasets)
  - Need 50 words per state
- Reduce the number of transitions in a DFA
Example: three rules a, bc, cd; the DFA has 4 transitions per state
Look at state pairs: there are many common transitions. How can we remove them?
Delayed Input DFA (D2FA), SIGCOMM'06
Alternative representation of the same three rules a, bc, cd
- DFA: 4 transitions per state
- D2FA: fewer transitions, less memory
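D2FA Operation
Heavy edges are called default transitions. Take the default transition whenever a labeled transition is missing; following a default transition does not consume the input character.
Figure: a DFA and the equivalent D2FA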
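A minimal Python sketch of the D2FA idea, using a small hypothetical DFA rather than the one in the figure: each state keeps only the transitions that differ from its default state's, and a lookup follows default transitions (without consuming input) until it finds a labeled transition. The hypothetical `d2fa_step` helper also counts memory accesses, which is the cost weighed against DFAs in this deck.

```python
# Hypothetical 4-state DFA over {a, b, c, d} (illustrative, not the slide's example).
ALPHABET = "abcd"
DFA = {
    0: {"a": 1, "b": 0, "c": 2, "d": 0},
    1: {"a": 1, "b": 3, "c": 2, "d": 0},
    2: {"a": 1, "b": 0, "c": 2, "d": 3},
    3: {"a": 1, "b": 0, "c": 2, "d": 0},
}

def build_d2fa(dfa, default):
    """Keep only the transitions that differ from the default state's transitions."""
    labeled = {}
    for s, row in dfa.items():
        d = default.get(s)
        if d is None:                      # root state: keeps every transition
            labeled[s] = dict(row)
        else:
            labeled[s] = {c: t for c, t in row.items() if dfa[d][c] != t}
    return labeled

def d2fa_step(labeled, default, state, c):
    """Return (next_state, memory_accesses) for one input character."""
    accesses = 1
    while c not in labeled[state]:
        state = default[state]             # default transition: input is not consumed
        accesses += 1
    return labeled[state][c], accesses

default = {1: 0, 2: 0, 3: 0}               # default transitions must form a forest (no cycles)
labeled = build_d2fa(DFA, default)
print(sum(len(t) for t in labeled.values()), "labeled transitions instead of",
      len(DFA) * len(ALPHABET))

dfa_state = d2fa_state = 0
total = 0
for c in "abcdcdab":
    dfa_state = DFA[dfa_state][c]
    d2fa_state, n = d2fa_step(labeled, default, d2fa_state, c)
    total += n
    assert d2fa_state == dfa_state         # the D2FA visits the same states as the DFA
print(total, "memory accesses for 8 input bytes")
```

In this toy example the D2FA keeps 6 of the 16 transitions but needs 11 memory accesses for 8 input bytes; longer default paths increase the number of extra accesses.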
D2FA versus DFA
- D2FAs are compact but require multiple memory accesses
  - Up to 20x more memory accesses
  - Not desirable in off-chip architectures
- Can D2FAs match the performance of DFAs?
  - YES!
- Content Addressed D2FAs (CD2FA)
  - CD2FAs require only one memory access per byte
  - Match the performance of a DFA in a cacheless system
  - In systems with a data cache, CD2FAs are 2-3x faster than DFAs
  - CD2FAs are 10x more compact than DFAs
Introduction to CD2FA
- How to avoid the multiple memory accesses of D2FAs?
  - Avoid the lookup that decides whether the default path must be taken
  - Avoid traversing the default path
- Solution: assign a label to each state; the label contains
  - Characters for which the state has labeled transitions
  - Information about all of its default states
  - Characters for which its default states have labeled transitions
Content labels:
- Find node R at location R
- Find node U at hash(c,d,R)
- Find node V at hash(a,b,hash(c,d,R))
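A sketch of content-addressed lookup using the labels R, U and V from this slide. A content label is modeled here as a tuple of character groups ending in a root name, e.g. ("ab", "cd", "R") for V. The SHA-256-based hash, the 2^20 table size, the root addresses and the transitions stored in `memory` are all assumptions for illustration, not the authors' encoding. The point is that the address of the state holding the transition for character c is computed from the current label alone, so each input byte costs one memory read.

```python
import hashlib
import string

TABLE_SIZE = 1 << 20                 # hypothetical number of memory locations
ROOT_ADDRESS = {"R": 0, "Z": 1}      # hypothetical fixed locations of root states

def h(chars, next_address):
    """Hash a state's labeled characters together with its default state's address."""
    digest = hashlib.sha256(f"{chars}|{next_address}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SIZE

def address(label):
    """Location of the state with this content label; needs no memory reads."""
    if len(label) == 1:
        return ROOT_ADDRESS[label[0]]
    return h(label[0], address(label[1:]))

def holder(label, c):
    """Location of the state on the default path that has a labeled transition on c."""
    for i, chars in enumerate(label[:-1]):
        if c in chars:
            return address(label[i:])
    return ROOT_ADDRESS[label[-1]]   # roots have labeled transitions on every character

R, U, V = ("R",), ("cd", "R"), ("ab", "cd", "R")
X = ("pq", "lm", "Z")

# Hypothetical memory contents: each location maps characters to next-state labels.
memory = {
    address(V): {"a": X, "b": V},
    address(U): {"c": U, "d": V},
    address(R): {ch: R for ch in string.ascii_lowercase},
}
assert len(memory) == 3, "hash collision: this sketch assumes collision-free hashing"

def step(label, c):
    """One CD2FA transition: exactly one memory read, at a computed address."""
    return memory[holder(label, c)][c]

print(step(V, "a"), step(V, "d"), step(V, "x"))
```

From state V, character "a" is served by V's own location, "d" by U's location hash(c,d,R), and any other character by the root R, each with a single read. Handling hash collisions properly is outside the scope of this sketch.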
Introduction to CD2FA
Figure: example CD2FA. Root states R and Z carry the label "all" (labeled transitions on every character). U has label (cd,R) and is stored at hash(c,d,R); V has label (ab,cd,R) and is stored at hash(a,b,hash(c,d,R)). Likewise, Y has label (lm,Z) and X has label (pq,lm,Z), stored at hash(p,q,hash(l,m,Z)). Example step: from current state V (label ab,cd,R), the next state is X (label pq,lm,Z).
Construction of CD2FA
- We seek to keep the content labels small
- Twin objectives:
  - Ensure that states have few labeled transitions
  - Ensure that default paths are as short as possible
- The D2FA construction heuristic based upon a maximum weight spanning tree creates long default paths
  - Limiting default paths → less space-efficient D2FAs
- Proposed a new heuristic, called CRO, to construct D2FAs
  - Runs in 3 phases: Construction, Reduction and Optimization
  - Default path bound of 2 edges → CRO constructs D2FAs with up to 10x space reduction
- CD2FAs are constructed from these D2FAs
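To see why the maximum weight spanning tree heuristic produces long default paths, here is a sketch of that baseline (CRO itself is not reproduced). The edge weight between two states is the number of characters on which they go to the same next state; Kruskal's algorithm keeps the heaviest edges, and each tree is rooted at its smallest-numbered state. The example DFA, the rooting rule and the "weight > 1" cutoff are assumptions for illustration.

```python
from collections import deque

def shared(dfa, u, v):
    """Edge weight: characters on which u and v move to the same next state."""
    return sum(1 for c in dfa[u] if dfa[u][c] == dfa[v][c])

def build_defaults(dfa):
    """Max-weight spanning forest (Kruskal), rooted at the smallest state of each tree."""
    comp = {s: s for s in dfa}                 # union-find parent pointers

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]            # path halving
            x = comp[x]
        return x

    states = sorted(dfa)
    edges = sorted(((shared(dfa, u, v), u, v)
                    for i, u in enumerate(states) for v in states[i + 1:]),
                   reverse=True)
    adj = {s: [] for s in dfa}
    for w, u, v in edges:
        if w <= 1:                             # assumption: keep only edges saving >1 transition
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            comp[ru] = rv
            adj[u].append(v)
            adj[v].append(u)

    default, depth = {}, {}
    for root in states:                        # BFS from each not-yet-visited state
        if root in depth:
            continue
        depth[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v], default[v] = depth[u] + 1, u
                    queue.append(v)
    return default, depth

ALPHABET = "abcdefgh"
# Hypothetical DFA: state i sends its first i characters to state 1 and the rest
# to state 0, so neighbouring states differ in exactly one transition.
dfa = {i: {ch: (1 if j < i else 0) for j, ch in enumerate(ALPHABET)} for i in range(6)}

default, depth = build_defaults(dfa)
labeled = sum(len(ALPHABET) if s not in default
              else len(ALPHABET) - shared(dfa, s, default[s]) for s in dfa)
print("labeled transitions:", labeled, "of", len(dfa) * len(ALPHABET))
print("longest default path:", max(depth.values()))
```

The tree here degenerates into a chain: 13 of 48 transitions remain, but the deepest state needs 5 default hops per lookup. Bounding path length, as CRO does with a bound of 2, trades some of that compression for predictable lookups.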
Memory Mapping in CD2FA
WE HAVE ASSUMED THAT HASHING IS COLLISION FREE
Figure: the example CD2FA mapped to memory. V is stored at hash(a,b,hash(c,d,R)), U at hash(c,d,R) and X at hash(p,q,hash(l,m,Z)); distinct content labels can hash to the same memory location, causing a COLLISION.
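How often does a single hash choice fail? Assuming an ideal hash that places each of n content labels uniformly and independently in one of n locations (a modeling assumption, not a measurement), the expected number of locations left empty, and therefore of labels that cannot get a location of their own, is

$$
n\left(1 - \frac{1}{n}\right)^{n} \;\approx\; \frac{n}{e} \;\approx\; 0.368\,n
$$

So roughly 37% of states would collide with another state, regardless of table size, unless each state is given more than one candidate location.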
Collision-free Memory Mapping
Figure: four states, with content labels abc, pqr, def and lmn, must be placed in 4 memory locations. Some states have alternative content labels (def or edf; lmn or mln), and each alternative hashes to a different location.
WE NEED A SYSTEMATIC APPROACH
Bipartite Graph Matching
- Bipartite graph
  - Left nodes are state content labels
  - Right nodes are memory locations
  - Map state labels to unique memory locations
  - An edge for every choice of content label
- Perfect matching problem
  - With n left and n right nodes
  - Need O(log n) random edges per node
  - n = 1M implies that we need 20 edges per node
- If we provide slight memory over-provisioning
  - We can uniquely map state labels with far fewer edges
- In our experiments, we found a perfect matching without memory over-provisioning
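A sketch of the mapping step as bipartite matching, using Kuhn's augmenting-path algorithm. This is a standard technique chosen for illustration; the slides do not say which matching algorithm the authors use. Each label's candidate locations are modeled as d random locations standing in for the hashes of its alternative content labels; n, d and the seed are assumptions.

```python
import random

def max_matching(candidates, num_locations):
    """Kuhn's augmenting-path algorithm.
    candidates[i] lists the memory locations that label i may be stored at."""
    owner = [None] * num_locations          # owner[loc] = label stored at loc

    def try_place(i, visited):
        for loc in candidates[i]:
            if loc in visited:
                continue
            visited.add(loc)
            # Take a free slot, or evict the owner if it can move to another choice.
            if owner[loc] is None or try_place(owner[loc], visited):
                owner[loc] = i
                return True
        return False

    for i in range(len(candidates)):
        try_place(i, set())
    return {label: loc for loc, label in enumerate(owner) if label is not None}

rng = random.Random(42)
n, d = 256, 8                               # d = log2(n) random choices per label
cands = [rng.sample(range(n), d) for _ in range(n)]
matching = max_matching(cands, n)
print(f"placed {len(matching)} of {n} labels")
```

With candidates [[0, 1], [0]], a greedy placement would give location 0 to label 0 and strand label 1; the augmenting path moves label 0 to location 1 instead. Kuhn's algorithm takes O(V·E) time and recurses deeply, so for millions of states Hopcroft–Karp with an iterative search is the more practical choice.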
Memory Reduction Results
Throughput Results
- 3x faster with a 4 KB cache
Conclusion
- We have proposed CD2FAs
  - Match or surpass a DFA in throughput
  - 10x less memory than a table-compressed DFA
- Novel randomized memory mapping algorithm based upon maximum matching in a bipartite graph
  - Zero space overhead
  - Zero bandwidth overhead
- Thank you! Questions?