Title: Efficient Memory Utilization on Network Processors for Deep Packet Inspection
1. Efficient Memory Utilization on Network Processors for Deep Packet Inspection
- Piti Piyachon
- Yan Luo
- Electrical and Computer Engineering Department
- University of Massachusetts Lowell
2. Our Contributions
- Study the parallelism of a pattern matching algorithm
- Propose the Bit-Byte Aho-Corasick Deterministic Finite Automaton (DFA)
- Construct a memory model to find the optimal settings that minimize the memory usage of the DFA
3. DPI and Pattern Matching
- Deep Packet Inspection
- Inspect packet header and payload
- Detect computer viruses, worms, spam, etc.
- Network intrusion detection applications: Bro, Snort, etc.
- Pattern Matching requirements
- Match multiple predefined patterns (keywords, or strings) at the same time
- Keywords can be any size.
- Keywords can be anywhere in the payload of a packet.
- Matching at line speed
- Flexibility to accommodate new rule sets
4. Classical Aho-Corasick (AC) DFA, example 1
- A set of keywords
- he, her, him, his
Failure edges back to state 1 are shown as dashed lines; failure edges back to state 0 are not shown.
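The classical AC construction above (goto trie, then failure edges computed by BFS) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation:

```python
from collections import deque

def build_ac(keywords):
    """Build an Aho-Corasick automaton: goto trie, failure links, outputs."""
    goto, out, fail = [{}], [set()], [0]
    for word in keywords:                       # 1) build the goto trie
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({}); out.append(set()); fail.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(word)                        # keyword ends at state s
    q = deque(goto[0].values())                 # 2) failure links via BFS
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:      # walk up failure chain
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]              # inherit suffix matches
    return goto, out, fail

def search(text, keywords):
    goto, out, fail = build_ac(keywords)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:          # follow failure edges
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits += [(i, w) for w in out[s]]
    return hits

print(search("hisher", ["he", "her", "him", "his"]))
# -> [(2, 'his'), (4, 'he'), (5, 'her')]
```

Running it on the slide's keyword set finds "his", "he", and "her" in a single pass over the text.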
5. Memory Matrix Model of the AC DFA
- Snort (Dec. 2005): 2733 keywords
- 256 next state pointers
- width = 15 bits
- > 27,000 states
- keyword-ID width = 2733 bits
- 27,538 × (2733 + 256 × 15) bits ≈ 22 MB
22 MB is too big for on-chip RAM
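The 22 MB figure can be reproduced directly from the slide's numbers; a quick sanity check (sizes taken from the slide, 1 MB = 2^20 bytes assumed):

```python
states = 27538          # > 27,000 states for Snort (Dec. 2005)
kid_width = 2733        # keyword-ID width in bits (one bit per keyword)
pointers = 256          # one next-state pointer per input byte value
ptr_width = 15          # ceil(log2(27538)) = 15 bits per pointer

# per-state row width in bits, times number of states
bits = states * (kid_width + pointers * ptr_width)
mb = bits / 8 / 2**20
print(f"{mb:.1f} MB")   # ~22 MB: far too big for on-chip RAM
```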
6. Bit-AC DFA (Tan-Sherwood's Bit-Split)
Needs 8 bit-DFAs.
7. Memory Matrix of the Bit-AC DFA
- Snort (Dec. 2005): 2733 keywords
- 2 next state pointers
- width = 9 bits
- 361 states
- keyword-ID width = 16 bits
- 1368 DFAs
- 1368 × 361 × (16 + 2 × 9) bits ≈ 2 MB
8. Bit-AC DFA Techniques
- Shrinking the width of the keyword-ID
- From 2733 to 16 bits
- By dividing the 2733 keywords into 171 subsets
- Each subset has 16 keywords
- Reducing the next state pointers
- From 256 to 2 pointers
- By dividing each input byte into 1-bit streams
- Needs 8 bit-DFAs
- Extra benefits
- The number of states (per DFA) drops from > 27,000 to ~300.
- The width of the next state pointer drops from 15 to 9 bits.
- Memory
- Reduced from 22 MB to 2 MB
- The number of DFAs?
- With 171 subsets, each subset has 8 DFAs.
- Total DFAs = 171 × 8 = 1368
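The intuition behind the bit-split technique can be checked with a toy sketch (not the authors' implementation): project the keywords and the input onto 8 single-bit planes, match each plane independently, and intersect the per-plane candidate sets at every end position. Because the 8 bit planes together determine each byte exactly, the intersection reports exactly the true matches. Naive per-plane matchers stand in for the 8 bit-DFAs here:

```python
def bit_plane(data, t):
    """Project a byte string onto bit plane t (bit t of every byte)."""
    return tuple((b >> t) & 1 for b in data)

def bit_split_match(text, keywords):
    """Report (end_index, keyword) pairs by intersecting, per position,
    the keywords whose bit-plane projections all match -- the role the
    8 bit-DFAs and their partial-match vectors play in Bit-AC."""
    hits = []
    for i in range(len(text)):
        cand = set(keywords)
        for t in range(8):                     # one "bit-DFA" per bit plane
            plane = bit_plane(text, t)
            cand = {w for w in cand
                    if i + 1 >= len(w)
                    and plane[i + 1 - len(w):i + 1] == bit_plane(w, t)}
        hits += [(i, w) for w in sorted(cand)]
    return hits

print(bit_split_match(b"hisher", [b"he", b"her", b"him", b"his"]))
# same matches as the byte-level AC DFA would report
```

The payoff shown on the slides is that each bit-DFA sees a binary alphabet, so it needs only 2 next-state pointers per state instead of 256.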
What can we do better to reduce the memory usage?
9. Classical AC DFA, example 2
28 states
Failure edges are not shown.
10. Byte-AC DFA
- Considering 4 bytes at a time
- 4 DFAs
- < 9 states per DFA
- 256 next state pointers!
Similar to Dharmapurikar-Lockwood's JACK DFA (ANCS'05).
11. Bit-Byte-AC DFA
- 4 bytes at a time
- Each byte is divided into bits.
- 32 DFAs (= 4 × 8)
- < 6 states per DFA
- 2 next state pointers
12. Memory Matrix of the Bit-Byte-AC DFA
- Snort (Dec. 2005): 2733 keywords
- 4 bytes at a time
- < 36 states per DFA
- 2 next state pointers
- width = 6 bits
- keyword-ID width = 3 bits
- 29,152 DFAs (= 911 × 32)
- 29,152 × 36 × (3 + 2 × 6) bits ≈ 1.9 MB
- 1.9 MB is only a little better than 2 MB. This is because:
- This is not yet an optimal setting.
- Each DFA has a different number of states.
- We don't need to provide the same size of memory matrix for every DFA.
13. Bit-Byte-AC DFA Techniques
- Still keeps the width of the keyword-ID as low as the Bit-DFA.
- Still keeps the next state pointers as small as the Bit-DFA.
- Reduces the states per DFA by:
- Skipping bytes
- Exploiting more shared states than the Bit-DFA
- Results of reducing the states per DFA:
- From > 27,000 to 36 states
- The width of the next state pointer drops from 15 to 6 bits.
14. Construction of Bit-Byte AC DFA
bit 3 of byte 0
4 bytes (considered) at a time
15–22. Construction of Bit-Byte AC DFA (animation steps; 4 bytes considered at a time)
23. Construction of Bit-Byte AC DFA
Failure edges are not shown.
24. Construction of Bit-Byte AC DFA
25. Construction of Bit-Byte AC DFA
32 bit-byte DFAs need to be constructed.
26–30. Bit-Byte-DFA Searching (animation steps, starting from state 0; a failure edge is shown where necessary)
31. Find the optimal settings to minimize memory
- k = keywords per subset
- The width of the keyword-ID = k bits
- k = 1, 2, 3, …, K
- where K = the number of keywords in the whole set
- Snort (Dec. 2005): K = 2733 keywords
- b = bit(s) extracted from each byte
- b = 1, 2, 4, 8
- Number of next state pointers = 2^b
- Example 2 used b = 1.
- Beyond b = 8: > 256 next state pointers
- B = bytes considered at a time
- B = 1, 2, 3, …
- Example 2 used B = 4.
- Total memory (T) is a function of k, b, and B: T = f(k, b, B)
32Ts Formula
,
and
,
when
Total memory of all bit-ACs in all subset
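Under the uniform per-DFA state counts the earlier slides use, T can be evaluated directly. A hedged sketch: the single `states` argument is a simplification, since in the full model each DFA has its own state count S_ij:

```python
from math import ceil, log2

K = 2733  # Snort (Dec. 2005) keyword count

def total_memory_bits(k, b, B, states):
    """T(k, b, B): subsets x DFAs-per-subset x per-DFA matrix size,
    assuming every DFA has the same number of states (a simplification)."""
    subsets = ceil(K / k)            # keyword subsets of size k
    dfas = subsets * (8 * B // b)    # 8B/b bit-byte DFAs per subset
    ptr_width = ceil(log2(states))   # next-state pointer width in bits
    return dfas * states * (k + 2**b * ptr_width)

# Bit-AC setting (slide 7): k=16, b=1, B=1, 361 states -> ~2 MB
print(total_memory_bits(16, 1, 1, 361) / 8 / 2**20)
# Bit-Byte setting (slide 12): k=3, b=1, B=4, 36 states -> ~1.9 MB
print(total_memory_bits(3, 1, 4, 36) / 8 / 2**20)
```

Plugging in the two settings from slides 7 and 12 reproduces both the ~2 MB and the ~1.9 MB totals, which is a useful check that the formula is term-by-term consistent with the memory matrices shown earlier.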
33. Find the optimal k
- Each pair of (b, B) has one optimal k that gives a minimal T.
[Chart: T vs. keywords per subset (k)]
34. Find the optimal b
- Each setting of k, b, and B has a different optimal point.
- Comparing only the optimal settings:
- b = 2 is the best.
35. Find the optimal B
- With b = 2:
- T decreases as B increases, non-linearly.
- For B > 16, T begins to increase.
- B = 16 is the best for Snort (Dec. 2005).
36. Comparing with Existing Works
- Tan-Sherwood's, Brodie-Cytron-Taylor's, and ours
- Our Bit-Byte DFA, when B = 16:
- The optimal point is at b = 2 and k = 12.
- 272 KB
- 14% of 2001 KB (Tan's)
- 4% of 6064 KB (Brodie's)
37. Comparing with Existing Works
- Tan-Sherwood's and ours, at B = 1:
- Tan's, on ASIC:
- 2001 KB
- k = 16 is not the optimal setting for B = 1.
- Each bit-DFA uses the same storage capacity, sized to fit the largest one (worst case).
- Ours, on NP:
- 396 KB < 2001 KB
- k = 3 is the optimal setting for B = 1.
- Each bit-DFA uses exactly the memory space needed to hold it.
38. Results with an NP Simulator
- NePSim2
- An open-source IXP24xx/28xx simulator
- NP architecture based on the IXP2855
- 16 MicroEngines (MEs)
- 512 KB
- 1.4 GHz
- Bit-Byte AC DFA: b = 2, B = 16, k = 12
- T = 272 KB
- 5 Gbps
39. Conclusion
- The Bit-Byte DFA model can reduce memory usage by up to 86%.
- Implementing on an NP uses on-chip memory more efficiently, without wasting space, compared to an ASIC.
- An NP has the flexibility to accommodate:
- The optimal setting of k, b, and B.
- Different sizes of Bit-Byte DFAs.
- New rule sets in the future.
- The optimal setting may change.
- The performance (measured with an NP simulator) satisfies line speed, up to 5 Gbps throughput.
40. Thank You
Questions? Piti_Piyachon_at_student.uml.edu, Yan_Luo_at_uml.edu