Efficient Memory Utilization on Network Processors for Deep Packet Inspection
University of Massachusetts Lowell, ANCS 2006

Transcript and Presenter's Notes
1
Efficient Memory Utilization on Network
Processors for Deep Packet Inspection
  • Piti Piyachon
  • Yan Luo
  • Electrical and Computer Engineering Department
  • University of Massachusetts Lowell

2
Our Contributions
  • Study the parallelism of a pattern matching algorithm
  • Propose the Bit-Byte Aho-Corasick Deterministic
    Finite Automaton (DFA)
  • Construct a memory model to find the optimal settings
    that minimize the memory usage of the DFA

3
DPI and Pattern Matching
  • Deep Packet Inspection
  • Inspect packet header and payload
  • Detect computer viruses, worms, spam, etc.
  • Network intrusion detection applications: Bro,
    Snort, etc.
  • Pattern Matching requirements
  • Matching multiple predefined patterns (keywords
    or strings) at the same time
  • Keywords can be any size.
  • Keywords can be anywhere in the payload of a
    packet.
  • Matching at line speed
  • Flexibility to accommodate new rule sets

4
Classical Aho-Corasick (AC) DFA example 1
  • A set of keywords
  • he, her, him, his

Failure edges back to state 1 are shown as dashed
lines. Failure edges back to state 0 are not shown.
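The goto/failure structure of the slide's example can be reproduced in a few lines. The following is a minimal Python sketch of the classical Aho-Corasick construction and search, for illustration only (it is not the authors' network-processor implementation):

```python
from collections import deque

def build_ac(keywords):
    """Build the goto, failure, and output tables of a classical AC DFA."""
    goto = [{}]            # goto[s][ch] -> next state (trie edges)
    out = [set()]          # out[s] -> keywords that end at state s
    for word in keywords:  # phase 1: build the keyword trie
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    fail = [0] * len(goto)               # phase 2: failure edges, by BFS
    queue = deque(goto[0].values())      # root's children fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]              # follow failures to a usable suffix
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]       # inherit matches of the suffix state
    return goto, fail, out

def search(text, goto, fail, out):
    """Scan text once, reporting (start_index, keyword) for every match."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for w in out[state]:
            hits.append((i - len(w) + 1, w))
    return hits

automaton = build_ac(["he", "her", "him", "his"])
matches = search("hershey", *automaton)
print(matches)  # [(0, 'he'), (0, 'her'), (4, 'he')]
```

A single left-to-right pass finds all keywords at any position, which is the line-speed property the slides rely on.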
5
Memory Matrix Model of AC DFA
  • Snort (Dec 05): 2,733 keywords
  • 256 next state pointers
  • width = 15 bits
  • > 27,000 states
  • keyword-ID width = 2,733 bits
  • 27,538 × (2,733 + 256 × 15) bits ≈ 22 MB

22 MB is too big for on-chip RAM
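The 22 MB figure follows directly from the matrix dimensions. A quick arithmetic check, using the slide's numbers and decimal megabytes:

```python
# Each of the 27,538 states stores a 2,733-bit keyword-ID vector (one bit
# per Snort keyword) plus 256 next-state pointers of 15 bits each.
states = 27_538
keyword_id_bits = 2_733
pointers, pointer_width = 256, 15
total_bits = states * (keyword_id_bits + pointers * pointer_width)
total_mb = total_bits / 8 / 1e6
print(f"{total_mb:.1f} MB")  # 22.6 MB
```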
6
Bit-AC DFA (Tan-Sherwood's Bit-Split)
Need 8 bit-DFA
7
Memory Matrix of Bit-AC DFA
  • Snort (Dec 05): 2,733 keywords
  • 2 next state pointers
  • width = 9 bits
  • 361 states
  • keyword-ID width = 16 bits
  • 1,368 DFA
  • 1,368 × 361 × (16 + 2 × 9) bits ≈ 2 MB

8
Bit-AC DFA Techniques
  • Shrinking the width of the keyword-ID
  • From 2,733 to 16 bits
  • By dividing the 2,733 keywords into 171 subsets
  • Each subset has 16 keywords
  • Reducing next state pointers
  • From 256 to 2 pointers
  • By dividing each input byte into 1-bit slices
  • Need 8 bit-DFA
  • Extra benefits
  • The number of states (per DFA) reduces from
    27,000 to 300 states.
  • The width of the next state pointer reduces from 15
    to 9 bits.
  • Memory
  • Reduced from 22 MB to 2 MB
  • The number of DFA?
  • With 171 subsets, each subset has 8 DFA.
  • Total DFA = 171 × 8 = 1,368 DFA
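The byte-to-bit fan-out behind the bit-split technique can be sketched as follows; `bit_streams` is an illustrative helper name, not from the paper. Each bit-DFA j consumes only bit j of every payload byte, and a keyword is reported only where all 8 bit-DFAs agree, which is what keeps each individual DFA tiny:

```python
def bit_streams(payload: bytes):
    """Fan a byte stream out into 8 bit streams: stream j holds bit j of
    every payload byte, to be fed to bit-DFA j."""
    return [[(byte >> j) & 1 for byte in payload] for j in range(8)]

# 'h' = 0x68 = 0110_1000 and 'e' = 0x65 = 0110_0101
streams = bit_streams(b"he")
print(streams[3])  # bit 3 of 'h' is 1, of 'e' is 0
```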

What can we do better to reduce the memory usage?
9
Classical AC DFA example 2
28 states
Failure edges are not shown.
10
Byte-AC DFA
  • Considering 4 bytes at a time
  • 4 DFA
  • < 9 states / DFA
  • 256 next state pointers!

Similar to Dharmapurikar-Lockwood's JACK DFA
(ANCS 05)
11
Bit-Byte-AC DFA
  • 4 bytes at a time
  • Each byte is divided into bits.
  • 32 DFA (= 4 × 8)
  • < 6 states/DFA
  • 2 next state pointers

12
Memory Matrix of Bit-Byte-AC DFA
  • Snort (Dec 05): 2,733 keywords
  • 4 bytes at a time
  • < 36 states/DFA
  • 2 next state pointers
  • width = 6 bits
  • keyword-ID width = 3 bits
  • 29,152 DFA (= 911 × 32)
  • 29,152 × 36 × (3 + 2 × 6) bits ≈ 1.9 MB
  • 1.9 MB is only a little better than 2 MB.
  • This is because
  • It is not yet an optimal setting.
  • Each DFA has a different number of states.
  • We don't need to provide the same size of memory
    matrix for every DFA.
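The 1.9 MB figure is the same worst-case arithmetic as before, assuming every DFA is allotted a full 36-state matrix:

```python
# 911 subsets of k = 3 keywords, each split into 32 bit-byte DFA
# (4 bytes at a time × 8 one-bit slices per byte).
n_dfa = 911 * 32                  # 29,152 DFA in total
states, kid_bits = 36, 3          # states per DFA, keyword-ID width
pointers, pointer_width = 2, 6    # 2 next-state pointers of 6 bits each
total_mb = n_dfa * states * (kid_bits + pointers * pointer_width) / 8 / 1e6
print(f"{total_mb:.2f} MB")  # 1.97 MB
```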

13
Bit-Byte-AC DFA Techniques
  • Still keeping the width of the keyword-ID as low as
    in Bit-DFA.
  • Still keeping the next state pointers as few as in
    Bit-DFA.
  • Reducing states per DFA by
  • Skipping bytes
  • Exploiting more shared states than Bit-DFA
  • Results of reducing states per DFA
  • from 27,000 to 36 states
  • The width of next state pointer reduces from 15
    to 6 bits.

14
Construction of Bit-Byte AC DFA
4 bytes (considered) at a time; the example DFA tracks
bit 3 of byte 0.
[Slides 14-25 step through the construction one
transition at a time; the figures are not included in
this transcript. Failure edges are not shown.]
25
Construction of Bit-Byte AC DFA
32 bit-byte DFA need to be constructed.
26
Bit-Byte-DFA Searching
[Slides 26-30 animate the search walking the bit-byte
DFA; a failure edge is shown as necessary. The figures
are not included in this transcript.]
31
Find the optimal settings to minimize memory
  • When k keywords per subset
  • The width of keyword-ID k bits
  • k 1, 2, 3, , K
  • when K the number of keywords in the whole set.
  • Snort (Dec.2005) K 2733 keywords
  • b bit(s) extracted for each byte
  • b 1, 2, 4, 8
  • of next state pointers 2b
  • The example 2 b 1
  • Beyond b gt 8
  • gt 256 next state pointers
  • B Bytes considered at a time
  • B 1, 2, 3,
  • The example 2 B 4
  • Total Memory (T) is a function of k, b, and B.
  • T f(k, b, B)

32
T's Formula
T(k, b, B) = Σ over all ⌈K/k⌉ subsets, Σ over the 8B/b
DFA in each subset, of S × (k + 2^b × P),
when S = the number of states of a DFA,
and P = the width of its next state pointers (bits).
T = total memory of all bit-ACs in all subsets.
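Since the real per-DFA state counts come from compiling the rule set, a runnable sketch of the memory model has to assume a uniform worst-case matrix per DFA; `total_memory_bits` and `states_per_dfa` are illustrative names, not the paper's notation:

```python
import math

def total_memory_bits(k, b, B, K, states_per_dfa):
    """Worst-case memory model T = f(k, b, B): every DFA is allotted the
    same matrix size. states_per_dfa stands in for the per-DFA state
    counts that a real rule-set compilation would produce."""
    subsets = math.ceil(K / k)          # ceil(K/k) keyword subsets
    dfas_per_subset = 8 * B // b        # 8B/b bit-byte DFA per subset
    pointer_width = math.ceil(math.log2(states_per_dfa))
    bits_per_state = k + (2 ** b) * pointer_width
    return subsets * dfas_per_subset * states_per_dfa * bits_per_state

# Reproduces the earlier totals: slide 7 (Bit-AC: k=16, b=1, B=1,
# 361 states) and slide 12 (Bit-Byte-AC: k=3, b=1, B=4, 36 states).
bit_ac_mb = total_memory_bits(16, 1, 1, 2733, 361) / 8 / 1e6      # ~2.1 MB
bit_byte_mb = total_memory_bits(3, 1, 4, 2733, 36) / 8 / 1e6      # ~1.97 MB
```

Sweeping k, b, and B through this function is how one would search for the optimal setting the next slides report.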
33
Find the optimal k
  • Each pair of (b, B) has one optimal k for a
    minimal T.

[Chart: total memory T vs. keywords per subset (k)]
34
Find the optimal b
  • Each setting of k, b, and B has a different optimal
    point.
  • Choosing only the optimal settings to compare.
  • b = 2 is the best.
35
Find the optimal B
  • b = 2
  • T decreases as B increases,
  • non-linearly.
  • Beyond B = 16,
  • T begins to increase.
  • B = 16 is the best for Snort (Dec 05).
36
Comparing with Existing Works
  • Tan-Sherwood's, Brodie-Cytron-Taylor's, and ours
  • Our Bit-Byte DFA when B = 16
  • The optimal point at b = 2 and k = 12
  • 272 KB
  • 14% of 2,001 KB (Tan's)
  • 4% of 6,064 KB (Brodie's)
37
Comparing with Existing Works
  • Tan-Sherwood's and ours, at B = 1
  • Tan's (on ASIC)
  • 2,001 KB
  • k = 16 is not the optimal setting for B = 1.
  • Each bit-DFA uses the same storage capacity, sized
    to fit the largest one (the worst case).
  • Ours (on NP)
  • 396 KB < 2,001 KB
  • k = 3 is the optimal setting for B = 1.
  • Each bit-DFA uses exactly the memory space needed
    to hold it.
38
Results with an NP Simulator
  • NePSim2
  • An open-source IXP24xx/28xx simulator
  • NP architecture based on IXP2855
  • 16 MicroEngines (MEs)
  • 512 KB
  • 1.4 GHz
  • Bit-Byte AC DFA: b = 2, B = 16, k = 12
  • T = 272 KB
  • 5 Gbps throughput
39
Conclusion
  • The Bit-Byte DFA model can reduce memory usage by
    up to 86%.
  • Implementing on an NP uses on-chip memory more
    efficiently, without wasting space, compared to an
    ASIC.
  • The NP has the flexibility to accommodate
  • The optimal setting of k, b, and B.
  • Different sizes of Bit-Byte DFA.
  • New rule sets in the future.
  • The optimal setting may change.
  • The performance (measured with an NP simulator)
    satisfies line speed up to 5 Gbps throughput.
40
Thank you
Questions? Piti_Piyachon@student.uml.edu,
Yan_Luo@uml.edu