A High Throughput String Matching Architecture for Intrusion Detection and Prevention

1
A High Throughput String Matching
Architecture for Intrusion Detection and
Prevention
Lin Tan, U of Illinois, Urbana-Champaign
Tim Sherwood, UC Santa Barbara
2
Outline
  • Why String Matching
  • Matching against multiple strings
  • The Aho-Corasick Algorithm
  • The Devil in the Constants
  • A Bit-Split Algorithm
  • Hardware Design and Analysis
  • Conclusions

3
To Protect and Serve
  • Our machines are constantly under attack
  • We cannot rely on end users; we need networks that
    actively defend themselves.

IDS/IPS are promising ways of providing
protection. Market for such systems: $918.9
million by the end of 2007. Snort is a widely
accepted open source IDS.
This requires the protection system to be able to
operate at 10 to 40 Gb/s. (We aim at current and
next generation networks.)
4
Our Contributions
  • String Matching Architecture
  • 0.4MB and 10Gbps for Snort rule set (>10,000
    characters)
  • Bit-Split String Matching Algorithm
  • Reduces out edges from 256 to 2.
  • Performance/area beats the best techniques we
    examined by a factor of 10 or more.

5
Scanning for Intrusions
[Figure: Traffic In, Software IDS (Scan), Traffic Out.]
Example rule (CodeRed worm): web flow established,
uricontent with /root.exe
Most IDS define a set of rules.
A string defines a suspicious transmission.
We are not building a full IDS; rather, we are building
the primitives from which full systems can be
built.
6
Multiple String Matching
  • The multiple string matching algorithm
  • Input: a set of strings/patterns S, and a buffer
    b
  • Output: every occurrence of an element of S in b
  • Extra constraint: b is really a stream
  • How to implement?
  • Option 1) search for each string independently
  • Option 2) combine strings together and search all
    at once

A string can be anywhere in the payload of a
packet.
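
Option 1 can be sketched in a few lines of Python. This is only an illustrative baseline, not code from the talk; the function name `find_all_naive` and the sample buffer are made up here.

```python
def find_all_naive(patterns, buffer):
    """Option 1: scan the buffer once per pattern.

    Returns a sorted list of (start_index, pattern) pairs.
    """
    hits = []
    for p in patterns:
        start = buffer.find(p)
        while start != -1:
            hits.append((start, p))
            # Advance by one character so overlapping occurrences are found.
            start = buffer.find(p, start + 1)
    return sorted(hits)

matches = find_all_naive(["he", "she", "his", "hers"], "ushers")
print(matches)  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The buffer is rescanned for every pattern, so the work grows with the size of the rule set; that cost is what Option 2 tries to avoid.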
7
Why hardware?
  • Snort: >1,000 rules, growing at 1 rule/day or
    more
  • Active research into automated rule building
  • Strings are not limited to just a-z
  • We need a high speed string matching technique
    with stringent worst case performance.
  • Many algorithms are targeted for average case
    performance. Aho-Corasick can scan once and
    output all matches. But it is too big to be
    on-chip.

9
The Aho-Corasick Algorithm
  • Given a finite set P of patterns, build a
    deterministic finite automaton G accepting the
    set of all patterns in P.

10
An AC Automaton Example
  • Example: P = {he, she, his, hers}
  • Construction: linear time
  • Search for all patterns in P: linear time

(Edges pointing back to State 0 are not shown).
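
A minimal Python sketch of the construction and search for this example follows. It uses the classic goto/failure/output formulation of Aho-Corasick rather than anything taken from the slides; the names `build_ac` and `search` are illustrative, and states are numbered in the order they are created.

```python
from collections import deque

def build_ac(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: failure links of shallower states are set first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending here
            queue.append(t)
    return goto, fail, out

def search(text, goto, fail, out):
    """Scan the text once and report (start_index, pattern) for every match."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return hits

tables = build_ac(["he", "she", "his", "hers"])
hits = search("hxhers", *tables)
print(len(tables[0]), hits)  # 10 [(2, 'he'), (2, 'hers')]
```

The automaton for this pattern set has 10 states (0 through 9), and each input character is consumed exactly once during the search.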
11
Linear Time: So What's the Problem?
  • How to implement it on chip?

Each state stores 256 next state pointers of <14> bits each.
  • Problem: size too big to be on-chip
  • 10,000 nodes
  • 256 out edges per node
  • Requires 16,384 × 256 × 14 bits, on the order of 10MB
  • Solution: partition into small state machines
  • Fewer strings per machine
  • Fewer out edges per machine
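
The size estimate can be checked with a few lines of arithmetic. The sketch below assumes the three factors on the slide are a state count of 16,384 (the number of states addressable with a 14-bit pointer), 256 out edges per state, and 14 bits per pointer; the variable names are illustrative.

```python
states = 2 ** 14          # 16,384 states addressable with a 14-bit pointer
edges_per_state = 256     # one next-state pointer per possible input byte
pointer_bits = 14

total_bits = states * edges_per_state * pointer_bits
total_mib = total_bits / 8 / 2 ** 20

print(total_bits, round(total_mib, 1))  # 58720256 7.0
```

Under these assumptions the pointer storage alone comes to about 7 MiB, the same order of magnitude as the 10MB quoted on the slide.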

13
Our Main Idea: Bit-Split
  • Partition rules (P) into smaller sets (P0 to Pn)
  • Build an AC state machine for each subset
  • For each DFA Pi, rip the state machine apart into 8
    tiny state machines (Bi0 through Bi7)
  • Each of which searches for 1 bit in the 8-bit
    encoding of an input character
  • Only if all the different B machines agree can
    there actually be a match
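
One way to read the splitting step is as a subset construction over a single bit position, sketched below in Python for P0 = {he, she, his, hers}. This is an interpretation under stated assumptions, not the authors' code: `build_dfa` and `bit_split` are illustrative names, AC states are numbered in creation order, and bit 0 is taken to be the most significant bit (a convention chosen because it reproduces the B03 state sets shown in the deck).

```python
from collections import deque

def build_dfa(patterns):
    """Aho-Corasick as a full DFA: delta[state][byte] is the next state."""
    goto, out = [{}], [set()]
    for idx, pattern in enumerate(patterns):
        s = 0
        for b in pattern.encode("ascii"):
            if b not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(idx)
    delta = [[0] * 256 for _ in goto]
    fail = [0] * len(goto)
    queue = deque()
    for b, t in goto[0].items():
        delta[0][b] = t
        queue.append(t)
    while queue:  # breadth-first, so fail[s] is complete before s is visited
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for b in range(256):
            if b in goto[s]:
                t = goto[s][b]
                fail[t] = delta[fail[s]][b]
                delta[s][b] = t
                queue.append(t)
            else:
                delta[s][b] = delta[fail[s]][b]
    return delta, out

def bit_split(delta, bit):
    """Build the machine that sees only one bit of each input byte.

    Each new state is the set of AC states reachable so far.
    Returns (ids, rows): ids maps a state set to its number,
    rows[n] = [next state on bit 0, next state on bit 1].
    """
    mask = 0x80 >> bit
    by_value = [[c for c in range(256) if bool(c & mask) == bool(v)]
                for v in (0, 1)]
    start = frozenset({0})
    ids, rows, work = {start: 0}, [], deque([start])
    while work:
        cur = work.popleft()
        row = []
        for chars in by_value:
            nxt = frozenset(delta[s][c] for s in cur for c in chars)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            row.append(ids[nxt])
        rows.append(row)
    return ids, rows

delta, out = build_dfa(["he", "she", "his", "hers"])
ids, rows = bit_split(delta, 3)
print(len(delta), len(ids))  # 10 8
```

For bit 3 this yields 8 states from the 10-state AC machine, and each state has only 2 out edges instead of 256.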

14
Binary Encoding
P0 = {he, she, his, hers}
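
As a small illustration, the 8-bit ASCII encodings of the characters that occur in P0 can be printed with Python. The dictionary name `encoding` is illustrative.

```python
# 8-bit ASCII encoding of every character used by he, she, his, hers.
encoding = {ch: format(ord(ch), "08b") for ch in "hesir"}

for ch, bits in encoding.items():
    print(ch, bits)
# h 01101000
# e 01100101
# s 01110011
# i 01101001
# r 01110010
```

Each bit-split machine looks at one column of this table.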
15
An example of Bit-Split
P0 = {he, she, his, hers}
[Figure: the AC state machine P0 next to the bit-split state machine B03. Each state of B03 stands for a set of P0 states:]
b0 = {0}
b1 = {0,1}
b2 = {0,3}
b3 = {0,1,2,6}
b4 = {0,1,4}
b5 = {0,3,7,8}
b6 = {0,1,2,5,6}
b7 = {0,3,9}
(Edges pointing back to State 0 are not shown).
16
Compact State Set
P0 = {he, she, his, hers}
[Figure: P0 and B03 again, with each B03 state labeled by its compact state set:]
b0 = {}
b1 = {}
b2 = {}
b3 = {2}
b4 = {}
b5 = {7}
b6 = {2,5}
b7 = {9}
(Edges pointing back to State 0 are not shown).
17
An example of Bit-Split
P0 = {he, she, his, hers}
[Figure: the state machine P0 with the bit-split state machines B03 and B04.]
(Edges pointing back to State 0 are not shown).
18
Nice Properties
  • The number of states in Bij is rigorously
    bounded by the number of states in Pi
  • No exponential blow-up in the number of states
  • Linear construction time
  • Possible to traverse multiple edges at a time to
    multiply throughput

19
Matching on the example
Input stream: h x h e r s
Only scan the input stream once.
20
Matching on the example
[Figure: P0, B03 and B04 processing the input stream.]
Input:           h x h e
Bit 3 (to B03):  0 1 0 0
Bit 4 (to B04):  1 1 1 0
How do you combine the results from the
different state machines? Only if all the state
machines agree is there actually a match.
21
How to Implement
  • The AC state machine is equivalent to the 8 tiny
    state machines.
  • The 8 tiny state machines can run independently,
    which means in parallel
  • Intersection done with bit-wise AND.
  • Splitting into 8 machines is intuitive but not optimal
  • How to build a system to implement this
    algorithm?
  • Our algorithm makes an on-chip implementation feasible
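
The intersection step can be sketched as a bitwise AND over the partial match vectors (PMVs) produced by the individual machines. The four input vectors below are the cycle-3 values from the deck's worked example with four 2-bit tiles; the function names, and the assumption that the leftmost PMV bit stands for the first pattern, are illustrative.

```python
from functools import reduce
from operator import and_

PATTERNS = ["he", "she", "his", "hers"]  # assumed: leftmost PMV bit = first pattern

def full_match_vector(partial_vectors):
    """A pattern matches only if every machine sets its bit, so AND them all."""
    return reduce(and_, partial_vectors)

def decode(vector, patterns):
    """List the patterns whose bit is set in the vector."""
    top = len(patterns) - 1
    return [p for i, p in enumerate(patterns) if (vector >> (top - i)) & 1]

partial = [0b1111, 0b1100, 0b1000, 0b1000]
fmv = full_match_vector(partial)
matched = decode(fmv, PATTERNS)
print(format(fmv, "04b"), matched)  # 1000 ['he']
```

The same function works for any number of machines, whether the split is into 8 one-bit machines or 4 two-bit machines.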

22
A Hardware Implementation
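[Figure: String Match Engine. The engine is built from rule modules (Rule Module 0 shown). A rule module contains four State Machine Tiles (Tile 0 to Tile 3) and a Control Block. Each byte from the payload is split into four 2-bit inputs (bits 0-1, 2-3, 4-5 and 6-7), 2 bits from each byte per tile. A tile entry holds 4 next state pointers of <8> bits each and a <16>-bit partial match vector; the current state is <8> bits. The partial match vectors of the four tiles are combined into the full match vector, and the full match vectors give the complete set of matches for all rules. Other labels: Input, Output Latch, Config Data.]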
  • A rule module is equivalent to an AC state
    machine
  • All rule modules are structurally equivalent, and so
    are all tiles
  • All full match vectors are concatenated to
    indicate which strings are matched
  • One tile stores one tiny bit-split state machine

23
An Efficient Implementation
Four tiles (Tile 0 to Tile 3) each read one 2-bit group of the input byte.
PMV = partial match vector; its bits, left to right, stand for he, she, his, hers.

Input (one byte per cycle, as four 2-bit groups)
Cycle 3: e = 01 10 01 01
Cycle 2: h = 01 10 10 00
Cycle 1: x = 01 11 10 00
Cycle 0: h = 01 10 10 00

Tile reading bits 0-1
State  00  01  10  11  PMV
0      0   1   0   0   0000
1      0   2   0   0   0000
2      0   3   0   0   1000
3      0   4   0   0   1110
4      0   4   0   0   1111

Tile reading bits 2-3
State  00  01  10  11  PMV
0      0   0   1   2   0000
1      0   0   3   2   0000
2      0   0   4   2   0000
3      0   0   3   5   1000
4      0   0   6   2   0000
5      0   0   4   7   0010
6      0   0   3   5   1100
7      0   0   4   2   0001

Tile reading bits 4-5
State  00  01  10  11  PMV
0      1   0   2   0   0000
1      1   0   3   0   0000
2      1   4   5   0   0000
3      1   6   5   0   0000
4      7   0   2   0   1000
5      8   4   5   0   0000
6      7   0   2   0   1100
7      9   0   3   0   0000
8      1   0   3   0   0010
9      1   0   3   0   0001

Tile reading bits 6-7
State  00  01  10  11  PMV
0      1   0   0   2   0000
1      1   3   0   2   0000
2      4   0   0   2   0000
3      1   0   5   6   1000
4      1   7   0   2   0000
5      1   0   0   8   0000
6      4   0   0   2   0010
7      1   0   5   6   1100
8      4   0   0   2   0001

PMV output of each tile and the resulting full match vector P
Cycle  Input  bits 0-1  bits 2-3  bits 4-5  bits 6-7  P
0      h      0000      0000      0000      0000      0000
1      x      1000      0000      0000      0000      0000
2      h      1110      0000      0000      0000      0000
3      e      1111      1100      1000      1000      1000
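
The trace above can be reproduced with a short simulation. The Python sketch below does not copy the slide's tables; it derives one state table per 2-bit group from an Aho-Corasick DFA by subset construction and then steps all four tables in parallel. All names (`build_dfa`, `build_tile`, `run`) are illustrative, tiles are indexed by the bit pair they read (0 = the two most significant bits), and the leftmost PMV bit is assumed to stand for the first pattern.

```python
from collections import deque

PATTERNS = ["he", "she", "his", "hers"]

def build_dfa(patterns):
    """Aho-Corasick as a full DFA: delta[state][byte] is the next state,
    out[state] is the set of pattern indices matched on entering it."""
    goto, out = [{}], [set()]
    for idx, pattern in enumerate(patterns):
        s = 0
        for b in pattern.encode("ascii"):
            if b not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(idx)
    delta = [[0] * 256 for _ in goto]
    fail = [0] * len(goto)
    queue = deque()
    for b, t in goto[0].items():
        delta[0][b] = t
        queue.append(t)
    while queue:  # breadth-first, so fail[s] is complete before s is visited
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for b in range(256):
            if b in goto[s]:
                t = goto[s][b]
                fail[t] = delta[fail[s]][b]
                delta[s][b] = t
                queue.append(t)
            else:
                delta[s][b] = delta[fail[s]][b]
    return delta, out

def build_tile(delta, out, pair, n_patterns):
    """Subset construction for the tile that reads 2-bit group `pair`.
    Returns (rows, pmvs): rows[state][2-bit value] and the PMV per state."""
    shift = 6 - 2 * pair
    groups = [[c for c in range(256) if ((c >> shift) & 3) == v]
              for v in range(4)]
    start = frozenset({0})
    ids, rows, pmvs, work = {start: 0}, [], [], deque([start])
    while work:
        cur = work.popleft()
        row = []
        for chars in groups:
            nxt = frozenset(delta[s][c] for s in cur for c in chars)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            row.append(ids[nxt])
        rows.append(row)
        matched = set().union(*(out[s] for s in cur))
        pmvs.append(sum(1 << (n_patterns - 1 - i) for i in matched))
    return rows, pmvs

def run(tiles, text, n_patterns):
    """Step every tile on each byte and AND the PMVs into a full match vector."""
    states = [0] * len(tiles)
    full = []
    for byte in text.encode("ascii"):
        vec = (1 << n_patterns) - 1
        for k, (rows, pmvs) in enumerate(tiles):
            states[k] = rows[states[k]][(byte >> (6 - 2 * k)) & 3]
            vec &= pmvs[states[k]]
        full.append(f"{vec:0{n_patterns}b}")
    return full

delta, out = build_dfa(PATTERNS)
tiles = [build_tile(delta, out, k, len(PATTERNS)) for k in range(4)]
trace = run(tiles, "hxhe", len(PATTERNS))
print(trace)  # ['0000', '0000', '0000', '1000']
```

The full match vector becomes 1000 in cycle 3, reporting "he", and stays 0000 in the earlier cycles even though individual tiles already raise partial matches.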
24
Performance of Hardware
Key Metric: Throughput × Characters / Area
25
Related Work
  • Software based
  • Good for 100Mb/s, common case
  • FPGA-based
  • Many schemes map rules down to a specialized
    circuit
  • Near optimal utilization of hardware resources
  • Implementing state machines on block-RAMs (Cho
    and Mangione-Smith)
  • Concurrent to our work: mapping state machines to
    on-chip SRAM (Aldwairi et al.)
  • Bloom filters (Dharmapurikar et al.)
  • Excellent filter in the common case
  • TCAM-based
  • Require all patterns to be shorter than or equal to
    the TCAM width
  • Cutting long patterns: 2Gbps with 295KB TCAM (Yu
    et al.)

26
Conclusions
  • New Tile-based Architecture
  • 0.4MB and 10Gbps for Snort rule set (>10,000
    characters)
  • Possible to be used for other applications, e.g.
    IP lookups, packet classification.
  • New Bit-split Algorithm
  • General purpose enough for many other
    applications, e.g. spam detection, peephole
    optimization, IP lookups, packet classification,
    etc.
  • Feasible to implement on other tile-based
    architectures.

27
Thank you! Questions?
28
  • Backup Slides
