High-throughput Linked-Pattern Matching for Intrusion Detection System

1
High-throughput Linked-Pattern Matching for
Intrusion Detection System
  • Zachary Baker
  • and
  • Viktor K. Prasanna
  • University of Southern California
  • http://ceng.usc.edu/prasanna

2
Outline
  • Introduction to Intrusion Detection and hardware
    pattern matching
  • Performance-centered design flow
  • Area and performance over large rule databases
    are more important
  • Methodology
  • Library of architectural options
  • Separate pre-decoded pipelines
  • Basic architecture results
  • Customized performance
  • Partitioning
  • Prefix trees
  • Correlated Content
  • Tool details
  • Reducing PAR cost through incremental synthesis
  • Handling multiple streams through re-sending
  • Efficiency of re-sending strategy
  • Conclusion

3
IDS: Intrusion Detection Systems
  • All incoming packets are filtered for specific
    characteristics or content
  • Current databases have thousands of patterns
    requiring string matching
  • FPGA allows fine-grained parallelism and
    computational reuse
  • 10 Gb/s and higher rates desired
  • This is a fairly artificial bound; header
    processing can reduce the overall string matching
    burden
  • Provided by pipelined, streaming architectures

4
String Matching
  • Throughput of units must equal maximum buffered
    traffic on network
  • Current Strategies
  • Naïve approach
  • Hashing à la Bloom filters
  • KMP, Boyer-Moore, Aho-Corasick (especially
    bit-splitting, Eatherton trie)
  • Hardwired shift-and-compare
  • Very fast and simple units
  • Allows variety of interesting meta-layer work to
    be tacked on

5
Performance-Centered Design Flow
  • System-centric view
  • Several thousand pattern matching rules
  • Unit view of matching units can be inefficient
  • Unnecessary work and hardware
  • Methodology works towards time and area
    performance metrics
  • System level design
  • Reuse of hardware elements
  • Option to exchange some area efficiency for
    bandwidth
  • Allows for patterns to be combined together to
    extract deeper information

6
Tool Flow

7
Library of Design Options
  • Basic Architecture
  • Shared decoding
  • Characters are decoded into one-bit pipelines and
    distributed to units as needed
  • Correlated Content Linkages
  • Reduces false-positives
  • Reduces burden on external controller
  • Customized performance
  • Prefix trees
  • Take advantage of partition-generated similarity
    by creating shared prefixes
  • High throughput architectures
  • Take advantage of low area requirements to
    replicate hardware

8
Basic Architecture: Partitioned Pipelines

9
Basic Architecture: Pre-Processing
  • System-level partitioning of patterns
  • Reduction in pipeline burden through min-cut
    partitioning
  • Shared characters are grouped into independent
    pipelines, increasing single-chip throughput and
    allowing for effective multi-chip partitioning
  • Tool first generates graph representation of
    pattern set, then executes partitioning routine.
    The partitioned graphs are then translated into
    architecture description
  • Partitioning is also useful in reducing P&R time
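
The graph-then-partition step can be illustrated with a small sketch. The slides do not define the graph exactly, so this assumes one node per pattern and an edge weighted by the number of distinct characters two patterns share. The partitioner itself (Metis in the tool) is not reproduced here; the sketch only evaluates the cut weight of a given assignment. The names `build_pattern_graph` and `cut_weight` are hypothetical.

```python
from itertools import combinations


def build_pattern_graph(patterns):
    """Map each pattern pair (i, j) to the number of distinct characters they share."""
    graph = {}
    for (i, a), (j, b) in combinations(enumerate(patterns), 2):
        shared = len(set(a) & set(b))
        if shared:
            graph[(i, j)] = shared
    return graph


def cut_weight(graph, assignment):
    """Sum the weights of edges whose endpoints land in different partitions."""
    return sum(w for (i, j), w in graph.items() if assignment[i] != assignment[j])
```

A min-cut partitioner would prefer the assignment with the lower cut weight, since every cut edge represents characters that must be decoded in more than one pipeline.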

10
Results - Basic Architecture
  • Partitioning results in a 20% increase in clock
    frequency; the optimal number of partitions is
    unpredictable

11
Flow Options: Area-Efficient Tree Architecture
  • 4-byte prefixes turn out to be very appropriate
    for intrusion detection
  • /cgi-bin/bigconf.cgi, /cgi-bin/common/listrec.pl,
    /cgi-sys/addalink.cgi, /cgi-sys/entropysearch.cgi
  • Script searches for 4-byte prefixes and
    sub-prefixes then generates prefix matchers in
    hardware description.
  • Essentially hardware Aho-Corasick tree
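
As a rough illustration of the prefix search, the sketch below groups patterns by their first four bytes and counts how many character comparators a shared prefix matcher would save. It handles a single prefix level only (no sub-prefixes) and is written in Python rather than the tool's Perl. The names `group_by_prefix` and `characters_saved` are hypothetical.

```python
from collections import defaultdict


def group_by_prefix(patterns, prefix_len=4):
    """Group patterns that share the same leading prefix_len characters."""
    groups = defaultdict(list)
    for p in patterns:
        groups[p[:prefix_len]].append(p)
    return dict(groups)


def characters_saved(groups):
    """Count comparators saved when each shared prefix is matched only once."""
    return sum((len(members) - 1) * len(prefix) for prefix, members in groups.items())
```

With the four example paths above, all patterns fall into the single `/cgi` group, so the prefix is matched once instead of four times.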

Average of 15% decrease in area and 5% decrease in
clock period over plain unary

12
Correlated Content Layer
  • Link together pattern matchers
  • Form state machines from low-level comparators
  • form higher-level ideas
  • basic regular expressions are available
  • alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS
    $HTTP_PORTS (msg:"attackpattern";
    content:"attack"; content:"pattern"; within:5;)

13
Correlated Content Layer
  • Benefits
  • reduces burden on external controller
  • reduces number of inputs to priority encoder
  • basic regular expression functionality
  • AND, OR, !, within, distance, character classes
  • Disadvantages
  • Adds state that has to be maintained when streams
    switch
  • But only the counters that are active

14
Tool Details
  • Implemented in Perl
  • Text-oriented language for batch processing of
    text (pattern databases) and generation of VHDL
    outputs
  • Utilizes the Metis partitioning library (U/Minn)
  • Template-based generation of architecture
    descriptions
  • Graph Creation, Partitioning
  • Run time: 30 seconds for < 2000 patterns
  • Insignificant time costs compared to improvements
    in performance
  • Place and Route processing times dwarf
    architecture generation costs
  • Problem with all hardwired shift-and-compare
    architectures

15
Small Changes to Rule-set
  • In normal flow, changes to a single character
    would result in recompilation of the entire
    design
  • Wasteful and a lengthy process
  • In general, routing tools do not handle small
    changes well
  • Reduced frequency performance
  • Interaction of interconnect and mapping is highly
    connected to performance
  • However, if blocks of architecture can be
    physically separated on the device, interaction
    is eliminated
  • Creates a smaller place and route problem
  • Small changes can be integrated without full
    recompilation

16
Increasing Speed: Place & Route
  • Key: Predefinition of area constraints for
    each partition
  • ideally the partitions are balanced
  • Underutilization of device blocks makes meeting
    timing constraints easier

17
Increasing Speed: Place & Route
  • Definitions
  • The optimal partition is selected from the set
    of partitions P
  • Sp is the set of characters required to
    represent the new pattern p
  • Sp \ Pi is the set difference: the characters
    present in Sp that are not yet represented in
    Pi
  • The partition which will require the addition of
    the minimum number of new characters is the
    optimal partition Pj
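
The selection rule defined above amounts to an argmin over the partitions. A minimal sketch, assuming each partition is represented simply as the set of characters it already decodes; the name `best_partition` is hypothetical:

```python
def best_partition(partitions, pattern):
    """Return the index j of the partition needing the fewest new characters.

    partitions: list of sets, each holding the characters already
    represented in that partition (Pi).
    pattern: the new pattern p; its character set is Sp.
    """
    s_p = set(pattern)
    # Cost of partition i is the size of the set difference Sp minus Pi.
    costs = [len(s_p - p_i) for p_i in partitions]
    return min(range(len(partitions)), key=costs.__getitem__)
```

Ties go to the lowest index here; a real tool might instead break ties by partition size to keep the partitions balanced.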

18
Incremental Synthesis
  • Goal: Reduce place and route costs
  • Using ISE 6.2 Incremental Synthesis support, each
    partition has independent area constraints on
    device
  • A change or addition in one partition does not
    affect the placement of other partitions
  • Cost for changing rules in one of k partitions:
  • 1/k of the full cost, plus guide file processing
    overhead

19
System Packet Flow
  • Packets are reordered and packet contents are
    sent as stream to string matching units

20
Suffix Resending
  • If an attack spans multiple packets, it will not
    be detected if the system looks at packets on a
    one-by-one basis
  • Packets must be condensed into a stream
  • If time multiplexing is required, some section of
    the previous session can be prepended to the new
    packets
  • Reserved section equal to the length of the
    longest attack
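
A minimal sketch of the resending idea, assuming the matcher is stateless between chunks and that each stream is identified by some key. The class name `StreamMultiplexer` and its `prepare` method are hypothetical.

```python
class StreamMultiplexer:
    """Keep the tail of each stream so it can be resent ahead of new data."""

    def __init__(self, reserve):
        # reserve: number of bytes kept per stream, e.g. the length of
        # the longest attack pattern.
        self.reserve = reserve
        self.tails = {}

    def prepare(self, stream_id, payload):
        """Return the bytes to send to the matcher: saved tail plus new payload."""
        chunk = self.tails.get(stream_id, b"") + payload
        self.tails[stream_id] = chunk[-self.reserve:] if self.reserve else b""
        return chunk
```

If stream A sends `b"xxatt"`, another stream is served, and A then sends `b"ackyy"`, the second chunk for A becomes `b"xxattackyy"`, so the pattern `attack` is seen even though it spans two packets. A pattern that lies entirely inside the resent tail would be reported twice, so duplicates may need filtering downstream.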

21
Suffix Resending
  • The necessity to resend packets causes some
    inefficiency in a multiple stream system
  • However, TCP and IP header overhead do not need
    to be handled by the string matching system,
    allowing us to make up the difference
  • Average Internet packet is 402 bytes long
  • Longest attack in our survey of Hogwash database
    is 257 bytes
  • TCP and IP headers equal 40 bytes
  • Thus, if 7 packets are issued to the string
    matching units at a time, the overheads are
    equalized (7 x 40 = 280 header bytes skipped
    versus 257 bytes resent) and efficiency is 100%
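
The arithmetic behind the 7-packet figure can be checked with a short sketch. It assumes efficiency is defined as wire bytes received divided by bytes the matcher must process, which is one reasonable reading of the slide rather than a definition given in it. The function names are hypothetical.

```python
import math


def min_batch(header_bytes, resend_bytes):
    """Smallest number of packets whose skipped headers cover the resent bytes."""
    return math.ceil(resend_bytes / header_bytes)


def efficiency(n, packet_bytes, header_bytes, resend_bytes):
    """Wire bytes for n packets divided by bytes sent to the matcher.

    The matcher sees only payloads plus one resent section per batch.
    """
    payload = n * (packet_bytes - header_bytes)
    return n * packet_bytes / (payload + resend_bytes)
```

With 40-byte headers and a 257-byte resent section, `min_batch(40, 257)` gives 7. At 7 packets of 402 bytes the matcher processes 2791 bytes for 2814 bytes on the wire, so it keeps up with line rate; at 6 packets it falls slightly short.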

22
Overview
  • Variations in tool flow provide customizable
    performance
  • Tool Options
  • Small partitioned and pre-decoded architecture
  • Prefix trees
  • Fast k-way architecture
  • Potential reduction in hardware reconfiguration
    time
  • Fast reconfiguration
  • KMP architecture (FPGA 04)
  • Meta-information can be extracted with
    correlation
  • Thanks!
  • For more information: http://ceng.usc.edu/prasanna