Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs by Zachary K. Baker and Viktor K. Prasanna

1
Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs
by Zachary K. Baker and Viktor K. Prasanna
University of Southern California, Los Angeles, CA, USA
FPL 2004
Review 12-9-04
  • Presented by
  • Jack Meier

2
Outline
  • Introduction
  • Related Work in Automated IDS Generation
  • Approach
  • Tree-based Prefix Sharing
  • Performance Results for Tool-Generated Designs
  • Conclusion

3
Background on network security and intrusion
detection
  • Performance
  • ability to match against a large set of patterns
  • Features
  • automatically optimize and synthesize large
    designs
  • Application for Intrusion detection software
  • Snort
  • Hogwash
  • Trends
  • Move away from general-purpose microprocessor
  • Admins remove rules from the databases when
    performance is limited
  • Move string matching to FPGA-based
    reconfigurable hardware

4
Related Work in Automated IDS Generation
  • Open-source Intrusion detection software
  • Snort
  • Hogwash
  • String matching is just one component of the many
    functions used in Snort and Hogwash

5
Related work in FPGA-based network scanning
  • First reassemble TCP stream (WashU Schuehler)
  • demultiplex the TCP/IP stream into substreams
  • Sort packets based on rules
  • (WashU Mike Attig)
  • Washington University research approach
  • Deterministic Finite Automata pattern matchers
    (WashU RegEx)
  • Spread the load over several parallel matching
    units
  • Four parallel units reduce the runtime by 4X
  • Limited number of regular expression patterns
    supported
  • Bloom Filters
  • Provides large amount of string matching
  • Pre-decoded shift-and-compare shift registers
  • Alternative approaches
  • This paper lacks references to much of the related
    work

6
Approach of this work
  • Partitions large rule sets into multiple
    pipelines
  • Minimizes FPGA memory area usage
  • Optimizes Throughput
  • String search (one part of IDS)
  • Characters tend to repeat across strings
  • Each string only contains a few dozen characters
  • pipelines created
  • 2-4 for 400 patterns
  • 8 for 600-1000 patterns
  • Rules
  • number of repeated characters within a partition
    is maximized
  • number of characters repeated between partitions
    is minimized
  • Tools
  • Process graphs with trie nodes
  • Generate synthesizable VHDL circuit description

7
Approach
8
Contribution of this work
  • Tool
  • accepts rule strings
  • creates pipelined distribution networks
  • converts template-generated Java to netlists
    w/JHDL
  • Reduces amount of routing required
  • Reduces complexity of finite automata state
    machines
  • Proposed Tool combines common prefixes to form
    matching trees
  • Adds pre-decoded wide parallel inputs

9
Approach
  • Basic idea - characters shared across patterns do
    not need to be redundantly compared
  • Architectures
  • a pre-decoded shift-and-compare
  • Data is "pre-decoded" into its character equivalent
  • Large array of AND gates asserts a one-hot coded
    output for each character
  • Gate delays provided through a shift-register
  • Appropriate decoded character is selected from
    each time-delayed shift-register stage
  • tree-based area optimization prevents repeating
    comparisons
  • Precomputing redundancy information
  • shared decoding
  • pushes all character-level comparisons to the
    beginning of the comparator pipelines
  • reduces the single character match operation to
    the inspection of a single bit.
  • E.g., the value h is represented with a single bit,
    one bit for each possible value
  • Enables pattern groups to be handled by an
    independent pipeline
  • E.g., acker
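
A minimal software sketch of the pre-decoded shift-and-compare idea above, assuming one input byte per cycle. The function names (`predecode`, `match_positions`) are illustrative and not from the paper; the real design is a hardware pipeline, so this only models the logic: each byte is decoded once into a one-hot value, delayed through shift-register stages, and a pattern match is the AND of one bit from each stage.

```python
from collections import deque

def predecode(byte: int) -> int:
    """One-hot decode: an integer with only bit number `byte` set."""
    return 1 << byte

def match_positions(data: bytes, pattern: bytes) -> list[int]:
    """Return the end indices at which `pattern` occurs in `data`.

    Each comparison inspects a single decoded bit, never a full byte.
    """
    n = len(pattern)
    # Shift register of the last n decoded characters, newest at index 0.
    stages = deque([0] * n, maxlen=n)
    hits = []
    for i, b in enumerate(data):
        stages.appendleft(predecode(b))
        # Pattern character j arrived n-1-j cycles ago, so it sits in stage n-1-j.
        if all((stages[n - 1 - j] >> pattern[j]) & 1 for j in range(n)):
            hits.append(i)
    return hits

# Example input is hypothetical.
hits = match_positions(b"xxhackerxx", b"acker")
```

Because the decode happens once per input byte, any number of patterns can tap the same decoded bits, which is what lets the character-level comparisons be shared.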

10
Patterns of text
  • Sets of strings start out in a common partition
  • Strings (vertices) with characters in common
    are connected (with an edge)
  • Partitions are established that have characters
    in common
  • i.e., fully connected groups are partitioned
    together
  • reducing the cut between the partitions decreases
    the number of pipeline registers

Objective - maximize the number of edges between
nodes within the group and minimize the number of
edges between nodes in different groups
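
The objective can be illustrated with a small sketch, assuming an unweighted graph in which two patterns are joined whenever they share at least one character. The helper names and the four sample patterns are hypothetical; the tool itself relies on a graph partitioner rather than this hand-picked grouping.

```python
from itertools import combinations

def build_edges(patterns):
    """Connect every pair of patterns that has at least one character in common."""
    return {
        (a, b)
        for a, b in combinations(range(len(patterns)), 2)
        if set(patterns[a]) & set(patterns[b])
    }

def cut_stats(edges, group_of):
    """Return (edges inside groups, edges crossing between groups)."""
    inside = sum(1 for a, b in edges if group_of[a] == group_of[b])
    return inside, len(edges) - inside

patterns = ["abc", "abd", "xyz", "xyw"]    # hypothetical rule strings
edges = build_edges(patterns)
good = cut_stats(edges, [0, 0, 1, 1])      # similar strings grouped together
bad = cut_stats(edges, [0, 1, 0, 1])       # similar strings split apart
```

The first grouping keeps both edges inside a group (cut of zero); the second cuts both edges, which in hardware would mean the same characters are piped through both pipelines.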
11
Formalized operation
Node - collection of patterns; the number of
characters piped through each pipeline
Pattern - each pattern is composed of letters
Edge - every node with a given letter is connected
by an edge to every other node with that letter;
the edge is added between any vertex-patterns that
have a common character
  • vertex V
  • graph R
  • pattern p
  • ruleset T
  • edge E
  • character class C
  • k, l edge instances

12
Tree-based Prefix Sharing
  • efficient search of matching rules
  • Boyer-Moore algorithm
  • Aho-Corasick algorithm
  • hashing mechanism utilizing the Bloom filter
  • pre-decoding strategy - converts characters to
    single bit lines in the cycle before they are
    required in the state machine
  • reduces the area of designs
  • allows more patterns to be packed in the FPGA
  • customized
  • for the 4-bit blocks of characters
  • four character prefixes map to four decoded bits
  • Fits Xilinx Virtex 4-input lookup table
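
A rough sketch of prefix sharing over four-character blocks, assuming a plain dictionary-based trie. The function names and the sample pattern list are illustrative only (the tool package uses a Perl trie module); the point is that patterns with a common leading block reuse one node instead of repeating the comparison.

```python
def blocks(pattern: str, size: int = 4) -> list[str]:
    """Split a pattern into consecutive blocks of `size` characters."""
    return [pattern[i:i + size] for i in range(0, len(pattern), size)]

def build_trie(patterns, size: int = 4) -> dict:
    """Build a trie keyed by character blocks; the key None marks a pattern end."""
    root = {}
    for p in patterns:
        node = root
        for blk in blocks(p, size):
            node = node.setdefault(blk, {})
        node[None] = p
    return root

def count_blocks(node: dict) -> int:
    """Count the block comparators needed once common prefixes are shared."""
    return sum(1 + count_blocks(child)
               for key, child in node.items() if key is not None)

patterns = ["/cgi-bin", "/cgi-win", "/scripts"]    # illustrative strings
trie = build_trie(patterns)
shared = count_blocks(trie)
unshared = sum(len(blocks(p)) for p in patterns)
```

Here `/cgi-bin` and `/cgi-win` share the `/cgi` block, so five block comparators are needed instead of six.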

13
Tree-based sharing Example operation
/scripts
/unsafe.cgi
/cgi-bin
-win
/cgi
/insecure.cgi
/unsafe.cgi
(Ignored characters)
14
Graph theory
  • Every node with a given letter is connected by an
    edge to every other node with that letter
  • Patterns are partitioned n ways to reduce
    pipeline register width and improve character
    usage within each pipeline
  • number of repeated characters within a partition
    is maximized
  • number of characters repeated between partitions
    is minimized
  • system is composed of n pipelines, each with a
    minimum number of bit lines
  • Uses min-cut partitioning from physical design
    automation to minimize the representation
  • Edge weighting reduces the problems associated
    with large rulesets
  • Uses edge weighting based on number of characters
    in the pattern
  • Patterns sharing character locality are more
    likely to be grouped together
  • Keeps similar prefixes together but doesn't force
    incompatible patterns into the same group
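
As a hedged illustration of why grouping matters for register width: if each pipeline carries one decoded bit line per distinct character used by its patterns, the width of each pipeline can be estimated as below. The edge weight shown (count of shared distinct characters) is an assumption made for this sketch, not necessarily the weighting the tool uses, and the patterns are made up.

```python
def bit_lines(group) -> int:
    """Distinct characters in a group, i.e. decoded bit lines its pipeline carries."""
    return len(set("".join(group)))

def pipeline_widths(patterns, group_of, n: int) -> list[int]:
    """Bit-line count of each of the n pipelines for a given assignment."""
    groups = [[] for _ in range(n)]
    for p, g in zip(patterns, group_of):
        groups[g].append(p)
    return [bit_lines(g) for g in groups]

def shared_weight(p: str, q: str) -> int:
    """Assumed edge weight: number of distinct characters two patterns share."""
    return len(set(p) & set(q))

patterns = ["/cgi-bin", "/cgi-win", "root", "toor"]    # hypothetical
grouped = pipeline_widths(patterns, [0, 0, 1, 1], 2)
mixed = pipeline_widths(patterns, [0, 1, 0, 1], 2)
single = bit_lines(patterns)
```

With similar patterns grouped, the two pipelines need 8 and 3 bit lines; mixing them needs 10 each, and a single unpartitioned pipeline needs 11.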

15
On-line Tools
  • Tools on-line at
  • http://halcyon.usc.edu/~zbaker/idstools
  • Tool Package
  • Partitioning tools - KMETIS toolset
  • the trie data structure module:
    http://search.cpan.org/~avif/Tree-Trie-0.4/Trie.pm
  • IDS database - Hogwash

16
Clock Period and Area Efficiency as a function of
# of Partitions and Size of Ruleset
Clock Period (has small effect for different #s
of Partitions)
Area is smallest with smaller # of Partitions
(3x range of area)
  • For large rulesets, the penalty to add more
    partitions is not so bad (1.5 x larger)
  • Clock period is relatively unaffected for all
    cases

17
Architecture Comparison
18
Comparisons to Related Work
(Unit Size / Performance 1000)
Pattern size, average unit size for a 16-character
pattern (in logic cells; one slice is two logic
cells), and performance (in Mb/s/cell)
(Where are Bloom filters on this chart?!?)
Whole list of 16-character patterns fits in one
filter!
This chart does not consider BlockRAMs!
19
Performance Results for Tool-Generated Designs
  • Performance
  • General improvement is approximately 30%
  • 602-pattern ruleset reduces the area by almost
    50% in some cases
  • unpartitioned experiments - increased area due to
    the tree architecture
  • large numbers of patterns sharing the same
    pipeline increase fanout
  • impossible to make fair comparisons without
    reimplementing all other designs
  • performance - throughput/area
  • small, fast designs perform best with Virtex II
    Pro XC2VP100
  • Area increases moderately as the number of
    partitions increases
  • Platform - Xilinx ML-300 contains a Virtex II Pro
    XC2VP7
  • VHDL file size - 300kB for the 361-pattern ruleset
  • 9,000 lines of __HDL code
  • 1200 slices
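
The throughput/area metric can be worked through as a small calculation. This is a sketch under stated assumptions: one 8-bit character is consumed per clock cycle, one slice counts as two logic cells (as noted on the comparison slide), and the 4 ns clock period is a made-up value used only to show the arithmetic; only the 1200-slice figure comes from the slide above.

```python
def throughput_mbps(clock_period_ns: float, bits_per_cycle: int = 8) -> float:
    """Throughput in Mb/s; bits per nanosecond equals Gb/s, so scale by 1000."""
    return bits_per_cycle / clock_period_ns * 1000.0

def performance(clock_period_ns: float, slices: int,
                bits_per_cycle: int = 8) -> float:
    """Throughput per unit area in Mb/s per logic cell (2 logic cells per slice)."""
    logic_cells = 2 * slices
    return throughput_mbps(clock_period_ns, bits_per_cycle) / logic_cells

# Hypothetical 4 ns clock period with the 1200-slice design size.
mbps = throughput_mbps(4.0)
perf = performance(4.0, 1200)
```

Since throughput depends only on the clock period and the input width, a design whose clock period barely changes with the number of partitions is ranked almost entirely by its area.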

20
Conclusion
  • Throughput (proportional to 1 / clock period) not
    greatly affected by # of partitions
  • Area increases with # of partitions
  • Performance of approach (speed/area)
  • Comparable to that achieved by UCLA RDL
  • About 2x better than GaTech and U. Crete
  • About 30x better than Los Alamos
  • Not compared to Bloom filters