Title: Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
1. Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
- Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese
- Department of Computer Science and Engineering, University of California, San Diego
- Department of Computer Science, University of California, Santa Barbara
2. Abstract
- IDSs: Intrusion Detection Systems
- Space- and time-efficient string matching algorithms
- Providing worst-case performance
- Amenable to H/W implementation
- Aho-Corasick
- Memory, performance
3. Introduction (i)
- Combating attacks at every level
- Automatically monitoring network traffic
- IDS uses a set of rules
- Apply to matching packets
- Edge and core routers
- Stringent worst-case performance bounds
- Tight constraints on memory
4. Introduction (ii)
- At the heart of IDSs is a string matching algorithm
- In Snort, string matching accounts for 70% of total execution time and 80% of instructions executed
- Contributions of this paper
- Characterization
- New algorithms
- Evaluation
5. String matching for intrusion detection
6. Quantifying the Use of String Matching (i)
- Snort: an intrusion detection system
- The rules are generated manually
- Extract relevant content strings from the payload and header of known attacks
- The action can include logging, alerting, ignoring, etc.
- Rules are usually added as new vulnerabilities are discovered
7. Quantifying the Use of String Matching (ii)
- Scalability of the intrusion detection system database
- Beneficial to avoid algorithms whose run-time is proportional to the size of the rule database
- New rules are being added to detect or combat new attacks
10. Quantifying the Use of String Matching (iii)
- Linearly searching through the set of rules is becoming increasingly infeasible
- The database is growing at a rate that is well within Moore's Law
- Need a technique whose run-time performance does not depend on the number of rules
11. State of the Art in String Matching (i)
- Single-pattern string matching
- Boyer-Moore, ...
- Multi-pattern string matching
- Aho-Corasick, Wu-Manber, ...
- Imprecise string matching
- Using hashing and signature-based techniques
- Candidate matches must be reverified using a precise string matching algorithm
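As a point of reference for the multi-pattern case, the following is a minimal Python sketch of a textbook Aho-Corasick matcher. It is not Snort's implementation nor the optimized structure proposed in the paper; the names `build_automaton` and `find_all` and the dictionary-based node layout are assumptions made for illustration.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: the failure link of a state is the longest proper
    # suffix of its path that is also a path starting at the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of every pattern."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

Calling `find_all("ushers", ["he", "she", "his", "hers"])` reports "she" at index 1 and both "he" and "hers" at index 2. The input is scanned once regardless of how many patterns are loaded.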
12. State of the Art in String Matching (ii)
- Bad character heuristics
- Easily exploitable by attackers
- Aho-Corasick
- Uses a data structure that is not optimized for space
- SFKSearch
- Worst-case performance is quite poor
- Wu-Manber
- Memory accesses to the shift and hash tables
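To illustrate why a bad-character heuristic can be exploited, here is a small Python sketch that counts character comparisons. It uses Horspool's simplification of Boyer-Moore (bad-character shifts only), which is a stand-in chosen for brevity and not necessarily the variant the slides refer to; `horspool_count` is a hypothetical helper name.

```python
def horspool_count(text, pattern):
    """Return (match_positions, comparisons) using only bad-character shifts."""
    m, n = len(pattern), len(text)
    # Shift for each character, based on its last occurrence in pattern[:-1].
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    positions, comparisons = [], 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:                      # compare right to left
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            positions.append(pos)
        # Characters absent from the table allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return positions, comparisons

benign = horspool_count("x" * 32, "baaa")   # mismatches at once, jumps by 4
crafted = horspool_count("a" * 32, "baaa")  # matches 3 chars, then shifts by 1
print(benign[1], crafted[1])  # prints: 8 116
```

Both inputs are 32 characters long and neither contains the pattern, yet the crafted input costs over fourteen times as many comparisons. A sender who can choose the payload can steer the matcher toward this case.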
14. Applying IP Lookup Techniques to String Matching (i)
- IP lookup: a set of patterns to match, finding the longest possible match for each of a set of IP addresses that are streaming by
- String matching: a set of strings to match, finding all of the places in the input stream where there is a match
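The difference between the two problems can be sketched with one trie and two queries. The Python below is an illustrative assumption (nested dictionaries, a `"label"` key, bit strings for addresses), not a structure taken from the paper. The `all_matches` function restarts the walk at every offset, which is the naive approach and not Aho-Corasick.

```python
def build_trie(entries):
    """entries: dict mapping a key string (bits or characters) to a label."""
    root = {}
    for key, label in entries.items():
        node = root
        for symbol in key:
            node = node.setdefault(symbol, {})
        node["label"] = label   # single symbols never collide with "label"
    return root

def longest_prefix_match(root, address_bits):
    """IP-lookup style query: one answer, the longest stored prefix."""
    node, best = root, root.get("label")
    for bit in address_bits:
        node = node.get(bit)
        if node is None:
            break
        best = node.get("label", best)
    return best

def all_matches(root, stream):
    """String-matching style query: every stored string at every offset."""
    found = []
    for start in range(len(stream)):
        node = root
        for offset in range(start, len(stream)):
            node = node.get(stream[offset])
            if node is None:
                break
            if "label" in node:
                found.append((start, node["label"]))
    return found

routes = build_trie({"10": "A", "1011": "B", "0": "C"})
print(longest_prefix_match(routes, "10110000"))  # prints: B

strings = build_trie({"he": "he", "she": "she", "hers": "hers"})
print(all_matches(strings, "ushers"))
# prints: [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The lookup query stops at one answer per address, while the matching query must report overlapping hits anywhere in the stream, which is why the trie compression ideas carry over but the traversal logic does not.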
15. Applying IP Lookup Techniques to String Matching (ii)
- Unibit and multibit tries
- Waste space on pointers
- Lulea algorithm
- Uses the concepts of leaf pushing and bitmaps to compress the database
- Eatherton algorithm
- Internal bitmap and external bitmap
16. Optimizations for string matching
17. Bitmap compression (i)
- With 32-bit pointers
- Each Aho-Corasick node has 256 next-state pointers
- Instead, use a single pointer to the first valid next state and maintain a 256-bit bitmap
- The next state is found by summing all the bits prior to that bit number and adding the sum to the base next-node pointer
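The bitmap lookup described above can be sketched in Python as follows. The class and function names are hypothetical, a Python integer stands in for the 256-bit bitmap, and "prior bits" is taken to mean bits with a lower index; a hardware or C implementation would use fixed-width words and a population-count instruction instead.

```python
class BitmapNode:
    """One node: a 256-bit bitmap plus the index of its first valid child."""
    def __init__(self, bitmap, first_child):
        self.bitmap = bitmap            # bit b is set if byte b has a next state
        self.first_child = first_child  # index into a shared array of nodes

def make_node(valid_bytes, first_child):
    bitmap = 0
    for b in valid_bytes:
        bitmap |= 1 << b
    return BitmapNode(bitmap, first_child)

def next_state(node, byte):
    """Return the child index for `byte`, or None if no transition exists."""
    if not (node.bitmap >> byte) & 1:
        return None
    prior = node.bitmap & ((1 << byte) - 1)      # keep only bits below `byte`
    return node.first_child + bin(prior).count("1")

node = make_node([ord("a"), ord("h"), ord("s")], first_child=10)
print(next_state(node, ord("h")))  # prints: 11
```

Children are assumed to be stored contiguously in byte order, so the count of set bits below the requested byte is exactly the child's offset from `first_child`.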
19. Bitmap compression (ii)
- Original optimized Aho-Corasick
- 1028 bytes per node
- Bitmapped version
- Only 44 bytes per node
- Incurs two costs
- Doubles the worst-case amount of work
- Requires summing up to 256 prior bits
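The two node sizes can be reproduced with simple arithmetic. The slides do not break down what each node contains, so the field layout below is an assumption chosen because it yields the stated 1028 and 44 bytes with 32-bit pointers.

```python
POINTER_BYTES = 4   # 32-bit pointers, as stated on the slides
ALPHABET = 256      # one entry per possible byte value

# Assumed layout: 256 next-state pointers plus one additional 32-bit field.
full_node = ALPHABET * POINTER_BYTES + POINTER_BYTES

# Assumed layout: a 256-bit bitmap (32 bytes) plus three 32-bit fields.
bitmap_bytes = ALPHABET // 8
bitmap_node = bitmap_bytes + 3 * POINTER_BYTES

print(full_node, bitmap_node, round(full_node / bitmap_node, 1))
# prints: 1028 44 23.4
```

Under these assumptions each bitmapped node is roughly 23 times smaller than an uncompressed one, before path compression is applied.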
20. Path Compression (i)
- The bitmap is largely wasted information at the bottom nodes
- Any path-compressed node must be equal in size to a bitmapped node
- Failure pointers must include an offset, since they can point into the middle of a path-compressed node
22. Path Compression (ii)
- With 32-bit pointers
- A single path-compressed node can contain data equivalent to 4 bitmap-compressed nodes
- In practice, achieves a 2.54:1 compression ratio
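A rough way to see where path compression helps is to count nodes before and after packing non-branching runs of states. The Python sketch below is a simplification under stated assumptions: it packs up to 4 single-child states into one node, keeps branching states and leaves as ordinary nodes, and does not model failure pointers or pattern-end markers inside a run. The function names are hypothetical.

```python
CAPACITY = 4  # states per path-compressed node, following the 4:1 figure above

def build_trie(patterns):
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
    return root

def count_nodes(node):
    """Number of states in the uncompressed trie."""
    return 1 + sum(count_nodes(child) for child in node.values())

def count_compressed(node):
    """Number of nodes after packing non-branching runs of states."""
    if len(node) != 1:
        # Branching state or leaf: stays an ordinary (bitmapped) node.
        return 1 + sum(count_compressed(child) for child in node.values())
    # Absorb up to CAPACITY consecutive single-child states into one node.
    run = 0
    while len(node) == 1 and run < CAPACITY:
        (node,) = node.values()
        run += 1
    return 1 + count_compressed(node)

chain = build_trie(["abcdefgh"])
print(count_nodes(chain), count_compressed(chain))    # prints: 9 3

bushy = build_trie(["he", "she", "his", "hers"])
print(count_nodes(bushy), count_compressed(bushy))    # prints: 10 8
```

Long unshared suffixes compress well, while heavily branching regions near the root barely change, which is consistent with the achieved ratio falling short of the 4:1 upper bound.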
24. Results
25. Intrusion Detection in Hardware
- The number of rules goes up by over a factor of 2.5, whereas the size of memory for our algorithm only goes up by 30%
- Focus our attention on the worst-case performance
27. Intrusion Detection in Software
- Examine both average-case and worst-case performance
- Wu-Manber is the fastest in the average case because of its hash function
30. Summary (i)
- Current software IDSs largely rely on common-case optimizations to gain speed
- Only Aho-Corasick has deterministic worst-case lookup times and is friendly enough to use for wire-speed H/W matching
31. Summary (ii)
- Contribution of this paper
- Apply bitmap node compression and path compression to Aho-Corasick
- Gain both compact storage and worst-case performance