Title: A Hybrid Finite Automaton for Practical Deep Packet Inspection
1 A Hybrid Finite Automaton for Practical Deep Packet Inspection
Michela Becchi and Patrick Crowley
2 Context
- Deep packet inspection
- Challenge: perform regular expression matching at line rate, given data-sets of hundreds (or thousands) of patterns
  - Processing time
  - Memory requirement
[Figure: a matching engine loaded with the RegEx set inspects incoming packets and separates safe packets from malicious packets]
3 Deterministic vs. Non-Deterministic FA
RegEx: (1) .abc (2) .bcd (3) .cde
[Figure: NFA (states 0-9, accepting states 3/1, 6/2 and 9/3) and the equivalent DFA, both traversed on the input text "d a b c d"]
4 Memory-time tradeoff
- NFA
  - limited size
  - potentially N_NFA states active in parallel
- DFA
  - one state traversal/char
  - size potentially 2^N states, where N = N_NFA
  - In practical cases a single DFA is infeasible!
- Idea
  - Hybrid automaton
    - Size comparable to NFA by preventing state explosion
    - Predictable and small memory bandwidth/processing time
  - Limit to classes of RegEx in Intrusion Detection Systems
  - Analyze state explosion scenarios
[Figure: NFA and DFA positioned on time and memory axes]
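The tradeoff above can be illustrated with a minimal Python sketch. It assumes the three example RegEx of slide 3 are read as the unanchored literals abc, bcd and cde over the alphabet a-e; the function names (`build_nfa`, `nfa_step`, `subset_construction`, `max_active_states`) are illustrative and do not come from the slides.

```python
from collections import deque

def build_nfa(patterns):
    """Build an NFA for unanchored literal patterns.

    State 0 is the start state and implicitly loops on every character.
    Returns (trans, accept): trans maps (state, char) -> set of next states,
    accept maps an accepting state -> 1-based pattern index.
    """
    trans, accept, next_id = {}, {}, 1
    for idx, pat in enumerate(patterns, start=1):
        prev = 0
        for ch in pat:
            trans.setdefault((prev, ch), set()).add(next_id)
            prev, next_id = next_id, next_id + 1
        accept[prev] = idx
    return trans, accept

def nfa_step(trans, active, ch):
    """Advance every active NFA state on one character."""
    nxt = {0}  # state 0 stays active because of its self-loop
    for s in active:
        nxt |= trans.get((s, ch), set())
    return nxt

def subset_construction(trans, alphabet):
    """Build the DFA whose states are the reachable sets of NFA states."""
    start = frozenset({0})
    dfa, seen, queue = {}, {start}, deque([start])
    while queue:
        cur = queue.popleft()
        for ch in alphabet:
            nxt = frozenset(nfa_step(trans, cur, ch))
            dfa[(cur, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa, seen

def max_active_states(trans, text):
    """Largest NFA active set observed while scanning text."""
    active, worst = {0}, 1
    for ch in text:
        active = nfa_step(trans, active, ch)
        worst = max(worst, len(active))
    return worst

trans, accept = build_nfa(["abc", "bcd", "cde"])
dfa, dfa_states = subset_construction(trans, "abcde")
nfa_worst = max_active_states(trans, "dabcd")
```

The NFA stores one state per pattern character but may have several states active per input character (time cost), while the DFA visits exactly one state per character and pays for it by storing one state per reachable subset (memory cost).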
5 SNORT Regular expressions
- Examples
  - Server\sGuptachar\s\d\x2E\d
  - User-Agent \r\nA-311\sServer
  - Host\r\nwwp\.mirabilis\.com.from\r\nfrom email\r\nsubject\r\nto24962844
  - \sPARTIAL.BODY\.PEEK\n1024
- SNORT RegExs DO consist of
  - Sequences of sub-patterns
  - Possibly containing (repetitions of) character ranges
  - Separated by dot-star terms and counting constraints
- SNORT RegExs DON'T normally contain
  - Nested repetitions
  - Disjunctions of complex sub-expressions
pattern1.pattern2.n,mpatternkcxcypatternn
6 Dot-star terms
- Definition
  - Unconstrained repetitions of wildcards (.) or large ranges c1c2..ck
- Examples
  - User-Agent\r\nZC-Bridge
- On single regular expressions (from practical data-sets)
  - NO state blow-up
7 Dot-star conditions (contd)
- Compiling together several RegEx
  - Duplication of sub-DFAs at dot-star states
  - NO exponential blow-up
- Example: ab.*cd compiled together with efgh
8 Counting constraints
- Definition
  - Constrained repetition of wildcard .{n,m} or large ranges c1c2..ck{n,m}
- Examples
  - AUTH\s\n100 (buffer overflow)
- Exponential state explosion
  - Single regular expressions: all possible occurrences of the prefix in the counting constraint
  - Multiple regular expressions: additionally, all the possible occurrences of other RegEx in the counting constraint
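The state explosion described above can be observed with a small subset-construction experiment. This is a sketch under stated assumptions: the pattern is modeled as an unanchored chain of prefix, n wildcards and suffix, and `count_dfa_states` is a hypothetical helper name, not something defined in the slides.

```python
def count_dfa_states(prefix, n, suffix, alphabet):
    """Count DFA states for the unanchored pattern prefix .{n} suffix.

    The NFA is a chain: state s moves to s + 1 on labels[s], where None
    stands for the wildcard. State 0 additionally loops on every character.
    """
    labels = list(prefix) + [None] * n + list(suffix)

    def step(active, ch):
        nxt = {0}  # self-loop on the start state
        for s in active:
            if s < len(labels) and labels[s] in (None, ch):
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)

# DFA size for ab.{n}cd as the counting constraint grows
sizes = {n: count_dfa_states("ab", n, "cd", "abcd") for n in range(1, 7)}
```

For the simplest case, a one-character prefix followed by n wildcards and no suffix over a two-letter alphabet, the DFA has to remember whether the prefix occurred at each of the last n + 1 positions, which gives 2^(n+1) states.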
9 Counting constraints (contd)
Ex: ab.{3}cd
[Figure: NFA and DFA for the example RegEx]
10 First step: hybrid-FA
- Idea: stop subset construction at the state where state blow-up would occur
- Implication: hybrid-FA with a head-DFA, one or more tail-NFAs and one or more border-states
[Figure: hybrid-FA next to the corresponding NFA]
11 Hybrid-FA traversal
[Figure: active states of the NFA and of the hybrid-FA, step by step, on the input "b a a c a b c a c e f c d e"]
- Functional equivalence (commonly reached accepting states)
- Hybrid-FA
  - Limitation in size of active vector until a border state is reached
  - No back activation from tail-NFAs to head-DFA
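A traversal of this kind can be sketched in a few lines of Python. The automaton below is a hand-built, hypothetical hybrid-FA for the single RegEx ab.*cd (unanchored): the head-DFA recognizes ab, state 2 is the border state, and the tail-NFA handles .*cd. All table names and state labels are assumptions made for illustration.

```python
# Head-DFA for unanchored "ab"; missing entries fall back to state 0.
HEAD = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}
BORDER = {2: "t0"}            # border state -> entry state of its tail-NFA
TAIL = {("t0", "c"): {"t1"}, ("t1", "d"): {"t2"}}
TAIL_LOOP = {"t0"}            # tail states with a self-loop on any character
TAIL_ACCEPT = {"t2"}

def run_hybrid(text):
    """Return (match end positions, largest number of active states)."""
    head, tail = 0, set()
    matches, max_active = [], 1
    for pos, ch in enumerate(text):
        # Tail-NFA step: looping states survive, labeled transitions fire.
        nxt = {s for s in tail if s in TAIL_LOOP}
        for s in tail:
            nxt |= TAIL.get((s, ch), set())
        tail = nxt
        # Head-DFA step: always exactly one active head state.
        head = HEAD.get((head, ch), 0)
        # Reaching a border state activates the tail; the tail never
        # activates anything in the head.
        if head in BORDER:
            tail.add(BORDER[head])
        if tail & TAIL_ACCEPT:
            matches.append(pos)
        max_active = max(max_active, 1 + len(tail))
    return matches, max_active
```

Until the border state is reached the active vector holds only the head-DFA state; afterwards its size is one plus the number of active tail-NFA states.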
12 Improving the worst case
- Size: hybrid-FA comparable to size of NFA
- Bandwidth
  - Average case improved (in DFA)
  - Worst case dependent on tail-NFAs size
- Can we do better?
13 Dot-star terms: Tail-DFAs
- Idea: turn each tail-NFA into a tail-DFA
- Problem
  - Multiple border state traversals => multiple tail-DFA activations
- Fact
  - In case of
    - sub_pattern1.*sub_pattern2
    - sub_pattern1[^c1..ck]*sub_pattern2 w/ c1,..,ck not in sub_pattern2
  - subsequent activations of a tail-DFA can be safely ignored
- Implication
  - Each tail-DFA adds only 1 to the worst case bound
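The "subsequent activations can be ignored" fact can be sketched as follows, again for a hypothetical hand-built automaton for ab.*cd. The tail is now a DFA for .*cd, so at most one tail state is active, and a border-state traversal while the tail is already running is simply counted and dropped. Table names and state labels are illustrative assumptions.

```python
# Head-DFA for unanchored "ab"; missing entries fall back to state 0.
HEAD = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}
BORDER_STATES = {2}
# Tail-DFA for ".*cd"; missing entries fall back to "q0".
TAIL_DFA = {("q0", "c"): "q1", ("q1", "c"): "q1",
            ("q1", "d"): "q2", ("q2", "c"): "q1"}
TAIL_ACCEPT = "q2"

def run_with_tail_dfa(text):
    """Return (match positions, ignored activations, max active states)."""
    head, tail = 0, None          # tail is None while the tail-DFA is idle
    matches, ignored, max_active = [], 0, 1
    for pos, ch in enumerate(text):
        if tail is not None:
            tail = TAIL_DFA.get((tail, ch), "q0")
            if tail == TAIL_ACCEPT:
                matches.append(pos)
        head = HEAD.get((head, ch), 0)
        if head in BORDER_STATES:
            if tail is None:
                tail = "q0"       # first activation starts the tail-DFA
            else:
                ignored += 1      # already active: nothing new to track
        active = 1 + (1 if tail is not None else 0)
        max_active = max(max_active, active)
    return matches, ignored, max_active
```

Because the dot-star keeps the tail-DFA alive once started, a later activation cannot reach any state the running tail-DFA does not already cover, so the active set never exceeds the head state plus one state per tail-DFA.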
14 Counting constraints: counter trick
[Figure: NFA for .{n}suffix]
- Observation
  - n counting states do not carry real next state information
- Idea
  - Replace n counting states w/ auto-decrementing counter
  - At most 2 memory accesses per counter sufficient
- Optimization
  - Counting constraint at the end of the regular expression (no suffix) => ONE counter is enough
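The single-counter optimization can be sketched for a pattern of the form prefix followed by n characters from a large range, with no suffix. The concrete pattern is an assumption: it reads the AUTH example from slide 8 as the literal prefix "AUTH " followed by 100 non-newline characters, which the slides do not show legibly. `counter_match` and its parameters are hypothetical names, and this is not the authors' implementation.

```python
def counter_match(text, prefix="AUTH ", n=100, excluded="\n"):
    """Return the position where prefix + n allowed characters completes.

    A single auto-decrementing counter replaces the n counting states.
    Assumes the prefix itself contains no excluded character.
    """
    counter = None                    # None means the counter is idle
    window = ""                       # last len(prefix) characters seen
    for pos, ch in enumerate(text):
        if counter is not None:
            if ch in excluded:
                counter = None        # run of allowed characters is broken
            else:
                counter -= 1
                if counter == 0:
                    return pos        # n allowed characters followed the prefix
        window = (window + ch)[-len(prefix):]
        if window == prefix and counter is None:
            counter = n               # start counting; later prefixes are ignored
    return None
```

One counter suffices here because the earliest prefix occurrence always reaches the bound first, and any excluded character that breaks its run also breaks the run of every later occurrence.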
15 Rule-sets
- Distinct PCREs: 982
  - 25 w/ long counting constraints (generally at the end of the RegEx, n = 100-1024)
  - 11.4% containing dot-star terms
  - 54.89% containing c1c2..ck terms
- Header-based grouping

Rule-set  Rules  Protocol  Source IP     Src port  Destination IP  Destination port  Dot-star/range terms  Counting constraints
Group1    329    Tcp       HOME_NET      any       EXTERNAL_NET    HTTP_PORTS/any    283                   -
Group2    40     Tcp       HOME_NET      any       EXTERNAL_NET    25/any            24                    -
Group3    18     Tcp       EXTERNAL_NET  any       HOME_NET        77777778/any      5                     10
Group4    45     Tcp       EXTERNAL_NET  any       HOME_NET        143/any           24                    19
Group5    20     Tcp       EXTERNAL_NET  any       HOME_NET        119/any           6                     11
Group6    24     Tcp       EXTERNAL_NET  any       HOME_NET        110/any           7                     12
16 Memory storage requirements
- Tail-DFAs and counter trick used (counters at end)

Rule-set  NFA states  DFA: # of DFAs  DFA: total states  Hybrid-FA: # of tail-FAs  Hybrid-FA: head-DFA states  Hybrid-FA: total tail-states
Group1    15679       32              71234              31                        40461                       30321
Group2    1036        3 / 2           22651 / 31521      2                         20724                       1905
Group3    8871        N/A             N/A                10                        514                         -
Group4    3119        N/A             N/A                19                        2560                        -
Group5    5205        N/A             N/A                11                        2485                        -
Group6    1952        N/A             N/A                12                        4878                        -
17 Memory bandwidth requirements
- Simulations on 12 packet traces
  - From 17 MB to 264 MB
  - 1-6 rules matched per trace
- Observations
  - Active set size: number of states active in parallel

Rule-set  NFA avg  NFA max  NFA worst case  DFA  Hybrid-FA avg  Hybrid-FA max  Hybrid-FA worst case
Group1    1.15     34       15679           32   1.009          5              32
Group2    1.06     13       1036            2/3  1.001          2              3
Group3    1.04     4        8871            -    1.002          2              11
Group4    2.45     12       3119            -    1.001          2              20
Group5    1.04     5        5205            -    1.001          2              12
Group6    2.99     6        1952            -    1.088          2              13
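The Hybrid-FA worst-case column can be cross-checked against the number of tail-FAs reported on slide 16. The sketch below assumes the bound is one active head-DFA state plus one per tail-FA, following the statement on slide 13 that each tail-DFA adds only 1 to the worst case; treating counter-based tails the same way is an assumption, and `hybrid_worst_case` is a hypothetical helper name.

```python
# Number of tail-FAs per rule-set, copied from the storage table (slide 16).
tail_fas = {"Group1": 31, "Group2": 2, "Group3": 10,
            "Group4": 19, "Group5": 11, "Group6": 12}

# Hybrid-FA worst-case active set size, copied from the bandwidth table.
reported_worst = {"Group1": 32, "Group2": 3, "Group3": 11,
                  "Group4": 20, "Group5": 12, "Group6": 13}

def hybrid_worst_case(num_tail_fas):
    """One head-DFA state plus at most one active state per tail-FA."""
    return 1 + num_tail_fas

bounds = {group: hybrid_worst_case(n) for group, n in tail_fas.items()}
```

Under that assumption the computed bounds agree with the reported worst case for all six rule-sets.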
18 Conclusion
- Contributions
  - Analysis of practical rule-sets
  - Proposal of hybrid-FA to
    - reduce memory storage requirement
    - limit average case memory bandwidth
  - Refinements: tail-DFAs and counter tricks
    - bound worst case memory bandwidth
- Experimental results
  - Memory size comparable to the corresponding NFA
  - Memory bandwidth
    - Average case: comparable to a single (infeasible) DFA
    - Worst case: dependent upon number of problematic RegEx
- Deployment observation
  - Head and tail-FAs independent
  - Hybrid-FA suitable for deployment on parallel architectures and FPGAs
19 Thanks
20 A SNORT rule
- HEADER MATCHING (protocol, source addr, source port, dest. addr, dest. port)
alert tcp HOME_NET any -> EXTERNAL_NET HTTP_PORTS (msg:"BACKDOOR a-311 death user-agent string detected"; flow:to_server,established; content:"User-Agent3A"; nocase; content:"A-311"; distance:0; nocase; content:"Server"; distance:0; nocase; pcre:"/User-Agent\x3A\r\nA-311\sServer/smi"; reference:url,www3.ca.com/securityadvisor/pest/pest.aspx?id453076778; classtype:trojan-activity; sid:6396; rev:1;)
- PAYLOAD INSPECTION
  - Keywords (content)
  - Regular expression (PCRE)
21 Problem
- Network Intrusion Detection Systems use Regular Expression Matching for Payload Inspection
- Regular Expression Matching is performed in linear time through deterministic finite automata (DFAs)
- Several compression techniques have been put in place to reduce the memory requirement of given DFAs
- BUT
  - Complexity of RegEx may make DFAs infeasible because of state explosion.
  - How to prevent state explosion from happening while preserving the worst case bound in memory bandwidth?