A Hybrid Finite Automaton for Practical Deep Packet Inspection

1
A Hybrid Finite Automaton for Practical Deep Packet Inspection
Michela Becchi and Patrick Crowley
  • CoNEXT 2007

2
Context
  • Deep packet inspection
  • Challenge: perform regular expression matching at
    line rate, given data-sets of hundreds (or
    thousands) of patterns
  • Processing time
  • Memory requirement

[Figure: a matching engine loaded with the RegEx set splits incoming packets into safe packets and malicious packets]

3
Deterministic vs. Non-Deterministic FA
RegEx: (1) .*abc (2) .*bcd (3) .*cde
[Figure: NFA (states 0-9, accepting states 3/1, 6/2 and 9/3) and the corresponding DFA, traversed on a sample text]

4
Memory-time tradeoff
  • NFA
  • limited size
  • potentially N_NFA states active in parallel
  • DFA
  • one state traversal/char
  • size potentially 2^N states, where N = N_NFA
  • In practical cases a single DFA is infeasible!
  • Idea
  • Hybrid automaton
  • Size comparable to NFA by preventing state
    explosion
  • Predictable and small memory bandwidth/processing
    time
  • Limit to classes of RegEx in Intrusion Detection
    Systems
  • Analyze state explosion scenarios
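A minimal Python sketch of the tradeoff above, assuming the three example expressions are the unanchored literals abc, bcd and cde. The helper names (build_nfa, nfa_step, build_dfa) and the six-symbol alphabet are illustrative assumptions, not taken from the talk. The NFA keeps a set of active states per character, while the DFA built by subset construction keeps exactly one.

```python
def build_nfa(patterns):
    """NFA for unanchored literal patterns; state 0 self-loops on every character."""
    trans = {0: {}}          # trans[state][char] -> set of next states
    accepting = {}           # accepting state -> rule number
    next_id = 1
    for rule, pattern in enumerate(patterns, start=1):
        prev = 0
        for ch in pattern:
            trans[prev].setdefault(ch, set()).add(next_id)
            trans[next_id] = {}
            prev, next_id = next_id, next_id + 1
        accepting[prev] = rule
    return trans, accepting


def nfa_step(trans, active, ch):
    """Advance every active NFA state on ch; state 0 always stays active."""
    nxt = {0}
    for state in active:
        nxt |= trans[state].get(ch, set())
    return frozenset(nxt)


def build_dfa(trans, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset({0})
    dfa, work = {}, [start]
    while work:
        cur = work.pop()
        if cur not in dfa:
            dfa[cur] = {ch: nfa_step(trans, cur, ch) for ch in alphabet}
            work.extend(dfa[cur].values())
    return start, dfa


trans, accepting = build_nfa(["abc", "bcd", "cde"])
start, dfa = build_dfa(trans, "abcdex")   # 'x' stands in for every other byte

active, state = frozenset({0}), start
max_active, matches = 1, []
for pos, ch in enumerate("xabcde"):
    active = nfa_step(trans, active, ch)   # NFA: one lookup per active state
    state = dfa[state][ch]                 # DFA: exactly one lookup
    max_active = max(max_active, len(active))
    matches += [(pos, accepting[s]) for s in sorted(state) if s in accepting]

print(len(trans), len(dfa), max_active, matches)
```

On this toy rule-set the DFA stays as small as the NFA (10 states each), while the NFA has up to 4 states active at once. The state explosion discussed later only appears once wildcard repetitions enter the patterns.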

[Figure: time vs. memory plot locating the NFA and the DFA]
5
SNORT Regular expressions
  • Examples
  • Server\sGuptachar\s\d\x2E\d
  • User-Agent \r\nA-311\sServer
  • Host\r\nwwp\.mirabilis\.com.from\r\nfrom
    email\r\nsubject\r\nto24962844
  • \sPARTIAL.BODY\.PEEK\n1024
  • SNORT RegExs DO consist of
  • Sequences of sub-patterns
  • Possibly containing (repetitions of) character
    ranges
  • Separated by dot-star terms and counting
    constraints
  • SNORT RegExs DON'T normally contain
  • Nested repetitions
  • Disjunctions of complex sub-expressions

pattern1.*pattern2.{n,m}patternk[^cxcy]*patternn
6
Dot-star terms
  • Definition
  • Unconstrained repetitions of wildcards (.*) or
    large ranges ([^c1c2..ck]*)
  • Examples
  • User-Agent\r\nZC-Bridge
  • On single regular expressions (from practical
    data-sets)
  • NO state Blowup

7
Dot-star conditions (cont'd)
  • Compiling together several RegEx
  • Duplication of sub-DFAs at .* states
  • NO exponential blow-up
  1. ab.*cd
  2. efgh

8
Counting constraints
  • Definition
  • Constrained repetition of wildcards (.{n,m}) or
    large ranges ([^c1c2..ck]{n,m})
  • Examples
  • AUTH\s[^\n]{100} (buffer overflow)
  • Exponential state explosion
  • Single regular expression: all possible
    occurrences of the prefix in the counting
    constraint
  • Multiple regular expressions: additionally, all
    the possible occurrences of other RegEx in the
    counting constraint
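The growth can be observed on a small scale with the sketch below. It assumes an unanchored pattern of the form ab.{n}cd (chosen here for illustration) and a five-symbol alphabet in which 'x' stands for every other byte. The functions counting_nfa and dfa_size are hypothetical helpers, not code from the talk. The NFA grows by one state per counted character, while the DFA has to remember every recent occurrence of the prefix.

```python
def counting_nfa(n):
    """NFA for unanchored ab.{n}cd; a symbol of None matches any character."""
    trans = {0: [(None, 0), ("a", 1)], 1: [("b", 2)]}
    state = 2
    for _ in range(n):                        # n counting states
        trans[state] = [(None, state + 1)]
        state += 1
    trans[state] = [("c", state + 1)]
    trans[state + 1] = [("d", state + 2)]
    trans[state + 2] = []                     # accepting state
    return trans


def dfa_size(trans, alphabet="abcdx"):
    """Number of DFA states reachable through subset construction."""
    start = frozenset({0})
    seen, work = {start}, [start]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = frozenset(
                t for s in cur for sym, t in trans[s] if sym is None or sym == ch
            )
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for n in (1, 2, 4, 6):
    nfa = counting_nfa(n)
    print(n, len(nfa), dfa_size(nfa))   # n, NFA states, DFA states
```

The NFA has n + 5 states. For the DFA, any set of prefix occurrences that do not overlap within the last n characters yields a distinct state, so the count grows at least like the Fibonacci numbers in n.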

9
Counting constraints (cont'd)
Ex: ab.{3}cd
[Figure: NFA and DFA for the example expression]

10
First step: hybrid-FA
  • Idea: stop subset construction at the state where
    state blowup would occur
  • Implication: hybrid-FA with a head-DFA, one or
    more tail-NFAs and one or more border-states
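One way to read the idea above is sketched below, under the assumption that the NFA states where blowup would start are known in advance and passed in as `special`. Subset construction then proceeds as usual but never expands those states; every head-DFA state that contains one becomes a border state. The pattern ab.{3}cd and the names counting_nfa and build_head_dfa are illustrative choices, not the authors' code.

```python
def counting_nfa(n):
    """NFA for unanchored ab.{n}cd; a symbol of None matches any character."""
    trans = {0: [(None, 0), ("a", 1)], 1: [("b", 2)]}
    state = 2
    for _ in range(n):
        trans[state] = [(None, state + 1)]
        state += 1
    trans[state] = [("c", state + 1)]
    trans[state + 1] = [("d", state + 2)]
    trans[state + 2] = []
    return trans


def build_head_dfa(trans, alphabet, special):
    """Subset construction that does not expand the NFA states in `special`.

    Returns (head, border): head maps each subset to its transitions,
    border maps each border subset to the tail-NFA states it activates.
    """
    start = frozenset({0})
    head, border, work = {}, {}, [start]
    while work:
        cur = work.pop()
        if cur in head:
            continue
        if cur & special:
            border[cur] = cur & special
        head[cur] = {}
        for ch in alphabet:
            nxt = frozenset(
                t
                for s in cur - special          # special states stay unexpanded
                for sym, t in trans[s]
                if sym is None or sym == ch
            )
            head[cur][ch] = nxt
            work.append(nxt)
    return head, border


nfa = counting_nfa(3)
head, border = build_head_dfa(nfa, "abcdx", special={2})   # stop at the .{3} onset
full, _ = build_head_dfa(nfa, "abcdx", special=set())      # plain subset construction
print(len(head), len(full), border)
```

With the counting constraint cut off at NFA state 2, the head-DFA covers only the prefix ab (3 states) and the remaining NFA states 2 to 7 form the tail-NFA.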

[Figure: NFA and the resulting hybrid-FA]

11
Hybrid-FA traversal
[Figure: NFA and hybrid-FA traversals of the input b a a c a b c a c e f c d e, listing the active states after each character]
  • Functional equivalence (commonly reached
    accepting states)
  • Hybrid-FA
  • Active vector limited in size until a border
    state is reached
  • No back activation from tail-NFAs to head-DFA
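A traversal along these lines might look like the sketch below, which hard-codes a hypothetical hybrid-FA for ab.{3}cd: a three-state head-DFA for the prefix, and a linear tail-NFA for .{3}cd. All table names are assumptions made for the example. The head contributes exactly one active state per character, and tail states never feed back into the head.

```python
HEAD = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}   # head-DFA for .*ab
BORDER = {2: 0}                            # head state 2 activates tail state 0
TAIL_SYMS = [None, None, None, "c", "d"]   # tail-NFA for .{3}cd; None = any char
TAIL_ACCEPT = len(TAIL_SYMS)               # tail state 5 is accepting


def traverse(text):
    """Return (match end positions, largest number of simultaneously active states)."""
    head, tail = 0, set()
    matches, max_active = [], 1
    for pos, ch in enumerate(text):
        # Advance the tail-NFA states; they never activate head states.
        nxt_tail = set()
        for t in tail:
            if TAIL_SYMS[t] is None or TAIL_SYMS[t] == ch:
                nxt_tail.add(t + 1)
        # Advance the single head-DFA state (default transition back to 0).
        head = HEAD[head].get(ch, 0)
        if head in BORDER:
            nxt_tail.add(BORDER[head])
        if TAIL_ACCEPT in nxt_tail:
            matches.append(pos)
            nxt_tail.discard(TAIL_ACCEPT)
        tail = nxt_tail
        max_active = max(max_active, 1 + len(tail))
    return matches, max_active


print(traverse("abxyzcd"))
print(traverse("ababxcd"))
```

On "ababxcd" the border state is crossed twice, so two tail states are active next to the head state. This is the worst-case dependence on tail-NFA size that the next slides address.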

12
Improving the worst case
  • Size: hybrid-FA comparable to the NFA
  • Bandwidth
  • Average case: improved (in DFA)
  • Worst case: dependent on tail-NFAs size
  • Can we do better?

13
Dot-star terms: Tail-DFAs
  • Idea
  • Problem
  • Multiple border state traversals => Multiple
    tail-DFA activations
  • Fact
  • In case of
  • sub_pattern1.*sub_pattern2
  • sub_pattern1[^c1..ck]*sub_pattern2 w/ c1,..,ck ∉
    sub_pattern2
  • subsequent activations of a tail-DFA can be
    safely ignored
  • Implication
  • Each tail-DFA adds only 1 to the worst case bound
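The fact above can be illustrated with a small sketch for ab.*cd, assuming a head-DFA for the prefix ab and a tail-DFA for .*cd (both tables are written by hand for this example and are not taken from the talk). Once the tail-DFA is running, its current state already covers a fresh start, so a second border crossing changes nothing and is simply counted and dropped.

```python
HEAD = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}   # head-DFA for .*ab
BORDER = 2
TAIL = {0: {"c": 1}, 1: {"c": 1, "d": 2}, 2: {"c": 1}}   # tail-DFA for .*cd
TAIL_ACCEPT = 2


def traverse(text):
    """Return (match end positions, number of ignored re-activations)."""
    head, tail = 0, None          # tail is None until the border is first crossed
    matches, ignored = [], 0
    for pos, ch in enumerate(text):
        if tail is not None:
            tail = TAIL[tail].get(ch, 0)
            if tail == TAIL_ACCEPT:
                matches.append(pos)
        head = HEAD[head].get(ch, 0)
        if head == BORDER:
            if tail is None:
                tail = 0          # first activation starts the tail-DFA
            else:
                ignored += 1      # later activations are safely ignored
    return matches, ignored


print(traverse("ababcdcd"))
```

At most two states are ever active here, the head state and one tail-DFA state, which matches the bound of one extra state per tail-DFA.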

14
Counting constraints: counter trick
NFA for .{n}suffix
  • Observation
  • n counting states do not carry real next state
    information
  • Idea
  • Replace n counting states w/ auto-decrementing
    counter
  • At most 2 memory accesses per counter sufficient
  • Optimization
  • Counting constraint at the end of the regular
    expression (no suffix) => ONE counter is enough
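A rough sketch of the counter idea for ab.{n}cd follows. It is an interpretation, not the implementation from the talk. Instead of decrementing each counter on every character, it stores the input position at which the counter would reach zero, which is equivalent. It also relies on all counters counting the same n, so they expire in first-in, first-out order and only the oldest one needs checking. The names HEAD, expiries and progress are assumptions, and n is assumed to be at least 1.

```python
from collections import deque

HEAD = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}   # head-DFA for .*ab
BORDER = 2


def traverse(text, n=3, suffix="cd"):
    """Match unanchored ab.{n}suffix; return the end positions of all matches."""
    head = 0
    expiries = deque()   # one entry per running counter: position where it hits zero
    progress = set()     # indices into suffix that are currently active
    matches = []
    for pos, ch in enumerate(text):
        # Advance the suffix states.
        nxt = set()
        for k in progress:
            if suffix[k] == ch:
                if k + 1 == len(suffix):
                    matches.append(pos)
                else:
                    nxt.add(k + 1)
        # The oldest counter expires: n characters have been skipped.
        if expiries and expiries[0] == pos:
            expiries.popleft()
            nxt.add(0)
        progress = nxt
        # Advance the head-DFA; a border crossing starts a new counter.
        head = HEAD[head].get(ch, 0)
        if head == BORDER:
            expiries.append(pos + n)
    return matches


print(traverse("abxyzcd"))
print(traverse("ab" + "x" * 100 + "cd", n=100))
```

Each activation costs one queue entry however large n is, where the plain NFA would carry n counting states.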

15
Rule-sets
  • Distinct PCREs: 982
  • 25 w/ long counting constraints (generally at
    the end of the RegEx, n = 100-1024)
  • 11.4% containing .* terms
  • 54.89% containing [^c1c2..ck]* terms
  • Header-based grouping

Header: Protocol through Destination Port. Characteristics: last two columns.

Rule-set  Rules  Protocol  Source IP     Src Port  Destination IP  Destination Port  .* and [^x]*  .{n,m}
Group1    329    Tcp       HOME_NET      any       EXTERNAL_NET    HTTP_PORTS/any    283           -
Group2    40     Tcp       HOME_NET      any       EXTERNAL_NET    25/any            24            -
Group3    18     Tcp       EXTERNAL_NET  any       HOME_NET        7777:7778/any     5             10
Group4    45     Tcp       EXTERNAL_NET  any       HOME_NET        143/any           24            19
Group5    20     Tcp       EXTERNAL_NET  any       HOME_NET        119/any           6             11
Group6    24     Tcp       EXTERNAL_NET  any       HOME_NET        110/any           7             12

16
Memory storage requirements
  • Tail-DFAs and counter trick used (counters at end)

          NFA         DFA                       Hybrid-FA
Rule-set  states      # DFAs  Total states      # tail-FAs  head-DFA states  Total tail states
Group1    15679       32      71234             31          40461            30321
Group2    1036        3 / 2   22651 / 31521     2           20724            1905
Group3    8871        N/A     N/A               10          514              -
Group4    3119        N/A     N/A               19          2560             -
Group5    5205        N/A     N/A               11          2485             -
Group6    1952        N/A     N/A               12          4878             -

17
Memory bandwidth requirements
  • Simulations on 12 packet traces
  • From 17 MB to 264 MB
  • 1-6 rules matched per trace
  • Observations
  • Active set size = number of parallel active states

          NFA                    DFA     Hybrid-FA
Rule-set  Avg   Max  Worst case          Avg    Max  Worst case
Group1    1.15  34   15679       32      1.009  5    32
Group2    1.06  13   1036        2/3     1.001  2    3
Group3    1.04  4    8871        -       1.002  2    11
Group4    2.45  12   3119        -       1.001  2    20
Group5    1.04  5    5205        -       1.001  2    12
Group6    2.99  6    1952        -       1.088  2    13

18
Conclusion
  • Contributions
  • Analysis of practical rule-sets
  • Proposal of hybrid-FA to
  • reduce memory storage requirement
  • limit average case memory bandwidth
  • Refinements: tail-DFAs and counter trick, to
  • bound worst case memory bandwidth
  • Experimental results
  • Memory size comparable to the corresponding NFA
  • Memory bandwidth
  • Average case: comparable to a single (infeasible) DFA
  • Worst case: dependent upon number of problematic
    RegEx
  • Deployment observation
  • Head and tail-FAs independent
  • Hybrid-FA suitable for deployment on parallel
    architectures and FPGAs

19
Thanks
  • Questions?

20
A SNORT rule
HEADER MATCHING (protocol, source addr, source
port, dest. addr, dest. port)
  • alert tcp HOME_NET any -> EXTERNAL_NET
    HTTP_PORTS (msg:"BACKDOOR a-311 death user-agent
    string detected"; flow:to_server,established;
  • content:"User-Agent|3A|"; nocase;
    content:"A-311"; distance:0; nocase;
    content:"Server"; distance:0; nocase;
    pcre:"/User-Agent\x3A\r\nA-311\sServer/smi";
    reference:url,www3.ca.com/securityadvisor/pest/pest.aspx?id453076778;
    classtype:trojan-activity; sid:6396; rev:1;)
  • PAYLOAD INSPECTION
  • Keywords (content)
  • Regular expression (PCRE)

21
Problem
  • Network Intrusion Detection Systems use Regular
    Expression Matching for Payload Inspection
  • Regular Expression Matching performed in Linear
    time through deterministic finite automata (DFAs)
  • Several compression techniques put in place to
    reduce memory requirement of given DFAs
  • BUT
  • Complexity of RegEx may make DFAs infeasible
    because of state explosion.
  • How to prevent state explosion while preserving
    a worst case bound on memory bandwidth?

22
Deterministic vs. Non-Deterministic FA
RegEx: (1) .*abc (2) .*bcd (3) .*cde
[Figure: NFA and DFA for the three expressions]