Title: Fast Firewall Implementation for Software and Hardware-based Routers
1Fast Firewall Implementation for Software and Hardware-based Routers
- Lili Qiu, Microsoft Research
- George Varghese, UCSD
- Subhash Suri, UCSB
- 9th International Conference on Network Protocols
- Riverside, CA, November 2001
2Outline
- Motivation for packet classification
- Performance metrics
- Related work
- Our approaches
- Performance results
- Summary
3Motivation
- Traditionally, routers forward packets based on the destination field only
- Firewalls and diff-serv require packet classification
- Forward packets based on multiple fields in the packet header
- E.g., source IP address, destination IP address, source port, destination port, protocol, type of service (ToS)
4Packet Classification Based Router
[Figure: packet-classification-based router. The forwarding engine passes the header of each incoming packet to packet classification, which consults the classifier (policy database) of predicate/action entries and returns an action.]
5Problem Specification
- Given a set of filters (or rules), find the least-cost matching filter for each incoming packet
- Each filter specifies
- Some criterion on K fields
- Associated directive
- Cost
- Example
  Rule 1: 24.128.0.0/16, 4.0.0.0/8, udp, deny
  Rule 2: 64.248.128.0/20, 8.16.192.0/24, tcp, permit
  Rule N: 24.2.0.0/16, 4.16.128.0/20, any, permit
- Incoming packet: (24.128.34.8, 4.16.128.3, udp)
- Answer: rule 1
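The problem statement above can be illustrated with a brute-force baseline: scan every rule and keep the cheapest match. This is a minimal sketch, not the method proposed in the talk. The slide does not give costs, so the sketch assumes the cost is the rule's position in the list; the `Rule` type and the `classify` function are names invented for illustration.

```python
import ipaddress
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    src: str      # source prefix in CIDR notation
    dst: str      # destination prefix in CIDR notation
    proto: str    # "tcp", "udp", or "any"
    action: str   # associated directive
    cost: int     # assumption: lower wins, taken from list position

def matches(rule, src, dst, proto):
    """True if the packet satisfies the rule's criterion on all three fields."""
    return (ipaddress.ip_address(src) in ipaddress.ip_network(rule.src)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(rule.dst)
            and rule.proto in ("any", proto))

def classify(rules, src, dst, proto):
    """Linear scan: return the least-cost matching rule, or None."""
    hits = [r for r in rules if matches(r, src, dst, proto)]
    return min(hits, key=lambda r: r.cost) if hits else None

RULES = [
    Rule("Rule 1", "24.128.0.0/16", "4.0.0.0/8", "udp", "deny", 1),
    Rule("Rule 2", "64.248.128.0/20", "8.16.192.0/24", "tcp", "permit", 2),
    Rule("Rule N", "24.2.0.0/16", "4.16.128.0/20", "any", "permit", 3),
]

best = classify(RULES, "24.128.34.8", "4.16.128.3", "udp")
print(best.name, best.action)  # Rule 1 deny
```

The scan costs O(N) per packet, which is the cost the trie-based schemes in the rest of the talk aim to avoid.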
6Performance Metrics
- Classification speed
- Wire-rate lookup for minimum-size (40-byte) packets at OC-192 (10 Gbps) speeds
- Memory usage
- Should use memory linear in the number of rules
- Update time
- Slow updates are acceptable
- Impact on search speed should be minimal
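To make the wire-rate requirement concrete, the per-packet time budget can be worked out from the two numbers on the slide. This is a back-of-the-envelope sketch that assumes the rounded 10 Gbps line rate and ignores any framing overhead.

```python
LINK_RATE_BPS = 10e9     # OC-192, rounded to 10 Gbps as on the slide
MIN_PACKET_BYTES = 40    # minimum-size packet

# Packets arriving back to back at line rate.
packets_per_second = LINK_RATE_BPS / (MIN_PACKET_BYTES * 8)

# Time available to classify one packet before the next one arrives.
budget_ns = 1e9 / packets_per_second

print(f"{packets_per_second / 1e6:.2f} Mpps, {budget_ns:.0f} ns per packet")
# 31.25 Mpps, 32 ns per packet
```

Under these assumptions a classifier has about 32 ns per packet, so every memory access in the lookup path matters.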
7Related Work
- Given N rules in K dimensions, the worst-case bounds are
- O(log N) search time, O(N^(K-1)) memory
- O(N) memory, O((log N)^(K-1)) search time
- Grid-of-tries (Srinivasan et al., SIGCOMM '98)
- Cross-producting (Srinivasan et al., SIGCOMM '98)
- Lucent bit vector scheme (Lakshman et al., SIGCOMM '98)
- RFC (Pankaj et al., SIGCOMM '99)
- Tuple Space Search (Srinivasan et al., SIGCOMM '99)
- Fat Inverted Segment Tree (Feldman et al., INFOCOM '00)
- Entry Tuple Space Pruning (Srinivasan, INFOCOM '01)
8Backtracking Search
- A trie is a binary branching tree, with each branch labeled 0 or 1
- The prefix associated with a node is the concatenation of all the bits from the root to the node
[Figure: trie for the filters F1 = 00 and F2 = 10, with nodes A through E.]
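The trie definition above can be sketched in a few lines. This is an illustrative one-dimensional version only: the `TrieNode` class and the helper names are assumptions, and the two filters are the F1 = 00 and F2 = 10 prefixes shown on the slide.

```python
class TrieNode:
    def __init__(self):
        self.child = {}      # maps "0" or "1" to a child TrieNode
        self.filter = None   # filter stored at this prefix, if any

def insert(root, prefix, name):
    """Create the path spelled by `prefix` and store the filter at its end."""
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, TrieNode())
    node.filter = name

def longest_match(root, key):
    """Walk the key bit by bit and return the last filter seen on the path."""
    node, best = root, root.filter
    for bit in key:
        node = node.child.get(bit)
        if node is None:
            break
        if node.filter is not None:
            best = node.filter
    return best

root = TrieNode()
insert(root, "00", "F1")
insert(root, "10", "F2")

print(longest_match(root, "0010"))  # F1
print(longest_match(root, "1011"))  # F2
print(longest_match(root, "0111"))  # None
```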
9Backtracking Search (Cont.)
- Extend to multiple dimensions
- Standard backtracking
- Depth-first traversal of the tree visiting all the nodes satisfying the given constraints
- Example: search for [00, 0, 0]; result: F8
[Figure: multiplane trie with nodes A through K and filters F1 through F8.]
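The multi-dimensional backtracking search can be sketched as a recursive depth-first walk. This is a simplified illustration under stated assumptions, not the deck's exact data structure: each node may carry a `next` pointer to the trie of the following dimension, filters carry an explicit cost, and the three filters below are made up rather than taken from the figure.

```python
class Node:
    def __init__(self):
        self.child = {}    # "0"/"1" -> Node in the same dimension
        self.next = None   # root of the next dimension's trie, if any
        self.best = None   # (cost, name), only set in the last dimension

def insert(root, prefixes, cost, name):
    """Insert a filter given as one bit-string prefix per dimension."""
    node = root
    for d, prefix in enumerate(prefixes):
        for bit in prefix:
            node = node.child.setdefault(bit, Node())
        if d < len(prefixes) - 1:
            if node.next is None:
                node.next = Node()
            node = node.next
    if node.best is None or (cost, name) < node.best:
        node.best = (cost, name)

def search(node, keys, d=0):
    """Visit every node matching keys[d], backtracking into each next-dimension trie."""
    path = [node]
    for bit in keys[d]:
        node = node.child.get(bit)
        if node is None:
            break
        path.append(node)
    best = None
    for n in path:
        if d == len(keys) - 1:
            found = n.best
        else:
            found = search(n.next, keys, d + 1) if n.next else None
        if found is not None and (best is None or found < best):
            best = found
    return best

root = Node()
insert(root, ("", "0"), 3, "Fa")     # empty string stands for a wildcard
insert(root, ("0", "1"), 1, "Fb")
insert(root, ("00", "00"), 2, "Fc")

print(search(root, ("00", "00")))  # (2, 'Fc')
print(search(root, ("01", "10")))  # (1, 'Fb')
```

Each matching prefix in one dimension triggers a separate walk in the next, which is where the backtracking cost comes from.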
10Set Pruning Tries
- Multiplane trie
- Fully specify all search paths so that no backtracking is necessary
- Performance
- O(log N) search time
- O(N^(K-1)) storage
11Set Pruning Tries Conversion
- Terminology
- Descendant string
- String S' is a descendant of string S if S is a prefix of S'
- E.g., 00 is a descendant of *
- Descendant filter
- Filter A is a descendant of filter B if, for all dimensions j, string A(j) is a descendant of string B(j)
- E.g., filter [00, 00] is a descendant of filter [*, *]
- Converting a backtracking trie to a set pruning trie essentially replaces a general filter with its descendant filters
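The descendant relation can be written as a pair of small predicates. This sketch reads the definition as "a string is a descendant of any of its prefixes" and, as an assumption, represents a wildcard field by the empty string; the function names are invented for illustration.

```python
def is_descendant_string(s_prime, s):
    """s_prime is a descendant of s if s is a prefix of s_prime."""
    return s_prime.startswith(s)

def is_descendant_filter(a, b):
    """Filter a is a descendant of filter b if this holds in every dimension."""
    return len(a) == len(b) and all(
        is_descendant_string(x, y) for x, y in zip(a, b))

print(is_descendant_string("00", ""))                # True
print(is_descendant_filter(("00", "00"), ("", "")))  # True
print(is_descendant_filter(("00", "0"), ("01", ""))) # False
```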
12Set Pruning Tries Example
[Figure: a backtracking trie and the corresponding set pruning trie, with nodes A through F and filters F1, F2, and F3; some leaves of the set pruning trie hold Min(F1,F2) and Min(F2,F3).]
- Replace [*,*,*] with [0,0,*], [0,0,0], [0,1,*], [1,0,*], [1,1,*], and [1,1,1].
13Performance Evaluation
- 5 real databases from various sites
- Performance metrics
- Total storage
- Total number of nodes in the multiplane trie
- Worst-case lookup time
- Total number of memory accesses in the worst case, assuming 1-bit-at-a-time trie traversal
14Performance Results
                   Backtracking            Set Pruning Tries
Database  Rules    Lookup time  Storage    Lookup time  Storage
1         67       146          1848       86           5541
2         158      153          4914       102          51785
3         183      169          3949       102          59180
4         279      202          6785       102          123951
5         266      208          6555       102          165920
Backtracking has small storage and affordable lookup time.
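The trade-off in the table can be quantified by dividing the two schemes' columns. The snippet below only re-derives ratios from the numbers shown above; the `ratios` helper is a name chosen for illustration.

```python
# Rows copied from the table: database -> (backtracking lookup time,
# backtracking storage, set pruning lookup time, set pruning storage).
RESULTS = {
    1: (146, 1848, 86, 5541),
    2: (153, 4914, 102, 51785),
    3: (169, 3949, 102, 59180),
    4: (202, 6785, 102, 123951),
    5: (208, 6555, 102, 165920),
}

def ratios(row):
    """Return (storage blow-up, lookup speed-up) of set pruning over backtracking."""
    bt_time, bt_nodes, sp_time, sp_nodes = row
    return sp_nodes / bt_nodes, bt_time / sp_time

for db, row in RESULTS.items():
    storage_ratio, time_ratio = ratios(row)
    print(f"database {db}: {storage_ratio:.1f}x storage, {time_ratio:.2f}x faster lookup")
```

On these five databases set pruning tries need roughly 3x to 25x the storage of backtracking while improving worst-case lookup by roughly 1.5x to 2x.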
15Major Optimizations
- Trie compression algorithm
- Pipelining the search
- Selective pushing
16Trie Compression Algorithm
- If a path AB satisfies the Compressible Property
- All nodes on its left point to the same place L
- All nodes on its right point to the same place R
- then we replace the entire branch with 3 edges
- Center edge, taken when the bits equal the bit string of path AB, pointing to B
- Left edge, taken when the bits are less than that bit string, pointing to L
- Right edge, taken when the bits are greater than that bit string, pointing to R
- Advantages of compression: saves time and storage
[Figure: a trie before and after compression. The path 01010 is replaced by three edges, labeled "branch 01010", "branch < 01010", and "branch > 01010", leading to filters F1, F2, and F3.]
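The three-edge replacement can be sketched as a single comparison against the path's bit string. This is an illustration of the idea rather than the talk's implementation: the `CompressedEdge` class, the `follow` function, and the placeholder targets "B", "L", and "R" are assumptions, and the key is assumed to have enough bits left to cover the path.

```python
class CompressedEdge:
    """Stands in for a compressible path from A to B."""
    def __init__(self, bits, center, left, right):
        self.bits = bits      # bit string spelled by the path from A to B
        self.center = center  # B: taken when the key bits equal `bits`
        self.left = left      # L: taken when the key bits are smaller
        self.right = right    # R: taken when the key bits are larger

def follow(edge, key, pos):
    """Consume len(edge.bits) bits of `key` starting at `pos` in one step."""
    chunk = key[pos:pos + len(edge.bits)]
    if chunk == edge.bits:
        return edge.center
    # For equal-length bit strings, string order equals numeric order.
    return edge.left if chunk < edge.bits else edge.right

edge = CompressedEdge("01010", center="B", left="L", right="R")
print(follow(edge, "01010", 0))  # B
print(follow(edge, "00111", 0))  # L
print(follow(edge, "01011", 0))  # R
```

A key that leaves the path on a 0 where the path has a 1 is smaller than the path string, and one that leaves on a 1 where the path has a 0 is larger, so one comparison replaces a bit-by-bit walk over the whole path.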
17Performance Evaluation of Compression
Database  Lookup time (uncompressed)  Lookup time (compressed)
1         146                         30
2         153                         51
3         169                         49
4         202                         98
5         208                         59
Compression reduces the lookup time by a factor of 2 to 5.
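The stated factor can be checked directly against the table. The snippet below only divides the two columns shown above.

```python
# Worst-case lookup times copied from the table, keyed by database.
UNCOMPRESSED = {1: 146, 2: 153, 3: 169, 4: 202, 5: 208}
COMPRESSED = {1: 30, 2: 51, 3: 49, 4: 98, 5: 59}

speedup = {db: UNCOMPRESSED[db] / COMPRESSED[db] for db in UNCOMPRESSED}

for db, factor in speedup.items():
    print(f"database {db}: {factor:.2f}x")
print(f"range: {min(speedup.values()):.2f}x to {max(speedup.values()):.2f}x")
# range: 2.06x to 4.87x
```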
18Pipelining Backtracking
- Use pipelining to speed up backtracking
- Issues
- The amount of register memory passed between pipeline stages needs to be small
- The amount of main memory needs to be small
- Our approaches
- Propose a backtracking search that needs only K+1 registers (K is the number of dimensions)
- Have pipeline stage i store only the trie nodes that will be visited in stage i
19Pipelining Backtracking: Limit the amount of register memory
- Standard backtracking requires O(WK) state for filters with K fields, each W bits long
- Our approach
- Visit more general filters first, and more specific filters later
- Example
- Search for [00, 0, 0]: A-B-H-J-K-C-D-E-F-G; result: F8
- Performance
- K+1 32-bit registers
[Figure: multiplane trie for the example search, with filters F1 through F8.]
20Pipelining Backtracking: Limit the amount of memory
- Simple approach
- Store an entire backtracking search trie at every pipeline stage
- Storage increases linearly with the number of pipeline stages
- Our approach
- Have pipeline stage i store only the trie nodes that will be visited in stage i
21Storage Requirement for Pipeline
22Trading Storage for Time
- Smoothly trade off storage for time
- Observations
- Set pruning tries eliminate all backtracking by pushing down all filters -> intensive storage
- Eliminate backtracking only for filters with large backtracking time
- Selective push
- Push down the filters with large backtracking time
- Iterate until the worst-case backtracking time satisfies our requirement
[Figure: spectrum from O((log N)^(K-1)) time (e.g., backtracking) to O(N^(K-1)) space (e.g., set pruning).]
23Example of Selective Pushing
- Goal: worst-case memory accesses < 12
- The filter [0, 0, 000] has 12 memory accesses.
- Push the filter down -> reduce lookup time
- Now the search cost of the filter [0, 0, 001] becomes 12 memory accesses, so we need to push it down as well. Done!
[Figure: the trie before and after each push, with filters F1, F2, and F3.]
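The iteration in this example can be sketched as a loop that keeps pushing the costliest filter until the goal is met. This is only the control flow: the `costs` table and the `push_down` callback are hypothetical stand-ins for recomputing search costs on the real trie, and the post-push cost of 9 is a made-up number chosen to mirror the two pushes on the slide.

```python
def selective_push(costs, push_down, target):
    """Push the costliest filter until the worst-case cost is <= target."""
    pushed = []
    while costs and max(costs.values()) > target:
        worst = max(costs, key=costs.get)
        if worst in pushed:
            break  # pushing the same filter again cannot help
        push_down(worst, costs)
        pushed.append(worst)
    return pushed

def toy_push(name, costs):
    """Hypothetical effect of a push (numbers are illustrative only)."""
    costs[name] = 9
    if name == "[0,0,000]":
        costs["[0,0,001]"] = 12  # mirrors the slide: the next filter now costs 12

costs = {"[0,0,000]": 12, "[0,0,001]": 10}
order = selective_push(costs, toy_push, target=11)  # goal: fewer than 12 accesses
print(order)                # ['[0,0,000]', '[0,0,001]']
print(max(costs.values()))  # 9
```

Only the filters that violate the goal are pushed, which is what keeps the storage increase small compared with full set pruning.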
24Performance of Selective Push
[Figure: two panels, "Compressed Trie" and "Uncompressed Trie".]
25Summary
- Trie compression algorithm
- Description: effectively exploits redundancy in trie nodes
- Performance gain: reduces lookup time by a factor of 2 to 5; saves storage by a factor of 2.8 to 8.7
- Pipelining the search
- Description: splits the search into multiple pipeline stages, each responsible for a portion of the search
- Performance gain: increases throughput with a marginal increase in memory cost
- Selective push
- Description: pushes down the filters with large backtracking time
- Performance gain: reduces lookup time by 10 to 25% with only a marginal increase in storage
26Traditional routers: Destination address lookup
[Figure: traditional router. The forwarding engine passes the destination address of each incoming packet to next-hop computation, which consults the forwarding table of destination-prefix/next-hop entries and returns the next hop.]
- Unicast destination-address-based lookup
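For contrast with multi-field classification, a destination-only lookup can be sketched as a longest-prefix match over the forwarding table. This is a naive linear version for illustration; the table entries and hop names are made up.

```python
import ipaddress

def next_hop(table, dst):
    """Return the next hop of the longest prefix in `table` that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, hop)
    return best[1] if best else None

TABLE = [
    ("0.0.0.0/0", "hop-default"),
    ("10.0.0.0/8", "hop-A"),
    ("10.1.0.0/16", "hop-B"),
]

print(next_hop(TABLE, "10.1.2.3"))  # hop-B
print(next_hop(TABLE, "10.2.0.1"))  # hop-A
print(next_hop(TABLE, "8.8.8.8"))   # hop-default
```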
27Selective Push
- Main idea
- Push down the filters with large backtracking time
- Iterate until the worst-case backtracking time satisfies our requirement
28Packet Classification
- Motivation for packet classification
- Needed for implementing firewalls and diff-serv
- Problem specification
- Given a classifier of N rules, find the least-cost matching filter for each incoming packet
- Example
  Rule 1: 24.128.0.0/16, 4.0.0.0/8, udp, deny
  Rule 2: 64.248.128.0/20, 8.16.192.0/24, tcp, permit
  Rule N: 24.0.0.0/8, 4.16.128.0/20, any, permit
- Incoming packet (24.128.34.8, 4.17.135.3, udp) matches rule 1
- Performance metrics
- Classification speed
- Memory usage
- Update time
29Related Work
- Given N rules in K dimensions, the worst-case bounds are
- O(log N) search time, O(N^(K-1)) memory
- O(N) memory, O((log N)^(K-1)) search time
- Tree based
- Grid-of-tries (Srinivasan et al., SIGCOMM '98)
- Fat Inverted Segment Tree (Feldman et al., INFOCOM '00)
- Lucent bit vector scheme (Lakshman et al., SIGCOMM '98)
30Related Work (Cont.)
- Cross-producting (Srinivasan et al., SIGCOMM '98)
- RFC (Pankaj et al., SIGCOMM '99)
- Tuple space search
- Tuple space search (Srinivasan et al., SIGCOMM '99)
- Entry Tuple Space Pruning (Srinivasan, INFOCOM '01)
34Backtracking Search (Cont.)
- Extend to multiple dimensions
- Backtracking is a depth-first traversal of the tree which visits all the nodes satisfying the given constraints
- Example: search for [00, 0, 0]
35Example of Selective Push
- Goal: worst-case memory accesses < 12
- The filter [0, 0, 0000] has 12 memory accesses.
- Push the filter down -> reduce lookup time
- Now the search cost of the filter [0, 0, 001] becomes 12 memory accesses, so we need to push it down as well. Done!
37Using Available Hardware
- So far, we have focused on software techniques for packet classification.
- Further improve the performance by taking advantage of limited hardware if it is available
- By moving some filters (or rules) from software to hardware
- Key issue: which filters to move from software to hardware?
- Answer: to reduce lookup time, move the filters with the largest number of memory accesses when using the software approach
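The selection rule above amounts to ranking filters by their software lookup cost and moving the top ones. This is a one-shot greedy sketch: the per-filter access counts and the number of hardware slots are made-up inputs, and in practice the remaining costs might need to be re-measured after each move.

```python
def pick_for_hardware(accesses, slots):
    """Return the `slots` filters with the most software memory accesses."""
    ranked = sorted(accesses, key=accesses.get, reverse=True)
    return ranked[:slots]

# Hypothetical worst-case memory accesses per filter in the software trie.
ACCESSES = {"F1": 12, "F2": 7, "F3": 10, "F4": 3}

moved = pick_for_hardware(ACCESSES, slots=2)
remaining_worst = max(v for f, v in ACCESSES.items() if f not in moved)

print(moved)            # ['F1', 'F3']
print(remaining_worst)  # 7
```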
38Challenge of Packet Classification
- The general packet classification problem has poor worst-case cost
- Given N arbitrary filters with K packet fields
- either the worst-case search time is O((log N)^(K-1))
- or the worst-case storage is O(N^(K-1))