Title: A Resource Efficient Content Inspection System for Next Generation Smart NICs
1A Resource Efficient Content Inspection System
for Next Generation Smart NICs
- Karthikeyan Sabhanatarajan, Ann Gordon-Ross
The Energy Efficient Internet Project, High-performance
Computing and Simulation Research Lab, ECE
Department, University of Florida, Gainesville
This work was supported by the U.S. National
Science Foundation
Also affiliated with NSF Center for High
Performance Reconfigurable Computing
2Introduction
INTERNET
- The Internet has grown at an alarming rate: 305%
between 2000 and 2008
3Introduction
INTERNET
- Edge devices are left idle 75% of the time, with
power management features disabled to maintain network
connectivity.
4Introduction
A solution to save power on the idle devices is
power proxying
The idle PC is allowed to sleep
The PC delegates responsibility to the NIC to
handle network traffic
Additionally, NICs can enhance network security
through Network Intrusion Detection
5Introduction
Next-generation interfaces, also known as Smart
NICs (SNICs), are expected to take on increased network
responsibility
Key requirement: packet inspection
- Header inspection
- Content inspection
This presentation focuses on content inspection.
Content inspection is the process of searching
the payload of a packet for occurrences of a
known set of patterns called signatures.
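The definition above can be illustrated with a minimal software sketch. It is a naive search, not the hardware technique presented in these slides, and the function name `inspect_payload` and the signature names are hypothetical.

```python
def inspect_payload(payload: bytes, signatures: dict[str, bytes]) -> list[tuple[str, int]]:
    """Return (signature name, offset) for every signature occurrence in the payload."""
    matches = []
    for name, pattern in signatures.items():
        start = payload.find(pattern)
        while start != -1:
            matches.append((name, start))
            # Continue one byte later so overlapping occurrences are also found.
            start = payload.find(pattern, start + 1)
    # Report matches in the order they appear in the payload.
    return sorted(matches, key=lambda match: match[1])


# Illustrative signature set and payload (made up for this example).
signatures = {"sig_a": b"ABCD", "sig_b": b"JKLM"}
hits = inspect_payload(b"xxABCDyyJKLMzzABCD", signatures)
```

Each signature is searched for separately, so the work grows with the size of the signature set.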
6Motivation
Existing methodologies
- Software: Boyer-Moore, Aho-Corasick, Wu-Manber
- Hardware: FPGAs, TCAMs, Bloom filters
- Software techniques cannot support high-speed
links with large signature sets
- Auxiliary data structures, held in SRAM, store
pattern combinations that help determine
a pattern match
- FPGAs: exploit parallelism, but price, area, and
power are prohibitive for wide-scale deployment
- TCAMs: a popular option with O(1) performance;
however, existing implementations have prohibitive
energy, price, and auxiliary data structure
requirements
- Bloom filters: energy efficient with moderate
throughput, but false positives require further
inspection of the payload on a match, and
parallelism limits constrain scalability
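The Bloom filter trade-off noted above (a hit may be a false positive, so it needs a second, exact check) can be sketched as follows. This is a minimal software model under assumptions: the class `BloomFilter`, the function `scan`, the filter size, and the use of SHA-256 as the hash are illustrative choices, not details from the slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # True can be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


def scan(payload: bytes, signatures: set[bytes], bloom: BloomFilter, w: int):
    """Slide a w-byte window over the payload; verify every Bloom filter hit exactly."""
    confirmed = []
    for offset in range(len(payload) - w + 1):
        window = payload[offset:offset + w]
        if bloom.might_contain(window) and window in signatures:
            confirmed.append((offset, window))
    return confirmed


signatures = {b"ABCD", b"JKLM"}  # all signatures are w bytes long in this sketch
bloom = BloomFilter()
for sig in signatures:
    bloom.add(sig)
found = scan(b"xxABCDyyJKLM", signatures, bloom, w=4)
```

The exact check `window in signatures` stands in for the further inspection step; without it, a false positive would be reported as a match.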
7Background TCAM Methodology
[Figure: a sample signature, A B C D E F G H A B C D J K L M E F G, divided into a prefix pattern and suffix patterns and stored in a TCAM of width w = 4]
TCAMs are attractive candidates for pattern
matching due to their inherent simplicity,
small lookup time, high throughput, high
density, and scalability.
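A TCAM lookup can be modelled in software as a ternary match against every stored entry. The sketch below is only a behavioral model under assumptions: the class `Tcam`, the `DONT_CARE` marker, and the lowest-address-wins priority rule are illustrative, and real hardware compares all entries in parallel rather than in a loop.

```python
from typing import Optional

DONT_CARE = None  # hypothetical marker for a "match anything" position


class Tcam:
    def __init__(self, width: int):
        self.width = width
        self.entries: list[tuple] = []

    def store(self, pattern: bytes) -> int:
        """Store a pattern, padding short patterns with don't-care positions."""
        if not 0 < len(pattern) <= self.width:
            raise ValueError("pattern must be between 1 and width bytes long")
        padded = tuple(pattern) + (DONT_CARE,) * (self.width - len(pattern))
        self.entries.append(padded)
        return len(self.entries) - 1  # address of the new entry

    def lookup(self, window: bytes) -> Optional[int]:
        """Return the address of the first matching entry, or None."""
        if len(window) != self.width:
            raise ValueError("window must be exactly width bytes long")
        for address, entry in enumerate(self.entries):
            if all(e is DONT_CARE or e == b for e, b in zip(entry, window)):
                return address
        return None


tcam = Tcam(width=4)
addr_abcd = tcam.store(b"ABCD")
addr_efg = tcam.store(b"EFG")  # stored as E F G followed by one don't-care byte
```

With these entries, the window `b"EFGX"` matches the second entry because its last position is a don't-care.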
8Background TCAM Methodology
Proposed by Lakshman et al.
[Figure: the payload A B C D E F G H J K L M E F G U I shifted past a TCAM of width w = 4]
Auxiliary SRAM structures contain several pattern
permutations to identify valid patterns
The auxiliary SRAM structure requires O(N^2)
space.
Gao et al. reduced this requirement to O(N log N)
by storing address permutations.
9Proposed Solution
TCAM techniques:
- Are the simplest and fastest technique, with O(1) lookup
- Can match future link speeds of 10 Gbps
- Are highly scalable, with no parallelism limits
- Can accommodate signatures of varying length and
signature sets of different sizes with ease
However, they suffer from:
- Increased energy consumption
- Prohibitive price
- Increased auxiliary data structure requirements
These drawbacks make them unsuitable for wide-scale
deployment in SNICs
10Proposed Solution
- We propose a hybrid TCAM-based solution
Our technique provides:
- Energy efficiency through a partitioned
architecture
- Further reduction in power consumption through
caching, by exploiting network locality
- Reduced auxiliary data structure requirements
using Bloom filters or software techniques
- Throughput that meets the requirements of high-speed
links such as 1 Gbps and 10 Gbps with ease
- Better suitability for wide-scale deployment due to
high energy efficiency and reduced memory requirements
11Hybrid TCAM Methodology
[Figure: a single TCAM of width w = 4 partitioned into a PTCAM holding the prefix pattern A B C D and an STCAM holding the suffix patterns E F G H, A B C D, J K L M, and E F G]
Partition the single TCAM into a prefix TCAM
(PTCAM) and a suffix TCAM (STCAM)
Store the signature patterns in the PTCAM and STCAM
accordingly. Each signature is then expressed as
a permutation of PTCAM and STCAM addresses.
A B C D | E F G H | A B C D | J K L M | E F G
P0 S0 S1 S2 S3
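The partitioning step can be sketched in a few lines. This is a minimal sketch under assumptions: the function name `partition_signature`, the use of Python lists as the two tables, and the reuse of an address when a chunk is already stored are illustrative choices, not details confirmed by the slides.

```python
def partition_signature(signature: bytes, w: int,
                        ptcam: list[bytes], stcam: list[bytes]) -> list[str]:
    """Split a signature into w-byte chunks and express it as PTCAM/STCAM addresses."""
    if not signature:
        raise ValueError("signature must not be empty")
    chunks = [signature[i:i + w] for i in range(0, len(signature), w)]

    def address(table: list[bytes], chunk: bytes) -> int:
        # Assumption: a chunk that is already stored reuses its existing address.
        if chunk not in table:
            table.append(chunk)
        return table.index(chunk)

    # The first chunk is the prefix pattern; all later chunks are suffix patterns.
    labels = [f"P{address(ptcam, chunks[0])}"]
    labels += [f"S{address(stcam, chunk)}" for chunk in chunks[1:]]
    return labels


ptcam: list[bytes] = []
stcam: list[bytes] = []
labels = partition_signature(b"ABCDEFGHABCDJKLMEFG", 4, ptcam, stcam)
```

For the sample signature with w = 4 this gives P0 S0 S1 S2 S3, matching the slide. The final chunk E F G is shorter than w and would be padded with a don't-care byte in the STCAM.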
12Exploiting Signature Locality
Our experiments indicate that there is
sufficient locality in network traces.
To reduce unwanted switching, we exploit this
property and introduce a cache between the PTCAM
and STCAM.
13Hybrid TCAM Methodology
[Figure: PTCAM and STCAM (w = 4) with a suffix cache and cache controller (Ctrl) placed between them]
14Hybrid TCAM Methodology
[Figure: block diagram of the inspection system (w = 4): PTCAM, STCAM, suffix cache, and cache controller (Ctrl), with hit, miss, enable, and pause signals, an activator, an enabler, and a right-shifting enable buffer with positions 0 to w-1]
The payload is fed to the inspection system, shifted
at a rate of 1 byte per clock cycle
The cache is activated (w-1) clock cycles after a
TCAM hit
A cache miss pauses shifting to allow the suffix
TCAM to be searched for the pattern
The cache controller (Ctrl) updates the suffix cache
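The suffix cache behavior described above can be modelled as a small software sketch. The random replacement policy follows the one named in the conclusion of this presentation. Everything else is an assumption: the class name `SuffixCache`, the STCAM modelled as a Python set, and the choice to cache a pattern only when the STCAM search finds it.

```python
import random


class SuffixCache:
    def __init__(self, capacity: int, rng: random.Random = None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: list[bytes] = []
        self.rng = rng or random.Random()
        self.hits = 0
        self.stcam_searches = 0  # each cache miss costs one STCAM search

    def lookup(self, chunk: bytes, stcam: set[bytes]) -> bool:
        """Return True if chunk is a known suffix pattern, using the cache first."""
        if chunk in self.entries:
            self.hits += 1
            return True
        # Cache miss: shifting would pause here while the STCAM is searched.
        self.stcam_searches += 1
        if chunk not in stcam:
            return False
        # Controller update: fill a free slot, else evict a random victim.
        if len(self.entries) < self.capacity:
            self.entries.append(chunk)
        else:
            self.entries[self.rng.randrange(self.capacity)] = chunk
        return True


stcam = {b"EFGH", b"ABCD", b"JKLM"}
cache = SuffixCache(capacity=2, rng=random.Random(0))
first = cache.lookup(b"EFGH", stcam)    # miss, then found in the STCAM
second = cache.lookup(b"EFGH", stcam)   # served from the cache
unknown = cache.lookup(b"ZZZZ", stcam)  # miss in both cache and STCAM
```

Counting `stcam_searches` against the total number of lookups gives the kind of hit-rate figure that a trace-driven simulation would report.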
15Hybrid TCAM Methodology
[Figure: the same block diagram, with the payload A B C D E F G H J K L M E F G U I shifted left through the inspection system]
16Hybrid TCAM Methodology
[Figure: the same block diagram, with the payload A B C D E F G H J K L M E F G U I shifted left and hits signalled in both the PTCAM and the STCAM path]
Contention Resolution
A contention resolution unit handles contention
between identical PTCAM and STCAM patterns.
Preference is given to a PTCAM match over an STCAM
match.
17Experimental Setup
Packet traces: malicious traces from MIT Lincoln Laboratory (MIT LL)
and the Capture the Flag contest from the DEFCON Festival
No power proxying traces are available, as power proxying is an
area of ongoing research
A custom C-based simulator was written to behaviorally
simulate the entire system.
SNORT and ClamAV are used as signature sets
Packets are reassembled and fed to the simulator
STCAM accesses are saved to analyze the effect of
caching
TCAM energy consumption is obtained from the TCAM
modelling tool of Agarwal et al.
18Results Signature Distribution
ClamAV and SNORT rule sets
- SNORT: smaller patterns (70% < 4 bytes)
- ClamAV: medium-sized patterns (72% < 30 bytes, > 100 bytes)
19Results
- Effect of partitioning on size
Partitioning circumvents natural TCAM compression;
however, the increase in TCAM size is negligible.
20Results
Partitioning reduces the Energy-Delay Product (EDP).
Two smaller TCAMs are faster than a single big
TCAM. EDP savings are higher for widths of 8 and 16
bytes.
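For reference, the energy-delay product is energy multiplied by delay, so a design that lowers both improves on two factors at once. The short sketch below shows the arithmetic. The function names are hypothetical and the input numbers are made up purely for illustration; they are not measurements from this work.

```python
def energy_delay_product(energy_nj: float, delay_ns: float) -> float:
    """EDP is the energy per access multiplied by the access delay."""
    return energy_nj * delay_ns


def edp_improvement(baseline_edp: float, proposed_edp: float) -> float:
    """Fractional EDP reduction of the proposed design relative to the baseline."""
    return 1.0 - proposed_edp / baseline_edp


# Illustrative numbers only (NOT results from the presentation).
single_tcam = energy_delay_product(energy_nj=10.0, delay_ns=4.0)
partitioned = energy_delay_product(energy_nj=5.0, delay_ns=3.0)
improvement = edp_improvement(single_tcam, partitioned)
```

In this made-up case, halving the energy and cutting the delay by a quarter reduces EDP by 62.5%.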
21Results
- Energy reduction for a partitioned system
compared to a non-partitioned system versus TCAM
width for real-time traffic traces.
- Energy savings range from 6% to 69% (SNORT) and
6% to 87% (ClamAV)
- Smaller TCAM widths give greater energy savings.
- Larger TCAM accesses use more don't-care bits.
22Results
- Effect of caching: hit rate
- Caching analyzed for an STCAM width of 4 bytes.
- Hit rates range from 28% to 88% for cache sizes
of only 40 to 60 entries
- A cache containing 40 to 60 entries represents
only 0.002 to 0.004, respectively, of the STCAM entries
23Results
- Effect of caching: energy savings
Energy savings for a partitioned TCAM system
(w = 4) with a suffix cache compared to a
partitioned TCAM system with no suffix cache, for
a varying number of cache entries: 13% to 64%
additional savings
24Conclusion
- Developed an energy-efficient partitioned
TCAM-based content inspection system for SNICs.
- Energy and throughput aware
- Energy-Delay Product improvements of up to 62%
compared to previous non-partitioned TCAM
systems.
- Up to 87% energy savings (average) compared to a
non-partitioned TCAM system.
- A simple cache with a random replacement policy
further reduces the energy consumption by 64%
compared to a partitioned TCAM system.
- Caching incurs a throughput reduction of 5.5%.
25Future Work
- Evaluating the proposed Bloom filter based
architecture
- Improved caching techniques
- Attack robustness to counter maliciously
engineered packets
- A pipelined architecture to hide cache misses and
improve throughput.