A Resource Efficient Content Inspection System for Next Generation Smart NICs

1
A Resource Efficient Content Inspection System
for Next Generation Smart NICs
  • Karthikeyan Sabhanatarajan, Ann Gordon-Ross*

The Energy Efficient Internet Project
High-performance Computing and Simulation Research Lab
ECE Department, University of Florida, Gainesville
This work was supported by the U.S. National Science Foundation.
* Also affiliated with the NSF Center for High-Performance Reconfigurable Computing
2
Introduction
  • The Internet grew at an alarming rate: 305% between 2000 and 2008.
3
Introduction
  • Edge devices are left idle 75% of the time, with power management
    features disabled to maintain network connectivity.
4
Introduction
One way to save power on idle devices is power proxying:
  • The idle PC is allowed to sleep.
  • The PC delegates responsibility for handling network traffic to the NIC.
  • Additionally, NICs can enhance network security through network
    intrusion detection.
5
Introduction
Next-generation interfaces, also known as Smart NICs (SNICs), are expected
to take on increased network responsibility.
Key requirement: packet inspection, which comprises
  • Header inspection
  • Content inspection
This presentation focuses on content inspection: the process of searching
the payload of a packet for occurrences of a known set of patterns called
signatures.
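As a minimal sketch of what content inspection means in software, the hypothetical function below scans a payload for every occurrence of each signature using Python's built-in bytes.find. The function name and the sample data are illustrative assumptions, not part of the presented system.

```python
def inspect_payload(payload: bytes, signatures: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (offset, signature) pairs for every signature occurrence in payload."""
    matches = []
    for sig in signatures:
        offset = payload.find(sig)
        while offset != -1:
            matches.append((offset, sig))
            # Resume one byte later so overlapping occurrences are found too.
            offset = payload.find(sig, offset + 1)
    return sorted(matches)


if __name__ == "__main__":
    # Illustrative payload and signatures only.
    print(inspect_payload(b"xxABCDyyABCD", [b"ABCD", b"yy"]))
```

Every signature is scanned over the whole payload, so the cost grows with the size of the signature set. That is the scaling problem the hardware approaches on the next slide aim to avoid.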
6
Motivation
Existing Methodologies
  • Software: Boyer-Moore, Aho-Corasick, Wu-Manber
  • Hardware: FPGAs, TCAMs, Bloom filters

Software techniques cannot support high-speed links with large signature
sets.
Auxiliary data structures, held in memory such as SRAM, store pattern
combinations to help determine a pattern match.
  • FPGAs: exploit parallelism, but price, area, and power are prohibitive
    for wide-scale deployment.
  • TCAMs: a popular option with O(1) lookup performance, but existing
    implementations have prohibitive energy, price, and auxiliary data
    structure requirements.
  • Bloom filters: energy efficient with moderate throughput, but false
    positives require further inspection of the payload, and there are
    limits on parallelism (scalability).
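The point about Bloom filter false positives can be illustrated with a small sketch. The class below is a generic textbook Bloom filter, not the filter design evaluated in this work; the bit-array size, the number of hash functions, the use of SHA-256, and the restriction to fixed-width signatures are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def scan(payload: bytes, signatures: set[bytes], width: int) -> list[int]:
    """Return offsets of width-byte windows that exactly match a signature."""
    bloom = BloomFilter()
    for sig in signatures:
        bloom.add(sig)
    hits = []
    for offset in range(len(payload) - width + 1):
        window = payload[offset:offset + width]
        # A Bloom hit is only a "maybe": confirm against the exact set.
        if bloom.might_contain(window) and window in signatures:
            hits.append(offset)
    return hits
```

The second check in scan is the "further inspection" step: without it, a window that happens to set the same bits as a real signature would be reported as a match.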
7
Background: TCAM Methodology
[Figure: a sample signature (letters A to M) is split into TCAM entries of width w = 4; the first entry is labelled the prefix pattern and the remaining entries the suffix patterns.]
TCAMs are attractive candidates for pattern matching due to their inherent
simplicity, short lookup time, high throughput, high density, and
scalability.
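A ternary lookup can be modelled in a few lines. The sketch below is a software stand-in under stated assumptions: each entry is a (value, mask) pair, patterns shorter than the width are padded with don't-care bytes, and the lowest matching address wins. A real TCAM compares all entries in parallel in one lookup; the loop here only mimics the result. The class and method names are hypothetical.

```python
from typing import Optional


class TCAM:
    """Software model of a ternary CAM with byte-granular don't-care masks."""

    def __init__(self, width: int):
        self.width = width
        self.entries: list[tuple[bytes, bytes]] = []  # (value, mask) pairs

    def add(self, pattern: bytes) -> int:
        """Store a pattern, padding short patterns with don't-care bytes."""
        pad = self.width - len(pattern)
        if pad < 0:
            raise ValueError("pattern is wider than the TCAM")
        value = pattern + b"\x00" * pad
        mask = b"\xff" * len(pattern) + b"\x00" * pad  # 0x00 = don't care
        self.entries.append((value, mask))
        return len(self.entries) - 1

    def lookup(self, key: bytes) -> Optional[int]:
        """Return the lowest address whose masked value equals the masked key."""
        if len(key) != self.width:
            raise ValueError("key must be exactly one TCAM word wide")
        for addr, (value, mask) in enumerate(self.entries):
            if all((k & m) == (v & m) for k, v, m in zip(key, value, mask)):
                return addr
        return None
```

For example, with w = 4 a three-byte pattern such as E F G matches any four-byte window that starts with those three bytes, because the fourth byte is masked out.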
8
Background: TCAM Methodology
Proposed by Lakshman et al.
[Figure: the payload A B C D E F G H J K L M E F G U I, repeated at successive shift positions, is matched against a TCAM of width w = 4.]
Auxiliary SRAM structures store several pattern permutations to identify
valid patterns, which requires O(N^2) auxiliary SRAM space.
Gao et al. reduced this requirement to O(N log N) by storing address
permutations.
9
Proposed Solution
TCAM techniques
  • Are the simplest and fastest option, with O(1) lookup.
  • Can match future link speeds of 10 Gbps.
  • Are highly scalable, with no parallelism limits.
  • Can accommodate signatures of varying length and signature sets of
    different sizes with ease.

However, they suffer from
  • Increased energy consumption
  • Prohibitive price
  • Increased auxiliary data structure requirements

These drawbacks make them unsuitable for wide-scale deployment in SNICs.
10
Proposed Solution
  • We propose a hybrid TCAM-based solution.

Our technique provides
  • Energy efficiency through a partitioned architecture
  • A further reduction in power consumption through caching, which
    exploits network locality
  • Reduced auxiliary data structure requirements, using a Bloom filter
    or software techniques
  • Throughput that meets the requirements of high-speed links such as
    1 Gbps and 10 Gbps
  • Better suitability for wide-scale deployment, owing to high energy
    efficiency and reduced memory requirements
11
Hybrid TCAM Methodology
[Figure: a single TCAM (w = 4) is split into a PTCAM holding the prefix pattern A B C D and an STCAM holding the suffix patterns E F G H, A B C D, J K L M, and E F G.]
Partition the single TCAM into a prefix TCAM (PTCAM) and a suffix TCAM
(STCAM).
Store each signature's prefix in the PTCAM and its suffix patterns in the
STCAM. The signature is then expressed as a permutation of PTCAM and STCAM
addresses.
Example: the signature A B C D E F G H A B C D J K L M E F G is expressed as
P0 S0 S1 S2 S3.
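The partitioning step can be sketched as follows. This is an illustrative reading of the slide, assuming that the first w bytes of a signature form its prefix, that the rest is cut into w-byte suffix chunks, and that identical chunks within one TCAM share an address. The function name and the "P"/"S" address labels are hypothetical.

```python
def partition_signatures(signatures: list[bytes], w: int):
    """Split signatures into PTCAM and STCAM entries plus address sequences."""
    ptcam: list[bytes] = []   # prefix patterns, index = PTCAM address
    stcam: list[bytes] = []   # suffix patterns, index = STCAM address
    encoded: list[list[str]] = []

    def address(table: list[bytes], chunk: bytes) -> int:
        # Reuse the address of an identical entry, otherwise append a new one.
        if chunk not in table:
            table.append(chunk)
        return table.index(chunk)

    for sig in signatures:
        if not sig:
            raise ValueError("empty signature")
        chunks = [sig[i:i + w] for i in range(0, len(sig), w)]
        addrs = [f"P{address(ptcam, chunks[0])}"]
        addrs += [f"S{address(stcam, chunk)}" for chunk in chunks[1:]]
        encoded.append(addrs)
    return ptcam, stcam, encoded


if __name__ == "__main__":
    # The example signature from the slide, with w = 4.
    print(partition_signatures([b"ABCDEFGHABCDJKLMEFG"], 4))
```

With the slide's example signature and w = 4, the sketch yields the address sequence P0 S0 S1 S2 S3. Note that A B C D ends up in both tables, once as a prefix and once as a suffix.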
12
Exploiting Signature Locality
Our experiments indicate that network traces exhibit sufficient signature
locality.
To reduce unwanted switching activity, we exploit this property by
introducing a cache between the PTCAM and the STCAM.
13
Hybrid TCAM Methodology
[Figure: PTCAM and STCAM (w = 4), with a suffix cache and cache controller (Ctrl) added between them.]
14
Hybrid TCAM Methodology
[Figure: datapath with the PTCAM, suffix cache and cache controller (Ctrl), and STCAM (w = 4). Labels include an activator, an enabler with an enable buffer (1 0 0 0, positions 0th to (w-1)th), a right shift, and hit, miss, enable, and pause signals.]
The payload is fed to the inspection system and shifted at a rate of
1 byte per clock.
The cache is activated (w-1) clock cycles after a TCAM hit.
A cache miss pauses shifting so that the suffix TCAM can be searched for
the pattern.
The cache controller (Ctrl) then updates the suffix cache.
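The hit and miss behaviour described above can be sketched behaviourally. This is a simplified model under stated assumptions: it counts cache hits and STCAM accesses for a stream of suffix chunks, uses the random replacement policy named in the conclusion, and does not model the (w-1)-cycle activation delay or the pause signal timing. The class name and counters are hypothetical.

```python
import random


class CachedSuffixLookup:
    """Suffix lookups served from a small cache, falling back to the STCAM."""

    def __init__(self, stcam_patterns: set[bytes], capacity: int, seed: int = 0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.stcam = set(stcam_patterns)
        self.capacity = capacity
        self.cache: list[bytes] = []
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.cache_hits = 0
        self.stcam_accesses = 0

    def lookup(self, chunk: bytes) -> bool:
        """Return True if chunk is a known suffix pattern."""
        if chunk in self.cache:
            self.cache_hits += 1        # served without touching the STCAM
            return True
        self.stcam_accesses += 1        # miss: shifting pauses for an STCAM search
        if chunk not in self.stcam:
            return False
        if len(self.cache) < self.capacity:
            self.cache.append(chunk)
        else:
            # Random replacement: overwrite an arbitrarily chosen slot.
            self.cache[self.rng.randrange(self.capacity)] = chunk
        return True
```

Comparing stcam_accesses with and without the cache over the same chunk stream gives a rough picture of how many STCAM searches the cache avoids.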
15
Hybrid TCAM Methodology
[Figure: the same datapath, with the payload A B C D E F G H J K L M E F G U I left-shifted through the PTCAM, suffix cache, and STCAM; w = 4.]
16
Hybrid TCAM Methodology
[Figure: the same datapath as on the previous slides, with the left-shifted payload A B C D E F G H J K L M E F G U I and an additional hit signal.]
Contention Resolution
A contention resolution unit handles contention between identical PTCAM
and STCAM patterns. Preference is given to a PTCAM match over an STCAM
match.
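The preference rule can be written as a small selection function. This is only a sketch of the stated priority, assuming each TCAM reports either a matching address or nothing for the current window; the function name and the "P"/"S" labels are hypothetical.

```python
from typing import Optional


def resolve_contention(ptcam_addr: Optional[int], stcam_addr: Optional[int]) -> Optional[str]:
    """Pick the match to act on when the same window hits both TCAMs."""
    if ptcam_addr is not None:
        return f"P{ptcam_addr}"   # a PTCAM match always takes preference
    if stcam_addr is not None:
        return f"S{stcam_addr}"
    return None                    # neither TCAM matched
```

For instance, a window A B C D that is stored both as a prefix and as a suffix would be reported as a PTCAM match.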
17
Experimental Setup
Packet traces: malicious traces from MIT Lincoln Laboratory (MIT LL) and
from the Capture the Flag contest at DEFCON.
No power proxying traces are available; obtaining them is ongoing research.
A custom C-based simulator behaviorally simulates the entire system.
SNORT and ClamAV rule sets are used as signature sets.
Packets are reassembled and fed to the simulator.
STCAM accesses are recorded to analyze the effect of caching.
TCAM energy consumption is obtained from the TCAM modelling tool of
Agarwal et al.
18
Results: Signature Distribution
ClamAV and SNORT rule sets
  • SNORT: smaller patterns (70% < 4 bytes)
  • ClamAV: medium-sized patterns (72% < 30 bytes, > 100 bytes)
19
Results
  • Effect of partitioning on size

Partitioning circumvents natural TCAM compression. However, the increase
in TCAM size is negligible.
20
Results
  • EDP reduction

Partitioning reduces the energy-delay product (EDP): two smaller TCAMs are
faster than a single large TCAM. EDP savings are higher for widths of 8 and
16 bytes.
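The energy-delay product is the energy per access multiplied by the access delay, so a design that lowers both benefits twice. The sketch below shows the arithmetic only; the energy and delay values in the usage example are made-up placeholders, not measurements from this work, and the function names are hypothetical.

```python
def edp(energy_nj: float, delay_ns: float) -> float:
    """Energy-delay product: energy per access times access delay."""
    return energy_nj * delay_ns


def edp_reduction_percent(base_energy_nj: float, base_delay_ns: float,
                          new_energy_nj: float, new_delay_ns: float) -> float:
    """Percentage by which the new design lowers EDP relative to the baseline."""
    base = edp(base_energy_nj, base_delay_ns)
    new = edp(new_energy_nj, new_delay_ns)
    return 100.0 * (base - new) / base


if __name__ == "__main__":
    # Placeholder numbers: halving both energy and delay cuts EDP by 75%.
    print(edp_reduction_percent(10.0, 4.0, 5.0, 2.0))
```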
21
Results
  • Energy savings
  1. Energy reduction for a partitioned system compared to a
    non-partitioned system versus TCAM width, for real-time traffic traces.
  2. Energy savings range from 6% to 69% (SNORT) and from 6% to 87%
    (ClamAV).
  3. Smaller TCAM widths give greater energy savings.
  4. Larger TCAM accesses use more don't-care bits.
22
Results
  • Effect of caching: hit rate
  • Caching was analyzed for an STCAM width of 4 bytes.
  • Hit rates range from 28% to 88% for cache sizes of only 40 to 60
    entries.
  • A cache containing 40 to 60 entries represents only 0.002% to 0.004%,
    respectively, of the STCAM entries.
23
Results
  • Effect of caching: energy savings

Energy savings for a partitioned TCAM system (w = 4) with a suffix cache
compared to a partitioned TCAM system with no suffix cache, for a varying
number of cache entries: 13% to 64% additional savings.
24
Conclusion
  1. Developed an energy-efficient, partitioned TCAM-based content
    inspection system for SNICs.
  2. The design is energy and throughput aware.
  3. Energy-delay product improvements of up to 62% compared to previous
    non-partitioned TCAM systems.
  4. Up to 87% energy savings (average) compared to a non-partitioned TCAM
    system.
  5. A simple cache with a random replacement policy further reduces
    energy consumption by up to 64% compared to a partitioned TCAM system
    without a cache.
  6. Caching incurs a throughput reduction of 5.5%.
25
Future Work
  1. Evaluating the proposed Bloom-filter-based architecture
  2. Improved caching techniques
  3. Attack robustness to counter maliciously engineered packets
  4. A pipelined architecture to hide cache misses and improve throughput