Computer Science is no more about computers

Slides: 40

Provided by: davide56

Transcript and Presenter's Notes

Title: Computer Science is no more about computers

1

Computer Science is no more about computers
than astronomy is about telescopes.
Edsger W. Dijkstra

2
Models, Algorithms, & Architectures for Scalable
Packet Classification

David E. Taylor
Dissertation Defense
Department of Computer Science & Engineering
22 July 2004

3
Internet Protocol (IP)
4
IP Route Lookup
Longest Prefix Match (LPM) using destination IP
address in packet header
5
IP Route Lookup
Longest Prefix Match (LPM) using destination IP
address in packet header
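
As a rough illustration of what Longest Prefix Match means, the sketch below scans a small route table and returns the next hop of the longest prefix containing the destination address. The route table and next-hop names are hypothetical, and the linear scan is only for clarity; it is not the Tree Bitmap structure used by FIPL.

```python
import ipaddress

# Hypothetical route table: prefix -> next hop (illustrative values only).
ROUTES = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port1",
    "10.1.0.0/16": "port2",
    "10.1.2.0/24": "port3",
}

def longest_prefix_match(dst, routes=ROUTES):
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix length, next hop) of the best match so far
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return None if best is None else best[1]
```

For example, `longest_prefix_match("10.1.2.7")` matches all four prefixes and selects the /24 entry.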
6
Packet Classification
Packet Filter Set
7
Packet Classification Problem

Given a packet P containing fields P_j and a set
of filters F with each filter F_i containing
fields F_i^j, select the highest priority exclusive
filter and r highest priority non-exclusive
filters such that for each matching filter i,
∀j: F_i^j matches P_j
Performance tradeoffs commonly characterized by
point location problem in computational geometry
For n regions defined in j dimensions, for j > 3,
a point may be located in multi-dimensional space
in O(log n) time with O(n^j) space or O(log^(j-1) n)
time with O(n) space
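
The problem statement above can be read as a reference (linear search) classifier. The sketch below is only a baseline for intuition, not any of the algorithms in the talk. The `Filter` type, the use of inclusive integer ranges for every field, and the convention that a smaller number means higher priority are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    name: str
    priority: int      # assumption: smaller number = higher priority
    exclusive: bool
    fields: tuple      # one inclusive (lo, hi) range per packet field

def matches(flt, packet):
    """A filter matches when every field range contains the packet's field value."""
    return all(lo <= p <= hi for (lo, hi), p in zip(flt.fields, packet))

def classify(filters, packet, r=1):
    """Return (best exclusive filter or None, up to r best non-exclusive filters)."""
    hits = sorted((f for f in filters if matches(f, packet)),
                  key=lambda f: f.priority)
    exclusive = next((f for f in hits if f.exclusive), None)
    non_exclusive = [f for f in hits if not f.exclusive][:r]
    return exclusive, non_exclusive

# Hypothetical two-field filter set: (address byte, port).
FILTERS = [
    Filter("A", 1, True, ((0, 255), (80, 80))),
    Filter("B", 2, True, ((10, 20), (0, 65535))),
    Filter("C", 3, False, ((0, 255), (0, 65535))),
    Filter("D", 4, False, ((15, 15), (80, 80))),
]
```

With these filters, the packet `(15, 80)` matches all four; `classify` returns A as the exclusive match and C as the single non-exclusive match.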

Constraints
31 million lookups per second (10 Gb/s link)
Memory and power efficiency
Support for fast incremental updates
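
The lookup-rate constraint is consistent with a simple back-of-the-envelope calculation. The 40-byte minimum packet size is an assumption on my part (a common worst-case figure for TCP/IP traffic); the slide states only the link rate and the lookup rate.

```python
LINK_RATE_BPS = 10e9       # 10 Gb/s link, from the slide
MIN_PACKET_BYTES = 40      # assumption: minimum-size TCP/IP packet

# Worst case: back-to-back minimum-size packets, one lookup per packet.
lookups_per_second = LINK_RATE_BPS / (MIN_PACKET_BYTES * 8)
ns_per_lookup = 1e9 / lookups_per_second

print(f"{lookups_per_second / 1e6:.2f} M lookups/s, {ns_per_lookup:.0f} ns per lookup")
```

Under that assumption the budget is about 31 million lookups per second, or roughly 32 ns per packet.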

8
Dissertation Overview

Fast Internet Protocol Lookup (FIPL) search
engine
Scalable hardware implementation of a Longest
Prefix Matching algorithm
Survey and taxonomy of packet classification
techniques
Frame the body of work according to high-level
approach
Analysis of the structure of real filter sets
Identify opportunities for better search
performance
ClassBench tool suite for packet classification
benchmarking
Promote standardized performance evaluation
Eliminate access barriers to realistic test
vectors
Distributed Crossproducting of Field Labels
(DCFL)
Leverage structure of real filter sets and
capabilities of current hardware
Achieve comparable search performance to TCAM
Scale to support large filter sets and filters
classifying on additional packet fields

9
Fast IP Lookup (FIPL) Engine

High-performance implementation of Eatherton &
Dittia's Tree Bitmap algorithm
Compressed multi-bit trie requires 6 to 8 bytes
of memory per stored prefix
Scalable architecture leverages memory
interleaving to allow multiple search engines to
share a memory interface
Each FIPL engine consumes less than 1% of logic
resources in a commodity FPGA
Evaluated performance using backbone route tables
and open research systems
Robust lookup performance under update load

10
ClassBench
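[Figure: ClassBench tool flow. A Seed Filter Set feeds the Filter Set Analyzer, which produces a Set of Benchmark Parameter Files. The Filter Set Generator (inputs: size, smoothing, scope) produces a Synthetic Filter Set. The Trace Generator (inputs: scale, locality) produces an Input Header Trace.]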

Filter Set Analyzer extracts relevant statistics
and probability distributions, generates
parameter file
Guided by high-level models developed from filter
set analysis
Parameter files provide complete anonymity of
addresses
Filter Set Generator produces a synthetic filter
set retaining characteristics specified by input
parameter file
High-level adjustments provide control over
filter set size and composition
Trace Generator creates a sequence of packet
headers to exercise the input filter set
High-level adjustments provide control over trace
size and locality of reference

11
Packet Classification Taxonomy
[Figure: taxonomy diagram placing techniques by high-level approach.
Categories: Exhaustive Search, Decomposition, Decision Tree, Tuple Space
Techniques shown: Linear Search, TCAM, E-TCAM, Parallel BV, ABV, Crossproducting, RFC, P2C, DCFL, Grid-of-Tries, EGT, FIS Trees, HiCuts, HyperCuts, Modular P. Class, Tuple Space, Pruned Tuple Space, Rectangle Search, Conflict-Free Rectangle Search]
12
Distributed Crossproducting of Field Labels

Motivated by observed structure of real filter
sets
Number of unique field values specified by
filters in the filter set is small relative to
the number of filters in the filter set
Number of unique field values matched by any
packet is small and remains relatively constant
for filter sets of various sizes
Leverage capabilities of current generation of
ASICs and FPGAs
Hundreds of embedded multi-port memory blocks
(> 1 MB total)
Millions of logic gates and high clock speeds
(> 200 MHz)
Transform multi-field searching problem into a
distributed set membership query (set
intersection)
Parallel field-specific search engines
(decomposition)
Aggregation pipeline allows a new search to start
on each pipeline cycle
Scales to large filter sets and filters
classifying on additional packet fields
Enabling technology for next-generation services

13
Field Labeling
Form sets of unique filter fields
Label each unique filter field with a locally unique label
Count values support dynamic updates
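
A minimal sketch of the labeling idea follows: each unique field value gets a label that is unique within its field, and a reference count tracks how many filters use it so that filters can be added and removed incrementally. The class name, the integer labels, and the reuse of freed labels are assumptions made for the example, not details taken from the slides.

```python
class FieldLabeler:
    """Assigns a locally unique label to each unique field value, with counts."""

    def __init__(self):
        self.labels = {}     # field value -> label
        self.counts = {}     # label -> number of filters using that value
        self.free = []       # labels released by removals (assumed reusable)
        self.next_label = 0

    def add(self, value):
        """Register one more filter using this field value; return its label."""
        if value not in self.labels:
            if self.free:
                label = self.free.pop()
            else:
                label = self.next_label
                self.next_label += 1
            self.labels[value] = label
            self.counts[label] = 0
        label = self.labels[value]
        self.counts[label] += 1
        return label

    def remove(self, value):
        """Unregister one filter; drop the value when no filter uses it."""
        label = self.labels[value]
        self.counts[label] -= 1
        if self.counts[label] == 0:
            del self.labels[value]
            del self.counts[label]
            self.free.append(label)
```

Adding the same prefix twice returns the same label with a count of 2; the label is released only after both filters are removed.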
14
Field Combinations & Meta-Labeling
Generalizes to any combination of d filter fields
15
DCFL Preliminaries

Partition the filters in the filter set into
fields
Partition each packet header into corresponding
fields
Let F_i be the set of unique field values for
filter field i that appear in one or more filters
in the filter set
Let F_i(x) ⊆ F_i be the subset of filter field
values in F_i matched by a packet with the value x
in header field i
Let F_{i,j} be the set of unique filter field value
pairs for fields i and j in the filter set, i.e.
if (u,v) ∈ F_{i,j} there is some filter or filters
in the set with u in field i and v in field j
Let F_{i,j}(x,y) ⊆ F_{i,j} be the subset of filter
field value pairs in F_{i,j} matched by a packet
with the value x in header field i and y in
header field j
This can be extended to higher-order
combinations, such as set F_{i,j,k} and subset
F_{i,j,k}(x,y,z), etc.
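
To make the difference between the set of unique field values and the subset matched by a packet concrete, here is a small sketch for an address-prefix field. The prefix strings are hypothetical, and the linear scan stands in for whatever field-specific search engine would be used in practice.

```python
def matching_prefixes(unique_prefixes, x):
    """Return the prefixes in the field's unique-value set that match bit string x.

    Prefixes are written as bit strings with a trailing '*', e.g. '110*';
    '*' alone is the wildcard and matches everything.
    """
    return {p for p in unique_prefixes if x.startswith(p.rstrip("*"))}

# Hypothetical set of unique source address prefixes in a filter set.
F_SA = {"*", "1*", "110*", "0*", "1111*"}

# Subset matched by a packet whose source address is 1101 0100.
matched = matching_prefixes(F_SA, "11010100")
```

Here `matched` contains `*`, `1*`, and `110*`: three of the five unique values, which is the kind of small matching subset the algorithm relies on.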

16
DCFL Search

In parallel, find subsets F_1(w), F_2(x), F_3(y),
and F_4(z)
In parallel, find subsets F_{1,2}(w,x) and F_{3,4}(y,z)
as follows
Let F_query(w,x) be the set of possible field
value pairs formed from the crossproduct of F_1(w)
and F_2(x)
For each field value pair in F_query(w,x), query
for set membership in F_{1,2}; if the field value
pair is in set F_{1,2}, add it to set F_{1,2}(w,x)
Perform the symmetric operations to find subset
F_{3,4}(y,z)
Find subset F_{1,2,3,4}(w,x,y,z) by querying set
F_{1,2,3,4} with the field value combinations formed
from the crossproduct of F_{1,2}(w,x) and F_{3,4}(y,z)
Select the highest priority exclusive filter and
r highest priority non-exclusive filters in
F_{1,2,3,4}(w,x,y,z)

17
Example DCFL Search
Protocol: UDP
Source Address: 1101 0100
Destination Address: 1010 1101
Destination Port: 3
18
Example DCFL Search
Aggregation Node
Aggregation Node
F_{DP,PR} = {(a,a), (b,b), (a,c), (c,a)}
F_{SA,DA} = {(a,a), (b,a), (c,b), (d,c), (e,d), (f,c), (g,c), (g,e), (a,e)}
19
Example DCFL Search
F_{DP,PR}(3, UDP) = {(a,c), (b,b)}
F_{SA,DA}(1101 0100, 1010 1101) = {(a,e), (c,b), (d,c), (f,c)}
Aggregation Node
F_{SA,DA,DP,PR} = {(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)}
20
Optimizing the Aggregation Network

Pipeline of aggregation nodes → performance
bottleneck is node with largest query set size,
|F_query|
Query set size, |F_query|, determines the number
of sequential memory accesses (SMA) performed at
the node
Freedom to aggregate fields in any order allows
various network constructions
|F_query| varies with network construction
Cost of aggregation network, G_i, is the largest
worst-case query set size for all nodes in the
aggregation network
cost(G_i) = max |F_query| ∀ F_1, ..., F_{1,...,d} ∈ G_i
Select the minimum cost aggregation network
G_min = G : cost(G) = min cost(G_i) ∀ i
G_min determined by the structure of the filter set

21
Aggregation Network Cost Example
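F_SA(1101 0100) = {a, c, d, f}
F_DA(1010 1101) = {b, c, e}
F_DP(3) = {a, b}
F_PR(UDP) = {b, c}
Aggregation node (DP,PR):
F_{DP,PR} = {(a,a), (b,b), (a,c), (c,a)}
F_query = {(a,b), (a,c), (b,b), (b,c)}
F_{DP,PR}(3, UDP) = {(a,c), (b,b)}
Aggregation node (SA,DA):
F_{SA,DA} = {(a,a), (b,a), (c,b), (d,c), (e,d), (f,c), (g,c), (g,e), (a,e)}
F_query = {(a,b), (a,c), (a,e), (c,b), (c,c), (c,e), (d,b), (d,c), (d,e), (f,b), (f,c), (f,e)}
F_{SA,DA}(1101 0100, 1010 1101) = {(a,e), (c,b), (d,c), (f,c)}
Aggregation node (SA,DA,DP,PR):
F_query = {(a,e,a,c), (a,e,b,b), (c,b,a,c), (c,b,b,b), (d,c,a,c), (d,c,b,b), (f,c,a,c), (f,c,b,b)}
F_{SA,DA,DP,PR} = {(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)}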
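
The search procedure and the example above can be traced with a few lines of set arithmetic. This is a sketch using Python sets in place of the hardware set membership structures; the function names `project` and `aggregate` are mine, and the per-field label sets are taken as given from the example rather than computed by field search engines.

```python
from itertools import product

# Filter set from the example, as (SA, DA, DP, PR) label tuples.
FILTERS = [
    ("a", "a", "a", "a"), ("b", "a", "b", "b"), ("c", "b", "b", "b"),
    ("d", "c", "a", "c"), ("e", "d", "a", "c"), ("f", "c", "c", "a"),
    ("g", "c", "c", "a"), ("g", "e", "a", "c"), ("a", "e", "a", "c"),
]

def project(filters, fields):
    """Unique label combinations for the given field positions."""
    return {tuple(f[i] for i in fields) for f in filters}

def aggregate(left, right, members):
    """Crossproduct two label-combination sets, keep those present in members."""
    query = {l + r for l, r in product(left, right)}
    return query & members

# Field search results for the example packet (labels as 1-tuples).
sa = {("a",), ("c",), ("d",), ("f",)}
da = {("b",), ("c",), ("e",)}
dp = {("a",), ("b",)}
pr = {("b",), ("c",)}

sa_da = aggregate(sa, da, project(FILTERS, (0, 1)))
dp_pr = aggregate(dp, pr, project(FILTERS, (2, 3)))
matching = aggregate(sa_da, dp_pr, project(FILTERS, (0, 1, 2, 3)))
```

The intermediate sets agree with the example: four address pairs and two port/protocol pairs survive the first stage. The final stage leaves the label combinations of the matching filters, from which the highest priority filters would then be selected.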
22
Aggregation Network Cost Example
F_SA(1101 0100) = {a, c, d, f}
F_DA(1010 1101) = {b, c, e}
F_DP(3) = {a, b}
F_PR(UDP) = {b, c}
Aggregation node (DA,DP):
F_{DA,DP} = {(a,a), (a,b), (b,b), (c,a), (d,a), (c,c), (e,a)}
F_query = {(b,a), (b,b), (c,a), (c,b), (e,a), (e,b)}
F_{DA,DP}(1010 1101, 3) = {(b,b), (c,a), (e,a)}
Aggregation node (SA,PR):
F_{SA,PR} = {(a,a), (d,c), (g,a), (b,b), (e,c), (g,c), (c,b), (f,a), (a,c)}
F_query = {(a,b), (a,c), (c,b), (c,c), (d,b), (d,c), (f,b), (f,c)}
F_{SA,PR}(1101 0100, UDP) = {(a,c), (c,b), (d,c)}
Aggregation node (SA,DA,DP,PR):
F_query = {(a,b,b,c), (a,c,a,c), (a,e,a,c), (c,b,b,b), (c,c,a,b), (c,e,a,b), (d,b,b,c), (d,c,a,c), (d,e,a,c)}
F_{SA,DA,DP,PR} = {(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)}
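
The two example networks can be compared by their largest query set size. The sketch below uses the set sizes visible in the two examples for this one packet; the cost defined in the talk is over worst-case query set sizes, so treat this as an illustration of the comparison rather than the full optimization. The function name and the dictionary keys are made up for the example.

```python
def network_cost(nodes):
    """Largest query set size over all aggregation nodes.

    Each node is given as the sizes of its two input sets. The query set is
    their crossproduct, so its size is the product of the two sizes.
    """
    return max(left * right for left, right in nodes)

# Input set sizes at each node, read off the two examples.
NETWORKS = {
    "(SA,DA) and (DP,PR) first": [(4, 3), (2, 2), (4, 2)],
    "(SA,PR) and (DA,DP) first": [(4, 2), (3, 2), (3, 3)],
}

costs = {name: network_cost(nodes) for name, nodes in NETWORKS.items()}
best = min(costs, key=costs.get)
```

For this packet the first construction has a bottleneck of 12 queries at the address node, while the second peaks at 9 in the final node, so the second would be preferred.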
23
DCFL Optimizations

Field Splitting
Partition filter fields in order to limit the
maximum number of matching field labels for each
packet field
Limit specified by a threshold, t
Address prefixes: O(N log W) algorithm finds
sub-prefix lengths to limit prefix nesting in
each sub-tree to t
Port ranges: O(N log N) algorithm sorts port
ranges into subsets to limit range overlap in
each subset to t
Each split adds an aggregation node to the
network
Set membership data structures
Minimize the number of sequential memory accesses
(SMA) per query
Explored three types of data structures: Bloom
Filter Arrays, Field Label Indexing, Meta-Label
Indexing
Each aggregation node may employ the data
structure that minimizes the worst-case SMA
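
One way to picture field splitting for port ranges is a greedy pass that places each range into the first subset where it keeps the overlap at or below the threshold t. This is a sketch of the general idea, not the algorithm from the dissertation: the first-fit rule is my assumption, it makes no claim about using the minimum number of subsets, and the example ranges are hypothetical.

```python
import heapq

def split_ranges(ranges, t):
    """Place inclusive (lo, hi) ranges into bins so at most t overlap in any bin."""
    bins = []  # each bin: {"ranges": [...], "active": min-heap of end points}
    for lo, hi in sorted(ranges):
        for b in bins:
            # Ranges ending before lo can no longer overlap anything that follows.
            while b["active"] and b["active"][0] < lo:
                heapq.heappop(b["active"])
            if len(b["active"]) < t:
                break
        else:
            # No existing bin has room at this point; open a new one.
            b = {"ranges": [], "active": []}
            bins.append(b)
        b["ranges"].append((lo, hi))
        heapq.heappush(b["active"], hi)
    return [b["ranges"] for b in bins]

# Hypothetical port ranges: wildcard, well-known ports, ephemeral ports, exact ports.
PORT_RANGES = [(0, 65535), (0, 1023), (1024, 65535), (80, 80), (20, 21), (53, 53)]
subsets = split_ranges(PORT_RANGES, t=2)
```

Because ranges are processed in order of their start point, checking the overlap at each new range's start is enough to bound the overlap everywhere in the subset. As the slide notes, each extra subset adds an aggregation node to the network.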

24
Real Filter Sets Results

12 real filter sets collected from ISPs,
researchers, and a network equipment vendor
Size range: 68 to 4557 filters
Optimized ASIC implementation could achieve 100M
searches/sec with storage for over 200k filters
Assuming 500MHz, dual-port embedded memory,
288-bit word

25
Field Splitting Results

Primary benefit: achieve higher performance with
better memory efficiency
Reduces required memory word size for a given
performance target
Point of diminishing returns for small thresholds
and large word sizes

26
ClassBench Results

Generated large synthetic filter sets using
parameter files from real filter sets
SMA performance is less sensitive to memory word
size
Achieve better memory efficiency due to smaller
required word size
Provides for easier implementation/management as
tuning W is less critical

27
ClassBench Results

Generated synthetic filter sets (16k filters) to
examine scalability to additional filter fields
Half of all filters specifying TCP or UDP specify
a non-wildcard extra field
Negligible effect on SMA performance
Each additional field increases memory
requirements by 10B/filter

28
Contributions

Design, hardware implementation, and evaluation
of a high-performance Longest Prefix Matching
(LPM) search engine
Made VHDL code for Fast Internet Protocol Lookup
(FIPL) search engine publicly available
Integrated FIPL into the Packet Processor of the
Network Services Platform
Survey and taxonomy of packet classification
techniques
Identified opportunities for contributions
Analysis of real filter sets
Provided insight into the impetuses governing
filter composition
Identified various properties that can be
leveraged for faster searches
ClassBench packet classification benchmarking
tool suite
Eliminated significant access barrier to
realistic test vectors for the research community
Tools and parameter files are publicly available
and in use by several researchers
Distributed Crossproducting of Field Labels
(DCFL)
Novel packet classification algorithm that scales
to support larger filter sets and filters
classifying on additional packet fields

29
Future Directions

Promotion and refinement of ClassBench
Develop formal benchmarking methodology
Consensus building and standardizing efforts
could be taken up by the Internet Engineering
Task Force (IETF)
Further optimization of DCFL
New set membership (set intersection) data
structures for aggregation nodes
Hardware implementation of DCFL
Architecture for dynamic aggregation network
restructuring
Application of DCFL to deep packet inspection
and other hybrid searching problems
Longest Prefix Matching + Range Matching +
String Matching

30
Acknowledgements

Advisors & committee
Jon Turner (research advisor)
William D. Richard (academic advisor)
Dan Fuhrmann
John Lockwood
Fred Rosenberger
FIPL contributors
Will Eatherton & Zubin Dittia (Tree Bitmap)
John DeHart (testing)
Tucker Williams, Ed Spitznagel, Todd Sproull
(software)
ClassBench testers
Ed Spitznagel & Sailesh Kumar

ARL faculty & staff
CSE faculty & staff
IBM Zurich Research Lab
Marcel Waldvogel
Andreas Herkersdorf
Fellow ARL students
Lunchtime debate club
Parents
Friends
Sara Taylor (lovely wife)

31
Thank you. Questions?

Only the curious will learn and only the resolute
overcome the obstacle to learning. The quest
quotient has always excited me more than the
intelligence quotient.
Eugene S. Wilson, Dean of Admissions, Amherst

32
Supplementary Slides
33
Internet Architecture
34
Reducing Query Set Size

Apply Field Splitting to reduce field overlap
(field label set size) to a given threshold, t
Prefixes → choose sub-prefix lengths such that
prefix nesting remains below threshold
Port ranges → distribute ranges into minimum
number of bins such that range overlap remains
below threshold

Restrict aggregation network construction by
requiring that each aggregation node operate on
at least one field label set
Provides control over query set size at
each aggregation node
Utilize delay buffers for pipelined implementation

35
DCFL Architecture w/ Field Splitting
36
Bloom Filter Array Aggregation Node

Bloom filters use an m-bit vector to efficiently
represent a set
Each element sets k bits in the vector
False positive probability is tunable
Bloom Filter Array utilizes pre-filter hash
function to distribute elements over W Bloom
filters
Minimizes SMA per query
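
For readers unfamiliar with Bloom filters, here is a minimal single-filter sketch: an m-bit vector in which each element sets k bits, with membership tests that can give false positives but never false negatives. It does not model the Bloom Filter Array or its pre-filter hash. Deriving the k positions from salted SHA-256 digests is an assumption chosen for simplicity.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m      # number of bits in the vector
        self.k = k      # number of bits set per element
        self.bits = 0   # the m-bit vector, stored as a Python int

    def _positions(self, item):
        # Assumption: k hash functions are simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True if all k bits are set: either a member or a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

# Store the label pairs of one aggregation node and query for membership.
bf = BloomFilter(m=256, k=3)
for pair in [("a", "e"), ("c", "b"), ("d", "c"), ("f", "c")]:
    bf.add(pair)
```

After n insertions the false positive probability is approximately (1 - e^(-kn/m))^k, which is how m and k tune the error rate.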

37
Meta-Label Indexing Aggregation Node

Field value combinations in F_{1,...,i} can be
identified by combination of meta-label for
fields (1,...,i-1) and the field label for field i
Sort label combinations into bins using
meta-label
For each bin, construct list of field labels and
new meta-label
Store lists in array A_i indexed by meta-label
Multi-way match logic compares N label pairs per
memory access
Length() is an array storing the lengths of the
lists in A_i in decreasing order
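
The indexing scheme described above can be sketched as an array of lists keyed by the incoming meta-label. This is a software approximation under stated assumptions: meta-labels are small non-negative integers, new meta-labels are assigned by enumerating the sorted combinations, and the multi-way match logic and the Length() array are not modeled.

```python
def build_index(combos):
    """Build the array for one aggregation node, indexed by meta-label.

    combos: (meta_label, field_label) pairs present in the filter set.
    Each array entry is a list of (field_label, new_meta_label) pairs.
    """
    combos = sorted(set(combos))
    size = max(meta for meta, _ in combos) + 1
    index = [[] for _ in range(size)]
    for new_meta, (meta, label) in enumerate(combos):
        index[meta].append((label, new_meta))
    return index

def query(index, meta_labels, field_labels):
    """Return the new meta-labels of all (meta, field) combinations present."""
    result = set()
    for meta in meta_labels:
        for label, new_meta in index[meta]:   # one list fetch per meta-label
            if label in field_labels:
                result.add(new_meta)
    return result

# Hypothetical node contents: meta-labels 0..2 combined with field labels.
A = build_index([(0, "a"), (0, "c"), (1, "b"), (2, "a")])
found = query(A, meta_labels={0, 2}, field_labels={"a", "b"})
```

Each incoming meta-label selects one list, so the number of memory accesses depends on the number of matching meta-labels and the list lengths rather than on the full crossproduct.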

38
DCFL Implementation Architecture
39
Related Work

Ternary Content Addressable Memory (TCAM)
100 to 200 million searches per second
Requires range to prefix conversion
For IP 5-tuple, each filter may require 900
entries (typically 2 to 7)
Consumes 3 µW per bit (150x more than SRAM)
Extended TCAM (E-TCAM) and Partitioned Encoded
Search of TCAM (PEST) utilize partitioning
algorithms to reduce power consumption by over
95% (Spitznagel, Taylor, Turner)
Does not address scalability to classify on
additional packet fields
Recursive Flow Classification (RFC) (Gupta,
McKeown)
Performs independent searches on chunks of the
packet header
Performs a multi-stage aggregation utilizing
equivalence classes
HyperCuts (Singh, Baboescu, Varghese, Wang)
Builds a decision tree by partitioning the filter
set
Utilizes uniform partitions and indexing to allow
each decision tree node to make partitions in
multiple dimensions
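
The range-to-prefix conversion mentioned for TCAM above can be sketched as follows. The function covers an inclusive range with aligned power-of-two blocks, each of which corresponds to one prefix. The function name and the bit-string output format are my own choices. The 900-entry figure is consistent with two 16-bit port fields that each expand to the worst case of 30 prefixes.

```python
def range_to_prefixes(lo, hi, width=16):
    """Cover the inclusive range [lo, hi] with prefixes of a width-bit field."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        fixed = width - (size.bit_length() - 1)   # number of specified bits
        bits = format(lo >> (width - fixed), f"0{fixed}b") if fixed else ""
        prefixes.append(bits + ("*" if fixed < width else ""))
        lo += size
    return prefixes

# Worst case for a 16-bit port field: the range [1, 65534].
worst = range_to_prefixes(1, 65534)
entries_for_two_ports = len(worst) ** 2
```

A filter with worst-case ranges in both the source and destination port fields therefore needs the crossproduct of the two prefix lists, while common ranges such as ports above 1023 expand to only a handful of prefixes.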