Computer Science is no more about computers - PowerPoint PPT Presentation

1
  • Computer Science is no more about computers than astronomy is about telescopes.
  • Edsger W. Dijkstra

2
Models, Algorithms, and Architectures for Scalable
Packet Classification
  • David E. Taylor
  • Dissertation Defense
  • Department of Computer Science & Engineering
  • 22 July 2004

3
Internet Protocol (IP)
4
IP Route Lookup
Longest Prefix Match (LPM) using destination IP
address in packet header
5
IP Route Lookup
Longest Prefix Match (LPM) using destination IP
address in packet header
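Longest prefix matching can be illustrated with an uncompressed binary trie. The sketch below is a minimal illustration, not the FIPL engine: it assumes prefixes are given as bit strings, uses 8-bit addresses (matching the width of the example addresses later in this talk), and the next-hop names are hypothetical.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.next_hop: Optional[str] = None  # set when a stored prefix ends at this node


class BinaryTrie:
    """Uncompressed binary trie for longest prefix matching (illustration only)."""

    def __init__(self, width: int = 32) -> None:
        self.width = width  # address length in bits
        self.root = TrieNode()

    def insert(self, prefix_bits: str, next_hop: str) -> None:
        """Store a prefix given as a bit string, e.g. "1101" for 1101*."""
        node = self.root
        for bit in prefix_bits:
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, address: int) -> Optional[str]:
        """Walk the address MSB-first, remembering the deepest (longest) prefix seen."""
        node = self.root
        best = node.next_hop
        for bit in format(address, f"0{self.width}b"):
            child = node.children.get(bit)
            if child is None:
                break
            node = child
            if node.next_hop is not None:
                best = node.next_hop
        return best


if __name__ == "__main__":
    table = BinaryTrie(width=8)
    table.insert("", "default")
    table.insert("1", "A")
    table.insert("1101", "B")
    table.insert("110101", "C")
    print(table.lookup(0b11010100))  # C: 1*, 1101* and 110101* match; 110101* is longest
    print(table.lookup(0b11011111))  # B
    print(table.lookup(0b00000000))  # default
```

Each lookup costs one memory access per address bit in this form, which is the cost that multi-bit tries such as Tree Bitmap reduce.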
6
Packet Classification
Packet Filter Set
7
Packet Classification Problem
  • Given a packet $P$ containing fields $P_j$ and a set
    of filters $F$ with each filter $F_i$ containing
    fields $F_{ij}$, select the highest priority exclusive
    filter and $r$ highest priority non-exclusive
    filters such that for each matching filter $i$,
    $\forall j$, $F_{ij}$ matches $P_j$
  • Performance tradeoffs are commonly characterized by
    the point location problem in computational geometry
  • For $n$ regions defined in $j$ dimensions, for $j > 3$,
    a point may be located in multi-dimensional space
    in $O(\log n)$ time with $O(n^j)$ space, or in
    $O(\log^{j-1} n)$ time with $O(n)$ space
  • Constraints
  • 31 million lookups per second (10 Gb/s link)
  • Memory and power efficiency
  • Support for fast incremental updates
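As a correctness reference for the problem statement above, a linear search simply tests every filter. This is a minimal sketch under assumptions not stated on the slide: every field is an inclusive integer range, a smaller `priority` value means higher priority, and `Filter`, `prefix_range` and `classify` are hypothetical names. It costs O(N) comparisons per packet, which is why it cannot meet the throughput constraint; the 31 million figure is consistent with back-to-back 40-byte packets on a 10 Gb/s link (10×10^9 / (40 × 8) ≈ 31.25×10^6), although the slide does not state the packet size.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi]; prefixes, exact values and wildcards all map to ranges


@dataclass(frozen=True)
class Filter:
    name: str
    fields: Tuple[Range, ...]  # F_ij for j = 1..d
    priority: int              # assumption: smaller number = higher priority
    exclusive: bool


def prefix_range(bits: str, width: int) -> Range:
    """Inclusive range covered by a prefix such as '1101' in a width-bit field."""
    lo = int(bits, 2) << (width - len(bits)) if bits else 0
    return lo, lo | ((1 << (width - len(bits))) - 1)


def matches(f: Filter, packet: Sequence[int]) -> bool:
    """Filter i matches packet P iff, for all j, F_ij matches P_j."""
    return all(lo <= p <= hi for (lo, hi), p in zip(f.fields, packet))


def classify(filters: Sequence[Filter], packet: Sequence[int],
             r: int) -> Tuple[Optional[Filter], List[Filter]]:
    """Linear search: highest priority exclusive filter plus r best non-exclusive ones."""
    hits = sorted((f for f in filters if matches(f, packet)), key=lambda f: f.priority)
    best_exclusive = next((f for f in hits if f.exclusive), None)
    best_non_exclusive = [f for f in hits if not f.exclusive][:r]
    return best_exclusive, best_non_exclusive


if __name__ == "__main__":
    # Two hypothetical fields: an 8-bit address and a 16-bit port.
    rules = [
        Filter("deny-web", (prefix_range("1101", 8), (80, 80)), 1, True),
        Filter("monitor", (prefix_range("1101", 8), (0, 1023)), 3, False),
        Filter("count-all", (prefix_range("", 8), (0, 65535)), 5, False),
        Filter("permit", (prefix_range("", 8), (0, 65535)), 9, True),
    ]
    exc, non_exc = classify(rules, (0b11010100, 80), r=2)
    print(exc.name, [f.name for f in non_exc])  # deny-web ['monitor', 'count-all']
```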

8
Dissertation Overview
  • Fast Internet Protocol Lookup (FIPL) search
    engine
  • Scalable hardware implementation of a Longest
    Prefix Matching algorithm
  • Survey taxonomy of packet classification
    techniques
  • Frame the body of work according to high-level
    approach
  • Analysis of the structure of real filter sets
  • Identify opportunities for better search
    performance
  • ClassBench tool suite for packet classification
    benchmarking
  • Promote standardized performance evaluation
  • Eliminate access barriers to realistic test
    vectors
  • Distributed Crossproducting of Field Labels
    (DCFL)
  • Leverage structure of real filter sets and
    capabilities of current hardware
  • Achieve comparable search performance to TCAM
  • Scale to support large filter sets and filters
    classifying on additional packet fields

9
Fast IP Lookup (FIPL) Engine
  • High-performance implementation of Eatherton &
    Dittia's Tree Bitmap algorithm
  • Compressed multi-bit trie requires 6 to 8 bytes
    of memory per stored prefix
  • Scalable architecture leverages memory
    interleaving to allow multiple search engines to
    share a memory interface
  • Each FIPL engine consumes less than 1% of logic
    resources in a commodity FPGA
  • Evaluated performance using backbone route tables
    and open research systems
  • Robust lookup performance under update load
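The Tree Bitmap idea behind FIPL can be sketched in software: each multi-bit trie node keeps an internal bitmap of the prefixes that end inside it and an extending bitmap of its children, and counts set bits to index compact result and child arrays, which is what keeps the structure small. This sketch assumes a stride of 4 and 8-bit addresses and is not the FIPL hardware design; memory layout, node packing and the 6 to 8 bytes per prefix figure are not modeled.

```python
from typing import List, Optional

STRIDE = 4  # hypothetical stride: each node consumes 4 address bits


class TBNode:
    """Tree Bitmap node: two bitmaps plus compact (popcount-indexed) arrays."""

    def __init__(self) -> None:
        self.internal = 0                   # bit 2**l - 1 + v set => prefix of length l < STRIDE, value v
        self.external = 0                   # bit c set => a child exists for STRIDE-bit chunk value c
        self.results: List[str] = []        # next hops, ordered by internal-bitmap bit position
        self.children: List["TBNode"] = []  # children, ordered by external-bitmap bit position


def ones_below(bitmap: int, pos: int) -> int:
    """Number of set bits strictly below pos, i.e. the index into a compact array."""
    return bin(bitmap & ((1 << pos) - 1)).count("1")


def insert(root: TBNode, bits: str, hop: str) -> None:
    node = root
    while len(bits) >= STRIDE:  # each full stride descends to (or creates) a child node
        chunk = int(bits[:STRIDE], 2)
        idx = ones_below(node.external, chunk)
        if not (node.external >> chunk) & 1:
            node.external |= 1 << chunk
            node.children.insert(idx, TBNode())
        node, bits = node.children[idx], bits[STRIDE:]
    pos = (1 << len(bits)) - 1 + (int(bits, 2) if bits else 0)
    idx = ones_below(node.internal, pos)
    if (node.internal >> pos) & 1:
        node.results[idx] = hop  # prefix already stored: replace its next hop
    else:
        node.internal |= 1 << pos
        node.results.insert(idx, hop)


def lookup(root: TBNode, address: int, width: int) -> Optional[str]:
    """Longest prefix match; assumes width is a multiple of STRIDE."""
    node, best, remaining = root, None, width
    while True:
        if remaining == 0:  # all bits consumed: only a length-0 prefix can be stored here
            if node.internal & 1:
                best = node.results[0]
            return best
        chunk = (address >> (remaining - STRIDE)) & ((1 << STRIDE) - 1)
        for length in range(STRIDE - 1, -1, -1):  # longest prefix inside the node first
            pos = (1 << length) - 1 + (chunk >> (STRIDE - length))
            if (node.internal >> pos) & 1:
                best = node.results[ones_below(node.internal, pos)]
                break
        if not (node.external >> chunk) & 1:
            return best
        node = node.children[ones_below(node.external, chunk)]
        remaining -= STRIDE


if __name__ == "__main__":
    root = TBNode()
    for prefix, hop in [("", "default"), ("1", "A"), ("1101", "B"), ("110101", "C"), ("11010100", "D")]:
        insert(root, prefix, hop)
    print(lookup(root, 0b11010111, 8))  # C
```

A prefix whose length is an exact multiple of the stride is stored as the length-0 entry of the child node, which is why the lookup checks bit 0 after descending.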

10
ClassBench
[Figure: ClassBench tool flow. Seed Filter Set → Filter Set Analyzer → Set of Benchmark Parameter Files → Filter Set Generator (size, smoothing, scope) → Synthetic Filter Set → Trace Generator (scale, locality) → Input Header Trace]
  • Filter Set Analyzer extracts relevant statistics
    and probability distributions, generates
    parameter file
  • Guided by high-level models developed from filter
    set analysis
  • Parameter files provide complete anonymity of
    addresses
  • Filter Set Generator produces a synthetic filter
    set retaining characteristics specified by input
    parameter file
  • High-level adjustments provide control over
    filter set size and composition
  • Trace Generator creates a sequence of packet
    headers to exercise the input filter set
  • High-level adjustments provide control over trace
    size and locality of reference
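To illustrate the kind of output a trace generator produces, here is a toy generator that draws headers matching randomly chosen filters. It is hypothetical and does not reproduce ClassBench's actual parameters or distributions: encoding filters as per-field ranges and using a uniform burst length as a stand-in for locality of reference are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi] per header field


def generate_trace(filters: Sequence[Sequence[Range]], size: int,
                   max_burst: int, seed: int = 0) -> List[Tuple[int, ...]]:
    """Toy header-trace generator (hypothetical; not ClassBench's actual model).

    Each burst picks a filter uniformly at random, draws one header that matches
    it (a random point inside every field range), and repeats that header
    1..max_burst times as a crude stand-in for locality of reference.
    """
    rng = random.Random(seed)  # seeded so the trace is reproducible
    trace: List[Tuple[int, ...]] = []
    while len(trace) < size:
        fields = rng.choice(filters)
        header = tuple(rng.randint(lo, hi) for lo, hi in fields)
        burst = rng.randint(1, max_burst)
        trace.extend([header] * min(burst, size - len(trace)))
    return trace


if __name__ == "__main__":
    # Hypothetical two-field filters: (8-bit address range, 16-bit port range).
    seed_filters = [((208, 223), (80, 80)), ((0, 255), (0, 1023))]
    for header in generate_trace(seed_filters, size=6, max_burst=3, seed=1):
        print(header)
```

Here `size` plays the role of a scale knob and `max_burst` a locality knob; larger bursts give more repeated headers.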

11
Packet Classification Taxonomy
[Figure: taxonomy of packet classification techniques. High-level approaches: Exhaustive Search, Decision Tree, Decomposition, Tuple Space. Techniques shown: Linear Search, TCAM, E-TCAM, Grid-of-Tries, EGT, HiCuts, HyperCuts, FIS Trees, Modular P. Class, Crossproducting, Parallel BV, ABV, RFC, P2C, DCFL, Tuple Space, Pruned Tuple Space, Rectangle Search, Conflict-Free Rectangle Search]
12
Distributed Crossproducting of Field Labels
  • Motivated by observed structure of real filter
    sets
  • Number of unique field values specified by
    filters in the filter set is small relative to
    the number of filters in the filter set
  • Number of unique field values matched by any
    packet is small and remains relatively constant
    for filter sets of various size
  • Leverage capabilities of current generation of
    ASICs and FPGAs
  • Hundreds of embedded multi-port memory blocks
    (> 1 MB total)
  • Millions of logic gates and high clock speeds
    (> 200 MHz)
  • Transform multi-field searching problem into a
    distributed set membership query (set
    intersection)
  • Parallel field-specific search engines
    (decomposition)
  • Aggregation pipeline allows a new search to start
    on each pipeline cycle
  • Scales to large filter sets and filters
    classifying on additional packet fields
  • Enabling technology for next-generation services

13
Field Labeling
  • Form sets of unique filter fields
  • Label each unique filter field with a locally
    unique label
  • Count values support dynamic updates
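Field labeling with counts might look like the following sketch: each unique value of one field receives a locally unique label and a count of the filters using it, so inserting or deleting a filter only adjusts counts unless a value appears or disappears. The class name and the label recycling policy are assumptions for illustration.

```python
from typing import Dict, Hashable, List


class FieldLabeler:
    """Locally unique labels for the unique values of one filter field.

    The per-value count records how many filters use each value, so filters can
    be inserted and deleted incrementally; a label is recycled only when its
    count drops to zero (the recycling policy is an illustrative assumption).
    """

    def __init__(self) -> None:
        self.label_of: Dict[Hashable, int] = {}
        self.count: Dict[Hashable, int] = {}
        self.free_labels: List[int] = []
        self.next_label = 0

    def add(self, value: Hashable) -> int:
        """Register one more filter that uses `value`; return the value's label."""
        if value not in self.label_of:
            if self.free_labels:
                self.label_of[value] = self.free_labels.pop()
            else:
                self.label_of[value] = self.next_label
                self.next_label += 1
            self.count[value] = 0
        self.count[value] += 1
        return self.label_of[value]

    def remove(self, value: Hashable) -> None:
        """Unregister one filter that uses `value`."""
        self.count[value] -= 1
        if self.count[value] == 0:
            del self.count[value]
            self.free_labels.append(self.label_of.pop(value))


if __name__ == "__main__":
    sa = FieldLabeler()
    print([sa.add(p) for p in ["1101*", "1*", "1101*"]])  # [0, 1, 0]
    sa.remove("1*")      # the last filter using 1* is deleted, so label 1 is freed
    print(sa.add("0*"))  # 1
```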
14
Field Combinations & Meta-Labeling
Generalizes to any combination of d filter fields
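Meta-labeling can be sketched by treating each unique label combination as a new symbol. Pairing a meta-label with the next field's label extends the idea to any number of fields. The filters and the numbering order below are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def meta_label(combos: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Assign a locally unique meta-label to each unique label combination,
    numbered in order of first appearance (numbering order is an assumption)."""
    labels: Dict[Tuple[Hashable, Hashable], int] = {}
    for combo in combos:
        labels.setdefault(combo, len(labels))
    return labels


if __name__ == "__main__":
    # Hypothetical filters already reduced to (SA, DA, DP) field labels.
    filters = [(0, 0, 0), (1, 0, 1), (0, 0, 1), (2, 1, 0)]
    sa_da = meta_label((sa, da) for sa, da, _ in filters)
    # A 3-field combination is a (meta-label, field label) pair; repeating this reaches d fields.
    sa_da_dp = meta_label((sa_da[(sa, da)], dp) for sa, da, dp in filters)
    print(sa_da)     # {(0, 0): 0, (1, 0): 1, (2, 1): 2}
    print(sa_da_dp)  # {(0, 0): 0, (1, 1): 1, (0, 1): 2, (2, 0): 3}
```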
15
DCFL Preliminaries
  • Partition the filters in the filter set into
    fields
  • Partition each packet header into corresponding
    fields
  • Let $F_i$ be the set of unique field values for
    filter field $i$ that appear in one or more filters
    in the filter set
  • Let $F_i(x) \subseteq F_i$ be the subset of filter field
    values in $F_i$ matched by a packet with the value $x$
    in header field $i$
  • Let $F_{i,j}$ be the set of unique filter field value
    pairs for fields $i$ and $j$ in the filter set, i.e.,
    if $(u,v) \in F_{i,j}$ there is some filter or filters
    in the set with $u$ in field $i$ and $v$ in field $j$
  • Let $F_{i,j}(x,y) \subseteq F_{i,j}$ be the subset of filter
    field value pairs in $F_{i,j}$ matched by a packet
    with the value $x$ in header field $i$ and $y$ in
    header field $j$
  • This can be extended to higher-order
    combinations, such as set $F_{i,j,k}$ and subset
    $F_{i,j,k}(x,y,z)$, etc.

16
DCFL Search
  • In parallel, find subsets $F_1(w)$, $F_2(x)$, $F_3(y)$,
    and $F_4(z)$
  • In parallel, find subsets $F_{1,2}(w,x)$ and $F_{3,4}(y,z)$
    as follows
  • Let $F_{query}(w,x)$ be the set of possible field
    value pairs formed from the crossproduct of $F_1(w)$
    and $F_2(x)$
  • For each field value pair in $F_{query}(w,x)$, query
    for set membership in $F_{1,2}$; if the field value
    pair is in set $F_{1,2}$, add it to set $F_{1,2}(w,x)$
  • Perform the symmetric operations to find subset
    $F_{3,4}(y,z)$
  • Find subset $F_{1,2,3,4}(w,x,y,z)$ by querying set
    $F_{1,2,3,4}$ with the field value combinations formed
    from the crossproduct of $F_{1,2}(w,x)$ and $F_{3,4}(y,z)$
  • Select the highest priority exclusive filter and
    $r$ highest priority non-exclusive filters in
    $F_{1,2,3,4}(w,x,y,z)$
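The search above can be replayed on the example worked through in slides 17 to 19. This sketch is sequential (the two first-level nodes are independent and would run in parallel in hardware), stores full label tuples in each node instead of meta-labels, and stops before the final priority selection because the slides do not give filter priorities. The helper names are hypothetical.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Combo = Tuple[str, ...]


def labels(*groups: str) -> Set[Combo]:
    """Shorthand: labels("ae", "cb") -> {("a", "e"), ("c", "b")}."""
    return {tuple(g) for g in groups}


def aggregate(left: Iterable[Combo], right: Iterable[Combo], stored: Set[Combo]) -> Set[Combo]:
    """One aggregation node: form the crossproduct F_query of its two inputs and
    keep the combinations that are members of the node's stored set."""
    return {lhs + rhs for lhs, rhs in product(left, right) if lhs + rhs in stored}


# Field search results for the example packet: F_SA(x), F_DA(x), F_DP(x), F_PR(x).
F_SA, F_DA, F_DP, F_PR = labels(*"acdf"), labels(*"bce"), labels(*"ab"), labels(*"bc")

# Sets stored in the aggregation nodes, as listed on the slides.
STORED_SA_DA = labels("aa", "ba", "cb", "dc", "ed", "fc", "gc", "ge", "ae")
STORED_DP_PR = labels("aa", "bb", "ac", "ca")
STORED_ALL = labels("aaaa", "babb", "cbbb", "dcac", "edac", "fcca", "gcca", "geac", "aeac")

f_sa_da = aggregate(F_SA, F_DA, STORED_SA_DA)
f_dp_pr = aggregate(F_DP, F_PR, STORED_DP_PR)
matching = aggregate(f_sa_da, f_dp_pr, STORED_ALL)

if __name__ == "__main__":
    print(sorted(f_sa_da))   # [('a', 'e'), ('c', 'b'), ('d', 'c'), ('f', 'c')]
    print(sorted(f_dp_pr))   # [('a', 'c'), ('b', 'b')]
    print(sorted(matching))  # [('a', 'e', 'a', 'c'), ('c', 'b', 'b', 'b'), ('d', 'c', 'a', 'c')]
```

Three of the nine filters match this packet; a real implementation would then pick among them by priority.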

17
Example DCFL Search
Protocol UDP
Source Address 1101 0100
Destination Address 1010 1101
Destination Port 3
18
Example DCFL Search
Aggregation node sets:
$F_{DP,PR} = \{(a,a), (b,b), (a,c), (c,a)\}$
$F_{SA,DA} = \{(a,a), (b,a), (c,b), (d,c), (e,d), (f,c), (g,c), (g,e), (a,e)\}$
19
Example DCFL Search
$F_{DP,PR}(3, \mathrm{UDP}) = \{(a,c), (b,b)\}$
$F_{SA,DA}(1101\ 0100, 1010\ 1101) = \{(a,e), (c,b), (d,c), (f,c)\}$
Final aggregation node set:
$F_{SA,DA,DP,PR} = \{(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)\}$
20
Optimizing the Aggregation Network
  • Pipeline of aggregation nodes → performance
    bottleneck is the node with the largest query set
    size, $|F_{query}|$
  • Query set size, $|F_{query}|$, determines the number
    of sequential memory accesses (SMA) performed at
    the node
  • Freedom to aggregate fields in any order allows
    various network constructions
  • $|F_{query}|$ varies with network construction
  • Cost of aggregation network, $G_i$, is the largest
    worst-case query set size over all nodes in the
    aggregation network
  • $\mathrm{cost}(G_i) = \max |F_{query}|$ over all aggregation
    nodes $F_{1,\ldots}, \ldots, F_{1,\ldots,d} \in G_i$
  • Select the minimum cost aggregation network
  • $G_{min} = G$ such that $\mathrm{cost}(G) = \min_i \mathrm{cost}(G_i)$
  • $G_{min}$ is determined by the structure of the filter set

21
Aggregation Network Cost Example
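$F_{SA}(1101\ 0100) = \{a, c, d, f\}$
$F_{DA}(1010\ 1101) = \{b, c, e\}$
$F_{DP}(3) = \{a, b\}$
$F_{PR}(\mathrm{UDP}) = \{b, c\}$
Aggregation node (DP, PR)
  • $F_{query} = \{(a,b), (a,c), (b,b), (b,c)\}$
  • Stored set $F_{DP,PR} = \{(a,a), (b,b), (a,c), (c,a)\}$
  • Matching pairs: $\{(a,c), (b,b)\}$
Aggregation node (SA, DA)
  • $F_{query} = \{(a,b), (a,c), (a,e), (c,b), (c,c), (c,e), (d,b), (d,c), (d,e), (f,b), (f,c), (f,e)\}$
  • Stored set $F_{SA,DA} = \{(a,a), (b,a), (c,b), (d,c), (e,d), (f,c), (g,c), (g,e), (a,e)\}$
  • Matching pairs: $\{(a,e), (c,b), (d,c), (f,c)\}$
Final aggregation node
  • $F_{query} = \{(a,e,a,c), (a,e,b,b), (c,b,a,c), (c,b,b,b), (d,c,a,c), (d,c,b,b), (f,c,a,c), (f,c,b,b)\}$
  • Stored set $F_{SA,DA,DP,PR} = \{(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)\}$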
22
Aggregation Network Cost Example
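$F_{SA}(1101\ 0100) = \{a, c, d, f\}$
$F_{DA}(1010\ 1101) = \{b, c, e\}$
$F_{DP}(3) = \{a, b\}$
$F_{PR}(\mathrm{UDP}) = \{b, c\}$
Aggregation node (SA, PR)
  • $F_{query} = \{(a,b), (a,c), (c,b), (c,c), (d,b), (d,c), (f,b), (f,c)\}$
  • Stored set $F_{SA,PR} = \{(a,a), (d,c), (g,a), (b,b), (e,c), (g,c), (c,b), (f,a), (a,c)\}$
  • Matching pairs: $\{(a,c), (c,b), (d,c)\}$
Aggregation node (DA, DP)
  • $F_{query} = \{(b,a), (b,b), (c,a), (c,b), (e,a), (e,b)\}$
  • Stored set $F_{DA,DP} = \{(a,a), (a,b), (b,b), (c,a), (d,a), (c,c), (e,a)\}$
  • Matching pairs: $\{(b,b), (c,a), (e,a)\}$
Final aggregation node
  • $F_{query} = \{(a,b,b,c), (a,c,a,c), (a,e,a,c), (c,b,b,b), (c,c,a,b), (c,e,a,b), (d,b,b,c), (d,c,a,c), (d,e,a,c)\}$
  • Stored set $F_{SA,DA,DP,PR} = \{(a,a,a,a), (b,a,b,b), (c,b,b,b), (d,c,a,c), (e,d,a,c), (f,c,c,a), (g,c,c,a), (g,e,a,c), (a,e,a,c)\}$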
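The two network constructions in this example can be compared by counting $|F_{query}|$ at each node. For this particular packet, the (SA,DA)/(DP,PR) network has a largest query set of 12 while the (SA,PR)/(DA,DP) network peaks at 9, and both find the same three matching filters. The slide's cost is a worst case over all packets; this sketch evaluates only the example packet, and its helper names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

FIELDS = ("SA", "DA", "DP", "PR")
# Example filter set, written as (SA, DA, DP, PR) label strings.
FILTERS = [tuple(s) for s in ["aaaa", "babb", "cbbb", "dcac", "edac", "fcca", "gcca", "geac", "aeac"]]
# Field labels matched by the example packet: F_SA(1101 0100), F_DA(1010 1101), F_DP(3), F_PR(UDP).
MATCHED = {"SA": "acdf", "DA": "bce", "DP": "ab", "PR": "bc"}

Partial = Tuple[Tuple[str, ...], Set[Tuple[str, ...]]]  # (field names, surviving label combinations)


def stored_set(fields: Tuple[str, ...]) -> Set[Tuple[str, ...]]:
    """Unique label combinations the filter set specifies for these fields."""
    idx = [FIELDS.index(f) for f in fields]
    return {tuple(flt[i] for i in idx) for flt in FILTERS}


def leaf(field: str) -> Partial:
    return (field,), {(x,) for x in MATCHED[field]}


def aggregate(left: Partial, right: Partial) -> Tuple[int, Partial]:
    """Return |F_query| for this node and the combinations that pass the membership test."""
    fields = left[0] + right[0]
    query = [a + b for a, b in product(left[1], right[1])]
    stored = stored_set(fields)
    return len(query), (fields, {q for q in query if q in stored})


def evaluate(pair1: Tuple[str, str], pair2: Tuple[str, str]) -> Tuple[int, List[int], Set[Tuple[str, ...]]]:
    """Two first-level nodes feeding one final node; cost = largest |F_query|."""
    n1, left = aggregate(leaf(pair1[0]), leaf(pair1[1]))
    n2, right = aggregate(leaf(pair2[0]), leaf(pair2[1]))
    n3, (fields, found) = aggregate(left, right)
    # Reorder results to (SA, DA, DP, PR) so different networks can be compared.
    canonical = {tuple(combo[fields.index(f)] for f in FIELDS) for combo in found}
    return max(n1, n2, n3), [n1, n2, n3], canonical


if __name__ == "__main__":
    for net in [(("SA", "DA"), ("DP", "PR")), (("SA", "PR"), ("DA", "DP"))]:
        cost, sizes, found = evaluate(*net)
        print(net, "query sizes:", sizes, "cost:", cost)
    # first network: query sizes [12, 4, 8], cost 12
    # second network: query sizes [8, 6, 9], cost 9
```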
23
DCFL Optimizations
  • Field Splitting
  • Partition filter fields in order to limit the
    maximum number of matching field labels for each
    packet field
  • Limit specified by a threshold, t
  • Address prefixes: O(N log W) algorithm finds
    sub-prefix lengths to limit prefix nesting in
    each sub-tree to t
  • Port ranges: O(N log N) algorithm sorts port
    ranges into subsets to limit range overlap in
    each subset to t
  • Each split adds an aggregation node to the
    network
  • Set membership data structures
  • Minimize the number of sequential memory accesses
    (SMA) per query
  • Explored three types of data structures: Bloom
    Filter Arrays, Field Label Indexing, and Meta-Label
    Indexing
  • Each aggregation node may employ the data
    structure that minimizes the worst-case SMA
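Port range splitting can be sketched as a greedy first-fit over ranges sorted by start point: a range joins the first bin in which fewer than t ranges are still open at its start. This keeps overlap within every bin at most t, but it is an illustrative stand-in, not necessarily the O(N log N) algorithm referenced on the slide, and it does not guarantee the minimum number of bins. The port ranges in the usage example are hypothetical.

```python
import heapq
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive port range [lo, hi]


def split_ranges(ranges: List[Range], t: int) -> List[List[Range]]:
    """Distribute port ranges into bins so that no port value is covered by more
    than t ranges of the same bin (first-fit over ranges sorted by start).

    Illustrative greedy only: it respects the threshold but is not claimed to
    be the algorithm from the slide or to use the minimum number of bins.
    """
    bins: List[List[Range]] = []
    open_ends: List[List[int]] = []  # per bin: min-heap of end points still open
    for lo, hi in sorted(ranges):
        for b, heap in enumerate(open_ends):
            while heap and heap[0] < lo:  # these ranges end before this one starts
                heapq.heappop(heap)
            if len(heap) < t:  # overlap at point lo stays within the threshold
                heapq.heappush(heap, hi)
                bins[b].append((lo, hi))
                break
        else:  # no existing bin can take this range
            bins.append([(lo, hi)])
            open_ends.append([hi])
    return bins


if __name__ == "__main__":
    ports = [(0, 65535), (0, 1023), (80, 80), (443, 443), (1024, 65535), (6000, 6063)]
    for b in split_ranges(ports, t=2):
        print(b)
    # [(0, 1023), (0, 65535), (1024, 65535)]
    # [(80, 80), (443, 443), (6000, 6063)]
```

Checking overlap only at each range's start is enough, because within a bin the deepest overlap of intervals always occurs at some interval's start point.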

24
Real Filter Sets Results
  • 12 real filter sets collected from ISPs,
    researchers, and a network equipment vendor
  • Size range: 68 to 4557 filters
  • Optimized ASIC implementation could achieve 100M
    searches/sec with storage for over 200k filters
  • Assuming a 500 MHz clock, dual-port embedded
    memory, and a 288-bit word

25
Field Splitting Results
  • Primary benefit: achieve higher performance with
    better memory efficiency
  • Reduces the required memory word size for a given
    performance target
  • Point of diminishing returns for small thresholds
    and large word sizes

26
ClassBench Results
  • Generated large synthetic filter sets using
    parameter files from real filter sets
  • SMA performance is less sensitive to memory word
    size
  • Achieve better memory efficiency due to smaller
    required word size
  • Provides for easier implementation/management as
    tuning W is less critical

27
ClassBench Results
  • Generated synthetic filter sets (16k filters) to
    examine scalability to additional filter fields
  • Half of all filters specifying TCP or UDP specify
    a non-wildcard extra field
  • Negligible effect on SMA performance
  • Each additional field increases memory
    requirements by 10B/filter

28
Contributions
  • Design, hardware implementation, and evaluation
    of a high-performance Longest Prefix Matching
    (LPM) search engine
  • Made VHDL code for Fast Internet Protocol Lookup
    (FIPL) search engine publicly available
  • Integrated FIPL into the Packet Processor of the
    Network Services Platform
  • Survey taxonomy of packet classification
    techniques
  • Identified opportunities for contributions
  • Analysis of real filter sets
  • Provided insight into the impetuses governing
    filter composition
  • Identified various properties that can be
    leveraged for faster searches
  • ClassBench packet classification benchmarking
    tool suite
  • Eliminated significant access barrier to
    realistic test vectors for the research community
  • Tools and parameter files are publicly available
    and in use by several researchers
  • Distributed Crossproducting of Field Labels
    (DCFL)
  • Novel packet classification algorithm that scales
    to support larger filter sets and filters
    classifying on additional packet fields

29
Future Directions
  • Promotion and refinement of ClassBench
  • Develop formal benchmarking methodology
  • Consensus building and standardizing efforts
    could be taken up by the Internet Engineering
    Task Force (IETF)
  • Further optimization of DCFL
  • New set membership (set intersection) data
    structures for aggregation nodes
  • Hardware implementation of DCFL
  • Architecture for dynamic aggregation network
    restructuring
  • Application of DCFL to deep packet inspection
    and other hybrid searching problems
  • Longest Prefix Matching, Range Matching, and
    String Matching

30
Acknowledgements
  • Advisors & committee
  • Jon Turner (research advisor)
  • William D. Richard (academic advisor)
  • Dan Fuhrmann
  • John Lockwood
  • Fred Rosenberger
  • FIPL contributors
  • Will Eatherton & Zubin Dittia (Tree Bitmap)
  • John DeHart (testing)
  • Tucker Williams, Ed Spitznagel, Todd Sproull
    (software)
  • ClassBench testers
  • Ed Spitznagel & Sailesh Kumar
  • ARL faculty & staff
  • CSE faculty & staff
  • IBM Zurich Research Lab
  • Marcel Waldvogel
  • Andreas Herkersdorf
  • Fellow ARL students
  • Lunchtime debate club
  • Parents
  • Friends
  • Sara Taylor (lovely wife)

31
Thank you. Questions?
  • Only the curious will learn and only the resolute
    overcome the obstacle to learning. The quest
    quotient has always excited me more than the
    intelligence quotient.
  • Eugene S. Wilson, Dean of Admissions, Amherst

32
Supplementary Slides
33
Internet Architecture
34
Reducing Query Set Size
  • Apply Field Splitting to reduce field overlap
    (field label set size) to a given threshold, t
  • Prefixes → choose sub-prefix lengths such that
    prefix nesting remains below threshold
  • Port ranges → distribute ranges into minimum
    number of bins such that range overlap remains
    below threshold
  • Restrict aggregation network construction by
    requiring that each aggregation node operate on
    at least one field label set
  • Provides control over query set size at
    each aggregation node
  • Utilize delay buffers for pipelined implementation

35
DCFL Architecture w/ Field Splitting
36
Bloom Filter Array Aggregation Node
  • Bloom filters use an m-bit vector to efficiently
    represent a set
  • Each element sets k bits in the vector
  • False positive probability is tunable
  • Bloom Filter Array utilizes a pre-filter hash
    function to distribute elements over W Bloom
    filters
  • Minimizes SMA per query
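A Bloom Filter Array can be sketched as W independent Bloom filters plus a pre-filter hash that picks which one an element belongs to, so a membership query reads a single m-bit vector. For n elements in one filter, the usual false positive estimate is $(1 - e^{-kn/m})^k$. The hash construction (seeded SHA-256) and the parameters W = 8, m = 288, k = 3 are assumptions, with m chosen only to echo the 288-bit word mentioned earlier.

```python
import hashlib
from typing import List, Tuple


def _hash(item: str, seed: int, modulus: int) -> int:
    """Derive a seeded hash by prefixing the seed to the item (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilterArray:
    """W Bloom filters of m bits each; a pre-filter hash assigns every element to
    exactly one of them, so a query reads a single m-bit vector."""

    def __init__(self, w: int, m: int, k: int) -> None:
        self.w, self.m, self.k = w, m, k
        self.vectors = [0] * w  # Python ints used as m-bit vectors

    def _locate(self, item: str) -> Tuple[int, List[int]]:
        which = _hash(item, 0, self.w)                                 # pre-filter hash
        bits = [_hash(item, i, self.m) for i in range(1, self.k + 1)]  # k bit positions
        return which, bits

    def add(self, item: str) -> None:
        which, bits = self._locate(item)
        for b in bits:
            self.vectors[which] |= 1 << b

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True may be a false positive."""
        which, bits = self._locate(item)
        return all((self.vectors[which] >> b) & 1 for b in bits)


if __name__ == "__main__":
    # Hypothetical parameters; label combinations encoded as strings.
    bfa = BloomFilterArray(w=8, m=288, k=3)
    for combo in ["a,a", "b,b", "a,c", "c,a"]:
        bfa.add(combo)
    print(bfa.might_contain("a,c"))  # True (Bloom filters have no false negatives)
```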

37
Meta-Label Indexing Aggregation Node
  • Field value combinations in $F_{1,\ldots,i}$ can be
    identified by the combination of the meta-label for
    fields $(1,\ldots,i-1)$ and the field label for field $i$
  • Sort label combinations into bins using
    meta-label
  • For each bin, construct list of field labels and
    new meta-label
  • Store lists in array $A_i$ indexed by meta-label
  • Multi-way match logic compares N label pairs per
    memory access
  • Length() is an array storing the lengths of the
    lists in $A_i$ in decreasing order
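A meta-label indexing node can be sketched as follows: combinations are sorted into bins by meta-label, each bin's list is stored at A_i[meta-label], and a query scans only the lists for the meta-labels it receives, counting one memory access per N entries read. The function names, the sample combinations, and the order in which new meta-labels are assigned are assumptions.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[int, List[Tuple[int, int]]]  # meta-label -> [(field label, new meta-label), ...]


def build_index(combos: Iterable[Tuple[int, int]]) -> Index:
    """combos: (meta-label for fields 1..i-1, field label for field i).

    Sorting groups the combinations into bins by meta-label; each bin becomes the
    list stored at A_i[meta-label]. New meta-labels are assigned in sorted order
    (an illustrative assumption)."""
    a_i: Index = defaultdict(list)
    for new_meta, (meta, label) in enumerate(sorted(set(combos))):
        a_i[meta].append((label, new_meta))
    return dict(a_i)


def query(a_i: Index, metas: Iterable[int], labels: Set[int], n: int) -> Tuple[Set[int], int]:
    """Return the new meta-labels matched and the number of memory accesses,
    assuming each access reads n (label, meta-label) pairs of one list and the
    match logic compares them all at once."""
    found: Set[int] = set()
    accesses = 0
    for meta in metas:
        entries = a_i.get(meta, [])
        accesses += ceil(len(entries) / n)
        found.update(new for label, new in entries if label in labels)
    return found, accesses


if __name__ == "__main__":
    a_2 = build_index([(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 2)])
    print(a_2)  # {0: [(0, 0), (1, 1), (2, 2)], 1: [(1, 3)], 2: [(0, 4), (2, 5)]}
    print(query(a_2, metas={0, 2}, labels={2}, n=2))  # ({2, 5}, 3)
```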

38
DCFL Implementation Architecture
39
Related Work
  • Ternary Content Addressable Memory (TCAM)
  • 100 to 200 million searches per second
  • Requires range to prefix conversion
  • For IP 5-tuple, each filter may require 900
    entries (typically 2 to 7)
  • Consumes 3 µW per bit (150x more than SRAM)
  • Extended TCAM (E-TCAM) and Partitioned Encoded
    Search of TCAM (PEST) utilize partitioning
    algorithms to reduce power consumption by over
    95% (Spitznagel, Taylor, Turner)
  • Does not address scalability to classify on
    additional packet fields
  • Recursive Flow Classification (RFC) (Gupta,
    McKeown)
  • Performs independent searches on chunks of the
    packet header
  • Performs a multi-stage aggregation utilizing
    equivalence classes
  • HyperCuts (Singh, Baboescu, Varghese, Wang)
  • Builds a decision tree by partitioning the filter
    set
  • Utilizes uniform partitions and indexing to allow
    each decision tree node to make partitions in
    multiple dimensions
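The range-to-prefix conversion that TCAM requires can be sketched with the standard greedy split into aligned power-of-two blocks. For a 16-bit port field the worst case is 2W − 2 = 30 prefixes (for example [1, 65534]), so a filter with two such port ranges can expand to 30 × 30 = 900 entries, which appears consistent with the figure above; the slide itself does not show this derivation.

```python
from typing import List


def range_to_prefixes(lo: int, hi: int, width: int = 16) -> List[str]:
    """Split the inclusive range [lo, hi] of a width-bit field into aligned
    prefixes, written as ternary strings with '*' for wildcard bits."""
    prefixes: List[str] = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo (alignment) and fits in [lo, hi].
        size = lo & -lo if lo else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1  # number of '*' bits
        plen = width - wild           # number of fixed bits
        fixed = format(lo >> wild, f"0{plen}b") if plen else ""
        prefixes.append(fixed + "*" * wild)
        lo += size
    return prefixes


if __name__ == "__main__":
    print(range_to_prefixes(1024, 65535))  # 6 prefixes, from 000001********** to 1***************
    worst = range_to_prefixes(1, 65534)    # worst case for W = 16
    print(len(worst), len(worst) ** 2)     # 30 900
```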