1
Scalable IP Lookup forProgrammable Routers
  • David E. Taylor, Jonathan S. Turner,
  • John W. Lockwood, Todd S. Sproull
  • ARL, Washington University in Saint Louis
  • http://www.arl.wustl.edu
  • David B. Parlour
  • Xilinx, Inc.
  • http://www.xilinx.com
  • IEEE Infocom 2002, New York

2
Motivation & Focus
  • Scalability: strike a favorable balance between
    lookup performance, resource utilization, and
    update performance for high lookup rates and
    large databases
  • Amenable to implementation in a programmable
    router
  • Maximize packet processing resources → minimize
    resource utilization by baseline functionality
  • Proof of concept using open-platform research
    systems
  • Algorithm and architecture efficiently scale to
    support multi-gigabit links
  • Memory usage for route table (~10 bytes per
    entry)
  • Hardware resource usage for search engine (~1%
    of FPGA CLBs per 500 Mb/s)
  • Developed supporting control software with web
    interface

3
Route Lookup Example
Query: packet arriving on port 3 destined for
128.252.153.194. Result: anything arriving on port
3 going to 128.252.* → transmit on port 5
4
Route Lookup Challenges
  • Classless Inter-Domain Routing (CIDR) allows
    route table entries to be variable-length
    prefixes
  • Requires a Longest Prefix Match (LPM) search over
    the table to find the most specific route
  • Backbone route tables are extremely large
  • Currently 70k to 110k entries with an approximate
    doubling every two years
  • Optical link rates place high throughput
    constraints on route lookup engines
  • 2.5 Gb/s to 40 Gb/s → 5.9M pkt/s to 94.3M pkt/s
  • Must support frequent updates
  • Periodic distribution of routing information
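To make the LPM requirement above concrete, here is a minimal linear-scan sketch of longest prefix match using the slide's own example route (128.252/16 → port 5); the 128/8 entry and the hop names are hypothetical, and real routers use trie-based structures like the Tree Bitmap described next rather than a linear scan.

```python
def lpm(prefixes, addr, width=32):
    """Linear-scan longest prefix match.

    prefixes: dict mapping (prefix_value, prefix_length) -> next hop.
    Returns the next hop of the longest matching prefix, or None.
    """
    best_len, best_hop = -1, None
    for (value, length), hop in prefixes.items():
        # A prefix matches if the top `length` bits of addr equal value.
        if length > best_len and (addr >> (width - length)) == value:
            best_len, best_hop = length, hop
    return best_hop

# Illustrative table; 128.252/16 -> port 5 is from the slide's example.
table = {
    (0x80, 8): "port 2",     # 128/8 (hypothetical less-specific route)
    (0x80FC, 16): "port 5",  # 128.252/16
}
addr = (128 << 24) | (252 << 16) | (153 << 8) | 194  # 128.252.153.194
print(lpm(table, addr))  # most specific match wins -> "port 5"
```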

5
Eatherton & Dittia's Tree Bitmap
6
Eatherton & Dittia's Tree Bitmap
Create a multi-bit decision trie using k-bit
strides: simultaneously compare k address bits per
node. Reduces the number of memory accesses at the
cost of memory space.
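The stride mechanism above can be sketched in a few lines: a multi-bit trie consumes k address bits per node, so a lookup walks the address one k-bit chunk at a time, MSB first. This is a minimal illustration (k=4 chosen to match the 4-bit strides mentioned later in the talk).

```python
def strides(addr, k=4, width=32):
    """Split a `width`-bit address into k-bit strides, MSB first.

    Each stride is the index consumed by one multi-bit trie node.
    """
    assert width % k == 0
    return [(addr >> (width - k * (i + 1))) & ((1 << k) - 1)
            for i in range(width // k)]

addr = (128 << 24) | (252 << 16) | (153 << 8) | 194  # 128.252.153.194
print(strides(addr))  # [8, 0, 15, 12, 9, 9, 12, 2]
```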
7
Eatherton & Dittia's Tree Bitmap
Compress multi-bit nodes using bitmaps.
Extending Paths Bitmap: set of exit points from a
multi-bit node. Internal Prefix Bitmap: set of
prefixes stored in the multi-bit node.
8
Eatherton & Dittia's Tree Bitmap
  • Minimize pointer storage
  • Store all children of a node contiguously with a
    pointer to the first child
  • Store next-hop information for internal prefixes
    contiguously and store a pointer to the first
    item in the list

Use a stride of the IP address to select a bit in
the bitmap; count the 1s to its left and use the
count as an index from the pointer.
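The bit-select-and-count step above is the core pointer-minimization trick, and it can be sketched directly; the node layout here (a 2^k-bit MSB-first extending-paths bitmap plus one child pointer) is assumed for illustration.

```python
def child_address(ext_bitmap, stride, child_ptr, k=4):
    """Locate a child via the extending-paths bitmap.

    ext_bitmap: 2^k-bit bitmap, bit i (MSB first) set iff a child
                exists for stride value i.
    Returns child_ptr + (count of 1s strictly left of the selected
    bit), or None if no child exists for this stride.
    """
    n = 1 << k
    if not (ext_bitmap >> (n - 1 - stride)) & 1:  # MSB-first bit select
        return None
    left = ext_bitmap >> (n - stride)             # bits strictly left
    return child_ptr + bin(left).count("1")       # CountOnes -> index

# Bitmap 1010 0000 0000 0001: children exist for strides 0, 2, and 15.
bm = 0b1010000000000001
print(child_address(bm, 2, child_ptr=100))  # second child -> 101
```

The same CountOnes indexing is applied to the internal prefix bitmap to locate next-hop entries, so each node needs only two pointers regardless of how many children or prefixes it holds.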
9
Eatherton & Dittia's Tree Bitmap
IP Address: 1000 0000 1111 1100 1010 0000
10
Scalable IP Lookup Design
  • Fast IP Lookup (FIPL) Engine
  • Performs a longest prefix match (LPM) lookup on
    the Tree Bitmap
  • Designed with periodic memory access pattern to
    facilitate parallel operation
  • FIPL Engine Controller
  • Instantiates the required number of parallel
    lookup engines to support the link rate
  • Interleaves memory accesses of the parallel
    engines
  • FIPL Wrapper
  • Buffers packets and modifies Layer 2 headers
    based on lookup results
  • Control Processor
  • Handles data structure updates via an arbitrated
    SRAM interface

11
FIPL Engine Design
  • Tree Bitmap stored in SRAM operating at 100 MHz
  • Regular memory access period: 8 clock cycles
  • Interleave parallel engine memory accesses using
    3-bit cycle counter
  • Exhaust memory bandwidth with 8 FIPL engines
  • Employs multicycle logic paths for area
    efficiency
  • Relative to the Xilinx Virtex 1000-E FPGA, each
    FIPL Engine utilizes less than 1% of the device
    resources
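The interleaving scheme above can be illustrated with a toy software model (the real controller is hardware; this only shows the scheduling idea): with an 8-cycle access period per engine, a 3-bit cycle counter grants the single SRAM port to one engine per clock, so 8 engines each get exactly one access per period and the memory bandwidth is fully used.

```python
def schedule(cycles, n_engines=8):
    """Toy model of the FIPL controller's interleaving.

    A 3-bit counter (clk & 0b111) selects which engine owns the SRAM
    port on each clock; every engine is granted once per 8-cycle
    period and the port is never idle.
    """
    grants = [[] for _ in range(n_engines)]
    for clk in range(cycles):
        grants[clk & 0b111].append(clk)  # 3-bit cycle counter
    return grants

g = schedule(16)
print(g[3])  # engine 3 owns the port at clocks 3 and 11
```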

12
Performance Analysis
Gate-level simulation of the FPGA running at
100 MHz. Used a sample database from Mae-West with
16,564 routes. The Tree Bitmap required 118.8 bits
per entry (w/ 36-bit next-hop info).
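The per-entry figure can be cross-checked with simple arithmetic: 118.8 bits includes the 36-bit next-hop info, leaving about 10.35 bytes of data-structure storage per route, which is consistent with the "~10 bytes per entry" goal stated earlier. A quick check:

```python
entries = 16564                 # Mae-West sample database size
bits_per_entry = 118.8          # measured, incl. 36-bit next-hop info

struct_bytes = (bits_per_entry - 36) / 8   # overhead excluding next hop
total_kib = entries * bits_per_entry / 8 / 1024

print(round(struct_bytes, 2))   # ~10.35 bytes/entry excluding next hop
print(round(total_kib))         # ~240 KiB for the whole table
```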
13
Update Performance
Injected a continuous cycle of route add, modify,
and delete operations at various rates.
14
Performance on Research Platform
  • Based on results, targeted a 4-engine
    configuration to the WUGS/FPX research platform
    to support 2 Gb/s links
  • Sustained 1.988 Gb/s throughput on minimum-length
    packets (4.7 M packets/s)
  • Limited by the 2 Gb/s switch interface of the FPX
    (32 bits at 62.5 MHz)
  • 12% performance degradation at 200k updates/s
  • Utilizes only 8% of available logic resources and
    12.5% of on-chip memory resources
  • 4 FIPL Engines and FIPL Engine Controller
    utilize 6% of logic resources
  • FIPL Wrapper utilizes 2% of logic resources and
    12.5% of on-chip memory resources
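The 4.7 M packets/s figure is consistent with minimum-length packets of one 53-byte ATM cell (the WUGS is an ATM-based switch); the cell-size assumption is an inference from the platform, but the arithmetic lines up:

```python
link_gbps = 2.0          # FPX switch interface rate
cell_bytes = 53          # ATM cell size; assumed minimum-length packet

pkts_per_sec = link_gbps * 1e9 / (cell_bytes * 8)
print(round(pkts_per_sec / 1e6, 2))  # ~4.72 M packets/s
```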

15
Modular Control Software
  • FIPL Memory Manager
  • Manages Tree Bitmap data structure
  • Accepts route add/modify/delete → generates
    memory read/write commands
  • NCHARGE
  • Provides reliable connectivity between multiple
    software processes and reconfigurable hardware
    modules
  • Sproull et al., "Control and Configuration
    Software for a Reconfigurable Networking Hardware
    Platform," FCCM '02
  • Remote User Interface
  • Download FPGA circuit, program FPX, configure
    switch, and submit route updates remotely via web
    page
  • More info at http://www.arl.wustl.edu/projects/fpx/

16
Current Work: Multi-Service Router
17
Towards Better FIPL Performance
  • Several options for architecture optimizations to
    achieve a 200 MHz clock
  • Utilize on-chip BlockRAMs to implement a
    table-based CountOnes
  • Focus on reduction of off-chip memory accesses
  • Root node extension and caching
  • Asymmetrical node extension
  • Stride lengths of 12/8/4/4/4
  • Empty path pruning (form of path compression)
  • Other algorithmic optimizations
  • More data structure compression
  • Investigate intelligent node caching techniques
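The table-based CountOnes mentioned above replaces combinational popcount logic with a precomputed lookup table (held in on-chip BlockRAM in the FPGA). A software sketch of the idea, with the byte-wide table width chosen for illustration:

```python
# Precompute the popcount of every 8-bit value; in hardware this table
# would live in BlockRAM instead of consuming CLB logic.
POP8 = [bin(i).count("1") for i in range(256)]

def count_ones(x, nbytes=2):
    """Popcount of an nbytes-wide bitmap via byte-wide table lookups."""
    return sum(POP8[(x >> (8 * i)) & 0xFF] for i in range(nbytes))

# Same 16-bit extending-paths bitmap as the earlier slides.
print(count_ones(0b1010000000000001))  # 3
```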

18
Conclusions & Lessons
  • Design, simulation, implementation of a
    Longest-Prefix Match (LPM) search engine
  • Achieved a favorable balance between lookup
    performance, memory efficiency, and update
    performance
  • Support 500 Mb/s per 1% of the FPGA
  • Utilize ~10 bytes per route entry
  • Support 100k updates per second on-the-fly
  • Scalable design provides for ease of use in
    various research systems (IP routers,
    Programmable MSRs, etc.)
  • Great insight gained by carrying algorithmic work
    through to high-performance implementation
  • High-performance FPGA design is hard
  • Opinion: CAD tools have not arrived yet

19
Thank you for listening.
Questions?
20
Towards Better Performance
  • Several options for design optimizations to
    achieve 200MHz clock
  • Reduce off-chip memory accesses via root node
    extension and caching
  • Brute-force node extension causes bitmap
    functions to grow exponentially
  • Represent root node as on-chip array indexed by
    first i-bits of destination address
  • Each array entry stores: next hop for LPM in the
    i-bit path, pointer to an extending sub-tree
  • Maintain 4-bit stride length for off-chip nodes
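The root-node extension above replaces the first few trie levels with an on-chip array of 2^i entries indexed by the top i address bits; a minimal sketch, with the value of i, the entry layout, and the example route all assumed for illustration.

```python
I_BITS = 8  # illustrative i; the slide leaves i unspecified

# Each entry: (next hop for the best prefix of length <= i along this
# i-bit path, pointer to the off-chip extending sub-tree or None).
root = [(None, None)] * (1 << I_BITS)

def root_lookup(addr, width=32):
    """First lookup step: index the on-chip array with the top i bits.

    The search continues into off-chip 4-bit-stride nodes only when a
    sub-tree pointer is present.
    """
    idx = addr >> (width - I_BITS)
    next_hop, subtree = root[idx]
    return next_hop, subtree

root[128] = ("port 5", None)  # hypothetical entry for 128/8
print(root_lookup((128 << 24) | 1))  # ('port 5', None)
```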

21
Motivation: Advanced Network Services
22
Supporting Advanced Network Services
  • Pressing need for high-performance programmable
    routers
  • Flexible for rapid deployment of new services
  • No need to modify end-systems
  • Must scale with reasonable per-port costs
  • Thousands of ports each supporting optical links
  • Thousands of flows per port
  • Must be computationally robust
  • Support next-generation services without
    modification to infrastructure
  • Requires additional per-port processing resources
  • Minimize resources required for baseline
    functionality