NetFPGA Project: 4-Port Layer 2/3 Switch

Transcript and Presenter's Notes

1
NetFPGA Project4-Port Layer 2/3 Switch
  • Ankur Singla (asingla@stanford.edu)
  • Gene Juknevicius (genej@stanford.edu)

2
Agenda
  • NetFPGA Development Board
  • Project Introduction
  • Design Analysis
  • Bandwidth Analysis
  • Top Level Architecture
  • Data Path Design Overview
  • Control Path Design Overview
  • Verification and Synthesis Update
  • Conclusion

3
NetFPGA Development Board
4
Project Introduction
  • 4 Port Layer-2/3 Output Queued Switch Design
  • Ethernet (Layer-2), IPv4, ICMP, and ARP
  • Programmable Routing Tables: Longest Prefix
    Match, Exact Match
  • Register support for Switch Fwd On/Off,
    Statistics, Queue Status, etc.
  • Layer-2 Broadcast, and limited Layer-3 Multicast
    support
  • Limited support for Access Control
  • Highly Modular Design for future expandability

5
Bandwidth Analysis
  • Available Data Bandwidth
  • Memory bandwidth: 32 bits × 25 MHz = 800
    Mbits/sec
  • CFPGA to Ingress FIFO/Control Block bandwidth:
    32 bits × 25 MHz / 4 = 200 Mbits/sec
  • Packet Queue to Egress bandwidth: 32 bits × 25
    MHz / 4 = 200 Mbits/sec
  • Packet Processing Requirements
  • 4 ports operating at 10 Mbits/sec => 40 Mbits/sec
  • Minimum size packet 64 Bytes => 512 bits
  • 512 bits / 40 Mbits/sec = 12.8 us
  • Internal clock is 25 MHz
  • 12.8 us × 25 MHz = 320 clocks to process one
    packet
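The figures above follow directly from the bus widths and the 25 MHz internal clock. A minimal Python sketch, purely for illustration, that reproduces the arithmetic, including the 78 Kpps packet-rate requirement quoted later on slide 10:

```python
# Reproduce the bandwidth-analysis arithmetic from this slide.
MEM_BUS_BITS  = 32          # memory data bus width
CLOCK_HZ      = 25e6        # internal clock: 25 MHz
PORTS         = 4
PORT_RATE_BPS = 10e6        # 10 Mbit/s Ethernet per port
MIN_PKT_BITS  = 64 * 8      # minimum-size packet: 64 bytes = 512 bits

mem_bw     = MEM_BUS_BITS * CLOCK_HZ            # 800 Mbit/s memory bandwidth
per_dir_bw = MEM_BUS_BITS * CLOCK_HZ / PORTS    # 200 Mbit/s CFPGA->ingress, queue->egress
agg_rate   = PORTS * PORT_RATE_BPS              # 40 Mbit/s aggregate line rate
pkt_time_s = MIN_PKT_BITS / agg_rate            # 12.8 us per minimum-size packet
pkt_budget = pkt_time_s * CLOCK_HZ              # 320 clocks to process one packet
pkt_rate   = agg_rate / MIN_PKT_BITS            # 78,125 pps worst-case packet rate

print(f"memory bandwidth        : {mem_bw / 1e6:.0f} Mbit/s")
print(f"per-direction bandwidth : {per_dir_bw / 1e6:.0f} Mbit/s")
print(f"time per 64-byte packet : {pkt_time_s * 1e6:.1f} us")
print(f"clock budget per packet : {pkt_budget:.0f} cycles")
print(f"worst-case packet rate  : {pkt_rate:.0f} pps (~78 Kpps)")
```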

6
Top Level Architecture
7
Data Flow Diagram
  • Output Queued Shared Memory Switch
  • Round Robin Scheduling
  • Packet Processing Engine provides L2/L3
    functionality
  • Coarse Pipelined Arch. at the Block Level

8
Master Arbiter
  • Round Robin Scheduling of service to Each Input
    and Output
  • Interfaces the Rest of the Design with the
    Control FPGA
  • Co-ordinates activities of all high level blocks
  • Maintains Queue Status for each Output
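A minimal software model of the round-robin service order described above; the port count and request format are illustrative, not taken from the RTL:

```python
class RoundRobinArbiter:
    """Grant service to one requester per cycle, rotating priority so the
    port granted last is considered last the next time (round robin)."""

    def __init__(self, num_ports=4):
        self.num_ports = num_ports
        self.last_grant = num_ports - 1   # start so port 0 has priority first

    def grant(self, requests):
        """requests: list of bools, one per port. Returns granted port or None."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last_grant + offset) % self.num_ports
            if requests[port]:
                self.last_grant = port
                return port
        return None

# Example: ports 1 and 3 are requesting; grants alternate between them.
arb = RoundRobinArbiter()
print(arb.grant([False, True, False, True]))  # -> 1
print(arb.grant([False, True, False, True]))  # -> 3
```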

9
Ingress FIFO Control Block
  • Interfaces with three blocks
  • Control FPGA
  • Forwarding Engine
  • Packet Buffer Controller
  • Dual Packet Memories for coarse pipelining
  • Responsible for Packet Replication for Broadcast
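A sketch of how the dual packet memories support coarse pipelining: while the forwarding engine works on one memory, the other can be filled from the CFPGA. The class and method names are invented for illustration:

```python
class IngressPingPong:
    """Two packet memories used alternately so that receiving the next
    packet and processing the current one can overlap (coarse pipelining)."""

    def __init__(self):
        self.mem = [None, None]   # Packet Memory 0 and Packet Memory 1
        self.write_sel = 0        # memory currently being filled from the CFPGA

    def receive(self, packet):
        """Store an incoming packet, then flip to the other memory."""
        self.mem[self.write_sel] = packet
        self.write_sel ^= 1

    def packet_for_processing(self):
        """The forwarding engine works on the memory not being written."""
        return self.mem[self.write_sel ^ 1]

buf = IngressPingPong()
buf.receive("pkt A")                 # pkt A lands in memory 0
print(buf.packet_for_processing())   # -> 'pkt A' (memory 1 is free for pkt B)
```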

10
Packet Processing Engine Overview
  • Goals
  • Features: L3/L2/ICMP/ARP Processing
  • Performance Requirement: 78 Kpps
  • Fit within 60% of a Single User FPGA Block
  • Modularity / Scalability
  • Verification / Design Ease
  • Actual
  • Support for all required features: L2 broadcast,
    L3 multicast, LPM, Statistics and Policing
    (coarse access control)
  • Performance achieved: 234 Kpps (worst case 69 Kpps
    for 1500-byte ICMP echo requests)
  • Requires only 12% of Single UFPGA resources
  • Highly Modular Design for design/verification/
    scalability ease

11
Pkt Processing Engine Block Diagram
  (Block diagram: packets enter from the CFPGA, are staged in
  Packet Memory 0 and Packet Memory 1 as native packets, and
  are sent on to the Packet Buffer)
12
Forwarding Master State Machine
  • Responsible for controlling individual processing
    blocks
  • Request/Grant Scheme for future expandability
  • Initiates a request for a packet to the Ingress
    FIFO and then assigns it to the responsible agent
    based on packet contents
  • Replication of MSM to provide more throughput
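A sketch of the dispatch step implied by the request/grant scheme: the master state machine fetches a packet from the Ingress FIFO, inspects its headers, and hands it to an agent. The header fields and agent names are illustrative, not taken from the design:

```python
from collections import namedtuple

# Illustrative header fields; in the design the extraction is done in hardware.
Headers = namedtuple("Headers", "ethertype ip_proto ip_dst")

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP  = 0x0806
IP_PROTO_ICMP  = 1

def dispatch(hdr, router_ips):
    """Choose the processing agent for a packet based on its contents."""
    if hdr.ethertype == ETHERTYPE_ARP:
        return "arp_engine"
    if hdr.ethertype == ETHERTYPE_IPV4:
        if hdr.ip_proto == IP_PROTO_ICMP and hdr.ip_dst in router_ips:
            return "icmp_engine"      # ICMP addressed to one of the router's own IPs
        return "l3_engine"            # IPv4 forwarding via the LPM engine
    return "l2_engine"                # anything else falls back to L2 switching

print(dispatch(Headers(0x0800, 1, 0xAB400001), {0xAB400001}))  # -> icmp_engine
```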

13
L3 Processing Engine
  • Parsing of the L3 Information
  • Src/Dest Addr, Protocol Type, Checksum, Length,
    TTL
  • Longest Prefix Match Engine
  • Mask bits represent the prefix; the lookup key
    is the Dest Addr
  • Associated Info Table (AIT) indexed by the
    entry that hit
  • AIT provides Destination Port Map, Destination L2
    Addr, Statistics Bucket Index
  • Request/Done scheme to allow for expandability
    (e.g. future m-way Trie implementation project)
  • ICMP Support Engine request (if Dest Addr is the
    Router's IP Address and Protocol Type is ICMP)
  • Total 85 cycles for Packet Processing, with 80%
    of the cycles spent on Table Lookup
  • If a 4-way trie is used, total processing time
    can be reduced to less than 30 cycles
    (see the lookup sketch after this list)
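A minimal software model of the masked longest-prefix-match lookup with an Associated Info Table, as described above; the table layout, field names, and example entries are illustrative:

```python
from collections import namedtuple

# AIT entry: per the slide, the hit index selects the destination port map,
# destination L2 address, and statistics bucket index.
AITEntry   = namedtuple("AITEntry", "port_map dest_l2 stats_bucket")
# Routing entry: (value, mask) pair; the mask bits represent the prefix.
RouteEntry = namedtuple("RouteEntry", "value mask")

def lpm_lookup(dest_addr, routes, ait):
    """Linear longest-prefix match: the lookup key is the destination address;
    among all matching entries, the one with the longest mask wins."""
    best_idx, best_len = None, -1
    for idx, r in enumerate(routes):
        if dest_addr & r.mask == r.value & r.mask:
            prefix_len = bin(r.mask).count("1")
            if prefix_len > best_len:
                best_idx, best_len = idx, prefix_len
    return ait[best_idx] if best_idx is not None else None

# Example table: a /16 route and a default route.
routes = [RouteEntry(0xAB400000, 0xFFFF0000), RouteEntry(0x00000000, 0x00000000)]
ait    = [AITEntry(0b0010, "00:11:22:33:44:55", 3),
          AITEntry(0b0001, "66:77:88:99:aa:bb", 0)]
print(lpm_lookup(0xAB400102, routes, ait))   # hits the /16 entry
```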

14
L2 Processing Engine
  • If there are any processing problems with ARP,
    ICMP, and/or L3, then L2 switching is done
  • Exact Match Engine
  • Re-use of the LPM match engine but with Mask Bits
    set to all 1s.
  • Associated Info Table (AIT) indexed by the
    entry that hit
  • AIT provides Destination Port Map, and Statistics
    Bucket Index
  • Request/Done scheme to allow for expandability
    (e.g. future Hash implementation project)
  • Learning Engine removed because of Switch/Router
    Hardware Verification problems (HP Switch bug)
  • Total 76 cycles for Packet Processing, with over
    80% of the cycles spent on Table Lookup
  • If a hashing function is used, total processing
    time can be reduced to less than 20 cycles
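Exact match reuses the LPM engine with the mask set to all 1s; the hash-based variant mentioned in the last bullet could look like this minimal sketch (table contents and names are invented for illustration):

```python
def exact_match_lookup(dest_mac, mac_table):
    """Exact match: equivalent to the LPM engine with a full (all-ones) mask,
    implemented here as a hash (dict) lookup, the variant the slide suggests
    for cutting lookup time."""
    return mac_table.get(dest_mac)   # -> (port_map, stats_bucket) or None

# Example table: destination MAC -> (destination port map, statistics bucket index)
mac_table = {0x001122334455: (0b0100, 7)}
print(exact_match_lookup(0x001122334455, mac_table))  # -> (0b0100, 7)
print(exact_match_lookup(0xFFFFFFFFFFFF, mac_table))  # -> None (miss; broadcast case)
```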

15
Packet Buffer Interface
  • Interfaces with Master Arbiter and Forwarding
    Engine
  • Output Queued Switch
  • Statically Assigned
  • Single Queue per port
  • Off-chip ZBT SRAM on NetFPGA board
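A sketch of the statically assigned, single-queue-per-port organization of the shared packet memory described above; the partition size is an arbitrary illustrative value:

```python
class OutputQueues:
    """Shared packet memory statically partitioned into one queue per port."""

    def __init__(self, num_ports=4, slots_per_port=64):
        self.queues = [[] for _ in range(num_ports)]
        self.slots_per_port = slots_per_port

    def enqueue(self, port, packet):
        """Drop if the port's fixed partition is full (no dynamic sharing)."""
        if len(self.queues[port]) >= self.slots_per_port:
            return False
        self.queues[port].append(packet)
        return True

    def dequeue(self, port):
        """Serve the head of the port's queue, or None if it is empty."""
        return self.queues[port].pop(0) if self.queues[port] else None
```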

16
Control Block
  • Typical Register Rd/Wr Functionality
  • Status Register
  • Control Register (forwarding disable, reset)
  • Router's IP Addresses (ports 1-4)
  • Queue Size Registers
  • Statistics Registers
  • Layer-2 Table Programming Registers
  • Layer-3 Table Programming Registers
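A hypothetical register map illustrating the read/write interface behind this list; the offsets and names are invented for illustration and are not the actual design's map:

```python
# Hypothetical control-block register map; offsets are invented for
# illustration only and do not come from the actual design.
REGISTERS = {
    0x00: "STATUS",            # read-only status register
    0x04: "CONTROL",           # forwarding disable, reset
    0x08: "ROUTER_IP_PORT1",   # router IP addresses, ports 1-4
    0x0C: "ROUTER_IP_PORT2",
    0x10: "ROUTER_IP_PORT3",
    0x14: "ROUTER_IP_PORT4",
    0x18: "QUEUE_SIZE_BASE",   # per-port queue size registers
    0x28: "STATS_BASE",        # statistics counters
    0x38: "L2_TABLE_PROG",     # Layer-2 table programming registers
    0x3C: "L3_TABLE_PROG",     # Layer-3 table programming registers
}

regfile = {offset: 0 for offset in REGISTERS}

def reg_write(offset, value):
    regfile[offset] = value & 0xFFFFFFFF   # 32-bit registers

def reg_read(offset):
    return regfile[offset]
```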

17
Verification
  • Three Levels of Verification Performed
  • Simulations
  • Module Level to verify the module design intent
    and bus functional model
  • System Level using the NetFPGA verification
    environment for packet level simulations
  • Hardware Verification
  • Ported System Level tests to create tcpdump files
    for NetFPGA traffic server
  • Very good success on Hardware with all System
    Level tests passing.
  • Only one modification required (reset generation)
    after Hardware Porting
  • Demo - Greg can provide lab access to anyone
    interested

18
Synthesis Overview
  • Design was ported to Altera EP20K400 Device
  • Logic Elements Utilized: 5833 (35% of Total LEs)
  • RAM ESBs Used: 46848 (21% of Total ESBs)
  • Max Design Clock Frequency: 31 MHz
  • No Timing Violations

Design Block Name              Flip-flops (Actual)   RAM bits (Actual)   Gates (Actual)
Main Arbiter                                    71                   0            1500
Memory Controller                              109                   0            2000
Control Block                                  608                   0            5000
Ingress FIFO Controller                         60               64000            1200
Switching and Routing Engine                   925               14000           14000
Total                                         1773               78000           23700
19
Conclusion
  • Easy to achieve required performance in an OQ
    Shared Memory Switch in NetFPGA
  • Modularity of the design allows more interesting
    and challenging future projects
  • Design/Verification Environment was essential to
    meet schedule
  • NetFPGA is an excellent design exploration
    platform