NetFPGA Project: 4-Port Layer 2/3 Switch

Transcript and Presenter's Notes

1
NetFPGA Project4-Port Layer 2/3 Switch
  • Ankur Singla (asingla@stanford.edu)
  • Gene Juknevicius (genej@stanford.edu)

2
Agenda
  • NetFPGA Development Board
  • Project Introduction
  • Design Analysis
  • Bandwidth Analysis
  • Top Level Architecture
  • Data Path Design Overview
  • Control Path Design Overview
  • Verification and Synthesis Update
  • Conclusion

3
NetFPGA Development Board
4
Project Introduction
  • 4 Port Layer-2/3 Output Queued Switch Design
  • Ethernet (Layer-2), IPv4, ICMP, and ARP
  • Programmable Routing Tables: Longest Prefix
    Match, Exact Match
  • Register support for Switch Fwd On/Off,
    Statistics, Queue Status, etc.
  • Layer-2 Broadcast, and limited Layer-3 Multicast
    support
  • Limited support for Access Control
  • Highly Modular Design for future expandability

5
Bandwidth Analysis
  • Available Data Bandwidth
  • Memory bandwidth: 32 bits × 25 MHz = 800
    Mbits/sec
  • CFPGA to Ingress FIFO/Control Block bandwidth:
    32 bits × 25 MHz / 4 = 200 Mbits/sec
  • Packet Queue to Egress bandwidth: 32 bits × 25
    MHz / 4 = 200 Mbits/sec
  • Packet Processing Requirements
  • 4 ports operating at 10 Mbits/sec => 40 Mbits/sec
  • Minimum size packet 64 Bytes => 512 bits
  • 512 bits / 40 Mbits/sec = 12.8 us
  • Internal clock is 25 MHz
  • 12.8 us × 25 MHz = 320 clocks to process one
    packet
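The figures above follow directly from the bus widths and the 25 MHz internal clock. A minimal Python sketch, purely for illustration, that reproduces the arithmetic, including the 78 Kpps packet-rate requirement quoted later on slide 10:

```python
# Reproduce the bandwidth-analysis arithmetic from this slide.
MEM_BUS_BITS  = 32          # memory data bus width
CLOCK_HZ      = 25e6        # internal clock: 25 MHz
PORTS         = 4
PORT_RATE_BPS = 10e6        # 10 Mbit/s Ethernet per port
MIN_PKT_BITS  = 64 * 8      # minimum-size packet: 64 bytes = 512 bits

mem_bw     = MEM_BUS_BITS * CLOCK_HZ            # 800 Mbit/s memory bandwidth
per_dir_bw = MEM_BUS_BITS * CLOCK_HZ / PORTS    # 200 Mbit/s CFPGA->ingress, queue->egress
agg_rate   = PORTS * PORT_RATE_BPS              # 40 Mbit/s aggregate line rate
pkt_time_s = MIN_PKT_BITS / agg_rate            # 12.8 us per minimum-size packet
pkt_budget = pkt_time_s * CLOCK_HZ              # 320 clocks to process one packet
pkt_rate   = agg_rate / MIN_PKT_BITS            # 78,125 pps worst-case packet rate

print(f"memory bandwidth        : {mem_bw / 1e6:.0f} Mbit/s")
print(f"per-direction bandwidth : {per_dir_bw / 1e6:.0f} Mbit/s")
print(f"time per 64-byte packet : {pkt_time_s * 1e6:.1f} us")
print(f"clock budget per packet : {pkt_budget:.0f} cycles")
print(f"worst-case packet rate  : {pkt_rate:.0f} pps (~78 Kpps)")
```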

6
Top Level Architecture
7
Data Flow Diagram
  • Output Queued Shared Memory Switch
  • Round Robin Scheduling
  • Packet Processing Engine provides L2/L3
    functionality
  • Coarse Pipelined Arch. at the Block Level

8
Master Arbiter
  • Round Robin Scheduling of service to Each Input
    and Output
  • Interfaces the Rest of the Design with the
    Control FPGA
  • Co-ordinates activities of all high level blocks
  • Maintains Queue Status for each Output
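A minimal software model of the round-robin service order described above; the port count and request format are illustrative, not taken from the RTL:

```python
class RoundRobinArbiter:
    """Grant service to one requester per cycle, rotating priority so the
    port granted last is considered last the next time (round robin)."""

    def __init__(self, num_ports=4):
        self.num_ports = num_ports
        self.last_grant = num_ports - 1   # start so port 0 has priority first

    def grant(self, requests):
        """requests: list of bools, one per port. Returns granted port or None."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last_grant + offset) % self.num_ports
            if requests[port]:
                self.last_grant = port
                return port
        return None

# Example: ports 1 and 3 are requesting; grants alternate between them.
arb = RoundRobinArbiter()
print(arb.grant([False, True, False, True]))  # -> 1
print(arb.grant([False, True, False, True]))  # -> 3
```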

9
Ingress FIFO Control Block
  • Interfaces with three blocks
  • Control FPGA
  • Forwarding Engine
  • Packet Buffer Controller
  • Dual Packet Memories for coarse pipelining
  • Responsible for Packet Replication for Broadcast
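A sketch of how the dual packet memories support coarse pipelining: while the forwarding engine works on one memory, the other can be filled from the CFPGA. The class and method names are invented for illustration:

```python
class IngressPingPong:
    """Two packet memories used alternately so that receiving the next
    packet and processing the current one can overlap (coarse pipelining)."""

    def __init__(self):
        self.mem = [None, None]   # Packet Memory 0 and Packet Memory 1
        self.write_sel = 0        # memory currently being filled from the CFPGA

    def receive(self, packet):
        """Store an incoming packet, then flip to the other memory."""
        self.mem[self.write_sel] = packet
        self.write_sel ^= 1

    def packet_for_processing(self):
        """The forwarding engine works on the memory not being written."""
        return self.mem[self.write_sel ^ 1]

buf = IngressPingPong()
buf.receive("pkt A")                 # pkt A lands in memory 0
print(buf.packet_for_processing())   # -> 'pkt A' (memory 1 is free for pkt B)
```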

10
Packet Processing Engine Overview
  • Goals
  • Features: L3/L2/ICMP/ARP Processing
  • Performance Requirement: 78 Kpps
  • Fit within 60% of a Single User FPGA Block
  • Modularity / Scalability
  • Verification / Design Ease
  • Actual
  • Support for all required features: L2 broadcast,
    L3 multicast, LPM, Statistics and Policing
    (coarse access control)
  • Performance achieved: 234 Kpps (worst case 69 Kpps
    for 1500-byte ICMP echo requests)
  • Requires only 12% of Single UFPGA resources
  • Highly Modular Design for design/verification/
    scalability ease

11
Pkt Processing Engine Block Diagram
  (Block diagram: packets enter from the CFPGA, are staged in
  Packet Memory 0 and Packet Memory 1 as native packets, and
  are sent on to the Packet Buffer)
12
Forwarding Master State Machine
  • Responsible for controlling individual processing
    blocks
  • Request/Grant Scheme for future expandability
  • Initiates a request for a packet to the Ingress
    FIFO and then assigns it to the responsible agent
    based on packet contents
  • Replication of MSM to provide more throughput
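A sketch of the dispatch step implied by the request/grant scheme: the master state machine fetches a packet from the Ingress FIFO, inspects its headers, and hands it to an agent. The header fields and agent names are illustrative, not taken from the design:

```python
from collections import namedtuple

# Illustrative header fields; in the design the extraction is done in hardware.
Headers = namedtuple("Headers", "ethertype ip_proto ip_dst")

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP  = 0x0806
IP_PROTO_ICMP  = 1

def dispatch(hdr, router_ips):
    """Choose the processing agent for a packet based on its contents."""
    if hdr.ethertype == ETHERTYPE_ARP:
        return "arp_engine"
    if hdr.ethertype == ETHERTYPE_IPV4:
        if hdr.ip_proto == IP_PROTO_ICMP and hdr.ip_dst in router_ips:
            return "icmp_engine"      # ICMP addressed to one of the router's own IPs
        return "l3_engine"            # IPv4 forwarding via the LPM engine
    return "l2_engine"                # anything else falls back to L2 switching

print(dispatch(Headers(0x0800, 1, 0xAB400001), {0xAB400001}))  # -> icmp_engine
```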

13
L3 Processing Engine
  • Parsing of the L3 Information
  • Src/Dest Addr, Protocol Type, Checksum, Length,
    TTL
  • Longest Prefix Match Engine
  • Mask bits represent the prefix; the lookup key
    is the Dest Addr
  • Associated Info Table (AIT) indexed by the
    entry that hit
  • AIT provides Destination Port Map, Destination L2
    Addr, Statistics Bucket Index
  • Request/Done scheme to allow for expandability
    (e.g. future m-way Trie implementation project)
  • ICMP Support Engine request (if Dest Addr is the
    Router's IP Address and Protocol Type is ICMP)
  • Total 85 cycles for Packet Processing, with 80%
    of the cycles spent on Table Lookup
  • If a 4-way trie is used, total processing time
    can be reduced to less than 30 cycles
    (see the lookup sketch after this list)
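A minimal software model of the masked longest-prefix-match lookup with an Associated Info Table, as described above; the table layout, field names, and example entries are illustrative:

```python
from collections import namedtuple

# AIT entry: per the slide, the hit index selects the destination port map,
# destination L2 address, and statistics bucket index.
AITEntry   = namedtuple("AITEntry", "port_map dest_l2 stats_bucket")
# Routing entry: (value, mask) pair; the mask bits represent the prefix.
RouteEntry = namedtuple("RouteEntry", "value mask")

def lpm_lookup(dest_addr, routes, ait):
    """Linear longest-prefix match: the lookup key is the destination address;
    among all matching entries, the one with the longest mask wins."""
    best_idx, best_len = None, -1
    for idx, r in enumerate(routes):
        if dest_addr & r.mask == r.value & r.mask:
            prefix_len = bin(r.mask).count("1")
            if prefix_len > best_len:
                best_idx, best_len = idx, prefix_len
    return ait[best_idx] if best_idx is not None else None

# Example table: a /16 route and a default route.
routes = [RouteEntry(0xAB400000, 0xFFFF0000), RouteEntry(0x00000000, 0x00000000)]
ait    = [AITEntry(0b0010, "00:11:22:33:44:55", 3),
          AITEntry(0b0001, "66:77:88:99:aa:bb", 0)]
print(lpm_lookup(0xAB400102, routes, ait))   # hits the /16 entry
```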

14
L2 Processing Engine
  • If there are any processing problems with ARP,
    ICMP, and/or L3, then L2 switching is done
  • Exact Match Engine
  • Re-use of the LPM match engine but with Mask Bits
    set to all 1s.
  • Associated Info Table (AIT) indexed by the
    entry that hit
  • AIT provides Destination Port Map, and Statistics
    Bucket Index
  • Request/Done scheme to allow for expandability
    (e.g. future Hash implementation project)
  • Learning Engine removed because of Switch/Router
    Hardware Verification problems (HP Switch bug)
  • Total 76 cycles for Packet Processing, with over
    80% of the cycles spent on Table Lookup
  • If a hashing function is used, total processing
    time can be reduced to less than 20 cycles
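Exact match reuses the LPM engine with the mask set to all 1s; the hash-based variant mentioned in the last bullet could look like this minimal sketch (table contents and names are invented for illustration):

```python
def exact_match_lookup(dest_mac, mac_table):
    """Exact match: equivalent to the LPM engine with a full (all-ones) mask,
    implemented here as a hash (dict) lookup, the variant the slide suggests
    for cutting lookup time."""
    return mac_table.get(dest_mac)   # -> (port_map, stats_bucket) or None

# Example table: destination MAC -> (destination port map, statistics bucket index)
mac_table = {0x001122334455: (0b0100, 7)}
print(exact_match_lookup(0x001122334455, mac_table))  # -> (0b0100, 7)
print(exact_match_lookup(0xFFFFFFFFFFFF, mac_table))  # -> None (miss; broadcast case)
```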

15
Packet Buffer Interface
  • Interfaces with Master Arbiter and Forwarding
    Engine
  • Output Queued Switch
  • Statically Assigned
  • Single Queue per port
  • Off-chip ZBT SRAM on NetFPGA board
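A sketch of the statically assigned, single-queue-per-port organization of the shared packet memory described above; the partition size is an arbitrary illustrative value:

```python
class OutputQueues:
    """Shared packet memory statically partitioned into one queue per port."""

    def __init__(self, num_ports=4, slots_per_port=64):
        self.queues = [[] for _ in range(num_ports)]
        self.slots_per_port = slots_per_port

    def enqueue(self, port, packet):
        """Drop if the port's fixed partition is full (no dynamic sharing)."""
        if len(self.queues[port]) >= self.slots_per_port:
            return False
        self.queues[port].append(packet)
        return True

    def dequeue(self, port):
        """Serve the head of the port's queue, or None if it is empty."""
        return self.queues[port].pop(0) if self.queues[port] else None
```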

16
Control Block
  • Typical Register Rd/Wr Functionality
  • Status Register
  • Control Register (forwarding disable, reset)
  • Router's IP Addresses (ports 1-4)
  • Queue Size Registers
  • Statistics Registers
  • Layer-2 Table Programming Registers
  • Layer-3 Table Programming Registers
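A hypothetical register map illustrating the read/write interface behind this list; the offsets and names are invented for illustration and are not the actual design's map:

```python
# Hypothetical control-block register map; offsets are invented for
# illustration only and do not come from the actual design.
REGISTERS = {
    0x00: "STATUS",            # read-only status register
    0x04: "CONTROL",           # forwarding disable, reset
    0x08: "ROUTER_IP_PORT1",   # router IP addresses, ports 1-4
    0x0C: "ROUTER_IP_PORT2",
    0x10: "ROUTER_IP_PORT3",
    0x14: "ROUTER_IP_PORT4",
    0x18: "QUEUE_SIZE_BASE",   # per-port queue size registers
    0x28: "STATS_BASE",        # statistics counters
    0x38: "L2_TABLE_PROG",     # Layer-2 table programming registers
    0x3C: "L3_TABLE_PROG",     # Layer-3 table programming registers
}

regfile = {offset: 0 for offset in REGISTERS}

def reg_write(offset, value):
    regfile[offset] = value & 0xFFFFFFFF   # 32-bit registers

def reg_read(offset):
    return regfile[offset]
```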

17
Verification
  • Three Levels of Verification Performed
  • Simulations
  • Module Level to verify the module design intent
    and bus functional model
  • System Level using the NetFPGA verification
    environment for packet level simulations
  • Hardware Verification
  • Ported System Level tests to create tcpdump files
    for NetFPGA traffic server
  • Very good success on Hardware with all System
    Level tests passing.
  • Only one modification required (reset generation)
    after Hardware Porting
  • Demo - Greg can provide lab access to anyone
    interested

18
Synthesis Overview
  • Design was ported to Altera EP20K400 Device
  • Logic Elements Utilized: 5833 (35% of Total LEs)
  • RAM ESBs Used: 46848 (21% of Total ESBs)
  • Max Design Clock Frequency: 31 MHz
  • No Timing Violations

Design Block Name              Flip-flops (Actual)   RAM bits (Actual)   Gates (Actual)
Main Arbiter                                    71                   0            1500
Memory Controller                              109                   0            2000
Control Block                                  608                   0            5000
Ingress FIFO Controller                         60               64000            1200
Switching and Routing Engine                   925               14000           14000
Total                                         1773               78000           23700
19
Conclusion
  • Easy to achieve required performance in an OQ
    Shared Memory Switch in NetFPGA
  • Modularity of the design allows more interesting
    and challenging future projects
  • Design/Verification Environment was essential to
    meet schedule
  • NetFPGA is an excellent design exploration
    platform