1
Using FPGAs to Generate Gigabit Ethernet Data Transfers:
The Network Performance of DAQ Protocols
Dave Bailey, Richard Hughes-Jones, Marc Kelly
The University of Manchester
www.hep.man.ac.uk/~rich/ then Talks
2
Collecting Data over the Network
Detector elements, e.g. calorimeter planks
  • Aim for a general-purpose DAQ solution for
    CALICE, the CAlorimeter for the LInear Collider
    Experiment
  • Take the ECAL as an example.
  • At the end of the beam spill the planks send all
    their data to the concentrators
  • Concentrators pack the data and send it to one
    processing node
  • Classic bottleneck problem for the switch

[Diagram: detector planks → custom links (???) →
concentrators → Ethernet switches → processing nodes;
the switch output link forms a bottleneck queue, with
1 burst per node.]
3
XpressFX Virtex4 Network Test Board
  • XpressFX Development Card from PLDApplications
  • 8-lane PCI-e card
  • Xilinx Virtex4FX60 FPGA
  • DDR2 memory
  • 2 SFP cages (1 GigE)
  • 2 HSSDC connectors

4
Overview of the Firmware Design
  • The Virtex4FX60 has
  • 16 RocketIO Multi-Gigabit Transceivers
  • Large internal memory
  • 2 PPC CPUs
  • Ethernet Interface
  • Embedded MAC
  • RocketIO
  • Packet Buffers logic
  • Allows routing of input
  • Prioritising of output
  • Packet State Machine
  • Packet Generator
  • State Machines
  • VHDL model of the HC11 CPU (Green Mountain
    Computer Systems) controls the MAC state machines
  • Reserves the PPC for data processing

5
The State Machine Blocks
  • Packet Generator
  • CSRs (set by the HC11) for
  • Packet length
  • Packet count
  • Inter-packet delay
  • Destination address
  • Request Response
  • RX State Machine
  • Decode request packet
  • Checksum (RFC 768; see the sketch below)
  • Action memory writes
  • Queue other requests
  • FIFO
  • TX State Machine
  • Process request
  • Construct reply
  • Fragment if needed
  • Checksum
  • Packet Analyser

Packet Analyser State Machine
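The RX state machine verifies the RFC 768 (UDP) checksum. As a
reading aid, here is a minimal C sketch of that one's-complement
checksum; the firmware computes the same sum in VHDL, so this
version (and the function name) is purely illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* One's-complement Internet checksum as used by UDP (RFC 768).
     * Illustrative C only; the firmware implements it in VHDL. */
    uint16_t inet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        /* Sum the payload as big-endian 16-bit words. */
        while (len > 1) {
            sum  += ((uint32_t)data[0] << 8) | data[1];
            data += 2;
            len  -= 2;
        }
        /* A trailing odd byte is padded with a zero byte. */
        if (len == 1)
            sum += (uint32_t)data[0] << 8;

        /* Fold the carries back into the low 16 bits. */
        while (sum >> 16)
            sum = (sum & 0xFFFF) + (sum >> 16);

        return (uint16_t)~sum;
    }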
6
The Receive State Machine
[State diagram. States: Idle, Read Header, Fill Fifo,
Read Cmd, Check Cmd, Do Cmd, Write Mem, Empty Packet.
Transition labels: Packet in Queue; Correct / Wrong
packet type; Fifo written; Fifo has Address cmd; Good /
Bad cmd; Is / Is not a memory write; All bytes
received; Write finished; End of packet. A C rendering
follows below.]
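As a reading aid, one plausible C rendering of the receive state
machine is shown below. The state and event names are taken from
the diagram, but the exact transition topology is partly inferred,
and the real implementation is VHDL; treat this purely as a sketch.

    /* Receive-side FSM sketch; states and events from the slide,
     * transitions partly inferred. */
    typedef enum {
        ST_IDLE, ST_READ_HEADER, ST_FILL_FIFO, ST_READ_CMD,
        ST_CHECK_CMD, ST_DO_CMD, ST_WRITE_MEM, ST_EMPTY_PACKET
    } rx_state;

    typedef enum {
        EV_PACKET_IN_QUEUE, EV_CORRECT_TYPE, EV_WRONG_TYPE,
        EV_FIFO_HAS_CMD, EV_GOOD_CMD, EV_BAD_CMD,
        EV_IS_MEM_WRITE, EV_NOT_MEM_WRITE,
        EV_WRITE_FINISHED, EV_END_OF_PACKET
    } rx_event;

    rx_state rx_next(rx_state s, rx_event e)
    {
        switch (s) {
        case ST_IDLE:         /* wait for a packet in the queue */
            return e == EV_PACKET_IN_QUEUE ? ST_READ_HEADER : s;
        case ST_READ_HEADER:  /* check the packet type */
            if (e == EV_CORRECT_TYPE) return ST_FILL_FIFO;
            if (e == EV_WRONG_TYPE)   return ST_EMPTY_PACKET;
            return s;
        case ST_FILL_FIFO:    /* wait until the FIFO holds a command */
            return e == EV_FIFO_HAS_CMD ? ST_READ_CMD : s;
        case ST_READ_CMD:     /* command assembled, go validate it */
            return ST_CHECK_CMD;
        case ST_CHECK_CMD:
            if (e == EV_GOOD_CMD) return ST_DO_CMD;
            if (e == EV_BAD_CMD)  return ST_EMPTY_PACKET;
            return s;
        case ST_DO_CMD:       /* memory writes are actioned here;
                                 other requests are queued (inferred) */
            if (e == EV_IS_MEM_WRITE)  return ST_WRITE_MEM;
            if (e == EV_NOT_MEM_WRITE) return ST_EMPTY_PACKET;
            return s;
        case ST_WRITE_MEM:    /* write until all bytes are received */
            return e == EV_WRITE_FINISHED ? ST_EMPTY_PACKET : s;
        case ST_EMPTY_PACKET: /* drain the packet, then back to Idle */
            return e == EV_END_OF_PACKET ? ST_IDLE : s;
        }
        return s;
    }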
7
The Transmit State Machine
[State diagram. States: Idle, Check Cmd, Send Header
cmd, Send Memory, Send Xsum, Update Counter, All Sent?,
End Pkt. Transition labels: cmd in fifo; Header cmd
sent; cmd needs no data / cmd requires data; More data
to send; Max packet size or byte count done; All bytes
have been sent; Xsum sent; End of packet.]
8
The Test Network
  • Used for testing raw Ethernet frame generation
    by the FPGA
  • Test data collection with Request-Response
    protocols

[Diagram: a requesting node and responding nodes
connected through a Cisco 7609 with 1 GE and 10 GE
blades; the FPGA acts as the concentrator.]
9
Request-Response Latency 1 GE
  • Request sent from the PC
  • Linux kernel 2.6.20-web100_pktd-plus
  • Intel e1000 NIC
  • Interrupt coalescence OFF on the PC
  • MTU 1500 bytes
  • Response frames generated by the FPGA code
  • Latency 19.7 µs, well behaved
  • Latency slope 0.018 µs/byte
  • Back-to-back expect 0.0182 µs/byte (sum checked
    below)
  • Mem 0.0004
  • PCI-e 0.0018
  • 1 GigE 0.008
  • FPGA 0.008
  • Smooth to 35,000 bytes
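As a quick consistency check, the per-component contributions
quoted above do sum to the expected back-to-back slope (all
numbers from the slide):

    0.0004_{\text{Mem}} + 0.0018_{\text{PCI-e}}
      + 0.008_{\text{1 GigE}} + 0.008_{\text{FPGA}}
      = 0.0182~\mu\text{s/byte}

which agrees with the measured 0.018 µs/byte.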

10
FPGA → PC ethCal_recv Frame Jitter
  • 12 µs frame spacing (line speed)
  • 25 µs frame spacing

[Histograms: peak separation 4-5 µs with no interrupt
coalescence.]
11
Test the Frame Spacing from the FPGA
  • Frames generated by the FPGA code
  • Interrupt coalescence OFF on the PC
  • Frame size 1472 bytes
  • 1M packets sent.
  • Plot the mean observed frame spacing vs the
    requested spacing
  • There appears to be an offset of about -1 µs
  • Slope close to 1, as expected
  • Packet loss decreases with packet rate.
  • Packets are lost in the receiving host (a sketch
    of the spacing measurement follows below)
  • A larger effect than for UDP/IP packets
  • UDP/IP losses are linked to scheduling
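Inter-frame spacing at the receiver can be measured with a raw
packet socket and a monotonic clock. A minimal Linux sketch is
shown below; it is not the actual ethCal_recv tool (whose
internals are not given here), just the idea behind such a
measurement. It must run as root.

    #include <stdio.h>
    #include <time.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>       /* htons */
    #include <linux/if_ether.h>  /* ETH_P_ALL */

    int main(void)
    {
        /* ETH_P_ALL delivers every frame; a real tool would filter
         * on the experiment's own EtherType. */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        unsigned char buf[2048];
        struct timespec prev = {0, 0}, now;

        for (;;) {
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n < 0) { perror("recv"); return 1; }

            clock_gettime(CLOCK_MONOTONIC, &now);
            if (prev.tv_sec || prev.tv_nsec) {
                double us = (now.tv_sec  - prev.tv_sec)  * 1e6
                          + (now.tv_nsec - prev.tv_nsec) / 1e3;
                printf("%.1f\n", us);  /* histogram offline */
            }
            prev = now;
        }
    }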

12
The Test Network
  • Used for testing raw Ethernet frame generation
    by the FPGA
  • Test data collection with Request-Response
    protocols
  • This time use 10 GE hosts
  • But does 10 GE work on a PC?

[Diagram: a requesting node and responding nodes
connected through a Cisco 7609 with 1 GE and 10 GE
blades; the FPGA acts as the concentrator.]
13
10 GigE Back-to-Back UDP Throughput
  • Motherboard Supermicro X7DBE
  • Kernel 2.6.20-web100_pktd-plus
  • NIC Myricom 10G-PCIE-8A-R Fibre
  • rx-usecs=25, coalescence ON
  • MTU 9000 bytes
  • Max throughput 9.4 Gbit/s
  • Notice the rate for 8972-byte packets
  • 0.002% packet loss in 10M packets, in the
    receiving host
  • Sending host: 3 CPUs idle
  • For packets spaced <8 µs, 1 CPU is >90% in
    kernel mode, inc. 10% soft int
  • Receiving host: 3 CPUs idle
  • For packets spaced <8 µs, 1 CPU is 70-80% in
    kernel mode, inc. 15% soft int
  • (A sketch of such a sender follows below.)
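A minimal sketch of the sending side of such a UDP test: send a
fixed number of fixed-size datagrams with a requested inter-packet
wait, busy-waiting for precise spacing. The destination address,
port, and parameter values are hypothetical, not taken from the
talk.

    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    static double now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void)
    {
        /* Hypothetical parameters: an 8972-byte payload fills one
         * 9000-byte-MTU frame (9000 - 20 IP - 8 UDP). */
        const size_t len     = 8972;
        const int    count   = 1000000;
        const double wait_us = 8.0;

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET,
                                   .sin_port   = htons(5001) };
        inet_pton(AF_INET, "192.168.10.2", &dst.sin_addr); /* hypothetical */

        char  *buf  = calloc(1, len);
        double next = now_us();

        for (int i = 0; i < count; i++) {
            sendto(fd, buf, len, 0, (struct sockaddr *)&dst, sizeof dst);
            next += wait_us;
            while (now_us() < next)
                ;  /* busy-wait gives the requested spacing */
        }
        free(buf);
        close(fd);
        return 0;
    }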

14
Scaling of Request-Response Messages
  • Requests from the 10 GE system
  • Interrupt coalescence OFF on the PC
  • Frame size 1472 bytes
  • 1M packets sent.
  • Request 10,000 bytes of data
  • Host does fragment collection, like the IP layer
  • Sequential Requests (sketch of both patterns
    below)
  • Time to receive all responses scales with the
    round-trip time,
  • as expected for sequential requests
  • Grouped Requests
  • Collection time increases by 24.6 µs per node.
  • From the network alone expect 1 + 12.3 = 13.3 µs
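The two request patterns differ only in when the requests are
issued. A minimal sketch, with hypothetical helpers
send_request() and wait_response() standing in for the real test
code:

    /* Hypothetical helpers, not part of the real test code. */
    void send_request(int node);
    void wait_response(int node);

    /* Sequential: one outstanding request at a time, so the total
     * collection time scales with the per-node round-trip time. */
    void collect_sequential(int nodes)
    {
        for (int n = 0; n < nodes; n++) {
            send_request(n);
            wait_response(n);  /* blocks a full RTT per node */
        }
    }

    /* Grouped: all requests go out back to back and the responses
     * overlap in flight, so the extra cost per node is set by link
     * occupancy and host processing rather than the RTT. */
    void collect_grouped(int nodes)
    {
        for (int n = 0; n < nodes; n++)
            send_request(n);
        for (int n = 0; n < nodes; n++)
            wait_response(n);
    }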

15
Sequential Request-Response
  • Interrupt coalescence OFF on the PCs
  • MTU 1500 bytes
  • 10,000 packets sent.
  • Histograms similar
  • Strong 1st peak
  • Second peak 5 µs later
  • Small group 25 µs later
  • Ethernet occupancy for 1500 bytes (worked out
    below)
  • 1 Gig: 12.3 µs
  • 10 Gig: 1.2 µs
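The quoted occupancies follow from the full on-the-wire size of a
1500-byte MTU frame: 1500 bytes of payload plus 14 (header) +
4 (FCS) + 8 (preamble) + 12 (inter-frame gap) = 1538 bytes:

    t_{\text{1G}}  = \frac{1538 \times 8~\text{bit}}{10^{9}~\text{bit/s}}
                   \approx 12.3~\mu\text{s},
    \qquad
    t_{\text{10G}} = \frac{1538 \times 8~\text{bit}}{10^{10}~\text{bit/s}}
                   \approx 1.2~\mu\text{s}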

16
Grouped Request-Response
  • Interrupt Coalescence OFF on PCs
  • MTU 1500 bytes
  • 10,000 packets sent.
  • Histograms multi-modal
  • Second peak 7 µs later
  • Small group 25 µs later

17
Conclusions
  • Implemented the MAC and PHY layers inside a
    Xilinx Virtex4 FPGA
  • The learning curve was steep; issues overcome
    included
  • the Xilinx CoreGen design
  • clock-generation stability on the PCB
  • The FPGA easily drives 1 Gigabit Ethernet at
    line rate
  • Packet dynamics on the wire are as expected
  • Loss of raw Ethernet frames in the end host is
    being investigated
  • Request-Response style data collection is
    promising
  • Developing a simple network test system
  • Planned upgrade to operate at 10 Gbit/s
  • Work performed in collaboration with the ESLEA
    (UK e-Science) and EU EXPReS projects

18
  • Any Questions?

19
10 GigE UDP Throughput vs Packet Size
  • Motherboard Supermicro X7DBE
  • Linux kernel 2.6.20-web100_pktd-plus
  • Myricom NIC 10G-PCIE-8A-R Fibre
  • myri10ge v1.2.0, firmware v1.4.10
  • rx-usecs=0, coalescence ON
  • MSI=1
  • Checksums ON
  • tx_boundary=4096
  • Steps at 4060 and 8160 bytes, within 36 bytes of
    the 2^n boundaries
  • Model the data transfer time as t = C + m × Bytes
    (written out below)
  • C includes the time to set up transfers
  • Fit is reasonable: C = 1.67 µs, m = 5.4 × 10^-4
    µs/byte
  • Steps consistent with C increasing by 0.6 µs
  • The Myricom driver segments the transfers,
    limiting the DMA to 4096 bytes; PCI-e chipset
    dependent!
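Written out, the model and the throughput it implies are

    t(B) = C + mB, \qquad
    \text{rate}(B) = \frac{8B}{C + mB}~\text{Mbit/s}
    \quad (t~\text{in}~\mu\text{s},\ B~\text{in bytes})

with C = 1.67 µs and m = 5.4 × 10^-4 µs/byte from the fit; larger
packets amortise the fixed setup cost C, and each extra 4096-byte
DMA segment shows up as the observed ~0.6 µs step in C.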