Block Design Review: PlanetLab Line Card Header Format

1
Block Design Review: PlanetLab Line Card Header Format
David M. Zar dzar_at_wustl.edu http://www.arl.wustl.edu/projects/techX
2
Revision History
  • 10/31/06 (DMZ)
  • Initial Draft
  • 11/04/06 (DMZ)
  • Updates for performance issues

3
Line Card Centric Overview
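[Figure: line card block diagram. Ingress path blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, Switch Tx. Egress path blocks: Switch Rx, Key Extract, Lookup, Hdr Format, Port Splitter, QM/Schd, Phy Int Tx. The switch sits between the two paths.]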
  • Port Splitter (Ingress and Egress)
  • Accepts packets on an NN ring
  • Routes packets based on the physical destination port number:
  • Ports 0-4 go to QM1 on a scratch ring
  • Ports 5-9 go to QM2 on a scratch ring
  • Measured delay is about 120 cycles, including
    memory latency
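
The port-to-QM mapping above can be sketched as follows. This is a minimal illustration, not the microengine code: the function and constant names are hypothetical, and only the port ranges come from the slide.

```python
# Hypothetical names; only the two port ranges are taken from the slide.
QM1_PORTS = range(0, 5)   # physical destination ports 0-4
QM2_PORTS = range(5, 10)  # physical destination ports 5-9


def select_qm(dest_port: int) -> str:
    """Return which queue manager's scratch ring should receive the packet."""
    if dest_port in QM1_PORTS:
        return "QM1"
    if dest_port in QM2_PORTS:
        return "QM2"
    # Behavior for ports outside 0-9 is not described; raising is an assumption.
    raise ValueError(f"physical port out of range: {dest_port}")
```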

4
Ingress Header Format
5
Ingress Header Format
  • Microengine Usage
  • One microengine
  • Eight identical threads
  • NN ring input from Lookup
  • NN ring output to Port Splitter
  • Main functions
  • Using data from Lookup, modify packet header in
    DRAM for proper routing to PE
  • Destination MAC address
  • First five bytes are same as source MAC address
  • Source MAC address
  • Address of this LC
  • VLAN tag
  • Adjust pre-queue stats counters
  • Format input data for QM
  • QID
  • Port Number
  • Ethernet Frame Length

6
LC Ingress Functional Blocks
[Figure: LC ingress functional blocks (Phy Int Rx, Key Extract, Lookup, Hdr Format, Switch Tx), annotated with the possible input packet formats and the output packet format.]
7
MAC Address and VLAN Tag (Ingress)
  • The source MAC address is fixed and set at boot
    time (_WU_get_mac_address)
  • The destination MAC address will only differ in
    the last byte and this byte is obtained from the
    Lookup data.
  • The VLAN tag is obtained from the Lookup data.

8
Stats/Counters (Ingress/Egress)
  • The Stats Index is obtained from the Lookup Data
  • The pre-queue packet and byte counters are
    updated (_WU_update_counters)
  • Packet counter is incremented (atomic SRAM)
  • Byte count is incremented by the number of bytes
    in the entire Ethernet frame
    (_WU_get_enet_frame_length).
  • Frame_length = IP_pkt_len + 18
  • 18 is the VLAN Ethernet header length
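
As a rough sketch of the counter update described above: the frame length is the IP packet length plus the 18-byte VLAN Ethernet header (6-byte destination MAC, 6-byte source MAC, 4-byte VLAN tag, 2-byte EtherType). The dictionary-based counter store and the function names are assumptions made for illustration; the real counters live in SRAM and are indexed by the stats index from Lookup.

```python
VLAN_ENET_HEADER_LEN = 18  # 6 + 6 (MAC addresses) + 4 (VLAN tag) + 2 (EtherType)


def enet_frame_length(ip_pkt_len: int) -> int:
    """Frame_length = IP_pkt_len + 18, as stated on the slide."""
    return ip_pkt_len + VLAN_ENET_HEADER_LEN


def update_counters(counters: dict, stats_index: int, ip_pkt_len: int) -> None:
    """Bump the pre-queue packet and byte counters for one packet.

    `counters` is a hypothetical stand-in for the SRAM counter structure.
    """
    entry = counters.setdefault(stats_index, {"pkts": 0, "bytes": 0})
    entry["pkts"] += 1
    entry["bytes"] += enet_frame_length(ip_pkt_len)


counters = {}
update_counters(counters, stats_index=3, ip_pkt_len=0x38)
```

With an IP total length of 0x38 (56 bytes), the byte counter grows by 74, which matches the 74-byte frame in the ingress validation example once its CRC is removed.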

9
QM Data Formatting (Ingress and Egress)
  • QID is extracted from Lookup data
  • Port number is extracted from Lookup data
  • Total Ethernet frame length is passed to QM
  • Stats index is passed on for post-queue counters

10
Ingress HF Block Diagram
dl_source()
Signal next ctx
_WU_get_enet_frame_length
Cycles 10
NN Dequeue
Cycles 2
Cycles 17
init signal
DRAM 45 4B writes Cycles 26
_WU_write_vlan_header
Wait for prev ctx
Cycles 5
SRAM 1 read 1 write Cycles 10
_WU_update_counters
Signal next ctx
Cycles 1
NN Enqueue
Cycles 16
SRAM 3 writes Cycles 12
_WU_update_buffer_descriptor
Wait for prev ctx
dl_sink()
Total cycles: 99. Budget: 1400 MHz / (10 Gb/s / (8*90)) = 100.8 (about 100 cycles). Measured latency: 745 cycles.
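
The budget arithmetic on this slide can be checked with a short calculation. This sketch assumes the budget expression reads 1400 MHz / (10 Gb/s / (8 * 90)), that is, a 1400 MHz microengine clock and 90-byte packets at 10 Gb/s; the 90-byte figure is an interpretation of the slide, and all names are hypothetical.

```python
ME_CLOCK_HZ = 1400e6   # microengine clock taken from the slide
LINE_RATE_BPS = 10e9   # 10 Gb/s line rate
PKT_BYTES = 90         # assumption: packet size used in the budget expression


def cycle_budget(clock_hz: float, line_rate_bps: float, pkt_bytes: int) -> float:
    """Cycles available per packet if one packet arrives every packet time."""
    packets_per_second = line_rate_bps / (8 * pkt_bytes)
    return clock_hz / packets_per_second


# Per-step cycle counts listed in the ingress block diagram.
INGRESS_STEP_CYCLES = [10, 2, 17, 26, 5, 10, 1, 16, 12]

budget = cycle_budget(ME_CLOCK_HZ, LINE_RATE_BPS, PKT_BYTES)
used = sum(INGRESS_STEP_CYCLES)
print(f"budget {budget:.1f} cycles, used {used} cycles")
```

Under these assumptions the listed steps add up to 99 cycles against a budget of about 100.8, which is why the design leaves almost no headroom.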
11
Ingress Validation
  • Send in non-tunneled packets and check output
    packets to see they are our internal, tunneled,
    packets.
  • Worked during development but not tested in
    integrated system at this point.
  • Send in tunneled packets and check output packets
    to see they are our internal, tunneled, packets.
  • Example: 01020304 05060708 090a0b0c 81000aaa
    08004500 00380000 0000ff11 3a61c0a8 0001c0a8
    00020001 00010024 ffbd4500 001c0000 0000ff11
    3a7dc0a8 0001c0a8 00020001 00020008 7e87 6d7e
    d5be (the trailing 6d7e d5be is the CRC, which is
    stripped by RX)
  • ->
  • 01020304 0a020102 03040a0b 81000002 08004500
    00380000 0000ff11 3a61c0a8 0001c0a8 00020001
    00010024 ffbd4500 001c0000 0000ff11 3a7dc0a8
    0001c0a8 00020001 00020008 7e87
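
The header rewrite in this example can be reproduced with a small sketch. It assumes a standard VLAN-tagged Ethernet layout and reads the line card MAC address (01:02:03:04:0a:0b), the destination last byte (0x02), and the VLAN tag value (0x0002) off the output packet above; in the real block these come from boot-time configuration and the Lookup data. Function names are hypothetical.

```python
def hex_to_bytes(text: str) -> bytes:
    """Parse whitespace-separated hex words into bytes."""
    return bytes.fromhex("".join(text.split()))


def rewrite_ingress_header(frame: bytes, src_mac: bytes,
                           dst_last_byte: int, vlan_tci: int) -> bytes:
    """Return the frame with new MAC addresses and VLAN tag control field.

    Assumed layout: bytes 0-5 destination MAC, 6-11 source MAC,
    12-13 TPID (0x8100), 14-15 tag control, then EtherType and payload.
    """
    # Destination MAC shares the first five bytes with the source MAC.
    dst_mac = src_mac[:5] + bytes([dst_last_byte])
    return (dst_mac + src_mac + frame[12:14]
            + vlan_tci.to_bytes(2, "big") + frame[16:])


# Input frame from the slide, without the trailing CRC (6d7e d5be).
RX_FRAME = hex_to_bytes("""
    01020304 05060708 090a0b0c 81000aaa 08004500 00380000 0000ff11
    3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11
    3a7dc0a8 0001c0a8 00020001 00020008 7e87
""")

# Output frame from the slide.
EXPECTED = hex_to_bytes("""
    01020304 0a020102 03040a0b 81000002 08004500 00380000 0000ff11
    3a61c0a8 0001c0a8 00020001 00010024 ffbd4500 001c0000 0000ff11
    3a7dc0a8 0001c0a8 00020001 00020008 7e87
""")

LC_MAC = bytes.fromhex("010203040a0b")
tx_frame = rewrite_ingress_header(RX_FRAME, LC_MAC, 0x02, 0x0002)
print(tx_frame == EXPECTED)
```

Only the first 16 bytes change; the EtherType and the tunneled IP payload pass through untouched.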

12
Egress Header Format
13
Egress Header Format
  • Microengine Usage
  • One microengine
  • Eight identical threads
  • NN ring input from Lookup
  • NN ring output to Port Splitter
  • Main functions
  • Using data from Lookup, modify packet header in
    DRAM for proper routing to Switch
  • Destination MAC address
  • First five bytes are same as source MAC address
  • Destination MAC address is looked up based on IP
    address from lookup
  • Source MAC address
  • Address of this LC
  • VLAN tag
  • Adjust pre-queue stats counters
  • Format input data for QM
  • QID
  • Port Number
  • Ethernet Frame Length

14
LC Egress Functional Blocks
[Figure: LC egress functional blocks (Switch Rx, Key Extract, Lookup, Hdr Format, Phy Int Tx) attached to the switch, annotated with the input packet format and the output packet format.]
15
MAC Address and VLAN Tag (Egress)
  • The source MAC address is fixed and set at boot
    time (_WU_get_mac_address)
  • The destination MAC address will only differ in
    the last nibble and this nibble is obtained from
    the Lookup data.
  • _WU_ip_lookup will take 32 bits from the
    destination IP address and use the local CAM to
    obtain the least significant 4 bits of the MAC
    address.
  • The CAM state bits are used for this, which is
    why only 4 bits of data are returned
  • The VLAN tag is obtained from the Lookup data.
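
The egress destination MAC derivation described above might look like the sketch below. The CAM is modeled as a plain dictionary from a 32-bit destination IP address to a 4-bit state value; the function name, the example addresses, and the miss behavior (returning None) are assumptions, not taken from the slides.

```python
from typing import Optional


def egress_dst_mac(src_mac: bytes, dst_ip: int, cam: dict) -> Optional[bytes]:
    """Build the destination MAC from the source MAC and a CAM lookup.

    The destination differs from the source only in the low nibble of the
    last byte, which is the 4-bit value stored with the matching CAM entry.
    """
    state = cam.get(dst_ip)
    if state is None:
        return None  # assumption: how a CAM miss is handled is not described
    last_byte = (src_mac[5] & 0xF0) | (state & 0x0F)
    return src_mac[:5] + bytes([last_byte])


# Hypothetical CAM contents: 192.168.0.2 maps to nibble 0x2.
cam = {0xC0A80002: 0x2}
lc_mac = bytes.fromhex("010203040a0b")
dst_mac = egress_dst_mac(lc_mac, 0xC0A80002, cam)
```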

16
Egress HF Block Diagram
dl_source()
Signal next ctx
Cycles 10
_WU_get_enet_frame_length
Cycles 1
NN Dequeue
Cycles 2
_WU_ip_lookup
init signal
Wait for prev ctx
Cycles 1
DRAM 1 4B read 4 4B writes Cycles 32
_WU_write_vlan_header
Cycles 2
_WU_update_counters
SRAM 1 add 1 incr Cycles 6
Signal next ctx
Cycles 1
NN Enqueue
SRAM 3 writes Cycles 10
_WU_update_buffer_descriptor
Wait for prev ctx
dl_sink()
Total cycles: 65. Measured latency: 660.
17
Egress Validation
  • Send in our internal, tunneled packets and check
    output packets to see they are our valid IP,
    tunneled, packets.
  • For the PlanetLab demo, there are no non-tunneled
    output packets
  • Check packet and byte counters for valid updates
  • Check CAM for proper initialization (data watch)

18
HF Initialization (Ingress/Egress)
  • All memory locations defined in dl_system.h
  • Base address for HF
  • LCI/E_HF_SRAM_INIT_BASE
  • MAC_ADDR_HI32
  • MAC_ADDR_LO16
  • Pre-Queue Counters
  • LCI/E_LU_COUNTERS_SRAM_INIT_BASE
  • LCI/E_LU_PRE_Q_PKT_CNT_OFFSET: offset into
    counters structure for packet counter
  • LCI/E_LU_PRE_Q_BYTE_CNT_OFFSET: offset into
    counters structure for byte counter
  • Thread 0 waits for signal from rx
  • For Egress, the CAM is filled
    (_WU_hfe_initialize_ip_lookup) with data from
    LCE_HF_SRAM_INIT_BASE + 8
  • Each entry is 64 bits: cam_entry (32b), RSVD
    (28b), MAC_DEST (4b)
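
The 64-bit initialization entry can be illustrated with a pack/unpack pair. The slide gives only the field widths; placing cam_entry in the upper 32 bits and MAC_DEST in the lowest 4 bits is an assumption based on the order in which the fields are listed, and the function names are hypothetical.

```python
def pack_cam_init_entry(cam_entry: int, mac_dest: int) -> int:
    """Pack a 64-bit entry: cam_entry (32b), RSVD (28b, zero), MAC_DEST (4b)."""
    if not 0 <= cam_entry <= 0xFFFFFFFF:
        raise ValueError("cam_entry must fit in 32 bits")
    if not 0 <= mac_dest <= 0xF:
        raise ValueError("MAC_DEST must fit in 4 bits")
    return (cam_entry << 32) | mac_dest


def unpack_cam_init_entry(entry: int) -> tuple:
    """Split a 64-bit entry into (cam_entry, mac_dest), ignoring RSVD bits."""
    return (entry >> 32) & 0xFFFFFFFF, entry & 0xF


entry = pack_cam_init_entry(0xC0A80002, 0x2)
fields = unpack_cam_init_entry(entry)
```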

19
File Locations (Ingress and Egress)
  • Main code
  • Applications/LC_Ingress/src/hdr_format/PL/hdr_format.uc
  • Applications/LC_Egress/src/hdr_format/PL/hdr_format.uc
  • Library
  • library/DataPlane/hdr_format_util.uc

20
Required Includes (Ingress and Egress)
  • Files
  • build/PL/dispatch_loop/dl_system.h
  • memory locations
  • IXA_SDK_4.0/src/library/microblocks_library/
  • dl_meta for metadata macros
  • IXA_SDK_4.0/src/library/dataplane_library/
  • dram for DRAM read/write macros
  • sram for SRAM read/write/add/incr macros
  • xbuf for transfer buffer macros

21
Performance Issues
22
Ingress Performance Anomalies
23
Ingress Anomalies (Explanation)
24
Ingress Anomalies (Explanation)
The SRAM Controllers have a command FIFO
These bus arbiters are shared across all memory
interfaces
25
Ingress/Egress SRAM Issues
  • It seems that using atomic ADD/INCR instructions
    is expensive at the SRAM controller
  • If I remove them and instead read the SRAM, do
    the add myself, and write the SRAM, this is
    quicker and consumes less of the SRAM controller
    time and, thus, the command queue never backs up.
  • In this new design, there are more instructions
    executed, but there may be a few I could
    eliminate with some optimizing of code.
  • No stalling in the WU microblocks (well, QM does,
    and RX and TX still do, but these look normal).

26
Ingress/Egress Performance
  • 99 CPU cycles
  • 745 cycles latency
  • Expected performance
  • Should have no trouble going at 10 Gb/s but does
  • Simulated performance (as of 11/06/2006)
  • 10 Gb/s
  • With all other microengines in place (i.e. real
    simulation)

27
Future Work
28
Ingress/Egress Future Work
  • Determine source of I/O stalls
  • Update Stubs projects for validation of
    Ingress/Egress blocks (done for Ingress)
  • Extend both blocks for all possible packet
    formats
  • Ingress inputs
  • Egress outputs
  • Possible instruction optimization to give a
    little headroom (99 cycles out of 100).
    Currently, the design will not work for standard
    IPv4 packets; PlanetLab VLAN packets are OK.