Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress

1
Block Design Review: Lookup for IPv4 MR, LC Ingress and LC Egress
John DeHart, jdd_at_arl.wustl.edu
http://www.arl.wustl.edu/projects/techX
2
Revision History
  • 10/11/06 (JDD)
    • Created
  • 10/23/06 (JDD)
    • Finished for presentation on 10/24/06
  • 10/24/06 (JDD)
    • Updates from comments during review.
    • Added more TCAM info
    • Added information on format of Database entry files

3
Guidelines for Design Reviews
  • Definition of interfaces In/Out
  • Block diagram of module
    • Including list of files where code for each block/module exists.
  • Macros
    • List macros and files where they can be found
    • For each macro, provide a few lines of comments in the code that describe the macro.
    • Document local and global registers used by macro.
  • Memory assumptions
    • What addresses are pre-defined, etc.
  • Initialization of Memory
  • Data Structures
  • Control Blocks
  • Details of memory accesses, xfer register usage, signal usage.
  • Critical path
  • Testing
    • Develop a well defined acceptance test that convinces you that your block works
    • Document acceptance test
    • Pktgen project file?
  • Known bugs

4
Contents
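[Figure: block diagram of the line card data path. Ingress side: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx. Egress side: Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx. Both sides connect to the SWITCH.]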
5
File locations
  • Code
    • src/applications/LC_Ingress/src/lookup/PL/lookup.uc
    • src/applications/LC_Egress/src/lookup/PL/lookup.uc
    • src/applications/IPv4_MR/src/lookup/PL/lookup.uc
  • Configuration and Database Entry Files
    • src/applications/LC_Ingress/build/PL/LCI_config.txt
    • LC_Ingress_Database_64bKey_64bResult_BothQM.txt
    • src/applications/LC_Egress/build/PL/LCE_config.txt
    • LC_Egress_Database_24bKey_64bResult.txt
    • src/applications/IPv4_MR/build/PL/IPv4_config.txt
    • GM_Database_144Key_128bResult.txt
  • IDT Includes
    • src/IDT_NSE/data_plane_IXP2XXX/include/Iipc.uc
    • Which then includes Iipc.h from same directory
  • IDT Simulation Library
    • Typical installed location
      • C:/IDT_NSE/simulation/windows/IDT75K234.dll
    • Repository location
      • src/IDT_NSE/simulation/windows/IDT75K234.dll

6
TCAM Documentation
  • Docs are sprinkled through the different installation directories
  • We have gathered most of the important stuff here
    • /project/techX/DataSheets/IDT
  • The following documents are located in the above directory
    • Datasheet (under non-disclosure): 75K72234_datasheet.pdf
    • User Manual: 75K72234_UserManual.pdf
    • Instruction Latency Application Note: 75K72234_latency.pdf
    • SLAM Simulation: IDT75K234SLAM_UsersManual.pdf
    • Dataplane Macros: NSEDataPlaneMacroAPIGuide.pdf
    • IMS API: IMS_API.pdf

7
WU Macros
  • LC Ingress
    • dl_nn_ring_init
    • dl_source_1ME_NN_4words
    • dl_sink_1ME_NN_4words
  • IPv4_MR
    • dl_nn_ring_init
    • dl_source_1ME_NN_9words
    • dl_sink_1ME_NN_4words
  • LC Egress
    • dl_nn_ring_init
    • dl_source_1ME_NN_4words
    • dl_sink_1ME_NN_5words
  • Diagnostics
    • GetTimeStamp
    • CompareTimeStamps

8
IDT Macros
  • IipcStartTimestamp
    • Does CAP read and write to set bit in MISC_CONTROL to start the timestamp counter.
  • IipcFormContextFromCsrMeCtx
    • Sets up the Context field for the TCAM command word based on the ME and context
    • 128 Contexts per LA-1 Interface
  • IipcMakeBase
    • Forms the base address word for any instruction for this context
    • Address is a 22-bit WORD address, covering a 16 MByte address space
  • IipcMakeDirectInstruction
    • Forms the command word for any of the 4 Direct instructions
  • The results of IipcMakeBase and IipcMakeDirectInstruction are passed as the two address parameters to the SRAM write
    • sram[write, $w00, iipc_base_word, iipc_command_word, count]
  • IipcDelayUsingFutureCount(cycles)
    • Sets the Future Count register to this many cycles
    • Sets the Future Count Signal register
    • ctx_arb on that signal
  • IipcSramRead
    • Performs an SRAM read until the Done bit is set in the result.
    • We don't use this any more.

9
Lookup Initialization and Control
  • XScale utility to initialize NSE and Databases
  • Control Plane and XScale mechanisms to read and
    write TCAM entries while system is active.

10
Lookup Miscellany
  • Bugs: No known bugs
  • Testing
    • Minimal testing done so far
    • Some simple functional tests to show distribution of packets across all output ports based on Key fields for each of the three projects.
    • More complete test plan needed.
  • Still To Do
    • Add information on how to configure Filters for Lookup engine.
    • Handle init_done signal from Rx
    • Turn on optimizer
    • Substrate only lookup for IPv4_MR GPE?NPE pkts
    • Add second database to IPv4 MR
      • DB1: GM/EM Database
      • DB2: Route Lookup
    • LD bit in Lookup Result
    • Clean up definition of DB Ids.
    • Consider making Lookup code one common file with ifdefs to differentiate
    • Consider removing ifdef DONE_BIT_FIX code
      • Refers to a Done bit bug in the Dual Port QDR (which is what we have)
      • I have not seen this bug mentioned anywhere else.

11
TCAM Entries in Simulation
  • Four Parts to a TCAM Entry in simulation
    • dbindex
      • Slot in database occupied by entry.
      • Start at 0
      • Incremented by 1 for each entry
      • Not dependent on size
    • core
      • What is matched against a provided key
    • mask
      • Indicates what part of the entry (core) has to match the supplied key to give a hit
    • data
      • Results data
  • Configuration and Database Entry files
    • src/applications/LC_Ingress/build/PL/LCI_config.txt
    • LC_Ingress_Database_64bKey_64bResult_BothQM.txt
    • src/applications/LC_Egress/build/PL/LCE_config.txt
    • LC_Egress_Database_24bKey_64bResult.txt
    • src/applications/IPv4_MR/build/PL/IPv4_config.txt
    • GM_Database_144Key_128bResult.txt

12
TCAM Entries in Simulation
  • LC Ingress Database entry from file src/applications/LC_Ingress/build/PL/LC_Ingress_Database_64bKey_64bResult_BothQM.txt
    • dbindex: 0x0
    • core: 0x51C0A80002110001
      • SL Type: 0x5
      • Port: 1
      • IP DA: 192.168.0.2
      • IP Proto: 17 (UDP)
      • UDP DPort: 0x0001
    • mask: 0xf0ffffffffffffff
      • Exact match on everything except Port, which is wildcarded
    • data: 0x0001004A01100001
      • VLAN (16b): 0x0001
      • Stats_Index (16b): 74 (0x4A)
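
The core/mask semantics of this entry can be sketched in a few lines of Python. This is only an illustration of ternary matching using the LC Ingress values from this slide. The function name tcam_hit is hypothetical, and the convention that a 1 bit in the mask means "must match" is an assumption inferred from the example (the mask clears only the Port nibble); neither comes from the IDT documentation.

```python
# Hypothetical sketch of ternary (core/mask) matching; not the IDT API.
CORE = 0x51C0A80002110001  # SL Type 5, Port 1, IP DA 192.168.0.2, UDP, DPort 1
MASK = 0xF0FFFFFFFFFFFFFF  # assumed: 1 bits must match, 0 bits are wildcarded


def tcam_hit(key: int, core: int = CORE, mask: int = MASK) -> bool:
    """Return True when key equals core on every bit selected by mask."""
    return (key & mask) == (core & mask)


# Same fields as the entry but arriving on Port 3: the Port nibble is ignored.
key_port3 = 0x53C0A80002110001
# Same as the entry except UDP DPort 2: an unmasked field differs.
key_dport2 = 0x51C0A80002110002

print(tcam_hit(CORE), tcam_hit(key_port3), tcam_hit(key_dport2))
```

Under these assumptions the first two keys hit and the third misses, which is the behavior "exact match everything, except wildcard Port" describes.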

13
TCAM Entries in Simulation
  • IPv4 MR Database entry from file src/applications/IPv4_MR/build/PL/GM_Database_144Key_128bResult.txt
    • dbindex: 0x0
    • core: 0x0AAA0002C0A84001C0A82002000100020011
      • MR ID (VLAN): 0x0AAA
      • UDP DPort: 0x0002
      • IP DA: 192.168.64.1
      • IP SA: 192.168.32.2
      • TCP/UDP SPort: 0x0001
      • TCP/UDP DPort: 0x0002
      • TCP_FLAGS_Proto: 0x0011 (Proto UDP, no TCP Flags)
    • mask: 0xffffffffffffffffffffffffffffffffffff
      • Exact match everything
    • data: 0x0000003780FC99F95555666601000001
      • Reserved (3b), Drop Bit (1b)
      • Reserved (12b)
      • Cntr_Index (16b): 55 (0x37)
      • Tx IP DAddr: 128.252.153.249
      • Tx UDP Dport: 0x5555
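
As a cross-check of the field breakdown above, the 144-bit core can be split back into its fields. The sketch below is illustrative only: the field names and the helper decode_key are hypothetical, and the field order and widths are assumed from the list on this slide (seven fields totalling 144 bits, most significant first).

```python
# Hypothetical helper: split the 144-bit IPv4 MR key into its named fields.
import ipaddress

CORE = 0x0AAA0002C0A84001C0A82002000100020011

# (name, width in bits), most significant field first; widths sum to 144.
FIELDS = [
    ("mr_id", 16), ("rx_udp_dport", 16), ("ip_da", 32), ("ip_sa", 32),
    ("sport", 16), ("dport", 16), ("flags_proto", 16),
]


def decode_key(core: int) -> dict:
    """Peel fields off the low end of the key, walking the list in reverse."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = core & ((1 << width) - 1)
        core >>= width
    return out


key = decode_key(CORE)
print(hex(key["mr_id"]), ipaddress.IPv4Address(key["ip_da"]),
      ipaddress.IPv4Address(key["ip_sa"]), hex(key["flags_proto"]))
```

Decoding this way recovers the MR ID, the two IP addresses and the protocol word listed for the entry.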

14
TCAM Entries in Simulation
  • LC Egress Database entry from file src/applications/LC_Egress/build/PL/LC_Egress_Database_24bKey_64bResult.txt
    • dbindex: 0x0
    • core: 0x11000100
      • IP Proto (8b): 0x11 (UDP)
      • UDP SPort (16b): 1
      • Rsvd (8b): 0
    • mask: 0xffffffff
      • Exact match
    • data: 0x000101000021
      • Rsvd (4b): 0
      • VLAN (12b): 0x001
      • Rsvd (4b): 0
      • Port (4b): 1
      • Rsvd (4b)
      • QID (20b): 33 (0x00021)

15
Basics of TCAM Operation
  • Instruction is given to TCAM as an SRAM write
  • Address bus gives instruction
    • 4 Direct Instructions
      • Lookup: This is all we use right now.
      • MultiHit Lookup (MHL) or Simultaneous Multi-Database Lookup
        • Which one is determined by a bit in a config register
      • Preload
      • Indirect: Uses data field to specify subinstruction
  • Data bus gives
    • Subinstruction for Indirect instructions (there are 16 subinstructions)
    • Data for all instructions
      • Our lookup keys go here.
  • Example: IPv4 MR Lookup (Key of 144 bits in 5 words)
    • Load xfer registers $w00, $w01, $w02, $w03, $w04 with the lookup key
    • sram[write, $w00, iipc_base_word, iipc_command_word, 5]
      • More about iipc_base_word and iipc_command_word later
      • 5 = number of data words needed for key
  • Result is read back from the Context's Results Mailbox
    • This is an SRAM read, not a TCAM Read instruction.
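
The word count passed to the SRAM write follows from the key width. A minimal sketch of that rounding, assuming 32-bit transfer-register words (consistent with "144 bits in 5 words") and taking the 64-bit and 24-bit key widths from the LC Ingress and LC Egress database file names; the helper name key_words is hypothetical:

```python
# Sketch: number of 32-bit transfer-register words needed for a lookup key.
WORD_BITS = 32  # assumed word size, consistent with "144 bits in 5 words"


def key_words(key_bits: int) -> int:
    """Round the key width up to a whole number of 32-bit words."""
    return (key_bits + WORD_BITS - 1) // WORD_BITS


for name, bits in [("IPv4 MR", 144), ("LC Ingress", 64), ("LC Egress", 24)]:
    print(name, bits, key_words(bits))
```

This gives 5, 2 and 1 words, which matches the SRAM Write sizes (5W, 2W, 1W) shown in the three block diagrams.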

16
LC Ingress Lookup
[Figure: LC Ingress blocks: Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Switch Tx, SWITCH]
  • Main functions
    • Perform TCAM Lookup
    • Pass Through Data
      • Buf Handle
      • IP Pkt Length and Ethernet Header Length
  • Single code path with possible loop around Result Read
  • NN communication
  • Uses 8 threads

17
LC Ingress Lookup Block Interfaces
[Figure: LC Ingress Lookup block interfaces. Blocks shown: Phy Int Rx, Key Extract, Lookup, Hdr Format, Switch Tx, SWITCH.]
  • Input fields: Buf Handle (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b); Lookup Key 63-32 (32b); Lookup Key 31-0 (32b)
  • Output fields: Buf Handle (32b); IP Pkt Length (16b), Reserved (8b), Eth Hdr Len (8b); VLAN (16b), Stats Index (16b); Rsvd (4b), QID (20b), DAddr (8b), Port (4b)
  • Lookup Key fields: D_Addr 31:8 (24b), SL (4b), Port (4b), D_Addr 7:0 (8b), UDP DPort (16b), Protocol (8b)
  • Lookup Result fields: VLAN (16b), Stats Index (16b), Rsvd (4b), QID (20b), DAddr (8b), Port (4b)
18
LC Ingress Lookup Block Diagram
[Figure: LC Ingress Lookup block diagram. Steps shown: dl_source() (NN Dequeue, 4W), Load Xfer Regs, Send Lookup Request (SRAM Write 2W), TimeStamp Delay (ctx_swap), Read Result (SRAM Read 2W, ctx_swap), Check Done Bit, Reformat Output, dl_sink() (NN Enqueue, 4W). Thread ordering uses an init signal, "Wait for prev ctx" and "Signal next ctx". Legend: mem access.]
19
IPv4 MR Lookup
  • Main functions
    • Perform TCAM Lookup
    • Pass Through Data
      • Buf Handle
      • IP Pkt Length and Offset
      • Slice Data Ptr
      • Exception Bits
  • Single code path with possible loop around Result Read
  • NN communication
  • Uses 8 threads

20
IPv4 MR Lookup Block Interfaces
[Figure: IPv4 MR Lookup block interfaces. Blocks shown: Rx, DeMux, Parse, Lookup, Header Format, Tx.]
  • Fields shown: Buf Handle (32b); IP Pkt Length (16b); IP Pkt Offset (16b); Rx UDP DPort (16b); Slice ID (VLAN) (16b); Cntr Index (16b); Rsvd (1b); D (1b); H (1b); Exception Bits (12b); LD (1b); Tx IP DAddr (32b); Tx UDP SPort (16b); Tx UDP DPort (16b); Port (4b); QID (20b); DA (8b); Slice Data Ptr (32b); Reserved (28b); Code (4b)
  • Lookup Key (144b): Slice ID/Rx UDP DPort (32b), IP DAddr (32b), IP SAddr (32b), SPort (16b), DPort (16b), Proto/TCP_Flags (16b)
21
IPv4 MR Functional Block Results
[Figure: IPv4 MR lookup key and result formats.]
  • Lookup Key (144b)
  • Lookup Result (128b), shown both as stored in the TCAM and as given to HF, with TCAM Status Bits
  • Fields shown: Cntr Index (16b), D (1b), Reserved (11b), Done (1b), Hit (1b), MHit (1b), LD (1b), Tx IP DAddr (32b), Tx UDP SPort (16b), Tx UDP DPort (16b), Port (4b), QID (20b), DA (8b)
22
IPv4 MR Lookup Block Diagram
[Figure: IPv4 MR Lookup block diagram. Steps shown: dl_source() (NN Dequeue, 9W), Load Xfer Regs, Send Lookup Request (SRAM Write 5W), TimeStamp Delay (ctx_swap), Read Result (SRAM Read 4W, ctx_swap), Check Done Bit, Reformat Output, dl_sink() (NN Enqueue, 9W). Thread ordering uses an init signal, "Wait for prev ctx" and "Signal next ctx". Legend: mem access.]
23
LC Egress Lookup
[Figure: LC Egress blocks: SWITCH, Switch Rx, Key Extract, Lookup, Hdr Format, QM/Schd, Phy Int Tx]
  • Main functions
    • Perform TCAM Lookup
    • Pass Through Data
      • Buf Handle
      • IP Pkt Length and Ethernet Header Length
      • IP Destination Address
  • Single code path with possible loop around Result Read
  • NN communication
  • Uses 8 threads

24
LC Egress Lookup Block Interfaces
[Figure: LC Egress Lookup block interfaces. Blocks shown: SWITCH, Switch Rx, Key Extract, Lookup, Hdr Format, Phy Int Tx.]
  • Input fields shown: Buf Handle (32b); IP DAddr (32b); Lookup Key: UDP SPort (16b), IP Proto (8b), Reserved (8b)
  • Output fields shown: Buf Handle (32b); IP DAddr (32b); Lookup Result 63-32 (32b); Lookup Result 31-0 (32b)
25
LC Egress Lookup Block Diagram
[Figure: LC Egress Lookup block diagram. Steps shown: dl_source() (NN Dequeue, 4W), Load Xfer Regs, Send Lookup Request (SRAM Write 1W), TimeStamp Delay (ctx_swap), Read Result (SRAM Read 2W, ctx_swap), Check Done Bit, Reformat Output, dl_sink() (NN Enqueue, 5W). Thread ordering uses an init signal, "Wait for prev ctx" and "Signal next ctx". Legend: mem access.]
26
Performance
27
Packet Sizes
28
Cycle Budget (min eth packets)
  • To hit 5 Gb rate
    • 76B per min IPv4 packet (64B min Eth + 12B IFS)
    • 1.4 GHz clock rate
    • 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mp/sec
    • 1.4 Gcycle/sec * 1 sec/8.22 Mp = 170.3 cycles per packet
    • compute budget: 170 cycles
    • latency budget (threads * 170)
      • 8 threads: 1360 cycles
  • To hit 10 Gb rate
    • 76B per min IPv4 packet (64B min Eth + 12B IFS)
    • 1.4 GHz clock rate
    • 10 Gb/sec * 1B/8b * packet/76B = 16.44 Mp/sec
    • 1.4 Gcycle/sec * 1 sec/16.44 Mp = 85.16 cycles per packet
    • compute budget: 85 cycles
    • latency budget (threads * 85)
      • 8 threads: 680 cycles

29
Cycle Budget (IPv4 MN packets)
  • To hit 5 Gb rate
    • 90B per min IPv4 packet (78B min IPv4MN + 12B IFS)
    • 1.4 GHz clock rate
    • 5 Gb/sec * 1B/8b * packet/90B = 6.94 Mp/sec
    • 1.4 Gcycle/sec * 1 sec/6.94 Mp = 201.7 cycles per packet
    • compute budget: 201 cycles
    • latency budget (threads * 201)
      • 8 threads: 1608 cycles
  • To hit 10 Gb rate
    • 90B per min IPv4 packet (78B min IPv4MN + 12B IFS)
    • 1.4 GHz clock rate
    • 10 Gb/sec * 1B/8b * packet/90B = 13.88 Mp/sec
    • 1.4 Gcycle/sec * 1 sec/13.88 Mp = 100.86 cycles per packet
    • compute budget: 100 cycles
    • latency budget (threads * 100)
      • 8 threads: 800 cycles
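
The budget arithmetic on the two Cycle Budget slides can be reproduced with a short script. This is a sketch only; the function name budget and its return shape are hypothetical, while the clock rate, thread count, line rates and packet sizes are the ones given on the slides. Exact arithmetic yields fractional cycle counts that differ slightly from the slides (about 170.2 rather than 170.3, for example) because the slides round the packet rate before dividing; the whole-cycle budgets come out the same.

```python
# Sketch of the cycle-budget arithmetic; names here are illustrative only.
CLOCK_HZ = 1.4e9   # 1.4 GHz microengine clock, from the slides
THREADS = 8        # threads per lookup block, from the slides


def budget(rate_bps: float, wire_bytes: int) -> tuple:
    """Return (Mpackets/sec, compute cycles per packet, latency cycles)."""
    pkts_per_sec = rate_bps / 8 / wire_bytes
    compute = int(CLOCK_HZ / pkts_per_sec)  # whole cycles per packet
    return pkts_per_sec / 1e6, compute, compute * THREADS


for rate in (5e9, 10e9):
    for wire_bytes in (76, 90):  # 64B + 12B IFS, and 78B + 12B IFS
        mpps, compute, latency = budget(rate, wire_bytes)
        print(f"{rate / 1e9:.0f} Gb/s, {wire_bytes}B: "
              f"{mpps:.2f} Mp/s, {compute} cycles, {latency} latency")
```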

30
TCAM Instruction Latency Analysis
  • QDR Clock: 200 MHz, 5 ns period
  • TCAM core Clock: 200 MHz, 5 ns period
  • NPU Clock: 1400 MHz, 0.714 ns period
  • 1 QDR cycle = 1 TCAM cycle = 7 NPU cycles
  • TCAM Lookup Latencies
    • QDR xfer: 1 cycle per word in key
    • Instruction Fifo: constant 2 cycles
    • Synchronizer: constant 3 cycles
    • Execution Latency: fct(key width, output data width)
      • Table in IDT Latency Application Note
    • Re-Synchronizer: constant 1 cycle

31
TCAM Instruction Latency Analysis
  • IPv4 MR
    • Key: 144 bit (5 words)
    • Output data: 128 bit
    • QDR Xfer: 5 cycles
    • Constants: 2 + 3 + 1 = 6 cycles
    • Execution Latency: 36 cycles
    • Total Latency: 47 TCAM cycles (235 ns) (329 NPU cycles)
  • LC Ingress
    • Key: 64 bit (2 words)
    • Output data: 64 bit
    • QDR Xfer: 2 cycles
    • Constants: 2 + 3 + 1 = 6 cycles
    • Execution Latency: 32 cycles
    • Total Latency: 40 TCAM cycles (200 ns) (280 NPU cycles)
  • LC Egress
    • Key: 24 bit (1 word)
    • Output data: 64 bit
    • QDR Xfer: 1 cycle
    • Constants: 2 + 3 + 1 = 6 cycles
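
The totals for IPv4 MR and LC Ingress follow directly from the components listed. A minimal sketch of that sum is below; the function name lookup_latency is hypothetical, and the execution latencies (36 and 32 cycles) are simply the values quoted on this slide, not looked up from the IDT latency table. LC Egress is left out because its execution latency does not appear in this transcript.

```python
# Sketch of the lookup latency sum; constants are the ones listed on the slides.
FIFO, SYNC, RESYNC = 2, 3, 1      # constant TCAM cycles
TCAM_NS = 5                       # 200 MHz TCAM/QDR clock period
NPU_PER_TCAM = 7                  # 1400 MHz NPU clock / 200 MHz TCAM clock


def lookup_latency(key_words: int, exec_cycles: int) -> tuple:
    """Return (TCAM cycles, nanoseconds, NPU cycles) for one lookup."""
    tcam = key_words + FIFO + SYNC + RESYNC + exec_cycles
    return tcam, tcam * TCAM_NS, tcam * NPU_PER_TCAM


print("IPv4 MR   ", lookup_latency(5, 36))
print("LC Ingress", lookup_latency(2, 32))
```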

32
TCAM Performance (Rates in M/sec)
[Chart: data series LC_Egress, LC_Ingress, IPv4 MR]
33
TCAM Performance (Rates in M/sec)
[Chart: data series LC_Egress, LC_Ingress, IPv4 MR]
34
IPv4 Performance Snapshot
[Figure: timing snapshot, 610 cycles. Annotated events: sram write, sram read, Timestamp Delay, dl_sink ctx_arb, dl_sink processing, Timestamp Delay setup, dl_source Xfer reg loads.]
  • IPv4 MR lookup
    • Unloaded
  • ctx_arb vs br_signal optimization
35
IPv4 Performance Snapshot
[Figure: timing snapshot. Writes issued at cycles 33333 and 34016: 34016 - 33333 = 683 cycles.]
  • IPv4 MR lookup
    • Hack to Parse: loop and repeatedly call dl_sink with same buf_handle
      • Should guarantee that there is always something in NN ring for lookup to pick up
    • Hack to HF: set dlNextBlock to IX_DROP
      • Keep Tx from trying to transmit something bad.

36
LC_Ingress Performance Snapshots
>563 Cycles
  • LC Ingress lookup
  • unloaded

37
LC_Ingress Performance Snapshots
[Figure: timing snapshot. Writes issued at cycles 59888 and 60494: 60494 - 59888 = 606 cycles.]
  • LC Ingress lookup
    • Hack to KE stub: loop and repeatedly call dl_sink with same buf_handle
      • Should guarantee that there is always something in NN ring for lookup to pick up
    • Hack to HF stub: set dl_next_block to IX_DROP
      • Keep Tx from trying to transmit something bad.

38
LC_Egress Performance Snapshots
560 Cycles
  • LC Egress lookup
  • Unloaded

39
LC_Egress Performance Snapshots
610 Cycles
  • LC Egress lookup
  • Loaded with KE and HF hacks.

40
Performance Summary
  • Processing Cycles
    • LC Ingress: 41
    • IPv4 MR: 57
    • LC Egress: 43
  • Abort Cycles
    • LC Ingress: 16
    • IPv4 MR: 16
    • LC Egress: 16
  • Latency Cycles
    • LC Ingress: 560 - 57 = 503?
    • IPv4 MR: 610 - 73 = 537?
    • LC Egress: 560 - 59 = 501?
  • Expected performance
    • LC Ingress: 10 Gb/s
    • IPv4 MR: 5 Gb/s
    • LC Egress: 10 Gb/s
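
The Latency Cycles figures appear to be each block's snapshot total minus its processing and abort cycles. The sketch below reproduces that subtraction under this reading; the dictionary and function names are hypothetical, and the inputs are the numbers on this summary slide.

```python
# Sketch: subtract processing and abort cycles from the snapshot totals.
SNAPSHOT = {"LC Ingress": 560, "IPv4 MR": 610, "LC Egress": 560}
PROCESSING = {"LC Ingress": 41, "IPv4 MR": 57, "LC Egress": 43}
ABORT = 16  # same for all three blocks


def latency_cycles(block: str) -> int:
    """Cycles left over after processing and abort cycles are removed."""
    return SNAPSHOT[block] - (PROCESSING[block] + ABORT)


for block in SNAPSHOT:
    print(block, latency_cycles(block))
```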

41
Optimization Possibilities
  • There may still be some code we can move out of the processing loop, or at least between the sram write or read and the ctx swap.
  • dl_sink has a possible improvement.
    • ctx_arb vs. br_signal/br_!signal

42
Extra Slides