An NP-Based Router for the Open Network Lab - PowerPoint PPT Presentation

1
An NP-Based Router for the Open Network Lab
Jon Turner, with Patrick Crowley, John DeHart,
Brandon Heller, Fred Kuhns, Jing Lu, Mike Wilson,
Charlie Wiseman, Dave Zar
2
Issues and Questions
  • Drop counters
  • What is our performance target?
  • 5-port Router, full link rates.
  • How should SRAM banks be allocated?
  • How many packets should be able to be resident in
    system at any given time?
  • How many queues do we need to support?
  • Etc.
  • How will lookups be structured?
  • One operation across multiple DBs vs. multiple
    operations each on one DB
  • Will results be stored in Associated Data SRAM or
    in one of our SRAM banks?
  • Can we use SRAM Bank0 and still get the
    throughput we want?
  • Multicast
  • Are we defining how an ONL user should implement
    multicast?
  • Or are we just trying to provide some mechanisms
    to allow ONL users to experiment with multicast?
  • Do we need to allow a Unicast lookup with one
    copy going out and one copy going to a plugin?
  • If so, this would use the NH_MAC field and the
    copy vector field
  • Plugins
  • Can they send pkts directly to the QM instead of
    always going back through Parse/Lookup/Copy?
  • Use of NN rings between Plugins to do plugin
    chaining

3
Issues and Questions
  • XScale
  • Can it send pkts directly to the QM instead of
    always going through Parse/Lookup/Copy path?
  • ARP request and reply?
  • What else will it do besides handling ARP?
  • Do we need to guarantee in-order delivery of
    packets for a flow that triggers an ARP
    operation?
  • Re-injected packet may be behind a recently
    arrived packet for same flow.
  • What is the format of our Buffer Descriptor?
  • Add Reference Count (4 bits)
  • Add MAC DAddr (48 bits)
  • Does the Packet Size or Offset ever change once
    written?
  • Plugins: Can they change the packet?
  • Other?
  • How will we write L2 Headers for multicast
    packets?
  • If we are going to do this for multicast, we will
    do it for all packets, right?
  • Copy writes MAC DAddr into Buffer descriptor
  • HF reads MAC DAddr from Buffer descriptor
  • HF writes full L2 Header into scratch ring data
    for Tx
  • Tx takes L2 Header data (14 Bytes) from scratch
    ring and writes it to TBUF
  • Tx initiates transfer of the rest of the packet
    from DRAM to TBUF
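
A minimal sketch of the HF step in the flow above, assuming a hypothetical 4-word scratch-ring message; the names and packing below are illustrative, not the actual ONL code:

#include <stdint.h>
#include <string.h>

/* Hypothetical 4-word scratch-ring message carrying the 14-byte L2 header for Tx. */
struct l2_hdr_msg {
    uint32_t w[4];                          /* 14 header bytes in 3.5 32-bit words */
};

/* Illustrative only: pack the dst MAC taken from the buffer descriptor, the
 * per-port src MAC and the EtherType into the message HF hands to Tx. */
static void hf_build_l2_msg(struct l2_hdr_msg *m,
                            const uint8_t dst_mac[6],
                            const uint8_t src_mac[6],
                            uint16_t ethertype)
{
    uint8_t hdr[16] = {0};
    memcpy(&hdr[0], dst_mac, 6);            /* bytes 0-5: MAC DAddr written by Copy */
    memcpy(&hdr[6], src_mac, 6);            /* bytes 6-11: source MAC for the port  */
    hdr[12] = (uint8_t)(ethertype >> 8);    /* bytes 12-13: EtherType               */
    hdr[13] = (uint8_t)(ethertype & 0xff);
    memcpy(m->w, hdr, sizeof(m->w));        /* Tx copies these 14B into the TBUF    */
}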

4
Issues and Questions
  • How will we manage the Free list?
  • Support for Multicast (ref count in buf desc)
    makes reclaiming buffers a little trickier.
  • Scratch ring to Separate ME
  • Modify dl_buf_drop()
  • Performance assumptions of blocks that do drops
    may have to be changed if we add an SRAM
    operation to a drop
  • Note: the test_and_decr SRAM atomic operation
    returns the pre-modified value (see the sketch
    after this list)
  • Usage Scenarios
  • It would be good to document some typical ONL
    usage examples.
  • This might just be extracting some stuff from
    existing ONL documentation and class projects.
  • Ken?
  • It might also be good to document a JST dream
    sequence for an ONL experiment
  • Oh my, what have I done now?
  • Do we need to worry about balancing MEs across
    the two clusters?
  • QM and Lookup are probably heaviest SRAM users
  • Rx and Tx are probably heaviest DRAM users.
  • Plugins need to be in neighboring MEs
  • QM and HF need to be in neighboring MEs
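
A sketch of the modified dl_buf_drop() discussed at the top of this list; sram_test_and_decr() and freelist_mgr_enqueue() are illustrative stand-ins for the SRAM test-and-decrement (which, as noted, returns the pre-modified value) and the scratch ring to the Freelist Mgr ME:

#include <stdint.h>

/* Illustrative stand-ins for hardware/firmware services. */
extern uint32_t sram_test_and_decr(uint32_t ref_cnt_addr);  /* returns value BEFORE decrement */
extern void     freelist_mgr_enqueue(uint32_t buf_handle);  /* scratch ring to Freelist Mgr ME */

/* Release one reference to a possibly-multicast buffer; only the holder of the
 * last reference hands the buffer back for reclamation. */
void dl_buf_drop(uint32_t buf_handle, uint32_t ref_cnt_addr)
{
    uint32_t before = sram_test_and_decr(ref_cnt_addr);
    if (before <= 1)
        freelist_mgr_enqueue(buf_handle);   /* last copy gone: buffer can be freed */
    /* Otherwise other copies are still in flight; note this adds one SRAM
     * operation to every drop, which affects the drop-path performance budget. */
}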

5
Performance
  • What is our performance target?
  • To hit 5 Gb/s rate:
  • Minimum Ethernet frame = 76B
  • 64B frame + 12B InterFrame Spacing
  • 5 Gb/sec × 1B/8b × packet/76B = 8.22 Mpkt/sec
  • IXP ME processing
  • 1.4 GHz clock rate
  • 1.4 Gcycle/sec × 1 sec/8.22 Mpkt = 170.3 cycles
    per packet
  • compute budget (MEs × 170)
  • 1 ME = 170 cycles
  • 2 MEs = 340 cycles
  • 3 MEs = 510 cycles
  • 4 MEs = 680 cycles
  • latency budget (threads × 170)
  • 1 ME = 8 threads = 1360 cycles
  • 2 MEs = 16 threads = 2720 cycles
  • 3 MEs = 24 threads = 4080 cycles
  • 4 MEs = 32 threads = 5440 cycles
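
Restating the arithmetic above as one derivation (same numbers as the bullets):

\[ R = \frac{5\ \mathrm{Gb/s}}{76\ \mathrm{B/pkt}\times 8\ \mathrm{b/B}} \approx 8.22\ \mathrm{Mpkt/s},\qquad
   \frac{1.4\times 10^{9}\ \mathrm{cycles/s}}{8.22\times 10^{6}\ \mathrm{pkt/s}} \approx 170.3\ \mathrm{cycles/pkt} \]
\[ \text{compute budget} \approx 170\,N_{\mathrm{ME}}\ \text{cycles},\qquad
   \text{latency budget} \approx 170 \times 8\,N_{\mathrm{ME}}\ \text{cycles (8 threads per ME)} \]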

6
ONL NP Router (Jon's Original)
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (3 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (2 ME), Stats (1 ME), xScale, TCAM, SRAM; annotation: add large SRAM ring]
  • Each output has common set of QiDs
  • Multicast copies use same QiD for all outputs
  • QiD ignored for plugin copies

[Diagram, continued: 5 Plugins, SRAM, xScale, large SRAM ring]
7
Design Configuration
  • Add NN rings between Plugins for chaining
  • Add Plugin write to QM Scratch Ring
  • Tx is only 1 ME
  • Add Freelist Mgr ME

8
ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), Plugin1-Plugin5 chained with NN rings, xScale, TCAM, Assoc. Data ZBT-SRAM, SRAM; interconnects: scratch rings, SRAM rings, NN rings]
9
ONL Buffer Descriptor
  LW0: Buffer_Next (32b)
  LW1: Buffer_Size (16b) | Offset (16b)
  LW2: Packet_Size (16b) | Reserved (12b) | Free_list 0000 (4b)
  LW3: MAC DAddr_47_32 (16b) | Stats Index (16b)
  LW4: MAC DAddr_31_00 (32b)
  LW5: Reserved (28b) | Ref_Cnt (4b)
  LW6: Reserved (32b)
  LW7: Packet_Next (32b)
  Fields written by: Freelist Mgr, Rx, Copy, QM
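
As rough documentation of the layout above, a C struct along these lines could describe the descriptor; bitfield packing is compiler-dependent and the grouping simply follows the slide, so this is illustrative rather than the actual definition:

#include <stdint.h>

/* ONL buffer descriptor: one 32-bit longword per LW0..LW7 (32 bytes). */
struct onl_buf_desc {
    uint32_t buffer_next;                   /* LW0 */
    uint32_t buffer_size  : 16,             /* LW1 */
             offset       : 16;
    uint32_t packet_size  : 16,             /* LW2 */
             reserved2    : 12,
             free_list    : 4;
    uint32_t mac_daddr_hi : 16,             /* LW3: MAC DAddr bits 47:32 */
             stats_index  : 16;
    uint32_t mac_daddr_lo;                  /* LW4: MAC DAddr bits 31:0  */
    uint32_t reserved5    : 28,             /* LW5 */
             ref_cnt      : 4;              /* multicast reference count */
    uint32_t reserved6;                     /* LW6 */
    uint32_t packet_next;                   /* LW7 */
};
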
10
MR Buffer Descriptor
  LW0: Buffer_Next (32b)
  LW1: Buffer_Size (16b) | Offset (16b)
  LW2: Packet_Size (16b) | Reserved (8b) | Free_list 0000 (4b) | Reserved (4b)
  LW3: Reserved (16b) | Stats Index (16b)
  LW4: Reserved (16b) | Reserved (8b) | Reserved (4b) | Reserved (4b)
  LW5: Reserved (32b) | Reserved (4b) | Reserved (4b)
  LW6: Reserved (16b) | Reserved (16b)
  LW7: Packet_Next (32b)
11
Intel Buffer Descriptor
  LW0: Buffer_Next (32b)
  LW1: Buffer_Size (16b) | Offset (16b)
  LW2: Packet_Size (16b) | Hdr_Type (8b) | Free_list (4b) | Rx_stat (4b)
  LW3: Input_Port (16b) | Output_Port (16b)
  LW4: Next_Hop_ID (16b) | Fabric_Port (8b) | Reserved (4b) | NHID type (4b)
  LW5: FlowID (32b) | ColorID (4b) | Reserved (4b)
  LW6: Class_ID (16b) | Reserved (16b)
  LW7: Packet_Next (32b)
12
SRAM Usage
  • What will be using SRAM?
  • Buffer descriptors
  • Current MR supports 229,376 buffers
  • 32 Bytes per SRAM buffer descriptor
  • 7 MBytes
  • Queue Descriptors
  • Current MR supports 65536 queues
  • 16 Bytes per Queue Descriptor
  • 1 MByte
  • Queue Parameters
  • 16 Bytes per Queue Params (actually only 12 used
    in SRAM)
  • 1 MByte
  • QM Scheduling structure
  • Current MR supports 13109 batch buffers per QM ME
  • 44 Bytes per batch buffer
  • 576,796 Bytes
  • QM Port Rates
  • 4 Bytes per port
  • Plugin scratch memory
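
Checking the sizes quoted above:

\[ 229{,}376 \times 32\ \mathrm{B} = 7{,}340{,}032\ \mathrm{B} \approx 7\ \mathrm{MB},\qquad
   65{,}536 \times 16\ \mathrm{B} = 1\ \mathrm{MB},\qquad
   13{,}109 \times 44\ \mathrm{B} = 576{,}796\ \mathrm{B} \]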

13
SRAM Bank Allocation
  • SRAM Banks
  • Bank 0
  • 4 MB
  • Same interface/bus as TCAM
  • Banks 1-3
  • 8 MB each
  • Criteria for how SRAM banks should be allocated?
  • Size
  • SRAM Bandwidth
  • How many SRAM accesses per packet are needed for
    the various SRAM uses?
  • QM needs buffer desc and queue desc in same bank

14
SRAM Accesses Per Packet
  • To support 8.22 M pkts/sec we can have 24 Reads
    and 24 Writes per pkt (200M/8.22M)
  • Rx
  • SRAM Dequeue (1 Word)
  • To retrieve a buffer descriptor from free list
  • Write buffer desc (2 Words)
  • Parse
  • Lookup
  • TCAM Operations
  • Reading Results
  • Copy
  • Write buffer desc (3 Words)
  • Ref_cnt
  • MAC DAddr
  • Stats Index
  • Pre-Q stats increments
  • Read 2 Words
  • Write 2 Words
  • HF
  • Should not need to read or write any of the
    buffer descriptor

15
QM SRAM Accesses Per Packet
  • QM (Worst case analysis)
  • Enqueue (assume queue is idle and not loaded in
    Q-Array)
  • Write Q-Desc (4 Words)
  • Eviction of Least Recently Used Queue
  • Write Q-Params ?
  • When we evict a Q do we need to write its params
    back?
  • The Q-Length is the only thing that the QM is
    changing.
  • Looks like it writes it back every time it
    enqueues or dequeues
  • AND it writes it back when it evicts (we can
    probably remove the one when it evicts)
  • Read Q-Desc (4 Words)
  • Read Q-Params (3 Words)
  • Q-Length, Threshold, Quantum
  • Write Q-Length (1 Word)
  • SRAM Enqueue -- Write (1 Word)
  • Scheduling structure accesses?
  • They are done once every 5 pkts (when running
    full rate)
  • Dequeue (assume queue is not loaded in Q-Array)
  • Write Q-Desc (4 Words)
  • Write Q-Params ?

16
QM SRAM Accesses Per Packet
  • QM (Worst case analysis)
  • Total Per Pkt accesses
  • Queue Descriptors and Buffer Enq/Deq
  • Write 9 Words
  • Read 9 Words
  • Queue Params
  • Write 2 Words
  • Read 6 Words
  • Scheduling Structure Accesses Per Iteration
    (batch of 5 packets)
  • Advance Head: Read 11 Words
  • Write Tail: Write 11 Words
  • Update Freelist
  • Read 2 Words
  • OR
  • Write 5 Words

17
Proposed SRAM Bank Allocation
  • SRAM Bank 0
  • TCAM
  • Lookup Results
  • SRAM Bank 1 (2.5MB/8MB)
  • QM Queue Params (1MB)
  • QM Scheduling Struct (0.5 MB)
  • QM Port Rates (20B)
  • Large Inter-Block Rings (1MB)
  • SRAM Rings are of sizes (in Words): 0.5K, 1K, 2K,
    4K, 8K, 16K, 32K, 64K
  • Rx → Mux (2 Words per pkt): 32KW (16K pkts) =
    128KB
  • → Plugin (3 Words per pkt): 32KW each (10K pkts
    each) = 640KB
  • Plugin → (3 Words per pkt): 64KW (20K pkts) =
    256KB
  • SRAM Bank 2 (8MB/8MB)
  • Buffer Descriptors (7MB)
  • Queue Descriptors (1MB)
  • SRAM Bank 3 (6MB/8MB)
  • Stats Counters (1MB)
  • Plugin scratch memory (5MB, 1MB per plugin)
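
One way to capture this proposal is a small configuration header; only the sizes come from the slide, while the macro names and the ring direction names (which follow the list above) are illustrative:

/* Proposed SRAM bank allocation; sizes from the slide, macro names illustrative. */
#define SRAM_BANK0_BYTES            (4 * 1024 * 1024)   /* shared with TCAM: lookup results */
#define SRAM_BANK1_BYTES            (8 * 1024 * 1024)
#define SRAM_BANK2_BYTES            (8 * 1024 * 1024)
#define SRAM_BANK3_BYTES            (8 * 1024 * 1024)

/* Bank 1 (about 2.5MB used) */
#define B1_QM_QPARAMS_BYTES         (1 * 1024 * 1024)
#define B1_QM_SCHED_BYTES           (512 * 1024)
#define B1_QM_PORT_RATES_BYTES      (20)
#define B1_RING_RX_TO_MUX_WORDS     (32 * 1024)         /* 2 words/pkt, 128KB            */
#define B1_RING_TO_PLUGIN_WORDS     (32 * 1024)         /* per plugin, 5 rings = 640KB   */
#define B1_RING_FROM_PLUGIN_WORDS   (64 * 1024)         /* 3 words/pkt, 256KB            */

/* Bank 2 (8MB used) */
#define B2_BUF_DESC_BYTES           (7 * 1024 * 1024)
#define B2_QUEUE_DESC_BYTES         (1 * 1024 * 1024)

/* Bank 3 (about 6MB used) */
#define B3_STATS_BYTES              (1 * 1024 * 1024)
#define B3_PLUGIN_SCRATCH_BYTES     (5 * 1024 * 1024)   /* 1MB per plugin */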

18
Lookups
  • How will lookups be structured?
  • Three Databases
  • Route Lookup Containing Unicast and Multicast
    Entries
  • Unicast
  • Port: Can be wildcarded
  • Longest Prefix Match on DAddr
  • Routes should be sorted in the DB with longest
    prefixes first.
  • Multicast
  • Port: Can be wildcarded?
  • Exact Match on DAddr
  • Longest Prefix Match on SAddr
  • Routes should be sorted in the DB with longest
    prefixes first.
  • Primary Filter
  • Filters should be sorted in the DB with higher
    priority filters first
  • Auxiliary Filter
  • Filters should be sorted in the DB with higher
    priority filters first
  • Will results be stored in Associated Data SRAM or
    in one of our external SRAM banks?
  • Can we use SRAM Bank0 and still get the
    throughput we want?
  • Priority between Primary Filter and Route Lookup

19
TCAM Operations for Lookups
  • Five TCAM Operations of interest
  • Lookup (Direct)
  • 1 DB, 1 Result
  • Multi-Hit Lookup (MHL) (Direct)
  • 1 DB, < 8 Results
  • Simultaneous Multi-Database Lookup (SMDL)
    (Direct)
  • 2 DB, 1 Result Each
  • DBs must be consecutive!
  • Care must be given when assigning segments to DBs
    that use this operation. There must be a clean
    separation of even and odd DBs and segments.
  • Multi-Database Lookup (MDL) (Indirect)
  • < 8 DB, 1 Result Each
  • Simultaneous Multi-Database Lookup (SMDL)
    (Indirect)
  • 2 DB, 1 Result Each
  • Functionally same as Direct version but key
    presentation and DB selection are different.
  • DBs need not be consecutive.
  • Care must be given when assigning segments to DBs
    that use this operation. There must be a clean
    separation of even and odd DBs and segments.

20
Lookups
  • Route Lookup
  • Key (68b)
  • Port/Plugin (4b)
  • Can be a wildcard for Unicast.
  • Probably can't be a wildcard for Multicast
  • DAddr (32b)
  • Prefixed for Unicast
  • Exact Match for Multicast
  • SAddr (32b)
  • Unicast entries always have this and its mask = 0
  • Prefixed for Multicast
  • Result (72b)
  • Port/Plugin (4b)
  • One of 5 ports or 5 plugins.
  • QID (17b)
  • NH_IP/NH_MAC/CopyVector (48b)
  • At most one of NH_IP, NH_MAC or CopyVector should
    be valid
  • Valid Bits (3b)
  • At most one of the following three bits should be
    set
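
A sketch of how the 68-bit key and 72-bit result might be packed; the slide does not specify field order or word packing, so the layout below is an assumption:

#include <stdint.h>

/* Route lookup key: 68 significant bits carried in three 32-bit words. */
struct route_key {
    uint32_t port_plugin : 4;    /* port or plugin; wildcardable for unicast      */
    uint32_t pad         : 28;
    uint32_t daddr;              /* prefixed for unicast, exact for multicast     */
    uint32_t saddr;              /* mask 0 for unicast, prefixed for multicast    */
};

/* Route lookup result: 72 significant bits (padded to three words here). */
struct route_result {
    uint32_t port_plugin : 4;    /* one of 5 ports or 5 plugins                   */
    uint32_t qid         : 17;
    uint32_t valid_bits  : 3;    /* at most one of NH_IP / NH_MAC / CopyVector    */
    uint32_t pad         : 8;
    uint32_t nh_hi;              /* NH_IP (32b), or upper bits of NH_MAC/CopyVector */
    uint32_t nh_lo : 16,         /* remaining bits of the 48b NH field            */
             pad2  : 16;
};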

21
Lookups
  • Filter Lookup
  • Key (136b)
  • Port/Plugin (4b)
  • Can be a wildcard for Unicast.
  • Probably can't be a wildcard for Multicast
  • DAddr (32b)
  • SAddr (32b)
  • Protocol (8b)
  • DPort (16b)
  • SPort (16b)
  • TCP Flags (12b)
  • Exception Bits (16b)
  • Allow for directing of packets based on defined
    exceptions
  • Result (84b)
  • Port/Plugin (4b)
  • NH IP(32b)/MAC(48b)/CopyVector(10b) (48b)
  • At most one of NH_IP, NH_MAC or CopyVector should
    be valid
  • QID (17b)
  • LD (1b): Send to XScale

22
TCAM Core Lookup Performance
[Diagram: Routes and Filters databases]
  • Lookup/Core size of 72 or 144 bits, Freq = 200MHz
  • CAM Core can support 100M searches per second
  • For 1 Router on each of NPUA and NPUB
  • 8.22 MPkt/s per Router
  • 3 Searches per Pkt (Primary Filter, Aux Filter,
    Route Lookup)
  • Total Per Router = 24.66 M Searches per second
  • TCAM Total = 49.32 M Searches per second
  • So, the CAM Core can keep up
  • Now let's look at the LA-1 Interfaces

23
TCAM LA-1 Interface Lookup Performance
[Diagram: Routes and Filters databases]
  • Lookup/Core size of 144 bits (ignore for now that
    Route size is smaller)
  • Each LA-1 interface can support 40M searches per
    second.
  • For 1 Router on each of NPUA and NPUB (each NPU
    uses a separate LA-1 Intf)
  • 8.22 MPkt/s per Router
  • Maximum of 3 Searches per Pkt (Primary Filter,
    Aux Filter, Route Lookup)
  • Max of 3 assumes they are each done as a separate
    operation
  • Total Per Interface = 24.66 M Searches per second
  • So, the LA-1 Interfaces can keep up
  • Now let's look at the AD SRAM Results

24
TCAM Assoc. Data SRAM Results Performance
  • 8.22M 72b or 144b lookups
  • 32b results consume 1/12 of the available AD SRAM
    bandwidth
  • 64b results consume 1/6
  • 128b results consume 1/3

[Diagram: Routes and Filters databases]
  • Lookup/Core size of 72 or 144 bits, Freq = 200MHz,
    SRAM Result Size of 128 bits
  • Associated SRAM can support up to 25M searches
    per second.
  • For 1 Router on each of NPUA and NPUB
  • 8.22 MPkt/s per Router
  • 3 Searches per Pkt (Primary Filter, Aux Filter,
    Route Lookup)
  • Total Per Router = 24.66 M Searches per second
  • TCAM Total = 49.32 M Searches per second
  • So, the Associated Data SRAM can NOT keep up
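
Pulling the three throughput checks (slides 22-24) together:

\[ 8.22 \times 3 = 24.66\ \mathrm{Msearch/s\ per\ router},\qquad 2 \times 24.66 = 49.32\ \mathrm{Msearch/s\ total} \]
\[ \mathrm{CAM\ core:}\ 100 \ge 49.32\ (\mathrm{ok}),\qquad
   \mathrm{LA\text{-}1\ per\ interface:}\ 40 \ge 24.66\ (\mathrm{ok}),\qquad
   \mathrm{AD\ SRAM:}\ 25 < 49.32\ (\mathrm{not\ ok}) \]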

25
Lookups Proposed Design
  • Use SRAM Bank 0 (4 MB) for all Results
  • B0 Byte Address Range: 0x000000 - 0x3FFFFF
  • 22 bits
  • B0 Word Address Range: 0x000000 - 0x3FFFFC
  • 20 bits
  • Two trailing 0s
  • Use 32-bit Associated Data SRAM result for
    Address of actual Result
  • Done 1b
  • Hit 1b
  • MHit 1b
  • Priority 8b
  • Present for Primary Filters, for RL and Aux
    Filters should be 0
  • SRAM B0 Word Address 21b
  • 1 spare bit
  • Use Multi-Database Lookup (MDL) Indirect for
    searching all 3 DBs
  • Order of fields in Key is important.
  • Each thread will need one TCAM context
  • Route DB
  • Lookup Size 68b (3 32b words transferred across
    QDR intf)
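
A sketch of decoding the proposed 32-bit AD result into a Bank 0 address; exact bit positions are not given on the slide (and the listed widths slightly over-fill 32 bits), so the shifts and masks below are assumptions:

#include <stdint.h>

/* Assumed layout, MSB to LSB: Done(1) | Hit(1) | MHit(1) | Priority(8) | B0 word address(21). */
#define AD_DONE(r)      (((r) >> 31) & 0x1)
#define AD_HIT(r)       (((r) >> 30) & 0x1)
#define AD_MHIT(r)      (((r) >> 29) & 0x1)
#define AD_PRIORITY(r)  (((r) >> 21) & 0xFF)    /* meaningful for primary filters only */
#define AD_B0_WADDR(r)  ((r) & 0x1FFFFF)        /* word address within SRAM Bank 0     */

/* Convert the word address carried in the AD result to a Bank 0 byte address. */
static inline uint32_t ad_result_byte_addr(uint32_t ad_result)
{
    return AD_B0_WADDR(ad_result) << 2;         /* two trailing zero bits */
}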

26
Lookups Latency
  • Three searches in one MDL Indirect Operation
  • Latencies for operation:
  • QDR xfer time: 6 clock cycles
  • 1 for MDL Indirect subinstruction
  • 5 for 144 bit key transferred across QDR Bus
  • Instruction FIFO: 2 clock cycles
  • Synchronizer: 3 clock cycles
  • Execution latency: search dependent
  • Re-Synchronizer: 1 clock cycle
  • Total: 12 clock cycles

27
Lookups Latency
  • 144 bit DB, 32 bits of AD (two of these)
  • Instruction Latency: 30
  • Core blocking delay: 2
  • Backend latency: 8
  • 72 bit DB, 32 bits of AD
  • Instruction Latency: 30
  • Core blocking delay: 2
  • Backend latency: 8
  • Latency of first search (144 bit DB)
  • 11 + 30 = 41 clock cycles
  • Latency of subsequent searches
  • (previous search latency) - (backend latency of
    previous search) + (core block delay of previous
    search) + (backend latency of this search)
  • Latency of second 144 bit search
  • 41 - 8 + 2 + 8 = 43
  • Latency of third search (72 bit)
  • 43 - 8 + 2 + 8 = 45 clock cycles
  • 45 QDR Clock cycles (200 MHz clock) → 315 IXP
    Clock cycles (1400 MHz clock)
  • This is JUST for the TCAM operation; we also need
    to read the SRAM
  • SRAM Read to retrieve TCAM Results Mailbox (3
    words, one per search)
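
The rule above written as a recurrence, with L_i the latency of search i, B_i its backend latency and C_i its core blocking delay:

\[ L_1 = 11 + 30 = 41,\qquad L_i = L_{i-1} - B_{i-1} + C_{i-1} + B_i \]
\[ L_2 = 41 - 8 + 2 + 8 = 43,\qquad
   L_3 = 43 - 8 + 2 + 8 = 45\ \mathrm{QDR\ cycles} = 45 \times \tfrac{1400}{200} = 315\ \mathrm{IXP\ cycles} \]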

28
Lookups SRAM Bandwidth
  • Analysis is PER LA-1 QDR Interface
  • That is, each of NPUA and NPUB can do the
    following.
  • 16-bit QDR SRAM at 200 MHz
  • Separate read and write bus
  • Operations on rising and falling edge of each
    clock
  • 32 bits of read AND 32 bits of write per clock
    tick
  • QDR Write Bus
  • 6 32-bit cycles per instruction
  • Cycle 0
  • Write Address bus contains the TCAM Indirect
    Instruction
  • Write Data bus contains the TCAM Indirect MDL
    Sub-Instruction
  • Cycles 1-5
  • Write Data bus contains the 5 words of the Lookup
    Key
  • Write Bus can support 200M/6 = 33.33 M
    searches/sec
  • QDR Read Bus
  • Retrieval of Results Mailbox
  • 3 32-bit cycles per instruction
  • Retrieval of two full results from QDR SRAM Bank
    0
  • 6 32-bit cycles per instruction

29
Objectives for ONL Router
  • Reproduce approximately same functionality as
    current hardware router
  • routes, filters (including sampling filters),
    stats, plugins
  • Extensions
  • multicast, explicit-congestion marking
  • Use each NPU as separate 5 port router
  • each responsible for half the external ports
  • xScale on each NPU implements CP functions
  • access to control variables, memory-resident
    statistics
  • updating of routes, filters
  • interaction with plugins through shared memory
  • simple message buffer interface for
    request/response

30
Unicast, ARP and Multicast
  • Each port has an Ethernet header with a fixed
    source MAC address; several cases for the
    destination MAC address
  • Case 1: unicast packet with destination on
    attached subnet
  • requires ARP to map dAdr to MAC address
  • ARP cache holds mappings; issue ARP request on
    cache miss
  • Case 2: other unicast packets
  • lookup must provide next-hop IP address
  • then use ARP to obtain MAC address, as in case 1
  • Case 3: Multicast packet
  • lookup specifies copy-vector and QiD
  • destination MAC address formed from IP multicast
    address
  • Could avoid ARP in some cases
  • e.g. point-to-point link
  • but little advantage, since ARP mechanism
    required anyway
  • Do we learn MAC Addresses from received pkts?
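
The slides do not define the function from IP multicast address to destination MAC; the conventional RFC 1112 mapping (01:00:5E plus the low 23 bits of the group address) would look like this sketch:

#include <stdint.h>

/* Map an IPv4 multicast group address (host byte order) to an Ethernet
 * multicast MAC address: 01:00:5E followed by the low 23 bits of the group. */
static void ip_mcast_to_mac(uint32_t group_addr, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5E;
    mac[3] = (group_addr >> 16) & 0x7F;   /* top bit of the low 24 bits is dropped */
    mac[4] = (group_addr >> 8)  & 0xFF;
    mac[5] =  group_addr        & 0xFF;
}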

31
Proposed Approach
  • Lookup does separate route lookup and filter
    lookup
  • at most one match for route, up to two for filter
    (primary, aux)
  • combine route lookup with ARP cache lookup
  • xScale adds routes for multi-access subnets,
    based on ARP
  • Route lookup
  • for unicast, stored keys are (rcv port)(dAdr
    prefix)
  • lookup key is (rcv port)(dAdr)
  • result includes Port/Plugin, QiD, next-hop IP or
    MAC address, valid next-hop bit
  • for multicast, stored keys are (rcv
    port)(dAdr)(sAdr prefix)
  • lookup key is (rcv port)(dAdr)(sAdr)
  • result includes 10 bit copy vector, QiD
  • Filter lookup
  • stored key is IP 5-tuple + TCP flags; arbitrary
    bit masks allowed
  • lookup key is IP 5-tuple + flags if applicable
  • result includes Port/Plugin or copy vector, QiD,
    next-hop IP or MAC address, valid next-hop bit,
    primary-aux bit, priority
  • Destination MAC address passed through QM
  • via being written in the buffer descriptor?
  • Do we have 48 bits to spare?
  • Yes, we actually have 14 free bytes, enough for a
    full (non-VLAN) Ethernet header.

32
Lookup Processing
  • On receiving a unicast packet, do route and
    filter lookups
  • if MAC address returned by route (or higher
    priority primary filter) is valid, queue the
    packet and continue
  • else, pass packet to xScale, marking it as no-MAC
  • leave it to xScale to generate ARP request,
    handle reply, insert route and re-inject packet
    into data path
  • On receiving a multicast packet, do route and
    filter lookups
  • take higher priority result from route lookup or
    primary filter
  • format MAC multicast address
  • copy to queues specified by copy vector
  • if matching auxiliary filter, filter supplies MAC
    address
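
The decision flow above reduced to a sketch; the result structure and the helpers (queue_packet(), send_to_xscale_no_mac(), queue_copies()) are hypothetical placeholders for the real pipeline blocks, and aux-filter handling is omitted:

#include <stdbool.h>
#include <stdint.h>

struct lkup_result {
    bool     valid_mac;       /* next-hop MAC address is valid                */
    bool     is_multicast;    /* result carries a copy vector                 */
    uint32_t copy_vector;
    uint32_t qid;
    uint8_t  priority;
};

/* Hypothetical actions performed by later pipeline stages. */
extern void queue_packet(uint32_t qid);
extern void send_to_xscale_no_mac(void);
extern void queue_copies(uint32_t copy_vector, uint32_t qid);

/* Combine route and primary-filter results and act on them. */
void lookup_process(const struct lkup_result *route, const struct lkup_result *pri_filter)
{
    /* Take the higher-priority of route vs. primary filter. */
    const struct lkup_result *r =
        (pri_filter && pri_filter->priority > route->priority) ? pri_filter : route;

    if (!r->is_multicast) {
        if (r->valid_mac)
            queue_packet(r->qid);        /* MAC known: queue and continue       */
        else
            send_to_xscale_no_mac();     /* let the xScale do ARP and re-inject */
    } else {
        queue_copies(r->copy_vector, r->qid);  /* one copy per set bit, same QiD */
    }
}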

33
Extra Slides
34
ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (3 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (2 ME), TCAM, SRAM]
35
ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (3 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (2 ME), TCAM, SRAM]
36
ONL NP Router
TCAM
  • Copy
  • Port identifies Source MAC Addr
  • Write it to buffer descriptor or let HF determine
    it via port?
  • Unicast
  • Valid MAC
  • Write MAC Addr to Buffer descriptor and queue pkt
  • No Valid MAC
  • Prepare pkt to be sent to XScale for ARP
    processing
  • Multicast
  • Calculate Ethernet multicast Dst MAC Addr
  • Fct(IP Multicast Dst Addr)
  • Write Dst MAC Addr to buf desc.
  • Same for all copies!
  • For each bit set in copy bit vector
  • Queue a packet to port represented by bit in bit
    vector.
  • Reference Count in buffer desc.
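
A sketch of the per-bit copy loop just described; set_ref_count() and enqueue_to_port() are placeholders, and the 10-bit vector width follows the copy vector mentioned on the lookup slides:

#include <stdint.h>

extern void set_ref_count(uint32_t buf_handle, uint32_t n);   /* write Ref_Cnt in buf desc */
extern void enqueue_to_port(uint32_t buf_handle, int port, uint32_t qid);

/* Make one logical copy per bit set in the copy vector; all copies share the
 * same buffer (reference counted) and the same QiD. */
void copy_multicast(uint32_t buf_handle, uint32_t copy_vector, uint32_t qid)
{
    uint32_t refs = 0;
    for (int port = 0; port < 10; port++)
        if (copy_vector & (1u << port))
            refs++;

    set_ref_count(buf_handle, refs);      /* buffer freed when the last copy is done */

    for (int port = 0; port < 10; port++)
        if (copy_vector & (1u << port))
            enqueue_to_port(buf_handle, port, qid);
}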

Parse, Lookup, Copy (3 MEs)
  • Parse
  • Do IP Router checks
  • Extract lookup key
  • Lookup
  • Perform lookups potentially three lookups
  • Route Lookup
  • Primary Filter lookup
  • Auxiliary Filter lookup

37
Notes
  • Need a reference count for multicast. (in buffer
    descriptor)
  • How to handle freeing buffer for multicast
    packet?
  • Drops can take place in the following blocks
  • Parse
  • QM
  • Plugin
  • Tx
  • Mux → Parse
  • Reclassify bit
  • For traffic that does not get reclassified after
    coming from a Plugin or the XScale, we need all
    the data that the QM will need:
  • QID
  • Stats Index
  • Output Port
  • If a packet matches an Aux filter AND it needs
    ARP processing, the ARP processing takes
    precedence and we do not process the Aux filter
    result.
  • Does anything other than ARP related traffic go
    to the XScale?
  • IP exceptions like expired TTL?
  • Can users direct traffic for delivery to the
    XScale and add processing there?
  • Probably not if we are viewing the XScale as
    being like our CPs in the NSP implementation.

38
Notes
  • Combining Parse/Lookup/Copy
  • Dispatch loop
  • Build settings
  • TCAM mailboxes (there are 128 contexts)
  • So with 24 threads we can have up to 5 TCAM
    contexts per thread.
  • Rewrite Lookup in C
  • Input and Output on Scratch rings
  • Configurable priorities on Mux inputs
  • Xscale, Plugins, Rx
  • Should we allow plugins to write directly to QM
    input scratch ring for packets that do not need
    reclassification?
  • If we allow this is there any reason for a plugin
    to send a packet back through Parse/Lookup/Copy
    if it wants it to NOT be reclassified?
  • We can give Plugins the capability to use NN
    rings between themselves to chain plugins.

39
ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (4 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (1 ME), Stats (1 ME), xScale, TCAM, Assoc. Data ZBT-SRAM, SRAM; annotations: add configurable per-port delay (up to 150 ms total delay), add large SRAM ring]
  • Each output has common set of QiDs
  • Multicast copies use same QiD for all outputs
  • QiD ignored for plugin copies

[Diagram, continued: 5 Plugins, SRAM, xScale, large SRAM ring; annotation: Plugin write access to QM Scratch Ring]
40
ONL NP Router
[Block diagram: Rx (2 ME), Mux (1 ME), Parse/Lookup/Copy (4 MEs), QueueManager (1 ME), HdrFmt (1 ME), Tx (1 ME), xScale, TCAM, SRAM]
  • Each output has common set of QiDs
  • Multicast copies use same QiD for all outputs
  • QiD ignored for plugin copies

[Diagram, continued: Stats (1 ME), Plugin1 with NN rings, SRAM, xScale]