CSE 58x: Networking Practicum

Transcript and Presenter's Notes
1
CSE 58x Networking Practicum
  • Instructor: Wu-chang Feng
  • TA: Francis Chang

2
About the course
  • Prerequisite: CSE 524 or the equivalent
  • Implementation-focused course
  • Intel's IXA network processor platform
  • Contents
  • Brief lecture material on network processors and
    the IXP
  • 5 weeks of designed laboratories
  • 3 weeks of final projects

3
Modern router architectures
  • Split into a fast path and a slow path
  • Control plane
  • High-complexity functions
  • Route table management
  • Network control and configuration
  • Exception handling
  • Data plane
  • Low complexity functions
  • Fast-path forwarding

4
Router functions
  • RFC 1812 plus...
  • Error detection and correction
  • Traffic measurement and policing
  • Frame and protocol demultiplexing
  • Address lookup and packet forwarding
  • Segmentation, fragmentation, reassembly
  • Packet classification
  • Traffic shaping
  • Timing and scheduling
  • Queuing
  • Security

5
Design choices for network products
  • General purpose processors
  • Embedded RISC processors
  • Network processors
  • Field-programmable gate arrays (FPGAs)
  • Application-specific integrated circuits (ASICs)

6
General purpose processors (GPP)
  • Programmable
  • Mature development environment
  • Typically used to implement control plane
  • Too slow to run data plane effectively
  • Sequential execution
  • CPU and network speeds: ~50x increase over the last
    decade
  • Memory latencies: only ~2x improvement over the last
    decade
  • Gigabit Ethernet: a 333 nanosecond per-packet budget
  • Cache miss: 150-200 nanoseconds

7
Embedded RISC processors (ERP)
  • Same as GPP, but
  • Slower
  • Cheaper
  • Smaller (require less board space)
  • Designed specifically for network applications
  • Typically used for control plane functions

8
Application-specific integrated circuits (ASIC)
  • Custom hardware
  • Long time to market
  • Expensive
  • Difficult to develop and simulate
  • Not programmable
  • Not reusable
  • But, the fastest of the bunch
  • Suitable for data plane

9
Field Programmable Gate Arrays (FPGA)
  • Flexible re-programmable hardware
  • Less dense and slower than ASICs
  • Cheaper than ASICs
  • Good for providing fast custom functionality
  • Suitable for data plane

10
Network processors
  • The speed of ASICs/FPGAs
  • The programmability and cost of GPPs/ERPs
  • Flexible
  • Re-usable components
  • Lower cost
  • Suitable for data plane

11
Network processors
  • Common features
  • Small, fast, on-chip instruction stores (no
    caching)
  • Custom network-specific instruction set
    programmed at assembler level
  • What instructions are needed for NPs? Still an open
    question
  • Tension between minimality and generality
  • Multiple processing elements
  • Multiple thread contexts per element
  • Multiple memory interfaces to mask latency
  • Fast on-chip memory (headers) and slow off-chip
    memory (payloads)
  • No OS, hardware-based scheduling and thread
    switching

12
Why network processors?
  • The propaganda
  • Take the current vertical network device market
  • Commoditize horizontal slices of it
  • PC market
  • Initially, an IBM custom vertical
  • Now, a commodity market with Intel providing the
    chip-set
  • Network device market
  • Draw your own conclusions

13
Network processing approaches
(figure: the five design choices plotted by speed versus
programming/development ease; ASIC is fastest but hardest to
develop for, followed by FPGA, network processor, and embedded
RISC processor, with GPP easiest to program but slowest)
14
Network processor architectures
  • Packet path
  • Store and forward
  • Packet payload completely stored in and forwarded
    from off-chip memory
  • Allows for large packet buffers
  • Re-ordering problems with multiple processing
    elements
  • Intel IXP, Motorola C5
  • Cut-through
  • Packet held in an on-chip FIFO and forwarded
    through directly
  • Small packet buffers
  • Built-in packet ordering
  • AMCC

15
Network processor architectures
  • Processing architecture
  • Parallel
  • Each element independently performs entire
    processing function
  • Packet re-ordering problems
  • Larger instruction store needed per element
  • Pipelined
  • Each element performs one part of larger
    processing function
  • Communicates result to next processing element in
    pipeline
  • Smaller code space
  • Packet ordering retained
  • Deterministic behavior (no memory thrashing)
  • Hybrid

16
Network processor architectures
  • Processing hierarchy
  • ASICs
  • Embedded RISC processors
  • Specialized co-processors
  • See figure 13.7 in book

17
Network processor architectures
  • Memory hierarchy
  • Small on-chip memory
  • Control/Instruction store
  • Registers
  • Cache
  • RAM
  • Large off-chip memory
  • Cache
  • Static RAM
  • Dynamic RAM

18
Network processor architectures
  • Internal interconnect
  • Bus
  • Cross-bar
  • FIFO
  • Transfer registers

19
Network processor architectures
  • Concurrency
  • Hardware support for multiple thread contexts
  • Operating system support for multiple thread
    contexts
  • Pre-emptiveness
  • Migration support

20
Increasing network processor performance
  • Processing hierarchy
  • Increase clock speed
  • Increase elements
  • Memory hierarchy
  • Increase size
  • Decrease latency
  • Pipelining
  • Add hierarchies
  • Add memory bandwidth (parallel stores)
  • Add functional memory (CAMs)

21
Focus of this class...
  • Network processors
  • Intel IXA

22
IXP 1200 features
  • One embedded RISC processor (StrongARM)
  • Runs control plane (Linux)
  • 6 programmable packet processors (microengines)
  • Runs data plane (microengine assembler or
    microengine C)
  • Central hash unit
  • Multiple bus interconnects
  • IXBus (4.4Gbps) to overcome PCI's 2.2Gbps limit
  • Small on-board memory
  • Serial interface for control
  • External interfaces for memory

23
(No Transcript)
24
IXP12xx microengine
25
IXP2xxx microengine
26
Microengine functions
  • Packet ingress from physical layer interface
  • Checksum verification
  • Header processing and classification
  • Packet buffering in memory
  • Table lookup and forwarding
  • Header modification
  • Checksum computation
  • Packet egress to physical layer interface

27
Microengine characteristics
  • Programmable microcontroller
  • Custom RISC instruction set
  • Private 2048-instruction store per microengine
    (loaded by the StrongARM)
  • 5-stage execution pipeline
  • Hardware support for 4 threads and context
    switching
  • Each microengine has 4 hardware contexts (to mask
    memory latency)

28
Microengine characteristics
  • 128 general purpose registers
  • Can be partitioned or shared
  • Absolute or context-relative
  • 128 transfer registers
  • Staging registers for memory transfers
  • 4 blocks of 32 registers: SDRAM or SRAM, read or
    write
  • Local Control and Status Registers (CSRs)
  • USTORE instructions, CTX, etc. (p. 315)

29
Microengine characteristics
  • FBI unit
  • Scratchpad memory
  • Hash unit
  • FBI CSRs
  • IXBus control
  • IXBus FIFOs
  • Transmit and Receive FIFOs to external line cards

30
32 microengine opcodes
  • ALU instructions
  • ALU, ALU_SHF, DBL_SHIFT
  • Branch/Jump instructions
  • BR, BR=0, BR!=0, BR_BSET, BR=BYTE, BR=CTX,
    BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc.
  • Reference instructions
  • CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA,
    SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc.
  • Local register instructions
  • FIND_BSET, IMMED, LD_FIELD, LOAD_ADDR,
    LOAD_BSET_RESULT1, etc.

31
32 microengine opcodes (continued)
  • Miscellaneous
  • CTX_ARB
  • NOP
  • HASH1_48, HASH1_64, etc.

32
Packet flow through the IXP1200:
 1. Packet received on physical interface (MAC)
 2. Ready-bus sequencer polls MAC for mpacket; updates
    receive-ready upon a full mpacket
 3. Microengine polls for receive-ready
 4. Microengine instructs FBI to move mpacket from MAC to
    RFIFO
 5. Microengine moves mpacket directly from RFIFO to SDRAM
 6. Repeat 1-5 until full packet received
 7. Microengine or StrongARM processing
 8. Packet header read from SDRAM or RFIFO into microengine
    and classified (via SRAM tables)
 9. Packet headers modified
10. mpackets sent to interface
11. Poll for space on MAC; update transmit-ready if room for
    mpacket
12. mpackets transferred to MAC
33
Programming the IXP
  • Focus of this course on steps 7, 8, and 9
  • 2 programming frameworks
  • Command-line, IXA Active Computing Engine (ACE)
    framework
  • Graphical microengine C development environment

34
Programming the IXP
  • Command-line, IXA Active Computing Engine (ACE)
    framework
  • Re-usable function blocks chained together to
    build an application (Chapters 22-24)
  • New functions implemented as new blocks in chain
  • Core ACEs (StrongARM)
  • Written in C
  • Microblock ACEs (microengines)
  • Written in assembler

35
(No Transcript)
36
Programming the IXP
  • Graphical microengine C development environment
  • Monolithic microengine C code (can not be used on
    IXP1200 hardware)
  • Demos forthcoming