Title: CS 260 Lecture 1: Introduction to Network Processors
1 CS 260 Lecture 1: Introduction to Network Processors
- Instructor: L.N. Bhuyan
- www.cs.ucr.edu/bhuyan/CS260
2 Outline
- Introduction to NP Systems
- Relevant Applications
- Design Issues and Challenges
- Relevant Software and Benchmarks
- A case study: Intel IXP network processors
3 What Are Network Processors?
- Any device that executes programs to handle packets in a data network
- Examples
  - Processors on router line cards
  - Processors in network access equipment
4 Why Network Processors?
- Current situation
  - Data rates are increasing
  - Protocols are becoming more dynamic and sophisticated
  - Protocols are being introduced more rapidly
- Processing elements
  - GP (general-purpose processor)
    - Programmable, but not optimized for networking applications
  - ASIC (application-specific integrated circuit)
    - High processing capacity, but long development time and little flexibility
  - NP (network processor)
    - Achieves high processing performance
    - Programming flexibility
    - Cheaper than GP
5 Typical NP Architecture
6 TCP/IP Model
- ISO OSI (Open Systems Interconnection) model not fully implemented
- Presentation and Session layers not present in TCP/IP
7 Processing Tasks
Source: Network Processor Tutorial in Micro 34, Mangione-Smith and Memik
8 Application Categorization
- Control-plane tasks
  - Less time-critical
  - Control and management of device operation
  - Table maintenance, port states, etc.
- Data-plane tasks
  - Operations occurring in real time on the packet path
  - Core device operations
  - Receive, process, and transmit packets
9 Data Plane Tasks
- Media access control
  - Low-level protocol implementation
  - Ethernet, SONET framing, ATM cell processing, etc.
- Data parsing
  - Parsing cell or packet headers for address or protocol information
- Classification
  - Identify a packet against a set of criteria (filtering / forwarding decision, QoS, accounting, etc.)
- Data transformation
  - Transformation of packet data between protocols
- Traffic management
  - Queuing, scheduling, and policing packet data
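Three of the tasks above (data parsing, classification, and queuing) can be sketched in Python. This is only an illustration of the flow, not NP code: the names `parse_ipv4`, `classify`, `handle`, `make_header`, and `new_queues`, and the protocol-to-class rule table, are assumptions made up for this example.

```python
import socket
import struct
from collections import deque

IPV4_HEADER = "!BBHHHBBH4s4s"  # fixed 20-byte IPv4 header, no options

def parse_ipv4(packet: bytes) -> dict:
    """Data parsing: extract address and protocol fields from the header."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical classification rules: IP protocol number -> traffic class.
RULES = {6: "tcp", 17: "udp"}

def classify(fields: dict) -> str:
    """Classification: match the parsed fields against the rule table."""
    return RULES.get(fields["protocol"], "default")

def new_queues() -> dict:
    """Traffic management state: one FIFO queue per traffic class."""
    return {"tcp": deque(), "udp": deque(), "default": deque()}

def handle(packet: bytes, queues: dict) -> str:
    """Parse, classify, and enqueue one packet; return the class chosen."""
    traffic_class = classify(parse_ipv4(packet))
    queues[traffic_class].append(packet)
    return traffic_class

def make_header(src: str, dst: str, proto: int) -> bytes:
    """Build a test header (checksum left at 0 for brevity)."""
    return struct.pack(IPV4_HEADER, 0x45, 0, 20, 0, 0, 64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))
```

On a real NP these stages would typically be spread across hardware assists and multiple threads rather than run as one sequential function.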
10 Applications: IPv4 Routing
[Figure: a router connected to A, B, and C, with three items labeled P]
- Routers determine next hop and forward packets
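Next-hop determination in IPv4 is a longest-prefix match against a forwarding table. A minimal sketch follows, assuming a tiny hypothetical table whose next hops reuse the labels A, B, and C; the linear scan is for clarity only, since production routers use tries or TCAMs.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> next hop.
ROUTES = {
    "0.0.0.0/0": "A",      # default route
    "10.0.0.0/8": "B",
    "10.1.0.0/16": "C",
}

def next_hop(dst: str, routes: dict = ROUTES):
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop
```

With this table, 10.1.2.3 matches all three prefixes and is sent to C because /16 is the longest match.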
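11 URL-based Switching (My NSF Project)
[Figure: a request for www.yahoo.com arrives from the Internet at a switch, which sits in front of an image server, an application server, and an HTML server. The packet is drawn as IP | TCP | APP. DATA, with the application data "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com".]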
- Increase efficiency
- Tasks
  - Traverse the packet data (request) for each arriving packet and classify it
    - Contains ".jpg" -> to image server
    - Contains "cgi-bin/" -> to application server
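The classification rule above can be sketched as a small function over the HTTP request bytes. The function name `pick_server`, the server labels, and the choice to send everything else to the HTML server are assumptions for illustration.

```python
def pick_server(request: bytes) -> str:
    """Classify an HTTP request by the path in its request line."""
    request_line = request.split(b"\r\n", 1)[0]
    parts = request_line.split()          # METHOD, path, version
    path = parts[1].decode("ascii", "replace") if len(parts) >= 2 else ""
    if ".jpg" in path:
        return "image-server"
    if "cgi-bin/" in path:
        return "application-server"
    return "html-server"                  # assumed default destination
```

For example, the request `GET /cgi-bin/form HTTP/1.1` is directed to the application server. Note that the switch must inspect application data, so it has to look past the IP and TCP headers for every request.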
12 Organizing Processor Resources
- Design decisions
  - High-level organization
  - ISA and microarchitecture
  - Memory and I/O integration
- Today's commercial NPs
  - Chip multiprocessors
  - Most are multithreaded
  - Exploit little ILP (Cisco does)
  - No cache
  - Micro-programmed
13 Architectural Comparisons
- High-level organizations
  - Aggressive superscalar (SS)
  - Fine-grained multithreaded (FGMT)
  - Chip multiprocessor (CMP)
  - Simultaneous multithreaded (SMT)
14 Multithreading
- Basic idea
- Multiple register sets in the processor
- Fast context switch
- Switch thread on a cache access (how is this different from a non-blocking cache?)
- Tolerating local latency vs. remote latency in CC-NUMA multiprocessors
- Hybrids
- Switch on notice
- Simultaneous multithreading
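The payoff of switching threads on a long-latency access can be seen in a toy cycle-level simulation. This sketch assumes a single-issue processor, a zero-cost context switch, and threads that all repeat the same pattern of `compute` busy cycles followed by one memory access of `latency` cycles; the function name and parameters are hypothetical.

```python
def utilization(contexts: int, compute: int, latency: int,
                cycles: int = 10_000) -> float:
    """Fraction of cycles in which some hardware context issues work."""
    ready_at = [0] * contexts          # cycle when each context may run again
    remaining = [compute] * contexts   # busy cycles left before next access
    busy = 0
    current = 0
    for now in range(cycles):
        # Keep running the current context if it is ready; otherwise
        # pick the next ready one in round-robin order.
        for i in range(contexts):
            c = (current + i) % contexts
            if ready_at[c] <= now:
                current = c
                break
        else:
            continue                   # every context is waiting: idle slot
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The context issues its memory access and yields the processor.
            ready_at[current] = now + 1 + latency
            remaining[current] = compute
            current = (current + 1) % contexts
    return busy / cycles
```

With 10 compute cycles per 30-cycle access, one context keeps the processor busy a quarter of the time, and four contexts are enough to hide the latency completely in this model.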
15 Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; the legend distinguishes Threads 1 through 5 and idle slots]
16 Tasks and Services
Three benchmarks used in the experiment
17 Some Challenges
- Intelligent design
  - Given a selection of programs and a target network link speed, find the best design for the processor
    - Least area
    - Least power
    - Most performance
- Write efficient multithreaded programs
  - NPs have
    - Heterogeneous computing resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
- Make use of resources
  - How to use special instructions and hardware assists
    - Compilers
    - Hand-coded
- Multithreaded programs
  - Manage access to shared state
  - Synchronization between threads
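Managing access to shared state can be illustrated with per-flow packet counters updated by several threads. This is a host-side sketch using Python's `threading.Lock`; the class `FlowCounters` and the helper `run_workers` are hypothetical names, and an NP would use its own hardware synchronization mechanisms instead.

```python
import threading

class FlowCounters:
    """Per-flow packet counts shared by all worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = {}

    def record(self, flow_id):
        # The read-modify-write must be atomic with respect to other threads.
        with self._lock:
            self._packets[flow_id] = self._packets.get(flow_id, 0) + 1

    def snapshot(self):
        with self._lock:
            return dict(self._packets)

def run_workers(num_threads=4, packets_per_thread=1000):
    """Each thread records packets that alternate between flows 0 and 1."""
    counters = FlowCounters()

    def worker():
        for i in range(packets_per_thread):
            counters.record(i % 2)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters.snapshot()
```

Without the lock, two threads could read the same old count and one update would be lost; the lock serializes the updates at the cost of contention between threads.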
18 Benchmarks for Network Processors
- NetBench
  - 10 applications
  - http://cares.icsl.ucla.edu/NetBench
- CommBench
  - 8 networking and communications applications
  - http://ccrc.wustl.edu/wolf/cb/
- EEMBC
  - http://www.eembc.org/benchmark
- MediaBench
  - Transcoders
  - Some communications applications
19 IXP1200 Block Diagram
- StrongARM processing core
- Microengines introduce new ISA
- I/O
  - PCI
  - SDRAM
  - SRAM
  - IX: PCI-like packet bus
- On-chip FIFOs
  - 16 entries, 64 B each
20 IXP1200 Microengine
- 4 hardware contexts
- Single-issue processor
- Explicit optional context switch on SRAM access
- Registers
  - All are single ported
  - Separate GPR
  - 256 x 6 = 1536 registers total
- 32-bit ALU
  - Can access GPR or XFER registers
- Shared hash unit
  - 1/2/3 values, 48b/64b
  - For IP routing hashing
- Standard 5-stage pipeline
- 4 KB SRAM instruction store (not a cache!)
- Barrel shifter
21 IXP 2400 Block Diagram
- XScale core replaces StrongARM
- Microengines
  - Faster
  - More: 2 clusters of 4 microengines each
  - Local memory
  - Next-neighbor routes added between microengines
  - Hardware to accelerate CRC operations and random number generation
  - 16-entry CAM
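To see why CRC is worth accelerating in hardware, consider a bit-at-a-time software version, which spends eight loop iterations on every byte of the packet. The sketch below assumes the common CRC-32 used by Ethernet (reflected polynomial 0xEDB88320); the slide does not say which CRC variants the hardware supports.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (IEEE 802.3), least-significant bit first."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):             # one shift/xor step per bit
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# Cross-check against the library implementation.
assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789")
```

A 1500-byte frame costs 12,000 inner-loop steps in this form, which is the kind of per-packet work a dedicated unit removes from the microengine.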
22 Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip scratch  4              16K           60                                 Atomic ops
SRAM             4              256M          150                                Atomic ops
DRAM             8              2G            300                                Direct path to/from MSF
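The latency column gives a quick way to estimate how much memory time a packet costs. The sketch below copies the unloaded latencies from the table and adds them up for a per-packet access mix; the mix itself is a made-up example, and the total assumes the accesses are serialized with no overlap.

```python
# Approximate unloaded latencies in cycles, taken from the table.
LATENCY = {"local": 1, "scratch": 60, "sram": 150, "dram": 300}

def stall_cycles(accesses: dict) -> int:
    """Total memory latency if every access is serialized (no overlap)."""
    return sum(LATENCY[mem] * count for mem, count in accesses.items())

# Hypothetical per-packet access mix: 8 local, 2 SRAM, 2 DRAM references.
example_mix = {"local": 8, "sram": 2, "dram": 2}
total = stall_cycles(example_mix)   # 8*1 + 2*150 + 2*300 = 908
```

The two DRAM references account for about two thirds of the total, which is why data placement and multithreading matter far more than the instruction count alone.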
23 IXA Software Framework
[Figure: IXA software framework, with three groups of components]
- External processors
  - Control Plane Protocol Stack
  - Control Plane PDK
- XScale core (C/C++ language)
  - Core Components
  - Core Component Library
  - Resource Manager Library
- Microengine pipeline (Microengine C language)
  - Microblocks
  - Microblock Library
  - Protocol Library
  - Utility Library
  - Hardware Abstraction Library
24 Example: Toaster System, Cisco 10000
- Almost all data plane operations execute on the programmable XMC
- Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
- Classic SW load balancing problem
- External SDRAM shared by common pipe stages
25 IBM PowerNP
- 16 pico-processors and 1 PowerPC
- Each pico-processor
  - Supports 2 hardware threads
  - 3-stage pipeline: fetch/decode/execute
- Dyadic Processing Unit
  - Two pico-processors
  - 2 KB shared memory
  - Tree search engine
- Focus is layers 2-4
- PowerPC 405 for control plane operations
  - 16K I and D caches
- Target is OC-48
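A target line rate translates directly into a per-packet time budget. The sketch below assumes the nominal OC-48 line rate of 2488.32 Mb/s (ignoring SONET overhead) and back-to-back 40-byte packets; the clock frequency and engine count passed to `cycles_per_packet` are free parameters for illustration, not figures from the slide.

```python
OC48_BPS = 2_488_320_000   # nominal OC-48 line rate in bits per second

def packet_time_ns(packet_bytes: int, link_bps: float = OC48_BPS) -> float:
    """Time one packet occupies the link, in nanoseconds."""
    return packet_bytes * 8 / link_bps * 1e9

def cycles_per_packet(packet_bytes: int, clock_hz: float,
                      engines: int = 1, link_bps: float = OC48_BPS) -> float:
    """Aggregate processor cycles available per packet at line rate."""
    return packet_bytes * 8 / link_bps * clock_hz * engines

budget_ns = packet_time_ns(40)     # roughly 128.6 ns per minimum-size packet
```

At this rate a single engine has only a few dozen cycles per packet at clock speeds of a few hundred MHz, so the work must be divided among many processors and threads.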
26 Motorola C-Port C-5 Chip Architecture
27 References
- [NPT] W. H. Mangione-Smith, G. Memik, "Network Processor Technologies"
- [NPRD] Patrick Crowley, Raj Yavatkar, "An Introduction to Network Processor Research & Design", HPCA-9 Tutorial