CS 260 Lecture 1: Introduction to Network Processors

1
CS 260 Lecture 1: Introduction to Network Processors

Instructor: L.N. Bhuyan
www.cs.ucr.edu/bhuyan/CS260

2
Outline

Introduction to NP Systems
Relevant Applications
Design Issues and Challenges
Relevant Software and Benchmarks
A case study: Intel IXP network processors

3
What are Network Processors?

Any device that executes programs to handle packets in a data network
Examples:
Processors on router line cards
Processors in network access equipment

4
Why Network Processors?

Current Situation
Data rates are increasing
Protocols are becoming more dynamic and sophisticated
Protocols are being introduced more rapidly
Processing Elements
GP (General-Purpose Processor)
Programmable, but not optimized for networking applications
ASIC (Application-Specific Integrated Circuit)
High processing capacity, but long development time and little flexibility
NP (Network Processor)
Achieves high processing performance
Programming flexibility
Cheaper than GP

5
Typical NP Architecture
6
TCP/IP Model

ISO OSI (Open Systems Interconnection) model is not fully implemented
Presentation and Session layers are not present in TCP/IP

7
Processing Tasks
Source: Network Processor Tutorial in MICRO-34 (Mangione-Smith, Memik)
8
Application Categorization

Control-Plane tasks
Less time-critical
Control and management of device operation
Table maintenance, port states, etc.
Data-Plane tasks
Operations occurring in real time on the packet path
Core device operations
Receive, process and transmit packets

9
Data Plane Tasks

Media Access Control
Low-level protocol implementation
Ethernet, SONET framing, ATM cell processing, etc.
Data Parsing
Parsing cell or packet headers for address or protocol information
Classification
Match each packet against a set of criteria (filtering / forwarding decision, QoS, accounting, etc.)
Data Transformation
Transformation of packet data between protocols
Traffic Management
Queuing, scheduling and policing packet data
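
The parsing and classification tasks listed above can be sketched in a few lines of Python. This is only an illustration, not how an NP microengine is programmed: the function names `parse_ipv4_header` and `classify` and the class labels are hypothetical, and the sketch assumes the input starts with a plain 20-byte IPv4 header.

```python
import ipaddress
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Data parsing: pull a few fields out of the fixed 20-byte IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet shorter than minimum IPv4 header")
    (ver_ihl, tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL is counted in 32-bit words
        "tos": tos,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }

def classify(header: dict) -> str:
    """Classification: map parsed fields to a (hypothetical) traffic class."""
    if header["version"] != 4 or header["ttl"] == 0:
        return "drop"
    if header["protocol"] == 6:    # TCP
        return "tcp"
    if header["protocol"] == 17:   # UDP
        return "udp"
    return "other"
```

A real classifier would typically match on many more fields (ports, TOS bits, interface), but the parse-then-match structure is the same.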

10
Applications: IPv4 Routing
[Figure: packets (P) passing through a Router that connects A, B, and C]

Routers determine next hop and forward packets
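
Next-hop determination in IPv4 is a longest-prefix match on the destination address. The sketch below shows the idea with a linear scan; the function names, the prefixes, and the next-hop labels are assumptions made for illustration, and production routers use tries or TCAMs rather than a list.

```python
import ipaddress

def build_table(routes):
    """Turn (prefix, next_hop) pairs into (network, next_hop) entries."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in routes]

def next_hop(table, dst):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None

# Hypothetical routing table: a default route plus two more specific prefixes.
table = build_table([
    ("0.0.0.0/0", "A"),
    ("10.0.0.0/8", "B"),
    ("10.1.0.0/16", "C"),
])
```

With this table, a packet to 10.1.2.3 matches all three prefixes and is forwarded to the hop of the most specific one.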

11
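URL-based switching (My NSF Project)
[Figure: a request for www.yahoo.com arrives from the Internet at a Switch, which inspects the packet (IP, TCP, APP. DATA), for example "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com", and directs it to an Image Server, Application Server, or HTML Server]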

Increase efficiency
Tasks
Traverse the packet data (request) for each arriving packet and classify it
Contains .jpg -> to image server
Contains cgi-bin/ -> to application server
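
The classification rules above can be sketched as follows. This is a minimal sketch, assuming the whole HTTP request line is available in one payload; the function names and the server labels are hypothetical, and the fall-through to an HTML server is taken from the figure labels rather than stated as a rule on the slide.

```python
def parse_request_path(payload: bytes) -> str:
    """Return the request path from the first line of an HTTP request."""
    first_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split(" ")
    if len(parts) != 3:
        raise ValueError("not an HTTP request line")
    return parts[1]

def pick_server(path: str) -> str:
    """Apply the slide's two rules, then fall back to the HTML server."""
    if ".jpg" in path:
        return "image"
    if "cgi-bin/" in path:
        return "application"
    return "html"

request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
server = pick_server(parse_request_path(request))
```

Because the decision depends on application data, the switch has to look past the IP and TCP headers, which is what makes this workload heavier than plain routing.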

12
Organizing Processor Resources

Design decisions
High-level organization
ISA and microarchitecture
Memory and I/O integration
Today's commercial NPs
Chip multiprocessors
Most are multithreaded
Exploit little ILP (Cisco does)
No cache
Micro-programmed

13
Architectural Comparisons

High-level organizations
Aggressive superscalar (SS)
Fine-grained multithreaded (FGMT)
Chip multiprocessor (CMP)
Simultaneous multithreaded (SMT)

14
Multithreading

Basic idea
multiple register sets in the processor
fast context switch
switch thread on a cache access (How is this different from a non-blocking cache?)
tolerating local latency vs. remote latency in CC-NUMA multiprocessors
hybrids
switch on notice
simultaneous multithreading
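
A back-of-the-envelope model shows why multiple register sets and a fast context switch help. The sketch assumes an idealized processor: every thread computes for a fixed number of cycles, then waits on memory for a fixed latency, and switching contexts costs nothing. The function names and the numbers in the usage note are illustrative assumptions.

```python
def utilization(contexts: int, compute_cycles: int, memory_latency: int) -> float:
    """Fraction of cycles spent on useful work under the idealized model.

    One thread alone is busy compute / (compute + latency) of the time;
    with free context switches, each extra context fills part of the wait.
    """
    if contexts < 1 or compute_cycles < 1 or memory_latency < 0:
        raise ValueError("invalid parameters")
    busy = contexts * compute_cycles / (compute_cycles + memory_latency)
    return min(1.0, busy)

def contexts_to_hide(compute_cycles: int, memory_latency: int) -> int:
    """Smallest number of contexts that keeps the pipeline fully busy."""
    total = compute_cycles + memory_latency
    return -(-total // compute_cycles)   # integer ceiling division
```

For example, with 20 compute cycles between memory accesses and a 60-cycle latency, one context keeps the pipeline busy 25% of the time, and four contexts are enough to hide the latency completely in this model.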

15
Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; legend: Thread 1 through Thread 5, Idle slot]
16
Tasks and Services
Three Benchmarks used in the experiment
17
Some Challenges

Intelligent Design
Given a selection of programs and a target network link speed, find the best design for the processor
Least area
Least power
Most performance
Write efficient multithreaded programs
NPs have
Heterogeneous computer resources
Non-uniform memory
Multiple interacting threads of execution
Real-time constraints
Make use of resources
How to use special instructions and hardware assists
Compilers
Hand-coded
Multithreaded programs
Manage access to shared state
Synchronization between threads
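
Managing shared state between threads can be illustrated with a per-flow byte counter protected by a lock. This is a host-side Python sketch, not microengine code; the class `FlowCounter`, the helper `run`, and the flow name are hypothetical.

```python
import threading

class FlowCounter:
    """Per-flow byte counts shared by several packet-processing threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, flow_id: str, nbytes: int) -> None:
        # The read-modify-write must be atomic, so it runs under the lock.
        with self._lock:
            self._counts[flow_id] = self._counts.get(flow_id, 0) + nbytes

    def total(self, flow_id: str) -> int:
        with self._lock:
            return self._counts.get(flow_id, 0)

def run(num_threads: int = 4, packets: int = 1000, size: int = 64) -> int:
    """Have several threads account packets to the same flow; return the total."""
    counter = FlowCounter()

    def worker():
        for _ in range(packets):
            counter.record("flow-0", size)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total("flow-0")
```

On an NP the same role is usually played by hardware assists such as atomic memory operations, but the requirement is identical: updates to shared state must not interleave.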

18
Benchmarks for Network Processors

NetBench
10 applications
http://cares.icsl.ucla.edu/NetBench
CommBench
8 networking and communications applications
http://ccrc.wustl.edu/wolf/cb/
EEMBC
http://www.eembc.org/benchmark
MediaBench
Transcoders
Some communications applications

19
IXP1200 Block Diagram

StrongARM processing core
Microengines introduce new ISA
I/O
PCI
SDRAM
SRAM
IX: PCI-like packet bus
On-chip FIFOs
16 entries, 64 B each

20
IXP1200 Microengine

4 hardware contexts
Single issue processor
Explicit optional context switch on SRAM access
Registers
All are single ported
Separate GPR
256 x 6 = 1536 registers total
32-bit ALU
Can access GPR or XFER registers
Shared hash unit
1/2/3 values, 48b/64b
For IP routing hashing
Standard 5-stage pipeline
4KB SRAM instruction store (not a cache!)
Barrel shifter

21
IXP 2400 Block Diagram

XScale core replaces StrongARM
Microengines
Faster
More: 2 clusters of 4 microengines each
Local memory
Next-neighbor routes added between microengines
Hardware to accelerate CRC operations and random number generation
16-entry CAM

22
Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip Scratch  4              16K           60                                 Atomic ops
SRAM             4              256M          150                                Atomic ops
DRAM             8              2G            300                                Direct path to/from MSF
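
The latencies in the table can be turned into a rough per-packet memory cost. The sketch below simply sums unloaded latencies for a given access profile; the function name and the example profile are assumptions for illustration, and real costs would be higher under load and lower when accesses overlap.

```python
# Approximate unloaded latencies in cycles, copied from the table above.
LATENCY_CYCLES = {
    "Local": 1,
    "Scratch": 60,
    "SRAM": 150,
    "DRAM": 300,
}

def memory_cycles(profile: dict) -> int:
    """Sum latency over a profile of {memory type: number of accesses}.

    Assumes accesses are serialized and the memories are unloaded.
    """
    total = 0
    for mem_type, accesses in profile.items():
        if mem_type not in LATENCY_CYCLES:
            raise KeyError(f"unknown memory type: {mem_type}")
        total += LATENCY_CYCLES[mem_type] * accesses
    return total

# Hypothetical packet: two table lookups in SRAM, payload read and written in DRAM.
example_profile = {"SRAM": 2, "DRAM": 2}
```

For the hypothetical profile this gives 900 cycles of pure waiting per packet, which is why data placement and latency hiding matter so much on this kind of chip.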
23
IXA Software Framework
[Figure: IXA software framework block diagram. Labels: External Processors; Control Plane Protocol Stack; Control Plane PDK; XScale Core; C/C++ Language; Core Components; Core Component Library; Resource Manager Library; Microengine Pipeline; Microblock Library; Microengine C Language; Micro blocks; Protocol Library; Utility Library; Hardware Abstraction Library]
24
Example: Toaster System (Cisco 10000)

Almost all data plane operations execute on the programmable XMC
Pipeline stages are assigned tasks, e.g., classification, routing, firewall, MPLS
Classic SW load balancing problem
External SDRAM shared by common pipe stages
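
The load balancing problem mentioned above arises because a pipeline runs only as fast as its slowest stage. One common way to pose it, sketched below, is to split an ordered list of task costs into a fixed number of contiguous stages so that the heaviest stage is as light as possible. The function names and the cycle costs are hypothetical, and this is not a description of how the Toaster tools assign work.

```python
def stages_needed(costs, cap):
    """Greedy count of contiguous stages when no stage may exceed cap."""
    stages, load = 1, 0
    for c in costs:
        if load + c > cap:
            stages += 1
            load = c
        else:
            load += c
    return stages

def min_bottleneck(costs, num_stages):
    """Smallest achievable cost of the slowest stage (binary search on cap)."""
    if num_stages < 1 or not costs:
        raise ValueError("need at least one stage and one task")
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(costs, mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical per-packet costs in cycles for
# classification, routing, firewall, MPLS (in pipeline order).
task_costs = [30, 50, 20, 40]
```

With two stages the best split of these costs is (classification, routing) and (firewall, MPLS), so the bottleneck stage takes 80 cycles per packet.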

25
IBM PowerNP

16 pico-processors and 1 PowerPC
Each pico-processor
Supports 2 hardware threads
3-stage pipeline: fetch/decode/execute
Dyadic Processing Unit
Two pico-processors
2KB Shared memory
Tree search engine
Focus is layers 2-4
PowerPC 405 for control plane operations
16K I and D caches
Target is OC-48

26
Motorola C-Port C-5 Chip Architecture
27
References

[NPT] W. H. Mangione-Smith, G. Memik, "Network Processor Technologies"
[NPRD] Patrick Crowley, Raj Yavatkar, "An Introduction to Network Processor Research & Design", HPCA-9 Tutorial
