CS 260 Lecture 1: Introduction to Network Processors

Learn more at: http://www.cs.ucr.edu

1
CS 260 Lecture 1: Introduction to Network Processors
  • Instructor: L.N. Bhuyan
  • www.cs.ucr.edu/bhuyan/CS260

2
Outline
  • Introduction to NP Systems
  • Relevant Applications
  • Design Issues and Challenges
  • Relevant Software and Benchmarks
  • A case study: Intel IXP network processors

3
What Are Network Processors?
  • Any device that executes programs to handle
    packets in a data network
  • Examples:
    • Processors on router line cards
    • Processors in network access equipment

4
Why Network Processors?
  • Current situation
    • Data rates are increasing
    • Protocols are becoming more dynamic and
      sophisticated
    • Protocols are being introduced more rapidly
  • Processing elements
    • GP (General-purpose Processor)
      • Programmable, but not optimized for networking
        applications
    • ASIC (Application-Specific Integrated Circuit)
      • High processing capacity, but long development
        time and little flexibility
    • NP (Network Processor)
      • Achieves high processing performance
      • Programming flexibility
      • Cheaper than GP

5
Typical NP Architecture
6
TCP/IP Model
  • ISO OSI (Open Systems Interconnection) not fully
    implemented
  • Presentation and Session layers not present in
    TCP/IP

7
Processing Tasks
Source: Network Processor Tutorial in Micro 34 (Mangione-Smith, Memik)
8
Application Categorization
  • Control-plane tasks
    • Less time-critical
    • Control and management of device operation
    • Table maintenance, port states, etc.
  • Data-plane tasks
    • Operations occurring in real time on the packet path
    • Core device operations
    • Receive, process and transmit packets

9
Data Plane Tasks
  • Media Access Control
    • Low-level protocol implementation
    • Ethernet, SONET framing, ATM cell processing,
      etc.
  • Data Parsing
    • Parsing cell or packet headers for address or
      protocol information
  • Classification
    • Identify a packet against a set of criteria
      (filtering / forwarding decision, QoS, accounting, etc.)
  • Data Transformation
    • Transformation of packet data between protocols
  • Traffic Management
    • Queuing, scheduling and policing packet data
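
As a rough illustration of the parsing and classification steps listed above, the following Python sketch unpacks the fixed 20-byte IPv4 header and maps it to a queue. The queue names and the classification rules are assumptions made for the example, not part of the lecture; a real NP would do this in microcode or with hardware assists.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Data parsing: extract the fields a classifier needs from an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    # Network byte order: version/IHL, TOS, total length, id, flags/fragment,
    # TTL, protocol, checksum, source address, destination address.
    (ver_ihl, tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is counted in 32-bit words
        "tos": tos,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

def classify(header: dict) -> str:
    """Classification: map a parsed header to a (hypothetical) queue name."""
    if header["version"] != 4 or header["ttl"] == 0:
        return "drop"
    if header["protocol"] == 6:    # TCP
        return "tcp-queue"
    if header["protocol"] == 17:   # UDP
        return "udp-queue"
    return "default-queue"

# Build a sample header by hand (checksum left at 0 for brevity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"),
                     socket.inet_aton("192.168.1.7"))
hdr = parse_ipv4_header(sample)
print(hdr["dst"], classify(hdr))
```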

10
Applications: IPv4 Routing
[Figure: a router connecting A, B, and C, with packets labeled P]
  • Routers determine next hop and forward packets
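
Next-hop determination is usually a longest-prefix match on the destination address. The sketch below shows the idea with Python's standard `ipaddress` module; the prefixes and the next-hop names A, B, and C are made up for the example, and the linear scan is chosen only for readability (production routers typically use tries, TCAMs, or similar structures).

```python
import ipaddress

class RoutingTable:
    """Toy routing table using a linear longest-prefix-match scan."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop) pairs

    def add(self, prefix: str, next_hop: str) -> None:
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, dst: str):
        """Return the next hop of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(dst)
        best = None
        for net, hop in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return best[1] if best else None

table = RoutingTable()
table.add("10.0.0.0/8", "A")     # hypothetical routes
table.add("10.1.0.0/16", "B")
table.add("0.0.0.0/0", "C")      # default route

for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(dst, "->", table.lookup(dst))
```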

11
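URL-Based Switching (My NSF Project)
[Figure: a request for www.yahoo.com arrives from the Internet at a switch in front of an image server, an application server, and an HTML server. Packet layout: IP, TCP, APP. DATA. Example request: GET /cgi-bin/form HTTP/1.1, Host: www.yahoo.com]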
  • Increase efficiency
  • Tasks
    • Traverse the packet data (request) for each
      arriving packet and classify it
    • Contains .jpg -> to image server
    • Contains cgi-bin/ -> to application server
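
The two rules above can be sketched in a few lines of Python. This is only an illustration: the server names are placeholders, and sending everything else to the HTML server is an assumption, since the slide does not state a default.

```python
def request_path(payload: bytes) -> str:
    """Return the request target from the first line of an HTTP request."""
    first_line = payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split()
    if len(parts) < 2:
        raise ValueError("not an HTTP request line")
    return parts[1]

def pick_server(payload: bytes) -> str:
    """Classify a request by URL content, following the slide's two rules."""
    path = request_path(payload)
    if ".jpg" in path:
        return "image-server"
    if "cgi-bin/" in path:
        return "application-server"
    return "html-server"  # assumed default, not stated on the slide

req = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
print(pick_server(req))
```

Note that the switch has to look past the IP and TCP headers into the application data to make this decision, which is what makes the task heavier than ordinary forwarding.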

12
Organizing Processor Resources
  • Design decisions
    • High-level organization
    • ISA and microarchitecture
    • Memory and I/O integration
  • Today's commercial NPs
    • Chip multiprocessors
    • Most are multithreaded
    • Exploit little ILP (Cisco does)
    • No cache
    • Micro-programmed

13
Architectural Comparisons
  • High-level organizations
    • Aggressive superscalar (SS)
    • Fine-grained multithreaded (FGMT)
    • Chip multiprocessor (CMP)
    • Simultaneous multithreaded (SMT)

14
Multithreading
  • Basic idea
    • Multiple register sets in the processor
    • Fast context switch
    • Switch thread on a cache access (how is this
      different from a non-blocking cache?)
    • Tolerating local latency vs. remote latency in
      CC-NUMA multiprocessors
  • Hybrids
    • Switch on notice
    • Simultaneous multithreading
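
To see why switching threads on a long-latency access helps, here is a small cycle-level model in Python. It is a simplified sketch under stated assumptions: every thread computes for a fixed number of cycles, then waits a fixed memory latency; context switches cost nothing; and ready threads are picked round-robin. The numbers are illustrative, not measurements of any NP.

```python
def simulate(num_threads, compute_cycles, mem_latency, total_cycles=10_000):
    """Return the fraction of cycles in which some thread is executing."""
    ready_at = [0] * num_threads               # cycle when each thread may run again
    remaining = [compute_cycles] * num_threads # compute cycles left before next access
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        # Round-robin search for a ready thread, starting at `current`.
        for i in range(num_threads):
            t = (current + i) % num_threads
            if ready_at[t] <= cycle:
                break
        else:
            continue  # every thread is waiting on memory: idle slot
        current = t
        busy += 1
        remaining[t] -= 1
        if remaining[t] == 0:
            # Thread issues a memory access and yields the processor.
            ready_at[t] = cycle + 1 + mem_latency
            remaining[t] = compute_cycles
            current = (t + 1) % num_threads
    return busy / total_cycles

for n in (1, 2, 3, 4):
    print(n, "threads ->", simulate(n, compute_cycles=10, mem_latency=30))
```

In this model utilization grows roughly as n * C / (C + L) until it saturates at 1, so with 10 compute cycles and 30 latency cycles four threads are enough to keep the processor busy.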

15
Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading. Legend: Thread 1 to Thread 5, idle slot]

16
Tasks and Services
Three Benchmarks used in the experiment
17
Some Challenges
  • Intelligent design
    • Given a selection of programs and a target network
      link speed, find the best design for the processor
      • Least area
      • Least power
      • Most performance
  • Write efficient multithreaded programs
    • NPs have
      • Heterogeneous computer resources
      • Non-uniform memory
      • Multiple interacting threads of execution
      • Real-time constraints
  • Make use of resources
    • How to use special instructions and hardware
      assists
      • Compilers
      • Hand-coded
  • Multithreaded programs
    • Manage access to shared state
    • Synchronization between threads
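
The shared-state problem in the last two bullets can be illustrated with ordinary host threads. The Python sketch below is only an analogy (it is not microengine code, and the `FlowStats` class is invented for the example): several workers update one per-flow packet counter, and a lock makes each read-modify-write step atomic.

```python
import threading

class FlowStats:
    """Per-flow packet counters shared by several worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = {}

    def record(self, flow_id: str) -> None:
        # The read-modify-write below must not interleave between threads.
        with self._lock:
            self._packets[flow_id] = self._packets.get(flow_id, 0) + 1

    def count(self, flow_id: str) -> int:
        with self._lock:
            return self._packets.get(flow_id, 0)

def worker(stats: FlowStats, flow_id: str, n: int) -> None:
    for _ in range(n):
        stats.record(flow_id)

stats = FlowStats()
threads = [threading.Thread(target=worker, args=(stats, "flow-a", 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats.count("flow-a"))  # 4 workers x 10,000 updates = 40000
```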

18
Benchmarks for Network Processors
  • NetBench
    • 10 applications
    • http://cares.icsl.ucla.edu/NetBench
  • CommBench
    • 8 networking and communications applications
    • http://ccrc.wustl.edu/wolf/cb/
  • EEMBC
    • http://www.eembc.org/benchmark
  • MediaBench
    • Transcoders
    • Some communications applications

19
IXP1200 Block Diagram
  • StrongARM processing core
  • Microengines introduce new ISA
  • I/O
    • PCI
    • SDRAM
    • SRAM
    • IX: PCI-like packet bus
  • On-chip FIFOs
    • 16 entries, 64B each

20
IXP1200 Microengine
  • 4 hardware contexts
  • Single-issue processor
  • Explicit optional context switch on SRAM access
  • Registers
    • All are single-ported
    • Separate GPR
    • 256 × 6 = 1536 registers total
  • 32-bit ALU
    • Can access GPR or XFER registers
  • Shared hash unit
    • 1/2/3 values, 48b/64b
    • For IP routing hashing
  • Standard 5-stage pipeline
  • 4KB SRAM instruction store (not a cache!)
  • Barrel shifter

21
IXP 2400 Block Diagram
  • XScale core replaces StrongARM
  • Microengines
    • Faster
    • More: 2 clusters of 4 microengines each
  • Local memory
  • Next-neighbor routes added between microengines
  • Hardware to accelerate CRC operations and random
    number generation
  • 16-entry CAM

22
Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip Scratch  4              16K           60                                 Atomic ops
SRAM             4              256M          150                                Atomic ops
DRAM             8              2G            300                                Direct path to/from MSF
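
A quick back-of-the-envelope calculation shows how these latencies add up for one packet. The latencies below come from the table; the per-packet access mix is a hypothetical example chosen only to show the arithmetic.

```python
# Approximate unloaded latencies in cycles, taken from the table above.
LATENCY_CYCLES = {"local": 1, "scratch": 60, "sram": 150, "dram": 300}

def memory_cycles(accesses: dict) -> int:
    """Total unloaded memory latency for a given number of accesses per type."""
    return sum(LATENCY_CYCLES[kind] * count for kind, count in accesses.items())

# Hypothetical per-packet access mix (an assumption, not a measurement).
per_packet = {"local": 4, "scratch": 2, "sram": 3, "dram": 2}

total = memory_cycles(per_packet)
print("total cycles:", total)  # 4*1 + 2*60 + 3*150 + 2*300 = 1174
for kind, count in per_packet.items():
    share = LATENCY_CYCLES[kind] * count / total
    print(f"{kind:8s} {share:.0%}")
```

With this mix, nearly all of the waiting time comes from the off-chip SRAM and DRAM accesses, which is the latency that multithreaded microengines are meant to overlap with useful work.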
23
IXA Software Framework
[Figure: IXA software framework. Labels: External Processors (Control Plane Protocol Stack, Control Plane PDK); XScale Core, C/C++ Language (Core Components, Core Component Library, Resource Manager Library); Microengine Pipeline, Microengine C Language (Microblocks, Microblock Library, Protocol Library, Utility Library, Hardware Abstraction Library)]

24
Example: Toaster System (Cisco 10000)
  • Almost all data plane operations execute on the
    programmable XMC
  • Pipeline stages are assigned tasks, e.g.
    classification, routing, firewall, MPLS
  • Classic SW load balancing problem
  • External SDRAM shared by common pipe stages
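
The load balancing problem arises because a pipeline runs only as fast as its slowest stage. One common way to model it, shown below as a sketch, is to split an ordered list of task costs into a fixed number of contiguous stages so that the heaviest stage is as light as possible. The task costs are invented, and this is not a description of how the Toaster tools actually assign work.

```python
def stages_needed(costs, cap):
    """Greedy count of contiguous stages when no stage may exceed `cap`."""
    stages, load = 1, 0
    for c in costs:
        if load + c > cap:
            stages += 1
            load = c
        else:
            load += c
    return stages

def balance(costs, num_stages):
    """Return (bottleneck cost, list of task-index groups, one per stage)."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:  # binary search for the smallest feasible bottleneck
        mid = (lo + hi) // 2
        if stages_needed(costs, mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1
    groups, current, load = [], [], 0
    for i, c in enumerate(costs):
        if current and load + c > lo:
            groups.append(current)
            current, load = [], 0
        current.append(i)
        load += c
    groups.append(current)
    return lo, groups

# Hypothetical per-packet cycle costs: classification, routing, firewall, MPLS.
costs = [40, 60, 30, 20]
print(balance(costs, 2))  # (100, [[0, 1], [2, 3]])
```

Here the best two-stage split puts classification and routing in one stage (100 cycles) and firewall and MPLS in the other (50 cycles), so the pipeline accepts a new packet every 100 cycles.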

25
IBM PowerNP
  • 16 pico-processors and 1 PowerPC
  • Each pico-processor
    • Supports 2 hardware threads
    • 3-stage pipeline: fetch/decode/execute
  • Dyadic Processing Unit
    • Two pico-processors
    • 2KB shared memory
    • Tree search engine
  • Focus is layers 2-4
  • PowerPC 405 for control plane operations
    • 16K I and D caches
  • Target is OC-48
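
To get a feel for what an OC-48 target implies, the sketch below estimates the per-packet processing budget. OC-48 runs at 2488.32 Mbit/s; the 40-byte minimum packet size, the 133 MHz clock, and ignoring framing overhead are assumptions made for the arithmetic, not figures from the slide.

```python
OC48_BPS = 2_488_320_000  # OC-48 line rate in bits per second

def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Worst-case arrival rate for back-to-back packets of the given size."""
    return line_rate_bps / (packet_bytes * 8)

def cycles_per_packet(pps: float, clock_hz: float, num_processors: int) -> float:
    """Aggregate processor cycles available per packet at the given rate."""
    return clock_hz * num_processors / pps

pps = packets_per_second(OC48_BPS, 40)         # assumed 40-byte packets
budget = cycles_per_packet(pps, 133e6, 16)     # assumed 133 MHz, 16 processors
print(f"{pps:,.0f} packets/s, {1e9 / pps:.1f} ns per packet")
print(f"about {budget:.0f} cycles per packet across all processors")
```

Under these assumptions a new packet arrives roughly every 129 ns, which leaves a few hundred cycles in total for parsing, lookup, and forwarding.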

26
Motorola C-Port C-5 Chip Architecture
27
References
  • [NPT] W. H. Mangione-Smith, G. Memik, Network
    Processor Technologies
  • [NPRD] Patrick Crowley, Raj Yavatkar, An
    Introduction to Network Processor Research
    Design, HPCA-9 Tutorial