NePSim: A Network Processor Simulator with a Power Evaluation Framework

1
  • NePSim: A Network Processor Simulator with a Power
    Evaluation Framework
  • Yan Luo, Jun Yang, Laxmi N. Bhuyan, and Li Zhao
  • University of California, Riverside
  • By ???

2
Outline
  • Introduction
  • Cycle-level simulation
  • Validation
  • Power modeling
  • Power and performance analysis
  • Reducing processing core power

3
Introduction (1)
  • Network Processor (NP)
  • Providing both high performance and flexibility
    in building powerful routers.
  • With the exponential increase in clock frequency
    and core complexity, power dissipation will become
    a major design consideration in NP development.
  • Cycle-accurate architecture simulators exist
    for commercial NPs (not open source)
  • Intel: Software Development Kit (SDK)
  • Motorola: C-Ware
  • These simulators do not incorporate power
    modeling and evaluation

4
Introduction (2)
  • NePSim
  • Includes a cycle-accurate architecture simulator
  • An automatic formal verification engine
  • Parameterizable power estimator
  • Execution cores
  • Memory controllers
  • I/O ports
  • Packet buffers
  • High-speed buses
  • We define our system to comply with the Intel
    IXP1200

5
Introduction (3)
  • We propose low-power techniques tailored to NPs
    and evaluate them using our NePSim system.
  • Dynamic voltage scaling (DVS)
  • Applied to each execution core
  • We observed abundant idle time (average 10% to
    23%) resulting from contention in the shared memory.
  • Achieved
  • 17% power savings for the NP over four
    application benchmarks
  • Less than 6% performance loss

6
Cycle-level simulation
  • High-level overview of the IXP1200, then a
    description of our simulator's software structure
  • Background: Intel IXP1200 and its microengines
  • The simulator

7
Background: Intel IXP1200 and its microengines
  • StrongARM core
  • 6 microengines (MEs)
  • Standard memory interfaces
  • SDRAM
  • SRAM
  • High-speed bus interfaces
  • IX bus

8
The simulator: NePSim
  • Implements most functionalities of the IXP1200
  • Does not model the StrongARM core, whose main
    task is control-plane functions that don't affect
    the critical path of packet processing.

Why use microcode? (not binary)
  • Leaves room for instruction extensions in future
    research
  • Easier to modify the program

9
NePSim body: the ME simulation core
  • The NePSim body is the module that simulates the
    following 5 stages of the ME pipeline
  • Instruction lookup
  • Initial instruction decoding and formation of the
    source register address
  • Reading of operands from the source registers
  • ALU operations, shift or compare operations, and
    generation of condition codes
  • Writing of result to destination register
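
The five stages above can be illustrated with a minimal sketch of an ideal pipeline in which each instruction advances one stage per cycle. This is not NePSim's code: the names `Stage` and `run_pipeline` are hypothetical, and stalls, branches, and thread switching are deliberately left out.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five ME pipeline stages listed on the slide (names are hypothetical)."""
    LOOKUP = 0      # instruction lookup
    DECODE = 1      # initial decode, source register address formation
    READ = 2        # read operands from the source registers
    EXECUTE = 3     # ALU/shift/compare, condition code generation
    WRITEBACK = 4   # write result to the destination register


def run_pipeline(num_instructions):
    """Return the cycles needed to retire all instructions with no stalls."""
    pipeline = [None] * len(Stage)   # pipeline[s] holds the instruction in stage s
    fetched = retired = cycles = 0
    while retired < num_instructions:
        # Every instruction moves one stage forward; the oldest one leaves.
        pipeline = [None] + pipeline[:-1]
        if fetched < num_instructions:
            pipeline[Stage.LOOKUP] = fetched
            fetched += 1
        cycles += 1
        if pipeline[Stage.WRITEBACK] is not None:
            retired += 1
    return cycles
```

In this idealized model, n instructions take n + 4 cycles, since the first result is written back only after the pipeline has filled.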

10
Model components
  • Device: implements I/O devices such as I/O ports
    and the MACs.
  • Dlite: resembles the debugger in SimpleScalar.
  • For example, it lets users set breakpoints, print
    pipeline status, display register values, and dump
    memory content.
  • Configurable options
  • Different clock rates and supply voltages of MEs
  • SRAM and SDRAM with different latencies and
    bandwidths
  • Incoming traffic with different arrival rates and
    patterns

11
Our simulator vs. Intel's SDK: several advantages
  • Enable new architecture design
  • Permits number of MEs and threads to vary
  • Provides instruction set extensibility in
    microcode assembly code
  • Provides faster execution speed

12
Validation
  • Average error of 1% in throughput and 6% in
    average processing time across the 4 benchmarks.
  • The simulator can produce relatively dependable
    results.

13
Power modeling
  • The IXP1200 uses 0.28 µm technology.
  • We use 0.25 µm technology because it is the
    closest available feature size to 0.28 µm.
  • The IXP1200 core's power, excluding I/O, is 4.5 W
    at 232 MHz, which includes
  • All the MEs (0.468 W each)
  • Memory units (SRAM: 0.0639 W; SDRAM: 0.0643 W)
  • IX bus unit (0.363 W)
  • StrongARM (0.5 W)
  • 0.468 × 6 + 0.363 + 0.0639 + 0.0643 + 0.5 ≈ 3.8 W
  • 4.5 − 3.8 = 0.7 W
  • The difference results from our use of a smaller
    technology, 0.25 µm instead of 0.28 µm
  • We also didn't model internal buses and the clock.
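
The power budget on this slide can be checked with a few lines of arithmetic. The figures are the ones quoted above; the constant and function names are hypothetical and only serve the illustration.

```python
# Per-component power figures from the slide, in watts.
ME_POWER_W = 0.468        # one microengine
NUM_MES = 6               # the IXP1200 has six MEs
SRAM_POWER_W = 0.0639
SDRAM_POWER_W = 0.0643
IX_BUS_POWER_W = 0.363
STRONGARM_POWER_W = 0.5
REPORTED_CORE_POWER_W = 4.5   # core power excluding I/O, at 232 MHz


def component_sum():
    """Sum of the component figures listed on the slide."""
    return (NUM_MES * ME_POWER_W + SRAM_POWER_W + SDRAM_POWER_W
            + IX_BUS_POWER_W + STRONGARM_POWER_W)


def unexplained_power():
    """Gap between the reported core power and the component sum."""
    return REPORTED_CORE_POWER_W - component_sum()
```

The component sum comes to roughly 3.8 W, which leaves about 0.7 W attributed to the smaller feature size and the unmodeled internal buses and clock.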

14
Power and performance analysis (1)
  • Assume
  • Max packet arrival rate
  • 16 Ethernet interfaces for receiving
  • 16 Ethernet interfaces for transmitting
  • SRAM and SDRAM frequency is 116 MHz

15
Power and performance analysis (2)
  • Benchmarks
  • ipfwdr
  • url
  • nat
  • md4
  • 4 receiving MEs and 2 transmitting MEs.
  • Researchers have found that this 4:2 ratio
    provides maximum throughput, and we adopted this
    configuration throughout our experiments.

16
Benchmark descriptions: ipfwdr
  • ipfwdr
  • A sample of IP forwarding software provided in
    Intel's SDK
  • Processing includes Ethernet and IP header
    validation and trie-based routing-table lookup.
  • Routing table resides in SRAM, and the output
    port information is in SDRAM
  • The packet is forwarded to the next-hop router on
    the basis of the output port information.
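
As a rough illustration of trie-based routing-table lookup, the sketch below performs a longest-prefix match over a one-bit-per-level binary trie for IPv4. It is not the ipfwdr microcode: the class names, the table layout, and the example routes are assumptions made for this example.

```python
import ipaddress


class TrieNode:
    """One trie level per address bit; 'port' is set where a prefix ends."""
    __slots__ = ("children", "port")

    def __init__(self):
        self.children = [None, None]
        self.port = None


class RoutingTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, port):
        """Add a route such as '10.1.0.0/16' mapped to an output port."""
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.port = port

    def lookup(self, address):
        """Return the port of the longest matching prefix, or None."""
        addr = int(ipaddress.ip_address(address))
        node = self.root
        best = node.port
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.port is not None:
                best = node.port
        return best


table = RoutingTrie()
table.insert("0.0.0.0/0", 0)      # default route (hypothetical)
table.insert("10.0.0.0/8", 1)
table.insert("10.1.0.0/16", 2)
```

Each step down the trie corresponds to one table access, which is why a benchmark of this kind generates repeated SRAM reads per packet.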

17
Benchmark descriptions: url
  • url
  • Routes packets on the basis of their contained
    URL request.
  • Often examines the payload of packets when
    processing them
  • Performs a string-matching algorithm that we
    ported from NetBench.
  • String patterns are initialized in SRAM, so url's
    code must generate SRAM accesses in later
    comparisons.
  • Also, because payloads must be scanned for
    pattern matching, many requests are generated to
    SDRAM, which stores payload data

18
Benchmark descriptions: nat
  • nat: network address translation
  • Uses the source and destination IP addresses and
    port numbers to compute an index
  • The index is used for a hash-table lookup to
    retrieve a replacement address and port.
  • Each packet accesses the SRAM to look up the hash
    table.
  • SDRAM accesses aren't necessary.
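
The index computation and hash-table lookup described above might look like the following sketch. The XOR-based hash, the table size, and all names are assumptions chosen for illustration; the slides do not specify the hash function that nat uses.

```python
import ipaddress

TABLE_SIZE = 256   # hypothetical number of hash buckets


def nat_index(src_ip, dst_ip, src_port, dst_port):
    """Fold addresses and ports into a bucket index (illustrative hash only)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    key = src ^ dst ^ (src_port << 16) ^ dst_port
    return key % TABLE_SIZE


class NatTable:
    """Hash table mapping a flow to its replacement (address, port)."""

    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    def add(self, flow, replacement):
        self.buckets[nat_index(*flow)].append((flow, replacement))

    def translate(self, flow):
        # Walk the bucket chain to resolve collisions.
        for stored_flow, replacement in self.buckets[nat_index(*flow)]:
            if stored_flow == flow:
                return replacement
        return None


nat_table = NatTable()
flow = ("192.168.1.10", "203.0.113.5", 40000, 80)
nat_table.add(flow, ("198.51.100.1", 61000))
```

Calling `nat_table.translate(flow)` returns the stored replacement pair, and an unknown flow returns `None`.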

19
Benchmark descriptions: md4
  • md4
  • The md4 algorithm works on arbitrary-length
    messages and provides a 128-bit fingerprint, or
    digital signature.
  • Used to implement a Secure Sockets Layer or
    firewall at the edge routers
  • Moves data packets from SDRAM to SRAM and
    accesses SRAM multiple times to compute the
    digital signature.
  • The program is both memory and computation
    intensive.

20
Performance observations (1)
  • The impact of having more MEs on the total packet
    throughput
  • For memory-intensive benchmarks (url, md4)
  • Increasing the number of threads means increasing
    memory contention
  • Because memories are shared among all threads.
  • Doubling the core frequency doesn't double the
    throughput.
  • Strange result: nat's throughput decreases, even
    though nat is not memory bound.

21
Performance observations (2)
  • nat doesn't have much ME idle time, even when the
    ME-to-memory speed ratio is 4:1
  • Implies that all MEs are busy
  • Receiving MEs are fast enough, but transmitting
    MEs don't release memory slots fast enough.
  • Receiving
  • Busy requesting new memory slots
  • Transmitting
  • Busy sending packets
  • A receive/transmit ME ratio of 3:3 might be better
    than the traditional 4:2 configuration.

22
Where does the power go? (1)
  • IXP1200's power distribution among the MEs.
  • MEs 0-3: receiving; MEs 4-5: transmitting

23
Where does the power go? (2)
  • Power consumption
  • ALU: 45%
  • Control: 28%
  • Storage where instruction operands and results
    reside: 13%
  • Static power: 7%

24
Performance and power observations
  • Performance variation is consistent with the
    power variation
  • Both performance and power consumption grow with
    the addition of MEs, except for nat's
    performance.
  • Two MEs consume less than twice the power of one
    ME because ME idle time increases
  • The gap between the two curves widens as the
    number of MEs increases.
  • Power consumption increases faster than
    performance.

25
(No Transcript)
26
Reducing processing core power
  • Wide use of dynamic voltage scaling (DVS) to
    conserve power in MEs.
  • Reducing voltage and frequency when the processor
    has low activity
  • And increasing them when there's a demand for
    peak processor performance.
  • The following ranges comply with Intel's IXP2400
    configurations.
  • Frequency: 600 MHz to 400 MHz
  • Voltage: 1.3 V to 1.1 V

27
DVS policy
  • ME idle time is abundant because most of the
    benchmarks are memory bound.
  • Applying DVS while MEs are not very active

28
DVS scheme
  • Using hardware to observe ME idle time
    periodically.
  • If the percentage of idle time in the past period
    exceeds a threshold, scale down the voltage and
    frequency (VF).
  • If it is below the threshold, scale up the VF.

29
Deploying DVS
  • The hardware required to implement the DVS policy
    is trivial.
  • A timer signals after a certain number of cycles
  • An accumulator counts the number of ME idle cycles
  • When the timer signals, the accumulator compares
    its result with a preinitialized value T.
  • Example
  • T = 20,000
  • Idle-time threshold = 10% of T = 2,000
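
The timer, accumulator, and threshold comparison can be sketched as a small controller. The period (20,000 cycles), the 10% threshold, and the end points of the VF range come from the slides; the intermediate operating point, the one-step-per-period adjustment, and all names are assumptions for illustration, not the authors' hardware design.

```python
PERIOD_CYCLES = 20_000     # T from the slide
IDLE_THRESHOLD = 2_000     # 10% of T

# (frequency in MHz, voltage in V). The end points follow the slides;
# the middle step is a hypothetical intermediate operating point.
VF_LEVELS = [(400, 1.1), (500, 1.2), (600, 1.3)]


class DvsController:
    """Counts idle cycles per period and steps the VF level up or down."""

    def __init__(self):
        self.level = len(VF_LEVELS) - 1   # start at the highest VF point
        self.timer = 0
        self.idle_cycles = 0

    def tick(self, idle):
        """Advance one ME cycle; 'idle' says whether the ME was idle."""
        self.timer += 1
        if idle:
            self.idle_cycles += 1
        if self.timer == PERIOD_CYCLES:
            if self.idle_cycles > IDLE_THRESHOLD:
                self.level = max(self.level - 1, 0)                    # scale down
            elif self.idle_cycles < IDLE_THRESHOLD:
                self.level = min(self.level + 1, len(VF_LEVELS) - 1)   # scale up
            self.timer = 0
            self.idle_cycles = 0

    def operating_point(self):
        return VF_LEVELS[self.level]
```

Feeding the controller one period with 5,000 idle cycles steps it down one level; a following period with no idle cycles steps it back up.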

30
Control mechanism
31
  • DVS can save up to 17% of power consumption with
    a performance loss of less than 6%
  • DVS hardly affects throughput because the MEs
    have enough idle cycles to cover the stall
    penalty.
  • We also tested thresholds other than 10% and
    achieved similar results.

32
(No Transcript)
33
(No Transcript)