Orion: A Power-Performance Simulator for Interconnection Networks

1
Orion: A Power-Performance Simulator for Interconnection Networks
MICRO 2002
  • Hang-Sheng Wang, Xinping Zhu, Li-Shiuan Peh,
    Sharad Malik
  • Princeton University

2
Introduction
  • Single-chip multiprocessor systems are seeing the
    use of interconnection networks as the only
    scalable solution to inter-processor
    communication.
  • Interconnection networks consume a significant
    fraction of system power:
  • Alpha 21364: 25 W of 125 W
  • Mellanox server blade (InfiniBand switch):
    15 W of 40 W

3
Orion
  • A network power-performance simulator
  • plug-and-play router and link components
  • Run different communication workloads
  • Constructed within LSE, a complete platform for
    exploring interconnected µ-processors
  • Provide designers with a framework for rapid
    exploration of interconnected µ-processor systems
  • Enable research in power-efficient H/W and
    compiler techniques for interconnected
    µ-processors

4
Simulation infrastructure
  • LSE (Liberty Simulation Environment)
  • Constructs concurrent structural models and
    retargetable simulators from a unified structural
    machine description and specification DB.
  • Targets fast design space exploration for modern
    µ-processors

5
Building blocks of an interconnection network
  • Message transporting class: links and crossbars
  • Message processing class: sources, sinks, buffers
    and arbiters
  • All modules support different types of
    operational and timing behavior depending on the
    dynamic configuration
  • Can construct a wide range of interconnection
    networks through careful parameterization

6
Component power modeling

7
Power modeling: Discussion
  • Architectural-level modeling
  • Estimation based on transistor count and area is
    only useful for average power
  • Information such as transistor count and area is
    typically not available at the time of
    architectural exploration
  • Model hierarchy and reusability
  • Maximize reuse of our power models
  • Can extend them to new microarchitectures
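Architectural-level power models of this kind are commonly built on the standard CMOS dynamic energy relation, E = 0.5 · α · C · Vdd², where C is the switched capacitance and α the switching activity. The sketch below illustrates that relation only; it is an assumption that it resembles what the simulator does internally, and the function names and numbers are hypothetical.

```python
def dynamic_energy_joules(capacitance_farads, vdd_volts, switching_activity=1.0):
    """Standard CMOS dynamic energy: E = 0.5 * alpha * C * Vdd^2."""
    return 0.5 * switching_activity * capacitance_farads * vdd_volts ** 2


def average_power_watts(energy_per_cycle_joules, clock_hz):
    """Average power when the given energy is spent every clock cycle."""
    return energy_per_cycle_joules * clock_hz


# Hypothetical values, chosen only to exercise the formula:
# 1 pF switched capacitance, 1 V supply, half of the bits toggling, 1 GHz clock.
energy = dynamic_energy_joules(1e-12, 1.0, switching_activity=0.5)
power = average_power_watts(energy, 1e9)
```

Because the model needs only capacitance and activity, it can be evaluated per component (buffer, crossbar, arbiter) and summed, which is what makes per-component reuse practical.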

8
Power modeling: Discussion
  • Validation
  • Against measured power
  • Against low-level power estimation tools
  • Alpha 21364, InfiniBand switch
  • Link power modeling
  • Plug in actual power numbers of specific links
    obtained from published datasheets
  • Developing parameterized link power models

9
Walkthrough example of a simple wormhole router

10
Case Studies
  • Exploring different configurations
  • Exploring different workloads (feedback to the
    compiler or application programmer)
  • Exploring new microarchitectures

11
Experimental setup
  • 16 node network, 4x4 torus
  • Credit-based flow control
  • Source dimension-ordered routing
  • Uniformly distributed traffic to random
    destinations
  • 59 modules
  • Simulator size 5.2MB
  • 1000 simulation cycles/s
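The routing and traffic pattern in this setup can be sketched as follows. This is a minimal illustration, not the simulator's code: it assumes x-then-y dimension order (the slide does not say which dimension goes first), shortest-direction wraparound on each ring, and hypothetical function names.

```python
import random

K = 4  # nodes per dimension: a 4x4 torus has 16 nodes


def ring_step(src, dst, k=K):
    """Signed unit step along one ring, taking the shorter way around."""
    if src == dst:
        return 0
    forward = (dst - src) % k
    return 1 if forward <= k - forward else -1


def dimension_ordered_route(src, dst, k=K):
    """Route fully in x, then in y. Returns every node visited, src and dst included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + ring_step(x, dst[0], k)) % k
        path.append((x, y))
    while y != dst[1]:
        y = (y + ring_step(y, dst[1], k)) % k
        path.append((x, y))
    return path


def random_destination(src, rng, k=K):
    """Uniformly random destination other than the source node."""
    while True:
        dst = (rng.randrange(k), rng.randrange(k))
        if dst != src:
            return dst


rng = random.Random(0)
example_path = dimension_ordered_route((0, 0), random_destination((0, 0), rng))
```

Computing the whole path up front matches source routing, where the route is fixed before the packet is injected.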

12
Exploring different configurations
  • Wormhole vs. Virtual-channel routers
  • Wormhole router with 64-flit input buffer per
    port (WH64)
  • VC router with 2 VCs per port and 8-flit input
    buffer per VC (VC16)
  • VC router with 8 VCs per port and 8-flit input
    buffer per VC (VC64)
  • VC router with 8 VCs per port and 16-flit input
    buffer per VC (VC128)
  • VC router: 3-stage router pipeline
  • WH router: 2-stage router pipeline
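The configuration names encode total input buffering per port, which is the number of virtual channels times the buffer depth per channel. The sketch below makes that arithmetic explicit; the `RouterConfig` class is hypothetical, and treating the wormhole router as a single channel is an assumption made only for the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    name: str
    vcs_per_port: int      # wormhole router modeled as 1 channel (assumption)
    flits_per_vc: int
    pipeline_stages: int

    @property
    def flits_per_port(self):
        """Total input buffering per port, in flits."""
        return self.vcs_per_port * self.flits_per_vc


CONFIGS = [
    RouterConfig("WH64", 1, 64, 2),
    RouterConfig("VC16", 2, 8, 3),
    RouterConfig("VC64", 8, 8, 3),
    RouterConfig("VC128", 8, 16, 3),
]

for cfg in CONFIGS:
    print(cfg.name, cfg.flits_per_port, "flits per port")
```

WH64 and VC64 hold the same number of flits per port, so comparing them isolates the effect of virtual channels from the effect of buffer capacity.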

13
WH vs. VC: average packet latency
[Figure: latency plot with saturation marked]
VC16 outperforms WH64 despite having a smaller buffer.

14
WH vs. VC: total network power
[Figure: power plot with saturation marked]
VC64 dissipates approximately the same amount of
power as WH64, even though VC routers require more
complex hardware.

15
VC64 average power breakdown
  • Buffer and crossbar are the dominant power
    consumers (85%).
  • E(VC128) E(VC64) E(VC16), L(VC128) L(VC64) < L(VC16)
  • Arbiter consumes less than 1%.
  • E(VC) E(WH)

16
Exploring different workloads
  • Broadcast vs. uniform traffic

[Figure; node (1,2) is labeled]
  • Broadcast workload change: L?L/4 ?L/8
  • YX routing: (1,1) and (1,3) consume higher power
    than (0,2) and (2,2)
  • All nodes with the same x coordinate have
    identical power consumption.
  • Orion can be interfaced with actual
    communication traces.
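The YX-routing observation can be reproduced with a simple traffic count. The sketch below rests on several assumptions that are not stated on the slide: the labeled node (1,2) is the broadcasting source, the broadcast is sent as 15 separate unicast packets, coordinates are (x, y), and the number of packets a router handles is used as a rough proxy for its power. All names are hypothetical.

```python
from collections import Counter

K = 4  # 4x4 torus


def ring_step(src, dst, k=K):
    """Signed unit step along one ring, taking the shorter way around."""
    if src == dst:
        return 0
    forward = (dst - src) % k
    return 1 if forward <= k - forward else -1


def yx_route(src, dst, k=K):
    """Route fully in y, then in x. Returns every node visited."""
    x, y = src
    path = [(x, y)]
    while y != dst[1]:
        y = (y + ring_step(y, dst[1], k)) % k
        path.append((x, y))
    while x != dst[0]:
        x = (x + ring_step(x, dst[0], k)) % k
        path.append((x, y))
    return path


def broadcast_load(source, k=K):
    """Count how many broadcast packets pass through each router."""
    load = Counter()
    for x in range(k):
        for y in range(k):
            if (x, y) != source:
                load.update(yx_route(source, (x, y), k))
    return load


load = broadcast_load((1, 2))
for node in [(1, 1), (1, 3), (0, 2), (2, 2)]:
    print(node, load[node])
```

Nodes (1,1) and (1,3) sit in the source's column, so under y-first routing they forward packets bound for entire rows, while (0,2) and (2,2) only see traffic for the source's own row.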
17
Exploring a new microarchitectural technique
  • Central buffered routers (CB)
  • Shared central buffer forwards flits between
    input and output ports
  • Deployed in IBM SP/2 and InfiniBand routers
  • Higher throughput over input-buffered
    crossbar-based routers (XB)
  • No head-of-line blocking
  • Configurations: same area
  • Chip-to-chip 4x4 network

19
CB vs. XB
  • Performance
  • Random traffic: CB < XB
  • Due to the fewer ports in CB (25)
  • Broadcast traffic: CB > XB
  • Packets from the same input port need not line up
    behind one another if they are destined for
    different output ports.
  • Power
  • Random traffic: CB > XB
  • Broadcast traffic: CB XB
  • Central buffer consumes much more energy than a
    crossbar due to its higher switching capacitance
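Head-of-line blocking, which the central buffer avoids, can be shown with a toy queue model. This is a sketch under stated assumptions, not a model of either router: each input sends at most one packet per cycle, each output accepts at most one, lower-numbered inputs win arbitration, and the central buffer's own port limits are ignored. The function name is hypothetical.

```python
def cycles_to_drain(queues, head_only):
    """Cycles needed to deliver every packet.

    Each queue entry is the output port a packet wants.
    head_only=True  : only the packet at the head of each input queue may leave
                      (FIFO input buffering, subject to head-of-line blocking).
    head_only=False : any queued packet whose output is free may leave.
    """
    queues = [list(q) for q in queues]
    cycles = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:
            candidates = q[:1] if head_only else q[:]
            for i, out in enumerate(candidates):
                if out not in busy_outputs:
                    busy_outputs.add(out)
                    del q[i]  # packet leaves this input
                    break
        cycles += 1
    return cycles


# Both inputs start with a packet for output 0; input 1 also holds a packet
# for the otherwise idle output 2.
traffic = [[0, 1], [0, 2]]
fifo_cycles = cycles_to_drain(traffic, head_only=True)
no_hol_cycles = cycles_to_drain(traffic, head_only=False)
```

In the FIFO case the packet for output 2 waits behind a blocked head packet even though its output is idle, so draining takes one cycle longer.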

20
On-chip vs. chip-to-chip
  • On-chip
  • Links take up less than 15% of node power
  • Power consumption depends heavily on traffic
  • Chip-to-chip
  • Links take up more than 70%
  • Traffic insensitive

21
Conclusions
  • Orion
  • An architecture-level power-performance simulator
    for interconnection networks that provides a
    platform for rapid exploration of
    power-performance tradeoffs
  • Future work
  • Extensive modules with detailed validation