Title: NoC Symposium '07 Panel: Proliferating the Use and Acceptance of NoC Benchmark Standards
1 NoC Symposium '07 Panel: Proliferating the Use and Acceptance of NoC Benchmark Standards
- Timothy M. Pinkston
- National Science Foundation (NSF)
- tpinksto_at_nsf.gov
- University of Southern California (USC)
- tpink_at_usc.edu
2 Driving Forces
- Applications (Alg, SW): define what system functions should be supported
- Architecture (Arch): defines how system functions are supported
- Implementation (Circuit) Technology (Tech): defines the extent to which desired system functions can be implemented in hardware

"Trends Towards On-chip Networked Microsystems," T. Pinkston and J. Shin, IJHPCN.
(http://ceng.usc.edu/smart/publications/archives/CENG-2004-17.pdf)
3 Is There a Need for a NoC Benchmark Suite?
- A sampling of benchmark suites already out there:
  - Gen-Purpose/PC: SPEC CPU2006, Netperf, Dhry-/Whetstone, BAPCo SYSmark, BYTEmark, LMBench
  - Embedded/SoC: EEMBC, MiBench, MediaBench, ALPBench, GraalBench, NPCryptBench, CommBench, DMABench, BioBench
  - Sci-Eng/HPC: STREAM, HPL, SPLASH-2, LINPACK, LAPACK, ScaLAPACK, NPB (NAS PB), LFK (Livermore), SparseBench, LLCbench
- Do we really need yet another benchmark suite?
4 December 2006 NSF OCIN Workshop Recommendations (www.ece.ucdavis.edu/ocin06)
- A set of standard workloads/benchmarks and evaluation methods is needed to enable realistic evaluation and uniform (fair) comparison between various approaches
- Need for cooperation (agreement) between academia and industry
- Need for qualified performance metrics: latency and bandwidth under power, energy, thermal, reliability, area, etc., constraints
- Need for standardization of metrics: clear definition of what is being represented by each metric (e.g., network latency, throughput, ...)
- Need for effective alternatives to time-consuming full-system execution-driven simulation, including use of microbenchmarks, parameterized synthetic traffic/workloads, traces, etc.
- Need for accurate characterization and modeling of system traffic behavior across various domains: general-purpose, embedded
- Need for analytical methods (complementary to simulation) to explore and quantitatively narrow down the large design space
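One way to realize the "parameterized synthetic traffic" recommendation is a small trace generator. A minimal sketch follows; uniform random and bit-complement are standard interconnect traffic patterns, but the helper names and parameters here are illustrative, not part of any standard suite:

```python
import random

def uniform_dest(src: int, n: int, rng: random.Random) -> int:
    """Uniform random traffic: each packet picks any other node equally."""
    d = rng.randrange(n - 1)
    return d if d < src else d + 1   # shift to skip the source itself

def bit_complement_dest(src: int, n: int) -> int:
    """Bit-complement traffic: node i sends to the node with all address bits flipped."""
    return (n - 1) ^ src             # assumes n is a power of two

# A tiny parameterized trace: 16 nodes, 4 packets injected per node.
rng = random.Random(0)               # fixed seed for reproducible results
n = 16
trace = [(src, uniform_dest(src, n, rng)) for src in range(n) for _ in range(4)]
```

Swapping in a different destination function (or packet count, or node count) parameterizes the workload without changing the evaluation harness.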
"Challenges in Computer Architecture Evaluation," K. Skadron, M. Martonosi, D. August, M. Hill, D. Lilja, V. Pai, IEEE Computer, pp. 30-36, August 2003.
5 Meaning of Latency and Throughput
- Latency: fabric only, endnode-to-endnode, average, no-load, saturation?
- Throughput: peak, sustained, saturation, best-case, worst-case?

Simulation: 3-D torus, 4,096 nodes (16 × 16 × 16), uniform traffic load, virtual cut-through switching, three-phase arbitration, 2 and 4 virtual channels. Bubble flow control is used in dimension order on one virtual channel; the other virtual channel(s) is supplied in dimension order (deterministic routing) or along any shortest path to destination (adaptive routing).
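The fabric-only versus endnode-to-endnode distinction can be made concrete with a small sketch. The per-packet timestamps below are hypothetical, chosen only to show how the two definitions diverge:

```python
# Hypothetical per-packet timestamps, in arbitrary time units:
# (injected at source endnode, entered fabric, exited fabric, delivered at endnode)
packets = [
    (0, 2, 10, 12),
    (1, 3, 15, 18),
    (2, 5, 11, 13),
]

def average(values):
    values = list(values)
    return sum(values) / len(values)

# "Fabric only" counts time spent inside the network itself...
fabric_only = average(exit_t - enter_t for _, enter_t, exit_t, _ in packets)

# ...while endnode-to-endnode also charges injection/reception overheads.
end_to_end = average(deliver_t - inject_t for inject_t, _, _, deliver_t in packets)
```

For the same run, the two averages differ, which is why a benchmark standard must pin down exactly which latency is being reported.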
6 Simple (Analytical) Latency and Throughput Models
- H&P Interconnection Networks chapter (Appendix E): ceng.usc.edu/smart/slides/appendixE.html
- Network traffic pattern/load determines σ, γ (traffic-dependent parameters)
- Topology and switch microarchitecture determine d, Tr, Ta, Ts, BW_Bisection
- Routing, switching, flow control, μarch, etc., influence the network efficiency factor, ρ:
  - internal switch speedup: reduction of contention within switches
  - buffer organizations to mitigate HOL blocking in and across switches
  - balanced load across network links: maximally utilize link bandwidth
- ρ = ρ_L × ρ_R × ρ_A × ρ_S × ρ_μArch × ... (architecture-dependent parameters)
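A minimal sketch of the min-style throughput bound these parameters imply, assuming σ scales the reception limit, γ divides the bisection bandwidth, and ρ discounts it, as the slide's BW_Network ≤ BW_Bisection/γ relation suggests. The function name and exact placement of σ are my assumptions, not the appendix's verbatim formula:

```python
def effective_bandwidth(n, bw_inject, bw_receive, bw_bisection,
                        sigma, gamma, rho):
    """Min-style bound on network throughput (sketch).

    sigma, gamma: traffic-dependent (reception factor, bisection-crossing fraction)
    rho: architecture-dependent network efficiency factor
    (d, Tr, Ta, Ts enter the latency model instead, not this bound)
    """
    injection_limit = n * bw_inject            # all nodes injecting flat out
    reception_limit = sigma * n * bw_receive   # receivers can become hot spots
    bisection_limit = rho * bw_bisection / gamma
    return min(injection_limit, reception_limit, bisection_limit)

# Illustrative Cell-EIB-like parameters (12 elements, 25.6 GB/s ports):
bw = effective_bandwidth(12, 25.6, 25.6, 204.8, sigma=1.0, gamma=1.0, rho=0.5)
```

With these inputs the bisection term dominates, which is the situation the next slide works through.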
7 Modeling Throughput of Cell BE EIB (Worst-Case)
[Figure: Cell BE Element Interconnect Bus: 12 nodes, 4 rings each with 12 links, 3 transfers per ring]
- BW_Network ≤ BW_Bisection / γ = 204.8 / 1 GB/s = 204.8 GB/s; measured: 78 GB/s
- Injection bandwidth: 25.6 GB/s per element; γ = 1
- Reception bandwidth: 25.6 GB/s per element; σ = 1
- BW_Bisection = 8 links × 25.6 GB/s = 204.8 GB/s
- Command bus bandwidth: 204.8 GB/s
- Aggregate bandwidth: 1,228.8 GB/s
- Network injection (12 nodes): 307.2 GB/s
- Network reception (12 nodes): 307.2 GB/s
- Peak BW_Network: 25.6 GB/s × 3 × 4 = 307.2 GB/s (4 rings, 3 transfers per ring)
- ρ limited, at best, to only 50% due to ring interference
- Traffic pattern determines σ, γ
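Plugging the slide's numbers into the bisection bound gives a quick sanity check; the values below are taken from the slide, with ρ = 0.5 from the ring-interference limit:

```python
# Worst-case EIB throughput bound from the slide's figures.
bw_link = 25.6                 # GB/s per EIB link
bw_bisection = 8 * bw_link     # 8 links cross the bisection -> 204.8 GB/s
gamma = 1.0                    # worst case: all traffic crosses the bisection
rho = 0.5                      # ring interference limits efficiency to ~50%

bound = rho * bw_bisection / gamma   # 102.4 GB/s upper bound
# The slide's measured worst-case figure, 78 GB/s, falls below this bound,
# i.e., the simple analytical model brackets the measurement from above.
```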
8 [Figure: integer programs and floating-point programs]
Ref: Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th Ed.
9 In Conclusion: Answers to Panel Questions
- What are the hallmarks of successful benchmark suites?
  - Fairness: represent the proper workload behavior/characteristics
  - Portability: open, free access, not architecture/vendor-specific
  - Transparency: yield reproducible performance results (reporting)
  - Evolutionary: adaptable over time in composition and reporting
- How can industry and academia facilitate use?
  - Establish the need for and importance of common evaluation best practices
  - Cross-cutting effort: architects, circuit designers, CAD researchers
  - Need to place high value on developing and using evaluation standards
- What are the main obstacles to establishing a de facto NoC standard benchmark suite, and how can they be addressed?
  - Capturing the diversity of NoC applications/computing domains
  - Red herrings: converge on performance evaluation standards and agree on characteristic traffic loads and/or microbenchmarks
  - Ultimately, system-level performance is what matters, not component-level performance