Title: Using a Communication Architecture Specification in an Application-driven Retargetable Prototyping Pl
1. Using a Communication Architecture Specification
in an Application-driven Retargetable Prototyping
Platform for Distributed Processing
- Xinping Zhu, Sharad Malik
- Princeton University, USA
2. Outline
- Research Context and Motivation
- Previous Work
- Communication Architecture Description
- Prototype Implementations
- Case Study: 3DES Application
- Summary and Future Work
3. Generic Multiprocessing SoC Architecture
4. OCA Actions
[Figure: a Sender calls Send(to, message, length) and a Receiver calls Recv(from, message, length). Labels: Message (on each side), outgoing packets/flits queue, incoming packets/flits queue, flits, Switch A, Switch B, Network.]
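The Send/Recv actions on this slide move a message through the network as a sequence of flits. The following is a minimal sketch of that segmentation step, not the authors' implementation: the flit payload size `FLIT_BYTES` and the helper names `flits_needed` and `fill_flit` are assumptions made for illustration, since the slides do not specify a flit format.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flit payload size in bytes; the slides do not give one. */
#define FLIT_BYTES 4

/* Number of flits needed to carry a message of `length` bytes. */
size_t flits_needed(size_t length)
{
    return (length + FLIT_BYTES - 1) / FLIT_BYTES;  /* ceiling division */
}

/* Copy flit number `index` of `message` into `flit`, zero-padding the tail.
 * Returns the number of payload bytes actually copied. */
size_t fill_flit(const unsigned char *message, size_t length,
                 size_t index, unsigned char flit[FLIT_BYTES])
{
    size_t offset = index * FLIT_BYTES;
    size_t copied = 0;

    assert(index < flits_needed(length));
    for (size_t i = 0; i < FLIT_BYTES; i++) {
        if (offset + i < length) {
            flit[i] = message[offset + i];
            copied++;
        } else {
            flit[i] = 0;  /* padding in the last, partially filled flit */
        }
    }
    return copied;
}
```

With these assumptions, a 10-byte message occupies three flits, and the last flit carries two payload bytes followed by two bytes of padding.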
5. OCA Structure
[Figure: three example topologies. Bus topology with 8 nodes (PEs attached to a shared bus); cube topology with 8 nodes labeled with 3-bit addresses (000, 001, 010, 011, ...); 4x4 mesh topology with 16 nodes.]
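The cube topology on this slide labels its 8 nodes with 3-bit addresses. A small sketch of how such addresses can encode connectivity, assuming the usual hypercube convention that two nodes are adjacent exactly when their addresses differ in one bit (the slide shows the labels but does not state this rule); `cube_neighbor` and `cube_hops` are hypothetical helper names.

```c
#include <assert.h>

/* Neighbor of `addr` along dimension `dim` in a cube with 2^ndims nodes.
 * Assumes adjacent nodes differ in exactly one address bit. */
unsigned cube_neighbor(unsigned addr, unsigned dim, unsigned ndims)
{
    assert(dim < ndims && addr < (1u << ndims));
    return addr ^ (1u << dim);  /* flip bit `dim` */
}

/* Minimum hop count between two cube nodes: the Hamming distance
 * between their addresses. */
unsigned cube_hops(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;
    unsigned hops = 0;

    while (diff) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops;
}
```

Under this convention each node of the 8-node cube has three neighbors, and the longest minimal route (for example 000 to 111) takes three hops.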
6. OCA Microarchitecture
[Figure: router microarchitecture. A 5 x 5 crossbar connects five buffered ports (Buf North, Buf South, Buf East, Buf West, Buf Local), each with an in link and an out link. A Scheduler is shown with request, grant, select, and config signals.]
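The router figure shows a scheduler that receives requests from the five port buffers and answers with grants. The slide does not say which policy the scheduler uses; the sketch below assumes a round-robin policy purely for illustration, and `rr_grant` and the bit-mask encoding of requests are hypothetical.

```c
#include <assert.h>

#define NUM_PORTS 5  /* North, South, East, West, Local on the 5 x 5 crossbar */

/* Round-robin arbitration for one crossbar output (assumed policy).
 * `requests` has bit i set when input port i is requesting.
 * `last` is the port that received the previous grant.
 * Returns the granted port, or -1 when no port is requesting. */
int rr_grant(unsigned requests, int last)
{
    assert(last >= 0 && last < NUM_PORTS);
    for (int step = 1; step <= NUM_PORTS; step++) {
        int port = (last + step) % NUM_PORTS;  /* start just after `last` */
        if (requests & (1u << port))
            return port;
    }
    return -1;
}
```

Starting the search one position after the previous winner keeps any single port from monopolizing the output while other ports are waiting.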
7. Research Motivation
- Choices for Design Space Exploration
- Enhance system-level design productivity
- Focus on the OCA part
OCA Structure
9. Our Previous Work: Microarchitectural Building Blocks
- A Hierarchical Modeling Framework
- A Classified Library of Reusable OCA Components
[Figure: component library hierarchy rooted at Module]
- Link: Mux, Duplex Link, CrossBar, Bus Backplane
- Buffer: FIFO, Central Pool
- Interface: SendInterface, ReceiveInterface, SlaveInterface, MasterInterface
- ResourceScheduler: Allocator, Arbiter
Ref: Zhu, Malik, "A Hierarchical Modeling Framework for On-Chip Communication Architectures," ICCAD 2002
10. Our Previous Work: Simulation Environments
- Methodology and library successfully used in two modular modeling environments
- Implementations
- Liberty Simulation Environment (LSE)
- A fast execution-driven modeling and simulation framework targeting processor microarchitecture modeling
- SystemC
- A general digital synchronous design framework which enables system-level design
11. Related Previous Work
- Metropolis (UC Berkeley)
- Top-down vs. bottom-up
- StepNP
- System-level design tool for NPUs using SystemC
- Functional for now
- Benini et al. (IEEE Computer 36-4, 2003)
- Integrates SystemC and a GNU GDB-based ISS; no PE model
- CoWare's ConvergenSC
- System-level modeling and verification
- Multiple LISA 2.0 PE models with complex on-chip buses
- Tensilica's XTMP
- Integrates C-callable Xtensa instruction simulators
- Functional simulation, custom interconnect
12. Paper Contributions
- Communication architecture specification
- Simple template-based specification for rapid prototyping
- Integration of application, processor and communication architecture models
- Provides for application-accurate workloads instead of statistical/synthesized workloads
13. Design Exploration for OCAs
- Specifying OCAs through descriptions
- Evaluating OCA choices
15. Representing OCAs
- A retargetable OCA description/modeling language
- Control path vs. data path
- Datapath: microarchitecture components and structure
- Control path: how the communication resources are allocated concurrently
- Current emphasis on type and topology
- Control path is implicitly encoded in the modules
- Template-based: short, expressive, with C-like syntax
[Figure: an OCA divided into datapath and control path, with labels Protocol, Topology, µarch blocks, and Timing]
16. Topology- and Type-Based Descriptions: Examples
Bus:
    NODE n0, n1, n2, n3;
    n0.addr = 0; n1.addr = 1; n2.addr = 2; n3.addr = 3;
    CLUSTER my_bus;
    my_bus.data_width = 32;
    my_bus.buffer_size = 64;
    my_bus.protocol = round_robin;
    my_bus = bus(n0, n1, n2, n3);
Mesh:
    CLUSTER my_net;
    my_net.init_credit = 64;
    my_net.routing = dimension;
    my_net = torus(16);
[Figure: four PEs attached to a shared bus]
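The mesh example above sets the routing attribute to "dimension" on a 16-node torus. Reading that as dimension-order routing, the next-hop decision could look like the sketch below. Everything specific here is an assumption: the 16 nodes are taken to form a 4 x 4 torus numbered row by row, the X dimension is corrected before Y, and `torus_next_hop` is a hypothetical name rather than part of the authors' toolkit.

```c
#include <assert.h>

#define SIDE 4  /* assumed: the 16-node torus is laid out as 4 x 4 */

/* Shortest direction (-1, 0, or +1) from `from` to `to` on a ring of SIDE nodes. */
static int ring_step(int from, int to)
{
    int fwd = (to - from + SIDE) % SIDE;  /* hops when moving in the + direction */

    if (fwd == 0)
        return 0;
    return (fwd <= SIDE - fwd) ? 1 : -1;  /* take the wraparound link if shorter */
}

/* Next node on the route from `cur` to `dst`: finish X first, then Y.
 * Nodes are numbered row-major, so node = y * SIDE + x (assumed). */
int torus_next_hop(int cur, int dst)
{
    assert(cur >= 0 && cur < SIDE * SIDE);
    assert(dst >= 0 && dst < SIDE * SIDE);

    int x = cur % SIDE;
    int y = cur / SIDE;
    int dx = ring_step(x, dst % SIDE);
    if (dx != 0)
        return y * SIDE + (x + dx + SIDE) % SIDE;

    int dy = ring_step(y, dst / SIDE);
    return ((y + dy + SIDE) % SIDE) * SIDE + x;
}
```

Calling the function repeatedly until it returns `dst` traces the whole path; when `cur` already equals `dst` it returns `cur` unchanged.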
18. Retargetable Simulation Flow
- Application Model
- Enables us to go beyond statistical/synthetic traffic patterns
- System Architecture includes both PE and OCA
- Flexible Implementation Strategy
- SystemC
- Discrete Event MoC
- Liberty Simulation Environment (LSE)
- Synchronous Reactive MoC
[Figure: simulation flow, with blocks System Architecture Description (PE, OCA), Model Configuration, Distributed Application Model, Application Binary, Simulation Engine (SystemC Model and LSE Model, each with a Wrapper), Execution, and Performance]
19. Integrating PE Models
- Need a cycle-accurate PE model to simulate/execute real-world applications
- SimIt-ARM simulator (W. Qin, DATE 2003)
- Wrapper Strategy
- Define a well-maintained interface between PE and OCA so that PE details are hidden behind it
- Flexible: other PE models could be added (applicable to commercial PE IPs)
- Currently we can use both SystemC and LSE styles of code as wrappers
20. Distributed Application Modeling
- Currently message passing (C-based)
- Several send/recv message-passing primitives are defined

main.c:
    #include "arm_mp.h"
    main() {
      if (ns_arm_get_addr() == 0)
        d = ns_arm_send(1, c, 2);
      else {
        a = -1;
        do { a = ns_arm_recv(0, b, 2); } while (a == -1);
      }
    }

arm_mp.h:
    /* send */
    int ns_arm_send(int dest, int *value, int length);
    /* recv; a return value of -1 means the receive failed */
    int ns_arm_recv(int source, int *buf, int length);
    /* get the local PE address mapping */
    int ns_arm_get_addr(void);

GNU ARM Compiler Suite
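To try the message-passing pattern from this slide without the ARM toolchain or the simulator, one option is a host-side stand-in for the three primitives. The sketch below is only that: a hypothetical single-process mock with one mailbox per node, not the authors' library. The slide only states that ns_arm_recv returns -1 on failure; returning the word count on success, treating `length` as a count of ints, and the `mock_set_addr` helper are all assumptions.

```c
#include <assert.h>
#include <string.h>

#define MOCK_NODES 4       /* assumed number of PEs in the mock */
#define MOCK_MAX_WORDS 16  /* assumed mailbox capacity in ints */

/* One single-message mailbox per node (hypothetical mock state). */
typedef struct {
    int full;
    int src;
    int len;
    int data[MOCK_MAX_WORDS];
} mock_mailbox;

static mock_mailbox mock_box[MOCK_NODES];
static int mock_local_addr = 0;

/* Test helper (not on the slide): choose which PE the caller pretends to be. */
void mock_set_addr(int addr)
{
    assert(addr >= 0 && addr < MOCK_NODES);
    mock_local_addr = addr;
}

int ns_arm_get_addr(void)
{
    return mock_local_addr;
}

/* Returns `length` on success, -1 on failure (assumed convention). */
int ns_arm_send(int dest, int *value, int length)
{
    if (dest < 0 || dest >= MOCK_NODES || length < 0 || length > MOCK_MAX_WORDS)
        return -1;
    mock_mailbox *m = &mock_box[dest];
    if (m->full)
        return -1;  /* previous message not yet received */
    memcpy(m->data, value, (size_t)length * sizeof(int));
    m->src = mock_local_addr;
    m->len = length;
    m->full = 1;
    return length;
}

/* Returns the number of ints received, or -1 when nothing from `source` is waiting. */
int ns_arm_recv(int source, int *buf, int length)
{
    mock_mailbox *m = &mock_box[mock_local_addr];
    if (!m->full || m->src != source || m->len > length)
        return -1;
    memcpy(buf, m->data, (size_t)m->len * sizeof(int));
    m->full = 0;
    return m->len;
}
```

With this mock, the polling loop in main.c behaves as on the slide: ns_arm_recv keeps returning -1 until the peer's ns_arm_send has deposited a message.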
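21. Target-Specific Communication Libraries
[Figure: the sender on Node 0 calls ns_arm_send(1, c, 2) and the receiver on Node 1 calls a = ns_arm_recv(0, b, 2). Each call is implemented in ARM assembly using coprocessor instructions (the slide shows ldc p6, cr0, r1 and stc p8, cr2, r1). Labels: Message (on each side), outgoing packets/flits queue, incoming packets/flits queue, flits, Network.]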
23. Case Study
- Applications
- 3DES: a widely used encryption algorithm
- Two subcomponents
- Key exchange (KEY_EX)
- Communication Oriented
- Actual Encryption (3DES)
- Computation Oriented
- System Architecture
- PEs
- 3x3 array of ARM-V PEs
- OCA types
- simple bus (BUS)
- 3x3 2D torus (TORUS)
- fully connected crossbar (FULL)
[Figure: speedup comparison of different machine configurations]
24. Toolset Evaluation
- Simulation Speed
- Up to 15.8K cycles/s
- On a 1.1 GHz Pentium III, compiled with g++
- 4x slower than the single-PE model, counting the 9x slowdown due to the 9-PE model
- Fast prototyping
- Total process of building the system takes less than 10 minutes after parameters and models are ready
[Figure: comparison between two simulation platforms]
26. Work in Progress
- Revision of OCA description syntax
- Modeling OCA microarchitecture concurrency using the Operation State Machine (OSM) model
- Fully automated simulator synthesis
- Toolkit software release
[Figure labels: Machine Configuration, Execution, Performance]
27. Summary
- Integrated application, PE and OCA modeling and simulation for design space exploration
- Type- and topology-based OCA descriptions
- Fast and accurate application-driven SoC prototyping
- Proof-of-concept embedded system application
28. Acknowledgements
- Part of the MESCAL Project
- Modern Embedded Systems: Compilers, Architectures and Languages
- Princeton and UC Berkeley
- www.gigascale.org/mescal
- mescal.princeton.edu
- A Gigascale System Research Center (GSRC) effort
- www.gigascale.org
- Funded by DARPA and MARCO
- Liberty Research Group at Princeton
- http://liberty.cs.princeton.edu