Title: Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric Networks-on-Chip
1Solutions for Real Chip Implementation Issues of
NoC and Their Application to Memory-Centric
Networks-on-Chip
- Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee and Hoi-Jun Yoo
- Dept. of EECS
- Korea Advanced Institute of Science and Technology (KAIST)
2Outline
- Introduction
- Circuit Techniques for Efficient NoC Implementation
- Memory-Centric NoC
  - Architecture
  - NoC operation
  - Performance Evaluation and Implementation Results
- Conclusions
3Introduction
- Our direction of NoC implementation
  - Slim Spider: hierarchical star
  - Memory-Centric NoC: hierarchical star, shared memory
  - IIS: configurable
  - PROTONE: star topology
- Topology labels in the figure: Star, Mesh
- Other NoC chips shown: 80-Tile NoC (Intel), RAW (MIT), baseband processor NoC (STMicro), et al.
4Hierarchical Star Topology (1/2)
- NoC topologies in real chips
  - Basic topologies
    - Star (heterogeneous)
    - Mesh (homogeneous)
  - Hierarchical topologies
    - Global star, local mesh
    - Global star, local star
    - ... other derivatives
- Figure legend: Processing Node (IP), Crossbar Switch
5Hierarchical Star Topology (2/2)
- Figure panels: Energy / Packet, Interconnection Area
→ Hierarchical star topology has high energy/area efficiency
6Techniques for efficient NoC
- From the circuit designer's viewpoint
  - Keep it SIMPLE and make the chip WORK!
  - Chip-aware protocol → lower HW complexity
  - On-chip serialization → small area
  - Synchronization → lower clock complexity
  - Low-voltage-swing link → low power
  - Crossbar switch partial activation → low power
- Fancy network concepts are not necessary
- Chip implementation and its performance are more important than the network itself.
- S.J. Lee et al., ISOCC 2005
7Packet Format and Protocol
- Aligned packet format reduces hardware complexity
- Without alignment (first format in the figure)
  - Efficient link utilization
  - Increased control complexity
  - Reduced operation speed
- With alignment (second format in the figure)
  - Reduced control/hardware complexity
  - Increased operation speed
  - Link utilization may be inefficient
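The trade-off above can be sketched in a few lines of Python. This is only an illustration: the slides do not give the actual packet layout, so the field names and widths in `FIELDS` are hypothetical. The point is that with an aligned format every field sits at a fixed offset, so decoding needs only constant shifts and masks, while an unused field still occupies link bits.

```python
# Hypothetical aligned packet layout. Field names and widths are
# assumptions for illustration; they are not taken from the slides.
FIELDS = (("header", 16), ("address", 32), ("data", 32))  # MSB first
PACKET_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Place each field at its fixed position and return one integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Recover the fields with fixed shifts and masks (no length parsing)."""
    values = {}
    shift = PACKET_BITS
    for name, width in FIELDS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values


packet = {"header": 0x00A5, "address": 0x1000, "data": 0}
roundtrip = unpack(pack(packet))
```

Here `data` is zero, yet the packet is still `PACKET_BITS` wide on the link, which is the inefficiency the slide mentions.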
8On-chip Serialization (1/2)
- Concept and effects of on-chip serialization
  - Reduced link width
  - Reduced crossbar (X-bar) switch size
→ A proper level of on-chip serialization improves NoC performance
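As a minimal sketch of the concept, the Python below splits a word into narrower phits and reassembles it. The function names, the LSB-first ordering, and the 32-bit word with a 4:1 ratio are assumptions chosen for illustration; the slides do not specify them.

```python
def serialize(word, word_bits, ratio):
    """Split a word into `ratio` phits, least-significant phit first."""
    if word_bits % ratio:
        raise ValueError("word width must be a multiple of the ratio")
    phit_bits = word_bits // ratio
    mask = (1 << phit_bits) - 1
    return [(word >> (i * phit_bits)) & mask for i in range(ratio)]


def deserialize(phits, phit_bits):
    """Reassemble the word from phits sent least-significant first."""
    word = 0
    for i, phit in enumerate(phits):
        word |= phit << (i * phit_bits)
    return word


# A 32-bit word over an 8-bit link (4:1 serialization).
phits = serialize(0xDEADBEEF, word_bits=32, ratio=4)
restored = deserialize(phits, phit_bits=8)
```

With a 4:1 ratio the link and each crossbar port carry 8 wires instead of 32, but the link must run four times faster to keep the same bandwidth, which is why the serialization level has to be chosen with care.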
9On-chip Serialization (2/2)
→ WAFT SERDES removes the set-up/hold time overhead of flip-flops (F/F) to achieve high-speed operation
10Synchronization
- Source-synchronous, matched-delay synchronizer
- Timing diagram labels: strobe delay, phit delay, tPD, tBUFA, tBUFB, tCQ
11Low voltage signaling
- Scheme and transceiver circuits
→ Improves power efficiency of long global wires
12Crossbar Partial Activation
- Crossbar partial activation reduces unnecessary power dissipation
→ Applying crossbar partial activation to a 16x16 switch fabric yields up to 43% power saving
13Memory-Centric NoC
- Memory-Centric NoC combines a hierarchical star topology NoC with shared memory communication
- Shared memory communication
  - Low-overhead inter-processor communication
  - Low scalability
- Hierarchical star topology NoC
  - Concurrent data transactions
  - Message passing overhead
- Diagram labels: Shared Memory, SW, PE
14Memory Centric NoC Architecture (1/2)
- Overall Architecture
- 10 RISC processors
- 8 dual port memories
- 4 Channel controllers
- Hierarchical-star topology packet switching network
- Mesochronous communication
15Memory Centric NoC Architecture (2/2)
16Memory Centric NoC Operation (1/2)
- Overview of the MC-NoC operation
17Memory Centric NoC Operation (2/2)
- Valid check logic manages data coherency
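The slide does not describe how the valid check logic works, so the following Python is only a sketch under stated assumptions: one valid bit per memory word, set when a producer writes, and checked before a consumer read is allowed to complete. The class name `ValidCheckedMemory` and its methods are hypothetical.

```python
class ValidCheckedMemory:
    """Toy shared memory with one valid bit per word (an assumed scheme)."""

    def __init__(self, size):
        self.data = [0] * size
        self.valid = [False] * size  # no word holds produced data yet

    def write(self, addr, value):
        """Producer side: store the value and mark the word as valid."""
        self.data[addr] = value
        self.valid[addr] = True

    def read(self, addr):
        """Consumer side: return (ok, value); ok is False if not yet written."""
        if not self.valid[addr]:
            return False, None  # the consumer would have to wait or retry
        return True, self.data[addr]

    def invalidate(self, addr):
        """Clear the valid bit so the word can be reused for new data."""
        self.valid[addr] = False


mem = ValidCheckedMemory(size=4)
early = mem.read(0)   # consumer arrives before the producer
mem.write(0, 7)
late = mem.read(0)    # same read after the producer has written
```

In this model a consumer never observes stale or unwritten data; how the real chip stalls or retries the read is not stated in the slides.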
18Memory Centric NoC - Application
- Target application
  - SIFT-based object recognition
- Figure: overall flow of the SIFT computation
→ MC-NoC facilitates the data transactions that occur in SIFT feature calculation
19Memory Centric NoC Advantages (1/3)
- Task mapping on conventional mesh NoC
  - Mapping without contention
  - Contended mapping
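The difference between the two mappings can be illustrated with a small Python sketch that counts how many flows share each mesh link. Dimension-ordered (XY) routing is assumed here because the slides do not name the routing algorithm, and the coordinates and flows are made-up examples.

```python
from collections import Counter


def xy_route(src, dst):
    """Directed links visited by XY routing: move along x first, then y."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    while x != dst_x:
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def contended_links(flows):
    """Return the links used by more than one (src, dst) flow."""
    usage = Counter(link for src, dst in flows for link in xy_route(src, dst))
    return {link: count for link, count in usage.items() if count > 1}


# Two flows on disjoint links: no contention.
clean = contended_links([((0, 0), (1, 0)), ((0, 1), (1, 1))])
# Two flows that both need the link (1,0) -> (2,0): contended.
clashing = contended_links([((0, 0), (2, 0)), ((1, 0), (2, 0))])
```

On a mesh, whether a mapping is contention-free depends on where each task is placed, so the mapper has to search for placements like the first one.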
20Memory Centric NoC Advantages (2/3)
- Task mapping on the proposed MC-NoC
21Memory Centric NoC Advantages (3/3)
- Flexibility of task mapping
22Memory Centric NoC Performance Report
- Average latency and power breakdown
- Figure panels: Power Breakdown, Average Latency
23Implementation Results
- Chip photograph and specification
24Conclusion
- Real chip implementation issues
  - Simple architecture and circuits
  - Chip performance matters more than network performance
- Memory-Centric NoC
  - Hybrid of shared memory and star topology
  - Low-overhead communication
  - Concurrent inter-processor communications
- MC-NoC for object recognition processor
  - 0.18 µm CMOS process
  - 7.7 mm × 5 mm, 1.4 W at 1.8 V, 81.6 GOPS