Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

1
Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
Department of Computer Science and Engineering
Texas A&M University

2
Multi-Core Wave: Networks-On-Chip

Uniprocessors have hit the power wall.
Multiprocessors provide high performance at a lower power budget.
Shared-bus architectures have limited scalability.
Networks-On-Chip (NOCs) orchestrate chip-wide communication for future many-core processors.

3
Challenges in On-Chip Communication

High performance: Low communication latency is critical for high system performance.
Bandwidth efficiency: Well-designed routing algorithms provide high network throughput.
Power and area constraints: Simple topologies and slim routers reduce communication power consumption and save chip area.
Efficient multicast support: Cache coherence protocols rely heavily on multicast or broadcast communication.

We propose a bandwidth-efficient routing algorithm for multicast communication in NOCs with low latency and low power consumption.
4
Prior Work in Multicast Communication

Routing Evaluation Criteria for Multicast Communication [Ni93]
    Multicast in multicomputer systems
Tree-based Multicast Routing for DSM Multiprocessor [Torrellas96]
    Short-message multicast in DSM systems
Virtual Circuit Tree Multicasting for NOCs [Lipasti08]
    Demonstrates the necessity of on-chip multicasting
    Proposes table-based multicast routing
Region-based Multicast for CMPs [Duato08]
    Multicast routing for irregular topologies in CMPs

5
Outline

Motivation
Multicast Router Design
    State-of-the-art Unicast Router Architecture
    Replication Schemes
    Destination List Management
Recursive Partitioning Multicast (RPM)
    Network Partitioning
    Routing Rules
    Example
    Deadlock Avoidance
Evaluation
Conclusion

6
Different Bandwidth Usage Example
[Figure: two 4×4 meshes with nodes numbered 0 to 15, each marking the source and the destinations and showing a different multicast path (left and right).]

The left path requires 11 link traversals, 12 buffer writes, 15 buffer reads, and 15 crossbar traversals.
The right path requires 5 link traversals, 6 buffer writes, 10 buffer reads, and 10 crossbar traversals.
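
The difference in link traversals comes from how many links carry a copy of the packet. The source and destinations in the figure are not recoverable from this transcript, so the sketch below uses a hypothetical source and destination set on a 4×4 mesh with row-major node numbering. It compares sending separate XY-routed unicasts against letting copies share links along the union of those XY paths; the XY routing choice is also an assumption made for illustration.

```python
# Hypothetical 4x4 mesh, nodes numbered row-major 0..15.
WIDTH = 4

def xy_path(src, dst):
    """Return the directed links (a, b) on the XY route from src to dst."""
    x, y = src % WIDTH, src // WIDTH
    dx, dy = dst % WIDTH, dst // WIDTH
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append((y * WIDTH + x, y * WIDTH + nx))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append((y * WIDTH + x, ny * WIDTH + x))
        y = ny
    return links

def unicast_link_traversals(src, dests):
    """One packet per destination, so shared links are counted repeatedly."""
    return sum(len(xy_path(src, d)) for d in dests)

def tree_link_traversals(src, dests):
    """Copies share a link until their routes diverge, so each link counts once."""
    used = set()
    for d in dests:
        used.update(xy_path(src, d))
    return len(used)

src, dests = 9, [3, 7, 11]              # assumed example, not taken from the figure
print(unicast_link_traversals(src, dests), tree_link_traversals(src, dests))
```

For this assumed pattern, three unicasts cross 9 links in total while the shared tree crosses 4.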

7
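State-of-the-Art Wormhole Unicast Router
[Figure: two router pipelines. The first has separate RC, VA, SA and ST stages in the router, followed by LT on the link. The second combines RC, VA and SA into one stage, followed by ST in the router and LT on the link.]
RC: Route Computation
VA: VC Allocation
SA: Switch Allocation
ST: Switch Traversal
LT: Link Traversal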
8
What Do We Need in a Multicast Router?

Packet Replication
    Synchronous Replication
    Asynchronous Replication
Destination List Management
    All-destination Encoding
    Bit String Encoding
    Multiple-region Broadcast Encoding
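
Of the options above, bit string encoding can be pictured as one header bit per node, set when that node is a destination. The sketch below is a minimal illustration, assuming nodes are identified by integers starting at 0; the helper names are hypothetical and not taken from the presentation.

```python
def encode_destinations(dests, num_nodes):
    """Pack a destination set into an integer with one bit per node."""
    bits = 0
    for d in dests:
        if not 0 <= d < num_nodes:
            raise ValueError(f"node {d} outside 0..{num_nodes - 1}")
        bits |= 1 << d
    return bits

def decode_destinations(bits, num_nodes):
    """Recover the sorted destination list from the bit string."""
    return [n for n in range(num_nodes) if (bits >> n) & 1]

def remove_destination(bits, node):
    """Clear a node's bit, for example after the packet is ejected there."""
    return bits & ~(1 << node)

header = encode_destinations([3, 7, 11], num_nodes=16)
print(format(header, "016b"))           # bit 0 is the rightmost character
print(decode_destinations(remove_destination(header, 7), 16))
```

The header size grows linearly with the node count, which is the usual trade-off of this encoding against listing every destination address explicitly.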

9
Synchronous Replication
[Figure: timing diagram over cycles 0 to 3 showing head (H), middle (M) and tail (T) flits moving from inputs 0-3 to outputs 0-3 under synchronous replication.]

Packet replication happens at the switch traversal (ST) stage.

10
Asynchronous Replication
[Figure: timing diagram over cycles 0 to 3 showing head (H), middle (M) and tail (T) flits moving from inputs 0-3 to outputs 0-3 under asynchronous replication.]
11
Network Partitioning
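[Figure: the network is divided into eight parts (0 to 7) around the source node, with the N, W, E and S directions marked. Labelled groupings: three parts (0, 1, 7), three parts (1, 2, 3), three parts (3, 4, 5) and three parts (5, 6, 7).]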
12
Basic Routing Rules

North: top-right corner.
West: top-left corner.
South: bottom-left corner.
East: bottom-right corner.

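[Figure: mesh with the source, the destinations and the N, W, E and S directions marked, showing the output direction used toward each region.]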
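
The basic routing rules above can be sketched in Python as follows. This is one assumed reading of the slide: destinations in the top-right corner leave through the north port, top-left through west, bottom-left through south, and bottom-right through east, while destinations straight along an axis use that axis direction. The coordinate convention (x grows east, y grows north), the function names, and the destination set are hypothetical, and the optimized rules are not modeled.

```python
# Offsets for each output direction; x grows east, y grows north (assumed).
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def output_port(cur, dst):
    """Pick the output direction for one destination under the basic rules."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return "EJECT"
    if dy > 0 and dx >= 0:
        return "N"      # north axis and top-right corner
    if dx < 0 and dy >= 0:
        return "W"      # west axis and top-left corner
    if dy < 0 and dx <= 0:
        return "S"      # south axis and bottom-left corner
    return "E"          # east axis and bottom-right corner

def rpm_basic(cur, dests):
    """Re-partition the destinations at every router; return the hops taken."""
    groups = {}
    for d in dests:
        groups.setdefault(output_port(cur, d), []).append(d)
    hops = []
    for port, group in groups.items():
        if port == "EJECT":
            continue                         # delivered to the local node
        nxt = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
        hops.append((cur, nxt))              # one replica per output port
        hops += rpm_basic(nxt, group)        # the next router partitions again
    return hops

source = (1, 1)
dests = [(3, 3), (2, 2), (0, 2), (1, 0)]     # hypothetical destination set
print(len(rpm_basic(source, dests)))
```

For this assumed destination set the recursion uses 8 link traversals, one fewer than the 9 that four separate minimal unicasts would need, because the copies for (3, 3) and (2, 2) share the first northward hop.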
13
Optimized Routing Rules
[Figure: source and destination nodes routed under the optimized rules, with a deadlock case highlighted ("Deadlock!!!").]
14
RPM Example: Step 1
[Figure: mesh showing the source, the destinations and the partitioning, with three multicast packets marked M.]
15
RPM Example: Step 2
[Figure: mesh showing the source, the destinations and the partitioning, with four multicast packets marked M and one ejection.]
16
RPM Example: Step 3
[Figure: mesh showing the source, the destinations and the partitioning, with four multicast packets marked M.]
17
RPM Example: Step 4
[Figure: mesh showing the source, the destinations and the partitioning, with five multicast packets marked M and three ejections.]
18
RPM Example: Step 5
[Figure: mesh showing the source, the destinations and the partitioning, with three multicast packets marked M and one ejection.]
19
Deadlock Avoidance

RPM has no turn restrictions, which can introduce deadlock.
We use virtual networks (VNs) to avoid deadlock.
Two VNs share the same physical network.
The virtual channels of each port are divided equally between the two virtual networks.
The virtual network ID (0 or 1) of each packet is decided at the source.
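
The equal division of virtual channels can be sketched as below. The slide states only that each port's VCs are split evenly between the two virtual networks and that the source picks the VN ID; which VC indices belong to which VN is an assumption here, the function name is hypothetical, and the source's selection policy is not modeled.

```python
def vcs_for_virtual_network(vn_id, vcs_per_port=4, num_vns=2):
    """Return the VC indices a packet in virtual network vn_id may use."""
    if vcs_per_port % num_vns != 0:
        raise ValueError("VCs must divide evenly among virtual networks")
    if not 0 <= vn_id < num_vns:
        raise ValueError("unknown virtual network ID")
    share = vcs_per_port // num_vns
    # Assumed layout: contiguous blocks, VN 0 gets the lowest-numbered VCs.
    return list(range(vn_id * share, (vn_id + 1) * share))

for vn in (0, 1):
    print(vn, vcs_for_virtual_network(vn))
```

Because a packet keeps the VN ID assigned at its source, it only ever competes for the VCs of that one virtual network at every port along its route.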

20
Evaluation Methodology

Performance Model: Cycle-accurate network simulator
    Models all router pipeline stages in detail
    Highly parameterized
Power Model: Orion, with both dynamic and leakage power models

Network configuration
Topology: 8×8 Mesh (6×6 Mesh, 10×10 Mesh, 16×16 Mesh)
Routing: RPM
VC/Port: 4
VC Depth: 4
Packet Length (flits): 4
Unicast Traffic Pattern: Uniform Random (Bit Complement, Transpose)
Multicast Packet Portion: 10% (5%, 20%, 40%, 80%)
Multicast Destination Number: 0-16 (uniformly distributed)
21
Uniform Random Traffic
[Figure: results under uniform random traffic, annotated 50%, 40% and 40%.]

Latency improves by around 50% before network saturation.
Network throughput is extended by 40%.

22
Link Utilization
[Figure: link utilization, annotated 33% and 45%.]

Under low workload, RPM reduces link utilization by 33%.
Under high workload, RPM reduces link utilization by 45%.

23
Dynamic Power Consumption
[Figure: dynamic power consumption, annotated 50% and 40%.]
24
Scalability Study: Network Size
[Figure: results across network sizes, annotated "over 50%".]
25
Scalability Study: Multicast Traffic Portion
26
Scalability Study: Destination Number
27
Conclusion

We propose a new multicast routing algorithm, Recursive Partitioning Multicast (RPM)
    Bandwidth-efficient and scalable
Performance improvement
    Up to 50% latency reduction
    33% link utilization reduction
Power savings
    Up to 40% total dynamic power savings
    25% crossbar and link power savings