Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip

1
Recursive Partitioning Multicast: A Bandwidth-Efficient Routing for Networks-On-Chip
  • Lei Wang, Yuho Jin, Hyungjun Kim and Eun Jung Kim
  • Department of Computer Science and Engineering
  • Texas A&M University

2
Multi-Core Wave Networks-On-Chip
  • Uniprocessors hit the power wall.
  • Multiprocessors provide high performance at a
    lower power budget.
  • Shared-bus architectures have limited
    scalability.
  • Networks-On-Chip (NOCs) orchestrate chip-wide
    communications towards future many-core
    processors.

3
Challenges in On-Chip Communication
  • High performance: low communication latency is
    critical for high system performance.
  • Bandwidth efficiency: well-designed routing
    algorithms provide high network throughput.
  • Power and area constraints: simple topologies and
    slim routers reduce communication power
    consumption and save chip area.
  • Efficient multicast support: cache coherence
    protocols rely heavily on multicast or broadcast
    communication.

We propose a bandwidth-efficient routing for
multicast communication in NOCs with low latency
and power consumption.
4
Prior Work in Multicast Communication
  • Routing Evaluation Criteria for Multicast
    Communication [Ni93]: multicast in multicomputer
    systems
  • Tree-based Multicast Routing for DSM
    Multiprocessor [Torrellas96]: short-message
    multicast in DSM systems
  • Virtual Circuit Tree Multicasting for NOCs
    [Lipasti08]: demonstrates the necessity of
    on-chip multicasting and proposes table-based
    multicast routing
  • Region-based Multicast for CMPs [Duato08]:
    multicast routing for irregular topologies in
    CMPs

5
Outline
  • Motivation
  • Multicast Router Design
  • State-of-the-art Unicast Router Architecture
  • Replication Schemes
  • Destination List Management
  • Recursive Partitioning Multicast (RPM)
  • Network Partitioning
  • Routing Rules
  • Example
  • Deadlock Avoidance
  • Evaluation
  • Conclusion

6
Different Bandwidth Usage Example
[Figure: two 4×4 meshes with nodes numbered 0-15, each marking the source and destination nodes; the left and right meshes show two different multicast paths.]
  • The left path requires 11 link traversals, 12
    buffer writes, 15 buffer reads, and 15 crossbar
    traversals.
  • The right path requires 5 link traversals, 6
    buffer writes, 10 buffer reads, and 10 crossbar
    traversals.
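
The figure behind these counts is not preserved in the transcript, so the sketch below does not reproduce the 11-versus-5 comparison. It only illustrates, under stated assumptions, why the choice of multicast path changes link usage: on a 4×4 mesh with X-then-Y routing (an assumption, as are the source and destination nodes), it counts link traversals when every destination gets its own copy from the source versus when copies share links and are replicated only where the routes diverge.

```python
def xy_path(src, dst, width=4):
    """Return the directed links (from_node, to_node) on an X-then-Y route."""
    x, y = src % width, src // width
    dx, dy = dst % width, dst // width
    links = []
    while x != dx:  # move along the row first
        nx = x + (1 if dx > x else -1)
        links.append((y * width + x, y * width + nx))
        x = nx
    while y != dy:  # then along the column
        ny = y + (1 if dy > y else -1)
        links.append((y * width + x, ny * width + x))
        y = ny
    return links


def link_traversals(src, dests, share_links, width=4):
    """Count link traversals for one multicast from src to dests."""
    paths = [xy_path(src, d, width) for d in dests]
    if share_links:
        # Replicate only where routes diverge: each link is crossed once.
        return len({link for path in paths for link in path})
    # One independent copy per destination.
    return sum(len(path) for path in paths)


SRC, DESTS = 9, [0, 3, 15]  # hypothetical nodes, not taken from the slide
print(link_traversals(SRC, DESTS, share_links=False))  # 10
print(link_traversals(SRC, DESTS, share_links=True))   # 8
```

Buffer writes, buffer reads, and crossbar traversals could be tallied in the same way once a router model fixes how many of each a hop costs.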

7
State-of-the-Art Wormhole Unicast Router
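[Figure: two router pipeline diagrams. The first shows RC, VA, SA, and ST in the router followed by LT on the link; the second shows RC, VA, and SA grouped together, then ST in the router, followed by LT on the link.]
RC: Route Computation; VA: VC Allocation; SA: Switch Allocation; ST: Switch Traversal; LT: Link Traversal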
8
What Do We Need in a Multicast Router?
  • Packet replication: synchronous replication or
    asynchronous replication
  • Destination list management: all-destination
    encoding, bit string encoding, or multiple-region
    broadcast encoding
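
As an illustration of the bit string option, the following is a minimal sketch, assuming one bit per node with bit i standing for node i (the slide does not give the actual layout). The helper names are hypothetical.

```python
def encode(dests):
    """Pack a collection of destination node ids into one integer bit string."""
    bits = 0
    for node in dests:
        bits |= 1 << node
    return bits


def decode(bits):
    """Recover the sorted list of destination node ids from a bit string."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def clear(bits, node):
    """Drop a destination, e.g. after the packet has been ejected there."""
    return bits & ~(1 << node)


header = encode([1, 4, 6])
print(bin(header))               # 0b1010010
print(decode(clear(header, 4)))  # [1, 6]
```

With this layout the header needs as many bits as there are nodes, so the cost of the destination list grows with network size regardless of how many destinations a packet actually has.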

9
Synchronous Replication
[Figure: timing diagram over cycles 0-3 showing head (H), middle (M), and tail (T) flits moving from input ports 0-3 to output ports 0-3.]
  • Packet replication happens at the Switch
    Traversal stage.

10
Asynchronous Replication
[Figure: timing diagram over cycles 0-3 showing head (H), middle (M), and tail (T) flits moving from input ports 0-3 to output ports 0-3.]
11
Network Partitioning
[Figure: the network is partitioned around the source node into eight parts. Each of the four directions (N, W, E, S) covers three parts; the four groups are (0, 1, 7), (1, 2, 3), (3, 4, 5), and (5, 6, 7).]
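
The partitioning step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes (x, y) coordinates with y growing northward, and it names the eight parts by compass direction because the transcript does not preserve which number belongs to which part. It also assumes that a diagonal part is one of the three parts covered by each of its two neighbouring directions, which is how the overlapping groups of three read.

```python
def region(src, dst):
    """Name the part of the mesh that dst falls in, relative to src."""
    (sx, sy), (dx, dy) = src, dst
    ns = "N" if dy > sy else "S" if dy < sy else ""
    ew = "E" if dx > sx else "W" if dx < sx else ""
    return ns + ew  # "" means dst is the source itself


def partition(src, dests):
    """Group destinations into up to eight parts around src."""
    parts = {}
    for dst in dests:
        name = region(src, dst)
        if name:  # skip the source node
            parts.setdefault(name, []).append(dst)
    return parts


def candidate_ports(name):
    """Output directions whose three parts include the named part."""
    return list(name)  # "NE" -> ["N", "E"], "W" -> ["W"]


parts = partition((1, 1), [(1, 3), (3, 3), (0, 1), (2, 0)])
print(parts)  # {'N': [(1, 3)], 'NE': [(3, 3)], 'W': [(0, 1)], 'SE': [(2, 0)]}
print(candidate_ports("NE"))  # ['N', 'E']
```

Applying `partition` again at each router that receives a copy, with that router as the new `src`, gives the recursive behaviour the algorithm's name suggests. Choosing between the two candidate ports of a diagonal part is the job of the routing rules.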
12
Basic Routing Rules
  • North: top right corner.
  • West: top left corner.
  • South: bottom left corner.
  • East: bottom right corner.

[Figure: mesh marking the source and destination nodes, with N, W, E, and S direction labels.]
13
Optimized Routing Rules
[Figure: mesh marking the source and destination nodes, annotated "Deadlock!!!".]
14
RPM Example-step 1
[Figure: mesh marking the source, the destinations, the partitioning, and the multicast packets (M).]
15
RPM Example-step 2
[Figure: mesh marking the source, the destinations, the partitioning, and the multicast packets (M); one ejection is marked.]
16
RPM Example-step 3
[Figure: mesh marking the source, the destinations, the partitioning, and the multicast packets (M).]
17
RPM Example-step 4
[Figure: mesh marking the source, the destinations, the partitioning, and the multicast packets (M); three ejections are marked.]
18
RPM Example-step 5
[Figure: mesh marking the source, the destinations, the partitioning, and the multicast packets (M); one ejection is marked.]
19
Deadlock Avoidance
  • RPM has no turn restrictions, which can
    introduce deadlock.
  • We use virtual networks (VNs) to avoid deadlock.
  • Two VNs share the same physical network.
  • The virtual channels of each port are divided
    equally between the two virtual networks.
  • The virtual network ID (0 or 1) of each packet
    is decided at the source.
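
The equal division of virtual channels can be sketched as below, using the 4 VCs per port from the evaluation setup. Assigning contiguous VC indices to each VN is an assumption, and the function name is hypothetical. How the source chooses the VN ID is not stated on the slide, so it is left out.

```python
def vcs_in_vn(vn_id, vcs_per_port=4, num_vns=2):
    """Return the VC indices at a port that belong to one virtual network."""
    if vcs_per_port % num_vns != 0:
        raise ValueError("VCs must divide equally among virtual networks")
    if not 0 <= vn_id < num_vns:
        raise ValueError("unknown virtual network id")
    size = vcs_per_port // num_vns
    # Assumed layout: VN 0 gets the lowest indices, VN 1 the next block.
    return list(range(vn_id * size, (vn_id + 1) * size))


print(vcs_in_vn(0))  # [0, 1]
print(vcs_in_vn(1))  # [2, 3]
```

A packet tagged with a VN ID at the source would then be allocated only VCs from that list at every hop, which keeps the two virtual networks from sharing buffers.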

20
Evaluation Methodology
  • Performance model: cycle-accurate network
    simulator that models all router pipeline stages
    in detail and is highly parameterized
  • Power model: Orion, with both dynamic and leakage
    power models

Network configuration:
  • Topology: 8×8 Mesh (6×6 Mesh, 10×10 Mesh, 16×16 Mesh)
  • Routing: RPM
  • VC/Port: 4
  • VC Depth: 4
  • Packet Length (flits): 4
  • Unicast Traffic Pattern: Uniform Random (Bit Complement, Transpose)
  • Multicast Packet Portion: 10% (5%, 20%, 40%, 80%)
  • Multicast Destination Number: 0-16 (uniformly distributed)

21
Uniform Random Traffic
[Figure: plots annotated with 50%, 40%, and 40%.]
  • Latency improves by around 50% before network
    saturation.
  • Network throughput is extended by 40%.

22
Link Utilization
[Figure: plots annotated with 33% and 45%.]
  • Under low workload, RPM reduces link utilization
    by 33%.
  • Under high workload, RPM reduces link utilization
    by 45%.

23
Dynamic Power Consumption
[Figure: plots annotated with 50% and 40%.]
24
Scalability Study-Network Size
[Figure: plot annotated with "Over 50%".]
25
Scalability Study-Multicast Traffic Portion
26
Scalability Study-Destination Number
27
Conclusion
  • We propose a new multicast routing algorithm,
    Recursive Partitioning Multicast (RPM).
  • RPM is bandwidth-efficient and scalable.
  • Performance improvement: up to 50% latency
    reduction and 33% link utilization reduction
  • Power savings: up to 40% total dynamic power
    savings and 25% crossbar and link power savings

28
Thank you!
29
Backup
30
Hardware Implementation of Routing logic
31
Bit Complement Traffic
32
Transpose Traffic