The Power of Priority: NoC based Distributed Cache Coherency

1
The Power of Priority: NoC based Distributed Cache Coherency
EE Department, Technion, Haifa, Israel
Evgeny Bolotin, Zvika Guz, Israel Cidon, Ran Ginosar, Avinoam Kolodny
QNoC Research Group, Technion
2
Chip Multi-Processor (CMP)
  • Multi-Core
  • Large cache
  • Shared cache
  • Distributed cache
  • NoC-based: How?

Dual-Core: Monolithic shared cache
3
Future Cache - Physics Perspective
  • Large cache → Large access time

Large monolithic cache is not scalable
4
NUCA - Non Uniform Cache Architecture
  • Banked cache over NoC
  • Smaller bank → Smaller Access Time
  • Multiple banks → Multiple Ports
  • Closer bank → Smaller Access Time

NUCA: Non-uniform access times
  • Cache-line placement policy
  • Static NUCA (SNUCA)
  • Dynamic NUCA (DNUCA)

Sources: Kim et al., ASPLOS 2002; Beckmann et al., MICRO 2004
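
The static placement policy and the "closer bank, smaller access time" point can be sketched in a few lines. This is only an illustration under stated assumptions: the 4x4 bank grid, the 64-byte line, the cycle counts, and the function names `snuca_bank` and `access_time` are hypothetical and do not come from the slides or the cited papers.

```python
def snuca_bank(address, line_bytes=64, cols=4, rows=4):
    """Static NUCA sketch: the bank is fixed by low-order bits of the line address.

    Grid size and line size are assumed values for illustration.
    """
    line = address // line_bytes
    bank = line % (cols * rows)
    return bank % cols, bank // cols  # (x, y) position of the bank in the grid


def access_hops(core_xy, bank_xy):
    """Manhattan distance between a core and a bank on the grid."""
    return abs(core_xy[0] - bank_xy[0]) + abs(core_xy[1] - bank_xy[1])


def access_time(core_xy, bank_xy, bank_cycles=3, hop_cycles=1):
    """Round-trip access time: bank lookup plus request and reply hops.

    bank_cycles and hop_cycles are assumed, contention-free values.
    """
    return bank_cycles + 2 * hop_cycles * access_hops(core_xy, bank_xy)
```

Under these assumptions a core at (0, 0) reaches its local bank in 3 cycles and the bank at (1, 1) in 7 cycles, which is the non-uniformity the slide refers to. A dynamic policy (DNUCA) would additionally let lines migrate between banks, which this sketch does not model.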
5
Issues in NUCA-based CMP
  • NoC performance → CMP performance
  • Cache coherency and transaction order
    (correctness)
  • Search (in DNUCA)
  • Different traffic types (e.g. fetch vs. prefetch)
  • Synchronization (locks)

NoC Services for CMP?
6
Cache Coherency over NoC
How do we maintain coherency over NoC?
  • Distributed directory
  • Snooping
  • Central directory

7
Distributed Cache Coherency
Cache access → Multiple NoC transactions
Example: Simple read transaction
8
Read Transaction of Modified Block
9
Read Exclusive of Shared Block
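
The diagrams for the three transactions above did not survive in this transcript. As a rough illustration of how one cache access turns into several NoC transactions, the sketch below lists the messages of a generic textbook directory protocol. It is an assumption, not the authors' protocol: the node names `P0`, `P1`, `Dir`, the message names, and the exact message order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    src: str
    dst: str
    kind: str  # "ctrl" (short packet) or "data" (long packet)
    name: str


def read_transactions(state, requester="P0", home="Dir", owner="P1"):
    """Messages for a read (RD), assuming a generic directory protocol."""
    if state == "modified":
        # The only valid copy is in the owner's cache, so the home node forwards.
        return [
            Msg(requester, home, "ctrl", "read request"),
            Msg(home, owner, "ctrl", "forwarded request"),
            Msg(owner, requester, "data", "block"),
            Msg(owner, home, "data", "write-back"),
        ]
    # Simple read: the home node supplies the block directly.
    return [
        Msg(requester, home, "ctrl", "read request"),
        Msg(home, requester, "data", "block"),
    ]


def read_exclusive_transactions(sharers, requester="P0", home="Dir"):
    """Messages for a read exclusive (RX) of a shared block, same assumptions."""
    msgs = [Msg(requester, home, "ctrl", "read-exclusive request")]
    for sharer in sharers:
        msgs.append(Msg(home, sharer, "ctrl", "invalidate"))
        msgs.append(Msg(sharer, home, "ctrl", "invalidate ack"))
    msgs.append(Msg(home, requester, "data", "block"))
    return msgs
```

Even in this simplified form, most messages are short control packets and only a few carry a full cache block, which is the distinction the later slides build on.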
10
Basic NoC to Support CMP
  • Off-the-shelf (Vanilla) NoC
  • Grid of wormhole routers
  • Unicast only
  • Ordering in network
  • Static routing
  • No virtual channels
  • Smart interfaces
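
The slide says only that routing is static on a grid. One common static scheme is dimension-order (XY) routing, sketched here as an assumption; the transcript does not say which scheme the authors use, and `xy_route` is a hypothetical name.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a grid: travel along X first, then along Y.

    Returns the list of router coordinates visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because the path depends only on the source and destination, all packets between the same pair of nodes follow the same route, which fits the "ordering in network" property listed above when there are no virtual channels.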

Can We Do Better?
11
Observations: L2 Access
  • A) Delay = Queueing + NoC transactions
  • B) All NoC transactions are equally important
  • C) NoC transactions consist of:
  • Short ctrl. packets
  • Long data packets

Idea: Differentiate between Ctrl. and Data
Solution: Preemptive Priority NoC → Give priority to short ctrl. packets
12
Preemptive Priority NoC: QNoC
Multiple SL Router
  • Service Levels
  • Dedicated wormhole buffer
  • Preemptive priority scheduling

Multiple SL link
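
The router description above (service levels, a dedicated buffer per level, preemptive priority scheduling) can be sketched as a per-flit arbiter on one output port. This is a minimal model under assumptions, not the QNoC router design: the class name `PriorityOutputPort`, the two service levels, and the flit representation are hypothetical.

```python
from collections import deque


class PriorityOutputPort:
    """One output link with a dedicated flit buffer per service level (SL).

    Level 0 is the highest priority. Arbitration happens per flit, so a
    newly arrived high-priority packet preempts a low-priority packet that
    is already partly transmitted.
    """

    def __init__(self, num_levels=2):
        self.buffers = [deque() for _ in range(num_levels)]

    def enqueue(self, level, packet_id, num_flits):
        """Queue all flits of a packet in the buffer of its service level."""
        for i in range(num_flits):
            self.buffers[level].append((packet_id, i))

    def next_flit(self):
        """Send one flit from the highest-priority non-empty buffer."""
        for buf in self.buffers:
            if buf:
                return buf.popleft()
        return None  # link idle
```

For example, if a 4-flit data packet is queued at level 1 and a 1-flit control packet arrives at level 0 after the first data flit has been sent, the next flit on the link is the control flit and the data packet resumes afterwards.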
13
Example: Vanilla NoC
Without contention: X = Delay of long packet, d = Delay of short packet
Vanilla NoC example
Blue delay = X, Red delay = 2X + d, Average delay ≈ 1.5X
A
B
14
Example: Priority NoC
Without contention: X = Delay of long packet, d = Delay of short packet
Vanilla NoC example
Blue delay = X, Red delay = 2X + d, Average delay ≈ 1.5X
A
B
Priority NoC example
Blue delay = X + d, Red delay = X + d, Average delay ≈ X
Potential delay reduction ≈ 0.5X
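
The arithmetic behind these numbers can be checked with a small sketch. The scenario in the comments is one possible reading of the lost figure, stated as an assumption: a blue long packet and a red transaction (a short request followed by a long reply) compete for the same link. The function names are hypothetical.

```python
def vanilla_delays(X, d):
    """No priorities: the red short request waits behind the blue long packet."""
    blue = X          # long packet is not obstructed
    red = X + d + X   # wait X, send request (d), then receive long reply (X)
    return blue, red


def priority_delays(X, d):
    """Preemptive priority: the red short request overtakes the blue long packet."""
    blue = X + d      # preempted for the duration of the short packet
    red = d + X       # request goes at once, then the long reply
    return blue, red


def average(delays):
    return sum(delays) / len(delays)
```

With X = 10 and d = 1 the average delay is 15.5 without priorities and 11 with them, a reduction of 4.5. The exact reduction is 0.5X - 0.5d, which approaches the slide's 0.5X when the short packet is much shorter than the long one.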
15
Priority NoC: Different Destinations
  • Very important in wormhole
  • When ctrl. packet is blocked by other worms

Long Data
Short Req.
16
Protocol Correctness
Need state-preserving serialization of
transactions in the processor interface
17
Numerical Evaluation
  • CMP simulator (SIMICS)
  • Simulate parallel benchmarks
  • Obtain L2-cache access traces
  • QNoC simulator (OPNET)
  • Simulate distributed coherence protocol over NoC
  • Measure total RD/RX L2-access delay
  • Measure total program throughput

18
Priority NoC: Results
Delay Reduction vs. Network Load
RD Delay - Apache
RD/RX Delay Reduction - Apache
  • Short ctrl. packet gets high priority
  • Long data packet gets low priority

19
Priority NoC: Several Benchmarks
Delay Reduction
Program Speedup
20
So Far: The Power of Priority
  • Simplicity - Almost for Free
  • Significant CMP Speed-up
  • Good For
  • Coherency
  • Traffic differentiation (e.g. Fetch vs.
    Pre-Fetch)
  • Search in DNUCA
  • Synchronization (Locks)

21
Advanced Support Functions
  • Special Broadcast for Short Messages
  • Broadcast service (e.g. search in DNUCA)
  • Wormhole broadcast slow and expensive
  • → SF broadcast embedded in wormhole
  • Virtual Ring
  • No Additional Cost
  • For Invalidation Multicast
  • Snooping or synchronization

22
Summary
  • NoC at CMP Service!
  • Shared cache over NoC
  • Priority is powerful
  • Built-in support functions