Title: Non-Uniform Cache Architectures for Wire Delay Dominated Caches
1. Non-Uniform Cache Architectures for Wire Delay Dominated Caches
- Abhishek Desai
- Bhavesh Mehta
- Devang Sachdev
- Gilles Muller
2. Plan
- Motivation
- What is NUCA
- UCA and ML-UCA
- Static NUCA
- Dynamic NUCA
- Simulation Results
3. Motivation
- Bigger L2 and L3 caches are needed
- Programs are larger
- SMT requires a large cache for spatial locality
- Bandwidth demands on the package have increased
- Smaller process technologies permit more bits per mm²
- Wire delays dominate in large caches
- The bulk of the access time is spent routing to and from the banks, not in the bank accesses themselves
4. What is NUCA?
- Data residing closer to the processor is accessed much faster than data residing physically farther from the processor
- Example: in a 16MB on-chip L2 cache built in 50nm process technology, the closest bank could be accessed in 4 cycles, while an access to the farthest bank might take 47 cycles
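The latency gradient in the example above can be sketched with a toy model. This is illustrative only: the per-bank and per-hop cycle counts below are assumed parameters, not the paper's measured 4/47-cycle figures.

```python
# Toy model of non-uniform access time: total latency is a fixed bank
# access cost plus a routing cost proportional to the bank's Manhattan
# distance from the cache controller at (0, 0). All constants here are
# illustrative assumptions, not values from the paper.
def access_cycles(row, col, bank_cycles=3, hop_cycles=1):
    return bank_cycles + hop_cycles * (row + col)

# In a 16x16 bank array, the closest bank is far cheaper than the farthest:
closest = access_cycles(0, 0)      # bank next to the controller
farthest = access_cycles(15, 15)   # diagonally opposite corner
```

Under this model the same cache has a spread of latencies rather than one worst-case latency, which is the property NUCA exploits.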
5. UCA and ML-UCA
- UCA: avg. access time 255 cycles, 1 bank, 16MB, 50nm
- ML-UCA: avg. access time 11/41 cycles, 8/32 banks, 16MB, 50nm
(Figure: UCA vs. ML-UCA bank organization)
6. Static-NUCA-1
- S-NUCA-1: avg. access time 34 cycles, 32 banks, 16MB, 50nm, wire area overhead 20.9%
7. S-NUCA-1 cache design
8. Static-NUCA-2
- S-NUCA-2: avg. access time 24 cycles, 32 banks, 16MB, 50nm, channel area overhead 5.9%
9. S-NUCA-2 cache design
(Figure: bank array with address bus and sense amplifiers)
10. Dynamic-NUCA
- D-NUCA: avg. access time 18 cycles, 256 banks, 16MB, 50nm
11. Management of Data in D-NUCA
- Mapping: how are data mapped to the banks, and in which banks can a datum reside?
- Search: how is the set of possible locations searched to find a line?
- Movement: under what conditions should data be migrated from one bank to another?
12. Simple Mapping (implemented)
(Figure: simple mapping — the memory controller sits at the cache edge; 8 bank sets form columns, and one set's ways 1-4 map to the 4 banks of its column)
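The simple mapping can be sketched as an index computation. This is a hedged sketch: the 8 bank sets and 4 ways match the figure, while the line size and sets-per-bank values are assumed parameters.

```python
# Simple mapping sketch: the low bits of the line index select one of 8
# bank sets (columns); each of the 4 banks in that column holds one way,
# so a line's candidate locations are exactly the banks of its column.
# line_bytes and sets_per_bank are illustrative assumptions.
def simple_map(addr, num_bank_sets=8, line_bytes=64, sets_per_bank=512):
    line = addr // line_bytes
    bank_set = line % num_bank_sets                   # which column to search
    set_in_bank = (line // num_bank_sets) % sets_per_bank
    return bank_set, set_in_bank
```

The appeal of this scheme is that the bank set is computable directly from the address bits, with no lookup structure; its drawback, noted in the paper's alternatives, is that it ties a line's latency range to its address.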
13. Fair and Shared Mapping
(Figures: fair mapping and shared mapping, each drawn with the memory controller at the cache edge)
14. Searching Cached Lines
- Incremental search
- Multicast search (implemented)
- Limited multicast
- Partitioned multicast
- Smart search
- ss-performance
- ss-energy
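The first two policies can be contrasted in a sketch. The helper names are hypothetical; each bank is modeled as a set of resident tags, and cost is counted in bank probes rather than cycles.

```python
# Incremental search: probe the candidate banks closest-first and stop at
# the first hit; cheap in probes (energy), but a hit in a far bank pays
# the full serialized probe sequence.
def incremental_search(banks, tag):
    probes = 0
    for bank in banks:                 # ordered closest to farthest
        probes += 1
        if tag in bank:
            return True, probes
    return False, probes

# Multicast search (the implemented policy): probe all candidate banks at
# once, so hit latency is that of the hitting bank alone, at the energy
# cost of probing every bank.
def multicast_search(banks, tag):
    probes = len(banks)                # all candidate banks probed in parallel
    return any(tag in bank for bank in banks), probes
```

The paper's limited and partitioned multicast variants sit between these two extremes, multicasting to only a subset of the candidate banks.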
15. Dynamic Movement of Lines
- LRU line furthest and MRU line closest
- One-bank promotion on a hit (implemented)
- Policy on a miss
- Which line is evicted? The line in the furthest (slowest) bank (implemented)
- Where is the new line placed? The closest (fastest) bank, or the furthest (slowest) bank (implemented)
- What happens to the victim line? Zero-copy policy (implemented) or one-copy policy
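The policies marked "implemented" combine into a short sketch. The representation is assumed: each set's candidate banks form a list of tags ordered fastest bank first.

```python
# D-NUCA movement sketch using the "implemented" policies above: on a hit,
# swap the line one bank closer to the processor (one-bank promotion); on
# a miss, place the incoming line in the slowest bank, dropping the
# occupant without relocation (zero-copy policy).
def access(chain, tag):
    if tag in chain:
        i = chain.index(tag)
        if i > 0:                      # promote one bank toward the CPU
            chain[i - 1], chain[i] = chain[i], chain[i - 1]
        return True                    # hit
    chain[-1] = tag                    # evict the slowest bank's occupant
    return False                       # miss
```

Repeated hits gradually migrate hot lines into the fastest banks, which is how D-NUCA approximates "MRU closest, LRU furthest" without a global LRU ordering.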
16. Advantages of D-NUCA over ML-UCA
- D-NUCA does not enforce inclusion, preventing redundant copies of the same line
- In ML-UCA the faster level may not match the working-set size of an application, either being too large and thus slow, or too small and thus incurring misses
17. Configuration for simulation
- Used the sim-alpha simulator and CACTI
- Simple mapping
- Multicast search
- One-bank promotion on each hit
- Replacement policy that chooses the block in the slowest bank as the victim on a miss
18. Hit Rate Distribution for D-NUCA
19. Simulation results: integer benchmarks
20. Simulation results: FP benchmarks
21. Summary
- D-NUCA has the following advantages:
- Low access latency
- Technology scalability
- Performance stability
- Flattens the memory hierarchy