Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring

About This Presentation

Description:

Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Lei Jin and Sangyeun Cho Dept. of Computer Science University of Pittsburgh – PowerPoint PPT presentation

Slides: 32

Provided by: Sang55

Learn more at: https://www.eecg.toronto.edu


Transcript and Presenter's Notes


1
Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring

Lei Jin and Sangyeun Cho

Dept. of Computer Science, University of Pittsburgh
2
Multicore distributed L2 caches

L2 caches typically sub-banked and distributed
IBM Power4/5: 3 banks
Sun Microsystems T1: 4 banks
Intel Itanium2 (L3): many sub-arrays
(Distributed L2 caches + switched NoC) → NUCA
Hardware-based management schemes
Private caching
Shared caching
Hybrid caching

3
Private and shared caching

Private caching
+ short hit latency (always local)
- high on-chip miss rate
- long miss resolution time
- complex coherence enforcement

Shared caching
+ low on-chip miss rate
+ straightforward data location
+ simple coherence (no replication)
- long average hit latency

4
Other approaches

Hybrid/flexible schemes
Core clustering [Speight et al., ISCA 2005]
Flexible CMP cache sharing [Huh et al., ICS 2004]
Flexible bank mapping [Liu et al., HPCA 2004]
Improving shared caching
Victim replication [Zhang and Asanovic, ISCA 2005]
Improving private caching
Cooperative caching [Chang and Sohi, ISCA 2006]
CMP-NuRAPID [Chishti et al., ISCA 2005]

5
Motivation
Hit latency
Miss rate
What is the optimal balance between miss rate and
hit latency?
6
Talk roadmap

Data mapping, a key property [Cho and Jin, MICRO 2006]
Two-dimensional (2D) page coloring algorithm
Evaluation and results
Conclusion and future work

7
Data mapping

Data mapping
Memory data → location in L2 cache
Private caching
Data mapping determined by program location
Mapping created at miss time
No explicit control
Shared caching
Data mapping determined by address
slice number = (block address) mod (N slice)
Mapping is static
No explicit control

8
Change mapping granularity
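Block granularity: slice number = (block address) mod (N slice)
Page granularity: slice number = (page address) mod (N slice)
(Figure: consecutive blocks of a page spread across slices under block granularity; each page mapped whole to one slice under page granularity.)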
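
The difference between the two granularities can be illustrated with a minimal sketch. It reads the slide's mapping as slice number = address mod N slice (the operator did not survive in the transcript, so modulo is an assumption). The 64-byte block size, the 4KB page size, and the names `slice_by_block` and `slice_by_page` are hypothetical and chosen only for illustration; the slice count of 16 follows the 16-core configuration on the experiment setup slide.

```python
BLOCK_SIZE = 64    # bytes per cache block (assumed, not stated in the slides)
PAGE_SIZE = 4096   # bytes per page (assumed, not stated in the slides)
N_SLICE = 16       # one L2 slice per tile in a 16-core CMP


def slice_by_block(addr: int) -> int:
    """Block granularity: consecutive blocks rotate through the slices."""
    return (addr // BLOCK_SIZE) % N_SLICE


def slice_by_page(addr: int) -> int:
    """Page granularity: every block of a page lands in the same slice."""
    return (addr // PAGE_SIZE) % N_SLICE


# Walk over every block of one page and record which slices it touches.
base = 0x12000  # start of an arbitrary page (page number 18)
offsets = range(0, PAGE_SIZE, BLOCK_SIZE)
block_slices = {slice_by_block(base + off) for off in offsets}
page_slices = {slice_by_page(base + off) for off in offsets}

print(len(block_slices), "slices touched with block granularity")
print(len(page_slices), "slice touched with page granularity")
```

With page granularity the slice is a function of the physical page number alone, which is what lets the OS steer a page to a chosen slice simply by choosing which physical page to allocate.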
9
OS controlled page mapping
(Figure: OS page allocation maps the virtual address spaces of Program 1 and Program 2 onto memory pages in the physical address space.)
10
2D page coloring: the problem
(Figure: candidate colors for a page accessed by the program running on tile P)
# access   # miss   cost (cycles)
500        30       9000
500        3        6900
500        10       9000
500        7        8100
500        12       9600
Network latency per hop = 3 cycles
Memory latency = 300 cycles
Cost(color) = (# access × # hop × 3 cycles) + (# miss × 300 cycles)
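
The numbers on this slide can be checked with a short sketch of the cost formula, using the slide's 3-cycle hop latency and 300-cycle memory latency. The hop counts are not recoverable from the transcript: the values used here (0 for the first candidate, 4 for the others) are assumptions, chosen because they reproduce the listed costs when the access/miss pairs are matched to the costs in order. The name `color_cost` is hypothetical.

```python
HOP_LATENCY = 3    # cycles per network hop (from the slide)
MEM_LATENCY = 300  # cycles per off-chip memory access (from the slide)


def color_cost(accesses: int, misses: int, hops: int) -> int:
    """Cost(color) = (# access x # hop x 3) + (# miss x 300), in cycles."""
    return accesses * hops * HOP_LATENCY + misses * MEM_LATENCY


# (accesses, misses, hops) per candidate color; hop counts are assumed.
candidates = [
    (500, 30, 0),
    (500, 3, 4),
    (500, 10, 4),
    (500, 7, 4),
    (500, 12, 4),
]

costs = [color_cost(*c) for c in candidates]
best = min(range(len(costs)), key=costs.__getitem__)

print(costs)  # [9000, 6900, 9000, 8100, 9600]
print(best)   # 1
```

Under these assumptions the local slice is not the cheapest choice: a remote slice with far fewer misses wins despite the extra hops, which is the trade-off the slide is pointing at.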
11
2D coloring algorithm

Collect L2 reference trace
Derive conflict information [Sherwood et al., ICS 1999]

12
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  0  0           A   0  0  0
B   0  0  0           B   0  0  0
C   0  0  0           C   0  0  0
13
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  0  0           A   0  0  0
B   1  0  0           B   0  0  0
C   1  0  0           C   0  0  0
14
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  0  0           A   0  0  0
B   1  0  0           B   0  0  0
C   1  0  0           C   0  0  0
15
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  1  0           A   0  0  0
B   1  0  0           B   0  0  0
C   1  1  0           C   0  0  0
16
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  1  0           A   0  0  0
B   0  0  0           B   1  0  0
C   1  1  0           C   0  0  0
17
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  1  0           A   0  0  0
B   0  0  0           B   1  0  0
C   1  1  0           C   0  0  0
18
2D coloring algorithm (contd)

Derive conflict information

Reference Matrix      Conflict Matrix
    A  B  C               A  B  C
A   0  1  1           A   0  0  0
B   0  0  1           B   1  0  0
C   1  1  0           C   0  0  0
19
2D coloring algorithm (contd)

2D Page coloring

Conflict Matrix       Access Counter
    A  B  C           A  B  C
A   0  0  0           1  2  1
B   1  0  0
C   1  1  0
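
The matrix updates on the preceding slides can be reproduced with a small sketch. The update rule below is inferred from the animation frames and is not stated in the slides: on a reference to page X, row X of the reference matrix is added to row X of the conflict matrix and cleared, then column X is set to 1 in every other row. The trace A, B, B, C is likewise an assumption, picked because it yields the conflict matrix and the access counter (1, 2, 1) shown above. The name `derive_conflicts` is hypothetical.

```python
def derive_conflicts(trace, pages):
    """Build a conflict matrix and access counter from a page reference trace."""
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    ref = [[0] * n for _ in range(n)]       # ref[x][y] = 1: y seen since x's last access
    conflict = [[0] * n for _ in range(n)]  # accumulated conflicts per page pair
    access = [0] * n                        # per-page access counter

    for page in trace:
        x = idx[page]
        access[x] += 1
        # Pages referenced since the last access to x count as conflicts with x.
        for y in range(n):
            conflict[x][y] += ref[x][y]
            ref[x][y] = 0
        # Record for every other page that x has now been referenced.
        for y in range(n):
            if y != x:
                ref[y][x] = 1
    return conflict, access


conflict, access = derive_conflicts(["A", "B", "B", "C"], ["A", "B", "C"])
for row in conflict:
    print(row)
print(access)
```

Note that the second reference to B adds no conflict, because no other page was touched between the two accesses.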
20
2D coloring algorithm (contd)

2D Page coloring

Conflict Matrix       Access Counter
    A  B  C           A  B  C
A   0  0  0           1  2  1
B   1  0  0
C   1  1  0

Cost(color, page) = α × (# Conflict(color) × mem latency) + (1 - α) × (# Access × # hop(color) × hop delay)
Optimal color(page) = { C | Cost(C) = MIN[Cost(color, page)] for all colors }
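
A minimal sketch of how this cost function might drive color selection follows. Several details are assumptions rather than the authors' stated method: pages are colored greedily in descending access count, Conflict(color) is taken as the sum of conflict-matrix entries between the page and the pages already assigned to that color, and hop(color) is the Manhattan distance on a 4x4 mesh between the program's tile and the tile owning that color. The names `hops` and `color_pages` are hypothetical.

```python
MEM_LATENCY = 300  # cycles (value from the problem slide)
HOP_DELAY = 3      # cycles per hop (value from the problem slide)
MESH_WIDTH = 4     # assumed 4x4 layout for the 16-tile CMP


def hops(src: int, dst: int) -> int:
    """Manhattan distance between two tiles on the assumed mesh."""
    sx, sy = src % MESH_WIDTH, src // MESH_WIDTH
    dx, dy = dst % MESH_WIDTH, dst // MESH_WIDTH
    return abs(sx - dx) + abs(sy - dy)


def color_pages(conflict, access, home, alpha, n_colors=16):
    """Greedily give each page the color with minimum Cost(color, page)."""
    assigned = {}  # page index -> chosen color
    order = sorted(range(len(access)), key=lambda p: -access[p])
    for p in order:
        def cost(c):
            # Conflicts with pages that already share color c.
            conf = sum(conflict[p][q] + conflict[q][p]
                       for q, cq in assigned.items() if cq == c)
            return (alpha * conf * MEM_LATENCY
                    + (1 - alpha) * access[p] * hops(home, c) * HOP_DELAY)
        assigned[p] = min(range(n_colors), key=cost)
    return assigned


conflict = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]  # pages A, B, C
access = [1, 2, 1]
balanced = color_pages(conflict, access, home=0, alpha=0.5)
latency_only = color_pages(conflict, access, home=0, alpha=0.0)
print(balanced)
print(latency_only)
```

With α = 0 only hop distance matters, so every page lands in the local slice, as in private caching. Raising α pushes conflicting pages apart onto nearby slices.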
21
Experiments setup

Experiments were carried out using a simulator
derived from the SimpleScalar toolset.
The simulator models a 16-core tile-based CMP.
Each core has private 32KB I/D L1 caches and a
256KB slice of the globally shared L2 (4MB total).

22
Optimal page mapping
(Figure: number of pages mapped to each tile (x, y) for gcc, with α = 1/64 and α = 1/256.)
23
Access distribution
24
Relative performance
25
Value of α
26
Conclusions

With careful data placement, there is substantial room
for performance improvement.
Dynamic mapping schemes assisted by information
collected in hardware could achieve similar
performance improvement.
This method can also be applied to other
optimization targets.

27
Current and future work

Dynamic mapping schemes
Performance
Power
Multiprogrammed and parallel workloads

28
Thank you. Questions?
29
Private caching

+ short hit latency (always local)
- high on-chip miss rate
- long miss resolution time
- complex coherence enforcement

(Figure: on an L1 miss, the local L2 is accessed. A hit is served locally; on a miss, the directory is accessed, which finds either a copy on chip or a global miss.)
30
Shared caching

(Figure: on an L1 miss, the L2 is accessed, resulting in a hit or a miss.)

+ low on-chip miss rate
+ straightforward data location
+ simple coherence (no replication)
- long average hit latency

31
Performance
