Title: Design Exploration of an Instruction-Based Shared Markov Table on CMPs

1
Design Exploration of an Instruction-Based Shared
Markov Table on CMPs

Karthik Ramachandran Lixin Su

2
Outline

Motivation
Multiple cores on single chip
Commercial workloads
Our study
Start from instruction sharing pattern analysis
Our experiments
Move on to instruction cache miss pattern analysis
Our experiments
Conclusions

3
Motivation

Technology push: CMPs
Lower access latency to other processors
Application pull: commercial workloads
OS behavior
Database applications
Opportunities for shared structures
Markov-based sharing structure
Address large instruction footprint vs. small, fast I caches

4
Instruction Sharing Analysis

How may instruction sharing occur?
OS: multiple processes, scheduling
DB: concurrent transactions, repeated queries, multiple threads
How can CMPs benefit from instruction sharing?
Snoop/grab instructions from other cores
Shared structures
Let's investigate it.

5
Methodology

Two-step approach
Experiment I
Target: instruction trace analysis
How much sharing occurs?
Experiment II
Target: I cache miss stream analysis
Examine the potential of a shared Markov structure

6
Experiment I

Add instrumentation code to analyze committed instructions
Focus on repeated sequences of 2, 3, 4, and 5 instructions across 16 processors
Histogram-based approach

How do we count?
P1: 3 times, P2: 1 time, P3: 0 times, P4: 2 times. Total: 10 times.
(Figure: occurrences of the sequence A,B in the instruction streams of processors P1 to P4.)
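The histogram-based counting above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not spell out the exact counting rule, so the sketch assumes every overlapping window of n committed instructions counts once, and the names `count_sequences`, `shared_histogram`, and the toy traces are hypothetical.

```python
from collections import Counter

def count_sequences(trace, n):
    """Histogram of every length-n window of consecutive instructions in one trace.

    Assumption: overlapping windows each count once.
    """
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def shared_histogram(traces, n):
    """Sum the per-processor histograms and note which processors saw each sequence."""
    total = Counter()
    seen_by = {}
    for proc, trace in traces.items():
        for seq, cnt in count_sequences(trace, n).items():
            total[seq] += cnt
            seen_by.setdefault(seq, set()).add(proc)
    return total, seen_by

# Toy traces (illustrative only, not data from the study).
traces = {
    "P1": ["A", "B", "C", "A", "B"],
    "P2": ["A", "B", "D"],
    "P3": ["C", "D"],
    "P4": ["A", "B", "A", "B"],
}
total, seen_by = shared_histogram(traces, 2)
print(total[("A", "B")], sorted(seen_by[("A", "B")]))
```

A sequence that appears in the `seen_by` set of more than one processor is a candidate for sharing; a sequence with a high count on a single processor may simply be a tight loop.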
7
Results - Experiment I
Q.) Is there any instruction sharing?
A.) Maybe. Observe the number of times the sequences of 2-5 instructions repeat (13000-17000).
Q.) But why do the numbers for a sequence pattern of 5 instructions not differ much from those for a sequence pattern of 2 instructions?
A.) Spin loops!
For non-warm-up case: 50
For warm-up case: 30
8
Experiment II

Focus on instruction cache misses
Is there sharing involved here too?
Upper bound performance benefit of a shared
Markov table?
Experiment setup
16K-entry fully associative shared Markov table
Each entry holds two consecutive misses from the same processor
Atomic lookup and hit/miss counter update when a processor has two consecutive I misses
On a miss, insert a new entry at the LRU head
On a hit, record the distance from the LRU head and move the hit entry to the LRU head
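The experiment setup above might be simulated along these lines. It is a minimal single-threaded sketch (so each lookup is trivially atomic), not the authors' instrumentation. The class name `SharedMarkovTable`, its method names, and the choice to form pairs from overlapping consecutive misses are all assumptions made for illustration.

```python
from collections import Counter

class SharedMarkovTable:
    """Fully associative table of (miss, next miss) pairs kept in LRU order."""

    def __init__(self, capacity=16 * 1024):
        self.capacity = capacity
        self.entries = []              # index 0 is the LRU head
        self.hits = 0
        self.misses = 0
        self.hit_distance = Counter()  # distance from LRU head -> number of hits
        self.last_miss = {}            # per-processor previous I-miss address

    def record_miss(self, proc, addr):
        """Feed one I-cache miss; look up once the processor has two in a row."""
        prev = self.last_miss.get(proc)
        self.last_miss[proc] = addr
        if prev is None:
            return None                # no pair formed yet for this processor
        return self.lookup((prev, addr))

    def lookup(self, pair):
        """Return True on a table hit, False on a table miss."""
        try:
            dist = self.entries.index(pair)
        except ValueError:
            self.misses += 1
            self.entries.insert(0, pair)     # insert new entry at the LRU head
            if len(self.entries) > self.capacity:
                self.entries.pop()           # evict from the LRU tail
            return False
        self.hits += 1
        self.hit_distance[dist] += 1         # record distance from the LRU head
        self.entries.insert(0, self.entries.pop(dist))  # move hit entry to head
        return True

# Two processors missing on the same pair: the second one hits in the table.
table = SharedMarkovTable()
table.record_miss("P1", "A")
table.record_miss("P1", "C")          # table miss, (A, C) inserted
table.record_miss("P2", "A")
print(table.record_miss("P2", "C"))   # table hit on the pair left by P1
```

A Python list keeps the sketch short at the cost of linear-time lookups; a real trace-driven run over a 16K-entry table would want an ordered map instead.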

9
Design Block Diagram

Small, fast shared Markov table
Prefetch when an I miss occurs

(Diagram: two processors (P), each with an I cache (I), a shared Markov table, and L2.)
10
Table Lookup Hit Ratio
Q1.) Is there a lot of miss sharing?
Q2.) Does a constructive interference pattern exist to help a CMP?
Q3.) Do equal opportunities exist for all the processors?
11
Let's Answer the Questions

A1.) Yes, of course.
A2.) A constructive interference pattern definitely exists, as the figure shows.
A3.) Yes. The hit/miss ratio remains fairly stable across processors in spite of the variance in the number of I cache misses.

12
How Big Should the Table Be?

About 60% of hits are within 4K entries of the LRU head.
A shared Markov table can make fair use of I cache miss sharing.
What about snooping and grabbing instructions from other I caches?
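The table-size question can be approached from the recorded LRU distances: the fraction of hits within a given distance of the LRU head approximates how many hits a smaller table would keep. The sketch below assumes distance 0 means the LRU head, so a table of `size` entries covers distances 0 to `size - 1`. The function names and the histogram values are hypothetical and are not measurements from the study.

```python
def hit_fraction_within(hit_distance, size):
    """Fraction of recorded hits whose LRU distance is below `size`."""
    total = sum(hit_distance.values())
    if total == 0:
        return 0.0
    near = sum(cnt for dist, cnt in hit_distance.items() if dist < size)
    return near / total

def smallest_size_covering(hit_distance, target):
    """Smallest table size that would retain at least `target` of the hits."""
    total = sum(hit_distance.values())
    running = 0
    for dist in sorted(hit_distance):
        running += hit_distance[dist]
        if running >= target * total:
            return dist + 1
    return None

# Made-up distance histogram: LRU distance -> number of hits.
hist = {0: 5, 10: 3, 5000: 2}
print(hit_fraction_within(hist, 4096))
print(smallest_size_covering(hist, 0.6))
```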

13
Real Design Issues

Associativity and size of the table
Choose the right path if multiple paths exist
Separate address directory from data entries for
the table and have multiple address directories
What if a sequential prefetcher exists?

14
Conclusions

Instruction sharing on CMPs exists. Spin loops
occur frequently with current workloads.
Markov-based structure for storing I cache misses
may be helpful on CMPs.

15
Questions?
16
Comparison with Real Markov Prefetching
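Real Markov prefetching (figure):
Table entries with counts (Cnt): A B C (5), A E (2), A D F (3)
Miss to A and prefetch along A, B, C

Shared table in this study (figure):
Table from LRU head to LRU tail: A B, A C, A D, B D
Hit Cnt 2, Miss Cnt 3
P misses to A, C and then looks up in the table
Update hit/miss counters and change/record LRU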
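For contrast with the pair-lookup table, a conventional Markov prefetcher can be sketched as below. This is the generic textbook scheme (a miss address mapped to counted successor misses), not necessarily the exact variant on the slide. `MarkovPrefetcher`, `width`, and `on_miss` are hypothetical names.

```python
from collections import Counter, defaultdict

class MarkovPrefetcher:
    """First-order Markov predictor over a single miss stream."""

    def __init__(self, width=2):
        self.width = width                      # max prefetch candidates per miss
        self.successors = defaultdict(Counter)  # miss addr -> counts of next misses
        self.prev = None

    def on_miss(self, addr):
        """Train on the (previous miss, this miss) transition, then predict."""
        if self.prev is not None:
            self.successors[self.prev][addr] += 1
        self.prev = addr
        nxt = self.successors.get(addr)
        if not nxt:
            return []
        # Prefetch the most frequently observed successors of this miss.
        return [a for a, _ in nxt.most_common(self.width)]

pf = MarkovPrefetcher(width=2)
for miss in ["A", "B", "A", "C", "A", "B"]:
    pf.on_miss(miss)
print(pf.on_miss("A"))   # candidates to prefetch after a miss to A
```

The experiment in this deck only checks whether a pair of consecutive misses is already in the table and updates counters; a prefetcher like the one sketched here would go further and issue fetches for the predicted successors.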

17
Lookup Example I
P looks up A C
Table before lookup (LRU head to LRU tail): A B, A C, A D, B D
Hit Cnt 2, Miss Cnt 3
Table after lookup (LRU head to LRU tail): A C, A B, A D, B D
Hit Cnt 3, Miss Cnt 3
18
Lookup Example II
P looks up C D
Table before lookup (LRU head to LRU tail): A B, A C, A D, B D
Hit Cnt 2, Miss Cnt 3
Table after lookup (entries shown): A C, A B, A D, C D
Hit Cnt 2, Miss Cnt 4