Title: Design Exploration of an Instruction-Based Shared Markov Table on CMPs
1Design Exploration of an Instruction-Based Shared
Markov Table on CMPs
- Karthik Ramachandran, Lixin Su
2Outline
- Motivation
- Multiple cores on single chip
- Commercial workloads
- Our study
- Start from Instruction sharing pattern analysis
- Our experiments
- Move onto Instruction cache miss pattern analysis
- Our experiments
- Conclusions
3Motivation
- Technology push: CMPs
- Lower access latency to other processors
- Application pull: commercial workloads
- OS behavior
- Database applications
- Opportunities for shared structures
- Markov-based sharing structure
- Address large instruction footprints vs. small, fast I caches
4Instruction Sharing Analysis
- How might instruction sharing occur?
- OS: multiple processes, scheduling
- DB: concurrent transactions, repeated queries, multiple threads
- How can CMPs benefit from instruction sharing?
- Snoop/grab instructions from other cores
- Shared structures
- Let's investigate.
5Methodology
- Two-step approach
- Experiment I
- Targets: instruction trace analysis
- How much sharing occurs?
- Experiment II
- Targets: I cache miss stream analysis
- Examine the potential of a shared Markov structure
6Experiment I
- Add instrumentation code to analyze committed instructions
- Focus on repeated sequences of 2, 3, 4, and 5 instructions across 16 processors
- Histogram-based approach
How do we count? P1: 3 times; P2: 1 time; P3: 0 times; P4: 2 times; total: 10 times
[Figure: occurrences of the sequence A,B in the instruction streams of P1-P4]
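The histogram-based counting can be sketched as below. This is a minimal illustration, not the instrumentation used in the study: the function names, the trace format (one list of instruction identifiers per processor), and the rule that a sequence counts as shared once it appears on at least two processors are all assumptions.

```python
from collections import Counter


def sequence_histogram(trace, n):
    """Count every run of n consecutive instructions in one trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def shared_sequences(traces, n):
    """Return {sequence: total count} for sequences seen on 2+ processors.

    `traces` maps a processor name to its committed-instruction list
    (hypothetical format; the slides do not specify one).
    """
    total = Counter()     # occurrences summed over all processors
    seen_on = Counter()   # number of processors on which a sequence appears
    for trace in traces.values():
        hist = sequence_histogram(trace, n)
        total.update(hist)
        seen_on.update(hist.keys())
    return {seq: total[seq] for seq in total if seen_on[seq] >= 2}


# Synthetic traces for illustration only.
traces = {
    "P1": ["A", "B", "C", "A", "B"],
    "P2": ["A", "B", "D"],
    "P3": ["E", "F"],
}
print(shared_sequences(traces, 2))  # {('A', 'B'): 3}
```

Running the same function with n = 2 through 5 gives one histogram per sequence length, which is the comparison the experiment draws on.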
7Results - Experiment I
Q.) Is there any instruction sharing?
A.) Maybe. Observe the number of times the sequences of 2-5 instructions repeat (13000-17000).
Q.) But why do the numbers for a sequence pattern of 5 instructions not differ much from those for a sequence pattern of 2 instructions?
A.) Spin loops! For the non-warm-up case: 50. For the warm-up case: 30.
8Experiment II
- Focus on instruction cache misses
- Is there sharing involved here too?
- What is the upper-bound performance benefit of a shared Markov table?
- Experiment setup
- 16K-entry fully associative shared Markov table
- Each entry holds two consecutive misses from the same processor
- Atomic lookup and hit/miss counter update when a processor has two consecutive I misses
- On a miss, insert a new entry at the LRU head
- On a hit, record the distance from the LRU head and move the hit entry to the LRU head
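The setup above can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the authors' simulator: the class and function names are hypothetical, the miss stream is assumed to be a list of (processor, address) tuples, and evicting the LRU tail when the table is full is an assumption the slides do not spell out. A Python list keeps the sketch short; each lookup is linear in the table size.

```python
class SharedMarkovTable:
    """Fully associative table of consecutive-miss pairs kept in LRU order."""

    def __init__(self, capacity=16 * 1024):
        self.capacity = capacity
        self.entries = []        # index 0 is the LRU head (most recently used)
        self.hits = 0
        self.misses = 0
        self.hit_distances = []  # distance from the LRU head at each hit

    def lookup(self, prev_miss, curr_miss):
        """Look up the pair formed by two consecutive misses of one processor."""
        pair = (prev_miss, curr_miss)
        if pair in self.entries:
            distance = self.entries.index(pair)
            self.entries.pop(distance)
            self.hits += 1
            self.hit_distances.append(distance)
            hit = True
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.pop()  # assumption: evict the LRU tail when full
            hit = False
        self.entries.insert(0, pair)  # hit or miss, the pair ends at the head
        return hit


def replay(miss_stream, table):
    """Feed (processor, address) I cache misses into the shared table."""
    last_miss = {}
    for proc, addr in miss_stream:
        if proc in last_miss:
            table.lookup(last_miss[proc], addr)
        last_miss[proc] = addr


# Synthetic stream: P1 repeats the A -> B miss pair that P0 just inserted.
table = SharedMarkovTable(capacity=4)
replay([("P0", "A"), ("P1", "A"), ("P0", "B"), ("P1", "B"), ("P0", "C")], table)
print(table.hits, table.misses, table.hit_distances)  # 1 2 [0]
```

Because pairs are formed per processor but stored in one table, a hit by one processor on a pair inserted by another is exactly the cross-core sharing the experiment measures.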
9Design Block Diagram
- Small, fast shared Markov table
- Prefetch when an I miss occurs
[Figure: block diagram showing processors (P), their I caches (I), the shared Markov table, and the L2]
10Table Lookup Hit Ratio
Q1.) Is there a lot of miss sharing?
Q2.) Does a constructive interference pattern exist to help a CMP?
Q3.) Do equal opportunities exist for all the processors?
11Let's Answer the Questions
- A1.) Yes, of course.
- A2.) Definitely: a constructive interference pattern exists, as the figure shows.
- A3.) Yes. The hit/miss ratio remains fairly stable across processors despite the variance in the number of I cache misses.
12How Big Should the Table Be ?
- About 60% of hits are within 4K entries of the LRU head.
- A shared Markov table can exploit I cache miss sharing fairly well.
- What about snooping and grabbing instructions from other I caches?
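Given the LRU distance recorded on each hit, the table-size question reduces to a cumulative fraction: what share of hits would still be captured by a table of a given size? A minimal sketch follows. The function name is hypothetical, the distances are synthetic, and treating "within N entries" as a distance strictly below N is an assumption.

```python
def fraction_of_hits_within(hit_distances, limit):
    """Fraction of recorded hits whose LRU distance is below `limit`."""
    if not hit_distances:
        return 0.0
    return sum(1 for d in hit_distances if d < limit) / len(hit_distances)


# Synthetic distances for illustration only, not measured data.
distances = [0, 3, 7, 12]
print(fraction_of_hits_within(distances, 8))  # 0.75
```

Evaluating this for several limits (for example 1K, 4K, and 16K entries) gives the curve from which a table size can be chosen.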
13Real Design Issues
- Associativity and size of the table
- Choose the right path if multiple paths exist
- Separate the address directory from the data entries of the table, and have multiple address directories
- What if a sequential prefetcher exists?
14Conclusions
- Instruction sharing on CMPs exists. Spin loops occur frequently with current workloads.
- A Markov-based structure for storing I cache misses may be helpful on CMPs.
15Questions?
16Comparison with Real Markov Prefetching
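[Figure, real Markov prefetching: table entries with counts: A B C (Cnt 5), A E (Cnt 2), A D F (Cnt 3)]
- Miss to A, then prefetch along A, B, C
[Figure, our experiment: processor P misses on A, C; LRU-ordered table of miss pairs A B, A C, A D, B D between LRU head and LRU tail; Hit Cnt 2, Miss Cnt 3]
- Misses to A, C, then look up in the table
- Update hit/miss counters and change/record LRU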
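For contrast with the pair-lookup experiment, the path choice of a conventional Markov prefetcher can be sketched as picking the most frequently observed successor path of the missing address. This is a simplified illustration: the function name and table layout are hypothetical, and pairing the counts 5, 2, and 3 with the paths A B C, A E, and A D F is an assumed reading of the slide's figure.

```python
def choose_prefetch_path(table, miss_addr):
    """Return the successor path with the highest count, or [] if none."""
    candidates = table.get(miss_addr, [])
    if not candidates:
        return []
    path, _count = max(candidates, key=lambda entry: entry[1])
    return list(path)


# Successor paths of A with their counts (assumed reading of the figure).
markov_table = {"A": [(("B", "C"), 5), (("E",), 2), (("D", "F"), 3)]}
print(choose_prefetch_path(markov_table, "A"))  # ['B', 'C']
```

The experiment in these slides skips this selection step: it only checks whether a pair of consecutive misses is already in the shared table, which is why its hit ratio is an upper bound rather than a prefetch accuracy.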
17Lookup Example I
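[Figure: processor P misses on A, C and looks up the pair A C in the table; the lookup hits]
- Before the lookup (LRU head first): A B, A C, A D, B D. Hit Cnt 2, Miss Cnt 3
- After the lookup (LRU head first): A C, A B, A D, B D. Hit Cnt 3, Miss Cnt 3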
18Lookup Example II
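[Figure: processor P misses on C, D and looks up the pair C D in the table; the lookup misses]
- Before the lookup (LRU head first): A B, A C, A D, B D. Hit Cnt 2, Miss Cnt 3
- After the lookup: C D is inserted at the LRU head, alongside A B, A C, and A D. Hit Cnt 2, Miss Cnt 4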