Design Exploration of an Instruction-Based Shared Markov Table on CMPs


Title: Design Exploration of an Instruction-Based Shared Markov Table on CMPs


1
Design Exploration of an Instruction-Based Shared
Markov Table on CMPs
  • Karthik Ramachandran Lixin Su

2
Outline
  • Motivation
  • Multiple cores on single chip
  • Commercial workloads
  • Our study
  • Start from Instruction sharing pattern analysis
  • Our experiments
  • Move onto Instruction cache miss pattern analysis
  • Our experiments
  • Conclusions

3
Motivation
  • Technology push: CMPs
  • Lower access latency to other processors
  • Application pull: commercial workloads
  • OS behavior
  • Database applications
  • Opportunities for shared structures
  • Markov-based sharing structure
  • Address large instruction footprint vs. small,
    fast I-caches

4
Instruction Sharing Analysis
  • How may instruction sharing occur?
  • OS: multiple processes, scheduling
  • DB: concurrent transactions, repeated queries,
    multiple threads
  • How can CMPs benefit from instruction sharing?
  • Snoop/grab instructions from other cores
  • Shared structures
  • Let's investigate it.

5
Methodology
  • Two-step approach
  • Experiment I
  • Targets: instruction trace analysis
  • How much sharing occurs?
  • Experiment II
  • Targets: I-cache miss stream analysis
  • Examine the potential of a shared Markov
    structure

6
Experiment I
  • Add instrumentation code to analyze committed
    instructions
  • Focus on repeated sequences of 2, 3, 4, and 5
    instructions across 16 processors (16P)
  • Histogram-based approach
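
The histogram-based counting can be sketched as follows. The slides do not spell out the exact counting rule, so this is only an illustration under stated assumptions: windows of n consecutive committed instructions overlap, and a sequence counts as shared when it appears in the trace of at least two processors. The function names and the trace format (a dict of per-processor address lists) are hypothetical.

```python
from collections import Counter


def count_sequences(trace, n):
    """Histogram of every length-n window of consecutive instructions."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def shared_sequence_histogram(traces, n):
    """Total occurrences of each length-n sequence seen on 2+ processors.

    `traces` maps a processor name to its committed-instruction list.
    The "2+ processors" threshold is an assumption, not from the slides.
    """
    total = Counter()    # occurrences summed over all processors
    seen_on = Counter()  # number of processors on which each sequence appears
    for trace in traces.values():
        hist = count_sequences(trace, n)
        total.update(hist)
        seen_on.update(hist.keys())
    return {seq: cnt for seq, cnt in total.items() if seen_on[seq] >= 2}
```

Running the function once per n in 2, 3, 4, 5 gives one histogram per sequence length, which is the comparison the results slide draws.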

How do we count?
(Figure: occurrences of the sequence A,B in the instruction traces of P1 to P4. P1: 3 times; P2: 1 time; P3: 0 times; P4: 2 times; Total: 10 times.)
7
Results - Experiment I
Q.) Is there any instruction sharing?
A.) Maybe. Observe the number of times the sequences of length 2-5 repeat (13000 to 17000).
Q.) But why do the numbers for a sequence pattern of 5 instructions not differ much from those for a sequence pattern of 2 instructions?
A.) Spin loops!
For non-warm-up case: 50. For warm-up case: 30.
8
Experiment II
  • Focus on instruction cache misses
  • Is there sharing involved here too?
  • Upper bound performance benefit of a shared
    Markov table?
  • Experiment setup
  • 16K-entry fully associative shared Markov table
  • Each entry holds two consecutive misses from the
    same processor
  • Atomic lookup and hit/miss counter update when a
    processor has two consecutive I-cache misses
  • On a miss, insert a new entry at the LRU head
  • On a hit, record the distance from the LRU head
    and move the hit entry to the LRU head
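
The experiment setup above can be sketched as a small simulator. This is a minimal illustration, not the authors' tool: the class and function names are hypothetical, the distance is measured as a 0-based position from the LRU head, and "two consecutive misses" is assumed to mean a sliding pair over each processor's miss stream.

```python
from collections import OrderedDict


class SharedMarkovTable:
    """Fully associative table of (miss, next miss) pairs in LRU order."""

    def __init__(self, capacity=16 * 1024):
        self.capacity = capacity
        self.entries = OrderedDict()  # first key is the LRU head
        self.hits = 0
        self.misses = 0
        self.hit_distances = []       # position from LRU head at each hit

    def lookup(self, prev_miss, curr_miss):
        key = (prev_miss, curr_miss)
        if key in self.entries:
            # Hit: record the distance, then move the entry to the head.
            self.hit_distances.append(list(self.entries).index(key))
            self.hits += 1
            self.entries.move_to_end(key, last=False)
            return True
        # Miss: insert at the head, evicting the LRU tail if the table is full.
        self.misses += 1
        self.entries[key] = None
        self.entries.move_to_end(key, last=False)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=True)
        return False


def replay(table, miss_stream):
    """Feed (cpu, address) I-cache misses; pairs are formed per processor."""
    last_miss = {}
    for cpu, addr in miss_stream:
        if cpu in last_miss:
            table.lookup(last_miss[cpu], addr)
        last_miss[cpu] = addr
```

Because the table is shared, a pair inserted by one processor can produce a hit for another: replaying `[(0, "A"), (1, "A"), (0, "B"), (1, "B")]` records one miss (processor 0 inserts A B) and one hit (processor 1 finds it).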

9
Design Block Diagram
  • Small fast shared Markov table
  • Prefetch when I miss occurs
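
A prefetch-on-miss lookup could look like the sketch below. The slides do not specify how a prefetch target is chosen, so the policy here (prefer the most recently used pair that starts with the missing address) is an assumption, and `prefetch_candidates` and `max_paths` are hypothetical names.

```python
def prefetch_candidates(entries, miss_addr, max_paths=1):
    """Return up to max_paths successor addresses to prefetch.

    `entries` is an iterable of (miss, next miss) pairs ordered from the
    LRU head, so earlier pairs are the more recently used ones.
    """
    successors = [nxt for prev, nxt in entries if prev == miss_addr]
    return successors[:max_paths]
```

With the table holding A C, A B, A D, B D from the LRU head, a miss to A yields C as the single candidate, or C and B when `max_paths=2`.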

(Figure: block diagram showing two processors (P), each with an I-cache (I), a shared Markov table, and the L2.)
10
Table Lookup Hit Ratio
Q1.) Is there a lot of miss sharing?
Q2.) Does a constructive interference pattern exist to help a CMP?
Q3.) Do equal opportunities exist for all the processors?
11
Let's Answer the Questions
  • A1.) Yes, of course.
  • A2.) Definitely. A constructive interference
    pattern exists, as you can see from the figure.
  • A3.) Yes. The hit/miss ratio remains pretty stable
    across processors in spite of variance in the
    number of I-cache misses.

12
How Big Should the Table Be ?
  • About 60% of hits are within 4K entries of the
    LRU head.
  • A shared Markov table can fairly utilize I cache
    miss sharing.
  • What about snooping and grabbing instructions
    from other I caches?
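
The sizing question reduces to a cumulative distribution over the recorded hit distances. A minimal sketch, assuming distances are stored as 0-based positions from the LRU head; `fraction_within` is a hypothetical helper and the sample data is made up for illustration.

```python
def fraction_within(hit_distances, limit):
    """Fraction of hits found fewer than `limit` entries from the LRU head."""
    if not hit_distances:
        return 0.0
    return sum(1 for d in hit_distances if d < limit) / len(hit_distances)


# Made-up distances, only to show the call; not measured data.
sample = [0, 1, 5, 9, 3]
print(fraction_within(sample, 4))
```

Evaluating this at several limits (for example 1K, 4K, 16K) shows how many hits a smaller table would still capture under the same LRU ordering.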

13
Real Design Issues
  • Associativity and size of the table
  • Choose the right path if multiple paths exist
  • Separate address directory from data entries for
    the table and have multiple address directories
  • What if a sequential prefetcher exists?

14
Conclusions
  • Instruction sharing on CMPs exists. Spin loops
    occur frequently with current workloads.
  • Markov-based structure for storing I cache misses
    may be helpful on CMPs.

15
Questions?
16
Comparison with Real Markov Prefetching
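(Figure: two tables. A real Markov prefetching table holds the paths A B C, A E, and A D F, with a Cnt column showing 5, 2, and 3. The table in this study holds the pairs A B, A C, A D, and B D between LRU head and LRU tail, with Hit Cnt 2 and Miss Cnt 3.)
  • P misses to A and prefetches along A, B, C
  • P misses to A and C and then looks up in the
    table
  • Update hit/miss counters and change/record LRU
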
17
Lookup Example I
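(Figure: P misses to A and C and looks up A C. Before the lookup, the table from LRU head to LRU tail holds A B, A C, A D, B D, with Hit Cnt 2 and Miss Cnt 3. After the lookup, it holds A C, A B, A D, B D, with Hit Cnt 3 and Miss Cnt 3.)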
18
Lookup Example II
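(Figure: P misses to C and D and looks up C D. Before the lookup, the table from LRU head to LRU tail holds A B, A C, A D, B D, with Hit Cnt 2 and Miss Cnt 3. After the lookup, the table is shown with A C, A B, A D, C D, with Hit Cnt 2 and Miss Cnt 4.)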
19