CS 7960-4 Lecture 10 - PowerPoint PPT Presentation

Slides: 20
Provided by: RajeevBala4

1
CS 7960-4 Lecture 10
Improving Direct-Mapped Cache Performance by the
Addition of a Small Fully-Associative Cache and
Prefetch Buffers, N.P. Jouppi, Proceedings of
ISCA-17, 1990
2
Cache Basics
[Figure: cache organization; labels: address, decoder, tag array, data array, Way 1, Way 2, set, comparator, mux]
3
Multiplexing
4
Banking
[Figure: banking; labels: words/ways get distributed, sets get distributed, wordline, bitline]
  • Banking reduces access time per bank and overall
    power
  • Allows multiple accesses without true
    multiporting

5
Virtual Memory
  • A single physical address (A) can map to
    multiple virtual addresses (X, Y)
  • The CPU provides addresses X and Y, and the
    cache must make sure that both map to the same
    cache location
  • Naive solution: perform virtual-to-physical
    translation (TLB) before accessing the cache

6
Page Coloring
  • To identify potential cache locations and
    initiate the RAM look-up, only the index bits
    are needed
  • If the OS ensures that virtual index bits always
    match physical index bits, you can start the RAM
    look-up before completing the TLB look-up
  • When both finish, use the newly obtained physical
    address for the tag comparison (note: you can't
    use the virtual address for tag comparison)
  • Virtually-indexed, physically-tagged
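
The index-bit condition can be made concrete with a small sketch. It is only an illustration: the function names and parameters are hypothetical, and power-of-two cache and page sizes are assumed. It counts how many index bits fall above the page offset, which is the number of bits the OS would have to keep identical between virtual and physical page numbers (the page "color").

```python
def vipt_color_bits(cache_bytes, ways, page_bytes):
    """Number of cache index bits that lie above the page offset.

    Assumes power-of-two sizes. A result of 0 means the index comes
    entirely from the page offset, so translation never changes it.
    A positive result is the number of page-number bits the OS must
    keep equal in the virtual and physical address (the page color).
    """
    way_bytes = cache_bytes // ways                  # span of index + line-offset bits
    index_and_offset_bits = way_bytes.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return max(0, index_and_offset_bits - page_offset_bits)


def page_color(page_number, color_bits):
    """Low-order page-number bits that must match between virtual and physical pages."""
    return page_number & ((1 << color_bits) - 1)


# Hypothetical configurations, chosen only for illustration.
small = vipt_color_bits(32 * 1024, ways=8, page_bytes=4096)   # 4KB per way
large = vipt_color_bits(64 * 1024, ways=2, page_bytes=4096)   # 32KB per way
```

With 4KB pages, the first configuration needs no coloring, while the second needs three matching page-number bits.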

7
Memory Wall
Year   Clock speed   Memory latency (time)   Memory latency (cycles)
1997   0.75 GHz      50+20 ns                53 cycles
2011   10 GHz        16 ns                   160 cycles
Memory latency improves by 10%/year.
Clock speed has traditionally improved by 50%/year, but will improve by only 20%/year in the future.
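
The projection can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 1997 latency entry is read as 70 ns in total, the rates are read as percentages per year (10% for latency, 20% for clock speed), and the helper name `project` is hypothetical.

```python
def project(value, rate_per_year, years):
    """Compound a value at a fixed yearly rate (negative rate = shrinking)."""
    return value * (1.0 + rate_per_year) ** years


years = 2011 - 1997

# Assumption: 1997 latency is 70 ns in total; clock is 0.75 GHz.
clock_ghz_2011 = project(0.75, 0.20, years)     # clock grows 20%/year
latency_ns_2011 = project(70.0, -0.10, years)   # latency shrinks 10%/year

# Latency in cycles = latency in ns * clock in GHz.
cycles_1997 = 70.0 * 0.75
cycles_2011 = latency_ns_2011 * clock_ghz_2011
```

Under these assumptions the 2011 values come out at roughly 9.6 GHz and 16 ns, close to the rounded 10 GHz and 16 ns on the slide, and the latency measured in cycles roughly triples.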
8
Bottlenecks
9
Conflict Misses
  • Direct-mapped caches have lower access times,
    but suffer from conflict misses
  • Most conflict misses are localized to a few sets
    -- an associativity of 1.2 is desirable?

10
Victim Caches
  • Every eviction from L1 gets put in the victim
    cache (VC and L1 are exclusive)
  • Victim cache associative look-up can happen in
    parallel with L1 look-up; a VC hit results in a
    swap

[Figure: L1 and victim cache]
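
The eviction and swap behavior can be sketched as a small simulator. This is an illustrative model, not the paper's hardware: the class name is hypothetical, addresses are cache-line numbers, and oldest-first replacement inside the victim cache is an assumption.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped L1 plus a small fully-associative victim cache (VC)."""

    def __init__(self, num_sets, victim_entries):
        self.num_sets = num_sets
        self.l1 = [None] * num_sets        # one line address per set
        self.victim = OrderedDict()        # line address -> None, oldest first
        self.victim_entries = victim_entries

    def access(self, line_addr):
        """Return 'l1', 'vc', or 'miss' for an access to a line address."""
        idx = line_addr % self.num_sets
        if self.l1[idx] == line_addr:
            return "l1"
        evicted = self.l1[idx]
        if line_addr in self.victim:
            del self.victim[line_addr]     # swap: the line moves from VC to L1
            result = "vc"
        else:
            result = "miss"
        self.l1[idx] = line_addr
        if evicted is not None:
            self._insert_victim(evicted)   # every L1 eviction goes to the VC
        return result

    def _insert_victim(self, line_addr):
        self.victim[line_addr] = None
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # assumed policy: drop the oldest entry


# Lines 0 and 4 conflict in a 4-set direct-mapped cache.
cache = DirectMappedWithVictim(num_sets=4, victim_entries=2)
results = [cache.access(a) for a in (0, 4, 0, 4)]
```

In this trace the two conflicting lines ping-pong between L1 and the victim cache, so after the two initial misses every access is serviced by a swap rather than by the next level.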
11
Results
  • The cache and line size influence the percentage
    of misses attributable to conflicts
  • A 15-entry victim cache eliminates half the
    conflict misses; the reduction in total cache
    misses is less than 20%

12
Prefetch Techniques
  • Prefetch on miss: fetches multiple lines for
    every cache miss
  • Tagged prefetch: waits till a prefetched line is
    touched before bringing in more lines
  • Prefetch deals with capacity and compulsory
    misses, but causes cache pollution

13
Stream Buffers
  • On a cache miss, fill the stream buffer with
    contiguous cache lines
  • When you read the top of the queue, bring in the
    next line
  • If the top-of-q does not service a miss, the
    stream buffer flushes and starts from scratch

[Figure: L1, sequential lines, and stream buffer]
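
The queue behavior described above can be sketched as follows. This is a minimal model under stated assumptions: the class and method names are hypothetical, addresses are cache-line numbers, fetch latency is ignored, and the buffer starts at the line after the missing one.

```python
from collections import deque


class StreamBuffer:
    """FIFO of sequential line addresses; only the head is compared on a miss."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()
        self.next_line = None

    def _refill(self, start):
        # Flush and start again with `depth` contiguous lines from `start`.
        self.queue = deque(range(start, start + self.depth))
        self.next_line = start + self.depth

    def on_l1_miss(self, line_addr):
        """Return True if the top of the queue services this L1 miss."""
        if self.queue and self.queue[0] == line_addr:
            self.queue.popleft()
            self.queue.append(self.next_line)   # bring in the next sequential line
            self.next_line += 1
            return True
        self._refill(line_addr + 1)             # top-of-q mismatch: start from scratch
        return False


sb = StreamBuffer(depth=4)
hits = [sb.on_l1_miss(a) for a in (10, 11, 12, 13, 20, 21)]
```

The sequential run hits after the first miss, and the jump to line 20 forces a flush. Skipping even one line (say, a miss on 10 followed by 12) also flushes in this model, because only the head of the queue is checked.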
14
Results
  • Eight entries are enough to eliminate most
    capacity and compulsory misses
  • 72% of I-cache misses and 25% of D-cache misses
    are eliminated
  • Multiple stream buffers help eliminate 43% of
    D-cache misses
  • Large cache lines minimize stream buffer impact
    (a stream buffer removes 10% of D-cache misses
    for a 128B cache line size)

15
Potential Improvements
  • Relax the top-of-q constraint for the stream
    buffer
  • Maintain a stride value to detect non-sequential
    accesses
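
One way to read the stride suggestion is sketched below. The slide does not specify a mechanism, so everything here is an assumption: the class name is hypothetical, and the policy of predicting only after the same non-zero stride is seen twice in a row is chosen for illustration.

```python
class StridePredictor:
    """Tracks the difference between successive miss addresses."""

    def __init__(self):
        self.last_addr = None
        self.stride = None
        self.confirmed = False

    def observe(self, addr):
        """Record a miss address; return the predicted next address or None."""
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            # Assumed policy: trust a stride only when it repeats and is non-zero.
            self.confirmed = (new_stride == self.stride and new_stride != 0)
            self.stride = new_stride
        self.last_addr = addr
        return addr + self.stride if self.confirmed else None


p = StridePredictor()
predictions = [p.observe(a) for a in (100, 108, 116, 124, 130)]
```

Here the predictor starts issuing addresses once the stride of 8 repeats and stops again when the pattern breaks at 130.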

16
Bottlenecks Again
For 4KB caches, 16B lines
17
Harmonic and Arithmetic Means
  • HM of IPC $= N / (1/\mathrm{IPC}_a + 1/\mathrm{IPC}_b + 1/\mathrm{IPC}_c)$
  • $= N / (\mathrm{CPI}_a + \mathrm{CPI}_b + \mathrm{CPI}_c)$
  • $= 1 / (\text{AM of CPI})$
  • This weights each benchmark as if they all
    execute one instruction
  • If you want to assume each benchmark executes
    for the same time, HM of CPI or AM of IPC is
    appropriate
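
The two weightings can be checked numerically. The sketch below uses made-up IPC values and hypothetical function names: it computes aggregate IPC as total instructions over total cycles, once with every benchmark running the same number of instructions and once with every benchmark running for the same number of cycles, and compares the results with the harmonic and arithmetic means.

```python
from statistics import harmonic_mean, mean


def aggregate_ipc_equal_instructions(ipcs, instructions=1_000_000):
    """Total instructions / total cycles when each benchmark runs the same instruction count."""
    cycles = [instructions / ipc for ipc in ipcs]
    return instructions * len(ipcs) / sum(cycles)


def aggregate_ipc_equal_time(ipcs, cycles=1_000_000):
    """Total instructions / total cycles when each benchmark runs for the same cycle count."""
    instructions = [ipc * cycles for ipc in ipcs]
    return sum(instructions) / (cycles * len(ipcs))


ipcs = [0.5, 1.0, 2.0]            # illustrative values, not measured data
cpis = [1.0 / ipc for ipc in ipcs]

same_instr = aggregate_ipc_equal_instructions(ipcs)   # matches HM of IPC
same_time = aggregate_ipc_equal_time(ipcs)            # matches AM of IPC
```

For these values the equal-instruction aggregate equals the harmonic mean of IPC (and 1 / AM of CPI), while the equal-time aggregate equals the arithmetic mean of IPC (and 1 / HM of CPI).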

18
Next Week's Paper
  • Memory Dependence Prediction Using Store
    Sets, Chrysos and Emer, ISCA-25, 1998
