Title: Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
1 - Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
- Norman P. Jouppi
- Presenter: Shrinivas Narayani
2 - Contents
- Cache Basics
- Types of Cache Misses
- Cost of Cache Misses
- How to Remove Cache Misses
- Larger Block Size
- Adding Associativity (Reducing Conflict Misses)
- Miss Cache
- Victim Cache: an improvement over the miss cache
- Removing Capacity and Compulsory Misses
- Prefetch Techniques
- Stream Buffers
- Conclusion
3 - Mapping
- (Block Address) modulo (Number of blocks in the cache)
- The cache is indexed using the lower-order bits of the address.
- e.g. memory addresses 00001 and 11101 map to locations 001 and 101 in the cache.
- Data is identified using the tag (the higher-order bits of the address).
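The mapping rule above can be sketched directly. This is a minimal illustration, assuming an 8-line cache (3 index bits, locations 000 through 111) to match the example addresses:

```python
# Direct-mapped lookup: lower-order bits select the cache line (index),
# higher-order bits form the tag stored for comparison on each access.
NUM_LINES = 8  # assumed cache size: 3 index bits, locations 000..111

def map_address(block_address: int) -> tuple[int, int]:
    index = block_address % NUM_LINES   # (Block Address) modulo (cache lines)
    tag = block_address // NUM_LINES    # higher-order bits identify the block
    return index, tag

print(map_address(0b00001))  # (1, 0): maps to location 001
print(map_address(0b11101))  # (5, 3): maps to location 101
```

Both example addresses from the slide land where stated: 00001 at location 001 and 11101 at location 101, with different tags telling them apart from other blocks that share a line.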
4 - Direct-Mapped Cache
(Diagram: cache locations 000-111, with memory addresses 00001, 00101, 01001, 01101, and 10001 mapping into them by their lower three bits.)
5 - Cache Terminology
- Cache Hit
- Cache Miss
- Miss Penalty: the time to replace a block in the upper level with the corresponding block from the lower level.
6 - In a direct-mapped cache, there is only one place the newly requested item can go, and hence only one choice of what to replace.
7 - Types of Misses
- Compulsory: the first access to a block cannot be in the cache, so the block must be brought into the cache. These are also called cold-start misses or first-reference misses. (Misses even in an infinite cache.)
- Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses even in a fully-associative cache of the same size.)
- Conflict: if the block-placement strategy is set-associative or direct-mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. These are also called collision misses or interference misses. (Misses in an N-way associative cache.)
- Coherence: misses that result from invalidations to preserve multiprocessor cache consistency.
8 - Conflict misses account for between 20% and 40% of all direct-mapped cache misses.
9 - Cost of Cache Misses
- Cycle time has been decreasing much faster than memory access time.
- The average number of machine cycles per instruction has also been decreasing dramatically. Together, these two effects multiply the relative cost of a cache miss.
- E.g. a cache miss on the VAX 11/780 cost only 60% of the average instruction execution time, so even if every instruction missed the cache, machine performance would degrade by only 60%.
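The VAX figure can be reproduced with a one-line calculation. The specific cycle counts below (6-cycle miss penalty, 10 cycles per instruction) are illustrative assumptions chosen to yield the 60% figure, not numbers taken from the slide:

```python
# Cost of one miss, expressed in average instruction-execution times.
# Example numbers are assumptions for illustration only.
def miss_cost_in_instructions(miss_penalty_cycles: float,
                              cycles_per_instruction: float) -> float:
    return miss_penalty_cycles / cycles_per_instruction

print(miss_cost_in_instructions(6, 10))   # 0.6: a miss costs 60% of an instruction
# As cycle time and CPI shrink while memory latency stays put, the same
# physical miss latency balloons to many instruction times:
print(miss_cost_in_instructions(100, 1))  # 100.0 instructions per miss
```

This is the trend the slide describes: the ratio grows as the denominator shrinks, so misses that were once cheap relative to computation dominate execution time on faster machines.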
10 - How to Reduce Cache Misses
- Increase Block Size
- Increase Associativity
- Use a Victim Cache
- Use a Pseudo-Associative Cache
- Hardware Prefetching
- Compiler-Controlled Prefetching
- Compiler Optimizations
11 - Increasing Block Size
- One way to reduce the miss rate is to increase the block size.
- Reduces compulsory misses. Why? It takes advantage of spatial locality.
- However, larger blocks have disadvantages:
- May increase the miss penalty (need to fetch more data)
- May increase hit time (need to read more data from the cache, and a larger mux)
- May increase conflict and capacity misses
12 - Adding Associativity: the Miss Cache
(Diagram: a direct-mapped cache between the processor and a small fully-associative miss cache; each miss-cache entry holds a tag-and-comparator and one cache line of data, ordered from MRU entry to LRU entry and refilled from the next lower cache.)
- When a miss occurs, data is returned to both the DM cache and the miss cache.
- On each access, the upper (DM) cache and the miss cache are probed in parallel.
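The fill policy above can be sketched as a small simulation. This is a minimal sketch, not the paper's hardware: the cache sizes, single-word lines, and string return values are assumptions for illustration:

```python
from collections import OrderedDict

# Sketch of a miss cache in front of a direct-mapped (DM) cache.
# Sizes and the one-word "line" granularity are illustrative assumptions.
class MissCache:
    def __init__(self, dm_lines=8, miss_entries=2):
        self.dm = [None] * dm_lines          # DM cache: one tag per line
        self.dm_lines = dm_lines
        self.miss = OrderedDict()            # fully associative, kept in LRU order
        self.miss_entries = miss_entries

    def access(self, block_addr):
        index = block_addr % self.dm_lines
        tag = block_addr // self.dm_lines
        if self.dm[index] == tag:
            return "hit"
        if block_addr in self.miss:          # probed in parallel with the DM cache
            self.miss.move_to_end(block_addr)
            self.dm[index] = tag             # short one-cycle on-chip refill
            return "miss-cache hit"
        # True miss: data returns to BOTH the DM cache and the miss cache.
        self.dm[index] = tag
        self.miss[block_addr] = True
        if len(self.miss) > self.miss_entries:
            self.miss.popitem(last=False)    # evict the LRU miss-cache entry
        return "miss"

mc = MissCache()
print(mc.access(0))   # miss: goes off-chip
print(mc.access(8))   # miss: conflicts with 0 (same DM line)
print(mc.access(0))   # miss-cache hit: conflict resolved on-chip
```

With two conflicting addresses (0 and 8 both map to DM line 0), the re-reference to 0 hits in the miss cache instead of paying the off-chip penalty, which is exactly the conflict-miss case the next slide credits it with removing.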
13 - Performance of the Miss Cache
- Replaces a long off-chip miss penalty with a short one-cycle on-chip miss.
- Removes more data conflict misses than instruction conflict misses.
14 - Disadvantage of the Miss Cache
- Waste of storage space in the miss cache due to duplication of data.
15 - Victim Cache
- An improvement over the miss cache.
- Loads the victim (evicted) line instead of the requested line, so no data is duplicated.
- On a miss that hits in the victim cache, the contents of the DM cache line and the victim cache line are swapped.
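The swap behavior can be sketched the same way as the miss cache. Again a minimal sketch under assumed sizes, not the paper's hardware; note the victim cache is filled with the *evicted* line, never the requested one:

```python
from collections import OrderedDict

# Sketch of a victim cache: on a DM eviction the VICTIM line is saved, and
# a hit in the victim cache swaps that line with the conflicting DM line.
# Sizes are illustrative assumptions.
class VictimCache:
    def __init__(self, dm_lines=8, victim_entries=2):
        self.dm = [None] * dm_lines
        self.dm_lines = dm_lines
        self.victims = OrderedDict()         # fully associative, LRU order
        self.victim_entries = victim_entries

    def access(self, block_addr):
        index = block_addr % self.dm_lines
        tag = block_addr // self.dm_lines
        if self.dm[index] == tag:
            return "hit"
        victim_tag = self.dm[index]
        if block_addr in self.victims:       # swap DM line and victim line
            del self.victims[block_addr]
            if victim_tag is not None:
                self._save(victim_tag * self.dm_lines + index)
            self.dm[index] = tag
            return "victim hit"
        if victim_tag is not None:           # save the evicted line, not the new one
            self._save(victim_tag * self.dm_lines + index)
        self.dm[index] = tag
        return "miss"

    def _save(self, addr):
        self.victims[addr] = True
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)

vc = VictimCache()
print(vc.access(0), vc.access(8))  # two conflicting misses
print(vc.access(0), vc.access(8))  # ping-pong now stays on-chip (victim hits)
```

Because each entry holds a line the DM cache no longer has, none of the victim-cache storage duplicates DM contents, removing the waste noted on the previous slide.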
16 - The Effect of DM Cache Size on Victim Cache Performance
- As DM cache size increases, the likelihood that a conflict miss can be removed by the victim cache decreases.
17 - Reducing Capacity and Compulsory Misses
- Use a prefetch technique:
1. Prefetch always
2. Prefetch on miss
3. Tagged prefetch
18 - Prefetch always: prefetches after every reference.
- Prefetch on miss: always fetches the next line after a miss.
- Tagged prefetch: each block has a tag bit associated with it.
- When a block is prefetched, its tag bit is set to zero; it is set to one when the block is used.
- When a block's tag bit makes this zero-to-one transition, the next block is prefetched.
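The tagged-prefetch rule can be sketched in a few lines. A minimal sketch: the dict-based cache with no capacity limit and single-line prefetch distance are simplifying assumptions:

```python
# Sketch of tagged prefetch: each cached line carries a tag bit, 0 when the
# line was prefetched but not yet used, 1 once used. The 0 -> 1 transition
# triggers a prefetch of the next sequential line.
def tagged_prefetch_access(cache, line_addr):
    """cache maps line address -> tag bit. Returns addresses prefetched."""
    prefetched = []
    if line_addr not in cache:
        cache[line_addr] = 1                 # demand fetch: used immediately
        cache[line_addr + 1] = 0             # prefetch next line, tag bit = 0
        prefetched.append(line_addr + 1)
    elif cache[line_addr] == 0:
        cache[line_addr] = 1                 # first use: tag bit goes 0 -> 1 ...
        cache[line_addr + 1] = 0             # ... which triggers the next prefetch
        prefetched.append(line_addr + 1)
    return prefetched

lines = {}
print(tagged_prefetch_access(lines, 0))  # [1]: demand miss prefetches line 1
print(tagged_prefetch_access(lines, 1))  # [2]: first use of line 1 prefetches 2
print(tagged_prefetch_access(lines, 1))  # []: re-use triggers nothing
```

A sequential walk thus stays one line ahead of the processor, while repeated hits to an already-used line (tag bit 1) generate no extra memory traffic, unlike prefetch-always.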
19 - Stream Buffers
- Start the prefetch before a tag transition occurs.
20 - A stream buffer consists of a series of entries, each consisting of a tag, an available bit, and a data line.
- On a miss, it fetches successive lines starting at the miss target.
- Lines after the requested line are placed in the buffer, which avoids polluting the cache with data that is not needed.
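The behavior above can be sketched as a FIFO. A minimal sketch under assumptions: a 4-entry buffer, a set standing in for the cache, and the common simplification that only the buffer head is checked, with the buffer flushed and restarted on any other miss:

```python
from collections import deque

# Sketch of a single stream buffer: on a cache miss it prefetches the lines
# AFTER the miss target into a FIFO; a hit at the head moves that line into
# the cache and prefetches one more line down the stream.
class StreamBuffer:
    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()                  # prefetched line addresses, in order

    def access(self, cache, line_addr):
        if line_addr in cache:
            return "hit"
        if self.fifo and self.fifo[0] == line_addr:
            cache.add(self.fifo.popleft())   # head of buffer -> cache
            next_addr = (self.fifo[-1] + 1) if self.fifo else line_addr + 1
            self.fifo.append(next_addr)      # keep prefetching down the stream
            return "stream-buffer hit"
        # Miss elsewhere: flush, fetch the target, restart at the next lines.
        self.fifo = deque(line_addr + i for i in range(1, self.depth + 1))
        cache.add(line_addr)
        return "miss"

sb, cache = StreamBuffer(), set()
print(sb.access(cache, 10))  # miss: buffer starts prefetching 11..14
print(sb.access(cache, 11))  # stream-buffer hit: 11 enters the cache, 15 queued
print(sb.access(cache, 12))  # stream-buffer hit
```

Only lines the processor actually reaches ever enter the cache; the rest sit in the buffer, which is how the scheme avoids polluting the cache with unneeded data.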
21 - Multi-Way Stream Buffers
- A single stream buffer removes only 25% of data cache misses.
- Data references interleave streams from different sources.
- Solution: four stream buffers in parallel.
- Instruction stream performance is unchanged.
- Twice the performance of a single stream buffer on data misses.
22 - Stream Buffers vs. Prefetch
- Feasible to implement.
- Lower latency.
- The extra hardware required by stream buffers is comparable to the additional tag bits required by tagged prefetch.
23 - Stream Buffer Performance vs. Cache Size
- Only the data stream buffer's performance improves as cache size increases.
- It can then contain data for reference patterns that access several sets of data.
25 - Conclusion
- The miss cache is beneficial in removing data cache conflict misses.
- The victim cache is an improvement over the miss cache: it saves the victim of the cache miss instead of the target.
- Stream buffers reduce capacity and compulsory misses.
- Multi-way stream buffers are a set of stream buffers that can prefetch down several streams concurrently.
26 - References
- Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers"
- Patterson D. and Hennessy J., "Computer Organization and Design"