Title: CPE 431/531 Chapter 7 - Large and Fast: Exploiting Memory Hierarchy
CPE 431/531 Chapter 7 - Large and Fast: Exploiting Memory Hierarchy
Swathi T. Gurumani
Modified From Slides of Dr. Rhonda Kay Gaede, UAH
7.5 A Common Framework for Memory Hierarchies - Question 1
- Question 1: Where Can a Block Be Placed?
- Answer: A range of associativities is possible.
  - Advantage: Increasing associativity decreases miss rates.
  - Disadvantage: Increasing associativity increases cost and access time.
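The placement rule can be made concrete with a minimal Python sketch. The cache size and addresses below are hypothetical; the point is that the number of ways determines how many slots are legal for a given block.

```python
def candidate_slots(block_address, num_blocks, ways):
    """Return the cache slot indices where this block may be placed."""
    num_sets = num_blocks // ways
    set_index = block_address % num_sets
    # The block may go in any of the `ways` slots of its set.
    return [set_index * ways + w for w in range(ways)]

# Direct-mapped (1-way): exactly one legal slot (set 12 % 8 = 4).
print(candidate_slots(12, 8, 1))   # [4]
# 2-way set associative: two legal slots (set 12 % 4 = 0).
print(candidate_slots(12, 8, 2))   # [0, 1]
# Fully associative (8-way): any slot.
print(candidate_slots(12, 8, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Moving from 1 way toward full associativity grows the candidate list, which is exactly why miss rates fall (fewer forced evictions) while lookup cost rises (more slots to compare).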
7.5 A Common Framework for Memory Hierarchies - Question 2
- Question 2: How Is a Block Found?
- Answers:
  - Caches: Small degrees of associativity are used because high degrees are costly.
  - Virtual Memory: Full associativity makes sense because
    - Misses are very expensive.
    - Software can implement sophisticated replacement schemes.
    - A full map can be easily indexed.
    - Large pages mean a small number of mappings.
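Finding a block starts with splitting the address into tag, index, and offset fields. A minimal sketch, with a hypothetical 16-byte block and 64-set cache:

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, set index, block offset)."""
    offset = addr % block_size        # byte within the block
    block_addr = addr // block_size   # which memory block
    index = block_addr % num_sets     # which cache set to search
    tag = block_addr // num_sets      # compared against stored tags
    return tag, index, offset

# Byte address 1234 with 16-byte blocks and 64 sets:
print(split_address(1234, 16, 64))   # (1, 13, 2)
```

A cache then compares the tag against only the few blocks in set 13; a fully associative structure (as in virtual memory's page table) instead indexes the full map directly, with no tag search at all.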
7.5 A Common Framework for Memory Hierarchies - Question 3
- Question 3: Which Block Should Be Replaced on a Cache Miss?
- Answers:
  - Caches: random or LRU
  - Virtual Memory: LRU
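LRU replacement for one set can be sketched in a few lines of Python; an OrderedDict stands in for the recency-ordering hardware or software would maintain. This is an illustrative model, not a full cache.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement (a sketch, not a full cache)."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data; oldest entry first

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(2)
print(s.access(1))  # False (cold miss)
print(s.access(2))  # False (cold miss)
print(s.access(1))  # True  (hit; 1 becomes most recent)
print(s.access(3))  # False (miss; evicts 2, the least recent)
```

Hardware uses approximations of this for small associativities; virtual memory can afford the exact bookkeeping in software because misses (page faults) are so expensive.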
7.5 A Common Framework for Memory Hierarchies - Question 4
- Question 4: What Happens on a Write?
- Answers:
  - Caches: write-back is the strategy of the future
  - Virtual Memory: always uses write-back
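The appeal of write-back is that repeated writes to the same block generate no traffic to the next level; only eviction of a dirty block does. A minimal single-line sketch (counter and tags are illustrative):

```python
class WriteBackLine:
    """One cache line under a write-back policy (sketch)."""
    def __init__(self):
        self.tag = None
        self.dirty = False
        self.writebacks = 0  # blocks written to the next level

    def write(self, tag):
        if self.tag != tag:       # miss: replace the resident block,
            if self.dirty:        # writing back the old one if modified
                self.writebacks += 1
            self.tag = tag
            self.dirty = False
        self.dirty = True         # the write itself stays in the cache

line = WriteBackLine()
line.write(1); line.write(1); line.write(1)  # repeated writes: no traffic
line.write(2)                                # eviction forces one write-back
print(line.writebacks)  # 1
```

Under write-through, the same sequence would cost four writes to the next level; for virtual memory, where a "write to the next level" is a disk access, write-back is the only sensible choice.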
7.5 A Common Framework for Memory Hierarchies - The Three Cs
- Cache misses occur in three categories:
  - Compulsory misses (cold-start misses): the first access to a block that has never been in the cache.
  - Capacity misses: the cache cannot contain all the blocks needed during execution.
  - Conflict misses (collision misses): multiple blocks compete for the same set.
- The challenge in designing memory hierarchies is that every change that potentially improves the miss rate can also negatively affect overall performance.
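The three categories can be separated mechanically in a simulator: a miss is compulsory if the block was never referenced before, a capacity miss if a fully associative LRU cache of the same total size would also miss, and a conflict miss otherwise. A small Python sketch of that classification (trace and sizes are hypothetical):

```python
from collections import OrderedDict

def lru_access(cache_set, tag, capacity):
    """LRU access to one set; returns True on hit."""
    if tag in cache_set:
        cache_set.move_to_end(tag)
        return True
    if len(cache_set) == capacity:
        cache_set.popitem(last=False)  # evict LRU
    cache_set[tag] = None
    return False

def classify_misses(trace, num_blocks, ways):
    num_sets = num_blocks // ways
    sa = [OrderedDict() for _ in range(num_sets)]  # set-associative cache
    fa = OrderedDict()                             # same-size fully associative
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        fa_hit = lru_access(fa, block, num_blocks)
        sa_hit = lru_access(sa[block % num_sets], block, ways)
        if not sa_hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts

# Blocks 0 and 4 collide in a direct-mapped 2-block cache,
# though a fully associative cache of the same size holds both.
print(classify_misses([0, 4, 0, 4], num_blocks=2, ways=1))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

The same trace against the same total capacity with higher associativity would turn the conflict misses into hits, illustrating the design tension the slide describes.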
7.6 Real Stuff - The Pentium P4 and Opteron Memory Hierarchies
- Both have secondary caches on the main processor die.
7.7 Fallacies and Pitfalls
- Pitfall: Forgetting to account for byte addressing or the cache block size when simulating a cache.
- Pitfall: Ignoring memory system behavior when writing programs or when generating code in a compiler.
- Pitfall: Using average memory access time to evaluate the memory hierarchy of an out-of-order processor.
- Pitfall: Extending an address space by adding segments on top of an unsegmented address space.
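The first pitfall above is easy to demonstrate. A minimal sketch, with hypothetical parameters (16-byte blocks, 4 sets): indexing with the raw byte address instead of the block address scatters bytes of the same block across different sets.

```python
BLOCK_SIZE = 16
NUM_SETS = 4

def wrong_index(byte_addr):
    # Pitfall: forgets byte addressing / block size entirely.
    return byte_addr % NUM_SETS

def right_index(byte_addr):
    # Correct: convert to a block address first, then index.
    return (byte_addr // BLOCK_SIZE) % NUM_SETS

# Bytes 0 and 5 lie in the same 16-byte block, so they must map to
# the same set; the buggy version splits them across sets.
print(wrong_index(0), wrong_index(5))   # 0 1  (bogus)
print(right_index(0), right_index(5))   # 0 0  (correct)
```

A simulator with the buggy indexing reports wildly optimistic miss rates, because spatial locality within a block is miscounted as hits in many different sets.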
7.8 Concluding Remarks
- Because CPU speeds continue to increase faster than either DRAM access times or disk access times, memory will increasingly be the factor that limits performance.
7.8 Concluding Remarks - Developments
- Hardware:
  - Changes in cache capabilities
- Software:
  - Restructuring loops
  - Compiler-directed prefetching
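Loop restructuring can be sketched with the classic loop-interchange example. The array and size below are hypothetical, and Python itself does not expose cache effects; the point is the access order a compiled, row-major program would generate.

```python
N = 64
a = [[i + j for j in range(N)] for i in range(N)]  # row-major 2-D array

def column_order_sum():
    # Poor locality: consecutive accesses stride through different rows,
    # so each access can touch a different cache block.
    total = 0
    for j in range(N):
        for i in range(N):
            total += a[i][j]
    return total

def row_order_sum():
    # Same result after interchange: consecutive accesses fall in the
    # same block, so each block fetched from memory is fully reused.
    total = 0
    for i in range(N):
        for j in range(N):
            total += a[i][j]
    return total

print(column_order_sum() == row_order_sum())  # True: only the order changed
```

Because the two loop nests compute the same value, a compiler (or programmer) is free to pick the order with better spatial locality; compiler-directed prefetching attacks the same latency problem from the other direction, issuing prefetches ahead of the accesses instead of reordering them.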