Title: COMP 206 Computer Architecture and Implementation Unit 8a: Basics of Caches
1. COMP 206 Computer Architecture and Implementation, Unit 8a: Basics of Caches
- Siddhartha Chatterjee
- Fall 2000
2. The Big Picture: Where Are We Now?
- The Five Classic Components of a Computer: Processor, Memory, Input, Output
- This unit: the Memory System
3. The Motivation for Caches
- Large memories (DRAM) are slow
- Small memories (SRAM) are fast
- Make the average access time small by servicing most accesses from a small, fast memory
- Reduce the bandwidth required of the large memory
4. The Principle of Locality
- Programs access a relatively small portion of the address space at any instant of time
  - Example: 90% of the time is spent in 10% of the code
- Two different types of locality:
  - Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon
  - Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon
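Both kinds of locality can be seen in even the simplest loop. A minimal sketch (the function name and data are illustrative, not from the slides):

```python
# Illustration of the two kinds of locality in an array-sum loop.
def array_sum(a):
    total = 0                  # 'total' is reused every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]          # a[0], a[1], ... sit at adjacent addresses: spatial locality
    return total

print(array_sum(list(range(10))))  # -> 45
```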
5. Levels of the Memory Hierarchy
6. Memory Hierarchy: Principles of Operation
- At any given time, data is copied between only 2 adjacent levels
  - Upper Level (Cache): the level closer to the processor; smaller, faster, and uses more expensive technology
  - Lower Level (Memory): the level further away from the processor; bigger, slower, and uses less expensive technology
- Block: the minimum unit of information that can either be present or not present in the two-level hierarchy
(Figure: blocks Blk X and Blk Y being transferred between the Upper Level (Cache) and the Lower Level (Memory), to and from the processor)
7. Memory Hierarchy: Terminology
- Hit: data appears in some block in the upper level (example: Block X on the previous slide)
  - Hit Rate: the fraction of memory accesses found in the upper level
  - Hit Time: time to access the upper level, which consists of the RAM access time plus the time to determine hit/miss
- Miss: data needs to be retrieved from a block in the lower level (Block Y on the previous slide)
  - Miss Rate: 1 - (Hit Rate)
  - Miss Penalty: time to replace a block in the upper level, plus the time to deliver the block to the processor
- Hit Time << Miss Penalty
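The terms above combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty. A small worked example (the numbers are illustrative, not from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# A 1-cycle hit, a 5% miss rate, and a 50-cycle miss penalty give
# an average of 1 + 0.05 * 50 = 3.5 cycles per access.
print(amat(1, 0.05, 50))  # -> 3.5
```

This makes the last bullet concrete: because Hit Time << Miss Penalty, even a small miss rate dominates the average access time.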
8. Cache Addressing
- Block/line is the unit of allocation
- Sector/sub-block is the unit of transfer and coherence
- Cache parameters j, k, m, n are integers, and generally powers of 2
9. Examples of Cache Configurations
(Figure: a memory address divided into the Block address and the Block offset; the fields are Tag | Set | Sector | Byte)
10. Storage Overhead of Cache
11. Basics of Cache Operation
12. Details of Simple Blocking Cache
- Write Through
- Write Back
13. Example: 1KB, Direct-Mapped, 32B Blocks
- For a 1024-byte (2^10) cache with 32-byte blocks:
  - The uppermost 22 (32 - 10) address bits are the Tag
  - The lowest 5 address bits are the Byte Select (Block Size = 2^5)
  - The next 5 address bits (bit 5 - bit 9) are the Cache Index
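The bit-slicing above can be sketched in a few lines. This is a behavioral sketch of the address split, not the hardware on the slide; the field widths default to the 1KB/32B configuration:

```python
def split_address(addr, block_bits=5, index_bits=5):
    """Split a 32-bit address into (tag, cache index, byte select)."""
    byte_select = addr & ((1 << block_bits) - 1)            # lowest 5 bits
    index = (addr >> block_bits) & ((1 << index_bits) - 1)  # next 5 bits
    tag = addr >> (block_bits + index_bits)                 # remaining 22 bits
    return tag, index, byte_select

print(split_address(0x1234))  # -> (4, 17, 20)
```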
14. Example 1a: 1KB Direct-Mapped Cache
15. Example 1b: 1KB Direct-Mapped Cache
16. Example 2: 1KB Direct-Mapped Cache
17. Example 3a: 1KB Direct-Mapped Cache
18. Example 3b: 1KB Direct-Mapped Cache
19. A-way Set-Associative Cache
- A-way set-associative: A entries for each Cache Index
  - Equivalent to A direct-mapped caches operating in parallel
- Example: two-way set-associative cache
  - The Cache Index selects a set from the cache
  - The two tags in the set are compared in parallel
  - Data is selected based on the tag comparison result
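The lookup steps above can be sketched in software. This is a behavioral model, not the parallel comparator hardware; the cache is represented as a list of sets, each holding (valid, tag, data) tuples for its A ways:

```python
def lookup(cache, tag, index):
    """Return the data on a hit, or None on a miss."""
    for valid, way_tag, data in cache[index]:  # hardware compares the ways in parallel
        if valid and way_tag == tag:
            return data   # hit: data selected by the matching tag
    return None           # miss

# A 2-way set-associative cache with 2 sets (contents are illustrative).
cache = [
    [(True, 7, "A"), (False, 0, None)],
    [(True, 3, "B"), (True, 9, "C")],
]
print(lookup(cache, 9, 1))  # -> C
```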
20. Fully Associative Cache
- Push the set-associative idea to its limit!
  - Forget about the Cache Index
  - Compare the Cache Tags of all cache entries in parallel
- Example: with Block Size = 32B and a 32-bit address, we need N 27-bit comparators
21. Cache Shapes
- Direct-mapped (A = 1, S = 16)
- 2-way set-associative (A = 2, S = 8)
- 4-way set-associative (A = 4, S = 4)
- 8-way set-associative (A = 8, S = 2)
- Fully associative (A = 16, S = 1)
22. Need for a Block Replacement Policy
- Direct-Mapped Cache
  - Each memory location can map to only 1 cache location
  - No need to make any decision :-)
  - The current item replaces the previous item in that cache location
- N-way Set-Associative Cache
  - Each memory location has a choice of N cache locations
- Fully Associative Cache
  - Each memory location can be placed in ANY cache location
- On a cache miss in an N-way set-associative or fully associative cache:
  - Bring in the new block from memory
  - Throw out a cache block to make room for the new block
  - We need to decide which block to throw out!
23. Cache Block Replacement Policies
- Random Replacement
  - Hardware randomly selects a cache item and throws it out
- Least Recently Used (LRU)
  - Hardware keeps track of the access history
  - Replace the entry that has not been used for the longest time
  - For a two-way set-associative cache, one needs only one bit per set for LRU replacement
- Example of a Simple Pseudo-LRU Implementation
  - Assume 64 fully associative entries
  - A hardware replacement pointer points to one cache entry
  - Whenever an access is made to the entry the pointer points to, move the pointer to the next entry
  - Otherwise, do not move the pointer
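The pointer scheme described above can be sketched as follows (a behavioral sketch assuming the 64-entry fully associative cache from the slide; the class and method names are invented):

```python
class PseudoLRU:
    """Pointer-based pseudo-LRU: the pointer marks the next victim."""
    def __init__(self, entries=64):
        self.entries = entries
        self.pointer = 0
    def on_access(self, entry):
        # Move the pointer only when the pointed-to entry is accessed.
        if entry == self.pointer:
            self.pointer = (self.pointer + 1) % self.entries
    def victim(self):
        # On a miss, replace the entry the pointer currently points to.
        return self.pointer

p = PseudoLRU()
p.on_access(5)      # pointer stays at 0
p.on_access(0)      # pointed-to entry accessed: pointer advances to 1
print(p.victim())   # -> 1
```

The pointer therefore never names the most recently used entry, which is why this approximates (rather than implements) true LRU with far less state.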
24. Cache Write Policy
- Cache reads are much easier to handle than cache writes
  - An instruction cache is much easier to design than a data cache
- Cache writes: how do we keep data in the cache and memory consistent?
- Two options (decision time again :-)
  - Write Back: write to the cache only; write the cache block to memory when that cache block is being replaced on a cache miss
    - Needs a dirty bit for each cache block
    - Greatly reduces the memory bandwidth requirement
    - Control can be complex
  - Write Through: write to the cache and memory at the same time
    - What!!! How can this be? Isn't memory too slow for this?
25. Write Buffer for Write Through
- A Write Buffer is needed between the cache and main memory
  - The processor writes data into the cache and the write buffer
  - The memory controller writes the contents of the buffer to memory
- The write buffer is just a FIFO
  - Typical number of entries: 4
  - Works fine if store frequency (w.r.t. time) << 1 / DRAM write cycle
- The memory system designer's nightmare:
  - Store frequency (w.r.t. time) > 1 / DRAM write cycle
  - Write buffer saturation
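The buffer's behavior, including the saturation case, can be modeled as a small FIFO (a toy sketch, not a hardware design; the class and method names are invented):

```python
from collections import deque

class WriteBuffer:
    """Toy model of a 4-entry write buffer between cache and memory."""
    def __init__(self, capacity=4):
        self.fifo = deque()
        self.capacity = capacity
    def store(self, addr, data):
        # Processor side: returns False when the buffer is full (saturation),
        # in which case the processor would have to stall.
        if len(self.fifo) >= self.capacity:
            return False
        self.fifo.append((addr, data))
        return True
    def drain_one(self, memory):
        # Memory-controller side: retire the oldest write to DRAM.
        if self.fifo:
            addr, data = self.fifo.popleft()
            memory[addr] = data

wb = WriteBuffer(capacity=2)
print(wb.store(0x0, 1), wb.store(0x4, 2), wb.store(0x8, 3))  # -> True True False
```

Saturation shows up as `store` returning False: once stores arrive faster than `drain_one` can retire them, no finite capacity helps.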
26. Write Buffer Saturation
- Store frequency (w.r.t. time) > 1 / DRAM write cycle
- If this condition exists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row):
  - The store buffer will overflow no matter how big you make it
  - This is because CPU Cycle Time << DRAM Write Cycle Time
- Solutions for write buffer saturation:
  - Use a write-back cache
  - Install a second-level (L2) cache
27. Write Allocate versus No-Allocate
- Assume that a 16-bit write to memory location 0x00 causes a cache miss
- Do we read in the block?
  - Yes: Write Allocate
  - No: Write No-Allocate
28. Four Questions for Memory Hierarchy
- Where can a block be placed in the upper level? (Block placement)
- How is a block found if it is in the upper level? (Block identification)
- Which block should be replaced on a miss? (Block replacement)
- What happens on a write? (Write strategy)