Title: Memory Hierarchy - Introduction to Memory Subsystem - Cache Memories - Main Memory (DRAM)
1 Memory Hierarchy - Introduction to Memory Subsystem - Cache Memories - Main Memory (DRAM)
- ECE 411 - Fall 2009
- Lecture 5
2 The LC-3b Datapath
[Figure: LC-3b datapath diagram]
3 Physical Memory Systems
So far, we've viewed memory as a black box that you can put data and programs into for later access
[Figure: processor (general-purpose registers, ALUs, PC, CC, control logic) exchanging program and data with memory through the MAR and MDR]
4 Types of Memory
5 Memory Hierarchy
[Figure: memory hierarchy pyramid - Registers, On-Chip SRAM, Off-Chip SRAM, DRAM, Disk; SPEED and COST increase toward the top, CAPACITY toward the bottom]
6 Why Memory Hierarchy?
- Processor needs lots of bandwidth
- 64-bit architectures need even more
- Applications and Windows need lots of instructions and data
- PowerPoint, iTunes, games, etc.
- Must be cheap per bit
- Today 1TB < $1K
7 Memory Hierarchy?
- Fast and small, costly memories
- Enable quick access (fast cycle time)
- Enable lots of bandwidth (1 L/S/I-fetch/cycle)
- Hold hot instructions and data
- Slower, expensive, larger memories
- Capture a larger share of the instructions and data of running programs
- Still relatively fast
- Slow, inexpensive, huge storage
- Hold rarely-needed state
- Needed for correctness and persistent, long-term storage
- Bottom line: provide the appearance of a large, fast memory at the cost of a cheap, slow memory
8 Why Does a Hierarchy Work?
- Locality of reference
- Temporal locality
- Reference the same memory location many times (close together in time)
- Spatial locality
- Reference near neighbors around the same time (both kinds are sketched in code below)
- Empirically observed
- Significant!
- Even small working memories (16KB) often satisfy >70% of references to multi-MB data sets
- Working set principle
- At any given time during execution, we only need to keep a small subset of data close to the datapath
- This is enforced by programmer and user behavior
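A minimal C sketch of both kinds of locality in a single loop (illustrative, not from the slides):

```c
#include <stdio.h>

int main(void) {
    int data[1024];
    for (int i = 0; i < 1024; i++)
        data[i] = i;

    /* Temporal locality: "sum" and "i" are referenced on every iteration. */
    int sum = 0;
    for (int i = 0; i < 1024; i++) {
        /* Spatial locality: data[i] walks consecutive addresses, so one
           cache-line fetch satisfies the next several references. */
        sum += data[i];
    }
    printf("sum = %d\n", sum);
    return 0;
}
```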
9 Memory Hierarchy
- Temporal Locality
- Keep recently referenced items at higher levels
- Future references satisfied quickly
- Spatial Locality
- Bring neighbors of recently referenced items to higher levels
- Future references satisfied quickly
[Figure: hierarchy from the CPU through split I/D L1 caches and a shared L2 cache to main memory and disk]
10 Memory Hierarchy - an inverted view?
- Temporal Locality
- Keep recently referenced items at LOWER levels
- Future references satisfied quickly
- Spatial Locality
- Bring neighbors of recently referenced items to LOWER levels
- Future references satisfied quickly
11 Four Central Questions
- P-I-R-W
- Placement
- Where can a block of memory go?
- Identification
- How do I find a block of memory?
- Replacement
- How do I make space for new blocks?
- Write Policy
- How do I propagate changes?
- Need to consider these for all memories - caches now
- Main memory, disks later
12 Placement
13 Cache Memories - Purpose, Configuration, and Performance
14 Review of Memory Hierarchies
[Figure: CPU at the top, then Cache (SRAM), Main Memory (DRAM), and Virtual Memory (Disk); the SRAM and DRAM levels form the physical memory; capacity increases away from the CPU, speed increases toward it]
15 Cache Memory Motivation
- Processor speeds are increasing much faster than memory speeds
- Current top-end Pentium has a cycle time of about 0.3 ns
- High-end DRAMs have access times of about 30 ns
- DRAM access takes 100 cycles minimum, not even counting time to send signals from the processor to the memory
- Memory speed matters
- Each instruction needs to be fetched from memory
- Loads and stores are a significant fraction of instructions
- Amdahl's Law tells us that increasing processor performance without speeding up memory won't help much overall (air travel example)
- Temporal locality and spatial locality allow caches to work well
16 Cache Memories
- Relatively small SRAM memories located physically close to the processor
- SRAMs have low access times
- Physical proximity reduces wire delay
- Modern processors use multiple levels of cache memories; each level is 5-10 times faster than the next
- Similar in concept to virtual memory
- Keep commonly-accessed data in a smaller, fast memory
- Use a larger memory to hold data that's accessed less frequently
17 Caches vs. Virtual Memory
- Caches
- Implemented completely in hardware
- Operate on relatively small blocks of data (lines)
- 32-512 bytes common
- Often restrict which memory addresses can be stored in a given location in the cache
- Virtual Memory
- Uses a combination of hardware and software
- Operates on larger blocks of data (pages)
- 8-1024 KB common
- Allows any block to be mapped into any location in physical memory
18 Cache Operation
- On a memory access, look in the cache first (the flow is sketched in code below)
- If the address we want is in the cache, complete the operation, usually in one cycle
- If not, complete the operation using the main memory (many cycles)
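As a sketch of that flow, here is a toy one-entry cache in C; the names and structure are ours, and real caches do all of this in hardware:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy one-entry cache, just to show the check-the-cache-first control flow. */
static uint32_t cached_addr = UINT32_MAX;   /* no address cached yet */
static uint32_t cached_data;

static uint32_t memory_read(uint32_t addr) {
    return addr * 2;   /* stand-in for slow main memory (many cycles) */
}

uint32_t load(uint32_t addr) {
    if (addr == cached_addr)            /* hit: complete in ~1 cycle */
        return cached_data;
    cached_data = memory_read(addr);    /* miss: go out to main memory */
    cached_addr = addr;                 /* keep a copy for future references */
    return cached_data;
}

int main(void) {
    printf("%u\n", load(100));   /* miss, fills the cache */
    printf("%u\n", load(100));   /* hit */
    return 0;
}
```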
19 Performance of Memory Hierarchies
- Basic Formula (turned into a helper function below)
- Tavg = Phit × Thit + Pmiss × Tmiss
- Thit: time to complete the memory reference if we hit in a given level of the hierarchy
- Tmiss: time to complete the memory reference if we miss and have to go down to the next level
- Phit, Pmiss: probabilities of hitting or missing in the level
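The formula drops straight into code; a one-line C helper (a sketch, with names of our choosing):

```c
/* Tavg = Phit * Thit + Pmiss * Tmiss, with Pmiss = 1 - Phit */
double tavg(double phit, double thit, double tmiss) {
    return phit * thit + (1.0 - phit) * tmiss;
}
```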
20 Example 1
- A memory system consists of a cache and a main memory. If it takes 1 cycle to complete a cache hit, and 100 cycles to complete a cache miss, what is the average memory access time if the hit rate in the cache is 97%?
21 Example 1
- A memory system consists of a cache and a main memory. If it takes 1 cycle to complete a cache hit, and 100 cycles to complete a cache miss, what is the average memory access time if the hit rate in the cache is 97%?
- Thit = 1 cycle
- Tmiss = 100 cycles
- Phit = 0.97
- Pmiss = 0.03
- Tavg = Phit × Thit + Pmiss × Tmiss = 0.97 × 1 + 0.03 × 100 = 3.97 cycles (checked in code below)
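Feeding Example 1's numbers to the helper sketched under slide 19 reproduces the answer:

```c
#include <stdio.h>

double tavg(double phit, double thit, double tmiss) {
    return phit * thit + (1.0 - phit) * tmiss;
}

int main(void) {
    /* Thit = 1 cycle, Tmiss = 100 cycles, hit rate = 97% */
    printf("Tavg = %.2f cycles\n", tavg(0.97, 1.0, 100.0));   /* prints 3.97 */
    return 0;
}
```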
22 Describing Caches
- We characterize a cache using 7 parameters (gathered into a struct in the sketch below)
- Access Time: Thit
- Capacity: the total amount of data the cache can hold
- Capacity = # of blocks × block size
- Block (Line) Size: the amount of data that gets moved into or out of the cache as a chunk
- Analogous to page size in virtual memory
- Write Policy: what happens on a write?
- Replacement Policy: what data is replaced on a miss?
- Associativity: how many locations in the cache is a given address eligible to be placed in?
- Unified, Instruction, Data: what type of data is kept in the cache?
- We'll cover this in more detail next time
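One way to see the seven parameters together is as a configuration record; a C sketch with field names of our choosing:

```c
#include <stddef.h>

/* Illustrative container for the cache parameters above (names are ours). */
struct cache_config {
    unsigned thit_cycles;     /* access time (Thit)                           */
    size_t   block_size;      /* bytes moved into/out of the cache as a chunk */
    size_t   num_blocks;      /* capacity = num_blocks * block_size           */
    unsigned associativity;   /* 1 = direct-mapped; num_blocks = fully assoc. */
    enum { WRITE_THROUGH, WRITE_BACK }   write_policy;   /* what happens on a write */
    enum { REPLACE_LRU, REPLACE_RANDOM } replacement;    /* victim choice on a miss */
    enum { UNIFIED, INSTRUCTION, DATA }  contents;       /* type of data kept       */
};

static inline size_t capacity_bytes(const struct cache_config *c) {
    return c->num_blocks * c->block_size;   /* # of blocks x block size */
}
```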
23 Capacity
- In general, bigger is better
- The more data you can store in the cache, the less often you have to go out to main memory
- However, bigger caches tend to be slower
- Need to understand how both Thit and Phit change as you change the capacity of the cache
- Declining return on investment as cache size goes up
- We'll see why when we talk about causes of cache misses
- From the point of view of the processor, cache access time is always an integer number of cycles
- Depending on the processor cycle time, changes in cache access time may be either really important or irrelevant (quantization effect; see the sketch below)
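The quantization effect is easy to see numerically; a sketch with assumed timings (0.95 ns raw SRAM access time and three hypothetical processor cycle times, none of them from the slides):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double thit_ns    = 0.95;                   /* assumed raw cache access time */
    double cycle_ns[] = { 0.30, 0.50, 1.00 };   /* assumed processor cycle times */
    for (int i = 0; i < 3; i++)
        printf("cycle = %.2f ns -> Thit = %.0f cycles\n",
               cycle_ns[i], ceil(thit_ns / cycle_ns[i]));
    /* The same physical cache costs 4, 2, or 1 cycles depending on the cycle
       time, so a small change in access time may or may not matter at all. */
    return 0;
}
```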
24 Cache Block Size
- Very similar concept to page size
- Cache groups contiguous addresses into lines
- Lines are almost always aligned on their size
- Caches fetch or write back an entire line of data on a miss
- Spatial Locality
- Reading/Writing a block
- Typically, it takes much longer to fetch the first word of a block than subsequent words
- Page-Mode DRAMs
- Tfetch = Tfirst + (line size / fetch width) × Tsubsequent (a worked example follows)
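Plugging illustrative numbers into the formula (Tfirst = 30 ns, Tsubsequent = 5 ns, a 64-byte line, an 8-byte fetch width; all four values are assumptions, not from the slides):

```c
#include <stdio.h>

/* Tfetch = Tfirst + (line size / fetch width) * Tsubsequent */
double tfetch_ns(double tfirst, double tsub, unsigned line_bytes, unsigned fetch_bytes) {
    return tfirst + ((double)line_bytes / fetch_bytes) * tsub;
}

int main(void) {
    /* 30 + (64/8) * 5 = 70 ns for the whole line, with these assumed timings */
    printf("Tfetch = %.0f ns\n", tfetch_ns(30.0, 5.0, 64, 8));
    return 0;
}
```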
25Impact of Block Size on Hit Rate
Figure Credit Computer Organization and Design
The Hardware / Software Interface, page 559
26 Hit Rate Isn't Everything
- Average access time is a better performance indicator than hit rate
- Tavg = Phit × Thit + Pmiss × Tmiss
- Tmiss = Tfetch = Tfirst + (block size / fetch width) × Tsubsequent
- Trade-off: increasing block size usually increases hit rate, but also increases fetch time
- As blocks get bigger, the increase in fetch time starts to outweigh the increase in hit rate (see the sweep below)
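To see the trade-off concretely, the sketch below sweeps block size using the Tmiss formula above and a made-up miss-rate curve that falls and then flattens (the typical shape from the figure on slide 25); every number here is illustrative only:

```c
#include <stdio.h>

int main(void) {
    unsigned block_bytes[] = { 16, 32, 64, 128, 256 };
    double   miss_rate[]   = { 0.070, 0.045, 0.035, 0.032, 0.031 };  /* hypothetical */
    double tfirst = 30.0, tsub = 5.0, thit = 1.0;  /* assumed times in ns, 8 B fetch width */

    for (int i = 0; i < 5; i++) {
        double tmiss = tfirst + (block_bytes[i] / 8.0) * tsub;   /* slide's Tmiss formula */
        double t     = (1.0 - miss_rate[i]) * thit + miss_rate[i] * tmiss;
        printf("block %3u B: Tmiss = %5.1f ns, Tavg = %.2f ns\n", block_bytes[i], tmiss, t);
    }
    /* With these numbers, Tavg improves from 16 B to 32 B blocks, then worsens
       as the longer fetch time outweighs the shrinking miss rate. */
    return 0;
}
```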
27 Associativity: Where Can Data Go?
- In virtual memory systems, any page can be placed in any physical page frame
- Very flexible
- Use page table and TLB to track the mapping between virtual address and physical page frame and allow fast translation
- This doesn't work so well for caches
- Can't afford the time to do a software search to see if a line is in the cache
- Need hardware to determine if we hit
- Can't afford the space for a table of mappings for each virtual address
- Page tables are MB to GB on modern architectures; caches tend to be KB in size
28 Direct-Mapped: One cache location for each address
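In a direct-mapped cache, the one candidate line comes straight out of the address bits. A C sketch with assumed geometry (256 lines of 32 bytes, i.e. an 8 KB cache):

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 32u    /* assumed line size in bytes */
#define NUM_LINES  256u   /* assumed number of lines    */

int main(void) {
    uint32_t addr   = 0x12345678;
    uint32_t offset = addr % BLOCK_SIZE;                 /* byte within the line      */
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_LINES;   /* the one eligible line     */
    uint32_t tag    = addr / (BLOCK_SIZE * NUM_LINES);   /* identifies whose block it is */
    printf("tag = 0x%x, index = %u, offset = %u\n", tag, index, offset);
    return 0;
}
```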
29 Fully-Associative: Anything Can Go Anywhere
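In a fully-associative cache there is no index: identification must compare the tag against every line. A C sketch of the idea (in hardware, all comparisons run in parallel; the loop is only the concept):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 256u   /* assumed cache size in lines */

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_LINES];

/* Hit if any valid line's tag matches; the block may live anywhere. */
bool lookup(uint32_t tag) {
    for (unsigned i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return true;
    return false;
}
```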
30 Direct-Mapped vs. Fully-Associative
- Direct-Mapped
- Requires less area
- Only one comparator
- Fewer tag bits required
- Fast: can return data to the processor in parallel with determining whether a hit has occurred
- Conflict misses reduce hit rate
- Fully-Associative
- No conflict misses, therefore higher hit rate in general
- Needs one comparator for each line in the cache
- Design trade-offs
- For a given chip area, will you get a better hit rate with a fully-associative cache or a direct-mapped cache with a higher capacity?
- Do you need the lower access time of a direct-mapped cache?
31 An Aside: Talking About Cache Misses
- In single-processor systems, cache misses can be divided into three categories
- Compulsory: misses caused by the first reference to each line of data
- In an infinitely-large, fully-associative cache, these would be the only misses
- Capacity: misses caused because a program references more data than will fit in the cache
- Conflict: misses caused because more lines try to share a specific place in the cache than will fit
32 Compromise: Set-Associative Caches