Title: Chapter Seven Large and Fast: Exploiting Memory Hierarchy (Part II)
1. Chapter Seven: Large and Fast — Exploiting Memory Hierarchy (Part II)
2. Virtual Memory: Motivations
- To allow efficient and safe sharing of memory among multiple programs.
- To remove the programming burden of a small, limited amount of main memory.
3. Virtual Memory
- Main memory can act as a cache for secondary storage (disk).
- Advantages:
  - illusion of having more physical memory
  - program relocation
  - protection
4. Pages: Virtual Memory Blocks
- Page faults: the data is not in memory, so it must be retrieved from disk.
  - The miss penalty is huge, so pages should be fairly large (e.g., 4 KB).
  - Reducing page faults is important (LRU is worth the price).
  - Faults can be handled in software instead of hardware.
  - Write-through is too expensive, so we use write-back.
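The split implied by a 4 KB page can be sketched with a little arithmetic: the low 12 bits of an address (since 2^12 = 4096) are the page offset, and the remaining bits are the virtual page number. The function name below is hypothetical, chosen only for illustration.

```python
PAGE_SIZE = 4096     # 4 KB pages, as on the slide
OFFSET_BITS = 12     # log2(4096): low 12 bits address a byte within the page

def split_virtual_address(va):
    """Split a virtual address into (virtual page number, page offset)."""
    vpn = va >> OFFSET_BITS          # high bits select the page
    offset = va & (PAGE_SIZE - 1)    # low 12 bits select the byte in the page
    return vpn, offset

# Address 0x12345 lies in virtual page 0x12, at offset 0x345 within it.
print(split_virtual_address(0x12345))
```

Because page faults are resolved at page granularity, only the virtual page number participates in translation; the offset passes through unchanged.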
5. Placing a Page and Finding It Again
- We want the ability to use a clever and flexible replacement scheme.
- We want to reduce the page fault rate.
- Fully associative placement serves both purposes.
  - But a full search is impractical, so we locate pages with a full table that indexes the memory → the page table (resides in memory).
- Each program has its own page table, which maps the virtual address space of that program to main memory.
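The per-process page table can be sketched as a plain mapping from virtual page number to physical page number; a missing entry plays the role of a cleared valid bit. Names (`translate`, `PageFault`) are hypothetical, for illustration only.

```python
OFFSET_BITS = 12  # 4 KB pages

class PageFault(Exception):
    """Raised when the valid bit for a virtual page is off."""

def translate(page_table, va):
    """Translate a virtual address through a per-process page table.
    page_table maps virtual page number -> physical page number."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    if vpn not in page_table:          # valid bit off: page not in memory
        raise PageFault(vpn)
    # Concatenate the physical page number with the untouched offset.
    return (page_table[vpn] << OFFSET_BITS) | offset

pt = {0x12: 0x7}                       # virtual page 0x12 -> physical page 0x7
print(hex(translate(pt, 0x12345)))     # physical address 0x7345
```

Note that placement is fully associative: any virtual page may map to any physical page, and the table lookup replaces the impractical full search.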
6. Page Table
[Figure: address translation through the page table. The page table register points to the page table in memory. The 32-bit virtual address is split into a 20-bit virtual page number (bits 31-12) and a 12-bit page offset (bits 11-0); the virtual page number indexes the page table, which supplies an 18-bit physical page number (bits 29-12) that is concatenated with the offset to form the physical address.]
7. Process
- The page table, together with the program counter and the registers, specifies the state of a program.
- If we want to allow another program to use the CPU, we must save this state.
- We often refer to this state as a process.
- A process is considered active when it is in possession of the CPU.
8. Dealing with Page Faults
- When the valid bit for a virtual page is off, a page fault occurs.
- The operating system takes over, and the transfer is done with the exception mechanism.
- The OS must find the page in the next level of the hierarchy and decide where to place the requested page in main memory.
- The LRU policy is often used.
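The fault-handling steps above can be sketched as a toy main memory with a fixed number of page frames and true LRU replacement. This is a sketch under simplifying assumptions (real systems approximate LRU with reference bits); all names are hypothetical.

```python
from collections import OrderedDict

class Memory:
    """Toy main memory holding num_frames physical pages,
    replaced in least-recently-used order on a page fault."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = OrderedDict()      # virtual page -> page contents

    def access(self, vpn):
        if vpn in self.frames:           # valid bit on: no fault
            self.frames.move_to_end(vpn) # record the use for LRU ordering
            return "hit"
        # Page fault: the OS takes over via the exception mechanism.
        if len(self.frames) == self.num_frames:
            self.frames.popitem(last=False)   # evict the LRU page
        self.frames[vpn] = "page from disk"   # fetch from the next level
        return "fault"

mem = Memory(num_frames=2)
print([mem.access(p) for p in [1, 2, 1, 3, 2]])
# pages 1 and 2 fault cold, 1 hits, 3 evicts 2, so 2 faults again
```

The huge disk miss penalty is what justifies paying for LRU here: avoiding even one extra fault dwarfs the bookkeeping cost.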
9. Page Tables
10. What About Writes?
- A write-back scheme is used because write-through takes too much time.
  - Also known as copy-back.
- To determine whether a page needs to be copied back when we choose to replace it, a dirty bit is added to the page table.
  - The dirty bit is set when any word in the page is written.
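The dirty-bit protocol is small enough to sketch directly: set the bit on any write, and at replacement time copy the page back only if the bit is set. Class and function names here are illustrative, not from the slides.

```python
class Frame:
    """A resident page with its dirty bit (one bit per page table entry)."""
    def __init__(self, vpn):
        self.vpn = vpn
        self.dirty = False     # clean until any word in the page is written

def write_word(frame):
    frame.dirty = True         # set when any word in the page is written

def replace(frame, disk):
    """Write-back: copy the page to disk only if it was modified."""
    if frame.dirty:
        disk[frame.vpn] = "updated page"

disk = {}
f = Frame(vpn=5)
replace(f, disk)       # clean page: nothing is written back
write_word(f)
replace(f, disk)       # dirty page: copied back to disk
print(disk)
```

Skipping the copy for clean pages is the whole payoff: a page that was only read never costs a disk write at eviction.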
11. Making Address Translation Fast
- A cache for address translations: the translation-lookaside buffer (TLB).
[Figure: the TLB holds recent translations of virtual pages to physical memory; full page table entries hold a physical page or disk address, with unmapped pages kept in disk storage.]
12. Typical Values for a TLB
- TLB (also known as a translation cache) size: 16-512 entries
- Block size: 1-2 page table entries (typically 4-8 bytes each)
- Hit time: 0.5-1 clock cycle
- Miss penalty: 10-100 clock cycles
- Miss rate: 0.01%-1%
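A TLB is just a tiny, fast cache of page table entries, so it can be sketched as a small bounded mapping. This toy version is fully associative with oldest-entry eviction (real TLBs use various replacement schemes); the class and method names are hypothetical.

```python
class TLB:
    """Tiny fully associative translation cache. Real TLBs hold
    16-512 entries (slide values); this sketch defaults to 4."""
    def __init__(self, size=4):
        self.size = size
        self.entries = {}                 # vpn -> ppn (a cached PTE)

    def lookup(self, vpn):
        return self.entries.get(vpn)      # None means a TLB miss

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.size:
            # Evict the oldest entry to make room (simplistic policy).
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn

tlb = TLB(size=2)
print(tlb.lookup(3))      # miss: fall back to walking the page table
tlb.insert(3, 7)          # cache the translation after the walk
print(tlb.lookup(3))      # hit: translation in ~0.5-1 clock cycle
```

The asymmetry in the slide's numbers is the point: a hit costs well under a cycle, a miss costs 10-100 cycles, so even a 1% miss rate keeps the average translation cheap.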
13. Integrating VM, TLBs and Caches
[Figure: the virtual page number is compared against the TLB tags (each entry carrying valid and dirty bits); on a TLB hit, the physical page number is concatenated with the page offset to form the physical address, whose tag and index fields are then used to access the cache.]
14. TLBs and Caches
15. Overall Operation of a Memory Hierarchy

TLB   Page Table  Cache  Possible? If so, under what circumstances?
Hit   Hit         Miss   Possible: the translation is in the TLB, but the data misses in the cache.
Miss  Hit         Hit    Possible: TLB misses, but the entry is found in the page table; after retry, the data is found in the cache.
Miss  Hit         Miss   Possible: TLB misses, but the entry is found in the page table; after retry, the data misses in the cache.
Miss  Miss        Miss   Possible: TLB misses and is followed by a page fault; after retry, the data must miss in the cache.
Hit   Miss        Miss   Impossible: the TLB cannot hold a translation for a page that is not in memory.
Hit   Miss        Hit    Impossible: the TLB cannot hold a translation for a page that is not in memory.
Miss  Miss        Hit    Impossible: data cannot be in the cache if the page is not in memory.

Possible combinations of events in the TLB, virtual memory and cache.
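The table's impossibility rules reduce to two constraints, which can be checked mechanically: a TLB hit implies the page is in memory (page table hit), and a cache hit likewise implies the page is in memory. The function name is hypothetical.

```python
from itertools import product

def possible(tlb, pt, cache):
    """Return whether a (TLB, page table, cache) hit/miss combination
    can occur, per the hierarchy constraints in the table above."""
    if tlb == "hit" and pt == "miss":
        return False   # TLB caches page table entries: hit implies page in memory
    if cache == "hit" and pt == "miss":
        return False   # cached data must come from a page that is in memory
    return True

# Enumerate all eight combinations (the table lists seven; hit/hit/hit
# is the trivially possible common case and is usually omitted).
for combo in product(["hit", "miss"], repeat=3):
    print(combo, "possible" if possible(*combo) else "impossible")
```

Encoding the constraints this way makes it clear the three "impossible" rows are not arbitrary: each violates one of the two implications.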
16. Implementing Protection with Virtual Memory
- The OS takes care of this.
- The hardware must provide at least three capabilities:
  - support at least two modes that indicate whether the running process is a user process or an OS process (also called a kernel, supervisor, or executive process);
  - provide a portion of the CPU state that a user process can read but not write;
  - provide mechanisms whereby the CPU can go from user mode to supervisor mode.
17. A Common Framework for Memory Hierarchies
- Question 1: Where can a block be placed?
- Question 2: How is a block found?
- Question 3: Which block should be replaced on a cache miss?
- Question 4: What happens on a write?
18. The Three Cs
- Compulsory misses (cold-start misses)
- Capacity misses
- Conflict misses (collision misses)
19. Modern Systems: Intel P4 and AMD Opteron
- Very complicated memory systems
20. Some Issues
- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times.
- Design challenge: dealing with this growing disparity.
- Trends:
  - synchronous SRAMs (provide a burst of data)
  - redesign DRAM chips to provide higher bandwidth or processing
  - restructure code to increase locality
  - use prefetching (make the cache visible to the ISA)