Title: Chapter Seven Large and Fast: Exploiting Memory Hierarchy (Part II)
1. Chapter Seven: Large and Fast — Exploiting Memory Hierarchy (Part II)
2. Virtual Memory: Motivations
- To allow efficient and safe sharing of memory among multiple programs.
- To remove the programming burden of a small, limited amount of main memory.
3. Virtual Memory
- Main memory can act as a cache for secondary storage (disk).
- Advantages:
  - illusion of having more physical memory
  - program relocation
  - protection
4. Pages: Virtual Memory Blocks
- Page faults: the data is not in memory, so it must be retrieved from disk.
  - The miss penalty is huge, so pages should be fairly large (e.g., 4 KB).
  - Reducing page faults is important (LRU is worth the price).
  - Faults can be handled in software instead of hardware.
  - Write-through is too expensive, so we use write-back.
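The split implied by a 4 KB page can be sketched with a little arithmetic: the low 12 bits of an address (since 2^12 = 4096) are the page offset, and the remaining bits are the virtual page number. The function name below is hypothetical, chosen only for illustration.

```python
PAGE_SIZE = 4096     # 4 KB pages, as on the slide
OFFSET_BITS = 12     # log2(4096): low 12 bits address a byte within the page

def split_virtual_address(va):
    """Split a virtual address into (virtual page number, page offset)."""
    vpn = va >> OFFSET_BITS          # high bits select the page
    offset = va & (PAGE_SIZE - 1)    # low 12 bits select the byte in the page
    return vpn, offset

# Address 0x12345 lies in virtual page 0x12, at offset 0x345 within it.
print(split_virtual_address(0x12345))
```

Because page faults are resolved at page granularity, only the virtual page number participates in translation; the offset passes through unchanged.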
5. Placing a Page and Finding It Again
- We want the ability to use a clever and flexible replacement scheme.
- We want to reduce the page fault rate.
- Fully associative placement serves both purposes.
  - But a full search is impractical, so we locate pages with a full table that indexes the memory → the page table (resides in memory).
- Each program has its own page table, which maps the virtual address space of that program to main memory.
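The per-process page table can be sketched as a plain mapping from virtual page number to physical page number; a missing entry plays the role of a cleared valid bit. Names (`translate`, `PageFault`) are hypothetical, for illustration only.

```python
OFFSET_BITS = 12  # 4 KB pages

class PageFault(Exception):
    """Raised when the valid bit for a virtual page is off."""

def translate(page_table, va):
    """Translate a virtual address through a per-process page table.
    page_table maps virtual page number -> physical page number."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    if vpn not in page_table:          # valid bit off: page not in memory
        raise PageFault(vpn)
    # Concatenate the physical page number with the untouched offset.
    return (page_table[vpn] << OFFSET_BITS) | offset

pt = {0x12: 0x7}                       # virtual page 0x12 -> physical page 0x7
print(hex(translate(pt, 0x12345)))     # physical address 0x7345
```

Note that placement is fully associative: any virtual page may map to any physical page, and the table lookup replaces the impractical full search.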
6. Page Table
[Figure: address translation through the page table. The page table register points to the page table in memory. The 32-bit virtual address is split into a 20-bit virtual page number (bits 31-12) and a 12-bit page offset (bits 11-0); the virtual page number indexes the page table, which supplies an 18-bit physical page number (bits 29-12) that is concatenated with the offset to form the physical address.]
7. Process
- The page table, together with the program counter and the registers, specifies the state of a program.
- If we want to allow another program to use the CPU, we must save this state.
- We often refer to this state as a process.
- A process is considered active when it is in possession of the CPU.
8. Dealing with Page Faults
- When the valid bit for a virtual page is off, a page fault occurs.
- The operating system takes over, and the transfer is done with the exception mechanism.
- The OS must find the page in the next level of the hierarchy and decide where to place the requested page in main memory.
- The LRU policy is often used.
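The fault-handling steps above can be sketched as a toy main memory with a fixed number of page frames and true LRU replacement. This is a sketch under simplifying assumptions (real systems approximate LRU with reference bits); all names are hypothetical.

```python
from collections import OrderedDict

class Memory:
    """Toy main memory holding num_frames physical pages,
    replaced in least-recently-used order on a page fault."""
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = OrderedDict()      # virtual page -> page contents

    def access(self, vpn):
        if vpn in self.frames:           # valid bit on: no fault
            self.frames.move_to_end(vpn) # record the use for LRU ordering
            return "hit"
        # Page fault: the OS takes over via the exception mechanism.
        if len(self.frames) == self.num_frames:
            self.frames.popitem(last=False)   # evict the LRU page
        self.frames[vpn] = "page from disk"   # fetch from the next level
        return "fault"

mem = Memory(num_frames=2)
print([mem.access(p) for p in [1, 2, 1, 3, 2]])
# pages 1 and 2 fault cold, 1 hits, 3 evicts 2, so 2 faults again
```

The huge disk miss penalty is what justifies paying for LRU here: avoiding even one extra fault dwarfs the bookkeeping cost.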
9. Page Tables
10. What About Writes?
- A write-back scheme is used because write-through takes too much time.
  - Also known as copy-back.
- To determine whether a page needs to be copied back when we choose to replace it, a dirty bit is added to the page table.
  - The dirty bit is set when any word in the page is written.
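The dirty-bit protocol is small enough to sketch directly: set the bit on any write, and at replacement time copy the page back only if the bit is set. Class and function names here are illustrative, not from the slides.

```python
class Frame:
    """A resident page with its dirty bit (one bit per page table entry)."""
    def __init__(self, vpn):
        self.vpn = vpn
        self.dirty = False     # clean until any word in the page is written

def write_word(frame):
    frame.dirty = True         # set when any word in the page is written

def replace(frame, disk):
    """Write-back: copy the page to disk only if it was modified."""
    if frame.dirty:
        disk[frame.vpn] = "updated page"

disk = {}
f = Frame(vpn=5)
replace(f, disk)       # clean page: nothing is written back
write_word(f)
replace(f, disk)       # dirty page: copied back to disk
print(disk)
```

Skipping the copy for clean pages is the whole payoff: a page that was only read never costs a disk write at eviction.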
11. Making Address Translation Fast
- A cache for address translations: the translation-lookaside buffer (TLB).
[Figure: the TLB holds recent translations of virtual pages to physical memory; full page table entries hold a physical page or disk address, with unmapped pages kept in disk storage.]
12. Typical Values for a TLB
- TLB (also known as a translation cache) size: 16-512 entries
- Block size: 1-2 page table entries (typically 4-8 bytes each)
- Hit time: 0.5-1 clock cycle
- Miss penalty: 10-100 clock cycles
- Miss rate: 0.01%-1%
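A TLB is just a tiny, fast cache of page table entries, so it can be sketched as a small bounded mapping. This toy version is fully associative with oldest-entry eviction (real TLBs use various replacement schemes); the class and method names are hypothetical.

```python
class TLB:
    """Tiny fully associative translation cache. Real TLBs hold
    16-512 entries (slide values); this sketch defaults to 4."""
    def __init__(self, size=4):
        self.size = size
        self.entries = {}                 # vpn -> ppn (a cached PTE)

    def lookup(self, vpn):
        return self.entries.get(vpn)      # None means a TLB miss

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.size:
            # Evict the oldest entry to make room (simplistic policy).
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = ppn

tlb = TLB(size=2)
print(tlb.lookup(3))      # miss: fall back to walking the page table
tlb.insert(3, 7)          # cache the translation after the walk
print(tlb.lookup(3))      # hit: translation in ~0.5-1 clock cycle
```

The asymmetry in the slide's numbers is the point: a hit costs well under a cycle, a miss costs 10-100 cycles, so even a 1% miss rate keeps the average translation cheap.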
13. Integrating VM, TLBs and Caches
[Figure: the virtual page number is compared against the TLB tags (each entry carrying valid and dirty bits); on a TLB hit, the physical page number is concatenated with the page offset to form the physical address, whose tag and index fields are then used to access the cache.]
14. TLBs and Caches
15. Overall Operation of a Memory Hierarchy

TLB   Page Table  Cache  Possible? If so, under what circumstances?
Hit   Hit         Miss   Possible: the translation is in the TLB, but the data misses in the cache.
Miss  Hit         Hit    Possible: TLB misses, but the entry is found in the page table; after retry, the data is found in the cache.
Miss  Hit         Miss   Possible: TLB misses, but the entry is found in the page table; after retry, the data misses in the cache.
Miss  Miss        Miss   Possible: TLB misses and is followed by a page fault; after retry, the data must miss in the cache.
Hit   Miss        Miss   Impossible: the TLB cannot hold a translation for a page that is not in memory.
Hit   Miss        Hit    Impossible: the TLB cannot hold a translation for a page that is not in memory.
Miss  Miss        Hit    Impossible: data cannot be in the cache if the page is not in memory.

Possible combinations of events in the TLB, virtual memory and cache.
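The table's impossibility rules reduce to two constraints, which can be checked mechanically: a TLB hit implies the page is in memory (page table hit), and a cache hit likewise implies the page is in memory. The function name is hypothetical.

```python
from itertools import product

def possible(tlb, pt, cache):
    """Return whether a (TLB, page table, cache) hit/miss combination
    can occur, per the hierarchy constraints in the table above."""
    if tlb == "hit" and pt == "miss":
        return False   # TLB caches page table entries: hit implies page in memory
    if cache == "hit" and pt == "miss":
        return False   # cached data must come from a page that is in memory
    return True

# Enumerate all eight combinations (the table lists seven; hit/hit/hit
# is the trivially possible common case and is usually omitted).
for combo in product(["hit", "miss"], repeat=3):
    print(combo, "possible" if possible(*combo) else "impossible")
```

Encoding the constraints this way makes it clear the three "impossible" rows are not arbitrary: each violates one of the two implications.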
16. Implementing Protection with Virtual Memory
- The OS takes care of this.
- The hardware must provide at least three capabilities:
  - support at least two modes that indicate whether the running process is a user process or an OS process (also called a kernel, supervisor, or executive process);
  - provide a portion of the CPU state that a user process can read but not write;
  - provide mechanisms whereby the CPU can go from user mode to supervisor mode.
17. A Common Framework for Memory Hierarchies
- Question 1: Where can a block be placed?
- Question 2: How is a block found?
- Question 3: Which block should be replaced on a cache miss?
- Question 4: What happens on a write?
18. The Three Cs
- Compulsory misses (cold-start misses)
- Capacity misses
- Conflict misses (collision misses)
19. Modern Systems: Intel P4 and AMD Opteron
- Very complicated memory systems
20. Some Issues
- Processor speeds continue to increase very fast, much faster than either DRAM or disk access times.
- Design challenge: dealing with this growing disparity.
- Trends:
  - synchronous SRAMs (provide a burst of data)
  - redesign DRAM chips to provide higher bandwidth or processing
  - restructure code to increase locality
  - use prefetching (make the cache visible to the ISA)