Theory of Memory - PowerPoint PPT Presentation



1
Theory of Memory
  • W. Paul
  • Saarland University and DFKI
  • BMBF project Verisoft-XT
  • joint work with
    • Ulan Degenbaev and Norbert Schirmer
    • Saarland University

2
Why might this be important?
  • Unites theories of
    • store buffers
    • interlocking
    • caches
    • cache coherence
    • out-of-order execution
    • X64 instruction set
    • address translation
    • optimized compilation
    • structured parallel C semantics
  • Explains why a hypervisor might run structured parallel C
  • VCC is supposed to mirror structured parallel C semantics
  • thus VCC might be(come) sound

3
Specifying Memory
[Figure: memory M; address x holds content M(x)]
4
Store Buffer
[Figure: memory M with store buffer sbuf(y); write w(i), read r(j)]
5
Store Buffer
[Figure: memory M with store buffer sbuf(y); write w(i), read r(j)]
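The store buffer slides show writes w(i) entering sbuf before they reach memory M, with reads r(j) consulting the buffer. A minimal Python sketch, assuming a FIFO buffer with store-to-load forwarding (the usual x86 behaviour); the class `StoreBuffer` and its methods are hypothetical names. The litmus test at the end shows why a store buffer is visible: both reads can return 0, which no sequentially consistent interleaving allows.

```python
from collections import deque


class StoreBuffer:
    """Hypothetical FIFO store buffer of one processor in front of shared memory M."""

    def __init__(self, memory):
        self.memory = memory      # shared memory M: dict address -> value
        self.buf = deque()        # pending writes (address, value), oldest first

    def write(self, addr, value):
        # w(i): the write is only buffered; M is not yet updated
        self.buf.append((addr, value))

    def read(self, addr):
        # r(j): forward from the youngest buffered write to addr, if any
        for a, v in reversed(self.buf):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain_one(self):
        # commit the oldest buffered write to M
        addr, value = self.buf.popleft()
        self.memory[addr] = value


# Store-buffering litmus test: each processor writes its flag, then reads the other's.
M = {}
p0, p1 = StoreBuffer(M), StoreBuffer(M)
p0.write("x", 1)
p1.write("y", 1)
r0 = p0.read("y")   # p1's write still sits in p1's buffer
r1 = p1.read("x")   # p0's write still sits in p0's buffer
print(r0, r1)       # 0 0: impossible under sequential consistency
```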
6
Caches
[Figure: memory M with cache ca]
7
Many Caches Snooping
[Figure: memory M with snooping caches ca(1), ..., ca(p)]
8
Many Caches
M
x.la
x.off
ca(1)
ca(p)
9
Many Caches
M
x.la
x.off
ca(1)
ca(p)
10
Many Caches
M
x.off
ca(1)
ca(p)
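The "Many Caches" slides split an address x into x.la and x.off. Assuming x.la is the cache-line address and x.off the byte offset within a line of 2^k bytes (our reading of the labels, not stated on the slides), the split and its inverse can be sketched as follows; all function names are hypothetical. Two addresses with the same x.la land in the same cache line.

```python
def split_address(x, k):
    """Split address x into (x.la, x.off) for cache lines of 2**k bytes (assumed reading)."""
    la = x >> k                 # line address: which cache line holds x
    off = x & ((1 << k) - 1)    # offset of x inside that line
    return la, off


def join_address(la, off, k):
    """Inverse of split_address."""
    return (la << k) | off


la, off = split_address(0x1234, 6)   # 64-byte lines
print(hex(la), off)                  # 0x48 52
```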
11
Overlapping Transactions
[Figure: overlapping transactions a, b, c; label "public (a)"]
12
Sequentially Consistent Memory (Lemma 5)
[Figure: overlapping transactions a, b, c; label "public (a)"]
13
Tomasulo Schedulers for OOO
[Figure: Tomasulo scheduler with IF, issue, reservation stations, functional units, CDB, ROB, WB]
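In a Tomasulo scheduler with a reorder buffer, results may arrive on the CDB out of order, but the ROB writes them back (WB) in program order. A minimal sketch of that retirement discipline, assuming a simplified ROB that tracks only a completion flag and a result per entry; `ReorderBuffer`, `issue`, `complete` and `retire` are hypothetical names, and reservation stations and operand forwarding are left out.

```python
from collections import deque


class ReorderBuffer:
    """Simplified ROB: entries are allocated at issue and retired in program order."""

    def __init__(self):
        self.entries = deque()   # each entry: dict with tag, dest, done, value
        self.next_tag = 0

    def issue(self, dest):
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "done": False, "value": None})
        return tag

    def complete(self, tag, value):
        # result broadcast on the CDB: may arrive in any order
        for e in self.entries:
            if e["tag"] == tag:
                e["done"], e["value"] = True, value

    def retire(self, regs):
        # WB: write back completed entries at the head only, in program order
        retired = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            regs[e["dest"]] = e["value"]
            retired.append(e["tag"])
        return retired


rob, regs = ReorderBuffer(), {}
t0 = rob.issue("r1")           # e.g. a slow load
t1 = rob.issue("r2")           # e.g. a fast add
rob.complete(t1, 7)            # the add finishes first
print(rob.retire(regs))        # []: r2 waits behind the unfinished load
rob.complete(t0, 3)
print(rob.retire(regs), regs)  # [0, 1] {'r1': 3, 'r2': 7}
```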
14
Two Memory Units
[Figure: memory m; reservation stations RS, sbuf, MMU, functional units, LS, CDB, ROB]
15
Single Processor OOO correctness (Lemma 6)
[Figure: memory m; reservation stations RS, sbuf, MMU, functional units, LS, CDB, ROB]
16
Multi Processor OOO implementation
[Figure: memory m; reservation stations RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB]
17
Multi Processor OOO correctness (Lemma 7)
[Figure: memory m; reservation stations RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB]
18
Multi Processor OOO correctness (Lemma 7)
[Figure: memory m; reservation stations RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB]
19
X64 architecture
  • CPU core
    • R: user registers
    • SR: system registers
    • CR3
    • acc: access
    • segmentation
    • mmu: memory management unit
    • tlb: translation lookaside buffer
  • memory system
    • mm: main memory
    • ca: cache
    • sbuf: store buffer

[Figure: block diagram of core, R, segmentation, CR3, acc, mmu, tlb, sbuf, ca, mm]
20
Segmentation off (Lemma 8)
  • a single segment
  • as large as the entire address space
  • segmentation is invisible

[Figure: block diagram of core, R, segmentation, CR3, acc, mmu, tlb, sbuf, ca, mm]
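Why one segment as large as the entire address space makes segmentation invisible: a segmented access forms base + effective address and checks the effective address against the segment limit, so base 0 with a maximal limit turns the map into the identity. A minimal sketch; the 48-bit width and the function name `linear_address` are illustrative assumptions.

```python
ADDRESS_BITS = 48                           # illustrative width, an assumption


def linear_address(ea, base, limit):
    """Segmented address translation: base + effective address, with a limit check."""
    if ea > limit:
        # real hardware raises a protection fault here; ValueError is a stand-in
        raise ValueError("segment limit violation")
    return (base + ea) % (1 << ADDRESS_BITS)


# One segment, base 0, as large as the entire address space: the map is the identity.
FLAT_LIMIT = (1 << ADDRESS_BITS) - 1
for ea in (0, 0x1234, FLAT_LIMIT):
    assert linear_address(ea, base=0, limit=FLAT_LIMIT) == ea
```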
21
Bad news: cache state is visible
  • CPU core
  • acc: access
    • acc.adr: address
    • acc.r: rights (user, write, exe)
    • acc.data
    • acc.mmode: memory mode
      • WB: write back
      • WT: write through ...
      • NC: no cache

[Figure: block diagram of core, R, CR3, acc, mmu, tlb, sbuf, ca, and mm or devices]
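One way to see that cache state is visible: after a write in WB mode leaves a dirty line in the cache, a later NC (no cache) access to the same address bypasses the cache and reads the stale value in main memory. A minimal sketch with a single hypothetical cache, modelling only WB and NC modes (WT and devices are left out); the class and method names are illustrative.

```python
class Cache:
    """Hypothetical single write-back cache in front of main memory mm (WB and NC only)."""

    def __init__(self, mm):
        self.mm = mm
        self.lines = {}   # address -> (value, dirty)

    def access(self, mode, addr, write=False, value=None):
        if mode == "NC":                       # no cache: bypass ca entirely
            if write:
                self.mm[addr] = value
                return None
            return self.mm.get(addr, 0)
        # mode == "WB": allocate on miss, keep writes in the cache
        if addr not in self.lines:
            self.lines[addr] = (self.mm.get(addr, 0), False)
        if write:
            self.lines[addr] = (value, True)   # dirty line: mm is now stale
            return None
        return self.lines[addr][0]


ca = Cache(mm={"x": 0})
ca.access("WB", "x", write=True, value=1)
print(ca.access("WB", "x"))  # 1
print(ca.access("NC", "x"))  # 0: the NC read observes stale main memory
```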
22
Good news: no devices, no NC mode
  • acc.mmode: memory mode
    • WB: write back
    • WT: write through ...
    • NC: no cache (not used)

[Figure: block diagram of core, R, CR3, acc, mmu, tlb, sbuf, ca, mm]
23
Sequentially Consistent Physical Memory (Lemma 9)
  • acc.mmode: memory mode
    • WB: write back
    • WT: write through ...
    • mixed modes on the same address
  • PM: sequentially consistent physical memory abstraction
  • Proof: MOESI invariants are maintained

[Figure: block diagram of core, R, CR3, acc, mmu, tlb, sbuf, PM]
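The proof of Lemma 9 rests on MOESI invariants being maintained. The slides do not list them, so the checker below assumes the textbook per-line MOESI invariants over the states of caches ca(1), ..., ca(p); the function name `moesi_ok` is hypothetical.

```python
# Assumption: textbook MOESI invariants; the slides do not spell out the ones used in the proof.
def moesi_ok(states, data, mem):
    """Check MOESI invariants for one cache line.

    states: one of 'M', 'O', 'E', 'S', 'I' per cache
    data:   cached value per cache (ignored where the state is 'I')
    mem:    value of the line in main memory
    """
    valid = [i for i, s in enumerate(states) if s != "I"]
    exclusive = [i for i in valid if states[i] in ("M", "E")]
    owners = [i for i in valid if states[i] == "O"]
    dirty = any(states[i] in ("M", "O") for i in valid)

    if exclusive and len(valid) > 1:                    # M or E: no other cache holds the line
        return False
    if len(owners) > 1:                                 # at most one owner
        return False
    if len({data[i] for i in valid}) > 1:               # all valid copies agree
        return False
    if valid and not dirty and data[valid[0]] != mem:   # clean copies match memory
        return False
    return True


print(moesi_ok(["O", "S", "I"], [5, 5, 0], mem=3))  # True: the owner holds dirty data
print(moesi_ok(["M", "S", "I"], [5, 5, 0], mem=3))  # False: M must be the only copy
```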
24
Initialize page tables
  • 1 processor
  • sbuf invisible
  • operating mode: paging disabled
  • mmu invisible
  • set up page table tree in PM

[Figure: PM containing page tables; core, R, CR3, acc, mmu, tlb, sbuf]
25
Translated Linear Memory
  • many processors
  • operating mode: paging enabled
  • keep tlb consistent

[Figure: PM containing page tables; core, R, CR3, acc, mmu, tlb, sbuf]
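The last two slides first set up a page table tree in PM and then enable paging, so that the mmu translates linear addresses by walking the tree rooted at CR3. A minimal sketch of the x64 4-level walk for 4 KiB pages only; large pages, access rights beyond the present bit, and accessed/dirty bits are ignored. PM is modelled as a dict from 8-byte-aligned physical addresses to 64-bit entries, and `translate` and `map_page` are hypothetical helpers.

```python
PRESENT = 1
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: physical frame address


def translate(pm, cr3, va):
    """Walk PML4 -> PDPT -> PD -> PT; return the physical address, or None on a page fault."""
    table = cr3 & ADDR_MASK
    for level in range(4):
        shift = 39 - 9 * level                 # 39, 30, 21, 12
        index = (va >> shift) & 0x1FF          # 9 index bits per level
        entry = pm.get(table + 8 * index, 0)
        if not entry & PRESENT:
            return None
        table = entry & ADDR_MASK
    return table | (va & 0xFFF)                # frame address + 12-bit page offset


def map_page(pm, cr3, va, frame, alloc):
    """Set up the page table tree in pm so that va maps to frame; alloc() yields free table frames."""
    table = cr3 & ADDR_MASK
    for level in range(4):
        shift = 39 - 9 * level
        slot = table + 8 * ((va >> shift) & 0x1FF)
        if level == 3:
            pm[slot] = (frame & ADDR_MASK) | PRESENT
        else:
            if not pm.get(slot, 0) & PRESENT:
                pm[slot] = alloc() | PRESENT
            table = pm[slot] & ADDR_MASK


frames = iter(range(0x2000, 0x100000, 0x1000))   # free 4 KiB frames for page tables
pm, cr3 = {}, 0x1000
map_page(pm, cr3, 0x7FFF_1234_5000, 0xABC000, lambda: next(frames))
print(hex(translate(pm, cr3, 0x7FFF_1234_5678)))  # 0xabc678
print(translate(pm, cr3, 0x1000))                 # None
```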
26
Translated Consistent Linear Memory, sbufs (Lemma 10)
  • many processors
  • operating mode: paging enabled
  • keep tlb consistent

[Figure: linear memory LM containing page tables; core, R, CR3, acc, sbuf]
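"Keep tlb consistent" matters because the TLB caches translations: after software changes a page table entry, a processor can keep using the old translation until the stale TLB entry is invalidated (on x64, for example with INVLPG or by reloading CR3). A minimal sketch with a flat, hypothetical page table `pt` mapping virtual page numbers to frame numbers; the class `TLB` is illustrative.

```python
class TLB:
    """Hypothetical TLB caching translations vpn -> frame taken from page table pt."""

    def __init__(self, pt):
        self.pt = pt
        self.entries = {}

    def translate(self, vpn):
        if vpn not in self.entries:          # TLB miss: walk the page table
            self.entries[vpn] = self.pt[vpn]
        return self.entries[vpn]             # TLB hit: no walk

    def invlpg(self, vpn):
        # drop one cached translation (named after the x64 INVLPG instruction)
        self.entries.pop(vpn, None)


pt = {5: 100}
tlb = TLB(pt)
print(tlb.translate(5))  # 100
pt[5] = 200              # the OS remaps page 5
print(tlb.translate(5))  # 100: stale translation still in use
tlb.invlpg(5)
print(tlb.translate(5))  # 200
```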
27
C0: Pascal with C syntax; configurations
  • c = (pr, rd, lms, hm, gm)
    • pr: program rest
    • rd: recursion depth
    • lms: [0 : recursion depth] → local memories
    • hm: heap memory
    • gm: global memory
  • subvariables
    • (m,i)[17].gpr[3]
  • values of pointers are subvariables!

[Figure: memory m; variable (m,i) with base address ba(m,i), size size(m,i), and value va(c,(m,i))]
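The figure places a variable (m,i) at base address ba(m,i) with size size(m,i); a subvariable such as array element 17, field gpr, element 3 of (m,i) lies at an offset computed from the type. A minimal sketch under a hypothetical type encoding (not C0's actual definition): elementary values have size 1, structs are laid out in field order without padding, and the struct `PROC_T`, the array `VAR_T` and the base address 1000 are made up for illustration.

```python
# Hypothetical type encoding (not C0's actual definition):
#   "int"                            elementary type of size 1
#   ("array", elem_type, n)          array of n elements
#   ("struct", [(name, type), ...])  fields laid out in order, no padding

def size(t):
    if t == "int":
        return 1
    if t[0] == "array":
        return t[2] * size(t[1])
    return sum(size(ft) for _, ft in t[1])


def sub_ba(ba, t, path):
    """Base address and type of the subvariable reached from (ba, t) via path."""
    for step in path:
        if t == "int":
            raise TypeError("no subvariables inside an elementary value")
        if t[0] == "array":                 # step is an array index
            ba += step * size(t[1])
            t = t[1]
        else:                               # step is a field name
            for name, ft in t[1]:
                if name == step:
                    break
                ba += size(ft)              # skip the fields before it
            else:
                raise KeyError(step)
            t = ft
    return ba, t


PROC_T = ("struct", [("pc", "int"), ("gpr", ("array", "int", 32))])  # size 33
VAR_T = ("array", PROC_T, 64)

print(size(VAR_T))                          # 2112
print(sub_ba(1000, VAR_T, [17, "gpr", 3]))  # (1565, 'int')
```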
28
Parallel C
  • c = (pr, rd, lms, hm, gm)
    • pr: program rest
    • rd: recursion depth
    • lms: [0 : recursion depth] → local memories
    • hm: heap memory
    • gm: global memory
  • Share
    • gm
    • hm
  • Interleave at small-step semantics steps

[Figure: memory m; variable (m,i) with base address ba(m,i), size size(m,i), and value va(c,(m,i))]
29
Parallel C
  • c = (pr, rd, lms, hm, gm)
    • pr: program rest
    • rd: recursion depth
    • lms: [0 : recursion depth] → local memories
    • hm: heap memory
    • gm: global memory
  • Share
    • gm
    • hm
  • Interleave at small-step semantics steps
  • Problem
    • the processor interleaves instructions of the compiled programs code(p)

[Figure: memory m; variable (m,i) with base address ba(m,i), size size(m,i), and value va(c,(m,i))]
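The problem stated on this slide shows up in the smallest example: two threads each execute x = x + 1 on a shared global x. If C small steps are interleaved, each assignment is one atomic step and x always ends at 2; if the processor interleaves the instructions of the compiled code, a lost update can yield 1. A brute-force sketch enumerating all interleavings, assuming a hypothetical three-instruction compilation (load, add, store); a real compiler may emit different code.

```python
from itertools import permutations


def run(schedule, programs):
    """Execute the programs step by step in the given thread order; return the final x."""
    shared = {"x": 0}
    regs = [dict() for _ in programs]     # private register file per thread
    pcs = [0] * len(programs)
    for t in schedule:
        programs[t][pcs[t]](shared, regs[t])
        pcs[t] += 1
    return shared["x"]


def outcomes(programs):
    """Final values of x over all interleavings of the threads' steps."""
    steps = [t for t, prog in enumerate(programs) for _ in prog]
    return {run(s, programs) for s in set(permutations(steps))}


# C level: the assignment x = x + 1 is a single small step
c_step = [lambda m, r: m.update(x=m["x"] + 1)]
# Instruction level (assumed compilation): load, add, store
isa = [
    lambda m, r: r.update(acc=m["x"]),        # load
    lambda m, r: r.update(acc=r["acc"] + 1),  # add
    lambda m, r: m.update(x=r["acc"]),        # store
]

print(outcomes([c_step, c_step]))  # {2}
print(outcomes([isa, isa]))        # {1, 2}
```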
30
simulation relation consis(c, alloc, d)
[Figure: linear memory LM; variable y at address alloc(c,y) and variable p at address alloc(c,p)]
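The figure for consis(c, alloc, d) shows C variables y and p allocated in linear memory LM at alloc(c,y) and alloc(c,p); we read it as p pointing to y. A minimal sketch of the memory part of such a simulation relation, under a hypothetical flat representation: each C variable has a value, pointer values are names of variables (subvariables in the slides' terms), and the machine's LM is a dict from addresses to values. Null pointers and nested subvariables are left out.

```python
def consis(c_vals, c_ptrs, alloc, lm):
    """Hypothetical memory-consistency check between a C configuration and machine LM.

    c_vals: C variable -> value (for pointers: name of the pointed-to variable)
    c_ptrs: set of variables that are pointers
    alloc:  C variable -> address in linear memory
    lm:     linear memory of the machine, address -> value
    """
    for var, val in c_vals.items():
        expected = alloc[val] if var in c_ptrs else val  # pointers hold addresses
        if lm.get(alloc[var]) != expected:
            return False
    return True


alloc = {"y": 0x100, "p": 0x108}
c_vals = {"y": 42, "p": "y"}     # p points to y
lm = {0x100: 42, 0x108: 0x100}
print(consis(c_vals, {"p"}, alloc, lm))                # True
print(consis(c_vals, {"p"}, alloc, {**lm, 0x108: 0}))  # False: p does not hold alloc(c,y)
```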
31
Non-optimizing compiler: step-by-step simulation

32
Optimizing compiler: simulation between IO-steps

33
IO-steps (1) volatile accesses

34
Volatiles Sequentially Consistent (Lemma 11)

35
Structured Parallel C
  • Implement locks using volatiles
  • IO-steps (2): lock release
  • Run processors alone on locked portions of linear memory
  • Lemma 1: sbufs invisible
  • Lemma 10: ordinary C code in linear memory
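Locks can be implemented using volatiles. The slides do not give the code; one common way, assumed here, is a spinlock built on an atomic test-and-set of a shared volatile word, with the release as a plain volatile store. Python has neither volatile variables nor a hardware test-and-set, so `_hw` (a `threading.Lock`) stands in for the atomicity of every access to that word; the classes `AtomicWord` and `SpinLock` are hypothetical.

```python
import threading


class AtomicWord:
    """Shared volatile word; _hw stands in for hardware atomicity of accesses to it."""

    def __init__(self):
        self.value = 0
        self._hw = threading.Lock()

    def test_and_set(self):
        with self._hw:                      # atomic read-modify-write
            old, self.value = self.value, 1
            return old

    def store(self, v):
        with self._hw:                      # single atomic (volatile) store
            self.value = v


class SpinLock:
    """Hypothetical spinlock built only from accesses to one volatile word."""

    def __init__(self):
        self.word = AtomicWord()

    def acquire(self):
        while self.word.test_and_set() == 1:
            pass                            # spin while another thread holds the lock

    def release(self):
        self.word.store(0)                  # lock release: an IO-step in the slides' terms


counter = 0
lock = SpinLock()


def worker(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1                        # ordinary code on the locked portion
        lock.release()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

Routing the release through the same atomic word matters: a release store that could interleave with the middle of another thread's test-and-set would be overwritten and leave the lock held forever.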

36
Summary
  • Implement locks using volatiles
  • IO-steps (2): lock release
  • Run processors alone on locked portions of linear memory
  • Lemma 1: sbufs invisible
  • Lemma 10: ordinary C code in linear memory
  • Outlined correctness proof for the implementation of structured parallel C
    • initialisation
    • compilation