Execution Replay for Multiprocessor Virtual Machines presentation

1
Execution Replay for Multiprocessor Virtual Machines

George W. Dunlap
Dominic Lucchetti
Michael A. Fetterman
Peter M. Chen

2
Big ideas

Detection and replay of memory races is possible
on commodity hardware
Overhead high for some workloads
but surprisingly low for other workloads

3
Execution Replay
CPU
Interrupts
Network
Memory
Keyboard, mouse
Disk
4
Uses of Execution Replay

Reconstructing state
Fault tolerance
Reconstructing execution
Debugging
Realistic trace generation
Both
Intrusion analysis

5
Single-processor Replay

Basic principles well understood
Log all non-deterministic inputs
Timing of asynchronous events
Minimal overhead (Dunlap02)
13% worst case
Log for months or years
Available commercially
VMware Record/Replay
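
A minimal sketch of the "log all non-deterministic inputs" principle, assuming a toy program whose only source of non-determinism is a value read from outside. The names `Recorder`, `Replayer`, and `run` are hypothetical and are not taken from ReVirt; the timing of asynchronous events is not modeled here.

```python
import random

class Recorder:
    """Live run: fetch each non-deterministic input and append it to a log."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def read_input(self):
        value = self.source()
        self.log.append(value)
        return value

class Replayer:
    """Replay run: hand back the logged inputs in their original order."""
    def __init__(self, log):
        self._inputs = iter(log)

    def read_input(self):
        return next(self._inputs)

def run(env, steps=5):
    """A deterministic computation whose result depends only on its inputs."""
    total = 0
    for _ in range(steps):
        total = total * 31 + env.read_input()
    return total

recorder = Recorder(lambda: random.randint(0, 255))
original = run(recorder)
replayed = run(Replayer(recorder.log))
```

Because `run` is deterministic apart from `read_input`, feeding it the logged values reproduces the original result.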

6
Replay for Multiprocessors

Memory races in multiprocessor VMs
The Ordering Requirement
The CREW Protocol
Implementing with page protections
Relation to the Ordering Requirement
Generating constraints from CREW events
DMA-capable devices and CREW
Performance

7
The Multiprocessor Challenge

Interleaved reads and writes
Fine-grained non-determinism
Much more difficult
Existing solutions
Hardware modification
Software instrumentation
SMP-ReVirt
Hardware MMU to detect sharing

8
Multiprocessor Replay
[Figure: P1 and P2 racing on a shared variable n in memory; labels: n=5, n=3, if (n<4)]
9
Ordering Memory Accesses

Preserving order will reproduce execution
a→b: a happens-before b
Ordering is transitive: a→b, b→c means a→c
Two instructions must be ordered if
they both access the same memory, and
one of them is a write
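
The ordering rule on this slide can be sketched as a small predicate. The `Access` record and its field names are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    cpu: int          # processor issuing the access
    address: int      # memory location touched
    is_write: bool    # True for a write, False for a read

def must_be_ordered(a: Access, b: Access) -> bool:
    """True if both accesses touch the same memory and at least one writes."""
    return a.address == b.address and (a.is_write or b.is_write)

read_read = must_be_ordered(Access(0, 0x1000, False), Access(1, 0x1000, False))
write_read = must_be_ordered(Access(0, 0x1000, True), Access(1, 0x1000, False))
different_memory = must_be_ordered(Access(0, 0x1000, True), Access(1, 0x2000, True))
```

Two reads of the same location need no ordering; a write paired with any access to the same location does.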

10
Constraints: Enforcing order

To guarantee a→d:
a→d
b→d
a→c
b→c
Suppose we need b→c
b→c is necessary
a→d is redundant

[Figure: instructions a, b, c, d on processors P1 and P2; one constraint is labeled "overconstrained"]
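
A small sketch of why a→d becomes redundant once b→c is logged. It assumes the program order a→b on P1 and c→d on P2, which the transcript does not state explicitly; `happens_before` is a hypothetical helper that follows constraints transitively.

```python
def happens_before(edges, src, dst):
    """True if src -> dst follows from the given edges by transitivity."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False

# Assumed program order: a then b on P1, c then d on P2.
program_order = {("a", "b"), ("c", "d")}
candidates = [("a", "d"), ("b", "d"), ("a", "c"), ("b", "c")]

# Does each candidate constraint, on its own, guarantee a -> d?
guarantees_a_d = {
    c: happens_before(program_order | {c}, "a", "d") for c in candidates
}

# With b -> c logged, a -> d is already implied, so logging it adds nothing.
a_d_redundant = happens_before(program_order | {("b", "c")}, "a", "d")

# The reverse does not hold: a -> d alone does not give b -> c.
b_c_from_a_d = happens_before(program_order | {("a", "d")}, "b", "c")
```

Under that assumed program order, any of the four constraints yields a→d, but only b→c also covers the case where b→c itself is required.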
11
CREW Protocol

Each shared object is in one of two states:
Concurrent-Read: all processors can read, none can write
Exclusive-Write: one processor (the owner) can read and write; others have no access

12
CREW protocol, cont

Enforced with hardware MMU
Read/write
Read-only
None
Change CREW states on demand
Fault, fixup, re-execute
CREW event: increasing or reducing permission due to CREW state changes
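
The fault, fixup, re-execute cycle can be illustrated with a toy model of a single page. This is a sketch only: `CrewPage`, `Perm`, and the event tuples are hypothetical names, and a real hypervisor would change page-table permissions rather than a Python list.

```python
from enum import Enum

class Perm(Enum):
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2

class CrewPage:
    """Toy model of one shared page under a CREW-style protocol."""

    def __init__(self, num_cpus):
        # Concurrent-read: every processor may read, none may write.
        self.perms = [Perm.READ_ONLY] * num_cpus
        self.events = []  # CREW events as (cpu, old permission, new permission)

    def _set(self, cpu, new):
        old = self.perms[cpu]
        if old != new:
            self.perms[cpu] = new
            self.events.append((cpu, old, new))

    def access(self, cpu, is_write):
        """Return True if the access faulted and permissions were fixed up."""
        needed = Perm.READ_WRITE if is_write else Perm.READ_ONLY
        if self.perms[cpu].value >= needed.value:
            return False
        # Fault: change the other processors first, then grant the
        # faulting processor what it needs so it can re-execute.
        for_others = Perm.NONE if is_write else Perm.READ_ONLY
        for other in range(len(self.perms)):
            if other != cpu:
                self._set(other, for_others)
        self._set(cpu, needed)
        return True

page = CrewPage(num_cpus=2)
faults = [
    page.access(0, is_write=False),  # read in concurrent-read state
    page.access(1, is_write=True),   # write forces exclusive-write, owner 1
    page.access(0, is_write=False),  # read by non-owner forces concurrent-read
]
```

Each entry in `page.events` corresponds to one permission increase or reduction, which is what the slide calls a CREW event.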

13
CREW Property

If two instructions on different processors
access the same page,
and one of them is a write,
there will be a CREW event on each processor
between them.

14
Generating Constraints

State: Concurrent-Read
All processors read-only
d: CREW fault
New state: P2 Exclusive
r: privilege reduction
Read to None
i: privilege increase
Read to Read/write
Log timing of r and i
Constraint:
r → i

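[Figure: timelines for P1 and P2 showing accesses a and d, the CREW fault at d, the privilege reduction r, and the privilege increase i]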
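
A sketch of how a logged r → i constraint might be enforced during replay. How the timing of r and i is represented is an assumption here: each point is modeled as a processor number plus a hypothetical per-processor progress counter. `Point`, `log_constraint`, and `may_proceed` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    cpu: int
    count: int  # hypothetical per-processor progress counter

def log_constraint(log, reducer: Point, increaser: Point):
    """Record that the privilege reduction r must precede the increase i."""
    log.append((reducer, increaser))

def may_proceed(log, progress, cpu):
    """During replay: may `cpu` advance past its current point?

    `progress[c]` is how far processor c has replayed so far.
    """
    for r, i in log:
        waiting_here = i.cpu == cpu and i.count == progress[cpu]
        if waiting_here and progress[r.cpu] < r.count:
            return False  # the reducing processor has not reached r yet
    return True

log = []
# P1 (cpu 0) loses read access at its point 120; P2 (cpu 1) gains
# read/write access at its point 75.
log_constraint(log, reducer=Point(0, 120), increaser=Point(1, 75))

blocked = may_proceed(log, {0: 100, 1: 75}, cpu=1)
released = may_proceed(log, {0: 120, 1: 75}, cpu=1)
unconstrained = may_proceed(log, {0: 100, 1: 75}, cpu=0)
```

P2 is held at point i until P1 has passed point r; P1 itself is never delayed by this constraint.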
15
Direct Memory Access

Device accesses memory directly
Logically another processor
Reads and writes need to be ordered
IOMMU can't fault/fixup/re-execute
Observation: transaction model
Device: non-preemptible actor

16
Prototype: SMP-ReVirt

Modified Xen hypervisor
Implement logging, CREW protocol
Details in paper

17
Evaluation questions

What is the overhead?
What affects performance?
In paper
When might I want to use MP?
Log with 1, 2, or N cpus?

18
Evaluation Workloads

SPLASH2 parallel application suite
FMM, LU, ocean, radix, water-spatial, radiosity
Kernel-build
Dbench

19
Predicting results

Key changes in sharing attributes
4096-byte sharing granularity
Miss is very expensive
SPLASH2
Good: high spatial locality / low false sharing
Bad: random access patterns / high false sharing
The Linux kernel
Tuned to 16-byte cacheline
Involving the kernel may be expensive
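
To illustrate why page-granularity tracking can introduce false sharing, the sketch below checks whether two addresses fall into the same block at the two granularities quoted on this slide. The addresses and the helper `same_unit` are hypothetical.

```python
PAGE_SIZE = 4096   # sharing granularity quoted on the slide
CACHE_LINE = 16    # cache-line size quoted on the slide

def same_unit(addr_a, addr_b, granularity):
    """True if both addresses fall in the same granularity-sized block."""
    return addr_a // granularity == addr_b // granularity

# Two hypothetical variables placed 64 bytes apart.
a, b = 0x2000, 0x2040
shares_line = same_unit(a, b, CACHE_LINE)
shares_page = same_unit(a, b, PAGE_SIZE)
```

Data laid out to avoid sharing a cache line can still share a page, so each processor's writes would trigger CREW faults even though the variables are independent.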

20
Single-processor Xen guests
21
Log Growth Rate
Workload        Log growth (GB/day)   Days to fill 300GB
FMM             0.234                 1280
LU              0.237                 1261
Ocean           0.232                 1295
Radix           0.292                 1025
Water-spatial   0.232                 1296
Kernel-build    0.564                 531
Radiosity       0.231                 1295
Dbench          0.557                 538
22
2-processor Xen guests
23
2-processor, cont
24
Log Growth Rate
Workload        Log growth (GB/day)   Days to fill 300GB
FMM             34.5                  8.7
LU              3.2                   92.7
Ocean           4.3                   69.1
Radix           39.8                  7.5
Water-spatial   36.3                  8.25
Kernel-build    43.3                  6.9
Radiosity       88.4                  3.4
Dbench          77.0                  3.9
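
The "days to fill" column is the disk capacity divided by the daily log growth. A short sketch, using three rows from the 2-processor table; `days_to_fill` is a hypothetical helper name.

```python
DISK_GB = 300

def days_to_fill(growth_gb_per_day, capacity_gb=DISK_GB):
    """Days until a log growing at the given rate fills the disk."""
    return capacity_gb / growth_gb_per_day

two_cpu_rates = {"FMM": 34.5, "Radiosity": 88.4, "Dbench": 77.0}
days = {name: round(days_to_fill(rate), 1) for name, rate in two_cpu_rates.items()}
```

Some rows differ slightly when recomputed this way (for example, 300 / 3.2 is about 93.8 rather than 92.7 for LU), presumably because the growth rates shown are rounded.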
25
4-processor Xen guests
26
Recap

Memory races in multiprocessor VMs
The Ordering Requirement
The CREW Protocol
Implementing with page protections
Relation to the Ordering Requirement
Generating constraints from CREW events
DMA-capable devices and CREW
Performance

27
Big ideas

Detection and replay of memory races is possible
on commodity hardware
Overhead high for some workloads
but surprisingly low for other workloads

28
Questions
