Execution Replay for Multiprocessor Virtual Machines - PowerPoint PPT Presentation

1
Execution Replay for Multiprocessor Virtual Machines
  • George W. Dunlap
  • Dominic Lucchetti
  • Michael A. Fetterman
  • Peter M. Chen

2
Big ideas
  • Detection and replay of memory races is possible
    on commodity hardware
  • Overhead is high for some workloads
    but surprisingly low for others

3
Execution Replay
[Diagram: CPU and memory, with inputs from interrupts, network, keyboard/mouse, and disk]
4
Uses of Execution Replay
  • Reconstructing state
    • Fault tolerance
  • Reconstructing execution
    • Debugging
    • Realistic trace generation
  • Both
    • Intrusion analysis

5
Single-processor Replay
  • Basic principles well understood
  • Log all non-deterministic inputs
    • Timing of asynchronous events
  • Minimal overhead (Dunlap02)
    • 13% worst case
  • Log for months or years
  • Available commercially
    • VMware Record/Replay
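To make the single-processor recipe concrete, here is a minimal Python sketch (not ReVirt's or SMP-ReVirt's code): every asynchronous event is logged with the instruction count at which it was delivered, and replay injects the logged events at exactly those counts. `LogEntry`, `execute`, and the random event source are hypothetical; a real VMM must pin down the delivery point in the guest's instruction stream, which this loop only approximates with an index.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class LogEntry:
    icount: int  # instruction count at which the event was delivered
    event: str   # what was delivered (interrupt, input value, ...)

Trace = List[Tuple[int, str]]

def execute(steps: int,
            source: Optional[Callable[[int], Optional[str]]] = None,
            log: Optional[List[LogEntry]] = None) -> Tuple[Trace, List[LogEntry]]:
    """Record mode: pass `source`, a non-deterministic event generator.
    Replay mode: pass `log`; each event is injected at its logged icount."""
    trace: Trace = []             # the execution we want to reproduce
    new_log: List[LogEntry] = []
    pending = iter(log or [])
    nxt = next(pending, None)
    for icount in range(steps):
        if source is not None:    # recording: ask the outside world
            ev = source(icount)
            if ev is not None:
                new_log.append(LogEntry(icount, ev))
                trace.append((icount, ev))
        else:                     # replaying: consult only the log
            while nxt is not None and nxt.icount == icount:
                trace.append((icount, nxt.event))
                nxt = next(pending, None)
    return trace, new_log

rng = random.Random()  # unseeded: stands in for real-world timing

def noisy(icount: int) -> Optional[str]:
    return f"irq{rng.randint(0, 9)}" if rng.random() < 0.1 else None

recorded, log = execute(1000, source=noisy)
replayed, _ = execute(1000, log=log)
assert replayed == recorded  # same events at the same points
```

The design choice worth noting is that replay never consults the event source: the log alone determines when and what is delivered, which is what makes the run repeatable.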

6
Replay for Multiprocessors
  • Memory races in multiprocessor VMs
  • The Ordering Requirement
  • The CREW Protocol
  • Implementing with page protections
  • Relation to the Ordering Requirement
  • Generating constraints from CREW events
  • DMA-capable devices and CREW
  • Performance

7
The Multiprocessor Challenge
  • Interleaved reads and writes
    • Fine-grained non-determinism
    • Much more difficult
  • Existing solutions
    • Hardware modification
    • Software instrumentation
  • SMP-ReVirt
    • Hardware MMU to detect sharing

8
Multiprocessor Replay
[Diagram: processors P1 and P2 share memory; the values n=5 and n=3 race with the test if (n < 4)]
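The race in the diagram can be illustrated with a toy model, assuming (hypothetically) that n starts at 3, P2 stores n = 5, and P1 evaluates if (n < 4). The branch P1 takes depends only on which access reaches memory first, which is the fine-grained non-determinism that logging external inputs alone does not capture.

```python
from typing import List

def run(order: List[str]) -> bool:
    """Apply one memory operation per processor in the given global order
    and report whether P1's branch was taken."""
    memory = {"n": 3}                # assumed initial value
    taken = False
    for cpu in order:
        if cpu == "P2":
            memory["n"] = 5          # P2: n = 5
        elif cpu == "P1":
            taken = memory["n"] < 4  # P1: if (n < 4)
    return taken

print(run(["P1", "P2"]))  # True: P1 reads n == 3
print(run(["P2", "P1"]))  # False: P1 reads n == 5
```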
9
Ordering Memory Accesses
  • Preserving order will reproduce execution
  • a → b: a happens-before b
  • Ordering is transitive: a → b and b → c imply a → c
  • Two instructions must be ordered if
    • they both access the same memory, and
    • one of them is a write
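The ordering requirement reduces to a conflict test on pairs of accesses. This sketch uses a hypothetical `Access` record keyed by a single address; SMP-ReVirt itself detects sharing at page granularity through the CREW protocol. Pairs on the same processor are already ordered by program order, so only cross-processor conflicts need recorded constraints.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Access:
    cpu: str
    addr: int       # hypothetical: one memory location
    is_write: bool

def must_order(a: Access, b: Access) -> bool:
    """Same location, and at least one of the two accesses is a write."""
    return a.addr == b.addr and (a.is_write or b.is_write)

trace = [Access("P1", 0x10, False), Access("P2", 0x10, False),
         Access("P2", 0x10, True), Access("P1", 0x20, True)]

# Every conflicting pair in the trace
pairs = [(x, y) for x, y in combinations(trace, 2) if must_order(x, y)]
# Only conflicts between different processors need recorded ordering
cross = [(x, y) for x, y in pairs if x.cpu != y.cpu]
```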

10
Constraints: enforcing order
  • To guarantee a → d, any one of these suffices:
    • a → d
    • b → d
    • a → c
    • b → c
  • Suppose we need b → c
    • b → c is necessary
    • a → d is redundant

[Diagram: P1 executes a then b; P2 executes c then d; one arrangement of constraints is labeled "overconstrained"]
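The redundancy argument can be checked mechanically: treat program order and logged constraints as graph edges, and drop a constraint when the remaining edges already imply it by transitivity. This is a generic pruning pass written for illustration; the slides do not say how SMP-ReVirt avoids redundant constraints, and `reachable` and `prune` are hypothetical names.

```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[str, str]

def reachable(edges: List[Edge], src: str, dst: str) -> bool:
    """Depth-first search: does a chain of edges lead from src to dst?"""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

def prune(constraints: List[Edge], program_order: List[Edge]) -> List[Edge]:
    """Drop each constraint that the edges still kept already imply."""
    kept = list(constraints)
    for edge in constraints:
        others = [e for e in kept if e != edge] + program_order
        if reachable(others, *edge):
            kept.remove(edge)
    return kept

program_order = [("a", "b"), ("c", "d")]  # P1: a then b; P2: c then d
print(prune([("b", "c"), ("a", "d")], program_order))  # [('b', 'c')]
```

Pruning one edge at a time against the edges still kept (rather than against the original set) avoids discarding two constraints that each imply only the other.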
11
CREW Protocol
  • Each shared object is in one of two states
    • Concurrent-Read: all processors can read; none
      can write
    • Exclusive-Write: one processor (the owner) can
      read and write; others have no access

12
CREW protocol, cont
  • Enforced with hardware MMU
    • Read/write
    • Read-only
    • None
  • Change CREW states on demand
    • Fault, fixup, re-execute
  • CREW event
    • Increasing or reducing permission due to CREW
      state changes
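A minimal model of the CREW state machine for one page, written as an illustration rather than SMP-ReVirt's code. It assumes that a write fault hands exclusive ownership to the faulting processor and that a read fault on an exclusively owned page returns the page to Concurrent-Read. `access` performs the fault and fixup and returns the resulting CREW events, i.e. the per-processor permission changes; re-execution is left implicit.

```python
from typing import List, Optional, Tuple

RANK = {"none": 0, "r": 1, "rw": 2}  # None < Read-only < Read/write

class CrewPage:
    """One shared page under a minimal CREW model (illustrative only)."""

    def __init__(self, nproc: int):
        self.nproc = nproc
        self.owner: Optional[int] = None  # None means Concurrent-Read

    def perms(self) -> List[str]:
        """Current MMU permission of each processor for this page."""
        if self.owner is None:
            return ["r"] * self.nproc
        return ["rw" if p == self.owner else "none" for p in range(self.nproc)]

    def access(self, cpu: int, write: bool) -> List[Tuple[int, str, str]]:
        """Return the CREW events (cpu, old, new) caused by this access;
        an empty list means the access was already permitted."""
        need = "rw" if write else "r"
        if RANK[self.perms()[cpu]] >= RANK[need]:
            return []
        before = self.perms()                # fault
        self.owner = cpu if write else None  # fixup: new CREW state
        after = self.perms()
        # the faulting instruction would now be re-executed and succeed
        return [(p, old, new)
                for p, (old, new) in enumerate(zip(before, after)) if old != new]

page = CrewPage(2)
print(page.access(0, write=False))  # []
print(page.access(1, write=True))   # [(0, 'r', 'none'), (1, 'r', 'rw')]
```

With processor 0 as P1 and processor 1 as P2, the second call is the Concurrent-Read to P2-exclusive transition: one privilege reduction on P1 and one privilege increase on P2.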

13
CREW Property
  • If two instructions on different processors
  • access the same page,
  • and one of them is a write,
  • there will be a CREW event on each processor
    between them.

14
Generating Constraints
  • State: Concurrent-Read
    • All processors read-only
  • d: CREW fault
    • New state: P2 Exclusive-Write
  • r: privilege reduction
    • Read to None
  • i: privilege increase
    • Read to Read/write
  • Log timing of r and i
  • Constraint
    • r → i

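[Diagram: P1 and P2 timelines with instruction a, the faulting access d, reduction r, increase i, and d re-executed]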
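Turning logged CREW events into replay constraints can be sketched as follows. The slide's case is one reduction r on P1 and one increase i on P2, giving r → i. The generalization here (every reduction in a state change happens-before every increase on another processor) is an assumption for illustration, and `CrewEvent` with its `icount` timing field is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class CrewEvent:
    cpu: str
    icount: int  # hypothetical timing: per-CPU instruction count when logged
    kind: str    # "reduce" or "increase"

Constraint = Tuple[CrewEvent, CrewEvent]

def constraints_for(change: List[CrewEvent]) -> List[Constraint]:
    """For one CREW state change: each reduction happens-before each
    increase on a different processor (assumed generalization)."""
    reductions = [e for e in change if e.kind == "reduce"]
    increases = [e for e in change if e.kind == "increase"]
    return [(r, i) for r in reductions for i in increases if r.cpu != i.cpu]

def may_perform(i: CrewEvent, progress: Dict[str, int],
                constraints: List[Constraint]) -> bool:
    """During replay, an increase may proceed once every reduction it
    depends on has been reached by that reduction's processor."""
    return all(progress[r.cpu] >= r.icount for r, j in constraints if j == i)

r = CrewEvent("P1", 120, "reduce")    # P1: Read -> None
i = CrewEvent("P2", 450, "increase")  # P2: Read -> Read/write
cs = constraints_for([r, i])          # [(r, i)], i.e. r -> i
```

In replay, P2 would stall before i until `may_perform` holds, that is, until P1 has reached the point at which r was logged.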
15
Direct Memory Access
  • Device accesses memory directly
    • Logically another processor
    • Reads and writes need to be ordered
  • IOMMU can't fault/fixup/re-execute
  • Observation: transaction model
    • Device is a non-preemptible actor
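One way to read the transaction model, sketched under assumptions: because the device cannot fault and retry mid-transfer, the hypervisor takes exclusive ownership of every page the transfer touches before it starts, and the transfer then runs to completion as one non-preemptible action. `Memory`, `acquire_exclusive`, and `dma_write` are hypothetical names, and the string event log stands in for the real CREW log.

```python
from typing import Dict, List, Optional

class Memory:
    """Per-page owner: None = Concurrent-Read, else the exclusive owner."""

    def __init__(self, npages: int):
        self.owner: Dict[int, Optional[str]] = {p: None for p in range(npages)}
        self.events: List[str] = []  # ordered CREW event log

    def acquire_exclusive(self, actor: str, page: int) -> None:
        if self.owner[page] != actor:
            prev = self.owner[page] or "readers"
            self.events.append(f"{actor} takes page {page} from {prev}")
            self.owner[page] = actor

def dma_write(mem: Memory, pages: List[int], data: Dict[int, bytes],
              store: Dict[int, bytes]) -> None:
    """Device as a non-preemptible actor: all pages are acquired before
    the transfer, since the device cannot fault mid-transfer and retry."""
    for p in pages:
        mem.acquire_exclusive("dma", p)
    for p in pages:  # the transfer itself runs to completion
        store[p] = data[p]
    mem.events.append(f"dma transaction done ({len(pages)} pages)")

mem = Memory(4)
store: Dict[int, bytes] = {}
mem.acquire_exclusive("cpu0", 2)  # a CPU owns page 2 before the transfer
dma_write(mem, [1, 2], {1: b"a", 2: b"b"}, store)
for line in mem.events:
    print(line)
```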

16
Prototype SMP-ReVirt
  • Modified Xen hypervisor
    • Implements logging and the CREW protocol
  • Details in paper

17
Evaluation questions
  • What is the overhead?
  • What affects performance?
  • In paper:
    • When might I want to use MP?
    • Log with 1, 2, or N CPUs?

18
Evaluation Workloads
  • SPLASH2 parallel application suite
    • FMM, LU, Ocean, Radix, Water-spatial, Radiosity
  • Kernel-build
  • Dbench

19
Predicting results
  • Key changes in sharing attributes
    • 4096-byte sharing granularity
    • Miss is very expensive
  • SPLASH2
    • Good: high spatial locality / low false sharing
    • Bad: random access patterns / high false sharing
  • The Linux kernel
    • Tuned to 16-byte cacheline
    • Involving the kernel may be expensive
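The cost of page-granularity tracking comes from false sharing: two unrelated objects on the same 4096-byte page conflict as far as the MMU is concerned. This sketch (hypothetical helpers `units` and `falsely_shared`, 8-byte objects by default) checks whether two non-overlapping objects share a tracking unit, and compares a 4096-byte page with the 16-byte unit named on the slide.

```python
PAGE_SIZE = 4096

def units(addr: int, size: int, granule: int) -> set:
    """Tracking units (pages, cache lines, ...) covered by [addr, addr+size)."""
    return set(range(addr // granule, (addr + size - 1) // granule + 1))

def falsely_shared(a: int, b: int, size: int = 8,
                   granule: int = PAGE_SIZE) -> bool:
    """Distinct, non-overlapping objects that still share a tracking unit."""
    overlap = a < b + size and b < a + size
    return not overlap and bool(units(a, size, granule) & units(b, size, granule))

print(falsely_shared(0x1000, 0x1040))              # True: same 4096-byte page
print(falsely_shared(0x1000, 0x1040, granule=16))  # False: different 16-byte units
print(falsely_shared(0x1000, 0x2000))              # False: different pages
```

Data laid out to avoid sharing at a small granularity can still be falsely shared at page granularity, which is why involving the kernel may be expensive.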

20
Single-processor Xen guests
21
Log Growth Rate
Workload        Log growth (GB/day)   Days to fill 300 GB
FMM             0.234                 1280
LU              0.237                 1261
Ocean           0.232                 1295
Radix           0.292                 1025
Water-spatial   0.232                 1296
Kernel-build    0.564                 531
Radiosity       0.231                 1295
Dbench          0.557                 538
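The "days to fill 300 GB" column is capacity divided by daily growth. Recomputing it from the rounded rates in this table lands within a few days of the reported values, presumably because the table was computed from unrounded rates.

```python
CAPACITY_GB = 300

def days_to_fill(gb_per_day: float, capacity_gb: float = CAPACITY_GB) -> float:
    """Days until capacity_gb of log space is used at a constant rate."""
    return capacity_gb / gb_per_day

# (log growth in GB/day, days reported on the slide)
single_cpu = {"FMM": (0.234, 1280), "LU": (0.237, 1261),
              "Ocean": (0.232, 1295), "Radix": (0.292, 1025),
              "Water-spatial": (0.232, 1296), "Kernel-build": (0.564, 531),
              "Radiosity": (0.231, 1295), "Dbench": (0.557, 538)}

for name, (rate, reported) in single_cpu.items():
    print(f"{name:14s} computed {days_to_fill(rate):7.1f}  reported {reported}")
```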
22
2-processor Xen guests
23
2-processor, cont
24
Log Growth Rate
Workload        Log growth (GB/day)   Days to fill 300 GB
FMM             34.5                  8.7
LU              3.2                   92.7
Ocean           4.3                   69.1
Radix           39.8                  7.5
Water-spatial   36.3                  8.25
Kernel-build    43.3                  6.9
Radiosity       88.4                  3.4
Dbench          77.0                  3.9
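Dividing each rate in this table by the corresponding rate in the first Log Growth Rate table (which follows the single-processor results) makes the spread between workloads explicit. This is plain arithmetic on the two tables' numbers.

```python
one_cpu = {"FMM": 0.234, "LU": 0.237, "Ocean": 0.232, "Radix": 0.292,
           "Water-spatial": 0.232, "Kernel-build": 0.564,
           "Radiosity": 0.231, "Dbench": 0.557}  # GB/day
two_cpu = {"FMM": 34.5, "LU": 3.2, "Ocean": 4.3, "Radix": 39.8,
           "Water-spatial": 36.3, "Kernel-build": 43.3,
           "Radiosity": 88.4, "Dbench": 77.0}    # GB/day

# How many times faster the log grows with two processors than with one
ratios = {w: two_cpu[w] / one_cpu[w] for w in one_cpu}
for workload, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{workload:14s} {ratio:6.1f}x")
```

LU and Ocean grow their logs roughly 13 to 19 times faster with two processors, while Radiosity grows about 380 times faster.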
25
4-processor Xen guests
26
Recap
  • Memory races in multiprocessor VMs
  • The Ordering Requirement
  • The CREW Protocol
  • Implementing with page protections
  • Relation to the Ordering Requirement
  • Generating constraints from CREW events
  • DMA-capable devices and CREW
  • Performance

27
Big ideas
  • Detection and replay of memory races is possible
    on commodity hardware
  • Overhead is high for some workloads
    but surprisingly low for others

28
Questions