Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

1
Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

David F. Bacon
IBM T.J. Watson Research Center
Joint work with C.R. Attanasio, Han Lee,
V.T. Rajan, and Steve Smith
Combines results to appear in PLDI'01 and ECOOP'01

2
The Garbage Collector: What Is It?

Concurrent
Multiprocessor
Reference-counted
Cycle collecting
Low-latency
High-performance

3
Why do it?

RC has good locality properties
Will tracing collection scale to multi-GB heaps?
Effect of rising memory latency, SMT/CMP?
RC is easily decoupled from mutator
Promises lower synchronization costs
Java makes good GC a commercial requirement
Makes sense to have genetic diversity
People said it couldn't be done

4
Why is it hard?

Must defer reference counting of the stack
Complicates algorithm
Reference counts are shared variables
Synchronization required
Write barriers for RC more expensive
Affect two objects
RC doesn't handle cycles
Other systems use backup tracing collector

5
Outline

Introduction
Motivation
System Overview
Concurrent Reference Counting
Cycle Collection
Measurements
Conclusions

6
System Overview

Producer/Consumer System
Similar to Deutsch-Bobrow DRC
As implemented by DeTreville on SRC Firefly

Mutator: Emit inc/dec, Allocate
Collector: Free Memory, Process inc/dec, Collect Cycles
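
The producer/consumer split can be sketched as follows. This is a minimal single-threaded illustration, not the Jalapeño code: the class names (MutationBufferSketch, Mutator, Collector) and the hand-off method are assumptions made for the example. The point it shows is that the mutator only appends operations to a buffer, while the collector alone reads and writes reference counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
final class MutationBufferSketch {
    /** A heap object whose count is touched only by the collector. */
    static final class Obj {
        final String name;
        int rc;
        Obj(String name) { this.name = name; }
    }

    /** One buffered operation: +1 or -1 on a target object. */
    static final class Op {
        final Obj target;
        final int delta;
        Op(Obj target, int delta) { this.target = target; this.delta = delta; }
    }

    /** Producer side: records operations, never updates a count itself. */
    static final class Mutator {
        private List<Op> buffer = new ArrayList<>();
        void inc(Obj o) { buffer.add(new Op(o, +1)); }
        void dec(Obj o) { buffer.add(new Op(o, -1)); }
        /** Hands the filled buffer to the collector and starts a fresh one. */
        List<Op> handOff() {
            List<Op> full = buffer;
            buffer = new ArrayList<>();
            return full;
        }
    }

    /** Consumer side: applies buffered operations and records what it frees. */
    static final class Collector {
        final List<Obj> freed = new ArrayList<>();
        void process(List<Op> ops) {
            // Apply every increment first so no count dips to zero prematurely.
            for (Op op : ops) {
                if (op.delta > 0) op.target.rc++;
            }
            for (Op op : ops) {
                if (op.delta < 0 && --op.target.rc == 0) freed.add(op.target);
            }
        }
    }
}
```

Because the counts have a single writer in this arrangement, the sketch needs no atomic updates on them; only the buffer hand-off would need synchronization in a real multiprocessor setting.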
7
Implementation

Implemented in Jalapeño JVM at IBM TJW
All of VM, JIT, and GC are written in Java
Extended with unsafe primitives
Multiple GC implementations
Runs on IBM RS/6000 multiprocessors
Linux/Intel port underway
GC is machine-independent except for barriers

8
Outline

Introduction
Motivation
System Overview
Concurrent Reference Counting
Cycle Collection
Measurements
Conclusions

9
Concurrent Reference Counting

Time divided into epochs
All CPUs must participate before epoch advances
Write barrier on heap updates
inc/dec operations placed in buffer
Objects allocated with RC=1, dec enqueued
Decrements processed one epoch behind increments
Stack references are deferred
Snapshot stacks at epoch boundary
First increment, then decrement at next epoch
Simpler invariant than Deutsch-Bobrow: no ZCT required
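
The epoch rules above can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the names (EpochRcSketch, endEpoch, writeBarrier) are hypothetical, objects have no fields (so freeing never cascades), and the stack snapshot is passed in as a plain list. It shows allocation with RC=1 plus an enqueued decrement, increments applied in the current epoch, decrements applied one epoch later, and stack references handled as an increment now and a decrement at the next epoch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and structure are assumptions. */
final class EpochRcSketch {
    static final class Obj {
        final String name;
        int rc;
        Obj(String name) { this.name = name; }
    }

    private List<Obj> incs = new ArrayList<>();        // increments logged this epoch
    private List<Obj> decs = new ArrayList<>();        // decrements logged this epoch
    private List<Obj> delayedDecs = new ArrayList<>(); // decrements from the previous epoch

    /** Allocation: the count starts at 1 and a matching decrement is enqueued. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        o.rc = 1;
        decs.add(o);
        return o;
    }

    /** Write barrier for a heap slot update: log an inc for the new target, a dec for the old. */
    void writeBarrier(Obj oldTarget, Obj newTarget) {
        if (newTarget != null) incs.add(newTarget);
        if (oldTarget != null) decs.add(oldTarget);
    }

    /**
     * Epoch boundary. stackSnapshot lists the objects referenced from the
     * stacks at this boundary. Returns the objects whose count reached zero.
     */
    List<Obj> endEpoch(List<Obj> stackSnapshot) {
        // Stack references: increment now, decrement at the next epoch.
        for (Obj o : stackSnapshot) o.rc++;
        // Increments of the current epoch are applied immediately.
        for (Obj o : incs) o.rc++;
        // Decrements are applied one epoch behind the increments.
        List<Obj> freed = new ArrayList<>();
        for (Obj o : delayedDecs) {
            if (--o.rc == 0) freed.add(o);
        }
        // This epoch's decrements, plus the stack decrements, wait one epoch.
        delayedDecs = decs;
        delayedDecs.addAll(stackSnapshot);
        incs = new ArrayList<>();
        decs = new ArrayList<>();
        return freed;
    }
}
```

In this model an object that is only ever held on the stack survives the boundary at which it is seen there and is reclaimed one epoch after it leaves the stack, without any zero-count table.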

10
[Figure: timelines for Collector CPU, CPU 1, and CPU 2]
11
Outline

Introduction
Motivation
System Overview
Concurrent Reference Counting
Cycle Collection
Measurements
Conclusions

12
Synchronous Cycle Collection

Class loader identifies acyclic classes
Arrays, String, and such are marked green
Two key observations
Most reference counts are 1
Garbage cycles created by decrement to non-0
Use those objects as starting points
DFS-based algorithm subtracts internal RCs
If resulting count is 0, collect cyclic garbage
Based on algorithm by Lins, but O(n) instead of O(n^2)
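
The idea of marking acyclic classes green can be illustrated with a small reflection-based check. The rule used here is an assumption for the sake of the example, not necessarily the one the class loader applies: a class is treated as acyclic if every instance field is a scalar, an array of scalars, or a reference to a final class that is itself acyclic. The class name GreenClassSketch and the sample classes are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; the acyclicity rule below is an assumed, conservative one. */
final class GreenClassSketch {
    // Sample classes for the example.
    static final class Point { int x, y; }                    // scalars only
    static final class Segment { Point a, b; int[] tags; }    // final acyclic fields
    static final class Node { Node next; }                    // can reach itself
    static final class Box { Object payload; }                // non-final field type

    /** True if instances of c can never take part in a reference cycle under the assumed rule. */
    static boolean isAcyclic(Class<?> c) {
        return isAcyclic(c, new HashSet<Class<?>>());
    }

    private static boolean isAcyclic(Class<?> c, Set<Class<?>> inProgress) {
        if (c.isPrimitive()) return true;
        if (c.isArray()) return isGreenFieldType(c.getComponentType(), inProgress);
        if (!inProgress.add(c)) return false; // the class refers back to itself
        try {
            // Check instance fields of the class and of all its superclasses.
            for (Class<?> k = c; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (!isGreenFieldType(f.getType(), inProgress)) return false;
                }
            }
            return true;
        } finally {
            inProgress.remove(c);
        }
    }

    /** A field type is safe if it is scalar, or final (no unknown subclasses) and acyclic. */
    private static boolean isGreenFieldType(Class<?> t, Set<Class<?>> inProgress) {
        if (t.isPrimitive()) return true;
        if (t.isArray()) return isGreenFieldType(t.getComponentType(), inProgress);
        return Modifier.isFinal(t.getModifiers()) && isAcyclic(t, inProgress);
    }
}
```

Requiring referenced classes to be final is what keeps the check sound here: a non-final field type could hold a subclass instance that points back into the graph.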

13
Root Buffer
1. Process Decrements and Accumulate Roots
2. Mark Gray: Subtract Internal Reference Counts
3. Scan: Restore Live, Mark Dead White
4. Collect White
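
The four steps above can be sketched as a small trial-deletion collector. This is a minimal illustration written from the steps as listed, with assumed names (SyncCycleSketch, Node, link) and a purple color for buffered candidate roots; it is recursive for brevity, stop-the-world, and ignores green objects. "Freeing" just appends to a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and details are assumptions. */
final class SyncCycleSketch {
    enum Color { BLACK, PURPLE, GRAY, WHITE }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int rc;
        Color color = Color.BLACK;
        boolean buffered;
        Node(String name) { this.name = name; }
    }

    final List<Node> roots = new ArrayList<>();
    final List<Node> freed = new ArrayList<>();

    /** Helper for building example graphs: adds the edge and counts it. */
    static void link(Node from, Node to) { from.children.add(to); to.rc++; }

    /** Step 1: a decrement to zero frees; a decrement to non-zero records a candidate root. */
    void decrement(Node n) {
        if (--n.rc == 0) {
            for (Node c : n.children) decrement(c);
            n.color = Color.BLACK;
            if (!n.buffered) freed.add(n);
        } else if (n.color != Color.PURPLE) {
            n.color = Color.PURPLE;
            if (!n.buffered) { n.buffered = true; roots.add(n); }
        }
    }

    void collectCycles() {
        List<Node> candidates = new ArrayList<>();
        for (Node n : roots) {
            if (n.color == Color.PURPLE) { markGray(n); candidates.add(n); }
            else {
                n.buffered = false;
                if (n.color == Color.BLACK && n.rc == 0) freed.add(n);
            }
        }
        roots.clear();
        for (Node n : candidates) scan(n);
        for (Node n : candidates) { n.buffered = false; collectWhite(n); }
    }

    /** Step 2: subtract the counts contributed by edges inside the subgraph. */
    private void markGray(Node n) {
        if (n.color == Color.GRAY) return;
        n.color = Color.GRAY;
        for (Node c : n.children) { c.rc--; markGray(c); }
    }

    /** Step 3: nodes with a remaining count are live; the rest turn white. */
    private void scan(Node n) {
        if (n.color != Color.GRAY) return;
        if (n.rc > 0) { scanBlack(n); return; }
        n.color = Color.WHITE;
        for (Node c : n.children) scan(c);
    }

    /** Restores the counts subtracted in step 2 for everything reachable from a live node. */
    private void scanBlack(Node n) {
        n.color = Color.BLACK;
        for (Node c : n.children) {
            c.rc++;
            if (c.color != Color.BLACK) scanBlack(c);
        }
    }

    /** Step 4: free the white nodes. */
    private void collectWhite(Node n) {
        if (n.color != Color.WHITE || n.buffered) return;
        n.color = Color.BLACK;
        for (Node c : n.children) collectWhite(c);
        freed.add(n);
    }
}
```

Each phase visits a node's edges once per collection, which is where the linear bound comes from in this sketch; a production version would replace the recursion with an explicit stack.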
14
Concurrent Cycle Collection

Based on synchronous algorithm
Relies on stability property of garbage
If no mutation, synchronous algorithm will work
Detect when mutation occurs and avoid collecting
Two tests required
Delta test detects local changes
Sigma test detects non-local changes

15
Root Buffer
Cycle Buffer
1. Process Decrements
2. Mark Gray
3. Scan
4. Collect White
5. Await next epoch
6. If still orange, GC
7. If changed, restore
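
Steps 5 to 7 (the Delta test) can be sketched as a color check. The sketch assumes, for illustration only, that candidate cycle members are painted orange, that any increment applied during the following epoch recolors a node black, and that a decrement to a non-zero count recolors it purple. The names (DeltaTestSketch, markCandidate, restore) are hypothetical, and the restore step is simplified to recoloring.

```java
import java.util.List;

/** Illustrative sketch only; the recoloring rules are assumptions. */
final class DeltaTestSketch {
    enum Color { BLACK, PURPLE, ORANGE }

    static final class Node {
        final String name;
        Color color = Color.BLACK;
        Node(String name) { this.name = name; }
    }

    /** The collector places a candidate cycle in the cycle buffer and paints it orange. */
    static void markCandidate(List<Node> cycle) {
        for (Node n : cycle) n.color = Color.ORANGE;
    }

    /** Assumed effect of a mutator increment processed in the next epoch. */
    static void onIncrement(Node n) { n.color = Color.BLACK; }

    /** Assumed effect of a decrement that leaves the count above zero. */
    static void onDecrementToNonZero(Node n) { n.color = Color.PURPLE; }

    /** Delta test: passes only if no member was touched since it was painted. */
    static boolean deltaTest(List<Node> cycle) {
        for (Node n : cycle) {
            if (n.color != Color.ORANGE) return false;
        }
        return true;
    }

    /** Simplified restore: members that are still orange go back to black. */
    static void restore(List<Node> cycle) {
        for (Node n : cycle) {
            if (n.color == Color.ORANGE) n.color = Color.BLACK;
        }
    }
}
```

A typical use would be to call deltaTest once the next epoch has been processed, collect the cycle if it passes, and call restore otherwise.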
16
Root Buffer
Cycle Buffer
1. Process Decrements
2. Mark Gray
3. Scan
4. Collect White
5. Calculate external in-degree
6. Await next epoch
7. Compute in-degree sum
8. If 0, GC/decrement neighbors
9. If non-0, restore
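
The quantity checked in steps 5, 7, and 8 (the Sigma test) can be sketched as follows. This is an illustration under assumptions: the names (SigmaTestSketch, crc, computeExternalInDegree) are hypothetical, and the sketch computes the external in-degree on a fixed graph without modeling when each step runs relative to epochs. For each member of a candidate cycle, the external in-degree is its reference count minus the number of edges arriving from other members; the candidate is garbage only if these sum to zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; names and timing are assumptions. */
final class SigmaTestSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int rc;   // full reference count
        int crc;  // count from outside the candidate cycle
        Node(String name) { this.name = name; }
    }

    /** Helper for building example graphs: adds the edge and counts it. */
    static void link(Node from, Node to) { from.children.add(to); to.rc++; }

    /** Step 5: start from the full count, then subtract edges internal to the candidate. */
    static void computeExternalInDegree(Set<Node> cycle) {
        for (Node n : cycle) n.crc = n.rc;
        for (Node n : cycle) {
            for (Node c : n.children) {
                if (cycle.contains(c)) c.crc--;
            }
        }
    }

    /** Steps 7 and 8: the candidate is unreachable only if no external reference remains. */
    static boolean sigmaTest(Set<Node> cycle) {
        int sum = 0;
        for (Node n : cycle) sum += n.crc;
        return sum == 0;
    }
}
```

A reference into the candidate from an object outside it leaves one member with a positive external in-degree, so the sum is non-zero and the candidate is restored rather than collected.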
17
Proof Sketch

Based on succession of graphs G_i for each epoch i
Induced by inc/dec operations
Safety
Necessity shown by foregoing examples
Sufficiency proved because
Passing Sigma test on past G_i ensures stable garbage
Delta test ensures that RC info in G_i was correct
Liveness
All possible cycle roots are considered/reconsidered
All cyclic garbage is in cycle buffer; collected unless race
Details in ECOOP'01 paper

18
Outline

Introduction
Motivation
System Overview
Concurrent Reference Counting
Cycle Collection
Measurements
Conclusions

19
Speed vs. Parallel Mark-and-Sweep
20
Pause Time vs. Parallel Mark-and-Sweep
21
Cycle Collection: Buffering Roots
22
Reference Tracing vs. Mark-and-Sweep
23
Object Reclamation
24
Related Work