Title: 380C


1
380C
  • Where we are and where we are going
  • Managed languages
  • Dynamic compilation
  • Inlining
  • Garbage collection
  • What else can you do when you examine the heap a
    lot?
  • Why you need to care about workloads
  • Alias analysis
  • Dependence analysis
  • Loop transformations
  • EDGE architectures

2
380C lecture 18
  • Garbage Collection
  • Why use garbage collection?
  • What is garbage?
  • Reachable vs live, stack maps, etc.
  • Allocators and their collection mechanisms
  • Semispace
  • Marksweep
  • Performance comparisons
  • Mark Region
  • Incremental age based collection
  • Write barriers: friend or foe?
  • Generational
  • Beltway

3
Mark Region and Other Advances in Garbage Collection
PLDI'08: Immix: A Mark-Region Collector with Space Efficiency, Fast Collection, and Mutator Performance
  • Kathryn S. McKinley, University of Texas at Austin
  • Stephen M. Blackburn, Australian National University

4
Isn't GC a bit retro?
"Languages without automated garbage collection
are getting out of fashion. The chance of running
into all kinds of memory problems is gradually
outweighing the performance penalty you have to
pay for garbage collection." Paul Jansen,
managing director of TIOBE Software, in Dr. Dobb's,
April 2008
5
GC Fundamentals: The Time-Space Tradeoff
6
GC Fundamentals: The Time-Space Tradeoff
Our Goal
7
GC Fundamentals: Algorithmic Components
  • Allocation: Free List, Bump Allocation
  • Identification: Tracing (implicit), Reference Counting (explicit)
  • Reclamation: Sweep-to-Free, Compact, Evacuate
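
The two allocation components can be contrasted with a minimal sketch. The class names, the integer addresses, and the size classes are assumptions made for illustration; they are not taken from the slides or from any real collector.

```python
class BumpAllocator:
    """Contiguous allocation: advance a cursor through one region."""

    def __init__(self, start, limit):
        self.cursor = start
        self.limit = limit

    def alloc(self, size):
        if self.cursor + size > self.limit:
            return None  # region exhausted; a real system would collect here
        addr = self.cursor
        self.cursor += size
        return addr


class FreeListAllocator:
    """Size-segregated free lists: one list of free cells per size class."""

    def __init__(self, size_classes):
        self.size_classes = sorted(size_classes)
        self.free = {c: [] for c in self.size_classes}

    def add_cell(self, addr, size_class):
        self.free[size_class].append(addr)

    def alloc(self, size):
        # Use the smallest size class that fits the request.
        for c in self.size_classes:
            if c >= size:
                cells = self.free[c]
                return cells.pop() if cells else None
        return None  # request larger than the largest size class
```

Bump allocation places consecutive requests next to each other, while the free list returns whichever cell of the right size class happens to be available.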
8
GC Fundamentals: Canonical Garbage Collectors
9
Mark-Sweep: Free List Allocation + Trace + Sweep-to-Free
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
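
A minimal sketch of the sweep-to-free step, assuming the trace has already produced a set of marked cell addresses. The function name and the dictionary layout are hypothetical.

```python
def sweep_to_free(cells, marked, free_lists):
    """Return every unmarked cell to the free list of its size class.

    cells:      dict mapping cell address -> size class (assumed layout)
    marked:     set of addresses the trace found reachable
    free_lists: dict mapping size class -> list of free addresses
    """
    for addr, size_class in sorted(cells.items()):
        if addr not in marked:
            free_lists.setdefault(size_class, []).append(addr)
    return free_lists
```

Nothing moves: reclaimed cells stay where they are, so later allocation into them is scattered across the heap.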
10
Mark-Compact: Bump Allocation + Trace + Compact
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
11
Semi-Space: Bump Allocation + Trace + Evacuation
Space inefficient
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
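
Evacuation can be sketched as a Cheney-style copy: reachable objects are copied into a second space and their references are rewritten through a forwarding table. The heap representation (a dict of reference lists) is an assumption chosen to keep the example short.

```python
def semispace_collect(roots, heap):
    """Copy everything reachable from roots into a fresh to-space.

    heap:  dict mapping from-space address -> list of referenced addresses
    Returns (new_roots, to_space); to_space[i] lists the to-space indices
    referenced by the object now living at index i.
    """
    forwarding = {}  # from-space address -> to-space index
    to_space = []

    def forward(addr):
        if addr not in forwarding:
            forwarding[addr] = len(to_space)
            to_space.append(list(heap[addr]))  # fields still point to from-space
        return forwarding[addr]

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        # Rewrite the fields of the next copied object, copying its referents.
        to_space[scan] = [forward(f) for f in to_space[scan]]
        scan += 1
    return new_roots, to_space
```

Survivors end up contiguous, which suits bump allocation, but the to-space must be held in reserve, which is the space inefficiency noted on the slide.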
12
Mark-Region with Sweep-to-Region
Reclamation
Sweep-to-Free

13
Mark-Region: Bump Allocation + Trace + Sweep-to-Region
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
14
Naïve Mark-Region
  • Contiguous allocation into regions
  • Excellent locality
  • For simplicity, objects cannot span regions
  • Simple mark phase (like mark-sweep)
  • Mark objects and their containing region
  • Unmarked regions can be freed
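
The naive scheme above can be sketched as follows. The region size and the address-keyed object graph are illustrative assumptions; because objects cannot span regions, an object's start address is enough to identify its region.

```python
REGION_SIZE = 256  # bytes; hypothetical value for illustration


def mark(roots, refs):
    """Trace from the roots. refs maps an object address to the addresses it references."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(refs.get(obj, []))
    return marked


def sweep_regions(marked, num_regions):
    """A region is free if no marked object lives in it."""
    live_regions = {addr // REGION_SIZE for addr in marked}
    return [r for r in range(num_regions) if r not in live_regions]
```

A single live object keeps its whole region unavailable, which is the weakness that finer-grained schemes aim to address.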

15
Immix: Efficient Mark-Region Garbage Collection
16
Lines and Blocks
  • More contiguous allocation
  • Increased metadata overhead
  • Constrained object sizes
  • TLB locality, cache locality
  • Block > 4 × max object size
  • Objects span lines
  • Lines marked with objects
  • Less fragmentation
  • Fast common case
[Diagram labels: free blocks, recyclable lines]
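
A minimal sketch of line marking within a block. The line size, the number of lines per block, and the state names are assumptions for illustration; the slide only states that objects may span lines and that lines are marked along with objects.

```python
LINE_SIZE = 128        # bytes per line; illustrative assumption
LINES_PER_BLOCK = 256  # lines per block; illustrative assumption


def mark_lines(line_marks, offset, size):
    """Mark every line touched by an object at byte `offset` of `size` bytes."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    for line in range(first, last + 1):
        line_marks[line] = True


def classify_block(line_marks):
    """Classify a block by how many of its lines are marked."""
    live = sum(line_marks)
    if live == 0:
        return "free"
    if live == len(line_marks):
        return "unavailable"
    return "recyclable"
```

For example, a 200-byte object at offset 100 touches lines 0 through 2, leaving the block recyclable rather than free.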
17
Allocation Policy (Recycling)
  • Recycle partially marked blocks first
  • Minimizes fragmentation
  • Maximizes sharing of freed blocks
  • Recycle in address order
  • We explored other options
  • Allocate into free blocks last
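
The ordering described above can be sketched as a simple sort. The block representation is hypothetical, and ordering the free blocks by address is an assumption; the slide only specifies address order for recycling.

```python
def allocation_order(blocks):
    """Return block addresses in the order the allocator would try them.

    blocks: list of (address, state) pairs, where state is one of
            "recyclable", "free", or "unavailable" (assumed names).
    """
    recyclable = sorted(addr for addr, state in blocks if state == "recyclable")
    free = sorted(addr for addr, state in blocks if state == "free")
    # Partially marked blocks first, completely free blocks last.
    return recyclable + free
```

Fully occupied blocks never appear in the result, and free blocks are held back so they stay available for other uses for as long as possible.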

18
Opportunistic Defragmentation
  • Opportunistically evacuate fragmented blocks
  • Lightweight, uses same allocation mechanism
  • No cost in common case (specialized GC)

  • Identify source and target blocks
  • (see paper for heuristics)
  • Evacuate objects in source blocks
  • Allocate into target blocks
  • Opportunistic
  • Leave in place if no space, or object pinned
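
A minimal sketch of the opportunistic part of the steps above, assuming source and target blocks have already been chosen (the selection heuristics are in the paper and are not reproduced here). The `Obj` record and the free-byte bookkeeping are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    block: int
    size: int
    pinned: bool = False


def defragment(live_objects, source_blocks, target_free):
    """Move live objects out of source blocks when possible.

    target_free: dict mapping target block id -> free bytes remaining.
    Returns the names of the objects that were moved.
    """
    moved = []
    for obj in live_objects:
        if obj.block not in source_blocks or obj.pinned:
            continue  # not a candidate, or must stay where it is
        for block, free in target_free.items():
            if free >= obj.size:
                target_free[block] = free - obj.size
                obj.block = block
                moved.append(obj.name)
                break
        # No target had room: the object is simply left in place.
    return moved
```

Failure to move an object is not an error; the collection proceeds and the object is treated as any other live object in its original block.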

19
Other Optimizations
  • Most objects small
  • Small objects implicitly mark next line
  • Very fast common case
  • Large objects mark lines exactly
  • Multi-line objects may skip many small holes
  • Overflow allocation (used on failure)
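
One way to read "small objects implicitly mark next line" is that a small object marks only the line it starts in, and the allocator compensates by never allocating into the first unmarked line that follows a marked line. The sketch below works under that assumption; the function name and list-of-booleans representation are hypothetical.

```python
def find_holes(line_marks):
    """Return (start_line, length) for each run of lines usable for allocation.

    The first unmarked line after a marked line is treated as implicitly
    marked, because a small object starting in the marked line may spill
    into it.
    """
    holes = []
    n = len(line_marks)
    i = 0
    while i < n:
        if line_marks[i]:
            i += 1
            continue
        start = i
        while i < n and not line_marks[i]:
            i += 1
        if start > 0:
            start += 1  # skip the implicitly marked line
        if start < i:
            holes.append((start, i - start))
    return holes
```

The cost is at most one wasted line per hole, in exchange for marking small objects without computing their extent.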
20
Results
Complete data available at http://cs.anu.edu.au/Steve.Blackburn/pubs
21
Evaluation
20 Benchmarks
  • DaCapo
  • SPECjvm98
  • SPEC jbb2000

Hardware
  • Core 2 Duo: 2.4GHz, 32KB L1, 4MB L2, 2GB RAM
  • AMD Athlon 3500: 2.2GHz, 64KB L1, 512KB L2, 2GB RAM
  • PowerPC 970: 1.6GHz, 32KB L1, 512KB L2, 2GB RAM

Collectors
  • Full Heap: Immix, MarkSweep, MarkCompact, SemiSpace
  • Generational: GenIX, GenMS, GenCopy
  • Sticky: StickyIX, StickyMS

Methodology
  • MMTk
  • Jikes RVM 2.9.3
  • (Perf HotSpot 1.5)
  • Replay compiler
  • Discard outliers
  • Report 95th percentile

Please see the paper for details.
22
Mutator Time
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz
Core 2 Duo
23
Minimum Heap
24
GC Time
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz
Core 2 Duo
25
Total Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz
Core 2 Duo
26
Generational Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz
Core 2 Duo
27
Sticky Performance
Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz
Core 2 Duo
28
PseudoJBB 2000
On 2.4GHz Core 2 Duo
29
PseudoJBB 2000
On 2.4GHz Core 2 Duo
30
Prior Work
  • http://www.ibm.com/developerworks/ibm/library/i-garbage1/
  • IBM product collector
  • Mark-Region not characterized
  • Collector not evaluated
  • Product and basis for other research
  • Domani et al. 2000; Kermany & Petrank 2006

31
Mark-Region Collection
Sweep-to-Free

32
Immix: Efficient Mark-Region Collection
Actual data, taken from geomean of DaCapo, jvm98,
and jbb2000 on 2.4GHz Core 2 Duo
33
Open source: code available in JikesRVM 2.9.3 onward.
http://www.jikesrvm.org
Complete data available at http://cs.anu.edu.au/Steve.Blackburn/pubs
34
Research History
  • PLDI 1998
  • Clinger & Hansen postulated the radioactive decay
    model for object lifetimes
  • Genesis of Older-First
  • Stefanovic, McKinley, Moss, OOPSLA'99

35
Garbage Collection Hypotheses
  • Generational hypothesis: younger objects die
    quickly, so collect them first
  • Older-first hypothesis: the longer the collector
    waits, the less it has to collect

[Figure: age-ordered heap, from younger to older, with the survival function s(v) for the object lifetime distribution; axis marks at 0, 1/2 V, and V]
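
As one concrete example of a survival function, the radioactive decay model mentioned under Research History gives every object the same chance of dying regardless of its age. The sketch below assumes a half-life parameter measured in the same units as v; the function name and parameter are illustrative and the slide does not state that its s(v) takes this form.

```python
def survival(v, half_life):
    """Fraction of objects still alive after v units of allocation,
    assuming radioactive decay with the given half-life."""
    return 0.5 ** (v / half_life)
```

Under this model, the fraction of objects surviving a further 20 units is the same whether they are new or already 10 units old: `survival(30, 10) / survival(10, 10)` equals `survival(20, 10)`. Age therefore carries no information about remaining lifetime, which is why collecting the youngest objects first is not automatically the best choice.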
36
Older-first Algorithm
37
Next Steps
  • Beltway
  • BJMM, PLDI'02
  • Increments
  • Belts
  • Combines generational and older-first
  • Ulterior Reference Counting
  • BM, OOPSLA'03
  • Reference count on a per-object basis
  • Responsiveness and throughput
  • MMTk: BCM, SIGMETRICS'04, ICSE'04
  • Toolkit for building and understanding GC
  • Motivated today's work

38
Garbage Collection is the Answer to All Your
Problems
  • Improves data and code locality
  • Huang et al., OOPSLA'02, ISMM'04, VEE'04
  • Cooperative GC optimizations
  • Colocation: Guyer, OOPSLA'05
  • Free-me: Guyer et al., PLDI'06
  • Finds leaks
  • Bond, ASPLOS'06; Jump, POPL'07
  • Tolerates leaks
  • Bond, OOPSLA'08
  • Helps with dynamic software updating!
  • Subramaniam, Hicks, ??'08
  • DaCapo Benchmarks
  • Blackburn et al., OOPSLA'06, CACM'08

39
380C
  • Where we are and where we are going
  • Why you need to care about workloads
  • Managed languages
  • Dynamic compilation
  • Inlining
  • Garbage collection
  • Opportunity to improve data locality on-the-fly
  • Read: X. Huang, S. M. Blackburn, K. S. McKinley,
    J. E. B. Moss, Z. Wang, and P. Cheng, "The Garbage
    Collection Advantage: Improving Program Locality,"
    ACM Conference on Object Oriented Programming,
    Systems, Languages, and Applications (OOPSLA),
    pp. 69-80, Vancouver, Canada, October 2004.
  • Alias analysis
  • Dependence analysis
  • Loop transformations
  • EDGE architectures