1
Garbage Collection
2
What is Garbage?
  • Memory allocated from the heap that can no longer
    be accessed is referred to as garbage.
  • Note the specific use of "can" and not "will".
  • Typically we delete an object (free the chunk) as
    soon as we suspect that we will no longer access
    that chunk.
  • Garbage collection applies only to objects that
    literally cannot be accessed (whether we would
    want to or not).

3
What do you mean, "can't"?
  • With the powerful pointer operations in C,
    there's no such thing as unreachable memory.
  • char *p = (char *) any_address_you_want;
  • In most languages, however (e.g., Java, Pascal,
    etc.), pointer arithmetic and unchecked pointer
    casts are not permitted.
  • In these languages it's possible to lose the
    ability to access an object.

4
Losing an object
  • T* p = new T; // object allocated
  • p = nil<T>(); // lost forever.
  • An object becomes unreachable at the instant when
    the last pointer that points to the object is
    removed, i.e., when that pointer is
  • assigned a new value (points to nil or something
    else), or
  • goes out of scope.

5
Detecting Unreachable Objects with Reference
Counting
  • Can we determine when objects become unreachable?
  • Yes, count the number of references to each
    object.
  • If the number of references ever reaches zero,
    then the object truly is unreachable.
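
The counting rule above can be sketched in C. This is only an illustration under stated assumptions: the names Obj, obj_new, obj_retain, obj_release and live_objects are hypothetical, and the count is assumed to live in a header inside the object itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object layout: the count is stored inside the object. */
typedef struct Obj {
    int refcount;   /* number of pointers currently referring to this object */
    int payload;
} Obj;

static int live_objects = 0;   /* bookkeeping for this demo only */

/* Allocate an object that starts with one reference (the caller's). */
Obj *obj_new(int payload) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->payload = payload;
    live_objects++;
    return o;
}

/* Call whenever a new pointer starts referring to o. */
Obj *obj_retain(Obj *o) {
    if (o != NULL) o->refcount++;
    return o;
}

/* Call whenever a pointer to o is overwritten or goes out of scope. */
void obj_release(Obj *o) {
    if (o == NULL) return;
    assert(o->refcount > 0);
    if (--o->refcount == 0) {   /* last reference gone: truly unreachable */
        live_objects--;
        free(o);
    }
}
```

In this sketch the programmer must call obj_retain and obj_release by hand at every pointer copy and every pointer removal; a real system would have the compiler or a smart pointer type do that.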

6
Reference Counting Hazard 1
  • We must store the count inside the object, not
    inside the pointers.
  • There are four ways to ensure the count is
    available:
  • modify the heap so that the count is part of the
    chunk's signature
  • instruct all programmers to include a count in
    all classes (yuck).
  • create a base class that contains a count and
    make all objects inherit from that base class.
  • Use a pair of pointers, one to the reference
    count, one to the object.

7
Reference Counting Hazard 2
  • The reference count is not guaranteed to go to
    zero (even when the object is unreachable).
  • This happens in any data structure with a cycle, e.g.,
  • trees that contain parent pointers
  • circularly linked lists
  • Reference counting is safe, but not complete
    (never deletes non-garbage, but might fail to
    delete actual garbage).
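
A small C sketch of the cycle hazard, assuming a hypothetical Node type whose next field is a counted reference. All names here (Node, node_new, node_set_next, node_release, leaked_after_chain, leaked_after_cycle) are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int refcount;
    struct Node *next;   /* counted reference to another node, or NULL */
} Node;

static int live_nodes = 0;   /* bookkeeping for this demo only */

Node *node_new(void) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->refcount = 1;         /* the caller's pointer */
    n->next = NULL;
    live_nodes++;
    return n;
}

void node_release(Node *n) {
    if (n == NULL) return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {
        node_release(n->next);   /* our pointer to next disappears too */
        live_nodes--;
        free(n);
    }
}

/* Store a counted reference to target in n->next. */
void node_set_next(Node *n, Node *target) {
    if (target != NULL) target->refcount++;
    node_release(n->next);
    n->next = target;
}

/* a -> b with no cycle: dropping both external pointers frees everything. */
int leaked_after_chain(void) {
    int before = live_nodes;
    Node *a = node_new(), *b = node_new();
    if (a == NULL || b == NULL) { node_release(a); node_release(b); return -1; }
    node_set_next(a, b);
    node_release(b);
    node_release(a);
    return live_nodes - before;
}

/* a -> b -> a: both nodes are unreachable afterwards, yet each count is 1. */
int leaked_after_cycle(void) {
    int before = live_nodes;
    Node *a = node_new(), *b = node_new();
    if (a == NULL || b == NULL) { node_release(a); node_release(b); return -1; }
    node_set_next(a, b);
    node_set_next(b, a);
    node_release(a);
    node_release(b);
    return live_nodes - before;
}
```

The two nodes left behind by leaked_after_cycle are exactly the "actual garbage" that reference counting fails to delete.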

8
Garbage Collection
  • The system (really library routines, e.g.,
    malloc) examines all of memory to determine which
    chunks are garbage and which are reachable.
  • There are several variations on three general
    themes:
  • mark/sweep
  • copy
  • generational

9
When Do We Collect?
  • Most garbage collection is done on a "stop the
    world" basis: all processing must be suspended
    while the garbage collector runs.
  • On single-threaded systems, this is no problem.
  • On multi-threaded systems, all threads must be
    paused.
  • The collection process is often slow.
  • E.g., mark/sweep touches every chunk in the heap
    and every pointer in the stack or data segment.
  • Collect periodically?
  • Collect when out of memory?

10
Mark and Sweep
  • One additional bit of storage is added to the
    signature(s) of each chunk. Call this the
    garbage bit.
  • When a collection begins, the collection routine
    first visits every chunk in the heap and sets the
    garbage bit in each.
  • The collection routine then attempts to visit
    every reachable chunk and clear the garbage bit
    (mark phase).
  • After marking, everything that is still garbage
    is deallocated (sweep).

11
Marking
  • The marking process is simple in principle.
  • Begin with the root pointers. These are all
    pointers into the heap that are either:
  • global variables
  • local variables in active stack frames
    (activation records).
  • Place all the root pointers in a work queue.

12
Marking Continued
  • Repeat until the work queue is empty:
  • remove pointer p from the queue
  • if the chunk pointed to by p is marked garbage,
    then
  • mark the chunk as non-garbage
  • locate all pointers inside the object stored in
    the chunk
  • add all these pointers to the work queue.
  • NOTE: if the chunk is not garbage, then we've
    visited it before, so there is no reason to visit
    it again (we avoid looping on cycles this way).
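
The three steps (set every garbage bit, mark from the roots with a work queue, sweep) might look roughly like this in C. It is a sketch over a simulated heap, not a real allocator: chunks live in a small array, "pointers" are array indices, and the names Chunk, heap, chunk_alloc, collect and the size constants are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_CHUNKS 8
#define MAX_PTRS    2
#define MAX_ROOTS   8
#define NIL         (-1)

typedef struct Chunk {
    bool in_use;           /* allocated, i.e. not free */
    bool garbage;          /* the "garbage bit" from the signature */
    int  ptrs[MAX_PTRS];   /* indices of other chunks, or NIL */
} Chunk;

static Chunk heap[HEAP_CHUNKS];

/* Allocate the first free chunk and store two pointers in it. */
int chunk_alloc(int p0, int p1) {
    for (int i = 0; i < HEAP_CHUNKS; i++) {
        if (!heap[i].in_use) {
            heap[i].in_use = true;
            heap[i].garbage = false;
            heap[i].ptrs[0] = p0;
            heap[i].ptrs[1] = p1;
            return i;
        }
    }
    return NIL;
}

/* Run one collection; returns the number of chunks deallocated. */
int collect(const int *roots, int nroots) {
    /* Each chunk is expanded at most once, so this bound is sufficient. */
    int queue[MAX_ROOTS + HEAP_CHUNKS * MAX_PTRS];
    int head = 0, tail = 0, freed = 0;
    assert(nroots >= 0 && nroots <= MAX_ROOTS);

    /* Step 1: set the garbage bit in every chunk. */
    for (int i = 0; i < HEAP_CHUNKS; i++) heap[i].garbage = true;

    /* Step 2 (mark): place the root pointers in the work queue. */
    for (int i = 0; i < nroots; i++) queue[tail++] = roots[i];
    while (head < tail) {
        int p = queue[head++];
        if (p < 0 || p >= HEAP_CHUNKS || !heap[p].in_use) continue;
        if (!heap[p].garbage) continue;       /* visited before */
        heap[p].garbage = false;              /* reachable */
        for (int k = 0; k < MAX_PTRS; k++) queue[tail++] = heap[p].ptrs[k];
    }

    /* Step 3 (sweep): deallocate everything still marked garbage. */
    for (int i = 0; i < HEAP_CHUNKS; i++) {
        if (heap[i].in_use && heap[i].garbage) {
            heap[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

Unlike reference counting, an unreachable pair of chunks that point at each other is reclaimed here, because neither is ever reached from a root.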

13
How Do You Find Pointers?
  • Some runtime systems (e.g., the Java JVM and most
    Lisp interpreters) store type information along
    with every reference (AKA pointer).
  • To find pointers, simply walk through memory
    looking for locations that are tagged as
    reference type.
  • C/C++ does not store type information with the
    objects.
  • More efficient storage,
  • but makes garbage collection impossible(?)

14
Providing Garbage Collection for C
  • Perhaps we can replace all pointers in our
    programs with objects that can be recognized as
    pointers.
  • e.g., require that all pointers be stored in
    certain memory locations.
  • What the programmer thinks is a pointer, is
    really a handle (a pointer to the actual
    pointer).
  • When it's time to mark, go to the memory
    locations that hold the true pointers, and add
    these to the work queue.
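
One way to picture the handle idea in C is sketched below. It assumes a fixed-size table, here called master_pointers, as the "certain memory locations"; Handle, handle_new, handle_deref and collect_true_pointers are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 16

/* The only place where true pointers are stored. */
static void *master_pointers[MAX_HANDLES];

/* What the programmer holds: a pointer to a slot in the table. */
typedef void **Handle;

/* Store a true pointer in a free slot and hand back its handle. */
Handle handle_new(void *object) {
    assert(object != NULL);
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (master_pointers[i] == NULL) {
            master_pointers[i] = object;
            return &master_pointers[i];
        }
    }
    return NULL;   /* table full */
}

/* Every access goes through one extra indirection. */
void *handle_deref(Handle h) {
    return *h;
}

/* At mark time, gather every true pointer; out needs MAX_HANDLES slots. */
size_t collect_true_pointers(void **out) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (master_pointers[i] != NULL) out[n++] = master_pointers[i];
    }
    return n;
}
```

Note that the table lists every pointer in the program without saying which ones are roots.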

15
Analysis of Handles
  • All pointers in the program would have to be
    replaced with handles.
  • may not be practical for existing systems
  • No obvious way to distinguish root pointers from
    other pointers.
  • Whoops! This is a major defect.
  • Unless we can distinguish the root pointers, we
    cannot do better than reference counting.

16
Boehm / Weiser
  • Observation 1: we really only care about
    pointers to objects in the heap (other pointers do
    not need to be identified).
  • Observation 2: we (the heap implementers) know
    what range of addresses has been used for the
    heap.
  • Eureka! Given any memory location, we can
    determine if the location holds a pointer to the
    heap by checking to see if its value is within
    the range for the heap.
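
The range check can be sketched in a few lines of C. The HeapRange type and the function names are assumptions for illustration; a real collector would record the bounds when the allocator obtains memory from the operating system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the address range used for the heap. */
typedef struct HeapRange {
    uintptr_t start;   /* address of the first heap byte */
    uintptr_t end;     /* one past the last heap byte */
} HeapRange;

/* True if the word's value falls inside the heap, so it MIGHT be a pointer. */
bool might_point_into_heap(const HeapRange *h, uintptr_t word) {
    assert(h->start <= h->end);
    return word >= h->start && word < h->end;
}

/* Scan a block of memory word by word and count the candidate pointers. */
size_t count_candidates(const HeapRange *h, const uintptr_t *words,
                        size_t nwords) {
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (might_point_into_heap(h, words[i])) n++;
    }
    return n;
}
```

The test cannot tell a genuine pointer from an integer whose value happens to land in the heap range; both are reported as candidates.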

17
What about Root Pointers?
  • The next key observation is:
  • Most (all?) operating systems provide functions
    that allow us to make queries about the memory map.
  • e.g., in Win32, GetSystemInfo() and
    VirtualQuery().
  • Eureka! We can determine the address ranges for
    the stack and the data segment.
  • All root pointers will be inside these ranges.

18
Analysis of Boehm / Weiser
  • Pointers are identified conservatively.
  • Some locations may hold values that just happen
    to be within the range of heap addresses.
  • All true pointers are identified, with one caveat:
  • programmers must not play tricks with pointers
    (e.g., storing a pointer in a disguised form).

19
Mark / Sweep Drawback: Paging
  • Let's imagine that we're very lucky, and we
    generally only collect garbage when there's lots
    to collect (more efficient this way).
  • Note that mark/sweep requires that all chunks be
    touched: once to set the garbage bits, and again
    during the sweep.
  • Strictly speaking, only the signatures need to be
    touched.
  • If 90% of the heap is garbage, then
  • we'll have a lot of page faults
  • most of the page faults are a waste of time!

20
Copy-Based Collection
  • With copy-based collection the memory available
    for the heap is divided into two parts (heap A
    and heap B).
  • Only one heap half is used at a time.
  • To collect garbage:
  • visit objects as we did before (with work queue)
  • instead of marking the objects, make a copy of
    each reachable object in the other heap half.
  • update pointers to reflect the new location of
    the object.
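
A compact C sketch of copy-based collection over a simulated heap follows. Two details are assumptions rather than part of the slides: pointers are array indices into the active half, and the to-space itself serves as the work queue (the scheme usually credited to Cheney). The names Obj, Heap, heap_alloc, forward_ptr and heap_collect are hypothetical.

```c
#include <assert.h>

#define SPACE_SIZE 8
#define NIL        (-1)

typedef struct Obj {
    int value;
    int left, right;   /* indices into the same heap half, or NIL */
    int forward;       /* index of the copy once moved, else NIL */
} Obj;

typedef struct Heap {
    Obj half[2][SPACE_SIZE];   /* heap A and heap B */
    int active;                /* which half is in use: 0 or 1 */
    int used;                  /* objects allocated in the active half */
} Heap;

int heap_alloc(Heap *h, int value, int left, int right) {
    if (h->used == SPACE_SIZE) return NIL;
    Obj *o = &h->half[h->active][h->used];
    o->value = value;
    o->left = left;
    o->right = right;
    o->forward = NIL;
    return h->used++;
}

/* Copy from[p] into to-space unless already copied; return its new index. */
static int forward_ptr(Obj *from, Obj *to, int *next, int p) {
    if (p == NIL) return NIL;
    assert(p >= 0 && p < SPACE_SIZE);
    if (from[p].forward == NIL) {
        assert(*next < SPACE_SIZE);
        to[*next] = from[p];            /* fields still hold old indices */
        from[p].forward = (*next)++;    /* leave a forwarding address */
    }
    return from[p].forward;
}

/* Copy everything reachable from roots into the other half, then switch. */
void heap_collect(Heap *h, int *roots, int nroots) {
    Obj *from = h->half[h->active];
    Obj *to = h->half[1 - h->active];
    int next = 0;   /* next free slot in to-space */
    int scan = 0;   /* copies in [scan, next) still need their fields fixed */

    for (int i = 0; i < nroots; i++)
        roots[i] = forward_ptr(from, to, &next, roots[i]);
    while (scan < next) {
        to[scan].left = forward_ptr(from, to, &next, to[scan].left);
        to[scan].right = forward_ptr(from, to, &next, to[scan].right);
        scan++;
    }
    h->active = 1 - h->active;
    h->used = next;   /* survivors are packed at the start of the new half */
}
```

Garbage is never read or written in this sketch, and the survivors end up contiguous in the new half.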

21
Analysis of Copy Collection
  • The heap is naturally compacted (defragmented) as
    a result of collection.
  • We don't need to touch the garbage (may avoid
    page faults this way).
  • Utilization of memory is reduced by 50%.
  • Need to change the pointers.
  • NOTE: the conservative assumption about
    pointers used by Boehm / Weiser would no longer
    be conservative if we were changing the pointer
    values. We could end up changing somebody's data!

22
Generational Collection
  • Collect only subsections of the heap at a time.
  • Recently allocated objects are in the hot
    section of the heap. This hot section is
    collected frequently.
  • The age of an object is the number of collections
    that the object survives without being collected.
  • As an object's age increases, it moves into
    successively colder sections of the heap (which
    are collected less frequently).

23
Analysis of Generational Collection
  • Generational collection has better caching
    behavior than traditional systems.
  • Lots of objects are used for only a short period
    of time.
  • Objects that don't become garbage right away are
    likely to be used for a long time.
  • Generational collection requires copying (suffers
    from the same drawbacks as copy collection).

24
Making Malloc Fast
  • The Knuth Heap uses very naïve signatures with
    regard to performance.
  • Must touch lots of memory to find a chunk of the
    appropriate size.
  • Observation: most programs allocate only a few
    different kinds of objects (even if they allocate
    millions of objects of each kind).
  • Eureka! Keep free lists for chunks of each size.

25
Powers of 2
  • One simple strategy is to always allocate chunks
    that are a power of 2 (bytes) in size, e.g., 8,
    16, 32, 64, 128, or 256 bytes.
  • If a request is made for 10 bytes, we allocate a
    chunk of size 16.
  • This can waste up to 50% of the memory as internal
    fragmentation.
  • As memory is deallocated, we insert the chunk on
    a free list.
  • All chunks on the free list are the same size.
  • One pointer is kept for each possible size.
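
The power-of-2 scheme with one free list per size might be sketched in C as follows. Several choices are assumptions made for the example: six size classes from 8 to 256 bytes, malloc standing in for carving fresh memory out of the heap, and a pow2_free that is told the requested size by its caller. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SHIFT   3   /* smallest chunk is 2^3 = 8 bytes */
#define NUM_CLASSES 6   /* 8, 16, 32, 64, 128, 256 bytes */

/* A free chunk stores only the link to the next free chunk. */
typedef struct FreeChunk {
    struct FreeChunk *next;
} FreeChunk;

/* One pointer (list head) is kept for each possible size. */
static FreeChunk *free_lists[NUM_CLASSES];

/* Index of the smallest class that holds n bytes, or -1 if n is too big. */
int size_class(size_t n) {
    size_t size = (size_t)1 << MIN_SHIFT;
    for (int c = 0; c < NUM_CLASSES; c++, size <<= 1) {
        if (n <= size) return c;
    }
    return -1;
}

/* Chunk size in bytes for class c. */
size_t class_bytes(int c) {
    return (size_t)1 << (MIN_SHIFT + c);
}

void *pow2_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return NULL;           /* large requests are out of scope */
    if (free_lists[c] != NULL) {      /* constant time: pop the list head */
        FreeChunk *chunk = free_lists[c];
        free_lists[c] = chunk->next;
        return chunk;
    }
    return malloc(class_bytes(c));    /* stand-in for fresh heap memory */
}

/* Return a chunk to the free list for its size class. */
void pow2_free(void *p, size_t n) {
    int c = size_class(n);
    assert(p != NULL && c >= 0);
    assert(sizeof(FreeChunk) <= class_bytes(0));
    FreeChunk *chunk = p;
    chunk->next = free_lists[c];
    free_lists[c] = chunk;
}
```

Both allocation and deallocation touch only the head of one list, so no search through the heap is needed.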

26
Storing Signatures
  • Signatures do not actually need to be stored with
    the chunks.
  • If we have a free list for each size, we don't
    even need a signature at all! (Well, except to
    support garbage collection.)