Garbage Collection

Abstraction of Garbage Collection

  • Manual Memory Management: C/C++
    • Problems: 1. Memory Leak; 2. Double Free; 3. Use-after-frees
  • What’s Garbage: Objects that won’t be used again

  • Types of Garbage Collectors:

    • Incremental vs. stop-the-world:
    • An incremental collector is one that runs concurrently
      with the program.
    • A stop-the-world collector pauses program execution to
      look for garbage.
    • Compacting vs non-compacting:
    • A compacting collector is one that moves objects around
      in memory.
    • A non-compacting collector is one that leaves all objects
      where they originated.

Algorithms

  • The simplest framework: reference counting
    • Each object should have a reference count (refcount)
    • Create/Delete a reference will increase/decrease the value of refcount
    • When refcount is zero, it should be reclaimed.
    • More: Reference Cycles is a set of objects that cyclically refer to one another.
    • Because all the objects are referenced, all have nonzero refcounts and are never reclaimed.
    • An implementation in C++: shared_ptr
  • Mark-and-Sweep: The Intuition
    • Any objects (not) reachable from the root set are (not) reachable
    • Marking Phase: Find reachable objects. + Sweeping phase: Reclaim free memory
    • Four States: Marked, Enqueued, Unknown, Deallocated
  • Baker’s Algorithm

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    Move all of the root set to the enqueued list.
    While the enqueued list is not empty:
    Move the first object from the enqueued list to the marked list.
    For each unknown object referenced, add it to the enqueued list.
    At this point, everything reachable is in marked and everything unreachable is in unknown.
    Concatenate the unknown and deallocated lists
    Deallocates all garbage in O(1).
    Move everything from the marked list to the unknown list.
    Can be done in O(1).
    Indicates objects again must be proven reachable on next scan.
  • Stop-and-Copy:

    • Improve Locality => compaction
    • Increasing Allocation Speed: Free-List
      • General: 10-20 assembly instructions
      • Stack Allocation: one
    • Advantages:
      1. Implementation simplicity (compared to mark-and-sweep).
      2. Fast memory allocation; using OS-level tricks, can allocate in a single assembly instruction.
      3. Excellent locality; depth-first ordering of copied objects places similar objects near each other.
    • Disadvantages:
      1. Requires half of memory to be free at all times.
      2. Collection time proportional to number of bytes used by objects.

Hybrid Approaches

  • Motto: Objects die young (short live object)
  • Several Layer “Generations”:
    1. Eden: Stop-and-copy strategy
    2. Survivor Objects: move elements to it when OOM
    3. Tenured Objects: Objects that survive long enough

TO BE CONTINUE