Garbage Collector Internals
This document describes the internals of mruby's garbage collector
for developers working on src/gc.c and related code.
Read this if you are: modifying core data structures that hold object references (and need to add write barriers), debugging memory leaks or GC-related crashes, tuning GC performance for an embedded target, or working on the GC code itself.
For user-facing GC docs, see gc-arena-howto.md (arena usage in C extensions) and memory.md (heap regions).
Overview
mruby uses a tri-color incremental mark-and-sweep garbage collector with an optional generational mode. The collector runs in small incremental steps between VM instruction execution, avoiding long pauses.
Tri-Color Model
Every heap-allocated object has a color stored in
RBasic::gc_color (3 bits):
| Color | Value | Meaning |
|---|---|---|
| White (A or B) | 1 or 2 | Unmarked, candidate for collection |
| Gray | 0 | Marked, but children not yet scanned |
| Black | 4 | Fully marked and scanned |
| Red | 7 | Static/ROM object, never collected |
The GC uses two white types (A and B) in a flip-flop scheme. At the start of each GC cycle, the meaning of "current white" is flipped by XORing the white bits. This avoids recoloring all live objects at cycle boundaries, which is an O(1) operation instead of O(n).
#define is_dead(s, o) \
(((o)->gc_color & other_white_part(s) & GC_WHITES) || \
(o)->tt == MRB_TT_FREE)
An object is dead if it still carries the previous cycle's white color.
Heap Structure
Heap Pages
Objects are allocated from fixed-size heap pages:
mrb_heap_page
+-- freelist linked list of free slots
+-- next next page in heap list
+-- free_next next page with free slots
+-- old old generation flag (generational mode)
+-- region true if carved from a contiguous region
+-- objects[MRB_HEAP_PAGE_SIZE] RVALUE array (default 1024)
Each page holds MRB_HEAP_PAGE_SIZE objects (default 1024). On
64-bit systems, a page is approximately 40 KB (40 bytes per slot).
RVALUE Union
All mruby object types share the same slot size via a C union:
RVALUE = union of {
RBasic, RObject, RClass, RString, RArray, RHash,
RRange, RData, RProc, REnv, RFiber, RException, ...
struct { RBasic header; RVALUE *next; } (free slot)
}
Free slots use the union space for a freelist pointer.
Freelist
Each page maintains a singly-linked freelist of available slots.
Allocation pops from the freelist; deallocation during sweep
prepends to the freelist. The GC tracks pages with free slots in
gc->free_heaps for fast allocation.
Heap Regions
For embedded systems with fixed memory banks, mrb_gc_add_region()
carves heap pages from a user-provided contiguous buffer:
static uint8_t heap_buf[64 * 1024];
mrb_gc_add_region(mrb, heap_buf, sizeof(heap_buf));
Region pages are never freed by the GC (even if all objects die).
When region pages are exhausted, allocation falls back to malloc().
GC Phases
The GC operates as a three-state machine:
GC_STATE_ROOT --> GC_STATE_MARK --> GC_STATE_SWEEP --> GC_STATE_ROOT
Root Scan (GC_STATE_ROOT)
Marks objects directly reachable from the VM:
- Global variables (
mrb_gc_mark_gv) - GC arena (
gc->arena[0..arena_idx-1]) - All built-in classes (Object, Class, Module, etc.)
- Top-level self (
mrb->top_self) - Current exception (
mrb->exc) - Execution contexts (VM stacks, call info stacks, active fibers)
- Task queues (if
MRB_USE_TASK_SCHEDULERis defined)
After root scanning, the white color is flipped.
Incremental Marking (GC_STATE_MARK)
Gray objects are popped from the gray stack and their children marked. Each step processes a limited number of objects:
limit = (GC_STEP_SIZE / 100) * step_ratio
With default step_ratio = 200 and GC_STEP_SIZE = 1024, the
limit is 2048 objects per step.
When the gray stack is exhausted, the final marking phase re-marks the arena and global variables to catch objects created during marking, then transitions to sweep.
Sweep (GC_STATE_SWEEP)
Iterates through heap pages. For each object:
- If dead (previous cycle's white): call
obj_free(), return slot to freelist - If alive: paint with current white for the next cycle
Sweep is also incremental: gc->sweeps tracks the current page
position between steps.
Gray Stack
The gray stack is a fixed-size array of object pointers:
struct RBasic *gray_stack[MRB_GRAY_STACK_SIZE]; /* default 1024 */
size_t gray_stack_top;
mrb_bool gray_overflow;
When the stack overflows, gray_overflow is set to TRUE. During
marking, gc_gray_rescan() scans the entire heap to find any gray
objects that could not be pushed. This guarantees correctness at the
cost of a full heap scan.
Write Barriers
During incremental marking, a black (fully marked) object storing a reference to a white (unmarked) object creates a dangerous edge that could lead to premature collection. Write barriers prevent this.
Field Write Barrier
Used when assigning a specific field:
mrb_field_write_barrier(mrb, parent, child);
If parent is black and child is white:
- During marking or generational mode: paint
childgray (add to gray stack for scanning) - During sweep: paint
parentwith current white (demote it for next cycle)
General Write Barrier
Used when an object has been modified but the specific child is not known:
mrb_write_barrier(mrb, obj);
Paints obj gray and pushes it onto the gray stack for re-scanning.
GC Arena
The arena protects newly created objects from collection before
they are stored in a reachable location. Every mrb_obj_alloc()
automatically pushes the new object onto the arena.
C extensions must save and restore the arena index when creating many temporary objects to prevent arena overflow:
int ai = mrb_gc_arena_save(mrb);
/* create temporary objects */
mrb_gc_arena_restore(mrb, ai);
Fixed vs Dynamic Arena
- Dynamic (default): arena grows by 1.5x when full. Risk of unbounded growth if arena is not properly managed.
- Fixed (
MRB_GC_FIXED_ARENA): raises an exception on overflow. Arena size isMRB_GC_ARENA_SIZE(default 100).
Permanent Registration
For long-lived C objects that must survive indefinitely:
mrb_gc_register(mrb, obj); /* add to permanent root */
mrb_gc_unregister(mrb, obj); /* remove from root */
These store objects in a global array that is always marked as part of the root set.
See gc-arena-howto.md for detailed usage patterns.
Generational Mode
When enabled (default, unless MRB_GC_TURN_OFF_GENERATIONAL is
defined), the GC classifies objects into young and old generations.
Minor GC
Only processes young objects. Pages where all objects are old are
marked with page->old = TRUE and skipped entirely during sweep.
Minor GC always runs to completion in a single step.
Major GC
A full mark-and-sweep cycle that processes all objects. Triggered
when gc->live > gc->oldgen_threshold. Major GC runs
incrementally, like the non-generational collector.
After a major GC completes, the collector reverts to minor GC mode. The old-generation threshold is recalculated:
oldgen_threshold = live_after_mark * MAJOR_GC_INC_RATIO / 100
With MAJOR_GC_INC_RATIO = 120, a major GC triggers when live
objects exceed 120% of the last major GC's survivors.
Transitioning Between Modes
mrb_gc_generational_mode_set(mrb, TRUE); /* enable */
mrb_gc_generational_mode_set(mrb, FALSE); /* disable */
From Ruby: GC.generational_mode = true/false.
Object Allocation
mrb_obj_alloc() is the core allocation function:
- If
MRB_GC_STRESSis defined, run a full GC - If
gc->live >= gc->threshold, runmrb_incremental_gc() - Ensure arena has space (
gc_arena_keep) - Pop an object from the freelist of
gc->free_heaps - If no free pages, allocate a new page (
add_heap) - Initialize the object (zero fill, set type and class)
- Paint with current white color
- Push onto arena (
gc_protect) - Increment
gc->live
Object Freeing
obj_free() performs type-specific cleanup:
- Objects/Exceptions: free instance variable tables
- Classes: free method tables and instance variable tables
- Arrays: free heap buffer (if not embedded/shared)
- Hashes: free hash table
- Strings: free heap buffer (if not embedded/shared)
- Data objects: call user-provided
dfreecallback - Procs: decrement irep reference count
- Fibers: free context (stacks)
The object's type is set to MRB_TT_FREE after freeing.
Triggering GC
Automatic
GC runs automatically when gc->live >= gc->threshold during
object allocation. After each cycle:
threshold = (live_after_mark / 100) * interval_ratio
minimum: GC_STEP_SIZE (1024)
With default interval_ratio = 200, GC triggers when live objects
roughly double.
Manual
mrb_full_gc(mrb); /* force complete GC cycle */
mrb_garbage_collect(mrb); /* public API wrapper */
From Ruby: GC.start.
Configuration
Compile-Time
| Macro | Default | Description |
|---|---|---|
MRB_HEAP_PAGE_SIZE |
1024 | Objects per heap page |
MRB_GRAY_STACK_SIZE |
1024 | Gray stack capacity |
MRB_GC_ARENA_SIZE |
100 | Arena size (fixed mode) or initial size |
MRB_GC_FIXED_ARENA |
off | Use fixed-size arena |
MRB_GC_TURN_OFF_GENERATIONAL |
off | Disable generational mode |
MRB_GC_STRESS |
off | Full GC on every allocation (debug) |
MRB_USE_MALLOC_TRIM |
off | Call malloc_trim() after full GC |
Runtime
From Ruby code:
GC.interval_ratio = 200 # threshold = live * ratio / 100
GC.step_ratio = 200 # objects per incremental step
GC.generational_mode = true
GC.start # force full GC
GC.enable # re-enable GC
GC.disable # disable GC
Source Files
| File | Contents |
|---|---|
src/gc.c |
GC implementation (~1400 lines) |
include/mruby/gc.h |
mrb_gc structure, public GC API |
include/mruby.h |
Arena save/restore macros |