Automatic memory management: hybrid reference counting

Posted on March 1, 2014 by Tommy McGuire
Labels: rust, programming language

When last we left, I had demonstrated that one form of garbage collection, reference counting, had a significant problem: it could not collect cycles of objects. If two objects point at each other, each will always have a reference count greater than zero even if nothing else points to them.

Various smart people created various fixes for the cycle problem. One persistent suggestion is to use weak pointers, pointers which do not increase the destination's reference count but which must be checked before they can be followed because the destination may no longer exist. Weak pointers work well if the structure being built includes designated, special references like back pointers in a doubly-linked list or parent pointers in a tree. On the other hand, in a general graph structure weak pointers seem less useful. One algorithm, mentioned in Garbage Collection by Jones and Lins attempts to automatically break cycles using weak pointers. They also mention that it does not actually work.

A better option is a cycle detector: a separate subsystem that scans objects that appear to be live, checking them for real, external pointers. Typically, cycle detectors work by removing the counts for the internal links in a graph of objects, leaving only valid external links. Python has a fine example of a cycle detector. (Which link I found via a Stack Overflow answer; some old documentation still seems to be relevant.) The Python documentation describes both the Python and lower-level C interfaces of its cycle detector. By the way, if you take a look at any of those links, pay attention to the handling of objects with __del__ methods. But more on that later.

A slightly different cycle collector is described in Garbage Collection, specifically one by Rafael Lins himself.

Lins' cycle collector

Lins' algorithm hooks into the normal reference counting mechanism when an object's reference count is decremented. If the count becomes zero, the object is indeed garbage and can be deleted. If the reference count is non-zero, however, the object is acting sketchy. It is not behaving normally. It is standing out. And you know what happens to nails that stick out.

Anyway, when an object's count is reduced but remains above zero, it is inserted into a control set maintained by the collector. It is also marked with a special color, purple.

Coloring is a frequent technique in collection algorithms. In Garbage Collection, the typical colors used are black, grey, and white, but the colors indicate different things for different algorithms; in the case of Lins' algorithm, black indicates an object is known to be in use, grey indicates an object that may be garbage, and white indicates that an object is known to be garbage. Purple means sketchy.

Cycle detection works in three phases, for each object found in the control set that is still purple (objects can be recolored black by being used by the mutator):

  1. The object is marked grey, and all of its children recursively have their counts decremented and are marked grey. This step subtracts the effects of all of the pointers between objects in the sub-graph consisting of this object and its children---any reference count greater than zero must represent a pointer from outside this group of objects. Note that no deletion happens now. The objects may still be alive and if so, will have their reference counts restored.
  2. Starting from the same element of the control set, the algorithm scans grey objects, testing their current reference count. If the count is greater than zero, the object has an external pointer. The object and its children are recolored black and their reference counts restored. If the count is equal to zero, the object is colored white.
  3. Again starting from the element of the control set, white elements are deleted.

The third step as presented in Garbage Collection has issues. The algorithm there is:

collect_white(S) =
if colour(S) == white
for T in Children(S)

The two problems are, first, if the structure is indeed cyclic then this algorithm will recur infinitely, and second, a deleted object may be visited by a later stage of collect_white. Many memory management systems would not find the latter to be incorrect since freeing an object still leaves its storage under the care of the manager, but it does become a problem when an underlying allocator such as malloc is used. Weirdly, the former does not seem to be present in Lins' original paper.

There are some preliminary complications that need to be handled before we can look at this collector in Rust.

Type management and allocations

Back in the last post, the Rc and Box types were parameterized by the type T, the type of the user data. Doing so has a number of advantages: it is always completely type safe since the type is tracked from the creation of the Boxes and Rc's, through borrowing the object, to the deletion step. Further, deletion is simplified because a Box<T> can be allocated as an owned pointer, converted to an unsafe pointer for most of its life, and returned to an owned pointer to be deleted, correctly handling any deletion process for T.

Unfortunately, the collector I am working on here will be looking at objects of many different types. To see why a collector cannot deal with only a single type, consider what happens when a type T contains a reference to a type U, which in turn contains a reference to T: we are back in the realm of uncollectable garbage. Fortunately, Rust provides some assistance for us. (What the heck, it has to track this stuff.)

The key I am using is the TyDesc pointer, a pointer to a type descriptor provided by the std::unstable::intrinsics module. (Does that make you nervous? It should. I'm using Rust 0.9; this is probably all changed in the head of master. Fortunately, intrinsics seems to have been reasonably stable since 0.7 and the facilities have been present and used by the arena module since 0.5.)

TyDesc (I'm not going to bold every instance of Rust code anymore; when I last fiddled with the Maniagnosis fonts, all the boldy things became annoyingly busy.) is a structure containing a number of useful things including the size and alignment requirements for the type being described and the type's drop_glue, a function to be called to invoke the type's destructor (which may be explicit as in an instance of the Drop trait, or not as in the case of a type containing an owned pointer to a vector or something.

Rust makes extensive use of the Resource Allocation Is Initialization (RAII) technique, particularly for memory management in its standard library. That is the whole "owned pointer" thing. And no, Rust's types are not parameterized with an allocator like C++'s standard templates. (And I hope they never will be; that is insane.) As a result, drop_glue is very important. The alternative for a memory management system like this would likely be to provide separate implementations of a great deal of the standard library. Rust's original @ sigil, the associated "garbage collector," managed strings, and the at_vec module are all an example of the result. Rust isn't Haskell; multiple implementations of the standard prelude in the standard library are not really appreciated. Consistent, if not deterministic, handling of destructors must be a requirement for a memory management system in Rust.

To start, Box will no longer be parameterized (explicitly, anyway) by the user's type T. Instead, it will contain the reference count, a color, and a pointer to T's type descriptor.

enum Color { Black, Grey, White, Purple, }

struct Box {
color : Color,
count : uint,
tydesc : *TyDesc,
drawing of a Box

The Box and its object will be allocated in a single call to malloc, leaving the object immediately following the Box header. Because of the object's alignment requirements, there may need to be some padding between the Box and the object itself; the size of the Box and this padding along with the location of the object can be calculated using the size of the Box and the type descriptor.

fn padded(size : uint, align : uint) -> uint { (size + align - 1) & !(align - 1) }
fn box_offset(t : *TyDesc) -> uint { unsafe { padded( intrinsics::size_of::<Box>(), (*t).align ) } }

Allocating and initializing the Box and object is...interesting. Welcome to systems programming!

impl Box {
unsafe fn unsafe_new(t : T) -> *mut Box {
let tydesc = intrinsics::get_tydesc::<T>();
let offset = box_offset(tydesc);
let block = libc::malloc( (offset + (*tydesc).size) as u64 ) as *u8;
intrinsics::move_val_init( cast::transmute(block),
Box { color : Black,
count : 1,
tydesc : tydesc, } );
intrinsics::move_val_init( cast::transmute( ptr::offset(block, offset as int) ),
t );
block as *mut Box

move_val_init takes a pointer to an uninitialized block of memory plus a Rust value, and moves the value into the block without deinitializing the block or doing any other magic. cast::transmute is used to tell the type system that I really, really know what I'm doing. (It's a lie.) That is, to mean in the first use that block is a pointer to a Box-sized area, and in the second that the offset calculation is a pointer to a object-sized area. ptr::offset is used to compute the location of the object in a manner similar to C's array indexing; block plus offset times the size of *block. In this case, block is a pointer to bytes, which matches the offset calculation.

Deallocation is similarly interesting.

unsafe fn unsafe_free(ptr : *mut Box) {
let tydesc_t = (*ptr).tydesc;
((*tydesc_t).drop_glue)( ptr::offset(ptr as *u8, box_offset(tydesc_t) as int) as *i8 );
libc::free( ptr as *libc::c_void );

unsafe_free invokes the object type's drop_glue with a pointer to the object, as a pointer to (inexplicably) a signed i8. Then, it frees the entire Box and object block. This step is a pretty hairy issue.

If you look at Tim Peter's comment on python-dev, or a fair number of comments on python-dev in March, 2000, you will find a great deal of discussion about Python's cycle detector and its relationship to Python objects' finalizers.

If you would prefer to avoid reading Tim Peter's weird typographic tics in ancient Python mail, here's the takeaway: garbage collection and user-defined destructors are not a good combination in roughly the same way that licking a server rack is not a good idea. In either case, it doesn't taste very good and you end up with difficult-to-explain cuts. The bottom line is that Python does not automatically free objects that are participating in cycles and that have user-defined finalizers.

The problem is two-fold: on the one hand, finalizers can (sanely!) be used to clean up other resources such as file descriptors, but the nondeterministic nature of collection makes that much less attractive. (You will likely run out of file descriptors before you run out of memory.) On the other, finalizers can also be used to do weird things like resurrecting objects just before they would be deleted. These things may not be valid if some other, pointed-to, child, object has already been deleted.

Further, not calling the object's drop glue is not an option, since allowing the object to have owned pointers to other non-collected objects is important to avoiding the at_vec issue, and calling drop_glue seems to be the only way to free such owned pointers. I think it might be reasonable to prevent allocated objects from having user-defined finalizers, but I do not know how to do that at the moment. As a result, I am invoking the drop_glue before unconditionally releasing the object. (No resurrections here!)

Fortunately, Rust already offers plenty of opitions for non-collected objects with appropriate RAII semantics, which ameliorate the first problem and I am perfectly willing to put the second under the heading of "Don't do stupid things", with the corollary that one ought not to mix destructors and garbage collection. As a result, I don't think the Python answer to the question is particularly applicable to Rust.

I will cover the remaining reference counting operations later, but to illustrate one, the method used to get a reference to the object makes significant use of the type descriptor.

unsafe fn unsafe_get<T>(&self) -> *T {
// TODO: Ensure T matches tydesc?
let ptr : *u8 = cast::transmute(self);
ptr::offset( ptr, box_offset(self.tydesc) as int ) as *T

This method takes a type T and returns a pointer to what is hoped to indeed be a T. Currently, that is not verified at runtime. However the smart-pointer type is parameterized by T and is the only thing that can get to this method, so any invocation of this method has to be called with the right T. That is my story and I'm sticking to it.

Lins' operations

initial state of Lins' algorithm
Initial state when collecting: objects on the right are live; objects on the left are garbage.

The basic operations of Lins' algorithm are mark_grey, scan, scan_black, and collect_white. Currently, these are all implemented by the Box struct.

unsafe fn mark_grey(&mut self) {
if self.color != Grey {
self.color = Grey;
let children = self.children();
for &ch in children.iter() {
(**ch).count -= 1;

mark_grey acts to remove the effect of pointers from this object to other objects from the other object's reference counts. As a result, when it is complete any non-zero counts represent pointers from outside the (possibly cyclic) graph that this object participates in.

state after mark_grey
After mark_grey: all objects have been marked grey.

I will get into the children method later; it is a complex topic. For the moment, understand that it returns a collection of pointers to references to Boxes that represents the set of children of the object associated with this Box. Alternatively, it returns a collection of uninterpreted values, each representing an object directly referenced by this object, but that is saying the same thing. (See tracing below. Do you like systems programming yet?)

mark_grey is almost a direct translation of the algorithm, as is scan.

unsafe fn scan(&mut self) {
if self.color == Grey {
if self.count > 0 {
} else {
self.color = White;
let children = self.children();
for &ch in children.iter() {

scan_black restores the reference counts for objects that are still alive. Following the scan/scan_black step, the objects with non-zero external counts and their descendants account for both pointers from outside this graph and inside while objects without external references are colored white.

unsafe fn scan_black(&mut self) {
if self.color != Black {
self.color = Black;
let children = self.children();
for &ch in children.iter() {
(**ch).count += 1;
state after scan
After scanning: live objects are black, garbage is white.

Note that all of the sketchy objects are scanned by the collector. It is not readily apparent that a run of Lins' stages on any specific object will result in a collection of the cycle, since there may be multiple internal references that would keep it alive. However, I believe that in any cyclic directed graph, there will be at least one node representing a key object that, under this algorithm, would result in the collection of the cycle. I should probably check Lins' paper.

This, specifically, is the difference between this algorithm and the improvement published in the The Garbage Collection Handbook: this algorithm runs each stage in order on each element of the control set while the improved version runs on the entire control set at each stage. This algorithm may repeatedly mark and unmark an object before it is conclusively determined to be garbage.

collect_white is written thusly:

unsafe fn collect_white(&mut self, collected : &mut HashSet<*mut Box>) {
if self.color == White {
collected.insert(self as *mut Box);
self.color = Purple;
let children = self.children();
for &ch in children.iter() { (**ch).collect_white(collected); }
self.color = White;

This method uses a temporary purple color change to prevent the infinite recursion issue, and collects the white objects onto another structure for deletion to avoid the reference-after-free issue. The final line, task_collector().remove(self), removes this box from the control set maintained by the collector. (It also foreshadows more Rust follies, too.)

All of these operations are described and written recursively. mark_grey, scan, and scan_black can be trivially rewritten iteratively, using an auxiliary stack. I believe collect_white can be as well, with more complexity.

The collector

pub struct HybridRcCollector {
control_set : HashSet<*mut Box>,
trace_fns : HashMap<*TyDesc,fn (*u8) -> ~[*mut Box]>,

impl HybridRcCollector {
fn new() -> HybridRcCollector {
HybridRcCollector { control_set : HashSet::new(),
trace_fns : HashMap::new(), }

For the moment, the important element of the HybridRcCollector structure is the control_set, the collection of sketchy objects. I will get to trace_fns in a bit.

The two primary operations on the former are insert and remove.

fn insert(&mut self, ptr : *mut Box) { self.control_set.insert(ptr); }
fn remove(&mut self, ptr : *mut Box) { self.control_set.remove(&ptr); }

In addition to the call to remove in collect_white, there are calls in unsafe_delete, the Box method used to handle decrementing an object's reference count.

// Part of impl Box
unsafe fn unsafe_delete(ptr : *mut Box) {
if ptr.is_not_null() {
(*ptr).count -= 1;
if (*ptr).count == 0 {
} else {
(*ptr).color = Purple;

This is the method that actually decrements a reference count under normal circumstances, deletes the object if the count is zero, and inserts the object into the collector's control_set if its count has been decremented but is still greater than zero. The test for a null pointer is important later; let me present the GC method itself before I get into it.

unsafe fn unsafe_gc(&mut self) {
let mut deleting : HashSet<*mut Box> = HashSet::new();
loop {
// there has to be a better way to get elements out of a set
match self.control_set.iter().nth(0) {
Some(&ptr) => {
if (*ptr).color == Purple {
(*ptr).collect_white( &mut deleting );
None => { break; }
for &ptr in deleting.iter() {
let children = (*ptr).children();
for &ch in children.iter() {
// convert ch to a mutable pointer
let ch = ch as *mut *mut Box;
*ch = ptr::mut_null();

unsafe_gc is an implementation of Garbage Collection's gc_control_set algorithm. It loops over the elements in the control_set, removing those elements, and then calling the three major operations on each. The end result is that the live object graph is in the same state is was before, while garbage objects from cycles have been collected in deleting. At that point, all of the contents of deleting are deleted.

Note before actually calling Box::unsafe_free, the object's child references are set to null.

Remember discussing finalizers? The problem here is when an object A and an object B point to each other. Deleting A without taking special measures will drop the reference count on B, calling its destructor. Then, deleting B results in a second call to the destructor and seven years' bad luck all around, even if you avoid the infinite recursion. To break that cycle while still ensuring that every object's destructor is called, I null the object's reference-counted-child pointers.

The test in unsafe_delete prevents that method from doing anything when it is called from the drop method of a reference-counted pointer after that point; this breaks the recursion. When A is freed, its destructor is called, but B is not deleted. Instead, B is assumed to also be in the deleting collection; it will be cleanly removed when its turn arrives.

Be aware that if the object has a user-defined destructor that attempts to dereference reference-counted pointers in the object, bad things will happen. The rest of this code ensures safety, that doesn't.

Smart pointers: Hrc

Given what has gone before, the type-safe smart pointer type is pretty trivial. It has the same borrow methods as the Rc type from the last post and roughly the same Drop and Clone implementations.

pub struct Hrc<T> { ptr : *mut Box, }

impl <T> Hrc<T> {
pub fn borrow<'l>(&'l self) -> &'l T {
unsafe { cast::transmute( (*self.ptr).unsafe_get::<T>() ) }

// Bad programmer! Bad! No biscuit!
pub fn borrow_mut<'l>(&'l mut self) -> &'l mut T {
unsafe { cast::transmute( (*self.ptr).unsafe_get::<T>() ) }

impl<T> Drop for Hrc<T> {
fn drop(&mut self) { unsafe { Box::unsafe_delete(self.ptr as *mut Box); } }

impl <T> Clone for Hrc<T> {
fn clone(&self) -> Hrc<T> {
unsafe { (*(self.ptr)).unsafe_inc(); }
Hrc { ptr : self.ptr }

Now, about that children() thing.


Going back to the taxonomy from the last post, this hybrid reference counter and cycle detector is actually a stationary tracing collector. The next hard part to deal with is tracing.

In a traditional collector, tracing starts from roots consisting of the stack, static variables, and registers. In this algorithm, tracing starts from the collector's control set (hybrid reference counting makes a slightly different use of tracing than mark-sweep or copying collectors). Tracing then follows pointers embedded in objects. If you thought that last post had all the vocabulary you might want, you were wrong. There are roughly two ways of identifying roots and pointers in objects: conservatively or precisely.

One sort of middle ground can be found by using tags associated with pointers. Without compiler support, however, some variant of conservatism is almost the only option. However, pure conservatism would be problematic in Rust. Take for example the Cells used in the mutator simulator. Conservatively tracing a cell would mean following a pointer to the location of the owned vector (outside of any memory managed by a collector), then scanning it for heap pointers.

There are another couple of possibilities. One that I haven't explored is to use Rust's really unstable-looking reflection support, also in std::unstable::intrinsics, which might make it possible to find pointers in a data structure by type-directed scanning. That approach feels like it would be slow, however.

For now, I am forcing the user to supply a tracing function for types stored in reference-counted boxes by requiring the trait Trace.

An implementation of Trace should provide a static method that takes a reference to the same type and returns a vector of TracePtr's.

pub struct TracePtr { priv ptr : **mut Box, }

trait Trace {
fn trace(ptr : &Self) -> ~[TracePtr];

There really has to be a better way to do that than return a vector.

TracePtr values are provided by the trace method in Hrc.

// method in Hrc
pub fn trace(&self) -> TracePtr {
TracePtr{ ptr : &self.ptr }

Some implementations of Trace are trivial.

impl Trace for int {
fn trace(_ : &int) -> ~[TracePtr] { ~[] }

While others are less so. This is an implementation for objects consisting of a vector of smart pointers:

impl Trace for ~[Cell] {
fn trace(this : &~[Cell]) -> ~[TracePtr] {
this.iter().map( |&C(ref e)| e.trace() ).collect()

The Trace trait is a requirement for an object being allocated in a smart pointer.

pub fn allocate<T:Trace>(t:T) -> Hrc<T> {
Hrc { ptr : unsafe { Box::unsafe_new(t) } }

Yes, I just smuggled the actual allocator interface, allocate, through in the middle of a completely unrelated section. Sue me.

The fancy trick here is the line, task_collector().register::<T>(). register is a method in HybridRcCollector that is only interested in the type parameter, T. It looks up the type descriptor for the type T, gets a pointer to the trace function implemented by T, and uses the former to store the latter in the trace_fns map.

// method in HybridRcCollector
unsafe fn register<T:Trace>(&mut self) {
let tydesc = intrinsics::get_tydesc::<T>();
let trace_fn : fn (&T) -> ~[TracePtr] = Trace::trace;
self.trace_fns.insert(tydesc, cast::transmute(trace_fn));

Here's the neat thing: because of the way I've defined Trace, TracePtr, and Hrc::trace, the functions stored in trace_fns are in fact of the type fn (*u8) -> ~[**mut Box], as long as the type used to get the key to trace_fns is the same as the type of the object pointed to by the argument of the function. (Did I mention that this was systems programming? There are a few other assumptions, such as that the ptr in TracePtr will be the only bytes of the object. These sorts of things are common in C programming and are the cause of some of the weird requirements of the C standards. I hope that the Rust standard follows along.)

Once the trace_fns map is filled correctly, the Box's children method can built in stages, first by recovering the trace function.

// in HybridRcCollector
fn get_trace_fn<'l>(&'l self, tydesc : *TyDesc) -> &'l (fn (*u8) -> ~[**mut Box]) {

And then calling it with the object attached to the Box, to produce a vector of pointers to child Boxes.

// in Box
unsafe fn children(&self) -> ~[**mut Box] {
let collector = task_collector(); // "does not live long enough" if used in the expression
let trace = collector.get_trace_fn(self.tydesc);
let t = ptr::offset( cast::transmute(self), box_offset(self.tydesc) as int );


There is one remaining complication, and fantastic Rust-ism, necessary for this memory manager. Many of the Box methods, as well as user code, need to access the HybridRcCollector structure. Fortunately, this code is serial, meaning that it runs in a single task, and Rust provides a hook to hang task-local data on: the local_data module.

(I stole most of this from the rand module, which has the TaskRng task-local random number generator.)

local_data_key!(HYBRID_RC_COLLECTOR_KEY : ~HybridRcCollector)

pub struct TaskCollector {
priv collector: *mut HybridRcCollector,

pub fn task_collector() -> TaskCollector {
local_data::get_mut(HYBRID_RC_COLLECTOR_KEY, |alloc| match alloc {
None => {
let mut hrc = ~HybridRcCollector::new();
let ptr = &mut *hrc as *mut HybridRcCollector;
local_data::set(HYBRID_RC_COLLECTOR_KEY, hrc);
TaskCollector { collector : ptr }
Some(hrc) => {
TaskCollector { collector : &mut **hrc }

A TaskCollector is a pointer to the task's collector, lazily initialized when task_collector is first called. The TaskCollector itself is mostly a facade calling the unsafe HybridRcCollector's methods.

impl TaskCollector {
pub fn insert(&self, ptr : *mut Box) { unsafe { (*self.collector).insert(ptr); } }
pub fn remove(&self, ptr : *mut Box) { unsafe { (*self.collector).remove(ptr); } }

pub fn gc(&self) { unsafe { (*self.collector).unsafe_gc() }; }

pub fn collecting(&self) -> bool { unsafe { (*self.collector).collecting } }

pub fn register<T:Trace>(&self) { unsafe { (*self.collector).register::<T>() } }
pub fn get_trace_fn<'l>(&'l self, tydesc : *TyDesc) -> &'l (fn (*u8) -> ~[*mut Box]) {
unsafe { (*self.collector).get_trace_fn(tydesc) } }

Currently, a Rust program blows up amusingly if the HybridRcCollector has a Drop destructor. However, if it were to work, that would be an excellent place to put a final call to the garbage collector, to clean up the allocated memory as the task is exiting.


How does it work? Pretty well.

$ valgrind ./hybrid-reference-count-trace-test
[ creates: 399553, deletes: 300018, links: 200216, unlinks: 31167, ops: 1000000 ]
==20660== HEAP SUMMARY:
==20660== in use at exit: 40 bytes in 1 blocks
==20660== total heap usage: 805,675 allocs, 805,674 frees, 58,457,125 bytes allocated
==20660== LEAK SUMMARY:
==20660== definitely lost: 0 bytes in 0 blocks
==20660== indirectly lost: 0 bytes in 0 blocks
==20660== possibly lost: 0 bytes in 0 blocks
==20660== still reachable: 40 bytes in 1 blocks
==20660== suppressed: 0 bytes in 0 blocks

361 is the length of the control set just before the collector is run as the task is finishing.

Future work

I have a few things to do before I can slap some lipstick on this pig and release it into the wild.

On the other hand, I think I can get something like a mark-sweep collector out of the techniques I have here, which would possibly have better, the same, or worse overhead along with better or worse collection delays.

active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote quotes R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.