Automatic memory management: reference counting
Last December, when I wrote about Flex- and BisonModule, I mentioned that it was one of my favorite hacks. But I did not describe what the hack was.
Python uses reference-counted memory management. When I wrote BisonModule, that became the chief obstacle: while flex behaves very simply, bison takes control of the program and calls back into Python code to read tokens and to build the parse tree. As a result, managing the lifetimes of symbol objects is a bit tricky. My first attempt dug rather deep into bison's runtime and used the fact that you can poke around with negative sub-expression references ("$1" in the right-hand part of a rule, for example) to access previous symbols. The result was ugly and stood a very good chance of leaking memory like a sieve.
So I scrubbed that and started over; I created an array of Python objects for tokens and symbols and arranged for this array to "own" the objects. Every new symbol or token object was added to the array, with a reference count of 1, for the array ownership. Adding an object to the parse tree incremented the reference count for that object and setting the tree root provided an external pointer and reference count bump to keep the tree from being deleted. Getting rid of the extraneous objects was as simple as walking the array when the parse completed and decrementing each object's reference count. Objects that then had zero counts were garbage and immediately deleted; objects that were part of the tree had non-zero counts and were returned to Python. The algorithm is simple and does not require intimate knowledge of bison's internals, but it does have a weakness---all of the tokens and symbols produced in intermediate steps in the parse are kept around until the parse is completed.
It also does not leak memory, even when using Python's reference counting.
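The ownership-array scheme is easy to sketch with a counted pointer type. Here is a toy version in current Rust, using the standard library's Rc as the counted pointer; the names pool, token, and symbol are my own inventions for illustration, not BisonModule's:

```rust
use std::rc::Rc;

// The pool owns one reference to every symbol created during the parse;
// the tree holds its own references. Dropping the pool when the parse
// completes frees exactly the symbols the tree does not reference.
fn parse_sketch() -> Rc<&'static str> {
    let mut pool: Vec<Rc<&'static str>> = Vec::new();
    let token = Rc::new("discarded token");   // hypothetical symbols
    let symbol = Rc::new("tree symbol");
    pool.push(Rc::clone(&token));
    pool.push(Rc::clone(&symbol));
    let root = Rc::clone(&symbol);            // the tree's reference
    drop(token);
    drop(symbol);
    drop(pool);  // end of parse: only tree-referenced symbols survive
    root
}
```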
Automatic memory management
(There was a beautiful segue here, somewhere.)
There are three fundamentally different styles of managing dynamically-allocated memory in a computer program:
Manually. Manual memory management involves calling new, malloc, or something similar, to allocate a new block of memory, and free, delete, or something like that to release it. Manual memory management is not difficult, really, but it is a horrendous pain in the rump. It is one of the primary factors that makes systems programming mostly bookkeeping. I don't intend to write much about manual memory management here.
Semi-manually. Or, alternatively, semi-automatic memory management. Oddly enough, this is one of the best approaches, if the problem allows it. Of course, the only technique that I know about in this category is the arena, where an allocator maintaining a large block of memory hands out that memory in small pieces for use by the program; rather than deleting the individual pieces, the allocator frees the entire block in one operation. With this scheme, allocations within the region are cheap: bump up a high-water-mark in the region and return a pointer to the space between the old and new values of the mark. Deallocation is also cheap, since nothing is done with the small pieces until the entire region is discarded. For some investigation of their performance, see "Reconsidering Custom Memory Allocation", by Emery Berger, Benjamin Zorn, and Kathryn McKinley.
Unfortunately, arena-based management does not always apply. It works best when the program has (possibly nested) phases: at the beginning of a phase, the program can create a region, use it during the phase, and discard it at the end of the phase. It works less well when allocations of different lifespans are intermingled or when the program does not have clearly identifiable and separate phases. Two successful examples are Apache httpd, which supplies the code handling a request with an arena whose lifespan matches the processing of that single request, and the passes of a multi-pass compiler.
Rust currently includes an arena memory manager module, although I haven't explored it closely. At the moment, I do not intend to look at semi-manual management further.
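The bump-the-mark scheme fits in a few lines; here is a toy, index-based version in current Rust. (A sketch only: a real arena hands out pointers and respects alignment, both of which this skips.)

```rust
// A toy bump arena: allocation bumps a high-water mark, deallocation
// frees everything in one operation.
struct Arena {
    buf: Vec<u8>,  // the large block of memory
    mark: usize,   // high-water mark
}

impl Arena {
    fn new(capacity: usize) -> Arena {
        Arena { buf: vec![0u8; capacity], mark: 0 }
    }

    // Hand out `size` bytes: return the offset of the piece, or None
    // if the region is exhausted.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.mark + size > self.buf.len() {
            return None;
        }
        let offset = self.mark;
        self.mark += size;  // bump the mark
        Some(offset)
    }

    // Discard the entire region's contents at once.
    fn reset(&mut self) {
        self.mark = 0;
    }
}
```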
Automatically. Automated memory management puts the work of dynamic memory management, specifically, the work of deallocating memory, off onto a software system, saving the programmer effort at a greater or lesser cost of efficiency and convenience. This topic is what I want to look at today.
Before I go further, I need some specialized vocabulary. If you are peering through automatic-memory-manager-colored glasses, the world is a strange place.
The program that you or I would think of as doing all the work is just a mutator. A mutator does one of four things: it allocates an object; deallocates an object; links two objects, adding a pointer from one object to another; and unlinks two objects, removing a link from the first to the second. (Strictly speaking, the last may not be necessary; I am including it for extra verisimilitude.) And an object is a block of memory allocated and used by the mutator. These objects are allocated from a heap, the chaotic source of all memory.
Continuing with my list above, there are several different kinds of automated memory managers:
Reference counting. Yes, reference counting is an automatic memory management scheme, although it requires a considerable amount of help from the mutator. Reference counting works by associating a counter with every allocated object; this counter contains the number of pointers that refer to the allocated object. As long as the reference count is greater than or equal to one, the object may be reachable and potentially in use. But as soon as the reference count goes to zero, the object is garbage and may be deleted. Copying a pointer requires incrementing the reference count and removing a pointer requires decrementing it; those actions need to be taken by the mutator.
Tracing garbage collectors. In any memory management scheme, there will be roots, pointers outside the managed heap that point at objects inside the heap. Further, objects inside the heap may contain pointers to other objects inside the heap. A path of pointers from a root is required for any object to be considered live, to be available for use by the mutator. Tracing garbage collectors periodically follow, or trace, the pointers from the roots to identify all of the live objects. Or, really, to identify the objects which are not live, the garbage. These garbage objects can then be freed or reused.
There are two broad categories of tracing garbage collectors:
Stationary garbage collectors, in which an object, once allocated, does not change its location in memory. A mark-sweep collector is a typical example; these work by walking through the live objects from the roots and marking them, then freeing the remaining, unmarked objects in a sweeping phase.
Moving garbage collectors, which may move an object once it has been allocated. A two-space copying collector is a typical example of a moving collector, where objects are initially allocated in one half of the heap. Then, when that half, or space, has filled, objects are traced from the roots and copied into the second space. Once all of the live objects have been moved to the second space, the original space contains only garbage, which is freed.
Ignoring some very significant second-order effects, the primary distinction between stationary and moving collectors is that the former only need to identify one pointer to an object, to determine that it is live. The latter must identify all pointers to an object, in order to update them with its new location.
My terms are a bit nonstandard. Stationary tracing collectors are frequently referred to as mark-sweep collectors and moving collectors as copying collectors, after their prototypical algorithms, mark-sweep and two-space copying.
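The mark phase of a stationary collector is also small enough to sketch. This toy version (my own simplification, in current Rust) represents the heap as a slice of objects, each object just a list of indices of the objects it points to:

```rust
// A toy mark phase: roots are indices into the heap; each heap slot
// lists the indices of the objects it points to.
fn mark(heap: &[Vec<usize>], roots: &[usize]) -> Vec<bool> {
    let mut marked = vec![false; heap.len()];
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(i) = stack.pop() {
        if !marked[i] {
            marked[i] = true;        // this object is live
            stack.extend(&heap[i]);  // trace its outgoing pointers
        }
    }
    // A sweep phase would now free every object whose mark is still false.
    marked
}
```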
There are hordes of variations on each of these categories; the originals of most date back to Lisp in 1960, and many people have done a great deal of exploration; some of the most recent changes in the Java Virtual Machine have been new garbage collectors. The reference I am currently using is Garbage Collection: Algorithms for Automatic Dynamic Memory Management, by Richard Jones and Rafael D. Lins, a book that I've had for years. It describes a great many variations on the basic themes. My copy of The Garbage Collection Handbook: The Art of Automatic Memory Management, by Jones, Antony Hosking, and Eliot Moss, is on its way from Amazon as I type; I hope it is as good and more up-to-date.
In the meantime, let's talk reference counting. Here is a simple reference counter in Rust:
Reference counting in Rust
#[no_send]
pub struct Rc<T> { priv ptr : *mut Box<T>, }
A value of type Rc<T> represents a pointer to a box containing a value of type T. The type is marked no_send to prevent it from being shared with another thread; there is no locking here. Also, an unsafe '*' pointer is used to prevent the behavior of Rust's other pointer types from getting in the way.
A Box is:
struct Box<T> {
count : uint,
value : T,
}
The count is the number of references from Rc pointers while value is the value. (Note that the Rust standard library's own Rc type goes through an extra level of indirection for mutable values, in order to support run-time borrow checking and to match the language's other safety guarantees.)
The basic operations
The basic operations are creating a new Rc pointer and Box, and borrowing the value in the Box.
// unsafe operations
impl<T> Rc<T> {
pub unsafe fn unsafe_new(t : T) -> Rc<T> {
Rc { ptr : cast::transmute( ~Box{ count : 1, value : t } ) }
}
}
// safe operations
impl <T> Rc<T> {
pub fn new(t:T) -> Rc<T> { unsafe { Rc::unsafe_new(t) } }
pub fn borrow<'l>(&'l self) -> &'l T { unsafe { &(*self.ptr).value } }
// DANGER! DANGER!
pub fn borrow_mut<'l>(&'l mut self) -> &'l mut T {
unsafe { &mut (*self.ptr).value }
}
}
unsafe_new creates a new box with a given value and a count of one, and returns a pointer to it wrapped in an Rc type. cast::transmute converts the ~ owning pointer into a * unsafe pointer; using an owned pointer makes it easy to free the Box when needed, but the Rc value does not own the box and the raw owned-pointer behavior would get in the way.
The new function provides a safe, simple wrapper for creating boxes and Rc values, and the borrow method allows the mutating code access to the value in the box. The lifetime on borrow indicates that the borrowed pointer must not outlive the Rc value; by construction, the Rc value will not outlive the Box containing the value.
borrow_mut is a bad idea. It allows a mutator to change the value in the box. Rust's standard library Rc type goes to considerable lengths to ensure that it cannot be used to violate Rust's safety. This thing doesn't. Don't do this. But for my simple example here, it's fine.
The magic
The magic happens on dropping and cloning Rc values.
#[unsafe_destructor]
impl<T> Drop for Rc<T> {
fn drop(&mut self) {
if self.ptr.is_not_null() {
unsafe {
(*self.ptr).count -= 1;
if (*self.ptr).count == 0 {
let _ : ~Box<T> = cast::transmute(self.ptr);
self.ptr = ptr::mut_null();
}
}
}
}
}
impl <T> Clone for Rc<T> {
fn clone(&self) -> Rc<T> {
unsafe { (*self.ptr).count += 1; }
Rc { ptr : self.ptr }
}
}
The Drop trait specifies a destructor or finalizer for a Rust type. In this case, the Rc being removed results in a decrement of the count in the Box. Then, if the count is zero, the box is converted back into an owned value, which is immediately destroyed. Because the type T has been tracked all through the Rc code, this ensures that any finalization of the T value happens.
A "bad interaction" between Drop and @-pointers forces the #[unsafe_destructor] annotation; otherwise Rust rejects a Drop implementation for a parameterized type.
The Clone trait copies the Rc value by creating a new Rc value pointing to the same box, in which the count is incremented.
And that is all she wrote, as far as simple reference counters go.
Rust includes a reference counting memory management module in its standard library. That module is roughly the basis of this one, but mine is considerably simplified.
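That standard library type also exposes its count, which gives a quick way to watch the clone-and-drop lifecycle described above (a sketch in current Rust, using Rc::strong_count):

```rust
use std::rc::Rc;

// Watch the count as pointers are copied and removed.
fn lifecycle() -> (usize, usize) {
    let a = Rc::new(5);         // count starts at 1
    let b = Rc::clone(&a);      // copying the pointer: count = 2
    let during = Rc::strong_count(&a);
    drop(b);                    // removing a pointer: count = 1
    (during, Rc::strong_count(&a))
}
```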
Mutants! Mutants everywhere!
In order to play with this and other memory managers, I need a mutator. Theoretically, I could write a number of real, semi-useful programs that use the managers and try them in a live situation. But I'm not going to because I'm lazy. Instead, I am going to use a quick and dirty mutator harness to exercise the manager, to see if it fails spectacularly.
The first requirement for the mutator simulator is an implementation of the four operations.
enum Op { Create, Delete, Link, Unlink, }
pub trait Store {
fn perform_create(&mut self); // Create
fn perform_delete(&mut self); // Delete
fn perform_link(&mut self); // Link
fn perform_unlink(&mut self) -> bool; // Unlink
}
An object implementing the trait Store provides methods (using a heap) to create an object, delete an object, and link and unlink objects. Because the object chosen for the operation may not have links, perform_unlink returns a bool indicating whether it did anything. All of the other operations either succeed or cause the mutator simulator to fail.
The mutator
A Mutator itself contains an implementation of Store, a random-number generation framework, and a set of operation counts.
pub struct Mutator<'s,S> {
store : &'s mut S,
range : Range<f32>,
rng : TaskRng,
c_count : uint,
d_count : uint,
l_count : uint,
u_count : uint,
o_count : uint,
}
impl<'s, S> ToStr for Mutator<'s,S> {
fn to_str(&self) -> ~str {
format!("[ creates: {}, deletes: {}, links: {}, unlinks: {}, ops: {} ]",
self.c_count, self.d_count, self.l_count, self.u_count, self.o_count)
}
}
The only output of the test program is the number of operations that have been performed. The actual verdict will be delivered by valgrind, an instrumentation framework that can identify a plethora of memory management bugs.
Creating a new Mutator is trivial, as expected.
impl<'s, S:Store> Mutator<'s,S> {
pub fn new(s:&'s mut S) -> Mutator<'s,S> {
Mutator {
store : s,
range : Range::new(0f32, 1f32),
rng : rand::task_rng(),
c_count : 0,
d_count : 0,
l_count : 0,
u_count : 0,
o_count : 0,
}
}
Once the mutator has been configured, it can repeatedly choose and execute an operation.
pub fn perform_operation(&mut self) {
match self.choose() {
Create => { self.store.perform_create(); self.c_count += 1; },
Delete => { self.store.perform_delete(); self.d_count += 1; },
Link => { self.store.perform_link(); self.l_count += 1; },
Unlink => { if self.store.perform_unlink() { self.u_count += 1; } },
}
self.o_count += 1;
}
The mutator chooses the operation to perform randomly; in this code, with a 40% chance of creating a new object, a 30% chance of deleting an object, a 20% chance of linking two objects, and a 10% chance of unlinking them.
fn choose(&mut self) -> Op {
let x = self.range.ind_sample( &mut self.rng );
if x < 0.4f32 { Create }
else if x < 0.7f32 { Delete }
else if x < 0.9f32 { Link }
else { Unlink }
}
}
The store
The final element of the mutator is a simulation of a heap of memory: a collection of cells of type T. For this heap simulation, there are two basic operations, peek, which returns the contents of a cell in the heap, actually removing it from the heap, and poke, which stores an object into a cell in the heap. In order to mimic the memory access behavior of a real program, for loose definitions of "mimic", a random cell is fetched for each call to peek.
trait RandomHeap<T> {
fn peek(&mut self) -> T;
fn poke(&mut self, t : T);
}
impl<T> RandomHeap<T> for ~[T] {
fn peek(&mut self) -> T {
let i = rand::task_rng().gen_range(0, self.len());
return self.swap_remove(i)
}
fn poke(&mut self, t : T) { self.push(t); }
}
(In peek, rand::task_rng() returns a task-local random number generator.)
The test program
The representation of a memory cell capable of being linked and unlinked, using my simple Rc, is Rc<~[...]>, a box containing an array of something. The something is ultimately going to be more references, so to prevent an infinite type-level recursion, I introduce a Cell type:
enum Cell { C( Rc<~[Cell]> ), }
Cell and its constructor, C, do not do anything other than breaking up Rc<~[Rc<...>]>.
As a next step, I implemented Store for a vector of Rc cells, making use of the random heap operations.
impl Store for ~[Rc<~[Cell]>] {
fn perform_create(&mut self) { self.poke( Rc::new( ~[] ) ); }
fn perform_delete(&mut self) { self.peek(); }
fn perform_link(&mut self) {
let mut l : Rc<~[Cell]> = self.peek();
let r : Rc<~[Cell]> = self.peek();
l.borrow_mut().push( C(r.clone()) );
self.poke( l );
self.poke( r );
}
fn perform_unlink(&mut self) -> bool {
let mut res = false;
let mut l = self.peek();
{
let contents = l.borrow_mut();
if contents.len() > 0 {
match contents.pop() {
C(r) => { self.poke(r); res = true; },
}
}
}
self.poke(l);
res
}
}
And finally, I create the Store, the Mutator, introduce an initial population of 100 cells, and set the mutator off to work.
fn main() {
let mut s : ~[Rc<~[Cell]>] = ~[];
let mut m = Mutator::new(&mut s);
for _ in range(0, 100) { m.store.perform_create(); }
for _ in range(0, 1000000) { m.perform_operation(); }
println(m.to_str());
}
Results
If the mutator simulator is run without the ability to link objects together, it allocates and frees a fair amount of memory without losing track of any.
[ creates: 398879, deletes: 301084, links: 200009, unlinks: 0, ops: 1000000 ]
==20013==
==20013== HEAP SUMMARY:
==20013== in use at exit: 40 bytes in 1 blocks
==20013== total heap usage: 800,686 allocs, 800,685 frees, 45,327,266 bytes allocated
==20013==
==20013== LEAK SUMMARY:
==20013== definitely lost: 0 bytes in 0 blocks
==20013== indirectly lost: 0 bytes in 0 blocks
==20013== possibly lost: 0 bytes in 0 blocks
==20013== still reachable: 40 bytes in 1 blocks
==20013== suppressed: 0 bytes in 0 blocks
The mutator made approximately 400,000 create operations, approximately 300,000 deletes, and 200,000 attempts to link objects (I emasculated the operation itself, not its invocation by the simulator). The remaining hundred thousand operations were attempts to unlink objects, all of which failed.
The remainder of the output is courtesy of valgrind, and indicates that only one of the 800,686 blocks allocated was not freed before the program ended. (That remaining 40-byte block is in the Rust runtime somewhere.)
If, however, the simulator has the ability to link objects, valgrind tells a different story.
[ creates: 399155, deletes: 300946, links: 199940, unlinks: 31003, ops: 1000000 ]
==19863==
==19863== HEAP SUMMARY:
==19863== in use at exit: 124,616 bytes in 2,547 blocks
==19863== total heap usage: 803,544 allocs, 800,997 frees, 45,679,158 bytes allocated
==19863==
==19863== LEAK SUMMARY:
==19863== definitely lost: 240 bytes in 3 blocks
==19863== indirectly lost: 124,336 bytes in 2,543 blocks
==19863== possibly lost: 0 bytes in 0 blocks
==19863== still reachable: 40 bytes in 1 blocks
==19863== suppressed: 0 bytes in 0 blocks
Here, the program has lost 2,546 objects, amounting to about 124 KB of memory; not a lot on this run, but enough.
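The lost objects are, almost certainly, cycles: once link can make object A point at B and B point back at A, each keeps the other's count above zero, so neither is ever freed even though both are unreachable. The same failure is easy to reproduce with the standard library's Rc (a sketch in current Rust; Node is a hypothetical type of my own):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A cell that may point at one other cell.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Build a two-node cycle and report each node's reference count.
fn cycle_counts() -> (usize, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));  // now a -> b -> a
    (Rc::strong_count(&a), Rc::strong_count(&b))
    // When a and b go out of scope, each count falls to 1, never 0:
    // the pair is unreachable but never freed.
}
```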