Creating a Letterpress cheating program in Rust, v2

Posted on March 9, 2013 by Tommy McGuire

I have updated my horrible, horrible Rust anagram-esque code for a (relatively) up-to-date build of the Rust *master* branch, rustc 0.6 (edcebac 2013-03-03 20:02:06 -0800). There were a number of simple syntax changes, along with a much nicer hashmap.

I also fixed a grotesque performance problem that improved the Rust programs' times considerably, so these times are not *directly* comparable to the previous post.

To start, here are the current timings:

Program	Duration (sec)	Speedup
pesser_one.py	49.7	1
presser_two.py	12.5	4
anagrams-vectors.rs	7.9	6
presser_three.py	6.0	8
anagrams-hashmap.rs	5.8	9
anagrams-vectors.c	5.6	9
anagrams-hash.c	0.9	55

Once again, the argument for all of these runs was "asdwtribnowplfglewhqagnbe", which produces 7440 results from my dictionary; I get 33,554,406 combinations to look at. I used Python 2.7.3; gcc 4.6.3, with -O3; and rustc, with -O.

So, what did I change?

Overall, I'm now using core::hashmap::linear::LinearMap and LinearSet instead of the HashMap that used to be in the std crate. LinearMap has a much nicer, if still imperfect, interface.

The fail "keyword" is now a macro: fail!.

anagrams-vectors.rs

The previous version use a (owned) vector of (owned) vectors of chars to store the "keys" from the dictionary file: ~[~[char]]. To look up a key, made from a combination of the input letters (a (borrowed) vector of chars), I use a binary search into this vector. To make the binary search work, two vectors of chars need to be comparable, to implement the Ord trait. Unfortunately (and I need to check on this), char's don't have an implementation of Ord. So, I added one:


impl char : Ord {
    #[inline(always)] pure fn lt(&self, other: &char) -> bool { *self <  *other }
    #[inline(always)] pure fn le(&self, other: &char) -> bool { *self <= *other }
    #[inline(always)] pure fn gt(&self, other: &char) -> bool { *self >  *other }
    #[inline(always)] pure fn ge(&self, other: &char) -> bool { *self >= *other }
}

One of the recent changes to rustc, however, makes that illegal because it is problematic to implement a trait for a type originating from a different crate. As a result, I removed that implementation and made the keys vectors of int's (converting the char's to int's). The int's are comparable.

By the way, one of the syntax changes was to replace "impl char : Ord ..." with "impl Ord for char...". That declaration is now much more readable.

anagrams-hashmap.rs

Once again, I converted the vector of char's to a vector of int's. In this case, that did not result in any major changes.

The major performance change (which I did make to both programs) was to move the allocation of the actual key used by the lookup out of the each_combination inner loop. That is the major performance change.

The combination provided by each_combination is a &[int], while the call to dictionary.find(...) wants a &~[int] because the key type, K, for the dictionary is ~[int]. As a result, I need to copy the combination's values out of one vector and into an appropriately allocated vector. The previous version was allocating the copy in the inner loop; this version allocates it in the outer loop, where the size is known, and inserts the values in the inner loop. In other words, the previous version had:


let key = @vec::from_fn(combo.len(), |i| combo[i]);

(You can ignore the "@", which was needed by the previous hashmap to make the key's Copy-able.)

The current version uses:


let mut key = vec::from_elem(size, 0);              // outer loop
...
    for uint::range(0,i) |j| { key[j] = combo[j]; } // inner loop

Avoiding the allocation was a big win. Avoiding the copy would also be a win, but I don't know how to do that. Yet.