Monday, February 25, 2013

Creating a Letterpress cheating program in Rust

Update: Creating a Letterpress cheating program in Rust, v2.

As I believe I have mentioned, I have been casually learning Rust (currently, Rust 0.5) by working on a port of Jeff Knupp's Creating and Optimizing a Letterpress Cheating Program in Python. It's now time to describe my current progress.

For those of you with short attention spans, this is the punchline:

ProgramDuration (sec)Speedup
pesser_one.py49.71
anagrams-hashmap.rs24.02
anagrams-vectors.rs17.63
presser_two.py12.54
presser_three.py6.08
anagrams-vectors.c5.69
anagrams-hash.c0.955

The argument for all of these runs was "asdwtribnowplfglewhqagnbe", which produces 7440 results from my dictionary; I get 33,554,406 combinations. I used Python 2.7.3; gcc 4.6.3, with -O3; and rustc 0.5, with -O. (I missed "--opt-level 3" the first time around and did get some improvement when I used it. But that's not shown here.)

I started by getting all three of the Python versions from Knupp working, to have something to compare against. (I would like to point out that presser_one.py is not something I would consider idiomatic Python, but I do need something to make the rest look better.) Then, I wrote the anagrams-hashmap Rust version, the meat of which looks like:

fn main() {
    let args = os::args();
    if args.len() < 2 { fail ~"Usage: anagrams letters"; }
    let mut letters = str::chars(args[1]);
    std::sort::quick_sort(letters, |a,b| *a <= *b);
    let dictionary = load_dictionary();
    let mut set : Set<@~str> = HashMap();
    for uint::range(2,letters.len()) |i| {
        for combinations::each_combination_ref(letters,i) |combo| {
            let ana = @vec::from_fn(combo.len(), |i| *combo[i]);
            match dictionary.find(ana) {
                Some(ref val) => {
                    for val.each() |word| { set_add(set, *word); }
                }
                None => { }
            }
        }
    }
    let result = vec_from_set(set);
    println(fmt!("%u", result.len()));
}

This code uses the each_combination function I described last time. Unfortunately, the hashmap interface (and implementation) are a bit of a mess in 0.5, leading to some unpleasant code and probably poor performance. Specifically, I'm looking at the line "@vec::from_fn(combo.len(), |i| *combo[i])", which copies the contents of the combination vector into a separate vector, since the call to "find(ana)" is going to take ownership.

Unhappy with the performance of that, I wrote the anagrams-vectors Rust version, which avoids the use of the hashmap for the dictionary, replacing it with a binary search similar to presser_one.py. In fact, it uses a conversion of the same bisect_left function. The fact that anagrams-vectors shows improvement should tell you there's something fishy in the state of Denmark (er, hashmap). (Unfortunately, the Set used to contain the resulting words is a thin wrapper on the hashmap, so I did not completely remove that module.)

Following several attempts to improve the Rust results without much success, I decided to dust off my C and go old-school, medieval in fact, on this sucker. I first came up with the vector-based C version, which mmaps the anagram dictionary file and builds its dictionary structure based on that file's sorted lines. Given that attempt's not-completely-satisfactory success, I added the hash version, which builds a pseudo-hashtable over the mmap'ed dictionary. That version is about as far as I'm willing to go at the moment.

There are probably a few worthwhile notes on the C versions:

  • There is no reusability in the C code. I'm not sure if there would be a point, given the focus on getting the fastest code quickly. On the other hand, it is fairly difficult to build reusable data structures such as hashmaps in C.
  • I am using some less-well-known parts of C. mmap is one example; another is using alloca for memory allocation; it puts the allocated space in the current stack frame, making deletion easier. (On the other hand, you can blow your stack pretty easily with it.)
  • The hash is djb2 from Hash Functions; it is dead simple and works fairly well. To keep the maximum chain length down (for linear chaining), the hashtable load factor is a pitiful 0.2; I tried several other values, but increasing it did cost time, while decreasing it further did not offer much improvement. For a load factor of 0.5, the maximum chain length was 41 if I remember correctly, for 0.2, it was 9 with 58505 entries.
  • I used a bitset to hold the results, noting that each word in the output could come from exactly one line, all of the words on the line would appear if any did, and each line could appear exactly once. Yes, that's 58,000 bits. Using integers did not seem to make any special difference over the bit-manipulations, and I was worried about stack space.

I expect the performance to improve as Rust work continues. In fact, the hashmap interface is much better in the version in the core library in incoming. When 0.6 lands (or sooner), I'll update this code and provide up-to-date results.

No comments: