Letterpress cheating in Rust 0.7

Posted on July 6, 2013 by Tommy McGuire
Labels: rust, toy problems, programming language

I spent part of this weekend updating my idiotic Rust code to compile with the newly-released 0.7 compiler. There have been some significant changes in Rust with this release, which of course means that the code may be less idiomatic than it ever was. (One of these days, they'll make some change to the language and my code will suddenly become perfect. One of these days.)

Be that as it may, performance changes are kind of a mixed bag:

Results

Language Program Rust 0.6
Duration1
(seconds)
Rust 0.7
Duration
(seconds)
Python alternatives/presser_one.py 49.1 48.6
Rust anagrams-vectors-tasks 27.1 13.82
Python alternatives/presser_two.py 12.8 12.6
Rust anagrams-vectors-wide 11.8 13.1
Rust anagrams-hashmap-wide 9.3 15.4
C alternatives/anagrams-vectors 8.0 5.83
Rust anagrams-vectors 8.0 8.2
Rust anagrams-hashmap 6.0 35.54
Python alternatives/presser_three.py 6.0 6.3
Rust anagrams-hashmap-mmap 4.8 10.6
C alternatives/anagrams-hash 0.9 1.0

Notes

1. The previous timings were taken during the upgrade to 0.6 for the most part. The program anagrams-hashmap-mmap.rs is new, and was timed based on the incoming branch sometime between 0.6 and 0.7.

2. I assume this means that part or all of the new scheduler work has landed; it is a dramatic improvement. The other parallel versions, the "-wide" programs, have not improved and the hashmap version appears to be suffering from the same issue as anagrams-hashmap. anagrams-vectors-tasks.rs is, however, making heavier use of inter-task communications.

3. I have no idea what happened here; neither the program nor gcc changed to my knowledge. One of the timings may be in error. You get what you pay for.

4. There is something really odd going on here. There seems to be a performance regression involving hashing, but the result is not consistent in any way between compiles and sometimes runs. In fact, I added some quick code to print out the durations of various parts of the program (which is commented out in the code on github), and the performance for anagrams-hashmap.rs improved to 8.9 sec.

Changes

As you are probably aware if you read the 0.7 release notes, there were some significant changes between Rust 0.6 and 0.7. Aside from the normal, seemingly random perturbations of the module use declarations, the core crate was renamed std and the std crate was renamed extra. Further, the attempt to replace functions with methods continues apace: vec::slice(buffer, i, j) became buffer.slice(i, j) and SharedChan(response_chan) became SharedChan::new(response_chan).

The major change is the in-progress deprecation of "internal" iterators (i.e. the higher-order function approach) in favor of "external" iterators (i.e. a thing with a next method). External iterators are significantly more flexible than internal iterators and, as it turns out, just as easy to use. (On the other hand, a lot of Rust tutorial blog posts, including one of mine, are obsolete and, dang it, I just finished converting some of my internal iterators to the newer "double bool return value" protocol. In any case, if you are going to ask if Rust is ready for prime-time, production use, stahp. Just don't.)

An example of the ease of use of external iterators is the split_words function that litters my programs. This function takes a string containing space-separated words and produces a vector of the individual words, and is used to create the values for the anagram dictionary. The original, internal iterator, version is:


pub fn split_words(s : &str) -> ~[~str] {
let mut words = ~[];
for str::each_word(s) |word| {
words.push(word.to_owned());
}
return words;
}

That code could be improved, perhaps, by replacing the loop body with higher-order transformers. However, with the external iterator version, the change is trivial:


pub fn split_words(s : &str) -> ~[~str] {
s.word_iter().transform(|w| w.to_owned()).collect()
}

The word_iter method returns an Iterator that travels by space-separated words, the transform method (which should be called map) applies a function to each word, and the collect method assembles the words into a vector. Well, technically, it assembles the values into any implementation of the FromIterator trait, of which there is one for ~[T].

Some functions have disappeared, with replacements provided by iterators.


- let mut t = str::to_chars(s);
+ let mut t : ~[char] = s.iter().collect();

In this case, the to_chars function is replaced by an iterator over the characters in a string followed by collect. The type annotation is necessary because, as I mentioned, collect deals with FromIterator's, of which there may be many, and the further use of t only manipulates another collection trait. Omitting it leaves the exact type of t open and produces the following error at the subsequent use:


anagrams-hashmap-mmap.rs:11:37: 11:39 error: the type of this value must be known in this context
anagrams-hashmap-mmap.rs:11 extra::sort::quick_sort(t, |a,b| *a <= *b);
^~
anagrams-hashmap-mmap.rs:10:16: 10:35 error: failed to find an implementation of trait std::iterator::FromIterator<char,std::str::StrCharIterator<>> for &mut [[type error]]
anagrams-hashmap-mmap.rs:10 let mut t = s.iter().collect();
^~~~~~~~~~~~~~~~~~~

The for loop syntax is relatively unchanged, with the replacement of an each higher-order method with an iterator implementing the advance method. The necessity of the advance method is slated to go away when the for loop protocol is updated, I believe.


- for set.each |ln| {
- count += 1 + vec::count(*ln, &(' ' as u8));
+ for set.iter().advance |ln| {
+ count += 1 + ln.iter().count(|&ch| { ch == ' ' as u8 });

There does seem to be a problem with the lifetime of temporary values in 0.7. If I write the following code:


for sorted_keys(dict).iter().advance |key| { ... }

I then get the following error:


mk_anadict.rs:40:8: 40:26 error: borrowed value does not live long enough
mk_anadict.rs:40 for sorted_keys(dict).iter().advance |key| {
^~~~~~~~~~~~~~~~~~
mk_anadict.rs:43:4: 43:5 note: borrowed pointer must be valid for the method call at 43:4...
mk_anadict.rs:43 }
^
mk_anadict.rs:40:8: 40:33 note: ...but borrowed value is only valid for the method call at 40:8
mk_anadict.rs:40 for sorted_keys(dict).iter().advance |key| {
^~~~~~~~~~~~~~~~~~~~~~~~~

However, if I give the temporary value a name then everything works as expected:


let skeys = sorted_keys(dict);
for skeys.iter().advance |key| { ... }

On the other hand, the HashMap method find_or_insert now returns a mutable value, so this code from mk_anadict_traits.rs now works:


map.find_or_insert(key, ~[]).push(line.to_owned());

Yay!

Two final notes: The finalize method in the Drop trait (used for RAII-style clean-up) is now called drop. And the <-> swapping operator is now gone but not forgotten, since the operation is still necessary.


let mut ks = ~[];
// ks <-> key_set;
util::swap(&mut ks, &mut key_set);

In anagrams-vectors-tasks.rs, the key_set is being shipped off to another task, so this code replaces the value of key_set with a fresh vector and sends the original key_set from a another variable, ks. The swap dance is needed to avoid the destruction of the no-longer-valid value. I am not sure, however, if util::swap is the right replacement for <->.

Comments



Excellent post covering some of the improvements and oddities of 0.7. Thanks for the concrete examples, too. I just updated some code from 0.6 and found that it wasn't horrible, but wasn't trivial.

The lifetimes issue you mention in 0.7 most notably affected my code when dealing with command line arguments. You must create a local value first before using them.

Also there appears to be an odd change when going from owned strings to vectors. ~"something".to_bytes() used to return an owned pointer to the vector of bytes. In 0.7 this has become "something".as_bytes().to_owned() unless I'm missing something.

Anyway, thanks for sharing.

K Matthias
'2013-07-09T16:40:25.111-05:00'

Thanks for your comments!

There is a thread on the rust-dev mailing list concerning the lifetime issue, but no resolution as of yet.

There's also a thread on the performance issue, which Björn Steinbrink thinks is at least partially due to inlining changes.

Finally, one thing that I did notice but didn't mention: you probably don't want to do "use std::*;". (If you do, fail! produces some very impressive error messages.)

Tommy McGuire
'2013-07-10T11:08:24.744-05:00'
active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.