Rust 0.8: External iterators

Posted on October 5, 2013 by Tommy McGuire
Labels: rust, programming language

If you have browsed through my ridiculous Rust posts, you will know that I have been using a sad, silly combinatorics problem that I stole from someone else's blog to play with the programming language Rust. I am still doing it, I'm sorry to say.

If you have been following the development of Rust, you will know that between 0.6 and the recent release of 0.8, Rust's for expressions have changed from an "internal" iterator basis to an "external" iterator basis. (I've described the change before, way back when 0.7 was released. Geeze, that was a long time ago.)

Basically, an "internal" iterator involves a higher-order function that is passed as an argument a new function that represents the body of the for expression. This new function is called once on each element of whatever is being iterated over. An "external" iterator is instead an implementation of the trait Iterator, which means that it has a next() method. External iterators have many advantages, such as more flexibility and better composability, which is why the behavior of the for expression was changed. However, external iterators also seem to have downsides.

This is part of my implementation of an external iterator for generating all permutations of the contents of a vector:


impl<'self,T:Clone> Iterator<~[T]> for PermutationIterator<'self,T> {

#[inline]
fn next(&mut self) -> Option<~[T]> {
if self.initial {
self.initial = false;
} else if self.length <= 1 {
return None;
} else {
// find largest k such that indices[k] < indices[k+1]
// if no such k exists, all permutations have been generated
let mut k = self.length - 2;
while k > 0 && self.indices[k] >= self.indices[k+1] { k -= 1; }
if k == 0 && self.indices[0] > self.indices[1] { return None; }
// find largest l such that indices[k] < indices[l]
// k+1 is guaranteed to be such
let mut l = self.length - 1;
while self.indices[k] >= self.indices[l] { l -= 1; }
// swap indices[k] and indices[l]; sort indices[k+1..]
// (they're just reversed)
self.indices.swap(k,l);
self.indices.mut_slice(k+1, self.length).reverse();
// fixup permutation based on indices
for i in range(k, self.length) {
self.permutation[i] = self.values[ self.indices[i] ].clone();
}
}
return Some(self.permutation.clone());
}
}

The algorithm for generating the permutations (in lexical order) is straight out of the Wikipedia page, with the modification that it is working on an internal vector of indices rather than the data vector (to avoid potentially expensive copies). What I want to discuss is the line


return Some(self.permutation.clone());

permutation contains the current permutation being generated by the iterator. It is intended to be passed directly to the body of the loop, which would avoid an allocation and data copies. Unfortunately, this code calls clone() there.

The reason it calls clone() is that this code implements an Iterator<~[T]> and therefore has a return type of Option<~[T]>. In other words, it returns an owned vector; hence the clone(). What I think I would like to do is:


// impl<'self,T:Clone> Iterator<&'self [T]> for PermutationIterator<'self,T> {

// fn next(&mut self) -> Option<&'self [T]> {
// ...
// return Some(self.permutation.slice_from(0));
// }
// }

Unfortunately, I cannot convince the compiler to let me return a borrowed reference to the permutation vector, which is an element of the PermutationIterator structure and should have the same lifetime1. On one hand, it should work, I think. On the other, well, maybe it is a sketchy thing to do.

Why is this important? Time to break out a performance comparison:

VariationTime (seconds)
vec permutations_iter11
vec permutations_iter (manual)7
lexical iterator6
old each_permutation1.2

These timings are for generating all 39,916,800 permutations of a vector of 11 integers, counting the number of permutations.

vec permutations_iter is the iterator version currently in the vec module, based on the Steinhaus–Johnson–Trotter algorithm. It allocates and copies the returned permutation as well.

vec permutations_iter (manual) is based on a suggestion from the implementer of the vec iterator (from the commit message introducing the S-J-T iterator); it does not copy the permutation, but iterates over element swaps from the permutation state space and lets the body of the loop update its own copy of the permutation. The code looks like:


let mut v = do vec::from_fn(VECTOR_SIZE) |i| { i };
let mut count = 0;
for (a,b) in vec::ElementSwaps::new(VECTOR_SIZE) {
v.swap(a,b);
count = count + 1;
}
assert_eq!(count, NUM_PERMUTATIONS);

lexical iterator is my iterator above, also allocating and copying the return permutation.

old each_permutation is the previous permutation generator from the vec module. It used similar code to my iterator above, also based on the Wikipedia lexical generator algorithm. Critically, it does not allocate and copy the returned permutation; it is the "internal" iterator version.

The code for each_permutation can trivially be converted to be useful in a do expression, which may be the best option. It may be possible that this kind of generator is a bad fit for external iterators.

On the other hand, I am hoping that some very smart Rust developer will show up and tell me that, A), I'm an idiot, and B), there is a trivial way to do this that I'm not seeing.

Update

In the discussion on reddit, englabenny writes,

I'm surprised the manual ElementSwaps version performs so poorly. I didn't expect that. The premise of my Permutations Iterator was really that it could be implemented without copies -- which it could, but acrichto helped discover that was through a borrowchecker hole. So the fixed version that clones the whole vector for each iterator element is quite terrible.

Pull req: https://github.com/mozilla/rust/pull/9062 most of the discussion was on irc, and the initial copyless version isn't available, I'll pastebin it if git still has it somewhere. [Edit] Thanks git reflog: Buggy copyless permutations iter

In the discussion of that pull request, alexcrichton opened issue 9069, "Unsound behavior when returning &-pointers", writing,

I thought something like this should be a compiler error? It seems to me that the lifetime of b would conflict with attempting to re-borrow a as mut.

That would seem to imply that my desired behavior is wrong. That issue was marked as a duplicate of 8624, which seems to emphasize that changing the elements of a vector that potentially has an outstanding reference is a no-go.

In the same reddit thread, ericanderton suggested using managed vectors, which I had not thought about. It might work, but I'm stuck trying to dynamically allocate a mutable managed vector. And it still has the objection that mutating the iterated value is not entirely kosher.

Notes

1. For what it's worth, the error message is:


$ rust build --opt-level 2 --lib combinatorics.rs
combinatorics.rs:62:20: 62:51 error: cannot infer an appropriate lifetime due to conflicting requirements
combinatorics.rs:62 return Some(self.permutation.slice_from(0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
combinatorics.rs:62:20: 62:36 note: first, the lifetime cannot outlive the expression at 62:20...
combinatorics.rs:62 return Some(self.permutation.slice_from(0));
^~~~~~~~~~~~~~~~
combinatorics.rs:62:20: 62:36 note: ...due to the following expression
combinatorics.rs:62 return Some(self.permutation.slice_from(0));
^~~~~~~~~~~~~~~~
combinatorics.rs:62:20: 62:51 note: but, the lifetime must be valid for the method call at 62:20...
combinatorics.rs:62 return Some(self.permutation.slice_from(0));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
combinatorics.rs:62:20: 62:36 note: ...due to the following expression
combinatorics.rs:62 return Some(self.permutation.slice_from(0));
^~~~~~~~~~~~~~~~
active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.