# Rust Iterators

**UPDATE 31 JULY 2013: THIS POST IS OBSOLETE.** As of 0.7, Rust is moving to external iterators, rather than the "internal iterators" described in this post. Welcome to Rust language development. And if you're wondering why I haven't updated this post, I am still working on a version of each_combination (and each_permutation, in the **vec** module; I cannot seem to find a way to do it that both implements the iterator trait and does not allocate a new vector for each combination or permutation.

In my last post, I described my first Rust program, a simple utility to build an anagram dictionary based on part of Jeff Knupp's Python Letterpress cheating program. This time, I would like to take a closer look at a slightly unusual part of Rust by presenting an implementation of part of Python's itertools module that is needed by the remainder of the anagram-esqe program.

In order to find all the words that can be built from a set of characters, Jeff uses combinations, a function which "[returns] *r* length subsequences of elements from the input *iterable*. Combinations are emitted in lexicographic sort order. So, if the input *iterable* is sorted, the combination tuples will be produced in sorted order." According to the documentation, `combinations('ABCD', 2)`

produces "AB", "AC", "AD", "BC", "BD", "CD". The Rust standard library doesn't include anything like this function.

What Rust does have is a style of iterators based on higher-order functions. For example, to iterate across all the elements of a vector, the function `vec::each`

takes a vector and a function to be applied to each element: `each(v, |elt| { frob(elt) })`

(or alternatively, the "method" version is an element of a typeclass with an implementation for vectors, allowing it to be called as `v.each(|elt| { frob(elt) })`

). Now, the way this function works is that the function argument should return true or false after processing each element; if it returns false, the iteration terminates. In other words, the frob function should return a boolean in those examples.

Rust provides a couple of special syntactic forms that make these higher-order functions easier to use: **do** and **for** expressions. The syntax for the **for** expression is:

"for" expr [ '|' ident_list '|' ] ? '{' block '}' ;

The **do** expression differs only in the keyword. **do** is the simpler of the two; to illustrate, the Rust manual describes the following two expressions as identical:

f(|j| g(j));

do f |j| {

g(j);

}

The higher-order function is *f*, *j* is the parameter to the anonymous function, which in turn calls *g* with *j*. One feature of this syntactic form is that a **return** expression in the body of the anonymous function returns from the function enclosing the **do** expression, not the inner anonymous function. Likewise, **break** and **loop** are handled specially, allowing the expression to behave more like a control structure. (The latter is similar to **continue** in many other languages, in this use.)

The **for** expression is similar, but more specialized. In addition to the specialized handling of **return**, the anonymous function forming the body of the **for** expression implicitly returns **true**, continuing the iteration. The Rust tutorial provides this example:

for each([2, 4, 8, 5, 16]) |n| {

if *n % 2 != 0 {

println("found odd number!");

break;

}

}

The signature of each is `fn each<'r,T>(v: &'r [T], f: &fn(&'r T) -> bool)`

; *v* is the vector argument and *f* is the function. (For the moment, ignore the "&'r" bit.)

How does this interact with combinations? How about an each-like function that takes arguments describing what and how many to combine, and a function to apply to each combination as it is generated? Without too much introduction, this is what I came up with:

pub pure fn each_combination<T:Copy>(values : &[T],

r : uint,

fun : &fn(combo : &[T]) -> bool) {

let length = values.len();

if r == 0 || r > length { return; }

let max_indices0 = length - r;

let mut indices = vec::from_fn(r, |i| i);

let mut combination = vec::from_fn(r, |i| values[i]);

loop {

if !fun(combination) { return; }

let mut i = r - 1;

indices[i] += 1;

while i > 0 && indices[i] > max_indices0 + i {

i -= 1;

indices[i] += 1;

}

if indices[0] > max_indices0 { break; }

combination[i] = values[indices[i]];

for uint::range(i + 1, r) |i| {

indices[i] = indices[i-1] + 1;

combination[i] = values[indices[i]];

}

}

}

This function first tests whether any iterations are needed, then creates a pair of vectors to handle those needed. *indices* is a vector of integers, the size of the combinations that are needed, containing the indices into the *values* argument for the current combination. The *combination* vector contains the *r* elements from *values* (specifically, copies of them) for the current combination. Specifially, the loop tries to maintain combination[i] = values[indices[i], for i in 0 to *r-1*. The contents of the *indices* vector are incremented lexicographically, with the first, **while**, inner loop locating the right-most index to be incremented and the second, **for** loop, fixing up the subsequent indices as well as the *combination* vector. The overall loop terminates when the left-most index, *indices[0]* is too large to allow the remainder of *indices* to fit into *values*. Or when *fun* returns false after being invoked with a combination.

There are a couple of minor points to notice about this function. First, it is marked **pure**, indicating that it only modifies data *owned by its own stack frame*; meaning that aside from internal manipulations, it is (ideally) referentially transparent. Second, notice that *indices*, *combination*, and the helper *i* are declared **mut**, meaning that these variables are mutable; otherwise variables in Rust are immutable.

Third, the type parameter of *each_combination*, *T*. Because elements from *values* are copied into *combination*, *T* must be copyable, indicated by the Copy typeclass. Requiring that, in general, seems sub-optimal, so I created a second, much shorter, function that uses *each_combination* but does not require *T* to be copyable.

pub pure fn each_combination_ref<'v,T>(values : &'v [T],

r : uint,

fun : &fn(combo : &[&'v T]) -> bool) {

each_combination(vec::from_fn(values.len(), |i| &values[i]), r, fun);

}

This function creates a vector containing borrowed pointers to the elements of the original *values*, then calls *each_combination* on it. Since the borrowed pointers are being copied, not the contents of the vector, *T* no longer needs to be copyable and as a side effect (if *T* is large enough, say) this function may be more efficient. The function parameter must be different, of course, in that it acts on a borrowed pointer to a vector containing borrowed pointers to the value elements. That is where Rust's type system gets interesting.

Check out the type of *values* and the type of *combo*, the parameter to the function *fun*. The type of *values* is `&'v [T]`

, indicating that it is a borrowed pointer to a vector containing *T*'s, with a **lifetime variable** of *v*. The type of *combo* is `&[&'v T]`

, which is a borrowed pointer to a vector containing borrowed pointers to *T*'s, here the inner borrowed pointers have the same lifetime variable, *v*. [Note: the syntax for lifetime variables will be changing in Rust 0.6.]

Those types are telling the Rust compiler that the contents of *combo* can be used anywhere that doesn't live any longer than *values*. Equivalent C code could easily leak the pointers into *values*, leaving them dangling if that vector were freed.

Niko Matsakis described the **for** loop protocol in detail, along with a significant problem and proposed solution: Revised for loop protocol.

**Update 4/5/2013:** Updated to use rust 0.6 lifetime syntax. "&v/T" became "&'v T". Also, the library modules have the lifetime, "'v", as the first of the type parameters as shown above, although that is not mentioned in the manual or RELEASES.txt.