Reimplementing ashurbanipal.web in Rust, pt. 2

Posted on July 15, 2015 by Tommy McGuire
Labels: ashurbanipal, rust

In the last post, I described the background and the Rust code to compute text recommendations from the Ashurbanipal data. In this post, I'm going to show the web server code that uses those recommendation structures and supplies data to the browser front-end. As a reminder, this is using Rustful, a lightweight web-application framework in Rust, based on the hyper Rust HTTP library.

Ashurbanipal is a prototype of a text recommendation engine based on the contents of the text, not on reviews, other purchases, or any other external data. The prototype is based on 22,000-ish books on the Project Gutenberg 2010 DVD, at dpg.crsr.net. To use it, go to the page, find a book that you know you like (using the search field at the upper left), and get style-based, topic-based, and combined recommendations for books which are in some way similar to your choice.

  1. Part 1: Computing recommendations.
  2. Part 2: You are here.
  3. Part 3: My kingdom for metadata.

The main function for the application server is pretty simple; in fact it is based on the Rustful example code almost unchanged.


fn main() {
    let args : Vec<String> = env::args().collect();
    if args.len() < 4 { panic!("Usage: ashurbanipal_web pos-data topic-data metadata"); }

    let router = insert_routes! {
        TreeRouter::new() => {
            "style" => Get : RecQuery::Style,
            "topic" => Get : RecQuery::Topic,
            "combination" => Get : RecQuery::Combination,
            "lookup/:etext_no" => Get : RecQuery::TextLookup,
            "lookup" => Get : RecQuery::TextSearch
        }
    };

    let rec_state = RecState::new(&args[1], &args[2], &args[3]);

    println!("serving...");

    let server = Server {
        content_type : content_type!(Application / Json; Charset = Utf8),
        global : (rec_state,).into(),
        handlers : router,
        host : FromStr::from_str("127.0.0.1:8080").unwrap(),
        log : Box::new( rustful::log::StdOut ),
        server : "ashurbanipal_web(Rust)".to_string(),
        ..Server::default()
    };

    if let Err(e) = server.run() {
        println!("could not start server: {}", e.description());
    }
}

In the last post, I mentioned how the collect method is a polymorphic translator between an iterator and a collection; std::env::args is a Rust function returning an iterator over the command-line arguments to the program. The first block ensures that we have been passed something like the right arguments, which should be paths to the data files.
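As a small illustration of that first block, here is a sketch of the same check written as a function that returns an error instead of panicking. The names DataPaths and parse_args are my own inventions for this example and do not appear in the actual program; the usage message is the one from main above.

```rust
/// Paths to the three data files named in the usage message.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct DataPaths {
    pub pos_data: String,
    pub topic_data: String,
    pub metadata: String,
}

/// Collects any iterator of arguments (such as `std::env::args()`) into a
/// Vec and checks that the three paths are present.
pub fn parse_args<I: Iterator<Item = String>>(args: I) -> Result<DataPaths, String> {
    // `collect` builds whichever collection the type annotation asks for.
    let args: Vec<String> = args.collect();
    if args.len() < 4 {
        return Err("Usage: ashurbanipal_web pos-data topic-data metadata".to_string());
    }
    // args[0] is the program name; the data files follow it.
    Ok(DataPaths {
        pos_data: args[1].clone(),
        topic_data: args[2].clone(),
        metadata: args[3].clone(),
    })
}
```

Calling parse_args(std::env::args()) would do the same job as the first two lines of main, leaving the caller to decide what to do with the error.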

The second block of code is a Rust macro invocation (identified by the trailing '!'), which simplifies creating the routing tree used by the application. This application server listens for GET requests for /style, /topic, /combination, /lookup/etext-number, and /lookup. Matching each query produces one of the possible values for RecQuery, which will be seen below. One odd thing about this macro: there cannot be a comma after the last element of the sequence (as in, following TextSearch there). Whether a trailing comma is accepted is up to each macro's definition; in other parts of Rust, such as structures, a trailing comma is fine.
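Whether a macro tolerates a trailing comma comes down to the patterns in its definition. The following is a toy sketch, not Rustful's insert_routes!: the routes! macro, the Vec it builds, and the example_routes function are all hypothetical, and the `$(,)?` pattern needs a reasonably recent compiler. It only shows how a routing-table macro could be written to allow the final comma.

```rust
/// Stand-in for the handler values used by the router.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RecQuery {
    Style,
    Topic,
    Combination,
    TextLookup,
    TextSearch,
}

/// Toy macro: builds a Vec of (path, handler) pairs.
/// The `$(,)?` at the end of the pattern is what permits an optional
/// trailing comma; without it, a final comma would be a syntax error.
macro_rules! routes {
    ( $( $path:expr => $handler:expr ),* $(,)? ) => {
        vec![ $( ($path, $handler) ),* ]
    };
}

/// Builds a small routing table with the toy macro.
pub fn example_routes() -> Vec<(&'static str, RecQuery)> {
    let table = routes! {
        "style" => RecQuery::Style,
        "topic" => RecQuery::Topic,
        "lookup" => RecQuery::TextSearch, // trailing comma accepted here
    };
    table
}
```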

Following that, the third block of code creates a RecState, which contains the recommendation structures described last time (as well as metadata storage).

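Finally, this code creates a Server and runs it, listening for connections on the localhost interface, port 8080. The contents of the Server structure configure the web application: the content type for responses (JSON in UTF-8), the global data shared with the handlers (the RecState), the router, the host address, the log destination, and the server name; the remaining fields take their default values.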

In the router above, the URLs the server is listening for are associated with values of RecQuery:


pub enum RecQuery {
    Style,
    Topic,
    Combination,
    TextLookup,
    TextSearch,
}

impl Handler for RecQuery {
    fn handle_request(&self, context: Context, response: Response) {
        let &RecState(ref style, ref topic, _, _) = context.global.get().unwrap();
        match *self {
            RecQuery::Style => handle_recommendation_query(style, context, response),
            RecQuery::Topic => handle_recommendation_query(topic, context, response),
            RecQuery::Combination => handle_recommendation_query(&Combination::new(style, topic), context, response),
            RecQuery::TextLookup => handle_text_query(context, response),
            RecQuery::TextSearch => handle_text_search(context, response),
        }
    }
}

The RecQuery type, which contains no special values, implements the Handler trait from Rustful, requiring an implementation of the handle_request method. This method recovers the data from the server's global storage and dispatches to one of three functions to actually produce results, handle_recommendation_query, handle_text_query, and handle_text_search. I will discuss the final two when I get to the metadata used by the project, but for the moment I am focusing on the recommendations themselves.
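Stripped of the framework, the pattern is an enum that implements a trait and dispatches on its own variant. Here is a minimal sketch of that idea; the Handler trait below is a simplified stand-in that I made up for the example (Rustful's real trait takes a Context and a Response), and the Query type and its reply strings are likewise hypothetical.

```rust
/// Simplified, hypothetical stand-in for a framework's handler trait.
pub trait Handler {
    fn handle_request(&self, etext_no: u32) -> String;
}

/// A plain enum: the variants carry no data, they only name the query kind.
pub enum Query {
    Style,
    Topic,
    TextLookup,
}

impl Handler for Query {
    fn handle_request(&self, etext_no: u32) -> String {
        // One handler type serves every route; the variant picks the branch.
        match *self {
            Query::Style => format!("style recommendations for {}", etext_no),
            Query::Topic => format!("topic recommendations for {}", etext_no),
            Query::TextLookup => format!("metadata for {}", etext_no),
        }
    }
}
```

Because every route shares one handler type, the router can store the values directly, and the compiler checks that the match covers every variant.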

One odd bit here is the line,


RecQuery::Combination => handle_recommendation_query(&Combination::new(style, topic), context, response),

In the combination case, I have to create a new Combination structure based on the style and topic recommendation structures. Happily, creating this structure is very lightweight; creating a new one for each request is required because I could not find a way to reference the same style and topic structures in creating a persistent Combination. The requirements of Rustful and the lifetimes of the Style, Topic, and Combination never could come together. (I am aware that I could have used Rust's Box to store the structures on the heap, but I preferred not to do that. What can I say.)
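To give a feel for why building one per request is cheap, here is a sketch of a structure that only borrows its two inputs. The field names, the Vec<f64> contents, and the score method are assumptions made for the example; the real Style, Topic, and Combination types are the ones described in the previous post and are not reproduced here.

```rust
/// Hypothetical stand-ins for the real recommendation structures.
pub struct Style {
    pub scores: Vec<f64>,
}

pub struct Topic {
    pub scores: Vec<f64>,
}

/// Holds two references and nothing else, so creating one is just
/// copying two pointers. The lifetime 'a ties it to the data it borrows.
pub struct Combination<'a> {
    style: &'a Style,
    topic: &'a Topic,
}

impl<'a> Combination<'a> {
    pub fn new(style: &'a Style, topic: &'a Topic) -> Combination<'a> {
        Combination { style: style, topic: topic }
    }

    /// Illustrative only: adds the two scores for item `i`, if both exist.
    pub fn score(&self, i: usize) -> Option<f64> {
        Some(*self.style.scores.get(i)? + *self.topic.scores.get(i)?)
    }
}
```

The same lifetime is what makes a persistent version awkward: a Combination<'a> cannot live in the same structure as the Style and Topic it borrows from without some form of shared ownership or heap indirection.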

In any case, handle_recommendation_query is:


fn handle_recommendation_query(r : &Recommendation, context: Context, mut response: Response) {
    let &RecState(_, _, ref metadata, _) = context.global.get().unwrap();
    let start = optional("start", 0, &context);
    let limit = optional("limit", 20, &context);
    match required("etext_no", &context) {
        Some(etext_no) => {
            match r.sorted_results(etext_no) {
                Some(rows) => {
                    let recommendation = Recommendations {
                        count : rows.len(),
                        rows : metadata.add_metadata(&rows, start, limit)
                    };
                    response.set_status(StatusCode::Ok);
                    response.into_writer().send( json::encode(&recommendation).unwrap() );
                }
                None => {
                    response.set_status(StatusCode::NotFound);
                    response.into_writer().send("no matching etext");
                }
            }
        }
        None => {
            response.set_status(StatusCode::BadRequest);
            response.into_writer().send("parameter required: etext_no");
        }
    };
}

This function can be broken down into four sections, roughly from outside in:

  1. The first recovers the data structures from global storage and the arguments passed in the HTTP request. The start and limit parameters provide windowed access to the results, rather than transmitting the whole, large, result array; they are optional with defaults of starting at element 0 and returning 20 elements. Required is the etext number to make recommendations from.
  2. If the necessary etext number is missing or otherwise invalid, this code sets a Bad Request status and replies with an error message.
  3. If the supplied etext number produces some recommendations, this response is put into a Recommendations structure along with added metadata (which process also handles the start and limit) and returned as JSON with a status of Ok.
  4. If, however, the etext number is a valid number but cannot be associated with a text, a Not Found status is returned.
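The four cases above can be sketched without the web framework. In this example the Reply enum, the recommend function, and the HashMap standing in for the recommendation data are all hypothetical; the error messages are the ones from the handler, and the window is taken with skip and take, which is only my guess at what add_metadata does with start and limit.

```rust
use std::collections::HashMap;

/// Hypothetical result type standing in for an HTTP status plus body.
#[derive(Debug, PartialEq)]
pub enum Reply {
    Ok { count: usize, rows: Vec<u32> },
    NotFound(&'static str),
    BadRequest(&'static str),
}

/// `etext_no` is None when the parameter was missing or failed to parse.
/// `results` maps an etext number to its full, sorted recommendation list.
pub fn recommend(
    etext_no: Option<u32>,
    start: usize,
    limit: usize,
    results: &HashMap<u32, Vec<u32>>,
) -> Reply {
    match etext_no {
        Some(n) => match results.get(&n) {
            Some(rows) => Reply::Ok {
                // The count covers every result, not only the window sent.
                count: rows.len(),
                rows: rows.iter().skip(start).take(limit).cloned().collect(),
            },
            None => Reply::NotFound("no matching etext"),
        },
        None => Reply::BadRequest("parameter required: etext_no"),
    }
}
```

Returning the total count alongside the window lets a front-end page through the results without ever receiving the whole array.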

The two functions required and optional parse the Strings picked out of the request's query parameters and return them after converting them to a more useful type using the polymorphic method parse. Parse works with any data type implementing the FromStr trait, so this code tries very hard to produce whatever value you want.


fn required<T:FromStr>(v : &str, context : &Context) -> Option<T> {
    context.query.get(v).and_then( |s| s.parse::<T>().ok() )
}

fn optional<T:FromStr>(v : &str, default : T, context : &Context) -> T {
    required(v, context).unwrap_or(default)
}

Required produces an Option, to be explicit about the case where the argument is missing with a None. A failure to parse the value produces an Err result, which the ok method converts to a None (and a success to a Some). Since both the hashmap method get and parse/ok here produce Option values, they can be chained together with and_then, which will return the first None it sees, or Some containing the final, appropriate value.

Optional uses required to pick out the value, but it also must be passed a default value to be used when the parameter is missing or fails to parse.
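To try these two outside a running server, here is a sketch of the same pair over a plain HashMap of query parameters. Swapping Rustful's Context for a HashMap<String, String> is my assumption for the example; the bodies are otherwise the same as above.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Looks up `name` and parses it as a T; None if missing or unparsable.
pub fn required<T: FromStr>(name: &str, query: &HashMap<String, String>) -> Option<T> {
    // `get` yields an Option, `parse` a Result; `ok` turns the Result into
    // an Option so that `and_then` can chain the two steps.
    query.get(name).and_then(|s| s.parse::<T>().ok())
}

/// Same as `required`, but falls back to `default`.
/// The type of `default` is what selects T for the parse.
pub fn optional<T: FromStr>(name: &str, default: T, query: &HashMap<String, String>) -> T {
    required(name, query).unwrap_or(default)
}
```

With a query holding etext_no=1342 and limit=many, required::<u32>("etext_no", &query) gives Some(1342), while optional("limit", 20usize, &query) falls back to 20 because "many" does not parse as a usize.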

A final interesting piece here is the Recommendations structure:


#[derive(RustcEncodable)]
struct Recommendations<'a> {
    count : usize,
    rows : Vec<Text<'a>>,
}

Remember, the handler produces JSON data. To do so, it uses the rustc_serialize library, which is not a part of the Rust distribution. In fact, it is a library for serialization and deserialization that includes a compiler plug-in which handles the RustcEncodable derivation annotation. This plug-in evaluates the annotated structure, Recommendations, to automatically provide an implementation of the Encodable trait from rustc_serialize. Once that is done, a reference to a Recommendations can be passed to the library's json::encode method to convert the value into JSON for the HTTP response.

In this case, the recommendations include the total number of computed recommendations and the data for a selected window of those recommendations. The actual data for a text come from a Text structure (which also derives RustcEncodable), which I will get into next time.
