Letterpress cheating in Nimrod?

Posted on December 23, 2013 by Tommy McGuire

In my spare time here, I have been focusing on Rust, trying to follow the language development. So far, I like what I see. However, there are a number of other new options in the "systems programming language" area and I feel beholden (Behooved?) to take a look at them.

Here's a short list of some:

Go: Meh. On pretty much all fronts, meh. For details, check every other programmer's blog for postings about Go.
D: Well, D is not new, and when I looked at it previously I did not see anything particularly exciting. I may have to take another look at it, but I'm not feeling the mojo right now.
ATS: Shows up at the top of a lot of results on the programming language shootout! Dependently typed! And not based on Coq or any other common proof assistant! (Coq is, by the way, the best computer game I have ever played.) Is that sufficiently terrifying? Ok, I do need to take a serious look at ATS, but I'll need to get some more coffee first.
Nimrod: Local type inference. Manual memory management or, optionally, soft realtime garbage collection. A macro system; well, several actually. And it compiles to C. It certainly might be entertaining.

At the risk of brain weevils, for today Nimrod it is.

Syntactic issues

Unlike C, Rust, and many other languages, Nimrod does not use a variant of C syntax. Instead, it is similar to Python: colons and indentation. (Yes, I know: you hate significant whitespace. Or you hate semicolons. In either case, I hate you; the world is nicely symmetric.) Here is a simple example, a procedure which accepts a string and returns a new string containing the same characters, sorted. (I am new to Nimrod; in all these examples I would greatly appreciate any feedback!)


proc sorted(s : string) : string =
  var seq = toSeq(s.items)
  seq.sort(system.cmp[char])
  var newStr = newString(s.len)
  for i,ch in seq.pairs: newStr[i] = ch
  newStr

One seriously odd thing about Nimrod lexically is,

Nimrod is a style-insensitive language. This means that it is not case-sensitive and even underscores are ignored: type is a reserved word, and so is TYPE or T_Y_P_E. The idea behind this is that this allows programmers to use their own preferred spelling style and libraries written by different programmers cannot use incompatible conventions. A Nimrod-aware editor or IDE can show the identifiers as preferred. Another advantage is that it frees the programmer from remembering the exact spelling of an identifier. (From the Nimrod Manual.)

In other words, sorted above could be called as Sorted, or soRtEd, or even S_oRT_ed. I, personally, do not really care what convention is used, as long as you pick one. Choosing all of the above seems unwise.

At a higher level, Nimrod supports named and defaulted procedure parameters, and the definition of new symbolic operators, as well as a method-call syntactic sugar: "The syntax obj.method(args) can be used instead of method(obj, args)" (Nimrod Tutoral, Part II). The latter, which is seen above, is particularly useful since Nimrod uses multi-methods rather than traditional object-oriented single-dispatch.

A simple example

If Rust's middle name is "safety", Nimrod's is "macros". Nimrod supports several kinds: templates, "a simple substitution mechanism that operates on Nimrod's abstract syntax trees", macros, enabling "advanced compile-time code transformations", and something called "term rewriting macros", that allow a programmer to "enhance the compilation pipeline with user defined optimizations". This is all in addition to a type system supporting parametric polymorphism.

The simplest option for compile-time metaprogramming is templates. This example is from the Nimrod Tutorial, Part II:


template withFile(f: expr,
                  filename: string,
                  mode: TFileMode,
                  body: stmt): stmt {.immediate.} =
  block:
    let fn = filename
    var f: TFile
    if open(f, fn, mode):
      try:
        body
      finally:
        close(f)
    else:
      quit("cannot open: " & fn)

The withFile template is used twice in the remainder of the mk_anadict program, which converts a words file to an anagram dictionary:


var dictionary = initTable[string,seq[string]]()

withFile(words, "/usr/share/dict/words", fmRead):
  while not words.EndOfFile:
    let line = words.readLine
    if line.len >= 2 and line.len < 19 and line.allCharsInSet({'a'..'z'}):
      let sline = sorted(line)
      if dictionary.hasKey(sline):
        dictionary.mget(sline).add(line)
      else:
        dictionary[sline] = @[line]

var keys = toSeq(dictionary.keys)
keys.sort(system.cmp[string])

with_file(dict, "anadict.txt", fmWrite):
  for key in keys:
    dict.writeln(key, " ", dictionary[key].join(" "))

You may notice that I used Nimrod's lexical flexibility there. Whoops.

Searching for words

One variant of a template is baked into Nimrod as iterators, which along with for statements are "an abstract mechanism to iterate over the elements of a container" according to the Nimrod Manual. The following iterator implements something similar to Python's combinations from itertools:


iterator eachCombination[T](values : openarray[T], r : int) : seq[T] =
    let length = values.len
    if r > 0 and r <= length:
      var max_indices0 = length - r
      var indices : seq[int]
      var combination : seq[T]
      newSeq(indices, r)
      newSeq(combination, r)
      for i in 0..r-1:
        indices[i] = i
        combination[i] = values[i]
      while true:
        yield(combination)
        # increment the indices
        var i = r - 1
        indices[i] += 1
        while i > 0 and indices[i] > max_indices0 + i:
          # indices[i] now too large; decrement i, increment indices[i]
          # and we'll fix up the following indices later
          i -= 1
          indices[i] += 1
        # Can't fix up 'done'
        if indices[0] > max_indices0: break
        # Fix up the indices and the combination from i to r-1
        combination[i] = values[indices[i]]
        for i in i+1 .. r-1:
          indices[i] = indices[i-1] + 1
          combination[i] = values[indices[i]]

How do I know an iterator is a variant of a template? When I first wrote that block of code, I erroneously used let instead of var for several variable declarations. (let produces immutable variables, var mutable.) I did not receive any errors until I attempted to use eachCombination; clearly, some significant checking is not done until the code is expanded.

One other oddity is the separate declaration of indices followed by creation of the sequence and then its initialization. When I tried var indices : seq[int] = newSeq[int](r), I received the error "type mismatch: got (seq[typedesc[int]]) but expected 'seq[int]'". typedesc is the compile-time type of types; I don't know why newSeq is returning that.

The remainder of the anagram searching program uses a second iterator that wraps eachCombination:


iterator allCombinations[T](values : openarray[T]) : seq[T] =
  for length in 2..values.len:
    for combo in eachCombination(values, length):
      yield(combo)

And the rest of the program is not too dissimilar to the final Python version.


proc stringToSeq(str : string) : seq[char] = return toSeq(str.items)

proc loadDictionary() : TTable[seq[char],seq[string]] =
  var result = initTable[seq[char],seq[string]]()
  for line in "anadict.txt".lines:
    var words = line.split
    result[stringToSeq(words[0])] = words[1..words.len - 1]
  return result

when isMainModule:
  if os.paramCount() == 1:
    var board = stringToSeq(os.paramStr(1))
    board.sort(system.cmp[char])
    var dictionary = loadDictionary()
    var result = initSet[string]()
    for combo in allCombinations(board):
      if dictionary.hasKey(combo):
        for word in dictionary[combo]:
          discard result.containsOrIncl(word)
    echo(result.len)

Conclusion

Overall, I like Nimrod. It has some rough edges, and I suspect the case- and underscore-insensitivity will mean that it will not become one of my favorite languages (or, maybe, I could get used to that, too), but I do want to play around with its metaprogramming facilities; they seem to be the best of any programming language I have run across. This side of Scheme, anyway.

mk_anadict.nim and nimrod_anagrams.nim can be found in the rust-toys repository on Github.

Comments

Very nice article!

I was intrigued by the seq[typedesc] error you ran into and I took some time to fix it. The fix is in the current "devel" branch on github and it will make it into the upcoming 0.9.4 release.

Anonymous
'2014-01-08T11:28:37.076-06:00'

Interesting read. But what do you mean Coq is the "best computer game" you've ever played? (That it's not useful?)

Anonymous
'2014-01-15T15:07:38.476-06:00'

Thanks for the update! (Now, that's service!)

I had thought it was some compile-time use that I did not know about.

Tommy McGuire
'2014-01-16T10:16:25.338-06:00'

About Coq: I've never really tried to use it for anything. I have done a fair amount of work with OCaml, and it is one of my favorite languages.

The reason I refer to Coq as the "best computer game" is that using it to prove things is fun! It's challenging and rewarding when you finally wind your way through a dark series of lemmas and tactics, all alike.

Tommy McGuire
'2014-01-16T10:19:30.132-06:00'