Parsing with Derivatives

Posted on June 7, 2012 by Tommy McGuire
Labels: programming language, software development

In 1964, Brzozowski defined the derivative of a regular language, which allows a nifty, different technique for implementing regular expression engines. In October, 2010, Matthew Might and David Darais posted “Yacc is dead” to ArXiv, extending the derivative technique to context-free languages. Subsequently, Matt Might’s postings on the subject appeared on various parts of the interwebs, and a while after that, I ran across them and decided to fool with the ideas. Finally, Matthew Might, David Darais, and Daniel Spiewak published the Functional Perl, “Parsing with Derivatives”. and Might gave a very good talk at Stanford.

This page ties together the posts I have made about parser derivatives.

Posts

I have written a series of posts on derivative-related techniques, for regular expressions and for parsers.

  1. Practical regular expression derivatives presents Java code for a simple derivative-based regular expression engine. This engine is not fully fleshed-out, since it is eclipsed by general context-free grammars available to the parsing techniques of later posts, but it does clearly show the ideas.
  2. Matt Might on parsing with derivatives links to a video presentation Might gave that clearly explains the idea of parsing with derivatives.
  3. Parsing with derivatives: introduction shows the basic skeleton of a Java implementation, along with some background information.
  4. Parsing with derivatives: recursion extends the Java skeleton with enough power to parse context-free grammars, although it is very inefficient.
  5. Parsing with derivatives: compaction shows how to reduce the size of the graph structures built during parsing, using several techniques including what Might terms a compaction step between each derivation.

Code

Source code for the work in progress can be found on GitHub.

active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.