Parsing with Derivatives

Posted on June 7, 2012 by Tommy McGuire

Labels: programming language, software development

In 1964, Brzozowski defined the derivative of a regular language, which allows a nifty, different technique for implementing regular expression engines. In October, 2010, Matthew Might and David Darais posted “Yacc is dead” to ArXiv, extending the derivative technique to context-free languages. Subsequently, Matt Might’s postings on the subject appeared on various parts of the interwebs, and a while after that, I ran across them and decided to fool with the ideas. Finally, Matthew Might, David Darais, and Daniel Spiewak published the Functional Perl, “Parsing with Derivatives”. and Might gave a very good talk at Stanford.

This page ties together the posts I have made about parser derivatives.

Posts

I have written a series of posts on derivative-related techniques, for regular expressions and for parsers.

Practical regular expression derivatives presents Java code for a simple derivative-based regular expression engine. This engine is not fully fleshed-out, since it is eclipsed by general context-free grammars available to the parsing techniques of later posts, but it does clearly show the ideas.
Matt Might on parsing with derivatives links to a video presentation Might gave that clearly explains the idea of parsing with derivatives.
Parsing with derivatives: introduction shows the basic skeleton of a Java implementation, along with some background information.
Parsing with derivatives: recursion extends the Java skeleton with enough power to parse context-free grammars, although it is very inefficient.
Parsing with derivatives: compaction shows how to reduce the size of the graph structures built during parsing, using several techniques including what Might terms a compaction step between each derivation.

Code

Source code for the work in progress can be found on GitHub.