Variations on the theme of monadic regular expressions: Abstraction
Posted on September 6, 2010
by Tommy McGuire
Regular expressions are a more general tool than appears in normal usage. They can be used to specify any finite state machine, although without extensions to the expressions such machines would have no visible behavior beyond accepting or rejecting. (They're also both a floor wax and a dessert topping, but are artificial whipped cream and a some kind of very lame floor wax, if such a thing is possible.)In any case, it might be useful to have a regular expression engine that can operate on different collections than Haskell's List and on elements of a different type than Char. (In Haskell, a String is a List of Char's.) That is the purpose of this variation on the monadic regular expression engine.
> module AbstractEngine where
>
> import Prelude hiding (or,seq,length,drop,elem)
> import Control.Monad
> import List as L hiding (or,drop,elem)
In addition to the underlying monad and the base element type, the Pattern type must be parameterized by the type of the collection of elements, s. As with the previous iteration, the two elements of the pair are the previously matched text and the yet-to-be-examined text.
> newtype Pattern m s a = P { runPattern :: (s a, s a) -> m (s a, s a) }
Most of the regular expression operations are very similar to their counterparts in the previous version. Certainly, done is.
> done :: Monad m => Pattern m s a
> done = P return
The end function is different in two respects. First, it adds a type class, Q, which must be instanced by the collection s. Second, instead of using pattern matching to examine the state of the input, it uses the isEmpty predicate, part of the Q type class.
> end :: (Q s, MonadPlus m) => Pattern m s a
> end = P $ \ (m,e) -> if isEmpty e then return (m,e) else mzero
Likewise, any uses the isEmpty predicate in addition to the deq function to decompose the input and the enq function to re-compose the matched output. The next function, lit, adds prefix, append, drop, and len.
> any :: (Q s, MonadPlus m) => Pattern m s a
> any = P $ \ (m,s) -> let (hd,tl) = deq s in
> if isEmpty s then mzero else return (m `enq` hd, tl)
>
> lit :: (MonadPlus m, Q s, Eq a) => s a -> Pattern m s a
> lit l = P $ \ (m,s) -> if l `prefix` s then return (m `append` l, drop (len l) s) else mzero
The cls and ncls functions are made up of a plethora of functions from the Q type class. Part of the reason is that, for simplicity, I chose to represent the set arguments to cls and ncls as values of the type of the collections.
> cls :: (Q s, MonadPlus m, Eq a) => s a -> Pattern m s a
> cls set = P cls'
> where
> cls' (m,s) | (isEmpty s) || not (hd `elem` set) = mzero
> | otherwise = return (m `enq` hd, tl)
> where
> (hd,tl) = deq s
>
> ncls :: (Q s, MonadPlus m, Eq a) => s a -> Pattern m s a
> ncls set = P ncls'
> where
> ncls' (m,s) | (isEmpty s) || (hd `elem` set) = mzero
> | otherwise = return (m `enq` hd, tl)
> where
> (hd,tl) = deq s
The seq, or, and kst combinators are identical to the previous version, with the exception of the additional type parameter.
> seq :: Monad m => Pattern m s a -> Pattern m s a -> Pattern m s a
> seq p1 p2 = P $ \ cur -> runPattern p1 cur >>= runPattern p2
>
> or :: MonadPlus m => Pattern m s a -> Pattern m s a -> Pattern m s a
> or p1 p2 = P $ \ cur -> (runPattern p1 cur) `mplus` (runPattern p2 cur)
>
> kst :: MonadPlus m => Pattern m s a -> Pattern m s a
> kst p = (p `seq` (kst p)) `or` done
The type class Q is the key to this version of the regular expression engine. Q is used as a constraint on the collection type of the input, and is named somewhat appropriately, Any instance of this class must supply the basic functions isEmpty, enq, and deq.
- isEmpty is a simple predicate, true if the collection has no contents and false otherwise.,/li>
- enq takes a collection and an additional element and adds the element to the end of the collection.
- deq takes a collection and decomposes it into the first element and the remainder of the collection. This function will always be called safely (in some cases by the magic of lazy evaluation).
> class Q s where
> isEmpty :: s a -> Bool
> enq :: s a -> a -> s a
> deq :: s a -> (a, s a)
The remaining functions in the type class are provided with implementations based on the previous functions. For the most part, they are fairly simple exercises in basic Haskell, if of limited style, given their reliance on only the three base Q functions.
> len :: s a -> Int
> len s | isEmpty s = 0
> | otherwise = 1 + (len tl)
> where
> (_,tl) = deq s
>
> prefix :: Eq a => s a -> s a -> Bool
> prefix s t | isEmpty s = True
> | isEmpty t = False
> | hds == hdt = prefix tls tlt
> | otherwise = False
> where
> (hds, tls) = deq s
> (hdt, tlt) = deq t
>
> drop :: Int -> s a -> s a
> drop 0 s = s
> drop i s = let (_,tl) = deq s in
> if isEmpty s then s else drop (i-1) tl
>
> elem :: Eq a => a -> s a -> Bool
> elem a set | isEmpty set = False
> | a == elt = True
> | otherwise = elem a rem
> where
> (elt,rem) = deq set
>
> append :: s a -> s a -> s a
> append s t | isEmpty t = s
> | otherwise = (s `enq` elt) `append` rem
> where
> (elt,rem) = deq t
The instance declaration of Q for Lists allows the engine to handle Strings as well as potentially other structures, and illustrates how to develop a suitable instance for another collection type.
> instance Q [] where
> isEmpty = L.null
> enq l a = l ++ [a]
> deq q = (L.head q, L.tail q)
The point of this article is not necessarily to present the wondrous generic regular expression parser. If there is any point to it at all, it is the technique of using a type class to parameterize a module and to require, and supply, the minimal set of features for necessary functionality. The same technique is applicable to almost any language.