Wednesday, October 27, 2010

Link o' the day: Meditations on Bluish Coder and inadequacy

Every once in a while, I run across someone who makes me feel inadequate. (As opposed to the people I work with every day, who make me feel completely adequate.)

Recently,  I have been interested in programming languages with dependent data types, mostly because they seem to be the natural end point of type system evolution. (Perhaps "end point" is the wrong term; "next step" may be better.) Unfortunately, most of the dependently typed languages I have seen are unappealing for various reasons.
  • Epigram seems neat, but the last time I looked at it I seem to recall it only worked within its own IDE. Not a terrible failing, but still.... And version 2 is not out yet.
  • Agda also seems neat, but for some reason I am having a hard time getting enthusiastic about it.
  • Coq is neat (there may be a trend here), but as a programming language? Certainly, it is the best computer game I have played in ages, but one issue I have is that I find Coq proofs almost unreadable. It does get lots of extra points for Coq'Art, a truly excellent book.
  • Many of the others that I have heard about are not actively developed.
However, there is one alternative with a significant feature: ATS has excellent placement in the computer language shootout. (Hey, that's how I found OCaml, which quickly became my last favorite programming language.) Plus, the current implementation is named "Anairiats", which is both cool and enigmatic.

To come back around to the point of this post, I was casting around for more information about ATS, and found Bluish Coder, the blog of Chris Double, who, judging by the tags and posts, seems to have already done most of the things on my to-do list. Scheme in Javascript, ATS, continuations, factor, etc., etc. Worst of all, he's in New Zealand.

Damn, I'm inadequate.

[Edit: Yah, and I can't even remember to tag my posts.]

Saturday, October 23, 2010

SAML2 Servlet Filter

I have been working on various forms of authentication for Java-based web applications for quite a while. The latest iteration uses the Security Assertion Markup Language, version 2 to interact with an Identity Provider (IdP), which is supported by the authentication system we get to use. When describing the OSGi-based design I came up with to support this thingamajig I again mentioned the code to handle the interaction; the first part of that code is the topic of this post. I intend to describe the servlet filter which stands in front of application servlets; a later post will describe the Assertion Consumer Service.

Because of the way the code is structured, this post must assume that you are familiar with the basic protocol as described in SAML authentication for web applications.

Background


There are two parts to the application-side (or Service Provider, SP side) of the authentication infrastructure: a servlet filter and the Assertion Consumer Service (or ACS), an application that interprets the SAML response and directs the user's request appropriately. The servlet filter intercepts incoming user requests and either
  • passes the request as-is, for those that do not need authentication,
  • handles redirected requests from the ACS after the authentication,
  • handles requests that are already authenticated,
  • does something useful in the case when SAML authentication must be disabled, or
  • generates an authentication request.
The first and fourth options are, well, optional. The SAML2 filter only needs the second, third, and fifth branches to function correctly.

I do not intend to provide complete code for the filter, because the implementation I have is tightly bound to the OSGi container and to our framework and the idea of refactoring it for genericity leaves a bad taste in my mouth. The basic structure of the filter, though, is:
try
{
  final HttpServletResponse response = (HttpServletResponse) resp;
  final HttpServletRequest request =
  filteredRequest(new HideClientSsoHeaderHttpRequest((HttpServletRequest) rqst), response, chain, getConfig(config));
  if (request == null) { return; }
  else if (handleAcsRedirect(request, response, chain)) { return; }
  else if (handleAuthenticatedRequest(request, response, chain)) { return; }
  else if (handleSamlDisabled(request, response, chain, getConfig(config))) { return; }
  else
  {
    // Unknown key: send Authentication Request
    response.sendRedirect(authnRequest(request));
    return;
  }
}
catch (...) { ... }

The possible exceptions are SecuritySystemException, a high-level exception thrown by the authentication and identity-related parts of our framework, usually indicating an operational failure of some sort, and DecoderException, from Apache Commons Codec and indicates a Base64 or other encoding error.

Pass through


Passing through requests unfiltered is relatively simple, although there are several related operations and as a result the function is larger than it would seem to be. The filteredRequest function returns either a null, indicating that the request has been handled, or a request which may be the original or may be a modified version. Pass through processing must be done first, since the whole goal is to short-circuit other more expensive processing.
private HttpServletRequest filteredRequest(HttpServletRequest request, HttpServletResponse response, FilterChain chain, Config config) throws IOException, ServletException
  {

The basic filter compares the URL from the request with a regular expression pattern; if the URL matches the request is passed on to the next element of the filter chain to be handled and null is returned.
if (config.excludedUrlPattern != null)
    {
      final String contextPath = request.getContextPath();
      final String requestUri = request.getRequestURI();
      final int contextBeg = requestUri.indexOf(contextPath);
      final int contextEnd = contextBeg + contextPath.length();
      
      String applicationUrl =
          (contextBeg < 0 || contextEnd == (requestUri.length() - 1))
          ? requestUri : requestUri.substring(contextEnd);
      if (!applicationUrl.startsWith("/"))
      {
        applicationUrl = "/" + applicationUrl;
      }
      
      if (config.excludedUrlPattern.matcher(applicationUrl).matches())
      {
        /* Request matches excluded URL pattern; do not continue with filter. */
        chain.doFilter(request, response);
        return null;
      }
    }
The second option we currently have is for AMF requests (a.k.a. BlazeDS, as in Adobe Flex), it matches against the destination of the request (which requires grubbing through the request, unfortunately). Corey describes what this does as:
When a Flex client makes a remote object call this ends making two seperate server calls. The first is an administration call that doesn't have a normal AMF payload (that can be decoded). The second is the call (with payload) to the service destination. Because I can't tell what service destination this is targeted to I'm always letting these administration request go through unfiltered. I need to add a check here when this happens to make sure they are targeting the Flex broker with the URL.
if (config.excludedAmfDestinationPattern != null
        && AMF_CONTENT_TYPE.equals(request.getHeader(CONTENT_TYPE_HEADER)))
    {
      final CapturedBytesHttpRequest captureBytesRequest = new CapturedBytesHttpRequest(request);
      final AmfDecoder decoder = new AmfDecoder(captureBytesRequest.getCapturedBytes());
      if (!decoder.decode() || config.excludedAmfDestinationPattern.matcher(decoder.getDestination()).matches())
      {
        /* Request matches AMF destination pattern; do not continue with filter. */
        chain.doFilter(captureBytesRequest, response);
        return null;
      }
      else
      {
        return captureBytesRequest;
      }
    }
    
    return request;
  }

ACS redirects

When a request is directed to the ACS which contains a successful authentication, it redirects the browser back to the original URL, meaning that the request is captured by this filter. However, the redirected request (which must be a GET) contains a query parameter matching the regular expression pattern "^ACSREQUESTID=(.*)$". (In-band signaling is kind of unpleasant, but there are few other ways to correctly get the requisite information shared by the ACS, this filter, and the browser.) When a request contains the ACSREQUESTID, the value is picked out and decoded. This value is a token that is used as a key for the authentication information as well as to recover the original request, which is handled by this function.
private boolean handleAcsRedirect(final HttpServletRequest request, final HttpServletResponse response, final FilterChain chain)
  throws DecoderException, IOException, ServletException
  {
    final String queryString = request.getQueryString();
    if (queryString != null)
    {
      final Matcher matcher = pat.matcher(queryString);
      if (matcher.matches())
      {
        final String key = new URLCodec().decode(matcher.group(1));
        final HttpServletRequestWrapper storedRequest = Storage.getRequestStore().getRequest(key);
        final IdentityMap mapping = Storage.getIdentityStore().getIdentity(Arrays.asList(key));
        if (storedRequest != null && mapping != null)
        {
          storedRequest.setRequest(request);          
          chain.doFilter(new AuthenticatedHttpServletRequest(storedRequest, mapping), response);
          Storage.getRequestStore().removeRequest(key);
          return true;
        }
      }
    }
    return false;
  }
The Storage class provides access to two different storage systems:
  • The IdentityStore records information returned from the IdP about a user; the information is returned as an IdentityMap, which is used to construct an AuthenticatedHttpServletRequest.
    public interface IdentityStore
    {
      public abstract void storeIdentity(String key, String identity, Map<String, List<String>> attributes, Date expiry);
      public abstract IdentityMap getIdentity(List keys);
      public abstract IdentityMap getIdentity(Identity identity);
      public interface IdentityMap
      {
        public abstract String getEmployeeNumber();
        public abstract Map<String,List<String>> getAttributes();
        public abstract Date getExpiration();
        public abstract boolean valid();
      }
    }
    
  • The RequestStore records the original request from the user (which will be stored below)
    public interface RequestStore
    {
    
      public abstract void storeRequest(String key, HttpServletRequest request);
      public abstract String getRequestUrl(String key);
      public abstract HttpServletRequestWrapper getRequest(String key);
      public abstract void removeRequest(String key);
    }
    
The Storage class in turn comes in two forms:
  • An in-memory store, used for original testing and possibly in case of database failures, and
  • A database-backed store, used to share requests between load-balanced servers as well as preserving requests in case of server restarts.
An AuthenticatedHttpServletRequest recovers a stored request and adds the identity information from the IdentityMap to allow access from the application.

Authenticated requests

Authenticated requests are simple to handle. The token used as a key store the identity information is also added as a cookie to the response by the ACS, so all this method needs to do is to recover the cookie and check it against the IdentityStore (which includes checking if the authentication has timed out).
private boolean handleAuthenticatedRequest(final HttpServletRequest request, final HttpServletResponse response, final FilterChain chain)
  throws DecoderException, IOException, ServletException
  {
    final List authCookie = Misc.getCookie(request, AUTH_COOKIE_NAME);
    final IdentityMap mapping = Storage.getIdentityStore().getIdentity(authCookie);
    if (mapping != null && mapping.valid())
    {
      chain.doFilter(new AuthenticatedHttpServletRequest(request, mapping), response);
      return true;
    }
    return false;
  }

Disabling SAML

The downside of the SAML2 single-sign-on design is that if the IdP is down, the applications are also unavailable. As an alternative, I added a administratively configurable flag to disable SAML2 processing. If SAML2 is disabled, there are three other options:
  • Some applications are already able to handle their own authentication in place of SAML2. For these applications, if SAML2 is disabled all requests are passed on directly to the application.
  • Some applications may need a generic username and password based authentication scheme; in this case a application-specified page is displayed with a form request a login. Once the user is logged in, normal identity information is available to the application, so the authenticated requests branch above can be taken.
  • Any remaining applications should simply display an "unavailable" page.
I will not show any code for this method, since it is relatively straightforward and the processing for option 2 is lengthy.

Generating authentication requests

The remaining possibility requires generating an authentication request, storing the original request, and redirecting the browser via the request back to the IdP.
private String authnRequest(HttpServletRequest request) throws SecuritySystemException
  {
    // Generate the session identification key
    final String key = Framework.getSessionKeyService().generateSessionKey();
    Storage.getRequestStore().storeRequest(key, request);
    return new AuthenticationRequest(key).toUriString();
  }
The session key generated is used as the token for the SAML2 request as well as the value of the cookie identifying future authenticated requests. The key itself is just a cryptographically secure random number. The String generated is the URL of the IdP with the authentication request added as a query parameter, following the SAML2 specifications. This is passed back to the browser as a redirect.

An AuthenticationRequest

To interact with the IdP, I used the fmclientsdk.jar and openssoclientsdk.jar libraries from OpenAM, which was OpenSSO from Sun before the Oracle acquisition. I currently need to use both, because some classes are missing from each. (I really should report that as a bug.)
public AuthenticationRequest(String key) throws SecuritySystemException
  {
    try
    {
      final String destinationUrl = ...;
      final boolean forceAuthn = ...;
      final boolean isPassive = ...;
      final boolean signAuthnRqst = ...;
      final String acsUrl = ...;

      final AuthnRequest authnRequest = ProtocolFactory.getInstance().createAuthnRequest();
      
      authnRequest.setID(key);
      authnRequest.setVersion(SAML_VERSION);
      authnRequest.setIssueInstant(new Date());
      authnRequest.setDestination(destinationUrl);
      if (forceAuthn)
      {
        authnRequest.setForceAuthn(forceAuthn);
      }
      if (isPassive)
      {
        authnRequest.setIsPassive(isPassive);
      }
      authnRequest.setProtocolBinding(PROTOCOL_BINDING);
      authnRequest.setAssertionConsumerServiceURL(acsUrl); 
      authnRequest.setIssuer(issuer());
      authnRequest.setNameIDPolicy(nameIdPolicy());
      authnRequest.setRequestedAuthnContext(requestAuthnContext());
      if (signAuthnRqst)
      {
        sign(authnRequest);
      }
      
      authnRequestMessage = authnRequest.toXMLString(true, true);
      log.debug("request: " + authnRequestMessage);
    }
    catch (SAML2Exception e)
    {
      throw new SecuritySystemException("Unable to create SAML2 AuthnRequest", e);
    }
    catch (...) { ... }
  }
The com.sun.identity.saml2.protocol.ProtocolFactory class provides access to the components needed to generate the authentication request. The created object is a com.sun.identity.saml2.protocol.AuthnRequest, which has a large number of properties which can be set to fill in elements in the request. For precise details of which properties are needed, see the SAML2 specifications as well as something like the Federal ICAM (Identity, Credential, and Access Management) Security Assertion Markup Language (SAML) 2.0 Web Browser Single Sign-on (SSO) Profile. (Man, that thing has an unfortunate logo.) This method calls three additional methods to create other parts of the request:
  • The issuer identifies the server making the authentication request.
    private Issuer issuer() throws SAML2Exception
      {
        final String issuerName = ...;
        final Issuer issuer = AssertionFactory.getInstance().createIssuer();
        issuer.setValue(issuerName);
        return issuer;
      }
    
  • The name id policy describes the requested properties of the "name" used to identify the user in the authentication response. In our case, I am actually using an employee number, which is returned as one of the attributes of the user, rather than the SAML2 name.
    private NameIDPolicy nameIdPolicy() throws SAML2Exception
      {
        final String issuerName = ...;
        final boolean allowCreate = ...;
        final NameIDPolicy nameIdPolicy = ProtocolFactory.getInstance().createNameIDPolicy();
        nameIdPolicy.setFormat(NAMEID_POLICY_FORMAT);
        nameIdPolicy.setSPNameQualifier(issuerName);
        nameIdPolicy.setAllowCreate(allowCreate);
        return nameIdPolicy;
      }
    
  • Finally, the authentication context describes the requested properties of the authentication itself. For example, it is possible to request various levels of confidence in the authentication, from simple user assertions to multi-factor cryptographic confidence.
    private RequestedAuthnContext requestAuthnContext() throws SAML2Exception
      {
        final RequestedAuthnContext requestedAuthnContext = ProtocolFactory.getInstance().createRequestedAuthnContext();
        requestedAuthnContext.setComparison(AUTHN_CONTEXT_COMPARISON);
        requestedAuthnContext.setAuthnContextClassRef(AUTHN_CONTEXT_CLASSREF_VALUES);
        return requestedAuthnContext;
      }
    
The one tricky feature of the authentication request is the cryptographic signature. The sign method above (invoked when the signAuthnRequest flag is set) signs the authnRequest "internally", using XML encryption. In the HTTP-Redirect binding, requests are signed after being converted to strings, which will be described shortly. Typically, signAuthnRqst should be false. Instead, the authentication request, formatted in XML, is first compressed (using DeflaterOutputStream).
public String toEncodedString() throws SecuritySystemException
  {
    try
    {
      final ByteArrayOutputStream baos = new ByteArrayOutputStream(authnRequestMessage.length());
      final DeflaterOutputStream dos = new DeflaterOutputStream(baos, new Deflater(Deflater.BEST_COMPRESSION, true));
      dos.write(authnRequestMessage.getBytes());
      dos.close();
      return encodedString = encode(baos.toByteArray());
    }
    catch (IOException e)
    {
      throw new SecuritySystemException("Unable to encode SAML2 AuthnRequest", e);
    }
  }
Following compression, the string is Base-64 encoded, then URL-encoded to protect non-URL-safe Base-64 characters.
private String encode(byte[] byteArray)
  {
    return new String( new URLCodec().encode( Base64.encodeBase64(byteArray) ) );
  }
Finally, the query string is partially assembled, with the SAML request and SigAlg (indicating the signature algorithm) parameters. The resulting string is signed and the signature added as a third query parameter. The whole is assembled with the URL to be returned to the browser.
public String toUriString() throws SecuritySystemException
  {
    try
    {
      final String parameters = String.format("%s=%s&%s=%s",
        SAML_REQUEST,
        toEncodedString(),
        "SigAlg", new URLCodec().encode("http://www.w3.org/2000/09/xmldsig#rsa-sha1"));
      final String signature = signature(parameters);
      return String.format("%s?%s&%s=%s", destinationUrl, parameters, "Signature", signature);
    }
    catch (EncoderException e)
    {
      throw new SecuritySystemException("Cannot enecode SigAlg", e);
    }
  }
To sign the string, the normal Java cryptography API's are employed. The result is encoded identically to the compressed request.
private String signature(String unsigned) throws SecuritySystemException
  {
    String signatureAlgorithm = ...;
    try
    {
      final Pair pair = ...;
      final Signature signature = Signature.getInstance(signatureAlgorithm);
      signature.initSign(pair.getY());
      signature.update(unsigned.getBytes("UTF-8"));
      return encode( signature.sign() );
    }
    catch (...) { ... } ...
  }

Conclusion


The filter is relatively complex, although it follows a simple step-by-step procedure. However, most of the complexity comes from generating the authentication request, which requires interaction with the OpenAM client SDK, Java cryptography and compression, and HTTP URL manipulation.

Best of luck, and I hope this helps someone.

Sunday, October 17, 2010

Link o' the Day: Top 10 Performance Problems

From the Canned Platypus link o' the day, I found a very nice list of Top 10 Performance Problems taken from Zappos, Monster, Thomson and Co. I am not sure I would call them the "Top 10", but they are certainly significant causes of web application performance issues. (Also, I would switch #10 and the #11 bonus; "Intermittent Problems" is not really a class of problems.)

Tuesday, October 12, 2010

Quote o' the Week: It’s [Not] Faster Because It’s C

From Canned Platypus,
I’d even argue that the main reason kernel code tends to be efficient is not because it’s written in C but because it’s written with parallelism and reentrancy in mind, by people who understand those issues. A lot of code is faster not because it’s written in C but for the same reasons that it’s written in C. It’s common cause, not cause and effect. The most common cause of all is that C code tends to be written by people who have actually lived outside the Java reality-distortion bubble and been forced to learn how to write efficient code (which they could then do in Java but no longer care to).
Smack!

There is some good advice there:
Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third.
[...]
Compare results, not approaches.
On the other hand, per-unit performance is a good tool to achieve stability, scalability, and manageability, so don't get carried away.

Monday, October 11, 2010

(How to write a (Lisp) interpreter (in C++) (or make Peter Norvig cry, whichever comes first))

[11/30/2010] Anthony C. Hay's Lisp interpreter in [more than] 90 lines of C++ is very sweet, and much more functional than mine. Check it out!

Last week, I was pointed (by reddit) to Peter Norvig's (How to Write a (Lisp) Interpreter (in Python)) and the follow-on, (An ((Even Better) Lisp) Interpreter (in Python)), and for no readily discernible reason was compelled to attempt a translation. Naturally, I chose C++, a language I have less than a year's experience with, which period was more than a couple of years ago, and which I do not especially like. So, in a spirit similar to that needed to properly enjoy tucking a live weasel into my trousers, I set to work.

What follows is the important part of the result, the eval function. Value::p and Environment::p are both pointers, to a Value and an Environment respectively. (The necessity of using the weird ::p idiom will be explained shortly, although you might consider the random dynamic_pointer_cast's to be a clue.)

Value::p eval(Value::p x, Environment::p env) {

The first branch handles symbols, the value of which is found in the environment.

if (x->type() == Value::SYMBOL) {                            // Look up the value
      Symbol::p x1 = dynamic_pointer_cast(x);
      return env->find(x1)[x1];

The second branch handles the other scalar values, which in this interpreter are integers and strings. These values are self-evaluating.

} else if (x->type() != Value::LIST) {                       // Everything else scalar is self-evaluating
      return x;

The remaining values are all lists of various flavors. To make it easier to manipulate the values, I broke out repr, which is the STL list representing the list, symbol, which is the symbol at the head of the list, it, which is a STL iterator pointing to the second element of the list, and end, which is a STL iterator pointing to the end of the list.

} else {
      List::p l = dynamic_pointer_cast<List>(x);                 // This is where it gets exciting
      list<Value::p> repr = l->to_list();
      if (repr.size() > 0 && repr.front()->type() == Value::SYMBOL) {
        Symbol::p symbol = dynamic_pointer_cast<Symbol>(repr.front());
        list<Value::p>::iterator it = ++(repr.begin());          // The second element, needed by most branches
        list<Value::p>::iterator end = repr.end();               // The end of the list

The first of the special forms is quote, which returns the remainder of the list.

if ((symbol->to_string()) == "quote") {                  // quote: Return the rest of the list
          return List::create(++it, end);

The second special form is if, which evaluates the next element and tests its boolean value; if true the result is the evaluation of third element, if false, the fourth element. The element chosen is handled by the test and possible extra increment.

} else if ((symbol->to_string()) == "if") {              // if: Yeah, like this is going to work
          if (!(*eval(*it++, env))) { ++it; }
          return eval(*it, env);

The third special form is set!, which updates a new binding in the environment. Environment handling is one thing I stole directly from Norvig; find() returns a map, and the ...[var] = val assignment is a normal operator on that map. Very nice.

} else if ((symbol->to_string()) == "set!") {            // set!: Life does not get worse if I remove this
          Symbol::p var = dynamic_pointer_cast<Symbol>( eval(*it++, env) );
          Value::p val = eval(*it++, env);
          env->find(var)[var] = val;
          return val;

The fourth special form is define, which adds a new binding to the environment. The second element of the list must be a symbol, which will not be evaluated, and the third will be the new value, which will be evaluated.

} else if ((symbol->to_string()) == "define") {          // define: Hey, Mickey! You're so defined...
          Symbol::p var = dynamic_pointer_cast<Symbol>(*it++);   // Yeah, that's not even funny to me
          Value::p val = eval(*it++, env);
          env->update(var, val);
          return val;

The fifth special form is lambda, which is used to create a new function. The second element of the list is a list of argument identifiers, while the third is the body of the function. You will take note that I called out the variables pars and body; if you put both values in the call to Lambda::create, you have to look up the definition of sequence points.

} else if ((symbol->to_string()) == "lambda") {          // lambda: first element is the parameters, next is the body
          List::p pars = dynamic_pointer_cast<List>(*it++);
          List::p body = dynamic_pointer_cast<List>(*it);
          return Lambda::create(pars, body);

The sixth and final special form is begin, which simply evaluates the remaining values of the list and returns the final value.

} else if ((symbol->to_string()) == "begin") {           // begin: evaluate 'em all and let McCarthy sort 'em out
          Value::p value;
          for (; it != end; ++it) { value = eval(*it, env); }
          return value;

And the final branch is function application. The first step is to evaluate all of the values in the list. The first element should be a Lambda value, which is called with the arguments and the environment.

} else {                                                 // application: this is where it gets really exciting
          list::iterator it = repr.begin();            // The beginning of the list
          list evaluated;
          for (; it != end; ++it) { evaluated.push_back( eval(*it, env) ); }
          {
            Lambda::p lambda = dynamic_pointer_cast(evaluated.front());
            List::p args = List::create(++(evaluated.begin()), evaluated.end());
            return (*lambda)(args, env);
          }
        }
      }
    }
  }

For those of you who are long-term Lisp users, the collection of closing braces should make you feel all warm inside, right?

Unlike Norvig's version, which effectively compiles to Python by using Python's lambda, in C++ I had to implement lambda and function application myself. The Lambda class itself maintains a list of parameter names and the body expression. Function application is handled by operator(), in other words by overloading the function call for Lambda values. The operator() method recursively calls eval on the body expression, after creating a new environment mapping the parameter names to argument values.

class Lambda : public Value {
  public:
    typedef shared_ptr<Lambda> p;
    static p create(List::p params, List::p exp) { return p(new Lambda(params,exp)); }
    Value::t type() { return LAMBDA; }
    operator bool() { return true; }
    string to_string() { return "LAMBDA"; }

    virtual Value::p operator()(List::p args, Environment::p outer) {
      return eval(_e, create_env(outer, _p->to_list(), args->to_list()));
    }

  private:
    List::p _p;
    List::p _e;
  protected:
    Lambda(List::p p, List::p e) : _p(p), _e(e) { }

    Environment::p create_env(Environment::p outer, list<Value::p> ¶ms, list<Value::p> &args) {
      list<Value::p>::iterator p_begin = params.begin();
      list<Value::p>::iterator p_end = params.end();
      list<Value::p>::iterator a_begin = args.begin();
      list<Value::p>::iterator a_end = args.end();
      Environment::p env = Environment::create(outer);
      for (; p_begin != p_end && a_begin != a_end; p_begin++, a_begin++) {
        Symbol::p symb = dynamic_pointer_cast<Symbol>(*p_begin);
        env->update(symb, *a_begin);
      }
      return env;
    }
  };

One (among many) downsides of implementing lambdas manually is that it also requires implementing all of the builtin functions.

struct Builtin : public Lambda {
    virtual Value::p operator()(List::p args, Environment::p outer) = 0;
    Builtin(List::p p) : Lambda(p,List::p()) { }
  };

As an example of a builtin function, Plus implements integer addition. Verbose, but tedious.

struct Plus : public Builtin {
    Plus(List::p p) : Builtin(p) { }
    Value::p operator()(List::p args, Environment::p outer) {
      list<Value::p>::iterator end = args->to_list().end();
      long sum = 0;
      for (list<Value::p>::iterator i = args->to_list().begin(); i != end; ++i) {
        sum += dynamic_pointer_cast<Int>(*i)->to_int();
      }
      return Int::p(new Int(sum));
    }
  };

Amazingly, the Environment is not completely hideous. It is fundamentally a loose wrapper around the C++ STL map, using Norvig's tactic of returning the map from find().

class Environment {
  public:
    typedef shared_ptr<Environment> p;
    typedef map<Symbol::p,Value::p,Symbol::less> env_t;
   
    static p create() { return p(new Environment()); }
    static p create(p outer) { return p(new Environment(outer)); }

    virtual void update(Symbol::p s, Value::p v) { _e.insert( make_pair(s,v) ); }
    virtual env_t& find(Symbol::p s) { return _e.count(s) > 0 ? _e : _outer->find(s); }

  private:
    env_t _e;
    p _outer;

    Environment() { }
    Environment(p outer) : _outer(outer) { }
  };

Lisp parsing is actually relatively simple, although C++'s string handling is somewhat lacking. To handle it similarly to Norvig, I had to implement replace to provide some convenience.

void replace(string &s, string from, string to) {
    int idx = 0;
    int next;
    while ((next = s.find(from, idx)) != string::npos) {
      s.replace(next, from.size(), to);
      idx = next + to.size(); 
    }
  }

Fortunately, C++ lets me faff about with string I/O to break the input string into a list of string tokens.

list tokenize(string s) {
    replace(s, "(", " ( ");
    replace(s, ")", " ) ");
    istringstream iss(s);
    list result;
    result.insert(result.end(), istream_iterator<string>(iss), istream_iterator<string>());
    return result;
  }

Not to mention using string I/O to parse integers.

Value::p atom(string token) {
    istringstream oss(token);
    long l;
    if (oss >> l) { return Int::p(new Int(l)); } else { return Symbol::p(new Symbol(token)); }
  }

Fortunately, I did not bother to do much error handling anywhere in the code, so the missing bits of read_from are (ahem) unexceptional.

Value::p read_from(list<string> &tokens) {
    if (tokens.size() <= 0) { /* error */ }
    string token = tokens.front();
    tokens.pop_front();
    if (token == "(") {
      list<Value::p> lst;
      while (tokens.front() != ")") {
        lst.push_back( read_from(tokens) );
      }
      tokens.pop_front();
      return List::create(lst);
    } else if (token == ")") { /* error */
    } else {
      return atom(token);
    }
  }

Anyway, to get back to a discussion I delayed at the top of this post, I avoided memory management almost entirely by using TR1 shared_ptr<T> templates, which almost trivially handle reference counting. To avoid some template syntax nastiness, I adopted an idiom of providing a typedef of "p" in a class C for the type shared_ptr<C>. Now, I really like type-safe collections, and I am duly impressed (and terrified) by some of the bizarre abuses that C++ templates are put to. But shared_ptr's are pretty darn sweet. The only downside is reference counting, which has problems with circular structures. Fortunately, I tend not to create circular structures and could probably deal with them in this Lisp by removing set!.

Amazingly, the interpreter does in fact seem to work; the main function below (sorry, no repl as I used up my I/O quota above)...

int
main(int argc, char *argv[]) {
  Lispic::Environment::p global = Lispic::global_env();
  std::list tokens = Lispic::tokenize(
    "(begin"
    "  (print (+ 2 3))"
    "  (define double (lambda (x) (+ x x)))"
    "  (print (+ 3 3) (double 4))"
    ")"
  );
  std::cout << Lispic::eval(Lispic::read_from(tokens), global)->to_string() << std::endl;
  return 0;
}
...produces the results:
$ ./lispic
5
6
8
8
All this in about 300 lines of code. For greater amusement, you can get the source code for this monstrosity.

If you are still reading, thank you very much for your attention. If you were looking for more about the CAP theorem, bear with me. I think I have some ideas on how to actually formalize it using either Unity (I took a class from Jay Misra about 15 years ago) or Leslie Lamport's TLA+ (I have a Microsoft Channel 9 video). I suspect that it is going to involve temporal logic. Duck and cover!

[Edit] I just remembered Paul Wilson's incomplete An Introduction to Scheme and its Implementation, which is fortunately still available. At least, I think it was written by Paul Wilson, of the garbage collection reading collection.