Mad science, abstract data types, and objects

Posted on December 12, 2011 by Tommy McGuire
Labels: theory, notation, toy problems, haskell, lisp, java, programming language

Mad science, and mad scientists, are always good to find. They represent guaranteed entertainment. But they do show up in the oddest places. How about this one: OOPSLA 2009, William Cook presents a talk entitled “No one listened! But I’ll show them, I’ll show them all!”. And the weird part was, nobody noticed. Well, sure, it’s actually a popular paper that gets referenced frequently and the title is in fact “On Understanding Data Abstraction, Revisited”, but there’s no sign that the peasants went scurrying for their pitchforks and torches.


[Photo: William Cook]

So, what’s the deal with Cook and his paper?

A little backstory

In 1990, Cook published “Object-Oriented Programming Versus Abstract Data Types”, which is almost identical in content to, but somewhat longer than, “On Understanding Data Abstraction, Revisited”. In both papers Cook describes the long-established but little-known difference between objects and abstract data types. (For simplicity, the object-oriented side is limited to a subset of what I learned to call “object-based programming”: objects as you might normally think of them, minus inheritance and also without mutation.)

Given those limitations, objects are data abstractions that “have a collection of methods, or procedures, that share access to private local state.” Consider the type specimen of object-oriented programming: Smalltalk. In Smalltalk, objects have internal fields that contain the object’s state, but this state is entirely unavailable to the outside world. Instead, the object has a collection of messages that it responds to, exposing information, potentially derived from its internal state, as necessary. Once an object is created, it is entirely encapsulated, isolated from any code other than that implementing its behavior. Because the interface of the data abstraction is the methods exposed by the object, Cook identifies objects as procedural data abstractions. The key feature of an object is that its interface is defined by its constructor: the collection of methods the object responds to and the semantics of those methods are defined by the construction of the object. In a sense, those methods are associated with the specific value or instance.
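To make “procedural data abstraction” concrete, here is a tiny sketch in Haskell (mine, not Cook’s; the names CounterObject, newCounter, current, and increment are invented for the illustration): an object is nothing but a record of the functions it responds to, and the constructor is the only thing that determines what those functions do with the hidden state.

```haskell
-- A sketch of an "object" as a record of functions: the interface is just
-- the collection of methods, and the state (the Int n) lives only in the
-- constructor's closure. (Names are invented for this illustration.)
data CounterObject = CounterObject
  { current   :: Int            -- observe the hidden state
  , increment :: CounterObject  -- produce a new object with updated state
  }

-- The constructor defines the behavior of both methods.
newCounter :: Int -> CounterObject
newCounter n = CounterObject
  { current   = n
  , increment = newCounter (n + 1)
  }

main :: IO ()
main = print (current (increment (increment (newCounter 0))))  -- prints 2
```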

Abstract data types, on the other hand, have “a public name, a hidden representation, and operations to create, combine, and observe values of the abstraction.” The classic examples are the primitive types in Java, C++, or most other common languages. A value has a defined type name, say “double”, some representation (do you really know, or care, what a double actually looks like? You should: floating point numbers are a notoriously leaky abstraction), and operations like reading one in from some other representation, adding or subtracting two of them, and converting one to another representation, such as a string of characters. The key feature of an ADT is that its interface is defined by a collection of operations that apply to a specific type.
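For contrast, here is an equally small sketch of an ADT in Haskell (again mine, with an invented Celsius example): the module exports the type name and its operations, but not the constructor, so the representation stays hidden behind the type.

```haskell
-- A sketch of an abstract data type: the module exports the type 'Celsius'
-- and operations to create, combine, and observe it, but NOT the
-- constructor MkCelsius, so clients never see the representation.
-- (The example and names are invented for this illustration.)
module Celsius (Celsius, fromDouble, toDouble, add) where

newtype Celsius = MkCelsius Double          -- hidden representation

fromDouble :: Double -> Celsius             -- create
fromDouble = MkCelsius

toDouble :: Celsius -> Double               -- observe
toDouble (MkCelsius d) = d

add :: Celsius -> Celsius -> Celsius        -- combine two values of the type
add (MkCelsius a) (MkCelsius b) = MkCelsius (a + b)
```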


$$ \begin{array}{cr} & \textbf{Constructors} \\ \textbf{Observations} & \begin{array}{lcc} & [] & :: \\ \textrm{empty?} & \textrm{true} & \textrm{false} \\ \textrm{head} & \textrm{error} & \textrm{value} \end{array} \end{array} $$

Let’s say we create a matrix of operations on a data abstraction. My data abstraction is a list; it has two constructors, the Haskell-like “[]”, or nil, building an empty list, and “::” or “cons”, representing a constructor that takes a new element and an existing list and produces a new, combined list with the new element at the head. The list also has two “observation” operations, empty? which returns true for an empty list and false otherwise, and head, which produces some kind of error on an empty list, but returns the first value in the list otherwise. (This example is taken directly from “OOP Versus ADT”.)


$$ \begin{array}{cr} & \textbf{Constructors} \\ \textbf{Observations} & \begin{array}{l|c|c|} & [] & :: \\ \textrm{empty?} & \textrm{true} & \textrm{false} \\ \textrm{head} & \textrm{error} & \textrm{value} \end{array} \end{array} $$

Suppose I want to create an object-oriented version of this data abstraction. I would divide up the operations by constructor, so that I had an implementation of empty? and head for an empty list and an implementation of empty? and head for a cons cell. The details of each operation are divided among the objects created by each specific constructor. Adding a new constructor is easy, since it simply requires providing implementations of the observations on the new object. On the other hand, adding a new observation is difficult, since it requires modifying all of the existing constructors. This should sound familiar to anyone who has written code in almost any object-oriented language.
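Here is what that division looks like in code, as a sketch of mine (not from either paper) in Haskell, slicing the matrix by column: each constructor builds a record containing its own empty? and head.

```haskell
-- Object-style list: each constructor fills in its own column of the
-- matrix, i.e. its own implementations of the two observations.
-- (ListObject, nil, and cons are names invented for this sketch; a fuller
-- interface would also need a tail method.)
data ListObject a = ListObject
  { isEmpty :: Bool
  , hd      :: a                      -- error for the empty list
  }

nil :: ListObject a                   -- the [] column
nil = ListObject { isEmpty = True
                 , hd      = error "head of empty list" }

cons :: a -> ListObject a -> ListObject a   -- the :: column
cons x _rest = ListObject { isEmpty = False
                          , hd      = x }

-- A new constructor is just another function returning a ListObject;
-- a new observation means changing nil, cons, and every other constructor.
```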


$$ \begin{array}{cr} & \textbf{Constructors} \\ \textbf{Observations} & \begin{array}{lcc} & [] & :: \\ \hline \textrm{empty?} & \textrm{true} & \textrm{false} \\ \hline \textrm{head} & \textrm{error} & \textrm{value} \end{array} \end{array} $$

On the other hand, suppose I want an ADT version of this data abstraction. In this case, I would divide the operations by observation, so that I had an implementation of empty? and head that each covered all of the constructors. The details of each constructor are spread among the observation implementations. Adding a new observation is easy; it just has to handle the individual cases of the constructors. Adding a new constructor is hard, though, because every observation needs to be modified. This option should be familiar to anyone who has used ML or Haskell, since it is how their algebraic data types work.
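And here is the same abstraction sliced by row, again only as an illustrative sketch of mine: one algebraic data type with both constructors, and each observation pattern-matching across all of them.

```haskell
-- ADT-style list: each observation covers a whole row of the matrix,
-- handling every constructor. (Names are invented for this sketch.)
data List a = Nil
            | Cons a (List a)

isEmpty :: List a -> Bool        -- the empty? row
isEmpty Nil        = True
isEmpty (Cons _ _) = False

hd :: List a -> a                -- the head row
hd Nil        = error "head of empty list"
hd (Cons x _) = x

-- A new observation is just another function; a new constructor means
-- revisiting every observation that already exists.
```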

There are some less obvious consequences to these two styles of data abstraction. For one thing, the two approaches are almost duals; the advantages of one are the weaknesses of the other, and vice versa. Both approaches do, in fact, provide data abstraction: objects by encapsulating the state of the data instance behind procedural abstractions, ADTs by encapsulating that state behind a type boundary. Both also describe (if you squint a bit) common data abstraction patterns. On the other hand, each approach has some peculiarities. Because an object only has access to its own state, complex operations require weakening the encapsulation of objects; Cook calls “an operation ‘complex’ if it inspects multiple representations,” equality testing being the standard example. I may not want to expose all of the implementation information necessary to compare two instances in the methods that an object responds to, yet I have no other option if I want to implement some observations. Cook also claims that object-oriented, procedural abstractions are possible in dynamically typed languages but ADTs are not: “abstract data types depend upon a static type system to enforce type abstraction. It is not an accident that dynamic languages use objects instead of user-defined abstract data types. Dynamic languages typically support built-in abstract data types for primitive types; the type abstraction here is enforced by the runtime system.”
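The “complex operation” point is easy to see with the list sketches above (this is my illustration, not Cook’s code): the ADT’s equality can pattern-match both representations at once, while an object exposing only empty? and head gives an equals method nothing to work with short of widening the interface.

```haskell
-- Equality is Cook's example of a "complex" operation: it inspects two
-- representations. With the ADT list (redefined here so the snippet
-- stands alone) it is a plain pattern match on both arguments.
data List a = Nil | Cons a (List a)

eqList :: Eq a => List a -> List a -> Bool
eqList Nil         Nil         = True
eqList (Cons x xs) (Cons y ys) = x == y && eqList xs ys
eqList _           _           = False

-- The object-style list from the earlier sketch exposes only isEmpty and
-- hd, so an equality method has no tail to recurse on; adding one (or
-- exposing the private state) weakens the encapsulation, which is exactly
-- the trade-off Cook is pointing at.
```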

I do not doubt the validity of this deconstruction of objects and abstract data types, if for no other reason than that it exactly covers Philip Wadler’s “expression problem”. Also, both ideas seem pretty productive, which is always the primary test of science on the hoof. If that was all there was to it, there would be no mad science involved.

The mad science part

Mad Science Tip #1: If you have to repeat the same message over 20 years, and no one seems to be listening to you, they’re the crazy ones. They’re the ones who don’t see.

Mad Science Tip #2: A beautiful theory, one that explains everything, is obviously true. Even if reality doesn’t seem to particularly agree.

The ADT half of Cook’s papers seems to be relatively uncontroversial, perhaps because no one actually uses the languages (like ML and Haskell) that have user-defined ADTs as the primary data abstraction scheme. On the other hand, Cook’s minor point about the inability to use ADTs in dynamically typed languages doesn’t seem particularly, well, true; most Lisps, for example, allow you to define structures (or write macros to build such structures) whose representations are hidden from any functions other than a limited group. Such pseudo-ADTs may not be as encapsulated as Cook would like, but coupled with a common coding-style rule, say “Don’t do stupid things,” they work well enough. Objects, likewise, do not actually seem to exist. It’s been too long since I last used Smalltalk, but every other “object-oriented” language allows a method in a class to access the internals of more than one object of the same class, neatly avoiding (some cases of) the “complex” operation issue. (Avoiding all of the difficulties presented by complex operations requires something like multimethods or other, even rarer techniques.) Maybe some of the object systems for Scheme satisfy the requirements for objects, but Scheme doesn’t have a standard anything, much less an object system; it is a toolkit for implementing other languages. Further, Cook’s ideas on actually doing object-oriented programming in Java would likely horrify most Java-istas, even those who love interfaces as much as I do.

Mad Science Tip #3: You get extra points for hosting dinner parties where you torment your rivals.

What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”.

Whereupon he goes all maniacal on them. Last and possibly least, Mad Science Tip #4: Being a mad scientist is fun; you get to meet interesting people, and there’s nothing like the je ne sais quoi of carving your name into the moon with a giant laser. (For that matter, giant lasers are just plain cool no matter what you do with them.) But it may have detrimental effects on your social life.

Most groups eventually work through the differences between objects and ADTs, but I can tell they walk away feeling uneasy, as if some familiar signposts now point in different directions.

…or perhaps as if they’re just glad to get away with all the right number of appendages. Now, I’m not saying William Cook is going to destroy the world if Java 8 doesn’t get objects right. But if you see him being followed by a giant tongue monster or psychopathic, hyper-intelligent gerbils, don’t say I didn’t warn you.

A Proposal for Simplified, Modern Definitions of “Object” and “Object Oriented”

Update (July 13, 2012): Prof. Cook has recently made a blog post, A Proposal for Simplified, Modern Definitions of “Object” and “Object Oriented”, that continues the struggle against sloppy language and those who refuse to acknowledge the primacy of reason. Erlang is object oriented; ML, Modula-2, and Ada are not. “All languages in the Lisp family, including Scheme, Racket, and Common Lisp” are object oriented (although all languages in the Lisp family are anything oriented, because “it is possible to define a few simple macros” that do any kind of magic you desire). Go is object oriented, and protocols in Clojure make it object oriented.

Java is problematic:

As I have said before, it is also possible to do object-oriented programming in Java. This suggests that it is also possible to NOT do object-oriented programming in Java. If you work at it, you can [program object-orientedly (?)]. [Emphasis mine].

He goes on to say, however, that

All other existing object-oriented languages are included in the definition, including Simula, Smalltalk, C++, Eiffel, Beta, Objective-C, JavaScript, Visual Basic, Self, Perl, Matlab, PHP, Python, and Ruby…. As I have suggested before, the untyped lambda calculus is also object oriented, because the only way to implement “data” is with behavior.

Unfortunately, I suspect that C++, Objective-C, JavaScript, Perl, PHP, Python, and Ruby belong under the same caveats as Java, since I believe that very little code is written in those languages under the idea that “the key point is that objects are behavioral abstractions: objects are known by what they do”; certainly most of the code I have seen does not. Smalltalk, Simula, and Self, certainly; perhaps Eiffel and Beta. Matlab is on its own, and I’m not going to touch Visual Basic with a burning, ten-foot effigy of a sweaty Steve Ballmer.

On another front, Java is not currently a functional language, but Java 8 will be. The future will bring “a strong shift toward a new convention whereby ‘functional’ means ‘no aliasing of mutable state’” with the result that Lisp and ML will not be functional languages.

On a personal note, I would like to thank Professor Cook for his kind comments on this article, and say that the robotic zombie ninjas are really, really sweet. (Robotic ninja zombies? See, it works any which way! It’s brilliant!)

Comments



This is pretty funny. Of course I meant the original essay to be humorous as well, but it was probably too dry. I *am* going to try to get everybody to use interfaces in Java 8 :-) WHAAAAHAHAHAHAHAHAH

William Cook
‘2012-03-21T19:23:20.941-05:00’

Looking at this situation it seems that Java 8 is definitely going to get the right objects but still we have to wait little more to get it released. I think almost around 2013 it will come into existence.

Steven Howard
‘2012-03-29T01:20:49.359-05:00’
