Friday, December 30, 2011

Tip: Converting a CVS repository to git

I have a great load of source code (and other detritus) in a CVS repository that I've been copying around for years. I have not used CVS for anything new in a very long time, but because I was the only one modifying things, I did not bother to move the contents to anything more modern.

Occasionally, I need to reactivate a project, and usually want to move the history into a newer tool, git usually. And every time, I have to look up how to do it. So, here's the latest working scheme, in hopes that I don't have to search around anymore.

  1. Copy the CVSROOT directory and the module from the repository to someplace local. My CVS root (and the master git repositories) live remotely. The destination doesn't matter much.

    scp -r remote:CVS/{CVSROOT,module} CVS
  2. Get a copy of the latest cvs2svn, which includes cvs2git, which seems to be working better than I remember. Use cvs2git to create the two git fast-import format files from the repository.

    cvs2svn-2.3.0/cvs2git --blobfile=module-blob.dat --dumpfile=module-dump.dat --username me CVS/
  3. Create a new git repository to act as the master repository for the project.

    mkdir module.git
    cd module.git/
    git init --bare --shared
    
  4. Import the project.

    cat ../module-blob.dat ../module-dump.dat | git fast-import
  5. Put the new master repository back on my remote server.

    cd ..; scp -r module.git remote:GIT/
  6. Clone the remote repository to create a local, working repository.

    git clone ssh://remote/home/me/GIT/module.git
  7. The import above leaves the CVSROOT and module directories in the git repository, so the useful content of the module is one level down into the repository. The first git operation on the working repository fixes that.

    git rm -r CVSROOT/
    git mv module/* .
    git rm -r module
    git commit -m 'Removing cvs residue'
    git push

And with luck, that's that.

Saturday, December 24, 2011

Quote o' the day: Criticism

From reddit:

Criticizing another's favorite language is like criticizing their genitalia, religion or politics.

Words to live by.

Thursday, December 15, 2011

Link o' the day: Set Theory

I never really understood the whole deal about the Axiom of Choice. ("the last great controversy of mathematics"?) But now, I get it. Thanks, Randall Munroe!

Monday, December 12, 2011

Mad science, abstract data types, and objects

Mad science, and mad scientists, are always good to find. They represent guaranteed entertainment. But they do show up in the oddest places. How about this one: OOPSLA 2009, William Cook presents a talk entitled "No one listened! But I'll show them, I'll show them all!". And the weird part was, nobody noticed. Well, sure, it's actually a popular paper that gets referenced frequently and the title is in fact "On Understanding Data Abstraction, Revisited", but there's no sign that the peasants went scurrying for their pitchforks and torches.

William Cook

So, what's the deal with Cook and his paper?

A little backstory

In 1990, Cook published "Object-Oriented Programming Versus Abstract Data Types" which is almost identical in content to, but somewhat longer than, "On Understanding Data Abstraction, Revisited". In both papers Cook describes the long-established, but little known, difference between objects and abstract data types. (For simplicity, the object-oriented side is limited to a subset of what I learned to call "object-based programming"; objects, as you might normally think of them, minus inheritance and also without modifications.)

Given those limitations, objects are data abstractions that "have a collection of methods, or procedures, that share access to private local state." Consider the type specimen of object-oriented programming: Smalltalk. In Smalltalk, objects have internal fields that contain the object's state, but this state is entirely unavailable to the outside world. Instead the object has a collection of messages that it responds to, exposing information potentially based on its internal state as necessary. Once an object is created, it is entirely encapsulated, isolated from any other code than that implementing its behavior. Because the interface of the data abstraction is the methods exposed by the object, Cook identifies objects as procedural data abstractions. The key feature of an object is that its interface is defined by its constructor: the collection of methods the object responds to and the semantics of those methods are defined by the construction of the object. In a sense, those methods are associated with the specific value or instance.

Abstract data types, on the other hand, have "a public name, a hidden representation, and operations to create, combine, and observe values of the abstraction." The classic examples of them are the primitive types in Java, C++, or most other common languages. A value has a defined type name, say "double", some representation (Do you really know, or care, what a double actually looks like? (Note: You should. Floating point numbers are a notoriously leaky abstraction.)), and operations like reading one or converting one from some other representation, adding or subtracting two of them, and converting one to another representation, such as a string of characters. The key feature of an ADT is that its interface is defined by a collection of operations that apply to a specific type.

Constructors
Observations[]::
empty?truefalse
headerrorvalue
Let's say we create a matrix of operations on a data abstraction. My data abstraction is a list; it has two constructors, the Haskell-like "[]", or nil, building an empty list, and "::" or "cons", representing a constructor that takes a new element and an existing list and produces a new, combined list with the new element at the head. The list also has two "observation" operations, empty? which returns true for an empty list and false otherwise, and head, which produces some kind of error on an empty list, but returns the first value in the list otherwise. (This example is taken directly from "OOP Versus ADT".)

Constructors
Observations[]::
empty?truefalse
headerrorvalue

Object-oriented

Suppose I want to create an object-oriented version of this data abstraction. I would divide up the operations by constructor, so that I had an implementation of empty? and head for an empty list and empty? and head for a cons cell. The details of each operation is divided among the objects created by each specific constructor. Adding a new constructor is easy, since it simply requires providing implementations of the observations on the new object. On the other hand, adding a new observation is difficult, since it requires modifying all of the existing constructors. This should sound familiar to anyone who has written code in almost any object oriented language.

Constructors
Observations[]::
empty?truefalse
headerrorvalue

ADT

On the other hand, suppose I want an ADT version of this data abstraction. In this case, I would divide the operations by observation, so that I had an implementation of empty? and head that each covered all of the constructors. The details of each constructor is spread among the observation implementations. Adding new observation is easy; it just has to handle the individual cases of the constructors. Adding a new constructor is hard, though, because every observation needs to be modified. This option should be familiar to anyone who has used ML or Haskell, since it is how their algebraic data types work.

There are some less obvious consequences to these two styles of data abstraction. For one thing, the two approaches are almost duals; the advantages of one are the weaknesses of the other, and vice-versa. Both approaches do, in fact, provide data abstraction; objects by encapsulating the state of the data instance behind procedural abstractions, ADTs by encapsulating that state behind a type boundary. Both also describe (if you squint a bit) common data abstraction patterns. On the other hand, each approach has some peculiarities: because an object only has access to its own state, complex operations require weakening the encapsulation of objects. Cook calls "an operation 'complex' if it inspects multiple representations," like for example equality testing; I may not want to expose all of the implementation information necessary to compare two instances in the methods that an object responds to, yet I have no other option in order to implement some observations. Cook claims that object oriented, procedural abstractions are possible in dynamically typed languages but ADTs are not; "abstract data types depend upon a static type system to enforce type abstraction. It is not an accident that dynamic languages use objects instead of user-defined abstract data types. Dynamic languages typically support built-in abstract data types for primitive types; the type abstraction here is enforced by the runtime system."

I do not doubt the validity of this deconstruction of objects and abstract data types, if for no other reason than that it exactly covers Philip Wadler's "expression problem". Also, both ideas seem pretty productive, which is always the primary test of science on the hoof. If that was all there was to it, there would be no mad science involved.

The mad science part

Mad Science Tip #1: If you have to repeat the same message over 20 years, and no one seems to be listening to you, they're the crazy ones. They're the ones who don't see.

Mad Science Tip #2: A beautiful theory, one that explains everything, is obviously true. Even if reality doesn't seem to particularly agree. The ADT half of Cook's papers seems to be relatively uncontroversial, perhaps because no one actually uses the languages (like ML and Haskell) that have user-defined ADTs as the primary data abstraction scheme. On the other hand, Cook's minor point about the inability to use ADTs in dynamically typed languages doesn't seem particularly, well, true; most Lisps, for example, allow you to define structures (or write macros to build such structures) that have representations hidden from any functions other than a limited group. Such pseudo-ADTs may not be as encapsulated as Cook would like, but coupled with a common coding style rule, say "Don't do stupid things," they work well enough. Objects, likewise, do not actually seem to exist. It's been too long since I last used Smalltalk, but every other "object-oriented" language allows a method in a class to access the internals of more than one object of the same class, neatly avoiding (some cases of) the "complex" operation issue. (To avoid all of the difficulties presented by complex operations requires something like multimethods or other techniques that are even more rare.) Maybe some of the object systems for Scheme satisfy the requirements for objects, but Scheme doesn't have a standard anything, much less an object system; it is a toolkit for implementing other languages. Further, Cook's ideas on actually doing object-oriented programming in Java would likely horrify most Java-istas, even those that love interfaces as much as I do.

Mad Science Tip #3: You get extra points for hosting dinner parties where you torment your rivals.

What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”.
Whereupon he goes all maniacal on them.

Last and possibly least, Mad Science Tip #4: Being a mad scientist is fun; you get to meet interesting people, there's nothing like the je ne sais quoi of carving your name in the moon with a giant laser. (For that matter, giant lasers are just plain cool no matter what you do with them.) But it may have detrimental effects on your social life.

Most groups eventually work through the differences between objects and ADTs, but I can tell they walk away feeling uneasy, as if some familiar signposts now point in different directions.
...or perhaps as if they're just glad to get away with all the right number of appendages.

Now, I'm not saying William Cook is going to destroy the world if Java 8 doesn't get objects right. But if you see him being followed by a giant tongue monster or psychopathic, hyper-intelligent gerbils, don't say I didn't warn you.

A Proposal for Simplified, Modern Definitions of "Object" and "Object Oriented"

Update (July 13, 2012): Prof. Cook has recently made a blog post, A Proposal for Simplified, Modern Definitions of "Object" and "Object Oriented", that continues the struggle against sloppy language and those who refuse to acknowledge the primacy of reason. Erlang is object oriented; ML, Modula-2, and Ada are not. "All languages in the Lisp family, including Scheme, Racket, and Common Lisp" are object oriented (although all languages in the Lisp family are anything oriented, because "it is possible to define a few simple macros" that do any kind of magic you desire). Go is object oriented, and protocols in Clojure make it object oriented.

Java is problematic:

As I have said before, it is also possible to do object-oriented programming in Java. This suggests that it is also possible to NOT do object-oriented programming in Java. If you work at it, you can [program object-orientedly (?)]. [Emphasis mine].
He goes on to say, however, that
All other existing object-oriented languages are included in the definition, including Simula, Smalltalk, C++, Eiffel, Beta, Objective-C, JavaScript, Visual Basic, Self, Perl, Matlab, PHP, Python, and Ruby.... As I have suggested before, the untyped lambda calculus is also object oriented, because the only way to implement "data" is with behavior.

Unfortunately, I suspect that C++, Objective-C, JavaScript, Perl, PHP, Python, and Ruby belong under the same caveats as Java, since I believe that very little code is written in those languages under the idea that "the key point is that objects are behavioral abstractions: objects are known by what they do"; certainly most of the code I have seen does not. Smalltalk, Simula, and Self, certainly, perhaps Eiffel and Beta. Matlab is on it's own, and I'm not going to touch Visual Basic with a burning, ten-foot effigy of a sweaty Steve Ballmer.

On another front, Java is not currently a functional language, but Java 8 will be. The future will bring "a strong shift toward a new convention whereby 'functional' means 'no aliasing of mutable state'" with the result that Lisp and ML will not be functional languages.

On a personal note, I would like to thank Professor Cook for his kind comments on this article, and say that the robotic zombie ninjas are really, really sweet. (Robotic ninja zombies? See, it works any which way! It's brilliant!)

Thursday, December 8, 2011

8,827,520 pixels!

  • 1440 x 900 on the (old) laptop,
  • 3840 x 1080 in two monitors on the (new) laptop,
  • 1920 x 1080 and 1280 x 1024 on the Mac.

We had a short-lived competition. I won only briefly.

Quote o' the day: flagrant reticulation

In a reply to Ben Collins-Sussman's post, Your Community is NOT Your Tools, MarkusQ writes:

I think some of the git-resistance comes from different cultural assumptions around the cost of merging. I know from an svn mindset, the “mash the branches together and call it merged” mindset of the git-world can seem a little fast and loose. Likewise, I suspect the older “merging is to be approached with fear and reverence” culture may seem hidebound when you’re used to the flagrant reticulation, devil take the hindmost style that the git world embraces.

– MarkusQ

For those of you too lazy to look it up yourself,


$ dict reticulation
2 definitions found

From The Collaborative International Dictionary of English v.0.48 [gcide]:

  Reticulation \Re*tic`u*la"tion\, n.
     The quality or state of being reticulated, or netlike; that
     which is reticulated; network; an organization resembling a
     net.
     [1913 Webster]
  
           The particular net you occupy in the great
           reticulation.                            --Carlyle.
     [1913 Webster]

From WordNet (r) 3.0 (2006) [wn]:

  reticulation
      n 1: (photography) the formation of a network of cracks or
           wrinkles in a photographic emulsion
      2: an arrangement resembling a net or network; "the reticulation
         of a leaf"; "the reticulation of a photographic emulsion"

The only way to get more points would have been to work in "squamous".

Friday, December 2, 2011

Link o' the week: Bayes' Theorem

I seem to be up to my armpits in Bayesian statistics. Between Thrun and Norvig's AI class (the AI classes I took back in the late '80's and early '90's primarily dealt with state-space search, neural networks, and predicate logic; there was nary a statistic to be found) and the very good textbook I ran across, Doing Bayesian Data Analysis: A Tutorial with R and BUGS by John K. Kruschke, this stuff seems to be appearing everywhere.

Then I ran across Eliezer Yudkowsky's post, An Intuitive Explanation of Bayes' Theorem. Now, intuition is a funny thing. It is not innate as most people seem to think, you have to train it. Yudkowsky does that, going over the same ground with many examples and good explanations. Better, his writing, like Kruschke's, is light and entertaining:

Fun Fact!
Q.How can I find the priors for a problem?
A.Many commonly used priors are listed in the Handbook of Chemistry and Physics.
Q.Where do priors originally come from?
A.Never ask that question.
Q.Uh huh. Then where do scientists get their priors?
A.Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.
Q.I see. And where does everyone else get their priors?
A.They download their priors from Kazaa.
Q.What if the priors I want aren't available on Kazaa?
A.There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.

Bronze rat aside, I also like the Javascript calculators embedded in the article, allowing a reader to try to figure out the answers to Yudkowsky's examples without dredging up another app.