Tuesday, October 6, 2015

Ashurbanipal, a text recommendation engine


The Ashurbanipal project is a prototype of a text recommendation engine based entirely on attributes internal to the texts, such as style and topic. It is based on a snapshot of English texts from Project Gutenberg in 2010 and includes approximately 24,000 works of no-longer-in-copyright or otherwise freely-available fiction and non-fiction.

The URL is http://dpg.crsr.net.

Existing recommendation engines


Recommendation engines are very important, both economically and, well, culturally. Now, I assume I can leave the first side of that "and" to your imagination, but for the second, they have become some of the most important ways that cultural artifacts such as books and videos are discovered in this world. Not necessarily by replacing other means, but by being more ubiquitous and by being at least reasonably successful.

On the other hand, most recommendation engines have a significant flaw: they are, essentially, popularity contests. Consider the Amazon recommendation system, for books and for everything else you need in daily life, or the Netflix system that was the subject of a relatively recent computer science contest: both are built on either

  • ratings provided by users, or
  • the actions or attributes of users, such as what other things they have bought or looked at, or things like age, gender, etc.

In neither case do the recommendation systems look inside the articles being recommended. Certainly, they look at attributes of the articles, like author, cast and category, but as far as I have been able to find out, Amazon doesn't open the text of the books being recommended and the Netflix Prize contest certainly involved nothing of the contents of the movies in the data set.

Existing recommendation systems work well. (I'm still wondering how my gen-one Tivo knew to record a cool old psychic documentary narrated by Leonard Nimoy after I'd had it set up for less than six hours.) There is no question that they are pretty good at what they do and what they do is a big part of the solution to the overall problem of discovery.

But popularity-based recommendation engines have a glaring weakness: unpopular artifacts. If a video is too new to have any review data and is not a production of a known cast, it's not likely to appear in any recommendation lists. Even if it's exactly what someone wants to see. Books by long dead authors are still pretty readable and still pretty good. (Well, ok, I admit I'm not a wild fan of Tess of the d'Urbervilles.) But I have yet to hear of anyone who has had H. Rider Haggard show up in their Amazon recommendations uninvited, in spite of having a plethora of published options. If an artifact has few connections to other, well-known artifacts, like ratings or shopping-cart interactions, then it isn't really even a candidate for recommendation.

What to do about this situation? I personally know of a couple of ongoing attempts to build for music what I'll call "internal" recommendation engines (as opposed to recommendation engines built on factors "external" to the item under recommendation): you like that beat, we'll find you other songs with beats like that, and so on.

Unfortunately, I'm not especially interested in music and I really have no idea what I'm doing. Fortunately, however, I am interested in books (Muahahaha, books, books, hahaha.) (Why does that always happen?) and I do have some experience munging around with text.

My attempt at building an internal text recommendation engine is Ashurbanipal.

Ashurbanipal, the dead guy


Ashurbanipal himself was the last (successful) king of the Neo-Assyrian empire. Like all of the other Assyrian kings (and most of the other ancient near-eastern rulers) he was a right rat bastard, cruel to his enemies and big-talking in his monuments. In fact, the British Museum (Naturally; like, where else would it be?) has a relief of him and his wife enjoying a lovely garden party with the king of Elam's head hanging from a nearby tree.

But Ashurbanipal's claim to my interest is his library, which is the largest single known collection of ancient cuneiform literature, if I recall correctly. Ashurbanipal may or may not have been a scholar-prince, taught to read and write because he was not expected to take the throne. The library also may or may not have been the first ancient library to be indexed, may or may not have been intended to preserve Sumerian and Akkadian cuneiform literature and culture in the face of the post-bronze age Aramaic culture, and may or may not have been the inspiration for Alexander's library in Alexandria.

Ashurbanipal, the software

Ashurbanipal, the software, is a collection of mostly Java (and a little Rust, at the moment) applications currently built to

  • collect meta-data about Project Gutenberg texts,
  • collect actual data from real, actual, Project Gutenberg text files, and
  • use that data to make recommendations for the texts.

Probably the most interesting program in this collection is run-tag-todolist and the Java program TagTodoList.java that backs it, the program which processes Project Gutenberg text files (stored in .zip form on the April 2010 DVD image; the simplest way to download a good data set) and produces the style and topic data that is the subject of this current bloggage. But more on that later.

What the beast does

Texts, particularly but not exclusively fiction, have a number of interesting internal attributes which could provide a set of handles for a recommendation engine. Attributes like style, topic, plot, characterization, and undoubtedly (well, hopefully) others. Ashurbanipal currently uses the first two of those.

Style is my first target, because it has been extensively studied (sort-of) and approached in a fashion that I can use for recommendations (if you squint a little).


In practice, stylometry is typically used for authorship attribution questions: Did Shakespeare really write this piece of garbage? Is Leif Erikson responsible for the stupid joke about whoever is buried in Grant's tomb? This seems a reasonable situation; if you like Charles Dickens' writing in A Tale of Two Cities, you might very well like his writing in Bleak House. (One thing I've noticed is that many style recommendation lists lead rather shortly to Bleak House. I've got no idea what that means.)

Many different approaches have been taken to stylometry, from plausible sounding but ultimately unhelpful things like sentence length to completely bogus, did-anyone-ever-buy-this? things (cough, cough, cusum). One standard method, however, seems to have bubbled to the top, due both to success and computational ease: the proportion of various "function words" or "stop words" in a segment of the text. Function words are those which carry little actual meaning in the text, but which serve to provide the grammatical structure on which content or lexical words hang like shiny cherries on the tree. They're sometimes known as "stop words" because they're ignored in most diddling-about with words, a short-sighted and uncouth fact rarely mentioned by "traditional" "computational" "linguists".

The idea is that you grub out say 5000 words from a text and count the number of uses of "an", "the", "or" and so on. The counts of each roughly match other segments from the same author and significantly differ for different authors.
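For the curious, the function-word approach might look something like the following. This is only a sketch of the standard method described above, not anything Ashurbanipal does; the class name `FunctionWords`, the method `proportions`, and the crude letters-and-apostrophes tokenizer are all made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class FunctionWords {
    /**
     * Proportion of each function word among the first `segment` words of
     * the text. Words are found with a crude letters-and-apostrophes
     * tokenizer, which is an assumption of this sketch.
     */
    static Map<String, Double> proportions(String text, List<String> functionWords, int segment) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : functionWords) {
            counts.put(w, 0);
        }
        int total = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (word.isEmpty()) {
                continue;  // split() yields an empty string for leading punctuation
            }
            if (total == segment) {
                break;     // only look at the first `segment` words
            }
            total++;
            counts.computeIfPresent(word, (k, v) -> v + 1);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The cat and the dog or an owl";
        System.out.println(proportions(text, Arrays.asList("an", "the", "or"), 5000));
    }
}
```

Comparing two segments would then be a matter of comparing these proportion maps, by whatever measure you trust.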

In my own nigh-infinite wisdom, I completely ignored this tactic. (In fact, I didn't read about it until I'd already started writing code, and I rather like my approach so I'm running with it.) Instead, in a fine application of my "If all you have is a hammer and a screwdriver, every problem looks like a threaded nail" principle, I took an off-the-shelf part-of-speech tagger and counted the number of each reported part of speech for each text, which I then normalized by dividing the counts by the total number of words in the text.

The result is a matrix with one row per text, containing approximately 45 columns with headings like "singular common noun", "determiner", and less obviously, "numeral". (Actually, it uses the Penn Treebank tag-set, so those are "NN", "DT", and "CD".) Each value in the row is a positive number between zero and one, typically very close to zero. To make style recommendations based on a chosen text, I compute the Euclidean distance between that text and all of the others, then sort the list by the resulting distances. It seems to produce reasonable answers.
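The normalize-then-sort-by-distance step can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the class `StyleDistance`, its method names, and the three-column toy data are invented here, and a real row would have about 45 columns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StyleDistance {
    /** Divide each tag count by the total number of words in the text. */
    static double[] normalize(int[] tagCounts, int totalWords) {
        double[] row = new double[tagCounts.length];
        for (int i = 0; i < tagCounts.length; i++) {
            row[i] = (double) tagCounts[i] / totalWords;
        }
        return row;
    }

    /** Euclidean distance between two rows of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Titles of all the other texts, nearest in style first. */
    static List<String> recommend(String chosen, Map<String, double[]> rows) {
        double[] base = rows.get(chosen);
        List<String> titles = new ArrayList<>(rows.keySet());
        titles.remove(chosen);
        titles.sort(Comparator.comparingDouble((String t) -> distance(base, rows.get(t))));
        return titles;
    }

    public static void main(String[] args) {
        // Made-up counts for three tags (say NN, DT, CD) in 1000-word texts.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("A", normalize(new int[] {200, 100, 10}, 1000));
        rows.put("B", normalize(new int[] {150, 120, 40}, 1000));
        rows.put("C", normalize(new int[] {210, 95, 12}, 1000));
        System.out.println(recommend("A", rows));  // prints [C, B]
    }
}
```

With some 24,000 texts and 45 columns, a brute-force pass like this over every row is cheap enough that nothing cleverer seems necessary.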

(I fully intend to collect functional-word information at some point soonish and compare those stylistic results to the POS results I have. However, so far I have done little in the way of cross validation. So there, nyah.)

For one example, the first book from a different author in the list of style recommendations for Jane Austen's Sense and Sensibility is His Heart's Queen by Mrs. Georgie Sheldon (1843-1926; slightly later than I would have expected). Using the "Page 63" test (i.e., turn to page 63 of a book and read it to see if the author has been smoking too much crack to be acceptable; in actual fact, I scrolled down until the tabs were a ways down and approximately equivalent), I find

"No; my feelings are not often shared, not often understood. But sometimes they are." As she said this, she sunk into a reverie for a few moments; but rousing herself again, "Now, Edward," said she, calling his attention to the prospect, "here is Barton valley. Look up to it, and be tranquil if you can. Look at those hills! Did you ever see their equals? To the left is Barton park, amongst those woods and plantations. You may see the end of the house. And there, beneath that farthest hill, which rises with such grandeur, is our cottage."

"It is a beautiful country," he replied; "but these bottoms must be dirty in winter."

"How can you think of dirt, with such objects before you?"

"Because," replied he, smiling, "among the rest of the objects before me, I see a very dirty lane."

"How strange!" said Marianne to herself as she walked on.

"Have you an agreeable neighbourhood here? Are the Middletons pleasant people?"

"No, not all," answered Marianne; "we could not be more unfortunately situated."

"Marianne," cried her sister, "how can you say so? How can you be so unjust? They are a very respectable family, Mr. Ferrars; and towards us have behaved in the friendliest manner. Have you forgot, Marianne, how many pleasant days we have owed to them?"

"No," said Marianne, in a low voice, "nor how many painful moments."

Elinor took no notice of this; and directing her attention to their visitor, endeavoured to support something like discourse with him, by talking of their present residence, its conveniences, &c. extorting from him occasional questions and remarks. His coldness and reserve mortified her severely; she was vexed and half angry; but resolving to regulate her behaviour to him by the past rather than the present, she avoided every appearance of resentment or displeasure, and treated him as she thought he ought to be treated from the family connection.

from Sense and Sensibility; and this

"Oh, I was afraid you would think me very bold---that you would regard me with contempt," Violet sighed, tremulously. "After my letter had gone, and I tried to think over what I had written more calmly, and to wonder how you would regard it, I was almost sorry that I had sent it."

"'Almost,' but not really sorry?" questioned Wallace, with a fond smile.

"No, for I had to tell you the truth, if I told you anything, and no one can be sorry for being strictly candid," she returned, "and," with a resolute uplifting of her pretty head, while she looked him straight in the eyes, "why should I not tell you just what was in my heart? Why does the world think that a woman must never speak, no matter if she ruins two lives by her silence? You told me that you loved me, although you did not ask me if I returned your affection; but I knew that my life would be ruined if I did not make you understand it. I do love you, Wallace, and I will not be ashamed because I have told you of it."

The young man was deeply moved by this frank, artless confession. He knew there was not a grain of indelicacy or boldness in it; it was simply a truthful expression of a pure and noble nature, the spontaneous outburst of a holy affection responding to the sacred love of his own heart, and the avowal aroused a profound reverence for an ingenuousness that was as rare as it was perfect.

He bent down and touched his lips to her silken hair.

"There is no occasion," he said, earnestly, "and you have changed all my life, my dear one, by adopting such a straightforward course. Still," he added, with a slight smile, "I did not come here intending to tell you just this, or with the hope that our interview would result in such open confessions."

"Did you not?" Violet asked, quickly, and darting a startling look at him.

from His Heart's Queen. Using the same technique, from Moby Dick

By the mainmast; Starbuck leaning against it.

My soul is more than matched; she's overmanned; and by a madman! Insufferable sting, that sanity should ground arms on such a field! But he drilled deep down, and blasted all my reason out of me! I think I see his impious end; but feel that I must help him to it. Will I, nill I, the ineffable thing has tied me to him; tows me with a cable I have no knife to cut. Horrible old man! Who's over him, he cries;---aye, he would be a democrat to all above; look, how he lords it over all below! Oh! I plainly see my miserable office,---to obey, rebelling; and worse yet, to hate with touch of pity! For in his eyes I read some lurid woe would shrivel me up, had I it. Yet is there hope. Time and tide flow wide. The hated whale has the round watery world to swim in, as the small gold-fish has its glassy globe. His heaven-insulting purpose, God may wedge aside. I would up heart, were it not like lead. But my whole clock's run down; my heart the all-controlling weight, I have no key to lift again.

[A burst of revelry from the forecastle.]

Oh, God! to sail with such a heathen crew that have small touch of human mothers in them! Whelped somewhere by the sharkish sea. The white whale is their demigorgon. Hark! the infernal orgies! that revelry is forward! mark the unfaltering silence aft! Methinks it pictures life. Foremost through the sparkling sea shoots on the gay, embattled, bantering bow, but only to drag dark Ahab after it, where he broods within his sternward cabin, builded over the dead water of the wake, and further on, hunted by its wolfish gurglings. The long howl thrills me through! Peace! ye revellers, and set the watch! Oh, life! 'tis in an hour like this, with soul beat down and held to knowledge,---as wild, untutored things are forced to feed---Oh, life! 'tis now that I do feel the latent horror in thee! but 'tis not me! that horror's out of me! and with the soft feeling of the human in me, yet will I try to fight ye, ye grim, phantom futures! Stand by me, hold me, bind me, O ye blessed influences!

And the first non-Herman Melville, non-Various text is Edgar Allan Poe's The Works of Edgar Allan Poe --- Volume 4:

But now a new horror presented itself, and one indeed sufficient to startle the strongest nerves. My eyes, from the cruel pressure of the machine, were absolutely starting from their sockets. While I was thinking how I should possibly manage without them, one actually tumbled out of my head, and, rolling down the steep side of the steeple, lodged in the rain gutter which ran along the eaves of the main building. The loss of the eye was not so much as the insolent air of independence and contempt with which it regarded me after it was out. There it lay in the gutter just under my nose, and the airs it gave itself would have been ridiculous had they not been disgusting. Such a winking and blinking were never before seen. This behavior on the part of my eye in the gutter was not only irritating on account of its manifest insolence and shameful ingratitude, but was also exceedingly inconvenient on account of the sympathy which always exists between two eyes of the same head, however far apart. I was forced, in a manner, to wink and to blink, whether I would or not, in exact concert with the scoundrelly thing that lay just under my nose. I was presently relieved, however, by the dropping out of the other eye. In falling it took the same direction (possibly a concerted plot) as its fellow. Both rolled out of the gutter together, and in truth I was very glad to get rid of them.

The bar was now four inches and a half deep in my neck, and there was only a little bit of skin to cut through. My sensations were those of entire happiness, for I felt that in a few minutes, at farthest, I should be relieved from my disagreeable situation. And in this expectation I was not at all deceived. At twenty-five minutes past five in the afternoon, precisely, the huge minute-hand had proceeded sufficiently far on its terrible revolution to sever the small remainder of my neck. I was not sorry to see the head which had occasioned me so much embarrassment at length make a final separation from my body. It first rolled down the side of the steeple, then lodge, for a few seconds, in the gutter, and then made its way, with a plunge, into the middle of the street.

Ok, that's gross. "A Predicament", I'm afraid; I'd never read that one before. But there we go! That was the goal! Success!

(Anyway, the problem with Various is that it is, in this particular case, an issue of Atlantic Monthly with no attributed articles. A collection by various authors, writing on different subjects, may well match some other text in toto, but it's not very likely to be valid. I'm not saying that an enthusiastic reader of Moby Dick wouldn't find something good in a given issue of Atlantic Monthly, but I don't want to say that they would, either. And the Page 63 thing isn't going to find it.)

In terms of non-fiction, here is The Descent of Man

I have remarked that sexual selection would be a simple affair if the males were considerably more numerous than the females. Hence I was led to investigate, as far as I could, the proportions between the two sexes of as many animals as possible; but the materials are scanty. I will here give only a brief abstract of the results, retaining the details for a supplementary discussion, so as not to interfere with the course of my argument. Domesticated animals alone afford the means of ascertaining the proportional numbers at birth; but no records have been specially kept for this purpose. By indirect means, however, I have collected a considerable body of statistics, from which it appears that with most of our domestic animals the sexes are nearly equal at birth. Thus 25,560 births of race- horses have been recorded during twenty-one years, and the male births were to the female births as 99.7 to 100. In greyhounds the inequality is greater than with any other animal, for out of 6878 births during twelve years, the male births were to the female as 110.1 to 100. It is, however, in some degree doubtful whether it is safe to infer that the proportion would be the same under natural conditions as under domestication; for slight and unknown differences in the conditions affect the proportion of the sexes. Thus with mankind, the male births in England are as 104.5, in Russia as 108.9, and with the Jews of Livonia as 120, to 100 female births. But I shall recur to this curious point of the excess of male births in the supplement to this chapter. At the Cape of Good Hope, however, male children of European extraction have been born during several years in the proportion of between 90 and 99 to 100 female children.

close to which comes Cactus Culture for Amateurs Being Descriptions of the Various Cactuses Grown in This Country, With Full and Practical Instructions for Their Successful Cultivation (hey, I'm really not making this stuff up).

C. speciosissimus (most beautiful). --Although not a night-flowering kind, nor yet a climber, yet this species resembles in habit the above rather than the columnar-stemmed ones. It is certainly the species best adapted for cultivation in small greenhouses or in the windows of dwelling-houses, as it grows quickly, remains healthy under ordinary treatment, is dwarf in habit, and flowers freely---characters which, along with the vivid colours and large size of the blossoms, render it of exceptional value as a garden plant. Its stems are slender, and it may be grown satisfactorily when treated as a wall plant. For its cultivation, the treatment advised for Phyllocactuses will be found suitable. When well grown and flowered it surpasses in brilliancy of colours almost every other plant known. Specimens with thirty stems each 6 ft. high, and bearing from sixty to eighty buds and flowers upon them at one time, may be grown by anyone possessing a warm greenhouse. The stems are three to five angled, spiny, the tufts of spines set in little disks of whitish wool. The flowers are as large as tea saucers, with tubes about 4 in. long, the colour being an intense crimson or violet, so intense and bright as to dazzle the eyes when looked at in bright sunlight. When cut and placed in water they will last three or four days. April and May. Mexico, 1820. "Numberless varieties have been raised from this Cereus, as it seeds freely and crosses readily with other species. Many years ago, Mr. D. Beaton raised scores of seedlings from crosses between this and C. flagelliformis, and has stated that he never found a barren seedling. Much attention was given to these plants about fifty years ago, for Sir E. Antrobus is said to have exhibited specimens with from 200 to 300 flowers each. 
I have been informed that an extremely large plant of this Cereus, producing hundreds of flowers every season, is grown on the back wall of a vinery at the Grange, Barnet, the residence of Sir Charles Nicholson, Bart." (L. Castle).

In point of fact, this choice of stylometric data is quite good at separating fiction from non-fiction, as well as (usually) finding texts authored in roughly the same time period.

Now that I can identify texts which read similarly (hey, they do to me, although I may be completely tone-deaf here), it would be nice to find texts which are related by something more solid.


As you can see by the Moby Dick to Edgar Allan Poe and Charles Darwin to cactus growing recommendations, style alone is likely to be a poor (or at least weird) overall choice. A more obvious recommendation would be based on something I think of as "topic".

A topic, as I think of the term in this context, is something combining elements of genre and setting, subject matter and background. As far as topics go, Sense and Sensibility and His Heart's Queen are English romances of a certain period; Moby Dick is a sea story with whales; Edgar Allan Poe is, well..., "A Predicament" is sort of a clock thing, or possibly an anatomical thing; The Descent of Man is science, natural history; and Cactus Culture for Amateurs similarly biological.

Defining a topic is hard, perhaps impossible. However, they're pretty easy to spot when you see them. For Ashurbanipal, I use the part-of-speech data to pick out common nouns from the text, count the nouns, and record the 200 most common nouns in each text. (This is, in fact, the screwdriver from my threaded-nail.)
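Picking the most common nouns is simple enough to sketch. The following assumes the nouns have already been pulled out of the tagged text and reduced to a base form; `TopNouns` and `mostCommon` are names invented for this example, and the alphabetical tie-break is my own assumption, since nothing above says how ties are handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class TopNouns {
    /** The `limit` most frequent nouns; ties are broken alphabetically. */
    static Set<String> mostCommon(List<String> nouns, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String noun : nouns) {
            counts.merge(noun, 1, Integer::sum);
        }
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> nouns = Arrays.asList("whale", "ship", "whale", "sea", "ship", "whale");
        System.out.println(mostCommon(nouns, 2));  // prints [whale, ship]
    }
}
```

For the real thing, `limit` would be 200.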

In order to avoid multiple entries for the same word, I use the Stanford POS tagger's edu.stanford.nlp.process.Morphology class to "lemmatize" the words, a process of, in effect, stemming the words with knowledge of their part of speech. This process should be able to tell the difference between the noun "meeting" and the verb "to meet".

Comparing texts is fairly easy; I use the Jaccard distance between the two sets of nouns to compute a number between zero and one representing the distance between the two sets. A smaller distance means the texts are closer together and therefore the candidate is a more likely recommendation.
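The Jaccard distance is one minus the size of the intersection of the two sets divided by the size of their union. A minimal sketch, with the class name `JaccardDistance` and the handling of two empty sets being assumptions of the example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class JaccardDistance {
    /**
     * 1 - |A intersect B| / |A union B|. Two empty sets are treated as
     * identical (distance 0), which is a choice made for this sketch.
     */
    static double distance(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<>(Arrays.asList("whale", "ship", "sea"));
        Set<String> second = new HashSet<>(Arrays.asList("ship", "sea", "island", "captain"));
        // Two shared nouns out of five distinct ones: 1 - 2/5.
        System.out.println(distance(first, second));
    }
}
```

Identical noun sets give a distance of zero and sets with nothing in common give one.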

Book | Recommendations
Sense and Sensibility | Pride and Prejudice, Emma, Persuasion, Mansfield Park, Northanger Abbey, and Maria Edgeworth's Tales and Novels
Moby Dick | Fighting the Whales by R.M. Ballantyne, Old Jack by William Henry Giles Kingston, and Great Sea Stories
The Descent of Man | The Origin of Species, The Variation of Animals and Plants Under Domestication, and Darwinism by Alfred Russel Wallace
A Tale of Two Cities | Little Dorrit, Barnaby Rudge, Bleak House, and Our Mutual Friend

Matthew Jockers' excellent Macroanalysis presents an alternative, algorithmic way of identifying topics. He, too, chooses to separate nouns, but then uses a topic modeling technique, Latent Dirichlet allocation, to categorize the nouns into weighted clusters. Further, he seems to have struck on the same overall approach of Euclidean distance, both in terms of stylometric measures and topic modeling categories, to relate texts by similarity. (Honestly, I did not discover Macroanalysis until I had most of Ashurbanipal written. I claim independent discovery.)


Having multiple recommendation techniques is nice, but combining them into a single, "best" recommendation would be most useful. Unfortunately, this is the part of the system that requires the most validation and I have had very little feedback. Currently, I am doing the simplest thing possible: I multiply the Euclidean distance in style by the Jaccard distance in topic; since the latter is always between 0 and 1, it serves to reduce the style distance proportionally to the topic distance.
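As a sketch of what that multiplication does to a ranking, consider the following. The `CombinedRanking` and `Candidate` names and the distances in `main` are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CombinedRanking {
    /** A candidate text with its two distances from the chosen text. */
    static final class Candidate {
        final String title;
        final double styleDistance;  // Euclidean, zero or more
        final double topicDistance;  // Jaccard, between 0 and 1

        Candidate(String title, double styleDistance, double topicDistance) {
            this.title = title;
            this.styleDistance = styleDistance;
            this.topicDistance = topicDistance;
        }

        /** Style distance scaled down by the topic distance. */
        double combined() {
            return styleDistance * topicDistance;
        }
    }

    /** Titles sorted by combined distance, best recommendation first. */
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Candidate::combined));
        List<String> titles = new ArrayList<>();
        for (Candidate c : sorted) {
            titles.add(c.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
            new Candidate("X", 0.02, 0.9),    // closest in style, far in topic
            new Candidate("Y", 0.05, 0.3),    // further in style, close in topic
            new Candidate("Z", 0.03, 0.95));
        System.out.println(rank(candidates));  // prints [Y, X, Z]
    }
}
```

One consequence worth noticing: a candidate with a topic distance of exactly zero gets a combined distance of zero no matter how different its style is, and a candidate with no nouns in common keeps its style distance unchanged.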

You pays your money and you takes your chances.

Ashurbanipal, the code

Way back at the top, I mentioned run-tag-todolist, which calls the Java program TagTodoList.java. This program computes the part-of-speech and noun count data that is used to make recommendations, given a to-do list of etext numbers, language, content type, and file location information. In TagTodoList.java, the list of things to process goes down and around and eventually winds up in a thread pool running a Callable class called TaggerCallable.java. TaggerCallable reads the text file for English books out of the .zip file in the DVD image. The Project Gutenberg licensing and advertising information are stripped out (mostly successfully) by code stolen and translated from Clemens Wolff, and then the text is broken into approximately 10kb chunks to prevent the Stanford POS tagger from blowing through the heap. The actual break is made between paragraphs, or at least on an empty line which should be a paragraph transition. Going back to TaggerCallable, each of the fragments is processed by the tagger and the results are accumulated for each text.
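The chunking step can be sketched as below. This is a guess at the general shape, not the code in the repository: `Chunker` and `chunk` are hypothetical names, the size is measured in characters instead of bytes, and a chunk is closed at the first paragraph break after it reaches the target size.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunker {
    /**
     * Split text at blank lines into chunks of roughly `target` characters.
     * A chunk is closed as soon as it reaches the target, so each chunk
     * ends on a paragraph boundary and may run somewhat over.
     */
    static List<String> chunk(String text, int target) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // A paragraph break is a newline, optional whitespace, and another
        // newline; the optional \r allows for CRLF line endings.
        for (String paragraph : text.split("\\r?\\n\\s*\\n")) {
            current.append(paragraph).append("\n\n");
            if (current.length() >= target) {
                chunks.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());  // whatever is left over
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "aaaa\n\nbbbb\n\ncccc";
        System.out.println(chunk(text, 10).size());  // prints 2
    }
}
```

Each chunk would then be handed to the tagger separately, with the counts summed per text.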

In case you're interested in such things, here is a skeleton of the code invoking the tagger:

     // Fields: built once and reused for every text.
     private final TokenizerFactory<CoreLabel> tokenizerFactory
         = PTBTokenizer.factory(new CoreLabelTokenFactory(), ...);  // remaining arguments elided
     private final MaxentTagger tagger
         = new MaxentTagger("english-left3words-distsim.tagger");
     private final Morphology morphology = new Morphology();

       // Per fragment: split into sentences, tag each one, and count.
       final DocumentPreprocessor documentPreprocessor
           = new DocumentPreprocessor(text);
       for (List<HasWord> sentence : documentPreprocessor) {
         for (TaggedWord word : tagger.tagSentence(sentence)) {
           // word count
           final String tag = word.tag();
           // singular and plural common nouns feed the topic data
           if ("NN".equals(tag) || "NNS".equals(tag)) {
             // get base form of word
             String lemma = morphology.stem(word).toString();
             if (lemma == null) {
               lemma = word.toString();
             }
           }
         }
       }

I assembled that mess by looking at the commands from the Stanford POS tagger distribution and cut-n-pasting things from their source. It looks like it works, anyway.

As far as the programs using the collected data go, I have already discussed much of them in Reimplementing ashurbanipal.web in Rust, where I walked through the process of converting the Java servlets into a Rust program in order to reduce their memory footprint (and improve their speed). Command-line versions of the recommendation programs are also to be found in the Ashurbanipal project.

The prototype

The way Ashurbanipal recommendations work begins with a book selected by the user, something similar to which he or she would like to read. To use the prototype, go to the page, find a book that you know you like (using the search field at the upper left), and get style-based, topic-based, and combined recommendations for books which are in some way similar to your choice.

Here's the current web site: http://dpg.crsr.net.

To find a book, enter an author's name, a title, or some likely subject term (or one or more words from any of those) into the text field. The lovely and talented server will provide a drop-down list of possibilities; select one.

The information about the text you have selected will appear to the right of the text box, while the recommendations will appear in three rows below. Each list of recommendations can be scrolled left or right via the arrows; left indicates a higher recommendation and right a lower. Initially, the left-most visible book is the highest recommendation and the book you have selected will be the only thing to appear if you scroll left.

Click on a recommendation's title to select it as the base book for more recommendations and on the Project Gutenberg link to go to the book's page at PG, where the book can be downloaded in a plethora of formats.

In the future, well, I have an idea for supporting plot recommendations and am actively looking for further attributes.



Monday, September 28, 2015

A New Thing

I believe I have discovered A New Thing.

I recently finished reading Who Murdered Chaucer: A Medieval Mystery by Terry Jones (yes, of Monty Python fame), et al.[1] and started reading Everything and More: A Compact History of Infinity by David Foster Wallace (with an introduction by Neal Stephenson). Let me tell you, it's been an interesting segue; I must apologize for the style of this post. (As an aside, I also took a quick break to read a new-to-me Silver John novel, The Voice of the Mountain, by Manly Wade Wellman. (Review: meh.) I am currently trying to figure out how to work "air" (meaning "any", "every"), "to shammock", or "to gop" into this post.)

Who Murdered Chaucer is quite a good book, throwing some understanding on the politics and religion of late 14th- and early 15th-century England, the reigns of Richard II and the usurper Henry IV, and the life of the poet and courtier Chaucer. It is also heavily partisan towards Richard II's party and a bit sketchy with the circumstantial evidence. (As another aside, though, if the Archbishop of Canterbury, Thomas Arundel, did not have Chaucer silenced and did not attempt to have Chaucer's work censored, he should have.)

However, when I mentioned Who Murdered Chaucer to my nemesis, Mittens, the conversation began like:

"...by Terry Jones. Yes, the Monty Python guy. It's pretty good."

"Did you read it in the voice of a Pepperpot?" replied Mittens.

"No, no, not at all.... Well, yes, parts of it."

So I kind of had this idea primed for me.

Anyway, after reading not more than a few pages of Everything and More, I came to a sudden and yet delayed, and very surprising, realization.

Now, Everything and More seems like it would be a perfect book for me: Non-fiction, essayish (a form I love), on a topic I enjoy and, unfortunately, know something about, by a writer celebrated for his cerebral writing, his vocabulary, and his focus on compassion, existentialism, and pretension. So far, it has actually been pretty good. Unfortunately, as I said, I know something about infinity, the formal grounding of mathematics, and Georg Cantor, so the hand-waving stands out.

(As another aside, one sentence early in Everything and More struck me as incredibly wrong: "Your author here ... is also someone who disliked and did poorly in every math course he ever took, save one, which wasn't even in college, but which was taught by one of those rare specialists who can make the abstract alive and urgent, and who actually really talks to you when he's lecturing, and of whom anything that's good about this booklet is a pale and well-meant imitation." Aside from the (repeated) use of the term "booklet" to describe a three-hundred-plus page trade paperback (can we slather on any more irony?), I find that the terminal "and" does not work at all. "[A]nything ... is a pale and well-meant imitation" is off; it destroys the structure of the elegant edifice Wallace has built in that footnote in much the same way that capping the Cathedral of Our Lady of Chartres with the marble pyramidion of the Washington Monument would (a) crush the construct (or would likely do so) down onto the labyrinth in its floor and (b) look ridiculous for the brief seconds it stood. That sentence really, really should end, "...and of whom anything that's good about this booklet is a pale but well-meant imitation.")

In any case, David Foster Wallace's prose should be read in the voice of an Englishman badly impersonating a middle-aged, middle-class, matronly Englishwoman. It's perfect: the high-pitched, tight-voiced, false squeal mixed with the amalgam of down-home, mid-American voice and prolix, erudite verbiage makes the whole experience a delight. Don't believe me? Check it for yourself: down a bit of pepperpot

and then read you some David Foster Wallace:

...There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says "Morning, boys. How's the water?" And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes "What the hell is water?"

This is a standard requirement of US commencement speeches, the deployment of didactic little parable-ish stories. The story ["thing"] turns out to be one of the better, less bullshitty conventions of the genre, but if you're worried that I plan to present myself here as the wise, older fish explaining what water is to you younger fish, please don't be. I am not the wise old fish. The point of the fish story is merely that the most obvious, important realities are often the ones that are hardest to see and talk about. Stated as an English sentence, of course, this is just a banal platitude, but the fact is that in the day to day trenches of adult existence, banal platitudes can have a life or death importance, or so I wish to suggest to you on this dry and lovely morning.


Here's another didactic little story. There are these two guys sitting together in a bar in the remote Alaskan wilderness. One of the guys is religious, the other is an atheist, and the two are arguing about the existence of God with that special intensity that comes after about the fourth beer. And the atheist says: "Look, it's not like I don't have actual reasons for not believing in God. It's not like I haven't ever experimented with the whole God and prayer thing. Just last month I got caught away from the camp in that terrible blizzard, and I was totally lost and I couldn't see a thing, and it was fifty below, and so I tried it: I fell to my knees in the snow and cried out 'Oh, God, if there is a God, I'm lost in this blizzard, and I'm gonna die if you don't help me.'" And now, in the bar, the religious guy looks at the atheist all puzzled. "Well then you must believe now," he says, "After all, here you are, alive." The atheist just rolls his eyes. "No, man, all that was was a couple Eskimos happened to come wandering by and showed me the way back to camp."


By way of example, let's say it's an average adult day, and you get up in the morning, go to your challenging, white-collar, college-graduate job, and you work hard for eight or ten hours, and at the end of the day you're tired and somewhat stressed and all you want is to go home and have a good supper and maybe unwind for an hour, and then hit the sack early because, of course, you have to get up the next day and do it all again. But then you remember there's no food at home. You haven't had time to shop this week because of your challenging job, and so now after work you have to get in your car and drive to the supermarket. It's the end of the work day and the traffic is apt to be: very bad. So getting to the store takes way longer than it should, and when you finally get there, the supermarket is very crowded, because of course it's the time of day when all the other people with jobs also try to squeeze in some grocery shopping. And the store is hideously lit and infused with soul-killing muzak or corporate pop and it's pretty much the last place you want to be but you can't just get in and quickly out; you have to wander all over the huge, over-lit store's confusing aisles to find the stuff you want and you have to maneuver your junky cart through all these other tired, hurried people with carts (et cetera, et cetera, cutting stuff out because this is a long ceremony) and eventually you get all your supper supplies, except now it turns out there aren't enough check-out lanes open even though it's the end-of-the-day rush. So the checkout line is incredibly long, which is stupid and infuriating. But you can't take your frustration out on the frantic lady working the register, who is overworked at a job whose daily tedium and meaninglessness surpasses the imagination of any of us here at a prestigious college.

But anyway, you finally get to the checkout line's front, and you pay for your food, and you get told to "Have a nice day" in a voice that is the absolute voice of death. Then you have to take your creepy, flimsy, plastic bags of groceries in your cart with the one crazy wheel that pulls maddeningly to the left, all the way out through the crowded, bumpy, littery parking lot, and then you have to drive all the way home through slow, heavy, SUV-intensive, rush-hour traffic, et cetera et cetera.

Everyone here has done this, of course. But it hasn't yet been part of you graduates' actual life routine, day after week after month after year.

But it will be. And many more dreary, annoying, seemingly meaningless routines besides. But that is not the point. The point is that petty, frustrating crap like this is exactly where the work of choosing is gonna come in. Because the traffic jams and crowded aisles and long checkout lines give me time to think, and if I don't make a conscious decision about how to think and what to pay attention to, I'm gonna be pissed and miserable every time I have to shop. Because my natural default setting is the certainty that situations like this are really all about me. About MY hungriness and MY fatigue and MY desire to just get home, and it's going to seem for all the world like everybody else is just in my way. And who are all these people in my way? And look at how repulsive most of them are, and how stupid and cow-like and dead-eyed and nonhuman they seem in the checkout line, or at how annoying and rude it is that people are talking loudly on cell phones in the middle of the line. And look at how deeply and personally unfair this is.

The thing is that, of course, there are totally different ways to think about these kinds of situations. In this traffic, all these vehicles stopped and idling in my way, it's not impossible that some of these people in SUV's have been in horrible auto accidents in the past, and now find driving so terrifying that their therapist has all but ordered them to get a huge, heavy SUV so they can feel safe enough to drive. Or that the Hummer that just cut me off is maybe being driven by a father whose little child is hurt or sick in the seat next to him, and he's trying to get this kid to the hospital, and he's in a bigger, more legitimate hurry than I am: it is actually I who am in HIS way.

Or I can choose to force myself to consider the likelihood that everyone else in the supermarket's checkout line is just as bored and frustrated as I am, and that some of these people probably have harder, more tedious and painful lives than I do.

Again, please don't think that I'm giving you moral advice, or that I'm saying you are supposed to think this way, or that anyone expects you to just automatically do it. Because it's hard. It takes will and effort, and if you are like me, some days you won't be able to do it, or you just flat out won't want to.

But most days, if you're aware enough to give yourself a choice, you can choose to look differently at this fat, dead-eyed, over-made-up lady who just screamed at her kid in the checkout line. Maybe she's not usually like this. Maybe she's been up three straight nights holding the hand of a husband who is dying of bone cancer. Or maybe this very lady is the low-wage clerk at the motor vehicle department, who just yesterday helped your spouse resolve a horrific, infuriating, red-tape problem through some small act of bureaucratic kindness. Of course, none of this is likely, but it's also not impossible. It just depends what you want to consider. If you're automatically sure that you know what reality is, and you are operating on your default setting, then you, like me, probably won't consider possibilities that aren't annoying and miserable. But if you really learn how to pay attention, then you will know there are other options. It will actually be within your power to experience a crowded, hot, slow, consumer-hell type situation as not only meaningful, but sacred, on fire with the same force that made the stars: love, fellowship, the mystical oneness of all things deep down.3

And so on, and so forth. I hope no one minds that I quoted Wallace extensively there, though I took only selected passages from the address.

If you have taken the opportunity to enjoy my experiment there, I hope you will be able to confirm the frisson of correctness, of key-fitting-into-lock rightness that I felt when I saw the relationship. I hesitate to use the term "epiphany", because it wasn't; more of a dawning realization as I continued to read Everything and More.

Perhaps the actual link here is existentialism; I chose the "Mrs. Premise and Mrs. Conclusion" skit deliberately, for its genuine coverage of Sartre, which corresponds brilliantly with the "This is Water" thing.

On the other hand, I suspect the existentialism connection is only second-order. Like the Emily Dickinson / The Yellow Rose of Texas thing, this convergence goes beyond subject matter and into fundamental questions of style.


1 If you see Al, tell him he owes me a fiver.2

2 And by this joke, I mean no disrespect towards Robert Yeager, Terry Dolan, Alan Fletcher and Juliette Dor, Jones' co-authors. They are all distinguished Chaucerian and medieval scholars and do not deserve to be overshadowed by the fame of their co-author.

3 Taken from a transcription of the 2005 Kenyon College Commencement Address, May 21, 2005.4

4 Thank you, Internet Archive Wayback Machine!

Saturday, August 22, 2015

Link o' the day: a couple of DSP links

You may have noticed a sudden appearance of Fourier transform stuff around here. It's one of the things I've been meaning to get around to playing with for a while, and when I saw Matt Jockers' Syuzhet stuff, I suddenly had a reason. Here's a couple of good links I've found so far:

The Scientist and Engineer's Guide to Digital Signal Processing, by Steven W. Smith, Ph.D. A hefty textbook on signal processing, convolution, Fourier transforms, and so forth, with "Very readable - low math - many examples". It's pretty easy to get bogged down in the math, especially if you're like me, and I know I am. This book looks at the topic from a discrete, programmable standpoint and a use, not derivation, standpoint. It's available online; I'd suggest the PDF versions of the chapters since the HTML version is missing for some chapters.

Think DSP, by Allen B. Downey. An introduction to digital signal processing using Python, by the person who brought you Think Complexity. Still in progress, I believe, but good reading.

Thursday, August 20, 2015

Link o' the day: The Programming Historian

Ok, so I know that The Programming Historian sounds like...is there an ultimate form of "oxymoron"? But it's not. There is a lot of good stuff there, and hopefully more coming.

Currently, there are introductions to computing, programming, and regular expressions (yeah, sigh), data management, classification, topic modelling and MALLET (!), GIS, and much more.

Wednesday, August 19, 2015

Syuzhet: Prodding the Frequency Domain

[Subliminal message: go to the real post!]

Following up on my previous excursion into R, I am going to take a closer look at A Portrait of the Artist as a Young Man, in both the time and frequency domains, in order to get a better handle on how the frequency domain of a book works.

Unfortunately, the margins of this blog do not provide enough room for the gigantic, monstrous thing. For that, you'll have to go to the Real Prodding the Frequency Domain page.