Exploring Syuzhet

Posted on August 8, 2015 by Tommy McGuire
Labels: books, digital humanities, R


[Subliminal message: go to the real post!]

Last year, Matthew Jockers published "A Novel Method for Detecting Plot", which caused a suitably flamboyant reaction, particularly when he wrote "Revealing Sentiment and Plot Arcs with the Syuzhet Package" and said that he "measured and compared 40,000+ plot shapes and then clustered the resulting data in order to reveal six common, perhaps archetypal, plot shapes". The post "The Rest of the Story" describes the six shapes.

His comments, and the package when he published it on GitHub, drew a significant amount of criticism on a number of levels, from the basic techniques it uses to produce raw data from texts to the techniques it uses to analyze that data. In fact, Matt seems to have thrown in the towel with "Requiem for a low pass filter", where he wrote, "I’m not entirely ready to concede that Fourier is useless for the larger problem, but [Ben Schmidt and Scott Enderle, presumably along with another major player in the conversation, Annie Swafford] have convinced me that a better solution than the low pass is possible and probably warranted."

I admit that I myself have exactly no standing to say so, but I think that the basic approach is sound and the specific techniques are good in general, even if some parts are a little sketchy. (And, hey, sketchy isn't all bad.)

So let me start with some disclaimers: I am not a digital signal processing guy. I'm not even a mathematician who looks like a DSP guy. I'm a computer programmer and all my math is discrete. I'm also a systems administrator, systems programmer and networking guy; I've never done any numerical work and I try to avoid floating point numbers if at all possible. But I've spent the last week or so reading up on DSP, so there's that, right?

But, given that, I wrote an R notebook discussion, more or less in depth and probably less than more coherent, of how Syuzhet works, or fails to, and what I currently think needs to be done to make it more better.

Unfortunately, the margins of this blog do not provide enough room for the gigantic, monstrous thing. For that, you'll have to go to the Real Exploring Syuzhet page.
