Welcome to The Valve

Past Valve Book Events

Theory's Empire
The Literary Wittgenstein
Graphs, Maps, Trees
How Novels Think
The Trouble With Diversity
What's Liberal About the Liberal Arts?
The Novel of Purpose

This work is licensed under a Creative Commons License.



Tuesday, June 01, 2010

From the CHE: Text-Mining and Data Digging as “The Humanities Go Google”

Posted by Rohan Maitzen on 06/01/10 at 08:55 AM

Data-diggers are gunning to debunk old claims based on “anecdotal” evidence and answer once-impossible questions about the evolution of ideas, language, and culture. Critics, meanwhile, worry that these stat-happy quants take the human out of the humanities. Novels aren’t commodities like bags of flour, they warn. Cranking words from deeply specific texts like grist through a mill is a recipe for lousy research, they say—and a potential disaster for the profession. . . .

The idea that animates [Franco Moretti’s] vision for pushing the field forward is “distant reading.” Mr. Moretti and Mr. Jockers say scholars should step back from scrutinizing individual texts to probe whole systems by counting, mapping, and graphing novels.

And not just famous ones. New insights can be gleaned by shining a spotlight into the “cellars of culture” beneath the small portion of works that are typically studied, Mr. Moretti believes.

He has pointed out that the 19th-century British heyday of Dickens and Austen, for example, saw the publication of perhaps 20,000 or 30,000 novels—the huge majority of which are never studied.

The problem with this “great unread” is that no human can sift through it all. “It just puts out of work most of the tools that we have developed in, what, 150 years of literary theory and criticism,” Mr. Moretti says. “We have to replace them with something else.”

Something else, to him, means methods from linguistics and statistical analysis. His Stanford team takes the Hardys and the Austens, the Thackerays and the Trollopes, and tosses their masterpieces into a database that contains hundreds of lesser novels. Then they cast giant digital nets into that megapot of words, trawling around like intelligence agents hunting for patterns in the chatter of terrorists.

(read the whole story here)


I couldn’t decide what I wanted to say about this story when I first posted it. I’m still not sure, but it’s something about the way this project, and the reactions to it, highlight the difference between approaching literature as symptomatic and approaching it as aesthetic experience. Of those 20,000 or 30,000 unread 19thC novels, how many offer much for those interested in the latter approach, for instance? And yet it is pretty obviously imperfect to proceed towards large generalizations about culture and/or society on the basis of the handful of books (OK, dozens) that are really well known from the 19thC.

By Rohan Maitzen on 06/01/10 at 11:02 AM | Permanent link to this comment

Has the filtering process selected for aesthetic value, do you think? It seems so, but there’s really too much unread to know for sure.

By Jonathan Goodwin on 06/01/10 at 11:15 AM | Permanent link to this comment

I agree that we can’t assume it has. But there’s no way to read through 30,000 volumes to check--assuming “text mining,” whatever else it does, can’t help you find the lost gems. What I had in mind more, though, was just the difference between slow reading and data digging as methods and what they suggest about the purpose of literary study--whichever texts we select.

By Rohan Maitzen on 06/01/10 at 11:23 AM | Permanent link to this comment

Also, the comment about “dullards” in that article is a bit much.

By Jonathan Goodwin on 06/01/10 at 11:29 AM | Permanent link to this comment

Well, this is one kind of issue if you think of literary studies as REALLY DEEPLY AND TRULY one thing. If that’s the case, then we’ve got a rather nasty conflict over what that one thing is and how to study it.

But maybe it’s not ONE THING. Maybe there’s more than one intellectually valid way of looking at literature and its history. In that case, the data miners aren’t necessarily in conflict with the close readers.

What I’m wondering is something like this: If the canonical texts really are somehow “superior to” or “more central” than the rest, can we show this with statistical methods? I’m imagining that for any given corpus of texts and for any text within that corpus, we can calculate the, shall we say, projection of the text into the corpus. I would expect the projections of canonical texts to be of a different statistical character than the projections of non-canonical texts. Just what that difference should be, gimme a break, I’m making this up as I go along. But that’s what I’d be gunning for if I were doing this work.

By Bill Benzon on 06/01/10 at 11:58 AM | Permanent link to this comment

I have a book entitled Algorithmic Aesthetics that was published in the late 70s, I think. People have been thinking about this (and using computational stylistics) for a long time. It’s just the size of the available corpus that has changed. (And probably what Moretti is doing is qualitatively different as well.)

By Jonathan Goodwin on 06/01/10 at 12:04 PM | Permanent link to this comment

The size of the corpus makes a BIG difference. It’s one thing to look for the “signature” of excellence text by text. What I have in mind is quite different. I want to look at the relationships that obtain among all the texts and see if some aren’t more “central” than others.

By Bill Benzon on 06/01/10 at 12:08 PM | Permanent link to this comment
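
One way to make the “central” idea in the comment above concrete, purely as a sketch: treat each text as a bag of word counts and score it by its average cosine similarity to every other text in the corpus. Nothing here is Moretti’s or Benzon’s actual method; the tokenizer, the similarity measure, the function names, and the three-sentence toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_counts(text):
    """Lowercased whitespace-token counts; a deliberately crude tokenizer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centrality(corpus):
    """Mean similarity of each text to every other text in the corpus."""
    vectors = {title: word_counts(text) for title, text in corpus.items()}
    scores = {}
    for title, vec in vectors.items():
        others = [cosine(vec, v) for t, v in vectors.items() if t != title]
        scores[title] = sum(others) / len(others) if others else 0.0
    return scores


# Hypothetical toy corpus; real work would load thousands of full novels.
toy_corpus = {
    "A": "the duke loved the lady",
    "B": "the lady loved the duke dearly",
    "C": "steam engines hauled coal north",
}
scores = centrality(toy_corpus)
```

On this toy input, A and B score equally and C scores zero because it shares no words with the others. Whether canonical novels would score differently from forgotten ones on a real corpus is exactly the open question.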

We said quite a bit about this in the Moretti event linked to at the left.  As usual, those bygone discussions are “dead” even though they’re right there.

The article strikes me as the typical kind of thing you get with science journalism.  In humanities journalism, you most often get the mocking article.  In science journalism, it’s the gosh-wow everything-is-becoming-different article, whether the science is really new or not.  It’s not really the fault of the people interviewed, who have typically been subtly misquoted in any event.  But it’s impossible for a science reporter to write an article that does not go gosh-wow, both because that’s the only way they get them published, and because reporters are generally the dimmest group of literate people, as a class, that I’ve ever encountered.

The various issues mentioned don’t seem like that large a deal to me, other than their ongoing thumbsucker value.  People who study literary quality and aesthetics are going to study individual works.  People who study mass trends in literature are going to study a database.  It’s gradually going to get easier and easier to do the second, so there’s going to be a gradual push-and-shove over turf within the academic departments that study literature, but the two kinds of questions aren’t really that connected.

By on 06/01/10 at 04:15 PM | Permanent link to this comment

the Moretti event

My bad, Rich; thanks for pointing this connection out. I’d seen that graphic many times but the event predated my relationship with the Valve and it didn’t even come to mind when I saw this CHE story. There’s something interesting, perhaps, in this “news” story being sort of 4 years old (at least)--that event was in 2006. Of course, it may yet be new to other people now, as it was, basically, to me.

By Rohan Maitzen on 06/01/10 at 04:55 PM | Permanent link to this comment

What Rich said. There’s no particular reason for you to have known or been curious about the Moretti event, Rohan.

And, as Jonathan Goodwin has noted, this kind of thing has been going on for a long time. There’s a journal called Computers and the Humanities—I assume it still exists, though I don’t really know—that’s been publishing at least since the mid-70s (when I coauthored a review-article on computational linguistics and the humanities), more likely since some time in the 60s. And Stanley Fish critiqued some of this work in some of his 70s essays on stylistics (collected in Is There a Text in This Class?). So, the person who wrote the article didn’t do their homework.

By Bill Benzon on 06/01/10 at 06:35 PM | Permanent link to this comment

Wow.  I mean, I wish Moretti and Jockers all the best, but the article gave me the sense that they really have no idea how they are going to use these databases and algorithms.  The one actual research question they raise—how did the 19th century novel shift from an abstract teaching form to a concrete and dramatic form—seems already weird and misguided.  And the way they hope to track it—by looking at the frequency of abstract words like “loyalty”—seems ignorant at best, moronic at worst.

Take two sentences:

“Sacrifice your life to him to whom you’ve tied your life.”


“I questioned Johnson’s loyalty.  His honor was at stake.”

The former sentence is abstract and didactic.  The latter sentence is dramatic and concrete.

Abstract diction vs. concrete diction is different than “saying versus showing,” which is what the researchers really seem to be addressing.

Ugh.  These questions should be raised by historians and sociologists, not by literature scholars.  The fact that they are *not* raised by sociologists and historians suggests that they aren’t terribly important.

By on 06/01/10 at 07:30 PM | Permanent link to this comment
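
The objection in the comment above can be checked mechanically. The sketch below counts what fraction of each of the two example sentences consists of “abstract” words. The word list is hypothetical (the article names only “loyalty”; “honor” is added here for illustration), and the function name and tokenizer are assumptions, not anything the Stanford team is known to use.

```python
import re

# Hypothetical word list for illustration; the article names only "loyalty".
ABSTRACT_WORDS = {"loyalty", "honor"}


def abstract_word_rate(text, abstract_words=ABSTRACT_WORDS):
    """Fraction of tokens in `text` that appear in `abstract_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in abstract_words) / len(tokens)


# The two example sentences from the comment above.
didactic = "Sacrifice your life to him to whom you've tied your life."
dramatic = "I questioned Johnson's loyalty. His honor was at stake."

didactic_rate = abstract_word_rate(didactic)
dramatic_rate = abstract_word_rate(dramatic)
```

With this word list the didactic sentence scores zero and the dramatic one scores higher, which is the commenter’s point: a diction count alone would rank them the wrong way round.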

Not all of it is the fault of the journalists—let’s not forget that Franco Moretti and Matt Jockers are themselves presenting this shift as a revolution, a radical departure from the way things have been done in the past. 

It’s not, of course: literary historians (not to mention good old plain historians, who know that they are more likely to get usable data from less “interesting” literary texts) have been going beyond the “classics” for quite some time.

As so often, I’m with Bill Benzon on this one: what has changed is the set of tools; what remains is the choice between aesthetics-focused projects and mass-trend-focused projects.

Both are extremely valuable when well done, and I look forward to seeing some interesting results from the new databases.  Here’s a really great one, indicating—for all the constructivists out there—that Romantic love is a literary universal.  http://muse.jhu.edu/journals/philosophy_and_literature/v030/30.2gottschall.html

My concerns are only (1) the rhetoric of revolution (2) the studies that return less than exciting results ("hey, titles got shorter!") (3) the occasional mishandling of evidence (as with the clues in Sherlock Holmes).

By Joshua Landy on 06/01/10 at 08:56 PM | Permanent link to this comment

Yeah, the rhetoric of revolution’s got to go, even if there’s a revolution at hand. It’s been totally devalued by the previous generation of “revolutionaries.”

By Bill Benzon on 06/01/10 at 09:30 PM | Permanent link to this comment

To Luther Blissett [whoever this person actually is]: you think Jockers and I are “ignorant at best, moronic at worst”? We’ve published plenty of work with our signature at the end: read it, and then say whatever you wish. But read our work, not someone else’s.
To Joshua Landy, who is a colleague, so to speak, at Stanford: you want to criticize what I have written on titles and other topics? Why didn’t you show up the many times I have discussed them in public at Stanford? Let me suggest a reason: because you could never get away with caricature ["hey, titles got shorter”!] and slander [mishandling of evidence].

By on 06/02/10 at 12:08 AM | Permanent link to this comment

Prof. Moretti—I did not label you or Prof. Jockers that way.  I labeled the attempt to trace developments in the novel by doing word searches as ignorant and/or moronic.  If the journalist misrepresented that research plan, I apologize.  However, I think that you’ll need to do more than look at diction to prove large claims about if, how, and when the novel shifted in the way the journalist claims you claim it shifted. 

Which is to say that a million half-pennies of evidence cannot prove an argument any better than fifty if that evidence is, by its very nature, not sufficient evidence.  I presented a simplistic thought experiment to get at this problem above.  Another: let’s say that from 1846 to 1881, the word “loyalty” was used in geometrically increasing numbers.  Does that mean that the theme of loyalty suddenly became more important?  I doubt it.  Nor does it suggest that the novel before 1846 was somehow more dramatic and less abstract and didactic about loyalty. 

Which is to say: any large claim of any import about literature will always require close reading.  If that means one must closely read 30,000 novels to establish the validity of that claim, then so be it.  But I wonder what sorts of claims about literature as literature require such amounts of data.  It seems to me that the sorts of claims being investigated are in the historian’s or sociologist’s terrain.  And if literature is merely a subset of history or sociology, then let’s simply disband the English departments and move the now-all-too-limited funding to those departments whose questions are so much more pressing.

In the end, though, I’d rather read a sentence of Dickens’ than one of any of his unfortunately forgotten English contemporaries.  And I’m quite stubborn in my belief that explaining why or arguing if that is the case is the business of English departments (and teaching students how to write in such powerful ways is the other business of English departments).

By on 06/02/10 at 02:01 AM | Permanent link to this comment

Well, Luther, a lot depends on what claims you consider to be “large” and significant, doesn’t it? Consider the following article:

Monika Fludernik, The Diachronization of Narratology. Narrative, Vol 11, No. 3, October 2003.

After some preliminary throat-clearing, Fludernik gets around to a particular investigation, how the writer negotiates a scene shift from one location to another. She looks at 50 British narratives from late medieval to early 20th century. One of the things she finds is that, early on, the scene shift was explicitly signaled (now we’re moving from X to Y) and that such signalling disappeared over time and, with the novel, scene shifts correspond to chapter shifts and have no explicit signalling. It’s simply done. Here’s the opening of her summary statement:

As a consequence of changes in narrative structure, the scene shift that was an important functional element in Middle English narrative became downgraded to a supernumerary element occurring at chapter beginnings. The originally highly important scene shift was connected to the oral delivery of episodic narrative and depended on the overt manipulations of the narrator qua bard. The metanarrative quality of the scene shifts, although salient from the perspective of the twentieth-century novel, was far less salient in Middle English narrative, since such narrative included much more striking abstract and coda sections in which the narrator played a crucial role. In the development of prose narrative, the figure of the narrator, still prominent in the Renaissance and of special importance in Fielding’s work, increasingly lost his position of privilege. At the same time the structural patterns of drama acquired a hold over narrative production and started to influence scene shifts. Thus, in most novels the narrative moves from one scene to the other, rarely changing the setting within a chapter. In this manner the novel, despite its diegetic surface structure, comes to obey the deep-structural patterns of drama, in which scene changes occur either when a new setting is introduced or when a new set of protagonists appear on stage. The introduction of lengthy dialogue scenes and, later, of consciousness scenes in the novel underlines this development (Fludernik, “Natural” Narratology ch. 4).

Now, I regard that as large and interesting. Fludernik, of course, has done close reading of her 50 texts. But we don’t know whether or not her sample is biased. What I’d really like to do is check her results in all relevant texts. That might well be doable with current technology, though we’d need to get all the texts together in a single digital archive.

From my POV the single most important result would be the demonstration of a systematic change in scene shift over time. That would suggest that there is a direction to literary history, that literary history is not just a stack of more or less disconnected moments in time (see this recent post of mine on Time’s Arrow in Literary Space). If that’s the case, then the historicists have to move beyond encapsulating their texts in time-stamped cases of amber and figure out what’s “driving” that directionality. Why does literature change, and in a specific direction?

And now we’re in territory I’m exploring under the rubric of cultural evolution (e.g. here, here, and here).

By Bill Benzon on 06/02/10 at 09:09 AM | Permanent link to this comment

In defence of Luther, it’s worth noting that the interesting studies being done on the basis of digital corpora command our attention by virtue of the questions they pose.  For example, the case Bill cites, about scene-changes, is compelling because scene-changes are interesting.  (What a great topic, by the way!)

But these questions would simply be unimaginable were it not for the existence of close reading.  All interesting cases of “distant reading” are, in fact, ultimately parasitic on traditions of attention to detail (narratological, formalist, new critical, or what-have-you).  And so it’s a great shame that the recent re-advocacy of “distant reading” (which has always been a good idea) has been accompanied by a needless and counterproductive polemic against close reading.  If we kill close reading, we will, in the long run, kill distant reading too.

By Joshua Landy on 06/03/10 at 03:27 AM | Permanent link to this comment

I don’t see how you can do close reading without looking at a wider context. Or all you end up with is something rather distant from the text.

I hit this problem yesterday with a 17th century text by Walter Hamond, “A paradox Prooving that the inhabitants of the isle called Madagascar, or St. Laurence, (in temporall things) are the happiest people in the world”.

Louis Wright suggests that Hamond owes no particular debt to literary tradition; he may have picked up a few literary allusions, but Wright views the text as dependent on Hamond “turning his imagination loose”.

Hamond certainly has turned his imagination loose, playing with a number of well-worn themes that would be highly familiar in the 17th century but not the 20th century. As it’s a work of propaganda, he gives a non-traditional spin on traditional themes in places to suit his agenda and to reassure an audience familiar with such themes, and his work is certainly in keeping with classical tradition stemming from the Alexandrian school with regard to the nobility of primitive people.

The text is certainly original but it has a clear historical context and past life, which is down-played in the article. Wright’s concern is to discuss it as an early example of the popular mid 18th century theme of man in a state of nature.

I don’t think you can draw historical horizons so tightly. It’s certainly a product of its time but it has a relationship with history.

By on 06/03/10 at 11:45 AM | Permanent link to this comment

Can you lit-crit types ever figure out the difference between a “topos” and a “trope”?  Just sayin’.

By on 06/04/10 at 11:31 PM | Permanent link to this comment

Jim, I think your comment would have made more sense over on the tvtropes thread.  And yeah, you’re right: a trope is any twist of the language and usually today synonymous with any figure of speech.  A topos is a rhetorical commonplace: definition, cause and effect, before and after, process analysis, etc.

By on 06/05/10 at 11:17 AM | Permanent link to this comment
