Welcome to The Valve


This work is licensed under a Creative Commons License.

 



Monday, July 18, 2011

HD7: Digital Humanities Sandbox Goes to the Congo

Posted by Bill Benzon on 07/18/11 at 10:38 AM

Or, Speculations in Computational Evolutionary Psychology

Note: This version of the post has been revised from an earlier version in which I suggested that the distribution in the first chart followed a power law. Cosma Shalizi checked it for me and it’s not a power law distribution. It’s an exponential distribution.

So, I’ve been exploring Conrad’s Heart of Darkness. In the last two posts I’ve examined one paragraph in the text, the so-called nexus. It’s the longest paragraph in the text, it’s structurally central, and it covers a lot of semantic territory.

OK, but what about the other paragraphs?

What about them?

Aren’t you going to look at them?

Well, yeah, but I sure don’t have time to troll through them like I did the nexus. I mean, that post stretched from here to Sunday.

I get your point. Why don’t you do the Moretti thing?

Moretti thing?

You know, distant reading.

Distant reading? You mean count something? Count what?

How about paragraph length?

What’ll that get me?

I don’t know. Just do it. I mean, you already know that the nexus is the longest paragraph in the text. There must be something going on with that. Mess around and see if something turns up.


* * * * *

I did and it did.

I used the MS Word word-count tool to count the words in every paragraph in the text. All 198 of them. One at a time. Real tedious stuff. Then I loaded the results into a spreadsheet and created a bar chart showing paragraph length from longest to shortest:


[Chart: HD whole ordered 2. Words per paragraph for all 198 paragraphs, ordered from longest to shortest.]


Whoa!

I didn’t have any hunches about the shape of this distribution before I began this work, much less any actual knowledge about paragraph length in, say, late 19th and early 20th century British fiction, or in any other population of texts for that matter. But I don’t think I’d have guessed a distribution like that. Given that 200 words is a pretty long paragraph – at least it is these days, though I’ve read lots of long paragraphs in 19th century novels – I’d probably have guessed a much flatter distribution with the maximum somewhere above 200 but less than, say, 300. And I’d still make a guess like that for most books. Even for books where paragraphs of 500, 600 words were not uncommon, I’d guess that there’d be a bunch of paragraphs close to one another at the upper end and a pretty slow drop-off to the lower end. Instead we have one paragraph that’s distinctly longer than its nearest neighbors, 1502 vs. 1129 and 1103, and a pretty quick drop to below 400.
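Counting every paragraph by hand in MS Word is the tedious part, and it can be scripted. Below is a minimal Python sketch. It assumes the text has been saved as plain text with paragraphs separated by blank lines, and it treats any whitespace-delimited token as a word. Word's own counter may handle hyphens and dashes differently, so the numbers need not match the ones charted here exactly. The function names are hypothetical.

```python
import re

def paragraph_lengths(text: str) -> list[int]:
    """Word count of each paragraph, in text order.

    Assumption: paragraphs are separated by one or more blank lines,
    and any whitespace-delimited token counts as a word.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [len(p.split()) for p in paragraphs if p.strip()]

def ordered_longest_first(lengths: list[int]) -> list[int]:
    """The same counts sorted from longest to shortest, as in the first chart."""
    return sorted(lengths, reverse=True)

if __name__ == "__main__":
    sample = "One two three.\n\nFour five.\n\n\nSix seven eight nine."
    lengths = paragraph_lengths(sample)
    print(lengths)                         # [3, 2, 4]
    print(ordered_longest_first(lengths))  # [4, 3, 2]
```

With the real text loaded (for instance via `open(path).read()`), the sorted list can go straight into a spreadsheet or plotting library to reproduce a longest-to-shortest bar chart.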

I’d like to know two things: 1) what’s typical about the distribution of paragraph length, if indeed anything is typical; and 2) in this text, what does that distribution imply about the mind? What’s the mind doing inside a paragraph in Heart of Darkness that’s different from what it’s doing between one paragraph and the next?


* * * * *

We know that the nexus is way over there to the left of the distribution. And we know, at least qualitatively, what’s going on inside it. For the most part, it’s about Kurtz, from his Intended to his bald head, his ivory, his pan-European background, his brilliant ideas, and his batshit craziness. Let’s say that what happens inside a paragraph is a certain kind of integration, whatever that means. A certain kind.

So, what happens between paragraphs?

Well, let’s see. If we’re integrating within a paragraph, what do we do when we’re done? Integration means something like gathering together and wrapping it up in a package. What do we do with the package?

Pass it on? Tell it to someone else, like in a conversation.

Hmmmm . . . I don’t know whether that’s going to get us anything. But let’s give it a try.

Here’s another chart:


[Chart: HD sec3. Words per paragraph in the third (final) section, in text order.]


It looks very different from the first one. It’s very, well, spiky. What is it?

It’s paragraph lengths in the third section of the text, the last one.

So the long paragraphs aren’t all bunched together, are they? They’re spread out.

Yes.

Whatever’s going on inside a really long paragraph only has to get done every once in a while.

Seems that way.

What’s going on there at the end, with all those short paragraphs?

That’s the conversation between Marlow and the Intended.

Oh, he said, she said. Each paragraph is one person speaking.

Yes. And one person passes it on to the other from one paragraph to the next, like you said.

Turn taking.
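The final conversation shows up in the chart as a run of consecutive short paragraphs, one speaker per paragraph. One crude way to find stretches like that from the lengths alone is to look for the longest run of paragraphs below some cutoff. This is only a sketch. The 40-word cutoff and the function name are arbitrary assumptions, not values taken from the charts.

```python
def longest_short_run(lengths: list[int], cutoff: int = 40) -> tuple[int, int]:
    """Return (start, end), end exclusive, of the longest run of consecutive
    paragraphs shorter than `cutoff` words; (0, 0) if there is none.

    Hypothetical helper: the default cutoff of 40 words is an arbitrary guess.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, n in enumerate(lengths):
        if n < cutoff:
            if run_start is None:
                run_start = i  # a new run of short paragraphs begins here
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None   # a long paragraph breaks the run
    return best_start, best_end

if __name__ == "__main__":
    toy = [300, 12, 8, 500, 20, 15, 9, 30]  # made-up lengths, not Conrad's
    print(longest_short_run(toy))  # (4, 8)
```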

Hmmmm . . . But you know, this isn’t a REAL conversation. It’s imaginary. It’s all going on inside one person.

Joseph Conrad.

Well, yeah, he wrote the text. But, you know, like the man said, Mistah Conrad—he dead. Now, it’s you and me. We’re the heads in which this conversation’s taking place.

So, those imaginary people, they’re like ‘places’ in the mind, like registers in a computer? And we’re passing information from one to the other.

Something like that.

Well, you know those evolutionary psychologists keep arguing

Don’t they ever!

... keep arguing that the last spurt of brain evolution

The Big One!

Yeah, the big one. It was about social communication. We grew this brain so we could manage our social life.

So, you’re saying that, like, we’ve got this big computational space, let’s call it, this computational space just for dealing with one another. And that’s what’s going on in this text, computing in that space, and, like, different people are different registers in that space.

Something like that, something like that.

And when we’re looking at these paragraph lengths, we’re looking at traces—maybe even traces in the Derridean sense of the word

No no no no no! Don’t you dare, don’t you dare mix deconstruction with evolutionary psychology. I forbid it. I forbid it. Can’t happen! Don’t do it! The horror! the horror!

... we’re looking at traces of those computations. The barest traces. Just hints.

Except that we CAN look at what’s going on in those paragraphs, each one of them, and analyze it.

But what about Moretti, distant reading? You’re the one who brought it up, remember?

Oh, yeah. But, well, aren’t we going to somehow find our way back to the text? I mean, distance is distance, but . . .

I suppose. But now that you’ve got me on the Moretti kick, I’d like to run with it for a while.

OK, so how about another chart and then let’s call it quits.

I’m tired.

OK, I’m getting a little fried too. Here’s another chart. It’s the whole text, all the paragraphs, from beginning to end in order.


[Chart: HD whole. Words per paragraph for the whole text, in text order.]

Spiky, spiky, spiky!

Yep, there’s the nexus, right there in the middle. And the final conversation off there to the right. And the distribution before the nexus seems to be a bit different from the one after.
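One way to put a number on the impression that the stretch before the nexus differs from the stretch after it is to summarize each side separately. A sketch, assuming the nexus is simply the longest paragraph and that count, mean, and median are adequate summaries. It does not test whether any difference is more than chance. The helper names are hypothetical.

```python
from statistics import mean, median

def summarize(xs: list[int]) -> dict:
    """Count, mean, and median of some paragraph lengths (safe on an empty list)."""
    if not xs:
        return {"count": 0, "mean": None, "median": None}
    return {"count": len(xs), "mean": mean(xs), "median": median(xs)}

def before_and_after_peak(lengths: list[int]) -> tuple[int, dict, dict]:
    """Locate the longest paragraph (assumed to stand in for the nexus) and
    summarize the paragraphs before and after it; the peak itself is left out."""
    peak = max(range(len(lengths)), key=lambda i: lengths[i])
    return peak, summarize(lengths[:peak]), summarize(lengths[peak + 1:])

if __name__ == "__main__":
    toy = [100, 300, 50, 1500, 20, 40, 60]  # made-up lengths, not Conrad's
    peak, before, after = before_and_after_peak(toy)
    print(peak, before, after)
```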

I wonder what’s going on in those other two mega-paragraphs, the other ones above 1000 words?

Well, notice that they’re pretty close to the nexus, one before, the other after. They’re not, like, way on the outside. The whole thing looks a bit like a pyramid.

But what’s going on INSIDE them?

The first one’s about Marlow’s crew on the boat. And the other one, the one after the nexus, that’s about the Russian and his relationship to Kurtz. But we’re getting too close, too close for distant reading.

Yeah, but don’t we have to sooner or later?

Maybe later, but not now.

OK, not now.
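The "pyramid" impression in the dialogue above can be checked by expressing each long paragraph's position as a fraction of the way through the text. That impression is that the three paragraphs over 1000 words sit near the middle. A sketch: k = 3 matches those three paragraphs, and the function name is hypothetical.

```python
def top_k_positions(lengths: list[int], k: int = 3) -> list[tuple[int, float]]:
    """For the k longest paragraphs, return (index, relative position) pairs in
    text order; relative position runs from 0.0 (first paragraph) to 1.0 (last)."""
    n = len(lengths)
    top = sorted(range(n), key=lambda i: lengths[i], reverse=True)[:k]
    return [(i, i / (n - 1) if n > 1 else 0.0) for i in sorted(top)]

if __name__ == "__main__":
    toy = [10, 20, 1100, 30, 1500, 40, 1120, 10, 5]  # made-up lengths
    print(top_k_positions(toy))  # [(2, 0.25), (4, 0.5), (6, 0.75)]
```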

Appendix: Authorial Intention

That’s a standard issue in literary criticism: What did the author intend? For some critics it’s crucial. For others, it’s beside the point.

Whatever.

But, did Conrad intend for the distribution of paragraph length in Heart of Darkness to be exponential? If you mean conscious intention, that seems very unlikely. But those paragraphs were surely written to be as Conrad wanted them. Whatever it was that he was consciously intending, it left an unconscious trace in the text in the form of a most interesting distribution of paragraph lengths.
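Anyone who wants to test the exponential shape on their own counts could use a standard approach in two steps. First, fit the single rate parameter by maximum likelihood; for an exponential, that is one over the sample mean. Second, measure how far the empirical cumulative distribution strays from the fitted one, which is the Kolmogorov-Smirnov distance. The sketch below does just that. It is not necessarily how the check mentioned in the note at the top of this post was made. Because the rate is estimated from the same data, the standard KS p-values would be overly conservative here. A fitted exponential also implies a median of ln 2 (about 0.69) times the mean, which makes a quick sanity check. The function names are hypothetical.

```python
import math

def fit_exponential_rate(lengths: list[float]) -> float:
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    return len(lengths) / sum(lengths)

def ks_distance(lengths: list[float], rate: float) -> float:
    """Largest gap between the empirical CDF of `lengths` and the
    exponential CDF F(x) = 1 - exp(-rate * x)."""
    xs = sorted(lengths)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fitted = 1.0 - math.exp(-rate * x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - fitted, fitted - i / n)
    return d

def predicted_median(rate: float) -> float:
    """Median implied by a fitted exponential: ln(2) / rate."""
    return math.log(2) / rate
```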

Update: There’s an interesting conversation about this at Language Log. Nostromo has a similar distribution.

Update 2: Mark Liberman has posted a distribution of paragraph lengths by length for Nostromo and it’s exponential. He’s also done a distribution by order in the text, and that appears to be pretty random. In particular the longest paragraphs are not toward the center. One is toward the beginning, the other toward the end. So, whatever’s “driving” the length distribution seems to be independent of position. (For these two texts.)
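Update 2's point is that position and length look unrelated in Nostromo. That can be framed as a permutation test: keep the paragraph lengths, shuffle their order many times, and see how often chance alone puts the longest paragraphs as close to the center as the real text does. This is a sketch under stated assumptions, not the analysis Mark Liberman posted. "Close to the center" is measured by the hypothetical `center_distance` statistic below, with k = 3 as one choice among many.

```python
import random

def center_distance(lengths: list[int], k: int = 3) -> float:
    """Mean distance from the middle of the text (0.0 = dead center, 0.5 = an end)
    of the k longest paragraphs. Assumes at least two paragraphs."""
    n = len(lengths)
    top = sorted(range(n), key=lambda i: lengths[i], reverse=True)[:k]
    return sum(abs(i / (n - 1) - 0.5) for i in top) / len(top)

def permutation_p_value(lengths: list[int], k: int = 3,
                        trials: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test: how often does a random reordering of the
    same lengths put the k longest at least as close to the center as the
    actual order does? Small values suggest a real central clustering."""
    rng = random.Random(seed)
    observed = center_distance(lengths, k)
    shuffled = list(lengths)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if center_distance(shuffled, k) <= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never zero
```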


* * * * *

Earlier posts in this series:


Comments

Wonder how well that rather normal looking distribution of paragraph lengths fits all of Conrad’s other novels…

By on 07/18/11 at 11:11 AM

Also, just in terms of differences between disciplinary methods and interpretive foci, I wonder just how far away from the mean the nexus is. Is it a statistical outlier, a data point that would be discarded in, say, a behavioral study, but is not here precisely because it’s your starting point: what’s unique and particular?

By on 07/18/11 at 11:23 AM

1) Which “rather normal looking distribution of paragraph lengths” are you talking about? I don’t see one. There’s nothing “normal” in a statistical sense. And I don’t see a distribution that strikes me as being typical.

2) Mean = 193, Median = 162.

By Bill Benzon on 07/18/11 at 12:08 PM

1) last graph

By on 07/18/11 at 01:07 PM

That is one of the weirdest headlines I’ve ever seen.

By Shelley on 07/18/11 at 07:31 PM

@Jim: The last graph is by no means “rather normal looking” - it is, in essence, a time series. It’s not a distribution at all.

By on 07/18/11 at 11:20 PM

could it be a distribution of words per paragraph over time?

By on 07/19/11 at 08:56 AM

That depends on exactly what you mean by “over time.” In the last graph I’ve listed the paragraphs in order from left to right. So, yes, time moves left to right. However . . .

Distance along the graph from left to right is not directly proportional to time. Let’s assume, for the moment, that someone reads words at a constant rate throughout the text. Let’s assume, say, only 150 words a minute. That tallest bar, corresponding to the longest paragraph, is 1500 words, so that’s 10 minutes of reading time. The last 35 paragraphs are about 1700 words, or 11 minutes of reading time.

By Bill Benzon on 07/19/11 at 09:20 AM

Very interesting analysis. Writers do tend to favor a somewhat normalized paragraph length. This does vary, but I find it varies only moderately, except for about one paragraph in five. Your data seems to indicate this as well.

By on 09/17/11 at 03:34 PM
