- Franco Moretti
- Matthew L. Jockers
- Ted Underwood
Each of these scholars demonstrates how distant reading can produce concrete literary knowledge using computational tools.
Let us examine their research in detail.
1. Franco Moretti: Theoretical Foundations and Early Experiments
The foundational work of distant reading appears in Moretti’s books:
- Graphs, Maps, Trees
- Distant Reading
Moretti’s research project is not simply computational; it is a new model of literary history.
He argued that although literary scholars have traditionally studied individual masterpieces, literary history actually consists of large-scale systemic patterns.
His solution was to develop three analytical tools:
- graphs
- maps
- trees.
Each corresponds to a different dimension of literary analysis.
Graphs: The Rise and Fall of Genres
Moretti used graphs to study the life cycles of literary genres.
For example, he examined the publication history of British novels between 1740 and 1900.
Instead of reading thousands of novels individually, he collected publication data and plotted the number of novels produced each year.
This revealed a striking pattern.
Genres such as:
- gothic novels
- historical novels
- silver-fork novels
followed predictable cycles of emergence, popularity, and decline.
Moretti argued that literary genres behave somewhat like biological species.
They appear, flourish for roughly a generation (about twenty-five to thirty years in Moretti’s account), and then disappear.
This observation led him to propose a theory of literary evolution.
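The counting step behind such genre graphs can be sketched in a few lines of Python. This is only a minimal illustration: the records below are invented, and the names `RECORDS`, `titles_per_year`, and `active_span` are hypothetical helpers rather than anything taken from Moretti’s work.

```python
from collections import Counter, defaultdict

# Invented (year, genre) records for illustration only; not Moretti's data.
RECORDS = [
    (1794, "gothic"), (1796, "gothic"), (1796, "gothic"), (1820, "gothic"),
    (1814, "historical"), (1819, "historical"), (1819, "historical"),
    (1826, "silver-fork"), (1828, "silver-fork"),
]

def titles_per_year(records):
    """Return {genre: Counter({year: number of titles})}."""
    table = defaultdict(Counter)
    for year, genre in records:
        table[genre][year] += 1
    return table

def active_span(year_counts):
    """Return the first and last year in which a genre has any titles."""
    years = sorted(year_counts)
    return years[0], years[-1]

table = titles_per_year(RECORDS)
for genre, year_counts in sorted(table.items()):
    first, last = active_span(year_counts)
    print(f"{genre}: {sum(year_counts.values())} titles, {first}-{last}")
```

Plotting each genre’s yearly counts as a line would give the kind of rise-and-fall curve described above. The quality of the result depends on the bibliographic data, not on the code.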
Maps: Geography of Narrative
Another experiment involved mapping the spatial structure of narratives.
In his study of nineteenth-century British novels, Moretti plotted locations mentioned in texts onto geographic maps.
The results showed that many novels construct very specific spatial patterns.
For instance, village novels often organize narrative events around central sites such as churches, roads, or markets.
This allowed scholars to see how literature encodes social space.
In this sense, distant reading overlaps with cultural geography.
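A first step toward such a map is simply finding place names in a text and attaching coordinates to them. The sketch below assumes a small hand-made gazetteer (`GAZETTEER`, with approximate coordinates) and naive word matching. Both are illustrative assumptions, not a description of Moretti’s procedure.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and included only for illustration.
GAZETTEER = {
    "london": (51.51, -0.13),
    "bath": (51.38, -2.36),
    "brighton": (50.82, -0.14),
}

def place_mentions(text, gazetteer=GAZETTEER):
    """Count how often each gazetteer place is mentioned in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in gazetteer)

def map_points(text, gazetteer=GAZETTEER):
    """Return (name, latitude, longitude, mentions) rows ready for plotting."""
    counts = place_mentions(text, gazetteer)
    return [(name, *gazetteer[name], n) for name, n in sorted(counts.items())]

sample = "They left London for Bath, and returned to London in May."
for row in map_points(sample):
    print(row)
```

Matching on bare words is crude: it cannot tell the city of Bath from a bath, so real projects usually need manual checking or named-entity recognition.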
Trees: Evolution of Literary Forms
The third method borrowed from evolutionary biology.
Moretti constructed tree diagrams showing how literary genres branch and evolve over time.
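Moretti’s own best-known tree traces how the device of the clue spread through detective stories by Conan Doyle and his rivals. As a simpler illustration, detective fiction as a whole can be pictured as branching into several subgenres: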
- classical detective stories
- hard-boiled crime fiction
- police procedural.
These branches form something similar to Darwinian evolutionary trees.
Moretti therefore proposed that literary history might be studied using models similar to those used in evolutionary science.
2. Matthew Jockers: Macroanalysis of Literature
The next major stage in distant reading research appears in the work of Matthew L. Jockers, especially in his book:
- Macroanalysis: Digital Methods and Literary History
Jockers moved distant reading further toward large-scale computational analysis of textual corpora.
The Corpus
Jockers assembled a dataset of approximately 3,500 nineteenth-century novels.
These novels came from various English-speaking regions:
- Britain
- the United States
- Ireland.
Using computational techniques, he analyzed patterns across this massive corpus.
Topic Modeling
One of Jockers’ main tools was topic modeling.
Topic modeling is an algorithmic method that identifies clusters of words that frequently occur together across many texts.
From these clusters, researchers can infer themes or topics present in the corpus.
For example, a topic cluster containing words like:
- marriage
- courtship
- family
- inheritance
might represent domestic fiction.
By applying topic modeling to thousands of novels, Jockers could identify dominant thematic structures in nineteenth-century literature.
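To make the idea concrete, here is a minimal sketch of one common topic-modeling algorithm, latent Dirichlet allocation (LDA), fitted by collapsed Gibbs sampling. It is written in plain Python for readability. The toy corpus, the function name `lda_gibbs`, and the parameter values are assumptions made for illustration, and nothing here is claimed to match the software or settings Jockers used.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            t = rng.randrange(n_topics)
            row.append(t)
            topic_word[t][w] += 1
            topic_total[t] += 1
            doc_topic[d][t] += 1
        assignments.append(row)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = assignments[d][i]
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                doc_topic[d][t] -= 1
                # Resample in proportion to (topic share in this document)
                # times (word share in the topic).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = t
                topic_word[t][w] += 1
                topic_total[t] += 1
                doc_topic[d][t] += 1
    return topic_word, doc_topic

# Toy corpus, invented for illustration.
DOCS = [
    "marriage courtship family inheritance marriage".split(),
    "family inheritance courtship marriage family".split(),
    "ship storm captain voyage ship".split(),
    "voyage captain storm ship sea".split(),
]
topic_word, doc_topic = lda_gibbs(DOCS, n_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, n in counts.most_common(4) if n > 0])
```

The algorithm outputs only word clusters. Naming a cluster “domestic fiction” remains an interpretive step taken by the researcher.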
Sentiment Analysis
Another technique Jockers used, in work that followed Macroanalysis (notably his syuzhet package for R), was sentiment analysis.
This method measures emotional tone in texts.
Algorithms classify words according to emotional valence:
- positive emotion
- negative emotion
- neutral language.
Jockers used sentiment analysis to examine emotional trajectories of novels.
He found that many narratives follow predictable emotional arcs, such as:
- tragedy (declining emotional tone)
- comedy (increasing emotional tone)
- romance (fluctuating emotional structure).
This kind of analysis allows scholars to examine narrative structure quantitatively.
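A lexicon-based version of this idea can be sketched as follows. The six-word `LEXICON` is a hypothetical stand-in for a real sentiment dictionary, and `emotional_arc` and `overall_trend` are illustrative names. The sketch shows the general approach and is not a reproduction of Jockers’ method.

```python
import re

# Tiny hypothetical valence lexicon; real lexicons contain thousands of words.
LEXICON = {"joy": 1, "love": 1, "hope": 1, "grief": -1, "fear": -1, "death": -1}

def emotional_arc(text, n_segments=4, lexicon=LEXICON):
    """Return the mean word valence of each consecutive segment of a text."""
    words = re.findall(r"[a-z']+", text.lower())
    arc = []
    for i in range(n_segments):
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        segment = words[start:end]
        total = sum(lexicon.get(w, 0) for w in segment)
        arc.append(total / len(segment) if segment else 0.0)
    return arc

def overall_trend(arc):
    """Label an arc by comparing its last segment with its first."""
    if arc[-1] > arc[0]:
        return "rising"
    if arc[-1] < arc[0]:
        return "falling"
    return "flat"

arc = emotional_arc("love joy hope hope grief fear death death", n_segments=2)
print(arc, overall_trend(arc))
```

Word-level scoring ignores negation, irony, and context, which is one reason sentiment-based plot analysis remains contested.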
Influence and Intertextuality
Jockers also used computational techniques to detect influence between authors.
By measuring stylistic similarities between texts, algorithms can estimate which authors may have influenced others.
For example, shared habits in the use of common function words can point to connections between Victorian writers.
This approach opens new possibilities for studying intertextual networks in literary history.
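One simple way to measure stylistic similarity is to compare how often two texts use common function words. The sketch below assumes a short hypothetical word list (`FUNCTION_WORDS`) and uses cosine similarity. It illustrates the general stylometric idea and is not the specific measure used in Macroanalysis.

```python
import math
import re
from collections import Counter

# A short, hypothetical list of function words; stylometric studies
# typically rely on a much longer list.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "with"]

def style_profile(text, features=FUNCTION_WORDS):
    """Relative frequency of each function word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = len(tokens) or 1
    return [counts[w] / n for w in features]

def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

text_a = "It was the best of times, but it was the worst of times."
text_b = "The house stood in a valley, and the road ran to the sea."
print(round(cosine_similarity(style_profile(text_a), style_profile(text_b)), 3))
```

A high score shows only that two texts resemble each other. Treating that resemblance as influence requires historical argument beyond the number itself.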
3. Ted Underwood: Machine Learning and Literary Change
A third major figure in distant reading research is Ted Underwood.
His book
- Distant Horizons: Digital Evidence and Literary Change
represents one of the most sophisticated applications of machine learning to literary history.
The Problem Underwood Addresses
Traditional literary history often assumes that literary movements replace each other in clear chronological periods.
For example:
- Romanticism
- Realism
- Modernism.
Underwood questioned whether these categories actually correspond to observable linguistic patterns in texts.
To test this, he used machine learning algorithms to analyze large datasets of literary texts from the eighteenth to the twentieth centuries.
Machine Learning Models
Underwood trained computational models to distinguish between different types of writing.
For example, a model could be trained to recognize differences between:
- fiction
- nonfiction.
Once trained, the model could analyze thousands of texts and identify statistical features distinguishing literary genres.
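The train-then-classify workflow can be illustrated with a very small text classifier. The sketch below uses multinomial naive Bayes, chosen here for brevity. It is not claimed to be the model Underwood used, and the training sentences and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log-probability under the model."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen class/word pairs above zero.
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data.
TRAINING = [
    ("she said he felt her eyes".split(), "fiction"),
    ("he looked and she smiled".split(), "fiction"),
    ("the report shows data and results".split(), "nonfiction"),
    ("data from the study shows results".split(), "nonfiction"),
]
model = train_nb(TRAINING)
print(predict_nb(model, "she smiled and looked".split()))
print(predict_nb(model, "the study shows data".split()))
```

In a study at scale, the interesting evidence is often how accurately such a model separates the categories and which words carry the most weight, rather than any single prediction.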
Gradual Change Rather Than Sudden Revolutions
Underwood discovered that literary change often occurs gradually rather than abruptly.
For instance, features associated with “modernist style” begin appearing long before the modernist period traditionally identified by literary historians.
This suggests that literary history may be more continuous than traditional periodization implies.
4. Methodological Tools of Distant Reading
To summarize, distant reading relies on several computational techniques.
| Tool | Function |
|---|---|
| Text mining | Extract patterns from large corpora |
| Topic modeling | Identify thematic structures |
| Sentiment analysis | Measure emotional tone |
| Network analysis | Map relationships between characters or texts |
| Machine learning | Detect stylistic and genre patterns |
These tools allow scholars to analyze millions of words simultaneously, something impossible through traditional reading.
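Of the tools in the table, network analysis is perhaps the easiest to sketch. The example below builds a character co-occurrence network from a list of scenes. The scene data and the names `SCENES`, `cooccurrence_edges`, and `degree` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented scene lists: each entry names the characters present in one scene.
SCENES = [
    ["Anne", "Mary", "Tom"],
    ["Anne", "Tom"],
    ["Mary", "John"],
]

def cooccurrence_edges(scenes):
    """Count how many scenes each pair of characters shares."""
    edges = Counter()
    for scene in scenes:
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Number of distinct characters each character co-occurs with."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {name: len(others) for name, others in neighbours.items()}

edges = cooccurrence_edges(SCENES)
print(edges.most_common())
print(degree(edges))
```

Edge weights show which characters share the most scenes, and degree gives a rough sense of who sits at the center of the cast.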
5. Philosophical Implications
Distant reading introduces a profound shift in how literature is conceptualized.
Instead of being treated only as a set of individual works, literature becomes:
- a dataset
- a cultural system
- a historical network.
This approach resembles methodologies used in disciplines such as sociology and computational linguistics.
6. Critiques from Traditional Literary Scholars
Despite its innovations, distant reading remains controversial.
Critics argue that literature cannot be reduced to statistical patterns.
Important aspects of literature include:
- aesthetic beauty
- metaphorical complexity
- narrative voice.
These qualities may not be easily measurable by computational algorithms.
Critics therefore fear that distant reading might turn literary studies into a branch of data science and away from humanistic interpretation.
7. Toward a Hybrid Model
Many scholars now advocate combining both approaches.
A typical research process might look like this:
- Use distant reading to identify large-scale patterns.
- Select representative texts.
- Perform detailed close readings of those texts.
This integrated method preserves interpretive depth while expanding analytical scope.
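The selection step in this process can itself be made explicit. The sketch below assumes each text already has a numeric score from a distant-reading pass (for example, a topic proportion) and picks the texts closest to the corpus mean, plus the strongest outlier, as candidates for close reading. The titles, scores, and function names are placeholders, and other selection criteria are equally defensible.

```python
from statistics import mean

# Invented scores: the share of each novel assigned to some pattern of
# interest. Titles are placeholders.
SCORES = {
    "Novel A": 0.10,
    "Novel B": 0.42,
    "Novel C": 0.38,
    "Novel D": 0.90,
    "Novel E": 0.45,
}

def representative_texts(scores, k=2):
    """Return the k titles whose scores lie closest to the corpus mean."""
    centre = mean(scores.values())
    return sorted(scores, key=lambda t: (abs(scores[t] - centre), t))[:k]

def outliers(scores, k=1):
    """Return the k titles whose scores lie furthest from the corpus mean."""
    centre = mean(scores.values())
    return sorted(scores, key=lambda t: (-abs(scores[t] - centre), t))[:k]

print("Typical:", representative_texts(SCORES))
print("Atypical:", outliers(SCORES))
```

Reading both typical and atypical cases helps check whether a computed pattern corresponds to something a human reader would recognize in the texts.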
Conclusion
The work of scholars such as Moretti, Jockers, and Underwood demonstrates that distant reading is not merely a theoretical proposal. It is now a fully developed research methodology capable of generating new insights into literary history.
By analyzing thousands of texts simultaneously, distant reading allows scholars to detect patterns that were previously invisible to traditional literary criticism. At the same time, it does not eliminate close reading but rather repositions it within a larger analytical framework.