Introduction
If Matthew L. Jockers’s Macroanalysis established the large-scale application of topic modeling in literary studies, the work of Ted Underwood takes the field in a more theoretically refined and methodologically self-conscious direction. Underwood’s research, particularly in Distant Horizons: Digital Evidence and Literary Change, uses computational models, including topic modeling, to investigate how literary discourse changes over time.
Where earlier studies emphasized thematic discovery, Underwood’s work is distinguished by a more precise question:
How do literary categories—such as genre, period, and style—change over time, and how can these changes be measured?
1. Corpus and Research Design
Underwood’s research is grounded in large-scale corpora of English-language texts, including:
- Nineteenth- and twentieth-century novels
- Digitized archives from libraries and databases
- Both canonical and non-canonical works
Unlike earlier approaches that focused primarily on thematic clustering, Underwood integrates:
- Topic modeling
- Classification algorithms
- Statistical modeling
This multi-method approach allows for a more nuanced understanding of literary change.
2. Topic Modeling as a Tool for Historical Analysis
Underwood employs Latent Dirichlet Allocation (LDA) not merely to identify themes, but to track their historical trajectories.
Key Idea
Topics are not static—they evolve across time.
Rather than asking:
- What topics exist?
Underwood asks:
- How do topics rise, decline, and transform across decades?
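The shift from asking what topics exist to asking how they move can be illustrated with a minimal sketch. It assumes a topic model has already been fitted and that each document comes with a publication year and a list of topic proportions; the function name, the data layout, and the toy values are hypothetical and are not drawn from Underwood's own code or data.

```python
from collections import defaultdict


def topic_prevalence_by_decade(docs):
    """Average each topic's share of the documents in every decade.

    `docs` is a list of (year, topic_proportions) pairs, where
    topic_proportions is a list of floats that sums to 1. This layout
    is an assumption made for the illustration.
    """
    sums = {}
    counts = defaultdict(int)
    for year, proportions in docs:
        decade = (year // 10) * 10
        if decade not in sums:
            sums[decade] = [0.0] * len(proportions)
        for k, share in enumerate(proportions):
            sums[decade][k] += share
        counts[decade] += 1
    # Divide the summed shares by the number of documents in the decade.
    return {d: [s / counts[d] for s in sums[d]] for d in sorted(sums)}


# Invented toy data: two topics, four documents.
toy_docs = [
    (1803, [0.8, 0.2]),
    (1807, [0.6, 0.4]),
    (1894, [0.3, 0.7]),
    (1899, [0.1, 0.9]),
]
prevalence = topic_prevalence_by_decade(toy_docs)
for decade, shares in prevalence.items():
    print(decade, [round(s, 2) for s in shares])
```

Reading the resulting table column by column gives one trajectory per topic, which is the object of analysis in the sections that follow.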
3. Modeling Genre as a Moving Target
One of Underwood’s most influential contributions is his reconceptualization of genre.
Traditional View
- Genres are fixed categories (e.g., romance, realism, gothic)
Underwood’s View
- Genres are statistical patterns that shift over time
Using topic modeling, he demonstrates that:
- The vocabulary associated with a genre changes
- The boundaries between genres are fluid
- Genres overlap and evolve
Example: The Novel
Underwood shows that what counts as a “novel” in 1800 is not the same as what counts as one in 1900.
This is reflected in:
- Changing word distributions
- Emerging and disappearing topics
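One way to put a number on changing word distributions is to compare the word frequencies of two periods with a divergence measure. The sketch below uses Jensen-Shannon divergence, which is a choice made here for illustration and not a claim about the measure Underwood uses; the token lists are invented.

```python
import math
from collections import Counter


def word_distribution(tokens):
    """Turn a list of tokens into a dict of relative frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two word distributions.

    Returns 0 for identical distributions and 1 for distributions
    that share no words.
    """
    vocab = set(p) | set(q)
    # The mixture distribution is the midpoint of p and q.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Invented toy "corpora" standing in for two periods.
early_dist = word_distribution("soul duty providence heart soul".split())
late_dist = word_distribution("factory nerve heart mind factory".split())
divergence = jensen_shannon_divergence(early_dist, late_dist)
print(round(divergence, 3))
```

A value near 0 would suggest that the two periods use much the same vocabulary; a value near 1 would suggest that they barely overlap.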
4. Temporal Dynamics of Language
A central focus of Underwood’s work is the temporal dimension of language.
By applying topic modeling across chronological slices of data, he identifies:
(1) Topic Emergence
- New thematic clusters appear over time
(2) Topic Persistence
- Some themes remain stable across centuries
(3) Topic Decline
- Certain discourses fade or disappear
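The three patterns above can be sketched as a rough labeling rule: fit a straight line to a topic's share over time and read the sign of the slope. This is a deliberate simplification, since real trajectories are rarely linear, and the threshold, function names, and data are all hypothetical.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def label_trajectory(years, shares, threshold=0.0005):
    """Label a topic as emerging, declining, or persistent.

    `threshold` is an arbitrary cutoff (change in share per year)
    chosen only for this illustration.
    """
    slope = linear_slope(years, shares)
    if slope > threshold:
        return "emerging"
    if slope < -threshold:
        return "declining"
    return "persistent"


# Invented topic shares at three points in time.
years = [1800, 1850, 1900]
labels = {
    "topic_a": label_trajectory(years, [0.05, 0.10, 0.20]),
    "topic_b": label_trajectory(years, [0.10, 0.11, 0.10]),
    "topic_c": label_trajectory(years, [0.20, 0.12, 0.04]),
}
print(labels)
```

In practice the cutoff would need to be justified against the noise in the corpus, and a topic that rises and then falls would call for a more flexible model than a single line.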
Illustrative Patterns
For instance, analysis may reveal:
- Decline in religious vocabulary over time
- Rise in industrial and scientific discourse
- Shifts in emotional and psychological language
These patterns provide:
A data-driven account of cultural transformation.
5. Combining Topic Modeling with Classification
Underwood’s work goes beyond unsupervised modeling.
He integrates:
- Supervised machine learning
- Predictive modeling
This allows him to:
- Classify texts by period or genre
- Measure how distinguishable different periods are
Key Insight
If a model can accurately predict the date of a text it has not seen before, then language carries measurable historical signatures.
This reframes part of literary history as a problem of pattern recognition.
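The logic of predicting period from language can be illustrated with a very small classifier that is trained on some texts and scored on others it has not seen. The sketch swaps in a multinomial naive Bayes model with add-one smoothing purely because it fits in a few lines of standard-library Python; it is not a claim about the models used in Distant Horizons, and the labels and token lists are invented.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Collect per-label word counts from (label, tokens) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for label, tokens in labeled_docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(word_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        for word in tokens:
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_docs):
    """Share of documents whose predicted label matches the true one."""
    hits = sum(predict(model, toks) == label for label, toks in labeled_docs)
    return hits / len(labeled_docs)


# Invented toy data, split into training texts and held-out texts.
train = [
    ("early", "soul duty providence heart".split()),
    ("early", "providence virtue soul sermon".split()),
    ("late", "factory nerve mind machine".split()),
    ("late", "machine city nerve science".split()),
]
held_out = [
    ("early", "virtue duty heart".split()),
    ("late", "science factory city".split()),
]
model = train_naive_bayes(train)
held_out_accuracy = accuracy(model, held_out)
print(held_out_accuracy)
```

The number that matters is the accuracy on the held-out texts: the easier two periods are to tell apart on unseen data, the more distinguishable their language is.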
6. Methodological Sophistication
Underwood emphasizes methodological rigor in several ways:
(1) Validation
- Testing models on unseen data
(2) Replicability
- Making methods transparent
(3) Interpretation
- Combining statistical results with human analysis
This marks a maturation of the field:
From experimentation to disciplined inquiry.
7. Conceptual Contributions
Underwood’s work introduces several important theoretical shifts:
(a) History as Gradient, Not Boundary
- Literary periods are not sharply defined
- They change gradually
(b) Genre as Distribution
- Genres are not categories but tendencies
(c) Evidence in Literary Studies
- Computational models provide a new form of evidence
- This evidence supplements interpretation rather than replacing it
8. Tensions and Critiques
Despite its sophistication, Underwood’s approach raises critical questions.
(1) Quantification vs. Interpretation
- Can statistical patterns capture literary meaning?
(2) Loss of Aesthetic Detail
- Style, irony, and narrative complexity remain difficult to model
(3) Dependence on Archives
- Results reflect available digitized texts
- Not the totality of literary production
9. Relation to Broader Digital Humanities
Underwood’s work represents a second phase in the digital humanities:
- First phase: Demonstration (e.g., Jockers)
- Second phase: Refinement and critique
It moves the field toward:
- Methodological self-awareness
- Theoretical integration
Conclusion
The research of Ted Underwood marks a significant advancement in the application of topic modeling to literary studies. By focusing on temporal dynamics, genre evolution, and methodological rigor, it transforms topic modeling from a tool of thematic discovery into an instrument of historical analysis.
Through the use of Latent Dirichlet Allocation and complementary techniques, Underwood demonstrates that literary change can be measured, modeled, and interpreted as a dynamic system of shifting linguistic patterns.
The broader implication is profound:
Literary history is not a sequence of fixed periods and stable genres, but a continuous process of transformation—one that can be traced through the statistical evolution of language itself.