To understand digital humanities research in literary studies, one must first clearly understand what computational methods are and how algorithms are applied to literary texts. Only then can we appreciate how scholars use these tools to study literature at large scales.

This discussion proceeds in five stages:

  1. what computational methods mean in the humanities
  2. what algorithms are and how they work in textual analysis
  3. how texts are prepared for computational analysis
  4. the major computational techniques used in literary studies
  5. concrete research examples from leading digital humanities scholars.

1. What Is a Computational Method?

A computational method refers to the use of computer-based techniques to analyze large datasets. In the context of literary studies, the dataset consists of digital texts such as novels, poems, plays, or archives.

Instead of manually reading and interpreting texts, scholars use computers to perform tasks such as:

  • counting linguistic patterns
  • detecting stylistic features
  • identifying thematic clusters
  • mapping relationships between texts.

The central idea is that computers can process vast amounts of textual data much faster than humans.

Thus computational methods allow scholars to study large literary corpora, sometimes containing thousands or millions of texts.


2. What Is an Algorithm?

An algorithm is a step-by-step set of instructions designed to solve a specific problem.

In digital humanities, algorithms analyze texts by performing operations such as:

  • counting words
  • identifying repeated patterns
  • measuring similarity between texts
  • classifying texts into categories.

For example, an algorithm might be designed to:

  1. read a digital text
  2. break it into individual words
  3. count the frequency of each word
  4. compare those frequencies across many texts.

This simple process already produces useful insights about style, vocabulary, and thematic emphasis.
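The four steps above can be sketched in a few lines of Python. The two sample texts are invented for illustration:

```python
from collections import Counter
import re

def word_frequencies(text):
    """Steps 1-3: read a text, break it into words, count each word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Step 4: compare frequencies across two invented sample texts.
freq_a = word_frequencies("The sea was dark and the sea was cold")
freq_b = word_frequencies("The moor was dark and the night was long")

only_in_a = set(freq_a) - set(freq_b)  # vocabulary unique to the first text
```

Even this toy comparison separates shared function words ("the", "was", "dark", "and") from the vocabulary that distinguishes the two texts.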

Thus algorithms act as analytical tools that allow scholars to detect patterns invisible to manual reading.


3. Preparing Texts for Computational Analysis

Before algorithms can analyze literary works, the texts must be converted into machine-readable form.

This involves several steps.

Digitization

Printed texts are scanned and converted into digital format using optical character recognition (OCR).

Large literary corpora such as those used in digital humanities research often come from databases like:

  • Project Gutenberg
  • HathiTrust Digital Library.

Text cleaning

Digital texts must be cleaned to remove irrelevant elements such as:

  • page numbers
  • formatting symbols
  • footnotes.
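Cleaning is typically done with pattern matching. A minimal sketch using Python's regular-expression module, applied to an invented scrap of a digitized page (the patterns shown are common choices, not a standard recipe):

```python
import re

def clean_text(raw):
    """Strip common non-literary artifacts from a digitized page."""
    text = re.sub(r"^\s*\d+\s*$", "", raw, flags=re.MULTILINE)  # bare page numbers
    text = re.sub(r"\[\d+\]", "", text)                         # footnote markers like [3]
    text = re.sub(r"[*_#]+", "", text)                          # stray formatting symbols
    return re.sub(r"\s+", " ", text).strip()                    # collapse whitespace

raw_page = "42\nIt was a *dark* night.[3]\n"
cleaned = clean_text(raw_page)  # -> "It was a dark night."
```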

Tokenization

Tokenization means breaking a text into smaller units called tokens, usually words.

For example, the sentence:

“The night was dark and stormy”

would be tokenized as:

  • the
  • night
  • was
  • dark
  • and
  • stormy.

This allows algorithms to analyze word frequencies and patterns.
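The example sentence can be tokenized with a single regular expression; lowercasing, as shown here, is a common but optional normalization step:

```python
import re

def tokenize(text):
    """Break a text into lowercase word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

tokens = tokenize("The night was dark and stormy")
# tokens == ["the", "night", "was", "dark", "and", "stormy"]
```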


Lemmatization

Words often appear in different grammatical forms.

For example:

  • run
  • runs
  • running
  • ran.

Lemmatization reduces these forms to a single root word.

This allows computational models to treat them as the same concept.
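In practice scholars use NLP libraries such as NLTK or spaCy for lemmatization. A toy lookup-table version, with an invented three-entry lemma table, is enough to show the idea:

```python
# Toy lemma table; real lemmatizers combine large dictionaries
# with part-of-speech rules rather than a hand-written mapping.
LEMMAS = {"runs": "run", "running": "run", "ran": "run"}

def lemmatize(token):
    """Reduce a word form to its dictionary root, if the table knows it."""
    return LEMMAS.get(token, token)

forms = ["run", "runs", "running", "ran"]
roots = [lemmatize(t) for t in forms]  # all four forms collapse to "run"
```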


4. Major Computational Techniques Used in Literary Studies

Once texts are prepared, scholars apply various computational methods. Several techniques are especially important in digital literary research.


Word Frequency Analysis

This is the simplest computational method.

Algorithms count how often each word appears in a text or corpus.

This technique can reveal:

  • dominant themes
  • stylistic tendencies
  • differences between authors.

For example, frequent use of words like:

  • nature
  • mountain
  • river
  • solitude

might indicate a romantic literary style.
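Raw counts are usually normalized by text length so that texts of different sizes can be compared. A sketch of that idea, using a hypothetical "nature vocabulary" and two invented snippets:

```python
from collections import Counter

# Hypothetical theme vocabulary for illustration only.
NATURE_WORDS = {"nature", "mountain", "river", "solitude"}

def nature_share(tokens):
    """Fraction of a text's tokens drawn from the nature vocabulary."""
    counts = Counter(tokens)
    return sum(counts[w] for w in NATURE_WORDS) / len(tokens)

romantic = "the mountain rose above the river in perfect solitude".split()
urban = "the crowded street was loud with carts and voices".split()
```

Here `nature_share(romantic)` is higher than `nature_share(urban)`, the kind of contrast a scholar might read as a stylistic signal.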


Stylometry

Stylometry analyzes authorial style using statistical techniques.

It measures patterns such as:

  • word frequency
  • sentence length
  • grammatical structures.

Stylometry can sometimes determine who wrote a particular text.

One famous application involved the novel The Cuckoo's Calling, published in 2013 under the pseudonym Robert Galbraith.
Stylometric analysis by Patrick Juola and Peter Millican supported the attribution to J. K. Rowling, who subsequently confirmed her authorship.

Stylometric techniques were developed by scholars such as John Burrows, who introduced a statistical measure known as Burrows's Delta for gauging stylistic similarity between texts.
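In its standard form, Delta z-scores each marker word's relative frequency against the whole corpus and averages the absolute differences between two texts. A minimal sketch, with an invented three-text corpus and a hand-picked marker list (real Delta uses the corpus's most frequent words, typically hundreds of them):

```python
from statistics import mean, stdev

# Invented miniature corpus of three token lists.
text_a = "the cat and the dog of the house".split()
text_b = "the bird and the fish of the sea".split()
text_c = "a storm came and went over green hills".split()

MARKERS = ["the", "and", "of"]  # stand-in for the most frequent corpus words

def rel_freqs(tokens):
    return {w: tokens.count(w) / len(tokens) for w in MARKERS}

corpus = [rel_freqs(t) for t in (text_a, text_b, text_c)]

def burrows_delta(fa, fb):
    """Mean absolute difference of z-scored marker-word frequencies."""
    diffs = []
    for w in MARKERS:
        mu = mean(f[w] for f in corpus)
        sigma = stdev(f[w] for f in corpus)
        if sigma == 0:
            continue  # a marker with no variation carries no signal
        diffs.append(abs((fa[w] - mu) / sigma - (fb[w] - mu) / sigma))
    return mean(diffs)
```

A smaller Delta means two texts use the marker words in more similar proportions, which is the signal attribution studies rely on.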


Topic Modeling

Topic modeling is one of the most powerful tools used in digital humanities.

It uses algorithms to identify hidden thematic structures within large corpora.

The most widely used algorithm is Latent Dirichlet Allocation (LDA).

The algorithm works by detecting clusters of words that frequently appear together.

For example, a cluster containing words like:

  • ship
  • sea
  • voyage
  • captain

might represent a maritime narrative theme.

Topic modeling allows scholars to analyze themes across thousands of texts simultaneously.
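Running LDA itself requires a library such as gensim or scikit-learn, but the co-occurrence intuition it builds on, that topics surface as words which keep appearing together, can be sketched with a plain co-occurrence count over three invented documents (this is an illustration of the underlying signal, not LDA proper):

```python
from collections import Counter
from itertools import combinations

docs = [
    "the ship sailed the sea on a long voyage",
    "the captain steered the ship across the sea",
    "the garden bloomed with roses in spring",
]

cooc = Counter()
for doc in docs:
    words = set(doc.split())  # each unique word once per document
    for pair in combinations(sorted(words), 2):
        cooc[pair] += 1

# ("sea", "ship") co-occurs in two documents, while garden words
# never pair with maritime words: the clusters LDA would recover.
```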

A major researcher using this method is Matthew L. Jockers in his book Macroanalysis: Digital Methods and Literary History.


Sentiment Analysis

Sentiment analysis measures emotional tone within texts.

Algorithms classify words as:

  • positive
  • negative
  • neutral.

By applying sentiment analysis to entire novels, scholars can trace emotional trajectories across narratives.

For example, many stories follow recognizable emotional patterns such as:

  • rise–fall–rise structures
  • tragic downward arcs.
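A minimal lexicon-based sketch shows how such arcs are computed; the two word lists and the sample "chapters" are invented for illustration:

```python
# Toy sentiment lexicon; real analyses use large published lexicons.
POSITIVE = {"joy", "bright", "hope", "love"}
NEGATIVE = {"dark", "grief", "fear", "loss"}

def sentiment(tokens):
    """Net sentiment per token: positive hits minus negative hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens)

chapters = [
    "joy and bright hope filled the morning".split(),
    "grief and fear crept in with the dark".split(),
    "love returned and hope was bright again".split(),
]
trajectory = [sentiment(ch) for ch in chapters]
# positive, then negative, then positive: a rise-fall-rise arc
```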

Network Analysis

Network analysis examines relationships between characters or texts.

Characters are treated as nodes, and interactions between them form links.

This produces visual maps of narrative structures.

For example, network analysis can reveal:

  • central characters in a novel
  • clusters of social relationships
  • structural organization of narratives.

This method has been applied to works like Hamlet to visualize character interactions.
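Scholars typically build such graphs with a library like NetworkX; a standard-library sketch over a few invented Hamlet-style interaction pairs shows how centrality falls out of simple degree counts:

```python
from collections import defaultdict

# Invented interaction pairs (characters sharing a scene).
interactions = [
    ("Hamlet", "Horatio"), ("Hamlet", "Ophelia"),
    ("Hamlet", "Claudius"), ("Claudius", "Gertrude"),
    ("Ophelia", "Polonius"),
]

graph = defaultdict(set)
for a, b in interactions:
    graph[a].add(b)
    graph[b].add(a)

# Degree centrality: the character with the most direct connections.
central = max(graph, key=lambda name: len(graph[name]))
```

In this toy network, Hamlet's three connections make him the most central node, matching the intuition that he anchors the play's social structure.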


5. Major Digital Humanities Scholars and Their Research

To see how these methods are applied in practice, it is useful to examine the work of major scholars.


Franco Moretti

Franco Moretti pioneered distant reading.

In Graphs, Maps, Trees, he used statistical data to study the rise and fall of literary genres.

Instead of analyzing individual novels, he examined large datasets of publication records.

This revealed patterns in the evolution of literary forms.


Matthew Jockers

Matthew L. Jockers used topic modeling and sentiment analysis to study thousands of nineteenth-century novels.

His work demonstrates how algorithms can identify:

  • thematic patterns
  • narrative structures
  • emotional arcs across literary history.

Ted Underwood

Ted Underwood uses machine learning models to study long-term literary change.

In Distant Horizons: Digital Evidence and Literary Change, he analyzed literary texts spanning several centuries.

His research shows that literary change often occurs gradually rather than through abrupt revolutions.


Andrew Piper

Another important figure is Andrew Piper.

In his book Enumerations: Data and Literary Study, Piper examines how counting and statistical analysis can reveal patterns in literary culture.

His research studies:

  • repetition in literature
  • narrative pacing
  • textual structure.

6. Advantages of Computational Methods

Computational analysis offers several major advantages for literary studies.

Scale

Algorithms can analyze thousands of texts simultaneously.


Objectivity

Statistical methods can reduce, though never wholly eliminate, subjective bias in interpretation.


Pattern detection

Computational models reveal patterns that human readers might never notice.


7. Limitations and Challenges

Despite their advantages, computational methods also have limitations.

Algorithms cannot easily understand:

  • irony
  • metaphor
  • symbolic meaning.

These aspects of literature require interpretive judgment.

Therefore most scholars argue that computational methods should complement rather than replace traditional literary analysis.


Conclusion

Computational methods represent one of the most important methodological innovations in contemporary literary studies. By using algorithms to analyze large textual corpora, scholars can detect linguistic, thematic, and structural patterns across literary history. Researchers such as Moretti, Jockers, Underwood, and Piper have demonstrated how these techniques can transform our understanding of literary systems.

At the same time, computational analysis cannot fully capture the aesthetic and interpretive richness of literature. For this reason, the most productive approach combines computational analysis with traditional close reading, creating a multi-scale method of literary research.