Koppel and other computational linguists have experimented with three major families of machine-learning models:
- Support Vector Machines (SVM)
- Bayesian classifiers
- Neural network models
Each of these models attempts to answer the same question:
Given a text, which author most likely wrote it?
But they approach the problem in very different mathematical ways.
1. The General Machine-Learning Framework in Stylometry
Before discussing individual models, we must understand the basic pipeline used in stylometric machine learning.
Step 1 — Feature extraction
Texts are converted into numerical features.
Examples of features:
- frequency of function words
- character n-grams
- sentence length
- punctuation patterns
- vocabulary richness
Example feature vector for a text:
| feature | value |
|---|---|
| frequency of “the” | 0.065 |
| frequency of “and” | 0.041 |
| avg sentence length (words) | 17 |
Each text becomes a vector of numbers.
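As a minimal sketch of this step, the function below converts a raw text into a three-feature vector (frequency of "the", frequency of "and", average sentence length). The feature set is illustrative only, not Koppel's actual feature inventory:

```python
import re

def extract_features(text):
    """Turn a raw text into a small numeric feature vector.

    Features (illustrative only): frequency of "the",
    frequency of "and", average sentence length in words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words) or 1
    return [
        words.count("the") / n,              # function-word frequency
        words.count("and") / n,
        len(words) / (len(sentences) or 1),  # avg sentence length (words)
    ]

vec = extract_features("The cat sat. The dog and the cat ran.")
print(vec)
```

Real stylometric systems use hundreds or thousands of such features, but each one is computed in essentially this way: count, normalize, append to the vector.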
Step 2 — Training data
The system is trained on texts whose authors are already known.
Example dataset:
| text | author |
|---|---|
| text 1 | Austen |
| text 2 | Dickens |
| text 3 | Austen |
The algorithm learns patterns linking features → authors.
Step 3 — Classification
When a new anonymous text appears, the model predicts which author it most closely resembles.
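The three steps above can be sketched end-to-end. This toy version substitutes a nearest-neighbour rule for a real learned model, and the feature vectors and author labels are invented for illustration:

```python
import math

# Step 2: training data — feature vectors with known authors (invented numbers)
training = [
    ([0.065, 0.041, 17.0], "Austen"),
    ([0.052, 0.060, 23.0], "Dickens"),
    ([0.068, 0.039, 16.0], "Austen"),
]

def classify(features):
    """Step 3: predict the author of the training text that lies
    closest in feature space (a stand-in for a trained classifier)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(training, key=lambda item: dist(features, item[0]))[1]

print(classify([0.066, 0.040, 16.5]))  # features of an anonymous text
```

The models discussed below replace the naive distance rule with a genuinely learned decision function, but the surrounding pipeline stays the same.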
2. Support Vector Machines (SVM)
One of the most powerful models used by Koppel is the Support Vector Machine.
This model is widely used in machine learning because it performs extremely well with high-dimensional data, such as stylometric features.
Core Idea
Imagine each text as a point in a geometric space.
Example:
A text might be represented by coordinates like
(0.065, 0.041, 17)
Each coordinate represents a feature.
The algorithm tries to find a boundary that separates authors.
Example:
Texts by Dickens might lie on one side of a boundary.
Texts by Austen lie on another side.
The model learns a separating boundary: a line in two dimensions, a plane in three, and, with more features, a hyperplane.
Mathematical Principle
The goal of SVM is to maximize the margin between classes.
The classification rule is represented by a hyperplane:
w \cdot x + b = 0
Here:
- x = feature vector of the text
- w = weight vector learned by the algorithm
- b = bias term
The hyperplane divides the feature space into regions corresponding to different authors.
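The decision rule itself is just a dot product followed by a sign check. The sketch below applies it with hypothetical weights; a real SVM would learn w and b from training data:

```python
def svm_side(w, x, b):
    """Decide which side of the hyperplane w·x + b = 0 a text falls on.
    Positive score -> one author, negative -> the other."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "Austen" if score > 0 else "Dickens"

w = [120.0, -80.0, -0.1]  # hypothetical learned weights
b = -2.5                  # hypothetical bias term
print(svm_side(w, [0.065, 0.041, 17.0], b))
```

Note that each learned weight tells you how strongly a feature pushes a text toward one author or the other, which makes linear SVMs partially interpretable.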
Why SVM Works Well in Stylometry
Stylometric datasets often have:
- thousands of linguistic features
- relatively small numbers of texts
SVM handles this situation very effectively.
Advantages:
- high accuracy
- robust to noise
- works well with sparse data
For this reason, many stylometry studies in the 2000s used SVM as their primary model.
3. Bayesian Classifiers
Another important model used in stylometry is the Naive Bayes classifier.
This model is based on probability theory.
Its goal is to compute:
What is the probability that a given author wrote this text?
The Bayesian Principle
The model relies on Bayes' theorem, a rule of probability formulated by the eighteenth-century mathematician Thomas Bayes.
The key rule is:
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

Intuitively: the posterior is the supporting evidence divided by the total evidence.
Where:
- P(A∣B) = probability of author A given the text
- P(B∣A) = probability of the text given author A
- P(A) = prior probability of author A
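A one-line numeric illustration of the rule, with invented probabilities: suppose feature pattern B appears in 40% of Austen's texts, Austen wrote 50% of the corpus, and B occurs in 25% of all texts:

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

print(posterior(0.40, 0.50, 0.25))  # → 0.8
```

Under these (hypothetical) numbers, observing pattern B raises the probability of Austen's authorship from the 0.5 prior to a 0.8 posterior.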
How It Works in Stylometry
The algorithm calculates the probability that each author produced the observed features.
Example:
| Author | Probability |
|---|---|
| Austen | 0.73 |
| Dickens | 0.18 |
| Eliot | 0.09 |
The highest probability determines the predicted author.
Why It Is Called “Naive”
The model assumes that linguistic features are statistically independent of one another, given the author.
Example:
It treats
- frequency of “the”
- sentence length
- punctuation patterns
as unrelated variables.
This assumption is rarely true of real language, yet the model nevertheless performs surprisingly well in practice.
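The independence assumption is what makes the computation cheap: the likelihood of a whole text is just the product of its per-feature likelihoods. A minimal sketch, with invented per-feature probabilities for two authors:

```python
def naive_bayes_score(prior, feature_likelihoods):
    """Naive assumption: P(features | author) is the PRODUCT of the
    individual feature probabilities, as if they were independent."""
    score = prior
    for p in feature_likelihoods:
        score *= p
    return score

# Invented per-feature likelihoods for each candidate author
scores = {
    "Austen":  naive_bayes_score(0.5, [0.8, 0.6, 0.7]),
    "Dickens": naive_bayes_score(0.5, [0.3, 0.4, 0.5]),
}
print(max(scores, key=scores.get))  # the highest score wins
```

Production implementations sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow over thousands of features, but the logic is identical.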
Strengths of Bayesian Methods
Advantages include:
- simple and computationally efficient
- effective with small datasets
- easy to interpret probabilistically
Because of these properties, Naive Bayes is often used as a baseline model in stylometry experiments.
4. Neural Network Models
More recently, stylometry researchers have begun using neural networks, models loosely inspired by biological neural systems in the brain.
These models are especially common in modern natural language processing.
Basic Idea
A neural network consists of layers of computational units called neurons.
Each neuron performs a weighted transformation of its inputs.
The basic operation is:
y = f(\sum_{i=1}^{n} w_i x_i + b)
Where:
- x_i = input features
- w_i = learned weights
- b = bias
- f = activation function
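A single neuron is small enough to write out directly. The sketch below uses a sigmoid as the activation function f; the weights and bias are invented for illustration:

```python
import math

def neuron(x, w, b):
    """One neuron: y = f(sum_i w_i * x_i + b), with sigmoid activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Hypothetical weights applied to a three-feature text vector
print(neuron([0.065, 0.041, 17.0], [1.0, -1.0, 0.1], -1.5))
```

A full network stacks many such units into layers, and the non-linear activation between layers is what lets it capture the non-linear feature interactions mentioned below.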
Neural Networks in Authorship Attribution
In stylometry, neural networks can learn complex patterns such as:
- phrase structures
- word order patterns
- stylistic rhythms
Unlike simpler models, neural networks can capture non-linear relationships between features.
Deep Learning Approaches
Modern systems sometimes use deep neural architectures such as:
- recurrent neural networks (RNN)
- transformers
- convolutional neural networks
These models analyze language at multiple levels:
- characters
- words
- sentences
They can detect subtle stylistic signals that simpler models may miss.
5. Comparison of the Three Models
| Model | Core idea | Strength |
|---|---|---|
| Support Vector Machine | geometric separation of authors | very accurate with high-dimensional data |
| Bayesian classifier | probabilistic inference | simple and interpretable |
| Neural network | layered pattern learning | captures complex linguistic structures |
Each model represents a different philosophy of machine learning.
6. Why Koppel Used Multiple Models
Koppel emphasized that no single algorithm is universally best.
Instead, researchers should experiment with different models depending on:
- dataset size
- text length
- feature types
Stylometry often uses ensemble methods, combining several algorithms to improve accuracy.
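The simplest ensemble scheme is majority voting over the individual models' predictions. A minimal sketch, assuming hypothetical outputs from an SVM, a Naive Bayes classifier, and a neural network:

```python
from collections import Counter

def majority_vote(predictions):
    """Ensemble step: each model votes for an author; the most common
    prediction wins (ties resolve in first-seen order)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from three different models on the same text
print(majority_vote(["Austen", "Austen", "Dickens"]))  # → Austen
```

More sophisticated ensembles weight each model's vote by its confidence or its validation accuracy, but the principle is the same: combine complementary models to cancel their individual errors.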
7. Philosophical Implications
The use of machine learning in stylometry reflects a deeper philosophical shift.
Traditional literary criticism assumes that style is:
- artistic
- interpretive
- qualitative
Machine learning treats style as:
- measurable
- statistical
- algorithmically recognizable
Thus computational stylometry reveals that literary style contains hidden mathematical structures.
8. Influence on Digital Humanities
Koppel’s work helped integrate stylometry into the broader field of digital humanities.
Researchers now use machine learning to study:
- disputed authorship in literature
- historical documents
- political speeches
- social media texts
This represents a new paradigm where literary analysis intersects with artificial intelligence.
✅ In essence
Koppel’s machine-learning approach transformed stylometry from a purely statistical technique into a modern AI-driven discipline, capable of analyzing large collections of texts and identifying subtle patterns in human writing.