Probabilistic theory (more commonly called probability theory) is a mathematical framework for measuring and reasoning about uncertainty. Rather than declaring something absolutely true or false, it lets us quantify how likely an event is to occur.
It is fundamental in many disciplines, including:
- statistics
- artificial intelligence
- linguistics
- economics
- physics
- machine learning
Researchers in computational stylometry such as Moshe Koppel and Patrick Juola rely heavily on probabilistic reasoning when identifying authorship.
1. Basic Idea of Probabilistic Theory
In everyday reasoning we often deal with uncertain events.
For example:
- Will it rain tomorrow?
- Which candidate will win the election?
- Who wrote an anonymous text?
These questions cannot always be answered with certainty.
Instead we assign probabilities.
A probability expresses the degree of likelihood of an event.
A probability is always a number between 0 and 1:
0 → impossible
1 → certain
Example:
- Probability of rain tomorrow = 0.7
- Probability of rain tomorrow = 70%
Both mean the same thing.
2. Mathematical Definition
Probability theory describes the likelihood of events using a simple rule:
P(A) = (number of favorable outcomes) / (total number of possible outcomes)
Here:
- P(A) = probability of event A
Example:
When tossing a fair coin:
Total outcomes = 2 (Heads, Tails)
Probability of heads:
P(Heads) = 1 / 2 = 0.5
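The counting rule above can be sketched in a few lines of Python, using exact fractions to avoid rounding:

```python
from fractions import Fraction

def probability(favorable, total):
    """Classical probability: favorable outcomes over total possible outcomes."""
    return Fraction(favorable, total)

# Fair coin: one favorable outcome (heads) out of two possible outcomes.
p_heads = probability(1, 2)
print(p_heads)         # 1/2
print(float(p_heads))  # 0.5

# Fair die: one favorable outcome per face out of six.
p_three = probability(1, 6)
print(float(p_three))
```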
3. Key Concepts in Probabilistic Theory
Random Events
A random event is something whose outcome cannot be predicted with certainty.
Examples:
- tossing a coin
- rolling dice
- weather changes
In machine learning, many processes are treated as random events.
Sample Space
The sample space is the set of all possible outcomes.
Example:
Coin toss sample space:
{Heads, Tails}
Dice sample space:
{1,2,3,4,5,6}
Probability Distribution
A probability distribution shows how probabilities are assigned to possible outcomes.
Example for a fair die:
| outcome | probability |
|---|---|
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |
Probability distributions are essential in statistics and machine learning.
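The uniform die distribution can be sketched as a Python dictionary; a key property of any probability distribution is that its values sum to exactly 1:

```python
from fractions import Fraction

# Uniform distribution for a fair six-sided die.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid probability distribution must sum to exactly 1.
assert sum(die_distribution.values()) == 1

# Expected value of one roll: sum of outcome times probability.
expected = sum(face * p for face, p in die_distribution.items())
print(expected)  # 7/2, i.e. 3.5
```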
4. Conditional Probability
Often probabilities depend on additional information.
Example:
Probability of rain given that clouds are present.
This is called conditional probability.
It is represented mathematically as:
P(A|B) = P(A∩B) / P(B)
Here A∩B is the part of B where A also happens.
Worked example: if P(A∩B) = 0.30 and P(B) = 0.65, then P(A|B) = 0.30 / 0.65 ≈ 0.46.
This means:
Probability of A given B.
Example:
Probability that a person is sick given that they have symptoms.
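The rule can be sketched in Python, using the illustrative numbers from this section:

```python
def conditional_probability(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Illustrative values: P(A and B) = 0.30, P(B) = 0.65.
p = conditional_probability(0.30, 0.65)
print(round(p, 2))  # 0.46
```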
5. Bayes’ Theorem
One of the most important ideas in probabilistic theory is Bayes’ theorem, named after the 18th-century English minister and mathematician Thomas Bayes.
P(A|B) = P(B|A) P(A) / P(B)
Here P(A) is the prior, P(B|A) is the likelihood of the evidence, and P(B) is the total probability of the evidence, which can be expanded as P(B) = P(B|A)P(A) + P(B|¬A)P(¬A).
In words: posterior = useful evidence / total evidence.
Worked example: if P(B|A)P(A) = 0.17 and P(B) = 0.25, then P(A|B) = 0.17 / 0.25 = 0.68.
This theorem updates probabilities when new evidence appears.
It is widely used in:
- medical diagnosis
- spam detection
- machine learning
- authorship attribution
Example:
Suppose we want to know:
What is the probability that Jane Austen wrote a text given its stylistic features?
Bayesian reasoning calculates this probability using prior knowledge about Austen’s style.
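A minimal sketch of that Bayesian update in Python; all of the numbers below are hypothetical, chosen only to match the worked figures used in this section:

```python
def bayes(prior, likelihood, evidence):
    """Bayes' theorem: posterior = likelihood * prior / evidence."""
    return likelihood * prior / evidence

# Hypothetical values: prior belief that Austen wrote the text,
# probability of seeing these stylistic features if she did,
# and overall probability of the features across all candidates.
prior = 0.5        # P(Austen)
likelihood = 0.34  # P(features | Austen)
evidence = 0.25    # P(features)

posterior = bayes(prior, likelihood, evidence)
print(round(posterior, 2))  # 0.68
```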
6. Philosophical Meaning of Probability
Probability theory raises deep philosophical questions.
What does probability actually represent?
There are several interpretations.
Frequentist interpretation
Probability represents long-run frequency.
Example:
If we toss a coin many times, about half of the outcomes will be heads.
Thus probability of heads = 0.5.
This view dominates classical statistics.
Bayesian interpretation
Probability represents degree of belief.
Instead of frequency, probability measures how strongly we believe something is true based on evidence.
Example:
A doctor may say:
There is a 70% probability that the patient has a particular disease.
This interpretation is widely used in AI and machine learning.
7. Role of Probabilistic Theory in Machine Learning
Many machine-learning models are probabilistic.
They treat predictions as likelihood estimates rather than certainties.
Example:
A classifier might output:
| Author | Probability |
|---|---|
| Austen | 0.65 |
| Dickens | 0.25 |
| Eliot | 0.10 |
The model predicts Austen because her probability is highest.
This is exactly how Naive Bayes classifiers operate in stylometry.
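A toy sketch of the Naive Bayes idea follows; the word probabilities and priors are invented for illustration, and real systems estimate them from training texts with smoothing:

```python
import math

# Toy per-author word probabilities (hypothetical values).
word_probs = {
    "Austen":  {"the": 0.06, "she": 0.02, "whilst": 0.001},
    "Dickens": {"the": 0.05, "she": 0.01, "whilst": 0.002},
}
priors = {"Austen": 0.5, "Dickens": 0.5}

def naive_bayes_score(author, words):
    """Log prior plus summed log word likelihoods.

    Summing per-word terms is the 'naive' independence assumption.
    Unseen words get a tiny floor probability instead of zero.
    """
    score = math.log(priors[author])
    for w in words:
        score += math.log(word_probs[author].get(w, 1e-6))
    return score

text = ["the", "she", "the"]
best = max(priors, key=lambda a: naive_bayes_score(a, text))
print(best)  # Austen
```

Log probabilities are used because multiplying many small probabilities directly would underflow to zero on long texts.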
8. Probabilistic Theory in Language
Human language itself contains probabilistic structure.
For example:
After the word “the,” certain words are more likely to appear.
Examples of likely and unlikely continuations:
- the man (common)
- the sky (common)
- the running (rare)
Language models estimate these probabilities.
This principle is fundamental to modern natural language processing.
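A bigram model, the simplest version of this idea, can be sketched over a tiny toy corpus; real language models estimate these probabilities from vastly more data:

```python
from collections import Counter

# Tiny toy corpus; purely illustrative.
corpus = "the man saw the sky and the man ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "man"))  # 2/3: "the man" occurs twice after three "the"s
print(bigram_prob("the", "sky"))  # 1/3
```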
9. Probabilistic Thinking in Stylometry
In authorship attribution we cannot say with absolute certainty who wrote a text.
Instead we calculate probabilities of authorship.
Example:
| Author | Probability |
|---|---|
| Austen | 0.72 |
| Dickens | 0.20 |
| Eliot | 0.08 |
The algorithm chooses the author with the highest probability.
Thus probabilistic theory allows researchers to make informed predictions under uncertainty.
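The selection rule is simply an argmax over the author probabilities, as in this sketch using the table above:

```python
# Posterior probabilities from the table above.
author_probs = {"Austen": 0.72, "Dickens": 0.20, "Eliot": 0.08}

# Pick the author with the highest probability.
predicted = max(author_probs, key=author_probs.get)
print(predicted)                # Austen
print(author_probs[predicted])  # 0.72
```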
10. Importance in Artificial Intelligence
Modern AI systems rely heavily on probabilistic models.
Examples include:
- speech recognition
- machine translation
- recommendation systems
- language models
These systems do not “understand” language in a human sense.
Instead they compute probabilities of linguistic patterns.
Conclusion
Probabilistic theory provides a mathematical way to reason about uncertainty and likelihood.
It allows us to:
- quantify uncertainty
- update beliefs using evidence
- make predictions based on data
This framework lies at the heart of modern fields such as machine learning, artificial intelligence, and computational linguistics.
In stylometry, probabilistic theory helps scholars estimate which author most likely produced a given text, turning literary questions into measurable statistical problems.