What is Probability Theory?

Probability theory is a mathematical framework used to measure and reason about uncertainty. Instead of saying something is absolutely true or false, probability theory allows us to say how likely it is to occur.

It is fundamental in many disciplines, including:

  • statistics
  • artificial intelligence
  • linguistics
  • economics
  • physics
  • machine learning

Researchers in computational stylometry such as Moshe Koppel and Patrick Juola rely heavily on probabilistic reasoning when identifying authorship.


1. Basic Idea of Probability Theory

In everyday reasoning we often deal with uncertain events.

For example:

  • Will it rain tomorrow?
  • Which candidate will win the election?
  • Who wrote an anonymous text?

These questions cannot always be answered with certainty.

Instead we assign probabilities.

A probability expresses the degree of likelihood of an event.

A probability is always a number between 0 and 1:

0 → impossible
1 → certain

Example:

  • Probability of rain tomorrow = 0.7
  • Probability of rain tomorrow = 70%

Both mean the same thing.


2. Mathematical Definition

For experiments whose outcomes are equally likely, probability theory describes the likelihood of events using a simple counting rule:

P(A)=\frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}}

Here:

  • P(A) = probability of event A

Example:

When tossing a fair coin:

Total outcomes = 2 (Heads, Tails)

Probability of heads:

P(Heads) = 1 / 2 = 0.5
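The counting rule above can be sketched in a few lines of Python (a minimal illustration of the favorable-over-total rule, not a full probability library):

```python
from fractions import Fraction

def probability(favorable, sample_space):
    """Probability of an event by counting: favorable outcomes / total outcomes."""
    return Fraction(len(favorable), len(sample_space))

coin = {"Heads", "Tails"}
print(probability({"Heads"}, coin))   # 1/2

die = {1, 2, 3, 4, 5, 6}
print(probability({2, 4, 6}, die))    # probability of rolling an even number: 1/2
```

Using exact fractions avoids rounding issues that creep in with floating-point arithmetic.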


3. Key Concepts in Probability Theory

Random Events

A random event is something whose outcome cannot be predicted with certainty.

Examples:

  • tossing a coin
  • rolling dice
  • weather changes

In machine learning, many processes are treated as random events.


Sample Space

The sample space is the set of all possible outcomes.

Example:

Coin toss sample space:

{Heads, Tails}

Die-roll sample space:

{1,2,3,4,5,6}


Probability Distribution

A probability distribution shows how probabilities are assigned to possible outcomes.

Example for a fair die:

  Outcome   Probability
  1         1/6
  2         1/6
  3         1/6
  4         1/6
  5         1/6
  6         1/6

Probability distributions are essential in statistics and machine learning.
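A discrete distribution like the fair-die table can be represented as a simple mapping from outcomes to probabilities; one useful sanity check is that the probabilities sum to 1:

```python
from fractions import Fraction

# Fair-die distribution: each of the six outcomes has probability 1/6.
die_distribution = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# A valid probability distribution must sum to exactly 1.
assert sum(die_distribution.values()) == 1

print(die_distribution[3])  # 1/6
```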


4. Conditional Probability

Often probabilities depend on additional information.

Example:

Probability of rain given that clouds are present.

This is called conditional probability.

It is represented mathematically as:

P(A\mid B)=\frac{P(A\cap B)}{P(B)}

Here:

  • P(B) = probability of the conditioning event B
  • P(A\cap B) = probability that A and B both occur (A∩B is the part of B where A also happens)

Worked example: if P(B) = 0.65 and P(A\cap B) = 0.30, then P(A\mid B) = 0.30 / 0.65 ≈ 0.46.

This means:

Probability of A given B.

Example:

Probability that a person is sick given that they have symptoms.
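A minimal numeric sketch of the conditional-probability rule, using the illustrative values P(B) = 0.65 and P(A∩B) = 0.30:

```python
def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Illustrative values: P(B) = 0.65, P(A and B) = 0.30
print(round(conditional(0.30, 0.65), 2))  # 0.46
```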


5. Bayes’ Theorem

One of the most important ideas in probability theory is Bayes’ theorem, named after the eighteenth-century English minister and mathematician Thomas Bayes.

P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}

Here:

  • P(A) = prior probability of A
  • P(B\mid A) = probability of observing the evidence B if A is true
  • P(B\mid \neg A) = probability of observing B if A is false

Intuitively, the posterior is the useful evidence divided by the total evidence. Worked example: if P(B\mid A)P(A) = 0.17 and P(B) = 0.25, then P(A\mid B) = 0.17 / 0.25 = 0.68.

This theorem updates probabilities when new evidence appears.

It is widely used in:

  • medical diagnosis
  • spam detection
  • machine learning
  • authorship attribution
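The update rule can be sketched in Python. The numbers below are illustrative assumptions (a prior of 0.2, a likelihood of 0.85 if A is true and 0.1 if A is false), with P(B) expanded by the law of total probability:

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return (p_b_given_a * prior) / p_b

# Illustrative values: P(A) = 0.2, P(B|A) = 0.85, P(B|not A) = 0.1
posterior = bayes_posterior(0.2, 0.85, 0.1)
print(round(posterior, 2))  # 0.68
```

Note how weak evidence for the alternative (P(B|¬A) = 0.1) pushes the posterior well above the 0.2 prior.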

Example:

Suppose we want to know:

What is the probability that Jane Austen wrote a text given its stylistic features?

Bayesian reasoning calculates this probability using prior knowledge about Austen’s style.


6. Philosophical Meaning of Probability

Probability theory raises deep philosophical questions.

What does probability actually represent?

There are several interpretations.


Frequentist interpretation

Probability represents long-run frequency.

Example:

If we toss a coin many times, about half of the outcomes will be heads.

Thus probability of heads = 0.5.

This view dominates classical statistics.
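The long-run-frequency idea is easy to demonstrate with a simulation (a small sketch using Python's standard `random` module):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Simulate many fair coin flips and measure the fraction of heads.
flips = [random.choice(["Heads", "Tails"]) for _ in range(100_000)]
frequency = flips.count("Heads") / len(flips)

print(frequency)  # approaches 0.5 as the number of flips grows
```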


Bayesian interpretation

Probability represents degree of belief.

Instead of frequency, probability measures how strongly we believe something is true based on evidence.

Example:

A doctor may say:

There is a 70% probability that the patient has a particular disease.

This interpretation is widely used in AI and machine learning.


7. Role of Probability Theory in Machine Learning

Many machine-learning models are probabilistic.

They treat predictions as likelihood estimates rather than certainties.

Example:

A classifier might output:

  Author    Probability
  Austen    0.65
  Dickens   0.25
  Eliot     0.10

The model predicts Austen because her probability is highest.

This is exactly how Naive Bayes classifiers operate in stylometry.
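A toy version of such a classifier can be sketched as follows. The training "texts" and word counts are invented for illustration, and real stylometric systems use far richer features; this sketch uses add-one (Laplace) smoothing so unseen words do not zero out an author's score:

```python
import math
from collections import Counter

def train(texts_by_author):
    """Estimate per-author word log-probabilities with add-one smoothing."""
    vocab = {w for words in texts_by_author.values() for w in words}
    models = {}
    for author, words in texts_by_author.items():
        counts = Counter(words)
        total = len(words) + len(vocab)  # add-one smoothing denominator
        models[author] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return models

def classify(models, words):
    """Pick the author whose model assigns the text the highest log-probability."""
    scores = {
        author: sum(logp[w] for w in words if w in logp)
        for author, logp in models.items()
    }
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only
models = train({
    "Austen": "ball manners ball delight".split(),
    "Dickens": "fog street fog debt".split(),
})
print(classify(models, "ball manners".split()))  # Austen
```

Summing log-probabilities instead of multiplying raw probabilities is the standard trick to avoid numerical underflow on longer texts.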


8. Probability Theory in Language

Human language itself contains probabilistic structure.

For example:

After the word “the,” certain words are more likely to appear.

Example probabilities:

  • the man (common)
  • the sky (common)
  • the running (rare)

Language models estimate these probabilities.
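These conditional word probabilities can be estimated directly from counts. A minimal bigram sketch over a toy corpus (the sentence is invented for illustration):

```python
from collections import Counter

corpus = "the man saw the sky above the man".split()

# Count which words follow "the" in the corpus.
followers = Counter(nxt for word, nxt in zip(corpus, corpus[1:]) if word == "the")
total = sum(followers.values())
probs = {word: count / total for word, count in followers.items()}

print(probs["man"])  # 2/3 in this toy corpus: "man" follows "the" twice out of three times
```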

This principle is fundamental to modern natural language processing.


9. Probabilistic Thinking in Stylometry

In authorship attribution we cannot say with absolute certainty who wrote a text.

Instead we calculate probabilities of authorship.

Example:

  Author    Probability
  Austen    0.72
  Dickens   0.20
  Eliot     0.08

The algorithm chooses the author with the highest probability.
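That selection step is just an argmax over the estimated probabilities (numbers taken from the table above):

```python
# Estimated authorship probabilities for a disputed text
posteriors = {"Austen": 0.72, "Dickens": 0.20, "Eliot": 0.08}

# Choose the author with the highest probability
predicted = max(posteriors, key=posteriors.get)
print(predicted)  # Austen
```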

Thus probability theory allows researchers to make informed predictions under uncertainty.


10. Importance in Artificial Intelligence

Modern AI systems rely heavily on probabilistic models.

Examples include:

  • speech recognition
  • machine translation
  • recommendation systems
  • language models

These systems do not “understand” language in a human sense.

Instead they compute probabilities of linguistic patterns.


Conclusion

Probability theory provides a mathematical way to reason about uncertainty and likelihood.

It allows us to:

  • quantify uncertainty
  • update beliefs using evidence
  • make predictions based on data

This framework lies at the heart of modern fields such as machine learning, artificial intelligence, and computational linguistics.

In stylometry, probability theory helps scholars estimate which author most likely produced a given text, turning literary questions into measurable statistical problems.