# Probability distributions
Every probability distribution is a different answer to the question: how is uncertainty structured? Choose a distribution, adjust its parameters, and watch the theoretical curve alongside a histogram of random samples. Increase the sample size and watch the histogram converge to the theory — the law of large numbers, made visible.
| Statistic | Mean | Variance | Skewness | Kurtosis |
|---|---|---|---|---|
| Theoretical | — | — | — | — |
| Sample | — | — | — | — |
## What is a probability distribution?
A probability distribution is the shape of randomness. It tells you how likely each possible outcome is. Roll a fair die and every face is equally probable — that is the discrete uniform distribution. Measure the heights of a thousand people and they cluster around an average, tailing off symmetrically — that is the normal distribution. Every distribution is a different answer to the question: how is uncertainty structured?
Mathematically, a continuous distribution is described by a probability density function (PDF) whose integral over all values equals one. A discrete distribution uses a probability mass function (PMF) whose values sum to one. The shape of these functions encodes everything: where values cluster, how far they spread, whether they lean left or right, and how heavy the tails are.
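Both normalization conditions are easy to check numerically. A minimal sketch, using the standard normal PDF and a Poisson PMF with rate 4 (both distributions chosen arbitrarily for illustration):

```python
import math
import numpy as np

# Standard normal PDF, integrated numerically over a wide interval.
x = np.linspace(-10.0, 10.0, 20_001)
pdf = np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)
area = float(np.sum(pdf) * (x[1] - x[0]))  # simple Riemann sum

# Poisson(4) PMF, summed over enough terms to capture essentially all the mass.
lam = 4.0
total = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(100))

print(f"PDF integral = {area:.6f}")  # close to 1
print(f"PMF sum      = {total:.6f}")  # close to 1
```

The integral is approximated over [-10, 10] rather than the whole real line; the mass outside that range is negligible.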
## The Central Limit Theorem
Why does the normal distribution appear everywhere? The central limit theorem provides the answer. Take any distribution — exponential, uniform, Poisson, anything with finite variance — and draw repeated samples. The distribution of the sample means will converge to a normal distribution as the sample size grows, regardless of the shape of the underlying distribution.
This is why measurement errors are normally distributed, why exam scores form bell curves, and why so much of statistics is built around the normal distribution. It is not that the world is inherently Gaussian — it is that averages are, and averages are what we often measure.
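The theorem is easy to see in simulation. A sketch (standard library only) drawing repeated sample means from an exponential distribution, which is strongly right-skewed; the sample size, repetition count, and seed are arbitrary:

```python
import random
import statistics

random.seed(0)  # arbitrary seed, for reproducibility

# Exponential(rate=1): mean 1, standard deviation 1, heavily right-skewed.
n, reps = 50, 10_000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# The sample means cluster around the true mean (1), with spread sigma / sqrt(n).
print(statistics.fmean(means))  # close to 1.0
print(statistics.stdev(means))  # close to 1/sqrt(50), about 0.141
```

Plotting a histogram of `means` would show the familiar bell shape, even though the underlying exponential samples look nothing like one.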
## Continuous vs. Discrete
Continuous distributions model quantities that can take any value in an interval: temperature, time, height. Their PDF gives the density at each point — the probability of falling in a small interval is roughly the density times the interval width. The area under the entire PDF curve is exactly one.
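The "density times width" reading can be made concrete. A sketch for the standard normal, with the interval endpoint and width chosen arbitrarily:

```python
import math

def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, w = 0.5, 0.01  # arbitrary small interval [a, a + w]
exact = normal_cdf(a + w) - normal_cdf(a)  # true probability of landing in the interval
approx = normal_pdf(a) * w                 # density times width
print(exact, approx)  # the two agree to several decimal places
```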
Discrete distributions model countable outcomes: the number of defects in a batch, goals in a football match, customers arriving in an hour. Their PMF gives the exact probability of each value, and all probabilities sum to one. The Poisson, binomial, and geometric distributions are the workhorses here.
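A quick check that a PMF matches empirical frequencies, using the defects example. This sketch models defects per batch as Binomial(20, 0.1); the batch size and defect rate are made-up numbers:

```python
import math
import random

random.seed(1)  # arbitrary seed
n, p, reps = 20, 0.1, 50_000  # hypothetical batch size and defect rate

# Simulate the number of defective items in each batch.
counts = [sum(random.random() < p for _ in range(n)) for _ in range(reps)]

# Compare empirical frequencies against the binomial PMF.
for k in range(5):
    pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    freq = counts.count(k) / reps
    print(k, round(pmf, 4), round(freq, 4))
```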
## Choosing the Right Distribution
- **Normal** — when values cluster symmetrically around a mean, especially for sums or averages (heights, measurement errors, stock returns over short periods).
- **Exponential** — for waiting times between independent events (time between customer arrivals, radioactive decay).
- **Poisson** — for counts of rare, independent events in a fixed interval (typos per page, meteor strikes per year).
- **Beta** — for modeling probabilities or proportions bounded between 0 and 1 (Bayesian priors, batting averages).
- **Gamma** — for positive continuous values with a right skew (insurance claims, rainfall).
- **Log-Normal** — when the logarithm of the variable is normally distributed (income, stock prices, city sizes).
- **Binomial** — for the number of successes in a fixed number of trials (coin flips, defective items).
- **Geometric** — for the number of trials until the first success.
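The log-normal's defining property is easy to verify in simulation: exponentiate normal draws, and taking logs recovers the normal parameters. The parameters and seed below are arbitrary:

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed
mu, sigma = 10.0, 0.5  # arbitrary parameters of log(X)

# Log-normal samples: exponentiate draws from Normal(mu, sigma).
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]
logs = [math.log(s) for s in samples]

print(statistics.fmean(logs))  # close to mu = 10.0
print(statistics.stdev(logs))  # close to sigma = 0.5

# The raw samples are right-skewed: the mean sits above the median.
print(statistics.fmean(samples) > statistics.median(samples))  # True
```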
## Convergence
Move the sample size slider to the right and watch the histogram approach the theoretical curve. With 100 samples, the histogram is rough and jagged. With 1,000 it starts to smooth out. With 100,000 it hugs the theoretical curve almost exactly. This is the law of large numbers made visible: as you collect more data, the empirical distribution converges to the true distribution.
The rate of convergence depends on the distribution. Heavy-tailed distributions like the log-normal need more samples before the histogram settles down. Distributions with bounded support, like the beta or uniform, converge quickly. Try it — switch between distributions with the sample size set high, and watch how fast each one converges.
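Convergence can be quantified with the Kolmogorov–Smirnov distance, the largest gap between the empirical CDF and the true CDF. A sketch for Uniform(0, 1), whose CDF is simply F(x) = x; the sample sizes follow the text and the seed is arbitrary:

```python
import random

def ks_distance_uniform(samples: list[float]) -> float:
    """Largest gap between the empirical CDF and the Uniform(0, 1) CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(3)  # arbitrary seed
gaps = {}
for n in (100, 1_000, 100_000):
    gaps[n] = ks_distance_uniform([random.random() for _ in range(n)])
    print(n, round(gaps[n], 4))  # the gap shrinks roughly like 1/sqrt(n)
```

The same measurement applied to a heavy-tailed distribution would show larger gaps at each sample size, matching the behavior described above.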