Beta

The Beta distribution is a continuous probability distribution defined on the interval \([0, 1]\). It is most commonly used in Bayesian statistics to model uncertainty about a probability or a proportion (such as a click-through rate, a success rate, or a coin’s fairness).

Intuition:

  • While the Binomial distribution counts successes, the Beta distribution models the probability of success itself.

  • If you think of a probability \(p\) as a random variable, the Beta distribution describes which values of \(p\) are more or less likely based on prior knowledge or observed data.

What question does the Beta answer?

The Beta distribution answers the following core question:

Given that I have seen \(\alpha - 1\) successes and \(\beta - 1\) failures (or have equivalent prior belief), what is the probability distribution over the unknown parameter \(p \in [0, 1]\)?

Closed form (probability density function)

For \(x \in [0, 1]\) and shape parameters \(\alpha > 0\), \(\beta > 0\):

\[f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}\]

where the denominator \(B(\alpha, \beta)\) is the Beta function, which ensures the total area under the curve is 1.
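To make the normalization concrete, here is a plain-JavaScript sketch (outside WebPPL) that computes \(B(\alpha, \beta)\) and the density. It assumes integer shape parameters, where the identity \(B(a, b) = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}\) holds; the helper names `betaFn` and `betaPdf` are illustrative, not part of any library.

```javascript
// Assumption: integer shape parameters, so B(a, b) = (a-1)! (b-1)! / (a+b-1)!.
function factorial(n) {
  let r = 1;
  for (let i = 2; i <= n; i++) r *= i;
  return r;
}

function betaFn(a, b) {
  return (factorial(a - 1) * factorial(b - 1)) / factorial(a + b - 1);
}

// The closed-form PDF: x^(a-1) (1-x)^(b-1) / B(a, b)
function betaPdf(x, a, b) {
  return Math.pow(x, a - 1) * Math.pow(1 - x, b - 1) / betaFn(a, b);
}

console.log(betaFn(2, 5));        // 1!·4!/6! = 24/720 ≈ 0.0333
console.log(betaPdf(0.3, 2, 5));  // 30 · 0.3 · 0.7^4 ≈ 2.1609
```

The division by \(B(\alpha, \beta)\) is what rescales the unnormalized curve \(x^{\alpha-1}(1-x)^{\beta-1}\) so it integrates to 1.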

Intuition (Alpha and Beta as “pseudo-counts”)

A very helpful way to think about the parameters \(\alpha\) (alpha) and \(\beta\) (beta) is as “counts” of observations:

  • \(\alpha - 1\): The number of “successes” previously seen.

  • \(\beta - 1\): The number of “failures” previously seen.

For example:

  • Beta({a: 1, b: 1}) is a Uniform distribution (0 successes, 0 failures; every value of \(p\) is equally likely).

  • Beta({a: 10, b: 10}) is a symmetric bell-shaped curve centered at \(0.5\).

  • Beta({a: 2, b: 8}) is skewed toward low values (more failures than successes).
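The pseudo-count reading also gives the mean directly: \(E[p] = \frac{\alpha}{\alpha + \beta}\). A quick plain-JavaScript check of the three examples above (the helper `betaMean` is illustrative):

```javascript
// Mean of Beta(a, b) is a / (a + b).
const betaMean = (a, b) => a / (a + b);

console.log(betaMean(1, 1));   // 0.5 (Uniform: centered)
console.log(betaMean(10, 10)); // 0.5 (symmetric bell around 0.5)
console.log(betaMean(2, 8));   // 0.2 (skewed toward low values)
```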

Constructor

Beta({a: ..., b: ...})

  • a: shape parameter 1 (alpha), real number > 0

  • b: shape parameter 2 (beta), real number > 0

  • support: real numbers in [0, 1]


Relationship to Binomial (Conjugacy)

The Beta distribution is the conjugate prior for the Binomial likelihood. This is the core of simple Bayesian updating:

  1. Start with a Prior: \(p \sim Beta(\alpha, \beta)\)

  2. Observe Data: Observe \(k\) successes and \(n-k\) failures in \(n\) Binomial trials.

  3. Compute Posterior: The updated distribution for \(p\) is exactly \(Beta(\alpha + k, \beta + n - k)\).
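Because the update is just addition of counts, it can be sketched in a few lines of plain JavaScript (the function name `betaBinomialUpdate` is illustrative):

```javascript
// Conjugate update: Beta(a, b) prior + (k successes in n Binomial trials)
// gives the posterior Beta(a + k, b + n - k).
function betaBinomialUpdate(prior, k, n) {
  return { a: prior.a + k, b: prior.b + (n - k) };
}

// Uniform prior Beta(1, 1), then 8 heads out of 10 tosses:
const posterior = betaBinomialUpdate({ a: 1, b: 1 }, 8, 10);
console.log(posterior);                                 // { a: 9, b: 3 }
console.log(posterior.a / (posterior.a + posterior.b)); // exact posterior mean 0.75
```

The exact posterior mean \(9/12 = 0.75\) agrees closely with the MCMC estimate (about 0.746) in the Bayesian updating example later on this page.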

Typical use cases

  • Modeling uncertainty about a success probability \(p\).

  • Serving as a prior distribution for Bernoulli or Binomial observation models.

  • Representing any continuous random variable that is strictly bounded between 0 and 1.

Executable example: basics (samples and score)

// Define a Beta distribution with alpha=2, beta=5
var d = Beta({a: 2, b: 5});

var s = sample(d);
display("Sample from Beta(2, 5): " + s);

var logDensity = d.score(0.3);
display("Log density at 0.3: " + logDensity);
display("Density at 0.3: " + Math.exp(logDensity));
Sample from Beta(2, 5): 0.24728983708004426
Log density at 0.3: 0.7705248015812884
Density at 0.3: 2.160899999999997

Scoring

d.score(x) is the log density of the value x in [0, 1].

Because the Beta is a continuous distribution, score(x) represents the log of the Probability Density Function (PDF), not a discrete probability.

\[d.score(x) = (\alpha - 1) \log x + (\beta - 1) \log (1 - x) - \log B(\alpha, \beta)\]
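As a sanity check on that formula, here is a plain-JavaScript sketch of the log density (the helper names `logFactorial`, `logBeta`, and `betaScore` are illustrative). It again assumes integer shape parameters so that \(\log B(\alpha, \beta)\) can be computed from log-factorials:

```javascript
// Sum of logs instead of a factorial product, to stay numerically stable.
function logFactorial(n) {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

// log B(a, b) = log (a-1)! + log (b-1)! - log (a+b-1)!  (integer a, b)
function logBeta(a, b) {
  return logFactorial(a - 1) + logFactorial(b - 1) - logFactorial(a + b - 1);
}

// (a-1) log x + (b-1) log(1-x) - log B(a, b), matching the formula above.
function betaScore(x, a, b) {
  return (a - 1) * Math.log(x) + (b - 1) * Math.log(1 - x) - logBeta(a, b);
}

console.log(betaScore(0.3, 2, 5));           // ~0.77052 (log density)
console.log(Math.exp(betaScore(0.3, 2, 5))); // ~2.1609  (density)
```

These values match the `d.score(0.3)` output in the basics example above.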

Executable example: Bayesian updating

Story: We start with a “Flat” (Uniform) prior because we know nothing about a coin. We then observe 8 heads out of 10 tosses. We use observe to see how our belief about the coin’s fairness updates.

var model = function() {
  // Prior: We know nothing, so we use Beta(1, 1) which is Uniform(0, 1)
  var p = sample(Beta({a: 1, b: 1}));

  // Data: We observe 8 successes (heads) out of 10 trials
  observe(Binomial({p: p, n: 10}), 8);

  return p;
};

var posterior = Infer({model: model, method: 'MCMC', samples: 5000});

display("Posterior Expected value of p: " + expectation(posterior));
Posterior Expected value of p: 0.746026880369335