Beta

The Beta distribution is a continuous probability distribution defined on the interval \([0, 1]\). It is most commonly used in Bayesian statistics to model uncertainty about a probability or a proportion (such as a click-through rate, a success rate, or a coin’s fairness).

Intuition:

  • While the Binomial distribution counts successes, the Beta distribution models the probability of success itself.

  • If you think of a probability \(p\) as a random variable, the Beta distribution describes which values of \(p\) are more or less likely based on prior knowledge or observed data.

What question does the Beta answer?

The Beta distribution answers the following core question:

Given that I have seen \(\alpha - 1\) successes and \(\beta - 1\) failures (or have equivalent prior belief), what is the probability distribution over the unknown parameter \(p \in [0, 1]\)?

Closed form (probability density function)

For \(x \in [0, 1]\) and shape parameters \(\alpha > 0\), \(\beta > 0\):

\[f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}\]

where the denominator \(B(\alpha, \beta)\) is the Beta function, which ensures the total area under the curve is 1.
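To make the normalization concrete, here is a plain-JavaScript sketch (outside WebPPL) that computes \(B(\alpha, \beta)\) and the density. It assumes integer shape parameters, where the identity \(B(a, b) = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}\) holds; the helper names `betaFn` and `betaPdf` are illustrative, not part of any library.

```javascript
// Assumption: integer shape parameters, so B(a, b) = (a-1)! (b-1)! / (a+b-1)!.
function factorial(n) {
  let r = 1;
  for (let i = 2; i <= n; i++) r *= i;
  return r;
}

function betaFn(a, b) {
  return (factorial(a - 1) * factorial(b - 1)) / factorial(a + b - 1);
}

// The closed-form PDF: x^(a-1) (1-x)^(b-1) / B(a, b)
function betaPdf(x, a, b) {
  return Math.pow(x, a - 1) * Math.pow(1 - x, b - 1) / betaFn(a, b);
}

console.log(betaFn(2, 5));        // 1!·4!/6! = 24/720 ≈ 0.0333
console.log(betaPdf(0.3, 2, 5));  // 30 · 0.3 · 0.7^4 ≈ 2.1609
```

The division by \(B(\alpha, \beta)\) is what rescales the unnormalized curve \(x^{\alpha-1}(1-x)^{\beta-1}\) so it integrates to 1.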

Intuition (Alpha and Beta as “pseudo-counts”)

A very helpful way to think about the parameters \(\alpha\) (alpha) and \(\beta\) (beta) is as “counts” of observations:

  • \(\alpha - 1\): The number of “successes” previously seen.

  • \(\beta - 1\): The number of “failures” previously seen.

For example:

  • Beta({a: 1, b: 1}) is a Uniform distribution (0 successes, 0 failures; every value of \(p\) is equally likely).

  • Beta({a: 10, b: 10}) is a symmetric bell-shaped curve centered at \(0.5\).

  • Beta({a: 2, b: 8}) is skewed toward low values (more failures than successes).
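The pseudo-count reading also gives the mean directly: \(E[p] = \frac{\alpha}{\alpha + \beta}\). A quick plain-JavaScript check of the three examples above (the helper `betaMean` is illustrative):

```javascript
// Mean of Beta(a, b) is a / (a + b).
const betaMean = (a, b) => a / (a + b);

console.log(betaMean(1, 1));   // 0.5 (Uniform: centered)
console.log(betaMean(10, 10)); // 0.5 (symmetric bell around 0.5)
console.log(betaMean(2, 8));   // 0.2 (skewed toward low values)
```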

Constructor

Beta({a: ..., b: ...})

  • a: shape parameter 1 (alpha), real number > 0

  • b: shape parameter 2 (beta), real number > 0

  • support: real numbers in [0, 1]


Relationship to Binomial (Conjugacy)

The Beta distribution is the conjugate prior for the Binomial likelihood. This is the core of simple Bayesian updating:

  1. Start with a Prior: \(p \sim Beta(\alpha, \beta)\)

  2. Observe Data: Observe \(k\) successes and \(n-k\) failures in \(n\) Binomial trials.

  3. Compute Posterior: The updated distribution for \(p\) is exactly \(Beta(\alpha + k, \beta + n - k)\).
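Because the update is just addition of counts, it can be sketched in a few lines of plain JavaScript (the function name `betaBinomialUpdate` is illustrative):

```javascript
// Conjugate update: Beta(a, b) prior + (k successes in n Binomial trials)
// gives the posterior Beta(a + k, b + n - k).
function betaBinomialUpdate(prior, k, n) {
  return { a: prior.a + k, b: prior.b + (n - k) };
}

// Uniform prior Beta(1, 1), then 8 heads out of 10 tosses:
const posterior = betaBinomialUpdate({ a: 1, b: 1 }, 8, 10);
console.log(posterior);                                 // { a: 9, b: 3 }
console.log(posterior.a / (posterior.a + posterior.b)); // exact posterior mean 0.75
```

The exact posterior mean \(9/12 = 0.75\) agrees closely with the MCMC estimate (about 0.746) in the Bayesian updating example later on this page.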

Typical use cases

  • Modeling uncertainty about a success probability \(p\).

  • Serving as a prior distribution for Bernoulli or Binomial observation models.

  • Representing any continuous random variable that is strictly bounded between 0 and 1.

Executable example: basics (samples and score)

// Define a Beta distribution with alpha=2, beta=5
var d = Beta({a: 2, b: 5});

var s = sample(d);
display("Sample from Beta(2, 5): " + s);

var logDensity = d.score(0.3);
display("Log density at 0.3: " + logDensity);
display("Density at 0.3: " + Math.exp(logDensity));
Sample from Beta(2, 5): 0.24728983708004426
Log density at 0.3: 0.7705248015812884
Density at 0.3: 2.160899999999997

Scoring

d.score(x) is the log density of the value x in [0, 1].

Because the Beta is a continuous distribution, score(x) represents the log of the Probability Density Function (PDF), not a discrete probability.

\[d.score(x) = (\alpha - 1) \log x + (\beta - 1) \log (1 - x) - \log B(\alpha, \beta)\]
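As a sanity check on that formula, here is a plain-JavaScript sketch of the log density (the helper names `logFactorial`, `logBeta`, and `betaScore` are illustrative). It again assumes integer shape parameters so that \(\log B(\alpha, \beta)\) can be computed from log-factorials:

```javascript
// Sum of logs instead of a factorial product, to stay numerically stable.
function logFactorial(n) {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

// log B(a, b) = log (a-1)! + log (b-1)! - log (a+b-1)!  (integer a, b)
function logBeta(a, b) {
  return logFactorial(a - 1) + logFactorial(b - 1) - logFactorial(a + b - 1);
}

// (a-1) log x + (b-1) log(1-x) - log B(a, b), matching the formula above.
function betaScore(x, a, b) {
  return (a - 1) * Math.log(x) + (b - 1) * Math.log(1 - x) - logBeta(a, b);
}

console.log(betaScore(0.3, 2, 5));           // ~0.77052 (log density)
console.log(Math.exp(betaScore(0.3, 2, 5))); // ~2.1609  (density)
```

These values match the `d.score(0.3)` output in the basics example above.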

Executable example: Bayesian updating

Story: We start with a “Flat” (Uniform) prior because we know nothing about a coin. We then observe 8 heads out of 10 tosses. We use observe to see how our belief about the coin’s fairness updates.

var model = function() {
  // Prior: We know nothing, so we use Beta(1, 1) which is Uniform(0, 1)
  var p = sample(Beta({a: 1, b: 1}));

  // Data: We observe 8 successes (heads) out of 10 trials
  observe(Binomial({p: p, n: 10}), 8);

  return p;
};

var posterior = Infer({model: model, method: 'MCMC', samples: 5000});

display("Posterior Expected value of p: " + expectation(posterior));
Posterior Expected value of p: 0.746026880369335