Sampling and scoring

This page is the “root” reference for WebPPL’s two most important building blocks:

  • sample(dist[, opts]): create a random choice by drawing a value from a distribution object

  • dist.score(value): compute the log probability / log density assigned to a value

If you are ever unsure about syntax, shapes, or what an option does, come back here first.

Quick glossary (read this once)

distribution object

An object representing a probability distribution, such as Bernoulli({p: 0.7}) or Gaussian({mu: 0, sigma: 1}). Distribution objects support at least:

  • sampling: via sample(dist)

  • scoring: via dist.score(value)

random choice (sample site)

A place in your program where sample(...) is called. During inference, WebPPL treats each sample site as a stochastic “choice” whose value can be explored.

log probability / log density

WebPPL uses natural log values (base e). For discrete distributions, score returns log P(X = value). For continuous distributions, it returns a log density (not a probability).

inference

The process of turning a stochastic program (a model) into a distribution over its return values, usually approximately (e.g. via MCMC or SMC).

guide distribution

An auxiliary distribution used by some inference strategies as a proposal / approximation. It does not change the model itself; it changes how inference explores it.

drift kernel

A proposal mechanism for MCMC (MH-based) methods. It proposes new values based on the previous value at a sample site.

Distribution objects in one minute

A distribution object has two principal uses:

  1. Draw samples from it using sample(dist).

  2. Compute the (natural) log probability / density of a value using dist.score(value).

See also: the Distributions overview page.

Sampling: sample(dist[, opts])

Basic form

Use sample(dist) to draw one value from a distribution object.

  • For Bernoulli({p: ...}) the result is a boolean (true/false).

  • For continuous distributions like Gaussian(...) the result is a number.

Working example: one sample + scoring the sample

var d = Bernoulli({p: 0.7});
var x = sample(d);
var out = {
  sample: x,
  logp_of_sample: d.score(x),
  logp_true: d.score(true),
  logp_false: d.score(false)
};

out;
{
  sample: true,
  logp_of_sample: -0.35667494393873245,
  logp_true: -0.35667494393873245,
  logp_false: -1.203972804325936
}

Scoring: dist.score(value)

dist.score(value) returns the natural log of the probability (discrete) or density (continuous) that dist assigns to value.

Two practical notes:

  • Log space is used because probabilities can get extremely small.

  • If you ever need the probability (discrete), you can convert with Math.exp(logp).

For Bernoulli in particular:

  • score(true)  = log(p)

  • score(false) = log(1 - p)

(See also the Bernoulli page.)

From log probability to probability

WebPPL’s score returns values in log space (natural log). To convert a single log probability back to an ordinary probability:

  • p = Math.exp(logp)

Example (Bernoulli)

var d = Bernoulli({p: 0.7});

var logpTrue = d.score(true);   // log(0.7)
var logpFalse = d.score(false); // log(0.3)

var out = {
  logpTrue: logpTrue,
  pTrue: Math.exp(logpTrue),

  logpFalse: logpFalse,
  pFalse: Math.exp(logpFalse),

  checkSum: Math.exp(logpTrue) + Math.exp(logpFalse)
};

out;
{
  logpTrue: -0.35667494393873245,
  pTrue: 0.7,
  logpFalse: -1.203972804325936,
  pFalse: 0.30000000000000004,
  checkSum: 1
}

Normalizing a set of log scores (stable softmax)

When you have several log scores (e.g. for multiple outcomes), converting each with Math.exp(logp) and then normalizing can underflow to zero when the log scores are very negative.

A numerically stable pattern is:

  1. subtract the maximum log score

  2. exponentiate

  3. normalize

// Normalize log-scores stably using the "subtract max" trick (log-sum-exp style).

var normalizeLogProbs = function(logps) {
  var m = reduce(function(a, b) { return a > b ? a : b; }, -Infinity, logps);
  var shifted = map(function(x) { return x - m; }, logps);
  var ws = map(Math.exp, shifted);
  var z = sum(ws);
  return map(function(w) { return w / z; }, ws);
};

// A small example with Categorical-like weights
var logps = [Math.log(1), Math.log(2), Math.log(7)];
var ps = normalizeLogProbs(logps);

var out = {
  logps: logps,
  probs: ps,
  sum: sum(ps)
};

out;
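The same pattern can be sanity-checked outside WebPPL. Below is a plain-JavaScript sketch of the subtract-max trick (the function name normalizeLogProbs is ours, mirroring the example above), applied to log scores so negative that naive exponentiation underflows to zero:

```javascript
// Plain-JavaScript version of the "subtract max" normalization.
// With very negative log scores, naive exponentiation underflows to 0,
// but the shifted version still recovers the correct probabilities.
function normalizeLogProbs(logps) {
  const m = Math.max(...logps);
  const ws = logps.map(function (x) { return Math.exp(x - m); });
  const z = ws.reduce(function (a, b) { return a + b; }, 0);
  return ws.map(function (w) { return w / z; });
}

const logps = [-1000, -1001, -1002];
const naive = logps.map(function (x) { return Math.exp(x); });
const stable = normalizeLogProbs(logps);

console.log(naive);   // [0, 0, 0] -- all information lost to underflow
console.log(stable);  // proportional to [1, e^-1, e^-2]
```

The stable result is proportional to [1, e^-1, e^-2] and sums to 1, even though none of the raw probabilities is representable as a double.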

The optional second argument to sample

sample also accepts an optional second argument opts:

  • sample(dist, {guide: ...})

  • sample(dist, {driftKernel: ...})

These are inference controls: they affect how inference proposes values at a sample site, but they do not change the intended target distribution of the model.

Guide distributions

Definition (what is a guide?)

A guide distribution is an auxiliary distribution that some inference strategies can use instead of sampling directly from the model’s distribution at a sample site.

Syntax

A guide distribution is specified like this:

sample(dist, {guide: function() { return guideDist; }})

Where guideDist is another distribution object (e.g. a Gaussian with different parameters).

When does it matter?

It matters only when the inference method is told to use guides. For example, forward sampling accepts an option guide: true that draws each random choice from its guide instead of from the model's distribution. This is useful for debugging and for seeing a guide's effect directly.

Working example: forward sampling from model vs from guide

var model = function() {
  return sample(Gaussian({mu: 0, sigma: 1}), {
    guide: function() {
      return Gaussian({mu: 2, sigma: 1});
    }
  });
};

// One forward sample (as a distribution with 1 particle), then take that sample.
var oneForward = function(useGuide) {
  return Infer({method: 'forward', samples: 1, guide: useGuide, model: model}).sample();
};

var out = {
  fromModel: repeat(5, function() { return oneForward(false); }),
  fromGuide: repeat(5, function() { return oneForward(true); })
};

out;
{
  fromModel: [
    0.11924522582887023,
    0.38808096549942767,
    0.7184860219660659,
    0.22119172616379196,
    -1.390182678896135
  ],
  fromGuide: [
    1.6537032695322873,
    2.6175423879712083,
    2.022255309098623,
    2.816055576307594,
    2.033318965474878
  ]
}

Drift kernels

Definition (what is a drift kernel?)

A drift kernel is a function that maps the previous value of a random choice to a proposal distribution. It is mainly used by MH-based MCMC methods.

In other words: it tells MCMC how to propose a “nearby” value instead of proposing from the prior.

Syntax

A drift kernel is specified like this:

sample(dist, {driftKernel: function(prevVal) { return proposalDist; }})

Working example: MCMC with and without a drift kernel

var y = 0.0;

// A local random-walk proposal centered on the previous value.
var gaussianKernel = function(prevVal) {
  return Gaussian({mu: prevVal, sigma: 0.25});
};

var modelNoDrift = function() {
  var x = sample(Gaussian({mu: 0, sigma: 1}));
  observe(Gaussian({mu: x, sigma: 0.5}), y);
  return x;
};

var modelWithDrift = function() {
  var x = sample(Gaussian({mu: 0, sigma: 1}), {driftKernel: gaussianKernel});
  observe(Gaussian({mu: x, sigma: 0.5}), y);
  return x;
};

var postNo = Infer({method: 'MCMC', samples: 200, burn: 50, lag: 0, model: modelNoDrift});
var postYes = Infer({method: 'MCMC', samples: 200, burn: 50, lag: 0, model: modelWithDrift});

var out = {
  noDrift: repeat(5, function() { return postNo.sample(); }),
  withDrift: repeat(5, function() { return postYes.sample(); })
};

out;
{
  noDrift: [
    -0.06646820994639271,
    -1.4559684714821435,
    -0.19874645123021997,
    -0.3027843672245508,
    0.391805746701598
  ],
  withDrift: [
    -0.6810996596203992,
    0.8014740290586883,
    -0.2434066210167739,
    -0.40017017186879256,
    -0.33440032947124126
  ]
}
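To see why a local proposal works, it can be instructive to hand-roll the Metropolis-Hastings loop that WebPPL runs internally. The plain-JavaScript sketch below is our own simplified construction (the helper names and the LCG seed are illustrative, not part of WebPPL). It targets the same posterior as the model above: prior Gaussian({mu: 0, sigma: 1}) with observation 0.0 under Gaussian({mu: x, sigma: 0.5}), whose exact posterior is Gaussian with mean 0 and variance 0.2. Because the Gaussian drift proposal is symmetric, the acceptance ratio reduces to the posterior density ratio at the two points:

```javascript
// Hand-rolled random-walk Metropolis-Hastings with a "drift kernel" style
// proposal: propose near the previous value rather than from the prior.

// Small deterministic LCG so the demo is reproducible without a seed flag.
function makeRng(seed) {
  let s = seed >>> 0;
  return function () {
    s = (1664525 * s + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function makeGaussian(rng) {
  return function () {
    const u = Math.max(rng(), 1e-12);
    const v = rng();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  };
}

// Unnormalized log posterior: prior N(0, 1) plus likelihood N(x, 0.5) at y = 0.
function logPost(x) {
  return -0.5 * x * x - 0.5 * (x / 0.5) * (x / 0.5);
}

// Random-walk MH: propose N(prev, driftSigma); the symmetric proposal
// cancels in the acceptance ratio, leaving only the posterior ratio.
function mhChain(numSamples, driftSigma, seed) {
  const rng = makeRng(seed);
  const gauss = makeGaussian(rng);
  let x = 0;
  const samples = [];
  for (let i = 0; i < numSamples; i++) {
    const prop = x + driftSigma * gauss();
    const logAccept = logPost(prop) - logPost(x);
    if (Math.log(Math.max(rng(), 1e-12)) < logAccept) x = prop;
    samples.push(x);
  }
  return samples;
}

const samples = mhChain(20000, 0.25, 42);
const mean = samples.reduce(function (a, b) { return a + b; }, 0) / samples.length;
const variance = samples.reduce(function (a, b) {
  return a + (b - mean) * (b - mean);
}, 0) / samples.length;
console.log(mean.toFixed(3), variance.toFixed(3));  // mean near 0, variance near 0.2
```

Up to Monte Carlo error, the chain recovers the closed-form posterior mean 0 and variance 0.2.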

How to run examples locally

From the repository root you can run any example with:

  • npx webppl examples/distributions/<file>.wppl --random-seed 0

(We use --random-seed in the docs so outputs stay reproducible.)