Discrete vs. Categorical

Both Discrete and Categorical represent finite (discrete) distributions parameterized by a vector of non-negative weights.

They differ in what they return:

  • Discrete({ps: ...}) returns an index in {0, 1, ..., ps.length - 1}.

  • Categorical({ps: ..., vs: ...}) returns the corresponding value from vs.

If you remember only one thing, remember this: Discrete returns an index; Categorical returns a value.

Constructors

Discrete

Discrete({ps: ...})

  • ps: list/array of non-negative numbers (weights)

  • return value: an integer index

Categorical

Categorical({ps: ..., vs: ...})

  • vs: list/array of values (any type)

  • ps: list/array of non-negative numbers (weights), same length as vs

Uniform categorical (omit ps)

If you omit ps:

Categorical({vs: vs})

you get a uniform distribution over the values in vs.

Important: unnormalized weights

For both constructors, ps may be unnormalized. That is, ps is treated as weights and then internally normalized:

P(i) ∝ ps[i]

Example: ps = [1, 3, 6] corresponds to probabilities [0.1, 0.3, 0.6].

(Contrast: Multinomial requires a normalized probability vector.)
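To make the normalization concrete, here is a plain-JavaScript sketch (not WebPPL internals; the normalize helper is made up for illustration):

```javascript
// Sketch: how unnormalized weights become probabilities.
// WebPPL does this internally; this helper just makes it visible.
var normalize = function(weights) {
  var z = weights.reduce(function(a, b) { return a + b; }, 0); // normalizing constant
  return weights.map(function(w) { return w / z; });           // P(i) = ps[i] / sum(ps)
};

console.log(normalize([1, 3, 6])); // [ 0.1, 0.3, 0.6 ]
```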

Shape and typing rules (common gotchas)

  • ps must contain non-negative numbers.

  • In Categorical, ps.length must equal vs.length.

  • Discrete returns an index; to map to an actual value, you must index into your own vs list.

  • score expects the same type that sample would return:

      • Discrete({ps}).score(k) expects an integer index k.

      • Categorical({ps, vs}).score(v) expects a value from vs.
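The index-vs-value distinction behind these rules can be sketched in plain JavaScript (an illustrative inverse-CDF sketch, not WebPPL's actual implementation; u stands in for the uniform random draw and is passed in explicitly so the example is deterministic):

```javascript
// Illustrative sketch of the Discrete/Categorical sampling semantics.
// u stands in for a uniform draw in [0, 1).
var discreteSample = function(ps, u) {
  var z = ps.reduce(function(a, b) { return a + b; }, 0);
  var acc = 0;
  for (var i = 0; i < ps.length; i++) {
    acc += ps[i] / z;
    if (u < acc) { return i; }      // Discrete: an index into ps
  }
  return ps.length - 1;
};

var categoricalSample = function(ps, vs, u) {
  return vs[discreteSample(ps, u)]; // Categorical: the corresponding value
};

console.log(discreteSample([1, 3, 6], 0.95));                              // 2
console.log(categoricalSample([1, 3, 6], ["red", "green", "blue"], 0.95)); // blue
```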

Executable example

var vs = ["red", "green", "blue"];
var weights = [1, 3, 6]; // Unnormalized weights are allowed for Discrete/Categorical.

var dIndex = Discrete({ps: weights});
var dValue = Categorical({ps: weights, vs: vs});
var dUnif = Categorical({vs: vs}); // ps omitted -> uniform over vs

var i = sample(dIndex);
var v = sample(dValue);
var u = sample(dUnif);

// Show the implied probabilities explicitly (normalize weights)
var z = sum(weights);
var ps = map(function(w) { return w / z; }, weights);

var out = {
  vs: vs,
  weights: weights,
  normalized_probs: ps,

  discrete_sample_index: i,
  discrete_mapped_value: vs[i],

  categorical_sample_value: v,
  uniform_categorical_sample_value: u,

  // score expects the same type as sample returns:
  discrete_score_of_index_2: dIndex.score(2),
  categorical_score_of_blue: dValue.score("blue")
};

out;
{
  vs: [ 'red', 'green', 'blue' ],
  weights: [ 1, 3, 6 ],
  normalized_probs: [ 0.1, 0.3, 0.6 ],
  discrete_sample_index: 0,
  discrete_mapped_value: 'red',
  categorical_sample_value: 'red',
  uniform_categorical_sample_value: 'green',
  discrete_score_of_index_2: -0.5108256237659907,
  categorical_score_of_blue: -0.5108256237659907
}

Real-life pattern: weighted choice among actions

A common use case is selecting among options with different prior plausibilities (e.g. “try easy fix”, “reboot”, “ask for help”).

Categorical is often more convenient than Discrete here because it returns the option directly.

Tip: if you later need to attach more information, you can store objects in vs (e.g. {name: ..., cost: ...}).
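For example (a plain-JavaScript sketch; the action objects, weights, and weightedPick helper are all made up for illustration — in WebPPL you would pass the same objects directly as vs to Categorical):

```javascript
// Weighted choice among richer action objects (illustrative sketch).
// In WebPPL, these objects could be stored directly in vs.
var actions = [
  {name: "try easy fix", cost: 1},
  {name: "reboot", cost: 3},
  {name: "ask for help", cost: 10}
];
var weights = [6, 3, 1]; // prior plausibility of each action

// Inverse-CDF pick; u stands in for a uniform draw in [0, 1).
var weightedPick = function(ps, vs, u) {
  var z = ps.reduce(function(a, b) { return a + b; }, 0);
  var acc = 0;
  for (var i = 0; i < ps.length; i++) {
    acc += ps[i] / z;
    if (u < acc) { return vs[i]; }
  }
  return vs[vs.length - 1];
};

var chosen = weightedPick(weights, actions, 0.2);
console.log(chosen.name, chosen.cost); // try easy fix 1
```

Because the chosen element carries its whole object, downstream code can read chosen.cost directly instead of re-indexing into a parallel array.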

Measurement scales note

Categorical/Discrete are foundational because they match the first measurement levels in the classic scale taxonomy: nominal (categories with no inherent order) and ordinal (categories with an order). At these levels, observations are not “quantities” in the arithmetic sense but labels (nominal) or orderable labels (ordinal), so the natural probabilistic model is a finite choice among outcomes—exactly what Categorical/Discrete represent. For interval and ratio scales, differences and ratios are meaningful and one often uses continuous (or otherwise quantitative) distributions—though discretization is always possible when appropriate.

Examples: nominal vs ordinal

This example demonstrates how Categorical naturally models the first two measurement levels:

  • Nominal: outcomes are labels with no inherent order (e.g. colors).

  • Ordinal: outcomes are still labels, but we interpret them as ordered (e.g. low < medium < high).

The key point is that Categorical itself does not “know” about order. It simply returns one of the values in vs according to the weights in ps. If you want to perform numeric operations that rely on order (for example, compute an “average level”), you must explicitly encode your ordinal labels as ranks (e.g. low→1, medium→2, high→3).

The code below therefore prints three exact (enumerated) distributions:

  1. a nominal distribution over color labels,

  2. an ordinal distribution over level labels (still just labels),

  3. the same ordinal distribution after mapping labels to numeric ranks, which makes quantities like expected rank well-defined.

// Measurement scales: nominal vs ordinal with Categorical.
// We show the full distribution exactly via enumeration (deterministic output).

var summarize = function(d) {
  var supp = d.support();
  var probs = map(function(v) { return Math.exp(d.score(v)); }, supp);
  return {support: supp, probs: probs, sum: sum(probs)};
};

// NOMINAL: categories have no inherent order (e.g., colors).
var colors = ["red", "green", "blue"];
var colorWeights = [1, 3, 6]; // unnormalized weights OK

var colorDist = Infer({
  method: "enumerate",
  model: function() {
    return sample(Categorical({ps: colorWeights, vs: colors}));
  }
});

// ORDINAL: categories have an order (e.g., low < medium < high),
// but Categorical itself still just returns labels.
var levels = ["low", "medium", "high"];
var levelWeights = [2, 5, 3];

var levelDist = Infer({
  method: "enumerate",
  model: function() {
    return sample(Categorical({ps: levelWeights, vs: levels}));
  }
});

// If you need to do numeric operations (e.g., compute an expected "level"),
// you must explicitly map labels to ranks yourself:
var rank = function(x) {
  return x === "low" ? 1 : (x === "medium" ? 2 : 3);
};

var rankDist = Infer({
  method: "enumerate",
  model: function() {
    return rank(sample(Categorical({ps: levelWeights, vs: levels})));
  }
});

var out = {
  nominal_colors: summarize(colorDist),
  ordinal_levels_as_labels: summarize(levelDist),
  ordinal_levels_as_ranks: summarize(rankDist)
};

out;
{
  nominal_colors: {
    support: [ 'blue', 'green', 'red' ],
    probs: [ 0.6, 0.29999999999999993, 0.10000000000000002 ],
    sum: 1
  },
  ordinal_levels_as_labels: {
    support: [ 'high', 'medium', 'low' ],
    probs: [ 0.29999999999999993, 0.5000000000000001, 0.2 ],
    sum: 1
  },
  ordinal_levels_as_ranks: {
    support: [ 1, 2, 3 ],
    probs: [ 0.2, 0.5000000000000001, 0.29999999999999993 ],
    sum: 1
  }
}
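Once labels are encoded as ranks, quantities like the expected rank become well-defined. A plain-JavaScript check of the "average level" implied by levelWeights = [2, 5, 3] above (integer arithmetic first, a single division at the end, so the result is exact):

```javascript
// Expected rank under the ordinal distribution low < medium < high,
// using the weights from the example above: levelWeights = [2, 5, 3].
var ranks = [1, 2, 3];   // low -> 1, medium -> 2, high -> 3
var weights = [2, 5, 3];

var z = weights.reduce(function(a, b) { return a + b; }, 0);  // 10
var weightedSum = ranks.reduce(function(acc, r, i) {
  return acc + r * weights[i];
}, 0);                                                        // 1*2 + 2*5 + 3*3 = 21

var expectedRank = weightedSum / z;
console.log(expectedRank); // 2.1
```

This matches the rank distribution in the output above: 1*0.2 + 2*0.5 + 3*0.3 = 2.1.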