Categorical distribution


Story

A probability is assigned to each of a set of discrete outcomes.


Example

A hen will peck at grain A with probability \(\theta_\mathrm{A}\), grain B with probability \(\theta_\mathrm{B}\), and grain C with probability \(\theta_\mathrm{C}\).


Parameters

The distribution is parametrized by the probabilities assigned to each outcome. We define \(\theta_y\) to be the probability assigned to outcome \(y\). The \(\theta_y\)’s are the parameters; each satisfies \(0 \le \theta_y \le 1\), and together they are constrained by

\[\begin{align} \sum_y \theta_y = 1. \end{align}\]
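The constraints on the parameters (each \(\theta_y\) is a probability, so nonnegative, and they sum to one) can be checked with a short helper. This is a minimal sketch assuming the \(\theta_y\)'s are stored in a NumPy array in category order; the helper name `check_theta`, the tolerance, and the hen probabilities are hypothetical choices for illustration, not part of any library.

```python
import numpy as np

def check_theta(theta, atol=1e-12):
    """Hypothetical helper: return theta as a float array after checking
    that it is a valid set of Categorical probabilities."""
    theta = np.asarray(theta, dtype=float)
    if theta.ndim != 1 or theta.size == 0:
        raise ValueError("theta must be a nonempty 1D array")
    # Each theta_y is a probability, so it cannot be negative.
    if np.any(theta < 0):
        raise ValueError("all probabilities must be nonnegative")
    # The probabilities must sum to 1 (up to floating-point tolerance).
    if not np.isclose(theta.sum(), 1.0, rtol=0.0, atol=atol):
        raise ValueError("probabilities must sum to 1")
    return theta

# Hypothetical values for theta_A, theta_B, theta_C in the hen example.
theta = check_theta([0.5, 0.3, 0.2])
```

A tolerance is used for the sum because probabilities stored as floating-point numbers rarely add to exactly 1.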

Support

The support is the set of categories. If we index the categories with sequential integers from 1 to \(N\), the support is the integers 1 through \(N\), inclusive.
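Encoding categories as indices amounts to a lookup table from labels to integers. The sketch below uses hypothetical labels for the hen example and 1-based indices, matching the 1-to-N convention; the 0-based shift reflects that the NumPy and SciPy calls in the usage table return indices starting at 0.

```python
# Hypothetical category labels for the hen example.
categories = ["grain A", "grain B", "grain C"]

# 1-based indices, matching the 1-to-N convention for the support.
label_to_index = {label: i for i, label in enumerate(categories, start=1)}
index_to_label = {i: label for label, i in label_to_index.items()}

print(label_to_index)     # {'grain A': 1, 'grain B': 2, 'grain C': 3}
print(index_to_label[2])  # grain B

# NumPy's rg.choice(len(theta), ...) returns indices starting at 0,
# so subtract 1 when converting from the 1-based indices above.
zero_based = [label_to_index[c] - 1 for c in categories]  # [0, 1, 2]
```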


Probability mass function

\[\begin{align} f(y;\{\theta_y\}) = \theta_y \end{align}\]
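Because the PMF simply returns the probability attached to the requested outcome, evaluating it is an array lookup. This is a minimal sketch assuming 1-based category indices and \(\theta\) stored in category order; `categorical_pmf` and the example probabilities are hypothetical.

```python
import numpy as np

def categorical_pmf(y, theta):
    """Hypothetical helper for f(y; {theta_y}) = theta_y.

    y is an integer category index in 1, ..., N; theta holds the
    probabilities in category order.
    """
    theta = np.asarray(theta, dtype=float)
    if not (1 <= y <= len(theta)):
        return 0.0  # y is outside the support
    return float(theta[y - 1])  # shift to NumPy's 0-based indexing

theta = [0.5, 0.3, 0.2]  # hypothetical theta_A, theta_B, theta_C
print(categorical_pmf(2, theta))  # 0.3
print(categorical_pmf(4, theta))  # 0.0
```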

Moments

Moments are not defined for a Categorical distribution because the value of \(y\) is not necessarily numeric.
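One way to see why moments are not meaningful here: if we encode the categories as integers and compute the expected value of that code, the answer changes when we reorder the categories, even though the distribution is unchanged. The probabilities and the helper `mean_index` below are hypothetical, chosen only to illustrate the point.

```python
# Hypothetical probabilities for the hen example.
theta = {"A": 0.5, "B": 0.3, "C": 0.2}

def mean_index(order, theta):
    """Expected value of the integer code when the categories in `order`
    are encoded as 1, 2, ..., N."""
    return sum(i * theta[label] for i, label in enumerate(order, start=1))

# Same distribution, two equally valid encodings, two different "means".
print(mean_index(["A", "B", "C"], theta))  # approximately 1.7
print(mean_index(["C", "B", "A"], theta))  # approximately 2.3
```

The "mean" describes the arbitrary encoding, not the distribution itself.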


Usage

Package   Syntax
NumPy     rg.choice(len(theta), p=theta)
SciPy     scipy.stats.rv_discrete(values=(range(len(theta)), theta)).rvs()
Stan      categorical(theta)
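A runnable version of the NumPy syntax, assuming `rg` is a NumPy random number generator created with `numpy.random.default_rng()`. The probabilities and seed are arbitrary placeholders; drawing many samples and comparing the empirical frequencies with \(\theta\) is a quick sanity check.

```python
import numpy as np

theta = np.array([0.5, 0.3, 0.2])  # hypothetical probabilities for indices 0, 1, 2

# rg is a NumPy Generator; the seed is arbitrary and only makes runs reproducible.
rg = np.random.default_rng(3252)

# Single draw: an integer index in {0, 1, ..., len(theta) - 1}.
y = rg.choice(len(theta), p=theta)

# Many draws at once, then compare empirical frequencies with theta.
samples = rg.choice(len(theta), size=100_000, p=theta)
freqs = np.bincount(samples, minlength=len(theta)) / samples.size
print(freqs)  # close to [0.5, 0.3, 0.2]
```

NumPy returns 0-based indices, whereas Stan's `categorical` distribution uses indices 1 through \(N\).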



Notes

  • If you are using the scipy.stats module, this distribution must be constructed manually with scipy.stats.rv_discrete(), with the categories encoded by integer indices. For the interactive plots below, we specify a custom PMF and CDF.

  • To sample from a Categorical distribution, use numpy.random.choice() or the choice() method of a numpy.random.Generator, specifying the values of \(\theta\) with the p kwarg.
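A custom PMF and CDF over the category indices can be tabulated directly from \(\theta\): the PMF at index \(y\) is \(\theta_y\), and the CDF is the running sum of the \(\theta\)'s up to \(y\). This sketch uses NumPy only and assumes 1-based indices; `categorical_cdf` and the probabilities are hypothetical. Note that the CDF depends on the order in which the categories are indexed.

```python
import numpy as np

theta = np.array([0.5, 0.3, 0.2])  # hypothetical probabilities, in index order

indices = np.arange(1, len(theta) + 1)  # support: 1, 2, ..., N
pmf = theta.copy()                      # f(y) = theta_y at each index
cdf = np.cumsum(theta)                  # F(y) = theta_1 + ... + theta_y

def categorical_cdf(y, theta):
    """CDF of the index at any real y: sum of theta_k over indices k <= y."""
    theta = np.asarray(theta, dtype=float)
    # Number of indices k in 1..N with k <= y, clipped to the range [0, N].
    k = int(np.clip(np.floor(y), 0, len(theta)))
    return float(theta[:k].sum())

# Table of index, PMF, and CDF values, e.g. for plotting as a step function.
for i, p, c in zip(indices, pmf, cdf):
    print(i, p, c)
```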


PMF and CDF plots