{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 3b: Probability distributions and their stories\n", "\n", "(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).\n", "\n", "This document was prepared at [Caltech](http://www.caltech.edu) with financial support from the [Donna and Benjamin M. Rosen Bioengineering Center](http://rosen.caltech.edu).\n", "\n", "\n", "\n", "*This tutorial was generated from an Jupyter notebook. If you want to use the interactivity to explore probability distributions, you will need to download the notebook and run it on your machine. You can download the notebook [here](t3b_probability_stories.ipynb).*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\");\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", " }\n", " finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.info(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(js_urls, callback) {\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = js_urls.length;\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var s = document.createElement('script');\n", " s.src = url;\n", " s.async = false;\n", " s.onreadystatechange = s.onload = function() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: all BokehJS libraries loaded\");\n", " run_callbacks()\n", " }\n", " };\n", " s.onerror = function() {\n", " console.warn(\"failed to load library \" + url);\n", " };\n", " console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.getElementsByTagName(\"head\")[0].appendChild(s);\n", " }\n", " };var element = document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\");\n", " if (element == null) {\n", " console.log(\"Bokeh: ERROR: autoload.js configured with elementid '36ecd046-1f82-4cba-a2b6-3619cc74d722' but no matching script tag was found. \")\n", " return false;\n", " }\n", "\n", " var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " \n", " function(Bokeh) {\n", " \n", " },\n", " function(Bokeh) {\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if ((root.Bokeh !== undefined) || (force === true)) {\n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }if (force === true) {\n", " display_loaded();\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\")).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(js_urls, function() {\n", " console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n }\n finally {\n delete root._bokeh_onload_callbacks\n }\n console.info(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(js_urls, callback) {\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = js_urls.length;\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var s = document.createElement('script');\n s.src = url;\n s.async = false;\n s.onreadystatechange = s.onload = function() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: all BokehJS libraries loaded\");\n run_callbacks()\n }\n };\n s.onerror = function() {\n console.warn(\"failed to load library \" + url);\n };\n console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.getElementsByTagName(\"head\")[0].appendChild(s);\n }\n };var element = document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\");\n if (element == null) {\n console.log(\"Bokeh: ERROR: autoload.js configured with elementid '36ecd046-1f82-4cba-a2b6-3619cc74d722' but no matching script tag was found. \")\n return false;\n }\n\n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n \n function(Bokeh) {\n \n },\n function(Bokeh) {\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n }\n ];\n\n function run_inline_js() {\n \n if ((root.Bokeh !== undefined) || (force === true)) {\n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"36ecd046-1f82-4cba-a2b6-3619cc74d722\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(js_urls, function() {\n console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import scipy.stats as st\n", "import scipy.special\n", "\n", "import bokeh.io\n", "import bokeh.plotting\n", "import bokeh.application\n", "import bokeh.application.handlers\n", "bokeh.io.output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because this tutorial consists mainly of a list of probability distributions and their stories and properties, it is useful to have a little index of them up front.\n", "\n", "* [Discrete distributions](#Discrete-distributions)\n", " - [Bernoulli distribution](#Bernoulli-distribution)\n", " - [Geometric distribution](#Geometric-distribution)\n", " - [Negative Binomial distribution](#Negative-Binomial-distribution)\n", " - [Binomial distribution](#Binomial-distribution)\n", " - [Hypergeometric distribution](#Hypergeometric-distribution)\n", " - [Categorical distribution](#Categorical-distribution)\n", " - [Discrete Uniform distribution](#Discrete-Uniform-distribution)\n", "* [Continuous distributions](#Continuous-distributions)\n", " - [Uniform distribution](#Uniform-distribution)\n", " - [Gaussian (a.k.a. Normal) distribution](#Gaussian,-a.k.a.-Normal,-distribution)\n", " - [Log-Normal distribution](#Log-Normal-distribution)\n", " - [Chi-square distribution](#Chi-square-distribution)\n", " - [Student-t distribution](#Student-t-distribution)\n", " - [Cauchy distribution](#Cauchy-distribution)\n", " - [Exponential distribution](#Exponential-distribution)\n", " - [Gamma distribution](#Gamma-distribution)\n", " - [Weibull distribution](#Weibull-distribution)\n", " - [Beta distribution](#Beta-distribution)\n", "* [Discrete multivariate distributions](#Discrete-multivariate-distributions)\n", " - [Multinomial distribution](#Multinomial-distribution)\n", "* [Continuous multivariate distributions](#Continuous-Multivariate-distributions)\n", " - [Multivariate Gaussian (a.k.a. Multivariate Normal) distribution](#Multivariate-Gaussian,-a.k.a.-Multivariate-Normal,-distribution)\n", " - [Lewandowski-Kurowicka-Joe (LKJ) distribution](#Lewandowski-Kurowicka-Joe-%28LKJ%29-distribution)\n", " - [Dirichlet distribution](#Dirichlet-distribution)\n", "\n", "This tutorial also features interactive plotting, which needs a running Jupyter notebook to utilize. You should therefore download and run the notebook. When you launch your notebook, take note of the URL. You can find this at the top of your browser. It will be something like `localhost:8888`. Before you work through the notebook, define your notebook_url in the code cell below." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "notebook_url = 'localhost:8888'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motivation\n", "\n", "In this class, we have talked about the three levels of building a model in the\n", "biological sciences.\n", "* **Cartoon model**: A cartoon or verbal phenomenological\n", " description of the system of interest.\n", "* **Mathematical model**: A mathematization of a cartoon model. This gives quantitative predictions on how a biological system should behave.\n", "* **Statistical model**: A description of what we expect from an\n", " experiment, given our mathematical model. In the Bayesian context, this is\n", " specification of the likelihood and prior.\n", "\n", "Take together, these models comprise a **generative model**, which describes how the data were generated. Specifying your model amounts to choosing a probability distributions that describe the process of data generation. In some cases, you need to derive the likelihood (or even numerically compute it when it cannot be written in closed form). In most practical cases, though, your model is composed of standard probability distributions. These distributions all have **stories** associated with them. Importantly, *if your data and model match the story of a distribution, you know that this is the distribution to choose for your likelihood.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review of probability distributions, PMFs, PDFs, and CDFs\n", "\n", "Before we begin talking about distributions, let's remind ourselves\n", "what probability distributions are. We cut some corners in our\n", "definitions here, but these definitions are functional for most of our\n", "data analysis purposes.\n", "\n", "A **probability mass function** (PMF), $f(x)$, describes the\n", "probability of a discrete variable obtaining value $x$. The variable\n", "$x$ takes on discrete values, so the **normalization condition**\n", "is\n", "\n", "\\begin{align}\n", "\\sum_x f(x) = 1.\n", "\\end{align}\n", "\n", "A **probability density function** (PDF), which we shall\n", "call $f(x)$, is defined such that the probability that the value of a\n", "continuous variable $x$ is $a \\le x \\le b$ is\n", "\n", "\\begin{align}\n", "\\int_a^b \\mathrm{d}x\\,f(x).\n", "\\end{align}\n", "\n", "A **cumulative distribution function** (CDF), denoted $F(x)$ is defined such that $F(x)$ is the probability that the value of a variable $X$ is less than or equal to $x$. For a discrete distribution\n", "\n", "\\begin{align}\n", "F(k) = \\sum_{k'=k_\\mathrm{min}}^k f(k'),\n", "\\end{align}\n", "\n", "where $k_\\mathrm{min}$ is the minimal value the variable can take, and for a continuous distribution,\n", "\n", "\\begin{align}\n", "F(x) = \\int_{-\\infty}^x \\mathrm{d}x'\\,f(x').\n", "\\end{align}\n", "\n", "
\n", "\n", "If a probability mass or density function depends on parameters, say $n$ and $\\theta$, we write it as $f(x;n, \\theta)$. There does not seem to be consensus on the best notation, and you may see this same quantity written as $P(x\\mid n, \\theta)$, for example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sampling\n", "\n", "Given that we know a probability distribution, we can take\n", "*samples* out of it. This means that we can randomly draw\n", "numbers and the probability that we draw a certain number $x$ is\n", "proportional to the PMF or PDF, $f(x)$. Sampling\n", "out of a distribution is often easier than computing the distribution\n", "over a range of values because many of those values are zero. \n", "\n", "The `numpy.random` module and [Stan](http://mc-stan.org/) are power tools for sampling out of distributions. For each distribution I describe, I show how to specify them using NumPy and Stan (and also with the `scipy.stats` module)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distributions and their stories\n", "\n", "In what follows, I will present some commonly used probability distributions and their associated stories. I give the following.\n", "- Name of the distribution\n", "- It story.\n", "- An example.\n", "- The parameters of the distribution.\n", "- The **support**, or the domain for which the probability is nonzero.\n", "- The probability mass function (for discrete distributions) or probability density function (for continuous distributions). **I use the parametrization that Stan uses.**\n", "- Syntax for how to use the distribution using `numpy.random`, `scipy.stats`, and Stan.\n", "- Description of relations to other distributions.\n", "- Miscellaneous notes.\n", "- Interactive plots of the PMF/PDF and CDF of the distribution.\n", "\n", "I omit analytical expression for the CDF because CDFs are often expressed as special functions, such as regularized incomplete beta functions or error functions. They can also be easily looked up. We mainly use them for comparing ECDFs in plots, and we can use the `scipy.stats` module to just compute them. \n", "\n", "Along the way, we'll define other terms, such as **Bernoulli trial** and **Poisson process**. I will use the `scipy.stats` module (imported as `st`) to compute the PDF/PMF and CDF. The API for this module will be apparent as I use it.\n", "\n", "Finally, there is often more than one way to define the parameters of a distribution. I will be using the definitions and variable names that are used in [Stan](http://mc-stan.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualization of PDF/PMF's and CDFs\n", "\n", "I use [Bokeh](https://bokeh.pydata.org/) to construct interactive visualizations of the distribution. The `bebi103.viz` module has a function to create Bokeh apps to visualize the distributions, but I include the function here in this notebook to avoid dependencies. Please excuse the long code block. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def distribution_plot_app(x_min=None, x_max=None, scipy_dist=None,\n", " transform=None, custom_pdf=None, custom_pmf=None, custom_cdf=None, \n", " params=None, n=400, plot_height=200, plot_width=300, x_axis_label='x', \n", " title=None):\n", " \"\"\"\n", " Build interactive Bokeh app displaying a univariate\n", " probability distribution.\n", "\n", " Parameters\n", " ----------\n", " x_min : float\n", " Minimum value that the random variable can take in plots.\n", " x_max : float\n", " Maximum value that the random variable can take in plots.\n", " scipy_dist : scipy.stats distribution\n", " Distribution to use in plotting.\n", " transform : function or None (default)\n", " A function of call signature `transform(*params)` that takes\n", " a tuple or Numpy array of parameters and returns a tuple of\n", " the same length with transformed parameters.\n", " custom_pdf : function\n", " Function with call signature f(x, *params) that computes the\n", " PDF of a distribution.\n", " custom_pmf : function \n", " Function with call signature f(x, *params) that computes the\n", " PDF of a distribution.\n", " custom_cdf : function\n", " Function with call signature F(x, *params) that computes the\n", " CDF of a distribution.\n", " params : list of dicts\n", " A list of parameter specifications. Each entry in the list gives\n", " specifications for a parameter of the distribution stored as a\n", " dictionary. Each dictionary must have the following keys.\n", " name : str, name of the parameter\n", " start : float, starting point of slider for parameter (the\n", " smallest allowed value of the parameter)\n", " end : float, ending point of slider for parameter (the\n", " largest allowed value of the parameter)\n", " value : float, the value of the parameter that the slider\n", " takes initially. Must be between start and end.\n", " step : float, the step size for the slider\n", " n : int, default 400\n", " Number of points to use in making plots of PDF and CDF for \n", " continuous distributions. This should be large enough to give\n", " smooth plots.\n", " plot_height : int, default 200\n", " Height of plots.\n", " plot_width : int, default 300\n", " Width of plots.\n", " x_axis_label : str, default 'x'\n", " Label for x-axis.\n", " title : str, default None\n", " Title to be displayed above the PDF or PMF plot.\n", "\n", " Returns\n", " -------\n", " output : Bokeh app\n", " An app to visualize the PDF/PMF and CDF. It can be displayed\n", " with bokeh.io.show(). If it is displayed in a notebook, the\n", " notebook_url kwarg should be specified.\n", " \"\"\"\n", " if None in [x_min, x_max]:\n", " raise RuntimeError('`x_min` and `x_max` must be specified.')\n", "\n", " if scipy_dist is None:\n", " fun_c = custom_cdf\n", " if (custom_pdf is None and custom_pmf is None) or custom_cdf is None:\n", " raise RuntimeError('For custom distributions, both PDF/PMF and'\n", " + ' CDF must be specified.')\n", " if custom_pdf is not None and custom_pmf is not None:\n", " raise RuntimeError('Can only specify custom PMF or PDF.')\n", " if custom_pmf is None:\n", " discrete = False\n", " fun_p = custom_pdf\n", " else:\n", " discrete = True\n", " fun_p = custom_pmf\n", " elif ( custom_pdf is not None \n", " or custom_pmf is not None\n", " or custom_cdf is not None):\n", " raise RuntimeError(\n", " 'Can only specify either custom or scipy distribution.')\n", " else:\n", " fun_c = scipy_dist.cdf\n", " if hasattr(scipy_dist, 'pmf'):\n", " discrete = True\n", " fun_p = scipy_dist.pmf\n", " else:\n", " discrete = False\n", " fun_p = scipy_dist.pdf\n", " \n", " if discrete:\n", " p_y_axis_label = 'PMF'\n", " else:\n", " p_y_axis_label = 'PDF'\n", "\n", " if params is None:\n", " raise RuntimeError('`params` must be specified.')\n", " \n", "\n", " def _plot_app(doc):\n", " p_p = bokeh.plotting.figure(plot_height=plot_height,\n", " plot_width=plot_width,\n", " x_axis_label=x_axis_label,\n", " y_axis_label=p_y_axis_label,\n", " title=title)\n", " p_c = bokeh.plotting.figure(plot_height=plot_height,\n", " plot_width=plot_width,\n", " x_axis_label=x_axis_label,\n", " y_axis_label='CDF')\n", "\n", " # Link the axes\n", " p_c.x_range = p_p.x_range\n", "\n", " # Make sure CDF y_range is zero to one\n", " p_c.y_range = bokeh.models.Range1d(-0.05, 1.05)\n", "\n", " # Make array of parameter values\n", " param_vals = np.array([param['value'] for param in params])\n", " if transform is not None:\n", " param_vals = transform(*param_vals)\n", "\n", " # Set up data for plot\n", " if discrete:\n", " x = np.arange(int(np.ceil(x_min)), \n", " int(np.floor(x_max))+1)\n", " x_size = x[-1] - x[0]\n", " x_c = np.empty(2*len(x))\n", " x_c[::2] = x\n", " x_c[1::2] = x\n", " x_c = np.concatenate(((max(x[0] - 0.05*x_size, x[0] - 0.95),), \n", " x_c,\n", " (min(x[-1] + 0.05*x_size, x[-1] + 0.95),)))\n", " x_cdf = np.concatenate(((x_c[0],), x))\n", " else:\n", " x = np.linspace(x_min, x_max, n)\n", " x_c = x_cdf = x\n", "\n", " # Compute PDF and CDF\n", " y_p = fun_p(x, *param_vals)\n", " y_c = fun_c(x_cdf, *param_vals)\n", " if discrete:\n", " y_c_plot = np.empty_like(x_c)\n", " y_c_plot[::2] = y_c\n", " y_c_plot[1::2] = y_c\n", " y_c = y_c_plot\n", "\n", " # Set up data sources\n", " source_p = bokeh.models.ColumnDataSource(data={'x': x,\n", " 'y_p': y_p})\n", " source_c = bokeh.models.ColumnDataSource(data={'x': x_c, \n", " 'y_c': y_c})\n", "\n", " # Plot PDF and CDF\n", " p_c.line('x', 'y_c', source=source_c, line_width=2)\n", " if discrete:\n", " p_p.circle('x', 'y_p', source=source_p, size=5)\n", " p_p.segment(x0='x',\n", " x1='x',\n", " y0=0, \n", " y1='y_p', \n", " source=source_p, \n", " line_width=2)\n", " else:\n", " p_p.line('x', 'y_p', source=source_p, line_width=2)\n", " \n", " \n", " def _callback(attr, old, new):\n", " param_vals = tuple([slider.value for slider in sliders])\n", " if transform is not None:\n", " param_vals = transform(*param_vals)\n", " \n", " # Compute PDF and CDF\n", " source_p.data['y_p'] = fun_p(x, *param_vals)\n", " y_c = fun_c(x_cdf, *param_vals)\n", " if discrete:\n", " y_c_plot = np.empty_like(x_c)\n", " y_c_plot[::2] = y_c\n", " y_c_plot[1::2] = y_c\n", " y_c = y_c_plot\n", " source_c.data['y_c'] = y_c\n", "\n", " sliders = [bokeh.models.Slider(start=param['start'],\n", " end=param['end'],\n", " value=param['value'],\n", " step=param['step'],\n", " title=param['name'])\n", " for param in params]\n", " for slider in sliders:\n", " slider.on_change('value', _callback)\n", "\n", " # Add the plot to the app\n", " widgets = bokeh.layouts.widgetbox(sliders)\n", " grid = bokeh.layouts.gridplot([p_p, p_c], ncols=2)\n", " doc.add_root(bokeh.layouts.column(widgets, grid))\n", "\n", " handler = bokeh.application.handlers.FunctionHandler(_plot_app)\n", " return bokeh.application.Application(handler)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A note about CDFs for discrete distributions\n", "\n", "I plot the CDFs for discrete distributions as \"staircases.\" As an example, here is a plot of the CDF of the [Binomial distribution](#Binomial-distribution) with parameters $N=10$ and $\\theta = 0.5$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"676e1474-4fe1-4964-85e5-4382757b4d08\":{\"roots\":{\"references\":[{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"712adb34-8e43-49bc-be21-6430768534a2\",\"type\":\"Title\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":{\"__ndarray__\":\"AAAAAAAA4L8AAAAAAAAAAAAAAAAAAAAAAAAAAAAA8D8AAAAAAADwPwAAAAAAAABAAAAAAAAAAEAAAAAAAAAIQAAAAAAAAAhAAAAAAAAAEEAAAAAAAAAQQAAAAAAAABRAAAAAAAAAFEAAAAAAAAAYQAAAAAAAABhAAAAAAAAAHEAAAAAAAAAcQAAAAAAAACBAAAAAAAAAIEAAAAAAAAAiQAAAAAAAACJAAAAAAAAAJEAAAAAAAAAkQAAAAAAAACVA\",\"dtype\":\"float64\",\"shape\":[24]},\"y\":{\"__ndarray__\":\"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAFA/AAAAAAAAUD////////+FP////////4U/AgAAAAAArD8CAAAAAACsP/7//////8U//v//////xT8CAAAAACDYPwIAAAAAINg////////v4z///////+/jPwAAAAAAgOo/AAAAAACA6j8AAAAAAEDuPwAAAAAAQO4/AAAAAACo7z8AAAAAAKjvPwAAAAAA+O8/AAAAAAD47z8AAAAAAADwPwAAAAAAAPA/\",\"dtype\":\"float64\",\"shape\":[24]}},\"selected\":{\"id\":\"8e55f090-1bbd-41b1-88cc-6a537bf987fa\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"a70caeab-9286-460f-a3c4-1545f35b2eb3\",\"type\":\"UnionRenderers\"}},\"id\":\"3b4c9655-b5d9-4e90-a715-994728db915d\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"30731cd1-b6aa-4c9f-b863-81cf05af95a4\",\"type\":\"PanTool\"},{\"attributes\":{\"callback\":null},\"id\":\"f0c76258-e855-4651-b569-13d49b89953e\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"e1d6bc1b-99dd-465c-a39a-8a46207638a2\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null},\"id\":\"1d1bf80c-41a3-41fc-ab54-3d793a35ec5b\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"31c4d706-f2b7-4db3-b796-8dae39c79104\",\"type\":\"SaveTool\"},{\"attributes\":{\"below\":[{\"id\":\"93f17973-4019-43b9-947c-1dc40cefb4d7\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"19448b18-9ba4-4d40-be48-f1e7e2a3c201\",\"type\":\"LinearAxis\"}],\"plot_height\":250,\"plot_width\":350,\"renderers\":[{\"id\":\"93f17973-4019-43b9-947c-1dc40cefb4d7\",\"type\":\"LinearAxis\"},{\"id\":\"90059295-bb1a-438e-b264-660345c0ef77\",\"type\":\"Grid\"},{\"id\":\"19448b18-9ba4-4d40-be48-f1e7e2a3c201\",\"type\":\"LinearAxis\"},{\"id\":\"3f8b30b4-8ea3-4226-b97c-50742adf9b48\",\"type\":\"Grid\"},{\"id\":\"9791de01-0409-49f1-b6a1-e4e3aaf7e5c9\",\"type\":\"BoxAnnotation\"},{\"id\":\"90f66895-a293-4fe0-a830-8609c49fe704\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"712adb34-8e43-49bc-be21-6430768534a2\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"1db922f5-1153-4a97-b66e-b8ec9f3e439f\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"f0c76258-e855-4651-b569-13d49b89953e\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"e1d6bc1b-99dd-465c-a39a-8a46207638a2\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1d1bf80c-41a3-41fc-ab54-3d793a35ec5b\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"d554ed57-326b-43f0-87c8-2edb66df0030\",\"type\":\"LinearScale\"}},\"id\":\"7026a4e3-fc01-4322-afab-57c040cfb2b2\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"594cfb16-1ae3-4544-8248-e2552370fb71\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"d554ed57-326b-43f0-87c8-2edb66df0030\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"d712feed-c061-4666-b734-502a6fd3eb4a\",\"type\":\"HelpTool\"},{\"attributes\":{\"source\":{\"id\":\"3b4c9655-b5d9-4e90-a715-994728db915d\",\"type\":\"ColumnDataSource\"}},\"id\":\"0030597a-7190-45a7-b590-5bda3a8ead29\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"55abbf0a-81d4-4c85-a5fe-45fd0bef8d71\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"9791de01-0409-49f1-b6a1-e4e3aaf7e5c9\",\"type\":\"BoxAnnotation\"}},\"id\":\"4f4214c0-9660-4809-aa89-4e46bc3b4012\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"7026a4e3-fc01-4322-afab-57c040cfb2b2\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"8e70f681-606f-47b3-a9d8-d073128cd566\",\"type\":\"BasicTicker\"}},\"id\":\"3f8b30b4-8ea3-4226-b97c-50742adf9b48\",\"type\":\"Grid\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"30731cd1-b6aa-4c9f-b863-81cf05af95a4\",\"type\":\"PanTool\"},{\"id\":\"55abbf0a-81d4-4c85-a5fe-45fd0bef8d71\",\"type\":\"WheelZoomTool\"},{\"id\":\"4f4214c0-9660-4809-aa89-4e46bc3b4012\",\"type\":\"BoxZoomTool\"},{\"id\":\"31c4d706-f2b7-4db3-b796-8dae39c79104\",\"type\":\"SaveTool\"},{\"id\":\"594cfb16-1ae3-4544-8248-e2552370fb71\",\"type\":\"ResetTool\"},{\"id\":\"d712feed-c061-4666-b734-502a6fd3eb4a\",\"type\":\"HelpTool\"}]},\"id\":\"1db922f5-1153-4a97-b66e-b8ec9f3e439f\",\"type\":\"Toolbar\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"b7acb8bf-daed-437c-8451-d876f881a224\",\"type\":\"Line\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"9791de01-0409-49f1-b6a1-e4e3aaf7e5c9\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"a70caeab-9286-460f-a3c4-1545f35b2eb3\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"data_source\":{\"id\":\"3b4c9655-b5d9-4e90-a715-994728db915d\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"46a13b24-c12f-461c-8772-1753868350ea\",\"type\":\"Line\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"b7acb8bf-daed-437c-8451-d876f881a224\",\"type\":\"Line\"},\"selection_glyph\":null,\"view\":{\"id\":\"0030597a-7190-45a7-b590-5bda3a8ead29\",\"type\":\"CDSView\"}},\"id\":\"90f66895-a293-4fe0-a830-8609c49fe704\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"f10a21b0-27ea-4e9d-852e-a91d6e31544a\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"plot\":{\"id\":\"7026a4e3-fc01-4322-afab-57c040cfb2b2\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"b423c126-4596-4a24-9bbf-339d954659e3\",\"type\":\"BasicTicker\"}},\"id\":\"90059295-bb1a-438e-b264-660345c0ef77\",\"type\":\"Grid\"},{\"attributes\":{\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"46a13b24-c12f-461c-8772-1753868350ea\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"8e55f090-1bbd-41b1-88cc-6a537bf987fa\",\"type\":\"Selection\"},{\"attributes\":{\"axis_label\":\"F(n; 10, 0.5)\",\"formatter\":{\"id\":\"5f81ec9c-e7e9-45bc-8d96-c9856856d3a0\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"7026a4e3-fc01-4322-afab-57c040cfb2b2\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"8e70f681-606f-47b3-a9d8-d073128cd566\",\"type\":\"BasicTicker\"}},\"id\":\"19448b18-9ba4-4d40-be48-f1e7e2a3c201\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"8e70f681-606f-47b3-a9d8-d073128cd566\",\"type\":\"BasicTicker\"},{\"attributes\":{\"axis_label\":\"n\",\"formatter\":{\"id\":\"f10a21b0-27ea-4e9d-852e-a91d6e31544a\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"7026a4e3-fc01-4322-afab-57c040cfb2b2\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"b423c126-4596-4a24-9bbf-339d954659e3\",\"type\":\"BasicTicker\"}},\"id\":\"93f17973-4019-43b9-947c-1dc40cefb4d7\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"5f81ec9c-e7e9-45bc-8d96-c9856856d3a0\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"b423c126-4596-4a24-9bbf-339d954659e3\",\"type\":\"BasicTicker\"}],\"root_ids\":[\"7026a4e3-fc01-4322-afab-57c040cfb2b2\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"676e1474-4fe1-4964-85e5-4382757b4d08\",\"roots\":{\"7026a4e3-fc01-4322-afab-57c040cfb2b2\":\"10a64dfd-c292-4d5f-a8f3-cc6c07bedc6f\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "7026a4e3-fc01-4322-afab-57c040cfb2b2" } }, "output_type": "display_data" } ], "source": [ "x = np.arange(0, 11)\n", "x_size = x[-1] - x[0]\n", "x_c = np.empty(2*len(x))\n", "x_c[::2] = x\n", "x_c[1::2] = x\n", "x_c = np.concatenate(((max(x[0] - 0.05*x_size, x[0] - 0.95),), \n", " x_c,\n", " (min(x[-1] + 0.05*x_size, x[-1] + 0.95),)))\n", "x_cdf = np.concatenate(((x_c[0],), x))\n", "\n", "y = st.binom.cdf(x_cdf, 10, 0.5)\n", "y_c = np.empty_like(x_c)\n", "y_c[::2] = y\n", "y_c[1::2] = y\n", "\n", "p = bokeh.plotting.figure(plot_height=250,\n", " plot_width=350,\n", " x_axis_label='n',\n", " y_axis_label='F(n; 10, 0.5)')\n", "p.line(x_c, y_c, line_width=2)\n", "bokeh.io.show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The CDF appears to be multivalued at the vertical lines of the staircase. It is not. Furthermore, the lines at zero and one on the CDF axis should extend out to $-\\infty$ and $\\infty$, respectively along the horizontal axis. Strictly speaking, it should be plotted as follows." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"ced3b931-9c64-4b56-9e67-8375e5010d72\":{\"roots\":{\"references\":[{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"9b7364ed-7010-4591-930c-d9fbf4b59fad\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"9503391e-9e21-4dc6-934e-eccd3292997f\",\"type\":\"BasicTicker\"}},\"id\":\"403ff54d-73b2-4e9c-9e76-d513af8c24c0\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"d8e9a393-bec1-49dc-8c3a-1ba856417b0b\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1494b557-6fa4-44b1-a6f5-a82c06d9a6d5\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"a28f5868-867f-4ec4-8f74-db53b63a0f2e\",\"type\":\"PanTool\"},{\"attributes\":{\"source\":{\"id\":\"b950ad54-44ad-4464-96c8-3c602be857fa\",\"type\":\"ColumnDataSource\"}},\"id\":\"377c6cd6-1678-4c22-9da6-ef4beed910a9\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"4a894c0f-6a0e-4f55-ab21-a181f5f7b42c\",\"type\":\"Selection\"},{\"attributes\":{\"data_source\":{\"id\":\"b950ad54-44ad-4464-96c8-3c602be857fa\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"36db8eca-9a40-4980-b672-53820cfcd49b\",\"type\":\"Segment\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"e4c18f28-998d-4198-b756-1d41761c2b76\",\"type\":\"Segment\"},\"selection_glyph\":null,\"view\":{\"id\":\"377c6cd6-1678-4c22-9da6-ef4beed910a9\",\"type\":\"CDSView\"}},\"id\":\"9d22d4de-4bd2-4f8b-8693-80478c64a3f7\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"f0f9670b-627d-4296-b830-64e685ccb87a\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"data_source\":{\"id\":\"966b04e0-987d-4922-8f33-be9d9f67c327\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"4661ff4d-c14f-4225-8ebc-3b4a314dbcf4\",\"type\":\"Ray\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"e9fa06a4-3269-4a92-8568-c64adde34fc7\",\"type\":\"Ray\"},\"selection_glyph\":null,\"view\":{\"id\":\"eaf41819-d768-4cbf-a0b9-1deee25519ca\",\"type\":\"CDSView\"}},\"id\":\"594809fe-0b6d-4632-8b26-c72742d0c0d4\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"1983848b-247c-47d5-966c-ee1784c2f154\",\"type\":\"Selection\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"cecbd523-02fe-4e78-9112-882bb2f4fa2f\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"ad96c61d-bbe5-43a5-b7de-3d6a3b836602\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"7e5cb271-5883-4ab3-b7aa-1ce891d15585\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x0\":{\"field\":\"x0\"},\"x1\":{\"field\":\"x1\"},\"y0\":{\"field\":\"y0\"},\"y1\":{\"field\":\"y1\"}},\"id\":\"e4c18f28-998d-4198-b756-1d41761c2b76\",\"type\":\"Segment\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"085d1899-8b26-49a6-8585-da22f5d53a83\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"9503391e-9e21-4dc6-934e-eccd3292997f\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"18ccf8c8-2b83-4a3b-8add-0d03209cb83c\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x0\":{\"field\":\"x0\"},\"x1\":{\"field\":\"x1\"},\"y0\":{\"field\":\"y0\"},\"y1\":{\"field\":\"y1\"}},\"id\":\"36db8eca-9a40-4980-b672-53820cfcd49b\",\"type\":\"Segment\"},{\"attributes\":{},\"id\":\"53d043c4-c582-446f-92d2-f9864c2323a9\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"271b18e0-1d76-4c2e-938b-5c4988cdcd88\",\"type\":\"Selection\"},{\"attributes\":{\"data_source\":{\"id\":\"5a8f75a5-bcb6-4374-86e9-4d6bb3ae376e\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"a5d013d5-f04d-46f8-8533-48353adb3857\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"ff65b444-89b7-4284-840a-cdc4c1ff1b86\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"27b92b63-abd7-4d08-9020-9fc1d98dcf33\",\"type\":\"CDSView\"}},\"id\":\"3167995a-578c-43ec-b342-e22e266c6f30\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"37570dc1-cdaf-44ec-acb6-fa70b3beda96\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"angle\":{\"units\":\"rad\",\"value\":3.141592653589793},\"length\":{\"units\":\"data\",\"value\":0},\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x\":{\"value\":0},\"y\":{\"value\":0}},\"id\":\"074883d5-5fb1-47f7-bfc1-51329e30ca7c\",\"type\":\"Ray\"},{\"attributes\":{},\"id\":\"9ca30144-f163-4f4b-9948-3447725b3205\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"d6455e2f-7355-4ec9-80ae-b2e7d309ff03\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"cecbd523-02fe-4e78-9112-882bb2f4fa2f\",\"type\":\"BoxAnnotation\"}},\"id\":\"9b887c63-dc15-48a8-af26-455a5357c12a\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"angle\":{\"units\":\"rad\",\"value\":3.141592653589793},\"length\":{\"units\":\"data\",\"value\":0},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x\":{\"value\":0},\"y\":{\"value\":0}},\"id\":\"6647234d-d5b9-4d16-b313-0cb100a90eab\",\"type\":\"Ray\"},{\"attributes\":{\"source\":{\"id\":\"9b7fe33f-2390-48f7-8401-c10421e839cd\",\"type\":\"ColumnDataSource\"}},\"id\":\"1973b254-2d18-4473-9624-11a03f7ea206\",\"type\":\"CDSView\"},{\"attributes\":{\"data_source\":{\"id\":\"9b7fe33f-2390-48f7-8401-c10421e839cd\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"085d1899-8b26-49a6-8585-da22f5d53a83\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1494b557-6fa4-44b1-a6f5-a82c06d9a6d5\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"1973b254-2d18-4473-9624-11a03f7ea206\",\"type\":\"CDSView\"}},\"id\":\"c3070052-9f4f-4be2-920c-a8a23234529c\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"source\":{\"id\":\"5a8f75a5-bcb6-4374-86e9-4d6bb3ae376e\",\"type\":\"ColumnDataSource\"}},\"id\":\"27b92b63-abd7-4d08-9020-9fc1d98dcf33\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"efb3a078-4ccb-41f4-aefb-ffba46062ec0\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"06b53a24-399e-420b-a3ca-51013d1d90a5\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"3cf80d63-e479-4933-8cbd-54782bf80cea\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"9116f36e-8338-4a2f-8afb-c1d80f470921\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"d17b23b6-fa71-4747-91a9-4e8e33b78464\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"e9356607-910d-48de-8329-daa9773b2d54\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"b61160f0-8b2b-485d-847a-03686e273cc7\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"callback\":null,\"data\":{},\"selected\":{\"id\":\"4a894c0f-6a0e-4f55-ab21-a181f5f7b42c\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"7e5cb271-5883-4ab3-b7aa-1ce891d15585\",\"type\":\"UnionRenderers\"}},\"id\":\"966b04e0-987d-4922-8f33-be9d9f67c327\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[1,2,3,4,5,6,7,8,9,10],\"y\":{\"__ndarray__\":\"AAAAAAAAUD////////+FPwIAAAAAAKw//v//////xT8CAAAAACDYP///////7+M/AAAAAACA6j8AAAAAAEDuPwAAAAAAqO8/AAAAAAD47z8=\",\"dtype\":\"float64\",\"shape\":[10]}},\"selected\":{\"id\":\"d17b23b6-fa71-4747-91a9-4e8e33b78464\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"f0f9670b-627d-4296-b830-64e685ccb87a\",\"type\":\"UnionRenderers\"}},\"id\":\"5a8f75a5-bcb6-4374-86e9-4d6bb3ae376e\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"below\":[{\"id\":\"d45d9d1a-5118-46b8-a7f3-6cf41b91f91c\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"35c3c31b-5cd5-4930-afcd-0d3a011a9169\",\"type\":\"LinearAxis\"}],\"plot_height\":250,\"plot_width\":350,\"renderers\":[{\"id\":\"d45d9d1a-5118-46b8-a7f3-6cf41b91f91c\",\"type\":\"LinearAxis\"},{\"id\":\"2f66f274-c4b8-44b7-b1cf-84e78c4b43d0\",\"type\":\"Grid\"},{\"id\":\"35c3c31b-5cd5-4930-afcd-0d3a011a9169\",\"type\":\"LinearAxis\"},{\"id\":\"403ff54d-73b2-4e9c-9e76-d513af8c24c0\",\"type\":\"Grid\"},{\"id\":\"cecbd523-02fe-4e78-9112-882bb2f4fa2f\",\"type\":\"BoxAnnotation\"},{\"id\":\"9d22d4de-4bd2-4f8b-8693-80478c64a3f7\",\"type\":\"GlyphRenderer\"},{\"id\":\"7c0b387a-fe48-43d9-a5da-ba179fd5773a\",\"type\":\"GlyphRenderer\"},{\"id\":\"594809fe-0b6d-4632-8b26-c72742d0c0d4\",\"type\":\"GlyphRenderer\"},{\"id\":\"6b0e57fd-325d-4766-850e-685c41822011\",\"type\":\"GlyphRenderer\"},{\"id\":\"3167995a-578c-43ec-b342-e22e266c6f30\",\"type\":\"GlyphRenderer\"},{\"id\":\"c3070052-9f4f-4be2-920c-a8a23234529c\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"156246a0-3d90-45b1-9f09-9576cb33bb11\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"c8d6cb93-20ae-46cc-beb8-17f341fd7f59\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"0a789983-315b-49b3-a2e9-9ba7bf446852\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"eea03589-fda6-480e-b4f2-6603f4dc55a2\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"e162c9d6-e91b-4ae1-8c56-bcae1f670892\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"ee311861-9a5e-4acc-86e6-603c5e94dd0c\",\"type\":\"LinearScale\"}},\"id\":\"9b7364ed-7010-4591-930c-d9fbf4b59fad\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"angle\":{\"units\":\"rad\",\"value\":0},\"length\":{\"units\":\"data\",\"value\":0},\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x\":{\"value\":10},\"y\":{\"value\":1}},\"id\":\"4661ff4d-c14f-4225-8ebc-3b4a314dbcf4\",\"type\":\"Ray\"},{\"attributes\":{},\"id\":\"eea03589-fda6-480e-b4f2-6603f4dc55a2\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"ee311861-9a5e-4acc-86e6-603c5e94dd0c\",\"type\":\"LinearScale\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"ff65b444-89b7-4284-840a-cdc4c1ff1b86\",\"type\":\"Circle\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"156246a0-3d90-45b1-9f09-9576cb33bb11\",\"type\":\"Title\"},{\"attributes\":{\"fill_color\":{\"value\":\"white\"},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"a5d013d5-f04d-46f8-8533-48353adb3857\",\"type\":\"Circle\"},{\"attributes\":{\"callback\":null,\"data\":{},\"selected\":{\"id\":\"1983848b-247c-47d5-966c-ee1784c2f154\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"18ccf8c8-2b83-4a3b-8add-0d03209cb83c\",\"type\":\"UnionRenderers\"}},\"id\":\"3804d5cb-e388-4521-837e-738b69d01811\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"source\":{\"id\":\"3804d5cb-e388-4521-837e-738b69d01811\",\"type\":\"ColumnDataSource\"}},\"id\":\"a3397548-fecd-4ccf-9030-df09b31abbb4\",\"type\":\"CDSView\"},{\"attributes\":{\"axis_label\":\"n\",\"formatter\":{\"id\":\"37570dc1-cdaf-44ec-acb6-fa70b3beda96\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"9b7364ed-7010-4591-930c-d9fbf4b59fad\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"ad96c61d-bbe5-43a5-b7de-3d6a3b836602\",\"type\":\"BasicTicker\"}},\"id\":\"d45d9d1a-5118-46b8-a7f3-6cf41b91f91c\",\"type\":\"LinearAxis\"},{\"attributes\":{\"angle\":{\"units\":\"rad\",\"value\":0},\"length\":{\"units\":\"data\",\"value\":0},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"line_width\":{\"value\":2},\"x\":{\"value\":10},\"y\":{\"value\":1}},\"id\":\"e9fa06a4-3269-4a92-8568-c64adde34fc7\",\"type\":\"Ray\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"61440e8f-3697-4925-ab2b-6610736a34a7\",\"type\":\"Circle\"},{\"attributes\":{\"fill_color\":{\"value\":\"white\"},\"line_color\":{\"value\":\"#1f77b4\"},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"8b9786b9-e255-422e-82fc-3c4e926e548e\",\"type\":\"Circle\"},{\"attributes\":{\"callback\":null,\"data\":{\"x0\":[0,1,2,3,4,5,6,7,8,9],\"x1\":[1,2,3,4,5,6,7,8,9,10],\"y0\":{\"__ndarray__\":\"AAAAAAAAUD////////+FPwIAAAAAAKw//v//////xT8CAAAAACDYP///////7+M/AAAAAACA6j8AAAAAAEDuPwAAAAAAqO8/AAAAAAD47z8=\",\"dtype\":\"float64\",\"shape\":[10]},\"y1\":{\"__ndarray__\":\"AAAAAAAAUD////////+FPwIAAAAAAKw//v//////xT8CAAAAACDYP///////7+M/AAAAAACA6j8AAAAAAEDuPwAAAAAAqO8/AAAAAAD47z8=\",\"dtype\":\"float64\",\"shape\":[10]}},\"selected\":{\"id\":\"271b18e0-1d76-4c2e-938b-5c4988cdcd88\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"53d043c4-c582-446f-92d2-f9864c2323a9\",\"type\":\"UnionRenderers\"}},\"id\":\"b950ad54-44ad-4464-96c8-3c602be857fa\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0,1,2,3,4,5,6,7,8,9,10],\"y\":{\"__ndarray__\":\"AAAAAAAAUD////////+FPwIAAAAAAKw//v//////xT8CAAAAACDYP///////7+M/AAAAAACA6j8AAAAAAEDuPwAAAAAAqO8/AAAAAAD47z8AAAAAAADwPw==\",\"dtype\":\"float64\",\"shape\":[11]}},\"selected\":{\"id\":\"e9356607-910d-48de-8329-daa9773b2d54\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"b61160f0-8b2b-485d-847a-03686e273cc7\",\"type\":\"UnionRenderers\"}},\"id\":\"9b7fe33f-2390-48f7-8401-c10421e839cd\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null},\"id\":\"0a789983-315b-49b3-a2e9-9ba7bf446852\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0],\"y\":[0]},\"selected\":{\"id\":\"3cf80d63-e479-4933-8cbd-54782bf80cea\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"d8e9a393-bec1-49dc-8c3a-1ba856417b0b\",\"type\":\"UnionRenderers\"}},\"id\":\"5da197a2-1efc-4198-b82b-c706cbc74160\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"a28f5868-867f-4ec4-8f74-db53b63a0f2e\",\"type\":\"PanTool\"},{\"id\":\"d6455e2f-7355-4ec9-80ae-b2e7d309ff03\",\"type\":\"WheelZoomTool\"},{\"id\":\"9b887c63-dc15-48a8-af26-455a5357c12a\",\"type\":\"BoxZoomTool\"},{\"id\":\"efb3a078-4ccb-41f4-aefb-ffba46062ec0\",\"type\":\"SaveTool\"},{\"id\":\"06b53a24-399e-420b-a3ca-51013d1d90a5\",\"type\":\"ResetTool\"},{\"id\":\"9116f36e-8338-4a2f-8afb-c1d80f470921\",\"type\":\"HelpTool\"}]},\"id\":\"c8d6cb93-20ae-46cc-beb8-17f341fd7f59\",\"type\":\"Toolbar\"},{\"attributes\":{\"source\":{\"id\":\"966b04e0-987d-4922-8f33-be9d9f67c327\",\"type\":\"ColumnDataSource\"}},\"id\":\"eaf41819-d768-4cbf-a0b9-1deee25519ca\",\"type\":\"CDSView\"},{\"attributes\":{\"plot\":{\"id\":\"9b7364ed-7010-4591-930c-d9fbf4b59fad\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"ad96c61d-bbe5-43a5-b7de-3d6a3b836602\",\"type\":\"BasicTicker\"}},\"id\":\"2f66f274-c4b8-44b7-b1cf-84e78c4b43d0\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null},\"id\":\"e162c9d6-e91b-4ae1-8c56-bcae1f670892\",\"type\":\"DataRange1d\"},{\"attributes\":{\"axis_label\":\"F(n; 10, 0.5)\",\"formatter\":{\"id\":\"9ca30144-f163-4f4b-9948-3447725b3205\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"9b7364ed-7010-4591-930c-d9fbf4b59fad\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"9503391e-9e21-4dc6-934e-eccd3292997f\",\"type\":\"BasicTicker\"}},\"id\":\"35c3c31b-5cd5-4930-afcd-0d3a011a9169\",\"type\":\"LinearAxis\"},{\"attributes\":{\"data_source\":{\"id\":\"3804d5cb-e388-4521-837e-738b69d01811\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"074883d5-5fb1-47f7-bfc1-51329e30ca7c\",\"type\":\"Ray\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"6647234d-d5b9-4d16-b313-0cb100a90eab\",\"type\":\"Ray\"},\"selection_glyph\":null,\"view\":{\"id\":\"a3397548-fecd-4ccf-9030-df09b31abbb4\",\"type\":\"CDSView\"}},\"id\":\"7c0b387a-fe48-43d9-a5da-ba179fd5773a\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"source\":{\"id\":\"5da197a2-1efc-4198-b82b-c706cbc74160\",\"type\":\"ColumnDataSource\"}},\"id\":\"e76ef1b6-86ba-4d4c-b422-b1da726cfd9e\",\"type\":\"CDSView\"},{\"attributes\":{\"data_source\":{\"id\":\"5da197a2-1efc-4198-b82b-c706cbc74160\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"8b9786b9-e255-422e-82fc-3c4e926e548e\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"61440e8f-3697-4925-ab2b-6610736a34a7\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"e76ef1b6-86ba-4d4c-b422-b1da726cfd9e\",\"type\":\"CDSView\"}},\"id\":\"6b0e57fd-325d-4766-850e-685c41822011\",\"type\":\"GlyphRenderer\"}],\"root_ids\":[\"9b7364ed-7010-4591-930c-d9fbf4b59fad\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"ced3b931-9c64-4b56-9e67-8375e5010d72\",\"roots\":{\"9b7364ed-7010-4591-930c-d9fbf4b59fad\":\"72218366-ce0d-4daf-89c8-f620f6b468b4\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "9b7364ed-7010-4591-930c-d9fbf4b59fad" } }, "output_type": "display_data" } ], "source": [ "x = np.arange(0, 11)\n", "y = st.binom.cdf(x, 10, 0.5)\n", "\n", "p = bokeh.plotting.figure(plot_height=250,\n", " plot_width=350,\n", " x_axis_label='n',\n", " y_axis_label='F(n; 10, 0.5)')\n", "p.segment(x[:-1], y[:-1], x[1:], y[:-1], line_width=2)\n", "p.ray(0, 0, angle=np.pi, length=0, line_width=2)\n", "p.ray(x[-1], 1, angle=0, length=0, line_width=2)\n", "p.circle([0], [0], fill_color='white')\n", "p.circle(x[1:], y[:-1], fill_color='white')\n", "p.circle(x, y)\n", "bokeh.io.show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, since it is understood that the CDF is not multivalued, there should be no ambiguity in plotting the staircase, and indeed staircase style CDFs are commonly used. The staircase has less clutter and I find it is easier to look at and interpret. Furthemore, we know that all CDFs extend toward $x=-\\infty$ with a value of zero and toward $x=\\infty$ with a value of one. So, again, there is no ambiguity in cutting off the infinitely long tails of the CDF." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discrete distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bernoulli distribution\n", "\n", "* **Story.** A *Bernoulli trial* is an experiment that has two outcomes that can be encoded as success ($y=1$) or failure ($y = 0$). The result $y$ of a Bernoulli trial is Bernoulli distributed.\n", "\n", "* **Example.** Check to see if a given bacterium is competent, given that it has probability $\\theta$ of being competent.\n", "\n", "* **Parameter.** The Bernoulli distribution is parametrized by a\n", "single value, $\\theta$, the probability that the trial is successful.\n", "\n", "* **Support.**\n", "The Bernoulli distribution may be nonzero only for zero and one.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "f(y;\\theta) = \\left\\{ \\begin{array}{ccc}\n", "1-\\theta & & y = 0 \\\\[0.5em]\n", "\\theta & & y = 1.\n", "\\end{array}\n", "\\right.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.choice([0, 1], p=[1-theta, theta])` |\n", "|**SciPy** | `scipy.stats.bernoulli(theta)` |\n", "|**Stan** | `bernoulli(theta)` |\n", "\n", "* **Related distributions.** \n", " - The Bernoulli distribution is a special case of the [Binomial distribution](#Binomial-distribution) with $N=1$." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "53803458c9ad4733a4036d2fd12d0ebd" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=1,\n", " scipy_dist=st.bernoulli,\n", " params=params,\n", " x_axis_label='y',\n", " title='Bernoulli')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometric distribution\n", "\n", "* **Story.** We perform a series of Bernoulli trials with probability of success $\\theta$ until we\n", "get a success. The number of failures $y$ before the success is Geometrically distributed.\n", "\n", "* **Example.** Consider actin polymerization. At each time\n", "step, an actin monomer may add to the filament (\"failure\"), or an\n", "actin monomer may fall off (\"success\") with (usually very low)\n", "probability $\\theta$. The length of actin filaments are Geometrically\n", "distributed.\n", "\n", "* **Parameter.** The Geometric distribution is parametrized by a\n", "single value, $\\theta$, the probability that the Bernoulli trial is\n", "successful.\n", "\n", "* **Support.** The Geometric distribution, as defined here, is has support on the nonnegative integers.\n", "\n", "* **Probability mass function.**\n", "\n", "\\begin{align}\n", "f(y;\\theta) = (1-\\theta)^y \\, \\theta.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.geometric(theta)` |\n", "|**SciPy** | `scipy.stats.geom(theta, loc=-1)` |\n", "|**Stan** | `neg_binomial(1, theta/(1-theta))` |\n", "\n", "\n", "\n", "* **Related distributions.**\n", " - The Geometric distribution is a special case of the [Negative Binomial distribution](#Negative-Binomial-distribution) in which $\\alpha=1$ and $\\theta = \\beta/(1+\\beta)$.\n", " - The continuous analog of the Geometric distribution is the [Exponential distribution](#Exponential-distribution). \n", "\n", "* **Notes.** \n", " - The Geometric distribution is supported on non-negative integer $y$.\n", " - The Geometric distribution is not implemented in Stan. You can instead use a Negative Binomial distribution fixing the parameter $\\alpha$ to be unity and relating the parameter $\\beta$ of the Negative Binomial distribution to $\\theta$ as $\\theta = \\beta/(1+\\beta)$.\n", " - The Geometric distribution is defined differently in SciPy, instead replacing $y$ with $y-1$. In SciPy's parametrization the Geometric distribution describes the number of successive Bernoulli *trials* (not just the failures; the success is included) necessary to get a success. To adjust for this, we use the `loc=-1` kwarg." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "94449cdc47e6491cab1ada3268fd00ea" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=20,\n", " scipy_dist=st.geom,\n", " params=params,\n", " transform=lambda theta: (theta, -1),\n", " x_axis_label='y',\n", " title='Geometric')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Negative Binomial distribution\n", "\n", "* **Story.** We perform a series of Bernoulli trials. The number of failures, $y$, before we get $\\alpha$\n", "successes is Negative Binomially distributed. An equivalent story is that the sum of $\\alpha$ independent and identically Gamma distributed variables is Negative Binomial distributed.\n", "\n", "* **Example.** Bursty gene expression can give mRNA count\n", "distributions that are Negative Binomially distributed. Here,\n", "\"success\" is that a burst in gene expression stops. So, the\n", "parameter $\\beta$ is the mean length of a burst in expression. The parameter $\\alpha$ is related to the\n", "frequency of the bursts. If multiple bursts are possible within the\n", "lifetime of mRNA, then $\\alpha > 1$. Then, the number of \"failures\" is\n", "the number of mRNA transcripts that are made in the characteristic\n", "lifetime of mRNA.\n", "\n", "* **Parameters.** There are two parameters: $\\alpha$, the desired number of\n", "successes, and $\\beta$, which is the mean of the $\\alpha$ identical Gamma distributions that give the Negative Binomial. The probability of success of each Bernoulli trial is given by $\\beta/(1+\\beta)$.\n", "\n", "* **Support.** The Negative-Binomial distribution is supported on the set of nonnegative integers.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "\\\\ \\phantom{blah}\n", "f(y;\\alpha,\\beta) = \\begin{pmatrix}\n", "y+\\alpha-1 \\\\\n", "\\alpha-1\n", "\\end{pmatrix}\n", "\\left(\\frac{\\beta}{1+\\beta}\\right)^\\alpha \\left(\\frac{1}{1+\\beta}\\right)^y.\n", "\\\\ \\phantom{blah}\n", "\\end{align}\n", "Here, we use a combinatorial notation;\n", "\\begin{align}\n", "\\\\ \\phantom{blah}\n", " \\begin{pmatrix}\n", "y+\\alpha-1 \\\\\n", "\\alpha-1\n", "\\end{pmatrix} = \\frac{(y+\\alpha-1)!}{(\\alpha-1)!\\,y!}.\n", "\\\\ \\phantom{blah}\n", "\\end{align}\n", "Generally speaking, $\\alpha$ need not be an integer, so we may write the PMF as\n", "\\begin{align}\n", "\\\\ \\phantom{blah}\n", "f(y;\\alpha,\\beta) = \\frac{\\Gamma(y+\\alpha)}{\\Gamma(\\alpha) \\, y!}\\,\\left(\\frac{\\beta}{1+\\beta}\\right)^\\alpha \\left(\\frac{1}{1+\\beta}\\right)^y.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.negative_binomial(alpha, beta/(1+beta))` |\n", "|**SciPy** | `scipy.stats.nbinom(alpha, beta/(1+beta))` |\n", "|**Stan** | `neg_binomial(alpha, beta)` |\n", "|**Stan with $(\\mu, \\phi)$ parametrization** | `neg_binomial_2(mu, phi)` |\n", "\n", "* **Related distributions.**\n", " - The [Geometric distribution](#Geometric-distribution) is a special case of the Negative Binomial distribution in which $\\alpha=1$ and $\\theta = \\beta/(1+\\beta)$.\n", " - The continuous analog of the Negative Binomial distribution is the [Gamma distribution](#Gamma-distribution). \n", " - In a certain limit, which is easier implemented using the $(\\mu,\\phi)$ parametrization below, the Negative Binomial distribution becomes a [Poisson distribution](#Poisson-distribution).\n", "\n", "* **Notes.** \n", " - The Negative Binomial distribution may be parametrized such that the probability mass function is\n", " \\begin{align}\n", " \\\\ \\phantom{blah}\n", " f(y;\\mu,\\phi) = \\frac{\\Gamma(y+\\phi)}{\\Gamma(\\phi) \\, y!}\\,\\left(\\frac{\\phi}{\\mu+\\phi}\\right)^\\phi\\left(\\frac{\\mu}{\\mu+\\phi}\\right)^y. \n", " \\\\ \\phantom{blah}\n", " \\end{align}\n", " These parameters are related to the parametrization above by $\\phi = \\alpha$ and $\\mu = \\alpha/\\beta$. In the limit of $\\phi\\to\\infty$, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter $\\mu$. This also gives meaning to the parameters $\\mu$ and $\\phi$. $\\mu$ is the mean of the Negative Binomial, and $\\phi$ controls extra width of the distribution beyond Poisson. The smaller $\\phi$ is, the broader the distribution.\n", " - In Stan, the Negative Binomial distribution using the $(\\mu,\\phi)$ parametrization is called `neg_binomial_2`.\n", " - SciPy and NumPy use yet another parametrization. The PMF for SciPy is\n", " \\begin{align}\n", " \\\\ \\phantom{blah}\n", " f(y;n, p) = \\frac{\\Gamma(y+n)}{\\Gamma(n) \\, y!}\\,p^n \\left(1-p\\right)^y. \n", " \\\\ \\phantom{blah}\n", " \\end{align}\n", " The parameter $p$ is the probability of success of a Bernoulli trial. The parameters are related to the others we have defined by $n=\\alpha=\\phi$ and $p=\\beta/(1+\\beta) = \\phi/(\\mu+\\phi)$." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "0e9a927566a847d1ad4b6bdcd6c34e34" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=1, end=20, value=5, step=1),\n", " dict(name='β', start=0, end=5, value=1, step=0.01)]\n", "\n", "app = distribution_plot_app(x_min=0,\n", " x_max=50,\n", " scipy_dist=st.nbinom,\n", " params=params,\n", " transform=lambda alpha, beta: (alpha, beta/(1+beta)),\n", " x_axis_label='y',\n", " title='Negative Binomial')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binomial distribution\n", "\n", "* **Story.** We perform $N$ Bernoulli trials, each with probability $\\theta$ of success. The number of successes, $n$, is Binomially distributed.\n", "\n", "* **Example.** Distribution of plasmids between daughter cells\n", "in cell division. Each of the $N$ plasmids as a chance $\\theta$ of being\n", "in daughter cell 1 (\"success\"). The number of plasmids, $n$, in\n", "daughter cell 1 is binomially distributed.\n", "\n", "* **Parameters.** There are two parameters: the probability $\\theta$ of success for each Bernoulli trial, and the number of trials, $N$.\n", "\n", "* **Support.** The Binomial distribution is supported on the set of nonnegative integers.\n", "\n", "* **Probability mass function.**\n", "\n", "\\begin{align}\n", "f(n;N,\\theta) = \\begin{pmatrix}\n", "N \\\\\n", "n\n", "\\end{pmatrix}\n", "\\theta^n (1-\\theta)^{N-n}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.binomial(N, theta)` |\n", "|**SciPy** | `scipy.stats.binom(N, theta)` |\n", "|**Stan** | `binomial(N, theta)` |\n", "\n", "\n", "* **Related distributions.**\n", " - The [Bernoulli distribution](#Bernoulli-distribution) is a special case of the Binomial distribution where $N=1$.\n", " - In the limit of $N\\to\\infty$ and $\\theta\\to 0$ such that the quantity $N\\theta$ is fixed, the Binomial distribution becomes a [Poisson distribution](#Poisson-distribution) with parameter $N\\theta$.\n", " - The Binomial distribution is a limit of the [Hypergeometric distribution](#Hypergeometric-distribution). Considering the Hypergeometric distribution and taking the limit of $a+b\\to\\infty$ such that $a/(a+b)$ is fixed, we get a [Binomial distribution](#Binomial-distribution) with parameters $N=N$ and $\\theta = a/(a+b)$." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "5b531e4dcb76442a81e573628dd95e98" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='N', start=1, end=20, value=5, step=1),\n", " dict(name='θ', start=0, end=1, value=0.5, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=20,\n", " scipy_dist=st.binom,\n", " params=params,\n", " x_axis_label='n',\n", " title='Binomial')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poisson distribution\n", "\n", "* **Story.** Rare events occur with a rate $\\lambda$ per unit\n", "time. There is no \"memory\" of previous events; i.e., that rate is\n", "independent of time. A process that generates such events is called a *Poisson process*. The occurrence of a rare event in this context is referred to as an *arrival*. The number $n$ of arrivals in unit time is Poisson distributed.\n", "\n", "* **Example.** The number of mutations in a strand of DNA per\n", "unit length (since mutations are rare) are Poisson distributed.\n", "\n", "* **Parameter.** The single parameter is the rate $\\lambda$ of\n", "the rare events occurring.\n", "\n", "* **Support.** The Poisson distribution is supported on the set of nonnegative integers.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "f(n;\\lambda) = \\frac{\\lambda^n}{n!}\\,\\mathrm{e}^{-\\lambda}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.poisson(lam)` |\n", "|**SciPy** | `scipy.stats.poisson(lam)` |\n", "|**Stan** | `poisson(lam)` |\n", "\n", "\n", "* **Related distributions.**\n", " - In the limit of $N\\to\\infty$ and $\\theta\\to 0$ such that the quantity $N\\theta$ is fixed, the [Binomial distribution](#Binomial-distribution) becomes a Poisson distribution with parameter $N\\theta$. Thus, for large $N$ and small $\\theta$,\n", " \\begin{align}\n", " \\\\ \\phantom{blah}\n", " f_\\mathrm{Poisson}(n;\\lambda) \\approx f_\\mathrm{Binomial}(n;N, \\theta),\n", " \\\\ \\phantom{blah}\n", " \\end{align}\n", " with $\\lambda = N\\theta$. Considering the biological example of mutations,\n", " this is Binomially distributed: There are $N$ bases, each with a\n", " probability $\\theta$ of mutation, so the number of mutations, $n$ is\n", " binomially distributed. Since $\\theta$ is small and $N$ is large, it is approximately\n", " Poisson distributed.\n", " - Under the ($\\mu, \\phi$) parametrization of the [Negative Binomial distribution](#Negative-Binomial-distribution), taking the limit of large $\\phi$ yields the Poisson distribution." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "5c4b7fa4608a4b4fa84e93128c4cd4d1" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='λ', start=1, end=20, value=5, step=0.1)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=40,\n", " scipy_dist=st.poisson,\n", " params=params,\n", " x_axis_label='n',\n", " title='Poisson')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hypergeometric distribution\n", "\n", "* **Story.** Consider an urn with $a$ white balls and $b$ black\n", "balls. Draw $N$ balls from this urn without replacement. The number\n", "white balls drawn, $n$, is Hypergeometrically distributed.\n", "\n", "* **Example.** There are $a+b$ finches on an island, and $a$ of\n", "them are tagged (and therefore $b$ of them are untagged). You capture $N$ finches. The number of tagged\n", "finches $n$ is Hypergeometrically distributed, $f(n;N, a, b)$,\n", "as defined below.\n", "\n", "* **Parameters.** There are three parameters: the number of\n", "draws $N$, the number of white balls $a$, and the number of black\n", "balls $b$.\n", "\n", "* **Support.** The Hypergeometric distribution is supported on the set of integers between $\\max(0, N-b)$ and $\\min(N, a)$, inclusive.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "f(n;N, a, b) = \\frac{\\begin{pmatrix}a\\\\n\\end{pmatrix}\\begin{pmatrix}b\\\\N-n\\end{pmatrix}}\n", "{\\begin{pmatrix}a+b\\\\N\\end{pmatrix}}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: ------------- | :------------- |\n", "|**NumPy** | `np.random.hypergeometric(a, b, N)` |\n", "|**SciPy** | `scipy.stats.hypergeom(a+b, a, N)` |\n", "|**Stan** | `hypergeometric(N, a, b)` |\n", "\n", "\n", "* **Related distributions.** \n", " - In the limit of $a+b\\to\\infty$ such that $a/(a+b)$ is fixed, we get a [Binomial distribution](#Binomial-distribution) with parameters $N=N$ and $\\theta = a/(a+b)$.\n", "\n", "* **Notes.** \n", " - This distribution is analogous to the [Binomial distribution](#Binomial-distribution), except that the Binomial distribution describes draws from an urn *with* replacement. In the analogy, $p = a/(a+b)$.\n", " - SciPy uses a different parametrization. Let $M = a+b$ be the total number of balls in the urn. Then, noting the order of the parameters, since this is what `scipy.stats.hypergeom` expects,\n", " \\begin{align}\n", " \\\\ \\phantom{blah}\n", " f(n;M, a, N) = \\frac{\\begin{pmatrix}a\\\\n\\end{pmatrix}\\begin{pmatrix}M-a\\\\N-n\\end{pmatrix}}\n", " {\\begin{pmatrix}M\\\\n\\end{pmatrix}}.\n", " \\\\ \\phantom{blah}\n", " \\end{align}\n", " - The random number generator in `numpy.random` has a different parametrization than in the `scipy.stats` module. The `numpy.random.hypergeom()` function uses the same parametrization as Stan, except the parameters are given in the order `a, b, N`, not `N, a, b`, as in Stan. \n", " - When using the sliders below, you will only get a plot if $N \\le a+b$ because you cannot draw more balls out of the urn than are actually in there." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "c606e5ab098e4439946859ee0d18b924" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='N', start=1, end=20, value=10, step=1),\n", " dict(name='a', start=1, end=20, value=10, step=1),\n", " dict(name='b', start=1, end=20, value=10, step=1)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=40,\n", " scipy_dist=st.hypergeom,\n", " params=params,\n", " transform=lambda N, a, b: (a+b, a, N),\n", " x_axis_label='n',\n", " title='Hypergeometric')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Categorical distribution\n", "\n", "* **Story.** A probability is assigned to each of a set of discrete outcomes.\n", "\n", "* **Example.** A hen will peck at grain A with probability $\\theta_A$, grain B with probability $\\theta_B$, and grain C with probability $\\theta_C$.\n", "\n", "* **Parameters.** The distribution is parametrized by the probabilities assigned to each event. We define $\\theta_y$ to be the probability assigned to outcome $y$. The set of $\\theta_y$'s are the parameters, and are constrained by\n", "\n", "\\begin{align}\n", "\\sum_y \\theta_y = 1.\n", "\\end{align}\n", "\n", "* **Support.** If we index the categories with sequential integers from 1 to *N*, the distribution is supported for integers 1 to _N_, inclusive.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "f(y;\\{\\theta_y\\}) = \\theta_y.\n", "\\end{align}\n", "\n", "* **Usage (with `theta` length `n`)**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.choice(len(theta), p=theta)` |\n", "|**SciPy** | `scipy.stats.rv_discrete(values=(range(len(theta)), theta)).rvs()` |\n", "|**Stan** | `categorical(theta)` |\n", "\n", "* **Related distributions.** \n", " - The [Discrete Uniform distribution](#Discrete-Uniform-distribution) is a special case where all $\\theta_y$ are equal.\n", " - The Bernoulli distribution is a special case where there are two categories that can be encoded as having outcomes of zero or one. In this case, the parameter for the Bernoulli distribution is $\\theta = \\theta_0 = 1-\\theta_1$.\n", "\n", "\n", "* **Notes.** \n", " - This distribution must be manually constructed if you are using the `scipy.stats` module using `scipy.stats.rv_discrete()`. The categories need to be encoded by an index. For interactive plotting purposes, below, we need to specify a custom PMF and CDF.\n", " - To sample out of a Categorical distribution, use `numpy.random.choice()`, specifying the values of $\\theta$ using the `p` kwarg." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "59c40536dcf24121825f9ed9df7801a8" } }, "output_type": "display_data" } ], "source": [ "def categorical_pmf(x, θ1, θ2, θ3):\n", " thetas = np.array([θ1, θ2, θ3, 1-θ1-θ2-θ3])\n", " if (thetas < 0).any():\n", " return np.array([np.nan]*len(x))\n", " return thetas[x-1]\n", "\n", "def categorical_cdf_indiv(x, thetas):\n", " if x < 1:\n", " return 0\n", " elif x >= 4:\n", " return 1\n", " else:\n", " return np.sum(thetas[:int(x)])\n", " \n", "def categorical_cdf(x, θ1, θ2, θ3):\n", " thetas = np.array([θ1, θ2, θ3, 1-θ1-θ2-θ3])\n", " if (thetas < 0).any():\n", " return np.array([np.nan]*len(x))\n", "\n", " return np.array([categorical_cdf_indiv(x_val, thetas) for x_val in x])\n", "\n", "params = [dict(name='θ1', start=0, end=1, value=0.2, step=0.01),\n", " dict(name='θ2', start=0, end=1, value=0.3, step=0.01),\n", " dict(name='θ3', start=0, end=1, value=0.1, step=0.01)]\n", "app = distribution_plot_app(x_min=1,\n", " x_max=4,\n", " custom_pmf=categorical_pmf,\n", " custom_cdf=categorical_cdf,\n", " params=params,\n", " x_axis_label='category',\n", " title='Discrete categorical')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Discrete Uniform distribution\n", "\n", "* **Story.** A set of discrete outcomes that can be indexed with sequential integers each has equal probability, like rolling a fair die.\n", "\n", "* **Example.** A monkey can choose from any of $n$ bananas with equal probability.\n", "\n", "* **Parameters.** The distribution is parametrized by the high and low allowed values.\n", "\n", "* **Support.** The Discrete Uniform distribution is supported on the set of integers ranging from $y_\\mathrm{low}$ to $y_\\mathrm{high}$, inclusive.\n", "\n", "* **Probability mass function.**\n", "\\begin{align}\n", "f(y;y_\\mathrm{low}, y_\\mathrm{high}) = \\frac{1}{y_\\mathrm{high} - y_\\mathrm{low} + 1}\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.randint(low, high+1)` |\n", "|**SciPy** | `scipy.stats.randint(low, high+1)` |\n", "|**Stan** | `categorical(theta)`, `theta` array with all equal values |\n", "\n", "* **Related distributions.** \n", " - The [Discrete Uniform distribution](#Discrete-Uniform-distribution) is a special case of the [Categorical distribution](#Categorical-distribution) where all $\\theta_y$ are equal.\n", "\n", "* **Notes.** \n", " - This distribution is not included in Stan. Instead, use a [Categorical distribution](#Categorical-distribution) with equal probailities.\n", " - In SciPy, this distribution is know as `scipy.stats.randint`. The `high` parameter is not inclusive; i.e., the set of allowed values includes the `low` parameter, but not the `high`. The same is true for `numpy.random.randint()`, which is used for sampling out of this distribution." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "bbc4b4965ec546b0b22ed860a80d39e2" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='low', start=0, end=10, value=0, step=1),\n", " dict(name='high', start=0, end=10, value=10, step=1)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=10,\n", " scipy_dist=st.randint,\n", " params=params,\n", " transform=lambda low, high: (low, high+1),\n", " x_axis_label='y',\n", " title='Discrete continuous')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Continuous distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Uniform distribution\n", "\n", "* **Story.** Any outcome in a given range has equal probability.\n", "\n", "* **Example.** Anything in which all possibilities are equally\n", "likely. This is, perhaps surprisingly, rarely encountered.\n", "\n", "* **Parameters.** The Uniform distribution is not defined on an\n", "infinite or semi-infinite domain, so lower and upper bounds, $\\alpha$ and\n", "$\\beta$, respectively, are necessary parameters.\n", "\n", "* **Support.** The Uniform distribution is supported on the interval $[\\alpha, \\beta]$.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\alpha, \\beta) = \\left\\{ \\begin{array}{ccc}\n", "\\frac{1}{\\beta - \\alpha} & & \\alpha \\le y \\le \\beta \\\\[0.5em]\n", "0 & & \\text{otherwise.}\n", "\\end{array}\n", "\\right.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.uniform(alpha, beta)` |\n", "|**SciPy** | `scipy.stats.uniform(alpha, beta)` |\n", "|**Stan** | `uniform(alpha, beta)` |\n", "\n", "\n", "\n", "* **Related distributions.** \n", " - The Uniform distribution on the interval [0, 1] (i.e., $\\alpha=0$ and $\\beta=1$) is a special case of the [Beta distribution](#Beta-distribution) where the parameters for the Beta distribution are $\\alpha=\\beta=1$ (not to be confused with the $\\alpha$ and $\\beta$ used to parametrize the Uniform distribution)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "b575a13fe822460e820b32de263029f0" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=-2, end=3, value=0, step=0.01),\n", " dict(name='β', start=-2, end=3, value=1, step=0.01)]\n", "app = distribution_plot_app(x_min=-2,\n", " x_max=3,\n", " scipy_dist=st.uniform,\n", " params=params,\n", " title='Uniform')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gaussian, a.k.a. Normal, distribution\n", "\n", "* **Story.** Any quantity that emerges as the sum of a large number of\n", "subprocesses tends to be Gaussian distributed provided none of the\n", "subprocesses is very broadly distributed.\n", "\n", "* **Example.** We measure the length of many *C. elegans*\n", "eggs. The lengths are Gaussian distributed. Many biological measurements, like the height of people, are (approximately) Gaussian distributed. Many processes contribute to setting the length of an egg or the height of a person.\n", "\n", "* **Parameters.** The Gaussian distribution has two parameters,\n", "the mean $\\mu$, which determines the location of its peak, and the\n", "standard deviation $\\sigma$, which is strictly positive (the\n", "$\\sigma\\to 0$ limit defines a Dirac delta function) and determines the\n", "width of the peak.\n", "\n", "* **Support.** The Gaussian distribution is supported on the set of real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\mu, \\sigma) = \\frac{1}{\\sqrt{2\\pi \\sigma^2}}\\,\\mathrm{e}^{-(y-\\mu)^2/2\\sigma^2}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.normal(mu, sigma)` |\n", "|**SciPy** | `scipy.stats.norm(mu, sigma)` |\n", "|**Stan** | `normal(mu, sigma)` |\n", "\n", "* **Related distributions.** The Gaussian distribution is a limiting distribution in the sense of the\n", "central limit theorem, but also in that many distributions have a\n", "Gaussian distribution as a limit. This is seen by formally taking\n", "limits of, e.g., the Gamma, Student-t, Binomial distributions, which\n", "allows direct comparison of parameters.\n", "\n", "* **Notes.** \n", " - SciPy, NumPy, and Stan all refer to the Gaussian distribution as the Normal distribution." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "4416fc9a38454a76839096ca994b8c26" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),\n", " dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]\n", "app = distribution_plot_app(x_min=-2,\n", " x_max=2,\n", " scipy_dist=st.norm,\n", " params=params, \n", " x_axis_label='y',\n", " title='Gaussian, a.k.a. Normal')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Half-Normal distribution\n", "\n", "* **Story.** The Half-Normal distribution is a [Gaussian distribution](#Gaussian-distribution) with zero mean truncated to only have nonzero probability for positive real numbers.\n", "\n", "* **Parameters.** Strictly speaking, the Half-Normal is parametrized by a single positive parameter $\\sigma$. We could, however, translate it so that the truncated Gaussian has a maximum at $\\mu$ and only values greater than or equal to $\\mu$ have nonzero probability.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\mu, \\sigma) = \\left\\{\\begin{array}{ccc}\n", "\\frac{2}{\\sqrt{2\\pi \\sigma^2}}\\,\\mathrm{e}^{-(y-\\mu)^2/2\\sigma^2} & & y \\ge \\mu\\\\\n", "0 && \\text{otherwise.}\n", "\\end{array}\\right.\n", "\\end{align}\n", "\n", "* **Support.** The Half-Normal distribution is supported on the set $[\\mu, \\infty)$. Usually, we have $\\mu = 0$, in which case the Half-Normal distribution is supported on the set of nonnegative real numbers.\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.normal(mu, sigma)` |\n", "|**SciPy** | `scipy.stats.halfnorm(mu, sigma)` |\n", "|**Stan** | `real y; y ~ normal(mu, sigma)` |\n", "\n", "\n", "\n", "* **Notes.** \n", " - In Stan, a Half-Normal is defined by putting bounds on the variable and then using a Normal distribution.\n", " - The Half-Normal distibution is a useful prior for nonnegative parameters that should not be too large and may be very close to zero." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "8d260e181feb427a8aa5920a04e9fa95" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),\n", " dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]\n", "app = distribution_plot_app(x_min=-0.5,\n", " x_max=2,\n", " scipy_dist=st.halfnorm,\n", " params=params, \n", " x_axis_label='y',\n", " title='Half-Normal')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Log-Normal distribution\n", "\n", "* **Story.** If $\\ln y$ is Gaussian distributed, $y$ is\n", "Log-Normally distributed.\n", "\n", "* **Example.** A measure of fold change in gene expression can\n", "be Log-Normally distributed.\n", "\n", "* **Parameters.** As for a Gaussian, there are two parameters,\n", "the mean, $\\mu$, and the variance $\\sigma^2$.\n", "\n", "* **Support.** The Log-Normal distribution is supported on the set of positive real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\mu, \\sigma) = \\frac{1}{y\\sqrt{2\\pi \\sigma^2}}\\,\\mathrm{e}^{-(\\ln y - \\mu)^2/2\\sigma^2}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.lognormal(mu, sigma)` |\n", "|**SciPy** | `scipy.stats.lognorm(x, sigma, loc=0, scale=np.exp(mu))` |\n", "|**Stan** | `lognormal(mu, sigma)` |\n", "\n", "* **Notes.** \n", " - Be careful not to get confused. The Log-Normal distribution describes the distribution of $y$ given that $\\ln y$ is Gaussian distributed. It does not describe the distribution of $\\ln y$.\n", " - The way location, scale, and shape parameters work in SciPy for the Log-Normal distribution is confusing. If you want to specify a Log-Normal distribution as we have defined it using `scipy.stats`, use a shape parameter equal to $\\sigma$, a location parameter of zero, and a scale parameter given by $\\mathrm{e}^\\mu$. For example, to compute the PDF, you would use `scipy.stats.lognorm(x, sigma, loc=0, scale=np.exp(mu))`.\n", " - The definition of the Log_Normal in the `numpy.random` module matches what we have defined above and what is defined in Stan." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "4c7720f5b5834ef19d424f7328b142c3" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='µ', start=0.01, end=0.5, value=0.1, step=0.01),\n", " dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=4,\n", " scipy_dist=st.lognorm,\n", " params=params, \n", " transform=lambda mu, sigma: (sigma, 0, np.exp(mu)),\n", " x_axis_label='y',\n", " title='Log-Normal')\n", "\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Chi-square distribution\n", "\n", "* **Story.** If $X_1$,\n", "$X_2$, $\\ldots$, $X_n$ are Gaussian distributed,\n", "$X_1^2 + X_2^2 + \\cdots + X_n^2$ is $\\chi^2$-distributed. See also the story of the [Gamma distribution](#Gamma-distribution), below.\n", "\n", "* **Example.** The sample variance of $\\nu-1$ independent and identically distributed Gaussian random variables, after scaling, is Chi-square distributed. This is the most common use case of the Chi-square distribution.\n", "\n", "* **Parameters.** There is only one parameter, the degrees of freedom $\\nu$.\n", "\n", "* **Support.** The Chi-square distribution is supported on the positive real numbers.\n", "\n", "* **Probability density function.**\n", "\\begin{align}\n", " f(y;\\nu) \\equiv \\chi^2_n(x;\\nu) = \\frac{1}{2^{\\nu/2}\\,\\Gamma\\left(\\frac{\\nu}{2}\\right)}\\,\n", "x^{\\frac{\\nu}{2}-1}\\,\\mathrm{e}^{-y/2}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.chisquare(nu)` |\n", "|**SciPy** | `scipy.stats.chi2(nu)` |\n", "|**Stan** | `chi_square(nu)` |\n", "\n", "\n", "* **Related distributions.** The Chi-square distribution is a special case of the [Gamma distribution](#Gamma-distribution) with $\\alpha = \\nu/2$ and $\\beta = 1/2$." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "7f6a19966c264edb9c154043f969ca40" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='ν', start=1, end=20, value=10, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=40,\n", " scipy_dist=st.chi2,\n", " params=params, \n", " x_axis_label='y',\n", " title='Chi-square')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Student-t distribution\n", "\n", "* **Story.** The story of the Student-t distribution largely derives from its relationships with other distributions. One way to think about it is as a Gaussian-like distribution with heavier tails.\n", "\n", "* **Parameters.** The Student-t distribution is peaked, and its peak is located at $\\mu$. The peak's width is dictated by parameter $\\sigma$. Finally, we define the \"degrees of freedom\" as $\\nu$. This last parameter imparts the distribution with heavy tails.\n", "\n", "* **Support.** The Student-t distribution is supported on the set of real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\mu, \\sigma, \\nu) = \\frac{\\Gamma\\left(\\frac{\\nu+1}{2}\\right)}{\\Gamma\\left(\\frac{\\nu}{2}\\right)\\sqrt{\\pi \\nu \\sigma^2}}\\,\n", "\\left(1 + \\frac{(y-\\mu)^2}{\\nu\\sigma^2}\\right)^{-\\frac{\\nu+1}{2}}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `mu + sigma * np.random.standard_t(nu)` |\n", "|**SciPy** | `scipy.stats.t(nu, mu, sigma)` |\n", "|**Stan** | `student_t(nu, mu, sigma)` |\n", "\n", "\n", "\n", "* **Related distributions.**\n", " - In the $n\\to \\infty$ limit, the Student-t distribution becomes as [Gaussian distribution](#Gaussian-distribution).\n", " - The [Cauchy distibution](#Cauchy-distribution) is a special case of the Student-t distribution in which $\\nu = 1$.\n", " - Only the standard Student-t distribution ($\\mu=0$ and $\\sigma = 1$) is available in the `numpy.random` module. You can still draw out of the Student-t distribution by performing a transformation on the samples out of the standard Student-t distribution, as shown in the usage, above.\n", " - We get this distribution whenever we marginalize an unknown $\\sigma$ out of a Gaussian distribution with an improper prior on $\\sigma$ of $1/\\sigma$." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "02634637c7784e25a24ffe4073b6b163" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='ν', start=1, end=50, value=10, step=0.01),\n", " dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),\n", " dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]\n", "app = distribution_plot_app(x_min=-2,\n", " x_max=2,\n", " scipy_dist=st.t,\n", " params=params, \n", " x_axis_label='y',\n", " title='Student-t')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cauchy distribution\n", "\n", "* **Story.** The intercept on the *x*-axis of a beam of light coming from the point $(\\mu, \\sigma)$ is Cauchy distributed. This story is popular in physics (and is one of the first examples of Bayesian inference in [Sivia's book](https://global.oup.com/academic/product/data-analysis-9780198568322)), but is not particularly useful. You can think of it as a peaked distribution with enormously heavy tails.\n", "\n", "* **Parameters.** The Cauchy distribution is peaked, and its peak is located at $\\mu$. The peak's width is dictated by parameter $\\sigma$.\n", "\n", "* **Support.** The Cauchy distribution is supported on the set of real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\mu, \\sigma) = \\frac{1}{\\pi \\sigma}\\,\n", "\\frac{1}{1 + (y-\\mu)^2/\\sigma^2}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `mu + sigma * np.random.standard_cauchy()` |\n", "|**SciPy** | `scipy.stats.cauchy(mu, sigma)` |\n", "|**Stan** | `cauchy(mu, sigma)` |\n", "\n", "* **Related distributions.**\n", " - The Cauchy distribution is a special case of the [Student-t distribution](Student-t-distribution) in which the degrees of freedom *ν* = 1.\n", " - The `numpy.random` module only has the Standard Cauchy distribution ($\\mu = 0$ and $\\sigma = 1$), but you can draw out of a Cauchy distribution using the transformation shown in the NumPy usage above." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "f1261c67a48049da8d711263c545dbb5" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),\n", " dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]\n", "app = distribution_plot_app(x_min=-2,\n", " x_max=2,\n", " scipy_dist=st.cauchy,\n", " params=params, \n", " x_axis_label='y',\n", " title='Cauchy')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exponential distribution\n", "\n", "* **Story.** This is the waiting time for an arrival from a\n", "Poisson process. I.e., the inter-arrival time of a Poisson process is\n", "Exponentially distributed.\n", "\n", "* **Example.** The time between conformational switches in a\n", "protein is Exponentially distributed (under simple mass action\n", "kinetics).\n", "\n", "* **Parameter.** The single parameter is the average arrival\n", "*rate*, $\\beta$. Alternatively, we can use $\\tau=1/r$ as the parameter, in this case a characteristic arrival *time*.\n", "\n", "* **Support.** The Exponential distribution is supported on the set of nonnegative real numbers.\n", "\n", "* **Probability density function.**\n", "\\begin{align}\n", "f(y;\\beta) = \\beta\\, \\mathrm{e}^{-\\beta y}.\n", "\\end{align}\n", "\n", "\n", "* **Related distributions.**\n", " - The Exponential distribution is the continuous analog of the [Geometric distribution](#Geometric-distribution). The \"rate\" in the Exponential distribution is analogous to the probability of success of the Bernoulli trial. Note also that because they are uncorrelated, the amount of time between any two arrivals is independent of all other inter-arrival times.\n", " - The Exponential distribution is a special case of the [Gamma distribution](#Gamma-distribution) with parameter $\\alpha = 1$.\n", " \n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.exponential(1/beta)` |\n", "|**SciPy** | `scipy.stats.expon(loc=0, scale=1/beta)` |\n", "|**Stan** | `exponential(beta)` |\n", "\n", "* **Notes.** \n", " - Alternatively, we could parametrize the Exponential distribution in terms of an average time between arrivals of a Poisson process, $\\tau$, as\n", " \\begin{align}\n", " \\\\\\phantom{blah}\n", " f(y;\\tau) = \\frac{1}{\\tau}\\, \\mathrm{e}^{-y/\\tau}.\n", " \\\\\\phantom{blah}\n", " \\end{align}\n", " - The implementation in the `scipy.stats` module also has a location parameter, which shifts the distribution left and right. For our purposes, you can ignore that parameter, but be aware that `scipy.stats` requires it. The `scipy.stats` Exponential distribution is parametrized in terms of the interarrival time, $\\tau$, and not $\\beta$.\n", " - The `numpy.random.exponential()` function does not need nor accept a location parameter. It is also parametrized in terms of $\\tau$." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "1670d49aadc24a6d934dcf261f1082be" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='β', start=0.1, end=1, value=0.25, step=0.01)]\n", "app = distribution_plot_app(0,\n", " 30,\n", " st.expon,\n", " params=params,\n", " transform=lambda x: (0, 1/x), \n", " x_axis_label='y',\n", " title='Exponential')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gamma distribution\n", "\n", "* **Story.** The amount of time we have to wait for $\\alpha$ arrivals\n", "of a Poisson process. More concretely, if we have events, $X_1$,\n", "$X_2$, $\\ldots$, $X_\\alpha$ that are exponentially distributed,\n", "$X_1 + X_2 + \\cdots + X_\\alpha$ is Gamma distributed.\n", "\n", "* **Example.** Any multistep process where each step happens at\n", "the same rate. This is common in molecular rearrangements.\n", "\n", "* **Parameters.** The number of arrivals, $\\alpha$, and the rate of\n", "arrivals, $\\beta$.\n", "\n", "* **Support.** The Gamma distribution is supported on the set of positive real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\alpha, \\beta) = \\frac{1}{\\Gamma(\\alpha)}\\,\\frac{(\\beta y)^\\alpha}{y}\\,\\mathrm{e}^{-\\beta y}.\n", "\\end{align}\n", "\n", "\n", "* **Related distributions.** \n", " - The Gamma distribution is the continuous\n", "analog of the [Negative Binomial distribution](#Negative-Binomial-distribution).\n", " - The special case where $\\alpha=1$ is an [Exponential distribution](#Exponential-distribution).\n", " - The special case where $\\alpha = \\nu/2$ and $\\beta = 1/2$ is a [Chi-square distribution](#Chi-square-distribution) parametrized by $\\nu$.\n", " \n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.gamma(alpha, beta)` |\n", "|**SciPy** | `scipy.stats.gamma(alpha, loc=0, scale=beta)` |\n", "|**Stan** | `gamma(alpha, beta)` |\n", "\n", "\n", "* **Notes.**\n", " - The Gamma distribution is useful as a prior for positive parameters. It imparts a heavier tail than the [Half-Normal distribution](#Half-Normal-distribution) (but not too heavy; it keeps parameters from growing too large), and allows the parameter value to come close to zero.\n", " - SciPy has a location parameter, which should be set to zero, with $\\beta$ being the scale parameter." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "f9e869d968654dec8e8d47e07ca8310a" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=1, end=5, value=2, step=0.01),\n", " dict(name='β', start=0.1, end=5, value=2, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=50,\n", " scipy_dist=st.gamma,\n", " params=params,\n", " transform=lambda a, b: (a, 0, b),\n", " x_axis_label='y',\n", " title='Gamma')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inverse Gamma distribution\n", "\n", "* **Story.** If $x$ is [Gamma distributed](#Gamma-distribution), then $1/x$ is Inverse Gamma distributed.\n", "\n", "* **Parameters.** The number of arrivals, $\\alpha$, and the rate of\n", "arrivals, $\\beta$.\n", "\n", "* **Support.** The Inverse Gamma distribution is supported on the set of positive real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\alpha, \\beta) = \\frac{1}{\\Gamma(\\alpha)}\\,\\frac{\\beta^\\alpha}{y^{(\\alpha+1)}}\\,\\mathrm{e}^{-\\beta/ y}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `1 / np.random.gamma(alpha, 1/beta)` |\n", "|**SciPy** | `scipy.stats.invgamma(alpha, loc=0, scale=beta) |\n", "|**Stan** | `inv_gamma(alpha, beta)` |\n", "\n", "\n", "* **Notes.**\n", " - The Inverse Gamma distribution is useful as a prior for positive parmeters. It imparts a quite heavy tail and keeps probability further from zero than the [Gamma distribution](#Gamma-distribution).\n", " - The `numpy.random` module does not have a function to sample directly from the Inverse Gamma distribution, but it can be achieved by sampling out of a Gamma distribution as shown in the NumPy usage above." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "bf68ed2313014bcb8bd0d97bb53b5f5d" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=0.01, end=2, value=0.5, step=0.01),\n", " dict(name='β', start=0.1, end=2, value=1, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=20,\n", " scipy_dist=st.invgamma,\n", " params=params, \n", " transform=lambda alpha, beta: (alpha, 0, beta),\n", " x_axis_label='y',\n", " title='Inverse Gamma')\n", "\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Weibull distribution\n", "\n", "* **Story.** Distribution of $x = y^\\beta$ if $y$ is\n", "exponentially distributed. For $\\beta > 1$, the longer we have\n", "waited, the more likely the event is to come, and vice versa for\n", "$\\beta < 1$.\n", "\n", "* **Example.** This is a model for aging. The longer an\n", "organism lives, the more likely it is to die.\n", "\n", "* **Parameters.** There are two parameters, both strictly\n", "positive: the shape parameter $\\beta$, which dictates the shape of the\n", "curve, and the scale parameter $\\tau$, which dictates the rate of\n", "arrivals of the event.\n", "\n", "* **Support.**\n", "The Weibull distribution has support on the positive real numbers.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(y;\\alpha, \\sigma) = \\frac{\\alpha}{\\sigma}\\left(\\frac{y}{\\sigma}\\right)^{\\alpha - 1}\\,\n", "\\mathrm{e}^{-(y/\\sigma)^\\alpha}.\n", "\\end{align}\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.weibull(alpha) * sigma` |\n", "|**SciPy** | `scipy.stats.weibull_min(alpha, loc=0, scale=sigma) |\n", "|**Stan** | `weibull(alpha, sigma)` |\n", "\n", "* **Related distributions.**\n", " - The special case where $\\alpha = 0$ is the [Exponential distribution](#Exponential-distribution) with parameter $\\beta = 1/\\sigma$.\n", " \n", "* **Notes.**\n", " - SciPy has a location parameter, which should be set to zero, with $\\beta$ being the scale parameter.\n", " - NumPy only provides a version of the Weibull distribution with $\\sigma = 1$. Sampling out of the Weibull distribution may be accomplished by multiplying the resulting samples by $\\sigma$." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "c6297276d291410caf8cf50f713b0709" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=0.1, end=5, value=1, step=0.01),\n", " dict(name='σ', start=0.1, end=3, value=1.5, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=8,\n", " scipy_dist=st.weibull_min,\n", " params=params,\n", " transform=lambda a, s: (a, 0, s),\n", " x_axis_label='y',\n", " title='Weibull')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Beta distribution\n", "\n", "* **Story.** Say you wait for two multistep processes to happen. The individual steps of each process happen at the same rate, but the first multistep process requires $\\alpha$ steps and the second requires $\\beta$ steps. The fraction of the total waiting time take by the first process is Beta distributed.\n", "\n", "* **Example.**\n", "\n", "* **Parameters.** There are two parameters, both strictly\n", "positive: $\\alpha$ and $\\beta$, defined in the above story.\n", "\n", "* **Support.**\n", "The Beta distribution has support on the interval [0, 1].\n", "\n", "* **Probability density function.**\n", "\\begin{align}\n", "f(\\theta;\\alpha, \\beta) = \\frac{\\theta^{\\alpha-1}(1-\\theta)^{\\beta-1}}{B(\\alpha,\\beta)},\n", "\\end{align}\n", "where\n", "\\begin{align}\n", "B(\\alpha,\\beta) = \\frac{\\Gamma(\\alpha)\\,\\Gamma(\\beta)}{\\Gamma(\\alpha + \\beta)}\n", "\\end{align}\n", "is the Beta function.\n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.beta(alpha, beta)` |\n", "|**SciPy** | `scipy.stats.beta(alpha, beta) |\n", "|**Stan** | `weibull(alpha, sigma)` |\n", "\n", "* **Related distributions.**\n", " - The [Uniform distribution](#Uniform-distribution) on the interval [0, 1] is a special case of the Beta distribution with $\\alpha=\\beta=1$.\n", "\n", "* **Notes.** \n", " - The story of the Beta distribution is difficult to parse. Most importantly for our purposes, the Beta distribution allows us to put probabilities on unknown probabilities. It is only defined on $0\\le x\\le 1$, and $x$ here can be interpreted as a probability, say of a Bernoulli trial.\n", " - The case where $a = b = 0$ is not technically a probability distribution because the PDF cannot be normalized. Nonetheless, it can be used as an improper prior, and this prior is known a Haldane prior, names after biologist J. B. S. Haldane. The case where $a = b = 1/2$ is sometimes called a Jeffreys prior." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "application/vnd.bokehjs_exec.v0+json": "", "text/html": [ "\n", "" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "server_id": "2427802df4cc42558cc2eed69f866091" } }, "output_type": "display_data" } ], "source": [ "params = [dict(name='α', start=0.01, end=10, value=1, step=0.01),\n", " dict(name='β', start=0.01, end=10, value=1, step=0.01)]\n", "app = distribution_plot_app(x_min=0,\n", " x_max=1,\n", " scipy_dist=st.beta,\n", " params=params,\n", " x_axis_label='θ',\n", " title='Beta')\n", "bokeh.io.show(app, notebook_url=notebook_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discrete multivariate distributions\n", "\n", "So far, we have looked a univariate distributions, but we will consider multivariate distributions in class, and you will encounter them in your research. First, we consider a discrete multivariate distribution, the Multinomial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multinomial distribution\n", "\n", "* **Story.** This is a generalization of the Binomial distribution. Instead of a Bernoulli trial consisting of two outcomes, each trial has $K$ outcomes. The probability of getting $n_1$ of outcome 1, $n_2$ of outcome 2, ..., and $y_K$ of outcome $K$ out of a total of $N$ trials is Multinomially distributed.\n", "\n", "* **Example.** There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The probability distribution describing having $y_1$ AA individuals, $y_2$ Aa individuals, and $n_3$ aa individuals in a population of $N$ total individuals is Multinomially distributed.\n", "\n", "* **Parameters.** $N$, the total number of trials, and $\\boldsymbol{\\theta} = \\{\\theta_1, \\theta_2, \\ldots \\theta_k\\}$, the probabilities of each outcome. Note that $\\sum_i \\theta_i = 1$ and there is a further restriction that $\\sum_i y_i = N$.\n", "\n", "* **Support.** The _K_-nomial distribution is supported on $\\mathbb{N}^K$.\n", "\n", "* **Usage**\n", "The usage below assumes that `theta` is a length _K_ array.\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.multinomial(N, theta)` |\n", "|**SciPy** | `scipy.stats.multinomial(N, theta)` |\n", "|**Stan sampling** | `multinomial(theta)` |\n", "|**Stan rng** | `multinomial_rng(theta, N)` |\n", "\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(\\mathbf{y};\\mathbf{\\theta}, N) = \\frac{N!}{y_1!\\,y_2!\\cdots y_k!}\\,\\theta_1^{y_1}\\,\\theta_2^{y_2}\\cdots \\theta_K^{y_K}.\n", "\\end{align}\n", "\n", "* **Related distributions.**\n", " - The Multinomial distribution generalizes the [Binomial distribution](#Binomial-distribution) to multiple dimensions.\n", " \n", "* **Notes.**\n", " - For a sampling statement in Stan, the value of `N` is implied." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Continuous Multivariate distributions\n", "\n", "We will consider two continuous multivariate distributions here, the multivariate Gaussian and the Dirichlet. Generally plotting multivariate PDFs is difficult, but bivariate PDFs may be conveniently plotted as **contour plots**. In our investigation of the multivariate Gaussian distribution, I will also demonstrate how to make plots of bivariate PDFs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multivariate Gaussian, a.k.a. Multivariate Normal, distribution\n", "\n", "* **Story.** This is a generalization of the [univariate Gaussian](#Gaussian,-a.k.a.-Normal,-distribution).\n", "\n", "* **Example.** Finch beaks are measured for beak depth and beak length. The resulting distribution of depths and length is Gaussian distributed. In this case, the Gaussian is bivariate, with $\\mu = (\\mu_\\mathrm{d}, \\mu_\\mathrm{l})$ and\n", "\\begin{align}\n", "\\mathsf{\\Sigma} = \\begin{pmatrix}\n", "\\sigma_\\mathrm{d}^2 & \\sigma_\\mathrm{dl} \\\\\n", "\\sigma_\\mathrm{dl} & \\sigma_\\mathrm{l}^2\n", "\\end{pmatrix}.\n", "\\end{align}\n", "\n", "\n", "* **Parameters.** There is one vector-valued parameter, $\\boldsymbol{\\mu}$, and a matrix-valued parameter, $\\mathsf{\\Sigma}$, referred to respectively as the mean and covariance matrix. The covariance matrix is symmetric and strictly positive definite.\n", "\n", "* **Support.** The _K_-variate Gaussian distribution is supported on $\\mathbb{R}^K$.\n", "\n", "* **Probability density function.**\n", "\n", "\\begin{align}\n", "f(\\mathbf{y};\\boldsymbol{\\mu}, \\mathsf{\\Sigma}) = \\frac{1}{\\sqrt{(2\\pi)^K \\det \\mathsf{\\Sigma}}}\\,\\mathrm{exp}\\left[-\\frac{1}{2}(\\mathbf{y} - \\boldsymbol{\\mu})^T \\cdot \\mathsf{\\Sigma}^{-1}\\cdot (\\mathbf{y} - \\boldsymbol{\\mu})\\right].\n", "\\end{align}\n", "\n", "* **Usage**\n", "The usage below assumes that `mu` is a length _K_ array, `Sigma` is a _K_×_K_ symmetric positive definite matrix, and `L` is a _K_×_K_ lower-triangular matrix with strictly positive values on teh diagonal that is a Cholesky factor.\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | `np.random.multivariate_normal(mu, Sigma)` |\n", "|**SciPy** | `scipy.stats.multivariate_normal(mu, Sigma)` |\n", "|**Stan** | `multi_normal(mu, Sigma)` |\n", "|**NumPy Cholesky**| `np.random.multivariate_normal(mu, np.dot(L, L.T))` |\n", "|**SciPy Cholesky**| `scipy.stats.multivariate_normal(mu, np.dot(L, L.T))` |\n", "|**Stan Cholesky**| `multi_normal_cholesky(mu, L)` |\n", "\n", "\n", "* **Related distributions.**\n", " - The Multivariate Gaussian is a generalization of the [univariate Gaussian](#Gaussian,-a.k.a.-Normal,-distribution).\n", "\n", "* **Notes.** \n", " - The covariance matrix may also be written as\n", "\\begin{align}\n", "\\\\\\phantom{blah}\n", "\\mathsf{\\Sigma} = \\mathsf{S}\\cdot \\mathsf{C} \\cdot \\mathsf{S},\n", "\\\\\\phantom{blah}\n", "\\end{align}\n", "where \n", "\\begin{align}\n", "\\\\\\phantom{blah}\n", "\\mathsf{S} = \\sqrt{\\text{diag}(\\mathsf{\\Sigma})},\n", "\\\\\\phantom{blah}\n", "\\end{align}\n", "and entry $i, j$ in the **correlation matrix** $\\mathsf{C}$ is\n", "\\begin{align}\n", "\\\\\\phantom{blah}\n", "C_{ij} = \\sigma_{ij} / \\sigma_i \\sigma_j.\n", "\\\\\\phantom{blah}\n", "\\end{align}\n", " - Furthermore, because $\\mathsf{\\Sigma}$ is symmetric and strictly positive definite, it can be uniquely defined in terms of its [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition), $\\mathsf{L}$, which satisfies $\\mathsf{\\Sigma} = \\mathsf{L}\\cdot\\mathsf{L}^T$. In practice, you will almost always use the Cholesky representation of the Multivariate Gaussian distribution in Stan." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lewandowski-Kurowicka-Joe (LKJ) distribution\n", "\n", "* **Story.** Probability distribution for positive definite correlation matrices, or equivalanently for their Cholesky factors (which is what we use in practice).\n", "\n", "* **Parameter.** There is one positive scalar parameter, $\\eta$, which tunes the strength of the correlations. If $\\eta = 1$, the density is uniform over all correlation matrix. If $\\eta > 1$, matrices with a stronger diagonal (and therefore smaller correlations) are favored. If $\\eta < 1$, the diagonal is weak and correlations are favored.\n", "\n", "* **Support.** The LKJ distribution is supported over the set of $K\\times K$ Cholesky factors of real symmetric positive definite matrices.\n", "\n", "* **Probability density function.**\n", "We cannot write the PDF in closed form. \n", "\n", "* **Usage**\n", "\n", "| Package | Syntax |\n", "|: -------------: | :------------- |\n", "|**NumPy** | not available |\n", "|**SciPy** | not available |\n", "|**Stan** | lkj_corr_cholesky(eta) |\n", "\n", "* **Notes.** \n", " - The most common use case is as a prior for a covariance matrix. Note that LKJ distribution gives Cholesky factors for *correlation* matrices, not covariance matrices. To get the covariance Cholesky factor from the correlation Cholesky factor, you need to multiply the correlation Cholesky factor by a diagonal constructed from the variances of the individual variates. Here is an example for a multivariate Gaussian in Stan.\n", "\n", " ```\n", " parameters {\n", " // Vector of means\n", " vector[K] mu;\n", "\n", " // Cholesky factor for the correlation matrix\n", " cholesky_factor_corr[K] L_Omega;\n", " \n", " // Sqrt of variances for each variate\n", " vector[K] L_std;\n", " } \n", " model {\n", " // Cholesky factor for covariance matrix\n", " L_Sigma = diag_pre_multiply(L_std, L_Omega);\n", "\n", " // Prior on Cholesky decomposition of correlation matrix\n", " L_Omega ~ lkj_corr_cholesky(1);\n", " \n", " // Prior on standard deviations for each variate\n", " L_std ~ normal(0, 2.5);\n", "\n", " // Likelihood\n", " y ~ multi_normal_cholesky(mu, L_Sigma);\n", " }\n", " ```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dirichlet distribution\n", "\n", "* **Story.** The Dirichlet distribution is a generalization of the [Beta distribution](#Beta-distribution). It is a probability distribution describing probabilities of outcomes. Instead of describing probability of one of two outcomes of a Bernoulli trial, like the Beta distribution does, it describes probability of $K-1$ of $K$ outcomes. The Beta distribution is the special case of $K = 2$.\n", "\n", "* **Parameters.** The parameters are $\\alpha_1, \\alpha_2, \\ldots \\alpha_K$, all strictly positive, defined analogously to $\\alpha$ and $\\beta$ of the Beta distribution.\n", "\n", "* **Support.** The Dirichlet distribution has support on the interval [0, 1] such that $\\sum_{i=1}^K y_i = 1$.\n", "\n", "* **Probability density function.**\n", "\\begin{align}\n", "f(\\boldsymbol{\\theta};\\boldsymbol{\\alpha}) = \\frac{1}{B(\\boldsymbol{\\alpha})}\\,\\prod_{i=1}^K y_i^{\\alpha_i-1},\n", "\\end{align}\n", "where\n", "\\begin{align}\n", "B(\\boldsymbol{\\alpha}) = \\frac{\\prod_{i=1}^K\\Gamma(\\alpha_i)}{\\Gamma\\left(\\sum_{i=1}^K \\alpha_i\\right)}\n", "\\end{align}\n", "is the multivariate Beta function.\n", "\n", "* **Related distributions.**\n", " - The special case where $K = 2$ is a [Beta distribution](#Beta-distribution) with parameters $\\alpha = \\alpha_1$ and $\\beta = \\alpha_2$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusions\n", "\n", "This document serves as a catalog of probability distributions that will be useful for you in your statistical modeling. As we will see, the mathematical expression of the PDF is not often needed. What is most important in your modeling is that you know the story of the distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "%load_ext watermark" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.6.6\n", "IPython 6.5.0\n", "\n", "numpy 1.15.1\n", "scipy 1.1.0\n", "bokeh 0.13.0\n", "jupyterlab 0.34.9\n" ] } ], "source": [ "%watermark -v -p numpy,scipy,bokeh,jupyterlab" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }