{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github",
"slideshow": {
"slide_type": "skip"
},
"tags": [
"no-tex"
]
},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "JoW4C_OkOMhe",
"slideshow": {
"slide_type": "skip"
},
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -U gtbook\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "10-snNDwOSuC",
"slideshow": {
"slide_type": "skip"
},
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import gtsam\n",
"\n",
"import plotly.graph_objs as go\n",
"import plotly.express as px\n",
"try:\n",
"    import google.colab\n",
"except ImportError:\n",
"    import plotly.io as pio\n",
"    pio.renderers.default = \"png\"\n",
"\n",
"from gtbook.discrete import Variables\n",
"from gtbook.display import pretty, show\n",
"\n",
"# recap from S11:\n",
"variables = Variables()\n",
"categories = [\"cardboard\", \"paper\", \"can\", \"scrap metal\", \"bottle\"]\n",
"Category = variables.discrete(\"Category\", categories)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "nAvx4-UCNzt2",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Learning\n",
"\n",
"> We can learn prior and sensor models from data we collect."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tSvXl_mnYeJ_",
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"At various times in this chapter we seemed to pull information out of thin air. However, the probabilistic models we used can be *learned* from data, which is what we discuss below:\n",
"\n",
"- In Section 2.1 we talked about priors over state. Here we will estimate a prior from counts, and introduce the idea of adding \"bogus counts\" for the case where we do not have a lot of data.\n",
"- In Section 2.3 we discussed sensor models. Below we estimate those sensor models from counts recorded for each of the possible states.\n",
"- Counting works for discrete sensors, but for continuous sensors we have to do a bit more work. We will end this section by showing how to fit simple Gaussian sensor models to data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HyMNgnbNYeJ_",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Estimating a Discrete PMF\n",
"\n",
"> Count the occurrences for each category."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ER8xW_d2HmHR",
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"In Section 1 we introduced the notion of a probability mass function (PMF) to characterize the *a priori* probability of being in a certain state. It turns out that the *normalized counts* we obtain when observing states over a long time period are a good approximation of the PMF. The more samples we collect, the better the approximation.\n",
"\n",
"As an example, let us assume that, at a *different* trash sorting cell, we observe for a while and record the category of each piece of trash as it arrives. We might see something like:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "Gmnf89Q7YeJ_",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"data = [1, 1, 1, 2, 1, 1, 1, 3, 0, 0, 0, 1,\n",
" 2, 2, 2, 2, 4, 4, 4, 1, 1, 2, 1, 2, 1]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LtttAWbHZgZ7",
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Each integer in `data` is an index into the `categories` list. With `numpy` we can get the counts using the [`bincount`](https://numpy.org/doc/stable/reference/generated/numpy.bincount.html) function. We then plot the counts using `plotly.express`:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 542
},
"id": "pxBtzSjmZqR0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"image/png": ""
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#| caption: Counts of each category in the data.\n",
"#| label: fig:counts_of_categories\n",
"counts = np.bincount(data)\n",
"px.bar(x=categories, y=counts)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c4Jt2Mwrat5Q",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We can then estimate the probability of each category $c_k$ simply by dividing the count $N_k$ by the total number of data points $N$:\n",
"\n",
"$$P(c_k) \\approx \\frac{N_k}{N}$$\n",
"\n",
"In our example:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wLX116r9aNdW",
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counts: [ 3 11 7 1 3]\n",
"Estimated PMF: [0.12 0.44 0.28 0.04 0.12]\n"
]
}
],
"source": [
"estimated_pmf = counts / counts.sum()\n",
"print(f\"Counts: {counts}\\nEstimated PMF: {estimated_pmf}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tTucUZMYbkX5",
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can now easily turn this into a GTSAM discrete prior for pretty-printing:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 175
},
"id": "NrfBXCjqbFVa",
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/html": [
"P(Category):\n",
"\n",
"| Category | value |\n",
"|---|---|\n",
"| cardboard | 0.12 |\n",
"| paper | 0.44 |\n",
"| can | 0.28 |\n",
"| scrap metal | 0.04 |\n",
"| bottle | 0.12 |\n",
"\n",
"| | raw | smoothed |\n",
"|---|---|---|\n",
"| cardboard | 0.12 | 0.133333 |\n",
"| paper | 0.44 | 0.400000 |\n",
"| can | 0.28 | 0.266667 |\n",
"| scrap metal | 0.04 | 0.066667 |\n",
"| bottle | 0.12 | 0.133333 |\n",
"\n",
"P(Conductivity|Category):\n",
"\n",
"| Category | false | true |\n",
"|---|---|---|\n",
"| cardboard | 0.8 | 0.2 |\n",
"| paper | 0.4 | 0.6 |\n",
"| can | 0.75 | 0.25 |\n",
"| scrap metal | 0.4 | 0.6 |\n",
"| bottle | 0.5 | 0.5 |\n",
"\n",
"P(ThreeValued|Category):\n",
"\n",
"| Category | Value1 | Value2 | Value3 |\n",
"|---|---|---|---|\n",
"| cardboard | 0.1 | 0.7 | 0.2 |\n",
"| paper | 0.2 | 0.2 | 0.6 |\n",
"| can | 0.125 | 0.625 | 0.25 |\n",
"| scrap metal | 0.4 | 0.4 | 0.2 |\n",
"| bottle | 0.25 | 0.25 | 0.5 |\n"