{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github",
"tags": [
"no-tex"
]
},
"source": [
""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "w-CIikP0ZZnm",
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -U gtbook"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "uef6Kglzcrha",
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"from gtbook.discrete import Variables\n",
"from gtbook.display import pretty\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import gtsam\n",
"\n",
"import plotly.express as px\n",
"try:\n",
" import google.colab\n",
"except:\n",
" import plotly.io as pio\n",
" pio.renderers.default = \"png\"\n",
"\n",
"import gtbook\n",
"VARIABLES = Variables()\n",
"def pretty(obj): \n",
" return gtbook.display.pretty(obj, VARIABLES)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "TP6jRdj2ccW1"
},
"source": [
"```{index} action; discrete actions\n",
"```\n",
"\n",
"# Actions for Sorting Trash\n",
"\n",
">Robots change the world through their actions. Action models capture their salient aspects.\n",
"\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kh41s6J2ccW2"
},
"source": [
"Robots decide how to act in the world by reasoning about how their actions can be used to achieve their goals, given the current state of the world.\n",
"At a high level, actions can be represented by symbolic descriptions of their effects (changes that will occur in the world state when the action is executed)\n",
"and by their preconditions (things that must be true in the current state in order to execute the action).\n",
"The robot's goals can be encoded as a symbolic description of the desired world state, or, as we will do now,\n",
"by assigning a cost executing an action in a particular world state.\n",
"Note that assigning a cost to an action is equivalent to assigning a reward (merely multiply the cost by -1 to obtain a reward).\n",
"If we use a cost-based approach, we generally frame the planning problem as a decision problem: choose the action that minimizes cost.\n",
"If we are interested in long time horizons, we would choose the sequence of actions that minimize cost over the chosen time period.\n",
"If there are uncertainties, either in the world state or in the effects of actions, we would minimize the expected value of the cost.\n",
"\n",
"In this section, we will consider only the problem of evaluating\n",
"the cost of a single action based on limited knowledge of the world.\n",
"In particular, we will assume that the robot has only the prior probability distribution\n",
"on categories described in the previous section.\n",
"We will address the more general problem of planning (i.e., choosing which actions to apply in the\n",
"current context) later in the chapter."
]
},
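{
"cell_type": "markdown",
"metadata": {},
"source": [
"The expected cost mentioned above can be written out explicitly. As a sketch, under notation assumed here rather than fixed by the text so far, let $C$ denote the unknown category of the item, $P(C=c)$ its prior probability, and $\\text{cost}(a, c)$ the cost of executing action $a$ when the true category is $c$. Then\n",
"\n",
"$$\n",
"E[\\text{cost}(a, C)] = \\sum_{c} P(C=c) \\, \\text{cost}(a, c),\n",
"$$\n",
"\n",
"and a cost-minimizing robot would pick $a^* = \\arg\\min_a E[\\text{cost}(a, C)]$. When the category is known with certainty, the sum collapses to the single term $\\text{cost}(a, c)$."
]
},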
{
"cell_type": "markdown",
"metadata": {
"id": "5_gRa4NQZsD4"
},
"source": [
"## Modeling Actions and Their Effects\n",
"\n",
"> For a trash sorting robot, the destination bin is the most important aspect of an action.\n",
"\n",
"For our trash sorting robot, we will define four actions, each of which can be executed when\n",
"there is an item of trash in the work space (i.e., there are no preconditions for the actions).\n",
"The first three actions use the robot manipulator to move an item of trash\n",
"to one of three bins: glass, metal, or mixed paper.\n",
"The fourth action is a nop,\n",
"which corresponds to the robot simply allowing the item to\n",
"pass through the work space, to be processed, for example, by a human worker\n",
"(note that \"nop\" is a shorthand used in many programming languages\n",
"to denote \"no operation\").\n",
"\n",
"We assign labels to these actions as follows:\n",
"\n",
"* $a_1$: put in glass bin\n",
"* $a_2$: put in metal bin\n",
"* $a_3$: put in mixed paper bin\n",
"* $a_4$: nop (let the object continue, unsorted)\n",
"\n",
"and each of these actions can be applied at any stage of execution.\n",
"\n",
"If the robot had perfect knowledge of the world state (i.e., if the robot always knew\n",
"exactly the category of the item in the work space), choosing an action would be simple:\n",
"place paper and scrap cardboard in the paper bin; place cans and scrap metal in the metal bin;\n",
"place bottles in the glass bin. The nop action would never be used.\n",
"But what if the robot's knowledge of the world state is uncertain?\n",
"Suppose, for example, that it sometimes mistakes scrap metal for cardboard.\n",
"Placing scrap metal in the paper bin could lead to significant damage to\n",
"trash processing equipment, possibly requiring the facility to shut down completely\n",
"while repairs are made.\n",
"In contrast, if the robot places paper into the metal bin, serious damage is unlikely, and the cost of this wrong decision would likely be much smaller.\n",
"\n",
"In order to make informed decisions about which action to take,\n",
"the robot needs to have some quantitative way to evaluate the cost of\n",
"executing the wrong actions.\n",
"This begins by assigning a cost to each action, depending on the world\n",
"state when the action is executed.\n",
"\n",
"For this example, we will assign zero cost when the robot executes the correct action,\n",
"and positive value costs when wrong actions are executed, depending on the severity\n",
"of the consequence. We can encode these costs into a table using the following code."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 175
},
"id": "wVPAl8gdccW3",
"outputId": "e0da3179-ed58-4c17-c195-8c5921bffba5"
},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | cardboard | \n", "paper | \n", "can | \n", "scrap metal | \n", "bottle | \n", "
---|---|---|---|---|---|
glass bin | \n", "2 | \n", "2 | \n", "4 | \n", "6 | \n", "0 | \n", "
metal bin | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "2 | \n", "
paper bin | \n", "0 | \n", "0 | \n", "5 | \n", "10 | \n", "3 | \n", "
nop | \n", "1 | \n", "1 | \n", "1 | \n", "1 | \n", "1 | \n", "
P(Category):
\n", "Category | value |
---|---|
cardboard | 0.2 |
paper | 0.3 |
can | 0.25 |
scrap metal | 0.2 |
bottle | 0.05 |