3. A Robot Vacuum Cleaner#
Where we will reason over time about probabilistic actions in a discrete state space.
The second robot we will discuss is a mobile robot, modeled on the vacuum cleaning robots that many of us are already familiar with. We assume that this robot is equipped with hardware and software such that it can perform navigation, motion planning, and motion control, but we will not be specific on how these capabilities are implemented. This assumption allows us to focus on high-level problems (for example, deciding which room to clean next), without worrying now about low-level details (e.g., planning specific paths to cover a particular room, or navigating through a doorway).
Besides the obvious fact that our vacuum cleaning robot can move, our vacuum cleaning robot exhibits several key differences from the trash sorting robot discussed in the previous chapter. First, the effects of actions depend on the current world state; if a robot is in the living room and moves to its left, it will arrive to a different location than if it had started in the office. Second, the actions executed by the vacuum cleaning robot have uncertain effects. This is rather different than the actions of the trash sorting robot, which achieved its goals deterministically, regardless of the current state (the “move object to the metal bin” action moves an object to the metal bin, regardless of the category of the object, and with 100% reliability). Third, because the effects of actions depend on state, achieving goals in the future will depend on the actions the robot executes now (since current actions affect future states). Therefore, this robot must consider how the world state evolves with the passing of time.
In this chapter, we will explore the probabilistic outcomes of actions.
For our vacuum cleaning robot, states correspond to rooms in the house, and trajectories represent the robot moving from room to room.
We will model uncertain actions using conditional probability distributions, similar to how we modeled sensor measurements in the previous chapter. Using these distributions, we can propagate uncertainty forward in time for specific sequences of actions and generate sample trajectories from the corresponding probability distributions.
Our treatment of sensing will be quite limited in this chapter.
We will model a simple, discrete light sensor. However, because sensor measurements depend on the state, and the state depends on the sequence of actions executed, perception becomes a more complex problem.
While our trash-sorting robot relied on simple MLE or MAP estimation using only the current sensor reading, the vacuum cleaning robot will need to integrate knowledge about its history of actions (which have uncertain effects) with the sequence of sensor measurements.
We will address this perception problem using Hidden Markov Models (HMMs), which define probabilistic models for sensing over time. Crucially, we will demonstrate how to convert these models into a factor graph, enabling efficient computation of the most probable sequence of states given a series of sensor measurements and actions.
Planning becomes more sophisticated for our vacuum cleaning robot.
Instead of selecting a single action to minimize the cost of the current step,
we will reason about sequences of actions over time.
To achieve this, we will introduce Markov Decision Processes (MDPs).
An MDP incorporates the concept of reward, allowing us to identify optimal actions.
We will also derive an optimal policy—a strategy specifying what action to take in each state to maximize the aggregate reward over time.
Finally, we will introduce the concept of reinforcement learning, where the parameters of an MDP are estimated using data collected during the robot’s normal operation.