7.1. Moving in Three Dimensions#
Motion in 3D has 6 degrees of freedom.

In Section 6.1 we introduced the space of 2D rigid transformations, \(SE(2)\), and showed how \(3 \times 3\) matrices can be used to simultaneously represent rotation and translation in two dimensions. At that time, we promised that this could be easily generalized to the case of rotation and translation in three dimensions. In this section, we deliver on that promise, and introduce two new spaces: the 3D rotations, \(SO(3)\), and 3D rigid transformations, \(SE(3)\).
In addition to representing the pose of a drone in 3D we are also interested in representing velocity. We will discuss both linear and angular velocity, i.e., the rate of change in the orientation of one coordinate frame with respect to another.
The discussion below holds for generic coordinate frames, but because this is the drone chapter, we introduce two frames here that make things a bit more tangible:
the navigation frame \(N\), which you can think of as attached to the Earth;
the body frame \(B\), which you can think of as rigidly attached to the drone.
In the next section the different conventions for these coordinate frames are discussed in more detail. Below we simply use the subscripts “n” and “b” in the definitions and examples, irrespective of which convention is used.
7.1.1. Rotations in 3D, aka SO(3)#
In Section 6.1, we constructed a rotation matrix \(R^0_1 \in SO(2)\) by projecting the axes of Frame 1 onto Frame 0. The extension to 3D is straightforward. Using the navigation and body coordinate frames defined above, let us project the axes \((x_b, y_b, z_b)\) of the body frame \(B\) onto the axes \((x_n,y_n, z_n)\) of the navigation frame \(N\). Of course, in the 3D case each frame is equipped with an additional \(z\)-axis:
The expression for rotational coordinate transformations is now exactly the same as in Chapter 6. Given a point \(p\) with coordinates expressed relative to the body frame \(B\), we compute its coordinates \(p^n\) in the navigation frame \(N\) as follows:
Extending the semantics from 2D rotations in \(SO(2)\) to 3D rotations represented by \(SO(3)\), the columns of \(R_{b}^{n}\) represent the axes of frame \(B\) in the \(N\) coordinate frame:
The 3D rotations together with composition constitute the special orthogonal group of order 3 or \(SO(3)\), hence the abbreviation “SO”. It is made up of all \(3\times3\) orthonormal matrices with determinant \(+1\). This group has matrix multiplication as its composition operation, but unlike in two dimensions, 3D rotations do not commute, i.e., in general \(R_{2}R_{1}\neq R_{1}R_{2}\). The group thus obtained it is called special because of the +1 determinant. Orthonormal matrices either have a -1 or +1 determinant, but those with -1 are reflections, which we specifically exclude here.
It is often useful to refer to rotations about one of the coordinate axes. For example, when a drone rotates around the vertical axis, this will typically be around the z-axis, i.e., the yaw axis. The basic rotation matrices around x, y, and z-axes respectively are given by
Note, in particular, the form of \(R_{z,\psi}\), the rotation about the \(z\)-axis. The upper-left \(2\times 2\) block of this matrix is exactly a rotation matrix \(R \in SO(2)\) as introduced in Section 6.1. This can be understood by realizing that rotation in the \(xy\) plane is actually a rotation about an implicit \(z\)- axis that points “out of the page.”
7.1.2. 3D Rigid transforms, aka SE(3)#
In a similar way we can extend the group of 2D rigid transforms \(SE(2)\) to the case of rotation and translation in three dimensions. Suppose a point \(p\) is rigidly attached to frame \(B\), and we wish to determine its coordinates with respect to the navigation frame \(N\). If the coordinates of \(p\) with respect to frame \(B\) are given by \(p^b\), then we can express it in frame \(N\) as follows:
Above \(R^n_b \in SO(3)\) is the rotation matrix that specifies the orientation of frame \(B\) w.r.t. frame \(N\), and \(d^n_b \in \mathbb{R}^3\) gives the location of the origin of frame \(B\) specified in the coordinates of frame \(N\).
We denote this transform by \(T_{b}^{n}\doteq\left(R_{b}^{n},\,d_{b}^{n}\right)\). These transforms \(T_{b}^{n}\) are elements of the special Euclidean group of order 3, \(SE(3)\). Similarly to the previous chapter we can construct matrices in \(SE(3)\) by embedding the rotation and translation into a \(4\times4\) invertible matrix defined as
Again, as in the previous chapter, we can implement the group operation as simple matrix multiplication. And, in analogy with the 2D case, if we use the homogeneous coordinates for a point \(p\), we define a 3D homogeneous transformation as
Note that both ways of defining and applying 3D rigid transformations are equivalent: sometimes it is more convenient to use one over the other. It should be apparent from the context as to which flavor we use.
7.1.3. The Group \(SE(3)\)#
\(SE(3)\) is a noncommutative group.
Unsurprisingly, \(SE(3)\) is also a group, as the following properties all hold:
Closure: For all transforms \(T, T' \in SE(3)\), their product is also in \(SE(3)\), i.e., \(T T' \in SE(3)\).
Identity: The \(4\times 4\) identity matrix \(I\) is included in the group, and for all \(T \in SE(3)\) we have \(T I = I T = T\).
Inverse: For every \(T \in SE(3)\) there exists \(T^{-1} \in SE(3)\) such that \(T^{-1}T = T T^{-1} = I\).
Associativity: For all \(T_1, T_2, T_3 \in SE(3)\), \((T_1 T_2) T_3 = T_1 (T_2 T_3)\).
The inverse \(T^{-1}\) is given by:
However, 3D rigid transforms do not commute. Hence, \(SE(3)\) is a noncommutative group. In fact, so is the group \(SO(3)\): 3D rotations only commute when they represent rotations along the same axis. A special case is the set of 2D rotations from the last chapter, which are all around the imagined z-axis: this explains why rotations in 2D commute, but rotations in 3D do not, in general.
7.1.4. \(SE(3)\) in GTSAM#
Pose3
is a noncommutative group which acts onPoint3
.
Again, all of this is built into GTSAM, where both 3D poses and 3D rigid transforms are represented by the type Pose3
, with rotation matrices represented by Rot3
:
R = gtsam.Rot3.Yaw(math.degrees(20)) # rotation around z-axis by 20 degrees
T = gtsam.Pose3(R, [10,20,30])
print(f"3D Pose {T}\ncorresponding to the transformation matrix:\n\n{T.matrix()}")
3D Pose R: [
-0.720878, -0.693062, 0;
0.693062, -0.720878, 0;
0, 0, 1
]
t: 10 20 30
corresponding to the transformation matrix:
[[-0.72087779 -0.6930622 0. 10. ]
[ 0.6930622 -0.72087779 0. 20. ]
[ 0. 0. 1. 30. ]
[ 0. 0. 0. 1. ]]
We can verify the group properties for 3D transforms in code, as we do below, noting that the last assertion for commutativity yields False
, as \(SE(3)\) is a noncommutative group:
print("Closure:", isinstance(T * T, gtsam.Pose3)) # closure
I = gtsam.Pose3.Identity()
print("Identity:", T.equals(T * I, 1e-9)) # identity
print("Inverse:", (T * T.inverse()).equals(I, 1e-9)) # inverse
T1, T2, T3 = T, gtsam.Pose3(gtsam.Rot3.Roll(0.1), [1,2,3]), gtsam.Pose3(gtsam.Rot3.Roll(0.2), [1,2,3])
print("Associativity:", ((T1 * T2)* T3).equals(T1 * (T2 * T3), 1e-9)) # associative
print("Commutativity:", (T1 * T2).equals(T2 * T1, 1e-9)) # NOT commutative
Closure: True
Identity: True
Inverse: True
Associativity: True
Commutativity: False
Finally, 3D transforms can act on 3D points, which we can do using matrix multiplication, or using the Pose3.transformFrom
method:
P1 = gtsam.Point3(2, 4, 3)
print(f"P0 = {T.matrix() @ [2, 4, 3, 1]}") # need to make P0 homogeneous
print(f"P0 = {T.transformFrom(P1)}")
P0 = [ 5.78599563 18.50261322 33. 1. ]
P0 = [ 5.78599563 18.50261322 33. ]
You can see from the code snippet above that both approaches yield the same numerical values for the first three coordinates.
7.1.5. Velocity#
Velocity comes in two flavors: linear and angular.
In previous chapters, we have used the special Euclidean group \(SE(n)\) to represent the position and orientation of static coordinate frames or of moving frames at a particular instant in time. In this chapter, we consider frames that are attached to moving drones, and therefore we must explicitly consider the velocity of moving frames. In this case, the pose of the drone is a function of time
The pose \(T_{b}^{n}(t)\) includes two time-varying components, \(R_{b}^{n}(t)\) and \(d_{b}^{n}(t)\). The instantaneous rate of change in \(R_{b}^{n}(t)\) corresponds to the angular velocity of the frame, and the instantaneous rate of change in \(d_{b}^{n}(t)\) to the linear velocity.
The linear velocity \(v\) of the origin of a moving frame is easy to understand. This velocity is obtained merely by computing the derivative of the translation \(d_{b}^{n}(t)\) of the frame’s origin:
The angular velocity \(\omega\) is more involved. The instantaneous rate of change in orientation is defined by the derivative \(\dot{R}\), but recall that a rotation matrix in \(SO(3)\) has only three degrees of freedom. Thus, we should expect that its instantaneous rate of change can be characterized by a three vector
This is indeed the case, and the corresponding quantity \(\omega\) is known as the angular velocity. We now show how the angular velocity is related to the derivative \(\dot{R}\).
7.1.5.2. Angular Velocity in the Body Frame#
The angular velocity vector \(\omega\) is a free vector, and therefore, we can define its coordinates with respect to the coordinate frame of our choice. Suppose, for example, that we would like to know the angular velocity of the body frame relative to the navigation frame, but expressed relative to the body frame.
Since angular velocity is a free vector, we can apply exactly the same rotational transformation methods as developed above. If we are given \(\omega^n\) (i.e., the angular velocity is expressed with respect to the navigation frame), and we instead wish to specify the angular velocity with respect to the body frame, we merely perform the coordinate transformation
This transformation will be useful when discussing desired angular velocities for a quadrotor.
In some cases, we would like to also represent the skew symmetric matrix \(\omega\) in different coordinate frames. In this case, we use a similarity transformation rather than a simple coordinate transformation, since we are transforming matrices instead of coordinate vectors. While we do not go into the details here, the similarity transformation that transforms \(\hat{\omega}^n_{n,b}\) to \(\hat{\omega}^b_{n,b}\) (i.e., from the navigation frame to the body frame)) is given by
and from this, we can immediately verify the relationship
7.1.5.3. Exercises#
In an introductory physics class, you may have learned the formula \( v = \omega \times r\), in which \(v\) is the velocity of a moving point \(p\); \(\omega\) is the angular velocity; and \(r\) is the distance from the origin to the moving point \(p\). Consider the special case when the navigation and body frames share a common origin and the body frame rotates about their shared \(z\)-axis. Suppose the point \(p\) is fixed to the \(x\)-axis of the body frame, with coordinates \(p^b = [10, 0, 0]^T\). Use the relationship \(\dot{p}^n = \frac{d}{dt} (R^n_b p^b)\) to show that the angular velocity derived above is the same as the introductory physics concept of angular velocity.
What is the relationship between the hat operator \(\hat{\cdot}\) and the vector cross product?
7.1.6. Summary#
Now that we know how to talk about 3D rotations, 3D rigid transformations, and their rate of change, we are ready to use this machinery to describe how drones move in 3D. The next section tackles this.
7.1.7. GTSAM 101#
A deeper dive in the GTSAM concepts used above.
The types Point3
and Rot3
are used for 3D position and rotation.
While Point3
is really just an alias for a 3-dimensional numpy
array, the Rot3
type is a wrapper around a C++ GTSAM type that is not just a 3x3 matrix, which is how it represents the rotation internally, but also has a number of methods to interface with them.
For example, the Rot3.Ypr
constructor creates a rotation from yaw, pitch, and roll angles:
help(gtsam.Rot3.Ypr)
Ypr(...) method of builtins.PyCapsule instance
Ypr(y: float, p: float, r: float) -> gtsam.gtsam.Rot3
Using a 3D position and a 3D rotation we can create a 3D pose, represented by a Pose3
. As always, you can execute help(gtsam.Pose3)
to get the full documentation of a class. Below is an excerpt with some useful methods. We have several constructors:
__init__(...)
__init__(*args, **kwargs)
Overloaded function.
1. __init__(self: Pose3) -> None
2. __init__(self: Pose3, other: Pose3) -> None
3. __init__(self: Pose3, r: Rot3, t: Point3) -> None
4. __init__(self: Pose3, pose2: Pose2) -> None
5. __init__(self: Pose3, mat: numpy.ndarray[numpy.float64[m, n]]) -> None
where t
above is just a Point3
, and mat
can be a 3x4 or 4x4 matrix. However, you are responsible for these matrices to satisfy the \(SE(3)\) constraints. We also used the Pose3.TransformFrom
method above, and there is also a Pose3.TransformTo
method:
transformFrom(...)
transformFrom(*args, **kwargs)
Overloaded function.
1. transformFrom(self: Pose3, point: Point3) -> Point3
...
|
transformTo(...)
transformTo(*args, **kwargs)
Overloaded function.
1. transformTo(self: Pose3, point: Point3) -> Point3
....
Both of them have three overloads, but we show only the most commonly used one above.