Thanks for stopping by! Please note that this page will no longer be updated; all content has been moved into an open-source textbook. The book is available open source on GitHub, compiled as a PDF, and in print on Amazon.com.

Range sensors have emerged as some of the most effective sensors for making robots autonomous. Unlike vision, range data makes the construction of a 3D model of the robot’s environment straightforward, and the Velodyne sensor, which combines 64 scanning lasers into one package, was key to mastering the DARPA Grand Challenge. 3D range data has become even more important in robotics with the advent of cheap RGB-D (color image plus depth) cameras, priced at a tenth of the cheapest 2D laser scanner. Point cloud data allows fitting of lines using RANSAC, which can serve as features in EKF-based localization, but can also be used for improving odometry, loop-closure detection, and mapping. The goals of today’s lecture are to

  • introduce the Iterative Closest Point (ICP) algorithm
  • show how ICP can be improved by providing initial guesses via RANSAC
  • show how SIFT features can be used to improve point selection and loop-closure in ICP to achieve RGB-D mapping

The Iterative Closest Point (ICP) algorithm

The ICP algorithm was presented in the early 1990s for the registration of 3D range data to CAD models of objects. The key problem can be reduced to finding the best transformation that minimizes the distance between two point clouds. This is the case when matching snapshots from a range sensor or when matching a range image with a point cloud sampled from a 3D representation of an object.

In robotics, ICP found application in matching scans from 2D laser range scanners. For example, the transformation that minimizes the error between two consecutive snapshots of the environment is proportional to the motion of the robot. This is a hard problem, as it is unclear which points in the two consecutive snapshots are “pairs”, which points are outliers (due to noisy sensors), and which points need to be discarded because not all points overlap in both snapshots. Stitching a series of snapshots together theoretically allows the creation of a 2D map of the environment. This is difficult, however, as the error between snapshots – similar to odometry – accumulates. The ICP algorithm also works in 3D, where it allows inferring the change in the 6D pose of a camera and the creation of 3D maps. In addition, ICP has proven useful for identifying objects from a database of 3D objects.

Before providing a solution to the mapping problem, we will focus on the ICP algorithm for matching two consecutive frames. Variants of the ICP algorithm can be broken down into six consecutive steps [Rusi2001]:

  1. Selection of points in one or both meshes or point clouds.
  2. Matching/Pairing these points to samples in the other point cloud/mesh.
  3. Weighting the corresponding pairs.
  4. Rejecting certain pairs.
  5. Assigning an error metric based on the point pairs.
  6. Minimizing the error metric.
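The six steps map naturally onto one iteration of a loop. The following sketch (translation-only, in 2D, with deliberately trivial choices for steps 1, 3, and 4) is only meant to show this structure, not any particular published variant:

```python
import numpy as np

def icp_iteration(A, B):
    """One ICP iteration, translation-only, with the six steps marked."""
    # 1. Selection: this sketch simply uses all points of A.
    selected = A
    # 2. Matching: pair each point with its nearest neighbor in B (brute force).
    d = np.linalg.norm(selected[:, None, :] - B[None, :, :], axis=2)
    pairs = B[np.argmin(d, axis=1)]
    # 3. Weighting: uniform weights here.
    w = np.ones(len(selected))
    # 4. Rejection: nothing rejected here.
    keep = np.ones(len(selected), dtype=bool)
    # 5./6. Error metric and minimization: for squared point-to-point
    # distances and translation only, the optimum is the weighted mean offset.
    return np.average(pairs[keep] - selected[keep], axis=0, weights=w[keep])

A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = A + np.array([0.25, -0.1])       # destination: A shifted by a known offset
t = np.zeros(2)
for _ in range(10):
    t += icp_iteration(A + t, B)
print(t)  # converges to the true offset [0.25, -0.1]
```

Note that the initial offset here is smaller than the point spacing, so the very first nearest-neighbor pairing is already correct; with a larger offset the loop can settle in a local minimum, which is exactly why the later steps of this lecture matter.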

Point Selection

Depending on the number of points generated by the range sensor, it might make sense to use only a few selected points to calculate the optimal transformation between two point clouds, and then test this transformation on all points. Depending on the source of the data, it also turns out that some points are more suitable than others, as it is easier to identify matches for them. This is the case for RGB-D data, where SIFT features have been used successfully [Henr2010]. This is also the case for planar objects with grooves, where sampling should ensure that the angles of the normal vectors of the sampled points are broadly distributed [Rusi2001]. Which method to use therefore depends strongly on the kind of data and should be considered for each specific problem.

Matching Points

The key step in ICP is to match each point to its corresponding point. For example, a laser scanner hits a certain point on a wall with its 67th ray. After the scanner has been moved by 10 cm, the closest hit on the wall to this point might have come from the 3rd ray of the laser. Indeed, it is very unlikely that the laser hits the exact same point on the wall twice, introducing a non-zero error even for an optimal pairing. Prominent methods are to find the closest point in the other point cloud or to find the intersection of the source point’s normal with the destination surface (for matching point clouds to meshes). More recently, SIFT has made it possible to match points based on their visual appearance. As when sorting through SIFT features, finding the closest matching point can be accelerated by representing the point cloud in a k-d tree.
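For example, nearest-neighbor matching against a k-d tree with SciPy (the point data here is random, just for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
B = rng.random((1000, 3))         # destination cloud: 1000 random 3D points
tree = cKDTree(B)                 # build the tree once, reuse for every query

A = rng.random((5, 3))            # source points to be matched
dist, idx = tree.query(A)         # nearest-neighbor distance and index in B
pairs = B[idx]                    # matched destination points
```

Each query costs roughly O(log n) on average instead of O(n) for brute force, so matching m source points drops from O(mn) to about O(m log n).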

Weighting of Pairs

As some pairs are better matches than others, weighting them in a smart way can drastically improve the quality of the resulting transformation. One approach is to give more weight to pairs of points that are closer to each other. Another is to take into account the color of the points (in RGB-D images) or the distance between their SIFT features (weighting pairs with low feature distances higher than pairs with high ones). Finally, expected noise can be used to weight pairings. For example, the estimates made by a laser scanner are much more faithful when taken orthogonally to a plane than when taken at a steep angle.
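As a sketch, distance-based weighting can be as simple as a linear falloff to zero at some cutoff distance (the cutoff value here is arbitrary):

```python
import numpy as np

def distance_weights(dists, d_max):
    """Linear falloff: weight 1 at zero distance, 0 at d_max and beyond."""
    return np.clip(1.0 - dists / d_max, 0.0, 1.0)

d = np.array([0.0, 0.05, 0.1, 0.5])    # pair distances (made-up values)
w = distance_weights(d, d_max=0.2)     # weights 1.0, 0.75, 0.5, 0.0
```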

Rejection of Pairs

A key problem in ICP is outliers, arising either from sensor noise or simply from incomplete overlap between two consecutive range images. A common approach to this problem is to reject pairings in which one of the points lies on the boundary of its point cloud, as such points are likely to match points in non-overlapping regions. Depending on the underlying data, it might also make sense to reject pairings whose distance is too large. This is a threshold-based equivalent of the distance-based weighting described above.
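A minimal sketch of threshold-based rejection, here using a multiple of the median pair distance as the (illustrative) threshold:

```python
import numpy as np

def reject_pairs(dists, factor=3.0):
    """Boolean mask keeping pairs closer than factor times the median distance."""
    return dists < factor * np.median(dists)

d = np.array([0.01, 0.02, 0.015, 0.012, 0.9])  # last pair is a gross outlier
keep = reject_pairs(d)                          # only the outlier pair is dropped
```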

Error Metric and Minimization Algorithm

After points have been selected and matched, and pairs have been weighted and rejected, the match between two point clouds needs to be expressed by a suitable error metric, which then needs to be minimized. A straightforward approach is to consider the sum of squared distances between the points of each pair. This formulation can often be solved analytically. Let

A=\{a_1,\ldots,a_m\} and B=\{b_1,\ldots,b_n\}

be point clouds in \mathbb{R}^d. The goal is now to find a vector t \in \mathbb{R}^d such that an error function \phi(A+t,B) is minimized. In 6D (translation and rotation), an equivalent notation can be found for a transformation (see forward kinematics). An error function for the squared distance is then given by

\phi(A+t,B)=\frac{1}{m}\sum_{a \in A}\|a+t-N_B(a+t)\|^2

Here, N_B(a+t) is a function that provides the nearest neighbor of a translated by t in B. A key problem is that the actual value of t affects the outcome of the pairing: what looks like a good match initially often turns out not to be the final pairing. A simple numerical approach to this problem is to find t iteratively.

Initially, t=0 and nearest neighbors/pairings are established. We can now calculate a \delta t that optimizes the least-squares problem for this matching, using any solver available for the optimization problem (for a least-squares cost, \delta t can be obtained analytically by setting the derivative of the polynomial to zero and solving for its minimum). We can then shift all points in A by \delta t and start over, that is, calculate new pairings and derive a new \delta t. We continue doing this until the cost function reaches a local minimum.
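Concretely, writing b_i = N_B(a_i) for the current (fixed) pairing, setting the derivative of the squared-distance cost with respect to the shift to zero gives the update in closed form:

\frac{\partial}{\partial \delta t}\frac{1}{m}\sum_{i=1}^{m}\|a_i+\delta t-b_i\|^2=\frac{2}{m}\sum_{i=1}^{m}(a_i+\delta t-b_i)=0 \quad\Rightarrow\quad \delta t=\frac{1}{m}\sum_{i=1}^{m}(b_i-a_i)

That is, for a fixed pairing the optimal shift is simply the mean offset between the paired points.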

Instead of formulating the cost function as a “point-to-point” distance, a “point-to-plane” formulation has become popular. Here, the cost function consists of the sum of squared distances from each source point to the plane that contains the destination point and is oriented perpendicular to the destination normal. This makes particular sense when matching a point cloud to a mesh/CAD model of an object. In this case there is no analytical solution for the optimal transformation, but any optimization method, such as Levenberg-Marquardt, can be used.
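A sketch of the point-to-plane metric, restricted to translation for brevity (the full 6D problem additionally optimizes over rotation); the points and normals are synthetic, and SciPy’s Levenberg-Marquardt solver plays the role of the optimizer mentioned above:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
B = rng.random((20, 3))                        # destination points
N = rng.normal(size=(20, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)  # unit normal per destination point

t_true = np.array([0.1, -0.2, 0.3])
A = B + t_true                                 # source cloud, shifted by t_true

def point_to_plane_residuals(t):
    # Signed distance of each shifted source point to the plane through its
    # paired destination point b, oriented along the normal n.
    return np.einsum('ij,ij->i', A - t - B, N)

sol = least_squares(point_to_plane_residuals, x0=np.zeros(3), method='lm')
print(sol.x)  # recovers t_true = [0.1, -0.2, 0.3]
```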

RGB-D Mapping

The ICP algorithm can be used to stitch consecutive range images together to create a 3D map of the environment. Together with RGB information, it is possible to create complete 3D walk-throughs of an environment. A problem with ICP is that the errors in each transformation propagate, making maps created with this method as distorted as maps created by simple odometry. Here, the SLAM algorithm can be used to correct previous errors once a loop closure is detected.
The intuition behind SLAM is to consider each transformation between consecutive snapshots as a spring with variable stiffness. Whenever the robot returns to a previously seen location, i.e., a loop closure has been determined, additional constraints are introduced and the collection of snapshots connected by springs becomes a mesh. Every time the robot then re-observes a transformation between any of the snapshots, it can “stiffen” the spring connecting the two. As all of the snapshots are connected, this new constraint propagates through the network and pulls each snapshot into place.
RGB-D Mapping [Henr2010] uses a variant of ICP that is enhanced by SIFT features for point selection and matching. Maps are built incrementally. SIFT features, and their spatial relationships, are used for detecting loop closures. Once a loop closure is detected, an additional constraint is added to the pose graph and a SLAM-like optimization algorithm corrects the poses of all previous observations.
As ICP only works when both point clouds are already closely aligned, which might not be the case for a fast-moving robot with a relatively noisy sensor (the Xbox Kinect has an error of around 3 cm at a few meters of range vs. millimeters for laser range scanners), RGB-D Mapping uses RANSAC to find an initial transformation. Here, RANSAC works as it does for line fitting: it keeps guessing possible transformations from 3 pairs of SIFT feature points and then counts the number of inliers when matching the two point clouds, one of which is transformed using the random guess.
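A sketch of this procedure with synthetic correspondences standing in for SIFT matches; the minimal-sample transform is estimated with the Kabsch (SVD-based) method, a common choice for rigid alignment of point pairs, and the iteration count and inlier threshold are illustrative:

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rigid transform (R, t) mapping point set P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def ransac_transform(src, dst, iters=200, tol=0.05, seed=0):
    """Guess transforms from 3 random pairs; keep the guess with most inliers."""
    rng = np.random.default_rng(seed)
    best, best_count = None, -1
    for _ in range(iters):
        i = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[i], dst[i])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        count = int((err < tol).sum())
        if count > best_count:
            best, best_count = (R, t), count
    return best, best_count

# Synthetic correspondences: rotate about z, translate, corrupt 5 pairs.
rng = np.random.default_rng(2)
src = rng.random((50, 3))
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.5])
dst = src @ R_true.T + t_true
dst[:5] += 1.0                                   # outlier correspondences

(R, t), n_inliers = ransac_transform(src, dst)
print(n_inliers)  # 45: all uncorrupted pairs become inliers
```

Once an all-inlier minimal sample is drawn, the estimated transform fits the 45 clean pairs exactly, and this best guess then serves as the initial alignment for ICP.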


[Rusi2001] Szymon Rusinkiewicz and Marc Levoy. Efficient Variants of the ICP Algorithm. Third International Conference on 3D Digital Imaging and Modeling (3DIM), 2001.

[Henr2010] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments. Proc. of International Symposium on Experimental Robotics (ISER), 2010.

