ViSP  2.10.0
Tutorial: Pose estimation from points

Introduction

This tutorial focuses on pose estimation from planar or non planar points. From their 2D coordinates in the image plane, and their corresponding 3D coordinates specified in an object frame, ViSP is able to estimate the relative pose between the camera and the object frame. This pose is returned as an homogeneous matrix cMo. Note that to estimate a pose at least four points need to be considered.

In this tutorial we assume that you are familiar with ViSP blob tracking. If not, see Tutorial: Blob tracking.

Pose estimation from 4 planar points

In this section we consider the case of four planar points. An image processing done by our blob tracker allows to extract the pixel coordinates of each blob center of gravity. Then using the camera intrinsic parameters we convert these pixel coordinates in meters in the image plane. Knowing the 3D coordinates of the points we then estimate the pose of the object.

The corresponding source code also provided in tutorial-pose-from-points-image.cpp is the following.

#include <visp/vpDisplayGDI.h>
#include <visp/vpDisplayOpenCV.h>
#include <visp/vpDisplayX.h>
#include <visp/vpDot2.h>
#include <visp/vpImageIo.h>
#include <visp/vpPixelMeterConversion.h>
#include <visp/vpPose.h>
void computePose(std::vector<vpPoint> &point, const std::vector<vpDot2> &dot,
const vpCameraParameters &cam, bool init, vpHomogeneousMatrix &cMo);
void computePose(std::vector<vpPoint> &point, const std::vector<vpDot2> &dot,
const vpCameraParameters &cam, bool init, vpHomogeneousMatrix &cMo)
{
vpPose pose; double x=0, y=0;
for (unsigned int i=0; i < point.size(); i ++) {
vpPixelMeterConversion::convertPoint(cam, dot[i].getCog(), x, y);
point[i].set_x(x);
point[i].set_y(y);
pose.addPoint(point[i]);
}
if (init == true) {
pose.computePose(vpPose::LAGRANGE, cMo_lag);
double residual_dem = pose.computeResidual(cMo_dem);
double residual_lag = pose.computeResidual(cMo_lag);
if (residual_dem < residual_lag)
cMo = cMo_dem;
else
cMo = cMo_lag;
}
}
int main()
{
try {
vpImageIo::read(I, "square.pgm");
#if defined(VISP_HAVE_X11)
vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
#elif defined(VISP_HAVE_OPENCV)
#endif
vpCameraParameters cam(840, 840, I.getWidth()/2, I.getHeight()/2);
std::vector<vpDot2> dot(4);
dot[0].initTracking(I, vpImagePoint(193, 157));
dot[1].initTracking(I, vpImagePoint(203, 366));
dot[2].initTracking(I, vpImagePoint(313, 402));
dot[3].initTracking(I, vpImagePoint(304, 133));
std::vector<vpPoint> point(4);
point[0].setWorldCoordinates(-0.06, -0.06, 0);
point[1].setWorldCoordinates( 0.06, -0.06, 0);
point[2].setWorldCoordinates( 0.06, 0.06, 0);
point[3].setWorldCoordinates(-0.06, 0.06, 0);
bool init = true;
while(1){
vpImageIo::read(I, "square.pgm");
for (unsigned int i=0; i < dot.size(); i ++) {
dot[i].setGraphics(true);
dot[i].track(I);
}
computePose(point, dot, cam, init, cMo);
if (init) init = false; // turn off pose initialisation
if (vpDisplay::getClick(I, false)) break;
}
}
catch(vpException e) {
std::cout << "Catch an exception: " << e << std::endl;
}
}

Here is a screen shot of the resulting program:

img-pose-square.jpg

And here is the detailed explanation of the new lines introduced in the source code.

First we include the headers of the vpPixelMeterConversion class that contains static functions able to convert coordinates of a point expressed in pixels in the image into meter in the image plane thanks to the camera intrinsic parameters. We also include the header of the vpPose class that is able to estimate a pose from points.

#include <visp/vpPixelMeterConversion.h>
#include <visp/vpPose.h>

Here we introduce the computePose() function that does the pose estimation. This function uses as input two vectors. The first one is a vpPoint that contains the 3D coordinates in meters of a point, while the second one is a vpDot2 that contains the 2D coordinates in pixels of a blob center of gravity. The 3D and 2D coordinates of the points need to be matched. That means that point[i] and dot[i] should refer to the same physical point. Other inputs are cam that corresponds to the camera intrinsic parameters, and init that indicates if the pose needs to be initialized the first time. We have cMo parameter that will contain the resulting pose estimated from the input points.

void computePose(std::vector<vpPoint> &point, const std::vector<vpDot2> &dot,
const vpCameraParameters &cam, bool init, vpHomogeneousMatrix &cMo)
{
vpPose pose; double x=0, y=0;
for (unsigned int i=0; i < point.size(); i ++) {
vpPixelMeterConversion::convertPoint(cam, dot[i].getCog(), x, y);
point[i].set_x(x);
point[i].set_y(y);
pose.addPoint(point[i]);
}
if (init == true) pose.computePose(vpPose::DEMENTHON_VIRTUAL_VS, cMo);
else pose.computePose(vpPose::VIRTUAL_VS, cMo) ;
}

In the computePose() function, we create first an instance of the vpPose class. Then for each point we get its 2D coordinates in pixels as the center of gravity of a blob, convert them in 2D coordinates (x,y) in meter in the image plane using vpPixelMeterConversion::convertPoint(). We update then the point instance with the corresponding 2D coordinates and add this point to the vpPose class. Once all the points are added, we can estimate the pose. If init is true, we estimate a first pose using vpPose::DEMENTHON linear approach. It produces a pose used as initialization for the vpPose::VIRTUAL_VS (virtual visual servoing) non linear approach that will converge to the solution with a lower residue than the linear approach. If init is false, we consider that the previous pose cMo is near the solution so that it can be uses as initialization for the non linear visual servoing minimization.

In the main() function, instead of using an image sequence of the object, we consider always the same image "square.pgm". In this image we consider a 12cm by 12 cm square. The corners are represented by a blob. Their center of gravity correspond to a corner position. This gray level image is read thanks to:

After the instantiation of a window able to display the image, and at the end the result of the estimated pose, we setup the camera parameters. We consider here the following camera model ${\bf a}=(p_x,p_y, u_0,v_0)$ where $p_x = p_y = 800 $ are the ratio between the focal length and the size of a pixel, and where the principal point $(u_0, v_0)$ is located at the image center position.

vpCameraParameters cam(840, 840, I.getWidth()/2, I.getHeight()/2);

Each blob is then tracked using a vector of four vpDot2 trackers. To avoid human interaction, we initialize the tracker with a pixel position inside each blob.

std::vector<vpDot2> dot(4);
dot[0].initTracking(I, vpImagePoint(193, 157));
dot[1].initTracking(I, vpImagePoint(203, 366));
dot[2].initTracking(I, vpImagePoint(313, 402));
dot[3].initTracking(I, vpImagePoint(304, 133));

We also define the model of our 12cm by 12cm square by setting the 3D coordinates of the corners in an object frame located in this example in the middle of the square. We consider here that the points are located in the plane $z=0$.

std::vector<vpPoint> point(4);
point[0].setWorldCoordinates(-0.06, -0.06, 0);
point[1].setWorldCoordinates( 0.06, -0.06, 0);
point[2].setWorldCoordinates( 0.06, 0.06, 0);
point[3].setWorldCoordinates(-0.06, 0.06, 0);

Next, we created an homogeneous matrix cMo that will contain the estimated pose.

In the infinite loop, at each iteration we read a new image (in this example it remains the same), display its content in the window, and track each blob.

vpImageIo::read(I, "square.pgm");
for (unsigned int i=0; i < dot.size(); i ++) {
dot[i].setGraphics(true);
dot[i].track(I);
}

Then we call the pose estimation function computePose() presented previously. It uses the 3D coordinates of the points defined as our model in the object frame, and their corresponding 2D positions in the image obtained by the blob tracker to estimate the pose cMo. This homogeneous transformation gives the position of the object frame in the camera frame.

computePose(point, dot, cam, init, cMo);

The resulting pose is displayed as an RGB frame in the window overlay. Red, green and blue colors are for x, y and z axis respectively. Each axis is 0.05 meter long. All the drawings are then flushed to update the window content.

At the end of the first iteration, we turn off the init flag that indicates that the next pose estimation could use our non linear vpPose::VIRTUAL_VS estimation method with the previous pose as initial value.

if (init) init = false;

Finally, we interrupt the infinite loop by a user mouse click in the window.

if (vpDisplay::getClick(I, false)) break;

We also introduce a 40 milliseconds sleep to slow down the loop and relax the CPU.

We provide also an other example in tutorial-pose-from-points-tracking.cpp that shows how to estimate the pose from a tracking performed on images acquired by a firewire camera.