Introduction

This tutorial focuses on pose estimation from planar or non planar points. From their 2D coordinates in the image plane, and their corresponding 3D coordinates specified in an object frame, ViSP is able to estimate the relative pose between the camera and the object frame. This pose is returned as an homogeneous matrix cMo. Note that to estimate a pose at least four points need to be considered.

In this tutorial we assume that you are familiar with ViSP blob tracking. If not, see Tutorial: Blob tracking.

Note that all the material (source code and image) described in this tutorial is part of ViSP source code (in tutorial/computer-vision folder) and could be found in https://github.com/lagadic/visp/tree/master/tutorial/computer-vision.

Pose estimation from 4 planar points

Principle on a single image

In this section we consider the case of four planar points. An image processing done by our blob tracker allows to extract the pixel coordinates of each blob center of gravity. Then using the camera intrinsic parameters we convert these pixel coordinates in meters in the image plane. Knowing the 3D coordinates of the points we then estimate the pose of the object.

The corresponding source code also provided in tutorial-pose-from-points-image.cpp is the following.

 
#include <visp3/core/vpIoTools.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpImageIo.h>
 
#include "pose_helper.h"
 
int main(int, char *argv[])
{
  try {
    vpImage<unsigned char> I;
    vpImageIo::read(I, vpIoTools::getParent(argv[0]) + "/data/square.pgm");
 
#if defined(VISP_HAVE_X11)
    vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
    vpDisplayGDI d(I);
#elif defined(HAVE_OPENCV_HIGHGUI)
    vpDisplayOpenCV d(I);
#endif
 
    vpCameraParameters cam(840, 840, I.getWidth() / 2, I.getHeight() / 2);
    std::vector<vpDot2> dot(4);
    std::vector<vpImagePoint> ip(4);
    dot[0].initTracking(I, vpImagePoint(193, 157));
    dot[1].initTracking(I, vpImagePoint(203, 366));
    dot[2].initTracking(I, vpImagePoint(313, 402));
    dot[3].initTracking(I, vpImagePoint(304, 133));
    std::vector<vpPoint> point;
    point.push_back(vpPoint(-0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, 0.06, 0));
    point.push_back(vpPoint(-0.06, 0.06, 0));
 
    vpHomogeneousMatrix cMo;
    bool init = true;
 
    while (1) {
      vpImageIo::read(I, vpIoTools::getParent(argv[0]) + "/data/square.pgm");
      vpDisplay::display(I);
      for (unsigned int i = 0; i < dot.size(); i++) {
        dot[i].setGraphics(true);
        dot[i].track(I);
        ip[i] = dot[i].getCog();
      }
      computePose(point, ip, cam, init, cMo);
      vpDisplay::displayFrame(I, cMo, cam, 0.05, vpColor::none);
      vpDisplay::flush(I);
 
      if (init)
        init = false; // turn off pose initialisation
 
      if (vpDisplay::getClick(I, false))
        break;
 
      vpTime::wait(40);
    }
  } catch (const vpException &e) {
    std::cout << "Catch an exception: " << e.getMessage() << std::endl;
  }
}

Here is a screen shot of the resulting program:

And here is the detailed explanation of some lines introduced in the source code.

First we include the header pose_helper.h that contains helper functions dedicated to pose estimation.

#include "pose_helper.h"

In pose_helper.h header we include the vpPoint header class that is a container for 3D points with their 3D coordinates in the object frame, and their 2D coordinates in the image plane obtained after perspective projection. These 3D and 2D coordinates will be used as input in the pose estimation algorithm implemented in vpPose class. Then we include the vpCameraParameters header class that contains the camera intrinsic parameters. These parameters are typically used to convert coordinates of a blog cog obtained in pixels from the tracking of the blob, into 2D coordinates expressed in the image plane. The next header concerns the vpHomogeneousMatrix class that is a pose container. Finally, the vpDot2 header class corresponds to the tracker that will be used to track the 4 blobs.

#include <visp3/blob/vpDot2.h>
#include <visp3/core/vpCameraParameters.h>
#include <visp3/core/vpHomogeneousMatrix.h>
#include <visp3/core/vpPoint.h>

If we have a look in pose_helper.cpp we will find the computePose() function that does the pose estimation. This function uses as input two vectors. The first one is a vector of vpPoint that contains the 3D coordinates in meters of a point, while the second one is a vector of vpImagePoints that contains the 2D coordinates in pixels of a blob center of gravity. The 3D and 2D coordinates of the points need to be matched. That means that point[i] and ip[i] should r efer to the same physical point. Other inputs are cam that corresponds to the camera intrinsic parameters, and init that indicates if the pose needs to be initialized the first time. We have cMo [in/out] parameter that is as input the pose estimated from the previous image, and, as output the resulting pose estimated from the input points.

void computePose(std::vector<vpPoint> &point, const std::vector<vpImagePoint> &ip, const vpCameraParameters &cam,
                 bool init, vpHomogeneousMatrix &cMo)
{
  vpPose pose;
  double x = 0, y = 0;
  for (unsigned int i = 0; i < point.size(); i++) {
    vpPixelMeterConversion::convertPoint(cam, ip[i], x, y);
    point[i].set_x(x);
    point[i].set_y(y);
    pose.addPoint(point[i]);
  }
 
  if (init == true) {
    pose.computePose(vpPose::DEMENTHON_LAGRANGE_VIRTUAL_VS, cMo);
  }
  else {
    pose.computePose(vpPose::VIRTUAL_VS, cMo);
  }
}

In the computePose() function, we create first an instance of the vpPose class. Then we convert each coordinates of the blob center of gravity ip in 2D coordinates (x,y) in meter in the image plane using vpPixelMeterConversion::convertPoint(). We update then the point instance with the corresponding 2D coordinates and add this point to the vpPose class. Once all the points are added, when init is true we can estimate an initial pose using vpPose::DEMENTHON_LAGRANGE_VIRTUAL_VS method that runs first Dementhon and Lagrange linear pose estimation algorithms, then retain the result that has the lowest residual and finaly use this result to initialize the non linear virtual visual-servoing algorithm to estimate the pose that will converge to the solution with a lower residue than on of the linear approaches. When init is false, we consider that the previous pose cMo is near the solution so that it can be used as initialization for the non linear visual servoing minimization.

If we come back in tutorial-pose-from-points-image.cpp in the main() function, instead of using an image sequence of the object, we consider always the same image square.pgm. In this image we consider a 12cm by 12 cm square. The corners are represented by a blob. Their center of gravity correspond to a corner position. This gray level image is read thanks to:

vpImage<unsigned char> I;

vpImageIo::read(I, vpIoTools::getParent(argv[0]) + "/data/square.pgm");

After the instantiation of a window able to display the image and at the end the result of the estimated pose as a pose frame, we setup the camera parameters. We consider here the following camera model ${\bf a}=(p_x,p_y, u_0,v_0)$ where are the ratio between the focal length and the size of a pixel, and where the principal point $(u_0, v_0)$ is located at the image center position.

vpCameraParameters cam(840, 840, I.getWidth() / 2, I.getHeight() / 2);

Each blob is then tracked using a vector of four vpDot2 trackers. To avoid human interaction, we initialize the tracker with a pixel position inside each blob. Here we also instantiate a vector of vpImagePoint ip that will be updated in the while loop with the 2D coordinates in pixels of the blob center of gravity.

    std::vector<vpDot2> dot(4);
    std::vector<vpImagePoint> ip(4);
    dot[0].initTracking(I, vpImagePoint(193, 157));
    dot[1].initTracking(I, vpImagePoint(203, 366));
    dot[2].initTracking(I, vpImagePoint(313, 402));
    dot[3].initTracking(I, vpImagePoint(304, 133));

We also define the 3D model of our 12cm by 12cm square by setting the 3D coordinates of the corners in an object frame located in this example in the middle of the square. We consider here that the points are located in the plane .

    std::vector<vpPoint> point;
    point.push_back(vpPoint(-0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, 0.06, 0));
    point.push_back(vpPoint(-0.06, 0.06, 0));

Next, we created an homogeneous matrix cMo that will contain the estimated pose.

vpHomogeneousMatrix cMo;

In the infinite loop, at each iteration we read a new image (in this example it remains the same), display its content in the window, track each blob and update the coordinates of the center of gravity in ip vector.

      vpImageIo::read(I, vpIoTools::getParent(argv[0]) + "/data/square.pgm");
      vpDisplay::display(I);
      for (unsigned int i = 0; i < dot.size(); i++) {
        dot[i].setGraphics(true);
        dot[i].track(I);
        ip[i] = dot[i].getCog();
      }

Then we call the pose estimation function `computePose() presented previously. It uses the 3D coordinates of the points defined as our model in the object frame, and their corresponding 2D positions in the image obtained by the blob tracker to estimate the pose cMo. This homogeneous transformation gives the position of the object frame in the camera frame.

computePose(point, ip, cam, init, cMo);

The resulting pose is displayed as an RGB frame in the window overlay. Red, green and blue colors are for X, Y and Z axis respectively. Each axis is 0.05 meter long. All the drawings are then flushed to update the window content.

vpDisplay::displayFrame(I, cMo, cam, 0.05, vpColor::none);

vpDisplay::flush(I);

At the end of the first iteration, we turn off the init flag that indicates that the next pose estimation could use our non linear vpPose::VIRTUAL_VS estimation method with the previous pose as initial value.

if (init)

init = false; // turn off pose initialisation

Finally, we interrupt the infinite loop by a user mouse click in the window.

if (vpDisplay::getClick(I, false))

break;

We also introduce a 40 milliseconds sleep to slow down the loop and relax the CPU.

vpTime::wait(40);

Using a camera

We provide also an other example in tutorial-pose-from-points-live.cpp that works on live images acquired by a camera. Compared to the previous example, we replace the single image reader by one of the frame grabber available in ViSP (see Tutorial: Image frame grabbing).

1. Download and print, a square made of 4 blobs

4 black blobs on a white background defining a 0.1 meter width square [square-10cm-black.pdf] (recommended);
4 white blobs on a black background defining a 0.1 meter width square [square-10cm-white.pdf].

2. Run the corresponding binary

To get the usage of the binary run:

./tutorial-pose-from-points-live --help

Have a trial adapting the size of your square:

./tutorial-pose-from-points-live --square_width 0.1

If your camera is calibrated (see Tutorial: Camera intrinsic calibration), you may use your own camera parameters with something similar to:

./tutorial-pose-from-points-live --square_width 0.1 --intrinsic camera.xml --camera_name Camera

If you get the following error:

$ ./tutorial-pose-from-points-live --square_width 0.1

Catch an exception: No cameras found

it means probably that your camera is not detected by the first grabber that is enabled using:

#if defined(VISP_HAVE_V4L2)
    vpV4l2Grabber g;
    std::ostringstream device;
    device << "/dev/video" << opt_device;
    std::cout << "Use Video 4 Linux grabber on device " << device.str() << std::endl;
    g.setDevice(device.str());
    g.setScale(1);
    g.open(I);
#elif defined(VISP_HAVE_DC1394)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use DC1394 grabber" << std::endl;
    vp1394TwoGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_CMU1394)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use CMU1394 grabber" << std::endl;
    vp1394CMUGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_FLYCAPTURE)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use FlyCapture grabber" << std::endl;
    vpFlyCaptureGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_REALSENSE2)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use Realsense 2 grabber" << std::endl;
    vpRealSense2 g;
    rs2::config config;
    config.disable_stream(RS2_STREAM_DEPTH);
    config.disable_stream(RS2_STREAM_INFRARED);
    config.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGBA8, 30);
    g.open(config);
    g.acquire(I);
 
    std::cout << "Read camera parameters from Realsense device" << std::endl;
    cam = g.getCameraParameters(RS2_STREAM_COLOR, vpCameraParameters::perspectiveProjWithoutDistortion);
#elif defined(HAVE_OPENCV_VIDEOIO)
    std::cout << "Use OpenCV grabber on device " << opt_device << std::endl;
    cv::VideoCapture g(opt_device); // Open the default camera
    if (!g.isOpened()) {            // Check if we succeeded
      std::cout << "Failed to open the camera" << std::endl;
      return EXIT_FAILURE;
    }
    cv::Mat frame;
    g >> frame; // get a new frame from camera
    vpImageConvert::convert(frame, I);
#endif

If this is the case we suggest to modify the following lines in tutorial-pose-from-points-live.cpp by uncommenting all the grabber that you don't want to use:

// #undef VISP_HAVE_V4L2
// #undef VISP_HAVE_DC1394
// #undef VISP_HAVE_CMU1394
// #undef VISP_HAVE_FLYCAPTURE
// #undef VISP_HAVE_REALSENSE2
// #undef HAVE_OPENCV_VIDEOIO

For example, if you want to force OpenCV usage to grab images, uncomment the lines:

#undef VISP_HAVE_V4L2
#undef VISP_HAVE_DC1394
#undef VISP_HAVE_CMU1394
#undef VISP_HAVE_FLYCAPTURE
#undef VISP_HAVE_REALSENSE2

After modification, build and run again this binary.

Next tutorial

You are now ready to see the next Tutorial: Pose estimation from QRcode that gives a use case.

If you have an RGB-D camera, you may also continue with Tutorial: Planar object pose estimation using RGB-D data..

Table of Contents

Introduction

Pose estimation from 4 planar points

Principle on a single image

Using a camera

Next tutorial