Visual Servoing Platform  version 3.6.1 under development (2024-06-15)
Tutorial: Pose estimation from points

# Introduction

This tutorial focuses on pose estimation from planar or non-planar points. From their 2D coordinates in the image plane and their corresponding 3D coordinates specified in an object frame, ViSP is able to estimate the relative pose between the camera and the object frame. This pose is returned as a homogeneous matrix cMo. Note that at least four points need to be considered to estimate a pose.

In this tutorial we assume that you are familiar with ViSP blob tracking. If not, see Tutorial: Blob tracking.

Note that all the material (source code and image) described in this tutorial is part of the ViSP source code (in the tutorial/computer-vision folder) and can be found at https://github.com/lagadic/visp/tree/master/tutorial/computer-vision.

# Pose estimation from 4 planar points

## Principle on a single image

In this section we consider the case of four planar points. The image processing done by our blob tracker extracts the pixel coordinates of each blob center of gravity. Using the camera intrinsic parameters, we then convert these pixel coordinates into meters in the image plane. Knowing the 3D coordinates of the points, we finally estimate the pose of the object.

The corresponding source code, provided in tutorial-pose-from-points-image.cpp, is the following.

#include <visp3/core/vpConfig.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpImageIo.h>
#include "pose_helper.h"

int main(int, char *argv[])
{
#ifdef ENABLE_VISP_NAMESPACE
  using namespace VISP_NAMESPACE_NAME;
#endif
  try {
    vpImage<unsigned char> I;
    vpImageIo::read(I, vpIoTools::getParent(argv[0]) + "/square.pgm");

#if defined(VISP_HAVE_X11)
    vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
    vpDisplayGDI d(I);
#elif defined(HAVE_OPENCV_HIGHGUI)
    vpDisplayOpenCV d(I);
#endif

    vpCameraParameters cam(840, 840, I.getWidth() / 2, I.getHeight() / 2);
    std::vector<vpDot2> dot(4);
    std::vector<vpImagePoint> ip(4);
    dot[0].initTracking(I, vpImagePoint(193, 157));
    dot[1].initTracking(I, vpImagePoint(203, 366));
    dot[2].initTracking(I, vpImagePoint(313, 402));
    dot[3].initTracking(I, vpImagePoint(304, 133));
    std::vector<vpPoint> point;
    point.push_back(vpPoint(-0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, -0.06, 0));
    point.push_back(vpPoint(0.06, 0.06, 0));
    point.push_back(vpPoint(-0.06, 0.06, 0));
    vpHomogeneousMatrix cMo;
    bool init = true;
    while (1) {
      vpDisplay::display(I);
      for (unsigned int i = 0; i < dot.size(); i++) {
        dot[i].setGraphics(true);
        dot[i].track(I);
        ip[i] = dot[i].getCog();
      }
      computePose(point, ip, cam, init, cMo);
      vpDisplay::displayFrame(I, cMo, cam, 0.05, vpColor::none, 3);
      vpDisplay::flush(I);
      if (init)
        init = false; // turn off pose initialisation
      if (vpDisplay::getClick(I, false))
        break;
      vpTime::wait(40);
    }
  }
  catch (const vpException &e) {
    std::cout << "Catch an exception: " << e.getMessage() << std::endl;
  }
}

Here is a screen shot of the resulting program:

And here is a detailed explanation of the main lines introduced in the source code.

First we include the header pose_helper.h that contains helper functions dedicated to pose estimation.

#include "pose_helper.h"

In the pose_helper.h header we include the vpPoint header; this class is a container for 3D points, with their 3D coordinates in the object frame and their 2D coordinates in the image plane obtained after perspective projection. These 3D and 2D coordinates are used as input by the pose estimation algorithm implemented in the vpPose class. We then include the vpCameraParameters header; this class contains the camera intrinsic parameters, which are typically used to convert the coordinates of a blob center of gravity, obtained in pixels from the blob tracking, into 2D coordinates expressed in the image plane. The next header concerns the vpHomogeneousMatrix class, which is a pose container. Finally, the vpDot2 header corresponds to the tracker that will be used to track the four blobs.

#include <visp3/blob/vpDot2.h>
#include <visp3/core/vpCameraParameters.h>
#include <visp3/core/vpConfig.h>
#include <visp3/core/vpHomogeneousMatrix.h>
#include <visp3/core/vpPoint.h>

If we have a look at pose_helper.cpp we will find the computePose() function that does the pose estimation. This function takes as input two vectors. The first one is a vector of vpPoint that contains the 3D coordinates in meters of the points, while the second one is a vector of vpImagePoint that contains their 2D coordinates in pixels, obtained from the blob centers of gravity. The 3D and 2D coordinates need to be matched: point[i] and ip[i] should refer to the same physical point. The other inputs are cam, which corresponds to the camera intrinsic parameters, and init, which indicates whether the pose needs to be initialized the first time. The cMo [in/out] parameter contains as input the pose estimated from the previous image and, as output, the resulting pose estimated from the input points.

void computePose(std::vector<vpPoint> &point, const std::vector<vpImagePoint> &ip, const vpCameraParameters &cam,
                 bool init, vpHomogeneousMatrix &cMo)
{
  vpPose pose;
  double x = 0, y = 0;
  for (unsigned int i = 0; i < point.size(); i++) {
    vpPixelMeterConversion::convertPoint(cam, ip[i].get_u(), ip[i].get_v(), x, y);
    point[i].set_x(x);
    point[i].set_y(y);
    pose.addPoint(point[i]);
  }
  if (init == true) {
    pose.computePose(vpPose::DEMENTHON_LAGRANGE_VIRTUAL_VS, cMo);
  }
  else {
    pose.computePose(vpPose::VIRTUAL_VS, cMo);
  }
}

In the computePose() function, we first create an instance of the vpPose class. Then we convert the coordinates of each blob center of gravity ip into 2D coordinates (x, y) expressed in meters in the image plane, using vpPixelMeterConversion::convertPoint(). We then update the point instance with the corresponding 2D coordinates and add this point to the vpPose container. Once all the points are added, when init is true we can estimate an initial pose using the vpPose::DEMENTHON_LAGRANGE_VIRTUAL_VS method: it first runs the Dementhon and Lagrange linear pose estimation algorithms, retains the result that has the lowest residual, and finally uses this result to initialize the non-linear virtual visual-servoing algorithm, which converges to a solution with a lower residual than either of the linear approaches. When init is false, we consider that the previous pose cMo is close to the solution, so it can be used directly to initialize the non-linear virtual visual-servoing minimization.

If we come back to the main() function in tutorial-pose-from-points-image.cpp, instead of using an image sequence of the object we always consider the same image square.pgm. In this image we consider a 12 cm by 12 cm square whose corners are materialized by blobs; the center of gravity of each blob corresponds to a corner position. This gray-level image is read thanks to vpImageIo::read().

After the instantiation of a window able to display the image, and at the end the result of the estimated pose as a pose frame, we set up the camera parameters. We consider here a perspective camera model without distortion, where px = py = 840 are the ratios between the focal length and the pixel size, and where the principal point (u0, v0) is located at the image center.

vpCameraParameters cam(840, 840, I.getWidth() / 2, I.getHeight() / 2);

Each blob is then tracked using a vector of four vpDot2 trackers. To avoid human interaction, we initialize the tracker with a pixel position inside each blob. Here we also instantiate a vector of vpImagePoint ip that will be updated in the while loop with the 2D coordinates in pixels of the blob center of gravity.

std::vector<vpDot2> dot(4);
std::vector<vpImagePoint> ip(4);
dot[0].initTracking(I, vpImagePoint(193, 157));
dot[1].initTracking(I, vpImagePoint(203, 366));
dot[2].initTracking(I, vpImagePoint(313, 402));
dot[3].initTracking(I, vpImagePoint(304, 133));

We also define the 3D model of our 12 cm by 12 cm square by setting the 3D coordinates of the corners in an object frame located, in this example, at the center of the square. We consider here that the points lie in the plane z = 0.

std::vector<vpPoint> point;
point.push_back(vpPoint(-0.06, -0.06, 0));
point.push_back(vpPoint(0.06, -0.06, 0));
point.push_back(vpPoint(0.06, 0.06, 0));
point.push_back(vpPoint(-0.06, 0.06, 0));

Next, we create a homogeneous matrix cMo that will contain the estimated pose.

In the infinite loop, at each iteration we read a new image (in this example it remains the same one), display its content in the window, track each blob, and update the coordinates of its center of gravity in the ip vector.

for (unsigned int i = 0; i < dot.size(); i++) {
dot[i].setGraphics(true);
dot[i].track(I);
ip[i] = dot[i].getCog();
}

Then we call the pose estimation function computePose() presented previously. It uses the 3D coordinates of the points defined as our model in the object frame, and their corresponding 2D positions in the image obtained by the blob tracker to estimate the pose cMo. This homogeneous transformation gives the position of the object frame in the camera frame.

computePose(point, ip, cam, init, cMo);

The resulting pose is displayed as an RGB frame in the window overlay. Red, green and blue colors correspond to the X, Y and Z axes respectively. Each axis is 0.05 meter long. All the drawings are then flushed to update the window content.

At the end of the first iteration, we turn off the init flag to indicate that the next pose estimation can use the non-linear vpPose::VIRTUAL_VS estimation method with the previous pose as initial value.

if (init)
init = false; // turn off pose initialisation

Finally, we interrupt the infinite loop by a user mouse click in the window.

if (vpDisplay::getClick(I, false))
break;

We also introduce a 40 millisecond sleep to slow down the loop and relax the CPU.

## Using a camera

We also provide another example in tutorial-pose-from-points-live.cpp that works on live images acquired by a camera. Compared to the previous example, we replace the single-image reader by one of the frame grabbers available in ViSP (see Tutorial: Image frame grabbing).

Once built, run the corresponding binary.

To get the usage of the binary run:

./tutorial-pose-from-points-live --help

To run the pose estimation on, for example, a 10 cm wide square, use:

./tutorial-pose-from-points-live --square_width 0.1

If your camera is calibrated (see Tutorial: Camera intrinsic calibration), you may use your own camera parameters with something similar to:

./tutorial-pose-from-points-live --square_width 0.1 --intrinsic camera.xml --camera_name Camera

If you get the following error:

$ ./tutorial-pose-from-points-live --square_width 0.1
Catch an exception: No cameras found

it probably means that your camera is not detected by the first grabber that is enabled using:

#if defined(VISP_HAVE_V4L2)
std::ostringstream device;
device << "/dev/video" << opt_device;
std::cout << "Use Video 4 Linux grabber on device " << device.str() << std::endl;
g.setDevice(device.str());
g.setScale(1);
g.open(I);
#elif defined(VISP_HAVE_DC1394)
(void)opt_device; // To avoid non used warning
std::cout << "Use DC1394 grabber" << std::endl;
g.open(I);
#elif defined(VISP_HAVE_CMU1394)
(void)opt_device; // To avoid non used warning
std::cout << "Use CMU1394 grabber" << std::endl;
g.open(I);
#elif defined(VISP_HAVE_FLYCAPTURE)
(void)opt_device; // To avoid non used warning
std::cout << "Use FlyCapture grabber" << std::endl;
g.open(I);
#elif defined(VISP_HAVE_REALSENSE2)
(void)opt_device; // To avoid non used warning
std::cout << "Use Realsense 2 grabber" << std::endl;
rs2::config config;
config.disable_stream(RS2_STREAM_DEPTH);
config.disable_stream(RS2_STREAM_INFRARED);
config.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGBA8, 30);
g.open(config);
g.acquire(I);
std::cout << "Read camera parameters from Realsense device" << std::endl;
#elif defined(HAVE_OPENCV_VIDEOIO)
std::cout << "Use OpenCV grabber on device " << opt_device << std::endl;
cv::VideoCapture g(opt_device); // Open the default camera
if (!g.isOpened()) { // Check if we succeeded
std::cout << "Failed to open the camera" << std::endl;
return EXIT_FAILURE;
}
cv::Mat frame;
g >> frame; // get a new frame from camera
#endif

If this is the case, we suggest to modify the following lines in tutorial-pose-from-points-live.cpp by uncommenting the lines corresponding to all the grabbers that you don't want to use:

// #undef VISP_HAVE_V4L2
// #undef VISP_HAVE_DC1394
// #undef VISP_HAVE_CMU1394
// #undef VISP_HAVE_FLYCAPTURE
// #undef VISP_HAVE_REALSENSE2
// #undef HAVE_OPENCV_VIDEOIO

For example, if you want to force OpenCV usage to grab images, uncomment the lines:

#undef VISP_HAVE_V4L2
#undef VISP_HAVE_DC1394
#undef VISP_HAVE_CMU1394
#undef VISP_HAVE_FLYCAPTURE
#undef VISP_HAVE_REALSENSE2

After modification, build and run this binary again.

# Next tutorial

You are now ready to see the next Tutorial: Pose estimation from QRcode that gives a use case.

If you have an RGB-D camera, you may also continue with Tutorial: Planar object pose estimation using RGB-D data.