Visual Servoing Platform  version 3.1.0
Tutorial: Markerless generic model-based tracking

Introduction

This tutorial describes how to use the generic model-based tracker. You are advised to have read these tutorials:

Overview

The main motivations for the introduction of this new class (vpMbGenericTracker) are:

In this perspective, this class can be used instead of the following classes:

Moreover, the features can be combined to improve the tracking robustness (e.g. a stereo tracker with edge + KLT for the left camera and dense depth features for the right camera).

The following video demonstrates the use of the generic tracker with an RGB-D sensor using the following features:

  • edge + KLT features for the color camera
  • depth normal features for the depth sensor

In this video, the same setup is used but with:

  • edge features for the color camera
  • dense depth features for the depth sensor

Note that all the material (source code and video) described in this tutorial is part of the ViSP source code and can be downloaded using the following command:

$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/tracking/model-based/stereo-generic

Features overview

The basic types of features that can be used with the model-based tracker are:

  • moving-edges feature (line, face, cylinder and circle primitives) [7]
  • KLT feature (face and cylinder primitives) [36]
  • depth normal feature (face primitives) (since ViSP 3.1.0)
  • dense depth feature (face primitives) (since ViSP 3.1.0)

The moving-edges and KLT features require an RGB camera; more precisely, these features operate on grayscale images. The depth normal and dense depth features require a depth map that can be obtained from an RGB-D camera, for example.

If you want to use a depth feature, we advise you to choose the dense depth feature, which is a much more robust method than the depth normal feature. Also keep in mind that Kinect-like sensors have a limited depth range (from ~0.8m to ~3.5m).

Note
It is recommended to install an optimized BLAS library (for instance OpenBLAS) to get better performance with the dense depth feature approach, which involves large matrix operations. On Ubuntu Xenial, install the libopenblas-dev package. To select or switch the BLAS library in use, see Handle different versions of BLAS and LAPACK:
$ update-alternatives --config libblas.so.3
You should have something similar to this:

There are 3 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).

  Selection    Path                                      Priority   Status
  0            /usr/lib/openblas-base/libblas.so.3       40         auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3    35         manual mode
  2            /usr/lib/libblas/libblas.so.3             10         manual mode
  3            /usr/lib/openblas-base/libblas.so.3       40         manual mode

Input data

For classical features working on grayscale images, you have to use the vpImage<unsigned char> data type.

If the image acquired is in vpRGBa data type, you can convert it to a grayscale image:

// Color image acquisition in I_color (vpImage<vpRGBa>)
vpImageConvert::convert(I_color, I); // conversion to the grayscale image I (vpImage<unsigned char>)

For depth features, you need to supply a pointcloud, that is, a 2D array where each element contains the X, Y and Z coordinates in the depth sensor frame. The following data types are accepted:

  • a vector of vpColVector: each element being a 3x1 column vector (X, Y, Z)
    std::vector<vpColVector> pointcloud(640*480);
    // Fill the pointcloud
    vpColVector coordinate(3);
    coordinate[0] = X;
    coordinate[1] = Y;
    coordinate[2] = Z;
    pointcloud[0] = coordinate;
  • a PCL pointcloud smart pointer of pcl::PointXYZ data type:
    pcl::PointCloud<pcl::PointXYZ>::Ptr pointcloud;

If you only have a depth map, that is, a 2D array where each element is the distance in meters from the depth sensor to the object, you will need to compute the 3D coordinates using the depth sensor intrinsic parameters. For instance, without taking into account the distortion (see also vpPixelMeterConversion::convertPoint), the conversion is (u and v are the pixel coordinates, u0, v0, px, py are the intrinsic parameters):

\[ \left\{\begin{matrix} X = \frac{u - u_0}{p_x} \times Z \\ Y = \frac{v - v_0}{p_y} \times Z \end{matrix}\right. \]
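
The following helper is a minimal sketch of this conversion, assuming the depth map is stored in a vpImage<float>; the function name depthMapToPointcloud is ours. It fills the std::vector<vpColVector> pointcloud described above using vpPixelMeterConversion::convertPoint:

std::vector<vpColVector> depthMapToPointcloud(const vpImage<float> &depthMap, const vpCameraParameters &cam)
{
  std::vector<vpColVector> pointcloud(depthMap.getHeight() * depthMap.getWidth());
  vpColVector coordinate(3);
  for (unsigned int v = 0; v < depthMap.getHeight(); v++) {
    for (unsigned int u = 0; u < depthMap.getWidth(); u++) {
      // Normalized coordinates: x = (u - u0)/px, y = (v - v0)/py
      double x = 0.0, y = 0.0;
      vpPixelMeterConversion::convertPoint(cam, u, v, x, y);
      double Z = depthMap[v][u];
      coordinate[0] = x * Z; // X
      coordinate[1] = y * Z; // Y
      coordinate[2] = Z;
      pointcloud[v * depthMap.getWidth() + u] = coordinate;
    }
  }
  return pointcloud;
}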

Here an example of a depth map:

tutorial-tracking-mb-generic-depth-map.png

Getting started

Implementation detail

Note
This section is similar to the Implementation detail section in Tutorial: Markerless model-based tracking with stereo cameras

Each tracker is stored in a map, the key corresponding to the name of the camera the tracker processes. By default, the camera names are set to:

  • "Camera" when the tracker is constructed with one camera.
  • "Camera1" to "CameraN" when the tracker is constructed with N cameras.
  • The default reference camera will be "Camera1" in the multiple cameras case.
img-multi-cameras-config.png
Default name convention and reference camera ("Camera1").

To deal with multiple cameras, in the virtual visual servoing control law we concatenate all the interaction matrices and residual vectors and transform them into a single reference camera frame to compute the reference camera velocity. Thus, we have to know the transformation matrix between each camera and the reference camera.

For example, if the reference camera is "Camera1" ( $ c_1 $), we need the following information: $ _{}^{c_2}\textrm{M}_{c_1}, _{}^{c_3}\textrm{M}_{c_1}, \cdots, _{}^{c_n}\textrm{M}_{c_1} $.

Declare a model-based tracker of the desired feature type

The following enumeration (vpMbGenericTracker::vpTrackerType) values are available to select the desired model-based tracker: EDGE_TRACKER, KLT_TRACKER, DEPTH_NORMAL_TRACKER and DEPTH_DENSE_TRACKER (the last two since ViSP 3.1.0).

The tracker declaration is then:
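
For instance, for a monocular edge tracker (the same constructor form is used in the example code below):

vpMbGenericTracker tracker(1, vpMbGenericTracker::EDGE_TRACKER);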

To combine the features:
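
For instance, assuming the tracker types are bit flags that can be OR-ed together (as in the RGB-D example below), an edge + KLT tracker for a single camera is:

vpMbGenericTracker tracker(1, vpMbGenericTracker::EDGE_TRACKER | vpMbGenericTracker::KLT_TRACKER);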

To "fully exploit" the capabilities of a RGB-D sensor, you can use for instance:

std::vector<int> trackerTypes(2);
vpMbGenericTracker tracker(trackerTypes);

This will declare a tracker with edge + KLT features for the color camera and dense depth feature for the depth sensor.

Interfacing with the code

Note
This section is more or less similar to the Interfacing with the code section in Tutorial: Markerless model-based tracking with stereo cameras

Each essential method used to initialize the tracker and process the tracking has three signatures, in order to ease the call to the method according to three working modes:

  • tracking using one camera, the signature remains the same.
  • tracking using two cameras, all the necessary methods accept directly the corresponding parameters for each camera.
  • tracking using multiple cameras, you have to supply the different parameters within a map. The key corresponds to the name of the camera and the value is the parameter, as sketched below.
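
For instance, in the multiple cameras case the maps could be filled as follows. This is only a sketch: it assumes the default "Camera1"/"Camera2" names and that the image maps hold pointers to the images; check the vpMbGenericTracker documentation for the exact signatures.

std::map<std::string, const vpImage<unsigned char> *> mapOfImg;
mapOfImg["Camera1"] = &I1;
mapOfImg["Camera2"] = &I2;
std::map<std::string, std::string> mapOfInitFiles;
mapOfInitFiles["Camera1"] = "f_init1.init";
mapOfInitFiles["Camera2"] = "f_init2.init";
tracker.initClick(mapOfImg, mapOfInitFiles);
tracker.track(mapOfImg);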

The following table sums up how to call the different methods based on the camera configuration for the main functions.

Example of the different method signatures.

Method calling example | Monocular case | Stereo case | Multiple cameras case | Remarks
Construct a model-based edge tracker | vpMbGenericTracker tracker | vpMbGenericTracker tracker(2, vpMbGenericTracker::EDGE_TRACKER) | vpMbGenericTracker tracker(5, vpMbGenericTracker::EDGE_TRACKER) | The default constructor corresponds to a monocular edge tracker.
Load a configuration file | tracker.loadConfigFile("config.xml") | tracker.loadConfigFile("config1.xml", "config2.xml") | tracker.loadConfigFile(mapOfConfigFiles) | Each tracker can have different parameters (intrinsic parameters, visibility angles, etc.).
Load a model file | tracker.loadModel("model.cao") | tracker.loadModel("model1.cao", "model2.cao") | tracker.loadModel(mapOfModelFiles) | Different models can be used.
Get the intrinsic camera parameters | tracker.getCameraParameters(cam) | tracker.getCameraParameters(cam1, cam2) | tracker.getCameraParameters(mapOfCam) |
Set the transformation matrix between each camera and the reference one | – | tracker.setCameraTransformationMatrix(mapOfCamTrans) | tracker.setCameraTransformationMatrix(mapOfCamTrans) | For the reference camera, the identity homogeneous matrix must be used.
Setting to display the features | tracker.setDisplayFeatures(true) | tracker.setDisplayFeatures(true) | tracker.setDisplayFeatures(true) | This is a general parameter.
Initialize the pose by click | tracker.initClick(I, "f_init.init") | tracker.initClick(I1, I2, "f_init1.init", "f_init2.init") | tracker.initClick(mapOfImg, mapOfInitFiles) | Assuming the transformation matrices between the cameras have been set, some init files can be omitted as long as the reference camera has an init file. The corresponding images must be supplied.
Track the object | tracker.track(I) | tracker.track(I1, I2) | tracker.track(mapOfImg) | See the documentation to track with a pointcloud.
Get the pose | tracker.getPose(cMo) | tracker.getPose(c1Mo, c2Mo) | tracker.getPose(mapOfPoses) | tracker.getPose(cMo) returns the pose of the reference camera in the stereo/multiple cameras configurations.
Display the model | tracker.display(I, cMo, cam, ...) | tracker.display(I1, I2, c1Mo, c2Mo, cam1, cam2, ...) | tracker.display(mapOfImg, mapOfPoses, mapOfCam, ...) |
Note
As the trackers are stored internally in alphabetical order (by camera name), in the stereo cameras case you have to match the method parameters with the correct tracker position in the map.

Example code

Monocular model-based tracking

The following example comes from tutorial-mb-generic-tracker-stereo-mono.cpp and allows tracking a tea box modeled in the cao format.

Once built, to choose which tracker to use, run the binary with the following argument:

$ ./tutorial-mb-generic-tracker-stereo-mono --name <video name> --tracker <1=edge|2=keypoint|3=hybrid>

The config, model, init files should be named according to the video name: teabox.mpg / teabox.cao / teabox.xml / teabox.init

Warning
ViSP must have been built with OpenCV and the KLT module must be enabled to run this tutorial. The xml2 library is needed if you want to use a configuration file, otherwise you will have to configure the tracker in the code.
Note
The command line:
$ ./tutorial-mb-generic-tracker-stereo-mono
will run the model-based edge tracker on the teabox sequence.

The source code is the following:

#include <cstdlib>
#include <visp3/core/vpConfig.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpImageIo.h>
#include <visp3/mbt/vpMbGenericTracker.h>
#include <visp3/io/vpVideoReader.h>
int main(int argc, char **argv)
{
#if defined(VISP_HAVE_OPENCV) && (VISP_HAVE_OPENCV_VERSION >= 0x020300)
try {
std::string opt_videoname = "teabox.mpg";
int opt_tracker = 1; // 1=edge, 2=keypoint (KLT), 3=hybrid (edge + KLT)
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--name" && i + 1 < argc)
opt_videoname = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--tracker" && i + 1 < argc)
opt_tracker = atoi(argv[i + 1]);
else if (std::string(argv[i]) == "--help") {
std::cout << "\nUsage: " << argv[0]
<< " [--name <video name>] [--tracker "
"<1=egde|2=keypoint|3=hybrid>] [--help]\n"
<< std::endl;
return EXIT_SUCCESS;
}
}
if (opt_tracker < 1 || opt_tracker > 3) {
std::cerr << "Wrong tracker type. Correct values are: "
"1=egde|2=keypoint|3=hybrid."
<< std::endl;
return EXIT_SUCCESS;
}
std::string parentname = vpIoTools::getParent(opt_videoname);
std::string objectname = vpIoTools::getNameWE(opt_videoname);
if (!parentname.empty()) {
objectname = parentname + "/" + objectname;
}
std::cout << "Video name: " << opt_videoname << std::endl;
std::cout << "Tracker requested config files: " << objectname << ".[init, cao]" << std::endl;
std::cout << "Tracker optional config files: " << objectname << ".[ppm]" << std::endl;
vpImage<unsigned char> I;
vpVideoReader g;
g.setFileName(opt_videoname);
g.open(I);
#if defined(VISP_HAVE_X11)
vpDisplayX display;
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI display;
#else
vpDisplayOpenCV display;
#endif
display.init(I, 100, 100, "Model-based tracker");
vpCameraParameters cam;
vpHomogeneousMatrix cMo;
vpMbGenericTracker tracker(1, opt_tracker);
#if !defined(VISP_HAVE_MODULE_KLT)
if (opt_tracker >= 2) {
std::cout << "KLT and hybrid model-based tracker are not available "
"since visp_klt module is missing"
<< std::endl;
return EXIT_SUCCESS;
}
#endif
#ifdef VISP_HAVE_XML2
tracker.loadConfigFile(objectname + ".xml");
#else
if (opt_tracker == 1 || opt_tracker == 3) {
vpMe me;
me.setMaskSize(5);
me.setMaskNumber(180);
me.setRange(8);
me.setThreshold(10000);
me.setMu1(0.5);
me.setMu2(0.5);
tracker.setMovingEdge(me);
}
#ifdef VISP_HAVE_MODULE_KLT
if (opt_tracker == 2 || opt_tracker == 3) {
vpKltOpencv klt_settings;
tracker.setKltMaskBorder(5);
klt_settings.setMaxFeatures(300);
klt_settings.setWindowSize(5);
klt_settings.setQuality(0.015);
klt_settings.setMinDistance(8);
klt_settings.setHarrisFreeParameter(0.01);
klt_settings.setBlockSize(3);
klt_settings.setPyramidLevels(3);
tracker.setKltOpencv(klt_settings);
}
#endif
{
cam.initPersProjWithoutDistortion(839.21470, 839.44555, 325.66776, 243.69727);
tracker.setCameraParameters(cam);
}
#endif
tracker.loadModel(objectname + ".cao");
tracker.setDisplayFeatures(true);
tracker.initClick(I, objectname + ".init", true);
while (!g.end()) {
g.acquire(I);
vpDisplay::display(I);
tracker.track(I);
tracker.getPose(cMo);
tracker.getCameraParameters(cam);
tracker.display(I, cMo, cam, vpColor::red, 2, true);
vpDisplay::displayFrame(I, cMo, cam, 0.025, vpColor::none, 3);
vpDisplay::displayText(I, 10, 10, "A click to exit...", vpColor::red);
vpDisplay::flush(I);
if (vpDisplay::getClick(I, false)) {
break;
}
}
} catch (const vpException &e) {
std::cerr << "Catch a ViSP exception: " << e.what() << std::endl;
}
return EXIT_SUCCESS;
#else
(void)argc;
(void)argv;
std::cout << "Install OpenCV and rebuild ViSP to use this example." << std::endl;
return EXIT_SUCCESS;
#endif
}

Explanation of the code (monocular model-based tracking)

The previous source code shows how to use the new vpMbGenericTracker class as a replacement of the classical classes. The procedure to configure the tracker remains the same:

  • construct the tracker
  • configure the tracker by loading a configuration file
  • load a 3D model
  • track on the current image
  • get the current pose and display the model in the image

Please refer to the tutorial Tutorial: Markerless model-based tracking in order to have explanations about the configuration parameters (Tracker settings) and how to model an object in a ViSP compatible format (CAD model in cao format).

The required include is:

#include <visp3/mbt/vpMbGenericTracker.h>

We need a grayscale image for the tracking:
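
vpImage<unsigned char> I;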

To set the type of the tracker (the first parameter is the number of cameras, the second parameter is the type of the tracker(s)):

vpMbGenericTracker tracker(1, opt_tracker);

To load the configuration file, we use:

tracker.loadConfigFile(objectname + ".xml");

Otherwise, you can configure the tracker directly in the code:

#ifdef VISP_HAVE_XML2
tracker.loadConfigFile(objectname + ".xml");
#else
if (opt_tracker == 1 || opt_tracker == 3) {
vpMe me;
me.setMaskSize(5);
me.setMaskNumber(180);
me.setRange(8);
me.setThreshold(10000);
me.setMu1(0.5);
me.setMu2(0.5);
tracker.setMovingEdge(me);
}
#ifdef VISP_HAVE_MODULE_KLT
if (opt_tracker == 2 || opt_tracker == 3) {
vpKltOpencv klt_settings;
tracker.setKltMaskBorder(5);
klt_settings.setMaxFeatures(300);
klt_settings.setWindowSize(5);
klt_settings.setQuality(0.015);
klt_settings.setMinDistance(8);
klt_settings.setHarrisFreeParameter(0.01);
klt_settings.setBlockSize(3);
klt_settings.setPyramidLevels(3);
tracker.setKltOpencv(klt_settings);
}
#endif
{
cam.initPersProjWithoutDistortion(839.21470, 839.44555, 325.66776, 243.69727);
tracker.setCameraParameters(cam);
}
#endif

To load the 3D object model, we use:

tracker.loadModel(objectname + ".cao");

We can also use the following setting that enables the display of the features used during the tracking:

tracker.setDisplayFeatures(true);

The initial pose is set by clicking on specific points in the image:

tracker.initClick(I, objectname + ".init", true);

The tracking is done by:

tracker.track(I);

To get the current camera pose:

tracker.getPose(cMo);

To display the model with the estimated pose:

tracker.getCameraParameters(cam);
tracker.display(I, cMo, cam, vpColor::red, 2, true);

Stereo model-based tracking

The following example comes from tutorial-mb-generic-tracker-stereo.cpp and allows tracking a tea box modeled in the cao format using a stereo rig.

Once built, to choose which tracker to use, run the binary with the following argument:

$ ./tutorial-mb-generic-tracker-stereo --name <video name left> <video name right> --tracker <1=edge|2=klt|3=hybrid> <1=edge|2=klt|3=hybrid>

The config, model, init files should be named according to the video name: teabox_left.mpg / teabox_left.cao / teabox_left.xml / teabox_left.init, ...

Warning
ViSP must have been built with OpenCV and the KLT module must be enabled to run this tutorial. The xml2 library is required in this tutorial code.
Note
The command line:
$ ./tutorial-mb-generic-tracker-stereo
will run the stereo model-based edge tracker on the stereo teabox sequence.

Explanation of the code (stereo model-based tracking)

The procedure is almost identical to the monocular case.

The include is:

#include <visp3/mbt/vpMbGenericTracker.h>

Two images are needed:

vpImage<unsigned char> I_left, I_right;

To set the types of the tracker:

std::vector<int> trackerTypes(2);
trackerTypes[0] = opt_tracker1;
trackerTypes[1] = opt_tracker2;
vpMbGenericTracker tracker(trackerTypes);
#if !defined(VISP_HAVE_MODULE_KLT)
if (opt_tracker1 >= 2 || opt_tracker2 >= 2) {
std::cout << "klt and hybrid model-based tracker are not available "
"since visp_klt module is missing"
<< std::endl;
return EXIT_SUCCESS;
}
#endif

To load the configuration files, we use:

tracker.loadConfigFile(objectname_left + ".xml", objectname_right + ".xml");

We have to set the transformation matrices between the cameras and the reference camera in order to compute the control law in the reference camera frame. In the code we consider the left camera, named "Camera1", as the reference camera. For the right camera, named "Camera2", we have to set the transformation ( $ ^{c_{right}}{\bf M}_{c_{left}} $ ). This transformation is read from the cRightMcLeft.txt file. Since our left and right cameras are not moving, this transformation is constant and does not need to be updated in the tracking loop:

Note
For the reference camera, the camera transformation matrix has to be specified as an identity homogeneous matrix (no rotation, no translation). By default the vpHomogeneousMatrix constructor builds an identity matrix.
vpHomogeneousMatrix cRightMcLeft;
std::ifstream file_cRightMcLeft("cRightMcLeft.txt");
cRightMcLeft.load(file_cRightMcLeft);
std::map<std::string, vpHomogeneousMatrix> mapOfCameraTransformationMatrix;
mapOfCameraTransformationMatrix["Camera1"] = vpHomogeneousMatrix();
mapOfCameraTransformationMatrix["Camera2"] = cRightMcLeft;
tracker.setCameraTransformationMatrix(mapOfCameraTransformationMatrix);

To load the 3D object model, we use:

tracker.loadModel(objectname_left + ".cao", objectname_right + ".cao");

We can also use the following setting that enables the display of the features used during the tracking:

tracker.setDisplayFeatures(true);

The initial poses are set by clicking on specific points in the images:

tracker.initClick(I_left, I_right, objectname_left + ".init", objectname_right + ".init", true);

The tracking is done by:

tracker.track(I_left, I_right);

To get the current camera poses:

vpHomogeneousMatrix cLeftMo, cRightMo;
tracker.getPose(cLeftMo, cRightMo);

To display the model with the estimated poses:

vpCameraParameters cam_left, cam_right;
tracker.getCameraParameters(cam_left, cam_right);
tracker.display(I_left, I_right, cLeftMo, cRightMo, cam_left, cam_right, vpColor::red, 2);

Next tutorial

You are now ready to see the next Tutorial: Markerless model-based tracker CAD model editor - GSoC 2017 project.