Tutorial: Markerless generic model-based tracking using a stereo camera

Introduction

Before following this tutorial, we suggest that you first follow Tutorial: Markerless generic model-based tracking using a color camera to learn the basics of the tracker.

This tutorial describes how to extend the generic model-based tracker implemented in the vpMbGenericTracker class to use multiple camera views acquired simultaneously by a stereo camera. Our implementation does not limit the number of cameras observing the object to track, or parts of it. It allows tracking the object in the images viewed by a set of cameras while providing its 3D localization. Calibrated cameras (intrinsics, and extrinsics between the reference camera and the other cameras) are required.

The mbt ViSP module allows tracking a markerless object using the knowledge of its CAD model. Considered objects have to be modeled by segment, circle or cylinder primitives. The model of the object can be defined in vrml format (except for circles) or in cao format (our own format).

The visual features that can be considered by multiple cameras are the moving-edges, the keypoints (KLT features), or a combination of them in a hybrid scheme when the object is textured and has visible edges (see Features overview). They are the same as those used for a single camera.

The vpMbGenericTracker class allows tracking the same object with two or more cameras. The main advantages of this configuration with respect to the monocular camera case (see Tutorial: Markerless generic model-based tracking using a color camera) are:

  • the possibility to extend the application field of view;
  • a more robust tracking, as the configuration of the stereo rig allows tracking the object under multiple viewpoints and thus with more visual features.

In order to achieve this, the following information is required: the intrinsic parameters of each camera (here loaded from the *.xml configuration files) and the transformation matrix between each camera and the reference camera (see Implementation detail).

In the following sections, we consider the tracking of a tea box modeled in cao format and observed by a stereo camera. The following video shows the tracking performed with vpMbGenericTracker. In this example, the images were captured by the fixed cameras located on the Romeo humanoid robot head.

This other video shows the behavior of the hybrid tracking using moving-edges and keypoints as visual features.

Note
The cameras can move, but the tracking will be effective as long as the transformation matrix between each camera and the reference camera is known and updated at each iteration (see How to deal with moving cameras).
The vpMbGenericTracker class is not restricted to a stereo configuration. It also allows the use of multiple cameras (3 or more).

The next sections highlight how to easily adapt your code to use multiple cameras with the generic model-based tracker. As only the new methods dedicated to multiple-view tracking are presented, you are highly encouraged to follow Tutorial: Markerless generic model-based tracking using a color camera first in order to be familiar with the generic model-based tracking concepts and with the configuration part.

Note that all the material (source code, input video, CAD model and XML settings files) described in this tutorial is part of the ViSP source code (in the tracking/model-based/generic-stereo folder) and can be found at https://github.com/lagadic/visp/tree/master/tracking/model-based/generic-stereo.

Getting started

To start with the generic markerless model-based tracker using a stereo camera, we recommend studying the tutorial-mb-generic-tracker-stereo.cpp source code that is given and explained below.

Overview

The generic model-based tracker available for multiple-view tracking relies on the same per-camera trackers as in the monocular case. Our implementation in the vpMbGenericTracker class makes it easy to extend the usage of the model-based tracker to multiple cameras while preserving the same behavior as the tracking in the monocular configuration.

Implementation detail

Each tracker is stored in a map whose key is the name of the camera whose images the tracker processes. By default, the camera names are set to:

  • "Camera" when the tracker is constructed with one camera.
  • "Camera1" to "CameraN" when the tracker is constructed with N cameras.
  • The default reference camera will be "Camera1" in the multiple cameras case.
Default name convention and reference camera ("Camera1").
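
As a minimal sketch of this naming convention with three cameras (c2Mc1 and c3Mc1 are placeholders for the transformations that would come from your extrinsic calibration), the construction and the camera-to-reference transformations could look like:

std::vector<int> trackerTypes(3, 1); // three cameras, each using moving-edges (type 1)
vpMbGenericTracker tracker(trackerTypes); // trackers stored in a map keyed "Camera1", "Camera2", "Camera3"
vpHomogeneousMatrix c2Mc1, c3Mc1; // placeholders: transformations from your extrinsic calibration
std::map<std::string, vpHomogeneousMatrix> mapOfCamTrans;
mapOfCamTrans["Camera1"] = vpHomogeneousMatrix(); // reference camera: identity
mapOfCamTrans["Camera2"] = c2Mc1;
mapOfCamTrans["Camera3"] = c3Mc1;
tracker.setCameraTransformationMatrix(mapOfCamTrans);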

To deal with multiple cameras, in the virtual visual servoing control law we concatenate all the interaction matrices and residual vectors and transform them into a single reference camera frame to compute the reference camera velocity. Thus, we have to know the transformation matrix between each camera and the reference camera.

For example, if the reference camera is "Camera1" ( $ c_1 $), we need the following information: $ ^{c_2}{\bf M}_{c_1}, \; ^{c_3}{\bf M}_{c_1}, \; \cdots, \; ^{c_n}{\bf M}_{c_1} $.
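
As an illustration (a sketch of the standard multi-camera virtual visual servoing formulation, not necessarily the exact internal notation of the class), each camera contributes an interaction matrix $ {\bf L}_{c_i} $ and a residual vector $ {\bf e}_{c_i} $; the per-camera contributions are expressed in the reference camera frame through the velocity twist matrix $ ^{c_i}{\bf V}_{c_1} $ built from $ ^{c_i}{\bf M}_{c_1} $, and stacked before computing the reference camera velocity $ {\bf v}_{c_1} $:

\[ \begin{bmatrix} {\bf L}_{c_1} \\ {\bf L}_{c_2} \, ^{c_2}{\bf V}_{c_1} \\ \vdots \\ {\bf L}_{c_n} \, ^{c_n}{\bf V}_{c_1} \end{bmatrix} {\bf v}_{c_1} = -\lambda \begin{bmatrix} {\bf e}_{c_1} \\ {\bf e}_{c_2} \\ \vdots \\ {\bf e}_{c_n} \end{bmatrix} \]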

Example input/output data

The tutorial-mb-generic-tracker-stereo.cpp example uses the following data as input:

  • two video files, teabox_left.mp4 and teabox_right.mp4, that are the default videos and that can be changed using the --name command line option. They need to be synchronized and correspond to the images acquired by a left and a right camera.
  • two configuration files in xml format, teabox_left.xml and teabox_right.xml, that contain the tracker settings and the camera parameters. See Settings from an XML file to know more about the content of these files. When using different cameras, since the intrinsic camera parameters differ, the content of these *.xml files may also differ. It is also possible to set these parameters in the code without using an xml file; see Moving-edges settings and Keypoints settings.
  • two CAD models that describe the object to track. In the example we use by default teabox_left.cao and teabox_right.cao. When using a stereo configuration the CAD models are generally the same, as is the case here. See the Tracker CAD model section to learn how the teabox is modeled and the section CAD model in cao format to learn how to model another object.
  • two files with extension *.init that contain the 3D coordinates of some points used to compute an initial pose, which serves to initialize the tracker. The user then has to click in the left and right images on the corresponding 2D points. The default files are named teabox_left.init and teabox_right.init. In our case they have the same content, but depending on the point of view it can sometimes be useful for them to differ, to ensure that the 3D points are visible during initialization. The content of these files is detailed in the Source code explained section.
  • two optional images with extension *.ppm that may help the user remember the location of the corresponding 3D points specified in the *.init file. By default we use teabox_left.ppm and teabox_right.ppm.
  • the transformation between the two cameras. Here we use the cRightMcLeft.txt file that contains the transformation between the right camera frame and the left camera frame. Using this transformation, the 3D coordinates of a point expressed in the left camera frame can be computed in the right camera frame (a minimal loading sketch is given right after this list).
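
A minimal sketch of how this transformation is read (the matrix values shown in the comment are placeholders, not the actual calibration of the tutorial data); the file is expected to contain the 4-by-4 homogeneous matrix as plain text and is loaded with vpHomogeneousMatrix::load(), exactly as in the example code:

#include <fstream>
#include <visp3/core/vpHomogeneousMatrix.h>

// cRightMcLeft.txt contains cRight_M_cLeft written row by row, e.g. (placeholder values):
// 1 0 0 -0.1
// 0 1 0 0
// 0 0 1 0
// 0 0 0 1
vpHomogeneousMatrix cRightMcLeft;
std::ifstream file_cRightMcLeft("cRightMcLeft.txt");
cRightMcLeft.load(file_cRightMcLeft);
// A 3D point expressed in the left camera frame can then be expressed in the
// right camera frame: p_right = cRightMcLeft * p_left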

As an output the tracker provides two poses $^{c}{\bf M}_o $, each being a 4-by-4 homogeneous matrix that corresponds to the geometric transformation between the frame attached to the object (in our case the tea box) and the frame attached to the left or the right camera, respectively. The poses are returned as vpHomogeneousMatrix containers.
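
If needed, a pose can be decomposed into its translation and rotation parts. A minimal sketch using the vpHomogeneousMatrix accessors, assuming cLeftMo has been retrieved with getPose() as shown later:

#include <iostream>
#include <visp3/core/vpHomogeneousMatrix.h>
#include <visp3/core/vpThetaUVector.h>
#include <visp3/core/vpTranslationVector.h>

vpHomogeneousMatrix cLeftMo; // pose of the object frame in the left camera frame
vpTranslationVector t = cLeftMo.getTranslationVector(); // translation in meters
vpThetaUVector tu(cLeftMo.getRotationMatrix()); // axis-angle (theta-u) rotation in radians
std::cout << "translation:\n" << t << "\ntheta-u rotation:\n" << tu << std::endl;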

Example code

The following example comes from tutorial-mb-generic-tracker-stereo.cpp and allows tracking a tea box modeled in cao format. In this example we consider a stereo configuration with images from a left camera and images from a right camera.

Once built, to choose which tracker to use on each camera, run the binary with the following arguments:

$ ./tutorial-mb-generic-tracker-stereo --tracker <1=edge|2=klt|3=hybrid> <1=edge|2=klt|3=hybrid>

For example, to use moving-edges features on images acquired by the left camera and a hybrid scheme on images acquired by the right camera, run:

$ ./tutorial-mb-generic-tracker-stereo --tracker 1 3

The source code is the following:

#include <cstdlib>
#include <visp3/core/vpConfig.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpImageIo.h>
#include <visp3/mbt/vpMbGenericTracker.h>
#include <visp3/io/vpVideoReader.h>
int main(int argc, char **argv)
{
#if defined(VISP_HAVE_OPENCV) && defined(VISP_HAVE_PUGIXML)
#ifdef ENABLE_VISP_NAMESPACE
using namespace VISP_NAMESPACE_NAME;
#endif
try {
std::string opt_videoname_left = "teabox_left.mp4";
std::string opt_videoname_right = "teabox_right.mp4";
int opt_tracker1 = vpMbGenericTracker::EDGE_TRACKER; // moving-edges by default on both cameras
int opt_tracker2 = vpMbGenericTracker::EDGE_TRACKER;
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--name" && i + 2 < argc) {
opt_videoname_left = std::string(argv[i + 1]);
opt_videoname_right = std::string(argv[i + 2]);
}
else if (std::string(argv[i]) == "--tracker" && i + 2 < argc) {
opt_tracker1 = atoi(argv[i + 1]);
opt_tracker2 = atoi(argv[i + 2]);
}
else if (std::string(argv[i]) == "--help") {
std::cout << "\nUsage: " << argv[0]
<< " [--name <video name left> <video name right>]"
" [--tracker <1=egde|2=klt|3=hybrid> <1=egde|2=klt|3=hybrid>]"
" [--help]\n"
<< std::endl;
return EXIT_SUCCESS;
}
}
if (opt_tracker1 < 1 || opt_tracker1 > 3 || opt_tracker2 < 1 || opt_tracker2 > 3) {
std::cerr << "Wrong tracker type. Correct values are: "
"1=egde|2=keypoint|3=hybrid."
<< std::endl;
return EXIT_SUCCESS;
}
std::string parentname = vpIoTools::getParent(opt_videoname_left);
std::string objectname_left = vpIoTools::getNameWE(opt_videoname_left);
std::string objectname_right = vpIoTools::getNameWE(opt_videoname_right);
if (!parentname.empty()) {
objectname_left = parentname + "/" + objectname_left;
objectname_right = parentname + "/" + objectname_right;
}
std::cout << "Video name: " << opt_videoname_left << " ; " << opt_videoname_right << std::endl;
std::cout << "Tracker requested config files: " << objectname_left << ".[init, cao]"
<< " and " << objectname_right << ".[init, cao]" << std::endl;
std::cout << "Tracker optional config files: " << opt_videoname_left << ".ppm"
<< " and " << opt_videoname_right << ".ppm" << std::endl;
vpImage<unsigned char> I_left, I_right;
vpVideoReader g_left, g_right;
g_left.setFileName(opt_videoname_left);
g_left.open(I_left);
g_right.setFileName(opt_videoname_right);
g_right.open(I_right);
#if defined(VISP_HAVE_X11)
vpDisplayX display_left;
vpDisplayX display_right;
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI display_left;
vpDisplayGDI display_right;
#elif defined(HAVE_OPENCV_HIGHGUI)
vpDisplayOpenCV display_left;
vpDisplayOpenCV display_right;
#endif
display_right.setDownScalingFactor(vpDisplay::SCALE_AUTO);
display_left.init(I_left, 100, 100, "Model-based tracker (Left)");
display_right.init(I_right, 110 + (int)I_left.getWidth(), 100, "Model-based tracker (Right)");
std::vector<int> trackerTypes(2);
trackerTypes[0] = opt_tracker1;
trackerTypes[1] = opt_tracker2;
vpMbGenericTracker tracker(trackerTypes);
#if !defined(VISP_HAVE_MODULE_KLT)
unsigned int nbTracker = trackerTypes.size();
for (unsigned int i = 0; i < nbTracker; ++i) {
if (trackerTypes[i] >= 2) {
std::cout << "klt and hybrid model-based tracker are not available "
"since visp_klt module is missing"
<< std::endl;
return EXIT_SUCCESS;
}
}
#endif
tracker.loadConfigFile(objectname_left + ".xml", objectname_right + ".xml");
tracker.loadModel(objectname_left + ".cao", objectname_right + ".cao");
tracker.setDisplayFeatures(true);
vpHomogeneousMatrix cRightMcLeft;
std::ifstream file_cRightMcLeft("cRightMcLeft.txt");
cRightMcLeft.load(file_cRightMcLeft);
std::map<std::string, vpHomogeneousMatrix> mapOfCameraTransformationMatrix;
mapOfCameraTransformationMatrix["Camera1"] = vpHomogeneousMatrix();
mapOfCameraTransformationMatrix["Camera2"] = cRightMcLeft;
tracker.setCameraTransformationMatrix(mapOfCameraTransformationMatrix);
tracker.initClick(I_left, I_right, objectname_left + ".init", objectname_right + ".init", true);
while (!g_left.end() && !g_right.end()) {
g_left.acquire(I_left);
g_right.acquire(I_right);
vpDisplay::display(I_left);
vpDisplay::display(I_right);
tracker.track(I_left, I_right);
vpHomogeneousMatrix cLeftMo, cRightMo;
tracker.getPose(cLeftMo, cRightMo);
vpCameraParameters cam_left, cam_right;
tracker.getCameraParameters(cam_left, cam_right);
tracker.display(I_left, I_right, cLeftMo, cRightMo, cam_left, cam_right, vpColor::red, 2);
vpDisplay::displayFrame(I_left, cLeftMo, cam_left, 0.025, vpColor::none, 3);
vpDisplay::displayFrame(I_right, cRightMo, cam_right, 0.025, vpColor::none, 3);
vpDisplay::displayText(I_left, 10, 10, "A click to exit...", vpColor::red);
vpDisplay::flush(I_left);
vpDisplay::flush(I_right);
if (vpDisplay::getClick(I_left, false)) {
break;
}
}
}
catch (const vpException &e) {
std::cerr << "Catch a ViSP exception: " << e.what() << std::endl;
}
#else
(void)argc;
(void)argv;
std::cout << "Install OpenCV and rebuild ViSP to use this example." << std::endl;
#endif
}

Explanation of the code

The previous source code shows how to use model-based tracking on stereo images using the standard procedure to configure the tracker:

  • construct the tracker. Only one tracker is used even if we consider multiple views.
  • initialize the tracker by loading the configuration files (*.xml and *.init) for each camera view
  • load a 3D model (*.cao) for each camera view
  • start a while loop
    • acquire left and right images
    • process the tracking
    • get the pose and display the model in the images
Warning
OpenCV is required and the KLT module has to be enabled to use the KLT functionality. See Considered third-parties. Instead of using xml files, it is also possible to set the tracker settings directly in the code; see Moving-edges settings and Keypoints settings.
Note
Please refer to Tutorial: Markerless generic model-based tracking using a color camera for explanations about the configuration parameters (Tracker settings) and how to model an object in a ViSP compatible format (CAD model in cao format).

Below we give an explanation of the source code.

First the vpMbGenericTracker header is included:

#include <visp3/mbt/vpMbGenericTracker.h>

We declare two images for the left and right camera views.

vpImage<unsigned char> I_left, I_right;

To construct a stereo tracker, we have to specify for each tracker the features to be considered, as arguments given to the tracker constructor. That is why we create a vector of size 2 that contains the required vpMbGenericTracker::vpTrackerType values:

std::vector<int> trackerTypes(2);
trackerTypes[0] = opt_tracker1;
trackerTypes[1] = opt_tracker2;
vpMbGenericTracker tracker(trackerTypes);
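
Instead of the raw integers parsed from the command line, the same configuration can be written with the vpMbGenericTracker::vpTrackerType enumeration, which is more readable (a hybrid tracker is the bitwise OR of the edge and KLT types); a minimal equivalent sketch:

std::vector<int> trackerTypes(2);
trackerTypes[0] = vpMbGenericTracker::EDGE_TRACKER; // moving-edges only (1)
trackerTypes[1] = vpMbGenericTracker::EDGE_TRACKER | vpMbGenericTracker::KLT_TRACKER; // hybrid (3)
vpMbGenericTracker tracker(trackerTypes);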

All the configuration parameters for the tracker are stored in xml configuration files. To load these files, which also contain the intrinsic camera parameters, we use:

tracker.loadConfigFile(objectname_left + ".xml", objectname_right + ".xml");
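
As mentioned earlier, these settings can also be provided without an xml file. A minimal sketch for the intrinsic camera parameters only, assuming the two-camera overload of setCameraParameters() and using placeholder values instead of the actual calibration of the tutorial data:

vpCameraParameters cam_left, cam_right;
// px, py, u0, v0 below are placeholder values; use your own calibration results
cam_left.initPersProjWithoutDistortion(600, 600, 320, 240);
cam_right.initPersProjWithoutDistortion(600, 600, 320, 240);
tracker.setCameraParameters(cam_left, cam_right);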

To load the 3D object model, we use:

tracker.loadModel(objectname_left + ".cao", objectname_right + ".cao");

We can also use the following setting that enables the display of the features used during the tracking:

tracker.setDisplayFeatures(true);

We have to set the transformation matrices between the cameras and the reference camera to be able to compute the control law in the reference camera frame. In the code we consider the left camera, named "Camera1", as the reference camera. For the right camera, named "Camera2", we have to set the transformation ( $ ^{c_{right}}{\bf M}_{c_{left}} $). This transformation is read from the cRightMcLeft.txt file. Since our left and right cameras are not moving, this transformation is constant and does not have to be updated in the tracking loop:

Note
For the reference camera, the camera transformation matrix has to be specified as an identity homogeneous matrix (no rotation, no translation). By default the vpHomogeneousMatrix constructor builds an identity matrix.
vpHomogeneousMatrix cRightMcLeft;
std::ifstream file_cRightMcLeft("cRightMcLeft.txt");
cRightMcLeft.load(file_cRightMcLeft);
std::map<std::string, vpHomogeneousMatrix> mapOfCameraTransformationMatrix;
mapOfCameraTransformationMatrix["Camera1"] = vpHomogeneousMatrix();
mapOfCameraTransformationMatrix["Camera2"] = cRightMcLeft;
tracker.setCameraTransformationMatrix(mapOfCameraTransformationMatrix);

The initial pose is set by clicking on specific points in the image:

tracker.initClick(I_left, I_right, objectname_left + ".init", objectname_right + ".init", true);
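
If an initial pose is already known (for example from a previous run or from another sensor), the clicking step can be skipped; a minimal sketch, assuming the two-image overload of initFromPose() and that cLeftMo_init and cRightMo_init hold the known initial poses:

vpHomogeneousMatrix cLeftMo_init, cRightMo_init; // known initial poses (placeholders here)
tracker.initFromPose(I_left, I_right, cLeftMo_init, cRightMo_init);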

The tracking is done by:

tracker.track(I_left, I_right);

The poses for each camera are retrieved with:

vpHomogeneousMatrix cLeftMo, cRightMo;
tracker.getPose(cLeftMo, cRightMo);
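
Since the stereo rig is rigid, the two estimated poses are linked by the constant camera-to-camera transformation. A quick sanity check, reusing the cRightMcLeft matrix loaded earlier, is to compare the right pose with the left pose transferred into the right camera frame:

// cRightMo should match cRightMcLeft * cLeftMo up to numerical accuracy
vpHomogeneousMatrix cRightMo_check = cRightMcLeft * cLeftMo;
std::cout << "cRightMo (estimated):\n" << cRightMo << std::endl;
std::cout << "cRightMcLeft * cLeftMo:\n" << cRightMo_check << std::endl;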

To display the model with the estimated pose, we use:

vpCameraParameters cam_left, cam_right;
tracker.getCameraParameters(cam_left, cam_right);
tracker.display(I_left, I_right, cLeftMo, cRightMo, cam_left, cam_right, vpColor::red, 2);

Advanced

How to deal with moving cameras

The principle remains the same as with static cameras. You have to supply the camera transformation matrices to the tracker each time the cameras move, before calling the track method:

mapOfCamTrans["Camera1"] = vpHomogeneousMatrix(); //The Camera1 is the reference camera.
mapOfCamTrans["Camera2"] = get_c2Mc1(); //Get the new transformation between the two cameras.
tracker.setCameraTransformationMatrix(mapOfCamTrans);
tracker.track(mapOfImg);

This information can be obtained through the robot kinematics or from various kinds of sensors.
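
Putting it together, a tracking loop with a moving stereo rig could look like the sketch below, where get_c2Mc1() is a hypothetical user function returning the current transformation (for example computed from the robot kinematics) and the two-image overloads of track() and getPose() are the ones used in the tutorial code:

std::map<std::string, vpHomogeneousMatrix> mapOfCamTrans;
mapOfCamTrans["Camera1"] = vpHomogeneousMatrix(); // reference camera
while (!g_left.end() && !g_right.end()) {
g_left.acquire(I_left);
g_right.acquire(I_right);
// Update the camera-to-reference transformation before tracking
mapOfCamTrans["Camera2"] = get_c2Mc1();
tracker.setCameraTransformationMatrix(mapOfCamTrans);
tracker.track(I_left, I_right);
vpHomogeneousMatrix cLeftMo, cRightMo;
tracker.getPose(cLeftMo, cRightMo);
}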

The following video shows the stereo hybrid model-based tracking based on object edges and KLT features located on visible faces. The result of the tracking is then used to servo the Romeo humanoid robot eyes to gaze toward the object. The images were captured by cameras located in the Romeo eyes.

Next tutorial

You are now ready to see the next Tutorial: Markerless generic model-based tracking using a RGB-D camera.