This tutorial describes how to use the generic markerless model-based tracker [8], [49] implemented in vpMbGenericTracker with data acquired by an RGB-D camera like the Intel RealSense devices (SR300 or D400 series) or the Occipital Structure devices (Structure Core color or monochrome cameras). You are advised to have read Tutorial: Markerless generic model-based tracking using a color camera to get an overview of the generic model-based tracker working on images acquired by a color camera, using moving edges and keypoints as visual features. We also suggest following Tutorial: Markerless generic model-based tracking using a stereo camera to learn how to handle stereo cameras, since an RGB-D camera is in a sense a stereo camera, with the left camera providing the color image and the right one the depth map.
Considering the use case of an RGB-D camera, the tracker implemented in the vpMbGenericTracker class allows combining the following visual features:
moving edges: contour points tracked along the normals of the visible edges defined in the CAD model (line, face, cylinder and circle primitives) [8]
keypoints: they are tracked on the visible object faces using the KLT tracker (face and cylinder primitives) [43]
depth normal: the normal of a face computed from the depth map (face primitives) (since ViSP 3.1.0)
depth dense: the dense pointcloud obtained from the depth map (face primitives) (since ViSP 3.1.0) [49].
The moving-edges and KLT features require an RGB camera; more precisely, these features operate on a grayscale image. The depth normal and dense depth features require a depth map that can be obtained from an RGB-D camera, for example.
If you want to use a depth feature, we advise you to choose the dense depth feature, which is much more robust than the depth normal feature. Also keep in mind that Kinect-like sensors have a limited depth range (from ~0.8 m to ~3.5 m).
Note also that combining the visual features can be a good way to improve the tracking robustness (e.g. a stereo tracker with moving edges + keypoints for the left camera and dense depth visual features for the right camera).
The following video demonstrates the use of the generic tracker with a RGB-D sensor using the following features:
moving-edges + keypoints features for the color camera
depth normal features for the depth sensor
In this video, the same setup is used but with:
moving-edges features for the color camera
dense depth features for the depth sensor
Considered third-parties
We recall that this tutorial is designed to work with either Intel RealSense or Occipital RGB-D cameras. If you want to run the use cases below, depending on your RGB-D device you need to install the appropriate SDK as described in the following tutorials (see the "librealsense 3rd party" or "libStructure 3rd party" section respectively):
Moreover, depending on your use case, the following optional third-parties may be used by the tracker. Make sure ViSP was built with the appropriate 3rd parties:
OpenCV: Essential if you want to use keypoints as visual features, which are detected and tracked thanks to the KLT tracker. This 3rd party may also be useful to read input videos (mpeg, png, jpeg...).
Point Cloud Library: This 3rd party is optional but can be used if your RGB-D sensor provides the depth data as a point cloud. Note that the source code explained in this tutorial doesn't run if your version of ViSP was not built with PCL as a 3rd party. Note also that you don't need to install PCL if you don't want to use depth as a visual feature.
Ogre3D: This 3rd party is optional and can be difficult to install on OSX and Windows. We don't recommend installing it when starting with the tracker. Ogre3D allows enabling Advanced visibility via Ogre3D.
Coin3D: This 3rd party is also optional and difficult to install. That's why we don't recommend installing Coin3D when starting with the tracker. Coin3D only allows loading CAD models in vrml format instead of the home-made CAD models in cao format.
It is recommended to install an optimized BLAS library (for instance OpenBLAS) to get better performance with the dense depth feature approach, which requires large matrix operations. On Ubuntu Xenial, install the libopenblas-dev package. To select or switch the BLAS library in use, see Handle different versions of BLAS and LAPACK:
$ update-alternatives --config libblas.so.3
You should have something similar to this:
There are 3 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).

  Selection    Path                                        Priority   Status
------------------------------------------------------------
  0            /usr/lib/openblas-base/libblas.so.3          40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3       35        manual mode
  2            /usr/lib/libblas/libblas.so.3                10        manual mode
  3            /usr/lib/openblas-base/libblas.so.3          40        manual mode
Input data
For classical features working on a grayscale image, you have to use the following data type:
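The expected type is ViSP's grayscale image, vpImage<unsigned char>; a minimal declaration is sketched below:

#include <visp3/core/vpImage.h>

// Grayscale image fed to the moving-edges and KLT features
vpImage<unsigned char> I;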
For depth features, you need to supply a point cloud, that is a 2D array where each element contains the X, Y and Z coordinates in the depth sensor frame. The following data types are accepted:
a vector of vpColVector: the column vector being a 3x1 (X, Y, Z) vector
If you have only a depth map, i.e. a 2D array where each element (u, v) is the distance Z in meters between the depth sensor frame and the object, you will need to compute the 3D coordinates using the depth sensor intrinsic parameters. We recall that u relates to the array columns, while v relates to the array rows. For instance, without taking the distortion into account (see also vpPixelMeterConversion::convertPoint), the conversion is (u and v are the pixel coordinates, u0, v0, px, py are the intrinsic parameters): X = Z * (u - u0) / px, Y = Z * (v - v0) / py and Z is the depth value.
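Hereafter is a minimal sketch (not part of the tutorial sources) showing how such a point cloud could be built with ViSP; the vpImage<float> depth map and the depth_intrinsics camera parameters are illustrative assumptions:

#include <vector>

#include <visp3/core/vpCameraParameters.h>
#include <visp3/core/vpColVector.h>
#include <visp3/core/vpImage.h>
#include <visp3/core/vpPixelMeterConversion.h>

// Convert a depth map (Z in meters for each pixel) into the vector of
// 3D points (X, Y, Z) expected by the tracker, using the depth intrinsics.
std::vector<vpColVector> makePointcloud(const vpImage<float> &depth_map,
                                        const vpCameraParameters &depth_intrinsics)
{
  std::vector<vpColVector> pointcloud(depth_map.getSize(), vpColVector(3));
  for (unsigned int v = 0; v < depth_map.getHeight(); v++) {   // v: row index
    for (unsigned int u = 0; u < depth_map.getWidth(); u++) {  // u: column index
      double x = 0., y = 0.;
      double Z = depth_map[v][u]; // depth in meters
      // Computes x = (u - u0)/px and y = (v - v0)/py (distortion handled by the camera model)
      vpPixelMeterConversion::convertPoint(depth_intrinsics, u, v, x, y);
      vpColVector &pt = pointcloud[v * depth_map.getWidth() + u];
      pt[0] = x * Z; // X
      pt[1] = y * Z; // Y
      pt[2] = Z;     // Z
    }
  }
  return pointcloud;
}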
Each tracker is stored in a map, the key being the name of the camera whose data the tracker processes. By default, the camera names are set to:
"Camera" when the tracker is constructed with one camera.
"Camera1" to "CameraN" when the tracker is constructed with N cameras.
The default reference camera will be "Camera1" in the multiple cameras case.
Default name convention and reference camera ("Camera1").
To deal with multiple cameras, in the virtual visual servoing control law we concatenate all the interaction matrices and residual vectors and transform them into a single reference camera frame to compute the reference camera velocity. Thus, we have to know the transformation matrix between each camera and the reference camera.
For example, if the reference camera is "Camera1" (c1), we need to know the transformation matrices c2Mc1, c3Mc1, ..., cnMc1 between each of the other cameras and the reference camera.
Declare a model-based tracker of the desired feature type
The snippet below declares a tracker with edge + KLT features for the color camera and a dense depth feature for the depth sensor.
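A minimal sketch of this declaration, using the tracker type flags provided by vpMbGenericTracker (the first entry corresponds to the color camera, the second one to the depth sensor):

#include <vector>

#include <visp3/mbt/vpMbGenericTracker.h>

// Color camera: moving edges + KLT keypoints; depth sensor: dense depth
std::vector<int> trackerTypes;
trackerTypes.push_back(vpMbGenericTracker::EDGE_TRACKER | vpMbGenericTracker::KLT_TRACKER);
trackerTypes.push_back(vpMbGenericTracker::DEPTH_DENSE_TRACKER);

vpMbGenericTracker tracker(trackerTypes);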
Interfacing with the code
Each essential method used to initialize the tracker and to process the tracking has three signatures, in order to ease calling the method according to three working modes:
tracking using one camera, the signature remains the same.
tracking using two cameras, all the necessary methods accept directly the corresponding parameters for each camera.
tracking using multiple cameras, you have to supply the different parameters within a map. The key corresponds to the name of the camera and the value is the parameter.
The following table sums up how to call the main methods depending on the camera configuration.
Assuming the transformation matrices between the cameras have been set, some init files can be omitted as long as the reference camera has an init file. The corresponding images must be supplied.
Track the object:
one camera: tracker.track(I)
two cameras: tracker.track(I1, I2)
multiple cameras: tracker.track(mapOfImg)
See the documentation to track with point clouds.
Get the pose:
one camera: tracker.getPose(cMo)
two cameras: tracker.getPose(c1Mo, c2Mo)
multiple cameras: tracker.getPose(mapOfPoses)
tracker.getPose(cMo) will return the pose of the reference camera in the stereo/multiple cameras configurations.
Since the trackers are stored internally in alphabetical order of the camera names, in the stereo cameras case you have to match the method parameters with the corresponding tracker position in the map.
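For the RGB-D use case of this tutorial (one color camera and one depth sensor), the map-based calls could look like the following sketch, assuming ViSP was built with PCL, that the default camera names are used ("Camera1" for the color camera, "Camera2" for the depth sensor) and that the grayscale image, the point cloud and the tracker come from the previous steps:

// Keys must match the camera names used by the tracker
std::map<std::string, const vpImage<unsigned char> *> mapOfImages;
std::map<std::string, pcl::PointCloud<pcl::PointXYZ>::ConstPtr> mapOfPointclouds;
mapOfImages["Camera1"] = &I_gray;         // grayscale view of the color stream
mapOfPointclouds["Camera2"] = pointcloud; // point cloud built from the depth map

// Track using the color image and the depth point cloud
tracker.track(mapOfImages, mapOfPointclouds);

// Pose of the object expressed in the reference (color) camera frame
vpHomogeneousMatrix cMo;
tracker.getPose(cMo);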
Dataset use case
Cube tracking on a RGB-D dataset
We provide hereafter an example implemented in tutorial-mb-generic-tracker-rgbd.cpp that shows how to track a cube modeled in cao format using moving-edges, keypoints and dense depth as visual features. The input data (color image, depth map and point cloud) were acquired by a RealSense RGB-D camera.
Warning
ViSP must have been built with OpenCV and the KLT module must be enabled to run this tutorial. You also need ViSP built with the Point Cloud Library.
Once built, run the binary:
$ ./tutorial-mb-generic-tracker-rgbd
After initialization by 4 user clicks, the tracker is running and produces the following output:
The previous source code shows how to use the vpMbGenericTracker class using depth as a visual feature. The procedure to configure the tracker remains the same (a sketch of these steps is given after this list):
construct the tracker enabling moving-edges, keypoints and depth as visual features
configure the tracker by loading a configuration file for the color camera (cube.xml) and for the depth camera (cube_depth.xml). These files contain the camera intrinsic parameters associated with each sensor
load the transformation matrix (depth_M_color.txt) between the depth sensor and the color camera
load the cube.init file used to initialize the tracker by a user interaction that consists in clicking on 4 cube corners
load a 3D model for the color and for the depth camera. In our case they are the same (cube.cao)
start a while loop to process all the images from the sequence of data
track on the current images (color and depth)
get the current pose and display the model in the image
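These steps could be condensed as in the following sketch; the camera names ("Camera1" for the color camera, "Camera2" for the depth sensor) follow the default convention, the read_data() signature is simplified here for illustration, and display handling is omitted. Refer to tutorial-mb-generic-tracker-rgbd.cpp for the exact implementation.

#include <visp3/mbt/vpMbGenericTracker.h>

// Construct the tracker: edges + keypoints for the color camera, dense depth for the depth sensor
std::vector<int> trackerTypes;
trackerTypes.push_back(vpMbGenericTracker::EDGE_TRACKER | vpMbGenericTracker::KLT_TRACKER);
trackerTypes.push_back(vpMbGenericTracker::DEPTH_DENSE_TRACKER);
vpMbGenericTracker tracker(trackerTypes);

// Intrinsic parameters of the color and depth sensors
tracker.loadConfigFile("cube.xml", "cube_depth.xml");
// Same CAD model for the color and the depth camera
tracker.loadModel("cube.cao", "cube.cao");
// depth_M_color is set with setCameraTransformationMatrix(), as shown further below

vpImage<unsigned char> I_gray;                        // grayscale view of the color stream
pcl::PointCloud<pcl::PointXYZ>::ConstPtr pointcloud;  // point cloud built from the depth map
read_data(I_gray, pointcloud);                        // hypothetical helper reading the first frame

// Initialization by clicking on 4 cube corners in the color image;
// only the reference camera needs an init file
std::map<std::string, const vpImage<unsigned char> *> mapOfImages;
std::map<std::string, std::string> mapOfInitFiles;
mapOfImages["Camera1"] = &I_gray;
mapOfInitFiles["Camera1"] = "cube.init";
tracker.initClick(mapOfImages, mapOfInitFiles, true);

std::map<std::string, pcl::PointCloud<pcl::PointXYZ>::ConstPtr> mapOfPointclouds;
vpCameraParameters cam_color, cam_depth;
tracker.getCameraParameters(cam_color, cam_depth);

do {
  mapOfPointclouds["Camera2"] = pointcloud;
  tracker.track(mapOfImages, mapOfPointclouds);       // track on the current color image and point cloud

  vpHomogeneousMatrix cMo;
  tracker.getPose(cMo);                               // pose in the reference (color) camera frame
  tracker.display(I_gray, cMo, cam_color, vpColor::red);
} while (read_data(I_gray, pointcloud));              // next color image and point cloud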
Then we implement the function read_data() to read the color images color_image_%04d.jpg, the depth maps depth_image_%04d.bin and the point clouds point_cloud_%04d.bin.
We can also use the following setting that enables the display of the features used during the tracking:
tracker.setDisplayFeatures(true);
We then set the map of transformations between our two sensors, indicating that the color sensor frame is the reference frame. The depth_M_color matrix is read from the input file depth_M_color.txt.
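A sketch of what this could look like, assuming the default camera names are used ("Camera1" for the color reference camera, "Camera2" for the depth sensor):

// Read the homogeneous transformation between the depth sensor and the color camera
vpHomogeneousMatrix depth_M_color;
std::ifstream file("depth_M_color.txt");
depth_M_color.load(file);

// Declare the pose of the depth camera with respect to the reference (color) camera
std::map<std::string, vpHomogeneousMatrix> mapOfCameraTransformations;
mapOfCameraTransformations["Camera2"] = depth_M_color;
tracker.setCameraTransformationMatrix(mapOfCameraTransformations);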
This video shows a comparison between the Intel RealSense D435 and the Occipital Structure Core. The left column shows data acquired with the D435, while the right column shows data from the Structure Core color camera.
Intel RealSense camera use case
Hereafter we provide the information to test the tracker with different objects (cube, teabox, lego-square) using an Intel RealSense D435 or SR300 RGB-D camera. Here we suppose that the librealsense 3rd party is installed, and that ViSP is built with librealsense support as explained in the following tutorials:
Here we consider a 4.2 cm cube. If your cube size differs, modify the model/cube/cube.cao file replacing 0.042 with the appropriate value expressed in meters.
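For reference, here is an illustrative sketch of what a cube model in cao format can look like, with the origin placed at one corner and 0.042 m edges; it is not necessarily identical to the cube.cao file shipped with the tutorial (origin and point ordering may differ), so use it only as a guide to locate the values to edit:

V1
# 3D points
8
0     0     0        # point 0
0     0     0.042    # point 1
0.042 0     0.042    # point 2
0.042 0     0        # point 3
0     0.042 0        # point 4
0     0.042 0.042    # point 5
0.042 0.042 0.042    # point 6
0.042 0.042 0        # point 7
# 3D lines
0
# Faces from 3D lines
0
# Faces from 3D points: number of vertices followed by the point indices
# (vertices listed counterclockwise when seen from outside the object)
6
4 0 3 2 1
4 4 5 6 7
4 0 4 7 3
4 1 2 6 5
4 0 1 5 4
4 3 7 6 2
# 3D cylinders
0
# 3D circles
0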
To track a 4.2 cm cube with manual initialization, run:
Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option:
Here we consider a 16.5 x 6.8 x 8 cm box [length x width x height]. If your box size differs, modify the corresponding teabox CAD model file with the appropriate size values expressed in meters.
To track the 16.5 x 6.8 x 8 cm teabox with manual initialization, run:
Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option:
The procedure is similar for the lego-square object. Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option:
Occipital Structure Core camera use case
Hereafter we provide the information to test the tracker with different objects (cube, teabox, lego-square) using an Occipital Structure Core color or monochrome RGB-D camera. Here we suppose that the libStructure 3rd party is installed, and that ViSP is built with Occipital Structure support as explained in the following tutorials:
Here we consider a 4.2 cm cube. If your cube size differs, modify the model/cube/cube.cao file replacing 0.042 with the appropriate value expressed in meters.
To track a 4.2 cm cube with manual initialization, run:
Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option:
Here we consider a 16.5 x 6.8 x 8 cm box [length x width x height]. If your box size differs, modify the corresponding teabox CAD model file with the appropriate size values expressed in meters.
To track the 16.5 x 6.8 x 8 cm teabox with manual initialization, run:
Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option:
The procedure is similar for the lego-square object. Once initialized by 4 user clicks, use a left click to learn from one or two images and then a right click to quit. In the terminal you should see output like:
...
Data learned
Data learned
Save learning from 2 images in file: learning/data-learned.bin
You can now use this learning data to automate the tracker initialization with the --auto_init option: