Visual Servoing Platform  version 3.6.1 under development (2024-12-17)
Tutorial: Object detection and localization

Introduction

This tutorial will show you how to use keypoints to detect and estimate the pose of a known object using its CAD model. The first step consists in detecting and learning keypoints located on the faces of the object, while the second step matches the keypoints detected in a query image with those previously learned. The matched pairs are then used to estimate the pose of the object, thanks to the known correspondences between the 2D and 3D coordinates.

The next section presents a basic example of the detection of a teabox with a detailed description of the different steps.

Note that all the material (source code and images) described in this tutorial is part of the ViSP source code (in the tutorial/detection/object folder) and can be found in https://github.com/lagadic/visp/tree/master/tutorial/detection/object.

Object detection using keypoints

Preamble

If you are not familiar with these concepts, you are advised to read the following tutorials: Tutorial: Markerless generic model-based tracking using a color camera and Tutorial: Keypoints matching.

Principle of object detection using keypoints

A quick overview of the principle is summarized in the following diagrams.

Learning step.

The first part of the process consists in learning the characteristics of the considered object by extracting keypoints detected on its different faces. Here we use the model-based tracker, initialized from a known initial pose, to have access to the CAD model of the object. The CAD model is then used to keep only the keypoints located on visible faces and to compute the 3D coordinates of these keypoints.

Note
The computation of the 3D coordinates of a keypoint relies on a planar location hypothesis: we assume that the keypoint lies on a planar face, and its Z-coordinate is retrieved from the proportional relation between the plane equation expressed in the normalized camera frame (derived from the image coordinates) and the same plane equation expressed in the camera frame, thanks to the known pose of the object.
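
To make this concrete, here is a minimal sketch (not the actual vpKeyPoint implementation) of the Z recovery, assuming the plane equation A*X + B*Y + C*Z + D = 0 of the face is already expressed in the camera frame (for instance obtained with vpPlane::changeFrame() and the known pose cMo):

#include <opencv2/core/core.hpp>
#include <visp3/core/vpCameraParameters.h>
#include <visp3/core/vpPixelMeterConversion.h>

cv::Point3f compute3DOnPlane(const cv::KeyPoint &kp, const vpCameraParameters &cam,
                             double A, double B, double C, double D)
{
  // Normalized camera frame coordinates x = X/Z, y = Y/Z obtained from the pixel coordinates
  double x = 0., y = 0.;
  vpPixelMeterConversion::convertPoint(cam, kp.pt.x, kp.pt.y, x, y);
  // Plane constraint A*(x*Z) + B*(y*Z) + C*Z + D = 0 gives Z directly
  double Z = -D / (A * x + B * y + C);
  return cv::Point3f(static_cast<float>(x * Z), static_cast<float>(y * Z), static_cast<float>(Z));
}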

In this example the learned data (the list of 3D coordinates and the corresponding descriptors) are saved in a file and will be used later in the detection part.

Detection step.

In a query image where we want to detect the object, we match the keypoints detected in the current image with those previously learned. The pose of the object can then be estimated from the resulting 3D/2D correspondences.

The next section presents an example of the detection and the pose estimation of a teabox.

Teabox detection and pose estimation

The following video shows the resulting detection and localization of a teabox that is learned on the first image of the video.

The corresponding code is available in tutorial-detection-object-mbt.cpp. It contains the different steps to learn the teabox object on one image (the first image of the video) and then detect and get the pose of the teabox in the rest of the video.

#include <visp3/core/vpConfig.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpVideoReader.h>
#include <visp3/mbt/vpMbGenericTracker.h>
#include <visp3/vision/vpKeyPoint.h>
int main(int argc, char **argv)
{
#if defined(VISP_HAVE_OPENCV) && defined(HAVE_OPENCV_IMGPROC) && defined(HAVE_OPENCV_FEATURES2D)
#ifdef ENABLE_VISP_NAMESPACE
using namespace VISP_NAMESPACE_NAME;
#endif
try {
std::string videoname = "teabox.mp4";
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--name")
videoname = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
std::cout << "\nUsage: " << argv[0] << " [--name <video name>] [--help] [-h]\n" << std::endl;
return EXIT_SUCCESS;
}
}
std::string parentname = vpIoTools::getParent(videoname);
std::string objectname = vpIoTools::getNameWE(videoname);
if (!parentname.empty())
objectname = parentname + "/" + objectname;
std::cout << "Video name: " << videoname << std::endl;
std::cout << "Tracker requested config files: " << objectname << ".[init,"
<< "xml,"
<< "cao or wrl]" << std::endl;
std::cout << "Tracker optional config files: " << objectname << ".[ppm]" << std::endl;
vpImage<unsigned char> I;
vpHomogeneousMatrix cMo;
vpCameraParameters cam;
vpMbGenericTracker tracker; // model-based tracker giving access to the CAD model faces
tracker.setTrackerType(vpMbGenericTracker::EDGE_TRACKER);
vpVideoReader g;
g.setFileName(videoname);
g.open(I);
#if defined(VISP_HAVE_X11)
vpDisplayX display;
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI display;
#elif defined(HAVE_OPENCV_HIGHGUI)
vpDisplayOpenCV display;
#else
std::cout << "No image viewer is available..." << std::endl;
return EXIT_FAILURE;
#endif
display.init(I, 100, 100, "Model-based edge tracker");
bool usexml = false;
#if defined(VISP_HAVE_PUGIXML)
if (vpIoTools::checkFilename(objectname + ".xml")) {
tracker.loadConfigFile(objectname + ".xml");
tracker.getCameraParameters(cam);
usexml = true;
}
#endif
if (!usexml) {
vpMe me;
me.setMaskSize(5);
me.setMaskNumber(180);
me.setRange(8);
me.setThreshold(20);
me.setMu1(0.5);
me.setMu2(0.5);
tracker.setMovingEdge(me);
cam.initPersProjWithoutDistortion(839, 839, 325, 243);
tracker.setCameraParameters(cam);
tracker.setAngleAppear(vpMath::rad(70));
tracker.setAngleDisappear(vpMath::rad(80));
tracker.setNearClippingDistance(0.1);
tracker.setFarClippingDistance(100.0);
tracker.setClipping(tracker.getClipping() | vpMbtPolygon::FOV_CLIPPING);
}
tracker.setOgreVisibilityTest(false);
if (vpIoTools::checkFilename(objectname + ".cao"))
tracker.loadModel(objectname + ".cao");
else if (vpIoTools::checkFilename(objectname + ".wrl"))
tracker.loadModel(objectname + ".wrl");
tracker.setDisplayFeatures(true);
tracker.initClick(I, objectname + ".init", true);
tracker.track(I);
#if (defined(VISP_HAVE_OPENCV_NONFREE) || defined(VISP_HAVE_OPENCV_XFEATURES2D)) || \
(VISP_HAVE_OPENCV_VERSION >= 0x030411 && CV_MAJOR_VERSION < 4) || (VISP_HAVE_OPENCV_VERSION >= 0x040400)
std::string detectorName = "SIFT";
std::string extractorName = "SIFT";
std::string matcherName = "BruteForce";
std::string configurationFile = "detection-config-SIFT.xml";
#else
std::string detectorName = "FAST";
std::string extractorName = "ORB";
std::string matcherName = "BruteForce-Hamming";
std::string configurationFile = "detection-config.xml";
#endif
vpKeyPoint keypoint_learning;
if (usexml) {
keypoint_learning.loadConfigFile(configurationFile);
}
else {
keypoint_learning.setDetector(detectorName);
keypoint_learning.setExtractor(extractorName);
keypoint_learning.setMatcher(matcherName);
}
std::vector<cv::KeyPoint> trainKeyPoints;
double elapsedTime;
keypoint_learning.detect(I, trainKeyPoints, elapsedTime);
std::vector<vpPolygon> polygons;
std::vector<std::vector<vpPoint> > roisPt;
std::pair<std::vector<vpPolygon>, std::vector<std::vector<vpPoint> > > pair = tracker.getPolygonFaces(false);
polygons = pair.first;
roisPt = pair.second;
std::vector<cv::Point3f> points3f;
tracker.getPose(cMo);
vpKeyPoint::compute3DForPointsInPolygons(cMo, cam, trainKeyPoints, polygons, roisPt, points3f);
keypoint_learning.buildReference(I, trainKeyPoints, points3f);
keypoint_learning.saveLearningData("teabox_learning_data.bin", true);
for (std::vector<cv::KeyPoint>::const_iterator it = trainKeyPoints.begin(); it != trainKeyPoints.end(); ++it) {
vpDisplay::displayCross(I, (int)it->pt.y, (int)it->pt.x, 4, vpColor::red);
}
vpDisplay::displayText(I, 10, 10, "Learning step: keypoints are detected on visible teabox faces", vpColor::red);
vpDisplay::displayText(I, 30, 10, "Click to continue with detection...", vpColor::red);
vpKeyPoint keypoint_detection;
if (usexml) {
keypoint_detection.loadConfigFile(configurationFile);
}
else {
keypoint_detection.setDetector(detectorName);
keypoint_detection.setExtractor(extractorName);
keypoint_detection.setMatcher(matcherName);
keypoint_detection.setMatchingRatioThreshold(0.8);
keypoint_detection.setUseRansacVVS(true);
keypoint_detection.setUseRansacConsensusPercentage(true);
keypoint_detection.setRansacConsensusPercentage(20.0);
keypoint_detection.setRansacIteration(200);
keypoint_detection.setRansacThreshold(0.005);
}
keypoint_detection.loadLearningData("teabox_learning_data.bin", true);
double error;
bool click_done = false;
while (!g.end()) {
g.acquire(I);
vpDisplay::display(I);
vpDisplay::displayText(I, 10, 10, "Detection and localization in process...", vpColor::red);
if (keypoint_detection.matchPoint(I, cam, cMo, error, elapsedTime)) {
tracker.setPose(I, cMo);
tracker.display(I, cMo, cam, vpColor::red, 2);
vpDisplay::displayFrame(I, cMo, cam, 0.025, vpColor::none, 3);
}
vpDisplay::displayText(I, 30, 10, "A click to exit.", vpColor::red);
if (vpDisplay::getClick(I, false)) {
click_done = true;
break;
}
}
if (!click_done)
vpDisplay::getClick(I);
}
catch (const vpException &e) {
std::cout << "Catch an exception: " << e << std::endl;
}
#else
(void)argc;
(void)argv;
std::cout << "Install OpenCV and rebuild ViSP to use this example." << std::endl;
#endif
return EXIT_SUCCESS;
}

In the following lines you may recognize the code used in tutorial-mb-edge-tracker.cpp to initialize the model-based tracker at a given pose and with the appropriate configuration.

try {
std::string videoname = "teabox.mp4";
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--name")
videoname = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
std::cout << "\nUsage: " << argv[0] << " [--name <video name>] [--help] [-h]\n" << std::endl;
return EXIT_SUCCESS;
}
}
std::string parentname = vpIoTools::getParent(videoname);
std::string objectname = vpIoTools::getNameWE(videoname);
if (!parentname.empty())
objectname = parentname + "/" + objectname;
std::cout << "Video name: " << videoname << std::endl;
std::cout << "Tracker requested config files: " << objectname << ".[init,"
<< "xml,"
<< "cao or wrl]" << std::endl;
std::cout << "Tracker optional config files: " << objectname << ".[ppm]" << std::endl;
vpImage<unsigned char> I;
vpHomogeneousMatrix cMo;
vpCameraParameters cam;
vpMbGenericTracker tracker; // model-based tracker giving access to the CAD model faces
tracker.setTrackerType(vpMbGenericTracker::EDGE_TRACKER);
vpVideoReader g;
g.setFileName(videoname);
g.open(I);
#if defined(VISP_HAVE_X11)
vpDisplayX display;
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI display;
#elif defined(HAVE_OPENCV_HIGHGUI)
vpDisplayOpenCV display;
#else
std::cout << "No image viewer is available..." << std::endl;
return EXIT_FAILURE;
#endif
display.init(I, 100, 100, "Model-based edge tracker");
bool usexml = false;
#if defined(VISP_HAVE_PUGIXML)
if (vpIoTools::checkFilename(objectname + ".xml")) {
tracker.loadConfigFile(objectname + ".xml");
tracker.getCameraParameters(cam);
usexml = true;
}
#endif
if (!usexml) {
vpMe me;
me.setMaskSize(5);
me.setMaskNumber(180);
me.setRange(8);
me.setThreshold(20);
me.setMu1(0.5);
me.setMu2(0.5);
tracker.setMovingEdge(me);
cam.initPersProjWithoutDistortion(839, 839, 325, 243);
tracker.setCameraParameters(cam);
tracker.setAngleAppear(vpMath::rad(70));
tracker.setAngleDisappear(vpMath::rad(80));
tracker.setNearClippingDistance(0.1);
tracker.setFarClippingDistance(100.0);
tracker.setClipping(tracker.getClipping() | vpMbtPolygon::FOV_CLIPPING);
}
tracker.setOgreVisibilityTest(false);
if (vpIoTools::checkFilename(objectname + ".cao"))
tracker.loadModel(objectname + ".cao");
else if (vpIoTools::checkFilename(objectname + ".wrl"))
tracker.loadModel(objectname + ".wrl");
tracker.setDisplayFeatures(true);
tracker.initClick(I, objectname + ".init", true);
tracker.track(I);

The modifications made to that code start from here.

First, we have to choose which type of keypoints to use. SIFT keypoints are widely used in computer vision, but depending on your version of OpenCV and due to patent restrictions, some keypoint types may not be available. Here we will use SIFT if it is available, and otherwise a combination of the FAST keypoint detector and the ORB descriptor extractor.

#if (defined(VISP_HAVE_OPENCV_NONFREE) || defined(VISP_HAVE_OPENCV_XFEATURES2D)) || \
(VISP_HAVE_OPENCV_VERSION >= 0x030411 && CV_MAJOR_VERSION < 4) || (VISP_HAVE_OPENCV_VERSION >= 0x040400)
std::string detectorName = "SIFT";
std::string extractorName = "SIFT";
std::string matcherName = "BruteForce";
std::string configurationFile = "detection-config-SIFT.xml";
#else
std::string detectorName = "FAST";
std::string extractorName = "ORB";
std::string matcherName = "BruteForce-Hamming";
std::string configurationFile = "detection-config.xml";
#endif

The following line declares an instance of the vpKeyPoint class:

vpKeyPoint keypoint_learning;

You can load the configuration (type of detector, extractor, matcher, RANSAC pose estimation parameters) directly from an XML configuration file:

keypoint_learning.loadConfigFile(configurationFile);

Otherwise, the configuration must be done directly in the code:

keypoint_learning.setDetector(detectorName);
keypoint_learning.setExtractor(extractorName);
keypoint_learning.setMatcher(matcherName);

We then detect keypoints in the reference image containing the object we want to learn:

std::vector<cv::KeyPoint> trainKeyPoints;
double elapsedTime;
keypoint_learning.detect(I, trainKeyPoints, elapsedTime);

But we need to keep only the keypoints that lie on the faces of the teabox. This is done by using the model-based tracker, first to eliminate the keypoints that do not belong to the teabox, and second to get the plane equation of each face (and thus to be able to compute the 3D coordinates from the 2D information).

std::vector<vpPolygon> polygons;
std::vector<std::vector<vpPoint> > roisPt;
std::pair<std::vector<vpPolygon>, std::vector<std::vector<vpPoint> > > pair = tracker.getPolygonFaces(false);
polygons = pair.first;
roisPt = pair.second;
std::vector<cv::Point3f> points3f;
tracker.getPose(cMo);
vpKeyPoint::compute3DForPointsInPolygons(cMo, cam, trainKeyPoints, polygons, roisPt, points3f);

The next step is the building of the reference keypoints. The descriptor of each keypoint is also extracted, and the reference data consist of the list of keypoints, the list of descriptors and the list of 3D points.

keypoint_learning.buildReference(I, trainKeyPoints, points3f);

We save the learning data in a binary format (the other possibility is an XML format, which takes more space on disk) to be able to use it later.

keypoint_learning.saveLearningData("teabox_learning_data.bin", true);

We then visualize the result of the learning process by displaying a cross at the location of each keypoint:

for (std::vector<cv::KeyPoint>::const_iterator it = trainKeyPoints.begin(); it != trainKeyPoints.end(); ++it) {
vpDisplay::displayCross(I, (int)it->pt.y, (int)it->pt.x, 4, vpColor::red);
}
vpDisplay::displayText(I, 10, 10, "Learning step: keypoints are detected on visible teabox faces", vpColor::red);
vpDisplay::displayText(I, 30, 10, "Click to continue with detection...", vpColor::red);

We now declare another instance of the vpKeyPoint class, dedicated this time to the detection of the teabox. The configuration is loaded directly from an XML file when available, otherwise it is done directly in the code.

vpKeyPoint keypoint_detection;
if (usexml) {
keypoint_detection.loadConfigFile(configurationFile);
}
else {
keypoint_detection.setDetector(detectorName);
keypoint_detection.setExtractor(extractorName);
keypoint_detection.setMatcher(matcherName);
keypoint_detection.setMatchingRatioThreshold(0.8);
keypoint_detection.setUseRansacVVS(true);
keypoint_detection.setUseRansacConsensusPercentage(true);
keypoint_detection.setRansacConsensusPercentage(20.0);
keypoint_detection.setRansacIteration(200);
keypoint_detection.setRansacThreshold(0.005);
}

The previously saved binary file corresponding to the teabox learning data is loaded:

keypoint_detection.loadLearningData("teabox_learning_data.bin", true);

We are now ready to detect the teabox in a query image. The call to vpKeyPoint::matchPoint() returns true if the matching was successful and gives access to the estimated homogeneous matrix corresponding to the pose of the object. The reprojection error is also computed.

if (keypoint_detection.matchPoint(I, cam, cMo, error, elapsedTime)) {

In order to display the result, we use the tracker initialized at the estimated pose and we also display the location of the world frame:

tracker.setPose(I, cMo);
tracker.display(I, cMo, cam, vpColor::red, 2);
vpDisplay::displayFrame(I, cMo, cam, 0.025, vpColor::none, 3);

The pose of the detected object can then be used to initialize a tracker automatically, rather than relying on a human initialization; see Tutorial: Markerless generic model-based tracking using a color camera and Tutorial: Template tracking.
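
As a sketch of such an automatic initialization (reusing the variables of the example above), vpMbGenericTracker::initFromPose() can replace the manual initClick() call once the detection has succeeded:

if (keypoint_detection.matchPoint(I, cam, cMo, error, elapsedTime)) {
  tracker.initFromPose(I, cMo); // seed the tracker with the detected pose, no user click needed
  tracker.track(I);
}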

Quick explanation about some parameters used in the example

The content of the configuration file named detection-config-SIFT.xml and provided with this example is described in the following lines:

<?xml version="1.0"?>
<conf>
<detector>
<name>SIFT</name>
</detector>
<extractor>
<name>SIFT</name>
</extractor>
<matcher>
<name>BruteForce</name>
<matching_method>ratioDistanceThreshold</matching_method>
<matchingRatioThreshold>0.8</matchingRatioThreshold>
</matcher>
<ransac>
<useRansacVVS>1</useRansacVVS>
<useRansacConsensusPercentage>1</useRansacConsensusPercentage>
<ransacConsensusPercentage>20.0</ransacConsensusPercentage>
<nbRansacIterations>200</nbRansacIterations>
<ransacThreshold>0.005</ransacThreshold>
</ransac>
</conf>

In this configuration file, SIFT keypoints are used.

Let us now explain the configuration of the matcher:

  • a brute force matching explores all the possible solutions to match a keypoint detected in the current image to the closest one (in terms of descriptor distance) in the reference set, contrary to the matching based on the FLANN library (Fast Library for Approximate Nearest Neighbors), which contains optimizations to reduce the complexity of the search,
  • to eliminate some possible false matches, one technique consists in keeping only the keypoints that are sufficiently discriminated, using a ratio test.

Now, for the RANSAC pose estimation part:

  • two methods are provided to estimate the pose in a robust way: one uses OpenCV, the other relies on a virtual visual servoing approach using ViSP,
  • basically, a RANSAC method repeats two steps for a number of iterations: first, 4 points are picked at random and a pose is estimated; second, all the points that sufficiently "agree" with this pose (their reprojection error is below a threshold) are kept. These points are the inliers and form the consensus set, the others are outliers. If enough points belong to the consensus set (here 20 % of all the points), the pose is refined and returned, otherwise another iteration is made (here 200 iterations at most). The corresponding settings are also shown programmatically in the sketch after this list.
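
For reference, here is a sketch of how the same matcher and RANSAC behavior could be set programmatically instead of through the XML file, where vpKeyPoint::setFilterMatchingType() selects the ratio test:

keypoint_detection.setMatcher("BruteForce");                                  // exhaustive descriptor matching
keypoint_detection.setFilterMatchingType(vpKeyPoint::ratioDistanceThreshold); // keep only discriminative matches
keypoint_detection.setMatchingRatioThreshold(0.8);
keypoint_detection.setUseRansacVVS(true);                 // pose estimated by virtual visual servoing (ViSP)
keypoint_detection.setUseRansacConsensusPercentage(true);
keypoint_detection.setRansacConsensusPercentage(20.0);    // at least 20 % of the points must be inliers
keypoint_detection.setRansacIteration(200);               // at most 200 iterations
keypoint_detection.setRansacThreshold(0.005);             // reprojection error threshold for inliers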

Below you will also find the content of the detection-config.xml configuration file, also provided with this example. It allows using the FAST detector together with the ORB extractor.

<?xml version="1.0"?>
<conf>
<detector>
<name>FAST</name>
</detector>
<extractor>
<name>ORB</name>
</extractor>
<matcher>
<name>BruteForce-Hamming</name>
<matching_method>ratioDistanceThreshold</matching_method>
<matchingRatioThreshold>0.8</matchingRatioThreshold>
</matcher>
<ransac>
<useRansacVVS>1</useRansacVVS>
<useRansacConsensusPercentage>1</useRansacConsensusPercentage>
<ransacConsensusPercentage>20.0</ransacConsensusPercentage>
<nbRansacIterations>200</nbRansacIterations>
<ransacThreshold>0.005</ransacThreshold>
</ransac>
</conf>

Additional functionalities

How to learn keypoints from multiple images

The following video shows an extension of the previous example where we learn a cube from 3 images and then detect and localize the cube in all the images of the video.

The corresponding source code is given in tutorial-detection-object-mbt2.cpp. If you have a look at this file, you will find the following.

Before starting with the keypoint detection and learning part, we have to set the correct pose for the tracker using a predefined pose:

tracker.setPose(I, initPoseTab[i]);
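
For illustration, initPoseTab could be a vector of poses, one per training image, built from translation (in meters) and theta-u rotation (in radians) values; the values below are purely hypothetical:

std::vector<vpHomogeneousMatrix> initPoseTab;
initPoseTab.push_back(vpHomogeneousMatrix(vpTranslationVector(0, 0, 0.5), vpThetaUVector(0, 0, 0)));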

One good thing to do is to refine the pose by running one iteration of the model-based tracker:

tracker.track(I);

The vpKeyPoint::buildReference() method allows appending the currently detected keypoints to those already present, by setting its append parameter to true.

But before that, the same learning procedure must be applied to each image in order to train on multiple images. We detect keypoints on the desired image:

std::vector<cv::KeyPoint> trainKeyPoints;
double elapsedTime;
keypoint_learning.detect(I, trainKeyPoints, elapsedTime);

Then, we keep only keypoints that are located on the object faces:

std::vector<vpPolygon> polygons;
std::vector<std::vector<vpPoint> > roisPt;
std::pair<std::vector<vpPolygon>, std::vector<std::vector<vpPoint> > > pair = tracker.getPolygonFaces();
polygons = pair.first;
roisPt = pair.second;
std::vector<cv::Point3f> points3f;
tracker.getPose(cMo);
tracker.getCameraParameters(cam);
vpKeyPoint::compute3DForPointsInPolygons(cMo, cam, trainKeyPoints, polygons, roisPt, points3f);

And finally, we build the reference keypoints and we set the flag append to true to say that we want to keep the previously learned keypoints:

keypoint_learning.buildReference(I, trainKeyPoints, points3f, true, id);

How to display the matching when the learning is done on multiple images

In this section we will explain how to display the matching between the keypoints detected in the current image and their correspondences in the reference images used during the learning stage, as shown in the next video:

Warning
If you want to load the learning data from a file, you have to use a learning file that contains the training images (the parameter saveTrainingImages of vpKeyPoint::saveLearningData() must be set to true when saving the file, which is the default).
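
In other words, when saving the learning data, the training images have to be embedded in the file, which corresponds to the default value of the third parameter:

keypoint_learning.saveLearningData("teabox_learning_data.bin", true, true); // last argument: saveTrainingImages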

Before showing how to display the matching for all the training images, we have to attribute a unique identifier (a positive integer) to the set of keypoints learned on a particular image during the training process:

keypoint_learning.buildReference(I, trainKeyPoints, points3f, true, id);

This makes it possible to link the training keypoints to the corresponding training image.
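
A minimal sketch of the idea, with hypothetical loop bounds and the keypoint detection and 3D computation elided:

for (int id = 1; id <= 3; id++) {
  // ... detect trainKeyPoints and compute points3f on training image number id ...
  keypoint_learning.buildReference(I, trainKeyPoints, points3f, true, id);
}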

After that, the first thing to do is to create the image that will contain the keypoint matches:

keypoint_detection.createImageMatching(I, IMatching);

The previous line allocates an image with the correct size according to the number of training images used.

Then, for each new image, we have to update the matching image with the current image:

keypoint_detection.insertImageMatching(I, IMatching);
Note
The current image will be inserted preferentially at the center of the matching image, if possible.

And to display the matching we use:

keypoint_detection.displayMatching(I, IMatching);
void displayMatching(const vpImage< unsigned char > &IRef, vpImage< unsigned char > &IMatching, unsigned int crossSize, unsigned int lineThickness=1, const vpColor &color=vpColor::green)

We can also display the RANSAC inliers / outliers in the current image and in the matching image:

for (std::vector<vpImagePoint>::const_iterator it = ransacInliers.begin(); it != ransacInliers.end(); ++it) {
vpDisplay::displayCircle(I, *it, 4, vpColor::green);
vpImagePoint imPt(*it);
imPt.set_u(imPt.get_u() + I.getWidth());
imPt.set_v(imPt.get_v() + I.getHeight());
vpDisplay::displayCircle(IMatching, imPt, 4, vpColor::green);
}
for (std::vector<vpImagePoint>::const_iterator it = ransacOutliers.begin(); it != ransacOutliers.end(); ++it) {
vpDisplay::displayCircle(I, *it, 4, vpColor::red);
vpImagePoint imPt(*it);
imPt.set_u(imPt.get_u() + I.getWidth());
imPt.set_v(imPt.get_v() + I.getHeight());
vpDisplay::displayCircle(IMatching, imPt, 4, vpColor::red);
}

The following code shows how to retrieve the RANSAC inliers and outliers:

std::vector<vpImagePoint> ransacInliers = keypoint_detection.getRansacInliers();
std::vector<vpImagePoint> ransacOutliers = keypoint_detection.getRansacOutliers();

Finally, we can also display the model in the matching image. For that, we have to modify the principal point offset of the intrinsic parameters. This is more or less a hack, as you have to manually change the principal point coordinates to make it work.

vpCameraParameters cam2;
cam2.initPersProjWithoutDistortion(cam.get_px(), cam.get_py(), cam.get_u0() + I.getWidth(),
cam.get_v0() + I.getHeight());
tracker.setCameraParameters(cam2);
tracker.setPose(IMatching, cMo);
tracker.display(IMatching, cMo, cam2, vpColor::red, 2);
vpDisplay::displayFrame(IMatching, cMo, cam2, 0.05, vpColor::none, 3);
Note
You can refer to the full code in the section How to learn keypoints from multiple images to have an example of how to learn from multiple images and how to display all the matches.