Introduction

In this tutorial, we will explore how to use MegaPose [25], a deep learning method for 6D object pose estimation. To learn more about MegaPose, see https://megapose6d.github.io/.

Given:

An RGB or RGB-D image for which the intrinsics of the camera are known
A coarse detection of the image region in which lies the object
A 3D model of the object

MegaPose can estimate the pose of the object relative to the camera frame $^{c}\mathbf{T}_{o}$.
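
For readers less familiar with the notation, the block below spells out what $^{c}\mathbf{T}_{o}$ usually denotes. It follows the standard homogeneous transformation convention and is not specific to MegaPose; the symbols $^{c}\mathbf{R}_{o}$, $^{c}\mathbf{t}_{o}$, $^{o}\mathbf{X}$ and $^{c}\mathbf{X}$ are introduced here for illustration only.

$$
{}^{c}\mathbf{T}_{o} =
\begin{bmatrix}
{}^{c}\mathbf{R}_{o} & {}^{c}\mathbf{t}_{o} \\
\mathbf{0}_{1 \times 3} & 1
\end{bmatrix},
\qquad
{}^{c}\mathbf{X} = {}^{c}\mathbf{T}_{o} \, {}^{o}\mathbf{X}
$$

Here $^{c}\mathbf{R}_{o}$ is a 3x3 rotation matrix (3 degrees of freedom) and $^{c}\mathbf{t}_{o}$ a translation vector (3 degrees of freedom), hence the term "6D pose". A point $^{o}\mathbf{X}$ expressed in homogeneous coordinates in the object frame is mapped to $^{c}\mathbf{X}$ in the camera frame.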

The method has several advantages:

Robust estimation in the presence of occlusions and lighting artifacts
Can work with a coarse model of the object
Does not require retraining for novel objects

It also has several drawbacks:

Running MegaPose requires a GPU. However, the integration in ViSP is based on a client-server model: MegaPose can thus run on a remote machine and its results can be retrieved on the local host (e.g., a computer with only a CPU, connected to a robot)
It may be too slow for your requirements
- With the default parameters, on a 640 x 480 image, initial pose estimation takes around 2 seconds on an Nvidia Quadro RTX 6000
- On the same setup, a pose update (refinement) iteration takes around 60-70 milliseconds
To perform the initial pose estimation, MegaPose requires an estimate of the image region containing the object (i.e., a bounding box detection). You may thus require a way to detect the object, such as an object detection neural network (available in ViSP with the class vpDetectorDNNOpenCV, see Tutorial: Deep learning object detection). For initial tests, the bounding box can also be provided by the user with mouse clicks.

To see some results, scroll to the end of this tutorial.

For the 3D model and detection inputs required by MegaPose, we provide tutorials to help you get set up. See Tutorial: Exporting a 3D model to MegaPose after reconstruction with NeRF for the 3D model creation and Tutorial: Generating synthetic data for deep learning with Blenderproc to train a detection network. With these tutorials and the tools presented therein, the work required to use MegaPose can be almost fully automated, as summed up in the figure below:

The MegaPose integration in ViSP is based on a client-server model:

The client, that uses either vpMegaPose or vpMegaPoseTracker, is C++-based. It sends pose estimation requests to the server.
The server is written in Python. It wraps around the MegaPose model. Each time a pose estimation is requested, the server reshapes the data and forwards it to MegaPose. It then sends back the information to the client.

Note: The computer running the server needs a GPU. The client can run on the same computer as the server. It can also run on another computer without a GPU. To obtain a decent tracking speed, it is recommended to have both machines on the same network.

This tutorial will explain how to install and run MegaPose and then demonstrate its usage with a simple object tracking application.

Installation

Installing the client

The MegaPose client, written in C++, is included directly in ViSP. It can be installed on any computer, even without a GPU. To be installed and compiled, it requires:

That ViSP be compiled with the JSON third-party library, as JSON is used to pass messages. To install the 3rd party, see JSON for modern C++ installation procedure for your system. Don't forget to build ViSP again after installing JSON third-party.
Once done, ViSP should be compiled with the visp_dnn_tracker module. When generating build files with CMake, it will be built by default if the JSON third-party is detected on your system
- To check that it is installed, you can check the ViSP-third-party.txt file that is generated by CMake:
  $ cd $VISP_WS/visp-build
  
  $ grep "To be built" ViSP-third-party.txt
  
  To be built: core dnn_tracker gui imgproc io java_bindings_generator klt me sensor ar blob robot visual_features vs vision detection mbt tt tt_mi
  
  If "dnn_tracker" is in the list, then the client can be compiled and used.
- Otherwise it means that ViSP is not built with JSON third-party:
  $ cd $VISP_WS/visp-build
  
  $ grep "json" ViSP-third-party.txt
  
  Use json (nlohmann): no
  
  As explained previously, see JSON for modern C++ installation procedure and rebuild ViSP.

Installing the server

Warning: The MegaPose server cannot be installed and used directly on Windows. A workaround is to install it in a WSL container, which works as a Linux (Ubuntu) distribution. The client still works on Windows, and WSL allows for port forwarding, making its usage seamless from the perspective of the client.

The MegaPose server should be installed on a computer equipped with a GPU. Installing it requires two dependencies:

Conda: MegaPose will be installed in a new virtual environment in order to avoid potential conflicts with python and other packages you have already installed
- To install conda on your system, we recommend miniconda, a minimal version of conda. To install, see the miniconda documentation
- Once installed, make sure that conda is in your environment path variable. The conda installation procedure should do this by default.
- To check, simply enter conda --version in your terminal.
- You should obtain an output similar to:
  $ conda --version
  
  conda 23.3.1
Git is also required in order to fetch the MegaPose sources. If you built ViSP from sources, then it should already be installed.

The server sources are located in the $VISP_WS/visp/script/megapose_server folder of your ViSP source directory.

In this folder, you can find multiple files:

run.py: the code for the server
install.py: the installation script
megapose_variables.json: configuration variables, used in the installation process.

To start the installation process, you should first set the variables in megapose_variables.json file:

environment: name of the conda environment that will be created. By default, the environment name is set to "megapose". The MegaPose server will be installed in this environment and it should thus be activated before trying to start the server. For example, if you set this variable to "visp_megapose_server", then you can activate it with:
$ conda activate visp_megapose_server
megapose_dir: the folder where MegaPose will be installed. By default, the installation folder is set to "./megapose6d"
megapose_data_dir: the folder where the MegaPose deep learning models will be downloaded. By default, the data will be downloaded in "megapose" folder.

Once you have configured these variables, run the installation script with:

$ cd $VISP_WS/visp/script/megapose_server

$ python install.py

The script may run for a few minutes, as it downloads all the dependencies as well as the deep learning models that MegaPose requires.

Once the script has finished, you can check the installation status with the following commands where <name_of_your_environment> could be replaced by megapose if you didn't change the content of megapose_variables.json file:

$ conda activate <name_of_your_environment>

$ python -m megapose_server.run -h

The -h argument should print some documentation on the arguments that can be passed to the server.

With MegaPose installed, you are now ready to run a basic, single object tracking example.

Single object tracking with MegaPose

In this tutorial, we will track an object from a live camera feed. For MegaPose to work, we will need:

The 3D model of the object
A way to detect the object in the image
A machine with a GPU, that hosts the server. If your machine has a GPU, then you can run the server and this client in parallel.

To get you started, we provide the full data to run tracking on a short video. To go further, you should check Adapting this tutorial for your use case that will explain what you need to use your own objects and camera.

Starting the server

To use MegaPose, we first need to start the inference server. As we have installed the server in Installing the server, we can now use it from anywhere. First, activate your conda environment:

$ conda activate megapose

where megapose is the name of the conda environment that you have defined in megapose_variables.json file when installing the server.

We can now start the server, and examine its arguments with:

(megapose) $ python -m megapose_server.run -h
...
usage: run.py [-h] [--host HOST] [--port PORT]
              [--model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}]
              [--meshes-directory MESHES_DIRECTORY] [--optimize]
              [--num_workers NUM_WORKERS]
 
optional arguments:
  -h, --help            show this help message and exit
  --host HOST           IP or hostname to bind the server to. Set to 0.0.0.0 if
                        you wish to listen for incoming connections from any
                        source (dangerous)
  --port PORT           The port on which to listen for new connections
  --model {RGB,RGBD,RGB-multi-hypothesis,RGBD-multi-hypothesis}
                        Which MegaPose model to use. Some models require the depth
                        map. Some models generate multiple hypotheses when
                        estimating the pose, at the cost of more computation.
                        Options: RGB, RGBD, RGB-multi-hypothesis, RGBD-multi-
                        hypothesis
  --meshes-directory MESHES_DIRECTORY
                        Directory containing the 3D models. each 3D model must be
                        in its own subfolder
  --optimize            Experimental: Optimize network for inference speed.
                        This may incur a loss of accuracy.
  --num-workers NUM_WORKERS
                        Number of workers for rendering

From the multiple arguments described, the required ones are:

--host: the IP address on which the server will listen. If you plan to run the tracking example and the MegaPose server on the same machine, use 127.0.0.1. If running on separate machines, you can find out the IP address of the server with:
- On Linux (with the net-tools package)
  $ ifconfig
  
  and look for the inet field of the network interface that can be reached by the client.
- On Windows
  C:> ipconfig /all
--port: The port on which the server will listen for incoming connections. This port should not already be in use by another program
--model: The model that is used to estimate the pose. The available options are:
- RGB: This model expects an RGB image as an input. From the coarse model estimates, the best pose hypothesis is given to the refiner, which performs 5 iterations by default.
- RGBD: Same as above, except that an RGBD image is expected in input. Using RGBD is not recommended for tracking applications, as the model is sensitive to depth noise.
- RGB-multi-hypothesis: Same as RGB, except that the coarse model selects the top-K hypotheses (Here, K = 5) which are all forwarded to the refiner model. This model will take far more time, and is thus not recommended for tracking, but may be useful for single shot pose estimation if you have no speed requirements.
- RGBD-multi-hypothesis: Similar to RGB-multi-hypothesis, except that ICP is applied after the refiner model has run on the RGB images. This model thus requires an RGBD image.
--meshes-directory: The directory containing the 3D models. The supported formats are .obj, .gltf and .glb. If your model is in another format, e.g., .stl, it can be converted with Blender. The directory containing the models should be structured as follows:
  models
  |--cube
  |  |--cube.obj
  |  |--cube.mtl
  |  |--texture.jpg
  |--my_obj
     |--object.glb

In the example above, if we start the server with the meshes-directory set to models, two objects should be recognized: cube and my_obj. The name of an object is dictated by its folder name.

To run the basic version of the tutorial below, we provide the model of the cube that is to be tracked in the video. The 3D models directory is data/models, located in the tutorial folder. To start the server, you should enter in your terminal:

(megapose) $ cd $VISP_WS/visp-build/tutorial/tracking/dnn

(megapose) $ python -m megapose_server.run --host 127.0.0.1 --port 5555 --model RGB --meshes-directory data/models

Note that this assumes that your current directory is the tutorial folder.

Warning: If you are running on Windows through WSL, you may encounter an error mentioning that a CUDA/CUDNN-related .so file is not found. To resolve this issue, enter
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

before starting the server.

Your server should now be started and waiting for incoming connections. You can now launch the tracking tutorial.

Running the tracking example

Let us now run the tracker on a video, with the default provided cube model. The video can be found in the data folder of the tutorial and the source code in tutorial-megapose-live-single-object-tracking.cpp located in $VISP_WS/visp/tutorial/tracking/dnn.

The program accepts many arguments, defined here through a vpJsonArgumentParser:

  vpJsonArgumentParser parser("Single object tracking with Megapose", "--config", "/");
  parser.addArgument("width", width, true, "The image width")
    .addArgument("height", height, true, "The image height")
    .addArgument("camera", cam, true, "The camera intrinsic parameters. Should correspond to a perspective projection model without distortion.")
    .addArgument("video-device", videoDevice, true, "Video device")
    .addArgument("object", objectName, true, "Name of the object to track with megapose.")
    .addArgument("detectionMethod", detectionMethod, true, "How to perform detection of the object to get the bounding box:"
      " \"click\" for user labelling, \"dnn\" for dnn detection.")
    .addArgument("reinitThreshold", reinitThreshold, false, "If the Megapose score falls below this threshold, then a reinitialization is required."
      " Should be between 0 and 1")
    .addArgument("megapose/address", megaposeAddress, true, "IP address of the Megapose server.")
    .addArgument("megapose/port", megaposePort, true, "Port on which the Megapose server listens for connections.")
    .addArgument("megapose/refinerIterations", refinerIterations, false, "Number of Megapose refiner model iterations."
      "A higher count may lead to better accuracy, at the cost of more processing time")
    .addArgument("megapose/initialisationNumSamples", coarseNumSamples, false, "Number of Megapose renderings used for the initial pose estimation.")
 
    .addArgument("detector/model-path", detectorModelPath, true, "Path to the model")
    .addArgument("detector/config", detectorConfig, true, "Path to the model configuration. Set to none if config is not required.")
    .addArgument("detector/framework", detectorFramework, true, "Detector framework")
    .addArgument("detector/type", detectorTypeString, true, "Detector type")
    .addArgument("detector/labels", labels, true, "Detection class labels")
    .addArgument("detector/mean/red", detectorMeanR, false, "Detector mean red component. Used to normalize image")
    .addArgument("detector/mean/green", detectorMeanG, false, "Detector mean green component. Used to normalize image")
    .addArgument("detector/mean/blue", detectorMeanB, false, "Detector mean blue component. Used to normalize image")
    .addArgument("detector/confidenceThreshold", detectorConfidenceThreshold, false, "Detector confidence threshold. "
      "A detection with a confidence below this threshold is ignored")
    .addArgument("detector/nmsThreshold", detectorNmsThreshold, false, "Detector non maximal suppression threshold.")
    .addArgument("detector/filterThreshold", detectorFilterThreshold, false)
    .addArgument("detector/scaleFactor", detectorScaleFactor, false, "Pixel intensity rescaling factor. If set to 1/255, then pixel values are between 0 and 1.")
    .addArgument("detector/swapRedAndBlue", detectorSwapRB, false, "Whether to swap red and blue channels before feeding the image to the detector.");
 
  parser.parse(argc, argv);

Since there are many arguments, we provide a default configuration to run on the video of the cube. This configuration is found in the file $VISP_WS/visp/tutorial/tracking/dnn/data/megapose_cube.json:

{
  "height": 480,
  "width": 640,
  "video-device": "0",
  "object": "cube",
  "detectionMethod": "click",
  "reinitThreshold": 0.5,
  "camera": {
    "model": "perspectiveWithoutDistortion",
    "px": 605.146728515625,
    "py": 604.79150390625,
    "u0": 325.53253173828125,
    "v0": 244.95083618164063
  },
  "detector": {
    "model-path": "/path/to/cube/cube_detector.onnx",
    "config": "none",
    "type": "yolov7",
    "framework": "onnx",
    "labels": [
      "cube"
    ],
    "mean":{
      "red": 0,
      "blue": 0,
      "green": 0
    },
    "confidenceThreshold": 0.65,
    "nmsThreshold": 0.5,
    "filterThreshold": -0.25,
    "scaleFactor": 0.003921569,
    "swapRedAndBlue": false
  },
 
  "megapose": {
    "address": "127.0.0.1",
    "port": 5555,
    "refinerIterations": 1,
    "initialisationNumSamples": 576
  },
  "megaposeObjectToDetectorClass": {
    "cube": "cube"
  }
}

Among the arguments, the most interesting ones are:

width, height: the dimensions of the image.
video-device: The source of the images. Input 0,1,2,... etc for a realtime camera feed, or the name of a video file.
camera: The intrinsics of the camera. Here, the video is captured on an Intel Realsense D435, and the intrinsics are obtained from the realsense SDK. The video is captured by using the tutorial Frame grabbing using Realsense SDK.
reinitThreshold: a threshold between 0 and 1. If the MegaPose score falls below this threshold, tracking should be reinitialised (which requires a 2D bounding box).
detectionMethod: How to acquire a bounding box of the object in the image.
object: name of the object to track. Should match an object that is in the mesh directory of the MegaPose server.
megapose/address: The IP of the MegaPose server.
megapose/refinerIterations: Number of iterations performed by the refiner model. This impacts both (re)initialization and tracking. Values above 1 may be too slow for tracking.
megapose/initialisationNumSamples: Number of renders (random poses) used for the initialisation.

For the parameters of the detector (used if detectionMethod == dnn), see Tutorial: Deep learning object detection. Here, the parameters correspond to a YoloV7-tiny, trained only to detect the cube. Note that to train this detector, we acquired ~400 images with Frame grabbing using Realsense SDK, then annotated them with labelImg. A more recent alternative seems to be LabelStudio. The detector should be trained (and exported) with images of the same size as provided to MegaPose.

To launch the tracking program, enter:

$ cd $VISP_WS/visp-build/tutorial/tracking/dnn

$ ./tutorial-megapose-live-single-object-tracking --config data/megapose_cube.json megapose/address 127.0.0.1 megapose/port 5555 video-device data/cube_video.mp4

If the MegaPose server is running on another machine or uses another port, replace the arguments with your values.
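
The slash-separated names used on the command line (e.g. megapose/port) mirror the nesting of the JSON file: our reading is that the "/" passed as third argument to the vpJsonArgumentParser constructor is the separator between nesting levels. The sketch below illustrates that mapping with the standard library only. The function splitArgumentName is hypothetical and does not reproduce the parser's actual implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a nested argument name such as
// "detector/mean/red" into the successive JSON keys "detector", "mean", "red".
std::vector<std::string> splitArgumentName(const std::string &name, char separator = '/')
{
  std::vector<std::string> keys;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type end = name.find(separator, start);
    if (end == std::string::npos) {
      keys.push_back(name.substr(start)); // Last (or only) key
      break;
    }
    keys.push_back(name.substr(start, end - start));
    start = end + 1;
  }
  return keys;
}
```

Under this reading, the command-line argument megapose/port designates the "port" entry of the "megapose" object in megapose_cube.json, while a top-level name such as video-device yields a single key.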

If everything goes well, you should obtain results similar to those displayed below:

In this visualization, you can see the 3D model being displayed, as well as the object frame expressed in the camera. The model display can be toggled by pressing T. Displaying can be helpful in two ways:

Visually ensuring that tracking produces coherent results
Verifying that the model is correctly interpreted by MegaPose

The bar at the bottom displays the score returned by MegaPose. This score reflects whether the tracking has diverged and a reinitialization is required.

Understanding the program

We will now go through the code to understand how MegaPose can be called. The full code can be found in tutorial-megapose-live-single-object-tracking.cpp

After parsing the parameters given by the user (see above), we create a connection to the MegaPose server:

  std::shared_ptr<vpMegaPose> megapose;
  try {
    megapose = std::make_shared<vpMegaPose>(megaposeAddress, megaposePort, cam, height, width);
  }
  catch (...) {
    throw vpException(vpException::ioError, "Could not connect to Megapose server at " + megaposeAddress + " on port " + std::to_string(megaposePort));
  }
 
  vpMegaPoseTracker megaposeTracker(megapose, objectName, refinerIterations);
  megapose->setCoarseNumSamples(coarseNumSamples);
  const std::vector<std::string> allObjects = megapose->getObjectNames();
  if (std::find(allObjects.begin(), allObjects.end(), objectName) == allObjects.end()) {
    throw vpException(vpException::badValue, "Object " + objectName + " is not known by the Megapose server!");
  }
  std::future<vpMegaPoseEstimate> trackerFuture;

We first create the raw vpMegaPose object, passing as parameters the IP address and the port, as well as the camera calibration and image resolution. This class can be used directly to perform pose estimation, but here we prefer the vpMegaPoseTracker class, which provides a simpler interface for tracking. In addition, it allows calling MegaPose asynchronously, so that the main thread can perform other operations, such as acquiring and displaying the latest frame.

To the tracker, we provide the name of the object we wish to track, as well as the number of iterations that MegaPose should perform. Run time will scale linearly in the number of iterations.
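
As a rough order of magnitude, the linear scaling can be turned into an expected tracking rate. The sketch below assumes the 60-70 milliseconds per refinement iteration quoted in the introduction and ignores network transfer, rendering and display time, so it is an optimistic upper bound rather than a measurement. The function estimatedTrackingRateHz is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: upper bound on the number of pose updates per second,
// assuming the time of one tracking call is
//   refinerIterations * msPerIteration
// and that all other costs (network, rendering, display) are negligible.
double estimatedTrackingRateHz(double msPerIteration, int refinerIterations)
{
  assert(msPerIteration > 0.0 && refinerIterations > 0);
  return 1000.0 / (msPerIteration * refinerIterations);
}
```

With 65 milliseconds per iteration, a single refiner iteration gives at most about 15 pose updates per second, while 5 iterations would bring this down to about 3 updates per second.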

Once our tracker is initialized, we set the number of samples for coarse pose estimation (when we provide a bounding box detection, but no previous pose estimate).

We also check that the object's name is known to MegaPose. If it is not, then tracking will not be possible.

Finally, we initialize a reference to a future object, which will store the latest pose estimation result.

We can now enter a loop which will start by acquiring the latest image from the camera:

  while (true) {
    const double frameStart = vpTime::measureTimeMs();
    capture >> frame;
    if (frame.empty())
      break;
 
    if (I.getSize() == 0) {
      vpImageConvert::convert(frame, I);
      display->init(I);
      vpDisplay::setTitle(I, "Megapose object pose estimation");
    }
    else {
      vpImageConvert::convert(frame, I);
    }
    vpDisplay::display(I);

Once we have acquired an image, we continue by checking whether MegaPose has returned a result. Of course, this will not be the case for the first iteration. If there is indeed a new result, we can check the confidence score to decide whether a reinitialization is required, and request the rendering from MegaPose in order to display it afterwards. In addition, we also request a new pose estimation by setting the callMegapose boolean to true.

    if (!callMegapose && trackerFuture.wait_for(std::chrono::milliseconds(0)) == std::future_status::ready) {
      megaposeEstimate = trackerFuture.get();
      if (tracking) {
        megaposeTimes.push_back(vpTime::measureTimeMs() - megaposeStartTime);
      }
      callMegapose = true;
      tracking = true;
 
      if (overlayModel) {
        overlayImage = megapose->viewObjects({ objectName }, { megaposeEstimate.cTo }, overlayMode);
      }
 
      if (megaposeEstimate.score < reinitThreshold) { // If confidence is low, require a reinitialisation with 2D detection
        initialized = false;
      }
    }
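
The non-blocking check used above, wait_for with a zero timeout, can be tried in isolation with the standard library only. In the sketch below, slowEstimate is a hypothetical stand-in for a pose estimation request and the frame acquisition is replaced by a short sleep; no ViSP class is involved. The sketch also tests valid() first, because calling wait_for on a future that has no shared state is undefined behaviour. In the tutorial code, the !callMegapose condition presumably plays that role, since no request is in flight while it is true.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a pose estimation request: waits to mimic the
// server latency, then returns a confidence score.
double slowEstimate(int latencyMs)
{
  std::this_thread::sleep_for(std::chrono::milliseconds(latencyMs));
  return 0.9;
}

// Returns true only if the future refers to a result that is ready.
// Never blocks, and never calls wait_for on an invalid future.
template <typename T>
bool isReady(const std::future<T> &f)
{
  return f.valid() && f.wait_for(std::chrono::milliseconds(0)) == std::future_status::ready;
}

// Starts one asynchronous request and counts how many "frames" the main
// thread processes before the result becomes available.
int framesProcessedWhileWaiting(int latencyMs, int frameMs)
{
  std::future<double> result = std::async(std::launch::async, slowEstimate, latencyMs);
  int frames = 0;
  while (!isReady(result)) {
    // Stand-in for acquiring and displaying the latest frame
    std::this_thread::sleep_for(std::chrono::milliseconds(frameMs));
    ++frames;
  }
  result.get(); // Does not block: the result is ready at this point
  return frames;
}
```

The main loop keeps running at the camera rate while the request is pending, which is why the display does not freeze between two pose updates.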

When requesting a pose estimate, there are two states to handle:

In the first case, we are not yet tracking, or tracking has failed. We then require the 2D bounding box of the object to (re)initialize tracking. To perform detection, we provide two methods:
- The first, where a trained detection neural network (detectionMethod == "dnn") performs the bounding box regression.
- The second, ideal for initial tests, where the user provides the detection.
Note that both methods (described further down) return an optional value, as the object may not always be visible in the image.
In the second case, we are already tracking: we can simply feed the latest image to MegaPose, as we already have an estimate of the object pose.

    if (callMegapose) {
      if (!initialized) {
        tracking = false;
        std::optional<vpRect> detection = std::nullopt;
#if (VISP_HAVE_OPENCV_VERSION >= 0x030403) && defined(HAVE_OPENCV_DNN) && \
    ((__cplusplus >= 201703L) || (defined(_MSVC_LANG) && (_MSVC_LANG >= 201703L)))
        if (detectionMethod == DetectionMethod::DNN) {
          detection = detectObjectForInitMegaposeDnn(
            dnn, frame, objectName, initialized ? std::optional(megaposeEstimate) : std::nullopt);
        }
#endif
        if (detectionMethod == DetectionMethod::CLICK) {
          detection = detectObjectForInitMegaposeClick(I);
        }
 
        if (detection) {
          initialized = true;
          lastDetection = *detection;
          trackerFuture = megaposeTracker.init(I, lastDetection);
          callMegapose = false;
 
        }
      }
      else {
        trackerFuture = megaposeTracker.track(I);
        callMegapose = false;
        megaposeStartTime = vpTime::measureTimeMs();
      }
    }
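
The interplay between the callMegapose and initialized flags can be summarized as a small state machine. The sketch below is our reading of the two code fragments above, reduced to plain booleans so that it can be compiled without ViSP. The names LoopState, Request, nextRequest and onEstimateReceived are hypothetical and do not appear in the tutorial source.

```cpp
// Kind of request sent to the server during one loop iteration
enum class Request { None, Init, Track };

// The two flags that drive the loop, with their assumed initial values
struct LoopState
{
  bool callMegapose = true; // No request is currently in flight
  bool initialized = false; // No valid pose estimate is available yet
};

// Decides which request to send, mirroring the branching shown above
Request nextRequest(LoopState &state, bool detectionAvailable)
{
  if (!state.callMegapose) {
    return Request::None; // A request is already in flight
  }
  if (!state.initialized) {
    if (!detectionAvailable) {
      return Request::None; // Keep waiting for a bounding box
    }
    state.initialized = true;
    state.callMegapose = false;
    return Request::Init; // (Re)initialization from a 2D detection
  }
  state.callMegapose = false;
  return Request::Track; // Refinement from the previous pose estimate
}

// Updates the flags when a pose estimate is received from the server
void onEstimateReceived(LoopState &state, double score, double reinitThreshold)
{
  state.callMegapose = true;
  if (score < reinitThreshold) {
    state.initialized = false; // Low confidence: a new detection is needed
  }
}
```

Starting from the default state, the sequence is typically Init once a detection is available, then Track after each received estimate, until a score below reinitThreshold sends the loop back to Init.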

The code of the two methods that provide a bounding box to MegaPose is given below and is fairly straightforward:

/*
 * Run the detection network on an image in order to find a specific object.
 * The best matching detection is returned:
 * - If a previous Megapose estimation is available, find the closest match in the image (Euclidean distance between centers)
 * - Otherwise, take the detection with highest confidence
 * If no detection corresponding to detectionLabel is found, then std::nullopt is returned
 */
std::optional<vpRect> detectObjectForInitMegaposeDnn(vpDetectorDNNOpenCV &detector, const cv::Mat &I,
  const std::string &detectionLabel,
  std::optional<vpMegaPoseEstimate> previousEstimate)
{
  std::vector<vpDetectorDNNOpenCV::DetectedFeatures2D> detections_vec;
  detector.detect(I, detections_vec);
  std::vector<vpDetectorDNNOpenCV::DetectedFeatures2D> matchingDetections;
  for (const auto &detection : detections_vec) {
    std::optional<std::string> classnameOpt = detection.getClassName();
    if (classnameOpt) {
      if (*classnameOpt == detectionLabel) {
        matchingDetections.push_back(detection);
      }
    }
  }
  if (matchingDetections.size() == 0) {
    return std::nullopt;
  }
  else if (matchingDetections.size() == 1) {
    return matchingDetections[0].getBoundingBox();
  }
  else {
    // Get detection that is closest to previous object bounding box estimated by Megapose
    if (previousEstimate) {
      vpRect best;
      double bestDist = 10000.f;
      const vpImagePoint previousCenter = (*previousEstimate).boundingBox.getCenter();
      for (const auto &detection : matchingDetections) {
        const vpRect detectionBB = detection.getBoundingBox();
        const vpImagePoint center = detectionBB.getCenter();
        const double matchDist = vpImagePoint::distance(center, previousCenter);
        if (matchDist < bestDist) {
          bestDist = matchDist;
          best = detectionBB;
        }
      }
      return best;
 
    }
    else { // Get detection with highest confidence
      vpRect best;
      double highestConf = 0.0;
      for (const auto &detection : matchingDetections) {
        const double conf = detection.getConfidenceScore();
        if (conf > highestConf) {
          highestConf = conf;
          best = detection.getBoundingBox();
        }
      }
      return best;
    }
  }
  return std::nullopt;
}
 
/*
 * Ask user to provide the detection themselves. They must click to start labelling, then click on the top left and bottom right corner to create the detection.
 */
std::optional<vpRect> detectObjectForInitMegaposeClick(const vpImage<vpRGBa> &I)
{
  const bool startLabelling = vpDisplay::getClick(I, false);
 
  const vpImagePoint textPosition(10.0, 20.0);
 
  if (startLabelling) {
    vpImagePoint topLeft, bottomRight;
    vpDisplay::displayText(I, textPosition, "Click the upper left corner of the bounding box", vpColor::red);
    vpDisplay::flush(I);
    vpDisplay::getClick(I, topLeft, true);
    vpDisplay::display(I);
    vpDisplay::displayCross(I, topLeft, 5, vpColor::red, 2);
    vpDisplay::displayText(I, textPosition, "Click the bottom right corner of the bounding box", vpColor::red);
    vpDisplay::flush(I);
    vpDisplay::getClick(I, bottomRight, true);
    vpRect bb(topLeft, bottomRight);
    return bb;
  }
  else {
    vpDisplay::display(I);
    vpDisplay::displayText(I, textPosition, "Click when the object is visible and static to start reinitializing megapose.", vpColor::red);
    vpDisplay::flush(I);
    return std::nullopt;
  }
}

Once MegaPose has been called, we can display the results in the image. We plot:

The object pose, expressed in the camera frame
The 3D render as seen by MegaPose, overlaid on the actual image
The confidence score of MegaPose

    std::string keyboardEvent;
    const bool keyPressed = vpDisplay::getKeyboardEvent(I, keyboardEvent, false);
    if (keyPressed) {
      if (keyboardEvent == "t") {
        overlayModel = !overlayModel;
      }
      else if (keyboardEvent == "w") {
        overlayMode = overlayMode == "full" ? "wireframe" : "full";
      }
    }

    if (tracking) {
      if (overlayModel) {
        overlayRender(I, overlayImage);
        vpDisplay::display(I);
      }
      vpDisplay::displayText(I, 20, 20, "Right click to quit", vpColor::red);
      vpDisplay::displayText(I, 30, 20, "Press T: Toggle overlay", vpColor::red);
      vpDisplay::displayText(I, 40, 20, "Press W: Toggle wireframe", vpColor::red);
      vpDisplay::displayFrame(I, megaposeEstimate.cTo, cam, 0.05, vpColor::none, 3);
      //vpDisplay::displayRectangle(I, lastDetection, vpColor::red);
      displayScore(I, megaposeEstimate.score);
    }

We have walked through the code of a single object tracking application with MegaPose. You may wish to save the results. You can do so by serializing to JSON, as explained in Tutorial: Using JSON serialization to save your data and read program arguments.

Adapting this tutorial for your use case

This program can run with new 3D models, as MegaPose does not require retraining. To adapt this program to your problem, you will need several things:

The intrinsic parameters of your camera. To calibrate your camera, see Tutorial: Camera intrinsic calibration
The 3D model of your object. See Tutorial: Exporting a 3D model to MegaPose after reconstruction with NeRF
Optionally (but recommended), an automated way to detect the object. You can for instance train a deep neural network and use it in ViSP, as explained in Tutorial: Deep learning object detection. Since you already have your 3D model, you can use Blender to generate a synthetic dataset and train a detection network without manually annotating images. This process is explained in Tutorial: Generating synthetic data for deep learning with Blenderproc.

This tracking example has been used to illustrate some of MegaPose's properties. First, combining it with a deep learning detection method provides an automatic tracking initialization/reinitialization method.

Second, it is able to track reconstructed meshes and is resistant to occlusions, as seen below.

It is also resistant to lighting variations and can track textureless objects.

Finally, MegaPose is an ideal candidate for Pose-Based Visual Servoing. The video below shows an example of a PBVS experiment where MegaPose provides the pose estimation that is given as input to the PBVS control law. See Next steps for more information.

Next steps

To go further, you can look at an example of visual servoing using Megapose, available at servoAfma6MegaposePBVS.cpp
