Tutorial: How to use Blender to generate simulated data for model-based tracking experiments

Introduction

This tutorial will show how to use Blender, a free and open source 3D creation suite, to generate color and depth images from a virtual camera and get ground truth camera poses.

The configuration used for this tutorial is:

  • Ubuntu 16.04
  • Blender 2.79b
Warning
You are advised to be familiar with the basic tools of Blender before reading this tutorial.

Note that all the material (source code, input video, CAD model and XML settings files) described and used in this tutorial is part of the ViSP source code and can be downloaded using the following command:

$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/tracking/model-based/generic-rgbd-blender

Camera settings

In ViSP, as in the computer vision community, the camera intrinsic parameters are the following:

\[ \begin{bmatrix} p_x & 0 & u_0 \\ 0 & p_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \]

  • focal length $ \left( p_x, p_y \right) $
  • principal point $ \left( u_0, v_0 \right) $
  • (plus the distortion coefficients)

In Blender, you will have to set the render resolution, typically 640x480 to simulate a VGA camera:

img-Blender-camera-settings1.png

The focal length can be set in the camera panel by changing the focal length and/or the sensor size:

img-Blender-camera-settings2.png

The relation is the following:

\[ p_x = \frac{f_{\text{ in mm}} \times \text{image width in px}}{\text{sensor width in mm}} \]
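
For example, with a 35 mm focal length, a 32 mm sensor width and a 640 px image width, $ p_x = 35 \times 640 / 32 = 700 $, which corresponds to the calibration matrix printed below.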

The Python script to get the camera intrinsic parameters is:

import bpy
from mathutils import Matrix

# https://blender.stackexchange.com/questions/15102/what-is-blenders-camera-projection-matrix-model/38189#38189
def get_calibration_matrix_K_from_blender(camd):
    f_in_mm = camd.lens
    scene = bpy.context.scene
    resolution_x_in_px = scene.render.resolution_x
    resolution_y_in_px = scene.render.resolution_y
    scale = scene.render.resolution_percentage / 100
    sensor_width_in_mm = camd.sensor_width
    sensor_height_in_mm = camd.sensor_height
    pixel_aspect_ratio = scene.render.pixel_aspect_x / scene.render.pixel_aspect_y
    if (camd.sensor_fit == 'VERTICAL'):
        # the sensor height is fixed (sensor fit is vertical),
        # the sensor width is effectively changed with the pixel aspect ratio
        s_u = resolution_x_in_px * scale / sensor_width_in_mm / pixel_aspect_ratio
        s_v = resolution_y_in_px * scale / sensor_height_in_mm
    else: # 'HORIZONTAL' and 'AUTO'
        # the sensor width is fixed (sensor fit is horizontal),
        # the sensor height is effectively changed with the pixel aspect ratio
        s_u = resolution_x_in_px * scale / sensor_width_in_mm
        s_v = resolution_y_in_px * scale * pixel_aspect_ratio / sensor_height_in_mm

    # Parameters of intrinsic calibration matrix K
    alpha_u = f_in_mm * s_u
    alpha_v = f_in_mm * s_v
    u_0 = resolution_x_in_px * scale / 2
    v_0 = resolution_y_in_px * scale / 2
    skew = 0 # only use rectangular pixels

    K = Matrix(
        ((alpha_u,    skew, u_0),
         (      0, alpha_v, v_0),
         (      0,       0,   1)))
    return K

if __name__ == "__main__":
    # Insert your camera name below
    K = get_calibration_matrix_K_from_blender(bpy.data.objects['Camera'].data)
    print(K)

On Ubuntu, to run the Python script:

  • launch Blender from a terminal
  • split or switch from 3D View to Text Editor
  • open the Python script file
  • click on Run Script
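
Alternatively, the script can be run without opening the Blender UI; a hypothetical invocation, assuming the script is saved as camera_intrinsics.py next to your scene.blend file:

$ blender scene.blend --background --python camera_intrinsics.py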

You should get something similar to:

<Matrix 3x3 (700.0000,   0.0000, 320.0000)
            (  0.0000, 700.0000, 240.0000)
            (  0.0000,   0.0000,   1.0000)>
Warning
In Blender 2.79b, you may need to switch the Sensor Fit from Auto to Vertical in order to set a Sensor Height compatible with the 4/3 image ratio and obtain $ p_x = p_y $.
Note
The principal point is always in the middle of the image here.
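
For convenience, the same camera settings can also be applied from Blender's Python console. Below is a minimal sketch using example values (35 mm focal length, 32x24 mm sensor, Vertical sensor fit) that yield $ p_x = p_y = 700 $ for a 640x480 image; the camera name is assumed to be the default one:

import bpy

scene = bpy.context.scene
scene.render.resolution_x = 640        # image width in px
scene.render.resolution_y = 480        # image height in px
scene.render.resolution_percentage = 100

cam = bpy.data.objects['Camera'].data  # default camera name
cam.lens = 35.0                        # focal length in mm (example value)
cam.sensor_fit = 'VERTICAL'
cam.sensor_width = 32.0                # mm, 4/3 ratio with the height below
cam.sensor_height = 24.0               # mm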

Stereo camera

To generate the depth map, add a second camera and set the appropriate parameters to match the desired intrinsic parameters. Then select the camera that will be the child object, then the one that will be the parent object, and hit Ctrl + P to parent them. This way, the two cameras stay rigidly linked when you move them.
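
The parenting can also be done from the Python console; a minimal sketch, assuming the color camera is named Camera and the depth camera Camera_depth:

import bpy

color_cam = bpy.data.objects['Camera']        # parent (color camera)
depth_cam = bpy.data.objects['Camera_depth']  # child (depth camera)

depth_cam.parent = color_cam
# keep the current world transform of the child camera
depth_cam.matrix_parent_inverse = color_cam.matrix_world.inverted()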

img-Blender-stereocameras-settings1.png
Note
The default camera used for rendering should be the one with the black triangle. You can change this with the Scene panel.
img-Blender-stereocameras-settings2.png
img-Blender-stereocameras-settings3.png

Create a teabox

To create a textured teabox, we directly download a 3D model here. The rough steps are then:

  • select the teabox object
  • switch to the Texture panel
  • add a new texture and open the image
  • switch to the edit mode
  • switch to the UV/Image Editor and select the image

You should get something similar to this:

img-Blender-texture.png

See here for more information about texture and UV unwrapping.

Create a camera trajectory

This can be done easily:

  • move the stereo cameras at a desired location / orientation
  • hit I then LocRotScale to insert a keyframe at the current frame
  • repeat to perform the desired camera or object movement
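
The same keyframes can be inserted programmatically; a minimal sketch, assuming the parent camera is named Camera and using arbitrary example poses:

import bpy

cam = bpy.data.objects['Camera']
# move the camera, then record its pose at the given frame
cam.location = (0.0, -0.5, 0.35)      # example position
cam.rotation_euler = (1.2, 0.0, 0.0)  # example orientation, in radians
cam.keyframe_insert(data_path="location", frame=1)
cam.keyframe_insert(data_path="rotation_euler", frame=1)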

Do not forget to add some lights to make the object visible.

Generate the images and the depth maps

Images are generated automatically when rendering the animation (menu Render > Render Animation) and, on Ubuntu, are saved by default in the /tmp folder. To generate the depth maps, switch to the Compositing screen layout (selector next to the menu bar):

  • tick Use Nodes and Backdrop
  • add a File Output node
  • add a link from the Depth output of the Render Layers node to the File Output node
  • select the OpenEXR file format
img-Blender-depth.png
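
The same node setup can also be created by a script; a minimal sketch following Blender 2.79's compositor API, where the output directory is an assumption:

import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree

render_layers = tree.nodes.new('CompositorNodeRLayers')
file_output = tree.nodes.new('CompositorNodeOutputFile')
file_output.base_path = "/tmp/depth/"        # where the depth maps will be written
file_output.format.file_format = 'OPEN_EXR'  # keep the raw depth values

# connect the Depth output of the Render Layers node to the File Output node
tree.links.new(render_layers.outputs['Depth'], file_output.inputs[0])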

The easiest approach is to render the animation a first time with the camera used to generate the color images and a second time with the one used for the depth maps. To switch the main camera, go to the Scene panel and select the desired camera.
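
Switching the active camera can also be scripted between the two rendering passes; a one-line sketch, assuming the depth camera is named Camera_depth:

import bpy

# render the animation once with the color camera, then switch and render again
bpy.context.scene.camera = bpy.data.objects['Camera_depth']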

Retrieve the camera poses

The camera poses can be retrieved using the following Python script:

import bpy
import os
from mathutils import Matrix

prefix_pose = "/tmp/camera_poses/"
prefix_image = "/tmp/images/"

def get_camera_pose(cameraName, objectName, scene, frameNumber):
    if not os.path.exists(prefix_pose):
        os.makedirs(prefix_pose)

    # OpenGL to Computer vision camera frame convention
    M = Matrix().to_4x4()
    M[1][1] = -1
    M[2][2] = -1

    cam = bpy.data.objects[cameraName]
    object_pose = bpy.data.objects[objectName].matrix_world

    # Normalize orientation with respect to the scale
    object_pose_normalized = object_pose.copy()
    object_orientation_normalized = object_pose_normalized.to_3x3().normalized()
    for i in range(3):
        for j in range(3):
            object_pose_normalized[i][j] = object_orientation_normalized[i][j]

    camera_pose = M * cam.matrix_world.inverted() * object_pose_normalized
    print("camera_pose:\n", camera_pose)

    filename = prefix_pose + cameraName + "_%03d" % frameNumber + ".txt"
    with open(filename, 'w') as f:
        # write the 4x4 homogeneous matrix row by row
        for i in range(4):
            f.write(str(camera_pose[i][0]) + " " +
                    str(camera_pose[i][1]) + " " +
                    str(camera_pose[i][2]) + " " +
                    str(camera_pose[i][3]) + " \n")
    return

def my_handler(scene):
    frameNumber = scene.frame_current
    print("\n\nFrame Change", scene.frame_current)
    get_camera_pose("Camera", "tea_box_02", scene, frameNumber)

step_count = 250
scene = bpy.context.scene
for step in range(1, step_count):
    # Set render frame
    scene.frame_set(step)

    # Set filename and render
    if not os.path.exists(prefix_image):
        os.makedirs(prefix_image)
    scene.render.filepath = (prefix_image + '%04d.png') % step
    bpy.ops.render.render(write_still=True)

    my_handler(scene)

This script automatically renders and saves the animation images and writes the corresponding camera pose for each frame.

Note
Data are saved in the /tmp/ directory by default; the paths should be adapted to your OS. The camera and object names should also be changed to match your scene.
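
Each pose file contains a 4x4 homogeneous transformation (the ground truth cMo) written row by row, which is what the C++ code below reads with vpHomogeneousMatrix::load(). A quick way to inspect one of the files outside Blender; a sketch assuming NumPy is installed and the default output path is used:

import numpy as np

cMo = np.loadtxt("/tmp/camera_poses/Camera_001.txt")
print(cMo)  # 4x4 homogeneous matrix: object frame expressed in the camera frame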

Model-based tracker on simulated data

Source code

Since the depth data are stored in the OpenEXR file format, OpenCV is used to read them. The following C++ sample, also available in tutorial-mb-generic-tracker-rgbd-blender.cpp, reads the color and depth images, recreates the point cloud using the depth camera intrinsic parameters, and reads the ground truth poses, which are printed along with the camera pose estimated by the model-based tracker.

#include <iostream>
#include <fstream>

#include <visp3/core/vpDisplay.h>
#include <visp3/core/vpImageConvert.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/core/vpPixelMeterConversion.h>
#include <visp3/io/vpImageIo.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/mbt/vpMbGenericTracker.h>

#if (VISP_HAVE_OPENCV_VERSION >= 0x020403) && defined(VISP_HAVE_XML2)
namespace {
bool read_data(unsigned int cpt, const std::string &input_directory, vpImage<unsigned char> &I,
               vpImage<uint16_t> &I_depth_raw, unsigned int &depth_width, unsigned int &depth_height,
               std::vector<vpColVector> &pointcloud, const vpCameraParameters &cam,
               vpHomogeneousMatrix &cMo_ground_truth)
{
  char buffer[FILENAME_MAX];

  // Read color
  std::stringstream ss;
  ss << input_directory << "/images/%04d.jpg";
  sprintf(buffer, ss.str().c_str(), cpt);
  std::string filename_img = buffer;

  if (!vpIoTools::checkFilename(filename_img)) {
    std::cerr << "Cannot read: " << filename_img << std::endl;
    return false;
  }
  vpImageIo::read(I, filename_img);

  // Read depth
  ss.str("");
  ss << input_directory << "/depth/Image%04d.exr";
  sprintf(buffer, ss.str().c_str(), cpt);
  std::string filename_depth = buffer;

  cv::Mat depth_raw = cv::imread(filename_depth, cv::IMREAD_ANYDEPTH | cv::IMREAD_ANYCOLOR);
  if (depth_raw.empty()) {
    std::cerr << "Cannot read: " << filename_depth << std::endl;
    return false;
  }

  depth_width = static_cast<unsigned int>(depth_raw.cols);
  depth_height = static_cast<unsigned int>(depth_raw.rows);
  I_depth_raw.resize(depth_height, depth_width);
  pointcloud.resize(depth_width*depth_height);

  for (int i = 0; i < depth_raw.rows; i++) {
    for (int j = 0; j < depth_raw.cols; j++) {
      I_depth_raw[i][j] = static_cast<uint16_t>(32767.5f * depth_raw.at<cv::Vec3f>(i, j)[0]);
      double x = 0.0, y = 0.0;
      // Manually limit the field of view of the depth camera
      double Z = depth_raw.at<cv::Vec3f>(i, j)[0] > 2.0f ? 0.0 : static_cast<double>(depth_raw.at<cv::Vec3f>(i, j)[0]);
      // Back-project the pixel using the depth camera intrinsics
      vpPixelMeterConversion::convertPoint(cam, j, i, x, y);
      size_t idx = static_cast<size_t>(i*depth_raw.cols + j);
      pointcloud[idx].resize(3);
      pointcloud[idx][0] = x*Z;
      pointcloud[idx][1] = y*Z;
      pointcloud[idx][2] = Z;
    }
  }

  // Read ground truth
  ss.str("");
  ss << input_directory << "/camera_poses/Camera_%03d.txt";
  sprintf(buffer, ss.str().c_str(), cpt);
  std::string filename_pose = buffer;

  std::ifstream f_pose;
  f_pose.open(filename_pose.c_str()); // .c_str() to keep compat when c++11 not available
  if (!f_pose.is_open()) {
    std::cerr << "Cannot read: " << filename_pose << std::endl;
    return false;
  }
  cMo_ground_truth.load(f_pose);

  return true;
}
}

int main(int argc, char *argv[])
{
  std::string input_directory = "."; // location of the data (images, depth maps, camera poses)
  std::string config_color = "teabox.xml", config_depth = "teabox_depth.xml";
  std::string model_color = "teabox.cao", model_depth = "teabox.cao";
  std::string init_file = "teabox.init";
  std::string extrinsic_file = "depth_M_color.txt";
  unsigned int first_frame_index = 1;
  bool disable_depth = false;
  bool display_ground_truth = false;
  bool click = false;

  for (int i = 1; i < argc; i++) {
    if (std::string(argv[i]) == "--input_directory" && i + 1 < argc) {
      input_directory = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--config_color" && i + 1 < argc) {
      config_color = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--config_depth" && i + 1 < argc) {
      config_depth = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--model_color" && i + 1 < argc) {
      model_color = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--model_depth" && i + 1 < argc) {
      model_depth = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--init_file" && i + 1 < argc) {
      init_file = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--extrinsics" && i + 1 < argc) {
      extrinsic_file = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--disable_depth") {
      disable_depth = true;
    } else if (std::string(argv[i]) == "--display_ground_truth") {
      display_ground_truth = true;
    } else if (std::string(argv[i]) == "--click") {
      click = true;
    } else if (std::string(argv[i]) == "--first_frame_index" && i+1 < argc) {
      first_frame_index = static_cast<unsigned int>(atoi(argv[i+1]));
    } else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
      std::cout << "Usage: \n" << argv[0] << " [--input_directory <data directory> (default: .)]"
        " [--config_color <object.xml> (default: teabox.xml)] [--config_depth <object.xml> (default: teabox_depth.xml)]"
        " [--model_color <object.cao> (default: teabox.cao)] [--model_depth <object.cao> (default: teabox.cao)]"
        " [--init_file <object.init> (default: teabox.init)]"
        " [--extrinsics <depth to color transformation> (default: depth_M_color.txt)] [--disable_depth]"
        " [--display_ground_truth] [--click] [--first_frame_index <index> (default: 1)]" << std::endl;
      return EXIT_SUCCESS;
    }
  }

  std::cout << "input_directory: " << input_directory << std::endl;
  std::cout << "config_color: " << config_color << std::endl;
  std::cout << "config_depth: " << config_depth << std::endl;
  std::cout << "model_color: " << model_color << std::endl;
  std::cout << "model_depth: " << model_depth << std::endl;
  std::cout << "init_file: " << init_file << std::endl;
  std::cout << "extrinsic_file: " << extrinsic_file << std::endl;
  std::cout << "first_frame_index: " << first_frame_index << std::endl;
  std::cout << "disable_depth: " << disable_depth << std::endl;
  std::cout << "display_ground_truth: " << display_ground_truth << std::endl;
  std::cout << "click: " << click << std::endl;

  std::vector<int> tracker_types;
  tracker_types.push_back(vpMbGenericTracker::EDGE_TRACKER); // color camera: moving-edges tracker
  if (!disable_depth)
    tracker_types.push_back(vpMbGenericTracker::DEPTH_DENSE_TRACKER);

  vpMbGenericTracker tracker(tracker_types);
  if (!disable_depth) {
    tracker.loadConfigFile(config_color, config_depth);
    tracker.loadModel(model_color, model_depth);
  } else {
    tracker.loadConfigFile(config_color);
    tracker.loadModel(model_color);
  }

  vpCameraParameters cam_color, cam_depth;
  if (!disable_depth)
    tracker.getCameraParameters(cam_color, cam_depth);
  else
    tracker.getCameraParameters(cam_color);
  tracker.setDisplayFeatures(true);
  std::cout << "cam_color:\n" << cam_color << std::endl;
  std::cout << "cam_depth:\n" << cam_depth << std::endl;

  vpImage<unsigned char> I, I_depth;
  vpImage<uint16_t> I_depth_raw;
  unsigned int depth_width = 0, depth_height = 0;
  std::vector<vpColVector> pointcloud;
  vpHomogeneousMatrix cMo_ground_truth;

  unsigned int frame_cpt = first_frame_index;
  read_data(frame_cpt, input_directory, I, I_depth_raw, depth_width, depth_height, pointcloud, cam_depth, cMo_ground_truth);
  vpImageConvert::createDepthHistogram(I_depth_raw, I_depth);

#if defined(VISP_HAVE_X11)
  vpDisplayX d1, d2;
#elif defined(VISP_HAVE_GDI)
  vpDisplayGDI d1, d2;
#else
  vpDisplayOpenCV d1, d2;
#endif
  d1.init(I, 0, 0, "Color image");
  d2.init(I_depth, static_cast<int>(I.getWidth()), 0, "Depth image");

  vpHomogeneousMatrix depthMcolor;
  if (!disable_depth) {
    std::ifstream f_extrinsics;
    f_extrinsics.open(extrinsic_file.c_str()); // .c_str() to keep compat when c++11 not available

    depthMcolor.load(f_extrinsics);
    tracker.setCameraTransformationMatrix("Camera2", depthMcolor);
    std::cout << "depthMcolor:\n" << depthMcolor << std::endl;
  }

  if (display_ground_truth) {
    tracker.initFromPose(I, cMo_ground_truth); // I and I_depth must be the same size when using depth features!
  } else {
    tracker.initClick(I, init_file, true); // I and I_depth must be the same size when using depth features!
  }

  try {
    bool quit = false;
    while (!quit && read_data(frame_cpt, input_directory, I, I_depth_raw, depth_width, depth_height, pointcloud, cam_depth, cMo_ground_truth)) {
      vpImageConvert::createDepthHistogram(I_depth_raw, I_depth);

      vpDisplay::display(I);
      vpDisplay::display(I_depth);

      if (display_ground_truth) {
        tracker.initFromPose(I, cMo_ground_truth); // I and I_depth must be the same size when using depth features!
      } else {
        if (!disable_depth) {
          std::map<std::string, const vpImage<unsigned char> *> mapOfImages;
          std::map<std::string, const std::vector<vpColVector> *> mapOfPointClouds;
          std::map<std::string, unsigned int> mapOfPointCloudWidths;
          std::map<std::string, unsigned int> mapOfPointCloudHeights;

          mapOfImages["Camera1"] = &I;
          mapOfPointClouds["Camera2"] = &pointcloud;
          mapOfPointCloudWidths["Camera2"] = depth_width;
          mapOfPointCloudHeights["Camera2"] = depth_height;
          tracker.track(mapOfImages, mapOfPointClouds, mapOfPointCloudWidths, mapOfPointCloudHeights);
        } else {
          tracker.track(I);
        }
      }

      vpHomogeneousMatrix cMo = tracker.getPose();
      std::cout << "\nFrame: " << frame_cpt << std::endl;
      if (!display_ground_truth)
        std::cout << "cMo:\n" << cMo << std::endl;
      std::cout << "cMo ground truth:\n" << cMo_ground_truth << std::endl;

      if (!disable_depth) {
        tracker.display(I, I_depth, cMo, depthMcolor*cMo, cam_color, cam_depth, vpColor::red, 2);
        vpDisplay::displayFrame(I_depth, depthMcolor*cMo, cam_depth, 0.05, vpColor::none, 2);
      } else {
        tracker.display(I, cMo, cam_color, vpColor::red, 2);
      }
      vpDisplay::displayFrame(I, cMo, cam_color, 0.05, vpColor::none, 2);

      std::ostringstream oss;
      oss << "Frame: " << frame_cpt;
      vpDisplay::displayText(I, 20, 20, oss.str(), vpColor::red);
      if (!display_ground_truth) {
        oss.str("");
        oss << "Nb features: " << tracker.getError().getRows();
        vpDisplay::displayText(I, 40, 20, oss.str(), vpColor::red);
      }

      vpDisplay::flush(I);
      vpDisplay::flush(I_depth);

      // right click to quit, left click to toggle step-by-step mode
      vpMouseButton::vpMouseButtonType button;
      if (vpDisplay::getClick(I, button, click)) {
        switch (button) {
        case vpMouseButton::button3:
          quit = !click;
          break;
        case vpMouseButton::button1:
          click = !click;
          break;
        default:
          break;
        }
      }

      frame_cpt++;
    }

    vpDisplay::displayText(I, 20, 20, "Click to quit.", vpColor::red);
    vpDisplay::flush(I);
    vpDisplay::getClick(I);
  } catch (std::exception& e) {
    std::cerr << "Catch exception: " << e.what() << std::endl;
  }

  return EXIT_SUCCESS;
}
#else
int main()
{
  std::cout << "To run this tutorial, ViSP should be built with OpenCV and libXML2 libraries." << std::endl;
  return EXIT_SUCCESS;
}
#endif
Note
Here the depth values are manually clipped in order to simulate the depth range of a depth sensor. This probably can be done directly in Blender.
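
As a possible Blender-side alternative to the clipping done in the C++ code, the Depth pass could be clamped in the compositor before it is written to file. The sketch below uses a Map Value node with the same 2 m limit; note that it clamps large values instead of zeroing them, and the default node names are assumptions:

import bpy

tree = bpy.context.scene.node_tree
render_layers = tree.nodes['Render Layers']  # default node names, adapt if renamed
file_output = tree.nodes['File Output']

map_value = tree.nodes.new('CompositorNodeMapValue')
map_value.use_max = True
map_value.max = [2.0]                        # clamp depth values above 2 m

tree.links.new(render_layers.outputs['Depth'], map_value.inputs[0])
tree.links.new(map_value.outputs[0], file_output.inputs[0])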

Usage on simulated data

Once built, to get the tutorial-mb-generic-tracker-rgbd-blender usage, just run:

$ ./tutorial-mb-generic-tracker-rgbd-blender -h
./tutorial-mb-generic-tracker-rgbd-blender [--input_directory <data directory> (default: .)] [--config_color <object.xml> (default: teabox.xml)] [--config_depth <object.xml> (default: teabox_depth.xml)] [--model_color <object.cao> (default: teabox.cao)] [--model_depth <object.cao> (default: teabox.cao)] [--init_file <object.init> (default: teabox.init)] [--extrinsics <depth to color transformation> (default: depth_M_color.txt)] [--disable_depth] [--display_ground_truth] [--click] [--first_frame_index <index> (default: 1)]

The default parameters allow running the binary with the data provided in ViSP. Just run:

$ ./tutorial-mb-generic-tracker-rgbd-blender

The next video shows the results that you should obtain:

Next tutorial

You are now ready to see the next Tutorial: Template tracking.