Visual Servoing Platform  version 3.2.0 under development (2019-01-17)
Tutorial: How to use Blender to generate simulated data for model-based tracking experiments

Introduction

This tutorial will show how to use Blender, a free and open source 3D creation suite, to generate color and depth images from a virtual camera and get ground truth camera poses.

The configuration used for this tutorial is:

  • Ubuntu 16.04
  • Blender 2.79b
Warning
You are advised to know how to use the basic tools of Blender before reading this tutorial. Some non-exhaustive links:

Note that all the material (source code, input video, CAD model or XML settings files) described and used in this tutorial is part of ViSP source code and could be downloaded using the following command:

$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/tracking/model-based/generic-rgbd-blender

Camera settings

In ViSP and in the computer vision community, camera intrinsic parameters are the following:

\[ \begin{bmatrix} p_x & 0 & u_0 \\ 0 & p_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \]

  • focal length $ \left( p_x, p_y \right) $
  • principal point $ \left( u_0, v_0 \right) $
  • (plus the distortion coefficients)

In Blender, you will have to set the camera resolution, generally 640x480 to have a VGA camera resolution:

img-Blender-camera-settings1.png

Focal length can be set with the camera panel by changing the focal length and/or the sensor size:

img-Blender-camera-settings2.png

The relation is the following:

\[ p_x = \frac{f_{x\text{ in mm}} \times \text{image width in px}}{\text{sensor width in mm}} \]

The Python script to get the camera intrinsic parameters is:

1 import bpy
2 from mathutils import Matrix
3 
4 # https://blender.stackexchange.com/questions/15102/what-is-blenders-camera-projection-matrix-model/38189#38189
5 def get_calibration_matrix_K_from_blender(camd):
6  f_in_mm = camd.lens
7  scene = bpy.context.scene
8  resolution_x_in_px = scene.render.resolution_x
9  resolution_y_in_px = scene.render.resolution_y
10  scale = scene.render.resolution_percentage / 100
11  sensor_width_in_mm = camd.sensor_width
12  sensor_height_in_mm = camd.sensor_height
13  pixel_aspect_ratio = scene.render.pixel_aspect_x / scene.render.pixel_aspect_y
14  if (camd.sensor_fit == 'VERTICAL'):
15  # the sensor height is fixed (sensor fit is horizontal),
16  # the sensor width is effectively changed with the pixel aspect ratio
17  s_u = resolution_x_in_px * scale / sensor_width_in_mm / pixel_aspect_ratio
18  s_v = resolution_y_in_px * scale / sensor_height_in_mm
19  else: # 'HORIZONTAL' and 'AUTO'
20  # the sensor width is fixed (sensor fit is horizontal),
21  # the sensor height is effectively changed with the pixel aspect ratio
22  pixel_aspect_ratio = scene.render.pixel_aspect_x / scene.render.pixel_aspect_y
23  s_u = resolution_x_in_px * scale / sensor_width_in_mm
24  s_v = resolution_y_in_px * scale * pixel_aspect_ratio / sensor_height_in_mm
25 
26  # Parameters of intrinsic calibration matrix K
27  alpha_u = f_in_mm * s_u
28  alpha_v = f_in_mm * s_v
29  u_0 = resolution_x_in_px*scale / 2
30  v_0 = resolution_y_in_px*scale / 2
31  skew = 0 # only use rectangular pixels
32 
33  K = Matrix(
34  ((alpha_u, skew, u_0),
35  ( 0 , alpha_v, v_0),
36  ( 0 , 0, 1 )))
37  return K
38 
39 if __name__ == "__main__":
40  # Insert your camera name below
41  K = get_calibration_matrix_K_from_blender(bpy.data.objects['Camera'].data)
42  print(K)

On Ubuntu, to run the Python script:

  • launch Blender from a terminal
  • split or switch from 3D View to Text Editor
  • open the Python script file
  • click on Run Script

You should get something similar to:

<Matrix 3x3 (700.0000, 0.0000, 320.0000)
( 0.0000, 700.0000, 240.0000)
( 0.0000, 0.0000, 1.0000)>
Warning
In Blender 2.79b, you may need to switch the Sensor Fit from Auto to Vertical to change the Sensor Height to be compatible with a 4/3 ratio and have $ p_x == p_y $
Note
The principal point is always in the middle of the image here.

Stereo camera

To generate the depth map, add a second camera and set the appropriate parameters to match the desired intrinsic parameters. Then select one camera (the child object) and the other one (the parent object) and hit Ctrl + P to parent them. This way, the two cameras will be linked when you will move the cameras.

img-Blender-stereocameras-settings1.png
Note
The default camera used for rendering should be the one with the black triangle. You can change this with the Scene panel.
img-Blender-stereocameras-settings2.png
img-Blender-stereocameras-settings3.png

Create a teabox

To create a teabox with texture, we download directly a 3D model here. Then, the rough instructions should be:

  • select the teabox object
  • switch to the Texture panel
  • add a new texture and open the image
  • switch to the edit mode
  • switch to the UV/Image Editor and select the image

You should get something similar to this:

img-Blender-texture.png

See here for more information about texture and UV unwrapping.

Create a camera trajectory

This can be done easily:

  • move the stereo cameras at a desired location / orientation
  • hit I then LocRotScale to insert a keyframe at the current frame
  • repeat to perform the desired camera or object movement

Do not forget to add some lights to make the object visible.

Generate the images and the depth maps

Images are generated automatically when rendering the animation (menu Render > Render Animation) and are saved on Ubuntu by default in the /tmp folder. To generate the depth maps, switch to the Compositing screen layout, next to the menu bar:

  • tick Use Nodes and Backdrop
  • add a file output node
  • add a link between the Depth output of the Render Layers node to the File Output node
  • select the OpenEXR file format
img-Blender-depth.png

The easiest thing is to run the animation first with the camera used to generate color images and a second time with the one for the depth maps. To switch the main camera, go to the Scene panel and select the desired camera.

Retrieve the camera poses

The camera poses can be retrieved using the following Python script:

1 import bpy
2 import os
3 from mathutils import *
4 
5 prefix_pose = "/tmp/camera_poses/"
6 prefix_image = "/tmp/images/"
7 
8 def get_camera_pose(cameraName, objectName, scene, frameNumber):
9  if not os.path.exists(prefix_pose):
10  os.makedirs(prefix_pose)
11 
12  # OpenGL to Computer vision camera frame convention
13  M = Matrix().to_4x4()
14  M[1][1] = -1
15  M[2][2] = -1
16 
17  cam = bpy.data.objects[cameraName]
18  object_pose = bpy.data.objects[objectName].matrix_world
19 
20  #Normalize orientation with respect to the scale
21  object_pose_normalized = object_pose.copy()
22  object_orientation_normalized = object_pose_normalized.to_3x3().normalized()
23  for i in range(3):
24  for j in range(3):
25  object_pose_normalized[i][j] = object_orientation_normalized[i][j]
26 
27  camera_pose = M*cam.matrix_world.inverted()*object_pose_normalized
28  print("camera_pose:\n", camera_pose)
29 
30  filename = prefix_pose + cameraName + "_%03d" % frameNumber + ".txt"
31  with open(filename, 'w') as f:
32  f.write(str(camera_pose[0][0]) + " ")
33  f.write(str(camera_pose[0][1]) + " ")
34  f.write(str(camera_pose[0][2]) + " ")
35  f.write(str(camera_pose[0][3]) + " ")
36  f.write("\n")
37 
38  f.write(str(camera_pose[1][0]) + " ")
39  f.write(str(camera_pose[1][1]) + " ")
40  f.write(str(camera_pose[1][2]) + " ")
41  f.write(str(camera_pose[1][3]) + " ")
42  f.write("\n")
43 
44  f.write(str(camera_pose[2][0]) + " ")
45  f.write(str(camera_pose[2][1]) + " ")
46  f.write(str(camera_pose[2][2]) + " ")
47  f.write(str(camera_pose[2][3]) + " ")
48  f.write("\n")
49 
50  f.write(str(camera_pose[3][0]) + " ")
51  f.write(str(camera_pose[3][1]) + " ")
52  f.write(str(camera_pose[3][2]) + " ")
53  f.write(str(camera_pose[3][3]) + " ")
54  f.write("\n")
55 
56  return
57 
58 
59 def my_handler(scene):
60  frameNumber = scene.frame_current
61  print("\n\nFrame Change", scene.frame_current)
62  get_camera_pose("Camera", "tea_box_02", scene, frameNumber)
63 
64 step_count = 250
65 scene = bpy.context.scene
66 for step in range(1, step_count):
67  # Set render frame
68  scene.frame_set(step)
69 
70  # Set filename and render
71  if not os.path.exists(prefix_image):
72  os.makedirs(prefix_image)
73  scene.render.filepath = (prefix_image + '%04d.png') % step
74  bpy.ops.render.render( write_still=True )
75 
76  my_handler(scene)

This script will also automatically generate and save the animation images and write the corresponding camera poses.

Note
Data are saved in the /tmp/ directory by default and the path should be changed accordingly depending on the OS. Camera and object names should also be changed accordingly.

Model-based tracker on simulated data

Source code

Since depth data are stored in OpenEXR file format, OpenCV is used for the reading. The following C++ sample file also available in tutorial-mb-generic-tracker-rgbd-blender.cpp reads color and depth images, pointcloud is recreated using the depth camera intrinsic parameters and the ground truth data are read and printed along with the estimated camera pose from the model-based tracker.

#include <iostream>
#include <visp3/core/vpDisplay.h>
#include <visp3/core/vpIoTools.h>
#include <visp3/io/vpImageIo.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/mbt/vpMbGenericTracker.h>
#if (VISP_HAVE_OPENCV_VERSION >= 0x020403) && defined(VISP_HAVE_XML2)
namespace {
bool read_data(unsigned int cpt, const std::string &input_directory, vpImage<unsigned char> &I,
vpImage<uint16_t> &I_depth_raw, unsigned int &depth_width, unsigned int &depth_height,
std::vector<vpColVector> &pointcloud, const vpCameraParameters &cam,
vpHomogeneousMatrix &cMo_ground_truth)
{
char buffer[FILENAME_MAX];
// Read color
std::stringstream ss;
ss << input_directory << "/images/%04d.jpg";
sprintf(buffer, ss.str().c_str(), cpt);
std::string filename_img = buffer;
if (!vpIoTools::checkFilename(filename_img)) {
std::cerr << "Cannot read: " << filename_img << std::endl;
return false;
}
vpImageIo::read(I, filename_img);
// Read depth
ss.str("");
ss << input_directory << "/depth/Image%04d.exr";
sprintf(buffer, ss.str().c_str(), cpt);
std::string filename_depth = buffer;
cv::Mat depth_raw = cv::imread(filename_depth, cv::IMREAD_ANYDEPTH | cv::IMREAD_ANYCOLOR);
if (depth_raw.empty()) {
std::cerr << "Cannot read: " << filename_depth << std::endl;
return false;
}
depth_width = static_cast<unsigned int>(depth_raw.cols);
depth_height = static_cast<unsigned int>(depth_raw.rows);
I_depth_raw.resize(depth_height, depth_width);
pointcloud.resize(depth_width*depth_height);
for (int i = 0; i < depth_raw.rows; i++) {
for (int j = 0; j < depth_raw.cols; j++) {
I_depth_raw[i][j] = static_cast<uint16_t>(32767.5f * depth_raw.at<cv::Vec3f>(i, j)[0]);
double x = 0.0, y = 0.0;
// Manually limit the field of view of the depth camera
double Z = depth_raw.at<cv::Vec3f>(i, j)[0] > 2.0f ? 0.0 : static_cast<double>(depth_raw.at<cv::Vec3f>(i, j)[0]);
size_t idx = static_cast<size_t>(i*depth_raw.cols + j);
pointcloud[idx].resize(3);
pointcloud[idx][0] = x*Z;
pointcloud[idx][1] = y*Z;
pointcloud[idx][2] = Z;
}
}
// Read ground truth
ss.str("");
ss << input_directory << "/camera_poses/Camera_%03d.txt";
sprintf(buffer, ss.str().c_str(), cpt);
std::string filename_pose = buffer;
std::ifstream f_pose;
f_pose.open(filename_pose);
if (!f_pose.is_open()) {
std::cerr << "Cannot read: " << filename_pose << std::endl;
return false;
}
cMo_ground_truth.load(f_pose);
return true;
}
}
int main(int argc, char *argv[])
{
std::string input_directory = "."; // location of the data (images, depth maps, camera poses)
std::string config_color = "teabox.xml", config_depth = "teabox_depth.xml";
std::string model_color = "teabox.cao", model_depth = "teabox.cao";
std::string init_file = "teabox.init";
std::string extrinsic_file = "depth_M_color.txt";
unsigned int first_frame_index = 1;
bool disable_depth = false;
bool display_ground_truth = false;
bool click = false;
for (int i = 1; i < argc; i++) {
if (std::string(argv[i]) == "--input_directory" && i + 1 < argc) {
input_directory = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--config_color" && i + 1 < argc) {
config_color = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--config_depth" && i + 1 < argc) {
config_depth = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--model_color" && i + 1 < argc) {
model_color = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--model_depth" && i + 1 < argc) {
model_depth = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--init_file" && i + 1 < argc) {
init_file = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--extrinsics" && i + 1 < argc) {
extrinsic_file = std::string(argv[i + 1]);
} else if (std::string(argv[i]) == "--disable_depth") {
disable_depth = true;
} else if (std::string(argv[i]) == "--display_ground_truth") {
display_ground_truth = true;
} else if (std::string(argv[i]) == "--click") {
click = true;
} else if (std::string(argv[i]) == "--first_frame_index" && i+1 < argc) {
first_frame_index = static_cast<unsigned int>(atoi(argv[i+1]));
}
else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
std::cout << "Usage: \n" << argv[0] << " [--input_directory <data directory> (default: .)]"
" [--config_color <object.xml> (default: teabox.xml)] [--config_depth <object.xml> (default: teabox_depth.xml)]"
" [--model_color <object.cao> (default: teabox.cao)] [--model_depth <object.cao> (default: teabox.cao)]"
" [--init_file <object.init> (default: teabox.init)]"
" [--extrinsics <depth to color transformation> (default: depth_M_color.txt)] [--disable_depth]"
" [--display_ground_truth] [--click] [--first_frame_index <index> (default: 1)]" << std::endl;
return EXIT_SUCCESS;
}
}
std::cout << "input_directory: " << input_directory << std::endl;
std::cout << "config_color: " << config_color << std::endl;
std::cout << "config_depth: " << config_depth << std::endl;
std::cout << "model_color: " << model_color << std::endl;
std::cout << "model_depth: " << model_depth << std::endl;
std::cout << "init_file: " << model_depth << std::endl;
std::cout << "extrinsic_file: " << extrinsic_file << std::endl;
std::cout << "first_frame_index: " << first_frame_index << std::endl;
std::cout << "disable_depth: " << disable_depth << std::endl;
std::cout << "display_ground_truth: " << display_ground_truth << std::endl;
std::cout << "click: " << click << std::endl;
std::vector<int> tracker_types;
if (!disable_depth)
tracker_types.push_back(vpMbGenericTracker::DEPTH_DENSE_TRACKER);
vpMbGenericTracker tracker(tracker_types);
if (!disable_depth)
tracker.loadConfigFile(config_color, config_depth);
else
tracker.loadConfigFile(config_color);
tracker.loadModel(model_color);
vpCameraParameters cam_color, cam_depth;
if (!disable_depth)
tracker.getCameraParameters(cam_color, cam_depth);
else
tracker.getCameraParameters(cam_color);
tracker.setDisplayFeatures(true);
std::cout << "cam_color:\n" << cam_color << std::endl;
std::cout << "cam_depth:\n" << cam_depth << std::endl;
vpImage<uint16_t> I_depth_raw;
unsigned int depth_width = 0, depth_height = 0;
std::vector<vpColVector> pointcloud;
vpHomogeneousMatrix cMo_ground_truth;
unsigned int frame_cpt = first_frame_index;
read_data(frame_cpt, input_directory, I, I_depth_raw, depth_width, depth_height, pointcloud, cam_depth, cMo_ground_truth);
vpImageConvert::createDepthHistogram(I_depth_raw, I_depth);
#if defined(VISP_HAVE_X11)
vpDisplayX d1, d2;
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI d1, d2;
#else
#endif
d1.init(I, 0, 0, "Color image");
d2.init(I_depth, static_cast<int>(I.getWidth()), 0, "Depth image");
vpHomogeneousMatrix depthMcolor;
if (!disable_depth) {
std::ifstream f_extrinsics;
f_extrinsics.open(extrinsic_file);
depthMcolor.load(f_extrinsics);
tracker.setCameraTransformationMatrix("Camera2", depthMcolor);
std::cout << "depthMcolor:\n" << depthMcolor << std::endl;
}
if (display_ground_truth) {
tracker.initFromPose(I, cMo_ground_truth); //I and I_depth must be the same size when using depth features!
} else
tracker.initClick(I, init_file, true); //I and I_depth must be the same size when using depth features!
try {
bool quit = false;
while (!quit && read_data(frame_cpt, input_directory, I, I_depth_raw, depth_width, depth_height, pointcloud, cam_depth, cMo_ground_truth)) {
vpImageConvert::createDepthHistogram(I_depth_raw, I_depth);
if (display_ground_truth) {
tracker.initFromPose(I, cMo_ground_truth); //I and I_depth must be the same size when using depth features!
} else {
if (!disable_depth) {
std::map<std::string, const vpImage<unsigned char> *> mapOfImages;
std::map<std::string, const std::vector<vpColVector> *> mapOfPointClouds;
std::map<std::string, unsigned int> mapOfPointCloudWidths;
std::map<std::string, unsigned int> mapOfPointCloudHeights;
mapOfImages["Camera1"] = &I;
mapOfPointClouds["Camera2"] = &pointcloud;
mapOfPointCloudWidths["Camera2"] = depth_width;
mapOfPointCloudHeights["Camera2"] = depth_height;
tracker.track(mapOfImages, mapOfPointClouds, mapOfPointCloudWidths, mapOfPointCloudHeights);
} else
tracker.track(I);
}
vpHomogeneousMatrix cMo = tracker.getPose();
std::cout << "\nFrame: " << frame_cpt << std::endl;
if (!display_ground_truth)
std::cout << "cMo:\n" << cMo << std::endl;
std::cout << "cMo ground truth:\n" << cMo_ground_truth << std::endl;
if (!disable_depth) {
tracker.display(I, I_depth, cMo, depthMcolor*cMo, cam_color, cam_depth, vpColor::red, 2);
vpDisplay::displayFrame(I_depth, depthMcolor*cMo, cam_depth, 0.05, vpColor::none, 2);
}
else
tracker.display(I, cMo, cam_color, vpColor::red, 2);
vpDisplay::displayFrame(I, cMo, cam_color, 0.05, vpColor::none, 2);
std::ostringstream oss;
oss << "Frame: " << frame_cpt;
vpDisplay::displayText(I, 20, 20, oss.str(), vpColor::red);
if (!display_ground_truth) {
oss.str("");
oss << "Nb features: " << tracker.getError().getRows();
vpDisplay::displayText(I, 40, 20, oss.str(), vpColor::red);
}
vpDisplay::flush(I_depth);
if (vpDisplay::getClick(I, button, click)) {
switch (button) {
quit = !click;
break;
click = !click;
break;
default:
break;
}
}
frame_cpt++;
}
vpDisplay::displayText(I, 20, 20, "Click to quit.", vpColor::red);
} catch (std::exception& e) {
std::cerr << "Catch exception: " << e.what() << std::endl;
}
return EXIT_SUCCESS;
}
#else
int main()
{
std::cout << "To run this tutorial, ViSP should be built with OpenCV and libXML2 libraries." << std::endl;
return EXIT_SUCCESS;
}
#endif
Note
Here the depth values are manually clipped in order to simulate the depth range of a depth sensor. This probably can be done directly in Blender.

Usage on simulated data

Once build, to get tutorial-mb-generic-tracker-rgbd-blender.cpp usage, just run:

$ ./tutorial-mb-generic-tracker-rgbd-blender -h
./tutorial-mb-generic-tracker-rgbd-blender [--input_directory <data directory> (default: .)] [--config_color <object.xml> (default: teabox.xml)] [--config_depth <object.xml> (default: teabox_depth.xml)] [--model_color <object.cao> (default: teabox.cao)] [--model_depth <object.cao> (default: teabox.cao)] [--init_file <object.init> (default: teabox.init)] [--extrinsics <depth to color transformation> (default: depth_M_color.txt)] [--disable_depth] [--display_ground_truth] [--click] [--first_frame_index <index> (default: 1)]

Default parameters allow to run the binary with the data provided in ViSP. Just run:

$ ./tutorial-mb-generic-tracker-rgbd-blender

The next video shows the results that you should obtain:

Next tutorial

You are now ready to see the next Tutorial: Template tracking.