Introduction
This tutorial shows how to run object detection inference using the NVIDIA TensorRT inference SDK.
For this tutorial, you'll need the ssd_mobilenet.onnx pre-trained model and the pascal-voc-labels.txt file containing the corresponding labels. These files can be found in the visp-images dataset.
Note that all the material (source code and network model) described in this tutorial is part of the ViSP source code (in the tutorial/detection/dnn folder) and can be found at https://github.com/lagadic/visp/tree/master/tutorial/detection/dnn.
Before running this tutorial, you need to install:
- CUDA (version 10.2 or higher)
- cuDNN (version compatible with your CUDA version)
- TensorRT (version 7.1 or higher)
- OpenCV built from source (version 4.5.2 or higher)
Installation instructions are provided in the Prerequisites section.
The tutorial was tested on several NVIDIA hardware platforms. The following table lists the CUDA, TensorRT and cuDNN versions used with each GPU:
NVIDIA hardware | OS | CUDA | TensorRT | CuDNN |
Jetson TX2 | Ubuntu 18.04 (JetPack 4.4) | 10.2 | 7.1.3 | 8.0 |
GeForce GTX 1080 | Ubuntu 16.04 | 11.0 | 8.0 GA | 8.0 |
Quadro RTX 6000 | Ubuntu 18.04 | 11.3 | 8.0 GA Update 1 | 8.2 |
- Note
- With TensorRT 8.2 EA and CUDA 11.3 on an NVIDIA Quadro RTX 6000, the tutorial did not work as expected: far too many bounding boxes were detected in any given image.
Prerequisites
Install CUDA
CUDA is a parallel computing platform and programming model invented by NVIDIA.
- To check whether the NVIDIA CUDA driver is already installed on your machine, on Ubuntu you can use nvidia-smi:
$ nvidia-smi | grep CUDA
| NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3 |
Here the output shows that NVIDIA driver 465.27 is installed and that it supports CUDA 11.3.
- To know if CUDA toolkit is installed, run:
$ cat /usr/local/cuda/version.{txt,json}
"cuda" : {
"name" : "CUDA SDK",
"version" : "11.3.20210326"
},
Here it shows that CUDA toolkit 11.3 is installed.
- Note
- We recommend that the NVIDIA CUDA driver and the CUDA toolkit have the same version.
- To install the NVIDIA CUDA driver and toolkit on your machine, please follow this step-by-step guide.
Install cuDNN
Installation instructions are provided here.
For example, after downloading "cuDNN Runtime Library for Ubuntu18.04 x86_64 (Deb)", you can install it by running:
$ sudo dpkg -i libcudnn8_8.2.0.53-1+cuda11.3_amd64.deb
Install TensorRT
TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs. To download and install TensorRT, please follow this step-by-step guide.
Let us consider the installation of TensorRT 8.0 GA Update 1 for the x86_64 architecture. In that case you need to download the "TensorRT 8.0 GA Update 1 for Linux x86_64 and CUDA 11.0, CUDA 11.1, CUDA 11.2, 11.3" TAR package and extract its content in $VISP_WS.
$ ls $VISP_WS
TensorRT-8.0.3.4 ...
Following the installation instructions:
- Add the absolute path to the TensorRT lib directory to the environment variable LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$VISP_WS/TensorRT-8.0.3.4/lib
- Install the Python TensorRT wheel file.
$ sudo apt-get install python3-pip
$ cd $VISP_WS/TensorRT-8.0.3.4/python
$ python3 -m pip install tensorrt-8.0.3.4-cp36-none-linux_x86_64.whl
- Install the Python UFF wheel file. This is only required if you plan to use TensorRT with TensorFlow.
$ cd $VISP_WS/TensorRT-8.0.3.4/uff
$ python3 -m pip install uff-0.6.9-py2.py3-none-any.whl
- Install the Python graphsurgeon wheel file.
$ cd $VISP_WS/TensorRT-8.0.3.4/graphsurgeon
$ python3 -m pip install graphsurgeon-0.4.5-py2.py3-none-any.whl
- Install the Python onnx-graphsurgeon wheel file.
$ cd $VISP_WS/TensorRT-8.0.3.4/onnx_graphsurgeon
$ python3 -m pip install onnx_graphsurgeon-0.3.10-py2.py3-none-any.whl
Install OpenCV from source
To be able to run the tutorial, you should install OpenCV from source, since some extra modules are required (cudev, cudaarithm and cudawarping are not included in the libopencv-contrib-dev package). To do so, proceed as follows:
Build ViSP with TensorRT support
The next step is to build ViSP from source with TensorRT support enabled. As described in Get ViSP source code, we suppose here that you have the ViSP source code in the ViSP workspace folder $VISP_WS. If you followed Prerequisites, you should also find TensorRT and OpenCV in the same workspace.
$ ls $VISP_WS
visp opencv TensorRT-8.0.3.4
Now, to ensure that ViSP is built with TensorRT support, create and enter the build folder, then configure ViSP with the TensorRT and OpenCV paths:
-DTENSORRT_DIR=$VISP_WS/TensorRT-8.0.3.4 \
-DOpenCV_DIR=$VISP_WS/opencv/install/lib/cmake/opencv4
Tutorial description
The following sections give a detailed description of the tutorial. The complete source code is available in the tutorial-dnn-tensorrt-live.cpp file.
Include header files
Include header files for required extra modules to handle CUDA.
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudawarping.hpp>
#include <opencv2/dnn.hpp>
Include the cuda_runtime_api.h header file, which defines the public host functions and types for the CUDA runtime API.
#include <cuda_runtime_api.h>
Include the TensorRT header files. NvInfer.h is the top-level API file for TensorRT. NvOnnxParser.h is the API for the ONNX parser.
#include <NvInfer.h>
#include <NvOnnxParser.h>
Pre-processing
Prepare the input image for inference with OpenCV. First, upload the image to the GPU, resize it to match the model's input dimensions, and normalize it, with meanR, meanG and meanB being the values used for mean subtraction. Then transform the data into a tensor by copying it channel by channel to gpu_input. In the case of ssd_mobilenet.onnx, the input dimension is 1x3x300x300.
void preprocessImage(cv::Mat &img, float *gpu_input, const nvinfer1::Dims &dims, float meanR, float meanG, float meanB)
{
  if (img.empty()) {
    std::cerr << "Image is empty." << std::endl;
    return;
  }
  // Upload the image to the GPU.
  cv::cuda::GpuMat gpu_frame;
  gpu_frame.upload(img);
  // The input dimensions are laid out as batch x channels x height x width.
  auto input_width = dims.d[3];
  auto input_height = dims.d[2];
  auto channels = dims.d[1];
  auto input_size = cv::Size(input_width, input_height);
  // Resize to the model's input size.
  cv::cuda::GpuMat resized;
  cv::cuda::resize(gpu_frame, resized, input_size, 0, 0, cv::INTER_NEAREST);
  // Normalize: convert to float, subtract the mean, then divide by 127.5.
  cv::cuda::GpuMat flt_image;
  resized.convertTo(flt_image, CV_32FC3);
  cv::cuda::subtract(flt_image, cv::Scalar(meanR, meanG, meanB), flt_image, cv::noArray(), -1);
  cv::cuda::divide(flt_image, cv::Scalar(127.5f, 127.5f, 127.5f), flt_image, 1, -1);
  // Convert to a tensor: each channel is written to its own plane of gpu_input.
  std::vector<cv::cuda::GpuMat> chw;
  for (int i = 0; i < channels; ++i)
    chw.emplace_back(cv::cuda::GpuMat(input_size, CV_32FC1, gpu_input + i * input_width * input_height));
  cv::cuda::split(flt_image, chw);
}
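To make the memory layout produced by this step easier to picture, here is a minimal CPU-only sketch of the same normalization and interleaved-to-planar reordering. It is an illustration only and not part of the tutorial source: the function name hwcToChwNormalized and the scale parameter are hypothetical, the image is assumed to be already resized, and the means are applied in the order in which the channels are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-only illustration of the normalization and HWC -> CHW reordering that
// preprocessImage() performs on the GPU. The function name and the scale
// parameter are hypothetical; the tutorial hard-codes a divisor of 127.5.
std::vector<float> hwcToChwNormalized(const std::vector<std::uint8_t> &hwc, int width, int height,
                                      float mean0, float mean1, float mean2, float scale = 127.5f)
{
  const int channels = 3;
  const std::size_t plane = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
  assert(hwc.size() == plane * channels);

  const float mean[3] = { mean0, mean1, mean2 };
  std::vector<float> chw(plane * channels);
  for (std::size_t p = 0; p < plane; ++p) {
    for (int c = 0; c < channels; ++c) {
      // Interleaved source index: pixel p, channel c.
      const float v = static_cast<float>(hwc[p * channels + c]);
      // Planar destination index: channel c, pixel p.
      chw[c * plane + p] = (v - mean[c]) / scale;
    }
  }
  return chw;
}
```

With a mean of 127.5 and a divisor of 127.5, pixel values in [0, 255] are mapped to [-1, 1].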
Post-processing
After running the inference, the dimensions of the results depend on the model used, and these results need to be post-processed. In the case of ssd_mobilenet.onnx, there are 2 outputs:
- scores of dimension 1x3000x21
- boxes of dimension 1x3000x4
In other words, the model outputs 3000 candidate bounding boxes with 21 scores each (1 score for each class). Since the result of the inference is on the GPU, we first copy it to the CPU. Post-processing then consists of discarding the predictions whose class confidence is too low and merging multiple detections that occur at approximately the same location. confThresh is the confidence threshold used to filter the detections after inference. nmsThresh is the non-maximum suppression threshold. It is used to merge multiple detections located at approximately the same place.
std::vector<cv::Rect> postprocessResults(std::vector<void *> buffers, const std::vector<nvinfer1::Dims> &output_dims,
                                         int batch_size, int image_width, int image_height, float confThresh,
                                         float nmsThresh, std::vector<int> &classIds)
{
  std::vector<cv::Rect> m_boxes, m_boxesNMS;
  std::vector<int> m_classIds;
  std::vector<float> m_confidences;
  std::vector<int> m_indices;
  // Copy the inference results from the GPU to the CPU.
  std::vector<std::vector<float> > cpu_outputs;
  for (size_t i = 0; i < output_dims.size(); i++) {
    cpu_outputs.push_back(std::vector<float>(getSizeByDim(output_dims[i]) * batch_size));
    cudaMemcpy(cpu_outputs[i].data(), (float *)buffers[1 + i], cpu_outputs[i].size() * sizeof(float),
               cudaMemcpyDeviceToHost);
  }
  // N candidate boxes with C class scores each.
  int N = output_dims[0].d[1], C = output_dims[0].d[2];
  for (int i = 0; i < N; i++)
  {
    uint32_t maxClass = 0;
    float maxScore = -1000.0f;
    // Class 0 is skipped when searching for the best score.
    for (int j = 1; j < C; j++)
    {
      const float score = cpu_outputs[0][i * C + j];
      if (score < confThresh)
        continue;
      if (score > maxScore) {
        maxScore = score;
        maxClass = j;
      }
    }
    if (maxScore > confThresh) {
      // Box coordinates are normalized; scale them to the image size.
      int left = (int)(cpu_outputs[1][4 * i] * image_width);
      int top = (int)(cpu_outputs[1][4 * i + 1] * image_height);
      int right = (int)(cpu_outputs[1][4 * i + 2] * image_width);
      int bottom = (int)(cpu_outputs[1][4 * i + 3] * image_height);
      int width = right - left + 1;
      int height = bottom - top + 1;
      m_boxes.push_back(cv::Rect(left, top, width, height));
      m_classIds.push_back(maxClass);
      m_confidences.push_back(maxScore);
    }
  }
  cv::dnn::NMSBoxes(m_boxes, m_confidences, confThresh, nmsThresh, m_indices);
  m_boxesNMS.resize(m_indices.size());
  classIds.resize(m_indices.size());
  for (size_t i = 0; i < m_indices.size(); ++i) {
    int idx = m_indices[i];
    m_boxesNMS[i] = m_boxes[idx];
    // Keep the class ids aligned with the boxes retained by NMS.
    classIds[i] = m_classIds[idx];
  }
  return m_boxesNMS;
}
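The merging step is delegated to cv::dnn::NMSBoxes() in the code above. To illustrate the idea behind non-maximum suppression, here is a minimal sketch of a generic greedy variant that uses only the standard library. It is not the OpenCV implementation and may differ from it in details; the Box type and the function names iou and greedyNms are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical axis-aligned box type used only for this sketch.
struct Box {
  float left, top, right, bottom;
};

// Intersection over union of two boxes; 0 when they do not overlap.
float iou(const Box &a, const Box &b)
{
  const float iw = std::min(a.right, b.right) - std::max(a.left, b.left);
  const float ih = std::min(a.bottom, b.bottom) - std::max(a.top, b.top);
  if (iw <= 0.f || ih <= 0.f)
    return 0.f;
  const float inter = iw * ih;
  const float areaA = (a.right - a.left) * (a.bottom - a.top);
  const float areaB = (b.right - b.left) * (b.bottom - b.top);
  return inter / (areaA + areaB - inter);
}

// Greedy NMS: visit boxes by decreasing score and keep a box only if its
// overlap with every already-kept box is at most nmsThresh.
std::vector<std::size_t> greedyNms(const std::vector<Box> &boxes, const std::vector<float> &scores,
                                   float confThresh, float nmsThresh)
{
  assert(boxes.size() == scores.size());
  std::vector<std::size_t> order(boxes.size());
  std::iota(order.begin(), order.end(), std::size_t { 0 });
  std::sort(order.begin(), order.end(), [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

  std::vector<std::size_t> kept;
  for (std::size_t idx : order) {
    // Discard low-confidence candidates before looking at overlaps.
    if (scores[idx] <= confThresh)
      continue;
    bool suppressed = false;
    for (std::size_t k : kept) {
      if (iou(boxes[idx], boxes[k]) > nmsThresh) {
        suppressed = true;
        break;
      }
    }
    if (!suppressed)
      kept.push_back(idx);
  }
  return kept;
}
```

The returned indices refer to the input vectors, so they can be used to pick the matching box, confidence and class id of each retained detection.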
Parse ONNX Model
The parseOnnxModel() function parses the ONNX model and creates the engine and the execution context:
bool parseOnnxModel(const std::string &model_path, TRTUniquePtr<nvinfer1::ICudaEngine> &engine,
TRTUniquePtr<nvinfer1::IExecutionContext> &context)
- model_path is the path to the ONNX file.
- engine is the built network on which inference is executed.
- context is the execution context used to run inference.
To parse the ONNX model, we first initialize the TensorRT context and engine. To do this, we create an instance of Builder. With the Builder, we can create a Network, which in turn is used to create the Parser.
If the GPU inference engine has already been built once, it has been serialized and saved in a cache file (with .engine extension). In this case, the engine file is loaded, the inference runtime is created, and the engine and context are restored from it.
char *engineStream = nullptr;
size_t engineSize = 0;
struct stat filestat;
stat(cache_path, &filestat);
engineSize = filestat.st_size;
engineStream = (char *)malloc(engineSize);
FILE *cacheFile = nullptr;
cacheFile = fopen(cache_path, "rb");
const size_t bytesRead = fread(engineStream, 1, engineSize, cacheFile);
if (bytesRead != engineSize)
{
  std::cerr << "Error reading serialized engine into memory." << std::endl;
  return false;
}
fclose(cacheFile);
TRTUniquePtr<nvinfer1::IRuntime> infer { nvinfer1::createInferRuntime(gLogger) };
engine.reset(infer->deserializeCudaEngine(engineStream, engineSize, nullptr));
context.reset(engine->createExecutionContext());
return true;
}
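The cache handling above relies on malloc(), stat() and C file streams. As an alternative illustration, the following sketch reads and writes a binary cache file with std::vector and C++ streams, so that the buffer is released automatically on every return path. It is a minimal sketch, not the tutorial's implementation; the function names readBinaryFile and writeBinaryFile are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads a whole binary file into data. Returns false if the file cannot be
// opened or read completely. The function name is hypothetical.
bool readBinaryFile(const std::string &path, std::vector<char> &data)
{
  assert(!path.empty());
  // Open at the end of the file so that tellg() gives its size.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file)
    return false;
  const std::streamsize size = file.tellg();
  if (size < 0)
    return false;
  file.seekg(0, std::ios::beg);
  data.resize(static_cast<std::size_t>(size));
  if (size > 0 && !file.read(data.data(), size))
    return false;
  return true;
}

// Writes data to a binary file, replacing any previous content.
bool writeBinaryFile(const std::string &path, const std::vector<char> &data)
{
  assert(!path.empty());
  std::ofstream file(path, std::ios::binary | std::ios::trunc);
  if (!file)
    return false;
  file.write(data.data(), static_cast<std::streamsize>(data.size()));
  return static_cast<bool>(file);
}
```

The bytes read this way could then be passed to deserializeCudaEngine() as data.data() and data.size(), in place of engineStream and engineSize.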
Otherwise, we parse the ONNX model (the first time only) and create an instance of the builder. The builder can be configured to select the amount of GPU memory to be used for tactic selection or to enable FP16/INT8 modes. We then create the engine and context to be used in the main pipeline, and serialize and save the engine for later use.
else {
  std::cerr << "Could not parse ONNX model. File not found" << std::endl;
  return false;
}
TRTUniquePtr<nvinfer1::IBuilder> builder { nvinfer1::createInferBuilder(gLogger) };
TRTUniquePtr<nvinfer1::INetworkDefinition> network {
  builder->createNetworkV2(1U << (uint32_t)nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH) };
TRTUniquePtr<nvonnxparser::IParser> parser { nvonnxparser::createParser(*network, gLogger) };
if (!parser->parseFromFile(model_path.c_str(), static_cast<int>(nvinfer1::ILogger::Severity::kINFO))) {
  std::cerr << "ERROR: could not parse the model." << std::endl;
  return false;
}
TRTUniquePtr<nvinfer1::IBuilderConfig> config { builder->createBuilderConfig() };
config->setMaxWorkspaceSize(32 << 20);
if (builder->platformHasFastFp16()) {
  config->setFlag(nvinfer1::BuilderFlag::kFP16);
}
builder->setMaxBatchSize(1);
engine.reset(builder->buildEngineWithConfig(*network, *config));
context.reset(engine->createExecutionContext());
TRTUniquePtr<nvinfer1::IHostMemory> serMem { engine->serialize() };
if (!serMem) {
  std::cout << "Failed to serialize CUDA engine." << std::endl;
  return false;
}
const char *serData = (char *)serMem->data();
const size_t serSize = serMem->size();
char *engineMemory = (char *)malloc(serSize);
if (!engineMemory) {
  std::cout << "Failed to allocate memory to store CUDA engine." << std::endl;
  return false;
}
memcpy(engineMemory, serData, serSize);
FILE *cacheFile = nullptr;
cacheFile = fopen(cache_path, "wb");
fwrite(engineMemory, 1, serSize, cacheFile);
fclose(cacheFile);
return true;
}
Main pipeline
Start by parsing the model and creating engine and context.
TRTUniquePtr<nvinfer1::ICudaEngine> engine { nullptr };
TRTUniquePtr<nvinfer1::IExecutionContext> context { nullptr };
if (!parseOnnxModel(model_path, engine, context))
{
  std::cout << "Make sure the model file exists. To see available models, please visit: "
               "\n\twww.github.com/lagadic/visp-images/dnn/object_detection/"
            << std::endl;
  return EXIT_FAILURE;
}
Using the engine, we can get the dimensions of the input and outputs, and allocate the corresponding buffers.
for (int i = 0; i < engine->getNbBindings(); ++i) {
  auto binding_size = getSizeByDim(engine->getBindingDimensions(i)) * batch_size * sizeof(float);
  cudaMalloc(&buffers[i], binding_size);
  if (engine->bindingIsInput(i)) {
    input_dims.emplace_back(engine->getBindingDimensions(i));
  }
  else {
    output_dims.emplace_back(engine->getBindingDimensions(i));
  }
}
if (input_dims.empty() || output_dims.empty()) {
  std::cerr << "Expect at least one input and one output for network" << std::endl;
  return EXIT_FAILURE;
}
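The getSizeByDim() helper used to size the buffers is not shown in this tutorial. Presumably it returns the number of elements described by the dimensions, that is, the product of all extents. The sketch below illustrates that assumption with a stand-in structure, so that it compiles without TensorRT; DimsSketch and elementCount are hypothetical names, and DimsSketch only mirrors the nbDims and d members of nvinfer1::Dims.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for nvinfer1::Dims so that the sketch compiles without TensorRT.
// Only the two members used here (nbDims and d) are mirrored; the capacity
// of 8 is an assumption of this sketch.
struct DimsSketch {
  int nbDims;
  int d[8];
};

// Number of elements described by dims: the product of all its extents.
std::size_t elementCount(const DimsSketch &dims)
{
  assert(dims.nbDims >= 0 && dims.nbDims <= 8);
  std::size_t size = 1;
  for (int i = 0; i < dims.nbDims; ++i) {
    size *= static_cast<std::size_t>(dims.d[i]);
  }
  return size;
}
```

For the 1x3x300x300 input of ssd_mobilenet.onnx this gives 270000 elements, so with a batch size of 1 the input buffer holds 270000 * sizeof(float) bytes.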
Create a grabber to retrieve images from a webcam (or external camera), or to read them from an image or video file.
cv::VideoCapture capture;
if (input.empty()) {
  capture.open(opt_device);
}
else {
  capture.open(input);
}
if (!capture.isOpened()) {
  std::cout << "Failed to open the camera" << std::endl;
  return EXIT_FAILURE;
}
int cap_width = (int)capture.get(cv::CAP_PROP_FRAME_WIDTH);
int cap_height = (int)capture.get(cv::CAP_PROP_FRAME_HEIGHT);
capture.set(cv::CAP_PROP_FRAME_WIDTH, cap_width / opt_scale);
capture.set(cv::CAP_PROP_FRAME_HEIGHT, cap_height / opt_scale);
- Capture a new frame from the grabber,
- Convert this frame to vpImage used for display,
- Call preprocessImage() function to copy the frame to the GPU and store it in the input buffer,
- Perform inference with context->enqueue(),
- Call postprocessResults() function to filter the outputs,
- Display the image with the bounding boxes.
capture >> frame;
preprocessImage(frame, (float *)buffers[0], input_dims[0], meanR, meanG, meanB);
context->enqueue(batch_size, buffers.data(), 0, nullptr);
boxesNMS = postprocessResults(buffers, output_dims, batch_size, width, height, confThresh, nmsThresh, classIds);
for (unsigned int i = 0; i < boxesNMS.size(); i++) {
}
}
Usage
To use this tutorial, you need a USB webcam and an ONNX model file together with its corresponding labels in a txt file. To start, you may download the ssd_mobilenet.onnx model and the pascal-voc-labels.txt file from here, or install the ViSP data set by cloning its GitHub repository.
To see the options, run:
$ ./tutorial-dnn-tensorrt-live --help
Assuming you have downloaded the files (model and labels), to run object detection on images from the webcam, run:
$ ./tutorial-dnn-tensorrt-live --model ssd_mobilenet.onnx --labels pascal-voc-labels.txt
Running the above example on an image will show results like the following:
An example of the object detection can be viewed in this video.
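The class ids returned by the post-processing step are turned into names using the file passed with --labels. As an illustration, the sketch below loads such a file under the assumption that it contains one class name per line, with the line index used as the class id. The function name readLabels is hypothetical, and the actual layout of pascal-voc-labels.txt should be checked before relying on it.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Reads one label per line; the position in the returned vector is taken as
// the class id. Empty lines are skipped, which assumes that the file has no
// empty line between two labels. Returns an empty vector if the file cannot
// be opened.
std::vector<std::string> readLabels(const std::string &path)
{
  assert(!path.empty());
  std::vector<std::string> labels;
  std::ifstream file(path);
  std::string line;
  while (std::getline(file, line)) {
    // Drop a trailing carriage return left by Windows line endings.
    if (!line.empty() && line.back() == '\r')
      line.pop_back();
    if (!line.empty())
      labels.push_back(line);
  }
  return labels;
}
```

A detection with class id k would then be displayed with labels[k], after checking that k is smaller than labels.size().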