Visual Servoing Platform  version 3.4.0
Tutorial: Face detection

Introduction

This tutorial shows how to detect one or more faces with ViSP. Face detection is performed with the OpenCV Haar cascade capabilities wrapped in the vpDetectorFace class. OpenCV 2.2.0 or a more recent version is required.

In the next sections you will find examples that show how to detect faces in a video, or in images acquired by a camera connected to your computer.

Note that all the material (source code and video) described in this tutorial is part of the ViSP source code and can be downloaded using the following command:

$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/detection/face

Face detection in a video

The following example, also available in tutorial-face-detector.cpp, detects faces in an MPEG-4 video (video.mp4) located next to the source code. The Haar cascade classifier file required by OpenCV is provided in the same folder as the source code.

#include <visp3/core/vpConfig.h>
#include <visp3/detection/vpDetectorFace.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpVideoReader.h>

#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char *argv[])
{
#if (VISP_HAVE_OPENCV_VERSION >= 0x020200) && defined(VISP_HAVE_OPENCV_OBJDETECT)
  try {
    std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
    std::string opt_video = "video.mp4";

    // Parse the command line; an option value is read only if it is present
    for (int i = 1; i < argc; i++) {
      if (std::string(argv[i]) == "--haar" && i + 1 < argc)
        opt_face_cascade_name = std::string(argv[++i]);
      else if (std::string(argv[i]) == "--video" && i + 1 < argc)
        opt_video = std::string(argv[++i]);
      else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
        std::cout << "Usage: " << argv[0]
                  << " [--haar <haarcascade xml filename>] [--video <input video file>]"
                  << " [--help] [-h]"
                  << std::endl;
        return 0;
      }
    }

    vpImage<unsigned char> I; // gray level image
    vpVideoReader g;
    g.setFileName(opt_video);
    g.open(I); // open the video and read the first frame to get the image size

    // Create a display with the first available GUI backend
#if defined(VISP_HAVE_X11)
    vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
    vpDisplayGDI d(I);
#elif defined(VISP_HAVE_OPENCV)
    vpDisplayOpenCV d(I);
#endif
    vpDisplay::setTitle(I, "ViSP viewer");

    vpDetectorFace face_detector;
    face_detector.setCascadeClassifierFile(opt_face_cascade_name);

    bool exit_requested = false;
    while (!g.end() && !exit_requested) {
      g.acquire(I);
      vpDisplay::display(I);
      bool face_found = face_detector.detect(I);
      if (face_found) {
        std::ostringstream text;
        text << "Found " << face_detector.getNbObjects() << " face(s)";
        vpDisplay::displayText(I, 10, 10, text.str(), vpColor::red);
        for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
          vpRect bbox = face_detector.getBBox(i);
          vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
                                 "Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
        }
      }
      vpDisplay::displayText(I, (int)I.getHeight() - 25, 10, "Click to quit...", vpColor::red);
      vpDisplay::flush(I);
      if (vpDisplay::getClick(I, false)) // a click to exit
        exit_requested = true;
    }
    if (!exit_requested) // end of the video: wait for a click before closing
      vpDisplay::getClick(I);
  } catch (const vpException &e) {
    std::cout << e.getMessage() << std::endl;
  }
#else
  (void)argc;
  (void)argv;
#endif
}

To detect the faces, just run:

$ ./tutorial-face-detector

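The video is displayed in a window named "ViSP viewer". When faces are found, the number of detected faces is written in the top-left corner and a message is displayed above each face.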

Now we explain the main lines of the source.

First, we include the header of the vpDetectorFace class, which performs face detection.

#include <visp3/detection/vpDetectorFace.h>

Then, in main(), before going further, we check that OpenCV 2.2.0 or a more recent version is available, together with its objdetect module that provides the cascade classifier.

#if (VISP_HAVE_OPENCV_VERSION >= 0x020200) && defined(VISP_HAVE_OPENCV_OBJDETECT)
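The guard compares VISP_HAVE_OPENCV_VERSION with hexadecimal constants. The constants used in this tutorial, 0x020200 for OpenCV 2.2.0 and 0x030000 for 3.0.0, are consistent with one byte per version component (major, minor, patch). The following standard C++ sketch assumes that layout; encode_version() and is_at_least() are hypothetical helpers, not ViSP functions.

```cpp
#include <cassert>

// Hypothetical helper (not part of ViSP): packs major.minor.patch into one
// integer, one byte per component, assuming the layout suggested by the
// constants 0x020200 (2.2.0) and 0x030000 (3.0.0). Components must be < 256.
constexpr int encode_version(int major, int minor, int patch)
{
  return (major << 16) | (minor << 8) | patch;
}

// Hypothetical helper: true when version 'have' is at least 'required'.
constexpr bool is_at_least(int have, int required) { return have >= required; }

static_assert(encode_version(2, 2, 0) == 0x020200, "2.2.0 packs to 0x020200");
```

With this layout a plain integer comparison orders versions correctly, which is why a single ">=" is enough in the preprocessor guard.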

We then set the default input data:

  • the name of the Haar cascade classifier file "haarcascade_frontalface_alt.xml"
  • the name of the input video "video.mp4"
std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
std::string opt_video = "video.mp4";

Other inputs can be selected with command line options. To list them, run:

$ ./tutorial-face-detector --help
Usage: ./tutorial-face-detector [--haar <haarcascade xml filename>] [--video <input video file>] [--help] [-h]
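The parsing loop of the example reads the value that follows --haar or --video. A minimal standard C++ sketch of the same parsing, assuming you want an option given without a value to be ignored rather than read past the end of argv, could look like this; Options and parse_options() are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the options of tutorial-face-detector,
// initialized with the same defaults as the example.
struct Options {
  std::string haar = "./haarcascade_frontalface_alt.xml";
  std::string video = "video.mp4";
  bool help = false;
};

// Parses argv, skipping argv[0] (the program name). A value is read only when
// another argument follows the option, so a trailing "--video" is ignored.
Options parse_options(int argc, const char *argv[])
{
  Options opt;
  for (int i = 1; i < argc; i++) {
    const std::string arg(argv[i]);
    if (arg == "--haar" && i + 1 < argc)
      opt.haar = argv[++i];
    else if (arg == "--video" && i + 1 < argc)
      opt.video = argv[++i];
    else if (arg == "--help" || arg == "-h")
      opt.help = true;
  }
  return opt;
}
```

Incrementing i when a value is consumed also prevents the value itself from being tested as an option on the next iteration.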

Then we open the video stream and create a window named "ViSP viewer" in which the images and the face detection results are displayed.

The face detector is created with:

vpDetectorFace face_detector;

We also need to set the location and name of the XML file containing the Haar cascade classifier data used to recognize a face:

face_detector.setCascadeClassifierFile(opt_face_cascade_name);
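The tutorial passes the path directly to setCascadeClassifierFile(). If you want a clearer error message when the --haar path is wrong, a simple pre-check can be done with the standard library. cascade_file_readable() is a hypothetical helper; it only verifies that the file can be opened for reading, not that it contains a valid cascade.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical pre-check (not a ViSP function): returns true when the file
// at 'path' can be opened for reading. It does not parse the XML content.
bool cascade_file_readable(const std::string &path)
{
  std::ifstream file(path);
  return file.good();
}
```

For example, calling it on opt_face_cascade_name before setCascadeClassifierFile() lets the program print the offending path and exit early.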

Then we enter the while loop where, for each new image, we try to detect one or more faces:

bool face_found = face_detector.detect(I);

If at least one face is detected, vpDetectorFace::detect() returns true. The number of detected faces is then given by vpDetectorFace::getNbObjects():

text << "Found " << face_detector.getNbObjects() << " face(s)";

For each face, we have access to its location using vpDetectorFace::getPolygon(), its bounding box using vpDetectorFace::getBBox() and its identifier message using vpDetectorFace::getMessage().

for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
vpRect bbox = face_detector.getBBox(i);
vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
"Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
}
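The message is drawn 10 pixels above the top of the bounding box. For a face touching the top border of the image this row becomes negative, that is above the first image row. A small sketch of a clamped row computation, using a hypothetical label_row() helper, is:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: row at which to draw a label 'offset' pixels above a
// bounding box whose top edge is at row 'top', clamped to row 0 so that the
// label never starts above the image.
int label_row(double top, int offset = 10)
{
  return std::max(0, static_cast<int>(top) - offset);
}
```

In the loop it would replace the expression (int)bbox.getTop() - 10, for instance label_row(bbox.getTop()).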
Note
When more than one face is detected, faces are ordered from the largest to the smallest. This means that vpDetectorFace::getPolygon(0), vpDetectorFace::getBBox(0) and vpDetectorFace::getMessage(0) always return the characteristics of the largest face.
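To illustrate the ordering described in the note, here is a sketch that sorts bounding boxes by decreasing area. Box is a hypothetical stand-in for vpRect, and this is not claimed to be how vpDetectorFace orders its results internally.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a detected bounding box (ViSP uses vpRect).
struct Box {
  double left, top, width, height;
  double area() const { return width * height; }
};

// Orders boxes from largest to smallest area, so that index 0 is the
// largest face. stable_sort keeps the original order of equal-area boxes.
void sort_largest_first(std::vector<Box> &boxes)
{
  std::stable_sort(boxes.begin(), boxes.end(),
                   [](const Box &a, const Box &b) { return a.area() > b.area(); });
}
```

With this ordering, tracking "the main face" reduces to always reading element 0.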

Face detection from a camera

This second example, also available in tutorial-face-detector-live.cpp, shows how to detect one or more faces in images acquired by a camera connected to your computer.

#include <visp3/core/vpConfig.h>
#include <visp3/core/vpImageConvert.h>
#include <visp3/core/vpTime.h>
#include <visp3/detection/vpDetectorFace.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#ifdef VISP_HAVE_MODULE_SENSOR
#include <visp3/sensor/vpV4l2Grabber.h>
#endif

#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char *argv[])
{
#if (VISP_HAVE_OPENCV_VERSION >= 0x020200) && defined(VISP_HAVE_OPENCV_OBJDETECT)
  try {
    std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
    unsigned int opt_device = 0;
    unsigned int opt_scale = 2; // Default value is 2 in the constructor. Turn
                                // it to 1 to avoid subsampling

    // Parse the command line; an option value is read only if it is present
    for (int i = 1; i < argc; i++) {
      if (std::string(argv[i]) == "--haar" && i + 1 < argc)
        opt_face_cascade_name = std::string(argv[++i]);
      else if (std::string(argv[i]) == "--device" && i + 1 < argc)
        opt_device = (unsigned int)atoi(argv[++i]);
      else if (std::string(argv[i]) == "--scale" && i + 1 < argc)
        opt_scale = (unsigned int)atoi(argv[++i]);
      else if (std::string(argv[i]) == "--help") {
        std::cout << "Usage: " << argv[0]
                  << " [--haar <haarcascade xml filename>] [--device <camera "
                     "device>] [--scale <subsampling factor>] [--help]"
                  << std::endl;
        return 0;
      }
    }
    if (opt_scale == 0) // avoid a division by zero when resizing
      opt_scale = 1;

    vpImage<unsigned char> I; // for gray images

#if defined(VISP_HAVE_V4L2)
    vpV4l2Grabber g;
    std::ostringstream device;
    device << "/dev/video" << opt_device;
    g.setDevice(device.str());
    g.setScale(opt_scale); // Default value is 2 in the constructor. Turn it
                           // to 1 to avoid subsampling
    g.acquire(I);
#elif defined(VISP_HAVE_OPENCV)
    cv::VideoCapture cap(opt_device); // open camera number opt_device
    if (!cap.isOpened()) {            // check if we succeeded
      std::cout << "Failed to open the camera" << std::endl;
      return -1;
    }
#if (VISP_HAVE_OPENCV_VERSION >= 0x030000)
    int width = (int)cap.get(cv::CAP_PROP_FRAME_WIDTH);
    int height = (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT);
    cap.set(cv::CAP_PROP_FRAME_WIDTH, width / opt_scale);
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#else
    int width = (int)cap.get(CV_CAP_PROP_FRAME_WIDTH);
    int height = (int)cap.get(CV_CAP_PROP_FRAME_HEIGHT);
    cap.set(CV_CAP_PROP_FRAME_WIDTH, width / opt_scale);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#endif
    cv::Mat frame;
    cap >> frame;                      // get a new frame from camera
    vpImageConvert::convert(frame, I); // cv::Mat to vpImage conversion
#endif

    // Create a display with the first available GUI backend
#if defined(VISP_HAVE_X11)
    vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
    vpDisplayGDI d(I);
#elif defined(VISP_HAVE_OPENCV)
    vpDisplayOpenCV d(I);
#endif
    vpDisplay::setTitle(I, "ViSP viewer");

    vpDetectorFace face_detector;
    face_detector.setCascadeClassifierFile(opt_face_cascade_name);

    while (1) {
      double t = vpTime::measureTimeMs();
#if defined(VISP_HAVE_V4L2)
      g.acquire(I);
      bool face_found = face_detector.detect(I);
#else
      cap >> frame;                      // get a new frame from camera
      vpImageConvert::convert(frame, I); // converted image used for display
      bool face_found = face_detector.detect(frame); // We pass frame to avoid an internal image conversion
#endif
      vpDisplay::display(I);
      if (face_found) {
        std::ostringstream text;
        text << "Found " << face_detector.getNbObjects() << " face(s)";
        vpDisplay::displayText(I, 10, 10, text.str(), vpColor::red);
        for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
          vpRect bbox = face_detector.getBBox(i);
          vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
                                 "Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
        }
      }
      vpDisplay::displayText(I, (int)I.getHeight() - 25, 10, "Click to quit...", vpColor::red);
      vpDisplay::flush(I);
      if (vpDisplay::getClick(I, false)) // a click to exit
        break;
      std::cout << "Loop time: " << vpTime::measureTimeMs() - t << " ms" << std::endl;
    }
  } catch (const vpException &e) {
    std::cout << e.getMessage() << std::endl;
  }
#else
  (void)argc;
  (void)argv;
#endif
}

Usage is similar to the previous example. Just run:

$ ./tutorial-face-detector-live

Additional command line options are available to specify the location of the Haar cascade file, the camera identifier if more than one camera is connected to your computer, and the image subsampling factor:

$ ./tutorial-face-detector-live --help
Usage: ./tutorial-face-detector-live [--haar <haarcascade xml filename>] [--device <camera device>] [--scale <subsampling factor>] [--help]
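In the OpenCV branch, the subsampling factor (opt_scale) divides the native frame size before the new size is requested with cap.set(). The arithmetic is sketched below with a hypothetical requested_size() helper; treating a scale of 0 as 1 is an assumption added to avoid a division by zero. Note that cap.set() is only a request: the capture backend may ignore it or select a nearby supported mode.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper mirroring the OpenCV branch of the live example:
// given the native frame size and the subsampling factor, returns the
// (width, height) to request. A scale of 0 is treated as 1.
std::pair<int, int> requested_size(int width, int height, unsigned int scale)
{
  const int s = (scale == 0) ? 1 : static_cast<int>(scale);
  return std::make_pair(width / s, height / s);
}
```

For a 640x480 camera, the default scale of 2 requests 320x240, which reduces the number of pixels the cascade classifier has to scan by a factor of four.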

The source code of this example is very similar to the previous one, except that images are acquired with a camera frame grabber (see Tutorial: Image frame grabbing). Two different grabbers may be used:

  • If ViSP was built with Video For Linux (V4L2) support, available for example on Fedora or Ubuntu distributions, the VISP_HAVE_V4L2 macro is defined. In that case, images coming from a USB camera are acquired using the vpV4l2Grabber class.
  • If ViSP was not built with V4L2 support but with OpenCV, we use the cv::VideoCapture class to grab the images. Notice that when images are acquired with OpenCV there is an additional conversion from cv::Mat to vpImage.
#if defined(VISP_HAVE_V4L2)
vpV4l2Grabber g;
std::ostringstream device;
device << "/dev/video" << opt_device;
g.setDevice(device.str());
g.setScale(opt_scale); // Default value is 2 in the constructor. Turn it
                       // to 1 to avoid subsampling
g.acquire(I);
#elif defined(VISP_HAVE_OPENCV)
cv::VideoCapture cap(opt_device); // open camera number opt_device
if (!cap.isOpened()) {            // check if we succeeded
  std::cout << "Failed to open the camera" << std::endl;
  return -1;
}
#if (VISP_HAVE_OPENCV_VERSION >= 0x030000)
int width = (int)cap.get(cv::CAP_PROP_FRAME_WIDTH);
int height = (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT);
cap.set(cv::CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#else
int width = (int)cap.get(CV_CAP_PROP_FRAME_WIDTH);
int height = (int)cap.get(CV_CAP_PROP_FRAME_HEIGHT);
cap.set(CV_CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#endif
cv::Mat frame;
cap >> frame;                      // get a new frame from camera
vpImageConvert::convert(frame, I); // cv::Mat to vpImage conversion
#endif
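With V4L2, the camera index given with --device is turned into a device node name. The same string construction, factored into a hypothetical v4l2_device_path() helper, is:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds the V4L2 device node for a camera index the
// same way the V4L2 branch does, e.g. 0 -> "/dev/video0".
std::string v4l2_device_path(unsigned int index)
{
  std::ostringstream device;
  device << "/dev/video" << index;
  return device.str();
}
```

Running ./tutorial-face-detector-live --device 1 therefore makes the V4L2 grabber open /dev/video1.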

Then, at each iteration of the while loop, we acquire a new image:

#if defined(VISP_HAVE_V4L2)
g.acquire(I);
bool face_found = face_detector.detect(I);
#else
cap >> frame;                      // get a new frame from camera
vpImageConvert::convert(frame, I); // converted image used for display
bool face_found = face_detector.detect(frame); // We pass frame to avoid an internal image conversion
#endif

This new image is then given as input to the face detector.
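The live example also prints the duration of each loop iteration with vpTime::measureTimeMs(). Below is a standard-library equivalent, assuming a monotonic clock suits this purpose, together with a conversion of the loop time into an approximate frame rate; now_ms() and loop_fps() are hypothetical helpers, not ViSP functions.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: current time in milliseconds from a monotonic clock,
// usable like vpTime::measureTimeMs() to time one loop iteration.
double now_ms()
{
  using namespace std::chrono;
  return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: converts a loop time in milliseconds into frames per
// second; returns 0 for a non-positive duration.
double loop_fps(double loop_ms)
{
  return loop_ms > 0.0 ? 1000.0 / loop_ms : 0.0;
}
```

A loop time of 40 ms, for example, corresponds to 25 images processed per second.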

Next tutorial

You are now ready to see the Tutorial: How to use multi-threading capabilities, which illustrates face detection performed in a separate thread.