Visual Servoing Platform  version 3.6.1 under development (2024-04-19)
Tutorial: Face detection

Introduction

This tutorial shows how to detect one or more faces with ViSP. Face detection relies on OpenCV's Haar cascade classifier, which is wrapped by the vpDetectorFace class. OpenCV 2.2.0 or a more recent version is required.

In the next sections you will find examples that show how to detect faces in a video, or in images acquired by a camera connected to your computer.

Note that all the material (source code and images) described in this tutorial is part of the ViSP source code (in the tutorial/detection/face folder) and can be found at https://github.com/lagadic/visp/tree/master/tutorial/detection/face.

Face detection in a video

The following example, also available in tutorial-face-detector.cpp, detects faces in an MPEG video located next to the source code. The Haar cascade classifier file required by OpenCV is provided in the same folder as the source code.

#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/detection/vpDetectorFace.h>
#include <visp3/io/vpVideoReader.h>
int main(int argc, const char *argv[])
{
#if defined(VISP_HAVE_OPENCV) && defined(HAVE_OPENCV_HIGHGUI) && defined(HAVE_OPENCV_IMGPROC) && defined(HAVE_OPENCV_OBJDETECT)
try {
std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
std::string opt_video = "video.mp4";
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--haar")
opt_face_cascade_name = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--video")
opt_video = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
std::cout << "Usage: " << argv[0] << " [--haar <haarcascade xml filename>] [--video <input video file>]"
<< " [--help] [-h]" << std::endl;
return EXIT_SUCCESS;
}
}
vpImage<unsigned char> I; // for gray images
vpVideoReader g;
g.setFileName(opt_video);
g.open(I);
#if defined(VISP_HAVE_X11)
vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI d(I);
#elif defined(HAVE_OPENCV_HIGHGUI)
vpDisplayOpenCV d(I);
#endif
vpDisplay::setTitle(I, "ViSP viewer");
vpDetectorFace face_detector;
face_detector.setCascadeClassifierFile(opt_face_cascade_name);
bool exit_requested = false;
while (!g.end() && !exit_requested) {
g.acquire(I);
vpDisplay::display(I);
bool face_found = face_detector.detect(I);
if (face_found) {
std::ostringstream text;
text << "Found " << face_detector.getNbObjects() << " face(s)";
vpDisplay::displayText(I, 10, 10, text.str(), vpColor::red);
for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
vpRect bbox = face_detector.getBBox(i);
vpDisplay::displayRectangle(I, bbox, vpColor::green, false, 4);
vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
"Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
}
}
vpDisplay::displayText(I, (int)I.getHeight() - 25, 10, "Click to quit...", vpColor::red);
vpDisplay::flush(I);
if (vpDisplay::getClick(I, false)) // a click to exit
exit_requested = true;
}
if (!exit_requested)
vpDisplay::getClick(I); // wait for a click before closing the window
} catch (const vpException &e) {
std::cout << e.getMessage() << std::endl;
}
#else
(void)argc;
(void)argv;
#endif
}

To detect the faces just run:

$ ./tutorial-face-detector

You will get the following result:

Now we explain the main lines of the source.

First we include the header of the class that detects faces.

#include <visp3/detection/vpDetectorFace.h>

Then, in the main() function, before going further we check that OpenCV (2.2.0 or later) and the required modules are available.

#if defined(VISP_HAVE_OPENCV) && defined(HAVE_OPENCV_HIGHGUI) && defined(HAVE_OPENCV_IMGPROC) && defined(HAVE_OPENCV_OBJDETECT)

We then set the default input data:

  • the name of the Haar cascade classifier file, "haarcascade_frontalface_alt.xml"
  • the name of the input video, "video.mp4"
std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
std::string opt_video = "video.mp4";

With command line options it is possible to use other inputs. To see how, run:

$ ./tutorial-face-detector --help
Usage: ./tutorial-face-detector [--haar <haarcascade xml filename>] [--video <input video file>] [--help] [-h]
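
Note that the option loop in the listing reads argv[i + 1] without checking that a value follows the option. The sketch below shows one way to add that check. It is only an illustration: the Options struct and the parseOptions() function are hypothetical names that are not part of ViSP or of the tutorial source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the options of the video example.
struct Options {
  std::string cascade = "./haarcascade_frontalface_alt.xml";
  std::string video = "video.mp4";
  bool help = false;
  bool valid = true; // false when an option is missing its value
};

// Parse "--haar <file>", "--video <file>", "--help" and "-h".
// args[0] is the program name, as in argv. Unknown arguments are ignored.
Options parseOptions(const std::vector<std::string> &args)
{
  Options opt;
  for (std::size_t i = 1; i < args.size(); ++i) {
    const std::string &a = args[i];
    if (a == "--help" || a == "-h") {
      opt.help = true;
    }
    else if (a == "--haar" || a == "--video") {
      if (i + 1 >= args.size()) { // no value after the option
        opt.valid = false;
        break;
      }
      ++i; // consume the value
      if (a == "--haar") {
        opt.cascade = args[i];
      }
      else {
        opt.video = args[i];
      }
    }
  }
  return opt;
}
```

In main(), the arguments could be passed as std::vector<std::string>(argv, argv + argc), and the usage message printed when help is set or valid is false.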

Then we open the video stream and create a window named "ViSP viewer" in which the images and the resulting face detections are displayed.

The face detector is created with:

vpDetectorFace face_detector;

We also need to set the location and name of the XML file that contains the Haar cascade classifier data used to recognize a face.

face_detector.setCascadeClassifierFile(opt_face_cascade_name);

Then we enter the while loop where, for each new image, we try to detect one or more faces:

bool face_found = face_detector.detect(I);

If a face is detected, vpDetectorFace::detect() returns true. It is then possible to retrieve the number of faces that are detected:

text << "Found " << face_detector.getNbObjects() << " face(s)";

For each face, we have access to its location using vpDetectorFace::getPolygon(), its bounding box using vpDetectorFace::getBBox() and its identifier message using vpDetectorFace::getMessage().

for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
vpRect bbox = face_detector.getBBox(i);
vpDisplay::displayRectangle(I, bbox, vpColor::green, false, 4);
vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
"Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
}
Note
When more than one face is detected, faces are ordered from the largest to the smallest. This means that vpDetectorFace::getPolygon(0), vpDetectorFace::getBBox(0) and vpDetectorFace::getMessage(0) always return the characteristics of the largest face.
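
To make the ordering concrete, here is a minimal sketch that sorts bounding boxes from the largest to the smallest. It assumes that "largest" means largest area, and it is not the code used inside vpDetectorFace. The Box struct and the sortLargestFirst() function are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the bounding box of a detected face.
struct Box {
  double left, top, width, height;
  double area() const { return width * height; }
};

// Return the boxes ordered from the largest area to the smallest.
// Boxes with equal areas keep their relative order.
std::vector<Box> sortLargestFirst(std::vector<Box> boxes)
{
  std::stable_sort(boxes.begin(), boxes.end(),
                   [](const Box &a, const Box &b) { return a.area() > b.area(); });
  return boxes;
}
```

With such an ordering, index 0 always refers to the largest face, which is convenient when only the closest person in front of the camera matters.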

Face detection from a camera

This second example, also available in tutorial-face-detector-live.cpp, shows how to detect one or more faces in images acquired by a camera connected to your computer.

#include <visp3/core/vpConfig.h>
#include <visp3/detection/vpDetectorFace.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#ifdef VISP_HAVE_MODULE_SENSOR
#include <visp3/sensor/vpV4l2Grabber.h>
#endif
#if defined(HAVE_OPENCV_VIDEOIO)
#include <opencv2/videoio.hpp>
#endif
int main(int argc, const char *argv [])
{
#if defined(HAVE_OPENCV_HIGHGUI) && defined(HAVE_OPENCV_IMGPROC) && defined(HAVE_OPENCV_OBJDETECT)
try {
std::string opt_face_cascade_name = "./haarcascade_frontalface_alt.xml";
unsigned int opt_device = 0;
unsigned int opt_scale = 2; // Default value is 2 in the constructor. Turn
// it to 1 to avoid subsampling
for (int i = 0; i < argc; i++) {
if (std::string(argv[i]) == "--haar")
opt_face_cascade_name = std::string(argv[i + 1]);
else if (std::string(argv[i]) == "--device")
opt_device = (unsigned int)atoi(argv[i + 1]);
else if (std::string(argv[i]) == "--scale")
opt_scale = (unsigned int)atoi(argv[i + 1]);
else if (std::string(argv[i]) == "--help") {
std::cout << "Usage: " << argv[0]
<< " [--haar <haarcascade xml filename>] [--device <camera "
"device>] [--scale <subsampling factor>] [--help]"
<< std::endl;
return EXIT_SUCCESS;
}
}
vpImage<unsigned char> I; // for gray images
#if defined(VISP_HAVE_V4L2)
vpV4l2Grabber g;
std::ostringstream device;
device << "/dev/video" << opt_device;
g.setDevice(device.str());
g.setScale(opt_scale); // Default value is 2 in the constructor. Turn it
// to 1 to avoid subsampling
g.acquire(I);
#elif defined(HAVE_OPENCV_VIDEOIO)
cv::VideoCapture cap(opt_device); // open the default camera
#if (VISP_HAVE_OPENCV_VERSION >= 0x030000)
int width = (int)cap.get(cv::CAP_PROP_FRAME_WIDTH);
int height = (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT);
cap.set(cv::CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#else
int width = cap.get(CV_CAP_PROP_FRAME_WIDTH);
int height = cap.get(CV_CAP_PROP_FRAME_HEIGHT);
cap.set(CV_CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#endif
if (!cap.isOpened()) { // check if we succeeded
std::cout << "Failed to open the camera" << std::endl;
return EXIT_FAILURE;
}
cv::Mat frame;
cap >> frame; // get a new frame from camera
vpImageConvert::convert(frame, I);
#endif
#if defined(VISP_HAVE_X11)
vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
vpDisplayGDI d(I);
#elif defined(HAVE_OPENCV_HIGHGUI)
vpDisplayOpenCV d(I);
#endif
vpDisplay::setTitle(I, "ViSP viewer");
vpDetectorFace face_detector;
face_detector.setCascadeClassifierFile(opt_face_cascade_name);
while (1) {
double t = vpTime::measureTimeMs();
#if defined(VISP_HAVE_V4L2)
g.acquire(I);
bool face_found = face_detector.detect(I);
#else
cap >> frame; // get a new frame from camera
vpImageConvert::convert(frame, I);
bool face_found = face_detector.detect(frame); // We pass frame to avoid an internal image conversion
#endif
vpDisplay::display(I);
if (face_found) {
std::ostringstream text;
text << "Found " << face_detector.getNbObjects() << " face(s)";
vpDisplay::displayText(I, 10, 10, text.str(), vpColor::red);
for (size_t i = 0; i < face_detector.getNbObjects(); i++) {
vpRect bbox = face_detector.getBBox(i);
vpDisplay::displayRectangle(I, bbox, vpColor::green, false, 4);
vpDisplay::displayText(I, (int)bbox.getTop() - 10, (int)bbox.getLeft(),
"Message: \"" + face_detector.getMessage(i) + "\"", vpColor::red);
}
}
vpDisplay::displayText(I, (int)I.getHeight() - 25, 10, "Click to quit...", vpColor::red);
vpDisplay::flush(I);
if (vpDisplay::getClick(I, false)) // a click to exit
break;
std::cout << "Loop time: " << vpTime::measureTimeMs() - t << " ms" << std::endl;
}
}
catch (const vpException &e) {
std::cout << e.getMessage() << std::endl;
}
#else
(void)argc;
(void)argv;
#endif
}

The usage of this example is similar to the previous one. Just run:

$ ./tutorial-face-detector-live

Additional command line options let you specify the location of the Haar cascade file, the camera identifier if more than one camera is connected to your computer, and the subsampling factor:

$ ./tutorial-face-detector-live --help
Usage: ./tutorial-face-detector-live [--haar <haarcascade xml filename>] [--device <camera device>] [--scale <subsampling factor>] [--help]

The source code of this example is very similar to the previous one, except that here we use camera framegrabber devices (see Tutorial: Image frame grabbing). Two different grabbers may be used:

  • If ViSP was built with Video For Linux (V4L2) support, available for example on Fedora or Ubuntu distributions, the VISP_HAVE_V4L2 macro is defined. In that case, images coming from a USB camera are acquired using the vpV4l2Grabber class.
  • If ViSP wasn't built with V4L2 support but with OpenCV, we use the cv::VideoCapture class to grab the images. Notice that when images are acquired with OpenCV there is an additional conversion from cv::Mat to vpImage.
#if defined(VISP_HAVE_V4L2)
vpV4l2Grabber g;
std::ostringstream device;
device << "/dev/video" << opt_device;
g.setDevice(device.str());
g.setScale(opt_scale); // Default value is 2 in the constructor. Turn it
// to 1 to avoid subsampling
g.acquire(I);
#elif defined(HAVE_OPENCV_VIDEOIO)
cv::VideoCapture cap(opt_device); // open the default camera
#if (VISP_HAVE_OPENCV_VERSION >= 0x030000)
int width = (int)cap.get(cv::CAP_PROP_FRAME_WIDTH);
int height = (int)cap.get(cv::CAP_PROP_FRAME_HEIGHT);
cap.set(cv::CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#else
int width = cap.get(CV_CAP_PROP_FRAME_WIDTH);
int height = cap.get(CV_CAP_PROP_FRAME_HEIGHT);
cap.set(CV_CAP_PROP_FRAME_WIDTH, width / opt_scale);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, height / opt_scale);
#endif
if (!cap.isOpened()) { // check if we succeeded
std::cout << "Failed to open the camera" << std::endl;
return EXIT_FAILURE;
}
cv::Mat frame;
cap >> frame; // get a new frame from camera
vpImageConvert::convert(frame, I);
#endif
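
Both branches use opt_scale as a subsampling factor: with a scale of 2, a 640 x 480 image (an example size) becomes 320 x 240, which reduces the detection time. The sketch below illustrates the idea on a plain grayscale buffer by keeping one pixel out of scale along each axis. It is only an assumption-based illustration, not how vpV4l2Grabber or the camera driver performs the subsampling. GrayImage and subsample() are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major grayscale image.
struct GrayImage {
  unsigned int width;
  unsigned int height;
  std::vector<unsigned char> pixels;
};

// Keep one pixel out of `scale` along each axis.
// A scale of 0 is treated as 1 (no subsampling).
GrayImage subsample(const GrayImage &src, unsigned int scale)
{
  if (scale == 0) {
    scale = 1;
  }
  GrayImage dst;
  dst.width = src.width / scale;   // integer division, as in the listing
  dst.height = src.height / scale;
  dst.pixels.reserve(static_cast<std::size_t>(dst.width) * dst.height);
  for (unsigned int r = 0; r < dst.height; ++r) {
    for (unsigned int c = 0; c < dst.width; ++c) {
      const std::size_t index = static_cast<std::size_t>(r) * scale * src.width + static_cast<std::size_t>(c) * scale;
      dst.pixels.push_back(src.pixels[index]);
    }
  }
  return dst;
}
```

A scale of 1 returns a copy of the input, which matches the comment in the listing about turning the scale to 1 to avoid subsampling.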

Then in the while loop, at each iteration we acquire a new image:

#if defined(VISP_HAVE_V4L2)
g.acquire(I);
bool face_found = face_detector.detect(I);
#else
cap >> frame; // get a new frame from camera
vpImageConvert::convert(frame, I);
bool face_found = face_detector.detect(frame); // We pass frame to avoid an internal image conversion
#endif

This new image is then given as input to the face detector.

Next tutorial

You are now ready to see the Tutorial: Object detection and localization, which illustrates the case of object detection.