Overview

The Native Emotions Library is a portable C++ library for face and facial expression tracking.

The SDK also has wrappers in the following languages:

  • Java/Kotlin (for Android development)

  • Python

Obtaining the SDK

The Realeyes C++ SDK is available commercially to selected partners and free of charge for academic research. You can request access to the SDK here:

ACCESS SDK

Release Notes

  • Build 123 (16 Mar 2020)
    • Python support

  • Build 122 (28 Feb 2020)
    • Java support on Android

  • Build 121 (24 Feb 2020)
    • Updated SDK package structure

  • Build 119 (13 Feb 2020)
    • Updated to C++17 standard

    • Added support for a callback based version of the track method

    • Added support for max concurrency parameter in the Tracker constructor

    • Added emotionID member to the EmotionData struct

    • Model 4.4.1 (TCN): internal changes

  • Build 113 (13 Dec 2019)
    • Use internal video reader instead of OpenCV

    • Model 4.3.1 (TCN): added presence

  • Build 112 (10 Nov 2019)
    • Added presence classifier

    • Model 4.3.1 (TCN): added presence

  • Build 111 (10 Nov 2019)
    • Added support for smaller mobile optimized models

    • Added option to change the minimum face size

    • Documentation fixes and updates

  • Build 108 (05 Oct 2019)
    • iOS support

  • Build 105 (09 Sep 2019)
    • Updated to TensorFlow Lite 1.14

    • Android support

    • Various performance optimizations

    • Model 4.3.0 (TCN): smaller, speed-optimized model

  • Build 100 (11 Jul 2019)
    • First public version with DNN classifiers.

    • Model 4.2.0 (TCN)

Getting Started

Hardware requirements

The SDK doesn’t have any special hardware requirements:

  • CPU: No special requirement; any modern 64-bit CPU (x86-64 with AVX, ARMv8) is supported

  • GPU: No special requirement

  • RAM: 1 GB of available RAM required

  • Camera: No special requirement, minimum resolution: 640x480. See nel::Tracker::set_minimum_face_ratio()

Software requirements

The SDK is regularly tested on the following Operating Systems:

  • Windows 10

  • Ubuntu 18.04

  • macOS Catalina

  • iOS 12.4

  • Android 6.0

3rd Party Licenses

While the SDK is released under a proprietary license, the following open-source projects were used in it under their respective licenses:

Dependencies

The public C++ API hides all implementation details from the user and depends only on the C++17 Standard Library. It also provides a binary-compatible interface, making it possible to change the underlying implementation without requiring recompilation of user code.
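
One common way such binary compatibility is achieved in C++ is the pimpl (pointer-to-implementation) idiom; whether the SDK uses exactly this pattern is not stated here. The snippet below is a generic, illustrative sketch of the technique with made-up names (Widget, Impl), not the SDK's actual header:

// widget.h -- public header: only a forward declaration of the implementation
#include <memory>

class Widget {
public:
   Widget();
   ~Widget();                  // defined in the .cpp, where Impl is complete
   int compute(int input) const;

private:
   struct Impl;                // implementation details stay out of the ABI
   std::unique_ptr<Impl> impl_;
};

// widget.cpp -- private implementation: can change without forcing
// user code to recompile, as long as the public header stays the same
struct Widget::Impl {
   int offset = 42;
};

Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;

int Widget::compute(int input) const { return input + impl_->offset; }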

The library itself depends on the following third-party library:
  • Intel Thread Building Blocks (TBB) (version 2017 and up)

This library may or may not be included in the distributed package depending on the platform.

Installation

C++: Extract the SDK contents, include the headers from the include folder, and link libNativeEmotionsLibrary into your C++ project.

Android: Extract the SDK contents to the libs directory of your Gradle project, and add the dependency to build.gradle:

dependencies {
   implementation fileTree(dir: 'libs', include: ['*.aar'])
}

Python: The SDK can be installed with pip:

$ pip install pybind11
$ pip install realeyes_nel_sdk/dist/native_emotions_library*.tar.gz

Usage

The main entry point of this library is the nel::Tracker class and its track() function.

After a tracker object has been constructed and, optionally, some settings have been changed (e.g. an emotion has been enabled or disabled with set_emotion_enabled()), the user can call the track() function repeatedly for every frame of the video or other frame source.
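
For example, a classifier can be switched off right after construction. The exact parameters of set_emotion_enabled() are not listed on this page, so the sketch below assumes it takes an emotion ID (as returned by get_emotion_IDs()) and a boolean flag, and that get_emotion_IDs() returns a standard container:

#include "tracker.h"

int main()
{
   nel::Tracker tracker("../model/model.realZ");

   // Assumed: get_emotion_IDs() returns a container of emotion IDs and
   // set_emotion_enabled(id, enabled) toggles the corresponding classifier.
   auto emotion_ids = tracker.get_emotion_IDs();
   if (!emotion_ids.empty())
      tracker.set_emotion_enabled(emotion_ids.front(), false);

   return 0;
}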

nel::Tracker::track() has two overloads, both of which are non-blocking asynchronous calls: one returns a std::future, the other invokes a callback on completion (a sketch of the callback form follows the example below). After one call, a subsequent call can be made without waiting for the result, but the frames must be submitted in sequential order.

For the frame data, the user must construct a nel::ImageHeader object and pass it to nel::Tracker::track(). Since the nel::ImageHeader is a non-owning view of the frame data, the data must outlive the header; however, it only needs to remain valid for the duration of the nel::Tracker::track() call, because the library copies the frame data internally.

The following example shows the basic usage of the library using OpenCV for capturing frames from the camera and feeding it to the tracker:

#include "tracker.h"

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>

#include <chrono>
#include <iostream>

void processResult(nel::Tracker::ResultType& result);

int main()
{
   nel::Tracker tracker("../model/model.realZ");

   cv::VideoCapture cam(0);
   if (!cam.isOpened()) {
         std::cerr << "Can't open the webcam." << std::endl;
         return -1;
   }

   auto start = std::chrono::high_resolution_clock::now();

   cv::Mat frame;

   // Wait for the first valid frame from the camera
   while (frame.empty())
         cam >> frame;

   while (cv::waitKey(1) == -1) {
         // Read one frame
         cam >> frame;
         auto timestamp = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::high_resolution_clock::now() - start
         );

         // Call the tracker providing the frame data
         auto result = tracker.track(
            {frame.ptr(), frame.cols, frame.rows, static_cast<int>(frame.step1()), nel::ImageFormat::BGR},
            timestamp
         );

         // Do something else, you can even send more frames for processing

         // Do something with the result
         processResult(result.get());
   }
   return 0;
}
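
As noted above, track() also has a callback-based overload. Its exact signature is not shown on this page; the fragment below (reusing the tracker, frame, and timestamp variables from the example above) is a sketch assuming the overload takes the same image header and timestamp plus a callable that is invoked with the result on completion:

   // Hypothetical callback-based call; the callback parameter is an assumption.
   tracker.track(
      {frame.ptr(), frame.cols, frame.rows, static_cast<int>(frame.step1()), nel::ImageFormat::BGR},
      timestamp,
      [](nel::Tracker::ResultType result) {
         // Invoked when processing of this frame finishes;
         // more frames can be submitted in the meantime.
         processResult(result);
      }
   );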

The main entry point of this library is the com.realeyesit.nel.Tracker interface and its track function.

After a tracker object has been constructed and, optionally, some settings have been changed (e.g. an emotion has been enabled or disabled with setEmotionEnabled), the user can call the track function repeatedly for every frame of the video or other frame source.

The following example shows the basic usage of the library, decoding a bitmap resource and feeding it to the tracker:

public void tryNel(Context context) {
   Bitmap bitmap = BitmapFactory.decodeResource(context.getResources(), R.drawable.frame);
   ByteBuffer buffer = ByteBuffer.allocateDirect(bitmap.getWidth() * bitmap.getHeight() * 4);
   bitmap.copyPixelsToBuffer(buffer);
   buffer.rewind(); // copyPixelsToBuffer advances the buffer position; reset it before handing the buffer over

   ImageHeader header = new ImageHeader();
   header.setFormat(ImageFormat.RGBA);
   header.setData(buffer);
   header.setWidth(bitmap.getWidth());
   header.setHeight(bitmap.getHeight());

   Tracker tracker = new NelTracker("model.realz");

   TrackerResultFuture future = tracker.track(header, System.currentTimeMillis());
   ResultType result = future.get();
   // do something with result
}

The main entry point of this library is the native_emotions_library.Tracker class and its track() function.

After a tracker object has been constructed and, optionally, some settings have been changed (e.g. an emotion has been enabled or disabled with set_emotion_enabled()), the user can call the track() function repeatedly for every frame of the video or other frame source.

The following example shows the basic usage of the library using OpenCV for capturing frames from the camera and feeding it to the tracker:

import native_emotions_library as nel
import cv2
from timeit import default_timer as timer

# Initialize the tracker (model_filename is the path to the model file shipped with the SDK)
tracker = nel.Tracker(model_filename)

# Initialize the camera and start time
camera = cv2.VideoCapture(0)
start = timer()

while cv2.waitKey(1) == -1:
   ret, frame = camera.read()
   if not ret:
         break

   # Call the tracker with timestamp in ms
   result = tracker.track(frame, round((timer() - start) * 1000))

   # Do something with the result
   processResult(result)

Results

The result of the tracking contains a nel::LandmarkData structure and a nel::EmotionResults vector; a sketch that consumes these fields follows the list below.

  • The nel::LandmarkData consists of the following members:
    • scale, the size of the face (a larger value means the user is closer to the camera)

    • roll, pitch, yaw, the 3 Euler angles of the face pose

    • translate, the position of the head center on the frame

    • the landmarks2d vector with either 0 or 49 points,

    • the landmarks3d vector with either 0 or 49 points,

    • and the isGood boolean value.

    The isGood indicates whether the tracking is deemed good enough.

    landmarks2d and landmarks3d contain 0 points if the tracker failed to find a face in the image; otherwise they always contain 49 points in the following structure:

    (See the landmarks.png figure for the layout of the 49 landmark points.)

    landmarks3d contains the coordinates of the frontal face in 3D space, with zero translation and unit scale.

  • The nel::EmotionResults contains multiple nel::EmotionData elements with the following members:
    • probability, the probability of the emotion

    • isActive, whether the probability is higher than an internal threshold

    • isDetectionSuccessful, whether the tracking quality was good enough to reliably detect this emotion

    The order of the nel::EmotionData elements is the same as the order of the emotions in nel::Tracker::get_emotion_IDs() and nel::Tracker::get_emotion_names().
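
As an illustration, the processResult() function left undefined in the C++ example above could consume these fields roughly as follows. The member names used to reach the nel::LandmarkData and nel::EmotionResults inside nel::Tracker::ResultType (landmarks and emotions below) are assumptions; only the field names listed above come from this documentation:

#include "tracker.h"

#include <iostream>

void processResult(nel::Tracker::ResultType& result)
{
   const auto& face = result.landmarks;            // assumed accessor for nel::LandmarkData
   if (!face.isGood) {
      std::cout << "Tracking quality too low on this frame\n";
      return;
   }

   std::cout << "Head pose (roll/pitch/yaw): "
             << face.roll << ' ' << face.pitch << ' ' << face.yaw << '\n';

   for (const auto& emotion : result.emotions) {   // assumed accessor for nel::EmotionResults
      if (!emotion.isDetectionSuccessful)
         continue;                                 // not reliable on this frame

      // emotionID is assumed to be printable here
      std::cout << "emotion " << emotion.emotionID
                << " probability=" << emotion.probability
                << (emotion.isActive ? " (active)" : "") << '\n';
   }
}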

Interpretation of the classifier output

The probability output of the Realeyes classifier (from the nel::EmotionData structure) has the following properties:

  • It is a continuous value from the [0,1] range

  • It changes depending on type and number of facial features activated

  • It typically indicates facial activity in regions of the face that correspond to a given facial expression

  • Strong facial wrinkles or shadows can amplify the classifier's sensitivity to the corresponding facial regions

  • It is purposefully sensitive as the classifier is trained to capture slight expressions

  • It should not be interpreted as intensity of a given facial expression

  • It is not possible to prescribe which facial features correspond to what output levels due to the nature of the ML models used

We recommend the following interpretation of the probability output (a sketch that applies this interpretation follows the list):

  • values close to 0
    • no or very little activity on the face with respect to a given facial expression

  • values between 0 and binary threshold
    • some facial activity was perceived, though in the view of the classifier it does not amount to a basic facial expression

  • values just below binary threshold
    • high facial activity was perceived, which under some circumstances may be interpreted as a true basic facial expression, while under others not (e.g. watching ads vs. playing games)

  • values above binary threshold
    • high facial activity was perceived, which in the view of the classifier amounts to a basic facial expression
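
As a rough illustration, this interpretation can be mapped onto the documented nel::EmotionData fields. Since the binary threshold itself is internal to the SDK, the sketch below relies on isActive for the above/below-threshold distinction, cannot single out the "just below threshold" band, and uses an arbitrary 0.05 cut-off for "very little activity":

#include "tracker.h"

#include <string>

std::string interpret(const nel::EmotionData& emotion)
{
   if (!emotion.isDetectionSuccessful)
      return "tracking quality too low to judge this emotion";

   if (emotion.isActive)
      return "high facial activity: basic facial expression detected";

   if (emotion.probability < 0.05)          // arbitrary near-zero cut-off
      return "no or very little activity for this expression";

   return "some facial activity, below the expression threshold";
}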