Exploring the Fundamentals of Facial Expression Recognition Systems: An Introduction to Key Methodologies

Published in Geek Culture · 5 min read · Jan 6, 2023

Facial expressions are a major part of human emotional expression, social interaction, and communication. Recognizing facial expressions in digital media has become one of the fastest-growing research fields in recent years. Automatic facial expression recognition opens new pathways for human-computer interaction, human-robot interaction, and many other applications.

This article gives an overview of the facial expression analysis pipeline.

Facial Expression Analysis Methodologies

The face analysis pipeline can be divided into four main steps:

Face detection
Preprocessing
Feature extraction
Classification

Today, I will briefly explain each step and summarize the algorithms commonly used for it.

Face Detection

Identifying human faces in an image or a video is the initial step of applications such as face recognition, face analysis, face tracking, and facial expression recognition systems.

Several open-source face detection algorithms are available with public pre-trained models, most of which are based on local facial features. The Viola-Jones face detector is a well-known algorithm with fast feature computation and efficient feature selection. However, it is effective only for frontal face images, and it is sensitive to the lighting conditions of the image.
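To make the "fast feature computation" point concrete, here is a minimal sketch of the integral image (summed-area table) that Viola-Jones relies on to evaluate rectangular Haar-like features in constant time. It is written in plain Python on a toy 4x4 patch for readability; the function names and the patch values are illustrative assumptions, not part of any library.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels in img[0:y][0:x].
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using only four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Haar-like edge feature: sum of the left half minus sum of the right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy grayscale patch (hypothetical values): bright left half, dark right half.
patch = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(patch)
feature = two_rect_feature(ii, 0, 0, 4, 4)
print(feature)
```

A large feature value signals a strong vertical edge inside the window. Because every rectangle sum costs four lookups regardless of its size, thousands of such features can be evaluated per window.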

Neural networks (NN) are also a popular method for face detection.

Preprocessing

Data preprocessing is an important step in the machine learning pipeline. It prepares the data so that meaningful features can be extracted, which helps improve the model's performance. In a facial expression recognition system, the preprocessing step aims to align faces into a common reference frame.

This helps remove the effects of rigid head rotation, illumination changes, and interpersonal variation from the dataset.

Data preprocessing has multiple stages, so let's go through them one by one.

1. Facial landmark detection

Facial landmark detection is the first preprocessing step. It is the process of locating key points on the face, such as the tip of the nose, the corners of the eyes, and the corners of the lips. These key points help define the shape of the face. Fig. 1 shows a Python implementation of landmark localization.

Fig. 1: Python implementation of facial landmark localization. (Left) Original image; (Right) detected landmarks on the original image.

2. Face registration

Face registration is a preprocessing step that aligns the sample faces to a common reference face (as shown in Fig. 2). This step improves the performance of the facial expression recognition system by eliminating rigid head motion and interpersonal variation from the samples. Possible approaches to face registration include the Procrustes transformation and the piecewise affine transformation.

Fig. 2: Python implementation of face registration. (Left) Registered image; (Right) detected landmarks on the registered image.
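As an illustration of the Procrustes idea, the sketch below aligns a set of 2D landmarks to a reference shape with a least-squares similarity transform (translation, rotation, and uniform scale). It is not the implementation behind Fig. 2; the function name and the three toy landmark points are assumptions made for this example. Treating each point as a complex number lets one complex factor carry both rotation and scale.

```python
def procrustes_align(landmarks, reference):
    """Align 2D landmarks to a reference shape with a similarity transform.

    Both arguments are lists of (x, y) tuples in the same landmark order.
    """
    z = [complex(x, y) for x, y in landmarks]
    w = [complex(x, y) for x, y in reference]

    # Remove translation by centering both shapes on their centroids.
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]
    wc = [p - w_mean for p in w]

    # Least-squares optimum of sum |a * zc - wc|^2 over a complex factor a.
    # abs(a) is the scale and the phase of a is the rotation angle.
    numerator = sum(p.conjugate() * q for p, q in zip(zc, wc))
    denominator = sum(abs(p) ** 2 for p in zc)
    a = numerator / denominator

    aligned = [a * p + w_mean for p in zc]
    return [(p.real, p.imag) for p in aligned]


# Hypothetical reference shape (for example, two eye corners and a nose tip).
reference = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
# The same shape rotated by 90 degrees, scaled by 2, and shifted by (5, -3).
landmarks = [(5.0, -3.0), (5.0, 1.0), (1.0, -1.0)]

aligned = procrustes_align(landmarks, reference)
print(aligned)
```

In this toy case the landmarks are an exact similarity transform of the reference, so the aligned points coincide with it up to floating-point error. With real faces a residual remains, and that residual carries the non-rigid variation (such as expression) that later steps analyze.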

3. Face crop

Whether a face crop is needed depends on the final task. Users can define the dimensions of the crop, and the implementation is simple. Fig. 3 illustrates the face crop on the original image and on the registered image.

Fig. 3: Face crop. (Left) Face crop applied to the original image; (Right) face crop applied to the registered image.
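One simple way to define the crop is the bounding box of the detected landmarks plus a user-chosen margin. The sketch below assumes integer pixel landmarks given as (x, y) and an image stored as a list of rows; the function name and the toy data are hypothetical.

```python
def crop_face(image, landmarks, margin=1):
    """Crop the bounding box of the landmarks, padded by `margin` pixels.

    The box is clamped so it never leaves the image.
    """
    height, width = len(image), len(image[0])
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin + 1, width)    # exclusive bound
    bottom = min(max(ys) + margin + 1, height)  # exclusive bound
    return [row[left:right] for row in image[top:bottom]]


# Toy 6x8 "image" whose pixel value encodes its own position (10 * row + col).
image = [[10 * r + c for c in range(8)] for r in range(6)]
landmarks = [(2, 1), (5, 1), (3, 3)]  # hypothetical landmark positions (x, y)

face = crop_face(image, landmarks, margin=1)
print(len(face), len(face[0]))
```

The margin is the knob to tune per task: a tight crop keeps only the inner face, while a larger margin keeps the forehead and chin.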

4. Other preprocessing methods

Several other preprocessing techniques exist.

Contrast equalization: This technique can be used to handle challenging lighting conditions in the input face. Different types of filters can be applied to even out the contrast.
Gamma correction: This is a nonlinear operation that adjusts the luminance or tristimulus values in an image. It helps to improve the overall brightness of the image.
Difference of Gaussians filtering (DoG): DoG is a feature enhancement algorithm that increases the visibility of edges and other details by subtracting a more blurred version of the image from a less blurred one.
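Two of the techniques above fit in a few lines. The sketch below applies gamma correction to 8-bit pixel values and a Difference of Gaussians to a single image row. It is kept one-dimensional and in plain Python for brevity (a real pipeline would filter in 2D, usually with an image library), and all function names and parameter values are assumptions chosen for illustration.

```python
import math


def gamma_correct(image, gamma):
    """Map each 8-bit pixel p to 255 * (p / 255) ** gamma.

    gamma < 1 brightens dark regions; gamma > 1 darkens them.
    """
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, sigma):
    """Gaussian-blur one row, replicating the border pixels."""
    radius = math.ceil(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += weight * row[j]
        out.append(acc)
    return out


def difference_of_gaussians(row, sigma_small=1.0, sigma_large=2.0):
    """Subtract the wider blur from the narrower blur."""
    narrow = blur_row(row, sigma_small)
    wide = blur_row(row, sigma_large)
    return [a - b for a, b in zip(narrow, wide)]


brightened = gamma_correct([[0, 64, 255]], gamma=0.5)

# A row with a single step edge between a dark and a bright region.
edge_row = [10] * 15 + [200] * 15
dog = difference_of_gaussians(edge_row)
print(brightened, dog[14], dog[15])
```

In flat regions both blurs return the same value, so the DoG response is zero. Around the step edge the response swings negative on the dark side and positive on the bright side, which is what makes edges stand out.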

Feature Extraction

The goal of the feature extraction step is to reduce the dimensionality of the data by deriving new, more compact features from the existing ones.

It helps to remove redundant data from the dataset.

Feature extraction methods can be grouped into four categories: appearance-based (texture or gradient), geometry-based (shape, angle, or ratio), motion-based, and hybrid methods.

Local binary pattern (LBP): LBP is a popular texture descriptor based on appearance features. It labels each image pixel by thresholding its neighboring pixels against the pixel's own value and reading the results as a binary number. A face image can then be represented as a vector by collecting the LBP labels into a histogram.

LBP features are simple, efficient, robust to illumination changes, and sensitive to local structures. However, they are not robust to rotation, so an upright face position is important.
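Here is a minimal sketch of the basic 3x3 LBP operator described above, followed by the histogram that turns the labels into a feature vector. The neighbor ordering (clockwise from the top-left pixel) and the use of >= for the threshold are conventions assumed for this example; other implementations may order the bits differently.

```python
# Neighbor offsets (dy, dx), clockwise starting at the top-left pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """8-bit LBP label of the pixel at (y, x)."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBORS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP labels over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


# Toy 3x3 patch (hypothetical values) with center pixel 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
# The same patch under uniformly brighter lighting.
brighter = [[p + 50 for p in row] for row in patch]

code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code, lbp_code(brighter, 1, 1))
```

Adding a constant to every pixel leaves the label unchanged, which illustrates why LBP tolerates uniform illumination changes. Rotating the patch would shift the bit pattern, which is the rotation sensitivity mentioned above.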

In the next article, I will go deeper into the LBP descriptor.

Filter Banks: This method uses a set of filters to extract features from an image. Gabor filter banks were commonly used in earlier work on automatic action unit analysis.

However, this method is not robust to illumination changes and affine transformations. It also produces feature vectors of very high dimensionality.
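To show what a Gabor filter bank looks like and why its output grows so quickly, the sketch below builds real-valued Gabor kernels (a Gaussian envelope multiplied by a cosine carrier) at several wavelengths and orientations. The kernel size, the wavelengths, the sigma rule, and the 48x48 face size are illustrative assumptions, not values taken from a particular paper.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list of rows.

    size should be odd so that the kernel has a center pixel.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2)
                                / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size=9, wavelengths=(4.0, 8.0), n_orientations=4):
    """One kernel per (wavelength, orientation) pair."""
    return [
        gabor_kernel(size, lam, k * math.pi / n_orientations, sigma=0.5 * lam)
        for lam in wavelengths
        for k in range(n_orientations)
    ]


bank = gabor_bank()

# Each filter yields one response per pixel, so a hypothetical 48x48 face
# crop produces len(bank) * 48 * 48 raw feature values.
feature_length = len(bank) * 48 * 48
print(len(bank), feature_length)
```

Even this small bank of 8 filters turns a 48x48 crop into more than 18,000 values, so Gabor features are usually followed by downsampling or another dimensionality reduction step.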

Classification

Classification is the final step of the facial expression analysis pipeline, where we identify the expression. The classifier has two phases: the training phase and the classification phase. The classifier should be both accurate and efficient.

Many classification algorithms exist, such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), K-nearest neighbors (KNN), decision trees, the least-squares method, and the Naïve Bayes method. For facial expression classification, HMMs have been reported in the literature to achieve better accuracy.
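The two phases can be seen in a very small K-nearest neighbors classifier, chosen here only because it is the shortest of the algorithms listed above to write out. The class name, the two-dimensional feature vectors, and the expression labels are all made up for illustration; real systems would use the much longer feature vectors produced in the previous step.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal K-nearest neighbors classifier with Euclidean distance."""

    def __init__(self, k=3):
        self.k = k
        self.features = []
        self.labels = []

    def fit(self, features, labels):
        """Training phase: KNN simply stores the labeled examples."""
        self.features = list(features)
        self.labels = list(labels)
        return self

    def predict(self, x):
        """Classification phase: majority vote among the k closest examples."""
        distances = sorted(
            (math.dist(f, x), label)
            for f, label in zip(self.features, self.labels)
        )
        votes = Counter(label for _, label in distances[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical 2D features, e.g. (mouth-corner lift, eyebrow raise).
train_x = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
           (0.2, 0.9), (0.1, 0.8), (0.15, 0.85)]
train_y = ["happy", "happy", "happy",
           "surprised", "surprised", "surprised"]

clf = KNNClassifier(k=3).fit(train_x, train_y)
print(clf.predict((0.7, 0.3)), clf.predict((0.2, 0.7)))
```

KNN makes the trade-off between the phases easy to see: training is trivial, but every prediction scans all stored examples, which matters when the system has to classify in real time.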

Real-time classification is also important for facial expression recognition systems.

Conclusion

This article gave a brief introduction to the basic steps of the facial expression recognition pipeline. I will discuss these algorithms in more depth in my upcoming articles.