Mouhammad Bazzi

Real-time Emotion and Gender Classification

Our project focused on developing a real-time pipeline for gender and emotion classification from a live camera feed. To ensure high speed, we implemented a model using only convolutional layers, eliminating the need for feed-forward layers. This approach enabled efficient classification, making it practical for real-world applications.


Project Overview

I) Context: Real-Time Emotion and Gender Classification
Our project aims to develop a real-time emotion and gender classification system, driven by the need for accurate and efficient recognition of these attributes from facial images. Recognizing emotion and gender from facial expressions has applications in fields such as affective computing and human-computer interaction. However, achieving high classification accuracy is challenging: even human performance in emotion classification averages only around 65%. To address this, we draw inspiration from relevant research, particularly the paper "Real-time Convolutional Neural Networks for Emotion and Gender Classification" by Octavio Arriaga et al., which presents techniques for improving both the accuracy and the efficiency of such classification models.

II) Implementation Details: Fully Convolutional Networks for Efficient Classification
For our implementation, we adopt a fully convolutional network (FCN) architecture inspired by the referenced paper. Using FCNs offers advantages over traditional approaches that combine convolutional neural networks (CNNs) with feed-forward networks. By utilizing convolutional layers throughout the entire network, we capture spatial information and local dependencies effectively, which is crucial for accurate emotion and gender classification from facial images. Furthermore, we employ global average pooling instead of fully connected layers in the final stage of the network. This reduces the number of parameters, resulting in computational savings and faster inference times. The FCN architecture allows us to process images efficiently while maintaining high classification performance.
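To make the architecture concrete, here is a minimal Keras sketch of a fully convolutional classifier ending in global average pooling instead of fully connected layers. The input size (64×64 grayscale), the filter counts, and the seven emotion classes are illustrative assumptions, not the exact configuration from our report:

```python
# Sketch of a fully convolutional classifier with global average pooling.
# Input size, filter counts, and class count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fcn(input_shape=(64, 64, 1), num_classes=7):
    """Conv blocks ending in GAP + softmax; no Dense layers anywhere."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    # Final convolution produces one feature map per class.
    x = layers.Conv2D(num_classes, 3, padding="same")(x)
    # Global average pooling replaces the fully connected head:
    # each class score is the spatial mean of its feature map.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Softmax()(x)
    return models.Model(inputs, outputs)
```

Because the head is a pooling operation rather than a dense layer, the parameter count stays small and inference remains fast, which is the property the real-time pipeline relies on.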

To integrate our classification models into a real-time pipeline, we utilize the OpenCV library for capturing video feed, face detection using the Haar Cascade Classifier, and displaying real-time predictions for gender and emotion near the detected faces. This integration, combined with the efficient FCN models, enables us to achieve real-time emotion and gender classification from live video feeds.

III) Results: Promising Performance of the Fully Convolutional Model
Our fully convolutional model for emotion classification overcame challenges posed by limited computational power and dataset constraints. Despite working with a smaller dataset and fewer training epochs, the model achieved a commendable accuracy of 60.86%, surpassing the performance of traditional convolutional architectures. The scarcity of representative examples for certain emotions hurt classification accuracy for those classes, while gender classification fared relatively better.

Moreover, the architecture's success extended beyond accuracy: it also ran faster, with an average prediction time of 20ms compared to 30ms for feed-forward models (a 1.5× speedup), making it a compelling choice for real-time applications.


For more details, please refer to the GitHub link, where you can find the complete report and the original paper.

Tools Used

Python
TensorFlow
Keras
OpenCV
Git