Project Overview
I) Context: Real-Time Emotion and Gender Classification
Our project aims to develop a real-time emotion and gender
classification system, driven by the need for accurate and
efficient recognition of these attributes from facial images.
Recognizing emotion and gender from facial images has
applications in fields such as affective computing and
human-computer interaction. However, achieving high
classification accuracy is challenging: even human performance
on emotion classification averages only around 65%. To address this,
we draw inspiration from relevant research, particularly the paper
"Real-time Convolutional Neural Networks for Emotion and Gender
Classification" by Octavio Arriaga et al. This paper presents
novel techniques and methodologies that can enhance the accuracy
and efficiency of our classification models.
II) Implementation Details: Fully Convolutional Networks for Efficient Classification
For our implementation, we adopt a fully convolutional network (FCN)
architecture inspired by the referenced paper. FCNs offer
advantages over traditional approaches that append fully connected
feed-forward layers to a convolutional neural network (CNN). By using
convolutional layers throughout the entire network, we capture
spatial information and local dependencies effectively, which
is crucial for accurate emotion and gender classification from
facial images. Furthermore, we employ global average pooling instead
of fully connected layers in the final stage of the network. This
reduces the number of parameters, resulting in computational savings
and faster inference times. The FCN architecture allows us to process
images efficiently while maintaining high classification performance.
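To make the architecture concrete, the sketch below shows a minimal
fully convolutional model with a global-average-pooling head, assuming
Keras (TensorFlow). The filter counts and the seven emotion classes are
illustrative placeholders, not our exact final configuration.

    # Minimal sketch of a fully convolutional classifier, assuming Keras.
    # Filter sizes and the 7 emotion classes (as in FER2013) are
    # illustrative assumptions, not the exact trained configuration.
    from tensorflow.keras import layers, models

    def build_fcn(input_shape=(48, 48, 1), num_classes=7):
        inputs = layers.Input(shape=input_shape)
        x = inputs
        # Stacked convolutional blocks capture spatial information and
        # local dependencies without any fully connected layers.
        for filters in (16, 32, 64, 128):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.BatchNormalization()(x)
            x = layers.MaxPooling2D(2)(x)
        # A final convolution maps features to per-class score maps; global
        # average pooling then replaces the dense classifier head, which is
        # what cuts the parameter count and speeds up inference.
        x = layers.Conv2D(num_classes, 3, padding="same")(x)
        x = layers.GlobalAveragePooling2D()(x)
        outputs = layers.Activation("softmax")(x)
        return models.Model(inputs, outputs)

    model = build_fcn()
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

Because the head is a pooling operation rather than a dense layer, the
classifier adds no weights of its own, which is the main source of the
parameter savings mentioned above.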
To integrate our classification models into a real-time pipeline, we
use the OpenCV library to capture the video feed, detect faces with
the Haar Cascade Classifier, and display the gender and emotion
predictions next to each detected face. This integration, combined
with the efficient FCN models, enables real-time emotion and gender
classification from live video.
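A minimal sketch of this pipeline follows, assuming a trained Keras
model saved as emotion_model.h5 (a hypothetical path) and the Haar
cascade file bundled with OpenCV; for brevity it annotates faces with
the emotion prediction only, but the gender model plugs in the same way.

    # Minimal sketch of the real-time OpenCV pipeline.
    # 'emotion_model.h5' and the EMOTIONS label list are hypothetical
    # placeholders standing in for the trained model and its classes.
    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    EMOTIONS = ["angry", "disgust", "fear", "happy",
                "sad", "surprise", "neutral"]
    model = load_model("emotion_model.h5")  # hypothetical path

    # Frontal-face Haar Cascade Classifier shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
            # Crop, resize, and normalize the face to the model's input.
            face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
            probs = model.predict(face[np.newaxis, ..., np.newaxis],
                                  verbose=0)[0]
            label = EMOTIONS[int(np.argmax(probs))]
            # Draw the prediction near the detected face.
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("Emotion", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()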
III) Results: Promising Performance of the Fully Convolutional Model
Our fully convolutional model for emotion classification overcame the
challenges posed by limited computational power and dataset
constraints. Despite working with a smaller dataset and fewer training
epochs, the model achieved a commendable accuracy of 60.86%,
surpassing the performance of traditional convolutional architectures.
The scarcity of representative examples for certain emotions reduced
classification accuracy on those classes, while gender classification
fared better.
Beyond accuracy, the architecture also improved speed, with an average
prediction time of 20 ms versus 30 ms for the feed-forward models
(a 1.5× speedup), making it a compelling choice for real-time applications.
For more details, please refer to the GitHub repository, which contains the complete report and the original paper.