Visual Transformer for Facial Emotion Recognition (FER)
This project aims to build an efficient Visual Transformer for the Facial Emotion Recognition (FER) task. The project is implemented entirely as a Python notebook, hosted on Google Colab with an NVIDIA P100 runtime environment.
Dataset
The dataset covers 8 different classes and is built by integrating 3 different subsets:
- FER-2013: It contains approximately 35,000 grayscale facial images of size 48×48, labeled with 7 expression classes: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The Disgust class has the fewest images (about 600), while the other classes have nearly 5,000 samples each.
- CK+: The Extended Cohn-Kanade (CK+) dataset contains images extracted from 593 video sequences of 123 different subjects, ranging from 18 to 50 years of age with a variety of genders and heritages. Each video shows a facial shift from the neutral expression to a targeted peak expression, recorded at 30 frames per second (FPS) with a resolution of either 640x490 or 640x480 pixels. We do not use the full set of extracted frames; we store only about 1,000 high-variance images taken from a Kaggle repository.
- AffectNet: A large facial expression dataset with 41,000 images classified into eight categories (neutral, happy, angry, sad, fear, surprise, disgust, contempt), along with annotations for valence and arousal intensity.
Data loading, integration, and analysis are covered in the first part of the ViT-Emotion-Recognition.ipynb notebook. The resulting dataset is split into two subsets (train and val folders), each containing 8 subfolders, one per class label.
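A minimal sketch of how this train/val folder layout can be consumed, assuming a PyTorch/torchvision setup (the root path `dataset/` and the library choice are illustrative assumptions, not part of the notebook):

```python
from torchvision import datasets

# train/ and val/ each contain 8 subfolders, one per emotion label;
# ImageFolder infers the class labels from the subfolder names.
train_set = datasets.ImageFolder("dataset/train")  # hypothetical root path
val_set = datasets.ImageFolder("dataset/val")

print(train_set.classes)              # the 8 class labels
print(len(train_set), len(val_set))   # number of samples per split
```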
Data Management
Since we fine-tune a pre-trained transformer on a heterogeneous dataset, we had to standardize some image properties:
- Data Scaling: The pre-trained models are transformer configurations trained on the ImageNet dataset with 224x224 images. We use the same scale and resize all input data to this size.
- Data Channels: We convert every image to 3 RGB channels, for the same reason as the previous point.
- Data Augmentation: We apply brightness, rotation, scaling, translation, and zoom augmentations to increase the number of samples and reduce class imbalance (see the sketch after this list).
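A minimal sketch of the preprocessing and augmentation pipeline outlined above, assuming torchvision transforms; the specific parameter values are illustrative and not necessarily those used in the notebook:

```python
from torchvision import transforms

# Training pipeline: promote grayscale images to 3 channels, resize to the
# 224x224 input expected by the pre-trained ViT, then apply the augmentations
# listed above (brightness, rotation, translation, scaling/zoom).
train_transform = transforms.Compose([
    transforms.Lambda(lambda img: img.convert("RGB")),  # RGB images pass through unchanged
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.2),             # brightness
    transforms.RandomAffine(degrees=15,                 # rotation
                            translate=(0.1, 0.1),       # translation
                            scale=(0.9, 1.1)),          # scaling / zoom
    transforms.ToTensor(),
])

# Validation pipeline: only the deterministic resizing and channel conversion.
val_transform = transforms.Compose([
    transforms.Lambda(lambda img: img.convert("RGB")),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```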
Model
Overview of the model: The input image is split into fixed-size patches; patch extraction is implemented as a convolutional layer with a 16x16 kernel and a 16x16 stride. The output of the convolution feeds the embedding phase, where the resulting vector is the sum of a position embedding and a linear embedding into a 768-dimensional projection space. The embedded patches are then processed by a stack of 11 sequential Transformer encoders. For the classification task, the final layer is a linear layer with an 8-dimensional output for our eight emotions. The model we rely on is pre-trained on ImageNet and fine-tuned on the integrated dataset described above.
Source: https://github.com/google-research/vision_transformer
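As a sketch of how such a model can be instantiated for 8 emotion classes, assuming the HuggingFace transformers PyTorch port of a Google ViT checkpoint rather than the original JAX/Flax code linked above (the checkpoint name, label ordering, and the standard 12-layer ViT-Base depth are assumptions):

```python
from transformers import ViTForImageClassification, ViTImageProcessor

# Hypothetical label ordering; the notebook's actual mapping may differ.
labels = ["angry", "contempt", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

# Pre-trained ViT backbone (16x16 patches, 224x224 inputs, 768-d embeddings);
# the classification head is replaced with a new 8-way linear layer for fine-tuning.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
)

# Matching image processor (resizing and normalization for 224x224 RGB inputs).
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
```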
Authors
- Andrea Gurioli (@andreagurioli1995)
- Mario Sessa (@kode-git)
License
© Apache License Version 2.0, January 2004