A novel Engagement Detection with Multi-Task Training (ED-MTT) system

Related tags

Deep Learning ED-MTT
Overview

ED-MTT

A novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes MSE and triplet loss together to determine the engagement level of students in an e-learning environment. You can check the colab notebook bellow for detailed explanatoins about data loading and code execution.

Open In Colab

Introduction & Problem Definition

With the Covid-19 outbreak, the online working and learning environments became essential in our lives. For this reason, automatic analysis of non-verbal communication becomes crucial in online environments.

Engagement level is a type of social signal that can be predicted from facial expression and body pose. To this end, we propose an end-to-end deep learning-based system that detects the engagement level of the subject in an e-learning environment.

The engagement level feedback is important because:

  • Make aware students of their performance in classes.
  • Will help instructors to detect confusing or unclear parts of the teaching material.

Model Architecture

triplet_loss.png

The proposed system first extracts features with OpenFace, then aggregates frames in a window for calculating feature statistics as additional features. Finally, uses Bi-LSTM for generating vector embeddings from input sequences. In this system, we introduce a triplet loss as an auxiliary task and design the system as a multi-task training framework by taking inspiration from, where self-supervised contrastive learning of multi-view facial expressions was introduced. To the best of our knowledge, this is a novel approach in engagement detection literature. The key novelty of this work is the multi-task training framework using triplet loss together with Mean Squared Error (MSE). The main contributions of this paper are as follows:

  • Multi-task training with triplet and MSE losses introduces an additional regularization and reduces over-fitting due to very small sample size.
  • Using triplet loss mitigates the label reliability problem since it measures relative similarity between samples.
  • A system with lightweight feature extraction is efficient and highly suitable for real-life applications.

Dataset

We evaluate the performance of ED-MTT on a publicly available ``Engagement in The Wild'' dataset which is comprised of separated training and validation sets.

Untitled

The dataset is comprised of 78 subjects (25 females and 53 males) whose ages are ranged from 19 to 27. Each subject is recorded while watching an approximately 5 minutes long stimulus video of a Korean Language lecture.

Results

We compare the performance of ED-MTT with 9 different works from the state-of-the-art which will be reviewed in the rest of this section. Our results show that ED-MTT outperforms these state-of-the-art methods with at least a 5.74% improvement on MSE.

paper_performance.png

Repository structure

ED-MTT
│   README.md
│   Engagement_Labels.txt
|   ED-MTT.ipynb

└───code
│   │   dataloader.py
|   |   model.py
|   |   train.py
|   |   test.py
│   │   fix_path.py
|   |   utils.py
|   |   requirements.txt

└───configs
    │   batchnorm_default.yaml
    │   sweep.yaml

Running the Code

Untitled

Untitled

To train the experiments and manage the experiments, we used PyTorch Lightning together with Weights&Biases. All the detailed explonations to;

  • Load data and pre-trained weights,
  • Train the model from scratch,
  • Manage expriments and hyper-parameter search with wandb,
  • Reproduce the results presented in the paper,

are shown in ED-MTT.ipynb colab notebook.

You might also like...
Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (MTCNN)
Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (MTCNN)

Face-Detection-with-MTCNN Face detection is a computer vision problem that involves finding faces in photos. It is a trivial problem for humans to sol

Multi-task yolov5 with detection and segmentation based on yolov5
Multi-task yolov5 with detection and segmentation based on yolov5

YOLOv5DS Multi-task yolov5 with detection and segmentation based on yolov5(branch v6.0) decoupled head anchor free segmentation head README中文 Ablation

E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training
E2e music remastering system - End-to-end Music Remastering System Using Self-supervised and Adversarial Training

End-to-end Music Remastering System This repository includes source code and pre

Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal, multi-exposure and multi-focus image fusion.

U2Fusion Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal (VIS-IR, medical), multi

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.

CasRel-pytorch-reimplement Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The o

The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.
The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.

Neural Deformation Graphs Project Page | Paper | Video Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction Aljaž Božič, Pablo P

Comments
  • How to run demo for this project

    How to run demo for this project

    Dear Mr.CopurOnur,

    I'm developing a tracking student engagement in an E-learning system for my thesis project. Luckily, I found your work and very interested in it. But I don't know how to run this project for testing purpose such as: using computer's webcam to test, so could you please guide me that how to do that. Have a nice day!

    Best regards,

    Son Nguyen

    opened by sonnguyen1996 1
Owner
Onur Çopur
Data scientist with research interests in computer vision and NLP. Highly skilled in Python programming, MLOps and deep learning frameworks.
Onur Çopur
Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

How The New York Times can increase Engagement on Facebook Using machine learning to understand characteristics of news content that garners "high" Fa

Jessica Miles 0 Sep 16, 2021
PyToch implementation of A Novel Self-supervised Learning Task Designed for Anomaly Segmentation

Self-Supervised Anomaly Segmentation Intorduction This is a PyToch implementation of A Novel Self-supervised Learning Task Designed for Anomaly Segmen

WuFan 2 Jan 27, 2022
MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks.

MVGCN MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks. Developer: Fu Hait

null 13 Dec 1, 2022
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

AI Secure 57 Dec 15, 2022
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

TorchMultimodal (Alpha Release) Introduction TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Meta Research 663 Jan 6, 2023
Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection

Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection (NimPme) The official implementation of Novel Instances Mining with

null 12 Sep 8, 2022
Multi-task Multi-agent Soft Actor Critic for SMAC

Multi-task Multi-agent Soft Actor Critic for SMAC Overview The CARE formulti-task: Multi-Task Reinforcement Learning with Context-based Representation

RuanJingqing 8 Sep 30, 2022
Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling

TGraM Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling, Qibin He, Xian Sun, Zhiyuan Yan, Beibei Li, Kun Fu Abstract Rece

Qibin He 6 Nov 25, 2022
Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

MultiMAE: Multi-modal Multi-task Masked Autoencoders Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir Website | arXiv | BibTeX Official PyTo

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 385 Jan 6, 2023