Overview of the Code
- BaseColab/MLDL_FPAR.pdf: contains the full explanation of our work
- Base Colab: contains the base Colab notebook used to perform all the training required for the project
- Script: contains the scripts used in Colab to call the correct module
- module.py: the Python source files
Imprinting The Motion
Abstract
First person action recognition (FPAR) is one of the most challenging tasks in the action recognition field. Most existing works address it with two-stream architectures that exploit both the visual appearance and the motion information of the object of interest. In this paper, we take as our starting point the Ego-RNN architecture extended with the Motion Segmentation (MS) auxiliary task. We propose injecting a new branch into the architecture in order to employ the motion information more effectively, which leads to better predictions.
Our Architecture
The Action Recognition Block extracts important spatial and temporal information from the video by exploiting a ResNet-34 backbone (mustard), a Spatial Attention Layer (yellow) and a ConvLSTM (orange). Moreover, it takes advantage of the auxiliary task of the Motion Prediction Block by embedding its knowledge inside the first layers of the backbone. This is done through a feedback branch (blue) that takes as input the features of the motion segmentation (MS) task (green). The Motion Prediction Block takes as input the appearance features from layer 4 of the ResNet and tries to identify which parts of the image are going to move.
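Below is a minimal PyTorch sketch of the two blocks described above, assuming the standard torchvision ResNet-34. It is not the actual implementation in module.py: the class and parameter names (FeedbackFPAR, ms_feat, feedback_proj, ...) are made up for illustration, the CAM-based spatial attention of Ego-RNN is replaced by a simple learned saliency map, and the feedback branch is interpreted as projecting the previous frame's MS features and adding them to the layer-1 activations of the current frame.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet34


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the four gates are computed by a single convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class FeedbackFPAR(nn.Module):
    """Hypothetical sketch of the Action Recognition + Motion Prediction blocks."""
    def __init__(self, num_classes=61, hid_ch=512):
        super().__init__()
        base = resnet34(weights=None)  # mustard: ResNet-34 backbone (normally ImageNet-pretrained)
        self.stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.layer1, self.layer2 = base.layer1, base.layer2
        self.layer3, self.layer4 = base.layer3, base.layer4

        self.attention = nn.Conv2d(512, 1, 1)        # yellow: simplified spatial attention
        self.conv_lstm = ConvLSTMCell(512, hid_ch)   # orange: temporal aggregation
        self.classifier = nn.Linear(hid_ch, num_classes)

        self.ms_feat = nn.Conv2d(512, 100, 1)        # green: MS features from layer-4 output
        self.ms_out = nn.Conv2d(100, 2, 1)           # per-location "will move / static" logits
        self.feedback_proj = nn.Conv2d(100, 64, 1)   # blue: feedback branch toward layer 1

    def forward(self, clip):
        # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        h = clip.new_zeros(b, self.conv_lstm.hid_ch, 7, 7)
        c = torch.zeros_like(h)
        ms_logits, prev_ms = [], None

        for step in range(t):
            x = self.layer1(self.stem(clip[:, step]))            # (b, 64, 56, 56)
            if prev_ms is not None:
                # "imprint the motion": merge MS features into the first layers
                fb = F.interpolate(self.feedback_proj(prev_ms), size=x.shape[-2:],
                                   mode='bilinear', align_corners=False)
                x = x + fb
            x = self.layer4(self.layer3(self.layer2(x)))         # (b, 512, 7, 7)

            attn = torch.sigmoid(self.attention(x))              # spatial attention map
            h, c = self.conv_lstm(x * attn, (h, c))              # temporal aggregation

            prev_ms = self.ms_feat(x)                            # motion-segmentation features
            ms_logits.append(self.ms_out(F.relu(prev_ms)))       # (b, 2, 7, 7)

        logits = self.classifier(F.adaptive_avg_pool2d(h, 1).flatten(1))
        return logits, torch.stack(ms_logits, dim=1)
```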
Results
Our architecture is able to further exploit the motion information provided by the motion segmentation task by merging it with the appearance features in the first layers of the backbone. The result is a model that focuses better on the elements relevant for action recognition, which leads to the correct prediction (shake tea cup instead of stir spoon cup).
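A minimal usage sketch of the hypothetical FeedbackFPAR module above, showing how a prediction would be obtained for a short clip:

```python
model = FeedbackFPAR(num_classes=61).eval()      # number of classes depends on the dataset
clip = torch.randn(1, 7, 3, 224, 224)            # dummy clip: (batch, frames, C, H, W)
with torch.no_grad():
    action_logits, ms_maps = model(clip)
pred = action_logits.argmax(dim=1)               # index of the predicted action class
print(pred.shape, ms_maps.shape)                 # torch.Size([1]) torch.Size([1, 7, 2, 7, 7])
```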