Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

Overview

transformer-slt

This repository gathers data and code supporting the experiments in the paper Better Sign Language Translation with STMC-Transformer.

Installation

This code is based on OpenNMT v1.0.0 and requires all of its dependencies (including torch==1.6.0). The only additional requirement is NLTK, used for the NMT evaluation metrics.

The recommended way to install is shown below:

# create a new virtual environment
virtualenv --python=python3 venv
source venv/bin/activate

# clone the repo
git clone https://github.com/kayoyin/transformer-slt.git
cd transformer-slt

# install python dependencies
pip install -r requirements.txt

# install OpenNMT-py
python setup.py install
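
To sanity-check the installation, the following should run without errors (torch should report the pinned 1.6.0):

# verify the environment
python -c "import torch; print(torch.__version__)"
python -c "import onmt; print('OpenNMT-py OK')"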

Sample Usage

Data processing

onmt_preprocess -train_src data/phoenix2014T.train.gloss -train_tgt data/phoenix2014T.train.de -valid_src data/phoenix2014T.dev.gloss -valid_tgt data/phoenix2014T.dev.de -save_data data/dgs -lower 
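
If preprocessing succeeds, OpenNMT-py v1.0 writes sharded datasets and a vocabulary file under the -save_data prefix, roughly as below (exact shard names may vary). The training step reads them via -data data/dgs, so a missing data/dgs.vocab.pt means this step was skipped or failed:

# expected artifacts after preprocessing (approximate)
ls data/dgs*
# data/dgs.train.0.pt  data/dgs.valid.0.pt  data/dgs.vocab.pt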

Training

python  train.py -data data/dgs -save_model model -keep_checkpoint 1 \
          -layers 2 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  \
          -encoder_type transformer -decoder_type transformer -position_encoding \
          -max_generator_batches 2 -dropout 0.1 \
          -early_stopping 3 -early_stopping_criteria accuracy ppl \
          -batch_size 2048 -accum_count 3 -batch_type tokens -normalization tokens \
          -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 3000 -learning_rate 0.5 \
          -max_grad_norm 0 -param_init 0  -param_init_glorot \
          -label_smoothing 0.1 -valid_steps 100 -save_checkpoint_steps 100 \
          -world_size 1 -gpu_ranks 0
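
With -save_model model and -save_checkpoint_steps 100, checkpoints are written with the step number in the name:

# checkpoints produced during training (step numbers depend on your run)
ls model_step_*.pt
# model_step_100.pt  model_step_200.pt  ...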

Inference

python translate.py -model model [model2 model3 ...] -src data/phoenix2014T.test.gloss -output pred.txt -gpu 0 -replace_unk -beam_size 4
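
Here [model2 model3 ...] is an optional placeholder for ensembling several checkpoints, not a literal argument. A minimal single-checkpoint example (the step number is illustrative; use one produced by your own training run):

python translate.py -model model_step_100.pt -src data/phoenix2014T.test.gloss -output pred.txt -gpu 0 -replace_unk -beam_size 4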

Scoring

# BLEU-1,2,3,4
python tools/bleu.py 1 pred.txt data/phoenix2014T.test.de
python tools/bleu.py 2 pred.txt data/phoenix2014T.test.de
python tools/bleu.py 3 pred.txt data/phoenix2014T.test.de
python tools/bleu.py 4 pred.txt data/phoenix2014T.test.de

# ROUGE
python tools/rouge.py pred.txt data/phoenix2014T.test.de

# METEOR
python tools/meteor.py pred.txt data/phoenix2014T.test.de
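
As a cross-check, predictions can also be scored directly with NLTK (already a dependency). A minimal sketch, assuming one whitespace-tokenized sentence per line in both files; scores may differ slightly from tools/bleu.py due to tokenization and smoothing choices:

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# one hypothesis per line; each reference wrapped in a list (one reference per sentence)
with open("pred.txt") as f:
    hyps = [line.strip().split() for line in f]
with open("data/phoenix2014T.test.de") as f:
    refs = [[line.strip().split()] for line in f]

smooth = SmoothingFunction().method4  # avoid zero scores on short sentences
print("BLEU-4:", corpus_bleu(refs, hyps, smoothing_function=smooth))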

To dos:

  • Add configurations & steps to recreate paper results

Reference

Please cite the paper below if you found the resources in this repository useful:

@inproceedings{yin-read-2020-better,
    title = "Better Sign Language Translation with {STMC}-Transformer",
    author = "Yin, Kayo  and
      Read, Jesse",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.525",
    doi = "10.18653/v1/2020.coling-main.525",
    pages = "5975--5989",
    abstract = "Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. This paper focuses on the translation system and introduces the STMC-Transformer which improves on the current state-of-the-art by over 5 and 7 BLEU respectively on gloss-to-text and video-to-text translation of the PHOENIX-Weather 2014T dataset. On the ASLG-PC12 corpus, we report an increase of over 16 BLEU. We also demonstrate the problem in current methods that rely on gloss supervision. The video-to-text translation of our STMC-Transformer outperforms translation of GT glosses. This contradicts previous claims that GT gloss translation acts as an upper bound for SLT performance and reveals that glosses are an inefficient representation of sign language. For future SLT research, we therefore suggest an end-to-end training of the recognition and translation models, or using a different sign language annotation scheme.",
}
Comments
  • Error in inference (google colab)

    First, congratulations on this great job.

    I'm trying to replicate your experiments using Google Colab. I managed to do the training, but when I do the inference it triggers an error: RuntimeError: result type Float can't be cast to the desired output type Long

    Could you help me resolve this error?

    Thank you for your attention! Congratulations again on the work!

    opened by YanSoares 3
  • Doubt regarding SLT models

    From Table 9 (in Section 6.2) of the paper, I am confused about the difference between the S2G->G2T model and the Transformer model (S2G2T?). Could you please clarify the exact difference between these two?

    opened by hshreeshail 1
  • Type of SLT Model

    Just wanted to confirm that STMC-Transformer and STMC-RNN are both S2G->G2T models and NOT S2G2T models. My understanding of these terms is as follows:

    • S2G->G2T: the S2G model is trained first, and then the G2T model is trained on the predictions of the S2G model.
    • S2G2T: the S2G and G2T models are trained jointly.

    Please correct me if I am wrong.

    opened by hshreeshail 1
  • SyntaxError: invalid syntax

    Dear sir, I want to run the experiment on Colab following your README procedures, but I have encountered some weird problems. I first ran the following commands:

    ! pip install virtualenv
    ! virtualenv --python=python3 venv
    ! source venv/bin/activate
    ! pip install torch==1.6.0
    ! git clone https://github.com/kayoyin/transformer-slt.git
    ! pip install -r requirements.txt
    ! python setup.py install

    Those all run fine, but after I copy your onmt_preprocess command, which is

    onmt_preprocess -train_src data/phoenix2014T.train.gloss -train_tgt data/phoenix2014T.train.de -valid_src data/phoenix2014T.dev.gloss -valid_tgt data/phoenix2014T.dev.de -save_data data/dgs -lower

    It returns the error:

    File "", line 1 onmt_preprocess -train_src data/phoenix2014T.train.gloss -train_tgt data/phoenix2014T.train.de -valid_src data/phoenix2014T.dev.gloss -valid_tgt data/phoenix2014T.dev.de -save_data data/dgs -lower ^ SyntaxError: invalid syntax

    I really don't know why this happens. Could you help me?
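
    A likely explanation, offered as an assumption: in a Colab cell, onmt_preprocess is a shell command, so a bare line is parsed as Python and fails with this SyntaxError. Prefixing it with ! runs it in the shell, like the other commands above:

    ! onmt_preprocess -train_src data/phoenix2014T.train.gloss -train_tgt data/phoenix2014T.train.de -valid_src data/phoenix2014T.dev.gloss -valid_tgt data/phoenix2014T.dev.de -save_data data/dgs -lower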

    opened by kokolerk 1
  • STMC network for SLT

    Hi, I have read your work carefully and it seems very well done, congratulations. I am very interested in replicating your results in an end-to-end manner, and subsequently I would like to adapt the work to an Italian Sign Language dataset. Your work is based on two steps: first, translation from sign language to glosses through the STMC network; then the glosses are translated into text through the Transformer. Would it be possible to share the code for the first part, in which the STMC network is defined and trained? I'm really interested in the mechanism that takes videos in and translates them into glosses in a continuous way.

    Thanks, Enrico

    opened by enrico310786 1
  • Example of config_file

    I am trying to run the server, however a JSON file is required in def start(self, config_file). I was wondering whether you could provide an example of this file so I can see the structure and type of data that is needed. Thank you very much.

    opened by mcabezag 1
  • The size of the train vocabulary

    The paper reports a train gloss vocabulary size of 1066, but I get 1232 when building the vocabulary from "phoenix2014T.train.gloss".

    Here's the code to find the train vocabulary size.

    
    # count unique whitespace-separated tokens in the training glosses
    vocab = {}

    with open("phoenix2014T.train.gloss", "r") as f:
        lines = f.readlines()

    for sentence in lines:
        for word in sentence.strip().split(" "):
            if word in vocab:
                continue
            vocab[word] = len(vocab)

    print(len(vocab))
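
    A hedged variant for comparison: the onmt_preprocess command in this README passes -lower, so a case-folded count may land closer to (though not necessarily at) the paper's figure:

    # case-folded count, mirroring the -lower flag used during preprocessing
    with open("phoenix2014T.train.gloss") as f:
        lowered = {w.lower() for line in f for w in line.strip().split()}
    print(len(lowered))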
    
    
    opened by NiwakaDev 1
  • Error while training and testing

    Hi, thanks for this open source repo. I am trying to execute the commands given in the README but am receiving the following error. Any help would be appreciated.

    (py36torch) E:\transformer-slt>python  train.py -data data/dgs -save_model model -keep_checkpoint 1 -layers 2 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8  -encoder_type transformer -decoder_type transformer -position_encoding -max_generator_batches 2 -dropout 0.1 -early_stopping 3 -early_stopping_criteria accuracy ppl -batch_size 2048 -accum_count 3 -batch_type tokens -normalization tokens -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 3000 -learning_rate 0.5 -max_grad_norm 0 -param_init 0  -param_init_glorot -label_smoothing 0.1 -valid_steps 100 -save_checkpoint_steps 100 -world_size 1 -gpu_ranks 0
    Traceback (most recent call last):
      File "train.py", line 6, in <module>
        main()
      File "E:\transformer-slt\onmt\bin\train.py", line 204, in main
        train(opt)
      File "E:\transformer-slt\onmt\bin\train.py", line 35, in train
        vocab = torch.load(opt.data + '.vocab.pt')
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 584, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 234, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 215, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'data/dgs.vocab.pt'
    

    When i execute the testing command i am getting the following error:

    (py36torch) E:\transformer-slt>python translate.py -model model [model2 model3 ...] -src data/phoenix2014T.test.gloss -output pred.txt -gpu 0 -replace_unk -beam_size 4
    Traceback (most recent call last):
      File "translate.py", line 6, in <module>
        main()
      File "E:\transformer-slt\onmt\bin\translate.py", line 48, in main
        translate(opt)
      File "E:\transformer-slt\onmt\bin\translate.py", line 18, in translate
        translator = build_translator(opt, report_score=True)
      File "E:\transformer-slt\onmt\translate\translator.py", line 28, in build_translator
        fields, model, model_opt = load_test_model(opt)
      File "E:\transformer-slt\onmt\decoders\ensemble.py", line 130, in load_test_model
        onmt.model_builder.load_test_model(opt, model_path=model_path)
      File "E:\transformer-slt\onmt\model_builder.py", line 96, in load_test_model
        map_location=lambda storage, loc: storage)
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 584, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 234, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "E:\Envs\py36torch\lib\site-packages\torch\serialization.py", line 215, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'model'
    

    Please suggest the solution

    opened by thisisashukla 1
  • Results mismatch from paper

    I am assuming you used vanilla embeddings in your first ablation experiment on the number of encoder-decoder layers. Moreover, the paper states that it uses #layers=2 for further experiments. In that case, shouldn't the results in Table 3 (for #layers=2) match the results in Table 5 (for vanilla embeddings)? For example, the BLEU-4 score on the test set is 21.65 in Table 3 but 22.22 in Table 5. Shouldn't these be the same?

    opened by hshreeshail 0
  • Conversion to tflite

    I'm trying to convert the PyTorch model to a TFLite model, but I'm facing an issue providing the dummy model input to the torch.onnx.export() method. Do you know what the dummy model input should be?

    Here is what my code looks like:

    import torch
    import torch.nn as nn
    import torch.onnx
    import torchvision
    from onmt.model_builder import build_base_model
    from onmt.utils.parse import ArgumentParser

    checkpoint = torch.load('model_step_1600.pt')

    model_opt = ArgumentParser.ckpt_model_opts(checkpoint['opt'])
    ArgumentParser.update_model_opts(model_opt)
    ArgumentParser.validate_model_opts(model_opt)
    vocab = checkpoint['vocab']
    fields = vocab

    model = build_base_model(model_opt, fields, None, checkpoint)
    model.eval()

    dummy_input = torch.randn(1, 3, 224, 224, requires_grad=True)
    # dummy_input = torch.from_numpy(X_test[0].reshape(1, -1)).float().to(device)

    torch.onnx.export(model, dummy_input, 'model_simple.onnx')
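
    A hedged sketch of a text-shaped dummy input, assuming OpenNMT-py's NMTModel.forward(src, tgt, lengths) signature: the model consumes LongTensors of token indices, not an image-shaped float tensor, so export may get further with inputs like these (the sizes and the vocabulary bound 100 are placeholders, and export can still fail on unsupported ops):

    # hypothetical text-shaped inputs: (seq_len, batch, 1) token ids plus lengths
    src_len, batch = 10, 1
    src = torch.randint(0, 100, (src_len, batch, 1), dtype=torch.long)
    tgt = torch.randint(0, 100, (src_len, batch, 1), dtype=torch.long)
    lengths = torch.tensor([src_len])

    torch.onnx.export(model, (src, tgt, lengths), 'model_simple.onnx')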

    opened by Aayush2007 0
Owner

Kayo Yin
Grad student at CMU LTI @neulab researching multilingual NLP (spoken + signed languages)