This repo holds the code for the ICCV 2021 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

Overview

VAC_CSLR




Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.

  • [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
    We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.

  • [Optional] SeanNaren/warp-ctc: At the beginning of this research, we adopted warp-ctc for supervision, and we recently found that PyTorch's built-in CTC loss reaches similar results.
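ctcdecode performs beam search over the per-frame gloss probabilities produced by the network. As a simpler point of reference, the sketch below shows best-path (greedy) CTC decoding in plain Python; note that this is a different, usually slightly weaker decoder than beam search, not what the repo uses. It assumes the blank symbol has index 0 and that the output is a list of per-frame probability vectors; `greedy_ctc_decode` is a hypothetical helper, not part of this repo.

```python
from typing import List, Optional, Sequence

BLANK_ID = 0  # assumption: index 0 is reserved for the CTC blank


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]], blank_id: int = BLANK_ID) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. pick the most likely class for every frame
    best_path = [max(range(len(probs)), key=probs.__getitem__) for probs in frame_probs]
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        # 2. emit a label only if it differs from the previous frame's label and is
        #    not the blank; a blank between two equal labels therefore keeps both
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # 5 frames over 3 classes (0 = blank, 1 and 2 = glosses)
    probs = [
        [0.1, 0.8, 0.1],    # 1
        [0.2, 0.7, 0.1],    # 1 (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # 1 (after a blank, emitted again)
        [0.1, 0.1, 0.8],    # 2
    ]
    print(greedy_ctc_decode(probs))  # [1, 1, 2]
```

Beam search keeps several partial hypotheses per frame instead of committing to the single argmax path, which is why ctcdecode is preferred for the reported results.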

Data Preparation

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. After the download finishes, extract the dataset to ./dataset/phoenix2014; we suggest making a soft link to the downloaded dataset:
    ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

  3. The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
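Step 3 generates a gloss dictionary alongside the resized images. The sketch below illustrates what such a dictionary does, under assumptions that are not taken from data_preprocess.py: each annotation is a space-separated gloss string, glosses are numbered in sorted order, and id 0 is left free for the CTC blank. `build_gloss_dict` and `encode` are hypothetical helpers, and the toy glosses are illustrative only.

```python
from typing import Dict, Iterable, List


def build_gloss_dict(annotations: Iterable[str]) -> Dict[str, int]:
    """Map every gloss seen in the annotations to an integer id, starting at 1."""
    glosses = set()
    for sentence in annotations:
        # assumption: each annotation is a whitespace-separated gloss sequence
        glosses.update(sentence.split())
    # sort for a deterministic mapping; id 0 is kept for the CTC blank (assumption)
    return {gloss: idx for idx, gloss in enumerate(sorted(glosses), start=1)}


def encode(sentence: str, gloss_dict: Dict[str, int]) -> List[int]:
    """Convert one annotation into gloss ids (an unknown gloss raises KeyError)."""
    return [gloss_dict[g] for g in sentence.split()]


if __name__ == "__main__":
    train = ["REGEN MORGEN NORD", "MORGEN SONNE"]  # toy annotations
    vocab = build_gloss_dict(train)
    print(vocab)                          # {'MORGEN': 1, 'NORD': 2, 'REGEN': 3, 'SONNE': 4}
    print(encode("SONNE MORGEN", vocab))  # [4, 1]
```

Building the dictionary only from training annotations means dev/test glosses unseen in training cannot be encoded; how the repo handles that case is not covered here.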

Inference

We provide pretrained models for inference; you can download them from:

Backbone    WER on Dev    WER on Test    Pretrained model
ResNet18    21.2%         22.3%          [Baidu] (passwd: qi83), [Dropbox]

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test
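The WER figures above are word error rates: (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment between the hypothesis and reference gloss sequences, and N is the number of reference glosses. sclite computes this together with more detailed statistics. The following is a minimal pure-Python sketch of the same quantity, assuming corpus-level aggregation (errors and reference lengths summed over all sentences); it is not the repo's own evaluation tool, and `edit_distance` and `corpus_wer` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference gloss missing from hyp
                curr[j - 1] + 1,     # insertion: extra gloss in hyp
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """WER in percent over (reference, hypothesis) sentence pairs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    ref_words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * errors / ref_words


if __name__ == "__main__":
    pairs = [
        ("MORGEN REGEN NORD", "MORGEN SONNE NORD"),  # 1 substitution
        ("HEUTE WARM", "HEUTE"),                     # 1 deletion
    ]
    print(f"{corpus_wer(pairs):.1f}%")  # 40.0%
```

Summing errors before dividing weights long sentences more heavily than averaging per-sentence WERs would, which matches how corpus-level WER is usually reported.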

Training

Configuration values are resolved with the following priority: command line > config file > argparse defaults. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
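One way to obtain the priority order described above is a two-pass parse: read --config first, load that file's values into the parser with `set_defaults`, then parse the command line again so explicit flags win. The sketch below illustrates this pattern with a few hypothetical options and default values and a pluggable `load_config` callable (in practice it would parse the config file); it demonstrates the priority rule and is not a copy of main.py.

```python
import argparse
from typing import Callable, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sketch of config priority")
    parser.add_argument("--config", default=None)
    # hypothetical options and defaults, for illustration only
    parser.add_argument("--batch-size", type=int, default=2)
    parser.add_argument("--num-worker", type=int, default=10)
    parser.add_argument("--device", default="0")
    return parser


def resolve_args(argv: List[str], load_config: Callable[[str], Dict]) -> argparse.Namespace:
    parser = build_parser()
    # pass 1: only needed to find out which config file was requested
    first_pass = parser.parse_args(argv)
    if first_pass.config is not None:
        # config-file values replace argparse defaults (keys must match dest names)
        parser.set_defaults(**load_config(first_pass.config))
    # pass 2: flags given explicitly on the command line override both
    return parser.parse_args(argv)


if __name__ == "__main__":
    fake_file = {"batch_size": 1, "device": "0,1,2"}  # stands in for a parsed config file
    args = resolve_args(["--config", "baseline.yaml", "--device", "1"], lambda path: fake_file)
    print(args.batch_size, args.num_worker, args.device)  # 1 10 1
```

Here batch_size comes from the config file, num_worker falls back to the argparse default, and device is taken from the command line.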

Feature Extraction

We also provide a feature extraction function that extracts frame-wise features for other research purposes:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features

To Do List

  • Evaluation tools implemented in pure Python.
  • WAR and WER calculation scripts.

Citation

If you find this repo useful in your research, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Relevant paper

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledgements

We appreciate the help from Runpeng Cui, Hao Zhou (@Rhythmblue), and Xinzhe Han (@GeraldHan) :)

Comments
  • Final accuracy

    I want to make sure: you report 22.1 Dev WER and 23.0 Test WER, while the released pretrained model has 21.2 Dev WER and 22.3 Test WER? Thanks in advance for your response!

    opened by hulianyuyy
  • Are there plans to supplement the code on the CSL dataset?

    Thank you very much for your contribution to the community. In the paper, I saw that experiments were carried out on both the PHOENIX14 dataset and the CSL dataset. I would like to ask whether there are plans to add the data processing and training code for the CSL dataset.

    opened by HW140701
  • Error when I try to run inference

    Hello, I'm replicating this model, but when I execute the inference command an unknown error appears, and I don't know why. My setup is:

    • RTX 3060ti
    • 16GB RAM
    • Ryzen 7 5800X

    The complete error is:

    Traceback (most recent call last):
      File "main.py", line 209, in <module>
        processor.start()
      File "main.py", line 61, in start
        dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
        ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
        framewise = self.masked_bn(inputs, len_x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
        x = self.conv2d(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
        return self._forward_impl(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 233, in _forward_impl
        x = self.bn1(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
        return F.batch_norm(
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/functional.py", line 2149, in batch_norm
        return torch.batch_norm(
    RuntimeError: CUDA error: unknown error
    

    And I have changed the config file:
    -batch_size: 2
    +batch_size: 1
    -test_batch_size: 8
    -num_worker: 10
    -device: 0,1,2
    +test_batch_size: 1
    +num_worker: 1
    +device: 0

    Also, my torch version is 1.8.1+cu111.

    Thank you for the help!

    UPDATE

    I also found this error:

    Traceback (most recent call last):
      File "main.py", line 209, in <module>
        processor.start()
      File "main.py", line 61, in start
        dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/seq_scripts.py", line 56, in seq_eval
        ret_dict = model(vid, vid_lgt, label=label, label_lgt=label_lgt)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 63, in forward
        framewise = self.masked_bn(inputs, len_x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/VAC_CSLR/slr_network.py", line 53, in masked_bn
        x = self.conv2d(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 249, in forward
        return self._forward_impl(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
        x = self.conv1(x)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/mnt/d/Universidad/Python_Envs/TFG/VAC/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: CUDA error: unknown error
    

    with the following config:
    -batch_size: 2
    +batch_size: 1
     random_seed: 0
    -test_batch_size: 8
    -num_worker: 10
    -device: 0,1,2
    +test_batch_size: 2
    +num_worker: 2
    +device: 0

    opened by JoseMoFi
  • Issue about alignment between labels and frames

    Thanks for your great work. I'm wondering how to draw a figure like Fig. 5 in your paper. The key point is how to align labels with frames. Could you provide some advice? Thanks in advance!

    opened by hulianyuyy
  • Time to train

    Hello, great work with this paper and repo! I would like to ask how much time you spent training the model (for the dataset Phoenix12) and what kind of GPU you used for training. I am trying to replicate it on another dataset (specifically Phoenix14-T), and in my first test it took around 14 h to train 10 epochs. I used a Titan Xp with 12 GB and a batch size of 1.

    Thank you again for your work, and congratulations on this repo.

    opened by JoseMoFi
  • How to solve this error when training the model? I look forward to your answer

    Traceback (most recent call last):
      File "main.py", line 211, in <module>
        processor.start()
      File "main.py", line 44, in start
        seq_train(self.data_loader['train'], self.model, self.optimizer,
      File "/home/linux/data2/sun/VAC_CSLR/seq_scripts.py", line 18, in seq_train
        for batch_idx, data in enumerate(tqdm(loader)):
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
        for obj in iterable:
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
        data = self._next_data()
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
        return self._process_data(data)
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
        data.reraise()
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
        raise exception
    IndexError: Caught IndexError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/linux/anaconda3/envs/ssn/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 47, in __getitem__
        input_data, label = self.normalize(input_data, label)
      File "/home/linux/data2/sun/VAC_CSLR/dataset/dataloader_video.py", line 78, in normalize
        video, label = self.data_aug(video, label, file_id)
      File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 24, in __call__
        image = t(image)
      File "/home/linux/data2/sun/VAC_CSLR/utils/video_augmentation.py", line 119, in __call__
        if isinstance(clip[0], np.ndarray):
    IndexError: list index out of range

    opened by sunsn1997
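  • Inconsistent results when reproducing the baseline

    Hello, I have some questions about the experiment code. In Table 3 of your paper, the baseline result on Dev is 25.4. I tried to reproduce it by removing ConvCTC and Dist from the loss in the code, but at epoch 40 alone I got WER = 24.8%, and the final result differs considerably from the one in Table 3. Is this because I overlooked some parts that should also be removed?

    log.txt config.txt

    opened by miaomiao9miao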
  • Weird glosses in the annotations of the phoenix dataset

    Hi @ycmin95, I recently checked the annotations of the phoenix dataset and the gloss dictionary generated during data preparation. There are many weird glosses, such as "ON", "OFF", "LEFTHAND" ... I wonder whether we should keep these weird glosses in the labels. Any advice?

    opened by sunke123
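  • Why can the vocab used to initialize ctcdecode be generated with chr(20000-21296)?

    Your work is excellent! According to the ctcdecode documentation, the vocab should be initialized with the dictionary to be decoded, so why does the code work with chr(20000+(0~1296))? Is the number 20000 special? In addition, Fig. 5 in your paper shows how the model-generated labels align with the ground truth and the video, but with ctcdecode I can only generate labels, not alignment annotations. Does this part require additional code? Looking forward to your reply!

    opened by blankspark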
  • Pseudo Label

    I'm wondering how to assign labels to frames with CTC loss. It seems CTC loss can be viewed as a sequence of softmax losses, but the key point is how to obtain pseudo labels for frames via backpropagation. Thanks in advance!

    opened by hulianyuyy
  • Finetuning and continuing training

    Hello, thank you for the awesome work. I am trying to use the model on another dataset, so I figure I should structure my data according to the phoenix2014 format. Is there anything else I should worry about, or is running the preprocessing on the same structure enough?

    Also, since I am training on Google Colab, I won't be able to train for 80 epochs consecutively and plan to split training into several runs. Is there a built-in function to load the previous model and continue training (or to finetune the pretrained model), or how should I approach this problem? I am not sure whether the --load-weights flag is enough. Thank you so much.

    opened by khoapip
  • Video augmentation methods for the pretrained model

    Which video augmentation options were used for the pretrained model ([Dropbox])? In dataset/dataloader_video.py I can see which ones are uncommented; is that the configuration used for the pretrained model?

    opened by Aayush2007
  • Question about a CPU/GPU device error

    I ran your code and got the following error. Where are the parameters moved to the GPU?

    Traceback (most recent call last):
      File "main.py", line 218, in <module>
        processor.start()
      File "main.py", line 46, in start
        seq_train(self.data_loader['train'], self.model, self.optimizer,self.device, epoch, self.recoder)
      File "/home/quchunguang/sunday/CSLR/seq_scripts.py", line 24, in seq_train
        loss = model.criterion_calculation(ret_dict, label, label_lgt)
      File "/home/quchunguang/sunday/CSLR/slr_network.py", line 96, in criterion_calculation
        label_lgt.cpu().int()).mean()
      File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1295, in forward
        self.zero_infinity)
      File "/home/quchunguang/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/functional.py", line 1767, in ctc_loss
        zero_infinity)
    RuntimeError: Tensor for argument #2 'targets' is on CPU, but expected it to be on GPU (while checking arguments for ctc_loss_gpu)

    opened by chunguangqu
Owner
Yuecong Min
CS Ph.D. candidate, Computer Vision