VAC_CSLR
This repo holds the code for the paper: Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021). [paper]
Prerequisites
- This project is implemented in PyTorch (>1.8), so please install PyTorch first.
- ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
- [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi tools to get sclite for evaluation. After installation, create a soft link to sclite:
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.
- [Optional] SeanNaren/warp-ctc: at the beginning of this research we adopted warp-ctc for supervision; we recently found that the PyTorch CTC implementation reaches similar results.
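For readers new to CTC, the step that turns frame-wise predictions into a gloss sequence can be sketched as below. This is a minimal greedy variant (merge consecutive repeats, then drop blanks), not the beam search that ctcdecode performs and not necessarily how this repo decodes; the function name, the blank index of 0, and the label ids are assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse per-frame labels into an output sequence.

    Assumes `frame_labels` already holds the most likely label id for
    each frame and that `blank` is the CTC blank id (hypothetical: 0).
    """
    # Step 1: merge runs of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks; repeats separated by a blank survive.
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    frames = [0, 3, 3, 0, 0, 5, 5, 5, 0, 3]
    print(ctc_greedy_collapse(frames))  # [3, 5, 3]
```

Beam search keeps several candidate sequences per frame instead of only the best label, which is why a dedicated decoder is listed as a prerequisite.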
Data Preparation
- Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
- After downloading the dataset, extract it to ./dataset/phoenix. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
- The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python data_preprocess.py --process-image --multiprocessing
Inference
We provide pretrained models for inference. You can download them from:
Backbone | WER on Dev | WER on Test | Pretrained model |
---|---|---|---|
ResNet18 | 21.2% | 22.3% | [Baidu] (passwd: qi83) [Dropbox] |
To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test
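The WER figures in the table are word error rates over gloss sequences. As a rough illustration of the metric, here is a minimal sketch that computes WER as the summed word-level edit distance divided by the number of reference words. It is not the repo's evaluation tool, and sclite may differ in details such as text normalization; the function names and example glosses are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER in percent for whitespace-separated glosses."""
    errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


if __name__ == "__main__":
    refs = ["MORGEN REGEN NORD", "HEUTE SONNE"]
    hyps = ["MORGEN NORD", "HEUTE WOLKE SONNE"]
    print(wer(refs, hyps))  # 40.0 (one deletion + one insertion over 5 words)
```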
Training
Configuration values are resolved in this order of priority: command line > config file > argparse default values. To train the SLR model on phoenix14, run the command below:
python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
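The priority order described above can be reproduced with plain argparse. The sketch below is an assumption about one way to get that behaviour, not a copy of main.py: the config file is represented by an already-parsed dict, and the `--batch-size` option and all default values are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Lowest priority: defaults declared on the arguments themselves.
    parser.add_argument("--work-dir", default="./work_dir/")
    parser.add_argument("--config", default=None)
    parser.add_argument("--batch-size", type=int, default=2)  # hypothetical option
    return parser


def resolve_args(argv, config_values):
    """Merge settings as: command line > config file > argparse defaults.

    `config_values` stands in for the parsed config file and uses
    argparse destination names (e.g. "batch_size", not "batch-size").
    """
    parser = build_parser()
    # Parser-level defaults override the per-argument defaults.
    parser.set_defaults(**config_values)
    # Anything given explicitly in argv overrides both.
    return parser.parse_args(argv)


if __name__ == "__main__":
    config = {"batch_size": 4, "work_dir": "./from_config/"}
    args = resolve_args(["--batch-size", "8"], config)
    print(args.batch_size, args.work_dir)  # 8 ./from_config/
```

A practical consequence is that a single value can be overridden for one run without editing the config file.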
Feature Extraction
We also provide a feature extraction function that extracts frame-wise features for other research purposes. It can be run with:
python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features
To Do List
- Pure python implemented evaluation tools.
- WAR and WER calculation scripts.
Citation
If you find this repo useful in your research, please consider citing:
@InProceedings{Min_2021_ICCV,
author = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
title = {Visual Alignment Constraint for Continuous Sign Language Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {11542-11551}
}
Relevant paper
Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]
@InProceedings{Hao_2021_ICCV,
author = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
title = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {11303-11312}
}
Acknowledgements
We appreciate the help from Runpeng Cui, Hao Zhou@Rhythmblue and Xinzhe Han@GeraldHan :)