vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese
This GitHub repo contains our solution for the vieCap4H Challenge 2021. In detail, we use grid features as the visual representation and pre-train a BERT-based language model from the pre-trained PhoBERT model to obtain the language representation. In addition, we identify a suitable training schedule for the self-critical sequence training (SCST) technique to achieve the best results. Through experiments, we achieve an average BLEU score of 30.3% in the public-test round and 28.9% in the private-test round, ranking 3rd and 4th, respectively.
Figure 1. An overview of our solution based on RSTNet
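For context, SCST fine-tunes the captioning model with a policy-gradient loss that uses the greedy-decoded caption as a reward baseline. Below is a minimal sketch of that loss; model.sample, model.greedy, and cider are hypothetical helper names for illustration, not functions from this repo.

```python
import torch

# Minimal sketch of the SCST policy-gradient loss (Rennie et al., 2017).
# Assumed (hypothetical) helpers: model.sample() returns sampled captions
# with per-token log-probabilities, model.greedy() returns greedy-decoded
# baselines, and cider(captions, refs) returns a reward tensor of shape
# (batch,). None of these names come from this repo.
def scst_loss(model, images, refs, cider):
    sampled, log_probs = model.sample(images)   # stochastic rollouts
    with torch.no_grad():
        baseline = model.greedy(images)         # greedy baseline, no gradient

    # Reward = CIDEr(sampled) - CIDEr(greedy): rollouts that beat the
    # greedy baseline are reinforced, the rest are suppressed.
    reward = cider(sampled, refs) - cider(baseline, refs)   # (batch,)
    loss = -(reward.unsqueeze(1) * log_probs).mean()
    return loss
```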
1. Data preparation
The grid features of vieCap4H can be downloaded via the links below:
- X101:
- X152:
- X152++:
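Once downloaded, you can sanity-check a feature file. The sketch below assumes one .npy array per image with shape (number of grid cells, feature dim), e.g. 49 x 2048 for a 7x7 X101 grid; the actual file layout of the released features may differ.

```python
import numpy as np

# Load one grid-feature file and inspect its shape.
# The file name and .npy layout here are assumptions for illustration.
feats = np.load("features/000001.npy")
print(feats.shape)  # expected something like (49, 2048) for X101
```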
The dataset can be downloaded at https://aihub.vn/competitions/40. Annotations must be converted to COCO format; we have already converted them, and the converted file is available at:
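If you prefer to run the conversion yourself, the sketch below shows the general shape of COCO caption annotations. The field names assumed for the raw vieCap4H file ("file_name", "captions") are illustrative, not the repo's actual schema; adjust them to match the downloaded annotations.

```python
import json

def convert_to_coco(raw_path, out_path):
    # Assumed raw layout: a list of {"file_name": ..., "captions": [...]}.
    with open(raw_path, encoding="utf-8") as f:
        raw = json.load(f)

    coco = {"images": [], "annotations": []}
    ann_id = 0
    for img_id, item in enumerate(raw):
        coco["images"].append({"id": img_id, "file_name": item["file_name"]})
        for caption in item["captions"]:
            coco["annotations"].append(
                {"id": ann_id, "image_id": img_id, "caption": caption}
            )
            ann_id += 1

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(coco, f, ensure_ascii=False)
```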
2. Training
Pre-train the BERT-based language model from the pre-trained PhoBERT model:
python train_language.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40
Weights of the BERT-based model should appear in the folder saved_language_models.
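For reference, initializing from PhoBERT with the HuggingFace transformers library looks like the snippet below; train_language.py may load the checkpoint differently.

```python
from transformers import AutoModel, AutoTokenizer

# Load the public PhoBERT checkpoint that the language model starts from.
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
phobert = AutoModel.from_pretrained("vinai/phobert-base")
```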
Then, continue to train the Transformer model via the command below:
python train_transformer.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40
Weights of the Transformer model should appear in the folder saved_transformer_rstnet_models.
In both commands, <images path> is the data folder, <features path> is the path to the grid-features folder, and <annotations folder> is the path to the folder containing viecap4h-public-train.json.
3. Inference
The results can be obtained via the command below:
python test_viecap.py
4. Pre-trained models
To reproduce our results on the leaderboard, the two pre-trained models (the BERT-based language model and the Transformer model) can be downloaded via the links below:
Updating...