YouRefIt: Embodied Reference Understanding with Language and Gesture
by Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu and Siyuan Huang
The IEEE International Conference on Computer Vision (ICCV), 2021
Introduction
We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. To tackle this problem, we introduce YouRefIt, a new crowd-sourced, real-world dataset of embodied reference.
For more details, please refer to our paper.
Checklist
- Image ERU
- Video ERU
Installation
The code was tested with the following environment: Ubuntu 18.04/20.04, Python 3.7/3.8, PyTorch 1.9.1. To install, run:
git clone https://github.com/yixchen/YouRefIt_ERU
pip install -r requirements.txt
Dataset
Download the YouRefIt dataset from the Dataset Request Page and put it under ./ln_data
Model weights
- Yolov3: download the pretrained model and place the file in ./saved_models by running
sh saved_models/yolov3_weights.sh
- More pretrained models are available on Google Drive and should also be placed in ./saved_models.
Make sure to put the files in the following structure:
|-- ROOT
|   |-- ln_data
|       |-- yourefit
|           |-- images
|           |-- paf
|           |-- saliency
|   |-- saved_models
|       |-- final_model_full.tar
|       |-- final_resc.tar
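A quick way to confirm the layout before training or evaluation is a small check script. The following is a minimal sketch, not part of the repository: the helper name `find_missing` and the constant `EXPECTED_PATHS` are hypothetical, and the paths are taken from the tree above, assuming the model folder is named saved_models as in the commands in this README.

```python
from pathlib import Path

# Paths copied from the directory tree in this README, relative to ROOT.
# Assumption: the model folder is "saved_models", as used by --resume.
EXPECTED_PATHS = [
    "ln_data/yourefit/images",
    "ln_data/yourefit/paf",
    "ln_data/yourefit/saliency",
    "saved_models/final_model_full.tar",
    "saved_models/final_resc.tar",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Calling `find_missing(".")` from the main folder returns an empty list when every expected folder and checkpoint is in place, and otherwise lists what still needs to be downloaded.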
Training
To train the model, run the following from the main folder, replacing gpu_id with the ID of the GPU to use.
python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id
Evaluation
To evaluate the model, run the following from the main folder. Use the --test flag to enable test mode.
python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
--resume saved_models/model.pth.tar \
--test
Evaluate Image ERU on our released model
To evaluate our full model with PAF and saliency features, run
python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
--resume saved_models/final_model_full.tar --use_paf --use_sal --large --test
To evaluate the baseline model that only takes images as input, run
python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
--resume saved_models/final_resc.tar --large --test
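The two released-model evaluations differ only in the checkpoint passed to --resume and in whether --use_paf and --use_sal are set. As an illustration, the sketch below assembles either command as an argument list. The helper `build_eval_command` is hypothetical and not part of the repository, it uses only the flags shown above, and the default GPU ID "0" is an assumption.

```python
def build_eval_command(checkpoint, gpu_id="0", use_paf=False, use_sal=False):
    """Build the train.py evaluation command as a list of arguments.

    Hypothetical helper: flag names are copied from this README;
    the default gpu_id of "0" is an assumption.
    """
    cmd = [
        "python", "train.py",
        "--data_root", "./ln_data/",
        "--dataset", "yourefit",
        "--gpu", str(gpu_id),
        "--resume", checkpoint,
    ]
    if use_paf:
        cmd.append("--use_paf")  # enable the PAF feature input
    if use_sal:
        cmd.append("--use_sal")  # enable the saliency feature input
    cmd += ["--large", "--test"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the main folder, for example with `build_eval_command("saved_models/final_model_full.tar", use_paf=True, use_sal=True)` for the full model.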
Evaluate the inference results on the test set at different IoU levels, changing the path accordingly:
python evaluate_results.py
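For readers unfamiliar with the metric, accuracy at an IoU level is the fraction of predicted boxes whose intersection over union with the ground-truth box reaches that level. The following is a minimal sketch of the idea and not the code in evaluate_results.py. It assumes boxes in (x1, y1, x2, y2) corner format, a match when IoU is greater than or equal to the threshold, and illustrative thresholds of 0.25, 0.5 and 0.75.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_thresholds(preds, gts, thresholds=(0.25, 0.5, 0.75)):
    """Fraction of predictions whose IoU with ground truth reaches each threshold.

    Assumption: thresholds and the >= comparison are illustrative choices.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    return {t: (sum(iou >= t for iou in ious) / n if n else 0.0) for t in thresholds}
```

Here `preds` and `gts` are parallel lists with one predicted and one ground-truth box per test sample; the result maps each threshold to an accuracy between 0 and 1.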
Citation
@inProceedings{chen2021yourefit,
title={YouRefIt: Embodied Reference Understanding with Language and Gesture},
author = {Chen, Yixin and Li, Qing and Kong, Deqian and Kei, Yik Lun and Zhu, Song-Chun and Gao, Tao and Zhu, Yixin and Huang, Siyuan},
booktitle={The IEEE International Conference on Computer Vision (ICCV)},
year={2021}
}
Acknowledgement
Our code is built on ReSC and we thank the authors for their hard work.