Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search
This is an implementation of our paper "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search". The code is adapted from the GitHub repository "pytorch implementation for ECCV2018 paper Deep Cross-Modal Projection Learning for Image-Text Matching".
Requirement
- Python 3.7
- PyTorch 1.0.0 & torchvision 0.2.1
- numpy
- matplotlib (only needed for plotting the result figures)
- scipy 1.2.1
- pytorch_transformers
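To make sure your environment matches the versions above, you can run a quick check like the following (this snippet is only a convenience and is not part of the repository):

```python
# Quick environment check against the versions listed above.
import sys

import numpy
import scipy
import torch
import torchvision
import pytorch_transformers

print("python               :", sys.version.split()[0])  # expected 3.7.x
print("torch                :", torch.__version__)        # expected 1.0.0
print("torchvision          :", getattr(torchvision, "__version__", "unknown"))  # expected 0.2.1
print("numpy                :", numpy.__version__)
print("scipy                :", scipy.__version__)        # expected 1.2.1
print("pytorch_transformers :", pytorch_transformers.__version__)
print("CUDA available       :", torch.cuda.is_available())
```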
Usage
Data Preparation
- Download the CUHK-PEDES dataset.
- Put reid_raw.json under project_directory/data/
- Run data.sh
- Copy the files test_reid.json, train_reid.json, and val_reid.json from CUHK-PEDES/data/ to project_directory/data/processed_data/
- Download the pretrained ResNet-50 model, the bert-base-uncased model, and its vocabulary to project_directory/pretrained/ (a sketch for this step follows the list)
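One possible way to fetch the bert-base-uncased weights and vocabulary, together with ImageNet-pretrained ResNet-50 weights, is through pytorch_transformers and torchvision. The sketch below caches them under project_directory/pretrained/; the exact file names the training code expects may differ, so adjust them as needed.

```python
# Sketch: cache the pretrained models under project_directory/pretrained/.
# The output file names below are assumptions; rename them to whatever the
# training code actually loads.
import os

import torch
import torchvision
from pytorch_transformers import BertModel, BertTokenizer

PRETRAINED_DIR = "pretrained"  # i.e. project_directory/pretrained/
os.makedirs(PRETRAINED_DIR, exist_ok=True)

# bert-base-uncased weights, config, and vocabulary
bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert.save_pretrained(PRETRAINED_DIR)       # writes pytorch_model.bin and config.json
tokenizer.save_pretrained(PRETRAINED_DIR)  # writes vocab.txt

# ImageNet-pretrained ResNet-50 weights
resnet50 = torchvision.models.resnet50(pretrained=True)
torch.save(resnet50.state_dict(), os.path.join(PRETRAINED_DIR, "resnet50.pth"))
```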
Training & Testing
Before training, set the parameter BASE_ROOT in the scripts to your project directory and IMAGE_DIR to the directory of the CUHK-PEDES dataset. Then run sh scripts/train.sh to train the model, and run sh scripts/test.sh to evaluate it. A small pre-flight check is sketched below.
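If you want to confirm the paths are set up correctly before launching the scripts, a minimal check could look like this (the paths below are placeholders for your local setup, not values taken from the repository):

```python
# Hypothetical pre-flight check: verify the directories referenced by the scripts
# exist, then launch training. Replace the placeholder paths with your own.
import os
import subprocess

BASE_ROOT = "/path/to/project_directory"  # same value as BASE_ROOT in scripts/train.sh
IMAGE_DIR = "/path/to/CUHK-PEDES"         # same value as IMAGE_DIR in scripts/train.sh

for path in (BASE_ROOT,
             IMAGE_DIR,
             os.path.join(BASE_ROOT, "data", "processed_data"),
             os.path.join(BASE_ROOT, "pretrained")):
    assert os.path.isdir(path), "missing directory: " + path

subprocess.run(["sh", "scripts/train.sh"], cwd=BASE_ROOT, check=True)
# After training:
# subprocess.run(["sh", "scripts/test.sh"], cwd=BASE_ROOT, check=True)
```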