Code for CVPR2019 paper《Unequal Training for Deep Face Recognition with Long Tailed Noisy Data》

Overview

Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data.

This is the code of CVPR 2019 paper《Unequal Training for Deep Face Recognition with Long Tailed Noisy Data》.

arch

Usage Instructions

  1. The code is adopted from InsightFace. I sincerely appreciate for their contributions.

  2. Our method need two stage training, therefore the code is also stepwise. I will be happy if my humble code would help you. If there are questions or issues, please let me know.

Note:

  1. Our method is appropriate for the noisy data with long-tailed distribution such as MF2 training dataset. When the training data is good, like MS1M and VGGFace2, InsightFace is more suitable.

  2. We use the last arcface model (best performance) to find the third type noise. Next we drop the fc weight of the last arcface model, then finetune from it using NR loss (adding a reweight term by putting more confidence in the prediction of the training model).

  3. The second stage training process need very careful manual tuning. We provide our training log for reference.

Prepare the code and the data.

  1. Install MXNet with GPU support (Python 2.7).
pip install mxnet-cu90
  1. download the code as unequal_code/
git clone https://github.com/zhongyy/Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data.git
  1. download the MF2 training dataset(password: w9y5) and the evaluation dataset, then place them in unequal_code/MF2_pic9_head/ unequal_code/MF2_pic9_tail/ and unequal_code/eval_dataset/ respectively.

step 1: Pretrain MF2_pic9_head with ArcFace.

End it when the acc of validation dataset (lfw,cfp-fp and agedb-30) does not ascend.

CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network r50 --loss-type 4  --margin-m 0.5 --data-dir ./MF2_pic9_head/ --end-epoch 40 --per-batch-size 100 --prefix ../models/r50_arc_pic9/model 2>&1|tee r50_arc_pic9.log

step 2: Train the head data with NRA (finetune from step 1).

  1. Once the model_t,0 is saved, end it.
CUDA_VISIBLE_DEVICES='0,1' python -u train_NR_savemodel.py --network r50 --loss-type 4 --margin-m 0.5 --data-dir ./MF2_pic9_head/ --end-epoch 1 --lr 0.01  --per-batch-size 100 --noise-beta 0.9 --prefix ../models/NRA_r50pic9/model_t --bin-dir ./src/ --pretrained ../models/r50_arc_pic9/model,xx 2>&1|tee NRA_r50pic9_savemodel.log
  1. End it when the acc of validation dataset(lfw, cfp-fp and agedb-30) does not ascend.
CUDA_VISIBLE_DEVICES='0,1' python -u train_NR.py --network r50 --loss-type 4 --margin-m 0.5 --data-dir ./MF2_pic9_head/ --lr 0.01 --lr-steps 50000,90000 --per-batch-size 100 --noise-beta 0.9 --prefix ../models/NRA_r50pic9/model --bin-dir ./src/ --pretrained ../models/NRA_r50pic9/model_t,0 2>&1|tee NRA_r50pic9.log

step 3:

  1. Generate the denoised head data using ./MF2_pic9_head/train.lst and 0_noiselist.txt which has been generated in step 2. (We provide our denoised version(password: w9y5)

  2. Using the denoised head data (have removed the third type noise) and the tail data to continue the second stage training. It's noting that the training process need finetune manually by increase the --interweight gradually. When you change the interweight, you also need change the pretrained model by yourself, because we could not know which is the best model in the last training stage unless we test the model on the target dataset (MF2 test). We always finetune from the best model in the last training stage.

CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' python -u train_debug_soft_gs.py --network r50 --loss-type 4 --data-dir ./MF2_pic9_head_denoise/ --data-dir-interclass ./MF2_pic9_tail/ --end-epoch 100000 --lr 0.001 --interweight 1 --bag-size 3600 --batch-size1 360 --batchsize_id 360 --batch-size2 40  --pretrained /home/zhongyaoyao/insightface/models/NRA_r50pic9/model,xx --prefix ../models/model_all/model 2>&1|tee all_r50.log
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' python -u train_debug_soft_gs.py --network r50 --loss-type 4 --data-dir ./MF2_pic9_head_denoise/ --data-dir-interclass ./MF2_pic9_tail/ --end-epoch 100000 --lr 0.001 --interweight 5 --bag-size 3600 --batch-size1 360 --batchsize_id 360 --batch-size2 40  --pretrained ../models/model_all/model,xx --prefix ../models/model_all/model_s2 2>&1|tee all_r50_s2.log
You might also like...
Pytorch implementation for
Pytorch implementation for "Adversarial Robustness under Long-Tailed Distribution" (CVPR 2021 Oral)

Adversarial Long-Tail This repository contains the PyTorch implementation of the paper: Adversarial Robustness under Long-Tailed Distribution, CVPR 20

Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021
Exploring Classification Equilibrium in Long-Tailed Object Detection, ICCV2021

Exploring Classification Equilibrium in Long-Tailed Object Detection (LOCE, ICCV 2021) Paper Introduction The conventional detectors tend to make imba

Awesome Long-Tailed Learning

Awesome Long-Tailed Learning This repo pays specially attention to the long-tailed distribution, where labels follow a long-tailed or power-law distri

A Simple Long-Tailed Rocognition Baseline via Vision-Language Model
A Simple Long-Tailed Rocognition Baseline via Vision-Language Model

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition, TPAMI 2021

DVG-Face: Dual Variational Generation for HFR This repo is a PyTorch implementation of DVG-Face: Dual Variational Generation for Heterogeneous Face Re

Face Library is an open source package for accurate and real-time face detection and recognition
Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

A large-scale face dataset for face parsing, recognition, generation and editing.
A large-scale face dataset for face parsing, recognition, generation and editing.

CelebAMask-HQ [Paper] [Demo] CelebAMask-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA da

PyTorch implementation of
PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Contrast to Divide: self-supervised pre-training for learning with noisy labels This is an official implementation of "Contrast to Divide: self-superv

Comments
  • where the parameter 't' of NR loss?

    where the parameter 't' of NR loss?

    Hi, I just found the parameter noise_beta of NR loss, but I did not found the parameter t, 安and the parameter does not change dynamically as mentioned in the paper, it is a constant number. If I'm wrong, please correct me. Thank you.

    opened by fuxuliu 0
  • the code of generating the denoised head data

    the code of generating the denoised head data

    Hi. I can't find the code of generating the denoised head data or generating the 0_noiselist.txt in stage2. Could you share and upload the code? Thank you so much.

    opened by fuxuliu 0
  • in train_NR.py why cross entropy is divided by batch_size and multiply by 2

    in train_NR.py why cross entropy is divided by batch_size and multiply by 2

    in train_NR.py line 301 cross_entropy = cross_entropy / args.batch_size/2

    could you explain why the loss should be divided by the batch size and then multiply by 2? thank you

    opened by xysong1201 0
  • IndexError

    IndexError

    Traceback (most recent call last): File "train_debug_soft_gs_y0.py", line 530, in main() File "train_debug_soft_gs_y0.py", line 527, in main train_net(args) File "train_debug_soft_gs_y0.py", line 521, in train_net epoch_end_callback = epoch_cb ) File "/home/yaoxi/anaconda2/lib/python2.7/site-packages/mxnet/module/base_module.py", line 520, in fit next_data_batch = next(data_iter) File "/DATA1/yaoxi/Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data/data_longtail_gs.py", line 425, in next self.reset() File "/DATA1/yaoxi/Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data/data_longtail_gs.py", line 377, in reset self.interclass_reset() File "/DATA1/yaoxi/Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data/data_longtail_gs.py", line 339, in interclass_reset id_sel = self.pick_interclass(embeddings, nrof_images_per_class, self.batchsize_id) # shape=(T,3) File "/DATA1/yaoxi/Unequal-Training-for-Deep-Face-Recognition-with-Long-Tailed-Noisy-Data/data_longtail_gs.py", line 210, in pick_interclass centers[i] = np.mean([embeddings[iself.images_per_identity], embeddings[iself.images_per_identity+1],embeddings[i*self.images_per_identity+2]], axis=0) IndexError: index 3600 is out of bounds for axis 0 with size 3600

    opened by tianxingyzxq 1
Owner
Zhong Yaoyao
PhD student in BUPT
Zhong Yaoyao
Improving Calibration for Long-Tailed Recognition (CVPR2021)

MiSLAS Improving Calibration for Long-Tailed Recognition Authors: Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia [arXiv] [slide] [BibTeX] Introductio

Jia Research Lab 116 Dec 20, 2022
Improving Calibration for Long-Tailed Recognition (CVPR2021)

Improving Calibration for Long-Tailed Recognition (CVPR2021)

Jia Research Lab 19 Apr 28, 2021
Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)

Large-Scale Long-Tailed Recognition in an Open World [Project] [Paper] [Blog] Overview Open Long-Tailed Recognition (OLTR) is the author's re-implemen

Zhongqi Miao 761 Dec 26, 2022
Improving Calibration for Long-Tailed Recognition (CVPR2021)

MiSLAS Improving Calibration for Long-Tailed Recognition Authors: Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia [arXiv] [slide] [BibTeX] Introductio

DV Lab 116 Dec 20, 2022
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective Zhengzhuo Xu, Zenghao Chai, Chun Yuan This is the PyTorch implement

Sincere 16 Dec 15, 2022
A Light CNN for Deep Face Representation with Noisy Labels

A Light CNN for Deep Face Representation with Noisy Labels Citation If you use our models, please cite the following paper: @article{wulight, title=

Alfred Xiang Wu 715 Nov 5, 2022
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

PatatiPatata 28 Oct 18, 2022
Face-Recognition-Attendence-System - This face recognition Attendence system using Python

Face-Recognition-Attendence-System I have developed this face recognition Attend

Riya Gupta 4 May 10, 2022
This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

peng gao 11 Dec 1, 2021
Training a deep learning model on the noisy CIFAR dataset

Training-a-deep-learning-model-on-the-noisy-CIFAR-dataset This repository contai

null 1 Jun 14, 2022