An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Last update: Jun 27, 2021

Overview

Fast Face Classification (F²C)

This is the code for our paper, An Efficient Training Approach for Very Large Scale Face Recognition (F²C for short).

Training on ultra-large-scale datasets is time-consuming and takes up a lot of hardware resources. We therefore design dual data loaders and a dynamic class pool to handle large-scale face classification.

Pipeline

Preparation

FFC contains an LRU module, so you can either use lru_python_impl.py or compile the code under the lru_c directory.

If you choose lru_python_impl.py, rename it to lru_utils.py. The LRU is not the bottleneck of the training procedure, so feel free to use the Python implementation, although the C++ implementation is 5 to 10 times faster.
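To illustrate what an LRU module does in this setting, here is a minimal sketch that maps identities to a fixed number of slots and evicts the least recently used identity when the pool is full. The class name LRUSlots and its get/put methods are hypothetical and are not the interface of lru_utils; check lru_python_impl.py for the actual API.

```python
from collections import OrderedDict


class LRUSlots:
    """Map identity ids to a fixed number of slots, evicting the least recently used.

    Hypothetical illustration only; not the interface of the repo's lru_utils.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._slots = OrderedDict()  # identity id -> slot index, oldest first

    def get(self, identity):
        """Return the slot for `identity`, or None if it is not cached."""
        if identity not in self._slots:
            return None
        self._slots.move_to_end(identity)  # mark as most recently used
        return self._slots[identity]

    def put(self, identity):
        """Return (slot, evicted_identity); evicted_identity is None if nothing was evicted."""
        if identity in self._slots:
            self._slots.move_to_end(identity)
            return self._slots[identity], None
        if len(self._slots) < self.capacity:
            slot, evicted = len(self._slots), None  # pool not full: use the next free slot
        else:
            evicted, slot = self._slots.popitem(last=False)  # reuse the oldest entry's slot
        self._slots[identity] = slot
        return slot, evicted


cache = LRUSlots(capacity=2)
cache.put("id_a")                  # takes slot 0
cache.put("id_b")                  # takes slot 1
cache.get("id_a")                  # id_a becomes the most recently used
slot, evicted = cache.put("id_c")  # pool is full, so the oldest identity is evicted
```

Reusing the evicted identity's slot keeps the pool at a fixed size, which is the property a bounded class pool needs.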

Compile LRU (optional)

Command to build LRU

cd lru_c
mkdir build
cd build
cmake ..
make
cd ../../ && ln -s lru_c/build/lru_utils.so .

You can compare the two implementations using lru_c/python/compare_time.py.

Database

Training dataset
- MS-Celeb-1M
- Deepglint-360K
Test dataset
- SLLFW
- CPLFW: Baidu or Google Drive
- CALFW: Baidu or Google Drive
- CFP: Baidu or Google Drive
- AgeDB: Baidu or Google Drive
- YTF
- IJBC
- MegaFace
Data preprocessing

We use 5 landmarks (left eye center, right eye center, nose, left mouth corner, and right mouth corner) to crop faces, as ArcFace does. You can find code here.
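As a rough sketch of how 5-landmark alignment can work, the code below fits a least-squares 2D similarity transform (scale, rotation, translation) from detected landmarks to a reference template. The function name fit_similarity is hypothetical, the detected landmarks are made up, and the template values are the ones commonly cited for 112 × 112 ArcFace crops; treat them as an assumption and verify them against the linked code.

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst points.

    Returns (a, b, tx, ty) such that x' = a*x - b*y + tx and y' = b*x + a*y + ty.
    Raises ZeroDivisionError if all src points coincide.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n  # centroid of the source points
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n  # centroid of the destination points
    dy = sum(q[1] for q in dst) / n
    dot = cross = norm = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
        norm += px * px + py * py
    a, b = dot / norm, cross / norm
    tx = dx - (a * sx - b * sy)
    ty = dy - (b * sx + a * sy)
    return a, b, tx, ty


# Assumed reference positions (left eye, right eye, nose, left mouth, right mouth).
TEMPLATE_112 = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]

# Made-up detections: the template scaled by 2 and shifted by (10, 20).
detected = [(2 * x + 10, 2 * y + 20) for x, y in TEMPLATE_112]
a, b, tx, ty = fit_similarity(detected, TEMPLATE_112)
affine_2x3 = [[a, -b, tx], [b, a, ty]]
```

The resulting 2 × 3 matrix is the kind of input an image-warping routine such as OpenCV's cv2.warpAffine expects when producing the cropped face.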

Training

In main.py, provide the paths to your training db at lines 152-153.

args.source_lmdb = ['/path to msceleb.lmdb']
args.source_file = ['/path to kv file']

We use lmdb as the format of our training db. Each element of source_file is the path to a text file in which every line is an lmdb_key label pair. You may refer to LFS for more details.
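A minimal sketch of reading such a key-label file is shown below. It assumes the key and the label are separated by whitespace and that the label is an integer; neither detail is stated above, and parse_kv_lines is a hypothetical helper, not a function from this repo.

```python
def parse_kv_lines(lines):
    """Parse 'lmdb_key label' lines into a list of (key, int_label) tuples."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.rsplit(None, 1)  # split on the last run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'lmdb_key label', got {line!r}")
        key, label = parts
        pairs.append((key, int(label)))  # assumes integer labels
    return pairs


# Made-up sample content; a real file would be read with open(path).
sample = ["img_000001 0", "img_000002 0", "", "img_000003 1"]
pairs = parse_kv_lines(sample)
num_identities = len({label for _, label in pairs})
```

Counting distinct labels this way gives the total number of identities, which is useful when choosing queue_size.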

Next, edit train_ffc.sh. Before starting training, set the port number and queue_size. queue_size controls the trade-off between recognition performance and speed: a larger queue_size gives higher performance at the cost of more training time and GPU resources. It can be any positive integer. Common settings are 1%, 0.1%, or 0.001% of the total number of identities.
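The arithmetic behind a percentage-based setting can be sketched as follows. The helper queue_size_for and the dataset size are hypothetical; the only facts taken from the text are that queue_size must be a positive integer and is usually a small fraction of the identity count.

```python
def queue_size_for(num_identities, fraction):
    """Return a queue_size equal to `fraction` of the identities, but at least 1."""
    if num_identities <= 0 or not 0 < fraction <= 1:
        raise ValueError("need num_identities > 0 and 0 < fraction <= 1")
    return max(1, round(num_identities * fraction))


num_identities = 1_000_000  # hypothetical dataset size
one_percent = queue_size_for(num_identities, 0.01)
point_one_percent = queue_size_for(num_identities, 0.001)
```

Clamping to 1 keeps the value a valid positive integer even when the dataset is small and the fraction is tiny.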

Notice

The difference between r50 and ir50 is that r50 takes 224 × 224 images as input, while ir50 takes 112 × 112 images, as ArcFace does. The ir50 network comes from ArcFace.

Evaluation

We provide all of the test scripts under the evaluation_code directory. Each script requires the path to the image directory and the test pair files.
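For readers unfamiliar with pair-based verification, the sketch below shows the general idea: score each pair of embeddings by cosine similarity and count a pair as correct when the thresholded score matches its same/different label. This is a generic illustration with toy embeddings, not the logic of the scripts in evaluation_code, and the pair format used here is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def verification_accuracy(pairs, threshold):
    """pairs: iterable of (embedding_a, embedding_b, is_same) with boolean is_same.

    Returns the fraction of pairs whose thresholded similarity matches is_same.
    """
    pairs = list(pairs)
    correct = sum(
        (cosine_similarity(a, b) >= threshold) == is_same for a, b, is_same in pairs
    )
    return correct / len(pairs)


# Toy 2-D embeddings; real embeddings come from the trained network.
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], True),
    ([1.0, 0.0], [0.0, 1.0], False),
    ([0.6, 0.8], [0.8, 0.6], True),
    ([0.0, 1.0], [1.0, 0.2], False),
]
accuracy = verification_accuracy(toy_pairs, threshold=0.5)
```

Benchmarks of this kind typically pick the threshold on held-out folds rather than fixing it by hand, so the fixed value here is only for demonstration.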

Tips

The code in evaluation_code/test_megaface.py is much faster than the official version. It is also applicable to extremely large-scale testing.

Comments

lmdb dataSets

Hello, I followed the recommended approach and built a fairly small lmdb dataset to test the pipeline, but ran into an error. Could you help me analyze: (1) whether the dataset was generated correctly, and whether you could share information about a dataset you generated; (2) how to resolve this error.

① Dataset information: the format of source_file.txt is: img_name.jpg label. The parameters used to generate the lmdb were --shuffle=true --backend=lmdb --resize_width=128 --resize_height=128 --encoded=true --encode_type=jpg

② The error is as follows:

File "main.py", line 51, in train_one_epoch
  images1, images2, id_indexes = next(id_iter)
File "FFC_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
  data = self._next_data()
File "FFC_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 964, in _next_data
  raise StopIteration
StopIteration

opened by MSZLean 2
'FFC' object has no attribute '_momentum_update_gallery'

A question for the author: when I load my data and run the FFC project, I get the following error. Have you run into anything similar?

File "FFC-master/ffc_ddp.py", line 182, in forward_impl_rollback
  self._momentum_update_gallery() # update the gallery net
File "FFC-py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 772, in __getattr__
  type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'FFC' object has no attribute '_momentum_update_gallery'

opened by MSZLean 2
