The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

Overview

STAR-FC

This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes" 🌟 🌟 .

🎓 Requirements

  • Python = 3.6
  • PyTorch = 1.2.0
  • faiss

🧚 Hardware

The hardware we used in this work is as follows:

(Hardware specification image in the original README.)

🍰 Datasets

cd STAR-FC

Create a new folder for training data:

mkdir data

To run the code, please download the refined MS1M dataset and partition it into 10 splits, then construct the data directory as follows:

|——data
   |——features
      |——part0_train.bin
      |——part1_test.bin
      |——...
      |——part9_test.bin
   |——labels
      |——part0_train.meta
      |——part1_test.meta
      |——...
      |——part9_test.meta
   |——knns
      |——part0_train/faiss_k_80.npz
      |——part1_test/faiss_k_80.npz
      |——...
      |——part9_test/faiss_k_80.npz
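
A minimal loading sketch, assuming the .bin/.meta files follow the learn-to-cluster convention (a raw float32 feature matrix and one integer label per line; 256 is the feature dimension of the released MS1M features and may differ for other data):

import numpy as np

def load_feats(path, feat_dim=256):
    # raw float32 binary reshaped to (N, feat_dim); feat_dim is an assumption
    feats = np.fromfile(path, dtype=np.float32)
    assert feats.size % feat_dim == 0
    return feats.reshape(-1, feat_dim)

def load_labels(path):
    # one integer identity label per line
    with open(path) as f:
        return np.array([int(line.strip()) for line in f], dtype=np.int64)

feats = load_feats('data/features/part1_test.bin')
labels = load_labels('data/labels/part1_test.meta')
assert feats.shape[0] == labels.shape[0]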

We have used the data from: https://github.com/yl-1993/learn-to-cluster
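
If you need to regenerate the knns/*/faiss_k_80.npz files yourself, the sketch below shows an 80-NN search with faiss under the same convention; the exact npz key and array layout this repo expects should be verified against utils/knn.py before relying on it:

import os
import faiss
import numpy as np

def build_knn_file(feats, k=80, out_path='data/knns/part1_test/faiss_k_80.npz'):
    # L2-normalize so inner product equals cosine similarity
    feats = feats.astype(np.float32)
    faiss.normalize_L2(feats)
    index = faiss.IndexFlatIP(feats.shape[1])
    index.add(feats)
    sims, nbrs = index.search(feats, k)  # both (N, k)
    # neighbors and (1 - cosine) distances stacked to shape (N, 2, k);
    # the 'data' key mirrors the learn-to-cluster files but is an assumption
    knns = np.stack([nbrs.astype(np.float32), 1.0 - sims], axis=1)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    np.savez_compressed(out_path, data=knns)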

🍬 Model

Put the pretrained models Backbone.pth and Head.pth in ./pretrained_model/. Our trained models will be released soon.

☘️ Training

Adjust the configuration in ./src/configs/cfg_gcn_ms1m.py, then run the algorithm as follows:

cd STAR-FC
sh scripts/train_gcn_ms1m.sh

🌵 Testing

Adjust the configuration in ./src/configs/cfg_gcn_ms1m.py, then run the algorithm as follows:

cd STAR-FC
python test_final.py

Acknowledgement

This code is based on the publicly available face clustering codebase https://github.com/yl-1993/learn-to-cluster.

Citation

Please cite the following paper if you use this repository in your research.

@inproceedings{shen2021starfc,
   author={Shen, Shuai and Li, Wanhua and Zhu, Zheng and Huang, Guan and Du, Dalong and Lu, Jiwen and Zhou, Jie},
   title={Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes},
   booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
   year={2021}
}
Comments
  • There is a problem when running with test_final.py

    Traceback (most recent call last):
      File "test_final.py", line 167, in
        gt_labels = np.load('./pretrained_model/gt_labels.npy')
      File "/mnt/lustre/caoguoliang1/anaconda3/envs/cluster/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
        fid = stack.enter_context(open(os_fspath(file), "rb"))
    FileNotFoundError: [Errno 2] No such file or directory: './pretrained_model/gt_labels.npy'

    What is gt_labels.npy?

    opened by GlennCGL 2
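
    gt_labels.npy is not shipped with the repo. Assuming it simply stores the ground-truth identity labels of the test split in feature order, a minimal sketch to produce it from the corresponding .meta file (the split name is illustrative):

    import numpy as np

    # assumption: one integer label per line, same order as the .bin features
    with open('data/labels/part1_test.meta') as f:
        gt_labels = np.array([int(line.strip()) for line in f], dtype=np.int64)

    np.save('./pretrained_model/gt_labels.npy', gt_labels)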
  • Why is the bcubed fscore so poor?

    I tested ms1m part1_test.bin with the model you provided. The result:

    [Time] evaluate with pairwise consumes 0.0512 s
    ave_pre: 0.9559, ave_rec: 0.8870, fscore: 0.9202
    [Time] evaluate with bcubed consumes 20.8486 s
    ave_pre: 0.0858, ave_rec: 1.0000, fscore: 0.1581
    [Time] evaluate with nmi consumes 0.1714 s
    nmi: 0.9738

    opened by xiangliu886 1
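
    For reference, B-cubed averages per-sample precision and recall, so a single huge (or heavily fragmented) predicted cluster can drag the score down even when the pairwise F-score looks fine. An unoptimized sketch of the metric itself, independent of this repo's evaluation code:

    from collections import defaultdict

    def bcubed(gt_labels, pred_labels):
        pred_clusters, gt_clusters = defaultdict(list), defaultdict(list)
        for i, (g, p) in enumerate(zip(gt_labels, pred_labels)):
            pred_clusters[p].append(i)
            gt_clusters[g].append(i)
        precision = recall = 0.0
        for g, p in zip(gt_labels, pred_labels):
            # samples sharing both the predicted cluster and the true identity
            same = sum(gt_labels[j] == g for j in pred_clusters[p])
            precision += same / len(pred_clusters[p])
            recall += same / len(gt_clusters[g])
        n = len(gt_labels)
        pre, rec = precision / n, recall / n
        return pre, rec, 2 * pre * rec / (pre + rec)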
  • The results are not robust on new datasets

    Hi, I have trained and tested STAR-FC with the public person datasets MSMT and Market, but got very poor results. Could you tell me why?


    2022-03-18 18:18:19.397 real clusters: 3060, predict clusters: 7993

    2022-03-18 18:18:19.397 [Time] evaluate with pairwise consumes 0.0095 s

    2022-03-18 18:18:19.397 ave_pre: 0.0032, ave_rec: 0.8302, fscore: 0.0064

    2022-03-18 18:18:19.397 [Time] evaluate with bcubed consumes 4.2473 s

    2022-03-18 18:18:19.397 ave_pre: 0.4681, ave_rec: 0.8321, fscore: 0.5991

    2022-03-18 18:18:19.397 [Time] evaluate with nmi consumes 0.0290 s

    2022-03-18 18:18:19.397 nmi: 0.7601

    2022-03-18 18:18:19.397 avg_acc: 0.8054215519079088

    2022-03-18 18:18:19.397 pairwise: 0.006409889997558421

    2022-03-18 18:18:19.397 bcubed: 0.5991425302360325

    2022-03-18 18:18:19.397 nmi: 0.7600691590384783

    opened by shuxjweb 1
  • About WebFace42M's feature file

    Very good job! I would like to know whether the authors can release WebFace42M's feature file and its train/test splits, so that more researchers can follow and cite this work.

    opened by slacklife 1
  • Too many samples

    Hi, I followed your example, didn't change the parameters, just replaced the dataset, and then I was reminded that the number of samples is too large. How can I fix it?

    opened by 1017549629 1
  • About perform_val in training

    Hi! I am checking train_gcn.py and noticed something weird. In line 32, def perform_val(), test_inst_num is defined as the length of test_idx2lb, which is the total number of samples in the test set. However, pair_a and pair_b are defined as lists of k duplicates of each sample and their k nearest neighbors. When looping over the patches, the patch size is defined as patch_size = int(test_inst_num / patch_num). It seems the loop only covers the sample size in total, instead of k times the sample size, which is the total number of pairs. Does this mean only the initial portions of pair_a and pair_b are evaluated? The average_acc is also obtained by dividing sum_acc by test_inst_num, which covers only a portion of pair_a and pair_b.

    Another question: why test the pair-wise accuracy only on the kNN pairs instead of all pairs? The kNN pairs are selected because they are close to each other; won't this cause bias, since they are very likely to be from the same cluster?

    opened by RealNewNoob 1
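
    To make the bookkeeping concrete, a sketch of evaluating all N*k kNN pairs in patches (raw cosine similarity stands in here for the GCN edge score used in the actual perform_val; names follow the issue's description, not the repo's exact code):

    import numpy as np

    def knn_pair_accuracy(feats, labels, knn, patch_num=100, sim_thresh=0.5):
        # feats: (N, D) L2-normalized, labels: (N,), knn: (N, k) neighbor indices
        n, k = knn.shape
        pair_a = np.repeat(np.arange(n), k)      # each sample repeated k times
        pair_b = knn.reshape(-1)                 # its k nearest neighbors
        total_pairs = pair_a.shape[0]            # N * k, not N
        patch_size = int(np.ceil(total_pairs / patch_num))
        correct = 0
        for start in range(0, total_pairs, patch_size):
            a = pair_a[start:start + patch_size]
            b = pair_b[start:start + patch_size]
            sim = np.sum(feats[a] * feats[b], axis=1)   # cosine similarity
            correct += np.sum((sim > sim_thresh) == (labels[a] == labels[b]))
        return correct / total_pairs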
  • About sampling strategy and clustering setting

    Congratulations on your publication! I am reading your code and paper; however, I have a question about the sampling policy. In your paper, you mention M = 2 and N = 750, so two seeds and their nearest 750 clusters are selected before CR, which makes a total of 1500. However, in train_gcn.py line 146, for batch in range(cls_num): it seems all the clusters are looped over, and for each of them a total of 1300 + 200 = 1500 clusters are sampled before CR. In every training step, the features from these clusters are used to construct the affinity graph after SR. Did I miss something?

    opened by RealNewNoob 1
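
    As a point of reference, a rough paraphrase of the paper-level sampling (M seed clusters plus their N nearest clusters, then random node retention) with the numbers quoted in this thread; this sketches Algorithm 1 as described, not train_gcn.py:

    import numpy as np

    def sample_subgraph(cluster_centers, cluster_to_nodes, rng,
                        M=2, N=750, keep_ratio=0.9):
        # cluster_centers: (C, D) mean feature per ground-truth cluster
        # cluster_to_nodes: dict cluster_id -> list of node indices
        C = cluster_centers.shape[0]
        seeds = rng.choice(C, size=M, replace=False)
        selected = set(seeds.tolist())
        for s in seeds:
            # the N clusters closest to each seed (seed itself excluded)
            d = np.linalg.norm(cluster_centers - cluster_centers[s], axis=1)
            selected.update(np.argsort(d)[1:N + 1].tolist())
        # node-level random sampling: keep roughly keep_ratio of each cluster
        nodes = []
        for c in selected:
            members = np.asarray(cluster_to_nodes[c])
            nodes.extend(members[rng.random(len(members)) < keep_ratio].tolist())
        return np.asarray(nodes)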
  • Inference time comparison between GCN-V and STAR-FC

    In Table 4, the inference time of STAR-FC is 310 s, which is faster than the 609 s of GCN-V+E. But did you compare the inference time between STAR-FC and GCN-V (without the GCN-E part)? Which one is faster? In my experiments, the accuracy of GCN-V is high enough, and the inference time of GCN-E takes up most of the inference time of GCN-V+E, so in most scenarios GCN-E is not needed.

    opened by marigoold 1
  • What is `cfg.cluster_num`?

    Hi, congratulations on your work! I have some questions, maybe you could help? Thanks!

    1. What is cfg.cluster_num?
    2. Is it the N in the paper's Algorithm 1?
    3. Why is it increased by 200 in train_gcn.py line 143?
    opened by RHxW 1
  • When knn_method = "faiss_gpu", the code runs with an error

    Because the code builds the kNN graph with faiss in CPU mode, we changed the config from knn_method = "faiss" to knn_method = "faiss_gpu", and then it runs with an error.

    opened by hx121071 0
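
    For illustration, a plain faiss GPU k-NN search (outside this codebase) usually looks like the sketch below; whether utils/knn.py can be switched to knn_method = "faiss_gpu" without further changes has to be checked against the actual class definitions there:

    import faiss
    import numpy as np

    def knn_search_gpu(feats, k=80, gpu_id=0):
        feats = np.ascontiguousarray(feats, dtype=np.float32)
        faiss.normalize_L2(feats)                 # cosine via inner product
        cpu_index = faiss.IndexFlatIP(feats.shape[1])
        res = faiss.StandardGpuResources()
        gpu_index = faiss.index_cpu_to_gpu(res, gpu_id, cpu_index)
        gpu_index.add(feats)
        sims, nbrs = gpu_index.search(feats, k)
        return nbrs, 1.0 - sims                   # neighbor ids, (1 - cosine)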
  • NameError: name 'knn_dynamic' is not defined

    Hi, thanks for the repo.

    When I run python test_final.py:

    Traceback (most recent call last):
      File "STAR-FC/test_final.py", line 4, in
        from evaluation.evaluate import evaluate
      File "STAR-FC/evaluation/__init__.py", line 5, in
        from .evaluate import evaluate
      File "STAR-FC/evaluation/evaluate.py", line 9, in
        from utils import Timer, TextColors
      File "STAR-FC/utils/__init__.py", line 5, in
        from .knn import *
      File "STAR-FC/utils/knn.py", line 409, in
        class knn_faiss_dynamic(knn_dynamic):
    NameError: name 'knn_dynamic' is not defined

    opened by AliRezaSafaei9494 0
  • About the experimental results

    Hello, can the experimental results in the paper be reproduced with the hyperparameter settings given in the paper? I used the settings you mention (taking 1300 of the 1500 nearest clusters, with 90% node sampling, use_Sim = True, threshold set to 0, knn set to 80). By batch 3899 of the first epoch the training loss drops to about 0.02, but the pairwise ave_pre on the test set is very low. I don't understand why. Thanks!

    opened by joewybean 1