SphereFace: Deep Hypersphere Embedding for Face Recognition

Overview

SphereFace: Deep Hypersphere Embedding for Face Recognition

By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song

License

SphereFace is released under the MIT License (refer to the LICENSE file for details).

Update

  • 2018.8.14: We recommend an interesting ECCV 2018 paper that comprehensively evaluates SphereFace (A-Softmax) on currently widely used face datasets as well as on their proposed noise-controlled IMDb-Face dataset. Interested users can try training SphereFace on the IMDb-Face dataset. Take a look here.

  • 2018.5.23: A new SphereFace+ that explicitly enhances the inter-class separability has been introduced in our technical report. Check it out here. Code is released here.

  • 2018.2.1: As requested, the prototxt files for SphereFace-64 are released.

  • 2018.1.27: We updated the appendix of our SphereFace paper with useful experiments and analysis. Take a look here. The content contains:

    • The intuition behind removing the last ReLU;
    • Why we want to normalize the weights, beyond the geometric interpretation;
    • An empirical experiment on zeroing out the biases;
    • More 2D visualizations of the A-Softmax loss on MNIST;
    • The angular Fisher score for evaluating angular feature discriminativeness, a new and straightforward evaluation metric besides the final accuracy;
    • Experiments of SphereFace on MegaFace with different numbers of convolutional layers;
    • The annealing optimization strategy for the A-Softmax loss;
    • Details of the 3-patch ensemble strategy in the MegaFace challenge.
  • 2018.1.20: We updated some resources to summarize the current advances in angular margin learning. Take a look here.

Contents

  1. Introduction
  2. Citation
  3. Requirements
  4. Installation
  5. Usage
  6. Models
  7. Results
  8. Video Demo
  9. Note
  10. Third-party re-implementation
  11. Resources for angular margin learning

Introduction

The repository contains the entire pipeline (including all preprocessing steps) for deep face recognition with SphereFace. The recognition pipeline consists of three major steps: face detection, face alignment and face recognition.

SphereFace is a recently proposed face recognition method. It was initially described in an arXiv technical report and then published in CVPR 2017. The most up-to-date paper with more experiments can be found at arXiv or here. To facilitate face recognition research, we give an example of training on CASIA-WebFace and testing on LFW using the 20-layer CNN architecture described in the paper (i.e. SphereFace-20).

In SphereFace, our network architectures use residual units as building blocks, but are quite different from standard ResNets (e.g., BatchNorm is not used, PReLU replaces ReLU, different initializations, etc.). We proposed 4-layer, 20-layer, 36-layer and 64-layer architectures for face recognition (details can be found in the paper and the prototxt files). We provide the 20-layer architecture as an example here. If our proposed architectures also help your research, please consider citing our paper.
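
The architectures themselves are defined as Caffe prototxt files in this repository. Purely as an illustration of the differences listed above (no BatchNorm, PReLU instead of ReLU, identity shortcuts), here is a hedged PyTorch-style sketch of one such residual unit; the channel count and layer arrangement are assumptions, not the exact SphereFace-20 definition.

    import torch.nn as nn

    class SphereResUnit(nn.Module):
        # Illustrative residual unit in the spirit of the description above:
        # two 3x3 convolutions with PReLU activations, no BatchNorm, and an
        # identity shortcut.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.prelu1 = nn.PReLU(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.prelu2 = nn.PReLU(channels)

        def forward(self, x):
            return x + self.prelu2(self.conv2(self.prelu1(self.conv1(x))))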

SphereFace achieves state-of-the-art verification performance (previously ranked No. 1) on the MegaFace Challenge under the small training set protocol.

Citation

If you find SphereFace useful in your research, please consider citing:

@InProceedings{Liu_2017_CVPR,
  title = {SphereFace: Deep Hypersphere Embedding for Face Recognition},
  author = {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2017}
}

Another closely related previous work of ours appeared in ICML 2016 (more):

@InProceedings{Liu_2016_ICML,
  title = {Large-Margin Softmax Loss for Convolutional Neural Networks},
  author = {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Yang, Meng},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  year = {2016}
}

Requirements

  1. Requirements for Matlab
  2. Requirements for Caffe and matcaffe (see: Caffe installation instructions)
  3. Requirements for MTCNN (see: MTCNN - face detection & alignment) and Pdollar toolbox (see: Piotr's Image & Video Matlab Toolbox).

Installation

  1. Clone the SphereFace repository. We'll call the directory into which you cloned SphereFace SPHEREFACE_ROOT.

    git clone --recursive https://github.com/wy1iu/sphereface.git
  2. Build Caffe and matcaffe

    cd $SPHEREFACE_ROOT/tools/caffe-sphereface
    # Now follow the Caffe installation instructions here:
    # http://caffe.berkeleyvision.org/installation.html
    make all -j8 && make matcaffe

Usage

After successfully completing the installation, you are ready to run all the following experiments.

Part 1: Preprocessing

Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/preprocess/

  1. Download the training set (CASIA-WebFace) and test set (LFW) and place them in data/.

    mv /your_path/CASIA_WebFace  data/
    ./code/get_lfw.sh
    tar xvf data/lfw.tgz -C data/

    Please make sure that the data/ directory contains both datasets.

  2. Detect faces and facial landmarks in the CASIA-WebFace and LFW datasets using MTCNN (see: MTCNN - face detection & alignment).

    # In Matlab Command Window
    run code/face_detect_demo.m

    This will create a file dataList.mat in the result/ directory.

  3. Align faces to a canonical pose using similarity transformation.

    # In Matlab Command Window
    run code/face_align_demo.m

    This will create two folders (CASIA-WebFace-112X96/ and lfw-112X96/) in the result/ directory, containing the aligned face images. A minimal Python sketch of this similarity-transform alignment is given right after these steps.
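
For readers without Matlab, the following is a minimal Python sketch of the same similarity-transform alignment, assuming OpenCV and NumPy are available. The five reference coordinates are the commonly quoted template for the 112x96 crop and should be verified against code/face_align_demo.m; the function names here are illustrative, not part of this repository.

    import cv2
    import numpy as np

    # Canonical landmark template for the 112x96 crop (verify against
    # face_align_demo.m before relying on these values).
    REF_112X96 = np.array([
        [30.2946, 51.6963],   # left eye
        [65.5318, 51.5014],   # right eye
        [48.0252, 71.7366],   # nose tip
        [33.5493, 92.3655],   # left mouth corner
        [62.7299, 92.2041],   # right mouth corner
    ], dtype=np.float32)

    def align_face(img, landmarks5):
        """Warp img so its 5 detected landmarks match the canonical template."""
        src = np.asarray(landmarks5, dtype=np.float32)
        M, _ = cv2.estimateAffinePartial2D(src, REF_112X96)  # similarity transform
        return cv2.warpAffine(img, M, (96, 112))             # (width, height)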

Part 2: Train

Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/train/

  1. Get a list of training images and labels.

    mv ../preprocess/result/CASIA-WebFace-112X96 data/
    # In Matlab Command Window
    run code/get_list.m
    

    The aligned face images in the folder CASIA-WebFace-112X96/ are moved from the preprocess folder to the train folder. A list file CASIA-WebFace-112X96.txt is created in data/ for the subsequent training (a small Python equivalent of this step is sketched after the training steps below).

  2. Train the sphereface model.

    ./code/sphereface_train.sh 0,1

    After training, a model sphereface_model_iter_28000.caffemodel and a corresponding log file sphereface_train.log are placed in result/sphereface/.
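
For reference, here is a hedged Python alternative to code/get_list.m: it writes one "<image_path> <integer_label>" line per aligned image, which is the usual Caffe ImageData list format. The exact path style expected by the training prototxt should be checked against the actual output of get_list.m.

    import os

    def write_list(root="data/CASIA-WebFace-112X96",
                   out="data/CASIA-WebFace-112X96.txt"):
        # One line per image: "<image_path> <integer_label>", with one label
        # per identity folder.
        with open(out, "w") as f:
            for label, identity in enumerate(sorted(os.listdir(root))):
                id_dir = os.path.join(root, identity)
                if not os.path.isdir(id_dir):
                    continue
                for name in sorted(os.listdir(id_dir)):
                    f.write("%s %d\n" % (os.path.join(id_dir, name), label))

    if __name__ == "__main__":
        write_list()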

Part 3: Test

Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/test/

  1. Get the pair list of LFW (view 2).

    mv ../preprocess/result/lfw-112X96 data/
    ./code/get_pairs.sh

    Make sure that the aligned LFW dataset and pairs.txt are in the data/ directory.

  2. Extract deep features and test on LFW.

    # In Matlab Command Window
    run code/evaluation.m

    Finally, we have sphereface_model.caffemodel and the extracted features pairs.mat in the result/ folder, and an accuracy on LFW like the following (a sketch of this 10-fold evaluation protocol follows the table):

    fold 1 2 3 4 5 6 7 8 9 10 AVE
    ACC 99.33% 99.17% 98.83% 99.50% 99.17% 99.83% 99.17% 98.83% 99.83% 99.33% 99.30%
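
The following is a minimal NumPy sketch of the 10-fold LFW verification protocol that code/evaluation.m implements: cosine similarity between feature pairs, with the decision threshold selected on nine folds and applied to the held-out fold. Array names, shapes, and the exact threshold search are assumptions.

    import numpy as np

    def lfw_accuracy(feat1, feat2, same, n_folds=10):
        """feat1, feat2: (N, d) features of the pairs; same: (N,) boolean labels."""
        feat1 = feat1 / np.linalg.norm(feat1, axis=1, keepdims=True)
        feat2 = feat2 / np.linalg.norm(feat2, axis=1, keepdims=True)
        scores = np.sum(feat1 * feat2, axis=1)          # cosine similarity
        folds = np.array_split(np.arange(len(scores)), n_folds)
        accs = []
        for i in range(n_folds):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            # choose the threshold that maximizes accuracy on the nine training folds
            thresholds = np.unique(scores[train])
            best_t = max(thresholds,
                         key=lambda t: np.mean((scores[train] >= t) == same[train]))
            accs.append(np.mean((scores[test] >= best_t) == same[test]))
        return np.mean(accs), accs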

Models

  1. Visualizations of network architecture (tools from ethereon):
    • SphereFace-20: link
  2. Model file

Results

  1. Following the instructions, we go through the entire pipeline 5 times. The accuracies on LFW are shown below. Generally, we report the average, but we release model #3 here.

    Experiment #1 #2 #3 (released) #4 #5
    ACC 99.24% 99.20% 99.30% 99.27% 99.13%
  2. Other intermediate results:

Video Demo

SphereFace Demo

Please click the image to watch the YouTube video. For Youku users, click here.

Details:

  1. It is an open-set face recognition scenario. The video is processed frame by frame, following the same pipeline in this repository.
  2. The gallery set consists of 6 identities, and each main character has only 1 gallery face image. All detected faces are included in the probe set.
  3. There is no overlap between the gallery set and the training set (CASIA-WebFace).
  4. The scores between each probe face and the gallery set are computed by cosine similarity. If the maximal score of a probe face is smaller than a pre-defined threshold, the probe face is considered an outlier (see the sketch after this list).
  5. Main characters are labeled by boxes with different colors: Rachel (#ff0000), Monica (#ffff00), Phoebe (#ff80ff), Joey (#00ffff), Chandler (#0000ff), Ross (#00ff00).
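
A short NumPy sketch of the matching rule in point 4 above: each probe feature is compared with every gallery feature by cosine similarity, and a probe whose best score falls below the threshold is treated as an outlier. The threshold value and function names are placeholders, not taken from the demo code.

    import numpy as np

    def match_probe(probe_feat, gallery_feats, gallery_names, threshold=0.4):
        # Cosine similarity between the probe and every gallery identity.
        probe = probe_feat / np.linalg.norm(probe_feat)
        gallery = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
        scores = gallery @ probe
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            return "unknown", float(scores[best])   # open-set outlier
        return gallery_names[best], float(scores[best])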

Note

  1. Backward gradient

    • In this implementation, we do not strictly follow the equations in the paper. Instead, we normalize the scale of the gradient. This can be interpreted as a varying learning-rate strategy that helps the training converge more stably. A similar idea and intuition also appear in normalized gradients and projected gradient descent.
    • More specifically, if the original gradient of f w.r.t. x can be written as df/dx = coeff_w * w + coeff_x * x, we use the normalized version [df/dx] = (coeff_w * w + coeff_x * x) / norm_wx to perform backward propagation, where norm_wx is sqrt(coeff_w^2 + coeff_x^2). The same operation is also applied to the gradient of f w.r.t. w (a NumPy sketch of this normalization is given after this Note list).
    • In fact, you do not necessarily need to use the original gradient, since the original gradient is sometimes not an optimal design. One important criterion for modifying the backprop gradient is that the new "gradient" (strictly speaking, it is not a gradient anymore) needs to make the objective value decrease stably and consistently. (Regarding some failure cases of gradient-based backprop, I recommend a great talk by Shai Shalev-Shwartz.)
    • If you use the original gradient to do the backprop, you can still make it work, but you may need different lambda settings, a different number of iterations and a different learning-rate decay strategy.
  2. Lambda and Note for training (When the loss becomes 87)

  3. According to recent advances, using feature normalization with a tunable scaling parameter s can significantly improve the performance of SphereFace on the MegaFace challenge (a minimal sketch of this modification is given after this Note list).

  4. Difficulties in convergence - When you encounter difficulties in convergence (this may happen if you use SphereFace on another dataset), there are usually a few easy ways to address them.

    • First, try to use a larger mini-batch size.
    • Second, try to use PReLU instead of ReLU.
    • Third, increase the width and depth of the network.
    • Fourth, try to use a better initialization. For example, use a model pretrained with the original softmax loss (this is equivalent to fine-tuning).
    • Last, and most effective, try changing the hyper-parameters lambda_min, lambda and their decay speed (one plausible annealing schedule is sketched after this Note list).
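
A small NumPy sketch of the gradient normalization described in note 1, written directly from the formula stated there. The coefficients coeff_w and coeff_x are whatever the A-Softmax layer computes for the current sample; they are taken as given inputs here.

    import numpy as np

    def normalized_backward(coeff_w, coeff_x, w, x):
        # df/dx = coeff_w * w + coeff_x * x, rescaled by norm_wx as in note 1
        grad = coeff_w * w + coeff_x * x
        norm_wx = np.sqrt(coeff_w ** 2 + coeff_x ** 2)
        return grad / norm_wx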
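
Regarding the lambda annealing mentioned in notes 2 and 5: the margin_inner_product layer exposes base, gamma, power, lambda_min and iteration parameters (these names also appear in the build log quoted in the comments below). A hedged guess at the schedule, an inverse decay clipped at lambda_min, is sketched here; the default values are inferred from the lambda values printed in the training log further down (about 806.45 at the start, later fixed at 5) and should be checked against the released prototxt.

    def annealed_lambda(iteration, base=1000.0, gamma=0.12, power=1.0, lambda_min=5.0):
        # assumed schedule: lambda = base * (1 + gamma * iteration)^(-power),
        # clipped from below at lambda_min
        lam = base * (1.0 + gamma * iteration) ** (-power)
        return max(lam, lambda_min)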
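
Note 3 refers to feature normalization with a tunable scale s, as used by later cosine-margin losses. A minimal NumPy sketch of that modification follows; it is an illustration only and is not part of this Caffe implementation.

    import numpy as np

    def normalized_scaled_logits(x, W, s=30.0):
        # x: (N, d) features, W: (d, C) class weights; both are L2-normalized,
        # so the logits become s * cos(theta) with a tunable scale s.
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        W = W / np.linalg.norm(W, axis=0, keepdims=True)
        return s * (x @ W)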

Third-party re-implementation

Resources for angular margin learning

L-Softmax loss and SphereFace present a promising framework for angular representation learning, which has been shown to be very effective in deep face recognition. We are excited that our work has inspired many well-performing methods (and loss functions). We list a few of them below for your reference (not fully up-to-date):

To evaluate the effectiveness of an angular margin learning method, you may consider using the angular Fisher score proposed in Appendix E of our SphereFace paper.

Disclaimer: Some of these methods may not necessarily be inspired by us, but we still list them due to their relevance and excellence.

Contact

Weiyang Liu and Yandong Wen

Questions can also be left as issues in the repository. We will be happy to answer them.

Comments
  • Results from default config

    Hi --

    What are the results (eg LFW accuracy) that we should expect when running this code as described in the README? I don't have MATLAB so I'll need to run w/ a different face detection/alignment pipeline, so I want to see how much error is introduced by my variant.

    Thanks

    opened by bkj 46
  • Questions about alignment coordinate points

    In face_align_demo.m, there are 5 coordinate points used for alignment, so the question is: how did you get their values? Are they located in the original 250 x 250 image, or in the 112 x 96 cropped image?

    opened by zzw1123 12
  • Is lambda equivalent to smaller m?

    From the form of the loss function, I think adding the lambda can be seen as using a smaller m. I have plotted the curve for lambda=5, m=4 and found that it is approximately equivalent to m=1.5.

    Is my thought right?

    opened by happynear 11
  • training set CASIA-WebFace: did you use clean_list or all images from CASIA?

    LFW acc 99.42% is incredibly good. About the training set CASIA-WebFace: did you use the clean_list or all images from CASIA? I used this cleaned version of CASIA, https://groups.google.com/forum/#!topic/cmu-openface/Xue_D4_mxDQ; my best result is Accuracy: 0.990+-0.004, Validation rate: 0.92067+-0.02112 @ FAR=0.00100.

    opened by qiqiguaitm 10
  • Loss always stays around 9.3

    The loss always stays around 9.3 and does not go down. I set the learning rate to 0.01 and 0.06, and the loss didn't converge. How does the training network need to be modified? How should the training parameters be set? Hope to get your help, thanks!! @wdwen

    opened by taoyunuo 8
  • Question regarding paper

    I am really grateful that @wy1iu released the code. You and @ydwen are really pushing face verification forward.

    I have some question regarding the paper and the code:

    1. What is the major change between L-Softmax and A-Softmax? From the equations it looks like in L-Softmax the weights keep their norms, while in A-Softmax the weights are normalized, right? If this is true, was the main motivation section 3.3 in Large-Margin Softmax Loss for Convolutional Neural Networks?
    2. Could you explain how you chose the function ψ (which replaces cos(θ))?
    3. In both papers you use a Taylor series of cos(mθ) (Eq. 7 in Large-Margin), right? What was the idea behind using a different degree of the series depending on the margin value? Why not use the same one for all margins?
    4. Here is my intuition behind both papers: in fact we just scale the output of the linear layer by a matrix of ones with different numbers (<1) at the target classes. Both papers propose a different method of scaling (with a theoretical explanation). I think it may be possible to make an implementation which just uses a scale matrix. I must think about it, as there are many non-linear operations here.
    5. I was thinking about using CenterLoss but with cosine similarity. But then I realized that it is equivalent to a softmax layer without bias (and softmax also compares features to the other class centers, not only the target, so it makes the features even better). Do you agree with my interpretation?
    opened by melgor 8
  • Mirroring and Multi-patch approach

    Hi,

    In the figures of your papers one can see that the 3-patch net outperforms the one-patch net, and you say that the 3-patch feature vector is formed by concatenating the feature vectors of each patch. However, you do not say in your paper how these patches are computed. Do you use a different alignment strategy and train a new net on each type of aligned image, or do you introduce a small offset and extract the 3-patch features by just transforming the original image a bit and computing the features with the original net? Furthermore, I get better results when not concatenating the features of the mirrored image. Has anyone experimented with that (patch selection / (not) mirroring)?

    opened by commanderka 7
  • Not achieving good result on MegaFace

    First of all, thank you for providing the Sphereface training code for us.

    I have retrained your model following your instruction. However, I only got 67.35% on Rank-1 Identification Accuracy on Megaface testing, while you got 72.73%. We used your code and trained from scratch.

    Training with Softmax + CenterLoss rendered the correct result as reported in your paper, i.e. ~65%.

    I believe this is the same protocol you used for Megaface Testing, right? If not, could you kindly point out the difference? Thank you very much!

    opened by ZhijingX 6
  • A-softmax loss with 64 layer?

    First of all, thank you for your open code. You have developed a 20-layer SphereFace framework; is it necessary to study the 64-layer one? Which of the two is better?

    opened by double-vane 5
  • build error in Mac

    In file included from src/caffe/layers/margin_inner_product_layer.cpp:8:
    ./include/caffe/layers/margin_inner_product_layer.hpp:46:3: error: unknown type name 'MarginInnerProductParameter_MarginType'
      MarginInnerProductParameter_MarginType type_;
    src/caffe/layers/margin_inner_product_layer.cpp:23:30: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      iter_ = this->layer_param_.margin_inner_product_param().iteration();
    src/caffe/layers/margin_inner_product_layer.cpp:410:19: note: in instantiation of member function 'caffe::MarginInnerProductLayer::LayerSetUp' requested here
      INSTANTIATE_CLASS(MarginInnerProductLayer);
    src/caffe/layers/margin_inner_product_layer.cpp:26:45: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      const int num_output = this->layer_param_.margin_inner_product_param().num_output();
    src/caffe/layers/margin_inner_product_layer.cpp:29:26: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      this->layer_param_.margin_inner_product_param().axis());
    src/caffe/layers/margin_inner_product_layer.cpp:46:28: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      this->layer_param_.margin_inner_product_param().weight_filler()));
    src/caffe/layers/margin_inner_product_layer.cpp:57:26: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      this->layer_param_.margin_inner_product_param().axis());
    src/caffe/layers/margin_inner_product_layer.cpp:410:19: note: in instantiation of member function 'caffe::MarginInnerProductLayer::Reshape' requested here
      INSTANTIATE_CLASS(MarginInnerProductLayer);
    src/caffe/layers/margin_inner_product_layer.cpp:112:36: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      Dtype base_ = this->layer_param_.margin_inner_product_param().base();
    src/caffe/layers/margin_inner_product_layer.cpp:410:19: note: in instantiation of member function 'caffe::MarginInnerProductLayer::Forward_cpu' requested here
      INSTANTIATE_CLASS(MarginInnerProductLayer);
    src/caffe/layers/margin_inner_product_layer.cpp:113:37: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      Dtype gamma_ = this->layer_param_.margin_inner_product_param().gamma();
    src/caffe/layers/margin_inner_product_layer.cpp:114:37: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      Dtype power_ = this->layer_param_.margin_inner_product_param().power();
    src/caffe/layers/margin_inner_product_layer.cpp:115:42: error: no member named 'margin_inner_product_param' in 'caffe::LayerParameter'
      Dtype lambda_min_ = this->layer_param_.margin_inner_product_param().lambda_min();
    src/caffe/layers/margin_inner_product_layer.cpp:194:16: warning: unused variable 'label' [-Wunused-variable]
      const Dtype* label = bottom[1]->cpu_data();
    src/caffe/layers/margin_inner_product_layer.cpp:195:16: warning: unused variable 'x_norm_data' [-Wunused-variable]
      const Dtype* x_norm_data = x_norm_.cpu_data();
    src/caffe/layers/margin_inner_product_layer.cpp:272:16: warning: unused variable 'label' [-Wunused-variable]
      const Dtype* label = bottom[1]->cpu_data();
    src/caffe/layers/margin_inner_product_layer.cpp:410:19: note: in instantiation of member function 'caffe::MarginInnerProductLayer::Backward_cpu' requested here
      INSTANTIATE_CLASS(MarginInnerProductLayer);
    src/caffe/layers/margin_inner_product_layer.cpp:273:16: warning: unused variable 'weight' [-Wunused-variable]
      const Dtype* weight = this->blobs_[0]->cpu_data();
    src/caffe/layers/margin_inner_product_layer.cpp:284:18: warning: unused variable 'x_norm_data' [-Wunused-variable]
      const Dtype* x_norm_data = x_norm_.cpu_data();
    fatal error: too many errors emitted, stopping now [-ferror-limit=]
    10 warnings and 20 errors generated.
    make: *** [.build_release/src/caffe/layers/margin_inner_product_layer.o] Error 1
    make: *** Waiting for unfinished jobs....
    src/caffe/layers/crop_layer.cpp:45:11: error: no member named 'Reshape' in 'std::__1::vector<int, std::__1::allocator<int> >'
      offsets.Reshape(offsets_shape);

    src/caffe/layers/crop_layer.cpp:46:30: error: no member named 'mutable_cpu_data' in 'std::__1::vector<int, std::__1::allocator<int> >'
    int* offset_data = offsets.mutable_cpu_data();
                       ~~~~~~~ ^
    src/caffe/layers/crop_layer.cpp:72:3: error: use of undeclared identifier 'src_strides_'
    src_strides_.Reshape(offsets_shape);
    ^
    src/caffe/layers/crop_layer.cpp:73:3: error: use of undeclared identifier 'dest_strides_'
    dest_strides_.Reshape(offsets_shape);
    ^
    src/caffe/layers/crop_layer.cpp:75:5: error: use of undeclared identifier 'src_strides_'
      src_strides_.mutable_cpu_data()[i] = bottom[0]->count(i + 1, input_dim);
      ^
    src/caffe/layers/crop_layer.cpp:76:5: error: use of undeclared identifier 'dest_strides_'
      dest_strides_.mutable_cpu_data()[i] = top[0]->count(i + 1, input_dim);
      ^
    src/caffe/layers/crop_layer.cpp:81:24: error: out-of-line definition of 'crop_copy' does not match any declaration in 'CropLayer<Dtype>'
    void CropLayer<Dtype>::crop_copy(const vector<Blob<Dtype>*>& bottom,
                         ^~~~~~~~~
    src/caffe/layers/crop_layer.cpp:127:34: error: no member named 'cpu_data' in 'std::__1::vector<int, std::__1::allocator<int> >'
    crop_copy(bottom, top, offsets.cpu_data(), indices, 0, bottom_data, top_data,
                           ~~~~~~~ ^
    src/caffe/layers/crop_layer.cpp:140:36: error: no member named 'cpu_data' in 'std::__1::vector<int, std::__1::allocator<int> >'
      crop_copy(bottom, top, offsets.cpu_data(), indices, 0, top_diff,
                             ~~~~~~~ ^
    9 errors generated.
    make: *** [.build_release/src/caffe/layers/crop_layer.o] Error 1
    opened by onexuan 4
  • question about gradient calculation with respect to weight

    In Margin Inner Product, the gradient with respect to weight is very simple:

      // Gradient with respect to weight
      if (this->param_propagate_down_[0]) {
        caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_, (Dtype)1.,
            top_diff, bottom_data, (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
      }
    

    But in large-margin softmax, the gradient calculation is much more complex...

    Can you please tell me how to simplify the gradient calculation? I failed to derive it ...

    opened by jay2002 4
  • what is the function of normalizing weights?

    As we know, the decision boundary of the softmax loss is (W1 − W2)x + b1 − b2 = 0, where Wi and bi are the weights and biases of the softmax loss, respectively. We define x as a feature vector and constrain ∥W1∥ = ∥W2∥ = 1 and b1 = b2 = 0. But I want to know: what is the purpose of normalizing the weights? Can anyone give some advice?

    opened by PapaMadeleine2022 0
  • Effect of mean and scale during the training

    Hello, thanks for the great work. One thing I noticed is that if I change the default mean and scale values, the softmax loss overflows. The problem is that somehow I cannot use these values of 127.5 (mean) and 0.0078125 (scale). Could anyone please suggest how to use a mean value of 127 and a scale of 1.0 and still make the model converge?

      transform_param {
        mean_value: 127.5
        mean_value: 127.5
        mean_value: 127.5
        scale: 0.0078125
        mirror: true
      }
    

    Thanks.

    opened by Maharshi-Aupera 0
  • why is my softmax loss stuck around 9.5, who can help me?

    net: "sphereface/sphereface_model.prototxt"

    #文件总数:4686817 , 训练总数:4360447 , 验证总数:326370

    test_iter: 5100 #326370 / 256 test_interval: 17033

    base_lr: 0.05 lr_policy: "multistep" gamma: 0.1

    #17033 stepvalue: 34066 stepvalue: 85165 stepvalue: 136264 stepvalue: 170330 max_iter: 200000 iter_size: 2

    display: 500 momentum: 0.9 weight_decay: 0.0005 snapshot: 2000 snapshot_prefix: "sphereface/modal"

    solver_mode: GPU

    I1218 09:44:23.435408 42442 solver.cpp:289] Solving SpherefaceNet-20
    I1218 09:44:23.435416 42442 solver.cpp:290] Learning Rate Policy: multistep
    I1218 09:44:23.438875 42442 solver.cpp:347] Iteration 0, Testing net (#0)
    I1218 09:44:23.669739 42442 blocking_queue.cpp:49] Waiting for data
    I1218 09:48:31.001149 42442 blocking_queue.cpp:49] Waiting for data
    I1218 09:52:34.625572 42442 blocking_queue.cpp:49] Waiting for data
    I1218 09:55:54.271853 42442 blocking_queue.cpp:49] Waiting for data
    I1218 10:00:13.620533 42442 blocking_queue.cpp:49] Waiting for data
    I1218 10:04:16.510815 42442 blocking_queue.cpp:49] Waiting for data
    I1218 10:04:38.789021 42764 data_layer.cpp:73] Restarting data prefetching from start.
    I1218 10:04:38.832135 42442 solver.cpp:414] Test net output #0: lambda = 11.936
    I1218 10:04:38.832242 42442 solver.cpp:414] Test net output #1: softmax_loss = 11.2369 (* 1 = 11.2369 loss)
    I1218 10:04:39.691632 42442 solver.cpp:239] Iteration 0 (-nan iter/s, 1216.21s/500 iters), loss = 9.59401
    I1218 10:04:39.691682 42442 solver.cpp:258] Train net output #0: lambda = 806.452
    I1218 10:04:39.691699 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.58034 (* 1 = 9.58034 loss)
    I1218 10:04:39.691731 42442 sgd_solver.cpp:112] Iteration 0, lr = 0.05
    I1218 10:15:29.681690 42442 solver.cpp:239] Iteration 500 (0.769271 iter/s, 649.966s/500 iters), loss = 9.75478
    I1218 10:15:29.681933 42442 solver.cpp:258] Train net output #0: lambda = 8.2481
    I1218 10:15:29.681959 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.59993 (* 1 = 9.59993 loss)
    I1218 10:15:29.681974 42442 sgd_solver.cpp:112] Iteration 500, lr = 0.05
    I1218 10:21:11.482547 42442 blocking_queue.cpp:49] Waiting for data
    I1218 10:29:43.602144 42442 solver.cpp:239] Iteration 1000 (0.585557 iter/s, 853.888s/500 iters), loss = 9.61311
    I1218 10:29:43.602360 42442 solver.cpp:258] Train net output #0: lambda = 5
    I1218 10:29:43.602385 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.30922 (* 1 = 9.30922 loss)
    I1218 10:29:43.602398 42442 sgd_solver.cpp:112] Iteration 1000, lr = 0.05
    I1218 10:36:50.166695 42442 blocking_queue.cpp:49] Waiting for data
    I1218 10:44:11.976773 42442 solver.cpp:239] Iteration 1500 (0.57581 iter/s, 868.342s/500 iters), loss = 9.77459
    I1218 10:44:11.977003 42442 solver.cpp:258] Train net output #0: lambda = 5
    I1218 10:44:11.977030 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.92095 (* 1 = 9.92095 loss)
    I1218 10:44:11.977046 42442 sgd_solver.cpp:112] Iteration 1500, lr = 0.05
    I1218 10:55:30.798585 42442 solver.cpp:464] Snapshotting to binary proto file sphereface/modal/sphereface_solver_iter_2000.caffemodel
    I1218 10:55:32.176621 42442 sgd_solver.cpp:284] Snapshotting solver state to binary proto file sphereface/modal/sphereface_solver_iter_2000.solverstate
    I1218 10:55:33.252632 42442 solver.cpp:239] Iteration 2000 (0.733945 iter/s, 681.25s/500 iters), loss = 9.62272
    I1218 10:55:33.252786 42442 solver.cpp:258] Train net output #0: lambda = 5
    I1218 10:55:33.252812 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.62173 (* 1 = 9.62173 loss)
    I1218 10:55:33.252830 42442 sgd_solver.cpp:112] Iteration 2000, lr = 0.05
    I1218 11:02:54.046000 42442 solver.cpp:239] Iteration 2500 (1.13436 iter/s, 440.777s/500 iters), loss = 9.63188
    I1218 11:02:54.046308 42442 solver.cpp:258] Train net output #0: lambda = 5
    I1218 11:02:54.046353 42442 solver.cpp:258] Train net output #1: softmax_loss = 9.62981 (* 1 = 9.62981 loss)
    I1218 11:02:54.046381 42442 sgd_solver.cpp:112] Iteration 2500, lr = 0.05

    opened by ghost 0