Code for the Lovász-Softmax loss (CVPR 2018)

Overview

The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko

ESAT-PSI, KU Leuven, Belgium.

Published in CVPR 2018. See the project page, the arXiv paper, and the paper on CVF open access.

PyTorch implementation of the loss layer (pytorch folder)

Files included:

  • lovasz_losses.py: Standalone PyTorch implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass.ipynb: Jupyter notebook showcasing multiclass training of a linear model with the Lovász-Softmax

The binary lovasz_hinge expects real-valued scores (positive scores correspond to foreground pixels).

The multiclass lovasz_softmax expects class probabilities (the maximum-scoring category is predicted). Apply a Softmax layer to the unnormalized scores first.
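
A minimal usage sketch (not part of the repository; the shapes, the 21-class example, and the {0, 1} binary masks are illustrative assumptions):

    import torch
    import torch.nn.functional as F
    from lovasz_losses import lovasz_hinge, lovasz_softmax

    # Binary case: raw real-valued scores, positive = foreground; binary masks assumed in {0, 1}
    logits = torch.randn(4, 256, 256)                  # [B, H, W]
    fg_masks = torch.randint(0, 2, (4, 256, 256))      # [B, H, W]
    loss_bin = lovasz_hinge(logits, fg_masks)

    # Multiclass case: normalize the scores with a Softmax first, then pass class probabilities
    scores = torch.randn(4, 21, 256, 256)              # [B, C, H, W] unnormalized scores
    labels = torch.randint(0, 21, (4, 256, 256))       # [B, H, W] integer class ids
    loss_multi = lovasz_softmax(F.softmax(scores, dim=1), labels)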

TensorFlow implementation of the loss layer (tensorflow folder)

Files included:

  • lovasz_losses_tf.py: Standalone TensorFlow implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary_tf.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass_tf.ipynb: Jupyter notebook showcasing the application of the multiclass loss with the Lovász-Softmax

Warning: the loss values and gradients have been verified to match the PyTorch implementation (see notebooks); however, we have not used the TF implementation in a training setting.

Usage

See the demos for simple proofs of principle.

FAQ

  • How should I use the Lovász-Softmax loss?

The loss can be optimized on its own, but the optimal optimization hyperparameters (learning rates, momentum) might differ from the best ones for cross-entropy. As discussed in the paper, optimizing the dataset-mIoU (the Pascal VOC measure) depends on the batch size and the number of classes. Therefore, you may get the best results by optimizing with cross-entropy first and fine-tuning with our loss, or by combining the two losses.

See for example how the work Land Cover Classification From Satellite Imagery With U-Net and Lovasz-Softmax Loss by Alexander Rakhlin et al. used our loss in the CVPR 18 DeepGlobe challenge.
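
As a rough illustration of the "combine the two losses" option, a minimal sketch (PyTorch; the default 1:1 weighting and the ignore index 255 are placeholder assumptions, not recommendations from the paper):

    import torch.nn.functional as F
    from lovasz_losses import lovasz_softmax

    def ce_plus_lovasz(logits, labels, lovasz_weight=1.0):
        # logits: [B, C, H, W] unnormalized scores; labels: [B, H, W] integer class ids
        ce = F.cross_entropy(logits, labels, ignore_index=255)
        ls = lovasz_softmax(F.softmax(logits, dim=1), labels, ignore=255)
        return ce + lovasz_weight * ls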

  • Inference in TensorFlow is very slow...

Compiling TensorFlow from master (or using a future release that includes commit tensorflow/tensorflow@73e3215) should solve this problem; see issue #6.

Citation

Please cite

@inproceedings{berman2018lovasz,
  title={The Lov{\'a}sz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks},
  author={Berman, Maxim and Rannen Triki, Amal and Blaschko, Matthew B},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4413--4421},
  year={2018}
}
Comments
  • Fail to improve the performance if I train the model from scratch.

    Really interesting work. I have a baseline with softmax loss on DeepLabv3 and achieve mIoU=76.7 on Cityscapes.

    I simply replaced the cross-entropy loss with your proposed loss and trained the model with the same learning rate and weight decay, but I only achieve mIoU=64.7.

    Could you give me a hint?

    I also notice that you do not train ENet from scratch either; you just fine-tune the models.

    Besides, I also conducted a small experiment training the model with both the cross-entropy loss and your proposed loss, which achieves good performance: mIoU=78.4.

    It would be great if you could share your advice!

    question answered 
    opened by PkuRainBow 12
  • plug'n'play implementation for tensorflow/keras

    I'm trying to use your tensorflow implementation for a U-Net using Keras. The problem I face is that I cannot simply plug the lovasz_softmax loss function into the U-Net model, since the loss function takes the labels as image batches (dim(labels) = (batchsize, width, height, 1)) and the probas in one-hot notation, as it should be for such a problem (dim(probas) = (batchsize, width, height, n_classes)).

    Simply taking the argmax of the probas does not work, because Keras raises an exception when trying to calculate the gradient of argmax during training. If I understood your paper correctly, this is exactly the problem that the Lovász-Softmax is meant to avoid.

    model.compile(optimizer=SGD(), loss=lovasz_softmax, metrics=["accuracy"])
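
    A possible workaround (a sketch, not an official API; it assumes lovasz_losses_tf.py is importable and keeps the standard Keras loss signature): wrap the loss so it takes (y_true, y_pred) and squeeze the trailing label channel, so no argmax is ever taken on the predictions.

    import tensorflow as tf
    from lovasz_losses_tf import lovasz_softmax

    def keras_lovasz_softmax(y_true, y_pred):
        # y_true: (batch, width, height, 1) integer labels; y_pred: (batch, width, height, n_classes) probabilities
        labels = tf.squeeze(tf.cast(y_true, tf.int32), axis=-1)
        return lovasz_softmax(y_pred, labels)

    model.compile(optimizer=SGD(), loss=keras_lovasz_softmax, metrics=["accuracy"])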

    opened by rhoef 9
  • Got a very low MIoU after simply swapping out the cross entropy loss for "lovasz_softmax"

    Hello, nice to read this paper. I have encountered a problem: I get a very low mIoU (0.003) from DeepLabv3+ with lovasz_softmax, while it normally achieves mIoU=76% using cross-entropy loss. Environment: PyTorch 1.0, Ubuntu 16.04, batch size 10, dataset Pascal VOC 2012 (aug), loaded ImageNet-pretrained ResNet-101 weights.

    And here is the code of Lovasz softmax:

    class LovaszSoftmax(nn.Module):
        def __init__(self, per_image=False):
            super(LovaszSoftmax, self).__init__()
            self.lovasz_softmax = lovasz_softmax
            self.per_image = per_image
    
        def forward(self, pred, label):
            pred = F.softmax(pred, dim=1)
            return self.lovasz_softmax(pred, label, per_image=self.per_image, ignore=255)
    

    Thanks!

    opened by Tensorfengsheng1926 5
  • Labels should be only {-1,1} in case of binary segmentation?

    Hi there,

    Thanks for sharing the code for this fantastic work. Congratulations on your CVPR paper! I have a question about the labels in the ground truths. The GT labels should be {-1, 1} (-1: background, 1: foreground), and for instance {0, 1} (0: background, 1: foreground) wouldn't work properly, right?

    opened by SorourMo 5
  • Lovasz softmax with 1 class and small batch does not learn

    I have an image segmentation task with small batch size (4-8) and some samples that have only the background (negative) class.

    I have implemented lovasz softmax as below:

    loss2 = lovasz_softmax(probs, labels, classes=[1], per_image=False)
    

    where probs has shape (B, H, W, C) and labels has shape (B, H, W) with values in {0, 1}

    However, the network does not learn at all -- the output feature maps look random, and tuning the learning rate does not improve the issue.

    The same network works fine with dice, Tversky, focal, or BCE loss.

    I think it is due to the presence of background-only samples -- I know that classes='present' solves this for multi-class problems. Is there a way to do the same for a binary Lovász-Softmax?

    opened by JohnMBrandt 4
  • "name 'ifilterfalse' is not defined" in Python3

    Hi, I've tried to run the PyTorch implementation in Python 3.6 and got this error:

        226     l = iter(l)
        227     if ignore_nan:
    --> 228         l = ifilterfalse(isnan, l)
        229     try:
        230         n = 1
    
    NameError: name 'ifilterfalse' is not defined
    

    I think the function needs to be aliased on import in Python 3, like this:

    from itertools import filterfalse as ifilterfalse
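
    A version-tolerant variant of that import (a sketch, in case the module should also keep working under Python 2):

    try:
        from itertools import ifilterfalse                   # Python 2
    except ImportError:
        from itertools import filterfalse as ifilterfalse    # Python 3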
    
    opened by mxwell 3
  • Fix the case that labels are all ignored

    An error occurs at line 177 (C = probas.size(1)) when the input labels are all ignored, because labels becomes empty. This PR deals with such cases.

    One thing I am concerned about is the difference between the lines I added and the lines below:

            if len(labels) == 0:
                # only void pixels, the gradients should be 0
                return logits.sum() * 0.
    

    The above is part of the original implementation of lovasz_flat. It doesn't exclude void labels when calculating the loss. Is this expected behavior?

    opened by lyakaap 3
  • About the slow speed on tensorflow

    Hello, I love your work on Lovász-Softmax and implemented it on a modified version of DeepLabv3+ in TensorFlow. However, I experienced a significant speed drop: the time per step increased from 0.4 s (using cross-entropy) to almost 3.8 s. Is this normal, or did I do something wrong? Thank you!

    opened by MarkYangjiayi 3
  • possible bug

    Hi, Thank you for your work.

    I believe there is an error here:

    https://github.com/bermanmaxim/LovaszSoftmax/blob/6309c68a2276ada25ebf04692575bce937460f1a/lovasz_losses.py#L26

    It should be jaccard[1:p] = jaccard[1:p] - jaccard[:p-1].

    opened by alexander-rakhlin 3
  • mIOU decreasing as the Lovasz Hinge Loss decreases

    Hi Maxim Berman! Great work. I am using the lovasz_hinge loss and iou_binary as my metric. My labels are binary masks with foreground represented as 1 and background as 0. I am currently overfitting a single example just to see how my model (a form of hyper-network) works, but as the Lovász hinge loss decreases, the output of iou_binary also decreases. Thanks a lot in advance for helping.

    wontfix 
    opened by m-hamza-mughal 1
  • Some TensorFlow implementation problems.

    hi, Maxim

    You present very interesting and solid work! But I met an implementation error while using your Lovász loss in my DeepLabv3+. My initial loss is tf.losses.softmax_cross_entropy, and I prepared:

    onehot_labels: [batch_size, num_classes] target one-hot-encoded labels; logits: [batch_size, num_classes] logit outputs of the network

    as its input, but this doesn't fit your loss directly. Could you please give some advice on how to convert these original parameters into your parameters, probas and labels? Thank you!
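
    A hedged sketch of what that conversion could look like for a segmentation head (hypothetical shapes; it assumes logits and onehot_labels are arranged as [batch, height, width, num_classes] and that lovasz_losses_tf.py is importable):

    import tensorflow as tf
    from lovasz_losses_tf import lovasz_softmax

    probas = tf.nn.softmax(logits, axis=-1)        # [B, H, W, C] class probabilities
    labels = tf.argmax(onehot_labels, axis=-1)     # [B, H, W] integer class ids
    loss = lovasz_softmax(probas, labels)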

    opened by ZanePenn 1
  • How to understand the lovasz_grad when gt_sorted class number>1?

    Hi @bermanmaxim, jaccard_loss = 1 - IoU, so why compute jaccard[1:] - jaccard[0:-1]?

    def lovasz_grad(gt_sorted):
        """
        Computes gradient of the Lovasz extension w.r.t sorted errors
        See Alg. 1 in paper
        """
        p = len(gt_sorted)
        gts = gt_sorted.sum()
        intersection = gts - gt_sorted.float().cumsum(0)
        union = gts + (1 - gt_sorted).float().cumsum(0)
        jaccard = 1. - intersection / union
        if p > 1: # cover 1-pixel case
            jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
        return jaccard
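
    (For context, a sketch of the reasoning rather than an authoritative reply: Alg. 1 evaluates the Jaccard loss 1 - IoU on growing prefixes of the ground truth sorted by decreasing error; the gradient of the Lovász extension is then the vector of successive differences of those prefix losses, i.e. g_1 = J({pi_1}) and g_i = J({pi_1, ..., pi_i}) - J({pi_1, ..., pi_(i-1)}) for i > 1. The line jaccard[1:p] = jaccard[1:p] - jaccard[0:-1] computes exactly those differences, so each entry measures how much including the i-th sorted error increases the Jaccard loss.)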
    
    opened by LeopoldACC 0
  • ModuleNotFoundError: No module named 'lovasz'

    pip install lovasz

    ERROR: Could not find a version that satisfies the requirement lovasz (from versions: none)
    ERROR: No matching distribution found for lovasz
    Note: you may need to restart the kernel to use updated packages.

    Please let me know the solution to this.
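
    The loss is not distributed as a pip package; a working approach (an assumption based on the "standalone" files described above, not an official install route) is to copy lovasz_losses.py (or lovasz_losses_tf.py) into your project and import from it directly:

    from lovasz_losses import lovasz_softmax, lovasz_hinge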

    opened by Taitai6521 0
  • how to combine lovasz hinge and bce in a binary segmentation task appropriately?

    Dear Berman Maxim, thanks for your great work; it has helped me a lot! I am confused about how to combine the Lovász hinge and BCE appropriately in a binary segmentation task. As we all know, the Lovász hinge expects logits (without sigmoid), but BCE needs the result after sigmoid. What confuses me is whether these two different types of losses (with/without sigmoid) can get along well. Other combo losses, e.g. BCE + Lovász-Softmax or BCE + Dice, all need sigmoid, so in my mind there is no problem there. Could you give me some advice about this? In addition, does per_image=False bring faster convergence when the batch size is big? Thanks.
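
    A minimal sketch of one way to combine them without a sigmoid mismatch (the 1:1 weighting is an arbitrary placeholder): nn.BCEWithLogitsLoss and lovasz_hinge both consume raw logits, so the sigmoid only happens inside the BCE term.

    import torch.nn as nn
    from lovasz_losses import lovasz_hinge

    bce = nn.BCEWithLogitsLoss()

    def bce_plus_lovasz_hinge(logits, masks):
        # logits, masks: [B, H, W]; masks contain {0, 1}
        return bce(logits, masks.float()) + lovasz_hinge(logits, masks)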

    opened by yu-Mas 1
  • multi classes with lovasz_hinge

    Hi, thanks for your great work, but I have a question. For a multi-class semantic segmentation task, I can convert the labels to one-hot format, apply sigmoid to the network output, and then use nn.BCELoss() on the labels and outputs (of course, one-hot + raw outputs + nn.BCEWithLogitsLoss is also fine). At inference I just apply torch.sigmoid to the outputs and set the threshold to 0.5, and I get correct segmentation results. So may I do the same thing with lovasz_hinge(): one-hot + raw (no sigmoid) outputs + lovasz_hinge? Does that work? And is the inference process the same as above?

    opened by hwh-hit 0
  • weird results

    Hi, thanks for your work. I have added the TensorFlow version of Lovász-Softmax to my task: I trained the model with cross-entropy loss first, then fine-tuned it with cross-entropy + Lovász-Softmax loss (weighted 1:1), and the mIoU improved by about 2%. But when tested on videos, the model without the Lovász-Softmax loss seems to perform better, especially on recall. Do you have any idea about this? Thank you.

    My task is 2-class lane segmentation.

    opened by phoenares 0