Implementation of DropLoss for Long-Tail Instance Segmentation in PyTorch


[AAAI 2021] DropLoss for Long-Tail Instance Segmentation
Ting-I Hsieh*, Esther Robb*, Hwann-Tzong Chen, Jia-Bin Huang.
Association for the Advancement of Artificial Intelligence (AAAI), 2021

Figure: Measuring the performance tradeoff. Comparison of AP on rare, common, and frequent categories for baselines and our method. We visualize the tradeoff for ‘common vs. frequent’ and ‘rare vs. frequent’ as a Pareto frontier, where the top-right position indicates an ideal tradeoff between objectives. DropLoss achieves an improved tradeoff between object categories, resulting in higher overall AP.
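
The Pareto frontier described in the caption can be computed from per-method (AP on the weaker category group, AP on frequent categories) pairs. Below is a minimal sketch, assuming higher AP is better on both axes; the function name and the input format are illustrative and not part of this repository.

```python
from typing import List, Tuple

# (method name, AP on rare or common categories, AP on frequent categories)
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    A point dominates another if it is at least as good on both axes and
    strictly better on at least one (higher AP is better on both axes).
    """
    frontier = []
    for name, x, y in points:
        dominated = any(
            ox >= x and oy >= y and (ox > x or oy > y)
            for _, ox, oy in points
        )
        if not dominated:
            frontier.append((name, x, y))
    # Sort along the x-axis so the frontier can be drawn as a line.
    return sorted(frontier, key=lambda p: p[1])
```

Points on the returned frontier are the methods for which no other method improves one category group without hurting the other; a method closer to the top-right corner offers a better tradeoff.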

This project is a PyTorch implementation of DropLoss for Long-Tail Instance Segmentation. DropLoss improves long-tail instance segmentation by adaptively removing discouraging gradients to infrequent classes. Most of the code is adapted from facebookresearch/detectron2 and tztztztztz/eql.detectron2.
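
To make "removing discouraging gradients to infrequent classes" concrete, here is a framework-free sketch of the per-class weight for a single foreground proposal, mirroring the threshold logic in the issue code quoted at the end of this page: a class is treated as rare when its frequency statistic is below a threshold lambda, and the negative (y_j = 0) loss terms of rare classes are zeroed. The frequency statistic, the threshold value, and the function name are assumptions for illustration only.

```python
from typing import List


def foreground_drop_weights(class_freqs: List[float], gt_class: int, lam: float) -> List[float]:
    """Per-class loss weights for one foreground proposal (hypothetical helper).

    class_freqs[j] is an assumed frequency statistic f_j for class j and lam is
    the threshold lambda. T_lambda(f_j) = 1 when f_j < lam (a "rare" class).
    drop_w = 1 - T_lambda(f_j) * (1 - y_j): negative terms of rare classes get
    weight 0, while the ground-truth term and all other classes keep weight 1.
    """
    weights = []
    for j, f_j in enumerate(class_freqs):
        t = 1.0 if f_j < lam else 0.0       # T_lambda(f_j)
        y = 1.0 if j == gt_class else 0.0   # one-hot target y_j
        weights.append(1.0 - t * (1.0 - y))
    return weights
```

In a training loop these weights would multiply the per-element binary cross-entropy terms before summing, so a rare class never receives a "this is not you" gradient from a proposal that belongs to another class.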


  • Training code.
  • Evaluation code.
  • LVIS v1.0 datasets.
  • Provide checkpoint model.



  • Linux or macOS with Python = 3.7
  • PyTorch = 1.4 and torchvision that matches the PyTorch installation. Install them together at to make sure of this
  • OpenCV (optional but needed for demos and visualization)
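
A small standard-library sketch for checking an environment against the versions listed above. It treats the stated versions as exact major.minor pins (relax the comparison if a minimum version is enough for you); the helper names are hypothetical.

```python
import importlib
import sys
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse strings such as '1.4.0+cu101' or '3.7.9' into (major, minor)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def installed_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    py = "{}.{}".format(*sys.version_info[:2])
    print("Python", py, "OK" if major_minor(py) == (3, 7) else "(expected 3.7)")
    torch_version = installed_version("torch")
    if torch_version is None:
        print("PyTorch not installed")
    else:
        ok = major_minor(torch_version) == (1, 4)
        print("PyTorch", torch_version, "OK" if ok else "(expected 1.4)")
    print("torchvision", installed_version("torchvision"))
    print("OpenCV (optional)", installed_version("cv2"))
```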

Build Detectron2 from Source

gcc & g++ ≥ 5 are required. ninja is recommended for a faster build.

After installing them, run:

python -m pip install 'git+'
# (add --user if you don't have permission)

# Or, to install it from a local clone:
git clone
python -m pip install -e detectron2

# Or if you are on macOS
CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ......

Remove the latest fvcore package and install an older version:

pip uninstall fvcore
pip install fvcore==0.1.1.post200513

LVIS Dataset

Follow the instructions of to set up the LVIS dataset.
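
Once the data is in place, a quick layout check can catch path mistakes before a long training run. This sketch assumes the layout used by upstream detectron2's built-in LVIS v1 splits (annotation files under lvis/, COCO 2017 images under coco/, both rooted at $DETECTRON2_DATASETS, which defaults to datasets/). The detectron2 fork used here may register LVIS differently, so treat the path list as an assumption; the function name is hypothetical.

```python
import os
from typing import List, Optional

# Assumed layout, based on upstream detectron2's built-in LVIS v1 registration.
EXPECTED_PATHS = [
    "lvis/lvis_v1_train.json",
    "lvis/lvis_v1_val.json",
    "coco/train2017",
    "coco/val2017",
]


def missing_lvis_paths(root: Optional[str] = None) -> List[str]:
    """Return the expected LVIS v1 paths that do not exist under root.

    If root is None, fall back to $DETECTRON2_DATASETS and then to "datasets".
    """
    if root is None:
        root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_lvis_paths()
    print("LVIS layout OK" if not missing else "Missing: " + ", ".join(missing))
```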


To train a model with 8 GPUs, run:

cd /path/to/detectron2/projects/DropLoss
python --config-file configs/droploss_mask_rcnn_R_50_FPN_1x.yaml --num-gpus 8
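
Training with fewer GPUs usually changes the total batch size, and a common adjustment is the linear scaling rule: scale the learning rate with the batch size and scale the iteration counts inversely so the number of epochs stays the same. A minimal sketch follows; the reference values in the test are illustrative only, so read the real SOLVER values from configs/droploss_mask_rcnn_R_50_FPN_1x.yaml. The function name is hypothetical.

```python
from typing import Dict, Tuple


def scale_schedule(ims_per_batch: int, base_lr: float, max_iter: int,
                   steps: Tuple[int, ...], new_ims_per_batch: int) -> Dict[str, object]:
    """Rescale a solver schedule for a different total batch size.

    The learning rate is multiplied by new/old batch size; MAX_ITER and the
    LR-decay STEPS are divided by the same factor so training sees the same
    number of images overall.
    """
    factor = new_ims_per_batch / ims_per_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_ims_per_batch,
        "SOLVER.BASE_LR": base_lr * factor,
        "SOLVER.MAX_ITER": round(max_iter / factor),
        "SOLVER.STEPS": tuple(round(s / factor) for s in steps),
    }
```

The resulting keys can be appended as KEY VALUE overrides after the other arguments, the same way the evaluation command passes MODEL.WEIGHTS.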


Model evaluation can be done similarly:

cd /path/to/detectron2/projects/DropLoss
python --config-file configs/droploss_mask_rcnn_R_50_FPN_1x.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint

Citing DropLoss

If you use DropLoss, please use the following BibTeX entry.

  author    = {Hsieh, Ting-I and Robb, Esther and Chen, Hwann-Tzong and Huang, Jia-Bin},
  title     = {DropLoss for Long-Tail Instance Segmentation},
  booktitle = {Proceedings of the Workshop on Artificial Intelligence Safety 2021
               (SafeAI 2021) co-located with the Thirty-Fifth {AAAI} Conference on
               Artificial Intelligence {(AAAI} 2021), Virtual, February 8, 2021},
  year      = {2021}
  • Update


    Thanks for your contribution!

    If you're sending a large PR (e.g., >50 lines), please open an issue first about the feature / bug, and indicate how you want to contribute. See more at about how we handle PRs.

    Before submitting a PR, please run dev/ to lint the code.

    opened by e-271
  • About the implementation of eq(6)

    Hi @timy90022, thanks for sharing your excellent work. While trying to understand the implementation of Eq. (6), I found a small discrepancy between the code and Eq. (6) in the paper.

    import torch
    import torch.nn.functional as F

    def exclude_func_and_ratio(self):
        # instance-level weight
        bg_ind = self.n_c
        weight = (self.gt_classes != bg_ind)
        gt_classes = self.gt_classes[weight]
        # exclude_ratio = \mu_{f_j}
        exclude_ratio = torch.mean((self.freq_info[gt_classes] < self.lambda_).float())
        # weight = E(r)
        weight = weight.float().view(self.n_i, 1).expand(self.n_i, self.n_c)
        return weight, exclude_ratio

    def threshold_func(self):
        # class-level weight
        weight = self.pred_class_logits.new_zeros(self.n_c)
        # weight = T_{lambda}(f_j)
        weight[self.freq_info < self.lambda_] = 1
        weight = weight.view(1, self.n_c).expand(self.n_i, self.n_c)
        # fg = E(r)
        # ratio = \mu_{f_j}
        fg, ratio = self.exclude_func_and_ratio()
        # bg = 1 - E(r)
        bg = 1 - fg
        # random = (1-E(r)) * rand
        random = torch.rand_like(bg) * bg
        random = torch.where(random > ratio, torch.ones_like(random), torch.zeros_like(random))
        # weight = { [ (1-E(r)) * rand > \mu_{f_j} ? 1 : 0 ] + E(r) } * T_{\lambda}(f_j)
        weight = (random + fg) * weight
        return weight

    def drop_loss(self):
        self.n_i, self.n_c = self.pred_class_logits.size()

        def expand_label(pred, gt_classes):
            # one-hot over n_c foreground classes + background, then drop the background column
            target = pred.new_zeros(self.n_i, self.n_c + 1)
            target[torch.arange(self.n_i), gt_classes] = 1
            return target[:, :self.n_c]

        target = expand_label(self.pred_class_logits, self.gt_classes)
        # drop_w = 1 - { [ (1-E(r)) * rand > \mu_{f_j} ? 1 : 0 ] + E(r) } * T_{\lambda}(f_j) * (1-y_j)
        # When E(r) = 1, drop_w = 1 - T_{\lambda}(f_j) * (1-y_j)
        # When E(r) = 0, drop_w = 1 - { [ (1-E(r)) * rand > \mu_{f_j} ? 1 : 0 ] } * T_{\lambda}(f_j) * (1-y_j) ????
        #     when rand > \mu_{f_j}, drop_w = 1 - T_{\lambda}(f_j) * (1-y_j)
        #     when rand <= \mu_{f_j}, drop_w = 1
        self.drop_w = 1 - self.threshold_func() * (1 - target)
        # element-wise BCE (reduction='none') so each term can be weighted by drop_w
        self.cls_loss = F.binary_cross_entropy_with_logits(self.pred_class_logits, target,
                                                           reduction='none')
        return torch.sum(self.cls_loss * self.drop_w) / self.n_i

    # When E(r) = 0, drop_w = 1 - [ rand > \mu_{f_j} ? 1 : 0 ] * T_{\lambda}(f_j) * (1-y_j)
    # when rand > \mu_{f_j}, drop_w = 1 - T_{\lambda}(f_j) * (1-y_j) ???????
    # when rand <= \mu_{f_j}, drop_w = 1

    The implementation is inconsistent with the otherwise case of Eq. (6):

    drop_w = 1 - T_{\lambda}(f_j) * (1-y_j)

    Could you please explain this? Am I misunderstanding something in the code?

    opened by xizero00
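
The case analysis in the issue above can be checked numerically. For a foreground proposal, the quoted code multiplies the random draw by bg = 1 - E(r) = 0, so the random term vanishes and drop_w = 1 - T_λ(f_j)(1 - y_j) deterministically. For a background proposal, E(r) = 0 and y_j = 0 for every foreground class, so drop_w = 1 - [rand > μ] T_λ(f_j). The framework-free sketch below restates only that background branch in scalar form (the names are illustrative, not the repository's) and estimates its expected value; it describes what the quoted code does, and whether that matches Eq. (6) is the question the issue raises.

```python
import random


def background_drop_weight(u: float, mu: float, rare: bool) -> float:
    """drop_w for a background proposal, mirroring the quoted threshold_func.

    u is the uniform draw rand, mu is the batch exclude ratio, and rare means
    T_lambda(f_j) = 1. The term is dropped (weight 0) only when u > mu.
    """
    t = 1.0 if rare else 0.0
    drop_indicator = 1.0 if u > mu else 0.0
    return 1.0 - drop_indicator * t


def mean_background_weight(mu: float, rare: bool, n: int = 100000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[drop_w] over u ~ U[0, 1)."""
    rng = random.Random(seed)
    return sum(background_drop_weight(rng.random(), mu, rare) for _ in range(n)) / n
```

Since P(u > μ) = 1 - μ, a rare class's loss term on a background proposal is kept with probability μ, so its expected weight is μ (the fraction of foreground proposals in the batch whose ground-truth class is rare), while non-rare classes always keep weight 1.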
