In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm (InPlace-ABN) is a novel approach to reduce the memory required for training deep networks. It allows for up to 50% memory savings in modern architectures such as ResNet, ResNeXt and Wider ResNet by redefining BN + non-linear activation as a single in-place operation, while smartly dropping or recomputing intermediate buffers as needed.

This repository contains a PyTorch implementation of the InPlace-ABN layer, as well as some training scripts to reproduce the ImageNet classification results reported in our paper.

We have now also released the inference code for semantic segmentation, together with the Mapillary Vistas-trained model that reached the #1 position on the Mapillary Vistas Semantic Segmentation leaderboard. More information can be found further down this page.

Citation

If you use In-Place Activated BatchNorm in your research, please cite:

@inproceedings{rotabulo2017place,
  title={In-Place Activated BatchNorm for Memory-Optimized Training of DNNs},
  author={Rota Bul\`o, Samuel and Porzi, Lorenzo and Kontschieder, Peter},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

Overview

When processing a BN-Activation-Convolution sequence in the forward pass, most deep learning frameworks need to store two big buffers, namely the input x of BN and the input z of Conv. This is necessary because the standard implementations of the backward passes of BN and Conv depend on their inputs to calculate the gradients. Using InPlace-ABN to replace the BN-Activation sequence, we can safely discard x, thus saving up to 50% GPU memory at training time. To achieve this, we rewrite the backward pass of BN in terms of its output y, which is in turn reconstructed from z by inverting the activation function.
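To make the idea concrete, here is a minimal single-channel sketch in plain Python (not the library's CUDA implementation). It assumes leaky ReLU with slope 0.01 as the activation and a nonzero BN scale, and checks that the BN input gradient computed from the stored input x matches the one computed only from the activation output z plus the per-channel statistics. All function names are illustrative.

```python
import math

SLOPE = 0.01  # assumed leaky ReLU negative slope


def leaky_relu(y):
    return [v if v >= 0 else SLOPE * v for v in y]


def leaky_relu_inv(z):
    # Invertible because SLOPE > 0: z always has the same sign as y
    return [v if v >= 0 else v / SLOPE for v in z]


def bn_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN for a single channel; returns y, the batch mean and std."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n + eps)
    y = [gamma * (v - mu) / sigma + beta for v in x]
    return y, mu, sigma


def bn_input_grad(x_hat, dy, gamma, sigma):
    # dL/dx = gamma / sigma * (dy - mean(dy) - x_hat * mean(dy * x_hat))
    n = len(dy)
    m_dy = sum(dy) / n
    m_dyx = sum(g * h for g, h in zip(dy, x_hat)) / n
    return [gamma / sigma * (g - m_dy - h * m_dyx) for g, h in zip(dy, x_hat)]


def backward_standard(x, y, dz, gamma, mu, sigma):
    # Standard path: needs the stored BN input x (plus y or z for the activation)
    dy = [g if v >= 0 else SLOPE * g for g, v in zip(dz, y)]
    x_hat = [(v - mu) / sigma for v in x]
    return bn_input_grad(x_hat, dy, gamma, sigma)


def backward_from_z(z, dz, gamma, beta, sigma):
    # InPlace-ABN-style path: x is never read; y is rebuilt from z
    y = leaky_relu_inv(z)
    dy = [g if v >= 0 else SLOPE * g for g, v in zip(dz, y)]
    x_hat = [(v - beta) / gamma for v in y]  # requires a nonzero gamma
    return bn_input_grad(x_hat, dy, gamma, sigma)


x = [0.3, -1.2, 2.5, 0.0, -0.7, 1.1]
dz = [0.5, -0.1, 0.2, 0.9, -0.4, 0.3]
gamma, beta = 1.5, -0.2

y, mu, sigma = bn_forward(x, gamma, beta)
z = leaky_relu(y)
dx_std = backward_standard(x, y, dz, gamma, mu, sigma)
dx_inplace = backward_from_z(z, dz, gamma, beta, sigma)
print(max(abs(a - b) for a, b in zip(dx_std, dx_inplace)))  # only floating-point rounding
```

In the actual layer z overwrites y in memory; the sketch only shows that the two gradient computations agree.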

The parametrization of the BN scaling factor differs from standard BN in order to ensure an invertible transformation. Specifically, the scaling factor becomes $|\gamma| + \epsilon$, where $\gamma$ is the learned BN weight and $\epsilon$ is a small positive constant.
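The following minimal sketch assumes the effective scale is |gamma| + eps, taking eps to be the BN epsilon (an assumption); the helper names are illustrative. It shows that the resulting affine map can always be inverted, and why a weight copied from standard BN with a negative entry produces a different output, which is the issue behind NOTE 3 below.

```python
def effective_scale(gamma, eps=1e-5):
    # Reparametrized scale: strictly positive for any learned gamma, so never zero
    return abs(gamma) + eps


def affine(x_hat, gamma, beta, eps=1e-5):
    # Output of the BN affine step with the reparametrized scale
    return effective_scale(gamma, eps) * x_hat + beta


def affine_inverse(y, gamma, beta, eps=1e-5):
    # Recover the normalized input from the output; always well defined
    return (y - beta) / effective_scale(gamma, eps)


x_hat, beta = 0.8, 0.1
for gamma in (2.0, 0.0, -2.0):
    y = affine(x_hat, gamma, beta)
    assert abs(affine_inverse(y, gamma, beta) - x_hat) < 1e-9  # invertible even for gamma = 0

# A weight copied from standard BN is used as-is there, but passed through |.| + eps here
gamma_std = -2.0
print(gamma_std * x_hat + beta)         # standard BN: negative output
print(affine(x_hat, gamma_std, beta))   # reparametrized: positive output
```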

Requirements

To install PyTorch, please refer to https://github.com/pytorch/pytorch#installation.

NOTE 1: our code requires PyTorch v1.1 or later

NOTE 2: we are only able to provide support for Linux platforms and CUDA versions >= 10.0

NOTE 3: in general, it is not possible to load weights from a network trained with standard BN into an InPlace-ABN network without severe performance degradation, due to the different handling of BN scaling parameters.

To install the package containing the iABN layers:

pip install inplace-abn

Note that some parts of InPlace-ABN have native C++/CUDA implementations, meaning that the command above will need to compile them.

Alternatively, to download and install the latest version of our library and also obtain a copy of the ImageNet / Vistas scripts:

git clone https://github.com/mapillary/inplace_abn.git
cd inplace_abn
python setup.py install
cd scripts
pip install -r requirements.txt

The last of the commands above will install some additional libraries required by the ImageNet / Vistas scripts.

Force compiling with CUDA

In order to force the compilation of the native CUDA functions on systems that do not have access to a GPU (e.g. Docker containers), two environment variables have to be set:

export TORCH_CUDA_ARCH_LIST="{archs}"
export IABN_FORCE_CUDA=1

where {archs} is a list of target CUDA architectures, e.g. Pascal;Volta or 6.0;7.0.

Training on ImageNet-1k

Below are the results from our arXiv paper (top-1 / top-5 accuracy), together with the corresponding trained models and their md5 checksums. The model files provided below are made available under the license attached to ImageNet.

Network                       | Batch | 224           | 224, 10-crops | 320           | Trained models (+md5)
ResNeXt101, Std-BN            | 256   | 77.04 / 93.50 | 78.72 / 94.47 | 77.92 / 94.28 | 448438885986d14db5e870b95f814f91
ResNeXt101, InPlace-ABN       | 512   | 78.08 / 93.79 | 79.52 / 94.66 | 79.38 / 94.67 | 3b7a221cbc076410eb12c8dd361b7e4e
ResNeXt152, InPlace-ABN       | 256   | 78.28 / 94.04 | 79.73 / 94.82 | 79.56 / 94.67 | 2c8d572587961ed74611d534c5b2e9ce
WideResNet38, InPlace-ABN     | 256   | 79.72 / 94.78 | 81.03 / 95.43 | 80.69 / 95.27 | 1c085ab70b789cc1d6c1594f7a761007
ResNeXt101, InPlace-ABN sync  | 256   | 77.70 / 93.78 | 79.18 / 94.60 | 78.98 / 94.56 | 0a85a21847b15e5a242e17bf3b753849
DenseNet264, InPlace-ABN      | 256   | 78.57 / 94.17 | 79.72 / 94.93 | 79.49 / 94.89 | 0b413d67b725619441d0646d663865bf
ResNet50v1, InPlace-ABN sync  | 512   | 75.53 / 92.59 | 77.04 / 93.57 | 76.60 / 93.49 | 2522ca639f7fdfd7c0089ba1f5f6c2e8
ResNet34v1, InPlace-ABN sync  | 512   | 73.27 / 91.34 | 75.19 / 92.66 | 74.87 / 92.42 | 61515c1484911c3cc753d405131e1dda
ResNet101v1, InPlace-ABN sync | 512   | 77.07 / 93.45 | 78.58 / 94.40 | 78.25 / 94.19 | 1552ae0f3d610108df702135f56bd27b
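A downloaded checkpoint can be checked against the md5 column above with Python's standard hashlib module. This is a minimal sketch; the local filename in the commented usage line is hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_md5):
    """Raise ValueError if the file's md5 does not match the published checksum."""
    actual = md5sum(path)
    if actual != expected_md5.lower():
        raise ValueError(f"{path}: md5 mismatch (expected {expected_md5}, got {actual})")
    return True


# Example (hypothetical local filename for the ResNeXt101, InPlace-ABN model):
# verify_checkpoint("resnext101_ipabn_lr_512.pth.tar", "3b7a221cbc076410eb12c8dd361b7e4e")
```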

Data preparation

Our script uses torchvision.datasets.ImageFolder for loading ImageNet data, which expects folders organized as follows:

root/train/[class_id1]/xxx.{jpg,png,jpeg}
root/train/[class_id1]/xxy.{jpg,png,jpeg}
root/train/[class_id2]/xxz.{jpg,png,jpeg}
...

root/val/[class_id1]/asdas.{jpg,png,jpeg}
root/val/[class_id1]/123456.{jpg,png,jpeg}
root/val/[class_id2]/__32_.{jpg,png,jpeg}
...

Images can have any name, as long as the extension is that of a recognized image format. Class ids are also free-form, but they are expected to match between train and validation data. Note that the training data in the standard ImageNet distribution is already given in the required format, while validation images need to be split into class sub-folders as described above.
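The two requirements above (matching class ids, and validation images split into class sub-folders) can be handled with a small standard-library script. This is a sketch, not part of the repository: `split_val` expects a filename-to-class mapping that you would have to build from the ImageNet validation ground truth yourself, and the extension list only covers the formats named above.

```python
import os
import shutil

# Extensions accepted by this sketch (torchvision's ImageFolder accepts a few more)
IMG_EXTS = (".jpg", ".jpeg", ".png")


def class_dirs(split_dir):
    """Names of the class sub-folders inside root/train or root/val."""
    return {d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d))}


def check_layout(root):
    """Return class ids present in only one of root/train and root/val (should be empty)."""
    train = class_dirs(os.path.join(root, "train"))
    val = class_dirs(os.path.join(root, "val"))
    return sorted(train ^ val)


def split_val(val_dir, labels):
    """Move flat validation images into per-class sub-folders.

    labels: dict mapping image filename -> class id (hypothetical input that
    the caller builds from the validation ground truth).
    """
    for name, class_id in labels.items():
        if not name.lower().endswith(IMG_EXTS):
            continue  # skip files that are not recognized images
        dst = os.path.join(val_dir, class_id)
        os.makedirs(dst, exist_ok=True)
        shutil.move(os.path.join(val_dir, name), os.path.join(dst, name))
```

After running `split_val` once on root/val, `check_layout(root)` should return an empty list.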

Training

The main training script is scripts/train_imagenet.py: it supports training on ImageNet, or on any other dataset formatted as described above, while keeping a log of relevant metrics in TensorBoard format and periodically saving snapshots. Most training parameters can be specified in a JSON configuration file. All parameters not explicitly specified in the configuration file are set to their defaults, which, together with the complete list of configurable parameters, can be found in scripts/imagenet/config.py.
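The "unspecified parameters fall back to defaults" behavior can be pictured as a recursive merge of the user's JSON file over a defaults dictionary. This is only a sketch of the idea: the section and parameter names below are hypothetical, and the real names and loading logic live in scripts/imagenet/config.py.

```python
import copy
import json

# Hypothetical defaults: the real parameter names live in scripts/imagenet/config.py
DEFAULTS = {
    "network": {"arch": "resnext101", "activation": "leaky_relu"},
    "optimizer": {"lr": 0.1, "weight_decay": 1e-4, "batch_size": 256},
}


def merge(defaults, overrides):
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path):
    with open(path) as f:
        return merge(DEFAULTS, json.load(f))


user = json.loads('{"optimizer": {"batch_size": 512}}')
config = merge(DEFAULTS, user)
print(config["optimizer"])  # batch_size overridden, lr and weight_decay kept
```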

Our arXiv results can be reproduced by running scripts/train_imagenet.py with the configuration files in scripts/experiments. As an example, the command to train ResNeXt101 with InPlace-ABN, Leaky ReLU and batch_size = 512 is:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> train_imagenet.py --log-dir /path/to/tensorboard/logs experiments/resnext101_ipabn_lr_512.json /path/to/imagenet/root

Validation

Validation is run by scripts/train_imagenet.py at the end of every training epoch. To validate a trained model, you can use the scripts/test_imagenet.py script, which allows for 10-crops validation and transferring weights across compatible networks (e.g. from ResNeXt101 with ReLU to ResNeXt101 with Leaky ReLU). This script accepts the same configuration files as scripts/train_imagenet.py, but note that the scale_val and crop_val parameters are ignored in favour of the --scale and --crop command-line arguments.

As an example, to validate the ResNeXt101 trained above using 10-crops of size 224 from images scaled to 256 pixels, you can run:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> test_imagenet.py --crop 224 --scale 256 --ten_crops experiments/resnext101_ipabn_lr_512.json /path/to/checkpoint /path/to/imagenet/root
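For reference, the usual 10-crop scheme (as in torchvision.transforms.TenCrop) takes the four corner crops and the center crop, plus the horizontal flip of each, and averages the class scores over the ten predictions. The sketch below assumes that scheme; it is not the code of test_imagenet.py, and the rounding of the center offset may differ from the script's.

```python
def ten_crop_boxes(width, height, crop):
    """Return (left, top, right, bottom, flipped) for the 10-crop scheme:
    four corners and the center, each with and without a horizontal flip."""
    if crop > width or crop > height:
        raise ValueError("crop size larger than image")
    left_c = (width - crop) // 2
    top_c = (height - crop) // 2
    corners = [
        (0, 0),                         # top-left
        (width - crop, 0),              # top-right
        (0, height - crop),             # bottom-left
        (width - crop, height - crop),  # bottom-right
        (left_c, top_c),                # center
    ]
    boxes = []
    for flipped in (False, True):
        for left, top in corners:
            boxes.append((left, top, left + crop, top + crop, flipped))
    return boxes


def average_scores(per_crop_scores):
    """Average class scores over crops (each entry is a list of per-class scores)."""
    n = len(per_crop_scores)
    return [sum(col) / n for col in zip(*per_crop_scores)]


# e.g. an image scaled so that its shorter side is 256, then 224x224 crops
boxes = ten_crop_boxes(341, 256, 224)
```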

Usage for Semantic Segmentation on Cityscapes and Mapillary Vistas

We have successfully used InPlace-ABN with a DeepLab3 segmentation head that was trained on top of the WideResNet38 model above. Thanks to InPlace-ABN, we could significantly increase the amount of input data fed to this model, which eventually allowed us to reach the #1 position on the Cityscapes, Mapillary Vistas, AutoNUE, Kitti and ScanNet segmentation leaderboards. The training settings mostly follow the description in our paper.

Mapillary Vistas pre-trained model

We release our WideResNet38 + DeepLab3 segmentation model trained on the Mapillary Vistas research set. This is the model used to reach #1 position on the MVD semantic segmentation leaderboard. The segmentation model file provided below is made available under a CC BY-NC-SA 4.0 license.

Network                 | mIoU  | Trained model (+md5)
WideResNet38 + DeepLab3 | 53.42 | 913f78486a34aa1577a7cd295e8a33bb

To use this, please download the .pth.tar model file linked above and run the test_vistas.py script as follows:

cd scripts
python test_vistas.py /path/to/model.pth.tar /path/to/input/folder /path/to/output/folder

The script will process all .png, .jpg and .jpeg images from the input folder and write the predictions in the output folder as .png images. For additional options, e.g. test time augmentation, please consult the script's help message.

The test-set result reported above was obtained using only scale 1.0 plus flipping.
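The scale 1.0 + flipping setting can be pictured as follows: run the model on the image and on its horizontally mirrored copy, mirror the second prediction back, and average. This is a simplified sketch, assuming flipping means a horizontal flip; `model` is a hypothetical callable on a single 2D score grid (a real network returns one score map per class), and test_vistas.py may implement this differently. The `scaled_size` helper also shows that output sizes passed to interpolation must be integers.

```python
def scaled_size(height, width, scale):
    # Interpolation needs integer output sizes, so round explicitly
    return int(round(height * scale)), int(round(width * scale))


def hflip(grid):
    """Mirror a 2D grid (list of rows) left-to-right."""
    return [row[::-1] for row in grid]


def predict_with_flip(model, image):
    """Average the per-pixel scores for `image` and its horizontal flip.

    `model` is any callable mapping a 2D grid to a same-sized 2D grid of scores
    (a simplification: a real network returns one score map per class).
    """
    direct = model(image)
    mirrored = hflip(model(hflip(image)))  # flip the prediction back so pixels line up
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(direct, mirrored)]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
double = lambda grid: [[2 * v for v in row] for row in grid]
scores = predict_with_flip(double, image)
```

For a model that treats left and right symmetrically the two passes agree; otherwise averaging smooths out the asymmetry.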

Changelog

Update 04 Jul. 2019: version 1.0.0

  • Complete rewrite of the CUDA code, following the most recent native BN implementation from PyTorch
  • Improved synchronized BN implementation, correctly handling different per-GPU batch sizes and PyTorch distributed groups
  • The iABN layers are now packaged in an installable Python library to simplify their use in other projects
  • The ImageNet / Vistas scripts are still available in the scripts folder
  • Now requires PyTorch 1.1

Update 08 Jan. 2019:

  • Enabled multiprocessing and InPlace-ABN synchronization over multiple processes (previously using threads). This now requires DistributedDataParallel instead of DataParallel
  • Added compatibility with fp16 (currently allows fp16 input, but requires the module to stay in fp32 mode)
  • Now requires PyTorch 1.0

Update Feb. 2019:

  • Added ResNet34v1, ResNet50v1 and ResNet101v1 ImageNet-1k pre-trained models

We have modified the ImageNet training code and BN synchronization to work with multiple processes. We have also made our InPlace-ABN module compatible with fp16.

Comments
  • RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

    I tried to use ABN, InPlaceABN and InPlaceABNSync, but the following error occurs:

    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

    I tested it on PyTorch 0.2, cuDNN v7, CUDA 8.

    opened by mingminzhen 37
  • error for test_vistas.py

    I tried test_vistas.py, but something goes wrong:

    Traceback (most recent call last):
      File "test_vistas_single_gpu.py", line 311, in <module>
        main()
      File "test_vistas_single_gpu.py", line 188, in main
        probs, preds = model(img, scales, args.flip)
      File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "test_vistas_single_gpu.py", line 135, in forward
        sem_logits = self._network(x, scale)
      File "test_vistas_single_gpu.py", line 117, in _network
        x_up = functional.upsample(x, size=scaled_size, mode="bilinear")
      File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1891, in upsample
        return interpolate(input, size, scale_factor, mode, align_corners)
      File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1985, in interpolate
        return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
    TypeError: 'float' object cannot be interpreted as an integer
    

    Then I modified the code in the function _network(self, x, scale), changing scaled_size = [s * scale for s in x.shape[-2:]] to scaled_size = [int(s * scale) for s in x.shape[-2:]]. Now no error occurs, but the output image is wrong: there is no correct segmentation result. Can you help check the segmentation model and code?

    opened by mingminzhen 27
  • Performance drop when replacing the old version of inplace with pytorch 1.0 version

    The performance of the above experiments (DeepLab v3) dropped to 74% when using the new PyTorch 1.0 version of InPlace-ABN. Is there anything I need to be careful about when replacing the old version of InPlace-ABN with the PyTorch 1.0 version? I only copied the files in ./modules to ./libs in https://github.com/speedinghzl/pytorch-segmentation-toolbox

    Originally posted by @lzrobots in https://github.com/mapillary/inplace_abn/issues/15#issuecomment-458220361

    opened by rotabulo 21
  • Loading the state dict of a model without InPlaceABN

    Hi.

    First of all, this package is amazing, and provided an incredible speed boost to training all of my models, so thanks.

    Now, I am trying to convert a standard pretrained model (like ResNet50) to an InPlaceABN model. Basically, I would like to be able to take any model and apply a function to it that replaces every BatchNorm2d with an InPlaceABN and copies all of the parameters.

    I have written the following script. It works if I use ABN, but not when I use InPlaceABN.

    Basically, I load a pretrained ResNet50 model and apply the function to_abn to it. If I replace every nn.BatchNorm2d with ABN, I can run inference on my test set and get the expected test accuracy. If I use InPlaceABN in the code below, I get 0% accuracy, as if InPlaceABN were not able to load a state_dict from a regular nn.BatchNorm2d. Any idea where this comes from?

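    from functools import reduce
    
    import torch.nn as nn
    from inplace_abn import InPlaceABN, ABN
    
    
    def set_layer(model, name, new_layer):
        # Replace the submodule at dotted path `name` (e.g. "layer1.0.bn1") with `new_layer`
        parent_name, _, child_name = name.rpartition(".")
        parent = reduce(getattr, parent_name.split("."), model) if parent_name else model
        setattr(parent, child_name, new_layer)
    
    
    def to_abn(module):
        # Unwrap DataParallel / DistributedDataParallel if needed
        if hasattr(module, 'module'):
            module = module.module
        # Materialize the list first, since layers are replaced while iterating
        for n, m in list(module.named_modules()):
            if isinstance(m, nn.BatchNorm2d):
                # The below line does not seem to work when I try to load the state_dict
                new_bn = InPlaceABN(num_features=m.num_features, eps=m.eps, momentum=m.momentum, activation='identity')
                # But the below line works and gives the same results as the original network
                # new_bn = ABN(num_features=m.num_features, eps=m.eps, momentum=m.momentum, activation='identity')
                new_bn.load_state_dict(m.state_dict())
                set_layer(module, n, new_bn)
        return module
    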

    the function set_layer simply replaces a module of the model with another module.

    Thanks for your help

    opened by yoniaflalo 15
  • Command '['ninja', '-v']' returned non-zero exit status 1

    Traceback (most recent call last):
      File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
        ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
      File "/home/peng/anaconda2/envs/python36/lib/python3.6/subprocess.py", line 356, in check_output
        **kwargs).stdout
      File "/home/peng/anaconda2/envs/python36/lib/python3.6/subprocess.py", line 438, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 35, in <module>
        from network import Network
      File "/home/peng/pytorch-seg-new/experiments/SimpleMerge_NYU/network.py", line 20, in <module>
        import resnet101_dilation
      File "../../basemodel/resnet101_dilation.py", line 18, in <module>
        from LibInplaceABN.modules import InPlaceABNSync, ABN, GlobalAvgPool2d
      File "../../lib/LibInplaceABN/modules/__init__.py", line 1, in <module>
        from .bn import ABN, InPlaceABN, InPlaceABNSync
      File "../../lib/LibInplaceABN/modules/bn.py", line 10, in <module>
        from .functions import *
      File "../../lib/LibInplaceABN/modules/functions.py", line 18, in <module>
        extra_cuda_cflags=["--expt-extended-lambda"])
      File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
        _build_extension_module(name, build_directory)
      File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
        name, error.output.decode()))
    RuntimeError: Error building extension 'inplace_abn':
    [1/5] /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
    FAILED: inplace_abn_cuda.cuda.o
    /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
    /bin/sh: 1: /usr/local/cuda-9.1/bin/bin/nvcc: not found
    [2/5] /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
    FAILED: inplace_abn_cuda_half.cuda.o
    /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
    /bin/sh: 1: /usr/local/cuda-9.1/bin/bin/nvcc: not found
    [3/5] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp -o inplace_abn.o
    FAILED: inplace_abn.o
    c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp -o inplace_abn.o
    /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp:1:29: fatal error: torch/extension.h: No such file or directory
    compilation terminated.
    [4/5] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o
    ninja: build stopped: subcommand failed.

    opened by zsp1993 14
  • gloo error

    Hi, I used distributed training with the 'gloo' backend but got this error:

    File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
      File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 445, in distributed_data_parallel_hook
        self._queue_reduction(bucket_idx)
      File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 475, in _queue_reduction
        self.device_ids)
    TypeError: _queue_reduction(): incompatible function arguments. The following argument types are supported:
        1. (process_group: torch.distributed.ProcessGroup, grads_batch: List[List[at::Tensor]], devices: List[int]) -> Tuple[torch.distributed.Work, at::Tensor]
    
    Invoked with: <torch.distributed.ProcessGroupGloo object at 0x7fdf2daaadc0>, [[tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0..
    

    Thanks for your help.

    opened by JamesKasperYu 13
  • How to train the ResNext model on the Cityscapes?

    I previously trained PSP with the original ResNet-101 (7x7 convolution) and achieved 76.8 mIoU on the Cityscapes val set. I now want to replace ResNet-101 with ResNeXt-101 (the ImageNet-pretrained model you provide) and train it with the same parameters, but the performance is very poor (66.4). Do you use COCO to fine-tune the ResNeXt model?

    Could you give me some tips on tuning the hyperparameters for ResNeXt-101 compared with ResNet-101?

    Besides, have you trained the modified ResNet-101 (with the 7x7 conv replaced by three 3x3 convs) on ImageNet? If so, it would be great if you could share this model with me.

    invalid question 
    opened by PkuRainBow 13
  • About general activation function

    Many thanks for inplace_abn, it increased the batch size by 1.5x in my application!

    I notice that InPlaceABN currently supports no activation, ReLU, ELU and leaky ReLU. How could I implement other activation functions, such as PReLU?

    Can we just use [InPlaceABN(depth, activation='none'), torch.nn.PReLU()], since PReLU does not seem to be an in-place operation?

    opened by luzai 12
  • error: An extended __device__ lambda must not be defined in a function that is defined within another function

    Hi, I noticed that you updated your repo, so I downloaded the new version.

    I tried to build it in the following environment:

    Ubuntu 16.04(docker)
    Miniconda env: pytorch04
    python: 3.6.5
    cuda: 8.0
    cudnn: 7.0
    

    I get the errors shown in the attached screenshot.

    enhancement 
    opened by PkuRainBow 12
  • Can't install by download file

    root:/data/research/seamseg-master# pip install inplace_abn-master.zip
    Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
    Processing ./inplace_abn-master.zip
    ERROR: Command errored out with exit status 1:
     command: /data/config/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-1w23bkoz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-1w23bkoz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
     cwd: /tmp/pip-req-build-1w23bkoz/
    Complete output (20 lines):
    /data/config/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
      warnings.warn(msg)
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-1w23bkoz/setup.py", line 59, in <module>
        cmdclass={"build_ext": BuildExtension}
      File "/data/config/anaconda3/lib/python3.6/distutils/core.py", line 108, in setup
        _setup_distribution = dist = klass(attrs)
      File "/data/config/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 318, in __init__
      File "/data/config/anaconda3/lib/python3.6/distutils/dist.py", line 281, in __init__
        self.finalize_options()
      File "/data/config/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 375, in finalize_options
      File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/integration.py", line 17, in version_keyword
      File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/__init__.py", line 150, in get_version
      File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/__init__.py", line 113, in _do_parse
    LookupError: setuptools-scm was unable to detect version for '/tmp/pip-req-build-1w23bkoz'.

    Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
    
    For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

    opened by jianlong-yuan 11
  • Have you noticed a big unbalance between gpus with InplaceSyncBN?

    With PyTorch's nn.BatchNorm2d and nn.DataParallel, the memory difference between the two GPUs is less than 1 GB. However, with InplaceSyncBN and nn.DataParallel, the gap grows to almost 2 GB. What is the cause of this, and how could I avoid it?

    opened by CoinCheung 11
  • Support for Pillow >= 8.3.0

    As described in issue 229, this contributes a check of the Pillow version. If the Pillow version is >= 8.3.0, the _PALETTE list must consist of all channels for one color followed by the next color (e.g. RGBRGBRGB). Otherwise, all R values must be contiguous in the list, before the G and B values.

    opened by gyes00205 0
  • Support for Pillow >= 8.3.0

    I tried to test a picture but got a strange result, unlike the result in issue 49.

    So I looked for the bug in test_vistas_single_gpu.py. In line 320 of test_vistas_single_gpu.py, all R values are contiguous in the list, before the G and B values:

    _PALETTE = ImagePalette.ImagePalette(
        palette=list(_PALETTE[:, 0]) + list(_PALETTE[:, 1]) + list(_PALETTE[:, 2]),
        mode="RGB",
    )
    

    But in Pillow >= 8.3.0, the list must consist of all channels for one color followed by the next color (e.g. RGBRGBRGB), as described for Pillow version 8.3.x. Maybe I can contribute code to check the Pillow version in test_vistas_single_gpu.py.

    opened by gyes00205 0
  • inplace_abn does not seem to support Pytorch 1.12.1

    I'm getting the following error with PyTorch 1.12 when running a training job that was working fine with PyTorch 1.11:

      File "/pip-dl_inplace_abn/inplace_abn/functions.py", line 227, in inplace_abn
        return InPlaceABN.apply(
      File "/pip-dl_inplace_abn/inplace_abn/functions.py", line 86, in forward
        mean, var, count = _backend.statistics(x)
    RuntimeError: Tensors of type TensorImpl do not have sizes
    

    Is inplace_abn not compatible with Pytorch 1.12 or is the problem somewhere else?

    opened by gdippolito 0
  • inplace_abn for transformer

    Dear author, transformers use layer norm instead of batch norm. Is it possible to apply InPlace-ABN to transformer-based models? Or is there any other way to lower the GPU memory usage of those models? Thanks.

    opened by 17dacheng 0
  • setup.py can not work

    LookupError: setuptools-scm was unable to detect version for /home/pyl/pythonproject_pyl/inplace_abn_main.

    Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

    opened by pyl3000 1
  • Import issue: Undefined symbol

    When attempting to import Inplace-ABN version 1.1.0, I get the following error:

    >>> from inplace_abn import InPlaceABN
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/__init__.py", line 1, in <module>
        from .abn import ABN, InPlaceABN, InPlaceABNSync
      File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/abn.py", line 8, in <module>
        from .functions import inplace_abn, inplace_abn_sync
      File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/functions.py", line 8, in <module>
        from . import _backend
    ImportError: /home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/_backend.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
    

    I am using CUDA 11.2 and PyTorch 1.8.2. The Inplace-ABN files were compiled with GCC 10.

    opened by curt-tigges 1
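The two Pillow issues above describe a change in the expected palette layout: channel-major (all R values, then G, then B) before Pillow 8.3.0, interleaved (RGBRGB...) from 8.3.0 on. Following that description (not re-verified against Pillow's code here), a version-dependent flattening could look like the sketch below; the function names are hypothetical, and in practice the version string would come from `PIL.__version__`.

```python
def parse_version(version):
    """Turn '8.3.1' into (8, 3, 1); non-numeric suffixes are ignored, missing parts are 0."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def palette_list(colors, pillow_version):
    """Flatten a list of (r, g, b) tuples into the layout expected by ImagePalette.

    Pillow >= 8.3.0: interleaved   (R0, G0, B0, R1, G1, B1, ...)
    older Pillow:    channel-major (R0, R1, ..., G0, G1, ..., B0, B1, ...)
    """
    if parse_version(pillow_version) >= (8, 3, 0):
        return [c for color in colors for c in color]
    return [color[ch] for ch in range(3) for color in colors]


colors = [(128, 64, 128), (244, 35, 232)]
print(palette_list(colors, "8.3.2"))  # [128, 64, 128, 244, 35, 232]
print(palette_list(colors, "8.2.0"))  # [128, 244, 64, 35, 128, 232]
```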
Releases (v1.1.0)
  • v1.1.0 (Sep 3, 2020)

    This release brings ABN, InPlaceABN and InPlaceABNSync to feature parity with recent versions of PyTorch's BatchNormNd layers:

    • Add a track_running_stats parameter to enable / disable computation of running statistics independently from the layer's training state
    • Add a num_batches_tracked buffer, and allow passing momentum=None to support cumulative moving average for tracking running stats instead of exponential moving average
    • As a side effect, parameters from standard BatchNorm can now be loaded without workarounds. Note, however, that if the loaded parameters contain negative weight elements, the output will differ from that of standard BatchNorm

    Additional changes:

    • Fix backward pass in eval mode: it was not properly accounting for the activation function
    • Refactor code to follow more sensible formatting standards
    • Add type annotations
    • Improve docstrings
    • Update installation instructions, pointing to the PyPI package
  • v1.0.12 (Apr 22, 2020)

  • v1.0.11 (Jan 27, 2020)

  • v1.0.10 (Jan 8, 2020)

    This release contains an improved implementation of the backward-pass fix introduced in v1.0.9, which uses less temporary memory at no additional computational cost.

  • v1.0.9(Jan 7, 2020)

    In previous versions, both the input/output tensor y and the gradient tensor dy were overwritten during the backward pass, which produced wrong gradients with some network topologies.

    To fix this issue, a pair of temporary tensors is now created during the backward pass to hold the results of intermediate computations. This change increases the amount of temporary memory required, so in some cases where GPU memory utilization was already very close to the limit, OOM errors might now occur. An alternative, more complex fix is also possible at the expense of additional computational costs. We are evaluating the impact of these changes and will provide updates in a future release.
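Overwriting y is delicate because, in the in-place design, the backward pass works from the stored output and, as described in the paper, reconstructs y from z by inverting the activation. The scalar sketch below shows why that reconstruction is possible, assuming a leaky ReLU with a positive slope and the |γ| + ε scale from the paper: the activation is undone by dividing negative values by the slope, and the affine step is undone because |γ| + ε is strictly positive even when γ itself is negative or zero. The function names are hypothetical.

```python
def abn_forward(x_hat: float, gamma: float, beta: float, eps: float = 1e-5, slope: float = 0.01) -> float:
    """Normalized input -> activation output, using the |gamma| + eps scale (sketch)."""
    y = (abs(gamma) + eps) * x_hat + beta
    return y if y >= 0 else slope * y


def inverse_leaky_relu(z: float, slope: float = 0.01) -> float:
    # Leaky ReLU is a bijection for slope > 0, so negative outputs can be rescaled back
    return z if z >= 0 else z / slope


def recover_normalized_input(z: float, gamma: float, beta: float, eps: float = 1e-5, slope: float = 0.01) -> float:
    """Recover x_hat from the stored output z alone (sketch)."""
    y = inverse_leaky_relu(z, slope)
    # Safe division: abs(gamma) + eps >= eps > 0 for every gamma
    return (y - beta) / (abs(gamma) + eps)


if __name__ == "__main__":
    z = abn_forward(-0.7, gamma=-2.0, beta=0.3)
    print(z, recover_normalized_input(z, gamma=-2.0, beta=0.3))
```

With a raw γ instead of |γ| + ε, a channel with γ = 0 would map every input to β and the step could not be inverted.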

  • v1.0.7(Sep 4, 2019)

  • v1.0.6(Aug 23, 2019)

    At compile time, when determining whether to enable CUDA support, we now base the decision on the Pytorch version installed:

    • If a CUDA-enabled Pytorch is detected, we attempt to compile CUDA support
    • If a CPU-only Pytorch is detected, we disable CUDA support
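The decision rule above can be sketched in plain Python. This is not the project's setup.py; it assumes the check relies on `torch.version.cuda`, which PyTorch sets to a version string in CUDA builds and to `None` in CPU-only builds. A build script could then choose between `torch.utils.cpp_extension.CUDAExtension` and `CppExtension`; the string returned here only names that choice.

```python
import importlib.util


def pytorch_has_cuda() -> bool:
    """Return True if the installed PyTorch build was compiled with CUDA (sketch).

    Returns False when PyTorch is not installed at all.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # A version string for CUDA builds, None for CPU-only builds
    return torch.version.cuda is not None


def extension_kind() -> str:
    # Mirrors the two cases listed above; the returned names are illustrative
    return "CUDAExtension" if pytorch_has_cuda() else "CppExtension"


if __name__ == "__main__":
    print(extension_kind())
```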
  • v1.0.5(Aug 20, 2019)

    InPlace-ABN can now be compiled and used without CUDA. Note that Synchronized InPlace-ABN is still only supported in conjunction with CUDA-enabled Pytorch.

  • v1.0.4(Aug 14, 2019)

    State dicts from standard BatchNorm layers trained with Pytorch v1.0.0 or newer can now be properly loaded by ABN, InPlaceABN and InPlaceABNSync.
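The v1.1.0 notes above warn that loaded parameters with negative weight elements change the layer's output. Before loading a standard BatchNorm state dict into an in-place layer, a quick check like the following can flag the affected channels. `negative_weight_channels` is a hypothetical helper, not part of inplace_abn; it assumes the state dict uses PyTorch's usual `weight` key, optionally behind a module prefix such as `bn1.`.

```python
from typing import Iterable, List, Mapping


def negative_weight_channels(
    state_dict: Mapping[str, Iterable[float]], prefix: str = ""
) -> List[int]:
    """Return the channel indices whose BatchNorm weight (gamma) is negative.

    These are the channels whose output would change after loading into a
    layer that uses |weight| + eps as its scale.
    """
    weight = state_dict[prefix + "weight"]
    return [i for i, w in enumerate(weight) if float(w) < 0]


if __name__ == "__main__":
    sd = {"bn1.weight": [0.5, -0.2, 1.0, -3.0], "bn1.bias": [0.0] * 4}
    print(negative_weight_channels(sd, prefix="bn1."))
```

With a real PyTorch model, `bn.state_dict()` holds tensors rather than lists; iterating over a 1-D tensor yields scalar tensors that `float()` accepts, so the same helper should work unchanged.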

  • v1.0.3(Jul 16, 2019)

    Added a couple of functions to manage distributed groups with InPlaceABNSync:

    • active_group: create a distributed group where each worker can decide whether to participate or not.
    • set_active_group: scan a model, passing a distributed group to all layers that implement a set_group() method.

    These are intended to simplify handling of asymmetric computational graphs in DistributedDataParallel when using InPlaceABNSync. A typical usage is as follows:

    import torch.nn as nn

    from inplace_abn import InPlaceABNSync, active_group, set_active_group


    class DynamicModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(4, 4, 1)
            self.bn1 = InPlaceABNSync(4)
            self.conv2 = nn.Conv2d(4, 4, 1)
            self.bn2 = InPlaceABNSync(4)

        def get_active(self, x):
            # Hypothetical placeholder for a data-dependent criterion; replace with your own
            return bool(x.mean() > 0)

        def forward(self, x):
            x = self.conv1(x)
            x = self.bn1(x)

            # Call some data-dependent function telling us whether the second part of the network
            # should be traversed or not
            active = self.get_active(x)

            # Create a process group containing only the active workers, pass it to bn2
            set_active_group(self.bn2, active_group(active))

            # Run the second part of the network only if active is True
            if active:
                x = self.conv2(x)
                x = self.bn2(x)

            return x
    
  • v1.0.1(Jul 5, 2019)

    This update adds back support for mixed precision training. These combinations of inputs / parameters are now supported:

    • float32 input, float32 weight and bias
    • float64 input, float64 weight and bias
    • float16 input, float16 weight and bias
    • float16 input, float32 weight and bias

    Note: in the float16 cases all internal operations are still performed with float32 math, and float16 is not supported when operating in CPU mode.
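The supported combinations listed above can be encoded as a small lookup. This is a hypothetical helper, not part of inplace_abn; dtypes are passed as plain strings so the sketch runs without PyTorch (with PyTorch, `str(tensor.dtype).replace("torch.", "")` yields the same names).

```python
# (input dtype, weight/bias dtype) pairs listed in the v1.0.1 notes
SUPPORTED_COMBOS = {
    ("float32", "float32"),
    ("float64", "float64"),
    ("float16", "float16"),
    ("float16", "float32"),
}


def is_supported(input_dtype: str, weight_dtype: str, device: str) -> bool:
    """Check a dtype/device combination against the v1.0.1 release notes (sketch)."""
    if (input_dtype, weight_dtype) not in SUPPORTED_COMBOS:
        return False
    # float16 is not supported in CPU mode; every float16 combo has a float16 input
    if device == "cpu" and input_dtype == "float16":
        return False
    return True


if __name__ == "__main__":
    print(is_supported("float16", "float32", "cuda"))
    print(is_supported("float16", "float32", "cpu"))
```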

  • v1.0.0(Jul 4, 2019)

    This release marks some major changes in inplace_abn:

    • Complete rewrite of the CUDA code following the most recent native BN implementation from Pytorch
    • Improved synchronized BN implementation, correctly handling different per-GPU batch sizes and Pytorch distributed groups
    • The iABN layers are now packaged in an installable Python library to simplify use in other projects
    • The ImageNet / Vistas scripts are still available in the scripts folder
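The notes do not describe how synchronized BN combines statistics across GPUs. One standard approach, shown here in plain Python rather than as the library's CUDA code, weights each GPU's mean and (biased) variance by its local sample count and adds the spread of the per-GPU means (law of total variance). Averaging the per-GPU means directly would be biased whenever batch sizes differ. `local_stats` and `merge_stats` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, mean, biased variance)


def local_stats(values: Sequence[float]) -> Stats:
    """Per-GPU count, mean and biased variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return n, mean, var


def merge_stats(stats: List[Stats]) -> Stats:
    """Combine per-GPU (count, mean, var) triples into global statistics."""
    total = sum(n for n, _, _ in stats)
    # Count-weighted mean: larger per-GPU batches contribute proportionally more
    mean = sum(n * m for n, m, _ in stats) / total
    # Within-GPU variance plus the spread of each GPU's mean around the global mean
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in stats) / total
    return total, mean, var


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    shards = [data[:2], data[2:]]  # uneven per-GPU batch sizes: 2 and 5
    print(merge_stats([local_stats(s) for s in shards]), local_stats(data))
```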
  • v0.1.1(Feb 14, 2019)

    This release adds support for training ResNet with InPlace-ABN layers.

    In addition, we released ResNet34 and ResNet50 models pre-trained on ImageNet.

  • v0.1(Jan 8, 2019)

    This is a code refactoring to enable compatibility with Pytorch v1.0.

    Additional changes:

    • Moved from multi-threaded training to distributed training using multiple processes
    • We provide an adapted implementation of synchronized InPlace-ABN
    • Our InPlace-ABN layer is compatible with fp16 tensors.
  • v0.0.3(Jan 7, 2019)

    This is a partial code refactoring to enable compatibility with Pytorch v0.4.1. In particular:

    • Fixed compatibility with pytorch>=0.4.1 following the change to AT_ASSERT
    • Fixed GPU allocation of tensors created in CUDA code

    Additional changes:

    • Added segmentation models and scripts to run inference on Vistas
    • Updated license
  • v0.0.2(Jul 18, 2018)

    This is a partial code refactoring to enable compatibility with Pytorch v0.4. In particular:

    • Native functions have been rewritten to use the new ATen-based extension interface introduced in v0.4. As a side effect, the native code doesn't need to be pre-compiled anymore. Instead, we are now using Pytorch's newly introduced run-time library loading mechanism.
    • The python code has been modified to account for the fact that autograd.Variable does not exist anymore.

    Additional changes:

    • ABN modules have been refactored, which slightly changes the structure of the overall models' state_dicts. As a consequence, pre-trained models need to be re-downloaded (updated links in README.md).
  • v0.0.1(Jul 17, 2018)

    NOTE: this is the last release that is compatible with Pytorch v0.3

    After this release, the code will undergo a partial rewrite to adapt to the changes introduced in Pytorch v0.4 regarding Tensors / Variables and native functions. As a consequence, we are completely dropping support for Pytorch v0.3 and earlier versions.

Owner
Map data at scale from street-level imagery