In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

Last update: Dec 29, 2022

In-Place Activated BatchNorm (InPlace-ABN) is a novel approach to reduce the memory required for training deep networks. It allows for up to 50% memory savings in modern architectures such as ResNet, ResNeXt and Wider ResNet by redefining BN + non-linear activation as a single in-place operation, while smartly dropping or recomputing intermediate buffers as needed.

This repository contains a PyTorch implementation of the InPlace-ABN layer, as well as some training scripts to reproduce the ImageNet classification results reported in our paper.

We have now also released the inference code for semantic segmentation, together with the Mapillary Vistas trained model that reached the #1 position on the Mapillary Vistas Semantic Segmentation leaderboard. More information can be found in the semantic segmentation section at the end of this page.

Citation

If you use In-Place Activated BatchNorm in your research, please cite:

@inproceedings{rotabulo2017place,
  title={In-Place Activated BatchNorm for Memory-Optimized Training of DNNs},
  author={Rota Bul\`o, Samuel and Porzi, Lorenzo and Kontschieder, Peter},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

Overview

When processing a BN-Activation-Convolution sequence in the forward pass, most deep learning frameworks need to store two large buffers: the input x of BN and the input z of Conv. This is necessary because the standard implementations of the backward passes of BN and Conv depend on their inputs to calculate the gradients. By replacing the BN-Activation sequence with InPlace-ABN, we can safely discard x, thus saving up to 50% GPU memory at training time. To achieve this, we rewrite the backward pass of BN in terms of its output y, which is in turn reconstructed from z by inverting the activation function.
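To make the idea concrete, here is a minimal pure-Python sketch for a single channel, assuming a leaky ReLU activation with negative slope 0.01 (an illustrative value). It computes the BN input gradient twice: once with the standard formula, which needs the stored input x, and once from the activation output z alone, by inverting the activation to recover y and then recovering the normalized input as x̂ = (y − β) / γ. The function names are hypothetical and this illustrates the principle, not the library's CUDA kernels; the second route only works when the activation is invertible and γ ≠ 0.

```python
import math
import random

SLOPE = 0.01  # assumed leaky ReLU negative slope (illustrative value)
EPS = 1e-5    # BN epsilon


def leaky_relu(y, slope=SLOPE):
    return [v if v >= 0 else slope * v for v in y]


def inv_leaky_relu(z, slope=SLOPE):
    # With slope > 0, leaky ReLU is a bijection, so y can be recovered from z.
    return [v if v >= 0 else v / slope for v in z]


def bn_forward(x, gamma, beta, eps=EPS):
    # Single-channel BN in training mode (biased batch variance).
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    sigma = math.sqrt(var + eps)
    y = [gamma * (v - mu) / sigma + beta for v in x]
    return y, sigma  # sigma is one scalar per channel, cheap to keep


def bn_grad(dy, xhat, gamma, sigma):
    # dL/dx = gamma / sigma * (dy - mean(dy) - xhat * mean(dy * xhat))
    n = len(dy)
    mean_dy = sum(dy) / n
    mean_dy_xhat = sum(g * h for g, h in zip(dy, xhat)) / n
    return [gamma / sigma * (g - mean_dy - h * mean_dy_xhat) for g, h in zip(dy, xhat)]


def backward_from_x(dy, x, gamma, sigma):
    # Standard route: x_hat is recomputed from the stored BN input x.
    mu = sum(x) / len(x)
    return bn_grad(dy, [(v - mu) / sigma for v in x], gamma, sigma)


def backward_from_z(dz, z, gamma, beta, sigma, slope=SLOPE):
    # In-place route: only the activation output z is available.
    y = inv_leaky_relu(z, slope)
    # Leaky ReLU derivative; z and y have the same sign because slope > 0.
    dy = [g if v >= 0 else slope * g for g, v in zip(dz, z)]
    xhat = [(v - beta) / gamma for v in y]  # requires gamma != 0
    return bn_grad(dy, xhat, gamma, sigma), dy


random.seed(0)
x = [random.gauss(0.0, 2.0) for _ in range(16)]
gamma, beta = 1.5, 0.2
y, sigma = bn_forward(x, gamma, beta)
z = leaky_relu(y)  # in the in-place scheme, only z (and sigma) would be kept
dz = [random.gauss(0.0, 1.0) for _ in range(16)]

dx_inplace, dy = backward_from_z(dz, z, gamma, beta, sigma)
dx_standard = backward_from_x(dy, x, gamma, sigma)
# The two gradients agree up to floating-point round-off.
print(max(abs(a - b) for a, b in zip(dx_inplace, dx_standard)))
```

The x is kept here only so that the two routes can be compared; the point is that `backward_from_z` never touches it.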

The parametrization of the BN scaling factor differs from standard BN, in order to ensure an invertible transformation. Specifically, the scaling factor γ becomes |γ| + ε.
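A small sketch of what this reparametrization changes, assuming the affine step is applied to the normalized input x̂ as (|γ| + ε)·x̂ + β, with ε a small positive constant (1e-5 here, an illustrative value). For positive γ the output differs from standard BN only by ε·x̂; for γ = 0 the standard map is not invertible while the reparametrized one still is; and a negative γ flips the sign of the output. This is why weights trained with standard BN cannot, in general, be reused as-is. The function names are hypothetical.

```python
EPS = 1e-5  # assumed small positive constant


def standard_affine(xhat, gamma, beta):
    # Standard BN: gamma can be any real number, including 0 or negative values.
    return gamma * xhat + beta


def inplace_abn_affine(xhat, gamma, beta, eps=EPS):
    # Reparametrized scale |gamma| + eps is strictly positive,
    # so the map xhat -> y is always invertible.
    return (abs(gamma) + eps) * xhat + beta


def invert(y, gamma, beta, eps=EPS):
    # Recover the normalized input from the output of inplace_abn_affine.
    return (y - beta) / (abs(gamma) + eps)


xhat, beta = 0.8, 0.1
for gamma in (2.0, 0.0, -2.0):
    y_std = standard_affine(xhat, gamma, beta)
    y_iabn = inplace_abn_affine(xhat, gamma, beta)
    print(gamma, y_std, y_iabn, invert(y_iabn, gamma, beta))
```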

Requirements

To install PyTorch, please refer to https://github.com/pytorch/pytorch#installation.

NOTE 1: our code requires PyTorch v1.1 or later.

NOTE 2: we are only able to provide support for Linux platforms and CUDA versions >= 10.0.

NOTE 3: in general, it is not possible to load weights from a network trained with standard BN into an InPlace-ABN network without severe performance degradation, due to the different handling of BN scaling parameters.

To install the package containing the iABN layers:

pip install inplace-abn

Note that some parts of InPlace-ABN have native C++/CUDA implementations, meaning that the command above will need to compile them.

Alternatively, to download and install the latest version of our library from source, which also gives you a copy of the ImageNet / Vistas scripts:

git clone https://github.com/mapillary/inplace_abn.git
cd inplace_abn
python setup.py install
cd scripts
pip install -r requirements.txt

The last of the commands above installs some additional libraries required by the ImageNet / Vistas scripts.

Force compiling with CUDA

In order to force the compilation of the native CUDA functions on systems that do not have access to a GPU (e.g. Docker containers), two environment variables have to be set:

export TORCH_CUDA_ARCH_LIST="{archs}"
export IABN_FORCE_CUDA=1

where {archs} is a semicolon-separated list of target CUDA architectures, e.g. Pascal;Volta or 6.0;6.1.
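The same two variables can also be set from Python, for example when the build is launched from a provisioning script. A minimal sketch, assuming the build runs as a child process; the helper name `forced_cuda_build_env` is hypothetical, and the architecture entries follow the semicolon-separated TORCH_CUDA_ARCH_LIST convention.

```python
import os


def forced_cuda_build_env(archs, base=None):
    """Return a copy of the environment that forces compilation of the CUDA code.

    archs: iterable of target architectures, e.g. ["Pascal", "Volta"] or ["6.0", "7.0"].
    base:  environment to start from (defaults to the current process environment).
    """
    archs = list(archs)
    if not archs:
        raise ValueError("at least one target CUDA architecture is required")
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(archs)
    env["IABN_FORCE_CUDA"] = "1"
    return env


env = forced_cuda_build_env(["Pascal", "Volta"], base={})
print(env)
```

The returned dictionary can then be passed as `env=` to `subprocess.run([sys.executable, "setup.py", "install"], check=True)` from the root of the cloned repository.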

Training on ImageNet-1k

Here you can find the results from our arXiv paper (top-1 / top-5 accuracy) at test crop sizes of 224 (single crop and 10-crops) and 320, together with the md5 checksums of the corresponding trained models. The model files provided below are made available under the license attached to ImageNet.

Network	Batch size	224	224, 10-crops	320	Trained model (md5)
ResNeXt101, Std-BN	256	77.04 / 93.50	78.72 / 94.47	77.92 / 94.28	`448438885986d14db5e870b95f814f91`
ResNeXt101, InPlace-ABN	512	78.08 / 93.79	79.52 / 94.66	79.38 / 94.67	`3b7a221cbc076410eb12c8dd361b7e4e`
ResNeXt152, InPlace-ABN	256	78.28 / 94.04	79.73 / 94.82	79.56 / 94.67	`2c8d572587961ed74611d534c5b2e9ce`
WideResNet38, InPlace-ABN	256	79.72 / 94.78	81.03 / 95.43	80.69 / 95.27	`1c085ab70b789cc1d6c1594f7a761007`
ResNeXt101, InPlace-ABN sync	256	77.70 / 93.78	79.18 / 94.60	78.98 / 94.56	`0a85a21847b15e5a242e17bf3b753849`
DenseNet264, InPlace-ABN	256	78.57 / 94.17	79.72 / 94.93	79.49 / 94.89	`0b413d67b725619441d0646d663865bf`
ResNet50v1, InPlace-ABN sync	512	75.53 / 92.59	77.04 / 93.57	76.60 / 93.49	`2522ca639f7fdfd7c0089ba1f5f6c2e8`
ResNet34v1, InPlace-ABN sync	512	73.27 / 91.34	75.19 / 92.66	74.87 / 92.42	`61515c1484911c3cc753d405131e1dda`
ResNet101v1, InPlace-ABN sync	512	77.07 / 93.45	78.58 / 94.40	78.25 / 94.19	`1552ae0f3d610108df702135f56bd27b`

Data preparation

Our script uses torchvision.datasets.ImageFolder for loading ImageNet data, which expects folders organized as follows:

root/train/[class_id1]/xxx.{jpg,png,jpeg}
root/train/[class_id1]/xxy.{jpg,png,jpeg}
root/train/[class_id2]/xxz.{jpg,png,jpeg}
...

root/val/[class_id1]/asdas.{jpg,png,jpeg}
root/val/[class_id1]/123456.{jpg,png,jpeg}
root/val/[class_id2]/__32_.{jpg,png,jpeg}
...

Images can have any name, as long as the extension is that of a recognized image format. Class ids are also free-form, but they are expected to match between train and validation data. Note that the training data in the standard ImageNet distribution is already given in the required format, while validation images need to be split into class sub-folders as described above.
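Moving the validation images into class sub-folders can be scripted. A minimal sketch, assuming you already have a mapping from each validation file name to its class id (how you obtain it is outside the scope of this sketch; the function names are hypothetical). It also checks that train and validation use the same set of class ids, which matters because ImageFolder assigns class indices from the sorted sub-folder names.

```python
import shutil
from pathlib import Path


def split_val_into_class_folders(val_dir, labels):
    """Move val_dir/<file> to val_dir/<class_id>/<file>.

    labels: dict mapping an image file name to its class id (assumed to be given).
    """
    val_dir = Path(val_dir)
    for file_name, class_id in labels.items():
        src = val_dir / file_name
        if not src.is_file():
            continue  # already moved, or missing
        dst_dir = val_dir / class_id
        dst_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(dst_dir / file_name))


def class_ids(split_dir):
    # Sorted sub-folder names: the order ImageFolder uses to assign class indices.
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def check_same_classes(root):
    train = class_ids(Path(root) / "train")
    val = class_ids(Path(root) / "val")
    if train != val:
        mismatched = sorted(set(train) ^ set(val))
        raise ValueError(f"class ids differ between train and val: {mismatched}")
    return train
```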

Training

The main training script is scripts/train_imagenet.py: it supports training on ImageNet, or on any other dataset formatted as described above, while keeping a log of relevant metrics in Tensorboard format and periodically saving snapshots. Most training parameters can be specified in a JSON configuration file. All parameters not explicitly specified in the configuration file are set to their defaults, which can be found in scripts/imagenet/config.py.
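The "defaults plus overrides" behaviour can be pictured as a recursive merge of the JSON file over a dictionary of defaults. A minimal sketch, assuming parameters are grouped into nested sections; the section and parameter names below are hypothetical placeholders, not the actual contents of scripts/imagenet/config.py.

```python
import copy
import json

# Hypothetical defaults, for illustration only.
DEFAULTS = {
    "network": {"arch": "resnet101", "activation": "leaky_relu"},
    "optimizer": {"lr": 0.1, "weight_decay": 1e-4},
}


def merge(defaults, overrides):
    """Return a copy of defaults updated with overrides, recursing into nested sections."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path, defaults=DEFAULTS):
    # Parameters missing from the JSON file keep their default values.
    with open(path) as f:
        return merge(defaults, json.load(f))


cfg = merge(DEFAULTS, json.loads('{"optimizer": {"lr": 0.2}}'))
print(cfg["optimizer"])  # {'lr': 0.2, 'weight_decay': 0.0001}
```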

Our arXiv results can be reproduced by running scripts/train_imagenet.py with the configuration files in scripts/experiments. As an example, the command to train ResNeXt101 with InPlace-ABN, Leaky ReLU and batch_size = 512 is:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> train_imagenet.py --log-dir /path/to/tensorboard/logs experiments/resnext101_ipabn_lr_512.json /path/to/imagenet/root

Validation

Validation is run by scripts/train_imagenet.py at the end of every training epoch. To validate a trained model, you can use the scripts/test_imagenet.py script, which allows for 10-crops validation and transferring weights across compatible networks (e.g. from ResNeXt101 with ReLU to ResNeXt101 with Leaky ReLU). This script accepts the same configuration files as scripts/train_imagenet.py, but note that the scale_val and crop_val parameters are ignored in favour of the --scale and --crop command-line arguments.

As an example, to validate the ResNeXt101 trained above using 10-crops of size 224 from images scaled to 256 pixels, you can run:

cd scripts
python -m torch.distributed.launch --nproc_per_node <n. GPUs per node> test_imagenet.py --crop 224 --scale 256 --ten_crops experiments/resnext101_ipabn_lr_512.json /path/to/checkpoint /path/to/imagenet/root
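The 10-crops protocol is commonly defined as five crops (the four corners plus the center) of the rescaled image, each taken both as-is and horizontally flipped, with the network scores averaged over the ten views. A minimal sketch of that crop geometry, assuming this common definition and assuming --scale refers to the shorter image side; the helper names are hypothetical and this is not the code of test_imagenet.py.

```python
def ten_crop_boxes(width, height, crop):
    """Return 10 (left, top, right, bottom, flipped) boxes: 4 corners + center, each with and without flip."""
    if crop > width or crop > height:
        raise ValueError("crop size larger than image")
    corners = [
        (0, 0),                               # top-left
        (width - crop, 0),                    # top-right
        (0, height - crop),                   # bottom-left
        (width - crop, height - crop),        # bottom-right
        ((width - crop) // 2, (height - crop) // 2),  # center
    ]
    boxes = []
    for flipped in (False, True):
        for left, top in corners:
            boxes.append((left, top, left + crop, top + crop, flipped))
    return boxes


def average_scores(per_view_scores):
    # Average class scores over views (one list of class scores per view).
    n = len(per_view_scores)
    return [sum(col) / n for col in zip(*per_view_scores)]


# e.g. a 4:3 image whose shorter side was scaled to 256, cropped to 224 as in the command above
boxes = ten_crop_boxes(341, 256, 224)
print(len(boxes), boxes[4])  # 10 (58, 16, 282, 240, False)
```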

Usage for Semantic Segmentation on Cityscapes and Mapillary Vistas

We have successfully used InPlace-ABN with a DeepLab3 segmentation head trained on top of the WideResNet38 model above. Thanks to InPlace-ABN, we could significantly increase the amount of input data to this model, which eventually allowed us to obtain #1 positions on the Cityscapes, Mapillary Vistas, AutoNUE, Kitti and ScanNet segmentation leaderboards. The training settings mostly follow the description in our paper.

Mapillary Vistas pre-trained model

We release our WideResNet38 + DeepLab3 segmentation model trained on the Mapillary Vistas research set. This is the model used to reach the #1 position on the MVD semantic segmentation leaderboard. The segmentation model file provided below is made available under a CC BY-NC-SA 4.0 license.

Network	mIoU	Trained model (md5)
WideResNet38 + DeepLab3	53.42	`913f78486a34aa1577a7cd295e8a33bb`

To use this, please download the .pth.tar model file linked above and run the test_vistas.py script as follows:

cd scripts
python test_vistas.py /path/to/model.pth.tar /path/to/input/folder /path/to/output/folder

The script will process all .png, .jpg and .jpeg images from the input folder and write the predictions in the output folder as .png images. For additional options, e.g. test time augmentation, please consult the script's help message.

The test-set results reported above were obtained using only scale 1.0 plus flipping.
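Flip-based test-time augmentation amounts to running the network on the image and on its mirrored copy, mirroring the second prediction back, and averaging the per-pixel class probabilities. A minimal pure-Python sketch on nested lists, assuming horizontal flipping and a model that maps an H×W image to per-pixel probability vectors; the toy model and helper names are hypothetical, not taken from test_vistas.py.

```python
def hflip(grid):
    # Mirror each row (works for images and per-pixel probability maps alike).
    return [list(reversed(row)) for row in grid]


def predict_with_flip(model, image):
    """Average per-pixel class probabilities over the image and its horizontal flip."""
    probs = model(image)
    probs_flipped = hflip(model(hflip(image)))  # undo the flip before averaging
    avg = [
        [[(p + q) / 2 for p, q in zip(pa, pb)] for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs, probs_flipped)
    ]
    # Predicted label per pixel = argmax over classes.
    preds = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in avg]
    return avg, preds


def toy_model(image):
    # Hypothetical 2-class "model": P(class 1) grows with the pixel value and the
    # column index, so it is deliberately not flip-invariant.
    out = []
    for row in image:
        out_row = []
        for j, v in enumerate(row):
            p1 = min(1.0, 0.5 * v + 0.1 * j)
            out_row.append([1.0 - p1, p1])
        out.append(out_row)
    return out


avg, preds = predict_with_flip(toy_model, [[0.0, 1.0, 0.2]])
print(preds)  # [[0, 1, 0]]
```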

Changelog

Update 04 Jul. 2019: version 1.0.0

Complete rewrite of the CUDA code following the most recent native BN implementation from PyTorch
Improved synchronized BN implementation, correctly handling different per-GPU batch sizes and PyTorch distributed groups
The iABN layers are now packaged in an installable Python library to simplify use in other projects
The ImageNet / Vistas scripts are still available in the scripts folder
Now requires PyTorch 1.1

Update 08 Jan. 2019:

Enabled multiprocessing and InPlace-ABN synchronization over multiple processes (previously using threads). This now requires DistributedDataParallel instead of DataParallel
Added compatibility with fp16 (currently allows fp16 input but requires the module to stay in fp32 mode)
Now requires PyTorch 1.0

Update Feb. 2019:

Added ResNet34v1, ResNet50v1 and ResNet101v1 ImageNet-1k pre-trained models

We have modified the ImageNet training code and BN synchronization in order to work with multiple processes. We have also added fp16 compatibility to our InPlace-ABN module.

Comments

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I tried to use ABN, InPlaceABN and InPlaceABNSync, but some errors occur:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I tested it on PyTorch 0.2, cuDNN v7, CUDA 8.

opened by mingminzhen 37

error for test_vistas.py

I tried test_vistas.py, but something is wrong:

Traceback (most recent call last):
  File "test_vistas_single_gpu.py", line 311, in <module>
    main()
  File "test_vistas_single_gpu.py", line 188, in main
    probs, preds = model(img, scales, args.flip)
  File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "test_vistas_single_gpu.py", line 135, in forward
    sem_logits = self._network(x, scale)
  File "test_vistas_single_gpu.py", line 117, in _network
    x_up = functional.upsample(x, size=scaled_size, mode="bilinear")
  File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1891, in upsample
    return interpolate(input, size, scale_factor, mode, align_corners)
  File "/home/mingmin/anaconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1985, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
TypeError: 'float' object cannot be interpreted as an integer

Then I modified the code in the function _network(self, x, scale), changing scaled_size = [s * scale for s in x.shape[-2:]] to scaled_size = [int(s * scale) for s in x.shape[-2:]]. No error occurs now, but the output image is wrong: there is no correct segmentation result. Can you help check the segmentation model and code?

opened by mingminzhen 27

Performance drop when replacing the old version of inplace with pytorch 1.0 version

The performance of the above experiments (DeepLab v3) dropped to 74% when using the new PyTorch 1.0 version of InPlace-ABN. Is there anything to be careful about when replacing the old version of InPlace-ABN with the PyTorch 1.0 version? I only copied the files in ./modules to ./libs in https://github.com/speedinghzl/pytorch-segmentation-toolbox

Originally posted by @lzrobots in https://github.com/mapillary/inplace_abn/issues/15#issuecomment-458220361

opened by rotabulo 21

Loading the state dict of a model without InPlaceABN
Hi.

First of all, this package is amazing and provided an incredible speed boost to training all of my models, so thanks.

Now, I am trying to convert a standard pretrained model (like ResNet50) to an InPlaceABN model. Basically, I would like to be able to take any model and apply a function to it that replaces every BatchNorm2d with an InPlaceABN and copies all of the parameters.

I have written the following script. It works if I use ABN, but not when I use InPlaceABN.

Basically, I load a pretrained ResNet50 model and apply the function to_abn to it. If I replace every nn.BatchNorm2d with ABN, I can run inference on my test set and get the expected test accuracy. If I use InPlaceABN in the code below, I get 0% accuracy, as if InPlaceABN was not able to load a state_dict from a regular nn.BatchNorm2d. Any idea where this comes from?

from inplace_abn import InPlaceABN, ABN
import torch.nn as nn
from utils import set_layer


def to_abn(module):
    if hasattr(module, 'module'):
        module = module.module
    for n, m in module.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            num_features = m.num_features
            momentum = m.momentum
            eps = m.eps
            # The below line does not seem to work when I try to load the state_dict
            new_bn = InPlaceABN(num_features=num_features, eps=eps, momentum=momentum, activation='identity')
            # But the below line would work and provide the same results as the original network.
            # new_bn = ABN(num_features=num_features, eps=eps, momentum=momentum, activation='identity')
            new_bn.load_state_dict(m.state_dict())
            set_layer(module, n, new_bn)
    return module

The function set_layer simply replaces a module of the model with another module.

Thanks for your help
opened by yoniaflalo 15

Command '['ninja', '-v']' returned non-zero exit status 1

Traceback (most recent call last):
  File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
    ['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
  File "/home/peng/anaconda2/envs/python36/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/home/peng/anaconda2/envs/python36/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 35, in <module>
    from network import Network
  File "/home/peng/pytorch-seg-new/experiments/SimpleMerge_NYU/network.py", line 20, in <module>
    import resnet101_dilation
  File "../../basemodel/resnet101_dilation.py", line 18, in <module>
    from LibInplaceABN.modules import InPlaceABNSync, ABN, GlobalAvgPool2d
  File "../../lib/LibInplaceABN/modules/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "../../lib/LibInplaceABN/modules/bn.py", line 10, in <module>
    from .functions import *
  File "../../lib/LibInplaceABN/modules/functions.py", line 18, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
    _build_extension_module(name, build_directory)
  File "/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
    name, error.output.decode()))
RuntimeError: Error building extension 'inplace_abn':
[1/5] /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
FAILED: inplace_abn_cuda.cuda.o
/usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda.cu -o inplace_abn_cuda.cuda.o
/bin/sh: 1: /usr/local/cuda-9.1/bin/bin/nvcc: not found
[2/5] /usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
FAILED: inplace_abn_cuda_half.cuda.o
/usr/local/cuda-9.1/bin/bin/nvcc -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m --compiler-options '-fPIC' --expt-extended-lambda -std=c++11 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cuda_half.cu -o inplace_abn_cuda_half.cuda.o
/bin/sh: 1: /usr/local/cuda-9.1/bin/bin/nvcc: not found
[3/5] c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp -o inplace_abn.o
FAILED: inplace_abn.o
c++ -MMD -MF inplace_abn.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp -o inplace_abn.o
/home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn.cpp:1:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
[4/5] c++ -MMD -MF inplace_abn_cpu.o.d -DTORCH_EXTENSION_NAME=inplace_abn -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/TH -I/home/peng/anaconda2/envs/python36/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.1/bin/include -I/home/peng/anaconda2/envs/python36/include/python3.6m -fPIC -std=c++11 -O3 -c /home/peng/pytorch-seg-new/lib/LibInplaceABN/modules/src/inplace_abn_cpu.cpp -o inplace_abn_cpu.o
ninja: build stopped: subcommand failed.

opened by zsp1993 14

gloo error

Hi, I used distributed training with the 'gloo' backend but got an error:

File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 445, in distributed_data_parallel_hook
    self._queue_reduction(bucket_idx)
  File "/home/yuxb/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 475, in _queue_reduction
    self.device_ids)
TypeError: _queue_reduction(): incompatible function arguments. The following argument types are supported:
    1. (process_group: torch.distributed.ProcessGroup, grads_batch: List[List[at::Tensor]], devices: List[int]) -> Tuple[torch.distributed.Work, at::Tensor]

Invoked with: <torch.distributed.ProcessGroupGloo object at 0x7fdf2daaadc0>, [[tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0..

Thanks for your help.

opened by JamesKasperYu 13

How to train the ResNeXt model on Cityscapes?

I previously trained PSP with the original ResNet-101 (7x7 convolution) and achieved 76.8 mIoU on the Cityscapes val set. I then replaced ResNet-101 with ResNeXt-101 (the ImageNet-pretrained model you provide) and trained it with the same parameters, but the performance is very poor (66.4). Did you use COCO to fine-tune the ResNeXt model?

Could you give me some tips on tuning the hyperparameters for ResNeXt-101 compared with ResNet-101?

Besides, I am wondering whether you have trained the modified ResNet-101 (replacing the 7x7 conv with three 3x3 convs) on ImageNet. If so, it would be great if you could share this model with me.
invalid question

opened by PkuRainBow 13

About general activation function

Many thanks for inplace_abn: it increased the batch size by 1.5x in my application!

I notice that InPlaceABN now supports no activation, ReLU, ELU and leaky ReLU. How could other activation functions, like PReLU, be implemented?

Can we just use [InPlaceABN(depth, activation='none'), torch.nn.PReLU()], since PReLU does not seem to be an in-place operation?

opened by luzai 12

error: An extended __device__ lambda must not be defined in a function that is defined within another function

Hi, I noticed that you have updated your repo, so I downloaded the new version.

I tried to build it in the following environment:

Ubuntu 16.04 (docker), Miniconda env: pytorch04, Python 3.6.5, CUDA 8.0, cuDNN 7.0

I get the following errors:
enhancement
opened by PkuRainBow 12

Can't install from downloaded file

root:/data/research/seamseg-master# pip install inplace_abn-master.zip
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./inplace_abn-master.zip
ERROR: Command errored out with exit status 1:
 command: /data/config/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-1w23bkoz/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-1w23bkoz/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
 cwd: /tmp/pip-req-build-1w23bkoz/
Complete output (20 lines):
/data/config/anaconda3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
  warnings.warn(msg)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-req-build-1w23bkoz/setup.py", line 59, in <module>
    cmdclass={"build_ext": BuildExtension}
  File "/data/config/anaconda3/lib/python3.6/distutils/core.py", line 108, in setup
    _setup_distribution = dist = klass(attrs)
  File "/data/config/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 318, in __init__
  File "/data/config/anaconda3/lib/python3.6/distutils/dist.py", line 281, in __init__
    self.finalize_options()
  File "/data/config/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/setuptools/dist.py", line 375, in finalize_options
  File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/integration.py", line 17, in version_keyword
  File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/__init__.py", line 150, in get_version
  File "/tmp/pip-req-build-1w23bkoz/.eggs/setuptools_scm-3.3.3-py3.6.egg/setuptools_scm/__init__.py", line 113, in _do_parse
LookupError: setuptools-scm was unable to detect version for '/tmp/pip-req-build-1w23bkoz'.

Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
opened by jianlong-yuan 11

Have you noticed a big imbalance between GPUs with InplaceSyncBN?

With PyTorch nn.BatchNorm2d and nn.DataParallel, the memory difference between two GPUs is less than 1G. However, with InplaceSyncBN and nn.DataParallel, the memory gap grows to almost 2G. What is the cause of this, and how could I avoid it?

opened by CoinCheung 11
Support for Pillow >= 8.3.0

As described in issue 229, this contributes a check of the Pillow version. If the Pillow version is >= 8.3.0, the _PALETTE list must consist of all channels for one color followed by the next color (e.g. RGBRGBRGB). Otherwise, all R values must be contiguous in the list, before the G and B values.

opened by gyes00205 0
Support for Pillow >= 8.3.0
I tried to test the picture shown below but got a strange result, unlike the result in issue 49.

I therefore looked for the bug in test_vistas_single_gpu.py. In line 320 of test_vistas_single_gpu.py, all R values are contiguous in the list, before the G and B values:

_PALETTE = ImagePalette.ImagePalette(
    palette=list(_PALETTE[:, 0]) + list(_PALETTE[:, 1]) + list(_PALETTE[:, 2]),
    mode="RGB",
)

But in Pillow >= 8.3.0, the list must consist of all channels for one color followed by the next color (e.g. RGBRGBRGB), as described in the documentation of Pillow 8.3.x. Maybe I can contribute the code to check the Pillow version in test_vistas_single_gpu.py.
opened by gyes00205 0
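The difference between the two palette layouts, and the version check proposed above, can be sketched as follows. This assumes the palette is given as a list of (R, G, B) tuples and that the Pillow version is passed in as a string such as PIL.__version__; the helper names are hypothetical, and Pillow itself is not imported so the sketch runs anywhere.

```python
def parse_version(version):
    # "8.3.1" -> (8, 3, 1); non-numeric suffixes such as ".dev0" are ignored.
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def palette_list(colors, pillow_version):
    """Flatten (R, G, B) colors into the list layout expected by ImagePalette."""
    if parse_version(pillow_version) >= (8, 3, 0):
        # Interleaved layout: R0 G0 B0 R1 G1 B1 ...
        return [c for rgb in colors for c in rgb]
    # Planar layout (older Pillow): all R values, then all G values, then all B values.
    return [rgb[ch] for ch in range(3) for rgb in colors]


colors = [(128, 64, 128), (244, 35, 232)]
print(palette_list(colors, "8.3.0"))  # [128, 64, 128, 244, 35, 232]
print(palette_list(colors, "8.2.0"))  # [128, 244, 64, 35, 128, 232]
```

With Pillow installed, the result would be passed as `ImagePalette.ImagePalette(palette=palette_list(colors, PIL.__version__), mode="RGB")`.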

inplace_abn does not seem to support Pytorch 1.12.1

I'm getting the following error with PyTorch 1.12 when running a training job that was working fine with PyTorch 1.11:

  File "/pip-dl_inplace_abn/inplace_abn/functions.py", line 227, in inplace_abn
    return InPlaceABN.apply(
  File "/pip-dl_inplace_abn/inplace_abn/functions.py", line 86, in forward
    mean, var, count = _backend.statistics(x)
RuntimeError: Tensors of type TensorImpl do not have sizes

Is inplace_abn not compatible with Pytorch 1.12 or is the problem somewhere else?

opened by gdippolito 0

inplace_abn for transformer

Dear author, transformers use layer norm instead of batch norm. Is it possible to apply InPlace-ABN to transformer-based models? Or is there any other way to lower those models' GPU memory usage? Thanks.

opened by 17dacheng 0

setup.py does not work

LookupError: setuptools-scm was unable to detect version for /home/pyl/pythonproject_pyl/inplace_abn_main.

Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.

opened by pyl3000 1

Import issue: Undefined symbol

When attempting to import Inplace-ABN version 1.1.0, I get the following error:

>>> from inplace_abn import InPlaceABN
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/__init__.py", line 1, in <module>
    from .abn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/abn.py", line 8, in <module>
    from .functions import inplace_abn, inplace_abn_sync
  File "/home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/functions.py", line 8, in <module>
    from . import _backend
ImportError: /home/curttigges/miniconda3/envs/pytorch-dl/lib/python3.7/site-packages/inplace_abn/_backend.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv

I am using CUDA 11.2 and PyTorch 1.8.2. The Inplace-ABN files were compiled with GCC 10.

opened by curt-tigges 1

Releases(v1.1.0)

v1.1.0(Sep 3, 2020)
This release updates ABN, InPlaceABN and InPlaceABNSync to feature parity with recent versions of Pytorch's BatchNormNd layers:

Add a track_running_stats parameter to enable / disable computation of running statistics independently from the layer's training state

Add a num_batches_tracked buffer, and allow passing momentum=None to support cumulative moving average for tracking running stats instead of exponential moving average

As a side effect, loading parameters from standard BatchNorm is now supported without work-arounds. Still, if the loaded parameters contain negative weight elements, the output will differ from that of standard BatchNorm
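The two ways of tracking running statistics mentioned above can be sketched as follows, assuming the update rule of PyTorch's BatchNormNd layers: with a numeric momentum the running value is an exponential moving average, while with momentum=None the update factor becomes 1 / num_batches_tracked, which yields a cumulative (plain) average of the per-batch statistics. The sketch tracks a single scalar statistic; the class name is hypothetical.

```python
class RunningStat:
    """Track one per-channel statistic (e.g. the mean) across training batches."""

    def __init__(self, momentum=0.1, init=0.0):
        self.momentum = momentum  # None -> cumulative moving average
        self.value = init
        self.num_batches_tracked = 0

    def update(self, batch_value):
        self.num_batches_tracked += 1
        if self.momentum is None:
            factor = 1.0 / self.num_batches_tracked
        else:
            factor = self.momentum
        self.value = (1.0 - factor) * self.value + factor * batch_value
        return self.value


batch_means = [1.0, 2.0, 3.0, 4.0]
ema, cma = RunningStat(momentum=0.1), RunningStat(momentum=None)
for m in batch_means:
    ema.update(m)
    cma.update(m)
print(round(cma.value, 6))  # 2.5, the plain average of the batch means
```

With momentum=None the initial value has no influence after the first batch, because the first update uses a factor of 1.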

Additional changes:

Fix backward pass in eval mode: it was not properly accounting for the activation function

Refactor code to follow more sensible formatting standards

Add type annotations

Improve docstrings

Update installation instructions, pointing to the PyPI package

v1.0.12(Apr 22, 2020)

v1.0.11(Jan 27, 2020)

v1.0.10(Jan 8, 2020)

This release contains an improved implementation of the fix for the backward pass in v1.0.9 which uses less temporary memory at no additional computational cost.
v1.0.9(Jan 7, 2020)

In previous versions, both the input/output tensor y and the gradient tensor dy were overwritten during the backward pass. This was causing issues with some network topologies, producing wrong gradients.

To fix this issue, a pair of temporary tensors is now created during the backward pass to hold the results of intermediate computations. This change will increase the amount of temporary memory required, meaning that in some cases where GPU memory utilization was already very close to the limit, OOM errors might now occur. An alternative, more complex fix is also possible at the expense of additional computational costs. We are evaluating the impact of these changes and will provide updates in a future release.
v1.0.8(Nov 22, 2019)

v1.0.7(Sep 4, 2019)

This release fixes a compatibility issue with CUDA 10.0, resulting in compilation errors in some cases.
v1.0.6(Aug 23, 2019)
At compile time, when determining whether to enable CUDA support, we now base the decision on the Pytorch version installed:

If a CUDA-enabled Pytorch is detected, we attempt to compile CUDA support

If a CPU-only Pytorch is detected, we disable CUDA support

v1.0.5(Aug 20, 2019)

InPlace-ABN can now be compiled and used without CUDA. Note that Synchronized InPlace-ABN is still only supported in conjunction with CUDA-enabled Pytorch.
v1.0.4(Aug 14, 2019)

State dicts from standard BatchNorm layers trained with PyTorch v1.0.0 or newer can now be properly loaded by ABN, InPlaceABN and InPlaceABNSync.
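By default, newer PyTorch versions store a num_batches_tracked counter in nn.BatchNorm2d state dicts, next to weight, bias, running_mean and running_var. Assuming that extra entry is what prevents a layer from loading such a state dict (an assumption for illustration, not a description of the ABN code), a minimal fix is to drop it before loading. In a torch.nn.Module this would typically live in an override of _load_from_state_dict; the hypothetical helper below shows only the key filtering.

```python
from typing import Any, Dict


def drop_num_batches_tracked(state_dict: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Return a copy of state_dict without the `<prefix>num_batches_tracked` entry.

    Hypothetical helper: the input dict is left unchanged.
    """
    key = prefix + "num_batches_tracked"
    return {k: v for k, v in state_dict.items() if k != key}


if __name__ == "__main__":
    # Keys as an nn.BatchNorm2d registered as "bn1" would produce (values are placeholders)
    bn_state = {
        "bn1.weight": 1.0,
        "bn1.bias": 0.0,
        "bn1.running_mean": 0.0,
        "bn1.running_var": 1.0,
        "bn1.num_batches_tracked": 100,
    }
    print(sorted(drop_num_batches_tracked(bn_state, prefix="bn1.")))
```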

v1.0.3(Jul 16, 2019)

Added two functions to manage distributed groups with InPlaceABNSync:

active_group: create a distributed group where each worker can decide whether to participate or not.
set_active_group: scan a model, passing a distributed group to all layers that implement a set_group() method.

These are intended to simplify the handling of asymmetric computational graphs in DistributedDataParallel when using InPlaceABNSync. A typical usage is as follows:

import torch.nn as nn
from inplace_abn import InPlaceABNSync, active_group, set_active_group


class DynamicModel(nn.Module):
    def __init__(self):
        super(DynamicModel, self).__init__()
        self.conv1 = nn.Conv2d(4, 4, 1)
        self.bn1 = InPlaceABNSync(4)
        self.conv2 = nn.Conv2d(4, 4, 1)
        self.bn2 = InPlaceABNSync(4)

    def get_active(self, x):
        # Hypothetical placeholder: replace with your own data-dependent criterion
        return bool(x.mean() > 0)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)

        # Call some data-dependent function telling us whether the second part of the network
        # should be traversed or not
        active = self.get_active(x)

        # Create a process group containing only the active workers, pass it to bn2
        set_active_group(self.bn2, active_group(active))

        # Run the second part of the network only if active is True
        if active:
            x = self.conv2(x)
            x = self.bn2(x)

        return x
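
A likely reason the example calls active_group(active) on every worker, before the if active: branch, is that creating a process group is a collective step: torch.distributed.new_group requires all processes in the job to enter the call, even those that will not be members. The sketch below is a hypothetical re-implementation (make_active_group and active_ranks are our names, not the library's API) that all-gathers one flag per worker and builds a group from the active ranks.

```python
from typing import List


def active_ranks(gathered: List[int]) -> List[int]:
    """Keep the ranks of active workers; inactive workers report -1."""
    return [r for r in gathered if r != -1]


def make_active_group(active: bool):
    """Hypothetical sketch of an active_group-like helper.

    Must be called by every worker, active or not, after
    torch.distributed.init_process_group().
    """
    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    rank = dist.get_rank()

    # Each worker contributes its own rank if active, -1 otherwise
    flag = torch.tensor([rank if active else -1], dtype=torch.long)
    gathered = [torch.empty(1, dtype=torch.long) for _ in range(world_size)]
    dist.all_gather(gathered, flag)

    # Collective call: all workers enter new_group, only active ranks become members
    return dist.new_group(ranks=active_ranks([int(t.item()) for t in gathered]))
```

As written, the flag tensors live on the CPU, which suits the gloo backend; with NCCL they would need to be placed on the current CUDA device.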


v1.0.2(Jul 8, 2019)

v1.0.1(Jul 5, 2019)
This update restores support for mixed-precision training. The following combinations of inputs / parameters are now supported:

float32 input, float32 weight and bias

float64 input, float64 weight and bias

float16 input, float16 weight and bias

float16 input, float32 weight and bias

Note: in the float16 cases, all internal operations are still performed with float32 math, and float16 is not supported in CPU mode.
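The float32-math note above reflects a general numerical concern rather than anything specific to this library: statistics such as the per-channel mean are sums over many elements, and a float16 accumulator runs out of precision quickly. The plain-Python sketch below (a toy model of the arithmetic, not the InPlace-ABN kernel) emulates float16 and float32 accumulators with the struct module's 'e' and 'f' formats.

```python
import struct
from typing import Callable, List


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def to_fp32(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def running_sum(values: List[float], round_fn: Callable[[float], float]) -> float:
    """Accumulate values, rounding the running total to the target precision after each step."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


if __name__ == "__main__":
    values = [to_fp16(1.0)] * 4096  # float16 inputs, e.g. one channel's activations
    print(running_sum(values, to_fp16))  # 2048.0: a float16 accumulator stalls
    print(running_sum(values, to_fp32))  # 4096.0: a float32 accumulator is exact here
```

Above 2048, consecutive float16 values are 2 apart, so adding 1.0 rounds back to 2048 (round-half-to-even); a float32 accumulator stays exact for integer totals up to 2^24.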
v1.0.0(Jul 4, 2019)
This release introduces some major changes to inplace_abn:

Complete rewrite of the CUDA code, following the most recent native BN implementation in PyTorch

Improved synchronized BN implementation that correctly handles different per-GPU batch sizes and PyTorch distributed groups

The iABN layers are now packaged in an installable Python library to simplify use in other projects

The ImageNet / Vistas scripts are still available in the scripts folder
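The v1.0.0 note says synchronized BN now correctly handles different per-GPU batch sizes. A minimal plain-Python sketch of why that matters, assuming each GPU reports its element count, mean and biased variance (in a real layer these would be exchanged with collective operations, which are omitted here, and this is not the library's reduction code): weighting by count and applying the law of total variance recovers the full-batch statistics, whereas averaging the per-GPU means does not.

```python
from typing import List, Tuple

Stats = Tuple[int, float, float]  # (element count, mean, biased variance)


def local_stats(xs: List[float]) -> Stats:
    """Statistics one GPU would compute over its own slice of the batch."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return n, mean, var


def merge_stats(stats: List[Stats]) -> Stats:
    """Combine per-GPU statistics, weighting each GPU by its element count."""
    total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / total
    # Law of total variance: within-GPU variance plus spread of the GPU means
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in stats) / total
    return total, mean, var


if __name__ == "__main__":
    gpu0 = [1.0, 2.0, 3.0]  # 3 samples on one GPU
    gpu1 = [10.0, 20.0]     # 2 samples on another
    stats = [local_stats(gpu0), local_stats(gpu1)]
    print(merge_stats(stats)[1])            # 7.2, the mean of all 5 samples
    print((stats[0][1] + stats[1][1]) / 2)  # 8.5, a naive average of the GPU means
```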

v0.1.1(Feb 14, 2019)

Added support for training ResNet with InPlace-ABN layers.

In addition, we released ResNet34 and ResNet50 models pre-trained on ImageNet.
v0.1(Jan 8, 2019)
This is a code refactoring to enable compatibility with PyTorch v1.0.

Additional changes:

Moved from multi-threaded training to distributed training with multiple processes

Added an adapted implementation of synchronized InPlace-ABN

The InPlace-ABN layer is compatible with fp16 tensors.

v0.0.3(Jan 7, 2019)
This is a partial code refactoring to enable compatibility with PyTorch v0.4.1. In particular:

Fixed compatibility with pytorch>=0.4.1, which was broken by a change to AT_ASSERT

Fixed GPU allocation of tensors created in CUDA code

Additional changes:

Added segmentation models and scripts to run inference on Vistas

Updated license

v0.0.2(Jul 18, 2018)
This is a partial code refactoring to enable compatibility with PyTorch v0.4. In particular:

Native functions have been rewritten to use the new ATen-based extension interface introduced in v0.4. As a side effect, the native code no longer needs to be pre-compiled; instead, we now use PyTorch's newly introduced run-time library loading mechanism.

The Python code has been modified to account for the fact that autograd.Variable is no longer needed, since it was merged into Tensor.

Additional changes:

ABN modules have been slightly refactored, which changes the structure of the models' state_dicts. As a consequence, pre-trained models need to be re-downloaded (updated links in README.md).

v0.0.1(Jul 17, 2018)

NOTE: this is the last release that is compatible with PyTorch v0.3

After this release, the code will undergo a partial rewrite to adapt to the changes introduced in PyTorch v0.4 regarding Tensors / Variables and native functions. As a consequence, we are completely dropping support for versions of PyTorch before v0.4.