A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Introduction

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in PyTorch. Some of the code here will eventually be included in upstream PyTorch. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

Full API Documentation: https://nvidia.github.io/apex

GTC 2019 and Pytorch DevCon 2019 Slides

Contents

1. Amp: Automatic Mixed Precision

apex.amp is a tool to enable mixed precision training by changing only 3 lines of your script. Users can easily experiment with different pure and mixed precision training modes by supplying different flags to amp.initialize.

Webinar introducing Amp (The flag cast_batchnorm has been renamed to keep_batchnorm_fp32).

API Documentation

Comprehensive Imagenet example

DCGAN example coming soon...

Moving to the new Amp API (for users of the deprecated "Amp" and "FP16_Optimizer" APIs)

2. Distributed Training

apex.parallel.DistributedDataParallel is a module wrapper, similar to torch.nn.parallel.DistributedDataParallel. It enables convenient multiprocess distributed training, optimized for NVIDIA's NCCL communication library.

API Documentation

Python Source

Example/Walkthrough

The Imagenet example shows use of apex.parallel.DistributedDataParallel along with apex.amp.

Synchronized Batch Normalization

apex.parallel.SyncBatchNorm extends torch.nn.modules.batchnorm._BatchNorm to support synchronized batch normalization (BN). It allreduces batch statistics across processes during multiprocess (DistributedDataParallel) training. Synchronized BN is useful when only a small local minibatch fits on each GPU: allreducing the statistics raises the effective batch size of the BN layer to the global batch size across all processes (which, technically, is the correct formulation). Synchronized BN has been observed to improve converged accuracy in some of our research models.
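To make the "effective batch size" point concrete, here is a minimal pure-Python sketch of how per-process statistics can be merged into global ones. It uses a simplified moment-merging formula rather than the kernels apex actually ships, and the function name and sample data are hypothetical.

```python
from statistics import mean, pvariance


def combine_batch_stats(local_stats):
    """Merge per-process (count, mean, biased_var) triples into global stats."""
    total = sum(n for n, _, _ in local_stats)
    global_mean = sum(n * m for n, m, _ in local_stats) / total
    # Per process, E[x^2] = var + mean^2; average that, then subtract mean^2.
    global_sq = sum(n * (v + m * m) for n, m, v in local_stats) / total
    return global_mean, global_sq - global_mean ** 2


# Two hypothetical processes, each holding a small local minibatch
# for a single channel.
rank0 = [1.0, 2.0, 3.0, 4.0]
rank1 = [10.0, 12.0]

local = [(len(x), mean(x), pvariance(x)) for x in (rank0, rank1)]
global_mean, global_var = combine_batch_stats(local)

# The merged result matches statistics computed over the whole global batch.
print(global_mean, mean(rank0 + rank1))
print(global_var, pvariance(rank0 + rank1))
```

Each process on its own would normalize with very different means (2.5 versus 11.0 here); after merging, every process normalizes with the same global statistics.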

Checkpointing

To properly save and load your amp training state, we introduce amp.state_dict(), which contains all loss_scalers and their corresponding unskipped steps, and amp.load_state_dict(), which restores these attributes.

To get bitwise accuracy when resuming training, we recommend the following workflow:

# Initialization
import torch
from apex import amp

opt_level = 'O1'
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Train your model
...
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
...

# Save checkpoint
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'amp': amp.state_dict()
}
torch.save(checkpoint, 'amp_checkpoint.pt')
...

# Restore
model = ...
optimizer = ...
checkpoint = torch.load('amp_checkpoint.pt')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])

# Continue training
...

Note that we recommend restoring the model using the same opt_level. Also note that we recommend calling the load_state_dict methods after amp.initialize.
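Why the loss scaler state belongs in the checkpoint can be illustrated with a toy dynamic loss scaler. This is a sketch under stated assumptions, not apex's implementation: it assumes a policy of halving the scale on overflow and doubling it after a fixed number of unskipped steps, and the class name, the growth_interval parameter, and the state keys are hypothetical.

```python
import math


class ToyLossScaler:
    """Toy dynamic loss scaler: halve on overflow, double after N clean steps."""

    def __init__(self, loss_scale=2.0 ** 16, growth_interval=2000):
        self.loss_scale = loss_scale
        self.unskipped = 0
        self.growth_interval = growth_interval

    def update(self, grads):
        """Return True if the optimizer step should run for these gradients."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            # Overflow: skip this step and back off the scale.
            self.loss_scale /= 2.0
            self.unskipped = 0
            return False
        self.unskipped += 1
        if self.unskipped == self.growth_interval:
            # Enough clean steps in a row: try a larger scale again.
            self.loss_scale *= 2.0
            self.unskipped = 0
        return True

    def state_dict(self):
        return {"loss_scale": self.loss_scale, "unskipped": self.unskipped}

    def load_state_dict(self, state):
        self.loss_scale = state["loss_scale"]
        self.unskipped = state["unskipped"]


# Simulate a run that hits two overflows and then takes one clean step.
scaler = ToyLossScaler()
scaler.update([float("inf")])
scaler.update([float("nan")])
scaler.update([0.1])
saved = scaler.state_dict()

# Resuming with the saved state continues where training left off ...
resumed = ToyLossScaler()
resumed.load_state_dict(saved)

# ... while a freshly constructed scaler starts from the default scale again.
fresh = ToyLossScaler()
print(resumed.state_dict(), fresh.state_dict())
```

In this toy model the resumed scaler keeps the scale it had already backed off to, whereas the fresh one would have to rediscover it through further skipped steps, so the two runs would no longer match step for step.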

Requirements

Python 3

CUDA 9 or newer

PyTorch 0.4 or newer. The CUDA and C++ extensions require PyTorch 1.0 or newer.

We recommend the latest stable release, obtainable from https://pytorch.org/. We also test against the latest master branch, obtainable from https://github.com/pytorch/pytorch.

It's often convenient to use Apex in Docker containers. Compatible options include:

  • NVIDIA PyTorch containers from NGC, which come with Apex preinstalled. To use the latest Amp API, you may need to pip uninstall apex and then reinstall Apex using the Quick Start commands below.
  • Official PyTorch -devel Dockerfiles, e.g. docker pull pytorch/pytorch:nightly-devel-cuda10.0-cudnn7, in which you can install Apex using the Quick Start commands.

See the Docker example folder for details.

Quick Start

Linux

For performance and full functionality, we recommend installing Apex with CUDA and C++ extensions via

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Apex also supports a Python-only build (required with PyTorch 0.4) via

pip install -v --disable-pip-version-check --no-cache-dir ./

A Python-only build omits:

  • Fused kernels required to use apex.optimizers.FusedAdam.
  • Fused kernels required to use apex.normalization.FusedLayerNorm.
  • Fused kernels that improve the performance and numerical stability of apex.parallel.SyncBatchNorm.
  • Fused kernels that improve the performance of apex.parallel.DistributedDataParallel and apex.amp. DistributedDataParallel, amp, and SyncBatchNorm will still be usable, but they may be slower.
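One way to tell which kind of build is installed is to check whether the compiled extension modules can be found. The sketch below uses only the standard library. The module names are taken from tracebacks quoted in the comments further down (amp_C, fused_layer_norm_cuda, syncbn) and should be treated as assumptions, since they may differ between Apex versions; the helper name is hypothetical.

```python
import importlib.util


def missing_extensions(module_names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Assumed names of Apex's compiled extension modules (may vary by version).
CANDIDATES = ["amp_C", "fused_layer_norm_cuda", "syncbn"]

missing = missing_extensions(CANDIDATES)
if missing:
    print("Not found (likely a Python-only build):", ", ".join(missing))
else:
    print("All listed extension modules were found.")
```

Run this with the same interpreter and environment used for training, since an extension built into one environment is not visible from another.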

Pyprof support has been moved to its own dedicated repository. The codebase is deprecated in Apex and will be removed soon.

Windows support

Windows support is experimental, and Linux is recommended. If you were able to build PyTorch from source on your system, the following may work:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

A Python-only build (without CUDA/C++ extensions) is more likely to work:

pip install -v --no-cache-dir .

If you installed PyTorch in a Conda environment, make sure to install Apex in that same environment.

Comments
  • Warning:  apex was installed without --cuda_ext.

    I installed apex using this command:

    python setup.py install --cuda_ext --cpp_ext

    After that, I ran

    import apex

    to test the installation, but it reported the following warnings:

    Warning: apex was installed without --cuda_ext. Fused syncbn kernels will be unavailable. Python fallbacks will be used instead.
    Warning: apex was installed without --cuda_ext. FusedAdam will be unavailable.
    Warning: apex was installed without --cuda_ext. FusedLayerNorm will be unavailable.

    Is there any problem?

    opened by amuier 34
  • Data parallel error with O2 and not O1

    When using O2, data parallel does not work:

    RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

    However, with O1, everything works just fine.

    model = GeneralVae(encoder, decoder, rep_size=500).cuda()
    optimizer = optim.Adam(model.parameters(), lr=LR)
    model, optimizer = amp.initialize(model, optimizer, opt_level='O2')
    if data_para and torch.cuda.device_count() > 1:
        print("Let's use", torch.cuda.device_count(), "GPUs!")
        model = nn.DataParallel(model)
        model = model.cuda()
    
    loss_picture = customLoss()
    
    val_losses = []
    train_losses = []
    
    def train(epoch):
        train_loader_food = generate_data_loader(train_root, get_batch_size(epoch), int(rampDataSize * data_size))
        print("Epoch {}: batch_size {}".format(epoch, get_batch_size(epoch)))
        model.train()
        train_loss = 0
        loss = None
        for batch_idx, (data, _, aff) in enumerate(train_loader_food):
            data = data[0].cuda(0)
    
    DataParallel 
    opened by aclyde11 32
  • Error in FusedLayerNorm

    After installing apex with the cuda extensions and running pytorch-pretrained-BERT, I get the following error in FusedLayerNormAffineFunction, apex/normalization/fused_layer_norm.py (line 21).

    RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)
    

    Here are the shapes of my tensors:

    input_ - [32, 101, 768]
    bias_ - [768]
    weight_ - [768]
    self.normalized_shape - [768]
    

    I'm not sure if it's a problem with pytorch-pretrained-BERT calling it incorrectly or a bug in apex. Any idea? I've also created an issue here.

    I'm running Ubuntu with CUDA 9, PyTorch 0.4.1.

    Full stacktrace below.

    File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 710, in forward
        embedding_output = self.embeddings(input_ids, token_type_ids)
      File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 261, in forward
        embeddings = self.LayerNorm(embeddings)
      File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 149, in forward
        input, self.weight, self.bias)
      File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 21, in forward
        input_, self.normalized_shape, weight_, bias_, self.eps)
    
    RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)
    
    frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f1aa5da3021 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
    frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f1aa5da28ea in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
    frame #2: at::native::item(at::Tensor const&) + 0x12c3 (0x7f1aa690d5b3 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
    frame #3: at::TypeDefault::item(at::Tensor const&) const + 0x55 (0x7f1aa6b1c905 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
    frame #4: torch::autograd::VariableType::eye_out(at::Tensor&, long, long) const + 0x184 (0x7f1aa4faeec4 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
    frame #5: <unknown function> + 0x89ca (0x7f1a82e739ca in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
    frame #6: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x185 (0x7f1a82e762a5 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
    frame #7: <unknown function> + 0x18d44 (0x7f1a82e83d44 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
    frame #8: <unknown function> + 0x16495 (0x7f1a82e81495 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
    frame #9: _PyCFunction_FastCallDict + 0x154 (0x55a8f9925744 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #10: <unknown function> + 0x198610 (0x55a8f99ac610 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #11: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #12: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #13: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #14: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #15: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #16: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #17: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f1ae02e21ec in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
    frame #18: PyCFunction_Call + 0x5f (0x55a8f992863f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #19: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #20: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #21: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #22: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #23: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #24: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #25: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #26: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #27: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #28: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #29: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #30: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #31: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #32: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #33: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #34: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #35: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #36: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #37: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #38: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #39: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #40: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #41: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #42: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #43: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #44: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #45: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #46: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #47: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #48: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #49: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #50: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #51: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #52: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #53: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #54: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #55: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #56: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #57: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
    frame #58: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #59: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #60: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #61: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #62: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    frame #63: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
    
    opened by Hyperparticle 32
  • run more than one DDP model on one machine

    Hi, thank you so much for your input so far. I have now successfully converted my models to use DDP instead of DP and time per epoch has gone down from 1400s to 1140s when using 8 GPUs. One problem that I was unable to solve so far is running more than one DDP model on the same machine:

    Traceback (most recent call last):
      File "run/run_training_DPP.py", line 68, in <module>
        unpack_data=unpack, deterministic=deterministic, fp16=args.fp16)
      File "/home/fabian/PhD/meddec/meddec/model_training/distributed_training/nnUNetTrainerDPP.py", line 26, in __init__
        dist.init_process_group(backend='nccl', init_method='env://')
      File "/home/fabian/dl_venv_python3/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 354, in init_process_group
        store, rank, world_size = next(rendezvous(url))
      File "/home/fabian/dl_venv_python3/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
        store = TCPStore(master_addr, master_port, world_size, start_daemon)
    RuntimeError: Address already in use
    
    

    (which is kind of to be expected given how DDP works). Do you know a workaround for this? We recently got a DGX-2 with 16 GPUs, and I would like to run two different experiments in parallel, each using 8 GPUs. Best, Fabian
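A common way to avoid this collision, sketched here under assumptions, is to give each job its own rendezvous port. With init_method='env://', torch.distributed reads the address and port from the MASTER_ADDR and MASTER_PORT environment variables, so two jobs on the same machine need different MASTER_PORT values. The helper names below are hypothetical, and the snippet only picks a port and builds the environment; it does not start any training.

```python
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def rendezvous_env(port=None):
    """Build the env:// rendezvous variables for one single-machine job."""
    if port is None:
        port = find_free_port()
    return {"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": str(port)}


# One environment per experiment; pass each to the processes of that job.
env_job_a = rendezvous_env()
env_job_b = rendezvous_env()
print(env_job_a, env_job_b)
```

All processes of one job must share the same port, so choose it once in the launcher rather than in each worker. There is a small window in which another program could claim the port after it is released here; assigning fixed, distinct ports per experiment avoids that.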

    opened by FabianIsensee 27
  • loss spike after checkpoint reload

    When running a model using apex+ddp, my loss spikes dramatically after the model restarts. If I disable apex, it works fine.

    Currently, I've set up apex this way:

    optimizer = Adam()
    schedulers= LR
    
    torch.cuda.set_device(gpu_nb)
    model.cuda(gpu_nb)
    
    # apex
    model, optimizer = amp.initialize(model, optimizer, opt_level='O2')    
    
    # ddp
    model = DDP(model)   
    
    # restore state
    ckpt = torch.load(path)
    model.load_state_dict(ckpt['state_dict'])
    optimizer.load_state_dict(ckpt['opt_dict'])
    LR.load_state_dict(ckpt['lr_dict'])
    amp.load_state_dict(ckpt['amp'])   
    
    # continue ....
    

    Actual code is here:

    opened by williamFalcon 26
  • Hard error on mismatch between torch.version.cuda and the CUDA toolkit version being used to compile Apex

    The warning message was too subtle/too easy to overlook in the output of setup.py, and it really should be a hard error.

    Making it a hard error should also assist with cuda driver version errors like https://github.com/NVIDIA/apex/issues/314. https://github.com/NVIDIA/apex/issues/314 resulted because the cuda driver (libcuda.so) version was 10.0, the cuda toolkit version used to compile the Pytorch binaries was 10.0 (which was fine), but the cuda toolkit version used to compile Apex was 10.1** (which triggered a PTX JIT compilation error at runtime because the 10.0 libcuda.so couldn't handle the PTX produced by the 10.1 nvcc). The PTX JIT compilation error message was cryptic and unhelpful.

    However, if the toolkit version that was used to compile Pytorch binaries is too recent for the system's cuda driver version, Pytorch will raise a much more helpful error, something like

    "AssertionError: 
    The NVIDIA driver on your system is too old (found version 10000)..."
    

    If we hard-enforce that the cuda toolkit version used to compile Apex == the cuda toolkit version used to compile Pytorch, we also ensure that if the toolkit version used to compile Apex is too new for the driver, the toolkit version used to compile Pytorch must also be too new for the driver, and therefore in such cases we will receive the helpful Pytorch error instead of the bizarre PTX JIT error.

    **A warning of the mismatch between torch.version.cuda and the toolkit (nvcc) had likely been issued by the setup.py while compiling apex, but this warning had likely been overlooked, so what ended up surfacing was the PTX JIT error, which was not at all a clear indication of what had gone wrong.
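The check described above can be sketched as follows. This is a hypothetical illustration, not the code in Apex's setup.py: it assumes the toolkit version is read from the "release X.Y" field that nvcc --version prints, and that the PyTorch side is the string in torch.version.cuda. Both values are passed in as plain strings so the sketch runs without PyTorch or nvcc installed.

```python
import re


def toolkit_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def check_cuda_match(torch_cuda_version, nvcc_output):
    """Raise a hard error if PyTorch and nvcc disagree on major.minor."""
    torch_release = tuple(int(p) for p in torch_cuda_version.split(".")[:2])
    nvcc_release = toolkit_release(nvcc_output)
    if torch_release != nvcc_release:
        raise RuntimeError(
            "CUDA toolkit used for extensions is %d.%d, but PyTorch was "
            "compiled with %d.%d" % (nvcc_release + torch_release)
        )


# Sample nvcc output line; in a real build script the text would come from
# running `nvcc --version` and the first argument from torch.version.cuda.
SAMPLE = "Cuda compilation tools, release 10.1, V10.1.105"

try:
    check_cuda_match("10.0", SAMPLE)
except RuntimeError as err:
    print("build aborted:", err)
```

Failing at build time like this surfaces the mismatch immediately, instead of leaving it to appear later as a PTX JIT compilation error at runtime.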

    opened by mcarilli 21
  • Install Failure on GCP Deep Learning VM

    I created a simple GCP Deep Learning VM: https://cloud.google.com/deep-learning-vm/

    I followed the install directions, and the install failed with errors:

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
    
    ...
    Command "/opt/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-j0qgf5ds/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, 
    __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-1yr2fag5/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-j0qgf5ds/
    Exception information:
    Traceback (most recent call last):
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 143, in main
        status = self.run(options, args)
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 366, in run
        use_user_site=options.use_user_site,
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 49, in install_given_reqs
        **kwargs
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 791, in install
        spinner=spinner,
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/utils/misc.py", line 705, in call_subprocess
        % (command_desc, proc.returncode, cwd))
    

    The Python-only option also failed:

    pip install -v --no-cache-dir .
    
    ...
    Command "/opt/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-eedemek6/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, 
    __file__, 'exec'))" install --record /tmp/pip-record-ehl5a4y7/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-eedemek6/
    Exception information:
    Traceback (most recent call last):
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 143, in main
        status = self.run(options, args)
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 366, in run
        use_user_site=options.use_user_site,
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 49, in install_given_reqs
        **kwargs
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 791, in install
        spinner=spinner,
      File "/opt/anaconda3/lib/python3.7/site-packages/pip/_internal/utils/misc.py", line 705, in call_subprocess
        % (command_desc, proc.returncode, cwd))
    

    It would seem like installation on a GCP Deep Learning VM would be one of the tested use cases here no?? If it doesn't work there of all places, where is it intended to work?

    gcp 
    opened by glenn-jocher 21
  • Segmentation fault

    I'm getting a segmentation fault when trying to train a model with amp.

    torch version '1.0.1.post2' cudnn version 7.4.2

    @mcarilli What could cause those issues?

    Defaults for this optimization level are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Processing user overrides (additional kwargs that are not None)...
    After processing overrides, optimization options are:
    enabled                : True
    opt_level              : O1
    cast_model_type        : None
    patch_torch_functions  : True
    keep_batchnorm_fp32    : None
    master_weights         : None
    loss_scale             : dynamic
    Segmentation fault (core dumped)
    
    :[System Logs]:
    :Mar 29 12:23:22 server kernel: python3.6[95957]: segfault at b ip 00007f6e08bc664c sp 00007ffe63a06200 error 6 in amp_C.cpython-36m-x86_64-linux-gnu.so[7f6e08bb3000+64000]
    :Mar 29 12:23:22 server abrt-hook-ccpp[96042]: Process 95957 (python3.6) of user 992434 killed by SIGSEGV - dumping core
    :Mar 29 12:26:09 server kernel: python3.6[99444]: segfault at b ip 00007f502c4b764c sp 00007ffd04c510a0 error 6 in amp_C.cpython-36m-x86_64-linux-gnu.so[7f502c4a4000+64000]
    :Mar 29 12:26:09 server abrt-hook-ccpp[99522]: Process 99444 (python3.6) of user 992434 killed by SIGSEGV - dumping core
    :Mar 29 12:44:58 server kernel: python3.6[106167]: segfault at b ip 00007fa96640c64c sp 00007ffce94225f0 error 6 in amp_C.cpython-36m-x86_64-linux-gnu.so[7fa9663f9000+64000]
    :Mar 29 12:44:58 server abrt-hook-ccpp[106244]: Process 106167 (python3.6) of user 992434 killed by SIGSEGV - dumping core
    
    extension build 
    opened by che85 21
  • SyncBatchNorm doesn't support 2 dimensions input?

    Hi, I'm facing an issue where the program crashes when the input to SyncBatchNorm is two-dimensional. Here's the code:

    import torch
    import apex
    
    model = apex.parallel.SyncBatchNorm(4).cuda()
    data = torch.rand((8,4)).cuda()
    output = model(data)
    

    When running the code, the following error is raised:

    Traceback (most recent call last):
      File "syncbn_test.by", line 7, in <module>
        output = model(data)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/apex/parallel/optimized_sync_batchnorm.py", line 81, in forward
        return SyncBatchnormFunction.apply(input, self.weight, self.bias, self.running_mean, self.running_var, self.eps, self.training or not self.track_running_stats, exponential_average_factor, self.process_group, self.channel_last)
      File "/usr/local/lib/python3.5/dist-packages/apex/parallel/optimized_sync_batchnorm_kernel.py", line 27, in forward
        mean, var_biased = syncbn.welford_mean_var(input)
    RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) (maybe_wrap_dim at /pytorch/aten/src/ATen/core/WrapDimMinimal.h:18)
    

    Everything runs fine when data is a 4-dimensional tensor.

    Here is my environment:

    Ubuntu 16.04
    Python 3.5.2
    Pytorch 1.0.1, installed with "pip install torch"
    apex is installed with command:
      pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
    cuda 10.0
    nvidia driver 410.72
    
    opened by flymark2010 19
  • PTX JIT compiler failed

    I installed as such:

    1. pytorch 1.1.

    2. torchvision 2.1 (both from conda -c python).

    3. built apex.

    And now I'm getting this error:

      File "/home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/apex/multi_tensor_apply/multi_tensor_apply.py", line 30, in __call__
        *args)
    RuntimeError: CUDA error: a PTX JIT compilation failed (multi_tensor_apply at csrc/multi_tensor_apply.cuh:101)
    frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f119a22c441 in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f119a22bd7a in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #2: void multi_tensor_apply<2, ScaleFunctor<c10::Half, float>, float>(int, int, at::Tensor const&, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > > const&, ScaleFunctor<c10::Half, float>, float) + 0x2c71 (0x7f112bd57a01 in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
    frame #3: multi_tensor_scale_cuda(int, at::Tensor, std::vector<std::vector<at::Tensor, std::allocator<at::Tensor> >, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor> > > >, float) + 0x3a8 (0x7f112bd4d9a8 in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
    frame #4: <unknown function> + 0x15bac (0x7f112bd4cbac in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
    frame #5: <unknown function> + 0x15c6e (0x7f112bd4cc6e in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
    frame #6: <unknown function> + 0x12277 (0x7f112bd49277 in /home/waf/miniconda3/envs/fisherman/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so)
    <omitting python frames>
    frame #45: __libc_start_main + 0xf1 (0x7f119ea3c2e1 in /lib/x86_64-linux-gnu/libc.so.6)
    
    
    opened by williamFalcon 16
  • Installation error

    I tried to install apex, but the installation failed. Any suggestions? Thank you.
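
The log that follows stops at a RuntimeError from setup.py: the local nvcc reports release 10.0, while the PyTorch binaries were compiled with Cuda 9.2.148. As a rough illustration of that kind of comparison, here is a minimal sketch that works only on the two version strings shown in the log. The function names are hypothetical and this is not apex's actual check.

```python
import re
from typing import Tuple


def parse_nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from text such as `nvcc --version` prints."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a dotted version such as '9.2.148' to (major, minor)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(nvcc_output: str, torch_cuda_version: str) -> bool:
    """True when the toolkit and the PyTorch build agree on major.minor."""
    return parse_nvcc_release(nvcc_output) == major_minor(torch_cuda_version)


if __name__ == "__main__":
    # Both strings are copied from the log in this issue.
    nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
    torch_cuda = "9.2.148"
    print(versions_match(nvcc_text, torch_cuda))
```

On a real machine the two inputs would usually come from running `nvcc --version` and from reading `torch.version.cuda`. If they disagree, the usual options are to install a PyTorch build that matches the local toolkit or to build against a CUDA install that matches PyTorch.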

    $ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    /lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/commands/install.py:243: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
      cmdoptions.check_install_build_global(options)
    Created temporary directory: /tmp/pip-ephem-wheel-cache-8ib28pad
    Created temporary directory: /tmp/pip-req-tracker-su9lkj72
    Created requirements tracker '/tmp/pip-req-tracker-su9lkj72'
    Created temporary directory: /tmp/pip-install-px208sk7
    Processing /lab/cedar/home/lin/gitProjects/apex
      Created temporary directory: /tmp/pip-req-build-zfi8u8xr
      Added file:///lab/cedar/home/lin/gitProjects/apex to build tracker '/tmp/pip-req-tracker-su9lkj72'
        Running setup.py (path:/tmp/pip-req-build-zfi8u8xr/setup.py) egg_info for package from file:///lab/cedar/home/lin/gitProjects/apex
        Running command python setup.py egg_info
        torch.__version__  =  1.2.0
        running egg_info
        creating pip-egg-info/apex.egg-info
        writing pip-egg-info/apex.egg-info/PKG-INFO
        writing dependency_links to pip-egg-info/apex.egg-info/dependency_links.txt
        writing top-level names to pip-egg-info/apex.egg-info/top_level.txt
        writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
        reading manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
        writing manifest file 'pip-egg-info/apex.egg-info/SOURCES.txt'
        /tmp/pip-req-build-zfi8u8xr/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
          warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
      Source in /tmp/pip-req-build-zfi8u8xr has version 0.1, which satisfies requirement apex==0.1 from file:///lab/cedar/home/lin/gitProjects/apex
      Removed apex==0.1 from file:///lab/cedar/home/lin/gitProjects/apex from build tracker '/tmp/pip-req-tracker-su9lkj72'
    Skipping bdist_wheel for apex, due to binaries being disabled for it.
    Installing collected packages: apex
      Created temporary directory: /tmp/pip-record-mazj60_q
        Running command /lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-mazj60_q/install-record.txt --single-version-externally-managed --compile
        torch.__version__  =  1.2.0
        /tmp/pip-req-build-zfi8u8xr/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
          warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
    
        Compiling cuda extensions with
        nvcc: NVIDIA (R) Cuda compiler driver
        Copyright (c) 2005-2018 NVIDIA Corporation
        Built on Sat_Aug_25_21:08:01_CDT_2018
        Cuda compilation tools, release 10.0, V10.0.130
        from /usr/local/cuda/bin
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-req-build-zfi8u8xr/setup.py", line 100, in <module>
            check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
          File "/tmp/pip-req-build-zfi8u8xr/setup.py", line 77, in check_cuda_torch_binary_vs_bare_metal
            "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
        RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 9.2.148.
        In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
      Running setup.py install for apex ... error
    Cleaning up...
      Removing source in /tmp/pip-req-build-zfi8u8xr
    Removed build tracker '/tmp/pip-req-tracker-su9lkj72'
    ERROR: Command errored out with exit status 1: /lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-mazj60_q/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
    Exception information:
    Traceback (most recent call last):
      File "/lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 188, in main
        status = self.run(options, args)
      File "/lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 407, in run
        use_user_site=options.use_user_site,
      File "/lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 58, in install_given_reqs
        **kwargs
      File "/lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 959, in install
        spinner=spinner,
      File "/lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/lib/python3.6/site-packages/pip/_internal/utils/misc.py", line 931, in call_subprocess
        raise InstallationError(exc_msg)
    pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /lab/cedar/home/lin/.conda/envs/monorepoCancerCommunity/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-zfi8u8xr/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-mazj60_q/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
    1 location(s) to search for versions of pip:
    * https://pypi.org/simple/pip/
    Getting page https://pypi.org/simple/pip/
    Found index url https://pypi.org/simple
    Starting new HTTPS connection (1): pypi.org:443
    https://pypi.org:443 "GET /simple/pip/ HTTP/1.1" 200 11972
    Analyzing links from page https://pypi.org/simple/pip/
      Found link https://files.pythonhosted.org/packages/3d/9d/1e313763bdfb6a48977b65829c6ce2a43eaae29ea2f907c8bbef024a7219/pip-0.2.tar.gz#sha256=88bb8d029e1bf4acd0e04d300104b7440086f94cc1ce1c5c3c31e3293aee1f81 (from https://pypi.org/simple/pip/), version: 0.2
      Found link https://files.pythonhosted.org/packages/18/ad/c0fe6cdfe1643a19ef027c7168572dac6283b80a384ddf21b75b921877da/pip-0.2.1.tar.gz#sha256=83522005c1266cc2de97e65072ff7554ac0f30ad369c3b02ff3a764b962048da (from https://pypi.org/simple/pip/), version: 0.2.1
      Found link https://files.pythonhosted.org/packages/17/05/f66144ef69b436d07f8eeeb28b7f77137f80de4bf60349ec6f0f9509e801/pip-0.3.tar.gz#sha256=183c72455cb7f8860ac1376f8c4f14d7f545aeab8ee7c22cd4caf79f35a2ed47 (from https://pypi.org/simple/pip/), version: 0.3
      Found link https://files.pythonhosted.org/packages/0a/bb/d087c9a1415f8726e683791c0b2943c53f2b76e69f527f2e2b2e9f9e7b5c/pip-0.3.1.tar.gz#sha256=34ce534f17065c78f980702928e988a6b6b2d8a9851aae5f1571a1feb9bb58d8 (from https://pypi.org/simple/pip/), version: 0.3.1
      Found link https://files.pythonhosted.org/packages/cf/c3/153571aaac6cf999f4bb09c019b1ff379b7b599ea833813a41c784eec995/pip-0.4.tar.gz#sha256=28fc67558874f71fddda7168f73595f1650523dce3bc5bf189713ecdfc1e456e (from https://pypi.org/simple/pip/), version: 0.4
      Found link https://files.pythonhosted.org/packages/8d/c7/f05c87812fa5d9562ecbc5f4f1fc1570444f53c81c834a7f662af406e3c1/pip-0.5.tar.gz#sha256=328d8412782f22568508a0d0c78a49c9920a82e44c8dfca49954fe525c152b2a (from https://pypi.org/simple/pip/), version: 0.5
      Found link https://files.pythonhosted.org/packages/9a/aa/f536b6d14fe03343367da2ff44eee28f340ae650cd017ca088b6be13084a/pip-0.5.1.tar.gz#sha256=e27650538c41fe1007a41abd4cfd0f905b822622cbe1f8e7e09d1215af207694 (from https://pypi.org/simple/pip/), version: 0.5.1
      Found link https://files.pythonhosted.org/packages/db/e6/fdf7be8a17b032c533d3f91e91e2c63dd81d3627cbe4113248a00c2d39d8/pip-0.6.tar.gz#sha256=4cf47db6815b2f435d1f44e1f35ff04823043f6161f7df9aec71a123b0c47f0d (from https://pypi.org/simple/pip/), version: 0.6
      Found link https://files.pythonhosted.org/packages/91/cd/105f4d3c75d0ae18e12623acc96f42168aaba408dd6e43c4505aa21f8e37/pip-0.6.1.tar.gz#sha256=efe47e84ffeb0ea4804f9858b8a94bebd07f5452f907ebed36d03aed06a9f9ec (from https://pypi.org/simple/pip/), version: 0.6.1
      Found link https://files.pythonhosted.org/packages/1c/c7/c0e1a9413c37828faf290f29a85a4d6034c145cc04bf1622ba8beb662ad8/pip-0.6.2.tar.gz#sha256=1c1a504d7e70d2c24246f95bd16e3d5fcec740fd144df69a407bf65a2ee67586 (from https://pypi.org/simple/pip/), version: 0.6.2
      Found link https://files.pythonhosted.org/packages/3f/af/c4b9d49fb0f286996b28dbc0955c3ad359794697eb98e0e69863908070b0/pip-0.6.3.tar.gz#sha256=1a6df71eb29b98cba11bde6d6a0d8c6dd8b0518e74ceb71fb31ea4fbb42fd313 (from https://pypi.org/simple/pip/), version: 0.6.3
      Found link https://files.pythonhosted.org/packages/ec/7a/6fe91ff0079ad0437830957c459d52f3923e516f5b453218f2a93d09a427/pip-0.7.tar.gz#sha256=ceaea0b9e494d893c8a191895301b79c1db33e41f14d3ad93e3d28a8b4e9bf27 (from https://pypi.org/simple/pip/), version: 0.7
      Found link https://files.pythonhosted.org/packages/a5/63/11303863c2f5e9d9a15d89fcf7513a4b60987007d418862e0fb65c09fff7/pip-0.7.1.tar.gz#sha256=f54f05aa17edd0036de433c44892c8fedb1fd2871c97829838feb995818d24c3 (from https://pypi.org/simple/pip/), version: 0.7.1
      Found link https://files.pythonhosted.org/packages/cd/a9/1debaa96bbc1005c1c8ad3b79fec58c198d35121546ea2e858ce0894268a/pip-0.7.2.tar.gz#sha256=98df2eb779358412bbbae75980171ae85deebc846d87e244d086520b1212da09 (from https://pypi.org/simple/pip/), version: 0.7.2
      Found link https://files.pythonhosted.org/packages/74/54/f785c327fb3d163560a879b36edae5c78ee07806be282c9d4807f6be7dd1/pip-0.8.tar.gz#sha256=9017e4484a212dd4e1a43dd9f039dd7fc8338d4eea1c339d5ae1c80726de5b0f (from https://pypi.org/simple/pip/), version: 0.8
      Found link https://files.pythonhosted.org/packages/5c/79/5e8381cc3078bae92166f2ba96de8355e8c181926505ba8882f7b099a500/pip-0.8.1.tar.gz#sha256=7176a87f35675f6468341212f3b959bb51d23ea66eb1c3692bf746c45c716fa2 (from https://pypi.org/simple/pip/), version: 0.8.1
      Found link https://files.pythonhosted.org/packages/17/3e/0a98ab032991518741e7e712a719633e6ae160f51b3d3e855194530fd308/pip-0.8.2.tar.gz#sha256=f80a3549c048bc3bbcb47844826e9c7c6fcd87e77b92bef0d9e66d1b397c4962 (from https://pypi.org/simple/pip/), version: 0.8.2
      Found link https://files.pythonhosted.org/packages/f7/9a/943fc6d879ed7220bac2e7e53096bfe78abec88d77f2f516400e0129679e/pip-0.8.3.tar.gz#sha256=1be2e18edd38aa75b5e4ef38a99ec33ba9247177cfcb4a6d2d2b3e73430e3001 (from https://pypi.org/simple/pip/), version: 0.8.3
      Found link https://files.pythonhosted.org/packages/24/33/6eb675fb6db7b71d69d6928b33dea61b8bf5cfe1e5649be70ec84ce2fc09/pip-1.0.tar.gz#sha256=34ba07e2d14ba86d5088ba896ac80bed845a9b276ab8acb279b8d99bc77fec8e (from https://pypi.org/simple/pip/), version: 1.0
      Found link https://files.pythonhosted.org/packages/10/d9/f584e6107ef98ad7eaaaa5d0f756bfee12561fa6a4712ffdb7209e0e1fd4/pip-1.0.1.tar.gz#sha256=37d2f18213d3845d2038dd3686bc71fc12bb41ad66c945a8b0dfec2879f3497b (from https://pypi.org/simple/pip/), version: 1.0.1
      Found link https://files.pythonhosted.org/packages/16/90/5e6f80364d8a656f60681dfb7330298edef292d43e1499bcb3a4c71ff0b9/pip-1.0.2.tar.gz#sha256=a6ed9b36aac2f121c01a2c9e0307a9e4d9438d100a407db701ac65479a3335d2 (from https://pypi.org/simple/pip/), version: 1.0.2
      Found link https://files.pythonhosted.org/packages/25/57/0d42cf5307d79913a082c5c4397d46f3793bc35e1138a694136d6e31be99/pip-1.1.tar.gz#sha256=993804bb947d18508acee02141281c77d27677f8c14eaa64d6287a1c53ef01c8 (from https://pypi.org/simple/pip/), version: 1.1
      Found link https://files.pythonhosted.org/packages/ba/c3/4e1f892f41aaa217fe0d1f827fa05928783349c69f3cc06fdd68e112678a/pip-1.2.tar.gz#sha256=2b168f1987403f1dc6996a1f22a6f6637b751b7ab6ff27e78380b8d6e70aa314 (from https://pypi.org/simple/pip/), version: 1.2
      Found link https://files.pythonhosted.org/packages/c3/a2/a63244da32afd9ce9a8ca1bd86e71610039adea8b8314046ebe5047527a6/pip-1.2.1.tar.gz#sha256=12a9302acfca62cdc7bc5d83386cac3e0581db61ac39acdb3a4e766a16b88eb1 (from https://pypi.org/simple/pip/), version: 1.2.1
      Found link https://files.pythonhosted.org/packages/00/45/69d4f2602b80550bfb26cfd2f62c2f05b3b5c7352705d3766cd1e5b27648/pip-1.3.tar.gz#sha256=d6a13c5be316cb21a0243047c7f163f47e88973ebccff8d32e63ca1bf4d9321c (from https://pypi.org/simple/pip/), version: 1.3
      Found link https://files.pythonhosted.org/packages/5b/ce/f5b98104f1c10d868936c25f7c597f492d4371aa9ad5fb61a94954ee7208/pip-1.3.1.tar.gz#sha256=145eaa5d1ea1b062663da1f3a97780d7edea4c63c68a37c463b1deedf7bb4957 (from https://pypi.org/simple/pip/), version: 1.3.1
      Found link https://files.pythonhosted.org/packages/5f/d0/3b3958f6a58783bae44158b2c4c7827ae89abaecdd4bed12cff402620b9a/pip-1.4.tar.gz#sha256=1fd43cbf07d95ddcecbb795c97a1674b3ddb711bb4a67661284a5aa765aa1b97 (from https://pypi.org/simple/pip/), version: 1.4
      Found link https://files.pythonhosted.org/packages/3f/f8/da390e0df72fb61d176b25a4b95262e3dcc14bda0ad25ac64d56db38b667/pip-1.4.1.tar.gz#sha256=4e7a06554711a624c35d0c646f63674b7f6bfc7f80221bf1eb1f631bd890d04e (from https://pypi.org/simple/pip/), version: 1.4.1
      Found link https://files.pythonhosted.org/packages/4f/7d/e53bc80667378125a9e07d4929a61b0bd7128a1129dbe6f07bb3228652a3/pip-1.5.tar.gz#sha256=25f81d1a0e55d3b1709818dd57fdfb954b028f229f09bd69cb0bc80a8e03e048 (from https://pypi.org/simple/pip/), version: 1.5
      Found link https://files.pythonhosted.org/packages/44/5d/1dca53b5de6d287e7eb99bd174bb022eb6cb0d6ca6e19ca6b16655dde8c2/pip-1.5.1-py2.py3-none-any.whl#sha256=00960db3b0b8724dd37fe37cfb9c72ecb8f59fab9db7d17c5c1e89a1adab49ce (from https://pypi.org/simple/pip/), version: 1.5.1
      Found link https://files.pythonhosted.org/packages/21/3f/d86a600c9b2f41a75caacf768a24130f343def97652de2345da15ef7911f/pip-1.5.1.tar.gz#sha256=e60e936fbc101d56668c6134c1f2b5b40fcbec8b4fc4ca7fc34842b6b4c5c130 (from https://pypi.org/simple/pip/), version: 1.5.1
      Found link https://files.pythonhosted.org/packages/3d/1f/227d77d5e9ed2df5162de4ba3616799a351eccb1ecd668ae824dd26153a1/pip-1.5.2-py2.py3-none-any.whl#sha256=6903909ccdcdbc3297b74118590e71344d6d262827acd1f5c0e2fcfce9807499 (from https://pypi.org/simple/pip/), version: 1.5.2
      Found link https://files.pythonhosted.org/packages/ed/94/391a003107f6ec997c314199d03bff1c105af758ee490e3255353574487b/pip-1.5.2.tar.gz#sha256=2a8a3e08e652d3a40edbb39264bf01f8ff3c32520a79113357cca1f30533f738 (from https://pypi.org/simple/pip/), version: 1.5.2
      Found link https://files.pythonhosted.org/packages/df/e9/bdb53d44fad1465b43edaf6bc7dd3027ed5af81405cc97603fdff0721ebb/pip-1.5.3-py2.py3-none-any.whl#sha256=f0037aed3ce6cf96b9e9117d42e967a74bea9ebe19088a2fdea5de93d5762fee (from https://pypi.org/simple/pip/), version: 1.5.3
      Found link https://files.pythonhosted.org/packages/55/de/671a48ad313c808623041fc475f7c8f7610401d9f573f06b40eeb84e74e3/pip-1.5.3.tar.gz#sha256=dc53b4d28b88556a37cd73052b6d1d08cc644c6724e37c4d38a2e3c03c5440b2 (from https://pypi.org/simple/pip/), version: 1.5.3
      Found link https://files.pythonhosted.org/packages/a9/9a/9aa19fe00de4c025562e5fb3796ff8520165a7dd1a5662c6ec9816e1ae99/pip-1.5.4-py2.py3-none-any.whl#sha256=fb7282556a42e84464f2e963a859ac4012d8134ba6218b70c1d82d145fcfa82f (from https://pypi.org/simple/pip/), version: 1.5.4
      Found link https://files.pythonhosted.org/packages/78/d8/6e58a7130d457edadb753a0ea5708e411c100c7e94e72ad4802feeef735c/pip-1.5.4.tar.gz#sha256=70208a250bb4afdbbdd74c3ac35d4ab9ba1eb6852d02567a6a87f2f5104e30b9 (from https://pypi.org/simple/pip/), version: 1.5.4
      Found link https://files.pythonhosted.org/packages/ce/c2/10d996b9c51b126a9f0bb9e14a9edcdd5c88888323c0685bb9b392b6c47c/pip-1.5.5-py2.py3-none-any.whl#sha256=fe7a5808190067b2598d85def9b83db46e5d64a00848ad843e107c36e1db4ae6 (from https://pypi.org/simple/pip/), version: 1.5.5
      Found link https://files.pythonhosted.org/packages/88/01/a442fde40bd9aaf837612536f16ab751fac628807fd718690795b8ade77d/pip-1.5.5.tar.gz#sha256=4b7f5124364ae9b5ba833dcd8813a84c1c06fba1d7c8543323c7af4b33188eca (from https://pypi.org/simple/pip/), version: 1.5.5
      Found link https://files.pythonhosted.org/packages/3f/08/7347ca4021e7fe0f1ab8f93cbc7d2a7a7350012300ad0e0227d55625e2b8/pip-1.5.6-py2.py3-none-any.whl#sha256=fbc1351ffedf09ca7560428758845a88d648b9730b63ce9e5df53a7c89f039a4 (from https://pypi.org/simple/pip/), version: 1.5.6
      Found link https://files.pythonhosted.org/packages/45/db/4fb9a456b4ec4d3b701456ef562b9d72d76b6358e0c1463d17db18c5b772/pip-1.5.6.tar.gz#sha256=b1a4ae66baf21b7eb05a5e4f37c50c2706fa28ea1f8780ce8efe14dcd9f1726c (from https://pypi.org/simple/pip/), version: 1.5.6
      Found link https://files.pythonhosted.org/packages/dc/7c/21191b5944b917b66e4e4e06d74f668d814b6e8a3ff7acd874479b6f6b3d/pip-6.0-py2.py3-none-any.whl#sha256=5ec6732505bd8be49fe1f8ad557b88253ffb085736396df4d6bea753fc2a8f2c (from https://pypi.org/simple/pip/), version: 6.0
      Found link https://files.pythonhosted.org/packages/38/fd/065c66a88398f240e344fdf496b9707f92d75f88eedc3d10ff847b28a657/pip-6.0.tar.gz#sha256=6103897f1bb68d3f933edd60f3e3830c4ea6b8abf7a4b500db148921b11f6c9b (from https://pypi.org/simple/pip/), version: 6.0
      Found link https://files.pythonhosted.org/packages/e9/7a/cdbc1a12ed52410d557e48d4646f4543e9e991ff32d2374dc6db849aa617/pip-6.0.1-py2.py3-none-any.whl#sha256=322aea7d1f7b9ee68ad87ac4704cad5df97f77e70668c0bd18f964c5daa78173 (from https://pypi.org/simple/pip/), version: 6.0.1
      Found link https://files.pythonhosted.org/packages/4d/c3/8675b90cd89b9b222062f4f6c7e9d48b0387f5b35cbf747a74403a883e56/pip-6.0.1.tar.gz#sha256=fa2f7c68da4a405d673aa38542f9df009d60026db4f532429ac9cbfbda1f959d (from https://pypi.org/simple/pip/), version: 6.0.1
      Found link https://files.pythonhosted.org/packages/71/3c/b5a521e5e99cfff091e282231591f21193fd80de079ec5fb8ed9c6614044/pip-6.0.2-py2.py3-none-any.whl#sha256=7d17b0f267f7c9cd17cd2924bbbe2b4a3d407322c0e09084ca3f1295c1fed50d (from https://pypi.org/simple/pip/), version: 6.0.2
      Found link https://files.pythonhosted.org/packages/4c/5a/f9e8e3de0153282c7cb54a9b991af225536ac914bac858ca664cf883bb3e/pip-6.0.2.tar.gz#sha256=6fa90667706a679e3dc75b27a51fddafa64401c45e96f8ae6c20978183290077 (from https://pypi.org/simple/pip/), version: 6.0.2
      Found link https://files.pythonhosted.org/packages/73/cb/3eebf42003791df29219a3dfa1874572aa16114b44c9b1b0ac66bf96e8c0/pip-6.0.3-py2.py3-none-any.whl#sha256=b72655b6ac6aef1c86dd07f51e8ace8d7aabd6a1c4ff88db87155276fa32a073 (from https://pypi.org/simple/pip/), version: 6.0.3
      Found link https://files.pythonhosted.org/packages/ce/63/8d99ae60d11ae1a65f5d4fc39a529a598bd3b8e067132210cb0c4d9e9f74/pip-6.0.3.tar.gz#sha256=b091a35f5fa0faffac0b27b97e1e1e93ffe63b463c2ea8dbde0c1fb987933614 (from https://pypi.org/simple/pip/), version: 6.0.3
      Found link https://files.pythonhosted.org/packages/c5/0e/c974206726542bc495fc7443dd97834a6d14c2f0cba183fcfcd01075225a/pip-6.0.4-py2.py3-none-any.whl#sha256=8dfd95de29a7a3bb1e7d368cc83d566938eb210b04d553ebfe5e3a422f4aec65 (from https://pypi.org/simple/pip/), version: 6.0.4
      Found link https://files.pythonhosted.org/packages/02/a1/c90f19910ee153d7a0efca7216758121118d7e93084276541383fe9ca82e/pip-6.0.4.tar.gz#sha256=1dbbff9c369e510c7468ab68ba52c003f68f83c99c2f8259acd51099e8799f1e (from https://pypi.org/simple/pip/), version: 6.0.4
      Found link https://files.pythonhosted.org/packages/e9/1b/c6a375a337fb576784cdea3700f6c3eaf1420f0a01458e6e034cc178a84a/pip-6.0.5-py2.py3-none-any.whl#sha256=b2c20e3a2a43b2bbb1d19ad98be27eccc7b0f0ece016da602ccaa757a862b0e2 (from https://pypi.org/simple/pip/), version: 6.0.5
      Found link https://files.pythonhosted.org/packages/19/f2/58628768f618c8c9fea878e0fb97730c0b8a838d3ab3f325768bf12dac94/pip-6.0.5.tar.gz#sha256=3bf42d28be9085ab2e9aecfd69a6da2d31563fe833304bf71a620a30c38ab8a2 (from https://pypi.org/simple/pip/), version: 6.0.5
      Found link https://files.pythonhosted.org/packages/64/fc/4a49ccb18f55a0ceeb76e8d554bd4563217117492997825d194ed0017cc1/pip-6.0.6-py2.py3-none-any.whl#sha256=fb04f8afe1ba57626783f0c8e2f3d46bbaebaa446fcf124f434e968a2fee595e (from https://pypi.org/simple/pip/), version: 6.0.6
      Found link https://files.pythonhosted.org/packages/f6/ce/d9e4e178b66c766c117f62ddf4fece019ef9d50127a8926d2f60300d615e/pip-6.0.6.tar.gz#sha256=3a14091299dcdb9bab9e9004ae67ac401f2b1b14a7c98de074ca74fdddf4bfa0 (from https://pypi.org/simple/pip/), version: 6.0.6
      Found link https://files.pythonhosted.org/packages/7a/8e/2bbd4fcf3ee06ee90ded5f39ec12f53165dfdb9ef25a981717ad38a16670/pip-6.0.7-py2.py3-none-any.whl#sha256=93a326304c7db749896bcef822bbbac1ab29dad5651c6d732e245975239890e6 (from https://pypi.org/simple/pip/), version: 6.0.7
      Found link https://files.pythonhosted.org/packages/52/85/b160ebdaa84378df6bb0176d4eed9f57edca662446174eead7a9e2e566d6/pip-6.0.7.tar.gz#sha256=35a5a43ac6b7af83ed47ea5731a365f43d350a3a7267e039e5f06b61d42ab3c2 (from https://pypi.org/simple/pip/), version: 6.0.7
      Found link https://files.pythonhosted.org/packages/63/65/55b71647adec1ad595bf0e5d76d028506dfc002df30c256f022ff7a660a5/pip-6.0.8-py2.py3-none-any.whl#sha256=3c22b0a8ff92727bd737a82f72700790591f177541df08c07bc1f90d6b72ac19 (from https://pypi.org/simple/pip/), version: 6.0.8
      Found link https://files.pythonhosted.org/packages/ef/8a/e3a980bc0a7f791d72c1302f65763ed300f2e14c907ac033e01b44c79e5e/pip-6.0.8.tar.gz#sha256=0d58487a1b7f5be2e5e965c11afbea1dc44ecec8069de03491a4d0d6c85f4551 (from https://pypi.org/simple/pip/), version: 6.0.8
      Found link https://files.pythonhosted.org/packages/24/fb/8a56a46243514681e569bbafd8146fa383476c4b7c725c8598c452366f31/pip-6.1.0-py2.py3-none-any.whl#sha256=435a018f6d29e34d4f901bf4e6860d8a5fa1816b68d62008c18ca062a306db31 (from https://pypi.org/simple/pip/), version: 6.1.0
      Found link https://files.pythonhosted.org/packages/6c/84/432eb60bbcb414b9cdfcb135d5f4925e253c74e7d6916ada79990d6cc1a0/pip-6.1.0.tar.gz#sha256=89f120e2ab3d25ab70c36eb28ad4f280fc9ba71736e74d3055f609c1f9173768 (from https://pypi.org/simple/pip/), version: 6.1.0
      Found link https://files.pythonhosted.org/packages/67/f0/ba0fb41dbdbfc4aa3e0c16b40269aca6b9e3d59cacdb646218aa2e9b1d2c/pip-6.1.1-py2.py3-none-any.whl#sha256=a67e54aa0f26b6d62ccec5cc6735eff205dd0fed075f56ac3d3111e91e4467fc (from https://pypi.org/simple/pip/), version: 6.1.1
      Found link https://files.pythonhosted.org/packages/bf/85/871c126b50b8ee0b9819e8a63b614aedd264577e73478caedcd447e8f28c/pip-6.1.1.tar.gz#sha256=89f3b626d225e08e7f20d85044afa40f612eb3284484169813dc2d0631f2a556 (from https://pypi.org/simple/pip/), version: 6.1.1
      Found link https://files.pythonhosted.org/packages/5a/9b/56d3c18d0784d5f2bbd446ea2dc7ffa7476c35e3dc223741d20cfee3b185/pip-7.0.0-py2.py3-none-any.whl#sha256=309c48399c7d68501a10ef206abd6e5c541fedbf84b95435d9063bd454b39df7 (from https://pypi.org/simple/pip/), version: 7.0.0
      Found link https://files.pythonhosted.org/packages/c6/16/6475b142927ca5d03e3b7968efa5b0edd103e4684ecfde181a25f6fa2505/pip-7.0.0.tar.gz#sha256=7b46bfc1b95494731de306a688e2a7bc056d7fa7ad27e026908fb2ae67fed23d (from https://pypi.org/simple/pip/), version: 7.0.0
      Found link https://files.pythonhosted.org/packages/5a/10/bb7a32c335bceba636aa673a4c977effa1e73a79f88856459486d8d670cf/pip-7.0.1-py2.py3-none-any.whl#sha256=d26b8573ba1ac1ec99a9bdbdffee2ff2b06c7790815211d0eb4dc1462a089705 (from https://pypi.org/simple/pip/), version: 7.0.1
      Found link https://files.pythonhosted.org/packages/4a/83/9ae4362a80739657e0c8bb628ea3fa0214a9aba7c8590dacc301ea293f73/pip-7.0.1.tar.gz#sha256=cfec177552fdd0b2d12b72651c8e874f955b4c62c1c2c9f2588cbdc1c0d0d416 (from https://pypi.org/simple/pip/), version: 7.0.1
      Found link https://files.pythonhosted.org/packages/64/7f/7107800ae0919a80afbf1ecba21b90890431c3ee79d700adac3c79cb6497/pip-7.0.2-py2.py3-none-any.whl#sha256=83c869c5ab7113866e2d69641ec470d47f0faae68ca4550a289a4d3db515ad65 (from https://pypi.org/simple/pip/), version: 7.0.2
      Found link https://files.pythonhosted.org/packages/75/b1/66532c273bca0133e42c3b4540a1609289f16e3046f1830f18c60794d661/pip-7.0.2.tar.gz#sha256=ba28fa60b573a9444e7b78ccb3b0f261d1f66f46d20403f9dce37b18a6aed405 (from https://pypi.org/simple/pip/), version: 7.0.2
      Found link https://files.pythonhosted.org/packages/96/76/33a598ae42dd0554207d83c7acc60e3b166dbde723cbf282f1f73b7a127c/pip-7.0.3-py2.py3-none-any.whl#sha256=7b1cb03e827d58d2d05e68ea96a9e27487ed4b0afcd951ac6e40847ce94f0738 (from https://pypi.org/simple/pip/), version: 7.0.3
      Found link https://files.pythonhosted.org/packages/35/59/5b23115758ba0f2fc465c459611865173ef006202ba83f662d1f58ed2fb8/pip-7.0.3.tar.gz#sha256=b4c598825a6f6dc2cac65968feb28e6be6c1f7f1408493c60a07eaa731a0affd (from https://pypi.org/simple/pip/), version: 7.0.3
      Found link https://files.pythonhosted.org/packages/f7/c0/9f8dac88326609b4b12b304e8382f64f7d5af7735a00d2fac36cf135fc30/pip-7.1.0-py2.py3-none-any.whl#sha256=80c29f899d3a00a448d65f8158544d22935baec7159af8da1a4fa1490ced481d (from https://pypi.org/simple/pip/), version: 7.1.0
      Found link https://files.pythonhosted.org/packages/7e/71/3c6ece07a9a885650aa6607b0ebfdf6fc9a3ef8691c44b5e724e4eee7bf2/pip-7.1.0.tar.gz#sha256=d5275ba3221182a5dd1b6bcfbfc5ec277fb399dd23226d6fa018048f7e0f10f2 (from https://pypi.org/simple/pip/), version: 7.1.0
      Found link https://files.pythonhosted.org/packages/1c/56/094d563c508917081bccff365e4f621ba33073c1c13aca9267a43cfcaf13/pip-7.1.1-py2.py3-none-any.whl#sha256=ce13000878d34c1178af76cb8cf269e232c00508c78ed46c165dd5b0881615f4 (from https://pypi.org/simple/pip/), version: 7.1.1
      Found link https://files.pythonhosted.org/packages/3b/bb/b3f2a95494fd3f01d3b3ae530e7c0e910dc25e88e30787b0a5e10cbc0640/pip-7.1.1.tar.gz#sha256=b22fe3c93a13fc7c04f145a42fd2ad50a9e3e1b8a7eed2e2b1c66e540a0951da (from https://pypi.org/simple/pip/), version: 7.1.1
      Found link https://files.pythonhosted.org/packages/b2/d0/cd115fe345dd6f07ec1c780020a7dfe74966fceeb171e0f20d1d4905b0b7/pip-7.1.2-py2.py3-none-any.whl#sha256=b9d3983b5cce04f842175e30169d2f869ef12c3546fd274083a65eada4e9708c (from https://pypi.org/simple/pip/), version: 7.1.2
      Found link https://files.pythonhosted.org/packages/d0/92/1e8406c15d9372084a5bf79d96da3a0acc4e7fcf0b80020a4820897d2a5c/pip-7.1.2.tar.gz#sha256=ca047986f0528cfa975a14fb9f7f106271d4e0c3fe1ddced6c1db2e7ae57a477 (from https://pypi.org/simple/pip/), version: 7.1.2
      Found link https://files.pythonhosted.org/packages/00/ae/bddef02881ee09c6a01a0d6541aa6c75a226a4e68b041be93142befa0cd6/pip-8.0.0-py2.py3-none-any.whl#sha256=262ed1823eb7fbe3f18a9bedb4800e59c4ab9a6682aff8c37b5ee83ea840910b (from https://pypi.org/simple/pip/), version: 8.0.0
      Found link https://files.pythonhosted.org/packages/e3/2d/03c014d11e66628abf2fda5ca00f779cbe7b5292c5cd13d42a95b94aa9b8/pip-8.0.0.tar.gz#sha256=90112b296152f270cb8dddcd19b7b87488d9e002e8cf622e14c4da9c2f6319b1 (from https://pypi.org/simple/pip/), version: 8.0.0
      Found link https://files.pythonhosted.org/packages/45/9c/6f9a24917c860873e2ce7bd95b8f79897524353df51d5d920cd6b6c1ec33/pip-8.0.1-py2.py3-none-any.whl#sha256=dedaac846bc74e38a3253671f51a056331ffca1da70e3f48d8128f2aa0635bba (from https://pypi.org/simple/pip/), version: 8.0.1
      Found link https://files.pythonhosted.org/packages/ea/66/a3d6187bd307159fedf8575c0d9ee2294d13b1cdd11673ca812e6a2dda8f/pip-8.0.1.tar.gz#sha256=477c50b3e538a7ac0fa611fb8b877b04b33fb70d325b12a81b9dbf3eb1158a4d (from https://pypi.org/simple/pip/), version: 8.0.1
      Found link https://files.pythonhosted.org/packages/e7/a0/bd35f5f978a5e925953ce02fa0f078a232f0f10fcbe543d8cfc043f74fda/pip-8.0.2-py2.py3-none-any.whl#sha256=249a6f3194be8c2e8cb4d4be3f6fd16a9f1e3336218caffa8e7419e3816f9988 (from https://pypi.org/simple/pip/), version: 8.0.2
      Found link https://files.pythonhosted.org/packages/ce/15/ee1f9a84365423e9ef03d0f9ed0eba2fb00ac1fffdd33e7b52aea914d0f8/pip-8.0.2.tar.gz#sha256=46f4bd0d8dfd51125a554568d646fe4200a3c2c6c36b9f2d06d2212148439521 (from https://pypi.org/simple/pip/), version: 8.0.2
      Found link https://files.pythonhosted.org/packages/ae/d4/2b127310f5364610b74c28e2e6a40bc19e2d3c9a9a4e012d3e333e767c99/pip-8.0.3-py2.py3-none-any.whl#sha256=b0335bc837f9edb5aad03bd43d0973b084a1cbe616f8188dc23ba13234dbd552 (from https://pypi.org/simple/pip/), version: 8.0.3
      Found link https://files.pythonhosted.org/packages/22/f3/14bc87a4f6b5ec70b682765978a6f3105bf05b6781fa97e04d30138bd264/pip-8.0.3.tar.gz#sha256=30f98b66f3fe1069c529a491597d34a1c224a68640c82caf2ade5f88aa1405e8 (from https://pypi.org/simple/pip/), version: 8.0.3
      Found link https://files.pythonhosted.org/packages/1e/c7/78440b3fb882ed001e6e12d8770bd45e73d6eced4e57f7c072b829ce8a3d/pip-8.1.0-py2.py3-none-any.whl#sha256=a542b99e08002ead83200198e19a3983270357e1cb4fe704247990b5b35471dc (from https://pypi.org/simple/pip/), version: 8.1.0
      Found link https://files.pythonhosted.org/packages/3c/72/6981d5adf880adecb066a1a1a4c312a17f8d787a3b85446967964ac66d55/pip-8.1.0.tar.gz#sha256=d8faa75dd7d0737b16d50cd0a56dc91a631c79ecfd8d38b80f6ee929ec82043e (from https://pypi.org/simple/pip/), version: 8.1.0
      Found link https://files.pythonhosted.org/packages/31/6a/0f19a7edef6c8e5065f4346137cc2a08e22e141942d66af2e1e72d851462/pip-8.1.1-py2.py3-none-any.whl#sha256=44b9c342782ab905c042c207d995aa069edc02621ddbdc2b9f25954a0fdac25c (from https://pypi.org/simple/pip/), version: 8.1.1
      Found link https://files.pythonhosted.org/packages/41/27/9a8d24e1b55bd8c85e4d022da2922cb206f183e2d18fee4e320c9547e751/pip-8.1.1.tar.gz#sha256=3e78d3066aaeb633d185a57afdccf700aa2e660436b4af618bcb6ff0fa511798 (from https://pypi.org/simple/pip/), version: 8.1.1
      Found link https://files.pythonhosted.org/packages/9c/32/004ce0852e0a127f07f358b715015763273799bd798956fa930814b60f39/pip-8.1.2-py2.py3-none-any.whl#sha256=6464dd9809fb34fc8df2bf49553bb11dac4c13d2ffa7a4f8038ad86a4ccb92a1 (from https://pypi.org/simple/pip/), version: 8.1.2
      Found link https://files.pythonhosted.org/packages/e7/a8/7556133689add8d1a54c0b14aeff0acb03c64707ce100ecd53934da1aa13/pip-8.1.2.tar.gz#sha256=4d24b03ffa67638a3fa931c09fd9e0273ffa904e95ebebe7d4b1a54c93d7b732 (from https://pypi.org/simple/pip/), version: 8.1.2
      Found link https://files.pythonhosted.org/packages/3f/ef/935d9296acc4f48d1791ee56a73781271dce9712b059b475d3f5fa78487b/pip-9.0.0-py2.py3-none-any.whl#sha256=c856ac18ca01e7127456f831926dc67cc7d3ab663f4c13b1ec156e36db4de574 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.0
      Found link https://files.pythonhosted.org/packages/5e/53/eaef47e5e2f75677c9de0737acc84b659b78a71c4086f424f55346a341b5/pip-9.0.0.tar.gz#sha256=f62fb70e7e000e46fce12aaeca752e5281a5446977fe5a75ab4189a43b3f8793 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.0
      Found link https://files.pythonhosted.org/packages/b6/ac/7015eb97dc749283ffdec1c3a88ddb8ae03b8fad0f0e611408f196358da3/pip-9.0.1-py2.py3-none-any.whl#sha256=690b762c0a8460c303c089d5d0be034fb15a5ea2b75bdf565f40421f542fefb0 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.1
      Found link https://files.pythonhosted.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#sha256=09f243e1a7b461f654c26a725fa373211bb7ff17a9300058b205c61658ca940d (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.1
      Found link https://files.pythonhosted.org/packages/e7/f9/e801dcea22886cd513f6bd2e8f7e581bd6f67bb8e8f1cd8e7b92d8539280/pip-9.0.2-py2.py3-none-any.whl#sha256=b135491ddb061f39719b8472d8abb59c613816a2b86069c332db74d1cd208ab2 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.2
      Found link https://files.pythonhosted.org/packages/e5/8f/3fc66461992dc9e9fcf5e005687d5f676729172dda640df2fd8b597a6da7/pip-9.0.2.tar.gz#sha256=88110a224e9d30e5d76592a0b2130ef10e7e67a6426e8617bb918fffbfe91fe5 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.2
      Found link https://files.pythonhosted.org/packages/ac/95/a05b56bb975efa78d3557efa36acaf9cf5d2fd0ee0062060493687432e03/pip-9.0.3-py2.py3-none-any.whl#sha256=c3ede34530e0e0b2381e7363aded78e0c33291654937e7373032fda04e8803e5 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.3
      Found link https://files.pythonhosted.org/packages/c4/44/e6b8056b6c8f2bfd1445cc9990f478930d8e3459e9dbf5b8e2d2922d64d3/pip-9.0.3.tar.gz#sha256=7bf48f9a693be1d58f49f7af7e0ae9fe29fd671cde8a55e6edca3581c4ef5796 (from https://pypi.org/simple/pip/) (requires-python:>=2.6,!=3.0.*,!=3.1.*,!=3.2.*), version: 9.0.3
      Found link https://files.pythonhosted.org/packages/4b/5a/8544ae02a5bd28464e03af045e8aabde20a7b02db1911a9159328e1eb25a/pip-10.0.0b1-py2.py3-none-any.whl#sha256=dbd5d24cd461be23429625085a36cc8732cbcac4d2aaf673031f80f6ac07d844 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0b1
      Found link https://files.pythonhosted.org/packages/aa/6d/ffbb86abf18b750fb26f27eda7c7732df2aacaa669c420d2eb2ad6df3458/pip-10.0.0b1.tar.gz#sha256=8d6e63d8b99752e4b53f272b66f9cd7b59e2b288e9a863a61c48d167203a2656 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0b1
      Found link https://files.pythonhosted.org/packages/97/72/1d514201e7d7fc7fff5aac3de9c7b892cd72fb4bf23fd983630df96f7412/pip-10.0.0b2-py2.py3-none-any.whl#sha256=79f55588912f1b2b4f86f96f11e329bb01b25a484e2204f245128b927b1038a7 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0b2
      Found link https://files.pythonhosted.org/packages/32/67/572f642e6e42c580d3154964cfbab7d9322c23b0f417c6c01fdd206a2777/pip-10.0.0b2.tar.gz#sha256=ad6adec2150ce4aed8f6134d9b77d928fc848dbcb887fb1a455988cf99da5cae (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0b2
      Found link https://files.pythonhosted.org/packages/62/a1/0d452b6901b0157a0134fd27ba89bf95a857fbda64ba52e1ca2cf61d8412/pip-10.0.0-py2.py3-none-any.whl#sha256=86a60a96d85e329962a9e6f6af612cbc11106293dbc83f119802b5bee9874cf3 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0
      Found link https://files.pythonhosted.org/packages/e0/69/983a8e47d3dfb51e1463c1e962b2ccd1d74ec4e236e232625e353d830ed2/pip-10.0.0.tar.gz#sha256=f05a3eeea64bce94e85cc6671d679473d66288a4d37c3fcf983584954096b34f (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.0
      Found link https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl#sha256=717cdffb2833be8409433a93746744b59505f42146e8d37de6c62b430e25d6d7 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.1
      Found link https://files.pythonhosted.org/packages/ae/e8/2340d46ecadb1692a1e455f13f75e596d4eab3d11a57446f08259dee8f02/pip-10.0.1.tar.gz#sha256=f2bd08e0cd1b06e10218feaf6fef299f473ba706582eb3bd9d52203fdbd7ee68 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*), version: 10.0.1
      Found link https://files.pythonhosted.org/packages/5f/25/e52d3f31441505a5f3af41213346e5b6c221c9e086a166f3703d2ddaf940/pip-18.0-py2.py3-none-any.whl#sha256=070e4bf493c7c2c9f6a08dd797dd3c066d64074c38e9e8a0fb4e6541f266d96c (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 18.0
      Found link https://files.pythonhosted.org/packages/69/81/52b68d0a4de760a2f1979b0931ba7889202f302072cc7a0d614211bc7579/pip-18.0.tar.gz#sha256=a0e11645ee37c90b40c46d607070c4fd583e2cd46231b1c06e389c5e814eed76 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 18.0
      Found link https://files.pythonhosted.org/packages/c2/d7/90f34cb0d83a6c5631cf71dfe64cc1054598c843a92b400e55675cc2ac37/pip-18.1-py2.py3-none-any.whl#sha256=7909d0a0932e88ea53a7014dfd14522ffef91a464daaaf5c573343852ef98550 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 18.1
      Found link https://files.pythonhosted.org/packages/45/ae/8a0ad77defb7cc903f09e551d88b443304a9bd6e6f124e75c0fbbf6de8f7/pip-18.1.tar.gz#sha256=c0a292bd977ef590379a3f05d7b7f65135487b67470f6281289a94e015650ea1 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 18.1
      Found link https://files.pythonhosted.org/packages/60/64/73b729587b6b0d13e690a7c3acd2231ee561e8dd28a58ae1b0409a5a2b20/pip-19.0-py2.py3-none-any.whl#sha256=249ab0de4c1cef3dba4cf3f8cca722a07fc447b1692acd9f84e19c646db04c9a (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0
      Found link https://files.pythonhosted.org/packages/11/31/c483614095176ddfa06ac99c2af4171375053b270842c7865ca0b4438dc1/pip-19.0.tar.gz#sha256=c82bf8bc00c5732f0dd49ac1dea79b6242a1bd42a5012e308ed4f04369b17e54 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0
      Found link https://files.pythonhosted.org/packages/46/dc/7fd5df840efb3e56c8b4f768793a237ec4ee59891959d6a215d63f727023/pip-19.0.1-py2.py3-none-any.whl#sha256=aae79c7afe895fb986ec751564f24d97df1331bb99cdfec6f70dada2f40c0044 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.1
      Found link https://files.pythonhosted.org/packages/c8/89/ad7f27938e59db1f0f55ce214087460f65048626e2226531ba6cb6da15f0/pip-19.0.1.tar.gz#sha256=e81ddd35e361b630e94abeda4a1eddd36d47a90e71eb00f38f46b57f787cd1a5 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.1
      Found link https://files.pythonhosted.org/packages/d7/41/34dd96bd33958e52cb4da2f1bf0818e396514fd4f4725a79199564cd0c20/pip-19.0.2-py2.py3-none-any.whl#sha256=6a59f1083a63851aeef60c7d68b119b46af11d9d803ddc1cf927b58edcd0b312 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.2
      Found link https://files.pythonhosted.org/packages/4c/4d/88bc9413da11702cbbace3ccc51350ae099bb351febae8acc85fec34f9af/pip-19.0.2.tar.gz#sha256=f851133f8b58283fa50d8c78675eb88d4ff4cde29b6c41205cd938b06338e0e5 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.2
      Found link https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl#sha256=bd812612bbd8ba84159d9ddc0266b7fbce712fc9bc98c82dee5750546ec8ec64 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.3
      Found link https://files.pythonhosted.org/packages/36/fa/51ca4d57392e2f69397cd6e5af23da2a8d37884a605f9e3f2d3bfdc48397/pip-19.0.3.tar.gz#sha256=6e6f197a1abfb45118dbb878b5c859a0edbdd33fd250100bc015b67fded4b9f2 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.0.3
      Found link https://files.pythonhosted.org/packages/f9/fb/863012b13912709c13cf5cfdbfb304fa6c727659d6290438e1a88df9d848/pip-19.1-py2.py3-none-any.whl#sha256=8f59b6cf84584d7962d79fd1be7a8ec0eb198aa52ea864896551736b3614eee9 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.1
      Found link https://files.pythonhosted.org/packages/51/5f/802a04274843f634469ef299fcd273de4438386deb7b8681dd059f0ee3b7/pip-19.1.tar.gz#sha256=d9137cb543d8a4d73140a3282f6d777b2e786bb6abb8add3ac5b6539c82cd624 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.1
      Found link https://files.pythonhosted.org/packages/5c/e0/be401c003291b56efc55aeba6a80ab790d3d4cece2778288d65323009420/pip-19.1.1-py2.py3-none-any.whl#sha256=993134f0475471b91452ca029d4390dc8f298ac63a712814f101cd1b6db46676 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.1.1
      Found link https://files.pythonhosted.org/packages/93/ab/f86b61bef7ab14909bd7ec3cd2178feb0a1c86d451bc9bccd5a1aedcde5f/pip-19.1.1.tar.gz#sha256=44d3d7d3d30a1eb65c7e5ff1173cdf8f7467850605ac7cc3707b6064bddd0958 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*), version: 19.1.1
      Found link https://files.pythonhosted.org/packages/3a/6f/35de4f49ae5c7fdb2b64097ab195020fb48faa8ad3a85386ece6953c11b1/pip-19.2-py2.py3-none-any.whl#sha256=468c67b0b1120cd0329dc72972cf0651310783a922e7609f3102bd5fb4acbf17 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2
      Found link https://files.pythonhosted.org/packages/41/13/b6e68eae78405af6e4e9a93319ae5bb371057786f1590b157341f7542d7d/pip-19.2.tar.gz#sha256=aa6fdd80d13caac75d92b5eced06778712859b1606ba92d62389c11be12b2dad (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2
      Found link https://files.pythonhosted.org/packages/62/ca/94d32a6516ed197a491d17d46595ce58a83cbb2fca280414e57cd86b84dc/pip-19.2.1-py2.py3-none-any.whl#sha256=80d7452630a67c1e7763b5f0a515690f2c1e9ad06dda48e0ae85b7fdf2f59d97 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.1
      Found link https://files.pythonhosted.org/packages/8b/8a/1b2aadd922db1afe6bc107b03de41d6d37a28a5923383e60695fba24ae81/pip-19.2.1.tar.gz#sha256=258d702483dd749400aec59c23d638a5b2249ae28a0f478b6cab12ad45681a80 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.1
      Found link https://files.pythonhosted.org/packages/8d/07/f7d7ced2f97ca3098c16565efbe6b15fafcba53e8d9bdb431e09140514b0/pip-19.2.2-py2.py3-none-any.whl#sha256=4b956bd8b7b481fc5fa222637ff6d0823a327e5118178f1ec47618a480e61997 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.2
      Found link https://files.pythonhosted.org/packages/aa/1a/62fb0b95b1572c76dbc3cc31124a8b6866cbe9139eb7659ac7349457cf7c/pip-19.2.2.tar.gz#sha256=e05103825871e210d50a44c7e448587b0ed99dd775d3ef586304c58f40224a53 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.2
      Found link https://files.pythonhosted.org/packages/30/db/9e38760b32e3e7f40cce46dd5fb107b8c73840df38f0046d8e6514e675a1/pip-19.2.3-py2.py3-none-any.whl#sha256=340a0ba40fdeb16413914c0fcd8e0b4ebb0bf39a900ec80e11c05d836c05103f (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.3
      Found link https://files.pythonhosted.org/packages/00/9e/4c83a0950d8bdec0b4ca72afd2f9cea92d08eb7c1a768363f2ea458d08b4/pip-19.2.3.tar.gz#sha256=e7a31f147974362e6c82d84b91c7f2bdf57e4d3163d3d454e6c3e71944d67135 (from https://pypi.org/simple/pip/) (requires-python:>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*), version: 19.2.3
    Given no hashes to check 127 links for project 'pip': discarding no candidates
    
    opened by Jack-Lin-DS-AI 15
  • Optimize peer memory halo exchange kernel for latency

    Optimize peer memory halo exchange kernel for latency

    This PR optimizes the peer memory halo exchange to minimize latency:

    • Each CUDA thread performs its communication independently, which allows us to avoid cooperative group syncs. With enough threads, we avoid memory fences entirely. The downside is that we can only achieve 50% bandwidth since we pack data and flags into the same communication packets.
    • Use an alternating double-buffer scheme, which allows push-only communication.
    • The kernel does multiple rounds of communication in the cases where the number of SMs or the peer buffers are too small. This is important since the optimal peer memory requirements have increased by 4x, or more if the tensors are not nicely NHWC. The only strict requirement is that the peer buffer sizes are aligned to 4096B.

    Running on two DGX-1 V100s:

    | Version   | Data type | Kernel time |
    |-----------|-----------|-------------|
    | Baseline  | FP32      | 19.9 us     |
    | Baseline  | FP16      | 17.2 us     |
    | Optimized | FP32      | 15.2 us     |
    | Optimized | FP16      | 11.4 us     |

    Details
    • I performed the halo exchange required for 3x3 convolution on a [1,336,336,64] NHWC tensor.
    • To isolate the kernel time from the communication time, I delayed GPU 0 so it would always launch the kernel well after GPU 1.
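
The 4096 B alignment requirement on peer buffer sizes can be met by rounding a requested size up to the next multiple of 4096. The following is a minimal sketch, not code from this PR: `align_up`, `halo_row_bytes`, and `PEER_BUFFER_ALIGNMENT` are hypothetical names, and the halo size assumes the [1,336,336,64] NHWC tensor is split along H so that a 3x3 convolution needs one neighbouring row. It ignores any extra space the kernel needs for flags.

```python
PEER_BUFFER_ALIGNMENT = 4096  # bytes; the strict requirement stated in the PR


def align_up(num_bytes: int, alignment: int = PEER_BUFFER_ALIGNMENT) -> int:
    """Round num_bytes up to the next multiple of alignment."""
    if num_bytes < 0 or alignment <= 0:
        raise ValueError("num_bytes must be >= 0 and alignment must be > 0")
    return (num_bytes + alignment - 1) // alignment * alignment


def halo_row_bytes(width: int, channels: int, bytes_per_element: int) -> int:
    """Bytes in one [width, channels] row of a batch-1 NHWC tensor."""
    return width * channels * bytes_per_element


# Assumption: one-row halo of the [1, 336, 336, 64] tensor from the benchmark.
fp16_halo = halo_row_bytes(336, 64, 2)  # 43008 bytes
fp32_halo = halo_row_bytes(336, 64, 4)  # 86016 bytes

print(align_up(fp16_halo))  # 45056 (rounded up to 11 * 4096)
print(align_up(fp32_halo))  # 86016 (already 21 * 4096)
```

Because the optimized kernel packs data and flags into the same packets, the real buffer requirement is larger than the raw halo payload computed here. The rounding step is the same either way.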

    Running the Mask R-CNN benchmark:

    | GPUs | Step time |
    |------|-----------|
    | 1    | 37.7 ms   |
    | 2    | 34.8 ms   |
    | 4    | 33.6 ms   |

    ~The unit tests for the halo exchanger and the bottleneck module all pass for me on DGX-1 V100s.~ Pinging @thorjohnsen.

    opened by timmoon10 1
  • Update megatron fused softmax follow megatron-lm

    Update megatron fused softmax follow megatron-lm

    We updated megatron fused softmax in the following aspects:

    1. We updated the limit of sequence length from 2048 to 4096 according to https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/fused_softmax.py#L171
    2. We also enabled mask=None support in scaled_masked_softmax according to https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/fused_softmax.py#L84
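
As a rough illustration of what `mask=None` support means for a scaled masked softmax, here is a pure-Python reference for a single row of attention scores. It is a sketch under stated assumptions, not the fused CUDA kernel: the function name `scaled_masked_softmax_row` is hypothetical, the fill value of -10000.0 for masked positions and the convention that `True` means "masked out" are assumptions modeled on Megatron-LM's unfused fallback, and the length guard only mirrors the 4096 limit mentioned in item 1.

```python
import math
from typing import List, Optional, Sequence

MAX_SEQ_LEN = 4096    # limit described in the PR (previously 2048)
MASK_FILL = -10000.0  # assumption: value written into masked positions


def scaled_masked_softmax_row(
    scores: Sequence[float],
    mask: Optional[Sequence[bool]] = None,
    scale: float = 1.0,
) -> List[float]:
    """Softmax over scale * scores; positions where mask is True are suppressed."""
    if not scores:
        raise ValueError("scores must not be empty")
    if len(scores) > MAX_SEQ_LEN:
        raise ValueError(f"sequence length {len(scores)} exceeds {MAX_SEQ_LEN}")
    if mask is not None and len(mask) != len(scores):
        raise ValueError("mask and scores must have the same length")

    scaled = [s * scale for s in scores]
    if mask is not None:
        # With mask=None this step is skipped and a plain scaled softmax results.
        scaled = [MASK_FILL if m else v for v, m in zip(scaled, mask)]

    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


unmasked = scaled_masked_softmax_row([1.0, 1.0], mask=None, scale=2.0)
masked = scaled_masked_softmax_row([1.0, 1.0], mask=[False, True], scale=2.0)
print(unmasked)  # [0.5, 0.5]
print(masked)    # [1.0, 0.0]
```
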
    opened by yaoyu-33 1
  • Get error when installing apex from the source

    Get error when installing apex from the source

    Hi,

    I am trying to install apex in my virtual env using the commands you put to install apex from the source. However, I got the following error attached in the picture. Could you please help me resolve this issue?

    Error I get: https://drive.google.com/file/d/1GMUqBTfNbNkVAOTR-LWhLdz-z0qOoKcp/view?usp=share_link

    Based on the description of the error, the instructions might be missing a command to install a requirements file such as "requirements.txt". I also noticed that the "setup.py" file doesn't contain any command related to installing "requirements.txt" or "requirements_dev.txt". Am I right?

    opened by MHVali 0
  • It just compiles without <torch/torch.h> and <torch/extension.h> on Windows with CUDA 11.6 and torch 1.12.1

    It just compiles without <torch/torch.h> and <torch/extension.h> on Windows with CUDA 11.6 and torch 1.12.1

    Since CUDA 11.5, nvcc does not support including <torch/extension.h> in .cu files; see https://pytorch.org/tutorials/advanced/cpp_extension.html. My modifications follow: I removed the <torch/extension.h> and <torch/torch.h> includes and changed the torch namespace back to at (ATen).

    diff --git a/csrc/fused_dense_cuda.cu b/csrc/fused_dense_cuda.cu
    index c12d264..25cad99 100644
    --- a/csrc/fused_dense_cuda.cu
    +++ b/csrc/fused_dense_cuda.cu
    @@ -4,7 +4,6 @@
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
    -#include <torch/torch.h>
    
     /* Includes, cuda */
     #include <cublas_v2.h>
    @@ -178,7 +177,7 @@ int gemm_bias_lt(
           goto CLEANUP;
         }
           epilogue = CUBLASLT_EPILOGUE_BIAS;
    -  }
    +  }
    
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE, &epilogue, sizeof(epilogue));
       if (status != CUBLAS_STATUS_SUCCESS) {
    @@ -423,7 +422,7 @@ int gemm_bias_gelu_lt(
       if (status != CUBLAS_STATUS_SUCCESS) goto CLEANUP;
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_TRANSB, &transb, sizeof(transa));
       if (status != CUBLAS_STATUS_SUCCESS) goto CLEANUP;
    -
    +
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER, &gelu_in, sizeof(gelu_in));
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD, &ldc, sizeof(ldc));
    
    @@ -433,7 +432,7 @@ int gemm_bias_gelu_lt(
           goto CLEANUP;
         }
           epilogue = CUBLASLT_EPILOGUE_GELU_AUX_BIAS;
    -  }
    +  }
    
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE, &epilogue, sizeof(epilogue));
       if (status != CUBLAS_STATUS_SUCCESS) {
    @@ -563,7 +562,7 @@ int gemm_bias_gelu_lt(
       if (status != CUBLAS_STATUS_SUCCESS) goto CLEANUP;
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_TRANSB, &transb, sizeof(transa));
       if (status != CUBLAS_STATUS_SUCCESS) goto CLEANUP;
    -
    +
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER, &gelu_in, sizeof(gelu_in));
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD, &ldc, sizeof(ldc));
    
    @@ -573,7 +572,7 @@ int gemm_bias_gelu_lt(
           goto CLEANUP;
         }
           epilogue = CUBLASLT_EPILOGUE_GELU_AUX_BIAS;
    -  }
    +  }
    
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE, &epilogue, sizeof(epilogue));
       if (status != CUBLAS_STATUS_SUCCESS) {
    @@ -685,7 +684,7 @@ int gemm_bgradb_lt(
           goto CLEANUP;
         }
           epilogue = CUBLASLT_EPILOGUE_BGRADB;
    -  }
    +  }
    
       status = cublasLtMatmulDescSetAttribute(&operationDesc, CUBLASLT_MATMUL_DESC_EPILOGUE, &epilogue, sizeof(epilogue));
       if (status != CUBLAS_STATUS_SUCCESS) {
    @@ -1191,7 +1190,7 @@ int linear_bias_forward_cuda(at::Tensor input, T *weight, at::Tensor bias, int i
         return status;
     }
    
    -
    +
     template <typename T>
     int linear_bias_backward_cuda(T *input, T *weight, T *d_output, int in_features, int batch_size, int out_features, T *d_weight, T *d_bias, T *d_input,  void *lt_workspace) {
         cublasHandle_t handle = at::cuda::getCurrentCUDABlasHandle();
    @@ -1223,10 +1222,10 @@ int linear_bias_backward_cuda(T *input, T *weight, T *d_output, int in_features,
         true,
         static_cast<const void*>(d_bias));
     #endif
    -
    +
    
         if (status != 0){
    -
    +
             status = gemm_bias(
               handle,
               CUBLAS_OP_N,
    @@ -1243,7 +1242,7 @@ int linear_bias_backward_cuda(T *input, T *weight, T *d_output, int in_features,
               d_weight,
               in_features);
         }
    -
    +
         status = gemm_bias(
           handle,
           CUBLAS_OP_N,
    @@ -1315,7 +1314,7 @@ int linear_gelu_linear_forward_cuda(T *input, T *weight1, T *bias1, T *weight2,
         true,
         static_cast<const void*>(bias2));
         return status;
    -#else
    +#else
         return 1;
     #endif
     }
    diff --git a/csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu b/csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu
    index 60d1e8d..0e23610 100644
    --- a/csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu
    +++ b/csrc/megatron/fused_weight_gradient_dense_16bit_prec_cuda.cu
    @@ -3,7 +3,6 @@
     #include <cstdlib>
     #include <cstring>
    
    -#include <torch/extension.h>
     #include <ATen/ATen.h>
     #include <ATen/cuda/CUDAContext.h>
    
    @@ -113,7 +112,7 @@ void wgrad_gemm_accum_fp16_cuda(T *input, T *d_output, T *d_weight, int in_dim,
             &beta,
             d_weight,
             in_dim);
    -}
    +}
    
     template void wgrad_gemm_accum_fp16_cuda<at::Half>(at::Half *input, at::Half *d_output, at::Half *d_weight, int in_dim, int hidden_dim, int out_dim);
     template void wgrad_gemm_accum_fp16_cuda<at::BFloat16>(at::BFloat16 *input, at::BFloat16 *d_output, at::BFloat16 *d_weight, int in_dim, int hidden_dim, int out_dim);
    diff --git a/csrc/megatron/fused_weight_gradient_dense_cuda.cu b/csrc/megatron/fused_weight_gradient_dense_cuda.cu
    index dfaa134..2e02ea0 100644
    --- a/csrc/megatron/fused_weight_gradient_dense_cuda.cu
    +++ b/csrc/megatron/fused_weight_gradient_dense_cuda.cu
    @@ -3,7 +3,6 @@
     #include <cstdlib>
     #include <cstring>
    
    -#include <torch/extension.h>
     #include <ATen/ATen.h>
     #include <ATen/cuda/CUDAContext.h>
    
    diff --git a/csrc/megatron/generic_scaled_masked_softmax_cuda.cu b/csrc/megatron/generic_scaled_masked_softmax_cuda.cu
    index 93cd94b..ffeea3a 100644
    --- a/csrc/megatron/generic_scaled_masked_softmax_cuda.cu
    +++ b/csrc/megatron/generic_scaled_masked_softmax_cuda.cu
    @@ -20,7 +20,6 @@
     #include <cuda_fp16.h>
     #include <cuda_profiler_api.h>
     #include <ATen/cuda/CUDAContext.h>
    -#include <torch/extension.h>
     #include "generic_scaled_masked_softmax.h"
     #include "type_shim.h"
    
    @@ -28,9 +27,9 @@ namespace multihead_attn {
     namespace fused_softmax {
     namespace generic_scaled_masked_softmax {
    
    -torch::Tensor fwd_cuda(
    -    torch::Tensor const& input,
    -    torch::Tensor const& mask,
    +at::Tensor fwd_cuda(
    +    at::Tensor const& input,
    +    at::Tensor const& mask,
         float scale_factor)
     {
       // input is a 4d tensor with dimensions [batches, attn_heads, seq_len, seq_len]
    @@ -44,10 +43,10 @@ torch::Tensor fwd_cuda(
       TORCH_INTERNAL_ASSERT(mask.size(2) == query_seq_len);
       TORCH_INTERNAL_ASSERT(mask.size(3) == key_seq_len);
    
    -  // Output
    +  // Output
       auto act_options = input.options().requires_grad(false);
    -  torch::Tensor softmax_results =
    -      torch::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    +  at::Tensor softmax_results =
    +      at::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    
       // Softmax Intermediate Result Ptr
       void* input_ptr = static_cast<void*>(input.data_ptr());
    @@ -71,9 +70,9 @@ torch::Tensor fwd_cuda(
       return softmax_results;
     }
    
    -torch::Tensor bwd_cuda(
    -    torch::Tensor const& output_grads_,
    -    torch::Tensor const& softmax_results_,
    +at::Tensor bwd_cuda(
    +    at::Tensor const& output_grads_,
    +    at::Tensor const& softmax_results_,
         float scale_factor)  {
    
       auto output_grads = output_grads_.contiguous();
    @@ -86,8 +85,8 @@ torch::Tensor bwd_cuda(
       const int key_seq_len = output_grads.size(3);
    
       auto act_options = output_grads.options();
    -  torch::Tensor input_grad =
    -      torch::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    +  at::Tensor input_grad =
    +      at::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    
       void* output_grads_ptr = static_cast<void*>(output_grads.data_ptr());
    
    @@ -96,8 +95,8 @@ torch::Tensor bwd_cuda(
           output_grads_.scalar_type(),
           "dispatch_scaled_masked_softmax_backward",
           dispatch_scaled_masked_softmax_backward_new<scalar_t, scalar_t, float>(
    -          reinterpret_cast<scalar_t*>(static_cast<void*>(input_grad.data_ptr())),
    -         reinterpret_cast<scalar_t*>(output_grads_ptr),
    +          reinterpret_cast<scalar_t*>(static_cast<void*>(input_grad.data_ptr())),
    +         reinterpret_cast<scalar_t*>(output_grads_ptr),
              reinterpret_cast<scalar_t const*>(softmax_results.data_ptr()),
              scale_factor,
              query_seq_len,
    @@ -105,7 +104,7 @@ torch::Tensor bwd_cuda(
              batches,
              attn_heads);
                               );
    -
    +
       //backward pass is completely in-place
       return input_grad;
     }
    diff --git a/csrc/megatron/scaled_masked_softmax_cuda.cu b/csrc/megatron/scaled_masked_softmax_cuda.cu
    index 39931ea..0946173 100644
    --- a/csrc/megatron/scaled_masked_softmax_cuda.cu
    +++ b/csrc/megatron/scaled_masked_softmax_cuda.cu
    @@ -20,7 +20,6 @@
     #include <cuda_fp16.h>
     #include <cuda_profiler_api.h>
     #include <ATen/cuda/CUDAContext.h>
    -#include <torch/extension.h>
     #include "scaled_masked_softmax.h"
     #include "type_shim.h"
    
    @@ -33,9 +32,9 @@ int get_batch_per_block_cuda(int query_seq_len, int key_seq_len, int batches, in
     }
    
    
    -torch::Tensor fwd_cuda(
    -    torch::Tensor const& input,
    -    torch::Tensor const& mask,
    +at::Tensor fwd_cuda(
    +    at::Tensor const& input,
    +    at::Tensor const& mask,
         float scale_factor)
     {
       // input is a 4d tensor with dimensions [batches, attn_heads, seq_len, seq_len]
    @@ -51,10 +50,10 @@ torch::Tensor fwd_cuda(
       TORCH_INTERNAL_ASSERT(mask.size(2) == query_seq_len);
       TORCH_INTERNAL_ASSERT(mask.size(3) == key_seq_len);
    
    -  // Output
    +  // Output
       auto act_options = input.options().requires_grad(false);
    -  torch::Tensor softmax_results =
    -      torch::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    +  at::Tensor softmax_results =
    +      at::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    
       // Softmax Intermediate Result Ptr
       void* input_ptr = static_cast<void*>(input.data_ptr());
    @@ -79,11 +78,11 @@ torch::Tensor fwd_cuda(
       return softmax_results;
     }
    
    -torch::Tensor bwd_cuda(
    -    torch::Tensor const& output_grads_,
    -    torch::Tensor const& softmax_results_,
    +at::Tensor bwd_cuda(
    +    at::Tensor const& output_grads_,
    +    at::Tensor const& softmax_results_,
         float scale_factor)  {
    -
    +
       auto output_grads = output_grads_.contiguous();
       auto softmax_results = softmax_results_.contiguous();
    
    @@ -94,8 +93,8 @@ torch::Tensor bwd_cuda(
       const int key_seq_len = output_grads.size(3);
    
       auto act_options = output_grads.options().requires_grad(false);
    -  torch::Tensor input_grads =
    -      torch::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
    +  at::Tensor input_grads =
    +      at::empty({batches, attn_heads, query_seq_len, key_seq_len}, act_options);
       void* input_grads_ptr = static_cast<void*>(input_grads.data_ptr());
       void* output_grads_ptr = static_cast<void*>(output_grads.data_ptr());
    
    @@ -104,8 +103,8 @@ torch::Tensor bwd_cuda(
           output_grads_.scalar_type(),
           "dispatch_scaled_masked_softmax_backward",
           dispatch_scaled_masked_softmax_backward<scalar_t, scalar_t, float>(
    -          reinterpret_cast<scalar_t*>(input_grads_ptr),
    -          reinterpret_cast<scalar_t*>(output_grads_ptr),
    +          reinterpret_cast<scalar_t*>(input_grads_ptr),
    +          reinterpret_cast<scalar_t*>(output_grads_ptr),
               reinterpret_cast<scalar_t const*>(softmax_results.data_ptr()),
               scale_factor,
               query_seq_len,
    diff --git a/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu b/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu
    index a90a934..12ae769 100644
    --- a/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu
    +++ b/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu
    @@ -20,7 +20,6 @@
     #include <cuda_fp16.h>
     #include <cuda_profiler_api.h>
     #include <ATen/cuda/CUDAContext.h>
    -#include <torch/extension.h>
     #include "scaled_upper_triang_masked_softmax.h"
     #include "type_shim.h"
    
    @@ -28,8 +27,8 @@ namespace multihead_attn {
     namespace fused_softmax {
     namespace scaled_upper_triang_masked_softmax {
    
    -torch::Tensor fwd_cuda(
    -    torch::Tensor const& input,
    +at::Tensor fwd_cuda(
    +    at::Tensor const& input,
         float scale_factor)
     {
       // input is a 3d tensor with dimensions [attn_batches, seq_len, seq_len]
    @@ -37,10 +36,10 @@ torch::Tensor fwd_cuda(
       const int seq_len = input.size(1);
       TORCH_INTERNAL_ASSERT(seq_len <= 2048);
    
    -  // Output
    +  // Output
       auto act_options = input.options().requires_grad(false);
    -  torch::Tensor softmax_results =
    -      torch::empty({attn_batches, seq_len, seq_len}, act_options);
    +  at::Tensor softmax_results =
    +      at::empty({attn_batches, seq_len, seq_len}, act_options);
    
       // Softmax Intermediate Result Ptr
       void* input_ptr = static_cast<void*>(input.data_ptr());
    @@ -59,13 +58,13 @@ torch::Tensor fwd_cuda(
           );
       return softmax_results;
     }
    -
    
    -torch::Tensor bwd_cuda(
    -    torch::Tensor const& output_grads_,
    -    torch::Tensor const& softmax_results_,
    +
    +at::Tensor bwd_cuda(
    +    at::Tensor const& output_grads_,
    +    at::Tensor const& softmax_results_,
         float scale_factor)  {
    -
    +
       auto output_grads = output_grads_.contiguous();
       auto softmax_results = softmax_results_.contiguous();
    
    @@ -81,15 +80,15 @@ torch::Tensor bwd_cuda(
           output_grads_.scalar_type(),
           "dispatch_scaled_upper_triang_masked_softmax_backward",
           dispatch_scaled_upper_triang_masked_softmax_backward<scalar_t, scalar_t, float>(
    -          reinterpret_cast<scalar_t*>(output_grads_ptr),
    -         reinterpret_cast<scalar_t*>(output_grads_ptr),
    +          reinterpret_cast<scalar_t*>(output_grads_ptr),
    +         reinterpret_cast<scalar_t*>(output_grads_ptr),
              reinterpret_cast<scalar_t const*>(softmax_results.data_ptr()),
              scale_factor,
              seq_len,
              seq_len,
              attn_batches);
           );
    -
    +
       //backward pass is completely in-place
       return output_grads;
     }
    diff --git a/csrc/mlp_cuda.cu b/csrc/mlp_cuda.cu
    index f93f1df..96995e6 100644
    --- a/csrc/mlp_cuda.cu
    +++ b/csrc/mlp_cuda.cu
    @@ -4,7 +4,6 @@
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
    -#include <torch/torch.h>
    
     /* Includes, cuda */
     #include <cublas_v2.h>
    @@ -434,7 +433,7 @@ CLEANUP:
     // Bias ADD. Assume input X is [features x batch size], column major.
     // Bias is one 'features' long vector, with implicit broadcast.
     template <typename T>
    -__global__ void biasAdd_fprop(T *X, T *b, uint batch_size, uint features) {
    +__global__ void biasAdd_fprop(T *X, T *b, unsigned int batch_size, unsigned int features) {
       T r_x[ILP];
       T r_b[ILP];
       if(is_aligned(X) && is_aligned(b) && features % ILP ==0) {
    @@ -481,7 +480,7 @@ __global__ void biasAdd_fprop(T *X, T *b, uint batch_size, uint features) {
     // Bias ADD + ReLU. Assume input X is [features x batch size], column major.
     // Activation support fuesed ReLU. Safe to call in-place.
     template <typename T>
    -__global__ void biasAddRelu_fprop(T *X, T *b, uint batch_size, uint features) {
    +__global__ void biasAddRelu_fprop(T *X, T *b, unsigned int batch_size, unsigned int features) {
       T r_x[ILP];
       T r_b[ILP];
       if(is_aligned(X) && is_aligned(b) && features % ILP ==0) {
    @@ -528,7 +527,7 @@ __global__ void biasAddRelu_fprop(T *X, T *b, uint batch_size, uint features) {
     // ReLU. Assume input X is [features x batch size], column major.
     // Safe to call in-place.
     template <typename T>
    -__global__ void Relu_fprop(T *X, uint batch_size, uint features) {
    +__global__ void Relu_fprop(T *X, unsigned int batch_size, unsigned int features) {
       T r_x[ILP];
       if(is_aligned(X) && features % ILP ==0) {
         int tid = blockIdx.x * blockDim.x + threadIdx.x;
    @@ -568,7 +567,7 @@ __global__ void Relu_fprop(T *X, uint batch_size, uint features) {
     // Sigmoid. Assume input X is [features x batch size], column major.
     // Safe to call in-place.
     template <typename T>
    -__global__ void Sigmoid_fprop(T *X, uint batch_size, uint features) {
    +__global__ void Sigmoid_fprop(T *X, unsigned int batch_size, unsigned int features) {
       T r_x[ILP];
       if(is_aligned(X) && features % ILP ==0) {
         int tid = blockIdx.x * blockDim.x + threadIdx.x;
    @@ -608,7 +607,7 @@ __global__ void Sigmoid_fprop(T *X, uint batch_size, uint features) {
     // ReLU. Assume input X is [features x batch size], column major.
     // Safe to call in-place.
     template <typename T>
    -__global__ void Relu_bprop(T *dY, T *Y, uint batch_size, uint features, T *dX) {
    +__global__ void Relu_bprop(T *dY, T *Y, unsigned int batch_size, unsigned int features, T *dX) {
       T r_dy[ILP];
       T r_y[ILP];
       if(is_aligned(dY) &&
    @@ -656,7 +655,7 @@ __global__ void Relu_bprop(T *dY, T *Y, uint batch_size, uint features, T *dX) {
     // Sigmoid. Assume input X is [features x batch size], column major.
     // Safe to call in-place.
     template <typename T>
    -__global__ void Sigmoid_bprop(T *dY, T *Y, uint batch_size, uint features, T *dX) {
    +__global__ void Sigmoid_bprop(T *dY, T *Y, unsigned int batch_size, unsigned int features, T *dX) {
       T r_dy[ILP];
       T r_y[ILP];
       if(is_aligned(dY) &&
    @@ -1324,7 +1323,7 @@ int mlp_fp(
             return 1;
           }
    
    -      const uint &input_size = ofeat;
    +      const unsigned int &input_size = ofeat;
           int num_blocks = 0;
           int num_SMs = at::cuda::getCurrentDeviceProperties()->multiProcessorCount;
           // Call biasReLU
    diff --git a/csrc/multi_tensor_adagrad.cu b/csrc/multi_tensor_adagrad.cu
    index 699681b..2f0e6ed 100644
    --- a/csrc/multi_tensor_adagrad.cu
    +++ b/csrc/multi_tensor_adagrad.cu
    @@ -10,6 +10,7 @@
     #include "multi_tensor_apply.cuh"
     #include "type_shim.h"
    
    +#define _ENABLE_EXTENDED_ALIGNED_STORAGE
     #define BLOCK_SIZE 1024
     #define ILP 4
    
    diff --git a/csrc/multi_tensor_axpby_kernel.cu b/csrc/multi_tensor_axpby_kernel.cu
    index 021df27..43bd628 100644
    --- a/csrc/multi_tensor_axpby_kernel.cu
    +++ b/csrc/multi_tensor_axpby_kernel.cu
    @@ -72,11 +72,11 @@ struct AxpbyFunctor
             {
               r_out[ii] = a*static_cast<float>(r_x[ii]) + b*static_cast<float>(r_y[ii]);
               if(arg_to_check == -1)
    -            finite = finite && (isfinite(r_x[ii]) && isfinite(r_y[ii]));
    +            finite = finite && (isfinite(static_cast<float>(r_x[ii])) && isfinite(static_cast<float>(r_y[ii])));
               if(arg_to_check == 0)
    -            finite = finite && isfinite(r_x[ii]);
    +            finite = finite && isfinite(static_cast<float>(r_x[ii]));
               if(arg_to_check == 1)
    -            finite = finite && isfinite(r_y[ii]);
    +            finite = finite && isfinite(static_cast<float>(r_y[ii]));
             }
             // store
             load_store(out, r_out, i_start , 0);
    @@ -104,11 +104,11 @@ struct AxpbyFunctor
             {
               r_out[ii] = a*static_cast<float>(r_x[ii]) + b*static_cast<float>(r_y[ii]);
               if(arg_to_check == -1)
    -            finite = finite && (isfinite(r_x[ii]) && isfinite(r_y[ii]));
    +            finite = finite && (isfinite(static_cast<float>(r_x[ii])) && isfinite(static_cast<float>(r_y[ii])));
               if(arg_to_check == 0)
    -            finite = finite && isfinite(r_x[ii]);
    +            finite = finite && isfinite(static_cast<float>(r_x[ii]));
               if(arg_to_check == 1)
    -            finite = finite && isfinite(r_y[ii]);
    +            finite = finite && isfinite(static_cast<float>(r_y[ii]));
             }
             // see note in multi_tensor_scale_kernel.cu
     #pragma unroll
    diff --git a/csrc/multi_tensor_scale_kernel.cu b/csrc/multi_tensor_scale_kernel.cu
    index 629ee94..3bf4bf5 100644
    --- a/csrc/multi_tensor_scale_kernel.cu
    +++ b/csrc/multi_tensor_scale_kernel.cu
    @@ -66,7 +66,7 @@ struct ScaleFunctor
             for(int ii = 0; ii < ILP; ii++)
             {
               r_out[ii] = static_cast<float>(r_in[ii]) * scale;
    -          finite = finite && isfinite(r_in[ii]);
    +          finite = finite && isfinite(static_cast<float>(r_in[ii]));
             }
             // store
             load_store(out, r_out, i_start, 0);
    @@ -94,7 +94,7 @@ struct ScaleFunctor
             for(int ii = 0; ii < ILP; ii++)
             {
               r_out[ii] = static_cast<float>(r_in[ii]) * scale;
    -          finite = finite && isfinite(r_in[ii]);
    +          finite = finite && isfinite(static_cast<float>(r_in[ii]));
             }
     #pragma unroll
             for(int ii = 0; ii < ILP; ii++)
    diff --git a/setup.py b/setup.py
    index e3063be..32b6df4 100644
    --- a/setup.py
    +++ b/setup.py
    @@ -132,7 +132,7 @@ if (TORCH_MAJOR > 1) or (TORCH_MAJOR == 1 and TORCH_MINOR > 2):
     version_ge_1_5 = []
     if (TORCH_MAJOR > 1) or (TORCH_MAJOR == 1 and TORCH_MINOR > 4):
         version_ge_1_5 = ["-DVERSION_GE_1_5"]
    -version_dependent_macros = version_ge_1_1 + version_ge_1_3 + version_ge_1_5
    +version_dependent_macros = version_ge_1_1 + version_ge_1_3 + version_ge_1_5 + ["-D_DISABLE_EXTENDED_ALIGNED_STORAGE"]
    
     _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
    
    @@ -240,6 +240,7 @@ if "--cuda_ext" in sys.argv:
                     "cxx": ["-O3"] + version_dependent_macros,
                     "nvcc": append_nvcc_threads(["-O3"] + version_dependent_macros),
                 },
    +            extra_link_args=['cublas.lib', 'cublasLt.lib']
             )
         )
         ext_modules.append(
    @@ -250,6 +251,7 @@ if "--cuda_ext" in sys.argv:
                     "cxx": ["-O3"] + version_dependent_macros,
                     "nvcc": append_nvcc_threads(["-O3"] + version_dependent_macros),
                 },
    +            extra_link_args=['cublas.lib', 'cublasLt.lib']
             )
         )
    
    @@ -360,6 +362,7 @@ if "--cuda_ext" in sys.argv:
                             + cc_flag
                         ),
                     },
    +                extra_link_args=['cublas.lib']
                 )
             )
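
The `cublas.lib` and `cublasLt.lib` names added to `extra_link_args` above look like MSVC-style import libraries, so passing them unconditionally would likely break linking on non-Windows platforms. A minimal sketch of one way to guard them, assuming a hypothetical helper `cublas_link_args` that is not part of apex's `setup.py`:

```python
import sys


def cublas_link_args(platform=sys.platform, with_lt=True):
    """Hypothetical helper (not in apex): link args for the cuBLAS import libs.

    Returns MSVC-style .lib names on Windows and an empty list elsewhere,
    where the extension build is assumed to resolve cuBLAS on its own.
    """
    if platform != "win32":
        return []
    libs = ["cublas.lib"]
    if with_lt:
        libs.append("cublasLt.lib")
    return libs


# Illustrative calls with an explicit platform string.
print(cublas_link_args("win32"))
print(cublas_link_args("linux"))
```

In the extension definitions, the literal lists could then become `extra_link_args=cublas_link_args()` (or `cublas_link_args(with_lt=False)` where only `cublas.lib` is needed).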
    
    
    opened by wanjunling168 3
  • AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

    https://github.com/NVIDIA/apex/blob/6a40a0ad9ff3d6ebea715cf28faf39792312acbf/apex/transformer/utils.py#L10

    When I use FusedAdam with torch 1.8, neither all_gather_into_tensor nor _all_gather_base appears in dir(torch.distributed).
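
One way to tolerate this across PyTorch versions is to probe for whichever flat all-gather entry point exists before calling it. A minimal sketch, assuming a hypothetical helper `pick_flat_all_gather` (not part of apex or PyTorch); the `SimpleNamespace` objects are stand-ins for `torch.distributed` so the snippet runs without PyTorch installed:

```python
from types import SimpleNamespace


def pick_flat_all_gather(dist):
    """Hypothetical helper: return the first flat all-gather function found on `dist`.

    Prefers the public name, then the private one; returns None when neither
    attribute exists (the situation the issue reports for torch 1.8).
    """
    for name in ("all_gather_into_tensor", "_all_gather_base"):
        fn = getattr(dist, name, None)
        if fn is not None:
            return fn
    return None


# Stand-ins for torch.distributed (illustrative only, not real PyTorch modules).
older = SimpleNamespace(all_gather=lambda *args: "all_gather")
newer = SimpleNamespace(_all_gather_base=lambda *args: "_all_gather_base")

assert pick_flat_all_gather(older) is None
assert pick_flat_all_gather(newer)() == "_all_gather_base"
```

With PyTorch installed, the call would be `pick_flat_all_gather(torch.distributed)`. A `None` result means the caller has to fall back to `all_gather` with a list of tensors, or move to a PyTorch build that provides one of the two functions.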

    opened by AungV 3
Owner
NVIDIA Corporation
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.

HAWQ: Hessian AWare Quantization HAWQ is an advanced quantization library written for PyTorch. HAWQ enables low-precision and mixed-precision uniform

Zhen Dong 276 Nov 25, 2022
BitPack is a practical tool to efficiently save ultra-low precision/mixed-precision quantized models.

BitPack is a practical tool that can efficiently save quantized neural network models with mixed bitwidth.

Zhen Dong 34 Nov 16, 2022
This is the pytorch implementation for the paper: Generalizable Mixed-Precision Quantization via Attribution Rank Preservation, which is accepted to ICCV2021.

GMPQ: Generalizable Mixed-Precision Quantization via Attribution Rank Preservation This is the pytorch implementation for the paper: Generalizable Mix

null 18 Sep 2, 2022
EdMIPS: Rethinking Differentiable Search for Mixed-Precision Neural Networks

EdMIPS is an efficient algorithm to search the optimal mixed-precision neural network directly without proxy task on ImageNet given computation budgets. It can be applied to many popular network architectures, including ResNet, GoogLeNet, and Inception-V3.

Zhaowei Cai 47 Sep 20, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.2k Nov 20, 2022
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Easy Parallel Library Overview Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability

Alibaba 176 Nov 25, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

null 26 Oct 25, 2022
Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

VFedPCA+VFedAKPCA This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-

John 9 Sep 18, 2022
🏎ī¸ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum 🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to t

Hugging Face 794 Nov 28, 2022
The pure and clear PyTorch Distributed Training Framework.

The pure and clear PyTorch Distributed Training Framework. Introduction Requirements and Usage Dependency Dataset Basic Usage Slurm Cluster Usage Base

WILL LEE 204 Nov 22, 2022
Provide partial dates and retain the date precision through processing

Prefix date parser This is a helper class to parse dates with varied degrees of precision. For example, a data source might state a date as 2001, 2001

Friedrich Lindenberg 13 Apr 14, 2022
Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

NIRPS-ETC Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph February 2

Nolan Grieves 2 Sep 15, 2022
Code for the paper "A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

PyTorch implementation of UAGAN (U-net Attention Generative Adversarial Networks) This repository contains the source code for the paper "A High-precis

Tong 8 Apr 25, 2022
Distributed Arcface Training in Pytorch

Distributed Arcface Training in Pytorch

null 3 Nov 23, 2021
Source code for Acorn, the precision farming rover by Twisted Fields

Acorn precision farming rover This is the software repository for Acorn, the precision farming rover by Twisted Fields. For more information see twist

Twisted Fields 164 Nov 12, 2022
HandTailor: Towards High-Precision Monocular 3D Hand Recovery

HandTailor This repository is the implementation code and model of the paper "HandTailor: Towards High-Precision Monocular 3D Hand Recovery" (arXiv) G

Lv Jun 108 Oct 24, 2022
Official code release for: EditGAN: High-Precision Semantic Image Editing

Official code release for: EditGAN: High-Precision Semantic Image Editing

null 554 Nov 23, 2022
Bagua is a flexible and performant distributed training algorithm development framework.

Bagua is a flexible and performant distributed training algorithm development framework.

null 784 Nov 22, 2022
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 32 Oct 20, 2022