A PyTorch implementation of the paper "Improved Training of Wasserstein GANs"

Overview

WGAN-GP

A PyTorch implementation of the paper "Improved Training of Wasserstein GANs".
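
The core of WGAN-GP is a penalty on the norm of the critic's gradient at random interpolations between real and fake samples. Below is a minimal sketch of that term, written against a recent PyTorch API with a hypothetical helper name — not the exact calc_gradient_penalty used in the scripts here:

    import torch
    import torch.autograd as autograd

    LAMBDA = 10  # penalty coefficient used in the paper

    def gradient_penalty(netD, real_data, fake_data):
        # One interpolation coefficient per sample, broadcast over the remaining dims.
        batch_size = real_data.size(0)
        alpha = torch.rand(batch_size, *([1] * (real_data.dim() - 1)), device=real_data.device)
        interpolates = (alpha * real_data + (1 - alpha) * fake_data).requires_grad_(True)

        disc_interpolates = netD(interpolates)
        gradients = autograd.grad(outputs=disc_interpolates, inputs=interpolates,
                                  grad_outputs=torch.ones_like(disc_interpolates),
                                  create_graph=True, retain_graph=True)[0]

        # Per-sample gradient norm: flatten everything except the batch axis
        # (see the norm-axis discussion in the comments below).
        gradients = gradients.view(batch_size, -1)
        return ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA

Each training script adds a term like this to the critic loss (or calls backward() on it separately, as discussed in the comments below).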

Prerequisites

Python, NumPy, SciPy, Matplotlib

A recent NVIDIA GPU

A recent master version of PyTorch

Progress

  • gan_toy.py : Toy datasets (8 Gaussians, 25 Gaussians, Swiss Roll). (Finished in 2017.5.8)

  • gan_language.py : Character-level language model (Discriminator and generator both use nn.Conv1d. Finished in 2017.6.23 / 2017.6.27.)

  • gan_mnist.py : MNIST (Finished in 2017.6.26. Discriminator and generator both use nn.Conv1d.)

  • gan_64x64.py : 64x64 architectures (looking forward to your pull request!)

  • gan_cifar.py : CIFAR-10 (great thanks to robotcator)

Results

  • Toy Dataset

    Some sample results; see the results/toy/ folder for details.

    • 8gaussians, iteration 154500 (image: frame1612)

    • 25gaussians, iteration 48500 (image: frame485)

    • swissroll, iteration 69400 (image: frame694)

  • MNIST Dataset

    Some sample results; see the results/mnist/ folder for details.

    (images: mnist_samples_91899, mnist_samples_199999)

  • Billion Word Language Generation (Using CNN, character-level)

    Some sample results after 8699 epochs, detailed in sample:

    I haven't run more epochs because training is very time-consuming.

    He moved the mat all out clame t

    A fosts of shores forreuid he pe

    It whith Crouchy digcloued defor

    Pamreutol the rered in Car inson

    Nor op to the lecs ficomens o fe

    In is a " nored by of the ot can

    The onteon I dees this pirder ,

    It is Brobes aoracy of " medurn

    Rame he reaariod to thim atreast

    The stinl who herth of the not t

    The witl is f ont UAy Y nalence

    It a over , tose sho Leloch Cumm

  • Cifar10 Dataset

    Some sample results; see the results/cifar10/ folder for details.

    (image: CIFAR-10 samples)

Acknowledgements

Based on the implementations igul222/improved_wgan_training and martinarjovsky/WassersteinGAN.

Comments
  • RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66

    I get this error trying to run the mnist example. I have a Titan X GPU so I don't think I should run out of memory on mnist. I'm using PyTorch version 0.1.12_2 and Python 3.

    Generator (
      (block1): Sequential (
        (0): ConvTranspose2d(256, 128, kernel_size=(5, 5), stride=(1, 1))
        (1): ReLU (inplace)
      )
      (block2): Sequential (
        (0): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(1, 1))
        (1): ReLU (inplace)
      )
      (deconv_out): ConvTranspose2d(64, 1, kernel_size=(8, 8), stride=(2, 2))
      (preprocess): Sequential (
        (0): Linear (128 -> 4096)
        (1): ReLU (inplace)
      )
      (sigmoid): Sigmoid ()
    )
    Discriminator (
      (main): Sequential (
        (0): Linear (784 -> 4096)
        (1): ReLU (inplace)
        (2): Linear (4096 -> 4096)
        (3): ReLU (inplace)
        (4): Linear (4096 -> 4096)
        (5): ReLU (inplace)
        (6): Linear (4096 -> 4096)
        (7): ReLU (inplace)
        (8): Linear (4096 -> 4096)
        (9): ReLU (inplace)
        (10): Linear (4096 -> 1)
      )
    )
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-21-c32a204873f5> in <module>()
          4 print(net_D)
          5 if use_cuda:
    ----> 6     net_D = net_D.cuda()
          7     net_G = net_G.cuda()
          8 opt_D = optim.Adam(net_D.parameters(), lr=1e04, betas=(0.5, 0.9))
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
        145                 copied to that device
        146         """
    --> 147         return self._apply(lambda t: t.cuda(device_id))
        148 
        149     def cpu(self, device_id=None):
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        116     def _apply(self, fn):
        117         for module in self.children():
    --> 118             module._apply(fn)
        119 
        120         for param in self._parameters.values():
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        116     def _apply(self, fn):
        117         for module in self.children():
    --> 118             module._apply(fn)
        119 
        120         for param in self._parameters.values():
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        122                 # Variables stored in modules are graph leaves, and we don't
        123                 # want to create copy nodes, so we have to unpack the data.
    --> 124                 param.data = fn(param.data)
        125                 if param._grad is not None:
        126                     param._grad.data = fn(param._grad.data)
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
        145                 copied to that device
        146         """
    --> 147         return self._apply(lambda t: t.cuda(device_id))
        148 
        149     def cpu(self, device_id=None):
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
         63         else:
         64             new_type = getattr(torch.cuda, self.__class__.__name__)
    ---> 65             return new_type(self.size()).copy_(self, async)
         66 
         67 
    
    opened by clu5 10
  • Why "no required computing gradients"?

    I used the same calc_GradientPenalty method as yours and the latest master branch of PyTorch ('0.1.12+625850c'), but it gets stuck at penalty.backward() with the error

    "RuntimeError: there are no graph nodes that require computing gradients"

    I used requires_grad = True for the interpolates variable. Thanks!

    opened by HRLTY 7
  • gradients.norm(2, dim=1), dim=1?

    @caogang Thanks for your good code! But something confuses me in gan_cifar10.py:

    (screenshot of the gradient-penalty line in gan_cifar10.py)

    dim = 1? Why is the norm taken only along the second axis? I think it should be taken over all axes except the batch axis.

    opened by LynnHo 6
  • shouldn't it be D_real.backward(one)?

    ```
    D_real.backward(mone)
    # train with fake
    noise = torch.randn(BATCH_SIZE, 128)
    if use_cuda:
        noise = noise.cuda()
    noisev = autograd.Variable(noise, volatile=True)  # totally freeze netG
    fake = autograd.Variable(netG(noisev).data)
    inputv = fake
    D_fake = netD(inputv)
    D_fake = D_fake.mean()
    D_fake.backward(one)
    ```
    
    opened by ypxie 6
  • After adding self-implemented Layer-Normalization, the backward time of gradient_penalty became large

    My implementation of layer-normalization is:

    import torch
    import torch.nn as nn
    from torch.nn import Parameter

    class Layer_Norm(nn.Module):

        def __init__(self, dim):
            super(Layer_Norm, self).__init__()
            self.dim = dim
            self.g = Parameter(torch.zeros(1, dim))
            self.b = Parameter(torch.zeros(1, dim))
            self.init_weights()

        def forward(self, input):
            # per-sample mean and standard deviation over the feature dimension
            miu = torch.sum(input, 1).unsqueeze(1)/self.dim
            input_minus_miu = input - miu.expand_as(input)
            sigma = (torch.sum((input_minus_miu).pow(2), 1)/self.dim).sqrt().unsqueeze(1)
            # scale by the learned gain g and shift by the bias b
            input = input_minus_miu*self.g.expand(input.size())/sigma.expand_as(input) + self.b.expand(input.size())

            return input

        def init_weights(self):
            self.g.data.fill_(1)
            self.b.data.fill_(0)
    

    After plugging this in before ReLU, the backward pass of gradient_penalty slowed to 0.1149 s, compared to 0.0075 s before.

    I compiled the source code from the master branch (commit deb0aef30cdaa78f9840bfa4a919ad206e8e73a7) and also modified the ReLU source code before compiling, according to your instructions. I am wondering whether my implementation of layer normalization contains something not suitable for double backward?
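
    As a side note, recent PyTorch versions ship a built-in torch.nn.LayerNorm that supports double backward; a rough sketch of the same per-sample normalization over a (batch, dim) input (with a hypothetical dim = 4096) would be:

        import torch
        import torch.nn as nn

        dim = 4096                      # assumed feature width
        layer_norm = nn.LayerNorm(dim)  # learnable gain/bias, like g and b above

        x = torch.randn(8, dim)
        y = layer_norm(x)               # normalized over the last dimension
        print(y.mean(dim=1))            # ~0 per sample
        print(y.std(dim=1))             # ~1 per sample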

    opened by santisy 5
  • Issues with Python3 Version

    When I try to run with Python 3, I face a lot of issues. I fixed almost all of them except the error in plot.py at flush():

    Axis Error: axis -1 is out of bounds for array of dimension 0.

    Can you kindly help with this?

    def flush():
        prints = []

        for name, vals in _since_last_flush.items():
            prints.append("{}\t{}".format(name, np.mean(vals.values())))
            _since_beginning[name].update(vals)

            x_vals = np.sort(_since_beginning[name].keys())
            y_vals = [_since_beginning[name][x] for x in x_vals]

            plt.clf()
            plt.plot(x_vals, y_vals)
            plt.xlabel('iteration')
            plt.ylabel(name)
            plt.savefig(name.replace(' ', '_')+'.jpg')
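
    For what it's worth, this error is consistent with dict.keys()/dict.values() returning view objects under Python 3, which NumPy then wraps as 0-d arrays. A hedged, Python-3-compatible sketch of flush() (with stand-ins for plot.py's module-level dicts, whose exact definitions are assumed) simply wraps them in list():

        import collections
        import numpy as np
        import matplotlib.pyplot as plt

        # stand-ins for plot.py's module-level state (assumed names and types)
        _since_beginning = collections.defaultdict(dict)
        _since_last_flush = collections.defaultdict(dict)

        def flush():
            prints = []

            for name, vals in _since_last_flush.items():
                # list(...) so np.mean receives a real sequence under Python 3
                prints.append("{}\t{}".format(name, np.mean(list(vals.values()))))
                _since_beginning[name].update(vals)

                x_vals = np.sort(list(_since_beginning[name].keys()))
                y_vals = [_since_beginning[name][x] for x in x_vals]

                plt.clf()
                plt.plot(x_vals, y_vals)
                plt.xlabel('iteration')
                plt.ylabel(name)
                plt.savefig(name.replace(' ', '_') + '.jpg')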
    
    opened by jmandivarapu1 3
  • question of the code

    Hi, what does the 'grad_outputs' in line 131 of gan_cifar10.py stand for? Should the parameter be interpolates.size() instead of disc_interpolates.size()?
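
    For reference, torch.autograd.grad expects grad_outputs to match the shape of outputs (here disc_interpolates): it is the vector in the vector-Jacobian product, while the returned gradient matches the shape of inputs. A tiny illustration with hypothetical tensors (not the repo's code):

        import torch

        x = torch.randn(4, 3, requires_grad=True)
        y = (x * 2).sum(dim=1)                  # outputs: shape [4]
        grads = torch.autograd.grad(
            outputs=y, inputs=x,
            grad_outputs=torch.ones_like(y),    # must match y's shape, not x's
        )[0]
        print(grads.shape)                      # torch.Size([4, 3]) -- matches inputs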

    opened by tartarleft 3
  • Zero gradient

    I'm trying to implement the gradient-penalty approach using this code, but this code block:

        gradients = autograd.grad(outputs=disc_interpolates, inputs=interpolates,
                                  grad_outputs=torch.ones(disc_interpolates.size()).cuda(gpu) if use_cuda else torch.ones(
                                      disc_interpolates.size()),
                                  create_graph=True, retain_graph=True, only_inputs=True)[0]
    

    always returns gradients of 0.

    Would this be caused by my using ConvTranspose2d units?

    opened by elbamos 3
  • I install pytorch 0.1.12+ac1c674 and it errors like this.

    https://github.com/pytorch/pytorch/commit/ac1c674723d503d96ba6cd7533a615d1b551e606

    Traceback (most recent call last):
      File "gan_toy.py", line 270, in <module>
        gradient_penalty.backward()
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 151, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
        variables, grad_variables, retain_graph)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 90, in apply
        return self._forward_cls.backward(self, *args)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/linear.py", line 23, in backward
        grad_input = torch.mm(grad_output, weight)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 539, in mm
        return self._static_blas(Addmm, (output, 0, 1, self, matrix), False)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 532, in _static_blas
        return cls.apply(*(args[:1] + args[-2:] + (alpha, beta, inplace)))
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 24, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteT
    ensor), but expected one of:
     * (torch.cuda.ByteTensor source, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, int alpha, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, int alpha, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, int alpha, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteTensor
    )
     * (int beta, torch.cuda.ByteTensor source, int alpha, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteTensor
    )
    

    Any quick fix, or should I just wait for the milestone when stable double backprop has been implemented?

    opened by Naruto-Sasuke 3
  • why D_real.backward(one) and D_fake.backward(mone)?

    Thanks for your excellent work. I have two points of confusion about the operations mentioned in the title.

    1. What are one/mone doing here?

    2. In other GAN code I have read, the loss is computed first and then backward is called (loss_D_real.backward() or loss_D_fake.backward()). Why don't you use a loss here?
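
    For reference on point 1, tensor.backward(gradient) just seeds the backward pass with that value, so one/mone simply scale the accumulated gradient by +1/-1. A tiny sketch with a hypothetical scalar (not the repo's code):

        import torch

        w = torch.tensor(2.0, requires_grad=True)
        D = 3.0 * w                   # stand-in for D_real.mean()

        one = torch.tensor(1.0)
        mone = one * -1

        D.backward(mone)              # same as (-D).backward()
        print(w.grad)                 # tensor(-3.)

    So calling D_real.backward(mone), D_fake.backward(one) and gradient_penalty.backward() accumulates the same critic gradients as forming D_cost = D_fake - D_real + gradient_penalty and calling D_cost.backward() once, which is the loss-based style asked about in point 2.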

    opened by geekfeiw 2
  • Logic problem with calc_gradient_penalty in CNN case

    Right now, you're getting the norm of the gradient in gan_mnist with gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA.

    gradients is of shape [BATCH_SIZE, 1, 28, 28]. We want to calculate the norm of the gradient PER SAMPLE, and then use that as an error metric. That means we need to collapse gradients into one value per sample, i.e. to shape [BATCH_SIZE, 1] or just [BATCH_SIZE].

    But, gradients.norm(dim=1) collapses it to size [BATCH_SIZE, 28, 28], which isn't right.

    Instead, gradients needs to be reshaped to be flat before you take the norm.
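
    A small sketch of that reshape, with shapes assumed to match gan_mnist (not the repo's exact code):

        import torch

        LAMBDA = 10
        gradients = torch.randn(50, 1, 28, 28)        # stand-in for d(netD)/d(interpolates)

        flat = gradients.view(gradients.size(0), -1)  # [BATCH_SIZE, 784]: one row per sample
        gradient_penalty = ((flat.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA
        print(gradient_penalty)                       # a single scalar for the whole batch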

    I monitored the value of gradient_penalty, and with the current computation it explodes to around 10000 for gan_mnist even when the network's gradients are reasonable, so formal reasons aside, I'm pretty sure there's a bug.

    Great library by the way, it's made my life really easy. Thanks for posting it.

    Want me to make a PR?

    What do you think?

    opened by samlobel 2
  • how to decide the value of λ

    Since d_loss = fake - real + gp, how do we decide the value of LAMBDA to get a reasonable gradient penalty? I mean, if fake = -10 and real = 10, is gp = 10 appropriate, or should gp be smaller (e.g. gp = 5)?

    opened by silentsaber 1
  • How does optimizer work when there are 3 backwards(real, fake, penalty)?

    I thought I should write something like D_cost = D_fake - D_real + gradient_penalty; D_cost.backward(), but I don't know why you call backward the way you do.

    https://github.com/caogang/wgan-gp/blob/ae47a185ed2e938c39cf3eb2f06b32dc1b6a2064/gan_cifar10.py#L203-L226

    opened by wook3024 0
  • The axis of the norm of the gradient

    https://github.com/caogang/wgan-gp/blob/ae47a185ed2e938c39cf3eb2f06b32dc1b6a2064/gan_mnist.py#L148

    Hi, I came across your code when I googled WGAN-GP. I think there is something wrong with your implementation here. In WGAN-GP, the norm of the interpolated gradient should be calculated across all axes except the batch axis, since the gradient is taken w.r.t. each sample. But in your code, you only take the norm over the second dimension, which is not reasonable. I think you are missing the following reshape step:

    gradients.view(gradients.shape[0], -1)

    opened by CharlesNord 1
  • A question about Dcost

    Excuse me, can you tell me why we take the result from the discriminator and then multiply it by -1 as the cost? Especially on the MNIST dataset, where we use a sigmoid as the last layer, the cost is negative all the time. Thank you.

    opened by etoilestar 0
  • WGAN-GP loss keeps growing

    Hello, I've implemented your code on my own dataset. However, the d_loss decreases from 10 (which equals lambda) to a very large negative number (like -10000), the Wasserstein distance grows to the order of millions, and the gradient penalty goes from 10 to 0 and then up to the order of thousands. I've worked on this problem for several days but I still can't solve it. Can anyone help me with this? @caogang

    opened by haonanhe 4
Owner: Marvin Cao