A PyTorch implementation of the paper "Improved Training of Wasserstein GANs"

Overview

WGAN-GP

A PyTorch implementation of the paper "Improved Training of Wasserstein GANs".
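
The core of WGAN-GP is a penalty on the norm of the critic's gradient at random interpolations between real and fake samples. Below is a minimal sketch of that term, written against a recent PyTorch API with a hypothetical helper name — not the exact calc_gradient_penalty used in the scripts here:

    import torch
    import torch.autograd as autograd

    LAMBDA = 10  # penalty coefficient used in the paper

    def gradient_penalty(netD, real_data, fake_data):
        # One interpolation coefficient per sample, broadcast over the remaining dims.
        batch_size = real_data.size(0)
        alpha = torch.rand(batch_size, *([1] * (real_data.dim() - 1)), device=real_data.device)
        interpolates = (alpha * real_data + (1 - alpha) * fake_data).requires_grad_(True)

        disc_interpolates = netD(interpolates)
        gradients = autograd.grad(outputs=disc_interpolates, inputs=interpolates,
                                  grad_outputs=torch.ones_like(disc_interpolates),
                                  create_graph=True, retain_graph=True)[0]

        # Per-sample gradient norm: flatten everything except the batch axis
        # (see the norm-axis discussion in the comments below).
        gradients = gradients.view(batch_size, -1)
        return ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA

Each training script adds a term like this to the critic loss (or calls backward() on it separately, as discussed in the comments below).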

Prerequisites

Python, NumPy, SciPy, Matplotlib

A recent NVIDIA GPU

A recent master version of PyTorch

Progress

  • gan_toy.py : Toy datasets (8 Gaussians, 25 Gaussians, Swiss Roll). (Finished in 2017.5.8)

  • gan_language.py : Character-level language model (Discriminator and generator both use nn.Conv1d. Finished in 2017.6.23 / 2017.6.27.)

  • gan_mnist.py : MNIST (Finished in 2017.6.26. Discriminator and generator both use nn.Conv1d.)

  • gan_64x64.py : 64x64 architectures (looking forward to your pull request!)

  • gan_cifar.py : CIFAR-10 (great thanks to robotcator)

Results

  • Toy Dataset

    Some sample results; see the results/toy/ folder for details.

    • 8gaussians, iteration 154500 (image: frame1612)

    • 25gaussians, iteration 48500 (image: frame485)

    • swissroll, iteration 69400 (image: frame694)

  • MNIST Dataset

    Some sample results; see the results/mnist/ folder for details.

    (images: mnist_samples_91899, mnist_samples_199999)

  • Billion Word Language Generation (Using CNN, character-level)

    Some sample results after 8699 epochs, detailed in sample:

    I haven't run more epochs because training is very time-consuming.

    He moved the mat all out clame t

    A fosts of shores forreuid he pe

    It whith Crouchy digcloued defor

    Pamreutol the rered in Car inson

    Nor op to the lecs ficomens o fe

    In is a " nored by of the ot can

    The onteon I dees this pirder ,

    It is Brobes aoracy of " medurn

    Rame he reaariod to thim atreast

    The stinl who herth of the not t

    The witl is f ont UAy Y nalence

    It a over , tose sho Leloch Cumm

  • Cifar10 Dataset

    Some sample results; see the results/cifar10/ folder for details.

    (image: CIFAR-10 samples)

Acknowledgements

Based on the implementations igul222/improved_wgan_training and martinarjovsky/WassersteinGAN.

Comments
  • RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66

    I get this error trying to run the mnist example. I have a Titan X GPU so I don't think I should run out of memory on mnist. I'm using PyTorch version 0.1.12_2 and Python 3.

    Generator (
      (block1): Sequential (
        (0): ConvTranspose2d(256, 128, kernel_size=(5, 5), stride=(1, 1))
        (1): ReLU (inplace)
      )
      (block2): Sequential (
        (0): ConvTranspose2d(128, 64, kernel_size=(5, 5), stride=(1, 1))
        (1): ReLU (inplace)
      )
      (deconv_out): ConvTranspose2d(64, 1, kernel_size=(8, 8), stride=(2, 2))
      (preprocess): Sequential (
        (0): Linear (128 -> 4096)
        (1): ReLU (inplace)
      )
      (sigmoid): Sigmoid ()
    )
    Discriminator (
      (main): Sequential (
        (0): Linear (784 -> 4096)
        (1): ReLU (inplace)
        (2): Linear (4096 -> 4096)
        (3): ReLU (inplace)
        (4): Linear (4096 -> 4096)
        (5): ReLU (inplace)
        (6): Linear (4096 -> 4096)
        (7): ReLU (inplace)
        (8): Linear (4096 -> 4096)
        (9): ReLU (inplace)
        (10): Linear (4096 -> 1)
      )
    )
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-21-c32a204873f5> in <module>()
          4 print(net_D)
          5 if use_cuda:
    ----> 6     net_D = net_D.cuda()
          7     net_G = net_G.cuda()
          8 opt_D = optim.Adam(net_D.parameters(), lr=1e04, betas=(0.5, 0.9))
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in cuda(self, device_id)
        145                 copied to that device
        146         """
    --> 147         return self._apply(lambda t: t.cuda(device_id))
        148 
        149     def cpu(self, device_id=None):
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        116     def _apply(self, fn):
        117         for module in self.children():
    --> 118             module._apply(fn)
        119 
        120         for param in self._parameters.values():
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        116     def _apply(self, fn):
        117         for module in self.children():
    --> 118             module._apply(fn)
        119 
        120         for param in self._parameters.values():
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in _apply(self, fn)
        122                 # Variables stored in modules are graph leaves, and we don't
        123                 # want to create copy nodes, so we have to unpack the data.
    --> 124                 param.data = fn(param.data)
        125                 if param._grad is not None:
        126                     param._grad.data = fn(param._grad.data)
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in <lambda>(t)
        145                 copied to that device
        146         """
    --> 147         return self._apply(lambda t: t.cuda(device_id))
        148 
        149     def cpu(self, device_id=None):
    
    /home/clu/anaconda3/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
         63         else:
         64             new_type = getattr(torch.cuda, self.__class__.__name__)
    ---> 65             return new_type(self.size()).copy_(self, async)
         66 
         67 
    
    opened by clu5 10
  • Why "no required computing gradients"?

    I used the same calc_GradientPenalty method as yours and the latest master branch of PyTorch ('0.1.12+625850c'), but it gets stuck at penalty.backward() with the error

    "RuntimeError: there are no graph nodes that require computing gradients"

    I used requires_grad = True for the interpolates variable. Thanks!

    opened by HRLTY 7
  • gradients.norm(2, dim=1), dim=1?

    @caogang Thanks for your good code! But something confuses me in gan_cifar10.py:

    (screenshot of the gradient-penalty line in gan_cifar10.py)

    dim = 1? Why is the norm taken only along the second axis? I think it should be taken over all axes except the batch axis.

    opened by LynnHo 6
  • shouldn't it be D_real.backward(one)?

    ```
    D_real.backward(mone)
    # train with fake
    noise = torch.randn(BATCH_SIZE, 128)
    if use_cuda:
        noise = noise.cuda()
    noisev = autograd.Variable(noise, volatile=True)  # totally freeze netG
    fake = autograd.Variable(netG(noisev).data)
    inputv = fake
    D_fake = netD(inputv)
    D_fake = D_fake.mean()
    D_fake.backward(one)
    ```
    
    opened by ypxie 6
  • After adding self-implemented Layer-Normalization, the backward time of gradient_penalty became large

    My implementation of layer-normalization is:

    import torch
    import torch.nn as nn
    from torch.nn import Parameter

    class Layer_Norm(nn.Module):

        def __init__(self, dim):
            super(Layer_Norm, self).__init__()
            self.dim = dim
            self.g = Parameter(torch.zeros(1, dim))
            self.b = Parameter(torch.zeros(1, dim))
            self.init_weights()

        def forward(self, input):
            # per-sample mean and standard deviation over the feature dimension
            miu = torch.sum(input, 1).unsqueeze(1)/self.dim
            input_minus_miu = input - miu.expand_as(input)
            sigma = (torch.sum((input_minus_miu).pow(2), 1)/self.dim).sqrt().unsqueeze(1)
            # scale by the learned gain g and shift by the bias b
            input = input_minus_miu*self.g.expand(input.size())/sigma.expand_as(input) + self.b.expand(input.size())

            return input

        def init_weights(self):
            self.g.data.fill_(1)
            self.b.data.fill_(0)
    

    After plugging this in before ReLU, the backward pass of gradient_penalty slowed to 0.1149 s, compared to 0.0075 s before.

    I compiled the source code from the master branch (commit deb0aef30cdaa78f9840bfa4a919ad206e8e73a7) and also modified the ReLU source code before compiling, according to your instructions. I am wondering whether my implementation of layer normalization contains something not suitable for double backward?
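
    As a side note, recent PyTorch versions ship a built-in torch.nn.LayerNorm that supports double backward; a rough sketch of the same per-sample normalization over a (batch, dim) input (with a hypothetical dim = 4096) would be:

        import torch
        import torch.nn as nn

        dim = 4096                      # assumed feature width
        layer_norm = nn.LayerNorm(dim)  # learnable gain/bias, like g and b above

        x = torch.randn(8, dim)
        y = layer_norm(x)               # normalized over the last dimension
        print(y.mean(dim=1))            # ~0 per sample
        print(y.std(dim=1))             # ~1 per sample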

    opened by santisy 5
  • Issues with Python3 Version

    When I try to run with Python 3, I face a lot of issues. I fixed almost all of them except the error in plot.py at flush():

    Axis Error: axis -1 is out of bounds for array of dimension 0.

    Can you kindly help with this?

    def flush():
        prints = []

        for name, vals in _since_last_flush.items():
            prints.append("{}\t{}".format(name, np.mean(vals.values())))
            _since_beginning[name].update(vals)

            x_vals = np.sort(_since_beginning[name].keys())
            y_vals = [_since_beginning[name][x] for x in x_vals]

            plt.clf()
            plt.plot(x_vals, y_vals)
            plt.xlabel('iteration')
            plt.ylabel(name)
            plt.savefig(name.replace(' ', '_')+'.jpg')
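
    For what it's worth, this error is consistent with dict.keys()/dict.values() returning view objects under Python 3, which NumPy then wraps as 0-d arrays. A hedged, Python-3-compatible sketch of flush() (with stand-ins for plot.py's module-level dicts, whose exact definitions are assumed) simply wraps them in list():

        import collections
        import numpy as np
        import matplotlib.pyplot as plt

        # stand-ins for plot.py's module-level state (assumed names and types)
        _since_beginning = collections.defaultdict(dict)
        _since_last_flush = collections.defaultdict(dict)

        def flush():
            prints = []

            for name, vals in _since_last_flush.items():
                # list(...) so np.mean receives a real sequence under Python 3
                prints.append("{}\t{}".format(name, np.mean(list(vals.values()))))
                _since_beginning[name].update(vals)

                x_vals = np.sort(list(_since_beginning[name].keys()))
                y_vals = [_since_beginning[name][x] for x in x_vals]

                plt.clf()
                plt.plot(x_vals, y_vals)
                plt.xlabel('iteration')
                plt.ylabel(name)
                plt.savefig(name.replace(' ', '_') + '.jpg')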
    
    opened by jmandivarapu1 3
  • question of the code

    Hi, what does the 'grad_outputs' in line 131 of gan_cifar10.py stand for? Should the parameter be interpolates.size() instead of disc_interpolates.size()?
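
    For reference, torch.autograd.grad expects grad_outputs to match the shape of outputs (here disc_interpolates): it is the vector in the vector-Jacobian product, while the returned gradient matches the shape of inputs. A tiny illustration with hypothetical tensors (not the repo's code):

        import torch

        x = torch.randn(4, 3, requires_grad=True)
        y = (x * 2).sum(dim=1)                  # outputs: shape [4]
        grads = torch.autograd.grad(
            outputs=y, inputs=x,
            grad_outputs=torch.ones_like(y),    # must match y's shape, not x's
        )[0]
        print(grads.shape)                      # torch.Size([4, 3]) -- matches inputs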

    opened by tartarleft 3
  • Zero gradient

    I'm trying to implement the gradient-penalty approach using this code, but this code block:

        gradients = autograd.grad(outputs=disc_interpolates, inputs=interpolates,
                                  grad_outputs=torch.ones(disc_interpolates.size()).cuda(gpu) if use_cuda else torch.ones(
                                      disc_interpolates.size()),
                                  create_graph=True, retain_graph=True, only_inputs=True)[0]
    

    always returns gradients of 0.

    Would this be caused by my using ConvTranspose2d units?

    opened by elbamos 3
  • I install pytorch 0.1.12+ac1c674 and it errors like this.

    https://github.com/pytorch/pytorch/commit/ac1c674723d503d96ba6cd7533a615d1b551e606

    Traceback (most recent call last):
      File "gan_toy.py", line 270, in <module>
        gradient_penalty.backward()
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 151, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
        variables, grad_variables, retain_graph)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 90, in apply
        return self._forward_cls.backward(self, *args)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/linear.py", line 23, in backward
        grad_input = torch.mm(grad_output, weight)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 539, in mm
        return self._static_blas(Addmm, (output, 0, 1, self, matrix), False)
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 532, in _static_blas
        return cls.apply(*(args[:1] + args[-2:] + (alpha, beta, inplace)))
      File "/home/yan/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/blas.py", line 24, in forward
        matrix1, matrix2, out=output)
    TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteT
    ensor), but expected one of:
     * (torch.cuda.ByteTensor source, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, int alpha, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (torch.cuda.ByteTensor source, int alpha, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
     * (int beta, torch.cuda.ByteTensor source, int alpha, torch.cuda.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteTensor
    )
     * (int beta, torch.cuda.ByteTensor source, int alpha, torch.cuda.sparse.ByteTensor mat1, torch.cuda.ByteTensor mat2, *, torch.cuda.ByteTensor out)
          didn't match because some of the arguments have invalid types: (int, torch.cuda.ByteTensor, int, torch.cuda.ByteTensor, torch.cuda.FloatTensor, out=torch.cuda.ByteTensor
    )
    

    Any quick fix, or should I just wait for the milestone when stable double backprop has been implemented?

    opened by Naruto-Sasuke 3
  • why D_real.backward(one) and D_fake.backward(mone)?

    Thanks for your excellent work. I have two points of confusion about the operations mentioned in the title.

    1. What are one/mone doing here?

    2. In other GAN code I have read, the loss is computed first and then backward is called (loss_D_real.backward() or loss_D_fake.backward()). Why don't you use a loss here?
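
    For reference on point 1, tensor.backward(gradient) just seeds the backward pass with that value, so one/mone simply scale the accumulated gradient by +1/-1. A tiny sketch with a hypothetical scalar (not the repo's code):

        import torch

        w = torch.tensor(2.0, requires_grad=True)
        D = 3.0 * w                   # stand-in for D_real.mean()

        one = torch.tensor(1.0)
        mone = one * -1

        D.backward(mone)              # same as (-D).backward()
        print(w.grad)                 # tensor(-3.)

    So calling D_real.backward(mone), D_fake.backward(one) and gradient_penalty.backward() accumulates the same critic gradients as forming D_cost = D_fake - D_real + gradient_penalty and calling D_cost.backward() once, which is the loss-based style asked about in point 2.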

    opened by geekfeiw 2
  • Logic problem with calc_gradient_penalty in CNN case

    Right now, you're getting the norm of the gradient in gan_mnist with gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA.

    gradients is of shape [BATCH_SIZE, 1, 28, 28]. We want to calculate the norm of the gradient PER SAMPLE, and then use that as an error metric. That means we need to collapse gradients into one value per sample, i.e. to shape [BATCH_SIZE, 1] or just [BATCH_SIZE].

    But, gradients.norm(dim=1) collapses it to size [BATCH_SIZE, 28, 28], which isn't right.

    Instead, gradients needs to be reshaped to be flat before you take the norm.
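
    A small sketch of that reshape, with shapes assumed to match gan_mnist (not the repo's exact code):

        import torch

        LAMBDA = 10
        gradients = torch.randn(50, 1, 28, 28)        # stand-in for d(netD)/d(interpolates)

        flat = gradients.view(gradients.size(0), -1)  # [BATCH_SIZE, 784]: one row per sample
        gradient_penalty = ((flat.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA
        print(gradient_penalty)                       # a single scalar for the whole batch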

    I monitored the value of gradient_penalty, and with the current computation it explodes to around 10000 for gan_mnist even when the network's gradients are reasonable, so formal reasons aside, I'm pretty sure there's a bug.

    Great library by the way, it's made my life really easy. Thanks for posting it.

    Want me to make a PR?

    What do you think?

    opened by samlobel 2
  • how to decide the value of λ

    Since d_loss = fake - real + gp, how do we decide the value of LAMBDA to get a reasonable gradient penalty? I mean, if fake = -10 and real = 10, is gp = 10 appropriate, or should gp be smaller (e.g. gp = 5)?

    opened by silentsaber 1
  • How does optimizer work when there are 3 backwards(real, fake, penalty)?

    I thought I should write something like D_cost = D_fake - D_real + gradient_penalty; D_cost.backward(), but I don't know why you call backward the way you do.

    https://github.com/caogang/wgan-gp/blob/ae47a185ed2e938c39cf3eb2f06b32dc1b6a2064/gan_cifar10.py#L203-L226

    opened by wook3024 0
  • The axis of the norm of the gradient

    https://github.com/caogang/wgan-gp/blob/ae47a185ed2e938c39cf3eb2f06b32dc1b6a2064/gan_mnist.py#L148

    Hi, I came across your code when I googled WGAN-GP. I think there is something wrong with your implementation here. In WGAN-GP, the norm of the interpolated gradient should be calculated across all axes except the batch axis, since the gradient is taken w.r.t. each sample. But in your code, you only take the norm over the second dimension, which is not reasonable. I think you are missing the following reshape step:

    gradients.view(gradients.shape[0], -1)

    opened by CharlesNord 1
  • A question about Dcost

    Excuse me, can you tell me why we take the result from the discriminator and then multiply it by -1 as the cost? Especially on the MNIST dataset, where we use a sigmoid as the last layer, the cost is negative all the time. Thank you.

    opened by etoilestar 0
  • WGAN-GP loss keeps growing

    Hello, I've implemented your code on my own dataset. However, the d_loss decreases from 10 (which equals lambda) to a very large negative number (like -10000), the Wasserstein distance grows to the order of millions, and the gradient penalty goes from 10 to 0 and then up to the order of thousands. I've worked on this problem for several days but I still can't solve it. Can anyone help me with this? @caogang

    opened by haonanhe 4
Owner: Marvin Cao