Image-to-image translation with conditional adversarial nets

pix2pix

Project | Arxiv | PyTorch

Torch implementation for learning a mapping from input images to output images, for example:

Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
CVPR, 2017.

On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days.

Note: Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version.

Setup

Prerequisites

  • Linux or OSX
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but this is untested)

Getting Started

  • Install torch packages nngraph and display:
luarocks install nngraph
luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Clone this repo:
git clone [email protected]:phillipi/pix2pix.git
cd pix2pix
  • Download the dataset (e.g., CMP Facades):
bash ./datasets/download_dataset.sh facades
  • Train the model:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua
  • (CPU only) The same training command without a GPU or CuDNN; setting the environment variables gpu=0 cudnn=0 forces CPU-only mode:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua
  • (Optionally) start the display server to view results as the model trains (see Display UI for more details):
th -ldisplay.start 8000 0.0.0.0
  • Finally, test the model:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA phase=val th test.lua

The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html.

Train

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB th train.lua

Switch AtoB to BtoA to train the translation in the opposite direction.

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoints_dir=your_dir in train.lua).

See opt in train.lua for additional training options.
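
For example, to train with a batch size of 10 and save a checkpoint every 5 epochs (options that also appear in the CPU-only command above):

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB batchSize=10 save_epoch_freq=5 th train.lua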

Test

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB phase=val th test.lua

This will run the model named expt_name in direction AtoB on all images in /path/to/data/val.

Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test.lua).

See opt in test.lua for additional testing options.

Datasets

Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data.

bash ./datasets/download_dataset.sh dataset_name

Models

Download the pre-trained models with the following script. You need to rename the model (e.g., facades_label2image to ./checkpoints/facades/latest_net_G.t7) after the download has finished; see the example after the list below.

bash ./models/download_model.sh model_name
  • facades_label2image (label -> facade): trained on the CMP Facades dataset.
  • cityscapes_label2image (label -> street scene): trained on the Cityscapes dataset.
  • cityscapes_image2label (street scene -> label): trained on the Cityscapes dataset.
  • edges2shoes (edge -> photo): trained on UT Zappos50K dataset.
  • edges2handbags (edge -> photo): trained on Amazon handbags images.
  • day2night (daytime scene -> nighttime scene): trained on around 100 webcams.
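
For instance, for the facades model, the rename might look as follows, assuming the download script saves the file under ./models/ (check the script's actual output location; the source filename below is illustrative):

bash ./models/download_model.sh facades_label2image
mkdir -p ./checkpoints/facades
mv ./models/facades_label2image.t7 ./checkpoints/facades/latest_net_G.t7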

Setup Training and Test data

Generating Pairs

We provide a Python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:

Create folder /path/to/data with subfolders A and B. A and B should each have their own subfolders train, val, test, etc. In /path/to/data/A/train, put training images in style A. In /path/to/data/B/train, put the corresponding images in style B. Repeat the same for the other data splits (val, test, etc.).

Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg.

Once the data is formatted this way, call:

python scripts/combine_A_and_B.py --fold_A /path/to/data/A --fold_B /path/to/data/B --fold_AB /path/to/data

This will combine each pair of images (A,B) into a single image file, ready for training.
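
For reference, the combination is a side-by-side concatenation of each pair. Below is a minimal sketch of that step using PIL and NumPy (illustrative only; the function name is ours, not the script's):

import numpy as np
from PIL import Image

def combine_pair(path_A, path_B, path_AB):
    # The two depictions of the same scene must share size and filename.
    im_A = np.array(Image.open(path_A).convert('RGB'))
    im_B = np.array(Image.open(path_B).convert('RGB'))
    assert im_A.shape == im_B.shape, 'paired images must be the same size'
    # Concatenate A (left) and B (right) into one training image.
    im_AB = np.concatenate([im_A, im_B], axis=1)
    Image.fromarray(im_AB).save(path_AB)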

Notes on Colorization

No need to run combine_A_and_B.py for colorization. Instead, you only need to prepare some natural images and set preprocess=colorization in the script. The program will automatically convert each RGB image into Lab color space and create an L -> ab image pair during training. Also set input_nc=1 and output_nc=2.
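
To make the L -> ab pairing concrete, here is a minimal sketch of the same conversion using scikit-image (the Torch code performs this internally; the file name is hypothetical):

from skimage import color, io

rgb = io.imread('photo.jpg')   # hypothetical natural image, H x W x 3 (RGB)
lab = color.rgb2lab(rgb)       # convert to Lab color space
L = lab[:, :, :1]              # 1-channel lightness input  (input_nc=1)
ab = lab[:, :, 1:]             # 2-channel color target     (output_nc=2)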

Extracting Edges

We provide Python and MATLAB scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges, then run scripts/edges/PostprocessHED.m to simplify the edges with additional post-processing steps. Check the code documentation for more details.

Evaluating Labels2Photos on Cityscapes

We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe) on your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running

bash ./scripts/eval_cityscapes/download_fcn8s.sh

Then make sure ./scripts/eval_cityscapes/ is in your system's Python path. If not, run the following command to add it:

export PYTHONPATH=${PYTHONPATH}:./scripts/eval_cityscapes/

Now you can run the following command to evaluate your predictions:

python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/your/predictions/ --output_dir /path/to/output/directory/

Images stored under --result_dir should contain your model predictions on the Cityscapes validation split, and have the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png). The script will output a text file under --output_dir containing the metric.

Further notes: our pre-trained FCN model is not expected to work on Cityscapes at the original resolution (1024x2048), as it was trained on 256x256 images that are upsampled to 1024x2048 during training. The resizing during training served to 1) keep the label maps untouched at their original high resolution and 2) avoid changing the standard FCN training code and architecture for Cityscapes. At test time, you need to synthesize 256x256 results; our test code will automatically upsample them to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and is compared against 1024x2048 ground-truth labels, so you do not need to resize the ground-truth labels. The best way to verify that everything is correct is to first reproduce the paper's numbers for real images: resize the original/real Cityscapes images (not the labels) to 256x256 and feed them to the evaluation code.
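
As a sketch of that verification step, the snippet below downsamples the real Cityscapes validation images to 256x256 before handing them to the evaluation code (the two directory paths are placeholders for your own layout, not paths the repo defines):

import os
from PIL import Image

src = '/path/to/original/cityscapes/leftImg8bit/val'  # placeholder: real validation images
dst = '/path/to/your/predictions'                     # placeholder: folder passed as --result_dir
os.makedirs(dst, exist_ok=True)
for root, _, files in os.walk(src):
    for name in files:
        if name.endswith('_leftImg8bit.png'):
            im = Image.open(os.path.join(root, name)).convert('RGB')
            # 256x256 inputs; the evaluation code upsamples back to 1024x2048 internally.
            im.resize((256, 256), Image.BICUBIC).save(os.path.join(dst, name))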

Display UI

Optionally, for displaying images during training and test, use the display package.

  • Install it with: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Then start the server with: th -ldisplay.start
  • Open this URL in your browser: http://localhost:8000

By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface:

th -ldisplay.start 8000 0.0.0.0

Then open http://(hostname):(port)/ in your browser to load the remote desktop.

L1 error is plotted to the display by default. Set the environment variable display_plot to a comma-separated list drawn from the values errL1, errG, and errD to visualize the L1, generator, and discriminator error, respectively. For example, to plot only the generator and discriminator errors instead of the default L1 error, set display_plot="errG,errD".
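
A complete facades training command with this option might look like:

DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA display_plot="errG,errD" th train.lua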

Citation

If you use this code for your research, please cite our paper Image-to-Image Translation with Conditional Adversarial Networks:

@article{pix2pix2017,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  journal={CVPR},
  year={2017}
}

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection:
[Github] [Webpage]

Acknowledgments

Code borrows heavily from DCGAN. The data loader is modified from DCGAN and Context-Encoder.

Comments
  • Evaluating Cityscapes

    Hi,

    I'm having difficulties reproducing the results from the CycleGAN paper for the Cityscapes evaluation. For the city->label classification scores I get very similar results, but for the label->photo FCN score experiment I get really bad results. I used the code from the ./scripts/eval_cityscapes folder and trimmed it down a bit to find the error (see code below): I load a single image from the Cityscapes dataset, resize and preprocess it using the code from the repo, and then perform a forward pass through the pretrained caffe model.

    Unfortunately, the caffe model outputs mostly 0s. Do you have any suggestions?

    import numpy as np
    import scipy.misc
    import caffe
    from PIL import Image
    from util import segrun  # helper from scripts/eval_cityscapes/util.py
    
    caffemodel_dir = 'caffemodel/'
    caffe.set_mode_cpu()
    net = caffe.Net(caffemodel_dir + 'deploy.prototxt',
                    caffemodel_dir + 'fcn-8s-cityscapes.caffemodel',
                    caffe.TEST)
    
    def preprocess(im):
        # RGB -> BGR, subtract the Cityscapes channel means, reorder to C x H x W.
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:, :, ::-1]
        in_ -= np.array((72.78044, 83.21195, 73.45286), dtype=np.float32)
        in_ = in_.transpose((2, 0, 1))
        return in_
    
    orig = Image.open('../../../pix2pix/scripts/eval_cityscapes/leftImg8bit/train/dusseldorf/dusseldorf_000087_000019_leftImg8bit.png')
    resized = scipy.misc.imresize(np.array(orig), (256, 256))
    segmented = segrun(net, preprocess(resized))
    

    Left to right: "orig", "resized", and "segmented".

    Thanks in advance.

    opened by tychovdo 30
  • luajit out of memory during training

    Hi,

    I'm trying to train pix2pix on a particular image labeling task. Things worked fine running the facades demo, although I had to use the CPU on my MacBook since the built-in GPU didn't have enough memory for that task.

    I've used the combine_A_and_B.py script to generate new image pairs from about 6k pairs of input and label images. When training, I'm getting an error message: luajit: not enough memory

    My command line is below. I've got the display frequency set high so I can see what goes on in early iterations; I'd dial that down once I'm more comfortable with what's happening.

    Anything I can do about the memory error?

    $ DATA_ROOT=./datasets/imageClef/combined name=clef_generation which_direction=AtoB gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 display_freq=3 th train.lua
    {
      cudnn : 0
      name : "clef_generation"
      niter : 200
      batchSize : 10
      n_layers_D : 0
      ndf : 64
      which_model_netG : "unet"
      save_display_freq : 5000
      print_freq : 50
      gpu : 0
      use_GAN : 1
      DATA_ROOT : "./datasets/imageClef/combined"
      serial_batch_iter : 1
      use_L1 : 1
      save_epoch_freq : 5
      output_nc : 3
      checkpoints_dir : "./checkpoints"
      input_nc : 3
      beta1 : 0.5
      continue_train : 0
      which_direction : "AtoB"
      phase : "train"
      fineSize : 256
      condition_GAN : 1
      loadSize : 286
      lambda : 100
      ngf : 64
      preprocess : "regular"
      which_model_netD : "basic"
      display_freq : 3
      display : 1
      display_id : 10
      ntrain : inf
      nThreads : 2
      lr : 0.0002
      flip : 1
      save_latest_freq : 5000
      serial_batches : 0
    }
    Random Seed: 276
    #threads...2
    Starting donkey with id: 2 seed: 278 table: 0x0f12f520
    Starting donkey with id: 1 seed: 277 table: 0x0f14f0a8
    ./datasets/imageClef/combined
    ./datasets/imageClef/combined
    trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x0f1ed738
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x0f0860f8
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    now combine all the files to a single large file
    now combine all the files to a single large file
    load the large concatenated list of sample paths to self.imagePath
    cmd..gwc -L '/tmp/lua_4R6Gpf' |gcut -f1 -d' '
    load the large concatenated list of sample paths to self.imagePath
    cmd..gwc -L '/tmp/lua_H4yZFT' |gcut -f1 -d' '
    5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
    5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
    Cleaning up temporary files
    Cleaning up temporary files
    Dataset Size: 5758
    define model netG...
    define model netD...
    nn.gModule
    nn.Sequential {
      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
      (1): nn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
      (2): nn.LeakyReLU(0.2)
      (3): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
      (4): nn.SpatialBatchNormalization (4D) (128)
      (5): nn.LeakyReLU(0.2)
      (6): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
      (7): nn.SpatialBatchNormalization (4D) (256)
      (8): nn.LeakyReLU(0.2)
      (9): nn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
      (10): nn.SpatialBatchNormalization (4D) (512)
      (11): nn.LeakyReLU(0.2)
      (12): nn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
      (13): nn.Sigmoid
    }
    running model on CPU
    /Users/danielr/torch/install/bin/luajit: not enough memory

    opened by rdaniel 16
  • Pre-trained caffe model for FCN score evaluation

    Hello, could you please tell me the performance of the pre-trained caffe model on the original Cityscapes dataset provided by this code? I followed the instructions in the README to evaluate the label_to_image generator, which means the input of the FCN model is the generated fake image. I wondered how the model downloaded by this code performs on the original Cityscapes dataset, so I changed the parameter --result_dir in ./scripts/eval_cityscapes/evaluate.py to the original val set of Cityscapes. But the results are very bad (as follows). Is there anything wrong with this model? Or has this model already been trained on the Cityscapes dataset? Thank you.

    opened by FishYuLi 12
  • Visualizing Training Loss

    Hi there,

    Is any kind of real-time training plotting visualization in the works for pix2pix?

    I'm interested in visualizing Err_G, Err_D, and ErrL1 with the UI display while training with train.lua. I've only poked around with lua and torch with various machine learning projects and have yet to write much of either, although I'm happy to dig-in and try and figure something like this out if it seems like it would be rather trivial/helpful.

    My initial thought would be to use display.plot(...) to update the display on each training batch. Anybody more familiar with the code base have any ideas or examples they would like to share?

    P.S. Really rad paper + source code, super excited to have access to this research :) Thanks to all who are working on this!

    opened by brannondorsey 11
  • Trouble reproducing ground truth results

    Hi,

    I am trying to use the evaluation script to reproduce the ground truth results. The label has a shape of (1024, 2048, 3) but the segmentation result has a shape of (1024, 2048). As a result the fast_hist function throws the following error: IndexError: index 2097152 is out of bounds for axis 1 with size 2097152.

    If I try to select only one of the label channels or use np.repeat to stack the segmentation result with itself I get very poor results for the ground truth (Mean Pixel accuracy < 0.04).

    Is that the intended behavior of the script?

    Thanks in advance.

    opened by erthher 10
  • Question about patchGAN

    I read your paper and the implementation. The PatchGAN method described in your paper seems promising, but I wonder how it is used in your code. Do you preprocess the data into patches before loading, or is it done some other way?

    opened by SJTUzhanglj 10
  • ThCudaCheck: out of memory

    While running the training command DATA_ROOT=./datasets/alphabet1 name=blob_placement1 which_direction=AtoB th train.lua, I get the following output:

    {
      cudnn : 1
      name : "blob_placement1"
      niter : 200
      batchSize : 1
      n_layers_D : 0
      ndf : 64
      which_model_netG : "unet"
      save_display_freq : 5000
      print_freq : 50
      gpu : 1
      use_GAN : 1
      DATA_ROOT : "./datasets/alphabet1"
      serial_batch_iter : 1
      use_L1 : 1
      save_epoch_freq : 50
      output_nc : 3
      checkpoints_dir : "./checkpoints"
      input_nc : 3
      beta1 : 0.5
      continue_train : 0
      which_direction : "AtoB"
      phase : "train"
      fineSize : 256
      condition_GAN : 1
      loadSize : 286
      lambda : 100
      ngf : 64
      preprocess : "regular"
      which_model_netD : "basic"
      display : 1
      display_freq : 100
      display_id : 10
      flip : 1
      ntrain : inf
      lr : 0.0002
      nThreads : 2
      display_plot : "errL1"
      save_latest_freq : 5000
      serial_batches : 0
    }
    Random Seed: 3566
    #threads...2
    Starting donkey with id: 1 seed: 3567 table: 0x41eedc58
    Starting donkey with id: 2 seed: 3568 table: 0x40b5d2b0
    ./datasets/alphabet1
    ./datasets/alphabet1
    trainCache /home/admink/pix2pix/cache/_home_admink_pix2pix_datasets_alphabet1_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x41177720
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    trainCache /home/admink/pix2pix/cache/_home_admink_pix2pix_datasets_alphabet1_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x41b1d360
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    now combine all the files to a single large file
    now combine all the files to a single large file
    load the large concatenated list of sample paths to self.imagePath
    cmd..wc -L '/tmp/lua_kdsZSG' |cut -f1 -d' '
    load the large concatenated list of sample paths to self.imagePath
    cmd..wc -L '/tmp/lua_e80M1C' |cut -f1 -d' '
    15 samples found..... 0/15 .....................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 0ms | Step: 0ms
    15 samples found..... 0/15 .....................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 0ms | Step: 0ms
    Cleaning up temporary files
    Cleaning up temporary files
    Dataset Size: 50
    define model netG...
    define model netD...
    nn.gModule
    nn.Sequential {
      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
      (1): nn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
      (2): nn.LeakyReLU(0.2)
      (3): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
      (4): nn.SpatialBatchNormalization (4D) (128)
      (5): nn.LeakyReLU(0.2)
      (6): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
      (7): nn.SpatialBatchNormalization (4D) (256)
      (8): nn.LeakyReLU(0.2)
      (9): nn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
      (10): nn.SpatialBatchNormalization (4D) (512)
      (11): nn.LeakyReLU(0.2)
      (12): nn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
      (13): nn.Sigmoid
    }
    transferring to gpu...
    done
    THCudaCheck FAIL file=/home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    /home/admink/torch/install/bin/luajit: /home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: cuda runtime error (2) : out of memory at /home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
    stack traceback:
    	[C]: in function 'Tensor'
    	/home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: in function 'flatten'
    	/home/admink/torch/install/share/lua/5.1/nn/Module.lua:326: in function 'getParameters'
    	train.lua:445: in main chunk
    	[C]: in function 'dofile'
    	...mink/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    	[C]: at 0x00405d50

    It only runs fine when the dataset size is 3. I reduced the image size to 128x128 and could still train on at most 10 images. I want to train the model on at least 500 images. How can this error be resolved?

    opened by khaulahzia 8
  • Getting better results by setting use_GAN to 0

    The paper suggests using a combination of both the GAN loss and the L1 loss, but by turning off the GAN loss (setting use_GAN=0), I actually got much more detailed model outputs on the edges2shoes dataset (trained for ~24 hours on a Titan X GPU).

    opened by yanjidriveai 7
  • Got `internal error in __sub: no metatable` error when I train the sample dataset

    I followed the Getting Started instructions and installed pix2pix. Then I downloaded the facades dataset with bash ./datasets/download_dataset.sh facades and ran DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua

    but I got the following error.

    ~/torch-cl/install/bin/luajit: ./models.lua:69: internal error in __sub: no metatable
    stack traceback:
    	[C]: in function '__sub'
    	./models.lua:69: in function 'defineG_unet'
    	train.lua:110: in function 'defineG'
    	train.lua:146: in main chunk
    	[C]: in function 'dofile'
    	...i/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    	[C]: at 0x010ec81ce0
    

    How can I fix this?

    • run on Mac OSX
    • python2.7
    opened by naobit 7
  • Strange phenomenon in my extreme experiments

    I am doing what you may consider extreme experiments with your code, as a part of my artistic explorations. See http://liipetti.net/erratic/2016/11/25/imaginary-landscapes-using-pix2pix/

    In the images at the end of the post you will see a roughly square shape in the middle, having higher frequency content than the rest of the image. It could result from the nature of the experiment (which includes scaling and filling in the missing part), but I came to think of the PatchGAN method described in your paper. Is it used in the code at all? By default?

    I just happened to think that if the discriminator is looking at an area in the center, then the generator might learn to put more detailed content just there. I don't know yet if this is happening, but it would be a possible explanation for what I am experiencing.

    Here are two images as an example, with the high-frequency "watermark" clearly visible in the center.


    opened by htoyryla 7
  • Input and output sizes of images?

    Why is the output size smaller than the input in the training results? How can I modify the code to customize the output size, or make it the same as the input image size?

    opened by ArtScanner 6
  • About the receptive field of the discriminator architecture

    In section 6.1.2 of the paper released on arXiv, how is the receptive field of the discriminator calculated? The receptive field of the 16x16 discriminator "C64-C128", composed of two convolution layers with a kernel size of 4 and a stride of 2, is stated as 16x16 instead of 10 (4 + 2x(4-1)). Is that correct?

    opened by ShuGuoJ 1
  • Getting Started Installation Instructions.

    Hello all. I was following the installation instructions and I keep encountering this error, and there isn't much about it online. I get the error when attempting:

    luarocks install nngraph

    The error says:

    Installing https://raw.githubusercontent.com/torch/rocks/master/nngraph-scm-1.rockspec...
    Using https://raw.githubusercontent.com/torch/rocks/master/nngraph-scm-1.rockspec...
    switching to 'build' mode
    Cloning into 'nngraph'...
    fatal: remote error: The unauthenticated git protocol on port 9418 is no longer supported. Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
    Error: Failed cloning git repository.

    Has anyone encountered this and know how to fix it? Thank you

    opened by r1cummings 2
  • the code for evaluation in Photo2label of datasets-cityscapes might be wrong?

    Hi, I notice that your code converts the color photo into a 14-class label map, but the output of the net has 34 classes. I think this might convert some pixels to the wrong id (for example, (0,0,0) to the id corresponding to (0,0,70)?).

    opened by adamas-v 1
  • How should I modify the model(G and D) structure if my input are all one-hot encoded matrix(3-D tensor) which only contains either 0 or 1

    I'm working on something that needs to convert each value in a 2-D tensor to a 3-D tensor, where each value in the original 2-D tensor has been one-hot encoded. For example, say I have a matrix M (51 x 51); M(i, j) was originally a scalar and is converted to a one-hot vector (size 30). In the end, I will have a 3-D input (30x51x51). In this case, how should I modify the Generator and Discriminator model structures?

    Moreover, due to my project goal, the 3-D tensor that only contains one-hot vectors will be concatenated with another similar 3-D tensor that also only contains one-hot vectors. Thus, the input actually becomes a 3-D tensor containing one-hot, two-hot, or even three-hot vectors. In this case, how should I modify the model structure? Do I still have to convert all the values to (-1, 1), for example, 1 to 1 and 0 to -1?

    opened by lkqnaruto 1
  • fcn-8s-cityscapes weight link failure

    Hi, it seems the link at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/f13aab8148bd5f15b9eb47b690496df8dadbab0c/scripts/eval_cityscapes/download_fcn8s.sh#L1 failed when I ran sh download_fcn8s.sh:

    http://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel
    Resolving people.eecs.berkeley.edu (people.eecs.berkeley.edu)... 128.32.244.190
    Connecting to people.eecs.berkeley.edu (people.eecs.berkeley.edu)|128.32.244.190|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: https://tinghuiz.github.io/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel [following]
    --2021-07-05 10:37:38--  https://tinghuiz.github.io/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel
    Resolving tinghuiz.github.io (tinghuiz.github.io)... 185.199.110.153, 185.199.111.153, 185.199.109.153, ...
    Connecting to tinghuiz.github.io (tinghuiz.github.io)|185.199.110.153|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2021-07-05 10:37:38 ERROR 404: Not Found.
    

    Hope to get a working link. Thanks!

    opened by Kravrolens 3