Image-to-image translation with conditional adversarial nets

pix2pix

Project | Arxiv | PyTorch

Torch implementation for learning a mapping from input images to output images, for example:

Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
CVPR, 2017.

On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days.

Note: Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version.

Setup

Prerequisites

  • Linux or OSX
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but this is untested)

Getting Started

  • Install torch packages nngraph and display:
luarocks install nngraph
luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Clone this repo:
git clone [email protected]:phillipi/pix2pix.git
cd pix2pix
  • Download the dataset (e.g., CMP Facades):
bash ./datasets/download_dataset.sh facades
  • Train the model:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua
  • (CPU only) The same training command without a GPU or CuDNN; setting the environment variables gpu=0 cudnn=0 forces CPU-only mode:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua
  • (Optionally) start the display server to view results as the model trains (see Display UI for more details):
th -ldisplay.start 8000 0.0.0.0
  • Finally, test the model:
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA phase=val th test.lua

The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html.

Train

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB th train.lua

Switch AtoB to BtoA to train the translation in the opposite direction.

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoints_dir=your_dir in train.lua).

See opt in train.lua for additional training options.
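
For example, to train with a batch size of 10 and save a checkpoint every 5 epochs (options that also appear in the CPU-only command above):

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB batchSize=10 save_epoch_freq=5 th train.lua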

Test

DATA_ROOT=/path/to/data/ name=expt_name which_direction=AtoB phase=val th test.lua

This will run the model named expt_name in direction AtoB on all images in /path/to/data/val.

Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test.lua).

See opt in test.lua for additional testing options.

Datasets

Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data.

bash ./datasets/download_dataset.sh dataset_name

Models

Download the pre-trained models with the following script. You need to rename the model (e.g., facades_label2image to ./checkpoints/facades/latest_net_G.t7) after the download has finished; see the example after the list below.

bash ./models/download_model.sh model_name
  • facades_label2image (label -> facade): trained on the CMP Facades dataset.
  • cityscapes_label2image (label -> street scene): trained on the Cityscapes dataset.
  • cityscapes_image2label (street scene -> label): trained on the Cityscapes dataset.
  • edges2shoes (edge -> photo): trained on UT Zappos50K dataset.
  • edges2handbags (edge -> photo): trained on Amazon handbags images.
  • day2night (daytime scene -> nighttime scene): trained on around 100 webcams.
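
For instance, for the facades model, the rename might look as follows, assuming the download script saves the file under ./models/ (check the script's actual output location; the source filename below is illustrative):

bash ./models/download_model.sh facades_label2image
mkdir -p ./checkpoints/facades
mv ./models/facades_label2image.t7 ./checkpoints/facades/latest_net_G.t7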

Setup Training and Test data

Generating Pairs

We provide a Python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:

Create folder /path/to/data with subfolders A and B. A and B should each have their own subfolders train, val, test, etc. In /path/to/data/A/train, put training images in style A. In /path/to/data/B/train, put the corresponding images in style B. Repeat the same for the other data splits (val, test, etc.).

Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg.

Once the data is formatted this way, call:

python scripts/combine_A_and_B.py --fold_A /path/to/data/A --fold_B /path/to/data/B --fold_AB /path/to/data

This will combine each pair of images (A,B) into a single image file, ready for training.
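
For reference, the combination is a side-by-side concatenation of each pair. Below is a minimal sketch of that step using PIL and NumPy (illustrative only; the function name is ours, not the script's):

import numpy as np
from PIL import Image

def combine_pair(path_A, path_B, path_AB):
    # The two depictions of the same scene must share size and filename.
    im_A = np.array(Image.open(path_A).convert('RGB'))
    im_B = np.array(Image.open(path_B).convert('RGB'))
    assert im_A.shape == im_B.shape, 'paired images must be the same size'
    # Concatenate A (left) and B (right) into one training image.
    im_AB = np.concatenate([im_A, im_B], axis=1)
    Image.fromarray(im_AB).save(path_AB)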

Notes on Colorization

No need to run combine_A_and_B.py for colorization. Instead, you only need to prepare some natural images and set preprocess=colorization in the script. The program will automatically convert each RGB image into Lab color space and create an L -> ab image pair during training. Also set input_nc=1 and output_nc=2.
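
To make the L -> ab pairing concrete, here is a minimal sketch of the same conversion using scikit-image (the Torch code performs this internally; the file name is hypothetical):

from skimage import color, io

rgb = io.imread('photo.jpg')   # hypothetical natural image, H x W x 3 (RGB)
lab = color.rgb2lab(rgb)       # convert to Lab color space
L = lab[:, :, :1]              # 1-channel lightness input  (input_nc=1)
ab = lab[:, :, 1:]             # 2-channel color target     (output_nc=2)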

Extracting Edges

We provide Python and MATLAB scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges, then run scripts/edges/PostprocessHED.m to simplify the edges with additional post-processing steps. Check the code documentation for more details.

Evaluating Labels2Photos on Cityscapes

We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe) on your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running

bash ./scripts/eval_cityscapes/download_fcn8s.sh

Then make sure ./scripts/eval_cityscapes/ is in your system's Python path. If not, run the following command to add it:

export PYTHONPATH=${PYTHONPATH}:./scripts/eval_cityscapes/

Now you can run the following command to evaluate your predictions:

python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/your/predictions/ --output_dir /path/to/output/directory/

Images stored under --result_dir should contain your model predictions on the Cityscapes validation split, and have the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png). The script will output a text file under --output_dir containing the metric.

Further notes: our pre-trained FCN model is not expected to work on Cityscapes at the original resolution (1024x2048), as it was trained on 256x256 images that are upsampled to 1024x2048 during training. The resizing during training served to 1) keep the label maps untouched at their original high resolution and 2) avoid changing the standard FCN training code and architecture for Cityscapes. At test time, you need to synthesize 256x256 results; our test code will automatically upsample them to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and is compared against 1024x2048 ground-truth labels, so you do not need to resize the ground-truth labels. The best way to verify that everything is correct is to first reproduce the paper's numbers for real images: resize the original/real Cityscapes images (not the labels) to 256x256 and feed them to the evaluation code.
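
As a sketch of that verification step, the snippet below downsamples the real Cityscapes validation images to 256x256 before handing them to the evaluation code (the two directory paths are placeholders for your own layout, not paths the repo defines):

import os
from PIL import Image

src = '/path/to/original/cityscapes/leftImg8bit/val'  # placeholder: real validation images
dst = '/path/to/your/predictions'                     # placeholder: folder passed as --result_dir
os.makedirs(dst, exist_ok=True)
for root, _, files in os.walk(src):
    for name in files:
        if name.endswith('_leftImg8bit.png'):
            im = Image.open(os.path.join(root, name)).convert('RGB')
            # 256x256 inputs; the evaluation code upsamples back to 1024x2048 internally.
            im.resize((256, 256), Image.BICUBIC).save(os.path.join(dst, name))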

Display UI

Optionally, for displaying images during training and test, use the display package.

  • Install it with: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Then start the server with: th -ldisplay.start
  • Open this URL in your browser: http://localhost:8000

By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface:

th -ldisplay.start 8000 0.0.0.0

Then open http://(hostname):(port)/ in your browser to load the remote desktop.

L1 error is plotted to the display by default. Set the environment variable display_plot to a comma-separated list drawn from the values errL1, errG, and errD to visualize the L1, generator, and discriminator error, respectively. For example, to plot only the generator and discriminator errors instead of the default L1 error, set display_plot="errG,errD".
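
A complete facades training command with this option might look like:

DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA display_plot="errG,errD" th train.lua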

Citation

If you use this code for your research, please cite our paper Image-to-Image Translation with Conditional Adversarial Networks:

@article{pix2pix2017,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  journal={CVPR},
  year={2017}
}

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection:
[Github] [Webpage]

Acknowledgments

Code borrows heavily from DCGAN. The data loader is modified from DCGAN and Context-Encoder.

Comments
  • Evaluating Cityscapes

    Hi,

    I'm having difficulties reproducing the results from the CycleGAN paper for the Cityscapes evaluation. For the city->label classification scores I get very similar results, but for the label->photo FCN score experiment I get really bad results. I used the code from the ./scripts/eval_cityscapes folder and trimmed it down a bit to find the error (see code below): I load a single image from the Cityscapes dataset, resize and preprocess it using the code from the repo, and then perform a forward pass through the pretrained caffe model.

    Unfortunately, the caffe model outputs mostly 0s. Do you have any suggestions?

    import numpy as np
    import scipy.misc
    import caffe
    from PIL import Image
    from util import segrun  # helper from scripts/eval_cityscapes/util.py
    
    caffemodel_dir = 'caffemodel/'
    caffe.set_mode_cpu()
    net = caffe.Net(caffemodel_dir + 'deploy.prototxt',
                    caffemodel_dir + 'fcn-8s-cityscapes.caffemodel',
                    caffe.TEST)
    
    def preprocess(im):
        # RGB -> BGR, subtract the Cityscapes channel means, reorder to C x H x W.
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:, :, ::-1]
        in_ -= np.array((72.78044, 83.21195, 73.45286), dtype=np.float32)
        in_ = in_.transpose((2, 0, 1))
        return in_
    
    orig = Image.open('../../../pix2pix/scripts/eval_cityscapes/leftImg8bit/train/dusseldorf/dusseldorf_000087_000019_leftImg8bit.png')
    resized = scipy.misc.imresize(np.array(orig), (256, 256))
    segmented = segrun(net, preprocess(resized))
    

    Left to right: "orig", "resized", and "segmented".

    Thanks in advance.

    opened by tychovdo 30
  • luajit out of memory during training

    Hi,

    I'm trying to train pix2pix on a particular image labeling task. Things worked fine running the facades demo, although I had to use the CPU on my MacBook since the built-in GPU didn't have enough memory for that task.

    I've used the combine_A_and_B.py script to generate new image pairs from about 6k pairs of input and label images. When training, I'm getting an error message: luajit: not enough memory

    My command line is below. I've got the display frequency set high so I can see what goes on in early iterations; I'd dial that down once I'm more comfortable with what's happening.

    Anything I can do about the memory error?

    $ DATA_ROOT=./datasets/imageClef/combined name=clef_generation which_direction=AtoB gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 display_freq=3 th train.lua
    {
      cudnn : 0
      name : "clef_generation"
      niter : 200
      batchSize : 10
      n_layers_D : 0
      ndf : 64
      which_model_netG : "unet"
      save_display_freq : 5000
      print_freq : 50
      gpu : 0
      use_GAN : 1
      DATA_ROOT : "./datasets/imageClef/combined"
      serial_batch_iter : 1
      use_L1 : 1
      save_epoch_freq : 5
      output_nc : 3
      checkpoints_dir : "./checkpoints"
      input_nc : 3
      beta1 : 0.5
      continue_train : 0
      which_direction : "AtoB"
      phase : "train"
      fineSize : 256
      condition_GAN : 1
      loadSize : 286
      lambda : 100
      ngf : 64
      preprocess : "regular"
      which_model_netD : "basic"
      display_freq : 3
      display : 1
      display_id : 10
      ntrain : inf
      nThreads : 2
      lr : 0.0002
      flip : 1
      save_latest_freq : 5000
      serial_batches : 0
    }
    Random Seed: 276
    #threads...2
    Starting donkey with id: 2 seed: 278 table: 0x0f12f520
    Starting donkey with id: 1 seed: 277 table: 0x0f14f0a8
    ./datasets/imageClef/combined
    ./datasets/imageClef/combined
    trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x0f1ed738
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x0f0860f8
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    now combine all the files to a single large file
    now combine all the files to a single large file
    load the large concatenated list of sample paths to self.imagePath
    cmd..gwc -L '/tmp/lua_4R6Gpf' |gcut -f1 -d' '
    load the large concatenated list of sample paths to self.imagePath
    cmd..gwc -L '/tmp/lua_H4yZFT' |gcut -f1 -d' '
    5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
    5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
    Cleaning up temporary files
    Cleaning up temporary files
    Dataset Size: 5758
    define model netG...
    define model netD...
    nn.gModule
    nn.Sequential {
      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
      (1): nn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
      (2): nn.LeakyReLU(0.2)
      (3): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
      (4): nn.SpatialBatchNormalization (4D) (128)
      (5): nn.LeakyReLU(0.2)
      (6): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
      (7): nn.SpatialBatchNormalization (4D) (256)
      (8): nn.LeakyReLU(0.2)
      (9): nn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
      (10): nn.SpatialBatchNormalization (4D) (512)
      (11): nn.LeakyReLU(0.2)
      (12): nn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
      (13): nn.Sigmoid
    }
    running model on CPU
    /Users/danielr/torch/install/bin/luajit: not enough memory

    opened by rdaniel 16
  • Pre-trained caffe model for FCN score evaluation

    Hello, could you please tell me the performance of the pre-trained caffe model on the original Cityscapes dataset provided by this code? I followed the instructions in the README to evaluate the label_to_image generator, which means the input of the FCN model is the generated fake image. I wondered how the model downloaded by this code performs on the original Cityscapes dataset, so I changed the parameter --result_dir in ./scripts/eval_cityscapes/evaluate.py to the original val set of Cityscapes. But the results are very bad (as follows). Is there anything wrong with this model? Or has this model already been trained on the Cityscapes dataset? Thank you.

    opened by FishYuLi 12
  • Visualizing Training Loss

    Hi there,

    Is any kind of real-time training plotting visualization in the works for pix2pix?

    I'm interested in visualizing Err_G, Err_D, and ErrL1 with the UI display while training with train.lua. I've only poked around with lua and torch with various machine learning projects and have yet to write much of either, although I'm happy to dig-in and try and figure something like this out if it seems like it would be rather trivial/helpful.

    My initial thought would be to use display.plot(...) to update the display on each training batch. Anybody more familiar with the code base have any ideas or examples they would like to share?

    P.S. Really rad paper + source code, super excited to have access to this research :) Thanks to all who are working on this!

    opened by brannondorsey 11
  • Trouble reproducing ground truth results

    Hi,

    I am trying to use the evaluation script to reproduce the ground truth results. The label has a shape of (1024, 2048, 3) but the segmentation result has a shape of (1024, 2048). As a result the fast_hist function throws the following error: IndexError: index 2097152 is out of bounds for axis 1 with size 2097152.

    If I try to select only one of the label channels or use np.repeat to stack the segmentation result with itself I get very poor results for the ground truth (Mean Pixel accuracy < 0.04).

    Is that the intended behavior of the script?

    Thanks in advance.

    opened by erthher 10
  • Question about patchGAN

    I read your paper and the implementation. The PatchGAN method described in your paper seems promising, but I wonder how it is used in your code. Do you preprocess the data into patches before loading, or is it done some other way?

    opened by SJTUzhanglj 10
  • ThCudaCheck: out of memory

    While running the training command DATA_ROOT=./datasets/alphabet1 name=blob_placement1 which_direction=AtoB th train.lua, I get the following output:

    {
      cudnn : 1
      name : "blob_placement1"
      niter : 200
      batchSize : 1
      n_layers_D : 0
      ndf : 64
      which_model_netG : "unet"
      save_display_freq : 5000
      print_freq : 50
      gpu : 1
      use_GAN : 1
      DATA_ROOT : "./datasets/alphabet1"
      serial_batch_iter : 1
      use_L1 : 1
      save_epoch_freq : 50
      output_nc : 3
      checkpoints_dir : "./checkpoints"
      input_nc : 3
      beta1 : 0.5
      continue_train : 0
      which_direction : "AtoB"
      phase : "train"
      fineSize : 256
      condition_GAN : 1
      loadSize : 286
      lambda : 100
      ngf : 64
      preprocess : "regular"
      which_model_netD : "basic"
      display : 1
      display_freq : 100
      display_id : 10
      flip : 1
      ntrain : inf
      lr : 0.0002
      nThreads : 2
      display_plot : "errL1"
      save_latest_freq : 5000
      serial_batches : 0
    }
    Random Seed: 3566
    #threads...2
    Starting donkey with id: 1 seed: 3567 table: 0x41eedc58
    Starting donkey with id: 2 seed: 3568 table: 0x40b5d2b0
    ./datasets/alphabet1
    ./datasets/alphabet1
    trainCache /home/admink/pix2pix/cache/_home_admink_pix2pix_datasets_alphabet1_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x41177720
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    trainCache /home/admink/pix2pix/cache/_home_admink_pix2pix_datasets_alphabet1_train_trainCache.t7
    Creating train metadata
    serial batch:, 0
    table: 0x41b1d360
    running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
    now combine all the files to a single large file
    now combine all the files to a single large file
    load the large concatenated list of sample paths to self.imagePath
    cmd..wc -L '/tmp/lua_kdsZSG' |cut -f1 -d' '
    load the large concatenated list of sample paths to self.imagePath
    cmd..wc -L '/tmp/lua_e80M1C' |cut -f1 -d' '
    15 samples found..... 0/15 .....................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 0ms | Step: 0ms
    15 samples found..... 0/15 .....................] ETA: 0ms | Step: 0ms
    Updating classList and imageClass appropriately [=================== 1/1 =====================>] Tot: 0ms | Step: 0ms
    Cleaning up temporary files
    Cleaning up temporary files
    Dataset Size: 50
    define model netG...
    define model netD...
    nn.gModule
    nn.Sequential {
      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
      (1): nn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
      (2): nn.LeakyReLU(0.2)
      (3): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
      (4): nn.SpatialBatchNormalization (4D) (128)
      (5): nn.LeakyReLU(0.2)
      (6): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
      (7): nn.SpatialBatchNormalization (4D) (256)
      (8): nn.LeakyReLU(0.2)
      (9): nn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
      (10): nn.SpatialBatchNormalization (4D) (512)
      (11): nn.LeakyReLU(0.2)
      (12): nn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
      (13): nn.Sigmoid
    }
    transferring to gpu...
    done
    THCudaCheck FAIL file=/home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    /home/admink/torch/install/bin/luajit: /home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: cuda runtime error (2) : out of memory at /home/admink/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
    stack traceback:
    	[C]: in function 'Tensor'
    	/home/admink/torch/install/share/lua/5.1/nn/Module.lua:309: in function 'flatten'
    	/home/admink/torch/install/share/lua/5.1/nn/Module.lua:326: in function 'getParameters'
    	train.lua:445: in main chunk
    	[C]: in function 'dofile'
    	...mink/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    	[C]: at 0x00405d50

    It only runs fine when the dataset size is 3. I reduced the image size to 128x128 and could still train on at most 10 images. I want to train the model on at least 500 images. How can this error be resolved?

    opened by khaulahzia 8
  • Getting better results by setting use_GAN to 0

    The paper suggests using a combination of both the GAN loss and the L1 loss, but by turning off the GAN loss (setting use_GAN=0), I actually got much more detailed model outputs on the edges2shoes dataset (trained for ~24 hours on a Titan X GPU).

    opened by yanjidriveai 7
  • Got `internal error in __sub: no metatable` error when I train the sample dataset

    I followed the Getting Started instructions and installed pix2pix. Then I downloaded the facades dataset with bash ./datasets/download_dataset.sh facades and ran DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua

    but I got the following error.

    ~/torch-cl/install/bin/luajit: ./models.lua:69: internal error in __sub: no metatable
    stack traceback:
    	[C]: in function '__sub'
    	./models.lua:69: in function 'defineG_unet'
    	train.lua:110: in function 'defineG'
    	train.lua:146: in main chunk
    	[C]: in function 'dofile'
    	...i/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    	[C]: at 0x010ec81ce0
    

    How can I fix this?

    • run on Mac OSX
    • python2.7
    opened by naobit 7
  • Strange phenomenon in my extreme experiments

    I am doing what you may consider extreme experiments with your code, as a part of my artistic explorations. See http://liipetti.net/erratic/2016/11/25/imaginary-landscapes-using-pix2pix/

    In the images at the end of the post you will see a roughly square shape in the middle, having higher frequency content than the rest of the image. It could result from the nature of the experiment (which includes scaling and filling in the missing part), but I came to think of the PatchGAN method described in your paper. Is it used in the code at all? By default?

    I just happened to think that if the discriminator is looking at an area in the center, then the generator might learn to put more detailed content just there. I don't know yet if this is happening, but it would be a possible explanation for what I am experiencing.

    Here are two images as an example, with the high-frequency "watermark" clearly visible in the center.


    opened by htoyryla 7
  • Input and output sizes of images?

    Why is the output size smaller than the input in the training results? How can I modify the code to customize the output size, or make it the same as the input image size?

    opened by ArtScanner 6
  • About the receptive field of the discriminator architecture

    In section 6.1.2 of the paper released on arXiv, how is the receptive field of the discriminator calculated? The receptive field of the 16x16 discriminator "C64-C128", composed of two convolution layers with a kernel size of 4 and a stride of 2, is stated as 16x16 instead of 10 (4 + 2x(4-1)). Is that correct?

    opened by ShuGuoJ 1
  • Getting Started Installation Instructions.

    Hello all. I was following the installation instructions and I keep encountering this error, and there isn't much about it online. I get the error when attempting:

    luarocks install nngraph

    The error says:

    Installing https://raw.githubusercontent.com/torch/rocks/master/nngraph-scm-1.rockspec...
    Using https://raw.githubusercontent.com/torch/rocks/master/nngraph-scm-1.rockspec...
    switching to 'build' mode
    Cloning into 'nngraph'...
    fatal: remote error: The unauthenticated git protocol on port 9418 is no longer supported. Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
    Error: Failed cloning git repository.

    Has anyone encountered this and know how to fix it? Thank you

    opened by r1cummings 2
  • the code for evaluation in Photo2label of datasets-cityscapes might be wrong?

    Hi, I notice that your code converts the color photo into a 14-class label map, but the output of the net has 34 classes. I think this might convert some pixels to the wrong id (for example, (0,0,0) to the id corresponding to (0,0,70)?).

    opened by adamas-v 1
  • How should I modify the model(G and D) structure if my input are all one-hot encoded matrix(3-D tensor) which only contains either 0 or 1

    I'm working on something that needs to convert each value in a 2-D tensor to a 3-D tensor, where each value in the original 2-D tensor has been one-hot encoded. For example, say I have a matrix M (51 x 51); M(i, j) was originally a scalar and is converted to a one-hot vector (size 30). In the end, I will have a 3-D input (30x51x51). In this case, how should I modify the Generator and Discriminator model structures?

    Moreover, due to my project goal, the 3-D tensor that only contains one-hot vectors will be concatenated with another similar 3-D tensor that also only contains one-hot vectors. Thus, the input actually becomes a 3-D tensor containing one-hot, two-hot, or even three-hot vectors. In this case, how should I modify the model structure? Do I still have to convert all the values to (-1, 1), for example, 1 to 1 and 0 to -1?

    opened by lkqnaruto 1
  • fcn-8s-cityscapes weight link failure

    Hi, it seems the link at https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/f13aab8148bd5f15b9eb47b690496df8dadbab0c/scripts/eval_cityscapes/download_fcn8s.sh#L1 failed when I ran sh download_fcn8s.sh:

    http://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel
    Resolving people.eecs.berkeley.edu (people.eecs.berkeley.edu)... 128.32.244.190
    Connecting to people.eecs.berkeley.edu (people.eecs.berkeley.edu)|128.32.244.190|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: https://tinghuiz.github.io/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel [following]
    --2021-07-05 10:37:38--  https://tinghuiz.github.io/projects/pix2pix/fcn-8s-cityscapes/fcn-8s-cityscapes.caffemodel
    Resolving tinghuiz.github.io (tinghuiz.github.io)... 185.199.110.153, 185.199.111.153, 185.199.109.153, ...
    Connecting to tinghuiz.github.io (tinghuiz.github.io)|185.199.110.153|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2021-07-05 10:37:38 ERROR 404: Not Found.
    

    Hope to get a working link. Thanks!

    opened by Kravrolens 3