Official Implementation of "Third Time's the Charm? Image and Video Editing with StyleGAN3" https://arxiv.org/abs/2201.13433

Overview

Third Time's the Charm? Image and Video Editing with StyleGAN3

Yuval Alaluf*, Or Patashnik*, Zongze Wu, Asif Zamir, Eli Shechtman, Dani Lischinski, Daniel Cohen-Or
*Denotes equal contribution

StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages, as well as drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training, without hindering the ability to generate unaligned imagery. Next, our analysis of the disentanglement of the different latent spaces of StyleGAN3 indicates that the commonly used W/W+ spaces are more entangled than their StyleGAN2 counterparts, underscoring the benefits of using the StyleSpace for fine-grained editing. Considering image inversion, we observe that existing encoder-based techniques struggle when trained on unaligned data. We therefore propose an encoding scheme that is trained solely on aligned data, yet can still invert unaligned images. Finally, we introduce a novel video inversion and editing workflow that leverages the capabilities of a fine-tuned StyleGAN3 generator to reduce texture sticking and expand the field of view of the edited video.


Using the recent StyleGAN3 generator, we edit unaligned input images across various domains using off-the-shelf editing techniques. Using a trained StyleGAN3 encoder, these techniques can likewise be used to edit real images and videos.

Description

Official implementation of our StyleGAN3 paper "Third Time's the Charm?" where we analyze the recent StyleGAN3 generator and explore its advantages over previous style-based generators. We evaluate StyleGAN3's latent spaces, explore their editability, and introduce an encoding scheme for inverting and editing real images and videos.


Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN (CPU may be possible with some modifications, but is not inherently supported)
  • Python 3

While building this repository, we decided to replace all uses of argparse with dataclasses and pyrallis. Using dataclasses provides useful features such as type hints and code completion. We believe this has helped make the code cleaner, and we hope you enjoy it just as much as we do!

To learn more, check out the pyrallis repository.
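As a quick illustration, here is a minimal sketch of the kind of dataclass-based configuration that pyrallis enables. The class and fields below are illustrative examples, not the repository's actual option classes:

from dataclasses import dataclass
from pathlib import Path

import pyrallis


@dataclass
class TrainConfig:
    """Illustrative training options (not the repository's real config class)."""
    dataset_type: str = "ffhq_encode"   # which dataset configuration to use
    batch_size: int = 2                 # training batch size
    stylegan_weights: Path = Path("pretrained_models/sg3-r-ffhq-1024.pt")


@pyrallis.wrap()
def main(cfg: TrainConfig):
    # pyrallis parses --dataset_type, --batch_size, etc. into a typed dataclass
    print(f"Training on {cfg.dataset_type} with batch size {cfg.batch_size}")


if __name__ == "__main__":
    main()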

Installation

  • Dependencies: We recommend running this repository using Anaconda.
    All dependencies for defining the environment are provided in environment/sg3_env.yaml.

Inference Notebook

Check out our inference notebook for inverting and editing real images with StyleGAN3.

Preparing Your Data

In this repo, we work with both aligned and cropped (unaligned) images. Given a directory of raw images, you can prepare your data by running the script prepare_data/preparing_faces_parallel.py. For example, to prepare your aligned data, you can run:

python prepare_data/preparing_faces_parallel.py \
--mode align \
--root_path /path/to/raw/images

Similarly, to obtain unaligned data, you can run:

python prepare_data/preparing_faces_parallel.py \
--mode crop \
--root_path /path/to/raw/images \
--random_shift 0.05

Given the aligned and unaligned images, we can then compute the landmarks-based transformation between each pair of images.
To compute these transforms, you can run the following command:

python prepare_data/compute_landmarks_transforms.py \
--raw_root /path/to/root/data \
--aligned_root /path/to/aligned/data \
--cropped_root /path/to/unaligned/data \
--output_root /path/to/directory/to/save/transforms/to/

The aligned data, unaligned data, and landmarks-based transforms will be used in various applications such as inverting unaligned images and editing unaligned images.

StyleGAN3 Encoder

We provide our pretrained StyleGAN3 encoders, based on ReStyle-pSp and ReStyle-e4e, trained on the FFHQ dataset. The models can be downloaded from the following links:

| Path | Description |
| --- | --- |
| ReStyle-pSp Human Faces | ReStyle-pSp trained on the FFHQ dataset over the StyleGAN3 generator. |
| ReStyle-e4e Human Faces | ReStyle-e4e trained on the FFHQ dataset over the StyleGAN3 generator. |

In addition, we provide various auxiliary models needed for training your own encoder models from scratch.
These include the StyleGAN generators converted to .pt format and pretrained models used for loss computation and as the encoder backbone.

| Path | Description |
| --- | --- |
| FFHQ Aligned StyleGAN3 | StyleGAN3 model trained on FFHQ with 1024x1024 output resolution and saved as a .pt file. |
| FFHQ Unaligned StyleGAN3 | StyleGAN3 model trained on FFHQ-U with 1024x1024 output resolution and saved as a .pt file. |
| AFHQ StyleGAN3 | StyleGAN3 model trained on AFHQv2 with 512x512 output resolution and saved as a .pt file. Model taken from the official StyleGAN3 repository. |
| Landscapes HQ StyleGAN3 | StyleGAN3 model trained on Landscapes HQ with 256x256 output resolution and saved as a .pt file. Model taken from Justin Pinkney. |

| Path | Description |
| --- | --- |
| IR-SE50 Model | Pretrained IR-SE50 model taken from TreB1eN for use in our ID loss and encoder backbone on the human facial domain. |
| CurricularFace Backbone | Pretrained CurricularFace model taken from HuangYG123 for use in ID similarity metric computation. |
| MTCNN | Weights for the MTCNN model taken from TreB1eN for use in ID similarity metric computation. (Unpack the tar.gz to extract the 3 model weights.) |

Note: while you may use the official StyleGAN3 pkl files, we highly recommend using our provided .pt generators. The generators themselves are identical, but the .pt files provide more flexibility.

By default, we assume that all auxiliary models are downloaded and saved to the directory pretrained_models. However, you may use your own paths by changing the necessary values in configs/path_configs.py.

Training

Recall that our encoders are trained solely on aligned data. After training is complete, we can perform inference on both aligned and unaligned data by using the user-specified transformations supported by StyleGAN3.
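For intuition, the sketch below shows how such a user-specified transform is typically applied to a StyleGAN3 generator before synthesis, following the make_transform utility from the official StyleGAN3 scripts (the translation and rotation values are arbitrary examples):

import numpy as np
import torch


def make_transform(translate, angle):
    # Build a 3x3 affine matrix with rotation (in degrees) and translation,
    # as in the official StyleGAN3 gen_images.py.
    m = np.eye(3)
    s = np.sin(angle / 360.0 * np.pi * 2)
    c = np.cos(angle / 360.0 * np.pi * 2)
    m[0][0], m[0][1], m[0][2] = c, s, translate[0]
    m[1][0], m[1][1], m[1][2] = -s, c, translate[1]
    return m


def apply_user_transform(generator, translate=(0.1, 0.0), angle=15.0):
    # StyleGAN3 expects the inverse transform to be copied into synthesis.input.transform.
    if hasattr(generator.synthesis, "input"):
        m = np.linalg.inv(make_transform(translate, angle))
        generator.synthesis.input.transform.copy_(torch.from_numpy(m))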

To prepare your data configs, we refer the reader to the ReStyle repository which we follow.

To train our ReStyle-pSp encoder, you can run the following command:

python inversion/scripts/train_restyle_psp.py \
--dataset_type ffhq_encode \
--encoder_type BackboneEncoder \
--exp_dir experiments/restyle_psp_ffhq_encode \
--batch_size 2 \
--test_batch_size 2 \
--workers 8 \
--test_workers 8 \
--val_interval 5000 \
--save_interval 10000 \
--start_from_latent_avg True \
--lpips_lambda 0.8 \
--l2_lambda 1 \
--id_lambda 0.1 \
--input_nc 6 \
--n_iters_per_batch 3 \
--output_size 1024 \
--stylegan_weights pretrained_models/sg3-r-ffhq-1024.pt

Similarly, a ReStyle-e4e encoder can be trained with the following command:

python inversion/scripts/train_restyle_e4e.py \
--dataset_type ffhq_encode \
--encoder_type ProgressiveBackboneEncoder \
--exp_dir experiments/restyle_e4e_ffhq_encode \
--batch_size 2 \
--test_batch_size 2 \
--workers 8 \
--test_workers 8 \
--start_from_latent_avg True \
--lpips_lambda 0.8 \
--l2_lambda 1 \
--id_lambda 0.1 \
--w_discriminator_lambda 0.1 \
--use_w_pool True \
--input_nc 6 \
--n_iters_per_batch 3 \
--truncation_psi 0.7 \
--output_size 1024 \
--stylegan_weights pretrained_models/sg3-r-ffhq-1024.pt

Additional Notes:

  • All hyper-parameters are set to the defaults used in ReStyle, except that when training ReStyle-e4e we do not use the progressive training scheme.
  • You should also adjust the --output_size and --stylegan_weights flags according to your StyleGAN3 generator.
  • See inversion/options/train_options.py and inversion/options/e4e_train_options.py for all training-specific flags.

Inference and Evaluation


Sample reconstructions for unaligned images obtained with our ReStyle encoders.

Inversion

You can use inversion/scripts/inference_iterative.py to apply a trained model on a set of images. For example:

python inversion/scripts/inference_iterative.py \
--output_path experiments/restyle_e4e_ffhq_encode/inference \
--checkpoint_path experiments/restyle_e4e_ffhq_encode/checkpoints/best_model.pt \
--data_path /path/to/aligned_test_data \
--test_batch_size 4 \
--test_workers 4 \
--n_iters_per_batch 3 \
--landmarks_transforms_path /path/to/landmarks_transforms.npy

This script will save each step's outputs in a separate sub-directory and the iterative results side-by-side.
Notes:

  • Note that the --data_path flag should point to your aligned data since this is the data that is passed to the encoder.
  • If you are inverting unaligned images, you should specify the landmarks-based transforms using the flag --landmarks_transforms_path.
    • When generating the final reconstruction, these transforms will be used to reconstruct the placement and rotation of the unaligned image.
    • Follow the instructions above for preparing the landmarks_transforms file.
    • Please note, if this file is not specified, then all outputs will be aligned.
  • By default, the images will be saved at their original output resolutions (e.g., 1024x1024 for faces).
    • If you wish to save outputs resized to resolutions of 256x256, you can do so by adding the flag --resize_outputs=True.

This script will also save all the latents as an .npy file in a dictionary format as follows:

{
    "0.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    "1.jpg": [latent_step_1, latent_step_2, ..., latent_step_N],
    ...
}

That is, the keys of the dictionary are the image file names and the values are lists of length N containing the output latent of each step where N is the number of inference steps. Each element in the list is of shape (Kx512) where K is the number of style inputs of the generator.

You can use the saved latents to perform latent space manipulations, for example.
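For example, here is a minimal sketch of loading the saved latents and decoding the final step with a StyleGAN3 generator. The file paths below and the SG3Generator import path are assumptions; adjust them to your setup:

import numpy as np
import torch

from models.stylegan3.model import SG3Generator  # import path assumed

# dict: {filename: [latent_step_1, ..., latent_step_N]}
latents = np.load("experiments/restyle_e4e_ffhq_encode/inference/latents.npy",
                  allow_pickle=True).item()

w = latents["0.jpg"][-1]                      # final step, shape (K, 512)
w = torch.from_numpy(w).unsqueeze(0).cuda()   # add batch dimension -> (1, K, 512)

generator = SG3Generator(checkpoint_path="pretrained_models/sg3-r-ffhq-1024.pt").decoder.cuda()
with torch.no_grad():
    image = generator.synthesis(w, noise_mode="const", force_fp32=True)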

Inversion Animation

If you would like to create animations showing interpolations between the inversions of unaligned images (like those on the project page), you can run the script inversion/scripts/create_inversion_animation.py. For example,

python inversion/scripts/create_inversion_animation.py \
--generator_path pretrained_models/sg3-r-ffhq-1024.pt \
--output_path experiments/restyle_e4e_ffhq_encode/inference/animations \
--latents_path experiments/restyle_e4e_ffhq_encode/inference/latents.npy \
--data_path /path/to/animation/data \
--landmarks_transforms_path /path/to/animation/data/landmarks_transforms.npy \
--n_images 10

Note that we assume you have already run the inference script above and saved the latents to the path given by --latents_path (and similarly for the landmarks transforms). By default, the script will randomly sample n_images from the data_path, or use all images if n_images is None.

Computing Metrics

Given a trained model and generated outputs, we can compute the loss metrics on a given dataset.
These scripts receive the inference output directory and ground truth directory.

  • Calculating L2/LPIPS/MS-SSIM losses and similarities:
python inversion/scripts/calc_losses_on_images.py \
--metrics "[lpips, l2, msssim]" \
--output_path /path/to/experiment/inference/inference_results \
--gt_path /path/to/test_images
  • Calculating the identity loss for the facial domain:
python inversion/scripts/calc_id_loss_parallel.py \
--output_path /path/to/experiment/inference/inference_results \
--gt_path /path/to/test_images

These scripts will traverse through each sub-directory of output_path to compute the metrics on each step's output images.

Editing Real Images

For editing real images, we provide two techniques: InterFaceGAN and StyleCLIP's global directions.

InterFaceGAN


Sample editing results for unaligned images obtained with our ReStyle-e4e encoder and InterFaceGAN. Additional results are provided on the project page.

For editing with InterFaceGAN, you can run the following command:

python inversion/scripts/inference_editing.py \
--output_path /path/to/experiment/inference \
--checkpoint_path experiments/restyle_e4e_ffhq_encode/checkpoints/best_model.pt \
--data_path /path/to/test_data \
--test_batch_size 4 \
--test_workers 4 \
--n_iters_per_batch 3 \
--edit_directions "[age,pose,smile]" \
--factor_ranges "[(-5_5),(-5_5),(-5_5)]" \
--landmarks_transforms_path /path/to/landmarks_transforms.npy

This will run inversion and edit each image in the specified data_path using the trained encoder.
Notes:

  • The edits supported by this script are defined in the FaceEditor class in editing/interfacegan/face_editor.py.
    • You may add your own directions to expand support for additional edits.
  • To specify the lambda (range) of each edit, you can change the flag --factor_ranges.
    • Here, -5_5 means that the range is (-5,5) for each edit.
    • For example, specifying --edit_directions="[age,pose]" and factor_ranges="[(-5_5),(-2_2)]" will perform edits of age and pose with ranges of (-5,5) and (-2,2), respectively.
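For reference, an InterFaceGAN edit is simply a linear walk along a learned boundary in latent space. A minimal sketch, where the file names are hypothetical placeholders:

import numpy as np
import torch

boundary = np.load("boundaries/age_boundary.npy")            # (1, 512) unit normal of the separating hyperplane
latents = np.load("latents.npy", allow_pickle=True).item()   # saved by the inference script
w = latents["0.jpg"][-1]                                     # (K, 512) W+ code of the inversion

# Broadcast the direction over all K style vectors for a range of edit strengths.
edits = [w + alpha * boundary for alpha in np.linspace(-5, 5, 7)]
edits = torch.from_numpy(np.stack(edits)).float()            # (7, K, 512), ready for generator.synthesis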

StyleCLIP Global Directions


Sample editing results for unaligned images obtained with our ReStyle-e4e encoder and StyleCLIP. Additional results are provided on the project page.

For editing with StyleCLIP global directions, you can run the following command:

python editing/styleclip_global_directions/edit.py \
--output_path /path/to/experiment/inference \
--latents_path /path/to/experiment/inference/latents.npy \
--neutral_text "a face" \
--target_text "a happy face" \
--landmarks_transforms_path /path/to/landmarks_transforms.npy

Notes:

  • Before running the above script, you may need to install the official CLIP package:
    pip install git+https://github.com/openai/CLIP.git
    
  • For each input image we save a grid of results with different values of alpha and beta as defined in StyleCLIP.

Pivotal Tuning Inversion (PTI)

We also provide support for running PTI for inverting real images. This can be run using the run_pti_images.py script. For example:

python inversion/scripts/run_pti_images.py \
--output_path /path/to/experiment/inference/pti \
--generator_path pretrained_models/sg3-r-ffhq-1024.pt \
--images_path /path/to/data \
--latents_path /path/to/experiment/inference/latents.npy \
--landmarks_transforms_path /path/to/landmarks_transforms.npy \
--steps 350 \
--save_interval 100

Note that instead of starting the PTI inversion process from the mean latent code, we begin with the inversions predicted by our encoder. This speeds up the inversion process by starting from a better initialization point.
As usual, if working with unaligned data, you should pass the landmarks-based transforms in order to reconstruct the original unaligned input pose.
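For intuition, pivotal tuning keeps the inverted latent (the "pivot") fixed and fine-tunes the generator weights so that its output better matches the real image. Below is a rough sketch of the idea with illustrative loss terms and names, not the repository's exact training loop:

import torch
import lpips

# Assumed to be prepared beforehand: generator (a loaded StyleGAN3 model),
# w_pivot (the encoder's inversion, shape (1, K, 512)), real_image (the target image tensor).
lpips_loss = lpips.LPIPS(net="alex").cuda()
optimizer = torch.optim.Adam(generator.parameters(), lr=3e-4)

for step in range(350):
    optimizer.zero_grad()
    rec = generator.synthesis(w_pivot, noise_mode="const", force_fp32=True)
    loss = torch.nn.functional.mse_loss(rec, real_image) + lpips_loss(rec, real_image).mean()
    loss.backward()
    optimizer.step()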

Inverting and Editing Videos

Inference

To invert a given video sequence, you can run the script inversion/video/inference_on_video.py. For example,

python inversion/video/inference_on_video.py \
--video_path /path/to/input/video.mp4 \
--checkpoint_path experiments/restyle_e4e_ffhq_encode/checkpoints/best_model.pt \
--output_path /path/to/experiment/inference/video_inference

Notes:

  • If landmarks_transforms_path is specified, we will compute and save the landmarks transforms to the specified path. If it is not specified, we will save them to the output path by default.
    • Then, in future runs, we will load the pre-saved transforms.
  • Users can also specify the following paths: raw_frames_path, aligned_frames_path, and cropped_frames_path.
    • If these paths are not specified, they will be saved to output_path/raw_frames, output_path/aligned_frames, and output_path/cropped_frames by default.
    • Once the images are extracted and saved, they will be loaded from these paths in future runs.

The result of the script will be several output videos, including:

  • input_video: the video containing the cropped frames
  • result_video: the reconstruction video without the smoothing technique from the paper
  • result_video_smoothed: the reconstruction video with the smoothing technique
  • result_video_coupled: a video with the cropped images and smoothed reconstruction side-by-side

PTI Training on Videos

After running the inference script above, you may notice that the results can be improved. We therefore provide the script inversion/video/run_pti_video.py which runs PTI on the input video to improve the reconstruction obtained by using only the encoder. Running the script can be done using the following command template:

python inversion/video/run_pti_video.py \
--output_path /path/to/experiment/inference/video_inference/pti \
--generator_path pretrained_models/stylegan3/sg3-r-ffhq-1024.pt \
--images_path /path/to/experiment/inference/video_inference/cropped_frames \
--latents_path /path/to/experiment/inference/video_inference/latents.npy \
--landmarks_transforms_path /path/to/experiment/inference/video_inference/landmarks_transforms.npy \
--steps 8000 \
--save_interval 500 \
--model_save_interval 1000 \
--save_final_model

Notes:

  • We train PTI using the unaligned images as inputs. This allows us to improve on image regions which were "cut off" by the aligned pre-processing.
  • The images_path, latents_path, and landmarks_transforms_path should all have been saved when running the inference_on_video.py script beforehand.
  • In our experiments, we trained PTI for 8000 steps per input video.

After running PTI, to get the final video reconstructions, you can run the inference_on_video.py script again and add the --generator_path flag pointing to the final PTI model. This will replace the original SG3 generator with the PTI-tuned generator. For example,

python inversion/video/inference_on_video.py \
--video_path /path/to/input/video.mp4 \
--checkpoint_path experiments/restyle_e4e_ffhq_encode/checkpoints/best_model.pt \
--output_path /path/to/experiment/inference/video_inference/with_pti \
--raw_frames_path /path/to/experiment/inference/video_inference/raw_frames \
--aligned_frames_path /path/to/experiment/inference/video_inference/aligned_frames \
--cropped_frames_path /path/to/experiment/inference/video_inference/cropped_frames \
--landmarks_transforms_path /path/to/experiment/inference/video_inference/landmarks_transforms.npy \
--generator_path /path/to/experiment/inference/video_inference/pti/final_pti_model.pt

Note that this time we reuse the aligned images, cropped images, and landmarks transforms from the previous run.

Field of View Expansion

We provide an Expander object in utils/fov_expansion.py that allows you to generate images with an extended field of view by employing Fourier feature transformations. The Expander should be initialized with a StyleGAN3 Generator object. Images can then be generated with the function generate_expanded_image, which receives latent codes in the form of w or s and a landmarks transform. To expand by 100 pixels to the left, set pixels_left = 100. Note that it is possible to expand in more than one direction.

An example of the Expander usage can be seen in inversion/video/video_editor.py, in which we generate an edited video with an expanded FOV.
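A rough usage sketch is shown below, assuming a StyleGAN3 Generator object has already been loaded; the argument names follow the description above and may differ slightly from the actual signature in utils/fov_expansion.py:

from utils.fov_expansion import Expander

expander = Expander(generator=generator)   # generator: a loaded StyleGAN3 Generator object

# w: latent code of shape (1, K, 512); landmarks_transform: the per-image transform
expanded = expander.generate_expanded_image(w,
                                            landmarks_transform=landmarks_transform,
                                            pixels_left=100)   # extend the FOV by 100 pixels to the left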

Editing

InterFaceGAN


Editing synthetic images with the aligned StyleGAN3 generator using directions learned with InterFaceGAN.

Pre-trained InterFaceGAN Boundaries

Pre-trained boundaries for age, pose, and the 40 CelebA dataset attributes can be found in the following links:

| Path | Description |
| --- | --- |
| Aligned FFHQ | InterFaceGAN boundaries for the aligned FFHQ StyleGAN3 generator. |
| Unaligned FFHQ | InterFaceGAN boundaries for the unaligned FFHQ StyleGAN3 generator. |

We include boundaries for both the aligned StyleGAN3 generator and the unaligned StyleGAN3 generator for human faces.

Training Your Own InterFaceGAN Boundaries

We provide code for training boundaries for InterFaceGAN in editing/interfacegan. In our implementation, we provide classifiers for age, pose, and the 40 attributes from the CelebA dataset (predicted using the classifier from AnyCostGAN).

The age and pose pre-trained classifiers can be downloaded from the following links (the classifier from AnyCostGAN is downloaded automatically via the code):

| Path | Description |
| --- | --- |
| Age Classifier | A VGG-based age classifier taken from SAM. |
| Pose Classifier | A HopeNet pose classifier taken from deep-head-pose. |

First, you need to generate a set of latent codes and predict their attribute scores. This can be done by running:

python editing/interfacegan/generate_latents_and_attribute_scores.py \
--generator_path pretrained_models/stylegan3-r-ffhq-1024x1024.pkl \
--n_images 500000 \
--truncation_psi 0.7 \
--output_path /path/to/predicted/latents/and/scores \
--save_interval 10000

An npy file will be saved every save_interval samples. By default, we train boundaries for the aligned StyleGAN3 generator. If training boundaries for an unaligned generator, the pseudo-alignment trick will be performed before passing the images to the classifier.

Once the latents and scores are saved, the boundaries can be trained using the script train_boundaries.py. For example,

python editing/interfacegan/train_boundaries.py \
--input_path /path/to/predicted/latents/and/scores \
--output_path /path/to/trained/boundaries

This script will save a set of boundaries (in npy format) for each attribute predicted in the first step.
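For reference, each boundary is essentially a linear classifier in latent space: a linear SVM is fit to separate latents with high and low attribute scores, and the unit normal of its decision hyperplane is the editing direction. A minimal sketch with illustrative file names and thresholds:

import numpy as np
from sklearn.svm import LinearSVC

ws = np.load("ws.npy")            # (N, 512) sampled latent codes
scores = np.load("scores.npy")    # (N,) predicted attribute scores, e.g., age

labels = (scores > np.median(scores)).astype(int)     # split into positive/negative halves
svm = LinearSVC(C=1.0, max_iter=10000).fit(ws, labels)

boundary = svm.coef_.reshape(1, -1)
boundary /= np.linalg.norm(boundary)                  # unit-norm editing direction
np.save("age_boundary.npy", boundary)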

Editing Synthetic Images with InterFaceGAN

We also provide a script for editing synthetic images with InterFaceGAN. Given a generator (either aligned or unaligned), the script generates the edited images using the directions specified in configs.paths_config.interfacegan_aligned_edit_paths and configs.paths_config.interfacegan_unaligned_edit_paths.
For example, users can run the following command to edit images:

python editing/interfacegan/edit_synthetic.py \
--generator_path pretrained_models/stylegan3-r-ffhq-1024x1024.pkl \
--generator_type ALIGNED \
--output_path /path/to/edit/outputs \
--attributes_to_edit "[age,smile,pose]" \
--n_images_per_edit 100 \
--truncation_psi 0.7

Notes:

  • generator_type can either be aligned or unaligned. This controls which set of editing directions to use.
  • We currently support edits of age, smile, pose, and Male, but this is easily configurable by adding a new range to INTERFACEGAN_RANGES in edit_synthetic.py.
  • If working with an aligned generator, users can also generate randomly unaligned images by adding the flag --apply_random_transforms=True. This will apply a random translation and rotation before generating the edited image.
  • Users can also specify the flag --generate_animation=True which will create an interpolation video depicting the edits. Note, generating these animations does take about 30 seconds per image.

StyleCLIP


Editing synthetic images using StyleCLIP's global directions technique.

StyleCLIP Latent Mapper

We provide in editing/styleclip_mapper an adaptation of the StyleCLIP latent mapper approach to StyleGAN3. Please refer to the original StyleCLIP latent mapper implementation for more details.

Training:

  • The main training script can be found at editing/styleclip_mapper/scripts/train.py.
  • Training arguments can be found at editing/styleclip_mapper/options/train_options.py.
  • Note: --description is where you provide the driving text.
  • Note that the latent codes dataset is automatically generated in the case where opts.latents_train_path does not exist (and similarly for the test set).
  • Note that the loss coefficients tuned for StyleGAN2 may not be optimal!

As an example, training a mapper for the bob-cut hairstyle can be done by running the following commands:

cd editing/styleclip_mapper/scripts
python train.py --exp_dir ../results/bobcut_hairstyle --description "bobcut hairstyle"

Inference:

  • The main inference script can be found at editing/styleclip_mapper/scripts/inference.py.
  • All necessary inference parameters can be found at editing/styleclip_mapper/options/test_options.py.

StyleCLIP Global Directions

Preprocessing

The StyleCLIP global directions technique requires two preprocessing steps (that are general to any type of edit):

  1. First, we must generate the s_stats file, which can be done by running:
python editing/styleclip_global_directions/preprocess/s_statistics.py
  2. Second, we must generate the delta_i_c.npy file, which can be done by running:
python editing/styleclip_global_directions/preprocess/create_delta_i_c.py
Please note that this script may take a long time to run.

We provide our precomputed files for FFHQ, FFHQ-U, AFHQ, Landscapes-HQ in the following links:

| Path | Description |
| --- | --- |
| FFHQ | Folder containing s_stats and delta_i_c.npy computed on StyleGAN3 FFHQ. |
| FFHQ-U | Folder containing s_stats and delta_i_c.npy computed on StyleGAN3 FFHQ-U. |
| AFHQ | Folder containing s_stats and delta_i_c.npy computed on StyleGAN3 AFHQv2. |
| Landscapes HQ | Folder containing s_stats and delta_i_c.npy computed on Landscapes-HQ. |

If you would like to download all files, you can also run the following script:

python editing/styleclip_global_directions/preprocess/download_all_files.py

Editing

For editing randomly sampled images with StyleCLIP global directions, you can run the following command:

python editing/styleclip_global_directions/edit.py \
--output_path /path/to/experiment/inference \
--neutral_text "a sad face" \
--target_text "a happy face" \
--n_images=10

Here, we will randomly sample 10 latents and edit them using the specified edit.

Notes:

  • To apply random transformations to the images, you can set the flag --apply_random_transforms=True.
  • For each input image we save a grid of results with different values of alpha and beta as defined in StyleCLIP.
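For intuition, the global-directions rule uses the precomputed per-channel relevance (derived from delta_i_c) for the neutral/target text pair: channels whose relevance falls below the disentanglement threshold beta are zeroed out, and the remaining direction is applied with strength alpha in StyleSpace. A rough sketch with illustrative variable names:

import numpy as np


def global_direction_edit(s, relevance, alpha=4.0, beta=0.13):
    # s: flattened StyleSpace code, shape (C,)
    # relevance: per-channel relevance to the neutral->target text direction, shape (C,)
    direction = relevance.copy()
    direction[np.abs(direction) < beta] = 0.0   # beta: disentanglement threshold
    return s + alpha * direction                # alpha: edit strength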

StyleGAN-NADA

In our paper, we perform various experiments with StyleGAN-NADA trained over StyleGAN3.
We invite the reader to head over to the StyleGAN3-NADA branch in the official paper repository for running StyleGAN-NADA over StyleGAN3.

We provide a set of trained models for FFHQ, FFHQ-U, and AFHQ-v2:

| Path | Description |
| --- | --- |
| FFHQ Folder | A set of trained StyleGAN3-NADA models fine-tuned from the original FFHQ model. |
| FFHQ-U Folder | A set of trained StyleGAN3-NADA models fine-tuned from the original FFHQ-U model. |
| AFHQ Folder | A set of trained StyleGAN3-NADA models fine-tuned from the original AFHQ-v2 model. |

Acknowledgments

This code borrows from various repositories, including the official StyleGAN3, ReStyle, StyleCLIP, InterFaceGAN, and PTI implementations.

Citation

If you use this code for your research, please cite the following work:

@misc{alaluf2022times,
      title={Third Time's the Charm? Image and Video Editing with StyleGAN3}, 
      author={Yuval Alaluf and Or Patashnik and Zongze Wu and Asif Zamir and Eli Shechtman and Dani Lischinski and Daniel Cohen-Or},
      year={2022},
      eprint={2201.13433},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • Training with ReStyle-pSp algorithm results

    Hi, after training for close to 3 weeks, using a GeForce Titan RTX, the results were not satisfactory

    resultadosytlegan250000

    I am working with the Market-1501 dataset, with 39,000 images for training. The image size is 64x127 px.

    So, I have some questions about whether and how the performance can be improved.

    Should I try training with the ReStyle-e4e algorithm, or should I keep training for another week? Could the problem be that the number of images in the dataset is not enough? Or that the images have low resolution? During training, is it possible to get the latent vectors or the resulting image? During training the loss is 0.17, but it goes to 0.5 during testing. The idea is to increase the number of images in the dataset, so I am working with the same images during train and test.

    Sorry for asking you so many questions; I am working on my postgraduate thesis, and this part is the most important part of my experimental study.

    Thanks! Laura.

    opened by uselessai 9
  • Ideal Encoder Dataset Type

    Our objective is to edit an original face image without glasses using sample images with glasses. I have trained a StyleGAN3 conditional network using face images with two different types of eyeglasses. Now, for training the encoder, what should our ideal dataset type be so that we can get a good inferred image?

    If you can share regarding this it will be really helpful.

    Thank You

    opened by rut00 5
  • [Question] How to convert pkl to pt file?

    Thanks for your excellent work!

    Describe the problem

    I'd like to learn how you convert a pkl to a pt file. I use the pt file you provide to generate images. The code is as follows:

    def get_random_image(generator: Generator, truncation_psi: float, seed):
        with torch.no_grad():
            z = torch.from_numpy(np.random.RandomState(seed).randn(1, 512).astype('float32')).to('cuda')
            if hasattr(generator.synthesis, 'input'):
                m = make_transform(translate=(0, 0), angle=0)
                m = np.linalg.inv(m)
                generator.synthesis.input.transform.copy_(torch.from_numpy(m))
            w = generator.mapping(z, None, truncation_psi=truncation_psi)
            img = generator.synthesis(w, noise_mode='const')
            res_image = tensor2im(img[0])
            return res_image, w
    
    

    And it works well. But when I convert the pkl to pt myself, several errors appear. The conversion code I used is as follows:

    import pickle
    import sys
    from enum import Enum
    from pathlib import Path
    from typing import Optional
    
    import torch
    
    checkpoint_path = "pretrained_models/stylegan3-t-ffhq-1024x1024.pkl"
    print(f"Loading StyleGAN3 generator from path: {checkpoint_path}")
    with open(checkpoint_path, "rb") as f:
        decoder = pickle.load(f)['G_ema'].cuda()
    print('Loading done!')
    
    state_dict = decoder.state_dict()
    torch.save(state_dict, "pretrained_models/stylegan3-t-ffhq-1024x1024.pt")
    print('Converting done!')
    

    Then I use stylegan3-t-ffhq-1024x1024.pt to generate images, and the errors are as follows:

    Loading StyleGAN3 generator from path: pretrained_models/stylegan3-t-ffhq-1024x1024.pt
    Traceback (most recent call last):
      File "/sam/models/stylegan3/model.py", line 61, in _load_checkpoint
        self.decoder.load_state_dict(torch.load(checkpoint_path), strict=True)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for Generator:
    	Missing key(s) in state_dict: "synthesis.L0_36_1024.weight", "synthesis.L0_36_1024.bias", "synthesis.L0_36_1024.magnitude_ema", "synthesis.L0_36_1024.up_filter", "synthesis.L0_36_1024.down_filter", "synthesis.L0_36_1024.affine.weight", "synthesis.L0_36_1024.affine.bias", "synthesis.L1_36_1024.weight", "synthesis.L1_36_1024.bias", "synthesis.L1_36_1024.magnitude_ema", "synthesis.L1_36_1024.up_filter", "synthesis.L1_36_1024.down_filter", "synthesis.L1_36_1024.affine.weight", "synthesis.L1_36_1024.affine.bias", "synthesis.L2_52_1024.weight", "synthesis.L2_52_1024.bias", "synthesis.L2_52_1024.magnitude_ema", "synthesis.L2_52_1024.up_filter", "synthesis.L2_52_1024.down_filter", "synthesis.L2_52_1024.affine.weight", "synthesis.L2_52_1024.affine.bias", "synthesis.L3_52_1024.weight", "synthesis.L3_52_1024.bias", "synthesis.L3_52_1024.magnitude_ema", "synthesis.L3_52_1024.up_filter", "synthesis.L3_52_1024.down_filter", "synthesis.L3_52_1024.affine.weight", "synthesis.L3_52_1024.affine.bias", "synthesis.L4_84_1024.weight", "synthesis.L4_84_1024.bias", "synthesis.L4_84_1024.magnitude_ema", "synthesis.L4_84_1024.up_filter", "synthesis.L4_84_1024.down_filter", "synthesis.L4_84_1024.affine.weight", "synthesis.L4_84_1024.affine.bias", "synthesis.L5_148_1024.weight", "synthesis.L5_148_1024.bias", "synthesis.L5_148_1024.magnitude_ema", "synthesis.L5_148_1024.up_filter", "synthesis.L5_148_1024.down_filter", "synthesis.L5_148_1024.affine.weight", "synthesis.L5_148_1024.affine.bias", "synthesis.L6_148_1024.weight", "synthesis.L6_148_1024.bias", "synthesis.L6_148_1024.magnitude_ema", "synthesis.L6_148_1024.up_filter", "synthesis.L6_148_1024.down_filter", "synthesis.L6_148_1024.affine.weight", "synthesis.L6_148_1024.affine.bias", "synthesis.L7_276_645.weight", "synthesis.L7_276_645.bias", "synthesis.L7_276_645.magnitude_ema", "synthesis.L7_276_645.up_filter", "synthesis.L7_276_645.down_filter", "synthesis.L7_276_645.affine.weight", "synthesis.L7_276_645.affine.bias", "synthesis.L8_276_406.weight", "synthesis.L8_276_406.bias", "synthesis.L8_276_406.magnitude_ema", "synthesis.L8_276_406.up_filter", "synthesis.L8_276_406.down_filter", "synthesis.L8_276_406.affine.weight", "synthesis.L8_276_406.affine.bias", "synthesis.L9_532_256.weight", "synthesis.L9_532_256.bias", "synthesis.L9_532_256.magnitude_ema", "synthesis.L9_532_256.up_filter", "synthesis.L9_532_256.down_filter", "synthesis.L9_532_256.affine.weight", "synthesis.L9_532_256.affine.bias", "synthesis.L10_1044_161.weight", "synthesis.L10_1044_161.bias", "synthesis.L10_1044_161.magnitude_ema", "synthesis.L10_1044_161.up_filter", "synthesis.L10_1044_161.down_filter", "synthesis.L10_1044_161.affine.weight", "synthesis.L10_1044_161.affine.bias", "synthesis.L11_1044_102.weight", "synthesis.L11_1044_102.bias", "synthesis.L11_1044_102.magnitude_ema", "synthesis.L11_1044_102.up_filter", "synthesis.L11_1044_102.down_filter", "synthesis.L11_1044_102.affine.weight", "synthesis.L11_1044_102.affine.bias", "synthesis.L12_1044_64.weight", "synthesis.L12_1044_64.bias", "synthesis.L12_1044_64.magnitude_ema", "synthesis.L12_1044_64.up_filter", "synthesis.L12_1044_64.down_filter", "synthesis.L12_1044_64.affine.weight", "synthesis.L12_1044_64.affine.bias", "synthesis.L13_1024_64.weight", "synthesis.L13_1024_64.bias", "synthesis.L13_1024_64.magnitude_ema", "synthesis.L13_1024_64.up_filter", "synthesis.L13_1024_64.down_filter", "synthesis.L13_1024_64.affine.weight", "synthesis.L13_1024_64.affine.bias".
    	Unexpected key(s) in state_dict: "synthesis.L0_36_512.weight", "synthesis.L0_36_512.bias", "synthesis.L0_36_512.magnitude_ema", "synthesis.L0_36_512.up_filter", "synthesis.L0_36_512.down_filter", "synthesis.L0_36_512.affine.weight", "synthesis.L0_36_512.affine.bias", "synthesis.L1_36_512.weight", "synthesis.L1_36_512.bias", "synthesis.L1_36_512.magnitude_ema", "synthesis.L1_36_512.up_filter", "synthesis.L1_36_512.down_filter", "synthesis.L1_36_512.affine.weight", "synthesis.L1_36_512.affine.bias", "synthesis.L2_52_512.weight", "synthesis.L2_52_512.bias", "synthesis.L2_52_512.magnitude_ema", "synthesis.L2_52_512.up_filter", "synthesis.L2_52_512.down_filter", "synthesis.L2_52_512.affine.weight", "synthesis.L2_52_512.affine.bias", "synthesis.L3_52_512.weight", "synthesis.L3_52_512.bias", "synthesis.L3_52_512.magnitude_ema", "synthesis.L3_52_512.up_filter", "synthesis.L3_52_512.down_filter", "synthesis.L3_52_512.affine.weight", "synthesis.L3_52_512.affine.bias", "synthesis.L4_84_512.weight", "synthesis.L4_84_512.bias", "synthesis.L4_84_512.magnitude_ema", "synthesis.L4_84_512.up_filter", "synthesis.L4_84_512.down_filter", "synthesis.L4_84_512.affine.weight", "synthesis.L4_84_512.affine.bias", "synthesis.L5_148_512.weight", "synthesis.L5_148_512.bias", "synthesis.L5_148_512.magnitude_ema", "synthesis.L5_148_512.up_filter", "synthesis.L5_148_512.down_filter", "synthesis.L5_148_512.affine.weight", "synthesis.L5_148_512.affine.bias", "synthesis.L6_148_512.weight", "synthesis.L6_148_512.bias", "synthesis.L6_148_512.magnitude_ema", "synthesis.L6_148_512.up_filter", "synthesis.L6_148_512.down_filter", "synthesis.L6_148_512.affine.weight", "synthesis.L6_148_512.affine.bias", "synthesis.L7_276_323.weight", "synthesis.L7_276_323.bias", "synthesis.L7_276_323.magnitude_ema", "synthesis.L7_276_323.up_filter", "synthesis.L7_276_323.down_filter", "synthesis.L7_276_323.affine.weight", "synthesis.L7_276_323.affine.bias", "synthesis.L8_276_203.weight", "synthesis.L8_276_203.bias", "synthesis.L8_276_203.magnitude_ema", "synthesis.L8_276_203.up_filter", "synthesis.L8_276_203.down_filter", "synthesis.L8_276_203.affine.weight", "synthesis.L8_276_203.affine.bias", "synthesis.L9_532_128.weight", "synthesis.L9_532_128.bias", "synthesis.L9_532_128.magnitude_ema", "synthesis.L9_532_128.up_filter", "synthesis.L9_532_128.down_filter", "synthesis.L9_532_128.affine.weight", "synthesis.L9_532_128.affine.bias", "synthesis.L10_1044_81.weight", "synthesis.L10_1044_81.bias", "synthesis.L10_1044_81.magnitude_ema", "synthesis.L10_1044_81.up_filter", "synthesis.L10_1044_81.down_filter", "synthesis.L10_1044_81.affine.weight", "synthesis.L10_1044_81.affine.bias", "synthesis.L11_1044_51.weight", "synthesis.L11_1044_51.bias", "synthesis.L11_1044_51.magnitude_ema", "synthesis.L11_1044_51.up_filter", "synthesis.L11_1044_51.down_filter", "synthesis.L11_1044_51.affine.weight", "synthesis.L11_1044_51.affine.bias", "synthesis.L12_1044_32.weight", "synthesis.L12_1044_32.bias", "synthesis.L12_1044_32.magnitude_ema", "synthesis.L12_1044_32.up_filter", "synthesis.L12_1044_32.down_filter", "synthesis.L12_1044_32.affine.weight", "synthesis.L12_1044_32.affine.bias", "synthesis.L13_1024_32.weight", "synthesis.L13_1024_32.bias", "synthesis.L13_1024_32.magnitude_ema", "synthesis.L13_1024_32.up_filter", "synthesis.L13_1024_32.down_filter", "synthesis.L13_1024_32.affine.weight", "synthesis.L13_1024_32.affine.bias".
    	size mismatch for synthesis.input.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
    	size mismatch for synthesis.input.freqs: copying a param with shape torch.Size([512, 2]) from checkpoint, the shape in current model is torch.Size([1024, 2]).
    	size mismatch for synthesis.input.phases: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
    	size mismatch for synthesis.L14_1024_3.weight: copying a param with shape torch.Size([3, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 64, 1, 1]).
    	size mismatch for synthesis.L14_1024_3.affine.weight: copying a param with shape torch.Size([32, 512]) from checkpoint, the shape in current model is torch.Size([64, 512]).
    	size mismatch for synthesis.L14_1024_3.affine.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "gen_images_using_pt.py", line 79, in <module>
        main()
      File "gen_images_using_pt.py", line 47, in main
        generator = SG3Generator(checkpoint_path=args.generator_path).decoder
      File "/sam/models/stylegan3/model.py", line 56, in __init__
        self._load_checkpoint(checkpoint_path)
      File "/sam/models/stylegan3/model.py", line 65, in _load_checkpoint
        self.decoder.load_state_dict(ckpt, strict=False)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for Generator:
    	size mismatch for synthesis.input.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
    	size mismatch for synthesis.input.freqs: copying a param with shape torch.Size([512, 2]) from checkpoint, the shape in current model is torch.Size([1024, 2]).
    	size mismatch for synthesis.input.phases: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
    	size mismatch for synthesis.L14_1024_3.weight: copying a param with shape torch.Size([3, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 64, 1, 1]).
    	size mismatch for synthesis.L14_1024_3.affine.weight: copying a param with shape torch.Size([32, 512]) from checkpoint, the shape in current model is torch.Size([64, 512]).
    	size mismatch for synthesis.L14_1024_3.affine.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
    
    opened by HuaZheLei 5
  • Questions about generate_latents_and_attribute_scores.py

    Great work and thank you so much for sharing the code! I got a couple of questions regarding the generate_latents_and_attribute_scores.py which I'm using to generate training data for new boundaries:

    • What is the purpose of using save_interval in this function? Is there any reason that we don't want to save all generated data into a single folder (and a single scores.npy, ws.npy, etc.)?
    • I noticed the following description of save_interval in README (for generate_latents_and_attribute_scores.py as well) -- "An npy file will be saved every save_interval samples". However, in the code Line 90, the if-statement if seed_idx % save_interval == 0 and seed > 0 will only save a npy file at every (save_interval+ 1) step since seed_idx starts from 0. For example, if we set n_images to 8 and save_interval to 4, we want to have two npy files each containing 4 faces, but we'll end up getting only one npy file containing 5 faces. I think maybe this is a bug and need to fix it to something like if (seed_idx + 1) % save_interval == 0 and seed > 0?

    Would you kindly let me know if I'm missing something here? I would really appreciate it!

    opened by NoctisZ 4
  • How to feed latent.npy to fine-tuned StyleGAN 3 model?

    Hi, first of all thank you for putting this repo together, you are demonstrating so much skill!

    I have a noob question: When I invert an image with inference_iterative.py and get latents.npy, how can I then feed latents.npy back into my fine-tuned StyleGAN 3 model? Would I have to somehow modify gen_images.py of the main SG3 repo and feed the vector into that?

    This is how I'm getting latents.npy but I'm not sure how to then feed the vector into my fine-tuned model:

    python /content/stylegan3-editing/inversion/scripts/inference_iterative.py \
      --output_path /content/inference \
      --checkpoint_path /content/stylegan3-editing/restyle_pSp_ffhq.pt \
      --data_path /content/data \
      --test_batch_size 4 \
      --test_workers 4 \
      --n_iters_per_batch 3
    

    Thank you!

    opened by kimyanna 4
  • Runtime nan+-nan

    Used docker pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel.
    Created the environment with: conda env create -f environment/sg3_env.yml. But I got an error when running inference, as shown below. Any ideas how to fix it?

    command:

    python inversion/scripts/inference_iterative.py --output_path experiments/restyle_e4e_ffhq_encode/inference --checkpoint_path pretrained_models/restyle_e4e_ffhq.pt --data_path /usr/src/myapp/stylegan3-editing/ --test_batch_size 4 --test_workers 1 --n_iters_per_batch 3 --landmarks_transforms_path output/landmarks_transforms.npy

    output:

    Loading ReStyle e4e from checkpoint: pretrained_models/restyle_e4e_ffhq.pt
    Loading StyleGAN3 generator from path: None
    Done!
    Model successfully loaded!
    Loading dataset for ffhq_encode
    Setting up PyTorch plugin "filtered_lrelu_plugin"... Done.
    0it [00:00, ?it/s]
    /opt/conda/envs/sg3_env/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3373: RuntimeWarning: Mean of empty slice.
      out=out, **kwargs)
    /opt/conda/envs/sg3_env/lib/python3.6/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
      ret = ret.dtype.type(ret / rcount)
    /opt/conda/envs/sg3_env/lib/python3.6/site-packages/numpy/core/_methods.py:234: RuntimeWarning: Degrees of freedom <= 0 for slice
      keepdims=keepdims)
    /opt/conda/envs/sg3_env/lib/python3.6/site-packages/numpy/core/_methods.py:195: RuntimeWarning: invalid value encountered in true_divide
      arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
    /opt/conda/envs/sg3_env/lib/python3.6/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
      ret = ret.dtype.type(ret / rcount)
    Runtime nan+-nan

    Originally posted by @ikcla in https://github.com/yuval-alaluf/stylegan3-editing/issues/31#issuecomment-1168829518

    opened by ikcla 4
  • What if generator is StyleGAN3 that has 256 output resolution?

    First of all, thank you for your brilliant work!

    I have a question: does your implementation support a StyleGAN3 generator that has 256 output resolution?

    If yes, what batch size per GPU can be applied?

    opened by johannwyh 4
  • Match motion of smooth video with raw video

    Comparing the smoothed video with the raw video, the smoothed version obviously looks like slow motion compared to the original. Is there a way to match the motion of the smoothed video with the original one? Is it possible to change the motion in the smoothed video here?

    average fine layers

    result_latents[:, 9:, :] = result_latents[:, 9:, :].mean(axis=0)
    

    Should I change 9 to another number to change the motion appearance?

    Thanks.

    opened by ikcla 3
  • Latents from latents.npy file

    Hi! I am trying to use the latent vectors (I need to do some interpolations) from the "latents.npy" file with the StyleGAN3 pkl model and it does not work; I have tried several latent vectors and always get the same image. I trained the model on the Market-1501 dataset.

    This is the image I got from the inference when the latents.npy is created: inference

    And this is the image I get when I use latents.npy: siempreigual

    This is my code; I have modified gen_images.py from the stylegan3 GitHub project.

        x = np.load("/content/latents.npy", allow_pickle=True)
        for i in x.flat:
            print(i.keys())
            latent_numpy = i["1_1003_00002.jpg"][4] # get latents from image 1_1003_00002.jpg
    
        latent_tensor =  torch.from_numpy(latent_numpy).to(device)
    
        # this is from the Stylegan3 github, file  gen_images.py
        img = G(latent_tensor, label, truncation_psi=truncation_psi, noise_mode=noise_mode)
        img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
        PIL.Image.fromarray(img[0].cpu().numpy(), "RGB").save(f"{outdir}/seed{seed:04d}.png") 
    

    I do not know if I am missing something. Thanks in advance.

    opened by uselessai 3
  • Error using stylegan3 pkl trained model

    I am trying to train the encoder "train_restyle_psp" with my own stylegan3 pkl model and I've got this error.

    Loading StyleGAN3 generator from path: pretrained_models/network-snapshot-002160Stylegan3.pkl
    Done!
    Traceback (most recent call last):
      File "inversion/scripts/train_restyle_psp.py", line 29, in <module>
        main()
      File "/home/laura/Escritorio/Anaconda/lib/python3.8/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
        response = fn(cfg, *args, **kwargs)
      File "inversion/scripts/train_restyle_psp.py", line 24, in main
        coach = Coach(opts)
      File "./inversion/training/coach_restyle_psp.py", line 44, in __init__
        self.avg_image = self.net(self.net.latent_avg.repeat(16, 1).unsqueeze(0),
      File "/home/laura/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "./inversion/models/psp3.py", line 66, in forward
        images = self.decoder.synthesis(codes, noise_mode='const', force_fp32=True)
      File "/home/laura/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "<string>", line 469, in forward
      File "/home/laura/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "<string>", line 218, in forward
    RuntimeError: batch1 dim 2 must match batch2 dim 1

    I am working with a pkl model.

    opened by uselessai 3
  • Running environment on SM_86 GPU architecture (Ampere)

    When using a newer graphics card (like the NVIDIA GeForce RTX 3090), whose architecture is relatively new, an older version of the PyTorch library may not support it. In that case you will see that capability sm_86 is not compatible, and the output states that the current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75, i.e., PyTorch currently only supports those architectures.

    opened by hotnikq 3