[CVPR 2021] Generative Hierarchical Features from Synthesizing Images

Overview

GH-Feat - Generative Hierarchical Features from Synthesizing Images

Figure: Training framework of GH-Feat.

Generative Hierarchical Features from Synthesizing Images
Yinghao Xu*, Yujun Shen*, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
Computer Vision and Pattern Recognition (CVPR), 2021 (Oral)

[Paper] [Project Page]

In this work, we show that well-trained GAN generators can be used as training supervision to learn hierarchical visual features. We call this feature Generative Hierarchical Feature (GH-Feat). Properly learned by a novel hierarchical encoder, GH-Feat facilitates both discriminative and generative visual tasks, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.

Usage

Environment

Before running the code, please set up the environment with

conda env create -f environment.yml
conda activate ghfeat

Testing

The following script can be used to extract GH-Feat from a list of images.

python extract_ghfeat.py ${ENCODER_PATH} ${IMAGE_LIST} -o ${OUTPUT_DIR}
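
Here, ${IMAGE_LIST} appears to be a plain-text file listing one image path per line, not a directory of images (this is an assumption; extract_ghfeat.py holds the authoritative parsing). A hypothetical image.list might look like:

data/ffhq/00000.png
data/ffhq/00001.png
data/ffhq/00002.png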

We provide some well-learned encoders for inference.

Path               Description
face_256x256       GH-Feat encoder trained on the FFHQ dataset.
tower_256x256      GH-Feat encoder trained on the LSUN Tower dataset.
bedroom_256x256    GH-Feat encoder trained on the LSUN Bedroom dataset.

Training

Given a well-trained StyleGAN generator, our hierarchical encoder is trained with the objective of image reconstruction.

python train_ghfeat.py \
       ${TRAIN_DATA_PATH} \
       ${VAL_DATA_PATH} \
       ${GENERATOR_PATH} \
       --num_gpus ${NUM_GPUS}

Here, train_data and val_data can be created with this script. Note that, following the official StyleGAN repo, the dataset is prepared in a multi-scale manner, but our encoder training only requires the data at the largest resolution. Hence, please specify the path to the tfrecords file at the target resolution instead of the directory containing all the tfrecords files.
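
For example, assuming the StyleGAN dataset tool produced multi-scale records named ffhq-r02.tfrecords through ffhq-r08.tfrecords, a 256x256 encoder would be pointed at the r08 files only (these file names are illustrative; 2^8 = 256):

python train_ghfeat.py \
       datasets/ffhq/ffhq-r08.tfrecords \
       datasets/ffhq_val/ffhq-r08.tfrecords \
       ${GENERATOR_PATH} \
       --num_gpus ${NUM_GPUS}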

Users can also train the encoder with Slurm:

srun.sh ${PARTITION} ${NUM_GPUS} \
        python train_ghfeat.py \
               ${TRAIN_DATA_PATH} \
               ${VAL_DATA_PATH} \
               ${GENERATOR_PATH} \
               --num_gpus ${NUM_GPUS}
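
The reconstruction objective mentioned above conceptually pairs a pixel-wise term with a VGG perceptual term (computed by perceptual_model.py). Below is a minimal numpy sketch of this combination; the feature extractor and the weighting are hypothetical stand-ins, not the repo's actual implementation:

import numpy as np

def vgg_features(img):
    # Hypothetical stand-in for the VGG extractor in perceptual_model.py.
    return img.reshape(-1)[:1000]

def reconstruction_loss(real, fake, perceptual_weight=1.0):
    # Pixel-wise MSE plus an MSE in feature space; the weight is a guess.
    pixel_term = np.mean((real - fake) ** 2)
    perceptual_term = np.mean((vgg_features(real) - vgg_features(fake)) ** 2)
    return pixel_term + perceptual_weight * perceptual_term

real = np.zeros((256, 256, 3))
fake = np.full((256, 256, 3), 0.5)
print(reconstruction_loss(real, fake))  # 0.25 pixel term + 0.25 perceptual term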

We provide some pre-trained generators as follows.

Path               Description
face_256x256       StyleGAN trained on the FFHQ dataset.
tower_256x256      StyleGAN trained on the LSUN Tower dataset.
bedroom_256x256    StyleGAN trained on the LSUN Bedroom dataset.

Codebase Description

  • Most of the code is borrowed directly from the StyleGAN repo.
  • Structure of the proposed hierarchical encoder: training/networks_ghfeat.py
  • Training loop of the encoder: training/training_loop_ghfeat.py
  • To feed GH-Feat produced by the encoder into the generator as layer-wise style codes, we slightly modify training/networks_stylegan.py (see Line 263 and Line 477); a conceptual sketch of this layer-wise substitution follows this list.
  • Main script for encoder training: train_ghfeat.py.
  • Script for extracting GH-Feat from images: extract_ghfeat.py.
  • VGG model for computing perceptual loss: perceptual_model.py.
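
As a rough illustration of the layer-wise substitution referenced above: instead of broadcasting a single latent code w to every synthesis layer, each layer consumes its own 512-d code from GH-Feat. The sketch below uses numpy stand-ins; the names, shapes, and placeholder operations are hypothetical and only show the data flow, not the repo's actual API.

import numpy as np

NUM_LAYERS, STYLE_DIM = 14, 512  # a 256x256 StyleGAN has 14 style inputs

def encode(image):
    # Stand-in for the hierarchical encoder: one style code per layer.
    rng = np.random.default_rng(0)
    return rng.standard_normal((NUM_LAYERS, STYLE_DIM))

def synthesize(styles):
    # Stand-in for StyleGAN synthesis: layer i consumes styles[i]
    # instead of a single broadcast w.
    x = np.zeros(STYLE_DIM)
    for i in range(NUM_LAYERS):
        x = x + styles[i]  # placeholder for the style-modulated conv at layer i
    return x

ghfeat = encode(None)            # GH-Feat with shape (14, 512)
reconstruction = synthesize(ghfeat)

Under this view, the multi-level style mixing shown in the Results section amounts to swapping a subset of the rows of ghfeat between two images before synthesis.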

Results

We show some results achieved by GH-Feat on a variety of downstream visual tasks.

Discriminative Tasks

Indoor scene layout prediction

Facial landmark detection

Face verification (face reconstruction)

Generative Tasks

Image harmonization

Global editing

Local editing

Multi-level style mixing

BibTeX

@inproceedings{xu2021generative,
  title     = {Generative Hierarchical Features from Synthesizing Images},
  author    = {Xu, Yinghao and Shen, Yujun and Zhu, Jiapeng and Yang, Ceyuan and Zhou, Bolei},
  booktitle = {CVPR},
  year      = {2021}
}
Comments
  • Does the whole project include the code for local editing?

    Hi, your work is very exciting and wonderful. I would like to know whether the code for the different tasks, especially local editing, is included in the code you released, or whether your code is only used to generate the hierarchical features needed for the multiple tasks you mentioned. I am a PyTorch user and your code is in TensorFlow; if it is the latter case, I can use the features generated by your code more easily, because I would just need to take the features without modifying your code. By the way, what does the "image list" mean? I assumed it is a folder containing some images, but the error the program raised indicates that this is not the right understanding. I would appreciate a reply as soon as possible. Thank you!

    opened by Radium98 11
  • How to translate the feature map to GH-Feat?

    Hi, thanks for your work. I want to run some experiments based on it. However, the paper does not detail how the encoder translates the feature map (e.g., 1024*4*4) into the GH-Feat vector (512*1). Can you provide it? Thank you very much!

    opened by WangQinghuCS 2
  • Is it possible to train the encoder and generator together?

    Hello, dear authors, thank you for your great work. I have read several of your papers and know that your team has very good insight into generative models, so I would like to ask a few questions:

    1. This work and pixel2style2pixel are somewhat similar in proposing a hierarchical StyleGAN encoder; however, it is hard for both methods to reconstruct images precisely. Though the results look similar to the original images, they are very different pixel-wise, and the SSIM and PSNR are relatively low (compared to other reconstruction tasks like SR, deblurring, de-raining, etc.). What do you think is the reason? Is there any method to improve the reconstruction quality?

    2. VAE-GAN combines a VAE and a GAN; is there any chance to train a StyleGAN and a StyleGAN encoder in this way, or with some similar framework?

    opened by SystemErrorWang 0
  • Encoder reconstruction yields very bad results

    Hello, I have been trying to reproduce the harmonization task and still have not managed to get the results shown in the examples.

    I use the provided face encoder "ghfeat-encoder-ffhq-256.pkl" and the "image.list" file provided in the examples folder. The feature extraction script takes about 20 minutes to finish, and the resulting reconstructed images look wrong (screenshot attached).

    I have performed several tests, using the bedroom encoder too, and the results are always the same (screenshot attached).

    When installing the environment, I changed the decorator requirement from "decorator==5.0.0" to "decorator" (decorator==5.0.0 yields an error when trying to install).

    My distro is Ubuntu 20.04.3 LTS.

    opened by javierbenitezmarin 0
  • Upgrading to TF 2.0

    Hi!

    I am trying to upgrade the GH-Feat model from TensorFlow 1.2 to TensorFlow 2.2 for my project. To this end, I used the tf_upgrade_v2 tool provided with TF 2.0. The tool made several changes; however, it skipped the files in the "dnnlib/" directory, so I am making those changes manually.

    There is one statement in "dnnlib/tflib/network.py" (Line 291) which raises an error under TF 2.0 that I have not been able to fix:

    exec(self._build_module_src, module.__dict__)  # pylint: disable=exec-used

    The traceback is:

    File "<string>", line 11, in <module>
    ModuleNotFoundError: No module named 'tensorflow.contrib'

    I understand that I need to go to the file where the module is defined and change "tf.contrib." to "tf.compat.v1.contrib." / "tf_slim.". However, since the traceback gives the filename as "<string>", I cannot figure out which file I need to change. I kindly ask the community and the repository maintainers to help me out in this regard.

    So far, the following attempts to fix this issue have failed:

    1. upgrading the pickle PROTOCOL to 5
    2. following the steps given here
    3. using Python's compile() function to figure out the path to the erroneous file

    Looking forward to effective insights and discussion in this regard.

    Thanks.

    opened by snehalstomar 0
Owner

GenForce: May Generative Force Be with You
Research on Generative Modeling in Zhou Group