On the Generative Utility of Cyclic Conditionals
This repository is the official implementation of "On the Generative Utility of Cyclic Conditionals" (NeurIPS 2021).
Chang Liu <[email protected]>, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu.
[Paper & Appendix] [Slides] [Video] [Poster]
Introduction
Whether and how can two conditional models p(x|z) and q(z|x) that form a cycle uniquely determine a joint distribution p(x,z)? We develop a general theory for this question, including criteria for the two conditionals to correspond to a common joint (compatibility) and for such joint to be unique (determinacy). As in generative models we need a generator (decoder/likelihood model) and also an encoder (inference model) for representation, the theory indicates they could already define a generative model p(x,z) without specifying a prior distribution p(z)! We call this novel generative modeling framework as CyGen, and develop methods to achieve the eligibility (compatibility and determinacy) and the usage (fitting and generating data) as a generative model.
This codebase implements these CyGen methods, and various baseline methods. The model architectures are based on the Sylvester flow (Householder version), and the experiment environments/setups follow FFJORD. Authorship is clarified in each file.
Requirements
The code requires python version >= 3.6, and is based on PyTorch. To install requirements:
pip install -r requirements.txt
Usage
Run the run_toy.sh
and run_image.sh
scripts for the synthetic and real-world (i.e. MNIST and SVHN) experiments. See the commands in the script files or python3 main_[toy|image].py --help
for customized usage or hyperparameter tuning.
For the real-world experiments, downstream classification accuracy is evaluated along training. To evaluate the FID score, run the command python3 compute_gen_fid.py --load_dict=<path_to_model.pth>
.
Results
As a trailer, we show the synthetic results here. We see that CyGen achieves both high-quality data generation, and well-separated latent clusters (useful representation). This is due to the removal of a specified prior distribution so that the manifold mismatch and posterior collapse problems are avoided. DAE (denoising auto-encoder) does not need a prior, but its training method hurts determinacy. If pretrained as a VAE (i.e. CyGen(PT)), we see that the knowledge of a centered and centrosymmetric prior is encoded through the conditional models. See the paper for more results.