This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" (SPNLP@ACL2022)

Wanyu Du

Last update: Dec 29, 2022

Related tags

Deep Learning text-generation transformer lstm style-transfer gaussian-processes variational-autoencoder paraphrase-generation pointer-generator t5-model

Overview

GP-VAE

This repository provides datasets and code for preprocessing, training and testing models for the paper:

Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors
Wanyu Du, Jianqiao Zhao, Liwei Wang and Yangfeng Ji
ACL 2022 6th Workshop on Structured Prediction for NLP

Installation

The following command installs all necessary packages:

pip install -r requirements.txt

The project was tested using Python 3.6.6.

Datasets

The Twitter URL dataset includes trn.tsv, val.tsv and tst.tsv. Each line in these files has the following format:

source_sentence \t reference_sentence
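
The line format above can be parsed with a few lines of Python. This is a minimal sketch, not the repository's own loader: the helper name read_pairs and the in-memory sample sentences are hypothetical and only illustrate the tab-separated layout.

```python
import io


def read_pairs(lines):
    """Yield (source, reference) tuples from tab-separated lines."""
    for line in lines:
        line = line.rstrip("\n")
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        yield parts[0], parts[1]


# Hypothetical in-memory sample standing in for a file such as trn.tsv.
sample = io.StringIO(
    "a cat sat on the mat\tthe cat was sitting on the mat\n"
    "\n"
    "hello there\thi there\n"
)
pairs = list(read_pairs(sample))
print(pairs)
```

To read a real split, pass an open file object instead of the sample, for example open("trn.tsv", encoding="utf-8").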

GYAFC has two sub-domains, em (Entertainment & Music) and fr (Family & Relationships). Please request and download the data as described in the original GYAFC paper.

Models

Training

Train the LSTM-based variational encoder-decoder with GP priors:

cd models/pg/
python main.py --task train --data_file ../../data/twitter_url \
    --model_type gp_full --kernel_v 65.0 --kernel_r 0.0001

where --data_file indicates the path to the training data,
--model_type indicates which prior to use (one of copynet, normal or gp_full),
--kernel_v and --kernel_r specify the hyper-parameters of the GP prior kernel.
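
To give a feel for what two kernel hyper-parameters can control, here is a minimal sketch of a squared-exponential kernel. It assumes that --kernel_v plays the role of the signal variance and --kernel_r the role of the length scale; neither the kernel form nor this mapping is confirmed by this README, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, variance, length_scale):
    """Squared-exponential kernel: variance * exp(-||x - y||^2 / (2 * length_scale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def kernel_matrix(points, variance, length_scale):
    """Build the covariance matrix over a list of vectors."""
    return [[rbf_kernel(p, q, variance, length_scale) for q in points]
            for p in points]


# Three toy vectors: the first two are identical, the third is one unit away.
points = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

# Values taken from the training command above; their meaning here is assumed.
K = kernel_matrix(points, variance=65.0, length_scale=0.0001)
for row in K:
    print(row)
```

Under this assumed form, identical inputs share the full variance, while a very small length scale drives the covariance between distinct inputs toward zero.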

Train the transformer-based variational encoder-decoder with GP priors:

cd models/t5/
python t5_gpvae.py --task train --dataset twitter_url \
    --kernel_v 512.0 --kernel_r 0.001

where --dataset indicates the dataset used for training,
--kernel_v and --kernel_r specify the hyper-parameters of the GP prior kernel.

Inference

Test the LSTM-based variational encoder-decoder with GP priors:

cd models/pg/
python main.py --task decode --data_file ../../data/twitter_url \
    --model_type gp_full --kernel_v 65.0 --kernel_r 0.0001 \
    --decode_from sample \
    --model_file /path/to/best/checkpoint

where --data_file indicates the path to the testing data,
--model_type indicates which prior to use (one of copynet, normal or gp_full),
--kernel_v and --kernel_r specify the hyper-parameters of the GP prior kernel,
--decode_from indicates whether to generate results conditioned on z_mean (mean) or on a randomly sampled z (sample).
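
The difference between the two --decode_from modes can be sketched as follows. This is an illustration under the assumption that the posterior is a diagonal Gaussian parameterized by a mean and a log-variance; the function choose_latent and the name z_logvar are hypothetical and not taken from the repository.

```python
import math
import random


def choose_latent(z_mean, z_logvar, decode_from, rng=random):
    """Return the latent vector the decoder would be conditioned on."""
    if decode_from == "mean":
        # Deterministic: always decode from the posterior mean.
        return list(z_mean)
    if decode_from == "sample":
        # Reparameterization: z = mean + std * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(z_mean, z_logvar)]
    raise ValueError("decode_from must be 'mean' or 'sample'")


z_mean = [1.0, 2.0]
z_logvar = [0.0, 0.0]

print(choose_latent(z_mean, z_logvar, "mean"))
print(choose_latent(z_mean, z_logvar, "sample", rng=random.Random(0)))
```

Decoding from the mean gives one deterministic output per input, while sampling gives a different latent vector (and potentially a different output) on each call.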

Test the transformer-based variational encoder-decoder with GP priors:

cd models/t5/
python t5_gpvae.py --task eval --dataset twitter_url \
    --kernel_v 512.0 --kernel_r 0.001 \
    --from_mean \
    --timestamp '2021-02-14-04-57-04' \
    --ckpt '30000' # load best checkpoint

where --dataset indicates the dataset used for testing,
--kernel_v and --kernel_r specify the hyper-parameters of the GP prior kernel,
--from_mean indicates whether to generate results conditioned on z_mean rather than on a randomly sampled z,
--timestamp and --ckpt together identify the best checkpoint to load.

Citation

If you find this work useful for your research, please cite our paper:

Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors

@inproceedings{du2022gpvae,
    title = "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors",
    author = "Du, Wanyu and Zhao, Jianqiao and Wang, Liwei and Ji, Yangfeng",
    booktitle = "Proceedings of the 6th Workshop on Structured Prediction for NLP (SPNLP 2022)",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}

Comments

inference speed and diversity
Hi! Thanks for your great work! I'm working on getting results on another paraphrase dataset under the T5 + GP prior setting. I have the following two questions:

I found that generation is relatively slow because the inference batch size is 1, and something goes wrong if I change it. Is there any way to speed up generation?

If I want a trade-off between quality and diversity, is it suitable to set the scalar to 7, as used in the paper for the paraphrasing task?
opened by kiaia 2
How to solve the problem of "zero kld!!!"?

When I try to train t5-gpvae, I run into the "zero kld!!!" problem. The same "zero kld!!!" message also appears when training the LSTM-based variational encoder-decoder with GP priors. Thank you for your help; I look forward to hearing from you.

opened by zhangming-19 2
prior_logvar should be 1 when calculating KL
Thank you for your great work! I have been learning about VAEs recently, and your paper has given me great inspiration.

In a vanilla VAE, the KL divergence should be $KL(\mathcal{N}(\mu, \sigma) \,\|\, \mathcal{N}(0,1))$. But when reading your code, I found that at model/t5/t5_vae.py line 151 you set prior_logvar to 0. Is this a mistake?

prior_mean = torch.zeros([hidden_states.size(0), posterior_mean.size(-1)]) \
    .to(posterior_mean.dtype).to(posterior_mean.device)
prior_logvar = torch.zeros([hidden_states.size(0), posterior_logvar.size(-1)]) \
    .to(posterior_logvar.dtype).to(posterior_logvar.device)
opened by NIL-zhuang 1
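
For reference on the question above: a log-variance of 0 corresponds to a variance of exp(0) = 1, so a zero mean with a zero log-variance describes N(0, 1). The closed-form KL divergence between two diagonal Gaussians parameterized by log-variance can be sketched in plain Python as below. The function gaussian_kl is hypothetical and is not the repository's implementation.

```python
import math


def gaussian_kl(post_mean, post_logvar, prior_mean, prior_logvar):
    """KL(N(post_mean, exp(post_logvar)) || N(prior_mean, exp(prior_logvar))),
    summed over the dimensions of two diagonal Gaussians."""
    total = 0.0
    for mq, lq, mp, lp in zip(post_mean, post_logvar, prior_mean, prior_logvar):
        total += 0.5 * (
            lp - lq
            + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp)
            - 1.0
        )
    return total


# A zero log-variance prior has unit variance, i.e. it is the standard normal.
prior_mean, prior_logvar = [0.0, 0.0], [0.0, 0.0]

# Posterior equal to the prior: the divergence vanishes.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0], prior_mean, prior_logvar))

# Posterior mean shifted by 1 in the first dimension.
print(gaussian_kl([1.0, 0.0], [0.0, 0.0], prior_mean, prior_logvar))
```

With a standard normal prior this reduces to the familiar per-dimension term 0.5 * (exp(logvar) + mean^2 - 1 - logvar).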
