Related resources for our EMNLP 2021 paper

Overview

Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier

Code for EMNLP 2021 paper Plan-then-Generate: Controlled Data-to-Text Generation via Planning

1. Environment Setup:

(1) Hardware Requirement:

The code in this repo has been tested on a single NVIDIA V100 GPU (16GB).

(2) Installation:

chmod +x ./config_setup.sh
./config_setup.sh

2. ToTTo Data Preprocessing:

Option (1): Preprocess the ToTTo data from scratch by yourself:

cd ./data
chmod +x ./prepare_data.sh
./prepare_data.sh

This process can take up to 1 hour.

Option (2): Download our processed data here

Unzip data.zip and use the extracted folder to replace the empty ./data folder.

For more details about the ToTTo dataset, please refer to the original Google Research repo.

3. Content Planner:

Please refer to README.md in ./content_planner folder

4. Sequence Generator:

Please refer to README.md in ./generator folder

5. Citation

If you find our paper and resources useful, please kindly cite our paper:

@inproceedings{su2021plangen,
    title={Plan-then-Generate: Controlled Data-to-Text Generation via Planning},
    author={Yixuan Su and David Vandyke and Sihui Wang and Yimai Fang and Nigel Collier},
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
Comments
  • the code of content_planner not provided

    Hi, I'm very interested in your "Plan-then-Generate" work, but the code for the content planner described in your paper is not provided. I am unsure how to combine the BERT model with the CRF to build the content planner, and I look forward to reading the code for this part. Could you please provide the complete code as soon as possible? Thanks~

    opened by jiangliqin 5
  • Could you provide the processing code for RDF (WebNLG) data and the processed RDF data?

    Hi, could you provide the processing code for RDF data and the processed RDF data with content plan? I find it difficult to parse the content plan using the method proposed in the paper because it is hard to align objects in the reference text with those in the input graph. These objects often appear in different representations in reference text and input graphs.

    opened by Nicoleqwerty 3
  • Require code for content planner

    Hi Yixuan @yxuansu,

    Your paper is an interesting read, thanks for sharing your work and code.

    Could you please provide the missing code for the Content Planner module? I would appreciate it, even if it is an initial/unrefined version that you have at the moment.

    Thanks!

    opened by prajwalgatti 2
  • Content Planner Issues

    Start Training:
    Traceback (most recent call last):
      File "train.py", line 99, in <module>
        ckpt_save_path, cuda_available, device)
      File "/home/fanyongfeng/PyCharm/PlanGen/content_planner/trainer.py", line 83, in model_training
        train_batch_src_tensor, train_batch_tgt_tensor, _ = data.get_next_train_batch(batch_size_per_gpu * number_of_gpu)
      File "/home/fanyongfeng/PyCharm/PlanGen/content_planner/dataclass.py", line 87, in get_next_train_batch
        batch_idx_list = random.sample(self.train_idx_list, batch_size)
      File "/home/fanyongfeng/.conda/envs/fanyfeng/lib/python3.6/random.py", line 320, in sample
        raise ValueError("Sample larger than population or is negative")
    ValueError: Sample larger than population or is negative

    Hello, after preparing the data with prepare_data.sh, this problem occurs when I run train.sh. Do you know why? Thank you.

    opened by FYF1997 0
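
A note on the traceback in the last report: it ends in random.sample, which raises this ValueError whenever the requested sample size is larger than the population. One plausible cause, which the thread does not confirm, is that the training index list was empty or shorter than the batch size, for example because preprocessing produced no training examples. The sketch below reproduces the error and shows a possible guard; sample_batch is a hypothetical helper for illustration, not a function from this repo.

```python
import random


def sample_batch(idx_list, batch_size):
    """Draw a batch of indices, failing early with a more specific message.

    Hypothetical helper: it only illustrates a guard around random.sample.
    """
    if batch_size > len(idx_list):
        raise ValueError(
            f"batch size {batch_size} exceeds the {len(idx_list)} available examples; "
            "check that the preprocessed data files exist and are non-empty"
        )
    return random.sample(idx_list, batch_size)


# random.sample itself raises ValueError when asked for more items than exist.
try:
    random.sample([], 4)
except ValueError as err:
    print("random.sample failed:", err)

# With enough examples, the guard passes and a batch of the requested size is drawn.
batch = sample_batch(list(range(10)), 4)
print(len(batch))
```

If the guard fires, checking the number of examples in the preprocessed training files before launching train.sh would be a reasonable first step.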
Owner
Yixuan Su