Related resources for our EMNLP 2021 paper

Yixuan Su

Last update: Jan 3, 2023


Overview

Plan-then-Generate: Controlled Data-to-Text Generation via Planning

Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier

Code for the EMNLP 2021 paper "Plan-then-Generate: Controlled Data-to-Text Generation via Planning".

1. Environment Setup:

(1) Hardware Requirement:

The code in this repo was thoroughly tested on our machine with a single NVIDIA V100 GPU (16GB).

(2) Installation:

chmod +x ./config_setup.sh
./config_setup.sh

2. ToTTo Data Preprocessing:

Option (1): Preprocess the ToTTo data from scratch by yourself:

cd ./data
chmod +x ./prepare_data.sh
./prepare_data.sh

This process can take up to one hour.

Option (2): Download our processed data here

Unzip data.zip and use its contents to replace the empty ./data folder.

For more details about the ToTTo dataset, please refer to the original Google Research repo.

3. Content Planner:

Please refer to the README.md in the ./content_planner folder.

4. Sequence Generator:

Please refer to the README.md in the ./generator folder.

5. Citation

If you find our paper and resources useful, please kindly cite our paper:

@inproceedings{su2021plangen,
    title = "Plan-then-Generate: Controlled Data-to-Text Generation via Planning",
    author = "Yixuan Su and David Vandyke and Sihui Wang and Yimai Fang and Nigel Collier",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

Comments

The code of content_planner is not provided

Hi, I'm greatly interested in your "Plan-then-Generate" work, but the code for the content planner mentioned in your paper is not provided. I am unsure how to combine the BERT model and the CRF to build the content planner. I'm looking forward to reading your code for this part. Could you please provide the complete code as soon as possible? Thanks~

opened by jiangliqin 5

Could you provide the processing code for RDF (WebNLG) data and the processed RDF data?

Hi, could you provide the processing code for RDF data and the processed RDF data with content plan? I find it difficult to parse the content plan using the method proposed in the paper because it is hard to align objects in the reference text with those in the input graph. These objects often appear in different representations in reference text and input graphs.

opened by Nicoleqwerty 3

Require code for content planner

Hi Yixuan @yxuansu,

Your paper is an interesting read, thanks for sharing your work and code.

Could you please provide the missing code for the Content Planner module? I would appreciate it, even if it is an initial/unrefined version that you have at the moment.

Thanks!

opened by prajwalgatti 2

Content Planner Issues

Start Training:
Traceback (most recent call last):
  File "train.py", line 99, in <module>
    ckpt_save_path, cuda_available, device)
  File "/home/fanyongfeng/PyCharm/PlanGen/content_planner/trainer.py", line 83, in model_training
    train_batch_src_tensor, train_batch_tgt_tensor, _ = data.get_next_train_batch(batch_size_per_gpu * number_of_gpu)
  File "/home/fanyongfeng/PyCharm/PlanGen/content_planner/dataclass.py", line 87, in get_next_train_batch
    batch_idx_list = random.sample(self.train_idx_list, batch_size)
  File "/home/fanyongfeng/.conda/envs/fanyfeng/lib/python3.6/random.py", line 320, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Hello, after preparing the data with the prepare_data.sh command, this problem occurs when I run train.sh. Do you know why? Thank you.

opened by FYF1997 0
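
For reference, Python's random.sample raises this ValueError whenever the requested sample size is larger than the population (or negative). The snippet below is a minimal sketch and not code from this repo: sample_batch_indices is a hypothetical helper that wraps the same call and reports both sizes. It may help check whether train_idx_list ended up empty, or smaller than batch_size_per_gpu * number_of_gpu, after preprocessing. Whether that is the actual cause in this issue is an assumption.

```python
import random


def sample_batch_indices(train_idx_list, batch_size):
    """Hypothetical helper (not from the PlanGen repo): sample batch indices
    and fail with a descriptive message when the batch cannot be drawn."""
    population = len(train_idx_list)
    if batch_size < 0 or batch_size > population:
        raise ValueError(
            f"Requested batch of {batch_size} examples, but only "
            f"{population} training examples are loaded."
        )
    # random.sample draws batch_size distinct elements without replacement.
    return random.sample(train_idx_list, batch_size)


if __name__ == "__main__":
    # A population of 10 indices supports a batch of 4.
    print(len(sample_batch_indices(list(range(10)), 4)))

    # An empty training set is rejected with both sizes in the message.
    try:
        sample_batch_indices([], 4)
    except ValueError as err:
        print(err)
```

If the reported population is 0, the training files expected under ./data are probably missing or empty, so rerunning the preprocessing step (or lowering the batch size for a very small dataset) would be the first thing to try.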
