PyTorch implementation of CoCon: A Self-Supervised Approach for Controlled Text Generation

alvinchangw

Last update: Dec 18, 2022

Overview

COCON_ICLR2021

This is our PyTorch implementation of COCON.

CoCon: A Self-Supervised Approach for Controlled Text Generation (ICLR 2021)
Alvin Chan, Yew-Soon Ong, Bill Pung, Aston Zhang, Jie Fu
https://arxiv.org/abs/2010.02684

TL;DR: We propose CoCon to control the content of text generation from LMs by conditioning on content inputs at an interleave layer.

Requirements

Python 3.7.6 on Linux
PyTorch 1.4

Dependencies

Install dependencies with:

pip install -r requirements.txt

Dataset

Download COCON's training data from https://github.com/openai/gpt-2-output-dataset
Place the medium-345M-k40.${split}.jsonl files (one per dataset split) inside the data/gpt2output/ folder.
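
Before training, it can help to confirm that the files landed where the training script expects them. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_files is hypothetical, and the split names train, valid and test are an assumption based on the gpt-2-output-dataset release.

```python
from pathlib import Path

# Assumption: splits are named train/valid/test, as in the
# gpt-2-output-dataset release; adjust if your copy differs.
SPLITS = ("train", "valid", "test")


def missing_dataset_files(data_dir="data/gpt2output", splits=SPLITS):
    """Return the expected .jsonl paths that are not present in data_dir."""
    root = Path(data_dir)
    expected = [root / f"medium-345M-k40.{split}.jsonl" for split in splits]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected dataset files are in place.")
```

Run it from the repository root so that the relative data/gpt2output/ path resolves correctly.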

COCON Training

Train COCON with a GPT-2 language model, with the parameters reported in the paper:

sh train_cocon.sh

After training, the COCON block's weights will be saved as models/COCON/cocon_block_pytorch_model.bin.

Training Key Arguments

--do_train : whether to train COCON
--output_dir : directory where the COCON weights are saved
--model_name_or_path : type of language model to train COCON with
--output_hidden_for_cocon_after_block_ind : zero-based index of the transformer block whose output hidden states are fed to the COCON block for content conditioning. The results reported in the paper use a value of 6, meaning that the output of GPT-2's 7th transformer block is used as the COCON block's input.
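
The zero-based indexing of --output_hidden_for_cocon_after_block_ind is easy to misread, so here is a toy sketch of what "after block index 6" means. It is only an illustration under stated assumptions: run_with_interleave, make_block and the 12-block stack are hypothetical stand-ins, and the real logic lives in transformers/modeling_gpt2.py and may be organized differently.

```python
def run_with_interleave(blocks, hidden, block_ind, cocon_fn):
    """Run a stack of blocks, applying cocon_fn right after blocks[block_ind]."""
    for i, block in enumerate(blocks):
        hidden = block(hidden)
        if i == block_ind:
            # Interleave point: the output of this block feeds the COCON step.
            hidden = cocon_fn(hidden)
    return hidden


def make_block(i):
    """Toy 'transformer block' that only records that it ran."""
    return lambda trace: trace + [f"block_{i}"]


# Toy stack; the block count is arbitrary and not taken from the paper.
blocks = [make_block(i) for i in range(12)]

trace = run_with_interleave(blocks, [], 6, lambda t: t + ["cocon"])
print(trace[:8])
```

In the printed trace, "cocon" appears immediately after "block_6", which is the seventh block because counting starts at 0.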

Pretrained COCON weights

You can download COCON's pretrained weights here and save them in models/COCON/ to start generating with COCON.

COCON Controlled Generation

Sample script on how to generate COCON sentiment-controlled text:

sh generation/generate_cocon_sentiments.sh

Sample script on how to generate COCON topic-controlled text:

sh generation/generate_cocon_topics.sh

COCON-generated texts are stored under the cocon_output key in the output .jsonl files and under Cocon AR output in the output .txt files.
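
To pull the generated texts out of a .jsonl output file, something like the sketch below should work. It assumes each line is a standalone JSON object containing a cocon_output key; the helper name read_cocon_outputs and the sample records are made up for illustration and are not real model output.

```python
import json


def read_cocon_outputs(lines):
    """Yield the cocon_output string from each JSON line that has one."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "cocon_output" in record:
            yield record["cocon_output"]


# Made-up records standing in for the lines of an output .jsonl file.
sample_lines = [
    '{"context_text": "is perfect", "cocon_output": "first made-up sample"}',
    "",
    '{"context_text": "is perfect", "cocon_output": "second made-up sample"}',
]

outputs = list(read_cocon_outputs(sample_lines))
print(outputs)
```

For a real run, pass an open file handle instead of sample_lines, using the path you gave to --cocon_output_filename.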

Generation Key Arguments

--do_cocon_compute : whether to do COCON generation
--output_dir : directory of COCON block's weights
--model_name_or_path : type of language model
--cocon_output_filename : path of saved generation samples
--cocon_compute_history_source_data_file : filename of text file containing prompt texts for generation
--cocon_compute_context_source_data_file : filename of text file containing target content for generation

Summary of Key Folders/Files

transformers/: code for models and optimizers
transformers/modeling_gpt2.py: code for COCON block and GPT-2 language model
BOW/: target content tokens used for COCON topic control
attr_markers/: target content tokens used for COCON sentiment control
prompts/: prompt text used for text generation

Citation

If you find our repository useful, please consider citing our paper:

@inproceedings{
chan2021cocon,
title={CoCon: A Self-Supervised Approach for Controlled Text Generation},
author={Alvin Chan and Yew-Soon Ong and Bill Pung and Aston Zhang and Jie Fu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=VD_ozqvBy4W}
}

Acknowledgements

Code is based largely on:

https://github.com/huggingface/transformers

Comments

A problem about training time of model

Hello, first of all, thank you very much for releasing your trained model. I would like to go through the training process myself, but the progress estimate on my computer says training will take several years, which sounds ridiculous, but it's true. How long did training take for you, and did you train on a GPU? I have also reduced the dataset, but it had no effect: the estimated training time is still several years. So I think the training time may not depend much on the size of the dataset. What does affect the training time? I apologize for bothering you with such a simple question, but I really need your help. Thank you very much; I look forward to your reply.

opened by zh57398 3

Weird output from "cocon_output"

Hi,

Thanks for your great work. I tried to run your generation script, but I got very disfluent "cocon_output".

For example, given original_input_text "<|endoftext|>Once upon a time" and context_text "is perfect",

cocon_output is "<|endoftext|>Once upon a time each compuls gone 20 J-t and like to been and sa p I less millions the three is ( other remaining more we party in few V the only other end one inf only really more inf have S t and here super so Jp Lor now"

while the prependgpt2_ar_gen output is "is perfect<|endoftext|>Once upon a time I worked with a bike hacker who used specialised software in simple ways to automate various processes and monitor game flow. A few months ago, at the start of the second-year cycle (which sadly involved a lot of fake (and now living)"

I think the prependgpt2_ar_gen output is much better than the cocon output. I am wondering whether this is the expected output. Can you check whether this is the real output from your COCON model or whether there is something wrong with the script? I basically followed everything in this repo.

Thanks

opened by GaryYufei 1

About pretrained weights

I lost the pretrained weights I downloaded. I want to download them again, but I can no longer access the website you provided. Where else can I get the pretrained weights? I sincerely look forward to your reply.

opened by zh57398 0

About topic evaluation classifier "COMPUTERS" data

First of all, thank you for your work and code. For the classifier-based evaluation of the generated texts' topic categories, you state in the appendix that you used https://www.kaggle.com/rmisra/news-category-dataset, but I can't find any text in the "COMPUTERS" category in this data. Where can I get the "COMPUTERS" text data? This is very important to me. I look forward to your reply. Thank you again!

opened by zh57398 0

'cocon_block' has no attribute 'cocon_attn'

We ran into the following problem:

Traceback (most recent call last):
  File "traininfer_cocon.py", line 2980, in <module>
    main()
  File "traininfer_cocon.py", line 2783, in main
    global_step, tr_loss = train_cocon(args, train_dataset, model, tokenizer, cocon_block=cocon_block, disc_model=disc_model, model_config=config, transform_h_after_layernorm=args.transform_h_after_layernorm)
  File "traininfer_cocon.py", line 573, in train_cocon
    self_cocon_lm_loss_grad = torch.autograd.grad(self_cocon_lm_loss, cocon_block.cocon_attn.c_attn.weight, retain_graph=True)[0]
  File "/home/yingwenjing/anaconda3/envs/cocon/lib/python3.7/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'cocon_attn'

We sincerely look forward to your reply.

opened by wying8349 2
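
A note on the AttributeError above: torch.nn.DataParallel keeps the wrapped model in its .module attribute, so submodules such as cocon_attn are not reachable on the wrapper itself. The sketch below shows one common way to unwrap before attribute access. It is an assumption-laden illustration, not the maintainers' fix: unwrap_model is a hypothetical helper, and the two Fake classes are stand-ins so the example runs without PyTorch.

```python
def unwrap_model(model):
    """Return model.module if present (DataParallel-style wrapper), else model."""
    return getattr(model, "module", model)


class FakeCoconBlock:
    """Stand-in for the COCON block; only mimics the attribute of interest."""
    cocon_attn = "attention-submodule"


class FakeDataParallel:
    """Stand-in mimicking how torch.nn.DataParallel stores the wrapped model."""

    def __init__(self, module):
        self.module = module


block = FakeCoconBlock()
wrapped = FakeDataParallel(block)

# Attribute access works on both the wrapped and the plain object.
print(unwrap_model(wrapped).cocon_attn)
print(unwrap_model(block).cocon_attn)
```

With real PyTorch, an isinstance check against torch.nn.DataParallel is stricter than getattr, since a plain model could in principle define its own attribute named module.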

Which version of huggingface/transformers do you refer to?

I ran into this problem:

from .modeling_tf_utils import TFPreTrainedModel, get_initializer, keras_serializable, shape_list
ImportError: cannot import name 'keras_serializable'

The version of huggingface/transformers in my env is 2.10.0.

opened by HunYuanfeng 0
