Implementation of Sequence Generative Adversarial Nets with Policy Gradient

Related tags

Deep Learning, SeqGAN
Overview

SeqGAN

Requirements:

  • Tensorflow r1.0.1
  • Python 2.7
  • CUDA 7.5+ (For GPU)

Introduction

Apply Generative Adversarial Nets to generating sequences of discrete tokens.

The illustration of SeqGAN. Left: D is trained over the real data and the data generated by G. Right: G is trained by policy gradient, where the final reward signal is provided by D and is passed back to the intermediate action values via Monte Carlo search.
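
For intuition, the reward mechanism described in the caption can be sketched in a few lines of plain numpy. This is not code from the repository; rollout_policy, discriminator and mc_reward are illustrative stand-ins. A partial sequence is completed several times by the rollout policy, each completed sequence is scored by D, and the scores are averaged to estimate the intermediate reward.

import numpy as np

np.random.seed(0)
VOCAB, SEQ_LEN, N_ROLLOUTS = 5, 8, 16

def rollout_policy(prefix):
    # Stand-in for the generator/rollout model: complete the prefix by
    # sampling tokens until the sequence reaches SEQ_LEN.
    completion = np.random.randint(0, VOCAB, size=SEQ_LEN - len(prefix))
    return np.concatenate([prefix, completion])

def discriminator(seq):
    # Stand-in for D: map a sequence to a fake "probability of being real".
    return 1.0 / (1.0 + np.exp(-(seq.mean() - VOCAB / 2.0)))

def mc_reward(prefix, n_rollouts=N_ROLLOUTS):
    # Average D's score over n_rollouts completions of the partial sequence.
    return np.mean([discriminator(rollout_policy(prefix)) for _ in range(n_rollouts)])

print("estimated reward for partial sequence [1, 3, 0]:", mc_reward(np.array([1, 3, 0])))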

The research paper SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient has been accepted at the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).

We provide example code to reproduce the synthetic data experiments with the oracle evaluation mechanism. To run the experiment with default parameters:

$ python sequence_gan.py

You can change all the parameters in sequence_gan.py.

The experiment has two stages. In the first stage, the generator is pre-trained by supervised learning (Maximum Likelihood Estimation) on the positive data provided by the oracle model. In the second stage, adversarial training is used to improve the generator.
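
Schematically, the schedule looks like the sketch below. This is not code from sequence_gan.py; the function names (mle_pretrain_step, adversarial_step, train) and epoch counts are illustrative placeholders.

# Sketch of the two-stage schedule; all names below are placeholders.
PRE_EPOCHS, ADV_EPOCHS = 120, 150   # roughly matching the log excerpt below

def mle_pretrain_step(generator, oracle_data):
    # Stage 1: maximize the log-likelihood of sequences drawn from the oracle.
    pass

def adversarial_step(generator, discriminator, rollout):
    # Stage 2: sample sequences from G, estimate token-level rewards with D
    # via Monte Carlo rollouts, apply the policy gradient, then retrain D.
    pass

def train(generator, discriminator, rollout, oracle_data):
    for _ in range(PRE_EPOCHS):
        mle_pretrain_step(generator, oracle_data)
    for _ in range(ADV_EPOCHS):
        adversarial_step(generator, discriminator, rollout)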

After running the experiments, you will find the negative log-likelihood performance saved in save/experiment-log.txt, like:

pre-training...
epoch:	0	nll:	10.1716
epoch:	5	nll:	9.42939
epoch:	10	nll:	9.2388
epoch:	15	nll:	9.11899
epoch:	20	nll:	9.13099
epoch:	25	nll:	9.14474
epoch:	30	nll:	9.12539
epoch:	35	nll:	9.13982
epoch:	40	nll:	9.135
epoch:	45	nll:	9.13081
epoch:	50	nll:	9.10678
epoch:	55	nll:	9.10694
epoch:	60	nll:	9.10349
epoch:	65	nll:	9.10403
epoch:	70	nll:	9.07613
epoch:	75	nll:	9.091
epoch:	80	nll:	9.08909
epoch:	85	nll:	9.0807
epoch:	90	nll:	9.08434
epoch:	95	nll:	9.08936
epoch:	100	nll:	9.07443
epoch:	105	nll:	9.08305
epoch:	110	nll:	9.06973
epoch:	115	nll:	9.07058
adversarial training...
epoch:	0	nll:	9.08457
epoch:	5	nll:	9.04511
epoch:	10	nll:	9.03079
epoch:	15	nll:	8.99239
epoch:	20	nll:	8.96401
epoch:	25	nll:	8.93864
epoch:	30	nll:	8.91642
epoch:	35	nll:	8.87761
epoch:	40	nll:	8.88582
epoch:	45	nll:	8.8592
epoch:	50	nll:	8.83388
epoch:	55	nll:	8.81342
epoch:	60	nll:	8.80247
epoch:	65	nll:	8.77778
epoch:	70	nll:	8.7567
epoch:	75	nll:	8.73002
epoch:	80	nll:	8.72488
epoch:	85	nll:	8.72233
epoch:	90	nll:	8.71473
epoch:	95	nll:	8.71163
epoch:	100	nll:	8.70113
epoch:	105	nll:	8.69879
epoch:	110	nll:	8.69208
epoch:	115	nll:	8.69291
epoch:	120	nll:	8.68371
epoch:	125	nll:	8.689
epoch:	130	nll:	8.68989
epoch:	135	nll:	8.68269
epoch:	140	nll:	8.68647
epoch:	145	nll:	8.68066
epoch:	150	nll:	8.6832

Note: this code is based on the previous work by ofirnachum. Many thanks to ofirnachum.

Comments
  • why should update rollout policy in this way?

    According to the paper, the rollout policy is the same as the generator policy, so self.Wi = self.lstm.Wi; but in the code, the parameters of the rollout policy are updated here in a different way. Can you please explain why? Thank you very much @LantaoYu @wnzhang

    opened by vanpersie32 8
  • Understanding of reward and loss function

    Hello,

    I don't understand the combination of the reward and the loss function. The labels given to the discriminator are defined as follows:

    positive_labels = [[0, 1] for _ in positive_examples]
    negative_labels = [[1, 0] for _ in negative_examples]

    The reward is then always the second entry, corresponding to the positive label:

    ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
    ypred = np.array([item[1] for item in ypred_for_auc])

    So if the reward gets larger, the samples are being classified as the real class. But then, in the loss function of the generator, the reward is multiplied into the loss:

    self.g_loss = -tf.reduce_sum(
        tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
            ), 1
        ) * tf.reshape(self.rewards, [-1])
    )

    I don't understand that: since the loss gets minimized, won't the rewards be minimized too? So shouldn't item[0] be taken for ypred instead? (See the numeric sketch after the comments list.)

    opened by tocab 7
  • How to get the parameters of a different target_lstm?

    I noticed that there is a class named TARGET_LSTM which uses the predefined parameters from target_params.pkl. My question is: if I use my own data and different global parameters, such as EMB_DIM, HIDDEN_DIM, SEQ_LENGTH.., how do I obtain the parameters for TARGET_LSTM? What is the purpose of TARGET_LSTM?

    opened by xiaopyyy 7
  • target_params.pkl open using python3 pickle failed.

    Hi, I am trying to run this repo under python3. The original implementation uses python2; the code is fine, but target_params.pkl cannot be opened. Do you have an alternate version of that file which can be opened using python3?

    opened by jinfagang 6
  • evaluation issues

    Hi there, I have a question about the evaluation of text generation. In your AAAI 2017 paper, you mention that for the Chinese poem generation you "use the whole test set as the references" to calculate the BLEU score. What is the meaning of "references"? How do you use the test samples as the "positive examples" you mentioned? Are all the test samples loaded into the model as input, and are the BLEU scores then calculated based on the corresponding output from the model? I would appreciate it if you could explain the evaluation procedure in more detail. Thanks in advance!

    opened by xiaopyyy 6
  • global_variables_initializer error - tensorflow 12.1

    Hi.

    I would like to start generating models from my own data, but I'm unable to get the code working. I am running Ubuntu 16 & 14 (64-bit) and current TensorFlow 12.1 on an NVIDIA GPU. I've checked several forums and there are similar issues with TensorFlow 12.

    If this cannot be fixed, is it possible to post your previous code that works on earlier versions of TensorFlow?

    Thanks

    Traceback (most recent call last):
      File "pretrain_experiment.py", line 123, in <module>
        main()
      File "pretrain_experiment.py", line 93, in main
        sess.run(tf.global_variables_initializer)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 951, in _run
        fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 407, in __init__
        self._fetch_mapper = _FetchMapper.for_fetch(fetches)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 238, in for_fetch
        return _ElementFetchMapper(fetches, contraction_fn)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 271, in __init__
        % (fetch, type(fetch), str(e)))
    TypeError: Fetch argument <function global_variables_initializer at 0x7fb92909bb18> has invalid type <type 'function'>, must be a string or Tensor. (Can not convert a function into a Tensor or Operation.)

    A similar error occurs when running sequence_gan.py. (See the note after the comments list for the likely fix.)

    opened by GenTxt 4
  • Question about Rollout

    In this loop:

    https://github.com/LantaoYu/SeqGAN/blob/5f2c0a5c978826febe94864da69c77c00f237f81/rollout.py#L79

    This is N-time Monte Carlo sampling, with N = 16 in the code. But how are the different samples generated? given_num represents how many tokens to use from the input, and i represents the i-th sample. Why are the samples different for different values of i? Is the rollout network being updated somewhere within the call to get_reward and I'm missing it? I also don't see where the randomness comes in for the Monte Carlo estimation of the partial sequence reward.

    From my examination of the code, the network doesn't get updated and the session parameters are the same so I'm not sure how different samples are being generated.

    Can someone help me understand a) how different samples are being generated, b) where the randomness is coming from, and c) if the rollout network has the same parameters as the generator network, how it generates samples that differ from the generator's? (See the sampling sketch after the comments list.)

    Any help is greatly appreciated! Thank you for providing this code it has been very helpful to me.

    opened by nathan-whitaker 3
  • about the loss of generator

    Hi, I have read your code in generator.py (lines 106-113):

        # Unsupervised Training
        self.g_loss = -tf.reduce_sum(
            tf.reduce_sum(
                tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                    tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
                ), 1
            ) * tf.reshape(self.rewards, [-1])
        )

    I find that the variables (self.g_predictions and self.x) are the same as the variables in self.pretrain_loss, but since this is for unsupervised training, shouldn't the variable self.g_predictions be replaced with the variable from line 50, tf.nn.softmax(o_t)? After all, lines 47-56 do not use the supervised information (self.x). Is there a reason for this?

    opened by wjb123 2
  • How to train and generate from our own Data?

    Can you provide any guidance on how to train and generate from my own data? I would like to try SeqGAN with various English poetry and prose, but I am not sure how to change this code to train on my own data and then generate new writing.

    opened by ghost 2
  • Please Cite

    Hey - looks like you heavily based this code on my implementation at https://github.com/ofirnachum/sequence_gan

    It's nice to see that you got it working on bigger problems, but please cite/reference my work.

    opened by ofirnachum 2
  • where does the randomness of the generator come from?

    It seems that, in the generator, the input at the beginning of unrolling is START_TOKEN, and there seems to be no source of randomness fed into the network, so shouldn't the trained network be deterministic? (See the sampling sketch after the comments list.)

    opened by kunrenzhilu 1
  • Questions about the recurring results of the code

    Has anyone tried to run this code? I have generated a lot of combinations of numbers, such as ‘7811 3499 2314’. I don't know the meaning of these numbers. Can anyone give me an answer?

    opened by giraffeCjl 1
  • How to resume training in Colab?

    I am running the sequence_gan.py file, and the pre-training has started. Since it is running on Colab, it will stop after a specific time. What is the procedure to resume it when the next Colab session starts?

    opened by syedasara-angelium 0
  • dataset

    I ran the code, and some text files are mentioned in sequence_gan.py, such as positive_file = 'save/real_data.txt', negative_file = 'save/generator_sample.txt', and eval_file = 'save/eval_file.txt'. I do not know where those files come from, because there are no such files in the save folder. Please help me.

    opened by saharjandaghy 1
  • About generator in adversarial training

    Hi, I am very interested in your work. About 'g, d and k': I think 'g' means one epoch in the paper, but in the code I think it may be one batch. Could you help me with this? Thanks a lot.

    opened by xieexiaotuzi 2
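
Notes on the comments above

On the reward and loss function question: minimizing the negated product of log-probability and reward increases the probability of tokens that received a high reward from D, which is the usual REINFORCE-style estimator; taking item[1] (the probability of the real class) as the reward is consistent with that. A small self-contained numeric sketch (plain numpy, not the repository code) of why descending this loss raises the probability of well-rewarded tokens:

import numpy as np

# Loss for one generated token with probability p and reward r from D:
#     g_loss = -(log p) * r
def g_loss(p, r):
    return -np.log(p) * r

p, lr = 0.2, 0.01
for r in (0.9, 0.1):                    # high reward vs. low reward
    grad_p = -(1.0 / p) * r             # d/dp of -(log p) * r
    print("reward %.1f -> loss %.3f, p after one gradient step %.4f"
          % (r, g_loss(p, r), p - lr * grad_p))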
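
On the global_variables_initializer TypeError: the traceback shows the function object tf.global_variables_initializer being passed to sess.run instead of the init op it returns. Under TensorFlow 1.x the call needs parentheses; a minimal sketch of the corrected usage (not the repository's exact code):

import tensorflow as tf

with tf.Session() as sess:
    # global_variables_initializer() must be called: it returns an op that
    # Session.run can execute. Passing the bare function raises the
    # "Fetch argument ... must be a string or Tensor" TypeError shown above.
    sess.run(tf.global_variables_initializer())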
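
On the two questions about where the randomness comes from (in the rollout and in the generator): the next token is not taken as the argmax of the softmax output but is sampled from it at each time step (the repository appears to do this with a multinomial draw over the softmax probabilities), so repeated rollouts with identical parameters still produce different completions. A minimal numpy illustration of that point:

import numpy as np

np.random.seed(0)
probs = np.array([0.1, 0.6, 0.2, 0.1])   # softmax output for one time step

def sample_completion(length):
    # Draw each next token from the same categorical distribution; identical
    # parameters still yield a different sample on every call.
    return [int(np.random.choice(len(probs), p=probs)) for _ in range(length)]

for i in range(3):
    print("rollout %d: %s" % (i, sample_completion(5)))
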
Owner
Lantao Yu
Ph.D. Student at Stanford CS Department
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 6, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
This project provides a stock market environment using OpenGym with Deep Q-learning and Policy Gradient.

Stock Trading Market OpenAI Gym Environment with Deep Reinforcement Learning using Keras Overview This project provides a general environment for stoc

Kim, Ki Hyun 769 Dec 25, 2022
PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).

This is the original implementation of our paper, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem (arXiv:1706.1

Zhengyao Jiang 1.5k Dec 29, 2022
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms This repo contains the source code to reproduce the results in the paper A Close

Costa Huang 73 Dec 24, 2022
Trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI

Introduction This script trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI. In order to run this

Momin Haider 0 Jan 2, 2022
Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official

zhaohu xing 112 Dec 16, 2022
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Luke Tonin 195 Dec 17, 2022
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Segmentation Transformer Implementation of Segmentation Transformer in PyTorch, a new model to achieve SOTA in semantic segmentation while using trans

Abhay Gupta 161 Dec 8, 2022
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

Sunbow Liu 22 Nov 25, 2022
[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Fudan Zhang Vision Group 897 Jan 5, 2023
Sequence to Sequence Models with PyTorch

Sequence to Sequence models with PyTorch This repository contains implementations of Sequence to Sequence (Seq2Seq) models in PyTorch At present it ha

Sandeep Subramanian 708 Dec 19, 2022
Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

Elad Hoffer 514 Nov 17, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

Cornelius Roemer 24 Oct 26, 2022
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

OFA Sys 1.4k Jan 8, 2023
Create animations for the optimization trajectory of neural nets

Animating the Optimization Trajectory of Neural Nets loss-landscape-anim lets you create animated optimization path in a 2D slice of the loss landscap

Logan Yang 81 Dec 25, 2022
SMD-Nets: Stereo Mixture Density Networks

SMD-Nets: Stereo Mixture Density Networks This repository contains a Pytorch implementation of "SMD-Nets: Stereo Mixture Density Networks" (CVPR 2021)

Fabio Tosi 115 Dec 26, 2022