Introduction
This is a PyTorch implementation of the following research papers:
- (1) Hierarchical Text Generation and Planning for Strategic Dialogue
- (2) Deal or No Deal? End-to-End Learning for Negotiation Dialogues
The code is developed by Facebook AI Research.
The code trains neural networks to hold negotiations in natural language, and allows reinforcement learning via self-play as well as rollout-based planning.
Citation
If you want to use this code in your research, please cite:
@inproceedings{DBLP:conf/icml/YaratsL18,
  author    = {Denis Yarats and
               Mike Lewis},
  title     = {Hierarchical Text Generation and Planning for Strategic Dialogue},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning,
               {ICML} 2018, Stockholmsm{\"{a}}ssan, Stockholm, Sweden, July
               10-15, 2018},
  pages     = {5587--5595},
  year      = {2018},
  crossref  = {DBLP:conf/icml/2018},
  url       = {http://proceedings.mlr.press/v80/yarats18a.html},
  timestamp = {Fri, 13 Jul 2018 14:58:25 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/icml/YaratsL18},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Dataset
We release our dataset together with the code; you can find it under data/negotiate. The dataset consists of 5808 dialogues, based on 2236 unique scenarios. See §2.3 of (2) to learn about the data collection.
Each dialogue is converted into two training examples, showing the complete conversation from the perspective of each agent. The two perspectives differ in their input goals, their output choices, and the special tokens marking whether a statement was read or written. See §3.1 of (2) for details on the data representation.
# Perspective of Agent 1
<input> 1 4 4 1 1 2 </input>
<dialogue> THEM: i would like 4 hats and you can have the rest . <eos> YOU: deal <eos> THEM: <selection> </dialogue>
<output> item0=1 item1=0 item2=1 item0=0 item1=4 item2=0 </output>
<partner_input> 1 0 4 2 1 2 </partner_input>
# Perspective of Agent 2
<input> 1 0 4 2 1 2 </input>
<dialogue> YOU: i would like 4 hats and you can have the rest . <eos> THEM: deal <eos> YOU: <selection> </dialogue>
<output> item0=0 item1=4 item2=0 item0=1 item1=0 item2=1 </output>
<partner_input> 1 4 4 1 1 2 </partner_input>
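As a quick illustration of this format, here is a minimal parsing sketch (our own code, not part of the repository; the get_tag helper is hypothetical). The six integers in <input> alternate count and value for the three item types, as the examples above show:
import re

def get_tag(example, tag):
    # Return the whitespace-separated tokens between <tag> and </tag>.
    return re.search(r'<{0}>(.*?)</{0}>'.format(tag), example).group(1).split()

example = ('<input> 1 4 4 1 1 2 </input> '
           '<dialogue> THEM: i would like 4 hats and you can have the rest . <eos> '
           'YOU: deal <eos> THEM: <selection> </dialogue> '
           '<output> item0=1 item1=0 item2=1 item0=0 item1=4 item2=0 </output> '
           '<partner_input> 1 0 4 2 1 2 </partner_input>')

goals = [int(x) for x in get_tag(example, 'input')]
counts, values = goals[0::2], goals[1::2]
print(counts, values)  # [1, 4, 1] [4, 1, 2]: 1 book worth 4, 4 hats worth 1, 1 ball worth 2
Note that both agents see the same item counts but different private values, which is what makes the negotiation non-trivial.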
Setup
All code was developed with Python 3 on CentOS Linux 7, and tested on Ubuntu 16.04. In addition, we used PyTorch 1.0.0, CUDA 9.0, and Visdom 0.1.8.4.
We recommend using Anaconda. To set up a working environment, follow the steps below:
# Create a conda environment
conda create -n py30 python=3 anaconda
# Activate environment
source activate py30
# Install PyTorch
conda install pytorch torchvision cuda90 -c pytorch
# Install Visdom if you want to use visualization
pip install visdom
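Before launching any of the training commands below with --cuda, it may help to verify the environment. This optional check is our suggestion, not part of the repository:
# Optional sanity check for the GPU setup.
import torch
print(torch.__version__)          # expect 1.0.0
print(torch.cuda.is_available())  # expect True with a working CUDA 9.0 install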
Usage
Supervised Training
Action Classifier
We use an action classifier to compare the performance of the various models. The action classifier is described in section 3 of (2), and can be trained by running the following command:
python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 7 \
--min_lr 1e-05 \
--model_type selection_model \
--momentum 0.1 \
--nembed_ctx 128 \
--nembed_word 128 \
--nhid_attn 128 \
--nhid_ctx 64 \
--nhid_lang 128 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--skip_values \
--sep_sel \
--model_file selection_model.th
Baseline RNN Model
This is the baseline RNN model that we describe in (1):
python train.py \
--cuda \
--bsz 16 \
--clip 0.5 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--model_type rnn_model \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 30 \
--min_lr 1e-07 \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_attn 64 \
--nhid_ctx 64 \
--nhid_lang 128 \
--nhid_sel 128 \
--sel_weight 0.6 \
--unk_threshold 20 \
--sep_sel \
--model_file rnn_model.th
Hierarchical Latent Model
In this section we provide guidelines on how to train the hierarchical latent model from (1). The final model requires two sub-models: the clustering model, which learns compact representations over intents, and the language model, which translates intent representations into language. Please read sections 5 and 6 of (1) for more details.
Clustering Model
python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.2 \
--init_range 0.3 \
--lr 0.001 \
--max_epoch 15 \
--min_lr 1e-05 \
--model_type latent_clustering_model \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--skip_values \
--nhid_cluster 256 \
--selection_model_file selection_model.th \
--model_file clustering_model.th
Language Model
python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 15 \
--min_lr 1e-05 \
--model_type latent_clustering_language_model \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--nhid_cluster 256 \
--skip_values \
--selection_model_file selection_model.th \
--cluster_model_file clustering_model.th \
--model_file clustering_language_model.th
Full Model
python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.2 \
--init_range 0.3 \
--lr 0.001 \
--max_epoch 10 \
--min_lr 1e-05 \
--model_type latent_clustering_prediction_model \
--momentum 0.2 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--selection_model_file selection_model.th \
--lang_model_file clustering_language_model.th \
--model_file full_model.th
Selfplay
To have two pretrained models negotiate against each other, use selfplay.py. For example, to make two RNN models play against each other:
python selfplay.py \
--cuda \
--alice_model_file rnn_model.th \
--bob_model_file rnn_model.th \
--context_file data/negotiate/selfplay.txt \
--temperature 0.5 \
--selection_model_file selection_model.th
The script will output generated dialogues, as well as some statistics. For example:
================================================================================
Alice : book=(count:3 value:1) hat=(count:1 value:5) ball=(count:1 value:2)
Bob : book=(count:3 value:1) hat=(count:1 value:1) ball=(count:1 value:6)
--------------------------------------------------------------------------------
Alice : i would like the hat and the ball . <eos>
Bob : i need the ball and the hat <eos>
Alice : i can give you the ball and one book . <eos>
Bob : i can't make a deal without the ball <eos>
Alice : okay then i will take the hat and the ball <eos>
Bob : okay , that's fine . <eos>
Alice : <selection>
Alice : book=0 hat=1 ball=1 book=3 hat=0 ball=0
Bob : book=3 hat=0 ball=0 book=0 hat=1 ball=1
--------------------------------------------------------------------------------
Agreement!
Alice : 7 points
Bob : 3 points
--------------------------------------------------------------------------------
dialog_len=4.47 sent_len=6.93 agree=86.67% advantage=3.14 time=2.069s comb_rew=10.93 alice_rew=6.93 alice_sel=60.00% alice_unique=26 bob_rew=4.00 bob_sel=40.00% bob_unique=25 full_match=0.78
--------------------------------------------------------------------------------
debug: 3 1 1 5 1 2 item0=0 item1=1 item2=1
debug: 3 1 1 1 1 6 item0=3 item1=0 item2=0
================================================================================
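The per-agent points are simply the dot product of an agent's private item values and the counts of the items it takes in the final split. A minimal sketch (ours, not the repository's scoring code) reproducing the numbers above:
# Each agent's score: sum over item types of value * count taken.
def score(values, taken):
    return sum(v * c for v, c in zip(values, taken))

# Alice: values book=1 hat=5 ball=2, takes the hat and the ball.
print(score([1, 5, 2], [0, 1, 1]))  # 7 points
# Bob: values book=1 hat=1 ball=6, takes the three books.
print(score([1, 1, 6], [3, 0, 0]))  # 3 points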
Reinforcement Learning
To fine-tune a pretrained model with reinforcement learning, use the reinforce.py script:
python reinforce.py \
--cuda \
--alice_model_file rnn_model.th \
--bob_model_file rnn_model.th \
--output_model_file rnn_rl_model.th \
--context_file data/negotiate/selfplay.txt \
--temperature 0.5 \
--verbose \
--log_file rnn_rl.log \
--sv_train_freq 4 \
--nepoch 4 \
--selection_model_file selection_model.th \
--rl_lr 0.00001 \
--rl_clip 0.0001 \
--sep_sel
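Under the hood, the fine-tuning follows the policy-gradient (REINFORCE) approach of (2): the log-probabilities of the agent's sampled tokens are scaled by the discounted end-of-dialogue reward. The following loss is an illustrative sketch of that update, not the repository's exact implementation; the discount factor of 0.95 is an assumption:
import torch

def reinforce_loss(logprobs, reward, gamma=0.95):
    # logprobs: log-probabilities of the tokens the agent sampled, in order.
    # reward: the agent's final score for the dialogue.
    loss, g = 0.0, reward
    for lp in reversed(logprobs):
        loss = loss - lp * g  # minimizing this ascends expected reward
        g = g * gamma         # discount actions far from the final outcome
    return loss

logprobs = [torch.log(torch.tensor(p)) for p in (0.9, 0.8, 0.7)]
print(reinforce_loss(logprobs, reward=7.0))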
License
This project is licensed under CC-BY-NC; see the LICENSE file for details.