Code for our paper "Semantic Representation for Dialogue Modeling" in ACL 2021

Overview

AMR-Dialogue

An implementation of the paper "Semantic Representation for Dialogue Modeling". You may find our paper here.

Requirements

  • python 3.6
  • pytorch 1.6
  • Tesla V100 (32G)
  • Memory > 150G

We recommend using conda to manage virtual environments:

conda create --name <env> --file requirements.txt
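
If the command above fails with PackagesNotFoundError, a likely cause is that the requirements file was produced by conda list --export, which records pip-installed packages with a pypi_0 build string (for example nlp=0.4.0=pypi_0) that conda cannot resolve from its channels. The following is a minimal sketch, not part of this repository, that splits such a file into conda and pip dependencies and renders an environment.yml. The function names and the environment name are hypothetical.

```python
def split_conda_export(lines):
    """Split `conda list --export` lines into (conda_deps, pip_deps)."""
    conda_deps, pip_deps = [], []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the comment header written by conda.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        name = parts[0]
        if len(parts) < 2:
            # No version pinned; keep the bare package name.
            conda_deps.append(name)
            continue
        version = parts[1]
        build = parts[2] if len(parts) > 2 else ""
        if build.startswith("pypi"):
            # Installed with pip: use pip's `==` pin syntax.
            pip_deps.append(f"{name}=={version}")
        else:
            conda_deps.append(f"{name}={version}")
    return conda_deps, pip_deps


def to_environment_yaml(env_name, conda_deps, pip_deps):
    """Render the two dependency lists as environment.yml text."""
    out = [f"name: {env_name}", "dependencies:"]
    out += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        out.append("  - pip")
        out.append("  - pip:")
        out += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(out) + "\n"


# Example usage (file names are assumptions):
# with open("requirements.txt") as f:
#     conda_deps, pip_deps = split_conda_export(f)
# with open("environment.yml", "w") as f:
#     f.write(to_environment_yaml("amr-dialogue", conda_deps, pip_deps))
```

The resulting file can then be used with conda env create -f environment.yml. Some pip entries may still need manual handling; for instance, apex may have to be installed from source rather than from PyPI.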

Data

We provide preprocessed data for two tasks here [tobeadded].

Preprocessing

bash /path/to/code/preprocess.sh

Training

bash /path/to/code/run-dual(hier).sh

Evaluation

bash /path/to/code/eval.sh                   # for dialogue relation extraction
bash /path/to/code/decode.sh                 # for dialogue response generation

Todo

  • upload preprocessed data
  • upload trained checkpoint
  • clean code

References

@inproceedings{bai-etal-2020-online,
    title = "Semantic Representation for Dialogue Modeling",
    author = "Bai, Xuefeng  and 
      Chen, Yulong and
      Song, Linfeng  and
      Zhang, Yue",
    booktitle = "Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "todo",
    doi = "todo",
    pages = "todo",
}
Comments
  • tokenize and bpe for src and tgt in DialogRG

    Hello! I have two questions regarding the processing of src and tgt files in DialogRG.

    1. For src and tgt, did your work follow Zhu et al. 2019 in using the PTBTokenizer for tokenization and then applying BPE with the code from subword-nmt? I plan to run BPE with the commands below, but I'm not sure which num_operations you used.
    subword-nmt learn-bpe -s {num_operations} < {train_file} > {codes_file}
    subword-nmt apply-bpe -c {codes_file} < {dev_file} > {out_file}
    subword-nmt apply-bpe -c {codes_file} < {test_file} > {out_file}
    
    2. I noticed that DialogRG/dataset_utils.py also includes a bert_tokenize(tokenizer, src) function. Does this mean we can also use bert_tokenizer instead of BPE on the src and tgt files?

    Thank you again for your reply to my previous questions!

    opened by Bobby-Hua 5
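
To make the role of num_operations in the question above concrete: in BPE it is the number of merge rules learned from the training text, so it controls how far characters are merged into longer subword units. The following is a toy, self-contained sketch of BPE learning and application. It is not subword-nmt's implementation and does not reflect the settings used in this repository; the tie-breaking rule and the `</w>` end-of-word marker are assumptions made for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by its concatenation."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_operations):
    """Learn up to `num_operations` merge rules from a list of words."""
    # Start from characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_operations):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically (an assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe(["low"] * 5 + ["lower"] * 2, num_operations=3)
print(merges)
print(apply_bpe("lower", merges))
```

With only three merges, the frequent word "low" becomes a single unit while "lower" is still split into several symbols; a larger num_operations would merge more of it.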
  • Examples of data process

    Hi, thanks for giving instructions about the data processing in #1. However, I'm not familiar with AMR. Could you give a simple real example of step 2 and step 6? For example, an original AMR graph and the processed AMR graph with the dummy node and speaker edges? I don't know what the format looks like.

    opened by sangyx 4
  • data process

    Hi! I don't know how to preprocess the raw data, and the preprocessing code is not included in this project. Do you have this part of the code? Looking forward to your reply.

    opened by techzz 3
  • Data preprocessing

    Hi, I am trying to reproduce the preprocessing pipeline in #1, and I have a few questions:

    1. For neural coreference, what setting was used? Was the speaker name (speaker1, speaker2) included in the text? The demo at https://huggingface.co/coref/?text=Could%20I%20have%20my%20bill%2C%20please%3F%20Certainly%2C%20sir.%20I%E2%80%99m%20afraid%20there%20has%20been%20a%20mistake.What%20could%20it%20be%3F seems to produce a different result from the one in the paper.
    2. For step 6, how were the edges added? Were they added directly to the AMR or encoded in some other way?
    3. For AMR-simplifier, what's the difference between the file posted in the link and the neural-amr repo(https://github.com/sinantie/NeuralAmr)? Thanks!
    opened by dzy49 1
  • Questions about Preprocessed Data

    Hi. I have three questions about this repository:

    1. When will the preprocessed DialogRG data be uploaded?
    2. DialogRE has two versions, of which version 2 was obtained by correcting some of the annotation errors in version 1. Is it necessary to experiment on version 1?
    3. Can you give some explanations about the contents of the preprocessed data, such as the contents of train.mask and train.path? This will be very helpful in understanding your code.

    Thank you for your meaningful paper.

    opened by KuanKuanQAQ 1
  • question for DailyDialogue data

    Hi, how do you convert the DailyDialogue data to the training format, such as the .mask, .tgt, and .rel files? I am somewhat confused about relations and concepts. Is there any code you could share?

    opened by changleilei 0
  • Problems creating the environment

    Hi, @freesunshine0316, I recently tried to run your project, but when I run the command conda create --name <env_name> --file requirements.txt, the following errors occur:

    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: failed
    
    PackagesNotFoundError: The following packages are not available from current channels:
    nlp=0.4.0=pypi_0
    apex=0.1=pypi_0
    bpemb=0.3.2=pypi_0
    .......
    

    These problems may be caused by conda list --export, which does not capture pip-installed packages in a form that allows the environment to be recreated. A resolution can be found at conda export yaml.

    Thanks!

    opened by ZipGao 0
Owner
xfbai