K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce (EMNLP Findings 2021)

Overview

Introduction

K-PLUG is a knowledge-injected pre-trained language model for natural language understanding and generation in e-commerce, introduced in the Findings of EMNLP 2021 paper of the same name.

Installation

  • PyTorch version >= 1.5.0
  • Python version >= 3.6
git clone https://github.com/pytorch/fairseq.git
cd fairseq 
pip install --editable ./
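
To sanity-check the environment afterwards (a quick check, not part of the official instructions):

python -c "import torch, fairseq; print(torch.__version__, fairseq.__version__)"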

Pre-training

Prepare the binarized pre-training data (a rough sketch is shown below), then launch pre-training with train.sh.
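
The training script reads binarized fairseq data from data/kplug/bin/part* (see DATA_DIR below). As a hedged sketch only, one shard of raw, pre-tokenized text could be binarized with fairseq-preprocess; the dictionary file and train/valid file names here are assumptions, not the repository's official layout:

# Hypothetical data-preparation sketch: binarize one shard into data/kplug/bin/part0.
# dict.txt and the raw file paths are placeholders.
fairseq-preprocess \
    --only-source \
    --srcdict dict.txt \
    --trainpref data/kplug/raw/part0/train.txt \
    --validpref data/kplug/raw/part0/valid.txt \
    --destdir data/kplug/bin/part0 \
    --workers 16

The train.sh script below then runs multi-task pre-training over all shards: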

export CUDA_VISIBLE_DEVICES=0,1,2,3

function join_by { local IFS="$1"; shift; echo "$*"; }
DATA_DIR=$(join_by : data/kplug/bin/part*)

USER_DIR=src
TOKENS_PER_SAMPLE=512
WARMUP_UPDATES=10000
PEAK_LR=0.0005
TOTAL_UPDATES=125000
#MAX_SENTENCES=8
MAX_SENTENCES=16
UPDATE_FREQ=16   # batch_size=update_freq*max_sentences*nGPU = 16*16*4 = 1024

SUB_TASK=mlm_clm_sentcls_segcls_titlegen
## ablation settings: each of the following drops one pre-training objective
#SUB_TASK=clm_sentcls_segcls_titlegen
#SUB_TASK=mlm_sentcls_segcls_titlegen
#SUB_TASK=mlm_clm_sentcls_segcls
#SUB_TASK=mlm_clm_segcls_titlegen
#SUB_TASK=mlm_clm_sentcls_titlegen

fairseq-train $DATA_DIR \
    --user-dir $USER_DIR \
    --task multitask_lm \
    --sub-task $SUB_TASK \
    --arch transformer_pretrain_base \
    --min-loss-scale=0.000001 \
    --sample-break-mode none \
    --tokens-per-sample $TOKENS_PER_SAMPLE \
    --criterion multitask_lm \
    --apply-bert-init \
    --max-source-positions 512 --max-target-positions 512 \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $PEAK_LR \
    --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ \
    --ddp-backend=no_c10d \
    --tensorboard-logdir tensorboard \
    --classification-head-name pretrain_head --num-classes 40 \
    --tagging-head-name pretrain_tag_head --tag-num-classes 2 \
    --fp16
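
Training progress can be monitored from the TensorBoard logs written to the tensorboard directory configured above, e.g.:

tensorboard --logdir tensorboard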

Fine-tuning and Inference

Finetuning on JDDC (Response Generation)

Finetuning on ECD Corpus (Response Retrieval)

Finetuning on JD Product Dataset (Abstractive Summarization)

Finetuning on MEPAVE Dataset (Sequence Tagging)
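
The per-dataset fine-tuning commands are not reproduced here. As a rough sketch only (paths, task name, and hyper-parameters are assumptions, not the repository's official settings), fine-tuning the pre-trained checkpoint for abstractive summarization with standard fairseq components might look like:

# Hypothetical fine-tuning sketch for abstractive summarization (JD Product Dataset).
# Assumes the summarization pairs were binarized into data/jdsum/bin with fairseq-preprocess.
fairseq-train data/jdsum/bin \
    --user-dir src \
    --task translation \
    --arch transformer_pretrain_base \
    --restore-file checkpoints/checkpoint_best.pt \
    --reset-optimizer --reset-dataloader --reset-meters \
    --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 \
    --lr 3e-5 --lr-scheduler polynomial_decay --warmup-updates 500 --total-num-update 20000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --update-freq 4 \
    --save-dir checkpoints_jdsum \
    --fp16

# Hypothetical inference sketch with beam search.
fairseq-generate data/jdsum/bin \
    --user-dir src \
    --task translation \
    --path checkpoints_jdsum/checkpoint_best.pt \
    --beam 5 --batch-size 32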

You might also like...
Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

🐥A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

PyTorch implementation of OpenAI's Finetuned Transformer Language Model This is a PyTorch implementation of the TensorFlow code provided with OpenAI's

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING - The Facebook paper about fine tuning RoBERTa with contrastive loss

Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"

pair-emnlp2020 Official repository for the paper: Xinyu Hua and Lu Wang: PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI

Language Emergence in Multi Agent Dialog Code for the Paper Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog Satwik Kottur, José M.

Comments
  • Pre-training sample data and fine-tuning result data

    1. As the title says: pre-training sample data. The data_sample folder only contains data for the fine-tuning tasks; could you also provide sample data for the pre-training tasks?
    2. After running the demo fine-tuning task, the results differ considerably from the provided examples, as shown below:

    S-167 大 松 格 力 电 饭 煲 煲 多 功 能 电 饭 锅 家 用 4 升 无 可 挑 剔 的 口 感 , 真 正 的 中 国 好 米 饭 , 满 足 人 类 对 米 饭 的 需 求 , 双 层 合 金 内 加 , 一 产 品 全 家 福 , 多 种 烹 饪 模 式 智 能 随 心 控 , 匠 心 精 神 , 让 你 拥 有 , 让 您 拥 有 无 可 挑 剔 的 口 感 , 3 段 智 能 保 温 技 术 , 点 阵 液 晶 显 示 智 能 触 摸 操 控 , - 产 品 细 节 , 24 小 时 美 味 预 约 , 保 温 盖 板 可 拆 洗 , 垂 涎 欲 滴 的 晚 餐 , 香 甜 软 糯 的 早 餐 , 我 们 给 您 配 置 了 , 早 上 想 喝 粥 , 不 吃 米 饭 会 缺 爱 , 主 机 * 1 、 量 杯 * 1 、 内 胆 * 1 , 产 品 尺 寸 37 ##5 * 27 ##0 * 21 ##5 ##mm , 高 温 穿 透 米 芯 , 从 底 部 发 热 , 磁 力 穿 透 锅 体 , 突 然 一 阵 饭 香 扑 面 而 来 , 晚 上 想 吃 米 饭 , 米 饭 快 速 升 温 , 汤 勺 , 煲 仔 饭 , 量 杯 , 额 定 电 压 22 ##0 ##v , 古 法 烹 饪 , 保 留 食 材 原 汁 原 味 , 米 饭 口 感 五 颗 星 , 预 约 10 小 时 , 米 饭 放 入 锅 中 , 精 华 煮 , 煲 汤 , 蒸 煮 , 产 品 容 量 4 ##l , 右 手 还 在 关 门 , 别 扭

    T-167 电 磁 感 应 加 热 标 准 火 候 控 制 , 米 饭 表 面 柔 软 平 整 口 感 软 儒 , 三 段 智 能 保 温 技 术 循 环 保 温 , 全 面 锁 住 营 养 不 流 失 , 焖 出 米 饭 好 口 感 ; 双 层 合 金 内 胆 古 法 烹 饪 , 保 留 食 材 原 汁 原 味 米 饭 口 感 五 颗 星 。

    H-167 -0.8923664689064026 。 格 力 电 饭 煲 煲 多 功 能 电 饭 煲 家 用 4 。 无 可 挑 剔 的 口 碑 多 功 能 电 饭 煲 家 4 。 无 可 挑 剔 , 家 用 4 多 功 能 电 饭 煲 家 用 4

    The output contains repeated sentences; could this be because the training data is small and contains a lot of duplicated content?

    opened by AlexYoung757 6
  • What are the train/val/test splits of JDDC in your experiment?

    Hi, thanks for your nice work. I obtained the JDDC corpus from the original authors, but the dataset only contains the columns "session_id user_id role sku content intent". I am wondering how you split the data in your experiments.

    opened by NonvolatileMemory 5
Owner
Xu Song
Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

ERICA Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive L

THUNLP 75 Nov 2, 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Salesforce 1.3k Dec 31, 2022
The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

Dinghan Shen 49 Dec 22, 2022
Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

Wasi Ahmad 138 Dec 30, 2022
Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Knowledge-Inheritance Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq fo

THUNLP 31 Nov 19, 2022
Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

THUDM 101 Dec 16, 2022
A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Awesome Pretrained StyleGAN2 A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution. Note the readme is a

Justin 1.1k Dec 24, 2022
《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters This repository is the implementation of the paper "K-Adapter: Infusing Knowledge

Microsoft 118 Dec 13, 2022
Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor

Pranaydeep Singh 22 Dec 8, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021