Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Related tags

Deep Learning Knover
Overview

Knover

Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out efficient training/inference of large-scale dialogue generation models.

What's New:

  • February 2021: We have open-sourced our implementation (Team 19) for DSTC9 Track 1.
  • July 2020: We have open-sourced PLATO-2, a large-scale generative model with latent variables for open-domain dialogue systems.

Requirements and Installation

  • python version >= 3.7
  • paddlepaddle-gpu version >= 2.0.0
  • sentencepiece
  • termcolor
  • If you want to run distributed training, you'll also need NCCL
  • Install Knover locally:
git clone https://github.com/PaddlePaddle/Knover.git
cd Knover
pip3 install -e .
  • Or you can set PYTHONPATH only:
export PYTHONPATH=/abs/path/to/Knover:$PYTHONPATH
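
Whichever method you use, a quick sanity check can confirm that PaddlePaddle and Knover are importable and that a GPU is visible. This is a minimal sketch: paddle.utils.run_check() is standard PaddlePaddle, and the knover import simply verifies the local install.

import paddle
import knover  # should import without error after "pip3 install -e ." or setting PYTHONPATH

paddle.utils.run_check()            # verifies the PaddlePaddle installation (GPU/CPU)
print(paddle.device.get_device())   # e.g. "gpu:0" when paddlepaddle-gpu is active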

Basic usage

Disclaimer

This project aims to facilitate further research progress in dialogue generation. Baidu is not responsible for content generated by third parties using this pre-trained system.

Contact information

For help or issues using Knover, please submit a GitHub issue.

Comments
  • About training quality

    About training quality

    With 5.3 million samples I trained stage 1 and stage 2.1 from scratch, and the model still clearly produces safe responses and repetition. Is my training insufficient? Stage 1 was trained for 320,000 steps with batch_size=16, and stage 2.1 for 18,000 steps with batch_size=1024.

    [image]

    Picking a random candidate instead seems to work somewhat better. [image]

    opened by lonelydancer 31
  • Are there any special settings needed for distributed training? Single-machine multi-GPU works, but multi-machine multi-GPU establishes communication yet neither reports an error nor starts training

    Are there any special settings needed for distributed training? Single-machine multi-GPU works, but multi-machine multi-GPU establishes communication yet neither reports an error nor starts training

    Following the Paddle distributed training tutorial, the configuration is set to distributed_args="--ips 10.130.19.203,10.130.17.157 --selected_gpus 0,1". The two machines can establish communication but training never starts, and each GPU shows about 2 GB of memory in use. The following configuration trains normally: distributed_args="--ips 10.130.19.203 --selected_gpus 0,1".

    opened by jidlin 18
  • The missing full source code of plato-2

    The missing full source code of plato-2

    Hi, thanks for your great work! I explored the plato-2 directory and only found .sh files. May I ask where the .py files are, so I can try the chatbot interaction? Thanks for your help!

    opened by chikiuso 9
  • About training PLATO-XL

    About training PLATO-XL

    Thank you very much for open-sourcing the XL model. I tried to train my own XL model on 8 A100 GPUs (40 GB each), but the parameters are so large that memory is still insufficient. The PLATO-XL paper says: "Given the limited memory of each device, vanilla data parallelism cannot support the training of such a model with up to 11 billion parameters. As such, we adopt the sharded data parallelism (Rajbhandari et al., 2020) to eliminate memory redundancies, by partitioning the optimizer states, gradients and parameters across multiple devices." How can this training method, which partitions the model state across multiple GPUs, be implemented here?

    opened by guijuzhejiang 7
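
    As a rough illustration of the sharded data parallelism the paper refers to: PaddlePaddle's fleet API exposes a sharding strategy that partitions optimizer states, gradients and parameters across ranks. The sketch below is a generic static-graph configuration, not Knover's actual training entry point; option names and values are assumptions and can differ between Paddle versions.

    import paddle
    import paddle.distributed.fleet as fleet

    paddle.enable_static()
    strategy = fleet.DistributedStrategy()
    strategy.sharding = True                  # ZeRO-style sharded data parallelism
    strategy.sharding_configs = {
        "sharding_degree": 8,                 # partition states across 8 GPUs (assumption)
        "segment_broadcast_MB": 32,           # communication granularity (assumption)
    }
    fleet.init(is_collective=True, strategy=strategy)
    # ... then build the program and wrap the optimizer with fleet.distributed_optimizer(...)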
  • Chinese PLATO-2: training works on a single GPU, but single-machine multi-GPU training exits after a certain number of steps with no useful error message

    Chinese PLATO-2: training works on a single GPU, but single-machine multi-GPU training exits after a certain number of steps with no useful error message

    With 4 million Chinese dialogue samples, a single GPU can run a full epoch, but running on 4 GPUs on one machine exits after a certain number of steps.

    Environment: paddlepaddle-gpu==2.0.1, cuda==11.0, cudnn==8.0

    The terminal error is:

    INFO 2021-09-05 21:51:40,245 launch_utils.py:327] terminate all the procs
    ERROR 2021-09-05 21:51:40,245 launch_utils.py:584] ABORT!!! Out of all 4 trainers, the trainer process with rank=[3] was aborted. Please check its log.
    INFO 2021-09-05 21:51:43,248 launch_utils.py:327] terminate all the procs

    The error in work_log.3 is as follows:

    --------------------------------------
    C++ Traceback (most recent call last):
    --------------------------------------
    0   paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
    1   paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
    2   paddle::framework::details::FastThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
    3   paddle::framework::BlockingQueue<unsigned long>::Pop()
    4   paddle::framework::SignalHandle(char const*, int)
    5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
    
    ----------------------
    Error Message Summary:
    ----------------------
    FatalError: `Termination signal` is detected by the operating system.
      [TimeInfo: *** Aborted at 1630683022 (unix time) try "date -d @1630683022" if you are using GNU date ***]
      [SignalInfo: *** SIGTERM (@0x3e800000a5e) received by PID 2812 (TID 0x7f718f576b80) from PID 2654 ***]
    
    opened by jidlin 7
  • Suggestion: document how to resume training from a checkpoint

    Suggestion: document how to resume training from a checkpoint

    Training usually runs for a long time and is very likely to be interrupted and need to continue afterwards, so resuming training is a real necessity. Please add instructions for resuming training to the documentation.

    Also, resuming currently requires manually filling in the checkpoint path and the current start_step in the arguments, which is cumbersome. It would be better to save this information when the checkpoint is written, so that resuming can read it and automatically continue from the last step.

    enhancement 
    opened by onewaymyway 7
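
    A minimal sketch of the idea suggested above, using a hypothetical meta.json written next to the checkpoint (this is not Knover's current behavior; the file and key names are placeholders):

    import json, os

    def save_checkpoint_meta(ckpt_dir, step):
        """Record the last finished step next to the checkpoint files."""
        with open(os.path.join(ckpt_dir, "meta.json"), "w") as f:
            json.dump({"start_step": step}, f)

    def load_start_step(ckpt_dir, default=0):
        """Resume from the recorded step if the metadata file exists."""
        path = os.path.join(ckpt_dir, "meta.json")
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)["start_step"]
        return default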
  • ValueError: (InvalidArgument) Tensor holds the wrong type, it holds int, but desires to be int64_t.

    ValueError: (InvalidArgument) Tensor holds the wrong type, it holds int, but desires to be int64_t.

    The LIC2022 baseline source code runs fine on AI Studio. Running locally, train_query and infer_dial complete without errors; only infer_query raises the following error.

    paddlepaddle: 2.2.2, cuda: 11.2, cudnn: 8.2

    $ sh ./scripts/local/job.sh ./projects/lic2022/conf/query_infer.conf

    2022-04-18 15:40:25,456-INFO: [topology.py:169:init] HybridParallelInfo: rank_id: 0, mp_degree: 1, sharding_degree: 1, pp_degree: 1, dp_degree: 1, mp_group: [0], sharding_group: [0], pp_group: [0], dp_group: [0], check/clip group: [0]
    W0418 15:40:25.456908 14688 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
    W0418 15:40:25.472528 14688 device_context.cc:465] device: 0, cuDNN Version: 8.2.
    [WARN] Using constant learning rate because of warmup_steps is not positive while using NoamScheduler.
    Loading parameters from ./projects/lic2022/model_zoo/query_finetune.pdparams.
    Loading has done!
    Traceback (most recent call last):
      File "./knover/scripts/infer.py", line 140, in <module>
        infer(args)
      File "./knover/scripts/infer.py", line 83, in infer
        predictions = task.infer_step(model, data)
      File "e:\jupyternotebookproject\lic2022\knover\knover\core\task.py", line 46, in infer_step
        predictions = model.infer_step(inputs)
      File "e:\jupyternotebookproject\lic2022\knover\knover\core\model.py", line 508, in infer_step
        predictions = self._model(*inputs, mode="infer")
      File "e:\jupyternotebookproject\lic2022\knover\knover\core\model.py", line 180, in __call__
        outputs = self.infer_step(inputs)
      File "e:\jupyternotebookproject\lic2022\knover\knover\core\model.py", line 170, in infer_step
        predictions = self.infer(inputs, outputs)
      File "e:\jupyternotebookproject\lic2022\knover\knover\models\unified_transformer.py", line 297, in infer
        outputs = self.generator(self, inputs, outputs)
      File "e:\jupyternotebookproject\lic2022\knover\knover\modules\generator.py", line 163, in __call__
        state = self._update_state(state, probs)
      File "e:\jupyternotebookproject\lic2022\knover\knover\modules\generator.py", line 390, in _update_state
        state["predictions"] = paddle.concat([state["predictions"], pred], axis=1)
      File "E:\software\Anaconda3\envs\Knover\lib\site-packages\paddle\tensor\manipulation.py", line 345, in concat
        return paddle.fluid.layers.concat(input=x, axis=axis, name=name)
      File "E:\software\Anaconda3\envs\Knover\lib\site-packages\paddle\fluid\layers\tensor.py", line 327, in concat
        return _C_ops.concat(input, 'axis', axis)
    ValueError: (InvalidArgument) Tensor holds the wrong type, it holds int, but desires to be int64_t.
      [Hint: Expected valid == true, but received valid:0 != true:1.] (at ../paddle/fluid/framework/tensor_impl.h:33)
      [operator < concat > error]
    INFO 2022-04-18 15:40:36,201 launch_utils.py:341] terminate all the procs
    ERROR 2022-04-18 15:40:36,201 launch_utils.py:604] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
    INFO 2022-04-18 15:40:39,210 launch_utils.py:341] terminate all the procs
    INFO 2022-04-18 15:40:39,210 launch.py:311] Local processes completed.

    exit_code=0 [[ 0 != 0 ]] exit 0

    opened by chikin-lau 6
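
    The error above says an int32 tensor reached an op that expects int64. A generic workaround (a sketch, not an official fix) is to cast the newly generated ids to int64 before concatenation:

    import paddle

    predictions = paddle.to_tensor([[1, 2], [4, 5]], dtype="int64")  # decoded ids so far
    pred = paddle.to_tensor([[3], [7]], dtype="int32")               # newly sampled ids
    pred = paddle.cast(pred, "int64")                                # align dtypes before concat
    predictions = paddle.concat([predictions, pred], axis=1)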
  • About PLATO-KAG

    About PLATO-KAG

    First of all, thank you very much for your work. When I run the PLATO-KAG code as instructed, a problem occurs (screenshot attached). Would you mind answering this question? Thank you very much.

    opened by bingfeiz 6
  • Why must the vocab contain both [UNK] and <unk>?

    Why must the vocab contain both [UNK] and <unk>?

    Judging from the rules in the code, the vocab must contain both [UNK] and <unk>, otherwise an error is raised. Both tokens represent unknown words, so what is the difference between them? Also, in the example English vocab some tokens share the same ids, as listed below. I don't understand why; won't the duplicate ids overwrite each other? When building my own vocab, do I also need to reuse ids like this?

    <unk> 0
    <s> 1
    </s> 2
    [UNK] 0
    [PAD] 0
    [CLS] 1
    [SEP] 2

    enhancement 
    opened by guijuzhejiang 6
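
    For reference, a small hypothetical helper for inspecting a vocab file in the format quoted above, reporting missing special tokens and ids shared by multiple tokens:

    from collections import defaultdict

    def check_vocab(path, required=("<unk>", "[UNK]", "[PAD]", "[CLS]", "[SEP]")):
        """Return (missing special tokens, ids used by more than one token)."""
        id_to_tokens = defaultdict(list)
        tokens = set()
        with open(path, encoding="utf8") as f:
            for line in f:
                token, idx = line.split()
                tokens.add(token)
                id_to_tokens[int(idx)].append(token)
        missing = [t for t in required if t not in tokens]
        duplicates = {i: ts for i, ts in id_to_tokens.items() if len(ts) > 1}
        return missing, duplicates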
  • DSTC10-Track2/task2 inference code: Error while running the command 'bash ./submission_0_infer.sh'

    DSTC10-Track2/task2 inference code: Error while running the command 'bash ./submission_0_infer.sh'

    When running DSTC10-Track2/task2 (Knowledge-grounded Dialogue Modeling), I got the error message below. Please help me fix the following error.

    the error message:

    Load pretraining parameters from /home/Knover/projects/DSTC10-Track2/task2/models/SOP-32L-Detection
    Traceback (most recent call last):
      File "/home/Knover/knover/data/dialog_reader.py", line 578, in __wrapper__
        for batch in batch_reader():
      File "/home/Knover/knover/data/dialog_reader.py", line 517, in __wrapper__
        for batch in batch_reader():
      File "/home/Knover/knover/data/dialog_reader.py", line 432, in __wrapper__
        for record in reader():
      File "/home/Knover/knover/data/dialog_reader.py", line 369, in __wrapper__
        yield from self._read_numerical_file(fp, phase, is_infer)
    TypeError: _read_numerical_file() takes from 2 to 3 positional arguments but 4 were given
    WARNING:root:Your reader has raised an exception!
    Traceback (most recent call last):
    Exception in thread   File "./knover/scripts/infer.py", line 145, in <module>
    Thread-1:
    Traceback (most recent call last):
          File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    infer(args)
        for step, data in enumerate(infer_generator(), 1):
      File "/home/Knover/myenv/lib/python3.8/site-packages/paddle/fluid/reader.py", line 1392, in __next__
            return self._reader.read_next()self.run()
    
    SystemError:   File "/usr/lib/python3.8/threading.py", line 870, in run
    (Fatal) Blocking queue is killed because the data reader raises an exception.
      [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)
    

    Thank you for taking the time to review this.

    opened by JH-debug 5
  • When running inference with a fine-tuned Knover model, results on the same corpus differ on every run

    When running inference with a fine-tuned Knover model, results on the same corpus differ on every run

    Hello, I fine-tuned the Knover classifier on my own data and saved a checkpoint (we noticed the saved checkpoint contains 2612 parameter files, nearly four times more than the 522 of the provided SOP-32L-Context model), and then ran prediction with infer.sh from this checkpoint. However, the predictions on the same dataset differ on every run. Is this normal, and how can it be resolved?

    opened by luomou97 5
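
    If the run-to-run variation comes from sampling-based decoding, fixing the random seeds makes results reproducible. The snippet below is generic PaddlePaddle/Python seeding, not a Knover-specific switch:

    import random
    import numpy as np
    import paddle

    def set_seed(seed=42):
        """Seed every RNG that can influence decoding."""
        random.seed(seed)
        np.random.seed(seed)
        paddle.seed(seed)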
  • InvalidArgumentError: Broadcast dimension mismatch

    InvalidArgumentError: Broadcast dimension mismatch

    Hello, I trained a PLATO-2 model with Knover, but after deploying it to my backend with hub serving start, testing with JMeter produces an error on the client side. The error is:

    InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [20, 16, 20, 27] and the shape of Y = [20, 16, 1, 8]. Received [27] in X is not equal to [8] in Y at i:3. [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:84) [operator < elementwise_add > error]","results":"","status":"101"}

    My environment:
    Serving platform: paddlepaddle-gpu container, image paddlepaddle/paddle 2.3.2-gpu-cuda11.2-cudnn8
    Knover version: 0.0.6
    Number of GPUs: 1
    paddlehub version: 2.3.0

    I have already tested the following:
    1. export CUDA_VISIBLE_DEVICES=0
    2. Since running the script corresponding to interact.py locally works, I rewrote the data-loading part of module.py from an open-source AI Studio project to match the corresponding part of Knover/knover/core/model.py, but the error remains. Comparing the two runs, I found that in the local run the shape of the tensors converted from the data is fixed at 20, while in the hub-served process the tensor shape varies with the length of the tokenized text. I am not sure which part should be modified. Is there a known solution, or a deployment tutorial for plato2_en_base under this version?

    The AI Studio project mentioned in 2 is: https://aistudio.baidu.com/aistudio/projectdetail/1197592

    Thanks.

    opened by what-is-perfect 2
  • How is the service API described in Section 2.1 (Service Information) of the "Link the World" paper constructed?

    How is the service API described in Section 2.1 (Service Information) of the "Link the World" paper constructed?

    Thanks to Baidu for its continued work on Chinese dialogue.

    I would like to ask how the service API mentioned in the first paragraph of Section 2.1 (Service Information) of the paper "Link the World: Improving Open-domain Conversation with Dynamic Spatiotemporal-aware Knowledge" is actually constructed. After reading the paper, I feel that the construction of this service API is the most important part of the whole work; if it is built well enough, it can greatly improve the human-machine interaction experience. However, the paper does not seem to describe this part in detail or release the related code/data. I am really curious.

    Looking forward to your reply! Thanks.

    opened by cingtiye 0
  • How to build a Chinese multi-turn dialogue bot based on the existing open-source English plato-2 model

    How to build a Chinese multi-turn dialogue bot based on the existing open-source English plato-2 model

    Hi everyone,

    How can I build a Chinese multi-turn dialogue bot based on the existing open-source English plato-2 model? I have read the link below, but I still do not understand how to use the English plato-2 to build a plato-2 model suited to Chinese multi-turn dialogue. Could you provide more details, and could anyone who has already done this share some code for reference? Thank you.

    Link: https://github.com/PaddlePaddle/Knover/issues/25

    opened by ZeyuTeng96 0
SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

SEOVER-Master This code is the implementation of paper: SEOVER: Sentence-level Emotion Orientation Vector based Conversation Emotion Recognition Model

null 4 Feb 24, 2022
Feedback is important: response-aware feedback mechanism for background based conversation

RFM The code for the paper: "Feedback is important: response-aware feedback mechanism for background based conversation." Requirements python 3.7 pyto

Jiatao Chen 2 Sep 29, 2022
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 2, 2022
Code for CVPR2021 "Visualizing Adapted Knowledge in Domain Transfer". Visualization for domain adaptation. #explainable-ai

Visualizing Adapted Knowledge in Domain Transfer @inproceedings{hou2021visualizing, title={Visualizing Adapted Knowledge in Domain Transfer}, auth

Yunzhong Hou 80 Dec 25, 2022
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting (ICCV, 2021)

DKPNet ICCV 2021 Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting Baseline of DKPNet is availa

null 19 Oct 14, 2022
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques Installation PyPI pip install colossalai Install

HPC-AI Tech 7.1k Jan 3, 2023
Source codes for "Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs"

Structure-Aware-BART This repo contains codes for the following paper: Jiaao Chen, Diyi Yang:Structure-Aware Abstractive Conversation Summarization vi

GT-SALT 56 Dec 8, 2022
Rank1 Conversation Emotion Detection Task

Rank1-Conversation_Emotion_Detection_Task accuracy macro-f1 recall 0.826 0.7544 0.719 A conversation emotion detection task based on pre-trained models and time-series prediction. 1 Abstract: For the conversation emotion detection task, this work splits it into two subtasks, text classification and time-series prediction, and

Yuchen Han 2 Nov 28, 2021
Building Ellee — A GPT-3 and Computer Vision Powered Talking Robotic Teddy Bear With Human Level Conversation Intelligence

Using an object detection and facial recognition system built on MobileNetSSDV2 and Dlib and running on an NVIDIA Jetson Nano, a GPT-3 model, Google Speech Recognition, Amazon Polly and servo motors, I built Ellee - a robotic teddy bear who can move her head and converse naturally.

null 24 Oct 26, 2022
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus General info This is

null 71 Oct 25, 2022
Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

EleutherAI 432 Dec 16, 2022
ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

Hao Su's Lab, UCSD 48 Dec 30, 2022
Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)

Large-Scale Long-Tailed Recognition in an Open World [Project] [Paper] [Blog] Overview Open Long-Tailed Recognition (OLTR) is the author's re-implemen

Zhongqi Miao 761 Dec 26, 2022
OSLO: Open Source framework for Large-scale transformer Optimization

O S L O Open Source framework for Large-scale transformer Optimization What's New: December 21, 2021 Released OSLO 1.0. What is OSLO about? OSLO is a

TUNiB 280 Nov 24, 2022
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

ResDAVEnet-VQ Official PyTorch implementation of Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech What is in this repo? M

Wei-Ning Hsu 21 Aug 23, 2022
ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

ALFRED A Benchmark for Interpreting Grounded Instructions for Everyday Tasks Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han,

ALFRED 204 Dec 15, 2022
PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner [Li et al., 2020].

VGPL-Visual-Prior PyTorch implementation for the visual prior component (i.e. perception module) of the Visually Grounded Physics Learner (VGPL). Give

Toru 8 Dec 29, 2022
[CVPR'22] Official PyTorch Implementation of Collaborative Transformers for Grounded Situation Recognition

[CVPR'22] Collaborative Transformers for Grounded Situation Recognition Paper | Model Checkpoint This is the official PyTorch implementation of Collab

Junhyeong Cho 29 Dec 10, 2022
An algorithm that handles large-scale aerial photo co-registration, based on SURF, RANSAC and PyTorch autograd.

An algorithm that handles large-scale aerial photo co-registration, based on SURF, RANSAC and PyTorch autograd.

Luna Yue Huang 41 Oct 29, 2022