Crowd-sourced training data for Rasa NLU models

Overview

NLU Training Data

Crowd-sourced training data for the development and testing of Rasa NLU models.

If you're interested in grabbing some data, feel free to check out our live data-fetching UI.


About this repository

This is an experiment with the goal of providing basic training data for developing chatbots. The repository is therefore open for contributions!

We need your help to create an open source dataset to empower chatbot makers and conversational AI enthusiasts alike, and we very much appreciate your support in expanding the collection of data available to the community.

How do I donate my training data?

Each folder should contain a list of multiple intents. Before creating a new folder, consider whether the training data you're contributing could fit within an existing one.

To contribute via pull request, follow these steps:

  1. Create an issue describing the training data you would like to contribute.

  2. Create a new folder with a descriptive title and an NLU.yml file inside it, or contribute to an existing folder.

  3. In the NLU.yml file, format your training data using YAML, remove all entities (see script), title each section with the intent type, and add a short description, e.g. intent:inform_rain <!--The user says that it is currently raining somewhere.-->

  4. Update the README.md file to include a list of the intent types added.

  5. Create a pull request describing your changes.
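
As a rough illustration of step 3, the sketch below renders intents and their examples into YAML text shaped like Rasa's NLU training data format. The helper name render_nlu_yaml, the version string "2.0", and the use of a YAML "#" comment for the short description are assumptions made for this example; check the existing folders for the exact conventions the maintainers expect.

```python
from typing import Dict, List, Tuple


def render_nlu_yaml(
    intents: Dict[str, Tuple[str, List[str]]], version: str = "2.0"
) -> str:
    """Render {intent: (description, examples)} as NLU YAML text.

    Hypothetical helper: the version default and the comment style used
    for descriptions are assumptions, not rules stated by this repository.
    """
    lines = [f'version: "{version}"', "nlu:"]
    for name, (description, examples) in intents.items():
        # Short description of the intent, written as a YAML comment
        lines.append(f"# {description}")
        lines.append(f"- intent: {name}")
        # Examples go into a literal block scalar, one "- " item per line
        lines.append("  examples: |")
        for example in examples:
            lines.append(f"    - {example}")
    return "\n".join(lines) + "\n"


sample = render_nlu_yaml(
    {
        "inform_rain": (
            "The user says that it is currently raining somewhere.",
            ["it is raining", "it's pouring outside"],
        )
    }
)
print(sample)
```

The printed text can be pasted into an NLU.yml file and adjusted by hand.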

Your pull request will be reviewed by a maintainer, who will get back to you about any necessary changes or questions. You will also be asked to sign a Contributor License Agreement.

FAQs

How should I label my intents?

Please always put the domain at the end of each intent. For example: ask_transport

What do I do about multi-intent utterances?

If you would like to contribute multi-intent utterances, please add a + to indicate an additional intent, for example: affirm+ask_transport
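
To make the two labelling conventions above concrete, here is a small sketch that splits a multi-intent label on "+" and reads the domain from the end of a single intent. The function names are hypothetical, and the domain lookup assumes the domain is the final underscore-separated word, which will not hold for multi-word domains.

```python
from typing import List, Optional


def split_intents(label: str) -> List[str]:
    """Split a label such as 'affirm+ask_transport' into single intents."""
    return [part for part in label.split("+") if part]


def intent_domain(intent: str) -> Optional[str]:
    """Return the trailing domain of an intent, or None if there is none.

    Assumes the domain is the last underscore-separated token.
    """
    _, separator, domain = intent.rpartition("_")
    return domain if separator else None


print(split_intents("affirm+ask_transport"))
print(intent_domain("ask_transport"))
```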

What about training data that’s not in English?

Currently, we are unable to evaluate the quality of all language contributions, and therefore, during the initial phase we can only accept English training data to the repository. However, we understand that the Rasa community is a global one, and in the long-term we would like to find a solution for this in collaboration with the community.

Why do I need to remove entities from my training data?

We would like to make the training data as easy as possible to adapt to new models, and entity annotation is highly dependent on your bot's purpose. Therefore, we will first focus on collecting training data that only includes intents.

To help you remove the annotated entities from your training data, you can run this script.
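
For readers who want to see what removing an annotation involves, the following is a minimal regex-based sketch. It is not the repository's script. It assumes entities are written either as [text](entity) or as [text]{"entity": "..."} and keeps only the bracketed text; nested brackets or braces are not handled.

```python
import re

# Matches "[text](entity)" and "[text]{...}" and captures the text part.
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\](?:\([^)]*\)|\{[^}]*\})")


def strip_entities(text: str) -> str:
    """Replace each annotated entity with its plain surface text."""
    return ENTITY_PATTERN.sub(r"\1", text)


print(strip_entities("book a table in [Berlin](city)"))
print(strip_entities('fly to [NYC]{"entity": "city", "value": "New York"}'))
```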


About Rasa

Comments
  • Adding new intents + training examples

    Adding new intents, namely: online_banking, retirement_plan, change_pin, interest_rate. Also contributing with some extra examples on transfer_money. I'd suggest the duplicate intents to be merged together.

    opened by luiseloi 5
  • Add more currencies and more examples for the existing intents

    I have gathered natural language inputs from 26 people from 9 different countries; you can see a sample in the file here. I have also gathered some new intents from them, but I still need to analyze the data more thoroughly.

    opened by jusce17 5
  • New Folder: Fixing Internet, and additions to existing folders

    New Folder: Fixing Internet. 4 Intents + Readme

    Retail: 2 new intents + additional utterances to existing intent

    Smalltalk: 1 new intent + renamed intents to follow convention + added utterances to existing intent

    opened by pdrabinski 4
  • Add Telecom Training Data

    Hi,

    I'd like to add a folder for "fixing internet" that would include intents such as "internet_is_down", "internet_is_slow", "need_a_new_router", etc.

    Best,

    Paul

    opened by pdrabinski 4
  • Adding examples to mood nlu.md

    Please do let me know if there is a number of examples to add. Also, any specification of domain, or general mood tweets.

    Looking forward to contribution

    opened by shaz13-socgen 4
  • Right single quotation mark being used instead of apostrophe

    Looking through the training data, I noticed that the right single quotation mark (U+2019) is being used in several places instead of the apostrophe ', for instance:

        - I’d like to make a transfer
        - I’d like to send money
    

    I imagine that this would have an effect on downstream ML processes as the apostrophe is more commonly recognized in this context. Happy to open a pull request if this is an issue.
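
    One way the normalization described in this issue could be done is sketched below. The function name is hypothetical, and the sketch only replaces U+2019 with the ASCII apostrophe; whether other typographic quotes should also be normalized is left open.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # the character reported in the issue


def normalize_apostrophes(text: str) -> str:
    """Replace right single quotation marks with ASCII apostrophes."""
    return text.replace(RIGHT_SINGLE_QUOTE, "'")


print(normalize_apostrophes("I\u2019d like to make a transfer"))
```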

    opened by happilyeverafter95 3
  • Added More Mood Training Examples and Intents

    Added more training examples for mood_happy, mood_unhappy, mood_angry and mood_nervous. Added 4 new moods:

    • mood_bored
    • mood_excited
    • mood_lonely
    • mood_tired
    opened by saumitrasapre 3
  • Added more top traded currencies and more data for intents

    Here are the things that I added:

    • Added more top-traded currencies that were missing from the list

    • Added some common messages that users give for show_accounts intent and transfer_money intent

    opened by navendu-pottekkat 3
  • script should use rasa methods

    https://github.com/RasaHQ/NLU-training-data/blob/f99c3ad15df7bdd98788062f2ac4d36f5dac5175/how-to-remove-entities/entity-remove-script#L12

    @JEM-Mosig (@EmmaWightman said you wrote this script, right?)

    This should use Rasa's built-in methods rather than a regex, so that it also works for the JSON format. Here's a rough example:

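    # Rough example; this import path matches Rasa 1.x and may differ in later versions.
    from rasa.nlu.training_data import load_data

    # Load the annotated training data (Markdown or JSON).
    data = load_data('test.md')

    # Drop the entity annotations from every training example.
    for e in data.training_examples:
        e.set("entities", [])

    # Write the entity-free data back out as Markdown.
    with open('no_entities.md', 'w') as f:
        f.write(data.nlu_as_markdown())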
    
    opened by amn41 3
  • adds 54 new intents to smalltalk and 5 new intents to mood + extra training data for a few out of the box intents

    I've added more data to mood and smalltalk categories.

    Mood

    • added data to mood_unhappy
    • added data to mood_great

    New mood intents

    • mood_angry
    • mood_busy
    • mood_cannotsleep
    • mood_excited

    Small Talk

    • added data to greet
    • added data to goodbyes
    • added data to affirm

    New intents to SmallTalk

    • added deny
    • agent_acquaintance
    • agent_age
    • agent_annoying
    • agent_bad
    • agent_beclever
    • agent_beautiful
    • agent_birthday
    • agent_boring
    • agent_boss
    • agent_busy
    • agent_canyouhelp
    • agent_chatbot
    • agent_clever
    • agent_crazy
    • agent_fire
    • agent_funny
    • agent_good
    • agent_happy
    • agent_hobby
    • agent_hungry
    • agent_marryuser
    • agent_myfriend
    • agent_occupation
    • agent_origin
    • agent_ready
    • agent_real
    • agent_residence
    • agent_right
    • agent_sure
    • agent_talktome
    • agent_there
    • appraisal_bad
    • appraisal_good
    • appraisal_noproblem
    • appraisal_thankyou
    • appraisal_welcome
    • dialog_holdon
    • dialog_hug
    • dialog_idontcare
    • dialog_sorry
    • greetings_howareyou
    • greetings_nicetomeetyou
    • greetings_nicetoseeyou
    • greetings_nicetotalktoyou
    • user_angry
    • user_back
    • user_bored
    • user_busy
    • user_cannotsleep
    • user_excited
    • user_likeagent
    • user_testing
    • user_lovesagent
    • user_needsadvice

    Any questions, let me know.

    opened by jwheat 3
  • Added crypto_currency as intent to banking/nlu.md & fixed grammar errors in the banking/nlu.md file & added Vietnamese Dong as data for currency intent

    1. Added crypto_currency as intent to banking/nlu.md
    2. fixed grammar errors in the banking/nlu.md file
    3. added Vietnamese Dong as data for currency intent

    #58

    opened by congnguyendinh0 2
  • Adding more training data about Banking

    Hi,

    I added more examples of training data about banking mixing also with their contribution, even though they closed their pull request.

    Issue: https://github.com/RasaHQ/contributors/issues/17

    opened by guilhermemoraisr 1
  • Adding data to the restaurant folder

    More training data is currently being added to the restaurant folder.

    New intents:

    • restaurant_opening_hours

    Modified intents:

    • locate_restaurant
    • restaurant_type
    • table_reservation
    • restaurant_review

    Related issue: #77

    opened by AlvaroLeles 1
  • Bump streamlit from 0.76.0 to 1.11.1

    Bumps streamlit from 0.76.0 to 1.11.1.

    Release notes

    Sourced from streamlit's releases.

    1.11.1

    No release notes provided.

    1.11.0

    No release notes provided.

    1.10.0

    No release notes provided.

    1.9.2

    No release notes provided.

    1.9.1

    No release notes provided.

    1.9.0

    No release notes provided.

    1.8.1

    No release notes provided.

    1.8.0

    No release notes provided.

    1.7.0

    • ❄️ Add st.snow()!

    1.6.0

    • 🗜 WebSocket compression is now disabled by default, which will improve CPU and latency performance for large dataframes. You can use the server.enableWebsocketCompression  configuration option to re-enable it if you find the increased network traffic more impactful.
    • ☑️ 🔘 Radio and checkboxes improve focus on Keyboard navigation (#4308)

    1.5.1

    No release notes provided.

    1.5.0

    Release date: Jan 27, 2022

    Notable Changes

    • 🌟 Favicon defaults to a PNG to allow for transparency (#4272).
    • 🚦 Select Slider Widget now has the disabled parameter that removes interactivity (completing all of our widgets) (#4314).

    Other Changes

    • 🔤 Improvements to our markdown library to provide better support for HTML (specifically nested HTML) (#4221).
    • 📖 Expanders maintain their expanded state better when multiple expanders are present (#4290).
    • 🗳 Improved file uploader and camera input to call its on_change handler only when necessary (#4270).

    1.4.0

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
Owner
Rasa
Open source machine learning tools for developers to build, improve, and deploy text- and voice-based chatbots and assistants
Rasa
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
COVID-19 Chatbot with Rasa 2.0: open source conversational AI

COVID-19 chatbot implementation with Rasa open source 2.0, conversational AI framework.

Aazim Parwaz 1 Dec 23, 2022
🤖 Basic Financial Chatbot with handoff ability built with Rasa

Financial Services Example Bot This is an example chatbot demonstrating how to build AI assistants for financial services and banking with Rasa. It in

Mohammad Javad Hossieni 4 Aug 10, 2022
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Speech-Backbones This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab. Grad-TTS Official implementation of the Grad-

HUAWEI Noah's Ark Lab 295 Jan 7, 2023
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Jan 3, 2023
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 10.8k Feb 18, 2021
Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit, supports multi-class and multi-label classification tasks for Chinese long and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

null 186 Dec 24, 2022
Code for PED: DETR For (Crowd) Pedestrian Detection

Code for PED: DETR For (Crowd) Pedestrian Detection

null 36 Sep 13, 2022
This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

EleutherAI 42 Dec 13, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

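Background Installation Quick start (1) Pre-trained models (2) Machine translation (3) Text classification TenTrans advanced 1. Multilingual machine translation 2. Cross-lingual pre-training Background TenTrans is a unified end-to-end multilingual, multi-task pre-training platform that supports multiple pre-training methods as well as sequence generation and natural language understanding tasks. Installation git clone git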

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 9.7k Jan 9, 2023
🏖 Easy training and deployment of seq2seq models.

Headliner Headliner is a sequence modeling library that eases the training and in particular, the deployment of custom sequence models for both resear

Axel Springer Ideas Engineering GmbH 231 Nov 18, 2022
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 7k Feb 18, 2021
🏖 Easy training and deployment of seq2seq models.

Headliner Headliner is a sequence modeling library that eases the training and in particular, the deployment of custom sequence models for both resear

Axel Springer Ideas Engineering GmbH 220 Feb 10, 2021
Ongoing research training transformer language models at scale, including: BERT & GPT-2

What is this fork of Megatron-LM and Megatron-DeepSpeed This is a detached fork of https://github.com/microsoft/Megatron-DeepSpeed, which in itself is

BigScience Workshop 316 Jan 3, 2023
Ongoing research training transformer language models at scale, including: BERT & GPT-2

Megatron (1 and 2) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA.

NVIDIA Corporation 3.5k Dec 30, 2022
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

QData 17 Oct 15, 2022