ruPrompts

A high-level yet extensible library for fast language model tuning via automatic prompt search.

Overview

ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with the HuggingFace Hub, a configuration system powered by Hydra, and a command-line interface.

A prompt is a text instruction for a language model, for example:

Translate English to French:
cat =>

For some tasks the right prompt is obvious, but for others it isn't. With ruPrompts you define only the prompt format, such as {text}, and train the prompt automatically for any task for which you have a training dataset.
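
For example, a trainable prompt can be wired up roughly as follows. This is a minimal sketch based on the Quick Start: the exact names (PromptFormat, TensorPromptProvider, Prompt.patch) and the <P*100> trainable-token syntax are assumptions about the ruPrompts API and may differ between versions.

import ruprompts
from transformers import AutoModelForCausalLM, AutoTokenizer

# Backbone model the prompt will be trained for (kept frozen during prompt search)
tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")

# Only the prompt format is written by hand: <P*100> marks 100 trainable
# tokens (assumed syntax) placed before the {text} field
prompt_format = ruprompts.PromptFormat("<P*100>{text}")

# The provider holds the trainable embeddings behind those tokens
prompt_provider = ruprompts.TensorPromptProvider()
prompt = ruprompts.Prompt(prompt_format, prompt_provider)

# Patch the model and tokenizer in place; training (e.g. with a standard
# transformers.Trainer) then updates only the prompt embeddings
prompt.patch(model, tokenizer)

Because the backbone stays frozen, the trained prompt is a small artifact that can be pushed to and loaded from the HuggingFace Hub.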

You can currently use ruPrompts for text-to-text tasks, such as summarization, detoxification, and style transfer, and for styled text generation as a special case of text-to-text.

Features

  • Modular structure for convenient extensibility
  • Integration with HF Transformers, with support for all models that have an LM head
  • Integration with HF Hub for sharing and loading pretrained prompts
  • CLI and configuration system powered by Hydra
  • Pretrained prompts for ruGPT-3

Installation

ruPrompts can be installed with pip:

pip install ruprompts[hydra]
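
The hydra extra adds the Hydra-powered CLI and configuration system; assuming you only need the Python API (an assumption based on the extra's name), the base package is enough:

pip install ruprompts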

See Installation for other installation options.

Usage

Loading a pretrained prompt for styled text generation:

>>> import ruprompts
>>> from transformers import pipeline

>>> ppln_joke = pipeline("text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_joke")
>>> ppln_joke("Говорит кружка ложке")
[{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]

For text2text tasks:

>>> ppln_detox = pipeline("text2text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_detox_russe")
>>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли")
[{"generated_text": 'Опять эти люди все испортили'}]

Proceed to Quick Start for a more detailed introduction or start using ruPrompts right now with our Colab Tutorials.

License

ruPrompts is Apache 2.0 licensed. See the LICENSE file for details.

You might also like...
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo, or you can have a play around in the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first.

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

Easy, fast, effective, and automatic g-code compression!

Getting to the meat of g-code. Easy, fast, effective, and automatic g-code compression! MeatPack nearly doubles the effective data rate of a standard

Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Ultra_Fast_Lane_Detection_TensorRT An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI to accelerate. our model support for in

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP is a lightweight natural language processing (NLP) toolkit that aims to make it fast to implement NLP tasks and build complex models. Its features include: a unified tabular data container that simplifies data preprocessing; built-in Loaders and Pipes for various datasets, eliminating preprocessing code; and a range of convenient NLP tools, such as Embedd

Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

Comments
  • Doesn't work on Colab - ImportError: cannot import name 'cached_path' from 'transformers.file_utils' (/usr/local/lib/python3.8/dist-packages/transformers/file_utils.py)

    %pip install ruprompts
    import ruprompts
    
    from transformers import pipeline
    
    
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-c349a1a8aeb4> in <module>
    ----> 1 import ruprompts
          2 
          3 from transformers import pipeline
    
    2 frames
    /usr/local/lib/python3.8/dist-packages/ruprompts/prompt.py in <module>
          7 from torch import nn
          8 from transformers import AutoModel, AutoTokenizer, PreTrainedModel, PreTrainedTokenizerBase
    ----> 9 from transformers.file_utils import PushToHubMixin, cached_path, hf_bucket_url
         10 from typeguard import typechecked
         11 
    
    ImportError: cannot import name 'cached_path' from 'transformers.file_utils' (/usr/local/lib/python3.8/dist-packages/transformers/file_utils.py)
    
    opened by Diyago 0
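
For reference, the traceback above suggests that the installed transformers release no longer provides cached_path and hf_bucket_url in transformers.file_utils. A possible workaround, not a maintainer-confirmed fix, is to pin an older transformers release alongside ruprompts; the bound below assumes the 4.21 line was the last to ship these helpers:

# Assumption: transformers.file_utils.cached_path was removed in the 4.22 line
pip install ruprompts "transformers<4.22"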
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ Getting started Prerequ

Cambridge Quantum 315 Jan 1, 2023
This is the Alpha of the Nutte language; it is not complete yet

nutte-language This is the Alpha of the Nutte language; it is not complete yet. My language was

catdochrome 2 Dec 18, 2021
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod

Harald Scheidl 736 Jan 3, 2023
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.

THUNLP 2.3k Jan 8, 2023
Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding

Wav2Vec2CTC With KenLM Using KenLM ARPA language model with beam search to decode audio files and show the most probable transcription. Assuming you'v

farisalasmary 65 Sep 21, 2022
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

Zhiling Zhang 5 Oct 21, 2022
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

Soohwan Kim 40 Sep 19, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 1, 2022