ruPrompts

A high-level yet extensible library for fast language model tuning via automatic prompt search.

Overview

ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with the HuggingFace Hub, a configuration system powered by Hydra, and a command-line interface.

A prompt is a text instruction for a language model, for example:

Translate English to French:
cat =>

For some tasks the right prompt is obvious, but for others it isn't. With ruPrompts you define only the prompt format, such as {text}, and train the prompt automatically for any task for which you have a training dataset.
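
For example, a trainable prompt can be wired up roughly as follows. This is a minimal sketch based on the Quick Start: the exact names (PromptFormat, TensorPromptProvider, Prompt.patch) and the <P*100> trainable-token syntax are assumptions about the ruPrompts API and may differ between versions.

import ruprompts
from transformers import AutoModelForCausalLM, AutoTokenizer

# Backbone model the prompt will be trained for (kept frozen during prompt search)
tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")

# Only the prompt format is written by hand: <P*100> marks 100 trainable
# tokens (assumed syntax) placed before the {text} field
prompt_format = ruprompts.PromptFormat("<P*100>{text}")

# The provider holds the trainable embeddings behind those tokens
prompt_provider = ruprompts.TensorPromptProvider()
prompt = ruprompts.Prompt(prompt_format, prompt_provider)

# Patch the model and tokenizer in place; training (e.g. with a standard
# transformers.Trainer) then updates only the prompt embeddings
prompt.patch(model, tokenizer)

Because the backbone stays frozen, the trained prompt is a small artifact that can be pushed to and loaded from the HuggingFace Hub.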

You can currently use ruPrompts for text-to-text tasks, such as summarization, detoxification, and style transfer, and for styled text generation as a special case of text-to-text.

Features

  • Modular structure for convenient extensibility
  • Integration with HF Transformers, with support for all models that have an LM head
  • Integration with HF Hub for sharing and loading pretrained prompts
  • CLI and configuration system powered by Hydra
  • Pretrained prompts for ruGPT-3

Installation

ruPrompts can be installed with pip:

pip install ruprompts[hydra]
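
The hydra extra adds the Hydra-powered CLI and configuration system; assuming you only need the Python API (an assumption based on the extra's name), the base package is enough:

pip install ruprompts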

See Installation for other installation options.

Usage

Loading a pretrained prompt for styled text generation:

>>> import ruprompts
>>> from transformers import pipeline

>>> ppln_joke = pipeline("text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_joke")
>>> ppln_joke("Говорит кружка ложке")
[{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]

For text2text tasks:

>>> ppln_detox = pipeline("text2text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_detox_russe")
>>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли")
[{"generated_text": 'Опять эти люди все испортили'}]

Proceed to Quick Start for a more detailed introduction or start using ruPrompts right now with our Colab Tutorials.

License

ruPrompts is Apache 2.0 licensed. See the LICENSE file for details.

You might also like...
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo, or you can have a play around in the Colab notebook provided. Note that, in both cases, you will need to train a WaveGAN model first.

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Google Text-To-Speech Batch Prompt File Maker Are you in the need of IVR prompts, but you have no voice actors? Let Google talk your prompts like a pr

Easy, fast, effective, and automatic g-code compression!

Getting to the meat of g-code. Easy, fast, effective, and automatic g-code compression! MeatPack nearly doubles the effective data rate of a standard

Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Ultra_Fast_Lane_Detection_TensorRT An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI to accelerate. our model support for in

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

fastNLP fastNLP is a lightweight natural language processing (NLP) toolkit that aims to make it fast to implement NLP tasks and build complex models. Its features include: a unified tabular data container that simplifies data preprocessing; built-in Loaders and Pipes for various datasets, eliminating preprocessing code; and a range of convenient NLP tools, such as Embedd

Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

Comments
  • Doesn't work on Colab - ImportError: cannot import name 'cached_path' from 'transformers.file_utils' (/usr/local/lib/python3.8/dist-packages/transformers/file_utils.py)

    %pip install ruprompts
    import ruprompts
    
    from transformers import pipeline
    
    
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-c349a1a8aeb4> in <module>
    ----> 1 import ruprompts
          2 
          3 from transformers import pipeline
    
    2 frames
    /usr/local/lib/python3.8/dist-packages/ruprompts/prompt.py in <module>
          7 from torch import nn
          8 from transformers import AutoModel, AutoTokenizer, PreTrainedModel, PreTrainedTokenizerBase
    ----> 9 from transformers.file_utils import PushToHubMixin, cached_path, hf_bucket_url
         10 from typeguard import typechecked
         11 
    
    ImportError: cannot import name 'cached_path' from 'transformers.file_utils' (/usr/local/lib/python3.8/dist-packages/transformers/file_utils.py)
    
    opened by Diyago 0
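
For reference, the traceback above suggests that the installed transformers release no longer provides cached_path and hf_bucket_url in transformers.file_utils. A possible workaround, not a maintainer-confirmed fix, is to pin an older transformers release alongside ruprompts; the bound below assumes the 4.21 line was the last to ship these helpers:

# Assumption: transformers.file_utils.cached_path was removed in the 4.22 line
pip install ruprompts "transformers<4.22"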
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
A high-level Python library for Quantum Natural Language Processing

lambeq About lambeq is a toolkit for quantum natural language processing (QNLP). Documentation: https://cqcl.github.io/lambeq/ Getting started Prerequ

Cambridge Quantum 315 Jan 1, 2023
This is the Alpha of the Nutte language; it is not complete yet

nutte-language This is the Alpha of the Nutte language; it is not complete yet. My language was

catdochrome 2 Dec 18, 2021
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

CTC Decoding Algorithms Update 2021: installable Python package Python implementation of some common Connectionist Temporal Classification (CTC) decod

Harald Scheidl 736 Jan 3, 2023
Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks

Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. This library provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline. OpenPrompt supports loading PLMs directly from huggingface transformers. In the future, we will also support PLMs implemented by other libraries.

THUNLP 2.3k Jan 8, 2023
Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding

Wav2Vec2CTC With KenLM Using KenLM ARPA language model with beam search to decode audio files and show the most probable transcription. Assuming you'v

farisalasmary 65 Sep 21, 2022
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

Zhiling Zhang 5 Oct 21, 2022
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

Soohwan Kim 40 Sep 19, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 1, 2022