Catbird is an open source paraphrase generation toolkit based on PyTorch.

Overview

Catbird is an open source paraphrase generation toolkit based on PyTorch.

Quick Start

Requirements and Installation

The project is based on PyTorch 1.5+ and Python 3.6+.
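Before installing, it can help to confirm that the local environment meets these minimums. The following is a minimal sketch, not part of Catbird: the helper names (version_tuple, meets_minimum, check_environment) are hypothetical, and the only facts taken from this README are the two version floors.

```python
import re
import sys
from importlib import import_module


def version_tuple(version):
    """Extract the leading numeric parts, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (both dotted strings)."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment():
    """Report whether Python and PyTorch satisfy the minimums stated above."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": meets_minimum(python_version, "3.6")}
    try:
        torch = import_module("torch")
        report["torch"] = meets_minimum(torch.__version__, "1.5")
    except ImportError:
        report["torch"] = False  # PyTorch is not installed
    return report


if __name__ == "__main__":
    print(check_environment())
```

Running the script prints a small dictionary with one boolean per requirement; a False entry for torch means PyTorch is either missing or older than 1.5.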

Install Catbird

The package can be installed using pip:

pip install catbird

Note that the pip package does not include the configuration files or tools. To get them, run from the source code instead:

a. Clone the repository.

git clone https://github.com/AfonsoSalgadoSousa/catbird.git

b. Install dependencies. This project uses Poetry as its package manager, so make sure you have it installed. For more information, see Poetry's official documentation. To install the dependencies, run:

poetry install

Dataset Preparation

For now, only the Quora Question Pairs dataset is supported. We recommend downloading and extracting the dataset somewhere outside the project directory and symlinking the dataset root to $CATBIRD/data, as shown below. If your folder structure is different, you may need to change the corresponding paths in the config files.

catbird
├── catbird
├── tools
├── configs
├── data
│   ├── quora
│   │   ├── quora_duplicate_questions.tsv
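One way to create that symlink from Python is sketched below. This is an illustration only, assuming a POSIX-style filesystem where symlinks are permitted; link_dataset and both of its arguments are hypothetical names, not part of Catbird.

```python
import os
from pathlib import Path


def link_dataset(dataset_root, project_root):
    """Create <project_root>/data as a symlink to dataset_root; return the link path."""
    dataset_root = Path(dataset_root).resolve()
    link = Path(project_root) / "data"
    # Refuse to overwrite an existing data directory or link.
    if link.is_symlink() or link.exists():
        raise FileExistsError(
            f"{link} already exists; remove it or edit the config paths instead"
        )
    os.symlink(dataset_root, link, target_is_directory=True)
    return link
```

For example, link_dataset("/path/to/datasets", "/path/to/catbird") would make /path/to/catbird/data point at the external dataset folder, so that data/quora/quora_duplicate_questions.tsv resolves as in the tree above. Both paths are placeholders.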

We use the HuggingFace Datasets library to load the datasets.
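To sanity-check the downloaded file before training, the sketch below reads it with Python's standard csv module and keeps only the pairs marked as duplicates. It is not how Catbird loads data (that goes through HuggingFace Datasets, as stated above); read_paraphrase_pairs is a hypothetical helper, and the column names question1, question2, and is_duplicate are those of the public quora_duplicate_questions.tsv release.

```python
import csv


def read_paraphrase_pairs(tsv_path):
    """Yield (question1, question2) for every row marked as a duplicate pair."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # is_duplicate is "1" for paraphrase pairs and "0" otherwise.
            if row.get("is_duplicate") == "1":
                yield row["question1"], row["question2"]


if __name__ == "__main__":
    pairs = read_paraphrase_pairs("data/quora/quora_duplicate_questions.tsv")
    for source, target in pairs:
        print(source, "->", target)
        break  # show only the first pair
```

Only the duplicate rows are useful as paraphrase pairs; the non-duplicate rows are skipped.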

Train

poetry run python tools/train.py ${CONFIG_FILE} [optional arguments]

Example:

Train T5 on Quora Question Pairs (QQP).

python tools/train.py configs/t5_quora.yaml

Contributors

Afonso Sousa
