Generating new names based on trends in data using GPT2 (Transformer network)

Gustav Lang Moesmand

Last update: Jan 10, 2022


Overview

MLOpsNameGenerator

Overall Goal

The goal of the project is to develop a model capable of creating Pokémon names from their descriptions, while applying MLOps principles such as project organization, version control, and reproducibility.

Framework

The framework we use is Hugging Face Transformers, specifically its Natural Language Processing (NLP) components. The model is GPT-2, which we fine-tune to specialize it for our specific problem.
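
As a rough sketch of what preparing GPT-2 for this task could look like: the snippet below formats a (description, name) pair into one training string and loads the pretrained `gpt2` checkpoint through the Hugging Face `transformers` library. The `Description: ... Name: ...` layout, the helper names, and the choice of the smallest checkpoint are assumptions made for illustration, not the project's actual training code.

```python
EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token


def format_example(description: str, name: str) -> str:
    """Join a description and a name into one causal-LM training string.

    The "Description: ... Name: ..." layout is a hypothetical choice.
    """
    clean = " ".join(description.split())  # collapse newlines and repeated spaces
    return f"Description: {clean}\nName: {name.strip()}{EOS_TOKEN}"


def format_prompt(description: str) -> str:
    """Build the inference prompt: same layout, with the name left open."""
    clean = " ".join(description.split())
    return f"Description: {clean}\nName:"


def load_gpt2(checkpoint: str = "gpt2"):
    """Load the pretrained tokenizer and model (downloads weights on first call)."""
    # Imported lazily so the formatting helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    return tokenizer, model
```

Ending each training string with the end-of-text token gives the fine-tuned model a way to signal where a generated name stops; at inference time, `format_prompt` supplies the same prefix and the model completes the name.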

Data

Initially, we intend to use the description of each Pokémon from the PokéAPI, a RESTful API backed by a database of Pokémon details.

Relevant queries to the API:

Obtain the list of all Pokémon:

https://pokeapi.co/api/v2/pokedex/national

Get the description of each Pokémon:

https://pokeapi.co/api/v2/pokemon-species/{PKMN_SPECIE_NUMBER}
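
A minimal sketch of how these two queries could be combined, using only the Python standard library. The function names and the `User-Agent` string are hypothetical. The JSON field names (`pokemon_entries`, `entry_number`, `flavor_text_entries`, `flavor_text`, `language`) reflect our understanding of the PokéAPI v2 responses, and the sketch assumes that the national Pokédex entry number matches the species number; both should be checked against the API documentation.

```python
import json
from typing import List, Optional
from urllib.request import Request, urlopen

API_ROOT = "https://pokeapi.co/api/v2"


def species_url(number: int) -> str:
    """Build the pokemon-species URL for one species number."""
    return f"{API_ROOT}/pokemon-species/{number}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (performs a network request)."""
    request = Request(url, headers={"User-Agent": "name-generator-sketch"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def species_ids(pokedex: dict) -> List[int]:
    """Extract the species numbers from a /pokedex/national response."""
    return [entry["entry_number"] for entry in pokedex["pokemon_entries"]]


def english_description(species: dict) -> Optional[str]:
    """Return the first English flavor text of a species, or None if absent."""
    for entry in species["flavor_text_entries"]:
        if entry["language"]["name"] == "en":
            # Flavor texts contain line breaks and form feeds; normalize them.
            return " ".join(entry["flavor_text"].split())
    return None


def download_descriptions() -> dict:
    """Map each species name to its English description (many requests)."""
    pokedex = fetch_json(f"{API_ROOT}/pokedex/national")
    result = {}
    for number in species_ids(pokedex):
        species = fetch_json(species_url(number))
        description = english_description(species)
        if description is not None:
            result[species["name"]] = description
    return result
```

`download_descriptions()` issues one request per species, so in practice the raw responses would likely be cached under `data/raw` rather than fetched on every run.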

Commands

make requirements: Installs all requirements from requirements.txt.
make devrequirements: Installs additional dependencies for development.
make datafolders: Creates folders for the data in the project (data/raw, data/processed, data/external and data/interim).
make data: Downloads and processes the data.
make clean: Deletes compiled Python files.
make train: Trains the model.
make deploy: Cleans up and fixes code style, then uploads the updates.
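
To illustrate what a target such as `make datafolders` might call under the hood, here is a small Python sketch that creates the four data folders listed above. The function name and the `project_root` parameter are hypothetical; the actual Makefile may well use plain shell commands instead.

```python
from pathlib import Path

# Subfolders named in the `make datafolders` description.
DATA_SUBFOLDERS = ("raw", "processed", "external", "interim")


def make_data_folders(project_root: str = ".") -> list:
    """Create data/<subfolder> for each expected subfolder and return the paths."""
    created = []
    for name in DATA_SUBFOLDERS:
        folder = Path(project_root) / "data" / name
        # exist_ok makes the call safe to repeat on an existing project tree.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Because existing folders are left untouched, running the helper twice has the same effect as running it once.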

RoadMap

Week 1

The goal of this week is to set up the project. This includes setting up the Makefile, creating the first model and a script for training it, fetching the data required to train the models, setting up Hydra to experiment with hyperparameters, and setting up Docker for containerization.

| Alba | Alejandro | Gustav |
| --- | --- | --- |
| Data obtaining and processing | Test usage of GPT-2 | Develop model using GPT-2 |
| Hydra and config. files | Review and change structure of the train script | - |
| Add wandb to log training progress | Do predict script | - |

Week 2

Week 3

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Citations and references

PokéAPI

Movie name generation with GPT-2

Huggingface transformers

Huggingface notebooks

NameKrea An AI That Generates Domain Names

DTU Course 02476 - Machine Learning Operations

Project based on the cookiecutter data science project template. #cookiecutterdatascience


Comments

Make tests

Initial tests are finished. I do not know what else to try: since we are using the Hugging Face pretrained model, there is not much I can test about the training. Maybe I could implicitly test the model and tokenizer, but I don't know if that's a good idea, as it is something we didn't develop.

opened by Pheithar 0
