KoDALLE
Utilizing a pretrained language model's token embedding layer and position embedding layer as DALLE's text encoder.
Background
- Training a DALLE model from scratch demands a large paired dataset of images and captions. For example, OpenAI's DALLE was trained on more than 250 million text-image pairs.
- If the dataset isn't large enough or is limited to a specific domain, the vocabulary of the trained DALLE model is insufficient. For instance, the 1 million text captions of the K-Fashion dataset consist of only around 300 tokens.
- Therefore, inference from such DALLE models can be problematic when the given sentence query is unrelated to the captions the model was originally trained on.
KoDALLE's Results on a Small Fashion Dataset
| | OpenAI's DALLE | KoDALLE of HappyFace |
| --- | --- | --- |
| Train Dataset Size | 250 Million Pairs | 0.8 Million Pairs |
| #Params | 12 Billion | 428 Million |
| #Layers | 64 Layers | 16 Layers |
| Computing Resource | 1024 x V100 16GB | 1 x V100 32GB |
| Text Encoder | 16384 Vocab x 512 Dim BPE | 32000 Vocab x 1024 Dim klue/roberta-large |
| Image Encoder | VQVAE | VQGAN |
| Optimizer | AdamW | AdamW |
| Learning Rate | 4.5e-5 | 3.0e-5 |
| Weight Decay | 4.5e-3 | 3.0e-3 |
| LR Scheduler | ReduceLROnPlateau | - |
The team constructed a text-to-fashion-design DALLE model in Korean with fewer than 100k sampled text-image pairs.
| Caption | 아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다. 하의는 색상이 네이비 소재가 데님 핏이 스키니인 청바지이다. (The outer is a khaki, woven, loose-fit coat. The bottoms are navy, denim, skinny-fit jeans.) |
| --- | --- |
| Generated Image | (generated fashion image) |
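For reference, below is a minimal sketch of how a caption like the one above is turned into the fixed-length token sequence the model consumes, assuming the klue/roberta-large tokenizer and the TEXT_SEQ_LEN of 128 described in the Methodology section; variable names are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")

caption = "아우터는 색상이 카키 소재가 우븐 핏이 루즈인 코트이다."
tokens = tokenizer(
    caption,
    padding="max_length",  # pad up to the DALLE text sequence length
    truncation=True,
    max_length=128,        # TEXT_SEQ_LEN used in this repository
    return_tensors="pt",
)["input_ids"]             # LongTensor of shape (1, 128)

# A trained KoDALLE (see Methodology) consumes `tokens` through DALLE-pytorch's
# generate_images(tokens) to sample an image such as the one shown above.
```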
Methodology
Experiments were conducted with the embedding layers of the following Korean Transformer models. The team selected klue/roberta-large as the baseline in this repository, considering the size of the model; a sketch of extracting its embedding layers follows the list.
- klue/roberta-large: Vocab Size of 32000, Embedding Dimension of 1024.
- KoGPT Trinity of SKT: Vocab Size of 51200, Embedding Dimension of 1920.
- KoGPT of Kakao Brain: Vocab Size of 64512, Embedding Dimension of 4096.
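A minimal sketch of how the pretrained token embedding (wte) and position embedding (wpe) layers can be pulled out of klue/roberta-large with Hugging Face Transformers; the variable names are illustrative:

```python
from transformers import AutoModel

# Load the pretrained Korean RoBERTa and take its embedding layers.
roberta = AutoModel.from_pretrained("klue/roberta-large")

wte = roberta.embeddings.word_embeddings      # nn.Embedding(32000, 1024): token embeddings
wpe = roberta.embeddings.position_embeddings  # nn.Embedding(514, 1024): position embeddings

print(wte.weight.shape, wpe.weight.shape)
```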
KoDALLE uses klue/roberta-large's wpe and wte and is trainable in a 16GB-GPU Google Colab environment. The hyperparameters related to the DALLE model size are the following:
- BATCH_SIZE: 32
- DEPTH: 2
- TEXT_SEQ_LEN: 128
- VOCAB_SIZE: 32000
- MODEL_DIM: 1024
- ATTN_TYPES: full
- DIM_HEAD: 64
- HEADS: 8
- The DALLE model is built on lucidrains' DALLE-pytorch.
- The image encoder is constructed based on VQGAN (Taming Transformers); a construction sketch follows below.
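A minimal sketch of how such a model could be assembled with DALLE-pytorch and the klue/roberta-large embeddings extracted above, using the hyperparameters listed in this section. This is an illustration under stated assumptions rather than the exact training script of this repository; keyword arguments may differ slightly across DALLE-pytorch versions.

```python
import torch
from dalle_pytorch import DALLE, VQGanVAE

# VQGAN (Taming Transformers) image encoder/decoder; with no checkpoint paths given,
# DALLE-pytorch downloads a default pretrained VQGAN (requires the taming-transformers package).
vae = VQGanVAE()

dalle = DALLE(
    dim=1024,               # MODEL_DIM
    vae=vae,
    num_text_tokens=32000,  # VOCAB_SIZE of klue/roberta-large
    text_seq_len=128,       # TEXT_SEQ_LEN
    depth=2,                # DEPTH
    heads=8,                # HEADS
    dim_head=64,            # DIM_HEAD
    attn_types=("full",),   # ATTN_TYPES
    rotary_emb=False,       # keep a learned text position embedding so wpe can be copied in
)

# Initialize DALLE's text encoder from the pretrained wte / wpe (see the Methodology
# sketch above). DALLE-pytorch reserves extra per-position padding tokens at the end
# of its text embedding, so only the shared rows are overwritten.
with torch.no_grad():
    dalle.text_emb.weight[: wte.num_embeddings] = wte.weight
    n_pos = dalle.text_pos_emb.weight.shape[0]
    dalle.text_pos_emb.weight[:] = wpe.weight[:n_pos]
```

After training, images can be sampled from tokenized captions with DALLE-pytorch's `dalle.generate_images(tokens)`, as referenced in the tokenization sketch above.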
Significance
- Offers promising results for training from scratch on specific domains with a small dataset.
- Introduces a solution for making domain-specific DALLE & CLIP models robust to input sentences.
- Recommends an adequate text-to-image model size for a given computing resource.
- Suggests an effortless method of creating DALLE & CLIP models for one's own language when a pretrained language model is available.
WIP
- Add an image-caption reranker (EfficientNet + klue/roberta-large).
- Train a model with 500k text-image pairs.
- Modularize the Python code.
- Update the inference code.
- Update FID and IS metrics on the test and validation datasets.