Overview

FFG-benchmarks

This repository provides a unified framework for training and testing state-of-the-art few-shot font generation (FFG) models.

What is Few-shot Font Generation (FFG)?

Few-shot font generation tasks aim to generate a new font library using only a few reference glyphs (e.g., fewer than 10 glyph images) without additional model fine-tuning at test time [ref].

In this repository, we do not consider methods that fine-tune on unseen style fonts.
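
To make the task setting concrete, here is a minimal conceptual sketch of the few-shot generation interface (illustrative only; generate_font, encode_style, and decode are hypothetical names, and each model in this repository defines its own interface):

    import torch

    def generate_font(model, content_imgs: torch.Tensor, style_refs: torch.Tensor) -> torch.Tensor:
        """Generate glyphs in the target style without any fine-tuning.

        content_imgs: glyphs rendered in a known source font (the characters to generate).
        style_refs:   a handful (fewer than 10) of reference glyphs in the target style.
        """
        with torch.no_grad():                              # no gradient updates at test time
            style_code = model.encode_style(style_refs)    # style extracted from a few references
            return model.decode(content_imgs, style_code)  # content re-rendered in that style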

Sub-documents

docs
├── Dataset.md
├── FTransGAN-Dataset.md
├── Inference.md
├── Evaluator.md
└── models
    ├── DM-Font.md
    ├── FUNIT.md
    ├── LF-Font.md
    └── MX-Font.md

Available models

  • FUNIT (Liu, Ming-Yu, et al. ICCV 2019) [pdf] [github]: not originally proposed for FFG tasks, but we modify its unpaired i2i framework into a paired i2i framework for FFG tasks.
  • DM-Font (Cha, Junbum, et al. ECCV 2020) [pdf] [github]: proposed for complete compositional scripts (e.g., Korean). If you want to test DM-Font on Chinese generation tasks, you have to modify the code (or use other models).
  • LF-Font (Park, Song, et al. AAAI 2021) [pdf] [github]: originally proposed to solve the drawback of DM-Font, but it still requires component labels for generation. Our implementation allows generating characters with unseen components.
  • MX-Font (Park, Song, et al. ICCV 2021) [pdf] [github]: generates fonts by employing multiple experts, where each expert focuses on different local concepts.

Not available here, but you may also consider:

  • EMD (CVPR'18)
  • AGIS-Net (SIGGRAPH Asia'19)
  • FTransGAN (WACV'21)

Model overview

Model                         Provided in this repo?   Chinese generation?   Need component labels?
EMD (CVPR'18)                 X                        O                     X
FUNIT (ICCV'19)               O                        O                     X
AGIS-Net (SIGGRAPH Asia'19)   X                        O                     X
DM-Font (ECCV'20)             O                        X                     O
LF-Font (AAAI'21)             O                        O                     O
FTransGAN (WACV'21)           X                        O                     X
MX-Font (ICCV'21)             O                        O                     Only for training

(O: yes, X: no)

Preparing Environments

Requirements

Our code is tested on Python >= 3.6 (we recommend conda) with the following libraries:

torch >= 1.5
sconf
numpy
scipy
scikit-image
tqdm
jsonlib-python3
fonttools
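
As a quick sanity check, the snippet below verifies that the listed libraries are importable and that the torch version meets the requirement (a minimal sketch; the import names are assumed from the package names above, e.g., scikit-image imports as skimage and fonttools as fontTools):

    # Environment sanity check (import names assumed; not part of the repo).
    import torch
    import numpy, scipy, skimage, tqdm, sconf, fontTools

    major, minor = (int(v) for v in torch.__version__.split(".")[:2])
    assert (major, minor) >= (1, 5), f"torch >= 1.5 required, found {torch.__version__}"
    print("Environment OK, torch", torch.__version__)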

Datasets

Korean / Chinese / ...

The full description is in docs/Dataset.md.

We allow two formats for datasets:

  • TTF: We support the native TrueType font (TTF) format for datasets. It is storage-efficient and easy to use, particularly if you want to build your own dataset (a minimal rendering sketch follows this list).
  • Images: We also support rendered images for datasets, similar to ImageFolder (but a modified version). This is convenient when you want to generate a full font library from non-digitized characters (e.g., handwriting).
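
For illustration, the snippet below sketches one way to render a glyph image from a TTF file using Pillow (>= 8 for textbbox). Pillow is not in the requirements list above, and render_glyph is a hypothetical helper; the repository's actual rendering pipeline is described in docs/Dataset.md:

    from PIL import Image, ImageDraw, ImageFont

    def render_glyph(ttf_path: str, char: str, size: int = 128) -> Image.Image:
        """Render a single character from a TTF file onto a grayscale canvas."""
        font = ImageFont.truetype(ttf_path, size=int(size * 0.8))
        img = Image.new("L", (size, size), color=255)    # white background
        draw = ImageDraw.Draw(img)
        # Center the glyph inside the canvas using its bounding box.
        left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
        x = (size - (right - left)) // 2 - left
        y = (size - (bottom - top)) // 2 - top
        draw.text((x, y), char, font=font, fill=0)       # black glyph
        return img

    # e.g., render_glyph("my_font.ttf", "한").save("han.png")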

You can collect your own fonts from the following websites (for non-commercial purposes):

Note that fonts are protected intellectual property, and we are unable to release the collected font datasets unless their licenses are cleared. Many font generation papers do not publicly release their datasets due to this licensing issue, and we face the same issue here. We therefore encourage users to collect their own datasets from the web or to use publicly available datasets.

FTransGAN (Li, Chenhao, et al. WACV 2021) [pdf] [github] released rendered image files for training and evaluating FFG models. Our repository can also use the font dataset provided by FTransGAN. More details can be found in docs/FTransGAN-Dataset.md.

Training

We provide a separate document for each model in docs/models (see the sub-documents tree above).

Generation

Preparing reference images

Detailed instructions for preparing reference images are described here.

Run test

Please refer to the following documents to test the model:

Evaluation

Detailed instructions for preparing the evaluator and testing the generated images are described here.

License

This project is distributed under the MIT license, except for FUNIT and base/modules/modules.py, which are adapted from https://github.com/NVlabs/FUNIT.

FFG-benchmarks
Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Comments
  • Problems on the second stage of Chinese training

    Hello. Thank you very much for your help with LF-Font previously. Following your suggestion, I am now using this version of the code. The first stage of LF-Font training achieved good results on Chinese, but my second-stage training is very bad: the network gets worse and worse from 10,000 iterations up to 200,000 iterations.

    (attached images: the final result of the second-stage training at 200,000 iterations, and the result at 10,000 iterations)

    opened by zj916716524 19
  • The results of LF-Font are not satisfactory on a custom Korean dataset

    I am trying to train LF-Font on a custom Korean dataset consisting of 60 printed font styles.

    For the phase 1 training, I use all the default configurations for the data (cfgs/data) and LF-Font (cfgs/LF/p1), except the batch size, which is set to 4, with 8 workers on a single 3080 Ti GPU with 12 GB. The training goes normally for 200k iterations. The results are OK, considering that the phase 2 training will further improve them.

    When I train the model for phase 2, I have an OOM problem with the default configurations, although the default batch size is 1. Finally, with some alterations to the p2 default configuration file (default.yaml), where I set num_workers to 2 and emb_dim to 6, I can train the model. However, the training results are really bad and don't seem to get better up to 200k iterations (in the paper, I think p2 was trained for 50k iterations for Korean characters). I personally assume this is probably down to the embedding dimension I adopted (6); however, the paper shows that it does not affect performance much (from 8 to 6).

    So I tried to train p2 on multiple GPUs by only setting use_ddp in p2/train.yaml to True and gpus_per_node to 3 (in the train_LF file), but I still get OOM.

    Could you please help me with the following?

    1. How can I train the model on multiple GPUs? Do I need to change any parameters other than use_ddp: True and gpus_per_node: 3?
    2. Do you think the performance of my trained model is down to lowering emb_dim and num_workers?
    3. For the phase 2 training, do we need to give --resume the value of the last p1 checkpoint?

    BTW, I trained FUNIT on the same dataset with your provided source and it performs well, so the problem is definitely not down to the dataset I am using.

    opened by ammar-deep 12
  • The requirement of the font

    Is it necessary that the font I use be a TrueType or OpenType one? There is an error when I input a font made by another organization into get_chars_from_ttfs.py.

    opened by simbadu1999 10
  • ImportError: cannot import name 'NSMLWriter' from 'base.utils.writer'

    Hi, thanks for providing this FFG framework.

    I am trying to run MX-Font on a custom Korean dataset. When I run the training command, I get the following error: "ImportError: cannot import name 'NSMLWriter' from 'base.utils.writer'".

    I removed the 'NSMLWriter' import and its second reference from the base/utils/__init__.py file, and now it works fine.

    1. Do we really need 'NSMLWriter'?
    2. It seems the framework only supports grayscale images (not RGB, as I got a channel error with an RGB dataset). Can we train with RGB character images?
    opened by ammar-deep 2
  • Finding it difficult to get started

    Hi, I'm finding it hard to get started. The problem is the configuration part, which is highlighted as step 1 of getting started in the training section. Please help, and thanks for giving a machine learning beginner the opportunity to get involved.

    opened by yomismith 1
  • Can I get a trained model for Korean?

    Hello, I trained on 98 Korean fonts for more than 100,000 iterations. When I try to convert my own handwriting into letters, the recognition rate drops considerably. If there is a well-trained model already available, could I get it for testing?

    opened by jihong-yu 1
  • Should generate font images of different styles, but generates fonts of the same style

    When using the FUNIT model, my source style font images are 嗄 and 阿, and there are two kinds of target style images, for 嗄 and 阿 respectively. But why do the test results generate the same target style image for both?

    (attached images: 嗄 阿 嗄 阿)

    opened by githubnameoo 1
  • test

    Hello, may I ask why the results in image/result during training are good but the test results are poor, even though the same images are used? The first picture shows the results of the training process and the second shows the test results.

    (attached images: 0010000-unseen_chars and the test results for 唉 哀 嗄 啊 埃 挨 哎)

    opened by githubnameoo 0