A 1.3B text-to-image generation model trained on 14 million image-text pairs

Overview

minDALL-E on Conceptual Captions

minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for non-commercial purposes.

(Example prompts: "a painting of a bird in the style of asian painting"; "a photo of san francisco's golden gate bridge in black and white tone".)

Environment Setup

  • Basic setup
PyTorch == 1.8.0
CUDA >= 10.1
  • Other packages
pip install -r requirements.txt
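
As a quick sanity check of the setup, the installed versions can be inspected from Python. A minimal sketch (the printed values depend on your local installation):

import torch

# minDALL-E was developed against PyTorch 1.8.0 with CUDA >= 10.1.
print(torch.__version__)          # e.g. '1.8.0'
print(torch.version.cuda)         # CUDA version PyTorch was built with, e.g. '10.2'
print(torch.cuda.is_available())  # sampling and fine-tuning assume a CUDA GPU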

Model Checkpoint

  • Model structure (two-stage autoregressive model)
    • Stage1: Unlike the original DALL-E [1], we replace Discrete VAE with VQGAN [2] to generate high-quality samples effectively. We slightly fine-tune vqgan_imagenet_f16_16384, provided by the official VQGAN repository, on FFHQ [3] as well as ImageNet.
    • Stage2: We train our 1.3B transformer from scratch on 14 million image-text pairs from CC3M [4] and CC12M [5]. For the more detailed model spec, please see configs/dalle-1.3B.yaml.
  • You can download the pretrained models, including the tokenizer, from this link. This will require about 5 GB of disk space.
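
Once the checkpoint is available, the two stages can be inspected directly. A minimal sketch, using the stage1/stage2 attribute names from dalle/models (the printed parameter count is only illustrative):

from dalle.models import Dalle

# Downloads the tokenizer, stage1 (VQGAN), and stage2 (transformer) weights on first use.
model = Dalle.from_pretrained('minDALL-E/1.3B')

# Stage 1: VQGAN that maps 256x256 images to discrete codes and back.
print(type(model.stage1))

# Stage 2: 1.3B autoregressive transformer over text tokens followed by image codes.
print(sum(p.numel() for p in model.stage2.parameters()))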

Sampling

  • Given a text prompt, the code snippet below generates candidate images and re-ranks them using OpenAI's CLIP [6].
  • This has been tested on a single V100 with 32GB of memory. When using a GPU with less memory, reduce num_candidates to avoid OOM.
from matplotlib import pyplot as plt
import numpy as np
import clip
from dalle.models import Dalle
from dalle.utils.utils import set_seed, clip_score

device = 'cuda:0'
set_seed(0)

prompt = "A painting of a monkey with sunglasses in the frame"
model = Dalle.from_pretrained('minDALL-E/1.3B')  # This will automatically download the pretrained model.
model.to(device=device)

# Sampling
images = model.sampling(prompt=prompt,
                        top_k=256, # It is recommended that top_k is set lower than 256.
                        top_p=None,
                        softmax_temperature=1.0,
                        num_candidates=96,
                        device=device).cpu().numpy()
images = np.transpose(images, (0, 2, 3, 1))

# CLIP Re-ranking
model_clip, preprocess_clip = clip.load("ViT-B/32", device=device)
model_clip.to(device=device)
rank = clip_score(prompt=prompt,
                  images=images,
                  model_clip=model_clip,
                  preprocess_clip=preprocess_clip,
                  device=device)

# Plot images
images = images[rank]
plt.imshow(images[0])
plt.show()
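
As a follow-up, the re-ranked candidates can also be written to disk rather than only plotted. A minimal sketch, assuming images holds floats in [0, 1] and has already been re-ordered by rank as above (the figures/ output path is only an example):

import os
import numpy as np
from PIL import Image

os.makedirs('figures', exist_ok=True)
for i, image in enumerate(images[:4]):  # save the four highest-ranked candidates
    Image.fromarray((np.clip(image, 0, 1) * 255).astype(np.uint8)).save(f'figures/sample_{i}.png')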

Samples (Top-K=256, Temperature=1.0)

  • "a painting of a {cat, dog} with sunglasses in the frame"

  • "a large {pink, black} elephant walking on the beach"

  • "Eiffel tower on a {desert, mountain}"

Quantitative Results

  • We have validated minDALL-E on the CC3M validation set (in-distribution evaluation) and on MS-COCO (zero-shot evaluation).
  • For CC3M, we measure the cosine similarity between the image and text representations from the pretrained CLIP model (ViT-B/32), referred to as the CLIP-score; a minimal sketch of this computation follows the table below.
  • For MS-COCO, we compute FID between 30K generated and real samples from MS-COCO 2017, where we randomly choose 30K captions from COCO as in DALL-E. We select the best out of 32 candidates by CLIP re-ranking.
Model          CC3M: CLIP-score (higher is better)   MS-COCO: FID-30K (lower is better)
VQGAN [2]      0.20                                  -
ImageBART [7]  0.23                                  -
DALL-E [1]     -                                     27.5
minDALL-E      0.26                                  14.7
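
As referenced above, the CLIP-score can be computed with the public CLIP API as the cosine similarity between an encoded image and its caption. A minimal sketch using ViT-B/32 (the exact preprocessing and averaging behind the reported numbers may differ):

import clip
import torch
from PIL import Image

device = 'cuda:0'
model_clip, preprocess_clip = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def clip_score_single(image: Image.Image, caption: str) -> float:
    # Encode one image and its caption, then take the cosine similarity.
    image_features = model_clip.encode_image(preprocess_clip(image).unsqueeze(0).to(device))
    text_features = model_clip.encode_text(clip.tokenize([caption]).to(device))
    return torch.nn.functional.cosine_similarity(image_features, text_features).item()

# The reported CC3M number is presumably this score averaged over the validation set.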

Transfer Learning Examples

  • minDALL-E, which is pre-trained on noisy text supervision, can be transferred to class-conditional and unconditional generation tasks. To validate this, we fine-tune it on ImageNet for 8 epochs for both class-conditional and unconditional generation.
  • The commands below fine-tune the pretrained DALL-E. Fine-tuning takes about 36 hours on 8 V100 GPUs.
# unconditional image generation for imagenet (256x256)
python examples/transfer_learning_ex.py -d=configs/transfer-imagenet-uncond-gen.yaml \
                                        -u=[MODEL_CKPT] \
                                        -r=[RESULT_PATH] \
                                        --n-gpus=[NUM_GPUS]

# class-conditional image generation for imagenet (256x256)
python examples/transfer_learning_ex.py -d=configs/transfer-imagenet-clscond-gen.yaml \
                                        -u=[MODEL_CKPT] \
                                        -r=[RESULT_PATH] \
                                        --n-gpus=[NUM_GPUS]
  • We compute FID-50K between 50K generated samples and all ImageNet training samples, using top_k=256 and softmax temperature 1.0 for generation; a hedged sketch of the FID computation appears after the table. All results are obtained without rejection sampling. Interestingly, our model achieves very competitive performance against the baselines, even though minDALL-E is fine-tuned for only a few epochs.
Model       Params   FID-50K (class-cond.)   FID-50K (uncond.)
VQ-GAN      1.4B     15.78                   -
ImageBART   3.5B     21.19                   -
minDALL-E   1.3B     15.55                   37.58
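
As referenced above, FID between a directory of generated samples and a directory of reference images can be computed with an off-the-shelf tool. The sketch below uses torch-fidelity, which is not part of this repository, and the paths are placeholders:

# pip install torch-fidelity
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1='[GENERATED_SAMPLES_DIR]',  # e.g. 50K samples from the fine-tuned model
    input2='[IMAGENET_TRAIN_DIR]',     # reference images
    cuda=True,
    fid=True,
)
print(metrics['frechet_inception_distance'])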

BibTeX

If you find this repository useful in your research, please cite:

@misc{kakaobrain2021minDALL-E,
  title         = {minDALL-E on Conceptual Captions},
  author        = {Saehoon Kim and Sanghun Cho and Chiheon Kim and Doyup Lee and Woonhyuk Baek},
  year          = {2021},
  howpublished  = {\url{https://github.com/kakaobrain/minDALL-E}},
}

References

  • [1] Ramesh et al. Zero-Shot Text-to-Image Generation, ICML 2021.
  • [2] Esser et al. Taming Transformers for High-Resolution Image Synthesis, CVPR 2021.
  • [3] Karras et al. A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019.
  • [4] Sharma et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, ACL 2018.
  • [5] Changpinyo et al. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts, CVPR 2021.
  • [6] Radford et al. Learning Transferable Visual Models From Natural Language Supervision, ICML 2021.
  • [7] Esser et al. ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis, NeurIPS 2021.
  • [8] https://github.com/karpathy/minGPT

Licenses

  • The source code is licensed under the Apache 2.0 License.
  • The stage2 pretrained weights are licensed under the CC-BY-NC-SA 4.0 License.

Contact

We hope that minDALL-E helps various projects in research-oriented institutes and startups. If you would like to collaborate with us or share feedback, please e-mail us at [email protected]

Limitations

Although minDALL-E is trained on a small set (14M image-text pairs), it might still be vulnerable to malicious prompt engineering that generates socially unacceptable images. If you observe such images, please report the "prompt" and the "generated images" to us.

Comments
  •  Does zero-shot work in minDALL-E?

    Thanks for your amazing work!

    I'm attempting zero-shot image-to-image translation, as described in the original paper, by inserting only half of the image. The outcomes are as follows. Will this problem be solved if I increase the size of the model?

    [Screenshot of the generated results attached]
    opened by SeungyounShin 3
  • text token index slice to N-1

    Hi, thanks for sharing the code.

    In the forward function of Transformer1d, text index is sliced with 0 ~ N-2 and image index is sliced with N-1 ~ N-1 + (T-1).

    B, T = images.shape
    _, N = texts.shape
    ...
    x = torch.cat([texts, images], axis=1).contiguous()
    ...
    texts = x[:, :N-1].contiguous()
    images = x[:, N-1:-1].contiguous()
    

    Could you please clarify why you didn't slice like below? Thanks!

    texts = x[:, :N]
    images = x[:, N:]
    
    opened by j-min 2
  • CUDA out-of-memory

    Hi, it is mentioned in the "Transfer Learning Examples" section that you fine-tuned the pre-trained DALL-E on 8 V100 GPUs. I tried running your transfer_learning_ex.py script on V100 GPUs (16GB of GPU memory per GPU), and it throws a CUDA OOM error. Can you please share the exact specs of the hardware you used for this?

    opened by smittal10 1
  • Comparison against GLIDE

    Recently OpenAI released GLIDE, a diffusion model for generating images from text, much like DALL-E.

    Would it be possible to compare minDALL-E to GLIDE and put the results on the github?

    Thank you in advance!

    Also I have to say this is amazing!

    opened by MyUsernamee 1
  • Amazing work; models CDN?

    Hi there! Just want to quickly congratulate all the effort done in this project!

    Will the models / tokenizers also be stored in Github's releases binary? It could be good as a backup / alternative.

    opened by johnpaulbin 1
  • sampling in GPU with 12 GB memory

    I found that the sampling code examples/sampling_ex.py fails to save the images if num_candidates is smaller than 16.

    This is because the value 16 is hardcoded in line 61: for i in range(16):

    The modification below works for lower num_candidates values: for i in range(min(16, args.num_candidates)):

    opened by tackgeun 0
  • Project dependencies may have API risk issues

    Hi, in minDALL-E, inappropriate dependency version constraints can cause risks.

    Below are the dependencies and version constraints that the project is using

    torch==1.8.0
    torchvision>=0.8.2
    tokenizers>=0.10.2
    pyflakes>=2.2.0
    tqdm>=4.46.0
    pytorch-lightning>=1.5
    einops
    omegaconf
    git+https://github.com/openai/CLIP.git
    matplotlib
    

    The version constraint == introduces a risk of dependency conflicts because it pins the dependency too strictly. Constraints with no upper bound (or *) introduce a risk of missing-API errors, because the latest versions of the dependencies may remove some APIs.

    After further analysis, in this project, the version constraint of the dependency tqdm can be changed to >=4.36.0,<=4.64.0.

    The above modification suggestions can reduce dependency conflicts as much as possible while adopting the latest versions without introducing errors into the project.

    The invocation of the current project includes all the following methods.

    The calling methods from the tqdm
    tqdm.tqdm.set_description
    tqdm.tqdm
    
    The calling methods from the all methods
    self.resid_drop
    torch.cuda.manual_seed_all
    PIL.Image.fromarray
    PIL.Image.fromarray.save
    ExpConfig
    self.key
    hashlib.md5
    module.weight.data.normal_
    self.head
    pytorch_lightning.loggers.TensorBoardLogger
    self.lr_schedulers.get_last_lr
    text_features.image_features.F.cosine_similarity.squeeze
    W.B.device.H.torch.arange.repeat.transpose
    numpy.transpose
    min
    argparse.ArgumentParser.add_argument
    self.quantize.get_codebook_entry
    self.v
    sorted_idx_remove_cond.scatter
    self.quant_conv
    RuntimeError
    self.apply
    ImageNetDataModule
    self.sos.repeat
    pytorch_lightning.Trainer.fit
    torchvision.transforms.Compose
    self.stage2.sos
    AttnBlock
    model.stage1.from_ckpt
    from_file
    reversed
    get_positional_encoding
    datetime.datetime.now
    tokens.to.unsqueeze
    torch.nn.functional.cosine_similarity
    probs.torch.multinomial.clone
    self.encode
    pl_module.stage1
    self.down.append
    Normalize
    self.mid.block_1
    download
    self.conv1
    Downsample
    z_q.permute.contiguous
    self.conv
    OptConfig
    torch.nn.functional.pad
    Stage1Hparams
    self.embedding
    super
    w_.permute.permute
    i.images.astype
    source.info.get
    from_file.enable_truncation
    self.norm2
    random.seed
    numpy.random.seed
    os.path.expanduser
    x.self.query.view
    codes.device.T.torch.arange.repeat
    layers.Block
    device.args.num_candidates.args.softmax_temperature.args.top_p.args.top_k.args.prompt.model.sampling.cpu
    self.conv_in
    device.H.torch.arange.repeat
    self.mlp.transpose
    cutoff_topp_probs.masked_fill
    self.norm1
    k.reshape.reshape
    torch.cuda.amp.autocast
    x.contiguous.contiguous
    loop.update
    argparse.ArgumentParser.parse_args
    prompt.clip.tokenize.to
    self.tok_emb_txt
    device.args.num_candidates.args.softmax_temperature.args.top_p.args.top_k.args.prompt.model.sampling.cpu.numpy
    Stage2Hparams
    os.path.dirname
    torch.tril
    self.ln1
    pytorch_lightning.callbacks.ModelCheckpoint
    cnt.code_.unsqueeze
    model_clip.encode_text
    y.transpose.contiguous.view
    ImageNetDataModule.setup
    tuple
    enumerate
    torch.nn.Linear
    self.resid_drop.transpose
    tokenizer.build_tokenizer
    i_block.i_level.self.down.attn
    self.register_buffer
    self.dropout
    torchvision.utils.make_grid
    self.mid.attn_1
    x.self.value.view
    torch.randn
    output.write
    self.pos_emb_img
    self.n_heads.C.self.n_heads.B.T.x.self.key.view.transpose
    self.ln2
    self.nin_shortcut
    self.stage2.eval
    self.lr_schedulers.step
    self.blocks
    os.path.abspath
    model.stage2.from_ckpt
    torch.multinomial
    self.encoder
    quant.permute.permute
    min_encoding_indices.self.embedding.view
    torch.nn.functional.interpolate
    labels.self.sos.unsqueeze
    print
    torchvision.transforms.Normalize
    sys.path.append
    self.decoder
    torch.einsum
    self.norm_out
    torch.optim.AdamW
    images.self.stage1.get_codes.detach.view
    MultiHeadSelfAttention
    einops.rearrange
    urllib.parse.urlparse
    stage2.transformer.Transformer1d
    self.stage1.get_codes
    DataConfig
    self.drop
    omegaconf.OmegaConf.structured
    dalle.models.Dalle.from_pretrained.sampling
    preprocess_clip
    images.torch.stack.to
    tqdm.tqdm.set_description
    utils.config.get_base_config
    tqdm.tqdm
    x.self.key.view
    self.n_heads.C.self.n_heads.B.T.x.self.query.view.transpose
    torch.cat.clone
    self.decode
    self.stage2
    self.query
    i_level.self.up.upsample
    urllib.request.urlopen
    torch.nn.ModuleList.append
    self.conv2
    source.info
    self.n_heads.C.self.n_heads.B.T.x.self.value.view.transpose
    self.lr_schedulers
    layers.Encoder
    tarfile.open
    images.self.stage1.get_codes.detach
    model_clip.encode_image
    cutoff_topk_logits
    utils.sampling.sampling
    torch.nn.Sequential
    torch.nn.ModuleList
    setup_callbacks
    self.value
    tokens.to.to
    self.log
    math.sqrt
    isinstance
    omegaconf.OmegaConf.merge
    open
    torch.cat
    torch.ones
    torch.topk
    self.proj_out.reshape
    torch.argmin
    self.q
    self.stage1.parameters
    os.path.join
    os.path.exists
    torch.utils.data.DataLoader
    self.embedding.weight.data.uniform_
    scores.torch.argsort.cpu
    torch.nn.Module
    cutoff_topk_logits.to
    dalle.utils.utils.clip_score
    int
    cutoff_topk_logits.clone
    N.x.contiguous
    f.extract
    torch.stack
    torch.sort
    self.attn_drop.masked_fill
    torchvision.datasets.ImageNet
    torchvision.transforms.CenterCrop
    optimizer.step
    download_target.open.read
    cnt.pos_enc_code_.unsqueeze
    args.config_downstream.os.path.basename.split
    self
    torch.optim.lr_scheduler.CosineAnnealingLR
    stage1.vqgan.VQGAN
    ValueError
    torch.argsort
    Stage1Config
    range
    torch.nn.functional.avg_pool2d
    omegaconf.OmegaConf.load
    self.sos
    x.transpose.contiguous
    torch.manual_seed
    os.path.isfile
    image.astype
    present.torch.stack.clone
    pl_module.logger.experiment.add_image
    os.path.basename
    ImageLogger
    self.stage1.eval
    pytorch_lightning.seed_everything
    torch.cat.size
    v.reshape.reshape
    sos.self.stage2.sos.unsqueeze
    torchvision.transforms.Resize
    url.split
    clip.tokenize
    datetime.datetime.now.strftime
    device.W.torch.arange.repeat
    torch.nn.Conv2d
    torch.nn.LayerNorm
    dalle.utils.utils.set_seed
    cls_idx.torch.LongTensor.to
    torch.nn.functional.softmax
    i_block.i_level.self.up.attn
    ResnetBlock
    torch.nn.functional.cross_entropy
    probs.torch.multinomial.clone.detach
    float
    images.texts.torch.cat.contiguous
    f.getmembers
    z_q.permute.contiguous.view
    dalle.models.Dalle.from_pretrained
    source.read
    VectorQuantizer
    pytorch_lightning.Trainer
    torch.sigmoid
    self.tok_emb_img
    i_block.i_level.self.down.block
    torch.clamp
    self.tokenizer.encode
    h.self.quantize.view
    self.conv_out
    nonlinearity
    model_clip.to
    self.ln_f
    q.permute.reshape
    torch.arange
    self.load_state_dict
    q.permute.permute
    self.k
    functools.partial
    torch.sum
    self.stage2.sos.repeat
    self.norm
    self.mid.block_2
    self.head_txt
    cls
    utils.realpath_url_or_path
    torch.load
    torch.no_grad
    format
    past.append
    torchvision.transforms.ToTensor
    device.N.torch.arange.repeat
    presents.append
    self.stage1.decode_code
    self.quantize
    from_file.token_to_id
    os.makedirs
    self.pos_emb_txt
    torch.nn.Embedding
    utils.sampling.sampling_igpt
    code.clone.detach
    dalle.models.ImageGPT.from_pretrained
    z_q.permute.contiguous.permute
    torchvision.transforms.RandomCrop
    self.attn
    Upsample
    stage2.transformer.iGPT
    self.post_quant_conv
    torch.cumsum
    super.__init__
    download_target.open.read.hashlib.md5.hexdigest
    self.proj_out
    i_level.self.down.downsample
    h.sos.torch.cat.contiguous
    ImageNetDataModule.train_dataloader
    self.stage2.view
    self.head_img
    self.proj
    ImageNetDataModule.valid_dataloader
    self.parameters
    len
    z.rearrange.contiguous
    torch.clip
    torch.nn.GroupNorm
    torch.nn.Parameter
    model.sampling
    argparse.ArgumentParser
    torch.nn.Dropout
    sorted_idx_remove_cond.clone
    block.sample
    torch.LongTensor
    self.log_img
    from_file.enable_padding
    torch.bmm
    self.mlp
    self.conv_shortcut
    y.transpose.contiguous
    recons.cpu.cpu
    module.bias.data.zero_
    GELU
    self.up.insert
    dataclasses.field
    module.weight.data.fill_
    clip.load
    torch.nn.functional.gelu
    i_block.i_level.self.up.block
    present.torch.stack.clone.detach
    from_file.add_special_tokens
    Stage2Config
    torch.repeat_interleave
    dalle.models.Dalle.from_pretrained.to
    layers.Decoder
    scores.torch.argsort.cpu.numpy
    cutoff_topp_probs
    self.mask.torch.tril.view
    sos.self.stage2.sos.unsqueeze.repeat
    torch.cat.transpose
    images.cpu.cpu
    self.attn_drop
    quant.rearrange.contiguous
    z.rearrange.contiguous.view
    

    @developer Could you please help me check this issue? May I open a pull request to fix it? Thank you very much.

    opened by PyDeps 0
  • How to do inference from half image

    Hi, I want to know whether the code can do inference when we input the text and half of the image, as in iGPT and Taming Transformers. If possible, would you mind pointing me to the relevant code for this?

    opened by thuangb 0
  • Increasing positional embeddings text

    I am finetuning the minDALL-E model on a self-made dataset but my tokenized text prompts are sometimes longer than 64. What would be the best technique to increase the length of the positional encodings to e.g. 128? I was thinking of keeping the original 64 embeddings and appending 64 more, which have to be trained from scratch. However, I think it might mess with the finetuning, since the embeddings are in the very first layer.

    Are there better options/techniques to accomplish this?
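
    A minimal sketch of the approach described above (keep the 64 pretrained positions and append freshly initialized ones), assuming the text positional embedding is the nn.Embedding named pos_emb_txt in stage2; the shapes and attribute type should be checked against configs/dalle-1.3B.yaml and the model code:

    import torch
    import torch.nn as nn

    def extend_text_positions(model, new_len=128):
        old_emb = model.stage2.pos_emb_txt             # assumed: nn.Embedding(64, dim)
        old_len, dim = old_emb.weight.shape
        new_emb = nn.Embedding(new_len, dim).to(old_emb.weight.device)
        nn.init.normal_(new_emb.weight, std=0.02)      # new positions trained from scratch
        with torch.no_grad():
            new_emb.weight[:old_len] = old_emb.weight  # keep the pretrained first 64 positions
        model.stage2.pos_emb_txt = new_emb
        return model

    Note that the tokenizer's truncation length and the causal attention mask over the full text+image sequence would also need to be enlarged to match.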

    opened by ChristiaensBert 0
  • How much VRAM is needed for this?

    I was trying to run the sampling_ex.py, but no matter how low I set the num_candidates value (even if it's set to one or two), it always tells me that it has run out of memory. I am using an NVIDIA Quadro M5000 with 8 GB of VRAM.

    opened by mjohanning99 2
  • Script for VQGAN Finetuning

    This is an incredible project! For reproducibility, and for some of my own work, would you mind sharing/pointing me to code for fine-tuning VQGAN models (e.g., vqgan_imagenet_f16_16384) on custom datasets? This would be different from code for training VQGAN from scratch on different datasets.

    Additionally, how long does fine-tuning take?

    opened by siddk 0