Per-Pixel Classification is Not All You Need for Semantic Segmentation

Overview

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation

Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

[arXiv] [Project] [BibTeX]


Features

  • Better results while being more efficient.
  • Unified view of semantic- and instance-level segmentation tasks.
  • Supports major semantic segmentation datasets: ADE20K, Cityscapes, COCO-Stuff, Mapillary Vistas.
  • Supports all Detectron2 models.

Installation

See installation instructions.

Getting Started

See Preparing Datasets for MaskFormer.

See Getting Started with MaskFormer.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the MaskFormer Model Zoo.

License

The majority of MaskFormer is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license.

Citing MaskFormer

If you use MaskFormer in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  journal={arXiv},
  year={2021}
}

Comments

  • Result in log file.

    opened by chhluo 9
  • AttributeError: Attribute 'ignore_label' does not exist in the metadata of dataset 'ade20k_sem_seg_train'.

    I ran the following command.

    ./train_net.py \
      --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
      --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001
    

    Then I got the following error. What should I do?

    Traceback (most recent call last):
      File "train_net.py", line 264, in <module>
        launch(
      File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/engine/launch.py", line 62, in launch
        main_func(*args)
      File "train_net.py", line 256, in main
        trainer = Trainer(cfg)
      File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 312, in __init__
        data_loader = self.build_train_loader(cfg)
      File "train_net.py", line 107, in build_train_loader
        mapper = MaskFormerSemanticDatasetMapper(cfg, True)
      File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/config/config.py", line 181, in wrapped
        explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
      File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/config/config.py", line 238, in _get_args_from_config
        ret = from_config_func(*args, **kwargs)
      File "/mnt/sdb1/lost+found/clones/Study-MaskFormer/mask_former/data/dataset_mappers/mask_former_semantic_dataset_mapper.py", line 87, in from_config
        ignore_label = meta.ignore_label
      File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/data/catalog.py", line 126, in __getattr__
        raise AttributeError(
    AttributeError: Attribute 'ignore_label' does not exist in the metadata of dataset 'ade20k_sem_seg_train'. Available keys are dict_keys(['name', 'stuff_classes', 'image_root', 'sem_seg_root', 'evaluator_type', 'stuff_colors']).
    
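    For reference, a minimal workaround sketch, assuming the conventional ADE20K-150 ignore value of 255 (verify against your dataset preparation); one likely cause is an older detectron2 build whose builtin ADE20K registration predates the ignore_label field:

    from detectron2.data import MetadataCatalog

    # Hypothetical workaround: attach the missing attribute before Trainer(cfg)
    # is constructed; 255 is the conventional ignore value for ADE20K-150.
    MetadataCatalog.get("ade20k_sem_seg_train").set(ignore_label=255)
    MetadataCatalog.get("ade20k_sem_seg_val").set(ignore_label=255)
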
    opened by Keiku 7
  • How to train instance segmentation with COCO dataset?

    Hello, thank you for your innovative work and code. Your Mask2Former paper reports instance segmentation experiments with MaskFormer, but I can't find instructions for running instance segmentation training directly with this codebase. Would you be so kind as to release the code and config files for instance segmentation on the COCO dataset with MaskFormer? Thank you!

    opened by WuTao-CS 6
  • Question about parameter tuning

    Hi Bowen,

    Thank you for sharing such great work.

    I have a question about the parameters for training on a new dataset for the panoptic segmentation task. In the new dataset we have fewer objects per image (maybe 1-5). Which parameter do you think is the most important for this adaptation? Any advice is appreciated.

    The situation is that the final mask and dice losses are close to 0.1, which is smaller than in COCO panoptic training (about 0.3). Is there any normalization in the code that makes the loss small when the number of objects is small? I don't think there is.
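
    For context, a schematic of the DETR-style normalization such set criteria typically apply (a standalone sketch, not this repo's exact code): per-mask losses are summed and divided by the number of matched masks, so having only a few objects does not by itself shrink the loss.

    import torch

    def normalized_mask_loss(per_mask_losses: torch.Tensor) -> torch.Tensor:
        # Sum over matched masks, divide by their count; an image with 1-5
        # objects is normalized the same way as one with dozens.
        num_masks = max(per_mask_losses.numel(), 1)
        return per_mask_losses.sum() / num_masks

    print(normalized_mask_loss(torch.tensor([0.12, 0.09, 0.11])))  # ~0.1067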

    Another question: do you use a large batch size of 64 because of the poor label quality of COCO? Would changing it to 16 make a large difference? (I ask because the Panoptic-DeepLab PyTorch implementation uses 16 while its paper also uses 64.)

    Look forward to your reply.

    opened by lxa9867 6
  • The mIoU on ADE20K is not as good as on the ADE20K benchmark

    Thank you for your great work. However, the mIoU I get when testing with maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml and the released model is only 0.4022, while your mIoU on the ADE20K benchmark is 0.4967, which is much better. Could you please tell me why? Thank you.

    opened by daixiaolei623 6
  • Panoptic segmentation on Cityscapes

    Thanks for your excellent work.

    May I know if you have tried panoptic segmentation on the Cityscapes dataset?

    I am trying to do it but got weird results. I prepared the Cityscapes dataset by following the instructions and modified the config file starting from the semantic config "maskformer_R101_bs16_90k.yaml".

    If you have tried, could you please provide the config files for panoptic segmentation on Cityscapes?

    Thank you.

    opened by Jingyi1017 6
  • About the missing file.

    Hello, I cannot find the file 'prepare_ade20k_panoptic_annotations.py' for generating the ADE20K panoptic annotations; could you please provide it? Also, can MaskFormer be used for Mapillary Vistas panoptic segmentation? Looking forward to your reply.

    opened by HCShi 5
  • Is it possible to use the Detectron2 API to load a pretrained model?

    I want to run inference on an image using a pretrained model from this project, and I want to use the detectron2 API to load the model. Is that possible?

    Here's what I tried:

    1. I tried to run the "Run a pre-trained detectron2 model" section of detectron2's Getting Started notebook.
    2. When obtaining the config, I changed the code to the following:
    URL = "ade20k-150/maskformer_R50_bs16_160k.yaml"
    cfg = get_cfg()
    # add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
    cfg.merge_from_file(model_zoo.get_config_file(URL))
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
    # Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(URL)
            
    

    It seems like the models aren't available in the model zoo:

      ---------------------------------------------------------------------------
      RuntimeError                              Traceback (most recent call last)
      <ipython-input-9-f6671c809590> in <module>()
            1 cfg = get_cfg()
            2 # add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
      ----> 3 cfg.merge_from_file(model_zoo.get_config_file(URL))
            4 cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
            5 # Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
      
      /usr/local/lib/python3.7/dist-packages/detectron2/model_zoo/model_zoo.py in get_config_file(config_path)
          117     )
          118     if not os.path.exists(cfg_file):
      --> 119         raise RuntimeError("{} not available in Model Zoo!".format(config_path))
          120     return cfg_file
          121 
      
      RuntimeError: ade20k-150/maskformer_R50_bs16_160k.yaml not available in Model Zoo!
    

    If this isn't possible, how can we load a pretrained model in user code?
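
    For reference, a sketch of how loading could work outside detectron2's model zoo, assuming this repo is on PYTHONPATH and a checkpoint has been downloaded manually from the MaskFormer Model Zoo (file paths below are placeholders):

    import cv2
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from detectron2.projects.deeplab import add_deeplab_config
    from mask_former import add_mask_former_config  # config hook from this repo

    cfg = get_cfg()
    add_deeplab_config(cfg)      # MaskFormer configs extend the DeepLab defaults
    add_mask_former_config(cfg)  # register MaskFormer-specific config keys
    cfg.merge_from_file("configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
    cfg.MODEL.WEIGHTS = "model_final.pkl"  # placeholder: downloaded checkpoint
    predictor = DefaultPredictor(cfg)

    image = cv2.imread("input.jpg")        # BGR, as DefaultPredictor expects
    sem_seg = predictor(image)["sem_seg"]  # per-class scores, C x H x W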

    opened by ok-ad 5
  • Is the FLOPs input size 256x256?

    https://github.com/facebookresearch/detectron2/blob/main/tools/analyze_model.py

    Hi Bowen. I calculated the FLOPs and params with this script, but the result does not match your paper. For maskformer_swin_small_bs16_160k.yaml I get 63M params and 111G FLOPs, while the paper reports 63M params and 79G FLOPs. Is there a problem with my calculation? When the input is resized to 256x256, the result is similar to the paper's.

    python3 analyze_model.py --config-file ./configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml --tasks flop

    Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style='choice')]
    [11/15 13:41:29 detectron2]: Flops table computed from only one input sample:

    | module                                      | #parameters or shape | #flops   |
    |:--------------------------------------------|:---------------------|:---------|
    | model                                       | 63.075M              | 80.909G  |
    | backbone                                    | 48.839M              | 49.38G   |
    | backbone.patch_embed                        | 4.896K               | 83.362M  |
    | backbone.patch_embed.proj                   | 4.704K               | 75.497M  |
    | backbone.patch_embed.norm                   | 0.192K               | 7.864M   |
    | backbone.layers                             | 48.831M              | 49.282G  |
    | backbone.layers.0                           | 0.299M               | 4.394G   |
    | backbone.layers.1                           | 1.188M               | 4.367G   |
    | backbone.layers.2                           | 33.16M               | 35.953G  |
    | backbone.layers.3.blocks                    | 14.184M              | 4.567G   |
    | backbone.norm0                              | 0.192K               | 7.864M   |
    | backbone.norm0.weight                       | (96,)                |          |
    | backbone.norm0.bias                         | (96,)                |          |
    | backbone.norm1                              | 0.384K               | 3.932M   |
    | backbone.norm1.weight                       | (192,)               |          |
    | backbone.norm1.bias                         | (192,)               |          |
    | backbone.norm2                              | 0.768K               | 1.966M   |
    | backbone.norm2.weight                       | (384,)               |          |
    | backbone.norm2.bias                         | (384,)               |          |
    | backbone.norm3                              | 1.536K               | 0.983M   |
    | backbone.norm3.weight                       | (768,)               |          |
    | backbone.norm3.bias                         | (768,)               |          |
    | sem_seg_head                                | 14.236M              | 27.453G  |
    | sem_seg_head.pixel_decoder                  | 4.305M               | 23.56G   |
    | sem_seg_head.pixel_decoder.adapter_1        | 25.088K              | 0.424G   |
    | sem_seg_head.pixel_decoder.layer_1          | 0.59M                | 9.685G   |
    | sem_seg_head.pixel_decoder.adapter_2        | 49.664K              | 0.207G   |
    | sem_seg_head.pixel_decoder.layer_2          | 0.59M                | 2.421G   |
    | sem_seg_head.pixel_decoder.adapter_3        | 98.816K              | 0.102G   |
    | sem_seg_head.pixel_decoder.layer_3          | 0.59M                | 0.605G   |
    | sem_seg_head.pixel_decoder.layer_4          | 1.77M                | 0.453G   |
    | sem_seg_head.pixel_decoder.mask_features    | 0.59M                | 9.664G   |
    | sem_seg_head.predictor                      | 9.932M               | 3.887G   |
    | sem_seg_head.predictor.transformer.decoder  | 9.473M               | 1.179G   |
    | sem_seg_head.predictor.query_embed          | 25.6K                |          |
    | sem_seg_head.predictor.input_proj           | 0.197M               | 50.332M  |
    | sem_seg_head.predictor.class_embed          | 38.807K              | 23.194M  |
    | sem_seg_head.predictor.mask_embed.layers    | 0.197M               | 0.118G   |

    [11/15 13:41:29 detectron2]: Average GFlops for each type of operators: [('conv', 32.83191595008), ('layer_norm', 0.22296760319999998), ('linear', 67.07614236672), ('matmul', 1.92566500224), ('group_norm', 0.0769406976), ('upsample_nearest2d', 0.00764854272), ('bmm', 0.139984896), ('einsum', 8.959275), ('upsample_bilinear2d', 0.29302461)]
    [11/15 13:41:29 detectron2]: Total GFlops: 111.5±12.8

    opened by Sunting78 4
  • Question about the ADE20K benchmark training setting

    Great work! I have some questions.

    1. Is the val data used to fine-tune a model trained on the train data, or are the train and val data trained on together?
    2. If val is used for fine-tuning, how many epochs are used? Is there a unified default setting for fair comparison?
    3. If the train and val data are trained on together, is the training setting the same, i.e. 160k iterations?

    Look forward to your reply.

    opened by Sunting78 4
  • The error when training

    Thank you for your great work. However, when I train maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml using the command ./train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml,

    I got the following errors:

    MaskFormer Training Script.

    This script is a simplified version of the training script in detectron2/tools. : No such file or directory
    import-im6.q16: not authorized `copy' @ error/constitute.c/WriteImage/1037.
    import-im6.q16: not authorized `itertools' @ error/constitute.c/WriteImage/1037.
    import-im6.q16: not authorized `logging' @ error/constitute.c/WriteImage/1037.
    import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
    from: can't read /var/mail/collections
    from: can't read /var/mail/typing
    import-im6.q16: not authorized `torch' @ error/constitute.c/WriteImage/1037.
    import-im6.q16: not authorized `comm' @ error/constitute.c/WriteImage/1037.
    from: can't read /var/mail/detectron2.checkpoint
    from: can't read /var/mail/detectron2.config
    from: can't read /var/mail/detectron2.data
    from: can't read /var/mail/detectron2.engine
    ./train_net.py: line 21: syntax error near unexpected token `('
    ./train_net.py: line 21: `from detectron2.evaluation import ('

    Could you please tell me what the problem is and how to solve it? Thank you very much!
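
    For what it's worth, these messages are characteristic of the shell, rather than Python, executing the script: the import-im6.q16 lines come from ImageMagick's import tool and the "from: can't read /var/mail/..." lines from the mail reader, which happens when the file's shebang is not honored. A possible fix (a guess, not a confirmed resolution) is to invoke the interpreter explicitly:

    python ./train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml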

    opened by daixiaolei623 4
  • Error in training

    When I run train_net.py on Google Colab with the following command:

    !python /content/drive/MyDrive/MaskFormer-main/train_net.py \
      --config-file /content/drive/MyDrive/MaskFormer-main/configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
      --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.1

    I get the following error:

    FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ADEChallengeData2016/images/training'

    I downloaded the MaskFormer code from your link.
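
    For context, a sketch of the directory layout the training script expects, based on the repo's dataset preparation docs (verify against "Preparing Datasets for MaskFormer"); the annotations_detectron2 folder is generated by running python datasets/prepare_ade20k_sem_seg.py after extracting the ADEChallengeData2016 archive under datasets/:

    datasets/
      ADEChallengeData2016/
        images/
          training/
          validation/
        annotations/
          training/
          validation/
        annotations_detectron2/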

    opened by ainneabid 0
  • Unable to train the model

    Hi,

    Thanks for your great work! I recently tried to train the model myself, but found that transferring the model from CPU to GPU takes very long (about an hour), and then training fails. Could you please give me some suggestions? Did I do something wrong?

    Thanks in advance!

    My environment is below:

    sys.platform              linux
    Python                    3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
    numpy                     1.21.5
    detectron2                0.6 @/home/mu/anaconda3/envs/maskformer/lib/python3.7/site-packages/detectron2
    Compiler                  GCC 7.3
    CUDA compiler             CUDA 10.2
    detectron2 arch flags     3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5
    DETECTRON2_ENV_MODULE     <not set>
    PyTorch                   1.8.2 @/home/mu/anaconda3/envs/maskformer/lib/python3.7/site-packages/torch
    PyTorch debug build       False
    GPU available             Yes
    GPU 0                     NVIDIA GeForce RTX 3080 Laptop GPU (arch=8.6)
    Driver version            510.60.02
    CUDA_HOME                 /usr/local/cuda
    Pillow                    9.2.0
    torchvision               0.9.2 @/home/mu/anaconda3/envs/maskformer/lib/python3.7/site-packages/torchvision
    torchvision arch flags    3.5, 5.0, 6.0, 7.0, 7.5
    fvcore                    0.1.5.post20220512
    iopath                    0.1.9
    cv2                       4.6.0


    The error is below:

    res4.9.conv3.norm.num_batches_tracked
    res5.0.conv1.norm.num_batches_tracked
    res5.0.conv2.norm.num_batches_tracked
    res5.0.conv3.norm.num_batches_tracked
    res5.0.shortcut.norm.num_batches_tracked
    res5.1.conv1.norm.num_batches_tracked
    res5.1.conv2.norm.num_batches_tracked
    res5.1.conv3.norm.num_batches_tracked
    res5.2.conv1.norm.num_batches_tracked
    res5.2.conv2.norm.num_batches_tracked
    res5.2.conv3.norm.num_batches_tracked
    stem.conv1.norm.num_batches_tracked
    stem.conv2.norm.num_batches_tracked
    stem.conv3.norm.num_batches_tracked
    stem.fc.{bias, weight}
    [08/21 20:18:39 d2.engine.train_loop]: Starting training from iteration 0
    ERROR [08/21 20:20:24 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
        losses.backward()
      File "/cloud/maskformer/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/cloud/maskformer/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
    [08/21 20:20:24 d2.engine.hooks]: Total training time: 0:01:45 (0:00:00 on hooks)
    [08/21 20:20:24 d2.utils.events]: iter: 0  lr: N/A  max_mem: 5604M
    Traceback (most recent call last):
      File "train_net.py", line 270, in <module>
        args=(args,),
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
        main_func(*args)
      File "train_net.py", line 258, in main
        return trainer.train()
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 484, in train
        super().train(self.start_iter, self.max_iter)
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "/cloud/maskformer/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
        losses.backward()
      File "/cloud/maskformer/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/cloud/maskformer/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
        allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

    opened by kagawa588 0
  • A few questions about the configuration files

    Thank you for your wonderful work! I have a few questions about the configuration.

    1. For ADE20K and Cityscapes, it seems there is no configuration file with Swin as the backbone network. Could you please provide one?
    2. If I train with only one GPU, as described in GETTING_STARTED.md, is it enough to adjust only the base LR and batch size, or do the other optimizer and LR-scheduler parameters also need to be modified? (A linear-scaling sketch follows below.)
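
    For the learning-rate part of question 2, a sketch of the common linear scaling heuristic (an illustration, not the authors' confirmed recipe; 16 and 0.0001 are taken from the bs16 config name and the example commands above):

    def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
        # Linear LR scaling heuristic: scale the LR with the batch size.
        return base_lr * new_batch / base_batch

    print(scaled_lr(0.0001, 16, 2))  # 1.25e-05
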
    opened by WBS-123 0
  • Question about Pixel decoder last Conv2d layer

    Great work! Your paper says in Section 4.1 (Implementation details, Pixel decoder) that “Finally, we apply a single 1 × 1 convolution layer to get the per-pixel embeddings.” However, in the code in pixel_decoder.py,

    self.mask_features = Conv2d(
        conv_dim,
        mask_dim,
        kernel_size=3,
        stride=1,
        padding=1,
    )
    

    the final Conv2d also has a 3x3 kernel. Am I missing something? Thanks!
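
    As a quick sanity check (a standalone sketch with hypothetical dimensions, not the repo's code): with padding=1, a 3x3 convolution yields the same output shape as the paper's 1x1 projection, so the two differ only in receptive field and parameter count.

    import torch
    from torch import nn

    conv_dim, mask_dim = 256, 256  # hypothetical values for illustration

    proj_1x1 = nn.Conv2d(conv_dim, mask_dim, kernel_size=1)                       # paper's wording
    proj_3x3 = nn.Conv2d(conv_dim, mask_dim, kernel_size=3, stride=1, padding=1)  # released code

    x = torch.randn(1, conv_dim, 64, 64)
    assert proj_1x1(x).shape == proj_3x3(x).shape  # identical output shapes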

    opened by YaGami01 0
  • Extract only mask from the output

    First of all, thanks for the code and the model. Question: how can I extract/save only the mask (RGB, without any label text on it) during inference? Which function should I modify to get the RGB mask?

    Thanks!
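
    One possible approach, sketched under assumptions: a semantic-segmentation predictor whose output dict has a "sem_seg" tensor, and a registered dataset exposing stuff_colors (listed among the metadata keys earlier on this page). The function name and file paths are hypothetical.

    import cv2
    import numpy as np
    from detectron2.data import MetadataCatalog

    def save_rgb_mask(outputs, dataset="ade20k_sem_seg_val", path="mask.png"):
        # Map per-pixel class ids to the dataset palette; no label text is drawn.
        sem_seg = outputs["sem_seg"].argmax(dim=0).cpu().numpy()
        palette = np.array(MetadataCatalog.get(dataset).stuff_colors, dtype=np.uint8)
        rgb_mask = palette[sem_seg]              # H x W x 3, RGB
        cv2.imwrite(path, rgb_mask[:, :, ::-1])  # OpenCV writes BGR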

    opened by SahilChachra 0