Replication of Pix2Seq with Pretrained Model

peng gao

Last update: Nov 22, 2022

Overview

Pretrained-Pix2Seq

We provide a pre-trained Pix2Seq model. This version includes new data augmentation. The model is trained for 300 epochs and achieves 37 mAP without beam search or nucleus sampling.
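For readers unfamiliar with the decoding strategies mentioned above, here is a minimal, standard-library sketch of the difference between greedy decoding and nucleus (top-p) sampling for a single decoding step. It is not taken from this repository: the function names, the `top_p` default, and the toy logits are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def nucleus_candidates(logits, top_p=0.9):
    """Return the smallest set of token ids whose probability mass reaches top_p,
    together with their probabilities."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for idx in ranked:
        kept.append(idx)
        mass += probs[idx]
        if mass >= top_p:
            break
    return kept, [probs[i] for i in kept]


def nucleus_pick(logits, top_p=0.9, rng=random):
    """Nucleus (top-p) sampling: sample one token id from the kept candidates only."""
    kept, weights = nucleus_candidates(logits, top_p)
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy logits over a 4-token vocabulary (hypothetical values).
step_logits = [2.0, 1.0, 0.1, -3.0]
greedy_token = greedy_pick(step_logits)
sampled_token = nucleus_pick(step_logits, top_p=0.8, rng=random.Random(0))
```

Greedy decoding is deterministic, while nucleus sampling draws from a truncated distribution, so repeated runs can produce different sequences unless the random generator is seeded.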

Installation

Install PyTorch 1.5+ and torchvision 0.6+ (recommended: torch 1.8.1 and torchvision 0.8.0).

Install pycocotools (for evaluation on COCO):

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it. You should now be able to train and evaluate detection models.

Data preparation

Download and extract the COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the following directory structure:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
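Before launching a long training run, it can help to confirm that the dataset root matches the layout above. The following is a small sketch using only the standard library; the helper name `missing_coco_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Sub-directory names taken from the layout shown above.
EXPECTED_DIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(root):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

For example, `missing_coco_dirs("path/to/coco")` returns an empty list when all three folders exist, and otherwise lists the folders that still need to be downloaded or extracted.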

Training

First, link the COCO dataset into the project folder:

ln -s /path/to/coco ./coco

Then start training:

sh train.sh --model pix2seq --output_dir /path/to/save

Evaluation

sh train.sh --model pix2seq --output_dir /path/to/save --resume /path/to/checkpoints --eval

COCO

Method	backbone	Epoch	Batch Size	AP	AP50	AP75	Weights
Pix2Seq	R50	300	32	37.0	53.4	39.4	weight

Contributor

Qiu Han, Peng Gao, Jingqiu Zhou (Beam Search)

Acknowledgement

Pix2Seq, DETR

Comments

About setting 'mask' to 'False'
Hi, thanks for sharing! I am wondering why you set all elements of mask to False in pix2seq.py?

src, mask = features[-1].decompose()
assert mask is not None
mask = torch.zeros_like(mask).bool()
opened by Williamongh 1
A detail about your code

Hi! Thank you for your great work. I don't understand one detail about vocal_embed. In your code, 'self.vocal_embed = nn.Embedding(self.num_vocal-2, d_model)', why do you subtract 2, and what does the 2 stand for? Thanks.

opened by sunmenmian 0
Which one is the best version?

Hello, thanks very much for your code. What's the difference between the three versions of the pix2seq code, and which one is the more compact version?

opened by Epiphqny 0
About Custom Datasets

Thanks for your great work. I successfully tested on the COCO dataset. Now I want to test on my own dataset (COCO format) with 13 classes, with category IDs from 1 to 13. I only edited num_classes (from 91 to 14) in the 'build' function in pix2seq.py. Is that correct? I got this error:

  File "main.py", line 260, in <module>
    main(args)
  File "main.py", line 194, in main
    train_stats = train_one_epoch(
  File "/home/dsm/graduate/Pretrained-Pix2Seq/engine.py", line 27, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "/home/dsm/graduate/Pretrained-Pix2Seq/util/misc.py", line 224, in log_every
    for obj in iterable:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1148, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 986, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 164, in recvfds
    raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata

opened by Dengshima 0
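A note on the `received 0 items of ancdata` error reported above: the traceback ends in the multiprocessing code that passes file descriptors between DataLoader worker processes, and this message is commonly associated with the process running out of open file descriptors. Whether that is the cause here is an assumption, not something confirmed for this repository. Under that assumption, one possible mitigation on Unix-like systems is to raise the soft descriptor limit before the data loader is created. The helper name `raise_open_file_limit` and the default target are hypothetical.

```python
import resource  # Unix-only standard-library module


def raise_open_file_limit(target=4096):
    """Raise the soft limit on open file descriptors toward target.

    The value is capped by the hard limit. Returns the soft limit in effect
    after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


# Another workaround often suggested for this error is to switch PyTorch's
# tensor sharing strategy (requires torch, so it is left commented out here):
# import torch.multiprocessing
# torch.multiprocessing.set_sharing_strategy("file_system")
```

Calling `raise_open_file_limit()` once at the top of the training script would be enough. Reducing the number of DataLoader workers is another option to try if the limit cannot be raised.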
Why much slower than Stable-Pix2Seq

I found that training 'Pre-trained Pix2Seq' takes about 75 minutes per epoch, while 'Stable Pix2Seq' takes only 50 minutes per epoch. Why? What are the differences between them?

opened by yshMars 0
About the gap between Training loss and Inference loss

I found that the gap between the training loss and the inference loss is very large. The loss at inference is 10 times the training loss. The situation is the same even when running inference on the training set. Can you give some advice?

opened by T-FOUNTAIN 0
Some problems about your great work

Hi! Thank you for your great work. I have some questions about it. In the paper, class dropout is applied when building the input sequence, but I could not find it in your code. Did you ignore it? And if I want to use dropout, do I need to add a drop token to the vocabulary?

opened by hughwcq 0
About LargeScaleJitter

Hi, great work! We are also trying to reimplement Pix2Seq, and we found that absolute coordinates are useful. This is similar to your LargeScaleJitter (which pads or crops the image to a fixed size). By absolute coordinates we mean normalizing positions by dividing by the fixed size: boxes = boxes / 1333. instead of boxes = boxes / torch.tensor([w, h, w, h], dtype=torch.float32). With this, padding or cropping the image to the fixed size is not necessary.

opened by ShuaiBai623 3
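To make the two normalization schemes in the comment above concrete, here is a small sketch in plain Python (no torch needed). The function names are illustrative, and the quantization step with its `num_bins` default is an assumption added to show how a normalized coordinate could become a discrete token. It is not code from this repository.

```python
def normalize_relative(box, width, height):
    """Divide each coordinate by the image's own width or height,
    as in boxes / [w, h, w, h]."""
    x0, y0, x1, y1 = box
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def normalize_absolute(box, fixed_size=1333):
    """Divide every coordinate by one fixed size, as in boxes / 1333."""
    return tuple(c / fixed_size for c in box)


def quantize(norm_box, num_bins=2000):
    """Map normalized coordinates in [0, 1] to integer bin indices.

    num_bins is a hypothetical value chosen for this example.
    """
    return tuple(min(num_bins - 1, int(c * num_bins)) for c in norm_box)


# One box (x0, y0, x1, y1) in pixels inside an 800x600 image.
box = (100, 200, 400, 600)
relative = normalize_relative(box, width=800, height=600)
absolute = normalize_absolute(box)
```

With relative normalization the same pixel position maps to different values depending on the image size. With absolute normalization a pixel always maps to the same value, which is why the image would not need to be padded or cropped to a fixed size for the coordinates to stay consistent.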

Owner

peng gao

Young Scientist at Shanghai AI Lab

GitHub

A full-fledged version of Pix2Seq

Stable-Pix2Seq A full-fledged version of Pix2Seq What it is. This is a full-fledged version of Pix2Seq. Compared with unofficial-pix2seq, stable-pix2s

205 Dec 27, 2022

Implementation of Pix2Seq in PyTorch

pix2seq-pytorch Implementation of Pix2Seq paper Different from the paper image input size 1280 bin size 1280 LambdaLR scheduler used instead of Linear

9 Dec 15, 2022

Replication attempt for the Protein Folding Model

RGN2-Replica (WIP) To eventually become an unofficial working Pytorch implementation of RGN2, an state of the art model for MSA-less Protein Folding f

36 Nov 29, 2022

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

5k Jan 4, 2023

Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

ProGen - (wip) Implementation and replication of ProGen, Language Modeling for Protein Generation, in Pytorch and Jax (the weights will be made easily

71 Dec 1, 2022

This Repostory contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

182 Jan 7, 2023

Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

408 Jan 1, 2023

Adds timm pretrained backbone to pytorch's FasterRcnn model

timmFasterRcnn model_config.py -> it returns the model,feat_sizes,output channel and the feat layer names, which is reqd by the Add_FPN.py file Add_FP

12 Dec 3, 2022

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links

LinkBERT: A Knowledgeable Language Model Pretrained with Document Links This repo provides the model, code & data of our paper: LinkBERT: Pretraining

264 Jan 1, 2023

In this project we investigate the performance of the SetCon model on realistic video footage. Therefore, we implemented the model in PyTorch and tested the model on two example videos.

Contrastive Learning of Object Representations Supervisor: Prof. Dr. Gemma Roig Institutions: Goethe University CVAI - Computational Vision & Artifici

6 Dec 8, 2022

Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

3 Mar 30, 2022

Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

210 Dec 28, 2022

Repository providing a wide range of self-supervised pretrained models for computer vision tasks.

Hierarchical Pretraining: Research Repository This is a research repository for reproducing the results from the project "Self-supervised pretraining

53 Nov 9, 2022

(ImageNet pretrained models) The official pytorch implemention of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"

Res2Net The official pytorch implemention of the paper "Res2Net: A New Multi-scale Backbone Architecture" Our paper is accepted by IEEE Transactions o

928 Dec 29, 2022

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging To appear on KDD'21...[pdf] This project provides an unsupervised framework for mining and

146 Dec 22, 2022

