UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Microsoft

Last update: Jan 1, 2023

Comments

[LayoutLM] How to reproduce FUNSD result

Hello, I have run fine tuning for the Sequence Labeling Task with FUNSD dataset but my I couldn't achieve the result presented in the paper (precision is only 40%), here are some scripts and log that I used, any idea about what could be wrong? Thank you very much. Training:

#!/bin/bash

python run_seq_labeling.py  --data_dir ~/mnt/data \
                            --model_type layoutlm \
                            --model_name_or_path ~/mnt/model \
                            --do_lower_case \
                            --max_seq_length 512 \
                            --do_train \
                            --num_train_epochs 100.0 \
                            --logging_steps 10 \
                            --save_steps -1 \
                            --output_dir ~/mnt/output \
                            --labels ~/mnt/data/labels.txt \
                            --per_gpu_train_batch_size 16 \
                            --fp16

Testing:

#!/bin/bash

python run_seq_labeling.py --do_predict\
  --model_type layoutlm\
  --model_name_or_path ~/mnt/model\
  --data_dir ~/mnt/data\
  --output_dir ~/mnt/output\
  --labels ~/mnt/data/labels.txt

Some log:

05/14/2020 09:40:45 - INFO - __main__ -   ***** Running training *****
05/14/2020 09:40:45 - INFO - __main__ -     Num examples = 150
05/14/2020 09:40:45 - INFO - __main__ -     Num Epochs = 100
05/14/2020 09:40:45 - INFO - __main__ -     Instantaneous batch size per GPU = 16
05/14/2020 09:40:45 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 16
05/14/2020 09:40:45 - INFO - __main__ -     Gradient Accumulation steps = 1
05/14/2020 09:40:45 - INFO - __main__ -     Total optimization steps = 1000
05/14/2020 09:53:00 - INFO - __main__ -    global_step = 1000, average loss = 0.10387736940692412

05/14/2020 10:17:07 - INFO - __main__ -   ***** Running evaluation  *****
05/14/2020 10:17:07 - INFO - __main__ -     Num examples = 52
05/14/2020 10:17:07 - INFO - __main__ -     Batch size = 8
05/14/2020 10:17:07 - INFO - __main__ -   
           precision    recall  f1-score   support

 QUESTION       0.41      0.70      0.52       771
   HEADER       0.00      0.00      0.00       108
   ANSWER       0.39      0.50      0.44       513

micro avg       0.40      0.57      0.47      1392
macro avg       0.37      0.57      0.45      1392

05/14/2020 10:17:07 - INFO - __main__ -   ***** Eval results  *****
05/14/2020 10:17:07 - INFO - __main__ -     f1 = 0.472115668338743
05/14/2020 10:17:07 - INFO - __main__ -     loss = 2.9291565077645436
05/14/2020 10:17:07 - INFO - __main__ -     precision = 0.400600901352028
05/14/2020 10:17:07 - INFO - __main__ -     recall = 0.5747126436781609

opened by nv-quan 17

LayoutLMv2 code release

Thanks for sharing the great work! I was wondering if there is an expected date on when you will be releasing your code and pre-trained models for LayoutLMv2.

opened by rtanaka-lab 14
Unable to get vaild value from layoutlm model

Hi there,

Thank you for your works very much and i'd like to use the LayoutLM network to try labeling in seq but I'm in trouble due to the question detail as follow:

for the env, i try to setup as readme.md but in the command pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ there is a trouble so i use my env existed which also meet the requirement;

for the data, i download the FUNSD and preprocessing its testing_data follow the readme.md;

for the main, I run it in the run_seq_labeling.py but not command after preprocessing.I can finish the process of training but can not for the one of predicting. my conf is equivalent to the follow command: python run_seq_labeling.py --data_dir data
--model_type layoutlm
--model_name_or_path layout-large-uncased \ (Downloaded form google drive) --output_dir out
--labels data/labels.txt
--do_predict
--do_lower_case
--overwrite_output_dir
and other command is not change from default. It will be error in line 357: eval_loss += tmp_eval_loss.item()

and the error is RuntimeError: CUDA error: device-side assert triggered

I debug it and this error may be created from line 349: outputs = model(input)

due to the input is a dict of tensor contains {input_ids, attention_mask, labels, bbox, token_type_ids} but output is a tuple which contains 2 tensor but their data are Unable to get repr for 'torch.Tensor';

i run it as I understand it from paper but i do not know whether it is correct, and i have spend a long time in it but get none result, could you please provide some help for me. sincerely thanks;

opened by NancyNozomi 14
[unused99] occurs two times in unilm1.2-base-uncased-vocab.txt

when running s2s-ft, [unused99] occurs two times in unilm1.2-base-uncased-vocab.txt, which makes incorrect seq2seq_loader.Preprocess4Seq2seqDecoder.vocab_words

opened by GentleSmile 14
Question Generation example for S2S-FT

Hi Li Dong Thanks a lot for your code and the S2S-FT example. You gave us in the README abstractive summarization usage, could you please also give Question Generation usage example ? Thanks in advance Philippe PS: I have another question, but better on a separate issue for tracking

opened by Neuronys 13

Issue in the backbone of LayoutLMv2

Describe the bug Model I am using: LayoutLMv2

While trying to train LayoutLMv2, I got an error saying,

~/orbi/unilm/layoutlmft/layoutlmft/models/layoutlmv2/modeling_layoutlmv2.py in forward(self, images)
    605 
    606     def forward(self, images):
--> 607         images_input = (images.tensor - self.pixel_mean) / self.pixel_std
    608         features = self.backbone(images_input)
    609         features = features[self.out_feature_key]

AttributeError: 'Tensor' object has no attribute 'tensor'

I tried running it without .tensor and it worked fine.

Please let me know if there is any mistake in the way I am passing the images. Otherwise, I can open a PR to remove .tensor.

Thanks!

cc: @ranpox

opened by rushabh-v 12

KeyError: 'layoutlmv2' in AutoTokenizer

I get key error, when I try to run AutoTokenizer.from_pretrained("microsoft/layoutlmv2-base-uncased") same is the case even when I download the files to local and run the above with path to the config folder

I also do not find layoutlmv2 in the AutoTokenizer.from_pretrained Documentation.

Any leads on how to use it the right way would be helpful

opened by sindhuattaiger 12
Is transformers LayoutLM really working?

Getting endless erros when trying to use the LayoutLMForTokenClassification from transformers for NER task, is just me doing wrong or the function still on work? Really appreciate if you can give some information.

opened by shaonanqinghuaizongshishi 12

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Hi, I was using run_classification.py with the base model of layoutlm for my own document set. However, I came across this error:

04/25/2020 09:26:55 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
04/25/2020 09:26:55 - INFO - transformers.configuration_utils -   loading configuration file /home/ubuntu/rwik_xx_document_classification/unilm/layoutlm-base-uncased/config.json
04/25/2020 09:26:55 - INFO - transformers.configuration_utils -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "cdip",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "layer_norm_eps": 1e-12,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   Model name '/home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming '/home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/' is a path or url to a directory containing tokenizer files.
04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   Didn't find file /home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/added_tokens.json. We won't load it.
04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   loading file /home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/vocab.txt
04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   loading file None
04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   loading file /home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/special_tokens_map.json
04/25/2020 09:26:55 - INFO - transformers.tokenization_utils -   loading file /home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/tokenizer_config.json
04/25/2020 09:26:55 - INFO - transformers.modeling_utils -   loading weights file /home/ubuntu/rwik-xx/unilm/layoutlm-base-uncased/pytorch_model.bin
04/25/2020 09:27:16 - INFO - transformers.modeling_utils -   Weights of LayoutLMForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
04/25/2020 09:27:16 - INFO - transformers.modeling_utils -   Weights from pretrained model not used in LayoutLMForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
04/25/2020 09:27:19 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='/home/ubuntu/rwik-xx/xx_pdftotext_html_output/', dev_folder='trial_24_04_test', device=device(type='cuda', index=0), do_eval=True, do_lower_case=True, do_train=True, eval_all_checkpoints=True, evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=2, hierarchical_tokens=False, learning_rate=5e-05, local_rank=-1, logging_steps=100, max_grad_norm=1.0, max_seq_length=448, max_steps=10000, model_name_or_path='/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm-base-uncased/', model_type='layoutlm', n_gpu=1, no_cuda=False, nuance_mode=True, num_train_epochs=4.0, output_dir='/home/ubuntu/rwik_xx_document_classification/weight_files/1_layout_lm_base_24_04/', output_mode='classification', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=16, per_gpu_train_batch_size=8, save_steps=100, seed=566, server_ip='', server_port='', stride_len=112, task_name='cdip', tokenizer_name='', tpu=False, tpu_ip_address='', tpu_name='', tqdm_notebook_mode=False, train_folder='trial_24_04_train', warmup_steps=500, weight_decay=0.0, xrt_tpu_config='')
04/25/2020 09:27:19 - INFO - __main__ -   Creating features from dataset file at /home/ubuntu/rwik-xx/xx_pdftotext_html_output/
Gettting train examples: 100%|██████████| 789/789 [00:26<00:00, 29.34it/s]
  0%|          | 0/789 [00:00<?, ?it/s]04/25/2020 09:27:46 - INFO - utils_classification -   *** Example ***
04/25/2020 09:27:46 - INFO - utils_classification -   guid: train-1
04/25/2020 09:27:46 - INFO - utils_classification -   input_ids: 101 9986 2271 23773 11255 8909 1024 5709... ( truncated )
04/25/2020 09:27:46 - INFO - utils_classification -   bboxes: [0, 0, 0, 0] [0, 0, 0, 0] [55, 44, 109, 53] [55, 44, 109, 53] [55, 44, 109, 53] [112, 44, 164, 53] ......[1000, 1000, 1000, 1000] ( truncated )
04/25/2020 09:27:46 - INFO - utils_classification -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 ... ( truncated )
04/25/2020 09:27:46 - INFO - utils_classification -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 ... ( truncated )
04/25/2020 09:27:46 - INFO - utils_classification -   label: xxx (id = 1)
04/25/2020 09:27:46 - INFO - utils_classification -   *** Example ***
.
.
.
(truncated)

100%|██████████| 789/789 [00:30<00:00, 26.17it/s]
04/25/2020 09:28:16 - INFO - __main__ -   Saving features into cached file /home/ubuntu/rwik_xx/xx_pdftotext_html_output/cached_train_layoutlm-base-uncased_448_cdip
04/25/2020 09:28:19 - INFO - __main__ -   ***** Running training *****
04/25/2020 09:28:19 - INFO - __main__ -     Num examples = 1929
04/25/2020 09:28:19 - INFO - __main__ -     Num Epochs = 83
04/25/2020 09:28:19 - INFO - __main__ -     Instantaneous batch size per GPU = 8
04/25/2020 09:28:19 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 16
04/25/2020 09:28:19 - INFO - __main__ -     Gradient Accumulation steps = 2
04/25/2020 09:28:19 - INFO - __main__ -     Total optimization steps = 10000
Epoch:   0%|          | 0/83 [00:00<?, ?it/s]
Iteration:   0%|          | 0/242 [00:00<?, ?it/s]
Epoch:   0%|          | 0/83 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm/run_classification.py in <module>()
    925 
    926 if __name__ == "__main__":
--> 927     main()

/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm/run_classification.py in main()
    859             args, args.task_name, tokenizer, evaluate=False
    860         )
--> 861         global_step, tr_loss = train(args, train_dataset, model, tokenizer)
    862         logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)
    863 

/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm/run_classification.py in train(args, train_dataset, model, tokenizer)
    217                 )  # XLM, DistilBERT and RoBERTa don't use segment_ids
    218             #pdb.set_trace()
--> 219             outputs = model(**inputs)
    220             loss = outputs[
    221                 0

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm/modeling_layoutlm.py in forward(self, input_ids, bbox, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels)
    328             token_type_ids=token_type_ids,
    329             position_ids=position_ids,
--> 330             head_mask=head_mask,
    331         )
    332 

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/rwik_xx_document_classification/unilm/layoutlm/modeling_layoutlm.py in forward(self, input_ids, bbox, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask)
    191         )
    192         encoder_outputs = self.encoder(
--> 193             embedding_output, extended_attention_mask, head_mask=head_mask
    194         )
    195         sequence_output = encoder_outputs[0]

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    378                 all_hidden_states = all_hidden_states + (hidden_states,)
    379 
--> 380             layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask)
    381             hidden_states = layer_outputs[0]
    382 

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    349 
    350     def forward(self, hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None):
--> 351         self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
    352         attention_output = self_attention_outputs[0]
    353         outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    303 
    304     def forward(self, hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None):
--> 305         self_outputs = self.self(hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    306         attention_output = self.output(self_outputs[0], hidden_states)
    307         outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/.local/lib/python3.6/site-packages/transformers/modeling_bert.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask)
    213 
    214     def forward(self, hidden_states, attention_mask=None, head_mask=None, encoder_hidden_states=None, encoder_attention_mask=None):
--> 215         mixed_query_layer = self.query(hidden_states)
    216 
    217         # If this is instantiated as a cross-attention module, the keys

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

/home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1610         ret = torch.addmm(bias, input, weight.t())
   1611     else:
-> 1612         output = input.matmul(weight.t())
   1613         if bias is not None:
   1614             output += bias

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

I am using torch version 1.3.0 and transformers version version 2.2.1 and Python 3.

Note that I have modified the loading process to suit the dataset format that my file is present in and I have verified that it is being loaded correctly. In addition to that, I tried running roberta-base through the same dataset and it worked.

When I tried to run python debugger and check the value of the input variable, I got this error

> /home/ubuntu/.local/lib/python3.6/site-packages/torch/nn/functional.py(1612)linear()
   1610         ret = torch.addmm(bias, input, weight.t())
   1611     else:
-> 1612         output = input.matmul(weight.t())
   1613         if bias is not None:
   1614             output += bias

ipdb> input
*** RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278
ipdb> weight
*** RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278

opened by rwikdutta 12

Question about pre-training BEiT-base on ImageNet-1k and then fine-tuning on ADE20K

Describe Model I am using (UniLM, MiniLM, LayoutLM ...): BEIT I am try to reproducing self-supervised pre-training BEiT-base on ImageNet-1k and then fine-tuning on ADE20K, in your paper it will get mIoU 45.6, slightly higher than Supervised Pre-Training on ImageNet(45.3) and DINO(44.1), But i can not reproduce this result, i only get mIoU about 39.

I wonder whether you can provide all the training commands and hyperparams for self-supervised pre-training BEiT-base on ImageNet-1k and then fine-tuning on ADE20K. Thanks~

opened by laojiangwei 11
Can LayoutLM be used for commercial purpose?

**Can LayoutLM be used for commercial purposes? And how LayoutLM license is different than other versions of LayoutLM (LayoutLMv2, LayoutLMFT, layoutXLM) Will license hold for both train model and code. Or one can use a trained model provide by other sources such as Docbank for commercial purposes.

I have noticed that the LayoutLM folder is showing deprecated. What does that mean in terms of use and license? **

Model I am using (LayoutLM):

opened by ghost 11
ValueError: You must provide corresponding bounding boxes

Hi,

I got one question when I try to use LayoutLMv2 to train on the FUNSD dataset. I follow the steps and I got the following issue ValueError: You must provide corresponding bounding boxes.

I follow the command to train the LayoutLMv2 on FUNSD dataset.

Here is the screenshot for the issue. Any advice will be appreciated!

opened by 14H034160212 0
Beit3 classification

Thanks for your inspiring and effective work. You achieved really great performance on imagenet-1k classification. As the paper mentioned, you treat classification as retrieval and do intermediate retrieval on imagenet-21k before fine-tuning. But we are wondering what you do for polysemy and same class names for different classes in the dataset imagenet-21k.

Looking forward to your reply.

opened by amandaluof 1
Could you please release the training pipeline of BEATs?

As the paper, the training pipeline/framework of BEATs is complex, and is the core contribution of it. @[email protected] @[email protected] Thank you!

BEATs: Audio Pre-Training with Acoustic Tokenizers

opened by hllmathcs 0
License for LayoutLMv2

Hi @wolfshow We noticed that the license for LayoutLMv2 has been changed to Non-Commercial one. Would this license change apply to only model weights/code downloaded after sept 19 (the day when you changed the license)? Can people who downloaded it prior to the license modification date continue to use for commercial purpose?

opened by riteshKumarUMass 1
Bump pillow from 8.3.1 to 9.3.0 in /vlmo
Bumps pillow from 8.3.1 to 9.3.0.

Release notes

Sourced from pillow's releases.

9.3.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

Changes

Initialize libtiff buffer when saving #6699 [@radarhere]

Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [@wiredfool]

Inline fname2char to fix memory leak #6329 [@nulano]

Fix memory leaks related to text features #6330 [@nulano]

Use double quotes for version check on old CPython on Windows #6695 [@hugovk]

GHA: replace deprecated set-output command with GITHUB_OUTPUT file #6697 [@nulano]

Remove backup implementation of Round for Windows platforms #6693 [@cgohlke]

Upload fribidi.dll to GitHub Actions #6532 [@nulano]

Fixed set_variation_by_name offset #6445 [@radarhere]

Windows build improvements #6562 [@nulano]

Fix malloc in _imagingft.c:font_setvaraxes #6690 [@cgohlke]

Only use ASCII characters in C source file #6691 [@cgohlke]

Release Python GIL when converting images using matrix operations #6418 [@hmaarrfk]

Added ExifTags enums #6630 [@radarhere]

Do not modify previous frame when calculating delta in PNG #6683 [@radarhere]

Added support for reading BMP images with RLE4 compression #6674 [@npjg]

Decode JPEG compressed BLP1 data in original mode #6678 [@radarhere]

pylint warnings #6659 [@marksmayo]

Added GPS TIFF tag info #6661 [@radarhere]

Added conversion between RGB/RGBA/RGBX and LAB #6647 [@radarhere]

Do not attempt normalization if mode is already normal #6644 [@radarhere]

Fixed seeking to an L frame in a GIF #6576 [@radarhere]

Consider all frames when selecting mode for PNG save_all #6610 [@radarhere]

Don't reassign crc on ChunkStream close #6627 [@radarhere]

Raise a warning if NumPy failed to raise an error during conversion #6594 [@radarhere]

Only read a maximum of 100 bytes at a time in IMT header #6623 [@radarhere]

Show all frames in ImageShow #6611 [@radarhere]

Allow FLI palette chunk to not be first #6626 [@radarhere]

If first GIF frame has transparency for RGB_ALWAYS loading strategy, use RGBA mode #6592 [@radarhere]

Round box position to integer when pasting embedded color #6517 [@radarhere]

Removed EXIF prefix when saving WebP #6582 [@radarhere]

Pad IM palette to 768 bytes when saving #6579 [@radarhere]

Added DDS BC6H reading #6449 [@ShadelessFox]

Added support for opening WhiteIsZero 16-bit integer TIFF images #6642 [@JayWiz]

Raise an error when allocating translucent color to RGB palette #6654 [@jsbueno]

Moved mode check outside of loops #6650 [@radarhere]

Added reading of TIFF child images #6569 [@radarhere]

Improved ImageOps palette handling #6596 [@PososikTeam]

Defer parsing of palette into colors #6567 [@radarhere]

Apply transparency to P images in ImageTk.PhotoImage #6559 [@radarhere]

Use rounding in ImageOps contain() and pad() #6522 [@bibinhashley]

Fixed GIF remapping to palette with duplicate entries #6548 [@radarhere]

Allow remap_palette() to return an image with less than 256 palette entries #6543 [@radarhere]

Corrected BMP and TGA palette size when saving #6500 [@radarhere]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.3.0 (2022-10-29)

Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

Initialize libtiff buffer when saving #6699 [radarhere]

Inline fname2char to fix memory leak #6329 [nulano]

Fix memory leaks related to text features #6330 [nulano]

Use double quotes for version check on old CPython on Windows #6695 [hugovk]

Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

Fixed set_variation_by_name offset #6445 [radarhere]

Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

Added ExifTags enums #6630 [radarhere]

Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

Added GPS TIFF tag info #6661 [radarhere]

Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

Do not attempt normalization if mode is already normal #6644 [radarhere]

... (truncated)

Commits

d594f4c Update CHANGES.rst [ci skip]

909dc64 9.3.0 version bump

1a51ce7 Merge pull request #6699 from hugovk/security-libtiff_buffer

2444cdd Merge pull request #6700 from hugovk/security-samples_per_pixel-sec

744f455 Added release notes

0846bfa Add to release notes

799a6a0 Fix linting

00b25fd Hide UserWarning in logs

05b175e Tighter test case

13f2c5a Prevent DOS with large SAMPLESPERPIXEL in Tiff IFD

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
[DiT] Question regarding image resolutions on different tasks

First of all, thanks for your general great work on document AI and specifically DiT in this case.

In the paper, it says

Since the image resolution for object detection tasks is much larger than classification, we limit the batch size to 16.

which confuses me. If I understand correctly, when using a pretrained version of DiT, one is ALWAYS limited by the 224x224 image resolution, since this is constrained by the size of the patch embeddings (similar to how e.g. BERT-base simply can't go beyond 512 tokens due to the position embeddings). So regardless of the original size of the image, the input the model gets is always limited to this predefined 224x224.

IF this reasoning is correct, then I cannot comprehend the logic behind resizing random crops of an image as described in the paper:

Specifically, the input image is cropped with probability 0.5 to a random rectangular patch which is then resized again such that the shortest side is at least 480 and at most 800 pixels while the longest at most 1,333.

Any clarification for this would be very much appreciated, thanks in advance!

opened by mrvoh 1

Releases(s2s-ft.v0.3)

s2s-ft.v0.3(Apr 2, 2020)

fix an issue running on CPUs: https://github.com/microsoft/unilm/commit/13641268b59df5cf90d27b451d87ab58b6a07055
Source code(tar.gz)
Source code(zip)
s2s-ft.v0.2(Mar 13, 2020)
Support MiniLM in s2s-ft

Source code(tar.gz)
Source code(zip)
s2s-ft.v0.0(Mar 10, 2020)

seq2seq fine-tuning with unilmv1/unilmv2
Source code(tar.gz)
Source code(zip)

Owner

Microsoft

Open source projects and samples from Microsoft

GitHub

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

59 Dec 28, 2022

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

282 Jan 9, 2023

Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

MidiBERT-Piano Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen Introduction This is the official repository for the paper, MidiBERT-Piano: Large-

137 Dec 15, 2022

Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

LUPerson-NL Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL) The repository is for our CVPR2022 paper Large-Scale

43 Dec 26, 2022

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue. This

290 Dec 29, 2022

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

XL-Sum This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Lang

190 Jan 3, 2023

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

212 Dec 25, 2022

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021) This is the implementation of PSD (ICCV 2021),

12 Dec 12, 2022

PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Contrast to Divide: self-supervised pre-training for learning with noisy labels This is an official implementation of "Contrast to Divide: self-superv

55 Nov 23, 2022

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [pdf] The official repository for Self-Supervised Pre-Training for Transfo

45 Dec 3, 2021

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training [Arxiv] VideoMAE: Masked Autoencoders are Data-Efficient Learne

Multimedia Computing Group, Nanjing University

697 Jan 7, 2023

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

About This repository provides data and code for the paper: Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development (subm

86 Dec 7, 2022

Self-training for Few-shot Transfer Across Extreme Task Differences

Self-training for Few-shot Transfer Across Extreme Task Differences (STARTUP) Introduction This repo contains the official implementation of the follo

33 Oct 31, 2022

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

92 Nov 30, 2022

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations. Each modality’s augmentations are contained within its own sub-library. These sub-libraries include both function-based and class-based transforms, composition operators, and have the option to provide metadata about the transform applied, including its intensity.

4.6k Jan 9, 2023

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation mode

36 Oct 30, 2022

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30 sports-related actions each, for a total of 510 action clips.

25 Jun 20, 2021

Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

432 Dec 16, 2022

An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.

Fast Face Classification (F²C) This is the code of our paper An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicit

33 Jun 27, 2021

UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Related tags

Overview

UniLM AI

News

Release

License

Contact Information

Comments

9.3.0

Changes

9.3.0 (2022-10-29)

Releases(s2s-ft.v0.3)

s2s-ft.v0.3(Apr 2, 2020)

s2s-ft.v0.2(Mar 13, 2020)

s2s-ft.v0.0(Mar 10, 2020)

Owner

Microsoft

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL 2021.

This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data.

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)

PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.

Self-training for Few-shot Transfer Across Extreme Task Differences

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

AugLy is a data augmentations library that currently supports four modalities (audio, image, text & video) and over 100 augmentations

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

A large-scale video dataset for the training and evaluation of 3D human pose estimation models

Open-AI's DALL-E for large scale training in mesh-tensorflow.

An Efficient Training Approach for Very Large Scale Face Recognition or F²C for simplicity.