Training BERT with Compute/Time (Academic) Budget

Overview

This repository contains scripts for pre-training and finetuning BERT-like models with limited time and compute budget. The code is based on the work presented in the following paper:

Peter Izsak, Moshe Berchansky, Omer Levy, How to Train BERT with an Academic Budget (EMNLP 2021).

Installation

The pre-training and finetuning scripts are based on the DeepSpeed and HuggingFace Transformers libraries.

Preliminary Installation

We recommend creating a virtual environment with Python 3.6+, PyTorch, and apex.
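For example, a minimal environment setup might look like the following sketch (the environment name and Python version are illustrative only, and apex may additionally require its CUDA-extension build; see the apex README):

conda create -n budget-bert python=3.8
conda activate budget-bert
pip install torch                          # choose the build matching your CUDA version
git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir ./ && cd ..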

Installation Requirements

pip install -r requirements.txt

We suggest running DeepSpeed's ds_report utility to verify that DeepSpeed components can be compiled (JIT).
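For example, after installing the requirements, the report can be generated directly from the command line:

ds_report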

Dataset

The dataset directory includes scripts to pre-process the datasets we used in our experiments (Wikipedia, Bookcorpus). See dedicated README for full details.

Pretraining

Pretraining script: run_pretraining.py

For all possible pretraining arguments see: python run_pretraining.py -h

We highly suggest reviewing the various training features we provide within the library.

Example for training with the best configuration presented in our paper (24-layers/1024H/time-based learning rate schedule/fp16):
deepspeed run_pretraining.py \
  --model_type bert-mlm --tokenizer_name bert-large-uncased \
  --hidden_act gelu \
  --hidden_size 1024 \
  --num_hidden_layers 24 \
  --num_attention_heads 16 \
  --intermediate_size 4096 \
  --hidden_dropout_prob 0.1 \
  --attention_probs_dropout_prob 0.1 \
  --encoder_ln_mode pre-ln \
  --lr 1e-3 \
  --train_batch_size 4096 \
  --train_micro_batch_size_per_gpu 32 \
  --lr_schedule time \
  --curve linear \
  --warmup_proportion 0.06 \
  --gradient_clipping 0.0 \
  --optimizer_type adamw \
  --weight_decay 0.01 \
  --adam_beta1 0.9 \
  --adam_beta2 0.98 \
  --adam_eps 1e-6 \
  --total_training_time 24.0 \
  --early_exit_time_marker 24.0 \
  --dataset_path <dataset path> \
  --output_dir /tmp/training-out \
  --print_steps 100 \
  --num_epochs_between_checkpoints 10000 \
  --job_name pretraining_experiment \
  --project_name budget-bert-pretraining \
  --validation_epochs 3 \
  --validation_epochs_begin 1 \
  --validation_epochs_end 1 \
  --validation_begin_proportion 0.05 \
  --validation_end_proportion 0.01 \
  --validation_micro_batch 16 \
  --deepspeed \
  --data_loader_type dist \
  --do_validation \
  --use_early_stopping \
  --early_stop_time 180 \
  --early_stop_eval_loss 6 \
  --seed 42 \
  --fp16

Time-based Training

Pretraining can be limited to a time budget by defining --total_training_time=24.0 (24 hours, for example).

Time-based Learning Rate Scheduling

The learning rate can be scheduled to change according to the configured total training time. The argument --total_training_time controls the total time assigned for the trainer to run, and must be specified in order to use time-based learning rate scheduling.

[Figure: time-based learning rate schedule]

To select time-based learning rate scheduling, set --lr_schedule time and choose a shape for the annealing curve (for example --curve=linear, as seen in the figure). The warmup phase of the learning rate is defined by --warmup_proportion, which specifies the proportion of the time budget (as defined by --total_training_time) used to reach the peak learning rate. For example, in a 24-hour training session, warmup_proportion=0.1 would account for 10% of 24 hours, that is, 2.4 hours (144 minutes), to reach the peak learning rate. The learning rate is then scheduled to reach 0 at the end of the time budget. See the provided figure for an example.
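For illustration, these are the schedule-related flags from the full pretraining command above; together they configure a linear time-based schedule with a 6% warmup over a 24-hour budget:

--lr_schedule time --curve linear --warmup_proportion 0.06 --total_training_time 24.0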

Checkpoints and Finetune Checkpoints

There are 2 types of checkpoints that can be enabled:

  • Training checkpoint - saves model weights, optimizer state and training args. Defined by --num_epochs_between_checkpoints.
  • Finetuning checkpoint - saves model weights and configuration to be used for finetuning later on. Defined by --finetune_time_markers.

finetune_time_markers can be assigned multiple points in the training time budget by providing a list of markers expressed as fractions of overall training progress. For example, --finetune_time_markers=0.5 saves a finetuning checkpoint when 50% of the training time budget is reached. For multiple finetuning checkpoints, separate the markers with commas and no spaces, e.g. 0.5,0.6,0.9.
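For example, to save finetuning checkpoints at 50%, 60%, and 90% of the time budget, add the following flag to the pretraining command:

--finetune_time_markers=0.5,0.6,0.9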

Validation Scheduling

Enable validation during pre-training with --do_validation.

Control the number of epochs between validation runs with --validation_epochs.

To run validation more frequently at the beginning and end of training (more often than validation_epochs), use validation_begin_proportion and validation_end_proportion to specify the proportion of the time budget covered by each phase, and validation_epochs_begin and validation_epochs_end to set the validation frequency within those phases.
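For example, the validation flags from the full pretraining command above validate every 3 epochs, switching to every epoch during the first 5% and the last 1% of the time budget:

--do_validation --validation_epochs 3 --validation_epochs_begin 1 --validation_epochs_end 1 \
--validation_begin_proportion 0.05 --validation_end_proportion 0.01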

Mixed Precision Training

Mixed precision is supported by adding --fp16. Use --fp16_backend=ds to use DeepSpeed's mixed precision backend and --fp16_backend=apex for apex (--fp16_opt controls the optimization level).
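For example, either of the following selects a mixed precision backend:

--fp16 --fp16_backend=ds
--fp16 --fp16_backend=apex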

Finetuning

Use run_glue.py to run finetuning for a saved checkpoint on GLUE tasks.

The finetuning script is identical to the one provided by HuggingFace, with the addition of our model.

For all possible finetuning arguments see: python run_glue.py -h

Example for finetuning on MRPC:
python run_glue.py \
  --model_name_or_path <path to model> \
  --task_name MRPC \
  --max_seq_length 128 \
  --output_dir /tmp/finetuning \
  --overwrite_output_dir \
  --do_train --do_eval \
  --evaluation_strategy steps \
  --per_device_train_batch_size 32 --gradient_accumulation_steps 1 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --weight_decay 0.01 \
  --eval_steps 50 \
  --max_grad_norm 1.0 \
  --num_train_epochs 5 \
  --lr_scheduler_type polynomial \
  --warmup_steps 50

Generating Pretraining Commands

We provide a useful script for generating multiple (or single) pretraining commands by using python generate_training_commands.py.

python generate_training_commands.py -h

  --param_file PARAM_FILE   Hyperparameter and configuration yaml
  --job_name JOB_NAME       job name
  --init_cmd INIT_CMD       initialization command (deepspeed or python directly)

A parameter yaml must be defined with 2 main keys: hyperparameters, where each argument is given a list of possible values, and default_parameters, which holds default values. Each generated command is one possible combination of the arguments specified in the hyperparameters section.

Example:

hyperparameters:
  param1: [val1, val2]
  param2: [val1, val2]

default_parameters:
  param3: 0.0

will result in:

deepspeed run_pretraining.py --param1=val1 --param2=val1 --param3=0.0
deepspeed run_pretraining.py --param1=val1 --param2=val2 --param3=0.0
deepspeed run_pretraining.py --param1=val2 --param2=val1 --param3=0.0
deepspeed run_pretraining.py --param1=val2 --param2=val2 --param3=0.0
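For example, assuming the yaml above is saved as params.yaml (a hypothetical filename), the commands can be generated with:

python generate_training_commands.py --param_file params.yaml --job_name pretraining_experiment --init_cmd deepspeed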

Citation

If you find the paper or the code useful, please cite:

@article{izsak2021,
  author={Izsak, Peter and Berchansky, Moshe and Levy, Omer},
  title={How to Train BERT with an Academic Budget},
  journal={arXiv preprint arXiv:2104.07705},
  url={https://arxiv.org/abs/2104.07705},
  year={2021}
}
Comments
  • GLUE results not reproducible

    Hello,

    I understand the GLUE results mentioned in the paper are for the test set, but we are not able to reproduce them. Our pretrained model had a loss of 1.72, and after sweeping through the hyperparameters mentioned in Table 7 of the paper, the best score we got on CoLA is 43% (on the validation set), whereas Table 4 reports a test result of 57.1%. We did the finetuning on 1 GPU.

    Are we missing something?

    opened by lumliolum 11
  • Unable to train a roberta model?

    Hi everyone, thanks for publishing your code :) I've been trying to run your code on some data I had (just to see how it goes) and it seems that something is stalling right after the training has been initialized.

    you can find the log here: https://pastebin.com/Z2CrZA9C

    When I Ctrl+C the script, it seems to be stalled at a subprocess creation point in deepspeed (whether I wait 10 seconds or 30 minutes):

    ^CKilling subprocess 3401541
    Killing subprocess 3401542
    Main process received SIGINT, exiting
    Traceback (most recent call last):
      File "/home/ROCQ/alpage/seddah/src/miniconda3/envs/budgetBERT/bin/deepspeed", line 6, in <module>
        main()
      File "/home/ROCQ/alpage/seddah/src/miniconda3/envs/budgetBERT/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 362, in main
        result.wait()
      File "/home/ROCQ/alpage/seddah/src/miniconda3/envs/budgetBERT/lib/python3.9/subprocess.py", line 1189, in wait
        return self._wait(timeout=timeout)
      File "/home/ROCQ/alpage/seddah/src/miniconda3/envs/budgetBERT/lib/python3.9/subprocess.py", line 1917, in _wait
        (pid, sts) = self._try_wait(0)
      File "/home/ROCQ/alpage/seddah/src/miniconda3/envs/budgetBERT/lib/python3.9/subprocess.py", line 1875, in _try_wait
        (pid, sts) = os.waitpid(self.pid, wait_flags)
    KeyboardInterrupt

    is there any chance you'd have an idea?

    Best, Djamé ps: say hi to Omer :)

    opened by dseddah 10
  • Question about validation and testing

    Hello,

    I wonder if there is a train-validation-test split or a train-test split only. I'm asking because during dataset generation train-* and test-* files are generated, but no valid-* files. Then later, during pre-training, distributed_pretraining_dataset.py searches for valid-* files rather than test-* files (line 164). There are no such files, and therefore training crashes...

    Thanks a lot! David

    opened by peerdavid 6
  • How to combine wiki and bookcorpus into one file?

    I found that in the dataset description, we can use process_data.py to pre-process the wikipedia/bookcorpus datasets into a single text file.

    What if I want to process these two datasets at the same time? At which step should I combine them? Thanks!

    opened by shizhediao 4
  • bert_model not used

    Hi, I noticed some inconsistencies in the create_pretraining_data script and wanted to make sure they don't cause any further issues:

    1. The script data/create_pretraining_data.py has an unused argument bert_model. Should it be used for anything?
    2. And BertTokenizer is instantiated with max_len 512. Shouldn't it rather be with max_seq_length?

    Best!

    opened by senisioi 2
  • Clarification on "sparse token prediction" or "sparse output prediction"

    First off, thanks for the great paper and making your code publicly available, it's been really valuable.

    I was hoping to clarify what you meant by "sparse token prediction" (in Section 3.1, under Software) and "sparse output prediction" (in Table 2). For both instances, you cite the original RoBERTa paper; however, I can't find any mention of sparsity in their paper.

    In your model args, you define sparse mask prediction as predicting only masked tokens: https://github.com/IntelLabs/academic-budget-bert/blob/ea000838156e3be251699ad6a3c8b1339c76e987/pretraining/args/model_args.py#L105

    Is this what is meant in the paper? If so, what is the time saved reported in Table 2 relative to? If not, what do you mean by sparse token/output prediction?

    Thank you!

    opened by mirandrom 2
  • Which kind of optimization you use from DeepSpeed

    Hi,

    First and foremost. Thank you for your excellent work.

    Could you please elaborate more on what DeepSpeed functionality you used? Zero-1 or Zero-2 optimization, for example?

    opened by jzhang38 1
  • Question: Easiest way to load deepspeed checkpoints as standard PyTorch models?

    Hello, Thank you for this codebase.

    When I try to load a finetune checkpoint with BertLMHeadModel.from_pretrained(model_path), it fails to load the weights in the file. I get warnings like:

    Some weights of the model checkpoint at training_out/random_init_large/run_1/pretraining_experiment-/epoch1000000_step63491/ were not used when initializing BertLMHeadModel: ['bert.encoder.FinalLayerNorm.weight', 'bert.encoder.FinalLayerNorm.bias', 'bert.encoder.layer.0.PreAttentionLayerNorm.weight', 'bert.encoder.layer.0.PreAttentionLayerNorm.bias', 'bert.encoder.layer.0.PostAttentionLayerNorm.weight', 'bert.encoder.layer.0.PostAttentionLayerNorm.bias', 'bert.encoder.layer.0.intermediate.dense_act.dense.weight', 'bert.encoder.layer.0.intermediate.dense_act.dense.bias', 'bert.encoder.layer.1.PreAttentionLayerNorm.weight', 'bert.encoder.layer.1.PreAttentionLayerNorm.bias', 'bert.encoder.layer.1.PostAttentionLayerNorm.weight', 'bert.encoder.layer.1.PostAttentionLayerNorm.bias', 'bert.encoder.layer.1.intermediate.dense_act.dense.weight', 'bert.encoder.layer.1.intermediate.dense_act.dense.bias', 'bert.encoder.layer.2.PreAttentionLayerNorm.weight', 'bert.encoder.layer.2.PreAttentionLayerNorm.bias', 'bert.encoder.layer.2.PostAttentionLayerNorm.weight', 'bert.encoder.layer.2.PostAttentionLayerNorm.bias', 'bert.encoder.layer.2.intermediate.dense_act.dense.weight', 'bert.encoder.layer.2.intermediate.dense_act.dense.bias', 'bert.encoder.layer.3.PreAttentionLayerNorm.weight', 'bert.encoder.layer.3.PreAttentionLayerNorm.bias', 'bert.encoder.layer.3.PostAttentionLayerNorm.weight', 'bert.encoder.layer.3.PostAttentionLayerNorm.bias', 'bert.encoder.layer.3.intermediate.dense_act.dense.weight', 'bert.encoder.layer.3.intermediate.dense_act.dense.bias', 'bert.encoder.layer.4.PreAttentionLayerNorm.weight', 'bert.encoder.layer.4.PreAttentionLayerNorm.bias', 'bert.encoder.layer.4.PostAttentionLayerNorm.weight', 'bert.encoder.layer.4.PostAttentionLayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense_act.dense.weight', 'bert.encoder.layer.4.intermediate.dense_act.dense.bias', 'bert.encoder.layer.5.PreAttentionLayerNorm.weight', 'bert.encoder.layer.5.PreAttentionLayerNorm.bias', 'bert.encoder.layer.5.PostAttentionLayerNorm.weight', 'bert.encoder.layer.5.PostAttentionLayerNorm.bias', 'bert.encoder.layer.5.intermediate.dense_act.dense.weight', 'bert.encoder.layer.5.intermediate.dense_act.dense.bias', 'bert.encoder.layer.6.PreAttentionLayerNorm.weight', 'bert.encoder.layer.6.PreAttentionLayerNorm.bias', 'bert.encoder.layer.6.PostAttentionLayerNorm.weight', 'bert.encoder.layer.6.PostAttentionLayerNorm.bias', 'bert.encoder.layer.6.intermediate.dense_act.dense.weight', 'bert.encoder.layer.6.intermediate.dense_act.dense.bias', 'bert.encoder.layer.7.PreAttentionLayerNorm.weight', 'bert.encoder.layer.7.PreAttentionLayerNorm.bias', 'bert.encoder.layer.7.PostAttentionLayerNorm.weight', 'bert.encoder.layer.7.PostAttentionLayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense_act.dense.weight', 'bert.encoder.layer.7.intermediate.dense_act.dense.bias', 'bert.encoder.layer.8.PreAttentionLayerNorm.weight', 'bert.encoder.layer.8.PreAttentionLayerNorm.bias', 'bert.encoder.layer.8.PostAttentionLayerNorm.weight', 'bert.encoder.layer.8.PostAttentionLayerNorm.bias', 'bert.encoder.layer.8.intermediate.dense_act.dense.weight', 'bert.encoder.layer.8.intermediate.dense_act.dense.bias', 'bert.encoder.layer.9.PreAttentionLayerNorm.weight', 'bert.encoder.layer.9.PreAttentionLayerNorm.bias', 'bert.encoder.layer.9.PostAttentionLayerNorm.weight', 'bert.encoder.layer.9.PostAttentionLayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense_act.dense.weight', 'bert.encoder.layer.9.intermediate.dense_act.dense.bias', 
'bert.encoder.layer.10.PreAttentionLayerNorm.weight', 'bert.encoder.layer.10.PreAttentionLayerNorm.bias', 'bert.encoder.layer.10.PostAttentionLayerNorm.weight', 'bert.encoder.layer.10.PostAttentionLayerNorm.bias', 'bert.encoder.layer.10.intermediate.dense_act.dense.weight', 'bert.encoder.layer.10.intermediate.dense_act.dense.bias', 'bert.encoder.layer.11.PreAttentionLayerNorm.weight', 'bert.encoder.layer.11.PreAttentionLayerNorm.bias', 'bert.encoder.layer.11.PostAttentionLayerNorm.weight', 'bert.encoder.layer.11.PostAttentionLayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense_act.dense.weight', 'bert.encoder.layer.11.intermediate.dense_act.dense.bias', 'cls.predictions.transform.dense_act.dense.weight', 'cls.predictions.transform.dense_act.dense.bias'] - This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

    I'm not familiar with deepspeed and would prefer not to use it in my downstream analysis. Is there a straightforward way to load the finetune checkpoint as a standard PyTorch model?

    Thank you for your help!

    opened by QuintinPope 1
  • Which versions for pre-training?

    First of all: thanks for your work! I am trying to run the pre-training script, but I keep having compatibility issues while installing dependencies. I would be interested to know which of these you used:

    1. Nvidia driver version
    2. Pytorch version + its CUDA version
    3. CUDA version
    4. CuDNN version
    5. Python version
    6. OS
    opened by marcelbra 1
  • Fixed run_glue script

    Fixes #10, the run_glue script. The problem was not that the functions were not imported from the local dataset directory, as assumed in #9.

    The problem was that the datasets package was not listed in the requirements.txt

    I was able to run the script using this setup.

    opened by Rotendahl 1
  • Fix: Finetune checkpoints are not created

    Finetune checkpoints are not created because the code calls save_weights_ckpt, while the method in BasePretrainModel is named save_weights. This PR fixes the issue.

    opened by peerdavid 1
  • The file produced by process_data.py is empty

    Thanks for your awesome work and detailed README! However, when I perform preprocessing with process_data.py, the output directory and file wiki_one_article_per_line.txt is empty. I think the input file of process_data.py is in the right format, like what is mentioned in wikiextractor:

    <doc id="" url="" title="">
        ...
    </doc>

    I'm looking forward to your early reply :)

    opened by Richar-Du 0
  • the eval_acc on RTE dataset is only 55%

    Hello, thank you for your code. I tried to run your code with the following command:

    aim=pretraining_experiment-bert-mlm--23000
    deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 64000 run_pretraining.py \
      --model_type bert-mlm --tokenizer_name bert-base-uncased \
      --hidden_act gelu \
      --hidden_size 1024 \
      --num_hidden_layers 24 \
      --num_attention_heads 16 \
      --intermediate_size 4096 \
      --hidden_dropout_prob 0.1 \
      --attention_probs_dropout_prob 0.1 \
      --encoder_ln_mode pre-ln \
      --lr 1e-3 \
      --train_batch_size 4096 \
      --train_micro_batch_size_per_gpu 128 \
      --lr_schedule step \
      --curve linear \
      --warmup_proportion 0.06 \
      --gradient_clipping 0.0 \
      --optimizer_type adamw \
      --weight_decay 0.01 \
      --adam_beta1 0.9 \
      --adam_beta2 0.98 \
      --adam_eps 1e-6 \
      --total_training_time 24.0 \
      --early_exit_time_marker 24.0 \
      --dataset_path path_to_dataset \
      --output_dir path_to_output \
      --print_steps 100 \
      --num_epochs_between_checkpoints 10000 \
      --job_name ${aim} \
      --project_name budget-bert-pretraining \
      --validation_epochs 3 \
      --validation_epochs_begin 1 \
      --validation_epochs_end 1 \
      --validation_begin_proportion 0.05 \
      --validation_end_proportion 0.01 \
      --validation_micro_batch 16 \
      --deepspeed \
      --data_loader_type dist \
      --do_validation \
      --use_early_stopping \
      --early_stop_time 180 \
      --early_stop_eval_loss 6 \
      --seed 42 \
      --fp16 \
      --max_steps 23000 \
      --finetune_checkpoint_at_end

    I did not change your code, but the eval_acc on RTE is only 55%, which is significantly lower than the BERT baseline (~65%). Could you give some advice?

    opened by leoozy 1
  • The training process will get stuck after training for one epoch

    Hi @peteriz, it seems there is an issue when deleting the line global_rank = 0. With different workers reading different shards, the total number of iterations per worker in an epoch differs, so at the end of an epoch there is a synchronization issue and training gets stuck. With global_rank = 0 kept, the issue disappears, since the torch data sampler gives each worker the same amount of data. But this has the problem @sangmichaelxie described: it reads only every 8th file.

    Originally posted by @Xinpeng-Wang in https://github.com/IntelLabs/academic-budget-bert/issues/22#issuecomment-1173159490

    As @Xinpeng-Wang said, after you fix the bug by deleting global_rank = 0, the code gets stuck after one epoch. Could you please help me solve the problem?

    opened by leoozy 10
  • Finetuning commands for other glue tasks

    Hi, can you share the finetuning commands you used for the other GLUE tasks? Did you use the same warmup, hyperparameters, etc. as in the example MRPC command you shared?

    opened by raghavlite 1
  • What is the size of the processed data?

    Hello, I processed the Wikipedia and Bookcorpus datasets using your scripts. The total size of the processed Wikipedia dataset is around 106G (~2650 hdf5 files). Could you please tell me whether this is correct?

    opened by leoozy 1
  • Distributed pretraining dataset question

    https://github.com/IntelLabs/academic-budget-bert/blob/ea000838156e3be251699ad6a3c8b1339c76e987/pretraining/dataset/distributed_pretraining_dataset.py#L280

    In the above line, the global_rank is set to 0 for all workers, meaning that the function will return the same file_index for all the workers. If world_size = 8, then it seems like this code is reading every 8th file and skipping the files in between. Can you explain why this is done? Thanks.

    opened by sangmichaelxie 3