Repository for the paper "Optimal Subarchitecture Extraction for BERT"

Bort

Companion code for the paper "Optimal Subarchitecture Extraction for BERT."

Bort is an optimal subset of architectural parameters for the BERT architecture, extracted by applying a fully polynomial-time approximation scheme (FPTAS) for neural architecture search. Bort has an effective size (that is, not counting the embedding layer) of 5.5% of the original BERT-large architecture, and 16% of its net size. It can also be pretrained in 288 GPU hours, which is 1.2% of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large. It is also 7.9x faster than BERT-base (20x faster than BERT/RoBERTa-large) on a CPU, and it performs better than other compressed variants of the architecture, as well as some of the non-compressed variants: it obtains a relative performance improvement of between 0.3% and 31% with respect to BERT-large on multiple public natural language understanding (NLU) benchmarks.

Here are the corresponding GLUE scores on the test set:

| Model | Score | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI-m | MNLI-mm | QNLI (v2) | RTE | WNLI | AX |
|-------|-------|------|-------|------|-------|-----|--------|---------|-----------|-----|------|----|
| Bort | 83.6 | 63.9 | 96.2 | 94.1/92.3 | 89.2/88.3 | 66.0/85.9 | 88.1 | 87.8 | 92.3 | 82.7 | 71.2 | 51.9 |
| BERT-Large | 80.5 | 60.5 | 94.9 | 89.3/85.4 | 87.6/86.5 | 72.1/89.3 | 86.7 | 85.9 | 92.7 | 70.1 | 65.1 | 39.6 |

And SuperGLUE scores on the test set:

| Model | Score | BoolQ | CB | COPA | MultiRC | ReCoRD | RTE | WiC | WSC | AX-b | AX-g |
|-------|-------|-------|----|------|---------|--------|-----|-----|-----|------|------|
| Bort | 74.1 | 83.7 | 81.9/86.5 | 89.6 | 83.7/54.1 | 49.8/49.0 | 81.2 | 70.1 | 65.8 | 48.0 | 96.1/61.5 |
| BERT-Large | 69.0 | 77.4 | 75.7/83.6 | 70.6 | 70.0/24.1 | 72.0/71.3 | 71.7 | 69.6 | 64.4 | 23.0 | 97.8/51.7 |

And here are the architectural parameters:

| Model | Parameters (M) | Layers | Attention heads | Hidden size | Intermediate size | Embedding size (M) | Encoder proportion (%) |
|-------|----------------|--------|-----------------|-------------|-------------------|--------------------|------------------------|
| Bort | 56 | 4 | 8 | 1024 | 768 | 39 | 30.3 |
| BERT-Large | 340 | 24 | 16 | 1024 | 4096 | 31.8 | 90.6 |
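
As a quick consistency check, the headline size claims from the overview follow directly from this table; the short sketch below reproduces that arithmetic ("effective" size excludes the embedding layer):

    # Reproduce the size ratios quoted above from the architectural-parameters table.
    bort_total, bort_embedding = 56.0, 39.0    # parameters, in millions
    bert_total, bert_embedding = 340.0, 31.8   # parameters, in millions

    bort_effective = bort_total - bort_embedding   # encoder-only ("effective") size
    bert_effective = bert_total - bert_embedding

    print(f"effective size: {bort_effective / bert_effective:.1%}")  # ~5.5% of BERT-large
    print(f"net size:       {bort_total / bert_total:.1%}")          # ~16% of BERT-large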

Setup:

  1. You need to install the requirements from the requirements.txt file:
pip install -r requirements.txt

This code has been tested with Python 3.6.5+. To save yourself some headaches, we recommend installing Horovod from source after you install MXNet; this is only needed if you are pre-training the architecture. To do so, run the following commands (you will need a C++ compiler that supports the C++11 standard, such as gcc > 4.8):

    pip uninstall horovod
    HOROVOD_CUDA_HOME=/usr/local/cuda-10.1 \
    HOROVOD_WITH_MXNET=1 \
    HOROVOD_GPU_ALLREDUCE=NCCL \
    pip install horovod==0.16.2 --no-cache-dir
  2. You also need to download the pretrained model (bort.params). If you have the AWS CLI, all you need to do is run:
aws s3 cp s3://alexa-saif-bort/bort.params model/
  3. To run the tests, you also need to download the sample text from Gluon and put it in test_data/:
wget https://github.com/dmlc/gluon-nlp/blob/v0.9.x/scripts/bert/sample_text.txt
mv sample_text.txt test_data/
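
Once these steps are done, a quick sanity check along the following lines can confirm that the environment and the downloaded files are in place. This is only a sketch, not part of the repository, and it assumes you run it from the repository root:

    # Optional sanity check for the setup above (not part of the repository).
    import os

    import gluonnlp
    import mxnet as mx

    print("MXNet version:   ", mx.__version__)
    print("GluonNLP version:", gluonnlp.__version__)

    assert os.path.isfile("model/bort.params"), "bort.params missing under model/"
    assert os.path.isfile("test_data/sample_text.txt"), "sample text missing under test_data/"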

Pre-training:

Bort is already pre-trained, but if you want to try out other datasets, you can follow the steps below. Note that this does not run the FPTAS described in the paper; it works for a fixed architecture (Bort).

  1. First, you will need to tokenize the pre-training text:
python create_pretraining_data.py \
            --input_file <input text> \
            --output_dir <output directory> \
            --dataset_name <dataset name> \
            --dupe_factor <duplication factor> \
            --num_outputs <number of output files>

We recommend using --dataset_name openwebtext_ccnews_stories_books_cased for the vocabulary. If your data file is too large, the script will throw out-of-memory errors; we recommend splitting it into smaller chunks and then calling the script on each one, as sketched below.
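
One way to do that splitting is the hypothetical helper below; the chunk size, file names, and the blank-line document separator are assumptions about your corpus format, not part of the repository:

    # Hypothetical helper: split a large pre-training text file into smaller chunks,
    # cutting only on blank lines so that documents (assumed to be separated by
    # blank lines) are not broken across chunks.
    def split_corpus(input_file, lines_per_chunk=1_000_000, prefix="chunk"):
        chunk_id, buffer = 0, []
        with open(input_file, encoding="utf-8") as f:
            for line in f:
                buffer.append(line)
                if len(buffer) >= lines_per_chunk and not line.strip():
                    with open(f"{prefix}_{chunk_id:03d}.txt", "w", encoding="utf-8") as out:
                        out.writelines(buffer)
                    chunk_id, buffer = chunk_id + 1, []
        if buffer:  # write whatever is left over
            with open(f"{prefix}_{chunk_id:03d}.txt", "w", encoding="utf-8") as out:
                out.writelines(buffer)

    split_corpus("my_corpus.txt")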

  2. Then run the pre-training distillation script:
./run_pretraining_distillation.sh <num gpus> <training data> <testing data> [optional teacher checkpoint]

Please see the contents of run_pretraining_distillation.sh for example usages and additional optional configuration. If you have installed Horovod, we highly recommend you use run_pretraining_distillation_hvd.py instead.

Fine-tuning:

  1. To fine-tune Bort, run:
./run_finetune.sh <your task here>

We recommend you play with the hyperparameters in run_finetune.sh. This code supports all the tasks outlined in the paper, but for the RACE dataset you need to download and extract the data yourself; the default location for extraction is ~/.mxnet/datasets/race. The same goes for SuperGLUE's MultiRC, since the Gluon implementation uses the old version of the dataset: download the data and extract it to ~/.mxnet/datasets/superglue_multirc/. A sketch of one way to put the data in place follows.
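
For reference, here is a small, hypothetical helper for unpacking manually downloaded archives into the locations mentioned above (the archive file names are placeholders; obtain RACE and MultiRC from their official sources):

    # Hypothetical helper: unpack a manually downloaded dataset archive into the
    # ~/.mxnet/datasets/<subdir> location expected by the fine-tuning code.
    import os
    import shutil

    def extract_to_mxnet_datasets(archive_path, subdir):
        target = os.path.join(os.path.expanduser("~"), ".mxnet", "datasets", subdir)
        os.makedirs(target, exist_ok=True)
        shutil.unpack_archive(archive_path, target)
        return target

    # Example (placeholder archive names):
    # extract_to_mxnet_datasets("RACE.tar.gz", "race")
    # extract_to_mxnet_datasets("multirc.zip", "superglue_multirc")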

It is normal to get very odd results in the fine-tuning step, since this repository only contains the training part of Agora. However, you can easily implement your own version of that algorithm. We recommend you use the following initial set of hyperparameters (a sketch that enumerates the full grid appears right after them), and follow the requirements described in the papers at the end of this file:

seeds={0,1,2,3,4}
learning_rates={1e-4, 1e-5, 9e-6}
weight_decays={0, 10, 100, 350}
warmup_rates={0.35, 0.40, 0.45, 0.50}
batch_sizes={8, 16}
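
A minimal sketch for enumerating that grid (480 configurations in total) as the starting point of your own Agora-style sweep; how each configuration is launched is left to your implementation:

    # Enumerate the recommended initial hyperparameter grid (5*3*4*4*2 = 480 points).
    from itertools import product

    seeds          = [0, 1, 2, 3, 4]
    learning_rates = [1e-4, 1e-5, 9e-6]
    weight_decays  = [0, 10, 100, 350]
    warmup_rates   = [0.35, 0.40, 0.45, 0.50]
    batch_sizes    = [8, 16]

    grid = [
        dict(seed=s, lr=lr, weight_decay=wd, warmup_ratio=wr, batch_size=bs)
        for s, lr, wd, wr, bs in product(seeds, learning_rates, weight_decays,
                                         warmup_rates, batch_sizes)
    ]
    print(len(grid))  # 480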

Troubleshooting:

Dependency errors

Bort requires a rather unusual environment to run. For this reason, most runtime problems can be fixed by installing the requirements from the requirements.txt file. Also make sure you have reinstalled Horovod as outlined above.

Script failing when downloading the data

This is inherent to the way Bort is fine-tuned, since it expects the data to already exist for some arbitrary implementation of Agora. You can get around this error by downloading the data before running the script, e.g.:

# Pre-download the BoolQ data so the fine-tuning script finds it on disk.
from data.classification import BoolQTask

task = BoolQTask()
# Touching each split once triggers the download.
task.dataset_train()[1]; task.dataset_val()[1]; task.dataset_test()[1]
Out-of-memory errors

While Bort is designed to be efficient in terms of the space it occupies in memory, a very large batch size or sequence length will still cause you to run out of memory. More often than not, reducing the sequence length from 512 to 256 will solve out-of-memory issues. 80% of the time, it works every time.

Slow fine-tuning/pre-training

We strongly recommend using distributed training for both fine-tuning and pre-training. If Horovod misbehaves, remember that it needs to be built after the installation of MXNet (or whichever framework you use).

Low task-specific performance

If you observe near-random task-specific performance, that is to be expected: Bort is a rather small architecture, and the optimizer/scheduler/learning-rate combination is quite aggressive. We highly recommend fine-tuning Bort with an implementation of Agora. More details on how to do that are in the references below, specifically the second paper. Note that we needed to implement "replay" (i.e., re-doing some iterations of Agora) to get it to converge better.

References

If you use Bort or the other algorithms in your work, we'd love to hear about it! Also, please cite the so-called "Bort trilogy" papers:

@article{deWynterApproximation,
    title={An Approximation Algorithm for Optimal Subarchitecture Extraction},
    author={Adrian de Wynter},
    year={2020},
    eprint={2010.08512},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    journal={CoRR},
    volume={abs/2010.08512},
    url={http://arxiv.org/abs/2010.08512}
}
@article{deWynterAlgorithm,
    title={An Algorithm for Learning Smaller Representations of Models With Scarce Data},
    author={Adrian de Wynter},
    year={2020},
    eprint={2010.07990},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    journal={CoRR},
    volume={abs/2010.07990},
    url={http://arxiv.org/abs/2010.07990}
}
@article{deWynterPerryOptimal,
    title={Optimal Subarchitecture Extraction for BERT},
    author={Adrian de Wynter and Daniel J. Perry},
    year={2020},
    eprint={2010.10499},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    journal={CoRR},
    volume={abs/2010.10499},
    url={http://arxiv.org/abs/2010.10499}
}

Lastly, if you use the GLUE/SuperGLUE/RACE tasks, don't forget to give proper attribution to the original authors.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Comments
  • bort pretrain

    i try to use create_pretraining_data.py for bort pretrain

    python create_pretraining_data.py --input_file ./train/train.txt0,./train/train.txt1,./train/train.txt2,./train/train.txt3,./train/train.txt4,./train/train.txt5,./train/train.txt6,./train/train.txt7,./train/train.txt8,./train/train.txt9 --output_dir output --dupe_factor 1

    INFO:root:Namespace(dataset_name='openwebtext_ccnews_stories_books_cased', dupe_factor=1, input_file='./train/train.txt0,./train/train.txt1,./train/train.txt2,./train/train.txt3,./train/train.txt4,./train/train.txt5,./train/train.txt6,./train/train.txt7,./train/train.txt8,./train/train.txt9', masked_lm_prob=0.15, max_predictions_per_seq=80, max_seq_length=512, num_outputs=1, num_workers=8, output_dir='output', random_seed=12345, short_seq_prob=0.1, verbose=False, whole_word_mask=False)
    INFO:root: ./train/train.txt0
    INFO:root: ./train/train.txt1
    INFO:root: ./train/train.txt2
    INFO:root: ./train/train.txt3
    INFO:root: ./train/train.txt4
    INFO:root: ./train/train.txt5
    INFO:root: ./train/train.txt6
    INFO:root: ./train/train.txt7
    INFO:root: ./train/train.txt8
    INFO:root: ./train/train.txt9
    INFO:root:*** Reading from 10 input files ***

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "create_pretraining_data.py", line 304, in create_training_instances
        vocab, tokenizer)))
      File "create_pretraining_data.py", line 385, in create_instances_from_document
        0, len(all_documents) - 2)
      File "/export/sdb/xiongwei/tfmxnet/lib64/python3.6/random.py", line 221, in randint
        return self.randrange(a, b+1)
      File "/export/sdb/xiongwei/tfmxnet/lib64/python3.6/random.py", line 199, in randrange
        raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
    ValueError: empty range for randrange() (0,0, 0)
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "create_pretraining_data.py", line 691, in <module>
        main()
      File "create_pretraining_data.py", line 597, in main
        pool.map(create_training_instances, process_args)
      File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    ValueError: empty range for randrange() (0,0, 0)

    bug 
    opened by nicexw 5
  • Pre-training-Using-Knowledge-Distillation is better than Pre-training-Only for downstream tasks?

    [figure: MLM accuracy curves]

    But this picture shows the same MLM accuracy by the end. What is the downstream-task performance comparison between pre-training with knowledge distillation and pre-training only? I am new to BORT. Thank you very much. @adewynter

    opened by guotong1988 2
  • Huggingface support

    Great work! Really looking forward to trying this out. Would it be possible to add Huggingface compatibility please? That would make it a lot easier to test with my application.

    opened by sbsky 2
  • how to train model on another language?

    Hi! If I want to train Bort on another language, do I need to first pretrain BERT on that language and then extract the sub-model from it? Or can I just train Bort from scratch without a pretrained BERT?

    opened by Archelunch 1
  • Create pretraining data with multiprocessing not Implemented

    Hello @adewynter, I noticed that you are using multiprocessing in the create-pretraining-data script, but the code seems incomplete, as the line below suggests:

    • https://github.com/alexa/bort/blob/05adebf7a51ef03927947a24e08d20cd5609689e/create_pretraining_data.py#L588

    Also, here: what is worker_pool?

    • https://github.com/alexa/bort/blob/05adebf7a51ef03927947a24e08d20cd5609689e/create_pretraining_data.py#L248

    So, is your idea to create a pool inside another pool, i.e., a pool with child pools? Could you please give me more hints?

    Thanks in advance

    opened by 7AM7 1
  • Can't download model again!

    Hi, I can't download the model from https://alexa-saif-bort.s3.amazonaws.com/bort.params; it says "Access Denied". It seems to be the same error as the first issue (now closed).

    opened by killua-zyk 1
  • I couldn't understand the configuration of the model. please can someone clarify?

    For the bort_4_8_768_1024 model: 4 = number of attention heads, 8 = number of layers, 768 = hidden size, 1024 = intermediate size.

    According to the paper:

    Architecture | <D, A, H, I> | Parameters (M)
    Bort | <4, 8, 768, 1024> | 56.1

    I think the hidden size (H) and intermediate size (I) are interchanged in the implementation. Aren't they?

    good first issue 
    opened by preethamgali 1
  • Accuracy during fine-tuning is very low (only 0.68)

    Hi, I have tried to finetune the model with the run_finetune.sh script, but the accuracy is very low.

    Here is the log:

    INFO:root:18:19:05 Namespace(accumulate=None, batch_size=8, dataset='openwebtext_ccnews_stories_books_cased', dev_batch_size=8, dropout=0.1, dtype='float32', early_stop=200, epochs=1, epsilon=1e-06, gpu=1, init='uniform', log_interval=10, lr=5e-05, max_len=512, model_parameters=None, momentum=0.9, multirc_test_location='/home/ec2-user/.mxnet/datasets/superglue_multirc/test.jsonl', no_distributed=False, only_inference=False, output_dir='./output_dir', pretrained_parameters='model/bort.params', prob=0.5, race_dataset_location=None, ramp_up_epochs=1, record_dev_location='/home/ec2-user/.mxnet/datasets/superglue_record/val.jsonl', record_test_location='/home/ec2-user/.mxnet/datasets/superglue_record/test.jsonl', seed=2, task_name='MRPC', training_steps=None, use_scheduler=True, warmup_ratio=0.45, weight_decay=110.0) INFO:root:18:19:05 get_bort_model: bort_4_8_768_1024 INFO:root:18:19:08 loading Bort params from model/bort.params INFO:root:18:19:08 Processing dataset... INFO:root:18:19:11 Now we are doing Bort classification training on gpu(0)! INFO:root:18:19:11 training steps=458 INFO:root:18:19:12 [Epoch 1 Batch 10/465] loss=3.2699, lr=0.0000021845, metrics:f1:0.5835,accuracy:0.4800 INFO:root:18:19:12 [Epoch 1 Batch 20/465] loss=3.5952, lr=0.0000046117, metrics:f1:0.5684,accuracy:0.4774 INFO:root:18:19:13 [Epoch 1 Batch 30/465] loss=3.3002, lr=0.0000070388, metrics:f1:0.6084,accuracy:0.5149 INFO:root:18:19:13 [Epoch 1 Batch 40/465] loss=2.6364, lr=0.0000094660, metrics:f1:0.6187,accuracy:0.5302 INFO:root:18:19:13 [Epoch 1 Batch 50/465] loss=2.9594, lr=0.0000118932, metrics:f1:0.6443,accuracy:0.5494 INFO:root:18:19:14 [Epoch 1 Batch 60/465] loss=2.4208, lr=0.0000143204, metrics:f1:0.6581,accuracy:0.5621 INFO:root:18:19:14 [Epoch 1 Batch 70/465] loss=3.3903, lr=0.0000167476, metrics:f1:0.6549,accuracy:0.5586 INFO:root:18:19:15 [Epoch 1 Batch 80/465] loss=2.5813, lr=0.0000191748, metrics:f1:0.6504,accuracy:0.5606 INFO:root:18:19:15 [Epoch 1 Batch 90/465] loss=2.2408, lr=0.0000216019, metrics:f1:0.6447,accuracy:0.5610 INFO:root:18:19:16 [Epoch 1 Batch 100/465] loss=3.1120, lr=0.0000240291, metrics:f1:0.6551,accuracy:0.5675 INFO:root:18:19:16 [Epoch 1 Batch 110/465] loss=2.4501, lr=0.0000264563, metrics:f1:0.6541,accuracy:0.5647 INFO:root:18:19:16 [Epoch 1 Batch 120/465] loss=2.6082, lr=0.0000288835, metrics:f1:0.6571,accuracy:0.5645 INFO:root:18:19:17 [Epoch 1 Batch 130/465] loss=2.4734, lr=0.0000313107, metrics:f1:0.6672,accuracy:0.5741 INFO:root:18:19:17 [Epoch 1 Batch 140/465] loss=2.2288, lr=0.0000337379, metrics:f1:0.6645,accuracy:0.5740 INFO:root:18:19:18 [Epoch 1 Batch 150/465] loss=1.6799, lr=0.0000361650, metrics:f1:0.6641,accuracy:0.5758 INFO:root:18:19:18 [Epoch 1 Batch 160/465] loss=1.1061, lr=0.0000385922, metrics:f1:0.6703,accuracy:0.5810 INFO:root:18:19:18 [Epoch 1 Batch 170/465] loss=1.4413, lr=0.0000410194, metrics:f1:0.6712,accuracy:0.5835 INFO:root:18:19:19 [Epoch 1 Batch 180/465] loss=1.2923, lr=0.0000434466, metrics:f1:0.6684,accuracy:0.5810 INFO:root:18:19:19 [Epoch 1 Batch 190/465] loss=1.9684, lr=0.0000458738, metrics:f1:0.6627,accuracy:0.5780 INFO:root:18:19:20 [Epoch 1 Batch 200/465] loss=1.6337, lr=0.0000483010, metrics:f1:0.6620,accuracy:0.5772 INFO:root:18:19:20 [Epoch 1 Batch 210/465] loss=1.9206, lr=0.0000494048, metrics:f1:0.6632,accuracy:0.5771 INFO:root:18:19:21 [Epoch 1 Batch 220/465] loss=1.5550, lr=0.0000474206, metrics:f1:0.6655,accuracy:0.5782 INFO:root:18:19:21 [Epoch 1 Batch 230/465] loss=1.5174, lr=0.0000454365, metrics:f1:0.6647,accuracy:0.5760 
INFO:root:18:19:21 [Epoch 1 Batch 240/465] loss=1.6342, lr=0.0000434524, metrics:f1:0.6563,accuracy:0.5698 INFO:root:18:19:22 [Epoch 1 Batch 250/465] loss=1.6304, lr=0.0000414683, metrics:f1:0.6521,accuracy:0.5669 INFO:root:18:19:22 [Epoch 1 Batch 260/465] loss=1.5732, lr=0.0000394841, metrics:f1:0.6523,accuracy:0.5681 INFO:root:18:19:23 [Epoch 1 Batch 270/465] loss=0.9988, lr=0.0000375000, metrics:f1:0.6479,accuracy:0.5661 INFO:root:18:19:23 [Epoch 1 Batch 280/465] loss=1.8495, lr=0.0000355159, metrics:f1:0.6485,accuracy:0.5673 INFO:root:18:19:23 [Epoch 1 Batch 290/465] loss=1.0105, lr=0.0000335317, metrics:f1:0.6523,accuracy:0.5702 INFO:root:18:19:24 [Epoch 1 Batch 300/465] loss=0.8022, lr=0.0000315476, metrics:f1:0.6535,accuracy:0.5708 INFO:root:18:19:24 [Epoch 1 Batch 310/465] loss=0.8974, lr=0.0000295635, metrics:f1:0.6546,accuracy:0.5713 INFO:root:18:19:25 [Epoch 1 Batch 320/465] loss=0.9764, lr=0.0000275794, metrics:f1:0.6527,accuracy:0.5698 INFO:root:18:19:25 [Epoch 1 Batch 330/465] loss=0.8853, lr=0.0000255952, metrics:f1:0.6521,accuracy:0.5692 INFO:root:18:19:25 [Epoch 1 Batch 340/465] loss=0.9318, lr=0.0000236111, metrics:f1:0.6521,accuracy:0.5687 INFO:root:18:19:26 [Epoch 1 Batch 350/465] loss=0.9023, lr=0.0000216270, metrics:f1:0.6548,accuracy:0.5702 INFO:root:18:19:26 [Epoch 1 Batch 360/465] loss=0.8698, lr=0.0000196429, metrics:f1:0.6545,accuracy:0.5697 INFO:root:18:19:27 [Epoch 1 Batch 370/465] loss=0.9013, lr=0.0000176587, metrics:f1:0.6552,accuracy:0.5698 INFO:root:18:19:27 [Epoch 1 Batch 380/465] loss=0.8277, lr=0.0000156746, metrics:f1:0.6550,accuracy:0.5698 INFO:root:18:19:28 [Epoch 1 Batch 390/465] loss=0.7523, lr=0.0000136905, metrics:f1:0.6591,accuracy:0.5732 INFO:root:18:19:28 [Epoch 1 Batch 400/465] loss=0.8378, lr=0.0000117063, metrics:f1:0.6621,accuracy:0.5759 INFO:root:18:19:28 [Epoch 1 Batch 410/465] loss=0.8365, lr=0.0000097222, metrics:f1:0.6633,accuracy:0.5783 INFO:root:18:19:29 [Epoch 1 Batch 420/465] loss=0.8266, lr=0.0000077381, metrics:f1:0.6610,accuracy:0.5779 INFO:root:18:19:29 [Epoch 1 Batch 430/465] loss=0.7012, lr=0.0000057540, metrics:f1:0.6627,accuracy:0.5794 INFO:root:18:19:30 [Epoch 1 Batch 440/465] loss=0.8099, lr=0.0000037698, metrics:f1:0.6640,accuracy:0.5798 INFO:root:18:19:30 [Epoch 1 Batch 450/465] loss=0.8413, lr=0.0000017857, metrics:f1:0.6645,accuracy:0.5798 INFO:root:18:19:30 [Epoch 1 Batch 460/465] loss=0.8424, lr=0.0000001000, metrics:f1:0.6651,accuracy:0.5799 INFO:root:18:19:31 Now we are doing evaluation on dev with gpu(0). INFO:root:18:19:31 [Batch 10/51] loss=0.6030, metrics:f1:0.8086,accuracy:0.7000 INFO:root:18:19:31 [Batch 20/51] loss=0.6176, metrics:f1:0.8028,accuracy:0.6875 INFO:root:18:19:31 [Batch 30/51] loss=0.5927, metrics:f1:0.8101,accuracy:0.6958 INFO:root:18:19:31 [Batch 40/51] loss=0.6228, metrics:f1:0.8073,accuracy:0.6906 INFO:root:18:19:31 [Batch 50/51] loss=0.6260, metrics:f1:0.8025,accuracy:0.6850 INFO:root:18:19:31 epoch: 0; validation metrics:f1:0.8019,accuracy:0.6838 INFO:root:18:19:31 Time cost=0.61s, throughput=670.77 samples/s INFO:root:18:19:32 params saved in: ./output_dir/model_bort_MRPC_0.params INFO:root:18:19:32 Time cost=20.66s INFO:root:18:19:32 Best model at epoch 0. Validation metrics:f1:0.8019,accuracy:0.6838,average:0.7429 INFO:root:18:19:32 Now we are doing testing on test with gpu(0). INFO:root:18:19:34 Time cost=2.26s, throughput=764.32 samples/s

    opened by waugustus 1
  • Bump numpy from 1.16.2 to 1.22.0

    Bumps numpy from 1.16.2 to 1.22.0.

    dependencies 
    opened by dependabot[bot] 0
  • Mask-Filling with pretrained BORT

    Hello, I am trying to get "mask-filling" to work correctly with BORT and don't seem to get good results. In short, I load the pretrained BORT model including the decoder (use_decoder=True) and then pass as input something along the lines of "The weather is <mask> today." I would expect the pretrained BORT model with the decoder head to be able to predict a sensible word in this case, but I only get weird results (such as predicting the token...) :-/

    I made a short notebook that showcases exactly how I use BORT with gluonnlp and mxnet and am not able to get any good results for the mask-filling problem.

    Here is the notebook: https://colab.research.google.com/drive/17qNu6g1s2KJEwuRl1s5c3ipk2-99dZfm?usp=sharing that:

    a) loads a tokenizer + vocab
    b) loads the pretrained model + decoder
    c) runs a forward pass through the encoder and the decoder lm head
    d) shows that the result is not as good as expected

    @adewynter I would be very grateful if you could take a look at possible errors I've made in the notebook that could explain the strange behavior.

    Thank you very much!

    cc @stefan-it

    opened by patrickvonplaten 3