

BMInf


BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs). It has the following features:

  • Hardware Friendly. At its minimum requirement, BMInf can run models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU, and better GPUs yield better performance. Even when GPU memory is sufficient for large-model inference (such as on a V100 or A100), BMInf still delivers a significant performance improvement over the existing PyTorch implementation. A memory-capping sketch follows this list.
  • Open. The model parameters are openly available. Users can run large models locally on their own machines without applying for or calling an online API.
  • Comprehensive Ability. BMInf supports the generative model CPM1 [2], the general language model CPM2.1 [1], and the dialogue model EVA [3]. Together, these models cover text completion, text generation, and dialogue generation.
  • Upgraded Model. The newly upgraded model CPM2.1, based on CPM2 [1], is supported. Thanks to continual learning, the text generation ability of CPM2.1 is greatly improved over CPM2.
  • Convenient Deployment. With BMInf, developing interesting downstream applications is fast and convenient.
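
For cards with limited memory, the model constructors appear to accept device and memory_limit arguments; these parameter names are visible in the EVA2 traceback in the Issues section below, so treating them as a general constructor option is an assumption. A minimal sketch:

import bminf

# Assumed keyword: `memory_limit` appears in the EVA2 __init__ signature in an
# issue traceback below; its availability on CPM2 and its exact semantics are
# not confirmed here. This caps GPU memory usage at roughly 6 GB (in bytes).
cpm2 = bminf.models.CPM2(memory_limit=6 << 30)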

Demo

(demo animation)

Documentation

Our documentation provides more information about the package.

Install

  • From pip: pip install bminf

  • From source code: download the package and run python setup.py install

  • From docker: docker run -it --gpus 1 -v $HOME/.cache/bigmodels:/root/.cache/bigmodels --rm openbmb/bminf python3 examples/fill_blank.py
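
Whichever route you choose, a quick smoke test confirms that the package is importable (model parameters are fetched on first use and cached, as the $HOME/.cache/bigmodels mount in the Docker command suggests):

python3 -c "import bminf; print(bminf.models)"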

Here we list the minimum and recommended configurations for running BMInf.

         Minimum Configuration         Recommended Configuration
Memory   16GB                          24GB
GPU      NVIDIA GeForce GTX 1060 6GB   NVIDIA Tesla V100 16GB
PCI-E    PCI-E 3.0 x16                 PCI-E 3.0 x16

Quick Start

Here we provide a simple script for using BMInf.

First, import a model from the model base (e.g., CPM1, CPM2, EVA).

import bminf
cpm2 = bminf.models.CPM2()

Then define the text, using the <span> token to mark each blank to fill in.

text = "北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用制度,即推出淡季日、平季日、旺季日和特定日门票。价格为418元,价格为528元,价格为638元,价格为元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。"

Use the fill_blank function to obtain the results, then replace each <span> token with its result.

", "\033[0;32m" + value + "\033[0m", 1) print(text) ">
# Sample a completion for each <span> blank in the text.
for result in cpm2.fill_blank(text,
    top_p=1.0,
    top_n=10,
    temperature=0.9,
    frequency_penalty=0,
    presence_penalty=0
):
    value = result["text"]
    # Replace the first remaining <span> with the generated value (printed in green).
    text = text.replace("<span>", "\033[0;32m" + value + "\033[0m", 1)
print(text)

Finally, you can get the predicted text. For more examples, go to the examples folder.
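
Beyond blank filling, CPM1 is a purely generative model. Below is a hedged sketch of free-form generation; the generate method and its keyword arguments are assumptions patterned on fill_blank above, so check examples/generate.py in the repository for the actual interface.

import bminf

cpm1 = bminf.models.CPM1()

# Hypothetical API: `generate` and these keyword arguments are assumed by
# analogy with fill_blank(); see examples/generate.py for the real call.
prompt = "天空是蔚蓝色,窗外有"
completion = cpm1.generate(prompt,
    max_tokens=32,
    top_n=10,
    top_p=1.0,
    temperature=0.9
)
print(prompt + completion)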

Supported Models

BMInf currently supports these models:

  • CPM2.1. CPM2.1 is an upgraded version of CPM2 [1], which is a general Chinese pre-trained language model with 11 billion parameters. Based on CPM2, CPM2.1 introduces a generative pre-training task and was trained via the continual learning paradigm. In experiments, CPM2.1 has a better generation ability than CPM2.

  • CPM1. CPM1 [2] is a generative Chinese pre-trained language model with 2.6 billion parameters. The architecture of CPM1 is similar to GPT [4] and it can be used in various NLP tasks such as conversation, essay generation, cloze test, and language understanding.

  • EVA. EVA [3] is a Chinese pre-trained dialogue model with 2.8 billion parameters. EVA performs well on many dialogue tasks, especially in multi-turn human-bot conversations; a usage sketch follows this list.
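
Below is a dialogue-style sketch for EVA. The EVA2 class and its dialogue method are visible in an issue traceback later on this page, but the context format and return type shown here are assumptions.

import bminf

eva = bminf.models.EVA2()

# Hypothetical calling convention: EVA2 and dialogue() appear in the issue
# traceback below; the list-of-turns input and string output are assumptions.
context = ["你好", "你好,很高兴认识你"]
reply = eva.dialogue(context)
print(reply)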

Besides these models, we are now working on adding more PLMs, especially large-scale PLMs. We welcome contributors to add their models to this project by opening an issue.

Performance

Here we report the encoder and decoder speeds of CPM2 measured on different platforms. You can also run benchmark/cpm2/encoder.py and benchmark/cpm2/decoder.py to test the speed on your own machine, or use the rough timing sketch after the table below.

Implementation   GPU                          Encoder Speed (tokens/s)   Decoder Speed (tokens/s)
BMInf            NVIDIA GeForce GTX 1060      533                        1.6
BMInf            NVIDIA GeForce GTX 1080Ti    1200                       12
BMInf            NVIDIA GeForce RTX 2080Ti    2275                       19
BMInf            NVIDIA Tesla V100            2966                       20
BMInf            NVIDIA Tesla A100            4365                       26
PyTorch          NVIDIA Tesla V100            -                          3
PyTorch          NVIDIA Tesla A100            -                          7
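
If you just want a rough number on your own machine without the benchmark scripts, you can time the Quick Start call directly. This crude sketch measures characters per second over a single run, so it is noisy and not directly comparable to the per-token figures above:

import time
import bminf

cpm2 = bminf.models.CPM2()

start = time.perf_counter()
results = list(cpm2.fill_blank("今天天气<span>",
    top_p=1.0,
    top_n=10,
    temperature=0.9,
    frequency_penalty=0,
    presence_penalty=0
))
elapsed = time.perf_counter() - start

# Crude throughput proxy: generated characters per second of wall-clock time.
n_chars = sum(len(r["text"]) for r in results)
print("%.1f chars/s over %.2fs" % (n_chars / elapsed, elapsed))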

Contributing

Below is the QR code for our WeChat user community. We welcome contributions that follow our contributing guidelines.

(WeChat community QR code)

License

The package is released under the Apache 2.0 License.

References

  1. CPM-2: Large-scale Cost-efficient Pre-trained Language Models. Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun.
  2. CPM: A Large-scale Generative Chinese Pre-trained Language Model. Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
  3. EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training. Hao Zhou, Pei Ke, Zheng Zhang, Yuxian Gu, Yinhe Zheng, Chujie Zheng, Yida Wang, Chen Henry Wu, Hao Sun, Xiaocong Yang, Bosi Wen, Xiaoyan Zhu, Minlie Huang, Jie Tang.
  4. Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever.

Issues
  • cublas error: CUBLAS_STATUS_NOT_SUPPORTED


    When running examples/generate.py, an error is raised; the upper frame of the call stack is line 249 of functions/gemm.py. https://github.com/OpenBMB/BMInf/blob/d40c6f5d5678e1cba771048ecd0923bceae176e2/bminf/functions/gemm.py#L249

    Environment: CUDA 10.1 (cuBLAS version 10.1.0.63), BMInf 0.0.4 installed via clone + python setup.py install, torch 1.7.1.

    bug · opened by huhk-sysu · 4 comments
  • running examples and get error:  type object 'cublasLt' has no attribute 'cublasLtHandle_t'


    When running examples/fill_blank.py, I get the error: AttributeError: type object 'cublasLt' has no attribute 'cublasLtHandle_t'

    The CUDA version is 10.0, and bminf 0.4.0 was installed successfully. Any idea how to solve this problem?

    question · opened by qiufengyuyi · 3 comments
  • [BUG] eva2 = bminf.models.EVA2()


    EVA raises an error:

    In [11]: eva2 = bminf.models.EVA2()

    KeyError Traceback (most recent call last) in <module>() ----> 1 eva2 = bminf.models.EVA2()

    ~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/models/eva2.py in __init__(self, device, memory_limit, config) 56 raise ValueError("Memory is not enough") 57 ---> 58 super().__init__(config) 59 60 def dialogue(self,

    ~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/model.py in __init__(self, config) 73 vocab_path = data.ensure_file(config.MODEL_NAME, "vocab.txt") 74 ---> 75 self.tokenizer = T5Tokenizer(vocab_path) 76 77 self.device = config.DEVICE

    ~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in __init__(self, vocab_path, max_len, max_sentinels) 81 self.translator_dec = str.maketrans("\u2582\u2583", " \n") 82 ---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)] 84 85 @property

    ~/anaconda3/envs/yhs/lib/python3.6/site-packages/bminf/arch/t5/tokenizer.py in <listcomp>(.0) 81 self.translator_dec = str.maketrans("\u2582\u2583", " \n") 82 ---> 83 self.sentinel_list = [self.encoder['<s_{}>'.format(i)] for i in range(max_sentinels)] 84 85 @property

    KeyError: '<s_0>'

    bug · solved · opened by Hansen06 · 3 comments
  • [MODEL] Debug Self-Trained GPT-Model


    Introduction: When I load a trained GPT-2 model into BMInf and run inference, it produces NaN during forward propagation. Although I can get the DEBUG INFO, I still do not know what is going wrong. Here is the log info; how can I fix it?

    2021-10-08 03:12:08,611 - model - INFO - MAX_LENGTH: 1024
    2021-10-08 03:12:08,622 - model - INFO - Start loading parameters from disk to cpu
    2021-10-08 03:12:08,622 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: size 75027456
    2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [CodeGPT]: parameters 0, sub_layers 5
    2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - In input_embedding: ==
    2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 30781440
    2021-10-08 03:12:08,623 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
    2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Out input_embedding: ==
    2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - In position_embedding: ==
    2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: size 1572864
    2021-10-08 03:12:08,645 - bminf.layers.base - DEBUG - Parameter Loader [Embedding]: parameters 1, sub_layers 0
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out position_embedding: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In input_mask: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: size 0
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [InputMask]: parameters 0, sub_layers 0
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out input_mask: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layers: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: size 42670080
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [LayerList]: parameters 0, sub_layers 6
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In 0: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: size 7111680
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [TransformerBlockGPT]: parameters 0, sub_layers 4
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - In layer_nrom_before_self_attn: ==
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
    2021-10-08 03:12:08,646 - bminf.layers.base - DEBUG - Out layer_nrom_before_self_attn: ==
    2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - In self_attention: ==
    2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: size 2371584
    2021-10-08 03:12:08,647 - bminf.layers.base - DEBUG - Parameter Loader [GPTAttention]: parameters 6, sub_layers 0
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out self_attention: ==
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In layer_nrom_before_ff: ==
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Out layer_nrom_before_ff: ==
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In dense_gelu_dense: ==
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: size 4733952
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [GPTDenseGeluDense]: parameters 0, sub_layers 2
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - In wi: ==
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2371584
    2021-10-08 03:12:08,649 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
    2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Out wi: ==
    2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - In wo: ==
    2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: size 2362368
    2021-10-08 03:12:08,651 - bminf.layers.base - DEBUG - Parameter Loader [Linear]: parameters 3, sub_layers 0
    2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out wo: ==
    2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out dense_gelu_dense: ==
    2021-10-08 03:12:08,653 - bminf.layers.base - DEBUG - Out 0: ==
    [... the same loader entries repeat for transformer blocks 1 through 5 ...]
    2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out layers: ==
    2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - In encoder_final_layer_nrom: ==
    2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: size 3072
    2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Parameter Loader [GPTLayerNorm]: parameters 2, sub_layers 0
    2021-10-08 03:12:08,687 - bminf.layers.base - DEBUG - Out encoder_final_layer_nrom: ==
    2021-10-08 03:12:08,687 - model - INFO - Start loading parameters from cpu to gpu
    2021-10-08 03:12:08,687 - model - INFO - Using static loader: total: 75027456, dynamic_memory 536870912, memory_limit 11453988864
    2021-10-08 03:12:08,688 - bminf.allocator.base - INFO - Allocate 30781440
    2021-10-08 03:12:08,695 - bminf.allocator.base - INFO - Allocate 1572864
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 1769472
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 4608
    2021-10-08 03:12:08,696 - bminf.allocator.base - INFO - Allocate 589824
    2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,697 - bminf.allocator.base - INFO - Allocate 2359296
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 6144
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 2359296
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,698 - bminf.allocator.base - INFO - Allocate 1536
    [... the same per-block allocations repeat for the remaining five transformer blocks ...]
    2021-10-08 03:12:08,712 - bminf.allocator.base - INFO - Allocate 536870912
    2021-10-08 03:12:08,713 - model - INFO - Cleaning useless parameters on cpu
    2021-10-08 03:12:08,715 - model - INFO - End of model initialization
    2021-10-08 03:12:08,715 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,859 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,860 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,861 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 18874368
    2021-10-08 03:12:08,862 - model - INFO - Calc encoder layer 0
    2021-10-08 03:12:08,862 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
    2021-10-08 03:12:08,862 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,863 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,871 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,872 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
    2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,872 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,874 - bminf.allocator.base - INFO - Allocate 294912
    2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) Missing
    2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) Missing
    2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) Missing
    2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, False) Missing
    2021-10-08 03:12:08,923 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) Missing
    2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,926 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,927 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,928 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,928 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) Missing
    2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,929 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) Missing
    2021-10-08 03:12:08,931 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,937 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) Missing
    2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,937 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,938 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,938 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,939 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
    2021-10-08 03:12:08,939 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,940 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,940 - bminf.allocator.base - INFO - Allocate 786432
    2021-10-08 03:12:08,940 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) Missing
    2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) Missing
    2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,941 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,941 - bminf.allocator.base - INFO - Allocate 393216
    2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,942 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,942 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) Missing
    2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) Missing
    2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,943 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,943 - bminf.allocator.base - INFO - Allocate 98304
    [... encoder layers 1 through 4 repeat the same pattern of allocations and cache HITs; the log is truncated here ...]
    2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,965 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,966 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
    2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,966 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
    2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,967 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,967 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,968 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,968 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,968 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 786432
    2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
    2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,969 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 393216
    2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,969 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
    2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,970 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,970 - model - INFO - Calc encoder layer 5
    2021-10-08 03:12:08,970 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm self-attn
    2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,970 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,971 - bminf.layers.transformer_block - INFO - Encoder transformer block -- self attention
    2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,971 - bminf.allocator.base - INFO - Allocate 294912
    2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,971 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,972 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,972 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (0, 68, False, True) HIT
    2021-10-08 03:12:08,973 - bminf.allocator.base - INFO - Allocate 1536
    2021-10-08 03:12:08,973 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (2, 64, 64, 64, 0, 12, 4096) HIT
    2021-10-08 03:12:08,973 - bminf.utils.cache - DEBUG - Get (0, 68, False, False) HIT
    2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (3, 768, 768, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,974 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,974 - bminf.layers.transformer_block - INFO - Encoder transformer block -- layer norm ff
    2021-10-08 03:12:08,974 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,975 - bminf.layers.transformer_block - INFO - Encoder transformer block -- ff
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 49152
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,975 - bminf.allocator.base - INFO - Allocate 786432
    2021-10-08 03:12:08,975 - bminf.utils.cache - DEBUG - Get (3, 64, 768, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,975 - bminf.utils.cache - DEBUG - Get (3, 768, 3072, 768, 0, 1, 0) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 64, 3072, 64, 0, 1, 196608) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 393216
    2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 128
    2021-10-08 03:12:08,976 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (3, 64, 3072, 64, 0, 1, 0) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (3, 3072, 768, 3072, 0, 1, 0) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 64, 768, 64, 0, 1, 49152) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, False) HIT
    2021-10-08 03:12:08,976 - bminf.utils.cache - DEBUG - Get (10, 72, False, False) HIT
    2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 196608
    2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 256
    2021-10-08 03:12:08,977 - bminf.allocator.base - INFO - Allocate 98304
    2021-10-08 03:12:08,979 - bminf.allocator.base - INFO - Allocate 40080
    2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 20040, 768, 0, 1, 0) Missing
    2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 768, 1, 768, 0, 1, 0) Missing
    2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (2, 20040, 1, 20040, 0, 1, 20040) Missing
    2021-10-08 03:12:08,979 - bminf.utils.cache - DEBUG - Get (0, 68, True, False) Missing
    Loading model
    Start
    [[nan nan nan ... nan nan nan]]
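
    The logger names in the trace (bminf.allocator.base, bminf.utils.cache, bminf.layers.transformer_block) suggest the package logs through Python's standard logging module; assuming so, DEBUG output in this shape can be enabled with a stock configuration:

        import logging

        # Emit records in the "time - logger - level - message" form seen above.
        logging.basicConfig(
            level=logging.DEBUG,
            format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        )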
    
    help wanted 
    opened by HoratioJSY 2
  • Error during installation, `No matching distribution found for cupy-cuda90<10,>=9`  [BUG]

    Error log:

    Collecting cupy-cuda90<10,>=9 (from bminf)
      ERROR: Could not find a version that satisfies the requirement cupy-cuda90<10,>=9 (from bminf) (from versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0, 4.4.0, 4.4.1, 4.5.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 6.0.0, 6.1.0, 6.2.0, 6.3.0, 6.4.0, 6.5.0, 6.6.0, 6.7.0, 7.0.0, 7.1.0, 7.1.1, 7.2.0, 7.3.0, 7.4.0, 7.5.0, 7.6.0, 7.7.0, 7.8.0, 8.0.0, 8.1.0, 8.2.0, 8.3.0, 8.4.0, 8.5.0, 8.6.0, 9.0.0a1, 9.0.0a2)
    ERROR: No matching distribution found for cupy-cuda90<10,>=9 (from bminf)
    

    /usr/local/cuda/version.txt:

    CUDA Version 9.0.176                                                      
    CUDA Patch Version 9.0.176.1                                              
    CUDA Patch Version 9.0.176.2                                              
    CUDA Patch Version 9.0.176.3
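
    The version list above ends at 9.0.0a2, meaning no stable cupy 9.x wheel was ever published for CUDA 9.0, so the pin `cupy-cuda90<10,>=9` cannot resolve on a CUDA 9.0 host. Upgrading to a CUDA toolkit with a stable cupy 9.x wheel (for example, one covered by the `cupy-cuda102` package) should satisfy the requirement. A minimal post-install sanity check, assuming such a wheel has been installed:

        import cupy

        # Confirm a 9.x cupy build and the CUDA runtime it was compiled against.
        print(cupy.__version__)                       # expect a 9.x release
        print(cupy.cuda.runtime.runtimeGetVersion())  # e.g. 10020 for CUDA 10.2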
    
    question 
    opened by YixuanCao 1
  • generate.py example script error

    In https://github.com/OpenBMB/BMInf/blob/d40c6f5d5678e1cba771048ecd0923bceae176e2/examples/generate.py#L4-L21, the function only stops once the generated result contains <eod>.

    However, in https://github.com/OpenBMB/BMInf/blob/d40c6f5d5678e1cba771048ecd0923bceae176e2/bminf/models/cpm1.py#L90-L99, the loop terminates as soon as <eod> is sampled, before it is appended to ret. It is therefore never decoded, so the result can never contain <eod>.

    As a consequence, generate.py actually generates text in an infinite loop.
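
    A hedged sketch of the mismatch (simplified pseudocode, not the exact BMInf source; `sample_next` and `eod_id` are illustrative names):

        def sample_tokens(model, eod_id, max_len):
            ret = []
            for _ in range(max_len):
                token = model.sample_next()   # hypothetical sampling call
                if token == eod_id:
                    break                     # <eod> is dropped before being appended
                ret.append(token)
            return ret                        # decoded text can never contain <eod>

    Any caller that loops until the decoded result contains "<eod>" therefore never terminates; it should instead rely on a stop flag or an explicit length limit.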

    enhancement 
    opened by huhk-sysu 1
  • [FEATURE] Compare to FasterTransformer

    Is there any comparison between BMInf and Nvidia's FasterTransformer?

    I would like to use some tools to improve our model's inference performance. BMInf is great, and it seems to use a CUDA implementation to boost inference performance, just like FasterTransformer. So, is there any comparison of inference time between BMInf and FasterTransformer?

    question 
    opened by HoratioJSY 1
  • [doc] change docker installation

    opened by jayzzhou-thu 0
  • Docs update

    opened by jayzzhou-thu 0
  • change readme

    opened by jayzzhou-thu 0
  • How can we train on our own collected medical QA data? [FEATURE]

    Hello, the data we have collected is medical question-answering data, and we would like to train on it based on EVA. We have tried several approaches but have not managed to make it work. Have you implemented this on your side? Thank you.

    opened by yunfeihaha 0
  • [BUG] Running generate_cpm2.py raises a ValueError

    Running generate_cpm2.py raises a ValueError:

    (EVAAA) [[email protected] examples]# python generate_cpm2.py
    Loading model
    Input: 天空是蔚蓝色,窗外有
    Output: 天空是蔚蓝色,窗外有
    Traceback (most recent call last):
      File "generate_cpm2.py", line 32, in <module>
        main()
      File "generate_cpm2.py", line 29, in main
        generate(cpm2_1, input_text)
      File "generate_cpm2.py", line 11, in generate
        value, stoped = model.generate(
    ValueError: too many values to unpack (expected 2)
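
    "too many values to unpack (expected 2)" means this build of `generate` returns more than two values. Until the example is updated, a version-tolerant unpacking is one possible workaround (a sketch, assuming the first returned value is still the generated text):

        outputs = cpm2_1.generate(input_text)
        value, stoped, *extra = outputs   # absorb any additional return values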

    bug 
    opened by xiaoqiao 3
  • Questions about text generation with the CPM2.1 model

    When using CPM2.1 for text generation, I modified the following line so that the program would not stop as soon as a punctuation mark was generated, in order to produce longer outputs: https://github.com/OpenBMB/BMInf/blob/59e8903366ed53615d8af0a61e15b9f932042dcc/bminf/models/cpm2.py#L216

    My invocation is as follows: [screenshot]

    After this change, the generated result contains newline characters (converted from the token with id 3 in the vocabulary), and after a newline the text is no longer coherent with what precedes it, as if a new paragraph had started; sometimes even the topic changes, as in the screenshots below.

    In this example, generation produces a newline right at the start. [screenshot]

    In this example, the topic changes considerably after the newline. [screenshot]

    1. Does this happen because paragraphs were separated this way during training?
    2. Could the generate function accept user-defined "stop tokens" to control generation behavior? (A possible interim workaround is sketched after this list.)
    3. Could you provide an example of using CPM2.1 to generate long-form text?
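
    A sketch toward question 2, until custom stop tokens are supported natively (names are illustrative, not the BMInf API):

        STOP_CHARS = {"\n"}   # user-defined "stop characters"

        def truncate_at_stop(text, stop_chars=STOP_CHARS):
            # Cut generated text at the first stop character as a post-processing step.
            for i, ch in enumerate(text):
                if ch in stop_chars:
                    return text[:i]
            return text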
    enhancement question 
    opened by huhk-sysu 3
Releases: 0.0.5
Owner: OpenBMB (Open Lab for Big Model Base)