GLM (General Language Model)

Overview

GLM

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
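
For intuition, here is a minimal sketch of how a blank-infilling training pair is constructed: sampled spans are replaced by [MASK] in the corrupted text, and the spans themselves, each preceded by a start-of-piece token, are predicted autoregressively. The helper below is a simplified illustration with a hypothetical name, not the repo's actual data pipeline.

# Simplified illustration of the blank-filling objective; the helper and its
# name are hypothetical, not the repo's actual preprocessing code.
def build_blank_infilling_pair(tokens, spans):
    """tokens: list of words; spans: (start, end) index pairs to blank out."""
    part_a, part_b = [], []
    last = 0
    for start, end in sorted(spans):
        part_a.extend(tokens[last:start])
        part_a.append("[MASK]")           # each sampled span becomes one [MASK]
        part_b.append("<|startofpiece|>")
        part_b.extend(tokens[start:end])  # the span itself is predicted autoregressively
        last = end
    part_a.extend(tokens[last:])
    return part_a, part_b

src, tgt = build_blank_infilling_pair(
    "GLM is pretrained with an autoregressive blank filling objective".split(),
    [(5, 8)],
)
print(" ".join(src))  # GLM is pretrained with an [MASK] objective
print(" ".join(tgt))  # <|startofpiece|> autoregressive blank filling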

Please refer to our paper for a detailed description of GLM:

All NLP Tasks Are Generation Tasks: A General Pretraining Framework

Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)

Part of the code is based on Megatron-LM and PET.

Pretrained Models

You can download the pretrained models used in the paper here.

Name                     Params  File                             Config
GLM-Base                 110M    glm-base-blank.tar.bz2           model_blocklm_base.sh
GLM-Large                335M    glm-large-blank.tar.bz2          model_blocklm_large.sh
GLM-Large (multi-task)   335M    glm-large-generation.tar.bz2     model_blocklm_large_generation.sh
GLM-410M (multi-task)    410M    glm-1.25-generation.tar.bz2      model_blocklm_1.25_generation.sh
GLM-515M (multi-task)    515M    glm-1.5-generation.tar.bz2       model_blocklm_1.5_generation.sh
GLM-RoBERTa              335M    glm-roberta-large-blank.tar.bz2  model_blocklm_roberta_large.sh

Installation

Clone this repo

git clone https://github.com/THUDM/GLM
cd GLM

Please first install PyTorch (we use 1.7.0) and apex, then install the remaining dependencies with

pip install -r requirements.txt
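
Before launching any script, a quick sanity check that the core dependencies are importable can save time; this snippet is our suggestion and not part of the repo:

# Quick sanity check of the installation (our suggestion, not part of the repo).
import torch

print("torch", torch.__version__)             # the paper's experiments use 1.7.0
print("CUDA available:", torch.cuda.is_available())

try:
    from apex.normalization import FusedLayerNorm  # used by the model code
    print("apex with fused kernels is available")
except ImportError as err:
    print("apex is missing or built without CUDA extensions:", err)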

Usage

We provide scripts for finetuning GLM on some downstream tasks.

SuperGLUE

  • Download the SuperGLUE data and check the experiment setup in scripts/finetune_superglue.sh. Note that DATA_ROOT, CHECKPOINT_PATH, and SAVE_PATH need to be changed to your local paths. You may also change batch-size and nproc_per_node according to your available hardware. We suggest an aggregated batch size of 64 for MultiRC and ReCoRD and 16 for the other tasks.

  • Run the following script (using the COPA dataset as an example)

bash scripts/finetune_superglue.sh \
     config_tasks/model_blocklm_roberta_large.sh \
     config_tasks/task_copa.sh
  • To apply GLM to a new NLU dataset with cloze-filling finetuning, implement a DataProcessor in tasks/superglue/dataset.py for data loading and add a PVP in tasks/superglue/pvp.py for the cloze question (see the sketch after this list). More details can be found here.

  • The cloze questions (prompts) used in this work are hand-written. We are also studying a P-tuning (prompt tuning) approach that searches for the optimal continuous prompt. Please refer to our paper and code.
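
As a concrete illustration of the pattern-verbalizer idea behind cloze-filling finetuning, the self-contained toy below phrases a sentiment task as a cloze question. It is not the repo's actual DataProcessor/PVP interface (which is based on PET, and whose method names may differ); it only shows what a pattern and a verbalizer do:

from dataclasses import dataclass

@dataclass
class InputExample:
    text: str
    label: str  # "positive" or "negative"

class SentimentPVP:
    """Toy pattern-verbalizer pair: wraps an example into a cloze question and
    maps each label to a single verbalizer token that should fill the blank."""
    VERBALIZERS = {"positive": "good", "negative": "bad"}

    def get_pattern(self, example: InputExample) -> str:
        # The [MASK] position is where GLM generates the verbalizer token.
        return f'"{example.text}" All in all, it was a [MASK] movie.'

    def verbalize(self, label: str) -> str:
        return self.VERBALIZERS[label]

pvp = SentimentPVP()
example = InputExample(text="A thoughtful and moving film.", label="positive")
print(pvp.get_pattern(example))      # "A thoughtful and moving film." All in all, it was a [MASK] movie.
print(pvp.verbalize(example.label))  # good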

Text Summarization

  • Download the Gigaword dataset and check the experiment setup in scripts/finetune_seq2seq.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_seq2seq.sh \
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/seq_gigaword.sh
  • To calculate ROUGE, install file2rouge from here and run bash scripts/evaluate_seq2seq.sh (a lightweight alternative is sketched below)
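
If you only need a rough ROUGE estimate while iterating, the rouge-score package (our suggestion; the repo's reported numbers come from the file2rouge pipeline above) can be used like this:

# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "japan 's nikkei stock average rose #.## percent on monday"
prediction = "nikkei rises #.## percent on monday"
scores = scorer.score(reference, prediction)  # score(target, prediction)
for name, score in scores.items():
    print(name, f"F1={score.fmeasure:.3f}")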

Language Modeling

LAMBADA Cloze Accuracy

bash scripts/evaluate_lm.sh \
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/zero_lambada.sh
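
LAMBADA cloze accuracy simply measures how often the model's predicted final word matches the gold final word of each passage; a minimal scoring sketch (our illustration, not the repo's evaluation code):

def lambada_accuracy(predictions, references):
    """predictions, references: lists of predicted and gold final words."""
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

print(lambada_accuracy(["night", "door"], ["night", "window"]))  # 0.5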

LM Perplexity

  • Download our test set of wikibook (or any dataset following the same format) and change DATA_ROOT and CHECKPOINT_PATH in scripts/evaluate_lm.sh
  • Run the following script (a sketch of how perplexity is derived from the token-level loss follows below)
    bash scripts/evaluate_lm.sh \
       config_tasks/model_blocklm_large_generation.sh \
       config_tasks/zero_lm.sh
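
Perplexity is the exponential of the average per-token negative log-likelihood over the test set; a minimal sketch of the computation (our illustration; the script's exact token weighting may differ):

import math

def perplexity(token_nlls):
    """token_nlls: negative log-likelihood (natural log) of each target token."""
    return math.exp(sum(token_nlls) / len(token_nlls))

print(perplexity([2.1, 3.4, 0.9, 1.7]))  # ~7.58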

Blank Language Model

  • Download the Yahoo dataset and check the experiment setup in scripts/finetune_blank.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_blank.sh \
     config_tasks/model_blocklm_large.sh \
     config_tasks/seq_blank.sh

Blank Filling (Interactive)

  • Change CHECKPOINT_PATH to your local path. Run the following script
bash scripts/generate_block.sh \
     config_tasks/model_blocklm_large.sh

Example:

Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.

GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a pioneer in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the stanford university
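
The raw output contains one <|startofpiece|> segment per blank. To splice the segments back into the context, a simple post-processing helper can replace each [MASK] in turn; this is our own sketch (not part of the repo) and assumes the segments are generated in the same left-to-right order as the blanks:

def fill_blanks(context: str, generation: str) -> str:
    """Replace each [MASK] in the context with the corresponding
    <|startofpiece|> segment from the model's raw generation."""
    pieces = [piece.split("<|endofpiece|>")[0].strip()
              for piece in generation.split("<|startofpiece|>")[1:]]
    for piece in pieces:
        context = context.replace("[MASK]", piece, 1)
    return context

context = "Ng is an adjunct professor at [MASK]."
generation = "[CLS] ng is an adjunct professor at [MASK] . [PAD] <|startofpiece|> the stanford university"
print(fill_blanks(context, generation))
# Ng is an adjunct professor at the stanford university.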

Citation

Please cite our paper if you find this code useful for your research:

@article{DBLP:journals/corr/abs-2103-10360,
  author    = {Zhengxiao Du and
               Yujie Qian and
               Xiao Liu and
               Ming Ding and
               Jiezhong Qiu and
               Zhilin Yang and
               Jie Tang},
  title     = {All {NLP} Tasks Are Generation Tasks: {A} General Pretraining Framework},
  journal   = {CoRR},
  volume    = {abs/2103.10360},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.10360}
}
Comments
  • text infilling cases

    text infilling cases

    Thanks for your wonderful work!

    I tried

    bash scripts/generate_block.sh \
         config_tasks/model_blocklm_large.sh
    

    with many context inputs, but got unsatisfying predictions (weird tokens, or tokens inconsistent with the local context), for example:

    #1
    Context: Ng is a good teacher at [MASK] .
    
    GLM: [CLS] ng is a good teacher at [MASK] . [PAD] <|startofpiece|> are . e
    
    #2
    Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a [MASK] in online education
    
    GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a [MASK] in online education [PAD] <|startofpiece|> the university of arizona <|startofpiece|> researcher at the
    
    #3
    Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a [MASK] in online education, Ng co-founded Coursera and deeplearning.ai.
    
    GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a [MASK] in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the university of michigan <|startofpiece|> senior associate at the university of
    

    I want to generate multiple spans for a given context; I was wondering whether I am making a mistake in using the scripts.

    opened by qtli 11
  • run infer failed

    run infer failed

    I used 8 x A100 40G GPUs to run the Hugging Face hub code and it failed. I tried adding device_map='auto' to AutoModelForSeq2SeqLM.from_pretrained, but it is not supported. How can I run this code?

    opened by xv44586 4
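
    For reference, the Hugging Face GLM checkpoints are normally loaded with the model's custom code enabled and the whole model placed on a single device; the snippet below follows the pattern we understand from the model cards (the checkpoint name is only an example, the build_inputs_for_generation and eop_token_id helpers come from the model's remote code, and device_map='auto' sharding is not expected to work with these custom modeling files):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "THUDM/glm-10b"  # example checkpoint name; substitute the one you need
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(name, trust_remote_code=True)
    model = model.half().cuda()  # fp16 on a single GPU instead of device_map='auto'
    model.eval()

    # Blank filling, following the model card's example.
    inputs = tokenizer("Ng is an adjunct professor at [MASK].", return_tensors="pt")
    inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=512)
    inputs = inputs.to("cuda")
    outputs = model.generate(**inputs, max_length=512, eos_token_id=tokenizer.eop_token_id)
    print(tokenizer.decode(outputs[0].tolist()))
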
  • How to solve the loss scale problem when continuing pretraining?

    How to solve the loss scale problem when continuing pretraining?

    [2022-09-14 22:57:53,096] [INFO] [stage2.py:1387:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 4 Skipping step. Attempted loss scale: 4294967296, reducing to 4294967296
    [2022-09-14 22:57:55,597] [INFO] [stage2.py:1387:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 4 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
    [2022-09-14 22:57:57,418] [INFO] [stage2.py:1387:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 4 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
    ... (the same overflow message repeats for ranks 0-7 at every step while the loss scale is halved from 4294967296 down to 131072) ...
    [2022-09-14 22:58:28,172] [INFO] [stage2.py:1387:step] [deepspeed] fp16 dynamic loss scale overflow! Rank 7 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
    Traceback (most recent call last):

    RuntimeError: CUDA out of memory. Tried to allocate 4.63 GiB (GPU 6; 31.75 GiB total capacity; 20.85 GiB already allocated; 4.46 GiB free; 25.71 GiB reserved in total by PyTorch)

    opened by dinglei8908 4
  • Training and inference issue

    Training and inference issue

    Dear authors, thanks for your great work.

    I have a small question. The paper says that the spans are shuffled at training time. I was wondering whether the predicted spans are generated in order at inference time, and how to make them ordered.

    Look forward to hearing from you.

    opened by qtli 3
  • How to load the model for continued pretraining?

    How to load the model for continued pretraining?

    When I continue pretraining with pretrain_glm.py and load the downloaded glm-large-chinese/mp_rank_00_model_states.pt, I get the following error:

    WARNING: could not find the metadata file /root/Data/zz/GitHub/GLM/blocklm-large-chinese/latest_checkpointed_iteration.txt 
    Try to directly load the checkpoint from the directory
    Traceback (most recent call last):
      File "pretrain_glm.py", line 663, in <module>
        main()
      File "pretrain_glm.py", line 580, in main
        args.iteration = load_checkpoint(model, optimizer, lr_scheduler, args)
      File "/root/Data/zz/GitHub/GLM/utils.py", line 337, in load_checkpoint
        checkpoint_name, sd = model.load_checkpoint(load_dir, tag,
      File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2513, in load_checkpoint
        load_path, client_states = self._load_checkpoint(load_dir,
      File "/root/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2671, in _load_checkpoint
        client_state['optimizer'] = optim_checkpoint['optimizer']
    KeyError: 'optimizer'
    

    How can the provided model files be loaded correctly?

    opened by fade-color 3
  • Why vocabulary is divided by GPU number and how to load it?

    Why vocabulary is divided by GPU number and how to load it?

    When I pretrain a model, the vocabulary is partitioned across the GPUs, so I can't directly load it together with the original model in downstream tasks. What should I do? Thanks!

    opened by Aurora-slz 3
  • HuggingFace module

    HuggingFace module

    I read your paper with great interest. You seem to have a lot of novel ideas about how to improve the pretraining. Some of the scores are really impressive. I would like to test some of these ideas on other corpora.

    Have you considered making the code available as a HuggingFace module (TensorFlow/PyTorch/Flax)? I think this would lead to a lot more people looking into your ideas.

    opened by peregilk 3
  • Problem loading model weights

    Problem loading model weights

    When loading the model for further pretraining, the following error is raised:

    RuntimeError: Error(s) in loading state_dict for GLMModel:
    	Missing key(s) in state_dict: "word_embeddings.weight", "transformer.block_position_embeddings.weight". 
    	Unexpected key(s) in state_dict: "mixins.block_position_embedding.block_position_embeddings.weight", "transformer.word_embeddings.weight". 
    

    The model had been split into 4 partitions with change_mp.py.

    opened by maojinyang 2
  • how to choose the finetuning script for question-answering task

    how to choose the finetuning script for question-answering task

    I have a task in which, given a question, the [sMASK] token is used to generate the answer sentence. Which finetuning script should I use? Should I use the text summarization finetuning script?

    opened by shaomai00 2
  • The text generated when running bash scripts/generate_block.sh config_tasks/model_blocklm_10B_chinese.sh does not match the example

    The text generated when running bash scripts/generate_block.sh config_tasks/model_blocklm_10B_chinese.sh does not match the example

    
    Runtime environment
    # ============
    Code from the main branch
    Run script: bash scripts/generate_block.sh config_tasks/model_blocklm_10B_chinese.sh
    checkpoint file: GLM-XXLarge-Chinese 10B
    # ============
    
    # Output from my test
    Context: 问题:冬天,中国哪座城市最适合避寒?问题描述:能推荐一些国内适合冬天避寒的城市吗?回答用户:旅游爱好者 回答:[gMASK]
    
    GLM: [CLS]问题:冬天,中国哪座城市最适合避寒?问题描述:能推荐一些国内适合冬天避寒的城市吗?回答用户:旅游爱好者 回答:[gMASK]<|startofpiece|>避寒的话当然是成都啦!成都的冬天真的十分美好,天气暖和,还有不少小吃,来成都避寒真的再合适不过了。 成都的冬天没有雾霾,蓝天白云,温度也很适宜,不像东北那样冷得刺骨。成都是一个生活节奏很慢的城市,在快节奏的生活下,成都的冬天可以让你慢下来,享受生活带来的美好。 下面安利一波来成都避寒冷门的游玩线路吧! 首先推荐一条冬天来成都必打卡的线路!那就是去西岭雪山!! 西岭雪山位于成都市大邑县境内。由大邑县境内终年积雪的西岭而得名。景区面积375平方公里,为邛崃山脉的尾端。西岭雪山常年积雪。有“日出成都、大雪纷飞”的美誉。 冬天去西岭雪山简直是享受雪地的乐趣啊!而且西岭雪山距离成都市区不到90公里,开车1个多小时就能到啦!去之前最好查好天气预报,去之前一定要穿厚实暖和的羽绒服哦! 西岭雪山景区海拔约2500米,是一个大型高山滑雪场。雪山气势宏伟,终年积雪,山上有终年不化的积雪,银光灿灿,是大自然赐与成都平原地区最壮观瑰丽的自然景象。雪场目前有初级道、中级道、高级道共三条。初级道,中高级道的滑雪区域都是对外开放的。 成都的冬天非常阴冷,如果去西岭雪山的话一定不要忘记带上羽绒服哦! 除了冬季玩雪,成都在其他季节也是非常值得游玩的,尤其是春天,漫山的桃花、油菜花还有梨花,简直美翻了! 成都的春天的美景太多啦,就不一一介绍啦!如果有机会的话,大家还可以去成都周边走走,感受不一样的成都,大家也可以去看看大冰的小屋哦,里面有很多成都周边游玩的攻略。
    
    # Example
    Context: 问题:冬天,中国哪座城市最适合避寒?问题描述:能推荐一些国内适合冬天避寒的城市吗?回答用户:旅游爱好者 回答: [gMASK]
    
    GLM:海南三亚,三亚的冬天是三亚的旺季,因为冬天国内大部分地方都有雾霾,而三亚却是没有雾霾的地方,所以三亚是冬天最惬意不过的地方了。在东北长大的我觉得三亚简直就是一个天堂,冬天去海口三亚旅游,享受冬天,享受阳光沙滩。但是海口却是很干燥,需要多喝水。 三亚冬天最热门的玩法就是晒太阳,在海边晒晒太阳,在沙滩上晒晒太阳,感觉整个人都得到了解放。三亚还有一个特色项目,就是海上冲浪,在三亚的沙滩上冲浪也是一件非常刺激的事情。 海口,海口冬季的阳光十分温暖,海南的冬季也是属于冬季旅游的旺季。冬季的海口最棒的是去海南的热带野生动植物园,那里有数之不尽的热带小动物,在这里可以近距离的和它们接触,海南的热带野生动植物园也是海南的天然氧吧。还可以在海口观澜湖公园里感受海口美丽的海景。 贵阳,贵州的冬天也是十分温暖的,贵阳也是冬季避寒很好的城市之一。冬季去贵阳玩一定要去黔灵山,黔灵山是贵州香火很旺盛的一个寺庙,寺庙的冬季香火鼎盛,在冬季去寺庙游玩也是一个很好的体验。除了黔灵山,贵阳在冬季还有花溪公园可以去玩,花溪公园也是去当地公园玩最好的选择。 青岛,青岛的冬天是青岛最舒服的时候,青岛有很多海滨浴场,冬天去海边泡一泡温泉,然后晒晒太阳是一件十分惬意的事情。青岛也有沙滩,冬天在沙滩上晒晒太阳,看看海,再玩玩沙滩游戏,感觉十分快乐的事。
    
    
    
    
    opened by Ant0082 2
  • The pretraining corpus of GLM-Large-Chinese

    The pretraining corpus of GLM-Large-Chinese

    Hi,

    1. What is the pretraining corpus of the released GLM-Large-Chinese/GLM-10B-Chinese? Wiki+BookCorpus as stated in the README, or wudao/baike/zhihu (as in config/ds_block_large_chinese.sh)?
    2. Besides, how large is the corpus used to train GLM-Large-Chinese and GLM-10B-Chinese? Thanks.
    opened by cklsoft 1
  • Hardware requirements

    Hardware requirements

    I was trying to find the hardware requirements for serving, and maybe also fine-tuning, a monolingual English version. Where can I find them? Could they also be added to the README.md?

    Are the requirements comparable to those needed to serve Meta AI OPT models?

    documentation 
    opened by eli-halych 0
  • Google Colab error

    Google Colab error

    Basic code:

    !git clone https://github.com/THUDM/GLM
    %cd GLM
    !pip install -r requirements.txt
    !pip install apex

    modify model_path inside generate_block.sh here ; I'm using glm-1.5-generation.tar.bz2

    !chmod 755 scripts/generate_block.sh
    !scripts/generate_block.sh config_tasks/model_blocklm_10B_chinese.sh

    Error Log:

    Traceback (most recent call last):
      File "generate_samples.py", line 23, in <module>
        from arguments import get_args
      File "/content/GLM/arguments.py", line 23, in <module>
        from utils import get_hostname
      File "/content/GLM/utils.py", line 26, in <module>
        from fp16 import FP16_Optimizer
      File "/content/GLM/fp16/__init__.py", line 15, in <module>
        from .fp16util import (
      File "/content/GLM/fp16/fp16util.py", line 21, in <module>
        import mpu
      File "/content/GLM/mpu/__init__.py", line 35, in <module>
        from .layers import ColumnParallelLinear
      File "/content/GLM/mpu/layers.py", line 28, in <module>
        from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
      File "/usr/local/lib/python3.7/dist-packages/apex/__init__.py", line 13, in <module>
        from pyramid.session import UnencryptedCookieSessionFactoryConfig
    ImportError: cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)

    opened by MultiTrickFox 1
  • EOFError: Ran out of input

    EOFError: Ran out of input

    My GPUs are 2x RTX 3090, memory is 256 GB, and the CPU is an Intel(R) Xeon(R) Gold 6230 @ 2.10GHz. When I use the provided docker image glm-cuda112 to run GLM and train SuperGLUE-COPA, the following error occurs:

    WARNING: could not find the metadata file /root/data/checkpoints/blocklm-base-blank/latest_checkpointed_iteration.txt
    Try to directly load the checkpoint from the directory
    global rank 0 is loading pretrained model /root/data/checkpoints/blocklm-base-blank/mp_rank_00_model_states.pt
    Traceback (most recent call last):
      File "finetune_glm.py", line 469, in <module>
        main(args)
      File "/workspace/GLM-main/tasks/superglue/finetune.py", line 100, in main
        finetune(args, train_valid_datasets_provider, model_kwargs,
      File "/workspace/GLM-main/finetune_glm.py", line 379, in finetune
        load_pretrained(model, args.load_pretrained, args, task_tokens=task_tokens)
      File "/workspace/GLM-main/train_utils.py", line 23, in load_pretrained
        sd = torch.load(checkpoint_name, map_location='cpu')
      File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 593, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 762, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    EOFError: Ran out of input
    Killing subprocess 28691
    Killing subprocess 28692
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 171, in <module>
        main()
      File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 161, in main
        sigkill_handler(signal.SIGTERM, None)  # not coming back
      File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/launch.py", line 139, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'finetune_glm.py', '--local_rank=1', '--deepspeed', '--deepspeed_config', 'config_tasks/config_blocklm_10B.json', '--finetune', '--cloze-eval', '--experiment-name', 'blank-base-copa_04-17-02-31', '--task', 'COPA', '--data-dir', '/root/data/superglue/COPA', '--save', '/root/data/checkpoints', '--seq-length', '256', '--checkpoint-activations', '--eval-batch-size', '16', '--save-epoch', '100000', '--num-workers', '1', '--no-load-optim', '--no-load-lr-scheduler', '--block-lm', '--num-layers', '12', '--hidden-size', '768', '--num-attention-heads', '12', '--max-position-embeddings', '512', '--tokenizer-model-type', 'bert-base-uncased', '--tokenizer-type', 'BertWordPieceTokenizer', '--load-pretrained', '/root/data/checkpoints/blocklm-base-blank', '--lr-decay-style', 'linear', '--warmup', '0.1', '--weight-decay', '1.0e-1', '--pattern-id', '0', '--save-interval', '10000', '--log-interval', '20', '--eval-interval', '1000', '--eval-iters', '100', '--fp16', '--model-parallel-size', '1', '--continuous-prompt', '--num-prompt-tokens', '3', '--epochs', '100', '--overwrite']' returned non-zero exit status 1

    opened by helloeng 0
Owner
THUDM
Data Mining Research Group at Tsinghua University