The official code for PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

PRIMER

PRIMER is a pre-trained model for multi-document representation with a focus on summarization. It reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data. In extensive experiments on 6 multi-document summarization datasets from 3 different domains, covering the zero-shot, few-shot, and fully supervised settings, PRIMER outperforms current state-of-the-art models in most of these settings by large margins.

Set up

  1. Create a new virtual environment:
conda create --name primer python=3.7
conda activate primer
conda install cudatoolkit=10.0
  2. Install Longformer:
pip install git+https://github.com/allenai/longformer.git
  3. Install the requirements for the summarization scripts and data generation scripts:
pip install -r requirements.txt

Usage of PRIMER

  1. Download the pre-trained PRIMER model here to ./PRIMER_model
  2. Load the tokenizer and model by
from transformers import AutoTokenizer
from longformer import LongformerEncoderDecoderForConditionalGeneration
from longformer import LongformerEncoderDecoderConfig

tokenizer = AutoTokenizer.from_pretrained('./PRIMER_model/')
config = LongformerEncoderDecoderConfig.from_pretrained('./PRIMER_model/')
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
    './PRIMER_model/', config=config)

Make sure the documents are separated with <doc-sep> in the input.
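
As a minimal sketch of the input format described above, the snippet below joins the documents of one cluster into a single string. The helper name `join_documents`, the whitespace stripping, and the single spaces around the separator are assumptions made for illustration, not necessarily what the repository's own preprocessing does.

```python
DOC_SEP = "<doc-sep>"  # separator token named in the README


def join_documents(documents):
    """Join the documents of one cluster into a single input string.

    Hypothetical helper: empty documents are dropped and surrounding
    whitespace is stripped before joining (both are assumptions).
    """
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return f" {DOC_SEP} ".join(cleaned)


cluster = ["First article text.", "  Second article text.  ", ""]
joined = join_documents(cluster)
print(joined)
```

The resulting string can then be passed to the tokenizer loaded above.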

Summarization Scripts

Use script/primer_main.py to pre-train, train, and test PRIMER, and script/compared_model_main.py to train and test BART, PEGASUS, and LED.

Pre-training Data Generation

Newshead: we crawled the Newshead dataset using the original code and cleaned up the crawled data. The final Newshead dataset can be found here.

You can use utils/pretrain_preprocess.py to generate pre-training data.

  1. Generate data with scores and entities with --mode compute_all_scores
  2. Generate pre-training data with --mode pretraining_data_with_score:
    • Pegasus: --strategy greedy --metric pegasus_score
    • Entity_Pyramid: --strategy greedy_entity_pyramid --metric pyramid_rouge
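
The two steps above can be sketched as command lines assembled in Python. Only the script path and the flags listed above come from the README; the helper `build_commands` is hypothetical, and the script most likely needs further arguments (such as input and output paths) that are not listed here and are therefore left out.

```python
import shlex

SCRIPT = "utils/pretrain_preprocess.py"

# Flag values copied from the README; nothing else is assumed about the CLI.
STRATEGIES = {
    "pegasus": ["--strategy", "greedy", "--metric", "pegasus_score"],
    "entity_pyramid": ["--strategy", "greedy_entity_pyramid",
                       "--metric", "pyramid_rouge"],
}


def build_commands(strategy):
    """Return the two command lines (as argument lists) for one strategy."""
    score_cmd = ["python", SCRIPT, "--mode", "compute_all_scores"]
    data_cmd = ["python", SCRIPT, "--mode", "pretraining_data_with_score",
                *STRATEGIES[strategy]]
    return [score_cmd, data_cmd]


for cmd in build_commands("entity_pyramid"):
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

Printing the commands rather than running them keeps the sketch free of side effects; pass each list to `subprocess.run` once the remaining arguments are filled in.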

Datasets

  • Multi-News and Multi-XScience: the data is downloaded automatically from Hugging Face.
  • WCEP-10: the preprocessed version can be found here.
  • Wikisum: we only use a small subset for few-shot training (10/100) and testing (3200). The subset we used can be found here. Note that train.pt and valid.pt contain significantly more examples than we used, because we sample 10/100 examples multiple times in the few-shot setting and need a large pool to sample from.
  • DUC2003/2004: you need to apply for access by following the instructions.
  • arXiv: you can find the data we used in this repo.
Comments
  • Config specifies max_position_embeddings as 1024

    Hi!

    I noticed that the PRIMERA config specifies max_position_embeddings: 1024. Is this intentional? AFAICT the HuggingFace library treats this as the maximum position embedding size of the encoder, or max_encoder_position_embeddings, which for PRIMERA is 4096.

    E.g. in their run_summarization.py script, they appear to treat max_position_embeddings as max_encoder_position_embeddings as they compare it to the max_source_length.

    So I am wondering if max_position_embeddings should be set to 4096 in the PRIMERA configs; otherwise it causes problems when using the model with existing HF example scripts.
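
To make the mismatch concrete, here is a small sketch of the kind of comparison described above. It uses a plain stand-in object rather than the real config class; the two values are the ones quoted in this issue, and `source_length_fits` is a hypothetical helper, not code from the HF example script.

```python
from types import SimpleNamespace

# Stand-in for the model config; values are the ones quoted in this issue.
config = SimpleNamespace(
    max_position_embeddings=1024,
    max_encoder_position_embeddings=4096,
)


def source_length_fits(config, max_source_length,
                       attr="max_position_embeddings"):
    """Return True if max_source_length does not exceed the chosen limit."""
    return max_source_length <= getattr(config, attr)


max_source_length = 4096
fits_generic = source_length_fits(config, max_source_length)
fits_encoder = source_length_fits(
    config, max_source_length, attr="max_encoder_position_embeddings")
print(fits_generic, fits_encoder)
```

A script that checks only max_position_embeddings would treat a 4096-token source as too long, even though the encoder limit allows it.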

    opened by JohnGiorgi 6
  • Can PRIMERA accept 16k input?

    Could you please tell me whether the models on HF (https://huggingface.co/allenai/PRIMERA, https://huggingface.co/allenai/PRIMERA-arxiv) can accept 16k input? Can I just set max_length to 16384 so that the model accepts a document of that length? Thanks.

    opened by GabrielLin 4
  • bash script of fine-tuning on multinews dataset on multiple gpus using ddp

    Hi,

    I wonder if there is a script to fine-tune the pre-trained PRIMERA model on multiple GPUs using distributed data parallel (in run_bash I can only find test scripts). I tried the following command:

    python primer_main.py --primer_path "../PRIMERA_model" --gpus 8 --batch_size 1 --accelerator ddp
    

    but it fails with the following DDP errors:

    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1091.32it/s]
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1022.00it/s]
    initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/8
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1063.02it/s]
    initializing ddp: GLOBAL_RANK: 2, MEMBER: 3/8
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1064.45it/s]
    initializing ddp: GLOBAL_RANK: 3, MEMBER: 4/8
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1028.02it/s]
    initializing ddp: GLOBAL_RANK: 4, MEMBER: 5/8
    initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/8
    Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1046.74it/s]
    initializing ddp: GLOBAL_RANK: 5, MEMBER: 6/8
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1075.46it/s]
    initializing ddp: GLOBAL_RANK: 6, MEMBER: 7/8
    Using native 16bit precision.
    Using custom data configuration default
    Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1096.74it/s]
    initializing ddp: GLOBAL_RANK: 7, MEMBER: 8/8
    LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
    
      | Name  | Type                                             | Params
    ---------------------------------------------------------------------------
    0 | model | LongformerEncoderDecoderForConditionalGeneration | 447 M 
    ---------------------------------------------------------------------------
    447 M     Trainable params
    0         Non-trainable params
    447 M     Total params
    1,788.895 Total estimated model params size (MB)
    Validation sanity check: 0it [00:00, ?it/s]/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      warnings.warn(*args, **kwargs)
    Validation sanity check:   0%|                                                                                                                          | 0/2 [00:00<?, ?it/sValidation Result at Step 0
    Rouge-1 r score: 0.633570, Rouge-1 p score: 0.386207, Rouge-1 f-score: 0.473641
    Rouge-2 r score: 0.225127, Rouge-2 p score: 0.137367, Rouge-2 f-score: 0.168398
    Rouge-L r score: 0.258078, Rouge-L p score: 0.161508, Rouge-L f-score: 0.196181
    Rouge-Lsum r score: 0.258078, Rouge-Lsum p score: 0.161508,             Rouge-Lsum f-score: 0.196181
    Validation Result at Step 0
    Rouge-1 r score: 0.542197, Rouge-1 p score: 0.253998, Rouge-1 f-score: 0.345720
    Rouge-2 r score: 0.089577, Rouge-2 p score: 0.041425, Rouge-2 f-score: 0.056617
    Rouge-L r score: 0.224821, Rouge-L p score: 0.104866, Rouge-L f-score: 0.142931
    Rouge-Lsum r score: 0.224821, Rouge-Lsum p score: 0.104866,             Rouge-Lsum f-score: 0.142931
    Validation Result at Step 0
    Rouge-1 r score: 0.488497, Rouge-1 p score: 0.294507, Rouge-1 f-score: 0.365395
    Rouge-2 r score: 0.149167, Rouge-2 p score: 0.091010, Rouge-2 f-score: 0.112431
    Rouge-L r score: 0.266496, Rouge-L p score: 0.155666, Rouge-L f-score: 0.195401
    Rouge-Lsum r score: 0.266496, Rouge-Lsum p score: 0.155666,             Rouge-Lsum f-score: 0.195401
    Validation Result at Step 0
    Rouge-1 r score: 0.418725, Rouge-1 p score: 0.389816, Rouge-1 f-score: 0.384908
    Rouge-2 r score: 0.100215, Rouge-2 p score: 0.105262, Rouge-2 f-score: 0.098602
    Rouge-L r score: 0.176946, Rouge-L p score: 0.153756, Rouge-L f-score: 0.156682
    Rouge-Lsum r score: 0.176946, Rouge-Lsum p score: 0.153756,             Rouge-Lsum f-score: 0.156682
    Validation Result at Step 0
    Rouge-1 r score: 0.424317, Rouge-1 p score: 0.271739, Rouge-1 f-score: 0.325188
    Rouge-2 r score: 0.133041, Rouge-2 p score: 0.074561, Rouge-2 f-score: 0.094382
    Rouge-L r score: 0.236625, Rouge-L p score: 0.151552, Rouge-L f-score: 0.181355
    Rouge-Lsum r score: 0.236625, Rouge-Lsum p score: 0.151552,             Rouge-Lsum f-score: 0.181355
    Validation Result at Step 0
    Rouge-1 r score: 0.511936, Rouge-1 p score: 0.385712, Rouge-1 f-score: 0.438370
    Rouge-2 r score: 0.161077, Rouge-2 p score: 0.119126, Rouge-2 f-score: 0.136489
    Rouge-L r score: 0.232773, Rouge-L p score: 0.176263, Rouge-L f-score: 0.199890
    Validation Result at Step 0
    Rouge-Lsum r score: 0.232773, Rouge-Lsum p score: 0.176263,             Rouge-Lsum f-score: 0.199890
    Rouge-1 r score: 0.306800, Rouge-1 p score: 0.350738, Rouge-1 f-score: 0.235585
    Rouge-2 r score: 0.063588, Rouge-2 p score: 0.072438, Rouge-2 f-score: 0.048596
    Rouge-L r score: 0.130816, Rouge-L p score: 0.224060, Rouge-L f-score: 0.112750
    Rouge-Lsum r score: 0.130816, Rouge-Lsum p score: 0.224060,             Rouge-Lsum f-score: 0.112750
    Validation Result at Step 0
    Rouge-1 r score: 0.442865, Rouge-1 p score: 0.526116, Rouge-1 f-score: 0.440968
    Rouge-2 r score: 0.133483, Rouge-2 p score: 0.160556, Rouge-2 f-score: 0.133482
    Rouge-L r score: 0.234281, Rouge-L p score: 0.297648, Rouge-L f-score: 0.240863
    Rouge-Lsum r score: 0.234281, Rouge-Lsum p score: 0.297648,             Rouge-Lsum f-score: 0.240863
    /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
      warnings.warn(*args, **kwargs)
    Epoch 0:   0%|                                                                                                                                       | 0/6325 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    Traceback (most recent call last):
      File "/home/ec2-user/research/primer/script/primer_main.py", line 788, in <module>
        train(args)
      File "/home/ec2-user/research/primer/script/primer_main.py", line 524, in train
        trainer.fit(model, train_dataloader, valid_dataloader)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
        self._run(model)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
        self.dispatch()
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
        self.accelerator.start_training(self)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
        self.training_type_plugin.start_training(trainer)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
        self._results = trainer.run_stage()
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
        return self.run_train()
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
        self.train_loop.run_training_epoch()
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
        batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 715, in run_training_batch
        split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
        result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
        training_step_output = self.trainer.accelerator.training_step(args)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
        return self.training_type_plugin.training_step(*args)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 319, in training_step
        return self.model(*args, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
        output = self._run_ddp_forward(*inputs, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
        return module_to_run(*inputs[0], **kwargs[0])
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
        output = self.module.training_step(*inputs, **kwargs)
      File "/home/ec2-user/research/primer/script/primer_main.py", line 162, in training_step
        loss = self.shared_step(input_ids, output_ids)
      File "/home/ec2-user/research/primer/script/primer_main.py", line 142, in shared_step
        lm_logits = self.forward(input_ids, output_ids)
      File "/home/ec2-user/research/primer/script/primer_main.py", line 111, in forward
        use_cache=False,
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 1113, in forward
        return_dict=return_dict,
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 956, in forward
        return_dict=return_dict,
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 367, in forward
        x, attn = encoder_layer(x, attention_mask, output_attentions=output_attentions)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 254, in forward
        query=x, key=x, key_padding_mask=encoder_padding_mask, output_attentions=output_attentions
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/longformer/longformer_encoder_decoder.py", line 71, in forward
        output_attentions=output_attentions,
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/longformer/longformer.py", line 114, in forward
        if max_num_extra_indices_per_batch <= 0:
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    terminate called after throwing an instance of 'c10::CUDAError'
      what():  CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2f4335e612 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #1: <unknown function> + 0x22c1e (0x7f2f435cdc1e in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x7f2f435d0c4d in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    frame #3: <unknown function> + 0x33a968 (0x7f2f365b4968 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f2f43343295 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #5: <unknown function> + 0x2147ad (0x7f2f3648e7ad in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #6: <unknown function> + 0x54b518 (0x7f2f367c5518 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #7: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f2f367c5819 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #8: <unknown function> + 0xfc359 (0x55e3a505c359 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #9: <unknown function> + 0xfc547 (0x55e3a505c547 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #10: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #11: <unknown function> + 0xfc50a (0x55e3a505c50a in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #12: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #13: <unknown function> + 0xfc523 (0x55e3a505c523 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #14: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #15: <unknown function> + 0xfc547 (0x55e3a505c547 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #16: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #17: <unknown function> + 0xfc516 (0x55e3a505c516 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #18: <unknown function> + 0x163815 (0x55e3a50c3815 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #19: _PyGC_CollectNoFail + 0x2a (0x55e3a516175a in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #20: PyImport_Cleanup + 0x328 (0x55e3a510ce08 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #21: Py_FinalizeEx + 0x64 (0x55e3a5181714 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #22: <unknown function> + 0x232e20 (0x55e3a5192e20 in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #23: _Py_UnixMain + 0x3c (0x55e3a519318c in /home/ec2-user/miniconda3/envs/primer/bin/python)
    frame #24: __libc_start_main + 0xea (0x7f2f5466b13a in /lib64/libc.so.6)
    frame #25: <unknown function> + 0x1d803a (0x55e3a513803a in /home/ec2-user/miniconda3/envs/primer/bin/python)
    

    Are there any insights on this error? And also could you provide your bash scripts for fine-tuning the model on multi-news? Thanks much!
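A note on reading the log above: the assertion `srcIndex < srcSelectDimSize` is raised by an index-select kernel when an index is at or beyond the size of the dimension being indexed. With embedding lookups this usually means a token id (or position id) is larger than the embedding table. One possible cause, stated here only as an assumption, is special tokens such as `<doc-sep>` being added to the tokenizer without the model's embeddings being resized. A minimal sketch of a CPU-side check; the helper name and the numbers are hypothetical:

```python
def find_out_of_range_ids(input_ids, num_embeddings):
    """Return (position, token_id) pairs that would index past an embedding table."""
    return [
        (pos, tok)
        for pos, tok in enumerate(input_ids)
        if tok < 0 or tok >= num_embeddings
    ]


# Hypothetical values for illustration only.
num_embeddings = 50265
input_ids = [0, 713, 50265, 2]

print(find_out_of_range_ids(input_ids, num_embeddings))  # [(2, 50265)]
```

Running a check like this on a batch before it reaches the GPU, or rerunning with `CUDA_LAUNCH_BLOCKING=1` as the log suggests, tends to give a more precise location than the asynchronous stack trace.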

    opened by zhangzx-uiuc 4
  • Results for PRIMERA-arxiv


    Hi,

    Thanks for sharing this nice work. After running your code, I only get 28.5 for RougeL-fmeasure on the arxiv dataset, whereas Table 3 of your paper reports a Rouge-L of 42.6; my Rouge-1 and Rouge-2 match yours. Moreover, I only get 46.6/19.1/27.5 for Rouge-1/2/L with led-large-16384-arxiv (i.e., the SOTA for arxiv), but your Table 3 reports 41.8 for Rouge-L. Could you please explain how you obtained such high Rouge-L values on the arxiv dataset?

    opened by oaimli 4
  • Inference with PRIMERA much slower than inference with PRIMERA-multinews


    Hi!

    I have noticed something strange. Inference with allenai/PRIMERA is much slower (as much as 5X!) than inference with allenai/PRIMERA-multinews. I have a notebook benchmarking this effect here. I checked that their model configs are identical.

    The problem is most likely related to the decoder/generation, because it only occurs when the max_length argument to model.generate() is large (say 1024) and not when it is small (say 64). Here are some benchmarks using random examples from MultiNews:

    With a batch size of 4 and max length of inputs 1024:

    Max length of outputs: 64

    • PRIMERA CUDA time total: 758.402ms
    • PRIMERA-multinews CUDA time total: 753.125ms
    • Slowdown: ~1X (no slowdown)

    Max length of outputs: 512

    • PRIMERA CUDA time total: 3.682s
    • PRIMERA-multinews CUDA time total: 1.572s
    • Slowdown: ~2X

    Max length of outputs: 1024

    • PRIMERA CUDA time total: 7.676s
    • PRIMERA-multinews CUDA time total: 1.542s
    • Slowdown: ~5X
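The slowdown factors can be recomputed from the timings listed above as the ratio of PRIMERA time to PRIMERA-multinews time. A small sketch of that arithmetic, using only the numbers reported in this issue (the variable and function names are illustrative):

```python
# (PRIMERA seconds, PRIMERA-multinews seconds), keyed by max output length.
timings = {
    64: (0.758402, 0.753125),
    512: (3.682, 1.572),
    1024: (7.676, 1.542),
}


def slowdown(primera_s, multinews_s):
    """Ratio of the two CUDA times; 1.0 means no difference."""
    return primera_s / multinews_s


for max_len, (primera_s, multinews_s) in timings.items():
    print(f"max_length={max_len}: {slowdown(primera_s, multinews_s):.2f}x")
# max_length=64: 1.01x
# max_length=512: 2.34x
# max_length=1024: 4.98x
```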

    Do you have any idea what might be causing this?


    EDIT: While I don't know why, this is partially explained by the fact that allenai/PRIMERA does not appear to use the global_attention_mask. Running the model with and without global_attention_mask leads to the same inference times, while with allenai/PRIMERA-multinews, providing global_attention_mask leads to a 30% reduction in inference time. However, allenai/PRIMERA-multinews without a global_attention_mask is still almost 3X faster than allenai/PRIMERA, so this cannot entirely explain the difference.

    opened by JohnGiorgi 2
  • Using the (pretrained) model on new data


    Hi,

    First of all, many thanks to the whole team for the amazing work. I'm trying to use the pretrained model (on MultiNews) to run inference on new data. At the moment I'm just trying it on the test set of MultiNews itself.

    I instantiate the model as suggested:

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    config = LongformerEncoderDecoderConfig.from_pretrained(model_path)
    model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(model_path, config=config).to(device)
    

    Then I prepare the input similarly to any other HF model (I set max_input_length=4096):

    inputs_dict = tokenizer(input_docs, padding="max_length", max_length=max_input_length, return_tensors="pt", truncation=True)
    input_ids = inputs_dict.input_ids.to(device)
    attention_mask = inputs_dict.attention_mask.to(device)
    

    At the end I use the following to generate the summary:

    predicted_ids = model.generate(input_ids, attention_mask=attention_mask)
    text = tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)
    

    However, the summaries are very short compared with what was expected (at least for MultiNews). Here is an example of the output:

    – Voters in 11 states will pick their governors tonight, and Republicans appear on track to increase their

    It even seems to be truncated. Is there something I'm doing wrong?
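One possible explanation, offered as an assumption rather than a confirmed diagnosis: when no length is passed, `generate()` in Hugging Face transformers uses the `max_length` stored in the model config, and the library default for that value is 20 tokens, which is roughly the length of the output shown above. A minimal sketch of passing explicit generation settings; the specific values are hypothetical and are not the authors' settings:

```python
# Hypothetical generation settings; the values are illustrative only.
generate_kwargs = {
    "max_length": 256,          # upper bound on generated tokens
    "min_length": 0,            # no lower bound
    "num_beams": 5,             # beam search width
    "no_repeat_ngram_size": 3,  # block repeated trigrams
}

# With the objects from the snippet above, the call would look like:
# predicted_ids = model.generate(
#     input_ids, attention_mask=attention_mask, **generate_kwargs
# )
print(sorted(generate_kwargs))
```

Checking the `max_length` value in the checkpoint's config would show whether this is the cause.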

    opened by MorenoLaQuatra 2
  • OSError: Unable to load weights from pytorch checkpoint file


    Firstly, thanks for the excellent work. I'm trying the PRIMER model and get the following error when loading the model:

    from transformers import AutoTokenizer
    from longformer import LongformerEncoderDecoderForConditionalGeneration
    from longformer import LongformerEncoderDecoderConfig
    
    tokenizer = AutoTokenizer.from_pretrained('/content/PRIMER/PRIMER/')
    config = LongformerEncoderDecoderConfig.from_pretrained('/content/PRIMER/PRIMER/')
    model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
                '/content/PRIMER/PRIMER/', config=config)
    
    /usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
        857             except Exception:
        858                 raise OSError(
    --> 859                     "Unable to load weights from pytorch checkpoint file. "
        860                     "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
        861                 )
    
    OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. 
    

    I was running on Google Colaboratory and skipped the Conda install part for simplicity. Is this why I got the above error?

    opened by thangld201 1
  • Trying to get results on multi_news dataset from the paper


    Hi, thank you for sharing. I ran into some trouble when using script/primer_main.py to test PRIMERA. Following the settings in run_bash/test_primer.sh, I found that the generated summaries are just truncated from the source text according to max_length_tgt and are not generated by beam search, no matter what beam_size I set. I have no idea how to fix this problem. Could you please tell me how to solve it? Thank you.

    opened by SabrinaZhuangxx 0
  • Pretraining-Mask sentences


    Hello, in the PRIMERA pre-training process, the model chooses 30% of the sentences by the pyramid method, and then 50% of those candidates (15% of the sentences) are masked while all 30% are kept as the target. May I know why the 15% masked sentences are not put into the target?

    for i_d in range(len(truncated_doc)):
        for i_s in range(len(truncated_doc[i_d])):
            if cur_idx in mask_indices:
                tgt.append(truncated_doc[i_d][i_s])
                # here is the line which chooses 50% of the candidates (30% of the sentences) for masking
                if cur_idx not in non_mask_indices:
                    truncated_doc[i_d][i_s] = ''  # tokenizer.mask_token
            cur_idx += 1
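To make the behaviour of the quoted loop easier to inspect, here is a self-contained sketch of the same logic on toy data. The names `mask_indices` and `non_mask_indices` come from the snippet; the wrapper function, the toy sentences, and the `<mask>` placeholder string are assumptions made for illustration:

```python
def mask_sentences(docs, mask_indices, non_mask_indices, mask_token="<mask>"):
    """Return (masked_docs, tgt) following the loop quoted above.

    docs: list of documents, each a list of sentence strings.
    mask_indices: global sentence indices selected as targets.
    non_mask_indices: subset of mask_indices left unmasked in the input.
    """
    masked = [list(doc) for doc in docs]  # copy so the input is not mutated
    tgt = []
    cur_idx = 0
    for i_d in range(len(masked)):
        for i_s in range(len(masked[i_d])):
            if cur_idx in mask_indices:
                # Every selected sentence goes into the target...
                tgt.append(docs[i_d][i_s])
                # ...but only those outside non_mask_indices are masked.
                if cur_idx not in non_mask_indices:
                    masked[i_d][i_s] = mask_token
            cur_idx += 1
    return masked, tgt


docs = [["a0", "a1", "a2"], ["b0", "b1"]]
masked, tgt = mask_sentences(docs, mask_indices={1, 3}, non_mask_indices={3})
print(masked)  # [['a0', '<mask>', 'a2'], ['b0', 'b1']]
print(tgt)     # ['a1', 'b0']
```

In this toy run both selected sentences appear in the target, while only one of them is replaced in the input.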

    opened by Ronica1234 0
  • Training PRIMER from Scratch


    Hello, thanks for your hard work.

    I'm trying to train PRIMER from Scratch with a customized dataset using the zero-shot method. Is this feature available now? Any tutorial I can follow?

    Much appreciated.

    opened by ZicongWen 0
  • Questions about inferencing


    Hi, thank you for sharing. I am having trouble using PRIMERA to generate a summary. Could you please help me use the pretrained PRIMERA model to generate the summary correctly? The code is as follows:

    import torch
    from transformers import AutoTokenizer
    from longformer import LongformerEncoderDecoderForConditionalGeneration
    from longformer import LongformerEncoderDecoderConfig
    import time
    tokenizer = AutoTokenizer.from_pretrained('/data/users/wangyiting/primer/PRIMER-main/models/PRIMER_multinews')
    config = LongformerEncoderDecoderConfig.from_pretrained('/data/users/wangyiting/primer/PRIMER-main/models/PRIMER_multinews')
    model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
                './PRIMERA_model/', config=config)
    
    
    # import torch
    # from longformer.longformer import Longformer, LongformerConfig
    from longformer.sliding_chunks import pad_to_window_size
    # from transformers import RobertaTokenizer
    
    # SAMPLE_TEXT
    start_time = time.time()
    SAMPLE_TEXT = """An 11-year-old boy who survived being sucked into a flooded stormwater drain has been reunited with his rescuers in Melbourne and gifted a new bike a week after the tumultuous ordeal. Jake Gilbert was cycling with a friend in Altona Meadows last week when he rode across a submerged drain and was sucked 10 metres underneath a road. Stormwater drain ‘I love you all!’: boy sucked into stormwater drain in Melbourne praises rescuers after amazing escape. Gilbert managed to grab on to the underside of a metal grate on the other side and keep his head above water before passerby Damon Trewhella and off-duty SES member Justin Costello came to his aid. Kyle, who was also washed off his bike at the same time, had managed to avoid being sucked into the flooded stormwater drain. The SES member removed the bolts from the drain’s grate before the police officer prised the grate open – with Gilbert still desperately clinging to the underside by his fingernails. His head was just above the water before he was pulled to safety. he's getting her energy back and she's back to being a 'two-step launcher' when she goes to walk – takes two steps and launches off and takes your shoulders off – but prior to that, she'd lost all energy and she couldn't hold her own back legs up."""
    input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0)  # batch of size 1
    
    # # TVM code doesn't work on CPU. Uncomment this if `config.attention_mode = 'tvm'`
    # model = model.cuda(); input_ids = input_ids.cuda()
    
    # Attention mask values -- 0: no attention, 1: local attention, 2: global attention
    attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # initialize to local attention
    attention_mask[:, [1, 4, 21,]] =  2  # Set global attention based on the task. For example,
                                         # classification: the <s> token
                                         # QA: question tokens
    
    # # padding seqlen to the nearest multiple of 512. Needed for the 'sliding_chunks' attention
    input_ids, attention_mask = pad_to_window_size(
            input_ids, attention_mask, config.attention_window[0], tokenizer.pad_token_id)
    
    max_output_len = 100
    generated_ids = model.generate(input_ids=input_ids, attention_mask=attention_mask,
                                                use_cache=True, max_length=max_output_len,
                                                num_beams=1)
    generated_str = tokenizer.batch_decode(generated_ids.tolist(), skip_special_tokens=True)
    end_time = time.time()
    print("spending: ", end_time-start_time)
    print(generated_str[0])
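The padding step in the snippet above extends the sequence length to a multiple of a window size. A minimal sketch of that arithmetic, assuming the multiple of 512 mentioned in the code comment; the helper name `padding_len` is hypothetical and is not part of the longformer package:

```python
def padding_len(seq_len, multiple):
    """Number of pad tokens needed to reach the next multiple of `multiple`."""
    return (multiple - seq_len % multiple) % multiple


# Illustrative lengths only.
for seq_len in (300, 512, 513):
    pad = padding_len(seq_len, 512)
    print(seq_len, "->", seq_len + pad)
# 300 -> 512
# 512 -> 512
# 513 -> 1024
```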
    
    
    
    
    
    opened by FightingEveryDay0 4
  • Issue in using given code for pretraining the model


    Hi, I am trying to reproduce the pre-training experiment with the codebase here and ran into the following issue.

    Setup / Steps undertaken:

    • Use (LED-large) as the base model to begin with.
    • I used the preprocessed data given in the README file here.
    • I modified the primer_hf_main.py a bit to add the pretraining function from primer_main.py
    • I used the Pretrain Dataset class and dataloader functions defined here in the dataloader.py file to load the preprocessed dataset, essentially replicating the exact pretrain function from the non HuggingFace primer file as is in the primer_hf_main.py file.
    • I provided other relevant args for the pretrain mode to my modified file and passed them to the pl.Trainer as shown in the other file with the same values

    Observations:

    • The PretrainDataset yields 2 values per sample in the batch, even in the validation phase here.
    • In contrast, the SummarizationIterDataset yields 3 values in validation mode here.
    • Upon debugging, I realised that in pretraining mode the collate_fn receives 2 values per sample from the PretrainDataset and always falls through to this line, raising an error in the validation sanity check of the PyTorch Lightning trainer.
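    The mismatch described in these observations can be sketched as a collate function that accepts both sample shapes. This is a hypothetical illustration, not the repository's collate_fn: `flexible_collate` and `pad_batch` are invented names, plain lists stand in for tensors, and the default pad ID of 1 is an assumption.

```python
from typing import List, Optional, Sequence, Tuple


def pad_batch(seqs: List[List[int]], pad_id: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def flexible_collate(
    batch: Sequence[Sequence], pad_id: int = 1
) -> Tuple[List[List[int]], List[List[int]], Optional[List[str]]]:
    """Collate (src, tgt) or (src, tgt, tgt_text) samples into one batch."""
    width = len(batch[0])
    if width not in (2, 3) or any(len(sample) != width for sample in batch):
        raise ValueError("every sample must have the same 2 or 3 fields")
    src = pad_batch([list(sample[0]) for sample in batch], pad_id)
    tgt = pad_batch([list(sample[1]) for sample in batch], pad_id)
    # The reference text is only present for 3-field samples.
    texts = [sample[2] for sample in batch] if width == 3 else None
    return src, tgt, texts


# 2-field samples (pretraining style) and 3-field samples (validation style).
print(flexible_collate([([0, 5, 2], [0, 9, 9, 2]), ([0, 2], [0, 2])]))
print(flexible_collate([([0, 5, 2], [0, 2], "a summary")]))
```

    Returning None for the missing text lets the validation step decide whether to skip text-based metrics instead of failing on an unexpected tuple length.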

    Firstly, I wanted to ask the authors (@Wendy-Xiao) whether this is the expected behaviour, or whether I am making a wrong assumption here.

    My solution: To handle this, I added the tokenizer to the dataset and yielded the decoded target string as a third value, tgt (replicating the output of the SummarizationIterDataset iterator used in train mode):

    tgt = self.tokenizer.decode(data['tgt'], skip_special_tokens=True)
    yield torch.tensor(data["src"]), torch.tensor(data["tgt"]), tgt
    
    

    But this yields errors like these:

    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [574,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    ../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [574,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    Traceback (most recent call last):
      File "/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/PRIMER_train/script/primer_hf_main.py", line 982, in <module>
        pretrain(args)
      File "/PRIMER_train/script/primer_hf_main.py", line 456, in pretrain
        trainer.fit(model, train_dataloader, valid_dataloader)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
        self._run(model)
      File "/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
        self.dispatch()
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
        self.accelerator.start_training(self)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
        self.training_type_plugin.start_training(trainer)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
        self._results = trainer.run_stage()
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
        return self.run_train()
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 844, in run_train
        self.run_sanity_check(self.lightning_module)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in run_sanity_check
        self.run_evaluation()
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 967, in run_evaluation
        output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
        output = self.trainer.accelerator.validation_step(args)
      File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
        return self.training_type_plugin.validation_step(*args)
      File "/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
        return self.lightning_module.validation_step(*args, **kwargs)
      File "/PRIMER_train/script/primer_hf_main.py", line 291, in validation_step
        loss = self.shared_step(input_ids, output_ids)
      File "PRIMER_train/script/primer_hf_main.py", line 123, in shared_step
        lm_logits = self.forward(input_ids, output_ids)
      File "PRIMER_train/script/primer_hf_main.py", line 89, in forward
        outputs = self.model(
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 2338, in forward
        outputs = self.led(
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 2189, in forward
        encoder_outputs = self.encoder(
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Miniconda/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 1733, in forward
        inputs_embeds = self.embed_tokens(input_ids)
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/Miniconda/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: device-side assert triggered
    

    I think these are index-out-of-range errors in CUDA for the embedding lookup. I checked that the new token was included in the tokenizer correctly. (PS: ignore the file paths, as they have been truncated here; the code itself is configured correctly, as I have been able to fine-tune the base model on newer datasets.) Can anyone point me to how to address this issue? Or how did you manage to pretrain the model from scratch using the authors' code given here?
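    An assertion of the form `srcIndex < srcSelectDimSize` inside an embedding lookup usually means that some input ID is not smaller than the number of rows in the embedding matrix. One way to confirm this before the batch reaches the GPU is to scan the IDs on the CPU. The sketch below is a hypothetical helper, not part of the PRIMER code; it takes nested lists (for tensors, `input_ids.tolist()`), and the vocabulary size in the example is an illustrative value.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every ID the embedding table cannot index."""
    bad = []
    for row, seq in enumerate(batch):
        for col, token_id in enumerate(seq):
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, col, token_id))
    return bad


# Illustrative case: an added special token received ID 50265,
# but the embedding matrix still has only 50265 rows (valid IDs 0..50264).
batch = [[0, 5, 50265, 2], [0, 7, 2, 1]]
print(find_out_of_range_ids(batch, num_embeddings=50265))
```

    If such IDs turn up after a token was added to the tokenizer, one possible cause is that the model's embedding matrix was not resized to match; in Hugging Face Transformers this is typically done with `model.resize_token_embeddings(len(tokenizer))`. Whether that applies here depends on how the modified script builds the model.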

    Thanks a ton! :)

    opened by jaineshdoshi 7
Owner
AI2