REBEL: Relation Extraction By End-to-end Language generation

Babelscape

Last update: Jan 6, 2023

Overview

This is the repository for the Findings of EMNLP 2021 paper REBEL: Relation Extraction By End-to-end Language generation. We present a new linearization approach and a reframing of Relation Extraction as a seq2seq task. The paper can be found here. If you use the code, please reference this work in your paper:

@inproceedings{huguet-cabot-navigli-2021-rebel,
title = "REBEL: Relation Extraction By End-to-end Language generation",
author = "Huguet Cabot, Pere-Llu{\'\i}s  and
  Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Online and in the Barceló Bávaro Convention Centre, Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://github.com/Babelscape/rebel/blob/main/docs/EMNLP_2021_REBEL__Camera_Ready_.pdf",
}

Repo structure
| conf  # contains Hydra config files
  | data
  | model
  | train
  root.yaml  # hydra root config file
| data  # data
| datasets  # datasets scripts
| model # model files should be stored here
| src
  | pl_data_modules.py  # LightningDataModule
  | pl_modules.py  # LightningModule
  | train.py  # main script for training the network
  | test.py  # main script for testing the network
| README.md
| requirements.txt
| demo.py # Streamlit demo to try out the model
| setup.sh # environment setup script

Initialize environment

We use conda to set up the Python interpreter: the script setup.sh creates a conda environment and installs PyTorch and the dependencies listed in "requirements.txt".

REBEL Model and Dataset

Model and Dataset files can be downloaded here:

https://osf.io/4x3r9/?view_only=87e7af84c0564bd1b3eadff23e4b7e54

Or you can directly use the model from the Hugging Face repo:

https://huggingface.co/Babelscape/rebel-large

from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

# We need to use the tokenizer manually since we need special tokens.
# Note: depending on the transformers version, "generated_token_ids" may already hold the ids;
# in that case drop the ["output_ids"] lookup and wrap the result in a list.
extracted_text = triplet_extractor.tokenizer.batch_decode(triplet_extractor("Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic", return_tensors=True, return_text=False)[0]["generated_token_ids"]["output_ids"])

print(extracted_text[0])

# Function to parse the generated text and extract the triplets
def extract_triplets(text):
    triplets = []
    subject, relation, object_ = '', '', ''
    text = text.strip()
    # 'x' means no marker has been seen yet; 't', 's' and 'o' track the last marker read
    current = 'x'
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            # A new head entity starts: flush the pending triplet, if any
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "<subj>":
            # A new tail entity starts for the same head: flush the pending triplet, if any
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
            object_ = ''
        elif token == "<obj>":
            # The relation label follows
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    # Flush the last triplet
    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
    return triplets

extracted_triplets = extract_triplets(extracted_text[0])
print(extracted_triplets)
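The parsing function above reads the model's linearized output, in which `<triplet>` opens a head entity, `<subj>` opens a tail entity and `<obj>` opens the relation between them. As a rough illustration of the opposite direction, the sketch below turns a list of triplets back into that format. The name `linearize_triplets` is hypothetical and not part of this repository, and grouping tails under their head in input order is a simplifying assumption; it makes no attempt to order entities by their position in the source text.

```python
def linearize_triplets(triplets):
    """Turn a list of {'head', 'type', 'tail'} dicts into a REBEL-style target string."""
    # Group (tail, relation) pairs under their head entity, keeping input order.
    grouped = {}
    for triplet in triplets:
        grouped.setdefault(triplet["head"], []).append((triplet["tail"], triplet["type"]))

    parts = []
    for head, pairs in grouped.items():
        # Each head is written once, followed by all of its (tail, relation) pairs.
        parts.append(f"<triplet> {head}")
        for tail, relation in pairs:
            parts.append(f"<subj> {tail} <obj> {relation}")
    return " ".join(parts)


example = [
    {"head": "Punta Cana", "type": "country", "tail": "Dominican Republic"},
    {"head": "Punta Cana", "type": "located in the administrative territorial entity", "tail": "La Altagracia Province"},
]
linearized = linearize_triplets(example)
print(linearized)
```

Feeding the resulting string to a parser that follows the same marker convention should return the original two triplets.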
    
   
  
 

CROCODILE: automatiC RelatiOn extraCtiOn Dataset wIth nLi filtEring.

The REBEL dataset can be recreated using our RE dataset creator, CROCODILE.

Training and testing

There are conf files to train and test each model. For instance, to train on CONLL04, run the following from within the src folder:

train.py model=rebel_model data=conll04_data train=conll04_train

Once the model is trained, the checkpoint can be evaluated by running:

test.py model=rebel_model data=conll04_data train=conll04_train do_predict=True checkpoint_path="path_to_checkpoint"

src/model_saving.py can be used to convert a PyTorch Lightning checkpoint into the Hugging Face transformers format for both model and tokenizer.
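As a rough sketch of what such a conversion involves (this is not the content of src/model_saving.py): a PyTorch Lightning checkpoint keeps the module weights under its `state_dict` key, and weights of a submodule stored as `self.model` carry a `model.` prefix. The helper below strips that prefix so the weights can be loaded into a plain transformers model. The function names and the `model.` prefix are assumptions for illustration, and `model` and `tokenizer` are expected to be built by the caller with the same vocabulary used in training.

```python
def strip_prefix(state_dict, prefix="model."):
    """Keep only the keys that start with `prefix` and remove the prefix from them."""
    return {key[len(prefix):]: value for key, value in state_dict.items() if key.startswith(prefix)}


def convert_checkpoint(checkpoint_path, model, tokenizer, output_dir):
    """Hypothetical conversion: Lightning checkpoint -> Hugging Face model directory."""
    # Imported lazily so that strip_prefix can be used without PyTorch installed.
    import torch

    # Depending on the PyTorch version, checkpoints that pickle config objects
    # may need weights_only=False here.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(strip_prefix(checkpoint["state_dict"]))
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


# Toy state dict standing in for checkpoint["state_dict"].
toy = {"model.encoder.weight": 1, "model.lm_head.weight": 2, "loss_fn.weight": 3}
print(strip_prefix(toy))
```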

DEMO

We suggest running the demo to test REBEL. Once the model files are unzipped in the model folder, run:

streamlit run demo.py

A demo will then be available in the browser. It accepts free input as well as data from the sample file in data/rebel/.

Datasets

TACRED is not freely available, but instructions on how to create Re-TACRED from it can be found here.

For CONLL04 and ADE, one can use the script from the SpERT GitHub.

For NYT, the dataset can be downloaded from the Copy_RE GitHub.

Finally, the DocRED dataset for RE can be downloaded from the JEREX GitHub.

Comments

problem with model_saving.py

Hi, I used your train.py script to train rebel on the docred dataset. When I try to save my model using model_saving.py to use it in transformers I get the following error:

Traceback (most recent call last):
  File "model_saving.py", line 27, in <module>
    model = pl_module.load_from_checkpoint(checkpoint_path = 'outputs/2022-09-02/07-42-36/experiments/docred/last.ckpt', config = config, tokenizer = tokenizer, model = model)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/_collections_abc.py", line 832, in update
    self[key] = other[key]
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: 'str' object has no attribute '__dict__'
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict

Can I configure it manually? The conf file in conf = omegaconf.OmegaConf.load gets read correctly and I could transfer the values manually.

opened by l0renor 9

No improvement with pre-trained REBEL model on CONLL04

Thanks for your work. I tried to train on CONLL04, and it works as expected with BART, but with the REBEL-Large model, all the scores stay at 0. I know there is an issue with transformers 4.4.0, and my experiments were conducted with pytorch 1.7.1 and transformers 4.12.4 (no change to datasets or any other package). Have I been doing something wrong? If there is no other solution, is it possible to get an updated version of the model? If not, could I ask for the finetuned CONLL04 model with REBEL pretraining?
bug

opened by ramarasty 9

Other languages support

Hey, outstanding paper you got. I mentioned it in my Thesis :)

I was trying to train it on the Russian language. I generated the dataset with CROCODILE, edited the config files, downloaded facebook/mbart-50-large-50 and added it to the conf files as well. Additionally, I added src_lang and tgt_lang parameters equal to ru_RU to AutoTokenizer in train.py.

I struggled with numerous errors in rebel_short.py, pl_modules.py and pl_data_modules.py that happened on my setup. I also had to update pytorch-lightning several versions up, to 1.3.0, because the version in requirements.txt had a problem at startup. Maybe it relates to https://github.com/Babelscape/rebel/issues/22

So, for now I have several questions for you as creators:

1. Should there be any problem with replacing bart with mbart-50?

2. What would you recommend checking if the printed log with all my relations and their TP, FP, FN, precision, recall and F1 contains only zeros and the loss is nan?

3. In generate_samples.py there is a line

pl_module.logger.experiment.log({"Triplets": wandb_table})

When everything is set up and the training has started, at iteration 999 an exception is thrown there which says SummaryWriter() has no attribute log. I've checked the PL docs and they do not really tell me where I should look.

I put it in a try/except block, but I worry that this may lead to the 4th question.

4. At 50% of training it stops with a RuntimeError 'could not infer dtype of Table' )))

I am not sure if stacktraces will help here, they are not really informative.

Would love to get any answer here. Thanks again for the paper.
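On the `SummaryWriter() has no attribute log` error above, one possible explanation is that `logger.experiment` is a TensorBoard `SummaryWriter` (what the default PyTorch Lightning logger exposes), which has no `log` method, whereas a Weights & Biases run object does. A minimal sketch of a guard follows. `log_table_if_supported` is a hypothetical helper, not part of the repository, and the two fake classes only stand in for the two kinds of experiment object.

```python
def log_table_if_supported(experiment, payload):
    """Call experiment.log(payload) only when the experiment object exposes log()."""
    log_fn = getattr(experiment, "log", None)
    if callable(log_fn):
        log_fn(payload)
        return True
    return False


class FakeWandbRun:
    """Stand-in for a W&B run: it has a log() method."""

    def __init__(self):
        self.logged = []

    def log(self, payload):
        self.logged.append(payload)


class FakeSummaryWriter:
    """Stand-in for TensorBoard's SummaryWriter: it has no log() method."""


run = FakeWandbRun()
wandb_result = log_table_if_supported(run, {"Triplets": "table"})
tensorboard_result = log_table_if_supported(FakeSummaryWriter(), {"Triplets": "table"})
print(wandb_result, tensorboard_result)
```

A guard like this only skips the call. Unlike a bare try/except it does not hide other errors raised inside `log`.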
opened by InfroLab 8

Unexpectedly bad performance on bart-base

Hi. I wanted to see what the results would be like with bart-base. I trained on CONLL04 without changing any other parameter, but the performance is not nearly as good.

Here are my results:

processed 288 sentences with 421 relations; found: 444 relations; correct: 46.
	ALL	 TP: 46;	FP: 30;	FN: 360
		(m avg): precision: 60.53;	recall: 11.33;	f1: 19.09 (micro)
		(M avg): precision: 58.83;	recall: 10.79;	f1: 17.58 (Macro)

	killed by: 	TP: 3;	FP: 1;	FN: 44;	precision: 75.00;	recall: 6.38;	f1: 11.76;	4
	residence: 	TP: 3;	FP: 5;	FN: 95;	precision: 37.50;	recall: 3.06;	f1: 5.66;	8
	location: 	TP: 18;	FP: 6;	FN: 71;	precision: 75.00;	recall: 20.22;	f1: 31.86;	24
	headquarters location: 	TP: 17;	FP: 13;	FN: 79;	precision: 56.67;	recall: 17.71;	f1: 26.98;	30
	employer: 	TP: 5;	FP: 5;	FN: 71;	precision: 50.00;	recall: 6.58;	f1: 11.63;	10

I tested on bart-large, and it works as expected:

processed 288 sentences with 421 relations; found: 362 relations; correct: 273.
	ALL	 TP: 273;	FP: 87;	FN: 133
		(m avg): precision: 75.83;	recall: 67.24;	f1: 71.28 (micro)
		(M avg): precision: 77.78;	recall: 69.60;	f1: 73.16 (Macro)

	killed by: 	TP: 43;	FP: 5;	FN: 4;	precision: 89.58;	recall: 91.49;	f1: 90.53;	48
	residence: 	TP: 66;	FP: 37;	FN: 32;	precision: 64.08;	recall: 67.35;	f1: 65.67;	103
	location: 	TP: 53;	FP: 17;	FN: 36;	precision: 75.71;	recall: 59.55;	f1: 66.67;	70
	headquarters location: 	TP: 60;	FP: 14;	FN: 36;	precision: 81.08;	recall: 62.50;	f1: 70.59;	74
	employer: 	TP: 51;	FP: 14;	FN: 25;	precision: 78.46;	recall: 67.11;	f1: 72.34;	65

Increasing the number of steps did not help. Are these results expected? Do you happen to know how much impact the model size has on performance?
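For reference when reading the tables above, the reported scores follow from the TP/FP/FN counts: micro averages pool the counts over all relation types, while macro averages take the mean of the per-relation scores. The sketch below recomputes the bart-base numbers from the first table. The function and variable names are illustrative, and this is not the repository's score.py.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts, with 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# (TP, FP, FN) per relation, copied from the bart-base table.
counts = {
    "killed by": (3, 1, 44),
    "residence": (3, 5, 95),
    "location": (18, 6, 71),
    "headquarters location": (17, 13, 79),
    "employer": (5, 5, 71),
}

# Micro: sum the counts first, then compute the scores once.
micro = prf(*(sum(c[i] for c in counts.values()) for i in range(3)))

# Macro: compute the scores per relation, then average them.
per_relation = [prf(*c) for c in counts.values()]
macro = tuple(sum(s[i] for s in per_relation) / len(per_relation) for i in range(3))

print("micro:", [round(100 * x, 2) for x in micro])  # [60.53, 11.33, 19.09]
print("macro:", [round(100 * x, 2) for x in macro])  # [58.83, 10.79, 17.58]
```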

opened by ramarasty 7

About the statistics of the dataset

Hi, nice work. I am trying to replicate the pretraining process, but using "rebel/datasets/rebel-short.py /" I cannot get the same train/val/test numbers as written in the paper.

Please enlighten me with some details!

opened by David-Lee-1990 7

AssertionError: Non-consecutive added token '' found. Should have index 50272 but has index 50265 in saved vocabulary.

Traceback:
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/script_runner.py", line 338, in _run_script
    exec(code, module.__dict__)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 57, in <module>
    tokenizer, model, dataset = load_models()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/streamlit/caching.py", line 555, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/rahulpal/Documents/rebel-main/demo.py", line 18, in load_models
    tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 416, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1705, in from_pretrained
    resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
  File "/home/rahulpal/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1811, in _from_pretrained
    f"Non-consecutive added token '{token}' found. "

bug

opened by rahul765 7

when train docred dataset issue

Hi, I get the following error when trying to train on the docred dataset. It seems to me that KeyError: 'labels' is the main problem; how do I fix it? The detailed log is below. I am in a situation where I must train using the docred dataset. Please help me.

/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 4 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Epoch 0: 32%|███████████████████████████████████████▏ | 499/1579 [02:10<04:42, 3.82it/s, loss=2.85, v_num=y2u5]
Saving latest checkpoint...
Traceback (most recent call last):
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
    self.train_loop.run_training_epoch()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 556, in run_training_epoch
    self.on_train_batch_end(epoch_output, batch_end_outputs, batch, batch_idx, dataloader_idx)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 226, in on_train_batch_end
    self.trainer.call_hook('on_train_batch_end', batch_end_outputs, batch, batch_idx, dataloader_idx)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 925, in call_hook
    trainer_hook(*args, **kwargs)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 147, in on_train_batch_end
    callback.on_train_batch_end(self, self.get_model(), outputs, batch, batch_idx, dataloader_idx)
  File "/home/kdk/rebel/src/generate_samples.py", line 39, in on_train_batch_end
    labels = batch.pop("labels")
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/_collections_abc.py", line 795, in pop
    value = self[key]
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 230, in __getitem__
    return self.data[item]
KeyError: 'labels'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 151, in <module>
    main()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/main.py", line 37, in decorated_main
    strict=strict,
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/_internal/utils.py", line 347, in _run_hydra
    lambda: hydra.run(
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/_internal/utils.py", line 350, in <lambda>
    overrides=args.overrides,
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 112, in run
    configure_logging=with_log_configuration,
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "train.py", line 147, in main
    train(conf)
  File "train.py", line 143, in train
    trainer.fit(pl_module, datamodule=pl_data_module)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 592, in train
    self.train_loop.on_train_end()
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 156, in on_train_end
    self.check_checkpoint_callback(should_save=True, is_last=True)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 190, in check_checkpoint_callback
    callback.on_validation_end(self.trainer, model)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 204, in on_validation_end
    self.save_checkpoint(trainer, pl_module)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 239, in save_checkpoint
    self._validate_monitor_key(trainer)
  File "/home/kdk/anaconda3/envs/rebel/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 517, in _validate_monitor_key
    raise MisconfigurationException(m)
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val_F1_micro') not found in the returned metrics: ['loss']. HINT: Did you call self.log('val_F1_micro', tensor) in the LightningModule?

opened by KIMDOKYOUNG 5

How can I use the docred dataset with strict evaluation

When I try to train the model on the docred dataset by running python train.py model=rebel_model data=docred_data train=docred_train, the model does not run correctly and returns:

Traceback (most recent call last):
  File "/home/weimin/rebel/src/train.py", line 150, in main
    train(conf)
  File "/home/weimin/rebel/src/train.py", line 146, in train
    trainer.fit(pl_module, datamodule=pl_data_module)
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in train
    self.run_sanity_check(self.get_model())
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 730, in run_sanity_check
    _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 662, in run_evaluation
    deprecated_eval_results = self.evaluation_loop.evaluation_epoch_end()
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 210, in evaluation_epoch_end
    deprecated_results = self.__run_eval_epoch_end(self.num_dataloaders, using_eval_result)
  File "/home/weimin/anaconda3/envs/rebel/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 248, in __run_eval_epoch_end
    eval_results = model.validation_epoch_end(eval_results)
  File "/home/weimin/rebel/src/pl_modules.py", line 392, in validation_epoch_end
    scores, precision, recall, f1 = re_score([item for pred in output for item in pred['predictions']], [item for pred in output for item in pred['labels']], list(relations_docred.values()), "strict")
  File "/home/weimin/rebel/src/score.py", line 174, in re_score
    pred_rels = {(rel["head"], rel["head_type"], rel["tail"], rel["tail_type"]) for rel in pred_sent if
  File "/home/weimin/rebel/src/score.py", line 174, in <setcomp>
    pred_rels = {(rel["head"], rel["head_type"], rel["tail"], rel["tail_type"]) for rel in pred_sent if
KeyError: 'head_type'

I checked the code of the re_score function and found that the output of this model only contains "head", "tail" and "type". What is wrong with it?
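The traceback shows that strict scoring builds tuples from `head`, `head_type`, `tail` and `tail_type`, so predictions that carry only `head`, `type` and `tail` raise the KeyError. A small diagnostic sketch follows, assuming predictions are lists of dicts as in the traceback. `missing_strict_keys` is a hypothetical helper, not part of score.py, and the entity type values are made up for the example.

```python
STRICT_KEYS = ("head", "head_type", "tail", "tail_type")


def missing_strict_keys(relations):
    """Return the sorted strict-mode keys that are absent from at least one relation dict."""
    missing = set()
    for rel in relations:
        missing.update(key for key in STRICT_KEYS if key not in rel)
    return sorted(missing)


untyped = [{"head": "Punta Cana", "type": "country", "tail": "Dominican Republic"}]
typed = [{"head": "Punta Cana", "head_type": "loc", "type": "country",
          "tail": "Dominican Republic", "tail_type": "loc"}]

print(missing_strict_keys(untyped))  # ['head_type', 'tail_type']
print(missing_strict_keys(typed))    # []
```

Running such a check on the decoded predictions before scoring shows whether entity types are being produced at all for the chosen dataset configuration.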
bug

opened by xwm-123 5

Datasets Related Problems

Excuse me, I ran your model with the NYT dataset. However, it failed. I am not sure whether my dataset is right or wrong. The error printed seems to show there are some problems with the data type. Please see the pictures below.

Picture 1: I run train.py, and the problems occur in the load_datasets function. Picture 2: the NYT dataset (train.data), which I downloaded from the Copy_RE GitHub. Picture 3: I wrote a test function to run the load_datasets function. It still does not work.

So I don't know whether my dataset is right or wrong. The following code also cannot run, because there is no spo_list or spo_details in the dataset: list_relations = zip(row['spo_list'], row['spo_details'])

Thank you very much!
documentation

opened by WangYao-GoGoGo 5

KeyError: 'labels' in generate_samples.py; doc_red

Hi, I am trying to train the model on the doc red dataset in order to test the effects of labeling the entities with an additional special token.

At the moment I am still trying to get the code to run with the original dataset.

In the first epoch, after 56%, I get the KeyError: 'labels' in line 48, in on_train_batch_end: labels = batch.pop("labels")

I checked the dataset for empty labels and found 27 empty arrays in the doc red data. Deleting data points didn't solve the problem. I also tested only using the first 50% of the dataset. The error still occurred at 56%.

full console output with print(batch) before the error:

(azureml_py38_PT_TF) azureuser@rebelgpu:/mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/src$ python train.py 
Extension horovod.torch has not been built: /anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-38-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still available.
Global seed set to 42
Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.
[2022-08-29 11:47:49,710][datasets.builder][WARNING] - Using custom data configuration default-3b456a334ae5426f
[2022-08-29 11:47:49,710][datasets.builder][WARNING] - Reusing dataset doc_red (/home/azureuser/.cache/huggingface/datasets/doc_red/default-3b456a334ae5426f/0.0.0/2cc6999b276b6aa2b2af5101b416c33155e5f19e6f0b26864a2312d1aa57b175)
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using native 16bit precision.
[2022-08-29 11:47:50,688][datasets.arrow_dataset][WARNING] - Loading cached processed dataset at /mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/data/doc_red/train_annotated.jsondocred_typed.cache
[2022-08-29 11:47:51,828][datasets.arrow_dataset][WARNING] - Loading cached processed dataset at /mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/data/doc_red/dev.jsondocred_typed.cache
wandb: Currently logged in as: llukas (use `wandb login --relogin` to force relogin)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.13.2 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.26
wandb: Syncing run bart-large
wandb: ⭐️ View project at https://wandb.ai/llukas/docred_typed
wandb: 🚀 View run at https://wandb.ai/llukas/docred_typed/runs/3b8sf4s1
wandb: Run data is saved locally in /mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/src/outputs/2022-08-29/11-47-34/wandb/run-20220829_114753-3b8sf4s1
wandb: Run `wandb offline` to turn off syncing.


  | Name    | Type                         | Params
---------------------------------------------------------
0 | model   | BartForConditionalGeneration | 406 M 
1 | loss_fn | CrossEntropyLoss             | 0     
---------------------------------------------------------
406 M     Trainable params
0         Non-trainable params
406 M     Total params
/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Validation sanity check:   0%|                                                                                                                                                                                                              | 0/2 [00:00<?, ?it/s]/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/transformers/generation_utils.py:1777: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  next_indices = next_tokens // vocab_size
Validation sanity check: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.61s/it]RE Evaluation in *** STRICT *** mode
processed 16 sentences with 233 relations; found: 0 relations; correct: 0.
        ALL      TP: 0; FP: 0;  FN: 231
                (m avg): precision: 0.00;       recall: 0.00;   f1: 0.00 (micro)
                (M avg): precision: 0.00;       recall: 0.00;   f1: 0.00 (Macro)

        head of government:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        country:        TP: 0;  FP: 0;  FN: 64; precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        place of birth:         TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        place of death:         TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        father:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        mother:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        spouse:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        country of citizenship:         TP: 0;  FP: 0;  FN: 10; precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        continent:      TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        instance of:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        head of state:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        capital:        TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        official language:      TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        position held:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        child:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        author:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        member of sports team:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        director:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        screenwriter:   TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        educated at:    TP: 0;  FP: 0;  FN: 5;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        composer:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        member of political party:      TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        employer:       TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        founded by:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        league:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        publisher:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        owned by:       TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        located in the administrative territorial entity:       TP: 0;  FP: 0;  FN: 33; precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        genre:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        operator:       TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        religion:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        contains administrative territorial entity:     TP: 0;  FP: 0;  FN: 27; precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        follows:        TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        followed by:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        headquarters location:  TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        cast member:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        producer:       TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        award received:         TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        creator:        TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        parent taxon:   TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        ethnic group:   TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        performer:      TP: 0;  FP: 0;  FN: 6;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        manufacturer:   TP: 0;  FP: 0;  FN: 14; precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        developer:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        series:         TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        sister city:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        legislative body:       TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        basin country:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        located in or next to body of water:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:50: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
        military branch:        TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        record label:   TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        production company:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        location:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        subclass of:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        subsidiary:     TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        part of:        TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        original language of work:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        platform:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        mouth of the watercourse:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        original network:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        member of:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        chairperson:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        country of origin:      TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        has part:       TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        residence:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        date of birth:  TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        date of death:  TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        inception:      TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        dissolved, abolished or demolished:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        publication date:       TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        start time:     TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        end time:       TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        point in time:  TP: 0;  FP: 0;  FN: 1;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        conflict:       TP: 0;  FP: 0;  FN: 4;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        characters:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        lyrics by:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        located on terrain feature:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        participant:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        influenced by:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        location of formation:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        parent organization:    TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        notable work:   TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        separated from:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        narrative location:     TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        work location:  TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        applies to jurisdiction:        TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        product or material produced:   TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        unemployment rate:      TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        territory claimed by:   TP: 0;  FP: 0;  FN: 3;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        participant of:         TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        replaces:       TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        replaced by:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        capital of:     TP: 0;  FP: 0;  FN: 2;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        languages spoken, written or signed:    TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        present in work:        TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
        sibling:        TP: 0;  FP: 0;  FN: 0;  precision: 0.00;        recall: 0.00;   f1: 0.00;       0
Epoch 0:   1%|█▋                                                                                                                                                                                           | 8/889 [00:02<04:25,  3.32it/s, loss=8.86, v_num=f4s1]/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
Epoch 0:  56%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                                                  | 499/889 [02:39<02:04,  3.12it/s, loss=5.25, v_num=f4s1]------------------------------------------------
{'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:0'), 'input_ids': tensor([[    0, 29161,  2897,  ...,     1,     1,     1],
        [    0,   133,   494,  ...,     1,     1,     1],
        [    0, 47001,   329,  ...,     1,     1,     1],
        [    0,   113,  1890,  ...,   347,     4,     2]], device='cuda:0'), 'decoder_input_ids': tensor([[    0, 50267,  2897,  ...,     1,     1,     1],
        [    0, 50267,   496,  ...,     1,     1,     1],
        [    0, 50267, 18775,  ...,     1,     1,     1],
        [    0, 50267,  1890,  ..., 13034,  1437,     2]], device='cuda:0')}
<bound method BatchEncoding.keys of {'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:0'), 'input_ids': tensor([[    0, 29161,  2897,  ...,     1,     1,     1],
        [    0,   133,   494,  ...,     1,     1,     1],
        [    0, 47001,   329,  ...,     1,     1,     1],
        [    0,   113,  1890,  ...,   347,     4,     2]], device='cuda:0'), 'decoder_input_ids': tensor([[    0, 50267,  2897,  ...,     1,     1,     1],
        [    0, 50267,   496,  ...,     1,     1,     1],
        [    0, 50267, 18775,  ...,     1,     1,     1],
        [    0, 50267,  1890,  ..., 13034,  1437,     2]], device='cuda:0')}>
Saving latest checkpoint...
Traceback (most recent call last):
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in train
    self.train_loop.run_training_epoch()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 556, in run_training_epoch
    self.on_train_batch_end(epoch_output, batch_end_outputs, batch, batch_idx, dataloader_idx)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 226, in on_train_batch_end
    self.trainer.call_hook('on_train_batch_end', batch_end_outputs, batch, batch_idx, dataloader_idx)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 925, in call_hook
    trainer_hook(*args, **kwargs)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 147, in on_train_batch_end
    callback.on_train_batch_end(self, self.get_model(), outputs, batch, batch_idx, dataloader_idx)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/src/generate_samples.py", line 48, in on_train_batch_end
    labels = batch.pop("labels")
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/_collections_abc.py", line 795, in pop
    value = self[key]
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 230, in __getitem__
    return self.data[item]
KeyError: 'labels'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 111, in <module>
    main()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
    run_and_report(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
    raise ex
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/hydra/core/utils.py", line 127, in run_job
    ret.return_value = task_function(task_cfg)
  File "train.py", line 107, in main
    train(conf)
  File "train.py", line 103, in train
    trainer.fit(pl_module, datamodule=pl_data_module)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
    results = self.trainer.train()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 592, in train
    self.train_loop.on_train_end()
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 156, in on_train_end
    self.check_checkpoint_callback(should_save=True, is_last=True)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 190, in check_checkpoint_callback
    callback.on_validation_end(self.trainer, model)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 204, in on_validation_end
    self.save_checkpoint(trainer, pl_module)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 239, in save_checkpoint
    self._validate_monitor_key(trainer)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 517, in _validate_monitor_key
    raise MisconfigurationException(m)
pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val_F1_micro') not found in the returned metrics: ['loss']. HINT: Did you call self.log('val_F1_micro', tensor) in the LightningModule?

wandb: Waiting for W&B process to finish, PID 13914
wandb: Program failed with code 1.  Press ctrl-c to abort syncing.
wandb:                                                                                
wandb: Find user logs for this run at: /mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/src/outputs/2022-08-29/11-47-34/wandb/run-20220829_114753-3b8sf4s1/logs/debug.log
wandb: Find internal logs for this run at: /mnt/batch/tasks/shared/LS_root/mounts/clusters/rebelgpu/code/Users/leon.lukas/rebel-main/src/outputs/2022-08-29/11-47-34/wandb/run-20220829_114753-3b8sf4s1/logs/debug-internal.log
wandb: Run summary:
wandb:   lr-AdamW/pg1 0.0
wandb:   lr-AdamW/pg2 0.0
wandb:           loss 5.46171
wandb:          epoch 0
wandb:       _runtime 174
wandb:     _timestamp 1661773847
wandb:          _step 49
wandb: Run history:
wandb:   lr-AdamW/pg1 ▁
wandb:   lr-AdamW/pg2 ▁
wandb:           loss ▁
wandb:          epoch ▁
wandb:       _runtime ▁
wandb:     _timestamp ▁
wandb:          _step ▁
wandb: 
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
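
The first traceback above fails on `batch.pop("labels")`, and the batch printed just before it contains only `attention_mask`, `input_ids` and `decoder_input_ids`. One possible explanation, which is an assumption and not confirmed by the log, is that an earlier step already popped `labels` from the same batch object. The sketch below uses a plain dict and two hypothetical functions (`training_step`, `on_train_batch_end`) to show how `pop` mutates a shared mapping, and how a later hook can read the key without raising:

```python
from typing import Any, Dict, Optional


def training_step(batch: Dict[str, Any]) -> Any:
    # Hypothetical step: pop() removes "labels" from the caller's object too.
    return batch.pop("labels")


def on_train_batch_end(batch: Dict[str, Any]) -> Optional[Any]:
    # Hypothetical hook: get() returns None instead of raising KeyError
    # when "labels" has already been removed.
    return batch.get("labels")


# Made-up token ids, only for illustration.
batch = {"input_ids": [[0, 29161, 2]], "labels": [[50267, 2897, 2]]}

labels = training_step(batch)
print(sorted(batch))               # ['input_ids']
print(on_train_batch_end(batch))   # None
```

If this is indeed the cause, the callback would need its own copy of the labels (or the step would need to read them without popping) before it can generate samples.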

opened by l0renor 4

spacy_component
Hi, for me the spaCy integration only worked after the following fix:

$ diff spacy_component.py fix_spacy_component.py
71c71
< extracted_text = self.triplet_extractor.tokenizer.batch_decode(output_ids[0])
---
> extracted_text = self.triplet_extractor.tokenizer.batch_decode(output_ids)

(batch_decode expects a list of lists of token ids, not a flat list of ids. The buggy version won't raise an error, but only the first token is processed.)
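
To illustrate the difference, here is a minimal sketch with a toy tokenizer. `ToyTokenizer` and its vocabulary are hypothetical stand-ins, not the real Hugging Face class; the only assumption carried over is that `batch_decode` calls `decode` once per element of its argument:

```python
from typing import Dict, List, Sequence, Union


class ToyTokenizer:
    """Hypothetical stand-in that mimics decode / batch_decode semantics."""

    def __init__(self, vocab: Dict[int, str]) -> None:
        self.vocab = vocab

    def decode(self, token_ids: Union[int, Sequence[int]]) -> str:
        # A single id is treated as a one-token sequence.
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        return "".join(self.vocab[i] for i in token_ids)

    def batch_decode(self, sequences) -> List[str]:
        # One decoded string per element of `sequences`.
        return [self.decode(seq) for seq in sequences]


# Made-up vocabulary and ids, only for illustration.
vocab = {0: "<s>", 1: "<triplet>", 2: " Punta", 3: " Cana", 4: "</s>"}
tokenizer = ToyTokenizer(vocab)
output_ids = [[0, 1, 2, 3, 4]]  # one generated sequence (batch size 1)

buggy = tokenizer.batch_decode(output_ids[0])  # iterates over single ids
fixed = tokenizer.batch_decode(output_ids)     # iterates over sequences

print(buggy[0])  # <s>
print(fixed[0])  # <s><triplet> Punta Cana</s>
```

With the flat list, every token becomes its own "sequence", so taking the first result yields only the first token and no triplets can be parsed from it.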
opened by gossebouma 4

RuntimeError: CUDA error: device-side assert triggered

I trained the model on a custom dataset with five entities, based on the CONLL format, but when testing the checkpoint I get the following error:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [153,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "test.py", line 75, in main
    train(conf)
  File "test.py", line 70, in train
    trainer.test(pl_module, test_dataloaders=pl_data_module.test_dataloader())
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 792, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 855, in __test_given_model
    results = self.fit(model)
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 510, in fit
    results = self.accelerator_backend.train()
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 57, in train
    return self.train_or_test()
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 71, in train_or_test
    results = self.trainer.run_test()
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 699, in run_test
    eval_loop_results, _ = self.run_evaluation()
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 646, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 177, in evaluation_step
    output = self.trainer.accelerator_backend.test_step(args)
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 76, in test_step
    return self._step(self.trainer.model.test_step, args)
  File "/anaconda/envs/we/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 65, in _step
    output = model_step(*args)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/iqranlp/code/Users/i.abbasi/rebel/src/pl_modules.py", line 346, in test_step
    forward_output = self.forward(batch, labels)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/iqranlp/code/Users/i.abbasi/rebel/src/pl_modules.py", line 121, in forward
    outputs = self.model(**inputs, use_cache=False, return_dict = True, output_hidden_states=True)
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/we/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 1348, in forward
    outputs = self.model(
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/we/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 1235, in forward
    decoder_outputs = self.decoder(
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/we/lib/python3.8/site-packages/transformers/models/bart/modeling_bart.py", line 1023, in forward
    inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/anaconda/envs/we/lib/python3.8/site-packages/torch/nn/functional.py", line 2043, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

This error came about when I resized the model token embeddings from [50272, 1024] to [50276, 1024] to fit the checkpoint. Could you please suggest a workaround for this?
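
The assertion `srcIndex < srcSelectDimSize` is raised inside the embedding lookup at the bottom of the traceback, which usually means that at least one token id is greater than or equal to the number of rows in the embedding matrix. A quick way to narrow this down is to check the ids in a batch against the embedding size before calling the model. The helper below is a dependency-free sketch; `find_out_of_range_ids` and the example ids are hypothetical, not part of the repository:

```python
from typing import List, Tuple


def find_out_of_range_ids(
    batch_ids: List[List[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id the embedding table cannot index."""
    bad = []
    for row, seq in enumerate(batch_ids):
        for col, tok in enumerate(seq):
            if tok < 0 or tok >= num_embeddings:
                bad.append((row, col, tok))
    return bad


# Made-up batch: 50277 does not fit into a table with 50276 rows (valid ids: 0..50275).
decoder_input_ids = [[0, 50267, 2897, 2], [0, 50275, 50277, 2]]
problems = find_out_of_range_ids(decoder_input_ids, num_embeddings=50276)
print(problems)  # [(1, 2, 50277)]
```

If such ids show up, the tokenizer used to build the dataset probably has more added tokens than the resized embedding matrix has rows. Running the same batch on CPU typically gives a more readable index error than the device-side assert.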

opened by Iqra840 3

Comments

$ diff spacy_component.py fix_spacy_component.py
71c71
< extracted_text = self.triplet_extractor.tokenizer.batch_decode(output_ids[0])

Owner

Babelscape

PURE: End-to-End Relation Extraction

An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

Joint Entity and Relation Extraction with Set Prediction Networks (2020)

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).

[ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction

Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

A project for developing transformer-based models for clinical relation extraction

Code for technical report "An Improved Baseline for Sentence-level Relation Extraction".

Source code for "Pack Together: Entity and Relation Extraction with Levitated Marker"

Code and datasets for the paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction"

Wanli Li and Tieyun Qian: Exploit a Multi-head Reference Graph for Semi-supervised Relation Extraction, IJCNN 2021

An implementation of the paper: Relation extraction via Multi-Level attention CNNs

Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

A PyTorch implementation of the paper "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning