Hello,
when using the script inference_amr.sh, I receive the following error:
Please answer yes or no.
Global seed set to 42
Tokenizer: 53587 PreTrainedTokenizer(name_or_path='facebook/bart-large', vocab_size=53587, model_max_len=1024, is_fast=False, padding_side='right', special_tokens={'bos_token': 'Ġ<s>', 'eos_token': 'Ġ</s>', 'unk_token': 'Ġ<unk>', 'sep_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': 'Ġ<pad>', 'cls_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=True)})
Traceback (most recent call last):
  File "/home/students/meier/MA/AMRBART/fine-tune/inference_amr.py", line 105, in <module>
    main(args)
  File "/home/students/meier/MA/AMRBART/fine-tune/inference_amr.py", line 65, in main
    data_module = AMRParsingDataModule(amr_tokenizer, **vars(args))
  File "/home/students/meier/MA/AMRBART/fine-tune/data_interface/dataset_pl.py", line 228, in __init__
    decoder_start_token_id=self.tokenizer.amr_bos_token_id,
AttributeError: 'PENMANBartTokenizer' object has no attribute 'amr_bos_token_id'
The facebook/bart-large tokenizer is used. This error is new: when I last ran these scripts, six to eight weeks ago, everything worked fine.
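In case it helps to compare environments, here is a minimal version check I would run; the package list below is only my guess at the dependencies that could affect PENMANBartTokenizer:

```python
# Minimal environment check (Python 3.8 stdlib only). The package names are my
# assumption of the dependencies relevant to the tokenizer behaviour.
import importlib.metadata as md

for pkg in ("transformers", "tokenizers", "datasets", "pytorch-lightning", "penman"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```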
A similar error occurs when using inference_text.sh:
Please answer yes or no.
Global seed set to 42
Tokenizer: 53587 PreTrainedTokenizer(name_or_path='facebook/bart-large', vocab_size=53587, model_max_len=1024, is_fast=False, padding_side='right', special_tokens={'bos_token': 'Ġ<s>', 'eos_token': 'Ġ</s>', 'unk_token': 'Ġ<unk>', 'sep_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': 'Ġ<pad>', 'cls_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=True)})
Dataset cache dir: /home/students/meier/MA/AMRBART/fine-tune/../examples/.cache/
Using custom data configuration default-288dad464b8291c3
Downloading and preparing dataset amr_data/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/students/meier/MA/AMRBART/fine-tune/../examples/.cache/amr_data/default-288dad464b8291c3/1.0.0/f0dfbe4d826478b18bc1ef4db7270a419c69c4ea4c94fbf73515b13180f43059...
Dataset amr_data downloaded and prepared to /home/students/meier/MA/AMRBART/fine-tune/../examples/.cache/amr_data/default-288dad464b8291c3/1.0.0/f0dfbe4d826478b18bc1ef4db7270a419c69c4ea4c94fbf73515b13180f43059. Subsequent calls will reuse this data.
datasets: DatasetDict({
    train: Dataset({
        features: ['src', 'tgt'],
        num_rows: 10
    })
    validation: Dataset({
        features: ['src', 'tgt'],
        num_rows: 10
    })
    test: Dataset({
        features: ['src', 'tgt'],
        num_rows: 10
    })
})
colums: ['src', 'tgt']
Setting TOKENIZERS_PARALLELISM=false for forked processes.
Parameter 'function'=<function AMR2TextDataModule.setup.<locals>.tokenize_function at 0x154ba6915280> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/home/students/meier/MA/AMRBART/fine-tune/data_interface/dataset_pl.py", line 72, in tokenize_function
    amr_tokens = [
  File "/home/students/meier/MA/AMRBART/fine-tune/data_interface/dataset_pl.py", line 74, in <listcomp>
    + [self.tokenizer.amr_bos_token]
AttributeError: 'PENMANBartTokenizer' object has no attribute 'amr_bos_token'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/students/meier/MA/AMRBART/fine-tune/run_amr2text.py", line 154, in <module>
    main(args)
  File "/home/students/meier/MA/AMRBART/fine-tune/run_amr2text.py", line 91, in main
    data_module.setup()
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
    fn(*args, **kwargs)
  File "/home/students/meier/MA/AMRBART/fine-tune/data_interface/dataset_pl.py", line 117, in setup
    self.train_dataset = datasets["train"].map(
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1744, in map
    transformed_shards = [r.get() for r in results]
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1744, in <listcomp>
    transformed_shards = [r.get() for r in results]
  File "/home/students/meier/amrbart_venv_new/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
AttributeError: 'PENMANBartTokenizer' object has no attribute 'amr_bos_token'
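As a temporary workaround I could patch the missing attributes onto the tokenizer instance right before the data module is created (e.g., just before the AMRParsingDataModule call in inference_amr.py). This is only a sketch under the assumption that the AMR markers are the added <AMR> and </AMR> tokens; please correct me if the real token names differ:

```python
# Hypothetical workaround: restore the attributes dataset_pl.py expects.
# "<AMR>" / "</AMR>" are my assumption of AMRBART's special AMR marker tokens;
# amr_tokenizer is the tokenizer instance that gets passed to the data module.
for name, token in (("amr_bos", "<AMR>"), ("amr_eos", "</AMR>")):
    if not hasattr(amr_tokenizer, name + "_token"):
        setattr(amr_tokenizer, name + "_token", token)
        setattr(amr_tokenizer, name + "_token_id",
                amr_tokenizer.convert_tokens_to_ids(token))
```

This can only work if those tokens are actually in the vocabulary, so it is not a real fix; I would rather understand what changed in the tokenizer class or its dependencies.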