I think the library should be installed in isolation from transformers: if someone already has a different transformers version installed (for example, with custom models), installing this library breaks their environment unnecessarily.
But the important point here is that it is not possible to train robertuito; training crashes with:
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:699: block: [142,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed. | 73/666 [00:38<00:45, 13.07it/s]
[... the same assertion is repeated for threads [66,0,0] through [95,0,0] ...]
[W 2022-02-20 18:25:42,448] Trial 0 failed because of the following error: RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.')
Traceback (most recent call last):
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
value_or_values = func(trial)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
tr_loss_step = self.training_step(model, inputs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
self.scaler.scale(loss).backward()
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Trying models on dataset exist22_1_es: 0%| | 0/1 [00:49<?, ?it/s]
Iterating over datasets...: 0it [00:49, ?it/s]
Traceback (most recent call last):
File "run_experiments.py", line 3279, in <module>
benchmarker()
File "run_experiments.py", line 1196, in __call__
self.optuna_hp_search()
File "run_experiments.py", line 1470, in optuna_hp_search
sampler=TPESampler(seed=420)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1847, in hyperparameter_search
best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 160, in run_hp_search_optuna
study.optimize(_objective, n_trials=n_trials, timeout=timeout, n_jobs=n_jobs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\study.py", line 409, in optimize
show_progress_bar=show_progress_bar,
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 76, in _optimize
progress_bar=progress_bar,
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 163, in _optimize_sequential
trial = _run_trial(study, func, catch)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 264, in _run_trial
raise func_err
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\optuna\study\_optimize.py", line 213, in _run_trial
value_or_values = func(trial)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\integrations.py", line 150, in _objective
trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1365, in train
tr_loss_step = self.training_step(model, inputs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\transformers\trainer.py", line 1950, in training_step
self.scaler.scale(loss).backward()
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Users\Usuario\anaconda3\envs\rigobenchmarks\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I have tried many other Spanish models and this does not happen, so the problem is directly related to your model and not to the model architecture (which comes from transformers).
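For what it's worth, this particular assert (`srcIndex < srcSelectDimSize`) usually means an embedding lookup received an index outside the embedding matrix, typically token ids that do not match the model's vocabulary or sequences longer than the position embeddings allow. Below is a minimal sketch of how one might check both on CPU; the checkpoint name "pysentimiento/robertuito-base-uncased" and the `texts` list are assumptions, not taken from the log, so adjust them to the actual setup.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: this is the failing checkpoint; `texts` should hold a few examples
# from the batch that triggers the assert.
model_name = "pysentimiento/robertuito-base-uncased"
texts = ["example text that triggers the error"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

n_rows = model.get_input_embeddings().num_embeddings   # rows of the token embedding matrix
max_pos = model.config.max_position_embeddings         # rows of the position embedding matrix

enc = tokenizer(texts, padding=True, return_tensors="pt")  # deliberately no truncation
ids = enc["input_ids"]

print("max token id:", ids.max().item(), "| embedding rows:", n_rows)
print("longest sequence:", ids.shape[1], "| max_position_embeddings:", max_pos)

# On CPU an out-of-range lookup raises a readable IndexError instead of the
# asynchronous device-side assert, so this forward pass pinpoints the bad input.
with torch.no_grad():
    model(**enc)

If the numbers printed there look fine, running the GPU job with CUDA_LAUNCH_BLOCKING=1 set before CUDA is initialised makes the run fail at the actual embedding lookup instead of at backward(), which would narrow the problem down further.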