UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Code for the paper UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. Please refer to our project page for up-to-date related resources (e.g., papers, code, tools, tutorials) on Structured Knowledge Grounding.
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have historically been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research instead of being exclusive to a single task, domain, or dataset. We show that large language models like T5, with simple modifications when necessary, achieve state-of-the-art performance on all 21 tasks. UnifiedSKG facilitates the investigation of multi-task, zero-shot, and few-shot learning. We demonstrate that multi-task prefix-tuning with UnifiedSKG improves the performance on most tasks and show that T0, GPT-3, and Codex struggle in zero-shot and few-shot learning for SKG. UnifiedSKG also enables a series of controlled experiments on structured knowledge encoding variants across SKG tasks. We find that T5's sensitivity to structured knowledge encoding variations varies across tasks.
UnifiedSKG is easily extensible to more tasks. We encourage researchers to make a pull request to add their datasets, metrics, and models to the UnifiedSKG framework!
Updates
- 2022-01-12: We released our code, Colab demo, weights, and project page. Check them out!
Content
Cloning this repo
In order to include third-party dependencies in this repository, make sure to clone recursively, e.g.:
git clone --recurse-submodules [email protected]:HKUNLP/UnifiedSKG.git
Dependencies
To set up the environment, run the following in a shell (the last line installs PyTorch for CUDA 11.1; replace it according to your CUDA version):
conda env create -f py3.7pytorch1.8.yaml
conda activate py3.7pytorch1.8new
pip install datasets==1.14.0
# Replace the following line depending on your CUDA version.
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
That will create the environment py3.7pytorch1.8new we used.
Usage
Environment setup
Activate the environment by running
conda activate py3.7pytorch1.8new
WandB setup
Set up WandB for logging (registration required):
export WANDB_ENTITY=YOUR_WANDB_USERNAME
export WANDB_API_KEY=YOUR_WANDB_API_KEY
export WANDB_PROJECT=YOUR_PROJECT_NAME
Training
T5-base finetuning on WikiTQ (4 GPUs; 128 effective batch size = 4 GPUs × 4 per-device batch size × 8 gradient-accumulation steps)
python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 16 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
If you want to resume training, remove the --overwrite_output_dir flag from the above command:
python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_wikitq --per_device_train_batch_size 4 --per_device_eval_batch_size 16 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
T5-base prefix-tuning on WikiTQ (4 GPUs, 128 effective batch size)
python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_prefix_wikitq.cfg --run_name T5_base_prefix_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_prefix_wikitq --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 16 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
T5-3b finetuning on WikiTQ (8 GPUs, 128 effective batch size)
deepspeed train.py --deepspeed deepspeed/ds_config_zero2.json --seed 2 --cfg Salesforce/T5_3b_finetune_wikitq.cfg --run_name T5_3b_finetune_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 16 --num_train_epochs 50 --adafactor false --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_3b_finetune_wikitq --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
Load weights
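The Colab demo mentioned in the Updates above walks through loading released checkpoints. As a rough, hedged sketch (not the project's exact loading path), a finetuned T5 checkpoint saved by train.py, or a finetuned checkpoint published on the HuggingFace Model Hub, can be loaded with plain HuggingFace Transformers; the checkpoint path and the linearized input below are placeholders. Prefix-tuning checkpoints additionally need the model classes under ./models, so prefer the Colab demo for those.

```python
# A minimal, hypothetical sketch: load a finetuned seq2seq checkpoint with
# HuggingFace Transformers and generate an answer. The checkpoint path and
# linearized input are placeholders; see ./seq2seq_construction for the exact
# linearization each task uses.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "output/T5_base_finetune_wikitq"  # placeholder: your output_dir or a Hub ID

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# seq_in = user request + linearized structured knowledge (illustrative format only)
seq_in = ("what was the attendance in 2002? ; "
          "col : year | attendance row 1 : 2001 | 10,000 row 2 : 2002 | 12,000")
inputs = tokenizer(seq_in, return_tensors="pt", truncation=True, max_length=1024)
outputs = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```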
Code structure overview of UnifiedSKG
.
├── configure                        # Config files for experiments, tasks, and settings
│   ├── META_TUNING                  # Config files for tasks and settings
│   └── Salesforce                   # Config files for experiments. We named this directory Salesforce to thank Salesforce Research for providing a large number of GPUs. We would also like to thank Amazon Research Awards, ServiceNow Research, and Yale NLP for generously providing computing resources.
│
├── metrics                          # Code for evaluation
│   └── ...                          # Please check the README in ./seq2seq_construction.
├── models                           # Code for models
│   ├── adapter                      # Code for T5 and BART with adapters (based on HuggingFace Transformers)
│   ├── prompt                       # Code for T5 and BART with prefix-tuning (based on HuggingFace Transformers)
│   └── unified
│       ├── base.py                  # Code for the base model that enables an arbitrary model to be pushed to the HuggingFace Model Hub (namely, PushToHubFriendlyModel)
│       ├── finetune.py              # Code for finetuning
│       ├── adaptertuning.py         # Code for adapter-tuning
│       └── prefixtuning.py          # Code for prefix-tuning
│
├── seq2seq_construction             # Code for converting raw data into sequences
│   └── ...                          # Please check the README in this directory.
│
├── tasks                            # Code for loading raw data
│   └── ...                          # Please check the README in this directory.
│
├── third_party                      # Packages from third parties
│   └── ...                          # Please check the README in this directory.
│
├── utils                            # Code for some (probably) useful stuff
│   ├── processor                    # Adapted from TAPEX: the processor that handles table truncation and linearization
│   ├── ...
│   ├── configure.py                 # Code for parsing config files in ./configure
│   ├── dataset.py                   # Code for converting input and output sequences into Datasets for training
│   ├── tool.py                      # Code for loading models, seq2seq constructors, and evaluators
│   ├── trainer.py                   # Code for EvaluationFriendlyTrainer. If you want to make training-specific modifications, this is the place to start.
│   └── training_arguments.py        # Code for seq2seq training arguments
│
├── .gitignore
├── .gitmodules
├── py3.7pytorch1.8.yaml             # Anaconda environment config file
├── README.md                        # The README file you are looking at :)
└── train.py                         # Entry code, which controls training, evaluation, testing, storage, and logging
How to unify a new task into the framework?
(The READMEs in ./tasks, ./seq2seq_construction, ./metrics, and ./configure may also be useful.)
- Step 1: Add the "Loader" of the raw data in ./tasks. (You can first search the HuggingFace datasets hub to check whether a usable loading script already exists; if not, that's great, because you can contribute to both this project and the HuggingFace community. A hypothetical loader sketch appears after this list.)
- Step 2: Add the "Wrapper" in ./seq2seq_construction, which constructs seq_in (the "user request input" and "structured knowledge input") and seq_out from the raw data and adds them to it for seq2seq unification (see the wrapper sketch after this list).
- Step 3: Add the "Evaluator" for the task in ./metrics (see the evaluator sketch after this list). If any third-party repositories are used, please add them to .gitmodules.
- Step 3.5 (optional): You can always add a new "Model" under ./models/ if you like; change the model path in the config files to use it.
- Step 4: Add the "Config" file to drive your task, or all of the tasks, via finetuning, multi-task finetuning, pretraining, prefix-tuning, multi-task prefix-tuning, or other settings.

And that's all! =)
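To make step 1 concrete, below is a sketch of what a loader in ./tasks might look like: a standard HuggingFace datasets loading script. Everything in it (the task name, fields, and URL) is invented for illustration; the existing scripts under ./tasks are the authoritative reference.

```python
# tasks/my_skg_task.py -- a hypothetical `datasets` loading script (all names,
# fields, and URLs below are made up for illustration).
import json

import datasets


class MySKGTask(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({
                "question": datasets.Value("string"),
                "table": {
                    "header": datasets.features.Sequence(datasets.Value("string")),
                    "rows": datasets.features.Sequence(
                        datasets.features.Sequence(datasets.Value("string"))),
                },
                "answer_text": datasets.Value("string"),
            })
        )

    def _split_generators(self, dl_manager):
        # Hypothetical download URL; real loaders point at the dataset's release files.
        data_dir = dl_manager.download_and_extract("https://example.com/my_skg_task.zip")
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN,
                                    gen_kwargs={"filepath": f"{data_dir}/train.jsonl"}),
            datasets.SplitGenerator(name=datasets.Split.VALIDATION,
                                    gen_kwargs={"filepath": f"{data_dir}/dev.jsonl"}),
        ]

    def _generate_examples(self, filepath):
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                raw = json.loads(line)
                yield idx, {
                    "question": raw["question"],
                    "table": raw["table"],
                    "answer_text": raw["answer"],
                }
```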
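For step 2, the wrapper turns each raw example into the text_in / struct_in / seq_out fields used for seq2seq unification. The class below is a simplified, hypothetical shape: the real constructors under ./seq2seq_construction (see their README) also handle caching, truncation, and separate train/dev/test classes, and the table linearization shown here is illustrative only.

```python
# seq2seq_construction/my_skg_task.py -- a hypothetical, simplified wrapper.
from torch.utils.data import Dataset


def linearize_table(table):
    """Flatten a table into a string (illustrative; see utils/processor for the real logic)."""
    header = "col : " + " | ".join(table["header"])
    rows = " ".join(
        "row {} : ".format(i + 1) + " | ".join(row) for i, row in enumerate(table["rows"])
    )
    return header + " " + rows


class MySKGTaskSeq2SeqDataset(Dataset):
    """Wraps one split of raw examples and adds text_in / struct_in / seq_out."""

    def __init__(self, raw_split):
        self.examples = []
        for raw in raw_split:
            extended = dict(raw)
            extended["text_in"] = raw["question"].lower()          # user request input
            extended["struct_in"] = linearize_table(raw["table"])  # structured knowledge input
            extended["seq_out"] = raw["answer_text"].lower()       # target sequence
            self.examples.append(extended)

    def __getitem__(self, index):
        return self.examples[index]

    def __len__(self):
        return len(self.examples)
```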
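For step 3, the evaluator scores predicted strings against gold examples. Below is a minimal sketch assuming a simple exact-match metric; the concrete class and method names expected by the config system are documented in the ./metrics README, and any third-party metric code should be added as a submodule in .gitmodules.

```python
# metrics/my_skg_task/evaluator.py -- a hypothetical exact-match evaluator.
class EvaluateTool:
    """Sketch of an evaluator; the interface here is illustrative, not normative."""

    def __init__(self, args=None):
        self.args = args

    def evaluate(self, preds, golds, section):
        """Return a dict of metric name -> score for one split (`section`)."""
        assert len(preds) == len(golds)
        correct = sum(
            pred.strip().lower() == gold["seq_out"].strip().lower()
            for pred, gold in zip(preds, golds)
        )
        return {"exact_match": correct / max(len(golds), 1)}
```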