A Unified Framework and Analysis for Structured Knowledge Grounding

Overview

UnifiedSKG 📚: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models

Open In Colab

Code for the paper UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. Please refer to our project page for up-to-date related resources (e.g., papers, code, tools, and tutorials) on Structured Knowledge Grounding.

Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Because the inputs and outputs of SKG tasks are heterogeneous, these tasks have historically been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research instead of being exclusive to a single task, domain, or dataset. We show that large language models like T5, with simple modifications when necessary, achieve state-of-the-art performance on all 21 tasks. UnifiedSKG facilitates the investigation of multi-task, zero-shot, and few-shot learning. We demonstrate that multi-task prefix-tuning with UnifiedSKG improves performance on most tasks, and show that T0, GPT-3, and Codex struggle with zero-shot and few-shot learning for SKG. UnifiedSKG also enables a series of controlled experiments on structured knowledge encoding variants across SKG tasks. We find that T5's sensitivity to structured knowledge encoding variations differs across tasks.

UnifiedSKG is easily extensible to more tasks. We encourage researchers to make a pull request to add their datasets, metrics, and models to the UnifiedSKG framework!

Cloning this repo

In order to include the third-party dependencies, make sure to clone this repository recursively (if you have already cloned it without submodules, run git submodule update --init --recursive afterwards), e.g.:

git clone --recurse-submodules [email protected]:HKUNLP/UnifiedSKG.git

Dependencies

To set up the environment, run the following in your shell (the last line installs PyTorch built for CUDA 11.1; replace it to match your CUDA version):

conda env create -f py3.7pytorch1.8.yaml
conda activate py3.7pytorch1.8new
pip install datasets==1.14.0
# Replace the following line according to your CUDA version.
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

This creates the conda environment py3.7pytorch1.8new that we used.
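
After activation, a quick sanity check (a minimal sketch, not repo-specific) can confirm that the PyTorch build matches your CUDA setup:

    # Environment sanity check; run inside py3.7pytorch1.8new.
    import torch

    print(torch.__version__)          # expected: 1.8.0+cu111 (or your CUDA variant)
    print(torch.version.cuda)         # the CUDA version PyTorch was built against
    print(torch.cuda.is_available())  # True if a GPU is visible to the driver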

Usage

Environment setup

Activate the environment by running

conda activate py3.7pytorch1.8new

WandB setup

Set up WandB for logging (registration required):

export WANDB_ENTITY=YOUR_WANDB_USERNAME
export WANDB_API_KEY=YOUR_WANDB_API_KEY
export WANDB_PROJECT=YOUR_PROJECT_NAME
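
Before launching a long run, you can verify the credentials with a minimal Python check (a sketch assuming the wandb package from the environment; wandb.login() picks up WANDB_API_KEY from the variables exported above):

    # Minimal WandB credential check.
    import wandb

    assert wandb.login(), "WandB login failed; check WANDB_API_KEY"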

Training

T5-base finetuning on WikiTQ (4 GPUs, 128 effective batch size)

python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py \
    --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 \
    --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir \
    --per_device_train_batch_size 4 --per_device_eval_batch_size 16 \
    --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 \
    --ddp_find_unused_parameters true
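
The "128 effective batch size" is the product of the parallelism and accumulation flags rather than a flag of its own; as a quick sanity check in plain Python:

    # Effective batch size of the command above:
    # GPUs x per-device batch x gradient accumulation steps.
    n_gpus = 4            # --nproc_per_node
    per_device_batch = 4  # --per_device_train_batch_size
    grad_accum = 8        # --gradient_accumulation_steps
    print(n_gpus * per_device_batch * grad_accum)  # 128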

If you want to resume training, remove the --overwrite_output_dir flag from the above command:

python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py \
    --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 \
    --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/T5_base_finetune_wikitq \
    --per_device_train_batch_size 4 --per_device_eval_batch_size 16 \
    --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 \
    --ddp_find_unused_parameters true

T5-base prefix-tuning on WikiTQ (4 GPUs, 128 effective batch size)

python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py \
    --seed 2 --cfg Salesforce/T5_base_prefix_wikitq.cfg --run_name T5_base_prefix_wikitq \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 \
    --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/T5_base_prefix_wikitq --overwrite_output_dir \
    --per_device_train_batch_size 4 --per_device_eval_batch_size 16 \
    --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 \
    --ddp_find_unused_parameters true

T5-3b finetuning on WikiTQ (8 GPUs, 128 effective batch size)

deepspeed train.py --deepspeed deepspeed/ds_config_zero2.json \
    --seed 2 --cfg Salesforce/T5_3b_finetune_wikitq.cfg --run_name T5_3b_finetune_wikitq \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 \
    --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 16 --num_train_epochs 50 --adafactor false --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/T5_3b_finetune_wikitq --overwrite_output_dir \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
    --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 \
    --ddp_find_unused_parameters true
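
For reference, the effective batch size here is again 8 GPUs x 1 per device x 16 accumulation steps = 128. If you are curious what a ZeRO stage-2 config roughly contains, here is a hedged sketch written as a Python dict for illustration only; the repo's own deepspeed/ds_config_zero2.json is the authoritative version:

    # Hedged sketch of a minimal ZeRO stage-2 DeepSpeed config (illustrative only).
    import json

    ds_config = {
        "zero_optimization": {"stage": 2},    # shard optimizer states and gradients
        "train_micro_batch_size_per_gpu": 1,  # matches --per_device_train_batch_size
        "gradient_accumulation_steps": 16,    # matches --gradient_accumulation_steps
    }
    with open("ds_config_zero2_sketch.json", "w") as f:
        json.dump(ds_config, f, indent=2)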

Load weights

See the Open In Colab notebook.
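
For a plain finetuned checkpoint, a minimal loading sketch with HuggingFace Transformers might look like the following; the model identifier is hypothetical, so substitute the actual checkpoint name from the Colab notebook or the Hub (prefix-tuned checkpoints need the repo's own model classes instead):

    # Hedged sketch: load a finetuned T5 checkpoint and run one query.
    # "hkunlp/T5_base_finetune_wikitq" is a hypothetical model id.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("hkunlp/T5_base_finetune_wikitq")
    model = AutoModelForSeq2SeqLM.from_pretrained("hkunlp/T5_base_finetune_wikitq")

    inputs = tokenizer("question: ... ; structured knowledge: ...", return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=4, max_length=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))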

Code structure overview of UnifiedSKG

.
├── configure                              # Config files for experiments, tasks, and settings
│   ├── META_TUNING                        # Config files for tasks and settings
│   └── Salesforce                         # Config files for experiments. We name this directory Salesforce to thank Salesforce Research for providing a large number of GPUs. We would also like to thank Amazon Research Awards, ServiceNow Research, and Yale NLP for generously providing computing resources.
│
├── metrics                                # Code for evaluation
│   └── ...                                # Please check the README in ./seq2seq_construction.
├── models                                 # Code for models
│   ├── adapter                            # Code for T5 and BART with adapters (based on HuggingFace Transformers)
│   ├── prompt                             # Code for T5 and BART with prefix-tuning (based on HuggingFace Transformers)
│   └── unified
│           ├── base.py                    # Code for the base model that enables an arbitrary model to be pushed to the HuggingFace Model Hub (namely, PushToHubFriendlyModel)
│           ├── finetune.py                # Code for finetuning
│           ├── adaptertuning.py           # Code for adapter-tuning
│           └── prefixtuning.py            # Code for prefix-tuning
│
├── seq2seq_construction                   # Code for converting raw data into sequences
│    └──  ...                              # Please check the README in this directory.
│
├── tasks                                  # Code for loading raw data
│    └──  ...                              # Please check the README in this directory.
│
├── third_party                            # Packages from third parties
│    └──  ...                              # Please check the README in this directory.
│
├── utils                                  # Code for some (probably) useful stuff
│       ├── processor                      # Adapted from TAPEX: the processor that handles table truncation and linearization
│       │        └──  ...
│       ├── configure.py                   # Code for parsing config files in ./configure
│       ├── dataset.py                     # Code for converting input and output sequences into Datasets for training
│       ├── tool.py                        # Code for loading models, seq2seq constructors, and evaluators
│       ├── trainer.py                     # Code for EvaluationFriendlyTrainer. If you want to make training-specific modifications, you may want to change something here.
│       └── training_arguments.py          # Code for seq2seq training arguments
│
├── .gitignore
├── .gitmodules
├── py3.7pytorch1.8.yaml                   # Anaconda environment config file
├── README.md                              # The README file you are looking at :)
└── train.py                               # Entry code, which controls train, eval, test, storage, and logging

How to unify a new task into the framework?

(The READMEs in ./tasks, ./seq2seq_construction, ./metrics, and ./configure may also be useful.)

  • Step 1: Add the "Loader" for the raw data in ./tasks. (Search the HuggingFace datasets hub first to see whether a usable loading script already exists; if not, that's great, because you can become a contributor to both this project and the HuggingFace community.)

  • Step 2: Add the "Wrapper" in ./seq2seq_construction, which constructs "seq_in" (the "user request input" & the "structured knowledge input") and "seq_out" from the raw data and adds them to it for seq2seq unification. (A hedged sketch of steps 1 and 2 follows this list.)

  • Step 3: Add the "Evaluator" for the task in ./metrics. If any third-party repos are used, add them to .gitmodules.

  • Step 3.5 (optional): You can always add a new "Model" under ./models/ if you like; change the model path in the config files to drive the new model.

  • Step 4: Add the "Config" file to drive your task, or all the tasks we have, via finetuning, multi-task finetuning, pretraining, prefix-tuning, multi-task prefix-tuning, ... or other ways.
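
As a concrete illustration of steps 1 and 2, here is a minimal, hypothetical sketch; the dataset name, file paths, and fields are all made up, and the real loaders in ./tasks and wrappers in ./seq2seq_construction are the authoritative templates:

    # Step 1 sketch: a "Loader" built on the HuggingFace datasets API.
    import json
    import datasets

    class MyNewTask(datasets.GeneratorBasedBuilder):
        def _info(self):
            return datasets.DatasetInfo(
                features=datasets.Features({
                    "question": datasets.Value("string"),
                    "table": datasets.features.Sequence(datasets.Value("string")),
                    "answer": datasets.Value("string"),
                })
            )

        def _split_generators(self, dl_manager):
            return [
                datasets.SplitGenerator(
                    name=datasets.Split.TRAIN,
                    gen_kwargs={"filepath": "data/train.jsonl"},  # hypothetical path
                )
            ]

        def _generate_examples(self, filepath):
            with open(filepath, encoding="utf-8") as f:
                for idx, line in enumerate(f):
                    yield idx, json.loads(line)

    # Step 2 sketch: a "Wrapper" that builds seq_in/seq_out from a raw example.
    def construct_seq2seq(raw_example):
        seq_in = raw_example["question"] + " ; structured knowledge: " + " ".join(raw_example["table"])
        seq_out = raw_example["answer"]
        return {"seq_in": seq_in, "seq_out": seq_out}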

And that's all! =)

Contributors

Comments
  • prefix tuning with t5-3b

    I am trying to run prefix tuning with t5-3b, but I get a strange error:

      File "/home/ubuntu/anaconda3/envs/py3.7pytorch1.8new/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/ubuntu/code/UnifiedSKG/models/prompt/modeling_t5.py", line 486, in forward
        key_states = torch.cat([prefix["prev_key"], key_states], dim=2)
    RuntimeError: Sizes of tensors must match except in dimension 3. Got 128 and 32 (The offending index is 0)
    

    This error does not occur for t5-base or t5-large, only for t5-3b. Any tips? I am also having an OOM issue with the t5-3b model: it crashes even with a mini-batch size of 1 on a 40GB GPU. Does anyone have the same issue? Thanks.

    bug 
    opened by zluw1117 11
  • NCCL version

    Hi,

    I have installed the environment from the yaml file and installed torch 1.8 following the settings in the README.

    My CUDA version is 11.4; it seems there is a version conflict between NCCL, PyTorch, and CUDA.

    Is my CUDA version too high?

    ssh://[email protected]:22/home2/xh/.conda/envs/skg/bin/python -u -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_prefix_compwebq.cfg --run_name T5_base_prefix_compwebq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 2 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_prefix_compwebq --overwrite_output_dir --per_device_train_batch_size 2 --per_device_eval_batch_size 4 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
    *****************************************
    /home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/__init__.py:422: UserWarning: torch.set_deterministic is deprecated and will be removed in a future release. Please use torch.use_deterministic_algorithms instead
      "torch.set_deterministic is deprecated and will be removed in a future "
    [W Context.cpp:70] Warning: torch.use_deterministic_algorithms is in beta, and its design and functionality may change in the future. (function operator())
    [... the two warnings above are repeated by each of the 4 worker processes; filelock INFO lines omitted ...]
    Traceback (most recent call last):
      File "train.py", line 225, in <module>
        main()
      File "train.py", line 41, in main
        training_args, = parser.parse_args_into_dataclasses()
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/hf_argparser.py", line 191, in parse_args_into_dataclasses
        obj = dtype(**inputs)
      File "<string>", line 83, in __init__
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 702, in __post_init__
        if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
        return func(*args, **kwargs)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 873, in device
        return self._setup_devices
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1717, in __get__
        cached = self.fget(obj)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/file_utils.py", line 1727, in wrapper
        return func(*args, **kwargs)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/transformers/training_args.py", line 858, in _setup_devices
        torch.distributed.init_process_group(backend="nccl")
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 510, in init_process_group
        timeout=timeout))
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 603, in _new_process_group_helper
        timeout)
    RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
    [... the same RuntimeError traceback is raised by each of the other 3 worker processes ...]
    Killing subprocess 30316
    Killing subprocess 30320
    Killing subprocess 30321
    Killing subprocess 30322
    Traceback (most recent call last):
      File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
        main()
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
        sigkill_handler(signal.SIGTERM, None)  # not coming back
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/home2/xh/.conda/envs/skg/bin/python', '-u', 'train.py', '--local_rank=3', '--seed', '2', '--cfg', 'Salesforce/T5_base_prefix_compwebq.cfg', '--run_name', 'T5_base_prefix_compwebq', '--logging_strategy', 'steps', '--logging_first_step', 'true', '--logging_steps', '4', '--evaluation_strategy', 'steps', '--eval_steps', '500', '--metric_for_best_model', 'avr', '--greater_is_better', 'true', '--save_strategy', 'steps', '--save_steps', '500', '--save_total_limit', '1', '--load_best_model_at_end', '--gradient_accumulation_steps', '2', '--num_train_epochs', '400', '--adafactor', 'true', '--learning_rate', '5e-5', '--do_train', '--do_eval', '--do_predict', '--predict_with_generate', '--output_dir', 'output/T5_base_prefix_compwebq', '--overwrite_output_dir', '--per_device_train_batch_size', '2', '--per_device_eval_batch_size', '4', '--generation_num_beams', '4', '--generation_max_length', '128', '--input_max_length', '1024', '--ddp_find_unused_parameters', 'true']' returned non-zero exit status 1.
    
    Process finished with exit code 1
    
    

    I also tried torch 1.11 + cu113 and got another error:

    
    (skg) xh@4210GPU:~/PycharmProject/UnifiedSKG$ python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_compwebq.cfg --run_name T5_base_finetune_compwebq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 2 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_compwebq --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true
    /home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torchrun.
    Note that --use_env is set by default in torchrun.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ['LOCAL_RANK']` instead. See
    https://pytorch.org/docs/stable/distributed.html#launch-utility for
    further instructions
    
      FutureWarning,
    WARNING:torch.distributed.run:
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
    *****************************************
    Traceback (most recent call last):
      File "train.py", line 225, in <module>
        main()
      File "train.py", line 29, in main
        torch.set_deterministic(True)
    AttributeError: module 'torch' has no attribute 'set_deterministic'
    [... the same AttributeError traceback is raised by each of the other 3 worker processes ...]
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17913) of binary: /home2/xh/.conda/envs/skg/bin/python
    Traceback (most recent call last):
      File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
        main()
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
        launch(args)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
        run(args)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
        )(*cmd_args)
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home2/xh/.conda/envs/skg/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
        failures=result.failures,
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    ============================================================
    train.py FAILED
    ------------------------------------------------------------
    Failures:
    [1]:
      time      : 2022-03-14_21:36:50
      host      : 4210GPU
      rank      : 1 (local_rank: 1)
      exitcode  : 1 (pid: 17914)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    [2]:
      time      : 2022-03-14_21:36:50
      host      : 4210GPU
      rank      : 2 (local_rank: 2)
      exitcode  : 1 (pid: 17915)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    [3]:
      time      : 2022-03-14_21:36:50
      host      : 4210GPU
      rank      : 3 (local_rank: 3)
      exitcode  : 1 (pid: 17916)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2022-03-14_21:36:50
      host      : 4210GPU
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 17913)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    
    

    Looking forward to your reply. Thank you.

    opened by cdhx 11
  • GPU and batch size setting

    I am using the training command in the README:

    python -m torch.distributed.launch --nproc_per_node 4 --master_port 1234 train.py --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 --do_train --do_eval --do_predict --predict_with_generate --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir --per_device_train_batch_size 4 --per_device_eval_batch_size 16 --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024 --ddp_find_unused_parameters true

    My question is how to set the number of GPUs and the batch size. It says this command uses 4 GPUs and a 128 batch size, but I didn't see that in the command, nor in the code. Thx

    good first issue 
    opened by cdhx 11
  • the original prefix-tuning corresponds to the models in the directory `models/prompt`?

    Dear authors,

    Thank you so much for the effort and for open-sourcing this well-maintained and clean codebase! I also appreciate your detailed explanations on Zhihu.

    I just have one quick, specific question: are the scripts inside models/prompt a re-implementation of the prefix-tuning paper (Li & Liang)?

    Thank you so much!

    opened by Chacha-Chen 8
  • How can I specify PLM folder

    I have a problem downloading or caching the PLM due to my connection and blocked websites.

    I want to point to a folder containing an already-downloaded PLM. How can I set this in the yaml? Thanks

    opened by puraminy 8
  • How to use fp16

    Hi, I found the deepspeed directory in the project, and I want to know how to train with fp16. For now I just add --fp16 to the command, but it does not seem to work.

    opened by JBoRu 7
  • [Deprecated] Separate setting: OverflowError: cannot fit 'int' into an index-sized integer

    In the fetaqa config file, if I change concatenate to separate and run prefix tuning, the following error occurs:

      File "train.py", line 185, in main
        train_result = trainer.train(resume_from_checkpoint=checkpoint)
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/trainer.py", line 1260, in train
        for step, inputs in enumerate(epoch_iterator):
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/pouramini/UnifiedSKG/utils/dataset.py", line 116, in __getitem__
        max_length=self.tokenizer.model_max_length,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2406, in __call__
        **kwargs,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2476, in encode_plus
        **kwargs,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 480, in _encode_plus
        verbose=verbose,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2913, in prepare_for_model
        return_attention_mask=return_attention_mask,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2731, in pad
        return_attention_mask=return_attention_mask,
      File "/home/pouramini/anaconda3/envs/uni/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 3065, in _pad
        encoded_inputs["attention_mask"] = [1] * len(required_input) + [0] * difference
    OverflowError: cannot fit 'int' into an index-sized integer
    
    
    opened by puraminy 7
  • Knowledge Graph as Input: question-specific subgraphs

    Hi, very exciting work!

    I have a question about how you create the question-specific subgraphs when using knowledge graphs as input (i.e., ComplexWebQ). By inspecting compwebq/test.jsonl, I see that the maximum number of triplets used over all questions is 61 and that at least one answer lies within the subgraph for 2725/2816 (96.8%) of the test questions.

    Do you use specific mechanisms to prune irrelevant facts, and how do you make sure the subgraphs contain the answers?

    Thanks a lot!

    good first issue 
    opened by cmavro 7
  • Questions about MultiWOZ and SMD (KVRET)

    Thank you for your awesome work!

    I have two questions about structured knowledge processing on MultiWOZ and SMD (KVRET) datasets:

    1. For the MultiWOZ dataset, what are the ontology_values for non-categorical slots (e.g., name, time)?

    https://github.com/HKUNLP/UnifiedSKG/blob/65157f72d259c88d14603dd33ce747124e286f33/seq2seq_construction/multiwoz.py#L87-L88

    2. For the SMD (KVRET) dataset, the whole KB (without any explicit / hidden row selection) is fed into the linearized structured knowledge, right?
    opened by ShaneTian 7
  • How can I train the multi-task models?

    Hi, thanks for the great project and I am quite interested in it.

    I briefly checked the code repo and the training process, but I didn't find the right configuration to train the unified (multi-task) model. Any pointers or suggestions?

    opened by jifan-chen 6
  • Question on Prefix tuning code

    Hi, I am looking at the prefix tuning code. I have a few queries about the implementation.

    1. What exactly are the variables in these lines? I understand that prefix tuning provides input to every layer of the encoder-decoder model, but my understanding is that there should be a single wte and control_trans; I am not sure what the variables in the highlighted lines do.
    2. Why the *2 in this line of code? I don't understand it.
    3. What does the control_trans variable mean in the code? What is its function?
    4. Also, I see another variable mid_dim. What is it conceptually?

    Thank you

    opened by base-y 6
Owner: HKU NLP Group