Hi,
could you quickly summarize the steps required to test the downloaded checkpoint on the WikiEvents dataset?
As far as I can tell, the WikiEvents dataset is referred to as KAIROS in some parts of the code. It also uses the KAIROS data module, which requires, for example, the test file to be located at preprocessed_KAIROS/test.jsonl.
I did the following steps:
- I downloaded the WikiEvents dataset from S3 and stored it at data/wikievents.
- I downloaded the checkpoints from S3 and stored them at checkpoints/Wikievents/ (note that the directory contains both epoch=1-v0.ckpt and epoch=2-v0.ckpt).
- I had to add the "--coref_dir" argument to scripts/test_KAIROS.sh, as it refers to a different (non-existent) directory by default.
- The resulting "train.py" command is the following:
python train.py --model=constrained-gen --ckpt_name=WikiEvents-pred \
--load_ckpt=checkpoints/WikiEvents/epoch=2-v0.ckpt \
--dataset=KAIROS \
--eval_only \
--train_file=data/wikievents/train.jsonl \
--val_file=data/wikievents/dev.jsonl \
--test_file=data/wikievents/test.jsonl \
--coref_dir=data/wikievents/coref \
--train_batch_size=4 \
--eval_batch_size=4 \
--learning_rate=3e-5 \
--accumulate_grad_batches=4 \
--num_train_epochs=3
Note that this throws an error, as it still tries to load the test file from "preprocessed_KAIROS/test.jsonl".
- Hoping to fix the issue, I copied data/wikievents/ to ./preprocessed_KAIROS/. Unfortunately, I now get the following error:
File "/home/patrik/gen-arg/src/genie/data.py", line 15, in my_collate
doc_keys = [ex['doc_key'] for ex in batch]
File "/home/patrik/gen-arg/src/genie/data.py", line 15, in <listcomp>
doc_keys = [ex['doc_key'] for ex in batch]
KeyError: 'doc_key'
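To narrow this down, a check along the following lines would show whether the records in the file actually contain a 'doc_key' field. This is only a sketch and not part of the repository: the function name records_missing_key is my own, and it assumes the file holds one JSON object per line, with the path taken from the error message above.

```python
import json
from pathlib import Path


def records_missing_key(path, key="doc_key"):
    """Return (line_number, present_keys) for each JSON line lacking `key`.

    Hypothetical helper for debugging; assumes one JSON object per line.
    """
    missing = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if key not in record:
                missing.append((line_number, sorted(record)))
    return missing


if __name__ == "__main__":
    # Path assumed from the error message; adjust if the file lives elsewhere.
    test_path = Path("preprocessed_KAIROS/test.jsonl")
    if test_path.exists():
        # Print only the first few offending records to keep the output short.
        for line_number, keys in records_missing_key(test_path)[:5]:
            print(f"line {line_number}: no 'doc_key'; keys present: {keys}")
```

If every record is reported as missing the key, the raw downloaded files presumably use a different schema than the one the KAIROS data module expects, which would suggest that a preprocessing step is missing rather than that the path is wrong.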
Do you have any idea what I am doing wrong?
Best, Patrik