JaMIE: a Japanese Medical Information Extraction toolkit
Joint Japanese Medical Problem, Modality and Relation Recognition
The train/test phases require all train, dev, and test files to be converted to CONLL-style. Please see data_converter.py.
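As a rough illustration of what "CONLL-style" input looks like to a program, here is a minimal reader sketch. It assumes the common convention of one token per line, tab-separated columns, and blank lines between blocks. The exact column layout produced by data_converter.py is not described here, so the two-column sample and its labels (`B-X`, `I-X`, `O`) are hypothetical.

```python
from typing import List


def read_conll(text: str, sep: str = "\t") -> List[List[List[str]]]:
    """Parse CoNLL-style text into blocks of rows, each row a list of columns.

    Assumption: blank lines separate blocks and columns are tab-separated.
    """
    blocks: List[List[List[str]]] = []
    current: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current block, if any.
            if current:
                blocks.append(current)
                current = []
            continue
        current.append(line.split(sep))
    if current:
        blocks.append(current)
    return blocks


# Hypothetical two-column sample: token, label.
sample = "肺\tB-X\n癌\tI-X\n\n疑い\tO\n"
blocks = read_conll(sample)
print(len(blocks), "blocks;", sum(len(b) for b in blocks), "tokens")
```

Checking a converted file with a reader like this before training can catch empty blocks or rows with an unexpected number of columns.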
Installation (python3.8)
git clone https://github.com/racerandom/JaMIE.git
cd JaMIE
Required Python packages
pip install -r requirements.txt
Morphological analyzer required:
Pretrained BERT required:
NICT-BERT (NICT_BERT-base_JapaneseWikipedia_32K_BPE)
Train:
CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--pretrained_model $PRETRAINED_BERT \
--train_file $TRAIN_FILE \
--dev_file $DEV_FILE \
--dev_output $DEV_OUT \
--saved_model $MODEL_DIR_TO_SAVE \
--enc_lr 2e-5 \
--batch_size 4 \
--warmup_epoch 2 \
--num_epoch 20 \
--do_train \
--fp16  # apex required
Models trained on radiography interpretation reports of Lung Cancer (LC) and general medical reports of Idiopathic Pulmonary Fibrosis (IPF) will be made available: link1, link2.
Test:
CUDA_VISIBLE_DEVICES=$SEED python clinical_joint.py \
--saved_model $SAVED_MODEL \
--test_file $TEST_FILE \
--test_output $TEST_OUT \
--batch_size 4
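When launching many runs, it can be convenient to assemble these commands in Python rather than by hand. The sketch below builds argument lists using only the flags shown in the Train and Test commands. The helper name `build_command` and all file paths are hypothetical placeholders, and nothing is executed here.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, bool]


def build_command(script: str, options: Dict[str, Value]) -> List[str]:
    """Turn an options mapping into an argv list for `python <script>`."""
    argv = ["python", script]
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Boolean options such as --do_train are passed as bare flags.
            if value:
                argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv


# All paths below are hypothetical placeholders.
train_cmd = build_command("clinical_joint.py", {
    "pretrained_model": "path/to/NICT_BERT",
    "train_file": "data/train.conll",
    "dev_file": "data/dev.conll",
    "dev_output": "out/dev.conll",
    "saved_model": "models/run1",
    "enc_lr": "2e-5",
    "batch_size": 4,
    "warmup_epoch": 2,
    "num_epoch": 20,
    "do_train": True,
})
test_cmd = build_command("clinical_joint.py", {
    "saved_model": "models/run1",
    "test_file": "data/test.conll",
    "test_output": "out/test.conll",
    "batch_size": 4,
})
print(shlex.join(train_cmd))
print(shlex.join(test_cmd))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)`, with `CUDA_VISIBLE_DEVICES` set through the `env` argument to select a GPU.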
Batch Converter from XML (or raw text) to CONLL for Train/Test
Convert XML files to CONLL files for Train/Test. You can also convert raw text to CONLL-style for Test.
# --cv_num 5: 5-fold cross-validation; 0 generates a single CONLL file
# --doc_level: generate document-level CONLL files ([SEP] denotes sentence boundaries) instead of sentence-level ones
# --segmenter mecab: please use mecab with NICT-BERT for now
python data_converter.py \
--mode xml2conll \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR \
--cv_num 5 \
--doc_level \
--segmenter mecab \
--bert_dir $PRETRAINED_BERT
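To make the `--cv_num 5` setting concrete, the following sketch shows one generic way to partition documents into k folds, where each fold serves once as the held-out set. How data_converter.py actually assigns documents to folds is not stated here, so the round-robin assignment, the function name `k_fold_splits`, and the file names are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def k_fold_splits(docs: Sequence[str], k: int) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, held_out) pairs; every document is held out exactly once."""
    if k < 2:
        raise ValueError("k must be at least 2 for cross-validation")
    # Round-robin assignment: document i goes to fold i % k.
    folds = [list(docs[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        held_out = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        splits.append((train, held_out))
    return splits


# Hypothetical document names.
docs = [f"doc{i:02d}.xml" for i in range(10)]
splits = k_fold_splits(docs, 5)
for n, (train, held_out) in enumerate(splits):
    print(f"fold {n}: {len(train)} train, {len(held_out)} held out")
```

With 10 documents and k = 5, each fold trains on 8 documents and holds out 2.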
Batch Converter from predicted CONLL to XML
python data_converter.py \
--mode conll2xml \
--xml $XML_FILES_DIR \
--conll $OUTPUT_CONLL_DIR
Citation
If you use our code in your research, please cite our work:
@inproceedings{cheng2021jamie,
title={JaMIE: A Pipeline Japanese Medical Information Extraction System},
author={Fei Cheng and Shuntaro Yada and Ribeka Tanaka and Eiji Aramaki and Sadao Kurohashi},
booktitle={arXiv},
year={2021}
}