Generating Radiology Reports via Memory-driven Transformer


R2Gen

This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020.

Citations

If you use or extend our work, please cite our paper at EMNLP-2020.

@inproceedings{chen-emnlp-2020-r2gen,
    title = "Generating Radiology Reports via Memory-driven Transformer",
    author = "Chen, Zhihong and
      Song, Yan  and
      Chang, Tsung-Hui and
      Wan, Xiang",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2020",
}

Requirements

  • torch==1.5.1
  • torchvision==0.6.1
  • opencv-python==4.4.0.42
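
A minimal setup sketch, assuming a pip-based virtual environment; the pinned versions come from the list above and may need a CUDA-specific wheel on your machine:

    # optional: isolate the installation in a virtual environment
    # (an assumption, not part of the original instructions)
    python -m venv r2gen-env && source r2gen-env/bin/activate

    # install the pinned dependencies listed above
    pip install torch==1.5.1 torchvision==0.6.1 opencv-python==4.4.0.42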

Download R2Gen

You can download the models we trained for each dataset from here.

Datasets

We use two datasets (IU X-Ray and MIMIC-CXR) in our paper.

For IU X-Ray, you can download the dataset from here and then put the files in data/iu_xray.

For MIMIC-CXR, you can download the dataset from here and then put the files in data/mimic_cxr.
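
The layout below is a sketch of what the scripts appear to expect, inferred from the paths mentioned elsewhere on this page (an annotation.json plus an images directory under each dataset folder); check the repository for the authoritative structure:

    data/
    ├── iu_xray/
    │   ├── annotation.json
    │   └── images/
    └── mimic_cxr/
        ├── annotation.json
        └── images/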

Run on IU X-Ray

Run bash run_iu_xray.sh to train a model on the IU X-Ray data.
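
For reference, a sketch of the kind of main.py invocation the script wraps, assuming the flags quoted in the comments further below; the actual values in run_iu_xray.sh may differ:

    # train R2Gen on IU X-Ray (hyperparameter values are illustrative)
    python main.py \
        --image_dir data/iu_xray/images/ \
        --ann_path data/iu_xray/annotation.json \
        --dataset_name iu_xray \
        --max_seq_length 60 \
        --threshold 3 \
        --batch_size 16 \
        --epochs 100 \
        --save_dir results/iu_xray \
        --step_size 50 \
        --gamma 0.1 \
        --seed 9223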

Run on MIMIC-CXR

Run bash run_mimic_cxr.sh to train a model on the MIMIC-CXR data.
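
The MIMIC-CXR script presumably wraps an analogous main.py call; a minimal sketch, assuming the dataset name and paths follow the same convention as above:

    # remaining hyperparameters (max_seq_length, batch_size, epochs, etc.) omitted;
    # see run_mimic_cxr.sh in the repository for the actual values
    python main.py \
        --image_dir data/mimic_cxr/images/ \
        --ann_path data/mimic_cxr/annotation.json \
        --dataset_name mimic_cxr \
        --save_dir results/mimic_cxr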

Comments
  • About test code

    Hello Zhihong, thanks for open-sourcing your code. It's very nice work. I'd like to ask you a few questions about reproducing the paper's results.

    I evaluate the results by saving the generated sentences to a JSON file. When I resume from the checkpoint you provided, using the following command:

    CUDA_VISIBLE_DEVICES=4 python main.py \
    --image_dir data/iu/images/ \
    --ann_path data/iu/annotation.json \
    --dataset_name iu_xray \
    --max_seq_length 60 \
    --threshold 3 \
    --batch_size 16 \
    --epochs 100 \
    --save_dir results/reproduce_iu_xray \
    --step_size 50 \
    --gamma 0.1 \
    --seed 9223 \
    --resume data/model_iu_xray.pth
    

    I see "Checkpoint loaded. Resume training from epoch 15." and the model generates the output JSON files. I use pycocoevalcap to evaluate them. The results are as follows:

    | Bleu_1 | Bleu_2 | Bleu_3 | Bleu_4 | CIDEr | ROUGE_L | METEOR |
    | --- | --- | --- | --- | --- | --- | --- |
    | 0.4334 | 0.2863 | 0.2069 | 0.1554 | 0.5432 | 0.3245 | 0.1945 |

    These seem to differ from the paper somewhere. Could you share your test code or provide your generated results JSON file?

    opened by aspenstarss 11
  • Problem on the visualization

    Hi Zhihong, thank you for sharing your code. I am interested in the image-text attention visualizations in the paper. Can you share which approach you used for this (another repository or code)? I have been trying to do this but haven't found a solution for Transformer-based models.

    opened by linzhlalala 5
  • Datasets did not contain all the data

    Hi, according to your paper, the IU X-Ray dataset contains 5226, 748, and 1496 images for train, val, and test, respectively. However, the annotation.json in the dataset you published lists only 2069, 296, and 590 images. Is the shared dataset the one you used for training and testing? If not, could you share your split with me? The MIMIC-CXR dataset has a similar problem, with 270790, 2130, and 3858 images in the split, which does not match the numbers in your paper.

    opened by windstormer 2
  • AssertionError on MIMIC-CXR dataset

    I am training the model on the MIMIC-CXR dataset, but training always stops at epoch 11 with an AssertionError. The attached screenshot shows the error. Thank you very much.

    opened by WANGCHENYU123 2
  • BASE Model

    Hi, thanks for sharing the code. Your approach is very interesting. I wonder how I can run the code in baseline mode (the Base Model in Table 2). Thanks in advance!

    opened by nooralahzadeh 2
  • Inference

    Hello, and thank you for sharing your work! I was wondering how to use your code and pretrained model weights to run inference on custom data. Do you have a script or function for that? Thank you very much in advance!

    opened by luantunez 1
  • Calculation of clinical accuracy for MIMIC dataset

    Hello:

    Thanks for sharing the code.

    While going through the code, I could not find the calculation of the clinical accuracy metrics (Precision, Recall, and F1). Would it be possible to share that?

    Thanks

    opened by ashwanikumar04 1
  • Random seed

    Hi, could you please share the random seeds you used to generate the results for the 5 runs in your experiments on IU X-Ray and MIMIC? Thanks.

    opened by nooralahzadeh 1
  • The number of images in the MIMIC dataset

    train_image_num 270790 val_image_num 2130 test_image_num 3858

    Dear Zhihong,

    Your code is nice and clean. Thank you so much! Based on the JSON file you provided, I can only find around 270,000 images, which differs from Table 1 in your paper. The table has around 360,000 images. Did you apply any selection criteria?

    Kind Regards, Donghao

    opened by donghaozhang 1
  • MIMIC Dataset

    Hello, I am very interested in your work. On the MIMIC dataset, I cannot reproduce the results in the paper using the run_mimic_cxr.sh file. Could you provide the MIMIC dataset so I can learn more about your work? I submitted the Google Drive access request a long time ago.

    opened by ThatNight 0
  • Base Model without RM and MCLN

    @zhjohnchan @GuiminChen Hey, I wanted to run the model without the relational memory (RM) and MCLN. I tried to detach them by replacing MCLN with LN and ignoring the RM entirely, but a positional-arguments error still occurs in DecoderLayer. Can you please guide me on how to do it?

    Thanks.

    opened by jainnipun11 0
  • Does difference between torch and torchvision versions cause a big difference in evaluation results?

    Hi~ Since I am running the project on an RTX 3090 with CUDA 11.6, I configured it to use torch 1.8.1 and torchvision 0.9.1. However, after training, the evaluation results differ from the paper by 1% to 3%. 1. Do I need to follow your requirements to the letter to get the original results? Could different versions of torch and torchvision cause large differences in the evaluation results? 2. Also, do you use multiple GPUs for training? If so, can you share the settings?

    Looking forward to hearing from you~

    opened by Ammexm 0
  • Access to MIMIC-CXR

    The MIMIC-CXR download link has been restricted. What is the way to request access? Is the MIMIC-CXR used in the study the same as MIMIC-CXR-JPG, so that I can just download it from there?

    Thanks

    opened by 2533245542 0
  • Wrong number of patients in the official IU_XRAY instances in the drive data link

    Hello authors. First of all, fascinating work 🚀! However, while studying your research and the drive link you provided, I found a wrong number of instances for patients that have only two images in the IU X-Ray dataset.

    All the data you provided us: [screenshot]

    I downloaded the official IU X-Ray dataset as well, and after conducting exploratory data analysis, I found that there are more patients with only two images than you provided: [screenshot]

    opened by zaaachos 0