MedViLL
This repository provides the code for MedViLL (Medical Vision Language Learner).
Our proposed architecture, MedViLL, is a single BERT-based model that learns a unified contextualized vision-language (VL) representation for both Vision-Language Understanding (VLU) and Vision-Language Generation (VLG). MedViLL performs pre-training with a CNN-based visual encoder and a cross-modal Transformer for VL joint representation learning. After pre-training, our model can easily be applied to VLU and VLG tasks with task-specific fine-tuning. Please refer to our paper "Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training" for more details.
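For orientation, the joint embedding can be pictured as CNN region features concatenated with BERT text embeddings and passed through a single Transformer encoder. The sketch below is only a rough illustration of that idea; the class and variable names (e.g. `VisionLanguageEncoder`, `visual_proj`) are made up here and do not mirror the repository's actual code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BertModel


class VisionLanguageEncoder(nn.Module):
    """Conceptual sketch only: CNN visual features and text embeddings share one BERT encoder."""

    def __init__(self, hidden_size=768):
        super().__init__()
        cnn = resnet50(pretrained=True)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])   # keep the spatial feature map
        self.visual_proj = nn.Linear(2048, hidden_size)             # project CNN channels to BERT's hidden size
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # 12 layers, 12 heads, 768 hidden

    def forward(self, images, input_ids, attention_mask):
        # Visual "tokens": flatten the CNN feature map into a sequence of region features.
        feats = self.backbone(images)                                       # (B, 2048, H, W)
        b, c, h, w = feats.shape
        visual_tokens = self.visual_proj(feats.flatten(2).transpose(1, 2))  # (B, H*W, 768)

        # Text tokens: reuse BERT's own embedding layer.
        text_embeds = self.bert.embeddings(input_ids=input_ids)             # (B, T, 768)

        # Joint sequence [visual; text] with one attention mask covering both modalities.
        joint = torch.cat([visual_tokens, text_embeds], dim=1)
        vis_mask = torch.ones(b, h * w, device=images.device)
        joint_mask = torch.cat([vis_mask, attention_mask.float()], dim=1)
        ext_mask = (1.0 - joint_mask)[:, None, None, :] * -10000.0          # additive mask for BertEncoder
        return self.bert.encoder(joint, attention_mask=ext_mask).last_hidden_state
```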
1) Downloads.
Pre-trained weights.
We provide five versions of BERT-based pre-trained weights with different types of self-attention masks. Pre-training for the joint embedding was built on the BERT-base architecture (12 hidden layers, 12 attention heads, 768 hidden size), and training details are described in our paper. The currently available versions of pre-trained weights are listed below (a rough sketch of how these mask variants can be constructed follows the list):
- MedViLL - BERT-Base model with Bidirectional Auto-regressive attention mask.
- Bi & Seq2Seq - BERT-Base model with Seq2Seq attention mask (75%) and Bidirectional attention mask (25%) in every mini-batch.
- Bidirectional - BERT-Base model with Bidirectional attention mask.
- Seq2Seq - BERT-Base model with Seq2Seq attention mask.
- Non-cross - BERT-Base model with Non-cross modality attention mask.
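For intuition, here is a minimal NumPy sketch of how three of these self-attention masks over a joint [image; text] sequence could be constructed (the exact construction in this repository, including the Bidirectional Auto-regressive variant, may differ in details):

```python
import numpy as np

def build_attention_mask(n_img, n_txt, mode):
    """Return an (L, L) mask where 1 means query position i may attend to key position j.
    The joint sequence is [image tokens (n_img); text tokens (n_txt)]."""
    L = n_img + n_txt
    mask = np.zeros((L, L), dtype=np.int64)
    if mode == "bidirectional":
        mask[:, :] = 1                                    # every token sees every token
    elif mode == "seq2seq":
        mask[:n_img, :n_img] = 1                          # image tokens see all image tokens
        mask[n_img:, :n_img] = 1                          # text tokens see all image tokens
        causal = np.tril(np.ones((n_txt, n_txt), dtype=np.int64))
        mask[n_img:, n_img:] = causal                     # text tokens see only preceding text tokens
    elif mode == "non_cross":
        mask[:n_img, :n_img] = 1                          # image attends only within the image modality
        mask[n_img:, n_img:] = 1                          # text attends only within the text modality
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask
```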
Datasets.
We provide a pre-processed version of multiple datasets for each task as follows:
Download each dataset to the path /data/[dataset].
- MIMIC-CXR (2.27 GB): 91,685 AP-view image and report pairs from unique studies.
- OPEN-I (74.1 MB): 3,547 AP- and PA-view image and report pairs from unique studies in the official Open-I dataset.
- VQA-RAD (402 MB): 3,515 question-answer pairs on 315 images (104 head CTs or MRIs, 107 chest X-rays, and 104 abdominal CTs).
We also provide JSON files containing the paths used for validation and testing in the retrieval tasks; download each file to the path /data/[dataset].
Image-to-report retrieval
1) MIMIC valid, 2) MIMIC test, 3) OpenI test
Report-to-image retrieval
1) MIMIC valid, 2) MIMIC test, 3) OpenI test
2) Reproduce.
Section A. Installation
The sections below describe the virtual environment installation and the fine-tuning process of MedViLL, based on PyTorch 1.7 and Python 3.8. To fine-tune MedViLL, you first need to download its pre-trained weights. After downloading the pre-trained weights, use medvill.yaml to create a conda-based virtual environment as follows:
$ git clone https://github.com/SuperSupermoon/MedViLL.git
$ cd MedViLL; conda env create --file medvill.yaml
Note that all fine-tuning runs were conducted on 8 GeForce RTX 3090 GPUs, each with 24GB of VRAM.
Section B. Prepare pre-processed dataset
Unzip the mimic, openi, and VQA-RAD tar.gz files.
$ cd MedViLL; tar -zxvf [file_name.tar.gz]
Section C. Pre-training model
Example:
$ cd MedViLL
$ python main.py
Section D. Downstream model
- Diagnosis Classification Example:
$ cd MedViLL/downstream_task/classification
$ python cls.py
- Image-Report Retrieval Example:
$ cd MedViLL/downstream_task/retrieval
$ python retrieval.py
- Medical Visual Question Answering Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks vqa --s2s_prob 0 --bi_prob 1 --mask_prob 0
- Report Generation Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks report_generation --mask_prob 0.15 --s2s_prob 1 --bi_prob 0
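As a rough guide to these flags (their exact semantics live in finetune.py and may differ): --s2s_prob and --bi_prob appear to control how often the Seq2Seq and Bidirectional attention masks are chosen for a training example, and --mask_prob sets the token masking rate. A hedged sketch of that interpretation:

```python
import random

def choose_mask_mode(s2s_prob, bi_prob):
    """Assumed interpretation: pick the attention-mask type for one training example.
    E.g. report generation uses --s2s_prob 1 --bi_prob 0, VQA uses --s2s_prob 0 --bi_prob 1."""
    assert abs(s2s_prob + bi_prob - 1.0) < 1e-6
    return "seq2seq" if random.random() < s2s_prob else "bidirectional"

def should_mask_token(mask_prob):
    """Assumed interpretation: each text token is masked with probability mask_prob
    (0.15 for report generation, 0 for VQA, where no masked-LM objective is used)."""
    return random.random() < mask_prob
```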