Overview

MedViLL

This repository provides the code for MedViLL (Medical Vision Language Learner).


Our proposed architecture, MedViLL, is a single BERT-based model that learns a unified contextualized vision-language (VL) representation for both Vision-Language Understanding (VLU) and Vision-Language Generation (VLG). MedViLL is pre-trained with a CNN-based visual encoder and a cross-modal Transformer for joint VL representation learning; a minimal sketch of this joint input construction follows below. After pre-training, the model can be applied to VLU and VLG tasks with task-specific fine-tuning. Please refer to our paper "Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training" for more details.
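
As a rough illustration only (a minimal sketch under assumed details, not the actual MedViLL implementation; the ResNet-50 backbone, feature dimensions, and layer counts here are assumptions), the following PyTorch snippet shows how a CNN feature map can be projected to BERT's hidden size and concatenated with text token embeddings into one sequence for a BERT-style encoder:

import torch
import torch.nn as nn
import torchvision.models as models

class JointEmbeddingSketch(nn.Module):
    """Sketch of a unified vision-language input: [visual tokens; text tokens]."""
    def __init__(self, hidden_size=768, vocab_size=30522):
        super().__init__()
        backbone = models.resnet50()                                # hypothetical backbone choice
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep the spatial feature map
        self.visual_proj = nn.Linear(2048, hidden_size)             # project CNN features to BERT size
        self.token_emb = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=12)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)  # BERT-base depth

    def forward(self, image, token_ids):
        feat = self.cnn(image)                       # [B, 2048, H', W']
        feat = feat.flatten(2).transpose(1, 2)       # [B, H'*W', 2048] visual "tokens"
        joint = torch.cat([self.visual_proj(feat),
                           self.token_emb(token_ids)], dim=1)  # unified VL sequence
        out = self.encoder(joint.transpose(0, 1))    # Transformer expects [S, B, E]
        return out.transpose(0, 1)                   # contextualized VL states

out = JointEmbeddingSketch()(torch.randn(1, 3, 512, 512),
                             torch.randint(0, 30522, (1, 32)))  # -> [1, 256 + 32, 768]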

1) Downloads.

Pre-trained weights.

We provide five versions of BERT-based pre-trained weights with different types of self-attention masks. Pre-training for the joint embedding was built on the BERT-base architecture (12 hidden layers, 12 attention heads, 768 hidden size); training details are described in our paper. A sketch of how such masks can be constructed appears after the list below. The currently available versions of pre-trained weights are as follows:

  • MedViLL - BERT-Base model with Bidirectional Auto-regressive attention mask.

  • Bi & Seq2Seq - BERT-Base model with Seq2Seq attention mask (75%) and Bidirectional attention mask (25%) in every mini-batch.

  • Bidirectional - BERT-Base model with Bidirectional attention mask.

  • Seq2Seq - BERT-Base model with Seq2Seq attention mask.

  • Non-cross - BERT-Base model with Non-cross modality attention mask.
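
As a rough sketch of how such masks can be constructed (this follows the standard UniLM-style masking scheme and is an assumption, not the exact MedViLL code), consider a joint sequence of num_visual image tokens followed by num_text text tokens, where 1 marks an allowed attention connection:

import torch

def bidirectional_mask(num_visual, num_text):
    n = num_visual + num_text
    return torch.ones(n, n)                     # every position attends everywhere

def seq2seq_mask(num_visual, num_text):
    n = num_visual + num_text
    mask = torch.zeros(n, n)
    mask[:num_visual, :num_visual] = 1          # image region: full self-attention
    mask[num_visual:, :num_visual] = 1          # text attends to all image tokens
    mask[num_visual:, num_visual:] = torch.tril(torch.ones(num_text, num_text))  # text: left-to-right only
    return mask

def non_cross_mask(num_visual, num_text):
    n = num_visual + num_text
    mask = torch.zeros(n, n)
    mask[:num_visual, :num_visual] = 1          # image attends only within image
    mask[num_visual:, num_visual:] = 1          # text attends only within text
    return mask

Under this reading, the Bi & Seq2Seq weights above correspond to sampling seq2seq_mask for 75% of mini-batches and bidirectional_mask for the remaining 25%.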

Datasets.

We provide a pre-processed version of multiple datasets for each task as follows:

Download each dataset to the path /data/[dataset].

  • MIMIC-CXR (2.27 GB): 91,685 unique studies of AP-view image and associated report pairs.
  • OPEN-I (74.1 MB): 3,547 unique studies of AP- and PA-view image-report pairs from the official Open-I dataset.
  • VQA-RAD (402 MB): 3,515 question-answer pairs on 315 images (104 head CTs or MRIs, 107 chest X-rays, and 104 abdominal CTs).

We also provide JSON files listing the paths used for validation in the retrieval task; download each file to the path /data/[dataset].

Image-to-report retrieval

  1) MIMIC valid, 2) MIMIC test, 3) OpenI test

Report-to-image retrieval

  1) MIMIC valid, 2) MIMIC test, 3) OpenI test

2) Reproduce.

Section A. Installation

The sections below describe the virtual environment installation and the fine-tuning process of MedViLL, based on PyTorch 1.7 and Python 3.8. To fine-tune MedViLL, you need to download its pre-trained weights. After downloading the pre-trained weights, use medvill.yaml to create a conda-based virtual environment as follows:

$ git clone https://github.com/SuperSupermoon/MedViLL.git
$ cd MedViLL; conda env create --file medvill.yaml

Note that all fine-tuning experiments were conducted on 8 GeForce RTX 3090 GPUs, each with 24GB of VRAM.

Section B. Prepare pre-processed dataset

Unzip the mimic, openi, and VQA-RAD tar.gz files.

$ cd MedViLL; tar -zxvf [file_name.tar.gz]

Section C. Pre-training model

Example:

$ cd MedViLL
$ python main.py

Section D. Downstream model

  • Diagnosis Classification Example:
$ cd MedViLL/downstream_task/classification
$ python cls.py
  • Image-Report Retrieval Example:
$ cd MedViLL/downstream_task/retrieval
$ python retrieval.py
  • Medical Visual Question Answering Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks vqa --s2s_prob 0 --bi_prob 1 --mask_prob 0
  • Report Generation Example:
$ cd MedViLL/downstream_task/report_generation_and_vqa
$ python finetune.py --tasks report_generation --mask_prob 0.15 --s2s_prob 1 --bi_prob 0
Comments
  • Training Error

    Hey! I unzipped the images to the suggested path, but I still keep getting:

    FileNotFoundError: [Errno 2] No such file or directory: '/home/data_storage/mimic-cxr/dataset/image_preprocessing/re_512_3ch/Train/s50328096.jpg'

    Can you explain why this error occurs?

    Thanks.

    opened by jainnipun11 4
  • Cannot achieve the various metrics described in the paper after running the code.

    Hello, after running your code I cannot reproduce the metrics described in the paper. Could you please provide the hyperparameters you set before running, and the training log produced after running?

    opened by PengPeixi 3
  • dataset path

    Nice work! I followed the README and tried to run this code on VQA-RAD, but I ran into some problems with the dataset paths. I revised "image_root" and "scr_file" and got path errors like "/home/mimic-cxr/dataset/.....". I also tried to change "fixed_path", but the datasets you provide do not match the paths in the code, e.g. "/home/mimic-cxr/dataset/vqa_image/". What should I do?

    opened by xixihawokao 2
  • .jsonl image_path issue

    Inside the .jsonl file, the image paths do not match the paths I have for the images. Because of this, the following error occurs:

    FileNotFoundError: [Errno 2] No such file or directory: '/home/mimic-cxr/dataset/image_preprocessing/re_512_3ch/Train/s58694447.jpg'

    Could you please guide me? Thanks.

    opened by jainnipun11 2
  • Fetching relevants position embeddings instead of all. May increase s…

    Hi. Only the first and last position embeddings are relevant, but all of them were being computed. I updated the code to fetch only the relevant position embeddings (the first and last). I expect this to improve speed and save some memory as well.

    opened by hammad26 2
  • Finetuning error

    Sir, I am getting this error while finetuning the Medical Visual Question Answering example:

    Traceback (most recent call last):
      File "finetune.py", line 476, in <module>
        main()
      File "finetune.py", line 411, in main
        for step, batch in enumerate(iter_bar):
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
        for obj in iterable:
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 644, in __next__
        data = self._next_data()
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1337, in _next_data
        return self._process_data(data)
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1363, in _process_data
        data.reraise()
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/_utils.py", line 475, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/cse/ckm/py37/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
        return self.collate_fn(data)
      File "/home/cse/ckm/code/seelam/code/MedViLL/downstream_task/report_generation_and_vqa/loader_utils.py", line 23, in batch_list_to_batch_tensors
        batch_tensors.append(torch.stack(x))
    RuntimeError: stack expects each tensor to be equal size, but got [3, 434, 329] at entry 0 and [3, 642, 1024] at entry 1

    Can you please check it? (A possible fix is sketched below.)

    opened by praveenkumar-ctrl 1
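
    The traceback above indicates that images of different sizes reach torch.stack() in the collate function. One possible workaround (an assumption, not a maintainer-confirmed fix) is to force every image to a fixed resolution in the dataset transform, for example:

    from PIL import Image
    import torchvision.transforms as transforms

    # Hypothetical fix: torch.stack() requires equal shapes, so resize every
    # image to one fixed resolution before batching (the repo's preprocessed
    # MIMIC images appear to be 512x512, per the re_512_3ch directory name).
    fixed_transform = transforms.Compose([
        transforms.Resize((512, 512)),   # identical spatial dims for every sample
        transforms.ToTensor(),           # -> [3, 512, 512] float tensor
    ])

    img = Image.open("example.jpg").convert("RGB")   # "example.jpg" is a placeholder
    tensor = fixed_transform(img)                    # now stacks cleanly across a batch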
  • Recall calculation

    In the results reported in the paper, did you consider a positive pair for the recall calculation to be only the original report-image pair, or all pairs with the same positive CheXpert labels? In addition, you mention in the paper that during inference the model is given 100 report-image pairs per trial; is the retrieval performed each time out of those 100, and is the value reported in the paper the average over all of those 15 sets of 100 pairs?

    opened by gefend 1
  • Finetuned Models

    It looks like the shared models are pre-trained models from before fine-tuning. Would it be possible to get access to the two fine-tuned models (the MIMIC-CXR/Open-I generation fine-tuned models)? I would use them exclusively for research purposes and cite this work accordingly. Thank you so much for this repository.

    Jungo https://homes.cs.washington.edu/~jkasai/

    opened by jungokasai 1
  • Open I dataset

    Hello,

    Thank you for your paper and repo. I have a few questions regarding your use of the OpenI dataset.

    1. The original OpenI dataset has around 1,500 manual labels, of which you kept only 15. Is the code for this transformation somewhere in this repo?
    2. In the original OpenI dataset, around 40% of the cases are annotated as normal, but in Figure 5 of your paper the "No Findings" tag appears only 4.43% of the time.
    3. In your JSON files (for example in MedViLL/data/openi/Train.jsonl), the label is sometimes set to an empty string. Does this mean that the instance belongs to the "Others" category?
    4. Since MedViLL can be asked to do multi-label classification, it is possible for the "No finding" label to be predicted together with another label. Isn't it wrong to predict "No finding" and "Pneumonia" together, since those labels conflict?

    Thank you.

    opened by Christoforos00 1
  • Issues related to radiology report generation

    Hi, I want to run the report_label_eval.py file to get the accuracy, precision, recall, and F1 of the fine-tuned model, but I can't find any files in the GitHub repo that generate the positive, negative, and ambiguous tags. Could you please provide this file? In addition, the repository shows that the beam_size argument does not exist in generation decoding. How can I get a BLEU score with a beam size of 4?

    opened by PengPeixi 0
  • [help]request for training logs

    @SuperSupermoon Hi Moon, I have made some changes to MedViLL. It would be greatly appreciated if you could share the training logs, which would be very helpful for me in comparing the models. Thanks.

    opened by Adam-lxd 0
  • About retrieval finetune question?

    Can you share the parameter settings you used for retrieval fine-tuning? I tried the default settings in the code, but my results differ considerably from the paper.

    opened by Subury 2
  • Error installing environment

    Hey, I tried to follow your steps for the MedViLL installation and had some errors installing packages on Windows 11 with a Python 3.8 environment and conda 22.9.0.

    I created a new environment with conda create -n medvill python=3.8 and updated it with conda env update --name medvill --file medvill.yaml --prune.

    These throw the following error:

    Solving environment: failed

    ResolvePackageNotFound:

    • pandas==1.1.3=py38he6710b0_0
    • setuptools==50.3.0=py38hb0f4dca_1
    • libffi==3.3=he6710b0_2
    • pytorch==1.7.0=py3.8_cuda11.0.221_cudnn8.0.3_0
    • pillow==8.0.1=py38he98fc37_0
    • mkl_fft==1.2.0=py38h23d657b_0
    • cudatoolkit==11.0.221=h6bb024c_0
    • zlib==1.2.11=h7b6447c_3
    • sqlite==3.33.0=h62c20be_0
    • readline==8.0=h7b6447c_0
    • xz==5.2.5=h7b6447c_0
    • python==3.8.5=h7579374_1
    • libpng==1.6.37=hbc83047_0
    • numpy-base==1.19.2=py38hfa32c7d_0
    • libuv==1.40.0=h7b6447c_0
    • tk==8.6.10=hbc83047_0
    • ld_impl_linux-64==2.33.1=h53a641e_7
    • freetype==2.10.4=h5ab3b9f_0
    • mkl_random==1.1.1=py38h0573a6f_0
    • numpy==1.19.2=py38h54aff64_0
    • libstdcxx-ng==9.1.0=hdf63c60_0
    • lz4-c==1.9.2=heb0550a_3
    • certifi==2020.11.8=py38h06a4308_0
    • libtiff==4.1.0=h2733197_1
    • libgcc-ng==9.1.0=hdf63c60_0
    • jpeg==9b=h024ee3a_2
    • ncurses==6.2=he6710b0_1
    • mkl-service==2.3.0=py38he904b0f_0
    • openssl==1.1.1h=h7b6447c_0
    • lcms2==2.11=h396b838_0
    • libedit==3.1.20191231=h14c3975_1
    • zstd==1.4.5=h9ceee32_0

    I also tried installing with pip inside the (medvill) environment, which threw numpy dependency issues; after manually installing 1.19, torch had conflicting dependencies. Any help would be appreciated, thank you. (A possible workaround is sketched below.)

    opened by Zickbad 0
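
    The ResolvePackageNotFound list above consists of packages pinned with Linux-specific build strings (e.g. py38he6710b0_0), which conda cannot resolve on Windows. A possible workaround (a sketch under that assumption, not an official fix; medvill_nobuild.yaml is a hypothetical output name) is to strip the build-string field so conda resolves platform-appropriate builds:

    import re

    # Drop the trailing build string from specs like
    # "  - pandas==1.1.3=py38he6710b0_0" -> "  - pandas==1.1.3".
    with open("medvill.yaml") as src, open("medvill_nobuild.yaml", "w") as dst:
        for line in src:
            dst.write(re.sub(r"^(\s*-\s*[A-Za-z0-9_.\-]+={1,2}[\w.]+)=\S+$", r"\1", line))

    Running conda env create --file medvill_nobuild.yaml should then resolve builds for the current platform, though some pinned versions may still conflict on Windows.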
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks that all tarfile members will be extracted safely and throws an exception otherwise (a sketch of this check appears below). We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
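
    As context for the patch described above, here is a minimal sketch of the kind of check it performs (an illustration of the general mitigation, not the exact Trellix pull request): each member's resolved destination is verified to stay inside the extraction directory before extractall() runs.

    import os
    import tarfile

    def safe_extractall(tar, path="."):
        """Refuse to extract members whose resolved path escapes `path`."""
        dest = os.path.realpath(path)
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise RuntimeError("Blocked path traversal in tar member: " + member.name)
        tar.extractall(path)

    # Usage, e.g. for the dataset archives from Section B ("mimic.tar.gz" is illustrative):
    with tarfile.open("mimic.tar.gz", "r:gz") as tar:
        safe_extractall(tar, "data")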
  • Attention Visualization

    Hi! Thanks for the awesome work. Could you share the code for the attention visualization experiment at the end of the paper? Or could you let us know which attention layers were used for the visualization? Thank you!

    Best, Khoa

    opened by khoapip 1