Joint learning of images and text via maximization of mutual information

Ruizhi Liao

Last update: Dec 22, 2022

Overview

mutual_info_img_txt

Joint learning of images and text via maximization of mutual information.

This repository implements the algorithms presented in:
Ruizhi Liao, Daniel Moyer, Miriam Cha, Keegan Quigley, Seth Berkowitz, Steven Horng, Polina Golland, William M. Wells. Multimodal Representation Learning via Maximization of Local Mutual Information. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021.

This repo is a work in progress. So far, we have released the code for joint representation learning of images and text, which maximizes the mutual information between the feature embeddings of the two modalities. We demonstrate its application to learning from chest radiographs and radiology reports.
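To make the idea of maximizing mutual information between paired embeddings concrete, here is a minimal sketch of an InfoNCE-style lower bound. It is an illustration only, not the estimator used in this repository: the function name `info_nce_lower_bound`, the cosine-similarity critic, and the `temperature` parameter are all assumptions, and the sketch works on one global embedding per image and per report, whereas the paper works with local features. It is written in plain Python to stay dependency-free.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_lower_bound(img_emb, txt_emb, temperature=0.1):
    """InfoNCE-style lower bound on the mutual information of paired embeddings.

    img_emb, txt_emb: lists of equal length; entry i of each is a matched
    image/text pair. All names here are hypothetical, for illustration only.
    """
    img = [_normalize(v) for v in img_emb]
    txt = [_normalize(v) for v in txt_emb]
    n = len(img)
    total = 0.0
    for i in range(n):
        # Cosine similarity of image i with every text in the batch.
        scores = [sum(a * b for a, b in zip(img[i], t)) / temperature for t in txt]
        # Numerically stable log-sum-exp over the row.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # Log-probability that image i picks its own text among the batch.
        total += scores[i] - log_norm
    # The bound is the mean log-probability plus log(batch size).
    return total / n + math.log(n)


aligned = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce_lower_bound(aligned, aligned))
```

The bound can never exceed the log of the batch size, and it approaches that ceiling when each image embedding is far more similar to its own report than to any other report in the batch. Training would maximize this quantity (or minimize its negative) with respect to the encoders.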

Instructions

Conda environment

Set up the conda environment using conda_environment.yml:

conda env create -f conda_environment.yml

BERT

Download the pre-trained BERT model, tokenizer, etc. from Dropbox. Download the whole folder bert_pretrain_all_notes_150000, which contains seven files, and pass its path to --bert_pretrained_dir.
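Before starting a long training run, it can help to confirm that the download is complete. The following is a small sketch, not part of this repository: the helper name `check_bert_dir` is hypothetical, and the only fact it relies on is the file count of seven stated above.

```python
from pathlib import Path


def check_bert_dir(bert_pretrained_dir, expected_files=7):
    """Return the sorted file names in the folder, raising if it looks incomplete.

    Hypothetical helper for illustration; it checks only the number of files,
    not their names or contents.
    """
    root = Path(bert_pretrained_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    if len(names) != expected_files:
        raise ValueError(
            f"expected {expected_files} files in {root}, found {len(names)}"
        )
    return names
```

Calling `check_bert_dir` on the path you intend to pass to --bert_pretrained_dir either lists the files or fails early with a clear message. Note that hidden files added by the operating system would also be counted.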

Model training

Train the model in an unsupervised fashion, i.e., by optimizing Eq. (2) of the paper:

python train_img_txt.py

The first time you run model training, tokenizing the text may take a while. The tokenized data is then saved and reused, so later runs skip this step.
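The tokenize-once-then-reuse behavior can be pictured with the sketch below. It is a generic caching pattern, not the code in train_img_txt.py: the function `load_or_tokenize`, the `tokenize` callable, the cache path, and the pickle format are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_tokenize(texts, tokenize, cache_path):
    """Tokenize `texts` once and reuse the saved result on later calls.

    Hypothetical helper: `tokenize` is any callable mapping a string to tokens,
    and `cache_path` is where the tokenized data is stored.
    """
    cache = Path(cache_path)
    if cache.exists():
        # Later runs: load the saved tokens instead of tokenizing again.
        with cache.open("rb") as f:
            return pickle.load(f)
    # First run: tokenize everything, then save the result for reuse.
    tokens = [tokenize(text) for text in texts]
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as f:
        pickle.dump(tokens, f)
    return tokens
```

One consequence of this pattern is that a stale cache is reused silently: if the reports or the tokenizer change, the cached file has to be deleted by hand for the change to take effect.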

Notes on Data

MIMIC-CXR

We have evaluated this algorithm on MIMIC-CXR, a large publicly available dataset of chest x-ray images with free-text radiology reports. The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA.

Example data

We provide 16 example image-text pairs to test the code, listed in training_chexpert_mini.csv.
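A quick way to inspect the example list before running the code is to read its header and count its rows. The sketch below assumes only that the file is a standard CSV with a header row; it does not assume any particular column names, and `summarize_pairs` is a hypothetical helper rather than part of this repository.

```python
import csv


def summarize_pairs(csv_path):
    """Return (column names, number of data rows) for a CSV with a header row."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return reader.fieldnames, len(rows)
```

For training_chexpert_mini.csv, the row count should match the 16 example pairs mentioned above if each pair occupies one row.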

Contact

Ruizhi (Ray) Liao: ruizhi [at] mit.edu
