Solving reinforcement learning tasks which require language and vision

Henry Prior

Last update: Feb 26, 2022

Related tags

Deep Learning multimodal-rl

Overview

Multimodal Reinforcement Learning

JAX implementations of the following multimodal reinforcement learning approaches.

Dual-coding Episodic Memory from "Grounded Language Learning Fast and Slow"

The goal in this setting is for the agent to be presented with multiple objects with made up names following "This is a _____" statements and to then carry out an instruction such as "Move the wazzle to the table." This task requires the agent to learn long-term language and vision representations for concepts like "This is a" and objects that carry over between episodes such as "table" while also being able to learn one-shot representations of novel objects and their names.

Usage

Start by setting up the environment locally by running

poetry install
poetry shell

The learning environment depends on Docker and requires that the Docker Desktop program is running (on Mac). Once that's done you can run the default environment (fast mapping with 3 objects from the paper).

python fast_slow_learning/main.py

Deep learning library for solving differential equations and more

DeepXDE Voting on whether we should have a Slack channel for discussion. DeepXDE is a library for scientific machine learning. Use DeepXDE if you need

1.4k Dec 29, 2022

Code for Graph-to-Tree Learning for Solving Math Word Problems (ACL 2020)

Graph-to-Tree Learning for Solving Math Word Problems PyTorch implementation of Graph based Math Word Problem solver described in our ACL 2020 paper G

66 Nov 23, 2022

A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A tour through tensorflow with financial data I present several models ranging in complexity from simple regression to LSTM and policy networks. The s

195 Dec 7, 2022

A module for solving and visualizing Schrödinger equation.

qmsolve This is an attempt at making a solid, easy to use solver, capable of solving and visualize the Schrödinger equation for multiple particles, an

506 Dec 28, 2022

IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.

IDRLnet IDRLnet is a machine learning library on top of PyTorch. Use IDRLnet if you need a machine learning library that solves both forward and inver

105 Dec 17, 2022

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

This is the official PyTorch implementation of the ALBEF paper [Blog]. This repository supports pre-training on custom datasets, as well as finetuning on VQA, SNLI-VE, NLVR2, Image-Text Retrieval on MSCOCO and Flickr30k, and visual grounding on RefCOCO+. Pre-trained and finetuned checkpoints are released.

805 Jan 9, 2023

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

40 Dec 14, 2022

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Vision Longformer This project provides the source code for the vision longformer paper. Multi-Scale Vision Longformer: A New Vision Transformer for H

209 Dec 30, 2022

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

Solving reinforcement learning tasks which require language and vision

Related tags

Overview

Multimodal Reinforcement Learning

Usage

You might also like...

Deep learning library for solving differential equations and more

Code for Graph-to-Tree Learning for Solving Math Word Problems (ACL 2020)

A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading

A module for solving and visualizing Schrödinger equation.

IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Owner

Henry Prior

Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"

CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks

Conservative Q Learning for Offline Reinforcement Reinforcement Learning in JAX

Reinforcement-learning - Repository of the class assignment questions for the course on reinforcement learning

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

A task-agnostic vision-language architecture as a step towards General Purpose Vision

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Repository providing a wide range of self-supervised pretrained models for computer vision tasks.