146 Repositories
Python multimodal-interactions Libraries
This script checks for any possible SSRF dns/http interactions in xmlrpc.php pingback feature
rpckiller This script checks for any possible SSRF dns/http interactions in xmlrpc.php pingback feature and with that you can further try to escalate
Discord Panel is an AIO panel for Discord that aims to have all the needed tools related to user token interactions, as in nuking and also everything you could possibly need for raids
Discord Panel Discord Panel is an AIO panel for Discord that aims to have all the needed tools related to user token interactions, as in nuking and al
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities
Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.)
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
img2dataset Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine. Also supports
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa
The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization [Paper] accepted at the EMNLP 2021: Vision Guided Genera
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Multimodal Deep Learning ๐ ๐ ๐ Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model
Generate vibrant and detailed images using only text.
CLIP Guided Diffusion From RiversHaveWings. Generate vibrant and detailed images using only text. See captions and more generations in the Gallery See
Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"
Focal Transformer This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transf
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba
FairyTailor: Multimodal Generative Framework for Storytelling
FairyTailor: Multimodal Generative Framework for Storytelling
It is a temporary project to study discord interactions. You can set permissions conveniently when you invite a particular disk code bot.
Permission Bot ๋์ค์ฝ๋ ๋ด์ ์๋ message-components ๋ฅผ ์ฐ๊ตฌํ๊ธฐ ์ํ์ฌ ์ ์๋ ๋ด์ ๋๋ค. Setup /config/config_example.ini ํ์ผ์ /config/config.ini์ผ๋ก ๋ณํํฉ๋๋ค. config ํ์ผ์ ๊ธฐ๋ณธ ์์์ ์
Vvim - Keyboardless Vim interactions
This is done via a hardware glove that the user wears. The glove detects the finger's positions and translates them into key presses. It's currently a work in progress.
multimodal transformer
This repo holds the code to perform experiments with the multimodal autoregressive probabilistic model Transflower. Overview of the repo It is structu
CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)
CLIP (Contrastive LanguageโImage Pre-training) Experiments (Evaluation) Model Dataset Acc (%) ViT-B/32 (Paper) CIFAR100 65.1 ViT-B/32 (Our) CIFAR100 6
Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
The Second Situated Interactive MultiModal Conversations (SIMMC 2.0) Challenge 2021 Welcome to the Second Situated Interactive Multimodal Conversation
Preprocessed Datasets for our Multimodal NER paper
Unified Multimodal Transformer (UMT) for Multimodal Named Entity Recognition (MNER) Two MNER Datasets and Codes for our ACL'2020 paper: Improving Mult
Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain amount of time.
Smoking Simulation is an app to simulate the spreading of smokers and non-smokers, their interactions and population during certain
Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.
Episodic Transformers (E.T.) Episodic Transformer for Vision-and-Language Navigation Alexander Pashevich, Cordelia Schmid, Chen Sun Episodic Transform
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Semi Hand-Object Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time (CVPR 2021).
MERLOT: Multimodal Neural Script Knowledge Models
merlot MERLOT: Multimodal Neural Script Knowledge Models MERLOT is a model for learning what we are calling "neural script knowledge" -- representatio
Framework for joint representation learning, evaluation through multimodal registration and comparison with image translation based approaches
CoMIR: Contrastive Multimodal Image Representation for Registration Framework ๐ผ Registration of images in different modalities with Deep Learning ๐ค
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)
MusCaps: Generating Captions for Music Audio Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1 1 Queen Mary University of London, 2
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks (Scientific Reports)
SkipGNN: Predicting Molecular Interactions with Skip-Graph Networks Molecular interaction networks are powerful resources for the discovery. While dee
Code and datasets for the paper "Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction" (RA-L, 2021)
Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction This is the code for the paper Combining E
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. CLIP4Clip is a video-text retrieval model based
PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos
PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos. By adopting a unified pipeline-based API design, PyKale enforces standardization and minimalism, via reusing existing resources, reducing repetitions and redundancy, and recycling learning models across areas.
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020
XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj
Rethinking the U-Net architecture for multimodal biomedical image segmentation
MultiResUNet Rethinking the U-Net architecture for multimodal biomedical image segmentation This repository contains the original implementation of "M
Deep Multimodal Neural Architecture Search
MMNas: Deep Multimodal Neural Architecture Search This repository corresponds to the PyTorch implementation of the MMnas for visual question answering
gitใSelf-Attention Attribution: Interpreting Information Interactions Inside Transformerใ(AAAI 2021) GitHub:
Self-Attention Attribution This repository contains the implementation for AAAI-2021 paper Self-Attention Attribution: Interpreting Information Intera
CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
Diverse Structure Inpainting ArXiv | Papar | Supplementary Material | BibTex This repository is for the CVPR 2021 paper, "Generating Diverse Structure
[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"
FiGNN for CTR prediction The code and data for our paper in CIKM2019: Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Predicti
Learning Intents behind Interactions with Knowledge Graph for Recommendation, WWW2021
Learning Intents behind Interactions with Knowledge Graph for Recommendation This is our PyTorch implementation for the paper: Xiang Wang, Tinglin Hua
This repo provides the official code for TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/pdf/2103.04430.pdf).
TransBTS: Multimodal Brain Tumor Segmentation Using Transformer This repo is the official implementation for TransBTS: Multimodal Brain Tumor Segmenta
Solving reinforcement learning tasks which require language and vision
Multimodal Reinforcement Learning JAX implementations of the following multimodal reinforcement learning approaches. Dual-coding Episodic Memory from
This package allows interactions with the BuyCoins API.
The BuyCoins Python library allows interactions with the BuyCoins API from applications written in Python.
๐ท Instagram Bot - Tool for automated Instagram interactions
InstaPy Tooling that automates your social media interactions to โfarmโ Likes, Comments, and Followers on Instagram Implemented in Python using the Se
Typed interactions with the GitHub API v3
PyGitHub PyGitHub is a Python library to access the GitHub API v3 and Github Enterprise API v3. This library enables you to manage GitHub resources su
Automatically mock your HTTP interactions to simplify and speed up testing
VCR.py ๐ผ This is a Python version of Ruby's VCR library. Source code https://github.com/kevin1024/vcrpy Documentation https://vcrpy.readthedocs.io/ R
Open-AI's DALL-E for large scale training in mesh-tensorflow.
DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode
Useful tools for building interactions in Python
discord-interactions-python Types and helper functions for Discord Interactions webhooks. Installation Available via pypi: pip install discord-interac
Automatically mock your HTTP interactions to simplify and speed up testing
VCR.py ๐ผ This is a Python version of Ruby's VCR library. Source code https://github.com/kevin1024/vcrpy Documentation https://vcrpy.readthedocs.io/ R
Automatically mock your HTTP interactions to simplify and speed up testing
VCR.py ๐ผ This is a Python version of Ruby's VCR library. Source code https://github.com/kevin1024/vcrpy Documentation https://vcrpy.readthedocs.io/ R
A Comparative Framework for Multimodal Recommender Systems
Cornac Cornac is a comparative framework for multimodal recommender systems. It focuses on making it convenient to work with models leveraging auxilia