520 Python Cross-modal-retrieval Libraries

Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

MultiMAE: Multi-modal Multi-task Masked Autoencoders Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir Website | arXiv | BibTeX Official PyTo

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL)

385 Jan 6, 2023

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

364 Jan 6, 2023

Cross-platform CLI tool to generate your Github profile's stats and summary.

ghs Cross-platform CLI tool to generate your Github profile's stats and summary. Preview Hop on to examples for other usecases. Jump to: Installation

134 Dec 20, 2022

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)

Cross View Transformers This repository contains the source code and data for our paper: Cross-view Transformers for real-time Map-view Semantic Segme

363 Dec 25, 2022

This repo presents you the official code of "VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention"

VISTA VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention Shengheng Deng, Zhihao Liang, Lin Sun and Kui Jia* (*) Corresponding a

104 Dec 29, 2022

Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Alignment and Retrieval from a Single Image (CVPR 2022) Code release of our paper ROCA. Check out our video, paper, and website

123 Dec 25, 2022

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Versatile Multi-Modal Pre-Training for Human-Centric Perception Fangzhou Hong1 Liang Pan1 Zhongang Cai1,2,3 Ziwei Liu1* 1S-Lab, Nanyang Technologic

96 Jan 3, 2023

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP Russian Diffusio

232 Jan 4, 2023

Exploration & Research into cross-domain MEV. Initial focus on ETH/POLYGON.

xMEV, an apt exploration This is a small exploration on the xMEV opportunities between Polygon and Ethereum. It's a data analysis exercise on a few pa

7 Oct 18, 2022

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

Cross-Domain Adaptive Teacher for Object Detection This is the PyTorch implementation of our paper: Cross-Domain Adaptive Teacher for Object Detection

91 Dec 12, 2022

Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

mmc installation git clone https://github.com/dmarx/Multi-Modal-Comparators cd 'Multi-Modal-Comparators' pip install poetry poetry build pip install d

37 Nov 25, 2022

Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22) Paper Link | Project Page Abstract : Manual an

152 Dec 23, 2022

Get started with Machine Learning with Python - An introduction with Python programming examples

Machine Learning With Python Get started with Machine Learning with Python An engaging introduction to Machine Learning with Python TL;DR Download all

130 Jan 2, 2023

A library for end-to-end learning of embedding index and retrieval model

Poeem Poeem is a library for efficient approximate nearest neighbor (ANN) search, which has been widely adopted in industrial recommendation, advertis

54 Dec 21, 2022

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

475 Jan 4, 2023

Fully cross-platform toolkit (and library!) for MachO+Obj-C editing/analysis

fully cross-platform toolkit (and library!) for MachO+Obj-C editing/analysis. Includes a cli kit, a curses GUI, ObjC header dumping, and much more.

301 Dec 28, 2022

automatically crawl every URL and find cross site scripting (XSS)

scancss Fastest tool to find XSS. scancss is a fastest tool to detect Cross Site scripting (XSS) automatically and it's also an intelligent payload ge

30 Sep 24, 2022

Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Retrieval-Augmented Denoising Diffusion Probabilistic Models (wip) Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in P

55 Jan 1, 2023

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Bridging Video-text Retrieval with Multiple Choice Questions, CVPR 2022 (Oral) Paper | Project Page | Pre-trained Model | CLIP-Initialized Pre-trained

Applied Research Center (ARC), Tencent PCG

99 Jan 6, 2023

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

Unified Multi-modal Transformers This repository maintains the official implementation of the paper UMT: Unified Multi-modal Transformers for Joint Vi

84 Jan 4, 2023

A cross-platform script to book first available time for getting a passport in Sweden - Ett skript som automatiskt bokar pass hos polisen

Automatic passport booker - Boka pass automatiskt hos Svenska polisen A cross-platform script to book first available time for getting a passport in S

14 Oct 17, 2022

PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M2HSE) PyTorch code fo

6 Dec 23, 2022

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Introduction This codebase contains source-code of the Python-based implementation (ARES) of our SIGIR 2022 paper. Chen, Jia, et al. "Axiomatically Re

17 Nov 9, 2022

Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

UNICORN 🦄 Webpage | Paper | BibTex PyTorch implementation of "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" pap

118 Jan 6, 2023

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

MMChat This repo contains the code and data for the LREC2022 paper MMChat: Multi-Modal Chat Dataset on Social Media. Dataset MMChat is a large-scale d

47 Jan 3, 2023

4st place solution for the PBVS 2022 Multi-modal Aerial View Object Classification Challenge - Track 1 (SAR) at PBVS2022

A Two-Stage Shake-Shake Network for Long-tailed Recognition of SAR Aerial View Objects 4st place solution for the PBVS 2022 Multi-modal Aerial View Ob

5 Nov 9, 2022

[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

49 Dec 18, 2022

"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.

ViGA: Video moment retrieval via Glance Annotation This is the official repository of the paper "Video Moment Retrieval from Text Queries via Single F

38 Dec 31, 2022

Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Retrieval.

Targeted Trojan-Horse Attacks on Language-based Image Retrieval Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Re

7 Aug 23, 2022

This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 - treatments and vaccinations.

Project: Text Analysis - This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 -

1 Mar 14, 2022

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

11 Oct 20, 2022

Flexible-Modal Face Anti-Spoofing: A Benchmark

Flexible-Modal FAS This is the official repository of "Flexible-Modal Face Anti-

22 Nov 10, 2022

Faune proche - Retrieval of Faune-France data near a google maps location

faune_proche Récupération des données de Faune-France près d'un lieu google maps

4 Feb 15, 2022

Apk Framework Detector

🚀🚀🚀Program helps you to detect the major framework or technology used in writing any android app. Just provide the apk 😇😇

10 Dec 7, 2022

HCQ: Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval

HCQ: Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [toc] 1. Introduction This repository provides the code for our paper at

13 Dec 8, 2022

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

A Broad Study on the Transferability of Visual Representations with Contrastive Learning This repository contains code for the paper: A Broad Study on

29 Nov 9, 2022

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

TONet Introduction The official implementation of "TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music", in ICASSP 2022 We

29 Dec 1, 2022

Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

13 Feb 8, 2022

Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

Pruner for nested cross-validation - Sphinx-Doc Nested cross-validation is necessary to avoid biased model performance in embedded feature selection i

1 Dec 15, 2021

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics Overall pipeline of OCN. Paper Link: [arXiv] [AAAI

13 Nov 21, 2022

Kivy is an open source Python framework for creating cross-platform multi-touch mobile applications with Natural User Interface.

3 Feb 15, 2022

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

13.8k Jan 3, 2023

XSSearch - A comprehensive reflected XSS tool built on selenium framework in python

XSSearch A Comprehensive Reflected XSS Scanner XSSearch is a comprehensive refle

49 Oct 18, 2022

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

134 Jan 1, 2023

Learning Super-Features for Image Retrieval

Learning Super-Features for Image Retrieval This repository contains the code for running our FIRe model presented in our ICLR'22 paper: @inproceeding

101 Dec 28, 2022

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

1.3k Dec 31, 2022

Retrieval.pytorch - The code we used in [2020 DIGIX]

2 Feb 7, 2022

Audio Retrieval with Natural Language Queries: A Benchmark Study

Audio Retrieval with Natural Language Queries: A Benchmark Study Paper | Project page | Text-to-audio search demo This repository is the implementatio

21 Oct 31, 2022

repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments

repro_eval repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments. The measures were d

9 May 25, 2022

PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime Created by Prarthana Bhattacharyya. Disclaimer: This is n

5 Nov 8, 2022

Official code of Team Yao at Multi-Modal-Fact-Verification-2022

Official code of Team Yao at Multi-Modal-Fact-Verification-2022 A Multi-Modal Fact Verification dataset released as part of the De-Factify workshop in

11 Nov 15, 2022

Answer a series of contextually-dependent questions like they may occur in natural human-to-human conversations.

SCAI-QReCC-21 [leaderboards] [registration] [forum] [contact] [SCAI] Answer a series of contextually-dependent questions like they may occur in natura

19 Sep 28, 2022

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

1 Sep 7, 2022

HistoKT: Cross Knowledge Transfer in Computational Pathology

HistoKT: Cross Knowledge Transfer in Computational Pathology Exciting News! HistoKT has been accepted to ICASSP 2022. HistoKT: Cross Knowledge Transfe

5 Jan 5, 2023

This repository contains pre-trained models and some evaluation code for our paper Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning This repository contains pre-trained models and some evaluation

207 Jan 8, 2023

This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Cross-Descriptor Visual Localization and Mapping This repository contains the implementation of the following paper: "Cross-Descriptor Visual Localiza

81 Oct 6, 2022

Accelerated Multi-Modal MR Imaging with Transformers

Accelerated Multi-Modal MR Imaging with Transformers Dependencies numpy==1.18.5 scikit_image==0.16.2 torchvision==0.8.1 torch==1.7.0 runstats==1.8.0 p

54 Dec 16, 2022

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features | paper | Official PyTorch implementation for Mul

48 Dec 28, 2022

A secure and customizable bot for controlling cross-server announcements and interactions within Discord

DiscordBot A secure and customizable bot for controlling cross-server announcements and interactions within Discord. Within the code of the bot, you c

1 Jan 22, 2022

A Web app to Cross-Seed torrents in Deluge/qBittorrent/Transmission

SeedCross A Web app to Cross-Seed torrents in Deluge/qBittorrent/Transmission based on CrossSeedAutoDL Require Jackett Deluge/qBittorrent/Transmission

76 Dec 19, 2022

Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.

Value Retrieval with Arbitrary Queries for Form-like Documents Introduction Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-

13 Sep 15, 2022

Transformer in Vision

Transformer-in-Vision Recent Transformer-based CV and related works. Welcome to comment/contribute! Keep updated. Resource SCENIC: A JAX Library for C

1.1k Dec 30, 2022

TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning

TransZero++ This repository contains the testing code for the paper "TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning" submitted

3 Jan 5, 2022

Codes and scripts for "Explainable Semantic Space by Grounding Languageto Vision with Cross-Modal Contrastive Learning"

Visually Grounded Bert Language Model This repository is the official implementation of Explainable Semantic Space by Grounding Language to Vision wit

17 Dec 17, 2022

A PyTorch implementation of VIOLET

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling A PyTorch implementation of VIOLET Overview VIOLET is an implementati

119 Dec 30, 2022

[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

TransFusion-Pose TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei

29 Dec 23, 2022

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "

87 Dec 24, 2022

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization 📥 Download Datasets 📥 Download Trained Models INTRODUCTION TH2ZH (

5 Jan 3, 2022

Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). With use of Metric Learning and FAISS for fast similarity search on GPU

Cross-modal Retrieval using Transformer Encoder Reasoning Networks This project reimplements the idea from "Transformer Reasoning Network for Image-Te

5 Nov 5, 2022

SimplePyBLE - Python bindings for SimpleBLE

The ultimate fully-fledged cross-platform Python BLE library, designed for simplicity and ease of use.

27 Aug 28, 2022

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP The implementation of paper CLIP2Video: Mastering Video-Text Retrieval via Image CLIP. CLIP2

168 Dec 29, 2022

Cross-platform-profile-pic-changer - Script to change profile pictures across multiple platforms

cross-platform-profile-pic-changer script to change profile pictures across mult

4 Jan 17, 2022

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent

556 Jan 4, 2023

Use CLIP to represent video for Retrieval Task

A Straightforward Framework For Video Retrieval Using CLIP This repository contains the basic code for feature extraction and replication of results.

54 Dec 22, 2022

Collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets

The repository collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets. Additionally, it also collects many useful tutorials and tools in these related domains.

139 Dec 21, 2022

Meta learning algorithms to train cross-lingual NLI (multi-task) models

4 Nov 20, 2022

Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Neighbourhood Retrieval with Distance Correlation Assign Pseudo class labels to datapoints in the latent space. NNDC is a slim wrapper around FAISS. N

1 Jan 16, 2022

Continual Learning of Long Topic Sequences in Neural Information Retrieval

ContinualPassageRanking Repository for the paper "Continual Learning of Long Topic Sequences in Neural Information Retrieval". In this repository you

0 Apr 12, 2022

Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Bert Axioms This is the repository with the code for the Paper Diagnosing BERT with Retrieval Heuristics Required Data In order to run this code, you

5 Jan 21, 2022

ISBI 2022: Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image.

Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Introduction This repository contains the PyTorch implem

25 Nov 9, 2022

Crosschat - A bot for cross-server communication

CrossChat A bot for cross-server communication. Running the bot To run the bot y

8 May 15, 2022

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed

61 Dec 20, 2022

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

9 Jan 12, 2022

Image-retrieval-baseline - MUGE Multimodal Retrieval Baseline

MUGE Multimodal Retrieval Baseline This repo is implemented based on the open_cl

47 Dec 16, 2022

This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

33 Aug 26, 2022

Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement

Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement In this project, we proposed a Domain Disentanglement Faster-RCNN (DDF)

19 Nov 24, 2022

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

24 Dec 26, 2022

Local cross-platform machine translation GUI, based on CTranslate2

DesktopTranslator Local cross-platform machine translation GUI, based on CTranslate2 Download Windows Installer You can either download a ready-made W

29 Jan 5, 2023

FMA: A Dataset For Music Analysis

FMA: A Dataset For Music Analysis Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson. International Society for Music Information

1.8k Dec 29, 2022

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Tr

230 Dec 31, 2022

Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data

1 Meta-FDMIxup Repository for the paper : Meta-FDMixup: Cross-Domain Few-Shot Learning Guided byLabeled Target Data. (ACM MM 2021) paper News! the rep

44 Nov 18, 2022

Cross View SLAM

Cross View SLAM This is the associated code and dataset repository for our paper I. D. Miller et al., "Any Way You Look at It: Semantic Crossview Loca

99 Dec 9, 2022

Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers.

Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers. Cherche is meant to be used with small to medium sized corpora. Cherche's main strength is its ability to build diverse and end-to-end pipelines.

224 Nov 29, 2022

Python Cross-modal-retrieval Resources

Python Cross-modal-retrieval Libraries

Code and pre-trained models for MultiMAE: Multi-modal Multi-task Masked Autoencoders

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Cross-platform CLI tool to generate your Github profile's stats and summary.

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)

This repo presents you the official code of "VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention"

Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

Exploration & Research into cross-domain MEV. Initial focus on ETH/POLYGON.

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP

Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

Get started with Machine Learning with Python - An introduction with Python programming examples

A library for end-to-end learning of embedding index and retrieval model

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

Fully cross-platform toolkit (and library!) for MachO+Obj-C editing/analysis

automatically crawl every URL and find cross site scripting (XSS)

Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

A cross-platform script to book first available time for getting a passport in Sweden - Ett skript som automatiskt bokar pass hos polisen

PyTorch code for the paper "Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval".

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Code for "Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency" paper

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

4st place solution for the PBVS 2022 Multi-modal Aerial View Object Classification Challenge - Track 1 (SAR) at PBVS2022

[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.

Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Retrieval.

This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 - treatments and vaccinations.

Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Flexible-Modal Face Anti-Spoofing: A Benchmark

Faune proche - Retrieval of Faune-France data near a google maps location

Apk Framework Detector

HCQ: Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Kivy is an open source Python framework for creating cross-platform multi-touch mobile applications with Natural User Interface.

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

XSSearch - A comprehensive reflected XSS tool built on selenium framework in python

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Learning Super-Features for Image Retrieval

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Retrieval.pytorch - The code we used in [2020 DIGIX]

Audio Retrieval with Natural Language Queries: A Benchmark Study

repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments

PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Official code of Team Yao at Multi-Modal-Fact-Verification-2022

Answer a series of contextually-dependent questions like they may occur in natural human-to-human conversations.

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

HistoKT: Cross Knowledge Transfer in Computational Pathology

This repository contains pre-trained models and some evaluation code for our paper Towards Unsupervised Dense Information Retrieval with Contrastive Learning

This repository contains the implementation of the following paper: Cross-Descriptor Visual Localization and Mapping

Accelerated Multi-Modal MR Imaging with Transformers

Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

A secure and customizable bot for controlling cross-server announcements and interactions within Discord

A Web app to Cross-Seed torrents in Deluge/qBittorrent/Transmission

Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.

Transformer in Vision

TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning

Codes and scripts for "Explainable Semantic Space by Grounding Languageto Vision with Cross-Modal Contrastive Learning"

A PyTorch implementation of VIOLET

[BMVC2021] "TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation"

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization

Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). With use of Metric Learning and FAISS for fast similarity search on GPU

SimplePyBLE - Python bindings for SimpleBLE

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

Cross-platform-profile-pic-changer - Script to change profile pictures across multiple platforms

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

Use CLIP to represent video for Retrieval Task

Collects many various multi-modal transformer architectures, including image transformer, video transformer, image-language transformer, video-language transformer and related datasets

Meta learning algorithms to train cross-lingual NLI (multi-task) models

Neighbourhood Retrieval (Nearest Neighbours) with Distance Correlation.

Continual Learning of Long Topic Sequences in Neural Information Retrieval