A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).

Awesome Semantic Models for the First-stage Retrieval

Note:

A curated list of awesome papers for Semantic Retrieval, including some early methods and recent neural models for information retrieval tasks (e.g., ad-hoc retrieval, open-domain QA, community-based QA, and automatic conversation).

For researchers who want semantic models for the re-ranking stage, we refer readers to the awesome NeuIR survey by Guo et al.

Any feedback and contributions are welcome; please open an issue or contact me.

Survey Paper
Chapter 1: Classical Term-based Retrieval
Chapter 2: Early Methods for Semantic Retrieval
Chapter 3: Neural Methods for Semantic Retrieval
Chapter 4: Other Resources

Survey Paper

Semantic Matching in Search（Li et al., 2014）
Pretrained Transformers for Text Ranking: BERT and Beyond（Lin et al., 2021, arXiv）
Semantic Models for the First-stage Retrieval: A Comprehensive Review（Guo et al., 2021, TOIS）
A Proposed Conceptual Framework for a Representational Approach to Information Retrieval（Lin et al., 2021, arXiv）

Classical Term-based Retrieval

A Vector Space Model for Automatic Indexing（1975, VSM）
Developments in Automatic Text Retrieval（1991, TFIDF）
Term-weighting Approaches in Automatic Text Retrieval（1988, TFIDF）
Relevance Weighting of Search Terms（1976, BIM）
A Theoretical Basis for the Use of Co-occurrence Data in Information Retrieval（1997, Tree Dependence Model）
The Probabilistic Relevance Framework: BM25 and Beyond（2010, BM25）
A Language Modeling Approach to Information Retrieval（1998, QL）
Statistical Language Models for Information Retrieval（2007, LM for IR）
Hypergeometric Language Model and Zipf-Like Scoring Function for Web Document Similarity Retrieval（2010, LM for IR）
Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness（2002, DFR）
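The probabilistic models above culminate in BM25, still the usual term-based baseline for first-stage retrieval. Below is a minimal Python sketch of BM25 scoring, assuming pre-tokenized documents held in memory, the illustrative defaults k1 = 1.2 and b = 0.75, and a Lucene-style non-negative IDF. These are assumptions; toolkits differ in their exact IDF formula and default parameters.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5));
    BM25 implementations differ slightly in their IDF and defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # BM25 only rewards exact term matches
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["dense", "retrieval", "with", "bert"],
    ["bm25", "is", "a", "term", "based", "model"],
    ["sparse", "retrieval", "with", "bm25"],
]
scores = bm25_scores(["bm25", "retrieval"], docs)
```

The last document, which contains both query terms, scores highest. A document that shares no term with the query scores zero however related it is, which is the vocabulary-mismatch problem the semantic methods below aim to address.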

Early Methods for Semantic Retrieval

Neural Methods for Semantic Retrieval

Sparse Retrieval Methods

Term Re-weighting
- Learning to Reweight Terms with Distributed Representations（Zheng et al., 2015, SIGIR, DeepTR）
- Integrating and Evaluating Neural Word Embeddings in Information Retrieval（Zuccon et al., 2015, ADCS, NTLM）
- Learning Term Discrimination（Frej et al., 2020, SIGIR, TVD）
- Context-Aware Sentence/Passage Term Importance Estimation for First Stage Retrieval（Dai et al., 2019, arXiv, DeepCT）
- Context-Aware Term Weighting For First-Stage Passage Retrieval（Dai et al., 2020, SIGIR, DeepCT）
- Efficiency Implications of Term Weighting for Passage Retrieval（Mackenzie et al., 2020, SIGIR, DeepCT）
- Context-Aware Document Term Weighting for Ad-Hoc Search（Dai et al., 2020, WWW, HDCT）
- A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques（Lin et al., 2021, arXiv, uniCOIL）
Expansion
- Document Expansion by Query Prediction（Nogueira et al., 2019, arXiv, Doc2Query）
- From doc2query to docTTTTTquery（Nogueira et al., 2019, arXiv, DocTTTTTQuery）
- A Unified Pretraining Framework for Passage Ranking and Expansion（Yan et al., 2021, AAAI, UED）
- Generation-augmented Retrieval for Open-domain Question Answering（Mao et al., 2021, ACL, GAR, query expansion）
Expansion + Term Re-weighting
- Expansion via Prediction of Importance with Contextualization（MacAvaney et al., 2020, SIGIR, EPIC）
- SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval（Bai et al., 2020, arXiv, SparTerm）
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking（Formal et al., 2021, SIGIR, SPLADE）
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval（Formal et al., 2021, arXiv, SPLADEv2）
- Learning Passage Impacts for Inverted Indexes（Mallia et al., 2021, SIGIR, DeepImpact）
- TILDE: Term Independent Likelihood moDEl for Passage Re-ranking（Zhuang et al., 2021, SIGIR, TILDE）
- Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion（Zhuang et al., 2021, arXiv, TILDEv2）
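Models in this group such as SPLADE turn a transformer's masked-language-model logits into a sparse vector over the whole vocabulary, so a passage can receive weight for terms it never contains. Below is a minimal sketch of the pooling step only, assuming the per-token logits are already computed; the toy numbers are made up and the function name `splade_pool` is hypothetical.

```python
import math


def splade_pool(token_logits, vocab):
    """Pool per-token vocabulary logits into one sparse {term: weight} map.

    token_logits holds one row per input token, each with len(vocab) logits
    (in SPLADE these come from BERT's masked-language-model head). Each logit
    is saturated with log(1 + ReLU(x)) and then max-pooled over tokens, as in
    SPLADE v2; the original SPLADE summed over tokens instead.
    """
    weights = {}
    for j, term in enumerate(vocab):
        w = max(math.log1p(max(0.0, row[j])) for row in token_logits)
        if w > 0.0:  # zero-weight terms are left out of the sparse vector
            weights[term] = w
    return weights


# Toy logits for a two-token input over a three-term vocabulary (made up).
vocab = ["retrieval", "search", "cat"]
logits = [[2.0, 0.5, -1.0], [1.0, 1.5, -3.0]]
weights = splade_pool(logits, vocab)
```

Here "cat" only has negative logits and is dropped, while "search" keeps a positive weight and acts as an expansion term. In the actual models, sparsity is additionally encouraged by a regularizer during training.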
Sparse Representation Learning
- Semantic Hashing（Salakhutdinov et al., 2009）
- From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing（Zamani et al., 2018, CIKM, SNRM）
- UHD-BERT: Bucketed Ultra-High Dimensional Sparse Representations for Full Ranking（Jang et al., 2021, arXiv, UHD-BERT）
- Efficient Passage Retrieval with Hashing for Open-domain Question Answering（Yamada et al., 2021, ACL, BPR）
- Composite Code Sparse Autoencoders for First Stage Retrieval（Lassance et al., 2021, SIGIR, CCSA）
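Whatever model produces them (e.g. DeepImpact, uniCOIL, SPLADE), learned sparse term weights are typically served from an ordinary inverted index, which is what keeps these methods fast. Below is a minimal in-memory sketch, assuming documents and queries are already encoded as {term: weight} maps; `build_index` and `search` are hypothetical helper names, and real systems use compressed postings and dynamic pruning rather than this exhaustive term-at-a-time loop.

```python
from collections import defaultdict


def build_index(doc_weights):
    """Build term -> [(doc_id, impact)] postings from {doc_id: {term: weight}}."""
    index = defaultdict(list)
    for doc_id, weights in doc_weights.items():
        for term, impact in weights.items():
            index[term].append((doc_id, impact))
    return index


def search(index, query_weights, k=10):
    """Score only documents that share a term with the query:
    score(q, d) = sum over shared terms t of w_q(t) * w_d(t)."""
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, impact in index.get(term, []):
            scores[doc_id] += q_weight * impact
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


doc_weights = {
    "d1": {"dense": 2.0, "retrieval": 1.5},
    "d2": {"sparse": 1.8, "retrieval": 0.7},
    "d3": {"cat": 3.0},
}
results = search(build_index(doc_weights), {"retrieval": 1.0, "sparse": 1.0})
# d2 (about 2.5) ranks above d1 (1.5); d3 shares no query term and is never touched.
```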

Dense Retrieval Methods

Word-Embedding-based
- Aggregating Continuous Word Embeddings for Information Retrieval（Clinchant et al., 2013, ACL, FV）
- Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings（Vulic et al., 2015, SIGIR）
- Short Text Similarity with Word Embeddings（Kenter et al., 2015, CIKM, OoB）
- A Dual Embedding Space Model for Document Ranking（Mitra et al., 2016, arXiv, DESM）
- Efficient Natural Language Response Suggestion for Smart Reply（Henderson et al., 2017, arXiv）
- End-to-End Retrieval in Continuous Space（Gillick et al., 2018, arXiv）
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension（Seo et al., 2018, EMNLP, PIQA）
Dense Passage Retrieval for Open-Domain Question Answering（Karpukhin et al., 2020, EMNLP, DPR）
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks（Lewis et al., 2020, NeurIPS, RAG）
RepBERT: Contextualized Text Embeddings for First-Stage Retrieval（Zhan et al., 2020, arXiv, RepBERT）
CoRT: Complementary Rankings from Transformers（Wrzalik et al., 2020, NAACL, CoRT）
DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding（Nie et al., 2020, SIGIR, DC-BERT）
Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation（Yang et al., 2021, ACL, data augmentation）
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval（Xiong et al., 2020, arXiv, ANCE）
Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently（Zhan et al., 2020, arXiv, LTRe）
GLOW: Global Weighted Self-Attention Network for Web（Shan et al., 2020, arXiv, GLOW）
An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering（Qu et al., 2021, NAACL, RocketQA）
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling（Hofstätter et al., 2021, SIGIR, TAS-Balanced）
Optimizing Dense Retrieval Model Training with Hard Negatives（Zhan et al., 2021, SIGIR, STAR/ADORE）
Few-Shot Conversational Dense Retrieval（Yu et al., 2021, SIGIR）
Learning Dense Representations of Phrases at Scale（Lee et al., 2021, ACL, DensePhrases）
More Robust Dense Retrieval with Contrastive Dual Learning（Lee et al., 2021, ICTIR, DANCE）
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval（Ren et al., 2021, ACL, PAIR）
Relevance-guided Supervision for OpenQA with ColBERT（Khattab et al., 2021, TACL, ColBERT-QA）
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering（Sachan et al., 2021, arXiv, EMDR^2）
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback（Yu et al., 2021, CIKM, ANCE-PRF）
Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval（Wang et al., 2021, ICTIR, ColBERT-PRF）
A Discriminative Semantic Ranker for Question Retrieval（Cai et al., 2021, ICTIR, DenseTrans）
Representation Decoupling for Open-Domain Passage Retrieval（Wu et al., 2021, arXiv）
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking（Ren et al., 2021, EMNLP, RocketQAv2）
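Most dense retrievers listed above are bi-encoders: the query and the passage are encoded independently and scored by an inner product, so passage embeddings can be precomputed and searched with an ANN index. DPR trains such a model with in-batch negatives, treating the positive passages of the other questions in a batch as negatives. Below is a minimal sketch of that loss, assuming the embeddings are already computed (plain lists stand in for encoder outputs); hard-negative methods such as ANCE, RocketQA, or STAR/ADORE largely change where the negatives come from.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(query_embs, passage_embs):
    """Average negative log-likelihood of each query's positive passage.

    passage_embs[i] is the positive for query_embs[i]; every other passage
    in the batch serves as a negative for that query.
    """
    total = 0.0
    for i, q in enumerate(query_embs):
        sims = [dot(q, p) for p in passage_embs]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(sims)
        log_z = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_z - sims[i]  # -log softmax(sims)[i]
    return total / len(query_embs)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_negative_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_negative_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
# aligned (about 0.31) is lower than swapped (about 1.31).
```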
Knowledge Distillation
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers（Lin et al., 2020, arXiv, TCT-ColBERT）
- Distilling Knowledge for Fast Retrieval-based Chat-bots（Tahami et al., 2020, SIGIR）
- Distilling Knowledge from Reader to Retriever for Question Answering（Izacard et al., 2020, arXiv）
- Is Retriever Merely an Approximator of Reader?（Yang et al., 2020, arXiv）
- TwinBERT: Distilling Knowledge to Twin-Structured Compressed BERT Models for Large-Scale Retrieval（Lu et al., 2020, CIKM, TwinBERT）
- Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation（Choi et al., 2021, SIGIR, TRMD）
- Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation（Hofstätter et al., 2021, arXiv, Margin-MSE loss）
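The Margin-MSE loss named in the Hofstätter et al. entry distills a cross-encoder teacher into an efficient student by matching score margins rather than raw scores. Below is a minimal sketch over a batch of (query, positive, negative) triples, assuming the teacher and student scores are already computed; the function name is hypothetical.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher margins
    s(q, p+) - s(q, p-), averaged over a batch of training triples."""
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(errors) / len(errors)


# Equal margins (4.0) give zero loss even though the score scales differ.
loss = margin_mse([5.0], [1.0], [12.0], [8.0])  # -> 0.0
```

Because only the margin is matched, the student's dot-product scores need not live on the same scale as the teacher's scores.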
Multi-vector Representation
- Multi-Hop Paragraph Retrieval for Open-Domain Question Answering（Feldman et al., 2019, ACL, MUPPET）
- Sparse, Dense, and Attentional Representations for Text Retrieval（Luan et al., 2020, TACL, ME-BERT）
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT（Khattab et al., 2020, SIGIR, ColBERT）
- COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List（Gao et al., 2021, NAACL, COIL）
- Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval（Tang et al., 2021, ACL）
- Phrase Retrieval Learns Passage Retrieval, Too（Lee et al., 2021, EMNLP, DensePhrases）
- Query Embedding Pruning for Dense Retrieval（Tonellotto et al., 2021, CIKM）
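Multi-vector models such as ColBERT keep one embedding per token and score with late interaction. Below is a minimal sketch of ColBERT's MaxSim operator, assuming token embeddings are given as plain lists; ColBERT itself L2-normalizes them so that the dot product is a cosine similarity, which this toy example skips.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token embedding is matched to its most
    similar document token embedding, and these maxima are summed."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


query = [[1.0, 0.0], [0.0, 1.0]]
score = maxsim_score(query, [[1.0, 0.0], [0.5, 0.5]])  # -> 1.5
```

Because document token embeddings do not depend on the query, they can be precomputed and indexed, at the cost of storing one vector per token instead of one per passage.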
Accelerate Interaction-based Models
- Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks（Mitra et al., 2019, arXiv）
- Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing（Ji et al., 2019, WWW）
- Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring（Humeau et al., 2020, ICLR, Poly-encoders）
- Modularized Transformer-based Ranking Framework（Gao et al., 2020, EMNLP, MORES）
- Efficient Document Re-Ranking for Transformers by Precomputing Term Representations（MacAvaney et al., 2020, SIGIR, PreTTR）
- DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering（Cao et al., 2020, ACL, DeFormer）
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval（Zhao et al., 2020, arXiv, SPARTA）
- Conformer-Kernel with Query Term Independence for Document Retrieval（Mitra et al., 2020, arXiv）
Pre-training
- Latent Retrieval for Weakly Supervised Open Domain Question Answering（Lee et al., 2019, ACL, ORQA）
- Retrieval-Augmented Language Model Pre-Training（Guu et al., 2020, ICML, REALM）
- Pre-training Tasks for Embedding-based Large-scale Retrieval（Chang et al., 2020, ICLR, BFS+WLP+MLM）
- Embedding-based Zero-shot Retrieval through Query Generation（Liang et al., 2020, arXiv, query generation）
- Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation（Ma et al., 2020, arXiv, query generation）
- Towards Robust Neural Retrieval Models with Synthetic Pre-Training（Reddy et al., 2021, arXiv, query generation）
- Is Your Language Model Ready for Dense Representation Fine-tuning?（Gao et al., 2021, EMNLP, Condenser）
- Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval（Gao et al., 2021, arXiv, coCondenser）
- Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder（Lu et al., 2021, EMNLP, SEED-Encoder）
- Pre-trained Language Model for Web-scale Retrieval in Baidu Search（Liu et al., 2021, KDD）
- Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need（Ma et al., 2021, CIKM, HARP）
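Several pre-training entries above build retrieval training pairs without labels. ORQA's Inverse Cloze Task, for instance, treats a random sentence of a passage as a pseudo-query and the rest of the passage as its pseudo-evidence. Below is a minimal sketch of that pair construction, assuming passages are already split into sentences; the helper name `ict_pair` and the default 10% probability of leaving the sentence inside the context (our reading of ORQA's setting) are assumptions.

```python
import random


def ict_pair(sentences, rng, keep_prob=0.1):
    """Build one (pseudo-query, pseudo-context) pair from a passage given as
    a list of at least two sentences, in the spirit of the Inverse Cloze Task.

    keep_prob: probability of leaving the query sentence inside the context
    (an assumed setting), so the retriever still sees some exact overlap.
    """
    i = rng.randrange(len(sentences))
    query = sentences[i]
    if rng.random() < keep_prob:
        context = list(sentences)
    else:
        context = sentences[:i] + sentences[i + 1:]
    return query, " ".join(context)


rng = random.Random(0)
passage = ["Dense retrievers encode text.", "They use two towers.", "Scores are dot products."]
query, context = ict_pair(passage, rng)
```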
Joint Learning with Index
- Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index（Zhang et al., 2021, SIGIR, Poeem）
- Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance（Zhan et al., 2021, CIKM, JPQ）
- Matching-oriented Product Quantization For Ad-hoc Retrieval（Xiao et al., 2021, EMNLP, MoPQ）
- Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval（Zhan et al., 2022, WSDM, RepCONC）
Debias
- Learning Robust Dense Retrieval Models from Incomplete Relevance Labels（Prakash et al., 2021, SIGIR, RANCE）
Zero-shot
- Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations（Xin et al., 2021, arXiv, MoDIR）
Probing Analysis
- The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes（Reimers et al., 2021, ACL）
- Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval（Ma et al., 2021, EMNLP, redundancy）
- BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models（Thakur et al., 2021, NeurIPS, transferability）
- Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?（Chen et al., 2021, arXiv）
- Simple Entity-Centric Questions Challenge Dense Retrievers（Sciavolino et al., 2021, EMNLP）

Hybrid Retrieval Methods

Word-Embedding-based
- Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings（Vulic et al., 2015, SIGIR, linearly combine）
- Word Embedding based Generalized Language Model for Information Retrieval（Ganguly et al., 2015, SIGIR, GLM）
- Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval（Roy et al., 2016, SIGIR, linearly combine）
- A Dual Embedding Space Model for Document Ranking（Mitra et al., 2016, WWW, DESM_mixture, linearly combine）
- Off the Beaten Path: Let’s Replace Term-Based Retrieval with k-NN Search（Boytsov et al., 2016, CIKM, BM25+translation model）
Learning Hybrid Representations to Retrieve Semantically Equivalent Questions（Santos et al., 2015, ACL, BOW-CNN）
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index（Seo et al., 2019, ACL, DenSPI）
Contextualized Sparse Representations for Real-Time Open-Domain Question Answering（Lee et al., 2020, ACL, SPARC）
CoRT: Complementary Rankings from Transformers（Wrzalik et al., 2020, NAACL, CoRT_BM25）
Sparse, Dense, and Attentional Representations for Text Retrieval（Luan et al., 2020, TACL, ME-Hybrid）
Complement Lexical Retrieval Model with Semantic Residual Embeddings（Gao et al., 2020, ECIR, CLEAR）
Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach（Kuzi et al., 2020, arXiv, Hybrid）
A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques（Lin et al., 2021, arXiv, uniCOIL）
Contextualized Offline Relevance Weighting for Efficient and Effective Neural Retrieval（Chen et al., 2021, SIGIR）
Predicting Efficiency/Effectiveness Trade-offs for Dense vs. Sparse Retrieval Strategy Selection（Arabzadeh et al., 2021, CIKM）
Fast Forward Indexes for Efficient Document Ranking（Leonhardt et al., 2021, arXiv）
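A recurring recipe among the hybrid methods above is to linearly combine a lexical score with a semantic score. Below is a minimal sketch of such score fusion, assuming each retriever returns a non-empty {doc_id: score} map per query, min-max normalization, and an interpolation weight alpha. The normalization, the handling of documents missing from one list, and the value of alpha are illustrative assumptions that vary across papers.

```python
def min_max(scores):
    """Rescale one query's {doc_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def hybrid_rank(sparse, dense, alpha=0.5):
    """Rank documents by alpha * dense + (1 - alpha) * sparse.

    A document missing from one list gets 0 for that component, which is
    an illustrative choice rather than a standard.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    return sorted(fused, key=fused.get, reverse=True)


sparse = {"d1": 12.0, "d2": 8.0, "d3": 4.0}  # e.g. BM25 scores
dense = {"d2": 0.9, "d3": 0.8, "d4": 0.1}    # e.g. inner products
ranking = hybrid_rank(sparse, dense)  # -> ["d2", "d1", "d3", "d4"]
```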

Other Resources

Other Tasks

E-commerce Search
- Deep Interest Network for Click-Through Rate Prediction（Zhou et al., 2018, KDD, DIN）
- From Semantic Retrieval to Pairwise Ranking: Applying Deep Learning in E-commerce Search（Li et al., 2019, SIGIR, Jingdong）
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall（Li et al., 2019, CIKM, MIND, Tmall）
- Towards Personalized and Semantic Retrieval: An End-to-End Solution for E-commerce Search via Embedding Learning（Zhang et al., 2020, SIGIR, DPSR, Jingdong）
- Deep Multi-Interest Network for Click-through Rate Prediction（Xiao et al., 2020, CIKM, DMIN）
- Deep Retrieval: An End-to-End Learnable Structure Model for Large-Scale Recommendations（Gao et al., 2020, arXiv）
- Embedding-based Product Retrieval in Taobao Search（Li et al., 2021, KDD, Taobao）
- Embracing Structure in Data for Billion-Scale Semantic Product Search（Lakshman et al., 2021, arXiv, Amazon）
Sponsored Search
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search（Fan et al., 2019, KDD, Baidu）
Image Retrieval
- Binary Neural Network Hashing for Image Retrieval（Zhang et al., 2021, SIGIR, BNNH）
- Deep Self-Adaptive Hashing for Image Retrieval（Lin et al., 2021, CIKM, DSAH）
Report on the First HIPstIR Workshop on the Future of Information Retrieval（Dietz et al., 2019, SIGIR, workshop）
Let’s measure run time! Extending the IR replicability infrastructure to include performance aspects（Hofstätter et al., 2019, SIGIR）
Embedding-based Retrieval in Facebook Search（Huang et al., 2020, KDD, EBR）
Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations（Chen et al., 2018, ICML）

Datasets

【MS MARCO】MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
【TREC CAR】TREC Complex Answer Retrieval Overview
【TREC DL】Overview of the TREC 2019 deep learning track
【TREC COVID】TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection

Indexing Methods

Tree-based
- Multidimensional Binary Search Trees Used for Associative Searching（1975, KD tree）
- Annoy
Hashing-based
- Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality（1998, LSH）
Quantization-based
- Product Quantization for Nearest Neighbor Search（2010, PQ）
- Optimized Product Quantization（2013, OPQ）
Graph-based
- Navigation in a Small World（2000, NSW）
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs（2018, HNSW）
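Product quantization, which also underlies jointly trained indexes such as Poeem, JPQ, and MoPQ listed earlier, compresses each vector into a few centroid ids, one per subspace, and estimates distances with per-subspace lookup tables. Below is a minimal sketch of encoding and asymmetric distance computation, assuming the per-subspace codebooks are given (PQ learns them with k-means; the tiny hand-written codebooks here are made up) and that the dimension splits evenly into subspaces.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pq_encode(vec, codebooks):
    """Split `vec` into len(codebooks) equal sub-vectors and replace each with
    the index of its nearest centroid in the matching sub-codebook."""
    sub_dim = len(vec) // len(codebooks)
    codes = []
    for i, codebook in enumerate(codebooks):
        chunk = vec[i * sub_dim:(i + 1) * sub_dim]
        codes.append(min(range(len(codebook)), key=lambda k: sq_dist(chunk, codebook[k])))
    return codes


def adc_distance(query, codes, codebooks):
    """Asymmetric distance computation: approximate ||query - x||^2 from the
    raw query and x's PQ codes via per-subspace lookup tables."""
    sub_dim = len(query) // len(codebooks)
    tables = [
        [sq_dist(query[i * sub_dim:(i + 1) * sub_dim], c) for c in codebook]
        for i, codebook in enumerate(codebooks)
    ]
    return sum(tables[i][code] for i, code in enumerate(codes))


codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # subspace 0: dimensions 0-1
    [[0.0, 0.0], [2.0, 2.0]],  # subspace 1: dimensions 2-3
]
codes = pq_encode([0.9, 1.1, 1.8, 2.1], codebooks)  # -> [1, 1]
approx = adc_distance([0.0, 0.0, 0.0, 0.0], codes, codebooks)  # -> 10.0 (true value 9.67)
```

With 256 centroids per subspace, each code fits in one byte, so a vector split into m subspaces is stored in m bytes.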
Toolkits

Owner: Yinqiong Cai
