A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).

Overview

Awesome Semantic Models for the First-stage Retrieval




Note:

  • This list covers papers on semantic retrieval, from early methods to recent neural models, for information retrieval tasks (e.g., ad-hoc retrieval, open-domain QA, community-based QA, and automatic conversation).
  • For semantic models used in the re-ranking stages, we refer readers to the awesome NeuIR survey by Guo et al.
  • Feedback and contributions are welcome; please open an issue or contact me.

Contents


Survey Paper

Classical Term-based Retrieval

Early Methods for Semantic Retrieval

Query Expansion

Document Expansion

Term Dependency Model

Topic Model

Translation Model

Neural Methods for Semantic Retrieval

Sparse Retrieval Methods

Dense Retrieval Methods

Hybrid Retrieval Methods

Other Resources

Other Tasks

Datasets

Indexing Methods
