An Open-Source Package for Information Retrieval.

THUNLP

Last update: Dec 27, 2022

Overview

OpenMatch

😃 What's New

Top Spot on TREC-COVID Challenge (May 2020, Round 2)

The twin goals of the challenge are to evaluate search algorithms and systems for helping scientists, clinicians, policy makers, and others manage the existing and rapidly growing corpus of scientific literature related to COVID-19, and to discover methods that will assist with managing scientific information in future global biomedical crises.
>> Reproduce Our Submission >> About the COVID-19 Dataset >> Our Paper

Overview

OpenMatch integrates neural methods and technologies to provide a complete solution for deep text matching and understanding. The documentation and tutorial for OpenMatch are available here.

1/ Document Retrieval

Document Retrieval refers to extracting a set of related documents from large-scale document-level data based on user queries.

* Sparse Retrieval

Sparse Retriever is defined as a sparse bag-of-words retrieval model.

* Dense Retrieval

Dense Retriever performs retrieval by encoding documents and queries into dense low-dimensional vectors and selecting the documents that have the highest inner product with the query.
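
As a rough illustration of the dense retrieval step described above, the sketch below ranks documents by their inner product with a query vector. The three-dimensional vectors, the document IDs, and the helper names are made up for this example. In practice the vectors would come from a trained encoder, and the search would typically go through an approximate nearest neighbor index rather than the brute-force scan shown here.

```python
from typing import Dict, List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_retrieve(query_vec: List[float],
                   doc_vecs: Dict[str, List[float]],
                   top_k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force search: score every document and return the top_k by inner product."""
    scored = [(doc_id, inner_product(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, not produced by any real encoder).
doc_vecs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.2, 0.8, 0.1],
    "d3": [0.4, 0.4, 0.4],
}
query_vec = [1.0, 0.0, 0.5]

top_docs = dense_retrieve(query_vec, doc_vecs, top_k=2)
print([doc_id for doc_id, _ in top_docs])  # ['d1', 'd3']
```

The retrieved list is what a reranker would then reorder in the next stage.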

2/ Document Reranking

Document reranking further matches the user query against the documents retrieved in the previous step in order to produce a ranked list of relevant documents.

* Neural Ranker

Neural Ranker uses a neural network as the ranker to reorder documents.

* Feature Ensemble

Feature Ensemble fuses the neural features learned by the neural ranker with the features of non-neural methods to obtain more robust performance.
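
To make the idea of fusing neural and non-neural features concrete, here is a minimal sketch that combines a neural score and a BM25 score linearly and tunes the weights one at a time. It is a simplified, grid-based variant of coordinate ascent written for illustration only; it is not the implementation used by OpenMatch or by any ranking library. The feature values, relevance labels, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Each candidate: (doc_id, feature values, relevance label).
Candidate = Tuple[str, Dict[str, float], int]


def fuse(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of the feature values."""
    return sum(weights[name] * value for name, value in features.items())


def pairwise_accuracy(cands: List[Candidate], weights: Dict[str, float]) -> float:
    """Fraction of (relevant, non-relevant) pairs ordered correctly by the fused score."""
    pos = [fuse(f, weights) for _, f, rel in cands if rel > 0]
    neg = [fuse(f, weights) for _, f, rel in cands if rel == 0]
    pairs = [(p, n) for p in pos for n in neg]
    return sum(p > n for p, n in pairs) / len(pairs)


def coordinate_ascent(cands: List[Candidate],
                      feature_names: Sequence[str],
                      grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
                      sweeps: int = 3) -> Tuple[Dict[str, float], float]:
    """Tune one weight at a time on a fixed grid, keeping values that improve the objective."""
    weights = {name: 1.0 for name in feature_names}
    best = pairwise_accuracy(cands, weights)
    for _ in range(sweeps):
        for name in feature_names:
            for value in grid:
                trial = dict(weights, **{name: value})
                score = pairwise_accuracy(cands, trial)
                if score > best:
                    weights, best = trial, score
    return weights, best


# Toy data (hypothetical): neither feature alone orders every pair correctly.
cands = [
    ("d1", {"neural": 0.9, "bm25": 0.2}, 1),
    ("d2", {"neural": 0.5, "bm25": 0.8}, 1),
    ("d3", {"neural": 0.7, "bm25": 0.1}, 0),
    ("d4", {"neural": 0.2, "bm25": 1.0}, 0),
]

weights, best = coordinate_ascent(cands, ["neural", "bm25"])
print(weights, best)
```

On this toy data, equal weights get three of the four pairs right, and lowering the BM25 weight lets the fused score order all of them correctly. A real setup would optimize a ranking metric such as NDCG on held-out queries.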

3/ Domain Transfer Learning

Domain Transfer Learning leverages external knowledge graphs or weak supervision data to help the ranker overcome data scarcity.

* Knowledge Enhancement

Knowledge Enhancement incorporates entity semantics from external knowledge graphs to enhance the neural ranker.

* Data Augmentation

Data Augmentation leverages weak supervision data to improve ranking accuracy in domains that lack large-scale relevance labels.

Stage	Model	Paper
1/ Sparse Retrieval	BM25	Best Match25 ~Tool
1/ Dense Retrieval	ANN	Approximate nearest neighbor ~Tool

2/ Neural Ranker	K-NRM	End-to-End Neural Ad-hoc Ranking with Kernel Pooling ~Paper
2/ Neural Ranker	Conv-KNRM	Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search ~Paper
2/ Neural Ranker	TK	Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking ~Paper
2/ Neural Ranker	BERT	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding ~Paper
2/ Feature Ensemble	Coordinate Ascent	Linear feature-based models for information retrieval. Information Retrieval ~Paper

3/ Knowledge Enhancement	EDRM	Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval ~Paper
3/ Data Augmentation	ReInfoSelect	Selective Weak Supervision for Neural Information Retrieval ~Paper

Note that the BERT model follows Hugging Face's transformers implementation, so other BERT-like models, e.g. ELECTRA and SciBERT, are also available in our toolkit.

Installation

* Using pip (installs directly from GitHub)

pip install git+https://github.com/thunlp/OpenMatch.git

* From Source

git clone https://github.com/thunlp/OpenMatch.git
cd OpenMatch
python setup.py install

* From Docker

To build an OpenMatch Docker image from the Dockerfile:

docker build -t <image_name> .

To run the Docker image you just built as a container:

docker run --gpus all --name=<container_name> -it -v /:/all/ --rm <image_name>:<TAG>

Quick Start

* Detailed examples are available here.

import torch
import OpenMatch as om

query = "Classification treatment COVID-19"
doc = "By retrospectively tracking the dynamic changes of LYM% in death cases and cured cases, this study suggests that lymphocyte count is an effective and reliable indicator for disease classification and prognosis in COVID-19 patients."

* For bert-like models:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
input_ids = tokenizer.encode(query, doc)
model = om.models.Bert("allenai/scibert_scivocab_uncased")
ranking_score, ranking_features = model(torch.tensor(input_ids).unsqueeze(0))

* For other models:

tokenizer = om.data.tokenizers.WordTokenizer(pretrained="./data/glove.6B.300d.txt")
query_ids, query_masks = tokenizer.process(query, max_len=16)
doc_ids, doc_masks = tokenizer.process(doc, max_len=128)
model = om.models.KNRM(vocab_size=tokenizer.get_vocab_size(),
                       embed_dim=tokenizer.get_embed_dim(),
                       embed_matrix=tokenizer.get_embed_matrix())
ranking_score, ranking_features = model(torch.tensor(query_ids).unsqueeze(0),
                                        torch.tensor(query_masks).unsqueeze(0),
                                        torch.tensor(doc_ids).unsqueeze(0),
                                        torch.tensor(doc_masks).unsqueeze(0))

* The GloVe embeddings can be downloaded using:

wget http://nlp.stanford.edu/data/glove.6B.zip -P ./data
unzip ./data/glove.6B.zip -d ./data
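
The word-level example above passes fixed-length token ids and masks to the model. As a hedged illustration of what such inputs look like, the sketch below pads or truncates a token sequence to max_len and builds the matching mask. It is a hypothetical stand-in, not OpenMatch's WordTokenizer: the function name, the toy vocabulary, and the reserved ids 0 and 1 are assumptions made for this example.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id for out-of-vocabulary words


def pad_and_mask(text: str, vocab: Dict[str, int], max_len: int) -> Tuple[List[int], List[int]]:
    """Lowercase, split on whitespace, map to ids, then truncate or pad to max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()][:max_len]
    mask = [1] * len(ids)          # 1 marks a real token
    padding = max_len - len(ids)   # number of padding positions to append
    return ids + [PAD_ID] * padding, mask + [0] * padding


vocab = {"classification": 2, "treatment": 3, "covid-19": 4}  # toy vocabulary
query_ids, query_masks = pad_and_mask("Classification treatment COVID-19", vocab, max_len=5)
print(query_ids)    # [2, 3, 4, 0, 0]
print(query_masks)  # [1, 1, 1, 0, 0]
```

The mask lets a model ignore padded positions when it pools over query and document tokens.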

* Evaluation

# qrels and ranking_list must be defined beforehand; they are not set in this snippet
metric = om.Metric()
ndcg = metric.get_metric(qrels, ranking_list, 'ndcg_cut_20')
mrr = metric.get_mrr(qrels, ranking_list, 'mrr_cut_10')
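
For readers unfamiliar with the metric names used above, the following sketch shows what MRR@10 and NDCG@20 compute for a single query. It is written from the standard definitions (linear gain, log2 rank discount) and is not OpenMatch's implementation. The function names, document IDs, and relevance labels are illustrative assumptions.

```python
import math
from typing import Dict, List


def mrr_at_k(ranked: List[str], rels: Dict[str, int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if rels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with linear gain and a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: List[str], rels: Dict[str, int], k: int = 20) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering of the judged documents."""
    gains = [rels.get(doc_id, 0) for doc_id in ranked[:k]]
    ideal = sorted(rels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy judgments and ranking for one query (hypothetical).
rels = {"d1": 0, "d2": 1, "d3": 2}
ranked = ["d1", "d2", "d3"]

mrr_value = mrr_at_k(ranked, rels, k=10)
ndcg_value = ndcg_at_k(ranked, rels, k=20)
print(mrr_value)  # 0.5
print(round(ndcg_value, 4))
```

Reported numbers are normally the average of these per-query values over all evaluated queries.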

Experiments

* Ad-hoc Search

Retriever	Reranker	Coor-Ascent	ClueWeb09	Robust04	ClueWeb12
SDM	KNRM	-	0.1880	0.3016	0.0968
SDM	Conv-KNRM	-	0.1894	0.2907	0.0896
SDM	EDRM	-	0.2015	0.2993	0.0937
SDM	TK	-	0.2306	0.2822	0.0966
SDM	BERT Base	-	0.2701	0.4168	0.1183
SDM	ELECTRA Base	-	0.2861	0.4668	0.1078

* MS MARCO Passage Ranking

Retriever	Reranker	Coor-Ascent	dev	eval
BM25	BERT Base	-	0.349	0.345
BM25	ELECTRA Base	-	0.352	0.344
BM25	RoBERTa Large	-	0.386	0.375
BM25	ELECTRA Large	-	0.388	0.376

* MS MARCO Document Ranking

Retriever	Reranker	Coor-Ascent	dev	eval
ANCE FirstP	-	-	0.373	0.334
ANCE MaxP	-	-	0.383	0.342
ANCE FirstP+BM25	BERT Base FirstP	+	0.431	0.380
ANCE MaxP	BERT Base MaxP	+	0.432	0.391

* Classic Features

Methods	ClueWeb09-B		Robust04		TREC-COVID
Methods	NDCG@20	ERR@20	NDCG@20	ERR@20	NDCG@20	P@20
BM25 (Anserini)	0.2773	0.1426	0.4129	0.1117	0.6979	0.7670
RankSVM (Dai et al.)	0.289	n.a.	0.420	n.a.	n.a.	n.a.
RankSVM (OpenMatch)	0.2825	0.1476	0.4309	0.1173	0.6995	0.7570
Coor-Ascent (Dai et al.)	0.295	n.a.	0.427	n.a.	n.a.	n.a.
Coor-Ascent (OpenMatch)	0.2969	0.1581	0.4340	0.1171	0.7041	0.7770

Contribution

Thanks to all the people who contributed to OpenMatch!

Kaitao Zhang, Si Sun, Zhenghao Liu, Aowei Lu

Project Organizers

Zhiyuan Liu
- Tsinghua University
- Homepage
Chenyan Xiong
- Microsoft Research AI
- Homepage
Maosong Sun
- Tsinghua University
- Homepage

Citation

@inproceedings{openmatch,
  author = {Liu, Zhenghao and Zhang, Kaitao and Xiong, Chenyan and Liu, Zhiyuan and Sun, Maosong},
  title = {OpenMatch: An Open Source Library for Neu-IR Research},
  booktitle = {Proceedings of SIGIR},
  year = {2021},
  url = {https://doi.org/10.1145/3404835.3462789},
  pages = {2531–2535}
}

Comments

Segmentation fault (core dumped)

When executing sh train_bert.sh, I get the error Segmentation fault (core dumped). Configuration of my VM: RAM 26 GB, GPU 2 (Tesla T4). Any comment will be highly appreciated.

opened by Maheshkumar094 9
Unable to Replicate the results for BERT-based models

Hi! I am trying to replicate the results for the transformer models. While the BERT-base model is trained properly, the Electra-base, Electra-large, and Roberta-large models are not being trained properly. I get MRRs below 0.25 which is lower than the numbers reported in the repo. In addition, I use the default parameters for the training. Is there any specific point I have to observe in order to get the models trained correctly?

opened by shirinssalehi 7
Use OpenMatch as a library
Hi,

I would like to investigate using OpenMatch within my open Python based library (which handles file IO). Could you help with two things:

make a setup.py, so I can pip install directly from GitHub?

is it possible to use the OpenMatch DataSet API without reading from files?

Craig
opened by cmacdonald 7
How to use the api to initialize model with .bin checkpoints

Hi,

I am trying to train a bert and a KNRM ranker on my own data. I can't use the inference script because my test data does not have the required fields score and some other fields. So, I am trying to use the api directly.

How can I use the API to initialize a model with my .bin checkpoints, as in the example model = om.models.Bert("allenai/scibert_scivocab_uncased")? I have tried pretrained = mycheckpoints.bin, but I see that pretrained requires a Hugging Face checkpoint or a config.json, and there is no config.json in my checkpoint path...

opened by zhenduow 6
ContrastQG code

In your paper, you mention using contrastQG/QG to generate label/training data. Could you please share the code or repo link? I do not see its implementation in your repo.

Thanks,

opened by woshizouguo 4
Sparse Retrieval

Hi, do you support building an index for sparse retrieval?

I see you mentioned Anserini, but I see neither examples nor an API that would help us build a BM25 inverted index.

BTW, I'm not used to Java, and I find it kind of difficult to build a sparse index with Anserini. Do you have any suggestions for my purpose?

opened by tangzhy 4
Ranking Loss

Hi developers, looking at the code, I see you use the MarginRankingLoss, while Nogueira et al. suggest adopting the cross-entropy loss. Is there a specific reason for this choice?

opened by CosimoRulli 3
Cannot Find File queries.train.small.tsv

Hi, I was trying to train a BERT model for MS MARCO Passage Ranking. According to the bash script, a 'queries.train.small.tsv' file is needed, but I didn't find any download link on the MS MARCO website. Where can I get this file? Thanks!

opened by VickiCui 2
How to generate TREC COVID?
I want to generate a TREC-COVID file, e.g. run.covid-r1.fusion1.txt. I have tried to run

./bm25_retriever/bin/IndexCollection -collection JsonCollection -input {your collection} -index {index path} -generator LuceneDocumentGenerator -threads 8 -storePositions -storeDocvectors -storeRawDocs >& {log file path}

Search by BM25:

./bm25_retriever/bin/SearchCollection -index {index path} -topicreader {topic format} -topics {topic path} -bm25 -output {result file path}

But there is a gap between the downloaded dataset, e.g. cord-19_2020-05-01.tar.gz, and the index input {your collection}.

Could you help me? Thanks a lot.
opened by xiaohong020408 2
Bump nbconvert from 5.6.1 to 6.3.0 in /retrievers/DANCE
Bumps nbconvert from 5.6.1 to 6.3.0.

Commits

cefe0bf Release 6.3.0

a534fb9 Release 6.3.0b0

87920c5 Add changelog for 6.3.0 (#1669)

dd6d9c7 add slide numbering (#1654)

5d2c5e2 Update state filter (#1664)

11ea593 fix: avoid closing the script tag early by escaping a forward slash (#1665)

968c5fb Fix HTML templates mentioned in help docs (#1653)

35c4d07 Add a new output filter that excludes widgets if there is no state (#1643)

c663c75 6.2.0

fd1dd15 6.2.0rc2

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 1
Bump notebook from 6.1.3 to 6.4.10 in /retrievers/DANCE
Bumps notebook from 6.1.3 to 6.4.10.

dependencies
opened by dependabot[bot] 1
Bump numpy from 1.19.0 to 1.22.0 in /v1/retrievers/DANCE
Bumps numpy from 1.19.0 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump protobuf from 3.12.2 to 3.15.0 in /v1/retrievers/DANCE
Bumps protobuf from 3.12.2 to 3.15.0.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.15.0

Protocol Compiler

Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

C++

MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields

Use init_seg in MSVC to push initialization to an earlier phase.

Runtime no longer triggers -Wsign-compare warnings.

Fixed -Wtautological-constant-out-of-range-compare warning.

DynamicCastToGenerated works for nullptr input for even if RTTI is disabled

Arena is refactored and optimized.

Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.

Change the signature of Any::PackFrom() to return false on error.

Add fast reflection getter API for strings.

Constant initialize the global message instances

Avoid potential for missed wakeup in UnknownFieldSet

Now Proto3 Oneof fields have "has" methods for checking their presence in C++.

Bugfix for NVCC

Return early in _InternalSerialize for empty maps.

Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.

Fixed protocolbuffers/protobuf#8129

Ensure that null char symbol, package and file names do not result in a crash.

Constant initialize the global message instances

Pretty print 'max' instead of numeric values in reserved ranges.

Removed remaining instances of std::is_pod, which is deprecated in C++20.

Changes to reduce code size for unknown field handling by making uncommon cases out of line.

Fix std::is_pod deprecated in C++20 (#7180)

Fix some -Wunused-parameter warnings (#8053)

Fix detecting file as directory on zOS issue #8051 (#8052)

Don't include sys/param.h for _BYTE_ORDER (#8106)

remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)

Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)

Fix for compiler warning issue#8145 (#8160)

fix: support deprecated enums for GCC < 6 (#8164)

Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

Python

Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.

MessageFactory.CreateProtoype can be overridden to customize class creation.

... (truncated)

Commits

ae50d9b Update protobuf version

8260126 Update protobuf version

c741c46 Resovled issue in the .pb.cc files

eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order

0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial

26cb6a7 Delete root-owned files in Kokoro builds

1e924ef Update port_def.inc

9a80cf1 Update coded_stream.h

a97c4f4 Merge pull request #8276 from haberman/php-warning

44cd75d Merge pull request #8282 from haberman/changelog

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump pillow from 7.2.0 to 9.0.1 in /v1/retrievers/DANCE
Bumps pillow from 7.2.0 to 9.0.1.

Release notes

Sourced from pillow's releases.

9.0.1

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.1.html

Changes

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [@radarhere, @hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.0.0.html

Changes

Restrict builtins for ImageMath.eval() #5923 [@radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [@radarhere]

Fixed ImagePath.Path array handling #5920 [@radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [@radarhere]

Removed redundant part of condition #5915 [@radarhere]

Explicitly enable strip chopping for large uncompressed TIFFs #5517 [@kmilos]

Use the Windows method to get TCL functions on Cygwin #5807 [@DWesl]

Changed error type to allow for incremental WebP parsing #5404 [@radarhere]

Improved I;16 operations on big endian #5901 [@radarhere]

Ensure that BMP pixel data offset does not ignore palette #5899 [@radarhere]

Limit quantized palette to number of colors #5879 [@radarhere]

Use latin1 encoding to decode bytes #5870 [@radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [@radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [@radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [@radarhere]

Added rounding when converting P and PA #5824 [@radarhere]

Improved putdata() documentation and data handling #5910 [@radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [@radarhere]

Image.NONE is only used for resampling and dithers #5908 [@radarhere]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [@radarhere]

Add Tidelift alignment action and badge #5763 [@aclark4life]

Replaced further direct invocations of setup.py #5906 [@radarhere]

Added ImageShow support for xdg-open #5897 [@m-shinder]

Fixed typo #5902 [@radarhere]

Switched from deprecated "setup.py install" to "pip install ." #5896 [@radarhere]

Support 16-bit grayscale ImageQt conversion #5856 [@cmbruns]

Fixed raising OSError in _safe_read when size is greater than SAFEBLOCK #5872 [@radarhere]

Convert subsequent GIF frames to RGB or RGBA #5857 [@radarhere]

WebP: Fix memory leak during decoding on failure #5798 [@ilai-deutel]

Do not prematurely return in ImageFile when saving to stdout #5665 [@infmagic2047]

Added support for top right and bottom right TGA orientations #5829 [@radarhere]

Corrected ICNS file length in header #5845 [@radarhere]

Block tile TIFF tags when saving #5839 [@radarhere]

Added line width argument to ImageDraw polygon #5694 [@radarhere]

Do not redeclare class each time when converting to NumPy #5844 [@radarhere]

Only prevent repeated polygon pixels when drawing with transparency #5835 [@radarhere]

... (truncated)

Changelog

Sourced from pillow's changelog.

9.0.1 (2022-02-03)

In show_file, use os.remove to remove temporary images. CVE-2022-24303 #6010 [radarhere, hugovk]

Restrict builtins within lambdas for ImageMath.eval. CVE-2022-22817 #6009 [radarhere]

9.0.0 (2022-01-02)

Restrict builtins for ImageMath.eval(). CVE-2022-22817 #5923 [radarhere]

Ensure JpegImagePlugin stops at the end of a truncated file #5921 [radarhere]

Fixed ImagePath.Path array handling. CVE-2022-22815, CVE-2022-22816 #5920 [radarhere]

Remove consecutive duplicate tiles that only differ by their offset #5919 [radarhere]

Improved I;16 operations on big endian #5901 [radarhere]

Limit quantized palette to number of colors #5879 [radarhere]

Fixed palette index for zeroed color in FASTOCTREE quantize #5869 [radarhere]

When saving RGBA to GIF, make use of first transparent palette entry #5859 [radarhere]

Pass SAMPLEFORMAT to libtiff #5848 [radarhere]

Added rounding when converting P and PA #5824 [radarhere]

Improved putdata() documentation and data handling #5910 [radarhere]

Exclude carriage return in PDF regex to help prevent ReDoS #5912 [hugovk]

Fixed freeing pointer in ImageDraw.Outline.transform #5909 [radarhere]

... (truncated)

Commits

6deac9e 9.0.1 version bump

c04d812 Update CHANGES.rst [ci skip]

4fabec3 Added release notes for 9.0.1

02affaa Added delay after opening image with xdg-open

ca0b585 Updated formatting

427221e In show_file, use os.remove to remove temporary images

c930be0 Restrict builtins within lambdas for ImageMath.eval

75b69dd Dont need to pin for GHA

cd938a7 Autolink CWE numbers with sphinx-issues

2e9c461 Add CVE IDs

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Bump ipython from 7.22.0 to 7.31.1 in /v1/retrievers/DANCE
Bumps ipython from 7.22.0 to 7.31.1.

Commits

e321e76 release 7.31.1

67ca2b3 Merge pull request from GHSA-pq7m-3gw7-gq5x

2794330 back to dev

be343e7 release 7.31.0

0fcf2c4 Merge pull request #13428 from meeseeksmachine/auto-backport-of-pr-13427-on-7.x

b8db9b1 Backport PR #13427: wn 731

7f253dc Merge pull request #13412 from bnavigator/backport-inspect

4f26796 fix xxlimited_35 import name

77ca4a6 don't run nose-based iptest on py310, only pytest

533e509 back to decorator skip

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

dependencies
opened by dependabot[bot] 0

Owner

THUNLP

Natural Language Processing Lab at Tsinghua University

GitHub

A PyTorch Implementation of the paper - Choi, Woosung, et al. "Investigating u-nets with various intermediate blocks for spectrogram-based singing voice separation." 21st International Society for Music Information Retrieval Conference, ISMIR. 2020.

Investigating U-NETS With Various Intermediate Blocks For Spectrogram-based Singing Voice Separation A Pytorch Implementation of the paper "Investigat

63 Nov 14, 2022

The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

25 Oct 30, 2022

Implementing Graph Convolutional Networks and Information Retrieval Mechanisms using pure Python and NumPy

3 Jun 22, 2022

Fake videos detection by tracing the source using video hashing retrieval.

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos Directory Introduction VTL Trace Samples and Acc of Hash

56 Dec 22, 2022

Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Retrieval.

Targeted Trojan-Horse Attacks on Language-based Image Retrieval Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Re

7 Aug 23, 2022

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21)

NeurIPS 2021 Title: Distilling Robust and Non-Robust Features in Adversarial Exa

35 Dec 26, 2022

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

2.8k Jan 8, 2023

Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

52 Nov 9, 2022

An open source Python package for plasma science that is under development

PlasmaPy PlasmaPy is an open source, community-developed Python 3.7+ package for plasma science. PlasmaPy intends to be for plasma science what Astrop

444 Jan 7, 2023

Learning embeddings for classification, retrieval and ranking.

StarSpace StarSpace is a general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems: Learning wor

3.8k Dec 22, 2022

Python library containing BART query generation and BERT-based Siamese models for neural retrieval.

Neural Retrieval Embedding-based Zero-shot Retrieval through Query Generation leverages query synthesis over large corpuses of unlabeled text (such as

35 Apr 14, 2022

Activity image-based video retrieval

Cross-modal-retrieval Our approach focuses on the Activity Image-to-Video Retrieval (AIVR) task. The compared methods are state-of-the-art single modalit

75 Oct 21, 2021

🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

AI City 2021: Connecting Language and Vision for Natural Language-Based Vehicle Retrieval 🏆 The 1st Place Submission to AICity Challenge 2021 Natural

82 Dec 29, 2022

Joint Learning of 3D Shape Retrieval and Deformation, CVPR 2021

Joint Learning of 3D Shape Retrieval and Deformation Joint Learning of 3D Shape Retrieval and Deformation Mikaela Angelina Uy, Vladimir G. Kim, Minhyu

38 Oct 18, 2022

Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback

CoSMo.pytorch Official Implementation of CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback, Seungmin Lee*, Dongwan Kim*, Bohyung

54 Dec 8, 2022

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

99 Dec 27, 2022

Official PyTorch implementation of Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval.

Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval PyTorch This is the PyTorch implementation of Retrieve in Style: Unsupervised Fa

60 Oct 12, 2022

A Joint Video and Image Encoder for End-to-End Retrieval

Frozen in Time ❄️⏳ A Joint Video and Image Encoder for End-to-End Retrieval project page | arXiv | webvid-data Repository containing the code,

225 Dec 25, 2022

This is the official implementation of "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval".

CORA This is the official implementation of the following paper: Akari Asai, Xinyan Yu, Jungo Kasai and Hannaneh Hajishirzi. One Question Answering Mo

59 Dec 28, 2022
