Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)


Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive tasks like QA and claim verification.

Figure 1: Baleen's condensed retrieval architecture for multi-hop search.
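To make "multi-hop search with condensation" concrete, here is a toy conceptual sketch (hypothetical helper functions, not Baleen's actual API or models): each hop retrieves passages for the current query, condenses them into short facts, and folds those facts back into the query for the next hop.

```python
def retrieve(query, corpus, k=2):
    # Toy retriever: rank passages by word overlap with the query.
    q = set(query.lower().split())
    def score(passage):
        return len(q & set(passage.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def condense(passages):
    # Toy condenser: keep the first sentence of each passage as a "fact".
    return [p.split(".")[0].strip() for p in passages]

def multi_hop_search(question, corpus, hops=2):
    # Each hop's condensed facts are appended to the query context,
    # so later hops can retrieve evidence the question alone can't reach.
    context, facts = question, []
    for _ in range(hops):
        for fact in condense(retrieve(context, corpus)):
            if fact not in facts:
                facts.append(fact)
        context = question + " " + " ".join(facts)
    return facts

corpus = [
    "Alice was born in Paris. She studied physics.",
    "Paris is the capital of France. It lies on the Seine.",
    "Bob lives in Rome. He is a chef.",
]
facts = multi_hop_search("Where was Alice born", corpus, hops=2)
```

The real system replaces these toy functions with a ColBERT-style late-interaction retriever and a learned condenser, but the control flow above is the gist of condensed multi-hop retrieval.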

Installation

The implementation of Baleen lives in the parent ColBERT repository (under its new_api branch), which is included here as a submodule.

After cloning, make sure you also fetch the submodule's code:

git submodule update --init --recursive

Then follow the installation instructions from the submodule; Baleen has the same requirements as the parent ColBERT repository.

Usage

We will update this README with instructions and model checkpoints in the next few hours! Check back or "Watch" the GitHub repo for updates.


Comments
  • TypeError after inference

    Hi, when saving the inference results as a JSON file via hover_inference.py, the output dictionary contains Python sets. Sets are not JSON-serializable, so saving fails.

    python -m hover_inference --root ./experiments/ --datadir . --index wiki17.hover.2bit
    
    Traceback (most recent call last):
      File "xxx/conda/envs/colbert-v0.4/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "xxx/.conda/envs/colbert-v0.4/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "yyy/baleen/Baleen/hover_inference.py", line 53, in <module>
        main(args)
      File "yyy/baleen/Baleen/hover_inference.py", line 43, in main
        f.write(ujson.dumps(outputs) + '\n')
    TypeError: {3910663, 1373715, 833561, 2479648, 3921953, 3408419, 3188274, 1399859, 372789, 1117238, 3283510, 3342401, 2585678, 1428049, 4948563, 1399892, 4449365, 4216407, 4502103, 819287, 3598429, 5187684, 625781, 3042432, 1485442, 3487369, 4166284, 148110, 3713169, 1338005, 1951900, 936613, 437414, 556716, 2666162, 573620, 4666549, 638144, 4154562, 4315335, 4230859, 4788429, 2613967, 174801, 4054227, 3768532, 5224152, 4914913, 2469090, 460517, 4820205, 1360625, 4264185, 3064580, 424200, 4601613, 4707087, 2140434, 3422995, 3878677, 3583776, 2412329, 5212973, 3787053, 4286261, 2512694, 821559, 4174137, 3351359, 349002, 3896143, 3414369, 875881, 1557358, 3957103, 4061041, 3913073, 2986353, 959347, 803705, 4757370, 1752441, 2359693, 4729260, 1178030, 1897903, 5206962, 564149, 4238275, 4074960, 1900502, 4158425, 4635100, 4552679, 1106923, 3795442, 3049975, 2750972, 4602365, 1399295} is not JSON serializable
    

    Every item in the dictionary to be saved looks like this:

    0: ([(424200, 2), (4635100, 1), (4635100, 0)],
    {3910663, 1373715, 833561, 2479648, 3921953, 3408419, 3188274, 1399859, 372789, 1117238, 3283510, 3342401, 2585678, 1428049, 4948563, 1399892, 4449365, 4216407, 4502103, 819287, 3598429, 5187684, 625781, 3042432, 1485442, 3487369, 4166284, 148110, 3713169, 1338005, 1951900, 936613, 437414, 556716, 2666162, 573620, 4666549, 638144, 4154562, 4315335, 4230859, 4788429, 2613967, 174801, 4054227, 3768532, 5224152, 4914913, 2469090, 460517, 4820205, 1360625, 4264185, 3064580, 424200, 4601613, 4707087, 2140434, 3422995, 3878677, 3583776, 2412329, 5212973, 3787053, 4286261, 2512694, 821559, 4174137, 3351359, 349002, 3896143, 3414369, 875881, 1557358, 3957103, 4061041, 3913073, 2986353, 959347, 803705, 4757370, 1752441, 2359693, 4729260, 1178030, 1897903, 5206962, 564149, 4238275, 4074960, 1900502, 4158425, 4635100, 4552679, 1106923, 3795442, 3049975, 2750972, 4602365, 1399295})
    

    This is quite annoying after spending a few hours of inference to get the actual retrieval results :). Cheers, Martin
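One likely workaround (a sketch using the stdlib json module rather than ujson, whose support for a `default=` fallback varies by version; the data below is a small toy stand-in for the real outputs): pass `default=list` so the encoder converts each set to a list before serialization.

```python
import json

# Toy stand-in for the per-query outputs: a (pids, set-of-pids) pair.
# Neither json nor ujson can serialize the set out of the box.
outputs = {0: ([(424200, 2), (4635100, 1), (4635100, 0)],
               {3910663, 1373715, 833561})}

try:
    json.dumps(outputs)
except TypeError as e:
    print("fails as reported:", e)

# Workaround: fall back to list() for types the encoder doesn't know (sets).
serialized = json.dumps(outputs, default=list)
roundtrip = json.loads(serialized)
# Note: integer dict keys become strings, and the set comes back as a list.
```

Alternatively, convert the sets (e.g. via sorted()) in hover_inference.py before the ujson.dumps call.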

    environment

    name: colbert-v0.4
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - _libgcc_mutex=0.1=conda_forge
      - _openmp_mutex=4.5=2_kmp_llvm
      - blas=2.114=mkl
      - blas-devel=3.9.0=14_linux64_mkl
      - bzip2=1.0.8=h7f98852_4
      - ca-certificates=2021.10.8=ha878542_0
      - cudatoolkit=11.1.1=h6406543_10
      - cupy=10.4.0=py37h52a254a_0
      - faiss=1.7.0=py37cuda111hcc9d9d6_8_cuda
      - faiss-gpu=1.7.0=h788eb59_8
      - ffmpeg=4.3=hf484d3e_0
      - freetype=2.10.4=h0708190_1
      - gmp=6.2.1=h58526e2_0
      - gnutls=3.6.13=h85f3911_1
      - jpeg=9b=h024ee3a_2
      - lame=3.100=h7f98852_1001
      - ld_impl_linux-64=2.36.1=hea4e1c9_2
      - libblas=3.9.0=14_linux64_mkl
      - libcblas=3.9.0=14_linux64_mkl
      - libfaiss=1.7.0=cuda111hf54f04a_8_cuda
      - libfaiss-avx2=1.7.0=cuda111h1234567_8_cuda
      - libffi=3.4.2=h7f98852_5
      - libgcc-ng=11.2.0=h1d223b6_16
      - libgfortran-ng=11.2.0=h69a702a_16
      - libgfortran5=11.2.0=h5c6108e_16
      - libiconv=1.16=h516909a_0
      - liblapack=3.9.0=14_linux64_mkl
      - liblapacke=3.9.0=14_linux64_mkl
      - libnsl=2.0.0=h7f98852_0
      - libpng=1.6.37=h21135ba_2
      - libstdcxx-ng=11.2.0=he4da1e4_16
      - libtiff=4.0.9=he6b73bb_1
      - libuv=1.43.0=h7f98852_0
      - libzlib=1.2.11=h166bdaf_1014
      - llvm-openmp=13.0.1=he0ac6c6_1
      - mkl=2022.0.1=h8d4b97c_803
      - mkl-devel=2022.0.1=ha770c72_804
      - mkl-include=2022.0.1=h8d4b97c_803
      - ncurses=6.3=h27087fc_1
      - nettle=3.6=he412f7d_0
      - ninja=1.10.2=h4bd325d_1
      - numpy=1.21.6=py37h976b520_0
      - olefile=0.46=pyh9f0ad1d_1
      - openh264=2.1.1=h780b84a_0
      - openssl=3.0.3=h166bdaf_0
      - pillow=5.4.1=py37h34e0f95_0
      - pip=21.0.1=pyhd8ed1ab_0
      - python=3.7.12=hf930737_100_cpython
      - python_abi=3.7=2_cp37m
      - pytorch=1.9.0=py3.7_cuda11.1_cudnn8.0.5_0
      - readline=8.1=h46c0cb4_0
      - setuptools=62.1.0=py37h89c1867_0
      - sqlite=3.38.2=h4ff8645_0
      - tbb=2021.5.0=h924138e_1
      - tk=8.6.12=h27826a3_0
      - torchaudio=0.9.0=py37
      - torchvision=0.10.0=py37_cu111
      - wheel=0.37.1=pyhd8ed1ab_0
      - xz=5.2.5=h516909a_1
      - zlib=1.2.11=h166bdaf_1014
      - pip:
        - anyio==3.5.0
        - argon2-cffi==21.3.0
        - argon2-cffi-bindings==21.2.0
        - attrs==21.4.0
        - babel==2.10.1
        - backcall==0.2.0
        - beautifulsoup4==4.11.1
        - bitarray==2.4.1
        - bleach==5.0.0
        - blis==0.7.7
        - catalogue==2.0.7
        - certifi==2021.10.8
        - cffi==1.15.0
        - charset-normalizer==2.0.12
        - click==8.0.4
        - cymem==2.0.6
        - debugpy==1.6.0
        - decorator==5.1.1
        - defusedxml==0.7.1
        - entrypoints==0.4
        - fastjsonschema==2.15.3
        - fastrlock==0.8
        - filelock==3.6.0
        - gitdb==4.0.9
        - gitpython==3.1.27
        - huggingface-hub==0.5.1
        - idna==3.3
        - importlib-metadata==4.11.3
        - importlib-resources==5.7.1
        - ipykernel==6.13.0
        - ipython==7.32.0
        - ipython-genutils==0.2.0
        - ipywidgets==7.7.0
        - jedi==0.18.1
        - jinja2==3.1.1
        - joblib==1.1.0
        - json5==0.9.6
        - jsonschema==4.4.0
        - jupyter==1.0.0
        - jupyter-client==7.3.0
        - jupyter-console==6.4.3
        - jupyter-core==4.10.0
        - jupyter-server==1.16.0
        - jupyterlab==3.3.4
        - jupyterlab-pygments==0.2.2
        - jupyterlab-server==2.13.0
        - jupyterlab-widgets==1.1.0
        - langcodes==3.3.0
        - markupsafe==2.1.1
        - matplotlib-inline==0.1.3
        - mistune==0.8.4
        - murmurhash==1.0.7
        - nbclassic==0.3.7
        - nbclient==0.6.0
        - nbconvert==6.5.0
        - nbformat==5.3.0
        - nest-asyncio==1.5.5
        - notebook==6.4.11
        - notebook-shim==0.1.0
        - packaging==21.3
        - pandocfilters==1.5.0
        - parso==0.8.3
        - pathy==0.6.1
        - pexpect==4.8.0
        - pickleshare==0.7.5
        - preshed==3.0.6
        - prometheus-client==0.14.1
        - prompt-toolkit==3.0.29
        - psutil==5.9.0
        - ptyprocess==0.7.0
        - pycparser==2.21
        - pydantic==1.8.2
        - pygments==2.12.0
        - pyparsing==3.0.8
        - pyrsistent==0.18.1
        - python-dateutil==2.8.2
        - pytz==2022.1
        - pyyaml==6.0
        - pyzmq==22.3.0
        - qtconsole==5.3.0
        - qtpy==2.0.1
        - regex==2022.4.24
        - requests==2.27.1
        - sacremoses==0.0.49
        - scipy==1.7.3
        - send2trash==1.8.0
        - six==1.16.0
        - smart-open==5.2.1
        - smmap==5.0.0
        - sniffio==1.2.0
        - soupsieve==2.3.2.post1
        - spacy==3.2.4
        - spacy-legacy==3.0.9
        - spacy-loggers==1.0.2
        - srsly==2.4.3
        - terminado==0.13.3
        - thinc==8.0.15
        - tinycss2==1.1.1
        - tokenizers==0.10.3
        - tornado==6.1
        - tqdm==4.64.0
        - traitlets==5.1.1
        - transformers==4.10.0
        - typer==0.4.1
        - typing-extensions==3.10.0.2
        - ujson==5.2.0
        - urllib3==1.26.9
        - wasabi==0.9.1
        - wcwidth==0.2.5
        - webencodings==0.5.1
        - websocket-client==1.3.2
        - widgetsnbextension==3.6.0
        - zipp==3.8.0
    prefix: xxx/.conda/envs/colbert-v0.4
    
    opened by MFajcik 2
Owner
Stanford Future Data Systems
We are a CS research group at Stanford building data-intensive systems