Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)


Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive tasks like QA and claim verification.

Figure 1: Baleen's condensed retrieval architecture for multi-hop search.
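To make "multi-hop search with condensation" concrete, here is a toy conceptual sketch (hypothetical helper functions, not Baleen's actual API or models): each hop retrieves passages for the current query, condenses them into short facts, and folds those facts back into the query for the next hop.

```python
def retrieve(query, corpus, k=2):
    # Toy retriever: rank passages by word overlap with the query.
    q = set(query.lower().split())
    def score(passage):
        return len(q & set(passage.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def condense(passages):
    # Toy condenser: keep the first sentence of each passage as a "fact".
    return [p.split(".")[0].strip() for p in passages]

def multi_hop_search(question, corpus, hops=2):
    # Each hop's condensed facts are appended to the query context,
    # so later hops can retrieve evidence the question alone can't reach.
    context, facts = question, []
    for _ in range(hops):
        for fact in condense(retrieve(context, corpus)):
            if fact not in facts:
                facts.append(fact)
        context = question + " " + " ".join(facts)
    return facts

corpus = [
    "Alice was born in Paris. She studied physics.",
    "Paris is the capital of France. It lies on the Seine.",
    "Bob lives in Rome. He is a chef.",
]
facts = multi_hop_search("Where was Alice born", corpus, hops=2)
```

The real system replaces these toy functions with a ColBERT-style late-interaction retriever and a learned condenser, but the control flow above is the gist of condensed multi-hop retrieval.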

Installation

The implementation of Baleen lives in the parent ColBERT repository (under its new_api branch), which is included here as a submodule.

After cloning, make sure you also fetch the submodule's code:

git submodule update --init --recursive

Then follow the installation instructions from the submodule; Baleen has the same requirements as the parent ColBERT repository.

Usage

We will update this README with instructions and model checkpoints in the next few hours! Check back or "Watch" the GitHub repo for updates.


Comments
  • TypeError after inference

    Hi, when saving the inference results as a JSON file via hover_inference.py, the output dictionary contains Python sets. Sets are not JSON-serializable, so saving fails.

    python -m hover_inference --root ./experiments/ --datadir . --index wiki17.hover.2bit
    
    Traceback (most recent call last):
      File "xxx/conda/envs/colbert-v0.4/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "xxx/.conda/envs/colbert-v0.4/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "yyy/baleen/Baleen/hover_inference.py", line 53, in <module>
        main(args)
      File "yyy/baleen/Baleen/hover_inference.py", line 43, in main
        f.write(ujson.dumps(outputs) + '\n')
    TypeError: {3910663, 1373715, 833561, 2479648, 3921953, 3408419, 3188274, 1399859, 372789, 1117238, 3283510, 3342401, 2585678, 1428049, 4948563, 1399892, 4449365, 4216407, 4502103, 819287, 3598429, 5187684, 625781, 3042432, 1485442, 3487369, 4166284, 148110, 3713169, 1338005, 1951900, 936613, 437414, 556716, 2666162, 573620, 4666549, 638144, 4154562, 4315335, 4230859, 4788429, 2613967, 174801, 4054227, 3768532, 5224152, 4914913, 2469090, 460517, 4820205, 1360625, 4264185, 3064580, 424200, 4601613, 4707087, 2140434, 3422995, 3878677, 3583776, 2412329, 5212973, 3787053, 4286261, 2512694, 821559, 4174137, 3351359, 349002, 3896143, 3414369, 875881, 1557358, 3957103, 4061041, 3913073, 2986353, 959347, 803705, 4757370, 1752441, 2359693, 4729260, 1178030, 1897903, 5206962, 564149, 4238275, 4074960, 1900502, 4158425, 4635100, 4552679, 1106923, 3795442, 3049975, 2750972, 4602365, 1399295} is not JSON serializable
    

    Every item in the dictionary to be saved looks like this:

    0: ([(424200, 2), (4635100, 1), (4635100, 0)],
    {3910663, 1373715, 833561, 2479648, 3921953, 3408419, 3188274, 1399859, 372789, 1117238, 3283510, 3342401, 2585678, 1428049, 4948563, 1399892, 4449365, 4216407, 4502103, 819287, 3598429, 5187684, 625781, 3042432, 1485442, 3487369, 4166284, 148110, 3713169, 1338005, 1951900, 936613, 437414, 556716, 2666162, 573620, 4666549, 638144, 4154562, 4315335, 4230859, 4788429, 2613967, 174801, 4054227, 3768532, 5224152, 4914913, 2469090, 460517, 4820205, 1360625, 4264185, 3064580, 424200, 4601613, 4707087, 2140434, 3422995, 3878677, 3583776, 2412329, 5212973, 3787053, 4286261, 2512694, 821559, 4174137, 3351359, 349002, 3896143, 3414369, 875881, 1557358, 3957103, 4061041, 3913073, 2986353, 959347, 803705, 4757370, 1752441, 2359693, 4729260, 1178030, 1897903, 5206962, 564149, 4238275, 4074960, 1900502, 4158425, 4635100, 4552679, 1106923, 3795442, 3049975, 2750972, 4602365, 1399295})
    

    This is quite annoying after spending a few hours of inference to get the actual retrieval results :). Cheers, Martin
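One likely workaround (a sketch using the stdlib json module rather than ujson, whose support for a `default=` fallback varies by version; the data below is a small toy stand-in for the real outputs): pass `default=list` so the encoder converts each set to a list before serialization.

```python
import json

# Toy stand-in for the per-query outputs: a (pids, set-of-pids) pair.
# Neither json nor ujson can serialize the set out of the box.
outputs = {0: ([(424200, 2), (4635100, 1), (4635100, 0)],
               {3910663, 1373715, 833561})}

try:
    json.dumps(outputs)
except TypeError as e:
    print("fails as reported:", e)

# Workaround: fall back to list() for types the encoder doesn't know (sets).
serialized = json.dumps(outputs, default=list)
roundtrip = json.loads(serialized)
# Note: integer dict keys become strings, and the set comes back as a list.
```

Alternatively, convert the sets (e.g. via sorted()) in hover_inference.py before the ujson.dumps call.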

    environment

    name: colbert-v0.4
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - _libgcc_mutex=0.1=conda_forge
      - _openmp_mutex=4.5=2_kmp_llvm
      - blas=2.114=mkl
      - blas-devel=3.9.0=14_linux64_mkl
      - bzip2=1.0.8=h7f98852_4
      - ca-certificates=2021.10.8=ha878542_0
      - cudatoolkit=11.1.1=h6406543_10
      - cupy=10.4.0=py37h52a254a_0
      - faiss=1.7.0=py37cuda111hcc9d9d6_8_cuda
      - faiss-gpu=1.7.0=h788eb59_8
      - ffmpeg=4.3=hf484d3e_0
      - freetype=2.10.4=h0708190_1
      - gmp=6.2.1=h58526e2_0
      - gnutls=3.6.13=h85f3911_1
      - jpeg=9b=h024ee3a_2
      - lame=3.100=h7f98852_1001
      - ld_impl_linux-64=2.36.1=hea4e1c9_2
      - libblas=3.9.0=14_linux64_mkl
      - libcblas=3.9.0=14_linux64_mkl
      - libfaiss=1.7.0=cuda111hf54f04a_8_cuda
      - libfaiss-avx2=1.7.0=cuda111h1234567_8_cuda
      - libffi=3.4.2=h7f98852_5
      - libgcc-ng=11.2.0=h1d223b6_16
      - libgfortran-ng=11.2.0=h69a702a_16
      - libgfortran5=11.2.0=h5c6108e_16
      - libiconv=1.16=h516909a_0
      - liblapack=3.9.0=14_linux64_mkl
      - liblapacke=3.9.0=14_linux64_mkl
      - libnsl=2.0.0=h7f98852_0
      - libpng=1.6.37=h21135ba_2
      - libstdcxx-ng=11.2.0=he4da1e4_16
      - libtiff=4.0.9=he6b73bb_1
      - libuv=1.43.0=h7f98852_0
      - libzlib=1.2.11=h166bdaf_1014
      - llvm-openmp=13.0.1=he0ac6c6_1
      - mkl=2022.0.1=h8d4b97c_803
      - mkl-devel=2022.0.1=ha770c72_804
      - mkl-include=2022.0.1=h8d4b97c_803
      - ncurses=6.3=h27087fc_1
      - nettle=3.6=he412f7d_0
      - ninja=1.10.2=h4bd325d_1
      - numpy=1.21.6=py37h976b520_0
      - olefile=0.46=pyh9f0ad1d_1
      - openh264=2.1.1=h780b84a_0
      - openssl=3.0.3=h166bdaf_0
      - pillow=5.4.1=py37h34e0f95_0
      - pip=21.0.1=pyhd8ed1ab_0
      - python=3.7.12=hf930737_100_cpython
      - python_abi=3.7=2_cp37m
      - pytorch=1.9.0=py3.7_cuda11.1_cudnn8.0.5_0
      - readline=8.1=h46c0cb4_0
      - setuptools=62.1.0=py37h89c1867_0
      - sqlite=3.38.2=h4ff8645_0
      - tbb=2021.5.0=h924138e_1
      - tk=8.6.12=h27826a3_0
      - torchaudio=0.9.0=py37
      - torchvision=0.10.0=py37_cu111
      - wheel=0.37.1=pyhd8ed1ab_0
      - xz=5.2.5=h516909a_1
      - zlib=1.2.11=h166bdaf_1014
      - pip:
        - anyio==3.5.0
        - argon2-cffi==21.3.0
        - argon2-cffi-bindings==21.2.0
        - attrs==21.4.0
        - babel==2.10.1
        - backcall==0.2.0
        - beautifulsoup4==4.11.1
        - bitarray==2.4.1
        - bleach==5.0.0
        - blis==0.7.7
        - catalogue==2.0.7
        - certifi==2021.10.8
        - cffi==1.15.0
        - charset-normalizer==2.0.12
        - click==8.0.4
        - cymem==2.0.6
        - debugpy==1.6.0
        - decorator==5.1.1
        - defusedxml==0.7.1
        - entrypoints==0.4
        - fastjsonschema==2.15.3
        - fastrlock==0.8
        - filelock==3.6.0
        - gitdb==4.0.9
        - gitpython==3.1.27
        - huggingface-hub==0.5.1
        - idna==3.3
        - importlib-metadata==4.11.3
        - importlib-resources==5.7.1
        - ipykernel==6.13.0
        - ipython==7.32.0
        - ipython-genutils==0.2.0
        - ipywidgets==7.7.0
        - jedi==0.18.1
        - jinja2==3.1.1
        - joblib==1.1.0
        - json5==0.9.6
        - jsonschema==4.4.0
        - jupyter==1.0.0
        - jupyter-client==7.3.0
        - jupyter-console==6.4.3
        - jupyter-core==4.10.0
        - jupyter-server==1.16.0
        - jupyterlab==3.3.4
        - jupyterlab-pygments==0.2.2
        - jupyterlab-server==2.13.0
        - jupyterlab-widgets==1.1.0
        - langcodes==3.3.0
        - markupsafe==2.1.1
        - matplotlib-inline==0.1.3
        - mistune==0.8.4
        - murmurhash==1.0.7
        - nbclassic==0.3.7
        - nbclient==0.6.0
        - nbconvert==6.5.0
        - nbformat==5.3.0
        - nest-asyncio==1.5.5
        - notebook==6.4.11
        - notebook-shim==0.1.0
        - packaging==21.3
        - pandocfilters==1.5.0
        - parso==0.8.3
        - pathy==0.6.1
        - pexpect==4.8.0
        - pickleshare==0.7.5
        - preshed==3.0.6
        - prometheus-client==0.14.1
        - prompt-toolkit==3.0.29
        - psutil==5.9.0
        - ptyprocess==0.7.0
        - pycparser==2.21
        - pydantic==1.8.2
        - pygments==2.12.0
        - pyparsing==3.0.8
        - pyrsistent==0.18.1
        - python-dateutil==2.8.2
        - pytz==2022.1
        - pyyaml==6.0
        - pyzmq==22.3.0
        - qtconsole==5.3.0
        - qtpy==2.0.1
        - regex==2022.4.24
        - requests==2.27.1
        - sacremoses==0.0.49
        - scipy==1.7.3
        - send2trash==1.8.0
        - six==1.16.0
        - smart-open==5.2.1
        - smmap==5.0.0
        - sniffio==1.2.0
        - soupsieve==2.3.2.post1
        - spacy==3.2.4
        - spacy-legacy==3.0.9
        - spacy-loggers==1.0.2
        - srsly==2.4.3
        - terminado==0.13.3
        - thinc==8.0.15
        - tinycss2==1.1.1
        - tokenizers==0.10.3
        - tornado==6.1
        - tqdm==4.64.0
        - traitlets==5.1.1
        - transformers==4.10.0
        - typer==0.4.1
        - typing-extensions==3.10.0.2
        - ujson==5.2.0
        - urllib3==1.26.9
        - wasabi==0.9.1
        - wcwidth==0.2.5
        - webencodings==0.5.1
        - websocket-client==1.3.2
        - widgetsnbextension==3.6.0
        - zipp==3.8.0
    prefix: xxx/.conda/envs/colbert-v0.4
    
    opened by MFajcik 2
Owner
Stanford Future Data Systems
We are a CS research group at Stanford building data-intensive systems