SGPT: Multi-billion parameter models for semantic search

This repository contains code, results, and pre-trained models for the paper SGPT: Multi-billion parameter models for semantic search. (TODO: link to arXiv)

Overview

We present SGPT-CE and SGPT-BE for applying large transformer decoders as Cross-Encoders or Bi-Encoders to symmetric or asymmetric search. SGPT-CE scores query-document pairs by extracting log probabilities from pre-trained models. SGPT-BE produces embeddings via position-weighted mean pooling and contrastive fine-tuning of only the bias tensors (BitFit).
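
Concretely, given token hidden states h_1, ..., h_S for a sequence of length S, the position-weighted mean pooling used by SGPT-BE weights later tokens more strongly, since under causal attention only later tokens have attended to the full sequence:

    v = \sum_{i=1}^{S} w_i h_i, \qquad w_i = \frac{i}{\sum_{j=1}^{S} j}

where v is the resulting query or document embedding. The code examples below implement exactly this weighting.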

Feel free to open an issue should you have any questions~
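
The bias-only (BitFit) fine-tuning itself is handled by the adapted sentence-transformers code in biencoder/nli_msmarco. As a rough, minimal sketch of the idea in plain PyTorch (the model name and learning rate here are placeholders, not the settings used in the paper):

import torch
from transformers import AutoModel

# BitFit-style setup: freeze everything except bias tensors before contrastive fine-tuning.
# See biencoder/nli_msmarco for the actual training code used for the released models.
model = AutoModel.from_pretrained("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"Trainable parameters: {sum(p.numel() for p in trainable):,}")

# Placeholder optimizer and learning rate; plug this into your own contrastive training loop.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)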

Structure

.
├── biencoder  # Training & Inference of Bi-Encoders
│   ├── beir
│   │   ├── custommodels # Directory providing BEIR compatibility for asymmetric models & models with special tokens
│   │   │   └── ...
│   │   ├── io_utils # Exclusively used for beir_openai_embeddings_batched_parallel.py
│   │   │   └── ...
│   │   ├── parallelizer # Exclusively used for beir_openai_embeddings_batched_parallel.py
│   │   │   └── ...
│   │   ├── beir_dense_retriever.py
│   │   ├── beir_openai_embeddings_batched_parallel.py
│   │   ├── requirements.txt
│   │   ├── *.bash # Bash scripts to run multiple experiments
│   │   └── README.md
│   ├── nli_msmarco
│   │   ├── sentence-transformers # An adapted version of sentence-transformers - Install this version for all biencoder experiments
│   │   │   └── ...
│   │   └── README.md
│   └── useb
│       ├── useb
│       │   └── ...
│       ├── *.bash # Bash scripts to run multiple experiments
│       ├── useb_dense_retriever.py
│       └── README.md
├── crossencoder  # Inference of crossencoders
│   └── beir
│       ├── *.ipynb # Notebooks explained in the README
│       └── README.md
├── other
│   ├── sgpt_graphic.png
│   └── sgpt_utils.ipynb # Code for creating the graphs in the paper & other
├── requirements.txt
└── README.md

Each sub-directory provides its own README with an overview of its structure, downloads (datasets, models), and the commands used to produce the datasets, models, and other artifacts. Generally, you can find all models at https://huggingface.co/Muennighoff and JSON results as datasets at https://www.kaggle.com/muennighoff/datasets. Model names are explained in their Huggingface READMEs; dataset names are explained in the sub-folders of this repository.

Use SGPT with Huggingface

Below we provide Python examples of how to use the pre-trained models for your own semantic search use case. We highly recommend replacing the model names with larger models, e.g. Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit for biencoder/symmetric search. For small models, SBERT outperforms SGPT. See our paper for more information.
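
If GPU memory is a concern with the larger checkpoints, one option (not part of the original instructions, so treat it as an untested sketch) is to load the weights in half precision:

import torch
from transformers import AutoModel, AutoTokenizer

# Optional: load the large checkpoint in fp16 to roughly halve the memory footprint (requires a CUDA GPU).
tokenizer = AutoTokenizer.from_pretrained("Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit")
model = AutoModel.from_pretrained(
    "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit",
    torch_dtype=torch.float16,
)
model = model.to("cuda").eval()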

Biencoder

Symmetric Semantic Search
import torch
from transformers import AutoModel, AutoTokenizer
from scipy.spatial.distance import cosine

# Get our models - The package will take care of downloading the models automatically
# For best performance: Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit
tokenizer = AutoTokenizer.from_pretrained("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")
model = AutoModel.from_pretrained("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

# Tokenize input texts
texts = [
    "deep learning",
    "artificial intelligence",
    "deep throating",
    "artificial snow",
]
batch_tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the embeddings
with torch.no_grad():
    # Get hidden state of shape [bs, seq_len, hid_dim]
    last_hidden_state = model(**batch_tokens, output_hidden_states=True, return_dict=True).last_hidden_state

# Get weights of shape [bs, seq_len, hid_dim]
weights = (
    torch.arange(start=1, end=last_hidden_state.shape[1] + 1)
    .unsqueeze(0)
    .unsqueeze(-1)
    .expand(last_hidden_state.size())
    .float().to(last_hidden_state.device)
)

# Get attn mask of shape [bs, seq_len, hid_dim]
input_mask_expanded = (
    batch_tokens["attention_mask"]
    .unsqueeze(-1)
    .expand(last_hidden_state.size())
    .float()
)

# Perform weighted mean pooling across seq_len: bs, seq_len, hidden_dim -> bs, hidden_dim
sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded * weights, dim=1)
sum_mask = torch.sum(input_mask_expanded * weights, dim=1)

embeddings = sum_embeddings / sum_mask

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))
Asymmetric Semantic Search
import torch
from transformers import AutoModel, AutoTokenizer
from scipy.spatial.distance import cosine

# Get our models - The package will take care of downloading the models automatically
# For best performance: Muennighoff/SGPT-5.8B-weightedmean-msmarco-specb-bitfit
tokenizer = AutoTokenizer.from_pretrained("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")
model = AutoModel.from_pretrained("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]

SPECB_QUE_BOS = tokenizer.encode("[", add_special_tokens=False)[0]
SPECB_QUE_EOS = tokenizer.encode("]", add_special_tokens=False)[0]

SPECB_DOC_BOS = tokenizer.encode("{", add_special_tokens=False)[0]
SPECB_DOC_EOS = tokenizer.encode("}", add_special_tokens=False)[0]


def tokenize_with_specb(texts, is_query):
    # Tokenize without padding
    batch_tokens = tokenizer(texts, padding=False, truncation=True)   
    # Add special brackets & pay attention to them
    for seq, att in zip(batch_tokens["input_ids"], batch_tokens["attention_mask"]):
        if is_query:
            seq.insert(0, SPECB_QUE_BOS)
            seq.append(SPECB_QUE_EOS)
        else:
            seq.insert(0, SPECB_DOC_BOS)
            seq.append(SPECB_DOC_EOS)
        att.insert(0, 1)
        att.append(1)
    # Add padding
    batch_tokens = tokenizer.pad(batch_tokens, padding=True, return_tensors="pt")
    return batch_tokens

def get_weightedmean_embedding(batch_tokens, model):
    # Get the embeddings
    with torch.no_grad():
        # Get hidden state of shape [bs, seq_len, hid_dim]
        last_hidden_state = model(**batch_tokens, output_hidden_states=True, return_dict=True).last_hidden_state

    # Get weights of shape [bs, seq_len, hid_dim]
    weights = (
        torch.arange(start=1, end=last_hidden_state.shape[1] + 1)
        .unsqueeze(0)
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
        .float().to(last_hidden_state.device)
    )

    # Get attn mask of shape [bs, seq_len, hid_dim]
    input_mask_expanded = (
        batch_tokens["attention_mask"]
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
        .float()
    )

    # Perform weighted mean pooling across seq_len: bs, seq_len, hidden_dim -> bs, hidden_dim
    sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded * weights, dim=1)
    sum_mask = torch.sum(input_mask_expanded * weights, dim=1)

    embeddings = sum_embeddings / sum_mask

    return embeddings


query_embeddings = get_weightedmean_embedding(tokenize_with_specb(queries, is_query=True), model)
doc_embeddings = get_weightedmean_embedding(tokenize_with_specb(docs, is_query=False), model)

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(query_embeddings[0], doc_embeddings[0])
cosine_sim_0_2 = 1 - cosine(query_embeddings[0], doc_embeddings[1])
cosine_sim_0_3 = 1 - cosine(query_embeddings[0], doc_embeddings[2])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[0][:20] + "...", cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[1][:20] + "...", cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (queries[0], docs[2][:20] + "...", cosine_sim_0_3))

Crossencoder

Asymmetric Semantic Search
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.spatial.distance import cosine

# Get models - The package will take care of downloading the models automatically
# For best performance: EleutherAI/gpt-j-6B
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

prompt = 'Documents are searched to find matches with the same content.\nThe document "{}" is a good search result for "'

queries = [
    "I'm searching for a planet not too far from Earth.",
]

docs = [
    "Neptune is the eighth and farthest-known Solar planet from the Sun. In the Solar System, it is the fourth-largest planet by diameter, the third-most-massive planet, and the densest giant planet. It is 17 times the mass of Earth, slightly more massive than its near-twin Uranus.",
    "TRAPPIST-1d, also designated as 2MASS J23062928-0502285 d, is a small exoplanet (about 30% the mass of the earth), which orbits on the inner edge of the habitable zone of the ultracool dwarf star TRAPPIST-1 approximately 40 light-years (12.1 parsecs, or nearly 3.7336×1014 km) away from Earth in the constellation of Aquarius.",
    "A harsh desert world orbiting twin suns in the galaxy’s Outer Rim, Tatooine is a lawless place ruled by Hutt gangsters. Many settlers scratch out a living on moisture farms, while spaceport cities such as Mos Eisley and Mos Espa serve as home base for smugglers, criminals, and other rogues.",
]

for query in queries:
    print(f"Query: {query}")
    for doc in docs:
        context = prompt.format(doc)

        context_enc = tokenizer.encode(context, add_special_tokens=False)
        continuation_enc = tokenizer.encode(query, add_special_tokens=False)
        # Slice off the last token, as we take its probability from the one before
        model_input = torch.tensor(context_enc+continuation_enc[:-1])
        continuation_len = len(continuation_enc)
        input_len, = model_input.shape

        # [seq_len] -> [seq_len, vocab]
        logprobs = torch.nn.functional.log_softmax(model(model_input)[0], dim=-1).cpu()
        # [seq_len, vocab] -> [continuation_len, vocab]
        logprobs = logprobs[input_len-continuation_len:]
        # Gather the log probabilities of the continuation tokens -> [continuation_len]
        logprobs = torch.gather(logprobs, 1, torch.tensor(continuation_enc).unsqueeze(-1)).squeeze(-1)
        score = torch.sum(logprobs)
        # The higher (closer to 0), the better
        print(f"Document: {doc[:20] + '...'} Score: {score}")
Symmetric Semantic Search

You can use the same code as in the above CE-Asym section but change the prompt. Feel free to share prompts that work well :)
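
As a purely illustrative starting point (this prompt is hypothetical and not taken from the paper), a symmetric variant could frame both texts as items of the same kind, for example duplicate questions:

# Hypothetical symmetric-search prompt; swap it in for the asymmetric prompt above.
prompt = 'Two questions are duplicates if they ask for the same information.\nThe question "{}" is a duplicate of "'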

Acknowledgements

We thank XYZ for insightful discussions and valuable feedback throughout the project. This work has been supported by OpenAI under the academic access program. This work would not have been possible without:

Citation

Feel free to cite our paper if SGPT is helpful to you :)

@inproceedings{,
   title={{SGPT}: Multi-billion parameter models for semantic search},
   author={XXX},
   year={2022}
}
Comments
  • Construct SGPT

    Hello, I am currently working on my project and I'm interested in your paper.

    First of all, is it correct that SGPT is based on GPT-Neo? If yes, is it possible to construct an SGPT model based on GPT-2? How would one construct it from scratch?

    Thank you

    opened by tsamarahrana 4
  • Metric scores for CQADupStack

    Hey, what an awesome paper! I was looking for benchmarks for semantic search and it will be really useful in my research :) In this regard, I wanted to ask a question. On this Papers with Code page, https://paperswithcode.com/sota/information-retrieval-on-cqadupstack, you're listed as highest-scoring for CQADupStack, but the metric listed there is mAP@100, while in your paper the metric given for this score is nDCG@10. Which one is correct? I'm asking because I will be using this dataset to validate implementations of a few models and want to be sure I'm comparing the correct metrics.

    opened by rafaljanwojcik 2
  • Theory question about the token weightings for symmetric search

    First things first, I loved reading your paper. It was clear, concise, and has great implications for semantic search going forward. I cannot compliment it highly enough!

    One question. I would like to use a similar method to get semantic embeddings from non-GPT autoregressive language models. In the paper I read:

    Due to the causal attention mask in an auto-regressive decoder transformer, tokens do not attend to
    future tokens like in an encoder transformer. Hence, only the last token has attended to all tokens in a
    sequence. To account for this information mismatch, we propose to give later tokens a higher weight
    using a position-weighted mean pooling method:

        v = \sum_{i=1}^{S} w_i h_i \quad \text{where} \quad w_i = \frac{i}{\sum_{i=1}^{S} i}    (2)

    where S is the sequence length, h_i the i-th hidden state and v the query or document embedding. We
    compare weighted mean pooling with last token pooling, where the hidden state of the final token is
    the embedding, and regular mean pooling.

    This trick is really neat, but I was wondering if this would work for autoregressive decoder-only models that use a causal language modeling loss, for example the XGLM model set? https://huggingface.co/facebook/xglm-564M

    How about autoregressive LMs that do not use a causal language modeling loss, but instead use next-token prediction language modeling, such as the CodeGen model set? https://arxiv.org/pdf/2203.13474.pdf [if you are unfamiliar, the training section is 2.2 :) ]

    I understand if there is not a clear answer to these questions, but I would love to hear your thoughts either way. Thanks again!

    opened by cm2435 2
  • Learning rate & schedule

    Hi @Muennighoff

    thanks for your interesting paper and for sharing this repo! You seem to be using different learning rates and different schedules (sometimes with warm-up, sometimes a constant LR).

    Did you find these settings empirically? Do they depend on the model size or dataset? Is there any particular LR setting you would recommend when training a GPT model with BitFit?

    opened by malteos 2
  • Error when using sentence_transformer

    When I run: model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit") the following error happens: TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'.

    My machine:

    • OS: macOS, M1 chip
    • pytorch: 1.12.1
    • sentence_transformer: 2.2.2
    • python: 3.10
    opened by TamHHM 1
  • ValueError: not enough values to unpack (expected 2, got 1)

    Hi,

    I'm trying to run the given Cross-Encoder example. I faced this error at line 41:

       torch.nn.functional.log_softmax(model(model_input)[0], dim=-1).cpu()

    I tried modifying dimensions, but no luck; I also tried giving the input as 2 separate parameters, but that did not work either.

    Please kindly help me with which parameter I should update to avoid the unpack error.

    Thank you, much appreciated.

    ValueError                                Traceback (most recent call last)
    Cell In [54], line 41
         37 print(input_len)
         38 # print('model', model(model_input))
         39
         40 # [seq_len] -> [seq_len, vocab]
    ---> 41 logprobs = torch.nn.functional.log_softmax(model(model_input)[0], dim=-1).cpu()
         42 # [seq_len, vocab] -> [continuation_len, vocab]
         43 logprobs = logprobs[input_len-continuation_len:]

    File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
       1126 # If we don't have any hooks, we want to skip the rest of the logic in
       1127 # this function, and just call forward.
       1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1129         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1130     return forward_call(*input, **kwargs)
       1131 # Do not call functions when jit is used
       1132 full_backward_hooks, non_full_backward_hooks = [], []

    File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/transformers/models/gpt_neo/modeling_gpt_neo.py:974, in GPTNeoForCausalLM.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
        966 r"""
        967 labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
        968     Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
        969     ``labels = input_ids`` Indices are selected in ``[-100, 0, ..., config.vocab_size]`` All labels set to
        970     ``-100`` are ignored (masked), the loss is only computed for labels in ``[0, ..., config.vocab_size]``
        971 """
        972 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    --> 974 transformer_outputs = self.transformer(
        975     input_ids,
        976     past_key_values=past_key_values,
        977     attention_mask=attention_mask,
        978     token_type_ids=token_type_ids,
        979     position_ids=position_ids,
        980     head_mask=head_mask,
        981     inputs_embeds=inputs_embeds,
        982     use_cache=use_cache,
        983     output_attentions=output_attentions,
        984     output_hidden_states=output_hidden_states,
        985     return_dict=return_dict,
        986 )
        987 hidden_states = transformer_outputs[0]
        989 lm_logits = self.lm_head(hidden_states)

    File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
       1126 # If we don't have any hooks, we want to skip the rest of the logic in
       1127 # this function, and just call forward.
       1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1129         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1130     return forward_call(*input, **kwargs)
       1131 # Do not call functions when jit is used
       1132 full_backward_hooks, non_full_backward_hooks = [], []

    File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/transformers/models/gpt_neo/modeling_gpt_neo.py:799, in GPTNeoModel.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
        796     global_attention_mask = None
        798 # Local causal attention mask
    --> 799 batch_size, seq_length = input_shape
        800 full_seq_length = seq_length + past_length
        801 local_attention_mask = GPTNeoAttentionMixin.create_local_attention_mask(
        802     batch_size, full_seq_length, self.config.window_size, device, attention_mask
        803 )

    ValueError: not enough values to unpack (expected 2, got 1)
    opened by Roopesh-Bharatwaj-K-R 1
  • CVE-2007-4559 Patch

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Training SGPT for Custom Dataset

    Hi, I read your paper and it is cool. I am trying to do this on my own dataset, which is huge. Can you please tell me the exact way to train from scratch to achieve SGPT, both symmetric and asymmetric, for both encoder types? The cross-encoder would be our main interest. I have one doubt: are you using BERT to produce the cross- and bi-encoder embeddings? In my understanding you are using BERT as an initial pipeline before feeding it to GPT to produce the cosine similarities and log probabilities. Please help.

    opened by rajarajanvakil 1
  • OpenAI-GPT3 search endpoint deprecated

    Hello there, the search endpoint has been deprecated; is there any alternative solution for this? This happens in the CE-BEIR notebook. Thanks in advance.

    opened by rajarajanvakil 4