Skipgram Negative Sampling in PyTorch

Overview

PyTorch SGNS

Word2Vec's SkipGramNegativeSampling in Python.

Yet another but quite general negative sampling loss implemented in PyTorch.

It can be used with ANY embedding scheme! Pretty fast, I bet.

# Word2Vec and SGNS are defined in this repo's model.py
from torch.optim import Adam
from model import Word2Vec, SGNS

vocab_size = 20000
word2vec = Word2Vec(vocab_size=vocab_size, embedding_size=300)
sgns = SGNS(embedding=word2vec, vocab_size=vocab_size, n_negs=20)
optim = Adam(sgns.parameters())
for batch, (iword, owords) in enumerate(dataloader):  # dataloader yields (input word, context words) batches
    loss = sgns(iword, owords)  # negative sampling loss for this batch
    optim.zero_grad()
    loss.backward()
    optim.step()

New: supports negative sampling based on the word frequency distribution (raised to the 0.75th power) and subsampling (to mitigate word-frequency imbalance).
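For reference, here is a rough sketch of how such sampling weights and keep-probabilities are typically derived from raw counts; the variable names and the 1e-5 threshold are illustrative, not this repo's exact preprocessing:

import numpy as np

# toy word counts indexed by word id (illustrative, not the repo's data)
word_counts = np.array([1000, 500, 100, 10, 1], dtype=np.float64)

# negative sampling weights: unigram distribution raised to the 0.75th power
weights = word_counts ** 0.75
weights /= weights.sum()  # can be passed to torch.multinomial as sampling weights

# subsampling: frequent words are randomly discarded, kept with probability sqrt(t / f(w))
freqs = word_counts / word_counts.sum()
t = 1e-5  # illustrative threshold; the actual value is a hyperparameter
keep_prob = np.minimum(1.0, np.sqrt(t / freqs))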

To test this repo, place a space-delimited corpus at data/corpus.txt, then run python preprocess.py followed by python train.py --weights --cuda (use the -h option for help).

Comments
  • An error occurred when testing the repo

    Hi, thank you for sharing the code. However, when I tried to test the repo with "python preprocess.py" and "python train.py --weights --cuda", the first worked well and generated the processed data, whereas the second reported the following error:

        [Epoch 1]:   0%|          | 0/1 [00:00<?, ?it/s]
        Traceback (most recent call last):
          File "train.py", line 93, in <module>
            train(parse_args())
          File "train.py", line 81, in train
            loss = sgns(iword, owords)
          File "/home/weixin/anaconda2/envs/p3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/weixin/Downloads/pytorch-sgns-master/model.py", line 70, in forward
            ivectors = self.embedding.forward_i(iword).unsqueeze(2)
          File "/home/weixin/Downloads/pytorch-sgns-master/model.py", line 42, in forward_i
            return self.ivectors(v)
          File "/home/weixin/anaconda2/envs/p3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
            result = self.forward(*input, **kwargs)
          File "/home/weixin/anaconda2/envs/p3/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 103, in forward
            self.scale_grad_by_freq, self.sparse
        RuntimeError: save_for_backward can only save input or output tensors, but argument 0 doesn't satisfy this condition

    I am quite new to PyTorch, so any idea what might have gone wrong? Many thanks.

    opened by DexterZeng 4
  • Bug in the Loss Function

    The loss function currently implemented is -(oloss + nloss).mean().

    It should be (-oloss + nloss).mean().

    You want to minimize the distance between "positive samples" and maximize the distance between "negative samples". (See the loss sketch after this comment.)

    opened by saketguru 2
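    For reference, a minimal sketch of the standard SGNS loss (Mikolov et al.), not this repo's exact code, which shows how the sign question resolves: if nloss already applies log-sigmoid to the negated negative scores, then -(oloss + nloss).mean() matches the textbook objective.

        import torch
        import torch.nn.functional as F

        def sgns_loss(ivectors, ovectors, nvectors):
            # ivectors: (B, D)      center-word embeddings
            # ovectors: (B, C, D)   context ("positive") embeddings
            # nvectors: (B, C*K, D) negative-sample embeddings
            pos_score = torch.bmm(ovectors, ivectors.unsqueeze(2)).squeeze(2)  # (B, C)
            neg_score = torch.bmm(nvectors, ivectors.unsqueeze(2)).squeeze(2)  # (B, C*K)
            oloss = F.logsigmoid(pos_score).mean(1)   # log sigma(u_o . v_i)
            nloss = F.logsigmoid(-neg_score).mean(1)  # log sigma(-u_n . v_i): the negation lives here
            return -(oloss + nloss).mean()            # minimize the negative log-likelihood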
  • Different embeddings for input/output words?

    Hey there, great skipgram example, so thank you for that.

    I have a question: why did you decide to use different embeddings for the "input" words and the "output"/"negative" words? See the lines below: https://github.com/theeluwin/pytorch-sgns/blob/master/model.py#L29:L30

    I imagine this could give better performance on some problems, but I haven't been able to test this myself yet. Thanks for the help! (A sketch of the two-table design follows this comment.)

    opened by phillynch7 2
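    For context, the original word2vec formulation also keeps two separate tables: one for center ("input") words and one for context/negative ("output") words. A minimal sketch of that design, using names similar to the ivectors/ovectors seen elsewhere on this page (an illustration, not model.py itself):

        import torch.nn as nn

        class TwoTableEmbedding(nn.Module):
            """Separate input (center) and output (context/negative) embedding tables."""

            def __init__(self, vocab_size, embedding_size):
                super().__init__()
                self.ivectors = nn.Embedding(vocab_size, embedding_size)  # looked up for center words
                self.ovectors = nn.Embedding(vocab_size, embedding_size)  # looked up for context and negatives

            def forward_i(self, words):
                return self.ivectors(words)

            def forward_o(self, words):
                return self.ovectors(words)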
  • Fix misspelling when checking cuda availability

    Hi, while trying to understand the model part, I think I found a misspelling: we should also check CUDA availability for ovectors.weight.

    opened by heartcored98 2
  • How to ensure that the negative sampled words are not the target word?

    First, thanks for your excellent code :)

    In model.py, the following piece of code suggests that we may draw a positive word when doing negative sampling, though the probability is very small:

        nwords = t.multinomial(self.weights, batch_size * context_size * self.n_negs, replacement=True).view(batch_size, -1)

    I'm wondering why you didn't perform an equality check. Is that because it wouldn't affect the quality of the trained word vectors but would slow down training? Are there other reasons? (A rejection-sampling sketch follows this comment.)

    opened by jeffchy 1
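    In practice such collisions are rare and are usually ignored for speed, but if one wanted to enforce the check, a rough rejection-sampling sketch (illustrative only, not this repo's code) could look like this:

        import torch

        def sample_negatives(weights, iword, context_size, n_negs):
            # weights: (vocab_size,) sampling weights, e.g. unigram frequency ** 0.75
            # iword:   (batch_size,) target word ids
            batch_size = iword.size(0)
            nwords = torch.multinomial(weights, batch_size * context_size * n_negs,
                                       replacement=True).view(batch_size, -1)
            collisions = nwords.eq(iword.unsqueeze(1))      # accidental hits on the target word
            while collisions.any():
                redraw = torch.multinomial(weights, int(collisions.sum()), replacement=True)
                nwords[collisions] = redraw                 # replace only the colliding entries
                collisions = nwords.eq(iword.unsqueeze(1))
            return nwords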
  • applying regularisation

    Hi theeluwin!

    First of all, thanks for the code; it's well written and helped me a ton in building my own word2vec model.

    This is not an issue per se, but something I'm potentially adding to the word2vec model using your code: the main idea is to apply regularisation to embeddings in a temporal setting. I've run into trouble with the code and I'm wondering if you'd be so kind as to help out!

    The main idea is that I'm training two models (model 0 and model 1) consecutively on two temporally adjacent corpora (say, news articles from 01 Jan and 02 Jan). During the training of model 1, I'd like to add a penalty term to the loss function: for all words in set(vocab_0) & set(vocab_1), I'd like to minimise the distance between the same word's embeddings from periods 0 and 1.

    I'm not sure if that makes sense!

    So far I'm testing on embeddings of rather small dimension (~20), so I'm using the Euclidean distance as the measure.

    Based on your code, I added a forward_r function to the Word2Vec class:

        def forward_r(self, data):
            if data is not None:
                v = LT(data)
                v = v.cuda() if self.ivectors.weight.is_cuda else v
                return self.ivectors(v)
            return None

    This function simply extracts the relevant embeddings (words from the intersection of the 2 vocabs)

    Then, in SGNS (for now I'm testing on only one particular embedding), I added a loss calculation that looks like this:

        rvectors = self.embedding.forward_r(rwords)
        rloss = 3 * ((rvectors.squeeze() - self.vector3) ** 2).sum()

    and finally it would return the following total loss:

        return -(oloss + nloss).mean() + rloss

    However, the problem is that the loss gets stuck: it never updates, and it appears that backpropagation is not working properly.

    As you can probably tell, I'm rather new to PyTorch and I'm not sure what's happening; could you lend me a hand? (A sketch of one way to wire up such a penalty follows this comment.)

    Thank you so much in advance!

    opened by ruoyzhang 1
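    A common reason such a penalty appears not to train is that the tensors involved get detached from the autograd graph (or, conversely, the fixed period-0 vectors are left inside it). A minimal sketch of one way to wire this up, assuming hypothetical names reg_words, prev_vectors, and lambda_reg (not this repo's code):

        import torch

        def temporal_penalty(embedding, reg_words, prev_vectors, lambda_reg=3.0):
            # embedding:    the current (period-1) Word2Vec-style module with a forward_i method
            # reg_words:    LongTensor of word ids shared by both vocabularies
            # prev_vectors: their period-0 embeddings, used as fixed targets
            cur = embedding.forward_i(reg_words)   # stays in the autograd graph
            ref = prev_vectors.detach()            # no gradient flows into the old model
            return lambda_reg * ((cur - ref) ** 2).sum()

        # total loss, following the pattern above:
        # loss = sgns(iword, owords) + temporal_penalty(word2vec, reg_words, prev_vectors)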
  • Purpose of unks in skipgram function

    Hi,

    Can you please explain what the purpose is of including the <UNK> tokens in the owords vector produced by the skipgram function? What should the model learn by using these as training examples?

    Also, what is the purpose of the variable ws in the train function, if it's not used anywhere after its definition?

    opened by mmlynarik 2
  • Where is Expectation

    [Screenshot of the SGNS objective from Mikolov et al.: $\log\sigma({v'_{w_O}}^{\top} v_{w_I}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\big[\log\sigma(-{v'_{w_i}}^{\top} v_{w_I})\big]$]

    In this formula we have an expectation over $w_i$. That means for each pair $(w_I, w_O)$ we should estimate this expectation. But as I can see in your code, you sample n_negs negative words for each pair $(w_I, w_O)$. Wouldn't it be more correct to draw $N$ samples of $w_i$ for each of the n_negs terms, take the empirical mean of the expression in square brackets, and then accumulate the n_negs means? (See the note after this comment.)

    opened by zetyquickly 5
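    For reference, summing single draws is itself an unbiased Monte-Carlo estimate of the sum of expectations; averaging $N$ extra draws per term only reduces its variance at extra cost. With $k$ = n_negs:

        $$\sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\big[\log\sigma(-{v'_{w_i}}^{\top} v_{w_I})\big] \;\approx\; \sum_{i=1}^{k} \log\sigma(-{v'_{\tilde w_i}}^{\top} v_{w_I}), \qquad \tilde w_i \sim P_n(w).$$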
  • Confused by the loss function.

    In your code, you minimize -(oloss + nloss).mean(),

    which means (oloss + nloss) should become large, so I expected "oloss becomes large and nloss becomes small."

    Although -(oloss + nloss) decreases, I observe oloss becoming small and nloss becoming large. How so?

    opened by JinYang88 2
Owner
Jamie J. Seol
@theeluwin