Contrastive Product Matching
This repository contains the code and data download links to reproduce the experiments of the paper "Supervised Contrastive Learning for Product Matching" by Ralph Peeters and Christian Bizer, available on arXiv. A comparison of the results to other systems on different benchmark datasets can be found at Papers with Code - Entity Resolution.
-
Requirements
Please keep in mind that the code is not optimized for portable or even non-workstation devices. Some of the scripts may require large amounts of RAM (64GB+) and GPUs. It is advised to use a powerful workstation or server when experimenting with some of the larger files.
The code has only been used and tested on Linux (CentOS) servers.
-
Building the conda environment
To build the exact conda environment used for the experiments, navigate to the project root folder where the file contrastive-product-matching.yml is located and run
conda env create -f contrastive-product-matching.yml
Furthermore, you need to install the project as a package. To do this, activate the environment with
conda activate contrastive-product-matching
then navigate to the root folder of the project and run
pip install -e .
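Taken together, the environment setup might look like this (a sketch, run from the repository root where contrastive-product-matching.yml is located):
# create and activate the environment, then install the project as a package
conda env create -f contrastive-product-matching.yml
conda activate contrastive-product-matching
pip install -e .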
-
Downloading the raw data files
Navigate to the src/data/ folder and run
python download_datasets.py
to automatically download the files into the correct locations. You can find the data at data/raw/. If you are only interested in the separate datasets, you can download the WDC LSPC datasets and the deepmatcher splits for the abt-buy and amazon-google datasets from the respective websites.
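Putting the download step together (run from the project root; the final ls is only a sanity check and assumes a Unix shell):
cd src/data/
python download_datasets.py
ls ../../data/raw/   # the downloaded datasets should appear here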
-
Processing the data
To prepare the data for the experiments, run the following scripts in the order given. Make sure to navigate to the respective folders first; a combined sketch follows the list.
- src/processing/preprocess/preprocess_corpus.py
- src/processing/preprocess/preprocess_ts_gs.py
- src/processing/preprocess/preprocess_deepmatcher_datasets.py
- src/processing/contrastive/prepare_data.py
- src/processing/contrastive/prepare_data_deepmatcher.py
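As a combined sketch, the whole preprocessing pipeline could be run like this (paths relative to the project root, each script executed from its own folder as noted above):
cd src/processing/preprocess/
python preprocess_corpus.py
python preprocess_ts_gs.py
python preprocess_deepmatcher_datasets.py
cd ../contrastive/
python prepare_data.py
python prepare_data_deepmatcher.py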
-
Running the Contrastive Pre-training and Cross-entropy Fine-tuning
Navigate to src/contrastive/
You can find respective scripts for running the experiments of the paper in the subfolders lspc/, abtbuy/ and amazongoogle/. Note that you need to adjust the file paths in these scripts for your system (replace your_path with path/to/repo); a convenience one-liner for this is sketched below.
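If you prefer not to edit each script by hand, something like the following sed call should work on Linux (a sketch only, run from src/contrastive/; double-check the exact placeholder spelling in the scripts first):
# replace the placeholder path in all experiment scripts in-place
sed -i 's|your_path|path/to/repo|g' lspc/*.sh abtbuy/*.sh amazongoogle/*.sh
-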
Contrastive Pre-training
For example, to run contrastive pre-training for the abtbuy dataset, use
bash abtbuy/run_pretraining_clean_roberta.sh BATCH_SIZE LEARNING_RATE TEMPERATURE (AUG)
You need to specify the batch size, learning rate and temperature as arguments here. Optionally, you can also apply data augmentation by passing an augmentation method as the last argument (use
all-
for the augmentation used in the paper). For the WDC Computers data you also need to supply the size of the training set, e.g.
bash lspc/run_pretraining_roberta.sh BATCH_SIZE LEARNING_RATE TEMPERATURE TRAIN_SIZE (AUG)
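For illustration, concrete pre-training calls might look like the following. The hyperparameter values are placeholders, not necessarily the settings from the paper, and medium is assumed to be one of the WDC training-set size names:
bash abtbuy/run_pretraining_clean_roberta.sh 1024 5e-5 0.07 all-
bash lspc/run_pretraining_roberta.sh 1024 5e-5 0.07 medium all-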
-
Cross-entropy Fine-tuning
Finally, to use the pre-trained models for fine-tuning, run any of the fine-tuning scripts in the respective folders, e.g.
bash abtbuy/run_finetune_siamese_frozen_roberta.sh BATCH_SIZE LEARNING_RATE TEMPERATURE (AUG)
Please note that BATCH_SIZE refers to the batch size used in pre-training; the fine-tuning batch size is locked to 64 but can be adjusted in the bash scripts if needed.
Analogously, for fine-tuning on WDC Computers, add the training set size:
bash lspc/run_finetune_siamese_frozen_roberta.sh BATCH_SIZE LEARNING_RATE TEMPERATURE TRAIN_SIZE (AUG)
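Again for illustration only: since the arguments identify the pre-training run whose checkpoint is loaded, they should match the pre-training call (the values below are the same placeholders as above):
bash abtbuy/run_finetune_siamese_frozen_roberta.sh 1024 5e-5 0.07 all-
bash lspc/run_finetune_siamese_frozen_roberta.sh 1024 5e-5 0.07 medium all-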
-
Project based on the cookiecutter data science project template. #cookiecutterdatascience