The source code and dataset for the RecGURU paper (WSDM 2022)

Chenglin Li

Last update: Jan 7, 2023

RecGURU

About The Project

Source code and baselines for the RecGURU paper "RecGURU: Adversarial Learning of Generalized User Representations for Cross-Domain Recommendation (WSDM 2022)"

Code Structure

RecGURU
├── README.md                                 Read me file
├── data_process                              Data processing methods
│   ├── __init__.py                           Package initialization file
│   ├── amazon_csv.py                         Code for processing the Amazon data (in .csv format)
│   ├── business_process.py                   Code for processing the collected data
│   ├── item_frequency.py                     Calculate item frequency in each domain
│   └── run.sh                                Shell script to perform data processing
├── GURU                                      Scripts for modeling, training, and testing
│   ├── data                                  Dataloader package
│   │   ├── __init__.py                       Package initialization file
│   │   └── data_loader.py                    Customized dataloaders
│   ├── tools                                 Tools such as loss functions, evaluation metrics, etc.
│   │   ├── __init__.py                       Package initialization file
│   │   ├── lossfunction.py                   Customized loss functions
│   │   ├── metrics.py                        Evaluation metrics
│   │   ├── plot.py                           Plot function
│   │   └── utils.py                          Other tools
│   ├── Transformer                           Transformer package
│   │   ├── __init__.py                       Package initialization file
│   │   └── transformer.py                    Transformer module
│   ├── AutoEnc4Rec.py                        Autoencoder-based sequential recommender
│   ├── AutoEnc4Rec_cross.py                  Cross-domain recommender modules
│   ├── config_auto4rec.py                    Model configuration file
│   ├── gan_training.py                       Training methods of the GAN framework
│   ├── train_auto.py                         Main function for training and testing the single-domain sequential recommender
│   └── train_gan.py                          Main function for training and testing the cross-domain sequential recommender
└── .gitignore                                gitignore file
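
The tree lists `item_frequency.py` as the script that calculates item frequency in each domain. Its implementation is not reproduced here, so the following is only a minimal sketch of that kind of computation. It assumes each domain is available as a mapping from user id to a list of item ids; the function name `item_frequency` and the toy data are hypothetical and not taken from the repository.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def item_frequency(user_sequences: Dict[Hashable, Iterable[Hashable]]) -> Counter:
    """Count how often each item appears across all users of one domain."""
    counts: Counter = Counter()
    for items in user_sequences.values():
        # Counter.update adds one count per occurrence in the iterable.
        counts.update(items)
    return counts


# Hypothetical toy data (assumed format): user id -> list of item ids.
domain_a = {"u1": [10, 11, 10], "u2": [11, 12]}
freq_a = item_frequency(domain_a)
print(freq_a.most_common(2))
```

Running the same function once per domain gives one frequency table per domain, which could then be saved next to the processed data if the training code expects such a file.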

Dataset

Public dataset: the Amazon review dataset, available at https://nijianmo.github.io/amazon/index.html
Collected datasets: https://drive.google.com/file/d/1NbP48emGPr80nL49oeDtPDR3R8YEfn4J/view
Data processing:

Amazon dataset:

```shell
cd ../data_process
python amazon_csv.py   
```

Collected dataset:

```shell
cd ../data_process
python business_process.py --rate 0.1  # fraction of overlapping users = 0.1
```

After data processing, each cross-domain scenario has its own dataset folder:

."a_domain"-"b_domain"
├── a_only.pickle         # users in domain a only
├── b_only.pickle         # users in domain b only
├── a.pickle              # all users in domain a
├── b.pickle              # all users in domain b
└── a_b.pickle            # overlapping users of domain a and b

Note: see the code for processing details and make modifications accordingly.
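
To check that processing produced the files listed above, a small inspection helper can be useful. The sketch below only assumes the five file names from the folder layout; the internal structure of each pickle is not documented here, so the code reports the type and length without assuming a schema. The names `load_scenario` and `summarize` are hypothetical helpers, not part of the repository.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

# File stems taken from the folder layout above.
PICKLE_NAMES = ("a_only", "b_only", "a", "b", "a_b")


def load_scenario(folder: str) -> Dict[str, Any]:
    """Load every expected pickle that exists in `folder`, keyed by its stem."""
    data: Dict[str, Any] = {}
    for name in PICKLE_NAMES:
        path = Path(folder) / f"{name}.pickle"
        if path.exists():
            # Only unpickle files you produced yourself or otherwise trust.
            with path.open("rb") as fh:
                data[name] = pickle.load(fh)
    return data


def summarize(data: Dict[str, Any]) -> Dict[str, str]:
    """Describe each loaded object by type, plus length when it has one."""
    summary: Dict[str, str] = {}
    for name, obj in data.items():
        size = f", len={len(obj)}" if hasattr(obj, "__len__") else ""
        summary[name] = f"{type(obj).__name__}{size}"
    return summary
```

For example, `summarize(load_scenario("path/to/scenario_folder"))` (the path is a placeholder) returns one entry per pickle found, so a missing file shows up as a missing key.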

Run

Single-domain Methods:

# SAS
python train_auto.py --sas "True"
# AutoRec (ours)
python train_auto.py

Cross-Domain Methods:

# RecGURU
python train_gan.py --cross "True"
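
The commands above pass booleans as quoted strings (`--sas "True"`, `--cross "True"`). How `train_auto.py` and `train_gan.py` actually parse these flags is not shown here. The sketch below illustrates one common way to handle such flags with `argparse`; the converter `str2bool` and the defaults are assumptions for illustration only. An explicit converter matters because `type=bool` would turn any non-empty string, including "False", into `True`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert strings such as "True" or "false" to a bool; reject anything else."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean string, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the commands above; the defaults are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--sas", type=str2bool, default=False)
    parser.add_argument("--cross", type=str2bool, default=False)
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--sas", "True"])
print(args.sas, args.cross)  # prints: True False
```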

Comments

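How should collected_data be processed?

Hi, Li:

After downloading collected_data.gz and decompressing it, I get collected_data, but it does not seem possible to open it and obtain the corresponding train/valid/test data.

Also, business_process.py does not seem to be able to process the collected_data.gz file directly.

How can I obtain the processed data?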

opened by caojiangxia 8
Confused about which Amazon view dataset should be downloaded

Hello, I'm not clear which Amazon view dataset should be downloaded from https://nijianmo.github.io/amazon/index.html. Are there any clues? Also, would it be convenient to provide the environment configuration of the project?

opened by CreaterLL 4
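business_process.py does not generate kdd_10_f, only kdd_10

business_process.py only generated kdd_10, so why is kdd_10_f reported as not found? Should I simply change kdd_10_f to kdd_10?

In addition, the code requires a freq file, but business_process.py did not generate one in the kdd_10 folder. Does the freq file need to be generated separately by the user?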

opened by CreaterLL 2
Confused by the result_a.* files

Hi, after running `python train_gan.py --cross "True"`, I only got result_a.* files, and I am confused by their content. I also could not find the HR@ metrics. Could you please give me some instructions? Besides, do the result_a.* files contain the results of both domains? Thank you!

opened by CreaterLL 1
Exception on processing the collected data

I followed the guidelines to download kdd.tar.gz from the provided URL and ran business_process.py.

However, an exception is raised in the generate_data function, in lines 209-249:

    except Exception as e:
        print(data_t["uid"][u_id])
        print(data_t["a_item"][u_id])
        print(wesee_items)
        print(data_t["b_item"][u_id])
        print(video_items)
        sys.exit()

Is my downloaded file corrupted? The MD5 value of my downloaded kdd.tar.gz is "a7e572a892b602552eaaa4203a8d7f14".

opened by WujiangXu 4
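
A checksum like the one quoted in the report above can be reproduced locally with the Python standard library. The sketch below is one way to do it; `file_md5` is a hypothetical helper name. The MD5 value in the report comes from the reporter's own copy and is not confirmed here as the reference value for the archive.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_md5("kdd.tar.gz")` on two separately downloaded copies and comparing the results is a quick way to tell a damaged download from a problem in the processing code.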

