Code for Transformer Hawkes Process, ICML 2020.

Simiao Zuo

Last update: Dec 26, 2022

Overview

Transformer Hawkes Process

Source code for Transformer Hawkes Process (ICML 2020).

Run the code

Dependencies

Python 3.7.
Anaconda contains all the required packages.
PyTorch version 1.4.0.

Instructions

Put the data folder inside the root folder and modify the data entry in run.sh accordingly. The datasets are available here.
Run bash run.sh to run the code.

Note

Right now the code only supports single GPU training, but an extension to support multiple GPUs should be easy.
The reported event time prediction RMSE and the time stamps provided in the datasets are not in the same unit. For example, the provided time stamps can be in minutes while the reported results are in hours.
There are several factors that can be changed besides the ones in run.sh:
- In Main.py, function train_epoch, the event time prediction squared error needs to be properly scaled to stabilize training. At the same time, also scale the diff variable in function time_loss in Utils.py.
- In Utils.py, function log_likelihood, users can select whether to use numerical integration or Monte Carlo integration.
- In transformer/Models.py, class Transformer, there is an optional recurrent layer. It is motivated by the observation that additional recurrent layers can better capture the sequential context, as suggested in this paper. In practice, this may or may not help, depending on the dataset.
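
The choice between numerical and Monte Carlo integration mentioned above can be illustrated with a small standalone sketch. It is not the code in Utils.py: the intensity function, the function names, and the sample counts are assumptions made up for this example. Both routines approximate the integral of an intensity over one inter-event interval.

```python
import math
import random


def intensity(t):
    # Hypothetical intensity, chosen only for illustration:
    # softplus of a linear function of time.
    return math.log1p(math.exp(0.5 * t - 1.0))


def integrate_trapezoid(f, start, end, num_steps=1000):
    """Deterministic numerical integration with the trapezoidal rule."""
    step = (end - start) / num_steps
    total = 0.5 * (f(start) + f(end))
    for k in range(1, num_steps):
        total += f(start + k * step)
    return total * step


def integrate_monte_carlo(f, start, end, num_samples=1000, rng=None):
    """Unbiased Monte Carlo estimate: interval length times the
    average of f at uniformly sampled points."""
    rng = rng or random.Random(0)
    samples = (f(rng.uniform(start, end)) for _ in range(num_samples))
    return (end - start) * sum(samples) / num_samples


numeric = integrate_trapezoid(intensity, 0.0, 2.0)
monte_carlo = integrate_monte_carlo(intensity, 0.0, 2.0, num_samples=20000)
print(numeric, monte_carlo)
```

The trapezoidal estimate is deterministic but biased by the grid resolution, while the Monte Carlo estimate is unbiased but noisy. The two should agree closely once enough samples are drawn.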

Reference

Please cite the following paper if you use this code.

@article{zuo2020transformer,
  title={Transformer Hawkes Process},
  author={Zuo, Simiao and Jiang, Haoming and Li, Zichong and Zhao, Tuo and Zha, Hongyuan},
  journal={arXiv preprint arXiv:2002.09291},
  year={2020}
}

Comments

I tried to reproduce the RMSE results from your experiments.

Dear Zuo, I hope everything is fine with you. After reading your paper, I am really interested in your work and have learned a lot from it. However, I have a small question: how did you get such a small RMSE? On the Financial data I get Minimum RMSE: xx.xxxxx, while the paper reports only 0.93. Do some hyper-parameters of the loss function need to be adjusted? I set loss = 0 * event_loss + 0 * pred_loss + se / scale_time_loss, but it still doesn't work. Could you offer me some help? Thank you.

Sincerely yours, Luning Zhang

opened by DavidZhang88 3

Loss setting: can we mix likelihood and RMSE together?

In the paper, the loss function contains three parts: the first is the log-likelihood, the second is the event cross-entropy loss, and the third is the time RMSE loss.

I don't know whether it is appropriate to mix all of these together. My expectation is that, if the log-likelihood is used, the only way to predict the next event's time is to compute the expectation of the next event time under the PDF.

Can anyone help clarify this?
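
For readers following this question, the expectation of the next event time under a point-process density can be approximated numerically. The sketch below is an assumption-laden illustration, not code from this repository: it uses the standard density f(t) = lambda(t) * exp(-integral of lambda from 0 to t), with t measured from the last event, and the function name, horizon, and step count are hypothetical.

```python
import math


def expected_next_time(intensity, horizon, num_steps=20000):
    """Approximate E[t] = integral of t * f(t) dt over [0, horizon], where
    f(t) = intensity(t) * exp(-integral_0^t intensity(s) ds)."""
    step = horizon / num_steps
    cumulative = 0.0       # running integral of the intensity
    expectation = 0.0
    prev_rate = intensity(0.0)
    prev_integrand = 0.0   # t * f(t) evaluated at t = 0
    for k in range(1, num_steps + 1):
        t = k * step
        rate = intensity(t)
        # Trapezoidal update of the integrated intensity.
        cumulative += 0.5 * (prev_rate + rate) * step
        integrand = t * rate * math.exp(-cumulative)
        # Trapezoidal update of the expectation integral.
        expectation += 0.5 * (prev_integrand + integrand) * step
        prev_rate, prev_integrand = rate, integrand
    return expectation


# Constant intensity of 2 events per unit time (made-up value).
estimate = expected_next_time(lambda t: 2.0, horizon=20.0)
print(estimate)
```

For a constant intensity of 2 the exact expectation is 1/2, which gives a quick sanity check. The horizon must be long enough that the density beyond it is negligible.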

opened by waystogetthere 0

Ambiguity in calculating log likelihood

Hi, I have the following issue regarding calculating LL in Utils.py in line 49 inside the function "compute_integral_unbiased":

temp_hid = torch.sum(temp_hid * type_mask[:, 1:, :], dim=2, keepdim=True)

Here you only consider the events that occurred when calculating the integral, while according to the formula we should compute the integral for each event type. I would expect the output dimension of this function to be [Batch_size, Length, Num_types], and we should then sum over all Num_types instead of reducing it to the occurred events only.

I believe that underestimating this integral has led to your high overall LL compared to other studies.

Looking forward to your clarification.
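
The distinction raised in this issue can be made concrete with a toy example. This is a sketch with made-up numbers and hypothetical function names, written in plain Python rather than PyTorch. It does not reproduce Utils.py and does not settle which reduction is correct for this repository.

```python
def integral_all_types(per_type_integral):
    """Sum the integral contributions over every event type at each position."""
    return [sum(position) for position in per_type_integral]


def integral_observed_type_only(per_type_integral, type_mask):
    """Keep only the contribution of the type that occurred (one-hot mask)."""
    return [
        sum(value * mask for value, mask in zip(position, mask_row))
        for position, mask_row in zip(per_type_integral, type_mask)
    ]


# Made-up values: one sequence, 2 inter-event intervals, 3 event types.
per_type_integral = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.6]]
type_mask = [[0, 1, 0], [1, 0, 0]]

print(integral_all_types(per_type_integral))
print(integral_observed_type_only(per_type_integral, type_mask))
```

Because the per-type contributions are non-negative, the masked version can never exceed the full sum. A smaller integral term is subtracted from the log-likelihood, which is the effect the issue describes.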

opened by hojjatkarami 4

The performance in the paper is not reproduced.

Hi, I tried to reproduce the Transformer Hawkes Process on StackOverflow fold1. However, the accuracy and RMSE results are as shown below.

![image](https://user-images.githubusercontent.com/56212725/173004512-ba357b4d-244a-4f73-9ca1-9d9535f3f1df.png)

I think I am missing something. Compared with the released code of the Self-Attentive Hawkes Process, I do not think the gap is caused by the scaling factor. What accounts for the difference between the paper and this repository?

opened by KanghoonYoon 7

Should event likelihood be computed using current or last hidden state?

Suppose the transformer hidden state at event i is h_i, should the likelihood of this event be computed using h_i or h_{i-1}?

Using h_{i-1} makes more sense to me because it encourages the model to assign high intensity to the true next event, and thereby learn to forecast.

But the implementation and the paper seem to be using h_i. The problem is that, since the transformer is given the true event i as part of the input, it can simply learn to output infinitely high intensity for the correct event type in order to maximize the likelihood. In that case, the learned model will have no predictive power.

I feel I must have missed something. Any clarification is appreciated. Thanks.
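
The two alignments discussed in this question can be written down as a small indexing sketch. The function name and the string placeholders are hypothetical and only illustrate which hidden state is paired with which target event; they are not taken from the repository.

```python
def align_targets(hidden_states, events, use_previous_state=True):
    """Pair each target event with the hidden state used to score it."""
    if use_previous_state:
        # Event i is scored from h_{i-1}; the first event has no
        # history and is therefore skipped.
        return list(zip(hidden_states[:-1], events[1:]))
    # Event i is scored from h_i, which has already seen event i.
    return list(zip(hidden_states, events))


hidden_states = ["h0", "h1", "h2"]
events = ["e0", "e1", "e2"]
print(align_targets(hidden_states, events, use_previous_state=True))
print(align_targets(hidden_states, events, use_previous_state=False))
```

With the shifted alignment, no hidden state is ever paired with an event it has already observed, which is the property the question asks about.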

opened by mistycheney 2

Instructions to obtain Structured-THP datasets

Could you please provide additional details on how to obtain the 911-Calls and Earthquake datasets used in your paper? The CSV found at the provided webpage has 663,522 calls, all of which are in the EMS, fire, or traffic categories. For the 75 most frequent ZIP codes in this dataset, there are 582,045 total calls, which is considerably more than the 290,293 listed in Table 1 (see the code below).

import pandas as pd

df = pd.read_csv("911.csv")
print(len(df))  # 663522

# Count calls whose title falls in one of the three categories.
cats = ["EMS: ", "Fire: ", "Traffic: "]
in_cats = 0
for title in df["title"]:
    for cat in cats:
        if cat in title:
            in_cats += 1
            break
print(in_cats)  # 663522

# Total calls across the 75 most frequent ZIP codes.
zip_calls = (
    df.groupby("zip")
    .size()
    .reset_index(name="n_calls")
    .sort_values("n_calls", ascending=False)
)
print(zip_calls["n_calls"][:75].sum())  # 582045

The paper also states that:

An undirected edge exists between two vertices if their zipcodes are within 10 of each other.

Does this mean two vertices were considered neighbors if abs(ZIP_{1} - ZIP_{2}) <= 10?
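
If that reading is right, the graph construction could look like the sketch below. It assumes the literal interpretation asked about above (absolute difference of the numeric ZIP codes at most 10). The function name and the example ZIP codes are hypothetical, and whether the paper used this rule is exactly what the question leaves open.

```python
from itertools import combinations


def build_zip_edges(zip_codes, max_gap=10):
    """Return undirected edges (a, b) with a < b whose ZIP codes
    differ by at most max_gap."""
    unique_zips = sorted(set(zip_codes))
    return [
        (a, b)
        for a, b in combinations(unique_zips, 2)
        if abs(a - b) <= max_gap
    ]


# Hypothetical ZIP codes, for illustration only.
edges = build_zip_edges([19401, 19403, 19446, 19440, 19401])
print(edges)  # [(19401, 19403), (19440, 19446)]
```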

For the Earthquake dataset, the provided website is in Chinese and seems to host a number of datasets. Could you provide precise instructions on where to find the specific earthquake dataset used in your paper?

opened by airalcorn2 0

Owner

Simiao Zuo

PhD Student @ Georgia Tech
