Parameterization of Hypercomplex Multiplications (PHM)
This repository contains the TensorFlow implementation of PHM (Parameterization of Hypercomplex Multiplications) layers and PHM-Transformers from the ICLR 2021 paper Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters.
Installation
Install the following libraries before running our code:
- tensorflow-gpu (1.14.0)
- tensor2tensor (1.14.0)
Usage
The usage of this repository follows the original tensor2tensor repository (e.g., t2t-datagen, t2t-trainer, t2t-avg-all, followed by t2t-decoder). It helps to gain familiarity with tensor2tensor before attempting to run our code. Specifically, setting --t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications will allow tensor2tensor to register PHM-Transformers.
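For example, the flag can be passed to any of the tensor2tensor binaries; a minimal data-generation sketch (where $DATA_DIR and the temporary directory are placeholder paths chosen by the user) might look like:
t2t-datagen \
--t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications \
--problem=translate_envi_iwslt32k \
--data_dir=$DATA_DIR \
--tmp_dir=/tmp/t2t_tmp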
Training
For example, to evaluate PHM-Transformer (n=4) on the En-Vi machine translation task (t2t-datagen --problem=translate_envi_iwslt32k), one may set the following flags when training:
t2t-trainer \
--problem=translate_envi_iwslt32k \
--model=light_transformer \
--hparams_set=light_transformer_base_single_gpu \
--hparams="light_mode='random',hidden_size=512,factor=4" \
--train_steps=50000
where light_transformer with light_mode='random' is the alias of the PHM-Transformer in our implementation.
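In practice, the t2t-trainer command above is also given the standard tensor2tensor flags, e.g. --data_dir=$DATA_DIR (where t2t-datagen wrote the data), --output_dir=$TRAIN_DIR (where checkpoints are saved), and the --t2t_usr_dir flag described in the Usage section; here $DATA_DIR and $TRAIN_DIR are placeholder paths chosen by the user.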
Aggregating Checkpoints
After training, the latest 8 checkpoints are averaged:
t2t-avg-all --model_dir $TRAIN_DIR --output_dir $AVG_DIR --n 8
where $TRAIN_DIR and $AVG_DIR need to be specified by users.
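For example (the paths below are illustrative only and depend on where training wrote its checkpoints):
TRAIN_DIR=$HOME/t2t_train/translate_envi_iwslt32k
AVG_DIR=$TRAIN_DIR/averaged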
Testing
To decode the target sequence, one has to additionally set the decode_hparams as follows:
t2t-decoder \
--decode_hparams="beam_size=5,alpha=0.6"
Then t2t-bleu is invoked to calculate the BLEU score.
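Putting the pieces together, a full decoding run might look like the sketch below, where $DATA_DIR and $AVG_DIR are the directories from the previous steps, and $SOURCE_FILE, $OUTPUT_FILE, and $REFERENCE_FILE are placeholder paths for the source sentences, the produced translations, and the reference translations:
t2t-decoder \
--t2t_usr_dir=./Parameterization-of-Hypercomplex-Multiplications \
--data_dir=$DATA_DIR \
--problem=translate_envi_iwslt32k \
--model=light_transformer \
--hparams_set=light_transformer_base_single_gpu \
--hparams="light_mode='random',hidden_size=512,factor=4" \
--output_dir=$AVG_DIR \
--decode_hparams="beam_size=5,alpha=0.6" \
--decode_from_file=$SOURCE_FILE \
--decode_to_file=$OUTPUT_FILE
t2t-bleu --translation=$OUTPUT_FILE --reference=$REFERENCE_FILE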
PHM Implementations
PHM is implemented with operations in make_random_mul and random_ffn, which are mathematically equivalent to a sum of Kronecker products.
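As a rough illustration of that math (a minimal NumPy sketch, not the repository's make_random_mul/random_ffn code; phm_weight and phm_layer are hypothetical names), a PHM layer with n blocks builds its weight matrix as a sum of n Kronecker products:
import numpy as np

def phm_weight(A, S):
    # A: (n, n, n) blocks, S: (n, d/n, k/n) blocks.
    # The PHM weight matrix is sum_i kron(A[i], S[i]), a (d, k) matrix
    # parameterized by roughly 1/n of the parameters of a dense (d, k)
    # matrix when d and k are large.
    return sum(np.kron(A[i], S[i]) for i in range(A.shape[0]))

def phm_layer(x, A, S):
    # x: (batch, d) inputs; returns (batch, k) outputs, like a dense layer.
    return x @ phm_weight(A, S)

# Example: n=4, input dim d=8, output dim k=8.
n, d, k = 4, 8, 8
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n, n))
S = rng.standard_normal((n, d // n, k // n))
x = rng.standard_normal((2, d))
y = phm_layer(x, A, S)  # shape (2, 8)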
Among works that use PHM, some have offered alternative PHM implementations:
- Parameterized Hypercomplex Graph Neural Networks
- COMPACTER: Efficient Low-Rank Hypercomplex Adapter Layers
- Convolutional Neural Networks by Hypercomplex Parameterization
- demegire/Parameterization-of-Hypercomplex-Multiplications
Citation
If you find this repository helpful, please cite our paper:
@inproceedings{zhang2021beyond,
  title={Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters},
  author={Zhang, Aston and Tay, Yi and Zhang, Shuai and Chan, Alvin and Luu, Anh Tuan and Hui, Siu Cheung and Fu, Jie},
  booktitle={International Conference on Learning Representations},
  year={2021}
}