QANet-pytorch

Overview

A PyTorch implementation of QANet.

NOTICE

I've been very busy these past few months. I'll return to this repo in about 10 days.

Introduction

An implementation of QANet with PyTorch.

Any contributions are welcome!

Current performance

F1    EM    Reported by
66    ?     InitialBug
64    50    BangLiu

Usage

  1. Install PyTorch 0.4 with Python 3.6+.
  2. Run pip install -r requirements.txt to install the Python dependencies.
  3. Run download.sh to download the dataset.
  4. Run python preproc.py to build tensors from the raw dataset.
  5. Run python main.py --mode train to train the model. After training, log/model.pt will be generated.
  6. Run python main.py --mode test to test a pretrained model. The default model file is log/model.pt.

Structure

preproc.py: preprocesses the downloaded dataset and builds input tensors.

main.py: program entry point; training and testing routines.

models.py: the QANet model definition.

config.py: configuration options.

Differences from the paper

  1. The paper doesn't mention which activation function the authors used; I use ReLU.
  2. I don't make the embedding of <UNK> trainable.
  3. The connector between the embedding layer and the embedding encoder may differ from Google's implementation: the paper's description is inconsistent (a residual block can't be used as described, because the input and output dimensions differ), and the authors don't say how they implemented it. One possible connector is sketched below.
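
A minimal sketch of one possible connector (an editor's assumption, not necessarily what Google did): keep the 300-d word and 200-d character embeddings as they are, concatenate them into a 500-d input, and map it to the model dimension with a kernel-size-1 convolution at the start of the encoder layer, as the first comment below also suggests. The class name and dimensions here are illustrative.

    import torch
    import torch.nn as nn

    class InputConnector(nn.Module):
        """Projects concatenated word+char embeddings down to d_model."""
        def __init__(self, d_word=300, d_char=200, d_model=128):
            super().__init__()
            # A kernel-size-1 conv is a per-position linear projection.
            self.proj = nn.Conv1d(d_word + d_char, d_model, kernel_size=1)

        def forward(self, word_emb, char_emb):
            # word_emb: (batch, seq_len, d_word); char_emb: (batch, seq_len, d_char)
            x = torch.cat([word_emb, char_emb], dim=-1)  # (batch, seq_len, 500)
            x = x.transpose(1, 2)                        # Conv1d expects (batch, C, L)
            return self.proj(x).transpose(1, 2)          # (batch, seq_len, d_model)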

TODO

  • Reduce memory usage
  • Improve convergence speed (reach an F1 score of 60 within 1,000 iterations)
  • Reach the state-of-the-art scores of the original paper
  • Performance analysis
  • Test on SQuAD 2.0

Contributors

  1. InitialBug: found two bugs: (1) positional encodings were incorrectly set to require gradients; (2) wrong weight sharing among encoders.
  2. linthieda: fixed an issue with dependencies and offered computing resources.
  3. BangLiu: tested the model.
  4. wlhgtc: (1) improved the calculation of Context-Question Attention; (2) fixed a bug where embeddings were compacted before the highway networks.
Comments
  • Some difference between the paper and the model

    According to the original paper, the dimension of the input to the encoder block is 500 (200 + 300). That means we should not change the dimensions of the word and character embeddings in the embedding layer; instead, we should use a convolution at the start of the encoder layer to map the input to the model dimension.

    opened by wlhgtc 1
  • the layer norm

    nn.LayerNorm is a module with learnable parameters: it not only normalizes the input but also learns a scale and shift over it. I think the different layers in the encoder block (e.g., the conv layer, self-attention layer, and feed-forward layer) should each have their own learnable LayerNorm.

    opened by InitialBug 1
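
    A minimal sketch of the point raised above (an editor's illustration with assumed dimensions, not the repo's exact code): because nn.LayerNorm carries a learnable weight and bias, each sub-layer of an encoder block gets its own instance rather than sharing one.

        import torch.nn as nn

        class EncoderBlockNorms(nn.Module):
            # One LayerNorm per sub-layer: sharing a single instance would
            # force the conv, self-attention, and feed-forward sub-layers
            # to learn identical scale/shift parameters.
            def __init__(self, d_model=128):
                super().__init__()
                self.norm_conv = nn.LayerNorm(d_model)
                self.norm_attn = nn.LayerNorm(d_model)
                self.norm_ffn = nn.LayerNorm(d_model)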
  • Position Encoding

    In the original paper, "Attention Is All You Need", the positional encoding seems to be computed directly rather than trained. But in your code, the final tensor is wrapped in torch.nn.Parameter; is that OK?

    opened by InitialBug 1
  • why the dim of character vec set to 64?

    Hey, hengruo! I have a question for you. In config.py I found char_dim = 64 (the embedding dimension for characters), but the QANet paper states: "Each character is represented as a trainable vector of dimension p2 = 200, meaning each word can be viewed as the concatenation of the embedding vectors for each of its characters." Also, when I ran this repo, both the F1 and EM scores were very low (F1 only close to 10). What's more, d_model = 96 (the dimension of the connectors of each layer); shouldn't it be 128? Do these settings affect the model's performance? Thanks a lot; I'd appreciate it if you have time.

    opened by JewelChen2019 1
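
    The configuration values cited in the issue above, as they would appear in config.py (paraphrased from the issue text; the repo's actual file may differ):

        char_dim = 64   # embedding dimension per character (the paper uses p2 = 200)
        d_model = 96    # dimension of the connectors of each layer (the paper uses 128)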
  • some issues

    Hi, @hengruo. When I trained the model, I got the error below:

        File "/XXX/QANet-pytorch-master/models.py", line 33, in __init__
          self.pos_encoding = nn.Parameter(torch.sin(torch.add(torch.mul(pos, freqs), phases)), requires_grad=False)
        RuntimeError: Expected object of type torch.LongTensor but found type torch.FloatTensor for argument #2 'other'

    Then I modified the code so that I could run it correctly:

        self.pos_encoding = nn.Parameter(torch.sin(torch.add(torch.mul(pos.float(), freqs), phases)), requires_grad=False)

    I don't know why only I hit this issue. Thank you!

    opened by qjzhzw 0
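
    A minimal sketch pulling together the Position Encoding issue above and the dtype fix from this one (an editor's illustration, not the repo's exact code): compute the sinusoidal encoding from "Attention Is All You Need" directly, cast positions to float before mixing them with float frequencies, and register the result as a non-trainable parameter so it moves with the module but is never updated.

        import torch
        import torch.nn as nn

        def sinusoidal_encoding(max_len, d_model):
            # Positions must be float before multiplying by float frequencies
            # (the dtype error reported in this issue).
            pos = torch.arange(max_len).float().unsqueeze(1)  # (max_len, 1)
            i = torch.arange(0, d_model, 2).float()           # (d_model/2,)
            freqs = torch.pow(10000.0, -i / d_model)          # 1 / 10000^(2i/d_model)
            enc = torch.zeros(max_len, d_model)               # assumes even d_model
            enc[:, 0::2] = torch.sin(pos * freqs)
            enc[:, 1::2] = torch.cos(pos * freqs)
            # Computed directly and never updated by the optimizer.
            return nn.Parameter(enc, requires_grad=False)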
  • any plans on releasing pretrained model ?

    It would be really nice if someone could release partially or fully trained model checkpoints, so I could resume from that point instead of starting from scratch. Any plans for that?

    Thanks

    opened by saurabhvyas 0
  • GPU memory explode after 3 steps

    I use a Titan X GPU, but GPU memory grows rapidly, and after 3 batches it runs out of memory. I have checked your code line by line, and I still don't know what's wrong with it.

    opened by InitialBug 31
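
    A hedged sketch of the usual remedies for this kind of steady memory growth (illustrative, not a diagnosis of this repo's specific bug): never keep references to tensors that are still attached to the autograd graph, and disable autograd during evaluation.

        import torch

        def train_step(model, batch, optimizer, running_loss):
            # Assumes the model returns a scalar loss for a batch.
            optimizer.zero_grad()
            loss = model(batch)
            loss.backward()
            optimizer.step()
            # Accumulate a Python float; `running_loss + loss` would keep
            # every step's graph alive and grow GPU memory without bound.
            return running_loss + loss.item()

        def evaluate(model, batch):
            model.eval()
            with torch.no_grad():  # no autograd graph needed at eval time
                return model(batch)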