Learning Structural Edits via Incremental Tree Transformations
Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21)
1. Prepare Environment
We recommend using conda to manage the environment:
conda env create -n "structural_edits" -f structural_edits.yml
conda activate structural_edits
Install the punkt tokenizer:
python
>>> import nltk
>>> nltk.download('punkt')
>>> <ctrl-D>
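Equivalently, the download can be done non-interactively:
# one-shot download of the punkt tokenizer data
python -c "import nltk; nltk.download('punkt')"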
2. Data
Please extract the datasets and vocabulary files by running:
cd source_data
tar -xzvf githubedits.tar.gz
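If you want to confirm what the archive contains before or after extraction, a standard tar listing works:
# list the archive contents without extracting
tar -tzf githubedits.tar.gz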
All necessary source data is included, organized as follows:
|-- source_data
|   |-- githubedits
|   |   |-- githubedits.{train|train_20p|dev|test}.jsonl
|   |   |-- csharp_fixers.jsonl
|   |   |-- vocab.from_repo.{080910.freq10|edit}.json
|   |   |-- Syntax.xml
|   |   |-- configs
|   |   |   |-- ...(model config json files)
A sample file containing 20% of the GitHubEdits training data is included as source_data/githubedits/githubedits.train_20p.jsonl
for running small experiments.
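Each .jsonl file stores one JSON object per line, so a quick way to count the sample's examples and inspect a record's fields (no particular field names are assumed here) is:
import json

# Count the examples in the 20% sample and list the fields of one record.
with open('source_data/githubedits/githubedits.train_20p.jsonl') as f:
    records = [json.loads(line) for line in f]
print(len(records), 'examples')
print(sorted(records[0].keys()))  # inspect the available fields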
We have generated and included the vocabulary files as well. To create your own vocabulary, see edit_components/vocab.py; a minimal sketch of the underlying idea follows.
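The sketch below is illustrative only: it assumes, hypothetically, that each record stores token lists under prev_code_tokens and updated_code_tokens (inspect a real record for the actual field names) and mimics the frequency cutoff suggested by the freq10 suffix of the released vocabulary file; the real vocabulary format is whatever edit_components/vocab.py produces.
import json
from collections import Counter

# Hypothetical field names -- check an actual record first; this only
# sketches the frequency-cutoff idea behind vocab.from_repo.080910.freq10.json.
FIELDS = ('prev_code_tokens', 'updated_code_tokens')

counter = Counter()
with open('source_data/githubedits/githubedits.train_20p.jsonl') as f:
    for line in f:
        record = json.loads(line)
        for field in FIELDS:
            counter.update(record.get(field, []))

freq_cutoff = 10  # mirrors the "freq10" suffix of the released vocab
vocab = [tok for tok, cnt in counter.most_common() if cnt >= freq_cutoff]
print(f'{len(vocab)} tokens kept at frequency >= {freq_cutoff}')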
Copyright: The original data were downloaded from Yin et al. (2019).
3. Experiments
See the training and test scripts in scripts/githubedits/. Please configure the PYTHONPATH environment variable on line 6 of each script.
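For instance, if the repository is checked out at /path/to/structural_edits (a placeholder path), line 6 would be set along the lines of:
# point PYTHONPATH at the repository root (placeholder path)
export PYTHONPATH=/path/to/structural_edits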
3.1 Training
For training, uncomment the desired setting in scripts/githubedits/train.sh and run:
bash scripts/githubedits/train.sh source_data/githubedits/configs/CONFIGURATION_FILE
where CONFIGURATION_FILE is the JSON file for your chosen setting.
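To sanity-check a configuration before launching, Python's built-in pretty-printer is enough, e.g. for the supervised config used below:
python -m json.tool source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.json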
Supervised Learning
For example, to train Graph2Edit + Sequence Edit Encoder on the GitHubEdits 20% sample data, uncomment only lines 21-25 in scripts/githubedits/train.sh and run:
bash scripts/githubedits/train.sh source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.json
(Note: when you run the experiment for the first time, you might need to wait for ~15 minutes for data preprocessing.)
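Since training on top of the one-off preprocessing can run for a while, a generic shell pattern (not something the scripts require) is to detach the job and keep a log:
# run training in the background and capture all output in train.log
nohup bash scripts/githubedits/train.sh source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.json > train.log 2>&1 &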
Imitation Learning
To further train the model with PostRefine imitation learning, first replace FOLDER_OF_SUPERVISED_PRETRAINED_MODEL in source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.postrefine.imitation.json with your supervised model directory (one way to do this is sketched after the command below). Then uncomment only lines 27-31 in scripts/githubedits/train.sh and run:
bash scripts/githubedits/train.sh source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.postrefine.imitation.json
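One way to make the placeholder substitution mentioned above, assuming GNU sed and a hypothetical supervised model directory runs/supervised_20p:
# swap in your own model directory; runs/supervised_20p is a placeholder
sed -i 's|FOLDER_OF_SUPERVISED_PRETRAINED_MODEL|runs/supervised_20p|g' source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.postrefine.imitation.json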
3.2 Test
To test a trained model, uncomment only the desired setting in scripts/githubedits/test.sh, replace work_dir with your model directory, and run:
bash scripts/githubedits/test.sh
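The work_dir line would then point at the directory your training run produced, e.g. (hypothetical path):
work_dir=runs/graph2iteredit.seq_edit_encoder.20p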
4. Reference
If you use our code and data, please cite our paper:
@inproceedings{yao2021learning,
title={Learning Structural Edits via Incremental Tree Transformations},
author={Ziyu Yao and Frank F. Xu and Pengcheng Yin and Huan Sun and Graham Neubig},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=v9hAX77--cZ}
}
Our implementation is adapted from TranX and Graph2Tree. We are grateful to the authors of these two works!
@inproceedings{yin18emnlpdemo,
title = {{TRANX}: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation},
author = {Pengcheng Yin and Graham Neubig},
booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP) Demo Track},
year = {2018}
}
@inproceedings{yin2018learning,
title={Learning to Represent Edits},
author={Pengcheng Yin and Graham Neubig and Miltiadis Allamanis and Marc Brockschmidt and Alexander L. Gaunt},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=BJl6AjC5F7}
}