Clarifying Questions for Query Refinement in Source Code Search
This code is part of the reproducibility package for the SANER 2022 paper "Generating Clarifying Questions for Query Refinement in Source Code Search".
It consists of five folders:
- codesearch/ - API to access the CodeSearchNet datasets and a neural bag-of-words code retrieval method.
- cq/ - Implementation of the ZaCQ system, including an implementation of the TaskNav development task extraction algorithm and two baseline query refinement methods.
- data/ - Pretrained code search model and config files for task extraction.
- evaluation/ - Scripts to run and evaluate ZaCQ.
- interface/ - Backend and frontend servers for a search interface implementing ZaCQ.
Setup
- Clone the CodeSearchNet repository into the root directory and download the CSN datasets:
cd ZaCQ
git clone https://github.com/github/CodeSearchNet.git
cd CodeSearchNet/scripts
./download_and_preprocess
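If you want to confirm the download worked, the corpus is stored as gzipped JSON-lines files; the snippet below reads one record. The exact path and field names are assumptions based on the standard CSN layout, so adjust them to whatever the script produced on your machine.

```python
# Sanity-check one record from the downloaded CSN data.
# The path and field names below are assumptions based on the standard CSN layout.
import gzip
import json

sample_file = "CodeSearchNet/resources/data/python/final/jsonl/test/python_test_0.jsonl.gz"

with gzip.open(sample_file, "rt", encoding="utf-8") as f:
    record = json.loads(f.readline())

print(record.get("func_name"))
print((record.get("docstring") or "")[:80])
```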
- Use a CSN model to create vector representations for candidate code search results. A pretrained Neural BoW model is included in this package.
cd codesearch
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python _setup.py
This will save and index vectors in the data folder. It will also generate search results for the 99 CSN queries.
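For orientation, retrieval over the indexed vectors amounts to nearest-neighbor search between an encoded query and the stored function vectors. The sketch below is illustrative only: the file name and the query-encoding step are stand-ins, since _setup.py performs the actual indexing and search.

```python
# Illustrative only: how indexed code vectors can be used to rank candidates.
# The vector file path and the query encoding are hypothetical stand-ins.
import numpy as np

code_vectors = np.load("data/code_vectors.npy")        # hypothetical path: (n_functions, dim)
query_vector = np.random.rand(code_vectors.shape[1])   # stand-in for an encoded query

# Cosine similarity between the query and every candidate function
norms = np.linalg.norm(code_vectors, axis=1) * np.linalg.norm(query_vector)
scores = code_vectors @ query_vector / np.clip(norms, 1e-12, None)

top_k = np.argsort(-scores)[:10]                        # indices of the 10 best matches
print(top_k)
```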
- Task extraction is fairly quick for small sets of code search results, but it is expensive to do repeatedly. To expedite the evaluation, we cache the extracted tasks for the results of the 99 CSN queries, as well as keywords for all functions in the datasets.
cd cq
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python _setup.py
Cached tasks and keywords are stored in the data folder.
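The cache follows a standard compute-once, reuse pattern. The sketch below illustrates the idea with a hypothetical extractor and cache path; it is not the package's actual cache format.

```python
# Minimal illustration of the cache-on-first-use pattern the setup step relies on.
# extract_tasks() is a hypothetical stand-in for the TaskNav-style extractor,
# and CACHE_PATH is a placeholder location.
import os
import pickle

CACHE_PATH = "data/task_cache.pkl"

def cached_tasks(results, extract_tasks):
    """Return tasks for `results`, extracting them only on the first call."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    tasks = extract_tasks(results)   # expensive step, done once
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(tasks, f)
    return tasks
```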
Evaluation
To evaluate ZaCQ and the other query refinement methods on the CSN queries, run the following:
cd evaluation
python run_queries.py
python evaluate.py
The run_queries script determines the subset of CSN queries that can be automatically evaluated and simulates an interactive refinement session for each valid query in each CSN language. For ZaCQ, the script runs through a set of predefined hyperparameter combinations. It calculates NDCG, MAP, and MRE metrics for each refinement method and hyperparameter configuration and stores them in the data/output folder.
The evaluate script averages the metrics across all languages after 1-N rounds of refinement. For ZaCQ, it also records the best-performing hyperparameter combination after N rounds of refinement.
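For reference, NDCG@k is computed in the standard way from the relevance grades of the top-k ranked results. The sketch below follows that textbook definition rather than the evaluation script's exact implementation.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG normalized by the DCG of an ideally ordered ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: relevance grades of the top 5 results after a round of refinement
print(ndcg_at_k([1, 0, 1, 0, 0], k=5))
```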
Interface
To run the interactive search interface, start the two backend servers and then the GUI server:
cd interface/cqserver
python ClarifyAPI.py
cd interface/searchserver
python SearchAPI.py
cd interface/gui
npm start
By default, the GUI is available at http://localhost:3000.
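Once all three servers are up, a quick reachability check can confirm they respond. Only the GUI port (3000) is stated here; the backend ports in the sketch are placeholders, so substitute whatever your servers report on startup.

```python
# Reachability check for the three servers.
# Only the GUI port (3000) comes from this README; the backend ports are placeholders.
import requests

servers = [
    ("GUI", "http://localhost:3000"),
    ("search backend", "http://localhost:5000"),          # placeholder port
    ("clarification backend", "http://localhost:5001"),   # placeholder port
]

for name, url in servers:
    try:
        resp = requests.get(url, timeout=3)
        print(f"{name}: HTTP {resp.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: not reachable ({exc})")
```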