Replication Code for "Self-Supervised Bug Detection and Repair" NeurIPS 2021

Overview

Self-Supervised Bug Detection and Repair

This is the reference code to replicate the research in Self-Supervised Bug Detection and Repair (NeurIPS 2021). Note that due to internal dependencies, it is not possible to share the whole infrastructure. We provide instructions to run the supervised learning parts. For the self-supervised parts, we provide the code as-is along with some documentation below, but it is not directly replicable without substantial work to set up the relevant infrastructure.

Please cite as,

@inproceedings{allamanis2021self,
  title={Self-Supervised Bug Detection and Repair},
  author={Allamanis, Miltiadis and Jackson-Flux, Henry and Brockschmidt, Marc},
  booktitle={NeurIPS},
  year={2021}
}

Table of Contents

Random Bugs Data Extraction

To create a dataset with fixed (randomly selected) bugs, use the following instructions:

  1. Build the Docker image for extraction by navigating to the root directory and running:

    docker build -f deployment/dockerfiles/baseimage.Dockerfile -t buglab-base .
    
  2. Create a text file containing the names of the PyPi packages to be extracted.

    • You can find the 4k most downloaded packages here.
    • You can get a list of all PyPi packages using the utilities in buglab.data.pypi.utils.get_all_pypi_packages.
  3. Start the extraction

    python -m buglab.controllers.staticdatasetextractor package_list_path target_dir
    

    The results will be saved at the target_dir. The code will create multiple processes, each spawning a Docker container that executes the main entry point in buglab.controllers.packageextracttodisk, which extracts the data from a single package.

    Note that by default the extraction runs as many Docker containers as CPUs in the current machine.

  4. Split the dataset

    python -m buglab.data.split ALL_DATA_FOLDER OUTPUT_FOLDER
    

    The split is deterministic with respect to the filename.
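A filename-deterministic split can be pictured as hashing each filename into a bucket; the sketch below is illustrative only (the hash scheme and split fractions are assumptions, not buglab's actual implementation):

```python
import hashlib

def split_of(filename, train_frac=0.8, valid_frac=0.1):
    """Assign a file to a split based only on a hash of its name,
    so the assignment is stable across runs and machines."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 100) / 100.0
    if bucket < train_frac:
        return "train"
    if bucket < train_frac + valid_frac:
        return "valid"
    return "test"
```

Because the assignment depends only on the name, re-running the split over an extended dataset keeps previously seen files in their original folds.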

To view the saved data, you can use the CLI utility; see Viewing *.msgpack.l.gz below.

PyPIBugs Dataset

We provide the PyPIBugs dataset as a set of URLs and git SHAs, which can be used to re-extract the dataset. Please download the dataset from here. The dataset has a json lines (JSONL) format with each line having the format:

{
    "repo": "url-to-git",
    "hash": "SHA of bug fixing commit",
    "diff": "diff hunk",
    "old_path": "filepath wrt to repo root",
    "rewrite": "the bug fixing rewrite metadata"
}

Using this data, and as long as the original repository is present online and its history is not rewritten, the dataset can be extracted in a usable format. Please look at get_pypibugs.py for a script scaffold that allows data re-extraction. The code automatically reads in the PyPiBugs dataset, clones the repositories, and checks out the appropriate commits. The visit_buggy_code and visit_fixed_code functions need to be implemented:

  • visit_buggy_code is called on the version of the code before fixing the bug. The full repository is accessible at the repo_path argument.
  • visit_fixed_code is called immediately after visit_buggy_code and the repository is at the version after the bug is fixed.
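A minimal sketch of parsing the JSONL records described above (the helper and the sample record are illustrative; cloning and checking out the commits is left to the get_pypibugs.py scaffold):

```python
import json

def load_pypibugs(jsonl_text):
    """Parse PyPiBugs JSONL records: one JSON object per non-empty line."""
    for line in jsonl_text.splitlines():
        if line.strip():
            yield json.loads(line)

# A fabricated example record following the schema above:
sample = ('{"repo": "https://example.org/repo.git", "hash": "abc123", '
          '"diff": "", "old_path": "pkg/a.py", "rewrite": {}}')
records = list(load_pypibugs(sample))
```

Each record's `repo` and `hash` fields identify the bug-fixing commit; `hash~1` (its parent) holds the buggy version that visit_buggy_code would see.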

Finally, buglab.data.extract extracts the dataset in the format used in this work.

Training Models

Supervised Learning Mode

To train a supervised model over a fixed dataset of (random) bugs, run

python buglab/models/train.py MODEL_NAME TRAIN_DATA_PATH VALID_DATA_PATH MODEL_FILENAME

The models gnn-mlp and seq-great are those described in the paper. To define a new model, add it to the model registry.
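A model registry of this kind is typically a name-to-constructor mapping; the sketch below is a hypothetical illustration of the pattern (the decorator and dictionary names are not buglab's actual registry API):

```python
# Illustrative registry pattern: map a model name (as passed on the
# command line) to a constructor that builds the model from a config.
MODEL_REGISTRY = {}

def register_model(name):
    def wrapper(builder):
        MODEL_REGISTRY[name] = builder
        return builder
    return wrapper

@register_model("my-model")
def build_my_model(config):
    # A real builder would construct and return the neural model here.
    return {"name": "my-model", **config}
```

With this pattern, train.py can look up MODEL_NAME in the registry and instantiate the corresponding model without hard-coding each variant.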

To evaluate a model run

python buglab/models/evaluate.py MODEL_FILENAME TEST_DATA_PATH

Finally, to visualize the output of a model in an html file, run

python buglab/models/visualize.py MODEL_FILENAME DATA_PATH OUT_HTML

Self-Supervised Learning Mode

You may need to create your own infrastructure; see the Infrastructure section in this README. The following instructions assume a manual start-up of all controllers. See here for a high-level description of the different processes involved.

The code is developed in a decoupled fashion, such that it can be run across multiple processes/computers. Processes communicate (via the network or IPC) with ZeroMQ. For the full BugLab to run, a number of processes need to be spawned. Follow the instructions below or start the relevant pods in your Kubernetes cluster using the Helm recipe in deployment.
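As an illustration of the ZeroMQ-based decoupling, the sketch below wires a "worker" PUSH socket to a "trainer" PULL socket over an in-process transport. The socket types and address are illustrative assumptions; the actual processes communicate over tcp:// or ipc:// endpoints as configured:

```python
import zmq

def demo_pipeline():
    ctx = zmq.Context.instance()
    # The "trainer" pulls samples that a "worker" pushes; in BugLab these
    # would be separate processes connected over tcp:// or ipc://.
    pull = ctx.socket(zmq.PULL)
    pull.bind("inproc://samples")
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://samples")
    push.send_json({"package": "example-pkg", "graph": {"nodes": 3}})
    sample = pull.recv_json()
    push.close()
    pull.close()
    return sample
```

Because ZeroMQ abstracts the transport, the same code runs unchanged whether the endpoints live in one process, on one machine, or across a cluster.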

  1. Build the Docker image for extraction by navigating to the root directory and running:

    docker build -f ./deployment/dockerfiles/baseimage.Dockerfile  -t buglab-extract .
    
  2. Start a bug selector server. This is responsible for selecting bugs to introduce when generating data for the bug detector. To use the random selector, run

    python -m buglab.controllers.randombugselectorserver "tcp://*:5556"
    

    Alternatively, this may be a learned model:

    python -m buglab.controllers.bugselectorserver
    
  3. Start the data generating pipeline server

    python -m buglab.controllers.datageneratingpipeline_coordinator package_list_path
    

    where package_list_path is a list of packages (text file; one package per line). See the fixed data extraction instructions for more information about retrieving the package_list_path. This process is responsible for distributing work across a number of workers, instantiates a deduplication server, and acts as a proxy between the workers and the training processes.

  4. Start one or more (usually around 400) processes for extracting the graph representations of source code. Each process consults the datageneratingpipeline_coordinator, installs a package from PyPI, inserts bugs (rewrites the code), and extracts the graph representations. Each sample is then passed to the training process.

    python -m buglab.controllers.datageneratingpipeline_worker
    

    Please look at the command-line arguments for defining the addresses (IPs, ports) of the bug selector server, deduplication server, and pipeline coordinator, and for more fine-grained control.

  5. Start the bug detector training process, the detector model server, and its data buffer

     python -m buglab.controllers.trainbugdetector PATH_FOR_METADATA_INIT MODEL_CONFIG OPTIONAL_VALIDATION_DATA_PATH MODEL_SAVE_PATH
     python -m buglab.controllers.detectortrainingdatabuffer PATH_OF_INITIAL_DATA
    

    Again check the command line arguments for more fine-grained options.

  6. Start the bug selector training process, the bug selector model server, and the scoring pipeline which scores the data with the detector probabilities used for the selector loss

     python -m buglab.controllers.trainbugselector MODEL_CONFIG MODEL_SAVE_PATH
     python -m buglab.controllers.detectordatascoringworker PATH_OF_INITIAL_DATA
    

    Check the command line arguments for more fine-grained options.

  7. Finally, you may peek at the output of the extraction server by running the dummy data subscriber

    python -m buglab.controllers.dummydatasubscriber
    

    which will subscribe to the dataset extraction publisher.

Repository Structure

A high-level overview of the structure of the code follows:

Utilities

Viewing *.msgpack.l.gz

To view the contents of a gzipped list of MessagePack messages, run

python -m buglab.utils.msgpackutils path/to/file.msgpack.l.gz | less
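The *.msgpack.l.gz container can be understood as a gzip stream of consecutive msgpack-encoded objects; below is a minimal round-trip sketch under that assumption (it is not the buglab.utils.msgpackutils implementation), requiring the msgpack package:

```python
import gzip
import msgpack  # assumption: the file is a gzip stream of msgpack objects

def write_msgpack_l_gz(path, items):
    # Encode each item back-to-back into one gzip stream.
    with gzip.open(path, "wb") as f:
        for item in items:
            f.write(msgpack.packb(item))

def read_msgpack_l_gz(path):
    # Stream the objects back out without loading the whole file.
    with gzip.open(path, "rb") as f:
        yield from msgpack.Unpacker(f, raw=False)
```

Streaming via msgpack.Unpacker keeps memory usage flat even for large dataset shards.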

Visualizing graph representation

This requires GraphViz and Dot. To get the graph of a single file run

python -m buglab.representations.asdot path/to/file.py output/path.dot
dot -Tpdf output/path.dot > output/path.pdf

Interactive Model Explorer

To get an interactive mode for experimenting with a given model, run

streamlit run buglab/controllers/helper/modelexplorer.py path/to/models/dir/

Note that this requires an installation of streamlit.

Infrastructure

The BugLab compute infrastructure is managed through Terraform and Kubernetes. This is provided as-is and no support will be provided by the maintainers to run the infrastructure. All relevant code is found in the deployment/ folder and concerns running BugLab on Azure. Note that variables.tf needs to be populated with user-specific variables. Terraform is responsible for creating the Azure infrastructure, including an appropriately configured Kubernetes cluster. Helm is the templating language used to define the Kubernetes deployments.

First, make sure you have installed Azure CLI, terraform, kubectl (kubernetes cli), and helm.

Architecture

The high-level architecture of the infrastructure and communicating processes can be seen in the image below. Diagram

Spawning the BugLab infrastructure

If you are recreating this project, rather than working on an already-started version of it, you will need to register an app, create a service principal, and a client secret following the instructions here. You can find more detail on apps and service principals here if you need it. Make sure that the service principal has "contributor" rights for your resource group.

Initialise Terraform

Once you have filled in variables.tf, open a shell, navigate to the Terraform directory, and run terraform init:

cd $CODE/BugLab/deployment/terraform
terraform init

You might need to login with the Azure CLI if an error occurs.

Get the Kubernetes Credentials and Connect to K8s

To connect your kubectl command with the Kubernetes cluster that Terraform is now managing for you, dump the output of terraform output kube_config into a file somewhere. In powershell:

terraform output -raw kube_config | Out-File -FilePath azurek8s

or bash:

terraform output -raw kube_config > azurek8s

will put it in a file called azurek8s within the terraform directory.

Then set your KUBECONFIG environment variable to point towards this file. In powershell:

$Env:KUBECONFIG = "C:\path\to\azurek8s"

and bash:

export KUBECONFIG="/path/to/azurek8s"

Now, running the command kubectl get nodes should give you a list of all of the compute nodes which are defined in the kube.tf file.

Useful K8s commands

Kubernetes is a useful tool with a steep learning curve. Some of the more useful commands are kept here for reference; this is not an exhaustive list.

To see what pods are running, run

kubectl get pods

The output will look something like:

NAME                             READY   STATUS    RESTARTS   AGE
train-selector-bc6cb4b4f-zjmg7   1/1     Running   0          42m

with possibly more pods, depending on what is currently running. To see the logs from a pod you are interested in, run:

kubectl logs train-selector-bc6cb4b4f-zjmg7

(where the name is copied from the output of kubectl get pods).

For more detail about the pod creation process (useful if the STATUS is Failed or something similarly disappointing), run:

kubectl describe pod train-selector-bc6cb4b4f-zjmg7

If the pod status is RUNNING, but you want to check if

  • data has mounted properly
  • the GPU exists
  • the expected processes are running

or similar, you can connect directly to a pod and get an interactive prompt using the command:

kubectl exec --stdin --tty train-selector-bc6cb4b4f-zjmg7 -- bash

Your prompt will then be that of the container running in the pod, and you should then be able to run nvidia-smi, htop or any other of your favourite inspection tools.

Starting experiments

To start an experiment in the K8s cluster, use helm. First navigate to the deployment folder, where the buglab Helm chart is located. Then kick off an experiment:

helm install NAME_OF_EXPERIMENT ./buglab/ -n NAME_OF_EXPERIMENT --create-namespace -f config.yml

You can use the --dry-run option to check the Kubernetes configuration before running.

To stop an experiment run

helm uninstall NAME_OF_EXPERIMENT -n NAME_OF_EXPERIMENT

Monitoring

There are two instances of Grafana that will be useful for monitoring experiments. One is for monitoring the compute resources in the cluster, and the other is for monitoring your specific experiment.

Resource monitoring

To see a cluster-wide view of how compute resources are being used, connect to the Grafana service in the monitoring namespace. Run the command

kubectl --namespace monitoring port-forward svc/kube-prometheus-stack-grafana 8080:80

and then navigate to localhost:8080. The username is admin and password is prom-operator. There are many dashboards pre-configured to look at the compute resources broken down by pod and node.

It looks like GPU utilization monitoring is not supported for the type of GPU we are currently using. Following the instructions from here, we tried installing the metric exporter. The command

helm install --generate-name gpu-helm-charts/dcgm-exporter -f .\deployment\terraform\nvidia_values.yaml

works successfully, but the pods themselves fail with logs:

level=fatal msg="Error watching fields: Profiling is not supported for this group of GPUs or GPU"

It might be worth trying again later, once GPU support has broadened.

Experiment monitoring

When you start an experiment using the helm install command described in the previous section, it will start a Grafana instance running in the namespace that you specified. Find its name by first running

kubectl --namespace NAME_OF_EXPERIMENT get pods

and then connecting to it by running (something like):

kubectl --namespace NAME_OF_EXPERIMENT port-forward grafana-5f977fdd7c-zpqcd 3000

Note the difference in port! You need to connect to 3000 in this instance. Navigate to localhost:3000. The username and password are both admin for this instance.

The JSON definitions of the dashboards loaded by default are stored in this repo in the grafana/dashboards directory. However, they are actually read from the grafana blob container in the BugLab storage account. If you make changes in the grafana/dashboards directory, they will not be reflected in the dashboards you actually see until you (manually) sync the changes to the grafana blob and then restart your Grafana service.

Developing

To contribute to this project, first follow these steps to set up your development environment:

  • Install the library requirements.
  • Install the pre-commit hooks:
    • Run pip3 install pre-commit
    • Install the hooks: pre-commit install

Running Tests

The test suite takes a long time to run, so first select the test you are interested in and run

pytest -k "name_of_test" -s .

at the root of the project.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

