PyTorch Connectomics: segmentation toolbox for EM connectomics

Overview


Introduction

The field of connectomics aims to reconstruct the wiring diagram of the brain by mapping neural connections at the level of individual synapses. Recent advances in electron microscopy (EM) have enabled the collection of large numbers of image stacks at nanometer resolution, but annotating them requires expertise and is extremely time-consuming. Here we provide PyTorch Connectomics (PyTC), a deep learning framework powered by PyTorch for automatic and semi-automatic semantic and instance segmentation in connectomics. This repository is mainly maintained by the Visual Computing Group (VCG) at Harvard University.

PyTorch Connectomics is currently under active development!

Key Features

  • Multi-task, Active and Semi-supervised Learning
  • Distributed and Mixed-precision Training
  • Scalability for Handling Large Datasets

If you want new features that are relatively easy to implement (e.g., loss functions, models), please open a feature request in the issues or implement them yourself and submit a pull request. For features that require a substantial amount of design and coding, please contact the author directly.

Environment

The code is developed and tested under the following configurations.

  • Hardware: 1-8 Nvidia GPUs with at least 12 GB of GPU memory (set SYSTEM.NUM_GPU according to the configuration of your machine)
  • Software: CentOS Linux 7.4 (Core), CUDA>=11.1, Python>=3.8, PyTorch>=1.9.0, YACS>=0.1.8

Installation

Create a new conda environment and install PyTorch:

conda create -n py3_torch python=3.8
source activate py3_torch
conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia

Please note that this package is mainly developed on the Harvard FASRC cluster. More information about GPU computing on the FASRC cluster can be found here.

Download and install the package:

git clone https://github.com/zudi-lin/pytorch_connectomics.git
cd pytorch_connectomics
pip install --upgrade pip
pip install --editable .

Since the package is under active development, the editable installation makes any changes to the package source take effect in the environment immediately. For more information and frequently asked questions about installation, please check the installation guide.

Notes

Data Augmentation

We provide a data augmentation interface with several commonly used augmentation methods for EM images. The interface is pure Python and operates on and outputs only NumPy arrays, so it can be easily incorporated into any Python-based deep learning framework (e.g., TensorFlow). For more details about the design of the data augmentation module, please check the documentation.
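As a rough illustration of what a NumPy-only augmentation looks like, the sketch below applies the same random flip to an image volume and its label. The function name `random_flip`, the dict layout with `image` and `label` keys, and the (z, y, x) axis order are assumptions made for this example, not the toolbox's actual interface.

```python
import numpy as np


def random_flip(sample, rng):
    """Flip every array in `sample` along the same randomly chosen axes.

    `sample` is a dict of 3D NumPy arrays with identical shapes (hypothetical layout).
    """
    axes = tuple(ax for ax in range(3) if rng.random() < 0.5)
    if not axes:
        # No axis selected: return the arrays unchanged.
        return dict(sample)
    # Use the same axes for all arrays so image and label stay aligned.
    return {key: np.flip(arr, axis=axes).copy() for key, arr in sample.items()}


rng = np.random.default_rng(0)
image = np.arange(24).reshape(2, 3, 4)
sample = {"image": image, "label": image.copy()}
augmented = random_flip(sample, rng)
```

Because the function only touches NumPy arrays, the result can be converted to tensors of any framework afterwards.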

YACS Configuration

We use the Yet Another Configuration System (YACS) library to manage the settings and hyperparameters in model training and inference. The configuration files for tutorial examples can be found here. All available configuration options can be found at connectomics/config/defaults.py. Please note that the default value of several options is None, which is only supported from YACS v0.1.8 onward.

Segmentation Models

We provide several encoder-decoder architectures, which are customized 3D UNet and Feature Pyramid Network (FPN) models with various blocks and backbones. These models can be applied to both semantic segmentation and bottom-up instance segmentation of 3D image stacks, and can be constructed specifically for isotropic or anisotropic datasets. Please check the documentation for more details.

Acknowledgement

This project is built upon numerous previous projects. In particular, we'd like to thank the contributors of the following GitHub repositories:

License

This project is licensed under the MIT License and the copyright belongs to all PyTorch Connectomics contributors - see the LICENSE file for details.

Citation

If you find PyTorch Connectomics (PyTC) useful in your research, please cite:

@misc{lin2019pytorchconnectomics,
  author =       {Zudi Lin and Donglai Wei},
  title =        {PyTorch Connectomics},
  howpublished = {\url{https://github.com/zudi-lin/pytorch_connectomics}},
  year =         {2019}
}
Comments
  • How to merge MitoEM output files?

    I ran your MitoEM-R-A.yaml config for the MitoEM challenge, but many H5 files are produced at inference time. How can I merge them? The H5 file list is as follows: image

    good first issue 
    opened by Chenliang-Gu 22
  • RuntimeError: Expected one of cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu device type at start of device string: train in configs/CREMI-Synaptic-Cleft.yaml

    Hi!

    While running the tutorial on Synaptic Cleft Segmentation (https://zudi-lin.github.io/pytorch_connectomics/build/html/tutorials/cremi.html), I encountered the following error:

    Traceback (most recent call last):
      File "pytorch_connectomics/scripts/main.py", line 72, in <module>
        main()
      File "pytorch_connectomics/scripts/main.py", line 65, in main
        trainer = Trainer(cfg, mode, args.checkpoint, device)
      File "/n/home11/kguliani/pytorch_connectomics/connectomics/engine/trainer.py", line 29, in __init__
        self.model = build_model(self.cfg, self.device)
      File "/n/home11/kguliani/pytorch_connectomics/connectomics/model/__init__.py", line 27, in build_model
        model = model.to(device)
      File "/n/home11/kguliani/.conda/envs/py3_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 431, in to
        device, dtype, non_blocking, convert_to_format = torch._C._nn._parse_to(*args, **kwargs)
    RuntimeError: Expected one of cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu device type at start of device string: train
    

    Steps to reproduce the error in the shell-

    $ srun --pty -p cox -t 2-00:00 --mem 16000 -n 1 --gres=gpu:4 /bin/bash
    $ module load cuda/9.2.88-fasrc01 cudnn/7.1.4-fasrc01
    $ module load cuda/9.2.88-fasrc01
    $ module load Anaconda/2019.10

    $ source activate py3_torch

    $ PATH=/usr/local/cuda/bin:$PATH
    $ echo $PATH
    $ CPATH=/usr/local/cuda/include:$CPATH
    $ echo $CPATH

    $ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -u pytorch_connectomics/scripts/main.py --config-file pytorch_connectomics/configs/CREMI-Synaptic-Cleft.yaml
    

    Additionally,

    $ python -c 'import torch; print(torch.version.cuda)'

    9.2

    $ nvcc --version

    9.2

    @zudi-lin Have a look pls

    Thanks!
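The error message shows that the string 'train' reached `model.to(device)`, so a mode name ended up where a device was expected. One plausible cause (an assumption, not confirmed in this thread) is that the argument order of the `Trainer(...)` call in the script does not match the constructor of the installed package version. A minimal sketch of how early validation and keyword arguments guard against such a mismatch; `TrainerSketch` and its parameters are hypothetical stand-ins, not the toolbox's actual class:

```python
VALID_DEVICE_TYPES = ("cpu", "cuda")  # subset of the types listed in the error


class TrainerSketch:
    """Hypothetical stand-in for a trainer; not the toolbox's real class."""

    def __init__(self, cfg, device, mode="train", checkpoint=None):
        # Validate early so a swapped argument fails with a clear message.
        device_type = str(device).split(":")[0]
        if device_type not in VALID_DEVICE_TYPES:
            raise ValueError(f"unexpected device {device!r}; check argument order")
        self.cfg = cfg
        self.device = device
        self.mode = mode
        self.checkpoint = checkpoint


# Positional call written for a different signature: 'train' lands in `device`.
try:
    TrainerSketch({}, "train", None, "cuda")
    swapped_ok = True
except ValueError:
    swapped_ok = False

# Keyword arguments are unaffected by the order of the constructor parameters.
trainer = TrainerSketch(cfg={}, device="cuda:0", mode="train", checkpoint=None)
```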

    good first issue 
    opened by KeeratKG 6
  • Keyerror 'seg' in demo 'segmentation.ipynb'

    Describe the bug: Following the demo 'segmentation.ipynb' raises KeyError: 'seg' in paragraph 4.

    Screenshots: Screenshot from 2019-07-02 08-47-52

    System specifications:

    • Operating system: Ubuntu 18.04LTS
    • CUDA version: 10.0
    • python version: 3.6
    • pytorch version: 1.1.0
    • Anything else that seems relevant: zwatershed version:1e8528c commit
    opened by HoraceKem 5
  • Model Input Size and Inference Stride Size in NucMM

    In configs/NucMM/NucMM-Mouse-Base.yaml, MODEL.INPUT_SIZE is [33, 97, 97] and INFERENCE.STRIDE is [26, 128, 128]. Doesn't this gap between the model input size and the inference stride leave some pixels uncovered during the test stage?
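The question can be checked with simple arithmetic. The sketch below assumes a plain sliding-window scheme in which inference windows have the model input size and consecutive windows start one stride apart. That is an assumption for illustration only, since the toolbox may use a different window size at inference time. `uncovered_gap` and `overlap` are hypothetical helpers.

```python
def uncovered_gap(window, stride):
    """Per-axis count of voxels skipped between consecutive windows.

    A positive value means consecutive windows do not touch; zero means
    they touch or overlap.
    """
    return [max(0, s - w) for w, s in zip(window, stride)]


def overlap(window, stride):
    """Per-axis count of voxels shared by consecutive windows."""
    return [max(0, w - s) for w, s in zip(window, stride)]


window = [33, 97, 97]     # MODEL.INPUT_SIZE from the question (z, y, x)
stride = [26, 128, 128]   # INFERENCE.STRIDE from the question

gaps = uncovered_gap(window, stride)
overlaps = overlap(window, stride)
print(gaps, overlaps)
```

Under that assumption, consecutive windows overlap by 7 voxels along z but leave 31 voxels uncovered along y and x, so full coverage would require an inference window at least as large as the stride on every axis.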

    opened by gitdxj 2
  • Data loading bug for single-channel label

    Hi,

    When I load the MitoEM dataset, I get the error "IndexError: index 1 is out of bounds for axis 2 with size 1".

    data_io.py

    def vast2Seg(seg):
        # convert to 24 bits
        if seg.ndim==2:
            return seg
        else: #vast: rgb
            return seg[:,:,0].astype(np.uint32)*65536+seg[:,:,1].astype(np.uint32)*256+seg[:,:,2].astype(np.uint32)  # error!!
    

    I find that the shape of "seg" is (2859, 2859, 1), because the labels of the MitoEM dataset are grayscale images. It seems the case where the label is a grayscale image has not been handled?
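One way to handle this case is sketched below, under the assumption that a trailing channel axis of size 1 simply holds the label IDs directly. `vast_to_seg` is a hypothetical variant of the function quoted above, not the fix that was applied in the repository.

```python
import numpy as np


def vast_to_seg(seg):
    """Convert a label image to a 2D array of segment IDs.

    Handles 2D arrays, single-channel (H, W, 1) arrays, and RGB-encoded
    (H, W, 3) arrays where the ID is packed into 24 bits.
    """
    if seg.ndim == 2:
        return seg
    if seg.shape[-1] == 1:
        # Grayscale label with an explicit channel axis: drop that axis.
        return np.squeeze(seg, axis=-1)
    # RGB-encoded label: combine the three 8-bit channels into one ID.
    rgb = seg.astype(np.uint32)
    return rgb[..., 0] * 65536 + rgb[..., 1] * 256 + rgb[..., 2]


gray = np.zeros((4, 4, 1), dtype=np.uint8)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [1, 2, 3]
gray_ids = vast_to_seg(gray)
rgb_ids = vast_to_seg(rgb)
```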

    opened by Limingxing00 2
  • Data augmentation documentation link is not working.

    The following link for the documentation of data augmentation does not work: https://zudi-lin.github.io/pytorch_connectomics/build/html/modules/augmentation.html

    opened by atul-77 1
  • GPU underutilization

    Hi,

    Thank you for your open-source awesome work!

    I have run into a GPU underutilization problem. The code runs successfully, but when I use 4 Titan Xp GPUs, only 2 are used, and when I use 8 GPUs, only 3 are in use.

    | ID | Name | Serial | UUID || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
    | 0 | TITAN Xp | 0321118041854 | GPU-e47e3aa6-63e6-cccc-9575-740c0932425a || 60C | 0% | 52% || 12196MB | 6306MB | 5890MB || Disabled | Disabled |
    | 1 | TITAN Xp | 0321118043078 | GPU-99bacaab-e6a7-68a4-8f91-631df2578104 || 74C | 0% | 49% || 12196MB | 6024MB | 6172MB || Disabled | Disabled |
    | 2 | TITAN Xp | 0321118040179 | GPU-73abffcb-a391-bf5d-3095-0271e9919f8c || 70C | 0% | 49% || 12196MB | 6024MB | 6172MB || Disabled | Disabled |
    | 3 | TITAN Xp | 0321118042097 | GPU-ce8fed4b-4882-01e6-73f8-03019d2d1b5e || 24C | 0% | 0% || 12196MB | 11MB | 12185MB || Disabled | Disabled |
    | 4 | TITAN Xp | 0321118040143 | GPU-f9d9e962-3254-456d-47a8-c5da5f13551c || 29C | 0% | 0% || 12196MB | 11MB | 12185MB || Disabled | Disabled |
    | 5 | TITAN Xp | 0321118042010 | GPU-4130bd87-b82b-f901-6953-d925dc5fc039 || 30C | 0% | 0% || 12196MB | 11MB | 12185MB || Disabled | Disabled |
    | 6 | TITAN Xp | 0321118040854 | GPU-314d4746-9237-14e0-071b-cc2ee9c3dac6 || 27C | 0% | 0% || 12196MB | 11MB | 12185MB || Disabled | Disabled |
    | 7 | TITAN Xp | 0321118042171 | GPU-d9d858b8-721f-99a0-3da5-90b52bb2d78b || 27C | 0% | 0% || 12196MB | 11MB | 12185MB || Disabled | Disabled |

    System

    • ubuntu
    • pytorch 1.1
    • cuda 9.0

    (Due to hardware limitations, I can only use this version.) Could you help me?

    opened by Limingxing00 1
  • the sample of 'im_train.json' in the yaml

    hi,

    Thank you for the impressive work! I have a question about the format that the JSON file should follow.

    Could you give an official sample of it?

    opened by Limingxing00 1
  • GPU related error when using CPU only (GPUutil related)

    After 49 iterations, the model always stops training and runs into this error. I am training without CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

    Traceback (most recent call last):
      File "pytorch_connectomics/scripts/main.py", line 67, in <module>
        main()
      File "pytorch_connectomics/scripts/main.py", line 62, in main
        trainer.train()
      File "/n/home00/nwendt/zebrafish/pytorch_connectomics/connectomics/engine/trainer.py", line 92, in train
        GPUtil.showUtilization(all=True)
      File "/n/home00/nwendt/anaconda3/envs/py3_torch/lib/python3.7/site-packages/GPUtil/GPUtil.py", line 210, in showUtilization
        GPUs = getGPUs()
      File "/n/home00/nwendt/anaconda3/envs/py3_torch/lib/python3.7/site-packages/GPUtil/GPUtil.py", line 102, in getGPUs
        deviceIds = int(vals[i])
    ValueError: invalid literal for int() with base 10: 'No devices were found'
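The traceback shows the message 'No devices were found' being passed to `int()`. A defensive approach is to treat any non-numeric device field as "no GPUs" and to skip the utilization report in that case. The sketch below illustrates the idea on plain strings. `parse_device_ids` and `maybe_report_gpus` are hypothetical helpers, the comma-separated input format is an assumption, and none of this is GPUtil's actual code.

```python
def parse_device_ids(query_output):
    """Return GPU indices from CSV-like query output, or [] if none are listed.

    Each line is assumed to start with an integer device index followed by a comma.
    """
    ids = []
    for line in query_output.strip().splitlines():
        first_field = line.split(",")[0].strip()
        if not first_field.isdigit():
            # For example the message 'No devices were found' on a CPU-only node.
            return []
        ids.append(int(first_field))
    return ids


def maybe_report_gpus(query_output, report):
    """Call `report` only when at least one GPU is present."""
    ids = parse_device_ids(query_output)
    if ids:
        report(ids)
    return ids


reports = []
maybe_report_gpus("No devices were found", reports.append)
maybe_report_gpus("0, TITAN Xp\n1, TITAN Xp", reports.append)
```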
    
    opened by ygCoconut 1
  • inference for multiple input volumes: better to load the volume when needed

    For inference, the current logic loads all volumes first and then runs inference on each volume.

    To save memory, it's better to load each volume only when it's needed.
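The proposed change can be sketched with a generator that loads one volume at a time. The names `iter_volumes`, `run_inference`, `load_fn` and `predict_fn` are hypothetical and only illustrate the pattern; they are not functions of the toolbox.

```python
def iter_volumes(paths, load_fn):
    """Yield (path, volume) pairs, loading each volume only when requested."""
    for path in paths:
        yield path, load_fn(path)


def run_inference(paths, load_fn, predict_fn):
    """Process volumes one at a time so only one is held in memory."""
    results = {}
    for path, volume in iter_volumes(paths, load_fn):
        # `volume` is rebound on the next iteration, so the previous one can be freed.
        results[path] = predict_fn(volume)
    return results


# Toy stand-ins: "loading" builds a small list, "prediction" sums it.
loaded = []


def fake_load(path):
    loaded.append(path)
    return [len(path)] * 4


out = run_inference(["a.h5", "bb.h5"], fake_load, sum)
```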

    opened by donglaiw 1
  • test_augmentor does not support 2D models

    Need to extend test_augmentor to also support 2D models.

    • data/augmentation/test_augmentor.py: add an attribute for the TestAugmentor object for 2D or 3D model
    • engine/trainer.py: specify whether it's a 2D or 3D model
    opened by donglaiw 1
  • Pre-train model and Evaluation Code for Neuron Segmentation tutorial

    Hi, is it possible to add the evaluation code for the Neuron Segmentation tutorial, and also to provide pre-trained weights for the U-Net for neuron segmentation?

    opened by AlexandreDiPiazza 1
  • Added missing bracket

    In file connectomics/data/utils/data_segmentation.py:

    • Added missing bracket

    • Added type casting to int64

    Indexing requires int or boolean values, but the mask used for indexing was a float array with whole-number values (like 124432.), since it was originally of type float32.
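The behavior described here can be reproduced in a few lines. The arrays below are made-up toy data, not taken from data_segmentation.py:

```python
import numpy as np

lut = np.arange(10) * 100                            # lookup table indexed by ID
ids = np.array([1.0, 3.0, 7.0], dtype=np.float32)    # whole numbers stored as float32

try:
    lut[ids]                                         # float arrays are not valid indices
    float_index_ok = True
except IndexError:
    float_index_ok = False

picked = lut[ids.astype(np.int64)]                   # cast first, then index
```

Note that float32 represents integers exactly only up to 2**24, so IDs larger than that may already have been rounded before the cast.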

    opened by Lauenburg 0
  • Bug: AttributeError in VolumeDataset when not providing a label list

    Steps to reproduce

    Use VolumeDataset without specifying a list of labels.

    In line 48, the label list is initialized with None.

    In line 88, we set self.label_vol_ratio = self.sample_label_size / self.sample_volume_size if self.label is not None.

    However, in line 232 self.label_vol_ratio is referenced even if the label list was not initialized and consequently self.label_vol_ratio was never defined.

    Current behavior (bug)

    Raises AttributeError

    Expected behavior (correct)

    Should be able to process a data volume without providing a list of labels since label has a default value of None.
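A common way to avoid this kind of AttributeError is to define the attribute unconditionally and check it for None before use. The class below is a heavily reduced, hypothetical stand-in (`VolumeDatasetSketch`, `label_position` and the scalar sizes are assumptions for illustration), not the actual VolumeDataset:

```python
class VolumeDatasetSketch:
    """Hypothetical, heavily reduced stand-in for the dataset class."""

    def __init__(self, volume, label=None, sample_volume_size=8, sample_label_size=8):
        self.volume = volume
        self.label = label
        # Always define the attribute so later code can test it safely.
        self.label_vol_ratio = None
        if self.label is not None:
            self.label_vol_ratio = sample_label_size / sample_volume_size

    def label_position(self, pos):
        """Map a volume coordinate to label space; None when unlabeled."""
        if self.label_vol_ratio is None:
            return None
        return int(pos * self.label_vol_ratio)


unlabeled = VolumeDatasetSketch(volume=[0])
labeled = VolumeDatasetSketch(volume=[0], label=[0],
                              sample_volume_size=8, sample_label_size=4)
```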

    /label ~Bug

    enhancement 
    opened by Lauenburg 2
  • Very slow label smoothing with large input size

    Thank you very much for your contributions! :)

    I'm implementing MALA's network in this pipeline. It saves memory by using convolution without padding, therefore can afford a larger input size during training (for example [64, 268, 268] with batch size 4 on a single GPU).

    However, the data loading time became unaffordable under this input size, where 90% of the time is spent on data-loading. I found that this is caused by SMOOTH, the post-process of the label after augmentation.

    I wonder if you are aware of this? Will discarding smooth influence training much?

    Merry Christmas :)

    opened by Levishery 1
  • A Problem

    Hello, I'm interested in the pytorch_connectomics. So I try to learn it, however, I encounter a problem. In the notebook, the data you gave is some pictures in png format, but the code seems to require the .h5 format, which makes me fail when I try to run the code. Could you do me a favor? Thank you!

    opened by Crystalqijing 1
Owner
Zudi Lin
CS Ph.D. student at Harvard