Tools for computational pathology

Overview

A toolkit for computational pathology and machine learning.

View documentation

Please cite our paper

Installation

There are several ways to install PathML:

  1. pip install (recommended for users)
  2. clone repo to local machine and install from source (recommended for developers/contributors)

Options (1) and (2) require that you first install all external dependencies:

  • openslide
  • JDK 8

We recommend using conda for environment management. Download Miniconda here

Note: these instructions are for Linux. Commands may be different for other platforms.

Installation option 1: pip install

Create conda environment

conda create --name pathml python=3.8
conda activate pathml

Install external dependencies (Linux) with Apt

sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev

Install external dependencies (MacOS) with Brew

brew install openslide

Install OpenJDK 8

conda install openjdk==8.0.152

Optionally install CUDA (instructions here)

Install PathML

pip install pathml

Installation option 2: clone repo and install from source

Clone repo

git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml

Create conda environment

conda env create -f environment.yml
conda activate pathml

Optionally install CUDA (instructions here)

Install PathML:

pip install -e .

CUDA

To use GPU acceleration for model training or other tasks, you must install CUDA. This guide should work, but for the most up-to-date instructions, refer to the official PyTorch installation instructions.

Check the version of CUDA:

nvidia-smi

Install correct version of cudatoolkit:

# update this command with your CUDA version number
conda install cudatoolkit=11.0

After installing PyTorch, optionally verify successful PyTorch installation with CUDA support:

python -c "import torch; print(torch.cuda.is_available())"
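
The same check can be used inside a script to choose a device and fall back to the CPU. This is a minimal sketch; the helper name `pick_device` is hypothetical and not part of PathML.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU is usable
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

Tensors and models can then be moved with `.to(device)` without hard-coding the device name.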

Using with Jupyter

Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:

Set JAVA_HOME environment variable

PathML relies on Java to enable support for reading a wide range of file formats. Before using PathML in Jupyter, you may need to manually set the JAVA_HOME environment variable specifying the path to Java. To do so:

  1. Get the path to Java by running echo $JAVA_HOME in the terminal in your pathml conda environment (outside of Jupyter)
  2. Set that path as the JAVA_HOME environment variable in Jupyter:
    import os
    os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
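
If the terminal step is inconvenient, the value can often be derived from the running interpreter instead. The sketch below assumes OpenJDK was installed with conda into the same environment as the Jupyter kernel, so that the environment prefix (`sys.prefix`) is a valid JAVA_HOME, as in the example path above. `ensure_java_home` is a hypothetical helper, not part of PathML.

```python
import os
import sys


def ensure_java_home(fallback=sys.prefix):
    """Set JAVA_HOME to `fallback` unless it is already defined, and return it."""
    # setdefault leaves an existing JAVA_HOME untouched
    return os.environ.setdefault("JAVA_HOME", fallback)


print(ensure_java_home())
```

If the printed path does not contain a Java installation, set the variable explicitly as in step 2.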
    

Register PathML as an IPython kernel

conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml

This makes PathML available as a kernel in JupyterLab or Jupyter Notebook.

Contributing

PathML is an open source project. Consider contributing to benefit the entire community!

There are many ways to contribute to PathML, including:

  • Submitting bug reports
  • Submitting feature requests
  • Writing documentation and examples
  • Fixing bugs
  • Writing code for new features
  • Sharing workflows
  • Sharing trained model parameters
  • Sharing PathML with colleagues, students, etc.

See contributing for more details.

License

The GNU GPL v2 version of PathML is made available via Open Source licensing. The user is free to use, modify, and distribute under the terms of the GNU General Public License version 2.

Commercial license options are also available.

Contact

Questions? Comments? Suggestions? Get in touch!

[email protected]

Comments
  • Improve performance

    Currently, writing to h5 is the primary performance bottleneck when running a pipeline (see profile here).

    Perhaps by refactoring our h5 integration, we can boost performance. For example, maybe we should store tiles in separate groups instead of in one big array. This would potentially let us write in parallel and also make it trivial to support overlapping tiles (#223).

    Some work on this was being tracked in #200 but I am creating this issue so that we can discuss here instead of on the pull request

    enhancement 
    opened by jacob-rosenthal 16
  • Warnings associated with circulating a keras model among dask workers

    We are getting a set of warnings around the loading of a saved keras checkpoint file. I think they contribute to the subsequent error in https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867 and to the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185.

    Here is the warning we get, which we get when we run the SegmentMIF function:

    WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.

    We believe that the keras saved model is being passed to the dask workers in an unclean state (existing locks not released, etc.), causing the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185 and, eventually, the error in https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867.

    To Reproduce Here is our pipeline. I cannot share the data for regulatory reasons.

    pipeline = Pipeline([
        CollapseRunsVectra(),    
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=2, image_resolution=0.5, 
                   gpu=False, postprocess_kwargs_whole_cell=None, 
                   postprocess_kwrags_nuclear=None),
        QuantifyMIF('nuclear_segmentation')   
    ])
    
    bug 
    opened by surya-narayanan 13
  • Docker ci

    Add a Dockerfile that builds a working environment for PathML and starts a JupyterLab instance in the container, so that users can connect to it and get up and running quickly. Also add a GitHub Actions workflow to build the image and publish it to Docker Hub whenever we create a new release.

    This will close #145

    opened by jacob-rosenthal 11
  • Unable to open tile object (object 'array' doesn't exist)

    Describe the bug Unable to access tile array from TileDataset.__getitem__() KeyError: "Unable to open object (object 'array' doesn't exist)"

    To Reproduce Traceback:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    /tmp/ipykernel_4463/806735975.py in <module>
    ----> 1 tile_dataset.__getitem__(0)
    
    ~/pathml/pathml/ml/dataset.py in __getitem__(self, ix)
         54         ### this part copied from h5manager.get_tile()
         55         tile_image = self.h5["tiles"][str(k)]["array"][:]
    ---> 56 
         57         # get corresponding masks if there are masks
         58         if "masks" in self.h5["tiles"][str(k)].keys():
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    /opt/conda/envs/wtf/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
        286                 raise ValueError("Invalid HDF5 object reference")
        287         else:
    --> 288             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
        289 
        290         otype = h5i.get_type(oid)
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/h5o.pyx in h5py.h5o.open()
    
    KeyError: "Unable to open object (object 'array' doesn't exist)"
    

    Expected behavior Should be able to access the tile array object. I created the h5 file with the following code:

    slide_name = "<path to slide>"  # placeholder: replace with the path to the slide
    slide = SlideData(slide_name, backend="bioformats", slide_type=types.Vectra)
    slide.write(f'/parent_directory/{slide.name}.h5')
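
When debugging an error like this, it can help to list what each tile group actually contains before indexing it. The sketch below uses plain nested dicts as a stand-in for the h5 file. The layout `tiles/<key>/array` is taken from the traceback above, while `tiles_missing_array` is a hypothetical helper. h5py groups expose the same `keys()` and item access, so the function should also accept `h5["tiles"]`, though that is not tested here.

```python
def tiles_missing_array(tiles):
    """Return the keys of tile groups that have no "array" entry."""
    return [key for key in tiles.keys() if "array" not in tiles[key].keys()]


# nested dicts standing in for h5["tiles"]
fake_tiles = {
    "(0, 0)": {"array": [[0, 1], [2, 3]], "masks": {}},
    "(0, 500)": {"masks": {}},  # tile written without image data
}
print(tiles_missing_array(fake_tiles))
```

An empty result means every tile group has image data; a non-empty one points to the tiles that were written without it.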
    
    bug 
    opened by surya-narayanan 11
  • Adding to h5 file

    Is it possible to re-run a slide with a different pipeline and add to the h5 file, without re-doing tiling? Happy to provide an example, if that would be helpful.

    opened by surya-narayanan 9
  • Resolving dependencies between PathML and Deepcell

    Describe the bug When we run pipelines for multiparametric images we often want to include models from deepcell https://github.com/vanvalenlab/deepcell-tf (especially for the SegmentMIF transform). It is difficult for users to solve the environment since installing deepcell downgrades packages like numpy to incompatible versions. This has caused installation problems for @MohamedOmar2020 and other internal users

    To Reproduce

    pipe = Pipeline(
        [
            CollapseRunsVectra(),
            SegmentMIF(
                model="mesmer",
                nuclear_channel=0,
                cytoplasm_channel=7,
                image_resolution=0.5,
            ),
            QuantifyMIF(segmentation_mask="cell_segmentation"),
        ]
    )
    dataset.run(pipe)
    

    Expected behavior We would expect this to run, but following the default installation instructions (option 1, pip install) and then running pip install deepcell results in a series of numpy errors when we attempt to run the pipeline.

    Working Solution These dependency problems are resolved (at least to the extent that the above pipeline can run) by upgrading numpy after installing deepcell, as follows:

    conda create --name pathml python=3.8
    conda activate pathml
    sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev
    conda install openjdk==8.0.152
    pip install pathml
    pip install deepcell
    pip install --upgrade numpy
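
To confirm which versions ended up installed after these steps, the standard library can report them. This is a minimal sketch; `installed_versions` is a hypothetical helper and the package list is only an example.

```python
from importlib import metadata


def installed_versions(packages=("pathml", "deepcell", "numpy")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing the output before and after `pip install deepcell` shows whether numpy was downgraded.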
    

    The question is: should we include this in our installation instructions for users who want to use multiparametric pipelines? Should we create a docker container for multiparametric pipelines? Should we remove our dependency on deepcell and try to wrap the model more directly in PathML (or train our own)?

    enhancement 
    opened by ryanccarelli 8
  • Weird segmentation results

    Hello, I have a problem with the segmentation resulting from the mesmer model. It looks like the model is not identifying cells properly since many cells are too large with too many nuclei. This is the code used to process the image:

    pipe = Pipeline([
        CollapseRunsVectra(),
        SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=7, image_resolution=0.5),
        QuantifyMIF(segmentation_mask='cell_segmentation')
    ])

    slidedata.run(pipe, distributed=False, tile_size=(12784, 13234), tile_pad=False, overwrite_existing_tiles=True)

    img = slidedata.tiles[3].image[10000:10500, 12000:12500, :]
    nuc_mask = slidedata.tiles[3].masks['nuclear_segmentation'][10000:10500, 12000:12500, :]
    cell_mask = slidedata.tiles[3].masks['cell_segmentation'][10000:10500, 12000:12500, :]

    img_fiji = np.expand_dims(img, axis=0)
    nuc_cytoplasm = np.stack((img_fiji[:, :, :, 0], img_fiji[:, :, :, 7]), axis=-1)
    rgb_image = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
    cell_segmentation_predictions = np.expand_dims(cell_mask, axis=0)
    overlay_cell = make_outline_overlay(rgb_data=rgb_image, predictions=cell_segmentation_predictions)

    This is what it looks like when I overlay the segmentation on the original image in Fiji: OverlaySeg1

    I loaded a small part of the original image in Fiji, adjusted the brightness/contrast, and then used the mesmer model for segmentation (using deepcell directly, not PathML), and the segmentation seems good. This is what it looks like: Screenshot 2021-07-19 at 1 39 19 PM

    Is it right to assume that the bad segmentation shown in the first image has something to do with the brightness/contrast of the raw image? Any ideas how to fix this?
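
One way to test the brightness/contrast hypothesis is to rescale each channel before segmentation and compare the resulting masks. The sketch below is an assumption, not a confirmed fix: it clips each channel to its 1st and 99th percentiles and rescales it to [0, 1]. `rescale_channels` and the percentile choices are hypothetical, and the input is synthetic.

```python
import numpy as np


def rescale_channels(img, low=1.0, high=99.0):
    """Percentile-clip and rescale each channel of an (H, W, C) image to [0, 1]."""
    img = img.astype(np.float32)
    out = np.zeros_like(img)
    for c in range(img.shape[-1]):
        lo, hi = np.percentile(img[..., c], [low, high])
        if hi > lo:  # constant channels are left as zeros
            out[..., c] = np.clip((img[..., c] - lo) / (hi - lo), 0.0, 1.0)
    return out


rng = np.random.default_rng(0)
dim_image = rng.integers(0, 50, size=(64, 64, 2)).astype(np.uint16)  # synthetic data
stretched = rescale_channels(dim_image)
print(stretched.min(), stretched.max())
```

If the masks improve after rescaling, that would support the idea that the raw intensity range is the cause.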

    Thanks in advance

    opened by MohamedOmar2020 8
  • indices should be either on cpu or on the same device as the indexed tensor (cpu)

    Describe the bug RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

    To Reproduce

    n_classes_pannuke = 6

    # load the model
    hovernet = HoVerNet(n_classes=n_classes_pannuke)

    # wrap model to use multi-GPU
    hovernet = torch.nn.DataParallel(hovernet)

    # set up optimizer
    opt = torch.optim.Adam(hovernet.parameters(), lr = 1e-4)
    # learning rate scheduler to reduce LR by factor of 10 each 25 epochs
    scheduler = StepLR(opt, step_size=25, gamma=0.1)

    # send model to GPU
    hovernet.to(device)

    n_epochs = 50

    # print performance metrics every n epochs
    print_every_n_epochs = None

    # evaluating performance on a random subset of validation mini-batches
    # this saves time instead of evaluating on the entire validation set
    n_minibatch_valid = 50

    epoch_train_losses = {}
    epoch_valid_losses = {}
    epoch_train_dice = {}
    epoch_valid_dice = {}

    best_epoch = 0

    # main training loop
    for i in tqdm(range(n_epochs)):
        minibatch_train_losses = []
        minibatch_train_dice = []

        ### put model in training mode
        hovernet.train()

        for data in train_dataloader:
            ### send the data to the GPU
            images = data[0].float().to(device)
            masks = data[1].to(device)
            hv = data[2].float().to(device)
            tissue_type = data[3]

            ### zero out gradient
            opt.zero_grad()

            ### forward pass
            outputs = hovernet(images)

            ### compute loss
            loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

            ### track loss
            minibatch_train_losses.append(loss.item())

            ### also track dice score to measure performance
            preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
            truth_binary = masks[:, -1, :, :] == 0
            dice = dice_score(preds_detection, truth_binary.cpu().numpy())
            minibatch_train_dice.append(dice)

            ### compute gradients
            loss.backward()

            ### step optimizer and scheduler
            opt.step()

        ### step LR scheduler
        scheduler.step()

        ### evaluate on random subset of validation data
        hovernet.eval()
        minibatch_valid_losses = []
        minibatch_valid_dice = []
        ### randomly choose minibatches for evaluating
        minibatch_ix = np.random.choice(range(len(valid_dataloader)), replace=False, size=n_minibatch_valid)
        with torch.no_grad():
            for j, data in enumerate(valid_dataloader):
                if j in minibatch_ix:
                    # send the data to the GPU
                    images = data[0].float().to(device)
                    masks = data[1].to(device)
                    hv = data[2].float().to(device)
                    tissue_type = data[3]

                    # forward pass
                    outputs = hovernet(images)

                    # compute loss
                    loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

                    # track loss
                    minibatch_valid_losses.append(loss.item())

                    # also track dice score to measure performance
                    preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
                    truth_binary = masks[:, -1, :, :] == 0
                    dice = dice_score(preds_detection, truth_binary.cpu().numpy())
                    minibatch_valid_dice.append(dice)

        ### average performance metrics over minibatches
        mean_train_loss = np.mean(minibatch_train_losses)
        mean_valid_loss = np.mean(minibatch_valid_losses)
        mean_train_dice = np.mean(minibatch_train_dice)
        mean_valid_dice = np.mean(minibatch_valid_dice)

        ### save the model with best performance
        if i != 0:
            if mean_valid_loss < min(epoch_valid_losses.values()):
                best_epoch = i
                torch.save(hovernet.state_dict(), f"hovernet_best_perf.pt")

        ### track performance over training epochs
        epoch_train_losses.update({i : mean_train_loss})
        epoch_valid_losses.update({i : mean_valid_loss})
        epoch_train_dice.update({i : mean_train_dice})
        epoch_valid_dice.update({i : mean_valid_dice})

        if print_every_n_epochs is not None:
            if i % print_every_n_epochs == print_every_n_epochs - 1:
                print(f"Epoch {i+1}/{n_epochs}:")
                print(f"\ttraining loss: {np.round(mean_train_loss, 4)}\tvalidation loss: {np.round(mean_valid_loss, 4)}")
                print(f"\ttraining dice: {np.round(mean_train_dice, 4)}\tvalidation dice: {np.round(mean_valid_dice, 4)}")

    # save fully trained model
    torch.save(hovernet.state_dict(), f"hovernet_fully_trained.pt")
    print(f"\nEpoch with best validation performance: {best_epoch}")

    Expected behavior Should start model training

    Screenshots image

    Additional context Has anyone else had this problem? I am running this on an HPC system with 4 GPUs, each with 16 GB of memory.

    bug 
    opened by luzy05111036 7
  • Issue with distributed processing

    Hello, thank you for fixing the distributed issue with the mesmer model. I am running the pipeline with the 'distributed = True' flag, but I am getting many warnings and errors. Additionally, the pipeline was supposed to return 145 tiles but it is returning only 3! This is part of the log message:

    def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
      def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
      warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    /Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
      def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_io/h5ad.py:64: FutureWarning: The force_dense argument is deprecated. Use as_dense instead.
      warn(
    /Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
      warnings.warn("Transforming to str index.", ImplicitModificationWarning)
    storing 'coords' as categorical
    storing 'slice' as categorical
    storing 'tile' as categorical
    **> 2021-08-10 00:00:05.176962: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
    distributed.worker - WARNING - Compute Failed
    Function: apply
    args: (Tile(coords=(1598, 6616), name=None, image shape: (1598, 1654, 2), slide_type=SlideType(stain=Fluor, platform=Vectra, tma=None, rgb=None, volumetric=None, time_series=None), labels=None, masks=None, counts=None))
    kwargs: {}
    Exception: OutOfRangeError()**

    That last error (bold text) is repeated many times.

    Thanks in advance

    bug 
    opened by MohamedOmar2020 7
  • Error installing owing to cached version of torch

    If one tries to install pathml after a previously failed installation attempt, one runs into the following error, which I think is due to using cached files. One suggested solution (for just torch) is to run pip --no-cache-dir install torchvision, but I don't know whether this will solve the issue, or how to integrate it into installing pathml as a whole without installing each dependency one by one.

    (pathml) jupyter@cuda11:~$ pip install pathml
    Collecting pathml
      Using cached pathml-2.0.4-py3-none-any.whl (83 kB)
    Collecting scipy
      Using cached scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
    Collecting python-bioformats>=4.0.0
      Using cached python_bioformats-4.0.5-py3-none-any.whl (41.4 MB)
    Collecting scikit-image
      Using cached scikit_image-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)
    Collecting scikit-learn
      Using cached scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
    Requirement already satisfied: pip in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (22.0.3)
    Collecting openslide-python
      Using cached openslide-python-1.1.2.tar.gz (316 kB)
      Preparing metadata (setup.py) ... done
    Collecting dask[distributed]
      Using cached dask-2022.1.1-py3-none-any.whl (1.1 MB)
    Collecting pandas
      Using cached pandas-1.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
    Collecting matplotlib
      Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
    Collecting anndata>=0.7.6
      Using cached anndata-0.7.8-py3-none-any.whl (91 kB)
    Requirement already satisfied: numpy>=1.16.4 in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (1.22.2)
    Collecting h5py
      Using cached h5py-3.6.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
    Collecting opencv-contrib-python
      Using cached opencv_contrib_python-4.5.5.62-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (66.6 MB)
    Collecting pydicom
      Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
    Collecting torch
    SystemError: deallocated bytearray object has exported buffers
    ERROR: Exception:
    Traceback (most recent call last):
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
        status = run_func(*args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
        return func(self, options, args)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 339, in run
        requirement_set = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
        result = self._result = resolver.resolve(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
        state = resolution.resolve(requirements, max_rounds=max_rounds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 373, in resolve
        failure_causes = self._attempt_to_pin_criterion(name)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 213, in _attempt_to_pin_criterion
        criteria = self._get_updated_criteria(candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 204, in _get_updated_criteria
        self._add_to_criteria(criteria, requirement, parent=candidate)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
        if not criterion.candidates:
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
        return bool(self._sequence)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
        return any(self)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
        return (c for c in iterator if id(c) not in self._incompatible_ids)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
        candidate = func()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
        self._link_candidate_cache[link] = LinkCandidate(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
        super().__init__(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
        self.dist = self._prepare()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
        dist = self._prepare_distribution()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
        return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
        return self._prepare_linked_requirement(req, parallel_builds)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
        local_file = unpack_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
        file = get_http_url(
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
        from_path, content_type = download(link, temp_dir.path)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 133, in __call__
        resp = _http_get_download(self._session, link)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 116, in _http_get_download
        resp = session.get(target_url, headers=HEADERS, stream=True)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 542, in get
        return self.request('GET', url, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/session.py", line 454, in request
        return super().request(method, url, *args, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 529, in request
        resp = self.send(prep, **send_kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 645, in send
        r = adapter.send(request, **kwargs)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/adapter.py", line 48, in send
        cached_response = self.controller.cached_request(request)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/controller.py", line 151, in cached_request
        resp = self.serializer.loads(request, cache_data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 95, in loads
        return getattr(self, "_loads_v{}".format(ver))(request, data)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 182, in _loads_v4
        cached = msgpack.loads(data, raw=False)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 128, in unpackb
        ret = unpacker._unpack()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
        ret[key] = self._unpack(EX_CONSTRUCT)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 546, in _unpack
        typ, n, obj = self._read_header()
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 488, in _read_header
        obj = self._read(n)
      File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 407, in _read
        ret = self._buffer[i : i + n]
    MemoryError
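
Because `--no-cache-dir` is a general pip option rather than a per-package one, passing it once to the top-level install should apply to every dependency pip resolves, torch included. The sketch below only builds the command. `no_cache_install_cmd` is a hypothetical helper, and whether bypassing the cache resolves this particular error has not been verified.

```python
import sys


def no_cache_install_cmd(package="pathml"):
    """Build a pip command that installs `package` without using pip's cache."""
    # running pip as a module ties it to the current interpreter/environment
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", package]


cmd = no_cache_install_cmd()
print(" ".join(cmd))
# to actually run it: import subprocess; subprocess.run(cmd, check=True)
```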
    
    
    bug 
    opened by surya-narayanan 6
  • Extracting tile returns multi dim output from H&E qptiff

    I think this is specific to my scanner. When I run the following line on my qptiff H&E svs file:

    region = wsi.slide.extract_region(location = (900, 800), size = (500, 500))

    I get a 5-dimensional array, which is incompatible with downstream analysis.

    Do you think it may be useful to run a np.squeeze right before returning the array in wsi.slide.extract_region?
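
As an illustration of the suggestion, `np.squeeze` drops every axis of length 1. The 5-D shape below is an assumption chosen for the example (two singleton axes around a 3-channel axis); the actual layout returned for a given file may differ.

```python
import numpy as np

# stand-in for the array returned by extract_region; the shape is assumed
region = np.zeros((500, 500, 1, 3, 1), dtype=np.uint8)

squeezed = np.squeeze(region)
print(region.shape, "->", squeezed.shape)

# squeezing only named axes avoids dropping a real axis that happens to have length 1
squeezed_explicit = np.squeeze(region, axis=(2, 4))
```

The explicit form is the safer choice inside a library function, since a blanket squeeze would also collapse, for example, a single-channel axis.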

    opened by surya-narayanan 6
  • How to implement HoVer-Net Model in TIAToolBox?

    opened by WilliWespe 0
  • How to train hovernet starting with semantic-level mask image?

    Is your feature request related to a problem? Please describe. I have large WSI data and multi-class jpeg masks, but I have been struggling to find a way to make them work with any hovernet implementation.

    Describe the solution you'd like I'd like to be able to feed in my large WSI data along with the jpeg masks and have tiling and training take place from there.

    Describe alternatives you've considered If instance masks are still needed, I can generate them with watershed. But I still don't know what format PathML would want (e.g. npy, mat, json, jpeg, etc.).

    Additional context

    Any help is highly appreciated.

    Thanks!

    enhancement 
    opened by OmarAshkar 3
  • Deepcell segmentation without cytoplasm channel

    I note that Deepcell provides both a segmentation model (nuclear channel only) and a mesmer model (nuclear and cytoplasm) (https://www.deepcell.org/predict). Our datasets do not have a single general cytoplasm marker that captures the cytoplasm of all cell types, as the mesmer model requires (e.g. tumour cells vs. immune vs. stromal cells). Can both nuclear_channel=DAPI and cytoplasm_channel=DAPI be used in the mesmer model? Or can SegmentMIF(model='segmentation') be supported? Thanks!

    enhancement 
    opened by jamesMo84 1
  • Allow SlideData to use existing h5path files

    As motivated by https://github.com/Dana-Farber-AIOS/pathml/issues/332 and https://github.com/Dana-Farber-AIOS/pathml/issues/300, this modifies SlideData to read and update Tiles from an existing h5path file instead of requiring each pipeline run to recreate all tiles from scratch.

    This includes #335 as many transforms (e.g. BoxBlur) require np.uint8 data instead of the default float16 saved to h5path files. I was also working off my load-data-in-workers branch because it had significant performance changes for my use cases. Sorry about the branching messiness, hopefully the changes will be clearer as other branches are merged into dev.

    This makes breaking changes to the SlideData API, namely replacing generate_tiles with get_tiles and moving the tile parameterization from run to the SlideData constructor.

    opened by tddough98 0
  • Load tiles in parallel on workers and add options to `TissueDetectionHE`

    Load tiles in parallel on workers and add options to `TissueDetectionHE`

    This contains two separate improvements

    • add drop_empty_tiles and keep_mask options to the TissueDetectionHE transform to bypass saving tiles with no detected H&E tissue and bypass saving masks
    • parallelize tile image loading by using dask.delayed to avoid loading images on the main thread

    The first part is both for convenience and performance. It's possible to generate all tiles and then filter out the empty tiles and remove masks before writing the h5path to disk, but that requires that all the tiles be added to the Tiles which takes IO time. If these tiles and masks are never saved even to in-memory objects, processing can finish faster.

    The second part is a core performance issue with distributed processing. I believe it's relevant to https://github.com/Dana-Farber-AIOS/pathml/issues/211 and https://github.com/Dana-Farber-AIOS/pathml/issues/299. When processing tiles, I've found that loading time >> processing time. Currently, the main thread loads the tile image data and scatters each loaded tile to the workers. This prevents any parallelism, as all but one worker are always waiting for the main thread to load data and send them a tile.

    Additionally, as all tiles have to be loaded on the main thread, the block that generates the futures

    for tile in self.generate_tiles(
        level=level,
        shape=tile_size,
        stride=tile_stride,
        pad=tile_pad,
        **kwargs,
    ):
        if not tile.slide_type:
            tile.slide_type = self.slide_type
        # explicitly scatter data, i.e. send the tile data out to the cluster before applying the pipeline
        # according to dask, this can reduce scheduler burden and keep data on workers
        big_future = client.scatter(tile)
        f = client.submit(pipeline.apply, big_future)
        processed_tile_futures.append(f)
    

    has to load all tiles and send them all to workers before ANY tile can be added to the Tiles and the memory can be freed in the next block

    # as tiles are processed, add them to h5
    for future, tile in dask.distributed.as_completed(
        processed_tile_futures, with_results=True
    ):
        self.tiles.add(tile)
    

    causing the dramatic memory leaks seen in https://github.com/Dana-Farber-AIOS/pathml/issues/211.

    I've used dask.delayed to prevent reading from the input file until the image is accessed on the worker. The code that accesses the file and loads the image can now be run by each worker in parallel. To preserve the parallelism, we have to take care not to access and load tile.image on the main thread before loading it on the worker, or to at least wrap accesses in dask.delayed as in SlideData.generate_tiles.
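
    The idea of deferring the read until a worker touches the image can be illustrated without dask. The sketch below is only an analogy: it uses a standard-library thread pool as a stand-in for the dask cluster, and `LazyTile`, `fake_read`, and `process` are hypothetical names, not PathML code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_read(coords):
    """Stand-in for an expensive slide read; returns dummy pixel data."""
    return [coords[0] + coords[1]] * 4


class LazyTile:
    """Hypothetical tile that stores only coordinates until .image is accessed."""

    def __init__(self, coords):
        self.coords = coords
        self._image = None
        self.loaded_on = None  # name of the thread that performed the read

    @property
    def image(self):
        # The read happens on whichever thread asks for the image first.
        if self._image is None:
            self.loaded_on = threading.current_thread().name
            self._image = fake_read(self.coords)
        return self._image


def process(tile):
    """Runs on a worker: the first access to tile.image triggers the read there."""
    return sum(tile.image), tile.loaded_on


# The main thread only creates cheap descriptors; no pixel data is read here.
tiles = [LazyTile((i, j)) for i in range(2) for j in range(2)]

with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
    results = list(pool.map(process, tiles))
```

    Each entry in `results` records which thread performed the read. Accessing `tile.image` on the main thread before submitting would move the read back there, which is the pitfall described above.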

    I had some issues with the backends not being picklable. The Backend has to be sent to each worker so it has access to the code that interfaces with the filesystem. I changed Backend filelike attributes to be lazily evaluated with the @property decorator.
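
    As a rough illustration of why lazy evaluation helps with pickling, the sketch below keeps only the file path on the object and opens the handle on first access. `FileBackend` is a hypothetical class written for this example, and the `__getstate__` hook is an extra safeguard assumed here rather than something taken from the PR.

```python
import os
import pickle
import tempfile


class FileBackend:
    """Hypothetical backend that stores only the path and opens the file on demand."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily; never pickled

    @property
    def handle(self):
        # Open the file the first time it is needed, in whichever process asks.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Drop the open handle so the object can be pickled and sent to a worker.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


# Write a small file to stand in for a slide.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(b"slide-bytes")
    slide_path = f.name

backend = FileBackend(slide_path)
first_read = backend.handle.read()           # the handle is now open on this side

clone = pickle.loads(pickle.dumps(backend))  # works because the handle is dropped
clone_read = clone.handle.read()             # the copy reopens the file itself

backend.close()
clone.close()
os.remove(slide_path)
```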

    opened by tddough98 4
  • Parameterize dtype for h5path with `SlideData` constructor

    Parameterize dtype for h5path with `SlideData` constructor

    Currently, PathML stores all images with float16, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is a dtype parameter in the SlideData constructor, but it's only used to assist the BioFormatsBackend in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.
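
    The storage and precision trade-off can be checked with the standard library alone. This is a small sketch, independent of PathML and h5py: `roundtrip_float16` is a helper defined here for illustration, and it uses the `struct` module's half-precision format code `e`.

```python
import struct


def roundtrip_float16(value):
    """Encode a number as IEEE 754 half precision and decode it again."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


# Every 8-bit pixel value is exactly representable in float16,
# but each one takes 2 bytes instead of 1.
uint8_exact = all(roundtrip_float16(v) == v for v in range(256))
bytes_per_uint8 = struct.calcsize("<B")
bytes_per_float16 = struct.calcsize("<e")

# Larger 16-bit pixel values are rounded: float16 has an 11-bit significand,
# so between 4096 and 8192 only multiples of 4 can be represented.
original = 4097
stored = roundtrip_float16(original)
```

    Here `stored` comes back as 4096.0, so 8-bit inputs double in size while 16-bit inputs can lose their low-order bits.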

    I also changed masks to be stored as ENUM and to use the strongest compression setting, as boolean masks are highly compressible. The compression made a huge difference in file size: HDFView (https://www.hdfgroup.org/downloads/hdfview/) showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types), but at least this is smaller than float16.

    opened by tddough98 4
Releases(v2.1.0)
  • v2.1.0(Apr 22, 2022)

    What's Changed

    • Clean SegmentMIF by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/294
      • Removed GPU argument from SegmentMIF
      • Separated whole_cell and nuclear kwargs
    • Update README.md by @surya-narayanan in https://github.com/Dana-Farber-AIOS/pathml/pull/298
    • Update quantify mif by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/301
      • update the functional implementation F() to not require a tile object.
      • Add "label" property to counts matrix.
    • Fix tiling bug by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/306
      • Fixed bug in generate_tiles() within OpenSlideBackend and BioFormatsBackend that occurred when the tile shape evenly divides the slide shape
    • Added logging functionality by @BeeGass in https://github.com/Dana-Farber-AIOS/pathml/pull/304
      • Includes logger customization
    • Don't augment test or valid splits for PanNuke by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/309

    New Contributors

    • @BeeGass made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/304

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.4...v2.1.0

  • v2.0.4(Feb 7, 2022)

    What's Changed

    • Fix bug caused by mixing up (i, j) and (x, y) coordinate systems in BioFormatsBackend (#278)
    • Add option to not normalize image in BioFormatsBackend.extract_region() (#279)
    • Fix logic when inferring correct backend to use from file path which was failing on paths containing periods (#284)
    • Fix bug to correctly pass image_resolution argument to Mesmer model (#286)
    • Fix outdated url for PanNuke dataset (#287) by @Yu-AnChen
    • Fix GitHub Actions configuration which was causing testing suite to hang (#289)

    New Contributors

    • @dependabot made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/275
    • @Yu-AnChen made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/287

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.3...v2.0.4

  • v2.0.3(Jan 7, 2022)

  • v2.0.2(Jan 6, 2022)

    What's Changed

    • Streamline environment setup by removing spams as a dependency (#142) and updating environment.yml to create an environment with both PathML and deepcell (#259 #210)
    • Add a Dockerfile for another installation option, and a GitHub Actions workflow to build and publish it to Dockerhub on new release (#145)
    • Add series_as_channels flag to BioFormatsBackend.extract_region() to fix support for images from the MISI lab (#261)

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.1...v2.0.2

  • v2.0.dev4(Jan 4, 2022)

  • v2.0.dev3(Jan 4, 2022)

  • v2.0.dev2(Jan 4, 2022)

  • 2.0.dev1(Jan 4, 2022)

  • v2.0.1(Dec 25, 2021)

    What's Changed

    • Improve h5path read/write by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/260

    Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.0...v2.0.1

  • v2.0.0(Dec 19, 2021)

    What's new in v2.0.0:

    • Changed h5path format and refactored h5manager to improve performance (#231)
    • support XYZCT images for TileDataset (#233)
    • Cleaned up versioning tracker (#236)
    • fix bug when reading region from openslide backend at higher levels (#242)
    • Add support for multi-series images with BioformatsBackend (#251)
    • Pin python-bioformats version to avoid any possibility of log4j hacks (#256)
    • Added optional flag in SlideDataset.run() to write slides to h5path as they finish processing (#226)
    • Added GitHub Actions workflow to automatically build package and publish to PyPI when a new release is created (#235)

    Because the file format is changed in this version, .h5path files saved in older versions will not be able to be loaded in this one, and vice versa (i.e. breaking backwards compatibility, hence the bumped major version).

  • v1.0.4(Nov 29, 2021)

  • v1.0.dev4(Nov 29, 2021)

Owner
AI Operations and Data Science Services group