Tools for computational pathology

Last update: Dec 12, 2022

Related tags

Deep Learning machine-learning biomedical-image-processing digital-pathology pathology computational-pathology histopathology

Overview

A toolkit for computational pathology and machine learning.

View documentation

Please cite our paper

Installation

There are two ways to install PathML:

1. pip install (recommended for users)
2. clone repo to local machine and install from source (recommended for developers/contributors)

Options (1) and (2) require that you first install all external dependencies:

openslide
JDK 8

We recommend using conda for environment management. Download Miniconda here

Note: these instructions are for Linux. Commands may be different for other platforms.

Installation option 1: pip install

Create conda environment

conda create --name pathml python=3.8
conda activate pathml

Install external dependencies (Linux) with Apt

sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev

Install external dependencies (MacOS) with Brew

brew install openslide

Install OpenJDK 8

conda install openjdk==8.0.152

Optionally install CUDA (instructions here)

Install PathML

pip install pathml

Installation option 2: clone repo and install from source

Clone repo

git clone https://github.com/Dana-Farber-AIOS/pathml.git
cd pathml

Create conda environment

conda env create -f environment.yml
conda activate pathml

Optionally install CUDA (instructions here)

Install PathML:

pip install -e .

CUDA

To use GPU acceleration for model training or other tasks, you must install CUDA. This guide should work, but for the most up-to-date instructions, refer to the official PyTorch installation instructions.

Check the version of CUDA:

nvidia-smi

Install correct version of cudatoolkit:

# update this command with your CUDA version number
conda install cudatoolkit=11.0

After installing PyTorch, optionally verify successful PyTorch installation with CUDA support:

python -c "import torch; print(torch.cuda.is_available())"
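The one-line check above can be expanded into a slightly more informative report. This is a minimal sketch, not part of PathML: the helper name cuda_report is hypothetical, and it only uses torch.cuda.is_available(), torch.cuda.device_count(), torch.version.cuda and torch.__version__. It also handles the case where PyTorch is not installed yet.

```python
import importlib.util


def cuda_report():
    """Summarize whether PyTorch is installed and whether it can see a GPU."""
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but CUDA is not available"

    return (
        f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"{torch.cuda.device_count()} GPU(s) visible"
    )


print(cuda_report())
```

If the report says CUDA is not available even though nvidia-smi lists a GPU, a mismatch between the installed cudatoolkit version and the PyTorch build is one possible cause worth checking.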

Using with Jupyter

Jupyter notebooks are a convenient way to work interactively. To use PathML in Jupyter notebooks:

Set JAVA_HOME environment variable

PathML relies on Java to enable support for reading a wide range of file formats. Before using PathML in Jupyter, you may need to manually set the JAVA_HOME environment variable specifying the path to Java. To do so:

Get the path to Java by running echo $JAVA_HOME in the terminal in your pathml conda environment (outside of Jupyter)

Set that path as the JAVA_HOME environment variable in Jupyter:

import os
os.environ["JAVA_HOME"] = "/opt/conda/envs/pathml" # change path as needed
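If you would rather not hard-code the path, the lookup can be sketched as follows. This is an illustration under assumptions, not PathML functionality: the helper resolve_java_home is hypothetical, and the fallback to CONDA_PREFIX assumes that OpenJDK was installed with conda into the active environment and that the Jupyter kernel was started from that environment.

```python
import os


def resolve_java_home(env=os.environ):
    """Return a candidate JAVA_HOME, or None if nothing suitable is set."""
    # Prefer an explicit JAVA_HOME if the kernel already inherited one.
    java_home = env.get("JAVA_HOME")
    if java_home:
        return java_home
    # Assumption: OpenJDK was installed with conda, so Java lives under the
    # active environment prefix (as in the "/opt/conda/envs/pathml" example).
    return env.get("CONDA_PREFIX")


candidate = resolve_java_home()
if candidate:
    os.environ["JAVA_HOME"] = candidate
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
```

If neither variable is set inside the notebook, fall back to the manual steps above.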

Register PathML as an IPython kernel

conda activate pathml
conda install ipykernel
python -m ipykernel install --user --name=pathml

This makes PathML available as a kernel in JupyterLab or Jupyter Notebook.

Contributing

PathML is an open source project. Consider contributing to benefit the entire community!

There are many ways to contribute to PathML, including:

Submitting bug reports
Submitting feature requests
Writing documentation and examples
Fixing bugs
Writing code for new features
Sharing workflows
Sharing trained model parameters
Sharing PathML with colleagues, students, etc.

See contributing for more details.

License

The GNU GPL v2 version of PathML is made available via Open Source licensing. The user is free to use, modify, and distribute under the terms of the GNU General Public License version 2.

Commercial license options are also available.

Contact

Questions? Comments? Suggestions? Get in touch!

[email protected]

Comments

Improve performance

Currently, writing to h5 is the primary performance bottleneck when running a pipeline (see profile here).

Perhaps by refactoring our h5 integration, we can boost performance. For example, maybe we should store tiles in separate groups instead of in one big array. This would potentially let us write in parallel and also make it trivial to support overlapping tiles (#223).

Some work on this was being tracked in #200, but I am creating this issue so that we can discuss it here instead of on the pull request.
enhancement

opened by jacob-rosenthal 16
Warnings associated with circulating a keras model among dask workers
We are getting a set of warnings around the loading of a saved keras checkpoint file. I think they contribute to a subsequent error (https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867) and to the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185.

Here is the warning we get, which we get when we run the SegmentMIF function:

WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), NOT tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.

We believe that the keras saved model is being passed to dask workers in an unclean state (existing locks not released, etc.), causing the warnings in https://github.com/Dana-Farber-AIOS/pathml/issues/211#issue-1038691185 and, eventually, the error in https://github.com/Dana-Farber-AIOS/pathml/issues/164#issuecomment-953384867.

To Reproduce Here is our pipeline. I cannot share the data for regulatory reasons.

pipeline = Pipeline([
    CollapseRunsVectra(),
    SegmentMIF(
        model='mesmer',
        nuclear_channel=0,
        cytoplasm_channel=2,
        image_resolution=0.5,
        gpu=False,
        postprocess_kwargs_whole_cell=None,
        postprocess_kwrags_nuclear=None,
    ),
    QuantifyMIF('nuclear_segmentation'),
])
bug
opened by surya-narayanan 13
Docker ci

Add a Dockerfile that builds a working environment for pathml and starts a JupyterLab instance in the container, so that users can connect to it and get up and running quickly. Also add a GitHub Actions workflow to build the image and publish it to Docker Hub whenever we create a new release.

This will close #145

opened by jacob-rosenthal 11

Unable to open tile object (object 'array' doesn't exist)

Describe the bug Unable to access tile array from TileDataset.__getitem__() KeyError: "Unable to open object (object 'array' doesn't exist)"

To Reproduce Traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_4463/806735975.py in <module>
----> 1 tile_dataset.__getitem__(0)

~/pathml/pathml/ml/dataset.py in __getitem__(self, ix)
     54         ### this part copied from h5manager.get_tile()
     55         tile_image = self.h5["tiles"][str(k)]["array"][:]
---> 56 
     57         # get corresponding masks if there are masks
     58         if "masks" in self.h5["tiles"][str(k)].keys():

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/opt/conda/envs/wtf/lib/python3.8/site-packages/h5py/_hl/group.py in __getitem__(self, name)
    286                 raise ValueError("Invalid HDF5 object reference")
    287         else:
--> 288             oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    289 
    290         otype = h5i.get_type(oid)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5o.pyx in h5py.h5o.open()

KeyError: "Unable to open object (object 'array' doesn't exist)"

Expected behavior Should be able to access the tile array object. I created the h5 file with the following code:

from pathml.core import SlideData, types

slide_name = "<path to slide>"  # placeholder: replace with the path to the slide
slide = SlideData(slide_name, backend = "bioformats", slide_type = types.Vectra)
slide.write(f'/parent_directory/{slide.name}.h5')

bug

opened by surya-narayanan 11

Adding to h5 file

Is it possible to re-run a slide with a different pipeline and add to the h5 file, without re-doing tiling? Happy to provide an example, if that would be helpful.

opened by surya-narayanan 9
Resolving dependencies between PathML and Deepcell
Describe the bug When we run pipelines for multiparametric images, we often want to include models from deepcell (https://github.com/vanvalenlab/deepcell-tf), especially for the SegmentMIF transform. It is difficult for users to solve the environment, since installing deepcell downgrades packages like numpy to incompatible versions. This has caused installation problems for @MohamedOmar2020 and other internal users.

To Reproduce

pipe = Pipeline(
    [
        CollapseRunsVectra(),
        SegmentMIF(
            model="mesmer",
            nuclear_channel=0,
            cytoplasm_channel=7,
            image_resolution=0.5,
        ),
        QuantifyMIF(segmentation_mask="cell_segmentation"),
    ]
)
dataset.run(pipe)

Expected behavior We would expect this to run, but following the default installation instructions (option 1, pip install) and then running pip install deepcell results in a series of numpy errors when we attempt to run the pipeline.

Working Solution These dependency problems are resolved (at least to the extent that the above pipeline can run) by upgrading numpy after installing deepcell, as follows:

conda create --name pathml python=3.8
conda activate pathml
sudo apt-get install openslide-tools g++ gcc libblas-dev liblapack-dev
conda install openjdk==8.0.152
pip install pathml
pip install deepcell
pip install --upgrade numpy
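To see which versions actually ended up installed after steps like these, a small check with the standard library can help when reporting numpy conflicts. This is only a sketch: the helper name installed_versions is hypothetical, and the package names passed to it are assumed to match the names used on PyPI.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "deepcell", "pathml", "tensorflow"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this before and after pip install deepcell shows whether numpy was changed by that step.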

The question is: should we include this in our installation instructions for users who want to use multiparametric pipelines? Should we create a docker container for multiparametric pipelines? Should we remove our dependency on deepcell and try to wrap the model more directly in PathML (or train our own)?
enhancement
opened by ryanccarelli 8
Weird segmentation results

Hello, I have a problem with the segmentation resulting from the mesmer model. It looks like the model is not identifying cells properly since many cells are too large with too many nuclei. This is the code used to process the image:

pipe = Pipeline([
    CollapseRunsVectra(),
    SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=7, image_resolution=0.5),
    QuantifyMIF(segmentation_mask='cell_segmentation'),
])

slidedata.run(pipe, distributed = False, tile_size= (12784, 13234), tile_pad=False, overwrite_existing_tiles=True)

img = slidedata.tiles[3].image[10000:10500,12000:12500, :]
nuc_mask = slidedata.tiles[3].masks['nuclear_segmentation'][10000:10500,12000:12500, :]
cell_mask = slidedata.tiles[3].masks['cell_segmentation'][10000:10500,12000:12500, :]

img_fiji = np.expand_dims(img, axis=0)
nuc_cytoplasm = np.stack((img_fiji[:,:,:,0], img_fiji[:,:,:,7]), axis=-1)
rgb_image = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
cell_segmentation_predictions = np.expand_dims(cell_mask, axis=0)
overlay_cell = make_outline_overlay(rgb_data=rgb_image, predictions=cell_segmentation_predictions)

This is how it looks when I overlay the segmentation on the original image in Fiji:

I loaded a small part of the original image in Fiji, adjusted the brightness/contrast, and then used the mesmer model for segmentation (using deepcell directly, not pathml), and the segmentation seems good. This is how it looks:

Is it right to assume that the bad segmentation shown in the first image has something to do with the brightness/contrast of the raw image? Any ideas how to fix this?

Thanks in advance

opened by MohamedOmar2020 8

indices should be either on cpu or on the same device as the indexed tensor (cpu)

Describe the bug RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

To Reproduce

n_classes_pannuke = 6

# load the model
hovernet = HoVerNet(n_classes=n_classes_pannuke)

# wrap model to use multi-GPU
hovernet = torch.nn.DataParallel(hovernet)

# set up optimizer
opt = torch.optim.Adam(hovernet.parameters(), lr = 1e-4)
# learning rate scheduler to reduce LR by factor of 10 each 25 epochs
scheduler = StepLR(opt, step_size=25, gamma=0.1)

# send model to GPU
hovernet.to(device);

n_epochs = 50

# print performance metrics every n epochs
print_every_n_epochs = None

# evaluating performance on a random subset of validation mini-batches
# this saves time instead of evaluating on the entire validation set
n_minibatch_valid = 50

epoch_train_losses = {}
epoch_valid_losses = {}
epoch_train_dice = {}
epoch_valid_dice = {}

best_epoch = 0

# main training loop
for i in tqdm(range(n_epochs)):
    minibatch_train_losses = []
    minibatch_train_dice = []

    ### put model in training mode
    hovernet.train()

    for data in train_dataloader:
        ### send the data to the GPU
        images = data[0].float().to(device)
        masks = data[1].to(device)
        hv = data[2].float().to(device)
        tissue_type = data[3]

        ### zero out gradient
        opt.zero_grad()

        ### forward pass
        outputs = hovernet(images)

        ### compute loss
        loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

        ### track loss
        minibatch_train_losses.append(loss.item())

        ### also track dice score to measure performance
        preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
        truth_binary = masks[:, -1, :, :] == 0
        dice = dice_score(preds_detection, truth_binary.cpu().numpy())
        minibatch_train_dice.append(dice)

        ### compute gradients
        loss.backward()

        ### step optimizer and scheduler
        opt.step()

    ### step LR scheduler
    scheduler.step()

    ### evaluate on random subset of validation data
    hovernet.eval()
    minibatch_valid_losses = []
    minibatch_valid_dice = []
    ### randomly choose minibatches for evaluating
    minibatch_ix = np.random.choice(range(len(valid_dataloader)), replace=False, size=n_minibatch_valid)
    with torch.no_grad():
        for j, data in enumerate(valid_dataloader):
            if j in minibatch_ix:
                # send the data to the GPU
                images = data[0].float().to(device)
                masks = data[1].to(device)
                hv = data[2].float().to(device)
                tissue_type = data[3]

                # forward pass
                outputs = hovernet(images)

                # compute loss
                loss = loss_hovernet(outputs = outputs, ground_truth = [masks, hv], n_classes=6)

                # track loss
                minibatch_valid_losses.append(loss.item())

                # also track dice score to measure performance
                preds_detection, preds_classification = post_process_batch_hovernet(outputs, n_classes=n_classes_pannuke)
                truth_binary = masks[:, -1, :, :] == 0
                dice = dice_score(preds_detection, truth_binary.cpu().numpy())
                minibatch_valid_dice.append(dice)

    ### average performance metrics over minibatches
    mean_train_loss = np.mean(minibatch_train_losses)
    mean_valid_loss = np.mean(minibatch_valid_losses)
    mean_train_dice = np.mean(minibatch_train_dice)
    mean_valid_dice = np.mean(minibatch_valid_dice)

    ### save the model with best performance
    if i != 0:
        if mean_valid_loss < min(epoch_valid_losses.values()):
            best_epoch = i
            torch.save(hovernet.state_dict(), f"hovernet_best_perf.pt")

    ### track performance over training epochs
    epoch_train_losses.update({i : mean_train_loss})
    epoch_valid_losses.update({i : mean_valid_loss})
    epoch_train_dice.update({i : mean_train_dice})
    epoch_valid_dice.update({i : mean_valid_dice})

    if print_every_n_epochs is not None:
        if i % print_every_n_epochs == print_every_n_epochs - 1:
            print(f"Epoch {i+1}/{n_epochs}:")
            print(f"\ttraining loss: {np.round(mean_train_loss, 4)}\tvalidation loss: {np.round(mean_valid_loss, 4)}")
            print(f"\ttraining dice: {np.round(mean_train_dice, 4)}\tvalidation dice: {np.round(mean_valid_dice, 4)}")

# save fully trained model
torch.save(hovernet.state_dict(), f"hovernet_fully_trained.pt")
print(f"\nEpoch with best validation performance: {best_epoch}")

Expected behavior Should start model training

Screenshots

Additional context Does anyone else have this problem? I ran this on an HPC cluster with 4 GPUs, each with 16 GB of memory.

bug

opened by luzy05111036 7

Issue with distributed processing

Hello, thank you for fixing the distributed issue with the mesmer model. I am running the pipeline with the 'distributed = True' flag, but I am getting many warnings and errors. Additionally, the pipeline was supposed to return 145 tiles, but it is returning only 3! This is part of the log message:

def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
/Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
  def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
/Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
/Users/mohamedomar/.local/lib/python3.8/site-packages/skimage/morphology/_deprecated.py:5: skimage_deprecation: Function watershed is deprecated and will be removed in version 0.19. Use skimage.segmentation.watershed instead.
  def watershed(image, markers=None, connectivity=1, offset=None, mask=None,
/Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_io/h5ad.py:64: FutureWarning: The force_dense argument is deprecated. Use as_dense instead.
  warn(
/Users/mohamedomar/opt/anaconda3/envs/pathml2/lib/python3.8/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)
storing 'coords' as categorical
storing 'slice' as categorical
storing 'tile' as categorical
**2021-08-10 00:00:05.176962: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at save_restore_v2_ops.cc:205 : Out of range: Read less bytes than requested
distributed.worker - WARNING - Compute Failed
Function: apply
args: (Tile(coords=(1598, 6616), name=None, image shape: (1598, 1654, 2), slide_type=SlideType(stain=Fluor, platform=Vectra, tma=None, rgb=None, volumetric=None, time_series=None), labels=None, masks=None, counts=None))
kwargs: {}
Exception: OutOfRangeError()**

That last error (bold text) is repeated many times.

Thanks in advance
bug

opened by MohamedOmar2020 7

Error installing owing to cached version of torch

If one tries to install pathml after a previously failed installation attempt, one runs into the following error, which I think is due to using cached files. One suggested solution (for just torch) is to run pip --no-cache-dir install torchvision, but I don't know whether this will solve the issue, or how to integrate it into installing pathml as a whole without installing each dependency one by one.

(pathml) jupyter@cuda11:~$ pip install pathml
Collecting pathml
  Using cached pathml-2.0.4-py3-none-any.whl (83 kB)
Collecting scipy
  Using cached scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.6 MB)
Collecting python-bioformats>=4.0.0
  Using cached python_bioformats-4.0.5-py3-none-any.whl (41.4 MB)
Collecting scikit-image
  Using cached scikit_image-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)
Collecting scikit-learn
  Using cached scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
Requirement already satisfied: pip in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (22.0.3)
Collecting openslide-python
  Using cached openslide-python-1.1.2.tar.gz (316 kB)
  Preparing metadata (setup.py) ... done
Collecting dask[distributed]
  Using cached dask-2022.1.1-py3-none-any.whl (1.1 MB)
Collecting pandas
  Using cached pandas-1.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
Collecting matplotlib
  Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting anndata>=0.7.6
  Using cached anndata-0.7.8-py3-none-any.whl (91 kB)
Requirement already satisfied: numpy>=1.16.4 in /opt/conda/envs/pathml/lib/python3.8/site-packages (from pathml) (1.22.2)
Collecting h5py
  Using cached h5py-3.6.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Collecting opencv-contrib-python
  Using cached opencv_contrib_python-4.5.5.62-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (66.6 MB)
Collecting pydicom
  Using cached pydicom-2.2.2-py3-none-any.whl (2.0 MB)
Collecting torch
SystemError: deallocated bytearray object has exported buffers
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
    status = run_func(*args)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 339, in run
    requirement_set = resolver.resolve(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 373, in resolve
    failure_causes = self._attempt_to_pin_criterion(name)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 213, in _attempt_to_pin_criterion
    criteria = self._get_updated_criteria(candidate)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 204, in _get_updated_criteria
    self._add_to_criteria(criteria, requirement, parent=candidate)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
    super().__init__(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
    self.dist = self._prepare()
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
    file = get_http_url(
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 133, in __call__
    resp = _http_get_download(self._session, link)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/download.py", line 116, in _http_get_download
    resp = session.get(target_url, headers=HEADERS, stream=True)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 542, in get
    return self.request('GET', url, **kwargs)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_internal/network/session.py", line 454, in request
    return super().request(method, url, *args, **kwargs)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/adapter.py", line 48, in send
    cached_response = self.controller.cached_request(request)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/controller.py", line 151, in cached_request
    resp = self.serializer.loads(request, cache_data)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 95, in loads
    return getattr(self, "_loads_v{}".format(ver))(request, data)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/cachecontrol/serialize.py", line 182, in _loads_v4
    cached = msgpack.loads(data, raw=False)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 128, in unpackb
    ret = unpacker._unpack()
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 592, in _unpack
    ret[key] = self._unpack(EX_CONSTRUCT)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 546, in _unpack
    typ, n, obj = self._read_header()
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 488, in _read_header
    obj = self._read(n)
  File "/opt/conda/envs/pathml/lib/python3.8/site-packages/pip/_vendor/msgpack/fallback.py", line 407, in _read
    ret = self._buffer[i : i + n]
MemoryError

bug

opened by surya-narayanan 6

Extracting tile returns multi dim output from H&E qptiff

I think this is specific to my scanner. When I run the following line on my qptiff H&E svs file,

region = wsi.slide.extract_region(location = (900, 800), size = (500, 500))

I get a 5-dimensional object, which is incompatible with downstream analysis.

Do you think it may be useful to run a np.squeeze right before returning the array in wsi.slide.extract_region?
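The effect of the proposed np.squeeze can be sketched on a dummy array. The shape (500, 500, 1, 3, 1) is an assumption chosen for illustration (two spatial axes, then singleton z, three channels, singleton t); the real output of extract_region may be laid out differently.

```python
import numpy as np

# Dummy stand-in for the 5-dimensional region described above (assumed layout).
region = np.zeros((500, 500, 1, 3, 1), dtype=np.uint8)

# Dropping every singleton axis gives an ordinary height x width x channel image.
squeezed = np.squeeze(region)
print(squeezed.shape)  # (500, 500, 3)

# Caveat: a single-channel image would also lose its channel axis this way.
gray = np.zeros((500, 500, 1, 1, 1), dtype=np.uint8)
print(np.squeeze(gray).shape)  # (500, 500)

# Squeezing only the axes known to be singleton keeps the channel axis.
print(np.squeeze(gray, axis=(2, 4)).shape)  # (500, 500, 1)
```

Naming the axes explicitly is the more conservative choice if downstream code always expects a channel dimension.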

opened by surya-narayanan 6
How to implement HoVer-Net Model in TIAToolBox?

I want to use the TIAToolBox to segment nuclei in tissue images. The pretrained models don't fit my demands. Therefore, I trained a custom HoVer-Net model on the PanNuke dataset to detect/count nuclei in tissue images.

Does anybody know how to use the custom PyTorch HoVer-Net model that I get from this script in the TIAToolBox to detect nuclei?

opened by WilliWespe 0
How to train hovernet starting with semantic-level mask image?

Is your feature request related to a problem? Please describe. I have large WSI data and multi-class JPEG masks, but I have not been able to find a way to make them work with any HoVer-Net implementation.

Describe the solution you'd like I'd like to be able to feed in my large WSI data along with the JPEG masks and have tiling and training take place from there.

Describe alternatives you've considered If I still need instance masks, I can generate them with watershed. But I still don't know what format PathML would want (e.g. npy, mat, json, jpeg, etc.).

Additional context

Any help is highly appreciated.

Thanks!
enhancement

opened by OmarAshkar 3
Deepcell segmentation without cytoplasm channel

I note that Deepcell provides both a segmentation model (nuclear channel only) and the mesmer model (nuclear and cytoplasm channels) (https://www.deepcell.org/predict). Our datasets do not have a single general cytoplasm marker that captures the cytoplasm of all cell types, as the mesmer model requires (e.g. tumour cells vs. immune vs. stromal cells). Can nuclear_channel and cytoplasm_channel both be set to DAPI in the mesmer model? Or could SegmentMIF(model='segmentation') be supported? Thanks!
enhancement

opened by jamesMo84 1
Allow SlideData to use existing h5path files

As motivated by https://github.com/Dana-Farber-AIOS/pathml/issues/332 and https://github.com/Dana-Farber-AIOS/pathml/issues/300, this modifies SlideData to read and update Tiles from an existing h5path file instead of requiring each pipeline run to recreate all tiles from scratch.

This includes #335 as many transforms (e.g. BoxBlur) require np.uint8 data instead of the default float16 saved to h5path files. I was also working off my load-data-in-workers branch because it had significant performance changes for my use cases. Sorry about the branching messiness, hopefully the changes will be clearer as other branches are merged into dev.

This makes breaking changes to the SlideData API, namely replacing generate_tiles with get_tiles and moving the tile parameterization from run to the SlideData constructor.

opened by tddough98 0
Load tiles in parallel on workers and add options to `TissueDetectionHE`
This contains two separate improvements:

1. Add drop_empty_tiles and keep_mask options to the TissueDetectionHE transform to bypass saving tiles with no detected H&E tissue and bypass saving masks.

2. Parallelize tile image loading by using dask.delayed to avoid loading images on the main thread.

The first part is both for convenience and performance. It's possible to generate all tiles and then filter out the empty tiles and remove masks before writing the h5path to disk, but that requires that all the tiles be added to the Tiles which takes IO time. If these tiles and masks are never saved even to in-memory objects, processing can finish faster.

The second part is a core performance issue with distributed processing. I believe it's relevant to https://github.com/Dana-Farber-AIOS/pathml/issues/211 and https://github.com/Dana-Farber-AIOS/pathml/issues/299. When processing tiles, I've found that loading time >> processing time. Currently, tile image data is loaded on the main thread, which then scatters each loaded tile to the workers. This prevents any parallelism, as all but one worker are always waiting for the main thread to load data and send them a tile.

Additionally, as all tiles have to be loaded on the main thread, the block that generates the futures

for tile in self.generate_tiles(
    level=level,
    shape=tile_size,
    stride=tile_stride,
    pad=tile_pad,
    **kwargs,
):
    if not tile.slide_type:
        tile.slide_type = self.slide_type
    # explicitly scatter data, i.e. send the tile data out to the cluster before applying the pipeline
    # according to dask, this can reduce scheduler burden and keep data on workers
    big_future = client.scatter(tile)
    f = client.submit(pipeline.apply, big_future)
    processed_tile_futures.append(f)

has to load all tiles and send them all to workers before ANY tile can be added to the Tiles and the memory can be freed in the next block

# as tiles are processed, add them to h5
for future, tile in dask.distributed.as_completed(
    processed_tile_futures, with_results=True
):
    self.tiles.add(tile)

causing the dramatic memory leaks seen in https://github.com/Dana-Farber-AIOS/pathml/issues/211.

I've used dask.delayed to prevent reading from the input file until the image is accessed on the worker. The code that accesses the file and loads the image can now be run by each worker in parallel. To preserve the parallelism, we have to take care not to access and load tile.image on the main thread before loading it on the worker, or to at least wrap accesses in dask.delayed as in SlideData.generate_tiles.
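The idea of deferring the read until a worker touches the image can be illustrated with the standard library alone. The sketch below swaps in concurrent.futures threads for dask.delayed and a dask cluster, so it shows the pattern rather than the actual change; LazyTile, fake_loader, process and run are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class LazyTile:
    """Hypothetical tile that records where to read from, not the pixels."""

    def __init__(self, coords, loader):
        self.coords = coords
        self._loader = loader  # callable that performs the slow read
        self._image = None
        self.loaded_on = None  # name of the thread that did the read

    @property
    def image(self):
        # The read happens on whichever thread first touches .image.
        if self._image is None:
            self._image = self._loader(self.coords)
            self.loaded_on = threading.current_thread().name
        return self._image


def fake_loader(coords):
    # Stand-in for reading a region from a slide file.
    i, j = coords
    return [[i + j] * 4 for _ in range(4)]


def process(tile):
    # Runs on a worker; accessing tile.image triggers the load there.
    total = sum(sum(row) for row in tile.image)
    return tile.coords, total, tile.loaded_on


def run(coords_list, max_workers=4):
    # Creating the tiles is cheap because no image data is read yet.
    tiles = [LazyTile(c, fake_loader) for c in coords_list]
    results = {}
    with ThreadPoolExecutor(
        max_workers=max_workers, thread_name_prefix="worker"
    ) as pool:
        futures = [pool.submit(process, t) for t in tiles]
        # Consume results as they finish so each one can be stored and freed.
        for fut in as_completed(futures):
            coords, total, loaded_on = fut.result()
            results[coords] = (total, loaded_on)
    return results


results = run([(0, 0), (0, 1), (1, 1)])
```

Because the main thread never reads `tile.image`, every load is recorded as having happened on a worker thread, which is the property the paragraph above asks callers to preserve.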

I had some issues with the backends not being picklable. The Backend has to be sent to each worker so it has access to the code that interfaces with the filesystem. I changed Backend filelike attributes to be lazily evaluated with the @property decorator.
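A minimal sketch of the lazy-attribute pattern, assuming a backend that wraps an ordinary file handle. LazyBackend and its methods are hypothetical and are not PathML classes. The `__getstate__` hook is an addition for illustration, covering the case where the handle was already opened before pickling; the description above only mentions `@property`.

```python
import os
import pickle
import tempfile


class LazyBackend:
    """Hypothetical backend that opens its file only when first needed."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing is opened at construction time

    @property
    def handle(self):
        # Open the file on first access, then reuse the same handle.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getstate__(self):
        # Open file objects cannot be pickled, so drop the handle; the
        # receiving process reopens the file lazily through the property.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def read_header(self, n=4):
        self.handle.seek(0)
        return self.handle.read(n)

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def demo():
    # Write a small stand-in file to play the role of a slide.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
        f.write(b"TILEDATA")
        path = f.name
    try:
        backend = LazyBackend(path)
        first = backend.read_header()  # opens the file on first access
        clone = pickle.loads(pickle.dumps(backend))  # works despite open handle
        second = clone.read_header()  # the copy reopens the file itself
        backend.close()
        clone.close()
        return first, second
    finally:
        os.remove(path)
```

Only the path travels to the worker; each copy opens its own handle when it first needs one.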
opened by tddough98 4
Parameterize dtype for h5path with `SlideData` constructor

Currently, PathML stores all images with float16, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is a dtype parameter in the SlideData constructor, but it's only used to assist the BioFormatsBackend in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.
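To make the storage and precision trade-off concrete, the sketch below uses the standard library's half-precision `struct` format as a stand-in for a float16 array. It does not use PathML or h5py, and the helper names are hypothetical.

```python
import struct


def float16_roundtrip(value):
    """Encode a number as IEEE 754 half precision and decode it again."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def bytes_per_value(fmt):
    """Size in bytes of one value in the given struct format."""
    return struct.calcsize(fmt)


# 8-bit data survives the conversion but takes twice the space.
uint8_value = 255
uint8_as_f16 = float16_roundtrip(uint8_value)

# 16-bit data above 2048 is no longer exact: float16 has an 11-bit significand.
uint16_value = 2049
uint16_as_f16 = float16_roundtrip(uint16_value)

print(uint8_value, "->", uint8_as_f16)
print(uint16_value, "->", uint16_as_f16)
print("bytes per value:", bytes_per_value("<B"), "vs", bytes_per_value("<e"))
```

An 8-bit image therefore doubles in size when upcast, while a 16-bit image can lose its low-order bits when downcast.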

I also changed masks to be stored as ENUM and to use the strongest compression setting, as boolean masks are highly compressible. The compression made a huge difference in file size: inspecting the files with HDFView (https://www.hdfgroup.org/downloads/hdfview/) showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types), but this is still smaller than float16.
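Why binary masks compress so well can be seen with `zlib`, which implements the same DEFLATE algorithm as HDF5's gzip filter. This is a rough sketch on a synthetic mask, assuming one byte per pixel and compression level 9. The exact filter settings used in the PR are not stated here, and the measured ratio depends on the mask content.

```python
import zlib


def make_mask(size=256, box=64):
    """Build a size x size binary mask with one filled square, as raw bytes."""
    rows = []
    for r in range(size):
        row = bytearray(size)  # all zeros: background
        if box <= r < 2 * box:
            row[box:2 * box] = b"\x01" * box  # foreground pixels
        rows.append(bytes(row))
    return b"".join(rows)


def compression_ratio(raw, level=9):
    """Ratio of uncompressed to DEFLATE-compressed size at the given level."""
    return len(raw) / len(zlib.compress(raw, level))


mask_bytes = make_mask()
ratio = compression_ratio(mask_bytes)
print(f"{len(mask_bytes)} bytes uncompressed, ratio of about {ratio:.0f}x")
```

With h5py, comparable settings would be requested through the `compression="gzip"` and `compression_opts=9` arguments of `create_dataset`.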

opened by tddough98 4

Releases(v2.1.0)

v2.1.0(Apr 22, 2022)
What's Changed

Clean SegmentMIF by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/294

Removed GPU argument from SegmentMIF

Separated whole_cell and nuclear kwargs

Update README.md by @surya-narayanan in https://github.com/Dana-Farber-AIOS/pathml/pull/298

Update quantify mif by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/301

Update the functional implementation F() to not require a tile object.

Add "label" property to counts matrix.

Fix tiling bug by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/306

Fixed bug for generate_tiles() within OpenSlideBackend and BioFormatsBackend affecting the case where the tile shape divides evenly into the slide shape.

Added logging functionality by @BeeGass in https://github.com/Dana-Farber-AIOS/pathml/pull/304

Includes logger customization

Don't augment test or valid splits for PanNuke by @jacob-rosenthal in https://github.com/Dana-Farber-AIOS/pathml/pull/309

New Contributors

@BeeGass made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/304

Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.4...v2.1.0
v2.0.4(Feb 7, 2022)
What's Changed

Fix bug caused by mixing up (i, j) and (x, y) coordinate systems in BioFormatsBackend (#278)

Add option to not normalize image in BioFormatsBackend.extract_region() (#279)

Fix logic when inferring correct backend to use from file path which was failing on paths containing periods (#284)

Fix bug to correctly pass image_resolution argument to Mesmer model (#286)

Fix outdated url for PanNuke dataset (#287) by @Yu-AnChen

Fix GitHub Actions configuration which was causing testing suite to hang (#289)

New Contributors

@dependabot made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/275

@Yu-AnChen made their first contribution in https://github.com/Dana-Farber-AIOS/pathml/pull/287

Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.3...v2.0.4
v2.0.3(Jan 7, 2022)
What's Changed

Fix bug in BioFormats backend #272

Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.2...v2.0.3
v2.0.2(Jan 6, 2022)
What's Changed

Streamline environment setup by removing spams as a dependency (#142) and updating environment.yml to create an environment with both PathML and deepcell (#259 #210)

Add a Dockerfile for another installation option, and a GitHub Actions workflow to build and publish it to Dockerhub on new release (#145)

Add series_as_channels flag to BioFormatsBackend.extract_region() to fix support for images from the MISI lab (#261)

Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.1...v2.0.2
v2.0.dev4(Jan 4, 2022)

Test release to debug dockerization GitHub Actions workflow

#245 #145
v2.0.dev3(Jan 4, 2022)

Test release to debug dockerization GitHub Actions workflow

#245 #145
v2.0.dev2(Jan 4, 2022)

Test release to debug dockerization GitHub Actions workflow

#245 #145
2.0.dev1(Jan 4, 2022)

Test release to check that dockerization GitHub Actions workflow is working properly

#245 #145
v2.0.1(Dec 25, 2021)
What's Changed

Improve h5path read/write by @ryanccarelli in https://github.com/Dana-Farber-AIOS/pathml/pull/260

Full Changelog: https://github.com/Dana-Farber-AIOS/pathml/compare/v2.0.0...v2.0.1
v2.0.0(Dec 19, 2021)
What's new in v2.0.0:

Changed h5path format and refactored h5manager to improve performance (#231)

Support XYZCT images for TileDataset (#233)

Cleaned up versioning tracker (#236)

Fix bug when reading region from OpenSlide backend at higher levels (#242)

Add support for multi-series images with BioformatsBackend (#251)

Pin python-bioformats version to avoid any possibility of log4j hacks (#256)

Added optional flag in SlideDataset.run() to write slides to h5path as they finish processing (#226)

Added GitHub Actions workflow to automatically build package and publish to PyPI when a new release is created (#235)

Because the file format is changed in this version, .h5path files saved in older versions will not be able to be loaded in this one, and vice versa (i.e. breaking backwards compatibility, hence the bumped major version).
v1.0.4(Nov 29, 2021)

Testing versioning and CI distribution
v1.0.dev4(Nov 29, 2021)

This is a test release to debug/check our CI workflows
v1.0.3(Oct 26, 2021)

v1.0.2(Sep 22, 2021)

v1.0.1(Sep 13, 2021)


Owner

AI Operations and Data Science Services group

GitHub https://pathml.org

