LIVECell dataset
This document contains instructions on how to access the data associated with the submitted manuscript "LIVECell - A large-scale dataset for label-free live cell segmentation" by Edlund et al. 2021.
Background
Light microscopy is a cheap, accessible, non-invasive modality that, when combined with well-established protocols of two-dimensional cell culture, facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated image-processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap, we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.
How to access LIVECell
All images in LIVECell are available via this link (requires 1.3 GB). Annotations for the different experiments are linked below. For more details regarding benchmarks and how to use our models, see this link.
LIVECell-wide train and evaluate
Annotation set | URL |
---|---|
Training set | link |
Validation set | link |
Test set | link |
Single cell-type experiments
Cell Type | Training set | Validation set | Test set |
---|---|---|---|
A172 | link | link | link |
BT474 | link | link | link |
BV-2 | link | link | link |
Huh7 | link | link | link |
MCF7 | link | link | link |
SH-SY5Y | link | link | link |
SkBr3 | link | link | link |
SK-OV-3 | link | link | link |
Dataset size experiments
Split | URL |
---|---|
2 % | link |
4 % | link |
5 % | link |
25 % | link |
50 % | link |
Comparison to fluorescence-based object counts
The images and a corresponding json file with the object count per image are available, together with the raw fluorescent images the counts are based on.
Cell Type | Images | Counts | Fluorescent images |
---|---|---|---|
A549 | link | link | link |
A172 | link | link | link |
Download all of LIVECell
The LIVECell dataset and trained models are stored in an Amazon Web Services (AWS) S3 bucket. If you have an AWS IAM user, the easiest way to download the dataset is to use the AWS CLI in the folder you would like to download the dataset to:

```
aws s3 sync s3://livecell-dataset .
```
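If boto3 is installed, individual files can also be fetched programmatically without credentials by using unsigned (anonymous) requests; a minimal sketch, where the object key follows the file structure described below:

```python
# Anonymous download of a single object from the public LIVECell bucket.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client(
    "s3",
    region_name="eu-central-1",
    config=Config(signature_version=UNSIGNED),  # no AWS credentials needed
)
s3.download_file(
    "livecell-dataset",                  # bucket name
    "LIVECell_dataset_2021/images.zip",  # object key
    "images.zip",                        # local file name
)
```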
If you do not have an AWS IAM user, the procedure is a little bit more involved. We can use `curl` to make an HTTP request for the S3 XML response and save it to `files.xml`:

```
curl -H "GET /?list-type=2 HTTP/1.1" \
     -H "Host: livecell-dataset.s3.eu-central-1.amazonaws.com" \
     -H "Date: 20161025T124500Z" \
     -H "Content-Type: text/plain" http://livecell-dataset.s3.eu-central-1.amazonaws.com/ > files.xml
```
We then extract the object keys from `files.xml` and turn them into URLs using `grep` and `sed`:

```
grep -oPm1 "(?<=<Key>)[^<]+" files.xml | sed -e 's/^/http:\/\/livecell-dataset.s3.eu-central-1.amazonaws.com\//' > urls.txt
```
Then download the files you like using `wget`, for example `wget -i urls.txt` to fetch everything in the list.
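The download can also be scripted in Python using only the standard library; a minimal sketch that reads `urls.txt` from the previous step and mirrors the bucket's folder structure locally:

```python
# Download every URL in urls.txt, recreating the S3 key paths on disk.
import pathlib
import urllib.request

BASE = "http://livecell-dataset.s3.eu-central-1.amazonaws.com/"

for url in pathlib.Path("urls.txt").read_text().splitlines():
    key = url[len(BASE):]              # the S3 object key, e.g. LIVECell_dataset_2021/...
    if not key or key.endswith("/"):   # skip directory placeholder keys
        continue
    dest = pathlib.Path(key)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    print("downloaded", key)
```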
File structure
The top-level structure of the files is arranged like:
```
/livecell-dataset/
├── LIVECell_dataset_2021
|   ├── annotations/
|   ├── models/
|   ├── nuclear_count_benchmark/
|   └── images.zip
├── README.md
└── LICENSE
```
LIVECell_dataset_2021/images
The images of the LIVECell dataset are stored in `/livecell-dataset/LIVECell_dataset_2021/images.zip` along with their annotations in `/livecell-dataset/LIVECell_dataset_2021/annotations/`.
Within `images.zip`, the training/validation-set and test-set images are kept completely separate to facilitate fair comparison between studies. The images require 1.3 GB of disk space unzipped and are arranged like:
```
images/
├── livecell_test_images
|   └── <cell type>
|       └── <cell type>_Phase_<well>_<location>_<timestamp>_<crop>.tif
└── livecell_train_val_images
    └── <cell type>
        └── <cell type>_Phase_<well>_<location>_<timestamp>_<crop>.tif
```
Where `<cell type>` is each of the eight cell types in LIVECell (A172, BT474, BV2, Huh7, MCF7, SHSY5Y, SkBr3, SKOV3), `<well>` is the location in the 96-well plate used to culture the cells, `<location>` indicates the location in the well where the image was acquired, `<timestamp>` the time passed from the beginning of the experiment to image acquisition, and `<crop>` the index of the crop of the original larger image. An example image name is `A172_Phase_C7_1_02d16h00m_2.tif`, which is an image of A172 cells grown in well C7, acquired at position 1 two days and 16 hours after experiment start (crop position 2).
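Since the naming convention is fixed, the metadata can be recovered from a file name programmatically; a minimal sketch (the `parse_image_name` helper and its field names are illustrative, not part of the dataset tooling):

```python
# Illustrative helper for splitting a LIVECell image name into its fields.
import re

PATTERN = re.compile(
    r"(?P<cell_type>[^_]+)_Phase_(?P<well>[^_]+)_(?P<location>[^_]+)"
    r"_(?P<timestamp>[^_]+)_(?P<crop>[^_]+)\.tif"
)

def parse_image_name(name: str) -> dict:
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a LIVECell image name: {name}")
    return match.groupdict()

print(parse_image_name("A172_Phase_C7_1_02d16h00m_2.tif"))
# {'cell_type': 'A172', 'well': 'C7', 'location': '1',
#  'timestamp': '02d16h00m', 'crop': '2'}
```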
LIVECell_dataset_2021/annotations/
The annotations of LIVECell are prepared for all tasks along with the training/validation/test splits used for all experiments in the paper. The annotations require 2.1 GB of disk space and are arranged like:
```
annotations/
├── LIVECell
|   └── livecell_coco_<train/val/test>.json
├── LIVECell_single_cells
|   └── <cell type>
|       └── <train/val/test>.json
└── LIVECell_dataset_size_split
    └── <split>_train<percentage>percent.json
```
- `annotations/LIVECell` contains the annotations used for the LIVECell-wide train and evaluate task.
- `annotations/LIVECell_single_cells` contains the annotations used for the single cell-type train and evaluate task, as well as the single cell-type transferability tasks.
- `annotations/LIVECell_dataset_size_split` contains the annotations used to investigate the impact of training set scale.
All annotations are in the Microsoft COCO Object Detection format and can, for instance, be parsed by the Python package `pycocotools`.
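For example, a minimal sketch of loading one of the annotation files with `pycocotools`, assuming the LIVECell-wide training split (`livecell_coco_train.json`) has been downloaded to the working directory:

```python
# Parse a LIVECell annotation file in COCO format with pycocotools.
from pycocotools.coco import COCO

coco = COCO("livecell_coco_train.json")
image_ids = coco.getImgIds()
print(len(image_ids), "images,", len(coco.getAnnIds()), "cell annotations")

# Per-cell annotations (segmentation polygons, bounding boxes) for one image,
# converted to binary numpy masks.
first = coco.loadImgs(image_ids[0])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=first["id"]))
masks = [coco.annToMask(ann) for ann in anns]
```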
models/
All models trained and evaluated for tasks associated with LIVECell are made available for wider use. The models are trained using detectron2, Facebook's framework for object detection and instance segmentation. The models require 15 GB of disk space and are arranged like:
```
models/
└── Anchor_<free/based>
    ├── ALL/
    |   └── <model>.pth
    └── <cell type>/
        └── <model>.pth
```
Where each `<model>.pth` is a binary file containing the model weights.
configs/
The config files for each model can be found in the LIVECell GitHub repo:

```
LIVECell
└── Anchor_<free/based>
    ├── livecell_config.yaml
    ├── a172_config.yaml
    ├── bt474_config.yaml
    ├── bv2_config.yaml
    ├── huh7_config.yaml
    ├── mcf7_config.yaml
    ├── shsy5y_config.yaml
    ├── skbr3_config.yaml
    └── skov3_config.yaml
```
Each config file can be used to reproduce the training we performed, or in combination with our model weights to run the models; for more info, see the usage section.
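As an illustration, a minimal inference sketch with detectron2, assuming a config file and matching weights have been downloaded (the weight and image file names here are placeholders, and anchor-free configs may additionally need the custom model code from the repo):

```python
# Run a LIVECell model on a single phase-contrast image with detectron2.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("livecell_config.yaml")  # config from the LIVECell repo
cfg.MODEL.WEIGHTS = "model.pth"              # downloaded weights (placeholder name)
cfg.MODEL.DEVICE = "cpu"                     # or "cuda" if a GPU is available

predictor = DefaultPredictor(cfg)
image = cv2.imread("A172_Phase_C7_1_02d16h00m_2.tif")
outputs = predictor(image)                   # instance masks, boxes and scores
print(len(outputs["instances"]), "cells detected")
```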
nuclear_count_benchmark/
The label-free images are stored in zip archives and the corresponding fluorescence-based object counts in json files, arranged like:
```
nuclear_count_benchmark/
├── A172.zip
├── A172_counts.json
├── A172_fluorescent_images.zip
├── A549.zip
├── A549_counts.json
└── A549_fluorescent_images.zip
```
The json files have the following format:

```
{
    "<file name>": "<count>"
}
```

Where `<file name>` points to one of the images in the zip archive, and `<count>` refers to the object count according to the fluorescent nuclear labels.
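For example, a minimal sketch of reading one of the count files (`A172_counts.json` is used here, per the archive layout above):

```python
# Load the fluorescence-based object counts for the A172 benchmark images.
import json

with open("A172_counts.json") as f:
    counts = json.load(f)  # maps image file name -> object count

for file_name, count in sorted(counts.items()):
    print(f"{file_name}: {count} nuclei")
```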
LICENSE
All images, annotations and models associated with LIVECell are published under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
All software source code associated with LIVECell is published under the MIT License.