COCO Explorer: a Streamlit tool to explore COCO datasets

Overview

What is this?

Given a COCO annotations file and a COCO predictions file, this tool lets you explore your dataset, visualize detection results and calculate important evaluation metrics.

Running the explorer on example data

You can use the predictions I prepared to explore results on the COCO validation dataset. The predictions come from a Mask R-CNN model trained with mmdetection.

  1. Download the annotations, predictions and images, and extract them into the project directory:

https://drive.google.com/open?id=1wxIagenNdCt_qphEe8gZYK7H2_to9QXl

  2. Set up and run using Docker:
sudo docker run -p 8501:8501 -it -v "$PWD"/coco_data:/coco_data i008/cocoexp:latest  \
    --coco_train /coco_data/ground_truth_annotations.json \
    --coco_predictions /coco_data/predictions.json  \
    --images_path /coco_data/images/
  3. Or set up and run using conda:
conda env update
conda activate cocoexplorer
streamlit run coco_explorer.py -- --coco_train ./coco_data/ground_truth_annotations.json --coco_predictions ./coco_data/predictions.json  --images_path ./coco_data/val2017/
  4. Or set up and run using pip:
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
streamlit run coco_explorer.py -- --coco_train ./coco_data/ground_truth_annotations.json --coco_predictions ./coco_data/predictions.json  --images_path ./coco_data/val2017/
  5. Go to http://localhost:8501 in your browser.

Running on your own data

You can explore your own results in the same way: just provide annotations and predictions that follow the official COCO dataset format.
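
For reference, here is a minimal sketch of the two expected inputs (the file names are only examples): the ground-truth file follows the COCO annotation format, and the predictions file is a flat list of detection results.

import json

# Minimal COCO-style ground truth: images, categories and annotations.
ground_truth = {
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [10, 20, 100, 80],  # [x, y, width, height]
        "area": 8000, "iscrowd": 0,
    }],
}

# Predictions in the COCO results format: one dict per detection.
predictions = [
    {"image_id": 1, "category_id": 1, "bbox": [12, 22, 98, 78], "score": 0.87},
]

with open("ground_truth_annotations.json", "w") as f:
    json.dump(ground_truth, f)
with open("predictions.json", "w") as f:
    json.dump(predictions, f)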

Examples

(Example screenshots of the explorer UI.)

Comments
  • Variable IoU

    Here's more :grin:

    With this you can also select a different mode of comparison at startup (e.g. segm instead of bbox) and vary the overlap criterion for COCOeval, which is reflected in the statistics too (a plain pycocotools equivalent is sketched below).

    opened by bertsky 7
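
    For context, the startup options described above correspond roughly to the following plain pycocotools usage (a sketch using the example data paths from this README, not the explorer's own code):

    import numpy as np
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("coco_data/ground_truth_annotations.json")
    coco_dt = coco_gt.loadRes("coco_data/predictions.json")

    # "segm" compares masks, "bbox" compares boxes.
    cocoeval = COCOeval(coco_gt, coco_dt, iouType="segm")
    # Replace the default 0.50:0.95 sweep with a single, looser overlap criterion.
    cocoeval.params.iouThrs = np.array([0.3])

    cocoeval.evaluate()
    cocoeval.accumulate()
    # precision has shape [iouThrs, recThrs, classes, areaRngs, maxDets]
    prec = cocoeval.eval["precision"][0, :, :, 0, -1]
    print("mAP @ IoU 0.3:", prec[prec > -1].mean())
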
  • various improvements

    Fixes #6 and #9. If this all-at-once PR is inconvenient, I can also split it up functionally.

    I've also changed st.image(figure.savefig()) to st.pyplot(figure), because that seemed faster. It depends on the image resolution and the dpi parameter for savefig, though. Still, it seems more logical that way. The single parameter controlling the size/speed trade-off would then be figsize. Perhaps we should make that configurable (see the sketch below)?

    I have also added an input field to look up images by file_name (which is helpful for navigation in large datasets), and an option to make the category view show only that single category.

    opened by bertsky 4
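
    A hedged sketch of what a configurable figsize could look like (the sidebar control and its wiring are hypothetical, not part of this PR):

    import matplotlib.pyplot as plt
    import streamlit as st

    # Hypothetical: expose the size/speed trade-off as a sidebar control.
    size = st.sidebar.slider("Figure size (inches)", min_value=5, max_value=25, value=15)

    fig, ax = plt.subplots(figsize=(size, size))
    # ... the image, boxes and masks would be drawn onto ax here ...
    st.pyplot(fig)  # render the figure directly instead of saving a PNG first
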
  • mask visualization does not always work

    I have COCO datasets for PNGs in RGB colorspace, with segmentation converted from polygons to RLE format prior to starting COCO Explorer.

    But on some images, this happens when activating Draw prediction masks:

    ValueError: operands could not be broadcast together with shapes (56179,) (3,)
    Traceback:
    File "/opt/conda/lib/python3.7/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
        exec(code, module.__dict__)
    File "/cocodemo/coco_explorer.py", line 153, in <module>
        app(args)
    File "/cocodemo/coco_explorer.py", line 65, in app
        figsize=(15, 15))
    File "/cocodemo/cocoinspector.py", line 172, in visualize_image
        draw_pred_mask=draw_pred_mask)
    File "/cocodemo/vis.py", line 60, in vis_image
        img[mask] = img[mask] * 0.5 + color_mask * 0.5
    

    What am I doing wrong?

    (Also, I can see nothing when activating Draw ground truth masks...)

    opened by bertsky 3
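
    The exact cause is hard to tell from the traceback alone, but the shape mismatch suggests the mask or the image ends up with an unexpected number of dimensions before blending. A shape-normalizing blend might look like this (a sketch, not the tool's actual vis.py):

    import numpy as np
    from pycocotools import mask as maskUtils

    def blend_mask(img, rle, color=(255, 0, 0), alpha=0.5):
        """Overlay one RLE-encoded mask on a grayscale or RGB image."""
        m = np.squeeze(maskUtils.decode(rle)).astype(bool)  # force a 2-D boolean mask
        if img.ndim == 2:                                    # promote grayscale to RGB
            img = np.stack([img] * 3, axis=-1)
        out = img.astype(float)
        out[m] = (1 - alpha) * out[m] + alpha * np.asarray(color, dtype=float)
        return out.astype(np.uint8)
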
  • Images without detections show as error

    On some of my images (presumably those without detections or without matches), I get the following error:

    ValueError: need at least one array to concatenate
    Traceback:
    File "/opt/conda/lib/python3.7/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
        exec(code, module.__dict__)
    File "/cocodemo/coco_explorer.py", line 153, in <module>
        app(args)
    File "/cocodemo/coco_explorer.py", line 65, in app
        figsize=(15, 15))
    File "/cocodemo/cocoinspector.py", line 159, in visualize_image
        gtmatches, dtmatches = self.get_detection_matches(image_id)
    File "/cocodemo/cocoinspector.py", line 127, in get_detection_matches
        [c for c in self.cocoeval.evalImgs if c and c['image_id'] == image_id]]).astype(int)
    
    opened by bertsky 3
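
    A guard along these lines would avoid concatenating an empty list for images that produced no evaluated detections (a sketch against pycocotools' evalImgs structure, not the tool's actual get_detection_matches):

    import numpy as np

    def detection_matches(cocoeval, image_id):
        """Collect GT/DT match arrays for one image, tolerating empty images."""
        per_image = [e for e in cocoeval.evalImgs if e and e["image_id"] == image_id]
        if not per_image:
            return np.empty((0,), dtype=int), np.empty((0,), dtype=int)
        gt = np.concatenate([e["gtMatches"] for e in per_image], axis=1).astype(int)
        dt = np.concatenate([e["dtMatches"] for e in per_image], axis=1).astype(int)
        return gt, dt
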
  • evaluation statistics does not work

    In my COCO dataset, there are 3 classes plus background. When I click CoCo Scores, I get the following error:

    ValueError: Length mismatch: Expected axis has 5 elements, new values have 6 elements
    Traceback:
    File "/opt/conda/lib/python3.7/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
        exec(code, module.__dict__)
    File "/cocodemo/coco_explorer.py", line 153, in <module>
        app(args)
    File "/cocodemo/coco_explorer.py", line 135, in app
        df = inspector.ap_per_class()
    File "/cocodemo/cocoinspector.py", line 97, in ap_per_class
        df.columns = c
    File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 5080, in __setattr__
        return object.__setattr__(self, name, value)
    File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
    File "/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py", line 638, in _set_axis
        self._data.set_axis(axis, labels)
    File "/opt/conda/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 155, in set_axis
        'values have {new} elements'.format(old=old_len, new=new_len))
    
    opened by bertsky 3
  • regression in Docker image

    On i008/coco_explorer version 22a923893c29 I now get:

    ImportError: libGL.so.1: cannot open shared object file: No such file or directory
    Traceback:
    File "/opt/conda/lib/python3.8/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
        exec(code, module.__dict__)
    File "/cocodemo/coco_explorer.py", line 8, in <module>
        from cocoinspector import CoCoInspector
    File "/cocodemo/cocoinspector.py", line 9, in <module>
        from vis import vis_image
    File "/cocodemo/vis.py", line 3, in <module>
        from easyimages.utils import change_box_order
    File "/opt/conda/lib/python3.8/site-packages/easyimages/__init__.py", line 10, in <module>
        from .easyimages import EasyImage, EasyImageList, bbox
    File "/opt/conda/lib/python3.8/site-packages/easyimages/easyimages.py", line 23, in <module>
        from imutils.convenience import build_montages
    File "/opt/conda/lib/python3.8/site-packages/imutils/__init__.py", line 8, in <module>
        from .convenience import translate
    File "/opt/conda/lib/python3.8/site-packages/imutils/convenience.py", line 6, in <module>
        import cv2
    File "/opt/conda/lib/python3.8/site-packages/cv2/__init__.py", line 5, in <module>
        from .cv2 import *
    

    Maybe some additional system libraries need to be installed, or the opencv-python dependency could be reduced to opencv-python-headless?

    opened by bertsky 2
  • Linspace for numpy 1.18 fix

    opened by 8greg8 1
  • CI: make forks work and enable for PRs

    @i008 Wow, I did not notice you already set up a CI!

    Alas, it still fails in my fork due to permissions:

    Run docker/login-action@v1
    Error: Username and password required
    

    Also, I noticed there's no such check for the pull request branches yet.

    opened by bertsky 1
  • interpretation of scores/statistics

    I am trying to make sense of the scores output provided here (it puzzles me compared to what I am used to from pycocotools itself, even more so after I made the iouThrs param freely changeable).

    Apparently you have made some modifications to cocoeval...

    --- ../pycocotools/PythonAPI/pycocotools/cocoeval.py    2020-04-02 12:41:09.770697735 +0200
    +++ pycoco.py   2021-02-16 13:16:39.248434628 +0100
    @@ -1,12 +1,11 @@
    -__author__ = 'tsungyi'
    -
     import numpy as np
     import datetime
     import time
     from collections import defaultdict
    -from . import mask as maskUtils
    +from pycocotools import mask as maskUtils
     import copy
     
    +
     class COCOeval:
         # Interface for evaluating detection on the Microsoft COCO dataset.
         #
    @@ -68,7 +67,6 @@
                 print('iouType not specified. use default iouType segm')
             self.cocoGt   = cocoGt              # ground truth COCO API
             self.cocoDt   = cocoDt              # detections COCO API
    -        self.params   = {}                  # evaluation parameters
             self.evalImgs = defaultdict(list)   # per-image per-category evaluation results [KxAxI] elements
             self.eval     = {}                  # accumulated evaluation results
             self._gts = defaultdict(list)       # gt for evaluation
    @@ -203,21 +202,26 @@
             if len(gts) == 0 or len(dts) == 0:
                 return []
             ious = np.zeros((len(dts), len(gts)))
    -        sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62,.62, 1.07, 1.07, .87, .87, .89, .89])/10.0
    +        sigmas = p.kpt_oks_sigmas
             vars = (sigmas * 2)**2
             k = len(sigmas)
             # compute oks between each detection and ground truth object
             for j, gt in enumerate(gts):
                 # create bounds for ignore regions(double the gt bbox)
                 g = np.array(gt['keypoints'])
    -            xg = g[0::3]; yg = g[1::3]; vg = g[2::3]
    +            xg = g[0::3];
    +            yg = g[1::3];
    +            vg = g[2::3]
                 k1 = np.count_nonzero(vg > 0)
                 bb = gt['bbox']
    -            x0 = bb[0] - bb[2]; x1 = bb[0] + bb[2] * 2
    -            y0 = bb[1] - bb[3]; y1 = bb[1] + bb[3] * 2
    +            x0 = bb[0] - bb[2];
    +            x1 = bb[0] + bb[2] * 2
    +            y0 = bb[1] - bb[3];
    +            y1 = bb[1] + bb[3] * 2
                 for i, dt in enumerate(dts):
                     d = np.array(dt['keypoints'])
    -                xd = d[0::3]; yd = d[1::3]
    +                xd = d[0::3];
    +                yd = d[1::3]
                     if k1>0:
                         # measure the per-keypoint distance if keypoints visible
                         dx = xd - xg
    @@ -334,6 +338,7 @@
             M           = len(p.maxDets)
             precision   = -np.ones((T,R,K,A,M)) # -1 for the precision of absent categories
             recall      = -np.ones((T,K,A,M))
    +        scores = -np.ones((T, R, K, A, M))
     
             # create dictionary for future indexing
             _pe = self._paramsEval
    @@ -364,6 +369,7 @@
                         # different sorting method generates slightly different results.
                         # mergesort is used to be consistent as Matlab implementation.
                         inds = np.argsort(-dtScores, kind='mergesort')
    +                    dtScoresSorted = dtScores[inds]
     
                         dtm  = np.concatenate([e['dtMatches'][:,0:maxDet] for e in E], axis=1)[:,inds]
                         dtIg = np.concatenate([e['dtIgnore'][:,0:maxDet]  for e in E], axis=1)[:,inds]
    @@ -383,6 +389,7 @@
                             rc = tp / npig
                             pr = tp / (fp+tp+np.spacing(1))
                             q  = np.zeros((R,))
    +                        ss = np.zeros((R,))
     
                             if nd:
                                 recall[t,k,a,m] = rc[-1]
    @@ -391,7 +398,8 @@
     
                             # numpy is slow without cython optimization for accessing elements
                             # use python array gets significant speed improvement
    -                        pr = pr.tolist(); q = q.tolist()
    +                        pr = pr.tolist();
    +                        q = q.tolist()
     
                             for i in range(nd-1, 0, -1):
                                 if pr[i] > pr[i-1]:
    @@ -401,15 +409,18 @@
                             try:
                                 for ri, pi in enumerate(inds):
                                     q[ri] = pr[pi]
    +                                ss[ri] = dtScoresSorted[pi]
                             except:
                                 pass
                             precision[t,:,k,a,m] = np.array(q)
    +                        scores[t, :, k, a, m] = np.array(ss)
             self.eval = {
                 'params': p,
                 'counts': [T, R, K, A, M],
                 'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                 'precision': precision,
                 'recall':   recall,
    +            'scores': scores,
             }
             toc = time.time()
             print('DONE (t={:0.2f}s).'.format( toc-tic))
    @@ -419,7 +430,12 @@
             Compute and display summary metrics for evaluation results.
             Note this functin can *only* be applied on the default parameter setting
             '''
    +
    +        self.per_class_precisions = []
    +        self.ap_per_class_columns = []
    +
             def _summarize( ap=1, iouThr=None, areaRng='all', maxDets=100 ):
    +            print(ap, iouThr, areaRng, maxDets)
                 p = self.params
                 iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
                 titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
    @@ -448,8 +464,22 @@
                     mean_s = -1
                 else:
                     mean_s = np.mean(s[s>-1])
    +
    +                # cacluate AP(average precision) for each category
    +                num_classes = 80
    +                avg_ap = 0.0
    +                if ap == 1:
    +                    pcp = {}
    +                    for i, c in enumerate(sorted(list(self.cocoDt.cats.values()), key=lambda x: x['id'])):
    +                        pcp[c['name']] = np.mean(s[:, :, i, :])
    +
    +                    self.per_class_precisions.append(pcp)
    +                    self.ap_per_class_columns.append(f"ap={ap} iouThr={iouThr or '0.5:0.95'} area={areaRng} maxDets={maxDets}")
    +
    +
                 print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
                 return mean_s
    +
             def _summarizeDets():
                 stats = np.zeros((12,))
                 stats[0] = _summarize(1)
    @@ -494,12 +527,13 @@
         '''
         Params for coco evaluation api
         '''
    +
         def setDetParams(self):
             self.imgIds = []
             self.catIds = []
             # np.arange causes trouble.  the data point on arange is slightly larger than the true value
    -        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
    -        self.recThrs = np.linspace(.0, 1.00, np.round((1.00 - .0) / .01) + 1, endpoint=True)
    +        self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05) + 1), endpoint=True)
    +        self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01) + 1), endpoint=True)
             self.maxDets = [1, 10, 100]
             self.areaRng = [[0 ** 2, 1e5 ** 2], [0 ** 2, 32 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
             self.areaRngLbl = ['all', 'small', 'medium', 'large']
    @@ -509,12 +543,14 @@
             self.imgIds = []
             self.catIds = []
             # np.arange causes trouble.  the data point on arange is slightly larger than the true value
    -        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
    -        self.recThrs = np.linspace(.0, 1.00, np.round((1.00 - .0) / .01) + 1, endpoint=True)
    +        self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05) + 1), endpoint=True)
    +        self.recThrs = np.linspace(.0, 1.00, int(np.round((1.00 - .0) / .01) + 1), endpoint=True)
             self.maxDets = [20]
             self.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
             self.areaRngLbl = ['all', 'medium', 'large']
             self.useCats = 1
    +        self.kpt_oks_sigmas = np.array(
    +            [.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0
     
         def __init__(self, iouType='segm'):
             if iouType == 'segm' or iouType == 'bbox':
    

    First, some observations:

    • in the per_class_precisions aggregator, the mean includes empty cells represented as -1 (which distorts the average numerically; needs the equivalent of s[s > -1])
    • in its ap_per_class_columns title, the iouThr prints a fixed interval instead of the actual range substituted for None via parameters (iouStr would be correct here IIUC)
    • by making that calculation indented (i.e. dependent on len(s[s > -1]) > 0), only columns with matches will be shown (which can be confusing)
    • the Average mAP by class then is merely a contraction (df.mean(axis=1)) of that table, i.e. a macro-average, but IINM the only correct way to average over all axes/ranges is to aggregate by your respective criteria directly, i.e. micro-average; in this case, you probably just want to pick the first column (which contains an average over all IoU, all recall, all area, all number of detections)
    • it would probably be more interesting to get the precision at the maximum recall and then show both precision and recall side by side (or pick some other non-avg operating point, like the largest sum / product / F1 / MCC)
    • you can make these by-category averages without modifying pycocotools at all, simply by inspecting its eval table before calling summarize(); here's an example for the max-recall operating point per category:
        recalls = self.cocoeval.eval['recall'][0,:,0,-1] # at min-IoU, all-area, max-detections
        recallInds = np.searchsorted(self.cocoeval.params.recThrs, recalls) - 1
        classInds = np.arange(len(recalls))
        precisions = self.cocoeval.eval['precision'][0,recallInds,classInds,0,-1]
        catIds = self.coco_gt.getCatIds()
        for id_, cat in self.coco_gt.cats.items():
            name = cat['name']
            i = catIds.index(id_)
            print(name + ' prc: ' + str(precisions[i]))
            print(name + ' rec: ' + str(recalls[i]))
    
    • I don't understand the dtScoresSorted and scores modification TBH, but it does not seem to be used anywhere. Could it be this is just an earlier attempt at what you do with per_image_scores?
    • If so, perhaps the custom pycoco.py could be removed entirely, depending solely on pycocotools.cocoeval?
    • In the per_image_scores calculation, there are several non-numeric fields (like the lists of scores which are true positives, false positives, false negatives, or the list of IoUs of GT regions, or the list of categories). But they are all gone in the displayed table – IIUC because the sum() removes them. Isn't there a way to keep both the sums and the lists/columns in the table? I don't know much about pandas TBH. (But as it is, the table shows me the numerical sum of all categories.)
    opened by bertsky 0
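
    Building on the first two observations above, per-class AP can also be read straight off the accumulated precision table with the -1 cells masked out, without patching pycocotools at all (a sketch; coco_gt and cocoeval here are the COCO and COCOeval objects the explorer already builds):

    import numpy as np

    # precision has shape [iouThrs, recThrs, classes, areaRngs, maxDets]
    prec = cocoeval.eval["precision"][:, :, :, 0, -1]  # all areas, largest maxDets
    for k, cat_id in enumerate(coco_gt.getCatIds()):
        name = coco_gt.loadCats(cat_id)[0]["name"]
        s = prec[:, :, k]
        ap = s[s > -1].mean() if (s > -1).any() else float("nan")
        print(f"{name}: AP = {ap:.3f}")
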
  • enable caching

    I wonder if browsing could be sped up by enabling @st.cache for inspector.visualize_image. I have tried various options (allow_output_mutation=True, persist=True, even various hash_funcs= to avoid hashing certain types), but whatever I do, the server becomes extremely slow as soon as caching is activated (one further variant is sketched below).

    opened by bertsky 1
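
    One variant worth spelling out (a hypothetical sketch; the wrapper and the parameter names of visualize_image are guessed from the tracebacks above): cache a thin wrapper keyed only on cheap arguments, and make Streamlit ignore the inspector object entirely when building the cache key:

    import streamlit as st
    from cocoinspector import CoCoInspector

    # Hypothetical wrapper around inspector.visualize_image.
    @st.cache(allow_output_mutation=True,
              hash_funcs={CoCoInspector: lambda _: None})  # exclude the inspector from the key
    def cached_visualize(inspector, image_id, draw_pred_mask):
        return inspector.visualize_image(image_id,
                                         draw_pred_mask=draw_pred_mask,
                                         figsize=(15, 15))
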
  • missing dependencies

    I believe torchvision is missing from the requirements. Also, it would be great if there were a standard setuptools-compatible setup.py or setup.cfg here.

    Also, in the Dockerfile there's a mistake on the last line:

    RUN git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI & pip install easyimages
    

    This will give:

    /bin/sh: 1: git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI: not found
    

    Perhaps you meant:

    RUN pip install easyimages pycocotools
    
    opened by bertsky 0
Owner

Jakub Cieslik (yet another data scientist)