Serological measurements from multiplexed ELISA assays

Overview


pysero

pysero enables serological measurements with multiplexed and standard ELISA assays.

The project automates estimation of antibody titers from data collected with ELISA assays performed with antigen arrays and single antigens.

The immediate goal is to enable specific, sensitive, and quantitative serological surveys for COVID-19.

Installation

On a typical Windows, Mac, or Linux computer:

  • Create a conda environment: conda create --name pysero python=3.7
  • Activate conda environment: conda activate pysero
  • Pip install dependencies: pip install -r requirements.txt
  • Add the package to PYTHONPATH. Inside the package directory (...\serology-COVID19), do: export PYTHONPATH=$PYTHONPATH:$(pwd)

For installation notes for Jetson Nano, see these notes.

Usage

usage: pysero.py [-h] (-e | -a) -i INPUT -o OUTPUT
                 [-wf {well_segmentation,well_crop,array_interp,array_fit}]
                 [-d] [-r] [-m METADATA]

optional arguments:
  -h, --help            show this help message and exit
  -e, --extract_od      Segment spots and compute ODs
  -a, --analyze_od      Generate OD analysis plots
  -i INPUT, --input INPUT
                        Input directory path
  -o OUTPUT, --output OUTPUT
                        Output directory path, where a timestamped subdir will
                        be generated. In case of rerun, give path to
                        timestamped run directory
  -wf {well_segmentation,well_crop,array_interp,array_fit}, --workflow {well_segmentation,well_crop,array_interp,array_fit}
                        Workflow to automatically identify and extract
                        intensities from experiment. 'Well' experiments are
                        for standard ELISA. 'Array' experiments are for ELISA
                        assays using antigen arrays printed with Scienion
                        Array Printer. Default: array_fit
  -d, --debug           Write debug plots of well and spots. Default: False
  -r, --rerun           Rerun wells listed in the 'rerun_wells' sheet of the
                        metadata file. Default: False
  -m METADATA, --metadata METADATA
                        Specify the file name for the experiment metadata.
                        Assumed to be in the same directory as images.
                        Default: 'pysero_output_data_metadata.xlsx'

Extract OD from antigen array images

python pysero.py -e -i <input> -o <output> -m <METADATA> will take metadata for the antigen array and images as input, and output optical densities for each antigen. The optical densities are stored in an Excel file at the following path: <output>/pysero_<input>_<year><month><day>_<hour><min>/median_ODs.xlsx
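
For downstream analysis, the resulting workbook can be loaded back with pandas. A minimal sketch, assuming the default output layout above (the run directory name is hypothetical, and the exact sheet structure may differ between pysero versions):

    import pandas as pd

    # Hypothetical run directory; substitute your own timestamped output.
    od_path = "output/pysero_myplate_20210101_1200/median_ODs.xlsx"

    # sheet_name=None loads every sheet into a dict of DataFrames.
    ods = pd.read_excel(od_path, sheet_name=None)
    for sheet, df in ods.items():
        print(sheet, df.shape)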

If rerunning some of the wells, the input metadata file needs to contain a sheet named 'rerun_wells' with a column named 'well_names' listing wells that will be rerun.
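
A minimal sketch of adding that sheet to an existing metadata workbook with pandas (assumes the openpyxl backend and pandas >= 1.3 for if_sheet_exists; the well list is hypothetical):

    import pandas as pd

    rerun = pd.DataFrame({"well_names": ["A1", "B3", "C11"]})

    # Append a 'rerun_wells' sheet to the existing metadata workbook.
    with pd.ExcelWriter("pysero_output_data_metadata.xlsx", mode="a",
                        engine="openpyxl",
                        if_sheet_exists="replace") as writer:
        rerun.to_excel(writer, sheet_name="rerun_wells", index=False)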

This workflow describes the steps in the extraction of optical density.

Generate OD analysis plots

python pysero.py -a -i <input> -o <output> -m <METADATA> will read pysero or Scienion spot-fitting outputs and generate analysis plots for each single antigen. Three types of plots are currently supported: ROC, categorical, and standard curves. An example xlsx config file can be found in the \example folder in the repo.

An '-l' flag can be added to load the saved report from a previous run to speed up loading.

Train a classifier using information from multiple antigens

One can train a machine learning classifier using ODs from multiple antigens to potentially improve the accuracy of sero-positive versus sero-negative classification. The following script demonstrates how to do this with xgboost tree classifiers: python -m interpretation.train_classifier
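
A minimal sketch of such a classifier, assuming a table with one OD column per antigen and a binary serostatus label (the file and column names here are hypothetical, not the actual inputs of interpretation.train_classifier):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier

    # Hypothetical input: one row per serum sample, one OD column per
    # antigen, plus a 0/1 'seropositive' label from a reference assay.
    df = pd.read_csv("multi_antigen_ods.csv")
    X = df.drop(columns=["seropositive"])
    y = df["seropositive"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    clf = XGBClassifier(n_estimators=200, max_depth=3)
    clf.fit(X_train, y_train)
    print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))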

Equipment list

The project aims to implement serological analysis for several antigen multiplexing approaches.

It currently supports:

  • classical ELISA.
  • antigen arrays printed with Scienion.

It can be extended to support:

  • antigen arrays printed with Echo.
  • antigen multiplexing with Luminex beads.

The antigen-arrays can be imaged with:

  • any transmission microscope with motorized XY stage.
  • turn-key plate imagers, e.g., SciReader CL2.
  • Squid, a variant of the Octopi platform from the Prakash Lab.

The project will also provide tools for intersecting data from different assays to estimate concentrations, determine levels of cross-reactivity, and perform related analyses.

Validation

The current code is validated for the analysis of antigen arrays imaged with the Scienion Reader and is being refined for antigen arrays imaged with a motorized XY microscope and Squid.

Contributions

We welcome bug reports, feature requests, and contributions to the code. Please see this page for the most fruitful ways to contribute.

Comments
  • robust spot detection is needed

    Segmentation loses quite a few antigen spots and is not robust to smearing of antigen (see #7 for images and segmentations). We need a robust method for detecting where antigens are in the array grid. This step is likely time-consuming, so making it fast in the next iteration is also a good idea.

    opened by mattersoflight 25
  • Metadata encapsulation

    Purpose:

    This PR introduces modules and functions to parse standardized metadata files. The standardized metadata file can be found at /Volumes/GoogleDrive/My Drive/ELISAarrayReader/data_for_tests_and_github/Metadata_and_Plate_configuration.xlsx.

    Features:

    This PR enables one to select the metadata format using the -m flag at the CLI:

    • ".xml" files (old style)
    • ".xlsx" (excel)

    It introduces two new modules:

    extract/metadata.py

    • contains a single class that will parse the correct metadata and assign constants

    extract/constants.py

    • holds hardware, array, and other constants in a single location.

    Details:

    All workflows follow a similar procedure, which is now encapsulated in the MetaData class (see the sketch after this list):

    1. Parse a metadata file containing hardware, array, and experiment data.
    2. Place the relevant info from step 1 into dictionaries or lists.
    3. Create arrays whose (row, column) positions contain the information from step 2.
    4. Compute other constants (such as indexed fiducial locations) from step 3.
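
    A minimal sketch of what this encapsulation could look like (the class internals and sheet names here are illustrative, not the exact API of extract/metadata.py):

        import pandas as pd

        class MetaData:
            """Illustrative skeleton: parse metadata, derive array constants."""

            def __init__(self, path):
                # 1. Parse the workbook with hardware, array, experiment data.
                self.sheets = pd.read_excel(path, sheet_name=None)
                # 2. Keep the relevant info in dictionaries.
                self.params = dict(self.sheets["array"].values)  # key/value
                # 3. Build a (row, column) array describing the printed layout.
                self.layout = self.sheets["layout"].to_numpy()
                # 4. Derive constants such as indexed fiducial locations.
                self.fiducials = [(r, c)
                                  for r, row in enumerate(self.layout)
                                  for c, v in enumerate(row)
                                  if v == "fiducial"]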

    Minor details:

    • I've moved all notebooks to subfolders "notebooks". There is one subfolder for each of extract/transform/load
    • CI failed while reading Excel files using Pandas. It required the dependency "xlrd >= 1.0.0".
    • Writing temporary excel metadata files for tests, using Pandas, requires one of two backends: xlsxwriter, or openpyxl. Thus, we should consider keeping openpyxl as a dependency (for tests, at least).
    • "Well" segmentation for standard ELISAS uses a very simple workflow and mostly does not require any metadata from the above metadata sheet. It is mostly untouched.

    Questions:

    • A few experiments have a lot of .xlsx reports and extra files in the data folder. Can we move these to subfolders and maintain a cleaner "data" directory?
    • There are a few intermediate data structures in constants.py: layout, fiducials, spots, and replicates. These could easily be MetaData class variables, as they are not accessed anywhere else. It's an easy refactor, but let me know if you think they'll be needed anywhere else.

    Future:

    • there are very few tests at the moment. I spent some time getting the conftest fixtures created, which should be a good starting point for making more tests.
    • OD reporting is not formatted per-antigen right now (it is per-well). This will be fixed soon.
    opened by bryantChhun 16
  • Rerun wells

    Added feature in which you can rerun some of the wells with different settings.

    If rerunning, you need to use the -r flag and specify the full path to the run dir so that results can be written in the same dir. You also need to create a sheet in your metadata.xlsx file named 'rerun_wells', with a column named 'well_name' listing the wells to be rerun.

    The feature was pretty straightforward, but I had to restructure the report writing to more easily load existing reports and well xlsx files and add the rerun wells to them; hope that's ok.

    opened by jennyfolkesson 14
  • Write integration tests

    What is the bottleneck? Please describe.

    Our review process involves processing 2-3 datasets to determine if the pipeline is functional. Running these tests takes 15-20 minutes.

    Describe the solution you'd like and alternatives

    Having an automated integration test will speed up the review process. It would make sense to create a dataset of four random wells from three acquisitions and test that the ODs, intensities, and backgrounds output by the pipeline match a curated reference. If our results become more accurate, we will update the reference.
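
    A minimal pytest sketch of such a test, assuming curated reference ODs are checked into the repo (the paths and tolerance here are placeholders):

        import pandas as pd
        import numpy.testing as npt

        def test_ods_match_reference():
            # Hypothetical paths: a small fixture dataset processed by the
            # pipeline, and a curated reference from a trusted earlier run.
            result = pd.read_excel("tests/output/median_ODs.xlsx",
                                   sheet_name=None)
            reference = pd.read_excel("tests/reference/median_ODs.xlsx",
                                      sheet_name=None)

            assert result.keys() == reference.keys()
            for name in reference:
                npt.assert_allclose(result[name].to_numpy(dtype=float),
                                    reference[name].to_numpy(dtype=float),
                                    rtol=1e-3)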

    A good time to add this feature would be after refactoring (#34, #36, #29), but before multiprocessing (#33).

    It will also be useful to invoke the integration test from the CLI with a --test flag.

    opened by mattersoflight 10
  • Find images

    Added support to read either a directory containing images or a number of subdirectories containing one image each.

    I also added tests to the code, which I recommend doing moving forward for those who are so inclined. It really helps if the repo is intended to have many collaborators.

    opened by jennyfolkesson 9
  • Add a check on the quality of spot detection and switch to the alternative approach if failed

    The failed wells in the fit workflow seem to complement the failed wells in the interpolation workflow. This is probably because the fit approach tries to fit a full grid to the spots, and the fit can fail when only a few spots are present in the image. The interpolation approach, on the other hand, finds the x, y boundary of the grid and interpolates the grid positions. When the fit approach converges to the right answer, its spot positions seem more accurate than the interpolation approach's, because the fit uses all the available spots while interpolation only uses the corner spots.

    I think we can have a check on the quality of spot detection and automatically switch to the alternative approach when detection with one approach fails (see the sketch at the end of this comment). Some examples:

    Fit approach: C3_overlayCentroids D3_overlayCentroids D1_overlayCentroids C11_overlayCentroids

    Interpolation approach: C3_overlayCentroids D3_overlayCentroids D1_overlayCentroids C11_overlayCentroids

    Fit approach output: ELISAarrayReader/images_scienion/2020-04-08-14-48-49-COVID_concentrationtest_April8_images/Automated_Analysis_Spot-Fitting/4_10_13_21_19

    interpolation approach output (using spot_intensity_fix branch): ELISAarrayReader/images_scienion/2020-04-08-14-48-49-COVID_concentrationtest_April8_images/Automated_Analysis_Spot-Fitting/4_9_19_25_45
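
    A minimal sketch of the proposed fallback, with fit_grid and interpolate_grid standing in for the two existing workflows (both are assumed, for illustration, to return detected spots plus a quality score):

        def detect_spots_with_fallback(image, fit_grid, interpolate_grid,
                                       min_quality=0.8):
            """Run the fit approach first; fall back to interpolation.

            fit_grid and interpolate_grid are callables returning
            (spot_coords, quality_score).
            """
            spots, quality = fit_grid(image)
            if quality >= min_quality:
                return spots, quality
            # The grid fit converged poorly (e.g. too few spots present);
            # interpolation from the corner spots is more forgiving here.
            return interpolate_grid(image)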

    opened by smguo 9
  • CI tests

    Adding continuous integration support for unit testing on GitHub. This will be activated each time there's a push. We can add another workflow later that will activate an integration test whenever there's a PR to master.

    opened by jennyfolkesson 7
  • Tune get_spot_intensity

    I've noticed that the spot ROIs generated by get_spot_intensity can be a bit off here: https://github.com/czbiohub/serology-COVID19/blob/robustspotdetection/array_analyzer/workflows/registration_workflow.py#L158

    Sometimes it just crops a part of the detected spot, and in one case it detects an artifact. These are from 2020-04-04-14-18-32-COVID_April4_flusecondplate.

    B9 - wrong spot in top left B9_registration B9_composite_spots_img

    A6 - mismatching ROIs, only partial spot example in bottom left, one diagonal step up. A6_composite_spots_img

    Any thoughts @smguo @bryantChhun ?

    opened by jennyfolkesson 7
  • Robustspotdetection

    I believe this should be stable now, but @bryantChhun should have a final say before this is merged. I wanted to start the code review at least.

    One question - it looks like a lot of images are displayed now as you're running, and that makes the run pretty slow, plus you have all these plots popping up. What about changing this to write only?

    opened by jennyfolkesson 7
  • Streamline spot detection and OD estimation

    We have two major approaches for turning an image of an array into ODs using the array_analyzer module. v2 is more robust but still in development; v1 is functional for Scienion data. We need to keep processing data from Scienion at a rapid clip.

    v1: use segmentation: crop well image -> estimate background -> segment spots -> interpolate between centroids of spots to find missing points in the segmentation -> compute spot density and background -> export ODs.

    v2: use point detector (https://github.com/czbiohub/serology-COVID19/issues/14): fit spots with opencv -> create a reference grid of fiducials using metadata from array printer and imager -> register reference grid to detected spots -> crop image based on bounding box of the reference grid -> estimate background -> compute spot density and background -> export ODs.

    Until v2 is stable, it is optimal that @bryantChhun and @jennyfolkesson focus on implementing v2 and testing it against representative images from Scienion and Octopi before merging into master, while @smguo maintains v1 and processes Scienion data.

    Please add your comments on how the pipelines should be changed.

    opened by mattersoflight 7
  • Mock Regionprop throws error

    Describe the bug: See the screenshot below. The bug is caused by mock regionprop's automatic calculation of "mean_intensity", which slices a subregion of the "intensity_image". Specifically, mock regionprop receives a centroid coordinate that is negative, which causes the errant slicing.
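
    A minimal sketch of the kind of bounds clamping that would avoid the errant slicing (the actual mock regionprop code may differ):

        import numpy as np

        def crop_around_centroid(intensity_image, centroid, half_size):
            """Slice a square subregion, clamping bounds to the image."""
            rows, cols = intensity_image.shape
            r, c = (int(round(v)) for v in centroid)
            # A negative index would silently wrap around the array;
            # clamp the bounds to the image extent instead.
            r0, r1 = max(r - half_size, 0), min(r + half_size, rows)
            c0, c1 = max(c - half_size, 0), min(c + half_size, cols)
            return intensity_image[r0:r1, c0:c1]

        # Negative centroid coordinate, as in the reported bug:
        spot = crop_around_centroid(np.zeros((100, 100)), (-3.2, 50.7), 10)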

    To reproduce the behavior:

    • Run well "B3" from this data: /Volumes/GoogleDrive/My Drive/ELISAarrayReader/images_cuttlefish/2020-05-01-17-29-54-COVID_May1_JBassay_images/rotated
    • use master branch (5-11-2020)

    Expected behavior: error with the screenshot below.

    Screenshots: (image attached in the original issue)

    Operating environment (please complete the following information):

    • OS: MacOSX, Master, branch=fishy_registration, branch=enhance_spot_contrast
    opened by bryantChhun 5
  • Vacc vs non analysis

    This PR includes:

    • Updates to readme.
    • Reorganization of a couple of modules.
    • Updates to the code for analyzing Ab response to COVID antigens before and after vaccination.
    opened by mattersoflight 1
  • Bump certifi from 2019.11.28 to 2022.12.7

    Bumps certifi from 2019.11.28 to 2022.12.7.


    dependencies 
    opened by dependabot[bot] 1
  • decreased redundancy in fitting

    This branch fixes redundancies in the 4PL fitting. Instead of fitting per serum, antigen, secondary, plate ID, and PRNT50 value, I edited fit2df so that the PRNT50 values (which are the same per serum) are preserved when the new dataframe is created, and so that the for loop doesn't run longer than necessary.
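
    A minimal sketch of the idea, assuming a long-format table of ODs (the file and column names are hypothetical): fit once per (serum, antigen, secondary, plate) group and carry the serum's PRNT50 along instead of treating it as a grouping key.

        import numpy as np
        import pandas as pd
        from scipy.optimize import curve_fit

        def four_pl(x, a, b, c, d):
            # Standard 4-parameter logistic curve.
            return d + (a - d) / (1.0 + (x / c) ** b)

        def fit_4pl(x, y):
            popt, _ = curve_fit(four_pl, x, y,
                                p0=[y.max(), 1.0, np.median(x), y.min()],
                                maxfev=10000)
            return dict(zip("abcd", popt))

        df = pd.read_csv("ods_long.csv")  # hypothetical long-format ODs
        # PRNT50 is constant per serum: look it up, don't group by it.
        prnt50 = df.groupby("serum")["PRNT50"].first()

        fits = []
        for (serum, antigen, secondary, plate), grp in df.groupby(
                ["serum", "antigen", "secondary", "plate_id"]):
            params = fit_4pl(grp["dilution"].to_numpy(),
                             grp["OD"].to_numpy())
            fits.append({"serum": serum, "antigen": antigen,
                         "secondary": secondary, "plate_id": plate,
                         "PRNT50": prnt50[serum], **params})
        fit_df = pd.DataFrame(fits)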

    opened by lenafb 0
  • Bump opencv-python from 3.4.2.17 to 4.2.0.32

    Bumps opencv-python from 3.4.2.17 to 4.2.0.32.

    Release notes

    Sourced from opencv-python's releases.

    4.2.0.32

    OpenCV version 4.2.0.

    Changes:

    • macOS environment updated from xcode8.3 to xcode 9.4
    • macOS uses now Qt 5 instead of Qt 4
    • Nasm version updated to Docker containers
    • multibuild updated

    Fixes:

    • don't use deprecated brew tap-pin, instead refer to the full package name when installing #267
    • replace get_config_var() with get_config_vars() in setup.py #274
    • add workaround for DLL errors in Windows Server #264

    3.4.9.31

    OpenCV version 3.4.9.

    Changes:

    • macOS environment updated from xcode8.3 to xcode 9.4
    • macOS uses now Qt 5 instead of Qt 4
    • Nasm version updated to Docker containers
    • multibuild updated

    Fixes:

    • don't use deprecated brew tap-pin, instead refer to the full package name when installing #267
    • replace get_config_var() with get_config_vars() in setup.py #274
    • add workaround for DLL errors in Windows Server #264

    4.1.2.30

    OpenCV version 4.1.2.

    Changes:

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 2
  • extract_od is excessively slow with Nautilus data

    When analyzing images from the same plate (acquired on different machines), it takes multiSero ~7.5x longer to extract ODs from Nautilus images than from Scienion images. Most likely this is because multiSero looks for the well before applying the masks and particle-filter steps, hence the WARNING:root:Couldn't find well in {current well} errors.

    To reproduce this behavior, I recommend running the multiSero debug configs with the following commands (the main difference being the path to Scienion data vs. the path to Nautilus data):

    Scienion run: -e -i "/Volumes/GoogleDrive/Shared drives/compmicro/ELISAarrayReader/images_scienion/2021-08-20-12-09-52-head_to_head_plateII/E_to_H" -o "/Volumes/GoogleDrive/Shared drives/compmicro/ELISAarrayReader/images_scienion/2021-08-20-12-09-52-head_to_head_plateII/E_to_H" -d

    Nautilus run: -e -i "/Volumes/GoogleDrive/Shared drives/compmicro/ELISAarrayReader/images_nautilus/2021-08-20-VLP_RVP_NS1/E_to_H" -o "/Volumes/GoogleDrive/Shared drives/compmicro/ELISAarrayReader/images_nautilus/2021-08-20-VLP_RVP_NS1/E_to_H" -d

    Expected behavior: multiSero should be able to run extract_od in under 5 minutes.

    Operating environment (please complete the following information):

    • OS: Mac, Jetson Nano

    Additional context: The strategy I'll pursue is to suppress multiSero's search for the well boundaries, since Nautilus data often does not include well boundaries.

    opened by lenafb 0
Owner
Chan Zuckerberg Biohub