A declarative (epi)genomics visualization library for Python

Overview

gos 🦆

gos is a declarative (epi)genomics visualization library for Python. It is built on top of the Gosling JSON specification, providing a simplified interface for authoring interactive genomic visualizations.

Installation

The gos API is under active development. Feedback is appreciated and welcomed.

pip install gosling

Documentation

See the Documentation Site for more information.

Example

import gosling as gos

data = gos.multivec(
    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec",
    row="sample",
    column="position",
    value="peak",
    categories=["sample 1", "sample 2", "sample 3", "sample 4"],
    binSize=5,
)

base_track = gos.Track(data, width=800, height=100)

heatmap = base_track.mark_rect().encode(
    x=gos.Channel("start:G", axis="top"),
    xe="end:G",
    row=gos.Channel("sample:N", legend=True),
    color=gos.Channel("peak:Q", legend=True),
)

bars = base_track.mark_bar().encode(
    x=gos.Channel("position:G", axis="top"),
    y="peak:Q",
    row="sample:N",
    color=gos.Channel("sample:N", legend=True),
)

lines = base_track.mark_line().encode(
    x=gos.Channel("position:G", axis="top"),
    y="peak:Q",
    row="sample:N",
    color=gos.Channel("sample:N", legend=True),
)

gos.vertical(heatmap, bars, lines).properties(
    title="Visual Encoding",
    subtitle="Gosling provides diverse visual encoding methods",
    layout="linear",
    centerRadius=0.8,
    xDomain=gos.Domain(chromosome="1", interval=[1, 3000500]),
)
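
The channel strings in the example above, such as "start:G" or "peak:Q", pair a field name with a one-letter type code. The sketch below shows one way such a shorthand could be expanded into the field/type form that appears in a compiled Gosling spec. It is an illustration only, not gos's actual parser; the helper name `expand_shorthand` and the `TYPE_CODES` table are assumptions made for this example.

```python
# Illustrative only: this is NOT gos's implementation.
# Mapping of one-letter codes to Gosling field types (assumed for this sketch).
TYPE_CODES = {
    "G": "genomic",
    "Q": "quantitative",
    "N": "nominal",
}


def expand_shorthand(shorthand: str) -> dict:
    """Split a "field:T" string into {"field": ..., "type": ...}."""
    # rpartition splits on the last colon, so field names may contain colons.
    field, sep, code = shorthand.rpartition(":")
    if not sep or not field or code not in TYPE_CODES:
        raise ValueError(f"expected '<field>:<G|Q|N>', got {shorthand!r}")
    return {"field": field, "type": TYPE_CODES[code]}


if __name__ == "__main__":
    for channel in ("start:G", "peak:Q", "sample:N"):
        print(channel, "->", expand_shorthand(channel))
```

Keeping the type code in the string keeps encodings compact, while the expanded dictionary is what a schema validator would check.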

Example Gallery

We have started a gallery of community examples in gosling/examples/. If you are interested in contributing, please feel free to submit a PR! Check out the existing JSON examples if you are looking for inspiration.

Development

pip install -e '.[dev]'

The schema bindings (gosling/schema/) and docs (doc/user_guide/API.rst) are automatically generated using the following. Please do not edit these files directly.

# generate gosling/schema/*
python tools/generate_schema_wrapper.py

Release

git checkout main && git pull

Update version in setup.py and doc/conf.py:

git add setup.py doc/conf.py
git commit -m "v0.[minor].[patch]"
git tag -a v0.[minor].[patch] -m "v0.[minor].[patch]"
git push --follow-tags

Design & Implementation

gos is inspired by and borrows heavily from Altair both in project philosophy and implementation. The internal Python API is auto-generated from the Gosling specification using code adapted directly from Altair to generate Vega-Lite bindings. This design choice guarantees that visualizations are type-checked in complete concordance with the Gosling specification, and that the Python API remains consistent with the evolving schema over time. Special thanks to Jake Vanderplas and others on schemapi.
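
To make the idea of schema-driven type checking concrete, here is a minimal sketch that validates a value against a JSON-Schema-style enum. It is not code from gos or from Altair's schemapi; the `validate_enum` helper, the local `SchemaValidationError` class, and the small `MEASURE_SCHEMA` fragment are hypothetical and only mirror the kind of check a generated binding performs.

```python
# Hypothetical schema fragment; the real Gosling schema is far larger.
MEASURE_SCHEMA = {"enum": ["width", "height"]}


class SchemaValidationError(ValueError):
    """Raised when a value does not conform to the schema fragment."""


def validate_enum(value, schema: dict, path: str = "value"):
    """Return value unchanged if it satisfies the schema's enum, else raise."""
    allowed = schema.get("enum")
    if allowed is not None and value not in allowed:
        raise SchemaValidationError(f"{path}: {value!r} is not one of {allowed}")
    return value


if __name__ == "__main__":
    print(validate_enum("width", MEASURE_SCHEMA, path="measure"))
    try:
        validate_enum("zoomLevel", MEASURE_SCHEMA, path="measure")
    except SchemaValidationError as err:
        print("rejected:", err)
```

Because the allowed values come from the schema rather than from hand-written Python, regenerating the bindings from a newer schema is what would change the set of accepted values.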

Comments
  • How to visualize my own data?

    Dear gos, I would like to visualize my own data (BED, bedgraph, or bigwig). My data format is very simple, but I cannot find in your documentation how to visualize a user's data. Thank you!

    Data format of bedgraph:

    chr1 10468 10469 1.000
    chr1 10469 10470 0.667
    chr1 10470 10471 1.000

    Thank you! Yang

    opened by liuyangzzu 7
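
One possible starting point for the bedgraph question above, sketched under assumptions: a bedgraph is a whitespace-delimited text file, so it can be parsed with the standard library and rewritten as a tab-separated file before being handed to a loader. The column names `chrom`, `start`, `end`, and `value` are chosen for this example. The `gos.csv` options in the trailing comment are the ones that appear in another issue in this list; whether they suit a particular file is untested here.

```python
import csv
import io

# The three sample rows from the issue.
BEDGRAPH_TEXT = """\
chr1 10468 10469 1.000
chr1 10469 10470 0.667
chr1 10470 10471 1.000
"""

COLUMNS = ["chrom", "start", "end", "value"]  # names assumed for this sketch


def parse_bedgraph(text: str) -> list:
    """Parse bedgraph text into a list of typed row dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and "track"/"browser" header lines.
        if not parts or line.startswith("#") or parts[0] in ("track", "browser"):
            continue
        chrom, start, end, value = parts[:4]
        rows.append(
            {"chrom": chrom, "start": int(start), "end": int(end), "value": float(value)}
        )
    return rows


def to_tsv(rows) -> str:
    """Serialize rows as headerless, tab-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv(parse_bedgraph(BEDGRAPH_TEXT)))

# With gosling installed, the TSV written to disk might then be loaded like
# this (untested assumption, modeled on the gos.csv call in a later issue):
# data = gos.csv("my_data.tsv", separator="\t", headerNames=COLUMNS,
#                chromosomeField="chrom", genomicFields=["start", "end"])
```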
  • fix: errors in experimental `bam` data

    The base branch of this PR is the experimental-data-server branch. This PR is not ready to merge.

    I wanted to fix the issue related to bam with the updated Gosling.js schema that requires indexUrl as a sibling property of url for the bam file. I basically changed the following code blocks:

    def _create_loader(type_: str, create_ts: Optional[CreateTileset] = None):
    -    def load(url: Union[pathlib.Path, str], **kwargs):
    +    def load(url: Union[pathlib.Path, str], indexUrl: str = None, **kwargs):
            """Adds resource to data_server if local file is detected."""
            fp = pathlib.Path(url)
            if fp.is_file():
                data = create_ts(fp) if create_ts else fp
                url = data_server(data)
    
    +        # bam's index file URL
    +        if indexUrl != None and pathlib.Path(indexUrl).is_file():
    +            indexUrl = data_server(indexUrl)
    
    -        return dict(type=type_, url=url, **kwargs)
    +        return dict(type=type_, url=url, indexUrl=indexUrl, **kwargs)
    
        return load
    
    - def bam(url: str, **kwargs):
    -    return dict(type="bam", url=url, **kwargs)
    + def bam(url: str, indexUrl: str, **kwargs):
    +    return dict(type="bam", url=url, indexUrl=indexUrl, **kwargs)
    

    While the spec seems to be generated properly:

    !ls ../../gos-data
    
    import gosling as gos
    from gosling.experimental.data import bam
    
    bd = bam(
        "../../gos-data/example_higlass.bam",
        "../../gos-data/example_higlass.bai",
    )
    
    gos.Track(bd).transform_coverage(
        startField="from", endField="to"
    ).mark_point(outlineWidth=0).encode(
        x="from:G",
        xe="to:G",
        y="coverage:Q",
        color=gos.value("grey")
    ).view(xDomain={"chromosome": "1", "interval": [136750, 139450]})
    
    {
      "tracks": [
        {
          "color": {
            "value": "grey"
          },
          "data": {
            "indexUrl": "http://localhost:21227/611af51a-c41b-4255-a76f-1b8ab402f698",
            "type": "bam",
            "url": "http://localhost:21227/c5031c79-1e2c-4269-8cce-9e668e69c3bf"
          },
          "dataTransform": [
            {
              "endField": "to",
              "startField": "from",
              "type": "coverage"
            }
          ],
          "height": 180,
          "mark": "point",
          "style": {
            "outlineWidth": 0
          },
          "width": 800,
          "x": {
            "field": "from",
            "type": "genomic"
          },
          "xe": {
            "field": "to",
            "type": "genomic"
          },
          "y": {
            "field": "coverage",
            "type": "quantitative"
          }
        }
      ],
      "xDomain": {
        "chromosome": "1",
        "interval": [
          136750,
          139450
        ]
      }
    }
    

    I get the following CORS errors.

    From the Jupyter notebook: [screenshot, 2021-08-27]

    From the browser: [screenshot, 2021-08-27]

    Do you have any thoughts on this issue?

    I used the following two files:

    • https://aveit.s3.amazonaws.com/higlass/bam/example_higlass.bam.bai
    • https://aveit.s3.amazonaws.com/higlass/bam/example_higlass.bam
    opened by sehilyi 6
  • Rename `Chart` to `View`?

    Since both Track and View are important concepts in Gosling, I think we can support explicit functions for both in the Python package as well. In this regard, I was wondering whether it makes sense to rename Chart to View, since Chart() appears to construct Gosling's views.

    # from
    gos.Chart(
        title='Overview and Detail Views',
        arrangement='horizontal',
        views=[overview, detail_view]
    )
    
    # to
    gos.View(
        title='Overview and Detail Views',
        arrangement='horizontal',
        views=[overview, detail_view]  # two child views nested in this parent view
    )
    

    So, at the root level, users define a View, and it can contain either multiple Views or Tracks.

    opened by sehilyi 6
  • How to provide constant value for y?

    Is there a way to provide a constant value for the y-axis in a bar chart (and in others as well)? For example, in the gallery's bar chart example, is it possible to set y to the constant value 0.0005 instead of reading it from the data object?

    opened by ManavalanG 4
  • potential CORS issue with local server?

    Potentially related to #80.

    I am unable to reproduce (as the error has now gone away), but I ran into a CORS issue with Chrome 95 the first time I tried to load a local CSV. I switched to Firefox and the visualization worked fine. When I navigated back to Chrome, the CORS issue went away and hasn't come back.

    I will add more information here as I learn more, but I wanted to file something to note the issue is known.

    opened by manzt 4
  • feat: Generate API for Gosling.js v0.9.14

    Updates gos for changes in https://github.com/gosling-lang/gosling.js/pull/533

    Once the changes in gosling.js are merged and there is a new release, we will replace the schema URL in tool/generate_schema_wrappers.py. Currently, that URL tracks the PR in gosling.js.

    opened by manzt 4
  • chore: remove Jupyter extension + widget

    Now that gosling works without jupyter extension (no install), I would like to remove the Jupyter extension-based renderer for the moment. Currently this just adds a lot of code complexity and limited "features" that we aren't using to the repo.

    We can always add these back in the future (either in this repo or another) if there is a motivating use case. Removed features:

    • Offline rendering (HTMLRenderer loads assets from https://unpkg.com)
    • Embedded PNG in Jupyter notebooks. The default HTMLRenderer cannot save a static image to the notebook. I think the benefit of a no-install renderer outweighs the feature of embedding a static image in notebooks.
    • GoslingWidget - the widget has no immediate utility. If we want to explore a deeper integration with gosling.js API we can in the future when the use case is better understood.
    opened by manzt 4
  • feat: add `gosling.datasets`

    It might be nice to add some convenience exports for reusable example datasets for gosling. This could remove some of the boilerplate in the examples for:

    import gosling as gos
    - from gosling.data import multivec
    + from gosling.datasets import cistrome_multivec
    
    
    - data = multivec(
    -    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec",
    -    row="sample",
    -    column="position",
    -    value="peak",
    -    categories=["sample 1", "sample 2", "sample 3", "sample 4"],
    -    binSize=5,
    - )
    - base_track = gos.Track(data, width=800, height=100)
    
    + base_track = gos.Track(cistrome_multivec, width=800, height=100)
    
    enhancement 
    opened by manzt 4
  • Support chromsizes for bigwig tile server in `data.vector`?

    Do you by any chance know if a bigwig file can be used directly, without any pre-aggregation, using vector()? It looked so according to the HiGlass docs, but when I tested with a local bigwig file, the rendered track showed no visual elements and an error was reported.

    v = vector(
        "../data/Astrocytes-insertions_bin100_RIPnorm.bw",
        value="value",
        column="position",
        binSize=4
    )
    
    gos.Track(v).mark_bar(outlineWidth=0).encode(
        x="position:G",
        y=gos.Channel("value:Q"),
        color=gos.Channel("value:Q", legend=True),
        stroke=gos.value("black"),
        strokeWidth=gos.value(0.3),
    ).view() # Uncaught (in promise) TypeError: Cannot read property 'chromsizes' of null
    

    I wonder if chromsizes should be somehow provided before storing a bigwig file to the server.

    If a coordSystem is specified for the bigWig, but no chromsizes are found on the server, the import will fail.

    Originally posted by @sehilyi in https://github.com/manzt/gos/pull/39#discussion_r695857459

    bug 
    opened by manzt 4
  • chore: add chr prefix to examples

    Changed chromosome names in the examples to use the chr prefix (e.g., 1 --> chr1), since the next version of Gosling will let users use exact chromosome names (https://github.com/gosling-lang/gosling.js/pull/796).

    opened by sehilyi 3
  • User guide on exporting plots

    We have the .save() function explained in the docs under API Reference, but it is quite hidden from users and does not provide detailed explanations or examples. I think we need to add a dedicated page or section that describes ways to export gos plots (e.g., HTML, JSON). Maybe we can add a section near Local Data:

    [screenshot, 2022-08-17]

    Related tweet: https://twitter.com/sminot/status/1559746056669126657?s=21&t=yCLH8vli7kbjDf8foZhy5g

    documentation 
    opened by sehilyi 3
  • Plotting Single Co-ordinate BED file on custom assembly

    Hi,

    First, thanks for the great work! I am really liking the layout of Gosling so far!

    I am having an issue plotting some data. I have a microbial assembly with two chromosomes and a tsv file with three columns: chromosome, position, value (the file does not have a header). I want to plot the values along each chromosome, so have code like this:

    data = gos.csv(
        "../data/06_coverage/23b_coverage.bed",
        separator="\t",
        headerNames=["chrom", "position", "value"],
        chromosomeField="chrom",
        genomicFields=["position"]
    )
    
    gos.Track(data).mark_bar().encode(
        x=gos.X("position:G"),
        y=gos.Y("value:Q", axis="left"),
    ).view(
        assembly=[
            ("NC_012345.1", 1_234_567),
            ("NC_012346.1", 45000)
        ]
    )
    

    When I run the plot, I just get a blank canvas with the chromosome labels/positions along the top, but no data is showing up. I am not sure what I am missing here. Does the file have to have two coordinates for each feature?

    Thanks in advance for the help! I hope this is not something obvious that I am missing.

    Keep up the great work! Excited to see this tool continue to develop!

    opened by Gr1m3y 2
  • visibility_lt (or other visibility functions) will not accept "zoomLevel" as a "measure" argument

    On the Gosling website, there is an example of setting visibility based on zoomLevel instead of width|height. If I try to set the visibility by passing zoomLevel to the measure parameter, I get a schema validation error.

    seq_info = seq_track.mark_text(
    ).encode(
        text=gos.Text("seq:N"),
        color=gos.value("black"),
        stroke=gos.value("white"),
        strokeWidth=gos.value(3),
        x=gos.X("start:G", linkingId='detail-1'),
        xe="end:G",
        row=gos.value(80)
    ).visibility_le(
        target="mark",
        measure="zoomLevel",
        threshold=10000,
        transitionPadding=5,
    )
    

    The code above gives:

    SchemaValidationError: Invalid specification
    
            gosling.schema.core.VisibilityCondition->0->measure, validating 'enum'
    
            'zoomLevel' is not one of ['width', 'height']
    

    However, if I pass width as the measure (which passes schema validation) and then manually edit the JSON to zoomLevel, I get exactly the behavior I expect, i.e., the mark appears when the zoomLevel is < 10,000 bp. It would be nice to be able to set measure to zoomLevel through the function instead of editing the JSON manually.

    opened by sapoudel 2
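
The manual workaround described in this issue (validate with measure="width", then change the exported JSON to zoomLevel) can be scripted. The sketch below assumes the spec has already been exported and loaded as a Python dict, and that visibility conditions are stored as a list under a "visibility" key on each track. Both that layout and the helper name `patch_visibility_measure` are assumptions, so check them against your own exported spec.

```python
import copy


def patch_visibility_measure(spec, old="width", new="zoomLevel"):
    """Return a deep copy of spec with visibility measures changed from old to new."""
    patched = copy.deepcopy(spec)  # leave the caller's spec untouched

    def walk(node):
        if isinstance(node, dict):
            conditions = node.get("visibility")
            if isinstance(conditions, list):
                for cond in conditions:
                    if isinstance(cond, dict) and cond.get("measure") == old:
                        cond["measure"] = new
            # Recurse so nested views and tracks are handled as well.
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(patched)
    return patched


if __name__ == "__main__":
    # Hypothetical, heavily trimmed spec using the values from the issue.
    spec = {
        "tracks": [
            {
                "mark": "text",
                "visibility": [
                    {"target": "mark", "measure": "width",
                     "threshold": 10000, "transitionPadding": 5}
                ],
            }
        ]
    }
    print(patch_visibility_measure(spec)["tracks"][0]["visibility"])
```

Note that this rewrites every visibility condition whose measure equals `old`, so it is only suitable when none of the conditions are genuinely meant to depend on width.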
  • Support for Ibisdata

    Just learned of gosling today via twitter. I used to work at heavy.ai on large-scale dataviz and have been pretty close to the Vega/VegaLite ecosystem for a few years, including collaborations with that team.

    First of all, wanted to say that this is a FANTASTIC project - kudos to everyone involved for a truly next-gen way of taking genomics data and turning it into a comprehensive set of visualizations. I've always felt that this area has been underserved and you've all done a tremendous job!

    I work at sneller.io now - which may be of some interest. We allow for SQL on very large-scale JSON collections - see here. It may be of interest if you have data collections of this sort.

    Next - One of the projects I worked on in the past is IbisData - This allows you to use SQL databases/DWs as backends in a pythonic data workflow without needing SQL directly. It might be interesting to explore how to back gosling/gos with Ibis so you can put large genomic data sets in central storage and use Ibis to only pull in the data needed on demand.

    Hope this helps - again, fantastic work and will stay close to this!

    opened by venkat-sneller 3
  • feat: (responsive) defaults (e.g. width, height) when not explicitly provided

    I think we can at least make the width and height optional so that users do not have to specify them when users use responsiveSize. But, yes, it is somewhat challenging to provide non-constant optimal width and height to individual tracks when the track/view structure becomes complicated.

    Also, I was thinking that the appearance of some visual elements could adapt by default to the assigned size (e.g., hiding a color legend when the track height is too small). That is another thing we could work out to make responsiveSize more useful.

    Originally posted by @sehilyi in https://github.com/gosling-lang/gos/issues/85#issuecomment-1021380417

    opened by manzt 0
  • race conditions with HTML renderer

    There appear to be temperamental issues with loading the JS to power the gos visualization in Jupyter Notebooks. It is difficult to debug due to the browser cache and (hidden) state of saved jupyter notebooks as well.

    Steps to reproduce:

    • open a new blank notebook (jupyter notebook) (Python 3.9, Chrome 95)
    • load gos and execute a cell which renders a gos.View
    • re-run the same cell

    https://user-images.githubusercontent.com/24403730/140839455-c8a2337e-5568-4c50-86cd-1a43e9258a87.mov

    However, if a visualization is executed in a different cell, the expected rendering behavior is restored.

    https://user-images.githubusercontent.com/24403730/140839595-4a54d895-8ebf-4b62-b383-7f9cac1fa7f3.mov

    My best guess is that this is related to the custom JS loading code in gosling/display.py, but this is a requirement currently since some of our JS is incompatible with the global requirejs in Jupyter Notebooks.

    TL;DR - If you are having trouble with rendering, try executing a different cell and re-running the previous cell. This step only needs to be done once, and will likely occur naturally during a typical workflow. Hopefully we will have a better solution soon.

    bug 
    opened by manzt 0
Owner
Gosling
The data visualization grammar of scalable linked interactive nucleotide graphics. A project of the Gehlenborg Lab at @hms-dbmi.