Make sankey, alluvial and sankey bump plots in ggplot

Overview

ggsankey

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2.

Installation

You can install the development version of ggsankey from github with:

# install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

How it works

Google defines a sankey as:

A sankey diagram is a visualization used to depict a flow from one set of values to another. The things being connected are called nodes and the connections are called links. Sankeys are best used when you want to show a many-to-many mapping between two domains or multiple paths through a set of stages.

To plot a sankey diagram with ggsankey, each observation must have a stage (a discrete x-value in ggplot) and belong to a node. Each observation also needs to state which node it belongs to in the next stage. See the image below for clarification.

Hence, to use geom_sankey the aesthetics x, next_x, node and next_node are required. The last stage should point to NA. The aesthetics fill and color affect both nodes and flows.
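
The long layout described above can be sketched in base R. `to_long()` is a hypothetical helper written only for illustration; it is not ggsankey's `make_long()` and may differ from it in details (for example, it returns plain character columns). Each observation gets one row per stage, `next_x` and `next_node` point at the following stage, and the last stage points to NA. The `skip_na = TRUE` option is an assumption of this sketch: it drops missing stages so an observation links straight to its next observed stage. Whether `geom_sankey()` draws such stage-skipping links as intended has not been checked here.

```r
# Hypothetical helper for illustration only; NOT ggsankey::make_long().
# Converts a wide data frame (one column per stage) into long rows with
# x, node, next_x and next_node. The last stage of each observation points to NA.
to_long <- function(df, stages, skip_na = FALSE) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # Node of observation i at every stage, as character
    vals <- vapply(stages, function(s) as.character(df[[s]][i]),
                   character(1), USE.NAMES = FALSE)
    # Optionally drop the stages where this observation has no node
    keep <- if (skip_na) !is.na(vals) else rep(TRUE, length(vals))
    x <- stages[keep]
    node <- vals[keep]
    if (length(x) == 0) return(NULL)
    data.frame(
      x         = x,
      node      = node,
      next_x    = c(x[-1], NA),     # following stage, NA after the last one
      next_node = c(node[-1], NA),  # node at the following stage
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Two observations over three stages; the first one is missing at stage d1
wide <- data.frame(
  admission = c("Severe", "Mild"),
  d1        = c(NA, "Moderate"),
  outcome   = c("Dead", "Alive"),
  stringsAsFactors = FALSE
)

long_all  <- to_long(wide, c("admission", "d1", "outcome"))
long_skip <- to_long(wide, c("admission", "d1", "outcome"), skip_na = TRUE)
print(long_skip)
```

Stage order in a plot is usually controlled by factor levels, so converting `x` and `next_x` with `factor(..., levels = stages)` would probably be needed before plotting a table built this way; in practice `make_long()` is the intended tool.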

To control properties that are not mapped to data, such as fill, color, size and alpha, you can either set a global value that affects both nodes and flows, or specify which one you want to alter. For example, node.color = 'black' draws a black outline around the nodes only, not around the flows (links).

Example

geom_sankey

A basic sankey plot that shows how dimensions are linked.

library(ggsankey)
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)

ggplot(df, aes(x = x, 
               next_x = next_x, 
               node = node, 
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey()

And with a little extra styling:

  • Labels with geom_sankey_label, which places labels in the center of the nodes when given the same aesthetics.

  • ggsankey also comes with custom minimalistic themes that can be used. Here I use theme_sankey.

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = .6,
              node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_alluvial

Alluvial plots are very similar to sankey plots but have no spaces between nodes and start at y = 0 instead of being centered around the x-axis.
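
The difference in vertical layout can be illustrated with a little arithmetic. `stack_nodes()` is a hypothetical helper, not ggsankey's layout code, and the node sizes and gap width are made-up numbers. With a gap of `space` between nodes and `centered = TRUE`, the block of nodes is shifted so it is centered on y = 0 (sankey style); with `space = 0` and `centered = FALSE`, the nodes are stacked from y = 0 with no gaps (alluvial style).

```r
# Hypothetical helper for illustration; NOT ggsankey's layout code.
# Returns the vertical extent (ymin, ymax) of each node at a single stage.
stack_nodes <- function(sizes, space = 0, centered = FALSE) {
  nodes <- names(sizes)
  h <- unname(sizes)
  n <- length(h)
  # Bottom of each node: heights of the nodes below it plus the gaps below it
  ymin <- c(0, cumsum(h)[-n]) + space * (seq_len(n) - 1)
  ymax <- ymin + h
  if (centered) {
    # Shift the whole block down by half its total height (nodes + gaps)
    shift <- (sum(h) + space * (n - 1)) / 2
    ymin <- ymin - shift
    ymax <- ymax - shift
  }
  data.frame(node = nodes, ymin = ymin, ymax = ymax, stringsAsFactors = FALSE)
}

sizes <- c(a = 3, b = 5, c = 2)  # made-up node sizes
sankey_layout   <- stack_nodes(sizes, space = 1, centered = TRUE)
alluvial_layout <- stack_nodes(sizes, space = 0, centered = FALSE)
print(sankey_layout)    # ymin: -6, -2, 4   ymax: -3, 3, 6
print(alluvial_layout)  # ymin:  0,  3, 8   ymax:  3, 8, 10
```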

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node, fill = factor(node), label = node)) +
  geom_alluvial(flow.alpha = .6) +
  geom_alluvial_text(size = 3, color = "white") +
  scale_fill_viridis_d() +
  theme_alluvial(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Car features")

geom_sankey_bump

Sankey bump plots are a mix between bump plots and sankey plots and are mostly useful for time series. When one group becomes larger than another, it bumps above it.

# install.packages("gapminder")
library(gapminder)

df <- gapminder %>%
  group_by(continent, year) %>%
  summarise(gdp = (sum(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
  ungroup()

ggplot(df, aes(x = year,
               node = continent,
               fill = continent,
               value = gdp)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) +
  labs(x = NULL,
       y = "GDP ($ bn)",
       fill = NULL,
       color = NULL) +
  theme(legend.position = "bottom") +
  labs(title = "GDP development per continent")
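
The "bump" idea can be sketched as ranking the groups by value at each x: the largest group gets rank 1 and sits on top, so when one group overtakes another their ranks swap. This is a minimal base-R sketch on invented toy values, assuming that reading of the description above; it is not how `geom_sankey_bump()` is implemented and it ignores the smoothing and spacing the geom applies.

```r
# Toy data, invented for illustration: three groups observed at two time points
toy <- data.frame(
  year  = rep(c(2000, 2010), each = 3),
  group = rep(c("A", "B", "C"), times = 2),
  value = c(5, 3, 1,
            2, 4, 6),
  stringsAsFactors = FALSE
)

# Rank groups within each year; 1 = largest value = top of the stack
toy$rank <- ave(-toy$value, toy$year,
                FUN = function(v) rank(v, ties.method = "first"))

# Top group per year: A leads in 2000, C has bumped above it by 2010
top <- toy$group[toy$rank == 1]
print(top)
```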

Comments
  • size of the flow

    Is there a way in geom_sankey to specify an aesthetic that provides directly the size of the flow, i.e. the number of connections between the nodes?

    For example:

    df <- data.frame(expand.grid(LETTERS[1:3],LETTERS[1:3]))
    df$N <- sample(1:10,size = nrow(df),replace = T)
    

    I would like something like

    df %>%
    make_long(Var1, Var2)%>%
      ggplot( aes(x = x, 
                     next_x = next_x, 
                     node = node, 
                     next_node = next_node,
                     fill = factor(node))) +
      geom_sankey()
    

    But with the flows given by N. A hack would be to repeat each row by N before the make_long, but I am sure there is a proper way.

    opened by dmongin 2
  • "sum_" function missing for geom_sankey_bump()

    Following the sum_ issue reported in #1, I have the same issue with the example provided in the readme.

    I tried:

    library(ggsankey)
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    library(ggplot2)
    library(gapminder)
    
    df <- gapminder %>%
        group_by(continent, year) %>%
        summarise(gdp = (sum_(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
        ungroup()
    #> Error: Problem with `summarise()` input `gdp`.
    #> x could not find function "sum_"
    #> ℹ Input `gdp` is `(sum_(pop * gdpPercap)/1e+09) %>% round(0)`.
    #> ℹ The error occurred in group 1: continent = "Africa", year = 1952.
    
    ggplot(df, aes(x = year,
                   node = continent,
                   fill = continent,
                   value = gdp)) +
        geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
        scale_fill_viridis_d(option = "A", alpha = .8) +
        theme_sankey_bump(base_size = 16) +
        labs(x = NULL,
             y = "GDP ($ bn)",
             fill = NULL,
             color = NULL) +
        theme(legend.position = "bottom") +
        labs(title = "GDP development per continent")
    #> Error:   You're passing a function as global data.
    #>   Have you misspelled the `data` argument in `ggplot()`
    

    Created on 2021-04-03 by the reprex package (v1.0.0)

    I also tried removing sum_ and replacing with sum when writing to the variable df, but I also had no luck.

    See here:

    df <- gapminder %>%
      group_by(continent, year) %>%
      summarise(gdp = (sum(pop * gdpPercap)/1e9) %>% round(0), .groups = "keep") %>%
      ungroup()
    
    ggplot(df, aes(x = year,
                   node = continent,
                   fill = continent,
                   value = gdp)) +
      geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 6) +
      scale_fill_viridis_d(option = "A", alpha = .8) +
      theme_sankey_bump(base_size = 16) +
      labs(x = NULL,
           y = "GDP ($ bn)",
           fill = NULL,
           color = NULL) +
      theme(legend.position = "bottom") +
      labs(title = "GDP development per continent")
    

    My console error is different than reprex's for some reason; this is my console error:

    Error: Problem with `summarise()` input `flow_freq`.
    x could not find function "sum_"
    ℹ Input `flow_freq` is `sum_(value)`.
    ℹ The error occurred in group 1: n_x = 1952, node = "Oceania 1952", n_next_x = 1957, next_node = "Oceania 1957".
    
    opened by engineerchange 2
  • Using ggsankey in Shiny

    It seems I cannot use ggsankey with my Shiny app.

    I did a test app to see if it worked, but I get an error from toJSON about a named vector. I made a simple selector to use a database extract as the base dataframe for the sankey plot. On initialisation, it works, but after switching to another df, I get the error.

    Sadly, as an R beginner, I cannot ascertain that I am not at the origin of the issue.

    opened by c3-rkieffer 1
  • ggsankey vs ggalluvial

    Hi- I just discovered the existence of Sankey plots (or rather, that such things had a name and could be done in R...).

    I found your package and ggalluvial, which seems to pre-date ggsankey. Can you comment on the pros and cons of ggsankey? Both packages seem pretty good at a first glance. Thanks!

    opened by dariober 1
  • Suggest license

    Thanks for this package! It's really cool. But I saw that under license, you do not have any license. Legally this means no one can use or modify it. Can you add a license?

    For more on this you can read this page.

    opened by GaborioSensata 1
  • could not find function "sum_" with geom_sankey_bump()

    I get this error when I run the example for geom_sankey_bump()

    Problem with `summarise()` input `gdp`.
    x could not find function "sum_"
    ℹ Input `gdp` is `(sum_(pop * gdpPercap)/1e+09) %>% round(0)`.
    ℹ The error occurred in group 1: continent = "Africa", year = 1952.
    Backtrace:
      1. base::source("~/.active-rstudio-document", echo = TRUE)
     13. base::.handleSimpleError(...)
     14. dplyr:::h(simpleError(msg, call))
    Run `rlang::last_trace()` to see the full context.
    

    geom_sankey() works like a charm, by the way! 😀

    opened by gkaramanis 1
  • Flow.fill isn't working

    I have a sankey chart with 21 nodes, and I'm trying to fill the flows one of three colors, but flow.fill isn't working. Is there more documentation on how it works?

    opened by bdidds2 2
  • Labels always get misaligned

    Labels always get misaligned. They don't seem to follow any logical rule. Using ggsankey on a W10 laptop with R version 4.2.1. Output was generated with pdf() since there is no proper antialiasing in the png() output (not an issue of ggsankey but of the OS), and some labels get cropped for being so far from the diagram. The package is excellent, however!

    opened by gluijk 3
  • How to skip nodes with NA value in ggsankey?

    Suppose I have this dataset (the actual dataset has 30+ columns and thousands of ids)

    	df <- data.frame(id = 1:5,
    				admission = c("Severe", "Mild", "Mild", "Moderate", "Severe"),
    				d1 = c(NA, "Moderate", "Mild", "Moderate", "Severe"),
    				d2 = c(NA, "Moderate", "Mild", "Mild", "Moderate"),
    				d3 = c(NA, "Severe", "Mild", "Mild", "Severe"),
    				d4 = c(NA, NA, "Mild", "Mild", NA),
    				outcome = c("Dead", "Dead", "Alive", "Alive", "Dead"))
    

    I want to make a Sankey diagram that illustrates the daily severity of the patients over time. However, when the observation reaches NA (means that an outcome has been reached), I want the node to directly link to the outcome.

    This is how the diagram should look: [image]

    Image fetched from the question asked by @qdread here

    Is this possible with ggsankey?

    This is my current code:

    df.sankey <- df %>%
    	make_long(admission, d1, d2, d3, d4, outcome)
    ggplot(df.sankey, aes(x = x,
    					 next_x = next_x,
    					 node = node,
    					 next_node = next_node,
    					 fill = factor(node),
    					 label = node)) +
    	geom_sankey(flow.alpha = 0.5,
    				node.color = NA,
    				show.legend = TRUE) +
    	geom_sankey_text(size = 3, color = "black", fill = NA, hjust = 0, position = position_nudge(x = 0.1))
    

    Which results in this diagram: [image]

    Thanks in advance for the help.

    opened by gilbertlzrus 0
  • missing dplyr:: call

    In sankey.R, the function StatSankeyFlow (line 228) calls summarise(flow_freq = dplyr::n(), .groups = "keep") without the explicit dplyr:: prefix on summarise().

    opened by ulysses-sr 0
  • how can I join ggsankey and a dotplot?

    Hi:

    I put it together myself. The coordinates don't match:

    [image]

    This is what I'm looking for:

    [image]

    my code:

    library(ggplot2)
    library(ggsankey)
    library(dplyr)
    pl <- ggplot(dat3, aes(x = x, 
                           next_x = next_x,
                           node = node, 
                           next_node = next_node,
                           fill = factor(node),
                           label = node2
                           )) +
      geom_sankey(flow.alpha = 0.5, node.color = "black") +
      geom_sankey_label(size = 6, color = "black", fill = "white", hjust = 1, family = "Times") +
      scale_fill_viridis_d(option = "magma") +
      theme_sankey(base_size = 16) +
      scale_x_discrete(expand = c(0.01,0.1)) +
      theme(legend.position = "none",
            axis.title = element_blank(),
            axis.text = element_blank())
    pl
    
    library(clusterProfiler)
    kk_dot <- dotplot(kk, showCategory=10) +
      theme(text = element_text(family = "Times"),
            axis.text.y = element_text(size = 12, face = "bold"),
            axis.text.x = element_text(size = 10, face = "bold"),
            axis.title.x = element_text(size = 14, face = "bold"),
            legend.title = element_text(face = "bold"))
    kk_dot
    kk_dot2 <- kk_dot + theme(axis.text.y = element_blank(),
                              axis.ticks.y = element_blank())
    library(patchwork)
    design <- c("
                AAAA#
                AAAAB
                AAAAB
                AAAAB
                AAAA#
                ")
    all_p <- pl + kk_dot2 + theme(text = element_text(size = 20), 
                                  axis.title.x = element_text(size = 25),
                                  axis.text.x = element_text(size = 20)) +
      plot_layout(design = design)
    all_p
    

    Looking forward to your reply!

    opened by Sagityq 1
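
The workaround mentioned in the "size of the flow" comment, repeating each row N times before calling make_long(), can be written in base R as below. This only sketches that hack; it does not rely on any ggsankey feature for weighted flows. The example data are generated as in the comment, with a fixed seed added here for repeatability. `tidyr::uncount(df, N)` performs the same expansion if tidyr is available.

```r
# Reproduce the example data from the comment (seed added for repeatability)
set.seed(1)
df <- data.frame(expand.grid(LETTERS[1:3], LETTERS[1:3]))
df$N <- sample(1:10, size = nrow(df), replace = TRUE)

# Repeat row i df$N[i] times, then keep only the stage columns
df_rep <- df[rep(seq_len(nrow(df)), times = df$N), c("Var1", "Var2")]
rownames(df_rep) <- NULL

# df_rep can now go through make_long(Var1, Var2) as in the comment
print(nrow(df_rep) == sum(df$N))
```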
Owner
David Sjoberg
Happy R user. Twitter: @davsjob