dbt-superset-lineage

Make dbt docs and Apache Superset talk to one another

Why do I need something like this?

Odds are rather high that you use dbt together with a visualisation tool. If so, these questions might have popped into your head from time to time:

  • "Could I get rid of this model? Does it get used for some dashboards? And in which ones, if yes?"
  • "It would be so handy to see all these well-maintained column descriptions when exploring and creating charts."

In case your visualisation tool of choice is Superset, you are in luck!

Using dbt-superset-lineage, you can:

  • Add dependencies of Superset dashboards to your dbt sources and models
  • Sync column descriptions from dbt docs to Superset

This will help you:

  • Avoid broken dashboards because of deprecated or changed models
  • Choose the right attributes without navigating back and forth between charts and documentation

Installation

pip install dbt-superset-lineage

Usage

dbt-superset-lineage comes with two basic commands: pull-dashboards and push-descriptions. The documentation for the individual commands can be shown by using the --help option.

It includes a wrapper for the Superset API; you only need to provide SUPERSET_ACCESS_TOKEN/SUPERSET_REFRESH_TOKEN (obtained via /security/login) as an environment variable or through the --superset-access-token/--superset-refresh-token options.
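For illustration, fetching those tokens from Superset's /security/login endpoint could be scripted roughly like this. This is a minimal sketch using only the standard library; the URL and credentials are placeholders, and the "provider" value depends on your Superset auth setup:

```python
import json
import urllib.request


def get_superset_tokens(superset_url, username, password):
    """Log in to Superset's REST API and return (access_token, refresh_token)."""
    body = json.dumps({
        "username": username,
        "password": password,
        "provider": "db",   # or e.g. "ldap", depending on your auth configuration
        "refresh": True,    # also request a refresh token
    }).encode()
    request = urllib.request.Request(
        f"{superset_url}/api/v1/security/login",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["access_token"], payload.get("refresh_token")
```

The returned access token can then be exported as SUPERSET_ACCESS_TOKEN before invoking the CLI.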

N.B.

  • Make sure to run dbt compile (or dbt run) against the production profile, not your development profile
  • If multiple databases are used within dbt and/or Superset and there are duplicate names (schema + table) across them, specify the database through the --dbt-db-name and/or --superset-db-id options
  • Currently, PUT requests are only supported if CSRF tokens are disabled in Superset (WTF_CSRF_ENABLED=False).
  • Tested on dbt v0.20.0 and Apache Superset v1.3.0. Other versions, especially newer versions of Superset, might face errors due to changes in the underlying code and API.

Pull dashboards

Pull dashboards from Superset and add them as exposures to dbt docs with references to dbt sources and models, making them visible both separately and as dependencies.

N.B.

  • Only published dashboards are extracted.

$ cd jaffle_shop
$ dbt compile  # Compile project to create manifest.json
$ export SUPERSET_ACCESS_TOKEN=<TOKEN>
$ dbt-superset-lineage pull-dashboards https://mysuperset.mycompany.com  # Pull dashboards from Superset to /models/exposures/superset_dashboards.yml
$ dbt docs generate # Generate dbt docs
$ dbt docs serve # Serve dbt docs
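For a rough idea of the output, an entry in the generated exposures file might look like the following. This is an illustrative sketch following dbt's exposure schema, not the output of a real run; all names and URLs are made up:

```yaml
version: 2

exposures:
  - name: sales_dashboard
    type: dashboard
    url: https://mysuperset.mycompany.com/superset/dashboard/12/
    description: Pulled from Superset.
    owner:
      name: Jane Doe
    depends_on:
      - ref('orders')
      - source('raw', 'payments')
```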

Separate exposure in dbt docs

Referenced exposure in dbt docs

Push descriptions

Push column descriptions from your dbt docs to Superset as plain text so that they can be viewed in Superset when creating charts.

N.B.:

  • Run carefully as this rewrites your datasets using merged column metadata from Superset and dbt docs.
  • Descriptions are rendered as plain text, so markdown syntax, including links, will not be displayed.
  • Avoid special characters and strings in your dbt docs, e.g. or <null>.

$ cd jaffle_shop
$ dbt compile  # Compile project to create manifest.json
$ export SUPERSET_ACCESS_TOKEN=<TOKEN>
$ dbt-superset-lineage push-descriptions https://mysuperset.mycompany.com  # Push descriptions from dbt docs to Superset
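Conceptually, the push step merges dbt's column descriptions into the column metadata Superset already stores for a dataset, then writes the merged result back. A simplified, hypothetical sketch of that merge (the real command reads manifest.json and calls Superset's dataset API):

```python
def merge_column_descriptions(superset_columns, dbt_columns):
    """Return Superset column metadata with descriptions taken from dbt docs.

    superset_columns: list of column dicts as returned by Superset's dataset API.
    dbt_columns: mapping of column name -> description, taken from manifest.json.
    Columns that dbt knows nothing about keep their existing Superset description.
    """
    merged = []
    for column in superset_columns:
        column = dict(column)  # copy, so the caller's data is not mutated
        description = dbt_columns.get(column["column_name"])
        if description:
            column["description"] = description
        merged.append(column)
    return merged
```

This is also why the N.B. above warns about the push being a rewrite: the dataset ends up with the merged metadata, not its previous state.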

Column descriptions in Superset

License

Licensed under the MIT license (see LICENSE.md file for more details).

Comments
  • Update needed on sqlfluff version

    Because dbt-superset-lineage (0.2.1) depends on sqlfluff (>=0.8.2,<0.9.0) and no versions of dbt-superset-lineage match >0.2.1,<0.3.0, dbt-superset-lineage (>=0.2.1,<0.3.0) requires sqlfluff (>=0.8.2,<0.9.0). So, because seed depends on both dbt-superset-lineage (^0.2.1) and sqlfluff (^1.4.1), version solving failed.

    opened by MuslimBeibytuly 3
  • Feature Update

    Problems:

    1. We wanted to pull dashboards by name and not simply by published status
    2. When pushing, we wanted to specify the database by name.
    3. When pushing, we wanted to specify the schema name in case of inter-schema table name overlap.

    Solutions:

    1. Added: --superset-dashboard-name as an option for pulling. It can be daisy-chained by using --superset-dashboard-name 'Name One' --superset-dashboard-name 'Name 2'.
    2. Added: --superset-db-name to push command
    3. Added: --superset-schema-name to push command

    Updated the docs to reflect these changes.

    opened by MarcinZegar 3
  • change: Split `__init__.py` to multiple files

    In preparation for a second command and to allow for easier orientation between the two, each Typer command should be put to a separate .py file.

    Note that this is to be merged to #1.

    opened by one-data-cookie 3
  • 400 Bad Request: The CSRF token is missing

    Hi, is this repo still actively maintained? I have been testing this on a Docker deployment of Superset and am unable to disable CSRF tokens. There are a number of threads on this which seem to indicate it is a common issue. Is there a way to ensure this library works for POST commands with Superset, or any specific steps to disable CSRF?

    opened by ahsanshah 1
  • add: `push-descriptions` command

    Created based on internal script.

    Changes

    • add new dependencies through Poetry that amended poetry.lock and pyproject.toml
    • add new command into __init__.py
    • add source code to push_descriptions.py
    • release as v0.2.0

    Tests

    • installed via Poetry
    • ran on Superset staging with the same result as from internal script

    Next steps

    • release (tag on Github and publish on PyPI)
    • add docs as v0.2.1
    opened by one-data-cookie 1
  • update: Replace `catalog.json` with `manifest.json`

    As generating docs through dbt docs generate often takes a while, it is less economical to use catalog.json than manifest.json, which gets created with every compile (dbt compile, or as part of dbt run or dbt docs generate). This is possible because we don't need information about the current state of the database, only what's documented in dbt, since we can only create refs to those tables anyway.

    Tested by running the code – resulted in the same output as the code on main.

    opened by one-data-cookie 1
  • change: `superset_domain` to `superset_url`

    Why

    Asking users for superset_domain did not allow for using the HTTP protocol, because HTTPS was hard-coded into the full URL used for API calls. That made it impossible to run easily against a local instance (which uses HTTP).

    Changes

    • amend in URL for API calls and dashboard links
    • rename everywhere
    • bump version to 0.1.1

    Notes

    • #2 to be merged before merging to main
    opened by one-data-cookie 1
  • refactor: URL params in Dashboard listing

    • Refactor the way URL params are being passed to requests in Dashboard listing in pull_dashboards.py, as the previous version has been susceptible to encoding issues.

    Signed-off-by: mrshu [email protected]

    opened by mrshu 1
  • Pull dashboards has old API calls

    Hey guys, I adapted the pull_dashboards.py file because, when trying to run the code out of the box, I was not getting the dashboards' table names.

    From my reading, GET /dashboard/id no longer returns "table_name"; it seems to have been replaced by a similar endpoint I found (GET /dashboard/id/datasets).

    Let me know how this sounds to you; I am happy to contribute my minor edits. Best, Agus

    opened by agusfigueroa-htg 0
Owner

Slido, an audience interaction platform for meetings and events.