# Whale Demo Instance: BigQuery Public Data
This is a fully functioning demo instance of the whale data catalog, actively scraping data from BigQuery's public project `bigquery-public-data` using GitHub Actions.
To test out this repo with your own local installation of whale (i.e. to emulate what it'd be like to set up whale on GitHub for your own team), clone the repo into your `~/.whale` directory. If you already have a `~/.whale` directory, move it or delete it with `rm -rf ~/.whale` first, or the clone won't work:

```
git clone https://github.com/dataframehq/whale-bigquery-public-data ~/.whale
```
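As a quick sanity check, the cloned directory should now contain this repo's contents, including the scraped metadata:

```
# Confirm the clone landed where whale expects its catalog to live:
ls ~/.whale
```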
Then install whale and run the following commands, following the prompts:

```
wh git-enable
wh schedule
```
At this point, if you run `wh pull`, it should run a `git pull --autostash --rebase` against this repo (meaning any locally scheduled cron jobs will simply pull down fresh metadata from this repo, rather than scraping directly from BigQuery).
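Under the hood, that step amounts to roughly the following (a sketch of the equivalent manual commands, not whale's exact internals):

```
# Rough manual equivalent of `wh pull` when git ETL is enabled:
cd ~/.whale
git pull --autostash --rebase
```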
For more information on how to set this up for your own warehouse, see the docs.
## FAQ
### Why doesn't `wh.pull()` work locally?

While `wh pull` (the CLI hook) will check for a flag in `wh config` and act appropriately (sourcing from GitHub if `is_git_etl_enabled` is `True`, and from the connections in `wh connections` if not), `wh.pull()` (the python hook) performs no such check. This is by design, to ensure the remote repository's associated CI/CD pipelines pull down data directly from the metadata source by default (we suspect most people do not want to refresh metadata using the python client, but feel free to open an issue if you disagree).
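You can check which path your local installation will take by inspecting that flag yourself; a minimal sketch, assuming the configuration lives in a YAML file under `~/.whale/config/` (the exact filename is an assumption and may vary by version):

```
# Look up the flag that `wh pull` consults; the config path here is an
# assumption -- verify it against your own installation.
grep is_git_etl_enabled ~/.whale/config/config.yaml
```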
Locally, `wh.pull()` will therefore fail unless you modify the `key_path` value specified in `wh connections` to point to credentials you have stored locally. If you choose to do this, ensure the associated service account has the following permissions: BigQuery Data Viewer, BigQuery Job User, BigQuery Metadata Viewer.
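If you go that route, here's a hedged sketch of granting those roles with `gcloud` (`PROJECT_ID` and `SA_EMAIL` are placeholders for your own project and service account, not values from this repo):

```
# Grant the three BigQuery roles listed above to a service account.
# PROJECT_ID and SA_EMAIL are placeholders -- substitute your own values.
for role in roles/bigquery.dataViewer roles/bigquery.jobUser roles/bigquery.metadataViewer; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SA_EMAIL" \
    --role="$role"
done
```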