Centralized whale instance using GitHub Actions, sourcing metadata from bigquery-public-data.

Overview

Whale Demo Instance: Bigquery Public Data

This is a fully functioning demo instance of the whale data catalog. It actively scrapes metadata from BigQuery's public project, bigquery-public-data, using GitHub Actions.

To test this repo with your own local installation of whale (that is, to emulate setting up whale on GitHub for your own team), clone the repo to your ~/.whale directory. If you already have a ~/.whale directory, move it or delete it with rm -rf ~/.whale first, or the clone won't work:

git clone https://github.com/dataframehq/whale-bigquery-public-data ~/.whale
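
If you would rather keep an existing ~/.whale directory than delete it, moving it aside before the clone could look roughly like the sketch below. The helper name move_aside and the timestamped ".bak-" suffix are made up for this illustration; they are not part of whale.

```python
import time
from pathlib import Path
from typing import Optional


def move_aside(directory: Path) -> Optional[Path]:
    """Rename `directory` to a timestamped sibling (hypothetical helper).

    Returns the new path, or None if the directory did not exist.
    """
    if not directory.exists():
        return None
    # The ".bak-<timestamp>" naming is an assumption made for this sketch.
    suffix = time.strftime("%Y%m%d-%H%M%S")
    backup = directory.with_name(f"{directory.name}.bak-{suffix}")
    directory.rename(backup)
    return backup
```

Calling move_aside(Path.home() / ".whale") before the git clone above leaves the target path free while keeping the old contents around.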

Then install whale and run the following commands, following the prompts:

wh git-enable
wh schedule

At this point, running wh pull should run git pull --autostash --rebase against this repo. Any locally scheduled cron jobs will therefore simply pull down fresh metadata from this repo, rather than scraping directly from BigQuery.
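
For reference, the same git command can be issued by hand from Python. This is only a sketch of the equivalent manual step, not whale's own code; the function names are hypothetical, and it assumes git is on your PATH and the repo was cloned to the directory you pass in.

```python
import subprocess
from pathlib import Path
from typing import List


def git_pull_command(repo_dir: Path) -> List[str]:
    """Build the git invocation described above for a given clone."""
    # `git -C <dir>` runs the command as if started inside <dir>.
    return ["git", "-C", str(repo_dir), "pull", "--autostash", "--rebase"]


def refresh_metadata(repo_dir: Path) -> None:
    """Run the pull; raises subprocess.CalledProcessError if git fails."""
    subprocess.run(git_pull_command(repo_dir), check=True)
```

For example, refresh_metadata(Path.home() / ".whale") would update the clone created earlier.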

For more information on how to set this up for your own warehouse, see the docs.

FAQ

Why doesn't wh.pull() work locally?

wh pull (the CLI hook) checks a flag in wh config and acts accordingly: it sources from GitHub if is_git_etl_enabled is True, and from the connections in wh connections if not. wh.pull() (the Python hook) performs no such check. This is by design, to ensure that the remote repository's CI/CD pipelines pull data directly from the metadata source by default. We suspect most people do not want to refresh metadata using the Python client, but feel free to open an issue if you disagree.

Locally, wh.pull() will fail unless you change the key_path value specified in wh connections to point to credentials you have stored locally. If you choose to do this, ensure the associated service account has the following roles: BigQuery Data Viewer, BigQuery Job User, BigQuery Metadata Viewer.
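
Before calling wh.pull() locally, it can help to confirm that every key_path actually points at a file. The sketch below is a rough pre-flight check, not part of whale: missing_key_paths is a hypothetical helper, and it does a simple line scan for "key_path:" entries rather than real YAML parsing, so it only covers the flat layout that wh connections shows.

```python
from pathlib import Path
from typing import List


def missing_key_paths(connections_text: str) -> List[Path]:
    """Return each key_path value in the text that is not an existing file."""
    missing = []
    for line in connections_text.splitlines():
        # Split on the first colon only, so paths containing ":" survive.
        key, sep, value = line.strip().partition(":")
        if key != "key_path" or not sep:
            continue
        value = value.strip().strip("'\"")
        if not value or value == "~":
            # Empty value or YAML null: nothing to check.
            continue
        path = Path(value).expanduser()  # expands a leading "~/"
        if not path.is_file():
            missing.append(path)
    return missing
```

Pass it the contents of your connections file; an empty list means every configured key file was found.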


Comments
  • README instructions do not explain credentials.json

    I followed the instructions in the README and the result is:

    ❯ cat connections.yaml
    ---
    name: bq-1
    metadata_source: Bigquery
    key_path: ~/credentials.json
    project_credentials: ~
    project_id: bigquery-public-data
    

    Then I ran:

    ❯ ipython
    Python 3.8.6 (default, Oct 27 2020, 08:56:44)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: import whale as wh
       ...: wh.pull()
    

    As expected, this fails with FileNotFoundError: [Errno 2] No such file or directory: '/Users/manuelzander/credentials.json'.

    I was able to fix it by using a key file I created using my own GCP project. Is this expected from users? I would add some points to the README explaining how to create a credentials.json and which GCP roles and permissions are needed 😊

    opened by manuelzander 2
  • fix: update github actions script to accommodate new make build structure

    The makefile build directory is now structured differently: it turns out virtual environments aren't very portable, so we now install the virtual env directly into the ~/.whale/libexec/env path. We therefore adjust the code to source from the final destination in libexec, rather than from the ./build directory, which is now deprecated.

    This PR also includes a few QOL improvements:

    • Add comments and split up the git push step for easier grokking.
    • Migrate to python 3.8 (which is what we're testing on, anyways).
    • Directly invoke the GitHub secret with the BQ credentials, rather than specifying an env variable and referencing that indirectly.

    These changes will be reflected in the docs as well, in the sample github actions code.

    opened by rsyi 0
Owner
Hyperquery
All-in-one workspace for data analytics