RasgoQL
Write python locally, execute SQL in your data warehouse
≪ Read the Docs · Join Our Slack »
RasgoQL is a Python package that enables you to easily query and transform tables in your Data Warehouse directly from a notebook.
You can quickly create new features, sample data, apply complex aggregates... all without having to write SQL!
Choose from our library of predefined transformations or make your own to streamline the feature engineering process.
Why is this package useful?
Data scientists spend much of their time in pandas preparing data for modelling. When they are ready to deploy or scale, two pain points arise:
- pandas cannot handle larger volumes of data, forcing the use of VMs or code refactoring.
- feature data must be added to the Enterprise Data Warehouse for future processing, requiring refactoring to SQL
We created RasgoQL to solve these two pain points.
Learn more at https://docs.rasgoql.com.
How does it work?
Under the covers, RasgoQL sends all processing to your Data Warehouse, enabling the efficient transformation of massive datasets. RasgoQL only needs basic metadata to execute transforms, so your private data remains secure.
RasgoQL does these things well:
- Pulls existing Data Warehouse tables into pandas DataFrames for analysis
- Constructs SQL queries using a syntax that feels like pandas
- Creates views in your Data Warehouse to save transformed data
- Exports runnable sql in .sql files or dbt-compliant .yaml files
- Offers dozens of free SQL transforms to use
- Coming Soon: allows users to create & add custom transforms
Rasgo supports Snowflake, BigQuery, Postgres, and Amazon Redshift with more Data Warehouses being added soon. If you'd like to suggest another database type, submit your idea to our GitHub Discussions page so that other community members can weight in and show their support.
Can RasgoQL help you?
-
If you use pandas to build features, but you are working on a massive set of data that won't fit in your machine's memory. RasgoQL can help!
-
If your organization uses dbt or another SQL tool to run production data flows, but you prefer to build features in pandas. RasgoQL can help!
-
If you know pandas, but not SQL and want to learn how queries will translate. RasgoQL can help!
Where to get it
Just run a simple pip install.
pip install rasgoql~=1.0
Report Bug · Suggest Improvement · Request Feature
Quick Start
pip install rasgoql --upgrade
# Connect to your data warehouse
creds = rasgoql.SnowflakeCredentials(
account="",
user="",
password="",
role="",
warehouse="",
database="",
schema=""
)
# Connect to DW
rql = rasgoql.connect(creds)
# List available tables
rql.list_tables('ADVENTUREWORKS').head(10)
# Allow rasgoQL to interact with an existing Table in your Data Warehouse
dataset = rql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES')
# Take a peek at the data
dataset.preview()
# Use the datetrunc transform to seperate things into weeks
weekly_sales = dataset.datetrunc(dates={'ORDERDATE':'week'})
# Aggregate to sum of sales for each week
agg_weekly_sales = weekly_sales.aggregate(
group_by=['PRODUCTKEY', 'ORDERDATE_WEEK'],
aggregations={'SALESAMOUNT': ['SUM']},
)
# Quickly validate output
agg_weekly_sales.to_df()
# Print the SQL
print(agg_weekly_sales.sql())
Getting Stared Tutorials
The best way to get familiar with the RasgoQL basics is by running through these notebooks in the tutorials folder.
Advanced Examples
Joins
Easily join tables together using the join
transform.
sales_dataset = rasgoql.dataset('ADVENTUREWORKS.PUBLIC.FACTINTERNETSALES')
sales_product_dataset = sales_dataset.join(
join_table='DIM_PRODUCT',
join_columns={'PRODUCTKEY': 'PRODUCTKEY'},
join_type='LEFT',
join_prefix='PRODUCT')
sales_product_dataset.sql()
sales_product_dataset.preview()
Chain transforms together
Create a rolling average aggregation and then drops unnecessary colomns.
sales_agg_drop = sales_dataset.rolling_agg(
aggregations={"SALESAMOUNT": ["MAX", "MIN", "SUM"]},
order_by="ORDERDATE",
offsets=[-7, 7],
group_by=["PRODUCTKEY"],
).drop_columns(exclude_cols=["ORDERDATEKEY"])
sales_agg_drop.sql()
sales_agg_drop.preview()
Transpose unique values with pivots
Quickly generate pivot tables of your data.
sales_by_product = sales_dataset.pivot(
dimensions=['ORDERDATE'],
pivot_column='SALESAMOUNT',
value_column='PRODUCTKEY',
agg_method='SUM',
list_of_vals=['310', '345'],
)
sales_by_product.sql()
sales_by_product.preview()
Does any of my data get collected?
Rasgo will not collect any personal information. We log execution of methods in transforms.py
for success and failure so that we can more accurately track what's useful and what's problematic.
Where do I go for help?
If you have any questions please:
How can I contribute?
Review the contributors guide
License
RasgoQL uses the GNU AGPL license, as found in the LICENSE file.
This project is sponspored by RasgoML. Find out at https://www.rasgoml.com/