bearsql
Bearsql adds SQL syntax on top of pandas dataframes. It uses DuckDB as the SQL engine and to speed up pandas processing.
- Free software: MIT license
- Documentation: https://bearsql.readthedocs.io.
Basic Usage
To use bearsql in a project:
    from bearsql import SqlContext
    import pandas as pd

    sc = SqlContext()
    # The statement above creates a DuckDB instance in memory. Once the session
    # ends, the database is erased and nothing is persisted.
    # To persist the database, instantiate SqlContext with a database file:
    # sc = SqlContext(database='.db')

    df = pd.DataFrame([{'name': 'John Doe', 'city': 'New York', 'age': 24},
                       {'name': 'Jane Doe', 'city': 'Chicago', 'age': 27}])

    # Create a table from the pandas dataframe
    sc.register_table(df, 'testable')  # use your own table name instead of 'testable'

    # Query the table and output a pandas dataframe
    results = sc.sql('select * from testable', output='df')
    output_df = next(results)
    print(output_df)

    # Query the table and output a pyarrow table
    results = sc.sql('select * from testable', output='arrow')
    output_arrow_table = next(results)
    print(output_arrow_table)

    # Query the table and output raw tuples
    results = sc.sql('select * from testable', output='any')
    output_rows = next(results)
    print(output_rows)
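The `next(results)` calls above suggest that `sc.sql` returns an iterator rather than a finished result. Assuming that is the case (the source does not state it outright), the following sketch shows how such an iterator behaves. `lazy_sql` is a hypothetical stand-in written for illustration only; it is not bearsql code and runs no real SQL.

```python
from typing import Iterator, List

executed: List[str] = []  # records which statements have actually run


def lazy_sql(script: str) -> Iterator[str]:
    """Hypothetical stand-in for a lazy query runner (not part of bearsql)."""
    for statement in script.split(';'):
        statement = statement.strip()
        if statement:
            # The side effect happens only when the caller asks for this item.
            executed.append(statement)
            yield f"result of: {statement}"


results = lazy_sql("select 1; select 2")
assert executed == []   # creating the generator runs nothing yet

first = next(results)   # runs the first statement only
rest = list(results)    # drains every remaining statement

print(first)
print(rest)
```

If bearsql behaves this way, a statement run only for its side effects still has to be consumed, for example with `list(...)`, before it takes effect.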
Create a relation from a dataframe and apply some operations to it:
    rel = sc.relation(df, 'new_relation')  # use your own relation name instead of 'new_relation'
    print(rel.filter('age > 24'))
    # OR convert to df:
    rel.filter('age > 24').df()
Export the data to the filesystem:
    result = sc.sql('EXPORT DATABASE \'\' (FORMAT PARQUET);')  # FORMAT can be either PARQUET or CSV
    list(result)
For more examples, please visit https://github.com/duckdb/duckdb/blob/master/examples/python/duckdb-python.py
Features
- TODO
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.