Sqlalchemy-databricks - SQLAlchemy dialect for Databricks

Overview

sqlalchemy-databricks

A SQLAlchemy Dialect for Databricks that uses the officially supported databricks-sql-connector DBAPI. Compatible with workspace clusters and SQL warehouses (SQL analytics endpoints). (Thanks to @Tom-Newton for pointing this out.)

Installation

Install using pip.

pip install sqlalchemy-databricks

Usage

Installing registers the databricks+connector dialect/driver with SQLAlchemy. Fill in the required information when building the engine URL. The HTTP path can point to either a workspace cluster or a SQL warehouse (SQL analytics endpoint).

from sqlalchemy import *
from sqlalchemy.engine import create_engine


engine = create_engine(
    "databricks+connector://token:<databricks_token>@<databricks_host>:443/<database_or_schema_name>",
    connect_args={"http_path": "<http_path>"}
)

logs = Table("my_table", MetaData(bind=engine), autoload=True)
print(select([func.count("*")], from_obj=logs).scalar())
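
As a quick connectivity check, here is a minimal sketch (same placeholders as above) that lists the tables visible to the engine via the SQLAlchemy inspector:

from sqlalchemy import create_engine, inspect

engine = create_engine(
    "databricks+connector://token:<databricks_token>@<databricks_host>:443/default",
    connect_args={"http_path": "<http_path>"},
)
# Should print the table names in the default schema
print(inspect(engine).get_table_names())
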
Comments
  • schema doesn't change from default

    sorry if this is a really dumb/misguided question. if it is, I am hoping you could point me to a place where I can learn more.

    I set up an engine like

    engine = create_engine(
        f"databricks+connector://token:{token}@{host}:443/{my_schema}",
        connect_args={"http_path": http_path},
    )

    but engine.table_names() and anything else I do with that engine only sees the tables in the default schema.

    I have to work around it by doing a schema translation, but that can't be the right way to do this, or is it?

    engine1 = engine.execution_options(schema_translate_map={ "default": my_schema })

    edit: whoops put some private data in there
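
    For reference, a sketch of the explicit-schema alternative, reusing the token, host, http_path, and my_schema variables from the snippet above. Whether this dialect honors the schema argument is exactly what this issue is about, so treat it as untested:

    from sqlalchemy import MetaData, Table, create_engine, inspect

    engine = create_engine(
        f"databricks+connector://token:{token}@{host}:443/default",
        connect_args={"http_path": http_path},
    )

    # Ask the inspector for a specific schema instead of the URL default
    print(inspect(engine).get_table_names(schema=my_schema))

    # Reflect a table qualified with that schema (SQLAlchemy 1.4 style)
    my_table = Table("my_table", MetaData(bind=engine), schema=my_schema, autoload=True)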

    opened by jonathan-dufault-kr 14
  • Add alembic support

    Following the instructions from https://groups.google.com/g/sqlalchemy-alembic/c/t3KmE9KDzH4/m/AK1UylnCCQAJ, making this dialect work with Alembic.
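
    For example, one pattern from those instructions is to register a minimal Alembic implementation for the dialect in env.py (a sketch assuming the dialect name is "databricks", as reported by SQLAlchemy's warnings):

    from alembic.ddl.impl import DefaultImpl

    class DatabricksImpl(DefaultImpl):
        # Defining (or importing) this subclass in env.py lets Alembic pick an
        # implementation for the "databricks" dialect instead of raising an error.
        __dialect__ = "databricks"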

    opened by harry19023 2
  • InvalidRequestError: Could not reflect: requested table(s)

    Summary

    We are trying to get data from Databricks into a pandas dataframe. We were able to get the list of all the tables, but when we try to actually query the data in one table, it fails. We are able to make it work using only databricks-sql-connector, but we'd prefer to use SQLAlchemy if possible.

    Data

    The tables are in Databricks under hive_metastore.my_schema which we access through a SQL warehouse.

    Code and Error

    The code below correctly returns all the tables (schema_name = "my_schema")

    from databricks import sql
    from sqlalchemy import *
    from sqlalchemy.engine import create_engine
    import pandas as pd
    
    engine = create_engine(
        f"""databricks+connector://token:{databricks_token}@{databricks_server_hostname}:443/{schema_name}""",
        connect_args={ "http_path": f"{databricks_http_path}",},
    )
    sql_query = "show tables"
    df = pd.read_sql(sql_query, engine)
    

    However when we change the line before last to:

    sql_query = "select * from my_table limit 100"
    

    We get the following error: InvalidRequestError: Could not reflect: requested table(s) not available in Engine(databricks+connector://token:***@****.azuredatabricks.net:443/my_schema): (select * from my_table limit 100)

    We tried to replace my_table by my_schema.my_table, hive_metastore.my_schema.my_table and we get the same error.

    Libraries

    Python 3.10.8
    pandas 1.5.1
    sqlalchemy-databricks 0.2.0
    sqlalchemy 1.4.39
    databricks-sql-connector 2.0.2

    Please let me know if I need to provide more details. Thanks for the help.
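
    One thing that may be worth trying (untested here) is wrapping the raw SQL in sqlalchemy.text() so pandas treats it as a query rather than a table name to reflect:

    import pandas as pd
    from sqlalchemy import text

    # Sketch: passing a text() clause makes pd.read_sql dispatch to read_sql_query,
    # which skips table reflection on the raw string.
    df = pd.read_sql(text("select * from my_table limit 100"), engine)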

    opened by colineRamee 1
  • Connection value examples

    Why?

    It is never easy to figure out exactly what value the conn string wants, and in what format. Examples help. Now we have examples.

    What Changes?

    .env.template now has examples and links to detailed databricks docs so new users can quickly get conn strings right :)

    How Does This Impact Us?

    One less silly thing to slow us down.

    opened by norton120 1
  • Example .env values?

    This is always a weirdly painful thing with Sqlalchemy dialects and seems like such a simple thing to make easier. I have yet to find 2 dialects that expect params like host/cluster/role/compute_warehouse in exactly the same format. If you were to provide dummy values of each in the .env file it would probably save millions of quietly lost person-hours trying to decipher things like "does this dialect want the host escaped? where does it want the account id, as a get value in the string or a conn param? what have I done with my life that I am re-running tests at 3am because I don't know if this conn string wants protocol prefixes?!?!?" and such.

    Also once/if I figure it out I'm happy to make a PR with the examples
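
    For illustration only, hypothetical dummy values in the conventional Databricks formats (the variable names, hostname, token, and warehouse ID below are all made up):

    DATABRICKS_HOST=dbc-a1b2c3d4-e5f6.cloud.databricks.com
    DATABRICKS_TOKEN=dapi0123456789abcdef0123456789abcdef
    DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/0123456789abcdef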

    opened by norton120 1
  • Update sample environment with example formats

    Description

    Updates the sample environment to show what format the host, token, and http path conventionally take. These are taken directly from the official Python connector docs. The change to the readme also reflects the new name SQL Warehouses (which replaced SQL Endpoints in June 2022).

    Related Tickets & Documents

    Closes #9

    opened by susodapop 0
  • Either set supports_statement_cache to True or False

    Hi,

    Many thanks for providing this dialect !

    I'm getting this warning when running queries:

    SAWarning: Dialect databricks:connector will not make use of SQL compilation caching as it does not set the 'supports_statement_cache' attribute to True. This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions. Dialect maintainers should seek to set this attribute to True after appropriate development and testing for SQLAlchemy 1.4 caching support. Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
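
    Until the dialect sets the attribute, a minimal application-side sketch for silencing just this warning (SQLAlchemy 1.4):

    import warnings
    from sqlalchemy import exc as sa_exc

    # Suppress only the statement-cache SAWarning; other warnings still surface.
    warnings.filterwarnings(
        "ignore",
        message=".*supports_statement_cache.*",
        category=sa_exc.SAWarning,
    )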

    opened by ebarault 0
  • [Question] passing http_path as query parameter

    Hi @crflynn ,

    Thanks for this wonderful tool.

    I was wondering if it is possible to pass the http_path as a query parameter, e.g.:

    SQLALCHEMY_DATABASE_URL="databricks+connector://token:<databricks_token>@<databricks_host>:443/<database_or_schema_name>?http_path=<MY_HTTP_PATH>"
    

    This is because I would like to standardize my application so that if in the future I want to use another database, I just have to change the URL. The problem I have at the moment is that I have to define connect_args in the create_engine call for Databricks, but for SQLite or MySQL I don't need that parameter, and it raises the error:

    TypeError: 'http_path' is an invalid keyword argument for this function

    This is when I run:

    engine = create_engine(
            settings.SQLALCHEMY_DATABASE_URL,
            echo=False,
            connect_args={"check_same_thread": False, "http_path": settings.DATABRICKS_HTTP_PATH},
        )
    

    and SQLALCHEMY_DATABASE_URL=sqlite:///./sql_app.db
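
    As a workaround sketch (build_engine is a hypothetical helper), the connect_args can be chosen based on the URL's backend so the same code path works for SQLite and Databricks:

    from sqlalchemy import create_engine
    from sqlalchemy.engine import make_url

    def build_engine(database_url, databricks_http_path=None):
        url = make_url(database_url)
        connect_args = {}
        if url.get_backend_name() == "databricks":
            # Only the Databricks dialect needs the http_path connection argument
            connect_args["http_path"] = databricks_http_path
        elif url.get_backend_name() == "sqlite":
            connect_args["check_same_thread"] = False
        return create_engine(database_url, echo=False, connect_args=connect_args)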

    opened by dtcMLOps 0
  • add custom type compiler, fix data type DATE

    PyHive maps the DATE data type to TIMESTAMP, which is a quite outdated way of mapping date types. Since Databricks launched after Hive 0.12, I would strongly recommend fixing the type mapping to DATE in the SQLAlchemy implementation for Databricks.

    See also: https://github.com/dropbox/PyHive/issues/139
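
    As a possible user-side workaround, a sketch using the SQLAlchemy compiler extension to override how Date renders for this dialect (assuming the dialect name is "databricks", as reported by SQLAlchemy's warnings):

    from sqlalchemy import Date
    from sqlalchemy.ext.compiler import compiles

    @compiles(Date, "databricks")
    def compile_date(element, compiler, **kw):
        # Emit DATE instead of the TIMESTAMP that the inherited PyHive compiler produces
        return "DATE"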

    opened by leo-schick 0
  • How to run queries in a non-blocking way?

    Hi there! Thanks for the contribution, the project is awesome and saved me countless hours of coding!

    I have a stupid question: is there an example of how to run queries in a non-blocking way serving data out of a fastapi async handler? Is that supported?

    If so, how would one do that, using something like this: https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html ?
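
    Since the underlying DBAPI is synchronous, one sketch (not the SQLAlchemy asyncio extension, and reusing the engine from the README example) is to push the blocking query into a worker thread from the async handler, e.g. with asyncio.to_thread on Python 3.9+:

    import asyncio
    from sqlalchemy import text

    async def fetch_rows(query: str):
        # Run the blocking SQLAlchemy call in a thread so the event loop stays free
        def _run():
            with engine.connect() as conn:
                return conn.execute(text(query)).fetchall()
        return await asyncio.to_thread(_run)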

    opened by thiagodelgado111 0
  • pandas to sql fails to create table if it does not exist

    When using the pandas to_sql function, it appears that if the table does not already exist, it won't be created. The expected behaviour is that the table should be created and the results inserted.

    Is there a known workaround, or is this going to be fixed soon?
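
    One possible workaround sketch (the table and column definitions here are hypothetical, and engine/df are assumed to exist) is to create the table explicitly first and then append:

    from sqlalchemy import Column, Integer, MetaData, String, Table

    metadata = MetaData()
    my_table = Table(
        "my_table",
        metadata,
        Column("id", Integer),
        Column("name", String(100)),
    )
    metadata.create_all(engine)  # issue CREATE TABLE up front
    df.to_sql("my_table", engine, if_exists="append", index=False)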

    opened by timpiperseek 1
  • Inconsistent schema name handling

    My table in Databricks is in a database, then a schema, so I reference it like database.schema.tablename. When using the to_sql method to insert from a Pandas dataframe, SQLAlchemy issues a DESCRIBE database.schema.tablename command before trying to insert the data into the table. To get the DESCRIBE command to work properly I have to set the 'schema' parameter on the to_sql method to 'database.schema'. However, when I do this, the INSERT statement looks like INSERT INTO TABLE database.schema.tablename. I think the backticks on database.schema make Databricks SQL look for a single schema or database with the name database.schema, instead of treating those as two separate things. Is this something that needs to be addressed here or is this a deeper SQLAlchemy issue? Thanks for your help!

    opened by clayton-bridge 0
  • Failure to connect

    I have Azure Databricks running in two regions (US and UK). I have been using the databricks-sql-connector successfully to connect to Hive metastore tables in each for some months. Currently the clusters are not running all the time, but if I use the Databricks connector it first spins up the appropriate cluster. I would love to be able to use SQLAlchemy instead, however. I tried your test setup in a Jupyter notebook, but get an OperationalError.

    import numpy as np
    import pandas as pd
    import os
    from sqlalchemy import *
    from sqlalchemy.engine import create_engine

    def test_dialect(host, http_path, token):
        engine = create_engine(
            f"databricks+connector://token:{token}@{host}:443/default",
            connect_args={"http_path": f"{http_path}"},
        )
        tables = inspect(engine).get_table_names()
        print(tables)

    region = "UK"
    server_hostname = os.getenv(f'DATABRICKS_{region}_HOSTNAME')
    http_path = os.getenv(f'DATABRICKS_{region}_HTTP')
    access_token = os.getenv(f'DATABRICKS_{region}_TOKEN')
    test_dialect(server_hostname, http_path, access_token)

    gives: OperationalError: (databricks.sql.exc.RequestError) Error during request to server (Background on this error at: https://sqlalche.me/e/14/e3q8)

    I do notice that this relies on PyHive, which is marked as unsupported.

    opened by alunap 0
Owner
Flynn