ClickHouse Python Driver with native interface support

Overview

ClickHouse Python Driver with native (TCP) interface support.

Asynchronous wrapper is available here: https://github.com/mymarilyn/aioch

Features

  • External data for query processing.
  • Query settings.
  • Compression support.
  • TLS support (since server version 1.1.54304).
  • Types support:
    • Float32/64
    • [U]Int8/16/32/64
    • Date/DateTime('timezone')/DateTime64('timezone')
    • String/FixedString(N)
    • Enum8/16
    • Array(T)
    • Nullable(T)
    • UUID
    • Decimal
    • IPv4/IPv6
    • LowCardinality(T)
    • SimpleAggregateFunction(F, T)
    • Tuple(T1, T2, ...)
    • Nested
  • Query progress information.
  • Block by block results streaming.
  • Reading query profile info.
  • Receiving server logs.
  • Multiple hosts support.
  • Python DB API 2.0 specification support.
  • Optional NumPy arrays support.

Documentation

Documentation is available at https://clickhouse-driver.readthedocs.io.

Usage

There are two ways to communicate with the server:

  • using pure Client;
  • using DB API.

Pure Client example:

>>> from clickhouse_driver import Client
>>>
>>> client = Client('localhost')
>>>
>>> client.execute('SHOW TABLES')
[('test',)]
>>> client.execute('DROP TABLE IF EXISTS test')
[]
>>> client.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
[]
>>> client.execute(
...     'INSERT INTO test (x) VALUES',
...     [{'x': 100}]
... )
1
>>> client.execute('INSERT INTO test (x) VALUES', [[200]])
1
>>> client.execute(
...     'INSERT INTO test (x) '
...     'SELECT * FROM system.numbers LIMIT %(limit)s',
...     {'limit': 3}
... )
[]
>>> client.execute('SELECT sum(x) FROM test')
[(303,)]
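
The compression and TLS features listed above are enabled through Client parameters. A minimal sketch (secure=True assumes a TLS-enabled server, and lz4 compression additionally requires the clickhouse-cityhash and lz4 packages):

>>> from clickhouse_driver import Client
>>>
>>> client = Client(
...     'localhost',
...     secure=True,        # TLS connection
...     verify=True,        # verify the server certificate
...     compression='lz4',  # requires clickhouse-cityhash and lz4
...     settings={'max_block_size': 100000},  # per-query settings
... )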

DB API example:

>>> from clickhouse_driver import connect
>>>
>>> conn = connect('clickhouse://localhost')
>>> cursor = conn.cursor()
>>>
>>> cursor.execute('SHOW TABLES')
>>> cursor.fetchall()
[('test',)]
>>> cursor.execute('DROP TABLE IF EXISTS test')
>>> cursor.fetchall()
[]
>>> cursor.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
>>> cursor.fetchall()
[]
>>> cursor.executemany(
...     'INSERT INTO test (x) VALUES',
...     [{'x': 100}]
... )
>>> cursor.rowcount
1
>>> cursor.executemany('INSERT INTO test (x) VALUES', [[200]])
>>> cursor.rowcount
1
>>> cursor.execute(
...     'INSERT INTO test (x) '
...     'SELECT * FROM system.numbers LIMIT %(limit)s',
...     {'limit': 3}
... )
>>> cursor.rowcount
0
>>> cursor.execute('SELECT sum(x) FROM test')
>>> cursor.fetchall()
[(303,)]

License

ClickHouse Python Driver is distributed under the MIT license.

Comments
  • Fix null value on bytestring columns

    When the client setting strings_as_bytes is set, the driver crashes when inserting None values into columns of type Nullable([Fixed]String):

      File "clickhouse_driver/client.py", line 142, in execute
        query_id=query_id, types_check=types_check
      File "clickhouse_driver/client.py", line 246, in process_insert_query
        self.send_data(sample_block, data, types_check=types_check)
      File "clickhouse_driver/client.py", line 270, in send_data
        self.connection.send_data(block)
      File "clickhouse_driver/connection.py", line 429, in send_data
        self.block_out.write(block)
      File "clickhouse_driver/streams/native.py", line 41, in write
        self.fout, types_check=block.types_check)
      File "clickhouse_driver/columns/service.py", line 84, in write_column
        column.write_data(items, buf)
      File "clickhouse_driver/columns/base.py", line 79, in write_data
        self._write_data(items, buf)
      File "clickhouse_driver/columns/base.py", line 83, in _write_data
        self.write_items(prepared, buf)
      File "clickhouse_driver/columns/stringcolumn.py", line 120, in write_items
        items_buf_view[buf_pos:buf_pos + min(length, value_len)] = value
    TypeError: a bytes-like object is required, not 'str'
    

    This is likely because the method ByteString.prepare_null is simply String.prepare_null, and returns the empty string '' instead of the empty bytes b''. I've been able to fix it with this PR, but I'm not familiar at all with this project, so maybe it's not the right way to do it. Let me know if I can help :)

    opened by vivienm 43
  • fallback for os_name if user name is not defined

    Got this error while running inside a Docker container (no user entry for such a uid):

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/client.py", line 119, in execute
        columnar=columnar
      File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/client.py", line 188, in process_ordinary_query
        self.connection.send_query(query, query_id=query_id)
      File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/connection.py", line 432, in send_query
        client_info = ClientInfo(self.client_name)
      File "/usr/local/lib/python3.7/site-packages/clickhouse_driver/clientinfo.py", line 38, in __init__
        self.os_user = getpass.getuser()
      File "/usr/local/lib/python3.7/getpass.py", line 169, in getuser
        return pwd.getpwuid(os.getuid())[0]
    KeyError: 'getpwuid(): uid not found: 1001'
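
    The proposed change amounts to wrapping getpass.getuser() in a fallback. A minimal sketch of the idea (not necessarily the exact patch):

    import getpass
    import os

    def get_os_user():
        # getpass.getuser() raises KeyError when the uid has no passwd
        # entry, which is common in minimal Docker containers.
        try:
            return getpass.getuser()
        except KeyError:
            return os.environ.get('USER', '')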
    
    opened by wawaka 33
  • Add max_partitions_per_insert_block to settings.available

    The max_partitions_per_insert_block setting is defined in: https://github.com/yandex/ClickHouse/blob/f566182582c70986be19777b3583c803607928ad/dbms/src/Core/Settings.h#L315

    Regarding the data type, I have submitted the pull request https://github.com/yandex/ClickHouse/pull/5028 because it is used elsewhere in the code as size_t rather than as a boolean.

    opened by mohammad7t 24
  • How to speed up inserts from pandas dataframe?

    I have a pandas dataframe on my laptop with a few million records. I am inserting them into a ClickHouse table with: client.execute('insert into database.table (col1, col2…, coln) values', df.values.tolist())

    After executing this command, I looked at the laptop's network activity.

    [screenshot: laptop network activity during the insert]

    As you can see, network activity peaks at 12 Mbps with lows at 6 Mbps. This activity goes on for quite a long time, and then at one moment the laptop's network send rate goes up to 100 Mbps for a short period and the insert is over.

    Can someone explain how inserts in clickhouse-driver work? Why is the data not going to the ClickHouse server at top network speed?

    I tried to play with settings like max_insert_block_size or insert_block_size, but with no success. Are there any ClickHouse server parameters that could improve insert speed?

    What would be the fastest way to insert a pandas dataframe into a ClickHouse table?
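
    For reference, later driver versions can insert directly from a dataframe with NumPy support enabled. A sketch (insert_dataframe and the use_numpy setting are in the driver's docs; the table name here is from the question, and throughput still depends on block sizes and the network):

    import pandas as pd
    from clickhouse_driver import Client

    client = Client('localhost', settings={'use_numpy': True})

    df = pd.DataFrame({'col1': range(1000), 'col2': range(1000)})

    # Inserts column-wise from the dataframe; the driver splits the data
    # into blocks itself, so no manual tolist() conversion is needed.
    client.insert_dataframe('INSERT INTO database.table VALUES', df)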

    opened by ghuname 22
  • Enum option parsing is not handling all supported characters correctly

    When querying a table with Enum options containing a comma and a space, the parsing of the options fails (see below).

    With an example table as

    CREATE TABLE default.test_table (
    	id UInt64,  
    	datetime DateTime,  
    	action_type Enum8('first' = 1, 'test_rename = 3, ' = 2, 'dele\'te' = 3, 'to"uch' = 4)
    ) ENGINE = MergeTree
    

    The options are a bit non-standard but do seem to be permitted. (I kind of created these options by accident due to a typo in a query and then figured out that the parsing could be improved.)

    While testing the original parsing, I've also noticed that it does not handle whitespace before the first option; it gets prepended to the first option. E.g., Enum8( 'one' = 1, 'exa"mple' = 2, 'three' = 3) is turned into {" 'one": 1, 'exa"mple': 2, 'three': 3}, which doesn't seem right either.

    Traceback (most recent call last):
      File "/home/.../test_clickhouse_integration.py", line 92, in <module>
        .limit(5)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2843, in all
        return list(self)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2995, in __iter__
        return self._execute_and_instances(context)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3018, in _execute_and_instances
        result = conn.execute(querycontext.statement, self._params)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
        return meth(self, multiparams, params)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
        context)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
        util.reraise(*exc_info)
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
        raise value
      File "/home/.../venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
        context)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_sqlalchemy/drivers/base.py", line 492, in do_execute
        cursor.execute(statement, parameters, context=context)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_sqlalchemy/drivers/native/connector.py", line 125, in execute
        external_tables=external_tables, settings=settings
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/client.py", line 119, in execute
        columnar=columnar
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/client.py", line 192, in process_ordinary_query
        columnar=columnar)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/client.py", line 42, in receive_result
        return result.get_result()
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/result.py", line 39, in get_result
        for packet in self.packet_generator:
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/client.py", line 54, in packet_generator
        packet = self.receive_packet()
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/client.py", line 68, in receive_packet
        packet = self.connection.receive_packet()
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 334, in receive_packet
        packet.block = self.receive_data()
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 393, in receive_data
        block = self.block_in.read()
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/streams/native.py", line 80, in read
        self.fin)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 67, in read_column
        column = get_column_by_spec(column_spec, column_options=column_options)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/columns/service.py", line 45, in get_column_by_spec
        return create_enum_column(spec, column_options)
      File "/home/.../venv/lib/python3.6/site-packages/clickhouse_driver/columns/enumcolumn.py", line 65, in create_enum_column
        value = int(param[pos + 1:].lstrip(' ='))
    ValueError: invalid literal for int() with base 10: 'test_rename = 3'
    

    Do you agree that it makes sense to fix the options parsing? Should I add some tests for it?

    In addition, I've added escaping of single quotes in the generated error message.
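
    For illustration, a quote-aware parser along the lines this issue suggests could look like the following (a hypothetical sketch, not the driver's actual implementation; parse_enum_options is an invented helper):

    import re

    # Match 'name' = value pairs, allowing escaped quotes (\') inside names.
    ENUM_OPTION_RE = re.compile(r"'((?:[^'\\]|\\.)*)'\s*=\s*(-?\d+)")

    def parse_enum_options(spec):
        """Parse the inside of an Enum8(...)/Enum16(...) spec into a dict."""
        inner = spec[spec.index('(') + 1:spec.rindex(')')]
        return {
            name.replace("\\'", "'"): int(value)
            for name, value in ENUM_OPTION_RE.findall(inner)
        }

    >>> parse_enum_options("Enum8('first' = 1, 'test_rename = 3, ' = 2)")
    {'first': 1, 'test_rename = 3, ': 2}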

    opened by sochi 18
  • Get progress info

    It seems the Progress packets are received and handled, but there is no way to get the info from the Client or Connection objects. Here is an API proposal with fetch* methods, which is common in database APIs.

    # quick way
    c = Client('server')
    c.execute("select * from table")
    result = c.fetchall()

    # get progress (and trash the results for an easy example)
    c = Client('server')
    c.execute("select * from verybigtable")
    while c.fetchsome():
        print(c.progress)
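
    For reference, current driver versions expose query progress through execute_with_progress. A sketch based on the documented API:

    from clickhouse_driver import Client

    client = Client('localhost')

    progress = client.execute_with_progress('SELECT * FROM verybigtable')
    for num_rows, total_rows in progress:
        # total_rows is the server's estimate and may be 0 if unknown
        print(num_rows, total_rows)

    rows = progress.get_result()  # waits for and returns the final result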
    
    opened by elrik75 17
  • Last query’s profile info

    Hi, is there an easy way to read

    • The number of rows in the result set,
    • the time passed,
    • and the average speed of query processing

    that the ClickHouse command-line client displays at the end of processing a query?

    For example:
    21 rows in set. Elapsed: 2.172 sec. Processed 2.78 million rows, 33.33 MB (1.28 million rows/s., 15.34 MB/s.)
    

    I have tried reading the query profile info, but I could only get:

    • rows, i.e. the number of rows in the result set
    • bytes of the result set

    I am missing three more measurements:

    1. elapsed time
    2. read_rows
    3. read_bytes

    Question:

    How would you use client.last_query.profile_info to get these three?

    BTW1: I can process system.query_log, but this is a cumbersome approach. BTW2: This could be a nice feature to have, i.e. an option to display profile info from Client.execute().
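
    For reference, the driver exposes these measurements through client.last_query after a query completes. A sketch based on the documented attributes:

    from clickhouse_driver import Client

    client = Client('localhost')
    client.execute('SELECT sum(number) FROM numbers(1000000)')

    last = client.last_query
    print(last.elapsed)            # wall-clock seconds spent in execute()
    print(last.progress.rows)      # rows read by the server
    print(last.progress.bytes)     # bytes read by the server
    print(last.profile_info.rows)  # rows in the result set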

    opened by healiseu 14
  • Wrong DateTime insert

    Hi!

    After inserting datetime.datetime(2018, 1, 19, 10) through this driver, I see the value '2018-01-19 13:00:00' in the table. The timezone on both my computer and the ClickHouse server is Moscow.

    What must I do to see '2018-01-19 10:00:00' after the insert?
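
    One way to make the intended wall-clock time unambiguous is to insert a timezone-aware datetime. A sketch (assumes a table t with a DateTime column ts, and pytz installed):

    from datetime import datetime
    import pytz
    from clickhouse_driver import Client

    client = Client('localhost')

    # Attach the intended timezone explicitly instead of relying on the
    # client/server local-timezone guess.
    moscow = pytz.timezone('Europe/Moscow')
    dt = moscow.localize(datetime(2018, 1, 19, 10))
    client.execute('INSERT INTO t (ts) VALUES', [(dt,)])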

    opened by v-v-vishnevskiy 14
  • expected Hello or Exception, got Unknown packet

    Describe the bug: the client throws this error when running queries.

    UnexpectedPacketFromServerError           Traceback (most recent call last)
    <ipython-input-6-977623b45362> in <module>
          4 
          5 client = Client("localhost")
    ----> 6 client.execute("select 123")
    
    /usr/local/lib/python3.9/site-packages/clickhouse_driver-0.2.1-py3.9-macosx-11-x86_64.egg/clickhouse_driver/client.py in execute(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
        269         start_time = time()
        270         self.make_query_settings(settings)
    --> 271         self.connection.force_connect()
        272         self.last_query = QueryInfo()
        273 
    
    /usr/local/lib/python3.9/site-packages/clickhouse_driver-0.2.1-py3.9-macosx-11-x86_64.egg/clickhouse_driver/connection.py in force_connect(self)
        211     def force_connect(self):
        212         if not self.connected:
    --> 213             self.connect()
        214 
        215         elif not self.ping():
    
    /usr/local/lib/python3.9/site-packages/clickhouse_driver-0.2.1-py3.9-macosx-11-x86_64.egg/clickhouse_driver/connection.py in connect(self)
        317 
        318             try:
    --> 319                 return self._init_connection(host, port)
        320 
        321             except socket.timeout as e:
    
    /usr/local/lib/python3.9/site-packages/clickhouse_driver-0.2.1-py3.9-macosx-11-x86_64.egg/clickhouse_driver/connection.py in _init_connection(self, host, port)
        293 
        294         self.send_hello()
    --> 295         self.receive_hello()
        296 
        297         self.block_in = self.get_block_in_stream()
    
    /usr/local/lib/python3.9/site-packages/clickhouse_driver-0.2.1-py3.9-macosx-11-x86_64.egg/clickhouse_driver/connection.py in receive_hello(self)
        437             message = self.unexpected_packet_message('Hello or Exception',
        438                                                      packet_type)
    --> 439             raise errors.UnexpectedPacketFromServerError(message)
        440 
        441     def ping(self):
    
    UnexpectedPacketFromServerError: Code: 102. Unexpected packet from server None:None (expected Hello or Exception, got Unknown packet)
    

    To Reproduce

    from clickhouse_driver import Client
    
    client = Client("localhost")
    client.execute("select 123")
    

    Versions: Python 3.9.6, clickhouse-driver built from commit 78e389e36d20744c236c546ee01ee76d5bc5fb35, ClickHouse server version 21.10.1 revision 54449.
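
    A common cause of this particular error (worth ruling out, though not confirmed as the cause here) is pointing the native-protocol driver at ClickHouse's HTTP port 8123 instead of the native TCP port 9000:

    from clickhouse_driver import Client

    # The native driver speaks ClickHouse's TCP protocol, served on port
    # 9000 by default; port 8123 serves the HTTP protocol and answers
    # with bytes the driver cannot parse as a Hello packet.
    client = Client("localhost", port=9000)
    client.execute("select 123")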

    opened by leizha 13
  • Feature request: Extend columnar form to support NumPy / PyArrow arrays

    Hi Konstantin, I noticed that when the columnar parameter is enabled during query execution, ClickHouse raw binary data is deserialized into Python tuples. Wouldn't it be great if you could match NumPy data types with ClickHouse data types and deserialize into NumPy arrays instead?

    As far as I understand, there are two ways to do this: either turn the Python tuples into NumPy arrays, if possible with zero copy, or do the transformation directly on the binary data.

    The bonus is that a further zero-copy transformation from NumPy arrays to PyArrow arrays can then be done easily. This way we gain two significant advantages:

    • Use PyArrow for batch processing and zero-copy transformations to tables and pandas dataframes; it's blazing fast and memory efficient.

    • It opens the sky for the Arrow Flight protocol (gRPC-based), which can be great for transferring data at high speed from remote servers.

    I would also like to use columnar form with NumPy arrays for my project, and I am offering to help with testing.
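
    For reference, optional NumPy support along these lines was later added to the driver. A sketch of the documented usage:

    from clickhouse_driver import Client

    # Requires the NumPy/pandas extras: pip install clickhouse-driver[numpy]
    client = Client('localhost', settings={'use_numpy': True})

    # Results come back as NumPy-backed columns / a pandas DataFrame
    # instead of Python tuples.
    df = client.query_dataframe('SELECT number FROM numbers(10)')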

    opened by healiseu 12
  • Support python 3.11?

    Describe the bug: can't install clickhouse-driver with Python 3.11.

    To Reproduce pip3.11 install clickhouse-driver

    Expected behavior: installation succeeds. Instead, errors like these appear:

          building 'clickhouse_driver.bufferedreader' extension
          creating build/temp.linux-x86_64-cpython-311
          creating build/temp.linux-x86_64-cpython-311/clickhouse_driver
          x86_64-linux-gnu-gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/mnt/python-ve/ck/include -I/usr/include/python3.11 -c clickhouse_driver/bufferedreader.c -o build/temp.linux-x86_64-cpython-311/clickhouse_driver/bufferedreader.o
          clickhouse_driver/bufferedreader.c:209:12: fatal error: longintrepr.h: No such file or directory
            209 |   #include "longintrepr.h"
                |            ^~~~~~~~~~~~~~~
          compilation terminated.
          error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
    

    Versions

    • clickhouse-driver 0.24
    • Python 3.11
    opened by wenbinf 11
  • EOFError: Unexpected EOF while reading bytes

    Describe the bug: when I execute_iter a big query, I receive this error.

    Traceback

     File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 197, in iter_receive_result
        for rows in result:
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/result.py", line 114, in __next__
        packet = next(self.packet_generator)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 204, in packet_generator
        packet = self.receive_packet()
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 218, in receive_packet
        packet = self.connection.receive_packet()
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/connection.py", line 502, in receive_packet
        packet.block = self.receive_data(may_be_use_numpy=True)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/connection.py", line 577, in receive_data
        return reader.read(use_numpy=use_numpy)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/streams/native.py", line 75, in read
        self.fin, use_numpy=use_numpy)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/service.py", line 144, in read_column
        return col.read_data(n_items, buf)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/arraycolumn.py", line 50, in read_data
        return self._read(rows, buf)[0]
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/arraycolumn.py", line 140, in _read
        n_items, buf, nulls_map=nulls_map
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/base.py", line 105, in _read_data
        items = self.read_items(n_items, buf)
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/base.py", line 146, in read_items
        return s.unpack(buf.read(s.size))
      File "clickhouse_driver/bufferedreader.pyx", line 40, in clickhouse_driver.bufferedreader.BufferedReader.read
      File "clickhouse_driver/bufferedreader.pyx", line 251, in clickhouse_driver.bufferedreader.CompressedBufferedReader.read_into_buffer
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/streams/compressed.py", line 87, in read_block
        method_byte, compressed_hash, extra_header_size
      File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/compression/base.py", line 74, in get_decompressed_data
        compressed = BytesIO(self.stream.read(compressed_size))
      File "clickhouse_driver/bufferedreader.pyx", line 40, in clickhouse_driver.bufferedreader.BufferedReader.read
      File "clickhouse_driver/bufferedreader.pyx", line 240, in clickhouse_driver.bufferedreader.BufferedSocketReader.read_into_buffer
    EOFError: Unexpected EOF while reading bytes
    

    Notes: if I disable compression, I receive the same final traceback lines:

    File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/columns/base.py", line 146, in read_items
        return s.unpack(buf.read(s.size))
      File "clickhouse_driver/bufferedreader.pyx", line 40, in clickhouse_driver.bufferedreader.BufferedReader.read
      File "clickhouse_driver/bufferedreader.pyx", line 240, in clickhouse_driver.bufferedreader.BufferedSocketReader.read_into_buffer
    EOFError: Unexpected EOF while reading bytes
    

    If max_block_size > the query's number of rows, the problem disappears.

    Settings: {'max_block_size': 10000, 'max_query_size': 1048576}, with_column_types=True, compression='lz4'

    Versions: clickhouse-driver==0.2.5, clickhouse-cityhash==1.0.2.4, ClickHouse server version 22.11.2.30, Python 3.6.

    opened by Noordle 2
  • Feature/err codes as enum

    I find the variable name assigned to each error code useful for my project. Using Enum, one can get both the name and its matching error code. Additionally, using the unique decorator ensures that no error code is duplicated.
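
    A minimal sketch of the idea (the error-code names below are from ClickHouse's ErrorCodes; the class shape is illustrative, not the PR's exact code):

    from enum import IntEnum, unique

    @unique  # raises ValueError at class creation if a code is duplicated
    class ErrorCodes(IntEnum):
        UNSUPPORTED_METHOD = 1
        UNSUPPORTED_PARAMETER = 2
        UNEXPECTED_END_OF_FILE = 3

    # Both the symbolic name and the numeric code are available:
    assert ErrorCodes.UNSUPPORTED_METHOD.name == 'UNSUPPORTED_METHOD'
    assert ErrorCodes(2) is ErrorCodes.UNSUPPORTED_PARAMETER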

    Checklist:

    • [x] Add tests that demonstrate the correct behavior of the change. Tests should fail without the change.
    • [x] Add or update relevant docs, in the docs folder and in code.
    • [x] Ensure PR doesn't contain untouched code reformatting: spaces, etc.
    • [x] Run flake8 and fix issues.
    • [x] Run pytest; no tests failed. See https://clickhouse-driver.readthedocs.io/en/latest/development.html.
    opened by NI1993 3
  • fix bytes: if conversion fails, use original text

    • fixes #

    Checklist:

    • [x] Add tests that demonstrate the correct behavior of the change. Tests should fail without the change.
    • [x] Add or update relevant docs, in the docs folder and in code.
    • [x] Ensure PR doesn't contain untouched code reformatting: spaces, etc.
    • [x] Run flake8 and fix issues.
    • [x] Run pytest; no tests failed. See https://clickhouse-driver.readthedocs.io/en/latest/development.html.
    opened by nian0114 1
  • Coalesce to default value for notnull type when inserting

    Describe the bug: occurs when the column type is non-nullable, e.g. LowCardinality(String) DEFAULT 'NULL'.

    To Reproduce: minimal piece of Python code that reproduces the problem.

        client.execute("drop table if exists testnn")
        client.execute("create table testnn (x String DEFAULT 'aaa') Engine=Memory")
        client.execute("insert into testnn VALUES", [('3')])
        client.execute("insert into testnn VALUES", [{'x': 'yee'}])
        print(list(client.execute_iter('select * from testnn')))
        client.execute("insert into testnn VALUES", [{'x': None}])
        client.execute("insert into testnn VALUES", [(None)])
    

    Expected behavior: the insert should succeed, using the column's default value.

    Versions

    • clickhouse-driver version: 0.2.4
    • ClickHouse server version (SELECT version()): 22.6.1.1985
    • Python version: 3.9
    bug 
    opened by spff 1
  • Feature Request: Support pandas StringArray and ArrowStringArray

    pandas has three dtypes for storing strings: object, string, and string[pyarrow].

    According to the user guide:

    We recommend using StringDtype to store text data.

    ([docs] v1.5 arrays-string)

    In my use case, it uses 2 GiB with object and 400 MiB with string[pyarrow], saving a lot of memory.

    But insert_dataframe supports neither string nor string[pyarrow] (v0.2.4).

    Minimal, Reproducible Example

    create table test (
        text String
    ) ENGINE = MergeTree()
    ORDER BY (text)
    
    import pandas as pd
    from clickhouse_driver import Client
    
    client = Client(
        host="localhost",
        settings={"use_numpy": True},
    )
    client.insert_dataframe(
        "INSERT INTO test VALUES",
        dataframe=pd.DataFrame(
            ["test"],
            columns=["text"],
            dtype="string[pyarrow]",
        ),
    )
    

    It would be great if string and string[pyarrow] were supported. In the meantime, a possible workaround is sketched below.
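
    A possible interim workaround (a sketch, untested across pandas versions): cast the column back to object dtype just before inserting, which insert_dataframe handles.

    df = pd.DataFrame(["test"], columns=["text"], dtype="string[pyarrow]")
    client.insert_dataframe(
        "INSERT INTO test VALUES",
        dataframe=df.astype({"text": object}),  # downcast to object dtype
    )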

    Thank you!

    feature request 
    opened by Simon-Chenzw 2
  • Timestamp columns with Float64 dtype are treated as local time

    Describe the bug: when I insert a df with a timestamp column into a table, if the dtype of that column is Float64, it is treated as local time, and the value inserted into ClickHouse gets a wrong offset (8 hours in my case).

    To Reproduce

    from clickhouse_driver import Client
    import pandas as pd
    
    port = 9000
    client = Client(
        host='192.168.1.160',
        settings={
            'use_numpy': True},
        port=9000,
        user='default',
        password='')
    
    sql1 = '''
    
    drop table if exists dt;
    '''
    sql2 = '''
    CREATE TABLE dt
    (
        `timestamp` DateTime64(9),
        `event_id` UInt8
    )
    ENGINE = TinyLog;
    '''
    client.execute(sql1)
    client.execute(sql2)
    data = [
        (1659099540588999936, 1),
        (1659099540588999936, 2),
        # ('2022-07-29 20:59:00.588999936', 3),
        (1659099540588999936.0, 2),
    
    ]
    
    df = pd.DataFrame(data)
    df.columns = ['timestamp', 'event_id']
    df.info()
    res = client.insert_dataframe('INSERT INTO dt VALUES', df)
    
    df2 = client.query_dataframe('SELECT *,toUnixTimestamp64Nano(timestamp) FROM dt;')
    print(df2)
    
    output:
                       timestamp  event_id  toUnixTimestamp64Nano_timestamp_
    0 2022-07-29 12:59:00.588998         1               1659070740588998912
    1 2022-07-29 12:59:00.588998         2               1659070740588998912
    2 2022-07-29 12:59:00.588998         2               1659070740588998912
    

    Expected behavior: I expect the timestamp value not to be shifted by 8 hours.

    Apparently, it treated my UTC time as UTC+8. The expected output would be:

                       timestamp  event_id  toUnixTimestamp64Nano_timestamp_
    0 2022-07-29 20:59:00.588999         1               1659099540588999936
    1 2022-07-29 20:59:00.588999         2               1659099540588999936
    2 2022-07-29 20:59:00.588999         2               1659099540588999936
    
    (1659099540588999936-1659070740588998912)/1e9/3600
    Out[28]: 8.000000000284444
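
    A possible workaround (a sketch; whether timezone-aware columns round-trip exactly depends on the driver and pandas versions): make the timestamps timezone-aware before inserting, so there is no local-time interpretation left to the driver.

    # Continuing the repro above: interpret the raw values explicitly as
    # UTC nanosecond timestamps before handing them to the driver.
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ns', utc=True)
    res = client.insert_dataframe('INSERT INTO dt VALUES', df)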
    

    Versions

    • numpy==1.19.0
    • pandas==1.0.5
    • clickhouse-driver==0.2.2
    • ClickHouse server version==21.8.4.51
    • Python version==3.7
    opened by zhouwuji 0