Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

Related tags

python sql database
Overview

dataset: databases for lazy people

In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files.

Read the docs

To install dataset, fetch it with pip:

$ pip install dataset

Note: as of version 1.0, dataset is split into two packages, with the data export features now extracted into a stand-alone package, datafreeze. See the relevant repository here.

Issues
  • Enhancement

    This PR adds the following:

    • Table.delete now returns True or False, which handles the case of trying to delete a row/column that does not exist.
    • Adds Table.__repr__.
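
    A small sketch of how the proposed behaviour would be used (the table name and rows below are made up for illustration):

    import dataset

    db = dataset.connect('sqlite:///:memory:')
    table = db['users']
    table.insert({'name': 'Alice'})

    # With this change, delete() reports whether anything actually matched.
    assert table.delete(name='Alice')       # the row existed and was removed
    assert not table.delete(name='Bob')     # nothing matched, nothing deleted

    # __repr__ gives a readable summary of the table object.
    print(repr(table))
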
    enhancement 
    opened by cli248 13
  • sqlite3 OperationalError

    Hello there,

    I always get the following error

    Traceback (most recent call last):
      File "./compile_results.py", line 58, in <module>
        for i in table_send:
      File "/usr/local/lib/python2.7/dist-packages/dataset/persistence/util.py", line 73, in next
        return self.next()
      File "/usr/local/lib/python2.7/dist-packages/dataset/persistence/util.py", line 67, in next
        if not self._next_chunk():
      File "/usr/local/lib/python2.7/dist-packages/dataset/persistence/util.py", line 58, in _next_chunk
        chunk = self.result_proxy.fetchmany(self.step)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 997, in fetchmany
        self.cursor, self.context)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
        exc_info
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb)
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 990, in fetchmany
        l = self.process_rows(self._fetchmany_impl(size))
      File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/result.py", line 907, in _fetchmany_impl
        return self.cursor.fetchmany(size)
    sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) SQL logic error or missing database

    when I try to access row number 5000 of my sqlite database.

    I am importing two databases into memory with about 60000 rows, which works. I can also access row 5000 (here id 10000) directly without error. However, as soon as the for loop reaches this position, the whole Python program crashes with the error shown above. So it seems the database is working and received the imported data correctly (otherwise I could not access it before the for loop), but something strange happens at that point. Any ideas?

    Thank you very much for your help and great product :)

    import json
    import dataset

    # import json from sender into memory database
    db_send = dataset.connect('sqlite:///:memory:')
    table_send = db_send['record']
    with open(sendfile) as send:
        s = json.load(send)
    table_send.insert_many(s["results"])
    send.close()
    
    # import json from receiver into memory database
    db_recv = dataset.connect('sqlite:///:memory:')
    table_recv = db_recv['record']
    with open(recvfile) as recv:
        r = json.load(recv)
    table_recv.insert_many(r["results"])
    recv.close()
    
    # init variables for matching counter
    count_search=0
    count_found=0
    
    #print len(table_send)
    #test = table_send.find_one(id=9998)
    #print test['id']
    #test = table_send.find_one(id=10000)
    #print test['id']
    #test = table_send.find_one(id=10002)
    #print test['id']
    
    # iterate through sender results, set file as sent
    # match checksum of sent files to received files
    # if matches, include time details of receiver and set received
    # update results
    for i in table_send:
            print i['id']
            i['file_sent'] = 1
            count_search += 1
            res_recv = table_recv.find_one(filechecksum=i['filechecksum'])
            i['file_sent'] = 1
            if res_recv != None:
                    count_found += 1
                    #print i['filechecksum'] + ':' + res_recv['filechecksum']
                    i['r_time_start'] = res_recv['time_start']
                    i['r_time_end'] = res_recv['time_end']
                    i['r_time'] = res_recv['time']
                    i['file_received'] = 1
            table_send.update(i,['id'])
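
    A likely cause (my guess, not confirmed by the maintainers) is that the loop UPDATEs the same table it is iterating over, which invalidates the open SQLite cursor feeding the result set. A workaround sketch, reusing the variables from the snippet above, is to snapshot the rows before updating:

    # Materialize the rows first so the updates below don't touch the cursor
    # that would otherwise be feeding the iteration.
    rows = list(table_send)
    for i in rows:
        i['file_sent'] = 1
        res_recv = table_recv.find_one(filechecksum=i['filechecksum'])
        if res_recv is not None:
            i['r_time_start'] = res_recv['time_start']
            i['r_time_end'] = res_recv['time_end']
            i['r_time'] = res_recv['time']
            i['file_received'] = 1
        table_send.update(i, ['id'])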
    
    opened by nmaas87 12
  • How do I silence the "DEBUG:dataset.persistence…" messages?

    I'm getting logging messages appearing in my script output. How do I silence them? And why are they even there in the first place?

    >>> import dataset
    >>> dt = dataset.connect('sqlite:///:memory:')
    >>> table = dt['test']
    DEBUG:dataset.persistence.database:Creating table: test on Engine(sqlite:///:memory:)
    >>> table.insert({"id": "bv56fzi", "name": "This will fail"})
    DEBUG:dataset.persistence.table:Creating column: name (<class 'sqlalchemy.types.UnicodeText'>) on 'test'
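
    dataset emits these messages through the standard logging module, so the usual way to quiet them (a sketch, not an official dataset API) is to raise the level of its logger:

    import logging

    # Hide DEBUG output from dataset's internal loggers
    # (e.g. dataset.persistence.database, dataset.persistence.table).
    logging.getLogger('dataset').setLevel(logging.WARNING)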
    
    opened by zarino 12
  • Implement handling of iterator rows

    Updated the insert_many and update_many functions so that an iterator of rows is handled properly, as mentioned by @pudo in my last pull request (#298). It may be a bit over-complicated, so it may require modification.
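
    For context, a common way to handle an iterator of rows without materializing it all at once (a generic sketch of the technique, not this PR's actual code) is to consume it in fixed-size chunks:

    import dataset
    from itertools import islice

    def chunked(rows, size=1000):
        # Yield lists of at most `size` rows from any iterable or iterator.
        it = iter(rows)
        while True:
            chunk = list(islice(it, size))
            if not chunk:
                break
            yield chunk

    db = dataset.connect('sqlite:///:memory:')
    table = db['numbers']

    # A generator works too, because we never need len() on the input.
    rows = ({'n': i, 'square': i * i} for i in range(10000))
    for chunk in chunked(rows, size=1000):
        table.insert_many(chunk)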

    opened by abmyii 10
  • Support for storing Numpy arrays as LargeBinary objects

    It is useful to be able to store Numpy arrays as large binary objects in a database. Here is an implementation that uses Numpy's native binary format to convert an array to a binary string and put it into the database. When reading entries from the database, a function checks whether the binary string starts with \x93NUMPY (the first 6 bytes of any serialized numpy array) and converts the string back to a numpy array.
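
    A rough sketch of the round-trip described above (an illustration of the approach, not the PR's actual code):

    import io
    import numpy as np

    NUMPY_MAGIC = b'\x93NUMPY'  # first bytes of numpy's .npy binary format

    def array_to_bytes(arr):
        # Serialize an array using numpy's native binary format.
        buf = io.BytesIO()
        np.save(buf, arr, allow_pickle=False)
        return buf.getvalue()

    def maybe_bytes_to_array(value):
        # Convert back only if the blob looks like a serialized numpy array.
        if isinstance(value, bytes) and value.startswith(NUMPY_MAGIC):
            return np.load(io.BytesIO(value), allow_pickle=False)
        return value

    blob = array_to_bytes(np.arange(5))
    restored = maybe_bytes_to_array(blob)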

    opened by davidkleiven 10
  • Close connection automatically when with is used

    If I connect to a database using the with statement like this:

    with dataset.connect(path_to_database) as db:
        pass
    

    Does the connection to the database close automatically once we are outside the scope of the with statement, or should we still close it explicitly?

    opened by ritviksahajpal 10
  • db close

    I'm not seeing a close method on the database API. What's the proper process to handle a close? Thanks for your efforts with this.

    opened by valbanese 9
  • connection schema handling on inserts

    On my PostgreSQL database (9.3.2), the quickstart examples work when no connection schema is specified:

    >>> import dataset
    >>> db = dataset.connect()
    >>> table = db['person']
    >>> print(table.columns)
    ['id']
    >>> table.insert(dict(name='John Doe', age=46))
    1
    

    However, when a connection schema is specified, the created table is not found on insert - it seems to look in the current search_path instead:

    >>> import dataset
    >>> db = dataset.connect(schema="myschema")
    >>> table = db['person']
    >>> print(table.columns)
    ['id']
    >>> table.insert(dict(name='John Doe', age=46))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 66, in insert
        self._ensure_columns(row, types=types)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 198, in _ensure_columns
        self.create_column(column, _type)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 224, in create_column
        Column(name, type)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/operations.py", line 365, in add_column
        schema=schema
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/ddl/impl.py", line 127, in add_column
        self._exec(base.AddColumn(table_name, column, schema=schema))
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/ddl/impl.py", line 76, in _exec
        conn.execute(construct, *multiparams, **params)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1651, in execute
        return connection.execute(statement, *multiparams, **params)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 717, in execute
        return meth(self, multiparams, params)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 67, in _execute_on_connection
        return connection._execute_ddl(self, multiparams, params)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 771, in _execute_ddl
        compiled
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 927, in _execute_context
        context)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1076, in _handle_dbapi_exception
        exc_info
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 920, in _execute_context
        context)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 425, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.ProgrammingError: (ProgrammingError) relation "person" does not exist
     'ALTER TABLE person ADD COLUMN age INTEGER' {}
    
    opened by smnorris 9
  • connecting times out when connecting to a db with an "@" in the password.

    This might be a sqlalchemy thing, but when I try to connect to a postgres db with the following (example) url:

    import dataset
    db_url = "postgres://drsuess:[email protected]@123.435.123.1:5432/lorax"
    db = dataset.connect(db_url)
    

    The connection attempt just freezes indefinitely. I tried changing the password and it remedied the situation, but you might want to add a check / warning for this.
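
    The usual remedy is to percent-encode the password before building the URL (a sketch; the credentials and host below are the placeholders from the example above):

    from urllib.parse import quote_plus

    import dataset

    # '@' and other special characters must be percent-encoded, otherwise
    # everything after the first '@' is parsed as the host name.
    password = quote_plus('p@ssw0rd')  # -> 'p%40ssw0rd'
    db_url = 'postgresql://drsuess:%s@123.435.123.1:5432/lorax' % password
    db = dataset.connect(db_url)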

    opened by abelsonlive 9
  • Fix #90: added `with` statement support to the Database class.

    This PR adds support for the with statement to the Database class. Basic usage:

    with dataset.connect('...') as db:
        db['...'].insert()
    

    If an exception is raised within the with block, all changes are rolled back and the exception is re-raised. Otherwise, all changes are committed upon leaving the with block.

    Tests are added as well.

    This PR partially fixes the issue https://github.com/pudo/dataset/issues/90.
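
    A small usage sketch of the semantics described above (table and row contents are made up):

    import dataset

    db = dataset.connect('sqlite:///:memory:')
    db['people'].insert({'name': 'Alice'})  # committed immediately

    try:
        with db:
            db['people'].insert({'name': 'Bob'})
            raise RuntimeError('abort this transaction')  # triggers rollback
    except RuntimeError:
        pass

    # Bob was rolled back, Alice is still there.
    assert db['people'].count(name='Bob') == 0
    assert db['people'].count(name='Alice') == 1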

    opened by victorkashirin 8
  • Fixed connection to database not closing when context manager exits

    Currently, when a database connection made with a context manager exits, the connection to the database isn't actually closed.

    For example, the following code will cause a stock MySQL database to lock up in about 15-20 seconds when it hits 151 simultaneous open connections:

    import time
    
    import dataset
    
    while True:
        with dataset.connect("mysql://root:[email protected]/database") as db:
            table = db["tickets"]
            ticket = table.find_one(user_id=0000000000000000)
    
        time.sleep(0.1)
    
    opened by Snaacky 0
  • Add support for decimal/numeric and varchar types

    Financial datasets require decimal/numeric(16,8)
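
    A possible stop-gap until first-class support lands, assuming create_column accepts raw SQLAlchemy type objects (dataset's built-in db.types values are exactly that):

    import dataset
    from sqlalchemy import Numeric, Unicode

    db = dataset.connect('sqlite:///:memory:')
    table = db.create_table('trades')

    # NUMERIC(16, 8) for prices, VARCHAR(12) for the ticker symbol.
    table.create_column('price', Numeric(16, 8))
    table.create_column('symbol', Unicode(12))

    table.insert({'price': 0.00012345, 'symbol': 'BTCUSD'})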

    opened by tumikosha 0
  • UPSERT_MANY creates another index

    Hey!

    I have a table with uuid as a primary key (see the screenshot of my table's indexes below). I want to upsert_many(items, keys=["uuid"]) into this table. As a result, another index on the same uuid column is created by dataset.

    I expect dataset to use the already existing primary key index instead of creating another one that is essentially the same.

    [screenshot: the table's existing indexes]

    opened by ohld 0
  • Add selecting json function into find method

    Related to #262. Hi pudo, I have implemented a JSON selection feature that can be combined with other find parameters. Here are some usage examples:

    # Note: the selected key's type follows the type of the given value,
    #       e.g. if an integer is given but the stored value is a float, it is cast to an integer.
    # Supported operations: >(gt), <(lt), >=(gte), <=(lte), =(==,is), !=(<>, not), between("..")
    # id json_column
    # 0  {"key":-0.5}
    # 1  {"key":0.5}
    # 2  {"key":1.5}
    results = table.find(_json={'json_column':{'key':{'>=': 0.0, '<':1.0}}}) # id = [1]
    results = table.find(_json={'json_column':{'key':{'>=': 0, '<':1}}}) # int(-0.5)==0, id = [0,1]
    
    # id json_column
    # 0  [0,1,2]
    # 1  [0,0.5,1]
    # 2  [0]
    # find rows by index
    results = table.find(_json={'json_column':{1:{'>=': 0.0, '<':1.0}}}) # id = [1]
    
    # id json_column
    # 0  {"key1":{"key2":-1}}
    # 1  {"key1":{"key2":0.5}}
    # find rows by path
    results = table.find(_json={'json_column':{('key1','key2'):{'between':[0.0,1.0]}}}) # id = [1]
    

    However, the SQLite version available through the current standard library does not support these JSON functions, so I have no idea how to add a unit test.

    opened by Remalloc 0
  • TIMESTAMPTZ Datatype for PostgreSQL

    Hi, how can I create a column with the TIMESTAMPTZ datatype (in PostgreSQL)?

    I tried using this: table.create_column('datetime', db.types.TIMESTAMPTZ)

    But it gives this error: AttributeError: 'Types' object has no attribute 'TIMESTAMPTZ'

    What is the right way?

    Thanks.
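
    One approach that may work is to pass SQLAlchemy's PostgreSQL dialect type directly, assuming create_column accepts raw SQLAlchemy types (the connection URL below is a placeholder):

    import dataset
    from sqlalchemy.dialects.postgresql import TIMESTAMP

    db = dataset.connect('postgresql:///mydb')
    table = db['events']

    # TIMESTAMP(timezone=True) maps to PostgreSQL's TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ).
    table.create_column('datetime', TIMESTAMP(timezone=True))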

    opened by nagabrahmam-mantha 1
  • Add default replacement keys parameter for insert and upsert methods

    Hi! I've implemented the ability to use default values for the keys parameter of the insert* and upsert* methods. This task is related to an earlier suggestion. Can you please check the code?

    opened by dshepelev15 4
  • [Feature request] Allow to use '..' or 'between' for 'datetime'

    Currently between doesn't work for 'date' and 'datetime' types.
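
    For reference, this is the kind of query the request is about, using find's operator syntax (a sketch of the desired usage; per this report it does not currently work for date/datetime columns):

    from datetime import datetime

    import dataset

    db = dataset.connect('sqlite:///:memory:')
    table = db['events']
    table.insert({'title': 'kickoff', 'created': datetime(2021, 3, 1)})

    # Desired: range queries on datetime columns via the 'between' operator.
    results = table.find(created={'between': [datetime(2021, 1, 1),
                                              datetime(2021, 6, 30)]})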

    opened by karolzlot 0
  • provide option for special characters in column/table names

    #160 is quite old, but I wanted to bring it up again because I just ran up against it.

    I'm hacking around with a bibtex-to-sqlite converter to ingest Mendeley- and Zotero-managed bibtex files of citation collections and emit a database from them. Mendeley, in the great wisdom common to all the works of Elsevier, has injected fields into the bibtex entries with names like mendeley-tags. dataset currently guards against this, so we get:

    File "/home/gvoysey/.cache/pypoetry/virtualenvs/bibtex-to-sqlite-uwnieG41-py3.8/lib/python3.8/site-packages/dataset/util.py", line 75, in normalize_column_name
        raise ValueError("%r is not a valid column name." % name)
    ValueError: 'mendeley-tags' is not a valid column name.
    

    I'd like to be able to pass a no_i_really_mean_it_ruin_my_schema_give_me_the_footgun flag in this case to allow column IDs to be created that will subsequently have to be quoted in SELECTs etc. I'm not sure where I'd want to pass it, but possibly in .create_table().

    In my limited case, permitting weird column names is fine with respect to dataset's table.find() method, because the database I'm making will never see Python again, but I understand the reluctance to expose the footgun given in #160. Still, for parity with the original data, I think it's worthwhile to make it passable. Right now I'm doing the extremely hacky thing of:

    val = mendeley_entry.pop('mendeley-tags')
    mendeley_entry['mendeleytags'] = val
    

    and that's a little distasteful.

    opened by gvoysey 0
  • Support for unique indexes and multicolumn indexes

    Could you add the ability to read existing unique indexes and multicolumn indexes from the current database model, so that they do not have to be listed each time in insert(keys) and upsert(keys)?
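
    One possible building block for this (a sketch, assuming the Database object exposes its SQLAlchemy engine as db.engine) is SQLAlchemy's inspector, which can already enumerate existing indexes and their columns:

    import dataset
    from sqlalchemy import inspect

    db = dataset.connect('sqlite:///:memory:')
    db['people'].insert({'email': 'a@example.org'})

    # Each index is reported with its name, column names and uniqueness flag,
    # which is the information needed to pick default keys for insert()/upsert().
    inspector = inspect(db.engine)
    for index in inspector.get_indexes('people'):
        print(index['name'], index['column_names'], index['unique'])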

    enhancement 
    opened by vladiscripts 1
Owner
Friedrich Lindenberg
Data and software engineer, investigative support.
Related projects
SQLModel is a library for interacting with SQL databases from Python code, with Python objects.

SQLModel is a library for interacting with SQL databases from Python code, with Python objects. It is designed to be intuitive, easy to use, highly compatible, and robust.

Sebastián Ramírez 5.5k Oct 24, 2021
The ormar package is an async mini ORM for Python, with support for Postgres, MySQL, and SQLite.

python async mini orm with fastapi in mind and pydantic validation

null 674 Oct 23, 2021
Rich Python data types for Redis

Created by Stephen McDonald Introduction HOT Redis is a wrapper library for the redis-py client. Rather than calling the Redis commands directly from

Stephen McDonald 273 Sep 2, 2021
A pure Python Database Abstraction Layer

pyDAL pyDAL is a pure Python Database Abstraction Layer. It dynamically generates the SQL/noSQL in realtime using the specified dialect for the databa

null 397 Oct 23, 2021
Pony Object Relational Mapper

Downloads Pony Object-Relational Mapper Pony is an advanced object-relational mapper. The most interesting feature of Pony is its ability to write que

null 2.7k Oct 25, 2021
Tortoise ORM is an easy-to-use asyncio ORM inspired by Django.

Tortoise ORM was built with relations in mind and admiration for the excellent and popular Django ORM. It is engraved in its design that you are working not just with tables, you work with relational data.

Tortoise 2.4k Oct 20, 2021
A pythonic interface to Amazon's DynamoDB

PynamoDB A Pythonic interface for Amazon's DynamoDB. DynamoDB is a great NoSQL service provided by Amazon, but the API is verbose. PynamoDB presents y

null 1.7k Oct 23, 2021
A Python Library for Simple Models and Containers Persisted in Redis

Redisco Python Containers and Simple Models for Redis Description Redisco allows you to store objects in Redis. It is inspired by the Ruby library Ohm

sebastien requiem 434 Oct 1, 2021
An async ORM. 🗃

ORM The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite. ORM is built with: SQLAlchemy core for query building. d

Encode 1.4k Oct 11, 2021
Pydantic model support for Django ORM

Pydantic model support for Django ORM

Jordan Eremieff 189 Oct 23, 2021
Piccolo - A fast, user friendly ORM and query builder which supports asyncio.

A fast, user friendly ORM and query builder which supports asyncio.

null 566 Oct 20, 2021
A very simple CRUD class for SQLModel! ✨

Base SQLModel A very simple CRUD class for SQLModel! ✨ Inspired on: Full Stack FastAPI and PostgreSQL - Base Project Generator FastAPI Microservices I

Marcelo Trylesinski 12 Oct 22, 2021
a small, expressive orm -- supports postgresql, mysql and sqlite

peewee Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use. a small, expressive ORM p

Charles Leifer 8.7k Oct 25, 2021
A Python Object-Document-Mapper for working with MongoDB

MongoEngine Info: MongoEngine is an ORM-like layer on top of PyMongo. Repository: https://github.com/MongoEngine/mongoengine Author: Harry Marr (http:

MongoEngine 3.6k Oct 24, 2021
MongoEngine flask extension with WTF model forms support

Flask-MongoEngine Info: MongoEngine for Flask web applications. Repository: https://github.com/MongoEngine/flask-mongoengine About Flask-MongoEngine i

MongoEngine 788 Oct 14, 2021
Adds SQLAlchemy support to Flask

Flask-SQLAlchemy Flask-SQLAlchemy is an extension for Flask that adds support for SQLAlchemy to your application. It aims to simplify using SQLAlchemy

The Pallets Projects 3.6k Oct 22, 2021
Beanie - is an Asynchronous Python object-document mapper (ODM) for MongoDB

Beanie - is an Asynchronous Python object-document mapper (ODM) for MongoDB, based on Motor and Pydantic.

Roman 277 Oct 18, 2021