Mongita is to MongoDB as SQLite is to SQL

Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface. Mongita differs from MongoDB in that instead of being a server, Mongita is a self-contained Python library. Mongita can be configured to store its documents either on disk or in memory.

"Mongita is to MongoDB as SQLite is to SQL"

Mongita is in active development. Please report any bugs. Mongita is free and open source. You can contribute!

Applications

  • Embedded database: Mongita is a good alternative to SQLite for embedded applications when a document database makes more sense than a relational one.
  • Unit testing: Mocking PyMongo/MongoDB is a pain. Worse, mocking can hide real bugs. By monkey-patching PyMongo with Mongita, unit tests can be more faithful while remaining isolated.

Design goals

  • MongoDB compatibility: Mongita implements a commonly-used subset of the PyMongo API. This allows projects to be started with Mongita and later upgraded to MongoDB once they reach an appropriate scale.
  • Embedded/self-contained: Mongita does not require a server or start a process. It is just a Python library. To use it, just add import mongita to the top of your script.
  • Speed: Mongita is comparable to or faster than both MongoDB and SQLite in 10k-document benchmarks. See the Performance section below.
  • Well tested: Mongita has 100% test coverage and more test code than library code.
  • Limited dependencies: Mongita runs anywhere that Python runs. Currently the only dependencies are pymongo (for bson) and sortedcontainers (for faster indexes).
  • Thread-safe: (EXPERIMENTAL) Mongita avoids race conditions by isolating certain document modification operations.
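Mongita's own thread safety is marked experimental. An application that shares one client or collection across threads and wants to be conservative can serialize calls itself. The sketch below is a hypothetical, generic wrapper (`LockedProxy` is not part of Mongita) that funnels every method call on a target object through one `threading.RLock`. It does nothing for process safety.

```python
import threading


class LockedProxy:
    """Hypothetical wrapper: forwards method calls to `target`, one at a time.

    Every callable attribute is wrapped so that calls coming from
    different threads are serialized through a single shared lock.
    """

    def __init__(self, target, lock=None):
        self._target = target
        self._lock = lock if lock is not None else threading.RLock()

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            # Only one thread at a time may run a method on the target.
            with self._lock:
                return attr(*args, **kwargs)

        return locked_call


class Counter:
    """Stand-in for a shared resource with a non-atomic read-modify-write."""

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value
        self.value = current + 1


counter = LockedProxy(Counter())
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 8000
```

Values returned from a wrapped call, such as a cursor from find(), are not themselves wrapped, so iterating them happens outside the lock in this sketch.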

When NOT to use Mongita

  • You need a traditional server/client relationship: Mongita is an embedded database. It is not process-safe. When you have multiple clients, a traditional server/client database is the correct choice.
  • You run a lot of uncommon commands: Mongita implements a commonly used subset of MongoDB. While the goal is to eventually implement most of it, it will take some time to get there.
  • You need extreme performance: Mongita has comparable performance to MongoDB and SQLite for common operations. However, you may find bottlenecks, especially with uncommon operations.

Installation

pip3 install mongita

Hello world

>>> from mongita import MongitaClientDisk
>>> client = MongitaClientDisk()
>>> hello_world_db = client.hello_world_db
>>> mongoose_types = hello_world_db.mongoose_types
>>> mongoose_types.insert_many([{'name': 'Meerkat', 'not_into': 'Snakes'},
...                             {'name': 'Yellow mongoose', 'eats': 'Termites'}])
InsertManyResult()
>>> mongoose_types.count_documents({})
2
>>> mongoose_types.update_one({'name': 'Meerkat'}, {'$set': {"weight": 2}})
UpdateResult()
>>> mongoose_types.find({'weight': {'$gt': 1}})
Cursor()
>>> list(mongoose_types.find({'weight': {'$gt': 1}}))
[{'_id': 'a1b2c3d4e5f6', 'name': 'Meerkat', 'not_into': 'Snakes', 'weight': 2}]
>>> mongoose_types.delete_one({'name': 'Meerkat'})
DeleteResult()

Performance

[Benchmark charts: Inserts and access; Finds; Updates and deletes; Cold start]
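For anyone who wants to reproduce the SQLite side of a 10k-document benchmark, a minimal sketch using only the standard library follows. It assumes each document is stored whole as a JSON text column keyed by `_id`; the table layout, document shape, and `make_docs`/`bench_sqlite_json` names are illustrative assumptions, not the project's benchmark code, and no timings are claimed here.

```python
import json
import sqlite3
import time
import uuid


def make_docs(n):
    # Illustrative documents; the real benchmark data may differ.
    return [
        {"_id": uuid.uuid4().hex, "name": f"doc{i}", "count": i,
         "tags": {"even": i % 2 == 0}}
        for i in range(n)
    ]


def bench_sqlite_json(docs):
    """Insert docs as JSON text into an in-memory table, then read them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (_id TEXT PRIMARY KEY, body TEXT)")

    start = time.perf_counter()
    with conn:  # commits once at the end of the block
        conn.executemany(
            "INSERT INTO docs (_id, body) VALUES (?, ?)",
            ((d["_id"], json.dumps(d)) for d in docs),
        )
    insert_s = time.perf_counter() - start

    start = time.perf_counter()
    loaded = [json.loads(body) for (body,) in conn.execute("SELECT body FROM docs")]
    read_s = time.perf_counter() - start
    conn.close()
    return insert_s, read_s, loaded


docs = make_docs(10_000)
insert_s, read_s, loaded = bench_sqlite_json(docs)
print(f"insert: {insert_s:.3f}s  read: {read_s:.3f}s  docs: {len(loaded)}")
```

Storing the whole document as one JSON value keeps the SQLite side schema-free, which is closer to what a document store does than mapping every field to its own column.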

API

Refer to the PyMongo docs for detailed syntax and behavior. Most named keyword parameters are not implemented. When something is not implemented, efforts are made to be loud and obvious about it.
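One consequence: as listed under Collection, find() takes (filter, sort, limit), so a PyMongo-style projection passed as the second argument is interpreted as a sort. A workaround is to project on the client side after the query runs. The `project` helper below is hypothetical and handles only simple top-level inclusion projections.

```python
def project(doc, fields, include_id=True):
    """Hypothetical helper: keep only the top-level `fields` of `doc`.

    Mimics a simple MongoDB inclusion projection such as {"name": 1};
    dotted paths and exclusion projections are not handled.
    """
    keep = set(fields)
    if include_id:
        keep.add("_id")
    # Preserve the document's original key order.
    return {k: v for k, v in doc.items() if k in keep}


docs = [
    {"_id": 1, "name": "Meerkat", "not_into": "Snakes", "weight": 2},
    {"_id": 2, "name": "Yellow mongoose", "eats": "Termites"},
]
# With Mongita, `docs` would be a cursor such as coll.find({}).
print([project(d, ["name"]) for d in docs])
# [{'_id': 1, 'name': 'Meerkat'}, {'_id': 2, 'name': 'Yellow mongoose'}]
```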

mongita.MongitaClientMemory / mongita.MongitaClientDisk (PyMongo docs)

mongita.MongitaClient.close()
mongita.MongitaClient.list_database_names()
mongita.MongitaClient.list_databases()
mongita.MongitaClient.drop_database(name_or_database)

Database (PyMongo docs)

mongita.Database.list_collection_names()
mongita.Database.list_collections()
mongita.Database.drop_collection(name_or_collection)

Collection (PyMongo docs)

mongita.Collection.insert_one(document)
mongita.Collection.insert_many(documents, ordered=True)
mongita.Collection.find_one(filter, sort)
mongita.Collection.find(filter, sort, limit)
mongita.Collection.replace_one(filter, replacement, upsert=False)
mongita.Collection.update_one(filter, update)
mongita.Collection.update_many(filter, update)
mongita.Collection.delete_one(filter)
mongita.Collection.delete_many(filter)
mongita.Collection.count_documents(filter)
mongita.Collection.distinct(key, filter)
mongita.Collection.create_index(keys)
mongita.Collection.drop_index(index_or_name)
mongita.Collection.index_information()
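create_index(keys) builds a single-field index. Conceptually, such an index maps each field value to the ids of the documents holding it. The sketch below assumes plain dicts and sets (not Mongita's actual sortedcontainers-based structures) and shows how two single-field indexes could be intersected to answer an equality query on both fields. This is index intersection in its simplest form; `build_index` and `find_ids_eq` are hypothetical names.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each (hashable) value of `field` to the set of _ids having it."""
    index = defaultdict(set)
    for doc in docs:
        if field in doc:
            index[doc[field]].add(doc["_id"])
    return index


def find_ids_eq(indexes, query):
    """Intersect per-field id sets for an equality query like {"a": 1, "b": 2}.

    `indexes` maps field name -> index from build_index; every queried field
    must be indexed. An empty query matches everything in MongoDB, but this
    sketch simply returns an empty set for it.
    """
    id_sets = [indexes[field].get(value, set()) for field, value in query.items()]
    if not id_sets:
        return set()
    # Start from the smallest set so the intersection does the least work.
    id_sets.sort(key=len)
    return set.intersection(*id_sets)


docs = [
    {"_id": 1, "city": "Oslo", "kind": "meerkat"},
    {"_id": 2, "city": "Oslo", "kind": "mongoose"},
    {"_id": 3, "city": "Lima", "kind": "meerkat"},
]
indexes = {"city": build_index(docs, "city"), "kind": build_index(docs, "kind")}
print(find_ids_eq(indexes, {"city": "Oslo", "kind": "meerkat"}))  # {1}
```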

Cursor (PyMongo docs)

mongita.Cursor.sort(key_or_list, direction=None)
mongita.Cursor.next()
mongita.Cursor.limit(limit)
mongita.Cursor.close()
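The Cursor methods listed above do not include skip(). Because a cursor is iterable, one workaround for pagination is to slice the iteration on the client side with `itertools.islice`. The `page` helper below is hypothetical and works on any iterable, including a sorted cursor returned by find().

```python
from itertools import islice


def page(iterable, page_number, page_size):
    """Hypothetical helper: return one page (0-based) from any iterable.

    Items before the page are still read and discarded, which is the
    work a native skip() could avoid.
    """
    if page_number < 0 or page_size <= 0:
        raise ValueError("page_number must be >= 0 and page_size > 0")
    start = page_number * page_size
    return list(islice(iterable, start, start + page_size))


# Stand-in for a sorted cursor, e.g. the result of find() with a sort.
rows = iter(range(25))
print(page(rows, 2, 10))  # [20, 21, 22, 23, 24]
```

Pages are only consistent if the underlying query uses a stable sort order, and the cost of reaching page N grows with N.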

CommandCursor (PyMongo docs)

mongita.CommandCursor.next()
mongita.CommandCursor.close()

errors (PyMongo docs)

mongita.errors.MongitaError (parent class of all errors)
mongita.errors.PyMongoError (alias of MongitaError)
mongita.errors.InvalidOperation
mongita.errors.OperationFailure
mongita.errors.DuplicateKeyError
mongita.errors.MongitaNotImplementedError

results (PyMongo docs)

mongita.results.InsertOneResult
mongita.results.InsertManyResult
mongita.results.UpdateResult
mongita.results.DeleteResult

Currently implemented query operators

$eq
$gt
$gte
$in
$lt
$lte
$ne
$nin
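A rough sketch of what these operators mean for one field, following MongoDB's documented rule that when a field holds an array, a condition matches if any element (or the array as a whole) satisfies it. It is a hypothetical illustration, not Mongita's implementation. It checks `$in` membership with `==` instead of hashing, so list-valued fields never hit an "unhashable type: 'list'" error. Missing fields match `$ne`, `$nin` and `{"f": None}`, as in MongoDB.

```python
import operator

_MISSING = object()  # sentinel for "field not present in the document"

_COMPARE = {"$gt": operator.gt, "$gte": operator.ge,
            "$lt": operator.lt, "$lte": operator.le}


def _candidates(value):
    """Values a condition is tested against: array elements plus the array itself."""
    return value + [value] if isinstance(value, list) else [value]


def _eq(value, target):
    if value is _MISSING:
        return target is None  # {"f": None} also matches a missing field
    return any(c == target for c in _candidates(value))


def _in(value, targets):
    # Membership via ==, so list-valued fields never need to be hashed.
    return any(_eq(value, t) for t in targets)


def _safe_compare(fn, a, b):
    try:
        return fn(a, b)
    except TypeError:
        return False  # incomparable types (e.g. str vs int) don't match


def field_matches(value, ops):
    """Does one field value satisfy every operator in {"$op": arg, ...}?"""
    for op, arg in ops.items():
        if op == "$eq":
            ok = _eq(value, arg)
        elif op == "$ne":
            ok = not _eq(value, arg)
        elif op == "$in":
            ok = _in(value, arg)
        elif op == "$nin":
            ok = not _in(value, arg)
        elif op in _COMPARE:
            ok = value is not _MISSING and any(
                _safe_compare(_COMPARE[op], c, arg) for c in _candidates(value))
        else:
            raise NotImplementedError(op)
        if not ok:
            return False
    return True


def doc_matches(doc, filter_):
    """Match a flat filter like {"weight": {"$gt": 1}} or {"name": "Meerkat"}."""
    for field, cond in filter_.items():
        value = doc.get(field, _MISSING)
        is_ops = isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond)
        if not field_matches(value, cond if is_ops else {"$eq": cond}):
            return False
    return True


docs = [
    {"name": "Meerkat", "names": ["asd", "zxc"], "weight": 2},
    {"name": "Yellow mongoose", "names": ["rty"]},
]
print([d["name"] for d in docs if doc_matches(d, {"names": {"$in": ["asd", "qwe"]}})])
# ['Meerkat']
print([d["name"] for d in docs if doc_matches(d, {"weight": {"$ne": 2}})])
# ['Yellow mongoose']
```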

Currently implemented update operators

$set
$inc
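A minimal sketch of these update semantics applied to one document, for top-level fields only. `$push` is not in the list above; it is included purely as a hypothetical illustration of how another operator would slot in. `apply_update` is not a Mongita function.

```python
import copy


def apply_update(doc, update):
    """Hypothetical sketch: return a copy of `doc` with `update` applied.

    Supports top-level fields only; dotted paths like "a.b" are not handled.
    """
    new = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                new[field] = arg
            elif op == "$inc":
                # A missing field is treated as 0, as in MongoDB.
                new[field] = new.get(field, 0) + arg
            elif op == "$push":
                # Not implemented in Mongita; shown only to illustrate semantics.
                target = new.setdefault(field, [])
                if not isinstance(target, list):
                    raise TypeError(f"cannot $push to non-array field {field!r}")
                target.append(arg)
            else:
                raise NotImplementedError(op)
    return new


doc = {"name": "Meerkat", "weight": 2}
updated = apply_update(doc, {"$inc": {"weight": 1},
                             "$set": {"eats": "Insects"},
                             "$push": {"tags": "social"}})
print(updated)
# {'name': 'Meerkat', 'weight': 3, 'eats': 'Insects', 'tags': ['social']}
print(doc)  # unchanged: {'name': 'Meerkat', 'weight': 2}
```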

Contributing

Mongita is an excellent project for open source contributors. There is a lot to do and it is easy to get started. In particular, the following tasks are high in priority:

You are welcome to email me at [email protected] if you are interested.

License

BSD 3-clause. Mongita is free and open source for any purpose with basic restrictions related to liability, warranty, and endorsement.

History

Mongita was started as a component of the fastmap server. Fastmap offloads and parallelizes arbitrary Python functions on the cloud.

Similar projects

  • TinyMongo: Python library. Attempts to replicate the MongoDB interface.
  • MontyDb: Python library. Attempts to replicate the MongoDB interface.
  • UnQLite: Embedded NoSQL with Python bindings. Does not attempt to replicate the MongoDB interface. Very popular.
  • NeDB: JavaScript library. Attempts to replicate the MongoDB interface. Very popular.
Comments
  • Connection with mongoengine

    This is a really awesome lib. I wanted to see if I am able to start a db, and use it with mongoengine. I'm not sure if it requires an actual connection.

    opened by rocky-holms 13
  • MongoEngine + mongita

    While it is possible to use MongoEngine 0.22.1 with mongita-1.1.0 MongitaClientMemory:

    import pymongo
    import mongita
    #  This works
    pymongo.MongoClient = mongita.MongitaClientMemory
    import mongoengine
    mongoengine.connect(host='c:/temp/mongita')
    

    It does not work with MongitaClientDisk:

    import pymongo
    import mongita
    # This fails
    pymongo.MongoClient = mongita.MongitaClientDisk
    import mongoengine
    mongoengine.connect(host='c:/temp/mongita')
    

    The MongitaClientDisk constructor always fails.

    Not clear if this is just a version mismatch with mongoengine or not. The error also affects the unit tests for mongita.

    The problem seems to be in mongita_client.py in the MongitaClientDisk constructor where it invokes:

    disk_engine.DiskEngine.create(host)
    

    This is because, according to https://pymongo.readthedocs.io/en/stable/api/pymongo/mongo_client.html, the host parameter of the MongoClient() constructor can be a list rather than a string, and a list is what arrives here. Since the DiskEngine.create() factory method expects a string, it reports the error: "unhashable type: 'list'"

    The simple fix is to check, in the MongitaClientDisk constructor in mongita_client.py, whether the parameter is a list and, if so, take its first element. Since mongoengine's default host is 'localhost', it is also worth checking for that value and replacing it with DEFAULT_STORAGE_DIR.

    While we are at it, a check for the existence of the parent directory where the database is to be located is worthwhile. This leaves us with:

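        # Assumes `os` is imported at module level in mongita_client.py
        def __init__(self, host=DEFAULT_STORAGE_DIR, **kwargs):
            # mongoengine may pass host as a list; use its first entry
            if isinstance(host, (list, tuple)):
                host = host[0] if host else None
            host = host or DEFAULT_STORAGE_DIR
            # mongoengine's default host is 'localhost'; map it to the default dir
            if host == 'localhost':
                host = DEFAULT_STORAGE_DIR

            # The parent directory of the database path must already exist
            parent = os.path.dirname(host)
            if parent and not os.path.isdir(parent):
                raise NotADirectoryError(parent)

            self.engine = disk_engine.DiskEngine.create(host)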
    

    I'd be happy to generate a pull request, but I'd like to know that the pre-existing unit tests work on some system before requesting a pull for a fix that may be out of date.

    opened by iraytrace 6
  • $in operator fails on array fields

    According to https://docs.mongodb.com/manual/reference/operator/query/in/#mongodb-query-op.-in

    If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (for example, <value1>, <value2>, and so on).

    mongita raises an error in this case:

    Traceback (most recent call last):
      File "test.py", line 57, in test_mongita
        result = list(col.find({"names": {"$in": ["asd", "qwe"]}}))
      File "Python38\lib\site-packages\mongita\cursor.py", line 56, in __iter__
        for el in self._gen():
      File "Python38\lib\site-packages\mongita\collection.py", line 870, in __find
        for doc_id in self.__find_ids(filter, sort, limit, metadata=metadata):
      File "Python38\lib\site-packages\mongita\collection.py", line 845, in __find_ids
        if doc and _doc_matches_slow_filters(doc, slow_filters):
      File "Python38\lib\site-packages\mongita\collection.py", line 193, in _doc_matches_slow_filters
        if _doc_matches_agg(doc_v, query_ops):
      File "Python38\lib\site-packages\mongita\collection.py", line 143, in _doc_matches_agg
        if doc_v not in query_val:
    TypeError: unhashable type: 'list'
    
    opened by Dobatymo 6
  • Please add cursor.skip

    First off, I love this library. It's one of the main reasons I made the jump from Node.js with NeDB to Python. However, I'm trying to implement pagination on my Flask site, and a cursor.skip method would make life a little easier for me. Thank you for taking the time to implement this feature. I appreciate all you do.

    opened by michaelkornblum 4
  • Need to reinstance after update

    Hi, very nice work with the mongita project!

    I have just started experimenting with it. I have found that if I write to a collection in one process and read from it in another, the values I read are not updated unless I re-instantiate the MongitaClient. From the behaviour, it seems like the full db is loaded into memory? Is there a function I can use to refresh?

    I was testing this with MongitaClientDisk.

    If I open two instances of the same db and collection in the same process, the changes to the collection are reflected immediately.

    Best regards.

    opened by LMSAas 4
  • Benchmark suggestions

    @scottrogowski, this is a really nice project. The name is awesome too!

    I have not had the chance yet to really give it a spin, but I think the benchmarks can be improved a bit. I believe the SQLite performance comparison could be improved if you compared inserting a dict as JSON.

    I think this is where most of the CPU cycles in the row insertion are consumed, which makes SQLite look so bad...

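    import json
    from bson import json_util  # bson ships with pymongo

    def _to_sqlite_row(doc):
        # SQLite has no ObjectId type, so store the _id as text
        doc['_id'] = str(doc['_id'])
        # Flatten the document into a tuple matching the table's columns;
        # the nested 'dict' field is serialized to a JSON string
        return (doc['_id'], doc['name'], doc['dt'], doc['count'], doc['city'],
                doc['content'], doc['percent'],
                json.dumps(doc['dict'], default=json_util.default))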
    

    Thanks for publishing this nifty little project!

    opened by oz123 4
  • Add support for $push

    First of all, thank you for this library. I love it.

    Would love to see $push implemented.

    https://docs.mongodb.com/manual/reference/operator/update/push/

    If I have a chance I'll make an attempt at it this week. What modules would I need to touch for this?

    opened by Kilo59 3
  • Pip install mongita -- Segment Fault

    I am trying to install mongita on Python 3.9 on a Windows machine. I am getting the following error. Any ideas?:

    $ pip install mongita
    Collecting mongita
      Using cached mongita-1.0.0.tar.gz (33 kB)
        ERROR: Command errored out with exit status 3221225477:
         command: 'D:\anaconda\envs\py39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita\setup.py'"'"'; __file__='"'"'C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn'
             cwd: C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita
        Complete output (11 lines):
        running egg_info
        creating C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info
        writing C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\PKG-INFO
        writing dependency_links to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\dependency_links.txt
        writing requirements to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\requires.txt
        writing top-level names to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\top_level.txt
        writing manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
        reading manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no files found matching 'LICENSE,'
        writing manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
        ----------------------------------------
    ERROR: Command errored out with exit status 3221225477: python setup.py egg_info Check the logs for full command output.

    Segmentation fault

    opened by lwanger 3
  • Question about parallelization

    I see that it is written in the docs:

    It is not process-safe

    But it is not clear how this applies to my particular case. What if I create one client for each collection, with each client working only with its own collection? (Example with MongoDB.) Is that a safe approach?

    opened by PlatonB 2
  • Feature request: Allow callback functions in slow code path

    Hi!

    I am not sure if it's against the design goals of the library, but it would be very useful to be able to provide a custom Python callback function to the _doc_matches_slow_filters code path of the find() method: a function that takes a single document and returns True or False depending on whether it should be included in the output. As far as I understand, this should be easy to implement, since that is basically what the non-indexed code path does anyway.

    Using that it would also be easy to work around operators which are not implemented yet.

    opened by Dobatymo 2
  • Is this api thread safe at all?

    I'm experiencing weird issues:

    the API throws a KeyError exception when accessing a dict, and after that the db is erased.

    Is this a thread-safety issue? It seems that reads and writes can't happen at the same time.

    opened by shi-yan 2
  • Implement index intersection + create_indexes

    When trying to create more than one index via create_indexes, a MongitaNotImplementedError occurs. I understand that implementing index intersection is a difficult task, but I really hope that it will be in Mongita someday.

    opened by PlatonB 1
  • Is the existence of mongita.errors.InvalidName necessary?

    mongita.errors.InvalidName: Collection cannot be named 'Nerve_Tibial.v8.egenes_ann_query_res.vcf'.

    It seems to me that prohibiting non-letter characters in collection names is unnecessary. By the way, MongoDB does not have this restriction.

    opened by PlatonB 2
  • Compression of collections

    It is not clear whether collections are compressed. If so, what algorithm is used by default? Can the user specify a preferred algorithm and compression level?

    An example from the MongoDB world: create_collection(name, storageEngine={'wiredTiger': {'configString': 'block_compressor=zstd'}})

    opened by PlatonB 2
  • Retrieving documents with specific fields with find() is not working/implemented.

    I've dumped my MongoDB database into a local folder and am now working with it in Mongita.

    PyMongo has functionality that lets me retrieve documents from the database with only specific fields. For example:

    mongo_client.db.col.find({},{"_id":1})
    

    This line returns a cursor from which I get only the "_id" field. I then tried something similar with Mongita:

    mongita_client.db.col.find({},{"_id":1})
    

    It raises following error:

    mongita.errors.MongitaError: Unsupported sort parameter format. See the docs.
    
    opened by ipritom 2
  • mongorestore and mongodump equivalent functionality

    This is an awesome tool, and really very close to MongoDB. Is there any feature equivalent to mongodump and mongorestore (or mongoexport and mongoimport)? If not, it would be a useful addition to the module. Thanks.

    opened by ipritom 3
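Until something like mongodump/mongorestore exists, a simple stopgap is to export a collection's documents to JSON Lines and re-insert them later. A minimal sketch, assuming the documents hold mostly JSON-friendly values: `default=str` stringifies types such as ObjectId and datetime, so those do not round-trip exactly (bson's `json_util` is the tool to reach for if they must). `export_jsonl` and `import_jsonl` are hypothetical names; with Mongita you would pass `coll.find({})` to the exporter and hand the loaded list to `insert_many()`.

```python
import io
import json


def export_jsonl(docs, fp):
    """Hypothetical helper: write one JSON document per line; return the count."""
    count = 0
    for doc in docs:
        # default=str turns non-JSON types (e.g. ObjectId, datetime) into strings.
        fp.write(json.dumps(doc, default=str) + "\n")
        count += 1
    return count


def import_jsonl(fp):
    """Hypothetical helper: read back documents written by export_jsonl."""
    return [json.loads(line) for line in fp if line.strip()]


buf = io.StringIO()  # stands in for a file opened with open(path, "w")
n = export_jsonl([{"_id": "a1", "name": "Meerkat"},
                  {"_id": "b2", "name": "Yellow mongoose"}], buf)
buf.seek(0)
restored = import_jsonl(buf)
print(n, restored)
# 2 [{'_id': 'a1', 'name': 'Meerkat'}, {'_id': 'b2', 'name': 'Yellow mongoose'}]
```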
Owner
Scott Rogowski
https://ffer.io && https://fastmap.io