Lately, I've been thinking about how to move TinyDB forward and what the next steps are. The more I thought about it, the more I became convinced that TinyDB needs a v4.0.0.
Motivation
Why would we want a TinyDB v4.0? Why introduce a backwards-incompatible release? In my view, there are three reasons to publish a new major release:
- Remove deprecated functionality that's been waiting to be removed for more than two years now.
- Fix design issues that have been introduced by a lack of vision for extension mechanisms in TinyDB.
- Simplify the architecture in order to fix other issues that cannot be solved without breaking backwards compatibility.
To elaborate on these reasons:
Deprecations
TinyDB is 6 years old now. The first stable release (v1.0.0) was published in July 2013. A year later, in September 2014, a major release (v2.0.0) changed the data format and improved the API. Another year later, in November 2015, the next major release (v3.0.0) cleaned up the query language architecture and started moving non-core functionality to external packages.
Version 3.0.0 is now almost 4 years old. In the meantime TinyDB continued to evolve, including shifting the terminology from elements to documents in v3.6.0 and the deprecation of ujson support. Both of these changes have been major cleanups, but there hasn't been a major release of TinyDB that would actually get rid of the deprecated features. This results in cluttered code, which makes TinyDB's software design and architecture harder to understand.
In addition, Python 2, which TinyDB supports, will reach its end of life at the end of 2019. As TinyDB has quite a few places where it has to do extra work to support both Python 2 and 3 from the same code base, dropping Python 2 support would simplify the code even further.
TinyDB v4.0.0 would simplify the source code by removing deprecated features and in turn make it easier to understand the source and to develop one's own extensions. In addition, only Python 3 would be supported.
Extension Mechanisms
Right from the start, there have been two ways to extend TinyDB: Custom Storages and Custom Middlewares. As the popularity and usage of TinyDB increased, so did requests to make it possible to extend other parts of TinyDB. Thus, Custom Table Classes and Custom Storage Proxy Classes have been added. In addition, mechanisms to modify the default table class name and parameters as well as the default storage class have been introduced. As a result, there are no fewer than seven places where TinyDB's behaviour can be modified.
Except for the first two, all extension mechanisms have been introduced as a result of user requests. At the time of each request, it seemed best to follow the path of least resistance when adding a new extension mechanism, refraining from any sort of breaking changes. But looking back, it is apparent that there was no real concept of how extending TinyDB should work in general.
TinyDB v4.0.0 would remove all extension mechanisms except for Custom Storages. All other extension mechanisms would be replaced by a unified extension concept as detailed below.
Architecture & API
To be honest, I'm not particularly proud of TinyDB's internal software architecture. As TinyDB evolved gradually, the simplicity of the software architecture was often neglected. We are now in a state where there is a lot of unneeded indirection. Data access uses up to 5 classes, two of which are some form of proxy class: TinyDB → Table → StorageProxy → DocumentProxy → Storage. This makes TinyDB's source code complicated and also impacts performance (see #250). Fixing these design issues requires rearchitecting TinyDB. But this in turn requires breaking backwards compatibility, as some extension mechanisms rely on the old software architecture.
Additionally, there's been discussion about inconsistencies in TinyDB's API regarding purging data (see #103). Removing these inconsistencies would break backwards compatibility.
TinyDB v4.0.0 would simplify the internal software architecture and remove inconsistencies from its API, making it easier to understand how TinyDB works and thus easier to extend.
Proposals
Deprecations
For the reasons outlined above, I propose to:
- Put TinyDB v3 into maintenance mode, implementing only bug fixes but not adding new features
- Remove all deprecated features
- Drop Python 2 support
Extension Mechanisms
I propose to replace all existing extension mechanisms with Custom Storages and Inheritance. Custom Storages continue to be a useful extension mechanism that is difficult to replicate by other means. In addition to Custom Storages, the main way to extend TinyDB and to modify its behaviour would be inheritance – to create subclasses of TinyDB. A famous example of this approach is Flask:
The Flask class has many methods designed for subclassing. You can quickly add or customize behavior by subclassing Flask (see the linked method docs) and using that subclass wherever you instantiate an application class.
In addition, the Flask docs state:
As you grow your codebase, don’t just use Flask – understand it. Read the source. Flask’s code is written to be read; its documentation is published so you can use its internal APIs. Flask sticks to documented APIs in upstream libraries, and documents its internal utilities so that you can find the hook points needed for your project.
With the new extension approach, TinyDB would aim to follow the same path: Instead of adding new extension mechanisms endlessly, users would be encouraged to subclass TinyDB and other classes in order to modify how TinyDB fetches the last ID, how it calculates the next ID, how the default table name is determined, and other behaviours.
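As a rough illustration of the inheritance-based approach, here is a minimal, self-contained sketch. It does not use TinyDB itself: `MiniDB`, `MiniTable`, and the hook names `_get_last_id`, `_get_next_id`, `table_class`, and `default_table_name` are hypothetical stand-ins chosen for this example, not a statement of what the v4 API will look like.

```python
class MiniTable:
    """Toy stand-in for a table class (hypothetical, not TinyDB's Table)."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # maps document ID -> document data

    def _get_last_id(self):
        # Hook: how the last used document ID is determined.
        return max(self._docs, default=0)

    def _get_next_id(self):
        # Hook: how the next document ID is calculated.
        return self._get_last_id() + 1

    def insert(self, data):
        doc_id = self._get_next_id()
        self._docs[doc_id] = dict(data)
        return doc_id


class MiniDB:
    """Toy stand-in for the database class (hypothetical hook names)."""

    table_class = MiniTable          # which table class to instantiate
    default_table_name = "_default"  # name used when none is given

    def __init__(self):
        self._tables = {}

    def table(self, name=None):
        name = name or self.default_table_name
        if name not in self._tables:
            self._tables[name] = self.table_class(name)
        return self._tables[name]

    def insert(self, data):
        # Forward the call to the default table.
        return self.table().insert(data)


# Customisation happens purely through subclassing:
class EvenIdTable(MiniTable):
    def _get_next_id(self):
        # Assumed custom policy for the example: only even IDs.
        return self._get_last_id() + 2


class MyDB(MiniDB):
    table_class = EvenIdTable
    default_table_name = "main"
```

With this layout, `MyDB().insert({...})` stores the document in a table named `main` and assigns IDs 2, 4, 6, and so on, without the base classes needing a dedicated configuration option for either behaviour.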
Implementing this requires making useful internal TinyDB methods part of the public API and documenting them in a way that makes it easy to override them with custom behaviour. The documentation should provide examples of what types of extensions are possible by subclassing. Also, the source code itself should have its code comments reworked to make it easy to understand how TinyDB works from the first reading of the source code (based on ideas like literate programming).
The main challenge of implementing this approach is to find the right balance of how many of TinyDB's internal methods should become part of the public API. Making too few methods public makes it difficult to modify all aspects of TinyDB's behaviour. But making too many methods public clutters the public API and makes it difficult to continue to evolve TinyDB without breaking the existing API and existing extensions.
My approach regarding which methods to include in the public API would be to be conservative and – at first – include too few methods rather than too many. The reason behind this is that it's possible to move more methods to the public API after the fact without breaking the existing API whereas the opposite would break existing usage.
Architecture & API Changes
I propose to rearchitect and simplify TinyDB to use the following classes:
TinyDB class
- Create and manage tables
- Forward calls to the default table
Table class
- Receive a storage instance from the TinyDB class
- Modify table data
- Cache query results to avoid unneeded I/O
Storage class
- Read and write data to a storage
Query class
- Provide searching and filtering
Document class
- Provide a thin wrapper around stored data that remembers the document's ID
There may be additional internal classes (such as the QueryImp and LRUCache classes we currently have), but the classes outlined above should do the lion's share of the work for TinyDB. All in all, the new architecture should provide a clear separation of concerns and responsibilities. Simplifying the architecture in this way would make it possible to fix issue #250 (StorageProxy read performance is abysmal). Also, a simple architecture makes it easy to understand how TinyDB works, which in turn affects how easy it is to extend TinyDB using inheritance (see above). In other words: a simpler architecture should make it easier to extend TinyDB.
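To make the proposed division of responsibilities more concrete, here is a small, self-contained sketch of how the classes could fit together without any proxy layers. It is only an illustration under assumptions: `Database` stands in for the TinyDB class, `MemoryStorage` is a toy storage, plain callables stand in for the Query class, and an unbounded dict stands in for a real query cache. None of the method names are meant as the final API.

```python
import copy


class Document(dict):
    """Thin wrapper around stored data that remembers the document's ID."""

    def __init__(self, value, doc_id):
        super().__init__(value)
        self.doc_id = doc_id


class MemoryStorage:
    """Storage: reads and writes the complete data (here kept in memory)."""

    def __init__(self):
        self._data = {}
        self.reads = 0  # read counter, only to make caching observable

    def read(self):
        self.reads += 1
        return copy.deepcopy(self._data)

    def write(self, data):
        self._data = copy.deepcopy(data)


class Table:
    """Table: receives a storage, modifies table data, caches queries."""

    def __init__(self, storage, name):
        self._storage = storage
        self.name = name
        self._cache = {}  # condition -> list of matching documents

    def insert(self, value):
        data = self._storage.read()
        table = data.setdefault(self.name, {})
        doc_id = max(table, default=0) + 1
        table[doc_id] = dict(value)
        self._storage.write(data)
        self._cache.clear()  # cached results may be stale after a write
        return doc_id

    def search(self, cond):
        if cond in self._cache:
            return self._cache[cond]  # no storage I/O needed
        table = self._storage.read().get(self.name, {})
        docs = [Document(v, k) for k, v in table.items() if cond(v)]
        self._cache[cond] = docs
        return docs


class Database:
    """Stand-in for TinyDB: manages tables, forwards to the default one."""

    def __init__(self, storage):
        self._storage = storage
        self._tables = {}

    def table(self, name="_default"):
        if name not in self._tables:
            self._tables[name] = Table(self._storage, name)
        return self._tables[name]

    def insert(self, value):
        return self.table().insert(value)

    def search(self, cond):
        return self.table().search(cond)
```

In this sketch a read passes through only three objects (database, table, storage), and repeating the same query is answered from the table's cache without touching the storage again.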
In addition to architecture changes, I propose to also simplify TinyDB's API. For one thing, we could fix issue #103 (Inconsistency with purge functions) by making function names consistent between the TinyDB and Table classes. For another, I propose to remove the write_back method, as it complicates the API, is probably rarely used, and can be implemented by subclassing if needed. Also, I would like to make process_elements part of the internal API again, as it is core to how data is manipulated and probably should not be modified by subclassing.
Feedback Requested
If you have thoughts, questions, comments or ideas regarding a possible TinyDB v4.0.0, especially regarding the proposals outlined above, feel free to comment and discuss on this issue 🙂