Currently msgspec supports encoding and decoding only timezone-aware datetime.datetime objects, in strict conformance with RFC3339. Naive datetime.datetime objects can be encoded using a custom enc_hook, but there's no way to decode a naive datetime.datetime object.
I would like to expand our builtin support for datetime types to include:
- datetime.datetime (both aware and naive)
- datetime.date
- datetime.time (both aware and naive)
Here's the plan I've come up with:
Encoding
To support encoding, we add a support_naive_datetimes keyword argument to msgspec.*.encode and msgspec.*.Encoder to configure the treatment of naive datetimes. This would take one of:
- False: the default. Naive datetime and time objects error on encoding.
- True: allow encoding naive datetime and time objects. These will be encoded as their RFC3339 compatible counterparts, just missing the offset component.
- "UTC": naive datetime and time objects will be treated as if they have a UTC timezone.
- a tzinfo object: naive datetime and time objects will be treated as if they have this timezone.
I'm not attached to the keyword name (or boolean options), so if someone can think of a nicer spelling I'd be happy. I think this supports all the common options.
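As a rough illustration of the four options above, here is a pure-Python sketch of the intended semantics. The helper name format_datetime is hypothetical and is not part of msgspec; it only mimics what the proposed support_naive_datetimes argument would do for a single datetime.datetime value, using the standard library's isoformat for the string output.

```python
import datetime


def format_datetime(dt, support_naive_datetimes=False):
    """Hypothetical helper mirroring the proposed option; not a msgspec API."""
    # Aware datetimes are always encoded with their offset.
    if dt.tzinfo is not None and dt.utcoffset() is not None:
        return dt.isoformat()

    # From here on, dt is naive.
    if support_naive_datetimes is False:
        raise TypeError("naive datetimes are not supported by default")
    if support_naive_datetimes is True:
        # RFC3339-like output, just missing the offset component.
        return dt.isoformat()
    if isinstance(support_naive_datetimes, datetime.tzinfo):
        tz = support_naive_datetimes
    elif support_naive_datetimes == "UTC":
        tz = datetime.timezone.utc
    else:
        raise ValueError("unsupported value for support_naive_datetimes")
    # Treat the naive value as if it already had this timezone.
    return dt.replace(tzinfo=tz).isoformat()


naive = datetime.datetime(2023, 1, 2, 3, 4, 5)
eastern = datetime.timezone(datetime.timedelta(hours=-5))

print(format_datetime(naive, True))     # 2023-01-02T03:04:05
print(format_datetime(naive, "UTC"))    # 2023-01-02T03:04:05+00:00
print(format_datetime(naive, eastern))  # 2023-01-02T03:04:05-05:00
```

Note that the "UTC" and tzinfo options attach a timezone without shifting the clock time, which matches "treated as if they have this timezone" in the list above.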
One benefit of supporting these options builtin is that we no longer have the weird behavior of enc_hook only being called for naive datetime.datetime objects. This would admittedly be less weird if Python had different types for aware and naive datetimes.
I could hear an argument that the default should be True (encoding naive datetimes/times by default), but I'm hesitant to make that change. Having an error by default if you're using a naive datetime will force users to think about timezones early on. If they really want a naive datetime they can explicitly opt into it. Supporting naive datetimes/times by default could let programming errors slip by, since most of the time the user does want an aware datetime rather than a naive datetime.
Decoding
To support decoding, we want to handle the following use cases:
- Only decode RFC3339 compatible datetimes and times (requiring a timezone)
- Only decode naive datetimes and times (requiring no timezone)
- Decode any datetime or time object (naive or aware)
Since msgspec will only ever decode an object into a datetime if type information is provided, the natural place to enable this configuration is through our existing type annotations system. The question then is: what does an unannotated datetime.datetime mean?
I want msgspec to make it easy to do the right thing, and (within reason) possible to do the flexible thing. As such, I'd argue that raw datetime.datetime and datetime.time annotations should only decode timezone-aware objects. This means that by default APIs built with msgspec are compatible with json-schema (which lacks a naive datetime/time format), and common web languages like golang (which requires RFC3339 compatible strings in JSON by default).
To support naive-datetime or any-datetime types, we'd add a new config to Meta annotations. Something like:
from msgspec import Struct, Meta
from typing import Annotated
import datetime

class Example(Struct):
    aware_only_datetime: datetime.datetime
    aware_only_time: datetime.time
    date: datetime.date  # date objects have no timezone
    naive_only_datetime: Annotated[datetime.datetime, Meta(timezone=False)]
    naive_only_time: Annotated[datetime.time, Meta(timezone=False)]
    any_datetime: Annotated[datetime.datetime, Meta(timezone=None)]
    any_time: Annotated[datetime.time, Meta(timezone=None)]
As above, I don't love the timezone=True (aware), timezone=False (naive), timezone=None (aware or naive) syntax. If anyone can think of a better API spelling please let me know.
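To make the three timezone settings concrete, here is a minimal sketch of the check a decoder might apply after parsing. The function parse_datetime is hypothetical and not a msgspec API. It leans on the standard library's datetime.fromisoformat, which is more lenient than RFC3339, so treat it as an illustration of the aware/naive constraint only, not of strict parsing.

```python
import datetime


def parse_datetime(text, timezone=True):
    """Hypothetical helper: timezone=True (aware only), False (naive only), None (either)."""
    # fromisoformat only accepts a trailing "Z" on newer Python versions,
    # so normalize it to an explicit offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    dt = datetime.datetime.fromisoformat(text)

    is_aware = dt.tzinfo is not None
    if timezone is True and not is_aware:
        raise ValueError("expected a datetime with a UTC offset")
    if timezone is False and is_aware:
        raise ValueError("expected a datetime without a UTC offset")
    return dt


aware = parse_datetime("2023-01-02T03:04:05Z")                    # default: aware only
naive = parse_datetime("2023-01-02T03:04:05", timezone=False)     # naive only
either = parse_datetime("2023-01-02T03:04:05+02:00", timezone=None)
```

With the default setting, a string lacking an offset is rejected, which is the "force users to think about timezones" behavior argued for above.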
We could also add type aliases in a new submodule msgspec.types to make this easier to spell (since datetimes are common):
from msgspec import Struct
from msgspec.types import NaiveDatetime
# NaiveDatetime = Annotated[datetime.datetime, Meta(timezone=False)]

class Example(Struct):
    naive_only_datetime: NaiveDatetime
Msgpack Complications
Currently we use msgpack's timestamp extension (https://github.com/msgpack/msgpack/blob/master/spec.md#timestamp-extension-type) when encoding datetimes to msgpack. This extension by design only supports timezone-aware datetimes. msgpack has no standard representation for naive datetimes (or time/date objects in general). To handle this, I plan to encode naive datetimes as strings in the same format as JSON. This is an edge case that I don't expect to affect most users. I think the main benefit of supporting it is parity between types supported by both protocols.
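The split described above can be sketched by hand with the standard library. This is not how msgspec implements it; pack_datetime is a hypothetical helper that follows the msgpack spec's timestamp layouts (extension type -1) for aware datetimes and falls back to a msgpack str holding the ISO text for naive ones.

```python
import datetime
import struct

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def pack_datetime(dt):
    """Hypothetical helper: msgpack bytes for a single datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Naive: no instant in time, so emit the JSON-style string instead.
        text = dt.isoformat().encode("utf-8")
        if len(text) < 32:
            return bytes([0xA0 | len(text)]) + text  # fixstr
        return b"\xd9" + bytes([len(text)]) + text   # str 8

    # Aware: seconds and nanoseconds since the Unix epoch.
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    nanos = delta.microseconds * 1000
    if nanos == 0 and 0 <= seconds < 2**32:
        # timestamp 32: fixext 4, type -1, uint32 seconds
        return b"\xd6\xff" + struct.pack(">I", seconds)
    if 0 <= seconds < 2**34:
        # timestamp 64: fixext 8, 30-bit nanoseconds + 34-bit seconds
        return b"\xd7\xff" + struct.pack(">Q", (nanos << 34) | seconds)
    # timestamp 96: ext 8 with 12-byte payload, uint32 nanoseconds + int64 seconds
    return b"\xc7\x0c\xff" + struct.pack(">Iq", nanos, seconds)


aware_bytes = pack_datetime(datetime.datetime(1970, 1, 1, 0, 0, 1, tzinfo=datetime.timezone.utc))
naive_bytes = pack_datetime(datetime.datetime(2023, 1, 2, 3, 4, 5))
```

A decoder seeing a str where a datetime is expected would then parse it as text, the same way the JSON decoder would.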