Initial Checks
- [X] I have searched Google & GitHub for similar requests and couldn't find anything
- [X] I have read and followed the docs and still think this feature is missing
Description
I'm not really sure if this is a feature request or a bug to be honest, but I'm reasonably certain it doesn't belong in the discussion forum.
I have a traditional Pydantic model that looks something like this:
```python
from __future__ import annotations

from enum import Enum
from pydantic import BaseModel, Extra, Field
from typing import Any


class Status(Enum):
    UNKNOWN = "Unknown"
    TRUE = "True"
    FALSE = "False"

    @classmethod
    def __modify_schema__(cls, schema: dict[str, Any]) -> None:
        # Pydantic isn't able to determine this type by itself during schema generation.
        schema["type"] = "string"


class Condition(BaseModel, frozen=True, extra=Extra.forbid, allow_population_by_field_name=True):
    type_: str = Field(alias="type")
    status: Status
```
Critically, it contains an enum as a subtype.
I have another class that is a custom data type. It's essentially a set dedicated to Condition
objects:
```python
from collections.abc import Iterable, Iterator, Set

from pydantic.typing import CallableGenerator


class Conditions(Set[Condition]):
    def __init__(self, conditions: Iterable[Condition]):
        self._conditions = frozenset(conditions)

    @classmethod
    def __get_validators__(cls) -> CallableGenerator:
        yield cls.validate

    @classmethod
    def validate(cls, value: Any) -> Conditions:
        if isinstance(value, Conditions):
            return value
        if not isinstance(value, Iterable):
            raise TypeError("Must provide an iterable of Condition objects.")

        def iter_check(value: Iterable[Any]) -> Iterator[Condition]:
            for i, condition in enumerate(value, start=1):
                if isinstance(condition, Condition):
                    yield condition
                else:
                    raise TypeError(f"Entry {i} is not a Condition object: {condition!r}")

        return cls(iter_check(value))

    def __iter__(self) -> Iterator[Condition]:
        return iter(self._conditions)

    def __contains__(self, value: Any) -> bool:
        return value in self._conditions

    def __len__(self) -> int:
        return len(self._conditions)
```
All of this works fine, except when it comes time to generate a JSON schema. For context, this `Conditions` class is used as the type of a field in a much bigger Pydantic model, and I'm generating the schema from that model.
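For illustration, a minimal sketch of that arrangement might look like the following (the `Cluster` model and its `name` field are made up for this example; the real model is much larger):

```python
class Cluster(BaseModel):
    name: str
    conditions: Conditions  # the custom Set-based type defined above


# Generating the schema for the outer model is where things go wrong.
Cluster.schema()
```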
The first problem is that Pydantic doesn't understand the inheritance from `collections.abc.Set` at all. There is no schema generated for the class by default, and in fact attempting to do so produces an error:

```
ValueError: Value not declarable with JSON Schema, field: name='conditions' type=Conditions required=True
```
Ideally Pydantic would understand that I've extended `collections.abc.Set`, and that the generic type (`Condition`) is a Pydantic model, and then it could in fact generate a full schema from that automatically. However, it doesn't right now, and as support will probably never arrive in Pydantic 1.10, I need to keep searching for a solution.
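For concreteness, the schema I'd hope to see generated for the `conditions` field is roughly the following (sketched as a Python dict; the exact keys are of course up to Pydantic):

```python
{
    "type": "array",
    "uniqueItems": True,
    "items": {"$ref": "#/definitions/Condition"},
    # ...with Condition (and its Status enum) placed in the top-level
    # "definitions" object of the overall document.
}
```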
The next thing to try is writing a `__modify_schema__` method for the `Conditions` class:
```python
@classmethod
def __modify_schema__(cls, schema: dict[str, Any]) -> None:
    schema["type"] = "array"
    schema["uniqueItems"] = True
    schema["items"] = {}
```
Writing just that code works, but it's sub-optimal: there's no type safety for the `Condition` type that we know we're wrapping. Unfortunately, I can't find a good way of handling this, only things that amount to hacks.
I attempted to replace the empty dictionary with this:

```python
schema["items"] = Condition.schema()
```
It "works" in that the code runs, but the full generated schema for the overall model is invalid. The problem is that Condition.schema()
lacks context of the overall schema generation operation. It generates a definitions
object (and $ref
s pointing at the definitions
object), but all of that is nested inside an object. The generated $ref
s point at things that don't exist.
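To make the failure mode concrete, the nested output ends up shaped roughly like this (abridged, continuing with the hypothetical `Cluster` model from earlier; the exact keys depend on the Pydantic version):

```python
{
    "title": "Cluster",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "conditions": {
            "type": "array",
            "uniqueItems": True,
            "items": {
                # Condition.schema() output, pasted in wholesale:
                "title": "Condition",
                "type": "object",
                "properties": {
                    "type": {"title": "Type", "type": "string"},
                    "status": {"$ref": "#/definitions/Status"},
                },
                "required": ["type", "status"],
                # The enum definition is buried down here...
                "definitions": {
                    "Status": {"title": "Status", "enum": ["Unknown", "True", "False"], "type": "string"},
                },
            },
        },
    },
    # ...but "#/definitions/Status" resolves against the top of this document,
    # where no such definition exists.
}
```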
This is where the title of this issue comes in: the schema generation isn't composable. If there were a way to tell the schema generation code "don't use a global definitions store to generate this schema", I could trivially nest one schema inside another.
Alas, this isn't possible, and attempting to make it possible would likely require a significant refactoring of the existing schema generation code, so I keep on searching. My next idea is to keep doing `schema["items"] = Condition.schema()`, and just fix up the generated schema in the `Condition` class's `__modify_schema__` method. That way the hackery is kept local and limited in scope, and avoids needing to reinvent the wheel wherever the class may be used.
Unfortunately, for whatever reason, the `__modify_schema__` method of the class you call `.schema()` on doesn't get called. Instead, you're expected to use `schema_extra` in the model's config. The `schema_extra` config option can be either a static dictionary or a callable, and I don't really understand why the distinction is necessary. When using kwargs for model config (as opposed to the nested `Config` class), passing a callable just feels weird. I suspect this approach will work, though I haven't tried it yet. Getting to this point is what prompted me to create this issue.
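For reference, the workaround I'm planning to try looks roughly like the sketch below, written with a nested `Config` class for readability. It inlines the nested enum definitions so the resulting schema no longer depends on `#/definitions/...` references. I haven't verified it, and it only handles the simple case of a `$ref` sitting directly on a property:

```python
class Condition(BaseModel):
    type_: str = Field(alias="type")
    status: Status

    class Config:
        frozen = True
        extra = Extra.forbid
        allow_population_by_field_name = True

        @staticmethod
        def schema_extra(schema: dict[str, Any], model: type[Condition]) -> None:
            # Pull the nested definitions out and inline them into the
            # properties that reference them, so the generated schema is
            # self-contained and can be embedded inside another schema.
            definitions = schema.pop("definitions", {})
            for prop in schema.get("properties", {}).values():
                ref = prop.pop("$ref", None)
                if ref is not None:
                    prop.update(definitions[ref.rsplit("/", 1)[-1]])
```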
To summarise, there are three problems here:

- Pydantic doesn't understand the meaning of a custom data type that inherits from `collections.abc.Set`, and isn't able to use its generic type in schema generation.
- There is no way to generate a composable schema that doesn't use `$ref`.
- The `__modify_schema__` method is not called when generating that model's schema specifically, for no discernible reason.
Please let me know if you require any further context for the code samples I've provided, or if you need any other information I might have missed :pray:
Affected Components
feature request