`MultiError` is the one part of trio's core design that I'm really not satisfied with. Trying to support multiple exceptions on top of the language's firm one-exception-at-a-time stance raises (heh) all kinds of issues. Some of them can probably be fixed by changing the design. But the hardest problem is that there are lots of third-party packages that want to do custom handling of exception tracebacks (e.g. ipython, pytest, raven/sentry). And right now we have to monkeypatch all of them to work with `MultiError`, with more or less pain and success.
Now @1st1 wants to add nurseries to asyncio, and as part of that he'll need to add something like `MultiError` to the stdlib. We talked a bunch about this at PyCon, and the plan is to figure out how to do this in a way that works for both asyncio and trio. Probably we'll start by splitting `MultiError` off into a standalone library that trio and uvloop can both consume, and then add that library to the stdlib in 3.8 (the library will then remain as a backport for those using trio on 3.7 and earlier). This way asyncio can build on our experience, and trio can get out of the monkeypatching business (because if `MultiError`s are in the stdlib, then it becomes ipython/pytest/sentry's job to figure out how to cope with them).
But before we can do that we need to fix the design, and do it in writing so we (and Yury) can all see that the new design is right :-).
## Current design
[If this were a PEP I'd talk more about the basic assumptions underlying the design: multiple errors can happen concurrently, you need to preserve that fact, you need to be able to catch some-but-not-all of those errors, you need to make sure that you don't accidentally throw away any errors that you didn't explicitly try to catch, etc. But this is a working doc, so I'm just going to dive into the details...]
Currently, trio thinks of `MultiError` objects as being ephemeral things. It tries as much as possible to simulate a system where multiple exceptions just happen to propagate next to each other. So it's important to keep track of the individual errors and their tracebacks, but the `MultiError` objects themselves are just a detail needed to accomplish this.
So, we only create `MultiError`s when there are actually multiple errors – if a `MultiError` has only a single exception, we "collapse" it, so `MultiError([single_exc]) is single_exc`. The basic primitive for working with a `MultiError` is the `filter` function, which is really a kind of weird flat-map-ish kind of thing: it runs a function over each of the "real" exceptions inside a `MultiError`, and can replace or remove any of them. If this results in any `MultiError` object that has zero or one child, then `filter` collapses it. And the catch helper, `MultiError.catch`, is a thin wrapper for `filter`: it catches an exception, then runs a `filter` over it, and then reraises whatever is left (if anything).
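To make those semantics concrete, here's a minimal runnable sketch of the collapsing rule and the `filter` primitive. This is a toy stand-in, not trio's actual implementation (the real one also handles traceback patching, discussed below):

```python
class MultiError(BaseException):
    def __new__(cls, exceptions):
        exceptions = list(exceptions)
        if len(exceptions) == 1:
            # The collapsing rule: MultiError([single_exc]) is single_exc
            return exceptions[0]
        self = BaseException.__new__(cls)
        self.exceptions = exceptions
        return self

def filter(handler, exc):
    """Run `handler` over each leaf exception; the handler returns a
    replacement exception, or None to remove that leaf entirely."""
    if isinstance(exc, MultiError):
        new_children = []
        for child in exc.exceptions:
            result = filter(handler, child)
            if result is not None:
                new_children.append(result)
        if not new_children:
            return None
        # Collapses automatically if only one child survives
        return MultiError(new_children)
    else:
        return handler(exc)

# "Catch" the KeyErrors by mapping them to None:
tree = MultiError([KeyError("k"), MultiError([ValueError("v"), KeyError("k2")])])
remainder = filter(lambda e: None if isinstance(e, KeyError) else e, tree)
assert isinstance(remainder, ValueError)  # collapsed all the way down to a leaf
```

Note how aggressively the collapsing propagates: removing the two `KeyError`s leaves a lone `ValueError` at the top level, with no `MultiError` wrapper in sight.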
One more important detail: traceback handling. When you have a nested collection of `MultiError`s, e.g. `MultiError([RuntimeError(), MultiError([KeyError(), ValueError()])])`, then the leaf exceptions' `__traceback__` attr holds the traceback for the frames where they traveled independently before meeting up to become a `MultiError`, and then each `MultiError` object's `__traceback__` attr holds the frames that that particular `MultiError` traversed. This is just how Python's `__traceback__` handling works; there's no way to avoid it. But that's OK, it's actually quite convenient – when we display a traceback, we don't want to say "exception 1 went through frames A, B, C, D, and independently, exception 2 went through frames A', B, C, D" – it's more meaningful, and less cluttered, to say "exception 1 went through frame A, and exception 2 went through frame A', and then they met up and together they went through frames B, C, D". The way `__traceback__` data ends up distributed over the `MultiError` structure makes this structure really easy to extract.
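Here's a small runnable illustration of that distribution, using a toy `MultiError` (the function names are made up for the example): the leaf keeps only the frames from its independent journey, while the `MultiError` accumulates the frames traveled together.

```python
class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def leaf():
    raise KeyError("independent journey")

def meeting_point():
    # The leaf exception travels alone until it's caught here...
    errors = []
    try:
        leaf()
    except BaseException as exc:
        errors.append(exc)
    # ...and from here on, only the MultiError accumulates frames.
    raise MultiError(errors)

def shared_frame():
    meeting_point()

try:
    shared_frame()
except MultiError as exc:
    multi = exc

def frame_names(exc):
    names = []
    tb = exc.__traceback__
    while tb is not None:
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
    return names

# The leaf's __traceback__ covers only its independent frames:
assert "leaf" in frame_names(multi.exceptions[0])
assert "shared_frame" not in frame_names(multi.exceptions[0])
# The MultiError's __traceback__ covers the shared frames:
assert "shared_frame" in frame_names(multi)
assert "leaf" not in frame_names(multi)
```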
## Proposal for new design
Never collapse `MultiError`s. Example:

```python
async def some_func():
    async with trio.open_nursery() as nursery:
        async with trio.open_nursery() as nursery:
            raise RuntimeError
```

If you do `await some_func()` then currently you get a `RuntimeError`; in this proposal, you'll instead get a `MultiError([MultiError([RuntimeError()])])`.
Get rid of `filter`, and replace it with a new primitive `split`. Given an exception and a predicate, `split` splits the exception into one half representing all the parts of the exception that match the predicate, and another half representing all the parts that don't match. Example:

```python
match, rest = MultiError.split(
    # simple predicate: isinstance(exc, RuntimeError)
    RuntimeError,
    # The exception being split:
    MultiError([
        RuntimeError("a"),
        MultiError([
            RuntimeError("b"),
            ValueError(),
        ]),
    ])
)

# produces:
match = MultiError([RuntimeError("a"), MultiError([RuntimeError("b")])])
rest = MultiError([MultiError([ValueError()])])
```
The `split` operation always takes an exception type (or tuple of types) to match, just like an `except` clause. It should also take an optional arbitrary function predicate, like `match=lambda exc: ...`. If either `match` or `rest` is empty, it gets set to `None`. It's a classmethod rather than a regular method so that you can still use it in cases where you have an exception but don't know whether it's a `MultiError` or a regular exception, without having to check.
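A sketch of how `split` could work over a toy, non-collapsing `MultiError`. This omits the traceback and `__context__` handling the real version needs, and uses a free function where the real thing would be a classmethod:

```python
class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def split(predicate, exc, match=None):
    """Return (match_part, rest_part); either is None if empty.
    `predicate` is an exception type or tuple of types; `match` is an
    optional function predicate applied to each matching leaf."""
    if isinstance(exc, MultiError):
        matched_children, unmatched_children = [], []
        for child in exc.exceptions:
            m, r = split(predicate, child, match=match)
            if m is not None:
                matched_children.append(m)
            if r is not None:
                unmatched_children.append(r)
        # Preserve the tree structure on both sides; never collapse.
        return (MultiError(matched_children) if matched_children else None,
                MultiError(unmatched_children) if unmatched_children else None)
    elif isinstance(exc, predicate) and (match is None or match(exc)):
        return exc, None
    else:
        return None, exc

matched, rest = split(
    RuntimeError,
    MultiError([
        RuntimeError("a"),
        MultiError([RuntimeError("b"), ValueError()]),
    ]),
)
# matched mirrors MultiError([RuntimeError("a"), MultiError([RuntimeError("b")])])
# rest mirrors MultiError([MultiError([ValueError()])])
```

Because both halves keep the full nesting structure, the per-node traceback data described earlier has an obvious home on each side.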
Catching `MultiError`s is still done with a context manager, like `with MultiError.catch(RuntimeError, handler)`. But now, `catch` takes a predicate + a handler (as opposed to `filter`, which combines these into one thing), uses the predicate to `split` any caught error, and then if there is a match it calls the handler exactly once, passing it the matched object. Also, support `async with MultiError.acatch(...)` so you can write async handlers.
## Limitations of the current design, and how this fixes them
### Collapsing is not as helpful to users as you might think
A "nice" thing about collapsing out `MultiError`s is that most of the time, when only one thing goes wrong, you get a nice regular exception and don't need to think about this `MultiError` stuff. I say "nice", but really this is... bad. When you write error handling code, you want to be prepared for everything that could happen, and this design makes it very easy to forget that `MultiError` is a possibility, and hard to figure out where `MultiError` handling is actually required. If the language made handling `MultiError`s more natural/ergonomic, this might not be as big an issue, but that's just not how Python works. So Yury is strongly against the collapsing design, and he has a point.
Basically, seeing `MultiError([RuntimeError()])` tells you "ah, this time it was a single exception, but it could have been multiple exceptions, so I'd better be prepared to handle that".
This also has the nice effect that it becomes much easier to teach people about `MultiError`, because it shows up front-and-center the first time you have an error inside a nursery.
One of my original motivations for collapsing was that `trio.run` has a hidden nursery (the "system nursery") that the main task runs inside, and if you do `trio.run(main)` and `main` raises `RuntimeError`, I wanted `trio.run` to raise `RuntimeError` as well, not `MultiError([RuntimeError()])`. But actually this is OK, because the way things have worked out, we never raise errors through the system nursery anyway: either we re-raise whatever `main` raised, or we raise `TrioInternalError`. So my concern was unfounded.
### Collapsing makes traceback handling more complicated and fragile
Collapsing also makes the traceback handling code substantially more complicated. When `filter` simplifies a `MultiError` tree by removing intermediate nodes, it has to preserve the traceback data those nodes held, which it does by patching it into the remaining exceptions. (In our example above, if exception 2 gets caught, then we patch exception 1's `__traceback__` so that it shows frames A, B, C, D after all.) This all works, but it makes the implementation much more complex. If we don't collapse, then we can throw away all the traceback patching code: the tracebacks can just continue to live on whichever object they started out on.
Collapsing also means that `filter` is a destructive operation: it has to mutate the underlying exception objects' `__traceback__` attributes in place, so you can't, like, speculatively run a `filter` and then change your mind and go back to using the original `MultiError`. That object still exists, but after the `filter` operation it's in an inconsistent state. Fine if you're careful, but it'd be nicer if users didn't have to be careful. If we don't collapse, then this isn't an issue: `split` doesn't have to mutate its input (and neither would `filter`, if we were still using `filter`).
### Collapsing loses `__context__` for intermediate `MultiError` nodes
Currently, Trio basically ignores the `__context__` and `__cause__` attributes on `MultiError` objects. They don't get assigned any semantics, they get freely discarded when collapsing, and they often end up containing garbage data. (In particular, if you catch a `MultiError`, handle part of it, and re-raise the remainder... the interpreter doesn't know that this is semantically a "re-raise", and insists on sticking the old `MultiError` object onto the new one's `__context__`. We have countermeasures, but it's all super annoying and messy.)
It turns out though that we do actually have a good use for `__context__` on `MultiError`s. It's super not obvious, but consider this scenario: you have two tasks, A and B, executing concurrently in the same nursery. They both crash. But! Task A's exception propagates faster, and reaches the nursery first. So the nursery sees this, and cancels task B. Meanwhile, task B has blocked somewhere – maybe it's trying to send a graceful shutdown message from a `finally:` block or something. The cancellation interrupts this, so now task B has a `Cancelled` exception propagating, and that exception's `__context__` is set to the original exception in task B. Eventually, the `Cancelled` exception reaches the nursery, which catches it. What happens to task B's original exception?
Currently, in Trio, it gets discarded. But it turns out that this is not so great – in particular, it's very possible that task A and B were working together on something, task B hit an error, and then task A crashed because task B suddenly stopped talking to it. So here task B's exception is the actual root cause, and task A's exception is detritus. At least two people have hit this in Trio (see #416 and https://github.com/python-trio/pytest-trio/issues/30).
In the new design, we should declare that a `MultiError` object's `__context__` holds any exceptions that were preempted by the creation of that `MultiError`, i.e., by the nursery getting cancelled. We'd basically just look at the `Cancelled` objects, and move their `__context__` attributes onto the `MultiError` that the nursery was raising. But this only works if we avoid collapsing.
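A sketch of that rule; `build_nursery_error` and the stand-in classes are hypothetical names for this example, not trio's API. Each `Cancelled`'s preempted `__context__` is moved onto the new `MultiError` instead of being dropped:

```python
class Cancelled(BaseException):
    pass

class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def build_nursery_error(exceptions):
    # Collect the exceptions that were preempted by cancellation:
    preempted = []
    for exc in exceptions:
        if isinstance(exc, Cancelled) and exc.__context__ is not None:
            preempted.append(exc.__context__)
            exc.__context__ = None
    multi = MultiError(exceptions)
    # Chain the preempted exceptions onto the MultiError itself:
    if len(preempted) == 1:
        multi.__context__ = preempted[0]
    elif preempted:
        multi.__context__ = MultiError(preempted)
    return multi

# Task A's exception reached the nursery first; the resulting cancellation
# preempted task B's root-cause error (which the interpreter had chained
# onto the Cancelled's __context__):
task_a_exc = RuntimeError("task A crashed")
task_b_exc = Cancelled()
task_b_exc.__context__ = OSError("task B's root-cause error")

nursery_error = build_nursery_error([task_a_exc, task_b_exc])
assert isinstance(nursery_error.__context__, OSError)
```

So task B's root cause survives, hanging off the `__context__` of the exact `MultiError` node where the preemption happened, which is only possible if that node isn't collapsed away.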
### It would be nice if tracebacks could show where exceptions jumped across task boundaries
This has been on our todo list forever. It'd be nice if we could like... annotate tracebacks somehow?
If we stopped collapsing `MultiError`s, then there's a natural place to put this information: each `MultiError` corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch `MultiError`s to having associated message strings? Currently they don't have that.)
### Filtering is just an annoying abstraction to use
If I wanted to express exception catching using a weird flat-map-ish thing, I'd be writing Haskell. In Python it's awkward and unidiomatic. But with `filter`, it's necessary, because you could have any number of exceptions you need to call it on.
With `split`, there are always exactly two outputs, so you can perform basic `MultiError` manipulations in straight-line code without callbacks.
### Tracebacks make filtering even more annoying to use than it would otherwise be
When `filter` maps a function over a `MultiError` tree, the exceptions passed in are not really complete, standalone exceptions: they only have partial tracebacks attached. So you have to handle them carefully. You can't raise or catch them – if you did, the interpreter would start inserting new tracebacks and make a mess of things.
You might think it was natural to write a filter function using a generator, like how `@contextmanager` works:

```python
def catch_and_log_handler():
    try:
        yield
    except Exception as exc:
        log.exception(exc)

def normalize_errors_handler():
    try:
        yield
    except OtherLibraryError as exc:
        raise MyLibraryError(...) from exc
```

But this can't work, because the tracebacks would get all corrupted. Instead, handlers take exceptions as arguments, and return either that exception object, or a new exception object (like `MyLibraryError`).
If a handler function does raise an exception (e.g., b/c of a typo), then there's no good way to cope with that. Currently Trio's `MultiError` code doesn't even try to handle this.
In the proposed design, all of these issues go away. The exceptions returned by `split` are always complete and self-contained. Probably for `MultiError.catch` we will still pass in the exception as an argument instead of using a generator and `.throw`ing it – the only advantage of the `.throw` is that it lets you use an `except` block to say which exceptions you want to catch, and with the new `MultiError.catch` we've already done that before we call the handler. But we can totally allow `raise` as a way to replace the exception, or handle accidental exceptions. (See the code below for details.)
### Async catching
Currently we don't have an async version of `filter` or `catch` (i.e., one where the user-specified handler can be async). Partly this is because when I was first implementing this I hit an interpreter bug that made it not work, but it's also because `filter`'s implementation is extremely complicated, and maintaining two copies makes it that much worse. With the new design, there's no need for an async `split`, and I think the new `catch` logic makes supporting both sync and async easy (see below).
## Details
```python
# A classic "catch-all" handler, very common in servers
with MultiError.catch(Exception, logger.exception):
    ...

# is equivalent to:
try:
    ...
except BaseException as exc:
    caught, rest = MultiError.split(Exception, exc)
    if caught is None:
        raise
    try:
        logger.exception(caught)
    except BaseException as exc:
        # The way we set exc.__context__ here isn't quite right...
        # Ideally we should stash it in the interpreter's implicit exception
        # handling context state before calling the handler.
        if rest is None:
            try:
                raise
            finally:
                exc.__context__ = caught
        else:
            exc.__context__ = caught
            new_exc = MultiError([exc, rest])
            try:
                raise new_exc
            finally:
                new_exc.__context__ = None
                # IIRC this traceback fixup doesn't actually work b/c
                # of interpreter details, but maybe we can fix that in 3.8
                new_exc.__traceback__ = new_exc.__traceback__.tb_next
    else:
        if rest is not None:
            orig_context = rest.__context__
            try:
                raise rest
            finally:
                rest.__context__ = orig_context
                # See above re: traceback fixup
                rest.__traceback__ = rest.__traceback__.tb_next
```
Notes:

As noted in the comments, `__context__` and `__traceback__` handling is super finicky and has subtle bugs. Interpreter help would be very... helpful.
Notice that all the logic around the `logger.exception` call is always synchronous and can be factored into a context manager, so we can do something like:

```python
def catch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        handler(caught)

async def acatch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        await handler(caught)
```
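To check that this factoring hangs together, here's a runnable sketch of one possible `_catch_impl`. The name comes from the snippet above, but this body is my guess at it: it assumes it's invoked while the caught exception is the "current" exception (so `sys.exc_info()` can see it), and it omits the `__context__`/`__traceback__` fixups from the expansion above:

```python
import sys
from contextlib import contextmanager

class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

    @classmethod
    def split(cls, type_, exc, match=None):
        # Toy, non-collapsing split (see the sketch earlier in this doc)
        if isinstance(exc, cls):
            matched, unmatched = [], []
            for child in exc.exceptions:
                m, r = cls.split(type_, child, match=match)
                if m is not None:
                    matched.append(m)
                if r is not None:
                    unmatched.append(r)
            return (cls(matched) if matched else None,
                    cls(unmatched) if unmatched else None)
        if isinstance(exc, type_) and (match is None or match(exc)):
            return exc, None
        return None, exc

@contextmanager
def _catch_impl(type_, match=None):
    exc = sys.exc_info()[1]  # the exception currently being handled
    caught, rest = MultiError.split(type_, exc, match=match)
    if caught is None:
        raise  # nothing matched: let the original exception keep going
    yield caught  # catch()/acatch() call the handler here
    if rest is not None:
        raise rest  # reraise the unmatched half

def catch(type_, handler, match=None):
    with _catch_impl(type_, match=match) as caught:
        handler(caught)

# Usage: log the RuntimeErrors, let everything else continue propagating.
logged = []
try:
    try:
        raise MultiError([RuntimeError("boom"), ValueError("keep going")])
    except BaseException:
        catch(RuntimeError, logged.append)
except MultiError as exc:
    remainder = exc
```

The only difference between `catch` and `acatch` really is whether the handler call is awaited; everything else lives in the shared context manager.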
## Other notes
### Subclassing
Do we want to support subclassing of `MultiError`, like `class NurseryError(MultiError)`? Should we encourage it? If so, we need to think about handling subclasses when cloning `MultiError`s in `.split`.
I think we should not support this, though. Because we don't have, and don't want, a way to distinguish between a `MultiError([MultiError([...])])` and a `MultiError([NurseryError([...])])` – we preserve the structure, and it contains metadata, but it's still structural. `split` and `catch` still only allow you to address the leaf nodes. And that's important, because if we made it easy to match on structure, then people would do things like try to catch a `MultiError([MultiError([RuntimeError()])])`, when what they should be doing is trying to catch one-or-more-`RuntimeError`s. The point of keeping the `MultiError`s around instead of collapsing is to push you to handle this case, not continue to hard-code assumptions about there being only a single error.
### Naming
`MultiError` isn't bad, but we might as well consider other options while we're redoing everything. `AggregateError`? `CombinedError`? `NurseryError`?
## Relevant open issues
#408, #285, #204, #56, https://github.com/python-trio/pytest-trio/issues/30