Hello up there. In https://github.com/zopefoundation/ZEO/pull/207 it was noted that current ZEO5 master sometimes fails in `check_race_load_vs_external_invalidate` when run against a ZEO4 server. I dug in to find out why and discovered multiple concurrency bugs - in both ZEO4 and ZEO5 - that lead to data corruption. Please find details about those bugs below:
Bug1: ZEO5.client - ZEO4.server : race between load and invalidate
The first bug is that ZEO5.client - contrary to ZEO4.client - does not account for simultaneous invalidations when working against a ZEO4 server.
It shows up, for example, as:
```
(z-dev) kirr@deca:~/src/wendelin/z/ZEO5$ ZEO4_SERVER=1 zope-testrunner -fvvvx --test-path=src -t check_race_load_vs_external
...
AssertionError: T1: obj1 (24) != obj2 (23)
obj1._p_serial: 0x03ea4b6fb05b52cc obj2._p_serial: 0x03ea4b6faf253855
zconn_at: 0x03ea4b6fb05b52cc # approximated as max(serials)
zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: None
zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03ea4b6faf253855 next_serial: 0x03ea4b6fb07def66
zstor._cache.clear()
zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: 0x03ea4b6fb07def66
zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: 0x03ea4b6fb07def66
```
indicating that obj2 was served to the user from a cache entry that had erroneously gone stale.
An IO trace of the message exchange between the client and the server shows the following:
```
# loadBefore issued
tx (('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V'), False, 'loadBefore', ('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V'))
# received invalidateTransaction
rx (0, 1, 'invalidateTransaction', ('\x03\xeaKo\xb0[R\xcc', ['\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02']))
# received loadBefore reply but with end_tid=None !!!
rx (('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V'), 0, '.reply', ('\x80\x03cZODB.tests.MinPO\nMinPO\nq\x01.\x80\x03}q\x02U\x05valueq\x03K\x17s.', '\x03\xeaKo\xaf%8U', None))
```
which:

- contradicts what @jimfulton wrote about ZEO4 - that there invalidations are sent in a callback called while the storage lock is held, blocking loads while committing - and
- points out what the bug is:

Since in ZEO4 loads can be handled while a commit is in progress, ZEO4.client takes special care to detect whether an `invalidateTransaction` comes in between a `load` request and the corresponding `.reply` response, and it does not update the cache if an `invalidateTransaction` sneaked in between:
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L367-L374
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L841-L852
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L1473-L1476
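For reference, a much-simplified sketch of what this load tracking amounts to (hypothetical names, not the actual ZEO4 code) could look like this: remember which oids have a load in flight, mark them when an invalidation arrives meanwhile, and refuse to cache the reply for marked oids.

```python
# Simplified sketch of ZEO4-style load tracking (hypothetical names):
# a load reply may be cached only if no invalidation for that oid
# arrived while the load was in flight.
import threading

class LoadTracker:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # oid -> invalidated while loading?

    def load_started(self, oid):
        with self._lock:
            self._inflight[oid] = False

    def invalidated(self, oids):
        # called from the invalidateTransaction handler
        with self._lock:
            for oid in oids:
                if oid in self._inflight:
                    self._inflight[oid] = True

    def may_cache(self, oid):
        # called when the load's .reply arrives; True = safe to cache
        with self._lock:
            return not self._inflight.pop(oid)
```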
In ZEO5.client, however, `loadBefore` does not have anything like that:
https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/ClientStorage.py#L603-L608
https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/asyncio/client.py#L289-L309
and thus an `invalidateTransaction` that sneaks in between a `loadBefore` request and the corresponding `.reply` response causes ZEO client cache corruption.
In the original `check_race_load_vs_external_invalidate` the problem appears only sometimes, but the bug happens with ~100% probability with the following delay injected on the server after `loadBefore`:
```diff
--- a/src/ZEO/tests/ZEO4/StorageServer.py
+++ b/src/ZEO/tests/ZEO4/StorageServer.py
@@ -285,7 +285,9 @@ def loadEx(self, oid):
 
     def loadBefore(self, oid, tid):
         self.stats.loads += 1
-        return self.storage.loadBefore(oid, tid)
+        x = self.storage.loadBefore(oid, tid)
+        time.sleep(0.1)
+        return x
 
     def getInvalidations(self, tid):
         invtid, invlist = self.server.get_invalidations(self.storage_id, tid)
```
so maybe, in addition to normal test runs, it would also be a good idea to run the whole ZEO test suite against the so-amended storage backend. This way it would be similar to how races are detected by e.g. my tracetest package.
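For illustration, such an amendment could be applied generically along the following lines (`inject_load_delay` is a hypothetical helper, not an existing ZEO facility):

```python
# Hypothetical helper: wrap a storage's loadBefore with a delay, so the
# whole test suite runs against a race-amplifying backend.
import functools
import time

def inject_load_delay(storage, delay=0.1):
    orig_loadBefore = storage.loadBefore

    @functools.wraps(orig_loadBefore)
    def loadBefore(oid, tid):
        x = orig_loadBefore(oid, tid)
        time.sleep(delay)  # widen the window for a racing invalidation
        return x

    storage.loadBefore = loadBefore
    return storage
```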
This particular race does not happen for ZEO5.client - ZEO5.server, because there is a difference in how invalidations are emitted by the server.
ZEO4 schedules sending invalidations via `callAsyncNoSend`:
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/StorageServer.py#L1453-L1467
which immediately puts the message into the output queue and arms the trigger to wake up the IO thread:
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/zrpc/connection.py#L574-L581
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/zrpc/connection.py#L546-L558
The bug, thus, can happen because the `invalidateTransaction` message is put into the output queue to the client immediately on `tpc_finish`.
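Schematically, this "enqueue immediately and wake the IO thread" pattern could be sketched as follows (hypothetical names, not the actual zrpc code):

```python
# Sketch of ZEO4's send path (hypothetical names): any thread may queue
# a message at once; the IO thread is woken via a self-pipe trigger and
# flushes the queue.
import os
import queue

class Connection:
    def __init__(self):
        self.output = queue.Queue()   # thread-safe outgoing message queue
        self._r, self._w = os.pipe()  # self-pipe acting as the trigger

    def callAsyncNoSend(self, method, *args):
        self.output.put((method, args))  # message queued immediately...
        os.write(self._w, b'x')          # ...then wake up the IO thread

    def io_thread_flush(self):
        os.read(self._r, 1)              # drain the trigger
        while not self.output.empty():
            print('send', self.output.get())

conn = Connection()
conn.callAsyncNoSend('invalidateTransaction', b'tid', [b'oid1', b'oid2'])
conn.io_thread_flush()
```

Because the enqueue happens in the committing thread itself, the invalidation can land in the output stream between a load request that is already being processed and its not-yet-sent reply.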
ZEO5, however, queues both actions to the IO thread via `call_soon_threadsafe`:
https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/StorageServer.py#L827-L845
https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/asyncio/server.py#L153-L154
which results in the `invalidateTransaction` message being appended to the output queue at the end of the current event-loop cycle, not immediately after `tpc_finish` on the server. In https://github.com/zopefoundation/ZEO/blob/master/docs/ordering.rst#zeo-5 @jimfulton wrote the following about this:
> The server-side handling of invalidations is a bit trickier in ZEO 5 because there isn't a thread-safe queue of outgoing messages in ZEO 5 as there was in ZEO 4. The natural approach in ZEO 5 would be to use asyncio's `call_soon_threadsafe` to send invalidations in a client's thread. This could easily cause invalidations to be sent after loads. As shown above, this isn't a problem for ZODB 5, at least assuming that invalidations arrive in order. This would be a problem for ZODB 4. For this reason, we require ZODB 5 for ZEO 5.
Anyway, this particular bug does not happen for ZEO5.client - ZEO5.server, because the server does not inject an `invalidateTransaction` message in between a `loadBefore` request and the corresponding `.reply` response.
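The following sketch (hypothetical handler names) shows why: on the ZEO5 server a request is handled and its reply written within a single event-loop callback, while an invalidation scheduled from the committing thread via `call_soon_threadsafe` runs only as a separate, later callback, so it can never land in between.

```python
# Sketch: a call_soon_threadsafe'd invalidation cannot interleave with
# an in-progress loadBefore callback (hypothetical names).
import asyncio
import threading

log = []  # stands in for the per-client output stream

def handle_loadBefore():
    log.append('loadBefore handled')
    log.append('.reply sent')  # same callback -> nothing runs in between

def send_invalidation():
    log.append('invalidateTransaction sent')

async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(handle_loadBefore)
    # the committing thread asks the loop to emit the invalidation
    t = threading.Thread(
        target=loop.call_soon_threadsafe, args=(send_invalidation,))
    t.start(); t.join()
    await asyncio.sleep(0)  # let the queued callbacks run
    print(log)  # '.reply sent' always directly follows its request

asyncio.run(main())
```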
The fix is to either 1) abandon support for ZEO4 servers completely, or 2) implement load tracking similar to the ZEO4 client (as sketched above). If we go with option 1, I suggest explicitly rejecting the `Z4` protocol from both the ZEO5 client and the ZEO5 server.
Bug2: ZEO5.client - ZEO5.server, ZEO4.client - ZEO4.server : race between invalidate and disconnect
While investigating Bug1 and verifying whether it also shows itself on ZEO4, I've found out that, even though ZEO4.client has explicit support for a sneaked-in `invalidateTransaction`, `check_race_load_vs_external_invalidate` also sometimes fails on ZEO4.client - ZEO4.server, but for another reason: the failing assertion was similar to the one in Bug1, but the IO trace showed that one intermediate `invalidateTransaction` message was not delivered to the client at all. The missed invalidation makes the client think that both objects are unchanged and serve them either from the ZODB.Connection or from the ZEO cache; but if, for some reason, one of the objects is evicted from the cache, it is re-fetched from the server with newer data while the other is still served stale, which results in the inconsistency that breaks the test invariant.
The bug here is due to a race on the server: when sending out `invalidateTransaction` messages to all connected clients, both ZEO4 and ZEO5 iterate over the list of connected clients without any locking:
https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/StorageServer.py#L1108-L1115
https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/StorageServer.py#L843-L845
So, if a client disconnects during such an iteration, some other client will be skipped by the iteration and thus will not receive the currently-processed `invalidateTransaction` message. The following code illustrates this:
```
In [9]: l = [1,2,3,4]  # list of clients
In [10]: i = iter(l)   # start iterating over it
In [11]: next(i)       # first client yielded
Out[11]: 1
In [12]: next(i)       # second client yielded
Out[12]: 2
In [13]: del l[0]      # first client disconnects
In [14]: l
Out[14]: [2, 3, 4]
In [15]: next(i)       # `3` is skipped wrongly
Out[15]: 4
In [16]: next(i)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
```
This bug affects both servers: ZEO4, and ZEO5 in multi-threaded mode. ZEO5 is affected only in multi-threaded mode, because if the ZEO5 server runs single-threaded, a disconnect should not be handled simultaneously with `invalidateTransaction` processing. ZEO5 multi-threaded mode is activated by the `ZEO_MTACCEPTOR=1` environment variable. This mode is scheduled to be removed in ZEO6, but still, my understanding is that the bug does not show itself in single-threaded mode only by chance, not by design - there are neither comments nor explicit care to avoid this race.
The fix is to iterate over a snapshot copy of the list of clients when doing this kind of iteration.
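A minimal sketch of the fix, reusing the toy example from above: iterate over a snapshot, so that a concurrent removal cannot shift elements under the iterator.

```python
# Fix sketch: iterate over a snapshot copy of the client list, so a
# disconnect during the iteration cannot cause another client to be
# skipped.
clients = [1, 2, 3, 4]

for c in list(clients):      # snapshot copy
    if c == 2:
        clients.remove(1)    # first client disconnects mid-iteration
    print('invalidateTransaction ->', c)  # `3` is no longer skipped
```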