First off, thank you for this project. I'm building an app that needs to scale heavily on top of CouchDB, and I'm looking forward to using the power of async :)
I had prototype code working perfectly against CouchDB (1.4), and we are now trying to move to Couchbase Sync Gateway for scalability reasons (we expect thousands of concurrent feeds). After some setup pains, my script to write test data worked and the web interface returns the list of documents, but every query through aiocouchdb returned nothing.
I spent quite some time debugging this, and it seems to be a bug in the library, which may well be related to the problem in issue #8.
curl http://localhost:5984/default/_all_docs?include_docs=true
returns the proper information. In this case:
{"rows":[
{"key":"product:484d5edd-e8f3-4cbb-962f-c88b08898a4f","id":"product:484d5edd-e8f3-4cbb-962f-c88b08898a4f","value":{"rev":"1-2d78300a38c3f4e8db23e05d9ed20d1d"},"doc":{"_id":"product:484d5edd-e8f3-4cbb-962f-c88b08898a4f","_rev":"1-2d78300a38c3f4e8db23e05d9ed20d1d","created_at":"2016-01-21T14:19:02.627+0000","name":"Pasta","price":750,"type":"product","version":1}}
,{"key":"product:4ab9a3e1-2081-4159-9746-2b1cb29e0cb4","id":"product:4ab9a3e1-2081-4159-9746-2b1cb29e0cb4","value":{"rev":"1-39244db8fbf1150f18ef2b755487068d"},"doc":{"_id":"product:4ab9a3e1-2081-4159-9746-2b1cb29e0cb4","_rev":"1-39244db8fbf1150f18ef2b755487068d","created_at":"2016-01-21T14:01:02.279+0000","name":"Peanuts","price":150,"type":"product","version":1}}
,{"key":"product:7219f59b-3fce-4893-8f38-e2a1f38a82dc","id":"product:7219f59b-3fce-4893-8f38-e2a1f38a82dc","value":{"rev":"1-1bf8bbc9e5b5cc626361f76f14dc29bd"},"doc":{"_id":"product:7219f59b-3fce-4893-8f38-e2a1f38a82dc","_rev":"1-1bf8bbc9e5b5cc626361f76f14dc29bd","created_at":"2016-01-21T13:57:52.495+0000","name":"Beer","price":200,"type":"product","version":1}}
],
"total_rows":3,"update_seq":4}
My query code looks something like this:
db = await ensure_db(server, db_name)
docs = await db.all_docs(include_docs=True)
with docs:
    rec = await docs.next()
    while rec:
        print_json(rec)
        rec = await docs.next()
As I said, this worked against CouchDB, but the first call to docs.next() now returns None....
Digging into the aiocouchdb code, I put a breakpoint in ViewFeed.next (feeds.py:140) and found that chunk was the entire body of the response. Apparently Sync Gateway buffers the text and sends it in one network IO call, while CouchDB sends each chunk in its own network IO call. Quoting from feeds.py:146-147:
elif chunk.startswith(('{"rows"', ']}')):
    return (yield from self.next())
Thus the whole chunk is skipped and the code moves on to the next one, which doesn't exist, so the result is None: an empty iterator :(
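To illustrate, here is a minimal sketch of the failure mode. It is a simplified stand-in for the prefix check quoted above, not the library's actual code; rows_from_chunks and the sample row are made up for the example.

```python
import json

# Prefixes treated as framing, taken from the check quoted from feeds.py.
SKIP_PREFIXES = ('{"rows"', ']}')


def rows_from_chunks(chunks):
    """Return the rows a prefix-based reader would emit for these chunks."""
    rows = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk or chunk.startswith(SKIP_PREFIXES):
            continue  # treated as framing, so the whole chunk is dropped
        rows.append(json.loads(chunk.strip(',')))
    return rows


row = '{"id":"a","key":"a","value":{"rev":"1-x"}}'

# One chunk per line: the header is skipped and the row is parsed.
per_row_chunks = ['{"rows":[\n', row + '\n', ']}\n']
# Whole body in one chunk: it starts with '{"rows"', so everything is dropped.
buffered = [''.join(per_row_chunks)]

print(len(rows_from_chunks(per_row_chunks)))  # 1
print(len(rows_from_chunks(buffered)))        # 0
```

The same bytes produce one row or none depending only on where the chunk boundaries fall.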
For me, the solution is to stop depending on the network IO buffering behavior of the server and instead use a more powerful parser that returns one line at a time. (I think ijson (https://pypi.python.org/pypi/ijson) is a great solution for parsing streaming JSON and, with its optional C libraries, very fast.)
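As a rough sketch of what "not depending on chunk boundaries" could look like, here is a small incremental reader built on the standard library's json.JSONDecoder.raw_decode. It is a stand-in for a real streaming parser such as ijson, not a proposal for the final implementation. RowStream and collect are hypothetical names, and the sketch assumes the array is introduced by the literal text "rows":[ with no whitespace.

```python
import json


class RowStream:
    """Incrementally extract the objects of the top-level "rows" array."""

    MARKER = '"rows":['  # assumption: the server writes it without spaces

    def __init__(self):
        self._buf = ''
        self._decoder = json.JSONDecoder()
        self._in_rows = False
        self._done = False

    def feed(self, text):
        """Add more response text and return every row that is now complete."""
        rows = []
        if self._done:
            return rows
        self._buf += text
        if not self._in_rows:
            start = self._buf.find(self.MARKER)
            if start < 0:
                return rows  # header not complete yet
            self._buf = self._buf[start + len(self.MARKER):]
            self._in_rows = True
        while True:
            # Separators may come before or after a row, so drop them all.
            self._buf = self._buf.lstrip(', \r\n\t')
            if not self._buf:
                break
            if self._buf[0] == ']':
                self._done = True  # end of the rows array
                break
            try:
                row, end = self._decoder.raw_decode(self._buf)
            except json.JSONDecodeError:
                break  # row is incomplete; wait for more data
            rows.append(row)
            self._buf = self._buf[end:]
        return rows


def collect(chunks):
    """Feed all chunks through one RowStream and gather the rows."""
    stream = RowStream()
    return [row for chunk in chunks for row in stream.feed(chunk)]


body = ('{"rows":[\n{"id":"a","value":{"rev":"1-x"}}\n'
        ',{"id":"b","value":{"rev":"1-y"}}\n],\n"total_rows":2,"update_seq":3}')
whole = collect([body])
pieces = collect([body[i:i + 7] for i in range(0, len(body), 7)])
print(whole == pieces, [r["id"] for r in whole])
```

Feeding the body in one piece or in 7-character pieces yields the same rows. A real parser would also need to report malformed input instead of waiting forever for more data.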
My work-around looks something like this:
In views.py, add to the top:
from .client import HttpStreamResponse
and on line 50, change:
resp = yield from request(auth=auth, data=data, params=params)
to:
resp = yield from request(auth=auth, data=data, params=params, response_class=HttpStreamResponse)
In Feed._loop (feeds.py:59), change:
chunk = yield from self._resp.content.read()
to:
chunk = yield from self._resp.content.readline()
This mostly works and parses the rows correctly now, but it raises an exception when parsing the remainder in ViewFeed.next (feeds.py:140-149). We now get a line containing only ], but the code only checks for ]}. Also, we no longer get {"total_rows" with no trailing }, but rather "total_rows" with no leading {.
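For discussion, here is a minimal sketch of a line classifier that tolerates both shapes described above. classify_line is a hypothetical helper, not part of aiocouchdb, and the line shapes it handles are only the ones mentioned in this issue.

```python
import json


def classify_line(line):
    """Classify one line of a view response as header, row, end or footer."""
    text = line.strip()
    if not text:
        return ('skip', None)
    if text.endswith('"rows":['):
        # '{"rows":[' as well as '{"total_rows":...,"rows":['
        return ('header', None)
    if text in (']', '],', ']}'):
        return ('end', None)
    if text.startswith('"'):
        # Tail with no leading brace: '"total_rows":3,"update_seq":4}'
        return ('footer', json.loads('{' + text))
    # A row; the separating comma may sit before or after it.
    return ('row', json.loads(text.strip(',')))


lines = [
    '{"rows":[\n',
    '{"id":"a","value":{"rev":"1-x"}}\n',
    ',{"id":"b","value":{"rev":"1-y"}}\n',
    '],\n',
    '"total_rows":2,"update_seq":3}',
]
results = [classify_line(line) for line in lines]
print([kind for kind, _ in results])
# ['header', 'row', 'row', 'end', 'footer']
```

With something like this, the feed could return rows as they arrive and keep total_rows and update_seq from the footer, regardless of which server produced the response.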
I could work on a patch here, but I would like some direction from the project maintainer.
- There don't seem to be unit tests for the feeds (understandably, as testing them usually requires a server). Shall I try to mock something there, or is there a way to run tests against various servers?
- What variants need to be supported (CouchDB 1.4? CouchDB 2.0? Sync Gateway?)
- Is it acceptable to add requirements that provide more robust streaming JSON parsing?
Thanks again for the great package, and please let me know how I can contribute.