My organization runs a Pyramid web application that we deploy to AWS Elastic Container Service (ECS) on Fargate. Waitress serves the Pyramid application behind nginx as a reverse proxy, on a Debian buster image. After upgrading to waitress 2.1.0 we saw an immediate performance degradation: for response sizes above ~50 KB (compressed by nginx), responses are ~5-10x slower, resulting in numerous 504s for our various APIs. Rolling back to version 2.0.0 restores performance. This is on Python 3.7.12.
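For context, the serving setup is roughly the following (a minimal sketch; the real entry point, routes, and Waitress settings differ):

```python
# Minimal sketch of the serving setup; the real entry point, routes,
# and Waitress settings differ. nginx sits in front as a reverse proxy
# and gzips responses on the way out.
from pyramid.config import Configurator
from pyramid.response import Response
from waitress import serve

def hello(request):
    return Response("hello")  # stand-in view

with Configurator() as config:
    config.add_route("hello", "/")
    config.add_view(hello, route_name="hello")
    app = config.make_wsgi_app()

# nginx proxy_passes to this port inside the container.
serve(app, host="0.0.0.0", port=8080, threads=4)
```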
I use locust to run performance tests against the web application, which let me quickly isolate and identify the root cause of such dramatic performance changes. All system-level dependencies (those installed by apt) are identical, as are the other Python libraries.
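The locust scenario is along these lines (simplified; the endpoint paths are stand-ins for our real API routes):

```python
# Simplified locustfile; endpoint paths are stand-ins for our real routes.
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 3)  # seconds between tasks per simulated user

    @task(3)
    def small_response(self):
        # Small responses: unaffected by the upgrade.
        self.client.get("/api/status")

    @task(1)
    def large_response(self):
        # Large responses (>~50 KB after nginx gzip): where the slowdown shows.
        self.client.get("/api/report")
```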
Pre-update (waitress 2.0.0):

Response time percentiles (approximated, in ms):

| Type | Name       | 50% | 66% | 75%  | 80%  | 90%  | 95%  | 98%  | 99%  | 99.9% | 99.99% | 100% |
|------|------------|-----|-----|------|------|------|------|------|------|-------|--------|------|
| All  | Aggregated | 750 | 980 | 1200 | 1400 | 1800 | 2300 | 4000 | 4600 | 6500  | 6500   | 6500 |
Post-update (waitress 2.1.0):

Response time percentiles (approximated, in ms):

| Type | Name       | 50% | 66%  | 75%  | 80%  | 90%   | 95%   | 98%   | 99%   | 99.9% | 99.99% | 100%  |
|------|------------|-----|------|------|------|-------|-------|-------|-------|-------|--------|-------|
| All  | Aggregated | 780 | 1500 | 2600 | 5300 | 21000 | 57000 | 60000 | 61000 | 61000 | 61000  | 61000 |
This test was run in the same environment; the only difference is the Waitress version. Most routes that return small responses are unaffected, but once the response size exceeds a certain threshold, performance falls off a cliff.
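To isolate response size as the variable, a minimal app along these lines can be served under each Waitress version and hit with varying payload sizes (the `/payload` route, port, and default size are illustrative, not from our application):

```python
# Standalone repro harness; the /payload route, port, and sizes are
# illustrative. os.urandom keeps the body incompressible, so its size
# after nginx gzip stays close to the raw size.
import os

from pyramid.config import Configurator
from pyramid.response import Response
from waitress import serve

def payload(request):
    # e.g. GET /payload?kb=100 returns a 100 KB body
    kb = int(request.params.get("kb", 50))
    return Response(body=os.urandom(kb * 1024),
                    content_type="application/octet-stream")

with Configurator() as config:
    config.add_route("payload", "/payload")
    config.add_view(payload, route_name="payload")
    app = config.make_wsgi_app()

serve(app, host="0.0.0.0", port=8080)
```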
Any thoughts on why this might be, or something obvious I am missing? It is entirely possible there is a mistake or misconfiguration on my end, but such a slowdown is very suspicious, and I believe it is isolated to this version.
Dockerfile
nginx.conf