First off, congratulations on this amazing work. I think you have closed the remaining gap needed to make generative deep learning relevant for real-world applications, rather than just a nice toy as in previous work in this area.
However, to truly judge the performance of your approach, I was a bit disappointed that the paper contains not a single note on execution time, either for training or, more crucially, for sampling a single final image.
Would you be able to provide some numbers on how long it takes to generate a single 4kx1k image with a 256^2 patch size, and on which hardware setup?
Also, if possible, could you shed some light on the training times and the setup that was used?
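For reference, here is a minimal sketch of the kind of measurement I am asking about; `sample_fn` is a hypothetical stand-in for your model's sampling call, not your actual API:

```python
import time

def benchmark_sampling(sample_fn, n_runs=3):
    """Time a sampling function over several runs and return the average in seconds."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        sample_fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Hypothetical placeholder for the model's sampling routine.
# A 4096x1024 image tiled into 256x256 patches would require
# (4096 // 256) * (1024 // 256) = 64 patch generations.
def dummy_sample():
    n_patches = (4096 // 256) * (1024 // 256)
    return n_patches

avg = benchmark_sampling(dummy_sample)
print(f"average sampling time over 3 runs: {avg:.4f} s")
```

Even a rough number from such a wall-clock measurement, together with the GPU/CPU used, would make the practical cost of your approach much clearer.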
Thank you!