Hi @martinkersner,
In your code, it seems that you only evaluate one image pair at a time (I can see the evaluation demo in your other repo, train-deeplab).
However, after comparing your implementation with other codebases such as DeepLab v2, cityscapesScripts, and the PSPNet evaluation scripts, I found that you do not compute a confusion matrix, which accumulates statistics over all evaluation images. Take the Cityscapes benchmark as an example: all of those implementations accumulate the pixel counts of every image into a single confusion matrix before computing the final score.
Therefore, I think the standard procedure is:
1) Accumulate a confusion matrix over all evaluation images.
2) Compute the IoU of each class from the confusion matrix in step (1).
3) Compute the final mIoU as the mean of the per-class IoUs from step (2).
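The three steps above can be sketched roughly like this (a minimal NumPy sketch, not your actual script; the function names and the `num_classes`/ignore-label handling are my own assumptions):

```python
import numpy as np

def update_confusion_matrix(conf_mat, gt, pred, num_classes):
    """Step 1: accumulate pixel counts from one image pair into a shared matrix.
    Called once per evaluation image; conf_mat carries the running totals."""
    # Ignore pixels whose label is outside the valid class range (e.g. 255).
    mask = (gt >= 0) & (gt < num_classes)
    conf_mat += np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask],
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    return conf_mat

def mean_iou(conf_mat):
    """Steps 2 and 3: per-class IoU = TP / (TP + FP + FN), then their mean."""
    tp = np.diag(conf_mat)
    fp = conf_mat.sum(axis=0) - tp  # predicted as class c, but wrong
    fn = conf_mat.sum(axis=1) - tp  # labeled class c, but missed
    denom = tp + fp + fn
    # Classes that never appear (denom == 0) are excluded from the mean.
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou), iou
```

The key point is that `conf_mat` is only reduced to IoUs once, after the loop over all images, instead of averaging per-image IoUs, which weights every image equally regardless of how many pixels of each class it contains.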
I found this problem because, in my task, your script gives a different result (56.1) from the other implementations (52); maybe this is the cause.
Hope this issue helps anyone else who cares about it.