Since AlternativeImage has been introduced on every level of the structural hierarchy, these image files can be used to represent results from image preprocessing (normalization, denoising, binarization, non-text suppression, despeckling, deskewing, dewarping). Some of these operations can and some cannot be represented descriptively, but referencing derived images always helps to avoid repeated computations.
However, there's a difficulty/penalty involved: all coordinates in the PAGE hierarchy refer to the original image (under /PcGts/Page/@imageFilename), whereas derived images (AlternativeImage/@filename under Page or Region or TextLine or Word) necessarily have a different, local/relative coordinate system, which is connected to the global/absolute coordinate system only implicitly.
So if you want to process via derived images, e.g. to crop segments further down the hierarchy (translating from their absolute coordinates to the images' relative coordinates) or to add further segmentation (translating from new relative coordinates in the images to new absolute coordinates), then you must know the transformation between them.
This could merely be an offset (which could be unambiguously defined as the top left of the bounding box of the element's polygon), which happens after cropping (on the page level or any segmentation below that).
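For the pure cropping case, the translation in both directions can be sketched as follows. This is only a minimal illustration, assuming the usual Coords/@points notation ("x1,y1 x2,y2 ...") and the top left of the polygon's bounding box as the offset; the function names are made up for this example and are not part of any existing library.

```python
def parse_points(points):
    """Parse a Coords/@points string like '100,200 300,200' into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def bbox_offset(polygon):
    """Top left of the polygon's bounding box (assumed origin of the cropped image)."""
    return min(x for x, _ in polygon), min(y for _, y in polygon)


def to_relative(points, polygon):
    """Absolute page coordinates -> coordinates inside the cropped image."""
    x0, y0 = bbox_offset(polygon)
    return [(x - x0, y - y0) for x, y in points]


def to_absolute(points, polygon):
    """Coordinates inside the cropped image -> absolute page coordinates."""
    x0, y0 = bbox_offset(polygon)
    return [(x + x0, y + y0) for x, y in points]


region = parse_points("100,200 300,200 300,400 100,400")
inner = to_relative([(150, 250)], region)
outer = to_absolute(inner, region)
```

Here a point at (150, 250) on the page ends up at (50, 50) in the region's cropped image, and the reverse mapping restores the page coordinates.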
But there are certain operations which change coordinates non-trivially:
- Deskewing will shift to the center of the element's bounding box, then rotate around that center, increasing the size of the bounding box (to avoid losing content at the corners), and shift back to the (new) top left of the bounding box. Alternatively, larger angles (e.g. multiples of 90°) could be applied by reflection instead of rotation.
- Dewarping may change coordinates in any number of ways (3d shear or cubic spline projection, or interpolated raster grid, including as a special case centerline projection).
- Rescaling or aspect correction will multiply coordinates by a constant factor.
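To make the deskewing case above concrete, here is a rough sketch of the three steps (shift to center, rotate, shift back to the new top left). The direction of rotation for a positive angle is an assumption of this example; the sign has to match whatever convention the tool that produced the image used for @orientation. Function names are hypothetical.

```python
import math


def deskewed_size(width, height, angle_deg):
    """Size of the enlarged bounding box that holds the rotated image completely."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return width * c + height * s, width * s + height * c


def to_deskewed(x, y, width, height, angle_deg):
    """Map a point in the cropped image to the rotated, enlarged image.

    ASSUMPTION: the rotation sign used here is only one possible convention.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    new_w, new_h = deskewed_size(width, height, angle_deg)
    # 1. shift to the center of the old bounding box
    dx, dy = x - width / 2, y - height / 2
    # 2. rotate around that center
    rx = c * dx + s * dy
    ry = -s * dx + c * dy
    # 3. shift back to the top left of the new, larger bounding box
    return rx + new_w / 2, ry + new_h / 2


size = deskewed_size(100, 50, 30)
corners = [to_deskewed(x, y, 100, 50, 30)
           for x, y in [(0, 0), (100, 0), (100, 50), (0, 50)]]
```

By construction, all four corners of the old image land inside (or exactly on the border of) the new bounding box, which is why the box has to grow.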
All those effects are cumulative, i.e. they will compose into a new coordinate transform at each step, and in the order of the operations applied to the image (and its predecessors). This is not always trivial, e.g. cropping before/after deskewing, deskewing on page and then again on region level. It's certainly not rocket science, but (believe me) there are many ways you can get this wrong when you have to implement it.
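A small sketch of why the order matters, using 3x3 matrices in homogeneous coordinates (plain Python, no dependencies). The helper names are invented for this illustration; a real implementation would also need the deskewing step and has to follow the actual processing history of each image.

```python
def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(*steps):
    """Compose transforms, given in the order they were applied to the image."""
    result = translation(0.0, 0.0)  # identity
    for step in steps:
        result = matmul(step, result)
    return result


def apply(m, points):
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in points]


def invert(m):
    """Inverse of an affine transform (relative -> absolute direction)."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


crop = translation(-100, -200)   # cropping at offset (100, 200)
half = scaling(0.5, 0.5)         # downscaling by factor 2

crop_then_scale = compose(crop, half)
scale_then_crop = compose(half, crop)

a = apply(crop_then_scale, [(300, 400)])
b = apply(scale_then_crop, [(300, 400)])
back = apply(invert(crop_then_scale), a)
```

The same page point (300, 400) maps to (100, 100) when cropping comes first, but to (50, 0) when rescaling comes first. The inverse matrix is what you need when writing new segmentation results from the derived image back into absolute coordinates.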
Now, for cropping and deskewing, we are in the fortunate situation that their respective coordinate transform can be reconstructed from the descriptive information (Coords/@points and @orientation), provided the operations applied on the derived image have been carried out in the "correct" way and documented in its @comments.
But for dewarping and rescaling we don't even have any descriptive annotation yet.
For dewarping, maybe the dewarping schema with its /DwGts/Grid/Row/@points is sufficient (although it is unfortunate that this schema is external to the content schema).
But for rescaling, there's nothing at all.
You could ask:
- shouldn't we then allow annotating the coordinate transform explicitly?
- why do you want to rescale?
1: I'd be happy to see PAGE adopt some representation of affine transformations (basically a 3x3 float array) under AlternativeImage/@coordinate-system. But I would still consider this only a redundant convenience feature.
2: Rescaling is useful in various scenarios:
- avoiding wasted computation on images with excessive pixel density by downsampling them during processing
- ensuring a fixed pixel density for operations that expect certain component sizes or distances (e.g. rule-based segmentation tools always assuming 300 DPI)
- ensuring a fixed pixel resolution for operations that expect a certain image size (e.g. neural segmentation tools)
- ensuring a fixed width/height aspect ratio during processing
Thus, I propose to at least introduce a descriptive annotation for derived images' scale factors:
- AlternativeImage/@imageWidth (as in Page/@imageWidth)
- AlternativeImage/@imageHeight (as in Page/@imageHeight)
- AlternativeImage/@imageXResolution (as in Page/@imageXResolution)
- AlternativeImage/@imageYResolution (as in Page/@imageYResolution)
- AlternativeImage/@imageResolutionUnit (as in Page/@imageResolutionUnit)
- AlternativeImage/@imageXScale (how much is AlternativeImage/@imageXResolution zoomed over Page/@imageXResolution?)
- AlternativeImage/@imageYScale (how much is AlternativeImage/@imageYResolution zoomed over Page/@imageYResolution?)
(Of course, the latter two are redundant, but pixel density might not be known exactly/reliably and thus be omitted or set to zero. In that case, the scale can still describe precisely the factor between the unknown density of the original image and the unknown density of the derived image.)
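To illustrate how a consumer might use these attributes, here is a hedged sketch. Note that the attributes on AlternativeImage are only what I propose above and do not exist in the schema; the function and parameter names are likewise invented for this example.

```python
def scale_factor(derived_resolution, original_resolution, explicit_scale=None):
    """Factor by which page coordinates must be multiplied for the derived image.

    Prefers the explicit (proposed) @imageXScale / @imageYScale value and
    falls back to the ratio of resolutions if both are known (non-zero).
    """
    if explicit_scale:
        return float(explicit_scale)
    if derived_resolution and original_resolution:
        return derived_resolution / original_resolution
    raise ValueError("neither an explicit scale nor usable resolutions given")


def rescale_points(points, xscale, yscale):
    """Map absolute page coordinates into a rescaled page-level image."""
    return [(x * xscale, y * yscale) for x, y in points]


# Resolution known on both sides: 300 DPI original, 150 DPI derived image
known = scale_factor(150, 300)
# Resolution unknown (set to zero), but the scale was annotated explicitly
unknown = scale_factor(0, 0, explicit_scale=0.5)
points = rescale_points([(300, 400)], known, unknown)
```

Both cases yield a factor of 0.5 here, so the example point (300, 400) maps to (150, 200); without either source of information the function refuses to guess.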