I'd like to use GMFlow for stereo matching. I saw the closed issue #13, where you suggested replacing the 2D cross-attention in the Transformer with 1D cross-attention, and the 2D global matching with 1D global matching.
Using the Stereo-RAFT 1D correlation as a reference, I managed to implement a version of 1D global matching, but I'm not sure whether it is correct (and I didn't manage to support the pred_bidir_flow parameter):
import torch
import torch.nn.functional as F


def coords_grid(b, h, w):
    # indexing="ij" keeps the (row, column) layout and avoids the meshgrid warning
    y, x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")  # [H, W]
    stacks = [x, y]
    grid = torch.stack(stacks, dim=0).float()  # [2, H, W]
    grid = grid[None].repeat(b, 1, 1, 1)  # [B, 2, H, W]
    return grid


def global_correlation_softmax_1d(feature0, feature1):
    # global correlation along each image row
    b, c, h, w = feature0.shape
    feature0 = feature0.permute(0, 2, 3, 1)  # [B, H, W, C]
    feature1 = feature1.permute(0, 2, 1, 3)  # [B, H, C, W]
    corr = torch.matmul(feature0, feature1) / (c ** 0.5)  # [B, H, W, W]

    # flow from softmax
    init_grid = coords_grid(b, h, w).to(corr.device)  # [B, 2, H, W]
    grid = init_grid.permute(0, 2, 3, 1)  # [B, H, W, 2]
    prob = F.softmax(corr, dim=-1)  # [B, H, W, W]
    correspondence = torch.matmul(prob, grid).permute(0, 3, 1, 2)  # [B, 2, H, W]

    # pred_bidir_flow is not handled here: only the feature0 -> feature1 flow is returned.
    # The y component is ~0 because each pixel is matched only within its own row.
    flow = correspondence - init_grid  # [B, 2, H, W]
    return flow
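For reference, here is a minimal sketch of how bidirectional prediction might be added to the 1D matching above. It is not taken from the GMFlow or GMStereo code: the function name `global_correlation_softmax_1d_bidir` is hypothetical, and it assumes that the backward correlation is simply the transpose of the forward one over the last two dimensions, with forward and backward results stacked along the batch dimension. It returns only the horizontal component, since the vertical one is zero for row-wise matching.

```python
import torch
import torch.nn.functional as F


def global_correlation_softmax_1d_bidir(feature0, feature1, pred_bidir_flow=False):
    """Hypothetical 1D global matching along image rows.

    feature0, feature1: [B, C, H, W] features of the two rectified views.
    Returns horizontal flow of shape [B, 1, H, W], or [2*B, 1, H, W] when
    pred_bidir_flow is True (forward results first, then backward).
    """
    b, c, h, w = feature0.shape
    f0 = feature0.permute(0, 2, 3, 1)  # [B, H, W, C]
    f1 = feature1.permute(0, 2, 1, 3)  # [B, H, C, W]
    corr = torch.matmul(f0, f1) / (c ** 0.5)  # [B, H, W, W]

    if pred_bidir_flow:
        # Assumption: the backward (feature1 -> feature0) correlation is the
        # transpose of the forward one, stacked along the batch dimension.
        corr = torch.cat((corr, corr.transpose(-1, -2)), dim=0)  # [2*B, H, W, W]

    prob = F.softmax(corr, dim=-1)  # distribution over candidate columns
    x = torch.arange(w, device=corr.device, dtype=corr.dtype)  # [W]
    expected_x = (prob * x).sum(dim=-1)  # soft-argmax column, [B or 2*B, H, W]
    flow_x = expected_x - x  # horizontal displacement per pixel
    return flow_x.unsqueeze(1)
```

If feature0 comes from the left image of a rectified pair, the forward output should be non-positive for correct matches, so a disparity map could be taken as its negation. That sign convention is also an assumption.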
For replacing the 2D cross-attention with the 1D variant, I don't know exactly which function I should modify. I assume the changes belong in the single_head_split_window_attention() function, but I have no idea how to make them.
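To make the question concrete, this is roughly what I understand "1D cross-attention" to mean: each pixel attends only to pixels in the same row of the other image. The sketch below is my own guess, not GMFlow code. The function name `single_head_row_cross_attention` is hypothetical, it assumes features flattened to [B, H*W, C] in row-major order, and it leaves out the learned query/key/value projections and any window splitting.

```python
import torch


def single_head_row_cross_attention(source, target, h, w):
    """Hypothetical per-row (1D) single-head cross-attention.

    source, target: [B, H*W, C] features flattened in row-major order.
    Queries come from source, keys and values from target.
    """
    b, _, c = source.shape
    # Fold the rows into the batch dimension so attention never crosses rows.
    q = source.reshape(b * h, w, c)  # [B*H, W, C]
    k = target.reshape(b * h, w, c)  # [B*H, W, C]
    v = target.reshape(b * h, w, c)  # [B*H, W, C]

    scores = torch.matmul(q, k.transpose(1, 2)) / (c ** 0.5)  # [B*H, W, W]
    attn = torch.softmax(scores, dim=-1)  # weights over columns of the same row
    out = torch.matmul(attn, v)  # [B*H, W, C]
    return out.reshape(b, h * w, c)
```

Compared with full 2D attention, the score matrix here is [B*H, W, W] rather than [B, H*W, H*W], which is what I would expect the 1D variant to save.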
I also noticed that GMStereo results have been added to the Middlebury stereo evaluation. Are you planning to release the code for that project? That would automatically resolve my questions about these modifications.