Learning to Estimate Hidden Motions with Global Motion Aggregation (GMA)

This repository contains the source code for our paper:

Learning to Estimate Hidden Motions with Global Motion Aggregation
Shihao Jiang, Dylan Campbell, Yao Lu, Hongdong Li, Richard Hartley
ICCV 2021
ANU, Oxford

Environments

You will have to choose a cudatoolkit version that matches your compute environment. The code was tested on PyTorch 1.8.0, but other versions may also work.

conda create --name gma python==3.7
conda activate gma
conda install pytorch=1.8.0 torchvision=0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install matplotlib imageio einops scipy opencv-python
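
A quick sanity check after installation, to confirm the PyTorch build can see your GPU and that the CUDA version matches the toolkit chosen above:

import torch

print(torch.__version__)          # expect 1.8.0
print(torch.cuda.is_available())  # expect True on a CUDA machine
print(torch.version.cuda)         # expect 11.1 for the command above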

Demo

sh demo.sh
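
For reference, below is a rough sketch of the inference path that demo.sh drives, assuming the RAFT-style API that GMA builds on: RAFTGMA in core/network.py, InputPadder in core/utils/utils.py, and a checkpoint such as checkpoints/gma-sintel.pth. The import paths and argument names here are assumptions; check demo.sh and evaluate.py for the exact flags.

import argparse
import imageio
import torch
from core.network import RAFTGMA          # assumed path; the scripts may add core/ to sys.path
from core.utils.utils import InputPadder

# Hypothetical flags mirroring the train.sh defaults.
args = argparse.Namespace(num_heads=1, position_only=False,
                          position_and_content=False, mixed_precision=False)
model = torch.nn.DataParallel(RAFTGMA(args))
model.load_state_dict(torch.load('checkpoints/gma-sintel.pth'))
model = model.module.cuda().eval()

def load(path):
    # HWC uint8 image -> 1x3xHxW float tensor on the GPU
    img = torch.from_numpy(imageio.imread(path)).permute(2, 0, 1).float()
    return img[None].cuda()

image1, image2 = load('frame_0001.png'), load('frame_0002.png')
padder = InputPadder(image1.shape)        # pads H and W up to multiples of 8
image1, image2 = padder.pad(image1, image2)
with torch.no_grad():
    flow_low, flow_up = model(image1, image2, iters=12, test_mode=True)
flow = padder.unpad(flow_up)              # 1x2xHxW flow field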

Train

sh train.sh

Evaluate

sh evaluate.sh

License

WTFPL. See LICENSE file.

Acknowledgement

The overall code framework is adapted from RAFT. We thank the authors for their contribution. We also thank Phil Wang for open-sourcing his transformer implementations.

Comments
  • How to handle subtitles with large motion?

    Hi,

    I want to know whether GMA can handle the case, common in movies, where subtitles are accompanied by large motion. Would the subtitles be preserved well in the interpolation result?

    opened by leiwen83 4
  • Attention Map Visualization

    Thanks for making your source code public. Could you please share your code for visualizing the attention map in Figure 6 or guide me on how to obtain it?

    opened by Bayrambai 3
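
    On the visualization question above, one common recipe (not necessarily the authors') is to pick a query pixel, take its row of the (HW x HW) attention matrix, and reshape it back to the feature-map grid. The hook point and names below are hypothetical; the actual attention tensor lives inside the Attention/Aggregate modules in gma.py.

    import matplotlib.pyplot as plt

    def show_attention(attn, h, w, query_yx):
        # attn: (h*w, h*w) softmaxed attention matrix; query_yx: (y, x) query pixel
        y, x = query_yx
        heatmap = attn[y * w + x].reshape(h, w).detach().cpu()
        plt.imshow(heatmap, cmap='viridis')
        plt.title(f'Attention weights for query pixel {query_yx}')
        plt.colorbar()
        plt.show()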
  • reproduce results

    Hi, I have run the first two stages in your train.sh (chairs and things), but I only get 1.35 and 2.83 EPE on the Sintel clean and final passes. How can I obtain the results reported in your paper, i.e., 1.30 and 2.74? Are they achieved by training multiple times and taking the best run?

    opened by 863689877 3
  • IndexError: index 0 is out of bounds for axis 0 with size 0

    Hi! Great work! When I tested on the Sintel test (final) split and ran create_sintel_submission, I got "IndexError: index 0 is out of bounds for axis 0 with size 0" on a pair of images. Why?

    opened by forbyme 2
  • Are there any requirements on the size of the input images?

    When I input two images of size 768 × 1856 to the model, I got the error below:

    einops.EinopsError: Error while processing rearrange-reduction pattern "(y v) d -> y () v d". Input tensor shape: torch.Size([25600, 128]). Additional info: {'y': 232}. Shape mismatch, can't divide axis of length 25600 in chunks of 232

    Any idea why this happens?

    opened by Tord-Zhang 2
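
    On the size question above, a back-of-the-envelope reading of the error: GMA attends over 1/8-resolution feature maps, and 25600 = 160 × 160 suggests the relative positional embedding (max_pos_size in gma.py) is built for at most a 160 × 160 feature grid, i.e. roughly 1280 × 1280 input pixels when the positional attention variants are enabled. This is an inference from the shapes, not an author-confirmed diagnosis.

    # Feature-grid arithmetic for a 768 x 1856 input.
    H, W = 768, 1856
    h, w = H // 8, W // 8      # feature grid: 96 x 232
    print(h, w)                # 96 232 -> width 232 exceeds 160
    print(25600 == 160 * 160)  # True: the embedding table covers a 160 x 160 grid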
  • The number of transformer heads

    Thank you for your concise and efficient work! I would like to ask about the impact of the number of transformer heads. I found no ablation experiments for this variable in the paper, and you set it to 1 in your code. Have you run experiments on this, and would performance improve if the number of heads were increased?

    opened by 863689877 2
  • Running Evaluation for KITTI Test

    Hey @zacjiang,

    Thank you for sharing your work! I was looking to evaluate the pre-trained model on the KITTI test set. I completed the repository setup and got it running according to the instructions on GitHub, and I was able to reproduce the results in Table 2 of the paper for the KITTI training set.

    But when I run it for the KITTI test set, evaluate.py fails: it expects four outputs from the data loader (https://github.com/zacjiang/GMA/blob/2f1fd29468a86a354d44dd25d107930b3f175043/evaluate.py#L348-L355), whereas for the test split the dataset (https://github.com/zacjiang/GMA/blob/2f1fd29468a86a354d44dd25d107930b3f175043/core/datasets.py#L38-L46) returns only three values.

    In that case, how do I get numbers for the KITTI test split?

    Regards, Nitin Bansal

    opened by nbansal90 1
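
    On the KITTI test question above: for split='testing' there is no ground truth, so evaluate.py cannot score it; the test numbers come from uploading predictions to the KITTI benchmark. Below is a hypothetical sketch of the submission loop, modeled on the create_kitti_submission helper in RAFT-style codebases; verify the helper and import names against this repo.

    import os
    import torch
    import datasets
    from utils import frame_utils
    from utils.utils import InputPadder

    @torch.no_grad()
    def create_kitti_submission(model, output_path='kitti_submission', iters=24):
        test_dataset = datasets.KITTI(split='testing', aug_params=None)
        os.makedirs(output_path, exist_ok=True)
        for i in range(len(test_dataset)):
            image1, image2, (frame_id,) = test_dataset[i]
            padder = InputPadder(image1.shape, mode='kitti')
            image1, image2 = padder.pad(image1[None].cuda(), image2[None].cuda())
            _, flow_pr = model(image1, image2, iters=iters, test_mode=True)
            flow = padder.unpad(flow_pr[0]).permute(1, 2, 0).cpu().numpy()
            frame_utils.writeFlowKITTI(os.path.join(output_path, frame_id), flow)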
  • kitti submission

    Hi, this is very nice work! I would like to ask about the KITTI submission. Once I have the 'kitti_submission' folder, how do I create the right ZIP file? The instructions on the KITTI website are shown in a screenshot (not reproduced here). For the optical flow task, do we only need to include the 'flow' folder in the submitted archive, i.e., just rename the generated 'kitti_submission' folder to 'flow' and compress it into a ZIP file?

    opened by 863689877 1
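
    On the packaging question above: the benchmark appears to expect a ZIP whose top level contains a 'flow' folder of PNGs, so renaming the generated folder and zipping it should suffice. A minimal sketch, using the folder name from the question:

    import os
    import shutil

    os.rename('kitti_submission', 'flow')                # top-level folder must be named 'flow'
    shutil.make_archive('kitti_flow_submission', 'zip',  # writes kitti_flow_submission.zip
                        root_dir='.', base_dir='flow')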
  • Cannot find file named 'things_val_test_set.txt'

    Hi, the file 'things_val_test_set.txt' referenced in core/datasets.py cannot be found, which causes a failure when validating on the FlyingThings3D dataset. Please provide this file and explain where it comes from. Thanks.

    opened by 863689877 1
  • Recommended checkpoint for real-world images

    Hi, thank you for your great work.

    I want to test GMA on real-world images (i.e., not synthetic ones). Could you tell me which of the four checkpoints (chairs, kitti, sintel, things) is expected to generalize best to real-world images?

    opened by duanzhiihao 1
  • How to count the FLOPs of the GMA model?

    I wonder how to accurately count the floating-point operations (FLOPs) of the GMA model. The available interfaces basically only count convolutions; for the special operations in the model, are there any good ways to count them? In other words, how do you do it?

    opened by leeqiaogithub 0
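
    On the FLOP-counting question above (note: FLOPs is a count, FLOPS a rate): one option is fvcore, which traces the model and reports per-operator counts, flagging unsupported operators instead of silently dropping them. build_gma_model below is a hypothetical stand-in for constructing RAFTGMA the way evaluate.py does.

    import torch
    from fvcore.nn import FlopCountAnalysis

    model = build_gma_model().eval()   # hypothetical: load RAFTGMA as in evaluate.py
    image1 = torch.randn(1, 3, 384, 512)
    image2 = torch.randn(1, 3, 384, 512)
    flops = FlopCountAnalysis(model, (image1, image2))
    print(flops.total())               # total count (fvcore counts one MAC as one op for convs)
    print(flops.by_operator())         # per-operator breakdown; skipped ops are reported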
  • Intuition behind two details in the code

    Hello, thank you very much for sharing your precious work with us. I have two questions about the code.

    1. In gma.py, line 60, the query tensor is scaled by self.scale = dim_head ** -0.5. Why is that necessary? I would also be thankful if you could explain why you set the value to dim_head ** -0.5.
    2. In the same file, line 113, the motion features are added to the attention output tensor. Could you please give some insight into that as well?

    Thanks a lot. Azin

    opened by az-ja 0
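
    On question 1 above: scaling the queries by dim_head ** -0.5 is standard scaled dot-product attention (Vaswani et al., 2017). Without it, the variance of the dot-product logits grows with the head dimension, so the softmax saturates and gradients vanish; dividing by sqrt(dim_head) keeps the logits' scale roughly constant. On question 2, a hedged reading of the code: adding the motion features to the attention output acts as a residual connection, so globally aggregated motion augments rather than replaces the local motion features. A minimal sketch of the scaling:

    import torch

    dim_head = 128
    scale = dim_head ** -0.5                    # 1 / sqrt(dim_head)
    q = torch.randn(1, 1, 64, dim_head)         # (batch, heads, tokens, dim)
    k = torch.randn(1, 1, 64, dim_head)
    logits = (q * scale) @ k.transpose(-2, -1)  # scaled dot products, (1, 1, 64, 64)
    attn = logits.softmax(dim=-1)               # stays well-conditioned as dim_head grows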
  • about query projector and key projector

    The paper says, "we project the context feature map to a query feature map and a key feature map. We then take the dot product of the two feature maps and a softmax to obtain an attention matrix", but in network.py, line 99, I just found "attention = self.att(inp)". This is what puzzled me.

    opened by leeqiaogithub 0
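
    On the question above, a hedged reading of the code: self.att is the Attention module from gma.py, and the query/key projections happen inside its forward pass, so "attention = self.att(inp)" already performs both projections and the softmaxed dot product. Schematically (names abbreviated, not the repo's exact code):

    import torch
    from torch import nn

    class TinyAttention(nn.Module):
        # Schematic of the pattern in gma.py: the projectors live inside the module.
        def __init__(self, dim, dim_head):
            super().__init__()
            self.scale = dim_head ** -0.5
            self.to_q = nn.Conv2d(dim, dim_head, 1, bias=False)  # query projector
            self.to_k = nn.Conv2d(dim, dim_head, 1, bias=False)  # key projector

        def forward(self, fmap):
            b, c, h, w = fmap.shape
            q = self.to_q(fmap).flatten(2).transpose(1, 2)       # (b, hw, d)
            k = self.to_k(fmap).flatten(2).transpose(1, 2)       # (b, hw, d)
            attn = (q * self.scale) @ k.transpose(1, 2)          # (b, hw, hw)
            return attn.softmax(dim=-1)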
  • Reproducibility of GMA on Sintel and KITTI test

    Thanks for the great code! I tried to reproduce the Sintel and KITTI test results reported in the paper. However, I got 1.58 on Sintel clean and 2.64 on Sintel final for GMA (ours), and 5.14 on KITTI for GMA (p only). These are worse than the reported results (1.39 on Sintel clean and 2.47 on Sintel final for GMA (ours), and 4.93 on KITTI for GMA (p only)). Is it because you pick the best checkpoint on a validation set, while I used the last-iteration checkpoint? If so, may I know which validation set you chose?

    opened by zwei-lin 2
  • Does HD1K really help?

    I was using HD1K for Sintel fine-tuning, just as GMA does. I'm surprised that it consists only of grayscale images; that means there is a big domain gap between HD1K and the other training sets. I wonder whether the model would train better if I removed HD1K. My training without HD1K is ongoing; the loss on the training data seems much smaller and the accuracy higher. I will update when it finishes.

    opened by askerlee 1
  • Training Set of Sintel Submission

    Hi, I'm trying to reproduce your result on the Sintel benchmark. I notice that you use 'C + T + S/K (+ H)' in the experiment table of the paper. To my knowledge, referring to the RAFT paper, C + T + S/K means that in the Sintel stage you only train on C + T + S. I have no idea what the '(+ H)' is, even with the explanation: "'S/K (+ H)' refers to methods that are fine-tuned on the Sintel and KITTI datasets, with some also fine-tuned on the HD1K dataset." What does 'with some' mean? Could you please detail the training schedule you used for the Sintel submission? Is it C + T + S + H?

    opened by drinkingcoder 1
Owner
Shihao Jiang (Zac)
PhD Student at Australian National University