Generating Videos with Scene Dynamics

Carl Vondrick

Last update: Jan 4, 2023

This repository contains an implementation of "Generating Videos with Scene Dynamics" by Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba, to appear at NIPS 2016. The model learns to generate tiny videos using adversarial networks.

Example Generations

Below are selected videos generated by our model. These videos are not real; they are hallucinated by a generative video model. While they are not photo-realistic, the motions are fairly reasonable for the scene category the model was trained on.

[Example videos, one per scene category: Beach, Golf, Train Station, Baby]

Training

The code requires a Torch7 installation.

To train a generator for video, see main.lua. This file will construct the networks, start many threads to load data, and train the networks.

For the conditional version, see main_conditional.lua. This is similar to main.lua, except the input to the model is a static image.

To generate videos, see generate.lua. This file will also output intermediate layers, such as the mask and background image, which you can inspect manually.
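As a rough illustration of how the mask and background relate to the final video: in the paper, the generator combines a moving foreground stream f with a static background image b through a mask m, as m ⊙ f + (1 − m) ⊙ b. The NumPy sketch below shows that composition on random arrays. The function name `compose_video`, the array layout, and the sizes are assumptions made for illustration; they do not correspond to identifiers in generate.lua.

```python
import numpy as np

def compose_video(foreground, mask, background):
    """Blend a foreground video with a static background using a soft mask.

    foreground: (T, H, W, C) array, the moving stream
    mask:       (T, H, W, 1) array with values in [0, 1]
    background: (H, W, C) array, a single static image
    """
    # Add a time axis so the static background broadcasts over all T frames.
    background = background[np.newaxis, ...]
    return mask * foreground + (1.0 - mask) * background

# Hypothetical sizes, chosen only for this example.
rng = np.random.default_rng(0)
T, H, W, C = 32, 64, 64, 3
fg = rng.uniform(-1, 1, size=(T, H, W, C))
bg = rng.uniform(-1, 1, size=(H, W, C))
m = rng.uniform(0, 1, size=(T, H, W, 1))

video = compose_video(fg, m, bg)
```

Where the mask is close to 1 the foreground dominates, and where it is close to 0 the static background shows through, which is why inspecting the mask output can indicate which regions the model treats as moving.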

Data

The data loader assumes that videos have been stabilized and flattened into JPEG images. We do this for efficiency: stabilization is computationally slow and must be done offline, and reading one file per video is more efficient on NFS.

For our stabilization code, see the 'extra' directory. Essentially, this will convert each video into an image of vertically concatenated frames. After doing this, you create a text file listing all the frames, which you pass into the data loader.
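The layout described above can be sketched in Python as follows. This is not the code from the 'extra' directory; it is a minimal illustration under stated assumptions: that every frame in a stacked image has the same height, and that the list file holds one image path per line. The helper names `frame_row_ranges` and `write_list_file`, and the file names in the example, are hypothetical.

```python
import os
import tempfile

def frame_row_ranges(image_height, frame_height):
    """Return (top, bottom) row indices for each frame in a vertical stack."""
    if frame_height <= 0 or image_height % frame_height != 0:
        raise ValueError("image height must be a multiple of a positive frame height")
    return [(top, top + frame_height)
            for top in range(0, image_height, frame_height)]

def write_list_file(image_dir, list_path, extension=".jpg"):
    """Write one image path per line (assumed format) and return the paths."""
    paths = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(extension)
    )
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths

# Example: 32 frames, each 64 pixels tall, occupy 2048 rows in one image.
ranges = frame_row_ranges(image_height=2048, frame_height=64)

# Example: list placeholder files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("clip_b.jpg", "clip_a.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    listed = write_list_file(tmp, os.path.join(tmp, "train.txt"))
    listed_names = [os.path.basename(p) for p in listed]
```

Each `(top, bottom)` pair can be used to slice the rows of a decoded image and recover a single frame. Check the data loader in main.lua for the exact list format it expects before relying on this sketch.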

Models

You can download our pre-trained models here (1 GB ZIP file).

Notes

The code is based on DCGAN and our starter code in Torch7.

If you find this useful for your research, please consider citing our NIPS paper.

License

MIT

Comments

Issues running main.lua

Hi

When I try to run main.lua I get this error message:

/data/vision/torralba/crossmodal/flickr_videos/scene_extract/lists-full/_b_beach.txt.train : No such file or directory

Do I have to manually download all the txt files of your project, or is there a Lua script in the project that downloads all the data?

opened by cryptedp 5

Publish datasets as torrent files

The article says there are 7 TB of files. They could be published as a set of torrent files that could be community-seeded from different machines.

It makes sense to split them into chunks that a typical machine can download individually (e.g. 7 torrents of 1 TB each).

opened by wizzard0 2

Pre-trained models for condition generation?

Hi Carl, can you also share the pretrained model for future generation? (This is similar to the request made by @Yuliang-Zou and @17Skye17 before.) You said earlier that this could be done; maybe you forgot to update it? This would be very helpful to me. Thank you so much~

opened by HomeTong 0

Confusion about conditional model generation

Hi @cvondrick, I'm trying to generate a conditional model with main_conditional.lua. The README.md says that a conditional model is trained on a single static image. So, I created a job_list.txt containing a single line referencing a single JPEG file, and ran stabilize_videos_many.py, which extracted a single scene. I then ran main_conditional.lua against the extracted scene. This resulted in 100 iterations through the dataset, and then the program stopped without saving a model. It looks like main_conditional.lua expects to run 1000 iterations through the dataset, but when I run it against a single scene, it only runs 100 iterations.

Is there something I should be doing differently? Should I run main_conditional.lua against several identical scenes to mimic a larger dataset? Should I simply set niter to a higher number, or saveIter to a lower one?

Thanks to you and your colleagues for sharing this great work!

opened by maxenglander 1

Output size of generate.lua

I just ran generate.lua and got a GIF file that is 384 pixels wide and 1408 pixels high, which means that 6 GIFs have been stacked horizontally and 22 vertically. I checked the code, and I cannot see where this stacking happens. What do I have to change so that the output is only a single 64x64 image?

opened by cryptedp 1

How can I make this work for smaller size inputs?

So I want to run this on 32x32 frames. What changes will I have to make for this to work? Is there a short way to do it, or will I have to hack through the entire code?

opened by prannayk 0

Question of UCF101

Hi, Carl! Wonderful job! I just want to know where I can download the file mentioned in your code: /data/vision/torralba/hallucination/UCF101/gan/train.txt

Best regards.

opened by ghost 3
