Playable Video Generation

Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

Paper: ArXiv
Supplementary: Website
Demo: Try it Live

Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step, as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety.

Overview

Figure 1. Illustration of the proposed CADDY model for playable video generation.

Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels, as if playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A, which predicts the label of the action being performed in the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.
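The data flow described above can be illustrated with a toy sketch. It is not the CADDY implementation: frames are reduced to single numbers, and encode, predict_action, step_dynamics, decode and the ACTION_SHIFTS table are hypothetical stand-ins for E, A, R and D, chosen only to show how a discrete action label acts as the bottleneck between consecutive frames.

```python
# Toy illustration of the E -> A -> R -> D data flow; NOT the actual CADDY networks.
# Every function and constant here is a hypothetical stand-in.

# Hypothetical "learned" effect of each discrete action on the state.
ACTION_SHIFTS = {0: -1.0, 1: 0.0, 2: 1.0}


def encode(frame):
    """Stand-in for the encoder E: a frame is already a single number here."""
    return float(frame)


def predict_action(prev_repr, next_repr):
    """Stand-in for the action network A: pick the action whose shift best explains the change."""
    change = next_repr - prev_repr
    return min(ACTION_SHIFTS, key=lambda a: abs(ACTION_SHIFTS[a] - change))


def step_dynamics(state, action):
    """Stand-in for the dynamics network R: advance the state given a discrete action."""
    return state + ACTION_SHIFTS[action]


def decode(state):
    """Stand-in for the decoder D: map the state back to a frame."""
    return state


def reconstruct(frames):
    """Reconstruct a sequence using only the first frame and the inferred action labels."""
    representations = [encode(f) for f in frames]
    state = representations[0]
    actions, outputs = [], [decode(state)]
    for prev_repr, next_repr in zip(representations, representations[1:]):
        action = predict_action(prev_repr, next_repr)
        state = step_dynamics(state, action)
        actions.append(action)
        outputs.append(decode(state))
    return actions, outputs


def reconstruction_loss(frames, outputs):
    """Mean squared error between input frames and their reconstructions."""
    return sum((f - o) ** 2 for f, o in zip(frames, outputs)) / len(frames)


frames = [0.0, 1.0, 2.0, 2.0, 1.0]
actions, outputs = reconstruct(frames)
loss = reconstruction_loss(frames, outputs)
```

In this toy setting, playing corresponds to replacing the predicted labels with the user's choices, that is, calling step_dynamics(state, user_action) and decode(state) in a loop.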

Requirements

We recommend using Linux and one or more CUDA-compatible GPUs. We provide both a Conda environment and a Dockerfile to configure the required libraries.

Conda

The environment can be installed and activated with:

conda env create -f env.yml

conda activate video-generation

Docker

Use the Dockerfile to build the docker image:

docker build -t video-generation:1.0 .

Run the docker image mounting the root directory to /video-generation in the docker container:

docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash

Preparing Datasets

BAIR

Coming soon

Atari Breakout

Download the breakout_160_ours.tar.gz archive from Google Drive and extract it under the data folder.

Tennis

The Tennis dataset is automatically acquired from YouTube by running:

./get_tennis_dataset.sh

This requires an installation of youtube-dl (Download). Please run youtube-dl -U to update the utility to the latest version. The dataset will be created at data/tennis_v4_256_ours.

Custom Datasets

Custom datasets can be created from a user-provided folder containing plain videos. Acquired video frames are sampled at the specified resolution and framerate. ffmpeg is used for the extraction and supports multiple input formats. By default only mp4 files are acquired.

python -m dataset.acquisition.convert_video_directory --video_directory --output_directory --target_size [--fps --video_extension --processes ]

As an example, the following command transforms all mp4 videos in the tmp/my_videos directory into a 256x256px dataset sampled at 10fps and saves it in the data/my_videos folder:

python -m dataset.acquisition.convert_video_directory --video_directory tmp/my_videos --output_directory data/my_videos --target_size 256 256 --fps 10
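When the conversion has to be scripted for several folders, the command can be assembled in Python. This is a minimal sketch that uses only the flags shown above; the helper name build_convert_command is hypothetical and not part of the repository.

```python
import sys


def build_convert_command(video_directory, output_directory, target_size, fps=None):
    """Assemble the argument list for dataset.acquisition.convert_video_directory.

    Hypothetical helper: target_size is a pair of integers passed through
    in the order given, and --fps is added only when requested.
    """
    command = [
        sys.executable, "-m", "dataset.acquisition.convert_video_directory",
        "--video_directory", str(video_directory),
        "--output_directory", str(output_directory),
        "--target_size", *(str(value) for value in target_size),
    ]
    if fps is not None:
        command += ["--fps", str(fps)]
    return command


command = build_convert_command("tmp/my_videos", "data/my_videos", (256, 256), fps=10)
# To actually run the conversion from the repository root:
# import subprocess; subprocess.run(command, check=True)
```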

Using Pretrained Models

Pretrained models in .pth.tar format are available for all the datasets and can be downloaded at the following link: Google Drive

Please place each directory under the checkpoints folder. Training and inference scripts automatically make use of the latest.pth.tar checkpoint when present in the checkpoints subfolder corresponding to the configuration in use.
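Whether a run will pick up a checkpoint can be checked before launching it. This is a minimal sketch, assuming the layout described above (a latest.pth.tar file inside the checkpoints subfolder of a run); the helper name and the run name in the usage line are hypothetical, and the real subfolder name comes from the configuration file.

```python
from pathlib import Path


def find_latest_checkpoint(run_name, checkpoints_root="checkpoints"):
    """Return the path to latest.pth.tar for a run, or None if it is absent.

    Hypothetical helper: it only mirrors the folder layout described above
    and is not part of the repository.
    """
    candidate = Path(checkpoints_root) / run_name / "latest.pth.tar"
    return candidate if candidate.is_file() else None
```

For example, find_latest_checkpoint("my_run") returns None until checkpoints/my_run/latest.pth.tar exists.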

Playing

When a latest.pth.tar checkpoint is present under the checkpoints folder corresponding to the current configuration, the model can be interactively used to generate videos with the following commands:

Bair: python play.py --config configs/01_bair.yaml
Breakout: python play.py --config configs/02_breakout.yaml
Tennis: python play.py --config configs/03_tennis.yaml

A full screen window will appear and actions can be provided using number keys in the range [1, actions_count]. Number key 0 resets the generation process.
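The key handling described above can be sketched as a small mapping function. This is an illustration under stated assumptions, not the code in play.py: the function name, the "reset" marker and the zero-based action index are all hypothetical.

```python
def key_to_command(key, actions_count):
    """Map a pressed number key to a play command.

    Returns "reset" for key 0, a zero-based action index for keys in
    [1, actions_count], and None for any other key.
    Hypothetical helper; the zero-based index is an assumption.
    """
    if not (isinstance(key, str) and len(key) == 1 and key in "0123456789"):
        return None
    number = int(key)
    if number == 0:
        return "reset"
    if number <= actions_count:
        return number - 1
    return None
```

With actions_count set to 7, keys 1 through 7 select an action, key 0 resets, and keys 8 and 9 are ignored.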

The inference process is lightweight and can be executed even in a browser, as in our Live Demo.

Training

The models can be trained with the following commands:

python train.py --config configs/

The training process generates multiple files under the results and checkpoints directories, each in a subdirectory named as specified in the configuration file. In particular, the folder under the results directory will contain an images folder showing qualitative results obtained during training. The checkpoints subfolder will contain regularly saved checkpoints and the latest.pth.tar checkpoint representing the latest model parameters.

Training can be monitored through Weights and Biases by running the following before the training command:

wandb init

Training the model in full resolution on our datasets required the following GPU resources:

BAIR: 4x2080Ti 44GB
Breakout: 1x2080Ti 11GB
Tennis: 2x2080 16GB

Lower resolution versions of the model can be trained with a single 8GB GPU.

Evaluation

Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on the evaluation dataset. To build the evaluation dataset please issue:

python build_evaluation_dataset.py --config configs/

The command creates a reconstruction of the test portion of the dataset under the results//evaluation_dataset directory. To run evaluation issue:

python evaluate_dataset.py --config configs/evaluation/configs/

Evaluation results are saved as data.yml under the evaluation_results directory, in the folder specified in the configuration file.
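The two evaluation steps can be chained so that the second runs only if the first succeeds. This is a minimal sketch: evaluation_commands and run_evaluation are hypothetical helpers, and both configuration paths must be supplied by the caller.

```python
import subprocess
import sys


def evaluation_commands(model_config, evaluation_config):
    """Return the two evaluation commands in the order they must run.

    Hypothetical helper: both config paths are supplied by the caller.
    """
    return [
        [sys.executable, "build_evaluation_dataset.py", "--config", str(model_config)],
        [sys.executable, "evaluate_dataset.py", "--config", str(evaluation_config)],
    ]


def run_evaluation(model_config, evaluation_config):
    """Run both steps from the repository root, stopping if the first one fails."""
    for command in evaluation_commands(model_config, evaluation_config):
        subprocess.run(command, check=True)
```

Because subprocess.run is called with check=True, a failure while building the evaluation dataset raises an exception before evaluate_dataset.py is started.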

When I run train.py, it trains fine for about 1000 or so iterations, and then it crashes due to some threading error involving tkinter.

I ran this command: (video-generation) G ➜ PlayableVideoGeneration git:(main) python3 train.py --config configs/02_breakout.yaml

And much further down, many minutes later I got this:

step: 984/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.808 loss_component_hidden_states_rec:0.222 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.417 states_rec_loss:0.007 hidden_states_rec_loss:0.222 entropy_loss:0.907 samples_entropy:0.696 action_distribution_entropy:1.053 states_magnitude:0.656 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.676 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.727 action_variations_mean:-0.026 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.492 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.492 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.165 perceptual_loss_r0_l3:0.161 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.246 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.246 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.078 perceptual_loss_r1_l3:0.077 perceptual_loss_r1_l4:0.035 perceptual_loss_r2:0.513 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.513 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.168 perceptual_loss_r2_l3:0.128 perceptual_loss_r2_l4:0.047 loss:1.071 lr: 0.0004
step: 985/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.814 loss_component_hidden_states_rec:0.221 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.420 states_rec_loss:0.008 hidden_states_rec_loss:0.221 entropy_loss:0.928 samples_entropy:0.676 action_distribution_entropy:1.021 states_magnitude:0.657 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.675 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.693 action_variations_mean:0.036 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.484 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.484 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.161 perceptual_loss_r0_l3:0.160 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.257 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.257 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.081 perceptual_loss_r1_l3:0.084 perceptual_loss_r1_l4:0.036 perceptual_loss_r2:0.518 observations_rec_loss_r2:0.068 perceptual_loss_r2_l0:0.518 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.169 perceptual_loss_r2_l3:0.134 perceptual_loss_r2_l4:0.051 loss:1.075 lr: 0.0004
step: 986/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.411 states_rec_loss:0.008 hidden_states_rec_loss:0.219 entropy_loss:0.915 samples_entropy:0.727 action_distribution_entropy:1.049 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.676 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.649 action_variations_mean:-0.017 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.478 observations_rec_loss_r0:0.030 perceptual_loss_r0_l0:0.478 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.160 perceptual_loss_r0_l3:0.155 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.248 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.248 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.079 perceptual_loss_r1_l3:0.081 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.507 observations_rec_loss_r2:0.070 perceptual_loss_r2_l0:0.507 perceptual_loss_r2_l1:0.117 perceptual_loss_r2_l2:0.163 perceptual_loss_r2_l3:0.132 perceptual_loss_r2_l4:0.051 loss:1.057 lr: 0.0004
step: 987/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.791 loss_component_hidden_states_rec:0.217 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.408 states_rec_loss:0.007 hidden_states_rec_loss:0.217 entropy_loss:0.886 samples_entropy:0.665 action_distribution_entropy:0.977 states_magnitude:0.654 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.673 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.676 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.035 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.616 action_variations_mean:-0.237 reconstructed_action_directions_kl_loss:0.034 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.466 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.466 perceptual_loss_r0_l1:0.083 perceptual_loss_r0_l2:0.158 perceptual_loss_r0_l3:0.150 perceptual_loss_r0_l4:0.056 perceptual_loss_r1:0.245 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.245 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.077 perceptual_loss_r1_l3:0.078 perceptual_loss_r1_l4:0.036 perceptual_loss_r2:0.513 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.513 perceptual_loss_r2_l1:0.118 perceptual_loss_r2_l2:0.165 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.048 loss:1.048 lr: 0.0004
step: 988/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.813 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.419 states_rec_loss:0.008 hidden_states_rec_loss:0.218 entropy_loss:0.922 samples_entropy:0.615 action_distribution_entropy:0.983 states_magnitude:0.656 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.672 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.673 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.035 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.750 action_variations_mean:0.063 reconstructed_action_directions_kl_loss:0.035 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.484 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.484 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.164 perceptual_loss_r0_l3:0.157 perceptual_loss_r0_l4:0.059 perceptual_loss_r1:0.244 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.244 perceptual_loss_r1_l1:0.042 perceptual_loss_r1_l2:0.077 perceptual_loss_r1_l3:0.079 perceptual_loss_r1_l4:0.035 perceptual_loss_r2:0.530 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.530 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.172 perceptual_loss_r2_l3:0.138 perceptual_loss_r2_l4:0.051 loss:1.070 lr: 0.0004
step: 989/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.777 loss_component_hidden_states_rec:0.216 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.401 states_rec_loss:0.007 hidden_states_rec_loss:0.216 entropy_loss:0.954 samples_entropy:0.645 action_distribution_entropy:1.025 states_magnitude:0.656 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.674 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.674 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.599 action_variations_mean:0.092 reconstructed_action_directions_kl_loss:0.034 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.476 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.476 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.161 perceptual_loss_r0_l3:0.154 perceptual_loss_r0_l4:0.057 perceptual_loss_r1:0.215 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.215 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.069 perceptual_loss_r1_l3:0.066 perceptual_loss_r1_l4:0.030 perceptual_loss_r2:0.512 observations_rec_loss_r2:0.067 perceptual_loss_r2_l0:0.512 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.166 perceptual_loss_r2_l3:0.129 perceptual_loss_r2_l4:0.052 loss:1.031 lr: 0.0004
step: 990/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.777 loss_component_hidden_states_rec:0.217 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.401 states_rec_loss:0.006 hidden_states_rec_loss:0.217 entropy_loss:0.940 samples_entropy:0.823 action_distribution_entropy:1.053 states_magnitude:0.657 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.677 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.531 action_variations_mean:0.005 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.462 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.462 perceptual_loss_r0_l1:0.080 perceptual_loss_r0_l2:0.153 perceptual_loss_r0_l3:0.149 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.231 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.231 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.073 perceptual_loss_r1_l3:0.073 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.509 observations_rec_loss_r2:0.065 perceptual_loss_r2_l0:0.509 perceptual_loss_r2_l1:0.116 perceptual_loss_r2_l2:0.165 perceptual_loss_r2_l3:0.132 perceptual_loss_r2_l4:0.051 loss:1.032 lr: 0.0004
step: 991/300000 loss_component_observations_rec:0.041 loss_component_perceptual_loss:0.843 loss_component_hidden_states_rec:0.222 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.041 avg_perceptual_loss:0.435 states_rec_loss:0.008 hidden_states_rec_loss:0.222 entropy_loss:0.906 samples_entropy:0.729 action_distribution_entropy:1.034 states_magnitude:0.654 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.680 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.680 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.662 action_variations_mean:-0.087 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.506 observations_rec_loss_r0:0.031 perceptual_loss_r0_l0:0.506 perceptual_loss_r0_l1:0.086 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.167 perceptual_loss_r0_l4:0.067 perceptual_loss_r1:0.274 observations_rec_loss_r1:0.020 perceptual_loss_r1_l0:0.274 perceptual_loss_r1_l1:0.047 perceptual_loss_r1_l2:0.086 perceptual_loss_r1_l3:0.087 perceptual_loss_r1_l4:0.042 perceptual_loss_r2:0.524 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.524 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.172 perceptual_loss_r2_l3:0.135 perceptual_loss_r2_l4:0.048 loss:1.108 lr: 0.0004
step: 992/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.839 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.433 states_rec_loss:0.007 hidden_states_rec_loss:0.218 entropy_loss:0.900 samples_entropy:0.640 action_distribution_entropy:1.012 states_magnitude:0.654 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.679 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.681 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.773 action_variations_mean:-0.026 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.502 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.502 perceptual_loss_r0_l1:0.086 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.165 perceptual_loss_r0_l4:0.065 perceptual_loss_r1:0.261 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.261 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.082 perceptual_loss_r1_l3:0.084 perceptual_loss_r1_l4:0.039 perceptual_loss_r2:0.535 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.535 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.139 perceptual_loss_r2_l4:0.053 loss:1.097 lr: 0.0004
step: 993/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.823 loss_component_hidden_states_rec:0.221 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.424 states_rec_loss:0.007 hidden_states_rec_loss:0.221 entropy_loss:0.902 samples_entropy:0.650 action_distribution_entropy:1.029 states_magnitude:0.653 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.683 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.684 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.737 action_variations_mean:-0.088 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.497 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.497 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.165 perceptual_loss_r0_l3:0.165 perceptual_loss_r0_l4:0.064 perceptual_loss_r1:0.253 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.253 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.078 perceptual_loss_r1_l3:0.083 perceptual_loss_r1_l4:0.039 perceptual_loss_r2:0.522 observations_rec_loss_r2:0.070 perceptual_loss_r2_l0:0.522 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.167 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.054 loss:1.082 lr: 0.0004
step: 994/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.840 loss_component_hidden_states_rec:0.223 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.433 states_rec_loss:0.007 hidden_states_rec_loss:0.223 entropy_loss:0.921 samples_entropy:0.652 action_distribution_entropy:0.991 states_magnitude:0.654 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.682 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.684 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.670 action_variations_mean:0.009 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.502 observations_rec_loss_r0:0.029 perceptual_loss_r0_l0:0.502 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.167 perceptual_loss_r0_l4:0.063 perceptual_loss_r1:0.259 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.259 perceptual_loss_r1_l1:0.046 perceptual_loss_r1_l2:0.083 perceptual_loss_r1_l3:0.083 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.538 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.538 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.142 perceptual_loss_r2_l4:0.055 loss:1.103 lr: 0.0004
step: 995/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.411 states_rec_loss:0.006 hidden_states_rec_loss:0.219 entropy_loss:0.955 samples_entropy:0.738 action_distribution_entropy:1.055 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.679 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.680 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.549 action_variations_mean:0.042 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.454 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.454 perceptual_loss_r0_l1:0.079 perceptual_loss_r0_l2:0.152 perceptual_loss_r0_l3:0.147 perceptual_loss_r0_l4:0.057 perceptual_loss_r1:0.233 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.233 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.076 perceptual_loss_r1_l4:0.032 perceptual_loss_r2:0.547 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.547 perceptual_loss_r2_l1:0.124 perceptual_loss_r2_l2:0.176 perceptual_loss_r2_l3:0.146 perceptual_loss_r2_l4:0.054 loss:1.055 lr: 0.0004
step: 996/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.848 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.437 states_rec_loss:0.007 hidden_states_rec_loss:0.219 entropy_loss:0.957 samples_entropy:0.737 action_distribution_entropy:1.052 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.685 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.686 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.732 action_variations_mean:0.245 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.511 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.511 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.170 perceptual_loss_r0_l4:0.070 perceptual_loss_r1:0.264 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.264 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.081 perceptual_loss_r1_l3:0.086 perceptual_loss_r1_l4:0.043 perceptual_loss_r2:0.535 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.535 perceptual_loss_r2_l1:0.121 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.137 perceptual_loss_r2_l4:0.057 loss:1.106 lr: 0.0004
step: 997/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.220 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.411 states_rec_loss:0.007 hidden_states_rec_loss:0.220 entropy_loss:0.953 samples_entropy:0.789 action_distribution_entropy:1.073 states_magnitude:0.657 hidden_states_magnitude:0.420 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.685 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.686 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.588 action_variations_mean:0.111 reconstructed_action_directions_kl_loss:0.031 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.483 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.483 perceptual_loss_r0_l1:0.083 perceptual_loss_r0_l2:0.162 perceptual_loss_r0_l3:0.159 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.230 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.230 perceptual_loss_r1_l1:0.041 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.072 perceptual_loss_r1_l4:0.031 perceptual_loss_r2:0.521 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.521 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.170 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.046 loss:1.056 lr: 0.0004
step: 998/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.769 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.397 states_rec_loss:0.007 hidden_states_rec_loss:0.218 entropy_loss:0.921 samples_entropy:0.684 action_distribution_entropy:1.048 states_magnitude:0.656 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.691 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.694 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.030 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.663 action_variations_mean:-0.024 reconstructed_action_directions_kl_loss:0.030 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.467 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.467 perceptual_loss_r0_l1:0.081 perceptual_loss_r0_l2:0.157 perceptual_loss_r0_l3:0.153 perceptual_loss_r0_l4:0.058 perceptual_loss_r1:0.227 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.227 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.071 perceptual_loss_r1_l4:0.032 perceptual_loss_r2:0.496 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.496 perceptual_loss_r2_l1:0.117 perceptual_loss_r2_l2:0.160 perceptual_loss_r2_l3:0.129 perceptual_loss_r2_l4:0.045 loss:1.025 lr: 0.0004
step: 999/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.788 loss_component_hidden_states_rec:0.220 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.407 states_rec_loss:0.007 hidden_states_rec_loss:0.220 entropy_loss:0.918 samples_entropy:0.722 action_distribution_entropy:1.030 states_magnitude:0.657 hidden_states_magnitude:0.420 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.689 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.691 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.031 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.602 action_variations_mean:-0.057 reconstructed_action_directions_kl_loss:0.030 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.477 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.477 perceptual_loss_r0_l1:0.082 perceptual_loss_r0_l2:0.158 perceptual_loss_r0_l3:0.156 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.223 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.223 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.072 perceptual_loss_r1_l3:0.070 perceptual_loss_r1_l4:0.029 perceptual_loss_r2:0.522 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.522 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.166 perceptual_loss_r2_l3:0.130 perceptual_loss_r2_l4:0.057 loss:1.046 lr: 0.0004
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f10c5a7cb90>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 332, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f10c5a7cb90>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 332, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
Traceback (most recent call last):
  File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
[1]    4391 abort (core dumped)  python3 train.py --config configs/02_breakout.yaml
(video-generation) G ➜ PlayableVideoGeneration git:(main) wandb: Program ended successfully.
wandb: Run summary:
wandb:                                                                train/perceptual_loss_r0_l2 0.15621022880077362
wandb:                                                                                      _step 981
wandb:                                       train/loss_component_action_directions_kl_divergence 3.709313273429871e-06
wandb:                                              train/reconstructed_action_directions_kl_loss 0.03681553900241852
wandb:                                                       train/loss_component_perceptual_loss 0.8016525010267893
wandb:                                          train/loss_component_action_state_distribution_kl 0.0
wandb:                                   train/reconstructed_action_directions_variance_magnitude 0.6641947031021118
wandb:                                                                train/perceptual_loss_r0_l0 0.4748656153678894
wandb:                                                            train/avg_observations_rec_loss 0.038134743149081864
wandb:                                                                   train/perceptual_loss_r1 0.24382595717906952
wandb:                                                                   train/perceptual_loss_r2 0.5221801400184631
wandb:                                                                train/perceptual_loss_r1_l1 0.04235634580254555
wandb:                                                                train/perceptual_loss_r0_l3 0.1558036208152771
wandb:                                                                   train/perceptual_loss_r0 0.4748656153678894
wandb:                                                                      train/samples_entropy 0.6838436126708984
wandb:                                                                train/perceptual_loss_r1_l4 0.03565201908349991
wandb:                                                            train/loss_component_states_rec 0.0014105471782386303
wandb:                                                      train/loss_component_observations_rec 0.038134743149081864
wandb:                                                                train/perceptual_loss_r2_l0 0.5221801400184631
wandb:                                                              train/hidden_states_magnitude 0.419323205947876
wandb:                                                           train/average_centroids_distance 1.7627293345867656e-05
wandb:                                                                  train/avg_perceptual_loss 0.413623904188474
wandb:                                       train/reconstructed_action_directions_mean_magnitude 0.0011238184524700046
wandb:                                                                train/perceptual_loss_r0_l4 0.06138833239674568
wandb:                                                                train/perceptual_loss_r1_l2 0.07608731091022491
wandb:                                                                train/perceptual_loss_r2_l1 0.12005559355020523
wandb:                                                                      train/states_rec_loss 0.0070527358911931515
wandb:                                                                train/perceptual_loss_r2_l4 0.05388300120830536
wandb:                                                                   train/gumbel_temperature 0.97057
wandb:                                                       train/action_mutual_information_loss -3.69977205991745e-05
wandb:                                                                   train/observations_count 7.0
wandb:                                                     train/action_directions_mean_magnitude 0.0011720473412424326
wandb:                                                            train/ground_truth_observations 6.0
wandb:                                                                train/perceptual_loss_r0_l1 0.08212797343730927
wandb:                                                               train/hidden_states_rec_loss 0.21878568828105927
wandb:                                                               train/action_variations_mean -0.14023524522781372
wandb:                                                                train/perceptual_loss_r2_l2 0.16710881888866425
wandb:                                                                                 train/loss 1.0599816392899204
wandb:                                                                                   train/lr 0.0004
wandb:                                             train/loss_component_action_mutual_information -5.549658089876175e-06
wandb:                                                                train/perceptual_loss_r1_l3 0.07853354513645172
wandb:                                                                                   _runtime 396.27002787590027
wandb:                                                                     train/states_magnitude 0.6560764312744141
wandb:                                                          train/action_distribution_entropy 1.0813705921173096
wandb:                                                            train/action_directions_kl_loss 0.037093132734298706
wandb:                                                    train/action_state_distribution_kl_loss 8.807628546492197e-06
wandb:                                                             train/observations_rec_loss_r2 0.06988160312175751
wandb:                                                             train/observations_rec_loss_r0 0.02682528644800186
wandb:                                                               train/loss_component_entropy 0.0
wandb:                                                             train/observations_rec_loss_r1 0.01769733987748623
wandb:                                                 train/action_directions_variance_magnitude 0.6632212996482849
wandb:                                                                         train/entropy_loss 0.897321343421936
wandb:                                                                train/perceptual_loss_r2_l3 0.13487893342971802
wandb:                                                                                       step 981
wandb:                                                             train/centroids_mean_magnitude 1.4390966498467606e-05
wandb:                                                                                 _timestamp 1626460241.496398
wandb:                                               train/action_directions_reconstruction_error 2.0398056221893057e-07
wandb:                                                    train/average_action_variations_norm_l2 0.6760526895523071
wandb:                                                     train/loss_component_hidden_states_rec 0.21878568828105927
wandb:                                                                train/perceptual_loss_r1_l0 0.24382595717906952
wandb: Syncing files in wandb/run-20210716_182405-w65gbanw:
wandb:   code/train.py
wandb: plus 8 W&B file(s) and 1 media file(s)
wandb:
wandb: Synced 02_breakout: https://app.wandb.ai/ryanburgert/video-generation/runs/w65gbanw
(video-generation) G ➜ PlayableVideoGeneration git:(main)

Are there any workarounds for this? I searched the code for references to tkinter and found none. UPDATE: I found that tensor_displayer uses matplotlib.pyplot, which imports tkinter.

I'm using the provided conda environment and docker container. I get this error consistently each time I try running it.
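
A commonly used workaround for this class of error, given here as a sketch rather than a confirmed fix for this repository: the "main thread is not in main loop" messages appear when Tk objects are cleaned up outside the main thread, and they can usually be avoided by making matplotlib use the non-interactive Agg backend before matplotlib.pyplot is first imported. The helper name force_headless_backend is hypothetical and not part of the code base; it would need to run at the very top of train.py.

```python
import importlib.util
import os


def force_headless_backend(backend: str = "Agg") -> str:
    """Select a non-GUI matplotlib backend so tkinter is never loaded.

    Hypothetical helper (not part of the repository). Call it before the
    first `import matplotlib.pyplot` anywhere in the process.
    """
    # matplotlib reads MPLBACKEND when it is imported.
    os.environ["MPLBACKEND"] = backend

    # If matplotlib is installed, also select the backend explicitly.
    if importlib.util.find_spec("matplotlib") is not None:
        import matplotlib
        matplotlib.use(backend)
    return backend


selected = force_headless_backend()
print(selected)
```

The same effect can be tried without touching the code by setting the environment variable on the command line, for example MPLBACKEND=Agg python3 train.py --config configs/02_breakout.yaml.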

Tennis dataset

Hi,

Thank you for the paper and the code. It is very interesting.

I am new to this area, so I'm still learning about the training method. Could you give me more details on how you set up the Tennis dataset? I would like to extend the approach to my own dataset for video generation.

Thanks

opened by nikky4D 4
ResolvePackageNotFound Error
Hello, I'm trying to install using Conda on Windows 10, but when I run "conda env create -f env.yml" it gives me the following error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - gstreamer=1.14.0
  - readline=7.0
  - libgcc-ng=9.1.0
  - glib=2.63.1
  - ld_impl_linux-64=2.33.1
  - gmp=6.2.0
  - rhash=1.3.8
  - libedit=3.1.20181209
  - gst-plugins-base=1.14.0
  - ncurses=6.2
  - libstdcxx-ng=9.1.0
  - libuuid=1.0.3
  - expat=2.2.6
  - libgfortran-ng=7.3.0
  - dbus=1.13.12
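
Several of the unresolved packages are Linux-only builds (for example libgcc-ng, libstdcxx-ng and ld_impl_linux-64), which fits the README's recommendation to use Linux. One possible workaround on Windows, sketched here as an assumption and not a supported path, is to drop the reported entries from a copy of env.yml and let Conda resolve the rest. The function name drop_packages and the sample file content are hypothetical; the real env.yml may be laid out differently.

```python
def drop_packages(env_text: str, missing: set) -> str:
    """Return env_text without dependency lines whose package is in `missing`."""
    kept = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # "- name=version=build" -> "name"
            name = stripped[2:].split("=", 1)[0].strip()
            if name in missing:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical excerpt; the two dropped entries are taken from the error above.
sample = """name: video-generation
dependencies:
  - python=3.7
  - libgcc-ng=9.1.0
  - ld_impl_linux-64=2.33.1
  - numpy
"""

missing = {"libgcc-ng", "ld_impl_linux-64"}
filtered = drop_packages(sample, missing)
print(filtered)
```

Whether the remaining packages then install and run correctly on Windows is untested here; the provided Dockerfile is the safer route.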
opened by AVTV64 3
Ablation studies
I am trying to run ablation studies on the Tennis dataset, rather than on BAIR as in the paper.

Switching off G.S. looks straightforward from the yml file. However, switching off v_t (the action variability embedding) and L_act (training with the mutual information loss) does not look that simple.

Can you shed some light on how to proceed?

To switch off L_act, there seem to be many places where code would have to be commented out. Or is it enough to set action_mutual_information_lambda and action_mutual_information_lambda_pretraining to 0? Would that work?

As for v_t, I cannot figure out how to switch it off in the code. In the paper, it is defined as the difference between the observed action direction d_t and its assigned cluster centroid. The only clue I found is in model.py, line 188:

if not self.config["model"]["action_network"]["use_variations"]: flat_action_variations = flat_action_variations * 0

Would setting use_variations=False be enough for this ablation?
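
As a sketch of the configuration-only route asked about here, the snippet below builds an ablated copy of a config dictionary by setting use_variations to False and both mutual-information weights to 0. Only the path config["model"]["action_network"]["use_variations"] comes from the quoted code; placing the two lambda keys under a "training" section is an assumption, the numeric values are made up, and whether a zero weight fully reproduces the paper's ablation is not confirmed.

```python
import copy


def ablate(config: dict, no_variations: bool = False, no_l_act: bool = False) -> dict:
    """Return a modified deep copy of config; the input is left untouched."""
    out = copy.deepcopy(config)
    if no_variations:
        # With this flag False, the quoted line multiplies the variations by 0.
        out["model"]["action_network"]["use_variations"] = False
    if no_l_act:
        # Assumed location of the loss weights (hypothetical "training" section).
        out["training"]["action_mutual_information_lambda"] = 0.0
        out["training"]["action_mutual_information_lambda_pretraining"] = 0.0
    return out


# Hypothetical values, for illustration only.
base = {
    "model": {"action_network": {"use_variations": True}},
    "training": {
        "action_mutual_information_lambda": 0.15,
        "action_mutual_information_lambda_pretraining": 0.0,
    },
}

no_vt = ablate(base, no_variations=True)
no_lact = ablate(base, no_l_act=True)
print(no_vt["model"]["action_network"]["use_variations"])
print(no_lact["training"]["action_mutual_information_lambda"])
```

Working on a deep copy keeps the baseline configuration intact, so each ablation run differs from it by exactly one switch.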
opened by karims 2
Running on web

I'm trying to run the pre-trained models on a remote server, and I want to visualize the interaction as in the demo website.

Could you point me to the source code of the demo website?

Also, is it possible to run the pre-trained models in Jupyter and view the generated video there?

opened by karims 2
Question on Variability Embedding

Hello,

Thank you for providing the code to this paper!

I was wondering how you chose the value of K. The paper mentions that some of the actions were duplicated, so I was curious whether you saw improvements when lowering K towards the number of actions one would expect, for example on Breakout.

Also, you mentioned enforcing v_t = 0 at inference. Is that also the case when R is fed the frame features reconstructed during the mixed training stage? And does randomly sampling v_t at test time on something like Tennis produce reasonable outputs? I can understand not using it on Breakout, since you don't really want to introduce that non-determinism.
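
For readers following this question, here is a small sketch of the quantity involved, based on the description quoted in the ablation question above: v_t is the difference between the observed action direction d_t and its assigned cluster centroid. Assigning d_t to the nearest centroid, the 2-D action space and the K = 3 centroid values are all assumptions made for illustration; the model's actual assignment may differ.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def action_variation(direction, centroids):
    """Return (k, v_t): assigned centroid index and v_t = d_t - centroid_k."""
    k = min(range(len(centroids)), key=lambda i: squared_distance(direction, centroids[i]))
    v_t = [d - c for d, c in zip(direction, centroids[k])]
    return k, v_t


# Hypothetical 2-D action space with K = 3 centroids.
centroids = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]
d_t = [0.75, 0.25]

k, v_t = action_variation(d_t, centroids)
print(k, v_t)  # 0 [-0.25, 0.25]

# Enforcing v_t = 0 at inference: the direction collapses onto the centroid.
inference_direction = [c + 0.0 for c in centroids[k]]
print(inference_direction)  # [1.0, 0.0]
```

With v_t forced to zero, every use of action k produces the same direction, which is the determinism the question refers to for Breakout.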

Thanks again :smile:

opened by phillips96 2
Are .pkl files necessary?

Hi,

This is interesting research!

I want to reproduce the results of this paper, but I am still confused about some details. In the Breakout dataset, each video contains not only the images but also four .pkl files. How are these files created? If I use a custom dataset, are these four files necessary?

Thanks

opened by Carmenliang 1

train.py crashes

opened by RyannDaGreat 1

Error when locating font

Line 32 of PlayableVideoGeneration/utils/save_video_ffmpeg.py should be font = ImageFont.truetype("utils/fonts/Roboto-Regular.ttf", pointsize) and not font = ImageFont.truetype("fonts/Roboto-Regular.ttf", pointsize). With the current path, python play.py --config configs/02_breakout.yaml fails.
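
An alternative to hard-coding either relative path, shown as a sketch: build the font path from the location of the module itself, so it no longer depends on the directory the script is launched from. The helper name font_path is hypothetical; the directory layout (a fonts folder next to save_video_ffmpeg.py) is inferred from the corrected path in this report.

```python
from pathlib import Path


def font_path(module_file: str, font_name: str = "Roboto-Regular.ttf") -> Path:
    """Path to fonts/<font_name> next to the given module file."""
    return Path(module_file).parent / "fonts" / font_name


# Inside save_video_ffmpeg.py one would pass __file__; a literal is used here
# so the example runs anywhere.
resolved = font_path("utils/save_video_ffmpeg.py")
print(resolved.as_posix())  # utils/fonts/Roboto-Regular.ttf
```

In the module, the call would then read ImageFont.truetype(str(font_path(__file__)), pointsize).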

opened by RyannDaGreat 1
∆-MSE and ∆-acc in data.yml

I am a little confused by the evaluation results in the generated data.yml. Where do I find ∆-MSE and ∆-acc? For reference, they are reported in Table 1 of the ablation studies.

opened by karims 1
Discrepancy in del-MSE and del-ACC values

Hi, interesting work! I trained the network on the Atari Breakout and Tennis datasets on a Tesla V100 using the same setup as yours. On evaluation, I found the FID, FVD and LPIPS values to be similar to the ones reported in the paper, but del-MSE and del-ACC are way off. The data.yaml files can be found at tennis and Atari. Can you suggest why this might be happening?

Thanks, Sonam

opened by sonam-rgb 0

Owner

Willi Menapace
