Playable Video Generation
Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

Paper: ArXiv
Supplementary: Website
Demo: Try it Live

Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety.

Overview



Figure 1. Illustration of the proposed CADDY model for playable video generation.


Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels as if they were playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A, which predicts the label of the action performed in the current portion of the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.
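The following minimal PyTorch-style sketch illustrates this information flow during training. The module definitions, tensor shapes, number of actions and the Gumbel-Softmax bottleneck are illustrative assumptions, not the repository implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CaddySketch(nn.Module):
    # Illustrative sketch of the encoder-bottleneck-decoder flow (not the actual CADDY code).
    def __init__(self, state_dim=128, num_actions=7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, state_dim))  # E
        self.action_net = nn.Linear(2 * state_dim, num_actions)                        # A
        self.dynamics = nn.GRUCell(state_dim + num_actions, state_dim)                 # R
        self.decoder = nn.Linear(state_dim, 3 * 64 * 64)                               # D

    def forward(self, frames):
        # frames: (batch, time, 3, 64, 64)
        b, t = frames.shape[:2]
        states = [self.encoder(frames[:, i]) for i in range(t)]
        hidden = torch.zeros_like(states[0])
        losses = []
        for i in range(t - 1):
            # The discrete action inferred from consecutive states acts as the bottleneck.
            logits = self.action_net(torch.cat([states[i], states[i + 1]], dim=-1))
            action = F.gumbel_softmax(logits, tau=1.0, hard=True)
            hidden = self.dynamics(torch.cat([states[i], action], dim=-1), hidden)
            prediction = self.decoder(hidden).view(b, 3, 64, 64)
            # Reconstruction of the next frame is the main driving loss.
            losses.append(F.mse_loss(prediction, frames[:, i + 1]))
        return torch.stack(losses).mean()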

Requirements

We recommend using Linux and one or more CUDA-compatible GPUs. We provide both a Conda environment and a Dockerfile to configure the required libraries.

Conda

The environment can be installed and activated with:

conda env create -f env.yml

conda activate video-generation

Docker

Use the Dockerfile to build the docker image:

docker build -t video-generation:1.0 .

Run the docker image, mounting the repository root directory to /video-generation inside the container:

docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash

Preparing Datasets

BAIR

Coming soon

Atari Breakout

Download the breakout_160_ours.tar.gz archive from Google Drive and extract it under the data folder.

Tennis

The Tennis dataset is automatically acquired from YouTube by running

./get_tennis_dataset.sh

This requires an installation of youtube-dl (Download). Please run youtube-dl -U to update the utility to the latest version. The dataset will be created at data/tennis_v4_256_ours.

Custom Datasets

Custom datasets can be created from a user-provided folder containing plain videos. Video frames are extracted at the specified resolution and framerate. ffmpeg is used for the extraction and supports multiple input formats. By default, only mp4 files are processed.

python -m dataset.acquisition.convert_video_directory --video_directory <video_directory> --output_directory <output_directory> --target_size <width> <height> [--fps <fps> --video_extension <extension> --processes <processes>]

As an example, the following command transforms all mp4 videos in the tmp/my_videos directory into a 256x256px dataset sampled at 10fps and saves it in the data/my_videos folder:

python -m dataset.acquisition.convert_video_directory --video_directory tmp/my_videos --output_directory data/my_videos --target_size 256 256 --fps 10
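Under the hood, the extraction is conceptually similar to running ffmpeg on each input video; the exact filters and output layout used by the script may differ, so treat the following invocation only as an approximation of what happens for a single file:

ffmpeg -i tmp/my_videos/example.mp4 -vf "fps=10,scale=256:256" data/my_videos/example/%05d.png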

Using Pretrained Models

Pretrained models in .pth.tar format are available for all the datasets and can be downloaded at the following link: Google Drive

Please place each directory under the checkpoints folder. Training and inference scripts automatically make use of the latest.pth.tar checkpoint when present in the checkpoints subfolder corresponding to the configuration in use.
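For reference, after downloading, the layout should look roughly as follows, where the subfolder name matches the run name specified in the corresponding configuration file (shown here only as a placeholder):

checkpoints/
  <run_name_from_config>/
    latest.pth.tar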

Playing

When a latest.pth.tar checkpoint is present under the checkpoints folder corresponding to the current configuration, the model can be used interactively to generate videos with the following commands:

  • BAIR: python play.py --config configs/01_bair.yaml

  • Breakout: python play.py --config configs/02_breakout.yaml

  • Tennis: python play.py --config configs/03_tennis.yaml

A full screen window will appear and actions can be provided using number keys in the range [1, actions_count]. Number key 0 resets the generation process.

The inference process is lightweight and can be executed even in the browser, as in our Live Demo.
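Conceptually, the play loop maps each key press to a one-hot action and feeds it, together with the current state, to the dynamics network to produce the next frame. A hypothetical sketch of such a loop is shown below; the model.encode, model.step and model.decode helpers are placeholders standing in for the encoder E, recurrent dynamics R and decoder D, not the repository API.

import torch
import torch.nn.functional as F

def play(model, first_frame, read_key, show, num_actions=7, steps=100):
    # Hypothetical interactive generation loop (placeholder names, not the repo API).
    state = model.encode(first_frame)
    hidden = None
    for _ in range(steps):
        key = read_key()  # an integer in [1, num_actions]
        action = F.one_hot(torch.tensor(key - 1), num_classes=num_actions).float()
        state, hidden = model.step(state, action, hidden)  # advance the learned dynamics
        show(model.decode(state))                          # render the next frame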

Training

The models can be trained with the following commands:

python train.py --config configs/<config_name>.yaml

The training process generates multiple files under the results and checkpoints directories, in a subdirectory whose name corresponds to the one specified in the configuration file. In particular, the folder under the results directory contains an images folder showing qualitative results obtained during training. The checkpoints subfolder contains regularly saved checkpoints and the latest.pth.tar checkpoint representing the latest model parameters.

Training can be fully monitored through Weights and Biases by running wandb init before executing the training command.
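For example, to train on Breakout with logging enabled (assuming an existing Weights and Biases account):

wandb login
wandb init
python train.py --config configs/02_breakout.yaml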

Training the model in full resolution on our datasets required the following GPU resources:

  • BAIR: 4x2080Ti 44GB
  • Breakout: 1x2080Ti 11GB
  • Tennis: 2x2080 16GB

Lower resolution versions of the model can be trained with a single 8GB GPU.

Evaluation

Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on that dataset. To build the evaluation dataset, please issue:

python build_evaluation_dataset.py --config configs/<config_name>.yaml

The command creates a reconstruction of the test portion of the dataset under the results/<run_name>/evaluation_dataset directory. To run evaluation, issue:

python evaluate_dataset.py --config configs/evaluation/<config_name>.yaml

Evaluation results are saved under the evaluation_results directory, in the folder specified in the configuration file, as a file named data.yml.
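As an example, for Breakout the two steps could look like the following; the exact name of the evaluation configuration file is an assumption, so check the configs/evaluation folder for the available files:

python build_evaluation_dataset.py --config configs/02_breakout.yaml
python evaluate_dataset.py --config configs/evaluation/02_breakout.yaml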

Comments
  • Tennis dataset


    Hi,

    Thank you for the paper and the code. It is very interesting.

    I am new to this area so I'm learning about the training method. Could you give me more details on how you set up the Tennis dataset? I would like to extend this to my own dataset for video generation.

    Thanks

    opened by nikky4D 4
  • ResolvePackageNotFound Error


    Hello, I'm trying to install using Conda on Windows 10, but when I run "conda env create -f env.yml" it gives me the following error:

    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:

    • gstreamer=1.14.0
    • readline=7.0
    • libgcc-ng=9.1.0
    • glib=2.63.1
    • ld_impl_linux-64=2.33.1
    • gmp=6.2.0
    • rhash=1.3.8
    • libedit=3.1.20181209
    • gst-plugins-base=1.14.0
    • ncurses=6.2
    • libstdcxx-ng=9.1.0
    • libuuid=1.0.3
    • expat=2.2.6
    • libgfortran-ng=7.3.0
    • dbus=1.13.12
    opened by AVTV64 3
  • Ablation studies


    I am trying to do ablation studies on the Tennis dataset, different from what is done in the paper for BAIR.

    It looks like switching off G.S. is straightforward from the yml file. However, switching off v_t (the action variability embedding) and L_act (training with the mutual information loss) doesn't look that simple.

    Can you shed some light on how to proceed?

    I see that switching off L_act would require commenting out code in many places. Or is it ok to set action_mutual_information_lambda and action_mutual_information_lambda_pretraining to 0? Does this work?

    About v_t, I am just unable to figure out how to switch it off in the code. From the paper, it is defined as the difference between the observed action direction d_t and its assigned cluster centroid. The only clue I find is in model.py, where line 188 says:

    if not self.config["model"]["action_network"]["use_variations"]:
                flat_action_variations = flat_action_variations * 0
    

    Does setting use_variations=False help to do this ablation study?

    opened by karims 2
  • Running on web


    I'm trying to run the pre-trained models on a remote server and I want to visualize the interaction as in the demo website.

    Is it possible that you can point out the source code of the demo website?

    Also, is it possible to run the pre-trained models in Jupyter and see the video generation?

    opened by karims 2
  • Question on Variability Embedding


    Hello,

    Thank you for providing the code to this paper!

    I was wondering how you predetermined the value of K. It's mentioned that some of the actions were duplicated, so I was curious whether you saw improvements by lowering K to the sort of limit you and I might expect on Breakout, for example.

    Also, you mentioned enforcing v_t=0 at inference time. Is that also the case when R is fed the frame features that have been reconstructed during the mixed training stage? And does randomly sampling v_t when testing on something like Tennis produce reasonable outputs? I can understand not using it on Breakout, as you don't really want to introduce that non-determinism.

    Thanks again :smile:

    opened by phillips96 2
  • Are .pkl files necessary?


    Hi,

    This is interesting research!

    I want to reproduce the results of this paper, but I am still confused about the details. In the Breakout dataset, each video folder contains not only the images but also 4 .pkl files. How are these files generated? If I use custom datasets, are these four files necessary?

    Thanks

    opened by Carmenliang 1
  • train.py crashes


    When I run train.py, it trains fine for about 1000 iterations or so, and then it crashes due to a threading error involving tkinter.

    I ran this command: (video-generation) G ➜ PlayableVideoGeneration git:(main) python3 train.py --config configs/02_breakout.yaml

    And much further down, many minutes later I got this:

    step: 984/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.808 loss_component_hidden_states_rec:0.222 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.417 states_rec_loss:0.007 hidden_states_rec_loss:0.222 entropy_loss:0.907 samples_entropy:0.696 action_distribution_entropy:1.053 states_magnitude:0.656 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.676 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.727 action_variations_mean:-0.026 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.492 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.492 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.165 perceptual_loss_r0_l3:0.161 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.246 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.246 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.078 perceptual_loss_r1_l3:0.077 perceptual_loss_r1_l4:0.035 perceptual_loss_r2:0.513 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.513 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.168 perceptual_loss_r2_l3:0.128 perceptual_loss_r2_l4:0.047 loss:1.071 lr: 0.0004
    step: 985/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.814 loss_component_hidden_states_rec:0.221 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.420 states_rec_loss:0.008 hidden_states_rec_loss:0.221 entropy_loss:0.928 samples_entropy:0.676 action_distribution_entropy:1.021 states_magnitude:0.657 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.675 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.693 action_variations_mean:0.036 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.484 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.484 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.161 perceptual_loss_r0_l3:0.160 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.257 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.257 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.081 perceptual_loss_r1_l3:0.084 perceptual_loss_r1_l4:0.036 perceptual_loss_r2:0.518 observations_rec_loss_r2:0.068 perceptual_loss_r2_l0:0.518 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.169 perceptual_loss_r2_l3:0.134 perceptual_loss_r2_l4:0.051 loss:1.075 lr: 0.0004
    step: 986/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.411 states_rec_loss:0.008 hidden_states_rec_loss:0.219 entropy_loss:0.915 samples_entropy:0.727 action_distribution_entropy:1.049 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.676 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.649 action_variations_mean:-0.017 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.478 observations_rec_loss_r0:0.030 perceptual_loss_r0_l0:0.478 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.160 perceptual_loss_r0_l3:0.155 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.248 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.248 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.079 perceptual_loss_r1_l3:0.081 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.507 observations_rec_loss_r2:0.070 perceptual_loss_r2_l0:0.507 perceptual_loss_r2_l1:0.117 perceptual_loss_r2_l2:0.163 perceptual_loss_r2_l3:0.132 perceptual_loss_r2_l4:0.051 loss:1.057 lr: 0.0004
    step: 987/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.791 loss_component_hidden_states_rec:0.217 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.408 states_rec_loss:0.007 hidden_states_rec_loss:0.217 entropy_loss:0.886 samples_entropy:0.665 action_distribution_entropy:0.977 states_magnitude:0.654 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.673 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.676 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.035 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.616 action_variations_mean:-0.237 reconstructed_action_directions_kl_loss:0.034 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.466 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.466 perceptual_loss_r0_l1:0.083 perceptual_loss_r0_l2:0.158 perceptual_loss_r0_l3:0.150 perceptual_loss_r0_l4:0.056 perceptual_loss_r1:0.245 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.245 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.077 perceptual_loss_r1_l3:0.078 perceptual_loss_r1_l4:0.036 perceptual_loss_r2:0.513 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.513 perceptual_loss_r2_l1:0.118 perceptual_loss_r2_l2:0.165 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.048 loss:1.048 lr: 0.0004
    step: 988/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.813 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.419 states_rec_loss:0.008 hidden_states_rec_loss:0.218 entropy_loss:0.922 samples_entropy:0.615 action_distribution_entropy:0.983 states_magnitude:0.656 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.672 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.673 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.035 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.750 action_variations_mean:0.063 reconstructed_action_directions_kl_loss:0.035 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.484 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.484 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.164 perceptual_loss_r0_l3:0.157 perceptual_loss_r0_l4:0.059 perceptual_loss_r1:0.244 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.244 perceptual_loss_r1_l1:0.042 perceptual_loss_r1_l2:0.077 perceptual_loss_r1_l3:0.079 perceptual_loss_r1_l4:0.035 perceptual_loss_r2:0.530 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.530 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.172 perceptual_loss_r2_l3:0.138 perceptual_loss_r2_l4:0.051 loss:1.070 lr: 0.0004
    step: 989/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.777 loss_component_hidden_states_rec:0.216 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.401 states_rec_loss:0.007 hidden_states_rec_loss:0.216 entropy_loss:0.954 samples_entropy:0.645 action_distribution_entropy:1.025 states_magnitude:0.656 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.674 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.674 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.599 action_variations_mean:0.092 reconstructed_action_directions_kl_loss:0.034 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.476 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.476 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.161 perceptual_loss_r0_l3:0.154 perceptual_loss_r0_l4:0.057 perceptual_loss_r1:0.215 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.215 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.069 perceptual_loss_r1_l3:0.066 perceptual_loss_r1_l4:0.030 perceptual_loss_r2:0.512 observations_rec_loss_r2:0.067 perceptual_loss_r2_l0:0.512 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.166 perceptual_loss_r2_l3:0.129 perceptual_loss_r2_l4:0.052 loss:1.031 lr: 0.0004
    step: 990/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.777 loss_component_hidden_states_rec:0.217 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.401 states_rec_loss:0.006 hidden_states_rec_loss:0.217 entropy_loss:0.940 samples_entropy:0.823 action_distribution_entropy:1.053 states_magnitude:0.657 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.677 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.678 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.034 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.531 action_variations_mean:0.005 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.462 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.462 perceptual_loss_r0_l1:0.080 perceptual_loss_r0_l2:0.153 perceptual_loss_r0_l3:0.149 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.231 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.231 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.073 perceptual_loss_r1_l3:0.073 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.509 observations_rec_loss_r2:0.065 perceptual_loss_r2_l0:0.509 perceptual_loss_r2_l1:0.116 perceptual_loss_r2_l2:0.165 perceptual_loss_r2_l3:0.132 perceptual_loss_r2_l4:0.051 loss:1.032 lr: 0.0004
    step: 991/300000 loss_component_observations_rec:0.041 loss_component_perceptual_loss:0.843 loss_component_hidden_states_rec:0.222 loss_component_states_rec:0.002 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.041 avg_perceptual_loss:0.435 states_rec_loss:0.008 hidden_states_rec_loss:0.222 entropy_loss:0.906 samples_entropy:0.729 action_distribution_entropy:1.034 states_magnitude:0.654 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.680 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.680 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.662 action_variations_mean:-0.087 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.506 observations_rec_loss_r0:0.031 perceptual_loss_r0_l0:0.506 perceptual_loss_r0_l1:0.086 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.167 perceptual_loss_r0_l4:0.067 perceptual_loss_r1:0.274 observations_rec_loss_r1:0.020 perceptual_loss_r1_l0:0.274 perceptual_loss_r1_l1:0.047 perceptual_loss_r1_l2:0.086 perceptual_loss_r1_l3:0.087 perceptual_loss_r1_l4:0.042 perceptual_loss_r2:0.524 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.524 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.172 perceptual_loss_r2_l3:0.135 perceptual_loss_r2_l4:0.048 loss:1.108 lr: 0.0004
    step: 992/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.839 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.433 states_rec_loss:0.007 hidden_states_rec_loss:0.218 entropy_loss:0.900 samples_entropy:0.640 action_distribution_entropy:1.012 states_magnitude:0.654 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.679 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.681 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.773 action_variations_mean:-0.026 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.502 observations_rec_loss_r0:0.028 perceptual_loss_r0_l0:0.502 perceptual_loss_r0_l1:0.086 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.165 perceptual_loss_r0_l4:0.065 perceptual_loss_r1:0.261 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.261 perceptual_loss_r1_l1:0.044 perceptual_loss_r1_l2:0.082 perceptual_loss_r1_l3:0.084 perceptual_loss_r1_l4:0.039 perceptual_loss_r2:0.535 observations_rec_loss_r2:0.071 perceptual_loss_r2_l0:0.535 perceptual_loss_r2_l1:0.123 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.139 perceptual_loss_r2_l4:0.053 loss:1.097 lr: 0.0004
    step: 993/300000 loss_component_observations_rec:0.038 loss_component_perceptual_loss:0.823 loss_component_hidden_states_rec:0.221 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.038 avg_perceptual_loss:0.424 states_rec_loss:0.007 hidden_states_rec_loss:0.221 entropy_loss:0.902 samples_entropy:0.650 action_distribution_entropy:1.029 states_magnitude:0.653 hidden_states_magnitude:0.417 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.683 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.684 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.737 action_variations_mean:-0.088 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.497 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.497 perceptual_loss_r0_l1:0.084 perceptual_loss_r0_l2:0.165 perceptual_loss_r0_l3:0.165 perceptual_loss_r0_l4:0.064 perceptual_loss_r1:0.253 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.253 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.078 perceptual_loss_r1_l3:0.083 perceptual_loss_r1_l4:0.039 perceptual_loss_r2:0.522 observations_rec_loss_r2:0.070 perceptual_loss_r2_l0:0.522 perceptual_loss_r2_l1:0.119 perceptual_loss_r2_l2:0.167 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.054 loss:1.082 lr: 0.0004
    step: 994/300000 loss_component_observations_rec:0.039 loss_component_perceptual_loss:0.840 loss_component_hidden_states_rec:0.223 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.039 avg_perceptual_loss:0.433 states_rec_loss:0.007 hidden_states_rec_loss:0.223 entropy_loss:0.921 samples_entropy:0.652 action_distribution_entropy:0.991 states_magnitude:0.654 hidden_states_magnitude:0.418 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.682 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.684 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.670 action_variations_mean:0.009 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.502 observations_rec_loss_r0:0.029 perceptual_loss_r0_l0:0.502 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.167 perceptual_loss_r0_l4:0.063 perceptual_loss_r1:0.259 observations_rec_loss_r1:0.018 perceptual_loss_r1_l0:0.259 perceptual_loss_r1_l1:0.046 perceptual_loss_r1_l2:0.083 perceptual_loss_r1_l3:0.083 perceptual_loss_r1_l4:0.034 perceptual_loss_r2:0.538 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.538 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.142 perceptual_loss_r2_l4:0.055 loss:1.103 lr: 0.0004
    step: 995/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.411 states_rec_loss:0.006 hidden_states_rec_loss:0.219 entropy_loss:0.955 samples_entropy:0.738 action_distribution_entropy:1.055 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.679 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.680 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.033 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.549 action_variations_mean:0.042 reconstructed_action_directions_kl_loss:0.033 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.454 observations_rec_loss_r0:0.027 perceptual_loss_r0_l0:0.454 perceptual_loss_r0_l1:0.079 perceptual_loss_r0_l2:0.152 perceptual_loss_r0_l3:0.147 perceptual_loss_r0_l4:0.057 perceptual_loss_r1:0.233 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.233 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.076 perceptual_loss_r1_l4:0.032 perceptual_loss_r2:0.547 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.547 perceptual_loss_r2_l1:0.124 perceptual_loss_r2_l2:0.176 perceptual_loss_r2_l3:0.146 perceptual_loss_r2_l4:0.054 loss:1.055 lr: 0.0004
    step: 996/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.848 loss_component_hidden_states_rec:0.219 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.437 states_rec_loss:0.007 hidden_states_rec_loss:0.219 entropy_loss:0.957 samples_entropy:0.737 action_distribution_entropy:1.052 states_magnitude:0.655 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.685 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.686 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.732 action_variations_mean:0.245 reconstructed_action_directions_kl_loss:0.032 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.511 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.511 perceptual_loss_r0_l1:0.085 perceptual_loss_r0_l2:0.166 perceptual_loss_r0_l3:0.170 perceptual_loss_r0_l4:0.070 perceptual_loss_r1:0.264 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.264 perceptual_loss_r1_l1:0.043 perceptual_loss_r1_l2:0.081 perceptual_loss_r1_l3:0.086 perceptual_loss_r1_l4:0.043 perceptual_loss_r2:0.535 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.535 perceptual_loss_r2_l1:0.121 perceptual_loss_r2_l2:0.173 perceptual_loss_r2_l3:0.137 perceptual_loss_r2_l4:0.057 loss:1.106 lr: 0.0004
    step: 997/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.797 loss_component_hidden_states_rec:0.220 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.411 states_rec_loss:0.007 hidden_states_rec_loss:0.220 entropy_loss:0.953 samples_entropy:0.789 action_distribution_entropy:1.073 states_magnitude:0.657 hidden_states_magnitude:0.420 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.685 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.686 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.032 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.588 action_variations_mean:0.111 reconstructed_action_directions_kl_loss:0.031 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.483 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.483 perceptual_loss_r0_l1:0.083 perceptual_loss_r0_l2:0.162 perceptual_loss_r0_l3:0.159 perceptual_loss_r0_l4:0.060 perceptual_loss_r1:0.230 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.230 perceptual_loss_r1_l1:0.041 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.072 perceptual_loss_r1_l4:0.031 perceptual_loss_r2:0.521 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.521 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.170 perceptual_loss_r2_l3:0.136 perceptual_loss_r2_l4:0.046 loss:1.056 lr: 0.0004
    step: 998/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.769 loss_component_hidden_states_rec:0.218 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.397 states_rec_loss:0.007 hidden_states_rec_loss:0.218 entropy_loss:0.921 samples_entropy:0.684 action_distribution_entropy:1.048 states_magnitude:0.656 hidden_states_magnitude:0.419 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.691 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.694 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.030 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.663 action_variations_mean:-0.024 reconstructed_action_directions_kl_loss:0.030 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.467 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.467 perceptual_loss_r0_l1:0.081 perceptual_loss_r0_l2:0.157 perceptual_loss_r0_l3:0.153 perceptual_loss_r0_l4:0.058 perceptual_loss_r1:0.227 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.227 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.074 perceptual_loss_r1_l3:0.071 perceptual_loss_r1_l4:0.032 perceptual_loss_r2:0.496 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.496 perceptual_loss_r2_l1:0.117 perceptual_loss_r2_l2:0.160 perceptual_loss_r2_l3:0.129 perceptual_loss_r2_l4:0.045 loss:1.025 lr: 0.0004
    step: 999/300000 loss_component_observations_rec:0.037 loss_component_perceptual_loss:0.788 loss_component_hidden_states_rec:0.220 loss_component_states_rec:0.001 loss_component_entropy:0.000 loss_component_action_directions_kl_divergence:0.000 loss_component_action_mutual_information:-0.000 loss_component_action_state_distribution_kl:0.000 avg_observations_rec_loss:0.037 avg_perceptual_loss:0.407 states_rec_loss:0.007 hidden_states_rec_loss:0.220 entropy_loss:0.918 samples_entropy:0.722 action_distribution_entropy:1.030 states_magnitude:0.657 hidden_states_magnitude:0.420 action_directions_mean_magnitude:0.001 action_directions_variance_magnitude:0.689 reconstructed_action_directions_mean_magnitude:0.001 reconstructed_action_directions_variance_magnitude:0.691 action_directions_reconstruction_error:0.000 action_directions_kl_loss:0.031 centroids_mean_magnitude:0.000 average_centroids_distance:0.000 average_action_variations_norm_l2:0.602 action_variations_mean:-0.057 reconstructed_action_directions_kl_loss:0.030 action_mutual_information_loss:-0.000 action_state_distribution_kl_loss:0.000 ground_truth_observations:6.000 gumbel_temperature:0.970 observations_count:7.000 perceptual_loss_r0:0.477 observations_rec_loss_r0:0.026 perceptual_loss_r0_l0:0.477 perceptual_loss_r0_l1:0.082 perceptual_loss_r0_l2:0.158 perceptual_loss_r0_l3:0.156 perceptual_loss_r0_l4:0.061 perceptual_loss_r1:0.223 observations_rec_loss_r1:0.017 perceptual_loss_r1_l0:0.223 perceptual_loss_r1_l1:0.040 perceptual_loss_r1_l2:0.072 perceptual_loss_r1_l3:0.070 perceptual_loss_r1_l4:0.029 perceptual_loss_r2:0.522 observations_rec_loss_r2:0.069 perceptual_loss_r2_l0:0.522 perceptual_loss_r2_l1:0.122 perceptual_loss_r2_l2:0.166 perceptual_loss_r2_l3:0.130 perceptual_loss_r2_l4:0.057 loss:1.046 lr: 0.0004
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Variable.__del__ at 0x7f10c5a7cb90>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 332, in __del__
        if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Variable.__del__ at 0x7f10c5a7cb90>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 332, in __del__
        if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Exception ignored in: <function Image.__del__ at 0x7f10c53415f0>
    Traceback (most recent call last):
      File "/home/ryan/miniconda3/envs/video-generation/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
        self.tk.call('image', 'delete', self.name)
    RuntimeError: main thread is not in main loop
    Tcl_AsyncDelete: async handler deleted by the wrong thread
    [1]    4391 abort (core dumped)  python3 train.py --config configs/02_breakout.yaml
    (video-generation) G ➜ PlayableVideoGeneration git:(main) wandb: Program ended successfully.
    wandb: Run summary:
    wandb:                                                                train/perceptual_loss_r0_l2 0.15621022880077362
    wandb:                                                                                      _step 981
    wandb:                                       train/loss_component_action_directions_kl_divergence 3.709313273429871e-06
    wandb:                                              train/reconstructed_action_directions_kl_loss 0.03681553900241852
    wandb:                                                       train/loss_component_perceptual_loss 0.8016525010267893
    wandb:                                          train/loss_component_action_state_distribution_kl 0.0
    wandb:                                   train/reconstructed_action_directions_variance_magnitude 0.6641947031021118
    wandb:                                                                train/perceptual_loss_r0_l0 0.4748656153678894
    wandb:                                                            train/avg_observations_rec_loss 0.038134743149081864
    wandb:                                                                   train/perceptual_loss_r1 0.24382595717906952
    wandb:                                                                   train/perceptual_loss_r2 0.5221801400184631
    wandb:                                                                train/perceptual_loss_r1_l1 0.04235634580254555
    wandb:                                                                train/perceptual_loss_r0_l3 0.1558036208152771
    wandb:                                                                   train/perceptual_loss_r0 0.4748656153678894
    wandb:                                                                      train/samples_entropy 0.6838436126708984
    wandb:                                                                train/perceptual_loss_r1_l4 0.03565201908349991
    wandb:                                                            train/loss_component_states_rec 0.0014105471782386303
    wandb:                                                      train/loss_component_observations_rec 0.038134743149081864
    wandb:                                                                train/perceptual_loss_r2_l0 0.5221801400184631
    wandb:                                                              train/hidden_states_magnitude 0.419323205947876
    wandb:                                                           train/average_centroids_distance 1.7627293345867656e-05
    wandb:                                                                  train/avg_perceptual_loss 0.413623904188474
    wandb:                                       train/reconstructed_action_directions_mean_magnitude 0.0011238184524700046
    wandb:                                                                train/perceptual_loss_r0_l4 0.06138833239674568
    wandb:                                                                train/perceptual_loss_r1_l2 0.07608731091022491
    wandb:                                                                train/perceptual_loss_r2_l1 0.12005559355020523
    wandb:                                                                      train/states_rec_loss 0.0070527358911931515
    wandb:                                                                train/perceptual_loss_r2_l4 0.05388300120830536
    wandb:                                                                   train/gumbel_temperature 0.97057
    wandb:                                                       train/action_mutual_information_loss -3.69977205991745e-05
    wandb:                                                                   train/observations_count 7.0
    wandb:                                                     train/action_directions_mean_magnitude 0.0011720473412424326
    wandb:                                                            train/ground_truth_observations 6.0
    wandb:                                                                train/perceptual_loss_r0_l1 0.08212797343730927
    wandb:                                                               train/hidden_states_rec_loss 0.21878568828105927
    wandb:                                                               train/action_variations_mean -0.14023524522781372
    wandb:                                                                train/perceptual_loss_r2_l2 0.16710881888866425
    wandb:                                                                                 train/loss 1.0599816392899204
    wandb:                                                                                   train/lr 0.0004
    wandb:                                             train/loss_component_action_mutual_information -5.549658089876175e-06
    wandb:                                                                train/perceptual_loss_r1_l3 0.07853354513645172
    wandb:                                                                                   _runtime 396.27002787590027
    wandb:                                                                     train/states_magnitude 0.6560764312744141
    wandb:                                                          train/action_distribution_entropy 1.0813705921173096
    wandb:                                                            train/action_directions_kl_loss 0.037093132734298706
    wandb:                                                    train/action_state_distribution_kl_loss 8.807628546492197e-06
    wandb:                                                             train/observations_rec_loss_r2 0.06988160312175751
    wandb:                                                             train/observations_rec_loss_r0 0.02682528644800186
    wandb:                                                               train/loss_component_entropy 0.0
    wandb:                                                             train/observations_rec_loss_r1 0.01769733987748623
    wandb:                                                 train/action_directions_variance_magnitude 0.6632212996482849
    wandb:                                                                         train/entropy_loss 0.897321343421936
    wandb:                                                                train/perceptual_loss_r2_l3 0.13487893342971802
    wandb:                                                                                       step 981
    wandb:                                                             train/centroids_mean_magnitude 1.4390966498467606e-05
    wandb:                                                                                 _timestamp 1626460241.496398
    wandb:                                               train/action_directions_reconstruction_error 2.0398056221893057e-07
    wandb:                                                    train/average_action_variations_norm_l2 0.6760526895523071
    wandb:                                                     train/loss_component_hidden_states_rec 0.21878568828105927
    wandb:                                                                train/perceptual_loss_r1_l0 0.24382595717906952
    wandb: Syncing files in wandb/run-20210716_182405-w65gbanw:
    wandb:   code/train.py
    wandb: plus 8 W&B file(s) and 1 media file(s)
    wandb:
    wandb: Synced 02_breakout: https://app.wandb.ai/ryanburgert/video-generation/runs/w65gbanw
    (video-generation) G ➜ PlayableVideoGeneration git:(main)
    

    Are there any workarounds for this? I've tried finding references to tkinter, and there are none in the code that I can see. UPDATE: I found that tensor_displayer uses matplotlib.pyplot, which imports tkinter.

    I'm using the provided conda environment and docker container. I get this error consistently each time I try running it.

    opened by RyannDaGreat 1
  • Error when locating font


    Line 32 of PlayableVideoGeneration/utils/save_video_ffmpeg.py should be font = ImageFont.truetype("utils/fonts/Roboto-Regular.ttf", pointsize) and not font = ImageFont.truetype("fonts/Roboto-Regular.ttf", pointsize). This error makes python play.py --config configs/02_breakout.yaml fail.

    opened by RyannDaGreat 1
  • ∆-MSE and ∆-acc in data.yml


    I am a little confused by the evaluation results in the data.yml file that gets generated. Where do I find ∆-MSE and ∆-acc? For reference, they are displayed in Table 1 of the ablation studies.

    opened by karims 1
  • Discrepancy in del-MSE and del-ACC values


    Hi, interesting work! I trained the network on the Atari Breakout dataset and the Tennis dataset on a Tesla V100 using the same setup as yours. On evaluation, I found the FID, FVD and LPIPS values to be similar to the ones reported in the paper; however, del-MSE and del-ACC are way off. The data.yaml files can be found at tennis and Atari. Can you suggest why this might be happening?

    Thanks, Sonam

    opened by sonam-rgb 0