# BraVe
This is a JAX implementation of [Broaden Your Views for Self-Supervised Video Learning](https://arxiv.org/abs/2103.16559), or BraVe for short.
The model provided in this package was implemented based on the internal model that was used to compute results for the accompanying paper. It achieves comparable results on the evaluation tasks when evaluated side-by-side. Not all details are guaranteed to be identical though, and some results may differ from those given in the paper. In particular, this implementation does not provide the option to train with optical flow.
We provide a selection of pretrained checkpoints in the table below, which can be evaluated directly against HMDB 51 with the evaluation tools in this package. These are exactly the checkpoints that were used to produce the numbers in the accompanying paper; they were not trained with the exact trainer given in this package. For details on training a model with this package, please see the end of this readme.
In the table below, the different configurations are denoted using, e.g., V/A for a narrow view containing video and a broad view containing audio, or V/F for a narrow view containing video and a broad view containing optical flow.
The backbone in each case is TSMResnet, with a given width multiplier (please see the accompanying paper for further details). For all of the numbers given below, the SVM regularization constant used is 0.0001 (a sketch of this evaluation step is given after the table). For HMDB 51, the average over the three splits is given in brackets, followed by the top-1 percentages for each of the splits.
| Views | Architecture | HMDB51 | UCF-101 | K600 | Trained with this package | Checkpoint |
|---|---|---|---|---|---|---|
| V/AF | TSM (1X) | (69.2%) 71.307%, 68.497%, 67.843% | 92.9% | 69.2% | ✗ | download |
| V/AF | TSM (2X) | (69.9%) 72.157%, 68.432%, 69.02% | 93.2% | 70.2% | ✗ | download |
| V/A | TSM (1X) | (69.4%) 70.131%, 68.889%, 69.085% | 93.0% | 70.6% | ✗ | download |
| V/VVV | TSM (1X) | (65.4%) 66.797%, 63.856%, 65.425% | 92.6% | 70.8% | ✗ | download |
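The packaged evaluation tool described below performs this step for you; as a rough sketch, and assuming the regularization constant corresponds to the `C` parameter of a linear SVM as in scikit-learn's `LinearSVC`, the evaluation stage looks along these lines:

```python
# Minimal sketch of the linear-SVM evaluation stage over frozen video
# embeddings. This is an illustration, not the packaged implementation:
# use brave.evaluate_video_embeddings for the real numbers. Mapping the
# regularization constant onto LinearSVC's C parameter is an assumption.
import numpy as np
from sklearn.svm import LinearSVC

def top1_accuracy(train_embeddings, train_labels,
                  test_embeddings, test_labels,
                  svm_regularization=0.0001):
    classifier = LinearSVC(C=svm_regularization)
    classifier.fit(train_embeddings, train_labels)
    predictions = classifier.predict(test_embeddings)
    return float(np.mean(predictions == test_labels))
```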
## Reproducing results from the paper
This package provides everything needed to evaluate the above checkpoints against HMDB 51. It supports Python 3.7 and above.
To get started, we recommend using a clean virtualenv. You may then install the brave package directly from GitHub using,
```bash
pip install git+https://github.com/deepmind/brave.git
```
A pre-processed version of the HMDB 51 dataset can be downloaded using the following command. It requires that both `ffmpeg` and `unrar` are available. The following will download the dataset to `/tmp/hmdb51/`, but any other location would also work.
```bash
python -m brave.download_hmdb --output_dir /tmp/hmdb51/
```
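If the download succeeds, the output directory should contain one subdirectory per HMDB 51 split, each with train and test shards. The layout below is inferred from the shard patterns used by the evaluation command that follows, so treat the exact file names as an assumption:

```
/tmp/hmdb51/
├── split_1/
│   ├── train/   # matched by '/tmp/hmdb51/split_1/train/*'
│   └── test/    # matched by '/tmp/hmdb51/split_1/test/*'
├── split_2/
└── split_3/
```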
To evaluate a checkpoint downloaded from the above table, the following command may be used. The dataset shard arguments should be set to match the paths used above.
```bash
python -m brave.evaluate_video_embeddings \
  --checkpoint_path <path/to/downloaded/checkpoint>.npy \
  --train_dataset_shards '/tmp/hmdb51/split_1/train/*' \
  --test_dataset_shards '/tmp/hmdb51/split_1/test/*' \
  --svm_regularization 0.0001 \
  --batch_size 8
```
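The checkpoints are distributed as `.npy` files. If you want to inspect one before running the evaluation, a pickled `.npy` can be opened as follows (that the file holds a pickled Python container is an assumption about the format):

```python
# Sketch for inspecting a downloaded checkpoint. A .npy file written from a
# Python object loads as a 0-d object array; .item() unwraps the container.
# That the checkpoint is stored this way is an assumption.
import numpy as np

ckpt = np.load('path/to/downloaded/checkpoint.npy', allow_pickle=True)
params = ckpt.item() if ckpt.ndim == 0 else ckpt
print(type(params))
```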
Note that any of the three splits can be evaluated by changing the dataset split paths. To run this efficiently using a GPU, it is also necessary to install the correct version of `jaxlib`. To install `jaxlib` with support for CUDA 10.1 on Linux, the following install should be sufficient, though other precompiled packages may be found through the JAX documentation.
```bash
pip install https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.69+cuda101-cp39-none-manylinux2010_x86_64.whl
```
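Once `jaxlib` is installed, a quick way to confirm that JAX can see the GPU is:

```python
import jax

# Lists the available devices; expect at least one GPU entry when jaxlib
# was installed with CUDA support.
print(jax.devices())
```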
Depending on the GPU memory available, the `batch_size` parameter may be tuned to obtain better performance, or to reduce the required GPU memory.
## Training a network
This package may also be used to train a model from scratch using jaxline. In order to try this, first ensure the configuration is set appropriately by modifying `brave/config.py`. At minimum, it would also be necessary to choose an appropriate global batch size (by default, the setting of 512 is likely too large for any single-machine training setup). In addition, a value must be set for `dataset_shards`. This should contain the paths of the tfrecord files containing the serialized training data.
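As a purely illustrative sketch, the edits might look as follows; apart from `dataset_shards`, which is mentioned above, the field names and values are assumptions, so check `brave/config.py` itself for the exact layout:

```python
# Hypothetical excerpt of edits to brave/config.py. Field names other than
# dataset_shards are assumptions; the shard paths are placeholders.
global_batch_size = 32  # the default of 512 is likely too large for one machine

dataset_shards = [
    '/data/brave/train-00000-of-00016.tfrecord',
    '/data/brave/train-00001-of-00016.tfrecord',
    # ... one entry per tfrecord shard of serialized training data
]
```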
For details on checkpointing and distributing computation, see the jaxline documentation.
Similarly to the above, it is necessary to install the correct `jaxlib` package to enable training on a GPU.
The training may now be launched using,
```bash
python -m brave.experiment --config=brave/config.py
```
## Training datasets
This model is able to read data stored in the format specified by DMVR. For an example of writing training data in the correct format, see the code in `dataset/fixtures.py`, which writes the test fixtures used by this package's tests.
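As a rough illustration of what such a writer involves, the sketch below serializes JPEG-encoded frames into a `tf.train.SequenceExample` and writes it to a tfrecord shard. The feature keys are assumptions based on DMVR's defaults, so defer to `dataset/fixtures.py` for the keys this package actually uses:

```python
# Minimal sketch of writing one training example in a DMVR-style format.
# The keys 'image/encoded' and 'clip/label/index' are assumptions based on
# DMVR's defaults -- check dataset/fixtures.py for the exact keys.
import tensorflow as tf

def make_sequence_example(jpeg_frames, label_index):
    """Builds a tf.train.SequenceExample from JPEG-encoded frames."""
    context = tf.train.Features(feature={
        'clip/label/index': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label_index])),
    })
    frames = tf.train.FeatureList(feature=[
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[frame]))
        for frame in jpeg_frames
    ])
    return tf.train.SequenceExample(
        context=context,
        feature_lists=tf.train.FeatureLists(
            feature_list={'image/encoded': frames}))

# Example: write a shard of examples to a tfrecord file.
with tf.io.TFRecordWriter('/tmp/train-00000-of-00001.tfrecord') as writer:
    example = make_sequence_example(jpeg_frames=[b'...'], label_index=0)
    writer.write(example.SerializeToString())
```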
## Running the tests
After checking out this code locally, you may run the package tests using
```bash
pip install -e .
pytest brave
```
We recommend doing this from a clean virtual environment.
## Citing this work
If you use this code (or any derived code), data or these models in your work, please cite the relevant accompanying paper.
```bibtex
@misc{recasens2021broaden,
  title={Broaden Your Views for Self-Supervised Video Learning},
  author={Adrià Recasens and Pauline Luc and Jean-Baptiste Alayrac and Luyu Wang and Ross Hemsley and Florian Strub and Corentin Tallec and Mateusz Malinowski and Viorica Patraucean and Florent Altché and Michal Valko and Jean-Bastien Grill and Aäron van den Oord and Andrew Zisserman},
  year={2021},
  eprint={2103.16559},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Disclaimer
This is not an official Google product.