Scenic: A Jax Library for Computer Vision and Beyond

Google Research

Last update: Dec 27, 2022


Overview

Scenic

Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop classification, segmentation, and detection models for multiple modalities including images, video, audio, and multimodal combinations of them.

More precisely, Scenic is (i) a set of shared lightweight libraries solving tasks commonly encountered when training large-scale (i.e. multi-device, multi-host) vision models; and (ii) a number of projects containing fully fleshed-out, problem-specific training and evaluation loops built on these libraries.

Scenic is developed in JAX and uses Flax.

What we offer

Among other things, Scenic provides:

Boilerplate code for launching experiments, summary writing, logging, profiling, etc.;
Optimized training and evaluation loops, losses, metrics, bipartite matchers, etc.;
Input pipelines for popular vision datasets;
Baseline models, including strong non-attentional baselines.
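Bipartite matching assigns each ground-truth object to exactly one prediction so that the total matching cost is minimal, as in set-prediction detectors. The sketch below swaps in an exhaustive search over assignments. That search is exponential and only suitable for illustrating the objective on tiny inputs. It is not Scenic's matcher API, and the function name brute_force_match is hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def brute_force_match(
    cost: Sequence[Sequence[float]],
) -> Tuple[List[Tuple[int, int]], float]:
  """Minimum-cost matching of targets to distinct predictions.

  cost[i][j] is the cost of assigning prediction i to target j. Requires at
  least as many predictions as targets. Returns sorted (pred, target) pairs
  and the total cost. Exhaustive search: for illustration only.
  """
  n_pred = len(cost)
  n_tgt = len(cost[0]) if n_pred else 0
  if n_tgt > n_pred:
    raise ValueError("Need at least as many predictions as targets.")
  best: List[Tuple[int, int]] = []
  best_cost = float("inf")
  # Each permutation picks a distinct prediction for every target, in order.
  for preds in itertools.permutations(range(n_pred), n_tgt):
    total = sum(cost[p][t] for t, p in enumerate(preds))
    if total < best_cost:
      best_cost = total
      best = [(p, t) for t, p in enumerate(preds)]
  return sorted(best), best_cost
```

Accelerator-friendly matchers solve the same objective without enumerating every assignment. The brute force here only makes the objective explicit.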

Papers using Scenic

Scenic can be used to reproduce the results from the following papers, which were either developed using Scenic, or have been reimplemented in Scenic:

Philosophy

Scenic aims to facilitate rapid prototyping of large-scale vision models. To keep the code simple to understand and extend, we prefer forking and copy-pasting over adding complexity or increasing abstraction. Only when functionality proves widely useful across many models and tasks may it be upstreamed to Scenic's shared libraries.

Code structure

Shared libraries provided by Scenic are split into:

dataset_lib: Implements IO pipelines for loading and pre-processing data for common Computer Vision tasks and benchmarks. All pipelines are designed to be scalable and support multi-host and multi-device setups, taking care of dividing data among multiple hosts, incomplete batches, caching, pre-fetching, etc.
model_lib: Provides (i) several abstract model interfaces (e.g. ClassificationModel or SegmentationModel in model_lib.base_models) with task-specific losses and metrics; (ii) neural network layers in model_lib.layers, focusing on efficient implementations of attention and transformer layers; and (iii) accelerator-friendly implementations of bipartite matching algorithms in model_lib.matchers.
train_lib: Provides tools for constructing training loops and implements several example trainers (classification trainer and segmentation trainer).
common_lib: Utilities that do not belong anywhere else.
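The multi-host bookkeeping that dataset_lib takes care of can be illustrated in plain Python. Each host reads a disjoint, near-even slice of the examples, and a final incomplete batch is padded to a fixed size with a mask marking the real examples, so every device sees the same shapes. This is a hedged sketch: Scenic's pipelines are built on tf.data, and the helper names host_split and pad_batch are hypothetical, not taken from Scenic.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def host_split(num_examples: int, host_id: int, host_count: int) -> range:
  """Contiguous, near-even slice of example indices for one host."""
  if not 0 <= host_id < host_count:
    raise ValueError(f"host_id must be in [0, {host_count}), got {host_id}")
  base, extra = divmod(num_examples, host_count)
  # The first `extra` hosts each get one additional example.
  start = host_id * base + min(host_id, extra)
  size = base + (1 if host_id < extra else 0)
  return range(start, start + size)


def pad_batch(
    batch: Sequence[T], batch_size: int, pad_value: T
) -> Tuple[List[T], List[bool]]:
  """Pads an incomplete batch to batch_size; mask is True for real examples."""
  if len(batch) > batch_size:
    raise ValueError("Batch is larger than batch_size.")
  n_pad = batch_size - len(batch)
  padded = list(batch) + [pad_value] * n_pad
  mask = [True] * len(batch) + [False] * n_pad
  return padded, mask
```

Metrics are then computed only over entries whose mask is True, so padding does not bias evaluation.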

Projects

Models built on top of Scenic exist as separate projects. Model-specific code, such as configs, layers, losses, network architectures, or training and evaluation loops, lives in these projects rather than in the shared libraries.

Common baselines such as a ResNet or a Vision Transformer (ViT) are implemented in the projects/baselines project. Forking this directory is a good starting point for new projects.

There is no one-size-fits-all recipe for how much code projects should re-use. Projects can fall anywhere on a wide spectrum of code re-use: from defining new configs for an existing model to redefining models, training loops, logging, etc.

Getting started

See projects/baselines/README.md for a walk-through of the baseline models and instructions on how to run the code.
If you would like to contribute to Scenic, please check out the Philosophy, Code structure and Contributing sections. Should your contribution be part of the shared libraries, please send us a pull request!

Quick start

Download the code from GitHub

git clone https://github.com/google-research/scenic.git
cd scenic
pip install .

and run training for ViT on ImageNet:

python main.py -- \
  --config=projects/baselines/configs/imagenet/imagenet_vit_config.py \
  --workdir=./
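The --config flag points at a Python file that defines a get_config() function. Scenic's launcher presumably loads it through ml_collections' config_flags, which also supports command-line overrides of individual fields. As a simplified stand-in that ignores overrides, the stdlib sketch below imports such a file by path and calls its get_config(). load_config is a hypothetical helper, not part of Scenic.

```python
import importlib.util
from types import ModuleType
from typing import Any


def load_config(path: str) -> Any:
  """Imports the config file at `path` and returns its get_config() result."""
  spec = importlib.util.spec_from_file_location("experiment_config", path)
  if spec is None or spec.loader is None:
    raise ImportError(f"Cannot load a Python module from {path!r}")
  module: ModuleType = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)  # Runs the file's top-level code.
  if not hasattr(module, "get_config"):
    raise AttributeError(f"{path!r} does not define get_config()")
  return module.get_config()
```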

Disclaimer: This is not an official Google product.

Comments

For OWL-ViT, is there a demo that shows how to use image patches as queries for one-shot detection?

Hi, thanks for your great work. The zero-shot text-query demo is amazing. For OWL-ViT, is there a demo that shows how to use image patches as queries for one-shot detection? Thanks.

opened by Edwardmark 9
Does TokenLearner only support square inputs?

TokenLearner has two versions, v1.0 and v1.1. https://github.com/google-research/scenic/blob/98fdaae2be238e233ba213643c41227bb8f60fb3/scenic/projects/token_learner/model.py#L140-L141 The v1.1 code says it only supports square inputs. Does v1.0 also require square inputs? Why?

opened by leijue222 7
OWL-ViT: Tensorflow checkpoints as downloads

Hi @mjlm! Thanks for your awesome work on OWL-ViT!

IMHO, it would be more convenient for users, and also more sustainable, if you made the TensorFlow checkpoints available as downloads. That way, people would not have to convert them themselves using the export notebook.

Thanks for considering it.

opened by maxfrei750 6
error caused by config.init_from.model_config = None

Hi,

Would you mind sharing the config.init_from.model_config for the ViViT project? The config .py files in the repo set model_config to None, which causes an error when trying to initialize the model.

Thanks!

opened by mlpen 6
Support alternative data_augmentations in imagenet_dataset

We are interested in using scenic's dataset_lib for a self-supervised learning project. The current imagenet_dataset is limited to either "default" augmentation (random crop, reshape, flip) or "None" (center crop, reshape). Some self-supervised learning models rely on alternative augmentations like random_resized_crop.

I could open a PR with either (a) a solution that lets users pass arbitrary preprocessing functions, or (b) an additional branch in the current logic for SSL preprocessing. Is this something Scenic would support? Is there a preference between these solutions?

opened by ryanccarelli 5
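The two options proposed in this issue can be combined in a small dispatch layer. Named presets cover the existing "default" and "None" behaviour, and any callable is passed through unchanged. The sketch below is hypothetical: the registry, register, resolve_preprocess and the placeholder functions are illustrative names, not Scenic's imagenet_dataset API, and the "ops" tuples stand in for real image transforms.

```python
from typing import Callable, Dict, Union

Example = Dict[str, object]
PreprocessFn = Callable[[Example], Example]

# Hypothetical registry mapping augmentation names to preprocessing functions.
_PREPROCESSORS: Dict[str, PreprocessFn] = {}


def register(name: str) -> Callable[[PreprocessFn], PreprocessFn]:
  """Decorator that stores a preprocessing function under `name`."""
  def decorator(fn: PreprocessFn) -> PreprocessFn:
    _PREPROCESSORS[name] = fn
    return fn
  return decorator


@register("default")
def default_train_preprocess(example: Example) -> Example:
  # Placeholder for random crop -> resize -> flip.
  return {**example, "ops": ("random_crop", "resize", "flip")}


@register("none")
def eval_preprocess(example: Example) -> Example:
  # Placeholder for center crop -> resize.
  return {**example, "ops": ("center_crop", "resize")}


def resolve_preprocess(spec: Union[None, str, PreprocessFn]) -> PreprocessFn:
  """Option (a): accept a callable; otherwise look the name up (None -> 'none')."""
  if callable(spec):
    return spec
  name = "none" if spec is None else spec
  try:
    return _PREPROCESSORS[name]
  except KeyError:
    raise ValueError(
        f"Unknown augmentation {spec!r}; expected one of "
        f"{sorted(_PREPROCESSORS)} or a callable") from None
```

With this shape, option (b) amounts to registering one more named preset. Option (a) lets a project pass its own callable, for example one implementing random_resized_crop, without touching the shared library.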
ModuleNotFoundError: No module named 'official'

I'm trying to train an MBT model. However, I got the following error:

Traceback (most recent call last):
  File "/home/eftekhar/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/eftekhar/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/eftekhar/models/scenic/scenic/projects/mbt/main.py", line 47, in <module>
    app.run(main=main)
  File "/home/eftekhar/models/scenic/scenic/app.py", line 65, in run
    app.run(functools.partial(_run_main, main=main))
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/eftekhar/models/scenic/scenic/app.py", line 100, in _run_main
    main(rng=rng, config=FLAGS.config, workdir=FLAGS.workdir, writer=writer)
  File "/home/eftekhar/models/scenic/scenic/projects/mbt/main.py", line 34, in main
    dataset = train_utils.get_dataset(
  File "/home/eftekhar/models/scenic/scenic/train_lib_deprecated/train_utils.py", line 280, in get_dataset
    dataset_builder = datasets.get_dataset(dataset_name)
  File "/home/eftekhar/models/scenic/scenic/dataset_lib/datasets.py", line 144, in get_dataset
    return DatasetRegistry.get(dataset_name)
  File "/home/eftekhar/models/scenic/scenic/dataset_lib/datasets.py", line 107, in get
    importlib.import_module(module)
  File "/home/eftekhar/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1030, in _gcd_import
  File "", line 1007, in _find_and_load
  File "", line 986, in _find_and_load_unlocked
  File "", line 680, in _load_unlocked
  File "", line 850, in exec_module
  File "", line 228, in _call_with_frames_removed
  File "/home/eftekhar/models/scenic/scenic/projects/vivit/data/video_tfrecord_dataset.py", line 18, in <module>
    from scenic.dataset_lib import video_ops
  File "/home/eftekhar/models/scenic/scenic/dataset_lib/video_ops.py", line 35, in <module>
    from official.vision.image_classification import augment
ModuleNotFoundError: No module named 'official'

opened by parhameftekhar 4
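The traceback ends in dataset_lib/video_ops.py importing from the top-level package official. That package is the TensorFlow Model Garden, published on PyPI as tf-models-official. Whether a given tf-models-official release still provides official.vision.image_classification is not confirmed by this page. A small pre-flight check can report such missing modules before a long launch. The mapping and the helper name missing_dependencies are illustrative assumptions.

```python
import importlib.util
from typing import Dict, Mapping

# Import name -> PyPI distribution that provides it. 'official' is the
# TensorFlow Model Garden, published on PyPI as tf-models-official.
OPTIONAL_DEPS: Mapping[str, str] = {"official": "tf-models-official"}


def missing_dependencies(
    deps: Mapping[str, str] = OPTIONAL_DEPS,
) -> Dict[str, str]:
  """Returns {module: distribution} for each top-level module not found."""
  return {
      module: dist
      for module, dist in deps.items()
      if importlib.util.find_spec(module) is None
  }


if __name__ == "__main__":
  for module, dist in missing_dependencies().items():
    print(f"Missing module {module!r}; try: pip install {dist}")
```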

Unable to install scenic from requirements.txt in docker

Hi, I am trying to build a Docker image using

https://github.com/google-research/google-research/blob/eaa1a3f4c7e223f86c5266605c8aaf5b09df640b/dreamfields/Dockerfile#L1-L14

But the process halts at scenic with the following error.

Relevant content of requirements.txt:

git+git://github.com/google-research/scenic.git

Error log:

Step 5/8 : RUN pip install -r requirements.txt
 ---> Running in 6678fe83aa7b
Looking in links: https://storage.googleapis.com/jax-releases/jax_releases.html, https://download.pytorch.org/whl/torch_stable.html
Collecting git+https://github.com/openai/CLIP.git (from -r requirements.txt (line 5))
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-ldyy3145
  Running command git clone --filter=blob:none -q https://github.com/openai/CLIP.git /tmp/pip-req-build-ldyy3145
  Resolved https://github.com/openai/CLIP.git to commit e58d49454c92986a1d2a6a48add2333bbfbeaf51
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting git+git://github.com/google-research/scenic.git (from -r requirements.txt (line 15))
  Cloning git://github.com/google-research/scenic.git to /tmp/pip-req-build-50cvcxa5
  Running command git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5
  fatal: remote error:
    The unauthenticated git protocol on port 9418 is no longer supported.
  Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/google-research/scenic.git. Command errored out with exit status 128: git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5 Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q git://github.com/google-research/scenic.git /tmp/pip-req-build-50cvcxa5 Check the logs for full command output.
WARNING: You are using pip version 21.3.1; however, version 22.0.4 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
The command '/bin/sh -c pip install -r requirements.txt' returned a non-zero code: 1

opened by INF800 4
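The log itself points to the cause. GitHub no longer accepts the unauthenticated git:// protocol on port 9418. The CLIP requirement in the same file, written as git+https://, resolved fine. Changing the Scenic line to git+https://github.com/google-research/scenic.git should therefore avoid this particular error, although other dependency issues may remain. If several requirement files need the same change, a small hypothetical helper can rewrite them:

```python
_GIT_PREFIX = "git+git://github.com/"
_HTTPS_PREFIX = "git+https://github.com/"


def use_https(requirement: str) -> str:
  """Rewrites a git+git://github.com/ requirement to use git+https://."""
  if requirement.startswith(_GIT_PREFIX):
    return _HTTPS_PREFIX + requirement[len(_GIT_PREFIX):]
  return requirement


def rewrite_requirements(text: str) -> str:
  """Applies use_https to every line of a requirements.txt body."""
  return "\n".join(use_https(line) for line in text.splitlines())
```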

Working with Imagenet-21K

Hi,

Thanks again for this wonderful library! This is not really an issue, more of a feature request. What would be the best way to work with ImageNet-21K using Scenic? TFDS only supports the ImageNet-1K dataset.

opened by moabarar 4

ViT demo not running past one epoch

When running the base config, e.g.:

time python3 main.py --config=projects/baselines/configs/imagenet/imagenet_vit_config.py --workdir=./fp32-test

with the only modification being a batch size of 48 (I am using 2 NVIDIA GPUs to run things),

the code dies at the end of the first epoch:

I0929 15:28:35.031877 139903179183936 local.py:41] Setting work unit notes: 0.4% @9277, 0.3 steps/s, ETA: 153959 min (90 min : 0.0% checkpoint, 23.7% eval)
I0929 15:28:35.042148 139882655643392 logging_writer.py:34] [9277] steps_per_sec=0.259032
I0929 15:29:42.340490 139903179183936 local.py:41] Setting work unit notes: 0.4% @9281, 0.1 steps/s, ETA: 671072 min (91 min : 0.0% checkpoint, 23.4% eval)
I0929 15:29:42.351035 139882655643392 logging_writer.py:34] [9281] steps_per_sec=0.059428
I0929 15:31:58.488358 139903179183936 local.py:41] Setting work unit notes: 0.4% @9283, 0.0 steps/s, ETA: 2713548 min (93 min : 0.0% checkpoint, 22.8% eval)
I0929 15:31:58.627787 139882655643392 logging_writer.py:34] [9283] steps_per_sec=0.014697
Killed

real    93m53.945s
user    222m56.696s
sys     63m40.052s

Do you have any suggestions on things to check? Have you seen this before?

opened by brettkoonce 4

OWL-ViT: evaluation on LVIS gets stuck while preparing dataset

Hi! Thanks for your awesome work on OWL-ViT!

However, I ran into a problem using the project. I followed the instructions in README.md to prepare the environment and installed the third-party libraries. Then I ran the example command to evaluate, but the program got stuck on this line.

Here is the output.

2022-11-07 03:20:42.996048: W external/org_tensorflow/tensorflow/tsl/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-11-07 03:20:43.116724: W external/org_tensorflow/tensorflow/tsl/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-11-07 03:20:43.117854: W external/org_tensorflow/tensorflow/tsl/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-11-07 03:20:43.743084: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
WARNING:absl:GlobalAsyncCheckpointManager is not imported correctly. Checkpointing of GlobalDeviceArrays will not be available.To use the feature, install tensorstore.
2022-11-07 03:20:47.933287: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 10.2.89, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

2022-11-07 03:20:47.993549: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 10.2.89, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

2022-11-07 03:20:48.057838: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 10.2.89, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

2022-11-07 03:20:48.116860: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 10.2.89, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

2022-11-07 03:20:49.961280: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-11-07 03:20:49.961436: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-11-07 03:20:49.961511: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
I1107 03:20:49.963742 139994922760000 evaluator.py:660] JAX devices: 2
W1107 03:20:49.964679 139994922760000 compilation_cache.py:49] Initialized persistent compilation cache at /tmp/jax_compilation_cache
0%| | 0.00/1.29M [00:00<?, ?iB/s]
I1107 03:20:50.743195 139994922760000 tokenizer.py:36] Downloaded vocabulary from https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true to /root/.cache/scenic/clip
100%|█████████████████████████████████████| 1.29M/1.29M [00:00<00:00, 12.8MiB/s]
2022-11-07 03:20:50.887795: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
I1107 03:20:51.263590 139994922760000 checkpoints.py:719] Restoring checkpoint from gs://scenic-bucket/owl_vit/checkpoints/clip_vit_b32_b0203fc
I1107 03:20:55.555792 139994922760000 dataset_info.py:634] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: lvis/1.2.0
I1107 03:20:56.140138 139994922760000 dataset_info.py:540] Load dataset info from /tmp/tmpapikfn5ltfds
I1107 03:20:56.148214 139994922760000 dataset_info.py:610] Field info.release_notes from disk and from code do not match. Keeping the one from code.
I1107 03:20:56.148426 139994922760000 dataset_info.py:610] Field info.splits from disk and from code do not match. Keeping the one from code.
I1107 03:20:56.148806 139994922760000 dataset_builder.py:475] Generating dataset lvis (/openseg_blob/dataset/lvis/lvis/1.2.0)
Downloading and preparing dataset 25.35 GiB (download: 25.35 GiB, generated: 22.29 GiB, total: 47.64 GiB) to /openseg_blob/dataset/lvis/lvis/1.2.0...
Dl Completed...: 0 url [00:00, ? url/s]
I1107 05:20:08.568320 140550711347008 download_manager.py:347] Skipping download of https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_image_info_test_dev.json.zip: File cached in /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_imagJqn_72umBx8rzrvxAK5qyTBIIaIalLsDPKDknGyjL44.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.571549 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_imagJqn_72umBx8rzrvxAK5qyTBIIaIalLsDPKDknGyjL44.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_imagJqn_72umBx8rzrvxAK5qyTBIIaIalLsDPKDknGyjL44.zip.
I1107 05:20:08.587899 140550711347008 download_manager.py:347] Skipping download of http://images.cocodataset.org/zips/test2017.zip: File cached in /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_test2017x5CMPJ-Uui8zQOu-7FjCXba-h3TxjWjC8V0ONp2Vuro.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.598515 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_test2017x5CMPJ-Uui8zQOu-7FjCXba-h3TxjWjC8V0ONp2Vuro.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.images.cocodataset.org_zips_test2017x5CMPJ-Uui8zQOu-7FjCXba-h3TxjWjC8V0ONp2Vuro.zip.
I1107 05:20:08.614054 140550711347008 download_manager.py:347] Skipping download of https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip: File cached in /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_trai6EW-gXNtpOfBnEkNVjOSyLLHX_KqFGfgGU-e1pFZOOU.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.616075 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_trai6EW-gXNtpOfBnEkNVjOSyLLHX_KqFGfgGU-e1pFZOOU.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_trai6EW-gXNtpOfBnEkNVjOSyLLHX_KqFGfgGU-e1pFZOOU.zip.
I1107 05:20:08.624226 140550711347008 download_manager.py:347] Skipping download of http://images.cocodataset.org/zips/train2017.zip: File cached in /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_train2017aai7WOpfj5nSSHXyFBbeLp3tMXjpA_H3YD4oO54G2Sk.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.632148 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_train2017aai7WOpfj5nSSHXyFBbeLp3tMXjpA_H3YD4oO54G2Sk.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.images.cocodataset.org_zips_train2017aai7WOpfj5nSSHXyFBbeLp3tMXjpA_H3YD4oO54G2Sk.zip.
I1107 05:20:08.647421 140550711347008 download_manager.py:347] Skipping download of https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip: File cached in /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_val._sJS-sAaW2DLxybeR8oL4SW7_t5HUGAg0plfGy34d5E.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.651283 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_val._sJS-sAaW2DLxybeR8oL4SW7_t5HUGAg0plfGy34d5E.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.s3-us-west-2_dl.fbai.com_LVIS_lvis_v1_val._sJS-sAaW2DLxybeR8oL4SW7_t5HUGAg0plfGy34d5E.zip.
I1107 05:20:08.659612 140550711347008 download_manager.py:347] Skipping download of http://images.cocodataset.org/zips/val2017.zip: File cached in /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_val2017T34syyhm7FBBmTyc8qlSu-1pZHsRXQ902nzo9L74LwU.zip
Dl Completed...: 100%|███████████████████████████
I1107 05:20:08.667268 140550711347008 download_manager.py:489] Reusing extraction of /openseg_blob/dataset/lvis/downloads/images.cocodataset.org_zips_val2017T34syyhm7FBBmTyc8qlSu-1pZHsRXQ902nzo9L74LwU.zip at /openseg_blob/dataset/lvis/downloads/extracted/ZIP.images.cocodataset.org_zips_val2017T34syyhm7FBBmTyc8qlSu-1pZHsRXQ902nzo9L74LwU.zip.
Extraction completed...: 0 file [00:00, ? file/s]
Dl Size...: 100%|██████████████████████████████████████| 27214093950/27214093950 [00:00<00:00, 229072711184.82 MiB/s]
Dl Completed...: 100%|███████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 50.36 url/s]
I1107 05:20:45.144630 140550711347008 native_type_compatibility.py:250] Using Any for unsupported type: typing.Sequence[~T]
I1107 05:20:45.253493 140550711347008 bigquery.py:422] No module named google.cloud.bigquery_storage_v1. As a result, the ReadFromBigQuery transform CANNOT be used with method=DIRECT_READ.
Generating splits...: 0%| | 0/3 [00:00<?, ? splits/s]
W1107 05:29:28.448266 140550711347008 split_builder.py:227] **************************** WARNING **************************

Warning: The dataset you're trying to generate is using Apache Beam,
yet no beam_runner nor beam_options was explicitly provided.

Some Beam datasets take weeks to generate, so are usually not suited
for single machine generation. Please have a look at the instructions
to setup distributed generation:

https://www.tensorflow.org/datasets/beam_datasets#generating_a_beam_dataset

I1107 05:29:28.450022 140550711347008 pipeline.py:188] Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
W1107 05:29:48.097574 140550711347008 pipeline_options.py:339] Discarding unparseable args: ['--alsologtostderr=true', '--platform=gpu', '--config=scenic/projects/owl_vit/configs/clip_b32.py', '--checkpoint_path=gs://scenic-bucket/owl_vit/checkpoints/clip_vit_b32_b0203fc', '--annotations_path=/openseg_blob/dataset/lvis/lvis_v1_val.json', '--tfds_data_dir=/openseg_blob/dataset/lvis', '--output_dir=outputs']
I1107 05:29:48.098392 140550711347008 environments.py:376] Default Python SDK image for environment is apache/beam_python3.7_sdk:2.42.0
I1107 05:36:36.703704 140550711347008 translations.py:714] ==================== <function annotate_downstream_side_inputs at 0x7fcb9ecc55f0> ====================
I1107 05:36:37.070872 140550711347008 translations.py:714] ==================== <function fix_side_input_pcoll_coders at 0x7fcb9ecc5710> ====================
I1107 05:36:37.433204 140550711347008 translations.py:714] ==================== <function pack_combiners at 0x7fcb9ecc5c20> ====================
I1107 05:36:37.437028 140550711347008 translations.py:714] ==================== <function lift_combiners at 0x7fcb9ecc5cb0> ====================
I1107 05:36:37.440332 140550711347008 translations.py:714] ==================== <function expand_sdf at 0x7fcb9ecc5e60> ====================
I1107 05:36:37.805572 140550711347008 translations.py:714] ==================== <function expand_gbk at 0x7fcb9ecc5ef0> ====================
I1107 05:36:37.809968 140550711347008 translations.py:714] ==================== <function sink_flattens at 0x7fcb9ecc6050> ====================
I1107 05:36:37.811944 140550711347008 translations.py:714] ==================== <function greedily_fuse at 0x7fcb9ecc60e0> ====================
I1107 05:36:38.192294 140550711347008 translations.py:714] ==================== <function read_to_impulse at 0x7fcb9ecc6170> ====================
I1107 05:36:38.192996 140550711347008 translations.py:714] ==================== <function impulse_to_input at 0x7fcb9ecc6200> ====================
I1107 05:36:38.193770 140550711347008 translations.py:714] ==================== <function sort_stages at 0x7fcb9ecc6440> ====================
I1107 05:36:38.196089 140550711347008 translations.py:714] ==================== <function add_impulse_to_dangling_transforms at 0x7fcb9ecc6560> ====================
I1107 05:36:38.196668 140550711347008 translations.py:714] ==================== <function setup_timer_mapping at 0x7fcb9ecc63b0> ====================
I1107 05:36:38.563957 140550711347008 translations.py:714] ==================== <function populate_data_channel_coders at 0x7fcb9ecc64d0> ====================
I1107 05:36:44.928154 140550711347008 statecache.py:172] Creating state cache with size 100
I1107 05:36:44.930204 140550711347008 worker_handlers.py:908] Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x7fcb93f23d50> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'')

I am a beginner with TensorFlow. Is there any way to solve this?

By the way, I actually have the LVIS dataset on my disk, but I do not know how to organize it and use it in this program. Could you describe how the LVIS dataset should be organized?

Looking forward to your comments.

opened by kxqt 3
ERROR: Could not find a version that satisfies the requirement tf_nightly==2.9.0.dev20220401 (from scenic) (from versions: none)

When I tried to install scenic by running pip install ., I got this error:

ERROR: Could not find a version that satisfies the requirement tf-nightly (from versions: none)
ERROR: No matching distribution found for tf-nightly

When I tried to run !pip install tf-nightly, here is the result:

!pip install tf-nightly
pip install . install tf-nightly
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://pypi.ngc.nvidia.com
Processing /misc/export3/tansihan/scenic
  Preparing metadata (setup.py) ... done
Collecting install
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4d/c8/8cbca135f9e167810756ea2bc34b028501936675fcbd7dadccf752fa4622/install-1.3.5-py3-none-any.whl (3.2 kB)
ERROR: Could not find a version that satisfies the requirement tf-nightly (from versions: none)
ERROR: No matching distribution found for tf-nightly

My Python version is 3.9.0. I wonder why this happens and how to solve it.

opened by Claire874 3
Evaluation process killed itself

I ran the evaluation command to evaluate the public B/32 checkpoint on LVIS, but the process just gets killed, both on Colab Pro and on my PC. Any help is appreciated.

I1215 11:27:14.304406 139930883561344 pipeline.py:185] Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
W1215 11:27:33.767053 139930883561344 pipeline_options.py:351] Discarding unparseable args: ['--alsologtostderr=true', '--platform=gpu', '--config=scenic/projects/owl_vit/configs/clip_b32.py', '--checkpoint_path=gs://scenic-bucket/owl_vit/checkpoints/clip_vit_b32_b0203fc', '--annotations_path=/content/scenic/lvis_v1_val.json', '--tfds_data_dir=/content/scenic/val2017', '--output_dir=/content/scenic/evaluator']
I1215 11:27:33.767558 139930883561344 environments.py:376] Default Python SDK image for environment is apache/beam_python3.8_sdk:2.43.0
tcmalloc: large alloc 1434370048 bytes == 0x7f3d47a56000 @ 0x7f4432586615 0x5d6f4c 0x51edd1 0x51ef5b 0x5aac95 0x4990ca 0x55cd91 0x5d8941 0x4997a2 0x4fd8b5 0x4990ca 0x5d8868 0x4997a2 0x5d8868 0x4990ca 0x4fd8b5 0x4990ca 0x5d8868 0x4997a2 0x5d8868 0x4990ca 0x4fd8b5 0x4990ca 0x5d8868 0x4997a2 0x5d8868 0x4997c7 0x5d8868 0x4990ca 0x5d8868 0x4990ca
I1215 11:33:47.870071 139930883561344 translations.py:714] ==================== <function annotate_downstream_side_inputs at 0x7f3df3f1b700> ====================
I1215 11:33:48.162448 139930883561344 translations.py:714] ==================== <function fix_side_input_pcoll_coders at 0x7f3df3f1b820> ====================
I1215 11:33:48.453671 139930883561344 translations.py:714] ==================== <function pack_combiners at 0x7f3df3f1bd30> ====================
I1215 11:33:48.457019 139930883561344 translations.py:714] ==================== <function lift_combiners at 0x7f3df3f1bdc0> ====================
I1215 11:33:48.460954 139930883561344 translations.py:714] ==================== <function expand_sdf at 0x7f3df3f1bf70> ====================
I1215 11:33:48.752287 139930883561344 translations.py:714] ==================== <function expand_gbk at 0x7f3df3f1d040> ====================
I1215 11:33:48.755760 139930883561344 translations.py:714] ==================== <function sink_flattens at 0x7f3df3f1d160> ====================
I1215 11:33:48.757105 139930883561344 translations.py:714] ==================== <function greedily_fuse at 0x7f3df3f1d1f0> ====================
I1215 11:33:49.058572 139930883561344 translations.py:714] ==================== <function read_to_impulse at 0x7f3df3f1d280> ====================
I1215 11:33:49.059184 139930883561344 translations.py:714] ==================== <function impulse_to_input at 0x7f3df3f1d310> ====================
I1215 11:33:49.059814 139930883561344 translations.py:714] ==================== <function sort_stages at 0x7f3df3f1d550> ====================
I1215 11:33:49.061884 139930883561344 translations.py:714] ==================== <function add_impulse_to_dangling_transforms at 0x7f3df3f1d670> ====================
I1215 11:33:49.062419 139930883561344 translations.py:714] ==================== <function setup_timer_mapping at 0x7f3df3f1d4c0> ====================
I1215 11:33:49.352953 139930883561344 translations.py:714] ==================== <function populate_data_channel_coders at 0x7f3df3f1d5e0> ====================
I1215 11:33:54.947741 139930883561344 statecache.py:234] Creating state cache with size 104857600
I1215 11:33:54.948321 139930883561344 worker_handlers.py:903] Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x7f3dc4807a60> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'')
Killed

opened by kent252 0
Use `lax.conv_general_dilated_patches` to extract image patches in time linear to the number of channels (vs quadratic).


Note that the complexity of extract_patches is quadratic in the number of channels C, while lax.conv_general_dilated_patches is linear because it uses a depthwise convolution. AFAIK extract_image_patches already has the same linear complexity, but it could also be shortened by calling lax.conv_general_dilated_patches.

opened by copybara-service[bot]
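The PR above proposes extracting image patches with lax.conv_general_dilated_patches. The sketch below shows that call for non-overlapping, ViT-style patches on an NHWC batch and checks it against a plain reshape/transpose reference. Both function names are hypothetical helpers written for this illustration, not Scenic's extract_patches. The assumption that each patch's features come out channel-major (flattened as [C, p, p]) is our reading of the JAX documentation; the reference function is written to match that layout, and the comparison at the end is what verifies it.

```python
import jax.numpy as jnp
from jax import lax


def extract_patches_conv(images, patch_size):
    """Hypothetical helper: non-overlapping patches via conv_general_dilated_patches.

    images: [N, H, W, C] -> [N, H // p, W // p, C * p * p].
    Per the PR description, this is a depthwise (grouped) convolution with an
    identity filter, so its cost grows linearly with C.
    """
    p = patch_size
    return lax.conv_general_dilated_patches(
        images,
        filter_shape=(p, p),
        window_strides=(p, p),
        padding="VALID",
        dimension_numbers=("NHWC", "OIHW", "NHWC"),
        precision=lax.Precision.HIGHEST,  # keep the identity copy exact
    )


def extract_patches_reshape(images, patch_size):
    """Hypothetical reference using only reshape/transpose.

    Flattens each patch channel-major ([C, p, p]) to match the feature
    layout assumed for conv_general_dilated_patches.
    """
    p = patch_size
    n, h, w, c = images.shape
    x = images.reshape(n, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 5, 2, 4)  # [N, H/p, W/p, C, p, p]
    return x.reshape(n, h // p, w // p, c * p * p)


images = jnp.arange(2 * 8 * 8 * 3, dtype=jnp.float32).reshape(2, 8, 8, 3)
fast = extract_patches_conv(images, patch_size=4)
ref = extract_patches_reshape(images, patch_size=4)
print(fast.shape)  # (2, 2, 2, 48)
print(bool(jnp.allclose(fast, ref)))  # True
```

The reshape version is cheap for non-overlapping patches. The convolution form also covers strides smaller than the patch size (overlapping patches) and dilation, which a plain reshape cannot express.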
AttributeError: 'list' object has no attribute 'items'

I'm trying to train the MBT model on my own dataset. I get the following error. Any help is appreciated.

Traceback (most recent call last):
  File "/home/eftekhar/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/eftekhar/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/eftekhar/models/scenic/scenic/projects/mbt/main.py", line 49, in <module>
    app.run(main=main)
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/scenic/app.py", line 65, in run
    app.run(functools.partial(_run_main, main=main))
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/scenic/app.py", line 100, in _run_main
    main(rng=rng, config=FLAGS.config, workdir=FLAGS.workdir, writer=writer)
  File "/home/eftekhar/models/scenic/scenic/projects/mbt/main.py", line 39, in main
    trainer.train(
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/scenic/projects/mbt/trainer.py", line 425, in train
    gflops) = mbt_train_utils.initialize_model(
  File "/home/eftekhar/anaconda3/lib/python3.9/site-packages/scenic/projects/mbt/train_utils.py", line 83, in initialize_model
    for modality_name, spec in input_spec.items():
AttributeError: 'list' object has no attribute 'items'

Here is the config file:

r"""Multimodal sound classification on the balanced (mini) AudioSet.

"""

import ml_collections

AUDIOSET_TRAIN_SIZE = 20361

def get_config():
  """Returns the base experiment configuration."""
  config = ml_collections.ConfigDict()
  config.experiment_name = 'mbt_balanced_audioset_classification'

  config.dataset_configs = ml_collections.ConfigDict()
  config.dataset_configs.base_dir = '/home/eth/tfrecords_builder/tmp/generated_dataset'
  config.dataset_configs.tables = {
      'train': 'train',
      'validation': 'valid',
      'test': 'test',
  }
  config.dataset_configs.examples_per_subset = {
      'train': 4189,
      'validation': 898,
      'test': 898
  }
  config.dataset_configs.num_classes = 20
  config.data_dtype_str = 'float32'
  config.dataset_name = 'video_tfrecord_dataset'
  config.dataset_configs.modalities = ('spectrogram', 'rgb')
  config.dataset_configs.return_as_dict = False
  config.dataset_configs.num_frames = 32
  config.dataset_configs.stride = 2
  config.dataset_configs.num_spec_frames = 8
  config.dataset_configs.spec_stride = 1

  config.dataset_configs.spec_mean = 1.102
  config.dataset_configs.spec_stddev = 2.762

  config.dataset_configs.min_resize = 256
  config.dataset_configs.crop_size = 224
  config.dataset_configs.spec_shape = (100, 128)

  config.dataset_configs.one_hot_labels = True
  config.dataset_configs.zero_centering = True

  config.dataset_configs.do_multicrop_test = True
  config.dataset_configs.log_test_epochs = 4
  config.dataset_configs.num_test_clips = 4
  config.dataset_configs.test_batch_size = 8  # Needs to be num_local_devices
  config.multicrop_clips_per_device = 2

  config.dataset_configs.augmentation_params = ml_collections.ConfigDict()
  config.dataset_configs.augmentation_params.do_jitter_scale = True
  config.dataset_configs.augmentation_params.scale_min_factor = 0.9
  config.dataset_configs.augmentation_params.scale_max_factor = 1.33
  config.dataset_configs.augmentation_params.prob_scale_jitter = 1.0
  config.dataset_configs.augmentation_params.do_color_augment = True
  config.dataset_configs.augmentation_params.prob_color_augment = 0.8
  config.dataset_configs.augmentation_params.prob_color_drop = 0.1

  config.dataset_configs.prefetch_to_device = 2

  config.dataset_configs.spec_augment = True
  config.dataset_configs.spec_augment_params = ml_collections.ConfigDict()
  config.dataset_configs.spec_augment_params.freq_mask_max_bins = 48
  config.dataset_configs.spec_augment_params.freq_mask_count = 1
  config.dataset_configs.spec_augment_params.time_mask_max_frames = 48
  config.dataset_configs.spec_augment_params.time_mask_count = 4
  config.dataset_configs.spec_augment_params.time_warp_max_frames = 1.0
  config.dataset_configs.spec_augment_params.time_warp_max_ratio = 0
  config.dataset_configs.spec_augment_params.time_mask_max_ratio = 0

  config.model_name = 'mbt_multilabel_classification'
  config.model = ml_collections.ConfigDict()
  config.model.modality_fusion = ('spectrogram', 'rgb')
  config.model.use_bottleneck = True
  config.model.test_with_bottlenecks = True
  config.model.share_encoder = False
  config.model.n_bottlenecks = 4
  config.model.fusion_layer = 8
  config.model.hidden_size = 768
  config.model.patches = ml_collections.ConfigDict()
  config.model.attention_config = ml_collections.ConfigDict()
  config.model.attention_config.type = 'spacetime'
  config.model.num_heads = 12
  config.model.mlp_dim = 3072
  config.model.num_layers = 12
  config.model.representation_size = None
  config.model.classifier = 'gap'
  config.model.attention_dropout_rate = 0.
  config.model.dropout_rate = 0.
  config.model_dtype_str = 'float32'

  config.model.temporal_encoding_config = ml_collections.ConfigDict()
  config.model.temporal_encoding_config.method = '3d_conv'
  config.model.patches.size = [16, 16, 2]
  config.model.temporal_encoding_config.kernel_init_method = 'central_frame_initializer'
  config.model.temporal_encoding_config.n_sampled_frames = 4  # Unused here.

  config.trainer_name = 'mbt_trainer'
  config.optimizer = 'momentum'
  config.optimizer_configs = ml_collections.ConfigDict()
  config.l2_decay_factor = 0
  config.max_grad_norm = 1
  config.label_smoothing = 0.3
  config.num_training_epochs = 50
  config.batch_size = 64
  config.rng_seed = 0
  config.mixup = ml_collections.ConfigDict()
  config.mixup.alpha = 0.5
  config.mixmod = False
  config.model.stochastic_droplayer_rate = 0.3

  config.init_from = ml_collections.ConfigDict()
  config.init_from.model_config = None
  # pylint: disable=line-too-long
  config.init_from.checkpoint_path = '/home/eth/models/scenic/scenic/projects/mbt/vit_base'
  config.init_from.checkpoint_format = 'scenic'
  config.init_from.model_config = ml_collections.ConfigDict()
  config.init_from.model_config.model = ml_collections.ConfigDict()
  config.init_from.model_config.model.classifier = 'token'  # Specify if this is 'token' or 'gap'.  pylint: disable=line-too-long
  config.init_from.restore_positional_embedding = True
  config.init_from.restore_input_embedding = True
  config.init_from.positional_embed_size_change = 'resize_tile'

  steps_per_epoch = AUDIOSET_TRAIN_SIZE // config.batch_size
  total_steps = config.num_training_epochs * steps_per_epoch
  config.lr_configs = ml_collections.ConfigDict()
  config.lr_configs.learning_rate_schedule = 'compound'
  config.lr_configs.factors = 'constant * cosine_decay * linear_warmup'
  config.lr_configs.warmup_steps = 2.5 * steps_per_epoch
  config.lr_configs.steps_per_cycle = total_steps
  config.lr_configs.base_learning_rate = 5e-1

  config.write_summary = True
  config.checkpoint = True  # Do checkpointing.
  config.debug_train = False  # Debug mode during training.
  config.debug_eval = False  # Debug mode during eval.
  config.checkpoint_steps = 500  # Checkpoint more frequently than a val epoch.
  return config

opened by parhameftekhar
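In the traceback above, initialize_model fails on `for modality_name, spec in input_spec.items()`, so it expects input_spec to be a mapping keyed by modality name and instead receives a list. The posted config sets config.dataset_configs.return_as_dict = False. That setting may be why a list arrives, but this is an unverified guess from the traceback, not a confirmed diagnosis. The plain-Python sketch below reproduces only the failure mode. The spec values are placeholders, and modality_names and check are hypothetical helpers, not Scenic code.

```python
# Placeholder stand-ins for per-modality input specs (not Scenic's real specs).
spec_as_dict = {"spectrogram": "spectrogram_spec", "rgb": "rgb_spec"}
spec_as_list = list(spec_as_dict.values())  # same specs, modality names lost


def modality_names(input_spec):
    """Mimics the loop at train_utils.py line 83 in the traceback."""
    names = []
    for modality_name, _spec in input_spec.items():  # requires a mapping
        names.append(modality_name)
    return names


def check(input_spec):
    """Returns the modality names, or the AttributeError message."""
    try:
        return modality_names(input_spec)
    except AttributeError as err:
        return str(err)


print(check(spec_as_dict))  # ['spectrogram', 'rgb']
print(check(spec_as_list))  # 'list' object has no attribute 'items'
```

A list drops the modality names that the per-modality loop needs, so whatever produces input_spec has to keep the name-to-spec pairing.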
unable to install MTV (Scenic)

When I install it, it reports "unable to initialize cuda", caused by jax and jaxlib. If I use recent versions of them, it says "module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'"; apparently the attribute 'GpuAllocatorConfig' has been removed in these new versions. If I downgrade both to 0.3.5, these problems disappear, but then I get "cannot import name 'GlobalAsyncCheckpointManager' from 'jax.experimental.gda_serialization.serialization'". I have also tried other versions, but I have not found one that works. It also seems that the version of Flax has to match the versions of jax and jaxlib. Does anyone have an idea about this?

For version 0.3.25 of jax and jaxlib, the errors are:

I1206 10:23:18.787282 46942088984704 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1206 10:23:18.787436 46942088984704 xla_bridge.py:353] Unable to initialize backend 'cuda': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I1206 10:23:18.787487 46942088984704 xla_bridge.py:353] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
I1206 10:23:18.788055 46942088984704 xla_bridge.py:353] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
I1206 10:23:18.788148 46942088984704 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
W1206 10:23:18.788196 46942088984704 xla_bridge.py:360] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

For version 0.3.5 of jax and jaxlib, the error is:

ImportError: cannot import name 'GlobalAsyncCheckpointManager' from 'jax.experimental.gda_serialization.serialization' (/data/gpfs/projects/punim0512/haihang_envs/Scenic/lib/python3.9/site-packages/jax/experimental/gda_serialization/serialization.py)

opened by HaihangWu
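The issue above depends on which jax, jaxlib, and flax releases end up installed together. The sketch below shows a small, generic way to report the installed versions, for example when filing or triaging such an issue. It uses only the standard library's importlib.metadata. It does not know which combinations Scenic supports and does not check compatibility itself. installed_versions is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("jax", "jaxlib", "flax")):
    """Returns {package: installed version string, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Reading package metadata instead of importing the packages means the report still works when importing jax itself fails, as it does with the ImportError above.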

GAN JAX - A toy project to generate images from GANs with JAX

Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

GluonMM is a library of transformer models for computer vision and multi-modality research

Open Source Differentiable Computer Vision Library for PyTorch

An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

CVNets: A library for training computer vision networks

A simple, high level, easy-to-use open source Computer Vision library for Python.

[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

A machine learning library for spiking neural networks. Supports training with both torch and jax pipelines, and deployment to neuromorphic hardware.

Datasets, Transforms and Models specific to Computer Vision

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

This project demonstrates the use of neural networks and computer vision to create a classifier that interprets the Brazilian Sign Language.

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

ML for NLP and Computer Vision.

Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.

Monk is a low code Deep Learning tool and a unified wrapper for Computer Vision.

Spiking Neural Network for Computer Vision using SpikingJelly framework and Pytorch-Lightning

Computer Vision Script to recognize first person motion, developed as final project for the course "Machine Learning and Deep Learning"

Computer vision - fun segmentation experience using classic and deep tools :)