Oscar and VinVL

Overview

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

VinVL: Revisiting Visual Representations in Vision-Language Models

Updates

04/13/2021: Our Scene Graph Benchmark repo has been released. You are welcome to use the code there to extract image features with VinVL pretrained models.
03/08/2021: The Oscar+ pretraining code has been released; please check the last section in VinVL_MODEL_ZOO.md. All image features and model checkpoints in VinVL have also been released; please check VinVL for details.
01/13/2021: Our new work VinVL proposed Oscar+, an improved version of Oscar, and provides a better object-attribute detection model for extracting features for V+L tasks. VinVL achieves SoTA performance on all seven V+L tasks listed here. Please stay tuned for the model and code release.
05/28/2020: Released finetuned models on downstream tasks; please check MODEL_ZOO.md.
05/15/2020: Released pretrained models, datasets, and code for finetuning on downstream tasks.

Introduction

This repository contains the source code necessary to reproduce the results presented in the paper Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. We propose a new cross-modal pre-training method, Oscar (Object-Semantics Aligned pre-training), which leverages object tags detected in images as anchor points to significantly ease the learning of image-text alignments. We pre-train Oscar on a public corpus of 6.5 million text-image pairs and fine-tune it on downstream tasks, setting new state-of-the-art results on six well-established vision-language understanding and generation tasks. For more on this project, see the Microsoft Research Blog post.
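
For intuition, here is a minimal sketch of Oscar's Word-Tag-Image input triple. This is illustrative only: the actual preprocessing lives in the oscar/run_*.py scripts, and the standard Hugging Face BertTokenizer below stands in for the repo's vendored tokenizer.

    # Minimal sketch of Oscar's Word-Tag-Image input construction.
    import torch
    from transformers import BertTokenizer  # stand-in for the vendored tokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    caption = "a dog sits on a couch"
    object_tags = ["dog", "couch"]          # tags emitted by the object detector
    num_regions = 10
    # 2054-d per region: 2048-d RoI feature + 6-d box encoding (an assumption
    # based on the img_feature_dim=2054 default in the run scripts).
    region_feats = torch.randn(num_regions, 2054)

    # Text side: [CLS] caption [SEP] object tags [SEP]. The tags are words that
    # the detector and the caption share, acting as anchor points for alignment.
    tokens = (["[CLS]"] + tokenizer.tokenize(caption) + ["[SEP]"]
              + tokenizer.tokenize(" ".join(object_tags)) + ["[SEP]"])
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

    # Image side: region_feats is linearly projected to the hidden size and
    # concatenated after the text embeddings before the BERT encoder.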

Performance

| Model   | t2i R@1 | t2i R@5 | i2t R@1 | i2t R@5 | IC B@4 | IC M | IC C  | IC S | NoCaps C | NoCaps S | VQA test-std | NLVR2 test-P | GQA test-std |
|---------|---------|---------|---------|---------|--------|------|-------|------|----------|----------|--------------|--------------|--------------|
| SoTA_S  | 39.2    | 68.0    | 56.6    | 84.5    | 38.9   | 29.2 | 129.8 | 22.4 | 61.5     | 9.2      | 70.92        | 58.80        | 63.17        |
| SoTA_B  | 54.0    | 80.8    | 70.0    | 91.1    | 40.5   | 29.7 | 137.6 | 22.8 | 86.58    | 12.38    | 73.67        | 79.30        | -            |
| SoTA_L  | 57.5    | 82.8    | 73.5    | 92.2    | 41.7   | 30.6 | 140.0 | 24.5 | -        | -        | 74.93        | 81.47        | -            |
| Oscar_B | 54.0    | 80.8    | 70.0    | 91.1    | 40.5   | 29.7 | 137.6 | 22.8 | 78.8     | 11.7     | 73.44        | 78.36        | 61.62        |
| Oscar_L | 57.5    | 82.8    | 73.5    | 92.2    | 41.7   | 30.6 | 140.0 | 24.5 | 80.9     | 11.3     | 73.82        | 80.05        | -            |
| VinVL_B | 58.1    | 83.2    | 74.6    | 92.6    | 40.9   | 30.9 | 140.6 | 25.1 | 92.46    | 13.07    | 76.12        | 83.08        | 64.65        |
| VinVL_L | 58.8    | 83.5    | 75.4    | 92.9    | 41.0   | 31.1 | 140.9 | 25.2 | -        | -        | 76.62        | 83.98        | -            |
| gain    | 1.3     | 0.7     | 1.9     | 0.6     | -0.7   | 0.5  | 0.9   | 0.7  | 5.9      | 0.7      | 1.69         | 2.51         | 1.48         |

t2i: text-to-image retrieval; i2t: image-to-text retrieval; IC: image captioning on COCO (B@4: BLEU-4, M: METEOR, C: CIDEr, S: SPICE). SoTA_S, SoTA_B, and SoTA_L denote the previous state of the art achieved with small, base-size, and large-size models, respectively; the gain row is the improvement of the best VinVL model over the best previous SoTA on each metric.

Download

We released pre-trained models, datasets, VinVL image features, and Oscar+ pretraining corpus for downstream tasks. Please check VinVL_DOWNLOAD.md for details.

To download checkpoints for the vanilla Oscar model, please check DOWNLOAD.md for details.

Installation

Check INSTALL.md for installation instructions.

Model Zoo

Check MODEL_ZOO.md for scripts to run Oscar downstream finetuning.

Check VinVL_MODEL_ZOO.md for scripts to run Oscar+ pretraining and downstream finetuning.

Citations

Please consider citing the following papers if you use the code:

@inproceedings{li2020oscar,
  title={Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks},
  author={Li, Xiujun and Yin, Xi and Li, Chunyuan and Hu, Xiaowei and Zhang, Pengchuan and Zhang, Lei and Wang, Lijuan and Hu, Houdong and Dong, Li and Wei, Furu and Choi, Yejin and Gao, Jianfeng},
  booktitle={ECCV},
  year={2020}
}

@inproceedings{zhang2021vinvl,
  title={VinVL: Revisiting Visual Representations in Vision-Language Models},
  author={Zhang, Pengchuan and Li, Xiujun and Hu, Xiaowei and Yang, Jianwei and Zhang, Lei and Wang, Lijuan and Choi, Yejin and Gao, Jianfeng},
  booktitle={CVPR},
  year={2021}
}

License

Oscar is released under the MIT license. See LICENSE for details.

Comments
  • ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

    Hi, thanks for your work.

    I'm trying to finetune for the image captioning task. When I run

    python oscar/run_captioning.py \
        --model_name_or_path pretrained_models/base-vg-labels/ep_67_588997 \
        --do_train \
        --do_lower_case \
        --evaluate_during_training \
        --add_od_labels \
        --learning_rate 0.00003 \
        --per_gpu_train_batch_size 64 \
        --num_train_epochs 30 \
        --save_steps 5000 \
        --output_dir output/
    

    I encounter this error:

    2021-02-04 06:41:10.151589: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
    2021-02-04 06:41:10.151621: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    Traceback (most recent call last):
      File "test.py", line 5, in <module>
        from transformers.pytorch_transformers.modeling_utils import PreTrainedModel
    ModuleNotFoundError: No module named 'transformers.pytorch_transformers'
    

    I clone this repo with cmd

    git clone https://github.com/microsoft/Oscar.git
    git submodule init
    git submodule update
    

    How can I fix this issue?
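
    A likely cause (an assumption based on the import path in the traceback, not an official diagnosis): a pip-installed transformers package is shadowing the repo's vendored transformers submodule, which is the one that provides transformers.pytorch_transformers. A quick way to check which package Python actually resolves, assuming you run from the Oscar repo root with the submodule checked out:

        # Hypothetical diagnostic: see which `transformers` Python resolves.
        import os, sys
        sys.path.insert(0, os.getcwd())  # assume we're at the Oscar repo root

        import transformers
        print(transformers.__file__)  # should point inside the Oscar checkout,
                                      # not into site-packages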

    opened by NguyenVanThanhHust 12
  • Faster RCNN model version and Object Tag Sequences

    opened by xiaoleihuang 10
  • Installation Failure

    Failing to clone the repo and its submodules; please help.

    $ git clone --recursive git@github.com:microsoft/Oscar.git
    Cloning into 'Oscar'...
    git@github.com: Permission denied (publickey).
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.
    

    Additionally, when cloning over HTTPS ("https://github.com/microsoft/Oscar.git"), the submodules fail to install, giving the same error:

    git@github.com: Permission denied (publickey).
    fatal: Could not read from remote repository.
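
    For what it's worth, a standard git workaround (a general git technique, not something from this repo's docs) is to force HTTPS in place of SSH before updating the submodules: git config --global url."https://github.com/".insteadOf "git@github.com:". After that, running git submodule update --init --recursive should fetch the submodules over HTTPS.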
    
    opened by tjdevWorks 7
  • Coco caption pertained model output results are not good

    Hello,

    Thank you for your great work!

    I used your pretrained model for COCO image captioning. Here is the command I used:
    
    python oscar/run_captioning.py \
        --do_test \
        --do_eval \
        --test_yaml test.yaml \
        --per_gpu_eval_batch_size 64 \
        --num_beams 5 \
        --max_gen_length 20 \
        --eval_model_dir image_caption/Oscarrepo/Oscar/checkpoint-29-132780/
    

    where checkpoint-29-132780 is the uncompressed pretrained COCO model folder. But the outputs are not good. Here are some examples:

    caption claire libraries libraries libraries libraries libraries robbery libraries libraries libraries libraries libraries libraries libraries librariesletsletslets
    caption demanded adoptedrredrred libraries libraries libraries libraries librariessteadsteadsteadsteadsteadstead libraries libraries libraries
    caption typing curvature curvature libraries curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature

    Am I missing some important steps? Thank you for your help! Also, where is test.yaml? Thanks.

    opened by joey-wang123 7
  • Why are the number of labels and the number of image feature regions unequal in the CaptionTensorizer Coco-caption

    https://github.com/microsoft/Oscar/blob/a9013bb7dda35a63856d1cebd16eeeeb73615e5c/oscar/run_captioning.py#L195 Hello, could you please explain why the number of labels (text_b) is not equal to the number of image feature regions? It seems a little odd from my point of view.

    opened by ZuoJiaxing 7
  • How to create train_caption.json on Flickr8k dataset? [Image Captioning task]

    Hello everyone! I want to run Oscar on Flickr8k. I've already created all the other files (feature.lineidx, label.lineidx, feature.tsv, label.tsv, ...), but I don't know how to create train_caption.json from the Flickr8k captioning annotations: the COCO train_caption.json uses the attributes image_id, id, and caption, while the Flickr8k annotations use image_name and caption. Does anyone know how to do it? Please help me! Thanks a lot!
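
    One plausible way to build it (a hypothetical sketch: the input file name and its record layout are assumptions inferred from the question above) is to assign integer ids and emit COCO-style records; note that the image_id values must line up with the keys used in your feature.tsv/label.tsv:

        # Hypothetical Flickr8k -> COCO-style train_caption.json converter.
        # Assumes a JSON list of {"image_name": ..., "caption": ...} records.
        import json

        with open("flickr8k_captions.json") as fp:   # assumed input file name
            flickr = json.load(fp)

        image_ids = {}   # image_name -> integer image_id
        records = []
        for ann_id, item in enumerate(flickr):
            img_id = image_ids.setdefault(item["image_name"], len(image_ids))
            records.append({"image_id": img_id, "id": ann_id,
                            "caption": item["caption"]})

        with open("train_caption.json", "w") as fp:
            json.dump(records, fp)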

    opened by hasontung1999 4
  • Extracted feature for VQA test-dev set

    Thank you for making this excellent work public! I hope to reproduce your result on the VQA task, but a problem occurred with the dataset. I downloaded the VQA dataset following this instruction: https://github.com/microsoft/Oscar/blob/master/DOWNLOAD.md#datasets, but I didn't find the Faster R-CNN image features for test-dev. I'm not sure whether something went wrong during my download or whether this part just wasn't provided. If it isn't possible to share the Faster R-CNN features for test-dev, could you please provide some code and basic information on how to extract the features myself, so I can reproduce the work correctly? For example:

    • Which version of Faster R-CNN were the features extracted with?
    • What is the correct structure for saving these features (i.e., for each image, how are all the RoI features and locations organized and bound to the image id or question id)? See the sketch after this list.

    Thank you very much for your kind help!
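
    On the second point, here is a rough sketch of the per-region layout the Oscar code appears to consume. This is an inference from the img_feature_dim=2054 default in the run scripts (a 2048-d RoI feature plus a 6-d box encoding); treat the exact geometry encoding as an assumption:

        # Sketch of packing one region's feature into the 2054-d layout that
        # img_feature_dim=2054 suggests: 2048-d RoI feature + 6-d box geometry.
        import numpy as np

        def encode_region(roi_feat, box, img_w, img_h):
            """roi_feat: (2048,) detector output; box: (x1, y1, x2, y2) in pixels."""
            x1, y1, x2, y2 = box
            geom = np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
                             (x2 - x1) / img_w, (y2 - y1) / img_h])
            return np.concatenate([roi_feat, geom])   # shape (2054,)

        # All regions of one image are stacked into a (num_boxes, 2054) array
        # and stored per image id (base64-encoded in the released .tsv files).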

    opened by weiyx16 4
  • Pre-training for image captioning

    Hello, and congrats on your brilliant work! I'd like to ask: for image captioning, you mention in the appendix:

    we directly fine-tune Oscar for image captioning on COCO without additional pre-training on Conceptual Captions

    Does that mean you only use the COCO dataset for pretraining, and not the rest (SBU, Flickr, GQA)? And is the CIDEr score of 1.4 achieved after fine-tuning the COCO-only pretrained model?

    opened by fawazsammani 4
  • TypeError: cannot serialize '_io.TextIOWrapper' object

    Hi, when executing the run_captioning command I get this error:

    ForkingPickler(file, protocol).dump(obj)
    TypeError: cannot serialize '_io.TextIOWrapper' object

    I also report the complete log here:

    python oscar/run_captioning.py --do_test --do_eval --test_yaml vinvl_demo_images_features/inference_test/test.yaml --per_gpu_eval_batch_size 64 --num_beams 5 --max_gen_length 20 --eval_model_dir vinvl_demo_images_features/coco_captioning_large_scst/checkpoint-4-50000

    2021-10-03 18:34:06,058 vlpretrain WARNING: Device: cuda, n_gpu: 1
    2021-10-03 18:34:06,063 vlpretrain WARNING: Override max_seq_length to 50 = max_gen_length:20 + od_labels_len:30
    2021-10-03 18:34:06,064 vlpretrain WARNING: Override do_lower_case with train args: False -> True
    2021-10-03 18:34:06,070 vlpretrain WARNING: Override add_od_labels with train args: False -> True
    2021-10-03 18:34:06,101 vlpretrain INFO: Evaluate the following checkpoint: vinvl_demo_images_features/coco_captioning_large_scst/checkpoint-4-50000
    2021-10-03 18:34:17,930 vlpretrain INFO: Training/evaluation parameters Namespace(adam_epsilon=1e-08, add_od_labels=True, cider_cached_tokens='coco-train-words.p', config_name='', data_dir='datasets/coco_caption', device='cpu', distributed=False, do_eval=True, do_lower_case=True, do_test=True, do_train=False, drop_out=0.1, drop_worst_after=0, drop_worst_ratio=0, eval_model_dir='vinvl_demo_images_features/coco_captioning_large_scst/checkpoint-4-50000', evaluate_during_training=False, freeze_embedding=False, gradient_accumulation_steps=1, img_feature_dim=2054, img_feature_type='frcnn', label_smoothing=0, learning_rate=3e-05, length_penalty=1, local_rank=0, logging_steps=20, loss_type='sfmx', mask_prob=0.15, max_gen_length=20, max_grad_norm=1.0, max_img_seq_length=50, max_masked_tokens=3, max_seq_a_length=40, max_seq_length=50, max_steps=-1, min_constraints_to_satisfy=2, model_name_or_path=None, no_cuda=True, num_beams=5, num_gpus=1, num_keep_best=1, num_labels=2, num_return_sequences=1, num_train_epochs=40, num_workers=4, output_dir='output/', output_hidden_states=False, output_mode='classification', per_gpu_eval_batch_size=64, per_gpu_train_batch_size=64, repetition_penalty=1, save_steps=-1, sc_baseline_type='greedy', sc_beam_size=1, sc_train_sample_n=5, scheduler='linear', scst=False, seed=88, temperature=1, test_yaml='vinvl_demo_images_features/inference_test/test.yaml', tie_weights=False, tokenizer_name='', top_k=0, top_p=1, train_yaml='train.yaml', use_cbs=False, val_yaml='val.yaml', warmup_steps=0, weight_decay=0.05)
    2021-10-03 18:34:17,933 vlpretrain INFO: Evaluate on dataset: vinvl_demo_images_features/inference_test/test.yaml
    c:\users\gabriele.ferrario\onedrive\desktop\tesi\vinvl\oscar\oscar\oscar\utils\misc.py:34: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      return yaml.load(fp)
    predict_file: vinvl_demo_images_features/coco_captioning_large_scst/checkpoint-4-50000\pred.coco_caption.test.beam5.max20.odlabels.tsv
    values: <generator object test.<locals>.gen_rows at 0x000001F47BC3DD48>
    test_dataloader: <torch.utils.data.dataloader.DataLoader object at 0x000001F47B421448>
    C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\site-packages\torch\cuda\__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:100.)
      return torch._C._cuda_getDeviceCount() > 0
    Traceback (most recent call last):
      File "oscar/run_captioning.py", line 1018, in <module>
        main()
      File "oscar/run_captioning.py", line 1014, in main
        checkpoint)
      File "oscar/run_captioning.py", line 621, in evaluate
        test(args, val_dataloader, model, tokenizer, predict_file)
      File "oscar/run_captioning.py", line 715, in test
        tsv_writer(gen_rows(), cache_file)
      File "c:\users\gabriele.ferrario\onedrive\desktop\tesi\vinvl\oscar\oscar\oscar\utils\tsv_file_ops.py", line 18, in tsv_writer
        for value in values:
      File "oscar/run_captioning.py", line 681, in gen_rows
        for step, (img_keys, batch) in tqdm(enumerate(test_dataloader)):
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
        return self._get_iterator()
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
        return _MultiProcessingDataLoaderIter(self)
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
        w.start()
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\gabriele.ferrario\.conda\envs\sg_benchmark\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: cannot serialize '_io.TextIOWrapper' object

    Does anyone have any suggestions? Thank you!
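
    A plausible workaround (my reading of the traceback, not a confirmed fix): on Windows, multiprocessing uses spawn, which pickles the dataset object for each dataloader worker, and the captioning dataset holds an open file handle (the _io.TextIOWrapper) that cannot be pickled. Rerunning with --num_workers 0 (the flag is visible in the Namespace dump above) keeps data loading in the main process and sidesteps the pickling entirely; alternatively, the dataset could open its TSV files lazily inside __getitem__ instead of __init__.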

    opened by GabrieleFerrario 3
  • Cannot find eval caption index file when testing image/text retrieval ..

    First of all, many thanks for sharing the VinVL model. However, when I used the pre-extracted Flickr30k features to finetune Oscar+ base for image/text retrieval evaluation, I found I was missing --eval_caption_index_file minival_caption_indexs_top20.pt. Would you mind sharing the download link? Where can I get minival_caption_indexs_top20.pt? Thanks!

    opened by byougert 3
  • Cannot find image_label for pre-training

    I followed https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#pre-exacted-image-features to prepare image features, and followed https://github.com/microsoft/Oscar/blob/master/VinVL_MODEL_ZOO.md#oscarplus-pretraining for pre-training. But I cannot find the image labels for the pre-training datasets, e.g. COCO, Flickr30k, GQA.

    As shown in https://biglmdiag.blob.core.windows.net/vinvl/pretrain_corpus/coco_flickr30k_googlecc_gqa_sbu_oi_x152c4big2exp168.yaml, we need to prepare image_label_path:

        corpus: coco_flickr30k_gqa_googlecc_sbu_oi
        corpus_file: coco_flickr30k_googlecc_gqa_sbu_oi.tsv
        image_label_path:
          coco: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/coco
          flickr30k: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/flickr30k
          gqa: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/gqa
          googlecc: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/googlecc
          sbu: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/sbu
          oi: X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/oi
        image_feature_path:
          coco: vinvl/image_features/coco_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000
          flickr30k: vinvl/image_features/flickr30k_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000
          gqa: vinvl/image_features/gqa_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000
          googlecc: vinvl/image_features/googlecc_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000
          sbu: vinvl/image_features/sbu_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000
          oi: vinvl/image_features/oi_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000

    To be specific, there is no guidance on how to download or generate the predictions_gt.tsv and QA_fileB.tsv files, which are needed for pre-training in https://github.com/microsoft/Oscar/blob/master/oscar/datasets/oscar_tsv.py#L383-L385.

    opened by yikaiw 3
  • Vocabulary of the test split

    Hi! Thanks for the paper and the available code.

    I have what may be a stupid question, but I didn't find a straight answer to it anywhere:

    When evaluating the model with the Karpathy test split, some words might not be present in the vocabulary from the train split. What do you do? Simply remove these words from the captions of the test split?
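
    For context (my reading of the released code, not an authoritative answer): the captioning model decodes over the BERT WordPiece vocabulary, so a test-split word that never appeared in the training captions can still be represented as a sequence of subword pieces; the metrics are then computed on the detokenized strings against the raw reference captions, with no words removed.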

    opened by gondimjoaom 0
  • The specified resource does not exist.

    When I run wget https://biglmdiag.blob.core.windows.net/oscar/pretrained_models/large-vg-labels.zip, it returns "The specified resource does not exist."

    opened by victorup 4
  • Can you share the full NoCaps results on the test data?

    The Oscar paper reports CIDEr scores of 78.8 and 80.9 for Oscar base and large, respectively. Since it isn't clarified, I assume these are scores on the NoCaps test split, for NoCaps-entire. Can you share the scores for the in-, near-, and out-of-domain subsplits? And can you confirm whether the 78.8/80.9 scores are on the test data? Thanks!

    opened by YovaKem 1
  • VinVL features for datasets not available

    Hi there,

    Thanks a lot for your code release. I noticed that the VinVL features are no longer available: https://github.com/microsoft/Oscar/blob/master/VinVL_DOWNLOAD.md#pre-exacted-image-features

    Could you please advise?

    opened by aleSuglia 1