Overview

X-modaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics. This codebase unifies comprehensive high-quality modules in state-of-the-art vision-language techniques, which are organized in a standardized and user-friendly fashion.

The original paper can be found here.

Installation

See installation instructions.

Requirements

  • Linux or macOS with Python ≥ 3.6
  • PyTorch ≥ 1.8 and torchvision that matches the PyTorch installation. Install them together at pytorch.org to ensure compatibility
  • fvcore
  • pytorch_transformers
  • jsonlines
  • pycocotools
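
The dependencies above can be installed with pip. A minimal setup sketch, assuming a suitable PyTorch build selected at pytorch.org and the package names as published on PyPI:

# Install PyTorch and torchvision together (pick the exact command for your CUDA version at pytorch.org)
pip install torch torchvision

# Remaining dependencies
pip install fvcore pytorch_transformers jsonlines pycocotools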

Getting Started

See Getting Started with X-modaler.

Training & Evaluation in Command Line

We provide a script, "train_net.py", that can train all the configs provided in X-modaler. You may want to use it as a reference when writing your own training script.

To train a model (e.g., UpDown) with "train_net.py", first set up the corresponding datasets as described in the datasets documentation, then run:

# Teacher Forcing
python train_net.py --num-gpus 4 \
 	--config-file configs/image_caption/updown.yaml

# Reinforcement Learning
python train_net.py --num-gpus 4 \
 	--config-file configs/image_caption/updown_rl.yaml
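
To evaluate a trained model, the same script can usually be reused. A hedged sketch, assuming train_net.py exposes the detectron2-style --eval-only flag and MODEL.WEIGHTS override (verify both against your checkout):

# Evaluation (flags assumed from the detectron2-style interface)
python train_net.py --num-gpus 4 \
 	--config-file configs/image_caption/updown.yaml \
 	--eval-only MODEL.WEIGHTS /path/to/checkpoint_file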

Model Zoo and Baselines

A large set of baseline results and trained models are available here.

Image Captioning

  • Attention: "Show, attend and tell: Neural image caption generation with visual attention" (ICML 2015)
  • LSTM-A3: "Boosting image captioning with attributes" (ICCV 2017)
  • Up-Down: "Bottom-up and top-down attention for image captioning and visual question answering" (CVPR 2018)
  • GCN-LSTM: "Exploring visual relationship for image captioning" (ECCV 2018)
  • Transformer: "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning" (ACL 2018)
  • Meshed-Memory: "Meshed-Memory Transformer for Image Captioning" (CVPR 2020)
  • X-LAN: "X-Linear Attention Networks for Image Captioning" (CVPR 2020)

Video Captioning

  • MP-LSTM: "Translating Videos to Natural Language Using Deep Recurrent Neural Networks" (NAACL HLT 2015)
  • TA: "Describing Videos by Exploiting Temporal Structure" (ICCV 2015)
  • Transformer: "Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning" (ACL 2018)
  • TDConvED: "Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning" (AAAI 2019)

Vision-Language Pretraining

  • Uniter: "UNITER: UNiversal Image-TExt Representation Learning" (ECCV 2020)
  • TDEN: "Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network" (AAAI 2021)

Image Captioning on MSCOCO (Cross-Entropy Loss)

| Name | Model | BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr-D | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSTM-A3 | GoogleDrive | 75.3 | 59.0 | 45.4 | 35.0 | 26.7 | 55.6 | 107.7 | 19.7 |
| Attention | GoogleDrive | 76.4 | 60.6 | 46.9 | 36.1 | 27.6 | 56.6 | 113.0 | 20.4 |
| Up-Down | GoogleDrive | 76.3 | 60.3 | 46.6 | 36.0 | 27.6 | 56.6 | 113.1 | 20.7 |
| GCN-LSTM | GoogleDrive | 76.8 | 61.1 | 47.6 | 36.9 | 28.2 | 57.2 | 116.3 | 21.2 |
| Transformer | GoogleDrive | 76.4 | 60.3 | 46.5 | 35.8 | 28.2 | 56.7 | 116.6 | 21.3 |
| Meshed-Memory | GoogleDrive | 76.3 | 60.2 | 46.4 | 35.6 | 28.1 | 56.5 | 116.0 | 21.2 |
| X-LAN | GoogleDrive | 77.5 | 61.9 | 48.3 | 37.5 | 28.6 | 57.6 | 120.7 | 21.9 |
| TDEN | GoogleDrive | 75.5 | 59.4 | 45.7 | 34.9 | 28.7 | 56.7 | 116.3 | 22.0 |

Image Captioning on MSCOCO (CIDEr Score Optimization)

| Name | Model | BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr-D | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSTM-A3 | GoogleDrive | 77.9 | 61.5 | 46.7 | 35.0 | 27.1 | 56.3 | 117.0 | 20.5 |
| Attention | GoogleDrive | 79.4 | 63.5 | 48.9 | 37.1 | 27.9 | 57.6 | 123.1 | 21.3 |
| Up-Down | GoogleDrive | 80.1 | 64.3 | 49.7 | 37.7 | 28.0 | 58.0 | 124.7 | 21.5 |
| GCN-LSTM | GoogleDrive | 80.2 | 64.7 | 50.3 | 38.5 | 28.5 | 58.4 | 127.2 | 22.1 |
| Transformer | GoogleDrive | 80.5 | 65.4 | 51.1 | 39.2 | 29.1 | 58.7 | 130.0 | 23.0 |
| Meshed-Memory | GoogleDrive | 80.7 | 65.5 | 51.4 | 39.6 | 29.2 | 58.9 | 131.1 | 22.9 |
| X-LAN | GoogleDrive | 80.4 | 65.2 | 51.0 | 39.2 | 29.4 | 59.0 | 131.0 | 23.2 |
| TDEN | GoogleDrive | 81.3 | 66.3 | 52.0 | 40.1 | 29.6 | 59.8 | 132.6 | 23.4 |

Video Captioning on MSVD

| Name | Model | BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr-D | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MP-LSTM | GoogleDrive | 77.0 | 65.6 | 56.9 | 48.1 | 32.4 | 68.1 | 73.1 | 4.8 |
| TA | GoogleDrive | 80.4 | 68.9 | 60.1 | 51.0 | 33.5 | 70.0 | 77.2 | 4.9 |
| Transformer | GoogleDrive | 79.0 | 67.6 | 58.5 | 49.4 | 33.3 | 68.7 | 80.3 | 4.9 |
| TDConvED | GoogleDrive | 81.6 | 70.4 | 61.3 | 51.7 | 34.1 | 70.4 | 77.8 | 5.0 |

Video Captioning on MSR-VTT

| Name | Model | BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr-D | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MP-LSTM | GoogleDrive | 73.6 | 60.8 | 49.0 | 38.6 | 26.0 | 58.3 | 41.1 | 5.6 |
| TA | GoogleDrive | 74.3 | 61.8 | 50.3 | 39.9 | 26.4 | 59.4 | 42.9 | 5.8 |
| Transformer | GoogleDrive | 75.4 | 62.3 | 50.0 | 39.2 | 26.5 | 58.7 | 44.0 | 5.9 |
| TDConvED | GoogleDrive | 76.4 | 62.3 | 49.9 | 38.9 | 26.3 | 59.0 | 40.7 | 5.7 |

Visual Question Answering

| Name | Model | Overall | Yes/No | Number | Other |
| --- | --- | --- | --- | --- | --- |
| Uniter | GoogleDrive | 70.1 | 86.8 | 53.7 | 59.6 |
| TDEN | GoogleDrive | 71.9 | 88.3 | 54.3 | 62.0 |

Caption-Based Image Retrieval on Flickr30k

| Name | Model | R@1 | R@5 | R@10 |
| --- | --- | --- | --- | --- |
| Uniter | GoogleDrive | 61.6 | 87.7 | 92.8 |
| TDEN | GoogleDrive | 62.0 | 86.6 | 92.4 |

Visual Commonsense Reasoning

| Name | Model | Q→A | QA→R | Q→AR |
| --- | --- | --- | --- | --- |
| Uniter | GoogleDrive | 73.0 | 75.3 | 55.4 |
| TDEN | GoogleDrive | 75.0 | 76.5 | 57.7 |

License

X-modaler is released under the Apache License, Version 2.0.

Citing X-modaler

If you use X-modaler in your research, please use the following BibTeX entry.

@inproceedings{Xmodaler2021,
  author =       {Yehao Li and Yingwei Pan and Jingwen Chen and Ting Yao and Tao Mei},
  title =        {X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics},
  booktitle =    {Proceedings of the 29th ACM International Conference on Multimedia},
  year =         {2021}
}
Comments
  • Some doubts regarding executing the scripts

I was following the instructions on this page regarding how to use the built-in MSCOCO dataset for image captioning. I have doubts on the following points:

    1. In which directory should I place my images and annotations?
    2. I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format; however, I am running out of space on Colab Pro. What could be the reason for this, and is there any fix available?
    opened by architlatkar27 13
  • OSError: [Errno 12] Cannot allocate memory

When I run the up-down config, training proceeds fine, but when I run the xlan config, errors occur during training. I changed num_workers=0, but the error still exists. Thanks for your reply.

    Error:

    eta: 3:10:12 iter: 800 total_loss: 2.673 time: 1.4086 data_time: 0.4650 lr: 2.5031e-05 max_mem: 2680M
    Traceback (most recent call last):
      File "train_net.py", line 71, in
        args=(args,),
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/launch.py", line 83, in launch
        daemon=False,
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
        while not context.join():
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/launch.py", line 129, in _distributed_worker
        main_func(*args)
      File "/data1/wlx/project2021/xmodaler-master/train_net.py", line 59, in main
        return trainer.train()
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 365, in train
        super().train(self.start_iter, self.max_iter)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/train_loop.py", line 152, in train
        self.after_step()
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/train_loop.py", line 182, in after_step
        h.after_step()
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/hooks.py", line 407, in after_step
        self._do_eval(epoch)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/hooks.py", line 372, in _do_eval
        results = self._func(epoch)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 324, in test_and_save_results
        eval_results = self.test(self.cfg, self.model, self.test_data_loader, self.test_evaluator, epoch)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/engine/defaults.py", line 478, in test
        eval_res = evaluator.eval(results, epoch)
      File "/data1/wlx/project2021/xmodaler-master/xmodaler/evaluation/coco_evaler.py", line 44, in eval
        cocoEval.evaluate()
      File "/data1/wlx/project2021/xmodaler-master/cococaption/pycocoevalcap/eval.py", line 38, in evaluate
        gts = tokenizer.tokenize(gts)
      File "/data1/wlx/project2021/xmodaler-master/cococaption/pycocoevalcap/tokenizer/ptbtokenizer.py", line 55, in tokenize
        stdout=subprocess.PIPE)
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/subprocess.py", line 800, in init
        restore_signals, start_new_session)
      File "/data1/wlx/anaconda3/envs/xmodaler/lib/python3.7/subprocess.py", line 1482, in _execute_child
        restore_signals, start_new_session, preexec_fn)
    OSError: [Errno 12] Cannot allocate memory

    opened by WangLanxiao 7
  • Some question about relation_file, attribute_file and gv_feat_file about COCO

Thank you for your contribution. I read the COCO data code carefully, and I am a little confused about ext_data: how can I get these data? Looking forward to your reply.

    ext_data = {
        "relation": _load_pkl_file(self.relation_file),
        "attribute": _load_pkl_file(self.attribute_file),
        "gv_feat": _load_pkl_file(self.gv_feat_file)
    }

    opened by WangLanxiao 6
  • Evaluation

Hi, I am using the pretrained LSTM-A3 model. After loading the model successfully and obtaining the dataloader, I am trying to evaluate the model on MSCOCO. I have followed the steps as mentioned in the documentation, but there seems to be some problem with it. Here is the code:

    from xmodaler.evaluation.coco_evaler import COCOEvaler

    evaluator = COCOEvaler(cfg, annfile="/content/captions_val2014.json", output_dir="/content/outputs")
    print(type(dataloader))
    model.eval()
    with torch.no_grad():
        for batch in dataloader:
            print(batch[0].keys())
            print(batch[0]["ATTRIBUTE"].shape)
            output = model(batch)
            break

I am able to obtain the dataloader from build_xmodaler_valtest_loader. A batch of this loader contains a list of dictionaries.

    opened by architlatkar27 6
  • TypeError: list indices must be integers or slices, not numpy.str_

When training with cosnet_rl.yaml, this error occurs:

    ......
    (predictor): BasePredictor(
      (logits): Linear(in_features=512, out_features=10200, bias=True)
      (dropout): Dropout(p=0.5, inplace=False)
    )
    (greedy_decoder): GreedyDecoder()
    (beam_searcher): BeamSearcher()
    )
    [09/08 16:48:14] xl.datasets.common INFO: Serializing 113287 elements to byte tensors and concatenating them all ...
    [09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 162.52 MiB
    [09/08 16:48:23] xl.datasets.common INFO: Serializing 5000 elements to byte tensors and concatenating them all ...
    [09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 2.68 MiB
    [09/08 16:48:23] xl.datasets.common INFO: Serializing 5000 elements to byte tensors and concatenating them all ...
    [09/08 16:48:23] xl.datasets.common INFO: Serialized dataset takes 2.68 MiB
    [09/08 16:48:39] fvcore.common.checkpoint INFO: [Checkpointer] Loading from /home/stormai/userfile/caoshan/code/xmodaler/ModelResult/xe/cosnet_xe.pth ...
    [09/08 16:48:44] xl.engine.train_loop INFO: Starting training from iteration 0
    [09/08 16:48:45] xl.engine.train_loop ERROR: Exception during training:
    Traceback (most recent call last):
      File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/engine/train_loop.py", line 151, in train
        self.run_step()
      File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/engine/rl_mean_trainer.py", line 47, in run_step
        bs_rewards = self.scorer(bs_outputs_dict)
      File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/scorer/base_scorer.py", line 66, in call
        gts = [self.gts[i] for i in ids]
      File "/home/stormai/userfile/caoshan/code/xmodaler-master/xmodaler/scorer/base_scorer.py", line 66, in
        gts = [self.gts[i] for i in ids]
    TypeError: list indices must be integers or slices, not numpy.str_

I tried to fix this by converting ids and gts to lists, but it doesn't work. Could you please help me?

    opened by chssjhjbj 4
  • Captions on New Dataset

I'm trying to get captions from a pre-trained model on a new dataset (crisismmd) and was wondering whether I need to extract the image features from the raw images first? It looks as though "kfg" utilizes features and their locations rather than raw images, as I saw in "mscoco.py".

    Could I use a model like detectron2 to get these features, if that is what is needed? Or what would be the optimal way of doing this? Thank you in advance!

    opened by sramshetty 4
  • Some question about lr and warm up

Hello, after I accidentally interrupted training, I used --resume to resume it, but I found that the learning rate was not restored. How can I correct this? In addition, which parameters control when warm-up ends? Thanks a lot!

    enhancement 
    opened by WangLanxiao 4
  • About the extracted features

    Hi,

Thank you very much for your great code. May I ask if you would provide the extracted and pre-processed features (the .npz files) for datasets such as COCO, VCR, and MSR-VTT?

    Thank you very much!

    opened by cpsxhao 4
  • Predict on a given raw input

Dear contributors, is there a simple way I can directly predict results on my own customized input? For example, for video captioning, I want to generate the caption for my own video, which is in MP4 format. Is there a simple way to do it? Thanks.

    enhancement 
    opened by takuyara 4
  • Using my own dataset with CosNet

Hello! Thank you very much for your work. I ran into some problems when training CosNet on a new dataset and would like to ask: 1. How was the CLIP_RN101_49 used in the features obtained? Can I produce it myself using CLIP's encoders? 2. In the CosNet pkl files, what do 'attr_pred', 'attr_labels', and 'missing_labels' mean? Do you have plans to upload the preprocessing code? Thank you very much!

    opened by Wangyf1998 3
  • Image to text search using clip

Hi, dear author, in your latest CVPR 2022 paper (Comprehending and Ordering Semantics for Image Captioning), how do you retrieve semantically similar sentences for the input image using the CLIP model? Can you give some tutorials? Thanks a lot!

    opened by ltp1995 3
  • During training the net, i got this RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I used train_net.py, but some problems appeared. First I got:

    line 68, in forward
      wt.index_copy(0, ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, ind))
    RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)

    But when I cleared the memory and re-ran train_net.py, it worked until the 10th epoch, where I got:

    File "D:\llk\xmodaler-master\xmodaler\modeling\meta_arch\rnn_att_enc_dec.py", line 68, in forward
      wt.index_copy(0, ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, ind))
    RuntimeError: probability tensor contains either inf, nan or element < 0

    Can somebody help me? 😥

    opened by misayllk 0
  • Folder of MSVD dataset features in google drive is empty

Thank you for sharing this great work. The folder of MSVD dataset features on Google Drive is empty for these links: (https://drive.google.com/drive/folders/1vx9n7tAIt8su0y_3tsPJGvMPBMm8JLCZ?usp=sharing) (https://drive.google.com/drive/folders/1-jvt6aKMDmhZC03DPEpwwgYxeL4PSD5J). I need the script for extracting the MSVD features so I can extract them for the videos myself; can you send it to me, please ([email protected])?

    opened by adeljalalyousif 2
  • Possible implementation in Google Colab.

I need to generate a caption for a video; what would be the easiest way to do this? I don't have to train the model, just use it to generate the caption. I haven't found a working example in the documentation. Would it be possible to have a Colab notebook ready to use?

    opened by francescomarino10 1
  • How can I run inference on my own dataset?

First, thank you for your work. I am trying to use the MSVD pretrained model weights to run video captioning inference on a dataset I prepared myself.

    I first used resnet152 to extract high-dimensional features from my own dataset and assembled them into npy files. Then, following caption_test.json from the MSVD dataset, I filled in the corresponding id and file_name for my own data (since I only need inference, not evaluation, the annotations are empty).

    I then filled the correct paths into config/video_caption/msvd/base_caption.yaml, but it now fails with an error that the file 1297.npy does not exist. So my question is: since I modified caption_test.json, why does it still not read the data from my caption_test.json?

    opened by DestoryVIP 3
  • GitHub Action to lint Python code

    Results: https://github.com/cclauss/xmodaler/actions

    https://flake8.pycqa.org/en/latest/user/error-codes.html

    On the flake8 test selection, this PR does not focus on "style violations" (the majority of flake8 error codes that psf/black can autocorrect). Instead, these tests focus on runtime safety and correctness:

    • E9 tests are about Python syntax errors, usually raised because flake8 cannot build an Abstract Syntax Tree (AST). Often these issues are a sign of unused code or code that has not been ported to Python 3. These would be compile-time errors in a compiled language, but in a dynamic language like Python they result in the script halting/crashing on the user.
    • F63 tests are usually about confusion between identity and equality in Python; using ==/!= to compare str, bytes, and int literals is the classic case. These are areas where a == b is True but a is b is False (or vice versa). Python >= 3.8 will raise SyntaxWarnings on these instances (a minimal illustration follows this list).
    • F7 tests cover logic errors and syntax errors in type hints.
    • F82 tests are almost always undefined names, which are usually a sign of a typo, missing imports, or code that has not been ported to Python 3. These also would be compile-time errors in a compiled language, but in Python a NameError is raised, which will halt/crash the script on the user.
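
    As a hypothetical illustration of the F63 class (not code from this repository), equality and identity can disagree in CPython:

    # == compares values; `is` compares object identity.
    a = 100
    b = 100
    print(a == b, a is b)    # True True  -- CPython interns small integers

    x = 1000
    y = int("1000")          # a distinct int object with the same value
    print(x == y, x is y)    # True False -- equal values, different objects

    # flake8's F632 (in the F63x group) flags literal comparisons such as:
    #   if name is "admin": ...    # should be: if name == "admin": ...
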
    opened by cclauss 0