Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Overview

ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Code release of our paper ROCA. Check out our video, paper, and website!

If you find our paper or this repository helpful, please cite:

@inproceedings{gumeli2022roca,
  title={ROCA: Robust CAD Model Retrieval and Alignment from a Single Image},
  author={G{\"u}meli, Can and Dai, Angela and Nie{\ss}ner, Matthias},
  booktitle={Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
  year={2022}
}

Development Environment

We use the following development environment for this project:

  • Nvidia RTX 3090 GPU
  • Intel Xeon W-1370
  • Ubuntu 20.04
  • CUDA Version 11.2
  • cudatoolkit 11.0
  • PyTorch 1.7
  • PyTorch3D 0.5 or 0.6
  • Detectron2 0.3

Installation

This code is developed using anaconda3 with Python 3.8 (download here), so we recommend a similar setup.

You can simply run the following code in the command line to create the development environment:

$ source setup.sh
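
If source setup.sh does not work in your shell, the environment can be approximated manually. The following is a sketch based on the versions listed above, not the exact recipe; the package pins in setup.sh take precedence:

$ conda create -n roca python=3.8 -y
$ conda activate roca
$ conda install -c pytorch pytorch=1.7.0 torchvision=0.8.1 cudatoolkit=11.0 -y
$ conda install -c pytorch3d pytorch3d=0.6.0 -y
$ python -m pip install detectron2==0.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html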

For visualizing some demo results or using the data preprocessing code, you need our custom rasterizer. In case the provided x86-64 Linux shared object does not work for you, you may install the rasterizer here.

Running the Demo

We provide four sample input images in the network/assets folder. The images are captured with a smartphone and then preprocessed to be compatible with the ROCA format. To run the demo, you first need to download data and configs from this Google Drive folder. The Models folder contains the pre-trained model and the config used, while the Data folder contains images and the dataset.
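
After extraction, the expected layout is roughly the following (inferred from the demo command below; the exact file list inside each folder may differ):

$MODEL_DIR/
  model_best.pth
  config.yaml
$DATA_DIR/
  Dataset/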

Assuming the contents of the Models directory are in $MODEL_DIR and the contents of the Data directory are in $DATA_DIR, you can run:

$ cd network
$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml

You will see image overlays and CAD visualizations displayed one by one. The Open3D mesh visualization is an interactive window where you can inspect the geometries from different viewpoints; close the Open3D window to continue to the next visualization. You should see results similar to the image above.

For headless visualization, you can specify an output directory where resulting images and meshes are placed:

$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml --output_dir $OUTPUT_DIR
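
Meshes saved during a headless run can be inspected later with a few lines of Open3D. This is a minimal sketch; the file name out/mesh.ply is a placeholder, as the actual names written by demo.py may differ:

import open3d as o3d

# Path to a mesh written by the headless demo run; the file name here is
# hypothetical, check your output directory for the actual names.
mesh = o3d.io.read_triangle_mesh('out/mesh.ply')
mesh.compute_vertex_normals()  # required for shaded rendering
o3d.visualization.draw_geometries([mesh])  # same interactive viewer as the demo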

You may use the --wild option to visualize results with "wild retrieval". Note that we omit the table category in this case due to its large size diversity.
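
For example, combining the headless command above with wild retrieval:

$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml --output_dir $OUTPUT_DIR --wild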

Preparing Data

Downloading Processed Data (Recommended)

We provide preprocessed images and labels in this Google Drive folder. Download and extract all folders to a desired location before running the training and evaluation code.

Rendering Data

Alternatively, you can render data yourself. Our data preparation code lives in the renderer folder.

Our project depends on ShapeNet (Chang et al., '15), ScanNet (Dai et al. '16), and Scan2CAD (Avetisyan et al. '18) datasets. For ScanNet, we use ScanNet25k images which are provided as a zip file via the ScanNet download script.

Once you have the data, check the renderer/env.sh file for the locations of the different datasets. The meaning of each environment variable is described as an inline comment in env.sh.

After editing renderer/env.sh, run the data generation script:

$ cd renderer
$ sh run.sh

Please check run.sh to see how individual scripts are running for data preprocessing and feel free to customize the data pipeline!

Training and Evaluating Models

Our training code lives in the network directory. Open network/env.sh and edit the environment variables. Make sure the data directories are consistent with the locations of the folders you downloaded and extracted. If you prepared the data manually, make sure the locations in network/env.sh are consistent with the variables set in renderer/env.sh.

After you are done with network/env.sh, run the run.sh script to train a new model or evaluate an existing one, depending on the environment variables you set in env.sh:

$ cd network
$ sh run.sh

Replicating Experiments from the Main Paper

Based on the configurations in network/env.sh, you can run different ablations from the paper. The default config will run the (final) experiment. You can apply the following edits cumulatively for different experiments; a concrete example follows the list:

  1. For P+E+W+R, set RETRIEVAL_MODE=resnet_resnet+image
  2. For P+E+W, set RETRIEVAL_MODE=nearest
  3. For P+E, set NOC_WEIGHTS=0
  4. For P, set E2E=0
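
For example, the P+E ablation corresponds to applying edits 1-3 cumulatively, so the relevant variables in network/env.sh would read (a sketch, assuming plain shell assignments):

RETRIEVAL_MODE=nearest
NOC_WEIGHTS=0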

Resources

To obtain the datasets and gain further insight into our implementation, please refer to the following datasets and open-source codebases:

Datasets and Metadata

Libraries

Projects

Comments
  • A small bug

    Hi, thanks for your excellent work on single-image scene understanding. I found a small bug in line 165 of network/roca/modeling/retrieval_head/retrieval_ops.py.

     except ValueError:
         assert len(feats) == 0  # feats.numel() == 0 is invalid, since feats is a List
    
    opened by louzq16 6
  • RuntimeError: Error(s) in loading state_dict for ROCA

    Hello @cangumeli, sorry to bother you, but I am getting the following error while running demo.py with model_best.pth. Appreciate any help!

    File "/home/aston/Desktop/python/ROCA-main/network/roca/engine/predictor.py", line 44, in __init__
        model.load_state_dict(backup['model'])
      File "/home/aston/anaconda3/envs/roca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for ROCA:
    	Unexpected key(s) in state_dict: "pixel_mean", "pixel_std", "proposal_generator.anchor_generator.cell_anchors.0", "proposal_generator.anchor_generator.cell_anchors.1", "proposal_generator.anchor_generator.cell_anchors.2", "proposal_generator.anchor_generator.cell_anchors.3", "proposal_generator.anchor_generator.cell_anchors.4". 
    
    Process finished with exit code 1
    
    opened by WenZhiKun 4
  • Ask about the positive and negative example mining :)

    Hi dear author,

    Thanks for your wonderful work and for making the source code public so quickly. I think the paper's idea of learning a joint embedding between the depth object and the CAD model is clever and easy to understand. But it seems that the paper does not explain how to find positive and negative examples for the joint embedding learning. I would appreciate it if you could explain more details about it.

    Thanks a lot : )

    opened by DoctorXK 4
  • Errors encountered during training

    I tried to train the model from scratch using both the provided dataset and the processed data, following the instructions, but I ran into the following error.

    [06/13 16:27:47 d2.data.datasets.coco]: Loaded 5436 images in COCO format from /workspace/ROCA/dataset/Data/Dataset/scan2cad_instances_val.json
    [06/13 16:27:47 d2.data.common]: Serializing 5436 elements to byte tensors and concatenating them all ...
    [06/13 16:27:47 d2.data.common]: Serialized dataset takes 11.65 MiB
    [06/13 16:27:54 d2.evaluation.evaluator]: Start inference on 5436 images
    [06/13 16:27:55 d2.evaluation.evaluator]: Inference done 11/5436. 0.0710 s / img. ETA=0:07:35
    [06/13 16:28:00 d2.evaluation.evaluator]: Inference done 72/5436. 0.0697 s / img. ETA=0:07:21
    [06/13 16:28:05 d2.evaluation.evaluator]: Inference done 134/5436. 0.0693 s / img. ETA=0:07:13
    [06/13 16:28:10 d2.evaluation.evaluator]: Inference done 199/5436. 0.0683 s / img. ETA=0:07:01
    [06/13 16:28:15 d2.evaluation.evaluator]: Inference done 263/5436. 0.0681 s / img. ETA=0:06:54
    [06/13 16:28:20 d2.evaluation.evaluator]: Inference done 326/5436. 0.0681 s / img. ETA=0:06:50
    [06/13 16:28:25 d2.evaluation.evaluator]: Inference done 390/5436. 0.0679 s / img. ETA=0:06:43
    [06/13 16:28:30 d2.evaluation.evaluator]: Inference done 446/5436. 0.0691 s / img. ETA=0:06:46
    ERROR [06/13 16:28:34 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_ops.py", line 162, in voxelize_nocs
        volumes = add_pointclouds_to_volumes(points, volumes)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/pytorch3d/ops/points_to_volumes.py", line 275, in add_pointclouds_to_volumes
        raise ValueError("'pointclouds' have to have their 'features' defined.")
    ValueError: 'pointclouds' have to have their 'features' defined.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 135, in train
        self.after_step()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 165, in after_step
        h.after_step()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 353, in after_step
        self._do_eval()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 328, in _do_eval
        results = self._func()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 366, in test_and_save_results
        self._last_eval_results = self.test(self.cfg, self.model)
      File "/workspace/ROCA/network/roca/engine/trainer.py", line 180, in test
        results = super().test(cfg, model, evaluators)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 534, in test
        results_i = inference_on_dataset(model, data_loader, evaluator)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/evaluation/evaluator.py", line 141, in inference_on_dataset
        outputs = model(inputs)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/meta_arch/meta_arch.py", line 40, in forward
        return self.inference(batched_inputs)
      File "/workspace/ROCA/network/roca/modeling/meta_arch/meta_arch.py", line 124, in inference
        results, extra_outputs = self.roi_heads(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 132, in forward
        pred_instances, alignment_outputs = self._forward_alignment(
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 180, in _forward_alignment
        return self._forward_alignment_inference(
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 282, in _forward_alignment_inference
        predictions, extra_outputs = self.alignment_head(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 137, in forward
        return self.forward_inference(*args, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 333, in forward_inference
        predictions, extra_outputs = self._forward_retrieval_inference(
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 803, in _forward_retrieval_inference
        cad_ids, pred_indices = self.retrieval_head(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_head.py", line 201, in forward
        return self._embedding_lookup(
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_head.py", line 353, in _embedding_lookup
        noc_embeds = self.embed_nocs(shape_code, noc_points, pred_masks)
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_ops.py", line 165, in voxelize_nocs
        assert len(feats) == 0
    AssertionError

    [06/13 16:28:34 d2.engine.hooks]: Overall training speed: 7497 iterations in 1:56:25 (0.9318 s / it)
    [06/13 16:28:34 d2.engine.hooks]: Total training time: 2:18:02 (0:21:37 on hooks)
    [06/13 16:28:34 d2.utils.events]: eta: 18:51:20 iter: 7499 total_loss: 5.823 loss_cls: 0.3354 loss_box_reg: 0.4823 loss_image_depth: 0.3101 loss_mask: 0.3888 loss_mask_iou: 0.3305 loss_roi_depth: 0.2764 loss_mean_depth: 0.2694 loss_scale: 0.3106 loss_depth_min: 0.04275 loss_depth_max: 0.05285 loss_trans: 0.4288 loss_noc: 0.9828 loss_proc: 0.5804 loss_trans_proc: 0.3947 loss_noc_comp: 0.1564 loss_triplet: 0.2907 loss_rpn_cls: 0.03125 loss_rpn_loc: 0.01378 time: 0.9317 data_time: 0.0259 lr: 0.001 max_mem: 4627M
    Traceback (most recent call last):
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_ops.py", line 162, in voxelize_nocs
        volumes = add_pointclouds_to_volumes(points, volumes)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/pytorch3d/ops/points_to_volumes.py", line 275, in add_pointclouds_to_volumes
        raise ValueError("'pointclouds' have to have their 'features' defined.")
    ValueError: 'pointclouds' have to have their 'features' defined.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "main.py", line 175, in <module>
        main(parse_args())
      File "main.py", line 171, in main
        train_or_eval(args, cfg)
      File "main.py", line 164, in train_or_eval
        trainer.train()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 413, in train
        super().train(self.start_iter, self.max_iter)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 135, in train
        self.after_step()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 165, in after_step
        h.after_step()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 353, in after_step
        self._do_eval()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 328, in _do_eval
        results = self._func()
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 366, in test_and_save_results
        self._last_eval_results = self.test(self.cfg, self.model)
      File "/workspace/ROCA/network/roca/engine/trainer.py", line 180, in test
        results = super().test(cfg, model, evaluators)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 534, in test
        results_i = inference_on_dataset(model, data_loader, evaluator)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/detectron2/evaluation/evaluator.py", line 141, in inference_on_dataset
        outputs = model(inputs)
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/meta_arch/meta_arch.py", line 40, in forward
        return self.inference(batched_inputs)
      File "/workspace/ROCA/network/roca/modeling/meta_arch/meta_arch.py", line 124, in inference
        results, extra_outputs = self.roi_heads(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 132, in forward
        pred_instances, alignment_outputs = self._forward_alignment(
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 180, in _forward_alignment
        return self._forward_alignment_inference(
      File "/workspace/ROCA/network/roca/modeling/roi_heads/roi_heads.py", line 282, in _forward_alignment_inference
        predictions, extra_outputs = self.alignment_head(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 137, in forward
        return self.forward_inference(*args, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 333, in forward_inference
        predictions, extra_outputs = self._forward_retrieval_inference(
      File "/workspace/ROCA/network/roca/modeling/alignment_head/alignment_head.py", line 803, in _forward_retrieval_inference
        cad_ids, pred_indices = self.retrieval_head(
      File "/root/anaconda3/envs/pytorch3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_head.py", line 201, in forward
        return self._embedding_lookup(
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_head.py", line 353, in _embedding_lookup
        noc_embeds = self.embed_nocs(shape_code, noc_points, pred_masks)
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_head.py", line 226, in embed_nocs
        noc_points = voxelize_nocs(grid_to_point_list(noc_points, mask))
      File "/workspace/ROCA/network/roca/modeling/retrieval_head/retrieval_ops.py", line 165, in voxelize_nocs
        assert len(feats) == 0
    AssertionError

    I am using PyTorch3D 0.6.2 and PyTorch 1.7.0.

    opened by WenM1222 2
  • I can't import something

    In predictor.py, I can't import the following:

     from roca.config import roca_config
     from roca.data import CADCatalog
     from roca.data.constants import CAD_TAXONOMY, COLOR_BY_CLASS
     from roca.data.datasets import register_scan2cad
     from roca.structures import Intrinsics
     from roca.utils.alignment_errors import translation_diff
     from roca.utils.linalg import make_M_from_tqs

    The errors say: Unresolved reference "roca". It's a basic issue, but I can't solve it.

    opened by noviceswing 2
  • unable to load materials from model_normalized.mtl

     skipping vacant point sample ('02933112', '37e5fcf70007bc26788f926f4d51e733')...
     WARNING - 2022-05-30 21:59:28,861 - obj - unable to load materials from: model_normalized.mtl
     WARNING - 2022-05-30 21:59:28,868 - obj - specified material (material_52_24) not loaded!

    opened by noviceswing 1
  • A minor error in README

    Hi! I tried the demo by following the instructions in the README and it works successfully, except that the command should be

    $ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml
    

    where --model_dir is changed to --model_path and --config_dir is changed to --config_path.

    Please correct me if I am wrong. Thanks!

    opened by C-H-Chien 1
  • Error while training the model

    Hello @cangumeli, sorry to bother you again, but I am getting the following error while training the ROCA model. Appreciate any help.

    [09/07 15:47:04 d2.evaluation.evaluator]: Inference done 5388/5436. 0.0699 s / img. ETA=0:00:03
    [09/07 15:47:07 d2.evaluation.evaluator]: Total inference time: 0:07:07.043241 (0.078631 s / img per device, on 1 devices)
    [09/07 15:47:07 d2.evaluation.evaluator]: Total inference pure compute time: 0:06:19 (0.069837 s / img per device, on 1 devices)
    
    Starting per-frame evaluation
    Frame: 0/5436
    Frame: 500/5436
    Frame: 1000/5436
    Frame: 1500/5436
    Frame: 2000/5436
    Frame: 2500/5436
    Frame: 3000/5436
    Frame: 3500/5436
    Frame: 4000/5436
    Frame: 4500/5436
    Frame: 5000/5436
    Traceback (most recent call last):
      File "C:\Users\Anaconda3\envs\roca\lib\contextlib.py", line 131, in __exit__
        self.gen.throw(type, value, traceback)
      File "D:\research\code\roca\network\roca\engine\trainer.py", line 207, in cad_context
        yield
      File "D:\research\code\roca\network\roca\engine\trainer.py", line 180, in test
        results = super().test(cfg, model, evaluators)
      File "d:\research\code\detectron2-0.3\detectron2\engine\defaults.py", line 534, in test
        results_i = inference_on_dataset(model, data_loader, evaluator)
      File "d:\research\code\detectron2-0.3\detectron2\evaluation\evaluator.py", line 176, in inference_on_dataset
        results = evaluator.evaluate()
      File "d:\research\code\detectron2-0.3\detectron2\evaluation\evaluator.py", line 91, in evaluate
        result = evaluator.evaluate()
      File "D:\research\code\roca\network\roca\evaluation\per_frame_evaluation.py", line 81, in evaluate
        compute_ap(scores, labels, npos).item() * 100,
    AttributeError: 'float' object has no attribute 'item'
    
    
    opened by supriya-gdptl 1
  • Error while trying to run demo.py

    I followed all your installation steps and the setup finished successfully, but when I try to run your demo.py example I get the error: "anaconda3/envs/roca/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11". According to Google, people who had this error suggested changing the version of PyTorch or cudatoolkit, but when I change it, the rest of the code fails with other errors. What could be causing this error, and how can I solve it?

    opened by PeterARVR 1
  • Question about demo

    Hello @cangumeli ,

    Thank you for sharing the code.

    I have a question about the code in demo.py. On line 25, why do you use scene names from the ScanNet dataset?

    I want to try the demo code on images taken with a phone camera. Could you please tell me what steps I need to follow for preprocessing? What scene names should I write on line 25 of demo.py to make it work on such not-in-dataset images?

    Thank you, Supriya

    opened by supriya-gdptl 2
  • First issue :) Trying to run it on new images

    Hi,

    Thanks for this great work, it looks very promising and exciting! I am doing some tests. I did not have major issues with installation and the demo runs. Congrats!

    I want to adapt the code to get a simple CLI that takes an input image and its intrinsics. It would be great if such a demo were part of the codebase, IMHO.

    To this end, could you kindly explain what the "scene" argument is here, in demo.py?

       for name, scene in zip(
            ('3m', 'sofa', 'lab', 'desk'),
            ('scene0474_02', 'scene0207_00', 'scene0378_02', 'scene0474_02')
        ):
    

    Thanks Thibault

    opened by ThibaultGROUEIX 2