Overview

3D Human Pose Estimation with Spatial and Temporal Transformers

This repo is the official implementation for 3D Human Pose Estimation with Spatial and Temporal Transformers.

Video Demonstration
(demo video)

PoseFormer Architecture
(architecture figure)

Video Demo
  • 3D HPE on Human3.6M
  • 3D HPE on videos in-the-wild using PoseFormer

Our code is built on top of VideoPose3D.

Environment

The code is developed and tested under the following environment:

  • Python 3.8.2
  • PyTorch 1.7.1
  • CUDA 11.0

You can create the environment with:

conda env create -f poseformer.yml
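
Once the environment is created, a quick way to confirm that the expected PyTorch and CUDA versions are active (a minimal check added here for convenience, not part of the original repo):

import torch

print(torch.__version__)          # expected: 1.7.1
print(torch.version.cuda)         # expected: 11.0
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine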

Dataset

Our code is compatible with the dataset setup introduced by Martinez et al. and Pavllo et al. Please refer to VideoPose3D to set up the Human3.6M dataset (in the ./data directory).
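
After the data setup, the prepared files can be inspected as a rough sanity check; the file and key names below follow the standard VideoPose3D convention and are an assumption here, so adjust them if your setup differs.

import numpy as np

# Inspect the prepared Human3.6M files in ./data (VideoPose3D naming convention assumed).
poses_3d = np.load('data/data_3d_h36m.npz', allow_pickle=True)['positions_3d'].item()
poses_2d = np.load('data/data_2d_h36m_cpn_ft_h36m_dbb.npz', allow_pickle=True)['positions_2d'].item()

print(sorted(poses_3d.keys()))          # subjects, e.g. 'S1', 'S5', ...
action = next(iter(poses_2d['S1']))     # pick an arbitrary action of subject S1
print(poses_2d['S1'][action][0].shape)  # (num_frames, num_joints, 2) keypoints for the first camera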

Evaluating pre-trained models

We provide the pre-trained 81-frame model (CPN-detected 2D poses as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin

We also provide the pre-trained 81-frame model (ground-truth 2D poses as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k gt -f 81 -c checkpoint --evaluate gt81f.bin
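
If evaluation fails to start, it can help to verify that the downloaded checkpoint deserializes at all; the key names inside (e.g. 'model_pos') follow the VideoPose3D checkpoint format and are an assumption here.

import torch

# Sanity-check that the downloaded checkpoint file loads.
ckpt = torch.load('checkpoint/gt81f.bin', map_location='cpu')
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))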

Training new models

  • To train a model from scratch (CPN-detected 2D poses as input), run:
python run_poseformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.0001 -lrd 0.99

-f controls how many frames are used as input. 27 frames achieves 47.0 mm and 81 frames achieves 44.3 mm (MPJPE). A sketch of the -lr/-lrd learning-rate schedule is given after this list.

  • To train a model from scratch (ground-truth 2D poses as input), run:
python run_poseformer.py -k gt -f 81 -lr 0.0001 -lrd 0.99

81 frames achieves 31.3 mm (MPJPE).
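
For reference, -lr sets the initial learning rate and -lrd the per-epoch exponential decay factor. A minimal sketch of that schedule (a simplification of the VideoPose3D-style training loop, not the repository's exact code):

# Exponential learning-rate schedule implied by -lr / -lrd.
initial_lr = 0.0001  # -lr
lr_decay = 0.99      # -lrd

lr = initial_lr
for epoch in range(1, 11):
    print(f'epoch {epoch}: lr = {lr:.6f}')
    # ... one training epoch at the current learning rate would run here ...
    lr *= lr_decay   # decay applied once per epoch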

Visualization and other functions

We keep our code consistent with VideoPose3D. Please refer to their project page for further information.

Bibtex

If you find our work useful in your research, please consider citing:

@article{zheng20213d,
title={3D Human Pose Estimation with Spatial and Temporal Transformers},
author={Zheng, Ce and Zhu, Sijie and Mendieta, Matias and Yang, Taojiannan and Chen, Chen and Ding, Zhengming},
journal={arXiv preprint arXiv:2103.10455},
year={2021}
}

Acknowledgement

Part of our code is borrowed from VideoPose3D. We thank the authors for releasing their code.

Comments
  • Could you please provide the training log?

    I followed the settings in your paper and trained the model with 27 frames, but I can't reach the performance reported in the paper. This is my training log:

    [1] time 20.00 lr 0.000200 3d_train 79.10 3d_valid 64.31
    [2] time 19.75 lr 0.000196 3d_train 42.98 3d_valid 59.67
    [3] time 19.77 lr 0.000192 3d_train 34.62 3d_valid 58.55
    [4] time 19.88 lr 0.000188 3d_train 29.96 3d_valid 57.28
    [5] time 19.83 lr 0.000184 3d_train 26.81 3d_valid 56.05
    [6] time 19.86 lr 0.000181 3d_train 24.47 3d_valid 55.73
    [7] time 19.82 lr 0.000177 3d_train 22.63 3d_valid 54.94
    [8] time 19.88 lr 0.000174 3d_train 21.14 3d_valid 54.19
    [9] time 19.84 lr 0.000170 3d_train 19.91 3d_valid 54.72
    [10] time 19.86 lr 0.000167 3d_train 18.87 3d_valid 53.88
    [11] time 19.77 lr 0.000163 3d_train 17.99 3d_valid 54.01
    [12] time 19.74 lr 0.000160 3d_train 17.25 3d_valid 54.11
    [13] time 19.68 lr 0.000157 3d_train 16.62 3d_valid 54.26
    [14] time 19.74 lr 0.000154 3d_train 16.08 3d_valid 54.30
    [15] time 19.66 lr 0.000151 3d_train 15.60 3d_valid 54.15
    [16] time 19.80 lr 0.000148 3d_train 15.17 3d_valid 54.57
    [17] time 19.85 lr 0.000145 3d_train 14.79 3d_valid 54.10
    [18] time 19.95 lr 0.000142 3d_train 14.45 3d_valid 53.76
    [19] time 19.80 lr 0.000139 3d_train 14.15 3d_valid 53.93
    [20] time 19.84 lr 0.000136 3d_train 13.86 3d_valid 54.03
    [21] time 19.84 lr 0.000134 3d_train 13.60 3d_valid 54.89
    [22] time 19.86 lr 0.000131 3d_train 13.36 3d_valid 54.32
    [23] time 19.87 lr 0.000128 3d_train 13.14 3d_valid 53.97
    [24] time 19.82 lr 0.000126 3d_train 12.94 3d_valid 54.72
    [25] time 19.78 lr 0.000123 3d_train 12.75 3d_valid 54.38
    [26] time 19.86 lr 0.000121 3d_train 12.58 3d_valid 54.50
    [27] time 19.85 lr 0.000118 3d_train 12.42 3d_valid 54.18
    [28] time 19.82 lr 0.000116 3d_train 12.25 3d_valid 54.38
    [29] time 19.81 lr 0.000114 3d_train 12.13 3d_valid 54.09
    [30] time 19.83 lr 0.000111 3d_train 11.99 3d_valid 55.16
    [31] time 19.82 lr 0.000109 3d_train 11.87 3d_valid 54.22
    [32] time 19.84 lr 0.000107 3d_train 11.75 3d_valid 54.56
    [33] time 19.81 lr 0.000105 3d_train 11.64 3d_valid 54.03
    [34] time 19.84 lr 0.000103 3d_train 11.54 3d_valid 54.39
    [35] time 19.81 lr 0.000101 3d_train 11.43 3d_valid 55.32
    [36] time 19.80 lr 0.000099 3d_train 11.33 3d_valid 54.34
    [37] time 19.81 lr 0.000097 3d_train 11.24 3d_valid 55.05
    [38] time 19.80 lr 0.000095 3d_train 11.15 3d_valid 54.58
    [39] time 19.85 lr 0.000093 3d_train 11.07 3d_valid 54.72
    [40] time 19.79 lr 0.000091 3d_train 10.99 3d_valid 55.01
    [41] time 19.74 lr 0.000089 3d_train 10.91 3d_valid 54.71
    [42] time 19.81 lr 0.000087 3d_train 10.83 3d_valid 54.52
    [43] time 19.83 lr 0.000086 3d_train 10.76 3d_valid 54.47
    [44] time 19.87 lr 0.000084 3d_train 10.69 3d_valid 54.51
    [45] time 19.86 lr 0.000082 3d_train 10.63 3d_valid 54.54
    [46] time 19.80 lr 0.000081 3d_train 10.56 3d_valid 55.12
    [47] time 19.86 lr 0.000079 3d_train 10.50 3d_valid 55.35
    [48] time 19.78 lr 0.000077 3d_train 10.44 3d_valid 55.37
    [49] time 19.68 lr 0.000076 3d_train 10.39 3d_valid 54.62
    [50] time 19.70 lr 0.000074 3d_train 10.32 3d_valid 55.17

    opened by flyyyyer 13
  • Learning Curves

    Hi, could you provide the train and test learning curves for 27 frames, or say at which epoch it should reach ~47 mm? I ran your code with python run_poseformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.00004 -lrd 0.99; up to epoch 28, the 3d_valid is only 51.88954. Sorry, my GPU is weak, so training takes me a long time.

    Thanks, best regards.

    opened by hankhuynh1011 7
  • ValueError on Custom Video Visualization

    ValueError: non-broadcastable output operand with shape (1621,1,17,3) doesn't match the broadcast shape (1621,1621,17,3)

    $ python run_poseformer.py -k cpn_ft_h36m_dbb -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S1 --viz-action Directions --viz-camera 0 --viz-video "/home/user/Desktop/Test/PoseFormer-main/data/Video/Directions.mp4" --viz-output output.gif --viz-size 3 --viz-downsample 2 --viz-limit 60

    Loading dataset...
    Preparing data...
    Loading 2D detections...
    INFO: Receptive field: 81 frames
    INFO: Trainable parameter count: 9602885
    Loading checkpoint checkpoint/pretrained_h36m_cpn.bin
    INFO: Testing on 225676 frames
    Rendering...
    Traceback (most recent call last):
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/user/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/main.py", line 45, in
        cli.main()
      File "/home/user/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
        run()
      File "/home/user/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
        runpy.run_path(target_as_str, run_name=compat.force_str("main"))
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/runpy.py", line 263, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/user/Desktop/Test/PoseFormer-main/run_poseformer.py", line 606, in
        prediction += trajectory
    ValueError: non-broadcastable output operand with shape (1621,1,17,3) doesn't match the broadcast shape (1621,1621,17,3)

    opened by nies14 6
  • Unable to resolve anaconda packages to set up the environment

    When running

    $ conda env create -f poseformer.yml

    I get this output:

    (base) D:\Desktop\PoseFormer-main>conda env create -f poseformer.yml
    Collecting package metadata (repodata.json): done
    Solving environment: failed

    ResolvePackageNotFound:

    • pyyaml==5.3.1=py38h7b6447c_1
    • mkl-service==2.3.0=py38he904b0f_0
    • xorg-xproto==7.0.31=h14c3975_1007
    • cudatoolkit==11.0.221=h6bb024c_0
    • libffi==3.2.1=hf484d3e_1007
    • wrapt==1.12.1=py38h7b6447c_1
    • matplotlib-base==3.3.2=py38h91b0d89_0
    • nss==3.57=he751ad9_0
    • libedit==3.1.20191231=h14c3975_1
    • readline==8.0=h7b6447c_0
    • tensorflow-base==2.2.0=mkl_py38h5059a2d_0
    • pillow==7.2.0=py38hb39fc2d_0
    • libtiff==4.1.0=h2733197_1
    • gst-plugins-base==1.14.5=h0935bb2_2
    • tornado==6.0.4=py38h7b6447c_1
    • expat==2.2.9=he6710b0_2
    • jpeg==9d=h516909a_0
    • lz4-c==1.9.2=he6710b0_1
    • nspr==4.29=he1b5a44_0
    • kiwisolver==1.2.0=py38hbf85e49_0
    • scipy==1.5.2=py38h0b6359f_0
    • fontconfig==2.13.1=h1056068_1002
    • psutil==5.7.2=py38h7b6447c_0
    • libwebp-base==1.1.0=h7b6447c_3
    • mysql-libs==8.0.21=hf3661c5_2
    • gmp==6.2.0=he1b5a44_2
    • python==3.8.2=hcf32534_0
    • cffi==1.14.0=py38h2e261b9_0
    • qt==5.12.9=h1f2b2cb_0
    • libsodium==1.0.18=h7b6447c_0
    • mkl_random==1.1.1=py38h0573a6f_0
    • libevent==2.1.10=hcdb4288_2
    • jasper==1.900.1=hd497a04_4
    • krb5==1.17.1=h173b8e3_0
    • freetype==2.10.2=h5ab3b9f_0
    • libllvm10==10.0.1=hbcb73fb_5
    • openssl==1.1.1i=h27cfd23_0
    • libuuid==2.32.1=h14c3975_1000
    • libpng==1.6.37=hbc83047_0
    • pcre==8.44=he6710b0_0
    • libiconv==1.16=h516909a_0
    • pyzmq==19.0.2=py38he6710b0_1
    • xorg-kbproto==1.0.7=h14c3975_1002
    • tensorflow==2.2.0=mkl_py38h6d3daf0_0
    • brotlipy==0.7.0=py38h7b6447c_1000
    • certifi==2020.12.5=py38h06a4308_0
    • pytorch==1.7.1=py3.8_cuda11.0.221_cudnn8.0.5_0
    • aiohttp==3.7.2=py38h1e0a361_0
    • sqlite==3.33.0=h62c20be_0
    • lame==3.100=h7b6447c_0
    • libgcc-ng==9.1.0=hdf63c60_0
    • ld_impl_linux-64==2.33.1=h53a641e_7
    • libxcb==1.14=h7b6447c_0
    • nettle==3.4.1=hbb512f6_0
    • xorg-libx11==1.6.12=h516909a_0
    • icu==67.1=he1b5a44_0
    • cairo==1.16.0=h3fc0475_1005
    • pixman==0.38.0=h7b6447c_0
    • ca-certificates==2021.1.19=h06a4308_0
    • libxkbcommon==0.10.0=he1b5a44_0
    • ninja==1.10.1=py38hfd86e86_0
    • xz==5.2.5=h7b6447c_0
    • zeromq==4.3.2=he6710b0_3
    • numpy-base==1.19.2=py38hfa32c7d_0
    • gnutls==3.6.13=h79a8f9a_0
    • libprotobuf==3.13.0=hd408876_0
    • dbus==1.13.16=hb2f20db_0
    • glib==2.63.1=h5a9c865_0
    • libuv==1.40.0=h7b6447c_0
    • libpq==12.3=h5513abc_0
    • gstreamer==1.14.5=h36ae1b5_2
    • multidict==4.7.5=py38h1e0a361_2
    • xorg-libice==1.0.10=h516909a_0
    • protobuf==3.13.0=py38he6710b0_1
    • mkl_fft==1.2.0=py38h23d657b_0
    • libstdcxx-ng==9.1.0=hdf63c60_0
    • yarl==1.6.2=py38h1e0a361_0
    • cython==0.29.21=py38he6710b0_0
    • x264==1!152.20180806=h7b6447c_0
    • libxml2==2.9.10=h68273f3_2
    • lcms2==2.11=h396b838_0
    • grpcio==1.33.2=py38heead2fc_0
    • harfbuzz==2.7.2=hee91db6_0
    • zstd==1.4.5=h9ceee32_0
    • h5py==2.10.0=py38hd6299e0_1
    • bzip2==1.0.8=h7b6447c_0
    • pandas==1.1.1=py38he6710b0_0
    • c-ares==1.16.1=h516909a_3
    • tk==8.6.10=hbc83047_0
    • xorg-xextproto==7.3.0=h14c3975_1002
    • yaml==0.2.5=h7b6447c_0
    • hdf5==1.10.6=hb1b8bf9_0
    • libclang==10.0.1=default_hb85057a_2
    • xorg-libsm==1.2.3=h84519dc_1000
    • xorg-libxext==1.3.4=h516909a_0
    • xorg-libxrender==0.9.10=h516909a_1002
    • openh264==2.1.1=h8b12597_0
    • xorg-renderproto==0.11.1=h14c3975_1002
    • libgfortran-ng==7.3.0=hdf63c60_0
    • ncurses==6.2=he6710b0_1
    • cryptography==3.1.1=py38h1ba5d50_0
    • zlib==1.2.11=h7b6447c_3
    • graphite2==1.3.14=h23475e2_0

    Is there a way to fix this? Thanks!

    opened by fortminors 3
  • Question about preprocessing MPII_INF_3DHP?

    Could you give me details about preprocessing the MPII_INF_3DHP dataset? I followed this script https://github.com/mkocabas/VIBE/blob/master/lib/data_utils/mpii3d_utils.py to preprocess the dataset and trained with lr=0.0004, batch_size=512, frame=9, but I got a much lower error (MPJPE: 50 mm) than reported in your paper. Is this correct?

    opened by leechaonan 2
  • question about preprocessing

    I notice you use "norm_coordinate_screnn" to preprocess the data, but I am not sure whether the input points are absolute coordinates on the original image. Since your method processes a single human pose, are the coordinates fed into "norm_coordinate_screnn" absolute image coordinates or already-processed coordinates?

    opened by shoutOutYangJie 1
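
    For context on the question above: in VideoPose3D, which this repo builds on, the 2D keypoints are expected as absolute pixel coordinates and are then mapped to roughly [-1, 1] before entering the network. The following sketch is modeled on VideoPose3D's normalize_screen_coordinates; the exact helper used here may differ.

    import numpy as np

    def normalize_screen_coordinates(X, w, h):
        # Map x from [0, w] to [-1, 1] and scale y by the same factor, preserving aspect ratio.
        # X is expected to hold absolute pixel coordinates, shape (..., 2).
        assert X.shape[-1] == 2
        return X / w * 2 - np.array([1, h / w])

    kp = np.array([[500.0, 400.0]])                          # absolute pixel coordinates (example)
    print(normalize_screen_coordinates(kp, w=1000, h=1000))  # -> [[ 0.  -0.2]]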
  • RuntimeError: CUDA out of memory

    When trying to run the pre-trained model and to train new models, I get the error below:

    RuntimeError: CUDA out of memory. Tried to allocate 366.00 MiB (GPU 0; 7.79 GiB total capacity; 4.50 GiB already allocated; 303.38 MiB free; 4.62 GiB reserved in total by PyTorch)

    Reducing the batch size also doesn't work.

    Processor: Core i9; RAM: 64 GB; GPU: RTX 3070

    The traceback:

    Traceback (most recent call last):
      File "run_poseformer.py", line 309, in
        predicted_3d_pos = model_pos_train(inputs_2d)
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/user/Desktop/Research/PoseFormer-main/common/model_poseformer.py", line 178, in forward
        x = self.Spatial_forward_features(x)
      File "/home/user/Desktop/Research/PoseFormer-main/common/model_poseformer.py", line 154, in Spatial_forward_features
        x = blk(x)
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/user/Desktop/Research/PoseFormer-main/common/model_poseformer.py", line 81, in forward
        x = x + self.drop_path(self.attn(self.norm1(x)))
      File "/home/user/anaconda3/envs/pose2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/user/Desktop/Research/PoseFormer-main/common/model_poseformer.py", line 56, in forward
        attn = (q @ k.transpose(-2, -1)) * self.scale
    RuntimeError: CUDA out of memory. Tried to allocate 366.00 MiB (GPU 0; 7.79 GiB total capacity; 4.50 GiB already allocated; 303.38 MiB free; 4.62 GiB reserved in total by PyTorch)

    opened by nies14 0
  • Question about 'spatial pose embedding'

    Dear author,

    Thanks for your great work! I have a simple question waiting for your reply.

    I saw that you create a learnable parameter initialized with zeros and designate it as the spatial pose embedding:

    self.Spatial_pos_embed = nn.Parameter(torch.zeros(1, num_joints, embed_dim_ratio))

    I wonder how this retains the positional information of the sequence?

    opened by zhywanna 1
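
    For context, a learnable positional embedding of this kind works by being added to the per-joint token embeddings before the transformer blocks; the zeros are only the initial value, and the parameter is updated by gradient descent during training. A minimal sketch (simplified, not the repository's exact forward pass):

    import torch
    import torch.nn as nn

    num_joints, embed_dim_ratio = 17, 32
    spatial_pos_embed = nn.Parameter(torch.zeros(1, num_joints, embed_dim_ratio))

    x = torch.randn(8, num_joints, embed_dim_ratio)  # (batch, joints, channels) joint embeddings
    x = x + spatial_pos_embed                        # each joint index gets its own learned offset
    # spatial_pos_embed receives gradients like any other weight, so it does not stay zero after training.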
  • The test results are of the wrong order of magnitude

    When I run the command "python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin" to evaluate the pre-trained 81-frame model, it produces the following result:

    (screenshot of the evaluation output)

    Is there a bug in this code please?

    opened by PJ-Hunter 1
  • Visualization of self-attentions

    Hi, I want to ask how the attention weights (weight i, j) in Fig. 5 are obtained. Do they mean Softmax(Query_j · Key_i / dimension^0.5), or Softmax(Query_j · Key_i / dimension^0.5) · Value_i? The latter is a vector, though.

    thx

    opened by erikervalid 1
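
    For reference, the attention weights visualized in such figures are usually the softmax-normalized scores alone, without multiplying by the values; reading Fig. 5 this way is an assumption here, not confirmed by the repository. A minimal sketch:

    import torch
    import torch.nn.functional as F

    # Scaled dot-product attention weights: A[i, j] = softmax_j(q_i . k_j / sqrt(d)).
    q = torch.randn(4, 64)   # 4 query tokens, dimension 64
    k = torch.randn(4, 64)   # 4 key tokens
    attn = F.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
    print(attn.shape)        # torch.Size([4, 4]): one scalar weight per (query, key) pair
    print(attn.sum(dim=-1))  # each row sums to 1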