Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Overview

bottom-up-attention

This code implements a bottom-up attention model, based on multi-GPU training of Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome.

The pretrained model generates output features corresponding to salient image regions. These bottom-up attention features can typically be used as a drop-in replacement for CNN features in attention-based image captioning and visual question answering (VQA) models. This approach was used to achieve state-of-the-art image captioning performance on MSCOCO (CIDEr 117.9, BLEU_4 36.9) and to win the 2017 VQA Challenge (70.3% overall accuracy), as described in the paper cited in the Reference section below.

Some example object and attribute predictions for salient image regions are illustrated below.

Note: This repo only includes code for training the bottom-up attention / Faster R-CNN model (section 3.1 of the paper). The actual captioning model (section 3.2) is available in a separate repo here.

Reference

If you use our code or features, please cite our paper:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle={CVPR},
  year = {2018}
}

Disclaimer

This code is modified from py-R-FCN-multiGPU, which is in turn modified from py-faster-rcnn code. Please refer to these links for further README information (for example, relating to other models and datasets included in the repo) and appropriate citations for these works. This README only relates to Faster R-CNN trained on Visual Genome.

License

bottom-up-attention is released under the MIT License (refer to the LICENSE file for details).

Pretrained features

For ease-of-use, we make pretrained features available for the entire MSCOCO dataset. It is not necessary to clone or build this repo to use features downloaded from the links below. Features are stored in tsv (tab-separated-values) format that can be read with tools/read_tsv.py.
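
For reference, the tsv records can be decoded roughly as follows. This is a minimal Python 3 sketch (the repo's own tools target Python 2), assuming the field layout used by tools/read_tsv.py — image_id, image_w, image_h, num_boxes, boxes, features, with boxes and features stored as base64-encoded float32 arrays. Treat the exact field names as assumptions and check read_tsv.py for the authoritative version.

    import base64
    import csv
    import sys

    import numpy as np

    csv.field_size_limit(sys.maxsize)  # the feature fields are very long base64 strings

    # Assumed field layout of the released tsv files:
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

    def read_tsv(path):
        """Yield one dict per image with decoded boxes and features."""
        with open(path) as f:
            for item in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
                num_boxes = int(item['num_boxes'])
                boxes = np.frombuffer(base64.b64decode(item['boxes']),
                                      dtype=np.float32).reshape((num_boxes, -1))
                features = np.frombuffer(base64.b64decode(item['features']),
                                         dtype=np.float32).reshape((num_boxes, -1))
                yield {'image_id': int(item['image_id']),
                       'image_w': int(item['image_w']),
                       'image_h': int(item['image_h']),
                       'boxes': boxes,        # one row per region
                       'features': features}  # one 2048-d vector per region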

LINKS HAVE BEEN UPDATED TO GOOGLE CLOUD STORAGE (14 Feb 2021)

10 to 100 features per image (adaptive):

36 features per image (fixed):

Both sets of features can be recreated by using tools/generate_tsv.py with the appropriate pretrained model and with MIN_BOXES/MAX_BOXES set to either 10/100 or 36/36 respectively; refer to the Demo section below.

Contents

  1. Requirements: software
  2. Requirements: hardware
  3. Basic installation
  4. Demo
  5. Training
  6. Testing

Requirements: software

  1. Important: Please use the version of Caffe contained within this repository.

  2. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers and NCCL!

# In your Makefile.config, make sure to have these lines uncommented
WITH_PYTHON_LAYER := 1
USE_NCCL := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1
  3. Python packages you might not have: cython, python-opencv, easydict
  4. Nvidia's NCCL library, which is used for multi-GPU training: https://github.com/NVIDIA/nccl

Requirements: hardware

Any NVIDIA GPU with 12GB or larger memory is OK for training Faster R-CNN ResNet-101.

Installation

  1. Clone the repository

    git clone https://github.com/peteanderson80/bottom-up-attention/

  2. Build the Cython modules

    cd $REPO_ROOT/lib
    make
  3. Build Caffe and pycaffe

    cd $REPO_ROOT/caffe
    # Now follow the Caffe installation instructions here:
    #   http://caffe.berkeleyvision.org/installation.html
    
    # If you're experienced with Caffe and have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe

Demo

  1. Download the pretrained model and put it under data/faster_rcnn_models.

  2. Run tools/demo.ipynb to show object and attribute detections on demo images.

  3. Run tools/generate_tsv.py to extract bounding box features to a tab-separated-values (tsv) file. This will require modifying the load_image_ids function to suit your data locations. To recreate the pretrained feature files with 10 to 100 features per image, set MIN_BOXES=10 and MAX_BOXES=100. To recreate the pretrained feature files with 36 features per image, set MIN_BOXES=36 and MAX_BOXES=36 and use this alternative pretrained model instead. The alternative pretrained model was trained for fewer iterations but performance is similar. The box-selection rule behind these settings is sketched below.
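
    The adaptive and fixed settings come from the same selection rule: keep regions whose best non-background class score clears a confidence threshold, clipped to [MIN_BOXES, MAX_BOXES]. The following is a hedged numpy sketch of that rule; it omits the per-class NMS step that generate_tsv.py applies first, and the names cls_prob, pool5 and the 0.2 threshold are assumptions about the extraction code rather than its exact interface.

    import numpy as np

    def select_boxes(cls_prob, pool5, min_boxes=10, max_boxes=100, conf_thresh=0.2):
        """Keep boxes whose best non-background score clears conf_thresh,
        clipped to the [min_boxes, max_boxes] range."""
        max_conf = cls_prob[:, 1:].max(axis=1)           # column 0 is the background class
        keep = np.where(max_conf >= conf_thresh)[0]
        if len(keep) < min_boxes:                        # too few: take the top min_boxes by score
            keep = np.argsort(max_conf)[::-1][:min_boxes]
        elif len(keep) > max_boxes:                      # too many: take the top max_boxes by score
            keep = np.argsort(max_conf)[::-1][:max_boxes]
        return keep, pool5[keep]                         # kept indices and their pooled region features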

Training

  1. Download the Visual Genome dataset. Extract all the json files, as well as the image directories VG_100K and VG_100K_2 into one folder $VGdata.

  2. Create symlinks for the Visual Genome dataset

    cd $REPO_ROOT/data
    ln -s $VGdata vg
  3. Generate xml files for each image in the Pascal VOC format (this will take some time). This script extracts the top 2500/1000/500 objects/attributes/relations and also does basic cleanup of the Visual Genome data. Note, however, that our training code only uses a subset of the annotations in the xml files, i.e., only 1600 object classes and 400 attribute classes, based on the hand-filtered vocabs found in data/genome/1600-400-20 (see the sketch after this list). The relevant part of the codebase is lib/datasets/vg.py. Relation labels can be included in the data layers but are currently not used.

    cd $REPO_ROOT
    ./data/genome/setup_vg.py
  4. Please download the ImageNet-pretrained ResNet-101 model manually and put it into $REPO_ROOT/data/imagenet_models.

  5. You can train your own model using ./experiments/scripts/faster_rcnn_end2end_multi_gpu_resnet_final.sh (see instructions in the file). The train (95k) / val (5k) / test (5k) splits are in data/genome/{split}.txt and were determined using data/genome/create_splits.py. To avoid val / test set contamination when pre-training for MSCOCO tasks, these splits match the 'Karpathy' COCO splits for images that appear in both datasets.

    Trained Faster-RCNN snapshots are saved under:

    output/faster_rcnn_resnet/vg/
    

    Logging outputs are saved under:

    experiments/logs/
    
  6. Run tools/review_training.ipynb to visualize the training data and predictions.
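
As noted in step 3, training only uses the 1600 object and 400 attribute classes from the hand-filtered vocabularies in data/genome/1600-400-20. Below is a hypothetical illustration of that filtering for one Pascal VOC-style xml file; the vocabulary file name objects_vocab.txt is an assumption, and the authoritative logic lives in lib/datasets/vg.py.

    import xml.etree.ElementTree as ET

    def load_vocab(path):
        # e.g. data/genome/1600-400-20/objects_vocab.txt (file name assumed)
        with open(path) as f:
            return set(line.strip() for line in f if line.strip())

    def filter_objects(xml_path, object_vocab):
        """Return (name, (xmin, ymin, xmax, ymax)) pairs for in-vocabulary objects."""
        kept = []
        for obj in ET.parse(xml_path).findall('object'):
            name = obj.find('name').text.strip().lower()
            if name not in object_vocab:
                continue  # skip objects outside the hand-filtered vocabulary
            box = obj.find('bndbox')
            coords = tuple(int(float(box.find(tag).text))
                           for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
            kept.append((name, coords))
        return kept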

Testing

  1. The model will be tested on the validation set at the end of training, or models can be tested directly using tools/test_net.py, e.g.:

    ./tools/test_net.py --gpu 0 --imdb vg_1600-400-20_val --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel > experiments/logs/eval.log 2>&1
    

    Mean AP is reported separately for object prediction and attribute prediction (given ground-truth object detections). Test outputs are saved under:

    output/faster_rcnn_resnet/vg_1600-400-20_val/<network snapshot name>/
    

Expected detection results for the pretrained model

                             objects mAP@0.5   objects weighted mAP@0.5   attributes mAP@0.5   attributes weighted mAP@0.5
Faster R-CNN, ResNet-101         10.2%                 15.1%                     7.8%                    27.8%

Note that mAP is relatively low because many classes overlap (e.g. person / man / guy), some classes can't be precisely located (e.g. street, field) and separate classes exist for singular and plural objects (e.g. person / people). We focus on performance in downstream tasks (e.g. image captioning, VQA) rather than detection performance.

Comments
  • Hardware Requirement, does eval need that 12G GPU memory?

    F0906 11:11:48.665238   945 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
    

    I got this error when running demo.ipynb with a new picture of size 416x449, but the example pictures run successfully.

    opened by demobin8 18
  • why using the VG data to train the Faster-RCNN model

    Thanks for sharing the models and features. I have tried the features for VQA with my own model; really surprising results indeed :)
    I have two questions as follows:

    1. As the VQA dataset is based on MSCOCO images, would it be better to train the Faster R-CNN model on the COCO dataset directly?
    2. Could a better object detection model, e.g., R-FCN or Deformable R-FCN, further improve the VQA performance?
    opened by yuzcccc 9
  • I can not download pretrained model

    I cannot download the pretrained model with the link https://www.dropbox.com/s/5xethd2nxa8qrnq/resnet101_faster_rcnn_final.caffemodel?dl=1 . Are there any other links to download from? Thanks.

    opened by xpchen0 6
  • binascii.Error: Incorrect padding

    Got binascii.Error: Incorrect padding when reading image 300104 from test2014/test2014_resnet101_faster_rcnn_genome.tsv.1 with tools/read_tsv.py. Anything wrong?

    Traceback (most recent call last):
      File "read_tsv.py", line 64, in <module>
        read_and_save(os.path.join(in_dir, in_file), out_dir)
      File "read_tsv.py", line 45, in read_and_save
        item['features'] = np.frombuffer(base64.decodestring(item['features']), dtype=np.float32).reshape((item['num_boxes'], -1))
      File "/usr/lib64/python2.7/base64.py", line 321, in decodestring
        return binascii.a2b_base64(s)
    binascii.Error: Incorrect padding
    
    opened by cswhjiang 6
  • I recreate the pretrained feature files with 36 features per image, but the rois num of some images does not have 36?

    I want to recreate the Visual Genome features with 36 features per image, but I find that some images have fewer than 36 RoIs. How can I make the number of RoIs >= 36? In the attached image, the five numbers are rois.shape, max_conf.shape, np.argsort(max_conf)[::-1].shape, img.h, img.w.

    When the number of RoIs is below 36, I cannot get 36 features per image.

    opened by WeiqiuChen 4
  • Caffe Installation failed

    I've struggled with the Caffe installation for several days; fixing one error just leads to the next. Can I ask which of the following you use?

    • Python version
    • CUDA version
    • cuDNN version
    • NCCL version
    • OS version

    I'm using Ubuntu 16.04 inside GCP. This is the information I got from CMake:

    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   system
    --   thread
    --   filesystem
    --   chrono
    --   date_time
    --   atomic
    -- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
    -- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
    -- Found PROTOBUF Compiler: /usr/bin/protoc
    -- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
    -- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
    -- Found Snappy  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
    -- CUDA detected: 7.5
    -- Found cuDNN: ver. 8.1.1 found (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Added CUDA NVCC flags for: sm_37
    -- OpenCV found (/usr/share/OpenCV)
    -- Found Atlas (include: /usr/include, library: /usr/lib/libatlas.so)
    -- NumPy ver. 1.11.0 found (include: /usr/lib/python2.7/dist-packages/numpy/core/include)
    -- Boost version: 1.58.0
    -- Found the following Boost libraries:
    --   python
    -- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE) 
    -- 
    -- ******************* Caffe Configuration Summary *******************
    -- General:
    --   Version           :   1.0.0-rc3
    --   Git               :   514e561-dirty
    --   System            :   Linux
    --   C++ compiler      :   /usr/bin/c++
    --   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Debug CXX flags   :   -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Build type        :   Release
    -- 
    --   BUILD_SHARED_LIBS :   ON
    --   BUILD_python      :   ON
    --   BUILD_matlab      :   OFF
    --   BUILD_docs        :   ON
    --   CPU_ONLY          :   OFF
    --   USE_OPENCV        :   ON
    --   USE_LEVELDB       :   ON
    --   USE_LMDB          :   ON
    --   USE_NCCL          :   OFF
    --   ALLOW_LMDB_NOLOCK :   OFF
    -- 
    -- Dependencies:
    --   BLAS              :   Yes (Atlas)
    --   Boost             :   Yes (ver. 1.58)
    --   glog              :   Yes
    --   gflags            :   Yes
    --   protobuf          :   Yes (ver. 2.6.1)
    --   lmdb              :   Yes (ver. 0.9.17)
    --   LevelDB           :   Yes (ver. 1.18)
    --   Snappy            :   Yes (ver. 1.1.3)
    --   OpenCV            :   Yes (ver. 2.4.9.1)
    --   CUDA              :   Yes (ver. 7.5)
    -- 
    -- NVIDIA CUDA:
    --   Target GPU(s)     :   Auto
    --   GPU arch(s)       :   sm_37
    --   cuDNN             :   Yes (ver. 8.1.1)
    -- 
    -- Python:
    --   Interpreter       :   /usr/bin/python2.7 (ver. 2.7.12)
    --   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.12)
    --   NumPy             :   /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
    -- 
    -- Documentaion:
    --   Doxygen           :   No
    --   config_file       :   
    -- 
    -- Install:
    --   Install path      :   /home/vinson.ciawandy/bottom-up-attention/caffe/build/install
    -- 
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/vinson.ciawandy/bottom-up-attention/caffe/build
    

    This is the last error, at which point I gave up on the installation:

    make all
    [  1%] Running C++/Python protocol buffer compiler on /home/vinson.ciawandy/bottom-up-attention/caffe/src/caffe/proto/caffe.proto
    Scanning dependencies of target proto
    [  1%] Building CXX object src/caffe/CMakeFiles/proto.dir/__/__/include/caffe/proto/caffe.pb.cc.o
    [  1%] Linking CXX static library ../../lib/libproto.a
    [  1%] Built target proto
    [  1%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_power_layer.cu.o
    In file included from /home/vinson.ciawandy/bottom-up-attention/caffe/include/caffe/util/device_alternate.hpp:40:0,
                     from /home/vinson.ciawandy/bottom-up-attention/caffe/include/caffe/common.hpp:19,
                     from /home/vinson.ciawandy/bottom-up-attention/caffe/include/caffe/blob.hpp:8,
                     from /home/vinson.ciawandy/bottom-up-attention/caffe/include/caffe/layers/power_layer.hpp:6,
                     from /home/vinson.ciawandy/bottom-up-attention/caffe/src/caffe/layers/power_layer.cu:3:
    /home/vinson.ciawandy/bottom-up-attention/caffe/include/caffe/util/cudnn.hpp:169:2: error: #endif without #if
     #endif  // CAFFE_UTIL_CUDNN_H_
      ^
    CMake Error at cuda_compile_generated_power_layer.cu.o.cmake:207 (message):
      Error generating
      /home/vinson.ciawandy/bottom-up-attention/caffe/build/src/caffe/CMakeFiles/cuda_compile.dir/layers/./cuda_compile_generated_power_layer.cu.o
    
    
    src/caffe/CMakeFiles/caffe.dir/build.make:525: recipe for target 'src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_power_layer.cu.o' failed
    make[2]: *** [src/caffe/CMakeFiles/cuda_compile.dir/layers/cuda_compile_generated_power_layer.cu.o] Error 1
    CMakeFiles/Makefile2:272: recipe for target 'src/caffe/CMakeFiles/caffe.dir/all' failed
    make[1]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
    Makefile:127: recipe for target 'all' failed
    make: *** [all] Error 2
    

    Or can I just ditch the Caffe inside this repo and just use the official one? Thanks

    opened by vinson2233 3
  • I run python demo.py failed!

    I downloaded the model file first. When I run demo.py I get a message like this:

    I1017 11:02:41.250684 40048 net.cpp:131] Top shape: 1 2 126 14 (3528)
    I1017 11:02:41.250689 40048 net.cpp:139] Memory required for data: 117482412
    I1017 11:02:41.250692 40048 layer_factory.hpp:77] Creating layer rpn_cls_prob_reshape
    I1017 11:02:41.250699 40048 net.cpp:86] Creating Layer rpn_cls_prob_reshape
    I1017 11:02:41.250704 40048 net.cpp:408] rpn_cls_prob_reshape <- rpn_cls_prob
    I1017 11:02:41.250713 40048 net.cpp:382] rpn_cls_prob_reshape -> rpn_cls_prob_reshape
    I1017 11:02:41.250741 40048 net.cpp:124] Setting up rpn_cls_prob_reshape
    I1017 11:02:41.250747 40048 net.cpp:131] Top shape: 1 18 14 14 (3528)
    I1017 11:02:41.250751 40048 net.cpp:139] Memory required for data: 117496524
    I1017 11:02:41.250756 40048 layer_factory.hpp:77] Creating layer proposal
    F1017 11:02:41.250799 40048 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, BoxAnnotatorOHEM, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, InnerProductBlob, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, PSROIPooling, Parameter, Pooling, Power, RNN, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, SmoothL1LossOHEM, Softmax, SoftmaxWithLoss, SoftmaxWithLossOHEM, Split, TanH, Threshold, Tile, WindowData)
    *** Check failure stack trace: ***
    Aborted (core dumped)

    opened by wanesta 3
  • activation of the relation prediction

    @peteanderson80 I saw the option "HAS_RELATION" in the cfg file. I turned it on, added a top[6] blob to the proposal_target_layer, and set the param num_rel_classes to 21 (I am not sure if this is correct for the vg_1600-400-20 dataset). When I started training, I got the following error:

    File "/home/work/bottom-up-attention/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
        "Generation of gt_relations doesn't accomodate dropping objects"
    AssertionError: Generation of gt_relations doesn't accomodate dropping objects
    

    Is there something wrong with my setting?

    opened by yuzcccc 3
  • downloading features is really slow

    Hello,

    the links to download features are very slow, and often they break right in the middle of the download. Is it possible to speed it up somehow / get an alternative link?

    Thanks.

    opened by nilinykh 2
  • how can i generate roi feature?

    Hello, I have prepared the coordinate data for the Flickr30k ground-truth boxes, and I want to use your code to generate 2048-dimensional feature vectors for my own ground-truth boxes. Could you please give me some ideas?

    opened by Zhao-Yuting 2
  • Can not download the pretrained model.

    I can't open this pretrained model link for downloading the pretrained model. It shows: UserProjectAccountProblem User project billing account not in good standing.

    The billing account for project 704337700738 is disabled in state delinquent
    Could anyone help me with this?
    opened by LongForCMU 2
  • Would someone please help with generating the features?

    I'm wondering whether someone would please share the extracted features for the Flickr30k dataset. A tsv file is just fine, with MIN_BOXES=36 and MAX_BOXES=36, including the boxes and the 2048-dimensional features. I spent days trying to fix environment issues but still failed.

    opened by Wenjun-Wu 1
  • how to train vg/VGG16/faster_rcnn_end2end_attr_softmax_primed/ and vg/VGG16/faster_rcnn_end2end_attr

    Hi, I use VGG16.v2.caffemodel as the pre-trained model. However, I cannot find the .yml files for faster_rcnn_end2end_attr and faster_rcnn_end2end_attr_softmax_primed. Could you provide them?

    opened by freedom6927 0
  • How to run it on google colab

    Hi, I want to run it on Google Colab. I tried installing Caffe on Colab after cloning this repository, but the Caffe installation has lots of issues. Also, Colab uses Python 3.7 as the default version.

    Can you provide the steps to run it on colab?

    opened by ifmaq1 1
  •  No module named 'caffe._caffe'

    When I run the demo, it outputs:

    from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
    ModuleNotFoundError: No module named 'caffe._caffe'
    
    

    How can I solve this problem? By the way, I use Python 3 to run the code. :)

    opened by Yinhance 1