A High-Performance Distributed Library for Large-Scale Bundle Adjustment

Overview

MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment

This repo contains an official implementation of MegBA.

MegBA is a fast and distributed library for large-scale Bundle Adjustment (BA). MegBA has a novel end-to-end vectorised BA algorithm that fully exploits the massive parallel cores on GPUs, thus speeding up the entire BA computation. It also has a novel distributed BA algorithm that automatically partitions BA problems and solves the BA sub-problems on distributed GPUs. The GPUs synchronise intermediate solving state using network-efficient collective communication, and the synchronisation is designed to minimise communication cost. MegBA has a memory-efficient GPU runtime and exposes g2o-compatible APIs. Experiments show that MegBA can outperform state-of-the-art BA libraries (i.e., Ceres and DeepLM) by ~50x and ~5x respectively on public large-scale BA benchmarks.

Version

  • 2021/12/06 Beta version released! It corresponds to the arXiv paper (2112.01349).
  • General version code release (expected Dec 31 2021)
  • Memory-efficient version with implicit Hessian (TBD)
  • Analytical differentiation module, IMU factor, prior factor (TBD)

Paper: https://arxiv.org/abs/2112.01349

Quickstart

Dependencies:

You can also easily install all dependencies with the installation script provided in the repository.
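
If you prefer to install the dependencies by hand, the sketch below shows one way to do it on Ubuntu 20.04. The package list (CMake, Eigen, and NCCL on top of an already-installed CUDA toolkit) is an assumption inferred from the build errors discussed in the comments below, not the contents of the official script.

    # Assumed dependency set: build tools, CMake, Eigen, NCCL; the CUDA toolkit is assumed to be installed already.
    # libnccl2 / libnccl-dev come from NVIDIA's apt repository, which must be configured first.
    sudo apt-get update
    sudo apt-get install -y build-essential cmake libeigen3-dev
    sudo apt-get install -y libnccl2 libnccl-dev   # provides <nccl.h>, required for the multi-GPU build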

Demo with BAL dataset:

  • Download any *-pre.txt.bz2 file from the BAL dataset (https://grail.cs.washington.edu/projects/bal/) and uncompress it (see the download snippet after this list).

  • Compile

    mkdir build
    cd build
    cmake ..
    make -j4 BAL_Double
  • Run the demo (Venice-1778)

    cd examples
    ./BAL_Double --name=Venice --world_size=2 --iter=100 --solver_tol=1e-1 --solver_refuse_ratio=1 --solver_max_iter=100 --tau=1e4 --epsilon1=1 --epsilon2=1e-10
    • world_size: the number of GPUs available
    • iter: the maximal number of LM iterations
    • epsilon1, epsilon2: stopping thresholds of the LM algorithm
    • solver_tol: tolerance of the (distributed PCG) solver
    • solver_refuse_ratio: early-stop ratio for the solver
    • solver_max_iter: the maximal number of solver iterations
    • tau: the initial trust-region (damping) parameter
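
For example, the Venice-1778 problem used above can be downloaded and uncompressed as follows. The exact URL path is an assumption based on the usual layout of the BAL website; adjust it to whichever problem file you picked.

    # Hypothetical download of the Venice-1778 BAL problem (adjust the path to your chosen file)
    wget https://grail.cs.washington.edu/projects/bal/data/venice/problem-1778-993923-pre.txt.bz2
    bunzip2 problem-1778-993923-pre.txt.bz2   # produces problem-1778-993923-pre.txt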

Notes for the practitioners

  • Currently, MegBA implements only automatic differentiation, chosen for generality. For best performance, please consider implementing your own analytical differentiation module.
  • If you use devices without modern inter-device interconnects (e.g., NVLink), you might find that data transfer becomes the bottleneck; you can check your interconnect topology as shown below.
  • Empirically, we found it necessary to customise the LM trust-region strategy and tune its hyper-parameters to further boost performance.
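
To check whether your GPUs are connected via NVLink or only via PCIe (and hence how costly the inter-GPU synchronisation will be), you can inspect the interconnect topology with nvidia-smi:

    # Prints the GPU-to-GPU interconnect matrix (NV# = NVLink, PIX/PHB/SYS = PCIe paths)
    nvidia-smi topo -m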

Documentation

Under doc/ (Coming soon...)

Collaborate with Us

Please check the project roadmap for MegBA's future plans.

If you are interested in MegBA and want to collaborate, you can:

  • Apply for an internship at Megvii Research 3D: send your resume to [email protected], together with your expected starting date (email subject: "3D组CUDA实习生-Name", i.e., "3D Group CUDA Intern - Name"). Unfortunately, we are currently only able to host interns with work authorization in China.
  • As an external collaborator (coding), just fork this repo and send PRs. We will review your PR carefully (and merge it into MegBA).
  • As an algorithm/novelty contributor, please send an email to [email protected].
  • For any new feature request, you can also send an email to [email protected]. Note that it is not guaranteed that the requested feature will be added, or added soon.

Contact Information:

BibTeX Citation

If you find MegBA useful for your project, please consider citing:

@misc{2021megba,
  title={MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment}, 
  author={Jie Ren and Wenteng Liang and Ran Yan and Luo Mai and Shiwen Liu and Xiao Liu},
  year={2021},
  eprint={2112.01349},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

MegBA is licensed under the Apache License, Version 2.0.

Comments
  • Problem building on Windows

    Hi, thanks for this great contribution. I'm trying to build the repo on Windows, but I found that NCCL is not supported on Windows, so I can't build it. If there is interest in building the library for a single GPU without NCCL and in adding Windows support, I would like to contribute to that.

    opened by RamadanAhmed 7
  • Adding camera-camera edge in case of stereo camera.

    Hi guys, Thanks for your great contribution.

    I want to ask about the possibility of adding an extra type of edge between stereo-camera pairs (a camera-camera edge), and whether there is a guideline for doing so, if it is possible.

    Thanks in advance

    opened by khshmt 2
  • run time error thrust::system::system_error

    Hi, Thanks a lot for your great contribution.

    I got the error below while running the BAL_Double example:

    "solving /home/khshmt/cpp_ws/MegBA/data, world_size: 1, max iter: 20, solver_tol: 10, solver_refuse_ratio: 1, solver_max_iter: 50, tau: 1, epsilon1: 1, epsilon2: 1e-10 Start with error: 0, log error: -inf, elapsed 209 ms terminate called after throwing an instance of 'thrust::system::system_error' what(): after reduction step 1: cudaErrorIllegalAddress: an illegal memory access was encountered Aborted (core dumped)"

    Environment: Linux 20.04, g++ 9.4.0, CUDA 11.6, GPU: NVIDIA Quadro M6000, dataset: Venice (problem-1778-993923-pre.txt.bz2)

    The CMake configuration and build run flawlessly, but I get the above error every time I run, even though it loads all the data (num_points, num_cameras, num_observations, and so on) from the dataset correctly.

    opened by khshmt 2
  • feat(all): allow build on windows/linux

    • Upgrade the CMake scripts to use find_package(CUDAToolkit) instead of the deprecated find_package(CUDA)
    • Add CMakePresets
    • Remove .gitmodules
    • Add vcpkg.json to manage third_party packages
    • Add libraryConfig.cmake.in for library installation
    • Add CMake artifact to .gitignore
    opened by RamadanAhmed 2
  • sqrt API not exposed

    Hi, I really appreciate your amazing work.

    I found a bug when implementing my own vertex: a "CUDA 700 error". It seems that the sqrt interface is not exposed.

    opened by xingweiqu 1
  • identifier "CUSPARSE_SPMV_ALG_DEFAULT" is undefined

    /home/pengshuxue/Downloads/MegBA/src/solver/schur_pcg_solver.cu(470): error: identifier "CUSPARSE_SPMV_ALG_DEFAULT" is undefined
      detected during:
        instantiation of "__nv_bool MegBA::<unnamed>::SchurPCGSolverDistributed(const MegBA::SolverOption::SolverOptionPCG &, const std::vector<T *, std::allocator<T *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<T *, std::allocator<T *>> &, int, int, int, int, const std::vector<int, std::allocator<int>> &, int, int, const std::vector<T *, std::allocator<T *>> &) [with T=double]" (658): here
        instantiation of "void MegBA::SchurPCGSolver<T>::solve(const MegBA::BaseLinearSystem<T> &) [with T=double]" (661): here

    /home/pengshuxue/Downloads/MegBA/src/solver/schur_pcg_solver.cu(201): error: identifier "CUSPARSE_SPMV_ALG_DEFAULT" is undefined
      detected during:
        instantiation of "__nv_bool MegBA::<unnamed>::SchurPCGSolverDistributed(const MegBA::SolverOption::SolverOptionPCG &, const std::vector<T *, std::allocator<T *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<T *, std::allocator<T *>> &, const std::vector<int *, std::allocator<int *>> &, const std::vector<int *, std::allocator<int *>> &, const

    opened by pshuxue 1
  • fatal error: nccl.h: No such file or directory

    In file included from /home/pengshuxue/Downloads/MegBA/src/algo/lm_algo.cu:15:
    /home/pengshuxue/Downloads/MegBA/include/resource/handle_manager.h:11:10: fatal error: nccl.h: No such file or directory
       11 | #include <nccl.h>
          |          ^~~~~~~~
    compilation terminated.
    CMake Error at lm_algo_CUDA_generated_lm_algo.cu.o.cmake:220 (message):
      Error generating
      /home/pengshuxue/Downloads/MegBA/build/src/algo/CMakeFiles/lm_algo_CUDA.dir//./lm_algo_CUDA_generated_lm_algo.cu.o

    make[2]: *** [src/algo/CMakeFiles/lm_algo_CUDA.dir/build.make:65: src/algo/CMakeFiles/lm_algo_CUDA.dir/lm_algo_CUDA_generated_lm_algo.cu.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:340: src/algo/CMakeFiles/lm_algo_CUDA.dir/all] Error 2
    make: *** [Makefile:84: all] Error 2

    Mrs J, what should I do?

    opened by pshuxue 1
  • feedback of demo

    Hello, your work compares very well with Ceres. I tried your BAL_Float demo with Venice-52. I noticed that only a few LM iterations succeeded, and when I print deltaXL2 and xL2 the numbers are very large.

    solving problem-52-64053-pre.txt, world_size: 1, solver iter: 100, solver_tol: 0.01, solver_refuse_ratio: 1, solver_max_iter: 100, tau: 10000, epsilon1: 1, epsilon2: 1e-10
    52 64053 347173
    0 1 2 3 4
    start with error: 1.11521e+07, log error: 7.04736
    deltaXL2 = 795.057990 xL2 = 20297.274566
    1-th iter error: 967452, log error: 5.98563
    deltaXL2 = 641.070984 xL2 = 20457.752299
    2-th iter error: 504157, log error: 5.70257
    deltaXL2 = 900.042953 xL2 = 20451.053379
    3th iter rollbackLM
    deltaXL2 = 546.620354 xL2 = 20451.053379
    4th iter rollbackLM
    deltaXL2 = 284.468891 xL2 = 20451.053379
    5-th iter error: 400720, log error: 5.60284
    deltaXL2 = 374.414253 xL2 = 20427.049189
    6-th iter error: 364873, log error: 5.56214
    deltaXL2 = 370.487802 xL2 = 20488.023926
    7th iter rollbackLM
    deltaXL2 = 2287.042375 xL2 = 20488.023926
    8th iter rollbackLM
    deltaXL2 = 71488.917241 xL2 = 20488.023926
    9-th iter error: 362014, log error: 5.55872
    deltaXL2 = 137.477325 xL2 = 74207.259595
    10-th iter error: 291625, log error: 5.46482
    deltaXL2 = 201.875231 xL2 = 74267.585371
    11-th iter error: 277062, log error: 5.44258
    deltaXL2 = 181.679374 xL2 = 74428.300552
    12-th iter error: 272821, log error: 5.43588
    deltaXL2 = 195.595084 xL2 = 74569.198498
    13th iter rollbackLM
    deltaXL2 = 574978812.009305 xL2 = 74569.198498
    14th iter rollbackLM
    deltaXL2 = 553295309151825.250000 xL2 = 74569.198498
    15th iter rollbackLM
    deltaXL2 = 67003134385564172288.000000 xL2 = 74569.198498
    16th iter rollbackLM
    deltaXL2 = 558190831280728100044800.000000 xL2 = 74569.198498
    17th iter rollbackLM
    deltaXL2 = 223324689881403556926324736.000000 xL2 = 74569.198498
    18th iter rollbackLM
    deltaXL2 = 1540883170175477071243902976.000000 xL2 = 74569.198498
    19th iter rollbackLM
    deltaXL2 = 83242234827980244867088384.000000 xL2 = 74569.198498
    20th iter rollbackLM
    deltaXL2 = 17566485752858001014784.000000 xL2 = 74569.198498
    21th iter rollbackLM
    deltaXL2 = 7240292771779925.000000 xL2 = 74569.198498
    22th iter rollbackLM
    deltaXL2 = 1833498346.973379 xL2 = 74569.198498
    23th iter rollbackLM
    deltaXL2 = 895261.733858 xL2 = 74569.198498
    24th iter rollbackLM
    deltaXL2 = 218.569759 xL2 = 74569.198498
    25th iter rollbackLM
    deltaXL2 = 0.026681 xL2 = 74569.198498
    26-th iter error: 272818, log error: 5.43587
    deltaXL2 = 0.000000 xL2 = 74569.197054
    deltaXL2 <= epsilon2*(xL2+epsilon2)
    deltaXL2 = 0.000000 xL2 = 74569.197054
    solve dur: 3219 ms

    opened by YZHUA 1
  • some feedback of demo

    Hello, I tried your BAL_Double demo. I have two questions about it.

    1. The program takes 100% of my CPU even when there is no data (Venice). There is no data-checking strategy.
    2. The program takes 100% of my CPU when I pass world_size==2 while I only have 1 GPU. The program stalls at that point and keeps using my CPU. When I pass the correct GPU count of 1, the program solves the problem very fast. Finally, I suggest the program should be more robust in the main process, for example with some checks and logs so that we know what happened. I would also like to know why it takes all of my CPU when there is nothing to solve. Many thanks for looking at this! Your work compares very well with Ceres; I think there are still some small problems to solve at the engineering level.
    opened by ssmem 1
  • Reconstruction, a stable version

    1. Code restructuring: add new classes BaseAlgo, BaseLinearSystem, and BaseSolver
    2. Remove the dependency on modifying Eigen's source code
    3. Fix rollback bugs in the LM algorithm
    4. Fix memory leaks
    opened by JieRen98 0
  • Generating problem

    While compiling MegBA, a problem occurred.

    :~/MegBA/build$ make
    [  7%] Built target cuManager
    [  9%] Building NVCC (Device) object src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o
    CMake Error at math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o.cmake:219 (message):
      Error generating
      /home/wlh/MegBA/build/src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir//./math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o
    
    
    src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/build.make:598: recipe for target 'src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o' failed
    make[2]: *** [src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o] Error 1
    CMakeFiles/Makefile2:183: recipe for target 'src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/all' failed
    make[1]: *** [src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/all] Error 2
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2
    

    I am using cuda 11.6 with nvidia-driver-510.

    opened by Airplane5 3
  • What if all cameras share the same intrinsics?

    Thanks for sharing your excellent work.

    In SfM it is a very common scenario that all (or at least some of the) cameras share the same intrinsics matrix and undistortion parameters (k1, k2, p1, etc.), and these intrinsics are also variables to be estimated. In other words, as Ceres allows, if the parameter groups include three types (camera_intrinsics, image_poses, points), will the system generalize to this kind of problem?

    How can this be done? I hope to receive your advice.

    opened by ghost 1
  • memory leak when solve multi-time GBA

    Hi, thanks for your amazing work and open-sourced code.

    Start with error: 0.105238, log error: -0.977827, elapsed 4 ms
    CUDA error 700 [/usr/local/cuda-11.3/include/cub/block/../iterator/../util_device.cuh, 635]: an illegal memory access was encountered
    CUDA error 700 [/usr/local/cuda-11.3/include/thrust/system/cuda/detail/extrema.h, 210]: an illegal memory access was encountered
    terminate called after throwing an instance of 'thrust::system::system_error'
      what():  extrema failed on 1st step: cudaErrorIllegalAddress: an illegal memory access was encountered
    
    

    I get a CUDA error 700: when I construct the solver and solve global BA multiple times, this error happens. It seems that there is some raw-pointer error. Could you please help me with this problem?

    opened by xingweiqu 1
  • CMake Modernization

    Hi Everyone,

    I'm very interested in this project. I've primarily been a user of Colmap, but have been searching for a way to speed up the global bundle adjustment. I only came across the paper by accident. The results look very promising! I'd like to contribute where I can.

    Would you mind if I spent a few minutes modernizing the CMake?

    opened by KBentley57 3
Owner

Megvii Research 3D Group (旷视研究院 3D 组)
Megvii Technology (Face++) Research Institute, 3D Group (formerly the SLAM Group)