Overview

Mega-NeRF

This repository contains the code needed to train Mega-NeRF models and generate the sparse voxel octrees used by the Mega-NeRF-Dynamic viewer.

The codebase for the Mega-NeRF-Dynamic viewer can be found here.

Note: This is a preliminary release and there may still be outstanding bugs.

Citation

@misc{turki2021meganerf,
      title={Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs}, 
      author={Haithem Turki and Deva Ramanan and Mahadev Satyanarayanan},
      year={2021},
      eprint={2112.10703},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Demo

Setup

conda env create -f environment.yml
conda activate mega-nerf

The codebase has been tested mainly against CUDA >= 11.1 and V100/2080 Ti/3090 Ti GPUs. 1080 Ti GPUs should also work, although training will be much slower.
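
As a quick sanity check before training, you can confirm that PyTorch sees your GPUs and the CUDA toolkit (a minimal sketch, not part of the repository's scripts):

import torch

# Report whether CUDA is usable and which devices are visible.
print('CUDA available:', torch.cuda.is_available())
print('CUDA version:', torch.version.cuda)
for i in range(torch.cuda.device_count()):
    print('GPU', i, torch.cuda.get_device_name(i))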

Data

Mill 19

  • The Building scene can be downloaded here.
  • The Rubble scene can be downloaded here.

UrbanScene 3D

  1. Download the raw photo collections from the UrbanScene3D dataset.
  2. Download the refined camera poses for one of the scenes.
  3. Run python scripts/copy_images.py --image_path $RAW_PHOTO_PATH --dataset_path $CAMERA_POSE_PATH

Quad 6k Dataset

  1. Download the raw photo collections from here.
  2. Download the refined camera poses.
  3. Run python scripts/copy_images.py --image_path $RAW_PHOTO_PATH --dataset_path $CAMERA_POSE_PATH

Custom Data

The expected directory structure is:

  • /coordinates.pt: Torch file that should contain the following keys:
    • 'origin_drb': Origin of scene in real-world units
    • 'pose_scale_factor': Scale factor mapping from real-world units (ie: meters) to the [-1, 1] range
  • '/{val|train}/rgbs/': JPEG or PNG images
  • '/{val|train}/metadata/': Per-image metadata saved as a torch file. Each image should have a corresponding metadata file named {rgb_stem}.pt. Each metadata file should contain the following keys:
    • 'W': Image width
    • 'H': Image height
    • 'intrinsics': Image intrinsics in the following form: [fx, fy, cx, cy]
    • 'c2w': Camera pose. 3x4 camera matrix with the convention used in the original NeRF repo, ie: x: down, y: right, z: backwards, followed by the following transformation: torch.cat([camera_in_drb[:, 1:2], -camera_in_drb[:, :1], camera_in_drb[:, 2:4]], -1)
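
To make the expected layout concrete, here is a minimal sketch of writing coordinates.pt and one per-image metadata file with torch.save. The key names follow the list above; the origin, scale, intrinsics, and pose values are placeholders you would derive from your own calibration, and the normalization comment reflects our reading of pose_scale_factor rather than a documented API:

import os
import torch

# Scene-level coordinates file. Poses are presumably stored normalized as
# (position - origin_drb) / pose_scale_factor so that the scene fits in [-1, 1].
torch.save({
    'origin_drb': torch.tensor([0.0, 0.0, 0.0]),  # placeholder scene origin (real-world units)
    'pose_scale_factor': 100.0,                   # placeholder meters -> [-1, 1] scale
}, 'coordinates.pt')

# Per-image metadata, saved as train/metadata/{rgb_stem}.pt.
# camera_in_drb is a hypothetical 3x4 camera-to-world matrix already expressed
# in the down/right/backwards (DRB) convention described above.
camera_in_drb = torch.eye(3, 4)
c2w = torch.cat([camera_in_drb[:, 1:2],    # the column permutation quoted above:
                 -camera_in_drb[:, :1],    # [col1, -col0, col2, col3]
                 camera_in_drb[:, 2:4]], -1)

os.makedirs('train/metadata', exist_ok=True)
torch.save({
    'W': 1920,
    'H': 1080,
    'intrinsics': torch.tensor([1000.0, 1000.0, 960.0, 540.0]),  # [fx, fy, cx, cy]
    'c2w': c2w,
}, 'train/metadata/example.pt')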

Training

  1. Generate the training partitions for each submodule: python scripts/create_cluster_masks.py --config configs/mega-nerf/${DATASET_NAME}.yaml --dataset_path $DATASET_PATH --output $MASK_PATH --grid_dim $GRID_X $GRID_Y
    • Note: this can be run across multiple GPUs by instead running python -m torch.distributed.run --standalone --nnodes=1 --nproc_per_node $NUM_GPUS --max_restarts 0 scripts/create_cluster_masks.py followed by the same arguments as above
  2. Train each submodule: python mega_nerf/train.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --exp_name $EXP_PATH --dataset_path $DATASET_PATH --chunk_paths $SCRATCH_PATH --cluster_mask_path ${MASK_PATH}/${SUBMODULE_INDEX}
    • Note: training against full-scale data will write hundreds of GBs / several TBs of shuffled data to disk. You can downsample the training data using the train_scale_factor option.
    • Note: we provide a utility script based on parscript to start multiple training jobs in parallel. It can be run with the following command: CONFIG_FILE=configs/mega-nerf/${DATASET_NAME}.yaml EXP_PREFIX=$EXP_PATH DATASET_PATH=$DATASET_PATH CHUNK_PREFIX=$SCRATCH_PATH MASK_PATH=$MASK_PATH python -m parscript.dispatcher parscripts/run_8.txt -g $NUM_GPUS
  3. Merge the trained submodules into a unified Mega-NeRF model: python scripts/merge_submodules.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --ckpt_prefix ${EXP_PREFIX}- --centroid_path ${MASK_PATH}/params.pt --output $MERGED_OUTPUT

Evaluation

Single-GPU evaluation: python mega_nerf/eval.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --exp_name $EXP_NAME --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT

Multi-GPU evaluation: python -m torch.distributed.run --standalone --nnodes=1 --nproc_per_node $NUM_GPUS mega_nerf/eval.py --config_file configs/mega-nerf/${DATASET_NAME}.yaml --exp_name $EXP_NAME --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT

Octree Extraction (for use by Mega-NeRF-Dynamic viewer)

python scripts/create_octree.py --config configs/mega-nerf/${DATASET_NAME}.yaml --dataset_path $DATASET_PATH --container_path $MERGED_OUTPUT --output $OCTREE_PATH
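
The resulting octree is serialized with svox (see Acknowledgements below). As a minimal sketch, assuming svox's N3Tree.load API and a hypothetical output file, you can load the extracted octree for inspection:

import svox

# Load the serialized sparse voxel octree produced by create_octree.py.
tree = svox.N3Tree.load('octree.npz', device='cpu')
print(tree)  # prints a summary of the tree (depth, capacity, data dimension)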

Acknowledgements

Large parts of this codebase are based on existing work in the nerf_pl, NeRF++, and PlenOctree repositories. We use svox to serialize our sparse voxel octrees, and the generated structures should be largely compatible with that codebase.

Comments
  • Custom dataset

    Congrats on your great work, and thank you for releasing the code. I am keen to train on my own custom dataset but am stuck in the dataset preparation process. I also noticed that you mention in the paper: "We refine the initial GPS-derived camera poses in the Mill 19 and UrbanScene3D datasets and the estimates provided in the Quad 6k dataset using PixSFM."

    I have the following questions:

    1. How are the initial camera poses derived from the drone GPS and IMU sensors, and how are the initial GPS-derived camera poses refined using PixSFM? Could you kindly provide the script that generates the camera poses?
    2. Can camera poses estimated by COLMAP work with Mega-NeRF, or should they also be refined by PixSFM?

    Thanks in advance and really looking forward to your reply! Xingyi

    opened by xingyi-li 12
  • arguments questions

    I read the paper about Mega-NeRF, which is amazing. Could I ask some questions about the arguments of the code? For example, what is the meaning of '--chunk_paths $SCRATCH_PATH'? I do not see any notes on this parameter.

    opened by city19992 9
  • Question about c2w

    Hi, just to confirm the format of the c2w matrix: does c2w[:3,:3] correspond to the rotation matrix in [x, y, z] order, and c2w[:3,-1] to the camera position in [z, x, y] order? I noticed that in _truncate_with_plane_intersection you use rays_o[:, :, 0] < altitude.

    opened by kam1107 7
  • The validation images are also used in training

    Hi, thank you for your great open-source release and quick responses to my previous questions. I am trying to train Mega-NeRF to reproduce the results and have a question about the validation images. As in the code below: https://github.com/cmusatyalab/mega-nerf/blob/306e06cc316dd4f5c84d0610308bcbc208228fc3/mega_nerf/runner.py#L620 It seems that the validation images are also used in training. To my understanding, this is because we need to learn the appearance embeddings of the validation images. I would like to confirm whether I understand this setting correctly. Thank you very much.

    Best regards, Zhenxing Mi

    opened by MiZhenxing 5
  • how to see results after training?

    Thank you for your excellent work! I have a question: I used the Building dataset, and after training I cannot see results for the whole scene. Is there an .mp4 or some images that show the training result? Also, I notice that render_images.py requires inputs including pose.txt/intrinsics.txt/embeddings.txt. How can I get these?

    opened by Furenchampion 5
  • pixsfm cameras of Campus dataset

    Hi, it seems the camera parameter link for Campus is broken (https://storage.cmusatyalab.org/mega-nerf-data/campus-pixsfm.tgz). Could you please update the link so we can fully test on the datasets? Thank you very much.

    opened by MiZhenxing 4
  • CUDA out of memory when creating mask

    Hi, I used the Quad 6k dataset and set grid_dim to 10, 10.

    However, I get the following CUDA out-of-memory error: RuntimeError: CUDA out of memory. Tried to allocate 18.31 GiB (GPU 0; 23.70 GiB total capacity; 18.57 GiB already allocated; 3.13 GiB free; 18.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    I'm using 8 3090 cards. I don't understand why this causes the out-of-memory issue.

    Weide

    opened by weidezhang 3
  • Training speed almost irrelevant to the size of dataset

    I trained the Building dataset with 1000+ images, and also a 50-image sample of the dataset, on 4 3090 GPUs; it took almost 40 hours. However, the training time for the small dataset was almost the same as for the big one. I was wondering how to speed up this process. Thanks a lot.

    opened by h8c2 3
  • Error while training mill 19 building, centroid 0 in a distributed fashion

    I get this error after a while when trying to train a model (on multiple nodes with multiple GPUs each) on the Building dataset, for chunk/submodule 0 (using the provided cluster masks). Any tips on debugging this? It seems like there is an issue in a sample somewhere.

     28%|██▊       | 140533/500000 [15:32:21<45:35:06,  2.19it/s]
     28%|██▊       | 140534/500000 [15:32:22<50:19:14,  1.98it/s]
     28%|██▊       | 140535/500000 [15:32:23<53:45:04,  1.86it/s]
     28%|██▊       | 140536/500000 [15:32:23<44:03:00,  2.27it/s]
     28%|██▊       | 140537/500000 [15:32:23<49:43:21,  2.01it/s]
     28%|██▊       | 140538/500000 [15:32:24<55:23:35,  1.80it/s]
     28%|██▊       | 140539/500000 [15:32:24<46:40:38,  2.14it/s]
     28%|██▊       | 140540/500000 [15:32:25<46:16:18,  2.16it/s]
     28%|██▊       | 140541/500000 [15:32:25<38:48:16,  2.57it/s]
     28%|██▊       | 140542/500000 [15:32:25<34:51:35,  2.86it/s]
     28%|██▊       | 140543/500000 [15:32:26<38:54:21,  2.57it/s]
     28%|██▊       | 140544/500000 [15:32:26<46:17:54,  2.16it/s]
     28%|██▊       | 140545/500000 [15:32:27<40:27:56,  2.47it/s]
     28%|██▊       | 140546/500000 [15:32:27<46:56:04,  2.13it/s]
     28%|██▊       | 140547/500000 [15:32:28<52:37:22,  1.90it/s]ERROR:torch.distributed.elastic.multiprocessing.errors.error_handler:{
    [stderr]  "message": {
    [stderr]    "message": "ValueError: cannot reshape array of size 42028165 into shape (42107792,)",
    [stderr]    "extraInfo": {
    [stderr]      "py_callstack": "Traceback (most recent call last):\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 345, in wrapper\n    return f(*args, **kwargs)\n  File \"/tmp/code/mega_nerf/train.py\", line 24, in main\n    Runner(hparams).train()\n  File \"/tmp/code/mega_nerf/runner.py\", line 226, in train\n    dataset.load_chunk()\n  File \"/tmp/code/mega_nerf/datasets/filesystem_dataset.py\", line 80, in load_chunk\n    chosen, self._loaded_rgbs, self._loaded_rays, self._loaded_image_indices = self._chunk_future.result()\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/_base.py\", line 438, in result\n    return self.__get_result()\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/_base.py\", line 390, in __get_result\n    raise self._exception\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/thread.py\", line 52, in run\n    result = self.fn(*self.args, **self.kwargs)\n  File \"/tmp/code/mega_nerf/datasets/filesystem_dataset.py\", line 111, in _load_chunk_inner\n    loaded_pixel_indices = torch.IntTensor(np.load(self._ray_arrays[next_index]))\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/numpy/lib/npyio.py\", line 440, in load\n    return format.read_array(fid, allow_pickle=allow_pickle,\n  File \"/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/numpy/lib/format.py\", line 787, in read_array\n    array.shape = shape\nValueError: cannot reshape array of size 42028165 into shape (42107792,)\n",
    [stderr]      "timestamp": "1651655215"
    [stderr]    }
    [stderr]  }
    [stderr]}
    [stderr]Traceback (most recent call last):
    [stderr]  File "/tmp/code/mega_nerf/train.py", line 28, in <module>
    [stderr]    main(_get_train_opts())
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    [stderr]    return f(*args, **kwargs)
    [stderr]  File "/tmp/code/mega_nerf/train.py", line 24, in main
    [stderr]    Runner(hparams).train()
    [stderr]  File "/tmp/code/mega_nerf/runner.py", line 226, in train
    [stderr]    dataset.load_chunk()
    [stderr]  File "/tmp/code/mega_nerf/datasets/filesystem_dataset.py", line 80, in load_chunk
    [stderr]    chosen, self._loaded_rgbs, self._loaded_rays, self._loaded_image_indices = self._chunk_future.result()
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    [stderr]    return self.__get_result()
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    [stderr]    raise self._exception
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    [stderr]    result = self.fn(*self.args, **self.kwargs)
    [stderr]  File "/tmp/code/mega_nerf/datasets/filesystem_dataset.py", line 111, in _load_chunk_inner
    [stderr]    loaded_pixel_indices = torch.IntTensor(np.load(self._ray_arrays[next_index]))
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/numpy/lib/npyio.py", line 440, in load
    [stderr]    return format.read_array(fid, allow_pickle=allow_pickle,
    [stderr]  File "/azureml-envs/azureml_42fe9568fca52a4c68567983a86d6e0f/lib/python3.9/site-packages/numpy/lib/format.py", line 787, in read_array
    [stderr]    array.shape = shape
    [stderr]ValueError: cannot reshape array of size 42028165 into shape (42107792,)
    
    opened by madratman 3
  • Question about running a demo

    What great work! May I ask a few questions? If I just want to run a demo, all I need to do is download one of the datasets and its corresponding pretrained model, run create_octree.py to extract the octree, and then view the result in the Mega-NeRF-Dynamic viewer. Is that correct? Moreover, what is the meaning of the --container_path argument, and what should I pass to it? Many thanks!

    opened by tonytonglt 3
  • forward-facing dataset

    Hi, thanks for the excellent work.

    I want to test the code on a relatively large-scale forward-facing dataset. How should I change the code, and in particular, how should I produce the centroids? My understanding is that the ray_altitude_range parameter is no longer needed, but I cannot get a reasonable cluster mask.

    opened by Harper714 2
  • not clear img results

    Hi, thanks for this great work on large scenes. I used it on my own dataset (500 pics); after training for 500k iterations, I got the images from 'eval_result_rgbs' via the script 'eval.py', but the resulting RGB images are not clear enough. Do you have any suggestions? Any response is helpful. Thanks!

    opened by Calsia 1
  • questions about render_image.py

    @hturki thanks for your great work! I'm following the tutorial to see its outcome. However, after the evaluation I can only see matrixs.txt; what I want is to see some rendered images or videos. The question is that I don't really understand how to create the 3 input files for render_image.py. Could you please offer some examples?

    opened by cezarbbb 3
  • How is a pixel assigned to a cell?

    Dear @jaharkes @hturki @therealsatya @teiszler

    Thanks for the great work! But I have a question about my understanding: how is a pixel assigned to a certain 3D cell? Using the depth of the pixel? There are many sample locations along a pixel's ray; how are those sample locations processed?

    Best, Yingjie CAI

    opened by yjcaimeow 0
  • How to get the pixel color if a training ray traverses multiple submodules?

    This work is excellent. But I have a question: in the paper, you said each submodule can be trained separately, but what if a training ray corresponds to multiple submodules? Do you merge the sample points from the different modules?

    opened by cwchenwang 0
  • How to visualize and save OBJ?

    Hi @hturki! I was trying to compile mega-nerf-viewer but got stuck at the final stage. Do you have any idea about this problem?

    OS: Ubuntu 20.04; gcc: 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1); cmake: 3.24.1; libtorch; libx11-dev_1.6.9-2ubuntu1.3_amd64; CUDA: 11.6

    I have successfully compiled mega-nerf-viewer. Then I ran the following command with one NVIDIA Tesla A100: ./mega-nerf-viewer ../../OCTREE_PATH.npz --model_path ../../building-pixsfm-8.pt (screenshot attached). Thanks!

    opened by chl2 0