Pytorch version of SfmLearner from Tinghui Zhou et al.

Overview

SfMLearner Pytorch version

This codebase implements the system described in the paper:

Unsupervised Learning of Depth and Ego-Motion from Video

Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe

In CVPR 2017 (Oral).

See the project webpage for more details.

Original author: Tinghui Zhou ([email protected]). PyTorch implementation: Clément Pinard ([email protected])

[sample results]

Preamble

This codebase was developed and tested with PyTorch 1.0.1, CUDA 10 and Ubuntu 16.04. The original code was developed in TensorFlow; you can access it here.

Prerequisite

pip3 install -r requirements.txt

or manually install the following packages:

pytorch >= 1.0.1
pebble
matplotlib
imageio
scipy
argparse
tensorboardX
blessings
progressbar2
path.py

Note

Because it uses the latest PyTorch features, it is not compatible with earlier versions of PyTorch.

If you don't have an up-to-date PyTorch, the tags can help you check out the commits corresponding to your PyTorch version.

What has been done

  • Training has been tested on KITTI and CityScapes.
  • Dataset preparation has been largely improved: image sequences are now stored in folders, and frames are filtered so that there is always enough movement between consecutive frames.
  • As a result, training is now significantly faster, running at ~0.14 s per step vs ~0.2 s per step initially (on a single GTX 980Ti).
  • In addition, you no longer need to prepare data for a particular sequence length, since stacking is done on the fly (see the sketch after this list).
  • You can still choose the former stacked-frames dataset format.
  • Convergence is now almost as good as the original paper with the same hyperparameters.
  • You can now compare against ground truth for your validation set. It is still possible to validate without it, but you can now see that minimizing photometric error is not equivalent to optimizing the depth map.
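
For reference, a rough sketch (with hypothetical helper and path names, not the repository's actual loader code) of the on-the-fly stacking idea mentioned above: neighbours of the target frame are gathered at load time from a scene folder, so no fixed sequence length has to be baked into the prepared data.

from pathlib import Path

def make_snippet(scene_dir, tgt_index, sequence_length=3):
    # Gather the target frame and its neighbouring reference frames at load time.
    frames = sorted(Path(scene_dir).glob('*.jpg'))
    demi_length = (sequence_length - 1) // 2
    tgt_img = frames[tgt_index]
    ref_imgs = [frames[tgt_index + i]
                for i in range(-demi_length, demi_length + 1) if i != 0]
    return tgt_img, ref_imgs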

Differences with official Implementation

  • The smooth loss differs from the official repo: instead of applying it to disparity, we apply it to depth. The original disparity smooth loss did not work well here (the reason is unclear) and did not converge at all with the weight value used (0.5).
  • The loss is divided by 2.3 when downscaling instead of 2. This is the result of empirical experiments, so the optimal value has not been carefully determined (a sketch of this weighting scheme is given after this list).
  • As a consequence, with a smooth loss weight of 2.0, the depth test is better, but the pose test is worse. To revert the smooth loss back to the original, you can change it here.
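
As a rough illustration of the two points above (applying smoothness to depth and dividing the per-scale weight by 2.3), here is a minimal sketch. It assumes a list of per-scale depth tensors of shape [B, 1, H, W], finest scale first, and is not the repository's exact loss implementation.

import torch

def depth_smooth_loss(pred_depths, downscale_factor=2.3):
    # Penalize depth gradients at every scale, dividing the weight by 2.3
    # (instead of 2) each time the resolution is halved.
    loss, weight = 0., 1.
    for depth in pred_depths:
        dx = (depth[:, :, :, 1:] - depth[:, :, :, :-1]).abs()
        dy = (depth[:, :, 1:, :] - depth[:, :, :-1, :]).abs()
        loss = loss + weight * (dx.mean() + dy.mean())
        weight = weight / downscale_factor
    return loss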

Preparing training data

Data preparation uses roughly the same commands as the original code.

For KITTI, first download the dataset using this script provided on the official website, and then run the following command. The --with-depth option will save resized copies of the ground truth to help you set hyperparameters. The --with-pose option will dump the sequence poses in the same format as the Odometry dataset (see pose evaluation).

python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 128 --num-threads 4 [--static-frames /path/to/static_frames.txt] [--with-depth] [--with-pose]

For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. You will probably need to contact the administrators to get access. Then run the following command:

python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 416 --height 171 --num-threads 4

Note that for Cityscapes img_height is set to 171 because we crop out the bottom part of the image, which contains the car logo; the resulting image has height 128.
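
A minimal sketch of that crop step (using a placeholder array rather than the actual preparation code): after resizing to 416x171, the bottom 43 rows are dropped to obtain a 416x128 image.

import numpy as np

img = np.zeros((171, 416, 3), dtype=np.uint8)  # placeholder for a frame resized to 416x171
cropped = img[:128]                            # drop the bottom 171 - 128 = 43 rows (car hood/logo)
assert cropped.shape[:2] == (128, 416)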

Training

Once the data are formatted following the above instructions, you should be able to train the model by running the following command

python3 train.py /path/to/the/formatted/data/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output [--with-gt]

You can then start a tensorboard session in this folder by

tensorboard --logdir=checkpoints/

and visualize the training progress by opening http://localhost:6006 in your browser. If everything is set up properly, you should start seeing reasonable depth predictions after ~30K iterations when training on KITTI.

Evaluation

Disparity map generation can be done with run_inference.py

python3 run_inference.py --pretrained /path/to/dispnet --dataset-dir /path/pictures/dir --output-dir /path/to/output/dir

This will run inference on all pictures inside dataset-dir and save a jpg of disparity (or depth) to output-dir for each one. See the script help (-h) for more options.

Disparity evaluation is available:

python3 test_disp.py --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list

The test file list is available in the kitti eval folder. To get a fair comparison with the original paper's evaluation code, don't specify a posenet. If you do, it will be used to resolve the scale factor ambiguity; the only ground truth used then is the vehicle speed, which is far more realistic for real-world quality measurement, but you will obviously get worse results. Both scale-factor conventions are sketched below.
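
A minimal sketch of the two scale-factor conventions described above (assuming numpy arrays for the depths and, in the posenet case, known GT displacement and predicted translation magnitudes; this is not the evaluation script itself):

import numpy as np

def scale_from_gt(gt_depth, pred_depth):
    # Original-paper convention: per-image median ratio between GT and prediction.
    return np.median(gt_depth) / np.median(pred_depth)

def scale_from_speed(gt_displacement, pred_translation_magnitude):
    # PoseNet convention: only the vehicle displacement (speed) is used as ground truth.
    return gt_displacement / pred_translation_magnitude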

Pose evaluation is also available on the Odometry dataset. Be sure to download both the color images and the poses!

python3 test_pose.py /path/to/posenet --dataset-dir /path/to/KITTI_odometry --sequences [09]

ATE (Absolute Trajectory Error) is computed along with RE (Rotation Error). RE between R1 and R2 is defined as the angle of R1*R2^-1 when converted to axis/angle; it corresponds to RE = arccos((trace(R1 @ R2^-1) - 1) / 2). While ATE is often said to be sufficient for trajectory estimation, RE seems important here since sequences are only seq_length frames long.
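
A small helper (a sketch, not code from this repository) computing RE as defined above for two 3x3 rotation matrices:

import numpy as np

def rotation_error(R1, R2):
    # Angle of R1 @ R2^-1 in axis/angle form: arccos((trace(R1 @ R2^-1) - 1) / 2).
    cos_angle = (np.trace(R1 @ np.linalg.inv(R2)) - 1) / 2
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))  # clip guards against rounding error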

Pretrained Nets

Available here

Arguments used :

python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

Depth Results

Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3
0.181   | 1.341  | 6.236 | 0.262     | 0.733 | 0.901 | 0.964

Pose Results

5-frame snippets used

    | Seq. 09              | Seq. 10
ATE | 0.0179 (std. 0.0110) | 0.0141 (std. 0.0115)
RE  | 0.0018 (std. 0.0009) | 0.0018 (std. 0.0011)

Discussion

Here are links to issues that I think raised interesting questions about the scale factor, pose inference, and training hyperparameters:

  • Issue 48 : Why is target frame at the center of the sequence ?
  • Issue 39 : Getting pose vector without the scale factor uncertainty
  • Issue 46 : Is Interpolated groundtruth better than sparse groundtruth ?
  • Issue 45 : How come the inverse warp is absolute and pose and depth are only relative ?
  • Issue 32 : Discussion about validation set, and optimal batch size
  • Issue 25 : Why filter out static frames ?
  • Issue 24 : Filtering pixels out of the photometric loss
  • Issue 60 : Inverse warp is only one way !

Other Implementations

TensorFlow by tinghuiz (original code, by the paper's author)

Comments
  • confusion about scale factor

    What's the difference between the two ways of calculating errors? I changed the model a little and trained for 1000 epochs using your 'dispnet_model_best.pth.tar' and 'exp_pose_checkpoint.pth.tar' as pretrained models, but I get worse results in 'Results with scale factor determined by GT/prediction ratio (like the original paper)' and better results in 'Results with scale factor determined by PoseNet'. What does this mean? How many epochs would be better? (image attached)

    opened by sunnyHelen 29
  • Blank output after training on KITTI

    I used the following command to train on KITTI

    python3 train.py ./data/kitti/kitti_rawdata_formatted/ -b4 -m0.2 -s0.1 --epoch-size 3000 --sequence-length 3 --log-output

    However, I get a blank image when I run inference (attached).

    Even while training, the dispnet output and the depth outputs (seen on tensorboard) are blank images.

    Please help.

    opened by mathmanu 29
  • Can not reproduce pose results.

    I am getting the following results with the pretrained models, which are much worse than those reported here: https://github.com/ClementPinard/SfmLearner-Pytorch#pose-results It is possible that the pretrained models on Google Drive are not up to date.

    $ python test_pose.py ../SfmLearner-Pytorch/pretrained/exp_pose_model_best.pth.tar --dataset-dir kitti/odometry/ --sequence 09
    getting test metadata for theses sequences : {Path('kitti/odometry/sequences/09')}
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.50it/s]
    1591 snippets to test
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 1587/1591 [02:35<00:00, 10.20it/s]
    
    Results
    	        ATE,         RE
    mean 	     0.0195,     0.0041
    std 	     0.0106,     0.0022
    $ python test_pose.py ../SfmLearner-Pytorch/pretrained/exp_pose_model_best.pth.tar --dataset-dir kitti/odometry/ --sequence 10
    getting test metadata for theses sequences : {Path('kitti/odometry/sequences/10')}
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.54s/it]
    1201 snippets to test
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌| 1197/1201 [03:11<00:00,  6.24it/s]
    
    Results
    	        ATE,         RE
    mean 	     0.0148,     0.0042
    std 	     0.0096,     0.0026
    opened by anuragranj 16
  • Does the size of batch-size affect the training results?

    Hi, I have run train.py on the KITTI raw data with the command below:

    python3 train.py /path/to/the/formatted/data/ -b4 -m0 -s2.0 --epoch-size 1000 --sequence-length 5 --log-output --with-gt

    except that batch-size=80 and the train (41664) / valid (2452) split is different. The result I get is:

    disp (results with scale factor determined by GT/prediction ratio, like the original paper):

        abs_rel, sq_rel, rms,    log_rms, a1,     a2,     a3
        0.2058,  1.6333, 6.7410, 0.2895,  0.6762, 0.8853, 0.9532

    pose:

        Results 10: ATE mean 0.0223 (std 0.0188), RE mean 0.0053 (std 0.0036)
        Results 09: ATE mean 0.0284 (std 0.0241), RE mean 0.0055 (std 0.0035)

    You can see that there is still quite a big margin compared with yours:

        Abs Rel | Sq Rel | RMSE  | RMSE(log) | Acc.1 | Acc.2 | Acc.3
        0.181   | 1.341  | 6.236 | 0.262     | 0.733 | 0.901 | 0.964

    I think there are no other factors causing this difference besides the batch size and the data split. So, does the batch size affect the training results?

    What's more, when I try to train the model with two Titan GPUs and batch-size=80*2=160, the memory usage is about 11 GB on GPU0 and about 6 GB on GPU1. This huge memory-usage imbalance between the two GPUs seriously impacts multi-GPU training. I found that the loss calculations are all placed on the first GPU; the memory is mainly used to compute the 4 scales of the photometric reconstruction loss, so it might be better to move some scales to cuda:0 and the others to cuda:1.

    opened by youmi-zym 14
  • inference depth with same results

    Hello author, I have a problem: no matter what picture I use, the final depth image and disp image are exactly the same. I don't know the reason. Could you please help me solve it? Thank you very much. (example images attached)

    opened by liumingcun 13
  • AttributeError: 'list' object has no attribute 'detach'

    Sorry to disturb you. When I run train.py, I get an error and I don't know how to solve it. Could you tell me what I should do to run train.py correctly? Here is the error message:

    Traceback (most recent call last):
      File "train.py", line 460, in <module>
        main()
      File "train.py", line 219, in main
        errors, error_names = validate_without_gt(args, val_loader, disp_net, pose_exp_net, epoch, logger, tb_writer)
      File "/home/jie_r/.conda/envs/sfmlearner/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    100% (200 of 200) |##########| Elapsed Time: 0:04:32 Time: 0:04:32
      File "train.py", line 372, in validate_without_gt
        log_output_tensorboard(tb_writer, 'val', i, '', epoch, 1./disp, disp, warped, diff, explainability_mask)
    100% (3000 of 3000) |##########| Elapsed Time: 0:04:32 Time: 0:04:32
        warped_to_show = tensor2array(warped_j)
      File "/home/jie_r/SfmLearner-Pytorch-master/utils.py", line 87, in tensor2array
    100% (753 of 753) |##########| Elapsed Time: 0:00:00 Time: 0:00:00
    AttributeError: 'list' object has no attribute 'detach'

    opened by lordbutters 13
  • error throwing with multi threads in prepare_train_data.py

    Hi, I tried to run prepare_train_data.py using the command you provided (num_threads=4), but got the following error message:

    joblib.externals.loky.process_executor._RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
        r = call_item()
      File "/usr/local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
        return self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in __call__
        return self.func(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
        for func, args, kwargs in self.items]
      File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
        for func, args, kwargs in self.items]
      File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 33, in dump_example
        dump_dir = args.dump_root/scene_data['rel_path']
    AttributeError: 'Namespace' object has no attribute 'dump_root'
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 109, in <module>
        main()
      File "/Users/siwei/Desktop/SfmLearner-Pytorch-master/data/prepare_train_data.py", line 86, in main
        Parallel(n_jobs=args.num_threads)(delayed(dump_example)(scene) for scene in tqdm(data_loader.scenes))
      File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 930, in __call__
        self.retrieve()
      File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
        self._output.extend(job.get(timeout=self.timeout))
      File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
        return future.result(timeout=timeout)
      File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
        return self.__get_result()
      File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
    AttributeError: 'Namespace' object has no attribute 'dump_root'

    Do you have any idea why? The error didn't happen when I used only one thread.

    opened by sanweiliti 12
  • Confusion about the 'src_pixel_coords' parameter in function 'F.grid_sample'

    Hi @ClementPinard ,

    Thanks for your work. I'm confused about the second parameter, 'src_pixel_coords', of the function F.grid_sample:

        projected_img = F.grid_sample(img, src_pixel_coords, padding_mode=padding_mode, align_corners=True)

    The last dimension of that parameter seems to be the input pixel locations, but in your code it seems to be the output pixel locations. Why?

    Thanks
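
    For reference, a small illustration of F.grid_sample semantics (not part of the original issue or the repository code): the grid holds, for every output pixel, the normalized (x, y) location in the input image to sample from, which is why inverse warping passes the projected source coordinates as this argument.

    import torch
    import torch.nn.functional as F

    img = torch.arange(16.).view(1, 1, 4, 4)    # a 1x1x4x4 "image"
    # Identity grid: every output pixel samples its own normalized location in the input.
    coords = torch.linspace(-1, 1, 4)
    xs = coords.view(1, 4).expand(4, 4)         # x varies along the width
    ys = coords.view(4, 1).expand(4, 4)         # y varies along the height
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)   # shape [1, 4, 4, 2], (x, y) order
    out = F.grid_sample(img, grid, align_corners=True)
    assert torch.allclose(out, img)             # identity warp reproduces the input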

    opened by pleasegostraight 11
  • questions about the loss function

    Hi, Clément Pinard. It's an impressive job that you have done in this project and the code is written beautifully. May I ask you some questions about the loss function? For the photometric_reconstruction_loss, it uses the predicted depth and pose to get the grid from ref_imgs to tgt_img, so we can use F.grid_sample to generate tgt_img from ref_imgs. But in theory we could also generate ref_imgs from tgt_img. Do you have any idea how to realize this operation?

    opened by huagl 8
  • difference in validation with sparse ground truth and filled ground truth of depth

    Hi, the depth predictions are validated against the sparse ground-truth depths of KITTI here, but there are also papers validating against full ground truth (filled by interpolation). Will there be a large difference in the validation results between these two methods? Which one is the main criterion nowadays in monocular depth estimation?

    opened by sanweiliti 8
  • Reconstruction loss as NaN

    When I went through the function that calculates the photometric reconstruction loss, I found this line of code: assert((reconstruction_loss == reconstruction_loss).data[0] == 1). I figured out that this line is there to check whether the reconstruction loss is NaN, but I couldn't quite figure out why we are doing this. Under what circumstances could the reconstruction loss become NaN?

    I'm trying to train only the pose network, using the depth map obtained from a Kinect camera in place of training the DispNet. I'm getting the assertion error randomly at run time and I'm unable to figure out the cause. I want to know why we are checking for NaNs in the reconstruction loss, and what the possible causes are for the reconstruction loss to become NaN.
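
    For reference (not part of the original issue or the repository code): NaN is the only floating-point value that is not equal to itself, which is exactly what the assert above relies on. A minimal illustration:

    import torch

    x = torch.tensor([1.0, float('nan'), 2.0])
    print(x == x)            # tensor([ True, False,  True]) -- False exactly where x is NaN
    print(torch.isnan(x))    # tensor([False,  True, False]) -- the more explicit equivalent
    assert not bool((x == x).all())   # the loss assertion fails as soon as any NaN appears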

    opened by asprasan 8
  • Question about using oxts data

    Thanks for the great work!

        if scale is None:
            scale = np.cos(lat * np.pi / 180.)
        pose_matrix = pose_from_oxts_packet(metadata[:6], scale)
        if origin is None:
            origin = pose_matrix
        odo_pose = imu2cam @ np.linalg.inv(origin) @ pose_matrix @ np.linalg.inv(imu2cam)

    I am curious about this part of the code, which is not present in the original SfmLearner. Could you tell me how it works (the math details), or what book or reference I should read? Concretely:

    1. what is the effect of the scale?
    2. why is ty = lat * np.pi * er / 180. different from the original implementation in https://github.com/utiasSTARS/pykitti?
    3. what are the math details of the transformation from (lat, lon, alt) to a translation?
    4. which coordinate system does pose_matrix convert from and to, and what is the effect of np.linalg.inv(origin) @ pose_matrix?
    opened by myalos 0
  • Weird results from pretrained model on KITTI images

    Thanks for the PyTorch code! When I use the pretrained model (https://drive.google.com/drive/folders/1H1AFqSS8wr_YzwG2xWwAQHTfXN5Moxmx) to infer disp and depth on KITTI images, the results look weird: the disp and depth change very sharply, and the main part of the disp is either 255 or 0, with almost no intermediate values. (example images attached)

    opened by a961009 4
  • Query regarding depth map.

    Hi, thanks a lot for your work, it's very inspiring.

    1. After executing run_inference.py, the depth image obtained is a 3-channel image, so how can I get the actual depth of each pixel in the image? Does any of those 3 channels represent the depth of the corresponding pixel?

    2. Also, what I understand by now is that the depth map obtained from the network doesn't provide the actual depth directly but a scaled depth. So I need to compare the scaled depth obtained from the network with the ground-truth value to find the scaling factor, and I can then use this scaling factor every time to get the actual depth from the scaled depth. Could you please confirm whether I am understanding this correctly?

    3. I am currently trying this on the KITTI dataset. Would it be possible to compute the depth map ourselves from the disparity map obtained from the model?

    Thank You.

    opened by rohitcmohite 2
  • test_disp.py min/max depth value issue

    Hi! Thank you for your nice implementation.

    I have a question about clipping the depth value in test_disp.py. https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/test_disp.py

    Currently, before applying the scale factor to predicted depth, depth is clipped by min/max depth value as shown below.

    pred_depth_zoomed = zoom(pred_depth,
                             (gt_depth.shape[0]/pred_depth.shape[0],
                              gt_depth.shape[1]/pred_depth.shape[1])
                             ).clip(args.min_depth, args.max_depth)
    
    ( ... )
    
    if seq_length > 1:
            ( ... )
            scale_factor = np.mean(sample['displacements'] / displacement_magnitudes)
            errors[0,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor)
    
    ( ... )
    
    scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed)
    errors[1,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor)
    
    

    But isn't it correct to clip after applying the scale factor, as shown below?

    pred_depth_zoomed = zoom(pred_depth,
                             (gt_depth.shape[0]/pred_depth.shape[0],
                              gt_depth.shape[1]/pred_depth.shape[1])
                             )
    
    ( ... )
    
    
    if seq_length > 1:
            ( ... )
            scale_factor = np.mean(sample['displacements'] / displacement_magnitudes)
            errors[0,:,j] = compute_errors(gt_depth, (pred_depth_zoomed*scale_factor).clip(args.min_depth, args.max_depth))
    
    ( ... )
    
    scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed)
    errors[1,:,j] = compute_errors(gt_depth, (pred_depth_zoomed*scale_factor).clip(args.min_depth, args.max_depth))
    
    

    The difference due to this issue, when the pose network is not used, is shown below:

    |          | abs_diff | abs_rel | sq_rel | rms    | log_rms | abs_log | a1     | a2     | a3     |
    |----------|----------|---------|--------|--------|---------|---------|--------|--------|--------|
    | current  | 2.3598   | 0.1233  | 0.8640 | 4.8908 | 0.1987  | 0.1258  | 0.8569 | 0.9561 | 0.9815 |
    | if fixed | 2.3391   | 0.1229  | 0.8205 | 4.7596 | 0.1981  | 0.1256  | 0.8572 | 0.9564 | 0.9817 |

    In particular, the sq_rel and rms metrics are sensitive to this issue.

    Thank you.

    opened by seb-le 3
  • How to use KITTI Odometry 00-08 to train pose-exp-net?

    Hello, thank you for open-sourcing this code. I have been training the network recently, but I don't know how to train the pose estimation network, which requires KITTI Odometry sequences 00-08. Looking forward to your reply.

    opened by yangbinchao 1