D2Go is a toolkit for efficient deep learning

Overview

D2Go

D2Go is a production-ready software system from Facebook Research that supports end-to-end model training and deployment for mobile platforms.

What's D2Go

  • It is a deep learning toolkit powered by PyTorch and Detectron2.
  • State-of-the-art efficient backbone networks for mobile devices.
  • End-to-end model training, quantization and deployment pipeline.
  • Easy export to TorchScript format for deployment.

Installation

Install PyTorch Nightly (using CUDA 10.2 as an example; see details on the PyTorch website):

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch-nightly

Install Detectron2 (see the Detectron2 repository for other installation options):

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Install mobile_cv:

python -m pip install 'git+https://github.com/facebookresearch/mobile-vision.git'

Install d2go:

git clone https://github.com/facebookresearch/d2go
cd d2go && python -m pip install .
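
After the install steps, a quick sanity check is to confirm that each package can be located from the active environment. This is a minimal sketch, not part of D2Go; it assumes the module names torch, torchvision, detectron2, mobile_cv and d2go, and it only locates each top-level package with importlib.util.find_spec rather than importing it, so errors raised at import time will not show up here.

```python
import importlib.util

# Module names assumed for the packages installed above.
REQUIRED_MODULES = ("torch", "torchvision", "detectron2", "mobile_cv", "d2go")


def check_installation(modules=REQUIRED_MODULES):
    """Return a {module_name: found} mapping without importing the modules."""
    # find_spec() locates a top-level module on sys.path without executing it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

A package reported as MISSING usually means the corresponding pip or conda step ran in a different environment.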

Get Started

License

D2Go is released under the Apache 2.0 license.

Comments
  • Demo script fails with ImportError: cannot import name 'metanet_pb2' from 'caffe2.proto'

    Instructions To Reproduce the 🐛 Bug:

    Exact command run: python demo.py --config-file faster_rcnn_fbnetv3a_C4.yaml --input input1.jpg --output output1.jpg

    Full logs or other relevant observations:

    (d2go) mat@ada:~/repos/d2go/demo$ python demo.py --config-file faster_rcnn_fbnetv3a_C4.yaml --input input1.jpg --output output1.jpg
    Traceback (most recent call last):
      File "demo.py", line 11, in <module>
        from d2go.model_zoo import model_zoo
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/d2go/model_zoo/model_zoo.py", line 7, in <module>
        from d2go.runner import create_runner
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/d2go/runner/__init__.py", line 10, in <module>
        from .default_runner import BaseRunner, Detectron2GoRunner, GeneralizedRCNNRunner
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/d2go/runner/default_runner.py", line 28, in <module>
        from d2go.export.d2_meta_arch import patch_d2_meta_arch
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/d2go/export/__init__.py", line 5, in <module>
        from . import torchscript  # noqa
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/d2go/export/torchscript.py", line 13, in <module>
        from detectron2.export.flatten import TracingAdapter, flatten_to_tuple
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/detectron2/export/__init__.py", line 3, in <module>
        from .api import *
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/detectron2/export/api.py", line 6, in <module>
        from caffe2.proto import caffe2_pb2
      File "/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/caffe2/proto/__init__.py", line 11, in <module>
        from caffe2.proto import caffe2_pb2, metanet_pb2, torch_pb2
    ImportError: cannot import name 'metanet_pb2' from 'caffe2.proto' (/home/mat/anaconda3/envs/d2go/lib/python3.7/site-packages/caffe2/proto/__init__.py)
    
    pytorch                   1.11.0.dev20211109 py3.7_cuda10.2_cudnn7.6.5_0    pytorch-nightly
    pytorch-lightning         1.5.0                    pypi_0    pypi
    

    I was unable to track down the cause. I also tried installing the stable PyTorch build and got a segfault.

    bug 
    opened by msalvaris 15
  • Unable to replicate balloon training result

    Code to reproduce result:

    I tried training a model with the balloon example (https://github.com/facebookresearch/d2go/blob/master/demo/d2go_beginner.ipynb):

    import os
    import json
    import numpy as np
    from detectron2.structures import BoxMode
    from detectron2.data import MetadataCatalog, DatasetCatalog
    import cv2
    
    def get_balloon_dicts(img_dir):
        json_file = os.path.join(img_dir, "via_region_data.json")
        with open(json_file) as f:
            imgs_anns = json.load(f)
    
        dataset_dicts = []
        for idx, v in enumerate(imgs_anns.values()):
            record = {}
            
            filename = os.path.join(img_dir, v["filename"])
            height, width = cv2.imread(filename).shape[:2]
            
            record["file_name"] = filename
            record["image_id"] = idx
            record["height"] = height
            record["width"] = width
          
            annos = v["regions"]
            objs = []
            for _, anno in annos.items():
                assert not anno["region_attributes"]
                anno = anno["shape_attributes"]
                px = anno["all_points_x"]
                py = anno["all_points_y"]
                poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
                poly = [p for x in poly for p in x]
    
                obj = {
                    "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "segmentation": [poly],
                    "category_id": 0,
                }
                objs.append(obj)
            record["annotations"] = objs
            dataset_dicts.append(record)
        return dataset_dicts
    
    for d in ["train", "val"]:
        DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
        MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"], evaluator_type="coco")
    
    balloon_metadata = MetadataCatalog.get("balloon_train")
    
    from d2go.runner import Detectron2GoRunner
    from d2go.model_zoo import model_zoo
    
    def prepare_for_launch():
        runner = Detectron2GoRunner()
        cfg = runner.get_default_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("faster_rcnn_fbnetv3a_C4.yaml"))
        cfg.MODEL_EMA.ENABLED = False
        cfg.DATASETS.TRAIN = ("balloon_train",)
        cfg.DATASETS.TEST = ("balloon_val",)
        cfg.DATALOADER.NUM_WORKERS = 2
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("faster_rcnn_fbnetv3a_C4.yaml")  # Let training initialize from model zoo
        cfg.SOLVER.IMS_PER_BATCH = 2
        cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
        cfg.SOLVER.MAX_ITER = 600    # 600 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
        cfg.SOLVER.STEPS = []        # do not decay learning rate
        cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
        # NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
        cfg.OUTPUT_DIR = 'balloon_model'
        os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
        return cfg, runner
    
    cfg, runner = prepare_for_launch()
    model = runner.build_model(cfg)
    runner.do_train(cfg, model, resume=False)
    
    cfg.MODEL.WEIGHTS = 'balloon_model/model_final.pth'
    metrics = runner.do_test(cfg, model)
    print(metrics)
    

    Result

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.021
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.061
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.012
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.041
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.004
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.118
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.204
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.012
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.333
    

    The inference results on test images are also very poor.

    Expected Result

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.494
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.651
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.543
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.104
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.757
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.204
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.526
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.118
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810
    
    opened by AaronReidCI 14
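The core of get_balloon_dicts is converting one VIA polygon region into a Detectron2-style annotation. Pulling that step into a standalone function makes it easy to check on a tiny hand-made region. The helper name via_region_to_annotation is hypothetical; to stay runnable without Detectron2, it leaves out "bbox_mode", which the caller would set to BoxMode.XYXY_ABS as in the loader.

```python
def via_region_to_annotation(shape_attributes, category_id=0):
    """Convert one VIA polygon region into a bbox plus a flattened polygon.

    Mirrors the loop body of get_balloon_dicts; the caller is expected to add
    "bbox_mode": BoxMode.XYXY_ABS when building the Detectron2 record.
    """
    px = shape_attributes["all_points_x"]
    py = shape_attributes["all_points_y"]
    # Shift each vertex by half a pixel, as the original loader does.
    poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
    # Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...].
    flat_poly = [coord for point in poly for coord in point]
    return {
        # Absolute XYXY box computed from the raw (unshifted) vertices.
        "bbox": [min(px), min(py), max(px), max(py)],
        "segmentation": [flat_poly],
        "category_id": category_id,
    }


region = {"all_points_x": [10, 30, 20], "all_points_y": [5, 5, 25]}
ann = via_region_to_annotation(region)
print(ann["bbox"])          # [10, 5, 30, 25]
print(ann["segmentation"])  # [[10.5, 5.5, 30.5, 5.5, 20.5, 25.5]]
```

Running a few real entries from via_region_data.json through the same helper and drawing the boxes is a cheap way to rule out annotation problems before looking at the training config.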
  • Hardcode top-level directory, remove import for resource access

    Summary: This diff is part of a stack whose goal is "buckifying" D2Go core and enabling autodeps and other tooling. The last diff in the stack introduces the TARGETS. The earlier diffs in the stack resolve circular dependencies and other issues that prevent the buckification.

    This diff changes the import paths being rerouted in the d2go package. Instead of local-importing the package, which creates a circular dependency in Buck, we hardcode the top-level path. We know it is fixed because we define it with base_module.

    For the other packages it can remain as-is.

    Also manually adds some dependency annotations in preparation for the buckification.

    Differential Revision: D35928513

    CLA Signed fb-exported 
    opened by miqueljubert 13
  • assertion error in data loader while converting to int8

    I trained "faster_rcnn_fbnetv3g_fpn" on a custom dataset and training completed successfully, but I get this error when converting the model to int8.

    Code I am using for the conversion:

    import copy
    from detectron2.data import build_detection_test_loader
    from d2go.export.api import convert_and_export_predictor
    from d2go.model_zoo import model_zoo
    from d2go.tests.data_loader_helper import create_fake_detection_data_loader
    from d2go.export.d2_meta_arch import patch_d2_meta_arch
    
    import logging
    
    # suppress INFO and lower-level log messages (warnings are still shown)
    previous_level = logging.root.manager.disable
    logging.disable(logging.INFO)
    
    patch_d2_meta_arch()
    
    cfg_name = 'faster_rcnn_fbnetv3g_fpn.yaml'
    pytorch_model = model_zoo.get(cfg_name, trained=True)
    pytorch_model.cpu()
    
    with create_fake_detection_data_loader(224, 320, is_train=False) as data_loader:
        predictor_path = convert_and_export_predictor(
                model_zoo.get_config(cfg_name),
                copy.deepcopy(pytorch_model),
                "torchscript_int8@tracing",
                './',
                data_loader,
            )
    
    # recover the logging level
    logging.disable(previous_level)
    

    The error I am receiving:

    WARNING [03/16 06:34:40 mobile_cv.arch.utils.helper]: Arguments ['width_divisor', 'dw_skip_bnrelu', 'zero_last_bn_gamma'] skipped for op Conv2d
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      cpuset_checked))
    /usr/local/lib/python3.7/dist-packages/torch/quantization/observer.py:123: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
      reduce_range will be deprecated in a future release of PyTorch."
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    <ipython-input-19-46e25055929a> in <module>()
         23             "torchscript_int8@tracing",
         24             './',
    ---> 25             data_loader,
         26         )
         27 
    
    /usr/local/lib/python3.7/dist-packages/d2go/export/api.py in convert_and_export_predictor(cfg, pytorch_model, predictor_type, output_dir, data_loader)
         98             pytorch_model = post_training_quantize(cfg, pytorch_model, data_loader)
         99             # only check bn exists in ptq as qat still has bn inside fused ops
    --> 100             assert not fuse_utils.check_bn_exist(pytorch_model)
        101         logger.info(f"Converting quantized model {cfg.QUANTIZATION.BACKEND}...")
        102         if cfg.QUANTIZATION.EAGER_MODE:
    
    AssertionError: 
    

    I changed the config file to include the newly registered custom dataset. What else could be wrong here?

    bug 
    opened by DhruvMakwana 13
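The failing line asserts that no BatchNorm layer remains after post-training quantization; the comment in the traceback notes that only PTQ is checked because QAT keeps BN inside fused ops. In PTQ, each Conv + BatchNorm pair is normally folded into a single convolution: with BN parameters gamma, beta, running mean, running var and eps, a conv with weight w and bias b becomes w' = w * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta. The scalar sketch below, for one 1x1 channel, checks that folding reproduces conv followed by BN. It illustrates the standard technique only; it is not D2Go's or mobile_cv's fusion code, and the function names are hypothetical.

```python
import math


def conv_bn_reference(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """1x1 single-channel conv followed by BatchNorm in inference mode."""
    conv_out = w * x + b
    return gamma * (conv_out - mean) / math.sqrt(var + eps) + beta


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that w' * x + b' equals BN(conv(x))."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


params = dict(gamma=1.5, beta=0.2, mean=0.3, var=4.0)
w_f, b_f = fold_bn_into_conv(w=2.0, b=0.1, **params)
for x in (-1.0, 0.0, 2.5):
    # The folded conv alone matches conv + BN for every input.
    assert math.isclose(w_f * x + b_f, conv_bn_reference(x, 2.0, 0.1, **params))
```

A BatchNorm that the fuser cannot pair with a preceding convolution is not folded away and would trip the assertion; checking the custom model for such layers is a hypothesis worth testing, not a confirmed diagnosis.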
  • Add required example_args argument to prepare_fx and prepare_qat_fx

    Summary: FX Graph Mode Quantization needs to know whether an fx node is a floating point Tensor before it can decide whether to insert observer/fake_quantize module or not, since we only insert observer/fake_quantize module for floating point Tensors. Currently we have some hacks to support this by defining some rules like NON_OBSERVABLE_ARG_DICT (https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/utils.py#L496), but this approach is fragile and we do not plan to maintain it long term in the pytorch code base.

    As we discussed in the design review, we need to ask users to provide sample args and sample keyword args so that we can infer the type in a more robust way. This PR starts by changing the prepare_fx and prepare_qat_fx APIs to require users to provide example arguments through example_inputs. Note that this API doesn't support kwargs. Kwargs could make https://github.com/pytorch/pytorch/pull/76496#discussion_r861230047 simpler, but that case is rare, and even then it can be worked around with positional arguments. Also, torch.jit.trace (https://pytorch.org/docs/stable/generated/torch.jit.trace.html) and ShapeProp (https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/shape_prop.py#L140) take only positional args, so we'll use a single example_inputs argument for now.

    If needed, we can extend the API with an optional example_kwargs, e.g. when forward takes many arguments and it makes more sense to pass them by keyword.

    BC-breaking Note:

    Before:
    m = resnet18(...)
    m = prepare_fx(m, qconfig_dict)

    After:
    m = resnet18(...)
    m = prepare_fx(m, qconfig_dict, example_inputs=(torch.randn(1, 3, 224, 224),))

    Reviewed By: vkuzo, andrewor14

    Differential Revision: D35984526

    CLA Signed fb-exported 
    opened by jerryzh168 10
  • Enable torch tracing by changing assertions in d2go forwards to allow for torch.fx.proxy.Proxy type.

    Summary: Torch FX tracing propagates a type of torch.fx.proxy.Proxy through the graph.

    Existing type assertions in the d2go code base trigger during torch FX tracing, causing tracing to fail.

    This diff adds a helper function, is_fx_proxy(), that checks for torch.fx.proxy.Proxy instances, and uses it to guard the existing assertions. This enables tracing while preserving the originally intended checks.

    Differential Revision: D35518556

    CLA Signed fb-exported 
    opened by simonhollis 10
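The guard pattern in this diff can be sketched as follows. Only the name is_fx_proxy() comes from the summary; its body here is an assumption, and check_image_sizes is a hypothetical stand-in for one of the guarded assertions in D2Go's forward code. The import falls back gracefully so the sketch also runs where PyTorch is not installed.

```python
try:
    # torch.fx.proxy.Proxy is the type FX tracing propagates through forward().
    from torch.fx.proxy import Proxy as _FxProxy
except ImportError:  # allow the sketch to run without PyTorch installed
    _FxProxy = None


def is_fx_proxy(obj):
    """Return True if obj is a torch.fx Proxy, i.e. we are inside FX tracing."""
    return _FxProxy is not None and isinstance(obj, _FxProxy)


def check_image_sizes(image_sizes):
    """Hypothetical forward-time check, guarded so FX tracing can pass through."""
    if not is_fx_proxy(image_sizes):
        # Enforce the concrete-type assertion only on real (non-traced) inputs.
        assert isinstance(image_sizes, (list, tuple)), type(image_sizes)
    return image_sizes
```

During eager execution the assertion behaves exactly as before; during FX tracing the Proxy skips it instead of failing the type check.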
  • Quantization-aware training with the API

    Instructions To Reproduce the 🐛 Bug:

    I tried to add quantization-aware training to the d2go_beginner.ipynb notebook, but I couldn't get it to work.

    Code:

    import os

    from d2go.model_zoo import model_zoo
    from d2go.runner import Detectron2GoRunner
    
    
    def prepare_for_launch():
        runner = Detectron2GoRunner()
        cfg = runner.get_default_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("faster_rcnn_fbnetv3a_C4.yaml"))
        cfg.MODEL_EMA.ENABLED = False
        cfg.DATASETS.TRAIN = ("balloon_train",)
        cfg.DATASETS.TEST = ("balloon_val",)
        cfg.DATALOADER.NUM_WORKERS = 2
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("faster_rcnn_fbnetv3a_C4.yaml")  # Let training initialize from model zoo
        cfg.SOLVER.IMS_PER_BATCH = 2
        cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
        cfg.SOLVER.MAX_ITER = 600    # 600 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
        cfg.SOLVER.STEPS = []        # do not decay learning rate
        cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
        # NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
    
        # quantization-aware training
        cfg.QUANTIZATION.BACKEND = "qnnpack"
        cfg.QUANTIZATION.QAT.ENABLED = True
        cfg.QUANTIZATION.QAT.START_ITER = 0
        cfg.QUANTIZATION.QAT.ENABLE_OBSERVER_ITER = 0
        cfg.QUANTIZATION.QAT.DISABLE_OBSERVER_ITER = 5
        cfg.QUANTIZATION.QAT.FREEZE_BN_ITER = 7
    
        os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
        return cfg, runner
    
    cfg, runner = prepare_for_launch()
    print(cfg)
    model = runner.build_model(cfg)
    runner.do_train(cfg, model, resume=False)
    

    Error message:

    AssertionError                            Traceback (most recent call last)
    <ipython-input-11-327fe2b2a9ce> in <module>()
         32 cfg, runner = prepare_for_launch()
         33 print(cfg)
    ---> 34 model = runner.build_model(cfg)
         35 runner.do_train(cfg, model, resume=False)
    
    15 frames
    /usr/local/lib/python3.7/dist-packages/torch/quantization/fuser_method_mappings.py in get_fuser_method(op_list, additional_fuser_method_mapping)
        129                                      additional_fuser_method_mapping)
        130     fuser_method = all_mappings.get(op_list, None)
    --> 131     assert fuser_method is not None, "did not find fuser method for: {} ".format(op_list)
        132     return fuser_method
    
    AssertionError: did not find fuser method for: (<class 'torch.nn.modules.conv.Conv2d'>, <class 'mobile_cv.arch.layers.batch_norm.NaiveSyncBatchNorm'>, <class 'torch.nn.modules.activation.ReLU'>) 
    

    Expected behavior:

    Quantization-aware training should work with the API.

    bug 
    opened by TannerGilbert 10
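The last frames of the traceback show where QAT model building stops: get_fuser_method looks up the exact tuple of module types with all_mappings.get(op_list, None). A dictionary keyed on classes does not consider subclasses, so (Conv2d, NaiveSyncBatchNorm, ReLU) misses an entry registered for (Conv2d, BatchNorm2d, ReLU) even if NaiveSyncBatchNorm derives from BatchNorm2d (an assumption here about mobile_cv's class hierarchy). The classes below are hypothetical placeholders for the torch modules, used only to demonstrate the lookup behavior.

```python
class Conv2d:
    """Placeholder for torch.nn.Conv2d."""


class BatchNorm2d:
    """Placeholder for torch.nn.BatchNorm2d."""


class ReLU:
    """Placeholder for torch.nn.ReLU."""


class NaiveSyncBatchNorm(BatchNorm2d):
    """Placeholder for a BN subclass (assumed hierarchy, for illustration)."""


# Exact-type registry, analogous to the fuser-method mapping in the traceback.
FUSER_METHODS = {(Conv2d, BatchNorm2d, ReLU): "fuse_conv_bn_relu"}


def get_fuser_method(op_list):
    """Exact-key lookup: subclasses of registered types are NOT matched."""
    fuser_method = FUSER_METHODS.get(op_list, None)
    assert fuser_method is not None, f"did not find fuser method for: {op_list}"
    return fuser_method


print(get_fuser_method((Conv2d, BatchNorm2d, ReLU)))  # fuse_conv_bn_relu
print(issubclass(NaiveSyncBatchNorm, BatchNorm2d))    # True
# get_fuser_method((Conv2d, NaiveSyncBatchNorm, ReLU)) raises AssertionError
```

Based on this, building the QAT model with a plain BatchNorm2d is a plausible direction to try; that is a hypothesis drawn from the traceback, not a confirmed fix.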
  • Training time is huge

    Model training takes a very long time. The training iterations themselves are fast, but unlike Detectron2, the training loads the whole dataset during training. Training a model in Detectron2 took me 3 days for 300,000 iterations; now I need 3 days for 50,000 iterations. While training, I can see the script loading the whole training set again and again, which consumes time. Can I speed up the training process? Thanks in advance.

    opened by AhmedHessuin 9
  • One EMAState in D2go 1/N - model_ema.py --> ema.py

    Summary: Rename model_ema.py to ema.py (since "modeling" is already in the folder name) and fix dependencies after the rename.

    Differential Revision: D41685115

    CLA Signed Merged fb-exported 
    opened by mcimpoi 8
  • Integrate PyTorch Fully Sharded Data Parallel (FSDP)

    Summary: Integrate PyTorch FSDP, which supports two sharding modes: 1. gradient + optimizer sharding; 2. full model sharding (params + gradient + optimizer). This feature is enabled in the train_net.py code path.

    Sources

    • Integration follows this tutorial: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html

    API changes

    • Add new config keys to support the new feature. Refer to mobile-vision/d2go/d2go/trainer/fairscale.py for the full list of config options
    • Add FSDPCheckpointer as a subclass of QATCheckpointer to support special loading/saving logic for FSDP models

    Differential Revision: D39228316

    CLA Signed fb-exported 
    opened by YanjunChen329 8
  • use runner class instead of instance outside of main

    Summary: As discussed, we decided to not use runner instance outside of main, previous diffs already solved the prerequisites, this diff mainly does the renaming.

    • Use runner name (str) in the fblearner, ML pipeline.
    • Use runner name (str) in FBL operator, MAST and binary operator.
    • Use the runner class as the interface of main; it can be either the class name (str) or the actual class. The main usage should be str, so that the class is imported inside main. But since this is BC-breaking (i.e. some scripts or tests will fail without updating), also supporting the actual class makes it easier to modify code for those cases (e.g. some local test classes don't have a name).

    Differential Revision: D37060338

    CLA Signed fb-exported 
    opened by wat3rBro 8
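Accepting "either the name of the class (str) or the actual class" can be sketched with a small resolver that imports string names lazily, so the import happens inside main. The helper name resolve_runner_class is hypothetical and not necessarily what D2Go uses; a real string would look like "d2go.runner.GeneralizedRCNNRunner", while the tests use a standard-library class as a stand-in.

```python
import importlib


def resolve_runner_class(runner):
    """Accept either a class or a "package.module.ClassName" string.

    Strings are imported lazily, so importing happens inside main().
    """
    if isinstance(runner, type):
        # BC path: an actual class was passed (e.g. a local test class).
        return runner
    if not isinstance(runner, str):
        raise TypeError(f"runner must be a str or a class, got {type(runner)!r}")
    module_name, _, class_name = runner.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {runner!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Passing a string keeps pipeline and operator configs free of Python objects, which matches the motivation stated in the summary.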
  • Clean Up MobileOptimizerType Rewrite Flags Public API and Documentation

    Summary: X-link: https://github.com/pytorch/pytorch/pull/91600

    Remove MobileOptimizerType and all rewrite flags from torch.X and torch._C.X to clean up torch.X and torch._C.X namespaces

    The affected rewrite flags are

    • CONV_BN_FUSION
    • FUSE_ADD_RELU
    • HOIST_CONV_PACKED_PARAMS
    • INSERT_FOLD_PREPACK_OPS
    • REMOVE_DROPOUT
    • VULKAN_AUTOMATIC_GPU_TRANSFER

    BC-Breaking Change:

    Before this change, the rewrite flags were accessible through all of

    1. torch.utils.mobile_optimizer.MobileOptimizerType.X
    2. torch._C.MobileOptimizerType.X
    3. torch.X
    4. torch.MobileOptimizerType.X
    5. torch._C.X

    But after this change, only torch.utils.mobile_optimizer.MobileOptimizerType.X (option 1 above) and the newly added torch._C._MobileOptimizerType.X remain

    Corresponding updates to PyTorch Tutorial Docs are in https://github.com/pytorch/tutorials/pull/2163

    Differential Revision: D41690203

    CLA Signed fb-exported 
    opened by salilsdesai 2
  • Parallelize EMA optimizer

    Summary: Tracing d2go runners that use the AdamW optimizer showed many small operators being executed in the EMA code. These can be fused together by using the multi-tensor API.

    Differential Revision: D42098310

    CLA Signed fb-exported 
    opened by frabu6 1
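For reference, the per-parameter EMA update is ema = decay * ema + (1 - decay) * param. The sketch below writes that loop out with plain Python floats standing in for tensors; it illustrates the update rule under that assumption and is not D2Go's EMA implementation. Running this loop one small tensor at a time is what produces many tiny operators, whereas PyTorch's multi-tensor torch._foreach_* functions apply the same arithmetic to a whole list of tensors in fewer calls.

```python
def ema_update(ema_params, model_params, decay):
    """Return updated EMA values: ema <- decay * ema + (1 - decay) * param."""
    if len(ema_params) != len(model_params):
        raise ValueError("ema_params and model_params must have the same length")
    # One small update per parameter; a multi-tensor implementation batches
    # this per-item loop into a few list-wide operations.
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


ema = [0.0, 1.0]
ema = ema_update(ema, [1.0, 3.0], decay=0.5)
print(ema)  # [0.5, 2.0]
```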
  • Convert local checkpoint to global one automatically in d2go FSDP checkpointer

    Summary:

    Design

    Following D41861308, local checkpoints need to be converted to global ones before being loaded and used in non-FSDP wrapped models. This diff implements such conversion in d2go checkpointer level to allow automatic conversion with minimal user interference and no new config key.

    In the previous diff, FSDPWrapper has 2 loading modes and 2 saving modes: it uses load_local_state_dict to determine whether the ckpt to load is local or global, and use_local_state_dict to decide whether to save new ckpts as local or global. Thus, there are 4 combinations of loading/saving modes:

    1. load local + save local
    2. load local + save global
    3. load global + save local
    4. load global + save global

    The local-to-global checkpoint conversion maps to mode 2: load local + save global. When the checkpointer is in mode 2, it automatically saves the model to a global ckpt right after loading the local ckpt. Because this happens at the checkpointer level, normal training/eval can resume after the conversion. This gives users a consistent and seamless experience with normal training/eval, while also providing a separate ckpt conversion feature via eval-only.

    Usage

    To convert the local checkpoint /tmp/model_final, run the same training command with the extra args MODEL.WEIGHTS=/tmp/model_final and FSDP.USE_LOCAL_STATE_DICT=False.

    Wiki: https://www.internalfb.com/intern/wiki/Mobile_Vision/Detectron2Go/D2Go_Tutorials/Diffusion_Pipeline/Diffusion_Model_Inference/#using-checkpoints-traine

    Differential Revision: D41926662

    CLA Signed fb-exported 
    opened by YanjunChen329 1
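The four loading/saving combinations, and the rule that only mode 2 triggers an automatic conversion, can be expressed as a small decision function. Only the flag names load_local_state_dict and use_local_state_dict come from the summary; CkptFormat, checkpoint_mode and should_convert_after_load are hypothetical names, and the sketch models the decision only, not the checkpointer itself.

```python
from enum import Enum


class CkptFormat(Enum):
    LOCAL = "local"    # sharded, per-rank state dict
    GLOBAL = "global"  # full, unsharded state dict


def checkpoint_mode(load_local_state_dict, use_local_state_dict):
    """Map the two FSDPWrapper flags onto (load format, save format)."""
    load = CkptFormat.LOCAL if load_local_state_dict else CkptFormat.GLOBAL
    save = CkptFormat.LOCAL if use_local_state_dict else CkptFormat.GLOBAL
    return load, save


def should_convert_after_load(load_local_state_dict, use_local_state_dict):
    """True only for mode 2 (load local + save global)."""
    return checkpoint_mode(load_local_state_dict, use_local_state_dict) == (
        CkptFormat.LOCAL,
        CkptFormat.GLOBAL,
    )
```

In the usage described above, loading /tmp/model_final (a local checkpoint) with FSDP.USE_LOCAL_STATE_DICT=False corresponds to should_convert_after_load(True, False).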
  • Move FSDP wrapping to runner.build_model

    Summary: Move FSDP wrapping to runner.build_model by rewriting it as a modeling hook

    Motivation: When a model is too large to run inference on a single GPU, FSDP with local checkpointing mode is needed to reduce peak GPU memory. However, in the eval_pytorch workflow (train_net with eval-only), models are evaluated without being wrapped by FSDP, which may cause OOM errors. Thus, it may be better practice to wrap the model with FSDP during runner.build_model(cfg), so that evaluation runs in the same FSDP setting as training.

    This diff moves FSDP wrapping to runner.build_model(cfg) by rewriting it as a modeling hook.

    API changes

    • Users need to append "FSDPModelingHook" to MODEL.MODELING_HOOKS to enable FSDP.
    • FSDP.ALGORITHM can only be full or grad_optim

    Note: It's not possible to unwrap an FSDP model back to the normal model, so FSDPModelingHook.unapply() can't be implemented.

    Differential Revision: D41416917

    CLA Signed fb-exported 
    opened by YanjunChen329 1
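The apply/unapply contract and the FSDP.ALGORITHM restriction from the API changes above can be sketched in plain Python. Everything here is hypothetical: FSDPModelingHookSketch and ShardedModelWrapper are illustrative stand-ins, not D2Go's FSDPModelingHook or PyTorch's FSDP class, and the real hook is enabled through config (appending "FSDPModelingHook" to MODEL.MODELING_HOOKS) rather than instantiated by hand.

```python
VALID_FSDP_ALGORITHMS = ("full", "grad_optim")


class ShardedModelWrapper:
    """Hypothetical stand-in for an FSDP-wrapped model."""

    def __init__(self, module, algorithm):
        self.module = module
        self.algorithm = algorithm


class FSDPModelingHookSketch:
    """Illustrative hook with the apply/unapply shape described in the diff."""

    def __init__(self, algorithm):
        if algorithm not in VALID_FSDP_ALGORITHMS:
            raise ValueError(
                f"FSDP.ALGORITHM must be one of {VALID_FSDP_ALGORITHMS}, got {algorithm!r}"
            )
        self.algorithm = algorithm

    def apply(self, model):
        # Wrapping happens while the model is built, so training and
        # eval-only runs see the same sharded model.
        return ShardedModelWrapper(model, self.algorithm)

    def unapply(self, model):
        # An FSDP-wrapped model cannot be unwrapped back to the original one.
        raise NotImplementedError("FSDP wrapping cannot be undone")
```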
Owner
Facebook Research
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

The Microsoft Cognitive Toolkit (https://cntk.ai) is a unified deep learning toolkit that describes

Microsoft 17k Feb 11, 2021
FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE)

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

null 18 Sep 1, 2022
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning (NeurIPS 2020) Introduction AdaShare is a novel and differentiable approach fo

null 94 Dec 22, 2022
Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

Introduction: Thanks to Su Shen for the model; original post: https://spaces.ac.cn/archives/8877. How to run: the corresponding model EfficientGlobalPoi

powerycy 40 Dec 14, 2022
TorchIO is a Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.

Medical image preprocessing and augmentation toolkit for deep learning. Part of the PyTorch Ecosystem.

Fernando Pérez-García 1.6k Jan 6, 2023
TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

null 52 Dec 23, 2022
A clear, concise, simple yet powerful and efficient API for deep learning.

The Gluon API Specification The Gluon API specification is an effort to improve speed, flexibility, and accessibility of deep learning technology for

Gluon API 2.3k Dec 17, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 1, 2023
PPLNN is a Primitive Library for Neural Network is a high-performance deep-learning inference engine for efficient AI inferencing

null 943 Jan 7, 2023
Code release for "Self-Tuning for Data-Efficient Deep Learning" (ICML 2021)

Self-Tuning for Data-Efficient Deep Learning This repository contains the implementation code for paper: Self-Tuning for Data-Efficient Deep Learning

THUML @ Tsinghua University 101 Dec 11, 2022
Implementation of "Selection via Proxy: Efficient Data Selection for Deep Learning" from ICLR 2020.

Selection via Proxy: Efficient Data Selection for Deep Learning This repository contains a refactored implementation of "Selection via Proxy: Efficien

Stanford Future Data Systems 70 Nov 16, 2022
Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery

Lorien: A Unified Infrastructure for Efficient Deep Learning Workloads Delivery Lorien is an infrastructure to massively explore/benchmark the best sc

Amazon Web Services - Labs 45 Dec 12, 2022
TorchX: A PyTorch Extension Library for More Efficient Deep Learning

TorchX TorchX: A PyTorch Extension Library for More Efficient Deep Learning. @misc{torchx, author = {Ansheng You and Changxu Wang}, title = {T

Donny You 8 May 28, 2022
A deep learning library that makes face recognition efficient and effective

Distributed Arcface Training in Pytorch This is a deep learning library that makes face recognition efficient, and effective, which can train tens of

Sajjad Aemmi 10 Nov 23, 2021
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Easy Parallel Library (EPL) is a general and efficient library for distributed model training. Usability

Alibaba 185 Dec 21, 2022
This is an open-source toolkit for Heterogeneous Graph Neural Network(OpenHGNN) based on DGL [Deep Graph Library] and PyTorch.

BUPT GAMMA Lab 519 Jan 2, 2023
Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT).

Active Learning with the Nvidia TLT Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT). In this tutorial, we will show you ho

Lightly 25 Dec 3, 2022
CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning

CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning This repository contains the code and relevant instructions

XiaoMing 5 Aug 19, 2022