a distributed deep learning platform

  • Add new example APIs

    Following the changes recommended in https://github.com/apache/singa-doc/pull/14 to simplify the APIs of the examples, I have reduced the number of arguments in train_mnist_cnn.

    opened by chrishkchris 50
  • SINGA-487 Include NCCL and MPICH in conda build

    It seems that adding NCCL and MPICH to the conda build of SINGA works. However, this needs further checking, and other dependencies, such as the Python package "deprecated", still need to be added.

    ERROR: test_tensor (unittest.loader._FailedTest)
    ImportError: Failed to import test module: test_tensor
    Traceback (most recent call last):
      File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
        module = self._get_module_from_name(name)
      File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
      File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_tensor.py", line 24, in <module>
        from singa import tensor
      File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/singa/tensor.py", line 58, in <module>
        from deprecated import deprecated
    ModuleNotFoundError: No module named 'deprecated'
    FAIL: test_conv2d (test_mkldnn.TestPythonOperation)
    Traceback (most recent call last):
      File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_mkldnn.py", line 78, in test_conv2d
        self.assertAlmostEqual(4.0, _dW[0], places=5)
    AssertionError: 4.0 != 0.040000003 within 5 places
    Ran 13 tests in 0.003s
    FAILED (failures=1, errors=12)
    + exit 0
    Resource usage statistics from testing singa:
       Process count: 1
       CPU time: unavailable
       Memory: 3.0M
       Disk usage: 656B
       Time elapsed: 0:00:02.1
    TEST END: /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
    Renaming work directory,  /root/miniconda/conda-bld/singa_1583767754024/work  to  /root/miniconda/conda-bld/singa_1583767754024/work_moved_singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36_linux-64_main_build_loop
    # Automatic uploading is disabled
    # If you want to upload package(s) to anaconda.org later, type:
    anaconda upload /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
    # To have conda build upload to anaconda.org automatically, use
    # $ conda config --set anaconda_upload yes
    anaconda_upload is not set.  Not uploading wheels: []
    Resource usage summary:
    Total time: 0:07:23.3
    CPU usage: sys=0:00:06.5, user=0:02:17.0
    Maximum memory usage observed: 777.8M
    Total disk usage observed (not including envs): 252.4K
    Source and build intermediates have been left in /root/miniconda/conda-bld.
    There are currently 4 accumulated.
    To remove them, you can run the ```conda build purge``` command
    [email protected]:~/dcsysh/singa/tool/conda/singa#
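    At least the test_tensor error in the log above happens because the Python package `deprecated` is not installed in the conda test environment. A quick pre-flight check can list missing run-time dependencies before the test suite runs. The sketch below uses plain Python and is not part of SINGA. The module names other than `deprecated` are illustrative assumptions, not SINGA's actual dependency list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that are not installed.

    importlib.util.find_spec() returns None when a top-level module cannot
    be found, so the modules are located but not imported.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "deprecated" is the module reported missing in the log above; the other
    # names are illustrative assumptions, not SINGA's real dependency list.
    missing = missing_modules(["deprecated", "numpy", "onnx"])
    print("missing run-time dependencies:", ", ".join(missing) or "none")
```

    Only pass top-level names. For a dotted name, find_spec imports the parent package first.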
    opened by chrishkchris 40
  • Dev branch cpu training problem (with conv and pool)

    After merging PR #590, I am trying to solve the problem reported in PR #589, where the MNIST CNN example hangs. When I switch mnist_cnn to the CPU and run it, it hangs, so there is a problem with CPU training.

    opened by chrishkchris 29
  • Autograd Layer constructor

    The Layer class in Autograd is used to maintain the model parameters. It passes the parameters into the operations, so the operations are stateless.

    Typically the parameter size depends on the input and the layer configuration. Currently, we require the users to provide the input size in the layer constructor. Then we can create the parameter tensor and initialize it in the constructor, e.g., in the Linear layer. One potential problem is that the initialization operation may not be buffered. @XJDKC Is it an issue? Some layers, like RNN, are implemented using cuDNN. For these, we can get the input size, but the parameter size is unknown until the cuDNN handle is created. That does not happen until the data is forwarded through the layer.

    Another way is to delay the parameter tensor creation until the layer is called for forward propagation. At that time, we have the input tensor (and its device), so the layer constructor no longer needs the user to provide the input size. The drawback is that after the layer is created, the get_params() function would still fail to get the parameter tensors, because they are not created yet. @dcslin To switch to this approach, we need to change the constructors of existing layer classes and examples. We also need to pass an initializer function/class to the constructor for initializing the parameter tensors after they are created.

    Please add your comments.
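    The second (delayed) option can be illustrated with a small framework-independent sketch. `LazyLinear`, its plain-list "tensors" and the `(rows, cols)` initializer signature are hypothetical. SINGA's real layers operate on `Tensor` objects on a device. The initializer is passed to the constructor, and the parameters are created on the first forward call, once the input size is known.

```python
import random


def default_init(rows, cols):
    """Gaussian initializer with a small standard deviation (illustrative)."""
    return [[random.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)]


class LazyLinear:
    """Hypothetical layer that creates W and b on the first forward call."""

    def __init__(self, out_features, initializer=default_init):
        self.out_features = out_features
        self.initializer = initializer  # used later, once shapes are known
        self.W = None
        self.b = None

    def get_params(self):
        # The drawback discussed above: nothing to return before forward().
        if self.W is None:
            raise RuntimeError("parameters are created on the first forward call")
        return {"W": self.W, "b": self.b}

    def __call__(self, x):
        # x is a batch given as a list of rows (lists of floats).
        if self.W is None:
            in_features = len(x[0])  # the input size is known only now
            self.W = self.initializer(in_features, self.out_features)
            self.b = [0.0] * self.out_features
        return [
            [sum(v * self.W[i][j] for i, v in enumerate(row)) + self.b[j]
             for j in range(self.out_features)]
            for row in x
        ]
```

    For example, `LazyLinear(2, initializer=lambda r, c: [[1.0] * c for _ in range(r)])` applied to `[[1.0, 2.0, 3.0]]` creates a 3 x 2 W on that call and returns `[[6.0, 6.0]]`.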

    opened by nudles 28
  • Add save and load method for Module class

    Updated on May 15

    class Layer:
        def get_params(self):
            """Return the params of this layer and its sublayers as a dict.

            A param name has the form layername.param, e.g., for
            self.W = Tensor() and self.b = Tensor() in layer conv1,
            the names are conv1.W and conv1.b.
            """

        def get_states(self):
            """Return the states of this layer and its sublayers that are
            necessary for model evaluation/inference.

            The states include the params and others, e.g., the running
            mean and var of batchnorm.
            """

    class Module(Layer):
        def compile(self, *args, **kwargs):
            """Set the name of each layer and sublayer, which is used to
            create the dicts for get_params and get_states, so there is no
            need to configure the layer names manually in the __init__
            method of a layer. For instance,

                class Blk(Layer):
                    def __init__(self):
                        self.conv1 = Conv2d()
                        self.conv2 = Conv2d()

                class MyModel(Module):
                    def __init__(self):
                        self.blk1 = Blk()  # --> blk1.conv1, blk1.conv2
                        self.blk2 = Blk()  # --> blk2.conv1, blk2.conv2
            """

        # high priority
        def save(self, fpath, ckp_states=None):
            """Save the model and optionally some states.

            fpath: output file path (without the extension)
            ckp_states (dict): states for checkpoint that are not attributes
                of Module, e.g., epoch ID.
            """
            # cust_states = {}
            # if ckp_states is not None:
            #     cust_states = ckp_states + model (incl. sublayers) attributes - get_states()
            # save the model states via ONNX, with a customized field for cust_states
            ...

        def load(self, fpath, dev, use_graph, graph_alg):
            """Load the model onto dev.

            fpath: input file path (without the extension)
            Returns a dict for the ckp_states.
            """
            # load the model states + cust_states
            # model attributes = model states + attributes from cust_states
            # restore the model attributes
            # return the rest of the states as a dict
            ...

    # lower priority
    def save(fpath, model, ckp_states):
        # attributes <-- model
        # replace all tensors in attributes + ckp_states with dict name --> (shape, dtype)
        # dump the tensors via numpy.savez_compressed
        # dump the model via pickle
        ...

    def load(fpath, dev, use_graph, graph_alg):
        # load the model via pickle
        # load the tensors via numpy.load
        # restore the tensors
        # return the ckp_states
        ...


    • Params: layer parameters (Tensor) that are updated via SGD. Layer.get_params()
    • States: Params + other variables that are necessary for model evaluation/inference. Superset of params. Layer.get_states()
    • Attributes: members of a class instance class.__dict__. Superset of states.
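    The naming scheme and the params/states distinction can be sketched without SINGA. This is a hypothetical, minimal sketch with these assumptions:

    • Tensors are plain lists wrapped in a stand-in `Tensor` class with an `is_param` flag.
    • Names come from walking attributes (as `compile` is meant to arrange).
    • Checkpoints are pickled to bytes rather than written to `fpath`.

    It does not follow the ONNX-based high-priority design or the numpy.savez_compressed part of the lower-priority one.

```python
import io
import pickle


class Tensor:
    """Hypothetical stand-in for a SINGA Tensor; `value` is a plain list."""

    def __init__(self, value, is_param=True):
        self.value = value
        self.is_param = is_param  # False for states such as running means


class Layer:
    def _collect(self, prefix, params_only):
        out = {}
        for name, attr in vars(self).items():
            full_name = prefix + name
            if isinstance(attr, Tensor) and (attr.is_param or not params_only):
                out[full_name] = attr.value
            elif isinstance(attr, Layer):  # recurse into sublayers
                out.update(attr._collect(full_name + ".", params_only))
        return out

    def get_params(self):
        # Names follow the "layername.param" scheme, e.g. blk1.conv1.W.
        return self._collect("", params_only=True)

    def get_states(self):
        # States = params + other tensors needed for inference.
        return self._collect("", params_only=False)


class Conv2d(Layer):
    def __init__(self):
        self.W = Tensor([0.0, 0.0])
        self.b = Tensor([0.0])


class BatchNorm(Layer):
    def __init__(self):
        self.scale = Tensor([1.0])
        self.running_mean = Tensor([0.0], is_param=False)


class Blk(Layer):
    def __init__(self):
        self.conv1 = Conv2d()
        self.bn = BatchNorm()


class MyModel(Layer):
    def __init__(self):
        self.blk1 = Blk()
        self.blk2 = Blk()


def save(model, ckp_states=None):
    """Serialize the model states and extra checkpoint states (e.g. epoch ID)."""
    buf = io.BytesIO()
    pickle.dump({"states": model.get_states(), "ckp": dict(ckp_states or {})}, buf)
    return buf.getvalue()


def load(model, data):
    """Restore the states into `model` and return the checkpoint states."""
    blob = pickle.loads(data)
    for full_name, value in blob["states"].items():
        *path, leaf = full_name.split(".")
        owner = model
        for part in path:  # walk blk1 -> conv1 -> ...
            owner = getattr(owner, part)
        getattr(owner, leaf).value = value
    return blob["ckp"]
```

    Here `blk1.bn.running_mean` appears in `get_states()` but not in `get_params()`, matching the definitions above. The epoch ID travels only in the checkpoint dict returned by `load`.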
    opened by nudles 27
  • Refactor sonnx, test cases and examples


    1. sonnx to suit the new API (almost done),
    2. sonnx to use the new autograd API,
    3. test cases,
    4. debug test operations,
    5. debug sonnx backend test cases,
    6. examples,
    7. add re-training examples,
    8. (if time is enough) add more examples (will move to another PR)
    opened by joddiy 24
  • Refactor autograd module

    #688 is refactoring the autograd module. Here are some comments about the current APIs in autograd.

    1. Relationship between the classes and functions in autograd.
       • Operator implements the forward and backward methods for autograd. For each Operator class, there is a function that creates an Operator instance and calls its forward method.
       • Layer stores the states (handles and parameters) and calls the Operator function for the real computation. A layer class can have sub-layers (as states) for creating complex and deep models.
       • Issue: a network created with the Module API has both stateless (e.g., flatten) and stateful (e.g., conv2d) operations. Currently, we create layers in the __init__ method of Module and call both the layers and the operator functions in the forward method. So Layer and Operator are mixed, which may confuse the users.
       • Proposal: use Layer instances only. For every operator, we create a corresponding layer class to replace the operator function.

    2. Layer API. Issue: when and how should the parameters (and handle) of a layer be initialized?

    • Do the initialization in the __init__ method OR when the data is forwarded for the first time (#674).
    • Pass an initializer function to the __init__ method of a layer and use it to initialize the parameters, OR pass it to the __init__ method of the Module class instead. In the second case, the Module forwards through the layers once and then uses the initializer on the layers' parameters (through get_params). This requires the Module class's __init__ to do a forward pass of all layers and then call get_params of each layer. To do that, it needs at least the shapes of the input tensors and the device. The drawback of the first approach is that we need to include the initializer in every Layer constructor.

    Comments are welcome.
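    Both proposals can be sketched together in plain Python. This is hypothetical code under simplifying assumptions, not SINGA's autograd:

    • Tensors are nested lists.
    • The device is ignored.
    • `Flatten` wraps a stateless operator function so that a model is built from Layer instances only (point 1).
    • `Module` does one forward pass on a dummy input of a given shape and then applies an in-place initializer to every parameter returned by `get_params` (the second approach of point 2).

```python
def flatten(x):
    """Stateless operator function: flatten every sample of a batch."""
    def _flat(v):
        return [a for item in v for a in _flat(item)] if isinstance(v, list) else [v]
    return [_flat(sample) for sample in x]


def zeros(shape):
    """Nested list of zeros with the given shape (stands in for a Tensor)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


class Layer:
    def get_params(self):
        return {}


class Flatten(Layer):
    """Layer wrapper for the stateless flatten operator (point 1)."""

    def __call__(self, x):
        return flatten(x)


class Linear(Layer):
    def __init__(self, out_features):
        self.out_features = out_features
        self.W = None  # created on the first call

    def get_params(self):
        return {} if self.W is None else {"W": self.W}

    def __call__(self, x):
        if self.W is None:
            self.W = zeros((len(x[0]), self.out_features))
        return [[sum(v * self.W[i][j] for i, v in enumerate(row))
                 for j in range(self.out_features)] for row in x]


class Module(Layer):
    """Point 2, second approach: forward a dummy input once so that every
    layer creates its parameters, then initialize them via get_params."""

    def __init__(self, layers, input_shape, initializer):
        self.layers = layers
        self.forward(zeros(input_shape))
        for layer in self.layers:
            for param in layer.get_params().values():
                initializer(param)  # in-place initialization

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```

    As a usage example, take an initializer that fills each row of W with ones. Then `Module([Flatten(), Linear(3)], input_shape=(2, 2, 2), initializer=...)` creates a 4 x 3 W during construction, and forwarding `[[[1.0, 2.0], [3.0, 4.0]]]` returns `[[10.0, 10.0, 10.0]]`. Note the trade-off named in the issue: the Module needs the input shape up front.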

    opened by nudles 23
  • SINGA-505 Computational graph with memory optimization


    This PR adds the computational graph with memory optimization. It is based on the code developed by @chrishkchris and @XJDKC and some discussions with @nudles.


    There are four main features in this PR, namely the construction of the computational graph, lazy allocation, automatic recycling, and the synchronization pipeline. Details are as follows:

    • Computational graph construction: Construct a computational graph based on the user-defined neural network or expressions and then run the graph to accomplish the training task. The computational graph also includes operations like synch and fused synch in the communicator.
    • Lazy allocation: When blocks need to be allocated, devices do not allocate memory for them immediately. Memory is allocated only when an operation uses the block for the first time.
    • Automatic recycling: Automatically deallocate the intermediate tensors which won't be used again in the following operations when we are running the graph in an iteration.
    • Synchronization pipeline: In previous synchronization operations, buffers were used to synchronize multiple tensors at once. But the communicator needs to collect all the tensors before copying them into the buffer. Synchronization pipeline can copy tensors to the buffer separately, which reduces the time for synchronous operations.

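    Lazy allocation, as summarized above, can be sketched in a few lines. This is a hypothetical Python model of the idea, not SINGA's C++ `Block` and `Device` classes. The block stores its device and size instead of a pointer, and the first access triggers the allocation. `free_data` stands in for the hook that automatic recycling would call.

```python
class Device:
    """Hypothetical device that tracks how many bytes are allocated."""

    def __init__(self):
        self.allocated_bytes = 0

    def malloc(self, size):
        self.allocated_bytes += size
        return bytearray(size)

    def free(self, size):
        self.allocated_bytes -= size


class Block:
    """Keeps its device and size; memory is allocated on first access."""

    def __init__(self, device, size):
        self.device = device  # stored instead of a pointer into a mempool
        self.size = size
        self._data = None

    def data(self):
        if self._data is None:  # first access triggers the allocation
            self._data = self.device.malloc(self.size)
        return self._data

    def free_data(self):
        # Called by automatic recycling once no later op needs the block.
        if self._data is not None:
            self.device.free(self.size)
            self._data = None
```

    Creating many blocks therefore costs no device memory until operations actually touch them.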

    1. Computational graph construction
      • Use delayed execution to run the forward and backward propagation once without actually executing the operations. Buffer all the operations and the tensors read or written by each operation.
      • Calculate the dependencies between all the operations to decide the order of execution. (Supports directed cyclic graphs.)
      • Execute all the operations in the order just calculated to update all the parameters.
      • The system analyzes the same graph only once. If new operations are added to the graph, the computational graph is re-analyzed.
      • Provide a Module class so that users can use this feature more conveniently.
    2. Lazy allocation
      • When a device needs to create a new block, just pass the device to that block instead of allocating a piece of memory from the mempool and passing the pointer to that block.
      • When the block is accessed for the first time, let the device corresponding to the block allocate memory and then access it.
    3. Automatic recycling
      • When calculating dependencies between the operations during the graph construction, the reference count of tensors can also be calculated.
      • When an operation is completed, we can decrease the reference count of tensors the operation used.
      • If a tensor's reference count reaches zero, no later operation will access it, so its memory can be recycled.
      • The program will track the usage of the block. If a block is used on the python side, it will not be recycled, which is convenient for debugging on the python side.
    4. Synchronization pipeline
      • If a tensor needs fused synchronization, it is copied to the buffer immediately, without waiting for all the tensors to be gathered. Because the copy is done earlier, the actual synchronization takes less time, which improves GPU utilization.
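    The graph construction and automatic recycling steps above can be sketched in plain Python. This is a simplified, hypothetical model, not SINGA's C++ scheduler:

    • Each buffered op writes exactly one named tensor.
    • Only read-after-write dependencies are tracked.
    • Ops run in BFS order (Kahn's algorithm).
    • The graph is assumed to be acyclic, although the text says SINGA's scheduler also supports cyclic graphs.
    • Reference counts are the number of pending reads. Tensors listed in `keep` play the role of blocks still used on the Python side and are never recycled.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.ops = []  # buffered (name, fn, reads, write) tuples

    def add_op(self, name, fn, reads, write):
        # Delayed execution: record the op and the tensors it touches.
        self.ops.append((name, fn, tuple(reads), write))

    def schedule(self):
        """Return op indices in BFS (Kahn) order plus per-tensor read counts."""
        writer = {write: i for i, (_, _, _, write) in enumerate(self.ops)}
        children = [[] for _ in self.ops]
        indegree = [0] * len(self.ops)
        refcount = {}
        for i, (_, _, reads, _) in enumerate(self.ops):
            for t in reads:
                refcount[t] = refcount.get(t, 0) + 1
                if t in writer:  # read-after-write dependency
                    children[writer[t]].append(i)
                    indegree[i] += 1
        queue = deque(i for i, d in enumerate(indegree) if d == 0)
        order = []
        while queue:
            i = queue.popleft()
            order.append(i)
            for c in children[i]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    queue.append(c)
        if len(order) != len(self.ops):
            raise ValueError("this sketch only handles acyclic graphs")
        return order, refcount

    def run(self, inputs, keep=()):
        """Execute the graph once; free tensors no later op will read."""
        order, refcount = self.schedule()
        env = dict(inputs)
        freed = []
        for i in order:
            _, fn, reads, write = self.ops[i]
            env[write] = fn(*(env[t] for t in reads))
            for t in reads:
                refcount[t] -= 1
                if refcount[t] == 0 and t not in keep:
                    del env[t]  # automatic recycling
                    freed.append(t)
        return env, freed
```

    Ops can be buffered in any order. For instance, a `relu` reading `h` can be added before the `mul` that produces `h`; the schedule still runs `mul` first. After the run, the intermediates `h` and `r` are freed, while the inputs and the output in `keep` survive.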


    • Tensor&Operation&Communicator
      • Change the capture type of tensors in lambda expressions to achieve delayed execution.
      • Change the type of input and output parameters to ensure that the input and output of the operation are tensors.
      • Submit synchronous operations to the device for execution.
    • Device: Add code for
      • Buffering operations
      • RunGraph
    • Scheduler
      • Add operations in the graph to construct the graph.
      • Analyse the graph, calculating dependencies and deciding the reallocation of the blocks.
      • Execute the operations in the graph.
    • Block: Add a member variable of type device to help to do the lazy allocation. Add a function to help to do automatic recycling.
    • Communicator and opt.py: Support the synchronization pipeline.
    • Swig: add some interfaces
    • Module: Provide a Module class on the python side for users to use the graph more conveniently.
    • Examples: Add some examples that buffer operations by using the Module class.


    Single node

    • Experiment settings
    • Explanation
      • s: seconds
      • it: iterations
      • Memory: peak memory usage of a single GPU
      • Throughput: number of images processed per second
      • Time: total time
      • Speed: iterations per second
      • Reduction: memory usage reduction rate compared with the dev branch
      • Speedup: speedup ratio compared with the dev branch
    • Result:
    Batch size  Case                    Memory (MB)  Time (s)  Speed (it/s)  Throughput  Reduction  Speedup
    16          dev                     4975         14.1952   14.0893       225.4285    0.00%      1.0000
    16          PR: no graph            4995         14.1264   14.1579       226.5261    -0.40%     1.0049
    16          PR: with graph, bfs     3283         13.7438   14.5520       232.8318    34.01%     1.0328
    16          PR: with graph, serial  3265         13.7420   14.5540       232.8635    34.37%     1.0330
    32          dev                     10119        13.4587   7.4302        237.7649    0.00%      1.0000
    32          PR: no graph            10109        13.2952   7.5315        240.6875    0.10%      1.0123
    32          PR: with graph, bfs     6839         13.1059   7.6302        244.1648    32.41%     1.0269
    32          PR: with graph, serial  6845         13.0489   7.6635        245.2312    32.35%     1.0314

    Multi processes

    • Experiment settings
    • Explanation: the same as above
    • Result:
    Batch size  Case                    Memory (MB)  Time (s)  Speed (it/s)  Throughput  Reduction  Speedup
    16          dev                     5439         17.3323   11.5391       369.2522    0.00%      1.0000
    16          PR: no graph            5427         17.8232   11.2213       359.0831    0.22%      0.9725
    16          PR: with graph, bfs     3389         18.2310   10.9703       351.0504    37.69%     0.9725
    16          PR: with graph, serial  3437         17.0389   11.7378       375.6103    36.69%     1.0172
    32          dev                     10547        14.8635   6.7684        237.7649    0.00%      1.0000
    32          PR: no graph            10503        14.7746   6.7684        433.1748    0.42%      1.0060
    32          PR: with graph, bfs     6935         14.8553   6.7384        433.1748    34.25%     1.0269
    32          PR: with graph, serial  7027         14.3271   6.9798        446.7074    33.37%     1.0374
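    The derived columns appear to follow simple definitions. The PR does not state them; we inferred them from the numbers:

    • Speed = iterations / Time
    • Throughput = Speed × images per iteration
    • Reduction = (dev memory − memory) / dev memory
    • Speedup = dev time / Time

    For the single-node rows, the values are consistent with 200 iterations at batch size 16 and 100 at batch size 32. The helper below is a hypothetical sketch for recomputing them. Applied to the multi-process table, it does not reproduce a few cells. For example, the Speedup of the "with graph, bfs" rows (0.9725 and 1.0269) differs from the ratio of the reported times (about 0.9507 and 1.0006), which may indicate copy-paste errors in the original table.

```python
def derived_metrics(time_s, memory_mb, dev_time_s, dev_memory_mb,
                    iterations, images_per_iteration):
    """Recompute the derived columns from the measured time and memory."""
    speed = iterations / time_s  # it/s
    return {
        "speed": speed,
        "throughput": speed * images_per_iteration,  # images/s
        "reduction": (dev_memory_mb - memory_mb) / dev_memory_mb,
        "speedup": dev_time_s / time_s,
    }


# Single node, batch size 16, "PR: with graph, bfs" compared with "dev".
m = derived_metrics(13.7438, 3283, 14.1952, 4975,
                    iterations=200, images_per_iteration=16)
print(f"{m['speed']:.4f} it/s, {m['reduction']:.2%} less memory, {m['speedup']:.4f}x")
# prints: 14.5520 it/s, 34.01% less memory, 1.0328x
```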

    From the tables above:

    • When the graph is disabled, this PR has little effect on training time and memory usage (it is backward compatible).
    • When the graph is enabled, peak memory usage drops by about 32% to 38%. In most cases, training time is similar to or slightly lower than on the dev branch, with speedups of up to about 1.04.

    How to use

    • An example of CNN:
    from singa import autograd, module

    class CNN(module.Module):
        def __init__(self, optimizer):
            super(CNN, self).__init__()
            self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
            self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
            self.linear1 = autograd.Linear(4 * 4 * 50, 500)
            self.linear2 = autograd.Linear(500, 10)
            self.pooling1 = autograd.MaxPool2d(2, 2, padding=0)
            self.pooling2 = autograd.MaxPool2d(2, 2, padding=0)
            self.optimizer = optimizer

        def forward(self, x):
            y = self.conv1(x)
            y = autograd.relu(y)
            y = self.pooling1(y)
            y = self.conv2(y)
            y = autograd.relu(y)
            y = self.pooling2(y)
            y = autograd.flatten(y)
            y = self.linear1(y)
            y = autograd.relu(y)
            y = self.linear2(y)
            return y

        def loss(self, x, ty):
            return autograd.softmax_cross_entropy(x, ty)

        def optim(self, loss):
            # Backward propagation and parameter update; assumes the optimizer
            # provides backward_and_update() (as SINGA's opt.SGD does).
            self.optimizer.backward_and_update(loss)

    # Initialize other objects (device, optimizer sgd, input tensors tx and ty, ...)
    # ......
    model = CNN(sgd)

    # Train
    for b in range(num_train_batch):
        # Generate the batch data for this iteration
        # ......
        # Copy the batch data into the input tensors
        # ......
        # Train the model
        out = model(tx)
        loss = model.loss(out, ty)
        model.optim(loss)


    • [ ] Computation graph optimization: replace a subgraph of the input computation graph with another subgraph which is functionally equivalent to the original one.
    • [ ] Support recalculation and swapping out variables from the GPU.
    • [ ] Perform operations in the graph in the order of DFS.
    • [ ] Perform operations in parallel.
    opened by XJDKC 20
  • AssertionError for the ONNX training testcases?

    AssertionError with the onnx testcase: https://github.com/apache/singa/blob/master/examples/onnx/training/train.py

    $ cd examples/onnx
    $ python3 training/train.py --model vgg16

    Then I get the following error msg:

    File "training/train.py", line 437, in <module>
        args.onnx_model_path, args.data, sgd, args.graph, args.verbosity)
      File "training/train.py", line 295, in run
        model.compile([tx], is_train=True, use_graph=graph, sequential=sequential)
      File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/model.py", line 177, in compile
      File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 63, in wrapper
        return func(self, *args, **kwargs)
      File "training/train.py", line 191, in forward
        y = self.linear(y)
      File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 110, in __call__
        return self.forward(*args, **kwargs)
      File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 61, in wrapper
        self.initialize(*args, **kwargs)
      File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 45, in wrapper
        'initialize function expects PlaceHolders or Tensors')
    AssertionError: initialize function expects PlaceHolders or Tensors

    Could something be wrong with the layer initialization?

    SINGA version: 3100 (the latest build from the source code of the master branch), Python version: 3.5.2, ONNX version: 1.5.0

    opened by lijiansong 18
  • Clang-formatter results in different formatting?

    I used clang-format in VS Code after altering tensor.h, and it produces formatting that differs from the dev branch.

    tensor.cc should already have been reformatted in PR #581. So, did I use an incorrect setting in clang-format?

    opened by chrishkchris 17
  • Is there any runtime problem of onnx in Travis CI built SINGA CPU version related to libprotobuf.so.20?

    In the log of the Travis CI build of the CPU version, test_onnx fails because libprotobuf.so.20 cannot be loaded: https://travis-ci.org/github/apache/singa/jobs/664251025#L3998

    ERROR: test_onnx (unittest.loader._FailedTest)
    ImportError: Failed to import test module: test_onnx
    Traceback (most recent call last):
      File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
        module = self._get_module_from_name(name)
      File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
      File "/home/travis/conda-bld-1971.5/singa_1584596418932/test_tmp/test/python/test_onnx.py", line 24, in <module>
        from singa import sonnx
      File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/singa/sonnx.py", line 23, in <module>
        import onnx.utils
      File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/onnx/__init__.py", line 8, in <module>
        from .onnx_cpp2py_export import ONNX_ML
    ImportError: libprotobuf.so.20: cannot open shared object file: No such file or directory
    opened by chrishkchris 14
  • Add CodeQL workflow

    Hi apache/singa!

    This is not an automatic, 🤖-generated PR. As you can check in my GitHub profile, I work for GitHub and am part of the GitHub Security Lab, which is helping out with the migration of LGTM configurations to code scanning. You might have heard that we've integrated LGTM's underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository. Take a look! We tested it before opening this pull request, so all should be working ✔️. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by pwntester 0
  • Bump protobuf-java from 2.6.1 to 3.16.3 in /java

    Bumps protobuf-java from 2.6.1 to 3.16.3.

    Release notes

    Sourced from protobuf-java's releases.

    Protobuf Release v3.16.3


    • Refactoring java full runtime to reuse sub-message builders and prepare to migrate parsing logic from parse constructor to builder.
    • Move proto wireformat parsing functionality from the private "parsing constructor" to the Builder class.
    • Change the Lite runtime to prefer merging from the wireformat into mutable messages rather than building up a new immutable object before merging. This way results in fewer allocations and copy operations.
    • Make message-type extensions merge from wire-format instead of building up instances and merging afterwards. This has much better performance.
    • Fix TextFormat parser to build up recurring (but supposedly not repeated) sub-messages directly from text rather than building a new sub-message and merging the fully formed message into the existing field.
    • This release addresses a Security Advisory for Java users

    Protocol Buffers v3.16.1


    • Improve performance characteristics of UnknownFieldSet parsing (#9371)

    Protocol Buffers v3.16.0


    • Fix compiler warnings issue found in conformance_test_runner #8189 (#8190)
    • Fix MinGW-w64 build issues. (#8286)
    • [Protoc] C++ Resolved an issue where NO_DESTROY and CONSTINIT are in incorrect order (#8296)
    • Fix PROTOBUF_CONSTINIT macro redefinition (#8323)
    • Delete StringPiecePod (#8353)
    • Fix gcc error: comparison of unsigned expression in '>= 0' is always … (#8309)
    • Fix cmake install on iOS (#8301)
    • Create a CMake option to control whether or not RTTI is enabled (#8347)
    • Fix endian.h location on FreeBSD (#8351)
    • Refactor util::Status (#8354)
    • Make util::Status more similar to absl::Status (#8405)
    • Fix -Wsuggest-destructor-override for generated C++ proto classes. (#8408)
    • Refactor StatusOr and StringPiece (#8406)
    • Refactor uint128 (#8416)
    • The ::pb namespace is no longer exposed due to conflicts.
    • Allow MessageDifferencer::TreatAsSet() (and friends) to override previous calls instead of crashing.
    • Reduce the size of generated proto headers for protos with string or bytes fields.
    • Move arena() operation on uncommon path to out-of-line routine
    • For iterator-pair function parameter types, take both iterators by value.
    • Code-space savings and perhaps some modest performance improvements in RepeatedPtrField.
    • Eliminate nullptr check from every tag parse.
    • Remove unused _$name$cached_byte_size fields.
    • Serialize extension ranges together when not broken by a proto field in the middle.
    • Do out-of-line allocation and deallocation of string object in ArenaString.

    ... (truncated)

    • b8c2488 Updating version.json and repo version numbers to: 16.3
    • 42e47e5 Refactoring Java parsing (3.16.x) (#10668)
    • 98884a8 Merge pull request #10556 from deannagarcia/3.16.x
    • 450b648 Cherrypick ruby fixes for monterey
    • b17bb39 Merge pull request #10548 from protocolbuffers/3.16.x-202209131829
    • c18f5e7 Updating changelog
    • 6f4e817 Updating version.json and repo version numbers to: 16.2
    • a7d4e94 Merge pull request #10547 from deannagarcia/3.16.x
    • 55815e4 Apply patch
    • 152d7bf Update version.json with "lts": true (#10535)
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies java 
    opened by dependabot[bot] 1
  • Update documentation for distributed training

    We need to update the documentation for distributed training at https://singa.apache.org/docs/dist-train/.

    For example, add explanations of conda package installation, optimizer selection, and how different data copies are handled on different machines.

    opened by lzjpaul 0
  • Adding GPU testing to Github workflows

    This issue is open to discuss different options for adding GPU build and test to Github workflows.

    To enable this feature, SINGA must provide a real or virtual machine with a GPU as the host machine for running the workflow, and then use the self-hosted runner feature of GitHub Actions. See also this MLOps video tutorial.

    The team needs to make some decisions:

    1. Which machine(s) should we use? (e.g. virtual machine(s) on AWS, dedicated server(s) at NUS, ...)
    2. Which operating systems should we test on? (Only Linux, or also Mac?)
    3. When should we run the GPU build and test workflow? (With every pull request? Once per day at night? Once per week? Only on the master branch? ...)
    4. Should we keep the machines always on, or start them only when a scheduled test is running and shut them down when no workflow is running? This assumes we run the GPU build and test only at scheduled times.
    5. Should we add workflows to run the examples and test the Jupyter notebooks? Note that some examples may take hours or days to complete training, but automating the testing of examples can greatly speed up development.

    What do you think?

    opened by moazreyad 21
  • Adding dependabot support

    Dependabot is a native GitHub tool that automatically checks a project's dependencies and creates pull requests for outdated or insecure ones. See how it works here.

    Currently it analyzes only the dependencies defined in pom.xml in SINGA, because the other dependencies are hard-coded in configuration files or shell scripts, which this tool does not support. In future commits, these dependencies will be moved to standard files such as Python's requirements.txt so that they become visible to Dependabot and similar tools.

    opened by moazreyad 5
  • Raise and handle exceptions in CPP code

    Currently, we abort the program whenever a check fails via glog's CHECK functions, and we do not catch any exceptions, such as memory exceptions or cuDNN exceptions.

    As a result, the program aborts or crashes whenever there is an error or exception, which sometimes shuts down the Jupyter or Colab notebook when the code runs in a notebook environment.

    This ticket is to raise and handle exceptions in the C++ code.

    ref: http://www.swig.org/Doc3.0/SWIGDocumentation.html#Customization_exception

    opened by nudles 0
  • 3.0.0 (Apr 21, 2020)

  • 3.0.0.rc1 (Apr 8, 2020)

    This release includes the following changes:

    • Code quality has been improved by introducing linting checks in CI and an auto code formatter. For linting, the tools cpplint and pylint are used and configured to comply with the Google coding styles; details are in tool/linting/. Similarly, the formatting tools clang-format and yapf, configured with the Google coding styles, are recommended for developers to clean up code before submitting changes; details are in tool/code-format/. LGTM is enabled on GitHub for code quality checks, and a license check is also enabled.

    • New Tensor APIs are added for naming consistency and feature enhancement:

      • size(), mem_size(), get_value(), to_proto(), l1(), l2(): added for the sake of naming consistency
      • AsType(): converts the data type between float and int
      • ceil(): performs an element-wise ceiling of the input
      • concat(): concatenates two tensors
      • index selector: e.g. tensor1[:,:,1:,1:]
      • softmax(in, axis): performs softmax along the given axis of a multi-dimensional tensor
    • 14 new operators are added into the autograd module: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their unit tests are added as well.

    • 14 new operators are added to sonnx module for both backend and frontend: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their tests are added as well.

    • Some ONNX models are imported into SINGA, including Bert-squad, Arcface, FER+ Emotion, MobileNet, ResNet18, Tiny Yolov2, Vgg16, and Mnist.

    • Some operators now support multidirectional broadcasting, including Add, Sub, Mul, Div, Pow, PRelu, and Gemm.

    • Distributed training with communication optimization: DistOpt implements multiple optimization techniques, including gradient sparsification, chunk transmission, and gradient compression.

    • Computational graph construction at the C++ level. Operations submitted to the Device are buffered; after their dependencies are analyzed, a computational graph is created and further analyzed for speed and memory optimization. To enable this feature, use the Module API.

    • New website based on Docusaurus. The documentation files are moved to a separate repo, singa-doc (https://github.com/apache/singa-doc). The static website files are stored at singa-site.

    • DNNL (Deep Neural Network Library), from Intel, is integrated into model/operations/[batchnorm|pooling|convolution]; the change is transparent to end users. The current version is DNNL v1.1, which replaces the previous integration of MKL-DNN v0.18. DNNL can boost the performance of deep learning operations executed on CPU. The DNNL dependency is installed through conda.

    • Some Tensor APIs are marked as deprecated because they can be replaced by broadcasting, which better supports multi-dimensional operations. These APIs are add_column(), add_row(), div_column(), div_row(), mult_column(), and mult_row().

    • Conv and Pooling are enhanced to support fine-grained padding such as (2,3,2,3), the SAME_UPPER and SAME_LOWER pad modes, and shape checking.

    • Reconstructed sonnx:

      • Support two types of weight values (Initializer and Constant Node);
      • For some operators (BatchNorm, Reshape, Clip, Slice, Gather, Tile, OneHot), move some inputs to their attributes;
      • Define and implement the type conversion map.
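    The multidirectional broadcasting listed for Add, Sub, Mul, Div, Pow, PRelu, and Gemm, and the broadcasting that supersedes the deprecated add_column()/add_row() family, follow the NumPy-style rule that ONNX also adopts: shapes are aligned from the trailing axis, and each pair of sizes must be equal or one of them must be 1. The sketch below computes the resulting shape in plain Python; broadcast_shape is a hypothetical helper for illustration, not a SINGA API.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Shape produced by multidirectional (NumPy/ONNX-style) broadcasting.

    Dimensions are compared from the trailing axis; a missing dimension is
    treated as 1, and two sizes are compatible if they are equal or one is 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast together")
        # A size-1 dimension stretches to match the other operand.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


# Adding a length-3 vector to every column of a 3x4 matrix: (3, 4) with (3, 1).
print(broadcast_shape((3, 4), (3, 1)))  # (3, 4)
# Adding a length-4 vector to every row of a 3x4 matrix: (3, 4) with (4,).
print(broadcast_shape((3, 4), (4,)))    # (3, 4)
```

    Because one rule covers both the row-wise and the column-wise case for any rank, a single broadcasting operator can replace the specialized *_column() and *_row() methods.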
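    Gradient sparsification, one of the techniques listed for DistOpt, transmits only a subset of gradient entries per step. The sketch below shows a generic top-k variant with error feedback in plain Python: the k largest-magnitude entries would be sent, and the rest accumulate in a local residual that is added back at the next step. It illustrates the general technique under these assumptions only; DistOpt's actual selection rule (for example, a threshold instead of top-k) and its API may differ, and topk_sparsify is a hypothetical name.

```python
import heapq


def topk_sparsify(grad, residual, k):
    """Generic top-k gradient sparsification with error feedback (illustrative).

    grad and residual are equal-length lists of floats: the flattened local
    gradient and the values left unsent in earlier steps. Returns
    (indices, values, new_residual); only the k (index, value) pairs would be
    communicated, and everything else is carried over to the next step.
    """
    # Add back what was not transmitted previously, so no update is lost.
    acc = [g + r for g, r in zip(grad, residual)]
    # Pick the k entries with the largest magnitude, reported in index order.
    indices = sorted(heapq.nlargest(k, range(len(acc)), key=lambda i: abs(acc[i])))
    values = [acc[i] for i in indices]
    # Transmitted entries are cleared; the rest stay in the residual.
    new_residual = list(acc)
    for i in indices:
        new_residual[i] = 0.0
    return indices, values, new_residual


idx, vals, res = topk_sparsify([0.1, -2.0, 0.5, 3.0], [0.0] * 4, k=2)
print(idx, vals)  # [1, 3] [-2.0, 3.0]
print(res)        # [0.1, 0.0, 0.5, 0.0]
```

    Keeping the residual is what lets small gradient entries eventually be applied instead of being dropped for good.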
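    The buffering-then-analysis step described for computational graph construction can be modeled in a few lines of Python: each buffered operation records which memory blocks it reads and writes, dependencies come from read-after-write, write-after-write, and write-after-read hazards, and Kahn's topological sort yields a valid execution order. This is a hypothetical sketch of the idea only; the release notes place the real mechanism at the C++ level inside the Device, and build_and_schedule and the (name, reads, writes) op format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def build_and_schedule(ops):
    """Derive dependencies for buffered ops and return (execution order, deps).

    ops is a list of (name, reads, writes) tuples in submission order, where
    reads and writes are iterables of block identifiers (hypothetical format).
    """
    n = len(ops)
    last_writer = {}               # block -> index of the most recent writer
    readers = defaultdict(list)    # block -> readers since the last write
    deps = [set() for _ in range(n)]
    for i, (_, reads, writes) in enumerate(ops):
        for b in reads:            # read-after-write
            if b in last_writer:
                deps[i].add(last_writer[b])
        for b in writes:           # write-after-write and write-after-read
            if b in last_writer:
                deps[i].add(last_writer[b])
            deps[i].update(readers[b])
        deps[i].discard(i)
        for b in reads:
            readers[b].append(i)
        for b in writes:
            last_writer[b] = i
            readers[b] = []
    # Kahn's algorithm; edges only point to earlier ops, so there are no cycles.
    children = defaultdict(list)
    indegree = [len(d) for d in deps]
    for i, d in enumerate(deps):
        for j in d:
            children[j].append(i)
    ready = deque(i for i in range(n) if indegree[i] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(ops[i][0])
        for child in sorted(children[i]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order, deps


ops = [
    ("a", [], ["x"]),          # produces x
    ("b", ["x"], ["y"]),       # needs x, so runs after a
    ("c", [], ["z"]),          # independent of a and b
    ("d", ["y", "z"], ["w"]),  # needs both b and c
]
order, deps = build_and_schedule(ops)
print(order)  # ['a', 'c', 'b', 'd']
```

    Once the graph exists, independent operations such as c can be reordered or overlapped, which is the kind of freedom the speed and memory analysis relies on.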
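    For the SAME_UPPER and SAME_LOWER pad modes, the ONNX operator specification pads each spatial axis so that the output size is ceil(input / stride), splits the total padding as evenly as possible, and puts the odd extra pixel at the end (SAME_UPPER) or at the beginning (SAME_LOWER). The sketch below computes that split for one axis in plain Python; it assumes SINGA follows the ONNX definition, and same_padding is a hypothetical helper rather than a SINGA function.

```python
def same_padding(in_size, kernel, stride=1, dilation=1, mode="SAME_UPPER"):
    """Return (pad_begin, pad_end) for one spatial axis under ONNX-style auto_pad."""
    out_size = (in_size + stride - 1) // stride      # integer ceil(in_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1   # kernel extent including dilation gaps
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    if mode == "SAME_UPPER":
        return small, large   # odd extra pixel goes at the end
    if mode == "SAME_LOWER":
        return large, small   # odd extra pixel goes at the beginning
    raise ValueError(f"unsupported auto_pad mode: {mode}")


print(same_padding(5, 2))                     # (0, 1)
print(same_padding(5, 2, mode="SAME_LOWER"))  # (1, 0)
print(same_padding(7, 3, stride=2))           # (1, 1)
```

    Computing the two sides separately is also what makes asymmetric explicit padding, such as different begin and end values per axis, a natural generalization.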
  • 2.0.0(Apr 7, 2019)

    In this release, we have added support for ONNX, implemented the CPU operations using MKLDNN, added operations for autograd, and updated the dependent libraries and CI tools.

    • Core components

      • [SINGA-434] Support tensor broadcasting
      • [SINGA-370] Improvement to tensor reshape and various misc. changes related to SINGA-341 and 351
    • Model components

      • [SINGA-333] Add support for Open Neural Network Exchange (ONNX) format
      • [SINGA-385] Add new python module for optimizers
      • [SINGA-394] Improve the CPP operations via Intel MKL DNN lib
      • [SINGA-425] Add 3 operators, Abs(), Exp() and leakyrelu(), for Autograd
      • [SINGA-410] Add two functions, set_params() and get_params(), for Autograd Layer class
      • [SINGA-383] Add Separable Convolution for autograd
      • [SINGA-388] Develop some RNN layers by calling tiny operations like matmul, addbias.
      • [SINGA-382] Implement concat operation for autograd
      • [SINGA-378] Implement maxpooling operation and its related functions for autograd
      • [SINGA-379] Implement batchnorm operation and its related functions for autograd
    • Utility functions and CI

      • [SINGA-432] Update dependent lib versions in conda-build config
      • [SINGA-429] Update docker images for latest cuda and cudnn
      • [SINGA-428] Move Docker images under Apache user name
    • Documentation and usability

      • [SINGA-395] Add documentation for autograd APIs
      • [SINGA-344] Add a GAN example
      • [SINGA-390] Update installation.md
      • [SINGA-384] Implement ResNet using autograd API
      • [SINGA-352] Complete SINGA documentation in Chinese version
    • Bugs fixed

      • [SINGA-431] Unit Test failed - Tensor Transpose
      • [SINGA-422] ModuleNotFoundError: No module named "_singa_wrap"
      • [SINGA-418] Unsupported type 'long' in python3
      • [SINGA-409] Basic singa-cpu import throws error
      • [SINGA-408] Unsupported function definition in python3
      • [SINGA-380] Fix bugs from Reshape
The Apache Software Foundation
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Nov 22, 2022
Distributed Deep learning with Keras & Spark

Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc

Max Pumperla 1.6k Nov 20, 2022
Uber Open Source 1.5k Nov 18, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin

Microsoft 8.2k Nov 17, 2022
WAGMA-SGD is a decentralized asynchronous SGD for distributed deep learning training based on model averaging.

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally triggerable; namely, a collective can be initiated without requiring all processes to enter it. It partially reduces the data within non-overlapping groups of processes, improving parallel scalability.

Shigang Li 6 Jun 18, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.4k Nov 25, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

22.8k Nov 20, 2022
🎛 Distributed machine learning made simple.

lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Sep 28, 2022
Management of exclusive GPU access for distributed machine learning workloads

TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting

Paweł Rościszewski 132 Nov 4, 2022
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.5k Nov 21, 2022
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

Horovod 12.9k Nov 25, 2022
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray

A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo

2.5k Nov 18, 2022
A high performance and generic framework for distributed DNN training

BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith

Bytedance Inc. 3.3k Nov 20, 2022
Distributed Computing for AI Made Simple

Project Home Blog Documents Paper Media Coverage Join Fiber users email list Fiber Distributed Computing for AI Made Simp

Uber Open Source 993 Nov 24, 2022
Distributed scikit-learn meta-estimators in PySpark

sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn

Ibotta 283 Nov 9, 2022
Distributed Evolutionary Algorithms in Python

DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru

Distributed Evolutionary Algorithms in Python 4.9k Nov 21, 2022
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters

DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters

27 Aug 19, 2022
Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

The Apache Software Foundation 119 Oct 20, 2022
Graphsignal is a machine learning model monitoring platform.

Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model performance and availability.

Graphsignal 144 Sep 21, 2022