Apache SINGA
Distributed deep learning system
Following the changes recommended in https://github.com/apache/singa-doc/pull/14 to simplify the APIs of the examples, I have reduced the number of arguments in train_mnist_cnn.
It seems that adding nccl and mpich works in the conda build of SINGA, but this needs further checking, and other dependencies such as the Python "deprecated" package still need to be added.
======================================================================
ERROR: test_tensor (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_tensor
Traceback (most recent call last):
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_tensor.py", line 24, in <module>
from singa import tensor
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/singa/tensor.py", line 58, in <module>
from deprecated import deprecated
ModuleNotFoundError: No module named 'deprecated'
======================================================================
FAIL: test_conv2d (test_mkldnn.TestPythonOperation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_mkldnn.py", line 78, in test_conv2d
self.assertAlmostEqual(4.0, _dW[0], places=5)
AssertionError: 4.0 != 0.040000003 within 5 places
----------------------------------------------------------------------
Ran 13 tests in 0.003s
FAILED (failures=1, errors=12)
TEST CONV2D FORWARD
TEST CONV2D DATA BACKWARD
TEST CONV2D WEIGHT BACKWARD
+ exit 0
Resource usage statistics from testing singa:
Process count: 1
CPU time: unavailable
Memory: 3.0M
Disk usage: 656B
Time elapsed: 0:00:02.1
TEST END: /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
Renaming work directory, /root/miniconda/conda-bld/singa_1583767754024/work to /root/miniconda/conda-bld/singa_1583767754024/work_moved_singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36_linux-64_main_build_loop
# Automatic uploading is disabled
# If you want to upload package(s) to anaconda.org later, type:
anaconda upload /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
# To have conda build upload to anaconda.org automatically, use
# $ conda config --set anaconda_upload yes
anaconda_upload is not set. Not uploading wheels: []
####################################################################################
Resource usage summary:
Total time: 0:07:23.3
CPU usage: sys=0:00:06.5, user=0:02:17.0
Maximum memory usage observed: 777.8M
Total disk usage observed (not including envs): 252.4K
####################################################################################
Source and build intermediates have been left in /root/miniconda/conda-bld.
There are currently 4 accumulated.
To remove them, you can run the ```conda build purge``` command
root@3c17fd6cb72e:~/dcsysh/singa/tool/conda/singa#
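The ModuleNotFoundError above comes from the `deprecated` PyPI package that singa/tensor.py imports, so adding it to the conda recipe's run requirements should fix the test import. For context, here is a minimal illustration of how that decorator is typically used; the function below is made up purely for illustration:

```python
from deprecated import deprecated  # the missing run-time dependency

@deprecated(reason="use broadcasting, e.g. x + v, instead")
def add_row(x, v):
    # calling this emits a DeprecationWarning but still works
    return x + v
```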
After merging PR #590, today I am trying to solve the problem that the mnist cnn example hangs, as reported in PR #589. When I run mnist_cnn after switching to CPU, it also hangs, so there is some problem with the training.
The Layer class in autograd maintains the model parameters. It passes the parameters into the operations, so the operations themselves are stateless.
Typically, the parameter size depends on the input and the layer configuration. Currently, we require users to provide the input size in the layer constructor. Then we can create the parameter tensor and initialize it in the constructor, e.g., for the Linear layer. One potential problem is that the initialization operation may not be buffered. @XJDKC Is this an issue? For some layers like RNN implemented using cuDNN, although we can get the input size, the parameter size is unknown until the cuDNN handle is created, which is not done until the data is forwarded through the layer.
Another way is to delay the parameter tensor creation until the layer is called for forward propagation. At that time, we have the input tensor (and its device), so the layer constructor does not need the user to provide the input size. The drawback is that after the layer is created, the get_params() function would still fail to get the parameter tensors, as they are not created yet. @dcslin To switch to this approach, we need to change the constructors of the existing layer classes and the examples. We also need to pass an initializer function/class into the constructor for initializing the parameter tensors after they are created.
Please add your comments.
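To make the second option concrete, here is a minimal sketch of delayed parameter creation using numpy; the Linear class and the initializer below are illustrative only and are not the actual SINGA APIs.

```python
import numpy as np

def xavier(shape):
    # hypothetical initializer passed into the constructor
    return np.random.randn(*shape) / np.sqrt(shape[0])

class Linear:
    def __init__(self, out_features, initializer=xavier):
        # no input size needed here; the parameter tensors do not exist yet
        self.out_features = out_features
        self.initializer = initializer
        self.W, self.b = None, None

    def __call__(self, x):
        if self.W is None:
            # create and initialize the parameters on the first forward call,
            # when the input shape (and device) are known
            in_features = x.shape[1]
            self.W = self.initializer((in_features, self.out_features))
            self.b = np.zeros(self.out_features)
        return x @ self.W + self.b

    def get_params(self):
        # drawback discussed above: empty until the first forward call
        return {} if self.W is None else {"W": self.W, "b": self.b}
```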
Updated on May 15
class Layer:

    def get_params(self):
        """Return the params of this layer and its sublayers as a dict; the param name is layername.param.
        E.g., for self.W = Tensor() and self.b = Tensor(),
        the names of W and b are like conv1.W and conv1.b.
        """

    def get_states(self):
        """Return the states of this layer and its sublayers that are necessary for model evaluation/inference.
        The states include the params and others, e.g., the running mean and var of batchnorm.
        """


class Module(Layer):

    def compile(self, *args, **kwargs):
        """Set the name of each layer and its sublayers, which will be used to create the dicts
        for get_params and get_states. Then there is no need to manually configure the layer name
        in the __init__ method of a layer.
        For instance,

            class Blk(Layer):
                def __init__(self):
                    self.conv1 = Conv2d()
                    self.conv2 = Conv2d()

            class MyModel(Module):
                def __init__(self):
                    self.blk1 = Blk()  # --> blk1.conv1, blk1.conv2
                    self.blk2 = Blk()  # --> blk2.conv1, blk2.conv2
        """
    # high priority
    def save(self, fpath, ckp_states={}):
        """Save the model and optionally some states.

        Args:
            fpath: output file path (without the extension)
            ckp_states(dict): states for checkpoint that are not attributes of Module, e.g., epoch ID.
        """
        # cust_states = {}
        # if ckp_states is not None:
        #     cust_states = ckp_states + model (including sublayers) attributes - get_states()
        # save the model states via onnx, with a customized field for the cust_states

    def load(self, fpath, dev, use_graph, graph_alg):
        """Load the model onto dev.

        Args:
            fpath: input file path (without the extension)

        Returns:
            dict for the ckp_states.
        """
        # load the model states + cust_states
        # model attributes = model states + attributes from cust_states
        # self.compile()
        # restore the model attributes
        # return the rest of the states as a dict
# lower priority
def save(fpath, model, ckp_states):
    # attributes <-- model
    # replace all tensors in attributes + ckp_states with a dict: name --> (shape, dtype)
    # dump the tensors via numpy.savez_compressed
    # dump the model via pickle

def load(fpath, dev, use_graph, graph_alg):
    # load the model via pickle
    # load the tensors via numpy.load
    # restore the tensors
    # return the ckp_states
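A runnable sketch of the lower-priority option, assuming the model states and checkpoint states are available as flat dicts whose tensor values have already been converted to numpy arrays; the function names below are illustrative, not the proposed SINGA API.

```python
import pickle
import numpy as np

def save(fpath, states, ckp_states):
    # split the entries into numpy arrays (tensors) and plain Python objects
    tensors, others = {}, {}
    for name, value in {**states, **ckp_states}.items():
        (tensors if isinstance(value, np.ndarray) else others)[name] = value
    np.savez_compressed(fpath + ".npz", **tensors)   # dump the tensors
    with open(fpath + ".pkl", "wb") as f:
        pickle.dump(others, f)                       # dump the rest via pickle

def load(fpath):
    with open(fpath + ".pkl", "rb") as f:
        states = pickle.load(f)
    with np.load(fpath + ".npz") as data:            # restore the tensors
        states.update({name: data[name] for name in data.files})
    return states
```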
Clarification:
- Layer.get_params()
- Layer.get_states()
- class.__dict__: superset of states.

Todo list:
#688 is refactoring the autograd module. Here are some comments about the current APIs in autograd.
Relationship between the classes and functions in autograd:
- Operator implements the forward and backward methods for autograd. For each Operator class, there is a function that creates an Operator instance and calls its forward method.
- Layer stores the states (handles and parameters) and calls the Operator function for the real computation. Note that a layer class can have sub-layers (as states) for creating complex and deep models.
Issue: when we create a network using the Module API, there are both stateless (e.g., flatten) and stateful (e.g., conv2d) operations. Currently, we create layers in the __init__ method of Module and call the layers and operator functions in the forward method. Therefore, Layer and Operator are mixed, which may confuse users. A better way is to use Layer instances only: for every operator, we create a corresponding layer class to replace the operator function.
Layer API issue: when and how to initialize the parameters (and handle) of a layer? There are two options:
- Initialize them in the __init__ method, OR when the data is forwarded for the first time (#674).
- Pass an initializer function to the __init__ method of a layer and use it to initialize the parameters, OR pass an initializer function to the __init__ method of the Module class and use it to initialize the parameters (through get_params) of the layers after forwarding the layers once. The second approach requires the Module class's __init__ to do a forward pass of all layers and then call get_params of each layer for initialization; to do that, it needs at least the shapes of the input tensors and the device. The drawback of the first approach is that we need to include the initializer in every Layer constructor.

Comments are welcomed.
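A hypothetical sketch of the second approach (not SINGA code): the Module is given an initializer and the input shapes, performs one dummy forward pass so that every layer creates its parameter tensors, and then fills whatever get_params returns.

```python
import numpy as np

def uniform(shape, scale=0.05):
    # illustrative initializer
    return np.random.uniform(-scale, scale, size=shape)

class Module:
    def init_params(self, initializer, input_shapes):
        # dummy forward pass: lets every layer create its (numpy) parameters
        dummy_inputs = [np.zeros(shape, dtype=np.float32) for shape in input_shapes]
        self.forward(*dummy_inputs)
        # then initialize the created parameters in place
        for name, param in self.get_params().items():
            param[...] = initializer(param.shape)
```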
This PR adds the computational graph with memory optimization. It is based on the code developed by @chrishkchris and @XJDKC and some discussions with @nudles.
There are four main features in this PR, namely the construction of the computational graph, lazy allocation, automatic recycling, and synchronization pipeline. Details as follows:
- Computational graph construction: construct a computational graph based on the user-defined neural network or expressions, and then run the graph to accomplish the training task. The computational graph also includes operations like synch and fused synch in the communicator.
- Lazy allocation: when blocks need to be allocated, devices do not allocate memory for them immediately; memory allocation is performed only when an operation uses the block for the first time.
- Automatic recycling: automatically deallocate intermediate tensors that won't be used again by the following operations when running the graph in an iteration.
- Synchronization pipeline: in previous synchronization operations, buffers were used to synchronize multiple tensors at once, but the communicator had to collect all the tensors before copying them into the buffer. The synchronization pipeline can copy tensors to the buffer separately, which reduces the time spent in synchronization operations.

The main code changes are:
- Tensor & Operation & Communicator & Device: add code for the Scheduler.
- Block: add a member variable of type Device to help do the lazy allocation, and a function to help do the automatic recycling.
- Communicator and opt.py: support the synchronization pipeline.
- Swig: add some interfaces.
- Module: provide a Module class on the Python side for users to use the graph more conveniently.
- Examples: add some examples with operation buffering by using the Module class.
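The following plain-Python sketch (conceptual only, not the SINGA C++ implementation) illustrates the idea behind lazy allocation and automatic recycling: a block allocates memory only on its first use, and the scheduler frees an intermediate block once its last reader has executed.

```python
import numpy as np

class Block:
    """Storage for a tensor; the memory is allocated lazily."""
    def __init__(self, size):
        self.size = size
        self.data = None                     # nothing allocated yet

    def ptr(self):
        if self.data is None:                # lazy allocation on first use
            self.data = np.empty(self.size, dtype=np.float32)
        return self.data

    def free(self):
        self.data = None                     # give the memory back

def run_graph(ops, read_count):
    """ops: list of (fn, in_blocks, out_blocks); read_count: block -> number of ops that read it."""
    remaining = dict(read_count)             # assumes every input block appears in read_count
    for fn, in_blocks, out_blocks in ops:
        fn([b.ptr() for b in in_blocks], [b.ptr() for b in out_blocks])
        for b in in_blocks:
            remaining[b] -= 1
            if remaining[b] == 0:            # automatic recycling: no later op reads this block
                b.free()
```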
Notation used in the tables below:
- s: second
- it: iteration
- Memory: peak memory usage of a single GPU
- Throughput: number of pictures processed per second
- Time: total time
- Speed: iterations per second
- Reduction: the memory usage reduction rate compared with the dev branch
- Speedup: speedup ratio compared with the dev branch

| Batchsize | Cases | Memory(MB) | Time(s) | Speed(it/s) | Throughput | Reduction | Speedup |
|---|---|---|---|---|---|---|---|
| 16 | dev | 4975 | 14.1952 | 14.0893 | 225.4285 | 0.00% | 1.0000 |
| 16 | PR: no graph | 4995 | 14.1264 | 14.1579 | 226.5261 | -0.40% | 1.0049 |
| 16 | PR: with graph, bfs | 3283 | 13.7438 | 14.5520 | 232.8318 | 34.01% | 1.0328 |
| 16 | PR: with graph, serial | 3265 | 13.7420 | 14.5540 | 232.8635 | 34.37% | 1.0330 |
| 32 | dev | 10119 | 13.4587 | 7.4302 | 237.7649 | 0.00% | 1.0000 |
| 32 | PR: no graph | 10109 | 13.2952 | 7.5315 | 240.6875 | 0.10% | 1.0123 |
| 32 | PR: with graph, bfs | 6839 | 13.1059 | 7.6302 | 244.1648 | 32.41% | 1.0269 |
| 32 | PR: with graph, serial | 6845 | 13.0489 | 7.6635 | 245.2312 | 32.35% | 1.0314 |

| Batchsize | Cases | Memory(MB) | Time(s) | Speed(it/s) | Throughput | Reduction | Speedup |
|---|---|---|---|---|---|---|---|
| 16 | dev | 5439 | 17.3323 | 11.5391 | 369.2522 | 0.00% | 1.0000 |
| 16 | PR: no graph | 5427 | 17.8232 | 11.2213 | 359.0831 | 0.22% | 0.9725 |
| 16 | PR: with graph, bfs | 3389 | 18.2310 | 10.9703 | 351.0504 | 37.69% | 0.9725 |
| 16 | PR: with graph, serial | 3437 | 17.0389 | 11.7378 | 375.6103 | 36.69% | 1.0172 |
| 32 | dev | 10547 | 14.8635 | 6.7684 | 237.7649 | 0.00% | 1.0000 |
| 32 | PR: no graph | 10503 | 14.7746 | 6.7684 | 433.1748 | 0.42% | 1.0060 |
| 32 | PR: with graph, bfs | 6935 | 14.8553 | 6.7384 | 433.1748 | 34.25% | 1.0269 |
| 32 | PR: with graph, serial | 7027 | 14.3271 | 6.9798 | 446.7074 | 33.37% | 1.0374 |
From the tables above, we can see that the computational graph reduces the peak memory usage by over 30% while keeping comparable or slightly better speed. Below is an example of using the Module API to define and train a CNN with operation buffering:
class CNN(module.Module):

    def __init__(self, optimizer):
        super(CNN, self).__init__()
        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
        self.linear1 = autograd.Linear(4 * 4 * 50, 500)
        self.linear2 = autograd.Linear(500, 10)
        self.pooling1 = autograd.MaxPool2d(2, 2, padding=0)
        self.pooling2 = autograd.MaxPool2d(2, 2, padding=0)
        self.optimizer = optimizer

    def forward(self, x):
        y = self.conv1(x)
        y = autograd.relu(y)
        y = self.pooling1(y)
        y = self.conv2(y)
        y = autograd.relu(y)
        y = self.pooling2(y)
        y = autograd.flatten(y)
        y = self.linear1(y)
        y = autograd.relu(y)
        y = self.linear2(y)
        return y

    def loss(self, x, ty):
        return autograd.softmax_cross_entropy(x, ty)

    def optim(self, loss):
        self.optimizer.backward_and_update(loss)


# initialize other objects
# ......
model = CNN(sgd)

# Train
for b in range(num_train_batch):
    # Generate the batch data for this iteration
    # ......

    # Copy the batch data into the input tensors
    tx.copy_from_numpy(x)
    ty.copy_from_numpy(y)

    # Train the model
    out = model(tx)
    loss = model.loss(out, ty)
    model.optim(loss)
AssertionError with the onnx testcase: https://github.com/apache/singa/blob/master/examples/onnx/training/train.py
$ cd examples/onnx
$ python3 training/train.py --model vgg16
Then I get the following error msg:
File "training/train.py", line 437, in <module>
args.onnx_model_path, args.data, sgd, args.graph, args.verbosity)
File "training/train.py", line 295, in run
model.compile([tx], is_train=True, use_graph=graph, sequential=sequential)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/model.py", line 177, in compile
self.forward(*inputs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 63, in wrapper
return func(self, *args, **kwargs)
File "training/train.py", line 191, in forward
y = self.linear(y)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 110, in __call__
return self.forward(*args, **kwargs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 61, in wrapper
self.initialize(*args, **kwargs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 45, in wrapper
'initialize function expects PlaceHolders or Tensors')
AssertionError: initialize function expects PlaceHolders or Tensors
Is something wrong with the layer initialization?
SINGA version: 3100 (the latest build from the source code of the master branch); Python version: 3.5.2; ONNX version: 1.5.0
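For reference, the model.compile call in the traceback expects the sample inputs to be SINGA Tensor (or PlaceHolder) objects rather than, e.g., numpy arrays or plain shapes. A sketch of the expected call site is below; the shape and batch size are illustrative only, and `model` stands for the ONNX-converted model built in train.py.

```python
from singa import device, tensor

dev = device.create_cuda_gpu()          # or device.get_default_device() for CPU
batch_size = 16                         # illustrative value
tx = tensor.Tensor((batch_size, 3, 224, 224), dev)   # dummy input used only for tracing
model.compile([tx], is_train=True, use_graph=True, sequential=False)
```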
I used clang-format with VS Code after I altered the tensor.h file, and it produced a different format from the dev branch.
The tensor.cc file should already have been re-formatted in PR #581. So, did I use an incorrect clang-format setting?
In the log of the Travis CI CPU build, test_onnx fails with an ImportError because libprotobuf.so.20 cannot be opened: https://travis-ci.org/github/apache/singa/jobs/664251025#L3998
======================================================================
ERROR: test_onnx (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_onnx
Traceback (most recent call last):
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/home/travis/conda-bld-1971.5/singa_1584596418932/test_tmp/test/python/test_onnx.py", line 24, in <module>
from singa import sonnx
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/singa/sonnx.py", line 23, in <module>
import onnx.utils
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/onnx/__init__.py", line 8, in <module>
from .onnx_cpp2py_export import ONNX_ML
ImportError: libprotobuf.so.20: cannot open shared object file: No such file or directory
Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file can perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks that all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
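The patch described above amounts to validating every member path before extraction. A minimal sketch of such a check is below; it is one common mitigation for CVE-2007-4559, not necessarily the exact code in the pull request.

```python
import os
import tarfile

def _is_within_directory(directory, target):
    # both paths resolved; the target must stay inside the extraction directory
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

def safe_extractall(tar: tarfile.TarFile, path="."):
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not _is_within_directory(path, member_path):
            raise Exception("Attempted path traversal in tar file")
    tar.extractall(path)
```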
We need to print intermediate information for the cnn and cifar_distributed_cnn examples, since it may take quite a long time for large models to finish one epoch of training.
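One simple way to do this (a sketch only, reusing the training-loop variables from the Module example above) is to print a running loss every N batches:

```python
from singa import tensor

log_every = 100          # illustrative reporting interval
running_loss = 0.0

for b in range(num_train_batch):
    # ... prepare tx, ty for this batch as in the example above ...
    out = model(tx)
    loss = model.loss(out, ty)
    model.optim(loss)

    running_loss += float(tensor.to_numpy(loss)[0])
    if (b + 1) % log_every == 0:
        print("batch %d/%d, avg loss = %.5f" %
              (b + 1, num_train_batch, running_loss / log_every))
        running_loss = 0.0
```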
Hi apache/singa!
This is not an automatic, 🤖-generated PR; as you can check in my GitHub profile, I work for GitHub and I am part of the GitHub Security Lab, which is helping out with the migration of LGTM configurations to code scanning. You might have heard that we've integrated LGTM's underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!
With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.
This pull request enables code scanning by adding an auto-generated codeql.yml
workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!
Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.
Questions? Check out the FAQ below!
By default, code scanning will trigger a scan with the CodeQL engine on the following events:
What will this cost us? Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.
The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality
query suite for you.
No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.
If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.
If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).
GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.
This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!
Please join the discussion here to ask further questions and send us suggestions!
Bumps protobuf-java from 2.6.1 to 3.16.3.
Sourced from protobuf-java's releases.
Protobuf Release v3.16.3
Java
- Refactoring java full runtime to reuse sub-message builders and prepare to migrate parsing logic from parse constructor to builder.
- Move proto wireformat parsing functionality from the private "parsing constructor" to the Builder class.
- Change the Lite runtime to prefer merging from the wireformat into mutable messages rather than building up a new immutable object before merging. This way results in fewer allocations and copy operations.
- Make message-type extensions merge from wire-format instead of building up instances and merging afterwards. This has much better performance.
- Fix TextFormat parser to build up recurring (but supposedly not repeated) sub-messages directly from text rather than building a new sub-message and merging the fully formed message into the existing field.
- This release addresses a Security Advisory for Java users
Protocol Buffers v3.16.1
Java
- Improve performance characteristics of UnknownFieldSet parsing (#9371)
Protocol Buffers v3.16.0
C++
- Fix compiler warnings issue found in conformance_test_runner #8189 (#8190)
- Fix MinGW-w64 build issues. (#8286)
- [Protoc] C++ Resolved an issue where NO_DESTROY and CONSTINIT are in incorrect order (#8296)
- Fix PROTOBUF_CONSTINIT macro redefinition (#8323)
- Delete StringPiecePod (#8353)
- Fix gcc error: comparison of unsigned expression in '>= 0' is always … (#8309)
- Fix cmake install on iOS (#8301)
- Create a CMake option to control whether or not RTTI is enabled (#8347)
- Fix endian.h location on FreeBSD (#8351)
- Refactor util::Status (#8354)
- Make util::Status more similar to absl::Status (#8405)
- Fix -Wsuggest-destructor-override for generated C++ proto classes. (#8408)
- Refactor StatusOr and StringPiece (#8406)
- Refactor uint128 (#8416)
- The ::pb namespace is no longer exposed due to conflicts.
- Allow MessageDifferencer::TreatAsSet() (and friends) to override previous calls instead of crashing.
- Reduce the size of generated proto headers for protos with string or bytes fields.
- Move arena() operation on uncommon path to out-of-line routine
- For iterator-pair function parameter types, take both iterators by value.
- Code-space savings and perhaps some modest performance improvements in RepeatedPtrField.
- Eliminate nullptr check from every tag parse.
- Remove unused _$name$cached_byte_size fields.
- Serialize extension ranges together when not broken by a proto field in the middle.
- Do out-of-line allocation and deallocation of string object in ArenaString.
... (truncated)
- b8c2488 Updating version.json and repo version numbers to: 16.3
- 42e47e5 Refactoring Java parsing (3.16.x) (#10668)
- 98884a8 Merge pull request #10556 from deannagarcia/3.16.x
- 450b648 Cherrypick ruby fixes for monterey
- b17bb39 Merge pull request #10548 from protocolbuffers/3.16.x-202209131829
- c18f5e7 Updating changelog
- 6f4e817 Updating version.json and repo version numbers to: 16.2
- a7d4e94 Merge pull request #10547 from deannagarcia/3.16.x
- 55815e4 Apply patch
- 152d7bf Update version.json with "lts": true (#10535)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
- @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
- @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
- @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.
This issue is open to discuss different options for adding GPU build and test to GitHub workflows.
To enable this feature, SINGA must provide a real or virtual machine with a GPU as the host machine for running the workflow, and then use the self-hosted runner feature of GitHub Actions. See also this MLOps video tutorial.
The team needs to make some decisions:
What do you think?
Release note is here
This release includes the following changes:
Code quality has been improved by introducing linting checks in CI and an auto code formatter. For linting, the tools cpplint and pylint are used and configured to comply with the Google coding styles; details are in tool/linting/. Similarly, the formatting tools clang-format and yapf, configured with the Google coding styles, are recommended for developers to clean their code before submitting changes; details are in tool/code-format/. LGTM is enabled on GitHub for code quality checks, and license checking is also enabled.
New Tensor APIs are added for naming consistency and feature enhancement, e.g., for converting data between the float and int types.
14 new operators are added into the autograd module: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their unit tests are added as well.
14 new operators are added to sonnx module for both backend and frontend: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their tests are added as well.
Some ONNX models are imported into SINGA, including Bert-squad, Arcface, FER+ Emotion, MobileNet, ResNet18, Tiny Yolov2, Vgg16, and Mnist.
Some operators now support multidirectional broadcasting, including Add, Sub, Mul, Div, Pow, PRelu, Gemm
Distributed training with communication optimization: DistOpt has implemented multiple optimization techniques, including gradient sparsification, chunk transmission, and gradient compression.
Computational graph construction at the CPP level. The operations submitted to the Device are buffered. After analyzing the dependency, the computational graph is created, which is further analyzed for speed and memory optimization. To enable this feature, use the Module API.
New website based on Docusaurus. The documentation files are moved to a separate repo [singa-doc](https://github.com/apache/singa-doc). The static website files are stored at singa-site.
DNNL (Deep Neural Network Library), powered by Intel, is integrated into model/operations/[batchnorm|pooling|convolution]; the changes are opaque to end users. The current version is dnnl v1.1, which replaces the previous integration of mkl-dnn v0.18. The library can boost the performance of deep learning operations when executing on CPU. The dnnl dependency is installed through conda.
Some Tensor APIs are marked as deprecated; they can be replaced by broadcasting, which provides better support for multi-dimensional operations. These APIs are add_column(), add_row(), div_column(), div_row(), mult_column(), and mult_row().
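For example, a call that previously used add_row can be written with the broadcast-enabled operators instead; a small sketch follows (the shapes are illustrative, assuming the + operator broadcasts as described above):

```python
import numpy as np
from singa import tensor

x = tensor.from_numpy(np.zeros((4, 3), dtype=np.float32))   # 4x3 matrix
v = tensor.from_numpy(np.arange(3, dtype=np.float32))       # length-3 vector

# deprecated style: x.add_row(v)
# broadcast style: v is broadcast over the rows of x
y = x + v
print(tensor.to_numpy(y))
```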
Conv and Pooling are enhanced to support fine-grained padding like (2,3,2,3), the SAME_UPPER and SAME_LOWER pad modes, and shape checking.
Reconstruct sonnx.
In this release, we have added the support of ONNX, implemented the CPU operations using MKLDNN, added operations for autograd, and updated the dependent libraries and CI tools.
Core components
Model components
Utility functions and CI
Documentation and usability
Bugs fixed: singa-cpu import throws error
BigDL: Distributed Deep Learning on Apache Spark. What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w
Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc
Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin
WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can be initiated without requiring that all the processes enter it. It partially reduces the data within non-overlapping groups of processes, improving the parallel scalability.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear
lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight
TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting
eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l
Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis
A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo
BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith
Project Home Blog Documents Paper Media Coverage Join Fiber users email list [email protected] Fiber Distributed Computing for AI Made Simp
sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn
DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model performance and availability.