Apache SINGA
Distributed deep learning system
Following the changes recommended in https://github.com/apache/singa-doc/pull/14 to simplify the APIs of the examples, I have reduced the number of arguments in train_mnist_cnn.
It seems that adding nccl and mpich works in the conda build of SINGA, but this needs further checking, and other dependencies such as the Python "deprecated" package still need to be added.
======================================================================
ERROR: test_tensor (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_tensor
Traceback (most recent call last):
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_tensor.py", line 24, in <module>
from singa import tensor
File "/root/miniconda/conda-bld/singa_1583767754024/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/singa/tensor.py", line 58, in <module>
from deprecated import deprecated
ModuleNotFoundError: No module named 'deprecated'
======================================================================
FAIL: test_conv2d (test_mkldnn.TestPythonOperation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/root/miniconda/conda-bld/singa_1583767754024/test_tmp/test/python/test_mkldnn.py", line 78, in test_conv2d
self.assertAlmostEqual(4.0, _dW[0], places=5)
AssertionError: 4.0 != 0.040000003 within 5 places
----------------------------------------------------------------------
Ran 13 tests in 0.003s
FAILED (failures=1, errors=12)
TEST CONV2D FORWARD
TEST CONV2D DATA BACKWARD
TEST CONV2D WEIGHT BACKWARD
+ exit 0
Resource usage statistics from testing singa:
Process count: 1
CPU time: unavailable
Memory: 3.0M
Disk usage: 656B
Time elapsed: 0:00:02.1
TEST END: /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
Renaming work directory, /root/miniconda/conda-bld/singa_1583767754024/work to /root/miniconda/conda-bld/singa_1583767754024/work_moved_singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36_linux-64_main_build_loop
# Automatic uploading is disabled
# If you want to upload package(s) to anaconda.org later, type:
anaconda upload /root/miniconda/conda-bld/linux-64/singa-2.1.0.dev-cudnn7.3.1_cuda10.0_py36.tar.bz2
# To have conda build upload to anaconda.org automatically, use
# $ conda config --set anaconda_upload yes
anaconda_upload is not set. Not uploading wheels: []
####################################################################################
Resource usage summary:
Total time: 0:07:23.3
CPU usage: sys=0:00:06.5, user=0:02:17.0
Maximum memory usage observed: 777.8M
Total disk usage observed (not including envs): 252.4K
####################################################################################
Source and build intermediates have been left in /root/miniconda/conda-bld.
There are currently 4 accumulated.
To remove them, you can run the ```conda build purge``` command
root@3c17fd6cb72e:~/dcsysh/singa/tool/conda/singa#
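The ModuleNotFoundError above comes from the `deprecated` PyPI package that singa/tensor.py imports, so adding it to the conda recipe's run requirements should fix the test import. For context, here is a minimal illustration of how that decorator is typically used; the function below is made up purely for illustration:

```python
from deprecated import deprecated  # the missing run-time dependency

@deprecated(reason="use broadcasting, e.g. x + v, instead")
def add_row(x, v):
    # calling this emits a DeprecationWarning but still works
    return x + v
```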
After merging PR #590, today I am trying to solve the problem that the mnist cnn example hangs, as reported in PR #589. When I run mnist_cnn after switching to CPU, it also hangs, so there is some problem with the training.
The Layer class in autograd maintains the model parameters. It passes the parameters into the operations, so the operations themselves are stateless.
Typically, the parameter size depends on the input and the layer configuration. Currently, we require users to provide the input size in the layer constructor. Then we can create the parameter tensor and initialize it in the constructor, e.g., for the Linear layer. One potential problem is that the initialization operation may not be buffered. @XJDKC Is this an issue? For some layers like RNN implemented using cuDNN, although we can get the input size, the parameter size is unknown until the cuDNN handle is created, which is not done until the data is forwarded through the layer.
Another way is to delay the parameter tensor creation until the layer is called for forward propagation. At that time, we have the input tensor (and its device), so the layer constructor does not need the user to provide the input size. The drawback is that after the layer is created, the get_params() function would still fail to get the parameter tensors, as they are not created yet. @dcslin To switch to this approach, we need to change the constructors of the existing layer classes and the examples. We also need to pass an initializer function/class into the constructor for initializing the parameter tensors after they are created.
Please add your comments.
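To make the second option concrete, here is a minimal sketch of delayed parameter creation using numpy; the Linear class and the initializer below are illustrative only and are not the actual SINGA APIs.

```python
import numpy as np

def xavier(shape):
    # hypothetical initializer passed into the constructor
    return np.random.randn(*shape) / np.sqrt(shape[0])

class Linear:
    def __init__(self, out_features, initializer=xavier):
        # no input size needed here; the parameter tensors do not exist yet
        self.out_features = out_features
        self.initializer = initializer
        self.W, self.b = None, None

    def __call__(self, x):
        if self.W is None:
            # create and initialize the parameters on the first forward call,
            # when the input shape (and device) are known
            in_features = x.shape[1]
            self.W = self.initializer((in_features, self.out_features))
            self.b = np.zeros(self.out_features)
        return x @ self.W + self.b

    def get_params(self):
        # drawback discussed above: empty until the first forward call
        return {} if self.W is None else {"W": self.W, "b": self.b}
```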
Updated on May 15
class Layer:

    def get_params(self):
        """Return the params of this layer and its sublayers as a dict; the param name is layername.param.
        E.g., for self.W = Tensor() and self.b = Tensor(),
        the names of W and b are like conv1.W and conv1.b.
        """

    def get_states(self):
        """Return the states of this layer and its sublayers that are necessary for model evaluation/inference.
        The states include the params and others, e.g., the running mean and var of batchnorm.
        """


class Module(Layer):

    def compile(self, *args, **kwargs):
        """Set the name of each layer and its sublayers, which will be used to create the dicts
        for get_params and get_states. Then there is no need to manually configure the layer name
        in the __init__ method of a layer.
        For instance,

            class Blk(Layer):
                def __init__(self):
                    self.conv1 = Conv2d()
                    self.conv2 = Conv2d()

            class MyModel(Module):
                def __init__(self):
                    self.blk1 = Blk()  # --> blk1.conv1, blk1.conv2
                    self.blk2 = Blk()  # --> blk2.conv1, blk2.conv2
        """
    # high priority
    def save(self, fpath, ckp_states={}):
        """Save the model and optionally some states.

        Args:
            fpath: output file path (without the extension)
            ckp_states(dict): states for checkpoint that are not attributes of Module, e.g., epoch ID.
        """
        # cust_states = {}
        # if ckp_states is not None:
        #     cust_states = ckp_states + model (including sublayers) attributes - get_states()
        # save the model states via onnx, with a customized field for the cust_states

    def load(self, fpath, dev, use_graph, graph_alg):
        """Load the model onto dev.

        Args:
            fpath: input file path (without the extension)

        Returns:
            dict for the ckp_states.
        """
        # load the model states + cust_states
        # model attributes = model states + attributes from cust_states
        # self.compile()
        # restore the model attributes
        # return the rest of the states as a dict
# lower priority
def save(fpath, model, ckp_states):
    # attributes <-- model
    # replace all tensors in attributes + ckp_states with a dict: name --> (shape, dtype)
    # dump the tensors via numpy.savez_compressed
    # dump the model via pickle

def load(fpath, dev, use_graph, graph_alg):
    # load the model via pickle
    # load the tensors via numpy.load
    # restore the tensors
    # return the ckp_states
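A runnable sketch of the lower-priority option, assuming the model states and checkpoint states are available as flat dicts whose tensor values have already been converted to numpy arrays; the function names below are illustrative, not the proposed SINGA API.

```python
import pickle
import numpy as np

def save(fpath, states, ckp_states):
    # split the entries into numpy arrays (tensors) and plain Python objects
    tensors, others = {}, {}
    for name, value in {**states, **ckp_states}.items():
        (tensors if isinstance(value, np.ndarray) else others)[name] = value
    np.savez_compressed(fpath + ".npz", **tensors)   # dump the tensors
    with open(fpath + ".pkl", "wb") as f:
        pickle.dump(others, f)                       # dump the rest via pickle

def load(fpath):
    with open(fpath + ".pkl", "rb") as f:
        states = pickle.load(f)
    with np.load(fpath + ".npz") as data:            # restore the tensors
        states.update({name: data[name] for name in data.files})
    return states
```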
Clarification:
- Layer.get_params()
- Layer.get_states()
- class.__dict__: superset of states.

Todo list:
#688 is refactoring the autograd module. Here are some comments about the current APIs in autograd.
Relationship between the classes and functions in autograd:
- Operator implements the forward and backward methods for autograd. For each Operator class, there is a function that creates an Operator instance and calls its forward method.
- Layer stores the states (handles and parameters) and calls the Operator function for the real computation. Note that a layer class can have sub-layers (as states) for creating complex and deep models.
Issue: when we create a network using the Module API, there are both stateless (e.g., flatten) and stateful (e.g., conv2d) operations. Currently, we create layers in the __init__ method of Module and call the layers and operator functions in the forward method. Therefore, Layer and Operator are mixed, which may confuse users. A better way is to use Layer instances only: for every operator, we create a corresponding layer class to replace the operator function.
Layer API issue: when and how to initialize the parameters (and handle) of a layer? There are two options:
- Initialize them in the __init__ method, OR when the data is forwarded for the first time (#674).
- Pass an initializer function to the __init__ method of a layer and use it to initialize the parameters, OR pass an initializer function to the __init__ method of the Module class and use it to initialize the parameters (through get_params) of the layers after forwarding the layers once. The second approach requires the Module class's __init__ to do a forward pass of all layers and then call get_params of each layer for initialization; to do that, it needs at least the shapes of the input tensors and the device. The drawback of the first approach is that we need to include the initializer in every Layer constructor.

Comments are welcomed.
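A hypothetical sketch of the second approach (not SINGA code): the Module is given an initializer and the input shapes, performs one dummy forward pass so that every layer creates its parameter tensors, and then fills whatever get_params returns.

```python
import numpy as np

def uniform(shape, scale=0.05):
    # illustrative initializer
    return np.random.uniform(-scale, scale, size=shape)

class Module:
    def init_params(self, initializer, input_shapes):
        # dummy forward pass: lets every layer create its (numpy) parameters
        dummy_inputs = [np.zeros(shape, dtype=np.float32) for shape in input_shapes]
        self.forward(*dummy_inputs)
        # then initialize the created parameters in place
        for name, param in self.get_params().items():
            param[...] = initializer(param.shape)
```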
This PR adds the computational graph with memory optimization. It is based on the code developed by @chrishkchris and @XJDKC and some discussions with @nudles.
There are four main features in this PR, namely the construction of the computational graph, lazy allocation, automatic recycling, and synchronization pipeline. Details as follows:
- Computational graph construction: construct a computational graph based on the user-defined neural network or expressions, and then run the graph to accomplish the training task. The computational graph also includes operations like synch and fused synch in the communicator.
- Lazy allocation: when blocks need to be allocated, devices do not allocate memory for them immediately; memory allocation is performed only when an operation uses the block for the first time.
- Automatic recycling: automatically deallocate intermediate tensors that won't be used again by the following operations when running the graph in an iteration.
- Synchronization pipeline: in previous synchronization operations, buffers were used to synchronize multiple tensors at once, but the communicator had to collect all the tensors before copying them into the buffer. The synchronization pipeline can copy tensors to the buffer separately, which reduces the time spent in synchronization operations.

The main code changes are:
- Tensor & Operation & Communicator & Device: add code for the Scheduler.
- Block: add a member variable of type Device to help do the lazy allocation, and a function to help do the automatic recycling.
- Communicator and opt.py: support the synchronization pipeline.
- Swig: add some interfaces.
- Module: provide a Module class on the Python side for users to use the graph more conveniently.
- Examples: add some examples with operation buffering by using the Module class.
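The following plain-Python sketch (conceptual only, not the SINGA C++ implementation) illustrates the idea behind lazy allocation and automatic recycling: a block allocates memory only on its first use, and the scheduler frees an intermediate block once its last reader has executed.

```python
import numpy as np

class Block:
    """Storage for a tensor; the memory is allocated lazily."""
    def __init__(self, size):
        self.size = size
        self.data = None                     # nothing allocated yet

    def ptr(self):
        if self.data is None:                # lazy allocation on first use
            self.data = np.empty(self.size, dtype=np.float32)
        return self.data

    def free(self):
        self.data = None                     # give the memory back

def run_graph(ops, read_count):
    """ops: list of (fn, in_blocks, out_blocks); read_count: block -> number of ops that read it."""
    remaining = dict(read_count)             # assumes every input block appears in read_count
    for fn, in_blocks, out_blocks in ops:
        fn([b.ptr() for b in in_blocks], [b.ptr() for b in out_blocks])
        for b in in_blocks:
            remaining[b] -= 1
            if remaining[b] == 0:            # automatic recycling: no later op reads this block
                b.free()
```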
Notation used in the tables below:
- s: second
- it: iteration
- Memory: peak memory usage of a single GPU
- Throughput: number of pictures processed per second
- Time: total time
- Speed: iterations per second
- Reduction: the memory usage reduction rate compared with the dev branch
- Speedup: speedup ratio compared with the dev branch

| Batchsize | Cases | Memory(MB) | Time(s) | Speed(it/s) | Throughput | Reduction | Speedup |
|---|---|---|---|---|---|---|---|
| 16 | dev | 4975 | 14.1952 | 14.0893 | 225.4285 | 0.00% | 1.0000 |
| 16 | PR: no graph | 4995 | 14.1264 | 14.1579 | 226.5261 | -0.40% | 1.0049 |
| 16 | PR: with graph, bfs | 3283 | 13.7438 | 14.5520 | 232.8318 | 34.01% | 1.0328 |
| 16 | PR: with graph, serial | 3265 | 13.7420 | 14.5540 | 232.8635 | 34.37% | 1.0330 |
| 32 | dev | 10119 | 13.4587 | 7.4302 | 237.7649 | 0.00% | 1.0000 |
| 32 | PR: no graph | 10109 | 13.2952 | 7.5315 | 240.6875 | 0.10% | 1.0123 |
| 32 | PR: with graph, bfs | 6839 | 13.1059 | 7.6302 | 244.1648 | 32.41% | 1.0269 |
| 32 | PR: with graph, serial | 6845 | 13.0489 | 7.6635 | 245.2312 | 32.35% | 1.0314 |

| Batchsize | Cases | Memory(MB) | Time(s) | Speed(it/s) | Throughput | Reduction | Speedup |
|---|---|---|---|---|---|---|---|
| 16 | dev | 5439 | 17.3323 | 11.5391 | 369.2522 | 0.00% | 1.0000 |
| 16 | PR: no graph | 5427 | 17.8232 | 11.2213 | 359.0831 | 0.22% | 0.9725 |
| 16 | PR: with graph, bfs | 3389 | 18.2310 | 10.9703 | 351.0504 | 37.69% | 0.9725 |
| 16 | PR: with graph, serial | 3437 | 17.0389 | 11.7378 | 375.6103 | 36.69% | 1.0172 |
| 32 | dev | 10547 | 14.8635 | 6.7684 | 237.7649 | 0.00% | 1.0000 |
| 32 | PR: no graph | 10503 | 14.7746 | 6.7684 | 433.1748 | 0.42% | 1.0060 |
| 32 | PR: with graph, bfs | 6935 | 14.8553 | 6.7384 | 433.1748 | 34.25% | 1.0269 |
| 32 | PR: with graph, serial | 7027 | 14.3271 | 6.9798 | 446.7074 | 33.37% | 1.0374 |
From the tables above, we can see that the computational graph reduces the peak memory usage by over 30% while keeping comparable or slightly better speed. Below is an example of using the Module API to define and train a CNN with operation buffering:
class CNN(module.Module):

    def __init__(self, optimizer):
        super(CNN, self).__init__()
        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
        self.linear1 = autograd.Linear(4 * 4 * 50, 500)
        self.linear2 = autograd.Linear(500, 10)
        self.pooling1 = autograd.MaxPool2d(2, 2, padding=0)
        self.pooling2 = autograd.MaxPool2d(2, 2, padding=0)
        self.optimizer = optimizer

    def forward(self, x):
        y = self.conv1(x)
        y = autograd.relu(y)
        y = self.pooling1(y)
        y = self.conv2(y)
        y = autograd.relu(y)
        y = self.pooling2(y)
        y = autograd.flatten(y)
        y = self.linear1(y)
        y = autograd.relu(y)
        y = self.linear2(y)
        return y

    def loss(self, x, ty):
        return autograd.softmax_cross_entropy(x, ty)

    def optim(self, loss):
        self.optimizer.backward_and_update(loss)


# initialize other objects
# ......
model = CNN(sgd)

# Train
for b in range(num_train_batch):
    # Generate the batch data for this iteration
    # ......

    # Copy the batch data into the input tensors
    tx.copy_from_numpy(x)
    ty.copy_from_numpy(y)

    # Train the model
    out = model(tx)
    loss = model.loss(out, ty)
    model.optim(loss)
AssertionError with the onnx testcase: https://github.com/apache/singa/blob/master/examples/onnx/training/train.py
$ cd examples/onnx
$ python3 training/train.py --model vgg16
Then I get the following error msg:
File "training/train.py", line 437, in <module>
args.onnx_model_path, args.data, sgd, args.graph, args.verbosity)
File "training/train.py", line 295, in run
model.compile([tx], is_train=True, use_graph=graph, sequential=sequential)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/model.py", line 177, in compile
self.forward(*inputs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 63, in wrapper
return func(self, *args, **kwargs)
File "training/train.py", line 191, in forward
y = self.linear(y)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 110, in __call__
return self.forward(*args, **kwargs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 61, in wrapper
self.initialize(*args, **kwargs)
File "/home/extend/lijiansong/work-space/anaconda2/envs/intel-caffe/lib/python3.6/site-packages/singa/layer.py", line 45, in wrapper
'initialize function expects PlaceHolders or Tensors')
AssertionError: initialize function expects PlaceHolders or Tensors
Is something wrong with the layer initialization?
SINGA version: 3100 (the latest build from the source code of the master branch); Python version: 3.5.2; ONNX version: 1.5.0
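For reference, the model.compile call in the traceback expects the sample inputs to be SINGA Tensor (or PlaceHolder) objects rather than, e.g., numpy arrays or plain shapes. A sketch of the expected call site is below; the shape and batch size are illustrative only, and `model` stands for the ONNX-converted model built in train.py.

```python
from singa import device, tensor

dev = device.create_cuda_gpu()          # or device.get_default_device() for CPU
batch_size = 16                         # illustrative value
tx = tensor.Tensor((batch_size, 3, 224, 224), dev)   # dummy input used only for tracing
model.compile([tx], is_train=True, use_graph=True, sequential=False)
```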
I used clang-format with VS Code after I altered the tensor.h file, and it produced a different format from the dev branch.
The tensor.cc file should already have been re-formatted in PR #581. So, did I use an incorrect clang-format setting?
In the log of the Travis CI CPU build, test_onnx fails with an ImportError because libprotobuf.so.20 cannot be opened: https://travis-ci.org/github/apache/singa/jobs/664251025#L3998
======================================================================
ERROR: test_onnx (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_onnx
Traceback (most recent call last):
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/home/travis/conda-bld-1971.5/singa_1584596418932/test_tmp/test/python/test_onnx.py", line 24, in <module>
from singa import sonnx
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/singa/sonnx.py", line 23, in <module>
import onnx.utils
File "/home/travis/conda-bld-1971.5/singa_1584596418932/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/onnx/__init__.py", line 8, in <module>
from .onnx_cpp2py_export import ONNX_ML
ImportError: libprotobuf.so.20: cannot open shared object file: No such file or directory
Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file can perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks that all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.
If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
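The patch described above amounts to validating every member path before extraction. A minimal sketch of such a check is below; it is one common mitigation for CVE-2007-4559, not necessarily the exact code in the pull request.

```python
import os
import tarfile

def _is_within_directory(directory, target):
    # both paths resolved; the target must stay inside the extraction directory
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

def safe_extractall(tar: tarfile.TarFile, path="."):
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not _is_within_directory(path, member_path):
            raise Exception("Attempted path traversal in tar file")
    tar.extractall(path)
```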
We need to print intermediate information for the cnn and cifar_distributed_cnn examples, since it may take quite a long time for large models to finish one epoch of training.
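One simple way to do this (a sketch only, reusing the training-loop variables from the Module example above) is to print a running loss every N batches:

```python
from singa import tensor

log_every = 100          # illustrative reporting interval
running_loss = 0.0

for b in range(num_train_batch):
    # ... prepare tx, ty for this batch as in the example above ...
    out = model(tx)
    loss = model.loss(out, ty)
    model.optim(loss)

    running_loss += float(tensor.to_numpy(loss)[0])
    if (b + 1) % log_every == 0:
        print("batch %d/%d, avg loss = %.5f" %
              (b + 1, num_train_batch, running_loss / log_every))
        running_loss = 0.0
```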
Hi apache/singa!
This is not an automatic, 🤖-generated PR; as you can check in my GitHub profile, I work for GitHub and I am part of the GitHub Security Lab, which is helping out with the migration of LGTM configurations to code scanning. You might have heard that we've integrated LGTM's underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!
With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.
This pull request enables code scanning by adding an auto-generated codeql.yml
workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!
Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.
Questions? Check out the FAQ below!
By default, code scanning will trigger a scan with the CodeQL engine on the following events:
What will this cost us? Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.
The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality
query suite for you.
No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.
If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.
If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).
GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.
This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!
Please join the discussion here to ask further questions and send us suggestions!
Bumps protobuf-java from 2.6.1 to 3.16.3.
Sourced from protobuf-java's releases.
Protobuf Release v3.16.3
Java
- Refactoring java full runtime to reuse sub-message builders and prepare to migrate parsing logic from parse constructor to builder.
- Move proto wireformat parsing functionality from the private "parsing constructor" to the Builder class.
- Change the Lite runtime to prefer merging from the wireformat into mutable messages rather than building up a new immutable object before merging. This way results in fewer allocations and copy operations.
- Make message-type extensions merge from wire-format instead of building up instances and merging afterwards. This has much better performance.
- Fix TextFormat parser to build up recurring (but supposedly not repeated) sub-messages directly from text rather than building a new sub-message and merging the fully formed message into the existing field.
- This release addresses a Security Advisory for Java users
Protocol Buffers v3.16.1
Java
- Improve performance characteristics of UnknownFieldSet parsing (#9371)
Protocol Buffers v3.16.0
C++
- Fix compiler warnings issue found in conformance_test_runner #8189 (#8190)
- Fix MinGW-w64 build issues. (#8286)
- [Protoc] C++ Resolved an issue where NO_DESTROY and CONSTINIT are in incorrect order (#8296)
- Fix PROTOBUF_CONSTINIT macro redefinition (#8323)
- Delete StringPiecePod (#8353)
- Fix gcc error: comparison of unsigned expression in '>= 0' is always … (#8309)
- Fix cmake install on iOS (#8301)
- Create a CMake option to control whether or not RTTI is enabled (#8347)
- Fix endian.h location on FreeBSD (#8351)
- Refactor util::Status (#8354)
- Make util::Status more similar to absl::Status (#8405)
- Fix -Wsuggest-destructor-override for generated C++ proto classes. (#8408)
- Refactor StatusOr and StringPiece (#8406)
- Refactor uint128 (#8416)
- The ::pb namespace is no longer exposed due to conflicts.
- Allow MessageDifferencer::TreatAsSet() (and friends) to override previous calls instead of crashing.
- Reduce the size of generated proto headers for protos with string or bytes fields.
- Move arena() operation on uncommon path to out-of-line routine
- For iterator-pair function parameter types, take both iterators by value.
- Code-space savings and perhaps some modest performance improvements in RepeatedPtrField.
- Eliminate nullptr check from every tag parse.
- Remove unused _$name$cached_byte_size fields.
- Serialize extension ranges together when not broken by a proto field in the middle.
- Do out-of-line allocation and deallocation of string object in ArenaString.
... (truncated)
- b8c2488 Updating version.json and repo version numbers to: 16.3
- 42e47e5 Refactoring Java parsing (3.16.x) (#10668)
- 98884a8 Merge pull request #10556 from deannagarcia/3.16.x
- 450b648 Cherrypick ruby fixes for monterey
- b17bb39 Merge pull request #10548 from protocolbuffers/3.16.x-202209131829
- c18f5e7 Updating changelog
- 6f4e817 Updating version.json and repo version numbers to: 16.2
- a7d4e94 Merge pull request #10547 from deannagarcia/3.16.x
- 55815e4 Apply patch
- 152d7bf Update version.json with "lts": true (#10535)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
- @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
- @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
- @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.
This issue is open to discuss different options for adding GPU build and test to GitHub workflows.
To enable this feature, SINGA must provide a real or virtual machine with a GPU as the host machine for running the workflow, and then use the self-hosted runner feature of GitHub Actions. See also this MLOps video tutorial.
The team needs to make some decisions:
What do you think?
Release note is here
This release includes the following changes:
Code quality has been improved by introducing linting checks in CI and an auto code formatter. For linting, the tools cpplint and pylint are used and configured to comply with the Google coding styles; details are in tool/linting/. Similarly, the formatting tools clang-format and yapf, configured with the Google coding styles, are recommended for developers to clean their code before submitting changes; details are in tool/code-format/. LGTM is enabled on GitHub for code quality checks, and license checking is also enabled.
New Tensor APIs are added for naming consistency and feature enhancement, e.g., for converting data between the float and int types.
14 new operators are added into the autograd module: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their unit tests are added as well.
14 new operators are added to sonnx module for both backend and frontend: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their tests are added as well.
Some ONNX models are imported into SINGA, including Bert-squad, Arcface, FER+ Emotion, MobileNet, ResNet18, Tiny Yolov2, Vgg16, and Mnist.
Some operators now support multidirectional broadcasting, including Add, Sub, Mul, Div, Pow, PRelu, Gemm
Distributed training with communication optimization: DistOpt has implemented multiple optimization techniques, including gradient sparsification, chunk transmission, and gradient compression.
Computational graph construction at the CPP level. The operations submitted to the Device are buffered. After analyzing the dependency, the computational graph is created, which is further analyzed for speed and memory optimization. To enable this feature, use the Module API.
New website based on Docusaurus. The documentation files are moved to a separate repo [singa-doc](https://github.com/apache/singa-doc). The static website files are stored at singa-site.
DNNL (Deep Neural Network Library), powered by Intel, is integrated into model/operations/[batchnorm|pooling|convolution]; the changes are opaque to end users. The current version is dnnl v1.1, which replaces the previous integration of mkl-dnn v0.18. The library can boost the performance of deep learning operations when executing on CPU. The dnnl dependency is installed through conda.
Some Tensor APIs are marked as deprecated; they can be replaced by broadcasting, which provides better support for multi-dimensional operations. These APIs are add_column(), add_row(), div_column(), div_row(), mult_column(), and mult_row().
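For example, a call that previously used add_row can be written with the broadcast-enabled operators instead; a small sketch follows (the shapes are illustrative, assuming the + operator broadcasts as described above):

```python
import numpy as np
from singa import tensor

x = tensor.from_numpy(np.zeros((4, 3), dtype=np.float32))   # 4x3 matrix
v = tensor.from_numpy(np.arange(3, dtype=np.float32))       # length-3 vector

# deprecated style: x.add_row(v)
# broadcast style: v is broadcast over the rows of x
y = x + v
print(tensor.to_numpy(y))
```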
Conv and Pooling are enhanced to support fine-grained padding like (2,3,2,3), the SAME_UPPER and SAME_LOWER pad modes, and shape checking.
Reconstruct sonnx.
In this release, we have added the support of ONNX, implemented the CPU operations using MKLDNN, added operations for autograd, and updated the dependent libraries and CI tools.
Core components
Model components
Utility functions and CI
Documentation and usability
Bugs fixed: singa-cpu import throws error
BigDL: Distributed Deep Learning on Apache Spark. What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w
Elephas: Distributed Deep Learning with Keras & Spark Elephas is an extension of Keras, which allows you to run distributed deep learning models at sc
Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. 10x Larger Models 10x Faster Trainin
WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can be initiated without requiring that all the processes enter it. It partially reduces the data within non-overlapping groups of processes, improving the parallel scalability.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear
lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight
TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting
eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l
Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis
A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray What is Analytics Zoo? Analytics Zo
BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith
Project Home Blog Documents Paper Media Coverage Join Fiber users email list [email protected] Fiber Distributed Computing for AI Made Simp
sk-dist: Distributed scikit-learn meta-estimators in PySpark What is it? sk-dist is a Python package for machine learning built on top of scikit-learn
DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru
DistML is a Ray extension library to support large-scale distributed ML training on heterogeneous multi-node multi-GPU clusters
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Graphsignal is a machine learning model monitoring platform. It helps ML engineers, MLOps teams and data scientists to quickly address issues with data and models as well as proactively analyze model performance and availability.