
Overview


DaCe - Data-Centric Parallel Programming

Decoupling domain science from performance optimization.

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages, and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.

DaCe generates high-performance programs for:

  • Multi-core CPUs (tested on Intel and IBM POWER9)
  • NVIDIA GPUs
  • AMD GPUs (with HIP)
  • Xilinx FPGAs
  • Intel FPGAs

DaCe can be written inline in Python and transformed from the command line or in Jupyter notebooks, or SDFGs can be interactively modified using the Data-centric Interactive Optimization Development Environment (DIODE, currently experimental).
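
For instance, a minimal program in the implicit (NumPy) syntax looks like the following sketch; the function name and sizes are illustrative:

import dace
import numpy as np

N = dace.symbol('N')  # symbolic size, resolved when the program is called

@dace.program
def axpy(alpha: dace.float64, x: dace.float64[N], y: dace.float64[N]):
    y[:] = alpha * x + y  # NumPy-style expression, parsed into an SDFG

x = np.random.rand(1024)
y = np.random.rand(1024)
axpy(2.0, x, y)        # compiles and runs the generated code
sdfg = axpy.to_sdfg()  # or obtain the SDFG to inspect and transform it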

For more information, see our paper.

See an example SDFG in the standalone viewer (SDFV).

Tutorials

Installation and Dependencies

To install: pip install dace

Runtime dependencies:

  • A C++14-capable compiler (e.g., gcc 5.3+)
  • Python 3.6 or newer
  • CMake 3.15 or newer

Running

Python scripts: Run DaCe programs (in implicit or explicit syntax) using Python directly.

SDFV (standalone SDFG viewer): To view SDFGs separately, run the sdfv installed script with the .sdfg file as an argument. Alternatively, you can use the link or open diode/sdfv.html directly and choose a file in the browser.

Visual Studio Code plugin: Install from the VSCode marketplace or open an .sdfg file for interactive SDFG viewing and transformation.

DIODE interactive development (experimental): Either run the installed script diode, or call python3 -m diode from the shell. Then, follow the printed instructions to enter the web interface.

The sdfgcc tool: Compile .sdfg files with sdfgcc program.sdfg. Interactive command-line optimization is possible with the --optimize flag.

Jupyter Notebooks: DaCe is Jupyter-compatible. If a result is an SDFG or a state, it will show up directly in the notebook. See the tutorials for examples.
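
For instance, a cell like the following sketch (names are illustrative) renders the resulting SDFG inline, since the last expression of the cell is an SDFG:

import dace

@dace.program
def doubled(x: dace.float64[100]):
    return x * 2

# The SDFG returned by the last expression is drawn directly in the notebook
doubled.to_sdfg()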

Octave scripts (experimental): .m files can be run using the installed script dacelab, which will create the appropriate SDFG file.

Note for Windows/Visual C++ users: If compilation fails in the linkage phase, try setting the following environment variable to force Visual C++ to use Multi-Threaded linkage:

X:\path\to\dace> set _CL_=/MT

Publication

If you use DaCe, cite us:

@inproceedings{dace,
  author    = {Ben-Nun, Tal and de~Fine~Licht, Johannes and Ziogas, Alexandros Nikolaos and Schneider, Timo and Hoefler, Torsten},
  title     = {Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures},
  year      = {2019},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '19}
}

Configuration

DaCe creates a file called .dace.conf in the user's home directory. It provides useful settings that can be modified either directly in the file (YAML), within DIODE, or overridden on a case-by-case basis using environment variables that begin with DACE_ and specify the setting (where categories are separated by underscores); see the example sketch after the configuration lists below. The full configuration schema is located here.

Useful environment variable configurations include:

  • DACE_CONFIG (default: ~/.dace.conf): Override DaCe configuration file choice.

General configuration:

  • DACE_debugprint (default: False): Print debugging information.
  • DACE_compiler_use_cache (default: False): Uses DaCe program cache instead of re-optimizing and compiling programs.
  • DACE_compiler_default_data_types (default: Python): Chooses default types for integer and floating-point values. If Python is chosen, int and float are both 64-bit wide. If C is chosen, int and float are 32-bit wide.

GPU programming and debugging:

  • DACE_compiler_cuda_backend (default: cuda): Chooses the GPU backend to use (can be cuda for NVIDIA GPUs or hip for AMD GPUs).
  • DACE_compiler_cuda_syncdebug (default: False): If True, calls device-synchronization after every GPU kernel and checks for errors. Good for checking crashes or invalid memory accesses.

FPGA programming:

  • DACE_compiler_fpga_vendor: (default: xilinx): Can be xilinx for Xilinx FPGAs, or intel_fpga for Intel FPGAs.

SDFG interactive transformation:

  • DACE_optimizer_transform_on_call (default: False): Uses the transformation command line interface every time a @dace function is called.
  • DACE_optimizer_interface (default: dace.transformation.optimizer.SDFGOptimizer): Controls the SDFG optimization process if transform_on_call is enabled. By default, uses the transformation command line interface.
  • DACE_optimizer_automatic_simplification (default: True): If False, skips automatic simplification in the Python frontend (see transformations tutorial for more information).

Profiling:

  • DACE_profiling (default: False): Enables profiling measurement of the DaCe program runtime in milliseconds. Produces a log file and prints out median runtime.
  • DACE_treps (default: 100): Number of repetitions to run a DaCe program when profiling is enabled.
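
As an illustration of the override mechanism described above, the sketch below sets a few of the listed entries through DACE_-prefixed environment variables before importing DaCe; the chosen values are arbitrary examples:

import os

# Category levels are separated by underscores in the variable name
os.environ['DACE_profiling'] = '1'           # enable runtime profiling
os.environ['DACE_treps'] = '50'              # number of profiling repetitions
os.environ['DACE_compiler_use_cache'] = '1'  # reuse previously compiled programs

import dace  # the overrides are picked up when DaCe reads its configuration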

Contributing

DaCe is an open-source project. We are happy to accept Pull Requests with your contributions! Please follow the contribution guidelines before submitting a pull request.

License

DaCe is published under the New BSD license, see LICENSE.

Comments
  • Variable shadowing issue after applying FPGA transform in implicit notation

    Variable shadowing issue after applying FPGA transform in implicit notation

    Running this code:

    import dace
    import numpy as np
    
    
    n = dace.symbol("n")
    
    @dace.program
    def dot(x: dace.float32[n], y: dace.float32[n], result: dace.float32[1]):
    
        @dace.map(_[0:n])
        def product(i):
            x_in << x[i]
            y_in << y[i]
    
            result_out >> result(1, lambda a, b: a + b)
            result_out = x_in * y_in
    
    # ----------
    # MAIN
    # ----------
    if __name__ == "__main__":
        a = np.array([1,2,3,4,5,6], dtype=np.float32)
        b = np.array([1,2,3,4,5,6], dtype=np.float32)
        c = np.array([0], dtype=np.float32)
    
        dot_sdfg = dot.to_sdfg()
    
        dot_sdfg(x=a, y=b, result=c, n=a.shape[0])
        print("Vec a: ", a)
        print("Vec b: ", b)
        print(c)
    

    After applying "FPGATransformSDFG" the tasklet in connector and the inner state source memlet have a name clash i.e. produce a shadowing issue. See also in the attached image of the SDFG generated by the code after applying the FPGA transformation. Screenshot 2020-02-20 at 12 57 29

    Last lines of error output:

      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 464, in _emit_copy
        "    " + self.memlet_definition(sdfg, memlet, False, vconn),
      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 975, in memlet_definition
        allow_shadowing=allow_shadowing)
      File "/home/burgerm/dace/dace/codegen/targets/target.py", line 226, in add
        raise dace.codegen.codegen.CodegenError(err_str)
    dace.codegen.codegen.CodegenError: Shadowing variable x_in from type DefinedType.Pointer to DefinedType.Scalar
    
    bug transformations 
    opened by manuelburger 10
  • Fix stream allocation scoping

    Fix stream allocation scoping

    There was an issue where streams would be allocated globally (to a state) and locally. This should not happen. The expected behaviour is to never allocate streams locally to a scope.

    This PR fixes this issue by never including streams in the scope transient analysis.

    opened by komplexon3 9
  • Parallelize Xilinx tests

    Parallelize Xilinx tests

    Translate xilinx_test.sh into xilinx_test.py so we can run multiprocessing.starmap on our tests.

    Time for running Xilinx tests reduced from ~27 minutes to ~11 minutes.

    opened by definelicht 9
  • Unroll PEs in FPGA Codegen

    Unroll PEs in FPGA Codegen

    Unroll maps with schedule Unrolled as part of the FPGA codegen in order to detect them as processing elements.

    This is potentially fishy, since we are applying a transformation during code generation (!!) that modifies the SDFG. It is, however, a neat way of avoiding manually handling this in the FPGA codegen when detecting and generating modules.

    opened by definelicht 8
  • General unroller

    General unroller

    The Unrolled scheduler now supports unrolling maps anywhere in the SDFG, including maps that contain nested SDFGs. Adds 2 tests: one that checks nested unrolling with nested SDFGs, and one much simpler test that unrolls a map containing one tasklet. The general concept of unrolling is to back up all the fields that might be affected by calls to replace, then replace all the map parameters and generate the scope, and finally restore the saved fields.

    opened by jnice-81 8
  • Allow transforming @dace.program

    Allow transforming @dace.program

    Currently, in order to apply transformations to a @dace.program, you have to first convert it to an SDFG.

    This is sometimes suboptimal because it changes the arguments required to call the program. For example:

    matmul(A, B, C)
    sdfg = matmul.to_sdfg()
    sdfg(A=A, B=B, C=C, N=N, K=K, M=M)
    

    It would be convenient to not have to change the program arguments like this when converting to an SDFG, perhaps by allowing transformations to be applied to the underlying SDFG of a @dace.program while maintaining the program interface/API.

    frontend 
    opened by definelicht 8
  • Generate Duplicated NestedSDFGs only once

    Generate Duplicated NestedSDFGs only once

    PR for #392

    First commit, so that we can iterate.

    Goal: avoid generating the same code 2+ times for a NestedSDFG that is used multiple times (which also includes LibraryNodes after expansion)

    Current implementation:

    • we need to unequivocally identify SDFGs. For the moment, I've added an additional property unique_name to the SDFG (type string, default empty)
    • this is used in the CPU code generator to keep track of already generated nested SDFGs. If we try to generate an already seen NestedSDFG, it is skipped
    • there are two additional tests for checking that it works on CPU and FPGA (under the assumption that the topmost SDFG is scheduled on the CPU)

    So, up to now, it is up to the user to specify the SDFG's unique name. We would probably need something (in the configuration file?) to disable this.

    Don't know why code coverage fails: the relative difference is 100%

    opened by TizianoDeMatteis 8
  • Calling A @ B as a dace.program inside a function gives the wrong result

    Calling A @ B as a dace.program inside a function gives the wrong result

    Describe the bug: Using a dace.program, A @ B returns a different result than just calling A @ B. This issue doesn't seem to happen when I call it from the main method, but when it's nested inside a more complex function, the results are wrong.

    To Reproduce: I have a method that calls a matrix-matrix multiplication like so: C[m1:m,0:B_dim] += A[m1:m,0:m1] @ B[0:m1,0:B_dim], and I attempted to replace it with C[m1:m, 0:B_dim] += matmul_lib(A[m1:m,0:m1], B[0:m1,0:B_dim]), where:

    @dace.program
    def matmul_lib(A: dtype[M, K], B: dtype[K, N]):
        return A @ B

    Expected behavior: They should return the same numerical result, but for some reason they do not. Looking at the numbers produced, the first row seems to correspond but the rows after are all wrong. Example: this is what I get from calling simply A @ B, which gives me the correct result:

    0.99134411 2.37927192 0.6220935  1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    1.82525592 2.89202545 1.35001469 2.37230364 1.44825839 1.81533188 1.29120714 1.11907193
    1.80647289 1.72074369 1.32760667 1.88492459 1.67942782 1.52714228 1.53037621 0.79207197
    1.88622792 2.82863798 1.24272828 2.70113389 2.19127038 2.11175294 1.71630729 1.28087929

    This is what I get from calling it through matmul_lib:

    0.99134411 2.37927192 0.6220935  1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    0.80634312 1.64045609 0.5430692  1.04235601 0.9985718  1.39916937 0.73351313 0.86935447
    1.95304681 2.74536479 1.56036938 2.40742907 1.69156976 1.96556436 1.48281472 1.27695084
    0.90018296 2.24500806 0.64027737 1.62140514 1.39162286 1.60093125 0.99937141 0.96864924

    Desktop (please complete the following information):

    • OS: Windows 10
    • DaCe on commit: 4f36b20e602a0320ce8303aafcfd9d430d1614e7
    • Python 3.7.9
    bug frontend 
    opened by Simeonedef 7
  • Code style issue: decorators

    Code style issue: decorators

    Google style guide

    Use decorators judiciously when there is a clear advantage. https://google.github.io/styleguide/pyguide.html#217-function-and-method-decorators

    Examples

    https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/transformation.py#L15 https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/dataflow/tiling.py#L12

    Cons

    Requires keeping in mind that something will be done with the class/function. Languages provide more conventional features to support a registry, which people can understand faster.

    Possible solution

    Keep the registry inside its own class. Define a classmethod for this class, which creates an instance of the class and fills it with the default transformations. Such a classmethod should import each transformation itself. The advantage of such a design is that users can extend it by using their own class instance filled with user-provided transformations. Another advantage is that there is no global registry.

    Good uses

    Making annotations in the Python/NumPy interface.

    opened by and-ivanov 7
  • Serialize patch

    Serialize patch

    • [x] As a start, don't catch all exceptions, and don't fail silently if a field is missing.
    • [x] Tolerate string inputs in set_properties_from_json
    • [x] There's a monstrous list of string-to-type mappings in dace.properties.known_types. This seems brittle and hideous. We can replace this by calling an optionally implemented method.
    • [x] Naming scheme is off (should be to_json, not toJSON)
    • [x] Move set_properties_from_json out of Property class
    • [x] Right now every implementation of toJSON/fromJSON needs to call json.dumps and json.loads. Reduce this to receive the JSON object.
    opened by definelicht 7
  • Reductions are broken on Xilinx FPGAs

    Reductions are broken on Xilinx FPGAs

    Describe the bug: When using a reduction (either a manual dace.reduce or a detected one, e.g., np.max), DaCe generates incorrect FPGA code which fails to compile with the error:

    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp: In function 'void broken_reduction_sym_0_0_0(const double*, double&, int)':
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:40:34: error: invalid initialization of reference of type 'double*&' from expression of type 'double'
       40 |         reduce_1_0_2(&__A_in[0], __result_out, N);
          |                                  ^~~~~~~~~~~~
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:10:47: note: in passing argument 2 of 'void reduce_1_0_2(const double*, double*&, int)'
       10 | void reduce_1_0_2(const double* _in, double*& _out, int N) {
          |                                      ~~~~~~~~~^~~~
    gmake[2]: *** [CMakeFiles/broken_reduction_sym_1.dir/lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp.o] Error 1
    gmake[1]: *** [CMakeFiles/broken_reduction_sym_1.dir/all] Error 2
    gmake: *** [all] Error 2
    

    To Reproduce Minimal example:

    import dace
    import numpy as np
    from dace.transformation.interstate import FPGATransformSDFG
    
    N = dace.symbol("N")
    
    @dace.program
    def broken_reduction_sym(A: dace.float64[N]):
        # result = np.min(A)
        result = dace.reduce(lambda a, b: a+b, A)
    
    broken_reduction_sdfg = broken_reduction_sym.to_sdfg()
    broken_reduction_sdfg.apply_transformations(FPGATransformSDFG)
    broken_reduction = broken_reduction_sdfg.compile()
    

    Expected behavior Reductions should produce code that compiles.

    Additional context: DaCe version: 0.13.2, Vitis version: 2021.2, XRT version: 2.11.634, Python version: 3.9.7, CMake version: 3.19.3, G++ version: 10.2.0

    bug 
    opened by JamieJQuinn 6
  • auto_optimize now properly chooses GPUAuto expansion for reduce nodes

    auto_optimize now properly chooses GPUAuto expansion for reduce nodes

    This PR adds very little code to auto_optimize.py. The added code ensures that the GPUAutoExpansion gets used for reduce nodes, when auto_optimize is used to optimize the produced SDFG.

    Before, it would always choose CUDA (device) even though the GPUAuto expansion is higher in the implementation_prio.

    opened by hodelcl 0
  • Warn user when calling `to_sdfg` on a function that shouldn't be reparsed

    Warn user when calling `to_sdfg` on a function that shouldn't be reparsed

    When using reparse_sdfg or recompile keyword arguments:

    1. sometimes recompile shows up as a constant expression
    2. if calling to_sdfg() the user should be warned that these arguments will be ignored.
    opened by tbennun 0
  • csdfg can not handle torch.rand() tensor in getting_started.ipynb

    csdfg can not handle torch.rand() tensor in getting_started.ipynb

    Describe the bug: I replace cell [12] tester = np.random.rand(2000, 4000) with

    import torch
    tester = torch.rand(2000,4000)
    tester
    

    and %timeit csdfg(A=tester, N=np.int32(2000)) with %timeit csdfg(A=tester, N=2000), which means using a torch tensor instead of a NumPy array as input, but we almost always get a "Kernel Restarting" error:

    To Reproduce: Change the code as mentioned above, then run all cells.

    Expected behavior Output the time of csdfg(A=tester,N=2000).

    Screenshots: We usually get the "Kernel Restarting" error; sometimes we get the expected result without any code changes.

    Desktop (please complete the following information):

    • OS: Linux
    • Browser: Chrome
    • Version: 106.0.52

    Additional context The error also occurs on Windows OS.

    opened by Weigaa 0
  • Python `with` statement code generation identifier is not unique enough

    Python `with` statement code generation identifier is not unique enough

    Describe the bug: The with statement generates in the C code a pair of __with_XXX___enter, __with_XXX___exit statements, with XXX the line number in the original source. It's also how those symbols are earmarked in the SDFG. Unfortunately, this can cause nasty clashes when:

    • with statements from two different files are at the same line
    • a code change outside the DaCe-handled code path ends up changing the line of a with statement that is considered by DaCe, which invalidates running the .so (bad symbol) when technically nothing changed in the code DaCe should care about.

    To Reproduce

    It would be a two-file reproducer with with statements sharing a line number.

    Expected behavior

    The Python frontend handling with statements properly is a very good feature and shouldn't be discarded. A more robust sanitization of the with statement identifier should be found.

    Proposal: with util.timer.clock("mainloop") to be sanitized as __with_util_timer_clock_mainloop_X___enter with X a global counter on with statements to keep ordering consistent.

    opened by FlorianDeconinck 0
Releases
  • v0.14.1(Oct 14, 2022)

    This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.

    Full Changelog: https://github.com/spcl/dace/compare/v0.14...v0.14.1

  • v0.14(Aug 26, 2022)

    What's Changed

    This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.

    Please let us know if there are any regressions with this new release.

    Features

    • Breaking change: The experimental dace.constant type hint has now achieved stable status and was renamed to dace.compiletime
    • Major change: Only modified configuration entries are now stored in ~/.dace.conf. The SDFG build folders still include the full configuration file. Old .dace.conf files are detected and migrated automatically.
    • Detailed, multi-platform performance counters are now available via native LIKWID instrumentation (by @lukastruemper in https://github.com/spcl/dace/pull/1063). To use, set .instrument to dace.InstrumentationType.LIKWID_Counters
    • GPU Memory Pools are now supported through CUDA's mallocAsync API. To enable, set desc.pool = True on any GPU data descriptor.
    • Map schedule and array storage types can now be annotated directly in Python code (by @orausch in https://github.com/spcl/dace/pull/1088). For example:
    import dace
    from dace.dtypes import StorageType, ScheduleType
    
    N = dace.symbol('N')
    
    @dace
    def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
                   b: dace.float64[N] @ StorageType.GPU_Global):
      # This map will become a GPU kernel
      for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
    
    • Customizing GPU block dimension and OpenMP threading properties per map is now supported
    • Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
    from typing import Optional
    import dace

    @dace
    def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
      always += 1  # "always" is always used, so it will not be optional
      if maybe is None:  # This condition will stay in the code
        return 1
      if always is None:  # This condition will be eliminated in simplify
        return 2
      return 3
    

    Minor changes

    • Miscellaneous fixes to transformations and passes
    • Fixes for string literal ("string") use in the Python frontend
    • einsum is now a library node
    • If CMake is already installed, it is now detected and will not be installed through pip
    • Add kernel detection flag by @TizianoDeMatteis in https://github.com/spcl/dace/pull/1061
    • Better support for __array_interface__ objects by @gronerl in https://github.com/spcl/dace/pull/1071
    • Replacements look up base classes by @tbennun in https://github.com/spcl/dace/pull/1080

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.3...v0.14

  • v0.13.3(Jun 30, 2022)

    What's Changed

    • Better integration with Visual Studio Code: Calling sdfg.view() inside a VSCode console or debug session will open the file directly in the editor!
    • Code generator for the Snitch RISC-V architecture (by @noah95 and @am-ivanov)
    • Minor hotfixes to Python frontend, transformations, and code generation (with @orausch)

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.2...v0.13.3

  • v0.13.2(Jun 22, 2022)

    What's Changed

    • New API for SDFG manipulation: Passes and Pipelines. More about that in the next major release!
    • Various fixes to frontend, type inference, and code generation.
    • Support for more numpy and Python functions: arange, round, etc.
    • Better callback support:
      • Support callbacks with keyword arguments
      • Support literal lists, tuples, sets, and dictionaries in callbacks
    • New transformations: move loop into map, on-the-fly-recomputation map fusion
    • Performance improvements to frontend
    • Better Docker container compatibility via fixes for config files without a home directory
    • Add interface to check whether in a DaCe parsing context in https://github.com/spcl/dace/pull/998
    import dace

    def potentially_parsed_by_dace():
        if not dace.in_program():
            print('Called by Python interpreter!')
        else:
            print('Compiled with DaCe!')
    
    • Support compressed (gzipped) SDFGs. Loads normally, saves with:
    sdfg.save('myprogram.sdfgz', compress=True)  # or just run gzip on your old SDFGs
    
    • SDFV: Add web serving capability by @orausch in https://github.com/spcl/dace/pull/1013. Use for interactively debugging SDFGs on remote nodes with: sdfg.view(8080) (or any other port)

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.1...v0.13.2

  • v0.13.1(Apr 26, 2022)

    What's Changed

    • Python frontend: Bug fixes for closures and callbacks in nested scopes
    • Bug fixes for several transformations (StateFusion, RedundantSecondArray)
    • Fixes for issues with FORTRAN ordering of numpy arrays
    • Python object duplicate reference checks in SDFG validation

    Full Changelog: https://github.com/spcl/dace/compare/v0.13...v0.13.1

  • v0.13(Feb 28, 2022)

    New Features

    Cutout:

    Cutout allows developers to take large DaCe programs and reliably cut out subgraphs to create a runnable sub-program. This sub-program can then be used to check correctness, benchmark, and transform a part of a program without having to run the full application. Example usage from Python:

    import dace
    from dace.sdfg.analysis import cutout  # assumed module location of the cutout utilities

    def my_method(sdfg: dace.SDFG, state: dace.SDFGState):
        nodes = [n for n in state if isinstance(n, dace.nodes.LibraryNode)]  # Cut every library node
        cut_sdfg: dace.SDFG = cutout.cutout_state(state, *nodes)
        # The cut SDFG now includes each library node and all the necessary arrays to call it with
    

    Also available in the SDFG editor.

    Data Instrumentation:

    Just like node instrumentation for performance analysis, data instrumentation allows users to set access nodes to be saved to an instrumented data report, and loaded later for exact reproducible runs.

    • Data instrumentation natively works with CPU and GPU global memory, so there is no need to copy data back
    • Combined with Cutout, this is a powerful interface to perform local optimizations in large applications with ease!
    • Example use:

        import dace
        import numpy as np
        from dace import nodes

        @dace.program
        def tester(A: dace.float64[20, 20]):
            tmp = A + 1
            return tmp + 5
    
        sdfg = tester.to_sdfg()
        for node, _ in sdfg.all_nodes_recursive():  # Instrument every access node
            if isinstance(node, nodes.AccessNode):
                node.instrument = dace.DataInstrumentationType.Save
    
        A = np.random.rand(20, 20)
        result = sdfg(A)
    
        # Get instrumented data from report
        dreport = sdfg.get_instrumented_data()
        assert np.allclose(dreport['A'], A)
        assert np.allclose(dreport['tmp'], A + 1)
        assert np.allclose(dreport['__return'], A + 6)
    

    Logical Groups:

    SDFG elements can now be grouped by any criteria, and they will be colored during visualization by default (by @phschaad).

    Changes and Bug Fixes

    • Samples and tutorials have now been updated to reflect the latest API
    • Constants (added with sdfg.add_constant) can now be used as access nodes in SDFGs. The constants are hard-coded into the generated program, so you can run code with the best performance possible.
    • View nodes can now use the views connector to disambiguate which access node is being viewed
    • Python frontend: else clause is now handled in for and while loops
    • Scalars have been removed from the __dace_init generated function signature (by @orausch)
    • Multiple clock signals in the RTL codegen (by @carljohnsen)
    • Various fixes to frontends, transformations, and code generators

    Full Changelog available at https://github.com/spcl/dace/compare/v0.12...v0.13

  • v0.12(Jan 22, 2022)

    API Changes

    Important: Pattern-matching transformation API has been significantly simplified. Transformations using the old API must be ported! Summary of changes:

    • Transformations now extend either the SingleStateTransformation or MultiStateTransformation class instead of using decorators
    • Pattern nodes must be registered as class variables of type PatternNode
    • Nodes in matched patterns can then be accessed in can_be_applied and apply directly using self.nodename
    • The name strict is now replaced with permissive (False by default). Permissive mode allows transformations to match in more cases, but may be dangerous to apply (e.g., create race conditions).
    • can_be_applied is now a method of the transformation
    • The apply method accepts a graph and the SDFG.

    Example of using the new API:

    import dace
    from dace import nodes
    from dace.sdfg import utils as sdutil
    from dace.transformation import transformation as xf
    
    class ExampleTransformation(xf.SingleStateTransformation):
        # Define pattern nodes
        map_entry = xf.PatternNode(nodes.MapEntry)
        access = xf.PatternNode(nodes.AccessNode)
    
        # Define matching subgraphs
        @classmethod
        def expressions(cls):
            # MapEntry -> Access
            return [sdutil.node_path_graph(cls.map_entry, cls.access)]
    
        def can_be_applied(self, graph: dace.SDFGState, expr_index: int, sdfg: dace.SDFG, permissive: bool = False) -> bool:
            # Returns True if the transformation can be applied on a subgraph
            if permissive:  # In permissive mode, we will always apply this transformation
                return True
            return self.map_entry.schedule == dace.ScheduleType.CPU_Multicore
    
        def apply(self, graph: dace.SDFGState, sdfg: dace.SDFG):
            # Apply the transformation using the SDFG API
            pass
    

    Simplifying SDFGs is renamed from sdfg.apply_strict_transformations() to sdfg.simplify()

    AccessNodes no longer have an AccessType field.

    Other changes

    • More nested SDFG inlining opportunities by default with the multi-state inline transformation
    • Performance optimizations of the DaCe framework (parsing, transformations, code generation) for large graphs
    • Support for Xilinx Vitis 2021.2
    • Minor fixes to transformations and deserialization

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.4...v0.12

  • v0.11.4(Dec 17, 2021)

    What's Changed

    • If a Python call cannot be parsed into a data-centric program, DaCe will automatically generate a callback into Python. Supports CPU arrays and GPU arrays (via CuPy) without copying!
    • Python 3.10 support
    • CuPy arrays are supported when calling @dace.programs in JIT mode
    • Fix various issues in Python frontend and code generation

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.3...v0.11.4

  • v0.11.3(Nov 23, 2021)

  • v0.11.2(Nov 12, 2021)

  • v0.11.1(Oct 18, 2021)

    What's Changed

    • More flexible Python frontend: you can now call functions and object methods, use fields and globals in @dace programs! Some examples:
      • There is no need to annotate called functions
      • @dataclass and general object field support
      • Loop unrolling: implicit and explicit (with the dace.unroll generator)
      • Constant folding and explicit constant arguments (with dace.constant as a type hint)
      • Debuggability: all functions (e.g. dace.map, dace.tasklet) work in pure Python as well
      • and many more features
    • NumPy semantics are followed more closely, e.g., subscripts create array views
    • Direct CuPy and torch.tensor integration in @dace program arguments
    • Auto-optimization (preview): use @dace.program(auto_optimize=True, device=dace.DeviceType.CPU) to automatically run some transformations, such as turning loops into parallel maps (see the sketch after this list).
    • ARM SVE code generation support by @sscholbe (#705)
    • Support for MLIR tasklets by @Berke-Ates in (#747)
    • Source Mapping by @benibenj in https://github.com/spcl/dace/pull/756
    • Support for HBM on Xilinx FPGAs by @jnice-81 (#762)
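
    A minimal sketch of the auto-optimization preview listed above; the function and sizes are illustrative:

    import dace
    import numpy as np

    @dace.program(auto_optimize=True, device=dace.DeviceType.CPU)
    def scale_add(a: dace.float64[1000], b: dace.float64[1000]):
        return a * 2.0 + b

    a = np.random.rand(1000)
    b = np.random.rand(1000)
    c = scale_add(a, b)  # transformations such as parallelizing loops are applied automatically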

    Miscellaneous:

    • Various performance optimizations to calling @dace programs
    • Various bug fixes to transformations, code generator, and frontends

    Full Changelog: https://github.com/spcl/dace/compare/v0.10.8...v0.11.1

  • v0.10.8(Apr 14, 2021)

    What's New?

    • Various bug fixes and more stable Python/NumPy frontend
    • Support for running DaCe programs within the Python interpreter
    • (experimental) Support for automatic optimization passes (more coming soon!)
  • v0.10.0(Oct 4, 2020)

    What's New?

    • Python frontend improvements: More Python features are supported, such as return values, tuples, and numpy broadcasting. @dace.programs can now call other programs or SDFGs.
    • AMD GPU (HIP) Support: AMD GPUs are now fully supported with HIP code generation.
    • Easy-to-use transformation APIs: Apply transformation compositions with one call, enumerate subgraph matches manually, and many more functions now available as part of the dace API. See the new tutorial for examples.
    • Faster code generation: Backends now generate lower-level code that is more compiler-friendly.
    • Instrumentation interface: Setting the instrument property for SDFG nodes and states enables easy-to-use, localized performance reporting with timers, GPU events, and PAPI performance counters (see the sketch after this list).
    • DaCe VSCode plugin: Interactive SDFG viewer and optimizer as part of Visual Studio Code. Download the plugin here.
    • Type inference and connector types: In addition to automatic type inference, connectors on nodes can now be defined with explicit types, giving more fine-grained control over type reinterpreting and vector types.
    • Subgraph transformations: New transformation type that can work on arbitrary subgraphs. For example, fuse any computation within a state with SubgraphFusion.
    • Persistent GPU kernel schedule: Launch persistent kernels with a change of a property! The proportion of GPU multiprocessors used is configurable.
    • More transformations: Loop manipulation and other new transformations now available with DaCe. Some transformations (such as Vectorization) made more robust to corner cases.
    • More tools: Use sdfgcc to quickly compile and optimize .sdfg files from the command line, generating header and library files. Great for interoperability and Makefiles.
    • Short DaCe annotation: Data-centric functions can now be annotated with @dace.
    • Many minor fixes and additions: More library nodes (such as einsum) and new properties added, enabling faster performance and more productive high-performance coding than ever.
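
    As an illustration of the instrumentation interface mentioned in this list, the following sketch enables timer instrumentation on every state of an SDFG; the program itself is an arbitrary example:

    import dace
    import numpy as np

    @dace.program
    def doubler(a: dace.float64[1000]):
        return a + a

    sdfg = doubler.to_sdfg()

    # States (the nodes of the SDFG) expose an 'instrument' property
    for state in sdfg.nodes():
        state.instrument = dace.InstrumentationType.Timer

    sdfg(a=np.random.rand(1000))  # a performance report is produced alongside the run
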
  • v0.9.5(Jan 6, 2020)

    What's New?

    • Intel FPGA backend: Generates and compiles Intel FPGA OpenCL code from SDFGs.
    • Renderer: Many improvements to the scalability of drawing large SDFGs, touch/mobile support, and code view upon zooming into Tasklets.
    • SDFV: Now includes a sidebar with information about clicked nodes/edges/states.
    • GPU reduction: Now supports Reduce nodes where the output array contains multiple dimensions (if contiguous). In other cases, use the ReduceExpansion transformation.
    • Faster compilation: Improved CMake usage to speed up compilation time if files were not changed.
    • Stability: Various fixes to the Python frontend, transformations, code generation, and DIODE (on Linux and Windows).
    • Generated programs now include header (.h) file and an example C program that invokes the compiled SDFG.
  • v0.9.0(Oct 22, 2019)

    What's New

    • NumPy syntax for Python: Wrap Python functions that work on numpy arrays with @dace.program and create SDFGs from implicit dataflow.
    • DIODE 2.0: DIODE has been reworked to operate in the browser, and works natively on Windows. Note that it is currently experimental, and some features may cause errors. We are happy to fix bugs if you find and report issues!
    • Standalone SDFG renderer (SDFV) and improved Jupyter support: Contextual, optimized SDFG drawing with collapsible scopes (double-click a map, a state, or a nested SDFG). Fully integrated into Jupyter notebooks.
    • Transformations: Improvements to scalability of subgraph pattern matching and memlet propagation.
    • Improvements to the TensorFlow frontend.
    • Many minor bug fixes and several API improvements.
  • v0.8.1(Aug 24, 2019)
