Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.

Overview

Streamparse lets you run Python code against real-time streams of data via Apache Storm. With streamparse you can create Storm bolts and spouts in Python without having to write a single line of Java. It also provides handy CLI utilities for managing Storm clusters and projects.

The Storm/streamparse combo can be viewed as a more robust alternative to Python worker-and-queue systems, as might be built atop frameworks like Celery and RQ. It offers a way to do "real-time map/reduce style computation" against live streams of data. It can also be a powerful way to scale long-running, highly parallel Python processes in production.
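
For a taste of the API, here is a minimal word-counting bolt in the style of the project's quickstart (streamparse 3.x API; the class name and fields are illustrative):

    from streamparse import Bolt


    class WordCountBolt(Bolt):
        outputs = ["word", "count"]

        def initialize(self, storm_conf, context):
            self.counts = {}

        def process(self, tup):
            # each incoming tuple carries a single word as its first value
            word = tup.values[0]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.emit([word, self.counts[word]])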

User Group

Follow the project's progress, get involved, submit ideas and ask for help via our Google Group, [email protected].

Changelog

See the releases page on GitHub.

Roadmap

See the Roadmap.

Issues
  • Python Topology DSL

    Python Topology DSL

    We currently rely on lein and the Clojure DSL to build topologies. This is nice because the Clojure DSL is bundled with Storm, and it allows us to freely mix Python components with JVM (and even other multi-lang) components. Via Clojure, we also get local debugging for free via LocalCluster.

    But it's also an impediment to new users coming to streamparse expecting "pure Python" support for Storm. See for example this Twitter conversation:

    https://twitter.com/sarthakdev/status/539390816339247104

    The Clojure DSL was chosen for expediency, but for pure Python topologies, a Python DSL might be even better and would spare the streamparse user from having to learn much about Java/Clojure. The recently released pyleus takes a different approach to this problem: a YAML DSL plus a Java builder tool.

    One approach to a Python DSL would be to leverage some new work going on in Storm core to make topology configuration dynamic via JSON, as described in #81. Another option would be to have the Python DSL generate Clojure DSL code, which would then be compiled. I haven't yet decided on the best course of action, but I am personally interested in building the Python DSL to make streamparse more usable by Pythonistas out of the box.
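
    For reference, the Python DSL that eventually shipped in streamparse 3.0 (see the v3.0.0 release notes below) declares a topology roughly like this, where WordSpout and WordCountBolt stand in for user-defined components:

    from streamparse import Grouping, Topology

    class WordCount(Topology):
        word_spout = WordSpout.spec(par=2)
        count_bolt = WordCountBolt.spec(
            inputs={word_spout: Grouping.fields("word")}, par=2
        )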

    enhancement 
    opened by amontalenti 54
  • Make BatchingBolt use tick tuples instead of threads.

    Make BatchingBolt use tick tuples instead of threads.

    This still allows for non-time-based batching; it just doesn't require threads anymore.

    This is not a backward-compatible change, as secs_between_batches was changed to ticks_between_batches, and people now need to run Storm with topology.tick.tuple.freq.secs set to 1 to have their code work the same as before.

    This should also fix the sporadic unit test failures we were seeing on Travis CI, since the BatchingBolt tests used to be time-based.

    This fixes #125.
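
    A rough sketch of a BatchingBolt under the tick-based scheme (the class is illustrative; ticks_between_batches, group_key, and process_batch are the BatchingBolt hooks):

    from streamparse import BatchingBolt

    class DatabaseWriterBolt(BatchingBolt):
        # with topology.tick.tuple.freq.secs set to 1, this batches roughly every 5 seconds
        ticks_between_batches = 5

        def group_key(self, tup):
            return tup.values[0]  # group tuples into batches by their first field

        def process_batch(self, key, tups):
            ...  # write the whole batch at once, e.g. one bulk insert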

    enhancement 
    opened by dan-blanchard 27
  • Provide example of mixed Java + Python topology using storm-kafka

    Provide example of mixed Java + Python topology using storm-kafka

    Requested on the mailing list:

    I am looking forward to understanding how I can use the Clojure DSL to submit a topology which has a spout written in Java and bolts written in Python.

    The best first example would be to use the new official Kafka spout (storm-kafka) that is now included in Storm's 0.9.2 release. This combined with Python bolts processing the data from the Kafka spout would be the best first Python + Java interop example for streamparse, I think.

    enhancement help wanted 
    opened by amontalenti 23
  • Adds basic functionality for getting Storm statistics at command line

    Adds basic functionality for getting Storm statistics at command line

    Starts to address issue #17. Allows users to essentially recreate the Storm UI index page and topology detail page at the command line (configuration details omitted for brevity at this point).

    Users can run streamparse -e ENV to get results like the Storm UI index page, and can add -n TOPO_NAME to get results like the topology detail page.

    Main problems at the moment:

    • Prints wide tables that will wrap in terminal windows.
    • Doesn't live-refresh (e.g. with curses).
    enhancement 
    opened by tdhopper 23
  • Re-organize tests to avoid shell processes (branch: feature/reorg_tests)

    Re-organize tests to avoid shell processes (branch: feature/reorg_tests)

    I'm opening this pull request for a review of the approach, not because all the tests have been reorganized. If there's support for the ideas in here, I'll finish it up and replace our current tests with this.

    Overview

    Right now, our tests are a bit brittle, the barrier to entry for adding new tests is too high, and debugging failing tests or gathering code-coverage statistics is fairly difficult. This is by no means to say the current tests are bad: they work, they test things, and they aren't too hard to add new things to. I did this because I want adding new tests to be stupid easy. Everything that makes new tests easier to write and debug means that more and better tests will get written.

    To do this, the big thing I wanted to do was remove the subprocess calls entirely. While they look like Storm, they aren't Storm, and they add a good bit of complication to the setup. For one, they make using pdb to debug a test impossible, and the break in process space makes code coverage harder to collect.

    Finally, we didn't have true integration tests. True integration tests would integrate with Storm and verify that communication was successful; instead, we integrated with a mock that we created to mimic Storm. Rather than using a subprocess to mock Storm, I'm opting for the mock library to mock individual components. The mock library is a common Python testing tool that any contributor walking in will know. Using a common tool like that removes the barrier of learning how our subprocess Storm mock works, which makes writing tests that much easier.
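
    To make that concrete, a test in the proposed style might look roughly like this (using the mock API that is bundled as unittest.mock on Python 3; WordCountBolt is an illustrative component under test):

    from unittest import mock

    def test_process_emits_word_count():
        bolt = WordCountBolt()  # illustrative component under test
        bolt.initialize({}, mock.MagicMock())
        tup = mock.MagicMock()
        tup.values = ["dog"]
        # patching emit means no Storm process, real or mocked, is needed
        with mock.patch.object(bolt, "emit") as emit:
            bolt.process(tup)
        emit.assert_called_once_with(["dog", 1])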

    Disclaimer: I've no idea how this will play with python3. Testing that will come later.

    Benefits

    • Easier debugging into running test processes
    • Significantly faster tests
      • No process creation/teardown for every test
    • Simpler testing
      • Use common mock tools instead of our own mock framework
    • Unit testing means more focused tests covering specific use cases

    Next steps

    If other people agree this is a good approach, I'll finish test coverage for the current streamparse version. @msukmanowsky has a large pull request coming in, so I'll wait until that's resolved and write all the necessary tests for what's there. Once that's complete, I'll hook up coveralls for streamparse to get code-coverage stats. I don't know how useful they'll be, but I'm curious about them and their integration with Travis-CI.

    opened by kbourgoin 22
  • sparse run with named streams not compatible with Storm 1.0.1

    sparse run with named streams not compatible with Storm 1.0.1

    Hello,

    the currently stable streamparse 3.0.0 is not compatible with Storm 1.0.1, which is the latest official release available on storm.apache.org. It requires at least Storm 1.0.2 or 1.1.0, because FluxShellSpout.setNamedStream and FluxShellBolt.setNamedStream are only available since then.

    Since 1.0.2 and 1.1.0 are only released on GitHub and not yet available on Maven, this might be a serious problem. Streamparse also uses 1.0.1 as the default in the examples.

    documentation 
    opened by macheins 22
  • `sparse visualize` command

    `sparse visualize` command

    Use an existing topology visualization tool, or some mix of existing tools, to add a visualization command for topologies:

    sparse visualize [(--name | -n) <topology>] (--format | -f (png|jpg|gif)) output_filename 
    
    enhancement help wanted 
    opened by msukmanowsky 19
  • Move away from Clojure

    Move away from Clojure

    @amontalenti and I were struggling to get the Clojure part of streamparse compiled and deployed to Clojars because of typical JVM dependency issues, and it made me think it would be nice if we were a pure-Python package. I know the official Thrift bindings for Python are horrible to get installed and working properly, but thriftpy is actually a really nice package, and I made it compatible with the Storm Thrift spec a while ago.

    The approach I'm imagining would use the REST API for most things and only use the Thrift interface to Nimbus for submitting a topology.

    One downside I see is that unless the user also had the storm CLI installed locally, they wouldn't be able to run a local topology like you currently can with sparse run.
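
    A minimal sketch of the Thrift half of that idea with thriftpy2 (the maintained fork, which keeps the same API as thriftpy), assuming a local copy of Storm's storm.thrift and an illustrative Nimbus host; Nimbus uses Thrift's framed transport:

    import thriftpy2
    from thriftpy2.rpc import make_client
    from thriftpy2.transport import TFramedTransportFactory

    # load the Storm IDL and connect to Nimbus over its framed Thrift transport
    storm_thrift = thriftpy2.load("storm.thrift", module_name="storm_thrift")
    nimbus = make_client(
        storm_thrift.Nimbus,
        "nimbus.example.com",  # illustrative host
        6627,
        trans_factory=TFramedTransportFactory(),
    )
    print(nimbus.getClusterInfo())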

    opened by dan-blanchard 17
  • sparse submit and sparse list should not require SSH tunnels

    sparse submit and sparse list should not require SSH tunnels

    Currently, when you run sparse submit or sparse list, the streamparse code opens an SSH reverse tunnel to your configured cluster in config.json and then connects to the SSH-forwarded port.

    However, there are use cases where Storm is running directly on your machine, or in a VirtualBox VM or Docker container that is port-forwarded at the network level to localhost. In these cases, especially the Docker case, SSH is not a viable option for communicating with Nimbus. Streamparse should instead connect directly to Nimbus at the exposed port.

    In the case of sparse submit, this also means that virtualenv setup steps cannot run.

    I am also going to link this issue to every issue that is dealing with SSH problems.

    bug help wanted 
    opened by amontalenti 16
  • 3.0.0.dev3 Bolt.spec(config={}) is not passed to storm_conf when using `sparse run`

    3.0.0.dev3 Bolt.spec(config={}) is not passed to storm_conf when using `sparse run`

    class WordCount(Topology):
        word_spout = WordSpout.spec(
            config={"foo":"bar"}
        )
    
    class WordSpout(Bolt):
        def initialize(self, storm_conf, context):
            foo_var = storm_conf.get('foo')
            ...
    

    foo_var is None.

    How did you pass config to Storm in the previous (2.x.x) ShellBolt-based code?

    bug waiting on upstream 
    opened by Darkless012 16
  • Can't run sparse run on wordcount example: RuntimeError: dictionary keys changed during iteration

    Can't run sparse run on wordcount example: RuntimeError: dictionary keys changed during iteration

    I'm using streamparse==3.16.0 and Storm 1.1.3. When I try sparse run with Python 3.8.5, I get a RuntimeError with the following traceback:

    (VENV3) [email protected]:~/Documents/repos/wordcount$ sparse run
    Traceback (most recent call last):
      File "/home/francis/Documents/repos/platform/VENV3/bin/sparse", line 8, in <module>
        sys.exit(main())
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/cli/sparse.py", line 87, in main
        args.func(args)
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/cli/run.py", line 127, in main
        run_local_topology(
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/cli/run.py", line 46, in run_local_topology
        topology_class = get_topology_from_file(topology_file)
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/util.py", line 573, in get_topology_from_file
        mod = importlib.import_module(mod_name)
      File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "topologies/wordcount.py", line 11, in <module>
        class WordCount(Topology):
      File "topologies/wordcount.py", line 13, in WordCount
        count_bolt = WordCountBolt.spec(inputs={word_spout: Grouping.fields("word")}, par=2)
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/storm/bolt.py", line 192, in spec
        return ShellBoltSpec(
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/dsl/bolt.py", line 23, in __init__
        super(ShellBoltSpec, self).__init__(
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/dsl/component.py", line 267, in __init__
        super(ShellComponentSpec, self).__init__(
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/dsl/component.py", line 41, in __init__
        self.inputs = self._sanitize_inputs(inputs)
      File "/home/francis/Documents/repos/platform/VENV3/lib/python3.8/site-packages/streamparse/dsl/component.py", line 84, in _sanitize_inputs
        for key, val in inputs.items():
    RuntimeError: dictionary keys changed during iteration
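
    For context, Python 3.8 added this RuntimeError for code that replaces a dict's keys while iterating over the dict, which is what _sanitize_inputs was doing. A minimal reproduction, plus the usual fix of iterating over a snapshot:

    inputs = {"word_spout": None}
    try:
        for key, val in inputs.items():
            inputs["word_spout_1"] = inputs.pop(key)  # swaps a key mid-iteration
    except RuntimeError as err:
        print(err)  # dictionary keys changed during iteration

    # The usual fix: iterate over a snapshot so the dict can be mutated freely.
    for key, val in list(inputs.items()):
        inputs["renamed_" + key] = inputs.pop(key)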

    opened by frank-switchdin 1
  • Wordcount example use Anaconda environment

    Wordcount example use Anaconda environment

    While running the wordcount example, I'm getting this error:

    Cannot run program "streamparse_run" (in directory "/home/user/apache-storm-2.2.0/“/home/user/apache-storm-2.2.0/data”/supervisor/stormdist/wordcount-5-1603899388/resources"): error=2
    

    The path contains no reference to my Anaconda env, which is where the streamparse_run command is stored.

    How should I modify config.json in order to run the wordcount example locally using an Anaconda environment? This is my config.json so far:

    {
        "serializer": "json",
        "topology_specs": "topologies/",
        "virtualenv_specs": "virtualenvs/",
        "envs": {
            "prod": {
                "user": "",
                "ssh_password": "",
                "nimbus": "localhost",
                "workers": ["localhost"],
                "log": {
                    "path": "",
                    "max_bytes": 1000000,
                    "backup_count": 10,
                    "level": "info"
                },
                "use_ssh_for_nimbus": false,
                "virtualenv_root": "/home/user/anaconda3/envs",
                "use_virtualenv": true,
                "install_virtualenv": false,
                "virtualenv_name": "test"
            }
        }
    }
    
    opened by colonder 0
  • Bump json from 1.7.7 to 2.3.1 in /examples/kafka-jvm/chef/cookbooks/python

    Bump json from 1.7.7 to 2.3.1 in /examples/kafka-jvm/chef/cookbooks/python

    Bumps json from 1.7.7 to 2.3.1.

    Changelog

    Sourced from json's changelog.

    2020-06-30 (2.3.1)

    • Spelling and grammar fixes for comments. Pull request #191 by Josh Kline.
    • Enhance generic JSON and #generate docs. Pull request #347 by Victor Shepelev.
    • Add :nodoc: for GeneratorMethods. Pull request #349 by Victor Shepelev.
    • Baseline changes to help (JRuby) development. Pull request #371 by Karol Bucek.
    • Add metadata for rubygems.org. Pull request #379 by Alexandre ZANNI.
    • Remove invalid JSON.generate description from JSON module rdoc. Pull request #384 by Jeremy Evans.
    • Test with TruffleRuby in CI. Pull request #402 by Benoit Daloze.
    • Rdoc enhancements. Pull request #413 by Burdette Lamar.
    • Fixtures/ are not being tested... Pull request #416 by Marc-André Lafortune.
    • Use frozen string for hash key. Pull request #420 by Marc-André Lafortune.
    • Added :call-seq: to RDoc for some methods. Pull request #422 by Burdette Lamar.
    • Small typo fix. Pull request #423 by Marc-André Lafortune.

    2019-12-11 (2.3.0)

    • Fix default of create_additions to always be false for JSON(user_input) and JSON.parse(user_input, nil). Note that JSON.load remains with default true and is meant for internal serialization of trusted data. [CVE-2020-10663]
    • Fix passing args to all #to_json in json/add/*.
    • Fix encoding issues
    • Fix issues of keyword vs positional parameter
    • Fix JSON::Parser against bigdecimal updates
    • Bug fixes to JRuby port

    2019-02-21 (2.2.0)

    • Adds support for 2.6 BigDecimal and the Ruby standard library Set datatype.

    2017-04-18 (2.1.0)

    • Allow passing of decimal_class option to specify a class as which to parse JSON float numbers.

    2017-03-23 (2.0.4)

    • Raise exception for incomplete unicode surrogates/character escape sequences. This problem was reported by Daniel Gollahon (dgollahon).
    • Fix arbitrary heap exposure problem. This problem was reported by Ahmad Sherif (ahmadsherif).

    2017-01-12 (2.0.3)

    • Set required_ruby_version to 1.9
    • Some small fixes

    2016-07-26 (2.0.2)

    ... (truncated)

    Commits
    • 0951d77 Bump version to 2.3.1
    • ddc29e2 Merge pull request #429 from flori/remove-generate-task-for-gemspec
    • cee8020 Removed gemspec task from default task on Rakefile
    • 9fd6371 Use VERSION file instead of hard-coded value
    • dc90bcf Removed explicitly date field in gemspec, it will assign by rubygems.org
    • 4c11a40 Removed task for json_pure.gemspec
    • e794ec9 Merge pull request #426 from marcandre/indent
    • 7cc9301 Merge pull request #428 from marcandre/change_fix
    • 9e2a1fb Make changes more precise #424
    • f8fa987 Merge pull request #424 from marcandre/update_changes
    • Additional commits viewable in compare view

    dependencies ruby 
    opened by dependabot[bot] 0
  • Bump rack from 1.5.2 to 2.2.3 in /examples/kafka-jvm/chef/cookbooks/python

    Bump rack from 1.5.2 to 2.2.3 in /examples/kafka-jvm/chef/cookbooks/python

    Bumps rack from 1.5.2 to 2.2.3.

    Changelog

    Sourced from rack's changelog.

    [2.2.3] - 2020-06-15

    Security

    • [CVE-2020-8184] Do not allow percent-encoded cookie name to override existing cookie names. BREAKING CHANGE: Accessing cookie names that require URL encoding with decoded name no longer works. (@fletchto99)

    [2.2.2] - 2020-02-11

    Fixed

    • Fix incorrect Rack::Request#host value. (#1591, @ioquatix)
    • Revert Rack::Handler::Thin implementation. (#1583, @jeremyevans)
    • Double assignment is still needed to prevent an "unused variable" warning. (#1589, @kamipo)
    • Fix to handle same_site option for session pool. (#1587, @kamipo)

    [2.2.1] - 2020-02-09

    Fixed

    • Rework Rack::Request#ip to handle empty forwarded_for. (#1577, @ioquatix)

    [2.2.0] - 2020-02-08

    SPEC Changes

    • rack.session request environment entry must respond to to_hash and return unfrozen Hash. (@jeremyevans)
    • Request environment cannot be frozen. (@jeremyevans)
    • CGI values in the request environment with non-ASCII characters must use ASCII-8BIT encoding. (@jeremyevans)
    • Improve SPEC/lint relating to SERVER_NAME, SERVER_PORT and HTTP_HOST. (#1561, @ioquatix)

    Added

    • rackup supports multiple -r options and will require all arguments. (@jeremyevans)
    • Server supports an array of paths to require for the :require option. (@khotta)
    • Files supports multipart range requests. (@fatkodima)
    • Multipart::UploadedFile supports an IO-like object instead of using the filesystem, using :filename and :io options. (@jeremyevans)
    • Multipart::UploadedFile supports keyword arguments :path, :content_type, and :binary in addition to positional arguments. (@jeremyevans)
    • Static supports a :cascade option for calling the app if there is no matching file. (@jeremyevans)
    • Session::Abstract::SessionHash#dig. (@jeremyevans)
    • Response.[] and MockResponse.[] for creating instances using status, headers, and body. (@ioquatix)
    • Convenient cache and content type methods for Rack::Response. (#1555, @ioquatix)

    Changed

    • Request#params no longer rescues EOFError. (@jeremyevans)
    • Directory uses a streaming approach, significantly improving time to first byte for large directories. (@jeremyevans)
    • Directory no longer includes a Parent directory link in the root directory index. (@jeremyevans)
    • QueryParser#parse_nested_query uses original backtrace when reraising exception with new class. (@jeremyevans)
    • ConditionalGet follows RFC 7232 precedence if both If-None-Match and If-Modified-Since headers are provided. (@jeremyevans)
    • .ru files supports the frozen-string-literal magic comment. (@eregon)

    ... (truncated)

    Commits
    • 1741c58 bump version
    • 5ccca47 When parsing cookies, only decode the values
    • a5e80f0 Bump version.
    • b0de37d Remove trailing whitespace.
    • 1a784e5 Prepare CHANGELOG for next patch release.
    • a0d57d4 Fix to handle same_site option for session pool
    • a9b223b Ensure full match. Fixes #1590.
    • f4c5645 Double assignment is still needed to prevent an "unused variable" warning
    • 5c121dd Revert "Update Thin handler to better handle more options"
    • 961d976 Prepare point release.
    • Additional commits viewable in compare view

    dependencies ruby 
    opened by dependabot[bot] 0
  • Bump rake from 10.1.1 to 13.0.1 in /examples/kafka-jvm/chef/cookbooks/python

    Bump rake from 10.1.1 to 13.0.1 in /examples/kafka-jvm/chef/cookbooks/python

    Bumps rake from 10.1.1 to 13.0.1.

    Changelog

    Sourced from rake's changelog.

    === 13.0.1

    ==== Bug fixes

    • Fixed bug: Reenabled task raises previous exception on second invocation Pull Request #271 by thorsteneckel
    • Fix an incorrectly resolved arg pattern Pull Request #327 by mjbellantoni

    === 13.0.0

    ==== Enhancements

    • Follows recent changes on keyword arguments in ruby 2.7. Pull Request #326 by nobu
    • Make PackageTask be able to omit parent directory while packing files Pull Request #310 by tonytonyjan
    • Add order only dependency Pull Request #269 by take-cheeze

    ==== Compatibility changes

    • Drop old ruby versions(< 2.2)

    === 12.3.3

    ==== Bug fixes

    • Use the application's name in error message if a task is not found. Pull Request #303 by tmatilai

    ==== Enhancements:

    • Use File.open explicitly.

    === 12.3.2

    ==== Bug fixes

    • Fixed test fails caused by 2.6 warnings. Pull Request #297 by hsbt

    ==== Enhancements:

    • Rdoc improvements. Pull Request #293 by colby-swandale
    • Improve multitask performance. Pull Request #273 by jsm
    • Add alias prereqs. Pull Request #268 by take-cheeze

    ... (truncated)

    Commits
    • c8251e2 Bump version to 13.0.1
    • 8edd860 Fixed build failure of the latest GitHub Actions
    • b6e2a66 Merge pull request #271 from thorsteneckel/bugfix-reenable_invocation_exception
    • 985abff Merge pull request #327 from mjbellantoni/mjb-order-only-arg-fix
    • 4a90acb Merge pull request #329 from jeremyevans/skip-taint-test-on-2.7
    • 4dc6282 Skip a taint test on Ruby 2.7
    • a08b697 Merge pull request #328 from orien/gem-metadata
    • c3953d4 Add project metadata to the gemspec
    • 46a8f7c Update comments to reflect the current state
    • 00aacdc Fix an incorrectly resolved arg pattern
    • Additional commits viewable in compare view

    dependencies ruby 
    opened by dependabot[bot] 0
  • Wordcount example does not work in local mode with Storm 2.2.x.

    Wordcount example does not work in local mode with Storm 2.2.x.

    I just want to report this bug(?) for anybody trying to run the wordcount tutorial.

    With Storm 2.2.x, there is a slight difference in the storm shell command compared to 1.2.3.

    As a result, sparse run never runs in local mode; it always runs in deployment mode.

    I'm new to Storm, and I spent a day tracking this down.

    opened by JeongtaekLim 1
  • find_packages call in setup.py should exclude tests folder

    find_packages call in setup.py should exclude tests folder

    Hi, we're running into an issue in which both packages (streamparse and pystorm, which it depends on) are trying to install an __init__.py file to the same location (under the tests folder). I was told the culprit is https://github.com/Parsely/streamparse/blob/master/setup.py#L55 and that it's supposed to be using:

    find_packages(exclude=["tests"])
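
    A minimal sketch of the suggested change in setup.py (the extra "tests.*" pattern also excludes nested test packages):

    from setuptools import find_packages, setup

    setup(
        name="streamparse",
        packages=find_packages(exclude=["tests", "tests.*"]),
    )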

    opened by z11373 0
  • issue with logging sometimes not writing to the file

    issue with logging sometimes not writing to the file

    Hi, we are running streamparse v3.13.1 in production, and our spout code simply calls self.logger.info(), which I guess is inherited from the pystorm Component (maybe https://github.com/pystorm/pystorm/blob/master/pystorm/component.py#L247 ?).

    I just learned that for the past few days no log lines were being produced by that spout, but surprisingly today I see it writing to the log file again. Checking the Python process on the machine, I see the spout has been running since Jan 28th. This is very unfortunate: when we really need the logs (from that spout), they are not available, and I don't know why :-( One thing is for sure: the spout is still processing events; it's just that what it's doing is not being recorded to the log. Other spouts/bolts in the topology seem fine.

    Any idea what could cause these mysteriously missing logs? And how can I debug this? Of course, if I simply kill the spout process, Storm will automatically restart it, and the logging will be fine again. I really appreciate any help here. Thanks!

    opened by z11373 0
  • Adding new handler to the storm logger

    Adding new handler to the storm logger

    I have created a custom log handler and would like to push logs to it as well.

    I am calling the following function in every bolt's initialize:

    def set_from_conf(component, stormconf):
        global log
        log = component.log
        for key, value in stormconf.items():
            if key.startswith('project.'):
                const = key.split(".")[1].upper()
                if const not in globals():
                    raise ValueError("Unknown setting '{}' in config.json".format(key))
                globals()[const] = value
    

    Can I do something like -

    component.logger = get_custom_logger()
    

    Or just

    import logging
    logging.getLogger().addHandler(customHandler)
    
    opened by yugansh20 0
  • using PYTHONPATH or alternative in streamparse

    using PYTHONPATH or alternative in streamparse

    We have a Python module that is called from our streamparse topology. Right now we dpkg the module into the site-packages folder, but we want a way to override the location: for example, if an engineer wants to try out a change, they can just redirect streamparse to look for the module at a specific location (instead of in the site-packages folder). Will setting PYTHONPATH work for our case? I think it'll work for submitting the topology only, but then likely fail when Storm spawns the spout/bolt processes (unless the env var is propagated). If setting PYTHONPATH is not the right solution for our case, is there a recommended solution? Thanks!

    opened by z11373 1
Releases (v4.0.0)
  • v4.0.0(Oct 7, 2020)

    ⚠️ Backward Incompatible Changes ⚠️

    • streamparse now only supports Python 3.6+

    Features

    • Switched from depending on thriftpy to thriftpy2, because thriftpy is no longer maintained (PR #481)
  • v3.16.0(Apr 8, 2019)

    Features

    • Faster virtualenv creation when passing multiple requirements files to sparse submit (PR #459)
    • Add sudo_user option to config.json that can be used for specifying a different user that is used for virtualenv creation/deletion instead of always using the SSH login user. (PR #455)

    Fixes

    • Fix typo that would cause update_virtualenv to fail when no CLI options were passed (PR #459)

    ⚠️ In the upcoming streamparse 4.0 release, support for Python 2 will be dropped. ⚠️

  • v3.15.1(Jan 23, 2019)

    Fixes

    • Make update_virtualenv create virtualenvs as sudo user, and not just when trying to delete them (0250cfa)
    • Prevent pip 19.0 from being installed because it breaks --no-cache-dir installs (db26183)
    • Prevent virtualenv from downloading pip versions other than what we have pinned (0573bc8)
    • Make redis examples work with sparse run again (b61c85d)
    • Fix issues where repr(Topology) was always None (fd5b4b6)

    Misc.

    • Simplify Travis setup (#454)
  • v3.15.0(Dec 6, 2018)

    Features

    • Allow submitting topologies as inactive/deactivated (PR #448)
    • Stop pinning pip to 9.x when running update_virtualenv (PR #444)

    Fixes

    • Added missing options argument to bootstrap project (PR #436)
    • Import fixes in redis example (PR #451)
    • Fix issues with virtualenv-related option resolution, which made options only in config.json not get used. (PR #453)

    Misc.

    • Reformatted code with black
  • v3.14.0(Aug 15, 2018)

    Features

    • Allow install_venv and use_venv options to be overridden via CLI, topology config, etc. (PR #421)
    • Stop pinning pip to 9.x when running update_virtualenv (PR #444)

    Fixes

    • Pass config_file to get_config in submit.py (PR #419)
  • v3.13.1(Jan 16, 2018)

    This fixes an issue with the --overwrite_virtualenv option introduced in 3.13.0 when some of the files in the virtualenv are not removable due to insufficient permissions. You can now specify the user to use for removing them with --user. We also delete using sudo by default now. (PR #417)

  • v3.13.0(Jan 12, 2018)

    This tiny release just adds the --overwrite_virtualenv flag to sparse submit and sparse update_virtualenv for the cases where you want to recreate a virtualenv without having to manually delete it from all the worker nodes. (PR #416)

  • v3.12.0(Dec 22, 2017)

    This release mostly improves and fixes some CLI option handling.

    Features

    • Added --config option to all commands to pass custom config.json path. (PR #409, Issue #343)
    • Added --options option to sparse update_virtualenv, so it can properly handle customized worker lists. (PR #409)
    • Switched from using prettytable to texttable, which has better line wrapping support for wide tables. (PR #413)
    • Updated version of storm.thrift used internally to the one from Storm 1.1.1, which made the sparse list tables more informative. We also now verify that Storm believes the topology name you are submitting is a valid name. (PR #414)

    Fixes

    • Added proper option resolution for sparse update_virtualenv (PR #409)
    • Fixed a couple typos in our handling of using custom Java objects for groupings. (PR #405)
  • v3.11.0(Oct 12, 2017)

    This release simply makes it possible to override more settings that are in config.json at the Topology level. You can now add config = {'virtualenv_flags': '-p /path/to/python3'} to have some topologies in your project use one version of Python and others use another. (Issue #399, PR #402)

  • v3.10.0(Sep 13, 2017)

    This release just adds options to the pre_submit_hook and post_submit_hook arguments, mostly so you can use the Storm workers list inside hooks. (PR #396)

  • v3.9.0(Sep 1, 2017)

    This release simply adds a new feature where the storm.workers.list topology configuration option is now set when you submit a topology, so if some part of your topology needs to know the list of Storm workers, you do not need to resort to connecting to Nimbus with each executor to find it out. (PR #395)

  • v3.8.0(Aug 30, 2017)

    Another small release, but fixes issues with sparse tail and sparse remove_logs.

    Features

    • Can now specify the number of simultaneous SSH connections to allow when updating virtualenvs or accessing/removing logs from your workers with the --pool_size argument. (PR #393)
    • The user that is used to remove logs from the Storm workers with sparse remove_logs can now be specified with --user (PR #393)

    Fixes

    • sparse tail and sparse remove_logs do a much better job of only finding the logs that relate to your specified topology in Storm 1.x (PR #393)
    • sparse run will no longer crash if you have par set to a dict in your topology. (bafb72b)
  • v3.7.1(Aug 29, 2017)

  • v3.7.0(Aug 25, 2017)

    Small release, but a big convenience feature was added.

    Features

    • No longer need to specify workers in your config.json! They will be looked up dynamically by communicating with your Nimbus server. If for some reason you would like to restrict where streamparse creates virtualenvs to a subset of your workers, you can still specify the worker list in config.json and that will take precedence. (PR #389)
    • Added better error messages for when util.get_ui_jsons fails (commit b2a8219)
    • Can now pass config_file file-like objects to util.get_config for if you need to do something like retrieve the config at runtime from a wheel. Not very common, but it is now supported. (PR #390)
    • Removed the now unused decorators module.

    Fixes

    • Fixed a documentation issue where the FAQ still referenced the old ext module that was removed a long time ago (Issue #388)
  • v3.6.0(Jun 29, 2017)

    This release has a bunch of bugfixes, plus a few new features.

    Features

    • Added virtualenv_name as a setting in config.json for users who want to reuse the same virtualenv for multiple topologies. (Issue #371, PR #373)
    • Support ruamel.yaml>=0.15 (PR #379)
    • You can now inherit Storm Thrift types directly from streamparse.thrift instead of needing the extra from streamparse.thrift import storm_thrift (PR #380)
    • Added --timeout option to sparse run, sparse list, and sparse submit so that you can control how long to wait for Nimbus to respond before timing out. This is very useful on slow connections. (Issue #341, PR #381)

    Fixes

    • Fixed failing fabfile.py and tasks.py imports in get_user_tasks with Python 3. (Issue #376, PR #378)
    • Made Storm version parsing a little more lenient to fix some rare crashes (Issue #356)
    • Added documentation on how to pass string arguments to Java constructors (Issue #357)
    • Added documentation on how to create topologies with cycles in them (Issue #339)
    • Fixed issue where we were writing invalid YAML when a string contained a colon. (Issue #361)
    • We now pass on the serializer setting properly with sparse run (Issue #340, PR #382)
    • sparse no longer crashes on Windows (Issues #346 and pystorm/pystorm#40, PR pystorm/pystorm#45)
  • v3.5.0(May 19, 2017)

    This release brings some compatibility fixes for Storm 1.0.3+.

    Features

    • Propagate topology parallelism settings to Flux when using sparse run (Issue #364, PR #365)
    • Pass component configs on to Flux when using sparse run (PR #266)

    Fixes

    • Make sure output shows up when using sparse run (PR #363)
    • Handle new resources nesting added in Storm 1.0.3 (Issue #362, PR #366)
  • v3.4.0(Jan 26, 2017)

    This release fixes a few bugs and adds a few new features that require pystorm 3.1.0 or greater.

    Features

    • Added a ReliableSpout implementation that can be used to have spouts that will automatically replay failed tuples up to a specified number of times before giving up on them. (pystorm/pystorm#39)
    • Added Spout.activate and Spout.deactivate methods that will be called in Storm 1.1.0 and above when a spout is activated or deactivated. This is handy if you want to close database connections on deactivation and reconnect on activation. (Issue #351, PR pystorm/pystorm#42)
    • Can now override config.json Nimbus host and port with the STREAMPARSE_NIMBUS environment variable (PR #347)
    • Original topology name will now be sent to Storm as topology.original_name even when you're using sparse --override_name. (PR #354)

    Fixes

    • Fixed an issue where batching bolts would fail all batches they had received when they encountered an exception, even when exit_on_exception was False. Now they will only fail the current batch when exit_on_exception is False; if it is True, all batches are still failed. (PR pystorm/pystorm#43)
    • No longer call lein jar twice when creating jars. (PR #348)
    • We now use yaml.safe_load instead of yaml.load when parsing command line options. (commit 6e8c4d8)
  • v3.3.0(Nov 23, 2016)

    This release fixes a few bugs and adds the ability to pre-build JARs for submission to Storm/Nimbus.

    Features

    • Added --local_jar_path and --remote_jar_path options to submit to allow the re-use of pre-built JARs. This should make deploying topologies that are all within the same Python project much faster. (Issue #332)
    • Added help subcommand, since it's not immediately obvious to users that sparse -h submit and sparse submit -h will return different help messages. (Issue #334)
    • We now provide a universal wheel on PyPI (commit f600c98)
    • sparse kill can now kill any topology and not just those that have a definition in your topologies folder. (commit 66b3a70)

    Fixes

    • Fixed Python 3 compatibility issue in sparse stats (Issue #333)
    • Fixed an issue where name was being used instead of override_name when calling pre- and post-submit hooks. (10e8ce3)
    • sparse will no longer hang without any indication of why when you run it as root. (Issue #324)
    • RedisWordCount example topology works again (PR #336)
    • Fix an issue where updating virtualenvs could be slow because certain versions of fabric would choke on the pip output (commit 9b1978f)
  • v3.2.0(Nov 3, 2016)

    This release adds tools to simplify some common deployment scenarios where you need to deploy the same topology to different environments.

    Features

    • The par parameter for the Component.spec() method used to set options for components within your topology can now take dictionaries in addition to integers. The keys must be names of environments in your config.json, and the values are integers as before. This allows you to specify different parallelism hints for components depending on the environment they are deployed to, which is very helpful when one of your environments has more processing power than another; a sketch follows this list. (PR #326)
    • Added --requirements options to sparse update_virtualenv and sparse submit commands so that you can customize the requirements files that are used for your virtualenv, instead of relying on files in your virtualenv_specs directory. (PR #328)
    • pip is now automatically upgraded to 9.x on the worker nodes and is now run with the flags --upgrade --upgrade-strategy only-if-needed to ensure that requirements specified as ranges are upgraded to the same version on all machines, without needlessly upgrading all recursive dependencies. (PR #327)
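
    A sketch of the environment-keyed par option from the first item above (the environment names must match those in config.json; the components are illustrative):

    from streamparse import Grouping, Topology

    class WordCount(Topology):
        # 16 executors in "prod", 2 in "beta"; a plain integer still works too
        word_spout = WordSpout.spec(par={"prod": 16, "beta": 2})
        count_bolt = WordCountBolt.spec(
            inputs={word_spout: Grouping.fields("word")},
            par={"prod": 8, "beta": 1},
        )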

    Fixes

    • Fixed an issue where name was being used instead of override_name when calling pre- and post-submit hooks. (10e8ce3)
    • Docs have been updated to fix some RST rendering issues (issue #321)
    • Updated quickstart to clarify which version of Storm is required (PR #315)
    • Added information about flux-core dependency to help string for sparse run (PR #316)
  • v3.1.1(Sep 2, 2016)

    This bugfix release fixes an issue where not having graphviz installed in your virtualenv would cause every command to crash, not just sparse visualize (#311)

  • v3.1.0(Sep 1, 2016)

    Implemented enhancements:

    • Added sparse visualize command that will use graphviz to generate a visualization of your topology (PR #308)
    • Can now set ssh port in config.json (Issue #229, PR #309)
    • Use latest Storm for quickstart (PR #306)
    • Re-enable support for bolts without outputs in sparse run (PR #292)

    Fixed bugs:

    • sparse run error if more than one environment in config.json (Issue #304, PR #305)
    • Switch from invoke to fabric for kafka-jvm to fix TypeError (Issue #301, PR #310)
    • Rely on pystorm 3.0.3 to fix nested exception issue
    • Updated bootstrap filter so that generated project.clj will work fine with both sparse run and sparse submit
  • v3.0.1(Jul 29, 2016)

  • v3.0.0(Jul 27, 2016)

    This is the final release of streamparse 3.0.0. The developer preview versions of this release have been used extensively by many people for months, so we are quite confident in this release, but please let us know if you encounter any issues.

    You can install this release via pip with pip install streamparse==3.0.0.

    Highlights

    • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PR #199, #226)
    • When log.path is not set in your config.json, pystorm will no longer issue warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy as in Storm 1.0 there's support through the UI for searching logs.
    • The --ackers and --workers settings now default to the number of worker nodes in your Storm environment instead of 2.
    • Added sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
    • Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
    • Now rely on pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)

    :warning: API Breaking Changes :warning:

    • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PR #199, #226)
    • The deprecated Spout.emit_many method has been removed. (pystorm/[email protected])
    • As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything are expected to have the outputs attribute declared. It must either be a list of str or Stream objects, as described in the docs.
    • We temporarily removed the sparse run command, as we've removed all of our Clojure code, and this was the only thing that still had to be done in Clojure. (Watch issue #213 for future developments)
    • ssh_tunnel has moved from streamparse.contextmanagers to streamparse.util. The streamparse.contextmanagers module has been removed.
    • The ssh_tunnel context manager now returns the hostname and port that should be used for connecting to Nimbus (e.g., ('localhost', 1234) when use_ssh_for_nimbus is True or unspecified, and ('nimbus.foo.com', 6627) when use_ssh_for_nimbus is False).
    • need_task_ids defaults to False instead of True in all emit() method calls. If you were previously storing the task IDs that your tuples were emitted to (which is pretty rare), then you must pass need_task_ids=True in your emit() calls. This should provide a little speed boost to most users, because we do not need to wait on a return message from Storm for every emitted tuple.
    • Instead of having the log.level setting in your config.json influence the root logger's level, only your component (and its StormHandler if you haven't set log.path)'s levels will be set.
    • When log.path is not set in your config.json, pystorm will no longer issue warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy as in Storm 1.0 there's support through the UI for searching logs.
    • The --par option to sparse submit has been removed. Please use --ackers and --workers instead.
    • The --ackers and --workers settings now default to the number of worker nodes in your Storm environment instead of 2.

    Features

    • Added sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
    • Can now specify ssh_password in config.json if you don't have SSH keys set up. Storing your password in plaintext is not recommended, but it is nice to have for local VMs. (PR #224, thanks @motazreda)
    • Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
    • Now remove _resources directory after JAR has been created.
    • Added serializer setting to config.json that can be used to switch between JSON and msgpack pack serializers (PR #238). Note that you cannot use the msgpack serializer unless you also include a Java implementation in your topology's JAR such as the one provided by Pyleus, or the one being added to Storm in apache/storm#1136. (PR #238)
    • Added support for custom log filenames (PR #234 — thanks @kalmanolah)
    • Can now set environment-specific options, acker_count, and worker_count settings, to avoid constantly passing all those pesky options to sparse submit. (PR #265)
    • Added an install_virtualenv option to disable installation of virtualenvs while still allowing their use. (PR #264)
    • The Python Topology DSL now allows topology-level config options to be set via the config attribute of the Topology class; a sketch follows this list. (Issue #276, PRs #284 and #289)
    • Can now pass any valid YAML as a value for sparse submit --option (Issue #280, PR #285)
    • Added --override_name option to kill, submit, and update_virtualenv commands so that you can deploy the same topology file multiple times with different overridden names. (Issue #207, PR #286)
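
    A sketch of the topology-level config attribute mentioned above (the option shown is an ordinary Storm setting; the value is illustrative):

    from streamparse import Topology

    class WordCount(Topology):
        config = {"topology.message.timeout.secs": 60}
        # ...component specs as usual...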

    Fixes

    • sparse slot_usage, sparse stats, and sparse worker_uptime are much faster as we've fixed an issue where they were creating many SSH subprocesses.
    • All commands that must connect to the Nimbus server now properly use SSH tunnels again.
    • The output from running pip install is now displayed when submitting your topology, so you can see if things get stuck.
    • sparse submit should no longer sporadically raise exceptions about failing to create SSH tunnels (PR #242).
    • sparse submit will no longer crash when you provide a value for --ackers (PR #241).
    • pin pystorm version to >=2.0.1 (PR #230)
    • sparse tail now looks for pystorm-named filenames (@9339908)
    • Fixed typo that caused crash in sparse worker_uptime (@7085804)
    • Added back sparse run (PR #244)
    • sparse run should no longer crash when searching for the version number on some versions of Storm. (Issue #254, PR #255)
    • sparse run will no longer crash due to PyYAML dumping out !!python/unicode garbage into the YAML files. (Issue #256, PR #257)
    • A sparse run TypeError with Python 3 has been fixed. (@e232224)
    • sparse update_virtualenv will no longer ignore the virtualenv_flags setting in config.json. (Issue #281, PR #282)
    • sparse run now supports named streams on Storm 1.0.1+ (PR #260)
    • No longer remove non-topology-specific logs with sparse remove_logs (@45bd005)
    • sparse tail will now find logs in subdirectories for Storm 1.0+ compatibility (Issue #268, PR #271)

    Other Changes

    • Now rely on pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)
    • All Bolt, Spout, and Topology-related classes are all available directly at the streamparse package level (i.e., you can just do from streamparse import Bolt now) (Commit b9bf4ae).
    • sparse kill will now kill inactive topologies. (Issue #156)
    • All examples now use the Python DSL
    • The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.
    • Docs have been updated to reflect latest Leiningen installation instructions. (PR #261)
    • A broken link in our docs was fixed. (PR #273)
    • JARs are now uploaded before killing the running topology to reduce downtime during deployments (PR #277)
    • Switched from PyYAML to ruamel.yaml (@18fd2e9)
    • Added docs for handling multiple streams and groupings (Issue #252, @344ce8c)
    • Added VPC deployment docs (Issue #134, @d2bd1ac)
  • v3.0.0.dev3(Apr 20, 2016)

    This is the fourth developer preview release of streamparse 3.0. In addition to having been extensively tested in production, this version is also the first in the 3.0 line that has sparse run back in it. However, it only supports Storm 0.10.0+ and requires you to add [org.apache.storm/flux-core "0.10.0"] to the dependencies in your project.clj, because it uses Storm's new Flux library to start the local cluster.

    You can install this release via pip with pip install --pre streamparse==3.0.0.dev3. It will not automatically install because it's a pre-release.

    :warning: API Breaking Changes :warning:

    In addition to those outlined in the 3.0.0dev0 and 3.0.0dev1 release notes, this release introduces the following backwards incompatible changes from pinning our pystorm version to 3.0+:

    • need_task_ids defaults to False instead of True in all emit() method calls. If you were previously storing the task IDs that your tuples were emitted to (which is pretty rare), then you must pass need_task_ids=True in your emit() calls. This should provide a little speed boost to most users, because we do not need to wait on a return message from Storm for every emitted tuple.
    • Instead of having the log.level setting in your config.json influence the root logger's level, only your component (and its StormHandler if you haven't set log.path)'s levels will be set.
    • When log.path is not set in your config.json, pystorm will no longer issue warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy as in Storm 1.0 there's support through the UI for searching logs.

    Features

    • Added back sparse run (PR #244)
  • v3.0.0.dev2(Apr 13, 2016)

    This is the third developer preview release of streamparse 3.0. Unlike the previous two, this one has been tested extensively in production, so users should feel more confident using it. It's still missing sparse run, which we will try to fix before the final release.

    You can install this release via pip with pip install --pre streamparse==3.0.0.dev2. It will not automatically install because it's a pre-release.

    :warning: API Breaking Changes :warning:

    These are outlined in the 3.0.0dev0 and 3.0.0dev1 release notes.

    Features

    • Added serializer setting to config.json that can be used to switch between JSON and msgpack pack serializers (PR #238). Note that you cannot use the msgpack serializer unless you also include a Java implementation in your topology's JAR such as the one provided by Pyleus, or the one being added to Storm in apache/storm#1136. (PR #238)
    • Added support for custom log filenames (PR #234 — thanks @kalmanolah)

    Fixes

    • sparse submit should no longer sporadically raise exceptions about failing to create SSH tunnels (PR #242).
    • sparse submit will no longer crash when you provide a value for --ackers (PR #241).
    • Pinned pystorm version to >=2.0.1 (PR #230)
    • sparse tail now looks for pystorm-named log files (@9339908)
    • Fixed a typo that caused a crash in sparse worker_uptime (@7085804)
  • v3.0.0.dev1 (Mar 17, 2016)

    This is the second developer preview release of streamparse 3.0. It has not been tested extensively in production yet, so we are looking for as much feedback as we can get from users who are willing to test it out.

    You can install this release via pip with pip install --pre streamparse==3.0.0.dev1; pip will not install it by default because it is a pre-release.

    :warning: API Breaking Changes :warning:

    In addition to those outlined in the 3.0.0dev0 release notes, we've made a few more changes.

    • ssh_tunnel has moved from streamparse.contextmanagers to streamparse.util. The streamparse.contextmanagers module has been removed; see the usage sketch after this list.
    • The ssh_tunnel context manager now returns the hostname and port that should be used for connecting to Nimbus (e.g., ('localhost', 1234) when use_ssh_for_nimbus is True or unspecified, and ('nimbus.foo.com', 6627) when use_ssh_for_nimbus is False).
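
    A usage sketch of the relocated context manager. The argument shown here is an assumption for illustration (in practice it would come from your parsed config), so treat the exact signature as illustrative rather than definitive:

    ```python
    from streamparse.util import ssh_tunnel

    # Assumed shape of an environment config entry, for illustration only.
    env_config = {"nimbus": "nimbus.foo.com", "use_ssh_for_nimbus": True}

    with ssh_tunnel(env_config) as (host, port):
        # With use_ssh_for_nimbus True this yields something like
        # ('localhost', 1234); with False, ('nimbus.foo.com', 6627).
        print("Connect to Nimbus at {}:{}".format(host, port))
    ```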

    Fixes

    • sparse slot_usage, sparse stats, and sparse worker_uptime are much faster, as we've fixed an issue where they were creating many SSH subprocesses.
    • All commands that must connect to the Nimbus server now properly use SSH tunnels again.
    • The output from running pip install is now displayed when submitting your topology, so you can see if things get stuck.
  • v3.0.0.dev0 (Mar 11, 2016)

    This is the first developer preview release of streamparse 3.0. It has not been tested extensively in production yet, so we are looking for as much feedback as we can get from users who are willing to test it out.

    You can install this release via pip with pip install --pre streamparse==3.0.0.dev0; pip will not install it by default because it is a pre-release.

    :warning: API Breaking Changes :warning:

    • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can (and must) now write your topologies in Python! Components can still be written in any language supported by Storm, of course. See the sketch after this list. (Issues #84 and #136, PRs #199 and #226)
    • The deprecated Spout.emit_many method has been removed. (pystorm/[email protected])
    • As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything must declare an outputs attribute, which must be either a list of str or of Stream objects, as described in the docs.
    • We have temporarily removed the sparse run command, as we've removed all of our Clojure code, and this was the only piece that still had to be done in Clojure. (Watch issue #213 for future developments.)
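
    As a minimal sketch of the new DSL, assuming a word-count-style topology (the component names and fields are illustrative; the outputs attributes and spec() calls are the parts this release requires):

    ```python
    from streamparse import Bolt, Grouping, Spout, Topology

    class WordSpout(Spout):
        outputs = ["word"]  # components that emit must declare outputs

        def next_tuple(self):
            self.emit(["hello"])  # toy data source

    class WordCountBolt(Bolt):
        outputs = ["word", "count"]

        def process(self, tup):
            self.emit([tup.values[0], 1])

    class WordCount(Topology):
        word_spout = WordSpout.spec()
        count_bolt = WordCountBolt.spec(
            inputs={word_spout: Grouping.fields("word")},
            par=2,
        )
    ```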

    Features

    • Added a sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
    • You can now specify ssh_password in config.json if you don't have SSH keys set up; see the sketch after this list. Storing your password in plaintext is not recommended, but it is handy for local VMs. (PR #224, thanks @motazreda)
    • Now fully Python 3 compatible (tested on versions up to 3.5), because we now rely on fabric3 instead of plain old fabric. (4acfa2f)
    • The _resources directory is now removed after the JAR has been created.
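
    A sketch of where ssh_password might live in config.json; the per-environment placement and all surrounding keys and values are assumptions for illustration:

    ```json
    {
        "envs": {
            "vagrant": {
                "user": "vagrant",
                "ssh_password": "vagrant",
                "nimbus": "192.168.50.4",
                "workers": ["192.168.50.4"]
            }
        }
    }
    ```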

    Other Changes

    • We now rely on the pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage, with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, commits aaeb3e9 and 1347ded)
    • All Bolt-, Spout-, and Topology-related classes are available directly at the streamparse package level (i.e., you can now just do from streamparse import Bolt). (Commit b9bf4ae)
    • sparse kill will now kill inactive topologies. (Issue #156)
    • All examples now use the Python DSL.
    • The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.
  • v2.1.4 (Jan 11, 2016)

    This minor release adds support for specifying ui.port in config.json, which makes the sparse stats and sparse worker_uptime commands work when the Storm UI is not running on the default port 8080.
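
    A minimal config.json sketch for this setting; placing ui.port inside an environment entry is an assumption here, and the surrounding keys and values are placeholders:

    ```json
    {
        "envs": {
            "prod": {
                "user": "storm",
                "nimbus": "nimbus.example.com",
                "workers": ["worker1.example.com"],
                "ui.port": 8772
            }
        }
    }
    ```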

  • v2.1.3 (Oct 20, 2015)

  • v2.1.2 (Oct 13, 2015)

Related projects

• Faust — Python stream processing. Version 1.10.4. Web: http://faust.readthedocs.io/ | Download: http://pypi.org/project/faust | Source: http://github.com/robinhood (Robinhood, 5.6k stars, Jun 13, 2021)
• Luigi — a Python (3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualization, etc. Comes with Hadoop support built in. (Spotify, 14.6k stars, Jun 13, 2021)
• Horovod — a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. (Horovod, 11.3k stars, Jun 13, 2021)
• PaddlePaddle — PArallel Distributed Deep LEarning: a machine learning framework from industrial practice (a core framework for high-performance single-machine and distributed deep learning/machine learning training and cross-platform deployment). (15.7k stars, Jun 13, 2021)
• mrjob — the Python MapReduce library, a Python 2.7/3.4+ package for writing and running Hadoop Streaming jobs on Hadoop or Amazon Web Services. Stable version: v0.7.4. (Yelp.com, 2.5k stars, Jun 8, 2021)
• NyaaV2 — BitTorrent software for cats; development requires Python 3.7. (2.8k stars, Jun 11, 2021)
• Veles — a distributed platform for rapid deep learning application development. https://github.com/Samsung/veles (Samsung, 893 stars, May 24, 2021)
• Ray — an open source framework that provides a simple, universal API for building distributed applications, packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. (16.2k stars, Jun 13, 2021)
• Jubatus — a framework and library for distributed online machine learning. See http://jubat.us/ for details. (Jubatus, 701 stars, May 17, 2021)
• DMTK — Microsoft's Distributed Machine Learning Toolkit. https://www.dmtk.io (Microsoft, 2.8k stars, May 30, 2021)
• ZeroNet — decentralized websites using Bitcoin crypto and the BitTorrent network. https://zeronet.io (ZeroNet, 16.8k stars, Jun 13, 2021)
• Tribler — a privacy-enhanced BitTorrent client with P2P content discovery, using a dedicated Tor-like network for anonymous torrent downloading. (3.6k stars, Jun 12, 2021)
• Deluge — a BitTorrent client with a daemon/client model and GTK, Web, and Console UIs, built on libtorrent at its core. (Deluge team, 1k stars, Jun 15, 2021)