Jug: A Task-Based Parallelization Framework

Overview

Jug allows you to write code that is broken up into tasks and run different tasks on different processors.

It uses the filesystem to communicate between processes and works correctly over NFS, so you can coordinate processes on different machines.

Jug is a pure Python implementation and should work on any platform.
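The filesystem-based coordination can be pictured with a toy sketch (this is an illustration, not Jug's actual code): each task gets a key derived from its name and arguments, and a worker claims a task by atomically creating a lock file, so two workers never run the same task.

```python
import hashlib
import os
import pickle
import tempfile

def task_key(name, args):
    # toy stand-in for Jug's task hashing: derive a key from the name and arguments
    return hashlib.sha1(pickle.dumps((name, args))).hexdigest()

def try_lock(jugdir, key):
    # O_CREAT | O_EXCL makes lock creation atomic: only one worker wins
    try:
        fd = os.open(os.path.join(jugdir, key + '.lock'),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

with tempfile.TemporaryDirectory() as jugdir:
    key = task_key('primes.is_prime', (7,))
    assert try_lock(jugdir, key)       # first worker claims the task
    assert not try_lock(jugdir, key)   # a second worker skips it
```

The real implementation stores results and locks under a jugdir and handles many more corner cases; the sketch only shows why a shared filesystem is enough for coordination.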

Python versions 3.5 and above are supported.

Website: http://luispedro.org/software/jug

Documentation: https://jug.readthedocs.org/

Video: On vimeo or showmedo

Mailing List: http://groups.google.com/group/jug-users

Testimonials

"I've been using jug with great success to distribute the running of a reasonably large set of parameter combinations" - Andreas Longva

Install

You can install Jug with pip:

pip install jug

Alternatively, if you are using conda, you can install Jug from conda-forge using the following commands:

conda config --add channels conda-forge
conda install jug

Citation

If you use Jug to generate results for a scientific publication, please cite

Coelho, L.P., (2017). Jug: Software for Parallel Reproducible Computation in Python. Journal of Open Research Software. 5(1), p.30.

http://doi.org/10.5334/jors.161

Short Example

Here is a one minute example. Save the following to a file called primes.py (if you have installed jug, you can obtain a slightly longer version of this example by running jug demo on the command line):

from jug import TaskGenerator
from time import sleep

@TaskGenerator
def is_prime(n):
    sleep(1.)
    for j in range(2,n-1):
        if (n % j) == 0:
            return False
    return True

primes100 = [is_prime(n) for n in range(2,101)]

This is a brute-force way to find all the prime numbers up to 100. Of course, this is only for didactic purposes; normally you would use a better method. Similarly, the sleep call is there only so that the tasks do not run too fast. Still, it illustrates the basic functionality of Jug for embarrassingly parallel problems.

Type jug status primes.py to get:

Task name                  Waiting       Ready    Finished     Running
----------------------------------------------------------------------
primes.is_prime                  0          99           0           0
......................................................................
Total:                           0          99           0           0

This tells you that you have 99 tasks called primes.is_prime ready to run. So run jug execute primes.py &. You can even run multiple instances in the background (if you have multiple cores, for example). After starting 4 instances and waiting a few seconds, you can check the status again (with jug status primes.py):

Task name                  Waiting       Ready    Finished     Running
----------------------------------------------------------------------
primes.is_prime                  0          63          32           4
......................................................................
Total:                           0          63          32           4

Now you have 32 tasks finished, 4 running, and 63 still ready. Eventually, they will all finish and you can inspect the results with jug shell primes.py. This will give you an IPython shell. The primes100 variable is available, but it is an ugly list of jug.Task objects. To get the actual value, you call the value function:

In [1]: primes100 = value(primes100)

In [2]: primes100[:10]
Out[2]: [True, True, False, True, False, True, False, False, False, True]
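For reference, the same brute-force check can be reproduced without Jug (dropping the decorator and the sleep), which matches the output above:

```python
def is_prime(n):
    # same brute-force test as in primes.py, without the @TaskGenerator decorator
    for j in range(2, n - 1):
        if n % j == 0:
            return False
    return True

primes100 = [is_prime(n) for n in range(2, 101)]
print(primes100[:10])  # [True, True, False, True, False, True, False, False, False, True]
```

The point of the Jug version is that each is_prime(n) becomes an independent task, so the 99 calls can run on different processes or machines.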

What's New

Version 2.1.1 (18 March 2021)

  • Include requirements files in distribution

Version 2.1.0 (18 March 2021)

  • Improvements to webstatus (by Robert Denham)
  • Removed Python 2.7 support
  • Fix output encoding for Python 3.8
  • Fix bug mixing mapreduce() & status --cache
  • Make block_access (used in mapreduce()) much faster (20x)
  • Fix important redis bug
  • More precise output in cleanup command

Version 2.0.2 (Thu Jun 11 2020)

  • Fix command line argument parsing

Version 2.0.1 (Thu Jun 11 2020)

  • Fix handling of JUG_EXIT_IF_FILE_EXISTS environmental variable
  • Fix passing an argument to jug.main() function
  • Extend --pdb to exceptions raised while importing the jugfile (issue #79)

Version 2.0.0 (Fri Feb 21 2020)

  • jug.backend.base_store has 1 new method 'listlocks'
  • jug.backend.base_lock has 2 new methods 'fail' and 'is_failed'
  • Add 'jug execute --keep-failed' to preserve locks on failing tasks.
  • Add 'jug cleanup --failed-only' to remove locks from failed tasks
  • 'jug status' and 'jug graph' now display failed tasks
  • Check environmental exit variables by default (suggested by Renato Alves, issue #66)
  • Fix 'jug sleep-until' in the presence of barrier() (issue #71)

For older versions, see the ChangeLog file or the full history.

Comments
  • Jug doesn't like jobs when they're ended by SIGTERM

    There seems to be a weird interaction when you run jug cleanup while jobs are still running, though I could be wrong. In addition, when a job is ended by SIGTERM, its tasks seem to enter the completed stage without ever finishing.

    opened by danpf 12
  • Implement a wrapper to run jug with gridmap

    A light-weight wrapper to run jug with gridmap on a Grid Engine cluster

    Get the best of both worlds of gridmap and jug

    • http://pythonhosted.org/Jug/
    • http://gridmap.readthedocs.org/en/latest/

    This implements jug.grid_jug to run jug jobs with all the comfortable features of gridmap.

    As I am not sure whether this belongs in the gridmap or the jug package, I have created a pull request for both to foster discussion between the two projects. The file jug/grid.py is identical to the file gridmap/jug.py.

    opened by andsor 9
  • Processes only start working after checking status

    When running a jug process with subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, creationflags=CREATE_NEW_CONSOLE), the subprocess only starts working after the status is checked. The same thing happens when running subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, shell=True).

    This problem doesn't appear when using subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT).

    Would you mind helping?

    opened by chenxi-shi 7
  • Failed to assign tasks correctly

    Hi, I wrote a small piece of Python code on a MacBook which has 10 + 1 tasks: 10 similar sleeping tasks and 1 joint task. I am running it with 4 processes and found that sometimes Jug cannot assign tasks correctly. It happens that one process runs all the tasks alone while, at the same time, the other 3 share the 11 tasks.

    opened by chenxi-shi 6
  • Raises error if jugdir is not defined

    If jugdir is not provided on command-line to "jug execute", and there is no jug config file, the program errors out, since this is prior to getting to the call to init() which handles the missing jugdir.

    Just define a default:

    parser.add_option('--jugdir',
                    action='store',
                    dest='jugdir',
                    default='jugdir', # define default value
                    help='Where to save intermediate results')
    
    opened by rjplevin 6
  • IterateTask as a decorator

    I have been using a decorator to provide the functionality of iteratetask instead of the function.

    The reason I like this is that it specifies how to handle the output (i.e. how many items in the tuple) closer to the place the output is specified (i.e. at the end of the function) rather than where the workflow is specified (which is usually a different file).

    So it looks like:

    @IterateTask(n=3)
    @jug.TaskGenerator
    def do_thing(args):
        # ... compute a, b, c from args ...
        return a, b, c
    

    I've been using the following for a few weeks without incident:

    import jug
    from functools import wraps
    
    def IterateTask(n):
        def decorator(f):
            @wraps(f)
            def wrapper(*args, **kwargs):
                return jug.iteratetask(f(*args, **kwargs), n=n)
            return wrapper
        return decorator
    

    Is this something you'd be interested in adding? Is there any reason, which I haven't foreseen, why this isn't a good idea?

    opened by justinrporter 5
  • The right way to use Jug with class object

    Hi, I defined a class which has an __init__ method. Some methods in this class are decorated with @TaskGenerator, but these methods cannot access the class instance correctly. I tried 2 ways to solve it and neither works:

    1. decorating __init__ with @TaskGenerator, but then __init__() returns a jug Task instead of None
    2. putting a barrier() after defining the class object, but then I got TypeError: cannot serialize '_io.TextIOWrapper' object

    Would you mind telling me the correct way to do it?

    opened by chenxi-shi 5
  • Backwards compatibility fixes

    sys.argv now contains the jugfile and unparsed arguments (i.e. those after --). The subcommand API tests were also leaking, causing other parts of the test suite to fail.

    opened by unode 5
  • `RuntimeError: maximum recursion depth exceeded while calling a Python object`

    Came across this error today:

    CRITICAL:root:Exception while running tangent_lstm.test_lstm: maximum recursion depth exceeded while calling a Python object
    Traceback (most recent call last):
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
        main()
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
        execute(options)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
        execution_loop(tasks, options)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
        t.run(debug_mode=options.debug)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
        self.store.dump(self._result, name)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
        encode_to(object, output)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
        write(object, stream)
    RuntimeError: maximum recursion depth exceeded while calling a Python object
    

    I'm getting this error when creating a Task that wraps a function which returns a Keras model.

    opened by samuela 5
  • A few Fixes to the redis backend and the hash function, probably worth to merge, probably not.

    • The url decomposition regexp for the redis back-end apparently had a problem.
    • The hash-from-arguments approach didn't work for my particular problem: a type can be pickle-able without the pickle being unique, which makes tasks non-uniquely identifiable. So, this makes it possible for the user to control the hash.
    opened by alcidesv 5
  • Allow cleanup of only locks

    I recently had an issue with a few of my jug processes being killed and therefore locks being kept on some active tasks. Normally, I would just delete the lock files in the jugdir, but I was using the Redis backend, so this was not possible. As suggested in the docs, I ran jug cleanup, expecting only the locks to be cleared; I was surprised to find the completed jobs cleared as well.

    Is there any way to have jug cleanup only clear the locks? If not, can there be an option for this?

    opened by freemansw1 4
  • Invalidate --target is too greedy

    -------------------------------------------------------------------------------------
               0           0          1           0           0  jugfile.function
               0           0          1           0           0  jugfile.function_other
    .....................................................................................
               0           0          2           2           0  Total
    
    

    Running jug invalidate --target jugfile.function invalidates everything.

    opened by unode 4
Releases (latest: v2.2.2)
  • v2.2.2(Jul 18, 2022)

  • v2.2.0(May 2, 2022)

  • v2.1.1(Mar 21, 2021)

    The most important changes are the fixes for Python 3.8 and redis.

    Compared to 2.1.0, this patch release includes some missing files in the package.

    Full list of changes:

    • Removed Python 2.7 support
    • Fix output encoding for Python 3.8
    • Fix bug mixing mapreduce() & status --cache
    • Make block_access (used in mapreduce()) much faster (20x)
    • Fix important redis bug
    • More precise output in cleanup command

  • v2.0.2(Jun 12, 2020)

    Bugfix release.

    Compared to 2.0.0:

    • Fix handling of JUG_EXIT_IF_FILE_EXISTS environmental variable
    • Fix passing an argument to jug.main() function
    • Extend --pdb to exceptions raised while importing the jugfile (issue #79)
  • v2.0.1(Jun 12, 2020)

  • v2.0.0(Jun 12, 2020)

    The big change is support for failed tasks (contributed by @unode).

    Also, environmental variables are now always checked and creating a file called __jug_please_stop.txt will stop a jug execute run in a clean way.
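The stop-file mechanism can be pictured with a toy sketch (this is an illustration, not Jug's actual code; which directory is checked is an assumption here): a worker polls for the sentinel file between tasks and exits cleanly once it appears.

```python
import os
import tempfile

STOP_FILE = "__jug_please_stop.txt"

def should_stop(workdir):
    # a worker would call this between tasks and exit cleanly if it returns True
    return os.path.exists(os.path.join(workdir, STOP_FILE))

with tempfile.TemporaryDirectory() as workdir:
    assert not should_stop(workdir)   # no sentinel yet: keep working
    open(os.path.join(workdir, STOP_FILE), "w").close()
    assert should_stop(workdir)       # sentinel present: stop after this task
```

The JUG_EXIT_IF_FILE_EXISTS environmental variable controls the file name that Jug actually checks.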

    Full ChangeLog

    • jug.backend.base_store has 1 new method listlocks
    • jug.backend.base_lock has 2 new methods fail and is_failed
    • Add 'jug execute --keep-failed' to preserve locks on failing tasks.
    • Add 'jug cleanup --failed-only' to remove locks from failed tasks
    • 'jug status' and 'jug graph' now display failed tasks
    • Check environmental exit variables by default (suggested by Renato Alves, issue #66)
    • Fix 'jug sleep-until' in the presence of barrier() (issue #71)
  • v2.0.0rc0(Jan 31, 2020)

  • v1.6.9(Jan 31, 2020)

  • v1.6.3(Nov 1, 2017)

    Add a polite request for citations to the Jug paper:

    Coelho, L.P., (2017). Jug: Software for Parallel Reproducible Computation in
    Python. Journal of Open Research Software. 5(1), p.30.
    
    http://doi.org/10.5334/jors.161
    
  • v1.6.2(Nov 1, 2017)

  • v1.6.1(Nov 1, 2017)

  • v1.6.0(Aug 24, 2017)

    Release 1.6.0

    Adds the jug graph subcommand to generate a little graph with tasks and their state.

    • Add jug graph subcommand
    • Generates a graph of tasks
    • jug execute --keep-going now ends with non-zero exit code in case of failures
    • Fix bug with cleanup in dict_store not providing the number of removed records
    • Add jug cleanup --keep-locks to remove obsolete results without affecting locks

    Almost all of the credit for changes since 1.5.0 goes to Renato Alves (@unode on github).

  • v1.5.0(Jul 16, 2017)

    Several new functions (is_jug_running, invalidate in shell, ...) and a new internal structure which allows for extensions (work by Renato Alves, aka @Unode).

    Full ChangeLog:

    • Add demo subcommand
    • Add is_jug_running() function
    • Fix bug in finding config files
    • Improved --debug mode: check for unsupported recursive task creation
    • Add invalidate() to shell environment
    • Use ~/.config/jug/jugrc as configuration file
    • Add experimental support for extensible commands, use ~/.config/jug/jug_user_commands.py
    • jugrc: execute_wait_cycle_time_secs is now execute_wait_cycle_time
    • Expose sync_move in jug.utils
  • v1.3.0(Nov 1, 2016)

    Major improvements are IPython 5 compatibility, jug_execute, and timing.

    Full ChangeLog:

    • Update shell subcommand to IPython 5
    • Use ~/.config/jugrc as configuration file
    • Cleanup usage string
    • Use bottle instead of web.py for webstatus subcommand
    • Add jug_execute function
    • Add timing functionality
  • release-1.2.1(Jun 8, 2016)

    A bugfix release, especially fixing jug on Windows (previously unusable as Windows does not support fsync on directories).

    Full ChangeLog:

    * Changed execution loop to ensure that all tasks are checked (issue #33
    on github)
    * Fixed bug that made 'check' or 'sleep-until' slower than necessary
    * Fixed jug on Windows (which does not support fsync on directories)
    * Made Tasklets use slightly less memory
    
  • release-1.2.0(Jun 8, 2016)

    Several optimizations in code and minor fixes made a new release worthwhile. Because the hashes have been changed, this needs to be a new (minor) version (see issue #25): upgrading will require you to rerun any tasks that use kwargs.
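The kwargs fix can be illustrated with a sketch of an order-independent argument hash (this is an illustration, not Jug's actual hashing code): sorting the keyword items before pickling makes the digest independent of dictionary insertion order.

```python
import hashlib
import pickle

def hash_kwargs(kwargs):
    # sorting the items first makes the digest independent of insertion order
    payload = pickle.dumps(sorted(kwargs.items()))
    return hashlib.sha1(payload).hexdigest()

# the same keyword arguments always map to the same task identity
assert hash_kwargs({'a': 1, 'b': 2}) == hash_kwargs({'b': 2, 'a': 1})
assert hash_kwargs({'a': 1}) != hash_kwargs({'a': 2})
```

A hash change like this is exactly why the release required rerunning tasks that use kwargs: previously stored results are keyed under the old hashes.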

    Full ChangeLog

    • Use HIGHEST_PROTOCOL when pickle()ing
    • Add compress_numpy option to file_store
    • Add register_hook_once function
    • Optimize case when most (or all) tasks are already run
    • Add --short option to jug status and jug execute
    • Fix bug with dictionary order in kwargs (fix by Andreas Sorge)
    • Fix ipython colors (fix by Andreas Sorge)
    • Sort tasks in jug status
Owner
Luis Pedro Coelho