Data intensive science for everyone.

Overview

The latest information about Galaxy can be found on the Galaxy Community Hub.

Community support is available at Galaxy Help.

Galaxy Quickstart

Galaxy requires Python 3.6. To check your Python version, run:

$ python -V
Python 3.6.10

Start Galaxy:

$ sh run.sh

Once Galaxy completes startup, you should be able to view Galaxy in your browser at: http://localhost:8080

For more installation details please see: https://getgalaxy.org/

Documentation is available at: https://docs.galaxyproject.org/

Tutorials on how to use Galaxy, perform scientific analyses with it, develop Galaxy and its tools, and administer a Galaxy server are at: https://training.galaxyproject.org/

Tools

Tools can be either installed from the Tool Shed or added manually. For details please see the tutorial. Note that not all dependencies for the tools provided in the tool_conf.xml.sample are included. To install them please visit "Manage dependencies" in the admin interface.

Issues and Galaxy Development

Please see CONTRIBUTING.md.

Comments
  • History component refactor

    This is a complete rebuild of the right hand history panel. Since it's such a big code change, I built it as a beta feature that is accessed by first logging in, then going to the giant Gear-Menu-Of-Doom™ and picking "Beta History Panel" at the bottom of the list.

    Definitely still a WIP. There's a lot to look at here, and I'm still debugging some of it. Unit tests are coming in the next commit.

    Goals:

    • Cache and manage API requests to reduce server/client traffic.
    • Modernize the UI code.
    • Make a testable UI.
    • Pave the way for more extensive history feature additions in the future (history graph, real-time updates).
    • Implement a reactive data model.

    Notable changes

    Database Timestamp Triggers

    One way to reduce server traffic is to compare dates and only ask for a smaller set of results. To that end, I have implemented triggers to keep the history table's update_time updated as changes occur to its contents. There are still more refinements to be done here.

    Instead of creating some kind of complex ORM association object in the server-side code, I thought it might be more performant to simply implement a couple of timestamp triggers on the dataset, HDA, and HDCA tables that update history when they change. I think this is one of the few uses for database triggers that really makes sense.
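
The trigger idea can be sketched with SQLite from Python's standard library. This is only an illustration under stated assumptions: the PR targets Galaxy's real schema and database, whereas the two-table schema here is simplified and the trigger name `hda_touch_history` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE history (
    id INTEGER PRIMARY KEY,
    update_time TEXT NOT NULL
);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY,
    history_id INTEGER NOT NULL REFERENCES history(id),
    info TEXT
);
-- Hypothetical trigger: any HDA update bumps the parent history's timestamp.
CREATE TRIGGER hda_touch_history
AFTER UPDATE ON history_dataset_association
BEGIN
    UPDATE history
    SET update_time = strftime('%Y-%m-%d %H:%M:%f', 'now')
    WHERE id = NEW.history_id OR id = OLD.history_id;
END;
"""


def demo():
    """Update one HDA and return the history timestamp before and after."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO history (id, update_time) VALUES (1, '1970-01-01 00:00:00.000')"
    )
    conn.execute(
        "INSERT INTO history_dataset_association (id, history_id, info) VALUES (10, 1, '')"
    )
    query = "SELECT update_time FROM history WHERE id = 1"
    before = conn.execute(query).fetchone()[0]
    # The application only touches the HDA row; the trigger touches history.
    conn.execute("UPDATE history_dataset_association SET info = 'changed' WHERE id = 10")
    after = conn.execute(query).fetchone()[0]
    conn.close()
    return before, after


before, after = demo()
print(before, "->", after)
```

The application code never writes to `history` directly, which is the appeal of the approach: every writer of the child table keeps the parent timestamp current without knowing about it.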

    RxJS

    RxJS is the primary mechanism behind the polling and manual content retrieval processes. Although RxJS code can be a little harder to read than something as simple as a promise, observables are vastly more powerful than promise-based query implementations and pave the way for more advanced client-side data options such as real-time WebSocket or push-server updates, collaborative editing, etc.

    The most involved parts are currently in the model/ContentLoader objects, and I will be providing extensive documentation about how these objects work. These observables control when we request data from the API, either through human interaction with the UI or through automatic server updates, currently implemented through polling. Since incoming changes are represented as observables, it will be trivial to transition to any other update mechanism, such as a WebSocket.
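
The PR's client code is RxJS, so the following is only a Python sketch of the date-comparison idea behind the polling cycle: each poll asks for items updated after the newest timestamp already seen. The function name `changed_since` and the `update_time` dictionary key are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def changed_since(items, last_seen):
    """Return the items updated after last_seen and the new high-water mark."""
    fresh = [item for item in items if item["update_time"] > last_seen]
    newest = max((item["update_time"] for item in fresh), default=last_seen)
    return fresh, newest


start = datetime(2020, 1, 1)
contents = [
    {"id": 1, "update_time": start},
    {"id": 2, "update_time": start + timedelta(minutes=5)},
]

# First poll: only item 2 changed after the starting mark.
fresh, mark = changed_since(contents, start)
# Second poll with the advanced mark: nothing new to transfer.
fresh_again, mark_again = changed_since(contents, mark)
```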

    RxDB / IndexedDB

    HDA/HDCA/Dataset and Collection data are stored in IndexedDB using the NPM package RxDB, which allows our components to generate and subscribe to live queries that point at the locally cached contents. These live query observables emit updates whenever the cache changes.

    Since IndexedDB is local and shared across browser tabs, UI data updates are instant across tabs. You can no longer have stale data in one tab that doesn't jibe with another tab, nor is it ever necessary to "refresh" your history.

    Forthcoming:

    • Unit tests
    • Selenium tests
    • Documentation for the ContentLoader, polling cycle, caching mechanisms
    • Possibly implement shared layout components with tool panel redesign PR

    kind/enhancement area/UI-UX kind/feature area/API area/histories area/client 
    opened by Nerdinacan 59
  • Database deadlock

    The following exception occurs while running a very large workflow:

    galaxy.metadata DEBUG 2020-09-18 19:35:46,971 setting metadata externally failed for HistoryDatasetAssociation 54263: Metadata results could not be read from '/data/jobs_directory/024/24438/metadata/metadata_results_outfile'
    galaxy.jobs.runners ERROR 2020-09-18 19:35:48,127 (24438/galaxy-islandcompare-24438) Job wrapper finish method failed
    Traceback (most recent call last):
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1247, in _execute_context
        self.dialect.do_execute(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
        cursor.execute(statement, parameters)
    psycopg2.errors.DeadlockDetected: deadlock detected
    DETAIL:  Process 25865 waits for ShareLock on transaction 646521; blocked by process 4054.
    Process 4054 waits for ShareLock on transaction 646527; blocked by process 25743.
    Process 25743 waits for ExclusiveLock on tuple (11,36) of relation 17171 of database 16396; blocked by process 25865.
    HINT:  See server log for query details.
    CONTEXT:  while updating tuple (11,36) in relation "history"
    SQL statement "UPDATE history
                        SET update_time = (now() at time zone 'utc')
                        WHERE id = NEW.history_id OR id = OLD.history_id"
    PL/pgSQL function update_history_update_time() line 9 at SQL statement
    
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/srv/galaxy/lib/galaxy/jobs/runners/__init__.py", line 540, in _finish_or_resubmit_job
        job_wrapper.finish(tool_stdout, tool_stderr, exit_code, check_output_detected_state=check_output_detected_state, job_stdout=job_stdout, job_stderr=job_stderr)
      File "/srv/galaxy/lib/galaxy/jobs/__init__.py", line 1713, in finish
        self.sa_session.flush()
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/scoping.py", line 162, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2496, in flush
        self._flush(objects)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2637, in _flush
        transaction.rollback(_capture_exception=True)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
        compat.raise_(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
        raise exception
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2597, in _flush
        flush_context.execute()
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
        rec.execute(self)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 586, in execute
        persistence.save_obj(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 230, in save_obj
        _emit_update_statements(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 994, in _emit_update_statements
        c = cached_connections[connection].execute(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 984, in execute
        return meth(self, multiparams, params)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1097, in _execute_clauseelement
        ret = self._execute_context(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1287, in _execute_context
        self._handle_dbapi_exception(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1481, in _handle_dbapi_exception
        util.raise_(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
        raise exception
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1247, in _execute_context
        self.dialect.do_execute(
      File "/srv/galaxy/venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
    DETAIL:  Process 25865 waits for ShareLock on transaction 646521; blocked by process 4054.
    Process 4054 waits for ShareLock on transaction 646527; blocked by process 25743.
    Process 25743 waits for ExclusiveLock on tuple (11,36) of relation 17171 of database 16396; blocked by process 25865.
    HINT:  See server log for query details.
    CONTEXT:  while updating tuple (11,36) in relation "history"
    SQL statement "UPDATE history
                        SET update_time = (now() at time zone 'utc')
                        WHERE id = NEW.history_id OR id = OLD.history_id"
    PL/pgSQL function update_history_update_time() line 9 at SQL statement
    
    [SQL: UPDATE history_dataset_association SET update_time=%(update_time)s, info=%(info)s, blurb=%(blurb)s, peek=%(_peek)s, tool_version=%(tool_version)s, metadata=%(_metadata)s, version=%(version)s WHERE history_dataset_association.id = %(history_dataset_association_id)s]
    [parameters: {'update_time': datetime.datetime(2020, 9, 18, 19, 35, 47, 120934), 'info': '', 'blurb': '2,897 lines, 98 comments', '_peek': '#\n#\n# Core-specific parameters:\n#\n#    Menu parameters:\n', 'tool_version': 'GNU Awk 5.0.1, API: 2.0\n', '_metadata': <psycopg2.extensions.Binary object at 0x7f51075e1ed0>, 'version': 2, 'history_dataset_association_id': 54263}]
    (Background on this error at: http://sqlalche.me/e/e3q8)
    galaxy.jobs ERROR 2020-09-18 19:35:48,180 fail(): Missing output file in working directory: [Errno 2] No such file or directory: '/data/jobs_directory/024/24438/outputs/galaxy_dataset_02b9521f-28a9-4cb2-9d9c-40f3a8088fd3.dat'
    

    The dataset reports "Unable to finish job" as the error.

    Galaxy 20.05 (autoscaled uwsgi app, 3 job handlers, 3 workflow schedulers)

    uwsgi:
          buffer-size: 16384
          processes: 1
          threads: 4
          offload-threads: 2
    

    I am not sure how to prevent this.

    kind/bug 
    opened by innovate-invent 55
  • Replace eggs with wheels

    Smash Eggs

    More importantly, replace all of our custom egg installation code with a very slightly modified version of pip. The modification is necessary if/until wheel is updated to include support for Linux wheels.

    Other changes include

    • Removal of the awful lib/pkg_resources.py
    • Relocated the bit of eggs code that checked the Galaxy config to determine what dependencies are needed to a wheels module.
    • galaxyproject/docker-build has been updated with scripts and images to support building wheels for distribution on 32 and 64 bit Linux.

    On virtualenvs

    scripts/common_startup.sh will create a virtualenv at $galaxy_root/.venv if one is not already active and wheels will be installed to here. Otherwise, Galaxy will use whatever virtualenv is currently active (in $VIRTUAL_ENV).

    Whatever virtualenv Galaxy is started under will be passed to jobs if a virtualenv is not already active in the job environment (and it exists at the same path on the compute resource). You can force jobs to use a different virtualenv using the <env> tag in a job destination, e.g.:

    <destination id="foo">
        <env file="/path/to/venv/bin/activate"/>
    </destination>
    

    Galaxy needs to become installable in the traditional python setup.py install manner, which will change some things about how virtualenvs are handled in the future (tl;dr, we don't need to "handle" them at all unless you are running directly from source), but that is the next project. This was all recently done for Pulsar so Galaxy can steal from there. =)

    TODO

    • [x] Replace distribute in bx-python with setuptools.
    • [x] Figure out how to build OS X wheels for distribution.
    • [x] Remove all eggs imports and eggs require()s
    • [x] ~~Get rid of the SVGFig dependency~~
    • [x] Do not attempt to install all dependencies every time (determine what's missing and attempt to install those).
    • [x] Fail to start if a virtualenv does not exist and cannot be created.
    • [x] Figure out how to support modified packages (Whoosh, sqlalchemy-migrate). pip install git+https:// works, but then it's a little trickier to correctly answer "Is sqlalchemy-migrate installed, and is it the right version?"
    • [x] Build and host wheels for everything that does not already have them (and possibly even the stuff already in PyPI). We should provide wheels for anything that does not provide them, possibly even for pure-python modules since it'll still be faster to install a wheel even though there is no C compilation involved.
    • [x] Figure out how to build psycopg2 ~~and MySQL_Python~~ (anyone using MySQL can just pip install MySQL-python, which common_startup.sh will in fact attempt to do)
    • [x] The pysam wheel is broken: ImportError: /home/nate/galaxy/galaxy/.venv/local/lib/python2.7/site-packages/pysam/calignmentfile.so: undefined symbol: hts_idx_load_literal
    • [x] The bx wheel is broken (possibly due to being built w/o numpy installed?): ImportError: No module named bigbed_file
    • [x] Test drmaa and pbs runners
    • [x] Create an sdist for everything on wheels.galaxyproject.org
    • [x] Test and document creating (or sharing) a venv w/ necessary wheels when running on the cluster
    • [x] Build remaining OS X wheels
    • [x] When using run.sh, ignore whatever virtualenv is active and use .venv. If you want to use something else, either start without run.sh or use run.sh --skip-venv. In the future, we won't mess with venvs at all since Galaxy should install in the standard Pythonic way.
    • [x] Merge updated SOABI code from pypa/pip#3075 into natefoo/pip linux-wheels branch
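
The run.sh rule from the checklist above (use .venv by default, leave the environment alone with --skip-venv) can be sketched as a small function. This is a hedged illustration, not the script's actual code; the function name `select_virtualenv` and its parameters are hypothetical.

```python
import os


def select_virtualenv(galaxy_root, environ, skip_venv=False):
    """Return the virtualenv path to use under the rule above, or None."""
    if skip_venv:
        # --skip-venv: leave whatever is active (possibly nothing) alone.
        return environ.get("VIRTUAL_ENV")
    # Default: ignore any active virtualenv and use $galaxy_root/.venv.
    return os.path.join(galaxy_root, ".venv")


active = {"VIRTUAL_ENV": "/tmp/tox-venv"}
default_choice = select_virtualenv("/srv/galaxy", active)
skipped_choice = select_virtualenv("/srv/galaxy", active, skip_venv=True)
```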

    Questions

    • ~~Should we pin exact versions of dependencies? I have a strong preference to say no, and as a concession say we should pin to "less than the next major version" e.g. SQLAlchemy<1.1. Although this does present one small problem - we will have to ensure that we always have wheels built for every version published on PyPI (we can certainly automate this) or else pip will bypass the wheels and install from source, unless we use pip install --only-binary. However, if we use --only-binary, then we cannot trivially fall back to compiling from source on platforms that we do not have wheels for.~~ I was won over by the argument that we should pin exact versions ("exact set of versioned dependencies for an exact version of Galaxy"), but that we will:
      • Fetch new dependency versions as they are released
      • Build wheels for them
      • Test Galaxy against these new wheel versions
      • If the tests pass, automatically issue a PR against Galaxy to bump the version
    • ~~Should the existence of $galaxy_root/.venv take precedence over $VIRTUAL_ENV (since I keep installing all of Galaxy's wheels in my tox venv =P)?~~ Use --skip-venv if you don't want .venv to take precedence.
    kind/enhancement 
    opened by natefoo 51
  • After upgrade to release_16.01: jobs are no longer run

    Hi,

    we've upgraded our Galaxy instances to release 16.01 and since then found that none of our tools work any longer. One test case is to upload a text file with the "upload1" tool. In the logs we see that a job is created, we also see it in the database. The job remains in the "new" state without changes.

    Upon starting Galaxy we see that the main Galaxy Queue Worker is initialized to run on our postgresql database, which contains the job records, and we see that 4 LocalRunner workers are started.

    The uploaded files are in fact created in new_file_path as upload_file_data_0Sf_99 (and have the expected contents), but in the job_working_directory they only appear as zero-sized files.

    Could you please give us a hint as to what's going on here?

    area/admin 
    opened by rekado 48
  • Run Galaxy fully under uWSGI, including job handlers

    Rationale

    Previously, if running Galaxy under uWSGI, it was necessary to configure and start separate Paste or application-only (scripts/galaxy-main) Galaxy servers for job handling in order to avoid race conditions with job handler selection, and to separate web workers from job handlers for performance/scalability. This also meant needing some sort of process management (e.g. supervisor) to manage all of these individual processes.

    uWSGI Mules are processes forked from the uWSGI master process after the application is loaded. Mules can also exec() specified arbitrary code, and come with some very nice features:

    1. They can receive messages from uWSGI worker processes.
    2. They can be grouped together into "farms" such that messages sent to a farm are received only by mules in that farm.
    3. They are controlled by the uWSGI master process and can be stopped and started all from a single command line.

    Usage

    This PR introduces the ability to run Galaxy job handlers as mules. In the simplest form, you can:

    $ GALAXY_UWSGI=1 sh run.sh
    

    This will run with a command line like:

    $ uwsgi  --virtualenv /home/nate/work/galaxy/.venv --ini-paste config/galaxy.ini --processes 1 --threads 4 --http localhost:8080 --pythonpath lib --master --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --paste-logger --die-on-term --enable-threads --py-call-osafterfork
    

    You can override these defaults (other than the booleans like --master and --enable-threads) with a [uwsgi] section in galaxy.ini, or just configure in galaxy.ini and run uwsgi directly.

    By default, with no job_conf.xml, jobs will be run in uWSGI web worker processes, as they were with Paste. This is to keep things simple at first. To run jobs in mules, you only need to start them and add them to the correct farm, which must be named job-handlers. Be aware that there are some caveats (below) if you have a job_conf.xml. Mules are added in any of the following ways (command line, ini config file, yaml config file):

    $ GALAXY_UWSGI=1 sh run.sh --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
    
    [uwsgi]
    mule = lib/galaxy/main.py
    mule = lib/galaxy/main.py
    farm = job-handlers:1,2
    
    uwsgi:
        mule: lib/galaxy/main.py
        mule: lib/galaxy/main.py
        farm: job-handlers:1,2
    

    For more handlers, simply add additional mule options and add their IDs to the farm option.

    Design

    I have tried to make this as stack-agnostic and purpose-agnostic as possible. There is a bit of ugliness around how mules are designated as job handlers (they have to be in a farm named job-handlers), but the goal is to make it easy for anyone going forward to send tasks to mules for asynchronous execution. You'll see a few references to "pools," which is a sort of stack-agnostic abstraction of uWSGI Farms.

    For most other functions you might want to push out to mules, it should be as simple as:

    1. Add a new message class as in galaxy.web.stack.message
    2. Create a message handler and register it with app.application_stack.register_message_handler
    3. Send messages to mules with app.application_stack.send_message

    For jobs, messages are only being used for handler selection. We create Jobs in the web workers at tool execution time just as we did before, but they are committed to the database with a null in the handler field, where before they always had to have a handler set at creation time. Mule messages only include the target message handler function, the task to perform (setup), and the ID of the job. A mule will receive the message, write its server_name to the handler field, and then pick the job up as handlers did before, without any further jobs code modification.
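
The handler-selection flow described above can be sketched with a toy, in-process stack. Everything here is hypothetical: `ToyStack`, the message dictionary layout, and `make_job_handler` are stand-ins invented for illustration and do not reproduce the real signatures in galaxy.web.stack. A real stack would deliver the message to one mule in a farm instead of calling the handler directly.

```python
class ToyStack:
    """Hypothetical stand-in for an application stack; not Galaxy's real class."""

    def __init__(self):
        self._handlers = {}

    def register_message_handler(self, name, func):
        self._handlers[name] = func

    def send_message(self, message):
        # A real stack would hand this to a mule; here we dispatch in-process.
        self._handlers[message["target"]](message)


def make_job_handler(server_name, jobs):
    """Build a handler that claims unassigned jobs for server_name."""

    def handle(message):
        if message["task"] == "setup":
            job = jobs[message["job_id"]]
            if job["handler"] is None:
                # Claim the job by writing our server_name to the handler field.
                job["handler"] = server_name

    return handle


# The web worker creates the job with a null handler, then sends a message.
jobs = {42: {"state": "new", "handler": None}}
stack = ToyStack()
stack.register_message_handler(
    "job_handler", make_job_handler("main.job-handlers.1", jobs)
)
stack.send_message({"target": "job_handler", "task": "setup", "job_id": 42})
print(jobs[42]["handler"])
```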

    Server names

    Under uWSGI, server names are manipulated using the template {server_name}.{pool_name}.{instance_id} where:

    • {server_name} is the original server_name, configurable in the app config (or with Paste/webless, on the command line with --server-name), by default this is main
    • {pool_name} is the worker pool or farm name, for web workers (the processes forked based on the processes uWSGI option) this is web, for mules this is the farm name, e.g. job-handlers
    • {instance_id} is the 1-based index of the server in its pool, for web workers this is its uWSGI-assigned worker id (an integer starting at 1), for mules this is its 1-indexed position in the farm argument

    So in the following galaxy.yml:

    uwsgi:
        processes: 4
        mule: lib/galaxy/main.py
        mule: lib/galaxy/main.py
        mule: lib/galaxy/main.py
        farm: job-handlers:2,3
        farm: something-else:1
    galaxy:
        server_name: galaxy
    

    uWSGI starts 4 web workers, 2 job handlers, and another mule with server_names:

    galaxy.web.1
    galaxy.web.2
    galaxy.web.3
    galaxy.web.4
    galaxy.job-handlers.1
    galaxy.job-handlers.2
    galaxy.something-else.1
    

    This information is important when you want to statically or dynamically map handlers rather than use the default.
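
The naming rule can be checked with a short sketch that applies the {server_name}.{pool_name}.{instance_id} template to the example configuration above. The function `server_names` and its argument layout are assumptions made for illustration, not Galaxy code.

```python
def server_names(server_name, processes, farms):
    """Build names from the {server_name}.{pool_name}.{instance_id} template.

    farms maps a farm name to the list of mule IDs in it, in farm-argument order.
    """
    names = [f"{server_name}.web.{worker_id}" for worker_id in range(1, processes + 1)]
    for pool_name, mule_ids in farms.items():
        # instance_id is the 1-indexed position in the farm, not the mule ID itself.
        for position, _mule_id in enumerate(mule_ids, start=1):
            names.append(f"{server_name}.{pool_name}.{position}")
    return names


names = server_names("galaxy", 4, {"job-handlers": [2, 3], "something-else": [1]})
for name in names:
    print(name)
```

Note that mules 2 and 3 become job-handlers.1 and job-handlers.2, because the index is the position within the farm.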

    Caveats

    In order to attempt to support existing job_conf.xml files that have a default <handlers> block, jobs are mapped to handlers in the following manner:

    • If you do not have a job_conf.xml, or have a job_conf.xml with no <handlers> block:
      • If started without a configured job-handlers farm or a non-uWSGI server: web workers are job handlers
      • If started with a job-handlers farm: mules are job handlers
    • If you have a <handlers> block and do not have a default= set in <handlers>:
      • Works the same as if you have no job_conf.xml, except explicitly specified static/dynamic handler mappings will result in the specified handler being assigned
    • If you have a <handlers> block and do have a default= set in <handlers>:
      • The default handler or explicit static/dynamic handler specified is assigned

    As before, if a specified handler is assigned and the specified handler is a tag, a handler with that tag is chosen at random. If a handler is assigned due to an explicit static/dynamic mapping, mule messages are not used, the specified handler ID is simply set on the job record in the database.
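
The mapping rules above can be condensed into a decision function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and choosing a random handler for a tag is not modeled (a tag is returned as-is).

```python
def assign_handler(handlers_block, default, explicit, farm_configured, under_uwsgi):
    """Return who handles a job under the rules listed above.

    handlers_block: True if job_conf.xml exists and has a <handlers> block.
    default: value of default= on <handlers>, or None.
    explicit: handler ID or tag from a static/dynamic mapping, or None.
    """
    if handlers_block and explicit is not None:
        # Explicit mappings always win; no mule message is sent for these.
        return explicit
    if handlers_block and default is not None:
        return default
    if under_uwsgi and farm_configured:
        return "mules"
    return "web workers"


print(assign_handler(False, None, None, farm_configured=True, under_uwsgi=True))
```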

    One way to mix automatic web/mule handling with mapped handling is to define multiple <handler>s but not a default, since by default, jobs will be sent to the web/mule handlers, and only tools specifically mapped to handlers will be sent to the named handlers. It is possible to map tools to mule handlers in job_conf.xml as well, using server_names main.job-handlers.1, main.job-handlers.2, ...

    This is complicated and perhaps we should do things less magically, but as usual for Galaxy, I am trying to take the approach of least intervention by admins.

    There is some functionality included for templating the server name for greater control - e.g. if you run Galaxy on multiple servers, the server_name (which is persisted in the database and used for job recovery) would need to include some identifier unique for each host. However, configuration for this is not exposed. In the short term, people in that situation (are there any other than me?) can always continue running handlers externally.

    Zerg mode is untested and you would have the potential to encounter race conditions during restarts, especially with respect to job recovery.

    Configurability

    I went through multiple iterations on how to make things configurable. For example:

    ---
    stack:
      workers:
        - name: default-job-handlers
          purpose: handle_jobs    # this essentially controls the role of the mule, what portions of the application it loads, etc.
          processes: 4
          server_name: "{server_name}.{pool_name}.{process_num}"
        - name: special-job-handlers
          purpose: handle_jobs
          processes: 2
          server_name: "{server_name}.{pool_name}.{process_num}"
        - name: spam-spam-spam
          # default purpose, just run the galaxy application
        - name: eggs
          type: standalone    # "webless" galaxy process started externally
    

    This would translate into a command line like:

    $ uwsgi ... --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
        --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
        --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
        --mule=lib/galaxy/main.py \
        --farm=default-job-handlers:1,2,3,4 \
        --farm=special-job-handlers:5,6 \
        --farm=spam-spam-spam:7
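
The translation from that proposed config to mule and farm arguments can be sketched as follows. Since the configurability was ultimately stripped from the PR, this is purely illustrative: the function name is hypothetical, and the default of one process for a pool without a processes key is inferred from the example, not from any code.

```python
def uwsgi_mule_args(workers, mule_script="lib/galaxy/main.py"):
    """Translate a list of worker pool dicts into --mule/--farm arguments."""
    mule_args, farm_args = [], []
    next_id = 1
    for pool in workers:
        if pool.get("type") == "standalone":
            continue  # started externally, so uWSGI gets no mule for it
        count = pool.get("processes", 1)  # assumed default of one process
        ids = range(next_id, next_id + count)
        mule_args.extend(f"--mule={mule_script}" for _ in ids)
        id_list = ",".join(str(mule_id) for mule_id in ids)
        farm_args.append(f"--farm={pool['name']}:{id_list}")
        next_id += count
    return mule_args + farm_args


workers = [
    {"name": "default-job-handlers", "purpose": "handle_jobs", "processes": 4},
    {"name": "special-job-handlers", "purpose": "handle_jobs", "processes": 2},
    {"name": "spam-spam-spam"},
    {"name": "eggs", "type": "standalone"},
]
args = uwsgi_mule_args(workers)
print(" ".join(args))
```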
    

    Prior to #3179, I'd made a separate YAML config for the containers interface. These configs use defaults set as class attributes on the container classes, and those defaults are merged recursively down the class inheritance chain.

    I wanted to do the same for the stack config, but with #3179, we can start merging YAML configs into the main Galaxy config. Ultimately (after some discussion on the Gitter channel) I've stripped the configurability out until we settle on whether or not and (if yes) how to support the hierarchical configs/defaults in a way compatible with the model @jmchilton created in that excellent PR.

    Invocations

    You can start under uWSGI using a variety of methods:

    ini-paste:

    $ uwsgi --ini-paste config/galaxy.ini ...
    

    ini

    $ uwsgi --ini config/galaxy.ini --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
    

    yaml

    $ uwsgi --yaml config/galaxy.yml --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
    

    separate app config

    Galaxy config file (ini or yaml) separate from the uWSGI config file (also ini or yaml):

    $ uwsgi --<ini|yaml> config/galaxy.<ini|yml> --set galaxy_config_file=config/galaxy.<ini|yml> ...
    

    no config file

    (For example):

    $ uwsgi --virtualenv /home/nate/work/galaxy/.venv --http localhost:8192 --die-on-term --enable-threads --py-call-osafterfork --master --processes 2 --threads 4 --pythonpath lib --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' --set galaxy_config_file=config/galaxy.yml --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
    

    Logging

    By default, everything logs to one stream, and you can't tell which messages come from which process. This isn't bad with one mule, but with more it's unmanageable. You can fix this with the following logging config, which uses custom filters added in this PR to log the uWSGI worker and mule IDs in each log message:

    [loggers]
    keys = root, galaxy
    
    [handlers]
    keys = console
    
    [formatters]
    keys = generic
    
    [logger_root]
    level = INFO
    handlers = console
    
    [logger_galaxy]
    level = DEBUG
    handlers = console
    qualname = galaxy
    propagate = 0
    
    [handler_console]
    class = StreamHandler
    args = (sys.stderr,)
    level = DEBUG
    formatter = generic
    
    [formatter_generic]
    format = %(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s
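
A filter that supplies the worker_id and mule_id fields used by that format string might look roughly like the sketch below. The class name `StackInfoFilter` is hypothetical and this is not the PR's implementation; it assumes the `uwsgi` module, which is only importable inside a uWSGI process, and falls back to None elsewhere.

```python
import io
import logging

try:
    import uwsgi  # only importable inside a uWSGI process
except ImportError:
    uwsgi = None


class StackInfoFilter(logging.Filter):
    """Hypothetical filter that attaches worker_id and mule_id to each record."""

    def filter(self, record):
        record.worker_id = uwsgi.worker_id() if uwsgi else None
        record.mule_id = uwsgi.mule_id() if uwsgi else None
        return True  # never drop the record, only annotate it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter(
        "%(name)s %(levelname)-5.5s [w:%(worker_id)s,m:%(mule_id)s] %(message)s"
    )
)
handler.addFilter(StackInfoFilter())

log = logging.getLogger("galaxy.demo")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False
log.debug("hello")
line = stream.getvalue()
```

Attaching the filter to the handler rather than to a single logger means every record that reaches the handler gets the extra attributes, so the format string never fails on a missing key.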
    

    Or, even better, if you switch to a YAML Galaxy application config, you can use a logging.config.dictConfig dict with a special filename_template parameter on any logging.handlers.*FileHandler handlers that will be templated with galaxy.util.facts, e.g.:

    ---
    galaxy:
        debug: yes
        use_interactive: yes
        logging:
            version: 1
            root:
                # root logger
                level: INFO
                handlers:
                    - console
                    - files
            loggers:
                galaxy:
                    level: DEBUG
                    handlers:
                        - console
                        - files
                    qualname: galaxy
                    propagate: 0
            handlers:
                console:
                    class: logging.StreamHandler
                    level: DEBUG
                    formatter: generic
                    stream: ext://sys.stderr
                files:
                    class: logging.FileHandler
                    level: DEBUG
                    formatter: generic
                    filename: galaxy_default.log
                    filename_template: galaxy_{pool_name}_{server_id}.log
            formatters:
                generic:
                    format: "%(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s"
    

    This will result in 3 log files, galaxy_web_0.log (actually contains log messages for all workers), galaxy_job-handlers_1.log and galaxy_job-handlers_2.log.
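
The per-process file naming can be illustrated with plain string formatting. This is a sketch, not the PR's code: the function name `render_log_filename` is hypothetical, and the facts dictionary stands in for whatever galaxy.util.facts provides.

```python
def render_log_filename(handler_config, facts):
    """Fill filename_template from facts, falling back to the plain filename."""
    template = handler_config.get("filename_template")
    if template is None:
        return handler_config["filename"]
    return template.format(**facts)


files_handler = {
    "class": "logging.FileHandler",
    "filename": "galaxy_default.log",
    "filename_template": "galaxy_{pool_name}_{server_id}.log",
}
mule_log = render_log_filename(files_handler, {"pool_name": "job-handlers", "server_id": 1})
plain_log = render_log_filename({"filename": "galaxy_default.log"}, {})
```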

    TODO

    For this PR:

    • [x] Support handler assignment for workflow invocations
    • [x] Figure out separate file logging for the workers and each mule
    • [x] Make sure get_uwsgi_args.py works with a variety of [uwsgi] (or lack of) settings in the config
    • [x] Finish refactoring and cleaning
    • [x] Fix linting, tests, etc., whitelist stricter code guidelines
    • [x] Document logging configuration improvements
    • [x] ~~Squash WIP commits~~ I think I want to preserve these
    • [x] Tests would be great ~~¯\_(ツ)/¯~~ thanks @jmchilton! (ﾉ^^)ﾉ
    • [x] Default to running jobs in worker(s) and don't start mules if there is no galaxy config file and no job_conf.xml and no mules configured.
    • [x] Fix thread join failure on shutdown
    • [x] Implement correct handler assignment (either web workers or mules) when a default job_conf.xml is in place
    • [x] Correct server_name templating for mules, ideally: {server_name}.{pool_name}.{pool_index} where pool_name is the farm name and pool_index is the mule's index in the farm, or at least {server_name}.{pool_name}.{server_id} where server_id is the mule_id
    • [x] Correct the information in this PR description
    • [x] ~~Also, it shouldn't block merging, but~~ I need to improve the way that Galaxy determines whether stack messaging should be used for handler selection, and whether a mule is a handler.

    Future TODO

    For a post-PR issue:

    • Include a default config for proxying GIEs (work started in #2385)
    • If possible, mules should probably give up the message lock after some timeout. This may only be possible with signals since uwsgi.get_farm_msg() does not have a timeout param.
    • Default to uWSGI
    • Document recommended job config changes
    • Add configurability as described above
    • Add configurability for workflow scheduling handlers
    • Support Zerg mode
    • ~~get_uwsgi_args.py won't play nice with mule and farm settings in the uWSGI config file ([uwsgi] section in galaxy.ini or wherever it is in your case)~~
    • Run handlers without exec(): unbit/uwsgi#1608
    • Run multiple handler pools for mapping jobs in different ways
    • To my knowledge, no other WSGI application stacks support fork/execing asynchronous workers, let alone messaging them. However, I would like to incorporate non-uWSGI worker messaging into the stack code so we could at least send messages to webless Galaxy processes. My initial thought on doing this is to add a stack transport (see galaxy.web.stack.transport) that interfaces with the AMQP support already in galaxy.queues. But alternatively, maybe the stack messaging stuff should be decoupled from the stack entirely and merged directly in to galaxy.queues.
    kind/feature area/jobs area/performance area/framework 
    opened by natefoo 45
  • Add deprecation notice for Python 2.7 support to release notes

    Add deprecation notice for Python 2.7 support to release notes

    A lot of Galaxy dependencies are in the process of dropping support for Python 2, see e.g. https://python3statement.org/

    Even pip now (since release 19.0) displays the following warning:

    DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.

    If we keep supporting Python 2.7, we will have to pin dependencies to old and potentially broken/insecure versions.

    I suggest we add a deprecation notice to the release notes (for 19.01 if possible), specifying which Galaxy release is going to be the last supporting Python 2.7 .

    We should also start moving our test servers (e.g. https://test.galaxyproject.org/ ) to Python 3, to iron the last bugs. This may require the update of some Ansible playbooks and roles.

    xref. #1715

    area/documentation 
    opened by nsoranzo 44
  • Galaxy 'uWSGI process got Segmentation Fault' on OS X.

    Galaxy 'uWSGI process got Segmentation Fault' on OS X.

    The Galaxy uWSGI process occasionally gets a segmentation fault on Mac OS X; this has happened on multiple Mac laptops. The problem can go away after a restart of OS X or after logging in again. It can also disappear after many retries of starting Galaxy.

    kind/bug 
    opened by qiagu 43
  • bug with hidden_data?

    bug with hidden_data?

    https://github.com/galaxyproject/tools-devteam/issues/341 In release 16_01, devteam's cufflinks 2.2.1.0 fails, and I guess this is caused by a Galaxy bug with hidden_data. I actually can't find how hidden_data is set.

    In cufflinks_wrapper.xml, the following code

    ...
    ## Include reference annotation?
    #if $reference_annotation.use_ref == "Use reference annotation":
        -G $reference_annotation.reference_annotation_file
        $reference_annotation.compatible_hits_norm
    #end if
    #if $reference_annotation.use_ref == "Use reference annotation guide":
        -g $reference_annotation.reference_annotation_guide_file
        --3-overhang-tolerance=$reference_annotation.three_overhang_tolerance
        --intron-overhang-tolerance=$reference_annotation.intron_overhang_tolerance
        $reference_annotation.no_faux_reads
    #end if
    ...
    ## Include global model if available.
    #if $global_model:
        --global_model=$global_model
    #end if
    ...
    <param name="global_model" type="hidden_data" label="Global model (for use in Trackster)" optional="True"/>

    generated the following in tool_script.sh where $global_model is set to the value of $reference_annotation.reference_annotation_guide_file.

    python /home/galaxy/dev/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/64698e16f4a6/cufflinks/cufflinks_wrapper.py --input=/home/galaxy/dev/database/files/000/dataset_634.dat --assembled-isoforms-output=/home/galaxy/dev/database/files/000/dataset_654.dat --num-threads="${GALAXY_SLOTS:-4}" -I 300000 -F 0.1 -j 0.15 -g /home/galaxy/dev/database/files/000/dataset_26.dat --3-overhang-tolerance=600 --intron-overhang-tolerance=50 -b --ref_file=/data/hg19/hg19dip.fa -u --global_model=/home/galaxy/dev/database/files/000/dataset_26.dat

    kind/bug area/tool-framework 
    opened by methylome 42
  • Toolform support for selecting datasets from within collections

    Toolform support for selecting datasets from within collections

    My users are finding dataset collections to be not so user-friendly. They generate collections (e.g. sequencing data), then do the map step (assembly), and then are stuck not being able to access the data within collections. They want to do manual analysis of different files within that collection.

    Within the "select single/multiple datasets" UI, it would be nice if collections were listed alongside (maybe bold) and then datasets within collections listed below the header and indented. Much like how timezones are used as headers here https://select2.github.io/examples.html

    kind/enhancement area/UI-UX feature-request 
    opened by hexylena 41
  • Add InteractiveTools.

    Add InteractiveTools.

    Basic idea: launch a container-backed Galaxy Tool and get access to content inside in real time.

    • Give user access via uwsgi (inspired by https://github.com/galaxyproject/galaxy/pull/2385), or another proxy.
    • Access is based upon a key, key_type, token mapping to a host, port. You can have and access any number of RealTimeTools at a time.
    • Port and entry url-path snippet are specified in standard tool.xml files.
    • You can specify any number of ports, so a single RealTimeTool can give access to multiple running applications.
    • RealTimeTools can be added to and installed from the ToolShed.
    • Currently working for docker in local runner.
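The access model described in the list above (a key, key_type, token triple that maps to a host and port) can be pictured as a simple lookup table. The sketch below is an assumption-laden illustration and not the implementation from this PR: the class names `EntryKey`, `Target` and `ProxyMap` and all sample values are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class EntryKey(NamedTuple):
    """Hypothetical lookup key: all three parts must match to gain access."""
    key: str
    key_type: str
    token: str


class Target(NamedTuple):
    """Where the proxy should forward a matching request."""
    host: str
    port: int


class ProxyMap:
    """In-memory mapping from (key, key_type, token) to (host, port)."""

    def __init__(self) -> None:
        self._targets: Dict[EntryKey, Target] = {}

    def register(self, key: str, key_type: str, token: str, host: str, port: int) -> None:
        self._targets[EntryKey(key, key_type, token)] = Target(host, port)

    def resolve(self, key: str, key_type: str, token: str) -> Optional[Target]:
        # Returns None when any part of the triple does not match.
        return self._targets.get(EntryKey(key, key_type, token))
```

Because each triple is an independent entry, a single tool exposing several ports would simply register one entry per port.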
    kind/enhancement kind/feature area/tool-framework 
    opened by blankenberg 40
  • Python 3 support.

    Python 3 support.

    Python 2.7 will not be maintained past 2020.

    Moreover, some Galaxy dependencies dropped Python2 support: cachetools, cmd2, cwltool, numpy, schema-salad

    Add support for Python >= 3.5 while maintaining support for Python 2.7.

    xref.: https://trello.com/c/dZcCVf9I/2702-migrate-to-python-3

    Useful tools and documentation

    https://docs.python.org/3/howto/pyporting.html https://docs.python.org/2/library/2to3.html http://python-future.org/ https://six.readthedocs.io/ http://python3porting.com/

    Dependencies which need to be ported, dropped or updated

    • [x] bx-python: https://github.com/bxlab/bx-python/pull/7, https://github.com/bxlab/bx-python/pull/25, https://github.com/galaxyproject/galaxy/pull/5492
    • [x] Cheetah: replaced with Cheetah3 fork in https://github.com/galaxyproject/galaxy/pull/5359
    • [x] Fabric: fabric/fabric#1424, replaced with Fabric3 fork in https://github.com/galaxyproject/galaxy/pull/5359
    • [x] feedparser: was copied inside https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/util/sanitize_html.py , removed in https://github.com/galaxyproject/galaxy/pull/5642
    • [x] mercurial: https://www.mercurial-scm.org/wiki/Python3, we should use the command line interface through subprocess instead of using the Mercurial API https://github.com/galaxyproject/galaxy/pull/9065
    • [x] NoseHTML: PR https://github.com/galaxyproject/nosehtml/pull/1
    • [x] paste
    • [x] pulsar: https://github.com/galaxyproject/pulsar/issues/152, https://github.com/galaxyproject/pulsar/commit/5b663c3027ec31ef2ce2c58335333bcdf593b5f8, https://github.com/galaxyproject/pulsar/pull/156, https://github.com/galaxyproject/galaxy/pull/5494
    • [x] pygithub3: used only in scripts/bootstrap_history.py, should be replaced with PyGithub
    • [x] sgmllib: removed in https://github.com/galaxyproject/galaxy/pull/5642
    • [x] SVGFig: issue #1717, PR #1747
    • [x] twill: branch 2.0 of Cito's fork supports Python >= 3.3, then https://github.com/galaxyproject/galaxy/pull/2656 can be reverted. Or better: replace with selenium tests
    • [x] WebError: unmaintained, even the latest 0.13.1 release doesn't support Python 3 and there is no interest in adding support, see e.g. https://github.com/Pylons/weberror/pull/18 . It could be replaced with Werkzeug, as suggested in https://github.com/galaxyproject/galaxy/pull/1017 (weberror removed in https://github.com/galaxyproject/galaxy/pull/8877)
    • [x] WebHelpers: unmaintained, removed in https://github.com/galaxyproject/galaxy/pull/5359

    Tips

    • Use
    2to3 -w -n FILE
    

    to port a file to Python3, then revert unnecessary changes and use six for anything that breaks Python2.7

    • If 2to3 suggests enclosing a call to the keys(), values() or items() methods of a dict inside list( ), this may be unnecessary, e.g. if the dict is just iterated over in a loop. But if items are added to or deleted from the dict (changing its size) inside the loop, then making a copy of it with list( ) is necessary.
    • Add
    from __future__ import print_function
    

    at the top of a file when changing print to print() in it.

    • Substitute UserDict.DictMixin with collections.(Mutable)Mapping and add missing methods to the class inheriting from it
    • Substitute string.letters, string.lowercase and string.uppercase with string.ascii_letters, string.ascii_lowercase and string.ascii_uppercase respectively
    • Change calls to string.maketrans to:
    try:
        maketrans = str.maketrans
    except AttributeError:
        from string import maketrans
    
    • Change calls to pipes.quote() to six.moves.shlex_quote()
    • Change calls to filter() and map(lambda ...) to list comprehensions or generator expressions
    • If d is a dict, change:
    d.keys()[0]
    d.values()[0]
    d.items()[0]
    

    to:

    next(iter(d.keys()))
    next(iter(d.values()))
    next(iter(d.items()))
    
    • If a class overrides the __nonzero__() method, this should be renamed to __bool__() and after its definition, the following alias should be added:
        __nonzero__ = __bool__
    
    • If the long type is needed, add the following line after the imports:
    if sys.version_info > (3,):
        long = int
    
    • Since Python 3.7 async and await are keywords, variables with these names need to be renamed.
    • In Python 3 the return value from subprocess.check_output() is a bytestring. If this value is going to be processed as a string, it needs to be decoded.
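Two of the tips above come without code: copying a dict view with list( ) when the dict changes size inside the loop, and decoding the output of subprocess.check_output(). The following is a minimal sketch of both; the function names `drop_empty_values` and `interpreter_major_version` are made up for illustration.

```python
import subprocess
import sys


def drop_empty_values(d):
    """Delete keys with falsy values while looping over the dict."""
    # list() takes a snapshot of the items, so deleting inside the loop is safe.
    # Without it, Python 3 raises
    # "RuntimeError: dictionary changed size during iteration".
    for key, value in list(d.items()):
        if not value:
            del d[key]
    return d


def interpreter_major_version():
    """Ask a child interpreter for its major version and return it as text."""
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.version_info[0])"]
    )
    # check_output() returns bytes in Python 3; decode before string processing.
    return output.decode("utf-8").strip()
```

Both functions behave the same under Python 2.7 and Python 3, which is the goal while the two versions are supported side by side.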
    kind/enhancement area/framework area/python3 
    opened by nsoranzo 40
  • no set_random_password in User when new user logging in to local toolshed

    no set_random_password in User when new user logging in to local toolshed

    I am working in v21.05. When a new user logs in to our local toolshed, the error below occurs. Authentication seems to work, though the LDAP username is listed as None. A quick fix is to add the set_random_password code to tool_shed/webapp/model/__init__.py; this fixes the issue and allows the user to get in. After adding this code, the LDAP username is listed correctly as "myuser" in the logs.

    galaxy.webapps.galaxy.controllers.user DEBUG 2023-01-03 15:35:04,935 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] trans.app.config.auth_config_file: /galaxy/config/auth_conf.xml
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,935 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: email is [email protected]
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,936 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: username is None
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,936 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: options are {'allow-register': 'False', 'auto-register': 'True', 'allow-password-change': 'False', 'server': 'ldap://myldap.work.edu', 'login-use-username': 'False', 'continue-on-failure': 'False', 'search-fields': 'uid,mail', 'search-base': 'dc=acc,dc=work,dc=edu', 'search-filter': '(mail={email})', 'bind-user': '{dn}', 'bind-password': '{password}', 'auto-register-username': '{uid}', 'auto-register-email': '{mail}', 'redact_username_in_logs': False, 'no_password_check': False}
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,947 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: dn is uid=myuser,ou=Users,dc=blah,dc=work,dc=edu
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,947 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: search attributes are {'uid': [b'myuser'], 'mail': [b'[email protected]']}
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,963 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authenticate: whoami is dn:uid=myuser,ou=Users,dc=blah,dc=work,dc=edu
    galaxy.auth.providers.ldap_ad DEBUG 2023-01-03 15:35:04,963 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] LDAP authentication successful
    galaxy.auth.util DEBUG 2023-01-03 15:35:04,966 [pN:main,p:50528,w:1,m:0,tN:uWSGIWorker1Core2] Email: [email protected], auto-register with username: myuser
    Traceback (most recent call last):
      File "lib/galaxy/web/framework/middleware/error.py", line 154, in __call__
        app_iter = self.application(environ, sr_checker)
      File "lib/galaxy/web/framework/middleware/xforwardedhost.py", line 23, in __call__
        return self.app(environ, start_response)
      File "/galaxy/.venv/lib/python3.6/site-packages/paste/translogger.py", line 69, in __call__
        return self.application(environ, replacement_start_response)
      File "/galaxy/.venv/lib/python3.6/site-packages/paste/recursive.py", line 85, in __call__
        return self.application(environ, start_response)
      File "/galaxy/.venv/lib/python3.6/site-packages/routes/middleware.py", line 153, in __call__
        response = self.app(environ, start_response)
      File "/galaxy/.venv/lib/python3.6/site-packages/paste/httpexceptions.py", line 640, in __call__
        return self.application(environ, start_response)
      File "lib/galaxy/web/framework/base.py", line 138, in __call__
        return self.handle_request(environ, start_response)
      File "lib/galaxy/web/framework/base.py", line 217, in handle_request
        body = method(trans, **kwargs)
      File "lib/tool_shed/webapp/controllers/user.py", line 68, in login
        response = self.__validate_login(trans, **kwd)
      File "lib/galaxy/webapps/galaxy/controllers/user.py", line 139, in __validate_login
        message, user = self.__autoregistration(trans, login, password)
      File "lib/galaxy/webapps/galaxy/controllers/user.py", line 97, in __autoregistration
        user = self.user_manager.create(email=email, username=username, password="")
      File "lib/galaxy/managers/users.py", line 116, in create
        user.set_random_password()
    AttributeError: 'User' object has no attribute 'set_random_password'
    
    opened by jhl667 0
  • entrypoints - use timeout instead of interval to prevent stacking requests

    entrypoints - use timeout instead of interval to prevent stacking requests

    closes #15276

    How to test the changes?

    (Select all options that apply)

    • [ ] I've included appropriate automated tests.
    • [x] This is a refactoring of components with existing test coverage.
    • [ ] Instructions for manual testing are as follows:
      1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

    License

    • [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.
    kind/bug area/performance 
    opened by martenson 1
  • Entry point requests pile up

    Entry point requests pile up

    api/entry_points?running=true is always polled at a set interval, even if the previous response has not come in yet. This leads to a pileup of requests that (if no countermeasures are taken on the server side) will degrade performance. A fix could be to:

    • use setTimeout instead of setInterval
    • not fire the request if the page isn't visible

    We're doing that already in watchHistory; I think the pattern could probably be extracted into a composable (if vueuse doesn't already have something we can reuse).
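The difference between the two scheduling styles is independent of the client framework. The sketch below restates the pattern in Python with asyncio instead of the browser's setTimeout, so it is an analogy and not the actual client fix; `poll_with_timeout`, `fetch` and `is_visible` are hypothetical names.

```python
import asyncio


async def poll_with_timeout(fetch, delay, max_polls, is_visible=lambda: True):
    """Poll by waiting `delay` seconds after each call to fetch() has finished."""
    results = []
    for _ in range(max_polls):
        # Skip the request entirely while the page (or client) is not visible.
        if is_visible():
            # The next poll is only scheduled once this one has returned,
            # so a slow response can never cause requests to stack up.
            results.append(await fetch())
        await asyncio.sleep(delay)
    return results
```

An interval-based loop would instead start a new request every `delay` seconds regardless of whether the previous one had completed, which is what produces the pileup described above.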

    kind/bug area/UI-UX area/performance 
    opened by mvdbeek 0
  • Unable to launch Galaxy web interface using http://localhost:8080

    Unable to launch Galaxy web interface using http://localhost:8080

    I am trying to install and run Galaxy (release 22.05) on a remote server. Upon running the file run.sh, I get the following output on my terminal window:-

    ... serving on http://0.0.0.0:8080
    galaxy.model.database_heartbeat DEBUG 2023-01-03 18:38:15,430 [pN:main.1,p:354254,tN:database_heartbeart_main.1.thread] main.1 is config watcher
    galaxy.queue_worker INFO 2023-01-03 18:38:15,648 [pN:main.1,p:354254,tN:Thread-1] Instance 'main.1' received 'rebuild_toolbox_search_index' task, executing now.
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,648 [pN:main.1,p:354254,tN:Thread-1] Starting to build toolbox index of panel default.
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,675 [pN:main.1,p:354254,tN:Thread-1] Toolbox index of panel default finished (26.428 ms)
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,675 [pN:main.1,p:354254,tN:Thread-1] Starting to build toolbox index of panel ontology:edam_operations.
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,695 [pN:main.1,p:354254,tN:Thread-1] Toolbox index of panel ontology:edam_operations finished (20.114 ms)
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,696 [pN:main.1,p:354254,tN:Thread-1] Starting to build toolbox index of panel ontology:edam_topics.
    galaxy.tools.search DEBUG 2023-01-03 18:38:15,714 [pN:main.1,p:354254,tN:Thread-1] Toolbox index of panel ontology:edam_topics finished (18.627 ms)
    2023-01-03 18:38:19,720 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 15 seconds (startsecs)

    ==> /data/galaxy/database/gravity/log/celery-beat.log <==
    [2023-01-03 18:43:06,505: DEBUG/MainProcess] beat: Synchronizing schedule...
    [2023-01-03 18:43:06,519: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.

    I have already changed the bind setting in the galaxy.yml file (that I copied from config/galaxy.yml.sample) to 0.0.0.0:8080 to allow binding to any available network interfaces on port 8080. When I try to launch the web interface on my web browser (Google Chrome) using http://127.0.0.1:8080 or http://localhost:8080, the connection is refused by the browser. I also tried changing the port to 9090, but that didn't help either (both of these ports are available to use on my server).

    Any help fixing this issue would be greatly appreciated! Thanks in advance!
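One way to narrow down a refused connection like this is to test whether the port accepts TCP connections from the machine where the browser runs. If that machine is not the Galaxy server itself, localhost and 127.0.0.1 refer to the browsing machine and the server's own hostname would be needed instead. The following is a minimal diagnostic sketch; the function name `port_is_reachable` and the host name in the usage example are assumptions.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "galaxy.example.org" is a placeholder for the remote server's address.
    for host in ("127.0.0.1", "galaxy.example.org"):
        print(host, port_is_reachable(host, 8080))
```

If the check succeeds on the server but fails from the browsing machine, a firewall or the need for an SSH tunnel is a more likely cause than the Galaxy bind setting.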

    opened by varuntrivedi18 0
  • Bandaid for failing selenium pair uploads

    Bandaid for failing selenium pair uploads

    See more at https://github.com/galaxyproject/galaxy/pull/15262#issuecomment-1370208186

    WIP while I try to chase down my hunch in that thread.

    fixes #15111

    How to test the changes?

    (Select all options that apply)

    • [ ] I've included appropriate automated tests.
    • [ ] This is a refactoring of components with existing test coverage.
    • [ ] Instructions for manual testing are as follows:
      1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

    License

    • [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.
    area/testing/selenium 
    opened by dannon 0
  • Fix form builder in admin panel

    Fix form builder in admin panel

    Fixes: #15269. The form builder in the admin panel is not a frequently used feature. It recently had several issues and was not working properly. Unfortunately, it is not covered by the current tests. It is not a lot of code, but it probably needs to be refactored before it can be properly tested as an API. For now, this PR restores its functionality. The form builder allows uploading a CSV file describing the form inputs; here is an example: formbuilder_demo.csv, and here is a demo screencast of how the form builder works.

    https://user-images.githubusercontent.com/2105447/210455596-198166d8-72de-4131-a0a6-43f780ef8eaa.mov

    How to test the changes?

    (Select all options that apply)

    • [ ] I've included appropriate automated tests.
    • [ ] This is a refactoring of components with existing test coverage.
    • [ ] Instructions for manual testing are as follows:
      1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

    License

    • [x] I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.
    kind/bug area/UI-UX 
    opened by guerler 0
Owner
Galaxy Project
Galaxy is an open, web-based platform for data-intensive research.