## Rationale
Previously, if running Galaxy under uWSGI, it was necessary to configure and start separate Paste or application-only (`scripts/galaxy-main`) Galaxy servers for job handling, in order to avoid race conditions with job handler selection and to separate web workers from job handlers for performance/scalability. This also meant needing some sort of process management (e.g. supervisor) to manage all of these individual processes.
uWSGI Mules are processes forked from the uWSGI master process after the application is loaded. Mules can also `exec()` specified arbitrary code, and come with some very nice features:
- They can receive messages from uWSGI worker processes (see the sketch after this list).
- They can be grouped together into "farms" such that messages sent to a farm are received only by mules in that farm.
- They are controlled by the uWSGI master process and can be stopped and started all from a single command line.
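For reference, the raw uWSGI primitives behind the first feature look roughly like this (a sketch assuming a farm named `job-handlers` has been configured, as shown under Usage below):

```python
import uwsgi  # only importable inside a uWSGI process

# In a web worker: send a message to the farm; one mule in the farm
# will receive and consume it.
uwsgi.farm_msg('job-handlers', b'do something')

# In a mule belonging to that farm: block until a farm message arrives.
msg = uwsgi.farm_get_msg()
```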
## Usage
This PR introduces the ability to run Galaxy job handlers as mules. In the simplest form, you can:
```console
$ GALAXY_UWSGI=1 sh run.sh
```
This will run with a command line like:
```console
$ uwsgi --virtualenv /home/nate/work/galaxy/.venv --ini-paste config/galaxy.ini --processes 1 --threads 4 --http localhost:8080 --pythonpath lib --master --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --paste-logger --die-on-term --enable-threads --py-call-osafterfork
```
You can override these defaults (other than the booleans like `--master` and `--enable-threads`) with a `[uwsgi]` section in `galaxy.ini`, or just configure in `galaxy.ini` and run `uwsgi` directly.
By default, with no `job_conf.xml`, jobs will be run in uWSGI web worker processes, as they were with Paste. This is to keep things simple at first. To run jobs in mules, you only need to start them and add them to the correct farm, which must be named `job-handlers`. Be aware that there are some caveats (below) if you have a `job_conf.xml`. Mules can be added in any of the following ways (on the command line, in an ini config file, or in a YAML config file):
```console
$ GALAXY_UWSGI=1 sh run.sh --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
```
```ini
[uwsgi]
mule = lib/galaxy/main.py
mule = lib/galaxy/main.py
farm = job-handlers:1,2
```
```yaml
uwsgi:
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  farm: job-handlers:1,2
```
For more handlers, simply add additional `mule` options and add their IDs to the `farm` option.
## Design
Where possible, I have tried to make this as stack-agnostic and purpose-agnostic as possible. There is a bit of ugliness around how mules are designated as job handlers (they have to be in a farm named `job-handlers`), but the goal is to make it easy for anyone going forward to send tasks to mules for asynchronous execution. You'll see a few references to "pools," which are a stack-agnostic abstraction of uWSGI Farms.
For most other functions you might want to push out to mules, it should be as simple as (see the sketch after this list):

- Add a new message class in `galaxy.web.stack.message`
- Create a message handler and register it with `app.application_stack.register_message_handler`
- Send messages to mules with `app.application_stack.send_message`
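A loose sketch of that flow, for illustration only: `app` is the Galaxy application object, and the message class layout and call signatures here are assumptions; just the module and the register/send calls are named in this PR.

```python
import logging

from galaxy.web.stack.message import ApplicationStackMessage  # assumed base class

log = logging.getLogger(__name__)


class RebuildIndexMessage(ApplicationStackMessage):
    target = 'rebuild_index'  # name of the handler function to invoke


def rebuild_index(msg):
    # Runs in whichever mule in the target farm receives the message.
    log.info("rebuilding an index at a web worker's request")


# At startup (so a mule knows how to dispatch this message type):
app.application_stack.register_message_handler(rebuild_index)

# From a web worker, send the message to the 'job-handlers' pool:
app.application_stack.send_message('job-handlers', RebuildIndexMessage())
```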
For jobs, messages are only being used for handler selection. We create `Job`s in the web workers at tool execution time just as we did before, but they are committed to the database with a `null` in the `handler` field, whereas before they always had to have a handler set at creation time. Mule messages include only the target message handler function, the task to perform (`setup`), and the job's ID. A mule receives the message, writes its `server_name` to the job's `handler` field, and then picks the job up as handlers always have, without any further modification to the jobs code.
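Conceptually, the mule side of that exchange does something like the following (an illustrative pseudo-sketch, not Galaxy's actual code; `app`, `sa_session`, and `model` are assumed from Galaxy's application context):

```python
def handle_setup(job_id):
    # Look up the job committed by the web worker with handler = null.
    job = sa_session.query(model.Job).get(job_id)
    job.handler = app.config.server_name  # this mule claims the job
    sa_session.add(job)
    sa_session.flush()
    # From here, the normal job handler machinery picks the job up unchanged.
```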
## Server names
Under uWSGI, server names are manipulated using the template `{server_name}.{pool_name}.{instance_id}`, where:

- `{server_name}` is the original `server_name`, configurable in the app config (or, with Paste/webless, on the command line with `--server-name`); by default this is `main`
- `{pool_name}` is the worker pool or farm name: for web workers (the processes forked based on the `processes` uWSGI option) this is `web`; for mules it is the farm name, e.g. `job-handlers`
- `{instance_id}` is the 1-based index of the server in its pool: for web workers this is the uWSGI-assigned worker ID (an integer starting at 1); for mules it is the mule's 1-indexed position in the `farm` argument
So in the following `galaxy.yml`:
```yaml
uwsgi:
  processes: 4
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  farm: job-handlers:2,3
  farm: something-else:1
galaxy:
  server_name: galaxy
```
uWSGI starts 4 web workers, 2 job handlers, and one other mule, with `server_name`s:
```
galaxy.web.1
galaxy.web.2
galaxy.web.3
galaxy.web.4
galaxy.job-handlers.1
galaxy.job-handlers.2
galaxy.something-else.1
```
This information is important when you want to statically or dynamically map handlers rather than use the default.
## Caveats
In order to attempt to support existing `job_conf.xml` files that have a default `<handlers>` block, jobs are mapped to handlers in the following manner:
- If you do not have a `job_conf.xml`, or have a `job_conf.xml` with no `<handlers>` block:
  - If started without a configured `job-handlers` farm, or as a non-uWSGI server: web workers are job handlers
  - If started with a `job-handlers` farm: mules are job handlers
- If you have a `<handlers>` block but no `default=` set in `<handlers>`:
  - Works the same as if you have no `job_conf.xml`, except that explicitly specified static/dynamic handler mappings will result in the specified handler being assigned
- If you have a `<handlers>` block and a `default=` set in `<handlers>`:
  - The default handler, or the explicitly specified static/dynamic handler, is assigned
As before, if the assigned handler is a tag, a handler with that tag is chosen at random. If a handler is assigned due to an explicit static/dynamic mapping, mule messages are not used; the specified handler ID is simply set on the job record in the database.
One way to mix automatic web/mule handling with mapped handling is to define multiple `<handler>`s but no default: jobs will then be sent to the web/mule handlers by default, and only tools explicitly mapped to handlers will be sent to the named handlers. It is also possible to map tools to mule handlers in `job_conf.xml`, using the `server_name`s `main.job-handlers.1`, `main.job-handlers.2`, and so on, as in the sketch below.
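For illustration, a minimal `job_conf.xml` along those lines might look like this (the tool ID is a placeholder, and the exact attribute layout should be checked against your Galaxy version):

```xml
<job_conf>
    <handlers>
        <!-- No default= here, so unmapped tools still go to the web/mule handlers -->
        <handler id="main.job-handlers.1"/>
        <handler id="main.job-handlers.2"/>
    </handlers>
    <tools>
        <!-- Pin one (hypothetical) tool to the first mule -->
        <tool id="example_tool" handler="main.job-handlers.1"/>
    </tools>
</job_conf>
```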
This is complicated, and perhaps we should do things less magically, but as usual for Galaxy, I am trying to take the approach requiring the least intervention by admins.
There is some functionality included for templating the server name for greater control; e.g., if you run Galaxy on multiple servers, the `server_name` (which is persisted in the database and used for job recovery) would need to include some identifier unique to each host. However, configuration for this is not exposed. In the short term, people in that situation (are there any other than me?) can always continue running handlers externally.
Zerg mode is untested, and you could encounter race conditions during restarts, especially with respect to job recovery.
## Configurability
I went through multiple iterations on how to make things configurable. For example:
```yaml
---
stack:
  workers:
    - name: default-job-handlers
      purpose: handle_jobs  # this essentially controls the role of the mule, what portions of the application it loads, etc.
      processes: 4
      server_name: "{server_name}.{pool_name}.{process_num}"
    - name: special-job-handlers
      purpose: handle_jobs
      processes: 2
      server_name: "{server_name}.{pool_name}.{process_num}"
    - name: spam-spam-spam
      # default purpose, just run the galaxy application
    - name: eggs
      type: standalone  # "webless" galaxy process started externally
```
This would translate into a command line like:

```console
$ uwsgi ... --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py \
    --farm=default-job-handlers:1,2,3,4 \
    --farm=special-job-handlers:5,6 \
    --farm=spam-spam-spam:7
```
Prior to #3179, I'd made a separate YAML config for the containers interface. These configs use defaults set as class attributes on the container classes, and those defaults are merged recursively down the class inheritance chain.
I wanted to do the same for the stack config, but with #3179 we can start merging YAML configs into the main Galaxy config. Ultimately (after some discussion on the Gitter channel), I've stripped the configurability out until we settle on whether, and if so how, to support hierarchical configs/defaults in a way compatible with the model @jmchilton created in that excellent PR.
## Invocations
You can start under uWSGI using a variety of methods:
**ini-paste**:

```console
$ uwsgi --ini-paste config/galaxy.ini ...
```
**ini**:

```console
$ uwsgi --ini config/galaxy.ini --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
```
**yaml**:

```console
$ uwsgi --yaml config/galaxy.yml --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
```
**separate app config** (Galaxy config file, ini or yaml, separate from the uWSGI config file, also ini or yaml):

```console
$ uwsgi --<ini|yaml> config/galaxy.<ini|yml> --set galaxy_config_file=config/galaxy.<ini|yml> ...
```
**no config file** (for example):

```console
$ uwsgi --virtualenv /home/nate/work/galaxy/.venv --http localhost:8192 --die-on-term --enable-threads --py-call-osafterfork --master --processes 2 --threads 4 --pythonpath lib --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' --set galaxy_config_file=config/galaxy.yml --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
```
## Logging
By default, everything logs to one stream, and you can't tell which messages come from which process. This isn't bad with one mule, but with more it's unmanageable. You can fix this with the following logging config, which makes use of custom filters added in this PR to include the uWSGI worker and mule IDs in each log message (sketched after the config below):
```ini
[loggers]
keys = root, galaxy

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_galaxy]
level = DEBUG
handlers = console
qualname = galaxy
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = DEBUG
formatter = generic

[formatter_generic]
format = %(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s
```
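For reference, the filters in question do something along these lines (an illustrative sketch; the class and attribute names Galaxy actually uses may differ):

```python
import logging

try:
    import uwsgi  # only importable when running under uWSGI
except ImportError:
    uwsgi = None


class UWSGILogFilter(logging.Filter):
    """Stamp every record with uWSGI worker/mule IDs so formatters can
    reference %(worker_id)s and %(mule_id)s."""

    def filter(self, record):
        record.worker_id = uwsgi.worker_id() if uwsgi else 0
        record.mule_id = uwsgi.mule_id() if uwsgi else 0
        return True
```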
Or, even better, if you switch to a YAML Galaxy application config, you can use a `logging.config.dictConfig` dict with a special `filename_template` parameter on any `logging.handlers.*FileHandler` handlers, which will be templated with `galaxy.util.facts`, e.g.:
```yaml
---
galaxy:
  debug: yes
  use_interactive: yes
  logging:
    version: 1
    root:
      # root logger
      level: INFO
      handlers:
        - console
        - files
    loggers:
      galaxy:
        level: DEBUG
        handlers:
          - console
          - files
        qualname: galaxy
        propagate: 0
    handlers:
      console:
        class: logging.StreamHandler
        level: DEBUG
        formatter: generic
        stream: ext://sys.stderr
      files:
        class: logging.FileHandler
        level: DEBUG
        formatter: generic
        filename: galaxy_default.log
        filename_template: galaxy_{pool_name}_{server_id}.log
    formatters:
      generic:
        format: "%(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s"
```
This will result in 3 log files: `galaxy_web_0.log` (which actually contains log messages for all web workers), `galaxy_job-handlers_1.log`, and `galaxy_job-handlers_2.log`.
## TODO
For this PR:
- [x] Support handler assignment for workflow invocations
- [x] Figure out separate file logging for the workers and each mule
- [x] Make sure `get_uwsgi_args.py` works with a variety of `[uwsgi]` settings (or their absence) in the config
- [x] Finish refactoring and cleaning
- [x] Fix linting, tests, etc., whitelist stricter code guidelines
- [x] Document logging configuration improvements
- [x] ~~Squash WIP commits~~ I think I want to preserve these
- [x] Tests would be great ~~¯\\\_(ツ)\_/¯~~ thanks @jmchilton! (ﾉ^^)ﾉ
- [x] Default to running jobs in the web worker(s) and don't start mules if there is no Galaxy config file, no `job_conf.xml`, and no mules configured
- [x] Fix thread join failure on shutdown
- [x] Implement correct handler assignment (either web workers or mules) when a default `job_conf.xml` is in place
- [x] Correct `server_name` templating for mules: ideally `{server_name}.{pool_name}.{pool_index}`, where `pool_name` is the farm name and `pool_index` is the mule's index in the farm, or at least `{server_name}.{pool_name}.{server_id}`, where `server_id` is the `mule_id`
- [x] Correct the information in this PR description
- [x] ~~Also, it shouldn't block merging, but~~ I need to improve the way that Galaxy determines whether stack messaging should be used for handler selection, and whether a mule is a handler
## Future TODO
For a post-PR issue:
- Include a default config for proxying GIEs (work started in #2385)
- If possible, mules should probably give up the message lock after some timeout. This may only be possible with signals, since `uwsgi.farm_get_msg()` does not have a timeout param.
- Default to uWSGI
- Document recommended job config changes
- Add configurability as described above
- Add configurability for workflow scheduling handlers
- Support Zerg mode
- ~~`get_uwsgi_args.py` won't play nice with `mule` and `farm` settings in the uWSGI config file (the `[uwsgi]` section in `galaxy.ini`, or wherever it is in your case)~~
- Run handlers without `exec()`: unbit/uwsgi#1608
- Run multiple handler pools for mapping jobs in different ways
- To my knowledge, no other WSGI application stacks support fork/exec'ing asynchronous workers, let alone messaging them. However, I would like to incorporate non-uWSGI worker messaging into the stack code so we could at least send messages to webless Galaxy processes. My initial thought is to add a stack transport (see `galaxy.web.stack.transport`) that interfaces with the AMQP support already in `galaxy.queues`. Alternatively, maybe the stack messaging functionality should be decoupled from the stack entirely and merged directly into `galaxy.queues`.