## Rationale
Previously, if running Galaxy under uWSGI, it was necessary to configure and start separate Paste or application-only (`scripts/galaxy-main`) Galaxy servers for job handling, in order to avoid race conditions with job handler selection and to separate web workers from job handlers for performance/scalability. This also meant needing some sort of process management (e.g. supervisor) to manage all of these individual processes.
uWSGI Mules are processes forked from the uWSGI master process after the application is loaded. Mules can also `exec()` specified arbitrary code, and come with some very nice features:
- They can receive messages from uWSGI worker processes (see the sketch after this list).
- They can be grouped together into "farms" such that messages sent to a farm are received only by mules in that farm.
- They are controlled by the uWSGI master process and can be stopped and started all from a single command line.
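For reference, the raw uWSGI primitives behind the first feature look roughly like this (a sketch assuming a farm named `job-handlers` has been configured, as shown under Usage below):

```python
import uwsgi  # only importable inside a uWSGI process

# In a web worker: send a message to the farm; one mule in the farm
# will receive and consume it.
uwsgi.farm_msg('job-handlers', b'do something')

# In a mule belonging to that farm: block until a farm message arrives.
msg = uwsgi.farm_get_msg()
```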
## Usage
This PR introduces the ability to run Galaxy job handlers as mules. In the simplest form, you can:
```console
$ GALAXY_UWSGI=1 sh run.sh
```
This will run with a command line like:
```console
$ uwsgi --virtualenv /home/nate/work/galaxy/.venv --ini-paste config/galaxy.ini --processes 1 --threads 4 --http localhost:8080 --pythonpath lib --master --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --paste-logger --die-on-term --enable-threads --py-call-osafterfork
```
You can override these defaults (other than the booleans like `--master` and `--enable-threads`) with a `[uwsgi]` section in `galaxy.ini`, or just configure in `galaxy.ini` and run `uwsgi` directly.
By default, with no `job_conf.xml`, jobs will be run in uWSGI web worker processes, as they were with Paste. This is to keep things simple at first. To run jobs in mules, you only need to start them and add them to the correct farm, which must be named `job-handlers`. Be aware that there are some caveats (below) if you have a `job_conf.xml`. Mules can be added in any of the following ways (on the command line, in an ini config file, or in a YAML config file):
```console
$ GALAXY_UWSGI=1 sh run.sh --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
```
```ini
[uwsgi]
mule = lib/galaxy/main.py
mule = lib/galaxy/main.py
farm = job-handlers:1,2
```
```yaml
uwsgi:
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  farm: job-handlers:1,2
```
For more handlers, simply add additional `mule` options and add their IDs to the `farm` option.
## Design
Where possible, I have tried to make this as stack-agnostic and purpose-agnostic as possible. There is a bit of ugliness around how mules are designated as job handlers (they have to be in a farm named `job-handlers`), but the goal is to make it easy for anyone going forward to send tasks to mules for asynchronous execution. You'll see a few references to "pools," which are a stack-agnostic abstraction of uWSGI Farms.
For most other functions you might want to push out to mules, it should be as simple as (see the sketch after this list):

- Add a new message class in `galaxy.web.stack.message`
- Create a message handler and register it with `app.application_stack.register_message_handler`
- Send messages to mules with `app.application_stack.send_message`
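A loose sketch of that flow, for illustration only: `app` is the Galaxy application object, and the message class layout and call signatures here are assumptions; just the module and the register/send calls are named in this PR.

```python
import logging

from galaxy.web.stack.message import ApplicationStackMessage  # assumed base class

log = logging.getLogger(__name__)


class RebuildIndexMessage(ApplicationStackMessage):
    target = 'rebuild_index'  # name of the handler function to invoke


def rebuild_index(msg):
    # Runs in whichever mule in the target farm receives the message.
    log.info("rebuilding an index at a web worker's request")


# At startup (so a mule knows how to dispatch this message type):
app.application_stack.register_message_handler(rebuild_index)

# From a web worker, send the message to the 'job-handlers' pool:
app.application_stack.send_message('job-handlers', RebuildIndexMessage())
```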
For jobs, messages are only being used for handler selection. We create `Job`s in the web workers at tool execution time just as we did before, but they are committed to the database with a `null` in the `handler` field, whereas before they always had to have a handler set at creation time. Mule messages include only the target message handler function, the task to perform (`setup`), and the job's ID. A mule receives the message, writes its `server_name` to the job's `handler` field, and then picks the job up as handlers always have, without any further modification to the jobs code.
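Conceptually, the mule side of that exchange does something like the following (an illustrative pseudo-sketch, not Galaxy's actual code; `app`, `sa_session`, and `model` are assumed from Galaxy's application context):

```python
def handle_setup(job_id):
    # Look up the job committed by the web worker with handler = null.
    job = sa_session.query(model.Job).get(job_id)
    job.handler = app.config.server_name  # this mule claims the job
    sa_session.add(job)
    sa_session.flush()
    # From here, the normal job handler machinery picks the job up unchanged.
```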
## Server names
Under uWSGI, server names are manipulated using the template `{server_name}.{pool_name}.{instance_id}`, where:

- `{server_name}` is the original `server_name`, configurable in the app config (or, with Paste/webless, on the command line with `--server-name`); by default this is `main`
- `{pool_name}` is the worker pool or farm name: for web workers (the processes forked based on the `processes` uWSGI option) this is `web`; for mules it is the farm name, e.g. `job-handlers`
- `{instance_id}` is the 1-based index of the server in its pool: for web workers this is the uWSGI-assigned worker ID (an integer starting at 1); for mules it is the mule's 1-indexed position in the `farm` argument
So in the following `galaxy.yml`:
```yaml
uwsgi:
  processes: 4
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  farm: job-handlers:2,3
  farm: something-else:1
galaxy:
  server_name: galaxy
```
uWSGI starts 4 web workers, 2 job handlers, and one other mule, with `server_name`s:
```
galaxy.web.1
galaxy.web.2
galaxy.web.3
galaxy.web.4
galaxy.job-handlers.1
galaxy.job-handlers.2
galaxy.something-else.1
```
This information is important when you want to statically or dynamically map handlers rather than use the default.
## Caveats
In order to attempt to support existing `job_conf.xml` files that have a default `<handlers>` block, jobs are mapped to handlers in the following manner:
- If you do not have a `job_conf.xml`, or have a `job_conf.xml` with no `<handlers>` block:
  - If started without a configured `job-handlers` farm, or as a non-uWSGI server: web workers are job handlers
  - If started with a `job-handlers` farm: mules are job handlers
- If you have a `<handlers>` block but no `default=` set in `<handlers>`:
  - Works the same as if you have no `job_conf.xml`, except that explicitly specified static/dynamic handler mappings will result in the specified handler being assigned
- If you have a `<handlers>` block and a `default=` set in `<handlers>`:
  - The default handler, or the explicitly specified static/dynamic handler, is assigned
As before, if the assigned handler is a tag, a handler with that tag is chosen at random. If a handler is assigned due to an explicit static/dynamic mapping, mule messages are not used; the specified handler ID is simply set on the job record in the database.
One way to mix automatic web/mule handling with mapped handling is to define multiple `<handler>`s but no default: jobs will then be sent to the web/mule handlers by default, and only tools explicitly mapped to handlers will be sent to the named handlers. It is also possible to map tools to mule handlers in `job_conf.xml`, using the `server_name`s `main.job-handlers.1`, `main.job-handlers.2`, and so on, as in the sketch below.
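For illustration, a minimal `job_conf.xml` along those lines might look like this (the tool ID is a placeholder, and the exact attribute layout should be checked against your Galaxy version):

```xml
<job_conf>
    <handlers>
        <!-- No default= here, so unmapped tools still go to the web/mule handlers -->
        <handler id="main.job-handlers.1"/>
        <handler id="main.job-handlers.2"/>
    </handlers>
    <tools>
        <!-- Pin one (hypothetical) tool to the first mule -->
        <tool id="example_tool" handler="main.job-handlers.1"/>
    </tools>
</job_conf>
```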
This is complicated, and perhaps we should do things less magically, but as usual for Galaxy, I am trying to take the approach requiring the least intervention by admins.
There is some functionality included for templating the server name for greater control; e.g., if you run Galaxy on multiple servers, the `server_name` (which is persisted in the database and used for job recovery) would need to include some identifier unique to each host. However, configuration for this is not exposed. In the short term, people in that situation (are there any other than me?) can always continue running handlers externally.
Zerg mode is untested, and you could encounter race conditions during restarts, especially with respect to job recovery.
## Configurability
I went through multiple iterations on how to make things configurable. For example:
```yaml
---
stack:
  workers:
    - name: default-job-handlers
      purpose: handle_jobs  # this essentially controls the role of the mule, what portions of the application it loads, etc.
      processes: 4
      server_name: "{server_name}.{pool_name}.{process_num}"
    - name: special-job-handlers
      purpose: handle_jobs
      processes: 2
      server_name: "{server_name}.{pool_name}.{process_num}"
    - name: spam-spam-spam
      # default purpose, just run the galaxy application
    - name: eggs
      type: standalone  # "webless" galaxy process started externally
```
This would translate into a command line like:

```console
$ uwsgi ... --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py \
    --mule=lib/galaxy/main.py \
    --farm=default-job-handlers:1,2,3,4 \
    --farm=special-job-handlers:5,6 \
    --farm=spam-spam-spam:7
```
Prior to #3179, I'd made a separate YAML config for the containers interface. These configs use defaults set as class attributes on the container classes, and those defaults are merged recursively down the class inheritance chain.
I wanted to do the same for the stack config, but with #3179 we can start merging YAML configs into the main Galaxy config. Ultimately (after some discussion on the Gitter channel), I've stripped the configurability out until we settle on whether, and if so how, to support hierarchical configs/defaults in a way compatible with the model @jmchilton created in that excellent PR.
## Invocations
You can start under uWSGI using a variety of methods:
**ini-paste**:

```console
$ uwsgi --ini-paste config/galaxy.ini ...
```
**ini**:

```console
$ uwsgi --ini config/galaxy.ini --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
```
**yaml**:

```console
$ uwsgi --yaml config/galaxy.yml --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' ...
```
**separate app config** (Galaxy config file, ini or yaml, separate from the uWSGI config file, also ini or yaml):

```console
$ uwsgi --<ini|yaml> config/galaxy.<ini|yml> --set galaxy_config_file=config/galaxy.<ini|yml> ...
```
**no config file** (for example):

```console
$ uwsgi --virtualenv /home/nate/work/galaxy/.venv --http localhost:8192 --die-on-term --enable-threads --py-call-osafterfork --master --processes 2 --threads 4 --pythonpath lib --static-map /static/style=/home/nate/work/galaxy/static/style/blue --static-map /static=/home/nate/work/galaxy/static --module 'galaxy.webapps.galaxy.buildapp:uwsgi_app_factory()' --set galaxy_config_file=config/galaxy.yml --mule=lib/galaxy/main.py --mule=lib/galaxy/main.py --farm=job-handlers:1,2
```
## Logging
By default, everything logs to one stream, and you can't tell which messages come from which process. This isn't bad with one mule, but with more it's unmanageable. You can fix this with the following logging config, which makes use of custom filters added in this PR to include the uWSGI worker and mule IDs in each log message (sketched after the config below):
```ini
[loggers]
keys = root, galaxy

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_galaxy]
level = DEBUG
handlers = console
qualname = galaxy
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = DEBUG
formatter = generic

[formatter_generic]
format = %(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s
```
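For reference, the filters in question do something along these lines (an illustrative sketch; the class and attribute names Galaxy actually uses may differ):

```python
import logging

try:
    import uwsgi  # only importable when running under uWSGI
except ImportError:
    uwsgi = None


class UWSGILogFilter(logging.Filter):
    """Stamp every record with uWSGI worker/mule IDs so formatters can
    reference %(worker_id)s and %(mule_id)s."""

    def filter(self, record):
        record.worker_id = uwsgi.worker_id() if uwsgi else 0
        record.mule_id = uwsgi.mule_id() if uwsgi else 0
        return True
```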
Or, even better, if you switch to a YAML Galaxy application config, you can use a `logging.config.dictConfig` dict with a special `filename_template` parameter on any `logging.handlers.*FileHandler` handlers, which will be templated with `galaxy.util.facts`, e.g.:
```yaml
---
galaxy:
  debug: yes
  use_interactive: yes
  logging:
    version: 1
    root:
      # root logger
      level: INFO
      handlers:
        - console
        - files
    loggers:
      galaxy:
        level: DEBUG
        handlers:
          - console
          - files
        qualname: galaxy
        propagate: 0
    handlers:
      console:
        class: logging.StreamHandler
        level: DEBUG
        formatter: generic
        stream: ext://sys.stderr
      files:
        class: logging.FileHandler
        level: DEBUG
        formatter: generic
        filename: galaxy_default.log
        filename_template: galaxy_{pool_name}_{server_id}.log
    formatters:
      generic:
        format: "%(name)s %(levelname)-5.5s %(asctime)s [p:%(process)s,w:%(worker_id)s,m:%(mule_id)s] [%(threadName)s] %(message)s"
```
This will result in 3 log files: `galaxy_web_0.log` (which actually contains log messages for all web workers), `galaxy_job-handlers_1.log`, and `galaxy_job-handlers_2.log`.
## TODO
For this PR:
- [x] Support handler assignment for workflow invocations
- [x] Figure out separate file logging for the workers and each mule
- [x] Make sure `get_uwsgi_args.py` works with a variety of `[uwsgi]` settings (or their absence) in the config
- [x] Finish refactoring and cleaning
- [x] Fix linting, tests, etc., whitelist stricter code guidelines
- [x] Document logging configuration improvements
- [x] ~~Squash WIP commits~~ I think I want to preserve these
- [x] Tests would be great ~~¯\\\_(ツ)\_/¯~~ thanks @jmchilton! (ﾉ^^)ﾉ
- [x] Default to running jobs in the web worker(s) and don't start mules if there is no Galaxy config file, no `job_conf.xml`, and no mules configured
- [x] Fix thread join failure on shutdown
- [x] Implement correct handler assignment (either web workers or mules) when a default `job_conf.xml` is in place
- [x] Correct `server_name` templating for mules: ideally `{server_name}.{pool_name}.{pool_index}`, where `pool_name` is the farm name and `pool_index` is the mule's index in the farm, or at least `{server_name}.{pool_name}.{server_id}`, where `server_id` is the `mule_id`
- [x] Correct the information in this PR description
- [x] ~~Also, it shouldn't block merging, but~~ I need to improve the way that Galaxy determines whether stack messaging should be used for handler selection, and whether a mule is a handler
## Future TODO
For a post-PR issue:
- Include a default config for proxying GIEs (work started in #2385)
- If possible, mules should probably give up the message lock after some timeout. This may only be possible with signals, since `uwsgi.farm_get_msg()` does not have a timeout param.
- Default to uWSGI
- Document recommended job config changes
- Add configurability as described above
- Add configurability for workflow scheduling handlers
- Support Zerg mode
- ~~`get_uwsgi_args.py` won't play nice with `mule` and `farm` settings in the uWSGI config file (the `[uwsgi]` section in `galaxy.ini`, or wherever it is in your case)~~
- Run handlers without `exec()`: unbit/uwsgi#1608
- Run multiple handler pools for mapping jobs in different ways
- To my knowledge, no other WSGI application stacks support fork/exec'ing asynchronous workers, let alone messaging them. However, I would like to incorporate non-uWSGI worker messaging into the stack code so we could at least send messages to webless Galaxy processes. My initial thought is to add a stack transport (see `galaxy.web.stack.transport`) that interfaces with the AMQP support already in `galaxy.queues`. Alternatively, maybe the stack messaging functionality should be decoupled from the stack entirely and merged directly into `galaxy.queues`.