ASGI Correlation ID middleware
Middleware for reading correlation IDs from request HTTP headers and making them available in application logs.
By default, the middleware reads correlation IDs from the Correlation-ID HTTP header. The header name is configurable, and on a platform like Heroku, for example, you'll probably want to set it to X-Request-ID instead.
In addition to adding correlation IDs to logs, the middleware supports propagating correlation IDs to Sentry events and Celery tasks. See the relevant sections below for more details.
Installation
pip install asgi-correlation-id
Setting up the middleware
Adding the middleware
The middleware can be added like this:
app = FastAPI(middleware=[Middleware(CorrelationIdMiddleware)])
or like this:
app = FastAPI()
app.add_middleware(CorrelationIdMiddleware)
For Starlette apps, just substitute FastAPI with Starlette in the examples.
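For reference, here's a minimal sketch of the Starlette variant with the imports spelled out (Starlette and Middleware come from Starlette itself, not from this package):
from starlette.applications import Starlette
from starlette.middleware import Middleware

from asgi_correlation_id import CorrelationIdMiddleware

app = Starlette(middleware=[Middleware(CorrelationIdMiddleware)])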
Middleware settings
The middleware has a few settings. These are the defaults:
class CorrelationIdMiddleware(
header_name='Correlation-ID',
validate_guid=True,
uuid_length=32,
)
Each individual setting is described below:
header_name
The HTTP header key to read IDs from.
In addition to Correlation-ID, another popular choice of header name is X-Request-ID. Among other things, this is the standard header for request IDs on Heroku.
Defaults to Correlation-ID.
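For example, a sketch of a Heroku-friendly setup, reusing the add_middleware style from above:
# Read IDs from Heroku's X-Request-ID header instead of Correlation-ID
app.add_middleware(CorrelationIdMiddleware, header_name='X-Request-ID')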
validate_guid
By default, the middleware validates correlation IDs as valid UUIDs. If turned off, any string will be accepted.
An invalid header value is discarded, and a fresh UUID is generated in its place.
Defaults to True.
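If you need to accept arbitrary ID strings from upstream services, a sketch of turning validation off:
# Accept any incoming header value as-is, rather than requiring a valid UUID
app.add_middleware(CorrelationIdMiddleware, validate_guid=False)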
uuid_length
Lets you optionally trim the length of correlation IDs. Probably not needed in most cases, but for, e.g., local development, having 32-character UUIDs in every line of console output can be excessive.
Defaults to 32.
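A sketch of a trimmed setup for local development:
# Trim correlation IDs to 8 characters
app.add_middleware(CorrelationIdMiddleware, uuid_length=8)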
Configuring logging
To get any benefit from the middleware, you'll want to configure your logging setup to include the correlation ID in some form or another. That way, logs can be correlated to a single request - which is largely the point of the middleware.
To set this up, you simply have to add the log filter supplied by the package to your logging config.
If your logging config looks something like this:
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'formatters': {
'web': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
'format': '%(levelname)s ... %(name)s %(message)s',
},
},
'handlers': {
'web': {
'class': 'logging.StreamHandler',
'formatter': 'web',
},
},
'loggers': {
'my_project': {
'handlers': ['web'],
'level': 'DEBUG',
'propagate': True,
},
},
}
You simply have to make the following changes:
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
+ 'filters': {
+ 'correlation_id': {'()': CorrelationId},
+ },
'formatters': {
'web': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
+ 'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
},
},
'handlers': {
'web': {
'class': 'logging.StreamHandler',
+ 'filters': ['correlation_id'],
'formatter': 'web',
},
},
'loggers': {
'my_project': {
'handlers': ['web'],
'level': 'DEBUG',
'propagate': True,
},
},
}
And your log output should go from this:
INFO ... project.views This is a DRF view log, and should have a GUID.
WARNING ... project.services.file Some warning in a function
INFO ... project.views This is a DRF view log, and should have a GUID.
INFO ... project.views This is a DRF view log, and should have a GUID.
WARNING ... project.services.file Some warning in a function
WARNING ... project.services.file Some warning in a function
to this:
INFO ... [773fa6885e03493498077a273d1b7f2d] project.views This is a DRF view log, and should have a GUID.
WARNING ... [773fa6885e03493498077a273d1b7f2d] project.services.file Some warning in a function
INFO ... [0d1c3919e46e4cd2b2f4ac9a187a8ea1] project.views This is a DRF view log, and should have a GUID.
INFO ... [99d44111e9174c5a9494275aa7f28858] project.views This is a DRF view log, and should have a GUID.
WARNING ... [0d1c3919e46e4cd2b2f4ac9a187a8ea1] project.services.file Some warning in a function
WARNING ... [99d44111e9174c5a9494275aa7f28858] project.services.file Some warning in a function
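No changes are needed in application code to get this output - any logger handled by the filtered handler picks up the ID automatically. A minimal sketch, where the logger name and route are illustrative only:
import logging

from fastapi import FastAPI

from asgi_correlation_id import CorrelationIdMiddleware

logger = logging.getLogger('my_project.views')

app = FastAPI()
app.add_middleware(CorrelationIdMiddleware)

@app.get('/example')
async def example_view():
    # The correlation ID is not passed explicitly - the log filter injects it
    logger.info('This is a view log, and should have a GUID.')
    return {'status': 'ok'}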
Setting up Celery support
Features
What we call "Celery support" is really two distinct features. If you don't care about the details and just want IDs in your logs, feel free to skip ahead to the implementation section.
Feature 1: Passing a correlation ID to a Celery worker
If your app spawns background tasks from HTTP requests, it's pretty useful to be able to tell which request a given task originated from.
Celery already has a concept of a correlation_id, but unfortunately it's not something users can hook into. Instead, we use the before_task_publish signal to pass our correlation ID to the receiving worker via the task headers.
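Conceptually, the hook looks something like the sketch below. This is not the package's actual implementation, and the import path for the correlation_id context variable is an assumption:
from celery.signals import before_task_publish

from asgi_correlation_id.context import correlation_id  # assumed import path

@before_task_publish.connect
def transfer_correlation_id(headers=None, **kwargs):
    # Copy the current request's correlation ID into the outgoing task headers
    headers['CORRELATION_ID'] = correlation_id.get()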
Feature 2: Keeping track of what spawned a Celery worker
In the case of an HTTP request spawning a background task, we have full information about the sequence of events.
But what happens if that background task spawns more background tasks, or retries and rejections are added to the mix? As soon as more than one task is spawned, the correlation ID is reduced to an "origin ID": the ID of the HTTP request that spawned the first worker.
In the same way correlation IDs are nice because they connect logs to a single HTTP request, we would like something that gives us the sequence of events when things get more complicated. For this purpose the package supplies two extra log filters:
- The worker current_id, a generated UUID unique to each new worker process.
- The worker parent_id, which is either the correlation ID (when a background task was spawned from an HTTP request) or the current_id of the worker process that spawned the current worker process.
So to summarize:
- correlation_id: the ID of the originating HTTP request
- current_id: the ID of the current worker process
- parent_id: the ID of the preceding HTTP request or worker process (None if there was none, as in the case of scheduled tasks)
Adding Celery event hooks
Setting up the event hooks is simple: just import configure_celery and run it during startup.
from fastapi import FastAPI
from asgi_correlation_id import configure_celery
app = FastAPI()
app.add_event_handler('startup', configure_celery)
You can look over the event hooks here.
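Note that Celery workers usually run in separate processes that never execute the FastAPI startup event, so depending on your setup you may also want to run the same call wherever your Celery app is defined. A sketch, reusing the import from above:
from celery import Celery

from asgi_correlation_id import configure_celery

celery_app = Celery('my_project')

# Register the signal hooks in the worker process as well
configure_celery()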
Celery event hook settings
The setup function has a few settings. These are the defaults:
def configure_celery(
uuid_length=32,
log_parent=True,
parent_header='CELERY_PARENT_ID',
correlation_id_header='CORRELATION_ID',
) -> None:
Each individual setting is described below:
- `uuid_length`: Lets you optionally trim the length of IDs. Probably not needed in most cases, but for, e.g., local development, having 32-character UUIDs in every line of console output can be excessive. Defaults to `32`.
- `log_parent`: If `False`, only the correlation ID will be passed to worker processes spawned by HTTP requests - nothing else. Defaults to `True`.
- `correlation_id_header`: Same as `parent_header`, but for the correlation ID itself: the task-header key the correlation ID is stored under. Defaults to `CORRELATION_ID`.
- `parent_header`: The task-header key to store the `parent_id` value in. There's no need to change this unless you have other Celery signal functions interacting with task headers. Defaults to `CELERY_PARENT_ID`.
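If you need non-default settings, a sketch using functools.partial so the configured call can still be registered as a startup handler:
from functools import partial

app.add_event_handler('startup', partial(configure_celery, uuid_length=8, log_parent=False))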
Configuring Celery logging
If this is your logging config:
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'filters': {
'correlation_id': {'()': CorrelationId},
},
'formatters': {
'web': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
},
},
'handlers': {
'web': {
'class': 'logging.StreamHandler',
'filters': ['correlation_id'],
'formatter': 'web',
},
},
'loggers': {
'my_project': {
'handlers': ['web'],
'level': 'DEBUG',
'propagate': True,
},
},
}
You simply need to add these lines to log the current_id and parent_id:
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'filters': {
'correlation_id': {'()': CorrelationId},
+ 'celery_tracing': {'()': CeleryTracingIds},
},
'formatters': {
'web': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
},
+ 'celery': {
+ 'class': 'logging.Formatter',
+ 'datefmt': '%H:%M:%S',
+ 'format': '%(levelname)s ... [%(correlation_id)s] [%(celery_parent_id)s-%(celery_current_id)s] %(name)s %(message)s',
+ },
},
'handlers': {
'web': {
'class': 'logging.StreamHandler',
'filters': ['correlation_id'],
'formatter': 'web',
},
+ 'celery': {
+ 'class': 'logging.StreamHandler',
+ 'filters': ['correlation_id', 'celery_tracing'],
+ 'formatter': 'celery',
+ },
},
'loggers': {
'my_project': {
+ 'handlers': ['celery' if any('celery' in i for i in sys.argv) else 'web'],
'level': 'DEBUG',
'propagate': True,
},
},
}
Though at this point it might make more sense to use a JSON formatter, since the logs are getting pretty cluttered. Using the UUID length settings for local development can also be useful. For reference, here's a more complete config (the json formatter comes from the python-json-logger package):
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'filters': {
'correlation_id': {'()': CorrelationId},
'celery_tracing': {'()': CeleryTracingIds},
},
'formatters': {
'dev': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
'format': '%(levelname)s:\t\b%(asctime)s %(name)s:%(lineno)d [%(correlation_id)s] %(message)s',
},
'dev-celery': {
'class': 'logging.Formatter',
'datefmt': '%H:%M:%S',
'format': (
'%(levelname)s:\t\b%(asctime)s %(name)s:%(lineno)d [%(correlation_id)s]'
' [%(celery_parent_id)s-%(celery_current_id)s] %(message)s'
),
},
'json': {
'()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
'format': """
asctime: %(asctime)s
created: %(created)f
filename: %(filename)s
funcName: %(funcName)s
levelname: %(levelname)s
level: %(levelname)s
levelno: %(levelno)s
lineno: %(lineno)d
message: %(message)s
module: %(module)s
msec: %(msecs)d
name: %(name)s
pathname: %(pathname)s
process: %(process)d
processName: %(processName)s
relativeCreated: %(relativeCreated)d
thread: %(thread)d
threadName: %(threadName)s
exc_info: %(exc_info)s
correlation-id: %(correlation_id)s
celery-current-id: %(celery_current_id)s
celery-parent-id: %(celery_parent_id)s
""",
'datefmt': '%Y-%m-%d %H:%M:%S',
},
},
'handlers': {
'dev': {
'class': 'logging.StreamHandler',
'stream': 'ext://sys.stdout',
'filters': ['correlation_id'],
'formatter': 'dev',
},
'dev-celery': {
'class': 'logging.StreamHandler',
'stream': 'ext://sys.stdout',
'filters': ['correlation_id', 'celery_tracing'],
'formatter': 'dev-celery',
},
'json': {
'class': 'logging.StreamHandler',
'stream': 'ext://sys.stdout',
'filters': ['correlation_id'],
'formatter': 'json',
},
},
'loggers': {
'my_project': {
'handlers': [
'json' if settings.ENVIRONMENT != 'dev'
else 'dev-celery' if any('celery' in i for i in sys.argv)
else 'dev'
],
'level': 'DEBUG',
'propagate': True,
},
},
}
Sentry support
If your project has sentry-sdk installed, correlation IDs are automatically added to Sentry events as a transaction_id. This is also the case for Celery tasks, if Celery support is enabled.
See this blogpost for more details.