Overview

Apache Liminal

Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build, train and deploy machine learning models in a robust and agile way.

The platform provides the abstractions and declarative capabilities for data extraction & feature engineering followed by model training and serving. Liminal's goal is to operationalize the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production, freeing them from engineering and non-functional tasks, and allowing them to focus on machine learning code and artifacts.

Basics

Using simple YAML configuration, create your own scheduled data pipelines (a sequence of tasks to perform), application servers, and more.

Getting Started

A simple getting started guide for Liminal can be found here

Apache Liminal Documentation

Full documentation of Apache Liminal can be found here

High Level Architecture

High level architecture documentation can be found here

Example YAML config file

---
name: MyLiminalStack
owner: Bosco Albert Baracus
volumes:
  - volume: myvol1
    local:
      path: /Users/me/myvol1
pipelines:
  - pipeline: my_pipeline
    start_date: 1970-01-01
    timeout_minutes: 45
    schedule: 0 * 1 * *
    metrics:
      namespace: TestNamespace
      backends: [ 'cloudwatch' ]
    tasks:
      - task: my_python_task
        type: python
        description: static input task
        image: my_python_task_img
        source: write_inputs
        env_vars:
          NUM_FILES: 10
          NUM_SPLITS: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
      - task: my_parallelized_python_task
        type: python
        description: parallelized python task
        image: my_parallelized_python_task_img
        source: write_outputs
        env_vars:
          FOO: BAR
        executors: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_outputs.py
services:
  - service:
    name: my_python_server
    type: python_server
    description: my python server
    image: my_server_image
    source: myserver
    endpoints:
      - endpoint: /myendpoint1
        module: my_server
        function: myendpoint1func
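The services section above maps each endpoint to a module and function. As an illustration only (the exact signature Liminal passes to endpoint functions may differ), the my_server module could look like this:

```python
# Hypothetical sketch of my_server.py, the `source` module for the
# python_server service above. We assume the endpoint function receives
# the request payload and returns a JSON-serializable response; the
# function name matches the `function` field in the YAML.
import json


def myendpoint1func(payload):
    """Echo the payload back with a status field."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    return json.dumps({"status": "ok", "echo": data})
```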

Installation

  1. Install this repository (HEAD)
   pip install git+https://github.com/apache/incubator-liminal.git
  1. Optional: set LIMINAL_HOME to a path of your choice (if not set, it will default to ~/liminal_home)
echo 'export LIMINAL_HOME=' >> ~/.bash_profile && source ~/.bash_profile

Authoring pipelines

This involves, at minimum, creating a single file called liminal.yml, as in the example above.

If your pipeline requires custom Python code to implement tasks, it should be organized like this
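For instance, the my_python_task in the example above names write_inputs as its source and runs write_inputs.py. A sketch of what such a module might contain, reading the env_vars and writing into the mounted volume (the names come from the example YAML; the splitting logic itself is purely illustrative):

```python
# Hypothetical sketch of write_inputs.py for the my_python_task example.
# NUM_FILES and NUM_SPLITS come from the env_vars in the YAML; the default
# output directory matches the mount path /mnt/vol1 declared there.
import os


def write_inputs(output_dir="/mnt/vol1"):
    """Distribute NUM_FILES input ids round-robin across NUM_SPLITS files."""
    num_files = int(os.environ.get("NUM_FILES", "10"))
    num_splits = int(os.environ.get("NUM_SPLITS", "3"))
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for split in range(num_splits):
        path = os.path.join(output_dir, f"inputs_{split}.csv")
        with open(path, "w") as f:
            ids = [str(i) for i in range(num_files) if i % num_splits == split]
            f.write("\n".join(ids))
        written.append(path)
    return written
```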

If your pipeline imports external packages that are not already part of the Liminal framework (i.e. you had to pip install them yourself), you must also provide a requirements.txt file in the root of your project.
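For example, a minimal requirements.txt (package names and version pins here are purely illustrative):

```
numpy==1.24.4
pandas==2.0.3
```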

Testing the pipeline locally

When your pipeline code is ready, you can test it by running it locally on your machine.

  1. Ensure the Docker engine is running locally and that a local Kubernetes cluster is enabled: Kubernetes configured

Allocate the cluster at least 3 CPUs (under "Resources" in the Docker preferences UI).

If you want to execute your pipeline on a remote Kubernetes cluster, make sure kubectl points at that cluster:

kubectl config use-context <your remote kubernetes cluster>
  1. Build the docker images used by your pipeline.

In the example pipeline above, you can see that tasks and services have an "image" field, such as "my_python_task_img". This means the task is executed inside a Docker container, and the container is created from a Docker image in which the relevant code and libraries are installed.

You can take a look at what the build process looks like, e.g. here

In order for the images to be available for your pipeline, you'll need to build them locally:

cd </path/to/your/liminal/code>
liminal build

You'll see a series of outputs indicating the various Docker images being built.

  1. Create a Kubernetes local volume
    If your YAML defines volumes, first run the following command:
cd </path/to/your/liminal/code>
liminal create
  1. Deploy the pipeline:
cd </path/to/your/liminal/code> 
liminal deploy

Note: after upgrading liminal, it's recommended to issue the command

liminal deploy --clean

This will rebuild the Airflow Docker containers from scratch with a fresh version of Liminal, ensuring consistency.

  1. Start the server
liminal start
  1. Stop the server
liminal stop
  1. Display the server logs
liminal logs --follow/--tail

Number of lines to show from the end of the log:
liminal logs --tail=10

Follow log output:
liminal logs --follow
  1. Navigate to http://localhost:8080/admin

  2. You should see your pipeline. The pipeline is scheduled to run according to the cron expression in the schedule: 0 * 1 * * field of the .yml file you provided (minute 0 of every hour, on the first day of each month).

  3. To manually activate your pipeline: click your pipeline, then click "Trigger DAG". Click "Graph view", and you should see the steps in your pipeline being executed in "real time" by clicking "Refresh" periodically.

Pipeline activation

Contributing

More information on contributing can be found here

Running Tests (for contributors)

When doing local development and running Liminal unit-tests, make sure to set LIMINAL_STAND_ALONE_MODE=True


    ... (truncated)

    Commits
    • ee9049c fixup! Add changelog for 2.2.4rc1
    • 01b909b Pin Markupsafe until we are able to upgrade Flask/Jinja (#21664)
    • eb87aeb Add changelog for 2.2.4rc1
    • 969a275 Clarify pendulum use in timezone cases (#21646)
    • 56d82fc added explaining concept of logical date in DAG run docs (#21433)
    • 8cbf934 Adding missing login provider related methods from Flask-Appbuilder (#21294)
    • 7e80127 Add note about Variable precedence with env vars (#21568)
    • 1cbad37 Reorder migrations to include bugfix in 2.2.4 (#21598)
    • 436f452 Fix slow DAG deletion due to missing dag_id index for job table (#20282)
    • dd0a3a3 update tutorial_etl_dag notes (#21503)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • [LIMINAL-56] - Add default executor (k8s) for spark tasks

    [LIMINAL-56] - Add default executor (k8s) for spark tasks

    This PR adds support for running Spark tasks with the k8s executor by default. The spark task type is supported by two kinds of executors: emr and k8s.
    To support different executors for a single task type, I needed to change how we register tasks to the DAG. The old flow in register_dags:

    for each task:
        call task.apply_task_to_dag(..., executor)
            call executor.apply_task_to_dag(task, ...)
    

    The new flow:

    for each task:
        call executor.apply_task_to_dag(..., task)
            call task.apply_task_to_dag(...) + executor tasks
    

    In addition, I have added a default executor for tasks like job_end, job_start, etc...

    Added unit tests for the new logic
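
    The inverted flow above can be sketched as follows. All names here (Task, KubernetesExecutor, register_dags) are illustrative stand-ins, not Liminal's actual classes; the point is only that the executor now drives registration and wraps the task, rather than the task calling into the executor:

    ```python
    # Hypothetical sketch of the new registration flow described above.

    class Task:
        def __init__(self, name):
            self.name = name

        def apply_task_to_dag(self, dag):
            # The task only knows how to add its own operator to the DAG.
            dag.append(self.name)

    class KubernetesExecutor:
        def apply_task_to_dag(self, dag, task):
            # The executor wraps the task, so it can add its own
            # setup/teardown operators around the task's operator.
            dag.append(f"{task.name}:k8s-setup")
            task.apply_task_to_dag(dag)

    def register_dags(tasks, executor):
        dag = []
        for task in tasks:
            # New flow: the executor is called first, with the task as an argument.
            executor.apply_task_to_dag(dag, task)
        return dag
    ```

    Because the executor sits on the outside, swapping emr for k8s for the same task requires no change to the task itself.
    
    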

    opened by zionrubin 1
  • [LIMINAL-74] retrieve env var for k8s operator

    [LIMINAL-74] retrieve env var for k8s operator

    Environment variables for the kubernetes section of airflow.cfg can be defined in the environment's scope: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#kubernetes

    Airflow Variables in MWAA can only be defined manually through the Airflow UI. Therefore, based on the documentation, we can use an environment variable for this case.

    The change in the _volume function is a missing piece from an earlier commit.
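
    The lookup described here can be sketched as below. The function name is hypothetical (not Liminal's API); the `AIRFLOW__{SECTION}__{KEY}` naming follows Airflow's documented convention for setting config options via environment variables:

    ```python
    import os

    # Illustrative sketch: resolve a kubernetes-section setting from an
    # Airflow-style environment variable instead of airflow.cfg, which is
    # the only practical option on managed deployments such as MWAA.
    def kubernetes_env_setting(key, default=None):
        return os.environ.get(f"AIRFLOW__KUBERNETES__{key.upper()}", default)
    ```
    
    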

    opened by naturalett 1
  • Bump wheel from 0.36.2 to 0.38.1

    Bump wheel from 0.36.2 to 0.38.1

    Bumps wheel from 0.36.2 to 0.38.1.

    Changelog

    Sourced from wheel's changelog.

    Release Notes

    UNRELEASED

    • Updated vendored packaging to 22.0

    0.38.4 (2022-11-09)

    • Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

    0.38.3 (2022-11-08)

    • Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

    0.38.2 (2022-11-05)

    • Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags

    0.38.1 (2022-11-04)

    • Removed install dependency on setuptools
    • The future-proof fix in 0.36.0 for converting PyPy's SOABI into an abi tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

    0.38.0 (2022-10-21)

    • Dropped support for Python < 3.7
    • Updated vendored packaging to 21.3
    • Replaced all uses of distutils with setuptools
    • The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.
    • Fixed potential DoS attack via the WHEEL_INFO_RE regular expression
    • Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.

    0.37.1 (2021-12-22)

    • Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)
    • Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

    0.37.0 (2021-08-09)

    • Added official Python 3.10 support
    • Updated vendored packaging library to v20.9

    ... (truncated)

    Commits
    • 6f1608d Created a new release
    • cf8f5ef Moved news item from PR #484 to its proper place
    • 9ec2016 Removed install dependency on setuptools (#483)
    • 747e1f6 Fixed PyPy SOABI parsing (#484)
    • 7627548 [pre-commit.ci] pre-commit autoupdate (#480)
    • 7b9e8e1 Test on Python 3.11 final
    • a04dfef Updated the pypi-publish action
    • 94bb62c Fixed docs not building due to code style changes
    • d635664 Updated the codecov action to the latest version
    • fcb94cd Updated version to match the release
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • fix k8s secret doc

    fix k8s secret doc


    opened by naturalett 0
  • Bump apache-airflow from 2.1.2 to 2.4.3

    Bump apache-airflow from 2.1.2 to 2.4.3

    Bumps apache-airflow from 2.1.2 to 2.4.3.

    Release notes

    Sourced from apache-airflow's releases.

    Apache Airflow 2.4.3

    Bug Fixes

    • Fix double logging with some task logging handler (#27591)
    • Replace FAB url filtering function with Airflow's (#27576)
    • Fix mini scheduler expansion of mapped task (#27506)
    • SLAMiss is nullable and not always given back when pulling task instances (#27423)
    • Fix behavior of _ when searching for DAGs (#27448)
    • Fix getting the dag/task ids from BaseExecutor (#27550)
    • Fix SQLAlchemy primary key black-out error on DDRQ (#27538)
    • Fix IntegrityError during webserver startup (#27297)
    • Add case insensitive constraint to username (#27266)
    • Fix python external template keys (#27256)
    • Reduce extraneous task log requests (#27233)
    • Make RotatingFilehandler used in DagProcessor non-caching (#27223)
    • Listener: Set task on SQLAlchemy TaskInstance object (#27167)
    • Fix dags list page auto-refresh & jump search null state (#27141)
    • Set executor.job_id to BackfillJob.id for backfills (#27020)

    Misc/Internal

    • Bump loader-utils from 1.4.0 to 1.4.1 in /airflow/www (#27552)
    • Reduce log level for k8s TCP_KEEPALIVE etc warnings (#26981)

    Doc only changes

    • Use correct executable in docker compose docs (#27529)
    • Fix wording in DAG Runs description (#27470)
    • Document that KubernetesExecutor overwrites container args (#27450)
    • Fix BaseOperator links (#27441)
    • Correct timer units to seconds from milliseconds. (#27360)
    • Add missed import in the Trigger Rules example (#27309)
    • Update SLA wording to reflect it is relative to Dag Run start. (#27111)
    • Add kerberos environment variables to the docs (#27028)

    Apache Airflow 2.4.2

    Bug Fixes

    • Make tracebacks opt-in (#27059)
    • Add missing AUTOINC/SERIAL for FAB tables (#26885)
    • Add separate error handler for 405(Method not allowed) errors (#26880)
    • Don't re-patch pods that are already controlled by current worker (#26778)
    • Handle mapped tasks in task duration chart (#26722)
    • Fix task duration cumulative chart (#26717)
    • Avoid 500 on dag redirect (#27064)
    • Filter dataset dependency data on webserver (#27046)
    • Remove double collection of dags in airflow dags reserialize (#27030)
    • Fix auto refresh for graph view (#26926)
    • Don't overwrite connection extra with invalid json (#27142)
    • Fix next run dataset modal links (#26897)
    • Change dag audit log sort by date from asc to desc (#26895)
    • Bump min version of jinja2 (#26866)
    • Add missing colors to state_color_mapping jinja global (#26822)
    • Fix running debuggers inside airflow tasks test (#26806)

    ... (truncated)

    Changelog

    Sourced from apache-airflow's changelog.

    Airflow 2.4.3 (2022-11-14)

    Significant Changes

    Make RotatingFilehandler used in DagProcessor non-caching (#27223)

    In case you want to decrease cache memory when CONFIG_PROCESSOR_MANAGER_LOGGER=True, and you have your local settings created before, you can update processor_manager_handler to use airflow.utils.log.non_caching_file_handler.NonCachingRotatingFileHandler handler instead of logging.RotatingFileHandler. (#27065)
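
    The override described in this entry amounts to pointing the processor_manager handler at the non-caching class. A minimal sketch of the relevant fragment of a custom logging config (the filename and rotation values are placeholders, and the fragment assumes CONFIG_PROCESSOR_MANAGER_LOGGER=True so this handler is actually in use):

    ```python
    # Fragment of a custom airflow_local_settings.py logging config (sketch).
    LOGGING_CONFIG = {
        "version": 1,
        "handlers": {
            "processor_manager": {
                # Instead of logging.handlers.RotatingFileHandler:
                "class": "airflow.utils.log.non_caching_file_handler.NonCachingRotatingFileHandler",
                "formatter": "airflow",
                "filename": "/path/to/dag_processor_manager.log",  # placeholder
                "maxBytes": 104857600,  # placeholder rotation settings
                "backupCount": 5,
            },
        },
    }
    ```
    
    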

    Bug Fixes

    • Fix double logging with some task logging handler (#27591)
    • Replace FAB url filtering function with Airflow's (#27576)
    • Fix mini scheduler expansion of mapped task (#27506)
    • SLAMiss is nullable and not always given back when pulling task instances (#27423)
    • Fix behavior of _ when searching for DAGs (#27448)
    • Fix getting the dag/task ids from BaseExecutor (#27550)
    • Fix SQLAlchemy primary key black-out error on DDRQ (#27538)
    • Fix IntegrityError during webserver startup (#27297)
    • Add case insensitive constraint to username (#27266)
    • Fix python external template keys (#27256)
    • Reduce extraneous task log requests (#27233)
    • Make RotatingFilehandler used in DagProcessor non-caching (#27223)
    • Listener: Set task on SQLAlchemy TaskInstance object (#27167)
    • Fix dags list page auto-refresh & jump search null state (#27141)
    • Set executor.job_id to BackfillJob.id for backfills (#27020)

    Misc/Internal

    • Bump loader-utils from 1.4.0 to 1.4.1 in /airflow/www (#27552)
    • Reduce log level for k8s TCP_KEEPALIVE etc warnings (#26981)

    Doc only changes

    • Use correct executable in docker compose docs (#27529)
    • Fix wording in DAG Runs description (#27470)
    • Document that KubernetesExecutor overwrites container args (#27450)
    • Fix BaseOperator links (#27441)
    • Correct timer units to seconds from milliseconds. (#27360)
    • Add missed import in the Trigger Rules example (#27309)
    • Update SLA wording to reflect it is relative to Dag Run start. (#27111)
    • Add kerberos environment variables to the docs (#27028)

    Airflow 2.4.2 (2022-10-23)

    Significant Changes

    ... (truncated)

    Commits


    dependencies 
    opened by dependabot[bot] 0
  • Bump pyspark from 3.1.3 to 3.2.2 in /examples/spark-app-demo/k8s

    Bump pyspark from 3.1.3 to 3.2.2 in /examples/spark-app-demo/k8s

    Bumps pyspark from 3.1.3 to 3.2.2.

    Commits
    • 78a5825 Preparing Spark release v3.2.2-rc1
    • ba978b3 [SPARK-39099][BUILD] Add dependencies to Dockerfile for building Spark releases
    • 001d8b0 [SPARK-37554][BUILD] Add PyArrow, pandas and plotly to release Docker image d...
    • 9dd4c07 [SPARK-37730][PYTHON][FOLLOWUP] Split comments to comply pycodestyle check
    • bc54a3f [SPARK-37730][PYTHON] Replace use of MPLPlot._add_legend_handle with MPLPlot....
    • c5983c1 [SPARK-38018][SQL][3.2] Fix ColumnVectorUtils.populate to handle CalendarInte...
    • 32aff86 [SPARK-39447][SQL][3.2] Avoid AssertionError in AdaptiveSparkPlanExec.doExecu...
    • be891ad [SPARK-39551][SQL][3.2] Add AQE invalid plan check
    • 1c0bd4c [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec
    • 3d084fe [SPARK-39677][SQL][DOCS][3.2] Fix args formatting of the regexp and like func...
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
Owner
The Apache Software Foundation