Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.

Overview

Airflow on Docker in EC2 + GitLab's CI/CD

Personal project for simple data pipeline using Airflow. Airflow will be installed inside Docker container, which will be deployed in Amazon's EC2. For continuous integration and continuous deployment (CI/CD), GitLab will be used.

Steps

  1. Set up EC2 server. Choose ubuntu. Download the ssh key (with .pem suffix). Open ports of the EC2 instance, especially port 8080.
  2. SSH inside your EC2 server. Install Docker and docker-compose.
  3. Create GitLab account. Create new repository on GitLab, and push this repository there.
  4. On your GitLab's project page, open Settings > CI/CD > Repository Variables. Configure several variables:
    • _AIRFLOW_WWW_USER_PASSWORD -> Arbitrary password for Airflow (Variable)
    • _AIRFLOW_WWW_USER_USERNAME -> Arbitrary username for Airflow (Variable)
    • EC2_ADDRESS -> IP address of your EC2 host (Variable)
    • GITLAB_PASSWORD -> GitLab password (Variable)
    • GITLAB_USERNAME -> GitLab username (Variable)
    • SSH_KEY_EC2 -> Your SSH key (with .pem suffix that you downloaded earlier) (File)
  5. Configure GitLab's runner for CI/CD
  6. Open gitlab-ci.yml, change line 25 and 26 with your email (that registered on GitLab) and Name. If your user name in EC2 is not default (ubuntu), change ubuntu in ubuntu@EC2_ADDRESS with the correct username
  7. Run CI/CD pipeline, check if the code is deployed properly.
  8. SSH to the server to inspect if anything goes wrong
  9. Open the Airflow UI in browser on EC2_IP_ADDRESS:8080
You might also like...
Time tracking program that will format output to be easily put into Gitlab

time_tracker Time tracking program that will format output to be easily put into Gitlab. Feel free to branch and use it yourself! Getting Started Clon

Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

Retrying Retrying is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just

Apache Superset out of box version(Windows 64-bit)

superset_app Apache Superset out of box version (Windows 64bit) prepare job download 3 files python-3.8.10-embed-amd64.zip get-pip.py python_geohash‑0

The Python agent for Apache SkyWalking
The Python agent for Apache SkyWalking

SkyWalking Python Agent SkyWalking-Python: The Python Agent for Apache SkyWalking, which provides the native tracing abilities for Python project. Sky

Make dbt docs and Apache Superset talk to one another
Make dbt docs and Apache Superset talk to one another

dbt-superset-lineage Make dbt docs and Apache Superset talk to one another Why do I need something like this? Odds are rather high that you use dbt to

Python-geoarrow - Storing geometry data in Apache Arrow format

geoarrow Storing geometry data in Apache Arrow format Installation $ pip install

Easy installer for running Amazon AVS Device SDK on Raspberry Pi

avs-device-sdk-pi Scripts to enable Alexa voice activation using Picovoice Porcupine If you like the work, find it useful and if you would like to get

Amazon SageMaker Delta Sharing Examples

This repository contains examples and related resources showing you how to preprocess, train, and serve your models using Amazon SageMaker with data fetched from Delta Lake.

Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.
Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Owner
Ammar Chalifah
Comic artist and a biomedical engineering student from Indonesia.
Ammar Chalifah
This is a practice on Airflow, which is building virtual env, installing Airflow and constructing data pipeline (DAGs)

airflow-test This is a practice on Airflow, which is Builing virtualbox env and setting Airflow on that env Installing Airflow using python virtual en

Jaeyoung 1 Nov 1, 2021
Yet another Airflow plugin using CLI command as RESTful api, supports Airflow v2.X.

中文版文档 Airflow Extended API Plugin Airflow Extended API, which export airflow CLI command as REST-ful API to extend the ability of airflow official API

Eric Cao 106 Nov 9, 2022
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

Streamify A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more! Description Objective The project will stre

Ankur Chavda 206 Dec 30, 2022
A tutorial presents several practical examples of how to build DAGs in Apache Airflow

Apache Airflow - Python Brasil 2021 Este tutorial apresenta vários exemplos práticos de como construir DAGs no Apache Airflow. Background Apache Airfl

Jusbrasil 14 Jun 3, 2022
A docker container (Docker Desktop) for a simple python Web app few unit tested

Short web app using Flask, tested with unittest on making massive requests, responses of the website, containerized

Omar 1 Dec 13, 2021
Airflow Operator for running Soda SQL scans

Airflow Operator for running Soda SQL scans

Todd de Quincey 7 Oct 18, 2022
An Airflow operator to call the main function from the dbt-core Python package

airflow-dbt-python An Airflow operator to call the main function from the dbt-core Python package Motivation Airflow running in a managed environment

Tomás Farías Santana 93 Jan 8, 2023
Repositório para estudo do airflow

airflow-101 Repositório para estudo do airflow Docker criado baseado no tutorial Exemplo de API da pokeapi Para executar clone o repo execute as confi

Gabriel (Gabu) Bellon 1 Nov 23, 2021
A reproduction repo for a Scheduling bug in AirFlow 2.2.3

A reproduction repo for a Scheduling bug in AirFlow 2.2.3

Ilya Strelnikov 1 Feb 9, 2022
Gitlab py scripts

Gitlab py scripts The code can be used to gather the list of GitHub groups/projects and the permissions of the users in those groups/projects. group/p

Roghuchi 1 Aug 29, 2022