airflow-test

Overview

This is a practice project on Airflow: building a virtual environment, installing Airflow, and constructing a data pipeline (a DAG).
First of all, build a virtual machine using VirtualBox

This is optional but highly recommended

  • It is important to keep the development environment isolated so it does not conflict with other environment settings

Next, create a Python virtual environment in a suitable directory

python3 -m venv ~/sandbox ## create the virtual environment in your home directory
source ~/sandbox/bin/activate

Let's install Airflow and walk through the initial setup

Install Airflow

pip3 install apache-airflow==2.1.0 \ ## the Airflow version depends on when you install
  --constraint 'git address' ## the constraints file pins dependency versions so the install is reproducible. It is optional but recommended. Refer to https://gist.github.com/marclamberti
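
For reference, the official constraints file published by the Airflow project follows the pattern below; the Python version in the URL (3.8 here) is an assumption and should match your own interpreter:

pip3 install apache-airflow==2.1.0 \
  --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.8.txt ## pins transitive dependencies to versions tested with Airflow 2.1.0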

Airflow Initial Setup

airflow db init ## initialize the Airflow metadata database
airflow users create -u admin -p admin -f jaeyoung -l jang -r Admin -e [email protected] ## create an admin user

airflow webserver ## launch the web server; afterwards, Airflow is reachable at localhost:8080
airflow scheduler ## start the scheduler (in a separate terminal) so DAGs can run

Construct a data pipeline with Airflow

One example of a data pipeline (a single DAG). Follow the steps below to build it.

  • Task flow: creating_table -> is_api_available -> extracting_user -> processing_user -> storing_user (a sketch of the steps after creating_table appears at the end of this section)
  • What does it do? When a new user comes in from the API, the pipeline senses it, processes the user data, and stores the user info in the destination table.
  1. Create the table
  • Make a Python script user_processing.py in the dags folder
  • Create the users table using the SQLite operator in user_processing.py:
from airflow.models import DAG
from airflow.providers.sqlite.operators.sqlite import SqliteOperator

from datetime import datetime

default_args = {
    'start_date': datetime(2020, 1, 1)
}

with DAG('user_processing', schedule_interval='@daily', default_args=default_args, catchup=False) as dag:
    # Define tasks/operators

    # Create the users table if it does not exist yet, so the task is idempotent
    creating_table = SqliteOperator(
        task_id='creating_table',
        sqlite_conn_id='db_sqlite',
        sql='''
            CREATE TABLE IF NOT EXISTS users (
                firstname TEXT NOT NULL,
                lastname TEXT NOT NULL,
                country TEXT NOT NULL,
                username TEXT NOT NULL,
                password TEXT NOT NULL,
                email TEXT NOT NULL PRIMARY KEY
            );
            '''
    )
  • Set up the SQLite connection in the Airflow Web UI
pip install apache-airflow-providers-sqlite ## install the SQLite provider for Airflow (see the Airflow docs); it may already be present from the initial Airflow install
# Next, create a new SQLite connection named 'db_sqlite' in the Airflow UI (Admin -> Connections)
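
If you prefer the CLI to the Web UI, the same connection can be created with the command below (the host path is a placeholder; point it at your own SQLite database file):

airflow connections add 'db_sqlite' --conn-type 'sqlite' --conn-host '/path/to/your/sqlite.db' ## the host path is a placeholder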
  • After that, test the Airflow task:

airflow tasks test user_processing creating_table 2020-01-01

  • One more thing, just to double-check:
sqlite3 airflow.db ## open the airflow.db SQLite database
sqlite> .tables ## list the tables in airflow.db
sqlite> SELECT * FROM users; ## test query against the new table
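
For reference, here is a rough sketch of how the remaining tasks of the pipeline (is_api_available onward) might be added to user_processing.py. The task names follow the flow above, but the user_api HTTP connection, the randomuser.me-style endpoint, the /tmp/processed_user.csv path, and the final import command are assumptions for illustration, not confirmed by this repo; the HTTP pieces also require pip install apache-airflow-providers-http.

import json

from airflow.providers.http.sensors.http import HttpSensor
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from pandas import json_normalize

def _processing_user(ti):
    # Pull the JSON response pushed to XCom by the extracting_user task
    users = ti.xcom_pull(task_ids=['extracting_user'])
    if not len(users) or 'results' not in users[0]:
        raise ValueError('User is empty')
    user = users[0]['results'][0]
    # Flatten the nested user record into one CSV row matching the users table
    processed_user = json_normalize({
        'firstname': user['name']['first'],
        'lastname': user['name']['last'],
        'country': user['location']['country'],
        'username': user['login']['username'],
        'password': user['login']['password'],
        'email': user['email'],
    })
    processed_user.to_csv('/tmp/processed_user.csv', index=None, header=False)

# The tasks below go inside the `with DAG(...)` block, after creating_table:

    # Wait until the user API responds (user_api is an HTTP connection, assumed to point at https://randomuser.me/)
    is_api_available = HttpSensor(
        task_id='is_api_available',
        http_conn_id='user_api',
        endpoint='api/'
    )

    # Fetch one random user and push the parsed JSON to XCom
    extracting_user = SimpleHttpOperator(
        task_id='extracting_user',
        http_conn_id='user_api',
        endpoint='api/',
        method='GET',
        response_filter=lambda response: json.loads(response.text),
        log_response=True
    )

    processing_user = PythonOperator(
        task_id='processing_user',
        python_callable=_processing_user
    )

    # Import the processed CSV into the users table (the airflow.db path is an assumption)
    storing_user = BashOperator(
        task_id='storing_user',
        bash_command='echo -e ".separator \\",\\"\\n.import /tmp/processed_user.csv users" | sqlite3 ~/airflow/airflow.db'
    )

    creating_table >> is_api_available >> extracting_user >> processing_user >> storing_user

Each task can be tested individually the same way as creating_table, e.g. airflow tasks test user_processing is_api_available 2020-01-01.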