Create postgres tables from CSV files
This first section only relates to creating tables from CSV files using a Postgres container alone. It is just one of my experiments. If you are interested, you can follow these steps (only if your working environment supports bash):
sh scripts/prep.sh
The prep.sh script handles everything for you by doing the following:
# Start postgres db container
docker-compose -f postgres.yaml up -d
# Sleep to make sure the container is fully up and running
sleep 3
# I had problems mounting the csv files via docker compose, so here we go
# Copy csv and setup.sql into the container to create the required tables
docker cp ./csv/ my_postgres:/csv/
docker cp ./scripts/setup.sql my_postgres:/setup.sql
# Execute the script in postgres db
docker exec -it my_postgres psql -p 5432 --dbname=postgres --username=postgres --file=/setup.sql
# Shutdown the container
docker-compose -f postgres.yaml down --remove-orphans
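The actual scripts/setup.sql is not reproduced above. A minimal sketch of what such a script might contain, written out with a heredoc (the users table and its columns are purely illustrative, not the repo's real schema):

```shell
# Hypothetical example of scripts/setup.sql: create a table, then bulk-load a
# CSV that was copied into the container (table and columns are assumptions)
cat > setup.sql <<'SQL'
CREATE TABLE IF NOT EXISTS users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- \copy runs client-side in psql and reads the CSV from inside the container
\copy users FROM 'csv/users.csv' WITH (FORMAT csv, HEADER true);
SQL
```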
I had a problem with volume mounts: I couldn't mount the files under the csv and scripts folders. This could still be improved with a proper mount, but let's skip it for now to save time.
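For reference, one way to get that proper mount would be to bind-mount the folders in postgres.yaml. The official postgres image also runs any .sql file placed in /docker-entrypoint-initdb.d on first startup, which would make the docker cp and docker exec steps unnecessary. A sketch (service name, image tag, and password below are assumptions, not values from this repo):

```yaml
services:
  db:
    image: postgres:15
    container_name: my_postgres
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./csv:/csv:ro
      - ./scripts/setup.sql:/docker-entrypoint-initdb.d/setup.sql:ro
```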
Initial Setup/Start Airflow container
This section uses a separate docker-compose.yaml from the test above. It is related in that we want Airflow to schedule the tasks above (create tables and load data). To do so, first prepare the folders. You can create a new folder specifically for this if you want.
# (optional) mkdir airflow && cd airflow
mkdir ./dags ./logs ./plugins
Next, we need the Airflow docker-compose.yaml in our airflow directory:
curl -O https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml
Next, make sure we have the proper permissions to initialize Airflow:
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
Then initialize the Airflow instance:
docker-compose up airflow-init
Wait until initialization finishes, then run (you can use -d to detach if you want):
docker-compose up
Now you will be able to reach the Airflow web UI at http://localhost:8080/ (the default account created by this compose file is airflow, with password airflow).
Create Airflow DAG task
First, you need to set up a connection for the Postgres database. Go to Admin > Connections > +, then fill in the details of the connection:
Connection Id: postgres_default
Connection Type: Postgres
Host:
Schema: postgres (default)
Login:
Password:
Port:
Click the "Test" button to check your connection, then save. Now click the Airflow icon to return to the home page. You should see a DAG named "create_postgres_tables". Try running it by clicking the start button and selecting "Trigger DAG".
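The DAG itself lives in the dags folder picked up by the containers. The repo's actual DAG file isn't shown here, but a minimal sketch of what a create_postgres_tables DAG could look like follows. The operator choice, schedule, and table DDL are all assumptions; it relies on the apache-airflow-providers-postgres package, which the official image ships with:

```shell
# Write a hypothetical dags/create_postgres_tables.py (the real DAG is not
# shown in this README; the table DDL and schedule below are placeholders)
mkdir -p dags
cat > dags/create_postgres_tables.py <<'PY'
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="create_postgres_tables",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # run only when triggered from the UI
    catchup=False,
) as dag:
    create_tables = PostgresOperator(
        task_id="create_tables",
        postgres_conn_id="postgres_default",  # the connection set up above
        sql="CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT);",
    )
PY
```

Once the file is in dags/, the scheduler picks it up automatically within a minute or so and it appears on the home page.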