Data Engineering ZoomCamp
I'm partaking in a Data Engineering Bootcamp / Zoomcamp and will be tracking my progress here. I can't promise these notes will be neat and tidy, but I hope they can help anyone who is working through this bootcamp.
I'll aim to document any problems or errors I come across during my journey, and describe concepts that I found tricky.
Each week I'll work through a series of videos and follow this up with homework exercises.
The Task
The goal is to develop a data pipeline following the architecture below. We will be looking at New York City Taxi data!
Tools
We'll use a range of tools:
- Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
- Google Cloud Storage (GCS): Data Lake
- BigQuery: Data Warehouse
- Terraform: Infrastructure-as-Code (IaC)
- Docker: Containerization
- SQL: Data Analysis & Exploration
- Airflow: Pipeline Orchestration
- DBT: Data Transformation
- Spark: Distributed Processing
- Kafka: Streaming
Progress
- Week 1: PostgreSQL | Terraform | Docker | Google Cloud Platform
This week was a lot of setup, and a lot of work! Here I was introduced to Docker - a framework for managing containers. I created some containers for PostgreSQL and PgAdmin, before finally creating my own image, which when run, created and populated tables within my PostgreSQL database.
Next up I learned a bit about Google Cloud Platform (GCP), which is a suite of Google cloud computing resources. Here I set up a service account (more or less a user account for services running in GCP), and even set up a Virtual Machine and connected to it using SSH right from my terminal.
I was also introduced to Terraform - an infrastructure-as-code tool. I used it to provision BigQuery and Google Cloud Storage resources on GCP from a simple script.
I enjoyed this week, although it was heavy going. There were a lot of late nights trying to understand new concepts and fix unexpected bugs. Although I'm by no means an expert in any of these tools, I do feel more confident in understanding and utilising them.
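To give a feel for the "create and populate tables" step from this week, here is a minimal sketch of a chunked CSV load. It is not the script I actually ran: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a database server, and the `trips` table, its columns, the sample rows and the `ingest` function are all made up for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice

# Hypothetical sample standing in for a taxi trips CSV file.
SAMPLE_CSV = """pickup_datetime,passenger_count,trip_distance
2021-01-01 00:30:10,1,2.10
2021-01-01 00:51:20,2,0.20
2021-01-01 00:43:30,1,14.70
"""


def ingest(conn, csv_file, table="trips", chunk_size=2):
    """Create the table if needed, insert the CSV rows in chunks, return the row count."""
    reader = csv.DictReader(csv_file)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(pickup_datetime TEXT, passenger_count INTEGER, trip_distance REAL)"
    )
    total = 0
    while True:
        # Read at most chunk_size rows so a large file never sits fully in memory.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(
            f"INSERT INTO {table} VALUES "
            "(:pickup_datetime, :passenger_count, :trip_distance)",
            chunk,
        )
        conn.commit()
        total += len(chunk)
    return total


# In-memory SQLite database (assumption: stands in for the PostgreSQL container).
conn = sqlite3.connect(":memory:")
loaded = ingest(conn, io.StringIO(SAMPLE_CSV))
row_count = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
print(loaded, row_count)
```

Against a real PostgreSQL container the overall shape would be similar, but the connection and the placeholder syntax depend on the driver you pick, so treat this purely as an outline of the idea.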
- Week 2: This week I'm learning about Airflow!
- Week 3: Pending...
- Week 4: Pending...
- Week 5: Pending...
- Week 6: Pending...