# AWS Data Engineering Pipeline
This repository contains the Duke University Cloud Computing course project on a Serverless Data Engineering Pipeline. For this project, I recreated the pipeline below in AWS Cloud9 (reference: https://github.com/noahgift/awslambda):
Below are the steps to build this pipeline in AWS:
- Create a `fang` table in DynamoDB and an SQS queue, using `name` as the unique id for the items in the `fang` table. You can check how to do it here.
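If you prefer the terminal over the console, the table and queue can also be provisioned from Cloud9 with the AWS CLI. A minimal sketch, assuming an on-demand table and a queue named `producer` (the queue name is my assumption; use whatever name your app expects):

```shell
# Create the fang table with "name" as the partition (HASH) key
aws dynamodb create-table \
    --table-name fang \
    --attribute-definitions AttributeName=name,AttributeType=S \
    --key-schema AttributeName=name,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

# Create the SQS queue the producer will write to
aws sqs create-queue --queue-name producer
```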
- In AWS Cloud9, initialize a serverless application with a SAM template:

        sam init

  Inputs: `1`, `2`, `4`, `"producer"`
- Create a virtual environment and source it:

        # I called my virtual environment "comprehendProducer"
        python3 -m venv ~/.comprehendProducer
        source ~/.comprehendProducer/bin/activate
- Add the code for your application to `app.py`.
- Add the packages used in your app to the `requirements.txt` file.
- Install the requirements:

        cd hello_world/
        pip install -r requirements.txt
        cd ..
- Create a repository (`producer`) in Elastic Container Registry (ECR) and copy its URI.
- Build and deploy your serverless application:

        sam build
        sam deploy --guided

  When prompted for an image repository URI, paste the URI of the `producer` repository that you've just created.
- Create an IAM role granting AdministratorAccess to the producer Lambda function.

  🤔 Not sure how to create an IAM role? Check out this video (17 min).
- Add the execution role that you created to the producer Lambda function. In case you forgot how to do it: in the AWS console, go to Lambda ➡️ click on the producer function ➡️ Configuration ➡️ Permissions ➡️ Edit ➡️ select the role under Existing role.
- You are all set with the producer function! Now deactivate the virtual environment:

        deactivate
        cd ..
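For reference, the producer's job in this pipeline is to scan the `fang` table and push each company name onto the SQS queue. A minimal sketch of such an `app.py`, assuming the table name `fang` and queue name `producer` (the helper and structure are my own illustration, not the exact original code):

```python
import json


def build_messages(rows):
    """Turn DynamoDB rows into SQS message bodies (pure, testable locally)."""
    return [json.dumps({"name": row["name"]}) for row in rows]


def lambda_handler(event, context):
    # boto3 is imported lazily so the pure helper above runs without AWS
    import boto3

    dynamodb = boto3.resource("dynamodb")
    sqs = boto3.client("sqs")
    rows = dynamodb.Table("fang").scan()["Items"]  # e.g. [{"name": "amazon"}, ...]
    queue_url = sqs.get_queue_url(QueueName="producer")["QueueUrl"]
    for body in build_messages(rows):
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    return {"statusCode": 200, "body": f"sent {len(rows)} messages"}
```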
- Repeat the steps above for the consumer Lambda function. In its `app.py`, make sure to replace `bucket="fangsentiment"` with the name of your S3 bucket.
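The consumer side can be sketched the same way: it reads company names from the SQS trigger, runs Amazon Comprehend sentiment analysis, and writes results to the S3 bucket. A hedged sketch assuming the bucket `fangsentiment` from the step above; what text you actually analyze per company, and the CSV layout and key naming here, are my own illustration:

```python
import json


def to_csv_row(name, sentiment):
    """Format one sentiment result as a CSV line (pure, testable locally)."""
    return f"{name},{sentiment}\n"


def lambda_handler(event, context):
    import boto3

    comprehend = boto3.client("comprehend")
    s3 = boto3.resource("s3")
    bucket = "fangsentiment"  # replace with the name of your S3 bucket
    for record in event["Records"]:  # the SQS trigger delivers messages here
        name = json.loads(record["body"])["name"]
        result = comprehend.detect_sentiment(Text=name, LanguageCode="en")
        row = to_csv_row(name, result["Sentiment"])
        s3.Object(bucket, f"{name}.csv").put(Body=row.encode("utf-8"))
    return {"statusCode": 200}
```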
- Add triggers to your Lambda functions:
  - Producer Lambda function: CloudWatch Event (30 min)
  - Consumer Lambda function: SQS (42 min)
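Instead of clicking through the console, the same triggers can also be declared under each function in `template.yaml`. A minimal sketch; the event names, the schedule rate, and the queue ARN below are placeholders, not values from the original project:

```yaml
# In the producer function's definition:
      Events:
        ScheduledRun:
          Type: Schedule            # CloudWatch Event that fires the producer
          Properties:
            Schedule: rate(5 minutes)

# In the consumer function's definition:
      Events:
        QueueTrigger:
          Type: SQS                 # incoming SQS messages invoke the consumer
          Properties:
            Queue: arn:aws:sqs:us-east-1:123456789012:producer  # placeholder ARN
```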
- Rebuild and redeploy the application:

        sam build && sam deploy
- To free up space in Cloud9, you can remove unused Docker images:

        # list images
        docker image ls
        # remove an image
        docker image rm <imageId>