A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 7, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repo contains my project on a real-time stock data pipeline. All the code is written in Python. In this project, I experiment with several data engineering frameworks to build a financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh. I use Kafka to stream real-time stock prices and market news, Cassandra to warehouse historical and real-time stock data, and Bokeh for visualization in web browsers. I also wrote a web crawler to scrape companies' financial statements and basic information from Yahoo Finance, and experimented with various economic data APIs.

Architecture

There are currently 3 tabs on the webpage:

Stock: Streaming & Fundamental
- A single stock's candlestick plot, with basic company and financial information
- Realtime S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
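
The moving averages and correlation shown in the Comparison tab can be illustrated with a small standard-library sketch. This is not the project's own code: the function names `moving_average` and `pearson` and the sample prices are hypothetical, and the platform may well compute these with a data-frame library instead.

```python
from math import sqrt


def moving_average(prices, window):
    """Simple moving average; entries are None until `window` prices exist."""
    averages = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            # Drop the price that just left the window.
            running -= prices[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical adjusted close prices for two stocks.
    stock_a = [10.0, 11.0, 12.0, 11.5, 13.0, 14.0]
    stock_b = [20.0, 21.5, 23.0, 22.0, 25.0, 27.0]
    print(moving_average(stock_a, 5))
    print(pearson(stock_a, stock_b))
```

The same `moving_average` function covers the 5-, 10-, and 30-day cases by changing `window`; `pearson` works on whichever series is passed in, whether raw prices or daily returns.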

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
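
As a rough illustration of this flow, here is a minimal sketch that assumes the `kafka-python` and `cassandra-driver` client libraries. The topic name `stock-ticks`, the keyspace `stocks`, the table `ticks`, and all function names are hypothetical; the repo's actual topics, schema, and message format may differ.

```python
import json
from datetime import datetime

TOPIC = "stock-ticks"  # hypothetical topic name


def encode_tick(symbol, price, ts):
    """Serialize one price tick to UTF-8 JSON bytes for a Kafka message."""
    payload = {"symbol": symbol, "price": price, "ts": ts.isoformat()}
    return json.dumps(payload).encode("utf-8")


def decode_tick(raw):
    """Decode a Kafka message value into (symbol, ts, price) for the insert."""
    data = json.loads(raw.decode("utf-8"))
    return data["symbol"], datetime.fromisoformat(data["ts"]), float(data["price"])


def produce(ticks, servers="localhost:9092"):
    """Publish (symbol, price, ts) tuples to Kafka. Needs a running broker."""
    from kafka import KafkaProducer  # assumed client: kafka-python

    producer = KafkaProducer(bootstrap_servers=servers)
    for symbol, price, ts in ticks:
        producer.send(TOPIC, value=encode_tick(symbol, price, ts))
    producer.flush()  # block until buffered messages are sent


def consume_to_cassandra(servers="localhost:9092", hosts=("127.0.0.1",)):
    """Read ticks from Kafka and write each one to a Cassandra table."""
    from kafka import KafkaConsumer  # assumed client: kafka-python
    from cassandra.cluster import Cluster  # assumed client: cassandra-driver

    session = Cluster(list(hosts)).connect("stocks")  # hypothetical keyspace
    insert = session.prepare(
        "INSERT INTO ticks (symbol, ts, price) VALUES (?, ?, ?)"  # hypothetical table
    )
    consumer = KafkaConsumer(
        TOPIC, bootstrap_servers=servers, auto_offset_reset="earliest"
    )
    for message in consumer:  # loops until interrupted
        session.execute(insert, decode_tick(message.value))
```

The client libraries are imported inside the functions so that the encoding helpers can be tried without a broker or cluster. A prepared statement is used for the insert because the same query runs once per message.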

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:


Comments

How do I run this project?

Can you please share instructions on how to install and run this project? I am using visual studio but I don't know which file to start with and all that good stuff.

opened by hhashim1 0
