l3_data_collection

Overview

Data collection, enhancement, and metrics calculation.

Summary

Repository containing code for QuantDAO's JDT data collection task.

Data is collected from the exchanges listed below, normalised, used to construct the limit order book (LOB), and then metrics are calculated.

  1. OKex
  2. Phemex
  3. FTX
  4. Kraken
  5. Kucoin
  6. Huobi
  7. Deribit
  8. Bitfinex

Rayman: We need to choose 6 of these to work on.

Rayman: Richard wants L3 order book granularity... But this isn't possible, as the WebSocket feeds don't provide a proper order_id for each order. It's therefore impossible to match trade data exactly to order book update data: we wouldn't know how to interpret the number of cancellations and new orders behind a single change in a price level's quantity. Please clarify with him today (18/01/2022), as I won't be awake till pretty late.
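The L2 limitation described above can be made concrete with a minimal sketch (the book representation and field names here are illustrative assumptions, not this repo's actual schema): an L2 feed only reports the new total quantity at a price level, so the cause of a decrease is unrecoverable.

```python
# Minimal L2 order book side: maps price -> total quantity at that level.
# Shapes and names are hypothetical, not this repo's actual schema.

def apply_l2_update(book, price, new_qty):
    """Apply an L2 delta: the feed only gives the new total at a level."""
    old_qty = book.get(price, 0.0)
    if new_qty == 0.0:
        book.pop(price, None)  # level removed entirely
    else:
        book[price] = new_qty
    # The size of the change is observable, but its cause is not: a
    # decrease of 5 could be one cancel of 5, five fills of 1, or any
    # mix. Only an L3 feed with per-order ids could disambiguate.
    return new_qty - old_qty

bids = {100.0: 10.0}
delta = apply_l2_update(bids, 100.0, 5.0)
print(delta)  # -5.0: cancel? trade? some of each? unknowable from L2 alone
```

This is exactly why matching trades to book updates needs L3 data: per-order ids are the only thing that ties a fill to a specific resting order.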

Dependencies

These packages must be installed manually if they are not already present.

pip install numpy

Usage

TBC once implementation is better defined.

Comments
  • Atomic keys

    Keys now include the message type. This makes partitioning more evenly distributed, so most likely every stream processor pod will have something to do.

    opened by EzePze 0
  • changed raw topic behaviour

    Instead of having multiple "-raw" topics, there is now just one "raw" topic. This has multiple benefits:

    • Kafka does not handle consumer group rebalancing the way we want when there are multiple topics: each member of the consumer group must subscribe to at least one partition in every topic, which is too much for a single faust Agent to handle
    • One topic is easier to manage
    opened by EzePze 0
  • fixing binance futures, ftx, and gemini

    A few bugs with these three exchanges:

    • Binance futures doesn't have a "trade" channel, only an "aggTrade" channel
    • Gemini wasn't sending atom timestamps
    • FTX was printing erroneously
    opened by EzePze 0
  • Dockerisation, keys folder, README.md update.

    • Unified Docker image so that only one image is required for all exchanges.
    • docker-compose.yaml to make starting the collection suite easy.
    • keys folder so the place that all keys, certificates, and environment files go is clear.
    • README.md update.
    opened by ruiwynt 0
  • Refactoring

    Finished refactoring code to MVP v2 standards:

    • Websockets now run in a single threaded async process.
    • Moved websocket dissemination into another repo.
    • Removed redundant code, significantly decreased code duplication.
    opened by ruiwynt 0
  • No data accessible after docker container download it all

    Hello, I am running the Docker container in order to get local data, but I can't find the downloaded data anywhere:

    $ docker-compose up -d
    Creating network "l3-atom-exchange-collectors-main_default" with the default driver
    Pulling apollox (gdafund/l3_collectors:latest)...
    latest: Pulling from gdafund/l3_collectors
    df9b9388f04a: Pull complete
    a1ef3e6b7a02: Pull complete
    7a687728470e: Pull complete
    4ecf30de1710: Pull complete
    c3b27164aa0c: Pull complete
    df827379d534: Pull complete
    f345f1295079: Pull complete
    12b394bb3a6d: Pull complete
    3529e9d7e452: Pull complete
    79a5a92ec118: Pull complete
    ac353ad855b7: Pull complete
    Digest: sha256:f0fda7984e07e61704c37b0c8c76b8b337116593b68e9418119130d110863739
    Status: Downloaded newer image for gdafund/l3_collectors:latest
    Pulling bybit_l1_collector (gdafund/l1_quote_producers:latest)...
    latest: Pulling from gdafund/l1_quote_producers
    df9b9388f04a: Already exists
    a1ef3e6b7a02: Already exists
    7a687728470e: Already exists
    4ecf30de1710: Already exists
    a1f99e431609: Pull complete
    bd02aada3eda: Pull complete
    90f5b73c4918: Pull complete
    faa1a02a9c46: Pull complete
    24589cf7999f: Pull complete
    3cb7c1a91aa0: Pull complete
    12de0253e3e9: Pull complete
    Digest: sha256:29980d5958b4f0f2f9403d8af8f8f8444528b475cb654fcf3c4d4695f28bd19e
    Status: Downloaded newer image for gdafund/l1_quote_producers:latest
    Pulling coinbase_v3 (gdafund/l3_atom:0.1.0)...
    0.1.0: Pulling from gdafund/l3_atom
    42c077c10790: Pull complete
    f63e77b7563a: Pull complete
    0c31162eec9d: Pull complete
    7cfd3784111c: Pull complete
    791791ccdd73: Pull complete
    0bd114bd45f0: Pull complete
    59aa6dd2e246: Pull complete
    49189f672850: Pull complete
    068dbc7c633f: Pull complete
    ea0a474eb164: Pull complete
    aa6769f236db: Pull complete
    16d282f6fccc: Pull complete
    2a2eebc8e198: Pull complete
    ec7e651d2bc2: Pull complete
    Digest: sha256:fe1b15a16bc303e42a488f2de4ba291f895450181f5469c7f6c3cdb368cc0b6f
    Status: Downloaded newer image for gdafund/l3_atom:0.1.0
    Creating l3-atom-exchange-collectors-main_bitfinex_1 ... done
    Creating l3-atom-exchange-collectors-main_ftx_1 ... done
    Creating l3-atom-exchange-collectors-main_bybit_1 ... done
    Creating l3-atom-exchange-collectors-main_kraken_1 ... done
    Creating l3-atom-exchange-collectors-main_okex_1 ... done
    Creating l3-atom-exchange-collectors-main_binance_1 ... done
    Creating l3-atom-exchange-collectors-main_kucoin_1 ... done
    Creating l3-atom-exchange-collectors-main_phemex_1 ... done
    Creating l3-atom-exchange-collectors-main_deribit_1 ... done
    Creating l3-atom-exchange-collectors-main_bybit_l1_collector_1 ... done
    Creating l3-atom-exchange-collectors-main_coinbase_1 ... done
    Creating l3-atom-exchange-collectors-main_dydx_1 ... done
    Creating l3-atom-exchange-collectors-main_stream_processor_1 ... done
    Creating l3-atom-exchange-collectors-main_apollox_1 ... done
    Creating l3-atom-exchange-collectors-main_coinbase_v3_1 ... done
    Creating l3-atom-exchange-collectors-main_huobi_1 ... done
    Creating l3-atom-exchange-collectors-main_kraken-futures_1 ... done

    After a successful run, there's no data anywhere.

    opened by ealcober 0
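The "Atomic keys" change above can be sketched in plain Python. The key format, partition count, and the CRC32 stand-in for Kafka's murmur2-based default partitioner are all illustrative assumptions, not this repo's actual code:

```python
import zlib

NUM_PARTITIONS = 8  # illustrative; not the cluster's real setting

def make_key(exchange: str, symbol: str, msg_type: str) -> bytes:
    # Including msg_type in the key spreads e.g. trades and book
    # updates for the same symbol across different partitions.
    return f"{exchange}:{symbol}:{msg_type}".encode()

def partition_for(key: bytes) -> int:
    # Deterministic stand-in for Kafka's murmur2 default partitioner.
    return zlib.crc32(key) % NUM_PARTITIONS

trades = partition_for(make_key("kraken", "BTC-USD", "trades"))
books = partition_for(make_key("kraken", "BTC-USD", "book"))
print(trades, books)  # the two streams may land on different partitions
```

With only the exchange and symbol in the key, every message type for a given pair would hash to the same partition; adding the type to the key is what lets the work fan out across more stream-processor pods.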
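With the single "raw" topic from "changed raw topic behaviour" above, one consumer dispatches on a type tag carried in the message itself rather than on the topic name. A minimal sketch, with assumed message shapes and handler names (the repo's actual processor is a faust Agent):

```python
# One "raw" topic: each message carries its own type tag, and a single
# processor dispatches on it instead of subscribing to many topics.

def handle_trade(msg):
    return ("trade", msg["price"] * msg["size"])

def handle_book_update(msg):
    return ("book", msg["price"])

HANDLERS = {
    "trades": handle_trade,
    "book": handle_book_update,
}

def process(msg):
    # Unknown types are skipped rather than crashing the consumer.
    handler = HANDLERS.get(msg["type"])
    return handler(msg) if handler else None

print(process({"type": "trades", "price": 100.0, "size": 0.5}))
# ('trade', 50.0)
```

Because there is only one topic, consumer group rebalancing only has to divide that topic's partitions among the pods, which avoids the every-member-in-every-topic problem described above.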
Owner

Ruiwyn