Big data on k8s

Overview
#  microsoft azure
# https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
az account set --subscription []
az aks get-credentials --resource-group k8s-aks-owshq-dev --name aks-owshq-dev

# google gcp
# https://cloud.google.com/sdk/gcloud
gcloud container clusters get-credentials [] --region us-central1

# digital ocean
# https://docs.digitalocean.com/reference/doctl/how-to/install/
doctl kubernetes cluster kubeconfig save []

# linode
# https://www.linode.com/docs/guides/linode-cli/
cluster_id=[]
linode-cli lke clusters-list
linode-cli lke kubeconfig-view $cluster_id

# local kube config
kubectl config view
$HOME/.kube/config
# microsoft azure aks cost
# 7 = virtual machines (ds3v2)
# 4 vcpus & 14 gb of ram
# 1,170.19
# ~ R$ 10.000
You might also like...
NumPy and Pandas interface to Big Data
NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

NumPy and Pandas interface to Big Data
NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

NumPy and Pandas interface to Big Data
NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

 Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

What is Vaex? Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular data

[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

Compact Transformers Preprint Link: Escaping the Big Data Paradigm with Compact Transformers By Ali Hassani[1]*, Steven Walton[1]*, Nikhil Shah[1], Ab

Python utility function to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed

iterable-subprocess Python utility function to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be

Source codes of CenterTrack++ in 2021 ICME Workshop on Big Surveillance Data Processing and Analysis
Source codes of CenterTrack++ in 2021 ICME Workshop on Big Surveillance Data Processing and Analysis

MOT Tracked object bounding box association (CenterTrack++) New association method based on CenterTrack. Two new branches (Tracked Size and IOU) are a

AI Flow is an open source framework that bridges big data and artificial intelligence.
AI Flow is an open source framework that bridges big data and artificial intelligence.

Flink AI Flow Introduction Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine

QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.
QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.

QuakeLabeler Quake Labeler was born from the need for seismologists and developers who are not AI specialists to easily, quickly, and independently bu

Urban Big Data Centre Housing Sensor Project

Housing Sensor Project The Urban Big Data Centre is conducting a study of indoor environmental data in Scottish houses. We are using Raspberry Pi devi

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien

BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems
BigDL - Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems

Evaluate the performance of BigDL (Distributed Deep Learning on Apache Spark) in big data analysis problems.

Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.

Big-Papa Integrates Javascript and python for remote cookie stealing which then can be used for session hijacking
Big-Papa Integrates Javascript and python for remote cookie stealing which then can be used for session hijacking

Big-Papa is a remote cookie stealer which can then be used for session hijacking and Bypassing 2 Factor Authentication

CVE-2021-22986 & F5 BIG-IP RCE
CVE-2021-22986 & F5 BIG-IP RCE

Vuln Impact This vulnerability allows for unauthenticated attackers with network access to the iControl REST interface, through the BIG-IP management

Find big moving stocks before they move using machine learning and anomaly detection
Find big moving stocks before they move using machine learning and anomaly detection

Surpriver - Find High Moving Stocks before they Move Find high moving stocks before they move using anomaly detection and machine learning. Surpriver

Code for the paper
Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

Raid ToolBox (RTB) is a big toolkit of Spamming/Raiding/Token management tools for discord.
Raid ToolBox (RTB) is a big toolkit of Spamming/Raiding/Token management tools for discord.

This code is very out of date and not very good, feel free to make it into something better. (we check the github page every 5 years to pulls your PRs

Owner
Luan Moreno
Data Engineer
Luan Moreno
IP address management (IPAM) and data center infrastructure management (DCIM) tool.

NetBox is an IP address management (IPAM) and data center infrastructure management (DCIM) tool. Initially conceived by the network engineering team a

NetBox Community 11.8k Jan 7, 2023
Ralph is the CMDB / Asset Management system for data center and back office hardware.

Ralph Ralph is full-featured Asset Management, DCIM and CMDB system for data centers and back offices. Features: keep track of assets purchases and th

Allegro Tech 1.9k Jan 1, 2023
A system for managing CI data for Mozilla projects

Treeherder Description Treeherder is a reporting dashboard for Mozilla checkins. It allows users to see the results of automatic builds and their resp

Mozilla 235 Dec 22, 2022
Spark development environment for k8s

Local Spark Dev Env with Docker Development environment for k8s. Using the spark-operator image to ensure it will be the same environment. Start conta

Otacilio Filho 18 Jan 4, 2022
A Pythonic Data Catalog powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to your big data workloads.

DeltaCAT DeltaCAT is a Pythonic Data Catalog powered by Ray. Its data storage model allows you to define and manage fast, scalable, ACID-compliant dat

null 45 Oct 15, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.

Tuplex 791 Jan 4, 2023
A Big Data ETL project in PySpark on the historical NYC Taxi Rides data

Processing NYC Taxi Data using PySpark ETL pipeline Description This is an project to extract, transform, and load large amount of data from NYC Taxi

Unnikrishnan 2 Dec 12, 2021
Utilize data analytics skills to solve real-world business problems using Humana’s big data

Humana-Mays-2021-HealthCare-Analytics-Case-Competition- The goal of the project is to utilize data analytics skills to solve real-world business probl

Yongxian (Caroline) Lun 1 Dec 27, 2021
Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Brian Blaylock 194 Jan 2, 2023
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work

ckan 3.6k Dec 27, 2022