39 Repositories
Python hadoop-filesystem Libraries
Python utility to generate filesystem content for Obsidian.
Security Vault Generator Quickly parse, format, and output common frameworks/content for Obsidian.md. There is a strong focus on MITRE ATT&CK because
Quick and dirty FAT12 filesystem to ZIP file converter
Quick and Dirty FAT12 Filesystem Converter This is a really crappy Python script I wrote to convert a semi-compatible FAT12 filesystem from my HP150's
Oracle Cloud Infrastructure Object Storage fsspec implementation
Oracle Cloud Infrastructure Object Storage fsspec implementation The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-per
PyDeleter - delete a specifically formatted file in a directory or delete all other files
PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down
A webdav demo using a virtual filesystem that serves a random status of whether a cat in a box is dead or alive.
Distributed deep learning on Hadoop and Spark clusters.
Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version
Python HDFS client
Python HDFS client Because the world needs yet another way to talk to HDFS from Python. Usage This library provides a Python client for WebHDFS. NameN
Asynchronous serverless task queue with timed leasing of tasks
Asynchronous serverless task queue with timed leasing of tasks. Threaded implementations for SQS and local filesystem.
🗃️ Fileio-cli wrapper for fileioapi.py with fire.py, inspiration DOS
🗃️ File.io With File.io, you simply upload a file and share the link; after it is downloaded, the file is completely deleted. An API wrapper for the file.io w
Powerful Python library for atomic file writes.
Searches filesystem for CVE-2021-44228 and CVE-2021-45046 vulnerable instances of log4j library, including embedded (jar/war/zip) packaged ones.
log4shell_finder Python port of https://github.com/mergebase/log4j-detector log4j-detector is copyright (c) 2021 - MergeBase Software Inc. https://mer
"zpool iostats" for humans; find the slow parts of your ZFS pool
Getting the gist of zfs statistics vpool-demo.mp4 The ZFS command "zpool iostat" provides a histogram listing of how often it takes to do things in pa
Implementation of a hadoop based movie recommendation system
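Implementation-of-a-hadoop-based-movie-recommendation-system Design a Hadoop-based movie recommendation system by writing code; building this recommender teaches file operations and data processing skills on the Hadoop platform. windows 10 hadoop 2.8.3 p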
An API wrapper for the file.io web service.
🗃️ File.io An API wrapper for the file.io web service. Install $ pip3 install fileio or
Python's Filesystem abstraction layer
PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's
Cached file system for online resources in Python
Minato Cache & file system for online resources in Python Features Minato enables you to: Download & cache online resources minato supports the follo
Hadoop YARN ResourceManager unauthorized RCE
Vuln Impact There was an unauthorized access vulnerability in Hadoop YARN ResourceManager. This vulnerability existed in Hadoop YARN, the core compone
Hadoop YARN RPC unauthorized RCE
Vuln Impact On November 15, 2021, a security researcher disclosed that there was an unauthorized access vulnerability in Hadoop YARN RPC. This vulnera
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
FUSE filesystem Python scripts for Nintendo console files
ninfs (formerly fuse-3ds) is a FUSE program to extract data from Nintendo game consoles. It works by presenting a virtual filesystem with the contents of your games, NAND, or SD card, so you can browse and copy out just the files that you need.
Python virtual filesystem for SQLite to read from and write to S3
gitfs is a FUSE file system that fully integrates with git - Version controlled file system
gitfs is a FUSE file system that fully integrates with git. You can mount a remote repository's branch locally, and any subsequent changes made to the files will be automatically committed to the remote.
Python disk-backed cache (Django-compatible). Faster than Redis and Memcached. Pure-Python.
DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django.
A cd command that learns - easily navigate directories from the command line
NAME autojump - a faster way to navigate your filesystem DESCRIPTION autojump is a faster way to navigate your filesystem. It works by maintaining a d
BigDL: Distributed Deep Learning Framework for Apache Spark
BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w
The Tahoe-LAFS decentralized secure filesystem.
Free and Open decentralized data store Tahoe-LAFS (Tahoe Least-Authority File Store) is the first free software / open-source storage technology that
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This lets you manage files in the reMarkable Cloud using the same tools you use on your local system.
Knowledge Management for Humans using Machine Learning & Tags
HyperTag HyperTag helps humans intuitively express how they think about their files using tags and machine learning.
Simple Python File Manager
This script lets you automatically relocate files based on their extensions. Very useful for the downloads folder!
Knowledge Management for Humans using Machine Learning & Tags
HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you are looking for using semantic search across your text documents (yes, even PDFs) and images.
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l
A pandas-like deferred expression system, with first-class SQL support
Ibis: Python data analysis framework for Hadoop and SQL engines Service Status Documentation Conda packages PyPI Azure Coverage Ibis is a toolbox to b
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl
Python library and shell utilities to monitor filesystem events.
Watchdog Python API and shell utilities to monitor file system events. Works on Python 3.6+. If you want to use Python 2.6, you should stick with watchdog
Python's Filesystem abstraction layer
PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's
Run MapReduce jobs on Hadoop or Amazon Web Services
mrjob: the Python MapReduce library mrjob is a Python 2.7/3.4+ package that helps you write and run Hadoop Streaming jobs. Stable version (v0.7.4) doc
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Luigi is a Python (3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow managemen