321 Repositories
Python think-big-teach-small Libraries
Big-Papa Integrates Javascript and python for remote cookie stealing which then can be used for session hijacking
Big-Papa is a remote cookie stealer which can then be used for session hijacking and Bypassing 2 Factor Authentication
A small utility to deal with malware embedded hashes.
Uchihash is a small utility that can save malware analysts the time of dealing with embedded hash values used for various things such as: Dyn
Create HTML profiling reports from pandas DataFrame objects
Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries. An example o
A small utility to pretty-print Python tracebacks. ⛺
TBVaccine TBVaccine is a utility that pretty-prints Python tracebacks. It automatically highlights lines you care about and deemphasizes lines you don
a small, expressive orm -- supports postgresql, mysql and sqlite
peewee Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use. a small, expressive ORM p
A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
the problem What directory should your app use for storing user data? If running on macOS, you should use: ~/Library/Application Support/AppName If
a small library for extracting rich content from urls
A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o
Big Bird: Transformers for Longer Sequences
BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.
a small library for extracting rich content from urls
A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in
a small, expressive orm -- supports postgresql, mysql and sqlite
peewee Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use. a small, expressive ORM p
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree
Serverless proxy for Spark cluster
Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl
Small, fast HTTP client library for Python. Features persistent connections, cache, and Google App Engine support. Originally written by Joe Gregorio, now supported by community.
Introduction httplib2 is a comprehensive HTTP client library, httplib2.py supports many features left out of other HTTP libraries. HTTP and HTTPS HTTP
:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof
NumPy and Pandas interface to Big Data
Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte