A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Sven Kreiss

Last update: Dec 6, 2022

Related tags

Overview

pysparkling

Pysparkling provides a faster, more responsive way to develop programs for PySpark. It enables code intended for Spark applications to execute entirely in Python, without incurring the overhead of initializing and passing data through the JVM and Hadoop. The focus is on having a lightweight and fast implementation for small datasets at the expense of some data resilience features and some parallel processing features.

How does it work? To switch execution of a script from PySpark to pysparkling, have the code initialize a pysparkling Context instead of a SparkContext, and use the pysparkling Context to set up your RDDs. The beauty is you don't have to change a single line of code after the Context initialization, because pysparkling's API is (almost) exactly the same as PySpark's. Since it's so easy to switch between PySpark and pysparkling, you can choose the right tool for your use case.

When would I use it? Say you are writing a Spark application because you need robust computation on huge datasets, but you also want the same application to provide fast answers on a small dataset. You're finding Spark is not responsive enough for your needs, but you don't want to rewrite an entire separate application for the small-answers-fast problem. You'd rather reuse your Spark code but somehow get it to run fast. Pysparkling bypasses the stuff that causes Spark's long startup times and less responsive feel.

Here are a few areas where pysparkling excels:

Small to medium-scale exploratory data analysis
Application prototyping
Low-latency web deployments
Unit tests

Install

pip install pysparkling[s3,hdfs,streaming]

Documentation:

https://raw.githubusercontent.com/svenkreiss/pysparkling/master/docs/readthedocs.png

Features

Supports URI schemes s3://, hdfs://, gs://, http:// and file:// for Amazon S3, HDFS, Google Storage, web and local file access. Specify multiple files separated by comma. Resolves * and ? wildcards.
Handles .gz, .zip, .lzma, .xz, .bz2, .tar, .tar.gz and .tar.bz2 compressed files. Supports reading of .7z files.
Parallelization via multiprocessing.Pool, concurrent.futures.ThreadPoolExecutor or any other Pool-like objects that have a map(func, iterable) method.
Plain pysparkling does not have any dependencies (use pip install pysparkling). Some file access methods have optional dependencies: boto for AWS S3, requests for http, hdfs for hdfs

Examples

Some demos are in the notebooks docs/demo.ipynb and docs/iris.ipynb .

Word Count

from pysparkling import Context

counts = (
    Context()
    .textFile('README.rst')
    .map(lambda line: ''.join(ch if ch.isalnum() else ' ' for ch in line))
    .flatMap(lambda line: line.split(' '))
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(counts.collect())

which prints a long list of pairs of words and their counts.

Comments

Introducing pybuilder

I implemented pybuilder now here. The changes do NOT include the optimizations of the headers (yet) ;-). That's for another PR. pybuilder works, I don't know about travis yet though.

opened by svaningelgem 14
Feat/spark sql
This PR is related to #47: It implements a big part of DataFrame related APIs using pure Python

It is already huge in term of code, and for that I'm sorry as there's a lot to review. On the other hand it adds a lot of features and support some of Spark nicest features :).

NB: I'm opening this PR as-is for 2 main reasons:

See what happens with the test suite of pysparkling (I haven't test a lot with python2 even if a lot of effort went into compatibility with it)

Discuss if there is a way to make it easier to ingest. A suggestion would be to split it but there are still a lot of connected components that are codependent (mainly DataFrame and Column).

What this PR is about

It introduce the DataFrame object, a data structure that contains Rows of data, Row are quite similar to namedtuple.

DataFrame coolest feature is that you describe operation based on the schema of your Row but not on their values (by manipulating Column). DataFrame operations are supported by the existing RDD code, like PySpark's DataFrame most of the logic is not directly in DataFrame but in another object (in PySpark it's in the Scala counterpart of DataFrame, here in a DataFrameInternal object written in Python).

What this PR includes:

pysparkling.sql module, including:

SparkSession and SQLContext that allow DataFrame creation and management

DataFrame and GroupedData

Column

Types

DataFrameReader partial support of JSON and CSV

Some missing methods of RDD

An implementation of most of PySpark SQL functions, both classic expression and aggregations

What it does not include and that should be address in another PR:

Raw SQL strings parsing, both for schema description and for query creation:

This does not work:

spark.sql("select count(1) from swimmers").show()

This works:

df = spark.read.csv("swimmer") df.select(count(1)).show()

Window functions

Catalog related features

Streaming related features

I'm available for any questions/as mush walk-through on the code as you want :smiley:
(twitter: https://twitter.com/geekowan if you want to send DMs)
opened by tools4origins 13
Optimize imports
Optimize import does cleanup:

unused imports

imports that aren't sorted within a group

two top-level imports on the same line.

Very handy tool to keep the imports nice and tidy.
opened by svaningelgem 11
Add accumulators, update broadcasts

Broadcast changes were made to make sure that Broadcast initialization has the same arguments as its pyspark counterpart (context as first argument).

Accumulator and AccumulatorParam implementation is partly taken from their pyspark counterparts. Note that this implementation does not support parallelization, but such addition should be fairly doable on top of current implementation (again, by looking at the pyspark counterpart).

Tests are passing, I added tests by re-using pyspark doctests.

Fixes #25

opened by alexprengere 11
[NOT FOR MERGE] Feat/sql string parsing
Let's introduce SQL parsing! And support things like spark main example:

But first let's figure out how :smile:

This PR adds one dependency (Antlr4) and one file (SqlBase.g4). It also contains many files generated based on SqlBase.g4, and that's why it is so big (and should not be merged).

Antlr is a parser, and SqlBase.g4 defines a SQL grammar: It formalizes how are structured SQL strings such as SELECT * FROM table WHERE column IS NOT NULL.

Note: SqlBase.g4 is derived from spark who itself is derived from Presto's one: This reinforces that we introduce the same SQL grammar as the one used by Spark.

Based on this grammar, Antlr4 will convert each string into a syntax tree, where each syntaxical component is a Node with a predefined type and predefined children that are themself trees. It make SQL string parsing much easier as SQL is a bit complex.

For instance it converts 42 > 1 into a tree like:

| ComparisonContext |-- ValueExpressionDefaultContext |---- ConstantDefaultContext |------ NumericLiteralContext |-------- IntegerLiteralContext |---------- TerminalNodeImpl # 42 |-- ComparisonOperatorContext |---- TerminalNodeImpl # > |-- ValueExpressionDefaultContext |---- ConstantDefaultContext |------ NumericLiteralContext |-------- IntegerLiteralContext |---------- TerminalNodeImpl # 1

I am not opening this PR in order to have it merged: I do not think that we should add generated code to git.

Rather, I am opening it to discuss how to automatized the code generation.

Currently, it requires the following steps:

Download antlr-4.7.1-complete.jarfrom https://www.antlr.org/download/

Run java -Xmx500M -cp "/path/to/antlr-4.7.1-complete.jar:$CLASSPATH" org.antlr.v4.Tool ${project_dir}/pysparkling/sql/ast/grammar/SqlBase.g4 -o ${project_dir}/pysparkling/sql/ast/generated

But that's only for developers: I think we will want to package the app with these generated files.

These are the steps why I think a bit more automation in the app lifecyle would be nice.

What do you think?
opened by tools4origins 7
Warning when running tests
nosetests is failing with the following error:

nosetests: error: Error reading config file 'setup.cfg': no such option 'with-doctest-ignore-unicode'

When using python setup.py test, it works but still we get a warning:

RuntimeWarning: Option 'with-doctest' in config file 'setup.cfg' ignored: excluded by runtime environment

Can we remove this option, or am I missing some libraries to make it work?
opened by alexprengere 6
Handle paths manipulation with os.path, some cleanups

I read most of the code of the project, and removed a bit of code here and there code when I saw opportunities. Most of the changes are in fileio. Tests are still passing, I hope I did not break anything.

opened by alexprengere 5
Feat/untangling imports
@tools4origins : you're not going to like this one when merging :(. Sorry about that!

Cleanup of imports.

change of name from stat_counter.py to statcounter.py as that's the name in pyspark.

Moved as much as I could into 'terminals'. Meaning modules which are not depending on any others except for externals to pysparkling. There's still a lot to do here!

Moved stuff out from pysparkling into pysparkling.sql because the SQL code just does not belong in the root!

The most glaring example of this is the method toDF which is moved to sql.session.py and is being monkey-patched. Just like pyspark is doing it. (don't re-invent the wheel :)).

Moved stuff to private (to pysparkling) modules (modules starting with _) which are not defined in pyspark. This helped to reduce the complexity a lot.

What I'm trying to achieve here is to reduce the dependency hell of the "SQL" module. Not much code has been changed, some code copied from pyspark to make it easier, but basically moving methods around into different files. I also started here to make a distinction between pysparkling internals and the pyspark interface. What I mean with that is that pyspark has a certain file structure, this I kept as rigorous as possible (it was already largely that way anyhow), But pysparkling specific methods I tried to move into "private" modules (_types, _casts, _expressions, ...).
opened by svaningelgem 3
Feat/spark sql side effects
This PR contains all the modifications required by the Spark SQL implementation (#92) outside of pysparkling.sql,

12 files are affected by this PR:

. ├── pysparkling │ ├── sql │ │ ├── internal_utils │ │ │ └── joins.py │ │ └── types.py │ ├── tests │ │ ├── test_stat_counter.py │ │ └── test_streaming_files │ ├── __init__.py │ ├── context.py │ ├── rdd.py │ ├── stat_counter.py │ ├── storagelevel.py │ └── utils.py ├── LICENSE └── setup.py

As it contains mostly interfaces with Spark SQL it sometimes refers to code that is not part of this PR, such references are commented in this PR.

Biggest chunks of code are:

pysparkling/stat_counter.py as this PR add stat counters similar to the existing StatCounter but for Column and Rows. Those counters computes the following stats:

mean

variance_pop

variance_samp

variance

stddev_pop

stddev_samp

stddev

min

max

sum

skewness

kurtosis

covar_samp

covar_pop

pearson_correlation

pysparkling/utils.py as it introduces many utils functions
opened by tools4origins 3
Travis python versions

Fixes #93.

It contains the commit history of #92 as I did not see a way to split git modifications on multiple branches while keeping the change history,

My suggestion is to progressively checkout from feat/sparkSQL on top of this commit, with one commit referencing feat/sparkSQL (a.k.a. 1ca7a2a64d7f53ff44d02b033e571914d032b60a) for each PR, this way it's still easy to navigate through history

What do you think of the process? This PR has a lot of commits but it's fairly easy to look at the files changes (commit comments are not relevant for those changes)

opened by tools4origins 3
fix RDD.reduce when rdd contains empty partitions

Fixes #83

It let the TypeError to be thrown instead of checking the emptiness of the partition beforehand as values is a generator and it seems better not to affect it.

opened by tools4origins 3
fix(dataFrameShow): Remove extra new lines
A regression was introduced in the latest commit on master:

spark.range(3).show() printed not wanted blank lines:

+---+ | id| +---+ | 0| <BLANK LINE> | 1| <BLANK LINE> | 2| <BLANK LINE> +---+

This went unnoticed because doctest was configure to ignore whitespace differences, which we do not want for instance because of this regression, hence the removal in setup.cfg
opened by tools4origins 1
Feat/expression types

This PR is on top of #157 so it may be good to merge #157 first. A diff between PR can be found here: https://github.com/tools4origins/pysparkling/pull/7/files

It adds column data types handling, mostly by implementing an Expression.data_type methods that take the DataFrame schema and returns the Expression type

I found a few minor issues (e.g. bad return type in functions) while implementing it so this PR fixes them too

opened by tools4origins 0
pytest-xdist
I just tried pytest-xdist to run the tests in parallel.

Mostly it went ok, but these 2 failed somehow:

FAILED pysparkling/sql/tests/test_write.py::DataFrameWriterTests::test_write_nested_rows_to_json - FileNotFoundError: [WinError 3] The system cannot find the path specified: '.tmp' FAILED pysparkling/tests/test_streaming_tcp.py::TCPTextTest::test_connect - AssertionError: 6 != 20

Would be nice to use the pytest-xdist (I ran it with -n 4) and it finished in a fraction of the time pytest takes to run.

Wouldn't that be a good first issue for someone to look at ;-)?
opened by svaningelgem 0
Feat/sql parsing

This PR implements an ANTLR-based logic to parse SQL.

ANTLR is a parser generator used by Apache Spark. As such, we are able to use the exact spark SQL syntax.

SQL strings are converted into an abstract syntax tree (AST) by this project https://github.com/pysparkling/python-sql-parser.

These AST are then converted into Pysparkling object using the pysparkling/sql/ast/ast_to_python.py module, in particular via its entry points parse_xxx, e.g. parse_sql or parse_schema.

This PR only exposes SQL parsing on SQL schemas via a refactoring of StructType.fromDDL.

It also contains part of the logic that will be used to handle other types of SQL statements.

opened by tools4origins 5
Make the imports of the `sql` module less complicated

The import structure within the sql module is complicated and not great at the moment.

The core classes outside of the sql module should be fine.

This issue is to check whether these imports outside of top-level are really necessary for those cases like this one? ==> Rationalize & reduce complexity in imports.

Originally posted by @svenkreiss in https://github.com/svenkreiss/pysparkling/pull/152#discussion_r574602389

opened by svaningelgem 0
Show a bit of coverage report in Github Action terminal

it might be worth to think about how to show a bit of the coverage report in the terminal output of the GitHub Action.

Originally posted by @svenkreiss in https://github.com/svenkreiss/pysparkling/issues/150#issuecomment-775915822

opened by svaningelgem 0

Releases(v0.6.2)

v0.6.2(Nov 13, 2022)
make dependencies optional: boto, requests

compatibility

Source code(tar.gz)
Source code(zip)
v0.6.1(Jan 10, 2021)

testing continuous deployment
Source code(tar.gz)
Source code(zip)
v0.6.0(Jul 13, 2019)
Broadcast, Accumulator and AccumulatorParam by @alexprengere

support for increasing partition numbers in coalesce and repartition by @tools4origins

Source code(tar.gz)
Source code(zip)
v0.5.0(May 3, 2019)
fixes for HDFS thanks to @tools4origins

fix for empty partitions by @tools4origins

api fixes by @artem0 and @tools4origins

various updates for streaming submodule

various updates to lint and test system

logging: converted some info messages to debug

... documentation for some point releases is missing

Source code(tar.gz)
Source code(zip)
v0.4.1(May 27, 2017)
retries for failed partitions

improve pysparkling.streaming.DStream

updates to docs

Source code(tar.gz)
Source code(zip)
v0.4.0(Mar 11, 2017)
major addition: pysparkling.streaming

updates to RDD.sample()

reorganized scripts and tests

added RDD.partitionBy()

minor updates to pysparkling.fileio

Source code(tar.gz)
Source code(zip)
v0.3.23(Aug 6, 2016)

small improvements to fileio and better documentation
Source code(tar.gz)
Source code(zip)
v0.3.22(Jun 18, 2016)
reimplement RDD.groupByKey()

clean up of docstrings

Source code(tar.gz)
Source code(zip)
v0.3.21(May 31, 2016)
faster text file reading by using io.TextIOWrapper for decoding

Source code(tar.gz)
Source code(zip)

v0.3.20(May 1, 2016)

* Google Storage file system (using ``gs://``)
* dependencies: ``requests`` and ``boto`` are not optional anymore
* ``aggregateByKey()`` and ``foldByKey()`` return RDDs
* Python 3: use ``sys.maxsize`` instead of ``sys.maxint``
* flake8 linting

Source code(tar.gz)
Source code(zip)

v0.3.19(Mar 6, 2016)

Source code(tar.gz)
Source code(zip)
v0.3.18(Feb 13, 2016)

Source code(tar.gz)
Source code(zip)
v0.2.28(Jul 3, 2015)

See HISTORY.rst.
Source code(tar.gz)
Source code(zip)
v0.2.24(Jun 16, 2015)
replace dill with cloudpickle in docs and test

add tests with pypy and pypy3

See HISTORY.rst and README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.23(Jun 15, 2015)
added RDD.randomSplit()

saveAsTextFile() saves single file if there is only one partition (and does not break it out into partitions)

See HISTORY.rst.
Source code(tar.gz)
Source code(zip)
v0.2.22(Jun 12, 2015)
added Context.wholeTextFiles()

improved RDD.first() and RDD.take(n)

added fileio.TextFile

See change log in HISTORY.rst.
Source code(tar.gz)
Source code(zip)
v0.2.21(Jun 7, 2015)
added doc strings and created Sphinx documentation

implemented allowLocal in Context.runJob()

new IPython demo notebook at docs/demo.ipynb at https://github.com/svenkreiss/pysparkling/blob/master/docs/demo.ipynb

parallelize() can take an iterator

See HISTORY.rst.
Source code(tar.gz)
Source code(zip)
v0.2.19(Jun 4, 2015)

See HISTORY.rst.
Source code(tar.gz)
Source code(zip)
v0.2.16(Jun 1, 2015)
add values(), union(), zip(), zipWithUniqueId(), toLocalIterator()

improve aggregate() and fold()

add stats(), sampleStdev(), sampleVariance(), stdev(), variance()

make cache() and persist() do something useful

better partitioning in parallelize()

logo

fix foreach()

Source code(tar.gz)
Source code(zip)
v0.2.13(May 28, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.10(May 27, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.8(May 26, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.6(May 22, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.2(May 19, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.2.0(May 18, 2015)

See README.rst for change log.
Source code(tar.gz)
Source code(zip)
v0.1.1(May 12, 2015)

See README.rst.
Source code(tar.gz)
Source code(zip)
v0.1.0(May 12, 2015)

First release. See README.rst.
Source code(tar.gz)
Source code(zip)

Owner

Sven Kreiss

Postdoc at Visual Intelligence for Transportation (VITA) lab at EPFL with a background in particle physics.

GitHub https://pysparkling.readthedocs.io

Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

What is Vaex? Vaex is a high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular data

7.7k Jan 1, 2023

A Python package for manipulating 2-dimensional tabular data structures

datatable This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame

1.6k Jan 5, 2023

NumPy and Pandas interface to Big Data

Blaze translates a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze allows Python users a familiar inte

3.1k Jan 1, 2023

High performance datastore for time series and tick data

Arctic TimeSeries and Tick store Arctic is a high performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-

2.9k Dec 23, 2022

python-bigquery Apache-2python-bigquery (🥈34 · ⭐ 3.5K · 📈) - Google BigQuery API client library. Apache-2

Python Client for Google BigQuery Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google

550 Jan 1, 2023

google-resumable-media Apache-2google-resumable-media (🥉28 · ⭐ 27) - Utilities for Google Media Downloads and Resumable.. Apache-2

google-resumable-media Utilities for Google Media Downloads and Resumable Uploads See the docs for examples and usage. Experimental asyncio Support Wh

36 Nov 22, 2022

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Petastorm Contents Petastorm Installation Generating a dataset Plain Python API Tensorflow API Pytorch API Spark Dataset Converter API Analyzing petas

1.6k Dec 31, 2022

google-cloud-bigtable Apache-2google-cloud-bigtable (🥈31 · ⭐ 3.5K) - Google Cloud Bigtable API client library. Apache-2

Python Client for Google Cloud Bigtable Google Cloud Bigtable is Google's NoSQL Big Data database service. It's the same database that powers many cor

39 Dec 3, 2022

Objax Apache-2Objax (🥉19 · ⭐ 580) - Objax is a machine learning framework that provides an Object.. Apache-2 jax

Objax Tutorials | Install | Documentation | Philosophy This is not an officially supported Google product. Objax is an open source machine learning fr

729 Jan 2, 2023

Pure-python-server - A blogging platform written in pure python for developer to share their coding knowledge

Pure Python web server - PyProject A blogging platform written in pure python (n

10 Nov 7, 2022

pure-predict: Machine learning prediction in pure Python

pure-predict speeds up and slims down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks like scikit-learn and fasttext. It implements the predict methods of these frameworks in pure Python.

84 Dec 29, 2022

Pure Python bindings for the pure C++11/OpenCL Qrack quantum computer simulator library

pyqrack Pure Python bindings for the pure C++11/OpenCL Qrack quantum computer simulator library (PyQrack is just pure Qrack.) IMPORTANT: You must buil

6 Jul 21, 2022

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

Python Fire Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object. Python Fire is a s

23.6k Dec 31, 2022

fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

34 Jan 5, 2023

:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

1.4k Feb 18, 2021

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Related tags

Overview

pysparkling

Install

Features

Examples

Comments

What this PR is about

What this PR includes:

What it does not include and that should be address in another PR:

Releases(v0.6.2)

v0.6.2(Nov 13, 2022)

v0.6.1(Jan 10, 2021)

v0.6.0(Jul 13, 2019)

v0.5.0(May 3, 2019)

v0.4.1(May 27, 2017)

v0.4.0(Mar 11, 2017)

v0.3.23(Aug 6, 2016)

v0.3.22(Jun 18, 2016)

v0.3.21(May 31, 2016)

v0.3.20(May 1, 2016)

v0.3.19(Mar 6, 2016)

v0.3.18(Feb 13, 2016)

v0.2.28(Jul 3, 2015)

v0.2.24(Jun 16, 2015)

v0.2.23(Jun 15, 2015)

v0.2.22(Jun 12, 2015)

v0.2.21(Jun 7, 2015)

v0.2.19(Jun 4, 2015)

v0.2.16(Jun 1, 2015)

v0.2.13(May 28, 2015)

v0.2.10(May 27, 2015)

v0.2.8(May 26, 2015)

v0.2.6(May 22, 2015)

v0.2.2(May 19, 2015)

v0.2.0(May 18, 2015)

v0.1.1(May 12, 2015)

v0.1.0(May 12, 2015)

Owner

Sven Kreiss

Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀

A Python package for manipulating 2-dimensional tabular data structures

NumPy and Pandas interface to Big Data

High performance datastore for time series and tick data

python-bigquery Apache-2python-bigquery (🥈34 · ⭐ 3.5K · 📈) - Google BigQuery API client library. Apache-2

google-resumable-media Apache-2google-resumable-media (🥉28 · ⭐ 27) - Utilities for Google Media Downloads and Resumable.. Apache-2

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

google-cloud-bigtable Apache-2google-cloud-bigtable (🥈31 · ⭐ 3.5K) - Google Cloud Bigtable API client library. Apache-2

Objax Apache-2Objax (🥉19 · ⭐ 580) - Objax is a machine learning framework that provides an Object.. Apache-2 jax

Pure-python-server - A blogging platform written in pure python for developer to share their coding knowledge

pure-predict: Machine learning prediction in pure Python

Pure Python bindings for the pure C++11/OpenCL Qrack quantum computer simulator library

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

fastgradio is a python library to quickly build and share gradio interfaces of your trained fastai models.

:mag: End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Interfaces between napari and pymeshlab library to allow import, export and construction of surfaces.

Declarative User Interfaces for Python

Criando interfaces gráficas com Python e Qt 6 (PyQt6)

A python library for building user interfaces in discord.

Generating interfaces(CLI, Qt GUI, Dash web app) from a Python function.