🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

Matthias

Last update: Jan 1, 2023

Related tags

Text Data & NLP python rust json extension module serde encode decode hacktoberfest python-json

Overview

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software and there will be bugs, so maybe don't deploy to production just yet. 😉

⚠️ NOTE

This project is not actively maintained. orjson is likely the better alternative.

Installation

pip install hyperjson

Usage

hyperjson is meant as a drop-in replacement for Python's json module:

>>> import hyperjson
>>> hyperjson.dumps([{"key": "value"}, 81, True])
'[{"key":"value"},81,true]'
>>> hyperjson.loads("""[{"key": "value"}, 81, true]""")
[{u'key': u'value'}, 81, True]
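
Because the project is no longer actively maintained, one cautious pattern is to import hyperjson when it is available and fall back to the standard library otherwise. The sketch below assumes only the loads() and dumps() functions shown above; the alias json_impl and the helper roundtrip() are made up for illustration.

```python
# Prefer hyperjson when it is installed; otherwise use the built-in json module.
try:
    import hyperjson as json_impl  # assumption: hyperjson is installed
except ImportError:
    import json as json_impl  # standard-library fallback


def roundtrip(obj):
    """Serialize obj to a JSON string and parse it back (hypothetical helper)."""
    return json_impl.loads(json_impl.dumps(obj))


data = [{"key": "value"}, 81, True]
restored = roundtrip(data)
```

Code written against json_impl keeps working whichever module ends up being imported, as long as it sticks to the shared loads()/dumps() interface.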

Motivation

Parsing JSON is a solved problem; so, no need to reinvent the wheel, right?
Well, unless you care about performance and safety.

Turns out, parsing JSON correctly is a hard problem. Thanks to Rust, however, we can minimize the risk of running into stack overflows or segmentation faults.

hyperjson is a thin wrapper around Rust's serde-json and pyo3. It is compatible with Python 3 (and 2 on a best-effort basis).

For a more in-depth discussion, watch the talk about this project recorded at the Rust Cologne Meetup in August 2018.

Goals

Compatibility: Support the full feature-set of Python's json module.
Safety: No segfaults, panics, or overflows.
Performance: Significantly faster than json and as fast as ujson (both written in C).

Non-goals

Support ujson and simplejson extensions:
Custom extensions like encode(), __json__(), or toDict() are not supported. The reason is that they go against PEP 8 (e.g. dunder methods are restricted to the standard library, camelCase is not Pythonic) and are not available in Python's json module.
Whitespace preservation: Whitespace between JSON tokens is not preserved, mainly because JSON is a whitespace-agnostic format and serde-json strips it out by default. In practice this should not be a problem, since your application should not depend on whitespace padding, but it's something to be aware of.
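
To illustrate why whitespace padding should not matter, here is a small sketch that uses only the standard json module. The separators argument imitates compact output of the kind the failing tests further down report for hyperjson; nothing here calls hyperjson itself.

```python
import json

data = [1, 2, 3]

# The standard library pads separators by default.
padded = json.dumps(data)

# Compact separators give output without padding.
compact = json.dumps(data, separators=(",", ":"))

# The strings differ, but they decode to the same value.
same_text = padded == compact
same_value = json.loads(padded) == json.loads(compact)
```

Comparing decoded values rather than raw strings keeps tests independent of how a particular encoder formats its output.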

Benchmark

We are not fast yet. That said, we haven't done any big optimizations. In the long term we might explore features of newer CPUs like multi-core and SIMD. That's one area other (C-based) JSON extensions haven't touched yet, because it might make code harder to debug and prone to race conditions. In Rust, this is feasible due to crates like faster or rayon.

So there's a chance that the following measurements might improve soon.
If you want to help, check the instructions in the Developer guide section below.

Test machine:
MacBook Pro 15 inch, Mid 2015 (2.2 GHz Intel Core i7, 16 GB RAM), Darwin 17.6.18
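
For a quick, informal comparison on your own machine, a sketch along the following lines can be used. It is not the project's benchmark suite (that is run via make bench); the bench() helper, the payload, and the iteration count are arbitrary choices made for illustration, and hyperjson is only measured if it happens to be installed.

```python
import json
import timeit


def bench(dumps, loads, payload, number=1000):
    """Time `number` encode and decode calls (hypothetical helper)."""
    encoded = dumps(payload)
    return {
        "dumps": timeit.timeit(lambda: dumps(payload), number=number),
        "loads": timeit.timeit(lambda: loads(encoded), number=number),
    }


# Small synthetic payload; real-world results depend heavily on the data.
payload = [{"key": "value", "n": i} for i in range(100)]

results = {"json": bench(json.dumps, json.loads, payload)}

try:
    import hyperjson  # assumption: only available if installed

    results["hyperjson"] = bench(hyperjson.dumps, hyperjson.loads, payload)
except ImportError:
    pass  # compare against the standard library only

for name, timings in results.items():
    print(name, timings)
```

Timings from such a micro-benchmark are noisy, so treat them as a rough indication only.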

Contributions welcome!

If you would like to hack on hyperjson, here's what needs to be done:

Implement loads()
Implement load()
Implement dumps()
Implement dump()
Benchmark against json and ujson (see #1)
Add a CI/CD pipeline for easier testing (see #2)
Create a proper pip package from it, to make installing easier (see #3).
Profile and optimize performance (see #16)
Add remaining keyword-only arguments to methods

Just pick one of the open tickets. We can provide mentorship if you like. 😃

Developer guide

This project uses poetry for managing the development environment. If you don't have it installed, run

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
export PATH="$HOME/.poetry/bin:$PATH"

The project requires the nightly version of Rust.

Install it via rustup:

rustup install nightly

If you have already installed the nightly version, make sure it is up-to-date:

rustup update nightly

After that, you can compile the current version of hyperjson and execute all tests and benchmarks with the following commands:

make install
make test
make bench

🤫 Pssst!... run make help to learn more.

Drawing pretty diagrams

In order to recreate the benchmark histograms, you first need a few additional prerequisites:

On macOS, please also add the following to your ~/.matplotlib/matplotlibrc (reference):

backend: TkAgg

After that, run the following:

make plot

License

hyperjson is licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in hyperjson by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments

Error when installing via pip

Both pipenv install hyperjson and pip install hyperjson produce the following error:

Could not find a version that satisfies the requirement hyperjson (from versions: ) No matching distribution found for hyperjson

Tried this on Ubuntu 18.10 with Python 3.6.7

opened by ZapAnton 11

[Github Action] Test and Bench via github actions

This sets up a GitHub Action to build/test/bench hyperjson across Python 3.5, 3.6, 3.7 x macOS/Ubuntu. This is my first GitHub Action, so things might not be as optimal as they could be. Doesn't help with #60 atm, but deploying wheels looks pretty straightforward now that I'm comfy with Actions.

opened by packysauce 10

Fix remaining unit tests

Here is a list of failing tests and their status:

============================ FAILURES =============================
________________ UltraJSONTests.testEncodeSymbols _________________

self = <test_ujson.UltraJSONTests testMethod=testEncodeSymbols>

    def testEncodeSymbols(self):
        s = '\u273f\u2661\u273f'  # ✿♡✿
        encoded = hyperjson.dumps(s)
        encoded_json = hyperjson.dumps(s)
>       self.assertEqual(len(encoded), len(s) * 6 + 2)  # 6 characters + quotes
E       AssertionError: 5 != 20

hyperjson/tests/test_ujson.py:229: AssertionError
_______________ UltraJSONTests.testEncodeUnicodeBMP _______________

self = <test_ujson.UltraJSONTests testMethod=testEncodeUnicodeBMP>

    def testEncodeUnicodeBMP(self):
        s = '\U0001f42e\U0001f42e\U0001F42D\U0001F42D'  # 🐮🐮🐭🐭
        encoded = hyperjson.dumps(s)
        encoded_json = hyperjson.dumps(s)

        if len(s) == 4:
>           self.assertEqual(len(encoded), len(s) * 12 + 2)
E           AssertionError: 6 != 50

hyperjson/tests/test_ujson.py:204: AssertionError
_____________ UltraJSONTests.test_ReadBadObjectSyntax _____________

self = <test_ujson.UltraJSONTests testMethod=test_ReadBadObjectSyntax>

    def test_ReadBadObjectSyntax(self):
        input = '{"age", 44}'
>       self.assertRaises(ValueError, hyperjson.loads, input)
E       _hyperjson.JSONDecodeError: Value: "{\"age\", 44}", Error: expected `:` at line 1 column 7

hyperjson/tests/test_ujson.py:820: JSONDecodeError
_________ UltraJSONTests.test_WriteArrayOfSymbolsFromList _________

self = <test_ujson.UltraJSONTests testMethod=test_WriteArrayOfSymbolsFromList>

    def test_WriteArrayOfSymbolsFromList(self):
        self.assertEqual("[true, false, null]",
>                        hyperjson.dumps([True, False, None]))
E       AssertionError: '[true, false, null]' != '[true,false,null]'
E       - [true, false, null]
E       ?       -      -
E       + [true,false,null]

hyperjson/tests/test_ujson.py:846: AssertionError
________ UltraJSONTests.test_WriteArrayOfSymbolsFromTuple _________

self = <test_ujson.UltraJSONTests testMethod=test_WriteArrayOfSymbolsFromTuple>

    def test_WriteArrayOfSymbolsFromTuple(self):
        self.assertEqual("[true, false, null]",
>                        hyperjson.dumps((True, False, None)))
E       AssertionError: '[true, false, null]' != '[true,false,null]'
E       - [true, false, null]
E       ?       -      -
E       + [true,false,null]

hyperjson/tests/test_ujson.py:850: AssertionError
___________ UltraJSONTests.test_decodeArrayDepthTooBig ____________

self = <test_ujson.UltraJSONTests testMethod=test_decodeArrayDepthTooBig>

    def test_decodeArrayDepthTooBig(self):
        input = '[' * (1024 * 1024)
>       self.assertRaises(RecursionError, hyperjson.loads, input)
E       _hyperjson.JSONDecodeError: Value: "[{{", Error: key must be a string at line 1 column 2

hyperjson/tests/test_ujson.py:397: JSONDecodeError
_______________________________ UltraJSONTests.test_decodeTrueBroken ________________________________

self = <test_ujson.UltraJSONTests testMethod=test_decodeTrueBroken>

    def test_decodeTrueBroken(self):
        input = "tru"
>       self.assertRaises(ValueError, hyperjson.loads, input)
E       _hyperjson.JSONDecodeError: Value: "tru", Error: expected ident at line 1 column 3

hyperjson/tests/test_ujson.py:413: JSONDecodeError
_______________________ UltraJSONTests.test_decodeWithTrailingNonWhitespaces ________________________

self = <test_ujson.UltraJSONTests testMethod=test_decodeWithTrailingNonWhitespaces>

    def test_decodeWithTrailingNonWhitespaces(self):
        input = "{}\n\t a"
>       self.assertRaises(JSONDecodeError, hyperjson.loads, input)
E       _hyperjson.JSONDecodeError: Value: "{}\n\t a", Error: trailing characters at line 2 column 3

hyperjson/tests/test_ujson.py:790: JSONDecodeError
__________________________________ UltraJSONTests.test_dumpToFile ___________________________________

self = <test_ujson.UltraJSONTests testMethod=test_dumpToFile>

    def test_dumpToFile(self):
        f = six.StringIO()
        hyperjson.dump([1, 2, 3], f)
>       self.assertEqual("[1, 2, 3]", f.getvalue())
E       AssertionError: '[1, 2, 3]' != '[1,2,3]'
E       - [1, 2, 3]
E       ?    -  -
E       + [1,2,3]

hyperjson/tests/test_ujson.py:556: AssertionError
_____________________________ UltraJSONTests.test_dumpToFileLikeObject ______________________________

self = <test_ujson.UltraJSONTests testMethod=test_dumpToFileLikeObject>

    def test_dumpToFileLikeObject(self):
        class filelike:
            def __init__(self):
                self.bytes = ''

            def write(self, bytes):
                self.bytes += bytes

        f = filelike()
        hyperjson.dump([1, 2, 3], f)
>       self.assertEqual("[1, 2, 3]", f.bytes)
E       AssertionError: '[1, 2, 3]' != '[1,2,3]'
E       - [1, 2, 3]
E       ?    -  -
E       + [1,2,3]

hyperjson/tests/test_ujson.py:568: AssertionError
_______________________ UltraJSONTests.test_encodeListLongUnsignedConversion ________________________

self = <test_ujson.UltraJSONTests testMethod=test_encodeListLongUnsignedConversion>

    def test_encodeListLongUnsignedConversion(self):
        input = [18446744073709551615,
                 18446744073709551615, 18446744073709551615]
        output = hyperjson.dumps(input)

>       self.assertEqual(input, hyperjson.loads(output))
E       AssertionError: Lists differ: [18446744073709551615, 18446744073709551615, 18446744073709551615] != [1.8446744073709552e+19, 1.8446744073709552e+19, 1.8446744073709552e+19]
E
E       First differing element 0:
E       18446744073709551615
E       1.8446744073709552e+19
E
E       - [18446744073709551615, 18446744073709551615, 18446744073709551615]
E       ?                   ^ ^^^^                ^ ^^^^                ^^^
E
E       + [1.8446744073709552e+19, 1.8446744073709552e+19, 1.8446744073709552e+19]
E       ?   +               +++ ^^^ ^               +++ ^^^ ^               +++ ^

hyperjson/tests/test_ujson.py:495: AssertionError
_________________________ UltraJSONTests.test_encodeLongUnsignedConversion __________________________

self = <test_ujson.UltraJSONTests testMethod=test_encodeLongUnsignedConversion>

    def test_encodeLongUnsignedConversion(self):
        input = 18446744073709551615
        output = hyperjson.dumps(input)

>       self.assertEqual(input, hyperjson.loads(output))
E       AssertionError: 18446744073709551615 != 1.8446744073709552e+19

hyperjson/tests/test_ujson.py:509: AssertionError
_______________________________ UltraJSONTests.test_encodeOrderedDict _______________________________

self = <test_ujson.UltraJSONTests testMethod=test_encodeOrderedDict>

    @unittest.skipIf(sys.version_info < (2, 7), "No Ordered dict in < 2.7")
    def test_encodeOrderedDict(self):
        from collections import OrderedDict
        input = OrderedDict([(1, 1), (0, 0), (8, 8), (2, 2)])
        self.assertEqual('{"1": 1, "0": 0, "8": 8, "2": 2}',
>                        hyperjson.dumps(input))
E       AssertionError: '{"1": 1, "0": 0, "8": 8, "2": 2}' != '{"0":0,"1":1,"2":2,"8":8}'
E       - {"1": 1, "0": 0, "8": 8, "2": 2}
E       + {"0":0,"1":1,"2":2,"8":8}

hyperjson/tests/test_ujson.py:369: AssertionError
___________________________________ UltraJSONTests.test_sortKeys ____________________________________

self = <test_ujson.UltraJSONTests testMethod=test_sortKeys>

    def test_sortKeys(self):
        data = {"a": 1, "c": 1, "b": 1, "e": 1, "f": 1, "d": 1}
        sortedKeys = hyperjson.dumps(data, sort_keys=True)
        self.assertEqual(
>           sortedKeys, '{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}')
E       AssertionError: '{"a":1,"b":1,"c":1,"d":1,"e":1,"f":1}' != '{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}'
E       - {"a":1,"b":1,"c":1,"d":1,"e":1,"f":1}
E       + {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1}
E       ?      +  +    +  +    +  +    +  +    +  +    +

hyperjson/tests/test_ujson.py:865: AssertionError
========================= 30 failed, 102 passed, 28 skipped in 9.44 seconds =========================
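
Several of the failures above come down to behaviour that the tests expect from Python's json module. The sketch below shows that reference behaviour using only the standard library; it does not call hyperjson, and the variable names are chosen for illustration.

```python
import json
from collections import OrderedDict

# testEncodeSymbols: json escapes non-ASCII characters by default
# (ensure_ascii=True), so each symbol becomes a 6-character \uXXXX escape.
s = "\u273f\u2661\u273f"
escaped_len = len(json.dumps(s))                  # 3 * 6 + 2 quotes
raw_len = len(json.dumps(s, ensure_ascii=False))  # 3 + 2 quotes

# test_ReadBadObjectSyntax: json.JSONDecodeError is a subclass of ValueError,
# so assertRaises(ValueError, ...) accepts it.
decode_error_is_value_error = issubclass(json.JSONDecodeError, ValueError)

# test_encodeLongUnsignedConversion: the largest unsigned 64-bit integer
# survives a round trip as an int; converting it to float loses precision.
big = 18446744073709551615
restored = json.loads(json.dumps(big))
as_float = float(big)

# test_encodeOrderedDict: insertion order is kept unless sort_keys=True.
ordered = OrderedDict([(1, 1), (0, 0), (8, 8), (2, 2)])
encoded = json.dumps(ordered)
encoded_sorted = json.dumps({"b": 1, "a": 1}, sort_keys=True)
```

The lengths 20 and 5 match the "5 != 20" assertion error reported for testEncodeSymbols.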

bug help wanted good first issue

opened by mre 10

Applied several clippy suggestions

This PR tries to reduce the warning-noise when cargo clippy is applied.

The most common warning was about passing arguments by value instead of passing by reference:

warning: this argument is passed by value, but not consumed in the function body
   --> src/lib.rs:228:8
    |
228 |     s: PyObject,
    |        ^^^^^^^^ help: consider taking a reference instead: `&PyObject`
    |
    = help: for further information visit https://rust-lang-nursery.github.io/rust-clippy/v0.0.212/index.html#needless_pass_by_value

Some of the warnings I managed to resolve, but for functions annotated with #[pyfunction] the simple usage of &Option<> introduces an error:

error[E0277]: the trait bound `pyo3::PyObject: pyo3::PyTypeInfo` is not satisfied
   --> src/lib.rs:153:1
    |
153 | #[pyfunction]
    | ^^^^^^^^^^^^^ the trait `pyo3::PyTypeInfo` is not implemented for `pyo3::PyObject`
    |
    = note: required because of the requirements on the impl of `pyo3::PyTryFrom` for `pyo3::PyObject`
    = note: required because of the requirements on the impl of `pyo3::FromPyObject<'_>` for `&pyo3::PyObject`
    = note: required by `pyo3::ObjectProtocol::extract`

One way to resolve the error could be to use PyObjectRef together with #[pyfn()] instead of #[pyfunction] (I actually could not find #[pyfunction] in the 0.5.0 documentation; could it be deprecated?), but that would break the API, so I left it for now.

The resulting changes are somewhat chaotic, so feel free to close.

opened by ZapAnton 8

[bump_pyo3_version] Update to pyo3 0.8.0 and maturin

Like the title states. 0.6.0 introduced some API incompatibilities that I have fixed to the best of my abilities. I also went ahead and updated every dep involved, and fixed all incompatibilities discovered.

I ran profiles and tests and played around with it in the Python shell, and from what I can tell it works well again.

should fix #57

opened by packysauce 7

Linking error with make build

make build and by extension cargo build, cargo test and cargo bench fail with the following error:

error: linking with `cc` failed: exit code: 1
..... A lot of output ....
undefined reference to `PyExc_TypeError'
collect2: error: ld returned 1 exit status

Perhaps the build command should be modified to match setuptools-rust one?

E.g.

cargo rustc --lib --manifest-path Cargo.toml --features "pyo3/extension-module pyo3/python3" --release -- --crate-type cdylib

opened by ZapAnton 7

Automate deployment to PyPI

We should automate the deployment process of hyperjson. Since I had some great experiences with GitHub Actions, I would prefer to write a completely new CI pipeline with it and remove Travis from the project.

The pipeline should...

run the tests

publish the Python package for Python 3.5, 3.6, 3.7, and optionally sdist with the help of maturin.

(optionally) release to crates.io

If someone wants to tackle this, please go ahead. 😊

help wanted good first issue mentorship hacktoberfest

opened by mre 5

Nice performance boost for handling boolean values

We can serialize to bool directly without using extract. This gives us a nice performance boost as can be seen in the new plots. The same trick can probably be applied elsewhere. Good places to look for such improvements are the remaining extract calls and the macro handling for casts.

opened by mre 5

Find and fix possible performance bottlenecks

Yesterday I did some profiling using the setup described here. The resulting callgrind file is attached. This can be opened with qcachegrind on Mac or kcachegrind on Linux.

callgrind.out.35583.zip

If you don't have any of those programs handy, I've added a screenshot for the two main bottlenecks that I can see. I'm not an expert, but it looks like we spend a lot of time allocating, converting, and dropping the BTreeMap, which will be converted to a dictionary and returned to Python in the end.

I guess we could save a lot of time by making this part more efficient. E.g. by copying less and instead working on references. Might be mistaken, though. Help and pull requests are very welcome. 😊

enhancement mentorship

opened by mre 5

Fixup CI

GitHub deprecated the ::add-path command sometime in October, which is causing the CI to fail. Adding platform-specific steps for modifying the PATH should fix this (at least it did in my poetry/pyo3 project).

Relevant error:

The `add-path` command is disabled. Please upgrade to using Environment Files or opt into unsecure command execution by setting the `ACTIONS_ALLOW_UNSECURE_COMMANDS` environment variable to `true`. For more information see: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
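
The "Environment Files" mechanism mentioned in the error works by appending a directory to the file named in the GITHUB_PATH environment variable, which GitHub Actions sets for each step. As a rough sketch (not the change made in this PR), a Python step could do it as follows; the helper add_to_github_path() and the poetry directory are illustrative assumptions.

```python
import os


def add_to_github_path(directory, env=None):
    """Append `directory` to the GitHub Actions path file (hypothetical helper).

    Returns False when GITHUB_PATH is not set, e.g. outside GitHub Actions.
    """
    env = os.environ if env is None else env
    path_file = env.get("GITHUB_PATH")
    if not path_file:
        return False
    # Each line in this file is added to PATH for the following steps.
    with open(path_file, "a", encoding="utf-8") as fh:
        fh.write(directory + "\n")
    return True


# Assumed location of the poetry binaries, taken from the developer guide.
added = add_to_github_path(os.path.expanduser("~/.poetry/bin"))
```

Outside of GitHub Actions the call simply returns False and writes nothing.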

opened by wseaton 4

Bump pytest from 5.2.4 to 5.3.0
Bumps pytest from 5.2.4 to 5.3.0.

Changelog

Sourced from pytest's changelog.

pytest 5.3.0 (2019-11-19)

Deprecations

#6179: The default value of junit_family option will change to xunit2 in pytest 6.0, given that this is the version supported by default in modern tools that manipulate this type of file.

In order to smooth the transition, pytest will issue a warning in case the --junitxml option is given in the command line but junit_family is not explicitly configured in pytest.ini.

For more information, see the docs.

Features

#4488: The pytest team has created the pytest-reportlog plugin, which provides a new --report-log=FILE option that writes report logs into a file as the test session executes.

Each line of the report log contains a self contained JSON object corresponding to a testing event, such as a collection or a test result report. The file is guaranteed to be flushed after writing each line, so systems can read and process events in real-time.

The plugin is meant to replace the --resultlog option, which is deprecated and meant to be removed in a future release. If you use --resultlog, please try out pytest-reportlog and provide feedback.

#4730: When sys.pycache_prefix (Python 3.8+) is set, it will be used by pytest to cache test files changed by the assertion rewriting mechanism.

This makes it easier to benefit of cached .pyc files even on file systems without permissions.

#5515: Allow selective auto-indentation of multiline log messages.

Adds command line option --log-auto-indent, config option log_auto_indent and support for per-entry configuration of indentation behavior on calls to logging.log().

Alters the default for auto-indention from on to off. This restores the older behavior that existed prior to v4.6.0. This reversion to earlier behavior was done because it is better to activate new features that may lead to broken tests explicitly rather than implicitly.

#5914: pytester learned two new functions, no_fnmatch_line and no_re_match_line.

The functions are used to ensure the captured text does not match the given pattern.

The previous idiom was to use re.match:

assert re.match(pat, result.stdout.str()) is None

Or the in operator:

assert text in result.stdout.str()

But the new functions produce best output on failure.

#6057: Added tolerances to complex values when printing pytest.approx.

... (truncated)

Commits

be59827 Small fixes in the CHANGELOG for 5.3.0

4b16b93 Preparing release version 5.3.0

21622d0 Merge remote-tracking branch 'upstream/master' into release-5.3.0

d1e2d12 python: remove unused pytest_make_parametrize_id hookimpl (#6228)

f36ea24 Remove check for os.symlink, always there in py3+ (#6227)

4804d4b python: remove unused pytest_make_parametrize_id hookimpl

b820b7e Merge pull request #6224 from blueyed/visit_Assert-minor-cleanup

8d3e8b1 Revert "ci: use tox -vv" (#6226)

63a23d8 Remove check for os.symlink, always there in py3+

eeeb196 Merge pull request #6202 from linw1995/fix_getmodpath

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

If all status checks pass Dependabot will automatically merge this pull request.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

@dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot dashboard:

Update frequency (including time of day and day of week)

Pull request limits (per update run and/or open at any time)

Automerge options (never/patch/minor, and dev/runtime dependencies)

Out-of-range updates (receive only lockfile updates, if desired)

Security updates (receive only security updates, if desired)

dependencies

opened by dependabot-preview[bot] 4

Bump simplejson from 3.17.2 to 3.18.0
Bumps simplejson from 3.17.2 to 3.18.0.

Changelog

Sourced from simplejson's changelog.

Version 3.18.0 released 2022-11-14

Allow serialization of classes that implement for_json or _asdict by ignoring TypeError when those methods are called simplejson/simplejson#302

Raise JSONDecodeError instead of ValueError in invalid unicode escape sequence edge case simplejson/simplejson#298

Version 3.17.6 released 2021-11-15

Declare support for Python 3.10 and add wheels simplejson/simplejson#291 simplejson/simplejson#292

Version 3.17.5 released 2021-08-23

Fix the C extension module to harden is_namedtuple against looks-a-likes such as Mocks. Also prevent dict encoding from causing an unraised SystemError when encountering a non-Dict. Noticed by running user tests against a CPython interpreter with C asserts enabled (COPTS += -UNDEBUG). simplejson/simplejson#284

Version 3.17.4 released 2021-08-19

Upgrade cibuildwheel simplejson/simplejson#287

Version 3.17.3 released 2021-07-09

Replaced Travis-CI and AppVeyor with Github Actions, adding wheels for Python 3.9. simplejson/simplejson#283

Version 3.17.2 released 2020-07-16

Added arm64 to build matrix and reintroduced manylinux wheels simplejson/simplejson#264

No more bdist_wininst builds per PEP 527 simplejson/simplejson#260

Minor grammatical issue fixed in README simplejson/simplejson#261

Version 3.17.0 released 2019-11-17

Updated documentation to be Python 3 first, and have removed documentation notes about version changes that occurred more than five years ago. simplejson/simplejson#257

... (truncated)

Commits

66c62d8 Merge pull request #302 from simplejson/class-serialization

da213dc Revert unnecessary change

41f81a8 Rename variable for clarity

2c2c998 v3.18.0

3518a18 Implement speedups for #301

ef43c2d Implement tests and fallback implementation of #301

4364525 Merge pull request #300 from Attakay78/master

e4941c0 #299 Comment error fix

74c95b6 Update CHANGES.txt and bump version

d8cee7b Merge pull request #298 from ks888/fix-value-error

Additional commits viewable in compare view

dependencies python

opened by dependabot[bot] 0

Bump regex from 1.3.1 to 1.5.6
Bumps regex from 1.3.1 to 1.5.6.

Changelog

Sourced from regex's changelog.

1.5.6 (2022-05-20)

This release includes a few bug fixes, including a bug that produced incorrect matches when a non-greedy ? operator was used.

[BUG #680](rust-lang/regex#680): Fixes a bug where [[:alnum:][:^ascii:]] dropped [:alnum:] from the class.

[BUG #859](rust-lang/regex#859): Fixes a bug where Hir::is_match_empty returned false for \b.

[BUG #862](rust-lang/regex#862): Fixes a bug where 'ab??' matches 'ab' instead of 'a' in 'ab'.

1.5.5 (2022-03-08)

This releases fixes a security bug in the regex compiler. This bug permits a vector for a denial-of-service attack in cases where the regex being compiled is untrusted. There are no known problems where the regex is itself trusted, including in cases of untrusted haystacks.

SECURITY #GHSA-m5pq-gvj9-9vr8: Fixes a bug in the regex compiler where empty sub-expressions subverted the existing mitigations in place to enforce a size limit on compiled regexes. The Rust Security Response WG published an advisory about this: https://groups.google.com/g/rustlang-security-announcements/c/NcNNL1Jq7Yw

1.5.4 (2021-05-06)

This release fixes another compilation failure when building regex. This time, the fix is for when the pattern feature is enabled, which only works on nightly Rust. CI has been updated to test this case.

[BUG #772](rust-lang/regex#772): Fix build when pattern feature is enabled.

1.5.3 (2021-05-01)

This release fixes a bug when building regex with only the unicode-perl feature. It turns out that while CI was building this configuration, it wasn't actually failing the overall build on a failed compilation.

[BUG #769](rust-lang/regex#769): Fix build in regex-syntax when only the unicode-perl feature is enabled.

1.5.2 (2021-05-01)

This release fixes a performance bug when Unicode word boundaries are used.

... (truncated)

Commits

9aef5b1 1.5.6

2931b07 syntax: bump minimum regex-syntax version to 0.6.26

b41bde0 regex-syntax-0.6.26

d98da65 changelog: 1.5.6

1c19619 syntax: fix literal extraction for 'ab??'

88a2a62 syntax: fix 'is_match_empty' predicate

72f09f1 syntax: fix ascii class union bug

b537286 doc: fix some typos

258bdf7 changelog: 1.5.5

d130381 1.5.5

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


dependencies rust
opened by dependabot[bot] 0
Bump flake8 from 3.9.1 to 3.9.2
Bumps flake8 from 3.9.1 to 3.9.2.

Commits

c6e0d27 Release 3.9.2

c428c55 Merge pull request #1328 from PyCQA/fix_indent_size_str

See full diff in compare view


dependencies
opened by dependabot[bot] 0
Bump autopep8 from 1.5.6 to 1.5.7
Bumps autopep8 from 1.5.6 to 1.5.7.

Release notes

Sourced from autopep8's releases.

v1.5.7

Change

#597: disallow 0 for indent-size option

#595: exit code is 99 when an error occurs in CLI option parsing

Bug Fix

#591, #592: exit code correctly on permission denied failure

Commits

32c78a3 version 1.5.7

a0e00a8 fix invalid regex

9274aac refactoring

745faa8 Merge pull request #597 from hhatto/change-indent-size-option-zero-is-not-all...

283e799 Merge branch 'master' into change-indent-size-option-zero-is-not-allowed

64087d9 change: disallow 0 for indent-size option

47690b0 Merge pull request #596 from howeaj/patch-1

85d7c81 Show more options in example config

3cf5bfe Merge pull request #595 from hhatto/fix-exit-code-99-with-cli-option-parse-error

a4f0e86 fix unit test

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
[0.2.4] Low-level details in JSONDecodeError message by mistake?
Hi!

I'm happy to find that hyperjson rejects single surrogates as invalid characters, but the exception text looks more low-level than expected. Is this a bug? Could the exception message be made more high-level, like "single surrogates not allowed"?

In [8]: hyperjson.__version__
Out[8]: '0.2.4'

In [9]: hyperjson.loads('"\\ud800"')
[..]
JSONDecodeError: Value: PyObject(0x7f481b3a1530), Error: PyErr { type: Py(0x7f48262225c0, PhantomData) }: line 1 column 1 (char 0)

Thanks and best, Sebastian
opened by hartwork 0
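
For comparison, Python's built-in json module accepts the same input and returns a string containing the unpaired surrogate. The following is a minimal sketch, using only the standard library, of how a caller could turn that case into a higher-level error. LoneSurrogateError, has_lone_surrogate, and loads_rejecting_surrogates are hypothetical names and not part of hyperjson or json, and only a top-level string value is checked:

```python
import json


class LoneSurrogateError(ValueError):
    """Hypothetical high-level error; not part of hyperjson or json."""


def has_lone_surrogate(text):
    # In a Python str, any code point in U+D800..U+DFFF is an unpaired surrogate.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def loads_rejecting_surrogates(payload):
    """Decode with the standard library, then reject a lone surrogate
    in a top-level string value (nested values are not inspected)."""
    value = json.loads(payload)
    if isinstance(value, str) and has_lone_surrogate(value):
        raise LoneSurrogateError("single surrogates not allowed")
    return value


# The standard library decodes the escape without complaint.
decoded = json.loads('"\\ud800"')
print(has_lone_surrogate(decoded))
```

A properly paired escape such as "\ud83d\ude00" is combined into one code point by the decoder, so it passes the check.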
Speed up boolean encoding/decoding

From our benchmarks we can see that we are consistently slower than everyone else when serializing/deserializing boolean values. We should fix that.

orjson is using an unsafe block to create a reference to a boolean: https://github.com/ijl/orjson/blob/03d55e99a953ce93cedc05f03e4b63b0bcbbcc7a/src/decode.rs#L81-L96

This avoids additional allocations. For comparison, this is our code at the moment:

https://github.com/mre/hyperjson/blob/ded13b4100638aa32fe19dc477f5cfe3e704893c/src/lib.rs#L475-L480

I wonder if we could achieve comparable performance without using unsafe. @konstin, any idea? Maybe there was a recent development in pyo3 that we could leverage here?
enhancement help wanted good first issue hacktoberfest

opened by mre 1
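
One way to measure the gap on boolean-heavy input is a small timing harness. This is a sketch, not the project's benchmark suite: bench_booleans is a hypothetical helper, it assumes the module under test exposes json-style dumps and loads functions, and it is shown here with the standard json module so that it runs without extra packages.

```python
import json
import timeit


def bench_booleans(module, size=10_000, repeat=5):
    """Time dumps/loads of a list of booleans for a json-compatible module."""
    data = [True, False] * (size // 2)
    encoded = module.dumps(data)

    # Sanity check: the round trip must preserve the values.
    assert module.loads(encoded) == data

    # Best-of-N wall-clock time for a single call, in seconds.
    dump_time = min(timeit.repeat(lambda: module.dumps(data), number=1, repeat=repeat))
    load_time = min(timeit.repeat(lambda: module.loads(encoded), number=1, repeat=repeat))
    return {"dumps": dump_time, "loads": load_time}


results = bench_booleans(json)
print(results)
```

Passing another module that follows the same dumps/loads interface in place of json gives numbers that can be compared directly on the same machine.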

Releases(v0.2.4)

v0.2.4(Jan 31, 2020)
With this release, we move our linting, building, benchmarking, and publishing process to GitHub Actions (#130).

Updated dependencies

Source code(tar.gz)
Source code(zip)
hyperjson-0.2.4-cp35-cp35m-macosx_10_7_x86_64.whl(163.82 KB)
hyperjson-0.2.4-cp35-cp35m-manylinux1_x86_64.whl(186.75 KB)
hyperjson-0.2.4-cp36-cp36m-macosx_10_7_x86_64.whl(163.82 KB)
hyperjson-0.2.4-cp36-cp36m-manylinux1_x86_64.whl(186.71 KB)
hyperjson-0.2.4-cp36-none-win_amd64.whl(160.56 KB)
hyperjson-0.2.4-cp37-cp37m-macosx_10_7_x86_64.whl(163.70 KB)
hyperjson-0.2.4-cp37-cp37m-manylinux1_x86_64.whl(186.58 KB)
hyperjson-0.2.4-cp37-none-win_amd64.whl(160.34 KB)
hyperjson-0.2.4-cp38-cp38-macosx_10_7_x86_64.whl(163.70 KB)
hyperjson-0.2.4-cp38-cp38-manylinux1_x86_64.whl(186.57 KB)
hyperjson-0.2.4-cp38-none-win_amd64.whl(160.35 KB)
v0.2.0(Nov 19, 2018)

This release features lots of improvements to the codebase. Apart from many performance improvements, hyperjson is also more stable thanks to the updates in pyo3 0.5. It is also the first public version that is available on PyPI 🎉
Source code(tar.gz)
Source code(zip)
hyperjson-0.2.0-cp27-cp27mu-manylinux1_x86_64.whl(196.13 KB)
hyperjson-0.2.0-cp35-cp35m-manylinux1_x86_64.whl(194.23 KB)
hyperjson-0.2.0-cp36-cp36m-manylinux1_x86_64.whl(194.24 KB)
hyperjson-0.2.0-cp37-cp37m-manylinux1_x86_64.whl(194.16 KB)

Owner

Matthias

Curious person. Maker. Rustacean. Oxidizing things.

GitHub

This is my reading list for my PhD in AI, NLP, Deep Learning and more.

156 Dec 21, 2022

ThinkTwice: A Two-Stage Method for Long-Text Machine Reading Comprehension

ThinkTwice ThinkTwice is a retriever-reader architecture for solving long-text machine reading comprehension. It is based on the paper: ThinkTwice: A

4 Aug 6, 2021

Code repository for "It's About Time: Analog clock Reading in the Wild"

it's about time Code repository for "It's About Time: Analog clock Reading in the Wild" Packages required: pytorch (used 1.9, any reasonable version s

52 Nov 10, 2022

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

Delta Reading Comprehension Dataset 台達閱讀理解資料集 Delta Reading Comprehension Dataset (DRCD) 屬於通用領域繁體中文機器閱讀理解資料集。本資料集期望成為適用於遷移學習之標準中文閱讀理解資料集。本資料集從2,108篇

272 Dec 15, 2022

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

13.6k Jan 5, 2023

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Eva

5 Nov 4, 2022

The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

Neural Machine Translation communication system The model is basically direct to convert one source language to another targeted language using encode

7 Sep 22, 2022

Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022)

SyntaxGen Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022) In this repo, we upload all the scripts for this work. Due to siz

3 Jun 13, 2022

Codes for coreference-aware machine reading comprehension

Data and code for the paper "Tracing Origins: Coreference-aware Machine Reading Comprehension" at ACL2022. Dataset There are three folders for our thr

11 Sep 29, 2022

This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 4, 2022

Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module.

Import Subtitles for Blender VSE Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module. Supported formats by py

4 Feb 27, 2022

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

15k Jan 2, 2023

lightweight, fast and robust columnar dataframe for data analytics with online update

streamdf Streamdf is a lightweight data frame library built on top of the dictionary of numpy array, developed for Kaggle's time-series code competiti

23 May 19, 2022

Python module (C extension and plain python) implementing Aho-Corasick algorithm

pyahocorasick pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search meaning that you can find mult

763 Dec 27, 2022

Python module (C extension and plain python) implementing Aho-Corasick algorithm

pyahocorasick pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search meaning that you can find mult

579 Feb 17, 2021

An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI, torch2trt to accelerate. our model support for int8, dynamic input and profiling. (Nvidia-Alibaba-TensoRT-hackathon2021)

Ultra_Fast_Lane_Detection_TensorRT An ultra fast tiny model for lane detection, using onnx_parser, TensorRTAPI to accelerate. our model support for in

121 Dec 27, 2022

Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

18 Dec 20, 2022

This is a general repo that helps you develop fast/effective NLP classifiers using Huggingface

NLP Classifier Introduction This project trains a bert model on any NLP classification model. And uses the model in make predictions on new data using

3 Mar 11, 2022

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group

8.4k Dec 30, 2022
