A fast streaming JSON parser for Python that generates SAX-like events using yajl

Overview

json-streamer

jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities of any JSON object. It is based on the fast C library 'yajl'. It is well suited to parsing JSON as it streams in over a network, or JSON objects that are too large to hold in memory all at once.

Dependencies

git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install

Setup

pip3 install jsonstreamer

Also available on PyPI - https://pypi.python.org/pypi/jsonstreamer

Example

Shell

python -m jsonstreamer.jsonstreamer < some_file.json

Code

variables which contain the input we want to parse

json_object = """
    {
        "fruits":["apple","banana", "cherry"],
        "calories":[100,200,50]
    }
"""
json_array = """[1,2,true,[4,5],"a"]"""

a catch-all event listener function which prints the events

def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))

JSONStreamer Example

Event listeners get events in their parameters and must have appropriate signatures for receiving their specific event of interest.

JSONStreamer provides the following events:

  • doc_start
  • doc_end
  • object_start
  • object_end
  • array_start
  • array_end
  • key - this also carries the name of the key as a string param
  • value - this also carries the value as a string|int|float|boolean|None param
  • element - this also carries the value as a string|int|float|boolean|None param

Listener methods must have signatures that match

For example, for the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener must be declared as follows. Note that no params are required.

def listener():
    pass

OR, if your listener is a method of a class, it can take an additional 'self' param, as follows

def listener(self):
    pass

For the events key, value and element, listeners must also receive a payload and must be declared as follows

def key_listener(key_string):
    pass
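
The signature rule can be checked before a listener is attached. The sketch below uses only the standard library and assumes that payloads are passed as positional arguments, as the catch-all listener's `*args` suggests. The `accepts` helper is hypothetical and is not part of jsonstreamer.

```python
import inspect


def accepts(listener, *payload):
    """Return True if `listener` can be called with the given positional payload."""
    try:
        # bind() raises TypeError when the arguments do not fit the signature
        inspect.signature(listener).bind(*payload)
    except TypeError:
        return False
    return True


def start_listener():
    pass


def key_listener(key_string):
    pass


# doc_start carries no payload; key carries exactly one value
print(accepts(start_listener))            # True
print(accepts(key_listener, 'fruits'))    # True
print(accepts(start_listener, 'fruits'))  # False
print(accepts(key_listener))              # False
```

A check like this turns a mismatched listener into an early, readable failure instead of a TypeError raised in the middle of parsing.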

import and run jsonstreamer on 'json_object'

from jsonstreamer import JSONStreamer 

print("\nParsing the json object:")
streamer = JSONStreamer() 
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10]) #note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()

output

Parsing the json object:
    doc_start : ()
    object_start : ()
    key : ('fruits',)
    array_start : ()
    element : ('apple',)
    element : ('banana',)
    element : ('cherry',)
    array_end : ()
    key : ('calories',)
    array_start : ()
    element : (100,)
    element : (200,)
    element : (50,)
    array_end : ()
    object_end : ()
    doc_end : ()

run jsonstreamer on 'json_array'

print("\nParsing the json array:")
streamer = JSONStreamer() #can't reuse old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()

output

Parsing the json array:
    doc_start : ()
    array_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    array_start : ()
    element : (4,)
    element : (5,)
    array_end : ()
    element : ('a',)
    array_end : ()
    doc_end : ()
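
Because consume() accepts partial input, a file or socket can be fed to the streamer piece by piece. The helper below is a minimal sketch of that loop. It assumes only that the streamer exposes consume(str) and close(), as in the examples above. The names `feed_in_chunks` and `_RecordingStreamer` are hypothetical, and the stand-in class exists only so the sketch runs without yajl installed.

```python
import io


def feed_in_chunks(source, streamer, chunk_size=4096):
    """Read text from `source` in fixed-size chunks and push each one to `streamer`.

    `streamer` is any object with consume(str) and close(), e.g. a JSONStreamer.
    Returns the number of chunks that were pushed.
    """
    chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty string means end of input
            break
        streamer.consume(chunk)
        chunks += 1
    streamer.close()
    return chunks


class _RecordingStreamer:
    """Stand-in for a real streamer; it only records what it is given."""

    def __init__(self):
        self.received = []
        self.closed = False

    def consume(self, data):
        self.received.append(data)

    def close(self):
        self.closed = True


demo = _RecordingStreamer()
count = feed_in_chunks(io.StringIO('[1,2,true,[4,5],"a"]'), demo, chunk_size=5)
print(count, demo.received)
```

With a real JSONStreamer in place of the stand-in, listeners fire as each chunk arrives, so memory use is bounded by the chunk size rather than the document size.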

ObjectStreamer Example

ObjectStreamer provides the following events:

  • object_stream_start
  • object_stream_end
  • array_stream_start
  • array_stream_end
  • pair
  • element

import and run ObjectStreamer on 'json_object'

from jsonstreamer import ObjectStreamer

print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()

output

Parsing the json object:
    object_stream_start : ()
    pair : (('fruits', ['apple', 'banana', 'cherry']),)
    pair : (('calories', [100, 200, 50]),)
    object_stream_end : ()

run the ObjectStreamer on the 'json_array'

print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()

output - note that the events are different for an array

Parsing the json array:
    array_stream_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    element : ([4, 5],)
    element : ('a',)
    array_stream_end : ()
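
To show how these events map back to data, here is a sketch of a catch-all listener that rebuilds the top-level container. It has the same (event_name, *args) signature as _catch_all. The class name `TopLevelCollector` is hypothetical, and the demo replays the events printed above rather than running the parser, so it works without yajl.

```python
class TopLevelCollector:
    """Catch-all listener that rebuilds the top-level container from ObjectStreamer events."""

    def __init__(self):
        self.result = None

    def __call__(self, event_name, *args):
        if event_name == 'object_stream_start':
            self.result = {}
        elif event_name == 'array_stream_start':
            self.result = []
        elif event_name == 'pair':
            key, value = args[0]  # the payload is a single (key, value) tuple
            self.result[key] = value
        elif event_name == 'element':
            self.result.append(args[0])
        # the *_stream_end events need no action here


# Replay the object events shown in the output above
collector = TopLevelCollector()
for name, args in [
    ('object_stream_start', ()),
    ('pair', (('fruits', ['apple', 'banana', 'cherry']),)),
    ('pair', (('calories', [100, 200, 50]),)),
    ('object_stream_end', ()),
]:
    collector(name, *args)

print(collector.result)
```

An instance can be passed to add_catch_all_listener in place of _catch_all. Collecting everything defeats the purpose of streaming for very large inputs, so in practice a listener would process each pair or element and then discard it.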

Example on attaching listeners for various events

ob_streamer = ObjectStreamer()

def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0],pair[1]))
    
ob_streamer.add_listener('pair', pair_listener) #same for JSONStreamer
ob_streamer.consume(json_object)

ob_streamer.remove_listener(pair_listener) #if you need to remove the listener explicitly

Even easier way of attaching listeners

class MyClass:
    
    def __init__(self):
        self._obj_streamer = ObjectStreamer() #same for JSONStreamer
        
        # this automatically finds listeners in this class and attaches them if they are named
        # using the convention '_on_eventname'. Note the method names in this class
        self._obj_streamer.auto_listen(self) 
    
    def _on_object_stream_start(self):
        print ('Root Object Started')
        
    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0],pair[1]))
        
    def parse(self, data):
        self._obj_streamer.consume(data)
        
        
m = MyClass()
m.parse(json_object)
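
The '_on_eventname' convention can be illustrated with a small stand-in event source. This is a sketch of the idea only, under the assumption that discovery works by method name. It is not jsonstreamer's actual implementation, and `MiniEventSource`, `fire` and `Recorder` are hypothetical names.

```python
class MiniEventSource:
    """Hypothetical stand-in that mimics name-based listener discovery."""

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event_name, listener):
        self._listeners.setdefault(event_name, []).append(listener)

    def auto_listen(self, observer, prefix='_on_'):
        # Attach every callable attribute named '<prefix><event_name>'
        for name in dir(observer):
            if name.startswith(prefix):
                method = getattr(observer, name)
                if callable(method):
                    self.add_listener(name[len(prefix):], method)

    def fire(self, event_name, *args):
        # Events without a registered listener are silently ignored
        for listener in self._listeners.get(event_name, []):
            listener(*args)


class Recorder:
    def __init__(self):
        self.seen = []

    def _on_pair(self, pair):
        self.seen.append(pair)


source = MiniEventSource()
recorder = Recorder()
source.auto_listen(recorder)
source.fire('pair', ('fruits', ['apple']))
source.fire('element', 1)  # Recorder defines no _on_element, so nothing happens
print(recorder.seen)
```

The design choice is that a class only declares the events it cares about, and a misspelled method name simply never fires, which is worth keeping in mind when a listener appears to be ignored.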

Troubleshooting

  • If you get an OSError('Yajl cannot be found.'), please ensure that libyajl is available in the relevant directory. For example, on macOS, /usr/local/lib should contain libyajl.dylib. The equivalent file on Linux is libyajl.so, and on Windows it is yajl.dll.
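
A quick way to see whether the operating system can resolve a yajl shared library at all is to ask ctypes directly. This is a diagnostic sketch using the standard library only. It does not reproduce how jsonstreamer itself searches for the library, so a positive result here does not guarantee that the import succeeds. The function names are hypothetical.

```python
import ctypes
import ctypes.util


def locate_yajl():
    """Return the name or path that ctypes resolves for yajl, or None if not found."""
    return ctypes.util.find_library('yajl')


def try_load_yajl():
    """Return a loaded library handle, or None if yajl cannot be found or loaded."""
    found = locate_yajl()
    if found is None:
        return None
    try:
        return ctypes.CDLL(found)
    except OSError:
        return None


print('find_library:', locate_yajl())
print('loadable:', try_load_yajl() is not None)
```

If find_library reports None, the library is not on the system's search path, and installing it or adjusting the library path is the first thing to try.
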
Comments
  • Trouble using 'jsonstreamer' with 'yajl-2' on Ubuntu 14.04

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer to implement a Streaming API.

    As directed, I have installed yajl on my Ubuntu 14.04 system and verified its presence and correct installation (refer: [1] & [2]).

    Still, running the command python3 -m jsonstreamer.jsonstreamer < test.json gives me the following:

      File "/usr/local/lib/python3.4/dist-packages/jsonstreamer/yajl/parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.
    

    Following up in https://github.com/lloyd/yajl/issues/190, it seems that there might be an issue in the parse.py file itself. Maybe it's looking for yajl1 and not yajl2.

    Any pointers on this one? Help appreciated.


    [1] Running gcc -lyajl yields:

    jigyasa@spin:~$ gcc -lyajl
    ....
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: error: ld returned 1 exit status
    

    [2] And sudo ldconfig -p | grep yajl results in:

    jigyasa@spin:~$ sudo ldconfig -p | grep yajl
        libyajl.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libyajl.so.2
    
    opened by jigyasa-grover 10
  • Ensure exception __str__ methods return strings

    Hi there,

    Errors raised as JSONStreamerException classes are difficult to debug because their __str__ methods are not guaranteed to return a str. This makes debugging a PITA.

    awesome_module.py", line 51, in map_step
        url + '\n' + str(e))
    TypeError: __str__ returned non-string (type bytes)
    
    opened by mach-kernel 3
  • Missing tests & tags

    PyPI has 1.3.6, and no tests.

    GitHub only has a tag for v1.0.0, so I can't use that.

    Could you tag v1.3.6 in GitHub, so I can use it to get tests, and finish https://build.opensuse.org/package/show/home:jayvdb:py-new/python-jsonstreamer after https://github.com/kashifrazzaqui/again/issues/8 is also fixed?

    opened by jayvdb 2
  • SyntaxError: invalid syntax

    Traceback (most recent call last):
      File "test_jsonstreamer.py", line 3, in <module>
        from jsonstreamer import JSONStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/jsonstreamer.py", line 12, in <module>
        from again import events
      File "/usr/local/lib/python2.7/dist-packages/again/__init__.py", line 4, in <module>
        from .events import EventSource, AsyncEventSource
      File "/usr/local/lib/python2.7/dist-packages/again/events.py", line 49
        yield from each(*args, **kwargs)
        ^
    SyntaxError: invalid syntax

    python --version
    Python 2.7.3

    opened by tuhaolam 2
  • Want to split a 22M JSON file into smaller files to track a problem

    I have a large JSON file that has an error somewhere. I want to split the JSON file into smaller files that are also JSON so that I can find out where the error is. Is this possible with your package?

    opened by winash12 1
  • Trouble using 'jsonstreamer' with 'yajl' on Windows 10

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer to implement a Streaming API.

    As directed, I have built and installed yajl on my Windows 10 system as below:

    C:\Users\mianand\Downloads\lloyd-yajl-2.1.0-0-ga0ecdde\lloyd-yajl-66cb08c\build>nmake install

    Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
    Copyright (C) Microsoft Corporation. All rights reserved.

    [ 30%] Built target yajl_s
    [ 60%] Built target yajl
    [ 66%] Built target yajl_test
    [ 72%] Built target gen-extra-close
    [ 78%] Built target json_reformat
    [ 84%] Built target json_verify
    [ 90%] Built target parse_config
    [100%] Built target perftest
    Install the project...
    -- Install configuration: "Release"
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.dll
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl_s.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_parse.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_gen.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_common.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_tree.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_version.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/share/pkgconfig/yajl.pc
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_reformat.exe
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_verify.exe

    Still, importing it in a conda environment with Python 3.6 gives me the following:

    from jsonstreamer import JSONStreamer
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\jsonstreamer.py", line 14, in <module>
        from .yajl.parse import YajlParser, YajlListener, YajlError
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 32, in <module>
        yajl = load_lib()
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.

    Any pointers on this one? Help appreciated.

    opened by mitendraanand 1
  • Not looking for yajl.dll when loading Yajl

    In the method load_lib(), there is never an attempt to load Yajl from yajl.dll, which is the name of the Yajl library on Windows. I think it would be rather easy to add this and make this package usable on Windows as well.

    opened by Groomtar 1
  • pypi version ahead of master branch

    Please update the PyPI entry of json-streamer https://pypi.python.org/pypi/jsonstreamer/1.3.6 and consider linking there from the short text description here.

    opened by johnyf 1
  • outdated pypi package

    Hi,

    Could you update the PyPI package? As far as I can see, there have been some commits since the last PyPI upload. Also, I think it is a bit confusing that there is one tagged release, 1.0, while the PyPI package has version number 1.3.6, and both of them are almost a year older than some important fixes, e.g. the exponential floats. (I can install the package on my own, but I think it would be nice to update the releases.)

    opened by dvolgyes 0
Releases(v1.3.8)
Owner
Kashif Razzaqui
https://medium.com/@kashifrazzaqui