SpyQL - SQL with Python in the middle

Overview

PyPI: https://pypi.python.org/pypi/spyql · CI: https://travis-ci.com/dcmoura/spyql · Docs: https://spyql.readthedocs.io/en/latest/ · codecov · code style: black · license: MIT

Concept

SpyQL is a query language that combines:

  • the simplicity and structure of SQL
  • with the power and readability of Python

SELECT
    date.fromtimestamp(purchase_ts) AS purchase_date,
    price * quantity AS total
FROM csv
WHERE department.upper() == 'IT'
TO json

SQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.

SpyQL command-line tool

With the SpyQL command-line tool you can make SQL-like SELECTs powered by Python on top of text data (e.g. CSV and JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). Moreover, data can be generated by a Python iterator! Take a look at the examples section to see how to query Parquet, process API calls, traverse directories of zipped JSONs, and much more.
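
For instance, a minimal sketch of querying the output of another shell command (ls is just an arbitrary text source here; with FROM text, col1 holds each input line):

# count files per extension in the current directory (illustrative sketch)
ls -1 | spyql "SELECT col1.split('.')[-1] AS extension, count_agg(*) AS files FROM text GROUP BY 1 ORDER BY 2 DESC TO pretty"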

SpyQL also allows you to easily convert between text data formats:

  • FROM: CSV, JSON, TEXT and Python iterators (YES, you can use a list comprehension as the data source)

  • TO: CSV, JSON, SQL (INSERT statements), pretty terminal printing, and terminal plotting.

The JSON format is JSON lines, where each line has a valid JSON object or array. Piping with jq allows SpyQL to handle any JSON input (more in the examples section).
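
For example, a regular JSON array can be converted to JSON lines with jq -c '.[]' before handing it to spyql (the inline sample data is made up):

echo '[{"a": 1}, {"a": 2}]' | jq -c '.[]' | spyql "SELECT json->a AS a FROM json TO csv"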

You can leverage command-line tools to process other file types like Parquet and XML (more in the examples section).

Installation

To install SpyQL, run this command in your terminal:

pip install spyql

Hello world

To test your installation run in the terminal:

spyql "SELECT 'Hello world' as Message TO pretty"

Output:

Message
-----------
Hello world

Try replacing the output format with json and csv, and try adding more columns, e.g. run in the terminal:

spyql "SELECT 'Hello world' as message, 1+2 as three TO json"

Output:

{"message": "Hello world", "three": 3}

Principles

Right now, the focus is on building a command-line tool that follows these core principles:

  • Simple: simple to use with a straightforward implementation
  • Familiar: you should feel at home if you are acquainted with SQL and Python
  • Light: small memory footprint that allows you to process large datasets on your machine
  • Useful: it should make your life easier, filling a gap in the ecosystem

Syntax

[ IMPORT python_module [ AS identifier ] [, ...] ]
SELECT [ DISTINCT | PARTIALS ] 
    [ * | python_expression [ AS output_column_name ] [, ...] ]
    [ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
    [ WHERE python_expression ]
    [ GROUP BY output_column_number | python_expression  [, ...] ]
    [ ORDER BY output_column_number | python_expression
        [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ LIMIT row_count ]
    [ OFFSET num_rows_to_skip ]
    [ TO csv | json | spy | sql | pretty | plot ]
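
To illustrate, a single query can combine most of these clauses (a sketch over a hypothetical sales.csv, using the sum_agg aggregator described below):

spyql "
    SELECT col1 AS product, sum_agg(int(col2)) AS total_qty
    FROM csv
    WHERE col2 is not NULL
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 3
    TO pretty
" < sales.csv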

Notable differences to SQL

In SpyQL:

  • the order of the output rows is guaranteed to match the order of the input rows (when no reordering is requested)
  • the AS keyword must precede a column alias definition (it is not optional as in SQL)
  • you can always access the nth input column by using the default column names colN (e.g. col1 for the first column)
  • currently only a small subset of SQL is supported, namely SELECT statements without: sub-queries, joins, set operations, etc (check the Syntax section)
  • sub-queries are achieved by piping (see the Command line examples section)
  • aggregation functions have the suffix _agg to avoid conflicts with Python's built-in functions:

Operation                   PostgreSQL                                                  SpyQL
Sum all values of a column  SELECT sum(col_name)                                        SELECT sum_agg(col_name)
Sum an array                SELECT sum(a) FROM (SELECT unnest(array[1,2,3]) AS a) AS t  SELECT sum([1,2,3])
  • expressions are pure Python:

SQL                                 SpyQL
x = y                               x == y
x BETWEEN a AND b                   a <= x <= b
CAST(x AS INTEGER)                  int(x)
CASE WHEN x > 0 THEN 1 ELSE -1 END  1 if x > 0 else -1
upper('hello')                      'hello'.upper()
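
For instance, the SQL predicate x BETWEEN a AND b becomes a plain chained comparison inside WHERE (a sketch; ages.csv and its columns are hypothetical):

spyql "SELECT col1.upper() AS name FROM csv WHERE 18 <= int(col2) <= 65 TO json" < ages.csv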

Notable differences to Python

Additional syntax

We added additional syntax to make querying easier:

Python                         SpyQL shortcut               Purpose
json['hello']['planet earth']  json->hello->'planet earth'  Easy access of elements in dicts (e.g. JSONs)

NULL datatype

Python's None generates exceptions when making operations on missing data, breaking query execution (e.g. None + 1 throws a TypeError). To overcome this, we created a NULL type that has the same behavior as in SQL (e.g. NULL + 1 returns NULL), allowing for queries to continue processing data.

Operation                 Native Python throws  SpyQL returns  SpyQL warning
NULL + 1                  NameError             NULL
a_dict['inexistent_key']  KeyError              NULL           yes
int('')                   ValueError            NULL           yes
int('abc')                ValueError            NULL           yes

The above dictionary key access only returns NULL if the dict is an instance of NullSafeDict. SpyQL adds NullSafeDict, which extends Python's native dict. JSONs are automatically loaded as NullSafeDict. Unless you are creating dictionaries on the fly, you do not need to worry about this.
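
A quick way to see NULL handling in action (inline sample data; the missing_key field is absent on purpose):

echo '{"a": 1}' | spyql "SELECT json->a + 1 AS b, json->missing_key AS c FROM json TO json"

This should output {"b": 2, "c": null}, along with a warning about the missing key.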

Importing python modules and user-defined functions

By default, SpyQL does some commonly used imports:

  • everything from the math module
  • datetime, date and timezone from the datetime module
  • the re module

SpyQL queries support a single import statement at the beginning of the query where several modules can be imported (e.g. IMPORT numpy AS np, sys SELECT ...). Note that the python syntax from module import identifier is not supported in queries.

In addition, you can create a Python file that is loaded before executing queries. Here you can define imports, functions, variables, etc. using regular Python code. Everything defined in this file is available to all your spyql queries. The file should be located at XDG_CONFIG_HOME/spyql/init.py. If the environment variable XDG_CONFIG_HOME is not defined, it defaults to HOME/.config (e.g. /Users/janedoe/.config/spyql/init.py).
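
For example, a minimal init.py could define a helper function that becomes available in every query (the function below is illustrative, not part of SpyQL):

# XDG_CONFIG_HOME/spyql/init.py
def slugify(s):
    # lowercase a string and replace spaces with dashes
    return s.lower().replace(' ', '-')

A query like spyql "SELECT slugify(col1) FROM csv TO json" could then use it directly.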

Example queries

You can run the following example queries in the terminal: spyql "the_query" < a_data_file

Example data files are not provided in most cases.

Query a CSV (and print a pretty table)

SELECT a_col_name, 'positive' if col2 >= 0 else 'negative' AS sign
FROM csv
TO pretty

Convert CSV to a flat JSON

SELECT * FROM csv TO json

Convert from CSV to a hierarchical JSON

SELECT {'client': {'id': col1, 'name': col2}, 'price': 120.40}
FROM csv TO json

or

SELECT {'id': col1, 'name': col2} AS client, 120.40 AS price
FROM csv TO json

JSON to CSV, filtering out NULLs

SELECT json->client->id AS id, json->client->name AS name, json->price AS price
FROM json
WHERE json->client->name is not NULL
TO csv

Explode JSON to CSV

SELECT json->invoice_num AS id, json->items->name AS name, json->items->price AS price
FROM json
EXPLODE json->items
TO csv

Sample input:

{"invoice_num" : 1028, "items": [{"name": "tomatoes", "price": 1.5}, {"name": "bananas", "price": 2.0}]}
{"invoice_num" : 1029, "items": [{"name": "peaches", "price": 3.12}]}

Output:

id, name, price
1028, tomatoes, 1.5
1028, bananas, 2.0
1029, peaches, 3.12

Python iterator/list/comprehension to JSON

SELECT 10 * cos(col1 * ((pi * 4) / 90))
FROM range(80)
TO json

or

SELECT col1
FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
TO json

Importing python modules

Here we import hashlib to calculate an md5 hash for each input line. hashlib is part of Python's standard library, so no installation is needed.

IMPORT hashlib as hl
SELECT hl.md5(col1.encode('utf-8')).hexdigest()
FROM text

Getting the top 5 records

SELECT int(score) AS score, player_name
FROM csv
ORDER BY 1 DESC NULLS LAST, score_date
LIMIT 5

Aggregations

Totals by player, alphabetically ordered.

SELECT json->player_name, sum_agg(json->score) AS total_score
FROM json
GROUP BY 1
ORDER BY 1

Partial aggregations

Calculating the cumulative sum of a variable using the PARTIALS modifier. Also demoing the lag aggregator.

SELECT PARTIALS 
    json->new_entries, 
    sum_agg(json->new_entries) AS cum_new_entries,
    lag(json->new_entries) AS prev_entries
FROM json
TO json

Sample input:

{"new_entries" : 10}
{"new_entries" : 5}
{"new_entries" : 25}
{"new_entries" : null}
{}
{"new_entries" : 100}

Output:

{"new_entries" : 10,   "cum_new_entries" : 10,  "prev_entries": null}
{"new_entries" : 5,    "cum_new_entries" : 15,  "prev_entries": 10}
{"new_entries" : 25,   "cum_new_entries" : 40,  "prev_entries": 5}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": 25}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": null}
{"new_entries" : 100,  "cum_new_entries" : 140, "prev_entries": null}

If PARTIALS was omitted, the result would be equivalent to the last output row alone.
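
For reference, running the same query with PARTIALS removed on the sample input above should produce a single row:

{"new_entries" : 100, "cum_new_entries" : 140, "prev_entries": null}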

Distinct rows

SELECT DISTINCT *
FROM csv

Command line examples

To run the following examples, type Ctrl-x Ctrl-e in your terminal. This will open your default editor (e.g. emacs or vim). Paste the code of one of the examples, save and exit.

Queries on Parquet with directories

Here, find traverses a directory and executes parquet-tools for each Parquet file, dumping each file to JSON format. jq -c makes sure that the output has one JSON object per line before handing over to spyql. This is far from an efficient way to query Parquet files, but it might be a handy option if you need to do a quick inspection.

find /the/directory -name "*.parquet" -exec parquet-tools cat --json {} \; |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying multiple json.gz files

gzcat *.json.gz |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying YAML / XML / TOML files

yq converts YAML, XML and TOML files to JSON, making it easy to query any of these with spyql.

cat file.yaml | yq -c | spyql "SELECT json->a_field FROM json"
cat file.xml | xq -c | spyql "SELECT json->a_field FROM json"
cat file.toml | tomlq -c | spyql "SELECT json->a_field FROM json"

Kafka to PostgreSQL pipeline

Read data from a Kafka topic and write to the PostgreSQL table named customer.

kafkacat -b the.broker.com -t the.topic |
spyql -Otable=customer -Ochunk_size=1 --unbuffered "
	SELECT
		json->customer->id AS id,
		json->customer->name AS name
	FROM json
	TO sql
" |
psql -U an_user_name -h a.host.com a_database_name

Monitoring statistics in Kafka

Read data from a Kafka topic, continuously calculating statistics.

kafkacat -b the.broker.com -t the.topic |
spyql --unbuffered "
	SELECT PARTIALS
		count_agg(*) AS running_count,
		sum_agg(value) AS running_sum,
		min_agg(value) AS min_so_far,
		value AS current_value
	FROM json
	TO csv
" 

Sub-queries (piping)

A special file format (spy) is used to efficiently pipe data between queries.

cat a_file.json |
spyql "
	SELECT ' '.join([json->first_name, json->middle_name, json->last_name]) AS full_name
	FROM json
	TO spy" |
spyql "SELECT full_name, full_name.upper() FROM spy"

Queries over APIs

curl https://reqres.in/api/users?page=2 |
spyql "
	SELECT
		json->data->email AS email,
		'Dear {}, thank you for being a great customer!'.format(json->data->first_name) AS msg
	FROM json
	EXPLODE json->data
	TO json
"

Plotting to the terminal

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO plot
"

Plotting with gnuplot

To the terminal:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --terminal 'dumb 80,30' --exit --lines

To GUI:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --lines --points --exit

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • Interactive SpyQL

    This is v0 for interactive spyql. It implements the following new things:

    1. Q class and q function in interactive.py: This is the class/function that the user can add in their script
    2. InteractiveProcessor in processor.py: row yielding loaded by the Processor
    3. InteractiveWriter in processor.py: writer that adds the output in a list

    Other Important changes:

    1. init_vars, go and _go now have user_query_vars as a valid input
    2. Processor can now be explicitly defined as interactive

    Misc:

    1. ifs changed to elifs where needed

    Issue: #64

    To test this, run python tests/interactive_test.py.

    opened by yashbonde 12
  • Refactor `parse_structure` and allow comment in query

    The purposes of this PR is

    • Refactor parse_structure function (cf. https://github.com/dcmoura/spyql/issues/85#issue-1459257843)
    • Support for line comments (cf. https://github.com/dcmoura/spyql/issues/84)
    enhancement 
    opened by Hayashi-Yudai 9
  • Key-value aggregations and lookups

    This PR closes #59.

    First, it introduces dict_agg for key-value aggregations. Example:

    $ cat codes.csv
    MCC,MCC (int),MNC,MNC (int),ISO,Country,Country Code,Network
    289,649,88,2191,ge,Abkhazia,7,A-Mobile
    289,649,68,1679,ge,Abkhazia,7,A-Mobile
    289,649,67,1663,ge,Abkhazia,7,Aquafon
    412,1042,01,31,af,Afghanistan,93,AWCC
    412,1042,50,1295,af,Afghanistan,93,Etisalat
    412,1042,30,783,af,Afghanistan,93,Etisalat
    452,1106,04,79,vn,Vietnam,84,Viettel
    452,1106,02,47,vn,Vietnam,84,VinaPhone
    543,1347,299,665,wf,Wallis and Futuna,,Failed Calls
    543,1347,01,31,wf,Wallis and Futuna,,Manuia
    421,1057,999,2457,ye,Yemen,967,Fix Line
    421,1057,04,79,ye,Yemen,967,HITS/Y Unitel
    421,1057,01,31,ye,Yemen,967,Sabaphone
    421,1057,03,63,ye,Yemen,967,Yemen Mob. CDMA
    645,1605,01,31,zm,Zambia,260,Airtel
    645,1605,299,665,zm,Zambia,260,Failed Calls
    
    $ spyql "
        SELECT dict_agg(MCC, Country) AS json 
        FROM csv TO json
      " < codes.csv > code_country.json
    
    $ jq . code_country.json                                        
    {
      "289": "Abkhazia",
      "412": "Afghanistan",
      "452": "Vietnam",
      "543": "Wallis and Futuna",
      "421": "Yemen",
      "645": "Zambia"
    }
    

    Second, it introduces the ~~kv function~~ -J option to allow loading JSON objects for key-value lookups, emulating left equi joins. Example:

    $ cat towers_mini.csv 
    radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
    UMTS,262,2,776,165186,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    GSM,262,3,1075,5651,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    UMTS,262,2,801,165123,0,13.295516967773,52.494735717773,1000,1,1,1286864565,1286864565,0
    UMTS,262,2,801,165121,0,13.301697,52.499886,1000,4,1,1286884439,1297735807,0
    GSM,624,2,6,13883,0,9.704361,4.063911,1588,8,1,1352031870,1374212876,0
    GSM,624,2,6,34381,0,9.709009,4.066368,2484,13,1,1352031870,1380389330,0
    GSM,452,1,10068,5293,0,105.8031463623,20.959854125977,1000,2,1,1459692342,1488347104,0
    GSM,645,3,130,3282,0,28.070755004883,-14.902267456055,1000,1,1,1459743588,1459743588,0
    GSM,645,3,1,32221,0,28.252716,-15.439224,1000,2,1,1369353852,1369353852,0
    GSM,645,3,1,33932,0,28.255325,-15.436752,1000,5,1,1369353888,1369353888,0
    GSM,645,3,1,31662,0,28.258072,-15.434555,1000,5,1,1369353960,1369353960,0
    
    $ spyql -Jcountries=code_country.json "SELECT mcc, countries[mcc] AS country, count_agg(1) AS n_recs FROM csv GROUP BY 1, 2 ORDER BY 1 TO pretty" < towers_mini.csv
      mcc  country      n_recs
    -----  ---------  --------
      262                    4
      452  Vietnam           1
      624                    2
      645  Zambia            4
    
    opened by dcmoura 7
  • Code improvements

    Some cases where the code could be more pythonic:

    • some for loops where range(len(l)) might be replaced by enumerate
    • some functions where yield might be used to return a generator

    Changes should be assessed for performance.

    enhancement good first issue 
    opened by dcmoura 6
  • code improvement for issue #60

    Improve the code.

    TODOs

    • [x] Fix code to more pythonic style
      • Fix import style
      • Remove some unused imports
    • [x] Fix error in lint by flake8

    Issue: https://github.com/dcmoura/spyql/issues/60

    opened by Hayashi-Yudai 5
  • LIKE clause

    Adds a LIKE function to the WHERE clause for easier matching. Closes #17

    Some notes

    • Right now it can match not only with strings but also with everything else, as the values are stringified before matching. Let me know if this is not intended and if it should be for strings only.
    • It only supports the % wildcard operator.
    • Let me know if the commit messages should follow a specific format

    Future work

    Either in the scope of this PR or in a future one, it is still missing other basic matching functions, like _ for single character matching or [] to specify a range.

    enhancement 
    opened by ricardocchaves 4
  • Fixes git-actions tests on py3.6 and adds py3.11

    The Python 3.6 environment stopped launching. This seems to be related to an upgrade of the Ubuntu version used by the runners, which dropped support for py3.6.

    Solution based on https://github.com/actions/setup-python/issues/543#issuecomment-1335614916

    opened by dcmoura 3
  • [FR] COUNT(DISTINCT col1) ... GROUP BY

    Thank you for the great tool!

    I'd like to request a feature that works like COUNT(DISTINCT col1) ... GROUP BY in SQL. I tried count_agg(distinct col1), but got a syntax error.

    opened by Minyus 3
  • Full-featured documentation

    This PR proposes brand new documentation. This PR closes #34.

    Please ignore the commits... the history of this PR is quite disturbing 😅 Everything will be squashed into a single commit when merging to master.

    Please focus on 1) the structure and then 2) the content of:

    • https://github.com/dcmoura/spyql/blob/docs/README.rst
    • https://spyql.readthedocs.io/en/docs/

    I am proposing deprecating the -> operator. It will still be supported but is not mentioned in the documentation. From now on, the . (dot) operator should be the standard way of accessing fields in dicts, followed by standard Python syntax (a_dict["key"]). The major drawback is that we will need to do a new video demoing the CLI and update some posts (e.g. on Stack Overflow).

    There is still some work to do on the lib side. Namely, we should add a section under Examples/Recipes. However, there is still some dev work to do, namely on accessing Pandas dataframes and series, as well as numpy arrays. Eventually we can also visit polars and other libs for tabular data. When we do that, we can extend the documentation.

    In this first review let's try to focus on the big fish, and not so much on the details. My goal is to publish the documentation soon, and then we can iterate on it over time. Thanks!

    documentation 
    opened by dcmoura 3
  • Column names with TEXT format

    When using TEXT as an input format, I couldn't find a way of configuring the column name. The column alias seems to be applied only when we don't specify * in the SELECT statement, which we have to do for TEXT files. Am I missing something? Here's an example:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("col1") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    cat /tmp/test.txt | spyql -Otable=domains 'Select 1 as colname from text To SQL' 
    INSERT INTO "domains"("colname") VALUES (1),(1),(1),(1),(1);
    

    Expected result would be:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("colname") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    question 
    opened by henshin 3
  • Cannot dump JSON object with Null values to string

    Use Case

    I'm using the CLI to import JSON Line log files into a Postgres database. One of the columns I'm trying to import is jsonb, and needs to be inserted as a JSON string.

    Approach

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": 2}} TO sql'
    INSERT INTO "table_name"("dumps_a") VALUES ('{"foo": 1, "bar": 2}');
    

    This works fine until it hits a null value:

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    TypeError: Object of type NullType is not JSON serializable
    

    I also tried passing the default kwarg to json.dumps and hit a different error:

    spyql 'IMPORT json SELECT json.dumps(.a, default=str) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a, default=str)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    ValueError: Circular reference detected
    

    Very open to alternative approaches; thank you so much for making this great tool available. :)

    opened by mgburns 2
  • [FR] join 2 CSV files

    Hi @dcmoura , I see examples of joining with a JSON file in the documentation; would joining 2 CSV files be supported? https://spyql.readthedocs.io/en/latest/recipes.html?highlight=join#equi-joins

    opened by Minyus 6
  • general text file support

    I'm using spyql to handle general text files (single-column, not comma-separated), but ran into a problem.

    A single-column text file like the following is misinterpreted as multiple columns by spyql.

    test.csv:

    user_id
    a11e11
    b22e22
    
    $ spyql "SELECT * FROM csv('test.csv') TO pretty"
    us      r_id
    ----  ------
    a11       11
    b22       22
    

    If there are no common characters among rows, the following error is returned.

    ERROR   Could not detect CSV dialect from input
    _csv.Error: Could not determine delimiter
    

    I found that explicitly setting a delimiter like csv('test.csv', delimiter=',') works as a workaround in the current version, but I would like to suggest the following options.

    [Option 1] Modify default behavior of csv()

    1. As a delimiter character, search only "," (especially not alphanumeric characters)
    2. If no delimiter is detected (and each row is not too long), read it as a single-column text file rather than returning an error

    [Option 2] Add an alternative to csv() which can read a single-column text file (something like txt())

    opened by Minyus 6
  • Cannot run CI/CD pipe on Python 3.11

    Tried to add Python 3.11 but broke the build:

    Installing build dependencies: started
      Installing build dependencies: finished with status 'done'
      Getting requirements to build wheel: started
      Getting requirements to build wheel: finished with status 'done'
      Preparing metadata (pyproject.toml): started
      Preparing metadata (pyproject.toml): finished with status 'error'
      error: subprocess-exited-with-error

      × Preparing metadata (pyproject.toml) did not run successfully.
      │ exit code: 1
      ╰─> [11 lines of output]
          💥 maturin failed
            Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
            Caused by: `cargo metadata` exited with an error: error: failed to run `rustc` to learn about target-specific information

          Caused by:
            process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -Z mutable-noalias=yes -C target-feature=+sse2 --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 1)
            --- stderr
            error: the option `Z` is only accepted on the nightly compiler
          Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/tmp/pip-modern-metadata-2ryt4i58', '--interpreter', '/opt/hostedtoolcache/Python/3.11.0/x64/bin/python', '--manylinux=off', '--strip=on']' returned non-zero exit status 1.
          Checking for Rust toolchain....
          Running `maturin pep517 write-dist-info --metadata-directory /tmp/pip-modern-metadata-2ryt4i58 --interpreter /opt/hostedtoolcache/Python/3.11.0/x64/bin/python --manylinux=off --strip=on`
          [end of output]

      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.
    Error: Process completed with exit code 1.
    
    bug good first issue 
    opened by dcmoura 0
  • Support reading and writing of JSON objects (not JSON lines)

    Currently SpyQL only allows reading and writing JSON lines. Writing JSON arrays can be done using dict_agg, aggregating everything into an array and writing a JSON with a single line.

    The idea is to add an argument lines=True to the json and orjson writers and processors. The processor should be able to handle single-object files as well as arrays of objects or arrays of scalars. When lines is False, the processor should load the full input into memory and then parse it. While this is not ideal, it is the most straightforward implementation. In addition, arrays of JSON shouldn't be used for large data; in those cases JSON lines should be used instead.

    The writer should write an array of objects when lines is False.

    core 
    opened by dcmoura 1