A non-validating SQL parser module for Python

Andi Albrecht

Last update: Jan 4, 2023

Related tags

Text Processing sqlparse

Overview

python-sqlparse - Parse SQL statements

sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.

The module is compatible with Python 3.5+ and released under the terms of the New BSD license.

Visit the project page at https://github.com/andialbrecht/sqlparse for further information about this project.

Quick Start

$ pip install sqlparse

>>> import sqlparse

>>> # Split a string containing two SQL statements:
>>> raw = 'select * from foo; select * from bar;'
>>> statements = sqlparse.split(raw)
>>> statements
['select * from foo;', 'select * from bar;']

>>> # Format the first statement and print it out:
>>> first = statements[0]
>>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
SELECT *
FROM foo;

>>> # Parsing a SQL statement:
>>> parsed = sqlparse.parse('select * from foo')[0]
>>> parsed.tokens
[<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
>>>

Links

Project page: https://github.com/andialbrecht/sqlparse
Bug tracker: https://github.com/andialbrecht/sqlparse/issues
Documentation: https://sqlparse.readthedocs.io/
Online Demo: https://sqlformat.org/

sqlparse is licensed under the BSD license.

Parts of the code are based on pygments written by Georg Brandl and others. pygments-Homepage: http://pygments.org/

Comments

$FOO.BAR vs FOO.BAR for grouping it as IdentifierList

https://github.com/andialbrecht/sqlparse/blob/9f44d54c07180b826a6276d3acf5e1458b507c3f/sqlparse/utils.py#L81-L90

FOO has a type of Name while $FOO has a type of Name.Placeholder. As FOO.BAR is parsed as an IdentifierList, I think $FOO.BAR should be parsed as IdentiferList as well.

Line 90 does not handle a tuple of Tokens such as (Token.Name, Token.Symbol) properly. In such a case, it would wrap t into a list, which looks like a bug to me.

I think this line should be written to wrap t into a list only if t is a single Token item. Otherwise, do not wrap it. For this intent, I'd suggest to change this line as follows: types = [t, ] if t and not isinstance(t[0], tuple) else t

opened by corncat-liu 0
JOIN could be better

Hey,

I just did my first test with a JOIN. At the end you see the "ON", which could look better with an "\n" infront. < snip > JOIN (SELECT hour(TIMESTAMP) AS HOUR, round(avg(value), 2) AS Home_Consumtion_P FROM history WHERE TIMESTAMP > §timestamp_begin§ AND TIMESTAMP < DATE_ADD(§timestamp_begin§,INTERVAL 1 DAY) AND DEVICE = @generator AND READING = @Home_Consumtion_P AND VALUE > 10 GROUP BY 1) X3 ON X1.HOUR = X2.HOUR AND X1.HOUR = X3.HOUR) X;

You did a graet job! Christian
Enhancement

opened by cheick66 4

Configurable syntax

This PR makes the Lexer a singleton class. This object carries the configured syntax as instance attributes. A library user who has non-standard syntax requirements is able to adapt the behavior of the Lexer to meet her needs. As an example for how to do this, please see the relevant test:

def test_configurable_syntax():
    sql = """select * from foo BACON SPAM EGGS;"""
    tokens = sqlparse.parse(sql)[0]

    assert list(
        (t.ttype, t.value) for t in tokens if t.ttype not in sqlparse.tokens.Whitespace
    ) == [
        (sqlparse.tokens.Keyword.DML, "select"),
        (sqlparse.tokens.Wildcard, "*"),
        (sqlparse.tokens.Keyword, "from"),
        (None, "foo BACON"),
        (None, "SPAM EGGS"),
        (sqlparse.tokens.Punctuation, ";"),
    ]

    Lexer().add_keywords(
        {
            "BACON": sqlparse.tokens.Name.Builtin,
            "SPAM": sqlparse.tokens.Keyword,
            "EGGS": sqlparse.tokens.Keyword,
        }
    )

    tokens = sqlparse.parse(sql)[0]

    assert list(
        (t.ttype, t.value) for t in tokens if t.ttype not in sqlparse.tokens.Whitespace
    ) == [
        (sqlparse.tokens.Keyword.DML, "select"),
        (sqlparse.tokens.Wildcard, "*"),
        (sqlparse.tokens.Keyword, "from"),
        (None, "foo"),
        (sqlparse.tokens.Name.Builtin, "BACON"),
        (sqlparse.tokens.Keyword, "SPAM"),
        (sqlparse.tokens.Keyword, "EGGS"),
        (sqlparse.tokens.Punctuation, ";"),
    ]
    # reset the syntax for later tests.
    Lexer().default_initialization()

opened by mrmasterplan 8

Split does not work properly

The split does not work properly for the below Oracle Stored procedure

CREATE OR REPLACE EDITIONABLE PROCEDURE "OWB_ADMIN"."AZBJ_RI_LIMIT_PROC" AS week_date_first DATE := TRUNC (azbj_pme_api.opus_date - 7); week_date_last DATE := TRUNC (azbj_pme_api.opus_date - 1); BEGIN logtrace ('LOG', 10001, 'PROC AZBJ_RI_LIMIT_PROC STARTED', 'AZBJ_RI_LIMIT_PROC'); BEGIN EXECUTE IMMEDIATE ('DROP TABLE STG_RI_LIMIT_DATA PURGE'); EXCEPTION WHEN OTHERS THEN NULL; END; END;

Bug

opened by corncat-liu 1
Split does not work for REPLACE PROCEDURE

The below one REPLACE PROCEDURE is split into multiple as in screenshot

REPLACE PROCEDURE WDP1POMD_OBJECT.TYPE_DO_DELETE( P_OBJECT_TYPE_CODE VARCHAR(1)) SQL SECURITY CREATOR #MAIN# : BEGIN DECLARE GV_PACKAGE_NAME VARCHAR(50) DEFAULT 'WDP1POMD_OBJECT_TYPE'; DECLARE gv_module_name VARCHAR(50); DECLARE EXIT HANDLER FOR SQLState '02000' RESIGNAL SQLSTATE 'T7665' SET MESSAGE_TEXT = '- 20011, Record not found.'; DECLARE EXIT HANDLER FOR SQLException BEGIN CALL WDP1POMD_UTIL.omd_fatal_log2(GV_PACKAGE_NAME || '.' || GV_MODULE_NAME, SQLCODE); RESIGNAL; END; SET GV_MODULE_NAME = 'DO_DELETE';

DELETE FROM WDP1TOMD.OMD_OBJECT_TYPE WHERE OBJECT_TYPE_CODE = P_OBJECT_TYPE_CODE; END;

Needs Feedback

opened by corncat-liu 1

Owner

Andi Albrecht

GitHub

Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

1 Nov 17, 2021

This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.

Indonesian Text Summarization Using FastAPI This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorit

2 Nov 3, 2022

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022

A simple Python module for parsing human names into their individual components

Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff

574 Dec 20, 2022

PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

19 Dec 26, 2022

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

1.4k Jan 2, 2023

Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

1.8k Jan 8, 2023

Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

8.8k Jan 8, 2023

Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

194 Nov 19, 2022

An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([email protected]). Packaged by Peter Waller <[email protected]>,

1.1k Jan 2, 2023

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

3k Jan 2, 2023

Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

471 Dec 20, 2022

Python Lex-Yacc

2.4k Dec 31, 2022

Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

1.7k Dec 27, 2022

Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

3.1k Dec 29, 2022

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

1.3k Dec 22, 2022

A non-validating SQL parser module for Python

Related tags

Overview

python-sqlparse - Parse SQL statements

Quick Start

Links

Comments

$FOO.BAR vs FOO.BAR for grouping it as IdentifierList

JOIN could be better

Configurable syntax

Split does not work properly

Split does not work for REPLACE PROCEDURE

Owner

Andi Albrecht

Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

A simple Python module for parsing human names into their individual components

PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

Python character encoding detector

Fuzzy String Matching in Python

Paranoid text spacing in Python

An implementation of figlet written in Python

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

Python flexible slugify function

Python Lex-Yacc

Python library for creating PEG parsers

Python port of Google's libphonenumber

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

An anthology of a variety of tools for the Persian language in Python

Widevine KEY Extractor in Python

A Python app which can convert normal text to Handwritten text.