A non-validating SQL parser module for Python

Overview

python-sqlparse - Parse SQL statements

buildstatus coverage docs

sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.

The module is compatible with Python 3.5+ and released under the terms of the New BSD license.

Visit the project page at https://github.com/andialbrecht/sqlparse for further information about this project.

Quick Start

$ pip install sqlparse
>>> import sqlparse

>>> # Split a string containing two SQL statements:
>>> raw = 'select * from foo; select * from bar;'
>>> statements = sqlparse.split(raw)
>>> statements
['select * from foo;', 'select * from bar;']

>>> # Format the first statement and print it out:
>>> first = statements[0]
>>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
SELECT *
FROM foo;

>>> # Parsing a SQL statement:
>>> parsed = sqlparse.parse('select * from foo')[0]
>>> parsed.tokens
[<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
>>>

Links

Project page
https://github.com/andialbrecht/sqlparse
Bug tracker
https://github.com/andialbrecht/sqlparse/issues
Documentation
https://sqlparse.readthedocs.io/
Online Demo
https://sqlformat.org/

sqlparse is licensed under the BSD license.

Parts of the code are based on pygments written by Georg Brandl and others. pygments-Homepage: http://pygments.org/

Comments
  • $FOO.BAR vs FOO.BAR for grouping it as IdentifierList

    $FOO.BAR vs FOO.BAR for grouping it as IdentifierList

    https://github.com/andialbrecht/sqlparse/blob/9f44d54c07180b826a6276d3acf5e1458b507c3f/sqlparse/utils.py#L81-L90

    FOO has a type of Name while $FOO has a type of Name.Placeholder. As FOO.BAR is parsed as an IdentifierList, I think $FOO.BAR should be parsed as IdentiferList as well.

    Line 90 does not handle a tuple of Tokens such as (Token.Name, Token.Symbol) properly. In such a case, it would wrap t into a list, which looks like a bug to me.

    I think this line should be written to wrap t into a list only if t is a single Token item. Otherwise, do not wrap it. For this intent, I'd suggest to change this line as follows: types = [t, ] if t and not isinstance(t[0], tuple) else t

    opened by corncat-liu 0
  • JOIN could be better

    JOIN could be better

    Hey,

    I just did my first test with a JOIN. At the end you see the "ON", which could look better with an "\n" infront. < snip > JOIN (SELECT hour(TIMESTAMP) AS HOUR, round(avg(value), 2) AS Home_Consumtion_P FROM history WHERE TIMESTAMP > §timestamp_begin§ AND TIMESTAMP < DATE_ADD(§timestamp_begin§,INTERVAL 1 DAY) AND DEVICE = @generator AND READING = @Home_Consumtion_P AND VALUE > 10 GROUP BY 1) X3 ON X1.HOUR = X2.HOUR AND X1.HOUR = X3.HOUR) X;

    You did a graet job! Christian

    Enhancement 
    opened by cheick66 4
  • Configurable syntax

    Configurable syntax

    This PR makes the Lexer a singleton class. This object carries the configured syntax as instance attributes. A library user who has non-standard syntax requirements is able to adapt the behavior of the Lexer to meet her needs. As an example for how to do this, please see the relevant test:

    def test_configurable_syntax():
        sql = """select * from foo BACON SPAM EGGS;"""
        tokens = sqlparse.parse(sql)[0]
    
        assert list(
            (t.ttype, t.value) for t in tokens if t.ttype not in sqlparse.tokens.Whitespace
        ) == [
            (sqlparse.tokens.Keyword.DML, "select"),
            (sqlparse.tokens.Wildcard, "*"),
            (sqlparse.tokens.Keyword, "from"),
            (None, "foo BACON"),
            (None, "SPAM EGGS"),
            (sqlparse.tokens.Punctuation, ";"),
        ]
    
        Lexer().add_keywords(
            {
                "BACON": sqlparse.tokens.Name.Builtin,
                "SPAM": sqlparse.tokens.Keyword,
                "EGGS": sqlparse.tokens.Keyword,
            }
        )
    
        tokens = sqlparse.parse(sql)[0]
    
        assert list(
            (t.ttype, t.value) for t in tokens if t.ttype not in sqlparse.tokens.Whitespace
        ) == [
            (sqlparse.tokens.Keyword.DML, "select"),
            (sqlparse.tokens.Wildcard, "*"),
            (sqlparse.tokens.Keyword, "from"),
            (None, "foo"),
            (sqlparse.tokens.Name.Builtin, "BACON"),
            (sqlparse.tokens.Keyword, "SPAM"),
            (sqlparse.tokens.Keyword, "EGGS"),
            (sqlparse.tokens.Punctuation, ";"),
        ]
        # reset the syntax for later tests.
        Lexer().default_initialization()
    
    opened by mrmasterplan 8
  • Split does not work properly

    Split does not work properly

    The split does not work properly for the below Oracle Stored procedure

    CREATE OR REPLACE EDITIONABLE PROCEDURE "OWB_ADMIN"."AZBJ_RI_LIMIT_PROC" AS week_date_first DATE := TRUNC (azbj_pme_api.opus_date - 7); week_date_last DATE := TRUNC (azbj_pme_api.opus_date - 1); BEGIN logtrace ('LOG', 10001, 'PROC AZBJ_RI_LIMIT_PROC STARTED', 'AZBJ_RI_LIMIT_PROC'); BEGIN EXECUTE IMMEDIATE ('DROP TABLE STG_RI_LIMIT_DATA PURGE'); EXCEPTION WHEN OTHERS THEN NULL; END; END;

    Screen Shot 2022-11-26 at 8 33 56 PM

    Bug 
    opened by corncat-liu 1
  • Split does not work for REPLACE PROCEDURE

    Split does not work for REPLACE PROCEDURE

    The below one REPLACE PROCEDURE is split into multiple as in screenshot

    REPLACE PROCEDURE WDP1POMD_OBJECT.TYPE_DO_DELETE( P_OBJECT_TYPE_CODE VARCHAR(1)) SQL SECURITY CREATOR #MAIN# : BEGIN DECLARE GV_PACKAGE_NAME VARCHAR(50) DEFAULT 'WDP1POMD_OBJECT_TYPE'; DECLARE gv_module_name VARCHAR(50); DECLARE EXIT HANDLER FOR SQLState '02000' RESIGNAL SQLSTATE 'T7665' SET MESSAGE_TEXT = '- 20011, Record not found.'; DECLARE EXIT HANDLER FOR SQLException BEGIN CALL WDP1POMD_UTIL.omd_fatal_log2(GV_PACKAGE_NAME || '.' || GV_MODULE_NAME, SQLCODE); RESIGNAL; END; SET GV_MODULE_NAME = 'DO_DELETE';

    DELETE FROM WDP1TOMD.OMD_OBJECT_TYPE WHERE OBJECT_TYPE_CODE = P_OBJECT_TYPE_CODE; END;

    Screen Shot 2022-11-26 at 8 18 11 PM

    Needs Feedback 
    opened by corncat-liu 1
Owner
Andi Albrecht
Andi Albrecht
Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

Esteban Borai 1 Nov 17, 2021
This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.

Indonesian Text Summarization Using FastAPI This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorit

Viqi Nurhaqiqi 2 Nov 3, 2022
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

Antti Haapala 1.2k Dec 16, 2022
A simple Python module for parsing human names into their individual components

Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff

Derek Gulbranson 574 Dec 20, 2022
PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

Pablo Pizarro R. 19 Dec 26, 2022
Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

David Aurelio 1.4k Jan 2, 2023
Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

Character Encoding Detector 1.8k Jan 8, 2023
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.8k Jan 8, 2023
Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Vinta Chen 194 Nov 19, 2022
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([email protected]). Packaged by Peter Waller <[email protected]>,

Peter Waller 1.1k Jan 2, 2023
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 2, 2023
Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

Dmitry Voronin 471 Dec 20, 2022
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.4k Dec 31, 2022
Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

Pyparsing 1.7k Dec 27, 2022
Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

David Drysdale 3.1k Dec 29, 2022
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

Selwin Ong 1.3k Dec 22, 2022
An anthology of a variety of tools for the Persian language in Python

An anthology of a variety of tools for the Persian language in Python

Persian Tools 106 Nov 8, 2022
Widevine KEY Extractor in Python

Widevine Client 3 This was originally written by T3rry7f. This repo is slightly modified version of his repo. This only works on standard Windows! Usa

Vank0n (SJJeon) 68 Dec 29, 2022
A Python app which can convert normal text to Handwritten text.

Text to HandWritten Text ✍️ Converter Watch Tutorial for this project Usage:- Clone my repository. Open CMD in working directory. Run following comman

Kushal Bhavsar 5 Dec 11, 2022