Simple DDL Parser parses SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) DDL files into JSON/Python dicts with full information about columns: types, defaults, primary keys, etc.

Overview

Simple DDL Parser

Built with ply (lex & yacc in Python). You can find a lot of sample DDLs in 'tests/'.

Is it Stable?

Yes, the library already has 5000+ downloads per day - https://pypistats.org/packages/simple-ddl-parser.

As the maintainer, I guarantee that no backward-incompatible changes will be made in patch or minor versions - only additions & new features.

However, while adding support for new statements & features I see that the output could be structured in a more optimal way, and I hope to release version 1.0.* with a better-structured output. It will not be soon: first of all, I want to add support for as many statements as I can, so it doesn't make sense to expect version 1.0.* before, for example, version 0.26.0 :)

How does it work?

The parser is tested on different DDLs for PostgreSQL & Hive, but the idea is to support as many DDL dialects as possible (AWS Redshift, Oracle, Hive, MsSQL, etc.). Check the dialect sections after the Supported Statements section to see which dialect-specific statements are already supported by the parser. If you need a statement that is not supported yet, please provide a DDL example & information about which SQL dialect or DB it comes from.

The types used in your DB do not matter, so the parser should work successfully on any DDL for a SQL DB. The parser is NOT case sensitive; it does not expect all queries to be in upper case or lower case. So you can write statements like this:

    Alter Table Persons ADD CONSTRAINT CHK_PersonAge CHECK (Age>=18 AND City='Sandnes');

It will be parsed as is without errors.

If you have samples that cause an error, please open an issue (don't forget to add the DDL example) and I will be glad to fix it.

You can find a lot of statements and their output results in the tests on GitHub - https://github.com/xnuinside/simple-ddl-parser/tree/main/tests .

How to install

    pip install simple-ddl-parser

How to use

Extract additional information from HQL (& other dialects)

In some dialects, like HQL, there is a lot of additional information about the table, for example whether it is an external table, STORED AS, location, etc. These properties will always be empty in a 'classic' SQL DB like PostgreSQL or MySQL, and this is the reason why this information is 'hidden' by default. Also, some fields are hidden in HQL because they simply do not exist in Hive, for example 'deferrable_initially'. To get these HQL-specific details about the table in the output, please use the 'output_mode' argument in the run() method.

example:

    ddl = """
    CREATE TABLE IF NOT EXISTS default.salesorderdetail(
        SalesOrderID int,
        ProductID int,
        OrderQty int,
        LineTotal decimal
        )
    PARTITIONED BY (batch_id int, batch_id2 string, batch_32 some_type)
    LOCATION 's3://datalake/table_name/v1'
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
    STORED AS TEXTFILE
    """

    result = DDLParser(ddl).run(output_mode="hql")
    print(result)

And you will get output with additional keys 'stored_as', 'location', 'external', etc.

    # additional keys examples
  {
    ...,
    'location': "'s3://datalake/table_name/v1'",
    'map_keys_terminated_by': "'\\003'",
    'partitioned_by': [{'name': 'batch_id', 'size': None, 'type': 'int'},
                        {'name': 'batch_id2', 'size': None, 'type': 'string'},
                        {'name': 'batch_32', 'size': None, 'type': 'some_type'}],
    'primary_key': [],
    'row_format': 'DELIMITED',
    'schema': 'default',
    'stored_as': 'TEXTFILE',
    ... 
  }

If you run the parser from the command line, add the flag '-o=hql' or '--output-mode=hql' to get the same result.

Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql", "redshift", "snowflake"]

From python code

    from simple_ddl_parser import DDLParser


    parse_results = DDLParser("""create table dev.data_sync_history(
        data_sync_id bigint not null,
        sync_count bigint not null,
        sync_mark timestamp  not  null,
        sync_start timestamp  not null,
        sync_end timestamp  not null,
        message varchar(2000) null,
        primary key (data_sync_id, sync_start)
    ); """).run()

    print(parse_results) 

To parse from file

    
    from simple_ddl_parser import parse_from_file

    result = parse_from_file('tests/sql/test_one_statement.sql')
    print(result)

From command line

simple-ddl-parser is installed into the environment as the command sdp

    sdp path_to_ddl_file

    # for example:

    sdp tests/sql/test_two_tables.sql
    

You will see the output in the schemas folder, in a file named test_two_tables_schema.json

If you also want output in the console, use the -v flag (verbose).

    
    sdp tests/sql/test_two_tables.sql -v
    

If you don't want to dump the schema to a file and just want to print the result to the console, use the --no-dump flag:

    
    sdp tests/sql/test_two_tables.sql --no-dump
    

You can provide the target path where you want to dump the result with the argument -t, --target:

    
    sdp tests/sql/test_two_tables.sql -t dump_results/
    

More details

The DDLParser(ddl).run() method accepts several arguments that change the output. As you saw above, there is an output_mode argument that allows you to set the dialect and get more fields in the output relative to the chosen dialect, for example 'hql'. Possible output_modes: ["mssql", "mysql", "oracle", "hql", "sql"]

The .run() method also has an argument group_by_type (by default: False). By default the parser output is a list of dicts where each dict == one entity from the DDL (table, sequence, type, etc). To understand what the current entity is, you need to inspect the dict: if 'table_name' is in the dict - it is a table, if 'type_name' - it is a type, etc.

To make this a little easier, you can set group_by_type=True and you will get the output already grouped by type, like:

    { 
        'tables': [all_parsed_tables],
        'sequences': [all_parsed_sequences],
        'types': [all_parsed_types],
        'domains': [all_parsed_domains],
        ...
    }

For example:

    ddl = """
    CREATE TYPE "schema--notification"."ContentType" AS
        ENUM ('TEXT','MARKDOWN','HTML');
        CREATE TABLE "schema--notification"."notification" (
            content_type "schema--notification"."ContentType"
        );
    CREATE SEQUENCE dev.incremental_ids
        INCREMENT 10
        START 0
        MINVALUE 0
        MAXVALUE 9223372036854775807
        CACHE 1;
    """

    result = DDLParser(ddl).run(group_by_type=True)

    # result will be:

    {'sequences': [{'cache': 1,
                    'increment': 10,
                    'maxvalue': 9223372036854775807,
                    'minvalue': 0,
                    'schema': 'dev',
                    'sequence_name': 'incremental_ids',
                    'start': 0}],
    'tables': [{'alter': {},
                'checks': [],
                'columns': [{'check': None,
                            'default': None,
                            'name': 'content_type',
                            'nullable': True,
                            'references': None,
                            'size': None,
                            'type': '"schema--notification"."ContentType"',
                            'unique': False}],
                'index': [],
                'partitioned_by': [],
                'primary_key': [],
                'schema': '"schema--notification"',
                'table_name': '"notification"'}],
    'types': [{'base_type': 'ENUM',
                'properties': {'values': ["'TEXT'", "'MARKDOWN'", "'HTML'"]},
                'schema': '"schema--notification"',
                'type_name': '"ContentType"'}]}

ALTER statements

Right now, support is added only for ALTER statements with FOREIGN KEY.

For example, if your DDL contains, after the table definitions (CREATE TABLE statements), ALTER TABLE statements like this:

ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id", "material_title") REFERENCES "materials" ("id", "title");

These statements will be parsed and the information about them put inside the 'alter' key of the table's dict. For examples, please check the ALTER statement tests - tests/test_alter_statements.py
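
For example, a minimal sketch of reading that information back (the table and column names are illustrative, and the exact shape of the 'alter' value is best checked against tests/test_alter_statements.py):

    from simple_ddl_parser import DDLParser

    ddl = """
    CREATE TABLE "materials" ("id" int PRIMARY KEY, "title" varchar(200));
    CREATE TABLE "material_attachments" ("material_id" int, "material_title" varchar(200));
    ALTER TABLE "material_attachments" ADD FOREIGN KEY ("material_id", "material_title") REFERENCES "materials" ("id", "title");
    """

    result = DDLParser(ddl).run()

    # information about the ALTER ... FOREIGN KEY ends up under the 'alter' key
    # of the dict that describes the altered table
    for table in result:
        print(table["table_name"], table["alter"])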

More examples & tests

You can find them in the tests/ folder.

Dump result in json

To dump the result to JSON, use the argument .run(dump=True)

You can also provide a path where you want to dump the schemas with the argument .run(dump_path='folder_that_use_for_dumps/')
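
For example, a minimal sketch (the DDL string and the target folder name are just illustrative):

    from simple_ddl_parser import DDLParser

    ddl = "create table dev.users (user_id bigint not null, user_name varchar(100));"

    # dump=True writes the parsed schema to a json file,
    # dump_path controls the folder where the file is created
    result = DDLParser(ddl).run(dump=True, dump_path="dump_results/")
    print(result)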

Supported Statements

  • CREATE TABLE [ IF NOT EXISTS ] + column definitions, column attributes: column name + type + type size (for example, varchar(255)), UNIQUE, PRIMARY KEY, DEFAULT, CHECK, NULL/NOT NULL, REFERENCES, ON DELETE, ON UPDATE, NOT DEFERRABLE, DEFERRABLE INITIALLY, GENERATED ALWAYS, STORED, COLLATE

  • STATEMENTS: PRIMARY KEY, CHECK, FOREIGN KEY in table definitions (inside create table(...);)

  • ALTER TABLE STATEMENTS: ADD CHECK (with CONSTRAINT), ADD FOREIGN KEY (with CONSTRAINT), ADD UNIQUE, ADD DEFAULT FOR

  • PARTITION BY statement

  • CREATE SEQUENCE with words: INCREMENT, START, MINVALUE, MAXVALUE, CACHE

  • CREATE TYPE statement: AS ENUM, AS OBJECT, INTERNALLENGTH, INPUT, OUTPUT

  • LIKE statement (in this and only this case, a 'like' key with information about the source table is added to the output - 'like': {'schema': None, 'table_name': 'Old_Users'}); see the sketch after this list.

  • TABLESPACE statement

  • COMMENT ON statement

  • CREATE SCHEMA [IF NOT EXISTS] ... [AUTHORIZATION] ...

  • CREATE DOMAIN [AS]

  • CREATE [SMALLFILE | BIGFILE] [TEMPORARY] TABLESPACE statement

  • CREATE DATABASE + Properties parsing
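
As referenced in the LIKE item above, a minimal sketch of the 'like' output (the table names are illustrative; the expected 'like' value is taken from the description above):

    from simple_ddl_parser import DDLParser

    ddl = "CREATE TABLE New_Users LIKE Old_Users;"

    result = DDLParser(ddl).run()
    # the table dict for New_Users should contain:
    # 'like': {'schema': None, 'table_name': 'Old_Users'}
    print(result)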

HQL Dialect statements

  • PARTITIONED BY statement
  • ROW FORMAT, ROW FORMAT SERDE
  • WITH SERDEPROPERTIES ("input.regex" = "..some regex..")
  • STORED AS
  • COMMENT
  • LOCATION
  • FIELDS TERMINATED BY, LINES TERMINATED BY, COLLECTION ITEMS TERMINATED BY, MAP KEYS TERMINATED BY
  • TBLPROPERTIES ('parquet.compression'='SNAPPY' & etc.)

MSSQL / MySQL / Oracle

  • type IDENTITY statement
  • FOREIGN KEY REFERENCES statement
  • 'max' specifier in column size
  • CONSTRAINT ... UNIQUE, CONSTRAINT ... CHECK, CONSTRAINT ... FOREIGN KEY, CONSTRAINT ... PRIMARY KEY
  • CREATE CLUSTERED INDEX

Oracle

  • ENCRYPT column property [+ NO SALT, SALT, USING]
  • STORAGE column property

AWS Redshift Dialect statements

  • ENCODE column property

  • SORTKEY, DISTSTYLE, DISTKEY, ENCODE table properties

  • CREATE TEMP / TEMPORARY TABLE

  • table creation syntax with the LIKE statement:

create temp table tempevent(like event);
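
A minimal sketch of parsing that statement with the Redshift output mode (the exact keys are best verified against the tests; 'temp' and the 'like' reference are taken from the descriptions above):

    from simple_ddl_parser import DDLParser

    ddl = "create temp table tempevent(like event);"

    result = DDLParser(ddl).run(output_mode="redshift")
    # with output_mode="redshift" table-level properties such as 'temp'
    # (and the 'like' reference to the source table) appear in the table dict
    print(result)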

Snowflake Dialect statements

  • CREATE .. CLONE statements for table, database and schema
  • CREATE TABLE .. CLUSTER BY ..
  • CONSTRAINT .. [NOT] ENFORCED

TODO in next releases (if you don't see a feature that you need - open an issue)

  -1. Add base support for BigQuery DDL dialect.
  0. Add support for ALTER TABLE ... ADD COLUMN

  1. Add more support for CREATE TYPE ... IS TABLE (example: CREATE OR REPLACE TYPE budget_tbl_typ IS TABLE OF NUMBER(8,2);)
  2. Add support (ignore correctly) ALTER TABLE ... DROP CONSTRAINT ..., ALTER TABLE ... DROP INDEX ...
  3. Add support for COMMENT ON statement
  4. Add support for SKEWED BY for HQL

non-feature todo

  1. Provide API to get result as Python Object
  2. Add online demo (UI) to parse ddl

Historical context

This library is the parser code extracted from https://github.com/xnuinside/fakeme (a library for fake relational data generation that I used in several work projects, but did not have time to turn into a proper open source library).

For one of my work projects I needed to convert SQL DDL to Python ORM models automatically, and I tried to use https://github.com/andialbrecht/sqlparse, but it did not work well enough with DDL for my case (for example, if the DDL used lower case, nothing worked; primary keys inside the DDL were mapped as column names, not reserved words, etc.). So I remembered the parser in fakeme and just extracted & improved it.

Changelog

v0.19.6 Fixes:

  1. Fixed issue with PARTITIONED BY multiple columns in HQL - https://github.com/xnuinside/simple-ddl-parser/issues/66
  2. The question mark symbol '?' is now handled correctly in strings - https://github.com/xnuinside/simple-ddl-parser/issues/64
  3. Fixed an issue with escaping symbols & added tests - https://github.com/xnuinside/simple-ddl-parser/issues/63

Features:

  1. Added support for HQL statement TBLPROPERTIES - https://github.com/xnuinside/simple-ddl-parser/issues/65

v0.19.5 Fixes:

  1. Fixed issues with the COMMENT statement in column definitions. Added a bunch of tests; they are now expected to work OK.

v0.19.4

  1. Added support for PARTITION BY (previously was only PARTITIONED BY from HQL)

v0.19.2

  1. Added support for ` quotes in column & table names

v0.19.1 Fixes:

  1. Issue with '\t' reported in https://github.com/xnuinside/simple-ddl-parser/issues/53

Features:

  1. Added a base for future BigQuery support: added output_mode="bigquery". Note that there are no schemas in BigQuery, so schemas are Datasets.

v0.19.0 Features

  1. Added support for base Snowflake SQL Dialect. Added new --output-mode='snowflake' (add "clone" key)

Added support for CREATE .. CLONE with the same behaviour as CREATE .. LIKE. Added support for CREATE .. CLONE for schemas and databases - displayed in the output as {"clone": {"from": ... }}. Also added: CREATE TABLE .. CLUSTER BY .. and CONSTRAINT .. [NOT] ENFORCED (value stored in 'primary_key_enforced').

  1. In CREATE DATABASE, properties that go after the name (like key=value) are now parsed correctly. Check examples in the tests
  2. Added support for varchar COLLATE column property

v0.18.0 Features

  1. Added base support for the AWS Redshift SQL dialect. Added support for the ENCODE property in columns. Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default. Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

CREATE TEMP / TEMPORARY TABLE

syntax like with LIKE statement:

create temp table tempevent(like event);

v0.17.0

  1. All dependencies were updated for the latest version.
  2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
  3. Added support for create table properties like TABLESPACE user_data ENABLE STORAGE IN ROW CHUNK 8K RETENTION CACHE
  4. Added support for CREATE DATABASE statement

v0.16.3

  1. Fixed an issue where using column names equal to some tokens (for example, key) caused an error. The words 'foreign' and 'constraint' as column names still cause an empty result; I hope they are rarely used. Will be fixed in next releases.

v0.16.2

  1. Fixed issue with enum in lowercase

v0.16.0

  1. Fixed the issue where a NULL column after DEFAULT was used as the default value.
  2. Added support for generated columns, statements: AS, GENERATED ALWAYS, STORED in column definitions; in the output they are placed under the key 'generated'. The 'generated' keyword is shown only if the column is generated.
  3. Half of the changelog was moved to ARCHIVE_CHANGELOG.txt
  4. Added base support for the CREATE DOMAIN statement
  5. Added base support for the CREATE SCHEMA [IF NOT EXISTS] ... [AUTHORIZATION] statement, added new type keyword 'schemas'

v0.15.0

  1. Garbage quotes like '‘’' are now ignored and changed to normal ones.
  2. Added support for HQL: LINES TERMINATED BY, COMMENT (for table), ROW FORMAT SERDE, WITH SERDEPROPERTIES ( "input.regex" = "..some regex..")
  3. Fixed an issue where a primary key with a default option was not parsed correctly - https://github.com/xnuinside/simple-ddl-parser/issues/40
  4. Fixed an issue where an expression in a default value was not parsed correctly - https://github.com/xnuinside/simple-ddl-parser/issues/39
  5. Added support for comments in columns (except one case when COMMENT goes after the DEFAULT word; this case is not parsed correctly yet - will be fixed in next releases)

v0.14.0

  1. Added support for CONSTRAINT ... PRIMARY KEY ...
  2. Added support for ENCRYPT [+ NO SALT, SALT, USING] statements for the Oracle dialect. All default values are taken from this doc: https://docs.oracle.com/en/database/oracle/oracle-database/21/asoag/encrypting-columns-tables2.html. Now if you use output_mode='oracle', the new property 'encrypt' will be shown for the column. If there is no ENCRYPT statement in the table definition, the value will be 'None'; but if ENCRYPT exists, you will find this information in the encrypt property:

{'encrypt' : { 'salt': True, 'encryption_algorithm': 'AES192', 'integrity_algorithm': 'SHA-1' }}

  1. Added support for the Oracle STORAGE statement; the 'oracle' output_mode now has a 'storage' key in the table data definition.
  2. Added support for the TABLESPACE statement after the columns definition

v0.12.1

  1. () after DEFAULT no longer causes an issue
  2. ' and " are no longer lost in DEFAULT values

v0.12.0

  1. Added support for MSSQL: types with 2 words like 'int IDENTITY', the FOREIGN KEY REFERENCES statement, 'max' as a type size, CONSTRAINT ... UNIQUE statements in table definitions, CONSTRAINT ... CHECK, CONSTRAINT ... FOREIGN KEY
  2. Added output_mode types: 'mysql', 'mssql' (for SQL Server), 'oracle'. If one of the above is chosen, the key 'constraints' is added to the table definition by default. 'constraints' contains a dict with the keys 'uniques', 'checks', 'references'. If there is a CONSTRAINT .. CHECK, the 'checks' key will still be in the data output, but it will be duplicated under 'constraints': {'checks': ...}
  3. Added support for ALTER ADD ... UNIQUE
  4. Added support for CREATE CLUSTERED INDEX; if output_mode = 'mssql' then the index will have an additional arg 'clustered'.
  5. Added support for DESC & NULLS in CREATE INDEX statements. Detailed information is placed under the key 'detailed_columns' in 'indexes', example: 'index': [{'clustered': False, 'columns': ['extra_funds'], 'detailed_columns': [{'name': 'extra_funds', 'nulls': 'LAST', 'order': 'ASC'}]}]
  6. Added support for the statement ALTER TABLE ... ADD CONSTRAINT ... DEFAULT ... FOR ... ;

v0.11.0

  1. A table can now have the name 'table'
  2. Added base support for the CREATE TYPE statement: AS ENUM, AS OBJECT, INTERNALLENGTH, INPUT, OUTPUT (not all properties & types supported yet)
  3. Added the argument 'group_by_type' in the 'run' method that groups the output by the type of parsed entities, like: 'tables': [all_parsed_tables], 'sequences': [all_parsed_sequences], 'types': [all_parsed_types], 'domains': [all_parsed_domains]
  4. The type in a column definition can also be "schema"."YourCustomType"
  5. " characters no longer disappear if you use them in the DDL
Comments
  • getting an empty output for CREATE SCHEMA  and OPTIONS

    getting an empty output for CREATE SCHEMA and OPTIONS

    Hi @xnuinside I was trying to parse this ddl but getting an empty [ ] output

    from simple_ddl_parser import parse_from_file
    import json
    import os
    
    filename = r'example.ddl'
    result = parse_from_file(filename)
    print(result)
    output
    []
    
    json_data = json.dumps(result)
    print(json_data)
    output
    []
    

    Not supporting CREATE SCHEMA IF NOT EXISTS and it is expecting only one OPTIONS

    CREATE SCHEMA IF NOT EXISTS name-name OPTIONS ( location="path" ); CREATE TABLE name.hub.REF_CALENDAR ( calendar_dt DATE, calendar_dt_id INT, fiscal_half_year_day_no INT, fiscal_half_year_end_dt DATE, fiscal_half_year_month_no INT, fiscal_half_year_nm STRING, fiscal_half_year_no INT, fiscal_half_year_reporting_week_no INT, fiscal_half_year_start_dt DATE, fiscal_half_year_week_no INT, fiscal_year_day_no INT, fiscal_year_end_dt DATE, fiscal_year_long_nm STRING, fiscal_year_month_no INT, fiscal_year_nm STRING, fiscal_year_no INT, fiscal_year_quarter_no INT, fiscal_year_reporting_week_no INT, fiscal_year_start_dt DATE, fiscal_year_week_no INT, month_day_no INT, month_end_dt DATE, month_id STRING, month_nm STRING, month_reporting_week_no INT, month_start_dt DATE, month_week_no INT, quarter_day_no INT, quarter_end_dt DATE, quarter_month_no INT, quarter_reporting_week_no INT, quarter_start_dt DATE, quarter_week_no INT, reporting_week_day_no INT, reporting_week_end_dt DATE, week_day_nm STRING, week_day_no INT, week_day_short_nm STRING, weekend_ind INT, weekend_ind_desc STRING, year_day_no INT, year_month_no INT, year_no INT, year_quarter_no STRING, year_reporting_week_no INT, year_week_no INT ) OPTIONS ( description="Calendar table records reference list of calendar dates and related attributes used for reporting." );

    opened by ankitdata 23
  • Parse Hive External Table DDL with Partitions

    Parse Hive External Table DDL with Partitions

    I have below hive DDL which i need to parse but it is not working. @xnuinside - could you please help

    CREATE EXTERNAL TABLE IF NOT EXISTS database.table_name
    (
        day_long_nm     string,
        calendar_dt     date,
        source_batch_id string,
        field_qty       decimal(10, 0),
        field_bool      boolean,
        field_float     float,
        create_tmst     timestamp,
        field_double    double,
        field_long      bigint
    ) PARTITIONED BY (batch_id int) STORED AS PARQUET LOCATION 's3://datalake/table_name/v1'
    
    opened by nitin-kakkar 14
  • MSSQL/TSQL Support

    MSSQL/TSQL Support

    Hi @xnuinside

    I hope your are doing well. I saw that you made some major enhancements since to this project (FYI this is still amazing). I wanted to provide this comprehensive query for when you plan to add MS/T SQL support to the parser. You will notice that Postgres and MS SQL have similar syntaxes . The DDL below includes almost every TSQL data type. I also included the max sizes for each of the data types. Please feel free to let me know if you have any issues with the query.

    CREATE TABLE Persons (
        PersonID int PRIMARY KEY,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );
    
    
    CREATE TABLE sqlserverlist (
    
    id INT IDENTITY (1,1) PRIMARY KEY, -- NOTE THE IDENTITY (1,1) IS SIMILAR TO serial in postgres - Format for IDENTITY [ (seed , increment) ]
    company_id BIGINT ,
    primary_id INT FOREIGN KEY REFERENCES Persons(PersonID), -- ADD THIS COLUMN FOR THE FOREIGN KEY
    age TINYINT NULL UNIQUE,
    days_active SMALLINT NOT NULL,
    user_origin_of_birth char(255),
    user_account VARCHAR(8000) NOT NULL,
    user_first_name VARCHAR(max) NOT NULL,
    user_last_name 	VARBINARY(8000) NOT NULL,
    user_street NCHAR(400) NULL,
    user_city NVARCHAR(4000),
    about_user NTEXT NULL,
    user_description TEXT,
    starting_funds FLOAT(53) NULL,
    extra_funds REAL,
    current_funds DECIMAL (38,20),
    ending_funds SMALLMONEY NOT NULL,
    birth_date DATE NOT NULL,
    time_of_birth TIME(7),
    enrollment_date SMALLDATETIME,
    delete_date DATETIME NULL,
    create_date DATETIME2(7) NOT NULL,
    user_time_zone DATETIMEOFFSET(7),
    oder_date date DEFAULT GETDATE(), -- added to demonstrate sql sever Defaults
    country varchar(255) DEFAULT 'Sandnes', -- added to demonstrate sql sever Defaults
    active bit NULL,
    home_size GEOMETRY, -- Sql Server Defaults to Null
    user_photo IMAGE, -- Sql Server Defaults to Null
    --UNIQUE (id),
    CONSTRAINT UC_sqlserverlist_last_name UNIQUE (company_id,user_last_name),
    CONSTRAINT CHK_Person_Age_under CHECK (days_active<=18 AND user_city='New York'),
    FOREIGN KEY (id) REFERENCES Persons(PersonID),
    CONSTRAINT FK_Person_Age_under  FOREIGN KEY (id)REFERENCES Persons(PersonID)
    )
    
    -- UNIQUE CONSTRAINTS
    ALTER TABLE sqlserverlist ADD UNIQUE (birth_date);
    ALTER TABLE sqlserverlist ADD CONSTRAINT UC_Person_ening_funds UNIQUE (current_funds,create_date);
    --CHECK CONSTRAINTS
    ALTER TABLE sqlserverlist ADD CHECK (starting_funds>=0);
    ALTER TABLE sqlserverlist ADD CONSTRAINT CHK_Person_Age CHECK (age >=18 AND user_city='Sandnes');
    -- FOREIGN KEY Constraints
    ALTER TABLE sqlserverlist ADD FOREIGN KEY (id) REFERENCES Persons(PersonID);
    ALTER TABLE sqlserverlist ADD CONSTRAINT FK_PersonOrder4 FOREIGN KEY (id) REFERENCES Persons(PersonID);
    -- INDEXES 
    CREATE INDEX i1 ON sqlserverlist (extra_funds);
    --CREATE CLUSTERED INDEX i2 ON sqlserverlist  (delete_date,create_date); --- This line is commented because TSQL doesnt allow two index on table added as an example
    --CREATE UNIQUE INDEX i1 ON t1 (delete_date DESC, create_date ASC, ending_funds  DESC); --- This line is commented because TSQL doesnt allow two index on table added as an example
    
    -- DEFAULTS
    ALTER TABLE sqlserverlist ADD CONSTRAINT df_user_street DEFAULT '1 WAY STREET' FOR user_street;
    SELECT * FROM #sqlserverlist
    
    opened by geob3d 11
  • create OR REPLACE transient/temporary TABLE not recognized by parser

    create OR REPLACE transient/temporary TABLE not recognized by parser

    Describe the bug "create OR REPLACE transient/temporary table" not recognized by parser. Thx for quickly adding the support for "OR REPLACE TABLE"! It seems that the "OR REPLACE" is only working with the "TABLE", but not with the Snowflake specific "TRANSIENT TABLE" or "TEMPORARY TABLE". However the support for snowflake's transient/temporary table is there e.g. "create TRANSIENT/TEMPORARY table" is working properly. Snowflake Docs: Working with Temporary and Transient Tables

    To Reproduce

    CREATE OR REPLACE TRANSIENT TABLE someTable (
        someField VARCHAR(4)
    )
    
    CREATE OR REPLACE TEMPORARY TABLE someTable (
        someField VARCHAR(4)
    )
    

    Expected behavior To get a proper output from the parser.

    opened by devxor 8
  • omymodels.errors.NoTablesError: No tables was found in DDL input.

    omymodels.errors.NoTablesError: No tables was found in DDL input.

    Describe the bug No tables was found in DDL input

    To Reproduce Steps to reproduce the behavior: Generate my DDL from SSMS and import to OMM

    Expected behavior find tables to parse

    Desktop (please complete the following information):

    • OS: [e.g. iOS] Win10

    script.sql.txt

    opened by KrunchMuffin 8
  • Table name which is also a "reserved word" fails to parse

    Table name which is also a "reserved word" fails to parse

    Consider:

    create table index (col1:int);
    
    create table foo.[index] (col1:int);
    
    create table foo.index (col1:int);
    

    The first 2 lines understand a table named "index" is intended. The third line classifies the word "index" as the token 'INDEX' probably here.

    Though not a good or recommended practice, table names are allowed to be arbitrary strings, even reserved words.

    If strict adherence to quoted/delimited usage of key-word-as-identifier DDL is desired, then the first line (bare "index" ) should have failed. If relaxed adherence is intended, then all 3 should have succeeded.

    bug 
    opened by nurih 5
  • Fails to parse output from pg_dump?

    Fails to parse output from pg_dump?

    I'm not sure if I'm using this wrong, but DDLParser() fails to parse the attached PostgreSQL schema output from pg_dump

    schema.txt

    Traceback:

    ~/.cache/pypoetry/virtualenvs/capgras-r404Yw5Z-py3.8/lib/python3.8/site-packages/simple_ddl_parser/parser.py in run(self, dump, dump_path, file_path, output_mode, group_by_type, json_dump)
        228             Dict == one entity from ddl - one table or sequence or type.
        229         """
    --> 230         tables = self.parse_data()
        231         tables = result_format(tables, output_mode, group_by_type)
        232         if dump:
    
    ~/.cache/pypoetry/virtualenvs/capgras-r404Yw5Z-py3.8/lib/python3.8/site-packages/simple_ddl_parser/parser.py in parse_data(self)
        167                 if final_line or new_statement:
        168                     # end of sql operation, remove ; from end of line
    --> 169                     statement = statement[:-1]
        170                 elif num != len(lines) - 1 and not skip:
        171                     # continue combine lines in one massive
    
    TypeError: 'NoneType' object is not subscriptable
    

    Please feel free to close this issue if you don't intend to support this use-case.

    bug improvement 
    opened by bjmc 5
  • Upper case "CREATE" in column name ends table parse

    Upper case "CREATE" in column name ends table parse

    Describe the bug It looks like you are seeing CREATE and breaking the table DDL parse...

    To Reproduce

    t1.sql:

    DROP TABLE IF EXISTS demo;
    --
    CREATE TABLE demo
    (
         foo                             char(1),
         create_date                     DATETIME2,
         created_BY                      VARCHAR (20)
    )
    --
    

    t2.sql:

    DROP TABLE IF EXISTS demo;
    --
    CREATE TABLE demo
    (
         foo                             char(1),
         CREATE_date                     DATETIME2,
         created_by                      VARCHAR (20),
    )
    --
    
    sdp -v t1.sql
    $ sdp -v t1.sql
    Start parsing file t1.sql 
    
    File with result was saved to >> schemas folder
    [{'alter': {},
      'checks': [],
      'columns': [{'check': None,
                   'default': None,
                   'name': 'foo',
                   'nullable': True,
                   'references': None,
                   'size': 1,
                   'type': 'char',
                   'unique': False},
                  {'check': None,
                   'default': None,
                   'name': 'create_date',
                   'nullable': True,
                   'references': None,
                   'size': None,
                   'type': 'DATETIME2',
                   'unique': False},
                  {'check': None,
                   'default': None,
                   'name': 'created_BY',
                   'nullable': True,
                   'references': None,
                   'size': 20,
                   'type': 'VARCHAR',
                   'unique': False}],
      'index': [],
      'partitioned_by': [],
      'primary_key': [],
      'schema': None,
      'table_name': 'demo',
      'tablespace': None}]
    

    vs:

    $ sdp -v t2.sql
    Start parsing file t2.sql 
    
    File with result was saved to >> schemas folder
    [{'alter': {},
      'checks': [],
      'columns': [{'check': None,
                   'default': None,
                   'name': 'foo',
                   'nullable': True,
                   'references': None,
                   'size': 1,
                   'type': 'char',
                   'unique': False}],
      'index': [],
      'partitioned_by': [],
      'primary_key': [],
      'schema': None,
      'table_name': 'demo',
      'tablespace': None}]
    

    Expected behavior A capitalized substring field name that is CREATE shouldn't break the parser.

    opened by n2ygk 5
  • Unsupported "DEFAULT NEXT VALUE FOR" & ON

    Unsupported "DEFAULT NEXT VALUE FOR" & ON

    Unsupported parse of NEXT VALUE FOR and ON on CREATE TABLE

    When CREATE TABLE uses DEFAULT NEXT VALUE or the script does use storage options (ex.: ON PRIMARY), the parser does not output any results.

    To Reproduce

    from simple_ddl_parser import DDLParser

    ddl_Orig = ''' CREATE TABLE [dbo].[SLIPEVENTO] ( [cdSLIP] [bigint] NOT NULL DEFAULT NEXT VALUE FOR [dbo].[sqCdSLIPEvt] , -- Referencia da origem do evento [hashOrigem] VARBINARY NOT NULL, -- HASH pela tabela de onde vem o lancamento -- SIC, DARE21, SF, Manual, ... [tbOrigem] [int] NOT NULL, [tbVrtDstDflt] [int] NULL, CONSTRAINT [pk_slipsEvt] PRIMARY KEY CLUSTERED ( [cdSLIP] ), -- chave de referencia para multiplos lancamentos distribuidos -- ??? Identifica um documento origem via HASH na tbOrigem CONSTRAINT [uk_slipsEvtHASHdocs] UNIQUE([hashOrigem],[tbOrigem]) -- A FK precisa vir por trigger, pois sao diferentes origens., -- ) ON [PRIMARY] GO

    '''

    ddl1 = ''' CREATE TABLE [dbo].[SLIPEVENTO] ( [cdSLIP] [bigint] NOT NULL, -- Referencia da origem do evento [hashOrigem] VARBINARY NOT NULL, -- HASH pela tabela de onde vem o lancamento -- SIC, DARE21, SF, Manual, ... [tbOrigem] [int] NOT NULL, [tbVrtDstDflt] [int] NULL, CONSTRAINT [pk_slipsEvt] PRIMARY KEY CLUSTERED ( [cdSLIP] ), -- chave de referencia para multiplos lancamentos distribuidos -- ??? Identifica um documento origem via HASH na tbOrigem CONSTRAINT [uk_slipsEvtHASHdocs] UNIQUE([hashOrigem],[tbOrigem]) -- A FK precisa vir por trigger, pois sao diferentes origens., -- ) ON [PRIMARY] '''

    ddl2 = ''' CREATE TABLE [dbo].[SLIPEVENTO] ( [cdSLIP] [bigint] NOT NULL, -- Referencia da origem do evento [hashOrigem] VARBINARY NOT NULL, -- HASH pela tabela de onde vem o lancamento -- SIC, DARE21, SF, Manual, ... [tbOrigem] [int] NOT NULL, [tbVrtDstDflt] [int] NULL, CONSTRAINT [pk_slipsEvt] PRIMARY KEY CLUSTERED ( [cdSLIP] ), -- chave de referencia para multiplos lancamentos distribuidos -- ??? Identifica um documento origem via HASH na tbOrigem CONSTRAINT [uk_slipsEvtHASHdocs] UNIQUE([hashOrigem],[tbOrigem]) -- A FK precisa vir por trigger, pois sao diferentes origens., -- ); '''

    result = DDLParser(ddl_Orig).run(dump=True) print(result)

    result = DDLParser(ddl1).run(dump=True) print(result)

    result = DDLParser(ddl2).run(dump=True) print(result)

    Expected behavior Interpretation as proposed on docs.

    Desktop (please complete the following information):

    • SQL Dialect - MSSQL - Autogenerated scripts - V12
    • OS: Debian - Python3.9
    • Idle3 ; Eric6, QtCreator (all of them )
    • Version
    • fred@alpha:~/Documentos/GoiasFomento/QtPy/src$ pip3 install simple-ddl-parser Collecting simple-ddl-parser Downloading simple_ddl_parser-0.20.0-py3-none-any.whl (31 kB) Requirement already satisfied: ply<4.0,>=3.11 in /usr/lib/python3/dist-packages (from simple-ddl-parser) (3.11) Installing collected packages: simple-ddl-parser Successfully installed simple-ddl-parser-0.20.0

    Additional context Add any other context about the problem here.

    CREATE TYPE does not parse too.

    DROP TYPE dbo.T_LCT_SLIPS ; GO CREATE TYPE dbo.T_LCT_SLIPS AS TABLE ( hashKY varbinary(48), numContratoGF bigint, FATORGERADOR date , dtREFSIC date, cdTbOrigem int, cdTbVrtDstDflt int, Sq int, vl DECIMAL(32,0), descr varchar(254), tpEvento varchar(254), tpLcto char ); GO

    opened by fredcwbr 5
  • Escaped single quote in COMMENT is not handled properly

    Escaped single quote in COMMENT is not handled properly

    Describe the bug Escaped single quote in COMMENT is not handled properly

    To Reproduce ddl = """ CREATE EXTERNAL TABLE test ( job_id STRING COMMENT 'test's' ) STORED AS PARQUET LOCATION 'hdfs://test' """ from simple_ddl_parser import DDLParser parse_results = DDLParser(ddl).run(output_mode="hql")

    Expected behavior Non empty json should be returned

    opened by kukigai 5
  • Feature: AWS Redshift Support

    Feature: AWS Redshift Support

    Would it be possible to add support for AWS Redshift? It's based on Postgres 8.0 but, being a distributed database with columnar storage, has added some syntax to control how rows are distributed and compressed. You can find a BNF-esque grammar here.

    Currently, this doesn't detect the DDL at all:

    >>> results = DDLParser("""
    ...     CREATE TABLE foo.bar(
    ...         asdf INTEGER ENCODE ZSTD NOT NULL,
    ...         qwerty VARCHAR(255) ENCODE LZO
    ...     )
    ...     SORTSTYLE COMPOUND
    ...    SORTKEY (qwerty)
    ...     DISTSTYLE EVEN
    ...     ;
    ... """).run()
    >>> results
    []
    

    It'd be great if we could get the information about encoding, sortkey, distkey, etc. with a "redshift" output mode.

    opened by dargueta 5
  • In the Bigquery apostrophes to delimiter decription string not work

    In the Bigquery apostrophes to delimiter decription string not work

    Describe the bug In the Bigquery when we use apostrophes to delimiter description string in options make parse fail

    To Reproduce This fail CREATE TABLE data.test ( col STRING OPTIONS(description='test') ) OPTIONS(description='test'); and this work

    CREATE TABLE data.test ( col STRING OPTIONS(description="test") ) OPTIONS(description="test");

    Expected behavior Both statements should be parsed

    opened by lmarqueto 0
  • Oracle create table with default value is not working

    Oracle create table with default value is not working

    Following create table statement is not recognized by the parser, which is valid statement for Oracle database.

    from simple_ddl_parser import DDLParser
    DDLParser("create table test ( col varchar2(30 char) default user not null);").run()
    

    result is empty list.

    python version 3.9.5 simple-ddl-parser==0.29.0

    opened by ancc 0
  • Question: Does this library output to JSON schema?

    Question: Does this library output to JSON schema?

    Hi I was wondering if the JSON it outputs is compatible with https://json-schema.org/specification.html

    My use case is that I want to validate that incoming records match the redshift table schema

    for-the-future 
    opened by josh-andi 1
  • Column comments not assiciated to column names

    Column comments not assiciated to column names

    Describe the bug A DDL containing COMMENT ON COLUMN does not associate the comment to the column

    To Reproduce run this:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    from simple_ddl_parser import DDLParser
    from pprint import pprint
    
    parse_results = DDLParser("""-- something.t_spiel definition
    
    -- Drop table
    
    -- DROP TABLE something.t_spiel;
    
    CREATE TABLE something.t_spiel (
    	id varchar(5) NOT NULL,
    	refprodid varchar(10) NULL,
    	titel varchar(100) NOT NULL,
    	refsaalid varchar(10) NULL, -- Verweis auf den Saal
    	freieplatzwahl bool NOT NULL DEFAULT true, -- Ja/Nein
    	CONSTRAINT idx_t_spiel_primary PRIMARY KEY (id)
    );
    
    -- Column comments
    
    -- COMMENT ON COLUMN something.t_spiel.refsaalid IS 'Verweis auf den Saal';
    -- COMMENT ON COLUMN something.t_spiel.freieplatzwahl IS 'Ja/Nein';
    
    -- something.t_spiel foreign keys
    
    ALTER TABLE something.t_spiel ADD CONSTRAINT "foreign_key_Ref_Prod_Id" FOREIGN KEY (refprodid) REFERENCES something.t_produktion(id);
    ALTER TABLE something.t_spiel ADD CONSTRAINT "foreign_key_Ref_Saal_Id" FOREIGN KEY (refsaalid) REFERENCES something.t_saal(id);
    """,silent=True).run(output_mode='sql')
    
    pprint(parse_results)
    
    

    This code produces ... {'comments': [' Verweis auf den Saal', ' Ja/Nein']}...

    which is just a collection of comments w/o reference to the fields they belong to.

    Expected behavior something like {'comments': {'something.t_spiel.refsaalid': ' Verweis auf den Saal', 'something.t_spiel.freieplatzwahl': ' Ja/Nein'}}... should be generated, or even better for postprocessing, the comment should be added to the field description: {'check': None, 'default': None, 'name': 'refsaalid', ... 'comment': 'Verweis auf den Saal' },

    Desktop (please complete the following information):

    • Database: Postgresql
    • DDL created with DBeaver
    opened by erwin-frohsinn 2
  • Some of multiple statements in a DDL file/string are ignored

    Some of multiple statements in a DDL file/string are ignored

    Describe the bug Having DDL in a file with multiple statements like in the example below will ignore the COMMENT line, so in the parsed result there will be no information about comment.

    To Reproduce

    CREATE OR REPLACE TABLE mytable (
        field_1 NUMBER(38, 0) NOT NULL,
        field_2 VARCHAR(250)
    );
    
    COMMENT ON COLUMN field_1 IS 'some comment';
    

    Expected behavior All the statements from DDL file/string parsed correctly and included in the result.

    opened by devxor 0
  • array parsing in hql failes

    array parsing in hql failes

    Hi Found this bug provided test:

       ddl = """
        CREATE TABLE IF NOT EXISTS default.salesorderdetail(
                column_abc ARRAY<string>,
                )
        """
        result = DDLParser(ddl, silent=False, normalize_names=True).run(output_mode="hql", group_by_type=True)
        expected = {
            "tables": [
                {
                    "if_not_exists": True,
                    "columns": [
                        {
                            "name": "column_abc",
                            "type": "ARRAY <string>",
                            "size": None,
                            "references": None,
                            "unique": False,
                            "nullable": True,
                            "default": None,
                            "check": None,
                        }]
                }]
        }
    
        assert expected == result
    

    Error Message:

    p = LexToken(RT,'ARRAY',1,65)

    def p_error(self, p):
        if not self.silent:
    
          raise DDLParserError(f"Unknown statement at {p}")
    

    E simple_ddl_parser.ddl_parser.DDLParserError: Unknown statement at LexToken(RT,'ARRAY',1,65)

    opened by herokri 1
Releases(v0.29.0)
  • v0.29.0(Nov 20, 2022)

    v0.29.0

    Fixes

    1. The AUTOINCREMENT statement is now parsed correctly, the same way as AUTO_INCREMENT, and shows up in the output as the 'autoincrement' property of the column. Fix for: https://github.com/xnuinside/simple-ddl-parser/issues/170
    2. Fixed the issue "TypeError: argument of type 'NoneType' is not iterable" on some foreign keys https://github.com/xnuinside/simple-ddl-parser/issues/148

    New Features

    1. Support for non-numeric column type parameters https://github.com/xnuinside/simple-ddl-parser/issues/171 It shows in column attribute 'type_parameters'.
  • v0.28.1(Oct 31, 2022)

    v0.28.1

    Improvements:

    1. Lines starting with an INSERT INTO statement are now successfully ignored by the parser (so you can keep them in the DDL - they will just be skipped)

    Fixes:

    1. Important fix for multiline comments
  • v0.28.0(Oct 30, 2022)

    v0.28.0

    Important Changes (Pay attention):

    1. Because AUTO_INCREMENT is now parsed as a separate column property, the previous output changed. Previously it was parsed as part of the type, like 'INT AUTO_INCREMENT'. Now the type will be just 'INT', and in the column properties you will see 'autoincrement': True.

    Amazing innovations:

    1. It is weird to write this in a changelog, but only in version 0.28.0 did I realise that floats were not supported by the parser - now it is fixed. Thanks for the sample in the issue: https://github.com/xnuinside/simple-ddl-parser/issues/163

    Improvements:

    MariaDB:

    1. Added support for MariaDB AUTO_INCREMENT (from the DDL here - https://github.com/xnuinside/simple-ddl-parser/issues/144). If a column is auto-incremented, it is indicated as 'autoincrement': True in the column definition

    Common:

    1. Added parsing for multiline comments in DDL with /* */ syntax.
    2. Comments from DDL now all placed in 'comments' keyword if you use group_by_type= arg in parser.
    3. Added argument 'parser_settings={}' (dict type) in method parse_from_file() - this way you can pass any arguments that you want to DDLParser (& that supported by it) So, if you want to set log_level=logging.WARNING for parser - just use it as: parse_from_file('path_to_file', parser_settings={'log_level': logging.WARNING}). For issue: https://github.com/xnuinside/simple-ddl-parser/issues/160
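
    A minimal sketch of the parser_settings pass-through (the file path is reused from the examples above; the setting name follows the description in item 3):

        import logging

        from simple_ddl_parser import parse_from_file

        # parser_settings is forwarded to DDLParser, so any of its supported
        # arguments (here log_level) can be set when parsing from a file
        result = parse_from_file(
            "tests/sql/test_one_statement.sql",
            parser_settings={"log_level": logging.WARNING},
        )
        print(result)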
  • v0.27.0(Aug 8, 2022)

    v0.27.0

    Fixes:

    1. Fixed parsing CHECKS with IN statement - https://github.com/xnuinside/simple-ddl-parser/issues/150
    2. @# symbols added to ID token - (partialy) https://github.com/xnuinside/simple-ddl-parser/issues/146

    Improvements:

    1. Added support for '*' in size column (ORACLE dialect) - https://github.com/xnuinside/simple-ddl-parser/issues/151
    2. Added arg 'debug' to parser, works same way as 'silent' - to get more clear error output.

    New features:

    1. Added support for ORACLE 'ORGANIZATION INDEX'
    2. Added support for SparkSQL Partition by with procedure call - https://github.com/xnuinside/simple-ddl-parser/issues/154
    3. Added support for DEFAULT CHARSET statement MySQL - https://github.com/xnuinside/simple-ddl-parser/issues/153
  • v0.26.3(Jun 7, 2022)

    Improvements:

    1. Added support for OR REPLACE in CREATE TABLE: https://github.com/xnuinside/simple-ddl-parser/issues/131
    2. Added support for AUTO INCREMENT in column: https://github.com/xnuinside/simple-ddl-parser/issues/130
  • v0.26.2(May 6, 2022)

    v0.26.2

    Fixes:

    1. Fixed a huge bug with incorrect parsing of lines with 'USE' & 'GO' strings inside.
    2. Fixed parsing of CREATE SCHEMA for Snowflake & Oracle DDLs

    Improvements:

    1. Added COMMENT statement for CREATE TABLE ddl (for SNOWFLAKE dialect support)
    2. Added COMMENT statement for CREATE SCHEMA ddl (for SNOWFLAKE dialect support)
  • v0.26.1(Apr 24, 2022)

    Fixes:

    1. support Multiple SERDEPROPERTIES - https://github.com/xnuinside/simple-ddl-parser/issues/126
    2. Fix for issue with LOCATION and TBLPROPERTIES clauses in CREATE TABLE LIKE - https://github.com/xnuinside/simple-ddl-parser/issues/125
    3. LOCATION now works correctly with double quote strings
  • v0.26.0(Mar 29, 2022)

    Improvements:

    1. Added more explicit debug message on Statement errors - https://github.com/xnuinside/simple-ddl-parser/issues/116
    2. Added support for "USING INDEX TABLESPACE" statement in ALTER - https://github.com/xnuinside/simple-ddl-parser/issues/119
    3. Added support for IN statements in CHECKS - https://github.com/xnuinside/simple-ddl-parser/issues/121

    New features:

    1. Support SparkSQL USING - https://github.com/xnuinside/simple-ddl-parser/issues/117 Updates initiated by ticket https://github.com/xnuinside/simple-ddl-parser/issues/120:
    2. In Parser you can use argument json_dump=True in method .run() if you want get result in JSON format.
    • README updated

    Fixes:

    1. Added support for PARTITION BY one column without type
    2. Alter table add constraint PRIMARY KEY - https://github.com/xnuinside/simple-ddl-parser/issues/119
    3. Fix for parsing the SET statement - https://github.com/xnuinside/simple-ddl-parser/pull/122
    4. Fix for disappearing columns without properties - https://github.com/xnuinside/simple-ddl-parser/issues/123
  • v0.25.0(Feb 7, 2022)

    v0.25.0

    Fixes:

    1. Fix for issue with 'at time zone' https://github.com/xnuinside/simple-ddl-parser/issues/112

    New features:

    1. Added flag to raise errors if parser cannot parse statement DDLParser(.., silent=False) - https://github.com/xnuinside/simple-ddl-parser/issues/109
    2. Added the flag DDLParser(.., normalize_names=True) that changes the parser output: if the flag is True (default 'False') then all identifiers will be returned without '[', '"' and other delimiters that are used in different SQL dialects to separate custom names from reserved words & statements. For example, if the flag is set to 'True' and you pass this input:

    CREATE TABLE [dbo].[TO_Requests]( [Request_ID] [int] IDENTITY(1,1) NOT NULL, [user_id] [int]

    In output you will have names like 'dbo' and 'TO_Requests', not '[dbo]' and '[TO_Requests]'.
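
    A minimal sketch of that flag (the statement is completed with a closing parenthesis purely for illustration):

        from simple_ddl_parser import DDLParser

        ddl = "CREATE TABLE [dbo].[TO_Requests]( [Request_ID] [int] IDENTITY(1,1) NOT NULL, [user_id] [int]);"

        # with normalize_names=True the '[' and ']' delimiters are stripped,
        # so the names come back as 'dbo' and 'TO_Requests'
        result = DDLParser(ddl, normalize_names=True).run()
        print(result)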

  • v0.24.1(Jan 6, 2022)

    Fixes:

    HQL:

    1. fields_terminated_by now parses , as "','", not as '' previously

    Common:

    1. Added the 'if_not_exists' field to the output so that the DDL can be re-created 1-to-1 from the metadata.
  • v0.24.0(Jan 4, 2022)

    Fixes:

    HQL:

    1. More than 2 tblproperties are now parsed correctly https://github.com/xnuinside/simple-ddl-parser/pull/104

    Common:

    1. 'set' in lower case is now also parsed correctly.
    2. Names like 'schema', 'database', 'table' can now be used as names in CREATE DATABASE | SCHEMA | TABLESPACE | DOMAIN | TYPE statements and after INDEX and CONSTRAINT.
    3. Creation of empty tables is also parsed correctly (like CREATE Table table;).

    New Statements Support:

    HQL:

    1. Added support for CLUSTERED BY - https://github.com/xnuinside/simple-ddl-parser/issues/103
    2. Added support for INTO ... BUCKETS
    3. CREATE REMOTE DATABASE | SCHEMA
  • v0.23.0(Dec 24, 2021)

    v0.23.0

    Big refactoring: less code complexity & increase code coverage. Radon added to pre-commit hooks.

    Fixes:

    1. Fix for issue with ALTER UNIQUE - https://github.com/xnuinside/simple-ddl-parser/issues/101

    New Features

    1. SQL Comments string from DDL now parsed to "comments" key in output.

    PostgreSQL:

    1. Added support for ALTER TABLE ONLY | ALTER TABLE IF EXISTS
  • v0.22.6(Nov 27, 2021)

    v0.22.6

    Fixes:

    1. Fix for: https://github.com/xnuinside/simple-ddl-parser/issues/99 Issue with pg_dump output: Now SETs with = and ; in them are parsed normally.

    Improvements:

    1. Added support for types with 3+ words like PostgreSQL "timestamp without time zone"
    2. Added support for SEQUENCE properties with the 'NO' word, like NO MAXVALUE. It is now parsed as 'maxvalue': False
  • v0.22.4(Nov 20, 2021)

  • v0.22.1(Nov 16, 2021)

    v0.22.1

    New Features:

    BigQuery:

    1. Added support for OPTION for full CREATE TABLE statement & column definition

    Improvements:

    1. CLUSTERED BY can be used without ()
  • v0.22.0(Nov 14, 2021)

    v0.22.0

    New Features:

    BigQuery:

    I started to add partial support for BigQuery

    1. Added support for OPTIONS in CREATE SCHEMA statement

    MSSQL:

    1. Added support for PRIMARY KEY CLUSTERED - full details about clustering are now parsed into a separate key 'clustered_primary_key'. I don't like that, but when I started I did not think about all those details, so in version 1.0.* I will work on a more beautiful and logical output structure. https://github.com/xnuinside/simple-ddl-parser/issues/91

    Pay attention: previously they were parsed somehow, but with an incorrect structure.

    Improvements:

    1. Strings in double quotes moved as separate token from ID to fix a lot of issues with strings with spaces inside
    2. Now parser can parse statements separated by new line also (without GO or ; at the end of statement) - https://github.com/xnuinside/simple-ddl-parser/issues/90

    Fixes:

    1. Open strings are now not valid in checks (previously they were parsed). Open string sample: 'some string (an opening quote exists, but there is no closing quote)
    2. Ordering keywords like ASC, DESC in primary keys are now parsed correctly (not as a column name, as previously)
  • v0.21.0(Oct 6, 2021)

    v0.21.0

    New Features:

    MSSQL:

    1. Added support for statements:
      1. PERIOD FOR SYSTEM_TIME in CREATE TABLE statement
      2. ON [PRIMARY] after CREATE TABLE statement (sample in test files test_mssql_specific.py)
      3. WITH statement for TABLE properties
      4. TEXTIMAGE_ON statement
      5. DEFAULT NEXT VALUE FOR in COLUMN DEFAULT

    Common:

    1. Added support for separating tables DDL by 'GO' statement as in output of MSSQL
    2. Added support for CREATE TYPE as TABLE
  • v0.20.0(Oct 3, 2021)

    v0.20.0

    New Features:

    Common

    1. SET statements from DDL scripts now collected as type 'ddl_properties' (if you use group_by_type=True) and parsed as dicts with 2 keys inside {'name': 'property name', 'value': 'property value'}

    MySQL

    1. Added support for MySQL ON UPDATE statements in column (without REFERENCE)

    MSSQL

    1. Added support for CONSTRAINT [CLUSTERED]... PRIMARY KEY for Table definition
    2. Added support for WITH statement in CONSTRAINT (Table definition)
  • v0.19.7(Sep 18, 2021)

    v0.19.7

    Fixes:

    1. Add support for more special symbols to strings - https://github.com/xnuinside/simple-ddl-parser/issues/68

    Features:

    1. Added support for HQL statements: STORED AS INPUTFORMAT, OUTPUTFORMAT - https://github.com/xnuinside/simple-ddl-parser/issues/69 SKEWED BY
  • v0.19.6(Sep 17, 2021)

    v0.19.6

    Fixes:

    1. Fixed issue with PARTITIONED BY multiple columns in HQL - https://github.com/xnuinside/simple-ddl-parser/issues/66
    2. The question mark symbol '?' is now handled correctly in strings - https://github.com/xnuinside/simple-ddl-parser/issues/64
    3. Fixed an issue with escaping symbols & added tests - https://github.com/xnuinside/simple-ddl-parser/issues/63

    Features:

    1. Added support for HQL statement TBLPROPERTIES - https://github.com/xnuinside/simple-ddl-parser/issues/65
  • v0.19.5(Sep 16, 2021)

    v0.19.5 Fixes:

    1. Fixed issues with the COMMENT statement in column definitions. Added a bunch of tests; they are now expected to work OK.
    

    DDL like:

    CREATE EXTERNAL TABLE test ( job_id STRING COMMENT 't# est | & * % $ // * 6 % !' ) STORED AS PARQUET LOCATION 'hdfs://test'

    Now parsed valid.

  • v0.19.0(Aug 5, 2021)

    v0.19.0

    Features

    1. Added support for base Snowflake SQL Dialect. Added new --output-mode='snowflake' (add "clone" key)

    Added support for CREATE .. CLONE with the same behaviour as CREATE .. LIKE. Added support for CREATE .. CLONE for schemas and databases - displayed in the output as {"clone": {"from": ... }}. Also added: CREATE TABLE .. CLUSTER BY .. and CONSTRAINT .. [NOT] ENFORCED (value stored in 'primary_key_enforced').

    1. In CREATE DATABASE, properties that go after the name (like key=value) are now parsed correctly. Check examples in the tests
    2. Added support for varchar COLLATE column property
  • v0.18.0(Aug 2, 2021)

    Features

    1. Added base support for the AWS Redshift SQL dialect. Added support for the ENCODE property in columns. Added new --output-mode='redshift' that adds the 'encrypt' property to columns by default. Also added table properties: distkeys, sortkey, diststyle, encode (table-level encode), temp.

    Supported Redshift statements: SORTKEY, DISTSTYLE, DISTKEY, ENCODE

    CREATE TEMP / TEMPORARY TABLE

    syntax like with LIKE statement:

    create temp table tempevent(like event);

  • v0.17.0(Jul 2, 2021)

    1. All dependencies were updated for the latest version.
    2. Added base support for CREATE [BIGFILE | SMALLFILE] [TEMPORARY] TABLESPACE
    3. Added support for create table properties like TABLESPACE user_data ENABLE STORAGE IN ROW CHUNK 8K RETENTION CACHE
    4. Added support for CREATE DATABASE statement