PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

Overview

PyPika - Python Query Builder

BuildStatus CoverageStatus Codacy Docs PyPi License

Abstract

What is PyPika?

PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a simple interface for building SQL queries without limiting the flexibility of handwritten SQL. Designed with data analysis in mind, PyPika leverages the builder design pattern to construct queries to avoid messy string formatting and concatenation. It is also easily extended to take full advantage of specific features of SQL database vendors.

What are the design goals for PyPika?

PyPika is a fast, expressive and flexible way to replace handwritten SQL (or even ORM for the courageous souls amongst you). Validation of SQL correctness is not an explicit goal of PyPika. With such a large number of SQL database vendors providing a robust validation of input data is difficult. Instead you are encouraged to check inputs you provide to PyPika or appropriately handle errors raised from your SQL database - just as you would have if you were writing SQL yourself.

Read the docs: http://pypika.readthedocs.io/en/latest/

Installation

PyPika supports python 3.6+. It may also work on pypy, cython, and jython, but is not being tested for these versions.

To install PyPika run the following command:

pip install pypika

Tutorial

The main classes in pypika are pypika.Query, pypika.Table, and pypika.Field.

from pypika import Query, Table, Field

Selecting Data

The entry point for building queries is pypika.Query. In order to select columns from a table, the table must first be added to the query. For simple queries with only one table, tables and columns can be references using strings. For more sophisticated queries a pypika.Table must be used.

q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')

To convert the query into raw SQL, it can be cast to a string.

str(q)

Alternatively, you can use the Query.get_sql() function:

q.get_sql()

Tables, Columns, Schemas, and Databases

In simple queries like the above example, columns in the "from" table can be referenced by passing string names into the select query builder function. In more complex examples, the pypika.Table class should be used. Columns can be referenced as attributes on instances of pypika.Table.

from pypika import Table, Query

customers = Table('customers')
q = Query.from_(customers).select(customers.id, customers.fname, customers.lname, customers.phone)

Both of the above examples result in the following SQL:

SELECT id,fname,lname,phone FROM customers

An alias for the table can be given using the .as_ function on pypika.Table

Table('x_view_customers').as_('customers')
q = Query.from_(customers).select(customers.id, customers.phone)
SELECT id,phone FROM x_view_customers customers

A schema can also be specified. Tables can be referenced as attributes on the schema.

from pypika import Table, Query, Schema

views = Schema('views')
q = Query.from_(views.customers).select(customers.id, customers.phone)
SELECT id,phone FROM views.customers

Also references to databases can be used. Schemas can be referenced as attributes on the database.

from pypika import Table, Query, Database

my_db = Database('my_db')
q = Query.from_(my_db.analytics.customers).select(customers.id, customers.phone)
SELECT id,phone FROM my_db.analytics.customers

Results can be ordered by using the following syntax:

from pypika import Order
Query.from_('customers').select('id', 'fname', 'lname', 'phone').orderby('id', order=Order.desc)

This results in the following SQL:

SELECT "id","fname","lname","phone" FROM "customers" ORDER BY "id" DESC

Arithmetic

Arithmetic expressions can also be constructed using pypika. Operators such as +, -, *, and / are implemented by pypika.Field which can be used simply with a pypika.Table or directly.

from pypika import Field

q = Query.from_('account').select(
    Field('revenue') - Field('cost')
)
SELECT revenue-cost FROM accounts

Using pypika.Table

accounts = Table('accounts')
q = Query.from_(accounts).select(
    accounts.revenue - accounts.cost
)
SELECT revenue-cost FROM accounts

An alias can also be used for fields and expressions.

q = Query.from_(accounts).select(
    (accounts.revenue - accounts.cost).as_('profit')
)
SELECT revenue-cost profit FROM accounts

More arithmetic examples

table = Table('table')
q = Query.from_(table).select(
    table.foo + table.bar,
    table.foo - table.bar,
    table.foo * table.bar,
    table.foo / table.bar,
    (table.foo+table.bar) / table.fiz,
)
SELECT foo+bar,foo-bar,foo*bar,foo/bar,(foo+bar)/fiz FROM table

Filtering

Queries can be filtered with pypika.Criterion by using equality or inequality operators

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id, customers.fname, customers.lname, customers.phone
).where(
    customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE lname='Mustermann'

Query methods such as select, where, groupby, and orderby can be called multiple times. Multiple calls to the where method will add additional conditions as

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id, customers.fname, customers.lname, customers.phone
).where(
    customers.fname == 'Max'
).where(
    customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE fname='Max' AND lname='Mustermann'

Filters such as IN and BETWEEN are also supported

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id,customers.fname
).where(
    customers.age[18:65] & customers.status.isin(['new', 'active'])
)
SELECT id,fname FROM customers WHERE age BETWEEN 18 AND 65 AND status IN ('new','active')

Filtering with complex criteria can be created using boolean symbols &, |, and ^.

AND

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id, customers.fname, customers.lname, customers.phone
).where(
    (customers.age >= 18) & (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 AND lname='Mustermann'

OR

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id, customers.fname, customers.lname, customers.phone
).where(
    (customers.age >= 18) | (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 OR lname='Mustermann'

XOR

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id, customers.fname, customers.lname, customers.phone
).where(
    (customers.age >= 18) ^ customers.is_registered
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 XOR is_registered

Convenience Methods

In the Criterion class, there are the static methods any and all that allow building chains AND and OR expressions with a list of terms.

from pypika import Criterion

customers = Table('customers')
q = Query.from_(customers).select(
    customers.id,
    customers.fname
).where(
    Criterion.all([
        customers.is_registered,
        customers.age >= 18,
        customers.lname == "Jones",
    ])
)
SELECT id,fname FROM customers WHERE is_registered AND age>=18 AND lname = "Jones"

Grouping and Aggregating

Grouping allows for aggregated results and works similar to SELECT clauses.

from pypika import functions as fn

customers = Table('customers')
q = Query \
    .from_(customers) \
    .where(customers.age >= 18) \
    .groupby(customers.id) \
    .select(customers.id, fn.Sum(customers.revenue))
SELECT id,SUM("revenue") FROM "customers" WHERE "age">=18 GROUP BY "id"

After adding a GROUP BY clause to a query, the HAVING clause becomes available. The method Query.having() takes a Criterion parameter similar to the method Query.where().

from pypika import functions as fn

payments = Table('payments')
q = Query \
    .from_(payments) \
    .where(payments.transacted[date(2015, 1, 1):date(2016, 1, 1)]) \
    .groupby(payments.customer_id) \
    .having(fn.Sum(payments.total) >= 1000) \
    .select(payments.customer_id, fn.Sum(payments.total))
SELECT customer_id,SUM(total) FROM payments
WHERE transacted BETWEEN '2015-01-01' AND '2016-01-01'
GROUP BY customer_id HAVING SUM(total)>=1000

Joining Tables and Subqueries

Tables and subqueries can be joined to any query using the Query.join() method. Joins can be performed with either a USING or ON clauses. The USING clause can be used when both tables/subqueries contain the same field and the ON clause can be used with a criterion. To perform a join, ...join() can be chained but then must be followed immediately by ...on(<criterion>) or ...using(*field).

Join Types

All join types are supported by PyPika.

Query \
    .from_(base_table)
    ...
    .join(join_table, JoinType.left)
    ...
Query \
    .from_(base_table)
    ...
    .left_join(join_table) \
    .left_outer_join(join_table) \
    .right_join(join_table) \
    .right_outer_join(join_table) \
    .inner_join(join_table) \
    .outer_join(join_table) \
    .full_outer_join(join_table) \
    .cross_join(join_table) \
    .hash_join(join_table) \
    ...

See the list of join types here pypika.enums.JoinTypes

Example of a join using ON
history, customers = Tables('history', 'customers')
q = Query \
    .from_(history) \
    .join(customers) \
    .on(history.customer_id == customers.id) \
    .select(history.star) \
    .where(customers.id == 5)
SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."id" WHERE "customers"."id"=5

As a shortcut, the Query.join().on_field() function is provided for joining the (first) table in the FROM clause with the joined table when the field name(s) are the same in both tables.

Example of a join using ON
history, customers = Tables('history', 'customers')
q = Query \
    .from_(history) \
    .join(customers) \
    .on_field('customer_id', 'group') \
    .select(history.star) \
    .where(customers.group == 'A')
SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."customer_id" AND "history"."group"="customers"."group" WHERE "customers"."group"='A'
Example of a join using USING
history, customers = Tables('history', 'customers')
q = Query \
    .from_(history) \
    .join(customers) \
    .using('customer_id') \
    .select(history.star) \
    .where(customers.id == 5)
SELECT "history".* FROM "history" JOIN "customers" USING "customer_id" WHERE "customers"."id"=5
Example of a correlated subquery in the SELECT
history, customers = Tables('history', 'customers')
last_purchase_at = Query.from_(history).select(
    history.purchase_at
).where(history.customer_id==customers.customer_id).orderby(
    history.purchase_at, order=Order.desc
).limit(1)
q = Query.from_(customers).select(
    customers.id, last_purchase_at.as_('last_purchase_at')
)
SELECT
  "id",
  (SELECT "history"."purchase_at"
   FROM "history"
   WHERE "history"."customer_id" = "customers"."customer_id"
   ORDER BY "history"."purchase_at" DESC
   LIMIT 1) "last_purchase_at"
FROM "customers"

Unions

Both UNION and UNION ALL are supported. UNION DISTINCT is synonomous with "UNION`` so and PyPika does not provide a separate function for it. Unions require that queries have the same number of SELECT clauses so trying to cast a unioned query to string with through a SetOperationException if the column sizes are mismatched.

To create a union query, use either the Query.union() method or + operator with two query instances. For a union all, use Query.union_all() or the * operator.

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
    provider_a.created_time, provider_a.foo, provider_a.bar
) + Query.from_(provider_b).select(
    provider_b.created_time, provider_b.fiz, provider_b.buz
)
SELECT "created_time","foo","bar" FROM "provider_a" UNION SELECT "created_time","fiz","buz" FROM "provider_b"

Intersect

INTERSECT is supported. Intersects require that queries have the same number of SELECT clauses so trying to cast a intersected query to string with through a SetOperationException if the column sizes are mismatched.

To create a intersect query, use the Query.intersect() method.

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
    provider_a.created_time, provider_a.foo, provider_a.bar
)
r = Query.from_(provider_b).select(
    provider_b.created_time, provider_b.fiz, provider_b.buz
)
intersected_query = q.intersect(r)
SELECT "created_time","foo","bar" FROM "provider_a" INTERSECT SELECT "created_time","fiz","buz" FROM "provider_b"

Minus

MINUS is supported. Minus require that queries have the same number of SELECT clauses so trying to cast a minus query to string with through a SetOperationException if the column sizes are mismatched.

To create a minus query, use either the Query.minus() method or - operator with two query instances.

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
    provider_a.created_time, provider_a.foo, provider_a.bar
)
r = Query.from_(provider_b).select(
    provider_b.created_time, provider_b.fiz, provider_b.buz
)
minus_query = q.minus(r)

(or)

minus_query = Query.from_(provider_a).select(
    provider_a.created_time, provider_a.foo, provider_a.bar
) - Query.from_(provider_b).select(
    provider_b.created_time, provider_b.fiz, provider_b.buz
)
SELECT "created_time","foo","bar" FROM "provider_a" MINUS SELECT "created_time","fiz","buz" FROM "provider_b"

EXCEPT

EXCEPT is supported. Minus require that queries have the same number of SELECT clauses so trying to cast a except query to string with through a SetOperationException if the column sizes are mismatched.

To create a except query, use the Query.except_of() method.

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
    provider_a.created_time, provider_a.foo, provider_a.bar
)
r = Query.from_(provider_b).select(
    provider_b.created_time, provider_b.fiz, provider_b.buz
)
minus_query = q.except_of(r)
SELECT "created_time","foo","bar" FROM "provider_a" EXCEPT SELECT "created_time","fiz","buz" FROM "provider_b"

Date, Time, and Intervals

Using pypika.Interval, queries can be constructed with date arithmetic. Any combination of intervals can be used except for weeks and quarters, which must be used separately and will ignore any other values if selected.

from pypika import functions as fn

fruits = Tables('fruits')
q = Query.from_(fruits) \
    .select(fruits.id, fruits.name) \
    .where(fruits.harvest_date + Interval(months=1) < fn.Now())
SELECT id,name FROM fruits WHERE harvest_date+INTERVAL 1 MONTH<NOW()

Tuples

Tuples are supported through the class pypika.Tuple but also through the native python tuple wherever possible. Tuples can be used with pypika.Criterion in WHERE clauses for pairwise comparisons.

from pypika import Query, Tuple

q = Query.from_(self.table_abc) \
    .select(self.table_abc.foo, self.table_abc.bar) \
    .where(Tuple(self.table_abc.foo, self.table_abc.bar) == Tuple(1, 2))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar")=(1,2)

Using pypika.Tuple on both sides of the comparison is redundant and PyPika supports native python tuples.

from pypika import Query, Tuple

q = Query.from_(self.table_abc) \
    .select(self.table_abc.foo, self.table_abc.bar) \
    .where(Tuple(self.table_abc.foo, self.table_abc.bar) == (1, 2))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar")=(1,2)

Tuples can be used in IN clauses.

Query.from_(self.table_abc) \
        .select(self.table_abc.foo, self.table_abc.bar) \
        .where(Tuple(self.table_abc.foo, self.table_abc.bar).isin([(1, 1), (2, 2), (3, 3)]))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar") IN ((1,1),(2,2),(3,3))

Strings Functions

There are several string operations and function wrappers included in PyPika. Function wrappers can be found in the pypika.functions package. In addition, LIKE and REGEX queries are supported as well.

from pypika import functions as fn

customers = Tables('customers')
q = Query.from_(customers).select(
    customers.id,
    customers.fname,
    customers.lname,
).where(
    customers.lname.like('Mc%')
)
SELECT id,fname,lname FROM customers WHERE lname LIKE 'Mc%'
from pypika import functions as fn

customers = Tables('customers')
q = Query.from_(customers).select(
    customers.id,
    customers.fname,
    customers.lname,
).where(
    customers.lname.regex(r'^[abc][a-zA-Z]+&')
)
SELECT id,fname,lname FROM customers WHERE lname REGEX '^[abc][a-zA-Z]+&';
from pypika import functions as fn

customers = Tables('customers')
q = Query.from_(customers).select(
    customers.id,
    fn.Concat(customers.fname, ' ', customers.lname).as_('full_name'),
)
SELECT id,CONCAT(fname, ' ', lname) full_name FROM customers

Custom Functions

Custom Functions allows us to use any function on queries, as some functions are not covered by PyPika as default, we can appeal to Custom functions.

from pypika import CustomFunction

customers = Tables('customers')
DateDiff = CustomFunction('DATE_DIFF', ['interval', 'start_date', 'end_date'])

q = Query.from_(customers).select(
    customers.id,
    customers.fname,
    customers.lname,
    DateDiff('day', customers.created_date, customers.updated_date)
)
SELECT id,fname,lname,DATE_DIFF('day',created_date,updated_date) FROM customers

Case Statements

Case statements allow fow a number of conditions to be checked sequentially and return a value for the first condition met or otherwise a default value. The Case object can be used to chain conditions together along with their output using the when method and to set the default value using else_.

from pypika import Case, functions as fn

customers = Tables('customers')
q = Query.from_(customers).select(
    customers.id,
    Case()
       .when(customers.fname == "Tom", "It was Tom")
       .when(customers.fname == "John", "It was John")
       .else_("It was someone else.").as_('who_was_it')
)
SELECT "id",CASE WHEN "fname"='Tom' THEN 'It was Tom' WHEN "fname"='John' THEN 'It was John' ELSE 'It was someone else.' END "who_was_it" FROM "customers"

With Clause

With clause allows give a sub-query block a name, which can be referenced in several places within the main SQL query. The SQL WITH clause is basically a drop-in replacement to the normal sub-query.

from pypika import Table, AliasedQuery, Query

customers = Table('customers')

sub_query = (Query
            .from_(customers)
            .select('*'))

test_query = (Query
            .with_(sub_query, "an_alias")
            .from_(AliasedQuery("an_alias"))
            .select('*'))

You can use as much as .with_() as you want.

WITH an_alias AS (SELECT * FROM "customers") SELECT * FROM an_alias

Inserting Data

Data can be inserted into tables either by providing the values in the query or by selecting them through another query.

By default, data can be inserted by providing values for all columns in the order that they are defined in the table.

Insert with values

customers = Table('customers')

q = Query.into(customers).insert(1, 'Jane', 'Doe', '[email protected]')
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]')
customers =  Table('customers')

q = customers.insert(1, 'Jane', 'Doe', '[email protected]')
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]')

Multiple rows of data can be inserted either by chaining the insert function or passing multiple tuples as args.

customers = Table('customers')

q = Query.into(customers).insert(1, 'Jane', 'Doe', '[email protected]').insert(2, 'John', 'Doe', '[email protected]')
customers = Table('customers')

q = Query.into(customers).insert((1, 'Jane', 'Doe', '[email protected]'),
                                 (2, 'John', 'Doe', '[email protected]'))

Insert with constraint violation handling

MySQL
customers = Table('customers')

q = Query.into(customers)\
    .insert(1, 'Jane', 'Doe', '[email protected]')\
    .on_duplicate_key_ignore())
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]') ON DUPLICATE KEY IGNORE
customers = Table('customers')

q = Query.into(customers)\
    .insert(1, 'Jane', 'Doe', '[email protected]')\
    .on_duplicate_key_update(customers.email, Values(customers.email))
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]') ON DUPLICATE KEY UPDATE `email`=VALUES(`email`)

.on_duplicate_key_update works similar to .set for updating rows, additionally it provides the Values wrapper to update to the value specified in the INSERT clause.

PostgreSQL
customers = Table('customers')

q = Query.into(customers)\
    .insert(1, 'Jane', 'Doe', '[email protected]')\
    .on_conflict(customers.email)
    .do_nothing()
INSERT INTO "abc" VALUES (1,'Jane','Doe','[email protected]') ON CONFLICT ("email") DO NOTHING
customers = Table('customers')

q = Query.into(customers)\
    .insert(1, 'Jane', 'Doe', '[email protected]')\
    .on_conflict(customers.email)
    .do_update(customers.email, '[email protected]')
INSERT INTO "customers" VALUES (1,'Jane','Doe','[email protected]') ON CONFLICT ("email") DO UPDATE SET "email"='[email protected]'

Insert from a SELECT Sub-query

INSERT INTO "customers" VALUES (1,'Jane','Doe','[email protected]'),(2,'John','Doe','[email protected]')

To specify the columns and the order, use the columns function.

customers = Table('customers')

q = Query.into(customers).columns('id', 'fname', 'lname').insert(1, 'Jane', 'Doe')
INSERT INTO customers (id,fname,lname) VALUES (1,'Jane','Doe','[email protected]')

Inserting data with a query works the same as querying data with the additional call to the into method in the builder chain.

customers, customers_backup = Tables('customers', 'customers_backup')

q = Query.into(customers_backup).from_(customers).select('*')
INSERT INTO customers_backup SELECT * FROM customers
customers, customers_backup = Tables('customers', 'customers_backup')

q = Query.into(customers_backup).columns('id', 'fname', 'lname')
    .from_(customers).select(customers.id, customers.fname, customers.lname)
INSERT INTO customers_backup SELECT "id", "fname", "lname" FROM customers

The syntax for joining tables is the same as when selecting data

customers, orders, orders_backup = Tables('customers', 'orders', 'orders_backup')

q = Query.into(orders_backup).columns('id', 'address', 'customer_fname', 'customer_lname')
    .from_(customers)
    .join(orders).on(orders.customer_id == customers.id)
    .select(orders.id, customers.fname, customers.lname)
INSERT INTO "orders_backup" ("id","address","customer_fname","customer_lname")
SELECT "orders"."id","customers"."fname","customers"."lname" FROM "customers"
JOIN "orders" ON "orders"."customer_id"="customers"."id"

Updating Data

PyPika allows update queries to be constructed with or without where clauses.

customers = Table('customers')

Query.update(customers).set(customers.last_login, '2017-01-01 10:00:00')

Query.update(customers).set(customers.lname, 'smith').where(customers.id == 10)
UPDATE "customers" SET "last_login"='2017-01-01 10:00:00'

UPDATE "customers" SET "lname"='smith' WHERE "id"=10

The syntax for joining tables is the same as when selecting data

customers, profiles = Tables('customers', 'profiles')

Query.update(customers)
     .join(profiles).on(profiles.customer_id == customers.id)
     .set(customers.lname, profiles.lname)
UPDATE "customers"
JOIN "profiles" ON "profiles"."customer_id"="customers"."id"
SET "customers"."lname"="profiles"."lname"

Using pypika.Table alias to perform the update

customers = Table('customers')

customers.update()
        .set(customers.lname, 'smith')
        .where(customers.id == 10)
UPDATE "customers" SET "lname"='smith' WHERE "id"=10

Using limit for performing update

customers = Table('customers')

customers.update()
        .set(customers.lname, 'smith')
        .limit(2)
UPDATE "customers" SET "lname"='smith' LIMIT 2

Parametrized Queries

PyPika allows you to use Parameter(str) term as a placeholder for parametrized queries.

customers = Table('customers')

q = Query.into(customers).columns('id', 'fname', 'lname')
    .insert(Parameter(':1'), Parameter(':2'), Parameter(':3'))
INSERT INTO customers (id,fname,lname) VALUES (:1,:2,:3)

This allows you to build prepared statements, and/or avoid SQL-injection related risks.

Due to the mix of syntax for parameters, depending on connector/driver, it is required that you specify the parameter token explicitly or use one of the specialized Parameter types per [PEP-0249](https://www.python.org/dev/peps/pep-0249/#paramstyle): QmarkParameter(), NumericParameter(int), NamedParameter(str), FormatParameter(), PyformatParameter(str)

An example of some common SQL parameter styles used in Python drivers are:

PostgreSQL:
$number OR %s + :name (depending on driver)
MySQL:
%s
SQLite:
?
Vertica:
:name
Oracle:
:number + :name
MSSQL:
%(name)s OR :name + :number (depending on driver)

You can find out what parameter style is needed for DBAPI compliant drivers here: https://www.python.org/dev/peps/pep-0249/#paramstyle or in the DB driver documentation.

Temporal support

Temporal criteria can be added to the tables.

Select

Here is a select using system time.

t = Table("abc")
q = Query.from_(t.for_(SYSTEM_TIME.as_of('2020-01-01'))).select("*")

This produces:

SELECT * FROM "abc" FOR SYSTEM_TIME AS OF '2020-01-01'

You can also use between.

t = Table("abc")
q = Query.from_(
    t.for_(SYSTEM_TIME.between('2020-01-01', '2020-02-01'))
).select("*")

This produces:

SELECT * FROM "abc" FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2020-02-01'

You can also use a period range.

t = Table("abc")
q = Query.from_(
    t.for_(SYSTEM_TIME.from_to('2020-01-01', '2020-02-01'))
).select("*")

This produces:

SELECT * FROM "abc" FOR SYSTEM_TIME FROM '2020-01-01' TO '2020-02-01'

Finally you can select for all times:

t = Table("abc")
q = Query.from_(t.for_(SYSTEM_TIME.all_())).select("*")

This produces:

SELECT * FROM "abc" FOR SYSTEM_TIME ALL

A user defined period can also be used in the following manner.

t = Table("abc")
q = Query.from_(
    t.for_(t.valid_period.between('2020-01-01', '2020-02-01'))
).select("*")

This produces:

SELECT * FROM "abc" FOR "valid_period" BETWEEN '2020-01-01' AND '2020-02-01'

Joins

With joins, when the table object is used when specifying columns, it is important to use the table from which the temporal constraint was generated. This is because Table("abc") is not the same table as Table("abc").for_(...). The following example demonstrates this.

t0 = Table("abc").for_(SYSTEM_TIME.as_of('2020-01-01'))
t1 = Table("efg").for_(SYSTEM_TIME.as_of('2020-01-01'))
query = (
    Query.from_(t0)
    .join(t1)
    .on(t0.foo == t1.bar)
    .select("*")
)

This produces:

SELECT * FROM "abc" FOR SYSTEM_TIME AS OF '2020-01-01'
JOIN "efg" FOR SYSTEM_TIME AS OF '2020-01-01'
ON "abc"."foo"="efg"."bar"

Update & Deletes

An update can be written as follows:

t = Table("abc")
q = Query.update(
    t.for_portion(
        SYSTEM_TIME.from_to('2020-01-01', '2020-02-01')
    )
).set("foo", "bar")

This produces:

UPDATE "abc"
FOR PORTION OF SYSTEM_TIME FROM '2020-01-01' TO '2020-02-01'
SET "foo"='bar'

Here is a delete:

t = Table("abc")
q = Query.from_(
    t.for_portion(t.valid_period.from_to('2020-01-01', '2020-02-01'))
).delete()

This produces:

DELETE FROM "abc"
FOR PORTION OF "valid_period" FROM '2020-01-01' TO '2020-02-01'

Creating Tables

The entry point for creating tables is pypika.Query.create_table, which is used with the class pypika.Column. As with selecting data, first the table should be specified. This can be either a string or a pypika.Table. Then the columns, and constraints. Here's an example that demonstrates much of the functionality.

stmt = Query \
    .create_table("person") \
    .columns(
        Column("id", "INT", nullable=False),
        Column("first_name", "VARCHAR(100)", nullable=False),
        Column("last_name", "VARCHAR(100)", nullable=False),
        Column("phone_number", "VARCHAR(20)", nullable=True),
        Column("status", "VARCHAR(20)", nullable=False, default=ValueWrapper("NEW")),
        Column("date_of_birth", "DATETIME")) \
    .unique("last_name", "first_name") \
    .primary_key("id")

This produces:

CREATE TABLE "person" (
    "id" INT NOT NULL,
    "first_name" VARCHAR(100) NOT NULL,
    "last_name" VARCHAR(100) NOT NULL,
    "phone_number" VARCHAR(20) NULL,
    "status" VARCHAR(20) NOT NULL DEFAULT 'NEW',
    "date_of_birth" DATETIME,
    UNIQUE ("last_name","first_name"),
    PRIMARY KEY ("id")
)

There is also support for creating a table from a query.

stmt = Query.create_table("names").as_select(
    Query.from_("person").select("last_name", "first_name")
)

This produces:

CREATE TABLE "names" AS (SELECT "last_name","first_name" FROM "person")

License

Copyright 2020 KAYAK Germany, GmbH

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Crafted with in Berlin.

Comments
  • Basic parametrised query generation (WIP)

    Basic parametrised query generation (WIP)

    This PR adds parametrized query building by adding a Parameter(':1') term that will be the parameter placeholder. Due to placeholders in SQL for parametrized queries being so varied (even between drivers or setup for the same dialect) that I decided to leave implementation as explicit as possible. Whatever is indicated as the string is included verbatim into the SQL.

    An example of this madness: PostgreSQL → $number OR %s + :name (depending on driver) MySQL → %s SQLite → ? Vertica → :name Oracle → :number + :name MSSQL → %(name)s OR :name + :number (depending on driver)

    This PR is a work-in-progress:

    • [x] Added Python3.7 and PyPy3.5 to Travis builds.
    • [x] Added some tests of when to NOT parametrize.
    • [x] Added some parameter tests.
    • [x] Implemented a Parameter() term, and added to root __init__.py for convenience
    • [x] Removed __str__() where the parent has the same variant implemented.
    • [x] Update documentation

    Fixes #113, potential workaround for some of #3

    opened by grigi 13
  • additional Join functionality

    additional Join functionality

    • It would be a nice convenience function just to specify a string or list of strs if the column(s) to join on are the same in both tables. e.g. from_(t1).join(t2).on('c1')

    • it's not clear how I can construct a join USING clause "... from t1 join t2 USING (c1, c2)"

    i'm happy to contribute if this is not yet available but you think would be helpful

    I'm sorry the docs don't show a preferred method of getting in touch. please suggest most appropriate communication channel. You can look me up as "CJ Lee" from London on linkedin

    P.S. love this package thanks for this

    opened by cjstudioz 10
  • Feature/temporal query support

    Feature/temporal query support

    I've added support for temporal table querying. I built this on top of my previous pull request BTW :(

    I'm not sure how close this is to how you would like it done.

    There are a bunch of examples in the README/tutorial near the bottom. The approach I've taken is to add the criteria to the Table. Here's an example:

    t = Table("abc")
    q = Query.from_(t.for_(SYSTEM_TIME.as_of('2020-01-01'))).select("*")
    

    This produces:

    'SELECT * FROM "abc" FOR SYSTEM_TIME AS OF '2020-01-01'
    

    There is support for all the valid operations including UPDATE and DELETE.

    A side affect of applying the criteria to the Table is that joins are also supported.

    Rob

    opened by rob-blackbourn 9
  • How to insert Array value into field?

    How to insert Array value into field?

    For example

    CREATE TABLE public.catalog (
       id           SERIAL PRIMARY KEY 
      ,name         VARCHAR(100) NOT NULL
      ,price        decimal(6,2) NOT NULL
      ,retail_price decimal(6,2) NOT NULL
      ,code         VARCHAR(100) NOT NULL
      ,property_values   integer[]
      ,images	VARCHAR[]
    );
    
    
        sql = str(
            pypika.PostgreSQLQuery
            .into(pypika.Table('catalog', schema='public'))
            .insert(
                (
                   1,
                   'test',
                   '0.00',
                   '0.00',
                   'test',
                   [1, 2, 3],
                   ["some", "cool"],
                )
            )
        )
    
    opened by lordent 9
  • Adds support for correlated subqueries

    Adds support for correlated subqueries

    Correlated subqueries [1] were prevented by validation which asserted that the field filtered existed in the _from tables. There was a work around which was to temporarily add the table to the _from list, then create the correlated subquery, then later remove it.

    This approach introduces a way to specifically highlight that the query being built is a correlated subquery, and as such include the outer table at the time of creation with from_.

    An alternate approach may have been to introduce a flag which disabled table based validation on fields when creating new WHERE clauses. This approach was decided against because a correlated subquery requires namespaces lest there be a clash when filtering "id" against "id" in the outer table.

    [1] https://en.wikipedia.org/wiki/Correlated_subquery

    opened by jtrain 8
  • How to include SQL_CALC_FOUND_ROWS option?

    How to include SQL_CALC_FOUND_ROWS option?

    I have a query that I'm trying to replicate in pypika that uses the SQL_CALC_FOUND_ROWS [1] option, but cannot figure out a way to get the option in the query. If I add it as a column, pypika wants to select it, but the option shouldn't be quoted in any way and there shouldn't be a comma to separate the option from the columns.

    purchase_order = pypika.Table("purchase_order")
    query = (
            pypika.MySQLQuery.from_(purchase_order)
            .where(purchase_order.is_active == 1)
            .select("SQL_CALC_FOUND_ROWS")
            .select(purchase_order.purchase_order_id.as_("_id"))
            .limit(pypika.Parameter("%(limit)s"))
        )
    

    will result in

    SELECT 
      `SQL_CALC_FOUND_ROWS`,
      `purchase_order_id` `_id`
    FROM `purchase_order` 
    WHERE `is_active`=1 
    LIMIT %(limit)s
    

    however the correct syntax for this option is...

    SELECT 
      SQL_CALC_FOUND_ROWS
      `purchase_order_id` `_id`
    FROM `purchase_order` 
    WHERE `is_active`=1 
    LIMIT %(limit)s
    

    Is there a way I can insert this option? I saw a Psuedocolumn class that seems like the right area, but if I try defining a new Psuedocolumn and passing it in, I get an exception raised.

    purchase_order = pypika.Table("purchase_order")
    CalcFoundRows = pypika.terms.Psuedocolumn('SQL_CALC_FOUND_ROWS')
    query = (
            pypika.MySQLQuery.from_(purchase_order)
            .where(purchase_order.is_active == 1)
            .select(CalcFoundRows)
            .select(purchase_order.purchase_order_id.as_("_id"))
            .limit(pypika.Parameter("%(limit)s"))
        )
    

    raises...

    AttributeError
    'Psuedocolumn' object has no attribute 'table'
    

    Any pointers on how I could get this to work? This is a fantastic lib by the way, I needed a way to programmatically define queries and really didn't want to throw an ORM into the mix. Thanks.

    [1] https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows

    opened by stickystyle 8
  • Use self._wrapper_cls in QueryBuilder.set

    Use self._wrapper_cls in QueryBuilder.set

    This commit fixes a problem where the QueryBuilder.set method always used the ValueWrapper class instead of the instance's _wrapper_cls.

    Signed-off-by: Petter Nyström [email protected]

    opened by jimorie 7
  • RETURNING clause for postgresql

    RETURNING clause for postgresql

    This PR implements postgres RETURNING clause. This PR implements it directly in QueryBuilder, which could be quite misleading, for other sql users, so if you have any propositions regarding how this code should be arranged I can do some refactoring

    opened by abondar 7
  • Add support for FOR UPDATE NOWAIT, SKIP LOCKED, OF

    Add support for FOR UPDATE NOWAIT, SKIP LOCKED, OF

    Hi, due to inactivity of original PR #565 I've imported current changes and applied fixes as suggested by @x8lucas8x in their last comment. Please let me know if this is what you expected and/or if any other changes should be applied.

    My changes move the initial implementation to MySQL and PostgreSQL dialects, as these databases seem to support these enhancements to FOR UPDATE. Tests are updated accordingly.

    opened by squareapartments 6
  • Adding pypika/pivot.py with pivot function

    Adding pypika/pivot.py with pivot function

    Wasn't really sure where this should go so I made a new file. The behavior is similar to this PIVOT method, the difference is that PIVOT implicitly groups on all other columns, whereas with this pivot there is no implicit grouping.

    https://docs.snowflake.com/en/sql-reference/constructs/pivot.html

    opened by adamddav 6
  • Enums not quoted

    Enums not quoted

    Bug report

    Description

    When using a member of class(Enum) or class(str, Enum) in a query, I expect the underlying .value of the member to be used when constructing a query so that the result is the same as if I had passed the literal value explicitly.

    Example

    from pypika import Query, Table 
    from enum import Enum
    
    class Status(Enum):
        ON = 'on'
        OFF = 'off'
    
    t = Table('tab')
    q = Query.from_(t).select(t.col1).where(t.status == Status.ON)
    

    Actual behavior

    >>> q.get_sql()
    SELECT "col1" FROM "tab" WHERE "status"=on
    

    Expected behavior

    >>> q.get_sql()
    SELECT "col1" FROM "tab" WHERE "status"='on'
    

    Version info

    Python 3.7.9 PyPika 0.47.7

    opened by cbows 6
  • Add additional body statements/table options to query builders

    Add additional body statements/table options to query builders

    hi all

    MySQLQuery.create_table(country_table)\
        .columns(
            Column('hashkey',  "VARCHAR(50)" , nullable=False, default=ValueWrapper(""))
        )
    

    currently I am having such an object in order to achieve

    CREATE TABLE `countries` (
        `hashkey` VARCHAR(50) NOT NULL DEFAULT ''
        UNIQUE INDEX `hashkey2` (`hashkey`) USING BTREE
    )
    COLLATE='utf8_general_ci'
    

    I need to add some statements to the body and the table option but wondering how can I add them

    or simply, at this moment it is not supported?

    opened by hmlamad 0
  • update BasicCriterion.__bool__ to prevent chained comparisons

    update BasicCriterion.__bool__ to prevent chained comparisons

    I know it's really not the intended usage, but one could write an expression like 0 <= Field("abc") < 10, which would result in a Criterion that looks like BasicCriterion(Equality.lt, Field("abc"), 10).

    This is not surprising because python breaks down the chained comparison as separate comparisons joined by the python and, whose behavior cannot be overridden. This is dangerous because the statement is valid but the resulting criterion does not match what the code intended.

    I know updating (Basic)Criterion.__bool__ to throw is probably not the right solution here, but it would be nice if there was some way to detect/prevent against this.

    opened by thinkingatoms 0
  • Double quotes inconsistency in WITH clause

    Double quotes inconsistency in WITH clause

    Working with Oracle databases can be tricky sometime, especially with double quotes and aliases. In the current state, there is a bug around the WITH clause that does not encapsulate the alias inside double quotes which is inconsistent with other clauses. It can lead to invalid identifier on Oracale as demonstrated with the query

    sub_query = Query().from_(my_table).where(my_table.ID == "12345").select(my_table.ID)
    
    QN = AliasedQuery("sub_query_with")
    
    query = (
        Query()
        .with_(sub_query, "sub_query_with")
        .from_(QN)
        .select(
            QN.star
        )
    )
    
    df_temp = claim_conn.read_dataframe(query.get_sql())
    

    A workaround has been proposed here working with UPPERCASE aliases, but with a proper management of double quotes inside WITH clause it would be better

    opened by gandroz 0
  • CTE using subquery over direct select statement

    CTE using subquery over direct select statement

    Following the example here I was able to get a cte working with the output being Select * from a physical table which generates the following as expected:

    with an_alias AS
    (
      SELECT fizz
      FROM  efg
    )
    SELECT *
    FROM abc
    JOIN an_alias on an_alias.fizz = abc.buzz
    

    When attempting the same for a subquery, the final select statement becomes a nested subquery rather than direct access to the subquery with a Select ...

            sub_query_1 = Query.from_(self.table_efg).select("fizz")
            sub_query_2 = Query.from_(self.table_hij).select("buzz")
            test_query = (
                Query.with_(sub_query_1, "an_alias")
                .from_(subquery_2)
                .join(AliasedQuery("an_alias"))
                .on(AliasedQuery("an_alias").fizz == subquery_2.buzz)
                .select(subquery_2.buzz)
            )
    

    Output:

    with an_alias AS
    (
      SELECT fizz
      FROM  efg
    )
    SELECT *
    FROM (
                   SELECT buzz 
                   FROM abc
    ) sq0 JOIN an_alias on an_alias.fizz = sq0.buzz
    

    Is it possible currently to unnest the final select like below?

    with an_alias AS
    (
      SELECT fizz
      FROM  efg
    )
    SELECT buzz 
    FROM abc JOIN an_alias on an_alias.fizz = sq0.buzz
    

    It's causing a large increase in the execution plan to have to run the subquery then join to it after.

    opened by alexf-bond 0
Releases(0.9.3)
Async ORM based on PyPika

PyPika-ORM - ORM for PyPika SQL Query Builder The package gives you ORM for PyPika with asycio support for a range of databases (SQLite, PostgreSQL, M

Kirill Klenov 7 Jun 4, 2022
dask-sql is a distributed SQL query engine in python using Dask

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily if you need it.

Nils Braun 271 Dec 30, 2022
Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment.

Databank Databank is an easy-to-use Python library for making raw SQL queries in a multi-threaded environment. No ORM, no frills. Thread-safe. Only ra

snapADDY GmbH 4 Apr 4, 2022
SQL queries to collections

SQC SQL Queries to Collections Examples from sqc import sqc data = [ {"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 3, "b": 2}, ] Simple filte

Alexander Volkovsky 0 Jul 6, 2022
An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets

datasets_sql A ?? Datasets extension package that provides support for executing arbitrary SQL queries on HF datasets. It uses DuckDB as a SQL engine

Mario Šaško 19 Dec 15, 2022
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.

Simple DDL Parser Build with ply (lex & yacc in python). A lot of samples in 'tests/. Is it Stable? Yes, library already has about 5000+ usage per day

Iuliia Volkova 95 Jan 5, 2023
edaSQL is a library to link SQL to Exploratory Data Analysis and further more in the Data Engineering.

edaSQL is a python library to bridge the SQL with Exploratory Data Analysis where you can connect to the Database and insert the queries. The query results can be passed to the EDA tool which can give greater insights to the user.

Tamil Selvan 8 Dec 12, 2022
Making it easy to query APIs via SQL

Shillelagh Shillelagh (ʃɪˈleɪlɪ) is an implementation of the Python DB API 2.0 based on SQLite (using the APSW library): from shillelagh.backends.apsw

Beto Dealmeida 207 Dec 30, 2022
Implementing basic MySQL CRUD (Create, Read, Update, Delete) queries, using Python.

MySQL with Python Implementing basic MySQL CRUD (Create, Read, Update, Delete) queries, using Python. We can connect to a MySQL database hosted locall

MousamSingh 5 Dec 1, 2021
Implementing basic MongoDB CRUD (Create, Read, Update, Delete) queries, using Python.

MongoDB with Python Implementing basic MongoDB CRUD (Create, Read, Update, Delete) queries, using Python. We can connect to a MongoDB database hosted

MousamSingh 4 Dec 1, 2021
Python PostgreSQL adapter to stream results of multi-statement queries without a server-side cursor

streampq Stream results of multi-statement PostgreSQL queries from Python without server-side cursors. Has benefits over some other Python PostgreSQL

Department for International Trade 6 Oct 31, 2022
A wrapper for SQLite and MySQL, Most of the queries wrapped into commands for ease.

Before you proceed, make sure you know Some real SQL, before looking at the code, otherwise you probably won't understand anything. Installation pip i

Refined 4 Jul 30, 2022
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

dataset: databases for lazy people In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files. Read the

Friedrich Lindenberg 4.2k Jan 2, 2023
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

dbd: database prototyping tool dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL d

Zdenek Svoboda 47 Dec 7, 2022
Anomaly detection on SQL data warehouses and databases

With CueObserve, you can run anomaly detection on data in your SQL data warehouses and databases. Getting Started Install via Docker docker run -p 300

Cuebook 171 Dec 18, 2022
Generate database table diagram from SQL data definition.

sql2diagram Generate database table diagram from SQL data definition. e.g. "CREATE TABLE ..." See Example below How does it works? Analyze the SQL to

django-cas-ng 1 Feb 8, 2022
Query multiple mongoDB database collections easily

leakscoop Perform queries across multiple MongoDB databases and collections, where the field names and the field content structure in each database ma

bagel 5 Jun 24, 2021
PyRemoteSQL is a python SQL client that allows you to connect to your remote server with phpMyAdmin installed.

PyRemoteSQL Python MySQL remote client Basically this is a python SQL client that allows you to connect to your remote server with phpMyAdmin installe

ProbablyX 3 Nov 4, 2022
A simple python package that perform SQL Server Source Control and Auto Deployment.

deploydb Deploy your database objects automatically when the git branch is updated. Production-ready! ⚙️ Easy-to-use ?? Customizable ?? Installation I

Mert Güvençli 10 Dec 7, 2022