python-bigquery (🥈34 · ⭐ 3.5K · 📈) - Google BigQuery API client library. Apache-2

Google APIs

Last update: Jan 1, 2023

Related tags

Database Drivers python-bigquery

Overview

Python Client for Google BigQuery

Querying massive datasets can be time-consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast SQL queries against append-mostly tables, using the processing power of Google's infrastructure.

Quick Start

In order to use this library, you first need to go through the following steps:

Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.

Supported Python Versions

Python >= 3.6, < 3.10

Unsupported Python Versions

Python == 2.7, Python == 3.5.

The last version of this library compatible with Python 2.7 and 3.5 is google-cloud-bigquery==1.28.0.

Mac/Linux

pip install virtualenv
virtualenv <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-cloud-bigquery

Windows

pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-cloud-bigquery

Example Usage

Perform a query

from google.cloud import bigquery

client = bigquery.Client()

# Perform a query.
QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()  # Waits for query to finish

for row in rows:
    print(row.name)

Instrumenting With OpenTelemetry

This application uses OpenTelemetry to output tracing data from API calls to BigQuery. To enable OpenTelemetry tracing in the BigQuery client, install the following PyPI packages:

pip install google-cloud-bigquery[opentelemetry] opentelemetry-exporter-google-cloud

After installation, OpenTelemetry can be used in the BigQuery client and in BigQuery jobs. First, however, an exporter must be specified to determine where the trace data is sent. An example of this can be found here:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
# Called BatchExportSpanProcessor in pre-1.0 releases of the OpenTelemetry SDK.
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

# Register a tracer provider that batches spans and exports them to Cloud Trace.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(CloudTraceSpanExporter())
)

In this example all tracing data will be published to the Google Cloud Trace console. For more information on OpenTelemetry, please consult the OpenTelemetry documentation.

Comments

Can't upload data with "2019-07-08 08:00:00" datetime format to Google Bigquery with pandas
Environment details

I'm using pandas with google-cloud-python

Steps to reproduce

I have a dataframe with a datetime column, e.g. "2019-07-08 08:00:00", and my schema has a created column of DATETIME type.

I tried converting it using pd.to_datetime().

Then I used load_table_from_dataframe() to insert data.

Code example

my_df = get_sessions()  # this returns a dataframe with a column named "created" of datetime[ns] type, e.g. "2020-01-08 08:00:00"
my_df['created'] = pd.to_datetime(my_df['created'], format='%Y-%m-%d %H:%M:%S').astype('datetime64[ns]')
res = bigquery_client.client.load_table_from_dataframe(my_df, table_id)
res.result()
# exp: my value "2020-01-08 08:00:00" is being changed as INVALID or this value "0013-03-01T03:05:00" or other wrong value

@plamut please help

I just updated my problem. Thanks!
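For context, the 3.0.0 release notes further down this page state that timezone-naive datetimes loaded from pandas default to the DATETIME type. Below is a minimal standard-library sketch of that distinction. It assumes timezone-aware values correspond to TIMESTAMP, and the helper name guess_bigquery_type is hypothetical, not part of the library.

```python
from datetime import datetime, timezone


def guess_bigquery_type(value: datetime) -> str:
    """Hypothetical helper: pick a column type from a datetime's tz-awareness."""
    # A datetime is naive when it carries no usable UTC offset.
    is_naive = value.tzinfo is None or value.tzinfo.utcoffset(value) is None
    return "DATETIME" if is_naive else "TIMESTAMP"


# Parse the format used in the report; the result has no timezone attached.
naive = datetime.strptime("2019-07-08 08:00:00", "%Y-%m-%d %H:%M:%S")
aware = naive.replace(tzinfo=timezone.utc)

print(guess_bigquery_type(naive))  # DATETIME
print(guess_bigquery_type(aware))  # TIMESTAMP
```

Checking tz-awareness before loading makes it easier to see whether the values match the DATETIME column declared in the schema.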
api: bigquery type: bug priority: p2 external
opened by namnguyenbk 36
Bigquery: import error with v1.24.0

Bug googleapis/google-cloud-python#9965 is still happening with v1.24.0 and six v1.14.0:

File "/root/.local/share/virtualenvs/code-788z9T0p/lib/python3.6/site-packages/google/cloud/bigquery/schema.py", line 17, in <module>
    from six.moves import collections_abc
ImportError: cannot import name 'collections_abc'

Why did you close the googleapis/google-cloud-python#9965 issue if it still reproduces for many people?

api: bigquery needs more info type: question

opened by sagydr 31

BigQuery: make jobs awaitable

I know BigQuery jobs are asynchronous by default. However, I am struggling to make my data pipeline async end-to-end.

Looking at this JS example, I thought it would be the most Pythonic to make a BigQuery job awaitable. However, I can't get that to work in Python i.e. errors when await client.query(query). Looking at the source code, I don't see which method returns an awaitable object.

I have little experience in writing async Python code and found this example that wraps jobs in an async def coroutine.

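import asyncio

from google.cloud import bigquery


class BQApi(object):
    def __init__(self):
        # BQ_CONFIG is the poster's own settings dict (not shown in the issue).
        self.api = bigquery.Client.from_service_account_json(BQ_CONFIG["credentials"])

    async def exec_query(self, query, **kwargs) -> bigquery.table.RowIterator:
        job = self.api.query(query, **kwargs)
        task = asyncio.create_task(self.coroutine_job(job))
        return await task

    @staticmethod
    async def coroutine_job(job):
        # job.result() is a blocking call, so this coroutine still blocks the event loop.
        return job.result()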

The google.api_core.operation.Operation shows how to use add_done_callback to asynchronously wait for long-running operations. I have tried that, but the following yields AttributeError: 'QueryJob' object has no attribute '_condition' :

from concurrent.futures import ThreadPoolExecutor, as_completed
query1 = 'SELECT 1'
query2 = 'SELECT 2'

def my_callback(future):
    result = future.result()

operations = [bq.query(query1), bq.query(query2)]
[operation.add_done_callback(my_callback) for operation in operations]
results2 = []
for future in as_completed(operations):
  results2.append(list(future.result()))

Given that jobs are already asynchronous, would it make sense to add a method that returns an awaitable?

Or am I missing something, and is there a Pythonic way to use the BigQuery client with the async/await pattern?
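One possible approach, sketched here under assumptions rather than taken from the library, is to hand the blocking result() call to a worker thread so that the event loop stays free. FakeJob below is a hypothetical stand-in for a query job. The sketch only relies on the object having a blocking result() method.

```python
import asyncio
import time


class FakeJob:
    """Hypothetical stand-in for a query job: result() blocks, then returns rows."""

    def __init__(self, rows, delay=0.05):
        self._rows = rows
        self._delay = delay

    def result(self):
        time.sleep(self._delay)  # simulates waiting for the job to finish
        return self._rows


async def await_job(job):
    """Run the blocking job.result() in a worker thread and await its outcome."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, job.result)


async def run_all(jobs):
    # gather() waits on all jobs concurrently and preserves the input order.
    return await asyncio.gather(*(await_job(job) for job in jobs))


results = asyncio.run(run_all([FakeJob([1]), FakeJob([2])]))
print(results)  # [[1], [2]]
```

This keeps the client code synchronous and moves only the waiting into threads, so it does not require the job object itself to be awaitable.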

wontfix api: bigquery type: feature request Python 3 Only

opened by dkapitan 27

BigQuery: Upload pandas DataFrame containing arrays
The documentation for the Python BigQuery API indicates that arrays are supported. However, uploading a pandas DataFrame that contains arrays to BigQuery fails with a pyarrow struct error.

The only way around it seems to be to drop the columns and then use JSON normalization to build a separate table.

from google.cloud import bigquery

project = 'lake'
client = bigquery.Client(credentials=credentials, project=project)
dataset_ref = client.dataset('XXX')
table_ref = dataset_ref.table('RAW_XXX')
job_config = bigquery.LoadJobConfig()
job_config.autodetect = True
job_config.write_disposition = 'WRITE_TRUNCATE'
client.load_table_from_dataframe(appended_data, table_ref, job_config=job_config).result()

This is the error received: NotImplementedError: struct

I wanted to use this API because it indicates nested array support, which is perfect for our data lake in BQ, but I assume this doesn't work?
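The workaround described above (dropping the array column and moving its items to a separate table) could look roughly like the following standard-library sketch. The function name split_array_column and the sample records are hypothetical, and plain dictionaries stand in for DataFrame rows.

```python
def split_array_column(records, key, array_field):
    """Split records into parent rows (array dropped) and one child row per array item."""
    parents, children = [], []
    for record in records:
        # Parent row: every field except the array column.
        parents.append({k: v for k, v in record.items() if k != array_field})
        # Child rows: one per array element, linked back through the key.
        for item in record.get(array_field) or []:
            children.append({key: record[key], array_field: item})
    return parents, children


records = [
    {"id": 1, "name": "a", "tags": ["x", "y"]},
    {"id": 2, "name": "b", "tags": []},
]
parents, children = split_array_column(records, key="id", array_field="tags")
print(parents)   # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
print(children)  # [{'id': 1, 'tags': 'x'}, {'id': 1, 'tags': 'y'}]
```

Each resulting list is flat, so it can be loaded as an ordinary table and joined on the key column afterwards.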
api: bigquery type: feature request
opened by AETDDraper 21

500 server error when creating table using clustering

Environment details

OS type and version: Ubuntu20 PopOs
Python version: 3.7.8
pip version: 20.1.1
google-cloud-bigquery version: 1.27.2

Steps to reproduce

I'm creating a table with some columns, one of which is of type GEOGRAPHY. When I try to create the table with sample data and choose to use clustering, I get a 500 error. I can create the table only if no clustering is used. I can also create the table with clustering if I don't include the GEOGRAPHY column. Code with a toy example to reproduce it:

Code example

import time
import pandas as pd
from google.cloud import bigquery
from shapely.geometry import Point

client = bigquery.Client()
PROJECT_ID = ""
table_id = f"{PROJECT_ID}.data_capture.toy"

df = pd.DataFrame(
    dict(
        lat=[6.208969] * 100,
        lon=[-75.571696] * 100,
        logged_at=[int(time.time() * 1000) for _ in range(100)],
    )
)
df["point"] = df.apply(lambda row: Point(row["lon"], row["lat"]).wkb_hex, axis=1)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("lon", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("lat", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("point", "GEOGRAPHY", "REQUIRED"),
        bigquery.SchemaField("logged_at", "TIMESTAMP", "REQUIRED"),
    ],
    write_disposition="WRITE_TRUNCATE",
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="logged_at",
    ),
    clustering_fields=["logged_at"],
)

job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

Stack trace

Traceback (most recent call last):
  File "test.py", line 108, in <module>
    job.result()  # Wait for the job to complete.
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 812, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.InternalServerError: 500 An internal error occurred and the request could not be completed. Error: 3144498

Thank you in advance!

api: bigquery type: bug priority: p2 external

opened by charlielito 20

ImportError: cannot import name bigquery_storage_v1beta1 from google.cloud

This error occurs when I run a query using %%bigquery magics in GCP-hosted notebook and the query fails.

"ImportError: cannot import name bigquery_storage_v1beta1 from google.cloud (unknown location)"

Environment details

OS type and version:
Python version: 3.7
pip version: 20.1.1
google-cloud-bigquery version: 1.26.0 and 1.26.1; not an issue with 1.25.0

Steps to reproduce

Install current version of google-cloud-bigquery
Query

Code example

%pip install --upgrade google-cloud-bigquery[bqstorage,pandas]
%load_ext google.cloud.bigquery
import google.cloud.bigquery.magics

%%bigquery stackoverflow --use_bqstorage_api
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
LIMIT 10

Stack trace

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-29432a7a9e7c> in <module>
----> 1 get_ipython().run_cell_magic('bigquery', 'stackoverflow --use_bqstorage_api', "SELECT\n  CONCAT(\n    'https://stackoverflow.com/questions/',\n    CAST(id as STRING)) as url,\n  view_count\nFROM `bigquery-public-data.stackoverflow.posts_questions`\nWHERE tags like '%google-bigquery%'\nORDER BY view_count DESC\nLIMIT 10\n")

/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2369             with self.builtin_trap:
   2370                 args = (magic_arg_s, cell)
-> 2371                 result = fn(*args, **kwargs)
   2372             return result
   2373 

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics.py in _cell_magic(line, query)
    589             )
    590         else:
--> 591             result = query_job.to_dataframe(bqstorage_client=bqstorage_client)
    592 
    593         if args.destination_var:

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/job.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   3381             progress_bar_type=progress_bar_type,
   3382             create_bqstorage_client=create_bqstorage_client,
-> 3383             date_as_object=date_as_object,
   3384         )
   3385 

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py in to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object)
   1725                 progress_bar_type=progress_bar_type,
   1726                 bqstorage_client=bqstorage_client,
-> 1727                 create_bqstorage_client=create_bqstorage_client,
   1728             )
   1729             df = record_batch.to_pandas(date_as_object=date_as_object)

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py in to_arrow(self, progress_bar_type, bqstorage_client, create_bqstorage_client)
   1543             record_batches = []
   1544             for record_batch in self._to_arrow_iterable(
-> 1545                 bqstorage_client=bqstorage_client
   1546             ):
   1547                 record_batches.append(record_batch)

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/table.py in _to_page_iterable(self, bqstorage_download, tabledata_list_download, bqstorage_client)
   1432     ):
   1433         if bqstorage_client is not None:
-> 1434             for item in bqstorage_download():
   1435                 yield item
   1436             return

/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py in _download_table_bqstorage(project_id, table, bqstorage_client, preserve_order, selected_fields, page_to_item)
    626     # is available and can be imported.
    627     from google.cloud import bigquery_storage_v1
--> 628     from google.cloud import bigquery_storage_v1beta1
    629 
    630     if "$" in table.table_id:

ImportError: cannot import name 'bigquery_storage_v1beta1' from 'google.cloud' (unknown location)

api: bigquery type: bug priority: p2 external

opened by vanessanielsen 20

convert time columns to dbtime by default in `to_dataframe`

Currently TIME columns are just exposed as string objects. Converting them would be a better experience and would align better with the expectations for working with time series in pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

Presumably one could combine a date column with a time column to create a datetime by adding them.
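As a plain-Python illustration of that idea (not the pandas or db-dtypes implementation), the standard library combines a date and a time like this. The sample values are made up.

```python
from datetime import date, datetime, time

d = date(2020, 1, 8)  # value as it might come from a DATE column
t = time(8, 0, 0)     # value as it might come from a TIME column

# datetime.combine() merges the calendar date and the time of day.
combined = datetime.combine(d, t)
print(combined.isoformat())  # 2020-01-08T08:00:00
```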
api: bigquery type: feature request semver: major

opened by tswast 19
feat: add support for Parquet options
Closes #661.

For load jobs and external tables config.

PR checklist:

[x] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[x] Ensure the tests and linter pass

[x] Code coverage does not decrease (if any source code was changed)

[x] Appropriate docs were updated (if necessary)

api: bigquery cla: yes
opened by plamut 19
feat: use geopandas for GEOGRAPHY columns if geopandas is installed

This would technically be a breaking change, but it might make sense to do while we are changing default dtypes in https://github.com/googleapis/python-bigquery/pull/786 for https://issuetracker.google.com/144712110

If the GeoPandas library is installed (meaning, GeoPandas should be considered an optional "extra"), it may make sense to use the extension dtypes provided by GeoPandas by default on GEOGRAPHY columns.
api: bigquery type: feature request semver: major

opened by tswast 18
Purpose of timeout in client.get_job(timeout=5)

Hi,

What is the purpose of the timeout when fetching job information? The bigquery.Client documentation describes it as the time to wait before retrying. Should the retry attempt be made by the user, or will the client handle that?

However, I am getting this exception:

host='bigquery.googleapis.com', port=443): Read timed out. (read timeout=5)"}
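To separate the two ideas in the question, here is a generic sketch of a per-attempt timeout combined with a caller-side retry loop. It does not describe the client's actual retry policy. call_with_retry and flaky_request are hypothetical names, and the built-in TimeoutError stands in for whatever exception the transport raises.

```python
import time


def call_with_retry(request, timeout, attempts=3, backoff=0.01):
    """Call request(timeout=...) up to `attempts` times, retrying only on TimeoutError."""
    for attempt in range(1, attempts + 1):
        try:
            # `timeout` bounds how long a single attempt may wait.
            return request(timeout=timeout)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(backoff * attempt)  # simple linear backoff between attempts


calls = []


def flaky_request(timeout):
    # Hypothetical stand-in for a network call: times out twice, then succeeds.
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError(f"read timeout={timeout}")
    return "job-info"


result = call_with_retry(flaky_request, timeout=5)
print(result, calls)  # job-info [5, 5, 5]
```

In this sketch the timeout limits each individual attempt, while the number of attempts is decided by the retry loop around it.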
api: bigquery type: question

opened by nitishxp 18
fix: support ARRAY data type when loading from DataFrame with Parquet
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

[x] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[x] Ensure the tests and linter pass

[x] Code coverage does not decrease (if any source code was changed)

[x] Appropriate docs were updated (if necessary)

Fixes #19 🦕
api: bigquery cla: yes
opened by judahrand 17
docs: revise create table cmek sample
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

[ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[ ] Ensure the tests and linter pass

[ ] Code coverage does not decrease (if any source code was changed)

[ ] Appropriate docs were updated (if necessary)

Towards #790 🦕
api: bigquery samples size: m
opened by Mattix23 1
docs: revise label table code samples
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

[ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[ ] Ensure the tests and linter pass

[ ] Code coverage does not decrease (if any source code was changed)

[ ] Appropriate docs were updated (if necessary)

Towards #790 🦕
api: bigquery samples size: m
opened by Mattix23 1
add update_table_access.py
The public docs lack sample Python code for setting IAM on a BigQuery table: https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_table_or_view

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

[ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[ ] Ensure the tests and linter pass

[ ] Code coverage does not decrease (if any source code was changed)

[ ] Appropriate docs were updated (if necessary)

Fixes #1449 🦕
api: bigquery size: m
opened by nonokangwei 2
Grant access to bigquery table sample code

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Is your feature request related to a problem? Please describe.

The page https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_table_or_view has no Python sample code for granting access to a table or view.

Describe the solution you'd like

Provide simple Python sample code that grants access to a table or view.

Describe alternatives you've considered

Customers have to write the code themselves from complex documentation.

Additional context

n/a
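A rough sketch of what such a sample might look like follows. It assumes the client exposes get_iam_policy() and set_iam_policy() that accept a table ID string, and that the returned policy has a bindings list of role/members dictionaries. The helper grant_table_access, the in-memory stand-ins, and the table, role and member values are all hypothetical, chosen so the sketch runs without credentials.

```python
def grant_table_access(client, table_id, role, member):
    """Append an IAM binding to a table's policy and write the policy back.

    `client` is assumed to behave like google.cloud.bigquery.Client here,
    i.e. to provide get_iam_policy() and set_iam_policy().
    """
    policy = client.get_iam_policy(table_id)
    policy.bindings.append({"role": role, "members": {member}})
    return client.set_iam_policy(table_id, policy)


class _FakePolicy:
    """Minimal policy object holding a list of bindings."""

    def __init__(self):
        self.bindings = []


class _FakeClient:
    """In-memory stand-in so the sketch runs without a real project."""

    def __init__(self):
        self._policies = {}

    def get_iam_policy(self, table_id):
        return self._policies.setdefault(table_id, _FakePolicy())

    def set_iam_policy(self, table_id, policy):
        self._policies[table_id] = policy
        return policy


updated = grant_table_access(
    _FakeClient(),
    "my-project.my_dataset.my_table",  # hypothetical table ID
    "roles/bigquery.dataViewer",
    "user:analyst@example.com",  # hypothetical member
)
print(updated.bindings)
```

With a real client, the same function would be called with a bigquery.Client() instance in place of the stand-in.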
api: bigquery samples

opened by nonokangwei 0

Support for JSON query parameters

It is currently not possible to escape json parameters in a query like this:

# JsonQueryParameter is the class this request proposes.
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.JsonQueryParameter("data", {"foo": "bar"})
    ]
)
stmt = 'UPDATE my_table SET data=@data WHERE TRUE'
query_job = client.query(stmt, job_config=job_config)

It seems quite important for avoiding SQL injection.
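Until such a parameter type exists, one possible workaround is to serialize the document and bind it as a STRING parameter, letting SQL parse it. The sketch below only builds the parameter pieces. It assumes they would be passed to bigquery.ScalarQueryParameter(name, type_, value) and that PARSE_JSON is available in the query dialect. json_as_string_param is a hypothetical helper name.

```python
import json


def json_as_string_param(name, value):
    """Return (name, type, value) for binding a JSON document as a STRING parameter."""
    return (name, "STRING", json.dumps(value))


name, type_, text = json_as_string_param("data", {"foo": "bar"})

# The statement refers to the parameter by name and never embeds the JSON text itself.
stmt = "UPDATE my_table SET data = PARSE_JSON(@data) WHERE TRUE"

print(name, type_, text)  # data STRING {"foo": "bar"}
```

Because the JSON text travels as a bound parameter rather than as part of the SQL string, user-supplied content cannot change the structure of the statement.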

api: bigquery

opened by maingoh 0

docs: Revised create_partitioned_table sample
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

[ ] Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea

[ ] Ensure the tests and linter pass

[ ] Code coverage does not decrease (if any source code was changed)

[ ] Appropriate docs were updated (if necessary)

Towards #790 🦕
api: bigquery samples size: m
opened by thejaredchapman 1

Releases(v3.4.1)

v3.4.1(Dec 12, 2022)
3.4.1 (2022-12-09)

Documentation

Add info about streaming quota limits to insert_rows* methods (#1409) (0f08e9a)

Dependencies

make pyarrow and BQ Storage optional dependencies (e1aa921)

v3.4.0(Nov 18, 2022)
3.4.0 (2022-11-17)

Features

Add reference_file_schema_uri to LoadJobConfig, ExternalConfig (#1399) (931285f)

Add default value expression (#1408) (207aa50)

Add More Specific Type Annotations for Row Dictionaries (#1295) (eb49873)

v3.3.6(Nov 4, 2022)
3.3.6 (2022-11-02)

Features

Reconfigure tqdm progress bar in %%bigquery magic (#1355) (506f781)

Bug Fixes

Corrects test for non-existent attribute (#1395) (a80f436)

deps: Allow protobuf 3.19.5 (#1379) (3e4a074)

deps: Allow pyarrow < 11 (#1393) (c898546)

deps: Require requests>=2.21.0 (#1388) (e398336)

Refactor to adapt to changes to shapely dependency (#1376) (2afd278)

Documentation

Fix typos (#1372) (21cc525)

Miscellaneous Chores

release 3.3.6 (4fce1d9)

v3.3.5(Oct 11, 2022)
3.3.5 (2022-10-10)

Bug Fixes

deps: Allow protobuf 3.19.5 (#1379) (3e4a074)

v3.3.4(Oct 3, 2022)
3.3.4 (2022-09-29)

Bug Fixes

deps: Require protobuf >= 3.20.2 (#1369) (f13383a)

v3.3.3(Sep 29, 2022)
3.3.3 (2022-09-28)

Bug Fixes

Refactors code to account for a tqdm code deprecation (#1357) (1369a9d)

Validate opentelemetry span job attributes have values (#1327) (8287af1)

Documentation

samples: uses function (create_job) more appropriate to the described sample intent (5aeedaa)

v3.3.2(Aug 16, 2022)
3.3.2 (2022-08-16)

Bug Fixes

deps: require proto-plus >= 1.22.0 (1de7a52)

deps: require protobuf >=3.19, < 5.0.0 (#1311) (1de7a52)

v3.3.1(Aug 9, 2022)
3.3.1 (2022-08-09)

Bug Fixes

deps: allow pyarrow < 10 (#1304) (13616a9)

v3.3.0(Jul 26, 2022)
3.3.0 (2022-07-25)

Features

add destination_expiration_time property to copy job (#1277) (728b07c)

Bug Fixes

require python 3.7+ (#1284) (52d9f14)

Documentation

samples: add table snapshot sample (#1274) (e760d1b)

samples: explicitly add bq to samples reqs, upgrade grpc to fix bug on m1 (#1290) (9b7e3e4)

v2.34.4(Jun 9, 2022)
2.34.4 (2022-05-31)

Bug Fixes

deps: require protobuf>= 3.12.0, <4.0.0dev (#1266) (4ec586b)

v1.28.2(Jun 9, 2022)
1.28.2 (2022-06-07)

Bug Fixes

deps: require protobuf<4.0.0 on v1 branch (#1271) (3dbeb72)

v3.2.0(Jun 6, 2022)
3.2.0 (2022-06-06)

Features

add support for table clones (#1235) (176fb2a)

Bug Fixes

deps: proto-plus >= 1.15.0, <2.0.0dev (ba58d3a)

deps: require packaging >= 14.3, <22.0.0dev (ba58d3a)

deps: require protobuf>= 3.12.0, <4.0.0dev (#1263) (ba58d3a)

Documentation

fix changelog header to consistent size (#1268) (d03e2a2)

v3.1.0(May 9, 2022)
3.1.0 (2022-05-09)

Features

add str method to table (#1199) (8da4fa9)

refactor AccessEntry to use _properties pattern (#1125) (acd5612)

support using BIGQUERY_EMULATOR_HOST environment variable (#1222) (39294b4)

Bug Fixes

deps: allow pyarrow v8 (#1245) (d258690)

export bigquery.HivePartitioningOptions (#1217) (8eb757b)

Skip geography_as_object conversion for REPEATED fields (#1220) (4d3d6ec)

Documentation

updated variable typo in comment in code sample (#1239) (e420112)

v1.28.1(Apr 6, 2022)
1.28.1 (2022-04-04)

Bug Fixes

deps: require google-api-core >= 1.31.5, >= 2.3.2 on v1 release (#1166) (34d9bec)

v3.0.1(Mar 30, 2022)
3.0.1 (2022-03-30)

Bug Fixes

deps: raise exception when pandas is installed but db-dtypes is not (#1191) (4333910)

deps: restore dependency on python-dateutil (#1187) (212d7ec)

v3.0.0(Mar 29, 2022)
3.0.0 (2022-03-29)

⚠ BREAKING CHANGES

BigQuery Storage and pyarrow are required dependencies (#776)

use nullable Int64 and boolean dtypes in to_dataframe (#786)

destination tables are no-longer removed by create_job (#891)

In to_dataframe, use dbdate and dbtime dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)

automatically convert out-of-bounds dates in to_dataframe, remove date_as_object argument (#972)

mark the package as type-checked (#1058)

default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)

remove out-of-date BigQuery ML protocol buffers (#1178)

Features

add api_method parameter to Client.query to select INSERT or QUERY API (#967) (76d88fb)

default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061) (76d88fb)

destination tables are no-longer removed by create_job (#891) (76d88fb)

In to_dataframe, use dbdate and dbtime dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972) (76d88fb)

mark the package as type-checked (#1058) (76d88fb)

use StandardSqlField class for Model.feature_columns and Model.label_columns (#1117) (76d88fb)

Bug Fixes

automatically convert out-of-bounds dates in to_dataframe, remove date_as_object argument (#972) (76d88fb)

improve type annotations for mypy validation (#1081) (76d88fb)

remove out-of-date BigQuery ML protocol buffers (#1178) (76d88fb)

use nullable Int64 and boolean dtypes in to_dataframe (#786) (76d88fb)

Documentation

Add migration guide from version 2.x to 3.x (#1027) (76d88fb)

Dependencies

BigQuery Storage and pyarrow are required dependencies (#776) (76d88fb)

v2.34.3(Mar 29, 2022)
2.34.3 (2022-03-29)

Bug Fixes

update content-type header (#1171) (921b440)

v2.34.2(Mar 7, 2022)
2.34.2 (2022-03-05)

Bug Fixes

deps: require google-api-core>=1.31.5, >=2.3.2 (#1157) (0c15790)

deps: require proto-plus>=1.15.0 (0c15790)

v2.34.1(Mar 2, 2022)
2.34.1 (2022-03-02)

Dependencies

add "extra" for IPython, exclude bad IPython release (#1151) (0fbe12d)

allow pyarrow 7.0 (#1112) (57f8ea9)

v2.34.0(Feb 18, 2022)
2.34.0 (2022-02-18)

Features

support BI Engine statistics in query job (#1144) (7482549)

v2.33.0(Feb 17, 2022)
2.33.0 (2022-02-16)

Features

add --no_query_cache option to %%bigquery magics to disable query cache (#1141) (7dd30af)

Bug Fixes

return 403 when VPC-SC violation happens (#1131) (f5daa9b)

Documentation

reference BigQuery REST API defaults in LoadJobConfig descrip… (#1132) (18d9580)

show common job properties in get_job and cancel_job samples (#1137) (8edc10d)

v2.32.0(Jan 13, 2022)
2.32.0 (2022-01-12)

Features

support authorized dataset entity (#1075) (c098cd0)

Bug Fixes

remove query text from exception message, use exception.debug_message instead (#1105) (e23114c)

v2.31.0(Dec 2, 2021)
Features

allow cell magic body to be a $variable (#1053) (3a681e0)

promote RowIterator.to_arrow_iterable to public method (#1073) (21cd710)

Bug Fixes

apply timeout to all resumable upload requests (#1070) (3314dfb)

Dependencies

support OpenTelemetry >= 1.1.0 (#1050) (4616cd5)

v2.30.1(Nov 4, 2021)
Bug Fixes

error if eval()-ing repr(SchemaField) (#1046) (13ac860)

Documentation

show gcloud command to authorize against sheets (#1045) (20c9024)

use stable URL for pandas intersphinx links (#1048) (73312f8)

v2.30.0(Nov 3, 2021)
Features

accept TableListItem where TableReference is accepted (#1016) (fe16adc)

support Python 3.10 (#1043) (5bbb832)

Documentation

add code samples for Jupyter/IPython magics (#1013) (61141ee)

samples: add create external table with hive partitioning (#1033) (d64f5b6)

v2.29.0(Oct 27, 2021)
Features

add QueryJob.schema property for dry run queries (#1014) (2937fa1)

add session and connection properties to QueryJobConfig (#1024) (e4c94f4)

add support for INTERVAL data type to list_rows (#840) (e37380a)

allow queryJob.result() to be called on a dryRun (#1015) (685f06a)

Documentation

document ScriptStatistics and other missing resource classes (#1023) (6679109)

fix formatting of generated client docstrings (#1009) (f7b0ee4)

Dependencies

allow pyarrow 6.x (#1031) (1c2de74)

v2.28.1(Oct 7, 2021)
Bug Fixes

support ARRAY data type when loading from DataFrame with Parquet (#980) (1e59083)

v2.28.0(Sep 30, 2021)
Features

add AvroOptions to configure AVRO external data (#994) (1a9431d)

Documentation

link to stable pandas docs (#990) (ea50e80)

v2.27.1(Sep 27, 2021)
Bug Fixes

remove py.typed since package fails mypy check (#988) (39030f2)

v2.27.0(Sep 27, 2021)
Features

Add py.typed for PEP 561 compliance (#976) (96e6bee)

include key metadata in Job representation (#964) (acca1cb)

Bug Fixes

Arrow extension-type metadata was not set when calling the REST API or when there are no rows (#946) (864383b)

disambiguate missing policy tags from explicitly unset policy tags (#983) (f83c00a)

remove default timeout (#974) (1cef0d4)

Documentation

simplify destination table sample with f-strings (#966) (ab6e76f)


Owner

Google APIs

Clients for Google APIs and tools that help produce them.

GitHub

Pandas Google BigQuery

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

345 Dec 28, 2022

Google Cloud Client Library for Python

Google Cloud Python Client Python idiomatic clients for Google Cloud Platform services. Stability levels The development status classifier on PyPI ind

4.1k Jan 1, 2023

Python client for Apache Kafka

Kafka Python client Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the offici