Description
I've rewritten the current read_gbq()
function using google-cloud-python
, which handles the naming of structs and arrays out of the box. For more discussion about this, see: https://github.com/pydata/pandas-gbq/issues/23.
~However, because of the fact that google-cloud-python potentially uses different authentication flows and may break existing behavior, I've left the existing read_gbq()
function and and named this new function from_gbq()
. If in the future we are able to reconcile the authentication flows and/or decide to deprecate flows that are not supported in google-cloud-python
, we can rename this to read_gbq()
.~
UPDATE: As requested in comment by @jreback (https://github.com/pydata/pandas-gbq/pull/25/files/a763cf071813c836b7e00ae40ccf14e93e8fd72b#r110518161), I deleted old read_gbq()
and named my new function read_gbq()
, deleting all legacy functions and code.
Added in a few lines to requirements file, but I'll leave it to you @jreback to deal with conda dependency issues which you mentioned in Issue 23.
Let know if any questions or if any tests need to be written. You can confirm that it works by running the following:
q = """
select ROW_NUMBER() over () row_num, struct(a,b) col, c, d, c*d c_times_d, e
from
(select * from
(SELECT 1 a, 2 b, null c, 0 d, 100 e)
UNION ALL
(SELECT 5 a, 6 b, 0 c, null d, 200 e)
UNION ALL
(SELECT 8 a, 9 b, 10.0 c, 10 d, 300 e)
)
"""
df = gbq.read_gbq(q, dialect='standard')
df
| row_num | col | c | d | c_times_d | e |
|---------|--------------------|------|------|-----------|-----|
| 2 | {u'a': 5, u'b': 6} | 0.0 | NaN | NaN | 200 |
| 1 | {u'a': 1, u'b': 2} | NaN | 0.0 | NaN | 100 |
| 3 | {u'a': 8, u'b': 9} | 10.0 | 10.0 | 100.0 | 300 |
q = """
select array_agg(a) mylist
from
(select "1" a UNION ALL select "2" a)
"""
df = gbq.read_gbq(q, dialect='standard')
df
| mylist |
|--------|
| [1, 2] |
q = """
select array_agg(struct(a,b)) col, f
from
(select * from
(SELECT 1 a, 2 b, null c, 0 d, 100 e, "hello" f)
UNION ALL
(SELECT 5 a, 6 b, 0 c, null d, 200 e, "ok" f)
UNION ALL
(SELECT 8 a, 9 b, 10.0 c, 10 d, 300 e, "ok" f)
)
group by f
"""
df = gbq.read_gbq(q, dialect='standard')
df
| col | f |
|------------------------------------------|-------|
| [{u'a': 5, u'b': 6}, {u'a': 8, u'b': 9}] | ok |
| [{u'a': 1, u'b': 2}] | hello |
Confirmed that col_order
and index_col
still work ~(feel free to pull that out into a separate function since there's now redundant code with read_gbq()
)~, and I removed the type conversion lines which appear to be unnecessary (google-cloud-python and/or pandas appears to do the necessary type conversion automatically, even if there are nulls; can confirm by examining the datatypes in the resulting dataframes).
type: feature request