Closed
Labels
- api: bigquery (Issues related to the BigQuery API.)
- api: bigquerystorage (Issues related to the BigQuery Storage API.)
- priority: p2 (Moderately-important priority. Fix may not be included in next release.)
- type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
Steps to reproduce
- Call list_rows with max_results set.
- Call to_dataframe or to_arrow.
- Observe that more rows were returned than were requested.
Code example
from google.cloud import bigquery
from google.cloud import bigquery_storage

bqclient = bigquery.Client()
bqstorage_client = bigquery_storage.BigQueryStorageClient()

# Download through tabledata.list: max_results is respected.
df_tabledata_list = bqclient.list_rows(
    "bigquery-public-data.utility_us.country_code_iso",
    selected_fields=[bigquery.SchemaField("country_name", "STRING")],
    max_results=100,
).to_dataframe()
print("tabledata.list: {} rows".format(len(df_tabledata_list.index)))

# Download through the BigQuery Storage API: max_results is ignored.
df_bqstorage = bqclient.list_rows(
    "bigquery-public-data.utility_us.country_code_iso",
    selected_fields=[bigquery.SchemaField("country_name", "STRING")],
    max_results=100,
).to_dataframe(bqstorage_client=bqstorage_client)
print("bqstorage: {} rows".format(len(df_bqstorage.index)))
Output
tabledata.list: 100 rows
bqstorage: 278 rows
Possible fixes
- (Harder) Keep track of how many rows you've downloaded in a BQ Storage session so far. Once you've downloaded enough rows, close all streams (is this even possible?).
- (Easier, but acceptable) If max_results is set, always download data with tabledata.list.
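To make the first (harder) option more concrete, here is a rough sketch of the row-counting idea in plain Python. It does not use the real BQ Storage client: take_rows and the list-of-pages input are hypothetical stand-ins for whatever a read session yields, and the sketch says nothing about whether the underlying streams can actually be closed early.

```python
from typing import Iterable, Iterator, List, Optional


def take_rows(
    pages: Iterable[List[dict]], max_results: Optional[int]
) -> Iterator[dict]:
    """Yield rows page by page, stopping once max_results rows were emitted.

    Hypothetical helper: `pages` stands in for the pages of a read session.
    """
    if max_results is not None and max_results <= 0:
        return
    emitted = 0
    for page in pages:
        for row in page:
            yield row
            emitted += 1
            # Stop right after the last requested row so that no
            # further page is pulled from the underlying iterable.
            if max_results is not None and emitted >= max_results:
                return


# Three fake pages holding seven rows in total.
fake_pages = [
    [{"country_name": "A"}, {"country_name": "B"}, {"country_name": "C"}],
    [{"country_name": "D"}, {"country_name": "E"}, {"country_name": "F"}],
    [{"country_name": "G"}],
]
capped = list(take_rows(fake_pages, max_results=4))
uncapped = list(take_rows(fake_pages, max_results=None))
print(len(capped), len(uncapped))
```

Because the generator returns as soon as the limit is hit, the caller still has to release any open streams itself, which is the part the question above leaves open.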
I think we should implement fix (2): if max_results is set, it's unlikely that we are downloading enough rows for the BQ Storage API to be worthwhile.
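A minimal sketch of what the second option could look like as a decision rule. The function choose_download_api and its string return values are made up for illustration and are not part of the library; the real change would live wherever to_dataframe and to_arrow decide which API to call.

```python
from typing import Optional


def choose_download_api(
    max_results: Optional[int], bqstorage_client: Optional[object]
) -> str:
    """Pick the download path for a row iterator (hypothetical helper)."""
    # Without a BQ Storage client there is only one option.
    if bqstorage_client is None:
        return "tabledata.list"
    # A row limit is only honored by tabledata.list, so prefer it
    # even when a BQ Storage client was supplied.
    if max_results is not None:
        return "tabledata.list"
    return "bqstorage"


fake_client = object()  # placeholder for a real BQ Storage client
print(choose_download_api(100, fake_client))
print(choose_download_api(None, fake_client))
```

With this rule the second call in the code example above would go through tabledata.list and should return 100 rows, at the cost of silently ignoring the supplied client.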