BigQuery: Add dry_run option to BigQuery magic #9067
tswast merged 20 commits into googleapis:master from
…nt, a QueryJob object is returned for inspection instead of an empty DataFrame
Thanks for the PR! It generally looks good to me, apart from the `dry_run` argument help string.
I noticed, however, that when using the dry_run mode, an error is printed to the console:
```
In [16]: %%bigquery --dry_run
...: SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
...:
...:
Executing query with job ID: None
ERROR:
404 GET https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/queries/None?maxResults=0&location=US: Not found: Job precise-truck-742:US.None
(job ID: None)
-----Query Job SQL Follows-----
| . | . | . | . | . | . |
1:SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
2:
| . | . | . | . | . | . |
```
This can be confusing for users, so it would be good to have it fixed. However, if fixing `_run_query()` proves too complex for the scope of this PR, we can make these changes separately.
Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
I'll investigate that error being printed out. I agree that it would be confusing to users, and I don't think it's outside the scope of this PR.
When you run a dry-run query, a job is not actually created (its job ID is `None`), so fetching the results fails.
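A minimal sketch of the guard this implies, under stated assumptions: `FakeQueryJob` is a hypothetical stand-in for the real `QueryJob` (it makes no API calls), and `run_query_sketch` is a hypothetical helper, not the library's `_run_query()`. For a dry run, the job is returned immediately instead of polling for results by a job ID that does not exist.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FakeQueryJob:
    """Hypothetical stand-in for a BigQuery QueryJob; makes no API calls."""

    dry_run: bool
    job_id: Optional[str]
    total_bytes_processed: Optional[int] = None

    def result(self) -> List[str]:
        # A dry run creates no job on the server, so polling for results by
        # job ID cannot succeed; this mimics the "Not found: Job ...None" 404.
        if self.job_id is None:
            raise LookupError("Not found: Job None")
        return ["row"]


def run_query_sketch(job: FakeQueryJob) -> Union[FakeQueryJob, List[str]]:
    """Hypothetical helper: skip result polling entirely for dry runs."""
    if job.dry_run:
        # The dry-run response already carries statistics such as the
        # estimated bytes processed, so the job itself is returned.
        return job
    return job.result()


dry = FakeQueryJob(dry_run=True, job_id=None, total_bytes_processed=1234)
print(run_query_sketch(dry) is dry)  # True
```

The design choice here is to branch before any result fetching happens, rather than catching the 404 afterwards, so no error is ever printed for a valid dry run.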
Have you run this in a notebook yet? I'm curious what the output looks like. We probably want to improve the …
@tswast As of now, running it in a notebook is silent. I tried modifying …
This reverts commit e3107f6.
…s total processed bytes to console
1a9f187 to 6efa408
The new changes look good IMO.
I still have two questions, though:
- If `destination_var` is used and an error occurs, should the failed query job still be stored for introspection? Currently it isn't:

  ```
  In [179]: %%bigquery result --dry_run
  ...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
  ...:
  ...:
  ...:
  ERROR:
  400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]

  In [180]: "result" in locals()
  Out[180]: False
  ```

- If `destination_var` is not specified and an error occurs, it would IMO still be useful to print the query even on dry runs, especially on syntax errors. Currently this information is omitted in dry runs. Is this intentional?

  ```
  In [188]: %%bigquery --dry_run
  ...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
  ...:
  ...:
  ...:
  ERROR:
  400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]
  ```

  Without the `--dry_run` option, the query gets printed as part of the error message.
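Printing the query on dry-run errors could reuse the same formatting as the non-dry-run path. Below is a minimal sketch of such a formatter; `format_query_for_error` is a hypothetical name, and its exact spacing (4-digit line numbers, one ruler tick per 10 columns) is an assumption reconstructed from the `-----Query Job SQL Follows-----` block in the console output earlier in this thread, not a copy of the library's code.

```python
from typing import Optional


def format_query_for_error(query: str, job_id: Optional[str]) -> str:
    """Hypothetical helper: append the SQL to an error message with line
    numbers and a column ruler above and below it."""
    lines = query.splitlines() or [""]
    max_line_len = max(len(line) for line in lines)
    header = "-----Query Job SQL Follows-----"
    # Left-pad for the line numbers, then one "." / "|" pair per 10 columns.
    ruler = "    |" + "    .    |" * (max_line_len // 10)
    # Right-align line numbers in 4 characters, e.g. "   1:SELECT ...".
    body = "\n".join(
        "{:4}:{}".format(n, line) for n, line in enumerate(lines, start=1)
    )
    return "\n\n(job ID: {})\n\n{}\n\n{}\n{}\n{}".format(
        job_id, header, ruler, body, ruler
    )


sql = "SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;\n\n"
print(format_query_for_error(sql, None))
```

In a dry run the job ID would simply be `None`, so the same helper works on both code paths.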
So currently, nothing is saved to `destination_var` when an error occurs, even when the `--dry_run` flag isn't present.
Fine then, the --dry_run option does not have to deal with that, either. 👍
The error output is now much more useful, looks good from my side.
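The error path agreed on here can be sketched as follows: on failure nothing is assigned to the destination variable, and the user is told so explicitly. This is a hypothetical illustration; `run_and_store` is not a library function, a plain dict stands in for the IPython user namespace, and the message wording is an assumption.

```python
import sys
from typing import Any, Callable, Dict, Optional


def run_and_store(
    run: Callable[[], Any],
    namespace: Dict[str, Any],
    destination_var: Optional[str] = None,
) -> Any:
    """Hypothetical error path: never store a result when the query fails."""
    try:
        result = run()
    except Exception as exc:  # a real magic would likely catch API errors only
        print("\nERROR:\n", exc, file=sys.stderr)
        if destination_var:
            # Make the missing variable explicit instead of failing silently.
            print(
                "\nCould not save output to variable '{}'.".format(destination_var),
                file=sys.stderr,
            )
        return None
    if destination_var:
        namespace[destination_var] = result
        return None
    return result


ns: Dict[str, Any] = {}
run_and_store(lambda: 1 / 0, ns, "result")  # prints the error, stores nothing
print("result" in ns)  # False
```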
Co-Authored-By: Tim Swast <swast@google.com>
* added dry_run option to bigquery magics. when --dry_run flag is present, a QueryJob object is returned for inspection instead of an empty DataFrame
* print estimated bytes instead of total bytes
* updated docstring for _AsyncJob._begin
* Update docstring for QueryJob._begin
* added SQL query to error output and messaging for failure to save to variable in magics

Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
Co-Authored-By: Tim Swast <swast@google.com>



See #8143
Adding the `--dry_run` flag returns a `QueryJob` instead of a pandas `DataFrame`, as requested in the original issue. The `QueryJob` can also be stored in a variable if the `destination_var` magic argument is present.
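A minimal sketch of the `--dry_run` branch described in the PR description, under stated assumptions: `DryRunJob` is a hypothetical stand-in for the `QueryJob` a dry run returns, `handle_dry_run` is a hypothetical helper, a plain dict stands in for the IPython user namespace, and the printed message wording is illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DryRunJob:
    """Hypothetical stand-in for the QueryJob returned by a dry run."""

    total_bytes_processed: int


def handle_dry_run(
    job: DryRunJob,
    namespace: Dict[str, Any],
    destination_var: Optional[str] = None,
) -> Optional[DryRunJob]:
    """Hypothetical --dry_run branch of a cell magic."""
    if destination_var:
        # Store the job for later inspection instead of displaying it.
        namespace[destination_var] = job
        return None
    # No variable requested: report the estimate and return the job so the
    # notebook displays it, rather than returning an empty DataFrame.
    print(
        "Query validated. This query will process {} bytes.".format(
            job.total_bytes_processed
        )
    )
    return job


user_ns: Dict[str, Any] = {}
job = DryRunJob(total_bytes_processed=42)
handle_dry_run(job, user_ns, "result")
print(user_ns["result"] is job)  # True
```

Whether the job should also be returned when `destination_var` is set is a design choice; this sketch returns nothing in that case (an assumption), so the cell produces no displayed output.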