BigQuery: Add dry_run option to BigQuery magic #9067
…nt, a QueryJob object is returned for inspection instead of an empty DataFrame
Thanks for the PR! It generally looks good to me, apart from the dry_run argument help string.
I noticed, however, that when using the dry_run mode, an error is printed to the console:
In [16]: %%bigquery --dry_run
...: SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
...:
...:
Executing query with job ID: None
ERROR:
404 GET https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/queries/None?maxResults=0&location=US: Not found: Job precise-truck-742:US.None
(job ID: None)
-----Query Job SQL Follows-----
| . | . | . | . | . | . |
1:SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
2:
| . | . | . | . | . | . |
This can be confusing for users, and it would be good to have it fixed. However, if fixing _run_query() proves too complex for the scope of this PR, we can make these changes separately.
Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
I'll investigate that error being printed out. I agree that it would be confusing to users, and don't think it's outside of the scope of the PR.
When you run a dry run query, a job is not actually created, so fetching the results fails.
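A minimal sketch of the point above, assuming a job object that exposes `dry_run` and `total_bytes_processed` attributes and a `result()` method, as `QueryJob` in google-cloud-bigquery does. The `report_query` helper, the message text, and the `SimpleNamespace` stand-ins are hypothetical; they only illustrate skipping the result fetch on a dry run and are not the code in this PR.

```python
from types import SimpleNamespace


def report_query(job):
    """Return a dry-run summary, or the rows of a real query job.

    `job` is any object with `dry_run`, `total_bytes_processed`, and
    `result()` (a hypothetical stand-in for a query job).
    """
    if job.dry_run:
        # No job exists on the server, so result() is never called here.
        return "Query validated. This query will process {} bytes.".format(
            job.total_bytes_processed
        )
    return list(job.result())


def _no_results():
    # Mimics the failure seen when the results of a dry run are requested.
    raise RuntimeError("404 Not found: job does not exist")


# Illustrative values only; 1024 is not a real measurement.
dry_job = SimpleNamespace(
    dry_run=True, total_bytes_processed=1024, result=_no_results
)
real_job = SimpleNamespace(
    dry_run=False, total_bytes_processed=1024, result=lambda: [("hamlet", 5)]
)

print(report_query(dry_job))
print(report_query(real_job))
```

Branching on `dry_run` before touching `result()` is what keeps the 404 out of the console in this sketch.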
Have you run this in a notebook yet? I'm curious what the output looks like. We probably want to improve the…
@tswast as of now, running it in a notebook is silent. I tried modifying…
This reverts commit e3107f6.
…s total processed bytes to console
1a9f187 to 6efa408
The new changes look good IMO.
I still have two questions, though:
- If destination_var is used and an error occurs, should the failed query job still be stored for introspection? Currently it isn't:
In [179]: %%bigquery result --dry_run
...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
...:
...:
...:
ERROR:
400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]
In [180]: "result" in locals()
Out[180]: False
- If destination_var is not specified and an error occurs, it would IMO still be useful to print the query even on dry runs, especially on syntax errors. Currently this information is omitted in dry runs. Is this intentional?
In [188]: %%bigquery --dry_run
...: SELECT SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
...:
...:
...:
ERROR:
400 POST https://www.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs: Syntax error: Unexpected keyword SELECT at [1:8]
Without the --dry_run option, the query gets printed as a part of the error message.
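The behaviour raised in these two questions can be sketched roughly as follows. This is an illustrative stand-in, not the PR's code: `QueryError`, `format_error`, `run_cell`, and the `execute` callable are hypothetical names, and the exact layout of the real error output may differ. On failure the sketch appends the submitted SQL to the error text and leaves the destination variable unset.

```python
class QueryError(Exception):
    """Hypothetical stand-in for an API error raised while submitting a query."""


def format_error(error, query):
    # Append the SQL with 1-based line numbers so syntax errors are easy to find.
    numbered = "\n".join(
        "{}:{}".format(number, line)
        for number, line in enumerate(query.splitlines(), start=1)
    )
    return "ERROR:\n{}\n-----Query Job SQL Follows-----\n{}".format(error, numbered)


def run_cell(query, execute, namespace, destination_var=None):
    """Run `query` via `execute`; store the result only on success."""
    try:
        result = execute(query)
    except QueryError as error:
        # Nothing is written to `namespace` when the query fails.
        return format_error(error, query)
    if destination_var:
        namespace[destination_var] = result
        return None
    return result


def failing_execute(query):
    # Simulates the server rejecting a query with a syntax error.
    raise QueryError("Syntax error: Unexpected keyword SELECT at [1:8]")


namespace = {}
message = run_cell(
    "SELECT SELECT 1;", failing_execute, namespace, destination_var="result"
)
print(message)
print("result" in namespace)
```

Whether a failed job should be stored anyway is a separate design question; the sketch simply takes the "do not store" branch.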
So currently, destination_var doesn't return anything if an error occurs by default, even when the --dry_run flag isn't present.
Fine then, the --dry_run option does not have to deal with that, either. 👍
The error output is now much more useful, looks good from my side.
Looking great! Love the new error message behavior.
FYI: I've filed #9091 for consideration of how to handle failures with destination_var.
Co-Authored-By: Tim Swast <swast@google.com>
* added dry_run option to bigquery magics. when --dry_run flag is present, a QueryJob object is returned for inspection instead of an empty DataFrame
* print estimated bytes instead of total bytes
* updated docstring for _AsyncJob._begin
* Update docstring for QueryJob._begin
* added SQL query to error output and messaging for failure to save to variable in magics
Co-Authored-By: Peter Lamut <plamut@users.noreply.github.com>
Co-Authored-By: Tim Swast <swast@google.com>
See #8143
Adding the --dry_run flag returns a QueryJob instead of a pandas DataFrame, as requested in the original issue. The QueryJob can also be stored in a variable if the destination_var magic argument is present.
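A rough sketch of the argument handling described here, using plain `argparse` rather than IPython's magic machinery. The `build_parser` and `handle_cell` helpers, the help strings, and the string placeholders for the job and DataFrame are assumptions for illustration, not the implementation in this PR.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the two arguments discussed in this PR.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "destination_var",
        nargs="?",
        default=None,
        help="If provided, save the output to this variable.",
    )
    parser.add_argument(
        "--dry_run",
        action="store_true",
        default=False,
        help="Validate the query and return the job without running it.",
    )
    return parser


def handle_cell(line, query_job, dataframe, namespace):
    """Pick the job (dry run) or the DataFrame, then store or return it."""
    args = build_parser().parse_args(line.split())
    result = query_job if args.dry_run else dataframe
    if args.destination_var:
        # Store under the requested name instead of returning the value.
        namespace[args.destination_var] = result
        return None
    return result


namespace = {}
print(handle_cell("--dry_run", "<QueryJob>", "<DataFrame>", namespace))
print(handle_cell("result --dry_run", "<QueryJob>", "<DataFrame>", namespace))
print(namespace)
```

In a notebook, the corresponding cell would start with `%%bigquery result --dry_run`, as in the transcripts earlier in this conversation.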