feat: add default timeout for Client.get_job()#1935
feat: add default timeout for Client.get_job()#1935Linchin merged 24 commits intogoogleapis:mainfrom
Conversation
tswast
left a comment
There was a problem hiding this comment.
Please add some tests where *Job.reload(timeout=None), *Job.done(timeout=None), and *Job.result(timeout=None) disable the timeouts for jobs.get.
google/cloud/bigquery/job/base.py
Outdated
| extra_params["location"] = self.location | ||
| span_attributes = {"path": self.path} | ||
| kwargs: Dict[str, Any] = {} | ||
| if type(timeout) is object: |
There was a problem hiding this comment.
Let's add some comments explaining why we need to omit the timeout if we get _DEFAULT_VALUE. (Because we want to use the default API-level timeouts but wait indefinitely for the job to finish)
There was a problem hiding this comment.
I added explanation why kwargs is used here, but I feel like Queryjob.result() is a better place to explain the default timeout behavior, so I added it in docstring there instead.
google/cloud/bigquery/job/base.py
Outdated
| self, | ||
| retry: "retries.Retry" = DEFAULT_RETRY, | ||
| timeout: Optional[float] = None, | ||
| timeout: Optional[Union[float, object]] = PollingFuture._DEFAULT_VALUE, |
There was a problem hiding this comment.
Done isn't waiting for the job to complete, so I wonder if we actually need PollingFuture._DEFAULT_VALUE here?
It seems more analogous to Client.get_job so maybe we can use the same DEFAULT_GET_JOB_TIMEOUT, here? 🤔 We need to double-check that we aren't calling any other API requests like getQueryResults, if so.
There was a problem hiding this comment.
done() calls reload(), which in turn calls Client.get_job(). So I added the sentinel values along the path result() -> done() -> reload(). Indeed I realize we don't have to add sentinel for done() or reload(), because it will be passed from result() anyway. As to getQueryResults, I don't think it's called from done() onward (it's used in result()). I think either way (using PollingFuture._DEFAULT_VALUE or DEFAULT_GET_JOB_TIMEOUT) it's effectively the same, so I think either way it's fine. Are there any other factors I'm not aware of?
There was a problem hiding this comment.
I'd like to keep our type annotations as narrow as we can. My worry with PollingFuture._DEFAULT_VALUE is it opens it up to object, which technically is anything (string, bytes, anything) in Python.
There was a problem hiding this comment.
Indeed, I ended up only using sentinel for result(), and used DEFAULT_GET_JOB_TIMEOUT for subsequent calls.
google/cloud/bigquery/job/query.py
Outdated
| if type(timeout) is object: | ||
| timeout = None | ||
| get_job_timeout = PollingFuture._DEFAULT_VALUE | ||
| else: | ||
| get_job_timeout = timeout |
There was a problem hiding this comment.
Why not do the kwargs trick here too?
There was a problem hiding this comment.
I wonder if it's necessary - the purpose here is slightly different from when calling client.get_job(). I'm only preserving the sentinel value for the job.done() call, and use None as default for any other calls.
There was a problem hiding this comment.
done, following later comments
google/cloud/bigquery/job/base.py
Outdated
| client=None, | ||
| retry: "retries.Retry" = DEFAULT_RETRY, | ||
| timeout: Optional[float] = None, | ||
| timeout: Optional[Union[float, object]] = POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
I'd much rather this defaulted to the same timeout as get_job() and anywhere that calls reload(), we make sure we do the check for POLLING_DEFAULT_VALUE before calling reload() to make sure we only pass in float or None or omit the value.
google/cloud/bigquery/job/base.py
Outdated
| self, | ||
| retry: "retries.Retry" = DEFAULT_RETRY, | ||
| timeout: Optional[float] = None, | ||
| timeout: Optional[Union[float, object]] = POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
Same here. I don't think this needs to handle POLLING_DEFAULT_VALUE since it's not waiting for the job to finish, just making at most 1 API call.
google/cloud/bigquery/job/query.py
Outdated
| # timeout to QueryJob.done(), and use None for the other timeouts. | ||
| if type(timeout) is object: | ||
| timeout = None | ||
| get_job_timeout = POLLING_DEFAULT_VALUE |
There was a problem hiding this comment.
Why not set to DEFAULT_GET_JOB_TIMEOUT or do the kwargs trick to omit it?
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
tswast
left a comment
There was a problem hiding this comment.
Love it! Just a couple of typos to fix please before merging
When calling
QueryJob.result(),QueryJob.reload()is called in the background with an empty timeout value. This leadsQueryJobto waiting indefinitely ifQueryJob.reload()loses connection.This PR:
QueryJob.reload()to callClient.get_job()instead of calling API directlyClient.get_job()Fixes #1922 🦕