BigQuery: deprecate client.dataset() part 1#9032
BigQuery: deprecate client.dataset() part 1#9032IlyaFaer merged 33 commits intogoogleapis:masterfrom
client.dataset() part 1#9032Conversation
sync forks
*.rst + *.py + test + conf +
Methods were divided into 3 files: - add label - get labels - delete labels *.rst - docs updated tests passed successfully
minor corrections, 'dataset_exists' moved to the 'Getting a Dataset' section
grammar fix
Deleted extra 'samples/'
chged quotes + added dots
import updates
Chged assertion unit
corrected the test asserts
* deleted unnecessary schema (add_empty_column) * added 'get_dataset' method to check that dataset actually was updated (label_dataset & delete_dataset_labels) * chged name of the fixture on more suitable (conftest & test_delete_dataset_labels & test_get_dataset_labels & list_datasets_by_label) * deleted unnecessary 'labels' variable (test_label_dataset)
|
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
bigquery/samples/label_dataset.py
Outdated
|
|
||
| dataset = client.update_dataset(dataset, ["labels"]) | ||
|
|
||
| dataset = client.get_dataset(dataset_id) |
There was a problem hiding this comment.
This line was additionally added to check that changes were uploaded to the source.
There was a problem hiding this comment.
update_dataset returns a full dataset resource, so this should be unnecessary in the sample. I agree that it makes sense to have this in the test for the sample.
| dataset = client.get_dataset(dataset_id) |
|
|
||
| dataset = client.update_dataset(dataset, ["labels"]) | ||
|
|
||
| dataset = client.get_dataset(dataset_id) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Let's move this to the sample test. Since update_dataset returns the full resource, get_dataset after update_dataset is unnecessary for normal use.
| dataset = client.get_dataset(dataset_id) |
|
@googlebot I consent |
|
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
tswast
left a comment
There was a problem hiding this comment.
Thank you for your contribution. Looking pretty good, but I'd like to make some improvements to the samples while we are making these changes.
Thank you for your patience regarding the code review.
bigquery/samples/label_dataset.py
Outdated
|
|
||
| dataset = client.update_dataset(dataset, ["labels"]) | ||
|
|
||
| dataset = client.get_dataset(dataset_id) |
There was a problem hiding this comment.
update_dataset returns a full dataset resource, so this should be unnecessary in the sample. I agree that it makes sense to have this in the test for the sample.
| dataset = client.get_dataset(dataset_id) |
bigquery/samples/label_dataset.py
Outdated
| # dataset_id = "your-project.your_dataset" | ||
|
|
||
| dataset = client.get_dataset(dataset_id) | ||
|
|
There was a problem hiding this comment.
This is too much whitespace for a code sample.
bigquery/samples/label_dataset.py
Outdated
| dataset = client.get_dataset(dataset_id) | ||
|
|
||
| dataset.labels = {"color": "green"} | ||
|
|
There was a problem hiding this comment.
I think it makes more sense to group all these lines together.
|
|
||
| dataset = client.update_dataset(dataset, ["labels"]) | ||
|
|
||
| dataset = client.get_dataset(dataset_id) |
There was a problem hiding this comment.
Let's move this to the sample test. Since update_dataset returns the full resource, get_dataset after update_dataset is unnecessary for normal use.
| dataset = client.get_dataset(dataset_id) |
|
|
||
| # Load all rows from a table | ||
| rows = client.list_rows(table) | ||
| if len(list(rows)) == table.num_rows: |
There was a problem hiding this comment.
| if len(list(rows)) == table.num_rows: | |
| # Iterate over rows to make the API requests to fetch row data. | |
| rows = list(rows_iter) |
There was a problem hiding this comment.
I know we had asserts in these samples before, but I've changed my mind regarding these statements. I think this causes more confusion than it explains and should be removed from the sample.
|
|
||
| def test_get_dataset_labels(capsys, client, dataset_id, dataset_with_labels_id): | ||
|
|
||
| get_dataset_labels.get_dataset_labels(client, dataset_id) |
There was a problem hiding this comment.
Shouldn't this be dataset_with_labels_id? The dataset_id fixture is not needed, right?
bigquery/samples/tests/conftest.py
Outdated
|
|
||
|
|
||
| @pytest.fixture | ||
| def table_w_data(client): |
There was a problem hiding this comment.
Let's call this table_with_data
There was a problem hiding this comment.
I changed this fixture to return the table_id instead of the table reference. That's why I assume better name is gonna be table_with_data_id.
bigquery/samples/tests/conftest.py
Outdated
|
|
||
|
|
||
| @pytest.fixture | ||
| def dataset_with_labels_id(client, dataset_id): |
There was a problem hiding this comment.
Optional: Rather than make this fixture, we could combine all the "labels" tests into a common test file. See the models samples for an example.
There was a problem hiding this comment.
I decided to follow your advice and combined all dataset_label tests into one.
|
|
||
| dataset = client.get_dataset(dataset_id) | ||
|
|
||
| print("Dataset ID: {}".format(dataset_id)) |
There was a problem hiding this comment.
I wonder if we really want to be showing how to print the labels every time? It feels redundant.
Let's instead make a call to get_dataset in the tests for this sample and make sure the color label has been removed.
| print("First {} rows of the table are loaded".format(number_of_rows)) | ||
|
|
||
| # Specify selected fields to limit the results to certain columns | ||
| fields = table.schema[:2] # first two columns |
There was a problem hiding this comment.
@tswast Most of the changes are clear to me, but I assume it's gonna be a problem here. Since I removed table = client.get_table(table_id), I can't find the way how to select only 2 columns to insert them into the selected_fields option.
There was a problem hiding this comment.
Makes sense. Yes, we can keep the get_table call for this part of the sample.
bigquery/samples/tests/conftest.py
Outdated
|
|
||
|
|
||
| @pytest.fixture | ||
| def dataset_with_labels_id(client, dataset_id): |
There was a problem hiding this comment.
I decided to follow your advice and combined all dataset_label tests into one.
bigquery/samples/tests/conftest.py
Outdated
|
|
||
|
|
||
| @pytest.fixture | ||
| def table_w_data(client): |
There was a problem hiding this comment.
I changed this fixture to return the table_id instead of the table reference. That's why I assume better name is gonna be table_with_data_id.
tswast
left a comment
There was a problem hiding this comment.
Looking good. Thank you for making those updates. Just a few more suggestions.
| print("First {} rows of the table are loaded".format(number_of_rows)) | ||
|
|
||
| # Specify selected fields to limit the results to certain columns | ||
| fields = table.schema[:2] # first two columns |
There was a problem hiding this comment.
Makes sense. Yes, we can keep the get_table call for this part of the sample.
bigquery/samples/tests/conftest.py
Outdated
| def table_with_data_id(client): | ||
| dataset = client.get_dataset("bigquery-public-data.samples") | ||
| table = dataset.table("shakespeare") | ||
| return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id) |
There was a problem hiding this comment.
| return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id) | |
| return "bigquery-public-data.samples.shakespeare" |
bigquery/samples/tests/conftest.py
Outdated
| @pytest.fixture | ||
| def table_with_data_id(client): | ||
| dataset = client.get_dataset("bigquery-public-data.samples") | ||
| table = dataset.table("shakespeare") |
There was a problem hiding this comment.
| table = dataset.table("shakespeare") |
bigquery/samples/tests/conftest.py
Outdated
|
|
||
| @pytest.fixture | ||
| def table_with_data_id(client): | ||
| dataset = client.get_dataset("bigquery-public-data.samples") |
There was a problem hiding this comment.
The API call to get_dataset is unnecessary.
| dataset = client.get_dataset("bigquery-public-data.samples") |
|
|
||
| if datasets: | ||
| print("Datasets filtered by {}:".format(label_filter)) | ||
| for dataset in datasets: # API request(s) |
There was a problem hiding this comment.
Since you've already called list on the return value, this actually doesn't make any additional API requests.
| for dataset in datasets: # API request(s) | |
| for dataset in datasets: |
* rewrote table_with_data_id fixture * removed extra "# API request"
|
|
||
| dataset = client.update_dataset(dataset, ["labels"]) | ||
| print("Labels deleted from {}".format(dataset_id)) | ||
| # [END bigquery_delete_label_dataset] |
There was a problem hiding this comment.
Let's have this function return dataset after the # [END ...] line, so it's outside the sample. Then update the test for this sample to verify that dataset.labels.get("color") == None.
tswast
left a comment
There was a problem hiding this comment.
Looks great! Thanks for your patience with the review.
You might have to update to master to get the fix for the failing unit tests (due to resumable upload package release).
|
@tswast, all checks are finished with OK, so I think it can be merged |
Deprecate `client.dataset()` part 1
Deprecate `client.dataset()` part 1
Deprecate `client.dataset()` part 1
Towards #8989
This PR contains five snippets: