Skip to content

feat: BigtableLoader and BigtableSaver implementation#5

Merged
ron-gal merged 18 commits intomainfrom
loader
Feb 7, 2024
Merged

feat: BigtableLoader and BigtableSaver implementation#5
ron-gal merged 18 commits intomainfrom
loader

Conversation

@ron-gal
Copy link
Contributor

@ron-gal ron-gal commented Jan 23, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@ron-gal ron-gal requested a review from a team January 23, 2024 18:38
@ron-gal ron-gal changed the title Loader feat: BigtableLoader and BigtableSaver implementation Jan 23, 2024
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/langchain-google-bigtable-python API. label Feb 6, 2024
"## Pre-reqs"
"## Setting up\n",
"\n",
"To run this notebook, you will need a Google Cloud Project, a Bigtable instance, and Google credentials."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should include link to instructions at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"metadata": {},
"source": [
"### Load from table"
"You can fetch the documents by calling the `load` method of the loader. It will return a list with all the documents. If you want to avoid this blocking call, you can call `lazy_load` method that returns an Iterator."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should encourage lazy_load -- load is only around for legacy reasons IIUC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 107 to 110
"docs_iterator = loader.lazy_load()\n",
"for doc in docs_iterator:\n",
" print(doc)\n",
" break"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably just be: for doc in loader.lazy_load()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kurtisvg kurtisvg self-assigned this Feb 7, 2024
@kurtisvg kurtisvg self-requested a review February 7, 2024 16:09
Comment on lines 48 to 63
class MetadataMapping:
def __init__(
self,
column_family: str,
column_name: str,
metadata_key: str,
encoding: Encoding,
custom_encoding_func: Callable[[Any], bytes] = _not_implemented,
custom_decoding_func: Callable[[bytes], Any] = _not_implemented,
):
self.column_family = column_family
self.column_name = column_name
self.metadata_key = metadata_key
self.encoding = encoding
self.custom_encoding_func = custom_encoding_func
self.custom_decoding_func = custom_decoding_func
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we just use a dataclass here? https://docs.python.org/3/library/dataclasses.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


from langchain_google_bigtable.document_loader import BigtableLoader, BigtableSaver

__all__ = ["BigtableLoader", "BigtableSaver"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expose Encoding + Metadata mapping here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

metadata_mappings: Optional. The array of mappings that maps from Bigtable columns to keys on the metadata dictionary, including the encoding to use when mapping from Bigtable bytes to a python type.
"""
self.client = (
(client or bigtable.Client(admin=True))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be using the default client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ron-gal ron-gal requested a review from kurtisvg February 7, 2024 16:31
.table(table_id)
)
if content_encoding not in SUPPORTED_DOC_ENCODING:
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
raise ValueError(f"{content_encoding} not in {SUPPORTED_DOC_ENCODING}")
raise ValueError(f"content_encoding '{content_encoding}' not supported for content (must be {SUPPORTED_DOC_ENCODING}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in follow up PR

@ron-gal ron-gal merged commit c0f4244 into main Feb 7, 2024
@ron-gal ron-gal deleted the loader branch February 7, 2024 16:48
@release-please release-please bot mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the googleapis/langchain-google-bigtable-python API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants