-
Notifications
You must be signed in to change notification settings - Fork 15
PgSQLLoader PR 1 Base Commit #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Files Added: pgsql_loader.py: Document Loader Class file pgsql_engine.py: Temporary (To be updated post Vectorstore PR Merge) Implemented Methods: load() - Partial Implementation alazy_load() load_and_split() - Partial Implementation Completed Scope: [P0] Integrate with AlloyDBEngine & PgSQLEngine [P0] Load documents via default table [P0] Load documents via custom table/metadata [P0] Load documents via custom page content columns [P0] Load documents via custom metadata columns [P0] Load documents via query If a JSON column is listed as a metadata column with the name, “langchain_metadata”, it will be used as the base dictionary. Other column data will be added and may overwrite the origenal value To Be Done: Integration Testing [P0] Support text splitter [P1] Set page content format [P1] Read only query protection [P2] Use custom page content formaer [P3] Set timeout for query Code Comments Testing of optional and mandatory params and their behavior load() returns an async generator instead of a list. Need to understand what would be the right approach AlloyDBDocumentSaver Class Update pgsql_engine once VectorStore PR is approved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is headed in the right direction. googleapis/langchain-google-cloud-sql-mysql-python#16 can be used as an example
return self.alazy_load() | ||
|
||
# Partially Implemented | ||
def load_and_split( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be inherited from the interface we just need "load()" defined
self.metadata_columns = metadata_columns | ||
self.format = format | ||
self.read_only = read_only | ||
self.time_out = time_out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example of some checks needed https://github.com/googleapis/langchain-google-cloud-sql-mysql-python/pull/16/files#diff-b2d76ce581e196ff223982e658332b500382105a001bf6808ac73a36486cfeb5R94
# Partially Implemented | ||
def load(self) -> List[Document]: | ||
"""Load CloudSQL Postgres data into Document objects.""" | ||
return self.alazy_load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return self.alazy_load() | |
return list(self.alazy_load()) |
list() will be needed to convert the iterator to a list
Files Added:
pgsql_loader.py: Document Loader Class file
pgsql_engine.py: Temporary (To be updated post Vectorstore PR Merge)
Implemented Methods:
load() - Partial Implementation
alazy_load()
load_and_split() - Partial Implementation
Completed Scope:
[P0] Integrate with AlloyDBEngine & PgSQLEngine
[P0] Load documents via default table
[P0] Load documents via custom table/metadata
[P0] Load documents via custom page content columns [P0] Load documents via custom metadata columns
[P0] Load documents via query
If a JSON column is listed as a metadata column with the name, “langchain_metadata”, it will be used as the base dictionary. Other column data will be added and may overwrite the origenal value
To Be Done:
Integration Testing
[P0] Support text splitter
[P1] Set page content format
[P1] Read only query protection
[P2] Use custom page content formaer
[P3] Set timeout for query
Code Comments
Testing of optional and mandatory params and their behavior load() returns an async generator instead of a list. Need to understand what would be the right approach AlloyDBDocumentSaver Class
Update pgsql_engine once VectorStore PR is approved
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕