Add delete_by_property method in weaviate hook #50735
Add a DAG test that deletes the objects properly. Test Scenarios
Overall, it looks good. Left a few nits.
providers/weaviate/src/airflow/providers/weaviate/hooks/weaviate.py
    self.log.error(e)
    failed_collection_list.append(collection_name)
elif if_error == "stop":
    raise e
Should we guard other cases? (It normally won't happen, but could be a typo or so)
Agreed. I am actually thinking about adding one more generic Exception.
But I feel this can result in the logic in the except block being duplicated. Not sure if there is a good way to implement it.
We don't really need elif if_error == "stop" here. We could probably try
else:
    raise e
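The typo concern raised earlier in this thread could also be handled by validating `if_error` before any deletion starts, so that a misspelled value fails fast instead of falling through to a branch. The following is a minimal sketch, assuming the only accepted values are "continue" and "stop" as in the snippet under review; `validate_if_error` and `VALID_IF_ERROR` are hypothetical names, not part of the hook.

```python
# Hypothetical constant: the two values used in the snippet under review.
VALID_IF_ERROR = ("continue", "stop")


def validate_if_error(if_error: str) -> str:
    """Return if_error unchanged, or raise ValueError for an unknown value."""
    if if_error not in VALID_IF_ERROR:
        raise ValueError(
            f"if_error must be one of {VALID_IF_ERROR}, got {if_error!r}"
        )
    return if_error
```

With a check like this at the top of the method, the plain `else: raise e` branch only ever runs for "stop".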
I am facing a trade-off here. I could define a function, but I am wondering whether that is necessary or whether there is a better way to keep the code DRY.
Option 1
except (
    weaviate.exceptions.UnexpectedStatusCodeException,
    weaviate.exceptions.WeaviateDeleteManyError,
    Exception  # capture generic exception, but the above two can also be captured by this, keeping for better readability
) as e:
    if if_error == "continue":
        self.log.error(e)
        failed_collection_list.append(collection_name)
    else:
        raise e
Option 2
except (
    weaviate.exceptions.UnexpectedStatusCodeException,
    weaviate.exceptions.WeaviateDeleteManyError
) as e:
    if if_error == "continue":
        self.log.error(e)
        failed_collection_list.append(collection_name)
    else:
        raise e
except Exception as e:  # use another except block, but the code doesn't follow DRY
    if if_error == "continue":
        self.log.error(e)
        failed_collection_list.append(collection_name)
    else:
        raise e
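One possible way to keep the two-block shape of Option 2 without duplicating the branch is to move it into a small helper. This is only a sketch under stated assumptions: the two exception classes are local stand-ins so the code runs without the Weaviate client, and `_handle_delete_error`, `delete_from_collections`, and `delete_fn` are hypothetical names, not the hook's actual API.

```python
import logging

log = logging.getLogger(__name__)


class UnexpectedStatusCodeException(Exception):
    """Stand-in for weaviate.exceptions.UnexpectedStatusCodeException."""


class WeaviateDeleteManyError(Exception):
    """Stand-in for weaviate.exceptions.WeaviateDeleteManyError."""


def _handle_delete_error(exc, collection_name, if_error, failed_collection_list):
    """Hypothetical helper: record the failure on "continue", otherwise re-raise."""
    if if_error == "continue":
        log.error(exc)
        failed_collection_list.append(collection_name)
    else:
        raise exc


def delete_from_collections(collection_names, delete_fn, if_error="stop"):
    """Call delete_fn for each collection and return the names that failed."""
    failed_collection_list = []
    for collection_name in collection_names:
        try:
            delete_fn(collection_name)
        except (UnexpectedStatusCodeException, WeaviateDeleteManyError) as e:
            _handle_delete_error(e, collection_name, if_error, failed_collection_list)
        except Exception as e:  # broad fallback, mirroring Option 2
            _handle_delete_error(e, collection_name, if_error, failed_collection_list)
    return failed_collection_list
```

If the broad `except Exception` block is dropped, the helper becomes unnecessary and the single `except` block from the original snippet is already DRY.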
Is there any known exception other than those 2?
Actually, I haven’t observed any failures while running the DAG tests, probably because the number of objects is small when the filter is defined properly. I found the two exceptions in the client API documentation: one is related to the connection and the other to the delete_many operation. 🤔 I think I can define some invalid filters to see what errors are raised, such as a filter on a property that doesn’t exist.
Sounds good. We should avoid using Exception and AirflowException unless necessary.
I ran multiple tests with invalid filters, and WeaviateDeleteManyError is captured. Basically, there are two main operations in the method. The first one is to get the collection.
Scenarios
- Collection not found: WeaviateDeleteManyError is captured, e.g. could not find class Collection_c in schema.
- Connection issue: I think this should be captured by UnexpectedStatusCodeException.
- Invalid filter: WeaviateDeleteManyError is captured, e.g. no such prop with name 'label' found in class.
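The two observed failure scenarios can be reproduced in isolation with a small simulation. This is a sketch only: `SCHEMA`, `fake_delete_by_property`, and `run_scenarios` are hypothetical, the exception class is a local stand-in for the Weaviate one, and the error messages simply echo the ones reported above rather than coming from a real client.

```python
class WeaviateDeleteManyError(Exception):
    """Stand-in for weaviate.exceptions.WeaviateDeleteManyError."""


# Hypothetical in-memory schema: collection name -> known property names.
SCHEMA = {"Collection_a": {"name"}, "Collection_b": {"name"}}


def fake_delete_by_property(collection_name, prop):
    """Simulate the two reported failure modes; not real client behavior."""
    if collection_name not in SCHEMA:
        raise WeaviateDeleteManyError(
            f"could not find class {collection_name} in schema"
        )
    if prop not in SCHEMA[collection_name]:
        raise WeaviateDeleteManyError(
            f"no such prop with name '{prop}' found in class"
        )


def run_scenarios(cases):
    """Map each (collection, prop) case to its error message, or None on success."""
    results = {}
    for collection_name, prop in cases:
        try:
            fake_delete_by_property(collection_name, prop)
            results[(collection_name, prop)] = None
        except WeaviateDeleteManyError as e:
            results[(collection_name, prop)] = str(e)
    return results
```

The connection scenario is left out because it was not observed in the tests described above.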
Motivation
By adding the delete_by_property method, users can delete multiple objects in multiple collections based on various filtering criteria, making data management more targeted and efficient.
Close #42565
Reference
https://weaviate.io/developers/weaviate/search/filters
https://weaviate.io/developers/weaviate/manage-data/delete#delete-multiple-objects
Testing
Testing Code Sample