Add soft-delete support and set dag_version FK on delete to SET NULL #50922

Closed
wants to merge 5 commits into from

Conversation

sunank200
Collaborator

@sunank200 sunank200 commented May 21, 2025

This PR introduces soft-delete for DAG versions and changes the foreign-key behaviour on task_instance.dag_version_id so that deleting a version no longer cascades, but instead sets the reference to NULL.
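For illustration only (this is not code from the PR): a minimal sketch of the two foreign-key behaviours being contrasted, using the standard-library `sqlite3` module. The two-column tables are a simplified, hypothetical stand-in for the real Airflow schema, and the helper name `surviving_task_instances` is an assumption made for the example.

```python
from __future__ import annotations

import sqlite3


def surviving_task_instances(on_delete: str) -> list[tuple[int, int | None]]:
    """Delete a dag_version row and return the task_instance rows left behind."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical, heavily simplified tables (not the real Airflow schema).
    conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY,"
        f" dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO dag_version (id) VALUES (1)")
    conn.execute("INSERT INTO task_instance (id, dag_version_id) VALUES (10, 1)")
    conn.execute("DELETE FROM dag_version WHERE id = 1")
    rows = conn.execute("SELECT id, dag_version_id FROM task_instance").fetchall()
    conn.close()
    return rows


print(surviving_task_instances("CASCADE"))   # the task instance is deleted too
print(surviving_task_instances("SET NULL"))  # the task instance survives with a NULL reference
```

Under `CASCADE` the task instance row disappears together with the version; under `SET NULL` the row is kept and only its `dag_version_id` is cleared.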

closes: #49580


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:db-migrations PRs with DB migration kind:documentation labels May 21, 2025
@sunank200 sunank200 force-pushed the delete-dag-version-records branch 2 times, most recently from 644ce09 to 0020deb Compare May 21, 2025 18:05
@sunank200 sunank200 requested a review from dstandish May 21, 2025 18:05
@sunank200 sunank200 force-pushed the delete-dag-version-records branch 2 times, most recently from 8e07a32 to 14a5876 Compare May 21, 2025 19:15
@sunank200 sunank200 marked this pull request as ready for review May 21, 2025 19:43
@sunank200 sunank200 force-pushed the delete-dag-version-records branch 2 times, most recently from 3ed69fc to e6947f5 Compare May 22, 2025 08:36
@sunank200 sunank200 requested review from uranusjr and Lee-W May 22, 2025 09:10
Member

@Lee-W Lee-W left a comment

Overall looks good to me

@sunank200 sunank200 force-pushed the delete-dag-version-records branch from 445f8d9 to 8d042f3 Compare May 22, 2025 11:09
@uranusjr
Member

This looks good, but I’m not sure we want to remove the cascade deletion. I feel it is fine if we do soft deletion; if the user really wants the record to be actually gone, the ti should be gone too. See discussion in linked issue.

@sunank200
Collaborator Author

sunank200 commented May 23, 2025

soft deletion

@uranusjr I have just added soft deletion and added back cascade deletion. Please re-review.

But now that I think about it, task instances are your single source of truth for what actually ran, when, and with what outcome. If you delete the serialised DAG (or its version), you wipe out every TaskInstance that pointed at it. That means no more execution history or retry state for those runs. Is this the right behaviour for the user?

@sunank200 sunank200 force-pushed the delete-dag-version-records branch from e7196d2 to 54a32bc Compare May 23, 2025 07:28
@sunank200 sunank200 force-pushed the delete-dag-version-records branch from 54a32bc to eeee740 Compare May 23, 2025 07:40
Member

@pierrejeambrun pierrejeambrun left a comment

LGTM,

I don't see any change in the API / UI part, meaning that active=False versions are still retrieved and displayed. I assume we want to propagate that change there too?

I know active is there to mark whether the version was soft deleted or not. But it can be confused with which dag version is active for the dag, i.e. what the current version is. The current dag version is always the latest one, but I find that this could be confusing in a sense. (Or maybe that's just me)

@dstandish dstandish closed this May 23, 2025
@dstandish
Contributor

Just closing this for now to prevent accidental merge. I talked with @sunank200 and we are thinking that the best course of action is to simply change the ondelete behavior to "on delete restrict", because it's always assumed that this value will be populated, and this will simply prevent unintentional deletion of TI rows.
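For illustration only (not code from this PR): a minimal sketch of what "on delete restrict" does, again using the standard-library `sqlite3` module with a simplified, hypothetical two-table schema. The function name `try_delete_version` is an assumption made for the example.

```python
import sqlite3


def try_delete_version() -> tuple[bool, int]:
    """Attempt to delete a referenced dag_version row under ON DELETE RESTRICT."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical, heavily simplified tables (not the real Airflow schema).
    conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY,"
        " dag_version_id INTEGER NOT NULL"
        " REFERENCES dag_version(id) ON DELETE RESTRICT)"
    )
    conn.execute("INSERT INTO dag_version (id) VALUES (1)")
    conn.execute("INSERT INTO task_instance (id, dag_version_id) VALUES (10, 1)")

    try:
        conn.execute("DELETE FROM dag_version WHERE id = 1")
        delete_blocked = False
    except sqlite3.IntegrityError:
        # The database refuses the delete while task instances still reference the row.
        delete_blocked = True

    remaining = conn.execute("SELECT COUNT(*) FROM task_instance").fetchone()[0]
    conn.close()
    return delete_blocked, remaining


print(try_delete_version())
```

The delete is rejected by the database itself, so the task instance row cannot be removed as a side effect of deleting the version it points at.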

@dstandish
Contributor

Other notes: if we don't change the ondelete behavior, then adding soft delete does not actually solve the problem (risk of accidental TI deletion) because it's purely a software concept. So the question of whether we need soft deletes is really a separate issue.

And, I am not aware of any reason we actually want soft deletes or what the point of it would be or what the user interface would be.

So that's why we think, let's just add restrict.
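For illustration only (not code from this PR): a minimal sketch of why a soft-delete flag alone does not protect task instances while the foreign key still cascades. The `active` column mirrors the flag mentioned earlier in the thread; the simplified tables and the function name `soft_then_hard_delete` are hypothetical.

```python
import sqlite3


def soft_then_hard_delete() -> tuple[int, int]:
    """Return task_instance counts after a soft delete and after a hard delete."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical, heavily simplified tables (not the real Airflow schema).
    conn.execute(
        "CREATE TABLE dag_version ("
        " id INTEGER PRIMARY KEY,"
        " active INTEGER NOT NULL DEFAULT 1)"
    )
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY,"
        " dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO dag_version (id) VALUES (1)")
    conn.execute("INSERT INTO task_instance (id, dag_version_id) VALUES (10, 1)")

    count_sql = "SELECT COUNT(*) FROM task_instance"

    # Soft delete: only a flag changes, so nothing cascades.
    conn.execute("UPDATE dag_version SET active = 0 WHERE id = 1")
    after_soft = conn.execute(count_sql).fetchone()[0]

    # Any code path that still issues a real DELETE bypasses the flag entirely.
    conn.execute("DELETE FROM dag_version WHERE id = 1")
    after_hard = conn.execute(count_sql).fetchone()[0]

    conn.close()
    return after_soft, after_hard


print(soft_then_hard_delete())
```

The flag is only honoured by application code that chooses to check it; the database constraint is what decides whether a real `DELETE` takes the task instances with it.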

Labels
area:API Airflow's REST/HTTP API area:db-migrations PRs with DB migration kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleting dag version record deletes task instance record
5 participants