Clear out the dag code and serialized_dag tables on 3.0 upgrade #49563

Merged

Conversation

@dstandish (Contributor) commented Apr 22, 2025

What this is about

This will discard the v1 serdags and let them be reserialized after the new dag processor starts up.

Rather than go through the trouble of migrating the data for serialized dag and dag code, we can simply delete it and let it be regenerated after upgrade / downgrade.

Why does this make sense?

Prior to airflow version 3, both serialized_dag and dag_code would have been deleted every time the dag was reprocessed. So it was always ephemeral in 2.x. And we typically ran `airflow dags reserialize` on upgrade.

So this is just deleting it one more time and reserializing it one more time on the way to 3.0, after which we don't delete everything with each run of the dag processor.

There's little value in migrating the data when it can just be regenerated.

Similarly, when going back down to airflow 2.x from 3.0, rather than migrating the data, just delete it, because it will be regenerated in 2.x and the PKs don't allow more than one version anyway.
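To make the approach concrete, here is a minimal sketch of what this kind of cleanup could look like inside an Alembic migration. This is an illustration only, not the actual migration file changed by this PR, and the revision identifiers are placeholders.

```python
# Illustration only -- not the actual Airflow migration modified in this PR.
# Revision identifiers are placeholders.
from alembic import op

revision = "xxxxxxxxxxxx"
down_revision = "yyyyyyyyyyyy"


def upgrade():
    # Drop the 2.x serialized data; the 3.0 dag processor (or
    # `airflow dags reserialize`) regenerates it after the upgrade.
    op.execute("DELETE FROM dag_code")
    op.execute("DELETE FROM serialized_dag")
    # ...the rest of the schema changes for dag versioning would follow here...


def downgrade():
    # Same idea going back to 2.x: delete rather than migrate. 2.x reserializes
    # on the next parse, and its PKs only allow one version per dag anyway.
    op.execute("DELETE FROM serialized_dag")
    op.execute("DELETE FROM dag_code")
```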

An important note

Immediately after upgrade, when the 3.0 api server (née webserver) is up, if the dags have not been reserialized, e.g. by running `airflow dags reserialize` or letting the dag processor hit them, then the dags will all be visible on the home page. But when you click into an individual dag, you won't see the task history until the dag is reprocessed and the serdag recreated. We could bubble up a message about this like "No serdag... wait until your dag has been reprocessed" or something. Or we could just leave it as is and document it.

@dstandish dstandish force-pushed the simplify-migration-re-dag-versions-etc branch from 3d3261d to f0284c5 on April 22, 2025 16:29
@dstandish dstandish marked this pull request as ready for review April 22, 2025 19:10
@dstandish dstandish requested a review from ephraimbuddy as a code owner April 22, 2025 19:10
@dstandish dstandish force-pushed the simplify-migration-re-dag-versions-etc branch 2 times, most recently from 2e4808c to 9e58bc2 on April 22, 2025 20:47
@tirkarthi (Contributor) commented:

With 3.0.0 released, should all migrations target 3.0.1, and should this change be done as a separate migration file?

@dstandish (Contributor, Author) commented Apr 23, 2025

> With 3.0.0 released, should all migrations target 3.0.1, and should this change be done as a separate migration file?

I don't think so. The way to think about this is that it just "fixes" the existing migration. We want to clear the data out before this migration runs, and there's no way to do that besides modifying the migration -- or inserting a new migration before it, but I don't think that makes sense.

And yes, we could delete the data after, by inserting another migration after this one -- but then we'd still be susceptible to finding new bugs in this particular migration (since it would run prior to the data deletion).

Indeed, this isn't even the first change to this migration that will come in 3.0.1 -- see #49478.

Additionally....

If a user has already upgraded to 3.0.0, we would not want to delete their serdags -- which is what would happen if we added a new migration file for 3.0.1.

The end result here is that users who upgraded to 3.0.0 from 2.x will have the migrated data, but users who (perhaps prudently) wait for a patch release or two would get the truncate-and-reserialize behavior that this PR enacts.

@kaxil kaxil added this to the Airflow 3.0.1 milestone Apr 24, 2025
@dstandish dstandish force-pushed the simplify-migration-re-dag-versions-etc branch from 9e58bc2 to 2a76028 on April 24, 2025 17:28
@dstandish dstandish merged commit c7e5406 into apache:main Apr 24, 2025
51 checks passed
@dstandish dstandish deleted the simplify-migration-re-dag-versions-etc branch April 24, 2025 18:01
kaxil pushed a commit that referenced this pull request Apr 24, 2025
(cherry picked from commit c7e5406)
prabhusneha pushed a commit to astronomer/airflow that referenced this pull request Apr 25, 2025
jroachgolf84 pushed a commit to jroachgolf84/airflow that referenced this pull request Apr 30, 2025
@ldacey (Contributor) commented May 3, 2025

Is there a way to fix this manually?

`airflow db migrate` was failing (no dag_id in the dag_code table). I ended up deleting rows in the dag_code and serialized_dag tables, but now my DAG processor has these errors:

    latest_ser_dag._data = new_serialized_dag._data
    ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute '_data'

I tried entering the dag processor pod and running `airflow dags reserialize`, but no luck.

@kaxil (Member) commented May 7, 2025

> Is there a way to fix this manually?
>
> `airflow db migrate` was failing (no dag_id in the dag_code table). I ended up deleting rows in the dag_code and serialized_dag tables, but now my DAG processor has these errors:
>
>     latest_ser_dag._data = new_serialized_dag._data
>     ^^^^^^^^^^^^^^^^^^^^
>     AttributeError: 'NoneType' object has no attribute '_data'
>
> I tried entering the dag processor pod and running `airflow dags reserialize`, but no luck.

If you have access to the DB, truncate the dag_code and serialized_dag tables.
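For anyone in the same state, a minimal sketch of that manual cleanup, assuming direct SQLAlchemy access to the metadata database; the connection URL is a placeholder, and you could equally run the equivalent SQL in your own client.

```python
# Manual cleanup sketch -- the connection URL is a placeholder for your metadata DB.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://airflow:PASSWORD@localhost:5432/airflow")

with engine.begin() as conn:
    # Clear both tables; the dag processor (or `airflow dags reserialize`)
    # repopulates them on the next parse.
    conn.execute(text("DELETE FROM dag_code"))
    conn.execute(text("DELETE FROM serialized_dag"))
```

This is the same effect as the TRUNCATE suggested above, just written as plain DELETEs to stay portable across databases.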
