Clear out the dag code and serialized_dag tables on 3.0 upgrade #49563
airflow-core/src/airflow/migrations/versions/0047_3_0_0_add_dag_versioning.py
With 3.0.0 released, should all migrations target 3.0.1, and should this change be done as a separate migration file?
I don't think so. The way I think about it, this is just "fixing" the existing migration. We want to clear the data out before this migration runs, and the only ways to do that are to modify the migration itself or to insert a new migration before it, which I don't think makes sense. We could instead delete the data afterwards by inserting another migration after this one, but then we'd still be susceptible to new bugs in this particular migration, since it would run before the data deletion. Indeed, this isn't even the first change to this migration that will come in 3.0.1; see #49478.
Additionally, if a user has already upgraded to 3.0.0, we would not want to delete their serdags, which is what would happen if we added a new migration file for 3.0.1.
The end result is that users who upgraded from 2.x to 3.0.0 will keep the migrated data, while users who (perhaps prudently) wait for a patch release or two will get the truncate-and-reserialize behavior that this PR enacts.
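The trade-off in this reply can be illustrated with a toy migration runner. This is only a sketch of the reasoning, not Alembic and not the actual migration code: the `migrate` helper, the step functions, and the revision labels `0046` and `0048` are hypothetical, and the "database" is a plain dict.

```python
from copy import deepcopy

def migrate(db, chain):
    """Toy runner: apply every step that comes after db['revision']."""
    revisions = [rev for rev, _ in chain]
    start = revisions.index(db["revision"]) + 1
    for rev, step in chain[start:]:
        step(db)
        db["revision"] = rev
    return db

def convert_rows(db):
    # Stand-in for migrating the existing rows in place.
    db["serialized_dag"] = [f"migrated:{row}" for row in db["serialized_dag"]]

def clear_rows(db):
    # Stand-in for this PR: drop the rows and let them be regenerated.
    db["serialized_dag"] = []

def clear_then_convert(db):
    clear_rows(db)
    convert_rows(db)

# Option A: fix the existing 0047 migration. Option B: add a later file.
fixed_in_place = [("0046", None), ("0047", clear_then_convert)]
separate_file = [("0046", None), ("0047", convert_rows), ("0048", clear_rows)]

from_2x = {"revision": "0046", "serialized_dag": ["v1"]}
on_3_0_0 = {"revision": "0047", "serialized_dag": ["migrated:v1"]}

print(migrate(deepcopy(from_2x), fixed_in_place)["serialized_dag"])   # []
print(migrate(deepcopy(on_3_0_0), fixed_in_place)["serialized_dag"])  # ['migrated:v1']
print(migrate(deepcopy(on_3_0_0), separate_file)["serialized_dag"])   # []
```

In this model, editing the existing migration leaves a database that is already past it untouched, while a separate later migration would also clear data for users who are already on 3.0.0.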
The change was cherry-picked from commit c7e5406 with the commit message unchanged.
Is there a way to fix this manually? `airflow db migrate` was failing (no dag_id in the dag_code table). I ended up deleting the rows in the dag_code and serialized_dag tables, but now my DAG processor has these errors:
I tried entering the dag processor pod and running
If you have access to DB, truncate
What this is about
This will discard the v1 serdags and let them be reserialized after the new dag processor starts up.
Rather than go through the trouble of migrating the data for serialized dag and dag code, we can simply delete it and let it be regenerated after upgrade / downgrade.
Why does this make sense?
Prior to Airflow 3, both serialized_dag and dag_code would have been deleted every time the dag was reprocessed, so the data was always ephemeral in 2.x. And we typically ran `airflow dags reserialize` on upgrade.
So this is just deleting it one more time and reserializing it one more time on the way to 3.0, after which we don't delete everything with each run of the dag processor.
There's little value in migrating the data when it can just be regenerated.
Similarly, when going back down to Airflow 2.x from 3.0, rather than migrating the data, we just delete it, because it will be regenerated in 2.x and the PKs don't allow more than one version anyway.
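A minimal sketch of the delete-and-regenerate idea, using an in-memory SQLite database. It is not the migration code from this PR: the schemas are simplified assumptions (only the table names `serialized_dag` and `dag_code` come from the description above), and `clear_regenerable_tables` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-ins for the 2.x tables; the real columns differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
conn.execute("CREATE TABLE dag_code (fileloc_hash INTEGER PRIMARY KEY, source_code TEXT)")
conn.execute("INSERT INTO serialized_dag VALUES ('example_dag', 'v1 payload')")
conn.execute("INSERT INTO dag_code VALUES (1, 'print(1)')")

# With dag_id as the primary key, a second version of the same dag is rejected.
try:
    conn.execute("INSERT INTO serialized_dag VALUES ('example_dag', 'v2 payload')")
    second_version_allowed = True
except sqlite3.IntegrityError:
    second_version_allowed = False

def clear_regenerable_tables(connection):
    """Delete every row; the dag processor would repopulate the tables later."""
    for table in ("serialized_dag", "dag_code"):
        # SQLite has no TRUNCATE statement, so DELETE stands in for it here.
        connection.execute(f"DELETE FROM {table}")
    connection.commit()

clear_regenerable_tables(conn)
remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("serialized_dag", "dag_code")
}
print(second_version_allowed, remaining)
```

The primary-key check mirrors the point about downgrades: a table keyed on one row per dag cannot hold several versions, so there is nothing to gain from trying to carry them back down.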
An important note
Immediately after upgrade, when the 3.0 api server (formerly the webserver) is up, the dags will all be visible on the home page even if they have not yet been reserialized, e.g. by running `airflow dags reserialize` or by letting the dag processor hit them. But when you click into an individual dag, you won't see the task history until the dag is reprocessed and the serdag recreated. We could bubble up a message about this, like "No serdag... wait until your dag has been reprocessed" or something. Or we could just leave it as is and document that.
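To make the intermediate state concrete, here is a rough sketch of how one might list dags that are known but not yet reserialized. The two-table schema is a simplified assumption and `dags_awaiting_reserialization` is a hypothetical helper, not something Airflow provides.

```python
import sqlite3

# Simplified, assumed schema: one table of known dags, one of serialized dags.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT, data TEXT)")
conn.executemany("INSERT INTO dag VALUES (?)", [("dag_a",), ("dag_b",)])
# Only dag_a has been reprocessed since the upgrade.
conn.execute("INSERT INTO serialized_dag VALUES ('dag_a', 'payload')")

def dags_awaiting_reserialization(connection):
    """Return dag_ids that have a dag row but no serialized_dag row."""
    rows = connection.execute(
        """
        SELECT d.dag_id
        FROM dag AS d
        LEFT JOIN serialized_dag AS s ON s.dag_id = d.dag_id
        WHERE s.dag_id IS NULL
        ORDER BY d.dag_id
        """
    ).fetchall()
    return [dag_id for (dag_id,) in rows]

for dag_id in dags_awaiting_reserialization(conn):
    print(f"No serdag for {dag_id}; wait until the dag has been reprocessed")
```

A check along these lines could back either option mentioned above: a message in the UI, or a documented query for operators to run after upgrading.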