Add DatasetDagRunQueue to all the consuming DAGs of a dataset alias #41264
Merged
phanikumv
merged 12 commits into
apache:main
from
astronomer:shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created
Aug 7, 2024
uranusjr
reviewed
Aug 6, 2024
…e object into queue_dagruns
…ink to a dag that was not previously link to the resolved dataset
…ding to the new feature added
phanikumv
approved these changes
Aug 7, 2024
Lee-W
added a commit
to astronomer/airflow
that referenced
this pull request
Oct 28, 2024
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning.

Example: https://github.com/apache/airflow/actions/runs/11526187767/job/32090094094?pr=43243#step:6:745

When running a migration you see this. It was missed in apache#41264:

```
/opt/airflow/airflow/migrations/versions/0026_2_10_0_dag_schedule_dataset_alias_reference.py:46 SAWarning: Table 'dag_schedule_dataset_alias_reference' specifies columns 'dag_id' as primary_key=True, not matching locally specified columns 'alias_id', 'dag_id'; setting the current primary key columns to 'alias_id', 'dag_id'. This warning may become an exception in a future release
```

It is already a primary key in models: https://github.com/apache/airflow/blob/e9192f5db32e453f150c73ad31287d4953e3c43d/airflow/models/asset.py#L290-L291

Kaxil verified that both columns are marked as primary keys already in 2.10.
romsharon98
pushed a commit
that referenced
this pull request
Oct 28, 2024
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning. (#43425) Example: https://github.com/apache/airflow/actions/runs/11526187767/job/32090094094?pr=43243#step:6:745 When running a migration you see this. It was missed in #41264: ``` /opt/airflow/airflow/migrations/versions/0026_2_10_0_dag_schedule_dataset_alias_reference.py:46 SAWarning: Table 'dag_schedule_dataset_alias_reference' specifies columns 'dag_id' as primary_key=True, not matching locally specified columns 'alias_id', 'dag_id'; setting the current primary key columns to 'alias_id', 'dag_id'. This warning may become an exception in a future release ``` It is already a primary key in models: https://github.com/apache/airflow/blob/e9192f5db32e453f150c73ad31287d4953e3c43d/airflow/models/asset.py#L290-L291 Kaxil verified that both columns are marked as primary keys already in 2.10.
utkarsharma2
pushed a commit
that referenced
this pull request
Nov 1, 2024
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning. (#43425) Example: https://github.com/apache/airflow/actions/runs/11526187767/job/32090094094?pr=43243#step:6:745 When running a migration you see this. It was missed in #41264: ``` /opt/airflow/airflow/migrations/versions/0026_2_10_0_dag_schedule_dataset_alias_reference.py:46 SAWarning: Table 'dag_schedule_dataset_alias_reference' specifies columns 'dag_id' as primary_key=True, not matching locally specified columns 'alias_id', 'dag_id'; setting the current primary key columns to 'alias_id', 'dag_id'. This warning may become an exception in a future release ``` It is already a primary key in models: https://github.com/apache/airflow/blob/e9192f5db32e453f150c73ad31287d4953e3c43d/airflow/models/asset.py#L290-L291 Kaxil verified that both columns are marked as primary keys already in 2.10. (cherry picked from commit 72947cb)
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
May 6, 2025
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning. (#43425) Example: https://github.com/apache/airflow/actions/runs/11526187767/job/32090094094?pr=43243#step:6:745 When running a migration you see this. It was missed in apache/airflow#41264: ``` /opt/airflow/airflow/migrations/versions/0026_2_10_0_dag_schedule_dataset_alias_reference.py:46 SAWarning: Table 'dag_schedule_dataset_alias_reference' specifies columns 'dag_id' as primary_key=True, not matching locally specified columns 'alias_id', 'dag_id'; setting the current primary key columns to 'alias_id', 'dag_id'. This warning may become an exception in a future release ``` It is already a primary key in models: https://github.com/apache/airflow/blob/e9192f5db32e453f150c73ad31287d4953e3c43d/airflow/models/asset.py#L290-L291 Kaxil verified that both columns are marked as primary keys already in 2.10. (cherry picked from commit 72947cb09854f00d347b7c9cd09eb9ac89c75480) GitOrigin-RevId: dd296c5338150cebe24c1edb46ba5a944f82a5eb
Why
Ever since #40693, we have been able to schedule a DAG based on DatasetAlias. When a dataset alias is resolved in a producer DAG for the first time, a consumer DAG that depends on that dataset alias will have to wait for the next round of DAG parsing to realize its dependency on the resolved datasets. Consequently, the consumer DAG will need to wait for the second run of the producer DAG to be triggered.
What
This PR also creates DatasetDagRunQueue (DDRQ) records for the DAGs that consume a dataset alias. Once a consumer DAG has been updated by the next round of DAG parsing, it already has a DDRQ record that may trigger it.
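The difference can be illustrated with a toy model. This is only a sketch of the idea, not Airflow's actual implementation: `ToyScheduler`, `register_event`, and the `include_alias_consumers` flag are hypothetical names, and plain dictionaries and sets stand in for the metadata tables.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy stand-in for the scheduling metadata; not Airflow's real models."""

    # dataset URI -> DAG ids known (from the last parse) to consume that dataset
    dataset_consumers: dict[str, set[str]] = field(default_factory=dict)
    # alias name -> DAG ids scheduled on that alias
    alias_consumers: dict[str, set[str]] = field(default_factory=dict)
    # (dataset URI, DAG id) pairs, playing the role of DatasetDagRunQueue rows
    queue: set[tuple[str, str]] = field(default_factory=set)

    def register_event(
        self,
        uri: str,
        source_aliases: Iterable[str] = (),
        include_alias_consumers: bool = True,
    ) -> set[tuple[str, str]]:
        """Queue consumers of `uri`; optionally also consumers of its aliases."""
        targets = set(self.dataset_consumers.get(uri, set()))
        if include_alias_consumers:
            # The behavior described above: DAGs scheduled on the alias get a
            # queue row even though parsing has not yet linked them to `uri`.
            for alias in source_aliases:
                targets |= self.alias_consumers.get(alias, set())
        new_rows = {(uri, dag_id) for dag_id in targets}
        self.queue |= new_rows
        return new_rows


def first_producer_run(include_alias_consumers: bool) -> set[tuple[str, str]]:
    """Simulate the first run of a producer that resolves an alias to a dataset."""
    sched = ToyScheduler(alias_consumers={"my-alias": {"consumer_dag"}})
    sched.register_event("s3://bucket/file", ["my-alias"], include_alias_consumers)
    return sched.queue
```

With `include_alias_consumers=False` the queue stays empty after the first producer run, which corresponds to the consumer having to wait for a second producer run. With the flag set to `True`, a row for `consumer_dag` exists as soon as the alias is resolved.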