
Add DatasetDagRunQueue to all the consuming DAGs of a dataset alias #41264


Lee-W
Member

@Lee-W Lee-W commented Aug 5, 2024

Why

Since #40693, a DAG can be scheduled on a DatasetAlias. However, when a producer DAG resolves a dataset alias for the first time, a consumer DAG that depends on that alias only learns of its dependency on the resolved datasets at the next round of DAG parsing. Consequently, the consumer DAG is not triggered until the producer DAG runs a second time.

What

This PR creates DatasetDagRunQueue (DDRQ) records for the consuming DAGs of a dataset alias as well. Once DAG parsing has updated the consumer DAG, the DDRQ record already exists and may trigger it.
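
To make the timing issue concrete, here is a toy in-memory model of the flow described above. It is only a sketch and not Airflow's scheduler code: `ToyScheduler`, its fields, and `runs_until_triggered` are hypothetical names, and the model assumes DAG parsing happens once after each producer run.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for scheduler state; not Airflow's real data model."""

    # True models the behaviour described in this PR, False the earlier behaviour.
    queue_alias_consumers: bool
    # alias name -> DAGs scheduled on that alias
    alias_consumers: dict[str, set[str]] = field(default_factory=dict)
    # dataset URI -> DAGs already known (after parsing) to depend on it
    dataset_consumers: dict[str, set[str]] = field(default_factory=dict)
    # alias name -> dataset URIs the alias has been resolved to
    alias_datasets: dict[str, set[str]] = field(default_factory=dict)
    # (dataset URI, consumer DAG id) pairs, standing in for DDRQ records
    ddrq: set[tuple[str, str]] = field(default_factory=set)

    def producer_run(self, alias: str, dataset: str) -> None:
        """A producer task resolves `alias` to `dataset` and emits a dataset event."""
        self.alias_datasets.setdefault(alias, set()).add(dataset)
        targets = set(self.dataset_consumers.get(dataset, set()))
        if self.queue_alias_consumers:
            # Also queue the DAGs that consume the alias itself.
            targets |= self.alias_consumers.get(alias, set())
        self.ddrq |= {(dataset, dag_id) for dag_id in targets}

    def parse_dags(self) -> None:
        """DAG parsing: alias consumers learn which datasets the alias resolved to."""
        for alias, dag_ids in self.alias_consumers.items():
            for dataset in self.alias_datasets.get(alias, set()):
                self.dataset_consumers.setdefault(dataset, set()).update(dag_ids)

    def triggerable(self) -> set[str]:
        """DAGs with a queue record for every dataset they are known to depend on."""
        dag_ids = {d for consumers in self.dataset_consumers.values() for d in consumers}
        ready = set()
        for dag_id in dag_ids:
            needed = {ds for ds, consumers in self.dataset_consumers.items() if dag_id in consumers}
            if all((ds, dag_id) in self.ddrq for ds in needed):
                ready.add(dag_id)
        return ready


def runs_until_triggered(queue_alias_consumers: bool) -> int:
    """Count producer runs needed before the consumer becomes triggerable (-1 if never)."""
    scheduler = ToyScheduler(
        queue_alias_consumers,
        alias_consumers={"my-alias": {"consumer"}},
    )
    for run in range(1, 4):
        scheduler.producer_run("my-alias", "s3://bucket/data")
        scheduler.parse_dags()
        if "consumer" in scheduler.triggerable():
            return run
    return -1
```

In this model, `runs_until_triggered(False)` returns 2 (the consumer has to wait for the producer's second run), while `runs_until_triggered(True)` returns 1 because the queue record is created while the alias is being resolved.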



@boring-cyborg boring-cyborg bot added the area:db-migrations PRs with DB migration label Aug 5, 2024
@Lee-W Lee-W force-pushed the shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created branch 3 times, most recently from c32fc46 to c70f670 Compare August 6, 2024 02:55
@Lee-W Lee-W changed the title feat: create dag_schedule_dataset_alias_reference Add DatasetDagRunQueue to all the consuming DAGs of a dataset alias Aug 6, 2024
@Lee-W Lee-W force-pushed the shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created branch 2 times, most recently from 87bf7d3 to 2aa49d9 Compare August 6, 2024 07:37
@Lee-W Lee-W marked this pull request as ready for review August 6, 2024 07:37
@Lee-W Lee-W force-pushed the shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created branch 3 times, most recently from 1a21af9 to f3375a2 Compare August 6, 2024 09:46
@phanikumv phanikumv added this to the Airflow 2.10.0 milestone Aug 6, 2024
@Lee-W Lee-W force-pushed the shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created branch from bc9b269 to 5e27790 Compare August 6, 2024 13:13
@phanikumv phanikumv merged commit c8bc42c into apache:main Aug 7, 2024
81 checks passed
@phanikumv phanikumv deleted the shift-dataset-aliases-in-dag-schedule-resolution-before-dataset-event-is-created branch August 7, 2024 02:15
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) type:new-feature Changelog: New Features and removed changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) labels Aug 9, 2024
Lee-W added a commit to astronomer/airflow that referenced this pull request Oct 28, 2024
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning.

Example: https://github.com/apache/airflow/actions/runs/11526187767/job/32090094094?pr=43243#step:6:745

When running a migration you see this. It was missed in apache#41264:
```
/opt/airflow/airflow/migrations/versions/0026_2_10_0_dag_schedule_dataset_alias_reference.py:46 SAWarning:
Table 'dag_schedule_dataset_alias_reference' specifies columns 'dag_id' as primary_key=True, not matching locally specified columns 'alias_id', 'dag_id'; setting the current primary key columns to 'alias_id', 'dag_id'.
This warning may become an exception in a future release
```

It is already a primary key in models:

https://github.com/apache/airflow/blob/e9192f5db32e453f150c73ad31287d4953e3c43d/airflow/models/asset.py#L290-L291

Kaxil verified that both columns are marked as primary keys already in 2.10.
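
As a rough illustration of why the two key definitions differ in meaning, the sketch below uses the standard-library `sqlite3` module with two simplified, hypothetical tables (`ref_dag_only` and `ref_composite`). They model only the two key columns, not the real `dag_schedule_dataset_alias_reference` schema, and the example assumes a single DAG may reference more than one alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table keyed on dag_id alone.
conn.execute(
    "CREATE TABLE ref_dag_only ("
    "alias_id INTEGER NOT NULL, dag_id TEXT NOT NULL, PRIMARY KEY (dag_id))"
)
# Hypothetical table keyed on (alias_id, dag_id), matching the columns named in the warning.
conn.execute(
    "CREATE TABLE ref_composite ("
    "alias_id INTEGER NOT NULL, dag_id TEXT NOT NULL, PRIMARY KEY (alias_id, dag_id))"
)


def can_reference_two_aliases(table: str) -> bool:
    """Try to record one DAG as a consumer of two different aliases."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                f"INSERT INTO {table} (alias_id, dag_id) VALUES (?, ?)",
                [(1, "consumer"), (2, "consumer")],
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

Here `can_reference_two_aliases("ref_dag_only")` returns `False`, because the second row violates the single-column key, whereas `can_reference_two_aliases("ref_composite")` returns `True`.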
romsharon98 pushed a commit that referenced this pull request Oct 28, 2024
…setting the `alias_id` column as part of the primary key in the `dag_schedule_dataset_alias_reference` table. Previously, only dag_id was marked as the primary key, causing a mismatch with the local definition, which triggered an SAWarning. (#43425)

utkarsharma2 pushed a commit that referenced this pull request Nov 1, 2024

(cherry picked from commit 72947cb)
utkarsharma2 pushed a commit that referenced this pull request Nov 1, 2024

(cherry picked from commit 72947cb)
utkarsharma2 pushed a commit that referenced this pull request Nov 1, 2024

(cherry picked from commit 72947cb)
utkarsharma2 pushed a commit that referenced this pull request Nov 1, 2024

(cherry picked from commit 72947cb)
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 6, 2025

(cherry picked from commit 72947cb09854f00d347b7c9cd09eb9ac89c75480)

GitOrigin-RevId: dd296c5338150cebe24c1edb46ba5a944f82a5eb
Labels
area:db-migrations PRs with DB migration type:new-feature Changelog: New Features
4 participants