Fix memory leak in DagFileProcessorManager by MaslikovEgor · Pull Request #50730 · apache/airflow · GitHub

Fix memory leak in DagFileProcessorManager #50730

Open · MaslikovEgor wants to merge 3 commits into base: main

Fix memory leak in DagFileProcessorManager

Wrap DagFileProcessorProcess.start() in try/except and ensure the logger filehandle is closed on startup failure to avoid file descriptor leaks. Update DagFileStat to record import errors and prevent tight retry loops, and switch metric tags to use string file paths.

Closes: #50708
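
The cleanup-on-failure idea described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `FakeProcessor` and `start_processor` are hypothetical names, an in-memory buffer stands in for the real log file, and re-raising after cleanup is an assumption, since the thread does not show what the real `except` block does after closing the handle.

```python
import io


class FakeProcessor:
    """Hypothetical stand-in for a processor that owns a log file handle."""

    def __init__(self, logger_filehandle, fail=False):
        self.logger_filehandle = logger_filehandle
        self._fail = fail

    def start(self):
        if self._fail:
            raise RuntimeError("could not start processor")
        return self


def start_processor(logger_filehandle, fail=False):
    """Start a processor; close the handle if startup raises."""
    try:
        return FakeProcessor(logger_filehandle, fail=fail).start()
    except Exception:
        # Startup failed, so no processor exists to close this handle later.
        logger_filehandle.close()
        raise  # assumption: the caller still wants to see the error


ok_handle = io.BytesIO()
processor = start_processor(ok_handle)
print(ok_handle.closed)  # False: the running processor keeps its handle

bad_handle = io.BytesIO()
try:
    start_processor(bad_handle, fail=True)
except RuntimeError:
    pass
print(bad_handle.closed)  # True: closed on the failure path
```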


boring-cyborg bot commented May 17, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

eladkal added this to the Airflow 3.0.2 milestone May 19, 2025
eladkal added the type:bug-fix Changelog: Bug Fixes label May 19, 2025
zachliu (Contributor) commented May 19, 2025

@MaslikovEgor out of curiosity, how did you discover this was the root cause of #50708?

MaslikovEgor (Author) commented

> @MaslikovEgor out of curiosity, how did you discover this was the root cause of #50708?

I have faced the same issue with manager.py before, which is why I think I am pretty familiar with this module. The leak is not big, so I checked how resources are cleaned up.

zachliu (Contributor) commented May 20, 2025

unfortunately this doesn't fix the small memory leak. i understand you are trying to properly clean up everything when DagFileProcessorProcess.start() fails, but it seems the leak doesn't come from file handles here

to make sure everyone is on the same page: in #50708, i meant that the memory always leaked, whether DagFileProcessorProcess.start() failed or not. my guess is the leak point is somewhere among the initialization steps, possibly airflow/__init__.py

    return processor
except Exception as e:
    # Clean up resources on failure to start
    logger_filehandle.close()
Member commented:

Would it be better to put this in the finally block instead?
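
One point to weigh with this suggestion: a `finally` clause runs on the success path as well as on failure. The sketch below uses a hypothetical `start_with_finally` helper and a plain dict as the "processor". It assumes the started processor still needs its handle open afterwards, which the thread does not state explicitly.

```python
import io


def start_with_finally(handle):
    """Hypothetical variant that closes the handle in `finally`."""
    try:
        # Stand-in for a successfully started processor that keeps the handle.
        return {"logger_filehandle": handle}
    finally:
        # Runs even though the `try` body returned normally.
        handle.close()


handle = io.StringIO()
processor = start_with_finally(handle)
print(processor["logger_filehandle"].closed)  # True: closed despite success
```

Under that assumption, closing in `except` (failure only) and closing in `finally` (always) are not interchangeable.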

Comment on lines +1059 to +1061
processor = self._processors.pop(proc, None)
if processor:
    processor.logger_filehandle.close()
Member commented:

Suggested change
- processor = self._processors.pop(proc, None)
- if processor:
-     processor.logger_filehandle.close()
+ if processor := self._processors.pop(proc, None):
+     processor.logger_filehandle.close()

There are other places that could benefit from this pattern too.
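
A small standalone illustration of the pop-and-test pattern from the suggestion. The `processors` dict and its keys are hypothetical, and in-memory buffers stand in for the handles. The assignment expression (`:=`) requires Python 3.8 or later.

```python
import io

# Hypothetical registry mapping a DAG file name to its open log handle.
processors = {"dag_a.py": io.StringIO()}
tracked = processors["dag_a.py"]

for key in ("dag_a.py", "missing.py"):
    # pop() returns None for unknown keys, so the body is skipped for them.
    if handle := processors.pop(key, None):
        handle.close()

print(tracked.closed)  # True
print(processors)      # {}
```

As in the original three-line form, the condition tests truthiness, so a popped object that evaluates as falsy would be skipped too.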

zachliu (Contributor) commented May 21, 2025

i may have identified the leak...
i narrowed it down to these two library imports:

from importlib import metadata
from sqlalchemy import create_engine

each dag-processor child process imports them but can't wipe its memory footprint clean afterwards, hence the "pinhole" leak
if anyone can confirm this, i'll close #50708
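
For anyone who wants a rough look at what these imports allocate, here is a hedged sketch using the standard-library `tracemalloc` module. It is not how the numbers in this thread were obtained. `traced_import_cost` is a hypothetical helper. It only sees Python-level allocations in the current process (not the RSS of a dag-processor child), and a module that was already imported will show close to zero.

```python
import importlib
import tracemalloc


def traced_import_cost(module_names):
    """Return the net change in traced bytes across each module import."""
    costs = {}
    for name in module_names:
        tracemalloc.start()
        before, _peak = tracemalloc.get_traced_memory()
        try:
            importlib.import_module(name)
        except ImportError:
            tracemalloc.stop()
            continue  # module not installed in this environment; skip it
        after, _peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        costs[name] = after - before
    return costs


costs = traced_import_cost(["importlib.metadata", "sqlalchemy"])
for name, size in costs.items():
    print(f"{name}: {size} bytes")
```

Running the same measurement in a fresh interpreter gives a fairer picture, since imports are cached per process.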

Successfully merging this pull request may close these issues.

Small memory leak in dag-processor after upgrade to airflow 3.0
4 participants







