remote logging for s3 fails while still looking for internal logs #50866

Open
KaYunKIM opened this issue May 21, 2025 · 6 comments
Labels
area:core · area:logging · kind:bug · needs-triage

Comments

@KaYunKIM

Apache Airflow version

3.0.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

After upgrading to Airflow 3.0.1, remote logging is not working as intended.
I have configured remote logging in airflow.cfg as below:
```
remote_logging = True
remote_log_conn_id = s3_conn
delete_local_logs = True
remote_base_log_folder = s3://my-logs
remote_task_handler_kwargs = {"delete_local_copy": true}
```
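For a container-based deployment like mine, the same options can also be supplied as environment variables instead of editing airflow.cfg. This is a minimal sketch, assuming these keys live under the [logging] section as in recent Airflow releases:

```bash
# Environment-variable equivalents of the airflow.cfg options above
# (assumes the options belong to the [logging] section).
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=s3_conn
export AIRFLOW__LOGGING__DELETE_LOCAL_LOGS=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-logs
export AIRFLOW__LOGGING__REMOTE_TASK_HANDLER_KWARGS='{"delete_local_copy": true}'
```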

But the Airflow UI fails to fetch logs in both cases, whether the run is triggered from the UI or the CLI. The tasks are also marked as failed.

```
["Could not read served logs: Invalid URL 'http://:8793/log/dag_id=init_test_3.0.1/run_id=manual__2025-05-21T02:10:37.485869+00:00/task_id=print_time/attempt=1.log': No host supplied"]
```

However, when I run `airflow dags test` from the CLI, my tasks succeed, but reading the logs still fails.

```
["Could not read served logs: HTTPConnectionPool(host='ip-172-29-XX-XX.ap-northeast-2.compute.internal', port=8793): Max retries exceeded with url: /log/dag_id=init_test_3.0.1/run_id=manual__2025-05-21T02:15:10.539747+00:00/task_id=print_time/attempt=1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff73126db0>: Failed to establish a new connection: [Errno 111] Connection refused'))"]
```

I can see that remote logging has not been configured properly, but I don't know why.

What you think should happen instead?

I expect my tasks to succeed and the logs to be fetched remotely from S3.

How to reproduce

```
remote_logging = True
remote_log_conn_id = s3_conn
delete_local_logs = True
remote_base_log_folder = s3://my-logs
remote_task_handler_kwargs = {"delete_local_copy": true}
```

Operating System

Debian GNU/Linux

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

I am running Airflow on ECS, using the apache/airflow:3.0.1-python3.12 base image.

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@KaYunKIM added the kind:bug, area:core, and needs-triage labels May 21, 2025

boring-cyborg bot commented May 21, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the area:logging label May 21, 2025
@gopidesupavan
Member

Which executor are you using?

@KaYunKIM
Author

I'm using the Celery executor.

@gopidesupavan
Member

> I'm using the Celery executor.

Hm, that's strange; it's working for me locally with the Celery executor. It looks like some host issue.

@KaYunKIM
Author

KaYunKIM commented May 23, 2025

> I'm using the Celery executor.
>
> Hm, that's strange; it's working for me locally with the Celery executor. It looks like some host issue.

It was an issue with:

  1. The worker not being able to execute the task: it leaves a log file, but an empty one. I assume the task got killed because of the next reason.
  2. The scheduler not being able to find the DAG: DAG 'init_test_3.0.1' for task instance <TaskInstance: init_test_3.0.1.print_time manual__2025-05-23T01:11:24.335987+00:00 [queued]> not found in serialized_dag table. I checked the metadata DB, and the DAG was in fact present in the serialized_dag table (a query sketch follows this list).
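The check I mean is along these lines; the psql invocation and connection string are placeholders, and it assumes a Postgres metadata DB whose serialized_dag table still exposes a dag_id column in the 3.0 schema:

```bash
# Hypothetical check against a Postgres metadata DB (connection string is a placeholder);
# assumes serialized_dag still has a dag_id column in Airflow 3.0.
psql "postgresql://airflow:***@metadata-db:5432/airflow" \
  -c "SELECT dag_id FROM serialized_dag WHERE dag_id = 'init_test_3.0.1';"
```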

I solved it by adding /execution/ to execution_api_server_url in airflow.cfg.

So for example:

```
execution_api_server_url = http://airflow-webserver.airflow.local:8080/execution/
```

Previously, I had forgotten to add /execution/ at the end.
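The same fix as an environment variable, assuming execution_api_server_url sits under the [core] section, would be:

```bash
# Environment-variable equivalent of the airflow.cfg fix above
# (assumes execution_api_server_url belongs to the [core] section).
export AIRFLOW__CORE__EXECUTION_API_SERVER_URL=http://airflow-webserver.airflow.local:8080/execution/
```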

This way, both the scheduler and the worker can connect to the webserver API, and remote logging succeeds as well :)

@gopidesupavan
Member

Cool. Is there anything else you'd like, or can we close this?
