uWSGI workers dying in 1.40.0 #2699
Comments
I also had a similar issue with 1.40: it was killing my uWSGI workers, and I had to downgrade to 1.39. I also wish I had more information to give, but there wasn't much to go on. The issue was worse with profiling on (workers would be killed via SIGBUS).
@fiendish @seb-b Hey folks, thanks for bringing this to our attention. Even if there is no repro, more info could help point us in the right direction -- what kind of apps (Django, Flask, FastAPI, something else?) are you running? Which SDK integrations are active? Looking at the changelog for 1.40.0, there are a couple of things we could try:
Oops, yeah, sorry. In my haste I forgot to include that. In my case it's Flask 2.1.3 + uWSGI 2.0.23, and

```python
integrations = [
    FlaskIntegration(),
    LoggingIntegration(
        level=logging.INFO,
        event_level=logging.ERROR,
    ),
]
```

I'll try to find time over the weekend or Monday to try those things.
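For context, here is a minimal sketch of how an integrations list like the one above would typically be wired into the SDK for a Flask app. The DSN is a placeholder and the surrounding init call is my own illustration, not taken from the comment above:

```python
import logging

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration
from sentry_sdk.integrations.logging import LoggingIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[
        FlaskIntegration(),
        LoggingIntegration(
            level=logging.INFO,         # capture INFO and above as breadcrumbs
            event_level=logging.ERROR,  # send events for ERROR and above
        ),
    ],
)
```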
Same here with Django 3.2 LTS and uWSGI 2.0.23. These are my settings:
Yes, I'd like to try this, but I'll have to wait until things are a bit calmer here in the evening (in ~6 hrs or so).
We hit the same issue too, with a Python 3.11 Django 4.1.10 app running on uWSGI 2.0.23 and sentry-sdk 1.40.0. We upgraded from sentry-sdk 1.15.0. We did not change our sentry_sdk.init call, which is:
Our uWSGI Setup:
After about an hour of worker lifetime, we see the uWSGI master complain that the listen queue is full (log snippet attached below). The workers never get respawned, and we eventually lose the Kubernetes pod.
I was able to reproduce this in our test environment by keeping the […]. Additionally, I noticed that the worker processes never went away when I did […]. I also confirmed that this issue does not occur when we kill by […]. I have noticed this to happen only when […].
@PAStheLoD thanks for spending time on this. While you are at it, does […]?
Thanks @PAStheLoD, this is very helpful. I'd also be interested in whether […]. My current thoughts on this are that there has to be a way to make this work, since the transport worker and profiler work largely the same way and we're not seeing them explode people's apps. One difference (other than the use of […]) is […]. I'm also wondering if upgrading to the recent uWSGI 2.0.24 might make a difference?
We were also hit by uWSGI workers getting stuck, presumably due to changes in sentry-sdk, although we did try to revert upgrades without much luck. In the end we migrated from uWSGI to Unit, and it's been rock solid for us. Just in case anybody else also gets frustrated with uWSGI...
Everyone, we've just released 1.40.4 with a tentative fix. It'd greatly help us out if you could try it out if you're affected by this issue and report back. What you need to do is install 1.40.4 and then:

```python
sentry_sdk.init(
    ...  # your usual stuff
    # Metrics collection is still off by default in 1.40.4 if you're under uWSGI,
    # but this can be overridden by:
    _experiments={
        "force_enable_metrics": True,
    },
)
```

Caveat: The fix in 1.40.4 should work as long as you're not manually emitting any metrics (with […]). Also please note that, for the SDK to not break in other unexpected ways, uWSGI needs to be run with […].
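For anyone unsure what "manually emitting metrics" refers to: my understanding is that it means calling the experimental metrics API directly, along the lines of the sketch below. The metric name and tags are made up for illustration, and the DSN is a placeholder:

```python
import sentry_sdk
from sentry_sdk import metrics

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    _experiments={"force_enable_metrics": True},
)

# Illustrative only: emitting a custom metric via the experimental metrics API.
metrics.incr(
    "checkout.completed",          # made-up metric name
    value=1,
    tags={"region": "eu-west-1"},  # made-up tag
)
```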
I can reproduce the issue. uwsgi invocation with a.py from @PAStheLoD:
The segfault is caused by uwsgi not handling threads correctly with its default arguments. When forking Python with multiple threads running, the fork hooks […]. To make the segfault disappear, uwsgi has to be called with […].
Interestingly, this behaviour is still not fully correct: only […]. Not sure if there is any viable workaround for the sentry SDK, as the corruption happens outside of its control. IMHO this is not a sentry issue, but caused by unsafe defaults in uwsgi. I wish uwsgi would pick safer defaults. Used software versions:
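To make the failure mode above more concrete, here is a small, self-contained sketch (my own illustration, not Sentry or uWSGI code) of the general hazard: fork() only duplicates the calling thread, so a lock held by any other thread at fork time stays locked forever in the child.

```python
import os
import threading
import time

lock = threading.Lock()

def background_worker():
    # Stand-in for a background thread (e.g. a flusher/worker thread) that
    # happens to hold a lock at the moment the process forks.
    with lock:
        time.sleep(5)

threading.Thread(target=background_worker, daemon=True).start()
time.sleep(0.5)  # give the worker time to grab the lock before we fork

pid = os.fork()
if pid == 0:
    # Child process: the worker thread does not exist here, but the lock it
    # held is still marked as locked, so this acquire can never succeed.
    acquired = lock.acquire(timeout=2)
    print("child acquired lock:", acquired)  # prints False -- the lock is stuck
    os._exit(0)
else:
    os.waitpid(pid, 0)
```

Without the timeout the child would simply hang, which is essentially the stuck-worker symptom reported in this thread; the segfault variant involves interpreter-level state around the fork hooks rather than a plain Lock.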
Thanks a lot for looking into this @natano! I think the best we can do on the SDK side moving forward is to issue a warning on startup if we detect we're in non-lazy mode and […]. The fix in 1.40.4 should work even without […]. So to sum up for future reference, if you run into this issue:
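A rough illustration of how such a startup check might look. This is purely my own sketch, not the SDK's actual implementation; it assumes the in-process uwsgi module and its uwsgi.opt dictionary of parsed options (keys/values may be bytes under Python 3), and the specific option names checked are illustrative guesses:

```python
import logging

logger = logging.getLogger(__name__)

def warn_about_risky_uwsgi_config():
    # Sketch only: warn if we appear to be running under uWSGI with a
    # configuration that is known to be problematic for background threads.
    try:
        import uwsgi  # only importable inside a uWSGI-managed process
    except ImportError:
        return

    # uwsgi.opt keys can be bytes; normalize to str for comparison.
    opts = {k.decode() if isinstance(k, bytes) else k for k in uwsgi.opt}

    if "enable-threads" not in opts:
        logger.warning(
            "uWSGI is running without --enable-threads; "
            "background threads may not work as expected."
        )
    if "lazy-apps" not in opts and "lazy" not in opts:
        logger.warning(
            "uWSGI appears to be preforking the application (no --lazy-apps); "
            "threads started before the fork may end up in a broken state."
        )
```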
To add some extra seasoning to this issue: we saw failures in our AWS Lambda executors after moving to 1.40.3:
@thedanfields I'd wager a guess that's an AWS-specific issue. It reminds me of #2632, minus the upgrade to Python 3.12. I wonder if we've hit some kind of AWS Lambda limit on the number of threads, since the SDK is now spawning one additional thread. Could you open a new issue with this, please?
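If anyone wants to check the thread-count hypothesis in their own function, a minimal diagnostic sketch could look like the following; the handler name and log format are my own, not from this thread:

```python
import threading

def handler(event, context):
    # ... your normal handler logic ...

    # Log how many threads are alive in this (possibly warm) execution
    # environment; a count that grows across invocations would support the
    # "too many threads" hypothesis.
    names = [t.name for t in threading.enumerate()]
    print(f"active threads: {threading.active_count()} -> {names}")

    return {"statusCode": 200}
```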
@sentrivana, happy to oblige. I created a new issue as requested.
It seems getsentry/sentry-python#2699 is closed, and the outcome is:

* deadlock is fixed
* segfault is caused by improper uwsgi settings, and the SDK should warn if those are present

I ran `snuba api` locally and the warning doesn't trigger, so I assume we're good. But also, I think we may have only encountered the deadlock, not the segfault.
How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.40.0
Steps to Reproduce
I don't have a reproduction yet due to the nature of the issue, for which I apologize, but with 1.40.0 my server processes are silently dying after about 90 minutes of run time. This does not happen with 1.39.2. I just want to get it on your radar that the latest update may be causing issues for some. I wish I had more information, but I don't at this time.
Expected Result
^
Actual Result
^