segfault in python 1.31.0
#23796
Comments
Hopefully this will help; this is the backtrace generated by GDB when analyzing the core dump:
CC @veblush
After re-running tests, this seems to also affect another version. A coworker of mine noted the presence of this error too:
I've seen a few conversations discussing similar issues that seemed related to fork. Also, here are some more stack traces:
@huguesalary The stack trace looks slightly different in your most recent comment. Which thread segfaulted in that case? Thread 6?
I somehow managed to lose the trace and can't generate another one right now, but I believe so. I have also had instances where the segfault that occurred looked exactly like the earlier one.
After deploying So, although I was able to create a segfault with
Two things that might be relevant come to mind:
Another interesting thing is the fork error message. It shouldn't be shown, I guess. Richard, isn't gRPC Python supposed to not use epollex at all?
I have the same problem.
I can provide some examples and additional info for debugging purposes, but I don't have a 100% reliable case to reproduce it; it is an intermittent error.
We're using a Flask application with the Google PubSub emulator and Google Datastore emulator (both communicating via gRPC through the client libraries) to run tests. With grpcio==1.31.0 we get segmentation fault errors, but with 1.30.0 it works fine. Not sure what the issue could be. We're on Python 3.7.5.
@gnossen Is there anything we can provide to help make progress on this issue?
Hi @gnossen, just checking in again. Is there any additional information that would be useful?
This is happening to me too with grpcio-1.33.2-cp36-cp36m-manylinux2014_x86_64 on Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1030-aws x86_64). I get constant segfaults. Downgrading to 1.30.0 solves the problem.
Same here. Python multiprocessing.Process workers died randomly, leaving a segfault message in dmesg.
We faced the exact same issue - multiprocessing with grpc leading to segmentation faults. We were using
Sorry for the delay. This thread got buried in my inbox. @wireman27 @exzhawk @dmjef Can one of you please point me to your code and/or describe how to reproduce this issue?
It looks like the default polling strategy changed from
@gnossen Reproducing might be difficult because of our setup. Important to note here is that we initialise 10 sub-processes.
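For readers trying to picture the failure mode, a minimal sketch of the kind of setup described in this thread (gRPC state touched in the parent process, then forked sub-processes) might look like the following. The address, worker count, and stub-less body are illustrative assumptions, not the commenter's actual code.

```python
import multiprocessing

import grpc


def worker(target):
    # Each forked child inherits the parent's gRPC state; with a fork-unsafe
    # poller this is the pattern reported to segfault intermittently.
    channel = grpc.insecure_channel(target)  # hypothetical address
    # ... create stubs and issue RPCs here ...
    channel.close()


if __name__ == "__main__":
    # A channel (or a client library that creates one) is used in the parent
    # before the sub-processes are forked.
    parent_channel = grpc.insecure_channel("localhost:50051")

    procs = [
        multiprocessing.Process(target=worker, args=("localhost:50051",))
        for _ in range(10)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    parent_channel.close()
```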
We have a similar issue. We are running processes with multiprocessing using Google Cloud Pub/Sub. The code we are using is very similar to the referenced function. With all versions above 1.30.0 we get messages like these:
and then segmentation faults. But it seems the occurrence of the segmentation fault depends on the type of process we are running.
This is starting to be a huge issue for my team; we make heavy use of grpcio and grpc-google-iam-v1. Does anyone have this issue too or a workaround for working with GCP libraries?
We have had this issue too, and this thread was helpful for us. We have a workaround, so I thought I would provide details in the hope it may help. What we did was, as suggested above, to set the environment variable. We have also found that workers running on Python see something similar to what @cpaulik is seeing. If I had to guess, I would naively suspect that both the parent and child processes are attempting to use the same socket connection after the fork.
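For anyone landing here for the workaround alluded to above: the mitigation discussed in this thread is forcing gRPC core's poller through the GRPC_POLL_STRATEGY environment variable. A minimal sketch, assuming you can set the variable before grpc is first imported (the target address is a placeholder):

```python
import os

# Must be set before grpc (or anything that imports it, e.g. google-cloud
# client libraries) is first imported, since the poller is picked at
# initialization time. Setting it in the service's environment (celery,
# uWSGI, Docker, etc.) works just as well.
os.environ.setdefault("GRPC_POLL_STRATEGY", "epoll1")

import grpc  # noqa: E402

channel = grpc.insecure_channel("localhost:50051")  # placeholder target
```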
This reverts commit 2e6c725.
Thank you so much @jrmlhermitte, I've been trying to find a solution to this segfault issue for a very long time. This worked like a charm.
Current version of Airflow image uses grpcio==1.31.0, which causes segfaults: b/174948982. Dependency added to allow only versions up to 1.30.0. It hasn't yet been fixed upstream: grpc/grpc#23796 Change-Id: I62f3fbd75ff64dab6772a534424d06b178e67a42 GitOrigin-RevId: f0a17fad94c95fcb0794c8501ce579b391473c1e
Enable all db and flow tests. On Linux, multiprocessing's default start method is fork, which causes gRPC to fail because its default polling mechanism is epoll. See grpc/grpc#23796
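As an aside on the fork interaction mentioned in that commit message: one way to sidestep fork entirely is to start workers with the "spawn" start method so each child initializes its own gRPC state. This is a general mitigation sketch, not a fix discussed in this thread:

```python
import multiprocessing


def worker(target):
    # grpc is imported inside the child so each spawned process creates its
    # own poller and channels; nothing gRPC-related is inherited via fork.
    import grpc

    channel = grpc.insecure_channel(target)
    # ... issue RPCs ...
    channel.close()


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    procs = [ctx.Process(target=worker, args=("localhost:50051",)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```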
Hi, 2022 checking in. It appears that gRPC doesn't work with any kind of pre-fork model system (e.g. uWSGI) unless you specify the environment variables noted above. Perhaps a solution would be to add some documentation?
Note: the epollex poller was removed in gRPC 1.46.0, released in May 2022, so I don't think the workaround above of forcing the poller to epoll1 is needed anymore.
Since 1.46.0, epoll1 is the default poll implementation, so we shouldn't encounter these segfaults again. grpc/grpc#23796 (comment)
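If you support a mix of old and new grpcio versions, a small guard like the one below (my own helper, assuming Python 3.8+ for importlib.metadata) applies the poller override only where epollex could still be selected, i.e. grpcio older than 1.46.0:

```python
import os
from importlib.metadata import version


def maybe_force_epoll1():
    # epollex was removed in grpcio 1.46.0; older releases may still select it,
    # so only those need the GRPC_POLL_STRATEGY override.
    major, minor = (int(part) for part in version("grpcio").split(".")[:2])
    if (major, minor) < (1, 46):
        os.environ.setdefault("GRPC_POLL_STRATEGY", "epoll1")


maybe_force_epoll1()

import grpc  # noqa: E402  # import only after the environment is set
```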
What version of gRPC and what language are you using?
Python 1.31.0

What operating system (Linux, Windows, ...) and version?
Linux 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 x86_64 x86_64 GNU/Linux

What runtime / compiler are you using (e.g. python version or version of gcc)?
Python 3.5.9

What did you do?
Upgraded to 1.31.0.

What did you expect to see?
No segfault.

What did you see instead?
We started seeing segfaults in both our celery workers and our uWSGI logs. We tested 1.30.0 and did not observe the segfaults anymore. When this happens with our celery workers, what seems to trigger the segfault is the worker being restarted after having executed the maximum number of tasks specified by maxtasksperchild, for example: celery -A the_app worker -Q a_queue --concurrency=1 --maxtasksperchild=100 -l info. I can't currently provide cleaned-up code for you to reproduce, but I believe any code making gRPC calls should trigger this after enough time.