Got the same crash while running https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.pranithk-laptop.tmp-0'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4ad769841e in rpcsvc_get_listener (svc=0x1ddd9e0, port=65535, trans=0x1dee6e0) at rpcsvc.c:1238
1238            if (listener->trans == trans) {
Missing separate debuginfos, use: debuginfo-install glibc-2.15-37.fc17.x86_64 keyutils-libs-1.5.5-2.fc17.x86_64 krb5-libs-1.10-5.fc17.x86_64 libaio-0.3.109-5.fc17.x86_64 libcom_err-1.42-4.fc17.x86_64 libgcc-4.7.0-5.fc17.x86_64 libselinux-2.1.10-3.fc17.x86_64 openssl-1.0.0j-1.fc17.x86_64 zlib-1.2.5-6.fc17.x86_64
(gdb) bt
#0  0x00007f4ad769841e in rpcsvc_get_listener (svc=0x1ddd9e0, port=65535, trans=0x1dee6e0) at rpcsvc.c:1238
#1  0x00007f4ad769715e in rpcsvc_notify (trans=0x1dfb070, mydata=0x1ddd9e0, event=RPC_TRANSPORT_CLEANUP, data=0x0) at rpcsvc.c:651
#2  0x00007f4ad769be1c in rpc_transport_unref (this=0x1dfb070) at rpc-transport.c:476
#3  0x00007f4ad4729f90 in socket_event_handler (fd=7, idx=3, data=0x1dfb070, poll_in=1, poll_out=0, poll_err=16) at socket.c:2106
#4  0x00007f4ad7915806 in event_dispatch_epoll_handler (event_pool=0x1ddcea0, events=0x1df9a10, i=1) at event-epoll.c:384
#5  0x00007f4ad79159eb in event_dispatch_epoll (event_pool=0x1ddcea0) at event-epoll.c:445
#6  0x00007f4ad78ed40d in event_dispatch (event_pool=0x1ddcea0) at event.c:113
#7  0x000000000040857f in main (argc=19, argv=0x7fff46640108) at glusterfsd.c:1893
http://review.gluster.org/4230 root-causes the issue as a race between freeing the RPC object and handling the socket's POLL_ERR event. The patch may take a couple of review iterations before it is accepted; once it is merged upstream, it will be backported to 2.0.z.
CHANGE: http://review.gluster.org/4230 (rpc: check the ctx->listener before accessing rpcsvc object) merged in master by Anand Avati (avati)
https://code.engineering.redhat.com/gerrit/#/c/1763/ has been submitted for review.
Verified with glusterfs-3.4.0.18rhs-1.

Test steps
==========
1. Ran the test case: https://tcms.engineering.redhat.com/case/243098/?from_plan=7656
2. Created a distributed volume with 3 bricks and FUSE mounted it on 2 clients. From the clients, ran intense IO (creating multiple files using dd), while on the RHS server side kicked off continuous graph changes (by repeatedly setting/unsetting a volume option, e.g. toggling stat-prefetch on/off) and collected sosreports from the RHS machines.

No crashes or errors were observed.