Created attachment 1024891
glfsheal core file

How reproducible:
Frequently.

Steps to Reproduce:
1. gluster volume create testvol replica 2 127.0.0.2:/bricks/brick{1..2}
2. gluster volume heal testvol info

Note: Step 2 launches the `glfsheal` binary with the volume name as its argument, so you can also run `glfsheal <volname>` instead of step 2.
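Since the crash is frequent but not guaranteed on every run, a small loop can help catch it. The following is only a sketch: `run_until_failure` is a hypothetical helper name, not part of GlusterFS, and it assumes a POSIX shell. A process killed by SIGSEGV is normally reported by the shell with exit status 139 (128 + signal 11).

```shell
# Hypothetical helper (not part of GlusterFS): run a command up to <max>
# times and stop at the first run that exits with a non-zero status.
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        status=0
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $i exited with status $status"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 0
}
```

For example, `run_until_failure 20 glfsheal testvol` would invoke the binary from the note above up to 20 times; the count of 20 is arbitrary.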
[root@vm1 /]# gdb glfsheal core.8409
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/sbin/glfsheal testvol'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f9421da3586 in afr_notify (this=0x7f941400af80, event=6, data=0x7f9414009570, data2=0x7f9421da632a <notify>) at afr-common.c:4015
4015            if (priv->child_up[i])
Missing separate debuginfos, use: debuginfo-install glibc-2.20-8.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-15.fc21.x86_64 libacl-2.2.52-7.fc21.x86_64 libattr-2.4.47-9.fc21.x86_64 libcom_err-1.42.12-3.fc21.x86_64 libselinux-2.3-5.fc21.x86_64 libxml2-2.9.1-7.fc21.x86_64 ncurses-libs-5.9-16.20140323.fc21.x86_64 openssl-libs-1.0.1k-6.fc21.x86_64 pcre-8.35-8.fc21.x86_64 readline-6.3-5.fc21.x86_64 sssd-client-1.12.4-2.fc21.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) thread apply all bt

Thread 3 (Thread 0x7f94258cc700 (LWP 8412)):
#0  0x00007f942e5258fd in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f9430316b2e in gf_timer_proc (ctx=0x1c07170) at timer.c:195
#2  0x00007f942e51d52a in start_thread () from /lib64/libpthread.so.0
#3  0x00007f942de6c22d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f94307b2700 (LWP 8409)):
#0  0x00007f942e522939 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f943032ef36 in event_dispatch_destroy (event_pool=0x1c25ef0) at event.c:262
#2  0x00007f942f5d48e8 in pub_glfs_fini (fs=0x1c07010) at glfs.c:1162
#3  0x0000000000403a07 in main (argc=2, argv=0x7ffde4e451c8) at glfs-heal.c:829

Thread 1 (Thread 0x7f941bfff700 (LWP 8415)):
#0  0x00007f9421da3586 in afr_notify (this=0x7f941400af80, event=6, data=0x7f9414009570, data2=0x7f9421da632a <notify>) at afr-common.c:4015
#1  0x00007f9421da6431 in notify (this=0x7f941400af80, event=6, data=0x7f9414009570) at afr.c:37
#2  0x00007f94302ed144 in xlator_notify (xl=0x7f941400af80, event=6, data=0x7f9414009570) at xlator.c:489
#3  0x00007f943030a3cf in default_notify (this=0x7f9414009570, event=6, data=0x0) at defaults.c:2316
#4  0x00007f9421fc866d in client_notify_dispatch (this=0x7f9414009570, event=6, data=0x0) at client.c:87
#5  0x00007f9421fc8548 in client_notify_dispatch_uniq (this=0x7f9414009570, event=6, data=0x0) at client.c:65
#6  0x00007f9421fd112d in client_rpc_notify (rpc=0x7f9414032890, mydata=0x7f9414009570, event=RPC_CLNT_DISCONNECT, data=0x0) at client.c:2097
#7  0x00007f942f7feecd in rpc_clnt_notify (trans=0x7f9414032d20, mydata=0x7f94140328c0, event=RPC_TRANSPORT_DISCONNECT, data=0x7f9414032d20) at rpc-clnt.c:861
#8  0x00007f942f7fb68a in rpc_transport_notify (this=0x7f9414032d20, event=RPC_TRANSPORT_DISCONNECT, data=0x7f9414032d20) at rpc-transport.c:543
#9  0x00007f94244b4fc8 in socket_event_poll_err (this=0x7f9414032d20) at socket.c:1205
#10 0x00007f94244b9683 in socket_event_handler (fd=10, idx=2, data=0x7f9414032d20, poll_in=1, poll_out=0, poll_err=16) at socket.c:2410
#11 0x00007f943036364c in event_dispatch_epoll_handler (event_pool=0x1c25ef0, event=0x7f941bffef20) at event-epoll.c:572
#12 0x00007f943036399f in event_dispatch_epoll_worker (data=0x7f9414032490) at event-epoll.c:674
#13 0x00007f942e51d52a in start_thread () from /lib64/libpthread.so.0
#14 0x00007f942de6c22d in clone () from /lib64/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f941bfff700 (LWP 8415))]
#0  0x00007f9421da3586 in afr_notify (this=0x7f941400af80, event=6, data=0x7f9414009570, data2=0x7f9421da632a <notify>) at afr-common.c:4015
4015            if (priv->child_up[i])
(gdb) p this->name
$1 = 0x7f941400a2a0 "testvol-replicate-0"
(gdb) p priv
$2 = (afr_private_t *) 0x7f941402d420
(gdb) p priv->child_up
$3 = (unsigned char *) 0x7f941402d42000 <error: Cannot access memory at address 0x7f941402d42000>
(gdb)
This has not been seen in the last 2+ years, and it is not seen now.