Description of problem:
=====================
I was running volume creates and deletes on my brickmux setup, creating volumes of different types. After about 10 hours, shd crashed and dumped core with the backtrace below:

Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c8/fbb951579a5ccf45f786661b585545f43e4870
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id shd/dist-arb_ex8z2g5bax33k -p /va'.
Program terminated with signal 11, Segmentation fault.
#0  __get_heard_from_all_status (this=this@entry=0x7f7fce0f2250) at afr-common.c:5024
5024            for (i = 0; i < priv->child_count; i++) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-15.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  __get_heard_from_all_status (this=this@entry=0x7f7fce0f2250) at afr-common.c:5024
#1  0x00007f7fef97be27 in afr_notify (this=0x7f7fce0f2250, event=6, data=0x7f7fce0de3d0, data2=<optimized out>) at afr-common.c:5519
#2  0x00007f7fef97c6c9 in notify (this=<optimized out>, event=<optimized out>, data=<optimized out>) at afr.c:42
#3  0x00007f7ffe32c2a2 in xlator_notify (xl=0x7f7fce0f2250, event=event@entry=6, data=data@entry=0x7f7fce0de3d0) at xlator.c:692
#4  0x00007f7ffe3e3d15 in default_notify (this=this@entry=0x7f7fce0de3d0, event=event@entry=6, data=data@entry=0x0) at defaults.c:3388
#5  0x00007f7fefbb6469 in client_notify_dispatch (this=this@entry=0x7f7fce0de3d0, event=event@entry=6, data=data@entry=0x0) at client.c:97
#6  0x00007f7fefbb64ca in client_notify_dispatch_uniq (this=this@entry=0x7f7fce0de3d0, event=event@entry=6, data=data@entry=0x0) at client.c:71
#7  0x00007f7fefbb748d in client_rpc_notify (rpc=0x7f7fce838270, mydata=0x7f7fce0de3d0, event=<optimized out>, data=<optimized out>) at client.c:2365
#8  0x00007f7ffe0d8203 in rpc_clnt_handle_disconnect (conn=0x7f7fce8382a0, clnt=0x7f7fce838270) at rpc-clnt.c:826
#9  rpc_clnt_notify (trans=0x7f7fce8385b0, mydata=0x7f7fce8382a0, event=RPC_TRANSPORT_DISCONNECT, data=<optimized out>) at rpc-clnt.c:887
#10 0x00007f7ffe0d4a53 in rpc_transport_notify (this=this@entry=0x7f7fce8385b0, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f7fce8385b0) at rpc-transport.c:547
#11 0x00007f7ff26ee2df in socket_event_poll_err (this=this@entry=0x7f7fce8385b0, gen=gen@entry=1, idx=idx@entry=183) at socket.c:1385
#12 0x00007f7ff26f06ea in socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f7fce8385b0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=16, event_thread_died=0 '\000') at socket.c:3008
#13 0x00007f7ffe395416 in event_dispatch_epoll_handler (event=0x7f7ff0e3ce70, event_pool=0x55b40b2bf5b0) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x55b40b311c80) at event-epoll.c:761
#15 0x00007f7ffd16dea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f7ffca338cd in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
===================
6.0.6

How reproducible:
===============
Hit three times on this cluster.

Steps to Reproduce:
1. Created a 3-node brickmux setup.
2. Triggered a script which creates 1000 volumes of randomly chosen types (singlebrick, rep3, distrep3, arb, dist-arb, ecv, dist-ecv).
3. After that, deleted all the volumes.
4. Repeated steps 2 and 3 in a loop:

for j in {1..100}; do
    echo "########## loop $j #### " |& tee -a volc.log
    date |& tee -a volc.log
    for i in {1..1000}; do
        python randvol-create.py |& tee -a volc.log
    done
    for v in $(gluster v list); do
        gluster v stop "$v" --mode=script |& tee -a volc.log
        date |& tee -a volc.log
        gluster v del "$v" --mode=script |& tee -a volc.log
    done
done

Actual results:
================
glustershd crashed as above.

Logs and sosreports: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1725022/
Cores: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1725022/rhs-gp-srv1.lab.eng.blr.redhat.com/
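
Note on the reproducer: the randvol-create.py helper used in step 2 is not attached to this report. For anyone trying to reproduce, below is a minimal sketch of what such a helper might look like. It only builds and prints a "gluster volume create" command line for a randomly chosen type; it does not run it. The host names, brick root, brick counts per type, and the "<type>_<random suffix>" naming (guessed from the volfile-id in the core) are all assumptions, not the actual script.

```python
import random
import uuid

# Hypothetical host names and brick root; the report does not list them.
HOSTS = ["node1", "node2", "node3"]
BRICK_ROOT = "/bricks"

# Volume type -> (layout keywords, number of bricks). The type names follow
# step 2 of the report; the brick counts are illustrative assumptions.
LAYOUTS = {
    "singlebrick": ([], 1),
    "rep3": (["replica", "3"], 3),
    "distrep3": (["replica", "3"], 6),
    "arb": (["replica", "3", "arbiter", "1"], 3),
    "dist-arb": (["replica", "3", "arbiter", "1"], 6),
    "ecv": (["disperse", "6", "redundancy", "2"], 6),
    "dist-ecv": (["disperse", "6", "redundancy", "2"], 12),
}


def build_create_command(vol_type, suffix):
    """Return the argv list that would create one volume of the given type."""
    layout, brick_count = LAYOUTS[vol_type]
    name = f"{vol_type}_{suffix}"
    # Spread bricks round-robin across the hosts.
    bricks = [
        f"{HOSTS[i % len(HOSTS)]}:{BRICK_ROOT}/{name}/brick{i}"
        for i in range(brick_count)
    ]
    # "force" is appended because some layouts place several bricks of one
    # set on the same host in this sketch.
    return ["gluster", "volume", "create", name, *layout, *bricks, "force"]


def random_create_command(rng=random):
    """Pick a volume type at random and build its create command."""
    vol_type = rng.choice(sorted(LAYOUTS))
    suffix = uuid.uuid4().hex[:12]
    return build_create_command(vol_type, suffix)


if __name__ == "__main__":
    print(" ".join(random_create_command()))
```

A real helper would presumably pass the list to subprocess.run and then start the volume; the sketch stops at printing so it can be checked without a cluster.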