Description of problem:
----------------------------------------
Coredumps were found on 4 of the nodes in the cluster.

Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/64/b236454ed57a3fb07532d2829ed254fa0599ad
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f8fd3687570 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-15.el7.x86_64 libuuid-2.23.2-63.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64

(gdb) bt
#0  0x00007f8fd3687570 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8fd4870e60 in mem_put (ptr=0x7f8fb80d1070) at mem-pool.c:868
#2  0x00007f8fc8956f92 in FRAME_DESTROY (frame=0x7f8fb8016b98) at ../../../../libglusterfs/src/glusterfs/stack.h:173
#3  STACK_DESTROY (stack=0x7f8fb805ee68) at ../../../../libglusterfs/src/glusterfs/stack.h:193
#4  glusterd_ac_friend_add (event=<optimized out>, ctx=<optimized out>) at glusterd-sm.c:334
#5  0x00007f8fc895930e in glusterd_friend_sm () at glusterd-sm.c:1451
#6  0x00007f8fc89b5cc2 in __glusterd_mgmt_hndsk_version_ack_cbk (req=req@entry=0x7f8fb8121828, iov=iov@entry=0x7f8fb8121860, count=count@entry=1, myframe=myframe@entry=0x7f8fb8016b98) at glusterd-handshake.c:1941
#7  0x00007f8fc89a25aa in glusterd_big_locked_cbk (req=0x7f8fb8121828, iov=0x7f8fb8121860, count=1, myframe=0x7f8fb8016b98, fn=0x7f8fc89b5870 <__glusterd_mgmt_hndsk_version_ack_cbk>) at glusterd-rpc-ops.c:211
#8  0x00007f8fd45ed0f1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f8fb8006ac0, pollin=pollin@entry=0x7f8fb8008b40) at rpc-clnt.c:764
#9  0x00007f8fd45ed457 in rpc_clnt_notify (trans=0x7f8fb8006de0, mydata=0x7f8fb8006af0, event=<optimized out>, data=0x7f8fb8008b40) at rpc-clnt.c:931
#10 0x00007f8fd45e9af3 in rpc_transport_notify (this=this@entry=0x7f8fb8006de0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f8fb8008b40) at rpc-transport.c:547
#11 0x00007f8fc7b70b35 in socket_event_poll_in (notify_handled=true, this=0x7f8fb8006de0) at socket.c:2582
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f8fb8006de0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2991
#13 0x00007f8fd48a8796 in event_dispatch_epoll_handler (event=0x7f8fc59b5130, event_pool=0x5582845a6570) at event-epoll.c:656
#14 event_dispatch_epoll_worker (data=0x5582846036d0) at event-epoll.c:769
#15 0x00007f8fd3682ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f8fd2f488dd in clone () from /lib64/libc.so.6

Version-Release number of selected component:
-----------------------------------------------------
glusterfs-6.0-34.el7rhgs.x86_64
kernel 3.10.0-1127.8.2.el7.x86_64

How reproducible:
-----------------------------------------------------
1/1

Steps to Reproduce:
------------------------------------------------------
1. Created a 4 x 3 distributed-replicate (distrep) volume and mounted it on the client.
2. On the client, started creating a file using dd: "dd if=/dev/urandom of=file bs=10M count=1024".
3. While the file I/O was in progress, created multiple gluster snapshots using: for i in {1..50}; do gluster snapshot create snap$i vol-name no-timestamp; done
4. The file creation and snapshot creation proceeded in parallel; the snapshots were verified using "gluster snapshot list volname".
5. While the file creation was still in progress, 2 of the bricks went down.
6. Tried to bring them back up by restarting the volume with force and by restarting glusterd.
glusterd restarted on node 1 but failed to restart on the other nodes.
7. Then, on a different (replica) volume, ran some I/O and created snapshots.
8. Three of the nodes were rebooted accidentally.

Additional info:
-------------------------------------------------------
1. The peers were disconnected.
2. One node was down.
3. The crash is suspected to be caused by a glusterd spin-lock issue.
4. Unable to start glusterd:

[root@dhcp47-136 ~]# systemctl start glusterd
Job for glusterd.service failed because the control process exited with error code. See "systemctl status glusterd.service" and "journalctl -xe" for details.

[root@dhcp47-136 ~]# journalctl -xe
--
-- Unit session-159.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:06:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14524]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:07:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 160 of user root.
-- Subject: Unit session-160.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-160.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:07:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14538]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:08:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 161 of user root.
-- Subject: Unit session-161.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-161.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:08:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14552]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:08:33 dhcp47-136.lab.eng.blr.redhat.com polkitd[1178]: Registered Authentication Agent for unix-process:14566:837407 (system bus name :1.342 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path
Jun 01 15:08:33 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
-- Subject: Unit glusterd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has begun starting up.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=1
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
-- Subject: Unit glusterd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has failed.
--
-- The result is failed.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com polkitd[1178]: Unregistered Authentication Agent for unix-process:14566:837407 (system bus name :1.342, object path /org/freedesktop/PolicyKit1/AuthenticationAge
Jun 01 15:09:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 162 of user root.
-- Subject: Unit session-162.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-162.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:09:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14608]: (root) CMD (/usr/lib64/sa/sa1 1 1)
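For convenience, the workload from "Steps to Reproduce" (steps 2-4) can be sketched as a shell script. The volume name, mount point, and the DRY_RUN switch are placeholders added here for illustration; they are not from the original report:

```shell
#!/bin/sh
# Sketch of the reproduction workload. VOLNAME and MOUNT are
# placeholders; DRY_RUN is an illustration-only switch that prints the
# commands instead of executing them. It defaults to on, since gluster
# exists only on a real cluster -- set DRY_RUN=0 there to execute.
VOLNAME=${VOLNAME:-distrep-vol}
MOUNT=${MOUNT:-/mnt/glusterfs}
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "+ $*" || "$@"; }

# Step 2: sustained writes on the client mount (note the space between
# bs=10M and count=1024, which the report's command line elides).
run dd if=/dev/urandom of="$MOUNT/file" bs=10M count=1024 &

# Step 3: create 50 snapshots while the I/O is in flight.
for i in $(seq 1 50); do
    run gluster snapshot create "snap$i" "$VOLNAME" no-timestamp
done

# Step 4: confirm the snapshots were recorded.
run gluster snapshot list "$VOLNAME"
wait
```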
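On the suspected spin-lock issue: in the backtrace at the top of this report, mem_put() (frame #1) dies inside pthread_spin_lock() (frame #0) while a frame is being destroyed. The following C sketch is NOT glusterfs's actual mem-pool code; it only mimics the shape of such a failure, assuming a hypothetical pool where each allocated object carries a header pointing back at its owning pool, so a stale or double-released object makes the put path spin-lock through a dangling pointer:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pool: every put takes the pool's spinlock, found via a
 * header stored immediately in front of the object. */
struct pool {
    pthread_spinlock_t lock;
    int outstanding;
};

struct obj_header {
    struct pool *pool;   /* owning pool, written at get time */
};

static void *pool_get(struct pool *p)
{
    struct obj_header *h = malloc(sizeof(*h) + 64);
    h->pool = p;
    pthread_spin_lock(&p->lock);
    p->outstanding++;
    pthread_spin_unlock(&p->lock);
    return h + 1;        /* caller sees the memory after the header */
}

static void pool_put(void *ptr)
{
    struct obj_header *h = (struct obj_header *)ptr - 1;
    struct pool *p = h->pool;       /* garbage if ptr was already freed */
    pthread_spin_lock(&p->lock);    /* <- crash site analogous to #0/#1 */
    p->outstanding--;
    pthread_spin_unlock(&p->lock);
    free(h);
}

int main(void)
{
    struct pool p = { .outstanding = 0 };
    pthread_spin_init(&p.lock, PTHREAD_PROCESS_PRIVATE);

    void *o = pool_get(&p);
    pool_put(o);                    /* correct pairing: no crash */
    assert(p.outstanding == 0);

    /* A second pool_put(o) here would read a freed header and
     * spin-lock through a dangling pool pointer -- the same failure
     * signature (SIGSEGV inside pthread_spin_lock) as the backtrace. */
    printf("outstanding=%d\n", p.outstanding);
    pthread_spin_destroy(&p.lock);
    return 0;
}
```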
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462