Description of problem:
----------------------------------------
Coredumps were found on 4 of the nodes in the cluster.

Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/64/b236454ed57a3fb07532d2829ed254fa0599ad
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f8fd3687570 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-15.el7.x86_64 libuuid-2.23.2-63.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64

(gdb) bt
#0  0x00007f8fd3687570 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8fd4870e60 in mem_put (ptr=0x7f8fb80d1070) at mem-pool.c:868
#2  0x00007f8fc8956f92 in FRAME_DESTROY (frame=0x7f8fb8016b98) at ../../../../libglusterfs/src/glusterfs/stack.h:173
#3  STACK_DESTROY (stack=0x7f8fb805ee68) at ../../../../libglusterfs/src/glusterfs/stack.h:193
#4  glusterd_ac_friend_add (event=<optimized out>, ctx=<optimized out>) at glusterd-sm.c:334
#5  0x00007f8fc895930e in glusterd_friend_sm () at glusterd-sm.c:1451
#6  0x00007f8fc89b5cc2 in __glusterd_mgmt_hndsk_version_ack_cbk (req=req@entry=0x7f8fb8121828, iov=iov@entry=0x7f8fb8121860, count=count@entry=1, myframe=myframe@entry=0x7f8fb8016b98) at glusterd-handshake.c:1941
#7  0x00007f8fc89a25aa in glusterd_big_locked_cbk (req=0x7f8fb8121828, iov=0x7f8fb8121860, count=1, myframe=0x7f8fb8016b98, fn=0x7f8fc89b5870 <__glusterd_mgmt_hndsk_version_ack_cbk>) at glusterd-rpc-ops.c:211
#8  0x00007f8fd45ed0f1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f8fb8006ac0, pollin=pollin@entry=0x7f8fb8008b40) at rpc-clnt.c:764
#9  0x00007f8fd45ed457 in rpc_clnt_notify (trans=0x7f8fb8006de0, mydata=0x7f8fb8006af0, event=<optimized out>, data=0x7f8fb8008b40) at rpc-clnt.c:931
#10 0x00007f8fd45e9af3 in rpc_transport_notify (this=this@entry=0x7f8fb8006de0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f8fb8008b40) at rpc-transport.c:547
#11 0x00007f8fc7b70b35 in socket_event_poll_in (notify_handled=true, this=0x7f8fb8006de0) at socket.c:2582
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f8fb8006de0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2991
#13 0x00007f8fd48a8796 in event_dispatch_epoll_handler (event=0x7f8fc59b5130, event_pool=0x5582845a6570) at event-epoll.c:656
#14 event_dispatch_epoll_worker (data=0x5582846036d0) at event-epoll.c:769
#15 0x00007f8fd3682ea5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f8fd2f488dd in clone () from /lib64/libc.so.6

Version-Release number of selected component:
-----------------------------------------------------
glusterfs-6.0-34.el7rhgs.x86_64
kernel 3.10.0-1127.8.2.el7.x86_64

How reproducible:
-----------------------------------------------------
1/1

Steps to Reproduce:
------------------------------------------------------
1. Created a 4 x 3 distributed-replicate (distrep) volume and mounted it on the client.
2. On the client, started creating a file using dd: "dd if=/dev/urandom of=file bs=10M count=1024".
3. While the file I/O was in progress, created multiple gluster snapshots using: for i in {1..50}; do gluster snapshot create snap$i vol-name no-timestamp; done
4. The file creation and snapshot creation proceeded in parallel; the snapshots were verified using "gluster snapshot list volname".
5. While the file creation was still in progress, 2 of the bricks went down.
6. Tried to bring them back up by restarting the volume with force and by restarting glusterd.
glusterd restarted on node 1 but failed to restart on the other nodes.
7. Then, on a different (replica) volume, ran some I/O and created snapshots.
8. Three of the nodes were rebooted accidentally.

Additional info:
-------------------------------------------------------
1. The peers were disconnected.
2. One node was down.
3. The crash is suspected to be caused by a glusterd spin-lock issue.
4. Unable to start glusterd:

[root@dhcp47-136 ~]# systemctl start glusterd
Job for glusterd.service failed because the control process exited with error code. See "systemctl status glusterd.service" and "journalctl -xe" for details.

[root@dhcp47-136 ~]# journalctl -xe
--
-- Unit session-159.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:06:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14524]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:07:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 160 of user root.
-- Subject: Unit session-160.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-160.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:07:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14538]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:08:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 161 of user root.
-- Subject: Unit session-161.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-161.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:08:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14552]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 01 15:08:33 dhcp47-136.lab.eng.blr.redhat.com polkitd[1178]: Registered Authentication Agent for unix-process:14566:837407 (system bus name :1.342 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path
Jun 01 15:08:33 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
-- Subject: Unit glusterd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has begun starting up.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=1
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
-- Subject: Unit glusterd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit glusterd.service has failed.
--
-- The result is failed.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
Jun 01 15:08:34 dhcp47-136.lab.eng.blr.redhat.com polkitd[1178]: Unregistered Authentication Agent for unix-process:14566:837407 (system bus name :1.342, object path /org/freedesktop/PolicyKit1/AuthenticationAge
Jun 01 15:09:01 dhcp47-136.lab.eng.blr.redhat.com systemd[1]: Started Session 162 of user root.
-- Subject: Unit session-162.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-162.scope has finished starting up.
--
-- The start-up result is done.
Jun 01 15:09:01 dhcp47-136.lab.eng.blr.redhat.com CROND[14608]: (root) CMD (/usr/lib64/sa/sa1 1 1)
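For convenience, the workload from "Steps to Reproduce" (steps 2-4) can be sketched as a shell script. The volume name, mount point, and the DRY_RUN switch are placeholders added here for illustration; they are not from the original report:

```shell
#!/bin/sh
# Sketch of the reproduction workload. VOLNAME and MOUNT are
# placeholders; DRY_RUN is an illustration-only switch that prints the
# commands instead of executing them. It defaults to on, since gluster
# exists only on a real cluster -- set DRY_RUN=0 there to execute.
VOLNAME=${VOLNAME:-distrep-vol}
MOUNT=${MOUNT:-/mnt/glusterfs}
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "+ $*" || "$@"; }

# Step 2: sustained writes on the client mount (note the space between
# bs=10M and count=1024, which the report's command line elides).
run dd if=/dev/urandom of="$MOUNT/file" bs=10M count=1024 &

# Step 3: create 50 snapshots while the I/O is in flight.
for i in $(seq 1 50); do
    run gluster snapshot create "snap$i" "$VOLNAME" no-timestamp
done

# Step 4: confirm the snapshots were recorded.
run gluster snapshot list "$VOLNAME"
wait
```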
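On the suspected spin-lock issue: in the backtrace at the top of this report, mem_put() (frame #1) dies inside pthread_spin_lock() (frame #0) while a frame is being destroyed. The following C sketch is NOT glusterfs's actual mem-pool code; it only mimics the shape of such a failure, assuming a hypothetical pool where each allocated object carries a header pointing back at its owning pool, so a stale or double-released object makes the put path spin-lock through a dangling pointer:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pool: every put takes the pool's spinlock, found via a
 * header stored immediately in front of the object. */
struct pool {
    pthread_spinlock_t lock;
    int outstanding;
};

struct obj_header {
    struct pool *pool;   /* owning pool, written at get time */
};

static void *pool_get(struct pool *p)
{
    struct obj_header *h = malloc(sizeof(*h) + 64);
    h->pool = p;
    pthread_spin_lock(&p->lock);
    p->outstanding++;
    pthread_spin_unlock(&p->lock);
    return h + 1;        /* caller sees the memory after the header */
}

static void pool_put(void *ptr)
{
    struct obj_header *h = (struct obj_header *)ptr - 1;
    struct pool *p = h->pool;       /* garbage if ptr was already freed */
    pthread_spin_lock(&p->lock);    /* <- crash site analogous to #0/#1 */
    p->outstanding--;
    pthread_spin_unlock(&p->lock);
    free(h);
}

int main(void)
{
    struct pool p = { .outstanding = 0 };
    pthread_spin_init(&p.lock, PTHREAD_PROCESS_PRIVATE);

    void *o = pool_get(&p);
    pool_put(o);                    /* correct pairing: no crash */
    assert(p.outstanding == 0);

    /* A second pool_put(o) here would read a freed header and
     * spin-lock through a dangling pool pointer -- the same failure
     * signature (SIGSEGV inside pthread_spin_lock) as the backtrace. */
    printf("outstanding=%d\n", p.outstanding);
    pthread_spin_destroy(&p.lock);
    return 0;
}
```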
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1462