Bug 1434047

Summary: glusterd crashed and core dumped
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED DUPLICATE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, rhs-bugs, sasundar, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-28 03:30:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1277939
Attachments: glusterd coredump (flags: none)

Description SATHEESARAN 2017-03-20 15:47:48 UTC
Description of problem:
-----------------------
The RHV-RHGS HCI setup uses RHEL 7.3 nodes to run both the gluster server and the virt services. glusterd crashed on one of the nodes; the exact operation in progress at that point is not known, but moving the node to maintenance is the most likely trigger.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-17.el7rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
No definite steps as such; it is not clear which operation triggered the crash.

Actual results:
--------------
glusterd crashed and dumped core

Expected results:
-----------------
glusterd should not crash

Comment 1 SATHEESARAN 2017-03-20 15:48:15 UTC
<snip>
[2017-03-14 16:17:35.761561] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc) [0x7ff912d884dc] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2a138) [0x7ff912d92138] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd531a) [0x7ff912e3d31a] ) 0-management: Lock for vol engine not held
[2017-03-14 16:17:35.761572] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for engine
[2017-03-14 16:17:35.761593] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc) [0x7ff912d884dc] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2a138) [0x7ff912d92138] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd531a) [0x7ff912e3d31a] ) 0-management: Lock for vol vmstore not held
[2017-03-14 16:17:35.761603] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for vmstore
[2017-03-14 16:17:35.761626] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume data. Stopping local bricks.
[2017-03-14 16:17:35.762663] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume vmstore. Stopping local bricks.
[2017-03-14 16:17:35.765737] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff91d475dc5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x7ff91eb09c45] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7ff91eb09abb] ) 0-: received signum (15), shutting down
[2017-03-14 16:17:35.768719] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2017-03-14 16:17:35.768745] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2017-03-14 16:17:35.768822] I [MSGID: 106144] [glusterd-pmap.c:295:pmap_registry_remove] 0-pmap: removing brick /gluster_bricks/data/data on port 49153
[2017-03-14 16:17:35.768913] I [MSGID: 106144] [glusterd-pmap.c:295:pmap_registry_remove] 0-pmap: removing brick /gluster_bricks/vmstore/vmstore on port 49154
[2017-03-14 16:17:35.773479] W [socket.c:595:__socket_rwv] 0-management: readv on /var/run/gluster/43e15551f8ea28ea635bca7bdbd58919.socket failed (No data available)
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2017-03-14 16:17:35
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff91e614c02]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7ff91e61e694]
/lib64/libc.so.6(+0x35250)[0x7ff91ccf8250]
/lib64/liburcu-bp.so.1(rcu_read_lock_bp+0x2d)[0x7ff9127fa0ad]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x102212)[0x7ff912e6a212]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1026ed)[0x7ff912e6a6ed]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1028d9)[0x7ff912e6a8d9]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x4492c)[0x7ff912dac92c]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x449c2)[0x7ff912dac9c2]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1d79b)[0x7ff912d8579b]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc)[0x7ff912d884dc]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb3)[0x7ff91e3dea03]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff91e3da9f3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9754)[0x7ff9101f8754]
/lib64/libglusterfs.so.0(+0x83770)[0x7ff91e66e770]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff91d475dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff91cdba73d]

Comment 2 SATHEESARAN 2017-03-20 15:52:22 UTC
Created attachment 1264785 [details]
glusterd coredump

Comment 7 Atin Mukherjee 2017-03-28 03:30:04 UTC
As per the initial log you shared, I do see a cleanup thread:

[2017-03-14 16:17:35.765737] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff91d475dc5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x7ff91eb09c45] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7ff91eb09abb] ) 0-: received signum (15), shutting down

This is a duplicate of BZ 1238067
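For illustration only, below is a minimal C sketch of the kind of shutdown race the backtrace suggests: cleanup_and_exit() tears down shared state on SIGTERM while the peer RPC-notify path (rpc_clnt_notify -> __glusterd_peer_rpc_notify) is still dereferencing it, which can end in the SIGSEGV seen above. This is not glusterd source; all names here (peer_table, notify_worker, cleanup_worker) are hypothetical.

/* Hypothetical sketch of an unsynchronized shutdown race (use-after-free).
 * Build with: gcc -pthread race_sketch.c -o race_sketch */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct peer_table {
    char name[32];
    int  connected;
};

/* Shared state; nothing coordinates readers with the teardown path. */
static struct peer_table *peers;

static void *notify_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Comparable to a peer RPC-notify callback walking peer state:
         * reads 'peers' with no lock or barrier against cleanup. */
        if (peers && peers->connected)
            printf("notify: peer %s still up\n", peers->name);
        usleep(1000);
    }
    return NULL;
}

static void *cleanup_worker(void *arg)
{
    (void)arg;
    sleep(1);
    /* Comparable to cleanup_and_exit() on SIGTERM: frees state the notify
     * thread may be dereferencing at this very moment -> SIGSEGV. */
    struct peer_table *p = peers;
    peers = NULL;   /* not atomic, no memory barrier: the notify thread can
                       still hold the old pointer and touch freed memory */
    free(p);
    return NULL;
}

int main(void)
{
    peers = calloc(1, sizeof(*peers));
    strcpy(peers->name, "rhgs-node-2");
    peers->connected = 1;

    pthread_t n, c;
    pthread_create(&n, NULL, notify_worker, NULL);
    pthread_create(&c, NULL, cleanup_worker, NULL);

    pthread_join(c, NULL);
    sleep(1);   /* leave the racy window open; process exit ends the sketch */
    return 0;
}

The actual fix belongs in the shutdown path (BZ 1238067), which must quiesce or join the notify threads before freeing the state they read.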

*** This bug has been marked as a duplicate of bug 1238067 ***