Bug 1434047

Summary: glusterd crashed and core dumped
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED DUPLICATE
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, rhs-bugs, sasundar, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-28 03:30:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1277939
Attachments: glusterd coredump (flags: none)

Description SATHEESARAN 2017-03-20 15:47:48 UTC
Description of problem:
-----------------------
The RHV-RHGS HCI setup uses RHEL 7.3 nodes to run both the gluster server and the virt services. glusterd crashed on one of the nodes; the exact operation in progress at that point is not known, but moving the node to maintenance is the most likely trigger.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.8.4-17.el7rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
No definite steps as such; it is not clear which operation triggered the crash.

Actual results:
--------------
glusterd crashed and dumped core

Expected results:
-----------------
glusterd should not crash

Comment 1 SATHEESARAN 2017-03-20 15:48:15 UTC
<snip>
[2017-03-14 16:17:35.761561] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc) [0x7ff912d884dc] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2a138) [0x7ff912d92138] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd531a) [0x7ff912e3d31a] ) 0-management: Lock for vol engine not held
[2017-03-14 16:17:35.761572] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for engine
[2017-03-14 16:17:35.761593] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc) [0x7ff912d884dc] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x2a138) [0x7ff912d92138] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd531a) [0x7ff912e3d31a] ) 0-management: Lock for vol vmstore not held
[2017-03-14 16:17:35.761603] W [MSGID: 106118] [glusterd-handler.c:5833:__glusterd_peer_rpc_notify] 0-management: Lock not released for vmstore
[2017-03-14 16:17:35.761626] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume data. Stopping local bricks.
[2017-03-14 16:17:35.762663] C [MSGID: 106002] [glusterd-server-quorum.c:347:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume vmstore. Stopping local bricks.
[2017-03-14 16:17:35.765737] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff91d475dc5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x7ff91eb09c45] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7ff91eb09abb] ) 0-: received signum (15), shutting down
[2017-03-14 16:17:35.768719] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-management: size=588 max=0 total=0
[2017-03-14 16:17:35.768745] I [MSGID: 101053] [mem-pool.c:641:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2017-03-14 16:17:35.768822] I [MSGID: 106144] [glusterd-pmap.c:295:pmap_registry_remove] 0-pmap: removing brick /gluster_bricks/data/data on port 49153
[2017-03-14 16:17:35.768913] I [MSGID: 106144] [glusterd-pmap.c:295:pmap_registry_remove] 0-pmap: removing brick /gluster_bricks/vmstore/vmstore on port 49154
[2017-03-14 16:17:35.773479] W [socket.c:595:__socket_rwv] 0-management: readv on /var/run/gluster/43e15551f8ea28ea635bca7bdbd58919.socket failed (No data available)
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2017-03-14 16:17:35
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7ff91e614c02]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7ff91e61e694]
/lib64/libc.so.6(+0x35250)[0x7ff91ccf8250]
/lib64/liburcu-bp.so.1(rcu_read_lock_bp+0x2d)[0x7ff9127fa0ad]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x102212)[0x7ff912e6a212]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1026ed)[0x7ff912e6a6ed]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1028d9)[0x7ff912e6a8d9]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x4492c)[0x7ff912dac92c]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x449c2)[0x7ff912dac9c2]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1d79b)[0x7ff912d8579b]
/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x204dc)[0x7ff912d884dc]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb3)[0x7ff91e3dea03]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff91e3da9f3]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9754)[0x7ff9101f8754]
/lib64/libglusterfs.so.0(+0x83770)[0x7ff91e66e770]
/lib64/libpthread.so.0(+0x7dc5)[0x7ff91d475dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ff91cdba73d]

Comment 2 SATHEESARAN 2017-03-20 15:52:22 UTC
Created attachment 1264785 [details]
glusterd coredump

Comment 7 Atin Mukherjee 2017-03-28 03:30:04 UTC
As per the initial log you shared, I do see a cleanup thread:

[2017-03-14 16:17:35.765737] W [glusterfsd.c:1288:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff91d475dc5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x7ff91eb09c45] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x7ff91eb09abb] ) 0-: received signum (15), shutting down

This is a duplicate of BZ 1238067
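For illustration only, below is a minimal C sketch of the kind of shutdown race the backtrace suggests: cleanup_and_exit() tears down shared state on SIGTERM while the peer RPC-notify path (rpc_clnt_notify -> __glusterd_peer_rpc_notify) is still dereferencing it, which can end in the SIGSEGV seen above. This is not glusterd source; all names here (peer_table, notify_worker, cleanup_worker) are hypothetical.

/* Hypothetical sketch of an unsynchronized shutdown race (use-after-free).
 * Build with: gcc -pthread race_sketch.c -o race_sketch */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct peer_table {
    char name[32];
    int  connected;
};

/* Shared state; nothing coordinates readers with the teardown path. */
static struct peer_table *peers;

static void *notify_worker(void *arg)
{
    (void)arg;
    for (;;) {
        /* Comparable to a peer RPC-notify callback walking peer state:
         * reads 'peers' with no lock or barrier against cleanup. */
        if (peers && peers->connected)
            printf("notify: peer %s still up\n", peers->name);
        usleep(1000);
    }
    return NULL;
}

static void *cleanup_worker(void *arg)
{
    (void)arg;
    sleep(1);
    /* Comparable to cleanup_and_exit() on SIGTERM: frees state the notify
     * thread may be dereferencing at this very moment -> SIGSEGV. */
    struct peer_table *p = peers;
    peers = NULL;   /* not atomic, no memory barrier: the notify thread can
                       still hold the old pointer and touch freed memory */
    free(p);
    return NULL;
}

int main(void)
{
    peers = calloc(1, sizeof(*peers));
    strcpy(peers->name, "rhgs-node-2");
    peers->connected = 1;

    pthread_t n, c;
    pthread_create(&n, NULL, notify_worker, NULL);
    pthread_create(&c, NULL, cleanup_worker, NULL);

    pthread_join(c, NULL);
    sleep(1);   /* leave the racy window open; process exit ends the sketch */
    return 0;
}

The actual fix belongs in the shutdown path (BZ 1238067), which must quiesce or join the notify threads before freeing the state they read.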

*** This bug has been marked as a duplicate of bug 1238067 ***