Bug 1108570

Summary: RHS3.0: glusterd crash when detaching a node from existing cluster with peer detach command
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: core
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: high
Docs Contact:
Priority: urgent
Version: rhgs-3.0
CC: kparthas, mzywusko, nsathyan, psriniva, rcyriac, rhs-bugs, security-response-team, ssamanta, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.0.0
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when the peer being probed for was offline and the peer-probe or peer-detach commands were executed in quick succession, the glusterd management service would become unresponsive. With this fix, the peer-probe and peer-detach commands work as expected.
Story Points: ---
Clone Of:
Clones: 1109812 (view as bug list)
Environment:
Last Closed: 2014-09-22 19:41:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1109812
Attachments:
    sosreport of existing rhs node (flags: none)
    coredump (flags: none)

Description Saurabh 2014-06-12 09:19:29 UTC
Description of problem:
Executed the gluster peer detach command to detach an RHSS node from the cluster.

This resulted in a glusterd crash.

BZ 1108505, filed just before this BZ, is related to a gluster peer probe failure.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
Seen this time; consistent reproducibility has not been established.

Steps to Reproduce:
1. gluster peer detach <rhss-nodename>
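Per the doc text above, the window for this crash involves a peer that is offline and probe/detach commands issued in quick succession. A minimal reproduction sketch in that spirit (the hostname is hypothetical, and the target peer is assumed to be offline at the time):

[root@nfs1 ~]# gluster peer probe offline-node.example.com     # probed peer is offline
[root@nfs1 ~]# gluster peer detach offline-node.example.com    # issued immediately afterwards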

Actual results:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2014-06-11 22:18:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.15
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7fa407b7ae56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7fa407b9528f]
/lib64/libc.so.6(+0x329a0)[0x7fa406bc39a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7fa407311380]
/usr/lib64/libglusterfs.so.0(__gf_free+0x14a)[0x7fa407ba81fa]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_peer_destroy+0x3f)[0x7fa3fca79c3f]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_friend_cleanup+0xb8)[0x7fa3fca880b8]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(+0x4a49f)[0x7fa3fca5f49f]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x1c6)[0x7fa3fca5ff36]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(__glusterd_handle_cli_deprobe+0x1b9)[0x7fa3fca5db39]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa3fca45e2f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fa407bb6742]
/lib64/libc.so.6(+0x43bf0)[0x7fa406bd4bf0]
---------


Backtrace from gdb:
(gdb) bt
#0  0x00007fa407311380 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fa407ba81fa in __gf_free () from /usr/lib64/libglusterfs.so.0
#2  0x00007fa3fca79c3f in glusterd_peer_destroy () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#3  0x00007fa3fca880b8 in glusterd_friend_cleanup () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#4  0x00007fa3fca5f49f in ?? () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#5  0x00007fa3fca5ff36 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#6  0x00007fa3fca5db39 in __glusterd_handle_cli_deprobe () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#7  0x00007fa3fca45e2f in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#8  0x00007fa407bb6742 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#9  0x00007fa406bd4bf0 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
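The "??" frames above are most likely unresolved for lack of debug symbols. A sketch for regenerating a fully symbolized backtrace from the attached coredump (the coredump path is assumed; on RHEL 6 the matching debuginfo package must be installed first):

[root@nfs1 ~]# debuginfo-install glusterfs-3.6.0.15-1.el6rhs    # pull matching debug symbols
[root@nfs1 ~]# gdb /usr/sbin/glusterd /path/to/coredump -ex bt  # print the backtrace on load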


Memory on the node on which the gluster command was executed (no memory pressure at the time of the crash):
[root@nfs1 ~]# free -tg
             total       used       free     shared    buffers     cached
Mem:             7          1          6          0          0          0
-/+ buffers/cache:          0          7
Swap:            7          0          7
Total:          15          1         14



Expected results:
The peer detach should complete without crashing glusterd.

Additional info:

Comment 1 Saurabh 2014-06-12 09:26:27 UTC
Created attachment 908018 [details]
sosreport of existing rhs node

Comment 3 Saurabh 2014-06-12 09:29:45 UTC
Created attachment 908022 [details]
coredump

Comment 6 Saurabh 2014-06-20 09:49:56 UTC
[root@nfs1 ~]# gluster peer detach rhsauto005.lab.eng.blr.redhat.com
peer detach: success
[root@nfs1 ~]# 
[root@nfs1 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.37.215
Uuid: b9eded1c-fbae-4e9b-aa31-26a06e747d83
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: e3a7651a-2d8d-4cd3-8dc0-e607dc019754
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 542bf4aa-b6b5-40c3-82bf-f344fb637a99
State: Peer in Cluster (Connected)


I didn't see any core dump after these commands, hence moving this BZ to VERIFIED.

Comment 8 Pavithra 2014-08-05 05:39:52 UTC
Hi KP,

Please review the edited doc text for technical accuracy and sign off.

Comment 11 errata-xmlrpc 2014-09-22 19:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

Comment 13 Red Hat Bugzilla 2023-09-14 02:09:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.