Bug 1108570 - RHS3.0: glusterd crash when detaching a node from existing cluster with peer detach command
Summary: RHS3.0: glusterd crash when detaching a node from existing cluster with peer detach command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.0
Hardware: x86_64
OS: All
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: krishnan parthasarathi
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1109812
 
Reported: 2014-06-12 09:19 UTC by Saurabh
Modified: 2023-09-14 02:09 UTC (History)
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when the peer being probed was offline and the peer-probe or peer-detach commands were executed in quick succession, the glusterd Management Service would become unresponsive. With this fix, the peer-probe and peer-detach commands work as expected.
Clone Of:
Clones: 1109812 (view as bug list)
Environment:
Last Closed: 2014-09-22 19:41:10 UTC
Embargoed:


Attachments (Terms of Use)
sosreport of existing rhs node (6.72 MB, application/x-xz), 2014-06-12 09:26 UTC, Saurabh
coredump (400.57 KB, application/x-xz), 2014-06-12 09:29 UTC, Saurabh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description Saurabh 2014-06-12 09:19:29 UTC
Description of problem:
Executed the gluster peer detach command to detach an RHSS node from the cluster.

This resulted in a glusterd crash.

BZ 1108505, filed just before this bug, is related to a gluster peer probe failure.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
Observed once so far; not reliably reproducible.

Steps to Reproduce:
1. gluster peer detach <rhss-nodename>
2.
3.

Actual results:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2014-06-11 22:18:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.15
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7fa407b7ae56]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7fa407b9528f]
/lib64/libc.so.6(+0x329a0)[0x7fa406bc39a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7fa407311380]
/usr/lib64/libglusterfs.so.0(__gf_free+0x14a)[0x7fa407ba81fa]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_peer_destroy+0x3f)[0x7fa3fca79c3f]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_friend_cleanup+0xb8)[0x7fa3fca880b8]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(+0x4a49f)[0x7fa3fca5f49f]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x1c6)[0x7fa3fca5ff36]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(__glusterd_handle_cli_deprobe+0x1b9)[0x7fa3fca5db39]
/usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa3fca45e2f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fa407bb6742]
/lib64/libc.so.6(+0x43bf0)[0x7fa406bd4bf0]
---------


gdb backtrace:
(gdb) bt
#0  0x00007fa407311380 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fa407ba81fa in __gf_free () from /usr/lib64/libglusterfs.so.0
#2  0x00007fa3fca79c3f in glusterd_peer_destroy () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#3  0x00007fa3fca880b8 in glusterd_friend_cleanup () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#4  0x00007fa3fca5f49f in ?? () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#5  0x00007fa3fca5ff36 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#6  0x00007fa3fca5db39 in __glusterd_handle_cli_deprobe () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#7  0x00007fa3fca45e2f in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.6.0.15/xlator/mgmt/glusterd.so
#8  0x00007fa407bb6742 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#9  0x00007fa406bd4bf0 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
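
The backtrace shows the segfault happening inside __gf_free() (on the allocation header's pool spinlock) while glusterd_peer_destroy() is tearing down a peerinfo from the friend state machine, which suggests the peerinfo memory had already been freed or was never fully initialised, consistent with the probe-of-an-offline-peer followed by a quick detach described in the doc text. The following is a minimal, self-contained sketch (hypothetical types and names, not glusterd source) of the kind of guard that avoids such a double destroy: the peer is unlinked from the cluster list under a lock, and only the caller that actually removed it frees the object.

/* Sketch of a double-destroy guard for a peer object.
 * Hypothetical types and names; this is not glusterd code, only an
 * illustration of the race suggested by the backtrace above. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct peer {
    char         hostname[64];
    struct peer *next;
};

static struct peer     *peers;                        /* cluster peer list */
static pthread_mutex_t  peers_lock = PTHREAD_MUTEX_INITIALIZER;

static void peer_add(const char *host)
{
    struct peer *p = calloc(1, sizeof(*p));
    snprintf(p->hostname, sizeof(p->hostname), "%s", host);
    pthread_mutex_lock(&peers_lock);
    p->next = peers;
    peers = p;
    pthread_mutex_unlock(&peers_lock);
}

/* Unlink the peer under the lock and free it only if this caller actually
 * removed it.  A second, racing caller (e.g. probe cleanup vs. detach)
 * finds nothing and returns without touching freed memory, which is the
 * failure mode the backtrace points at. */
static int peer_destroy(const char *host)
{
    struct peer *victim = NULL;

    pthread_mutex_lock(&peers_lock);
    for (struct peer **pp = &peers; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->hostname, host) == 0) {
            victim = *pp;
            *pp = victim->next;
            break;
        }
    }
    pthread_mutex_unlock(&peers_lock);

    if (!victim)
        return -1;        /* already detached/cleaned up elsewhere */
    free(victim);
    return 0;
}

int main(void)
{
    peer_add("rhsauto005.lab.eng.blr.redhat.com");

    /* The detach path and a racing cleanup path both try to destroy the
     * peer; only the first call frees it, the second is a safe no-op. */
    printf("first destroy:  %d\n", peer_destroy("rhsauto005.lab.eng.blr.redhat.com"));
    printf("second destroy: %d\n", peer_destroy("rhsauto005.lab.eng.blr.redhat.com"));
    return 0;
}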


Memory on the node where the gluster command was executed:
[root@nfs1 ~]# free -tg
             total       used       free     shared    buffers     cached
Mem:             7          1          6          0          0          0
-/+ buffers/cache:          0          7
Swap:            7          0          7
Total:          15          1         14



Expected results:
Peer detach should not result in a glusterd crash.

Additional info:

Comment 1 Saurabh 2014-06-12 09:26:27 UTC
Created attachment 908018 [details]
sosreport of existing rhs node

Comment 3 Saurabh 2014-06-12 09:29:45 UTC
Created attachment 908022 [details]
coredump

Comment 6 Saurabh 2014-06-20 09:49:56 UTC
[root@nfs1 ~]# gluster peer detach rhsauto005.lab.eng.blr.redhat.com
peer detach: success
[root@nfs1 ~]# 
[root@nfs1 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.37.215
Uuid: b9eded1c-fbae-4e9b-aa31-26a06e747d83
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: e3a7651a-2d8d-4cd3-8dc0-e607dc019754
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 542bf4aa-b6b5-40c3-82bf-f344fb637a99
State: Peer in Cluster (Connected)


I did not see any core dump after these commands, hence moving this BZ to VERIFIED.

Comment 8 Pavithra 2014-08-05 05:39:52 UTC
Hi KP,

Please review the edited doc text for technical accuracy and sign off.

Comment 11 errata-xmlrpc 2014-09-22 19:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

Comment 13 Red Hat Bugzilla 2023-09-14 02:09:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

