1237022 – Probing a new RHGS node, which is part of another cluster, should throw proper error message in logs and CLI

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.1.2
Assignee:	Gaurav Kumar Garg
QA Contact:	Byreddy
Docs Contact:
URL:
Whiteboard:	glusterd
Depends On:	1004699
Blocks:	1010153 1216951 1252448 1260783

Reported:	2015-06-30 08:37 UTC by SATHEESARAN
Modified:	2016-03-21 10:43 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.7.5-0.3
Doc Type:	Bug Fix
Doc Text:	Previously, when a user tried to add a node that was already part of another cluster by using the "gluster peer probe <ip/hostname>" command, the command failed without reporting the cause of the failure. With this fix, peer probe displays a proper error message when the probed node already belongs to another cluster.
Clone Of:	1004699
Clones:	1252448 1319688
Environment:
Last Closed:	2016-03-01 05:27:18 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:0193	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 update 2	2016-03-01 10:20:36 UTC

Description SATHEESARAN 2015-06-30 08:37:59 UTC

+++ This bug was initially created as a clone of Bug #1004699 +++

Description of problem:
glusterd: If an RHSS node is already part of another cluster and a user tries to add it with the command 'gluster peer probe <hostname/ip>', the command fails with the error 'peer probe: failed:' but gives no reason for the failure.


Version-Release number of selected component (if applicable):
3.4.0.30rhs-2.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start with a cluster of two RHSS nodes:
[root@DHT2 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.37.195
Uuid: 0d0f02d7-1dd1-4252-bff9-3c28b113fba0
State: Peer in Cluster (Connected)


2. From a third RHSS node, try to add one of the nodes of that cluster:

[root@DHT3 ~]# gluster peer probe 10.70.37.66
peer probe: failed: 
[root@DHT3 ~]# echo $?
1



Actual results:
The command gives no reason for the failure.

Expected results:
The command should give the reason for the failure. It should say that the host is already part of another cluster.

In Anshi, the reason for the failure was reported as below:

[root@localhost ~]# gluster peer probe 10.70.42.186
10.70.42.186 is already part of another cluster

[root@localhost ~]# glusterfs -V
glusterfs 3.3.0.7rhs built on Mar 20 2013 13:29:01
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.
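
The difference between the actual and expected results above can be checked mechanically. The following is a minimal sketch in Python, not part of gluster: the helper name probe_failure_reason is hypothetical, and the "fixed" message used in the second call is an assumed wording for illustration only, since this report does not quote the exact text printed after the fix.

```python
# Sketch only: this helper is hypothetical and not part of the gluster CLI.
PROBE_FAILED_PREFIX = "peer probe: failed:"


def probe_failure_reason(cli_output):
    """Return the text that follows 'peer probe: failed:' in CLI output.

    Returns None when no failure line is present. An empty string means the
    CLI reported a failure without any reason, which is the behaviour this
    bug describes.
    """
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith(PROBE_FAILED_PREFIX):
            return line[len(PROBE_FAILED_PREFIX):].strip()
    return None


# Output copied from the reproduction steps: a failure with no reason.
print(repr(probe_failure_reason("peer probe: failed: ")))

# Assumed wording of a fixed message (illustrative, not taken from this report).
print(repr(probe_failure_reason(
    "peer probe: failed: 10.70.37.66 is already part of another cluster")))
```

A test script could treat an empty string from this helper as the failing case and any non-empty reason as acceptable, without depending on the exact wording.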


Additional info:
Log snippet:
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log 

[2013-09-05 05:08:24.320028] I [glusterd-handler.c:821:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.70.37.66 24007
[2013-09-05 05:08:24.577461] I [glusterd-handler.c:2905:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.70.37.66 (24007)
[2013-09-05 05:08:24.730458] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 05:08:24.730618] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 05:08:24.730646] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 05:08:24.735911] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2013-09-05 05:08:24.936651] I [glusterd-rpc-ops.c:235:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: bdfdd4e6-9d7a-4759-8c20-bb5e76adc3d5, host: 10.70.37.66
[2013-09-05 05:08:24.936980] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=2 total=4
[2013-09-05 05:08:24.937095] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=2 total=4
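
Every line in the snippet above is logged at level I, so the log carries no warning or error that explains the failed probe. The following Python sketch shows one way to confirm that from a log file. The regular expression is inferred only from the lines quoted in this report and may not match other glusterd versions; the names parse_log_line and levels_seen are hypothetical.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption):
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(text):
    """Split one log line into named fields, or return None if it does not match."""
    match = LOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def levels_seen(lines):
    """Return the set of log levels found in the lines that match the pattern."""
    parsed = (parse_log_line(line) for line in lines)
    return {entry["level"] for entry in parsed if entry is not None}


# Two lines copied from the snippet above.
sample = [
    "[2013-09-05 05:08:24.320028] I [glusterd-handler.c:821:__glusterd_handle_cli_probe] "
    "0-glusterd: Received CLI probe req 10.70.37.66 24007",
    "[2013-09-05 05:08:24.735911] I [glusterd-handler.c:2886:glusterd_friend_add] "
    "0-management: connect returned 0",
]
print(levels_seen(sample))
```

If the resulting set contains only I for the time window of the probe, the log gives the administrator no more information than the CLI does.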

Comment 1 SATHEESARAN 2015-06-30 08:42:57 UTC

I have noticed that neither the logs nor the CLI give any message about the actual reason for the failure.

The user/admin should be given a few clues, such as "host is already part of another cluster", which would be helpful. That is how the probe used to fail in RHS 2.0.

This is not a functional issue, but a requirement to provide a helpful message in the logs/CLI.

Comment 3 Gaurav Kumar Garg 2015-07-15 07:04:26 UTC

Marking this bug as a known issue.

Comment 4 monti lawrence 2015-07-22 20:58:46 UTC

Doc text is edited. Please sign off to be included in Known Issues.

Comment 5 Gaurav Kumar Garg 2015-07-27 05:37:32 UTC

hi,

this doc text looks good to me.

Comment 6 Anand Nekkunti 2015-10-05 05:54:34 UTC

Upstream patch merged: http://review.gluster.org/11884

Comment 9 Byreddy 2015-10-20 03:43:09 UTC

With the RHGS version "glusterfs-3.7.5-0.3", this issue is no longer observed.

Steps done:
1. Created a two-node cluster (node-1 and node-2) using RHEL 7.2 with RHGS 3.1.2.
2. From another node (node-3), tried to peer probe node-1, which is already part of the cluster, and got an error message saying that the node (node-1) is already part of a cluster.

With the above info, moving this bug to the next state.

Comment 11 errata-xmlrpc 2016-03-01 05:27:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
