Bug 1001056 - glusterd: Probing a machine which is part of another cluster fails but no error reported in CLI and glusterd logs
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nagaprasad Sathyanarayana
QA Contact: SATHEESARAN
URL:
Whiteboard: glusterd
Depends On:
Blocks:
 
Reported: 2013-08-26 12:24 UTC by Rahul Hinduja
Modified: 2016-02-18 00:20 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:23:12 UTC
Target Upstream Version:



Description Rahul Hinduja 2013-08-26 12:24:16 UTC
Description of problem:
=======================

Probing a machine that is part of another cluster fails, which is expected, but no error is reported in the CLI or in the logs.


[root@dj ~]# gluster peer probe 10.70.34.119
peer probe: failed: 
[root@dj ~]# 


logs on dj:
===========


[2013-08-26 04:41:37.573560] I [glusterd-handler.c:821:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.70.34.119 24007
[2013-08-26 04:41:37.603588] I [glusterd-handler.c:2905:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.70.34.119 (24007)
[2013-08-26 04:41:37.608636] I [rpc-clnt.c:967:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-08-26 04:41:37.608721] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-08-26 04:41:37.608740] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-08-26 04:41:37.612267] I [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect returned 0
[2013-08-26 04:41:37.690146] I [glusterd-rpc-ops.c:241:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 5e3e1a7c-5bc6-4bb7-add9-afd45b8ff33c, host: 10.70.34.119
[2013-08-26 04:41:37.690349] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=2 total=4
[2013-08-26 04:41:37.690370] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=2 total=4





Logs on 10.70.34.119, the machine that was being probed.
===================================================

[2013-08-26 11:59:08.587823] I [glusterd-handshake.c:553:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 2
[2013-08-26 11:59:08.594390] I [glusterd-handler.c:2324:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: 2dde2c42-1616-4109-b782-dd37185702d8
[2013-08-26 11:59:08.597958] I [glusterd-handler.c:2376:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.70.34.90, op_ret: -1, op_errno: 3, ret: 0

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.0.20rhs-2.el6rhs.x86_64


Steps to Reproduce:
===================
1. Probe a machine which is part of another cluster.

Actual results:
===============


# gluster peer probe 10.70.34.119
peer probe: failed: 
# 


Expected results:
The CLI should report a proper error stating that the machine is part of another cluster.

Additional info:
================

Separating this issue from bug 1000986.

Comment 2 SATHEESARAN 2014-11-13 11:17:42 UTC
This issue is no longer seen in RHS 3.0.3.
I have tested the same, but the log messages require changes.

Here are the steps:
1. rhss1 and rhss2 formed the trusted storage pool
2. rhss3 is a new node from which I tried to probe rhss1

Console logs on rhss3
----------------------
[root@rhss3 ~]# gluster peer probe 10.70.37.44
peer probe: failed: 10.70.37.44 is already part of another cluster

Content in .cmd_history on rhss3
---------------------------------
[2014-11-13 16:23:41.426243]  : pe probe 10.70.37.44 : FAILED : 10.70.37.44 is already part of another cluster

Corresponding glusterd logs on rhss3
---------------------------------------------------
[2014-11-13 16:22:41.239253] E [rpc-transport.c:481:rpc_transport_unref] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fc31d726e4f] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xcb) [0x7fc31d7259eb] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7fc31d724623]))) 0-rpc_transport: invalid argument: this
[2014-11-13 16:23:41.407235] I [glusterd-handler.c:1109:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.70.37.44 24007
[2014-11-13 16:23:41.411602] I [glusterd-handler.c:3199:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.70.37.44 (24007)
[2014-11-13 16:23:41.416065] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2014-11-13 16:23:41.418799] I [glusterd-handler.c:3180:glusterd_friend_add] 0-management: connect returned 0
[2014-11-13 16:23:41.426210] I [glusterd-rpc-ops.c:237:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 1d9677dc-6159-405e-9319-ad85ec030880, host: 10.70.37.44

Corresponding glusterd logs on rhss1
--------------------------------------------------
[2014-11-13 16:23:41.341482] I [glusterd-handshake.c:1011:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000
[2014-11-13 16:23:41.344539] I [glusterd-handler.c:2611:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
[2014-11-13 16:23:41.346843] I [glusterd-handler.c:2663:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.70.37.216, op_ret: -1, op_errno: 3, ret: 0

tl;dr:

The glusterd logs need to be improved to include information about the peer probe failure and the reason behind it.

Comment 4 Vivek Agarwal 2015-12-03 17:23:12 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

