1104626 – rebalance: peer probe fails to add a new peer when rebalance is in progress

Summary: rebalance: peer probe fails to add a new peer when rebalance is in progress

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	rhgs-3.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Bug Updates Notification Mailing List
QA Contact:	Matt Zywusko
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-06-04 11:28 UTC by Saurabh
Modified:	2023-09-14 02:09 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-04-20 05:43:01 UTC
Embargoed:
Dependent Products:

Attachments
new-node sosreport (5.80 MB, application/x-xz) 2014-06-04 11:28 UTC, Saurabh	no flags
sosreport from where peer probe was executed (12.60 MB, application/x-xz) 2014-06-04 11:31 UTC, Saurabh	no flags

Description Saurabh 2014-06-04 11:28:23 UTC

Created attachment 902156 [details]
new-node sosreport

Description of problem:
While a rebalance was in progress, I tried to peer probe an RHS node; the peer probe was unsuccessful.


Version-Release number of selected component (if applicable):
glusterfs-3.6.0.12-1.el6rhs.x86_64

How reproducible:
Seen once so far.

Steps to Reproduce:
Take a four-node cluster.
1. Create a 6x2 volume and start it.
2. Mount the volume over NFS.
3. Create some directories and files.
4. Once data creation has finished, add bricks and start a rebalance.
5. While the rebalance is running, try to probe a new RHS node.
6. Run gluster peer status.
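
The steps above can be sketched as a list of CLI invocations. This is only an illustration, not the reporter's actual commands: the host names, volume name, brick paths, and mount point are all hypothetical, and the function only builds the argument lists without running anything. Step 3 (creating directories and files) is left out because the report does not say how the data was generated.

```python
def reproduction_commands(hosts, new_peer, volume="testvol", brick_root="/bricks"):
    """Return the CLI commands for the steps above as argument lists.

    Nothing is executed. All names (volume, brick paths, mount point) are
    hypothetical placeholders, not values from the report.
    """
    if len(hosts) != 4:
        raise ValueError("the report uses a four-node cluster")
    # 6x2 = 12 bricks. Consecutive bricks form a replica pair, so cycling
    # through the hosts keeps each pair on two different nodes.
    bricks = [f"{hosts[i % 4]}:{brick_root}/b{i // 4}" for i in range(12)]
    # One extra replica pair for the add-brick step.
    extra = [f"{hosts[0]}:{brick_root}/b3", f"{hosts[1]}:{brick_root}/b3"]
    return [
        ["gluster", "volume", "create", volume, "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "nfs", "-o", "vers=3", f"{hosts[0]}:/{volume}", "/mnt/nfs"],
        ["gluster", "volume", "add-brick", volume, *extra],
        ["gluster", "volume", "rebalance", volume, "start"],
        ["gluster", "peer", "probe", new_peer],
        ["gluster", "peer", "status"],
    ]


if __name__ == "__main__":
    for cmd in reproduction_commands(["n1", "n2", "n3", "n4"], "n5"):
        print(" ".join(cmd))
```

To hit the reported condition, the peer probe would have to be issued while the rebalance started in the previous command is still running.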

Actual results:
Result of step 6 on the node where the peer probe command was executed:
[root@nfs1 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.215
Uuid: 77f03019-30a1-4e81-b8df-6613159c8890
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: ad14a2bb-d39c-4bdf-93e8-32c7568c6d05
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 3aaa0a5e-91d9-46c9-bb46-a46947ddaca5
State: Peer in Cluster (Connected)

Hostname: rhsauto049.lab.eng.blr.redhat.com
Uuid: 821e3f6f-5438-41fb-8a5d-f060704d0e8a
State: Probe Sent to Peer (Connected)


Peer status from an already existing node of the cluster:
[root@nfs2 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.37.62
Uuid: cb4a3869-24e0-4817-be29-73621ff218cb
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 3aaa0a5e-91d9-46c9-bb46-a46947ddaca5
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: ad14a2bb-d39c-4bdf-93e8-32c7568c6d05
State: Peer in Cluster (Connected)


gluster peer status from the new RHS node:
[root@rhsauto049 ~]# gluster peer status
[root@rhsauto049 ~]# 
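
The mismatch between the three views can be checked mechanically. Below is a minimal sketch that parses `gluster peer status` text of the shape shown above and lists peers whose state is not "Peer in Cluster". The function names are hypothetical, and the parser assumes only the Hostname/State layout seen in this report.

```python
def parse_peer_status(text):
    """Map each peer's hostname to its state string.

    Assumes the Hostname:/Uuid:/State: layout shown in this report.
    """
    peers = {}
    host = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            peers[host] = line.split(":", 1)[1].strip()
            host = None
    return peers


def pending_peers(peers):
    """Return hostnames whose state is anything other than 'Peer in Cluster'."""
    return sorted(h for h, s in peers.items() if not s.startswith("Peer in Cluster"))


# Excerpt of the output from the node where the probe was issued.
SAMPLE = """\
Number of Peers: 4

Hostname: 10.70.37.215
Uuid: 77f03019-30a1-4e81-b8df-6613159c8890
State: Peer in Cluster (Connected)

Hostname: rhsauto049.lab.eng.blr.redhat.com
Uuid: 821e3f6f-5438-41fb-8a5d-f060704d0e8a
State: Probe Sent to Peer (Connected)
"""

if __name__ == "__main__":
    print(pending_peers(parse_peer_status(SAMPLE)))
    # ['rhsauto049.lab.eng.blr.redhat.com']
```

Run against the output of each node, this would show the new node stuck in "Probe Sent to Peer" on the probing node and absent from the other nodes' lists.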

glusterd logs on the new node:
[2014-06-04 10:58:27.655442] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.655567] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.655596] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.655621] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed
[2014-06-04 10:58:27.658359] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-06-04 10:58:27.658409] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.658501] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.658550] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.658583] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed
[2014-06-04 10:58:27.659642] I [socket.c:2239:socket_event_handler] 0-transport: disconnecting now
[2014-06-04 10:58:27.659699] I [glusterd-handler.c:1314:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2014-06-04 10:58:27.659838] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2014-06-04 10:58:27.659868] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)
[2014-06-04 10:58:27.659892] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed
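
For longer logs, the repeating pattern above is easier to see when the error lines are tallied. The following is a rough sketch that splits lines of the form shown in this excerpt into timestamp, level, source location, and message, then counts `E` entries per function. The regular expression is inferred from the lines in this report only and is an assumption, not an official description of the glusterd log format.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this report (an assumption).
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>[^:]*): (?P<msg>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def error_counts(lines):
    """Count level-E entries per reporting function."""
    counts = Counter()
    for line in lines:
        entry = parse_log_line(line)
        if entry and entry["level"] == "E":
            counts[entry["func"]] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE_LINES = [
    "[2014-06-04 10:58:27.655567] I [socket.c:3148:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)",
    "[2014-06-04 10:58:27.655596] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 3) to rpc-transport (socket.management)",
    "[2014-06-04 10:58:27.655621] E [glusterd-utils.c:410:glusterd_submit_reply] 0-: Reply submission failed",
]

if __name__ == "__main__":
    for func, n in sorted(error_counts(SAMPLE_LINES).items()):
        print(func, n)
```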

Expected results:

The peer probe should succeed, and peer status should be the same on all nodes; currently it is not.

Additional info:

Comment 1 Saurabh 2014-06-04 11:31:15 UTC

Created attachment 902157 [details]
sosreport from where peer probe was executed

Comment 5 Atin Mukherjee 2015-04-06 09:32:52 UTC

According to the logs, the network appears to have been flaky around 10:38, which caused the peer probe command to bail out:

[2014-06-04 10:38:26.033906] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(GLUSTERD-DUMP) op(DUMP(1)) xid = 0x1 sent = 2014-06-04 10:28:25.894138. timeout = 600 for 10.70.37.62:24007
[2014-06-04 10:38:26.034024] E [glusterd-handshake.c:1650:__glusterd_peer_dump_version_cbk] 0-: Error through RPC layer, retry again later
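
The timing in the call_bail line can be verified with a little arithmetic: the frame was sent at 10:28:25.894138 with a 600-second timeout, so it expired at 10:38:25.894138, a fraction of a second before the bail was logged. A small sketch using only the values quoted in that log line (the variable names are illustrative):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the call_bail log line above.
sent = datetime.strptime("2014-06-04 10:28:25.894138", FMT)
logged = datetime.strptime("2014-06-04 10:38:26.033906", FMT)
timeout = timedelta(seconds=600)

# The frame expires once the timeout has elapsed since it was sent.
deadline = sent + timeout
# How long after the deadline the bail-out was actually logged.
lag = logged - deadline

if __name__ == "__main__":
    print(deadline)             # 2014-06-04 10:38:25.894138
    print(lag.total_seconds())  # 0.139768
```

In other words, the GLUSTERD-DUMP request went unanswered for the full ten minutes before the frame was bailed out.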

This doesn't look like an issue at the application layer. Could you retest and confirm the behaviour? Please close this bug, and re-open it if the problem persists.

Comment 7 Vivek Agarwal 2015-04-20 05:43:01 UTC

Per discussion with Atin, this works. Please reopen if you see this again.

Comment 8 Red Hat Bugzilla 2023-09-14 02:09:17 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
