Bug 1235540 - peer probe results in Peer Rejected(Connected)
Summary: peer probe results in Peer Rejected(Connected)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: rjoseph
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1202842 1235751 1236019
 
Reported: 2015-06-25 06:47 UTC by Saurabh
Modified: 2016-09-17 14:38 UTC
CC List: 13 users

Fixed In Version: glusterfs-3.7.1-6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1235751
Environment:
Last Closed: 2015-07-29 05:07:33 UTC
Embargoed:


Attachments
sosreport of nfs11 (7.74 MB, application/x-xz), attached 2015-06-25 07:15 UTC by Saurabh
sosreport of nfs15 (6.57 MB, application/x-xz), attached 2015-06-25 07:21 UTC by Saurabh


Links
Red Hat Product Errata RHSA-2015:1495 (normal, SHIPPED_LIVE): Important: Red Hat Gluster Storage 3.1 update. Last updated 2015-07-29 08:26:26 UTC.

Description Saurabh 2015-06-25 06:47:30 UTC
Description of problem:
I have a cluster of 4 nodes and tried to add one more node to the existing cluster.
The probe left the new peer in the Peer Rejected (Connected) state.
glusterfsd.log contains the following:

[2015-06-25 11:51:24.197134] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.39

I could not work out why the cksum differs only for volume gluster_shared_storage, while for the other existing volume, vol2, it is the same.
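
For reference, the checksum that glusterd compares here is persisted per volume in its on-disk store. Assuming the default glusterd working directory /var/lib/glusterd, a quick manual check of the mismatch would be to run the following on the existing node and on the rejected peer and compare the output (the values should normally match):

# default glusterd store location assumed; prints the stored volume checksums
cat /var/lib/glusterd/vols/gluster_shared_storage/cksum
cat /var/lib/glusterd/vols/vol2/cksum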


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. create the meta volume gluster_shared_storage, start it, and mount it on all nodes with the native client
2. create another volume called vol2 and start it
3. gluster peer probe <ip of new node> (a command-level sketch of these steps follows)
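
A command-level sketch of the reproduction steps; the hostnames, brick and mount paths, and the replica count below are illustrative placeholders, not the exact values used in this report:

# 1. create the meta volume, start it, and mount it on every node with the
#    native client (placeholder hostnames and brick paths)
gluster volume create gluster_shared_storage replica 3 \
    node1:/bricks/meta node2:/bricks/meta node3:/bricks/meta
gluster volume start gluster_shared_storage
mount -t glusterfs node1:/gluster_shared_storage /mnt/shared_storage   # repeat on each node

# 2. create and start a second volume
gluster volume create vol2 node1:/bricks/vol2 node2:/bricks/vol2
gluster volume start vol2

# 3. probe the new node from an existing cluster member, then check its state
gluster peer probe 10.70.46.39
gluster peer status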

Actual results:
From the existing node, where the peer probe command was executed:
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.39
Uuid: 745b58ba-c963-4004-93fe-5ada9b39d107
State: Peer Rejected (Connected)


From the new node, which attempted to join the cluster:
[root@nfs15 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


Expected results:
The peer probe should be successful, and the cksum should be the same for each existing volume on all nodes, including the new one.

Additional info:

Comment 2 Saurabh 2015-06-25 07:15:50 UTC
Created attachment 1042978 [details]
sosreport of nfs11

Comment 3 Saurabh 2015-06-25 07:21:35 UTC
Created attachment 1042980 [details]
sosreport of nfs15

Comment 4 Saurabh 2015-06-25 10:33:30 UTC
I tried to reproduce the issue with a new VM that has the latest ISO installed, and the issue occurs with this one as well:

[root@nfs11 ~]# gluster peer probe 10.70.46.22
peer probe: success. 
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.22
Uuid: 39aea6ea-602f-472c-8a00-e72d253d04d6
State: Peer Rejected (Connected)




[root@nfs16 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


[2015-06-25 10:24:00.818488] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.22
[2015-06-25 10:24:00.818626] I [glusterd-handler.c:3719:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.22 (0), ret: 0
[2015-06-25 10:24:03.543396] I [glusterd-handler.c:1395:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Comment 5 rjoseph 2015-06-25 15:29:32 UTC
Initial RCA:

NFS-Ganesha is started with the following command:
gluster nfs-ganesha enable

This command disables Gluster NFS, so "nfs.disable" is set to "on" in volinfo for all the volumes in the cluster. However, the option is only set in the in-memory volinfo and is not persisted to the store. Because of this, during the handshake the new node receives "nfs.disable" as "on", while the current node does not have this data present in its store, leading to a mismatched cksum.

I will investigate further and send a patch soon.
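
On an affected setup, one way to observe this discrepancy directly (assuming the default glusterd store under /var/lib/glusterd) would be to compare the persisted volume info on the two peers:

# run on an existing node and on the newly probed node; per the RCA above, the
# option is expected to be absent on the former and present (nfs.disable=on) on
# the latter, which is what makes the computed cksum values differ
grep nfs.disable /var/lib/glusterd/vols/gluster_shared_storage/info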

Comment 7 rjoseph 2015-06-26 13:33:33 UTC
Upstream master: http://review.gluster.org/11412/
Upstream release 3.7: http://review.gluster.org/11428
RHGS 3.1: https://code.engineering.redhat.com/gerrit/51703/

Comment 9 Saurabh 2015-07-17 13:26:20 UTC
Executed a peer probe to a new VM, and it worked fine.

Comment 10 errata-xmlrpc 2015-07-29 05:07:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

