Bug 1235540

Summary: peer probe results in Peer Rejected(Connected)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: core
Assignee: rjoseph
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: urgent
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: amainkar, annair, ansubram, asrivast, kkeithle, mmadhusu, mzywusko, ndevos, rhs-bugs, rjoseph, skoduri, storage-qa-internal, vagarwal
Target Milestone: ---
Target Release: RHGS 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.1-6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1235751
Environment:
Last Closed: 2015-07-29 05:07:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1202842, 1235751, 1236019
Attachments:
sosreport of nfs11 (flags: none)
sosreport of nfs15 (flags: none)

Description Saurabh 2015-06-25 06:47:30 UTC
Description of problem:
I have a cluster of 4 nodes and tried to add one more node to the existing cluster.
The addition resulted in a Peer Rejected (Connected) state.
I found that glusterfsd.log mentions:

[2015-06-25 11:51:24.197134] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.39

Could not make out why the cksum differs only for the volume gluster_shared_storage, while for the other existing volume, vol2, it is the same.
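
To narrow this down, the checksum that glusterd persists per volume can be compared across peers. A minimal diagnostic sketch, assuming the default glusterd working directory /var/lib/glusterd and passwordless SSH to the rejected peer (the 10.70.46.39 address is taken from the log above):

# Compare the stored volume checksum on this node and on the rejected peer.
for vol in gluster_shared_storage vol2; do
    echo "== $vol =="
    echo -n "local : "; cat /var/lib/glusterd/vols/$vol/cksum
    echo -n "remote: "; ssh root@10.70.46.39 cat /var/lib/glusterd/vols/$vol/cksum
done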


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create the meta volume gluster_shared_storage, start it, and mount it on all nodes with the native client.
2. Create another volume called vol2 and start it.
3. gluster peer probe <ip of new node> (a scripted sketch of these steps follows the list).
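
A scripted sketch of the steps above, assuming four existing nodes with placeholder hostnames node1..node4, placeholder brick paths under /bricks, and /mnt/shared as the mount point; none of these names come from this report:

# Step 1: create and start the meta volume, then mount it on every node
# with the native (FUSE) client.
gluster volume create gluster_shared_storage replica 3 \
    node1:/bricks/shared node2:/bricks/shared node3:/bricks/shared force
gluster volume start gluster_shared_storage
for host in node1 node2 node3 node4; do
    ssh root@$host "mkdir -p /mnt/shared && mount -t glusterfs node1:/gluster_shared_storage /mnt/shared"
done

# Step 2: create and start a second, plain distribute volume.
gluster volume create vol2 node1:/bricks/vol2 node2:/bricks/vol2 force
gluster volume start vol2

# Step 3: probe the new node, then check how the peer state settles.
gluster peer probe <ip of new node>
gluster peer status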

Actual results:
From the existing node, where the peer probe command was executed:
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.39
Uuid: 745b58ba-c963-4004-93fe-5ada9b39d107
State: Peer Rejected (Connected)


From the new node, which attempted to join the cluster:
[root@nfs15 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


Expected results:
The peer probe should be successful, and the cksum should be the same for the existing volumes on all nodes, including the new one.

Additional info:

Comment 2 Saurabh 2015-06-25 07:15:50 UTC
Created attachment 1042978 [details]
sosreport of nfs11

Comment 3 Saurabh 2015-06-25 07:21:35 UTC
Created attachment 1042980 [details]
sosreport of nfs15

Comment 4 Saurabh 2015-06-25 10:33:30 UTC
I tried to reproduce the issue with a new VM that has the latest ISO installed, and the issue happens with this one as well:

[root@nfs11 ~]# gluster peer probe 10.70.46.22
peer probe: success. 
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.22
Uuid: 39aea6ea-602f-472c-8a00-e72d253d04d6
State: Peer Rejected (Connected)




[root@nfs16 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


[2015-06-25 10:24:00.818488] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.22
[2015-06-25 10:24:00.818626] I [glusterd-handler.c:3719:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.22 (0), ret: 0
[2015-06-25 10:24:03.543396] I [glusterd-handler.c:1395:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Comment 5 rjoseph 2015-06-25 15:29:32 UTC
Initial RCA:

NFS-Ganesha is enabled with the command below:

gluster nfs-ganesha enable

This command disables Gluster NFS, so "nfs.disable" is set to "on" in the volinfo of every volume in the cluster. The option is set in the volinfo but not persisted in the store. Because of this, during the handshake the new node gets "nfs.disable" as "on", while the current node does not have this data present in its store, leading to a mismatched cksum.
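
The gap described above can be inspected by comparing the in-memory volinfo with what is persisted in the store. A minimal sketch, assuming the default store path /var/lib/glusterd; "gluster volume info" reflects the running volinfo, while the info file is what the stored cksum is derived from:

# In-memory volinfo: nfs.disable shows up here after "gluster nfs-ganesha enable".
gluster volume info gluster_shared_storage | grep -i nfs.disable

# Persisted store on the same node: the option is missing from the info file,
# so the checksum kept alongside it differs from the one the freshly probed peer computes.
grep -i nfs.disable /var/lib/glusterd/vols/gluster_shared_storage/info
cat /var/lib/glusterd/vols/gluster_shared_storage/cksum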

I will investigate further and send a patch soon.

Comment 7 rjoseph 2015-06-26 13:33:33 UTC
Upstream master: http://review.gluster.org/11412/
Upstream release 3.7: http://review.gluster.org/11428
RHGS 3.1: https://code.engineering.redhat.com/gerrit/51703/

Comment 9 Saurabh 2015-07-17 13:26:20 UTC
Executed a peer probe to a new VM and it worked fine.

Comment 10 errata-xmlrpc 2015-07-29 05:07:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html