Bug 1235540 - peer probe results in Peer Rejected(Connected)
Summary: peer probe results in Peer Rejected(Connected)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: rjoseph
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1202842 1235751 1236019
 
Reported: 2015-06-25 06:47 UTC by Saurabh
Modified: 2016-09-17 14:38 UTC
CC List: 13 users

Fixed In Version: glusterfs-3.7.1-6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1235751
Environment:
Last Closed: 2015-07-29 05:07:33 UTC
Embargoed:


Attachments
sosreport of nfs11 (7.74 MB, application/x-xz), attached 2015-06-25 07:15 UTC by Saurabh
sosreport of nfs15 (6.57 MB, application/x-xz), attached 2015-06-25 07:21 UTC by Saurabh


Links
Red Hat Product Errata RHSA-2015:1495 (normal, SHIPPED_LIVE): Important: Red Hat Gluster Storage 3.1 update. Last updated 2015-07-29 08:26:26 UTC.

Description Saurabh 2015-06-25 06:47:30 UTC
Description of problem:
I have a cluster of 4 nodes and tried to add one more node to the existing cluster.
The probe left the new peer in the Peer Rejected (Connected) state.
glusterfsd.log contains the following:

[2015-06-25 11:51:24.197134] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.39

I could not work out why the cksum differs only for volume gluster_shared_storage, while for the other existing volume, vol2, it is the same.
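
For reference, the checksum that glusterd compares here is persisted per volume in its on-disk store. Assuming the default glusterd working directory /var/lib/glusterd, a quick manual check of the mismatch would be to run the following on the existing node and on the rejected peer and compare the output (the values should normally match):

# default glusterd store location assumed; prints the stored volume checksums
cat /var/lib/glusterd/vols/gluster_shared_storage/cksum
cat /var/lib/glusterd/vols/vol2/cksum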


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. create the meta volume gluster_shared_storage, start it, and mount it on all nodes with the native client
2. create another volume called vol2 and start it
3. gluster peer probe <ip of new node> (a command-level sketch of these steps follows)
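
A command-level sketch of the reproduction steps; the hostnames, brick and mount paths, and the replica count below are illustrative placeholders, not the exact values used in this report:

# 1. create the meta volume, start it, and mount it on every node with the
#    native client (placeholder hostnames and brick paths)
gluster volume create gluster_shared_storage replica 3 \
    node1:/bricks/meta node2:/bricks/meta node3:/bricks/meta
gluster volume start gluster_shared_storage
mount -t glusterfs node1:/gluster_shared_storage /mnt/shared_storage   # repeat on each node

# 2. create and start a second volume
gluster volume create vol2 node1:/bricks/vol2 node2:/bricks/vol2
gluster volume start vol2

# 3. probe the new node from an existing cluster member, then check its state
gluster peer probe 10.70.46.39
gluster peer status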

Actual results:
From the existing node, where the peer probe command was executed:
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.39
Uuid: 745b58ba-c963-4004-93fe-5ada9b39d107
State: Peer Rejected (Connected)


From the new node, which attempted to join the cluster:
[root@nfs15 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


Expected results:
The peer probe should be successful, and the cksum should be the same for each existing volume on all nodes, including the new one.

Additional info:

Comment 2 Saurabh 2015-06-25 07:15:50 UTC
Created attachment 1042978 [details]
sosreport of nfs11

Comment 3 Saurabh 2015-06-25 07:21:35 UTC
Created attachment 1042980 [details]
sosreport of nfs15

Comment 4 Saurabh 2015-06-25 10:33:30 UTC
I tried to reproduce the issue with a new VM that has the latest ISO installed, and the issue occurs with this one as well:

[root@nfs11 ~]# gluster peer probe 10.70.46.22
peer probe: success. 
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.22
Uuid: 39aea6ea-602f-472c-8a00-e72d253d04d6
State: Peer Rejected (Connected)




[root@nfs16 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


[2015-06-25 10:24:00.818488] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.22
[2015-06-25 10:24:00.818626] I [glusterd-handler.c:3719:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.22 (0), ret: 0
[2015-06-25 10:24:03.543396] I [glusterd-handler.c:1395:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Comment 5 rjoseph 2015-06-25 15:29:32 UTC
Initial RCA:

NFS-Ganesha is started with the following command:
gluster nfs-ganesha enable

This command disables Gluster NFS, so "nfs.disable" is set to "on" in volinfo for all the volumes in the cluster. However, the option is only set in the in-memory volinfo and is not persisted to the store. Because of this, during the handshake the new node receives "nfs.disable" as "on", while the current node does not have this data present in its store, leading to a mismatched cksum.

I will investigate further and send a patch soon.
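
On an affected setup, one way to observe this discrepancy directly (assuming the default glusterd store under /var/lib/glusterd) would be to compare the persisted volume info on the two peers:

# run on an existing node and on the newly probed node; per the RCA above, the
# option is expected to be absent on the former and present (nfs.disable=on) on
# the latter, which is what makes the computed cksum values differ
grep nfs.disable /var/lib/glusterd/vols/gluster_shared_storage/info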

Comment 7 rjoseph 2015-06-26 13:33:33 UTC
Upstream master: http://review.gluster.org/11412/
Upstream release 3.7: http://review.gluster.org/11428
RHGS 3.1: https://code.engineering.redhat.com/gerrit/51703/

Comment 9 Saurabh 2015-07-17 13:26:20 UTC
Executed a peer probe to a new VM, and it worked fine.

Comment 10 errata-xmlrpc 2015-07-29 05:07:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

