Bug 1235540

Summary: peer probe results in Peer Rejected(Connected)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: core
Assignee: rjoseph
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: urgent
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: amainkar, annair, ansubram, asrivast, kkeithle, mmadhusu, mzywusko, ndevos, rhs-bugs, rjoseph, skoduri, storage-qa-internal, vagarwal
Target Milestone: ---
Target Release: RHGS 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.1-6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1235751
Environment:
Last Closed: 2015-07-29 05:07:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1202842, 1235751, 1236019
Attachments:
sosreport of nfs11 (flags: none)
sosreport of nfs15 (flags: none)

Description Saurabh 2015-06-25 06:47:30 UTC
Description of problem:
I have a cluster of 4 nodes and tried to add one more node to the existing cluster.
The addition resulted in a Peer Rejected (Connected) state.
I found that glusterfsd.log mentions:

[2015-06-25 11:51:24.197134] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.39

Could not make out why the cksum differs only for the volume gluster_shared_storage, while for the other existing volume, vol2, it is the same.
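
To narrow this down, the checksum that glusterd persists per volume can be compared across peers. A minimal diagnostic sketch, assuming the default glusterd working directory /var/lib/glusterd and passwordless SSH to the rejected peer (the 10.70.46.39 address is taken from the log above):

# Compare the stored volume checksum on this node and on the rejected peer.
for vol in gluster_shared_storage vol2; do
    echo "== $vol =="
    echo -n "local : "; cat /var/lib/glusterd/vols/$vol/cksum
    echo -n "remote: "; ssh root@10.70.46.39 cat /var/lib/glusterd/vols/$vol/cksum
done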


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create the meta volume gluster_shared_storage, start it, and mount it on all nodes with the native client.
2. Create another volume called vol2 and start it.
3. gluster peer probe <ip of new node> (a scripted sketch of these steps follows the list).
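
A scripted sketch of the steps above, assuming four existing nodes with placeholder hostnames node1..node4, placeholder brick paths under /bricks, and /mnt/shared as the mount point; none of these names come from this report:

# Step 1: create and start the meta volume, then mount it on every node
# with the native (FUSE) client.
gluster volume create gluster_shared_storage replica 3 \
    node1:/bricks/shared node2:/bricks/shared node3:/bricks/shared force
gluster volume start gluster_shared_storage
for host in node1 node2 node3 node4; do
    ssh root@$host "mkdir -p /mnt/shared && mount -t glusterfs node1:/gluster_shared_storage /mnt/shared"
done

# Step 2: create and start a second, plain distribute volume.
gluster volume create vol2 node1:/bricks/vol2 node2:/bricks/vol2 force
gluster volume start vol2

# Step 3: probe the new node, then check how the peer state settles.
gluster peer probe <ip of new node>
gluster peer status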

Actual results:
From the existing node, where the peer probe command was executed:
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.39
Uuid: 745b58ba-c963-4004-93fe-5ada9b39d107
State: Peer Rejected (Connected)


From the new node, which attempted to join the cluster:
[root@nfs15 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


Expected results:
The peer probe should be successful, and the cksum should be the same for the existing volumes on all nodes, including the new one.

Additional info:

Comment 2 Saurabh 2015-06-25 07:15:50 UTC
Created attachment 1042978 [details]
sosreport of nfs11

Comment 3 Saurabh 2015-06-25 07:21:35 UTC
Created attachment 1042980 [details]
sosreport of nfs15

Comment 4 Saurabh 2015-06-25 10:33:30 UTC
I tried to reproduce the issue with a new VM that has the latest ISO installed, and the issue happens with this one as well:

[root@nfs11 ~]# gluster peer probe 10.70.46.22
peer probe: success. 
[root@nfs11 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.46.27
Uuid: 238ffb63-3548-43c3-a527-ed53aa8f188c
State: Peer in Cluster (Connected)

Hostname: 10.70.46.25
Uuid: 99ddf436-01d1-4c62-8b21-096b1a08a6de
State: Peer in Cluster (Connected)

Hostname: 10.70.46.29
Uuid: d05e8c04-9142-406a-9374-51b478ced7e5
State: Peer in Cluster (Connected)

Hostname: 10.70.46.22
Uuid: 39aea6ea-602f-472c-8a00-e72d253d04d6
State: Peer Rejected (Connected)




[root@nfs16 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.46.8
Uuid: dea164a8-d55c-409e-92a5-960fd1dcf7d5
State: Peer Rejected (Connected)


[2015-06-25 10:24:00.818488] E [MSGID: 106010] [glusterd-utils.c:2649:glusterd_compare_friend_volume] 0-management: Version of Cksums gluster_shared_storage differ. local cksum = 1662818965, remote cksum = 2342825121 on peer 10.70.46.22
[2015-06-25 10:24:00.818626] I [glusterd-handler.c:3719:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.70.46.22 (0), ret: 0
[2015-06-25 10:24:03.543396] I [glusterd-handler.c:1395:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

Comment 5 rjoseph 2015-06-25 15:29:32 UTC
Initial RCA:

NFS-Ganesha is enabled with the command below:

gluster nfs-ganesha enable

This command disables Gluster NFS, so "nfs.disable" is set to "on" in the volinfo of every volume in the cluster. The option is set in the volinfo but not persisted in the store. Because of this, during the handshake the new node gets "nfs.disable" as "on", while the current node does not have this data present in its store, leading to a mismatched cksum.
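
The gap described above can be inspected by comparing the in-memory volinfo with what is persisted in the store. A minimal sketch, assuming the default store path /var/lib/glusterd; "gluster volume info" reflects the running volinfo, while the info file is what the stored cksum is derived from:

# In-memory volinfo: nfs.disable shows up here after "gluster nfs-ganesha enable".
gluster volume info gluster_shared_storage | grep -i nfs.disable

# Persisted store on the same node: the option is missing from the info file,
# so the checksum kept alongside it differs from the one the freshly probed peer computes.
grep -i nfs.disable /var/lib/glusterd/vols/gluster_shared_storage/info
cat /var/lib/glusterd/vols/gluster_shared_storage/cksum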

I will investigate further and send a patch soon.

Comment 7 rjoseph 2015-06-26 13:33:33 UTC
Upstream master: http://review.gluster.org/11412/
Upstream release 3.7: http://review.gluster.org/11428
RHGS 3.1: https://code.engineering.redhat.com/gerrit/51703/

Comment 9 Saurabh 2015-07-17 13:26:20 UTC
Executed a peer probe to a new VM and it worked fine.

Comment 10 errata-xmlrpc 2015-07-29 05:07:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html