Red Hat Bugzilla – Bug 1283929
[GlusterD]: "Peer Rejected" not showing in all nodes of the cluster.
Last modified: 2015-11-20 05:29:10 EST
Description of problem:
If we peer probe a 3.1.2 node (ISO installed) from one node of an updated cluster (2.1.6 to 3.1.2) without bumping up the op-version, peer status shows "Peer Rejected" on the node where the peer probe command was executed, but not on the other node of the cluster.
The same issue is also observed in the scenario below:
Peer probe a node that has a volume with the same name as one in an existing two-node cluster (node-1 and node-2). "Peer Rejected" is then shown only on the node where the peer probe command was executed (node-1) and not on the other node (node-2).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Scenario 1 (upgrade without bumping up the op-version):
1. Have a two-node cluster with RHGS 2.1.6 (node-1 and node-2).
2. Update both nodes to the latest 3.1.2.
3. Peer probe a 3.1.2 node (node-3, ISO installed) from node-1. // it will fail
4. Check the peer status on all the nodes.
Scenario 2 (same volume name):
1. Have three nodes with RHGS 3.1.2 (node-1, node-2 and node-3). // don't create the cluster yet
2. Create a two-node cluster using node-1 and node-2.
3. Create a Distributed volume named "Dis" using node-1 and node-2.
4. Now create a Distributed volume on node-3 with the same name, "Dis".
5. Peer probe node-3 from node-1. // it will fail
6. Check the peer status.
Actual results:
"Peer Rejected" is not shown on one of the cluster nodes (node-2).
Expected results:
"Peer Rejected" should be shown on all the nodes of the cluster.
Additional info:
If we run peer probe again on the node whose peer status shows "Peer Rejected", it fails with the error message "peer probe: failed: node-3 is already part of another cluster".
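The inconsistency reported here can be checked mechanically by comparing the `gluster peer status` output collected on each node. The following is a minimal sketch, not part of the original report: the sample outputs, the placeholder UUIDs, and the helper names `parse_peer_status` and `inconsistent_peers` are assumptions made for illustration, and the sample for node-2 assumes that node-3 is simply not listed there.

```python
# Sketch: compare how each cluster node sees its peers.
# The sample text mimics the Hostname/Uuid/State layout of
# `gluster peer status`; UUIDs and states are illustrative only.

NODE1_OUTPUT = """\
Number of Peers: 2

Hostname: node-2
Uuid: 00000000-0000-0000-0000-000000000002
State: Peer in Cluster (Connected)

Hostname: node-3
Uuid: 00000000-0000-0000-0000-000000000003
State: Peer Rejected (Connected)
"""

NODE2_OUTPUT = """\
Number of Peers: 1

Hostname: node-1
Uuid: 00000000-0000-0000-0000-000000000001
State: Peer in Cluster (Connected)
"""


def parse_peer_status(text):
    """Return {hostname: state} parsed from peer status text."""
    peers = {}
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            peers[host] = line.split(":", 1)[1].strip()
            host = None
    return peers


def inconsistent_peers(views):
    """Given {reporting_node: {peer: state}}, return the peers whose
    state differs between reporting nodes (a peer missing from a
    node's view is recorded as "<not listed>")."""
    all_peers = set()
    for view in views.values():
        all_peers.update(view)

    result = {}
    for peer in sorted(all_peers):
        # A node never lists itself, so skip its own view.
        seen = {
            node: view.get(peer, "<not listed>")
            for node, view in views.items()
            if node != peer
        }
        if len(set(seen.values())) > 1:
            result[peer] = seen
    return result


views = {
    "node-1": parse_peer_status(NODE1_OUTPUT),
    "node-2": parse_peer_status(NODE2_OUTPUT),
}
print(inconsistent_peers(views))
```

With these sample inputs, node-3 is the only peer flagged, because node-1 reports it as rejected while node-2 does not report it at all.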
We are going to mandate bumping up the op-version as part of the upgrade process, so the first case is not a relevant or valid test; skipping the bump results in incomplete handshaking in the cluster. The second peer probe attempt fails because the peer has already been added (as an entry only) to the trusted storage pool, and peer validation checks for this. For the second case, probing a node that has a volume with the same name is *not at all* recommended and can lead to issues. Another bug, BZ 1279681, has the same symptoms, and I have left the same comment there as well.
Since the tests performed here are not supported and there is no supported way to land in this situation, I am closing this bug as won't fix. I agree that we have a bug in the code, but it is not hit in the recommended path.
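To illustrate the op-version requirement mentioned in the comment above, here is a minimal sketch of a pre-probe check. It assumes that each node's `/var/lib/glusterd/glusterd.info` contains a line of the form `operating-version=<N>`; the helper names `read_op_version` and `nodes_below` are hypothetical, and the numeric values are placeholders rather than the real op-versions of RHGS 2.1.6 or 3.1.2.

```python
# Sketch: find nodes whose op-version was not bumped after an upgrade.
# File contents and version numbers below are placeholders.

def read_op_version(info_text):
    """Return the integer operating-version from glusterd.info text,
    or None if the key is absent."""
    for raw in info_text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if sep and key == "operating-version":
            return int(value)
    return None


def nodes_below(op_versions, required):
    """Return the sorted names of nodes whose op-version is missing
    or lower than the required one."""
    return sorted(
        node
        for node, version in op_versions.items()
        if version is None or version < required
    )


# Placeholder glusterd.info contents, as if collected from each node.
SAMPLE_INFO = {
    "node-1": "UUID=00000000-0000-0000-0000-000000000001\noperating-version=100\n",
    "node-2": "UUID=00000000-0000-0000-0000-000000000002\noperating-version=100\n",
    "node-3": "UUID=00000000-0000-0000-0000-000000000003\noperating-version=200\n",
}

REQUIRED = 200  # placeholder for the op-version of the upgraded release

versions = {node: read_op_version(text) for node, text in SAMPLE_INFO.items()}
print(nodes_below(versions, REQUIRED))
```

In this sample, node-1 and node-2 stand for the upgraded nodes that still run at the old op-version, which is the state the first scenario probes from. The bump itself is normally done with `gluster volume set all cluster.op-version <N>`, following the upgrade documentation for the release in question.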