Red Hat Bugzilla – Bug 1279681
[Tier]: Peer probe happened with nodes having same volume name with diff vol type.
Last modified: 2018-01-29 16:28:16 EST
Description of problem:
A peer probe succeeds between RHGS nodes that have volumes with the same name, where one node's volume is a normal volume and the other node's volume is a tiered volume.
After the probe succeeds, the normal volume's brick processes are killed and only the tiered volume is displayed in the cluster.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Have two nodes (node-1 and node-2) running RHGS 3.1.2 (glusterfs-3.7.5-5). // do NOT create the cluster yet
2. Create a normal volume named Dis on node-1 (e.g. a distributed volume).
3. Create a tiered volume named Dis on node-2.
4. Make a note of the brick PIDs on both nodes.
5. Peer probe node-2 from node-1. // the probe succeeds
6. Check the volume status on both nodes. // the tiered volume is shown on both nodes
7. Re-check the brick PIDs noted in step 4. // only the tiered volume's brick PIDs remain
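The steps above can be sketched as a gluster CLI transcript. This is an illustrative sketch, not from the original report: hostnames, brick paths, and the single-brick layout are assumptions, and the exact attach-tier syntax may vary by release.

```shell
# On node-1: create and start a plain distributed volume named Dis.
# (brick path is a placeholder)
gluster volume create Dis node-1:/bricks/brick1/dis
gluster volume start Dis

# On node-2: create a volume with the SAME name, then attach a tier,
# turning it into a tiered volume.
gluster volume create Dis node-2:/bricks/brick1/dis
gluster volume start Dis
gluster volume attach-tier Dis node-2:/bricks/hot1/dis

# Note the brick PIDs on each node before probing.
gluster volume status Dis

# From node-1, probe node-2: the probe succeeds, and node-1's copy of
# Dis ends up overwritten by node-2's tiered definition.
gluster peer probe node-2
gluster volume status Dis
```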
Actual results:
Peer probe succeeds between nodes having volumes with the same name, and the normal volume is overwritten.
Expected results:
Peer probe should not succeed between nodes having volumes with the same name.
To create the cluster you can peer probe from either node; it is not the case that, as in step 5 of the reproduction steps, the probe must be issued from node-1 only.
The difference between the distributed volume and the tiered volume here is their version counter. Since tiered volume creation is a multi-step process (create volume + attach tier), its version is always at least one higher than that of a freshly created normal volume. In this case glusterd_compare_friend_volume() detects a version mismatch and initiates an update request from the other peer, which results in the distributed volume being overwritten by the tiered volume. This could just as well happen between two distributed volumes whose versions differ.
This bug was caught as part of negative testing; since probing a node that has a volume with the same name configured is not recommended, I am lowering the priority and severity.
This is a bug from Day 1 and the fix is not straightforward. I'd suggest deferring it from 3.1.2.
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
This situation is avoided by the addition of the following patches:
master : http://review.gluster.org/#/c/12864/
3.7 : http://review.gluster.org/#/c/12888/
Peer probing a node that already has an existing volume is now rejected, and since we cannot create two volumes with the same name within a cluster, the situation above can no longer be reached. The patch will get into 3.1.3 during rebase.
The way peer probe can be done has been changed: earlier, a probe could be issued between two clusters, and this created the issue.
The above-mentioned patches prevent probing from one cluster into another cluster.
Now a peer probe can only be done from a cluster to a node that is not part of any other cluster. Doing so eliminates the chance of ending up with two volumes of the same name, since the issue was caused by probing a node that had a volume with the same name as a volume on the probing cluster.
The patch landed in 3.1.3, starting from tag 3.7.9-1.
As the issue is resolved, I'm closing the bug. Please feel free to reopen it if the problem persists after following the prevention methods mentioned above.
(In reply to hari gowtham from comment #9)
> [...]
> As the issue is resolved. i'm closing the bug. please feel free to open it
> if the problem exists after following the mentioned methods for prevention.
I do not believe that this BZ can be CLOSED CURRENTRELEASE while the reported issue is still reproducible on the current released version, which is RHGS 3.1.2, not RHGS 3.1.3.
The fix could be verified by QE either during the release cycle of RHGS 3.1.3, or post the GA of RHGS 3.1.3 since the BZ is not part of the approved list of bugs for the release.
Most importantly, the BZ needs to remain OPEN at least till the GA of RHGS 3.1.3 before it may be closed.
Moving to MODIFIED.
Patch available downstream as commit 8a9a532.