Description of problem:
On a Distributed-Disperse volume 2 x (8 + 4), I attached a Distributed-Replicate tier 4 x 2 = 8 and then ran some I/O. After that I did a detach tier, which was successful. But when I tried to attach a tier again, it failed. Even after gluster volume stop, a restart of glusterd, and starting the volume again, I face the same issue.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a Distributed-Disperse volume 2 x (8 + 4)
2. Attach tier Distributed-Replicate 4 x 2 = 8
3. Run IOs
4. Detach tier | commit
5. Clean the brick attributes related to detach tier
6. Attach tier

Actual results:
volume attach-tier: failed: Pre Validation failed... Brick may be containing or be contained by an existing brick

Expected results:
Attaching tier should be successful

Additional info:
upstream patch http://review.gluster.org/#/c/13890/
RCA: To validate new bricks, we use a variable "real_path", which is filled in for every brick on the local node. real_path is calculated when we create a new brick, and also when we restore a brick during a glusterd restart. However, if for some reason a handshake happens from a peer node because of a mismatch in data, we do not populate the variable at that point, i.e. it will be null. If real_path becomes null, then validating a new brick will fail, which means we cannot create or add a brick in the cluster.
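The following is a minimal sketch, not the glusterd code (which is written in C and not shown in this report), of how an unpopulated real_path could produce the "containing or be contained by an existing brick" error. It assumes the validation is a prefix comparison between the new brick path and each stored real_path. The names is_prefix and brick_path_available are hypothetical, and the paths are compared as plain strings without resolving symlinks.

```python
def is_prefix(parent: str, child: str) -> bool:
    """Return True if `child` equals `parent` or lies beneath it.

    Hypothetical helper: compares plain path strings and only matches
    on a path-component boundary (/a/b is not a prefix of /a/bc).
    """
    if not child.startswith(parent):
        return False
    return (
        len(child) == len(parent)
        or parent.endswith("/")
        or child[len(parent)] == "/"
    )


def brick_path_available(new_path: str, existing_real_paths: list) -> bool:
    """Assumed validation: reject a new brick that contains, or is
    contained by, any existing brick path."""
    for real_path in existing_real_paths:
        if is_prefix(real_path, new_path) or is_prefix(new_path, real_path):
            return False
    return True


# real_path populated for every brick: the new brick is accepted.
print(brick_path_available("/bricks/new", ["/bricks/b1", "/bricks/b2"]))

# One brick was imported through a handshake and its real_path was left
# empty: the empty string is a prefix of every absolute path, so every
# new brick is rejected.
print(brick_path_available("/bricks/new", ["/bricks/b1", ""]))
```

Under this assumption a single brick with an empty real_path is enough to block every later add-brick, attach-tier, or volume create on that node, which matches the symptom reported above.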
This is a regression, and hence setting the keyword.
Downstream patch https://code.engineering.redhat.com/gerrit/71498 posted for review.
Upstream master patch : http://review.gluster.org/13890
Upstream release-3.7 patch : http://review.gluster.org/13914
Downstream patch : https://code.engineering.redhat.com/gerrit/71498
Modified reproducer:
1) Create a volume on a multinode cluster.
2) Kill one node.
3) Do a configuration change (volume set commands like turning off write-behind).
4) Try to add a brick, or create a new volume.
Verified this bug using the build "glusterfs-3.7.9-2.el7r" with the steps below.

Steps:
======
1. Created a two-node cluster with one sample Distribute volume using two bricks.
2. Stopped glusterd on node2.
3. Changed the write-behind value using the volume set option on node1.
4. Started glusterd on node2.
5. Checked that the handshake had happened using the volume get option on node2.
6. Expanded the volume by adding bricks on node1.

All the steps mentioned above worked well.

Note: This issue is not always reproducible in the last nightly build (3.7.9-1) with the above steps.

The fix is working fine, changing the status to verified.
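For anyone re-running the verification, steps 2 to 6 above can be sketched as a command sequence. This is only an illustration: the volume name "testvol" and the brick "node1:/bricks/new1" are hypothetical, and using systemctl to stop and start glusterd is an assumption, since the report does not say how the daemon was stopped. The script only prints the commands with the node each one is meant to run on; it does not execute anything.

```python
def verification_commands(volume="testvol", new_brick="node1:/bricks/new1"):
    """Return (node, argv) pairs mirroring verification steps 2 to 6.

    The volume and brick names are placeholders, not values from the report.
    """
    return [
        # Step 2: take node2's glusterd down so it misses the config change.
        ("node2", ["systemctl", "stop", "glusterd"]),
        # Step 3: change a volume option while node2 is down.
        ("node1", ["gluster", "volume", "set", volume,
                   "performance.write-behind", "off"]),
        # Step 4: bring glusterd back; the peer handshake syncs the volume.
        ("node2", ["systemctl", "start", "glusterd"]),
        # Step 5: confirm node2 now sees the updated option.
        ("node2", ["gluster", "volume", "get", volume,
                   "performance.write-behind"]),
        # Step 6: expand the volume; this failed before the fix.
        ("node1", ["gluster", "volume", "add-brick", volume, new_brick]),
    ]


for node, argv in verification_commands():
    print(f"[{node}] {' '.join(argv)}")
```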
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240