Description of problem:
=======================
If the cold tier is Distributed-Disperse (2 x (4+2)) and the hot tier is Distributed-Replicate (2 x 2), the total number of subvolumes in the system is 4. However, only 3 lock files are created under shared storage, so only 3 bricks (one from each of 3 subvolumes) acquire the lock and participate in syncing; the remaining subvolume never participates in syncing. If both the cold and hot tiers are Distributed-Replicate (2 x 2), then 4 lock files are created and all 4 subvolumes participate in syncing.

I suspect an issue with the xml output generation for the volume:

A) If the cold tier is Distributed-Disperse and the hot tier is Distributed-Replicate, the xml output wrongly shows the hot tier as Replicate:

Example:
<hotBrickType>Replicate</hotBrickType>
<numberOfBricks>0 x 6 = 4</numberOfBricks>

B) If both the cold and hot tiers are Distributed-Replicate, the xml output correctly shows the hot tier as Distributed-Replicate:

Example:
<hotBrickType>Distributed-Replicate</hotBrickType>
<numberOfBricks>2 x 2 = 4</numberOfBricks>

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-10.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create master and slave clusters
2. Create the master volume (cold tier Distributed-Disperse, hot tier Distributed-Replicate)
3. Create the slave volume (Distributed-Replicate)
4. Create and start a geo-rep session between master and slave

Actual results:
===============
Only a brick from one subvolume in the hot tier becomes ACTIVE

Expected results:
=================
One brick from each subvolume in the hot tier should become ACTIVE
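To illustrate why the bad xml leads to the wrong lock-file count, here is a minimal sketch (not geo-rep's actual code; `subvol_count` is a hypothetical helper) of how a consumer deriving the subvolume count from the `<hotBrickType>` and `<numberOfBricks>` fields would miscount when the hot tier is reported as plain "Replicate":

```python
import re

def subvol_count(brick_type, number_of_bricks):
    # Hypothetical approximation: a non-distributed type implies a
    # single subvolume, otherwise the distribute count is the first
    # factor of the "N x M = K" string.
    if "Distributed" not in brick_type:
        return 1
    m = re.match(r"(\d+)\s*x", number_of_bricks)
    return int(m.group(1)) if m else 1

# Case A from this report: hot tier mislabelled as "Replicate",
# so it is counted as one subvolume instead of two.
print(subvol_count("Replicate", "0 x 6 = 4"))
# Case B: correctly labelled, counted as two subvolumes.
print(subvol_count("Distributed-Replicate", "2 x 2 = 4"))
```

Under this reading, the cold tier contributes 2 lock files and the mislabelled hot tier only 1, matching the 3 lock files observed instead of the expected 4.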
upstream patch available: http://review.gluster.org/#/c/12982/
Two patches, one for the cli xml and the other for geo-rep, are needed to fix this issue. Below are the upstream patches posted:
1. cli xml: http://review.gluster.org/12982
2. Geo-rep: http://review.gluster.org/12994
cli/xml downstream patch: https://code.engineering.redhat.com/gerrit/64279
A 3rd geo-rep patch was required.
3. Upstream patch: http://review.gluster.org/#/c/13062/

Downstream geo-rep patches:
1. https://code.engineering.redhat.com/gerrit/#/c/64378/
2. https://code.engineering.redhat.com/gerrit/#/c/64379/
Downstream Geo-rep patches merged.
Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64

The --xml info of the volume correctly shows the hot tier as Distributed-Replicate:

volume info:
============
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6

volume info --xml:
==================
<hotBrickType>Distributed-Replicate</hotBrickType>
<hotreplicaCount>2</hotreplicaCount>
<hotbrickCount>6</hotbrickCount>
<numberOfBricks>3 x 2 = 6</numberOfBricks>

Geo-rep creates the correct number of lock files for the hot tier under /var/run/gluster/shared_storage/geo-rep/:

[root@dhcp37-165 scripts]# ls /var/run/gluster/shared_storage/geo-rep/
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_3.lock
[root@dhcp37-165 scripts]#

Geo-rep shows the correct number of ACTIVE workers, and the initial sync succeeded. Moving this bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html