Bug 1291195
| Summary: | [georep+tiering]: Geo-replication sync is broken if cold tier is EC | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> | |
| Component: | geo-replication | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> | |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | rhgs-3.1 | CC: | annair, asrivast, avishwan, byarlaga, chrisw, csaba, khiremat, nlevinki, sankarshan | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | RHGS 3.1.2 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.7.5-13 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1292084 (view as bug list) | Environment: | ||
| Last Closed: | 2016-03-01 06:02:54 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1292084 | |||
Upstream patch available: http://review.gluster.org/#/c/12982/

Two patches, one from the CLI XML side and one from geo-rep, are needed to fix this issue. Below are the upstream patches posted:
1. cli xml: http://review.gluster.org/12982
2. Geo-rep: http://review.gluster.org/12994

cli/xml downstream patch: https://code.engineering.redhat.com/gerrit/64279

A third geo-rep patch was required:
3. Upstream patch: http://review.gluster.org/#/c/13062/

Downstream geo-rep patches:
1. https://code.engineering.redhat.com/gerrit/#/c/64378/
2. https://code.engineering.redhat.com/gerrit/#/c/64379/

Downstream geo-rep patches merged.

Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64
The --xml info of the volume correctly shows the hot tier as Distributed-Replicate:
volume info:
============
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
volume info --xml:
==================
<hotBrickType>Distributed-Replicate</hotBrickType>
<hotreplicaCount>2</hotreplicaCount>
<hotbrickCount>6</hotbrickCount>
<numberOfBricks>3 x 2 = 6</numberOfBricks>
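One way to sanity-check these fields programmatically is to parse the xml output. The following is a minimal sketch using a hand-built fragment containing only the tags shown above; the real gluster volume info --xml output nests them inside a larger cliOutput document, so the exact structure here is an assumption for illustration.

```python
# Sketch: extract the hot tier fields from a volume-info xml fragment.
# The fragment below is hand-built from the tags shown in this comment;
# real `gluster volume info --xml` output wraps them in more structure.
import xml.etree.ElementTree as ET

sample = """<volume>
  <hotBrickType>Distributed-Replicate</hotBrickType>
  <hotreplicaCount>2</hotreplicaCount>
  <hotbrickCount>6</hotbrickCount>
  <numberOfBricks>3 x 2 = 6</numberOfBricks>
</volume>"""

root = ET.fromstring(sample)
hot_type = root.findtext("hotBrickType")
replica = int(root.findtext("hotreplicaCount"))
bricks = int(root.findtext("hotbrickCount"))

# For a Distributed-Replicate hot tier, subvolumes = bricks / replica count,
# which should match the "3 x 2 = 6" reported in numberOfBricks.
subvols = bricks // replica
print(hot_type, subvols)  # Distributed-Replicate 3
```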
Geo-Rep creates correct number of lock files for HT under /var/run/gluster/shared_storage/geo-rep/
[root@dhcp37-165 scripts]# ls /var/run/gluster/shared_storage/geo-rep/
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_3.lock
[root@dhcp37-165 scripts]#
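The invariant verified by the listing above is one lock file per subvolume. The file names below are taken verbatim from that listing; the naming pattern {master_uuid}_{slave_uuid}_subvol_{tier}_{index}.lock is inferred from them, not from a documented interface.

```python
# Count lock files per tier from the listing above: 2 cold + 3 hot = 5,
# one per subvolume. The name pattern is inferred from the listing.
import re
from collections import Counter

lock_files = [
    "00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_1.lock",
    "00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_2.lock",
    "00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_1.lock",
    "00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_2.lock",
    "00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_3.lock",
]

pat = re.compile(r"_subvol_(cold|hot)_(\d+)\.lock$")
tiers = Counter(pat.search(name).group(1) for name in lock_files)

print(dict(tiers))  # {'cold': 2, 'hot': 3}
```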
Geo-Rep shows the correct number of ACTIVE workers, and the initial sync was successful. Moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
Description of problem:
=======================
If the cold tier is Distributed-Disperse (2 x (4 + 2)) and the hot tier is Distributed-Replicate (2 x 2), then the total number of subvolumes in the system is 4. But only 3 lock files are created under shared storage, so only 3 bricks from 3 subvolumes acquire the lock and participate in syncing, while the remaining subvolume never participates.

If both the cold and hot tiers are Distributed-Replicate (2 x 2), then 4 lock files are created and all 4 subvolumes participate in syncing.

I suspect an issue with the xml output generation for the volume:

A) If the cold tier is Distributed-Disperse and the hot tier is Distributed-Replicate, the xml output wrongly shows the hot tier as Replicate:

Example:
<hotBrickType>Replicate</hotBrickType>
<numberOfBricks>0 x 6 = 4</numberOfBricks>

B) If the cold tier and hot tier are both Distributed-Replicate, the xml output correctly shows the hot tier as Distributed-Replicate:

Example:
<hotBrickType>Distributed-Replicate</hotBrickType>
<numberOfBricks>2 x 2 = 4</numberOfBricks>

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-10.el7rhgs.x86_64

How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create master and slave clusters
2. Create the master volume (cold tier as Distributed-Disperse and hot tier as Distributed-Replicate)
3. Create the slave volume (Distributed-Replicate)
4. Create and start a geo-rep session between master and slave

Actual results:
===============
Only a brick from one subvolume in the hot tier becomes ACTIVE

Expected results:
=================
One brick from each subvolume in the hot tier should become ACTIVE
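The arithmetic behind the missing ACTIVE worker can be sketched as below. This is a simplified model under assumptions, not the actual geo-rep code: it assumes geo-rep creates one lock file per subvolume and derives the subvolume count from the brick type reported in the xml output.

```python
# Hypothetical model (not the real geo-rep implementation): one lock file
# per subvolume, with the subvolume count derived from the reported type.
def subvol_count(brick_type, dist_count):
    """Distributed-* types have dist_count subvolumes; plain types have 1."""
    return dist_count if brick_type.startswith("Distributed") else 1

# Cold tier: Distributed-Disperse 2 x (4 + 2) -> 2 subvolumes.
cold = subvol_count("Distributed-Disperse", 2)

# Hot tier is really Distributed-Replicate 2 x 2, but the buggy xml
# reports it as plain "Replicate", collapsing it to a single subvolume.
hot_buggy = subvol_count("Replicate", 2)
hot_fixed = subvol_count("Distributed-Replicate", 2)

print(cold + hot_buggy)  # 3 lock files: one subvolume never syncs
print(cold + hot_fixed)  # 4 lock files: all subvolumes participate
```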