Bug 1291195

Summary: [georep+tiering]: Geo-replication sync is broken if cold tier is EC
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Priority: urgent
Docs Contact:
Version: rhgs-3.1
CC: annair, asrivast, avishwan, byarlaga, chrisw, csaba, khiremat, nlevinki, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.5-13
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1292084 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:02:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1292084    

Description Rahul Hinduja 2015-12-14 09:32:55 UTC
Description of problem:
=======================

If the cold tier is Distributed-Disperse (2 x (4+2)) and the hot tier is Distributed-Replicate (2 x 2), there are 4 subvolumes in total. However, only 3 lock files are created under shared storage, so only 3 bricks (one from each of 3 subvolumes) acquire the lock and participate in syncing, while the remaining subvolume never participates in syncing.

However, if both the cold and hot tiers are Distributed-Replicate (2 x 2), 4 lock files are created and all 4 subvolumes participate in syncing.
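
For context, each geo-rep worker tries to take an exclusive lock on a per-subvolume lock file under shared storage, and the worker that wins becomes ACTIVE for that subvolume. A minimal sketch of that election, assuming fcntl-style locking on files named like the ones listed in comment 11 (the helper name and exact logic are illustrative, not the actual gsyncd code):

# Minimal sketch of per-subvolume ACTIVE/PASSIVE election via lock files.
# Assumption: fcntl-style locks on files under shared storage; the helper
# name and exact layout are illustrative, not the actual gsyncd code.
import fcntl
import os

LOCK_DIR = "/var/run/gluster/shared_storage/geo-rep"

def try_become_active(master_uuid, slave_uuid, tier, subvol_index):
    """Return an open fd if this worker wins the lock (ACTIVE), else None."""
    name = "%s_%s_subvol_%s_%d.lock" % (master_uuid, slave_uuid, tier, subvol_index)
    fd = os.open(os.path.join(LOCK_DIR, name), os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd          # this brick's worker is ACTIVE for the subvolume
    except OSError:
        os.close(fd)
        return None        # another brick already holds the lock -> PASSIVE

If the subvolume count derived from the volume layout is wrong (3 instead of 4 here), only 3 such lock files exist and the fourth subvolume never gets an ACTIVE worker.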

I suspect an issue with the XML output generation for volume info.

A) If the cold tier is Distributed-Disperse and the hot tier is Distributed-Replicate, the XML output wrongly shows the hot tier as Replicate. Example:

            <hotBrickType>Replicate</hotBrickType>
            <numberOfBricks>0 x 6 = 4</numberOfBricks>

B) If both the cold and hot tiers are Distributed-Replicate, the XML output correctly shows the hot tier as Distributed-Replicate. Example:

            <hotBrickType>Distributed-Replicate</hotBrickType>
            <numberOfBricks>2 x 2 = 4</numberOfBricks>
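
To illustrate why the wrong <hotBrickType> matters to a consumer of this XML, here is a hedged sketch of deriving the hot-tier subvolume count from "gluster volume info <VOLNAME> --xml". The element names (hotBrickType, hotreplicaCount, hotbrickCount) follow the snippets in this report and in the verified output in comment 11; the derivation itself is illustrative and not the actual geo-rep parsing code:

# Hedged sketch: derive the hot-tier subvolume count from the volume info XML.
# Element names follow the snippets in this report; the derivation is
# illustrative and not the actual geo-rep code.
import subprocess
import xml.etree.ElementTree as ET

def hot_tier_subvols(volname):
    out = subprocess.check_output(["gluster", "volume", "info", volname, "--xml"])
    root = ET.fromstring(out)
    brick_type = root.findtext(".//hotBrickType")
    replica = int(root.findtext(".//hotreplicaCount") or 1)
    bricks = int(root.findtext(".//hotbrickCount") or 0)
    if brick_type == "Replicate":
        # A plain "Replicate" tier is a single subvolume, so a 2 x 2 hot tier
        # wrongly reported as "Replicate" (example A above) yields 1 instead of 2.
        return 1
    return bricks // replica   # e.g. 6 bricks / replica 2 = 3 subvolumes

With the buggy output in example A, any such derivation undercounts the hot-tier subvolumes, which is consistent with 3 lock files being created instead of 4.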

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-10.el7rhgs.x86_64


How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create the master and slave clusters
2. Create the master volume (cold tier Distributed-Disperse, hot tier Distributed-Replicate)
3. Create the slave volume (Distributed-Replicate)
4. Create and start a geo-rep session between the master and slave (a hedged CLI sketch of these steps follows below)
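
A minimal reproduction sketch driving the gluster CLI from Python, assuming a 2 x (4+2) cold tier and a 2 x 2 hot tier as described above; the hostnames (m1, m2, s1, s2) and brick paths are placeholders, not taken from this report, and the attach-tier form matches the 3.7-era CLI:

# Hedged reproduction sketch. Hostnames and brick paths are placeholders.
# Prerequisites omitted: passwordless SSH to s1 and shared storage enabled
# (gluster volume set all cluster.enable-shared-storage enable).
import subprocess

cmds = [
    # Master: 2 x (4+2) distributed-disperse cold tier (12 bricks).
    "gluster volume create master disperse 6 redundancy 2 "
    "m1:/bricks/c1 m2:/bricks/c2 m1:/bricks/c3 m2:/bricks/c4 m1:/bricks/c5 m2:/bricks/c6 "
    "m1:/bricks/c7 m2:/bricks/c8 m1:/bricks/c9 m2:/bricks/c10 m1:/bricks/c11 m2:/bricks/c12 force",
    "gluster volume start master",
    # Attach a 2 x 2 distributed-replicate hot tier.
    "gluster volume attach-tier master replica 2 m1:/bricks/h1 m2:/bricks/h2 m1:/bricks/h3 m2:/bricks/h4 force",
    # Slave: 2 x 2 distributed-replicate volume.
    "gluster volume create slave replica 2 s1:/bricks/r1 s2:/bricks/r2 s1:/bricks/r3 s2:/bricks/r4 force",
    "gluster volume start slave",
    # Create and start the geo-rep session.
    "gluster volume geo-replication master s1::slave create push-pem",
    "gluster volume geo-replication master s1::slave start",
]

for cmd in cmds:
    subprocess.check_call(cmd.split())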

Actual results:
===============

Only one brick from one subvolume in the hot tier becomes ACTIVE.


Expected results:
================

One brick from each subvolume in the hot tier should become ACTIVE.

Comment 6 Gaurav Kumar Garg 2015-12-16 12:44:57 UTC
upstream patch available: http://review.gluster.org/#/c/12982/

Comment 7 Kotresh HR 2015-12-17 12:36:04 UTC
Two patches, one for the CLI XML output and the other for geo-rep, are needed to fix this issue.
Below are the upstream patches posted.

1. cli xml: http://review.gluster.org/12982
2. Geo-rep: http://review.gluster.org/12994

Comment 8 Gaurav Kumar Garg 2015-12-21 11:57:23 UTC
cli/xml downstream patch: https://code.engineering.redhat.com/gerrit/64279

Comment 9 Kotresh HR 2015-12-22 15:53:55 UTC
A third geo-rep patch is required.

3. Upstream Patch: http://review.gluster.org/#/c/13062/

Downstream geo-rep patches:
1. https://code.engineering.redhat.com/gerrit/#/c/64378/
2. https://code.engineering.redhat.com/gerrit/#/c/64379/

Comment 10 Aravinda VK 2015-12-22 17:07:24 UTC
Downstream Geo-rep patches merged.

Comment 11 Rahul Hinduja 2015-12-28 05:51:19 UTC
Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64

Both volume info and volume info --xml correctly show the hot tier as Distributed-Replicate:

volume info:
============

Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6


volume info --xml:
==================

            <hotBrickType>Distributed-Replicate</hotBrickType>
            <hotreplicaCount>2</hotreplicaCount>
            <hotbrickCount>6</hotbrickCount>
            <numberOfBricks>3 x 2 = 6</numberOfBricks>


Geo-Rep creates the correct number of lock files for the hot tier under /var/run/gluster/shared_storage/geo-rep/:

[root@dhcp37-165 scripts]# ls /var/run/gluster/shared_storage/geo-rep/
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_3.lock
[root@dhcp37-165 scripts]# 

Geo-Rep shows the correct number of ACTIVE workers, and the initial sync completed successfully. Moving this bug to the verified state.

Comment 14 errata-xmlrpc 2016-03-01 06:02:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html