Bug 1291195 - [georep+tiering]: Geo-replication sync is broken if cold tier is EC
Summary: [georep+tiering]: Geo-replication sync is broken if cold tier is EC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Bug Updates Notification Mailing List
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1292084
 
Reported: 2015-12-14 09:32 UTC by Rahul Hinduja
Modified: 2016-03-01 06:02 UTC
CC: 9 users

Fixed In Version: glusterfs-3.7.5-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1292084
Environment:
Last Closed: 2016-03-01 06:02:54 UTC
Embargoed:




Links:
System ID: Red Hat Product Errata RHBA-2016:0193
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Gluster Storage 3.1 update 2
Last Updated: 2016-03-01 10:20:36 UTC

Description Rahul Hinduja 2015-12-14 09:32:55 UTC
Description of problem:
=======================

If the cold tier is Distributed-Disperse (2 x (4+2)) and the hot tier is Distributed-Replicate (2 x 2), the total number of subvolumes in the system is 4. However, only 3 lock files are created under the shared storage, so only 3 bricks (one from each of 3 subvolumes) acquire the lock and participate in syncing, while the remaining subvolume never participates in syncing.

But if both the cold and hot tiers are Distributed-Replicate (2 x 2), then 4 lock files are created and all 4 subvolumes participate in syncing.
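
Geo-rep takes one lock file per subvolume under the shared storage (one ACTIVE brick per subvolume), so the expected count for the layout above is cold + hot subvolumes = 2 + 2 = 4. A minimal sketch of the expected names, assuming the per-subvolume naming shown in the verification comment below and placeholder master/slave volume IDs:

# One lock file per subvolume is expected; <master_volid>/<slave_volid> are placeholders
for i in 1 2; do echo "<master_volid>_<slave_volid>_subvol_cold_${i}.lock"; done
for i in 1 2; do echo "<master_volid>_<slave_volid>_subvol_hot_${i}.lock"; done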

I suspect an issue with the XML output generation for the volume.

A) If the cold tier is Distributed-Disperse and the hot tier is Distributed-Replicate, the XML output wrongly shows the hot tier type as Replicate. Example:

            <hotBrickType>Replicate</hotBrickType>
            <numberOfBricks>0 x 6 = 4</numberOfBricks>

B) If both the cold and hot tiers are Distributed-Replicate, the XML output correctly shows the hot tier as Distributed-Replicate. Example:

            <hotBrickType>Distributed-Replicate</hotBrickType>
            <numberOfBricks>2 x 2 = 4</numberOfBricks>
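
For quick inspection, the hot tier fields can be pulled straight out of the CLI XML output. A minimal check, assuming a tiered volume named "mastervol" (the grep will also match the cold tier and overall brick counts):

gluster volume info mastervol --xml | grep -E 'hotBrickType|numberOfBricks'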

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-10.el7rhgs.x86_64


How reproducible:
=================
2/2

Steps to Reproduce:
===================
1. Create master and slave clusters
2. Create the master volume (cold tier Distributed-Disperse, hot tier Distributed-Replicate)
3. Create the slave volume (Distributed-Replicate)
4. Create and start a geo-rep session between master and slave (see the command sketch below)
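
A minimal command sketch for steps 2-4, assuming placeholder volume names (mastervol/slavevol), host names, and brick paths, and that passwordless SSH to the slave is already set up; the exact tier attach syntax may differ per build:

# Shared storage is needed for the geo-rep lock files discussed above
gluster volume set all cluster.enable-shared-storage enable

# Master cold tier: distributed-disperse 2 x (4+2)
gluster volume create mastervol disperse 6 redundancy 2 master{1..6}:/bricks/cold1 master{1..6}:/bricks/cold2
gluster volume start mastervol

# Master hot tier: distributed-replicate 2 x 2 attached on top of the cold tier
gluster volume tier mastervol attach replica 2 master{1..4}:/bricks/hot1

# Slave volume: distributed-replicate 2 x 2
gluster volume create slavevol replica 2 slave{1..4}:/bricks/slave1
gluster volume start slavevol

# Create and start the geo-rep session
gluster volume geo-replication mastervol slave1::slavevol create push-pem
gluster volume geo-replication mastervol slave1::slavevol start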

Actual results:
===============

Only a brick from one of the hot tier subvolumes becomes ACTIVE.


Expected results:
================

One brick from each subvolume in the hot tier should become ACTIVE.
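
A quick way to verify, assuming the volume and host names from the reproduction sketch above, is the geo-rep status output, which lists each worker as ACTIVE or PASSIVE:

gluster volume geo-replication mastervol slave1::slavevol status
# Expected: one ACTIVE worker per subvolume (2 cold + 2 hot), the rest PASSIVE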

Comment 6 Gaurav Kumar Garg 2015-12-16 12:44:57 UTC
upstream patch available: http://review.gluster.org/#/c/12982/

Comment 7 Kotresh HR 2015-12-17 12:36:04 UTC
Two patches, one for the CLI XML output and the other for geo-rep, are needed to fix this issue.
The upstream patches are posted below.

1. cli xml: http://review.gluster.org/12982
2. Geo-rep: http://review.gluster.org/12994

Comment 8 Gaurav Kumar Garg 2015-12-21 11:57:23 UTC
cli/xml downstream patch: https://code.engineering.redhat.com/gerrit/64279

Comment 9 Kotresh HR 2015-12-22 15:53:55 UTC
A third geo-rep patch is required.

3. Upstream Patch: http://review.gluster.org/#/c/13062/

Downstream geo-rep patches:
1. https://code.engineering.redhat.com/gerrit/#/c/64378/
2. https://code.engineering.redhat.com/gerrit/#/c/64379/

Comment 10 Aravinda VK 2015-12-22 17:07:24 UTC
Downstream Geo-rep patches merged.

Comment 11 Rahul Hinduja 2015-12-28 05:51:19 UTC
Verified with build: glusterfs-3.7.5-13.el7rhgs.x86_64

Both volume info and the --xml output now correctly show the hot tier as Distributed-Replicate:

volume info:
============

Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6


volume info --xml:
==================

            <hotBrickType>Distributed-Replicate</hotBrickType>
            <hotreplicaCount>2</hotreplicaCount>
            <hotbrickCount>6</hotbrickCount>
            <numberOfBricks>3 x 2 = 6</numberOfBricks>


Geo-rep creates the correct number of lock files for the hot tier under /var/run/gluster/shared_storage/geo-rep/:

[root@dhcp37-165 scripts]# ls /var/run/gluster/shared_storage/geo-rep/
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_cold_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_1.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_2.lock
00fc636b-e04b-4d55-80a6-0da14b3f78af_82e6e34b-b161-433e-8ad7-7438ed97a8e6_subvol_hot_3.lock
[root@dhcp37-165 scripts]# 
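
As a quick cross-check, the lock-file count should equal the number of subvolumes reported by volume info (2 cold + 3 hot = 5 in this setup), for example:

ls /var/run/gluster/shared_storage/geo-rep/ | grep -c '_subvol_'
# Expected output: 5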

Geo-rep shows the correct number of ACTIVE workers, and the initial sync completed successfully. Moving this bug to the verified state.

Comment 14 errata-xmlrpc 2016-03-01 06:02:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

