Description of problem:
=======================
Using a shared (meta) volume for geo-replication is recommended so that only one worker per subvolume becomes ACTIVE and participates in syncing. With the latest build, all the bricks in a subvolume become ACTIVE:

[root@dhcp37-165 ~]# gluster volume geo-replication master 10.70.37.99::slave status

MASTER NODE                          MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE                 SLAVE NODE      STATUS    CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp37-165.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b1    root          10.70.37.99::slave    10.70.37.99     Active    Changelog Crawl    2015-11-25 14:09:19
dhcp37-165.lab.eng.blr.redhat.com    master        /rhs/brick2/ct-b7    root          10.70.37.99::slave    10.70.37.99     Active    Changelog Crawl    2015-11-25 14:09:19
dhcp37-110.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b5    root          10.70.37.99::slave    10.70.37.112    Active    Changelog Crawl    2015-11-25 14:09:27
dhcp37-160.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b3    root          10.70.37.99::slave    10.70.37.162    Active    Changelog Crawl    2015-11-25 14:09:27
dhcp37-158.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b4    root          10.70.37.99::slave    10.70.37.87     Active    Changelog Crawl    2015-11-25 14:51:50
dhcp37-155.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b6    root          10.70.37.99::slave    10.70.37.88     Active    Changelog Crawl    2015-11-25 14:51:47
dhcp37-133.lab.eng.blr.redhat.com    master        /rhs/brick1/ct-b2    root          10.70.37.99::slave    10.70.37.199    Active    Changelog Crawl    2015-11-25 14:51:50
dhcp37-133.lab.eng.blr.redhat.com    master        /rhs/brick2/ct-b8    root          10.70.37.99::slave    10.70.37.199    Active    Changelog Crawl    2015-11-25 14:51:48
[root@dhcp37-165 ~]#

The per-subvolume lock files are present on the meta volume:

[root@dhcp37-165 geo-rep]# ls
cbe0236c-db59-48eb-b3eb-2e436a505e11_32530124-055f-4dd8-a7cc-d8c8ebeb91bb_subvol_1.lock
cbe0236c-db59-48eb-b3eb-2e436a505e11_32530124-055f-4dd8-a7cc-d8c8ebeb91bb_subvol_2.lock
cbe0236c-db59-48eb-b3eb-2e436a505e11_32530124-055f-4dd8-a7cc-d8c8ebeb91bb_subvol_3.lock
cbe0236c-db59-48eb-b3eb-2e436a505e11_32530124-055f-4dd8-a7cc-d8c8ebeb91bb_subvol_4.lock
[root@dhcp37-165 geo-rep]#

The meta volume is enabled for the session:

[root@dhcp37-165 syncdaemon]# gluster volume geo-replication master 10.70.37.99::slave config use_meta_volume
true
[root@dhcp37-165 syncdaemon]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-7.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create the master and slave clusters.
2. Create the master and slave volumes.
3. Enable shared storage: "gluster v set all cluster.enable-shared-storage enable"
4. Mount the master volume and create some data.
5. Create a geo-replication session between the master and slave volumes.
6. Enable the meta volume (use_meta_volume).
7. Start the geo-replication session between the master and slave.

Actual results:
===============
All bricks in a subvolume become ACTIVE.

Expected results:
=================
Only one brick from each subvolume should become ACTIVE.
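For context: each geo-replication worker decides ACTIVE vs. PASSIVE by trying to take an exclusive POSIX lock on its subvolume's lock file on the meta volume (the files listed above). A minimal standalone sketch of that election scheme, with an illustrative helper name and a /tmp path rather than the actual syncdaemon code:

    import fcntl
    import os

    def try_become_active(lock_path):
        """Return the lock fd if this worker won the election, else None."""
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            # Non-blocking exclusive lock: only one worker per subvolume
            # should succeed and become ACTIVE; the rest stay PASSIVE.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except (IOError, OSError):
            os.close(fd)
            return None

    # In practice the path is the meta-volume lock file shown in the
    # listing above; /tmp is used here so the sketch runs standalone.
    fd = try_become_active("/tmp/subvol_1.lock")
    print("Active" if fd is not None else "Passive")

The winning worker must keep its lock fd open for as long as it stays ACTIVE; if the lock is ever released while the worker still reports ACTIVE, workers on the other replicas of the subvolume can also acquire it, which matches the status output above.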
Patch sent upstream: http://review.gluster.org/#/c/12752/
The workaround to go ahead with the testing is to comment out the following lines in /usr/libexec/master.py:

471                 # Close the previously acquired lock so that
472                 # fd will not leak. Reset fd to None
473                 if gconf.mgmt_lock_fd:
474                     os.close(gconf.mgmt_lock_fd)
475                     gconf.mgmt_lock_fd = None
476
477                 # Save latest FD for future use
478                 gconf.mgmt_lock_fd = fd

484                 # When previously Active becomes Passive, Close the
485                 # fd of previously acquired lock
486                 if gconf.mgmt_lock_fd:
487                     os.close(gconf.mgmt_lock_fd)
488                     gconf.mgmt_lock_fd = None
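Those os.close() calls matter because POSIX fcntl record locks belong to the process, not to the descriptor: closing any fd a process holds on a file releases all of that process's locks on the file, even a lock taken through a different fd. So closing the old mgmt_lock_fd after re-acquiring the lock on a new fd to the same lock file silently drops the freshly acquired lock, letting other workers also go ACTIVE. A standalone demonstration of that semantics (illustrative /tmp path, not syncdaemon code):

    import fcntl
    import os

    path = "/tmp/subvol_1.lock"   # stand-in for a meta-volume lock file

    fd_old = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    fd_new = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

    # Take the "ACTIVE" lock through the new descriptor.
    fcntl.lockf(fd_new, fcntl.LOCK_EX | fcntl.LOCK_NB)

    # Closing the old descriptor releases the process's locks on the
    # file, including the one just taken via fd_new.
    os.close(fd_old)

    # At this point another process can lock the file and also
    # become ACTIVE.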
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/62769/
Verified with build: glusterfs-geo-replication-3.7.5-9.el7rhgs.x86_64

One brick from each subvolume becomes ACTIVE. Moving the bug to the Verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html