Bug 1234882

Summary: [geo-rep]: Feature fan-out fails with the use of meta volume config
Product: [Community] GlusterFS
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Kotresh HR <khiremat>
Assignee: Kotresh HR <khiremat>
QA Contact:
Docs Contact:
CC: bugs, chrisw, csaba, gluster-bugs, khiremat, nlevinki, rcyriac, rhinduja, storage-qa-internal
Keywords: Regression, Reopened
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1234419
Cloned To: 1234898 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:15:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1234419
Bug Blocks: 1234898

Description Kotresh HR 2015-06-23 12:47:01 UTC
+++ This bug was initially created as a clone of Bug #1234419 +++

Description of problem:
=======================

When geo-rep sessions are created from one master volume to two slave volumes (fan-out), all bricks of one of the slave sessions become PASSIVE. This happens only when the use_meta_volume config is set to true.

Slave volumes: slave1 and slave2


Creating geo-rep sessions between the master volume and the slave volumes (slave1, slave2):

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave1 create push-pem force
Creating geo-replication session between master & 10.70.46.154::slave1 has been successful
[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave2 create push-pem force
Creating geo-replication session between master & 10.70.46.154::slave2 has been successful
[root@georep1 scripts]# 

Setting use_meta_volume to true for the slave1 and slave2 sessions:

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave1 config use_meta_volume true
geo-replication config updated successfully
[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave2 config use_meta_volume true
geo-replication config updated successfully
[root@georep1 scripts]# 


Starting the geo-rep sessions for slave volumes slave1 and slave2:

[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave1 start
Starting geo-replication session between master & 10.70.46.154::slave1 has been successful
[root@georep1 scripts]#
[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave2 start
Starting geo-replication session between master & 10.70.46.154::slave2 has been successful
[root@georep1 scripts]# 

Status:
=======
[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave1 status
 
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------
georep1        master        /rhs/brick1/b1    root          10.70.46.154::slave1    10.70.46.101    Active     Changelog Crawl    2015-06-23 00:46:12          
georep1        master        /rhs/brick2/b2    root          10.70.46.154::slave1    10.70.46.101    Active     Changelog Crawl    2015-06-23 00:46:12          
georep3        master        /rhs/brick1/b1    root          10.70.46.154::slave1    10.70.46.154    Passive    N/A                N/A                          
georep3        master        /rhs/brick2/b2    root          10.70.46.154::slave1    10.70.46.154    Passive    N/A                N/A                          
georep2        master        /rhs/brick1/b1    root          10.70.46.154::slave1    10.70.46.103    Passive    N/A                N/A                          
georep2        master        /rhs/brick2/b2    root          10.70.46.154::slave1    10.70.46.103    Passive    N/A                N/A                          
[root@georep1 scripts]# 
[root@georep1 scripts]# 
[root@georep1 scripts]# gluster volume geo-replication master 10.70.46.154::slave2 status
 
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                   SLAVE NODE      STATUS     CRAWL STATUS    LAST_SYNCED          
------------------------------------------------------------------------------------------------------------------------------------------
georep1        master        /rhs/brick1/b1    root          10.70.46.154::slave2    10.70.46.101    Passive    N/A             N/A                  
georep1        master        /rhs/brick2/b2    root          10.70.46.154::slave2    10.70.46.101    Passive    N/A             N/A                  
georep3        master        /rhs/brick1/b1    root          10.70.46.154::slave2    10.70.46.154    Passive    N/A             N/A                  
georep3        master        /rhs/brick2/b2    root          10.70.46.154::slave2    10.70.46.154    Passive    N/A             N/A                  
georep2        master        /rhs/brick1/b1    root          10.70.46.154::slave2    10.70.46.103    Passive    N/A             N/A                  
georep2        master        /rhs/brick2/b2    root          10.70.46.154::slave2    10.70.46.103    Passive    N/A             N/A                  
[root@georep1 scripts]# 


The session to the second slave volume, slave2, has only Passive bricks, and hence data is never synced to slave2.

Lock files on the meta volume brick:

[root@georep1 scripts]# ls /var/run/gluster/ss_brick/geo-rep/
6f023fd5-49a5-4af7-a68a-b7071a8b9ff0_subvol_1.lock  6f023fd5-49a5-4af7-a68a-b7071a8b9ff0_subvol_2.lock
[root@georep1 scripts]# 
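
Note that both lock files above are named using only the master volume's UUID (6f023fd5-49a5-4af7-a68a-b7071a8b9ff0, matching the master volume info below) plus the replica subvolume number; nothing in the filename identifies the slave session. The following is a minimal Python sketch of the Active/Passive election under a simplified worker model (the helper names and paths are illustrative, not the actual gsyncd code), showing why the slave1 and slave2 sessions end up competing for the same two locks:

import fcntl
import os

# Lock directory from the listing above (illustrative path)
META_LOCK_DIR = "/var/run/gluster/ss_brick/geo-rep"

def lock_file_name(master_volid, subvol_num):
    # Naming observed above: only the master volume id and the subvol
    # number are used, so every slave session maps onto the same file.
    return "%s_subvol_%d.lock" % (master_volid, subvol_num)

def try_become_active(master_volid, subvol_num):
    """Return an fd if this worker won the lock (Active), else None (Passive)."""
    path = os.path.join(META_LOCK_DIR, lock_file_name(master_volid, subvol_num))
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd        # lock acquired: this worker syncs (Active)
    except OSError:
        os.close(fd)
        return None      # lock held elsewhere: this worker stays Passive

The slave1 workers start first and win the two subvolume locks, so every slave2 worker fails the flock() on the very same files and reports Passive.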



Version-Release number of selected component (if applicable):
==============================================================


How reproducible:
=================
1/1


Master:
=======

[root@georep1 scripts]# gluster volume info
 
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 102b304d-494a-40cc-84e0-3eca89b3e559
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.97:/var/run/gluster/ss_brick
Brick2: 10.70.46.93:/var/run/gluster/ss_brick
Brick3: 10.70.46.96:/var/run/gluster/ss_brick
Options Reconfigured:
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
 
Volume Name: master
Type: Distributed-Replicate
Volume ID: 6f023fd5-49a5-4af7-a68a-b7071a8b9ff0
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.46.96:/rhs/brick1/b1
Brick2: 10.70.46.97:/rhs/brick1/b1
Brick3: 10.70.46.93:/rhs/brick1/b1
Brick4: 10.70.46.96:/rhs/brick2/b2
Brick5: 10.70.46.97:/rhs/brick2/b2
Brick6: 10.70.46.93:/rhs/brick2/b2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root@georep1 scripts]# 


Slave:
======

[root@georep4 scripts]# gluster volume info
 
Volume Name: slave1
Type: Replicate
Volume ID: fc1e64c2-2028-4977-844a-678f4cc31351
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.154:/rhs/brick1/b1
Brick2: 10.70.46.101:/rhs/brick1/b1
Brick3: 10.70.46.103:/rhs/brick1/b1
Options Reconfigured:
performance.readdir-ahead: on
 
Volume Name: slave2
Type: Replicate
Volume ID: 800f46c8-2708-48e5-9256-df8dbbdc5906
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.154:/rhs/brick2/b2
Brick2: 10.70.46.101:/rhs/brick2/b2
Brick3: 10.70.46.103:/rhs/brick2/b2
Options Reconfigured:
performance.readdir-ahead: on
[root@georep4 scripts]#

Comment 1 Anand Avati 2015-06-23 13:15:29 UTC
REVIEW: http://review.gluster.org/11367 (geo-rep: Fix geo-rep fanout setup with meta volume) posted (#1) for review on master by Kotresh HR (khiremat)

Comment 2 Kotresh HR 2015-06-25 05:28:59 UTC
Patch is merged! Missed the automatic update.

COMMIT: http://review.gluster.org/11367 

geo-rep: Fix geo-rep fanout setup with meta volume
    
    Lock filename was formed with 'master volume id'
    and 'subvol number'. Hence multiple slaves try
    acquiring lock on same file and become PASSIVE
    ending up not syncing data. Using 'slave volume id'
    in lock filename will fix the issue making lock
    file unique across different slaves.
    
    BUG: 1234882
    Change-Id: Ie3590b36ed03e80d74c0cfc1290dd72122a3b4b1
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/11367
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Aravinda VK <avishwan>
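
As a concrete illustration of the fix described above, the before/after lock filename construction can be sketched as follows; the function names and exact format strings are illustrative, not the literal gsyncd change:

# Before the fix: keyed only by master volume id + subvol number,
# so the master->slave1 and master->slave2 sessions collide on one file.
def old_lock_name(master_volid, subvol_num):
    return "%s_subvol_%d.lock" % (master_volid, subvol_num)

# After the fix: the slave volume id is part of the name, so each slave
# session gets its own lock file per replica subvolume.
def new_lock_name(master_volid, slave_volid, subvol_num):
    return "%s_%s_subvol_%d.lock" % (master_volid, slave_volid, subvol_num)

With per-slave lock files, the slave2 workers no longer contend with the slave1 workers, and each slave session elects its own Active worker per replica subvolume.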

Comment 3 Nagaprasad Sathyanarayana 2015-10-25 14:47:28 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ, fixed in a GlusterFS release, has been closed. Hence closing this mainline BZ as well.

Comment 4 Niels de Vos 2016-06-16 13:15:21 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user