Bug 1415166

Summary: [geo-rep]: Worker becomes faulty due to failure in "Incorrect mountbroker user directory attributes [Permission denied]"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication
Assignee: Sunny Kumar <sunkumar>
Status: CLOSED CURRENTRELEASE
QA Contact: Rahul Hinduja <rhinduja>
Severity: medium
Docs Contact:
Priority: low
Version: rhgs-3.2
CC: csaba, khiremat, rhs-bugs, sabose, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-28 10:59:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rahul Hinduja 2017-01-20 12:49:27 UTC
Description of problem:
=======================

While creating and starting a non-root geo-rep session, one of the geo-rep workers went to the Faulty state for one iteration and then came back online.

The glusterd log on the slave node to which the worker is connected shows the following errors:

[2017-01-19 15:55:32.233419] E [MSGID: 106176] [glusterd-mountbroker.c:646:glusterd_do_mount] 0-management: Incorrect mountbroker user directory attributes [Permission denied]
[2017-01-19 15:55:32.233446] W [MSGID: 106176] [glusterd-mountbroker.c:724:glusterd_do_mount] 0-management: unsuccessful mount request [Permission denied]
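
For reference, these messages can be pulled from the slave node's glusterd log directly. A minimal sketch, assuming the default glusterfs log location (not anything specific to this setup):

# On the slave node; /var/log/glusterfs/glusterd.log is the default glusterd log.
grep -E 'mountbroker|glusterd_do_mount' /var/log/glusterfs/glusterd.log | tail -n 20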

[root@dhcp37-177 ~]# gluster volume geo-replication master geoaccount123.37.121::slave status
 
MASTER NODE     MASTER VOL    MASTER BRICK       SLAVE USER       SLAVE                                SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.37.177    master        /rhs/brick1/b1     geoaccount123    geoaccount123.37.121::slave    10.70.37.68     Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.177    master        /rhs/brick2/b5     geoaccount123    geoaccount123.37.121::slave    10.70.37.68     Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.177    master        /rhs/brick3/b9     geoaccount123    geoaccount123.37.121::slave    N/A             Faulty     N/A                N/A                          
10.70.37.43     master        /rhs/brick1/b2     geoaccount123    geoaccount123.37.121::slave    10.70.37.198    Passive    N/A                N/A                          
10.70.37.43     master        /rhs/brick2/b6     geoaccount123    geoaccount123.37.121::slave    10.70.37.198    Passive    N/A                N/A                          
10.70.37.43     master        /rhs/brick3/b10    geoaccount123    geoaccount123.37.121::slave    10.70.37.198    Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.76     master        /rhs/brick1/b3     geoaccount123    geoaccount123.37.121::slave    10.70.37.121    Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.76     master        /rhs/brick2/b7     geoaccount123    geoaccount123.37.121::slave    10.70.37.121    Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.76     master        /rhs/brick3/b11    geoaccount123    geoaccount123.37.121::slave    10.70.37.121    Active     Changelog Crawl    2017-01-19 15:55:44          
10.70.37.56     master        /rhs/brick1/b4     geoaccount123    geoaccount123.37.121::slave    10.70.37.201    Passive    N/A                N/A                          
10.70.37.56     master        /rhs/brick2/b8     geoaccount123    geoaccount123.37.121::slave    10.70.37.201    Passive    N/A                N/A                          
10.70.37.56     master        /rhs/brick3/b12    geoaccount123    geoaccount123.37.121::slave    10.70.37.201    Passive    N/A                N/A                          
[root@dhcp37-177 ~]# 

However, the worker then comes back online and becomes Passive. The permissions on the slave node appear to be correct:

[root@dhcp37-68 glusterfs]# ls -l /var/ | grep mount
drwx--x--x.  4 root root   37 Jan 20 11:19 mountbroker-root
[root@dhcp37-68 glusterfs]# ls -l /var/mountbroker-root/
total 0
drwx--x--x. 2 root          root 6 Jan 20 11:20 mb_hive
drwx------. 2 geoaccount123 root 6 Jan 20 11:20 user1000
[root@dhcp37-68 glusterfs]# 
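
As a quick sanity check, the attributes can also be printed directly. A minimal sketch; the paths and user are the ones from this setup, and the values to compare against are simply the ones visible in the listing above (711 root:root on the mountbroker root, 700 geoaccount123:root on the user directory):

# Slave node: print mode, owner:group and path for the mountbroker directories.
stat -c '%a %U:%G %n' /var/mountbroker-root /var/mountbroker-root/user1000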


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-10.el7rhgs.x86_64


How reproducible:
=================

I have seen this issue twice in 3 runs so far. 


Steps to Reproduce:
===================
1. Create Master Cluster and Volume
2. Create Slave Cluster and Volume 
3. Create a non-root geo-rep session between master and slave using "gluster-mountbroker setup"
4. Start the geo-rep session (see the command sketch below for steps 3 and 4)
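
A rough command sketch for steps 3 and 4, using the volume names and user from this bug. The slave hostname (SLAVEHOST) and the mountbroker group (geogroup) are placeholders, and the remaining non-root setup steps (pem key distribution, glusterd restart on the slave nodes) are abbreviated:

# Slave side: set up the mountbroker root and register the unprivileged user.
gluster-mountbroker setup /var/mountbroker-root geogroup
gluster-mountbroker add slave geoaccount123

# Master side: create and start the non-root geo-rep session.
gluster volume geo-replication master geoaccount123@SLAVEHOST::slave create push-pem
gluster volume geo-replication master geoaccount123@SLAVEHOST::slave start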

Actual results:
===============
One of the workers goes to Faulty for an iteration before recovering.

Expected results:
=================

The worker should not go to Faulty; it should move from the Initializing state directly to Active/Passive.

Comment 6 Sunny Kumar 2019-11-21 09:42:31 UTC
This issue is fixed by:
1. https://review.gluster.org/#/c/glusterfs/+/22890/

@Rahul, can we close this bug?