Bug 1609163

Summary: Fuse mount of volume fails when gluster_shared_storage is enabled
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Reporter: Jilju Joy <jijoy>
Assignee: Sanju <srakonde>
QA Contact: Jilju Joy <jijoy>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: rhgs-3.4
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.12.2-16
Doc Type: If docs needed, set a value
Clones: 1610726 (view as bug list)
Last Closed: 2018-09-04 06:51:13 UTC
Type: Bug
CC: amukherj, apaladug, msaini, rallan, rhs-bugs, rkavunga, sankarshan, storage-qa-internal, vbellur, vdas
Bug Blocks: 1503137, 1610726

Description Jilju Joy 2018-07-27 07:52:33 UTC
Description of problem:
-----------------------
While mounting (glusterfs) a Distributed-Replicate volume in a cluster where gluster_shared_storage is enabled, the mount fails with the following errors in the client log:


[2018-07-27 06:42:44.748162] W [MSGID: 114043] [client-handshake.c:1108:client_setvolume_cbk] 0-testvol-client-0: failed to set the volume [Permission denied]
[2018-07-27 06:42:44.748235] W [MSGID: 114007] [client-handshake.c:1137:client_setvolume_cbk] 0-testvol-client-0: failed to get 'process-uuid' from reply dict [Invalid argument]
[2018-07-27 06:42:44.748268] E [MSGID: 114044] [client-handshake.c:1143:client_setvolume_cbk] 0-testvol-client-0: SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2018-07-27 06:42:44.748298] I [MSGID: 114049] [client-handshake.c:1257:client_setvolume_cbk] 0-testvol-client-0: sending AUTH_FAILED event
[2018-07-27 06:42:44.748365] E [fuse-bridge.c:5328:notify] 0-fuse: Server authenication failed. Shutting down.

==============================================================
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
[root@dhcp37-132 ~]# rpm -qa | grep glusterfs
glusterfs-server-3.12.2-14.el7rhgs.x86_64
glusterfs-3.12.2-14.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-14.el7rhgs.x86_64
glusterfs-libs-3.12.2-14.el7rhgs.x86_64
glusterfs-fuse-3.12.2-14.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-14.el7rhgs.x86_64
glusterfs-api-3.12.2-14.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-14.el7rhgs.x86_64
glusterfs-rdma-3.12.2-14.el7rhgs.x86_64
glusterfs-cli-3.12.2-14.el7rhgs.x86_64

=================================================================
How reproducible:
-----------------
3/3

================================================================
Steps to Reproduce:
-------------------
1. Create a 6x3 Distributed-Replicate volume in a cluster where gluster_shared_storage is enabled.
2. Try to mount the volume on a fuse client (see the sketch after this list).
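
A minimal reproduction sketch (the hostnames server1-server3, the brick paths, and the mount point are placeholders, not taken from this report; a 6x3 volume needs 18 bricks, laid out here as 3 servers x 6 bricks each):

# Enable the shared storage volume cluster-wide (already enabled in this setup):
gluster volume set all cluster.enable-shared-storage enable

# Create and start a 6x3 distributed-replicate volume:
gluster volume create testvol replica 3 \
    $(for i in 1 2 3 4 5 6; do
        echo "server1:/bricks/b$i server2:/bricks/b$i server3:/bricks/b$i"
      done)
gluster volume start testvol

# Attempt a FUSE mount from another node or a client:
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol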

================================================================
Actual results:
---------------
Mount operation fails with the following error message:
"Mount failed. Please check the log file for more details."

===============================================================
Expected results:
-----------------
Volume should mount properly.

==============================================================
Additional info:
----------------
* The volume can be mounted locally, but not from any other node in the cluster.
* A gnfs mount succeeds.
* Brick multiplexing is happening between the existing gluster_shared_storage bricks and the newly created volume's bricks (see the check sketched after this list).
* It appears the brick process fails to distinguish between gluster_shared_storage and testvolume, so clients mounting testvolume are authenticated the way gluster_shared_storage clients are (shared storage can be mounted on localhost only).
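
One way to confirm the multiplexing (a sketch; testvol is the volume from this report):

# With multiplexing, bricks of both volumes report the same PID in the
# status output:
gluster volume status gluster_shared_storage
gluster volume status testvol

# Or count brick processes on a server node: a single glusterfsd serving
# bricks of both volumes indicates they were multiplexed together.
pgrep -a glusterfsd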

===============================================================
gluster-health-report
---------------------

[root@dhcp37-173 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=17
[     OK] Disk used percentage  path=/var  percentage=17
[     OK] Disk used percentage  path=/tmp  percentage=17
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=  max_op_version=
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1513/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      17293/glusterfsd    
4:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      17333/glusterfsd    

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=5619
[WARNING] Errors in Glusterd log file  num_errors=79
[WARNING] Warnings in Glusterd log file  num_warning=37
[     OK] No errors seen at network card
[     OK] No errors seen at network card
High CPU usage by Self-heal
[WARNING] Errors in Glusterd log file num_errors=199
[WARNING] Warnings in Glusterd log file num_warnings=128

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-34.log
================================================================================

[root@dhcp37-50 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=17
[     OK] Disk used percentage  path=/var  percentage=17
[     OK] Disk used percentage  path=/tmp  percentage=17
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[ NOT OK] Failed to check op-version
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1525/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      2369/glusterfsd     
4:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      2378/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=5745
[WARNING] Errors in Glusterd log file  num_errors=107
[WARNING] Warnings in Glusterd log file  num_warning=56
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=175
[WARNING] Warnings in Glusterd log file num_warnings=187

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-41.log

================================================================================
[root@dhcp37-132 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=15
[     OK] Disk used percentage  path=/var  percentage=15
[     OK] Disk used percentage  path=/tmp  percentage=15
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[ NOT OK] Failed to check op-version
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1505/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      2365/glusterfsd     
4:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      2374/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=5572
[WARNING] Errors in Glusterd log file  num_errors=109
[WARNING] Warnings in Glusterd log file  num_warning=63
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=166
[WARNING] Warnings in Glusterd log file num_warnings=148

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-42.log
================================================================================
[root@dhcp37-172 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=16
[     OK] Disk used percentage  path=/var  percentage=16
[     OK] Disk used percentage  path=/tmp  percentage=16
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=  max_op_version=
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1528/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      2366/glusterfsd     
4:tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      4515/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=4619
[WARNING] Errors in Glusterd log file  num_errors=56
[WARNING] Warnings in Glusterd log file  num_warning=79
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=36136
[WARNING] Warnings in Glusterd log file num_warnings=174

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-43.log

================================================================================
[root@dhcp37-197 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=18
[     OK] Disk used percentage  path=/var  percentage=18
[     OK] Disk used percentage  path=/tmp  percentage=18
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=  max_op_version=
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1525/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      2362/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=4477
[WARNING] Errors in Glusterd log file  num_errors=108
[WARNING] Warnings in Glusterd log file  num_warning=68
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=170
[WARNING] Warnings in Glusterd log file num_warnings=186

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-44.log

================================================================================
[root@dhcp37-56 ~]# gluster-health-report

Loaded reports: errors_in_logs, disk_usage, gfid-mismatch-dht-report, memory_usage, firewall-check, kernel_issues, glusterd, glusterd-peer-disconnect, coredump, glusterd_volume_version_cksum_errors, glusterd-op-version, georep, errors_in_logs, ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=15
[     OK] Disk used percentage  path=/var  percentage=15
[     OK] Disk used percentage  path=/tmp  percentage=15
[     OK] All peers are in connected state  connected_count=5  total_peer_count=5
[     OK] no gfid mismatch
[     OK] op-version is up to date  op_version=  max_op_version=
[     OK] The maximum size of core files created is set to unlimted.
[     OK] Ports open for glusterd:
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1525/glusterd       

[     OK] Ports open for glusterfsd:
3:tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      2364/glusterfsd     

[  ERROR] Report failure  report=report_check_worker_restarts
[WARNING] Glusterd uptime is less than 24 hours  uptime_sec=4342
[WARNING] Errors in Glusterd log file  num_errors=60
[WARNING] Warnings in Glusterd log file  num_warning=58
[     OK] No errors seen at network card
[     OK] No errors seen at network card
0
[WARNING] Errors in Glusterd log file num_errors=36020
[WARNING] Warnings in Glusterd log file num_warnings=183

....
You can find the detailed health-reportat /var/log/glusterfs/gluster-health-report-2018-07-27-11-45.log

Comment 2 Mohammed Rafi KC 2018-07-27 11:06:22 UTC
RCA:

Gluster shared storage applies stricter authentication when validating clients, because shared storage is an internal volume that stores metadata for gluster features and is meant to be mounted only from within the cluster. Bricks of normal volumes therefore should not be attached (multiplexed) to the gluster shared storage brick process.

Here the bricks of volume "testvolume" were attached to the shared storage brick process, so the stricter validation was applied to its clients as well, and the volume mount failed.

The fix for this bug is in glusterd: the code that selects compatible bricks for brick multiplexing has to be modified so that normal-volume bricks are not multiplexed with shared storage bricks.
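
Until a fixed build is installed, one possible mitigation (my assumption, not something verified in this report) is to disable brick multiplexing cluster-wide so that new bricks never get attached to the shared storage brick process:

# Hypothetical workaround: disable brick multiplexing globally so each new
# brick runs in its own glusterfsd process.
gluster volume set all cluster.brick-multiplex off

# Restart the affected volume so its bricks respawn as standalone processes:
gluster volume stop testvol
gluster volume start testvol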

Comment 11 errata-xmlrpc 2018-09-04 06:51:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607