| Summary: | [SAMBA-volfile]: Unable to mount a Samba share when a pingable IP that does not belong to the existing Gluster cluster is listed before a valid IP in the glusterfs volfile server section of smb.conf | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Vivek Das <vdas> |
| Component: | samba | Assignee: | Raghavendra Talur <rtalur> |
| Status: | CLOSED NOTABUG | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | rhs-smb, rjoseph, rtalur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-09 09:37:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Vivek Das
2016-11-05 14:10:52 UTC
Root Cause: There is a difference between the following two scenarios:

a. A glusterd process is running on the given volfile_server entry.
b. A glusterd process is *not* running on the given volfile_server entry.

Scenario (b) is handled properly: if no response is received from the volfile_server entry, we move on to the next entry in the list.

In scenario (a), we have a glusterd that belongs to a trusted storage pool which does not have any volume corresponding to the given volname. In this case, we terminate the glfs_init process even before trying the other entries in the list. This is based on the assumption that if one glusterd process authoritatively says such a volume does not exist, this holds for the whole trusted storage pool and there is no need to verify the same against the other glusterd(s).

Given below are the log messages that demonstrate this:

```
[2016-11-07 15:14:38.715596] E [MSGID: 104021] [glfs-mgmt.c:552:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the 'volume file' from server [No such file or directory]
[2016-11-07 15:14:38.715661] E [MSGID: 104007] [glfs-mgmt.c:633:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume file (key:xcube) [Invalid argument]  <------------- We got EINVAL, not ENOTCONN
[2016-11-07 15:14:38.715910] E [MSGID: 104024] [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: 192.168.21.5 (No such file or directory) [No such file or directory]
[2016-11-07 15:14:38.715936] I [MSGID: 104044] [glfs-mgmt.c:837:mgmt_rpc_notify] 0-glfs-mgmt: connecting to next volfile server 192.168.21.4 at port 24007 with transport: tcp
[2016-11-07 15:14:38.715961] I [MSGID: 101191] [event-epoll.c:659:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 1
```

Considering the above, the severity of this bug is low. I would like the opinion of others on this.
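The distinction between the two scenarios can be modeled roughly as follows. This is a simplified Python sketch, not the actual gfapi C code; the helper function, exception names, and response table are hypothetical stand-ins for the getspec RPC and its error codes:

```python
# Simplified model of the volfile-server fallback logic described above.
# A connection-level failure (no glusterd answering, scenario b) falls
# through to the next server in the list; an authoritative "no such
# volume" answer (scenario a) aborts the whole init, mirroring the
# EINVAL-instead-of-ENOTCONN behaviour seen in the log above.

class NoGlusterd(Exception):
    """No glusterd responded on this server (scenario b)."""

class NoSuchVolume(Exception):
    """A glusterd responded but knows no such volume (scenario a)."""

def fetch_volfile(server, responses):
    """Hypothetical stand-in for the getspec RPC; `responses` maps a
    server to either a volfile string or an exception class."""
    result = responses[server]
    if isinstance(result, type) and issubclass(result, Exception):
        raise result(server)
    return result

def glfs_init_model(servers, responses):
    for server in servers:
        try:
            return fetch_volfile(server, responses)
        except NoGlusterd:
            continue   # scenario (b): try the next entry in the list
        except NoSuchVolume:
            raise      # scenario (a): authoritative answer, stop here
    raise NoGlusterd("no volfile server reachable")

# The bug report's setup: the first server is pingable and runs glusterd,
# but belongs to a different trusted storage pool, so init aborts even
# though the second server could have served the volfile for "xcube".
responses = {
    "192.168.21.5": NoSuchVolume,        # wrong pool: authoritative "no"
    "192.168.21.4": "volume xcube ...",  # would have served the volfile
}
```

Under this model, putting an out-of-pool but reachable host first in the list turns a would-be fallback into a hard failure, which is exactly the mount failure reported.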
(In reply to Raghavendra Talur from comment #3)

> Root Cause:
> In scenario (a), we have a glusterd that belongs to a trusted storage pool that does not have any volume corresponding to the given volname. In this case, we terminate the glfs_init process even before trying the other entries in the list. This is based on the assumption that if one glusterd process authoritatively says such a volume does not exist, it is true for the whole trusted storage pool and there is no need to verify the same from the other glusterd(s).

I agree that if glusterd returns that no such volume exists, then we need not look at the other nodes. This list should only have IPs/hostnames belonging to the same trusted storage pool.

> Given below are the log messages that demonstrate this:
>
> [2016-11-07 15:14:38.715910] E [MSGID: 104024] [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: 192.168.21.5 (No such file or directory) [No such file or directory]
> [2016-11-07 15:14:38.715936] I [MSGID: 104044] [glfs-mgmt.c:837:mgmt_rpc_notify] 0-glfs-mgmt: connecting to next volfile server 192.168.21.4 at port 24007 with transport: tcp

Why are we seeing this message if we are not connecting to the next volfile server?
(In reply to rjoseph from comment #4)

> Why are we seeing this message if we are not connecting to the next volfile server?

The reconnect thread does proceed to try the next server, but the main thread terminates because of the earlier error code. The current model does not give the reconnect thread enough information for it to stop retrying.

Based on the above comments, it is agreed that this is not the right way to use the volfile_server option: all the IPs/hostnames given in the list should belong to the same Gluster Trusted Storage Pool. Closing the bug for that reason.
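As a hedged illustration of the intended usage, an smb.conf share section for this setup might look like the following. The hostnames and share name are made up; the volume name `xcube` comes from the log above, and the assumption is that the `glusterfs:volfile_server` option of Samba's vfs_glusterfs module takes the server list discussed in this bug:

```ini
[gluster-share]
    ; Hypothetical share; hostnames are examples only.
    vfs objects = glusterfs
    glusterfs:volume = xcube
    ; All entries here must belong to the SAME trusted storage pool.
    ; A pingable host from a different pool listed first causes the
    ; mount failure described in this bug.
    glusterfs:volfile_server = node1.example.com node2.example.com
    glusterfs:logfile = /var/log/samba/glusterfs-xcube.log
    path = /
    read only = no
    kernel share modes = no
```

The key point of the closing comment is that the list is a redundancy mechanism within one pool, not a way to probe multiple pools for the volume.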