Description of problem:
On an existing 4-node Gluster cluster with a Samba/CTDB setup: in smb.conf, when the glusterfs:volfile_server option lists an IP that is pingable but is not one of the 4 cluster nodes, followed by a valid IP of a cluster node where glusterd is active, mounting the Samba share from Windows still fails.

glusterfs:volfile_server = tcp+10.70.42.23:24007 10.70.47.12 dhcp47-122222.lab.eng.blr.redhat.com

In the above, 10.70.42.23 is not among the 4 nodes forming the Gluster cluster, whereas 10.70.47.12 is a valid IP of one of the cluster nodes.

Version-Release number of selected component (if applicable):
samba-client-4.4.6-2.el7rhgs.x86_64
glusterfs-cli-3.8.4-3.el7rhgs.x86_64
Windows 8
Windows 10

How reproducible:
Always

Steps to Reproduce:
1. Start with a 4-node Gluster cluster with Samba and CTDB configured.
2. Create a 2x2 volume.
3. In smb.conf, add the glusterfs:volfile_server option under the volume's share section.
4. List an IP that does not belong to the cluster (but is pingable), followed by a valid IP of a cluster node where glusterd is active.
5. service ctdb stop
6. service ctdb start
7. Wait for ctdb status to report OK for all nodes.
8. From Windows, try mounting the Samba share.
9. Try mounting a CIFS share from Linux as well.

Actual results:
The mount fails. Windows reports error 59; on Linux, the CIFS mount fails with "can't read superblock".

Expected results:
The invalid IP should be ignored and the mount should succeed using the active IP provided.
Additional info:

Prior to mount:

[gluster-zombie]
        comment = For samba share of volume zombie
        path = /
        guest ok = Yes
        read only = No
        vfs objects = glusterfs
        glusterfs:volfile_server = tcp+10.70.42.23:24007 10.70.47.12 dhcp47-122222.lab.eng.blr.redhat.com
        glusterfs:loglevel = 7
        glusterfs:logfile = /var/log/samba/glusterfs-zombie.%M.log
        glusterfs:volume = zombie

[root@dhcp47-12 samba]# netstat -tnap | grep smbd
tcp        0      0 0.0.0.0:139     0.0.0.0:*       LISTEN      22561/smbd
tcp        0      0 0.0.0.0:445     0.0.0.0:*       LISTEN      22561/smbd
tcp6       0      0 :::139          :::*            LISTEN      22561/smbd
tcp6       0      0 :::445          :::*            LISTEN      22561/smbd

[root@dhcp47-12 samba]# ps aux | grep smbd
root     22561  0.0  0.0 424076  6012 ?        Ss   13:32   0:00 /usr/sbin/smbd
root     22563  0.0  0.0 405260  3128 ?        S    13:32   0:00 /usr/sbin/smbd
root     22564  0.0  0.0 405252  2820 ?        S    13:32   0:00 /usr/sbin/smbd
root     24323  0.0  0.0 112648   968 pts/0    S+   13:36   0:00 grep --color=auto smbd

[root@dhcp47-12 samba]# smbstatus
Samba version 4.4.6

PID     Username     Group     Machine     Protocol Version     Encryption     Signing
----------------------------------------------------------------------------------------------------------------------------------------

Service      pid     Machine       Connected at     Encryption   Signing
---------------------------------------------------------------------------------------------

No locked files
Root Cause:

There is a difference between the following two scenarios:
a. The Glusterd process is running on the given volfile_server entry.
b. The Glusterd process is *not* running on the given volfile_server entry.

Scenario (b) is handled properly: if no response is received from the volfile_server entry, we move to the next entry in the list.

In scenario (a), we have a Glusterd that belongs to a trusted storage pool which does not have any volume corresponding to the given volname. In this case, we terminate the glfs_init process even before trying the other entries in the list. This is based on the assumption that if one Glusterd process authoritatively says such a volume does not exist, that holds for the whole trusted storage pool and there is no need to verify the same with other Glusterd(s).

The log messages below demonstrate this:

[2016-11-07 15:14:38.715596] E [MSGID: 104021] [glfs-mgmt.c:552:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the 'volume file' from server [No such file or directory]
[2016-11-07 15:14:38.715661] E [MSGID: 104007] [glfs-mgmt.c:633:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume file (key:xcube) [Invalid argument] <------------- We got EINVAL, not ENOTCONN
[2016-11-07 15:14:38.715910] E [MSGID: 104024] [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: 192.168.21.5 (No such file or directory) [No such file or directory]
[2016-11-07 15:14:38.715936] I [MSGID: 104044] [glfs-mgmt.c:837:mgmt_rpc_notify] 0-glfs-mgmt: connecting to next volfile server 192.168.21.4 at port 24007 with transport: tcp
[2016-11-07 15:14:38.715961] I [MSGID: 101191] [event-epoll.c:659:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 1

Considering the above information, the severity of this bug is low. I would like the opinion of others on this.
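The distinction drawn above between a transient connection failure (try the next entry) and an authoritative "no such volume" answer (stop immediately) can be sketched as follows. This is a simplified illustration of the policy, not the actual glfs-mgmt.c code; the helper name should_try_next_server is hypothetical:

```c
#include <errno.h>

/* Hypothetical helper illustrating the failover policy described above.
 * Connection-level errors mean the Glusterd on that entry never answered,
 * so falling back to the next volfile server makes sense.  An error from
 * the getspec RPC itself (e.g. EINVAL because the volname is unknown) is
 * an authoritative answer from a pool member, so retrying other servers
 * in the same trusted storage pool would be pointless. */
static int should_try_next_server(int err)
{
    switch (err) {
    case ENOTCONN:      /* server unreachable */
    case ETIMEDOUT:     /* no response received */
    case ECONNREFUSED:  /* nothing listening on port 24007 */
        return 1;       /* transient: move to the next list entry */
    case EINVAL:        /* getspec failed: volume unknown to the pool */
    case ENOENT:        /* no volume file for the given volname */
    default:
        return 0;       /* authoritative: terminate glfs_init */
    }
}
```

This matches the log excerpt: the getspec callback returned EINVAL rather than ENOTCONN, so glfs_init gave up even though a reachable, valid server remained in the list.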
(In reply to Raghavendra Talur from comment #3)
> Root Cause:

> In scenario (a), we have a Glusterd that belongs to a trusted storage pool
> that does not have any Volume corresponding to the volname given. In this
> case, we terminate the glfs_init process even before trying other entries in
> the list. This is based on the assumption that if one Glusterd process
> authoritatively says such a volume does not exist, it is true for the whole
> trusted storage pool and there is no need to verify the same from other
> Glusterd(s).

I agree that if glusterd returns that no such volume exists, then we need not check the other nodes. This list should only contain IPs/hostnames belonging to the same trusted storage pool.

> Given below is the log messages to prove the same:
>
> [2016-11-07 15:14:38.715596] E [MSGID: 104021]
> [glfs-mgmt.c:552:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the 'volume
> file' from server [No such file or directory]
> [2016-11-07 15:14:38.715661] E [MSGID: 104007]
> [glfs-mgmt.c:633:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume
> file (key:xcube) [Invalid argument] <------------- We got EINVAL not ENOTCONN
> [2016-11-07 15:14:38.715910] E [MSGID: 104024]
> [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with
> remote-host: 192.168.21.5 (No such file or directory) [No such file or
> directory]
> [2016-11-07 15:14:38.715936] I [MSGID: 104044]
> [glfs-mgmt.c:837:mgmt_rpc_notify] 0-glfs-mgmt: connecting to next volfile
> server 192.168.21.4 at port 24007 with transport: tcp

Why are we seeing this message if we are not connecting to the next volfile server?
(In reply to rjoseph from comment #4)
> > Given below is the log messages to prove the same:
> >
> > [2016-11-07 15:14:38.715596] E [MSGID: 104021]
> > [glfs-mgmt.c:552:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the 'volume
> > file' from server [No such file or directory]
> > [2016-11-07 15:14:38.715661] E [MSGID: 104007]
> > [glfs-mgmt.c:633:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch volume
> > file (key:xcube) [Invalid argument] <------------- We got EINVAL not ENOTCONN
> > [2016-11-07 15:14:38.715910] E [MSGID: 104024]
> > [glfs-mgmt.c:735:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with
> > remote-host: 192.168.21.5 (No such file or directory) [No such file or
> > directory]
> > [2016-11-07 15:14:38.715936] I [MSGID: 104044]
> > [glfs-mgmt.c:837:mgmt_rpc_notify] 0-glfs-mgmt: connecting to next volfile
> > server 192.168.21.4 at port 24007 with transport: tcp
>
> Why are we seeing this message if we are not connecting to the next volfile
> server?

The reconnect thread does proceed to try the next server, but the main thread terminates because of the earlier error code. The current model does not give the reconnect logic enough information to know that it should stop retrying.
Based on the above comments, it is agreed that this is not the right way to use the volfile_server option: all the IPs/hostnames given in the list must belong to the same Gluster Trusted Storage Pool. Closing the bug for this reason.
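For reference, a valid configuration lists only servers that are members of the same trusted storage pool. The snippet below is illustrative; 10.70.47.12 is the valid node from the report, while 10.70.47.13 stands in for another hypothetical pool member:

```ini
[gluster-zombie]
        vfs objects = glusterfs
        glusterfs:volume = zombie
        # Every entry here must belong to the same trusted storage pool;
        # entries are tried in order only on connection-level failures.
        glusterfs:volfile_server = tcp+10.70.47.12:24007 10.70.47.13
```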