Bug 1524815

Summary: Writing to samba share stops if another gluster node becomes unavailable
Product: [Community] GlusterFS Reporter: david.spisla
Component: gluster-smb Assignee: bugs <bugs>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.12 CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2018-01-08 07:31:40 UTC Type: Bug
Attachments:
Detailed list of installed packages, smb.conf, smbd-debugger-output, smbd-strace-output, volinfo, etc. (Flags: none)

Description david.spisla 2017-12-12 07:34:57 UTC
Created attachment 1366458 [details]
Detailed list of installed packages, smb.conf, smbd-debugger-output, smbd-strace-output, volinfo, etc.

Description of problem:
We found strange behaviour in our Gluster 3.12 installation when it is used together with Samba (with the Gluster VFS plugin).
Please find attached the detailed list of installed packages, smb.conf, smbd-debugger-output, smbd-strace-output, volinfo, etc.

Writing to the Samba share stops if the other Gluster node becomes unavailable.

Version-Release number of selected component (if applicable):
gluster 3.12, samba 4.6.2, samba-vfs-glusterfs 4.6.2

How reproducible:
We’re using a setup with two nodes. A volume with one brick on each node, Replica 2. Samba provides a share on Node1. Start writing a big file to the samba share on Node1 and disconnect Node2 (we stopped the VM).

Actual results:
Writing to the Samba share freezes. Depending on the Samba client, the write either resumes after ~70 seconds (or longer) or fails.
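
To put a number on the freeze described above, a small probe can time each chunk write to the share. This is only a sketch: the function names, the chunk size, and the idea of pointing it at a locally mounted share path are assumptions for illustration and are not part of the original report.

```python
import os
import time


def write_and_time(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path in chunks.

    Returns a list with the wall-clock duration (seconds) of each chunk
    write, including the flush and fsync that push the data to the share.
    """
    chunk = b"\0" * chunk_size
    durations = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:n])
            f.flush()
            os.fsync(f.fileno())  # block until the write has really gone out
            durations.append(time.monotonic() - start)
            written += n
    return durations


def longest_stall(durations):
    """Return the longest single chunk write in seconds (0.0 if empty)."""
    return max(durations, default=0.0)
```

Running `write_and_time` against a file on the mounted share (the mount path is whatever your client uses) while Node2 is stopped should show one chunk whose duration is close to the stall length, if the write survives at all.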

Expected results:
Writing to the Samba share should not stop or be interrupted if the other Gluster node becomes unavailable.

Additional info:
The attached files suggest that the running gluster process is not aware that the other node has died, and that it blocks until the dead node is available again or a timeout occurs.
We also tried a FUSE mount (which Samba then used as a local directory), with nearly the same result.

Because of the long timeout, the SMB client might fail.

Comment 1 david.spisla 2018-01-08 07:34:47 UTC
The behaviour above is caused by the "network.ping-timeout" default value of 42 seconds.
Reduce this value if you are using an SMB client. The Windows SMB client uses a timeout of 25 seconds, so 42 is too high for this client.
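
As a rough illustration of the fix, the sketch below builds the `gluster volume set` command that lowers "network.ping-timeout" and refuses values that are not below the client timeout. The helper name, the volume name `gv0`, and the example value of 10 seconds are assumptions for illustration; the 25-second client timeout is the figure quoted in this comment, not something verified here.

```python
import subprocess

# Figure quoted in this bug for the Windows SMB client (seconds).
CLIENT_TIMEOUT_S = 25


def build_ping_timeout_cmd(volume, timeout_s, client_timeout_s=CLIENT_TIMEOUT_S):
    """Return the gluster CLI argument list that sets network.ping-timeout.

    Raises ValueError if timeout_s is not positive or is not strictly
    below the SMB client timeout, since the client would give up first.
    """
    if not 0 < timeout_s < client_timeout_s:
        raise ValueError(
            f"ping-timeout {timeout_s}s must be > 0 and < client timeout "
            f"{client_timeout_s}s"
        )
    return [
        "gluster", "volume", "set", volume,
        "network.ping-timeout", str(timeout_s),
    ]


def apply_ping_timeout(volume, timeout_s):
    """Run the command on a node where the gluster CLI is available."""
    subprocess.run(build_ping_timeout_cmd(volume, timeout_s), check=True)


if __name__ == "__main__":
    # Only print the command; "gv0" and 10 are hypothetical example values.
    print(" ".join(build_ping_timeout_cmd("gv0", 10)))
```

How low to go is a judgment call: a very small value makes brief network hiccups look like node failures, so the chosen number should be tested against the actual clients.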