Bug 1713307

Summary: ganesha-nfs didn't fail back when writing files on a Mac NFS client after the node's power was shut down
Product: [Community] GlusterFS
Reporter: guolei <guol-fnst>
Component: ganesha-nfs
Assignee: Soumya Koduri <skoduri>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Version: 4.1
CC: bugs, guol-fnst, jthottan, skoduri
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2020-03-12 12:48:27 UTC
Type: Bug
Attachments:
Description: ganesha log
Flags: none

Description guolei 2019-05-23 11:19:36 UTC
Created attachment 1572444
ganesha log

Description of problem:
We ran some failover/failback tests on 3 nodes (Node-1, Node-2, Node-3).
The software architecture is "glusterfs + ctdb (public address) + nfs-ganesha". The Gluster volume type is replica 3.

We used CTDB's floating IP to mount the volume on macOS via NFS from Node-1, and copied file A to the mountpoint. While file A was being copied, the power of Node-1 was shut down. The copy of file A stalled, although we could still copy other files to the mountpoint normally.
About 20 minutes later everything recovered and the copy of file A resumed.

The Windows NFS client behaves the same way as the Mac client.
The CentOS NFS client works fine and shows no stall.

Version-Release number of selected component (if applicable):

gluster version: 4.1.8
nfs-ganesha version: 2.7.3
Mac client: macOS 10.14.0

How reproducible:


Steps to Reproduce:
1. Create a Gluster volume (replica 3) and export it with CTDB + ganesha-nfs.
2. Mount the volume on macOS or Windows via the CTDB floating IP, then copy a file to the mountpoint.
3. Shut down the power of the node that currently holds the floating IP.
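
To put a number on the stall while repeating the steps above, a small copy helper can report how long the slowest single write took. This is only a sketch and not part of the original report: the function name `copy_with_stall_report`, the chunk size, and the paths in the usage comment are assumptions made for illustration.

```python
import os
import time


def copy_with_stall_report(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks.

    Returns (total_seconds, longest_write_seconds), where the second value
    is the time spent in the slowest single write+fsync call. A stall during
    failover would show up as a very large second value.
    """
    longest = 0.0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            t0 = time.monotonic()
            fout.write(chunk)
            fout.flush()
            # Force the data out to the (NFS) server so that a hang on the
            # server side is observed here and not hidden by local caching.
            os.fsync(fout.fileno())
            longest = max(longest, time.monotonic() - t0)
    return time.monotonic() - start, longest


# Hypothetical usage; both paths are placeholders, not taken from the report:
# total, longest = copy_with_stall_report("/tmp/fileA", "/Volumes/gvol/fileA")
# print(f"total {total:.1f}s, longest single write {longest:.1f}s")
```

Run on the client while powering off the node in step 3; with the behavior described in this report, the longest write would be expected to be on the order of the 20 minute wait.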

Actual results:
The copy of file A stalled, although we could still copy other files to the mountpoint normally.
About 20 minutes later everything recovered and the copy of file A resumed.
No matter how many times we tried, we had to wait 20 minutes.

Expected results:
File A can be transferred within 1 or 2 minutes.


Additional info:
Attached is the ganesha log from Node-2, taken when the floating IP moved to Node-2.

Comment 1 Soumya Koduri 2019-05-24 15:23:53 UTC
Can you please collect packet traces from all the machines (Node-1, Node-2 and especially the client machine) while repeating this test for just that single file (i.e., File A)?

Comment 3 guolei 2019-10-29 02:50:21 UTC
After I set the following parameters, it worked:
server.tcp-user-timeout: 3
client.tcp-user-timeout: 5

Can you explain how it works?
May I close this bug?
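
As background on the two options above: they appear to correspond to the Linux TCP_USER_TIMEOUT socket option, which caps how long transmitted data may stay unacknowledged before the kernel drops the connection. With it set, a peer that loses power is detected within seconds, where the default retransmission backoff can take many minutes. The sketch below shows the socket option itself from Python. That the Gluster values are in seconds and map onto this option is an assumption here, not something confirmed in this thread, and `make_socket_with_user_timeout` is a hypothetical helper name.

```python
import socket


def make_socket_with_user_timeout(timeout_seconds):
    """Create a TCP socket that gives up on unacknowledged data after
    timeout_seconds. TCP_USER_TIMEOUT is Linux-specific, so the option is
    only applied where the constant is available."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is not None:
        # The kernel expects the value in milliseconds.
        sock.setsockopt(socket.IPPROTO_TCP, opt, timeout_seconds * 1000)
    return sock


# Example: mirror the assumed meaning of client.tcp-user-timeout: 5
# sock = make_socket_with_user_timeout(5)
```

Under that assumption, the values 3 and 5 would mean the brick and client sides abandon a connection to the powered-off node after a few seconds of unacknowledged data, which would be consistent with the stall disappearing.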

Comment 4 Worker Ant 2020-03-12 12:48:27 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/955, and will be tracked there from now on. Visit the GitHub issue URL for further details.