Created attachment 1572444
ganesha log

Description of problem:
We ran failover/failback tests on 3 nodes (Node-1, Node-2, Node-3). The software stack is "glusterfs + ctdb (public address) + nfs-ganesha". The Gluster volume type is replica 3.
We mounted the volume on macOS over NFS through a CTDB floating IP hosted on Node-1, and copied file A to the mount point. While file A was being copied, we powered off Node-1. The copy of file A hung, although we could still copy other files to the mount point normally. About 20 minutes later everything recovered and the copy of file A resumed.
A Windows NFS client shows the same behavior as the Mac client. A CentOS NFS client works fine and shows no hang.

Version-Release number of selected component (if applicable):
gluster version: 4.1.8
nfs-ganesha version: 2.7.3
Mac client (10.14.0)

How reproducible:

Steps to Reproduce:
1. Create a gluster volume (replica 3) and export it with CTDB + nfs-ganesha.
2. Mount the volume on macOS or Windows via the CTDB floating IP. Copy a file to the mount point.
3. Power off the node that currently holds the floating IP.

Actual results:
The copy of file A hangs, although other files can still be copied to the mount point normally. About 20 minutes later everything recovers and the copy of file A resumes. No matter how many times we try, we always have to wait 20 minutes.

Expected results:
File A resumes transferring within 1 or 2 minutes.

Additional info:
Attached is the ganesha log from Node-2, captured when the floating IP moved to Node-2.
Can you please collect packet traces from all the machines (Node-1, Node-2 and especially the client machine) while repeating this test for just that single file (i.e., File A)?
After I modified the following parameters, it became OK:

server.tcp-user-timeout: 3
client.tcp-user-timeout: 5

Can you explain how it works? May I close this bug?
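As far as I understand (this is not confirmed anywhere in this thread), the two options are passed to the Linux TCP_USER_TIMEOUT socket option on Gluster's server-side and client-side connections. That option caps how long sent data may stay unacknowledged before the kernel aborts the connection, so a peer that lost power is detected within seconds instead of after the kernel's much longer default retransmission period. A minimal Python sketch of the socket option itself follows; the helper names are hypothetical and this is not Gluster code:

```python
import socket

# TCP_USER_TIMEOUT is Linux-specific; look it up defensively so the
# sketch still imports on platforms that do not define it.
_TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", None)


def make_socket_with_user_timeout(timeout_seconds):
    """Create a TCP socket and, where supported, set TCP_USER_TIMEOUT.

    The kernel expects milliseconds, so the value in seconds
    (for example 3 or 5 as in the settings above) is converted here.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if _TCP_USER_TIMEOUT is not None:
        sock.setsockopt(socket.IPPROTO_TCP, _TCP_USER_TIMEOUT,
                        timeout_seconds * 1000)
    return sock


def read_user_timeout_ms(sock):
    """Return the configured TCP_USER_TIMEOUT in ms, or None if unsupported."""
    if _TCP_USER_TIMEOUT is None:
        return None
    return sock.getsockopt(socket.IPPROTO_TCP, _TCP_USER_TIMEOUT)
```

On the Gluster side, options like these are normally applied per volume with `gluster volume set <VOLNAME> <KEY> <VALUE>`. Whether the 20-minute hang was caused by stale connections to the powered-off node is an assumption on my part; the packet traces requested above would be the way to confirm it.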
This bug has been moved to https://github.com/gluster/glusterfs/issues/955 and will be tracked there from now on. Visit the GitHub issue URL for further details.