Bug 1713307 - ganesh-nfs didn't failback when writing files on Mac nfs client if the power is shut down
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: ganesha-nfs
Version: 4.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-23 11:19 UTC by guolei
Modified: 2020-03-12 12:48 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:48:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
ganesha log (3.43 MB, application/gzip)
2019-05-23 11:19 UTC, guolei

Description guolei 2019-05-23 11:19:36 UTC
Created attachment 1572444 [details]
ganesha log

Description of problem:
We ran failover/failback tests on three nodes (Node-1, Node-2, Node-3).
The software stack is "glusterfs + ctdb (public address) + nfs-ganesha". The Gluster volume type is replica 3.

We mounted the volume on macOS via NFS using the CTDB floating IP, which was hosted on Node-1, and copied file A to the mountpoint. While file A was being copied, Node-1 was powered off. The copy of file A stalled, although we could still copy other files to the mountpoint normally.
20 minutes later, everything recovered and the copy of file A resumed.

The Windows NFS client behaves the same as the Mac client.
The CentOS NFS client works well and shows no stall.





Version-Release number of selected component (if applicable):

gluster version: 4.1.8
nfs-ganesha version: 2.7.3
Mac client: macOS 10.14.0

How reproducible:


Steps to Reproduce:
1. Create a gluster volume (replica 3) and export it with CTDB + nfs-ganesha.
2. Mount the volume on macOS or Windows via the CTDB floating IP. Copy a file to the mountpoint.
3. Power off the node that holds the floating IP.
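
The steps above leave the stall to be timed by hand. A minimal sketch of a timed copy that could be run on the client in step 2, assuming Python 3 is available there; the function name `copy_with_stall_log`, the chunk size, and the 5-second threshold are illustrative choices, not part of the original test:

```python
import os
import time


def copy_with_stall_log(src, dst, chunk_size=1 << 20, stall_threshold=5.0):
    """Copy src to dst chunk by chunk and return (bytes_copied, stalls).

    stalls is a list of (offset, seconds) tuples, one for every chunk whose
    write plus fsync took longer than stall_threshold seconds.
    """
    stalls = []
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            start = time.monotonic()
            fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())  # push this chunk to the server before timing stops
            elapsed = time.monotonic() - start
            if elapsed > stall_threshold:
                stalls.append((copied, elapsed))
            copied += len(chunk)
    return copied, stalls
```

Calling `copy_with_stall_log("fileA", "/path/to/mountpoint/fileA")` (both paths hypothetical) while powering off the node in step 3 would return the offsets at which the copy stalled and how long each stall lasted.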

Actual results:
The copy of file A stalled, although we could still copy other files to the mountpoint normally.
20 minutes later, everything recovered and the copy of file A resumed.
No matter how many times we tried, we always had to wait 20 minutes.

Expected results:
File A can be transferred in 1 or 2 minutes.


Additional info:
The attached ganesha log is from Node-2, captured when the floating IP moved to Node-2.

Comment 1 Soumya Koduri 2019-05-24 15:23:53 UTC
Can you please collect packet traces from all the machines (Node-1, Node-2 and especially the client machine) while repeating this test for just that single file (i.e., file A)?

Comment 3 guolei 2019-10-29 02:50:21 UTC
After I changed the following volume options, the problem went away:
server.tcp-user-timeout: 3
client.tcp-user-timeout: 5

Can you explain how this works?
May I close this bug?
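
For context on the two options: a hedged sketch, assuming they map onto the Linux `TCP_USER_TIMEOUT` socket option and take a value in seconds (neither is confirmed in this report). `TCP_USER_TIMEOUT` bounds how long sent data may stay unacknowledged before the kernel drops the connection, so a peer that has lost power can be noticed after a few seconds instead of after the much longer default retransmission timeout. The helper name `set_tcp_user_timeout` is hypothetical:

```python
import socket


def set_tcp_user_timeout(sock, seconds):
    """Set TCP_USER_TIMEOUT on sock and return the value the kernel reports.

    The kernel expects milliseconds, so the value in seconds is converted.
    Linux only: 18 is the Linux option number, used as a fallback in case
    this Python build does not expose socket.TCP_USER_TIMEOUT.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", 18)
    sock.setsockopt(socket.IPPROTO_TCP, opt, int(seconds * 1000))
    return sock.getsockopt(socket.IPPROTO_TCP, opt)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Mirrors the 3-second server-side value from the comment above.
    print(set_tcp_user_timeout(s, 3))
    s.close()
```

Whether the 20-minute stall was caused by a dead connection lingering until it timed out is not established in this report; the sketch only illustrates what a TCP user timeout of a few seconds means at the socket level.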

Comment 4 Worker Ant 2020-03-12 12:48:27 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/955 and will be tracked there from now on. Visit the GitHub issue URL for further details.

