Bug 1713307

Summary:

ganesha-nfs didn't fail back when writing files on a Mac NFS client if the power is shut down

Product:

[Community] GlusterFS

Reporter:

guolei <guol-fnst>

Component:

ganesha-nfs

Assignee:

Soumya Koduri <skoduri>

Status:

CLOSED UPSTREAM

Severity:

medium

Priority:

medium

Version:

4.1

CC:

bugs, guol-fnst, jthottan, skoduri

Target Milestone:

---

Keywords:

Triaged

Target Release:

---

Hardware:

x86_64

OS:

Linux

Story Points:

---

Last Closed:

2020-03-12 12:48:27 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

Category:

---

oVirt Team:

---

Cloudforms Team:

---

Attachments:

Description	Flags
ganesha log	none

Description guolei 2019-05-23 11:19:36 UTC

Created attachment 1572444
ganesha log

Description of problem:
We did some failover/failback tests on 3 nodes (Node-1, Node-2, Node-3).
The software architecture is "glusterfs + ctdb (public address) + nfs-ganesha". The Gluster volume type is replica 3.

We used CTDB's floating IP, which was hosted on Node-1, to mount the volume on macOS via NFS, and wrote file A to the mountpoint. While file A was being copied to the mountpoint, the power of Node-1 was shut down. The copying process was suspended; however, we could copy other files to the mountpoint normally.
20 minutes later, everything became OK and file A resumed being copied.

The Windows NFS client shows the same behavior as the Mac client.
But the CentOS NFS client works very well and shows no suspension.





Version-Release number of selected component (if applicable):

gluster version: 4.1.8
nfs-ganesha version: 2.7.3
Mac client version: 10.14.0

How reproducible:


Steps to Reproduce:
1. Create a gluster volume (replica 3), and export it with CTDB + ganesha-nfs.
2. Mount the volume on macOS or Windows via the CTDB floating IP. Copy a file to the mountpoint.
3. Shut down the power of the node where the floating IP exists.

Actual results:
The copying process was suspended; however, we could copy other files to the mountpoint normally.
20 minutes later, everything became OK and file A resumed being copied.
No matter how many times we try, we must wait for 20 minutes.
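
To put a number on the stall when repeating the test, a small script along the following lines could be run on the client against the NFS mountpoint. This is only a sketch and not part of the original test: the function name `copy_with_stall_log`, the chunk size, and the 5-second threshold are all assumptions chosen for illustration. It copies a file chunk by chunk, forces each chunk out with `fsync`, and records every write that took longer than the threshold.

```python
import os
import sys
import time


def copy_with_stall_log(src, dst, chunk_size=1 << 20, stall_threshold=5.0):
    """Copy src to dst in chunks; return [(offset, seconds)] for slow writes.

    Hypothetical helper for illustration only. Each chunk is fsync'ed so
    that the time measured includes the round trip to the NFS server.
    """
    stalls = []
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            start = time.monotonic()
            fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())
            elapsed = time.monotonic() - start
            if elapsed >= stall_threshold:
                stalls.append((offset, elapsed))
            offset += len(chunk)
    return stalls


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python copy_stall.py <source file> <destination on the NFS mount>
    for off, secs in copy_with_stall_log(sys.argv[1], sys.argv[2]):
        print(f"write at offset {off} blocked for {secs:.1f} s")
```

Powering off the node while this runs should yield one entry whose duration approximates the stall reported here.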

Expected results:
File A can be transferred in 1 or 2 minutes.


Additional info:
Here is the ganesha log of Node-2 from the time when the floating IP moved to Node-2.

Comment 1 Soumya Koduri 2019-05-24 15:23:53 UTC

Can you please collect packet traces from all the machines (Node-1, Node-2 and especially the client machine) while repeating this test for just that single file (i.e., File A)?

Comment 3 guolei 2019-10-29 02:50:21 UTC

After I modified the following parameters, it became OK!
server.tcp-user-timeout: 3
client.tcp-user-timeout: 5

Can you explain how it works?
May I close this bug?
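
As background on the question above: the two option names suggest that they control the Linux `TCP_USER_TIMEOUT` socket option (RFC 5482) on GlusterFS server and client connections. That mapping is an assumption here, not something confirmed in this report. With `TCP_USER_TIMEOUT` set, the kernel aborts a connection once transmitted data has gone unacknowledged for the given time, instead of retransmitting until the default limit is reached (with `net.ipv4.tcp_retries2 = 15` that is roughly 15 minutes or more). When a peer loses power it sends no FIN or RST, so without such a timeout the surviving nodes may keep treating the dead connection as alive for a long time. The sketch below shows the socket option itself; the helper name `set_tcp_user_timeout` is hypothetical, and the conversion from seconds to milliseconds assumes the gluster options are expressed in seconds.

```python
import socket

# Linux-specific option. Fall back to 18, the value defined in <linux/tcp.h>,
# in case this Python build does not expose the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_tcp_user_timeout(sock, seconds):
    """Abort the connection if sent data stays unacknowledged for `seconds`.

    Hypothetical helper for illustration. The kernel expects milliseconds.
    Returns the value the kernel reports back for the option.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, int(seconds * 1000))
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 5 mirrors the client.tcp-user-timeout value quoted above.
        print("TCP_USER_TIMEOUT (ms):", set_tcp_user_timeout(s, 5))
```

Under that assumption, values of 3 and 5 would let connections to the powered-off node be dropped within seconds, which would be consistent with the 20-minute wait disappearing.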

Comment 4 Worker Ant 2020-03-12 12:48:27 UTC

This bug is moved to https://github.com/gluster/glusterfs/issues/955, and will be tracked there from now on. Visit GitHub issues URL for further details