Description of problem:
In case of failover/failback, there could be situations wherein the VIP fails over within the glusterfs ping timeout. Since the gluster server may not have cleaned up the earlier lock state, the reclaim of the locks by the clients (if any) would fail.
To fix this, we may need to increase the nfs-ganesha service monitor interval to at least twice the glusterfs ping timeout value.
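As a rough illustration of the sizing rule above, the following Python sketch derives the smallest monitor interval from a given ping timeout. The function name and the factor parameter are hypothetical and not part of any gluster or pacemaker API; the value 42 is only an example input, so substitute whatever network.ping-timeout is actually set to on the volume.

```python
def min_monitor_interval(ping_timeout_s: int, factor: int = 2) -> int:
    """Return the smallest nfs-ganesha monitor interval (in seconds) that
    satisfies the 'at least twice the ping timeout' rule from this report.

    Illustrative helper only; 'factor' defaults to 2 as suggested above.
    """
    if ping_timeout_s <= 0:
        raise ValueError("ping timeout must be positive")
    return factor * ping_timeout_s


if __name__ == "__main__":
    # 42 is an example input; read the real value from the volume options.
    print(min_monitor_interval(42))
```

Keeping the factor as a parameter makes it easy to try a larger safety margin if twice the ping timeout turns out to be too tight.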
Modified the DocText a little bit. Soumya, was there not something with a grace time in the brick processes? I thought the IP failover needed to be:
2 x network.ping-timeout
1 x grace timeout for releasing the locks
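If the breakdown above is right, the delay before the VIP moves would add up as sketched below. This is only arithmetic on the two terms listed; the helper name is hypothetical, and any values passed in are assumptions rather than measured or documented defaults.

```python
def failover_delay_budget(ping_timeout_s: float, grace_timeout_s: float) -> float:
    """Sum the two terms proposed in this comment (all values in seconds)."""
    # 2 x network.ping-timeout
    ping_part = 2 * ping_timeout_s
    # 1 x grace timeout for releasing the locks
    return ping_part + grace_timeout_s


if __name__ == "__main__":
    # Example inputs only: a 42 s ping timeout and a 10 s grace timeout.
    print(failover_delay_budget(42, 10))
```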
As soon as the network ping times out, the GlusterFS process starts flushing the locks of all the fds opened by that client. Assuming that the locks get flushed within another network.ping-timeout interval, we recommend having the monitor script kick in after 2*network.ping-timeout. Is that a safe assumption?
Btw, the grace period of the NFS server is included in the total failover time the NFS clients see before I/O resumes (this has to be documented as part of BZ#1257545).
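The assumption in question can be written down as a simple timing check. The sketch below is hypothetical: flush_duration_s stands for however long the brick actually takes to flush the locks once the ping timeout has fired, which this thread does not quantify.

```python
def locks_flushed_before_monitor(ping_timeout_s: float,
                                 flush_duration_s: float,
                                 monitor_delay_s: float) -> bool:
    """Return True if the lock flush finishes before the monitor acts.

    All times are in seconds, measured from the moment the client
    connection is lost.
    """
    # The brick starts flushing locks when the ping timeout expires.
    flush_done_at = ping_timeout_s + flush_duration_s
    # Reclaim can only succeed if the monitor acts after the flush is done.
    return monitor_delay_s >= flush_done_at
```

With monitor_delay_s set to 2*network.ping-timeout, the check holds exactly when the flush takes no longer than one ping timeout, which is the assumption being asked about.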
Please review the edited doc text and sign off to be included in the Known Issues chapter.
A small correction is needed in the statement below:
'After an IP failover, some locks may get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients fails'
It should read:
'After an IP failover, some locks may not get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients may fail'
Updated the doc text as per Comment 6 (https://bugzilla.redhat.com/show_bug.cgi?id=1257548#c6)
Will address this as part of multi-protocol effort
This should be addressed as part of Lock reclaim support in GlusterFS - https://review.gluster.org/#/c/14986/
Increasing the monitor interval (as originally described in the bug) will increase the failover/failback time. It's tricky to decide how, and to what values, these intervals should be configured, and the right values may vary from time to time. So I suggest we note down such recommendations in the admin guide (nfs-ganesha troubleshooting section). Hence converting this BZ to the doc component.
See also: bug 1608899
Doc link: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/#nfs_ganesha