Bug 1257548 - nfs-ganesha service monitor period interval should be at least twice the gluster ping timeout
Summary: nfs-ganesha service monitor period interval should be at least twice the gluster ping timeout
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1255689
 
Reported: 2015-08-27 10:12 UTC by Soumya Koduri
Modified: 2019-05-20 12:40 UTC
CC: 10 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The nfs-ganesha service monitor script, which triggers IP failover, runs periodically every 10 seconds. The ping timeout of the GlusterFS server (after which the locks of an unreachable client get flushed) is 42 seconds by default. After an IP failover, some locks may not get cleaned up by the GlusterFS server process, hence reclaiming the lock state by NFS clients may fail.

Workaround (if any): It is recommended to set the nfs-ganesha service monitor period interval (default 10 sec) to at least twice the Gluster server ping timeout (default 42 sec). Hence, either decrease the network ping timeout using the following command:

# gluster volume set <volname> network.ping-timeout <ping_timeout_value>

or increase the nfs-mon monitor interval using the following commands:

# pcs resource op remove nfs-mon monitor
# pcs resource op add nfs-mon monitor interval=<interval_period_value> timeout=<timeout_value>
Clone Of:
Environment:
Last Closed: 2019-05-20 12:40:29 UTC
Embargoed:



Description Soumya Koduri 2015-08-27 10:12:55 UTC
Description of problem:

In case of failover/failback, there can be situations wherein the VIP fails over within the glusterfs ping timeout. Since the gluster server may not yet have cleaned up the earlier lock state, the reclaim of those locks by the clients (if any) would fail.

To fix this, we may need to increase the nfs-ganesha service monitor interval to at least twice the glusterfs ping timeout value.
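The recommendation above can be sketched as a small shell calculation. The values are the defaults mentioned in this bug (42 sec ping timeout, 10 sec monitor interval); the gluster and pcs commands are shown commented out since they require a live cluster, and the resource name nfs-mon comes from the doc text:

```shell
#!/bin/sh
# Default gluster network.ping-timeout, in seconds (per this bug).
PING_TIMEOUT=42
# Recommended nfs-mon monitor interval: at least twice the ping timeout.
MONITOR_INTERVAL=$((2 * PING_TIMEOUT))
echo "recommended monitor interval: ${MONITOR_INTERVAL}s"

# To apply on a live cluster (not run here), either lower the ping timeout:
#   gluster volume set <volname> network.ping-timeout <ping_timeout_value>
# or raise the monitor interval:
#   pcs resource op remove nfs-mon monitor
#   pcs resource op add nfs-mon monitor interval=${MONITOR_INTERVAL}s timeout=10s
```

Note the trade-off discussed later in this bug: raising the monitor interval also lengthens the overall failover/failback time seen by clients.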

Comment 2 Niels de Vos 2015-09-15 13:25:18 UTC
Modified the DocText a little bit. Soumya, was there not something with a grace time in the brick processes? I thought the IP-failover needed to be

  2 x network.ping-timeout
  1 x grace timeout for releasing the locks
  ------------------------------------------
  (total)

Comment 3 Soumya Koduri 2015-09-15 13:54:57 UTC
As soon as the network ping times out, glusterFS process starts flushing the locks of all the fds opened by that client. Assuming that the locks get flushed within another network.ping-timeout value, we are recommending to have monitor script pitch in after 2*network.ping-timeout. Is it safe to have that assumption?

Comment 4 Soumya Koduri 2015-09-15 13:57:13 UTC
Btw the grace period of the NFS server gets included in the total failover time seen by the NFS clients to get back the I/O going (which has to be documented as part of BZ#1257545)

Comment 5 Anjana Suparna Sriram 2015-09-18 07:45:08 UTC
Hi Soumya,

Please review the edited doc text and sign off to be included in the Known Issues chapter.

Regards,
Anjana

Comment 6 Soumya Koduri 2015-09-18 08:48:21 UTC
A small correction needed in the below statement 

' After an IP failover, some locks may get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients fails'

to

'After an IP failover, some locks may not get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients may fail'

Comment 7 Anjana Suparna Sriram 2015-09-18 09:59:22 UTC
Updated the doc text as per Comment 6 (https://bugzilla.redhat.com/show_bug.cgi?id=1257548#c6)

Comment 8 Soumya Koduri 2016-01-28 11:08:20 UTC
Will address this as part of multi-protocol effort

Comment 9 Soumya Koduri 2017-05-03 12:13:56 UTC
This should be addressed as part of Lock reclaim support in GlusterFS - https://review.gluster.org/#/c/14986/

Comment 13 Soumya Koduri 2019-05-06 12:42:57 UTC
Increasing the monitor interval (as originally described in the bug) will increase the failover/failback time. It's tricky to decide how and to what values these intervals should be configured, and that may vary from time to time. So I suggest we note down such recommendations in the admin guide (nfs-ganesha troubleshooting section). Hence converting this BZ to the doc component.


See also: bug 1608899
Doc link: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/#nfs_ganesha

