Bug 1257548 - nfs-ganesha service monitor period interval should be at least twice the gluster ping timeout
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Soumya Koduri
QA Contact: storage-qa-internal@redhat.com
Keywords: FutureFeature, RFE, ZStream
Depends On:
Blocks: 1255689
Reported: 2015-08-27 06:12 EDT by Soumya Koduri
Modified: 2018-08-12 11:25 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
The nfs-ganesha service monitor script, which triggers IP failover, runs periodically every 10 seconds. The ping-timeout of the GlusterFS server (after which the locks of an unreachable client get flushed) is 42 seconds by default. After an IP failover, some locks may not get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients may fail.

Workaround (if any): It is recommended to set the nfs-ganesha service monitor period interval (default 10 seconds) to at least twice the Gluster server ping-timeout (default 42 seconds). Hence, either decrease the network ping-timeout using the following command:

  # gluster volume set <volname> network.ping-timeout <ping_timeout_value>

or increase the nfs-service monitor interval time using the following commands:

  # pcs resource op remove nfs-mon monitor
  # pcs resource op add nfs-mon monitor interval=<interval_period_value> timeout=<timeout_value>
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Soumya Koduri 2015-08-27 06:12:55 EDT
Description of problem:

In case of failover/failback, there could be situations wherein the VIP fails over within the glusterfs ping timeout. Since the gluster server may not have cleaned up the earlier lock state, the reclaim of the locks by the clients (if any) would fail.

To fix this, we may need to increase the nfs-ganesha service monitor interval to at least twice the glusterfs ping-timeout value.
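
For illustration, here is how the two workaround knobs work out under the defaults (the volume name "vol0" and the exact values below are assumptions for the example, not values mandated by this bug). With the default monitor interval of 10 seconds, the ping timeout would have to drop to 5 seconds or less:

  # gluster volume set vol0 network.ping-timeout 5

Alternatively, keeping the default 42-second ping timeout, the monitor interval has to grow to at least 2 x 42s = 84s (rounded up to 90s here; the timeout value is likewise illustrative):

  # pcs resource op remove nfs-mon monitor
  # pcs resource op add nfs-mon monitor interval=90s timeout=20s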
Comment 2 Niels de Vos 2015-09-15 09:25:18 EDT
Modified the DocText a little bit. Soumya, was there not something with a grace time in the brick processes? I thought the IP-failover needed to be

    2 x network.ping-timeout
  + 1 x grace timeout for releasing the locks
  -------------------------------------------
    (total)
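
For a rough sense of scale, assuming nfs-ganesha's commonly cited default NFSv4 grace period of 90 seconds (an assumption; this bug does not state the grace value), that works out to:

  2 x 42s + 1 x 90s = 174s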
Comment 3 Soumya Koduri 2015-09-15 09:54:57 EDT
As soon as the network ping times out, the glusterFS process starts flushing the locks on all the fds opened by that client. Assuming that the locks get flushed within another network.ping-timeout interval, we are recommending that the monitor script kick in only after 2*network.ping-timeout. Is it safe to make that assumption?
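
Spelled out as a timeline under the default 42-second ping timeout (the completion point at 84s is exactly the assumption being questioned here):

  t = 0s    client holding locks becomes unreachable
  t = 42s   network.ping-timeout expires; the brick starts flushing that client's locks
  t = 84s   locks assumed fully flushed; monitor-triggered failover is safe from here on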
Comment 4 Soumya Koduri 2015-09-15 09:57:13 EDT
Btw, the grace period of the NFS server gets included in the total failover time seen by the NFS clients before the I/O gets going again (which has to be documented as part of BZ#1257545).
Comment 5 Anjana Suparna Sriram 2015-09-18 03:45:08 EDT
Hi Soumya,

Please review the edited doc text and sign off to be included in the Known Issues chapter.

Regards,
Anjana
Comment 6 Soumya Koduri 2015-09-18 04:48:21 EDT
A small correction is needed in the statement below:

' After an IP failover, some locks may get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients fails'

to

'After an IP failover, some locks may not get cleaned by the GlusterFS server process, hence reclaiming the lock state by NFS clients may fail'
Comment 7 Anjana Suparna Sriram 2015-09-18 05:59:22 EDT
Updated the doc text as per comment 6 (https://bugzilla.redhat.com/show_bug.cgi?id=1257548#c6).
Comment 8 Soumya Koduri 2016-01-28 06:08:20 EST
Will address this as part of the multi-protocol effort.
Comment 9 Soumya Koduri 2017-05-03 08:13:56 EDT
This should be addressed as part of Lock reclaim support in GlusterFS - https://review.gluster.org/#/c/14986/
