Bug 1242358 - Different epoch values for each of the NFS-Ganesha heads
Summary: Different epoch values for each of the NFS-Ganesha heads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Soumya Koduri
QA Contact: Shashank Raj
URL:
Whiteboard:
Depends On:
Blocks: 1188184 1224250 1299184 1317482 1317902
 
Reported: 2015-07-13 07:25 UTC by Meghana
Modified: 2016-11-08 03:52 UTC
CC List: 9 users

Fixed In Version: nfs-ganesha-2.3.1-5
Doc Type: Bug Fix
Doc Text:
Previously, while configuring an nfs-ganesha cluster, the nfs-ganesha processes on the individual nodes could come up at the same time, leaving most of them with the same epoch value. As a consequence, identical epoch values on all of the NFS-Ganesha heads caused the NFS server to return the NFS4ERR_FHEXPIRED error instead of NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID after failover, so NFSv4 clients were unable to recover their locks. With this fix, a new option "EPOCH_EXEC" is added to '/etc/sysconfig/ganesha' that takes the path of a script (default: '/bin/true') used to generate the epoch value. For Gluster, a new script '/usr/libexec/ganesha/generate_epoch.py' is added and is used to generate the epoch value. A new helper service, 'nfs-ganesha-config', processes the init options provided in '/etc/sysconfig/ganesha' and copies the results to '/run/sysconfig/ganesha', which nfs-ganesha reads at startup. As a result, NFS-Ganesha has a unique epoch value on each node of the cluster, allowing smooth failover. (An illustrative sketch of such an epoch-generation script follows the header fields below.)
Clone Of:
Clones: 1317482
Environment:
Last Closed: 2016-06-23 05:32:22 UTC
Embargoed:
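
For illustration, here is a minimal sketch of what an epoch-generation script along these lines could look like. This is a hypothetical approximation, not the shipped /usr/libexec/ganesha/generate_epoch.py; the exact value layout used by the Gluster script may differ.

#!/usr/bin/env python
# Hypothetical sketch -- not the shipped generate_epoch.py.
# Pack the current Unix time into the high 32 bits and per-node
# random entropy into the low 32 bits, so heads that start within
# the same second still get distinct epoch values.
import random
import time

def generate_epoch():
    now = int(time.time()) & 0xFFFFFFFF    # high 32 bits: start time
    salt = random.getrandbits(32)          # low 32 bits: per-node entropy
    return (now << 32) | salt

if __name__ == "__main__":
    # ganesha.nfsd takes the epoch via "-E <value>", so emit a bare integer
    print(generate_epoch())

With such a script wired in, /etc/sysconfig/ganesha would carry a line along the lines of EPOCH_EXEC="/usr/libexec/ganesha/generate_epoch.py" (the exact syntax is an assumption), and the nfs-ganesha-config helper resolves it into the EPOCH="-E <value>" entry seen in /run/sysconfig/ganesha in comment 6 below.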


Links:
System: Red Hat Bugzilla | ID: 1224250 | Private: no | Priority: high | Status: CLOSED | Summary: nfs-ganesha: cthon does not finish when failover is triggered by killing nfs-ganesha process | Last Updated: 2021-02-22 00:41:40 UTC
System: Red Hat Product Errata | ID: RHEA-2016:1247 | Private: no | Priority: normal | Status: SHIPPED_LIVE | Summary: nfs-ganesha update for Red Hat Gluster Storage 3.1 update 3 | Last Updated: 2016-06-23 09:12:43 UTC

Internal Links: 1224250

Description Meghana 2015-07-13 07:25:13 UTC
Description of problem:
When the epoch values are the same on all of the NFS-Ganesha heads, the cthon (Connectathon) lock tests fail during failover.


Comment 6 Shashank Raj 2016-05-30 11:49:32 UTC
Verified this bug with the latest glusterfs-3.7.9-6 and nfs-ganesha-2.3.1-7 builds; below are the observations:

The epoch value in /run/sysconfig/ganesha is different on each node:

node 1: EPOCH="-E 6290437990960529408"
node 2: EPOCH="-E 6290439785305014272"
node 3: EPOCH="-E 6290439794071699456"
node 4: EPOCH="-E 6290439789269680128"

Every time the ganesha service is restarted, the value changes and remains different on all ganesha nodes (even when they are restarted at the same time):

node 1: EPOCH="-E 6290439159191633920"
node 2: EPOCH="-E 6290440949241151488"
node 3: EPOCH="-E 6290440949417902080"
node 4: EPOCH="-E 6290440961795751936"

Based on the above observations, marking this bug as Verified.
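
As a side note: assuming the high 32 bits of these values encode the service start time (an assumption that matches the numbers above, not documented behavior), an observed value can be sanity-checked with a short hypothetical snippet:

# Hypothetical check -- assumes the high 32 bits of the epoch are a
# Unix timestamp of the service start; adjust if the packing differs.
import time

def decode_epoch(epoch):
    start = epoch >> 32
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(start))

print(decode_epoch(6290437990960529408))  # node 1's value above

Under that assumption, node 1's value decodes to a time on 2016-05-30 (UTC), shortly before this comment was posted, consistent with the service having been (re)started during verification.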

Comment 7 Divya 2016-06-13 10:39:36 UTC
Soumya,

Please review and sign off on the edited doc text.

Comment 8 Soumya Koduri 2016-06-13 12:13:37 UTC
The doc text looks good to me.

Comment 10 errata-xmlrpc 2016-06-23 05:32:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1247

