Bug 1242358

Summary: Different epoch values for each of NFS-Ganesha heads
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Meghana <mmadhusu>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Shashank Raj <sraj>
Severity: high
Priority: high
Version: rhgs-3.1
CC: asrivast, divya, kkeithle, ndevos, nlevinki, rhinduja, sankarshan, sashinde, skoduri
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nfs-ganesha-2.3.1-5
Doc Type: Bug Fix
Doc Text:
Previously, while configuring an nfs-ganesha cluster, the nfs-ganesha processes on the nodes could come up at the same time, leaving most of them with the same epoch value. As a consequence, after a failover the NFS server sent the NFS4ERR_FHEXPIRED error instead of NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID, so NFSv4 clients were unable to recover their locks. With this fix, a new option "EPOCH_EXEC" in '/etc/sysconfig/ganesha' takes the path of a script (default: '/bin/true') used to generate the epoch value. For Gluster, a new script '/usr/libexec/ganesha/generate_epoch.py' generates the epoch value, and a new helper service 'nfs-ganesha-config' processes the init options provided in '/etc/sysconfig/ganesha' and copies the results to '/run/sysconfig/ganesha', which nfs-ganesha reads at startup. NFS-Ganesha now has a unique epoch value on each node of the cluster, resulting in smooth failover.
Story Points: ---
Clone Of:
Clones: 1317482 (view as bug list)
Environment:
Last Closed: 2016-06-23 05:32:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1188184, 1224250, 1299184, 1317482, 1317902    
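
The doc text above describes deriving a per-node epoch value at startup. A minimal sketch of such a generator follows; the scheme shown (boot time in the high 32 bits, a random component in the low 32 bits) is an assumption for illustration, and the actual '/usr/libexec/ganesha/generate_epoch.py' may differ:

```python
import random
import time

def generate_epoch():
    """Return a 64-bit epoch value: seconds since the Unix epoch in the
    high 32 bits, a random component in the low 32 bits so that nodes
    started within the same second still get distinct values
    (assumed scheme, for illustration only)."""
    now = int(time.time())
    return (now << 32) | random.getrandbits(32)

if __name__ == "__main__":
    # nfs-ganesha consumes the result via its -E command-line option,
    # e.g. a line like EPOCH="-E 6290439159191633920" in
    # /run/sysconfig/ganesha.
    print('EPOCH="-E %d"' % generate_epoch())
```

A time-based high word explains why the observed values in this bug change on every restart; the low word guards against two nodes starting in the same second.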

Description Meghana 2015-07-13 07:25:13 UTC
Description of problem:
When the epoch values are the same for all NFS-Ganesha heads, cthon (Connectathon) lock tests fail during failover.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Shashank Raj 2016-05-30 11:49:32 UTC
Verified this bug with the latest glusterfs-3.7.9-6 and nfs-ganesha-2.3.1-7 builds; below are the observations:

The epoch value in /run/sysconfig/ganesha is different on every node:

node 1: EPOCH="-E 6290437990960529408"
node 2: EPOCH="-E 6290439785305014272"
node 3: EPOCH="-E 6290439794071699456"
node 4: EPOCH="-E 6290439789269680128"

and every time the ganesha service is restarted, the value changes and remains different on all ganesha nodes (even when they are restarted at the same time):

node 1: EPOCH="-E 6290439159191633920"
node 2: EPOCH="-E 6290440949241151488"
node 3: EPOCH="-E 6290440949417902080"
node 4: EPOCH="-E 6290440961795751936"

Based on the above observation, marking this bug as Verified.
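
The uniqueness check above can be scripted: collect the epoch values from all nodes and look for duplicates. A minimal sketch using the values already gathered in this comment (collecting them over ssh in a real cluster is left out):

```shell
# Epoch values collected from the four nodes after a simultaneous restart
# (taken from the observations above).
epochs='6290439159191633920
6290440949241151488
6290440949417902080
6290440961795751936'

# Failover recovery is only correct if every node has a distinct epoch;
# `uniq -d` prints duplicated values, so empty output means all unique.
dupes=$(printf '%s\n' "$epochs" | sort | uniq -d)
if [ -z "$dupes" ]; then
    echo "all epochs unique"
else
    echo "duplicate epochs: $dupes"
fi
```

On a live cluster the same check would be run against `grep ^EPOCH /run/sysconfig/ganesha` output from each node.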

Comment 7 Divya 2016-06-13 10:39:36 UTC
Soumya,

Please review and sign off on the edited doc text.

Comment 8 Soumya Koduri 2016-06-13 12:13:37 UTC
The doc text looks good to me.

Comment 10 errata-xmlrpc 2016-06-23 05:32:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1247