Bug 1242358

Summary: Different epoch values for each of NFS-Ganesha heads
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Meghana <mmadhusu>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Shashank Raj <sraj>
Severity: high
Priority: high
Version: rhgs-3.1
CC: asrivast, divya, kkeithle, ndevos, nlevinki, rhinduja, sankarshan, sashinde, skoduri
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: nfs-ganesha-2.3.1-5
Doc Type: Bug Fix
Doc Text:
Previously, while configuring an nfs-ganesha cluster, the nfs-ganesha processes on the nodes could come up at the same time, leaving most of them with the same epoch value. As a consequence, after a failover the NFS server sent the NFS4ERR_FHEXPIRED error instead of NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID, so NFSv4 clients were unable to recover their locks. With this fix, a new option "EPOCH_EXEC" in '/etc/sysconfig/ganesha' takes the path of a script (default: '/bin/true') used to generate the epoch value. For Gluster, a new script '/usr/libexec/ganesha/generate_epoch.py' generates the epoch value, and a new helper service 'nfs-ganesha-config' processes the init options provided in '/etc/sysconfig/ganesha' and copies the results to '/run/sysconfig/ganesha', which nfs-ganesha reads at startup. NFS-Ganesha now has a unique epoch value on each node of the cluster, resulting in smooth failover.
Story Points: ---
Clone Of:
Clones: 1317482 (view as bug list)
Environment:
Last Closed: 2016-06-23 05:32:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1188184, 1224250, 1299184, 1317482, 1317902    
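
The doc text above describes deriving a per-node epoch value at startup. A minimal sketch of such a generator follows; the scheme shown (boot time in the high 32 bits, a random component in the low 32 bits) is an assumption for illustration, and the actual '/usr/libexec/ganesha/generate_epoch.py' may differ:

```python
import random
import time

def generate_epoch():
    """Return a 64-bit epoch value: seconds since the Unix epoch in the
    high 32 bits, a random component in the low 32 bits so that nodes
    started within the same second still get distinct values
    (assumed scheme, for illustration only)."""
    now = int(time.time())
    return (now << 32) | random.getrandbits(32)

if __name__ == "__main__":
    # nfs-ganesha consumes the result via its -E command-line option,
    # e.g. a line like EPOCH="-E 6290439159191633920" in
    # /run/sysconfig/ganesha.
    print('EPOCH="-E %d"' % generate_epoch())
```

A time-based high word explains why the observed values in this bug change on every restart; the low word guards against two nodes starting in the same second.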

Description Meghana 2015-07-13 07:25:13 UTC
Description of problem:
When the epoch values are the same for all NFS-Ganesha heads, cthon (Connectathon) lock tests fail during failover.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 6 Shashank Raj 2016-05-30 11:49:32 UTC
Verified this bug with the latest glusterfs-3.7.9-6 and nfs-ganesha-2.3.1-7 builds; below are the observations:

The epoch value in /run/sysconfig/ganesha is different on every node:

node 1: EPOCH="-E 6290437990960529408"
node 2: EPOCH="-E 6290439785305014272"
node 3: EPOCH="-E 6290439794071699456"
node 4: EPOCH="-E 6290439789269680128"

and every time the ganesha service is restarted, the value changes and remains different on all ganesha nodes (even when they are restarted at the same time):

node 1: EPOCH="-E 6290439159191633920"
node 2: EPOCH="-E 6290440949241151488"
node 3: EPOCH="-E 6290440949417902080"
node 4: EPOCH="-E 6290440961795751936"

Based on the above observation, marking this bug as Verified.
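
The uniqueness check above can be scripted: collect the epoch values from all nodes and look for duplicates. A minimal sketch using the values already gathered in this comment (collecting them over ssh in a real cluster is left out):

```shell
# Epoch values collected from the four nodes after a simultaneous restart
# (taken from the observations above).
epochs='6290439159191633920
6290440949241151488
6290440949417902080
6290440961795751936'

# Failover recovery is only correct if every node has a distinct epoch;
# `uniq -d` prints duplicated values, so empty output means all unique.
dupes=$(printf '%s\n' "$epochs" | sort | uniq -d)
if [ -z "$dupes" ]; then
    echo "all epochs unique"
else
    echo "duplicate epochs: $dupes"
fi
```

On a live cluster the same check would be run against `grep ^EPOCH /run/sysconfig/ganesha` output from each node.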

Comment 7 Divya 2016-06-13 10:39:36 UTC
Soumya,

Please review and sign off on the edited doc text.

Comment 8 Soumya Koduri 2016-06-13 12:13:37 UTC
The doc text looks good to me.

Comment 10 errata-xmlrpc 2016-06-23 05:32:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1247