Bug 1272872

Summary: nfs-ganesha: the nfs-ganesha server is not responding even though the server is alive
Product: [Community] GlusterFS
Reporter: Saurabh <saujain>
Component: ganesha-nfs
Assignee: Soumya Koduri <skoduri>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.7.5
CC: jthottan, kkeithle, mzywusko, ndevos, skoduri
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-08 11:01:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Saurabh 2015-10-19 06:27:21 UTC
Description of problem:
I had a testsetup with nfs-ganesha running on 4 nodes with HA capabilities. 
Now, I start I/O from one of the nodes and find that the I/O stops moving ahead after some time. This is because the nfs-ganesha process is not responding, while the server process is still running, and hence the failover is also not happening.

Altogether the I/O gets stuck.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-1.el7.x86_64
nfs-ganesha-2.3-0.rc6.el7.centos.x86_64

How reproducible:
happens on the first execution

Steps to Reproduce:
1. set up a 4-node GlusterFS cluster with a 4-node nfs-ganesha front end
2. mount the volume over nfs-ganesha with vers=4 on a client
3. start running the arequal tool on the mount point (see the example commands after this list)
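
For reference, the mount and arequal invocation were along these lines (the virtual IP, volume name, mount point and exact arequal invocation below are illustrative placeholders, not the exact values from the test setup):

# mount -t nfs -o vers=4 <VIP>:/vol2 /mnt/ganesha
# arequal-checksum /mnt/ganesha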

Actual results:
step 3 result:
the I/O is stuck and nfs-ganesha is not responding;
even a showmount on the nfs-ganesha server results in an RPC timeout:
# showmount -e localhost
rpc mount export: RPC: Timed out
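
For completeness (this check is not part of the original report), the NFS program registration can be probed directly as well; with ganesha hung it would likely time out in the same way:

# rpcinfo -t localhost nfs 4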

strace on the nfs-ganesha process does not progress beyond the following call:
# strace -p 19773
Process 19773 attached
futex(0x7efde8b819d0, FUTEX_WAIT, 19803, NULL^CProcess 19773 detached
 <detached ...>
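
A fuller picture of where the process is stuck can be captured with a backtrace of all threads (assuming gdb and the matching debuginfo packages are installed on the node), e.g.:

# gdb -p 19773 -batch -ex 'thread apply all bt' > /tmp/ganesha-threads.txt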

The ganesha.log says,
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:49:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
17/10/2015 01:52:01 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
17/10/2015 03:49:17 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:50:33 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:53:48 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:55:03 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat


Finally, the failover does not happen and the I/O does not move ahead.


Expected results:
nfs-ganesha should either respond or be killed; if the process is not running, the HA capabilities can take over and the I/O can move ahead.
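
For reference, whether the HA layer has detected the failure and moved the virtual IP can be checked from any node (assuming the usual pacemaker/pcs based nfs-ganesha HA setup) with:

# pcs status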

Additional info:

Comment 3 Kaushal 2017-03-08 11:01:40 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.