Bug 1272872 - nfs-ganesha: the nfs-ganesha server is not responding even though the server is alive
Summary: nfs-ganesha: the nfs-ganesha server is not responding even though the server is alive
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: ganesha-nfs
Version: 3.7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-19 06:27 UTC by Saurabh
Modified: 2017-03-08 11:01 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 11:01:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments: (none)

Description Saurabh 2015-10-19 06:27:21 UTC
Description of problem:
I had a test setup with nfs-ganesha running on 4 nodes with HA enabled.
I started I/O from one of the nodes and found that after some time the I/O stops moving ahead. This is because the nfs-ganesha process stops responding, even though the server process is still running, and hence the failover does not happen either.

Altogether the I/O gets stuck.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-1.el7.x86_64
nfs-ganesha-2.3-0.rc6.el7.centos.x86_64

How reproducible:
Happens on the first execution.

Steps to Reproduce:
1. Set up a 4-node GlusterFS cluster with nfs-ganesha configured on all 4 nodes as the front end.
2. Mount the volume over nfs-ganesha with vers=4 on a client.
3. Start executing the arequal tool on the mount point (a rough command sketch follows below).
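
A rough, illustrative sketch of steps 2-3 from the client side; <vip> is a placeholder for one of the cluster's virtual IPs, the export path is assumed to be '/' as in the log below, and the arequal invocation is indicative only (option syntax depends on the arequal version):

# mount -t nfs -o vers=4 <vip>:/ /mnt/ganesha
# arequal-checksum -p /mnt/ganesha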

Actual results:
For step 3, the I/O is stuck and nfs-ganesha is not responding. Even a showmount against the nfs-ganesha server results in an RPC timeout:
# showmount -e localhost
rpc mount export: RPC: Timed out
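
To confirm that the service is registered but simply not answering RPCs, a probe such as the following could also be tried (assuming ganesha has registered NFS with rpcbind; on a healthy server this prints "program 100003 version 4 ready and waiting", while in this state it is expected to time out just like showmount):

# rpcinfo -t localhost nfs 4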

strace on the nfs-ganesha process does not progress beyond a futex wait:
# strace -p 19773
Process 19773 attached
futex(0x7efde8b819d0, FUTEX_WAIT, 19803, NULL^CProcess 19773 detached
 <detached ...>
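
Since strace only shows the main thread parked in futex(), a per-thread backtrace would be more useful for locating where the worker threads are stuck. If gdb (or its gstack wrapper) is available on the node, something like the following could be captured, using the PID from above:

# gstack 19773 > /tmp/ganesha-threads.txt
# gdb -p 19773 -batch -ex "thread apply all bt" > /tmp/ganesha-threads.txt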

The ganesha.log says,
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
17/10/2015 01:48:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
17/10/2015 01:49:58 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
17/10/2015 01:52:01 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
17/10/2015 03:49:17 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:50:33 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:53:48 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
17/10/2015 03:55:03 : epoch 56215bb1 : vm1 : ganesha.nfsd-19773[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
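
The dbus_heartbeat warnings above show that ganesha's own health check already considers the server unhealthy. To see whether the DBus admin interface itself still answers (a hang here would confirm the daemon is wedged beyond just the NFS path), a call like the following could be tried; the destination, object path and method names are assumed to be the standard ones shipped with nfs-ganesha 2.3:

# dbus-send --system --print-reply --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.ShowExports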


As a result, the failover does not happen and the I/O does not move ahead.


Expected results:
nfs-ganesha should either respond or be killed; when the process is not running, the HA mechanism can take over and the I/O can move ahead.
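
As an illustration of the expected behaviour, the failover path could be exercised manually on the stuck node, assuming the standard pacemaker-based ganesha-ha setup where cluster state is visible via pcs:

# pkill -KILL ganesha.nfsd    # force the hung daemon down so the cluster monitor notices
# pcs status                  # the virtual IP held by this node should move to a surviving node

Once the VIP moves, the client I/O is expected to resume after the grace period.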

Additional info:

Comment 3 Kaushal 2017-03-08 11:01:40 UTC
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

