Bug 1224618 - Ganesha server became unresponsive after successful failover [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.1
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.0
Assigned To: Kaleb KEITHLEY
QA Contact: Saurabh
Docs Contact:
Depends On:
Blocks: 1202842
Reported: 2015-05-25 03:03 EDT by Apeksha
Modified: 2017-03-17 06:17 EDT
CC List: 9 users

See Also:
Fixed In Version: glusterfs-3.7.1-2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 00:52:37 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ndevos: needinfo? (nsathyan)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1495 normal SHIPPED_LIVE Important: Red Hat Gluster Storage 3.1 update 2015-07-29 04:26:26 EDT

Description Apeksha 2015-05-25 03:03:30 EDT
Description of problem:
The Ganesha server became unresponsive after a successful failover and I/O completion.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-0.el6.x86_64
glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
once

Steps to Reproduce:
1. Ran I/O (Linux untar process) on a 4-node Ganesha cluster.
2. Rebooted nfs2; failover to nfs1 happened successfully.
3. I/O resumed after around 7 minutes and also completed.
4. The showmount command times out on the nfs1 server, although ganesha is still running on that server.
5. The showmount command is successful from the nfs3 and nfs4 servers.
6. On the client, the df -h command hangs.


Actual results: The showmount command times out on the nfs1 server, and the df -h command hangs on the client.

Expected results: The showmount command must be successful, and the df -h command on the client must not hang.
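
The showmount check from the steps above can be scripted so that a hung server is reported rather than blocking the terminal. The following is only a minimal sketch: it assumes `showmount -e <host>` is run against each node by hostname, the function name `check_showmount` and the 10-second timeout are illustrative choices, and the `run` parameter exists only so the check can be exercised without a live cluster.

```python
import subprocess


def check_showmount(hosts, timeout=10, run=subprocess.run):
    """Run `showmount -e <host>` for each host and classify the outcome.

    Returns a dict mapping host -> "ok", "timeout", or "error".
    `run` defaults to subprocess.run; it is injectable for testing only.
    """
    results = {}
    for host in hosts:
        try:
            proc = run(
                ["showmount", "-e", host],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[host] = "ok" if proc.returncode == 0 else "error"
        except subprocess.TimeoutExpired:
            # The server accepted nothing within `timeout` seconds.
            results[host] = "timeout"
        except OSError:
            # For example, the showmount binary is not installed.
            results[host] = "error"
    return results


# Example call (hostnames taken from this report):
# check_showmount(["nfs1", "nfs3", "nfs4"])
```

With the behaviour described in this report, nfs1 would be expected to come back as "timeout" while nfs3 and nfs4 come back as "ok".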

Additional info:
/var/log/ganesha.log:

24/05/2015 03:24:39 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:25:54 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:29:08 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
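
To see how often the heartbeat warning recurs, the ganesha.log lines above can be parsed for their timestamps. This is a rough sketch that assumes the line layout shown in this report (day/month/year timestamp, then epoch, then host); the helper names `unhealthy_events` and `gaps_seconds` are made up for illustration.

```python
import re
from datetime import datetime

# Matches the ganesha.log layout quoted in this report; other versions may differ.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) : epoch (?P<epoch>\w+) : "
    r"(?P<host>\S+) : .*Health status is unhealthy"
)


def unhealthy_events(lines):
    """Return (timestamp, host) for every 'Health status is unhealthy' line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%d/%m/%Y %H:%M:%S")
            events.append((ts, m["host"]))
    return events


def gaps_seconds(events):
    """Return the seconds elapsed between consecutive events."""
    times = [ts for ts, _ in events]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Applied to the three lines quoted above, the gaps between warnings are 75 and 194 seconds.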

/var/log/messages:

May 24 11:00:10 nfs1 crmd[9408]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
May 24 11:00:10 nfs1 crmd[9408]:   notice: run_graph: Transition 176 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-107.bz2): Complete
May 24 11:00:10 nfs1 crmd[9408]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 24 11:00:10 nfs1 pengine[9406]:   notice: process_pe_message: Calculated Transition 176: /var/lib/pacemaker/pengine/pe-input-107.bz2
May 24 11:00:19 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21301:stderr [ Error: Resource does not exist. ]
May 24 11:00:31 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21458:stderr [ Error: Resource does not exist. ]
May 24 11:00:42 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21515:stderr [ Error: Resource does not exist. ]
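
The repeated lrmd errors above can be tallied per monitor operation to judge how persistent they are. The sketch below assumes the `operation_finished: <op>:<pid>:stderr [ <message> ]` layout seen in these lines; `count_monitor_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout assumed from the /var/log/messages excerpt in this report.
LRMD_RE = re.compile(
    r"operation_finished: (?P<op>[^:\s]+):(?P<pid>\d+):stderr \[ (?P<msg>.*?) \]"
)


def count_monitor_errors(lines):
    """Count stderr messages per (operation, message) pair."""
    counts = Counter()
    for line in lines:
        m = LRMD_RE.search(line)
        if m:
            counts[(m["op"], m["msg"])] += 1
    return counts
```

For the excerpt above, this groups all three lines under the operation nfs-mon_monitor_10000 with the message "Error: Resource does not exist."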


pcs status:
Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 nfs1-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1 
 nfs1-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1 
 nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1 
 nfs2-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1 
 nfs3-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs3 
 nfs3-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs3 
 nfs4-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs4 
 nfs4-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs4 
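
The resource list above shows the nfs2 resources running on nfs1 after the failover. A small parser can pick out such relocated resources from saved `pcs status` output. This sketch assumes, as the naming in this report suggests, that the part of a resource name before the first hyphen is its home node; `relocated_resources` is an illustrative name.

```python
import re

# Matches lines such as: " nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs1"
RESOURCE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+Started\s+(?P<node>\S+)"
)


def relocated_resources(status_text):
    """Return (resource, home_node, current_node) for resources not on their home node.

    Assumption: the resource name prefix before the first '-' is the home node.
    """
    moved = []
    for line in status_text.splitlines():
        m = RESOURCE_RE.match(line)
        if not m:
            continue  # clone set headers and Started/Stopped lists are skipped
        home = m["name"].split("-", 1)[0]
        if home != m["node"]:
            moved.append((m["name"], home, m["node"]))
    return moved
```

On the output above, only nfs2-cluster_ip-1 and nfs2-trigger_ip-1 are reported, both with home node nfs2 and current node nfs1.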

/tmp/gfapi.log:

[2015-05-22 16:19:46.448068] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-rm with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:46.869975] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-sibyte with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:47.209560] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-tx39xx with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
Comment 2 Apeksha 2015-05-25 03:11:11 EDT
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1224618/
Comment 5 Saurabh 2015-07-07 05:59:16 EDT
The Ganesha server remained responsive after failover, and I/O also finished after failover.
Comment 6 errata-xmlrpc 2015-07-29 00:52:37 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
