Description of problem:
The Ganesha server became unresponsive after a successful failover and I/O completion.

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-0.el6.x86_64
glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Run I/O (a Linux kernel untar) on a 4-node ganesha cluster.
2. Reboot nfs2; failover happens successfully onto nfs1.
3. I/O resumes after around 7 minutes and also completes.
4. The showmount command times out against nfs1, although ganesha is still running on that server.
5. The showmount command succeeds against the nfs3 and nfs4 servers.
6. On the client, the df -h command hangs.

Actual results:
The showmount command times out on the nfs1 server, and the df -h command hangs on the client.

Expected results:
The showmount command must succeed, and the df -h command on the client must not hang.

Additional info:

/var/log/ganesha.log:

24/05/2015 03:24:39 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
24/05/2015 03:25:54 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
24/05/2015 03:29:08 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

/var/log/messages:

May 24 11:00:10 nfs1 crmd[9408]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
May 24 11:00:10 nfs1 crmd[9408]: notice: run_graph: Transition 176 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-107.bz2): Complete
May 24 11:00:10 nfs1 crmd[9408]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 24 11:00:10 nfs1 pengine[9406]: notice: process_pe_message: Calculated Transition 176: /var/lib/pacemaker/pengine/pe-input-107.bz2
May 24 11:00:19 nfs1 lrmd[9401]: notice: operation_finished: nfs-mon_monitor_10000:21301:stderr [ Error: Resource does not exist. ]
May 24 11:00:31 nfs1 lrmd[9401]: notice: operation_finished: nfs-mon_monitor_10000:21458:stderr [ Error: Resource does not exist. ]
May 24 11:00:42 nfs1 lrmd[9401]: notice: operation_finished: nfs-mon_monitor_10000:21515:stderr [ Error: Resource does not exist. ]

pcs status:

Full list of resources:
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 nfs1-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1
 nfs1-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1
 nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1
 nfs2-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1
 nfs3-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs3
 nfs3-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs3
 nfs4-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs4
 nfs4-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs4

/tmp/gfapi.log:

[2015-05-22 16:19:46.448068] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-rm with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:46.869975] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-sibyte with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:47.209560] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-tx39xx with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1224618/
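The showmount checks in steps 4-5 can be sketched as a small shell probe that distinguishes a hung server from one that merely refuses. This is a minimal sketch, not the exact commands QE ran: it assumes the standard showmount(8) client and coreutils timeout(1), and the nfs1-nfs4 hostnames are simply the cluster heads named in this report.

```shell
#!/bin/sh
# probe_host: report whether showmount on a given server answers
# within a deadline. timeout(1) kills the client if the server hangs
# (as nfs1 did here); a fast refusal or resolution failure also lands
# in the failure branch.
probe_host() {
    host=$1
    if timeout 5 showmount -e "$host" >/dev/null 2>&1; then
        echo "$host: showmount OK"
    else
        echo "$host: showmount timed out or failed"
    fi
}

# Cluster heads from this report; adjust for another setup.
for h in nfs1 nfs2 nfs3 nfs4; do
    probe_host "$h"
done
```

In the state described above, this would flag nfs1 (timeout) and nfs2 (down, rebooted) while nfs3 and nfs4 report OK.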
The Ganesha server did not become unresponsive after failover, and the I/O also completed post-failover.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html
The needinfo request[s] on this closed bug have been removed, as they had remained unresolved for 1000 days.