Bug 1224618

Summary: Ganesha server became unresponsive after successful failover
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Apeksha <akhakhar>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: annair, asrivast, mmadhusu, mzywusko, ndevos, nlevinki, nsathyan, pasik, saujain
Target Milestone: ---
Target Release: RHGS 3.1.0
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version: glusterfs-3.7.1-2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 04:52:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1202842

Description Apeksha 2015-05-25 07:03:30 UTC
Description of problem:
Ganesha server became unresponsive after successful failover and I/O completion

Version-Release number of selected component (if applicable):
nfs-ganesha-2.2.0-0.el6.x86_64
glusterfs-3.7.0-2.el6rhs.x86_64

How reproducible:
once

Steps to Reproduce:
1. Ran I/O (a Linux kernel untar) on a 4-node ganesha cluster
2. Rebooted nfs2; failover to nfs1 happened successfully
3. I/O resumed after around 7 minutes and also completed
4. The showmount command times out on the nfs1 server, but ganesha is still running on that server
5. The showmount command is successful from the nfs3 and nfs4 servers
6. On the client, the df -h command hangs
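The timeout in step 4 can be detected without blocking the shell by bounding showmount with timeout(1), which exits with status 124 when the command is killed. A minimal sketch; `sleep 5` stands in here for a hung `showmount -e nfs1`, since the real check needs a live cluster:

```shell
# Hedged sketch of the step-4 check: bound a possibly-hung command with
# timeout(1). In a real run, replace `sleep 5` with `showmount -e nfs1`.
timeout 2 sleep 5
status=$?
if [ "$status" -eq 124 ]; then
    echo "server timed out"            # what step 4 observed on nfs1
else
    echo "server responded (exit $status)"
fi
```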


Actual results: The showmount command times out on the nfs1 server, and the df -h command hangs on the client

Expected results: The showmount command must be successful, and the df -h command on the client must not hang

Additional info:
/var/log/ganesha.log:

24/05/2015 03:24:39 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:25:54 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:29:08 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
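These repeated warnings mean ganesha's internal health check was failing even though the process was alive. A hedged sketch of counting them when triaging; it runs against an inline copy of the excerpt above, where a live check would read /var/log/ganesha.log instead:

```shell
# Count dbus_heartbeat health warnings. Sample lines copied from the log
# excerpt above; on a real node: grep -c 'Health status is unhealthy' /var/log/ganesha.log
log_excerpt='24/05/2015 03:24:39 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:25:54 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
24/05/2015 03:29:08 : epoch 555f34d6 : nfs1 : ganesha.nfsd-8909[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat'
warn_count=$(printf '%s\n' "$log_excerpt" | grep -c 'Health status is unhealthy')
echo "unhealthy heartbeat warnings: $warn_count"
```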

/var/log/messages:

May 24 11:00:10 nfs1 crmd[9408]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
May 24 11:00:10 nfs1 crmd[9408]:   notice: run_graph: Transition 176 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-107.bz2): Complete
May 24 11:00:10 nfs1 crmd[9408]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 24 11:00:10 nfs1 pengine[9406]:   notice: process_pe_message: Calculated Transition 176: /var/lib/pacemaker/pengine/pe-input-107.bz2
May 24 11:00:19 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21301:stderr [ Error: Resource does not exist. ]
May 24 11:00:31 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21458:stderr [ Error: Resource does not exist. ]
May 24 11:00:42 nfs1 lrmd[9401]:   notice: operation_finished: nfs-mon_monitor_10000:21515:stderr [ Error: Resource does not exist. ]


pcs status:
Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs1 nfs3 nfs4 ]
     Stopped: [ nfs2 ]
 nfs1-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1 
 nfs1-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1 
 nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1 
 nfs2-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs1 
 nfs3-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs3 
 nfs3-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs3 
 nfs4-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs4 
 nfs4-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs4 
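The resource list above confirms the failover: nfs2's cluster IP and trigger IP are both "Started" on nfs1. A minimal sketch of extracting the takeover node from this output; the sample line is copied from the list above, where a live check would pipe `pcs status` instead:

```shell
# Find which node now hosts the failed-over node's VIP. Sample line taken
# from the "Full list of resources" output above.
pcs_line='nfs2-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs1'
takeover_node=$(printf '%s\n' "$pcs_line" | awk '/nfs2-cluster_ip-1/ {print $NF}')
echo "nfs2 VIP now hosted by: $takeover_node"
```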

/tmp/gfapi.log:

[2015-05-22 16:19:46.448068] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-rm with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:46.869975] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-sibyte with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2015-05-22 16:19:47.209560] I [MSGID: 109036] [dht-common.c:6689:dht_log_new_layout_for_dir_selfheal] 0-vol1-dht: Setting layout of /t2/linux-2.6.31.1/arch/mips/include/asm/mach-tx39xx with [Subvol_name: vol1-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],

Comment 2 Apeksha 2015-05-25 07:11:11 UTC
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1224618/

Comment 5 Saurabh 2015-07-07 09:59:16 UTC
The Ganesha server was not unresponsive post failover, and I/O also finished post failover.

Comment 6 errata-xmlrpc 2015-07-29 04:52:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

Comment 7 Red Hat Bugzilla 2023-09-14 02:59:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days