Bug 1341567 - After setting up ganesha on RHEL 6, nodes remain in the Stopped state and grace-related failures are observed in pcs status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Kaleb KEITHLEY
QA Contact: Shashank Raj
URL:
Whiteboard:
Depends On:
Blocks: 1311817 1341768 1341770 1341772
 
Reported: 2016-06-01 10:02 UTC by Shashank Raj
Modified: 2016-11-08 03:52 UTC (History)
8 users

Fixed In Version: glusterfs-3.7.9-8
Doc Type: Bug Fix
Doc Text:
When fully qualified domain names are used, the output of pcs status is expected to contain Failed Actions, for example:

Failed Actions:
* nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com 'unknown error' (1): call=20, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms
* nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com 'unknown error' (1): call=18, status=complete, exitreason='none', last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms

This is normal as long as all the nfs-grace-clone and cluster_ip-1 resource agents are in the Started state.
Clone Of:
Clones: 1341768
Environment:
Last Closed: 2016-06-23 05:25:23 UTC
Embargoed:


Attachments: none


Links
Red Hat Product Errata RHBA-2016:1240 (SHIPPED_LIVE): Red Hat Gluster Storage 3.1 Update 3, last updated 2016-06-23 08:51:28 UTC

Description Shashank Raj 2016-06-01 10:02:11 UTC
Description of problem:

After setting up ganesha on RHEL 6, the nodes remain in the Stopped state and grace-related failures are observed in pcs status.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-7
nfs-ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:

1. Set up ganesha on the RHEL 6 platform.
2. Observe that after setup, the nodes of the cluster remain in the Stopped state and failures related to nfs-grace_start are observed in pcs status.
3. The assigned VIPs are not pingable, so the volume cannot be mounted through a VIP.
4. The following error messages are seen in /var/log/messages:

Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp42-33.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp42-33.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp43-119.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp43-119.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:   notice: Start   dhcp43-119.lab.eng.blr.redhat.com-cluster_ip-1 (dhcp43-119.lab.eng.blr.redhat.com - blocked)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:   notice: Start   dhcp42-33.lab.eng.blr.redhat.com-cluster_ip-1 (dhcp42-33.lab.eng.blr.redhat.com - blocked)
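The pengine warnings above show that Pacemaker has forced nfs-grace-clone away from both nodes after the fail count hit the migration threshold (1000000 is Pacemaker's INFINITY). A minimal sketch of extracting the affected node names from such log lines; the sample text is taken from this report, and the sed pattern and log path are assumptions, not part of the original report:

```shell
#!/bin/sh
# Sketch: list nodes that Pacemaker has forced nfs-grace-clone away from.
# Uses a sample of the log lines above; on a real node you would read
# /var/log/messages (path varies by distribution) instead.
sample_log='Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp42-33.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)
Jun  1 20:54:08 dhcp43-119 pengine[22877]:  warning: Forcing nfs-grace-clone away from dhcp43-119.lab.eng.blr.redhat.com after 1000000 failures (max=1000000)'

# Extract the node name from each warning line, deduplicated.
printf '%s\n' "$sample_log" \
  | sed -n 's/.*Forcing nfs-grace-clone away from \([^ ]*\) after.*/\1/p' \
  | sort -u
# Prints:
# dhcp42-33.lab.eng.blr.redhat.com
# dhcp43-119.lab.eng.blr.redhat.com
```

On a live cluster the same condition can be confirmed with `pcs status` (the failed resources show in the Failed Actions section) and cleared with `pcs resource cleanup` once the underlying cause is fixed.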


Actual results:

After setting up ganesha on RHEL 6, the nodes remain in the Stopped state and grace-related failures are observed in pcs status.

Expected results:

Setup should be successful and no failures should be observed.

Additional info:

Comment 2 Shashank Raj 2016-06-01 10:08:17 UTC
sosreports are placed at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1341567

Comment 6 Kaleb KEITHLEY 2016-06-01 15:48:27 UTC
This appears to be a timing issue, i.e. a race condition.

The ganesha_grace RA's start method runs before the ganesha_mon RA's monitor method has set the attributes that the grace start method queries.
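One common way resource agents handle this kind of ordering race is to poll for the expected attribute with a bounded retry loop instead of failing on the first missing read. A minimal, self-contained sketch of that pattern; `query_grace_attr` is a hypothetical stand-in for the real crm_attribute/attrd_updater query (simulated here so the script runs without a Pacemaker cluster), and the retry counts are illustrative, not taken from the actual patch:

```shell
#!/bin/sh
# Sketch: bounded retry while waiting for an attribute another RA sets.
# query_grace_attr is a hypothetical stand-in for a real
# crm_attribute/attrd_updater query; here it succeeds on the 3rd call
# to simulate the attribute appearing after a short delay.
ATTEMPTS=0
query_grace_attr() {
    ATTEMPTS=$((ATTEMPTS + 1))
    [ "$ATTEMPTS" -ge 3 ]   # "attribute present" from the 3rd try onward
}

tries=0
max_tries=10
until query_grace_attr; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
        echo "grace attribute never appeared" >&2
        exit 1
    fi
    sleep 0   # a real agent would sleep ~1s between polls
done
echo "grace attribute found after $ATTEMPTS query(ies)"
```

The point of the bounded loop is that a transiently missing attribute (monitor has not run yet) is tolerated, while a genuinely absent one still fails the start after max_tries.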

Comment 7 Atin Mukherjee 2016-06-02 06:13:24 UTC
Upstream mainline patch: http://review.gluster.org/#/c/14607, posted for review.

Comment 9 Kaleb KEITHLEY 2016-06-02 10:06:56 UTC
The downstream patch is probably more relevant here.

https://code.engineering.redhat.com/gerrit/75679

And yes, I want the doc text here.

Comment 11 Shashank Raj 2016-06-04 07:12:00 UTC
Verified this bug with the latest glusterfs-3.7.9-8 build and it is working as expected.

On both the RHEL 6 and RHEL 7 platforms, after setting up ganesha, no grace-related failures are seen and all the nodes show in the Started state.

Based on the above observation, marking this bug as Verified.

Comment 13 Kaleb KEITHLEY 2016-06-09 09:59:43 UTC
Okay, yes, documenting it as a known issue is fine.

Comment 15 errata-xmlrpc 2016-06-23 05:25:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

