Bug 1341768

Summary:	After setting up ganesha on RHEL 6, nodes remain in stopped state and grace-related failures are observed in pcs status
Product:	[Community] GlusterFS	Reporter:	Kaleb KEITHLEY <kkeithle>
Component:	common-ha	Assignee:	Kaleb KEITHLEY <kkeithle>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	bugs, jthottan, kkeithle, mzywusko, ndevos, rcyriac, rhinduja, skoduri, storage-qa-internal
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1341567
Clones:	1341770		Environment:
Last Closed:	2016-12-06 05:15:44 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1341567
Bug Blocks:	1341770, 1341772

Comment 1 Vijay Bellur 2016-06-01 18:51:27 UTC

REVIEW: http://review.gluster.org/14607 (common-ha: race/timing issue setting up cluster) posted (#1) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 2 Vijay Bellur 2016-06-02 15:54:51 UTC

REVIEW: http://review.gluster.org/14607 (common-ha: race/timing issue setting up cluster) posted (#2) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 3 Vijay Bellur 2016-06-03 12:50:31 UTC

REVIEW: http://review.gluster.org/14607 (common-ha: race/timing issue setting up cluster) posted (#3) for review on master by Kaleb KEITHLEY (kkeithle)

Comment 4 Vijay Bellur 2016-06-04 04:10:07 UTC

COMMIT: http://review.gluster.org/14607 committed in master by Atin Mukherjee (amukherj) 
------
commit 04b5886132ee0fe84011033cd2db08285cc75e31
Author: Kaleb S KEITHLEY <kkeithle>
Date:   Wed Jun 1 14:40:13 2016 -0400

    common-ha: race/timing issue setting up cluster
    
    The ganesha_grace resource agent can start before the ganesha_mon
    resource agent, with the result that the crm_attribute that
    ganesha_grace expects to find has not been created yet.
    
    This has never been seen with four nodes (or it is so rare that it
    was never observed during development), but with just two nodes it
    is very repeatable.
    
    Note that when long (FQDN) names are used it is not unexpected to
    see Failed Actions in the output of `pcs status`, e.g.:
    
    * nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com
    'unknown error' (1): call=20, status=complete, exitreason='none',
    last-rc-change='Wed Jun  1 12:32:32 2016', queued=0ms, exec=0ms
    * nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com
    'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Jun  1 12:32:42 2016', queued=0ms, exec=0ms
    
    and as long as all the ganesha_grace_clone and cluster_ip-1
    resource agents are in Started state then this is okay.
    
    Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8
    BUG: 1341768
    Signed-off-by: Kaleb S KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/14607
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Atin Mukherjee <amukherj>
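The race described in the commit message (ganesha_grace starting before ganesha_mon has created the crm_attribute it expects) can be illustrated with a small Python simulation. This is only a sketch of the timing problem, not the shell resource agents themselves: `AttributeStore` is a hypothetical stand-in for the Pacemaker node attribute, the attribute name `grace-active` is invented for illustration, and the bounded wait in `grace_agent_tolerant` is one assumed way to tolerate the race, not necessarily how change 14607 resolved it.

```python
import threading
import time


class AttributeStore:
    """Hypothetical stand-in for the node attribute that ganesha_mon
    creates via crm_attribute. Not the Pacemaker API."""

    def __init__(self):
        self._attrs = {}
        self._cond = threading.Condition()

    def set(self, name, value):
        with self._cond:
            self._attrs[name] = value
            self._cond.notify_all()  # wake anyone waiting for the attribute

    def get(self, name):
        with self._cond:
            return self._attrs.get(name)  # None if not created yet

    def wait_for(self, name, timeout):
        """Block until `name` exists or `timeout` seconds elapse."""
        with self._cond:
            self._cond.wait_for(lambda: name in self._attrs, timeout=timeout)
            return self._attrs.get(name)


def mon_agent(store, delay):
    # Simulated monitor agent: creates the (hypothetical) attribute late.
    time.sleep(delay)
    store.set("grace-active", "1")


def grace_agent_naive(store):
    # Reads once; sees nothing if the monitor agent has not run yet.
    return store.get("grace-active")


def grace_agent_tolerant(store, timeout=2.0):
    # Assumed mitigation: wait (bounded) for the attribute to appear.
    return store.wait_for("grace-active", timeout)


def run(grace_fn, mon_delay):
    store = AttributeStore()
    mon = threading.Thread(target=mon_agent, args=(store, mon_delay))
    mon.start()
    result = grace_fn(store)  # grace runs before mon has finished
    mon.join()
    return result


if __name__ == "__main__":
    print("naive:   ", run(grace_agent_naive, 0.2))
    print("tolerant:", run(grace_agent_tolerant, 0.2))
```

With the monitor delayed by 0.2 s, the naive reader finds no attribute, while the tolerant reader returns the value once the monitor catches up. A bounded wait keeps a genuinely missing attribute from hanging the agent forever; in that case it still reports failure after the timeout.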