Bug 1341772 - After setting up ganesha on RHEL 6, nodes remains in stopped state and grace related failures observed in pcs status
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: common-ha
Version: 3.7.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact:
URL:
Whiteboard:
Depends On: 1341567 1341768 1341770
Blocks:
Reported: 2016-06-01 18:02 UTC by Kaleb KEITHLEY
Modified: 2016-07-20 13:55 UTC (History)

Fixed In Version: glusterfs-3.7.13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1341770
Environment:
Last Closed: 2016-07-19 10:27:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Vijay Bellur 2016-06-01 20:46:08 UTC
REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#1) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)

Comment 2 Vijay Bellur 2016-06-02 15:59:11 UTC
REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#2) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)

Comment 3 Vijay Bellur 2016-06-03 12:52:23 UTC
REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#3) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)

Comment 4 Vijay Bellur 2016-06-24 16:42:17 UTC
COMMIT: http://review.gluster.org/14610 committed in release-3.7 by Kaleb KEITHLEY (kkeithle) 
------
commit 6a9a48e4d70a56167c0f1e8432bba9050264ab97
Author: Kaleb S KEITHLEY <kkeithle>
Date:   Wed Jun 1 16:43:12 2016 -0400

    common-ha: race/timing issue setting up cluster
    
    The ganesha_grace resource agent can start before the ganesha_mon
    resource agent, with the result that the crm_attribute that
    ganesha_grace expects to find has not been created yet.
    
    This is never (never? Or just so rarely that it has never actually
    been seen during development) seen with four nodes, but with just
    two nodes it's very repeatable.
    
    Note that when long (FQDN) names are used it is not unexpected to
    see Failed Actions in the output of `pcs status`, e.g.:
    
    * nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com
    'unknown error' (1): call=20, status=complete, exitreason='none',
    last-rc-change='Wed Jun  1 12:32:32 2016', queued=0ms, exec=0ms
    * nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com
    'unknown error' (1): call=18, status=complete, exitreason='none',
    last-rc-change='Wed Jun  1 12:32:42 2016', queued=0ms, exec=0ms
    
    and as long as all the ganesha_grace_clone and cluster_ip-1
    resource agents are in Started state then this is okay.
    
    backport master:
    > http://review.gluster.org/14607
    > BUG: 1341768
    release-3.8
    > http://review.gluster.org/14609
    > BUG: 1341770
    
    Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8
    BUG: 1341772
    Signed-off-by: Kaleb S KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/14610
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
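
The race described in the commit message above (one agent reading an attribute before another agent has created it) is commonly handled by polling with a bounded number of retries. The following is only a minimal sketch of that general pattern, not the actual patch from review 14610: the real resource agents are shell scripts, and the names `wait_for_attribute` and `make_delayed_lookup`, the retry count, and the delay are all hypothetical. The lookup is passed in as a callable so the sketch runs without a Pacemaker cluster.

```python
import time
from typing import Callable, Optional


def wait_for_attribute(
    lookup: Callable[[], Optional[str]],
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll lookup() until it returns a non-empty value or retries run out.

    Hypothetical helper: lookup stands in for whatever query the consumer
    (here, the grace agent) uses to read the attribute that the producer
    (here, the monitor agent) is expected to create.
    """
    for attempt in range(retries):
        value = lookup()
        if value:
            return value
        # Do not sleep after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay)
    return None


def make_delayed_lookup(ready_after: int, value: str) -> Callable[[], Optional[str]]:
    """Simulate a producer that publishes the attribute on the Nth poll."""
    calls = {"n": 0}

    def lookup() -> Optional[str]:
        calls["n"] += 1
        return value if calls["n"] >= ready_after else None

    return lookup


if __name__ == "__main__":
    # The attribute appears on the third poll, so the wait succeeds.
    found = wait_for_attribute(make_delayed_lookup(3, "1"), retries=5, delay=0.0)
    print("attribute value:", found)

    # The attribute never appears within the retry budget, so the caller
    # must decide how to report the failure.
    missing = wait_for_attribute(make_delayed_lookup(10, "1"), retries=3, delay=0.0)
    print("attribute value:", missing)
```

A bounded retry keeps the start action from failing on the first miss while still returning control if the attribute never appears. The appropriate retry budget depends on how long the producing agent takes to start, which the report does not specify.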

Comment 5 Kaushal 2016-07-20 13:55:16 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.13, please open a new bug report.

glusterfs-3.7.13 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-July/027604.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

