Bug 1336947

Summary: [NFS-Ganesha] : stonith-enabled option not set with new versions of cman,pacemaker,corosync and pcs
Product: [Community] GlusterFS Reporter: Kaleb KEITHLEY <kkeithle>
Component: common-ha Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.8.0 CC: asoman, bugs, jthottan, kgaillot, kkeithle, ndevos, nlevinki, skoduri, sraj, storage-qa-internal
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8rc2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1336945
: 1336948 Environment:
Last Closed: 2016-06-16 14:06:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1334092, 1336945    
Bug Blocks: 1336948    

Comment 1 Vijay Bellur 2016-05-17 21:39:50 UTC
REVIEW: http://review.gluster.org/14405 (common-ha: stonith-enabled option set error in new pacemaker) posted (#1) for review on release-3.8 by Kaleb KEITHLEY (kkeithle)

Comment 2 Vijay Bellur 2016-05-19 10:17:23 UTC
COMMIT: http://review.gluster.org/14405 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit 8603b0ae4e82d5d8f3ad129c490a0c717aaadcb4
Author: Kaleb S KEITHLEY <kkeithle>
Date:   Tue May 17 17:35:02 2016 -0400

    common-ha: stonith-enabled option set error in new pacemaker
    
    Setting the option too early results in an error in newer versions
    of pacemaker. Postpone setting the option in order for it to succeed.
    
    N.B. We do not use a fencing agent. Yes, we know this is "not supported."
    
    Backport from mainline
    > http://review.gluster.org/#/c/14404/
    > BUG: 1336945
    > Change-Id: I86953fdd67e6736294dbd2d0795611837188bd9d
    
    Change-Id: I402992bcb90a92dbcc915a75fe03b25221625e98
    BUG: 1336947
    Signed-off-by: Kaleb S KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/14405
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 3 Ken Gaillot 2016-05-19 14:50:42 UTC
Your "too early" comment finally rang a bell: what you're seeing is an unfortunately necessary side effect of a fix for an issue in 6.7. The situation was discussed in BZ#1320740.

The bottom line is, yes, you have to wait a short while after startup before querying or writing to the CIB in RHEL 6.8, so the cluster has time to elect a DC. The easiest way is to loop and retry if the option change fails, or to loop until "crmadmin --dc_lookup --timeout=5000" exits 0 before attempting the option change.
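The retry suggested above can be sketched as a small POSIX-sh helper. The function name `wait_for_dc` and the 30-attempt bound are assumptions made for illustration; the `crmadmin` and `pcs` invocations are the ones quoted in this report.

```shell
#!/bin/sh
# wait_for_dc CHECK_CMD [MAX_TRIES]: retry CHECK_CMD until it succeeds,
# sleeping 1s between attempts; give up after MAX_TRIES attempts.
wait_for_dc() {
    _check=$1 _max=${2:-30} _n=0
    while ! $_check >/dev/null 2>&1; do
        _n=$((_n + 1))
        [ "$_n" -ge "$_max" ] && return 1
        sleep 1
    done
    return 0
}

# On a cluster node, gate the property change on DC election:
#   wait_for_dc "crmadmin --dc_lookup --timeout=5000" &&
#       pcs property set stonith-enabled=false
```

This keeps the "too early" failure from aborting HA setup: the property is only written once a DC has been elected, or the script fails explicitly after the retry budget is exhausted.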

Comment 4 Vijay Bellur 2016-05-19 17:07:02 UTC
REVIEW: http://review.gluster.org/14427 (common-ha: wait for cluster to elect DC before accessing CIB) posted (#1) for review on release-3.8 by Kaleb KEITHLEY (kkeithle)

Comment 5 Vijay Bellur 2016-05-24 09:36:21 UTC
COMMIT: http://review.gluster.org/14427 committed in release-3.8 by Kaleb KEITHLEY (kkeithle) 
------
commit cfb0ef35483eb0656ca190c1665e68c5ff448092
Author: Kaleb S KEITHLEY <kkeithle>
Date:   Thu May 19 13:04:50 2016 -0400

    common-ha: wait for cluster to elect DC before accessing CIB
    
    access attempts, e.g. `pcs property set stonith-enabled=false`
    will fail (or time out) if attempted "too early", i.e. before
    the cluster has elected its DC.
    
    see https://bugzilla.redhat.com/show_bug.cgi?id=1336947#c3 and
    https://bugzilla.redhat.com/show_bug.cgi?id=1320740
    
    Change-Id: Ifc0aa7ce652c1da339b9eb8fe17e40e8a09b1096
    BUG: 1336947
    Signed-off-by: Kaleb S KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/14427
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: soumya k <skoduri>
    Reviewed-by: jiffin tony Thottan <jthottan>

Comment 6 Niels de Vos 2016-06-16 14:06:58 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user