REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#1) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)
REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#2) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)
REVIEW: http://review.gluster.org/14610 (common-ha: race/timing issue setting up cluster) posted (#3) for review on release-3.7 by Kaleb KEITHLEY (kkeithle)
COMMIT: http://review.gluster.org/14610 committed in release-3.7 by Kaleb KEITHLEY (kkeithle)
------
commit 6a9a48e4d70a56167c0f1e8432bba9050264ab97
Author: Kaleb S KEITHLEY <kkeithle>
Date:   Wed Jun 1 16:43:12 2016 -0400

    common-ha: race/timing issue setting up cluster

    The ganesha_grace resource agent can start before the ganesha_mon
    resource agent, with the result that the crm_attribute that
    ganesha_grace expects to find has not been created yet.

    This is never (never? Or just so rarely that it has never actually
    been seen during development) seen with four nodes, but with just
    two nodes it's very repeatable.

    Note that when long (FQDN) names are used it is not unexpected to
    see Failed Actions in the output of `pcs status`, e.g.:

    * nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com 'unknown error' (1): call=20, status=complete, exitreason='none',
        last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms
    * nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com 'unknown error' (1): call=18, status=complete, exitreason='none',
        last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms

    and as long as all the ganesha_grace_clone and cluster_ip-1 resource
    agents are in Started state then this is okay.

    backport
    master:
    > http://review.gluster.org/14607
    > BUG: 1341768
    release-3.8
    > http://review.gluster.org/14609
    > BUG: 1341770

    Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8
    BUG: 1341772
    Signed-off-by: Kaleb S KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/14610
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: jiffin tony Thottan <jthottan>
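
For readers unfamiliar with this kind of startup race, here is a minimal Python sketch of one generic way a consumer can tolerate a value that its producer has not written yet: poll a bounded number of times before giving up. This is not the change made in the patch, whose commit message does not describe the fix itself. The function name `wait_for_attribute`, its parameters, and the `lookup` callable are all hypothetical and stand in for whatever mechanism actually reads the cluster attribute.

```python
import time
from typing import Callable, Optional


def wait_for_attribute(
    lookup: Callable[[], Optional[str]],
    attempts: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `lookup` until it returns a value or the attempts run out.

    `lookup` is a hypothetical stand-in for reading the attribute that
    the monitoring agent is expected to create; it returns None while
    the attribute does not exist yet.
    """
    for attempt in range(attempts):
        value = lookup()
        if value is not None:
            return value
        # Do not sleep after the final attempt; just report failure.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

With a bounded retry like this, a consumer that starts slightly early reports an error only if the producer never shows up, rather than on the first missed read.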
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.13, please open a new bug report.

glusterfs-3.7.13 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-July/027604.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user