rubygem-staypuft: Deployment - puppet error related to /Stage[main]/Quickstack::Pacemaker::Galera/Quickstack::Pacemaker::Resource::Galera[galera]/Exec[create galera resource]

Environment:
openstack-foreman-installer-3.0.6-1.el7ost.noarch
ruby193-rubygem-staypuft-0.5.6-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
rhel-osp-installer-client-0.5.3-1.el7ost.noarch
openstack-puppet-modules-2014.2.7-2.el7ost.noarch
rhel-osp-installer-0.5.3-1.el7ost.noarch

Steps to reproduce:
1. Install rhel-osp-installer.
2. Create and run a Neutron deployment with 3 controllers and 2 computes.

Result:
Puppet reports the error:
/usr/sbin/pcs cluster cib /tmp/galera-ra && /usr/sbin/pcs -f /tmp/galera-ra resource create galera galera enable_creation=true wsrep_cluster_address="gcomm://lb-backend-maca25400702876,lb-backend-maca25400702877,lb-backend-maca25400702875" op promote timeout=300s on-fail=block --master meta master-max=3 ordered=true && /usr/sbin/pcs cluster cib-push /tmp/galera-ra returned 1 instead of one of [0]

Expected result:
No such Puppet error in the reports.
Created attachment 969748: messages and pacemaker logs from controllers
Crag, can you take a look and see if any fix is needed on our side?
One thing in pacemaker.log1-reported_issue might be a clue to the real problem, although it occurs one second after the failed attempt to add the galera resource:

Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff crm_feature_set="3.0.7" digest="eadc64bb435e1aea13a01288e1499fb8">
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff-removed admin_epoch="0" epoch="36" num_updates="1">
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <cib num_updates="1"/>
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff-removed>
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff-added>
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <cib epoch="36" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Dec 16 15:05:53 2014" update-origin="lb-backend-maca25400702876" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="3"/>
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff-added>
Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff>

For now, a workaround could be to add retry logic around creating the galera resource in Puppet.
David, any ideas on this one?
The retry option: https://github.com/redhat-openstack/astapor/pull/435
(In reply to Crag Wolfe from comment #6)
> The retry option:
> https://github.com/redhat-openstack/astapor/pull/435

If this actually fixes something, we have bigger problems. I'll investigate.
(In reply to David Vossel from comment #7)
> (In reply to Crag Wolfe from comment #6)
> > The retry option:
> > https://github.com/redhat-openstack/astapor/pull/435
>
> If this actually fixes something, we have bigger problems.
>
> I'll investigate.

Wow, you guys hit a good one. I'm actually not entirely sure what to do about this yet.

It appears the galera resource creation occurred during a DC election. The local CIB copy was written to the file, the galera instance was injected into that copy, and the copy was pushed back into pacemaker. Somewhere in that window, a DC election took place. As a result, the CIB copy you were trying to push back into pacemaker was rejected: the update looked out of date because it didn't include the new DC changes.

I hate to say it, but the quick fix of re-attempting the resource addition might be our best option right now. I'm going to open a pacemaker bug so we can try to come up with a better solution on our end.

This should be an incredibly rare occurrence. If you all encounter it often, then we need to investigate further to understand why.

-- David
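A minimal shell sketch of the retry approach discussed above. It assumes the retry wraps the whole Exec command rather than living inside Puppet. The `retry` helper, its attempt count and delay, and the `create_galera_resource` wrapper are hypothetical names for illustration; this is not the code from pull request 435. Following David's analysis, every attempt re-dumps the live CIB with `pcs cluster cib`, so each push is based on the CIB as it stands after any DC election. Retrying only `cib-push` would most likely resend the same stale copy.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if all fail.
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_n=1
  while [ "$_retry_n" -le "$_retry_max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $_retry_n/$_retry_max failed: $*" >&2
    _retry_n=$((_retry_n + 1))
    # Only sleep if another attempt will follow.
    if [ "$_retry_n" -le "$_retry_max" ]; then
      sleep "$_retry_delay"
    fi
  done
  return 1
}

# Hypothetical wrapper around the failing command from this report, so that
# each attempt starts from a fresh dump of the live CIB.
create_galera_resource() {
  /usr/sbin/pcs cluster cib /tmp/galera-ra &&
    /usr/sbin/pcs -f /tmp/galera-ra resource create galera galera \
      enable_creation=true \
      wsrep_cluster_address="gcomm://lb-backend-maca25400702876,lb-backend-maca25400702877,lb-backend-maca25400702875" \
      op promote timeout=300s on-fail=block \
      --master meta master-max=3 ordered=true &&
    /usr/sbin/pcs cluster cib-push /tmp/galera-ra
}

# Usage (on a cluster node): retry 3 10 create_galera_resource
```

One caveat with this sketch: if a push actually succeeded but was reported as a failure, later attempts would likely fail because `galera` already exists. A real implementation would probably need an existence check before `resource create`.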
Merged
Verified:

Environment:
ruby193-rubygem-staypuft-0.5.12-1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
openstack-foreman-installer-3.0.10-2.el7ost.noarch
rhel-osp-installer-0.5.5-1.el7ost.noarch
rhel-osp-installer-client-0.5.5-1.el7ost.noarch

The reported issue doesn't reproduce.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0156.html