Bug 1174955
| Summary: | rubygem-staypuft: Deployment - puppet error related to /Stage[main]/Quickstack::Pacemaker::Galera/Quickstack::Pacemaker::Resource::Galera[galera]/Exec[create galera resource] |
|---|---|
| Product: | Red Hat OpenStack |
| Reporter: | Alexander Chuzhoy <sasha> |
| Component: | openstack-foreman-installer |
| Assignee: | Crag Wolfe <cwolfe> |
| Status: | CLOSED ERRATA |
| QA Contact: | Alexander Chuzhoy <sasha> |
| Severity: | urgent |
| Priority: | urgent |
| Version: | unspecified |
| CC: | cwolfe, dvossel, jguiditt, mburns, mlopes, morazi, rhos-maint, yeylon |
| Target Milestone: | ga |
| Target Release: | Installer |
| Hardware: | x86_64 |
| OS: | Linux |
| Fixed In Version: | openstack-foreman-installer-3.0.8-1.el7ost |
| Doc Type: | Bug Fix |
| Doc Text: | This bug fix addresses a rare concurrency issue with Pacemaker that causes the Galera resource creation process to fail. The fix retries the creation command, sleeping between attempts. This is expected to avoid the concurrency issue and result in successful resource creation. |
| Last Closed: | 2015-02-09 15:18:20 UTC |
| Type: | Bug |
| Bug Blocks: | 1177026 |
Description
Alexander Chuzhoy
2014-12-16 20:33:00 UTC
Created attachment 969748
messages and pacemaker logs from controllers
Crag, can you take a look and see if there is any fix needed on our side?

One thing in pacemaker.log1-reported_issue that might be a clue to the real problem, although it occurs one second after the failed attempt to add the galera resource:

`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff crm_feature_set="3.0.7" digest="eadc64bb435e1aea13a01288e1499fb8">`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff-removed admin_epoch="0" epoch="36" num_updates="1">`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <cib num_updates="1"/>`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff-removed>`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <diff-added>`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update <cib epoch="36" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Dec 16 15:05:53 2014" update-origin="lb-backend-maca25400702876" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="3"/>`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff-added>`
`Dec 16 15:05:54 [13567] maca25400702876.example.com cib: warning: cib_process_diff: Bad global update </diff>`

For now, a workaround could be to add retry capability around creating the galera resource agent in Puppet.

David, any ideas on this one?

The retry option: https://github.com/redhat-openstack/astapor/pull/435

(In reply to Crag Wolfe from comment #6)
> The retry option:
> https://github.com/redhat-openstack/astapor/pull/435

If this actually fixes something, we have bigger problems.

I'll investigate.
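The retry option proposed in comment #6 amounts to re-running the failed resource-creation command a bounded number of times, sleeping between attempts. The Python sketch below illustrates only that pattern; it is not the code from astapor PR 435, which lives in Puppet. `retry_with_sleep`, its `tries`/`try_sleep` parameters and their default values are hypothetical names chosen for illustration. Puppet's `exec` resource type does have `tries` and `try_sleep` attributes, but whether the pull request uses them is not shown in this report.

```python
import time
from typing import Callable


def retry_with_sleep(
    action: Callable[[], bool],
    tries: int = 3,
    try_sleep: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `action` until it returns True, making at most `tries` attempts.

    Hypothetical helper (not from PR 435). Sleeps `try_sleep` seconds between
    failed attempts (never after the last one) and returns the 1-based number
    of the attempt that succeeded. Raises RuntimeError if every attempt fails.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        if action():
            return attempt
        if attempt < tries:
            # Back off so a transient cluster event (e.g. a DC election) can finish.
            sleep(try_sleep)
    raise RuntimeError(f"action failed after {tries} attempts")


if __name__ == "__main__":
    # Simulate a creation command that is rejected once, then succeeds.
    outcomes = iter([False, True])
    print(retry_with_sleep(lambda: next(outcomes), tries=3, try_sleep=0.1))  # prints 2
```

In a real deployment `action` would wrap the actual resource-creation command, for example `lambda: subprocess.run(cmd).returncode == 0`, where `cmd` is whatever the Puppet `exec` runs (not reproduced here). Injecting `sleep` keeps the helper testable without real delays.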
(In reply to David Vossel from comment #7)
> (In reply to Crag Wolfe from comment #6)
> > The retry option:
> > https://github.com/redhat-openstack/astapor/pull/435
>
> If this actually fixes something, we have bigger problems.
>
> I'll investigate.

Wow, you guys hit a good one. I'm actually not entirely sure what to do about this yet.

It appears the galera resource creation occurred during a DC (Designated Controller) election. Somehow, between the time a local CIB copy is written to a file, the galera instance is injected into the copy, and the local copy is pushed back into Pacemaker, a DC election took place. As a result, the CIB copy you were trying to push back into Pacemaker was rejected: the update looked out of date because it didn't have the new DC changes.

I hate to say it, but the quick fix of re-attempting the resource addition might be our best option right now. I'm going to open a Pacemaker bug so we can try to come up with a better solution on our end.

This should be an incredibly rare occurrence. If you all encounter this often, then we need to investigate this even further to understand why.

-- David

Merged

Verified:
Environment:
ruby193-rubygem-staypuft-0.5.12-1.el7ost.noarch
openstack-puppet-modules-2014.2.8-1.el7ost.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
openstack-foreman-installer-3.0.10-2.el7ost.noarch
rhel-osp-installer-0.5.5-1.el7ost.noarch
rhel-osp-installer-client-0.5.5-1.el7ost.noarch

The reported issue doesn't reproduce.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0156.html
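David's explanation describes a read-modify-write race: a copy of the CIB is taken, the galera resource is added to the copy, and the copy is pushed back, but a DC election changes the live CIB in between, so the push is rejected as out of date. The toy Python model below illustrates that sequence and why re-attempting can help, assuming each retry repeats the whole copy-modify-push sequence rather than re-sending the stale copy. It is a simplified sketch, not Pacemaker's code: `ToyCib`, `CibVersion`, `StaleUpdateError` and `add_resource` are hypothetical, and the only detail borrowed from the report is the (`admin_epoch`, `epoch`, `num_updates`) triple seen in the `cib_process_diff` warnings. Pacemaker's real version checks are more involved than the single comparison used here.

```python
from dataclasses import dataclass, field, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True, order=True)
class CibVersion:
    """Toy version triple; dataclass ordering compares fields left to right."""
    admin_epoch: int = 0
    epoch: int = 0
    num_updates: int = 0


class StaleUpdateError(Exception):
    """Raised when a pushed copy is based on an older version than the live CIB."""


@dataclass
class ToyCib:
    """Hypothetical stand-in for the cluster's CIB (not Pacemaker's implementation)."""
    version: CibVersion = field(default_factory=CibVersion)
    resources: FrozenSet[str] = frozenset()

    def query(self) -> Tuple[CibVersion, FrozenSet[str]]:
        # Take a local copy, like dumping the CIB to a file.
        return self.version, self.resources

    def cluster_event(self) -> None:
        # Something else (e.g. a DC election) changes the live CIB.
        self.version = replace(self.version, num_updates=self.version.num_updates + 1)

    def push(self, base: CibVersion, resources: FrozenSet[str]) -> None:
        # Reject a copy taken before the latest live change.
        if base < self.version:
            raise StaleUpdateError(f"copy based on {base}, live CIB is {self.version}")
        self.resources = resources
        # In this toy model a successful configuration change bumps epoch.
        self.version = replace(self.version, epoch=self.version.epoch + 1, num_updates=0)


def add_resource(cib: ToyCib, name: str, race: bool = False) -> None:
    """Read-modify-write: copy the CIB, add a resource, push the copy back."""
    base, resources = cib.query()
    if race:
        cib.cluster_event()  # the election lands between copy and push
    cib.push(base, resources | {name})


if __name__ == "__main__":
    cib = ToyCib()
    try:
        add_resource(cib, "galera", race=True)
    except StaleUpdateError as err:
        print("first attempt rejected:", err)
        add_resource(cib, "galera")  # the retry re-reads the live CIB
    print(sorted(cib.resources))  # ['galera']
```

The retry succeeds here only because `add_resource` starts again from `query()`; pushing the original stale copy a second time would be rejected again.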