Bug 1448639
| Summary: | HA deployments can fail due to puppet race condition | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marian Krcmarik <mkrcmari> |
| Component: | puppet-pacemaker | Assignee: | Michele Baldessari <michele> |
| Status: | CLOSED ERRATA | QA Contact: | Marian Krcmarik <mkrcmari> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 11.0 (Ocata) | CC: | dciabrin, fdinitto, jjoyce, jschluet, mbayer, mcornea, oblaut, royoung, rscarazz, sclewis, slinaber, tvignaud, ushkalim |
| Target Milestone: | ga | Keywords: | Automation, Regression, Triaged, ZStream |
| Target Release: | 11.0 (Ocata) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | puppet-pacemaker-0.5.0-5.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-17 21:41:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Marian Krcmarik
2017-05-06 14:04:35 UTC
So this bug has been present since January, but I believe some recent change in puppet-tripleo exposed this race much more. A logstash query against upstream CI shows the problem appearing for the first time on the 2nd of May. In terms of how often and when we hit this, we can say the following:

- It affects any HA deployment.
- Upstream, once it started showing up, it seemed to affect quite a few CI deployments: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Error:%20unable%20to%20get%20cib%5C%22

Since Marian observed it downstream as well, I think we should get this in ASAP.

(In reply to Michele Baldessari from comment #4)
Just to clarify, this issue affects all deployments and not just composable roles ones.

Verified. We ran 12 deployments with the fix and did not see a failure, while without the fix we got an average of 1 failure per 4 runs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1248
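
As a rough sanity check on the verification numbers in this report (12 clean deployments with the fix, versus about 1 failure per 4 runs without it), the sketch below estimates how likely 12 clean runs in a row would be if the fix had changed nothing. It assumes each deployment fails independently at the pre-fix rate, which the report does not state. The function name `prob_all_pass` is hypothetical and used only for illustration.

```python
# Assumption (not from the bug report): deployments fail independently,
# each with the observed pre-fix rate of 1 failure per 4 runs.

def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent deployments all succeed."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    # Chance of seeing 12 clean runs if the race were still present.
    p = prob_all_pass(failure_rate=0.25, runs=12)
    print(f"P(12 clean runs without a fix) = {p:.4f}")
```

Under that independence assumption the chance comes out to roughly 3%, so 12 clean runs are reasonable, though not conclusive, evidence that the fix removed the race.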