Bug 1646332 - Pacemaker resource constraints cause API outage during maintenance
Summary: Pacemaker resource constraints cause API outage during maintenance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z10
Target Release: 10.0 (Newton)
Assignee: Lukas Bezdicka
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks: 1585770 1647438
Reported: 2018-11-05 12:04 UTC by Lukas Bezdicka
Modified: 2022-03-13 17:03 UTC
CC List: 8 users

Fixed In Version: openstack-tripleo-heat-templates-5.3.10-22.el7ost
Doc Type: Bug Fix
Doc Text:
This update fixes an issue that caused OpenStack API outages and control plane loss during execution of the "pcs cluster stop" command, greatly reducing the incidence of failed requests during minor updates. Note: In manual maintenance procedures, operators should migrate the VIPs off the affected node first.
Clone Of:
Clones: 1647438
Environment:
Last Closed: 2019-01-16 17:09:41 UTC
Target Upstream Version:
Embargoed:



Links
Red Hat Issue Tracker: OSP-13847 (last updated 2022-03-13 17:03:10 UTC)
Red Hat Product Errata: RHBA-2019:0055 (last updated 2019-01-16 17:09:52 UTC)

Description Lukas Bezdicka 2018-11-05 12:04:51 UTC
In more targeted testing of the OSP10 minor update, we found an issue with the way Pacemaker services are created and how we defined their order constraints. We will have to provide a fix for all OSP releases, but for OSP10 we will probably go with a workaround I created and tested. The issue is that the Pacemaker resource order constraints are of kind Optional, which means they are not applied when the Pacemaker cluster is shut down on a node. As a result, HAProxy is stopped before the VIP has migrated away from the node, and the APIs served through that VIP fail. VIP migration will be added to the yum_update.sh script, but we should also update the reboot documentation/procedure for operators.
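For manual maintenance, the ordering described above (move the VIPs off the node first, stop Pacemaker there second) can be sketched in Python. This is a minimal illustration under stated assumptions, not the contents of yum_update.sh: it only builds `pcs` command lines, the node and VIP names are hypothetical placeholders, and using `pcs resource ban --wait` to relocate each VIP (and `pcs resource clear` to undo the ban afterwards) is an assumed mechanism.

```python
"""Hypothetical helper: build the pcs command plan for taking one controller
out of the cluster without stopping HAProxy while it still holds VIPs.

Assumption: `pcs resource ban <vip> <node> --wait` relocates the VIP to another
node and blocks until the transition settles. Names below are placeholders."""

from typing import List


def maintenance_plan(node: str, vips: List[str]) -> List[List[str]]:
    """Return pcs argv lists in the order they should run."""
    plan: List[List[str]] = []
    # 1. Ban every VIP from the node so it is started elsewhere first.
    for vip in vips:
        plan.append(["pcs", "resource", "ban", vip, node, "--wait"])
    # 2. Only then stop the cluster stack on the node (this stops haproxy there).
    plan.append(["pcs", "cluster", "stop", node])
    return plan


def cleanup_plan(node: str, vips: List[str]) -> List[List[str]]:
    """After maintenance, remove the location constraints created by `ban`."""
    return [["pcs", "resource", "clear", vip, node] for vip in vips]


if __name__ == "__main__":
    # Hypothetical node and VIP resource names.
    for cmd in maintenance_plan("controller-0", ["ip-192.0.2.10", "ip-192.0.2.11"]):
        print(" ".join(cmd))
```

On a real controller each argv list could be passed to `subprocess.run(cmd, check=True)` in order. Keeping the plan as data rather than executing it directly makes the VIP-before-stop ordering easy to review, and the matching `cleanup_plan` keeps the VIPs from staying banned from the node after it rejoins the cluster.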

Comment 10 errata-xmlrpc 2019-01-16 17:09:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0055

