Bug 1189921 - [HA] start/stop ordering constraints are not correct and can cause the cluster to fail on shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-foreman-installer
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z2
Target Release: Installer
Assignee: Jason Guiditta
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1188949
Blocks:
Reported: 2015-02-05 20:06 UTC by Mike Burns
Modified: 2022-07-09 07:09 UTC (History)

Fixed In Version: openstack-foreman-installer-3.0.17-1.el7ost
Doc Type: Bug Fix
Doc Text:
An error in Pacemaker's start/stop ordering constraints caused services on VIP nodes to shut down before those on other nodes in a cluster. The services on the other nodes then failed to shut down, which caused a cluster shutdown failure. Likewise, the VIP nodes sometimes did not start before other nodes, which caused a cluster start-up failure. This fix corrects the ordering constraints, and full cluster shutdowns and start-ups now work correctly.
Clone Of: 1188949
Environment:
Last Closed: 2015-04-07 15:08:13 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0791 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux OpenStack Platform Installer update 2015-04-07 19:07:29 UTC

Description Mike Burns 2015-02-05 20:06:22 UTC
+++ This bug was initially created as a clone of Bug #1188949 +++

There is currently an error in the way pacemaker start/stop ordering constraints are expressed.

This can potentially lead to a cluster meltdown when issuing:

pcs cluster stop --all

because some services will fail to stop. A service tries to contact another API to notify it of the shutdown, but the VIP for that API is already down at that stage.

Workaround:

pcs resource disable keystone
wait for keystone to be Stopped
pcs cluster stop --all

On start:

pcs cluster start --all
pcs resource enable keystone

This specific sequence affects only the process to put all controller nodes in shutdown at once. It does NOT affect reboot or shutdown of one node at a time (for upgrade purposes for example), hence the medium severity.
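
A minimal shell sketch of the workaround sequence above. It assumes the resource is named keystone as in the steps above, uses pcs resource disable/enable (the pcs subcommands that stop a resource and allow it to start again), and assumes the installed pcs accepts --wait on resource disable. The 300-second timeout and the helper names run, stop_cluster and start_cluster are illustrative only.

```shell
#!/bin/sh
# Sketch of the full-cluster stop/start workaround.
# Assumptions: resource name "keystone"; pcs supports --wait on
# "resource disable"; 300 seconds is an arbitrary example timeout.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop keystone while the VIPs are still reachable, then stop all nodes.
stop_cluster() {
    run pcs resource disable keystone --wait=300 &&
    run pcs cluster stop --all
}

# Start all nodes, then allow keystone to run again.
start_cluster() {
    run pcs cluster start --all &&
    run pcs resource enable keystone
}
```

Running with DRY_RUN=1 set (for example "DRY_RUN=1; stop_cluster") only prints the commands, which is a cheap way to review the sequence before touching a production cluster.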

I am currently working on a new constraint set that should prevent this problem. In the meantime, this should be documented in the release notes for GA.

--- Additional comment from Fabio Massimo Di Nitto on 2015-02-04 02:14:32 EST ---

Reference arch: https://docs.google.com/a/redhat.com/document/d/1iO41-wcY81xKn46UDkjZ-HGFR80ARXwEElRqd2HtDI8/edit# is now updated with the new constraint order in v0.5

For expert users:

- in previous setups the VIPs and lb-haproxy-clone start order was expressed as:

  pcs constraint order start lb-haproxy-clone then vip-...

- this needs to be reversed by:

  pcs constraint delete order-lb-haproxy-clone-vip-...-mandatory

  pcs constraint order start vip-... then lb-haproxy-clone

Replace "..." with the current name of the vip- resource; this has to be done for every VIP.

This change can be applied on a live cluster (no need to stop any service to perform this change).

Comment 1 Jason Guiditta 2015-02-06 21:15:04 UTC
Posted: https://github.com/redhat-openstack/astapor/pull/472

Comment 2 Jason Guiditta 2015-02-12 22:16:41 UTC
Additional patch here:
https://github.com/redhat-openstack/astapor/pull/479

Comment 3 Fabio Massimo Di Nitto 2015-02-13 19:31:19 UTC
>   pcs constraint order start vip-... then lb-haproxy-clone
> 
> Replace "..." with the current name of the vip- service and it has to be
> done for all vip.
> 
> This change can be applied on a live cluster (no need to stop any service to
> perform this change).

This turns out not to be the correct command. The correct command is:

pcs constraint order start vip-... then lb-haproxy-clone kind=Optional

(note the extra option)

This change is required to avoid a chain of start/stop events across all services when a VIP needs to move from one node to another.
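
Since the change has to be repeated for every VIP, a small loop can generate the commands. This is only a sketch: it prints the commands instead of running them, and it assumes the old constraint id follows the order-lb-haproxy-clone-<vip>-mandatory pattern quoted earlier in this bug (check the real ids, for example with pcs constraint --full, before deleting anything). The function name reorder_vip_commands is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the pcs commands that reverse the VIP/haproxy start order
# for each VIP resource name given as an argument.
# Assumption: old constraint ids look like
#   order-lb-haproxy-clone-<vip>-mandatory

reorder_vip_commands() {
    for vip in "$@"; do
        # Remove the old "haproxy first" ordering constraint.
        echo "pcs constraint delete order-lb-haproxy-clone-${vip}-mandatory"
        # Add the corrected "VIP first" ordering, as an Optional constraint.
        echo "pcs constraint order start ${vip} then lb-haproxy-clone kind=Optional"
    done
}
```

For example, "reorder_vip_commands vip-keystone vip-glance" (hypothetical VIP names) prints four commands; after review the output can be piped to sh on a cluster node.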

Comment 4 Jason Guiditta 2015-02-16 22:08:23 UTC
(In reply to Fabio Massimo Di Nitto from comment #3)
> >   pcs constraint order start vip-... then lb-haproxy-clone
> > 
> > Replace "..." with the current name of the vip- service and it has to be
> > done for all vip.
> > 
> > This change can be applied on a live cluster (no need to stop any service to
> > perform this change).
> 
> this turns out not to be the correct command. The correct command is:
> 
> pcs constraint order start vip-... then lb-haproxy-clone kind=Optional
> 
> (note the extra option)
> 
> This change is required to avoid a chain of events in start/stop all
> services when a VIP needs to move from node to another.

Fabio, I believe we now have this working as described above.  Can you just verify this is indeed the recommendation we should follow for ref arch at this point, or is there further thought/discussion needed on this topic?  Just as an example, our current constraints with the latest patch look like this:

Ordering Constraints:
  start fs-varlibglanceimages-clone then start glance-registry-clone (kind:Mandatory)
  start glance-registry-clone then start glance-api-clone (kind:Mandatory)
  start openstack-nova-consoleauth-clone then start openstack-nova-novncproxy-clone (kind:Mandatory)
  start openstack-nova-novncproxy-clone then start openstack-nova-api-clone (kind:Mandatory)
  start openstack-nova-api-clone then start openstack-nova-scheduler-clone (kind:Mandatory)
  start openstack-nova-scheduler-clone then start openstack-nova-conductor-clone (kind:Mandatory)
  start heat-api-clone then start heat-api-cfn-clone (kind:Mandatory)
  start heat-api-cfn-clone then start heat-api-cloudwatch-clone (kind:Mandatory)
  start heat-api-cloudwatch-clone then start openstack-heat-engine (kind:Mandatory)
  start keystone-clone then start neutron-server-clone (kind:Mandatory)
  start keystone-clone then start openstack-nova-consoleauth-clone (kind:Mandatory)
  start keystone-clone then start glance-registry-clone (kind:Mandatory)
  start neutron-scale-clone then start neutron-ovs-cleanup-clone (kind:Mandatory)
  start neutron-ovs-cleanup-clone then start neutron-netns-cleanup-clone (kind:Mandatory)
  start neutron-netns-cleanup-clone then start neutron-openvswitch-agent-clone (kind:Mandatory)
  start neutron-openvswitch-agent-clone then start neutron-dhcp-agent-clone (kind:Mandatory)
  start neutron-dhcp-agent-clone then start neutron-l3-agent-clone (kind:Mandatory)
  start neutron-l3-agent-clone then start neutron-metadata-agent-clone (kind:Mandatory)
  start ip-neutron-adm-192.168.201.105 then start haproxy-clone (kind:Optional)
  start ip-nova-pub-192.168.201.63 then start haproxy-clone (kind:Optional)
  start ip-glance-adm-192.168.201.25 then start haproxy-clone (kind:Optional)
  start galera-master then start keystone-clone (kind:Mandatory)
  start rabbitmq-server-clone then start keystone-clone (kind:Mandatory)
  start memcached-clone then start keystone-clone (kind:Mandatory)
  start ip-heat-adm-192.168.201.115 then start haproxy-clone (kind:Optional)
  start ip-glance-prv-192.168.201.24 then start haproxy-clone (kind:Optional)
  start ip-nova-adm-192.168.201.65 then start haproxy-clone (kind:Optional)
  start ip-amqp-pub-192.168.201.13 then start haproxy-clone (kind:Optional)
  start ip-keystone-pub-192.168.201.33 then start haproxy-clone (kind:Optional)
  start ip-keystone-prv-192.168.201.34 then start haproxy-clone (kind:Optional)
  start haproxy-clone then start keystone-clone (kind:Mandatory)
  start ip-keystone-adm-192.168.201.35 then start haproxy-clone (kind:Optional)
  start ip-galera-pub-192.168.201.7 then start haproxy-clone (kind:Optional)
  start ip-heat-pub-192.168.201.113 then start haproxy-clone (kind:Optional)
  start ip-glance-pub-192.168.201.23 then start haproxy-clone (kind:Optional)
  start ip-neutron-pub-192.168.201.103 then start haproxy-clone (kind:Optional)
  start ip-heat-prv-192.168.201.114 then start haproxy-clone (kind:Optional)
  start ip-neutron-prv-192.168.201.104 then start haproxy-clone (kind:Optional)
  start ip-nova-prv-192.168.201.64 then start haproxy-clone (kind:Optional)
Colocation Constraints:
  ip-neutron-adm-192.168.201.105 with ip-neutron-pub-192.168.201.103 (score:INFINITY)
  ip-neutron-prv-192.168.201.104 with ip-neutron-pub-192.168.201.103 (score:INFINITY)
  ip-keystone-adm-192.168.201.35 with ip-keystone-pub-192.168.201.33 (score:INFINITY)
  ip-keystone-prv-192.168.201.34 with ip-keystone-pub-192.168.201.33 (score:INFINITY)
  ip-glance-adm-192.168.201.25 with ip-glance-pub-192.168.201.23 (score:INFINITY)
  ip-glance-prv-192.168.201.24 with ip-glance-pub-192.168.201.23 (score:INFINITY)
  glance-registry-clone with fs-varlibglanceimages-clone (score:INFINITY)
  glance-api-clone with glance-registry-clone (score:INFINITY)
  ip-nova-adm-192.168.201.65 with ip-nova-pub-192.168.201.63 (score:INFINITY)
  ip-nova-prv-192.168.201.64 with ip-nova-pub-192.168.201.63 (score:INFINITY)
  openstack-nova-novncproxy-clone with openstack-nova-consoleauth-clone (score:INFINITY)
  openstack-nova-api-clone with openstack-nova-novncproxy-clone (score:INFINITY)
  openstack-nova-scheduler-clone with openstack-nova-api-clone (score:INFINITY)
  openstack-nova-scheduler-clone with openstack-nova-conductor-clone (score:INFINITY)
  ip-heat-adm-192.168.201.115 with ip-heat-pub-192.168.201.113 (score:INFINITY)
  ip-heat-prv-192.168.201.114 with ip-heat-pub-192.168.201.113 (score:INFINITY)
  ip-heat_cfn-adm-192.168.201.125 with ip-heat_cfn-pub-192.168.201.123 (score:INFINITY)
  ip-heat_cfn-prv-192.168.201.124 with ip-heat_cfn-pub-192.168.201.123 (score:INFINITY)
  heat-api-cloudwatch-clone with heat-api-cfn-clone (score:INFINITY)
  openstack-heat-engine with heat-api-cloudwatch-clone (score:INFINITY)
  heat-api-cfn-clone with heat-api-clone (score:INFINITY)
  neutron-ovs-cleanup-clone with neutron-scale-clone (score:INFINITY)
  neutron-netns-cleanup-clone with neutron-ovs-cleanup-clone (score:INFINITY)
  neutron-openvswitch-agent-clone with neutron-netns-cleanup-clone (score:INFINITY)
  neutron-dhcp-agent-clone with neutron-openvswitch-agent-clone (score:INFINITY)
  neutron-l3-agent-clone with neutron-dhcp-agent-clone (score:INFINITY)
  neutron-metadata-agent-clone with neutron-l3-agent-clone (score:INFINITY)
  ip-nova-adm-192.168.201.65 with haproxy-clone (score:INFINITY)
  ip-heat-adm-192.168.201.115 with haproxy-clone (score:INFINITY)
  ip-glance-prv-192.168.201.24 with haproxy-clone (score:INFINITY)
  ip-neutron-adm-192.168.201.105 with haproxy-clone (score:INFINITY)
  ip-glance-adm-192.168.201.25 with haproxy-clone (score:INFINITY)
  ip-keystone-pub-192.168.201.33 with haproxy-clone (score:INFINITY)
  ip-keystone-prv-192.168.201.34 with haproxy-clone (score:INFINITY)
  ip-amqp-pub-192.168.201.13 with haproxy-clone (score:INFINITY)
  ip-keystone-adm-192.168.201.35 with haproxy-clone (score:INFINITY)
  ip-galera-pub-192.168.201.7 with haproxy-clone (score:INFINITY)
  ip-heat-pub-192.168.201.113 with haproxy-clone (score:INFINITY)
  ip-glance-pub-192.168.201.23 with haproxy-clone (score:INFINITY)
  ip-nova-pub-192.168.201.63 with haproxy-clone (score:INFINITY)
  ip-neutron-pub-192.168.201.103 with haproxy-clone (score:INFINITY)
  ip-heat-prv-192.168.201.114 with haproxy-clone (score:INFINITY)
  ip-neutron-prv-192.168.201.104 with haproxy-clone (score:INFINITY)
  ip-nova-prv-192.168.201.64 with haproxy-clone (score:INFINITY)

Comment 5 Fabio Massimo Di Nitto 2015-02-17 05:27:42 UTC
It appears to be correct.

Comment 6 Jason Guiditta 2015-02-19 20:37:26 UTC
Merged

Comment 8 Leonid Natapov 2015-03-14 22:30:08 UTC
openstack-foreman-installer-3.0.17-1.el7ost
------------------------------------------

Location Constraints:
Ordering Constraints:
  start openstack-nova-consoleauth-clone then start openstack-nova-novncproxy-clone (kind:Mandatory)
  start openstack-nova-novncproxy-clone then start openstack-nova-api-clone (kind:Mandatory)
  start openstack-nova-api-clone then start openstack-nova-scheduler-clone (kind:Mandatory)
  start openstack-nova-scheduler-clone then start openstack-nova-conductor-clone (kind:Mandatory)
  start cinder-scheduler-clone then start cinder-volume (kind:Mandatory)
  start cinder-api-clone then start cinder-scheduler-clone (kind:Mandatory)
  start neutron-scale-clone then start neutron-ovs-cleanup-clone (kind:Mandatory)
  start heat-api-clone then start heat-api-cfn-clone (kind:Mandatory)
  start heat-api-cfn-clone then start heat-api-cloudwatch-clone (kind:Mandatory)
  start heat-api-cloudwatch-clone then start openstack-heat-engine (kind:Mandatory)
  start keystone-clone then start neutron-server-clone (kind:Mandatory)
  start keystone-clone then start openstack-nova-consoleauth-clone (kind:Mandatory)
  start keystone-clone then start cinder-api-clone (kind:Mandatory)
  start keystone-clone then start openstack-ceilometer-central (kind:Mandatory)
  start mongod-clone then start openstack-ceilometer-central (kind:Mandatory)
  start openstack-ceilometer-central then start openstack-ceilometer-collector-clone (kind:Mandatory)
  start openstack-ceilometer-collector-clone then start openstack-ceilometer-api-clone (kind:Mandatory)
  start openstack-ceilometer-api-clone then start ceilometer-delay-clone (kind:Mandatory)
  start ceilometer-delay-clone then start openstack-ceilometer-alarm-evaluator-clone (kind:Mandatory)
  start openstack-ceilometer-alarm-evaluator-clone then start openstack-ceilometer-alarm-notifier-clone (kind:Mandatory)
  start openstack-ceilometer-alarm-notifier-clone then start openstack-ceilometer-notification-clone (kind:Mandatory)
  start neutron-ovs-cleanup-clone then start neutron-netns-cleanup-clone (kind:Mandatory)
  start neutron-netns-cleanup-clone then start neutron-openvswitch-agent-clone (kind:Mandatory)
  start neutron-openvswitch-agent-clone then start neutron-dhcp-agent-clone (kind:Mandatory)
  start neutron-dhcp-agent-clone then start neutron-l3-agent-clone (kind:Mandatory)
  start neutron-l3-agent-clone then start neutron-metadata-agent-clone (kind:Mandatory)
  start ip-nova-pub-192.168.0.35 then start haproxy-clone (kind:Optional)
  start ip-neutron-pub-192.168.0.32 then start haproxy-clone (kind:Optional)
  start ip-horizon-prv-192.168.0.24 then start haproxy-clone (kind:Optional)
  start galera-master then start keystone-clone (kind:Mandatory)
  start rabbitmq-server-clone then start keystone-clone (kind:Mandatory)
  start memcached-clone then start keystone-clone (kind:Mandatory)
  start ip-ceilometer-adm-192.168.0.2 then start haproxy-clone (kind:Optional)
  start ip-heat-adm-192.168.0.17 then start haproxy-clone (kind:Optional)
  start ip-neutron-adm-192.168.0.30 then start haproxy-clone (kind:Optional)
  start ip-keystone-adm-192.168.0.26 then start haproxy-clone (kind:Optional)
  start ip-ceilometer-pub-192.168.0.4 then start haproxy-clone (kind:Optional)
  start ip-ceilometer-prv-192.168.0.3 then start haproxy-clone (kind:Optional)
  start ip-horizon-adm-192.168.0.23 then start haproxy-clone (kind:Optional)
  start ip-cinder-prv-192.168.0.6 then start haproxy-clone (kind:Optional)
  start ip-cinder-pub-192.168.0.12 then start haproxy-clone (kind:Optional)
  start ip-amqp-pub-192.168.0.36 then start haproxy-clone (kind:Optional)
  start ip-keystone-pub-192.168.0.28 then start haproxy-clone (kind:Optional)
  start ip-keystone-prv-192.168.0.27 then start haproxy-clone (kind:Optional)
  start haproxy-clone then start keystone-clone (kind:Mandatory)
  start ip-nova-adm-192.168.0.33 then start haproxy-clone (kind:Optional)
  start ip-galera-pub-192.168.0.13 then start haproxy-clone (kind:Optional)
  start ip-horizon-pub-192.168.0.25 then start haproxy-clone (kind:Optional)
  start ip-heat-pub-192.168.0.19 then start haproxy-clone (kind:Optional)
  start ip-cinder-adm-192.168.0.5 then start haproxy-clone (kind:Optional)
  start ip-heat-prv-192.168.0.18 then start haproxy-clone (kind:Optional)
  start ip-neutron-prv-192.168.0.31 then start haproxy-clone (kind:Optional)
  start ip-nova-prv-192.168.0.34 then start haproxy-clone (kind:Optional)
Colocation Constraints:
  ip-ceilometer-prv-192.168.0.3 with ip-ceilometer-pub-192.168.0.4 (score:INFINITY)
  ip-horizon-adm-192.168.0.23 with ip-horizon-pub-192.168.0.25 (score:INFINITY)
  ip-neutron-adm-192.168.0.30 with ip-neutron-pub-192.168.0.32 (score:INFINITY)
  ip-horizon-prv-192.168.0.24 with ip-horizon-pub-192.168.0.25 (score:INFINITY)
  ip-ceilometer-adm-192.168.0.2 with ip-ceilometer-pub-192.168.0.4 (score:INFINITY)
  ip-neutron-prv-192.168.0.31 with ip-neutron-pub-192.168.0.32 (score:INFINITY)
  ip-keystone-adm-192.168.0.26 with ip-keystone-pub-192.168.0.28 (score:INFINITY)
  ip-keystone-prv-192.168.0.27 with ip-keystone-pub-192.168.0.28 (score:INFINITY)
  ip-nova-adm-192.168.0.33 with ip-nova-pub-192.168.0.35 (score:INFINITY)
  ip-nova-prv-192.168.0.34 with ip-nova-pub-192.168.0.35 (score:INFINITY)
  openstack-nova-novncproxy-clone with openstack-nova-consoleauth-clone (score:INFINITY)
  openstack-nova-api-clone with openstack-nova-novncproxy-clone (score:INFINITY)
  openstack-nova-scheduler-clone with openstack-nova-api-clone (score:INFINITY)
  openstack-nova-scheduler-clone with openstack-nova-conductor-clone (score:INFINITY)
  ip-cinder-prv-192.168.0.6 with ip-cinder-pub-192.168.0.12 (score:INFINITY)
  ip-cinder-adm-192.168.0.5 with ip-cinder-pub-192.168.0.12 (score:INFINITY)
  cinder-volume with cinder-scheduler-clone (score:INFINITY)
  cinder-scheduler-clone with cinder-api-clone (score:INFINITY)
  ip-heat-adm-192.168.0.17 with ip-heat-pub-192.168.0.19 (score:INFINITY)
  ip-heat-prv-192.168.0.18 with ip-heat-pub-192.168.0.19 (score:INFINITY)
  ip-heat_cfn-prv-192.168.0.21 with ip-heat_cfn-pub-192.168.0.22 (score:INFINITY)
  ip-heat_cfn-adm-192.168.0.20 with ip-heat_cfn-pub-192.168.0.22 (score:INFINITY)
  heat-api-cloudwatch-clone with heat-api-cfn-clone (score:INFINITY)
  openstack-heat-engine with heat-api-cloudwatch-clone (score:INFINITY)
  heat-api-cfn-clone with heat-api-clone (score:INFINITY)
  neutron-ovs-cleanup-clone with neutron-scale-clone (score:INFINITY)
  neutron-netns-cleanup-clone with neutron-ovs-cleanup-clone (score:INFINITY)
  neutron-openvswitch-agent-clone with neutron-netns-cleanup-clone (score:INFINITY)
  neutron-dhcp-agent-clone with neutron-openvswitch-agent-clone (score:INFINITY)
  neutron-l3-agent-clone with neutron-dhcp-agent-clone (score:INFINITY)
  neutron-metadata-agent-clone with neutron-l3-agent-clone (score:INFINITY)
  ip-neutron-adm-192.168.0.30 with haproxy-clone (score:INFINITY)
  ip-ceilometer-prv-192.168.0.3 with haproxy-clone (score:INFINITY)
  ip-horizon-adm-192.168.0.23 with haproxy-clone (score:INFINITY)
  ip-neutron-pub-192.168.0.32 with haproxy-clone (score:INFINITY)
  ip-keystone-adm-192.168.0.26 with haproxy-clone (score:INFINITY)
  ip-heat-adm-192.168.0.17 with haproxy-clone (score:INFINITY)
  ip-ceilometer-adm-192.168.0.2 with haproxy-clone (score:INFINITY)
  ip-ceilometer-pub-192.168.0.4 with haproxy-clone (score:INFINITY)
  ip-horizon-prv-192.168.0.24 with haproxy-clone (score:INFINITY)
  ip-cinder-pub-192.168.0.12 with haproxy-clone (score:INFINITY)
  ip-amqp-pub-192.168.0.36 with haproxy-clone (score:INFINITY)
  ip-keystone-pub-192.168.0.28 with haproxy-clone (score:INFINITY)
  ip-keystone-prv-192.168.0.27 with haproxy-clone (score:INFINITY)
  ip-nova-adm-192.168.0.33 with haproxy-clone (score:INFINITY)
  ip-galera-pub-192.168.0.13 with haproxy-clone (score:INFINITY)
  ip-horizon-pub-192.168.0.25 with haproxy-clone (score:INFINITY)
  ip-heat-pub-192.168.0.19 with haproxy-clone (score:INFINITY)
  ip-cinder-adm-192.168.0.5 with haproxy-clone (score:INFINITY)
  ip-nova-pub-192.168.0.35 with haproxy-clone (score:INFINITY)
  ip-heat-prv-192.168.0.18 with haproxy-clone (score:INFINITY)
  ip-neutron-prv-192.168.0.31 with haproxy-clone (score:INFINITY)
  ip-cinder-prv-192.168.0.6 with haproxy-clone (score:INFINITY)
  ip-nova-prv-192.168.0.34 with haproxy-clone (score:INFINITY)
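
A listing of this size is easy to misread, so the VIP ordering can also be checked mechanically. The sketch below reads constraint output in the format shown above on stdin and flags any VIP ordering line that is reversed or not Optional. It assumes VIP resources are named ip-* and the proxy resource is haproxy-clone, as in this listing; check_vip_ordering is an illustrative name.

```shell
#!/bin/sh
# Sketch: scan "pcs constraint" output on stdin and report VIP ordering
# lines that do not match the corrected layout (VIP first, kind Optional).
# Assumptions: VIP resources are named ip-*, the proxy is haproxy-clone.

check_vip_ordering() {
    awk '
        # haproxy-clone ordered before a VIP: the reversed (pre-fix) order
        /start haproxy-clone then start ip-/ {
            print "wrong order: " $0; bad = 1
        }
        # VIP ordered before haproxy-clone, but not kind:Optional
        /start ip-[^ ]* then start haproxy-clone/ && !/\(kind:Optional\)/ {
            print "not optional: " $0; bad = 1
        }
        # Exit status 1 if any line was flagged, 0 otherwise
        END { exit bad + 0 }
    '
}
```

On a cluster node this could be used as: pcs constraint | check_vip_ordering && echo "VIP ordering looks correct"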

Comment 11 errata-xmlrpc 2015-04-07 15:08:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0791.html

