Bug 1236374 - Overcloud Heat stops working when turning off one controller in HA setup
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ga
Target Release: Director
Assigned To: Giulio Fidente
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks:
Reported: 2015-06-28 07:07 EDT by Marius Cornea
Modified: 2015-08-05 09:57 EDT

See Also:
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-26.el7ost
Doc Type: Bug Fix
Doc Text:
Heat services restarted on unrelated redis VIP relocation. In Pacemaker, the Heat resource failed to restart due to dependencies on the Ceilometer resource, which failed to restart on relocation of the redis VIP due to clustering failures. This fix stops Heat from restarting when Ceilometer restarts.
Last Closed: 2015-08-05 09:57:22 EDT
Type: Bug

Attachments
heat logs (2.25 KB, application/x-gzip)
2015-06-28 07:07 EDT, Marius Cornea


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 197510 | None | None | None | Never
OpenStack gerrit | 198022 | None | None | None | Never
Red Hat Product Errata | RHEA-2015:1549 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform director Release | 2015-08-05 13:49:10 EDT

Description Marius Cornea 2015-06-28 07:07:34 EDT
Created attachment 1044009
heat logs

Description of problem:
Overcloud Heat stops working when one controller is turned off in an HA setup. I'm using a virtual environment with 3 controllers. When the first controller is turned off, all Heat-related APIs stop responding and show as down in HAProxy.

Version-Release number of selected component (if applicable):
openstack-puppet-modules-2015.1.7-5.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with 3 controllers
2. Turn off one of the controllers
3. Check if overcloud Heat is working
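
Step 3 can be sketched as a small shell check. This is only an illustration, not the reporter's actual procedure: it assumes the overcloud credentials are already exported in the environment and that the `heat` client is installed. The `HEAT_CMD` variable and the `check_heat` function are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: report whether the overcloud Heat API still answers.
# Assumption: overcloud credentials are already exported in the environment.

# Hypothetical override so the check can be exercised without a real cloud.
HEAT_CMD="${HEAT_CMD:-heat}"

check_heat() {
    # `heat stack-list` has to reach the Heat API; a non-zero exit
    # status is treated here as "not responding".
    if "$HEAT_CMD" stack-list >/dev/null 2>&1; then
        echo "heat API: responding"
    else
        echo "heat API: NOT responding"
        return 1
    fi
}
```

Calling `check_heat` before and after step 2 gives a quick comparison; `pcs status` on a surviving controller shows the cluster's view of the Heat resources.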

Actual results:
None of the heat APIs are responding. 

Expected results:
Heat APIs continue working.

Additional info:
Attaching relevant logs and service statuses.
Comment 3 Mike Burns 2015-06-28 07:43:39 EDT
fwiw, this works for me.  I turned off 1 of my 3 controllers, and heat stack-list still came back for me.  pcs status showed the other 2 still up, heat command worked (definitely against overcloud since there was no stack listed).  heat services in pacemaker showed 2 of 3 servers still alive.  

Restarting the down host resulted in the host rejoining the cluster with no errors.
Comment 4 Marius Cornea 2015-06-28 09:36:15 EDT
You are correct. I did some more checks, and it looks like this is triggered only by a specific controller in the cluster. To reproduce it, start with a fresh deployment and turn off overcloud-controller-0. I saw this happen twice.
Comment 5 Giulio Fidente 2015-07-01 05:46:17 EDT
Heat stop/start is triggered by cascading effects of the Redis VIP relocating.

This can be avoided by fixing the colocation and ordering constraints, as per the suggestion from David:

- delete the promote redis master then start vip ordering constraint
- delete the colocate vip with redis-master instance colocation constraint
- delete the start vip then start openstack-ceilometer-central-clone order constraint.
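
The three deletions above could be scripted roughly as follows. This is a sketch under assumptions: the constraint IDs differ per deployment and are not given in this report, so they have to be read from the cluster first (for example with `pcs constraint show --full` on pcs releases of that era). The `run` and `remove_constraints` helpers and the `DRY_RUN` variable are hypothetical names, and the script defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch: remove Pacemaker constraints by ID.
# Assumption: the IDs passed in were looked up on the cluster beforehand.

# Hypothetical switch: 1 (default) prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_constraints() {
    # `pcs constraint remove <id>` deletes one constraint by its ID.
    for id in "$@"; do
        run pcs constraint remove "$id"
    done
}
```

Pass the three looked-up IDs to `remove_constraints`, review the printed commands, then rerun with `DRY_RUN=0` on a cluster node.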
Comment 6 David Vossel 2015-07-01 10:55:05 EDT
(In reply to Giulio Fidente from comment #5)
> Heat stop/start is triggered by cascading effects of the Redis VIP
> relocating.
> 
> This can be avoided fixing the colocation and ordering constraints as per
> suggestion from David:
> 
> - delete the promote redis master then start vip ordering constraint
> - delete the colocate vip with redis-master instance colocation constraint
> - delete the start vip then start openstack-ceilometer-central-clone order
> constraint.

and adding these constraints:

- pcs constraint order start ip-192.0.2.7 then haproxy-clone kind=Optional
- pcs constraint colocation add ip-192.0.2.7 with haproxy-clone
- pcs constraint order start ip-192.0.2.6 then haproxy-clone kind=Optional
- pcs constraint colocation add ip-192.0.2.6 with haproxy-clone
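
The four commands above follow one pattern per VIP, so they can be sketched as a loop. This is an illustration only: the VIP resource names are the ones from this environment, the `run` and `add_vip_constraints` helpers and the `DRY_RUN` variable are hypothetical, and the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Sketch: tie each VIP to haproxy-clone with an optional ordering
# constraint plus a colocation constraint.

# Hypothetical switch: 1 (default) prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

add_vip_constraints() {
    for vip in "$@"; do
        # Start the VIP before haproxy-clone when both are starting (Optional).
        run pcs constraint order start "$vip" then haproxy-clone kind=Optional
        # Keep the VIP on a node where haproxy-clone is running.
        run pcs constraint colocation add "$vip" with haproxy-clone
    done
}

# VIP resource names as they appear in this report.
add_vip_constraints ip-192.0.2.7 ip-192.0.2.6
```

With `kind=Optional` the ordering applies only when both resources start in the same transition, so a VIP relocation alone should not force dependent services to restart.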


Just as an update, though: we are still discussing whether HAProxy should be involved with Redis at all. This recommendation could change, so don't consider any of it finalized yet.
Comment 10 errata-xmlrpc 2015-08-05 09:57:22 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
