Bug 1390962 - HAProxy doesn't load the new configuration after scaling out the role running the Openstack API services
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Carlos Camacho
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-02 10:01 UTC by Marius Cornea
Modified: 2016-12-14 16:27 UTC (History)

Fixed In Version: python-tripleoclient-5.3.0-7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 16:27:56 UTC
Target Upstream Version:
Embargoed:


Attachments
haproxy.cfg (12.80 KB, text/plain)
2016-11-02 10:05 UTC, Marius Cornea
HAProxy stats (147.96 KB, text/html)
2016-11-02 10:06 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1639302 0 None None None 2016-11-07 14:51:15 UTC
Launchpad 1640175 0 None None None 2016-11-09 06:52:40 UTC
OpenStack gerrit 395749 0 None None None 2016-11-09 18:14:26 UTC
Red Hat Product Errata RHEA-2016:2948 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC

Description Marius Cornea 2016-11-02 10:01:55 UTC
Description of problem:
HAProxy doesn't load the new configuration after scaling out the role running the OpenStack API services. I'm deploying the OpenStack API services on a custom role, starting with a deployment of 3 nodes running this role. After the deployment is done, I add an additional node. haproxy.cfg gets updated with all 4 nodes, but the HAProxy stats page shows only the initial 3 backends.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-1.1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with 3 nodes running the OpenStack API services. Deploy command and environment files:
http://paste.openstack.org/show/587619/

2. Increment ServiceApiCount and rerun the deploy command.

Actual results:
The update completes OK and haproxy.cfg contains the new backend node, but the HAProxy stats page reports only the initial 3 nodes as backends instead of 4.

Expected results:
The new HAProxy config gets loaded.

Additional info:
'systemctl reload haproxy' is needed on all controllers so that HAProxy loads the new config.

This symptom looks very similar to the one in bug 1320379, but I need someone to confirm this. Thanks.
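
One way to check the mismatch described above without reading the stats page by eye is to compare the servers listed in haproxy.cfg with the servers the running HAProxy reports. The following is only a minimal sketch, not part of the original report: it assumes the stats are fetched as CSV (HAProxy serves a CSV export when `;csv` is appended to the stats URI), and the function names `configured_servers`, `loaded_servers` and `missing_from_running` are hypothetical helpers made up for illustration.

```python
import csv
import io


def configured_servers(cfg_text):
    """Return {section_name: set(server_names)} parsed from haproxy.cfg text."""
    servers = {}
    section = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        keyword = parts[0]
        if keyword in ("listen", "backend") and len(parts) > 1:
            # Sections that can carry "server" lines.
            section = parts[1]
            servers.setdefault(section, set())
        elif keyword in ("global", "defaults", "frontend"):
            section = None
        elif keyword == "server" and section is not None and len(parts) > 1:
            servers[section].add(parts[1])
    return servers


def loaded_servers(stats_csv):
    """Return {proxy_name: set(server_names)} from the stats CSV export."""
    # The CSV header line starts with "# ", which DictReader must not see.
    text = stats_csv.lstrip("# ")
    loaded = {}
    for row in csv.DictReader(io.StringIO(text)):
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        loaded.setdefault(row["pxname"], set()).add(row["svname"])
    return loaded


def missing_from_running(cfg_text, stats_csv):
    """Return servers present in the config but absent from the running stats."""
    configured = configured_servers(cfg_text)
    loaded = loaded_servers(stats_csv)
    missing = {}
    for section, names in configured.items():
        diff = names - loaded.get(section, set())
        if diff:
            missing[section] = sorted(diff)
    return missing
```

With the situation from this report, `missing_from_running` would be expected to list the fourth node under each API proxy until HAProxy is reloaded, and to return an empty dict afterwards.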

Comment 1 Marius Cornea 2016-11-02 10:05:46 UTC
Created attachment 1216454
haproxy.cfg

Attaching the haproxy.cfg and the HAProxy stats page.

Comment 2 Marius Cornea 2016-11-02 10:06:29 UTC
Created attachment 1216455
HAProxy stats

Comment 3 Carlos Camacho 2016-11-02 15:46:27 UTC
Deploying it in my local env to confirm.

Comment 6 Carlos Camacho 2016-11-03 15:52:09 UTC
Please ignore the last 2 comments; they are summarized here:

When scaling out a controller node, haproxy is not reloaded correctly. The new config is correct but needs to be reloaded.
`systemctl reload haproxy` fixes the issue after the update.

The reload was added here:
https://github.com/openstack/tripleo-heat-templates/commit/843d25af04f58d033e12c2c3d619303d7fd8bb02

Removed here (in favor of solving https://bugzilla.redhat.com/show_bug.cgi?id=1321036):
https://github.com/openstack/tripleo-heat-templates/commit/6e56f873148784ff34babf62a8ccc718e7d789d3

The restart should be executed in the converge step, but I was not able to find it for a non-upgrade task after the deployment finishes. This case is only about deploying the overcloud again with 1 more node.
`systemctl reload haproxy` is executed only when injecting the TLS cert and when running pacemaker_maintenance_mode.sh (called from pre_puppet_pacemaker.yaml).

To summarize: before, it was executed in a post-deployment step; now it's in a pre-deployment step.
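
Until the reload runs again after the configuration is written, the manual workaround from this bug (`systemctl reload haproxy` on every controller) can be scripted. The following is a rough sketch only, not what the fix does: the `heat-admin` SSH user, the use of `sudo`, and the helper names `reload_command` and `reload_haproxy` are assumptions for illustration, and the controller addresses have to be supplied by the caller.

```python
import subprocess


def reload_command(host, user="heat-admin"):
    """Build the ssh command that reloads haproxy on one controller.

    The user name and the use of sudo are assumptions; adjust to the environment.
    """
    return ["ssh", f"{user}@{host}", "sudo", "systemctl", "reload", "haproxy"]


def reload_haproxy(hosts, run=subprocess.run):
    """Run the reload on every host and return {host: exit_code}.

    `run` is injectable so the function can be exercised without real SSH.
    """
    results = {}
    for host in hosts:
        completed = run(reload_command(host), check=False)
        results[host] = completed.returncode
    return results
```

A non-zero exit code for a host means the reload (or the SSH connection) failed there and that controller is likely still serving the old backend list.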

Comment 7 Carlos Camacho 2016-11-04 08:32:25 UTC
Added an upstream patch for testing.

Comment 8 Carlos Camacho 2016-11-07 14:51:15 UTC
After testing the patch, I hit a new upstream issue: when the Mistral workflow starts, it fails due to a malformed template.

https://bugs.launchpad.net/tripleo/+bug/1639302

Comment 9 Carlos Camacho 2016-11-09 06:52:40 UTC
Marius, I believe the upstream patch will land soon: https://review.openstack.org/#/c/393644/ Would you mind checking it?

Comment 10 Carlos Camacho 2016-11-09 06:53:48 UTC
Also added a new reference to this current error in Launchpad.

Comment 11 Marius Cornea 2016-11-09 15:19:27 UTC
(In reply to Carlos Camacho from comment #9)
> Marius the upstream patch I believe will be landed soon,
> https://review.openstack.org/#/c/393644/ mind to check it?

Looks good. I tested it on my environment and it reloads the new HAProxy config.

Comment 14 errata-xmlrpc 2016-12-14 16:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

