Bug 1822204 - Minor updates in composable HA break due to haproxy rules being applied too late
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-08 13:41 UTC by Michele Baldessari
Modified: 2020-07-29 07:51 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:51:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1871646 0 None None None 2020-04-08 13:59:38 UTC
OpenStack gerrit 718201 0 None MERGED Move the haproxy iptables rules creation to host_prep_tasks 2020-08-09 06:24:59 UTC
Red Hat Product Errata RHBA-2020:3148 0 None None None 2020-07-29 07:51:53 UTC

Description Michele Baldessari 2020-04-08 13:41:55 UTC
Description of problem:
Any role that includes haproxy needs custom iptables rules that open up traffic for all the haproxy stanzas. This normally does not matter much when the role containing haproxy also contains all the other controller services (mysql/redis/rabbit/etc.), because those services open their own ports. However, in the composable HA case, where databases and/or messaging are split off to a separate role, these haproxy iptables rules become crucial.

In such a composable HA scenario, minor updates can break. Note that a minor update only runs the update tasks, the host_prep_tasks and the docker_config tasks (that is, the transient containers).

Consider the following sequence:
1) Minor update on controller-2, followed by controller-1
At this point the haproxy rules have disappeared from controller-2 and controller-1, because they are created in the deployment steps, which are not run during a minor update.
2) Minor update of controller-0
At this point any transient container that tries to update or poke the DB will be stuck with:
 2020-04-07 15:00:53.606 12 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -300 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'overcloud.internalapi.redhat.local' (timed out)
This happens because the haproxy ports (3306 in this specific case) are not opened again until the converge step runs.
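
One way to confirm the symptom from a node is to probe the haproxy frontend port directly. The snippet below is only a minimal sketch using the Python standard library; the helper name port_reachable is made up for illustration, and the host name and port are the ones from the log line above.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of any TripleO tooling. A dropped packet
    (missing iptables rule) typically shows up as a timeout, which is
    reported here as False, as is a refused connection or a DNS failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and socket.gaierror
        # are all subclasses of OSError.
        return False


# Example call (host and port taken from the log line above):
#   port_reachable("overcloud.internalapi.redhat.local", 3306)
```

A False result only says that the port could not be reached; it does not by itself prove that a missing iptables rule is the cause.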

Version-Release number of selected component (if applicable):
OSP16 and OSP15 are affected. OSP13 seems not to be affected, because there the iptables rules were created inside the haproxy_init_bundle transient container, which *does* run during minor updates.
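
The difference between the releases can be sketched as a toy model. This is not TripleO code: the section labels are illustrative strings, and the model simply assumes, as in the scenario above, that a node loses its haproxy rules during a minor update unless the section that creates them is one of the sections a minor update runs.

```python
# Sections that a minor update runs, per the description above.
MINOR_UPDATE_SECTIONS = {"update_tasks", "host_prep_tasks", "docker_config"}


def nodes_missing_rules(rule_section, updated_nodes):
    """Return the updated nodes left without haproxy iptables rules.

    rule_section is an illustrative label for where the rules are created:
      "deploy_steps"    - OSP15/OSP16 before the fix
      "docker_config"   - OSP13 (haproxy_init_bundle transient container)
      "host_prep_tasks" - after the change linked to this bug
    """
    rules_recreated = rule_section in MINOR_UPDATE_SECTIONS
    return [] if rules_recreated else list(updated_nodes)


updated = ["controller-2", "controller-1"]
for section in ("deploy_steps", "docker_config", "host_prep_tasks"):
    print(section, "->", nodes_missing_rules(section, updated))
```

In this model only the "deploy_steps" placement leaves controller-2 and controller-1 without rules, which matches the affected and unaffected versions listed above.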

Comment 12 errata-xmlrpc 2020-07-29 07:51:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

