
Bug 1822204

Summary: Minor updates in composable HA break due to haproxy rules being applied too late
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Michele Baldessari <michele>
Assignee: Michele Baldessari <michele>
QA Contact: pkomarov
CC: bperkins, lmiccini, mburns, pkomarov, sathlang, tvignaud
Keywords: Triaged
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200616081529.396affd.el8ost
Type: Bug
Last Closed: 2020-07-29 07:51:29 UTC

Description Michele Baldessari 2020-04-08 13:41:55 UTC
Description of problem:
Any role that includes haproxy needs custom iptables rules that open traffic for every haproxy stanza. This normally goes unnoticed when the role containing haproxy also contains all the other controller services (mysql/redis/rabbit/etc.), because those services open their own ports. However, in the composable HA case, where databases and/or messaging are split off into a separate role, these haproxy iptables rules become crucial.
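
To illustrate what "rules for all the haproxy stanzas" amounts to, here is a minimal sketch that collects the ports from haproxy bind lines and prints one accept rule per port. It is not how tripleo-heat-templates generates its firewall rules; the function names, the sample configuration, and its IP addresses are hypothetical and only serve the example.

```python
import re

# Matches haproxy "bind <address>:<port>" lines; options after the port are ignored.
BIND_RE = re.compile(r"^\s*bind\s+\S*:(\d+)\b")


def haproxy_bind_ports(config_text):
    """Return the sorted, de-duplicated TCP ports found on bind lines."""
    ports = set()
    for line in config_text.splitlines():
        match = BIND_RE.match(line)
        if match:
            ports.add(int(match.group(1)))
    return sorted(ports)


def accept_rules(ports):
    """Build one illustrative iptables command string per port."""
    return [f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT" for port in ports]


# Hypothetical configuration fragment; addresses are made up for the example.
SAMPLE_CONFIG = """
listen mysql
  bind 172.17.0.10:3306 transparent
  server db-0 172.17.0.20:3306 check
listen redis
  bind 172.17.0.10:6379 transparent
"""

if __name__ == "__main__":
    for rule in accept_rules(haproxy_bind_ports(SAMPLE_CONFIG)):
        print(rule)
```

When the database runs on the same node, its own rule for 3306 masks a missing haproxy rule; once the database moves to another role, only the haproxy rule keeps that port open on the controller.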

In such a composable HA scenario, minor updates can break. Note that a minor update only runs the update tasks, the host_prep_tasks and the docker_config tasks (that is, the transient containers).

Now imagine the following sequence:
1) Minor update on controller-2, followed by controller-1
At this point the haproxy rules have disappeared from controller-2 and controller-1, because they are applied in the deployment steps, which are not run during a minor update.
2) Minor update of controller-0
At this point any transient container that tries to update or poke the DB will be stuck with:
 2020-04-07 15:00:53.606 12 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -300 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'overcloud.internalapi.redhat.local' (timed out)
This happens because those haproxy ports (3306 in this specific case) are not opened again until the converge step runs.
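
A quick way to confirm the symptom from a node is to try a plain TCP connection to the affected port. The sketch below is only a diagnostic aid; `can_connect` is a hypothetical helper name, and the timeout value is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False
```

For the case above, `can_connect("overcloud.internalapi.redhat.local", 3306)` would be expected to return False while the haproxy rule for 3306 is missing, and True once the converge step has restored it.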

Version-Release number of selected component (if applicable):
OSP16 and OSP15 are affected. (OSP13 seems not to be affected, because there the iptables rules were created inside the haproxy_init_bundle transient container, which *does* run during minor updates.)

Comment 12 errata-xmlrpc 2020-07-29 07:51:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148