Bug 2207941 - HAProxy healthchecks sometimes fail on a FIPS-enabled control plane
Summary: HAProxy healthchecks sometimes fail on a FIPS-enabled control plane
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: OSP Team
QA Contact: dabarzil
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-17 12:02 UTC by Damien Ciabrini
Modified: 2023-12-15 04:26 UTC
CC List: 11 users

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-1.20230519151008.el9ost puppet-tripleo-14.2.3-1.20230517011018.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:15:21 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 884176 | 0 | None | MERGED | Allow clustercheck to wait before finishing | 2023-05-31 11:21:50 UTC
OpenStack gerrit | 884178 | 0 | None | MERGED | Allow clustercheck to wait before finishing | 2023-05-31 11:21:51 UTC
Red Hat Issue Tracker | OSP-25105 | 0 | None | None | None | 2023-05-17 12:02:52 UTC
Red Hat Product Errata | RHEA-2023:4577 | 0 | None | None | None | 2023-08-16 01:15:48 UTC

Description Damien Ciabrini 2023-05-17 12:02:02 UTC
Description of problem:
On an HA control plane deployed on VMs, we are seeing a high number of health check failures in the HAProxy logs for all services, but most of them affect the mysql service.

[WARNING]  (730596) : Health check for backup server mysql/controller-1.internalapi.redhat.local failed, reason: Socket error, check duration: 1ms, status: 2/3 UP.
[WARNING]  (730596) : Health check for backup server mysql/controller-1.internalapi.redhat.local succeeded, reason: Layer7 check passed, code: 200, check duration: 89ms, status: 3/3 UP.

This seems to be a systemic issue in the environment, which consumes a lot of system CPU time (from 10 to 30 %sy in top).

We have identified that in this situation the galera service itself is working, but the health check responses are incorrectly parsed by HAProxy because of what seems to be a race condition in socket closure.
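A minimal Python sketch of a clustercheck-style responder, assuming the race is the usual TCP behavior where a server that closes its socket while the checker's request bytes are still unread makes the kernel send an RST instead of a clean FIN, which a checker may report as a socket error. That mechanism is one plausible reading of "race condition in socket closure"; this report does not confirm it. The names build_response, drain_request, serve_one and the linger_s parameter are hypothetical, the response text is illustrative, and this is not the clustercheck script or the merged gerrit change.

```python
import socket
import time


def build_response(synced: bool) -> bytes:
    """Build a clustercheck-style HTTP reply: 200 if the node is synced, 503 otherwise."""
    if synced:
        status, body = "200 OK", "Galera cluster node is synced.\r\n"
    else:
        status, body = "503 Service Unavailable", "Galera cluster node is not synced.\r\n"
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain\r\n"
        "Connection: close\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return (head + body).encode("ascii")


def drain_request(conn: socket.socket, timeout_s: float = 1.0) -> bytes:
    """Read the request headers so no unread bytes remain when the socket is closed."""
    conn.settimeout(timeout_s)
    data = b""
    try:
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:  # peer already closed its side
                break
            data += chunk
    except socket.timeout:
        pass  # stop waiting for more request bytes, but still answer the check
    return data


def serve_one(conn: socket.socket, synced: bool, linger_s: float = 0.0) -> None:
    """Answer one health check; linger_s is a hypothetical 'wait before finishing' knob."""
    try:
        drain_request(conn)
        conn.sendall(build_response(synced))
        conn.shutdown(socket.SHUT_WR)  # send FIN only after the full response is queued
        if linger_s > 0:
            time.sleep(linger_s)  # give the checker time to read before the final close
    finally:
        conn.close()
```

Draining the request removes the unread-data condition that turns a close into an RST; the optional wait only mirrors the idea in the merged change's title ("Allow clustercheck to wait before finishing") and does not reproduce its implementation.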

Version-Release number of selected component (if applicable):


How reproducible:
Happens very often.

Steps to Reproduce:
1. Deploy a control plane on VMs with FIPS enabled.

Actual results:
Some health checks fail in the HAProxy logs, which might sporadically impact the availability of the mysql service.

Expected results:
No health check should fail on the FIPS-enabled environment, because the galera service is working fine.

Additional info:

Comment 8 Lukas Svaty 2023-06-08 11:07:49 UTC
Raising severity due to the blocker+ flag.

Comment 9 dabarzil 2023-06-13 18:22:26 UTC
Verified on CI https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-17.1_d-rhel-vhost-3cont_2comp-ipv4-vxlan-lvm-fips/48/
with openstack-tripleo-heat-templates-14.3.1-1.20230519151010.el9ost.noarch
puppet-tripleo-14.2.3-1.20230517011019.el9ost.noarch

Comment 17 errata-xmlrpc 2023-08-16 01:15:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

Comment 18 Red Hat Bugzilla 2023-12-15 04:26:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

