Description of problem:
On an HA control plane deployed on VMs, we are seeing a high number of health check failures in the HAProxy logs for all services, though most of them affect the mysql service.

[WARNING] (730596) : Health check for backup server mysql/controller-1.internalapi.redhat.local failed, reason: Socket error, check duration: 1ms, status: 2/3 UP.
[WARNING] (730596) : Health check for backup server mysql/controller-1.internalapi.redhat.local succeeded, reason: Layer7 check passed, code: 200, check duration: 89ms, status: 3/3 UP.

This seems to be a systemic issue in the environment, which is consuming a lot of sys time (from 10 to 30 sys time in top). We have identified that in this situation the galera service itself is working, but the health checks are incorrectly parsed by HAProxy, due to what seems to be a race condition in socket closure.

Version-Release number of selected component (if applicable):

How reproducible:
Happens very often.

Steps to Reproduce:
1. Deploy a VM control plane with FIPS enabled

Actual results:
Some health checks fail in the HAProxy logs, which might sporadically impact availability of the mysql service.

Expected results:
No health check should fail in the FIPS-enabled environment, because the galera service is working fine.

Additional info:
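To quantify how often the checks fail per backend, the HAProxy log can be summarized with a short script. The following is only a minimal sketch, assuming log lines shaped like the two quoted above; the function name `summarize_checks` and the regular expression are illustrative and not part of HAProxy or any deployment tooling.

```python
import re
from collections import Counter

# Pattern modeled on the two log lines quoted in this report (assumption:
# other HAProxy versions or log formats may need a different expression).
CHECK_RE = re.compile(
    r"Health check for (?:backup )?server (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<result>failed|succeeded), reason: (?P<reason>[^,]+)"
)


def summarize_checks(lines):
    """Count failed health checks per (backend, server, reason)."""
    failures = Counter()
    for line in lines:
        match = CHECK_RE.search(line)
        if match and match.group("result") == "failed":
            key = (
                match.group("backend"),
                match.group("server"),
                match.group("reason").strip(),
            )
            failures[key] += 1
    return failures


if __name__ == "__main__":
    # Sample input: the two lines from the description above.
    sample = [
        "[WARNING] (730596) : Health check for backup server "
        "mysql/controller-1.internalapi.redhat.local failed, "
        "reason: Socket error, check duration: 1ms, status: 2/3 UP.",
        "[WARNING] (730596) : Health check for backup server "
        "mysql/controller-1.internalapi.redhat.local succeeded, "
        "reason: Layer7 check passed, code: 200, check duration: 89ms, "
        "status: 3/3 UP.",
    ]
    for (backend, server, reason), count in summarize_checks(sample).most_common():
        print(f"{backend}/{server}: {count} x {reason}")
```

Grouping by failure reason makes it easier to see whether "Socket error" dominates for mysql compared with the other services.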
Raising severity due to the blocker+
Verified on CI https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-17.1_d-rhel-vhost-3cont_2comp-ipv4-vxlan-lvm-fips/48/ with:
openstack-tripleo-heat-templates-14.3.1-1.20230519151010.el9ost.noarch
puppet-tripleo-14.2.3-1.20230517011019.el9ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.