Bug 2005849

Summary: haproxy cannot connect to mysql (NOSRV)
Product: Red Hat OpenStack
Reporter: Eduardo Olivares <eolivare>
Component: python-PyMySQL
Assignee: OSP Team <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: nlevinki <nlevinki>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: apevec, jschluet, lhh, lmiccini, michele
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2021-11-30 07:51:31 UTC
Type: Bug

Description Eduardo Olivares 2021-09-20 10:33:18 UTC
Description of problem:
The following OSP 16.1 update job fails:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/75/

The job updates OSP from 16.1 z6-async-rhbz1999919 (http://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/16.1-RHEL-8/z6-async-rhbz1999919/) to 16.1 z7-spin4 (http://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/16.1-RHEL-8/z7-spin4/).

The update completes successfully. After the update, the overcloud nodes are rebooted. The following command then fails at 14:59:10 (see [1]):
$ source ~/overcloudrc && (openstack flavor delete 200 || true) && openstack flavor create --id 200 --ram 2048 --disk 10 --vcpus 2 guest_image
Gateway Timeout (HTTP 504)\nGateway Timeout (HTTP 504)


According to the logs, the overcloud reboot finished successfully at 14:59:02 (see [2]).


Apparently, haproxy fails to connect to mysql (NOSRV) from 14:54:08 until the job ends at 15:17:55 (see [3]).
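
One way to pin down that window is to scan the haproxy log from [3] for entries whose server field is <NOSRV>. The following is a minimal Python sketch, not part of the original report. It assumes each line starts with a syslog-style timestamp and that entries appear in chronological order. The function name nosrv_window and the sample lines are hypothetical and are not taken from the job's log.

```python
import re

# Syslog-style time at the start of each line, e.g. "Sep 17 14:54:08".
TIME_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})")


def nosrv_window(lines, backend="mysql"):
    """Return (first_time, last_time, count) for <NOSRV> entries of a backend.

    Returns None when no matching entry is found. Lines are assumed to be
    in chronological order, as in a normal log file.
    """
    marker = f"{backend}/<NOSRV>"
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = TIME_RE.match(line)
        if match:
            times.append(match.group(1))
    if not times:
        return None
    return times[0], times[-1], len(times)


# Hypothetical sample lines, for illustration only (not from the real log).
SAMPLE = [
    "Sep 17 14:54:08 controller-0 haproxy[13]: 172.17.1.10:40000 "
    "[17/Sep/2021:14:54:08.000] mysql mysql/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0",
    "Sep 17 14:54:09 controller-0 haproxy[13]: 172.17.1.10:40002 "
    "[17/Sep/2021:14:54:09.000] nova_osapi nova_osapi/controller-0 0/0/1 120 -- 1/1/0/0/0 0/0",
    "Sep 17 15:17:55 controller-0 haproxy[13]: 172.17.1.10:40004 "
    "[17/Sep/2021:15:17:55.000] mysql mysql/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0",
]

print(nosrv_window(SAMPLE))
```

For the compressed log, the same function could be fed from `gzip.open("haproxy.log.gz", "rt", errors="replace")`, where the local file name is again an assumption.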


According to the pacemaker logs from controller-0 (see [4]), the galera-bundle resource was up and running at 14:57:
Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (pe__print_bundle)  info:  Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]
Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-0    (ocf::heartbeat:galera):    Master controller-0
Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-1    (ocf::heartbeat:galera):    Master controller-1
Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-2    (ocf::heartbeat:galera):    Master controller-2
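
The replica roles can also be pulled out of the pacemaker log programmatically, which makes it easier to line them up against the haproxy timeline. This is a minimal Python sketch, not part of the original report. It assumes the replica status lines keep the format quoted above, and the names REPLICA_RE and galera_roles are hypothetical. When a replica appears more than once, the last entry wins.

```python
import re

# Matches per-replica status lines printed by pacemaker-schedulerd, e.g.
# "galera-bundle-0    (ocf::heartbeat:galera):    Master controller-0".
# The node name is optional because a stopped replica may not list one.
REPLICA_RE = re.compile(
    r"(galera-bundle-\d+)\s+\(ocf::heartbeat:galera\):\s+(\S+)(?:\s+(\S+))?"
)


def galera_roles(lines):
    """Map each galera-bundle replica to its (role, node) as last reported."""
    roles = {}
    for line in lines:
        match = REPLICA_RE.search(line)
        if match:
            roles[match.group(1)] = (match.group(2), match.group(3))
    return roles


# Log lines quoted in this report.
LOG = [
    "Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (pe__print_bundle)  info:  Container bundle set: galera-bundle [cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest]",
    "Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-0    (ocf::heartbeat:galera):    Master controller-0",
    "Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-1    (ocf::heartbeat:galera):    Master controller-1",
    "Sep 17 14:57:24 controller-0 pacemaker-schedulerd[2985] (common_print)  info:    galera-bundle-2    (ocf::heartbeat:galera):    Master controller-2",
]

roles = galera_roles(LOG)
all_master = all(role == "Master" for role, _node in roles.values())
print(roles, all_master)
```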



[1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/75/consoleFull
[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/75/console_logs/ir-tripleo-overcloud-reboot.log
[3] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/75/controller-0/var/log/containers/haproxy/haproxy.log.gz
[4] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/75/controller-0/var/log/pacemaker/pacemaker.log.gz


Version-Release number of selected component (if applicable):
z7-spin4

How reproducible:
Only tested once

Steps to Reproduce:
1. Run the OVN OSP 16.1 update job. The failure should happen during the overcloud reboot stage.