Description of problem:
The issue happens during the FFU procedure, while the Overcloud FFU Upgrade stage is performed. The topology used is HA-no-ceph (3 controllers and 2 computes).

The script overcloud_upgrade_run-controller-0.sh upgrades the first controller node to osp16.1, disables controller-1 and controller-2, and creates a single-resource pacemaker cluster with controller-0. Meanwhile, both compute nodes are still running osp13.

A workload was created before starting the upgrade procedure, and the corresponding VM instance can be successfully reached. At this point (with only controller-0 upgraded), a new workload is created successfully, but there is no connectivity with its FIP.

Findings:
1) ovn_metadata_agent is unhealthy:
38b5aa21f39c 192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-metadata-agent-ovn:20200730.1 "dumb-init --singl..." 23 hours ago Up 7 minutes (unhealthy) ovn_metadata_agent
2) The neutron-haproxy-ovnmeta sidecar container is not created for the new workload (a sidecar container exists only for the old workload).

The issue is reproduced on a BM server: panther23.lab.eng.tlv2.redhat.com
It can be reproduced from the undercloud node at panther23 by running /home/stack/workload_launch.sh (the newly created workload can be removed by running /home/stack/workload_launch.sh cleanup).

Version-Release number of selected component (if applicable):
FFU upgrade performed from osp13: 2020-08-05.1 to osp16.1: RHOS-16.1-RHEL-8-20200821.n.0

How reproducible:
100%. Several OVN FFU jobs reproduced it, both with and without DVR enabled.
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn-dvr/
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn/

Steps to Reproduce:
1. Run an OVN FFU job like the ones pasted above. The issue is reproduced after the first controller node is upgraded, during the Overcloud FFU Upgrade Run stage.
This has the same root cause as bug 1871834: the metadata agent config still uses the old VIP, while the OVN DB is listening on a new one.

[root@controller-0 ~]# ss -putnal | grep 6642
tcp LISTEN 0 10 [fd00:fd00:fd00:2000::136]:6642 [::]:* users:(("ovsdb-server",pid=672283,fd=14))

[root@compute-1 ~]# grep ovn_sb_connection /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/networking-ovn/networking-ovn-metadata-agent.ini
ovn_sb_connection=tcp:[fd00:fd00:fd00:2000::ef]:6642
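
The comparison between the two outputs above can be done mechanically. The following is only a rough sketch, not part of any fix: the helper names are hypothetical, it assumes the `ss` line has the netid column first (as in the `ss -putnal` output above, so the local address is the fifth field), and it only looks at the first endpoint if ovn_sb_connection holds a comma-separated list.

```python
import ipaddress


def endpoint(text):
    """Split '[addr]:port' or 'addr:port' into (ip_address, port)."""
    host, _, port = text.rpartition(":")
    return ipaddress.ip_address(host.strip("[]")), int(port)


def sb_endpoint_from_ini(line):
    """Parse a line such as 'ovn_sb_connection=tcp:[fd00::ef]:6642'."""
    value = line.split("=", 1)[1].strip()
    first = value.split(",")[0]            # assumption: check the first endpoint only
    return endpoint(first.split(":", 1)[1])  # drop the 'tcp:' / 'ssl:' scheme


def listen_endpoint_from_ss(line):
    """Parse one LISTEN line of 'ss -putnal'; the local address is field 5."""
    return endpoint(line.split()[4])


def agent_points_at_listener(ini_line, ss_line):
    """True when the agent's configured SB endpoint matches the listener."""
    return sb_endpoint_from_ini(ini_line) == listen_endpoint_from_ss(ss_line)


if __name__ == "__main__":
    # Values copied from the outputs in this comment.
    ss_line = ('tcp LISTEN 0 10 [fd00:fd00:fd00:2000::136]:6642 [::]:* '
               'users:(("ovsdb-server",pid=672283,fd=14))')
    ini_line = "ovn_sb_connection=tcp:[fd00:fd00:fd00:2000::ef]:6642"
    print(agent_points_at_listener(ini_line, ss_line))
```

With the values from controller-0 and compute-1 the check reports a mismatch (::ef configured, ::136 listening), which is consistent with the agent being unable to reach the southbound DB.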
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413